
amazon-sagemaker-immersion-day's Introduction


Welcome to Sagemaker Immersion Labs

Amazon SageMaker Immersion Day helps customers and partners gain an end-to-end understanding of building ML use cases, from feature engineering to exploring the various built-in algorithms, and training, tuning, and deploying the ML model in a production-like scenario. It guides you through bringing your own model and performing a lift-and-shift of on-premises ML workloads to the Amazon SageMaker platform. It further demonstrates advanced concepts such as model debugging, model monitoring, and AutoML, and guides you through evaluating your machine learning workload with the AWS ML Well-Architected Lens.

Overview of the Labs

Find details on each lab here - https://sagemaker-immersionday.workshop.aws/

License

This library is licensed under the MIT-0 License. See the LICENSE file.

amazon-sagemaker-immersion-day's People

Contributors

amazon-auto, amitmukh, anapt, arlindnocaj, arraafat, celmore25, chaoraw, cris-popa-aws, dchristian3188, dgallitelli, dhirajk1108, famestad, fl0yd, frank-tan, georschi, itsaurabh1, juliaangxy, kmotohas, monamo19, nataibi, pnpolcher, psantora-amazon, rominauq, sahays, sameer-goel, samsanders23, vikramshitole, vincentcabosart, vincentcabosartaws


amazon-sagemaker-immersion-day's Issues

Graphs not displaying in xgboost_debugger_demo.ipynb

Graphs are not displaying in xgboost_debugger_demo.ipynb when you run the plot_collection() function (or other functions that call plot_collection).

Steps to reproduce:

  1. Run the Debugger lab under Lab 4: https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab4/debugger
  2. Starting at Step 4 of the "Analyzing your debugger metrics" section, displaying plots/graphs will fail.

Fix:
Add plt.show() on the last line of the plot_collection() function.
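A minimal sketch of the suggested fix (the function body below is illustrative, not the notebook's actual plot_collection code):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this sketch runs anywhere
import matplotlib.pyplot as plt

def plot_collection(values, label="metric"):
    """Illustrative stand-in for the notebook's plot_collection():
    plot (step, value) pairs, then force the figure to render."""
    steps, vals = zip(*values)
    plt.figure()
    plt.plot(steps, vals, label=label)
    plt.legend()
    plt.show()  # the suggested one-line fix: render the plot explicitly
```

Without the final plt.show(), some notebook environments never flush the figure to the output cell.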

Property does not exist on object

The property referenced on this line:

event_data = inference_record.event_data

does not exist, which causes invocations of the preprocessor to fail.
The error message from the Python script is not flushed to the CloudWatch logs, so it is very hard for the user to identify. The only error that appears in the CloudWatch log is:
java.lang.Exception: Exception while invoking preprocessor script
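A general mitigation, sketched here (a hypothetical wrapper, not the lab's actual preprocessor interface): catch, print, and flush before re-raising, so the Python traceback reaches CloudWatch instead of being lost when the process dies:

```python
import sys
import traceback

def preprocess_handler(inference_record, transform=lambda r: {}):
    """Hypothetical wrapper showing the flush pattern; the real
    preprocessor's interface is defined by Model Monitor, not here."""
    try:
        return transform(inference_record)
    except Exception:
        # Print the full traceback and flush immediately so the message
        # is not lost when the invoking process terminates.
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
        raise
```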

However, although the lab provides an example preprocessor script, it is not used by the example (the relevant line is commented out):

my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=predictor.endpoint_name,
    #record_preprocessor_script=pre_processor_script,
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

If this line is included, the example will not work.

Please update the examples accordingly.

xgboost_debugger_demo.ipynb - TypeError: Descriptors cannot not be created directly / protobuf version

The existing protobuf package version resulted in runtime error: TypeError: Descriptors cannot not be created directly. I fixed this error by installing protobuf version 3.20.x or lower per https://protobuf.dev/news/2022-05-06/#python-updates.

!pip install protobuf==3.20.1

# cell 18
from smdebug.trials import create_trial

description = client.describe_training_job(TrainingJobName=job_name)
s3_output_path = xgboost_estimator.latest_job_debugger_artifacts_path()

# This is where we create a Trial object that allows access to saved tensors.
trial = create_trial(s3_output_path)

BYOC 'sm-docker build' fails due to gevent wheel not found

Running Lab 3, Option 1 (Bring Your Own Container) after cloning the repo last night as part of an event, we encountered issues while building the Docker container.

The reason seems to be that gevent doesn't find a suitable wheel file, and therefore downloads the source and attempts to build the wheel itself. But the Dockerfile doesn't include a C compiler or the python-dev headers.

Overall, even though we could work around this issue by adding packages such as build-essential and python-dev to the Dockerfile, I believe the underlying problem is that we're still using Python 2.7 for this lab, which may in time cause additional issues with other libraries.

Here are relevant sections from CodeBuild logs:

(...)

DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.

(...)

Collecting gevent
  Downloading gevent-21.8.0.tar.gz (6.2 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'

(...)

Building wheels for collected packages: gevent
  Building wheel for gevent (PEP 517): started
  Building wheel for gevent (PEP 517): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpQHzSIn
       cwd: /tmp/pip-install-OouyxP/gevent
  Complete output (344 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/gevent
  copying src/gevent/_monitor.py -> build/lib.linux-x86_64-2.7/gevent
(<Removing additional copying>)
  copying src/gevent/testing/six.py -> build/lib.linux-x86_64-2.7/gevent/testing
  creating build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/thread.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/_addresses.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/_hostsfile.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/__init__.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/blocking.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/ares.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  copying src/gevent/resolver/dnspython.py -> build/lib.linux-x86_64-2.7/gevent/resolver
  creating build/lib.linux-x86_64-2.7/gevent/_ffi
  copying src/gevent/_ffi/watcher.py -> build/lib.linux-x86_64-2.7/gevent/_ffi
  copying src/gevent/_ffi/callback.py -> build/lib.linux-x86_64-2.7/gevent/_ffi
  copying src/gevent/_ffi/__init__.py -> build/lib.linux-x86_64-2.7/gevent/_ffi
  copying src/gevent/_ffi/loop.py -> build/lib.linux-x86_64-2.7/gevent/_ffi
  creating build/lib.linux-x86_64-2.7/gevent/libev
  copying src/gevent/libev/watcher.py -> build/lib.linux-x86_64-2.7/gevent/libev
  copying src/gevent/libev/corecffi.py -> build/lib.linux-x86_64-2.7/gevent/libev
  copying src/gevent/libev/__init__.py -> build/lib.linux-x86_64-2.7/gevent/libev
  copying src/gevent/libev/_corecffi_build.py -> build/lib.linux-x86_64-2.7/gevent/libev
  creating build/lib.linux-x86_64-2.7/gevent/libuv
  copying src/gevent/libuv/watcher.py -> build/lib.linux-x86_64-2.7/gevent/libuv
  copying src/gevent/libuv/__init__.py -> build/lib.linux-x86_64-2.7/gevent/libuv
  copying src/gevent/libuv/_corecffi_build.py -> build/lib.linux-x86_64-2.7/gevent/libuv
  copying src/gevent/libuv/loop.py -> build/lib.linux-x86_64-2.7/gevent/libuv
  creating build/lib.linux-x86_64-2.7/gevent/tests/monkey_package
  copying src/gevent/tests/monkey_package/issue1526_no_monkey.py -> build/lib.linux-x86_64-2.7/gevent/tests/monkey_package
(<Removing additional copying>)
  copying src/gevent/tests/test_server.key -> build/lib.linux-x86_64-2.7/gevent/tests
  creating build/lib.linux-x86_64-2.7/gevent/testing/coveragesite
  copying src/gevent/testing/coveragesite/sitecustomize.py -> build/lib.linux-x86_64-2.7/gevent/testing/coveragesite
  running build_ext
  generating cffi module 'build/temp.linux-x86_64-2.7/gevent.libuv._corecffi.c'
  creating build/temp.linux-x86_64-2.7
  Running '(cd  "/tmp/pip-install-OouyxP/gevent/deps/libev"  && sh ./configure -C > configure-output.txt )' in /tmp/pip-install-OouyxP/gevent
  configure: error: in `/tmp/pip-install-OouyxP/gevent/deps/libev':
  configure: error: no acceptable C compiler found in $PATH
  See `config.log' for more details
  Traceback (most recent call last):
    File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
      main()
    File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 205, in build_wheel
      metadata_directory)
    File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 209, in build_wheel
      wheel_directory, config_settings)
    File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 194, in _build_with_temp_dir
      self.run_setup()
    File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 243, in run_setup
      self).run_setup(setup_script=setup_script)
    File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 142, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 481, in <module>
      run_setup(EXT_MODULES)
    File "setup.py", line 465, in run_setup
      "signal_os_incompat = gevent.monkey:_subscribe_signal_os",
    File "/usr/local/lib/python2.7/dist-packages/setuptools/__init__.py", line 162, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
      dist.run_commands()
    File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python2.7/dist-packages/wheel/bdist_wheel.py", line 299, in run
      self.run_command('build')
    File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
      cmd_obj.run()
    File "/usr/lib/python2.7/distutils/command/build.py", line 128, in run
      self.run_command(cmd_name)
    File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-99XjyX/overlay/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 143, in run
      ext.sources[0] = make_mod(self.build_temp, pre_run)
    File "/tmp/pip-build-env-99XjyX/overlay/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 128, in make_mod
      pre_run(ext, ffi)
    File "/tmp/pip-install-OouyxP/gevent/_setuputils.py", line 364, in pre_run
      action()
    File "/tmp/pip-install-OouyxP/gevent/_setuplibev.py", line 55, in configure_libev
      system(libev_configure_command)
    File "/tmp/pip-install-OouyxP/gevent/_setuputils.py", line 195, in system
      if _system(cmd, cwd=cwd, env=env, **kwargs):
    File "/tmp/pip-install-OouyxP/gevent/_setuputils.py", line 191, in _system
      return check_call(cmd, cwd=cwd, env=env, **kwargs)
    File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '(cd  "/tmp/pip-install-OouyxP/gevent/deps/libev"  && sh ./configure -C > configure-output.txt )' returned non-zero exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for gevent
Failed to build gevent
ERROR: Could not build wheels for gevent which use PEP 517 and cannot be installed directly
The command '/bin/sh -c wget https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py &&     pip install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gevent gunicorn &&         (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) &&         rm -rf /root/.cache' returned a non-zero code: 1

Pipeline lab fails in CodeBuild due to breaking change in image_uris.retrieve()

In Lab 6. SageMaker Pipelines, the changes to pipeline.py no longer work due to a breaking change to image_uris.retrieve() in the sagemaker-python-sdk.

The build fails with the following error:
Exception: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)

211 | Traceback (most recent call last):
212 | File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/run_pipeline.py", line 77, in main
213 | pipeline = get_pipeline_driver(args.module_name, args.kwargs)
214 | File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/_utils.py", line 33, in get_pipeline_driver
215 | return _imports.get_pipeline(**kwargs)
216 | File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pipelines/customer_churn/pipeline.py", line 125, in get_pipeline
217 | sklearn_processor = SKLearnProcessor(
218 | File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/sklearn/processing.py", line 90, in init
219 | image_uri = image_uris.retrieve(
220 | File "/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/sagemaker/image_uris.py", line 117, in retrieve
221 | raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
222 | ValueError: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)

See related issues:

Workaround:
Manually set the variables processing_instance_type and training_instance_type to string literals instead of pipeline parameters. For example, add these lines starting at line 124 in pipeline.py:

# Manually override *_instance_type variables to workaround breaking change to image_uris.retrieve()
# https://github.com/aws/sagemaker-python-sdk/issues/3141
processing_instance_type = "ml.m5.xlarge"
training_instance_type = "ml.m5.xlarge"
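The failure mode can be illustrated without AWS access. The stand-in classes below are hypothetical, but they mimic the validation that image_uris.retrieve() now performs: a pipeline Parameter is rejected where a plain string is accepted:

```python
class ParameterString:
    """Stand-in for sagemaker.workflow.parameters.ParameterString."""
    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

def retrieve_image_uri(instance_type):
    """Mimics the new validation in image_uris.retrieve(): pipeline
    variables are rejected, string literals are accepted."""
    if not isinstance(instance_type, str):
        raise ValueError(
            "instance_type should not be a pipeline variable (%s)"
            % type(instance_type)
        )
    return "some-registry/image:tag-for-" + instance_type  # placeholder URI
```

This is why replacing the ParameterString with a literal such as "ml.m5.xlarge" unblocks the build.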

xgboost_debugger_demo.ipynb - csv_serializer import runtime error.

The existing code resulted in a runtime error. I fixed it as follows:

Existing code:
from sagemaker.predictor import csv_serializer

New code option #1: Comment out the import, since csv_serializer is not used anywhere in the notebook
#from sagemaker.predictor import csv_serializer

New code option #2: Import CSVSerializer from sagemaker.serializers
from sagemaker.serializers import CSVSerializer
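For context, a rough local sketch of what a CSV serializer does (this is an illustration, not the SDK's actual CSVSerializer implementation): it renders the payload as CSV text before the request is sent to the endpoint.

```python
import csv
import io

def csv_serialize(data):
    """Rough stand-in for sagemaker.serializers.CSVSerializer:
    turn a row (or list of rows) into a CSV string payload."""
    rows = data if data and isinstance(data[0], (list, tuple)) else [data]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue().strip()
```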

The notebook for Lab 3 option 1 launches a ml.m5.large instance by default

bring-custom-script.ipynb launches an ml.m5.large instance by default, as noted in the metadata below.

"metadata": {
  "instance_type": "ml.m5.large",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
  },

This instance and its KernelGateway app are not predefined and launched by the corresponding CloudFormation template and Event Engine blueprint. This means that during an Immersion Day workshop, participants have to wait for the instance and app to start up, while an ml.t3.medium is already available. A quick search of this repository shows this is the only notebook that defaults to ml.m5.large. I suggest editing the notebook's JSON metadata to use ml.t3.medium for a better workshop experience.
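A sketch of the suggested metadata edit applied with a short script (the path and target instance type are assumptions for illustration):

```python
import json

def set_notebook_instance_type(path, instance_type="ml.t3.medium"):
    """Rewrite a notebook's top-level metadata so Studio launches it on
    the given instance type. The path and target type here are this
    suggestion's assumptions, not the repo's actual tooling."""
    with open(path) as f:
        nb = json.load(f)
    nb.setdefault("metadata", {})["instance_type"] = instance_type
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)
```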

Cannot install sagemaker-studio-image-build

When executing # cell 00 in bring-custom-container.ipynb, sagemaker-studio-image-build fails to install. I've verified the package installs successfully on my local machine, but not in SageMaker Studio. This is the error:

/opt/conda/lib/python3.7/site-packages/secretstorage/dhcrypto.py:16: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
/opt/conda/lib/python3.7/site-packages/secretstorage/util.py:25: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
Collecting sagemaker-studio-image-build
  Using cached sagemaker_studio_image_build-0.6.0.tar.gz (13 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-3i1kn8pn/sagemaker-studio-image-build_a6da33aa92b34b98b5c2d69117b36e40/setup.py", line 33, in <module>
          package_data={"sagemaker_studio_image_build": ["*.yml", "data/**"]},
        File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 109, in setup
          _setup_distribution = dist = klass(attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 466, in __init__
          for k, v in attrs.items()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 293, in __init__
          self.finalize_options()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 885, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 884, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 196, in load
          return functools.reduce(getattr, attrs, module)
      AttributeError: type object 'Distribution' has no attribute '_finalize_feature_opts'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Flink-Async-IO-1.0.jar file in lab 7 is invalid

The Flink-Async-IO-1.0.jar file is now invalid; can someone upload the correct file please?

This is the error I got:
Error updating application TaxifareKDA An internal error prevented saving your changes for application TaxifareKDA. Wait a few minutes and try again. If the problem persists, contact AWS Support. API response No valid JAR file found in the zip file.

The jar file should be 42.2MB, but now it is only 1KB.

I need this file urgently. Thanks a lot for your help!

Best regards

Lab 6, option 3 broken - method get_pipeline_custom_tags missing in pipelines/customer_churn/pipeline.py

Lab 6, option 3 is broken.
https://catalog.us-east-1.prod.workshops.aws/event/dashboard/en-US/workshop/lab6/option3

Error getting project tags: module 'pipelines.customer_churn.pipeline' has no attribute 'get_pipeline_custom_tags'
529 | No finished training job found associated with this estimator. Please make sure this estimator is only used for building

Resolution: adjust workshop instructions
Add method below to pipelines/customer_churn/pipeline.py


import boto3


def get_sagemaker_client(region):
    """Gets the SageMaker client.

    Args:
        region: the AWS region to start the session in

    Returns:
        a boto3 SageMaker client
    """
    boto_session = boto3.Session(region_name=region)
    sagemaker_client = boto_session.client("sagemaker")
    return sagemaker_client


def get_pipeline_custom_tags(new_tags, region, sagemaker_project_name=None):
    try:
        sm_client = get_sagemaker_client(region)
        response = sm_client.describe_project(ProjectName=sagemaker_project_name)
        sagemaker_project_arn = response["ProjectArn"]
        response = sm_client.list_tags(ResourceArn=sagemaker_project_arn)
        project_tags = response["Tags"]
        for project_tag in project_tags:
            new_tags.append(project_tag)
    except Exception as e:
        print(f"Error getting project tags: {e}")
    return new_tags

Improve the Automatic model Tuning section (xgboost direct marketing example notebook)

Summary

In the xgboost_direct_marketing_sagemaker.ipynb notebook, ensure the Automatic Model Tuning feature is demonstrated properly. Currently, the Automatic model Tuning section could do with some expansion to highlight the value the Amazon SageMaker automatic model tuning functionality provides.

Currently this section does the following:

  • Running a training job in an async manner, i.e. the cell executes and immediately returns
  • Users have to then run describe_hyper_parameter_tuning_job cell until it completes
  • That's it. It leaves the user to go explore the SageMaker console to see what's happening with the hyperparameter tuning jobs (it doesn't deploy to an endpoint or use the newly trained model)

Suggestion

To highlight the value of this important feature in Sagemaker, I propose making the following changes in the Automatic model Tuning section:

  • Wait for the hyperparameter tuning job to finish by changing the parameters of the fit() call; also remove the include_cls_metadata parameter, as it is False by default.
  • describe the best training job (out of the default 3 training jobs)
  • deploy the best trained model to a SageMaker endpoint
  • run a prediction
  • compare the new prediction output with the previous one; this comparison is where the value sits
  • run clean-up

PR incoming

Referenced gifs are missing from the bias_and_explainability notebook

The bias_and_explainability notebook contains markdown cells that reference four missing GIFs. If rendered, these GIFs would show the user how to view the bias and explainability reports within SageMaker Studio. The missing GIFs are:

  • recordings/bias_detail.gif
  • recordings/bias_report_chart.gif
  • recordings/bias_report.gif
  • recordings/explainability_detail.gif

conflict between version of botocore and boto3

In the bring custom container lab, running cell 00 breaks the code due to a conflict between the boto3 and botocore versions; see the error attached. On the Data Science 3 kernel the error is generated, but the code doesn't break.

Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (23.0.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-633ajwu1/aiobotocore_87f4867a92af42909769433ea8237476/setup.py", line 69, in <module>
          include_package_data=True
        File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 147, in setup
          _setup_distribution = dist = klass(attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 479, in __init__
          for k, v in attrs.items()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 283, in __init__
          self.finalize_options()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 898, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 897, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 196, in load
          return functools.reduce(getattr, attrs, module)
      AttributeError: type object 'Distribution' has no attribute '_finalize_feature_opts'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-dg0kz7eb/sagemaker-studio-image-build_98ae0ba9b2624630adb603dd2cb1084d/setup.py", line 33, in <module>
          package_data={"sagemaker_studio_image_build": ["*.yml", "data/**"]},
        File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 147, in setup
          _setup_distribution = dist = klass(attrs)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 479, in __init__
          for k, v in attrs.items()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 283, in __init__
          self.finalize_options()
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 898, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 897, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File "/opt/conda/lib/python3.7/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 196, in load
          return functools.reduce(getattr, attrs, module)
      AttributeError: type object 'Distribution' has no attribute '_finalize_feature_opts'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Lab 3b: Failed to install torchvision

The first cell in Lab 3b fails with the following error:
(screenshot of the error attached in the original issue)

If I ignore it and continue running, subsequent cells will complain that torchvision is not installed.

This is using the version created in Workshop Studio on 6/2/2023 12:00 (GMT+2).

Non-numeric value found in header and could not convert string to float

Steps to replicate -

Follow the steps in https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US. First, complete the prerequisites to set up a notebook via SageMaker Studio Classic, then run "git clone https://github.com/aws-samples/amazon-sagemaker-immersion-day.git".

Next, follow Option 2 - Feature Engineering and Data Preparation using "Numpy and Pandas", which uses the notebook "xgboost_direct_marketing_sagemaker.ipynb".
In this notebook, at # cell 08, print(data.corr()) gives the error:
ValueError: could not convert string to float: 'housemaid'
We can move past this by commenting out the print statement.

Then we move to Lab 2. Train, Tune and Deploy XGBoost, using the same notebook "xgboost_direct_marketing_sagemaker.ipynb". But now, in # cell 17, we get another error:
"Failed - Training job failed". The detailed error is:
"UnexpectedStatusException: Error for Training job xgboost-2024-01-14-10-06-27-472: Failed. Reason: ClientError: Non-numeric value 'F' found in the header line 'False,54,3,999,0,1,0,False,False,False,False,False...' of file 'train.csv'. CSV format require no header line in it. If header line is already removed, XGBoost does not accept non-numeric value in the data., exit code: 1"
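The training error above stems from the built-in XGBoost CSV input format: the label must be the first column, there must be no header row, and every value must be numeric. A minimal sketch of preparing such a file (column names and values here are made up, not the lab's actual dataset):

```python
import pandas as pd

def to_xgboost_csv(df, label_col, path):
    """Illustrative prep for SageMaker's built-in XGBoost CSV input:
    label in the first column, no header row, numeric values only
    (booleans become 0/1)."""
    out = df.copy()
    # Booleans such as True/False would otherwise land in the CSV as
    # the strings "True"/"False", which XGBoost rejects.
    bool_cols = out.select_dtypes(include="bool").columns
    out[bool_cols] = out[bool_cols].astype(int)
    ordered = pd.concat([out[label_col], out.drop(columns=[label_col])], axis=1)
    ordered.to_csv(path, header=False, index=False)
```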

Debugger lab doesn't seem to work with SageMaker SDK v1 or v2

The debugger lab includes the line from sagemaker.predictor import csv_serializer, which appears to only be valid with SageMaker SDK v1.x. Later, the lab includes the line container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='xgboost', version='1.0-1'), which appears to only be valid with SageMaker SDK v2.x. As such, I can't get the lab to run all the way through with either SDK version.

Tested with SageMaker SDK v1.72.1 and v2.172.0

missing trust relationship when running bring-custom-container.ipynb

running the cell

%%sh

# Cell #2

cd container

chmod +x decision_trees/train
chmod +x decision_trees/serve

sm-docker build .  --repository sagemaker-decision-trees:latest

leads to an error because the SageMaker execution role is missing a trust relationship with CodeBuild.

see: https://pypi.org/project/sagemaker-studio-image-build/

Fix: add instructions to the SageMaker Immersion Day page https://sagemaker-immersionday.workshop.aws/lab3.html on how to add the trust relationship to the SageMaker role (see the trust relationship below):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sagemaker.amazonaws.com",
          "codebuild.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
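As one possible way to apply the trust policy above programmatically rather than through the IAM console, here is a sketch; the boto3 call is commented out because it mutates the account, and the role name in it is a placeholder:

```python
import json

# Trust policy letting both SageMaker and CodeBuild assume the
# execution role (same document as above).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "codebuild.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# To apply it (role name is a placeholder for your actual execution role):
# import boto3
# boto3.client("iam").update_assume_role_policy(
#     RoleName="<your-sagemaker-execution-role>",
#     PolicyDocument=json.dumps(trust_policy),
# )
```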

macOS junk files in the scikit-learn zip

There are macOS-specific temp junk files (the __MACOSX entries) archived along with the relevant files, which blocks the cell while it waits for the user to confirm overwriting them. The cell hangs forever with the output:

Archive: scikit_bring_your_own.zip creating: scikit_bring_your_own/ replace __MACOSX/._scikit_bring_your_own? [y]es, [n]o, [A]ll, [N]one, [r]ename:
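Until the archive is regenerated without the junk entries, one workaround is to extract it non-interactively with Python's zipfile module, skipping the __MACOSX entries. A sketch; the helper name is ours, and the archive name in the usage comment matches the lab:

```python
import zipfile

def extract_without_macosx(zip_path, dest="."):
    """Extract an archive, skipping macOS Finder junk (__MACOSX/ and ._* files)."""
    with zipfile.ZipFile(zip_path) as zf:
        members = [
            name for name in zf.namelist()
            if not name.startswith("__MACOSX/")
            and not name.rsplit("/", 1)[-1].startswith("._")
        ]
        # extractall overwrites silently, so nothing blocks on a prompt.
        zf.extractall(dest, members=members)

# In the lab notebook, the hanging unzip cell could be replaced with:
# extract_without_macosx("scikit_bring_your_own.zip")
```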

SageMaker-ModelMonitoring.ipynb - TypeError: 'NoneType' object is not iterable

For # cell 07, the existing code resulted in "TypeError: 'NoneType' object is not iterable" when result.get('Contents') returns None (i.e., no capture files have landed in S3 yet). I fixed it as follows:
Existing code:
capture_files = [capture_file.get("Key") for capture_file in result.get('Contents')]

New proposed code (the None guard has to wrap the .get('Contents') result itself; appending "or []" after the comprehension would still raise, because the comprehension iterates None before the "or" is evaluated):
capture_files = [capture_file.get("Key") for capture_file in (result.get('Contents') or [])]

Lab 6.2:

    import pandas as pd

    ## Import and function header reconstructed for completeness; the notebook
    ## defines this preprocessing helper (the name here is illustrative).
    def preprocess_data(file_path):
        df = pd.read_csv(file_path)
        ## Convert to datetime columns
        df["firstorder"] = pd.to_datetime(df["firstorder"], errors='coerce')
        df["lastorder"] = pd.to_datetime(df["lastorder"], errors='coerce')
        ## Drop rows with null values
        df = df.dropna()
        ## Create column giving the days between the last order and the first order
        df["first_last_days_diff"] = (df['lastorder'] - df['firstorder']).dt.days
        ## Create column giving the days between when the customer record was created and the first order
        df['created'] = pd.to_datetime(df['created'])
        df['created_first_days_diff'] = (df['created'] - df['firstorder']).dt.days
        ## Drop identifier and raw date columns
        df.drop(['custid', 'created', 'firstorder', 'lastorder'], axis=1, inplace=True)
        ## Apply one-hot encoding to the favday and city columns
        df = pd.get_dummies(df, prefix=['favday', 'city'], columns=['favday', 'city'])
        return df
    
# convert the store_data file into csv format
store_data = pd.read_excel("storedata_total.xlsx")
store_data.to_csv("storedata_total.csv")

When reading the CSV in L2 (df = pd.read_csv(file_path)), the index needs to be taken care of. Otherwise we end up with a file containing 22 instead of 21 columns, breaking the inference. This fixes the issue: df = pd.read_csv(file_path, index_col=0)
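To illustrate both directions of the fix (stripping the stray index column with index_col=0 when reading, or not writing it in the first place), a small self-contained sketch with made-up columns:

```python
import pandas as pd

# Made-up stand-in for the store data.
store_data = pd.DataFrame({"custid": [1, 2], "esent": [10, 20]})

# Writing with the default index adds an extra unnamed first column.
store_data.to_csv("with_index.csv")

# Reading with index_col=0 strips that column again.
df = pd.read_csv("with_index.csv", index_col=0)

# Alternatively, avoid the extra column entirely when writing.
store_data.to_csv("without_index.csv", index=False)
```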

BYOC notebook is broken

The BYOC notebook is full of errors:
The first cell:
!pip install --upgrade aiobotocore
breaks with error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. boto3 1.17.23 requires botocore<1.21.0,>=1.20.23, but you have botocore 1.23.24 which is incompatible. awscli 1.19.23 requires botocore==1.20.23, but you have botocore 1.23.24 which is incompatible.

The next cell uses the unzip command, whereas unzip is not installed by default on this image. Even trying to install it via pip (pip install zip) still fails, with a wsgiref error.

Another question: why does the workshop recommend a TensorFlow kernel for this lab? See: https://catalog.us-east-1.prod.workshops.aws/v2/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab3/option2

The instructions in the workshop and the lab notebook do not match.

Only numeric values are supported for XGBoost in xgboost_direct_marketing_sagemaker

Notebook: xgboost_direct_marketing_sagemaker.ipynb

This notebook prepares training and validation data that contain non-numeric values such as True/False. This causes the training job in # cell 17 to fail.

Steps to Reproduce

  1. Run the notebook cells up to the training job
Error: Customer Error: Non-numeric value 'F' found in the header line 'False,54,3,999,0,1,0,False,False,False,False,False...' of file 'train.csv'. CSV format require no header line in it. If header line is already removed, XGBoost does not accept non-numeric value in the data.

bring-custom-container.ipynb - Unable to build container

Due to IAM permissions on the sagemaker-immersion-day-SageMakerExecutionRole-xxxxx role, "# cell 02" execution resulted in errors. To make this work, I had to modify the SageMaker execution role to allow iam:PassRole and to allow codebuild.amazonaws.com to assume the role. Furthermore, I ran sm-docker build . --repository sagemaker-decision-trees:latest in the image terminal.

Error:
CalledProcessError: Command 'b'# cell 02\n\ncd lab03_container\n\nchmod +x decision_trees/train\nchmod +x decision_trees/serve\n\nsm-docker build . --repository sagemaker-decision-trees:latest\n'' returned non-zero exit status 1.
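For reference, a sketch of the extra inline policy statement described above (iam:PassRole on the execution role itself); the account ID is a placeholder, the role name follows the pattern from this report, and the boto3 call is commented out because it mutates the account:

```python
import json

# Inline policy granting iam:PassRole on the execution role, so sm-docker
# can pass the role to CodeBuild. The ARN below is a placeholder.
pass_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<account-id>:role/sagemaker-immersion-day-SageMakerExecutionRole-xxxxx",
        }
    ],
}

# To attach it (names are placeholders):
# import boto3
# boto3.client("iam").put_role_policy(
#     RoleName="sagemaker-immersion-day-SageMakerExecutionRole-xxxxx",
#     PolicyName="AllowPassRoleForSmDocker",
#     PolicyDocument=json.dumps(pass_role_policy),
# )
```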
