
Amazon SageMaker

  • SageMaker Data Wrangler helps you understand your data and identify potential errors and extreme values with a set of robust pre-configured visualization templates. Histograms, scatter plots, box and whisker plots, line plots, and bar charts are all available.
    • can produce new features
  • Features are saved to SageMaker Feature Store - a feature can have multiple versions (see the sketch below)
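A minimal sketch of saving features to Feature Store with the SageMaker Python SDK; the feature group name, S3 bucket, and columns are hypothetical placeholders:

```python
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # works inside SageMaker; pass an IAM role ARN otherwise

# Hypothetical engineered features; a record identifier and an event time column are required
df = pd.DataFrame({
    "customer_id": ["c-1", "c-2"],
    "avg_basket_value": [42.5, 13.9],
    "event_time": [time.time()] * 2,
})
# object dtype is not inferred as a feature type; cast the id column to pandas string
df["customer_id"] = df["customer_id"].astype("string")

feature_group = FeatureGroup(name="customer-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)  # infer feature types from the DataFrame

feature_group.create(
    s3_uri="s3://my-bucket/feature-store",   # offline store location (placeholder)
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# create() is asynchronous; wait until the feature group is ready before ingesting
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)

feature_group.ingest(data_frame=df, max_workers=1, wait=True)
```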
  • SageMaker Clarify - checks that training data and features are well balanced
    • can inspect individual predictions to understand how each feature contributes to the prediction
    • can check that the model isn't overly reliant on features that are underrepresented in the data (see the sketch below)
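A minimal sketch of running a pre-training bias check with SageMaker Clarify via the Python SDK; the S3 paths, column names, and facet are hypothetical placeholders:

```python
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # works inside SageMaker; pass an IAM role ARN otherwise

# Processor that runs the Clarify analysis job
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Where the training data lives and where to write the report (placeholder paths and columns)
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",
    s3_output_path="s3://my-bucket/clarify-report",
    label="target",
    headers=["target", "age", "income", "gender"],
    dataset_type="text/csv",
)

# Which column (facet) to analyze for imbalance / under-representation
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="gender",
    facet_values_or_threshold=[0],
)

clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
)
```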
  • SageMaker Clarify and Debugger can be used to improve performance by identifying sources of errors and slowness
  • the SageMaker free tier lasts 2 months
  • you need to manually shut down all instances/kernels/apps to avoid charges. You also need to click "Delete all resources" for solutions you won't be using anymore.
  • Canvas is different from Studio: it requires no coding and no ML expertise
  • SageMaker provides pre-trained models. These models can be used from Python via the SageMaker Python SDK. An endpoint is created that can be consumed for making predictions (see the sketch below). The whole tutorial is here.
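A minimal sketch of calling a deployed endpoint for predictions with boto3; the endpoint name and the JSON payload layout are hypothetical placeholders:

```python
import json
import boto3

# Runtime client used to invoke deployed SageMaker endpoints
runtime = boto3.client("sagemaker-runtime")

# "my-endpoint" and the payload format are placeholders for illustration
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="application/json",
    Body=json.dumps({"instances": [[5.1, 3.5, 1.4, 0.2]]}),
)

prediction = json.loads(response["Body"].read())
print(prediction)
```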
  • SageMaker Pipelines is explained here. It allows conditions to be set in the pipeline: for example, if the evaluation is good, the model is registered. The model can then be deployed to production automatically, or sent to another person for final approval before deployment to production. It allows for collaboration between the Data Science team and the DevOps/MLOps team.
  • Different deployment scenarios are explained here. They are configured in the endpoint config, where for each model you can specify how many instances, their type and the traffic weight (see the sketch after this list).
    • Deploy a single model on a real-time endpoint
    • Deploy model variants (also called production variants) on the same endpoint - this is useful if you want to gradually introduce a new model into production before routing 100% of the traffic to it
    • Multi-model endpoints - you can dynamically load and unload a large number of models on the same endpoint. This allows, for example, a different model per customer. You can select which model to use when calling the endpoint.
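A minimal sketch of an endpoint config with two production variants splitting traffic 90/10; the model names, instance types, and weights are hypothetical placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# Endpoint config with two production variants; names, types and weights are placeholders
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "current",
            "ModelName": "my-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,  # ~90% of the traffic
        },
        {
            "VariantName": "challenger",
            "ModelName": "my-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,  # ~10% of the traffic
        },
    ],
)

# Create the real-time endpoint backed by this config
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)
```

For a multi-model endpoint, the model is chosen per request with the TargetModel parameter of invoke_endpoint, e.g. `runtime.invoke_endpoint(..., TargetModel="customer-a/model.tar.gz")` (the artifact name here is a placeholder).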
  • Examples:
    • A SageMaker notebook where you train locally and deploy your model to an Amazon endpoint using SageMaker
      • you use the SageMaker SDK on your local computer in the form of a notebook
      • some configuration of IAM roles might be required
      • the classifier is trained locally
      • your TensorFlow model must be slightly changed to be used by SageMaker
      • load this model into SageMaker using sagemaker.tensorflow.model.TensorFlowModel
      • create an endpoint by calling the deploy method of your SageMaker TensorFlow model (see the sketch below)
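A minimal sketch of the deployment step, assuming the locally trained model has already been archived as a model.tar.gz in S3; the bucket path, framework version, and instance type are placeholders:

```python
import sagemaker
from sagemaker.tensorflow.model import TensorFlowModel

# Inside SageMaker this resolves the notebook's role; locally, pass an IAM role ARN instead
role = sagemaker.get_execution_role()

# The S3 path and framework version are placeholders for illustration
model = TensorFlowModel(
    model_data="s3://my-bucket/my-model/model.tar.gz",
    role=role,
    framework_version="2.8",
)

# Creates the SageMaker model, the endpoint config and the real-time endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# TensorFlow Serving expects an "instances" payload
result = predictor.predict({"instances": [[5.1, 3.5, 1.4, 0.2]]})
print(result)
```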