@HCharlie HCharlie commented Mar 25, 2024

#4536

Description of changes:
Pass code_location, which is set on the TensorFlow estimator object, through to the model created in the deploy function. Previously it was not forwarded to the model.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.44%. Comparing base (0075fb3) to head (07d3f9e).

❗ Current head 07d3f9e differs from pull request most recent head eb186cc. Consider uploading reports for the commit eb186cc to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4537      +/-   ##
==========================================
- Coverage   87.49%   87.44%   -0.06%     
==========================================
  Files         391      389       -2     
  Lines       37254    36889     -365     
==========================================
- Hits        32595    32256     -339     
+ Misses       4659     4633      -26     


@HCharlie HCharlie changed the title from "MLX-1224 pass code_location to create_model for tensorflow estimator deployment" to "fix: pass code_location to create_model for tensorflow estimator deployment" on May 2, 2024
HCharlie commented May 2, 2024

Hi @mohanasudhan, do you know what's missing for this PR?

mohanasudhan (Contributor) commented

Can you explain your use case and add a unit/integ test?

HCharlie commented May 2, 2024

> Can you explain your use case and add a unit/integ test?

Hi @mohanasudhan, thanks for the reply. I think this is either a bug or I am misusing the TensorFlow estimator. I pass the code_location parameter to the TensorFlow estimator as shown below, but when execution reaches the estimator's deploy function, it does not use the specified code_location to find the S3 bucket and instead tries to create a new default S3 bucket. A more detailed description is in #4536.

from sagemaker.tensorflow import TensorFlow
source_dir = 's3://{}/{}/source'.format(bucket, prefix)
output_path = 's3://{}/{}/output'.format(bucket, prefix)

hyperparams = {
    'sagemaker_requirements': 'code/requirements.txt'
}

mnist_estimator = TensorFlow(entry_point='code/mnist.py',
                              base_job_name=base_job_name,
                              output_path=output_path,
                              code_location=source_dir,
                              hyperparameters=hyperparams,
                              role=role,
                              instance_count=2,
                              instance_type='ml.m5.large',
                              framework_version='2.1.0',
                              py_version='py3',
                              distribution={'parameter_server': {'enabled': True}})

## fit
print("start fitting")
mnist_estimator.fit(training_data_uri)

## deploy
print("start deploy")
predictor = mnist_estimator.deploy(initial_instance_count=1, instance_type='ml.m5.large')
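
To make the reported behaviour concrete, here is a minimal, self-contained sketch of the forwarding pattern this PR is about. It does not use the SageMaker SDK: `ToyEstimator`, `ToyModel`, `DEFAULT_BUCKET`, and both `create_model*` methods are hypothetical stand-ins written only for illustration, and the real SDK code may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the default bucket that gets created when no location is given.
DEFAULT_BUCKET = "s3://toy-default-bucket"


@dataclass
class ToyModel:
    """Hypothetical model: uploads its code to code_location, else to the default bucket."""

    code_location: Optional[str] = None

    def upload_target(self) -> str:
        return self.code_location or DEFAULT_BUCKET


@dataclass
class ToyEstimator:
    """Hypothetical estimator that remembers the code_location it was given."""

    code_location: Optional[str] = None

    def create_model_without_forwarding(self, **kwargs) -> ToyModel:
        # Mirrors the reported behaviour: the estimator's code_location is dropped,
        # so the model falls back to the default bucket.
        return ToyModel(**kwargs)

    def create_model(self, **kwargs) -> ToyModel:
        # Sketch of the fix: forward the estimator's code_location unless the
        # caller passes an explicit override.
        kwargs.setdefault("code_location", self.code_location)
        return ToyModel(**kwargs)


estimator = ToyEstimator(code_location="s3://my-bucket/prefix/source")
buggy_model = estimator.create_model_without_forwarding()
fixed_model = estimator.create_model()
override_model = estimator.create_model(code_location="s3://other-bucket/code")

print(buggy_model.upload_target())
print(fixed_model.upload_target())
print(override_model.upload_target())
```

Using `setdefault` keeps an explicit `code_location` argument authoritative, so callers who already pass one would see no change in behaviour under this sketch.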

sagemaker-bot and others added 8 commits January 27, 2025 14:18
* fix: skip TF tests for unsupported versions

* flake8
* feat: add pytorch-tgi-inference 2.4.0

* add tgi 3.0.1 image

* skip faulty test

* formatting

* formatting

* add hf pytorch training 4.46

* update version alias

* add py311 to training version

* update tests with pyversion 311

* formatting

---------

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
…mage (aws#4992)

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
pravali96 and others added 28 commits April 15, 2025 08:14
* Fix deepdiff dependencies

* trigger tests
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix:Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: aws#5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* fix: tgi image uri unit tests

* fix: black-format and flake8 failures

* fix: parse

* fix: print statement

---------

Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
…aws#5123)

* clean up

* bump maxdepth for doc/api/training to fix readthedocs

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix:Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: aws#5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

* Revert PR 5122 changes, due to issues with other processor codeflows

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
Co-authored-by: Zhaoqi <jzhaoqwa@amazon.com>
…ws#5144)

* add s3 uri check to modeltrainer data source

* update ModelTrainer to support s3 uri and tar.gz file as source_dir

* black-format

* add unit and integ tests

* update logic and unit test to raise value error if the file is not .tar.gz
…image. (aws#5143)

* feature:support custom workflow deployment in ModelBuilder using SMD image. (aws#1661)

* feature:support custom workflow deployment in ModelBuilder using SMD inference image.

* Rename test case and pass session.

* Address PR comments.

* Tweak resource cleanup logic in integ test.

* Fixing CodeBuild integ test failures.

* Renamed integ test.

* Remove unused integ test, restore once GA.

---------

Co-authored-by: Joseph Zhang <cjz@amazon.com>

* Cache client as instance attribute in property@ decorator. (aws#1668)

* Remove property@ decorator from ABC definition.

* Cache client as instance attribute in @property.

* Fix flake8 issue.

---------

Co-authored-by: Joseph Zhang <cjz@amazon.com>

* Bugfixes from e2e testing. (aws#1670)

* Fix Albatross Inference component tests

* trigger integ tests

---------

Co-authored-by: cj-zhang <32367995+cj-zhang@users.noreply.github.com>
Co-authored-by: Joseph Zhang <cjz@amazon.com>
Co-authored-by: Pravali Uppugunduri <upravali@amazon.com>
Co-authored-by: adishaa <adishaa@amazon.com>
…5146)

* Fix Flake8 Violations

* Add Owner ID check for bucket with path when prefix is provided

**Description**

Previously we used the head_bucket call to perform the owner ID check, but this doesn't cover cases where the S3 path is provided through the prefix.

This change makes sure that directory-level permissions are supported.

**Testing Done**
Tested through unit tests, integ tests and manual testing through the installation file.

Yes

* Address PR comment

* Codestyle fixes

* Minor fix

* Codestyle fixes

* Fix Unit tests
* chore: add huggingface images

* chore: add tei 1.6 image

* chore: add tei 1.6.0 to tei mapping in tests
aws#5098)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 2.20.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.3.2 to 1.5.1.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](scikit-learn/scikit-learn@1.3.2...1.5.1)

---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-version: 1.5.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve error logging and documentation for issue 4007

* Add hyperlink to RTDs