Comments (7)

elgalu commented on May 13, 2024

Hi @winstonaws, the link you posted points to the master branch, so the line number no longer matches. Could you link to a commit ID instead?

from amazon-sagemaker-examples.

djarpin commented on May 13, 2024

Thanks @ThisIsRick. Have you tried using SageMaker's pre-built TensorFlow container for your task? There's an example notebook here which shows how to use TensorBoard with it. Writing checkpoints to S3 and running TensorBoard locally involve some intricacies that may make this harder to implement in your own container. Thanks.

ThisIsRick commented on May 13, 2024

Thanks @djarpin.
I haven't tried SageMaker's pre-built TensorFlow container. My understanding is that the model script has to follow a specific pattern in order to use the pre-built TensorFlow container, right? Ours doesn't; it is provided by an applied scientist.

We're also considering continuously syncing checkpoints to S3 from inside the container and running a separate local thread that syncs the checkpoints down from S3. However, our training jobs are launched with the AWS command line from a local desktop; we don't use a notebook instance on SageMaker. That makes the part that syncs checkpoints from S3 a bit more complicated.

winstonaws commented on May 13, 2024

@ThisIsRick

The approach you described is the right one. The code inside your container needs to save checkpoints to S3, and you need to periodically sync your local TensorBoard log directory with those S3 checkpoints.
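A minimal sketch of the local half of this approach, assuming the AWS CLI is installed and configured on the desktop. It is not the SDK's implementation; the function names, the bucket URI, and the local directory are hypothetical, and the only external command used is `aws s3 sync`.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_sync_command(s3_uri: str, local_dir: str) -> List[str]:
    """Return the AWS CLI command that mirrors an S3 prefix into a local directory."""
    return ["aws", "s3", "sync", s3_uri, local_dir]


def sync_periodically(
    s3_uri: str,
    local_dir: str,
    interval_seconds: float = 60.0,
    max_rounds: Optional[int] = None,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeatedly pull checkpoints from S3 so a local TensorBoard can read them.

    max_rounds=None loops until the process is interrupted. runner and sleep
    are injectable so the loop can be exercised without touching AWS.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # check=False: a failed sync is simply retried on the next round.
        runner(build_sync_command(s3_uri, local_dir), check=False)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
    return rounds
```

One way to use it is to call `sync_periodically("s3://my-bucket/my-job/checkpoints", "./tb_logs")` in a background thread (both paths are placeholders) and run `tensorboard --logdir ./tb_logs` in another terminal.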

Here is our implementation in the SageMaker Python SDK: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L29

Are there any specific questions you have about this approach?

djarpin commented on May 13, 2024

Closing this issue for now; feel free to re-open it if you run into more problems with this. Thanks.

ChoiByungWook commented on May 13, 2024

@elgalu, I believe @winstonaws was pointing to https://github.com/aws/sagemaker-python-sdk/blob/8a3dea24f04a81b06df35a1c7aa262f6a1a02bb5/src/sagemaker/tensorflow/estimator.py#L29

The most up-to-date link as of now would be: https://github.com/aws/sagemaker-python-sdk/blob/cecea123d4933baa8998afd138fee3eaf28a8e49/src/sagemaker/tensorflow/estimator.py#L46

Otherwise, if any of those links are out of date, he is referring to the TensorBoard class in estimator.py within src/sagemaker/tensorflow.

elgalu commented on May 13, 2024

from sagemaker.debugger import TensorBoardOutputConfig can also be useful; see https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html#capture-real-time-tensorboard-data-from-the-debugging-hook
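A short sketch of how that class might be set up, assuming the SageMaker Python SDK is installed. The helper names, the bucket, and the job prefix are hypothetical; the two keyword arguments are the ones described on the linked documentation page, and the container path shown is, as far as I know, the documented default.

```python
def tensorboard_s3_path(bucket: str, job_prefix: str) -> str:
    """Build the S3 URI where TensorBoard data should land (naming is an assumption)."""
    return f"s3://{bucket}/{job_prefix.strip('/')}/tensorboard"


def make_tensorboard_config(bucket: str, job_prefix: str):
    """Create a TensorBoardOutputConfig pointing at the URI built above."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from sagemaker.debugger import TensorBoardOutputConfig

    return TensorBoardOutputConfig(
        s3_output_path=tensorboard_s3_path(bucket, job_prefix),
        # Directory inside the training container that gets uploaded to S3.
        container_local_output_path="/opt/ml/output/tensorboard",
    )
```

The returned object would then be passed to an estimator through its tensorboard_output_config argument, and the training script writes its TensorBoard summaries to the container path.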
