
Comments (10)

hnipun commented on June 18, 2024

I assume you're running the server on your local machine. Could you please check the memory/CPU usage while the server is running? The FastAPI server logs you provided indicate that error code 134 might be due to insufficient resources.

from labml.

li1553770945 commented on June 18, 2024

Hello, I am running this on my server over SSH. The CPU is an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, the machine has 128GB of RAM, and I am not running any other resource-intensive applications.

I watched CPU and memory usage with htop, and there appear to be no anomalies in resource usage.


hnipun commented on June 18, 2024

Thanks, could you please verify if MongoDB is installed and running on the default port (27017)? Additionally, could you provide details about the operating system?
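One quick way to run the check asked for above is to try a plain TCP connection to the default MongoDB port. This is only a minimal sketch: the helper name `port_is_open` is made up for illustration, and a successful connection only shows that something is listening on 27017, not that it is a healthy MongoDB instance.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "accepting connections" if port_is_open() else "not reachable"
    print(f"localhost:27017 is {status}")
```

If the port is not reachable, the MongoDB service has probably stopped, even if it was running right after installation.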


li1553770945 commented on June 18, 2024

Sorry, I saw the service in the running state after installing MongoDB and assumed it was working fine, when in fact it had quit unexpectedly for some reason. I restarted MongoDB and it worked.

Thank you for your help, this is now resolved but I still have two minor issues.

1. First, I set the following configuration item in .labml.yaml, based on the PyPI page and the guides:

web_api: 'http://localhost:5005/api/v1/track?'

But it shows:

LABML WARNING: Method Not Allowed 405: http://localhost:5005/api/v1/track?run_uuid=0b405056f0c811ee902521fc9abb02ad&rank=0&world_size=0&labml_version=0.4.168.

My guess is that the versions of the different labml components are incompatible with each other?


2. The second question: if I have completed a training session and the logs folder has not been manually modified since then, can I still view the entire training process in the browser?


hnipun commented on June 18, 2024

Glad that your problem is solved now.

  1. I think you need to update the labml pip package.
  2. All the data needed to view training progress in the browser is kept in a MongoDB database, so you should be able to view the training progress in the browser. It should also be saved in the logs folder.


li1553770945 commented on June 18, 2024

Thank you for your help! I have no further questions and I will close this issue.


li1553770945 commented on June 18, 2024

Thank you for your help.

I have solved my problem, but I have some suggestions. The latest version of labml-nn is currently 0.4.136, which is incompatible with the latest versions of labml (0.5.1) and labml-app (0.5.2). If the user uses pip install labml-app labml labml-nn for installation, it will result in an exception.
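To see which combination of versions is actually installed before debugging a mismatch like this, something along the following lines may help. It is a sketch that uses only the standard library's `importlib.metadata`. The three package names come from the pip command above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip install command above
PACKAGES = ("labml", "labml-app", "labml-nn")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Including this output in a bug report makes it easier to tell which releases were mixed together.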

Also, different versions of labml use different configuration items, e.g. web_api in some versions and app_url in others, and I haven't found complete documentation on which one should be used for each version.

I'm also not sure what URL that configuration item should have for each version. Sometimes it is http://localhost:5005/api/v1/track?, sometimes it is http://localhost:5005/api/v1/default.

Your work has greatly facilitated my development tasks, thank you again!


hnipun commented on June 18, 2024

We have updated the Readme.md with the latest configuration.


li1553770945 commented on June 18, 2024

We have updated the Readme.md with the latest configuration.

Yes, but the instructions in README.md apply to the latest versions of labml (0.5.1) and labml-app (0.5.2). However, the latest version of labml-nn (0.4.136) is not compatible with the latest version of labml.

I am trying to run Switch Transformers, and if I use the latest version of labml-nn together with the latest version of labml, the code reports an error. So I have to use older versions of labml and labml-app, but I don't know where the documentation for these older versions is.


hnipun commented on June 18, 2024

@vpj

We have updated the Readme.md with the latest configuration.

Yes, but the instructions in README.md apply to the latest versions of labml and labml-app. However, the latest version of labml-nn is not compatible with the latest version of labml.

I am trying to run Switch Transformers, and if I use the latest version of labml-nn together with the latest version of labml, the code reports an error. So I have to use older versions of labml and labml-app, but I don't know where the documentation for these older versions is.
