
labmlai / labml


πŸ”Ž Monitor deep learning model training and hardware usage from your mobile phone πŸ“±

Home Page: https://labml.ai

License: MIT License

Python 7.34% Makefile 0.07% Jupyter Notebook 87.05% Shell 0.07% TypeScript 4.86% HTML 0.05% SCSS 0.39% JavaScript 0.17% Jinja 0.01% Cython 0.01%
machine-learning deep-learning pytorch experiment analytics visualization tensorboard mobile keras tensorflow

labml's People

Contributors

adrien1018, dn6, fabvio, hnipun, hnipuncodify, lakshith-403, nmasnadithya, vpj


labml's Issues

Remove git commits/branches check

Hello there!

First of all, thanks for your library; I used it in my recent open source project!

Now, I want to share my criticism.

I have another project where we named our repo's remotes differently from the defaults: bars and upstream, so there is no origin.

So I had this error:

.labml.yml:

check_repo_dirty: false
experiments_path: '.labml'
web_api: 'secret'

Error:

Traceback (most recent call last):
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/src/train.py", line 395, in <module>
    train()
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/src/train.py", line 354, in train
    with experiment.record(name=MODEL_SAVE_NAME, exp_conf=args.__dict__) if args.labml else ExitStack():
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/labml/experiment.py", line 388, in record
    create(name=name,
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/labml/experiment.py", line 86, in create
    _create_experiment(uuid=uuid,
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/labml/internal/experiment/__init__.py", line 511, in create_experiment
    _internal = Experiment(uuid=uuid,
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/labml/internal/experiment/__init__.py", line 225, in __init__
    self.run.repo_remotes = list(repo.remote().urls)
    self.run.repo_remotes = list(repo.remote().urls)
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/git/remote.py", line 553, in urls
    raise ex
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/git/remote.py", line 529, in urls
    remote_details = self.repo.git.remote("get-url", "--all", self.name)
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/git/cmd.py", line 545, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/git/cmd.py", line 1011, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/media/sviperm/9740514d-d8c8-4f3e-afee-16ce6923340c3/sviperm/Documents/Aurora/Aurora.ContextualMistakes/venv/lib/python3.9/site-packages/git/cmd.py", line 828, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git remote get-url --all origin
  stderr: 'fatal: No such remote 'origin''

So my question №1 is: why does labml check git branches/remotes/commits? What is the idea behind this logic? I don't think a training-monitoring library for ML projects needs to do that. If a developer or data scientist wants to track git and prevent training because of uncommitted changes, they can implement that logic themselves.

Question №2: if I set check_repo_dirty: false, why does labml still check the repo? And what is the default value of this parameter?

Two possible suggestions:

  1. Put a condition before the try in labml/internal/experiment/__init__.py to skip all of this git code:
    if self.check_repo_dirty:
        try:
            repo = git.Repo(lab_singleton().path)
            self.run.repo_remotes = list(repo.remote().urls)
            self.run.commit = repo.head.commit.hexsha
            self.run.commit_message = repo.head.commit.message.strip()
            self.run.is_dirty = repo.is_dirty()
            self.run.diff = repo.git.diff()
        except git.InvalidGitRepositoryError:
            if not is_colab() and not is_kaggle():
                labml_notice(["Not a valid git repository: ",
                              (str(lab_singleton().path), Text.value)])
            self.run.commit = 'unknown'
            self.run.commit_message = ''
            self.run.is_dirty = True
            self.run.diff = ''
  2. Or remove all of this git tracking code entirely, or deprecate it.

Thanks!

Feature request: Allow setting listen address on command line & infer URL from request

Currently the app is fixed to listen on 0.0.0.0:5005. It would be great if the bind address and port could be set from the command line (e.g. labml app-server --bind-address=127.0.0.1 --port=5678).

Also, the webpage will always try to fetch data from localhost:5005, making it inconvenient to connect to the server from a non-local machine. The host URL should be inferred from the request instead of being hardcoded.
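A minimal sketch of how such flags might be parsed, using stdlib argparse; --bind-address and --port are the hypothetical options proposed above, not actual labml CLI arguments:

```python
import argparse

def parse_server_args(argv):
    """Parse hypothetical bind-address/port flags for an app server."""
    parser = argparse.ArgumentParser(prog="labml app-server")
    parser.add_argument("--bind-address", default="0.0.0.0",
                        help="interface to listen on")
    parser.add_argument("--port", type=int, default=5005,
                        help="port to listen on")
    return parser.parse_args(argv)

args = parse_server_args(["--bind-address", "127.0.0.1", "--port", "5678"])
print(args.bind_address, args.port)  # 127.0.0.1 5678
```

The defaults preserve today's behavior (0.0.0.0:5005), so adding the flags would be backwards compatible.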

Silent configs

Configs without multiple options and with explicitly specified values should be treated as silent by default. We could have an API to explicitly mark configs as silent or not.

We can treat non-silent configs as hyperparameters on the lab dashboard and when writing Tensorboard HParams.
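The proposed default rule could be sketched as follows; ConfigItem and its fields are hypothetical names for illustration, not the actual labml API:

```python
from dataclasses import dataclass, field

@dataclass
class ConfigItem:
    """Hypothetical config entry: silent unless it has alternative options
    or lacks an explicitly supplied value."""
    name: str
    options: list = field(default_factory=list)  # alternative calculators
    explicit_value: object = None
    silent: bool = None  # None means "decide by the default rule"

    def is_silent(self):
        if self.silent is not None:  # explicit marking wins
            return self.silent
        return not self.options and self.explicit_value is not None

# Non-silent items would be surfaced as hyperparameters.
lr = ConfigItem("learning_rate", options=["adam_lr", "sgd_lr"])
seed = ConfigItem("seed", explicit_value=42)
print(lr.is_silent(), seed.is_silent())  # False True
```

Under this rule, a fixed seed stays quiet while a config with competing options shows up as a hyperparameter, matching the behavior proposed above.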

tensorflow import?

I don't think the tensorflow import in the experiments.pytorch file is necessary - you can write to tensorboard without tensorflow.

Indeed, some of your usages are actually deprecated.

Implement Flash attention stable diffusion problems

I cannot adapt your code for flash attention in stable diffusion. Your arguments for the SpatialTransformers function are different from the CompVis repository, and I don't know whether you use different arguments or just renamed them (for example, there is a "depth" argument that I don't find in your version).

Tracker bug: UnicodeEncodeError: 'charmap' codec can't encode characters

After I stopped the training with tracker and started it again, I see the following error from experiment.record(name=args.experiment_name):

Traceback (most recent call last):
  File "model_combined.py", line 282, in <module>
    with experiment.record(name=args.experiment_name):
  File "C:\Users\miles\anaconda3\envs\idio\lib\site-packages\labml\experiment.py", line 439, in record
    return start()
  File "C:\Users\miles\anaconda3\envs\idio\lib\site-packages\labml\experiment.py", line 278, in start
    return _experiment_singleton().start(run_uuid=_load_run_uuid, checkpoint=_load_checkpoint)
  File "C:\Users\miles\anaconda3\envs\idio\lib\site-packages\labml\internal\experiment\__init__.py", line 463, in start
    self.run.save_info()
  File "C:\Users\miles\anaconda3\envs\idio\lib\site-packages\labml\internal\experiment\experiment_run.py", line 249, in save_info
    f.write(self.diff)
  File "C:\Users\me\anaconda3\envs\idio\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2827-2831: character maps to <undefined>
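The traceback shows f.write(self.diff) failing because, on Windows, open() without an explicit encoding falls back to the legacy locale code page (cp1252 here), which cannot represent every character a git diff may contain. A minimal sketch of the usual fix, passing encoding='utf-8' explicitly (whether labml's experiment_run.py should do this is exactly the point of this report; the path below is illustrative):

```python
import os
import tempfile

# A diff containing characters outside cp1252 / gbk code pages.
text = "git diff with non-latin text: ΞΈ β‰ˆ 0.5 δ½ ε₯½"

path = os.path.join(tempfile.mkdtemp(), "run.diff")

# Without an explicit encoding, Windows may pick cp1252 and raise
# UnicodeEncodeError; forcing UTF-8 makes the write portable.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    assert f.read() == text
```

Setting the PYTHONUTF8=1 environment variable is a user-side workaround that forces UTF-8 mode without code changes.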

502 - Bad Gateway

Hi,
since yesterday, I constantly receive the message '502 Bad Gateway' every time I launch a labml experiment, both from Jupyter notebook and from Colab. Here is an example:

(screenshot: Schermata da 2021-10-11 21-10-26)

Moreover, I get this error from https://app.labml.ai/runs:
(screenshot: Schermata da 2021-10-11 21-04-28)

Is there a problem with your app?

Thanks in advance.

Is this Open Source?

Hi,
The readme says this is Open Source, but where is the MIT license file?

Network error in comparison section

Issue: when runs that were added to the comparison section are then deleted, there is a network error plus a 404 error.


Run in app.labml.ai

How to reproduce: create 2 runs, add one to the other to compare, then delete the run that was added.

Tested in an incognito tab, so this is not a cache/cookies problem.

Columns and DataType Not Explicitly Set on line 133 of build_numpy_cache.py

Hello!

I found an AI-Specific Code smell in your project.
The smell is called: Columns and DataType Not Explicitly Set

You can find more information about it in this paper: https://dl.acm.org/doi/abs/10.1145/3522664.3528620.

According to the paper, the smell is described as follows:

Problem: If the columns are not selected explicitly, it is not easy for developers to know what to expect in the downstream data schema. If the datatype is not set explicitly, it may silently continue to the next step even though the input is unexpected, which may cause errors later. The same applies to other data importing scenarios.
Solution: It is recommended to set the columns and DataType explicitly in data processing.
Impact: Readability

Example:

### Pandas Column Selection
import pandas as pd
df = pd.read_csv('data.csv')
+ df = df[['col1', 'col2', 'col3']]

### Pandas Set DataType
import pandas as pd
- df = pd.read_csv('data.csv')
+ df = pd.read_csv('data.csv', dtype={'col1': 'str', 'col2': 'int', 'col3': 'float'})

You can find the code related to this smell in this link: https://github.com/lab-ml/labml/blob/deea217e6d13d245d32ff904593876e5c56d3528/samples/stocks/build_numpy_cache.py#L123-L143.

I also found instances of this smell in other files, such as:

File: https://github.com/lab-ml/labml/blob/master/helpers/labml_helpers/datasets/csv.py#L16-L26 Line: 21

I hope this information is helpful!

Support for TF 2.0

The previous issue #2 suggested that migration to the TensorFlow 2.0 API was underway. Has it been completed, or is there room to work on it at the moment?

Thanks.

UnicodeEncodeError: 'gbk' codec can't encode character

Hello, first of all thank you for your open source work; I'm trying to learn to use this module recently. But today I got the following error when running the model; it seems like something goes wrong when labml reads the Git commit. What should I do to fix it?


500 Error Issue

Hi,

I see this error "Oops! Something went wrong 500 Seems like we are having issues right now".

I'm also unable to run the app locally. Please advise.

labml app-server gives me: labml: error: argument command: invalid choice: 'app-server' (choose from 'dashboard', 'capture', 'launch', 'monitor', 'service', 'service-run')

Checkpointing optimizers

Hi,
I am working with your framework. First of all, great job. It really saved me from the usual research mess :)
I have some questions about checkpointing. I've seen that each layer is saved in .npy format. However, this does not work for other objects that are based on state_dict, for example optimizers. For long training runs they should be saved with the model, since we don't want to retrain the whole model from scratch. I've looked into your checkpointing strategy here. Do you see any significant problem if, instead of saving all layers in .npy files, we directly save the state_dict?
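The state_dict-based approach described above can be sketched generically; TinyOptimizer is a stand-in assuming the object exposes state_dict()/load_state_dict() the way PyTorch optimizers do, and plain pickle stands in for torch.save:

```python
import pickle

class TinyOptimizer:
    """Stand-in for a torch.optim optimizer: all resumable state in a dict."""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.step_count = 0

    def step(self):
        self.step_count += 1

    def state_dict(self):
        # Everything needed to resume, in one serializable dict.
        return {"lr": self.lr, "step_count": self.step_count}

    def load_state_dict(self, state):
        self.lr = state["lr"]
        self.step_count = state["step_count"]

opt = TinyOptimizer(lr=0.1)
opt.step(); opt.step()

blob = pickle.dumps(opt.state_dict())   # torch.save(...) in practice

restored = TinyOptimizer()
restored.load_state_dict(pickle.loads(blob))
print(restored.lr, restored.step_count)  # 0.1 2
```

Persisting the whole state_dict sidesteps the per-layer .npy layout, at the cost of tying the checkpoint format to the object's own serialization.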

Running issue...

Still updating app.labml.ai, please wait for it to complete...
I cannot visualize anything when I run the Colab example.

Failed to connect server

Hello, when I start the experiment, labml warns: "[WinError 10061] No connection could be made because the target machine actively refused it. Failed to connect: http://localhost:5005/api/v1/track?". Also, I can't use the command labml app-server to start the labml server in Anaconda Prompt, although I have installed the labml-app package.

How can I restart the thread again?

I am trying to use labml in my project. I ran my code and it produced an error. I tried to fix it and ran it again, but the second run produced the following error:


RuntimeError                              Traceback (most recent call last)

<ipython-input-42-c44305bd3335> in <module>()
    428 #
    429 if __name__ == '__main__':
--> 430     main()

10 frames

/usr/lib/python3.7/threading.py in start(self)
    846 
    847         if self._started.is_set():
--> 848             raise RuntimeError("threads can only be started once")
    849         with _active_limbo_lock:
    850             _limbo[self] = self

RuntimeError: threads can only be started once

I think the previous thread is still running!
How can I reset/kill/restart the thread to run again from the starting point?
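The restriction in the traceback above is Python's, not labml's: a threading.Thread object can only be started once, so "restarting" means constructing a fresh Thread (or, in a notebook, restarting the kernel so labml creates its background thread anew). A minimal stdlib sketch:

```python
import threading

def worker():
    pass  # stand-in for a background tracking loop

t = threading.Thread(target=worker)
t.start()
t.join()

# Calling start() on the same Thread object again raises RuntimeError.
try:
    t.start()
except RuntimeError as e:
    print(e)  # threads can only be started once

# The fix is to create a brand-new Thread object instead.
t2 = threading.Thread(target=worker)
t2.start()
t2.join()
```

In a Colab session this is why re-running the cell fails: the old thread object survives in the interpreter, so a kernel restart (or having the library recreate its thread) is needed.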

Hardware Naming In Monitor

I'm trying to monitor multiple machines' usage. Their names in the dashboard are always My Computer. It seems there is no option in configs.yaml for naming a machine.
