baguasys / bagua
Bagua Speeds up PyTorch
Home Page: https://tutorials.baguasys.com/
License: MIT License
but we should support --no_python
Also, we need the option to disable creating a log file.
bagua/bagua/torch_api/distributed.py
Lines 124 to 129 in e4ee5ee
TODO comment in e4ee5ee. It's been assigned to @shjwudp because they committed the code.
List[BaguaTensor]
BaguaBucket
Test script https://github.com/BaguaSys/examples/tree/main/benchmark
All rsp.json() calls should handle network failure.
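One way to approach this is a small wrapper that retries the request and returns None when the network or the JSON decoding fails. This is only a sketch: json_or_none, its send argument, and the retry parameters are hypothetical names, not part of Bagua. It assumes the failures of interest surface as OSError or ValueError, which as far as I know covers the connection and JSON-decoding errors raised by the requests library.

```python
import logging
import time
from typing import Any, Callable, Optional


def json_or_none(
    send: Callable[[], Any],
    retries: int = 3,
    backoff_s: float = 0.0,
) -> Optional[Any]:
    """Call send() and decode its JSON body, retrying on failure.

    send is a zero-argument callable returning a response-like object
    with a .json() method (hypothetical interface for this sketch).
    Returns None when every attempt fails.
    """
    for attempt in range(retries):
        try:
            rsp = send()
            return rsp.json()
        except (OSError, ValueError) as exc:
            # OSError: connection-level failures; ValueError: invalid JSON body.
            logging.warning("request attempt %d failed: %s", attempt + 1, exc)
            time.sleep(backoff_s * (attempt + 1))
    return None
```

A caller would pass something like lambda: session.post(url, json=payload) as send and treat a None result as "service unavailable" instead of crashing.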
For better collaboration.
remove some log info in bagua.
INTERNAL:EXEMPT_ISSUE_CHECKER
We need to solve the optimizer OOM that occurs when param storage changes.
Currently the autotune service relies on the order of tensor registration to do bucketing.
We need to add support for reporting tensor completion order so that the autotune service does not rely on tensor registration order
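A rough sketch of the idea, not Bagua's actual implementation: record the order in which tensors report completion during one iteration, then bucket by that order instead of by registration order. CompletionOrderRecorder, bucket_by_completion, and the byte-size map are all hypothetical names used for illustration.

```python
from typing import Dict, List, Set


class CompletionOrderRecorder:
    """Records tensor names in the order they first report completion."""

    def __init__(self) -> None:
        self._order: List[str] = []
        self._seen: Set[str] = set()

    def mark_ready(self, name: str) -> None:
        # Only the first completion per tensor counts within one iteration.
        if name not in self._seen:
            self._seen.add(name)
            self._order.append(name)

    def order(self) -> List[str]:
        return list(self._order)


def bucket_by_completion(
    order: List[str], sizes: Dict[str, int], bucket_bytes: int
) -> List[List[str]]:
    """Greedily pack tensors into buckets, following completion order."""
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name in order:
        # Start a new bucket when adding this tensor would exceed the limit.
        if current and current_bytes + sizes[name] > bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += sizes[name]
    if current:
        buckets.append(current)
    return buckets
```

With this shape, the client would send recorder.order() to the autotune service, which could bucket tensors that finish close together regardless of how they were registered.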
For better bug discovery.
2021-06-15T02:42:53.000561022Z stdout * Environment: production
2021-06-15T02:42:53.000607172Z stdout WARNING: This is a development server. Do not use it in a production deployment.
2021-06-15T02:42:53.00061242Z stdout Use a production WSGI server instead.
We need to fix some type mismatch in the current implementation:
autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
Expected: (self, val, space_dimension: Tuple[int, int])
Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 173, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
Expected: (self, val, space_dimension: Tuple[int, int])
Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 180, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
Expected: (self, val, space_dimension: Tuple[int, int])
Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 187, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
Expected: (self, val, space_dimension: Tuple[int, int])
Actually passed: (self, val, space_dimension: List[int])
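The fix is probably just to pass a two-element tuple instead of a list at each call site. The sketch below uses a hypothetical stand-in for IntParam that copies only the signature shown in the error message; the real class in bagua/autotune may differ. Unpacking into (low, high) is used rather than tuple(bounds), since tuple() on a list is typically inferred as Tuple[int, ...] and may still be rejected by the checker.

```python
from typing import List, Tuple


class IntParam:
    """Hypothetical stand-in with the signature from the pytype error."""

    def __init__(self, val: int, space_dimension: Tuple[int, int]) -> None:
        self.val = val
        self.space_dimension = space_dimension


# Before: a list literal, which does not match Tuple[int, int].
bounds: List[int] = [1, 64]

# After: unpack into a fixed-length tuple so the annotation is satisfied.
low, high = bounds
param = IntParam(val=8, space_dimension=(low, high))
```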
Currently there are only broadcast and allreduce: https://bagua.readthedocs.io/en/latest/autoapi/bagua/torch_api/communication/index.html
bagua/bagua/torch_api/distributed.py
Lines 125 to 130 in 6f62a84
TODO comment in 6f62a84. It's been assigned to @NOBLES5E because they committed the code.
autotune_client.register_models should be autotune_client.register_tensors
parameter_groups should be tensor_groups
response["is_autotune_processing"]: rename this to is_autotune_completed
For example some models in https://huggingface.co/transformers/pretrained_models.html
We need an experiment to confirm whether we should change message_size to 40M.
bagua/bagua/torch_api/__init__.py
Lines 346 to 347 in 3a89e12
In order to be consistent with the PyTorch 1.9 launcher.