
bagua's People

Contributors

dependabot[bot], ganshaoduo, github-actions[bot], jliu87, liuhatry, marisakirisame, nobles5e, shjwudp, tengxu-sun, wangraying, woqidaideshi, youhe-jiang, zhangce

bagua's Issues

@shjwudp add support for reporting tensor completion order so that the autotune ...

# TODO: @shjwudp add support for reporting tensor completion order so that the autotune service does not
# rely on tensor registration order
from bagua.torch_api.communication import get_bagua_hyperparameters
self._bagua_autotune_client.report_metrics(
rank=get_rank(),


This issue was generated by todo based on a TODO comment in 96cb6fe when #24 was merged. cc @BaguaSys.
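
One way to picture what "reporting tensor completion order" could involve is a small recorder that notes each tensor the first time it becomes ready during an iteration, then hands the resulting list to the metrics report. The sketch below is illustrative only: `TensorReadyOrderRecorder`, `mark_ready`, `drain`, and the tensor names are hypothetical and not taken from the bagua codebase.

```python
from typing import List, Set


class TensorReadyOrderRecorder:
    """Hypothetical helper: records the order in which tensors become ready."""

    def __init__(self) -> None:
        self._order: List[str] = []
        self._seen: Set[str] = set()

    def mark_ready(self, tensor_name: str) -> None:
        # Intended to be called from a per-tensor completion callback,
        # for example a backward hook. Repeated calls are ignored.
        if tensor_name not in self._seen:
            self._seen.add(tensor_name)
            self._order.append(tensor_name)

    def drain(self) -> List[str]:
        # Return the recorded order and reset state for the next iteration.
        order, self._order = self._order, []
        self._seen.clear()
        return order


recorder = TensorReadyOrderRecorder()
# Made-up completion events; "fc1.weight" fires twice on purpose.
for name in ["fc2.weight", "fc1.weight", "fc1.weight", "embed.weight"]:
    recorder.mark_ready(name)
tensor_ready_order = recorder.drain()
```

A list like `tensor_ready_order` could then be sent alongside the other metrics, so the service would not need to infer ordering from registration order.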

Python refactor tracking issue

  • bucketing
    • bucketing function
      • input: List[BaguaTensor]
      • output: BaguaBucket
    • Bucket class
      • underlying storage
      • list of tensors
    • get from and push to autotune(hyperparameter) server
  • register backward hook function
    • register_backward_hook_for_all_parameters(model, callback: lambda)
  • algorithm classes
  • optimizer
    • optimizer sharding function
      • input: original optimizer, List[BaguaBucket]
      • output: List[optimizer]
  • algorithms
    • gradient allreduce
    • 1-bit Adam
    • lower precision gradient
    • decentralized
    • low precision decentralized
    • async allreduce
    • sharded async

Test script https://github.com/BaguaSys/examples/tree/main/benchmark
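
The bucketing items in the outline above (a function from `List[BaguaTensor]` to `BaguaBucket`, and a bucket holding underlying storage plus a list of tensors) can be sketched in plain Python. This is a toy illustration, not the project's implementation: the fields `name`, `values`, `storage`, `offsets` and the helpers `view` and `bucketize` are assumptions, and plain lists of floats stand in for real tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BaguaTensor:
    # Stand-in for the real tensor type: a name and a flat list of values.
    name: str
    values: List[float]


class BaguaBucket:
    """Toy bucket: one flat storage list plus the tensors placed in it."""

    def __init__(self, tensors: List[BaguaTensor]) -> None:
        self.tensors: List[BaguaTensor] = list(tensors)
        self.storage: List[float] = []  # underlying flat storage
        self.offsets: Dict[str, Tuple[int, int]] = {}
        for t in self.tensors:
            start = len(self.storage)
            self.storage.extend(t.values)
            self.offsets[t.name] = (start, len(self.storage))

    def view(self, name: str) -> List[float]:
        # Return a copy of the slice of storage that belongs to one tensor.
        start, end = self.offsets[name]
        return self.storage[start:end]


def bucketize(tensors: List[BaguaTensor]) -> BaguaBucket:
    # Matches the outline's shape: a list of tensors in, one bucket out.
    return BaguaBucket(tensors)


bucket = bucketize([BaguaTensor("a", [1.0, 2.0]), BaguaTensor("b", [3.0])])
```

Keeping per-tensor offsets next to the flat storage is one design choice that lets a communication step operate on the whole buffer while individual tensors remain addressable.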

remove parameter group logic

] # TODO: remove parameter group logic
)
self._bagua_autotune_client.register_models( # TODO: @shjwudp rename to register tensors
autotune_tensor_list, bagua_tensor_group_info
).json() # TODO: @shjwudp error check

With bagua-core 0.3, we no longer need to flatten all at once. Now we can remove parameter groups.


This issue was generated by todo based on a TODO comment in 96cb6fe when #24 was merged. cc @BaguaSys.

@shjwudp

req: dict = request.get_json(force=True) # TODO: @shjwudp
tensor_ready_order: list = req["tensor_ready_order"]
communication_time_ms: float = req["communication_time_ms"]
hyperparameters: dict = req["hyperparameters"]


This issue was generated by todo based on a TODO comment in 96cb6fe when #24 was merged. cc @BaguaSys.
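
The handler fragment above indexes the request dictionary directly, so a missing key would surface as a `KeyError`. The TODO does not say what is outstanding; the following is only a sketch of explicit field checks, with `REQUIRED_FIELDS` and `validate_report` as hypothetical names and the field names copied from the fragment.

```python
from typing import Any, Dict

# Field names come from the handler fragment; the accepted types are assumptions.
REQUIRED_FIELDS = {
    "tensor_ready_order": list,
    "communication_time_ms": (int, float),
    "hyperparameters": dict,
}


def validate_report(req: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError naming the first missing or mistyped field."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in req:
            raise ValueError(f"missing field: {name}")
        if not isinstance(req[name], expected):
            raise ValueError(f"wrong type for field: {name}")
    return req


report = validate_report(
    {
        "tensor_ready_order": ["fc2.weight", "fc1.weight"],
        "communication_time_ms": 12.5,
        "hyperparameters": {},
    }
)
```

A service could translate the `ValueError` into a 4xx response instead of letting the request fail with an unhandled exception.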

Fix service development warning

2021-06-15T02:42:53.000561022Z stdout  * Environment: production
2021-06-15T02:42:53.000607172Z stdout    WARNING: This is a development server. Do not use it in a production deployment.
2021-06-15T02:42:53.00061242Z stdout    Use a production WSGI server instead.
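
This warning is printed by Flask's built-in development server. A common remedy is to serve the same WSGI application from a production WSGI server instead. The sketch below assumes the third-party `waitress` package and a hypothetical port. The small `app` callable is a stand-in for the autotune service's application, which is not shown in this issue.

```python
import json


def app(environ, start_response):
    # Minimal WSGI callable standing in for the service's application.
    body = json.dumps({"status": "ok"}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def run_production_server(port: int = 8123) -> None:
    # waitress is a third-party production WSGI server (pip install waitress).
    # The port number here is an arbitrary placeholder.
    from waitress import serve

    serve(app, host="0.0.0.0", port=port)
```

A Flask application object is itself a WSGI callable, so it could be passed to `serve` in place of `app`.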

autotune service: fix typing

We need to fix some type mismatches in the current implementation:

autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
         Expected: (self, val, space_dimension: Tuple[int, int])
  Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 173, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
         Expected: (self, val, space_dimension: Tuple[int, int])
  Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 180, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
         Expected: (self, val, space_dimension: Tuple[int, int])
  Actually passed: (self, val, space_dimension: List[int])
File "/github/workspace/bagua/autotune/__init__.py", line 187, in autotune_system_hyperparameters: Function IntParam.__init__ was called with the wrong arguments [wrong-arg-types]
         Expected: (self, val, space_dimension: Tuple[int, int])
  Actually passed: (self, val, space_dimension: List[int])
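
The checker output says `IntParam.__init__` expects `space_dimension: Tuple[int, int]` but receives a `List[int]`. One straightforward fix is to pass a two-element tuple literal at the call sites. The class below is a stand-in that only mirrors the reported signature; the variable name and the values are made up.

```python
from typing import Tuple


class IntParam:
    # Stand-in mirroring the signature shown in the checker output.
    def __init__(self, val: int, space_dimension: Tuple[int, int]) -> None:
        self.val = val
        self.space_dimension = space_dimension


# Flagged form (passes a List[int]):
#     IntParam(4, space_dimension=[1, 10])
# Accepted form: a two-element tuple literal matches Tuple[int, int].
example_param = IntParam(4, space_dimension=(1, 10))
```

The alternative would be to widen the annotation on `IntParam.__init__`; which of the two is preferable depends on how `space_dimension` is used inside the class.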

@shjwudp add support for reporting tensor completion order so that the autotune ...

# TODO: @shjwudp add support for reporting tensor completion order so that the autotune service does not
# rely on tensor registration order
from bagua.torch_api.communication import get_bagua_hyperparameters
self._bagua_autotune_client.report_metrics(
rank=get_rank(),


This issue was generated by todo based on a TODO comment in 6f62a84. It's been assigned to @NOBLES5E because they committed the code.

autotune service: naming

  • autotune_client.register_models should be autotune_client.register_tensors
  • parameter_groups should be tensor_groups
  • response["is_autotune_processing"] should be renamed to is_autotune_completed
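
If callers need time to migrate, the method rename could be staged by keeping the old name as a thin deprecated alias. The sketch below uses a hypothetical `AutotuneClient` stand-in with made-up return values, since the real client's body is not shown here.

```python
import warnings
from typing import Any, Dict, List


class AutotuneClient:
    # Hypothetical stand-in; the real client talks to the autotune service.
    def register_tensors(
        self, tensor_list: List[Any], tensor_groups: List[Any]
    ) -> Dict[str, int]:
        return {"tensors": len(tensor_list), "tensor_groups": len(tensor_groups)}

    def register_models(
        self, tensor_list: List[Any], parameter_groups: List[Any]
    ) -> Dict[str, int]:
        # Old name kept temporarily; it warns and forwards to the new one.
        warnings.warn(
            "register_models is deprecated; use register_tensors",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.register_tensors(tensor_list, parameter_groups)
```

Renaming the response key is less mechanical: if `is_autotune_processing` currently means "still running", a key named `is_autotune_completed` would presumably need to carry the negated value.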
