tensorflow-rematerialization's Introduction

TensorFlow XLA Rematerialization

This project provides an interface for using rematerialization in the TensorFlow XLA compiler.

Installing

This patch adds rematerialization support to the TensorFlow 2.0 XLA branch. To install it, execute the following commands:

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout 64c3d382cadf7bbe8e7e99884bede8284ff67f56
git clone https://github.com/microsoft/tensorflow-rematerialization
git apply -v tensorflow-rematerialization/remat.patch

After cloning TensorFlow and applying the remat patch, you can compile and install it by following the TensorFlow documentation: https://www.tensorflow.org/install/source
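
The build itself can take a while and the exact steps depend on your environment. As a rough sketch, assuming Bazel and the Python build dependencies from the linked guide are already installed, building and installing from inside the tensorflow directory looks something like:

# answer the configuration prompts (CUDA support, paths, etc.)
./configure
bazel build //tensorflow/tools/pip_package:build_pip_package
# package the build into a wheel and install it
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl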

Using

You can control rematerialization in XLA by passing flags through the XLA_FLAGS environment variable. The available flags are:

• --xla_use_hlo_rematerialization 
		○ This flag enables the rematerialization pass to run after all optimizations, just before code emission. It is required for any of the following flags to take effect.
		
• --xla_rematerialization_mem_limit={NUMBER} 
		○ This flag sets the memory budget for your application.
    The remat heuristic will try to fit the model within that amount of memory.
    If the memory budget is larger than the model size, no rematerialization is applied. The parameter {NUMBER} is in bytes; for example, a 1 GiB budget is 1024^3 = 1073741824.
		
• --xla_rematerialization_scheduler={SCHEDULER_NAME}
		○ This flag selects which scheduler to run just before rematerialization. The resulting schedule can affect the performance of the heuristics (see the sketch after this list).
    Four options are available:
      (1) if no scheduler is set, the default TF behavior is to apply the three schedulers below and select the best one in terms of performance (not considering remat here);
      (2) "postorder";
      (3) "DFS"; and
      (4) "list".
		
• --xla_rematerialization_algorithm={HEURISTIC_NAME}
		○ This flag sets which heuristic to use for rematerialization (see the sketch after this list).
    Four options are available:
      (1) "standard", which uses the original rematerialization implementation found inside the TF source tree;
      (2) "compress", which tries to reorder the dimensions of tensors in order to reduce their representation size (this implementation is also inside TF, but it is not strictly rematerialization);
      (3) "standardcompress", which applies both the standard remat and the compress technique; and, finally,
      (4) "path", our approach to remat, which recursively tries to rematerialize paths and then derematerialize articulation operations while staying within the memory budget.
		
• --xla_rematerialization_small_node_limit={NUMBER}
		○ This flag sets the minimum size an HLO node must have to be considered for rematerialization. The expected {NUMBER} is in MiB, and 0 disables this feature. A limit of 1 MiB was shown to significantly improve the performance of both the standard and path heuristics.
		
• --xla_rematerialization_disable_cuda
		○ This flag disables part of the CUDA fusions in the HLO graph, making it easier to apply rematerialization since fewer side-effect nodes will exist.
    
• --xla_rematerialization_dump_dot
		○ Dumps a dot graph of the HLO before and after the rematerialization.
   
• --xla_rematerialization_dump_memlog
		○ Dumps a log of memory use and rematerialization decisions for each HLO instruction.
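
For instance, choosing the schedule and heuristic explicitly (the sketch referenced in the flag descriptions above) might look like the following; train.py is a placeholder for your own script:

# hypothetical run selecting the "list" scheduler and the "standard" heuristic
TF_XLA_FLAGS="--tf_xla_auto_jit=2" XLA_FLAGS="--xla_use_hlo_rematerialization --xla_rematerialization_mem_limit=1073741824 --xla_rematerialization_scheduler=list --xla_rematerialization_algorithm=standard" python train.py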

Note: TF_XLA_FLAGS="--tf_xla_auto_jit=2" needs to be set to activate the XLA compiler.

Example:

TF_XLA_FLAGS="--tf_xla_auto_jit=2" XLA_FLAGS="--xla_dump_to=dump --xla_dump_hlo_as_text --xla_use_hlo_rematerialization --xla_rematerialization_mem_limit=1073741824 --xla_rematerialization_algorithm=path --xla_rematerialization_small_node_limit=1" python resnet_cifar_main.py
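
The dump flags are useful for checking what the pass actually did. A minimal variant of the command above that writes the dot graphs and the memory log (again with train.py as a placeholder for your own script):

# dumps HLO dot graphs and a memory/decision log for inspection
TF_XLA_FLAGS="--tf_xla_auto_jit=2" XLA_FLAGS="--xla_use_hlo_rematerialization --xla_rematerialization_mem_limit=1073741824 --xla_rematerialization_dump_dot --xla_rematerialization_dump_memlog" python train.py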

Results

Using rematerialization in TensorFlow XLA can help reduce the amount of memory necessary to run a model, allowing it to fit in your accelerator.

[Figure: memory use before and after XLA rematerialization]

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


tensorflow-rematerialization's Issues

Hope for some speed benchmarks corresponding to the memory ones

Thank you for your nice work on XLA that could significantly reduce memory usage!

To my understanding, rematerialization in XLA is a trade-off between memory usage and speed (reading a value from memory is usually faster than recomputing it). I wonder if you could provide some information about the speed loss (if any) of each remat algorithm relative to the amount of memory saved. Thank you :).
