oslo's Introduction

OSLO: Open Source for Large-scale Optimization

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization technologies for large-scale modeling. Its key features, such as 3D parallelism and kernel fusion, are useful when training large models. OSLO makes these technologies easy to use through seamless compatibility with Hugging Face Transformers, which is considered a de facto standard in the NLP field. By significantly lowering the difficulty of using these technologies, we hope OSLO helps democratize large-scale modeling.

Installation

OSLO can be easily installed with pip. Note that the PyPI project name includes 'core': the package is oslo-core, not oslo.

pip install oslo-core

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {},
  title        = {OSLO: Open Source for Large-scale Optimization},
  howpublished = {\url{https://github.com/EleutherAI/oslo}},
  year         = {2022},
}

Licensing

The OSLO code is licensed under the terms of the Apache License 2.0.

Copyright 2022 EleutherAI. All Rights Reserved.

oslo's People

Contributors

bzantium, cozytk, dongs0104, dongsungkim, dtuname, erichallahan, gimmaru, github-actions[bot], hmy831004, hyunwoongko, ingyuseong, jason9693, jayten-jeon, jinmang2, jinok2im, jinwonkim93, josemlopez, kkieek, l-yohai, loopinf, micpie, minqukanq, mistobaan, ohwi, quentin-anthony, reniew, scsc0511, singleheart, tree-park, yhna940

oslo's Issues

Add DistributedSampler to DDP

Describe a TODO feature

  • The current test code uses a plain DataLoader, so every process receives the same (duplicated) data. We need to switch to DistributedSampler for the DDP case.
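DistributedSampler gives each rank a disjoint slice of the dataset. The following is a minimal pure-Python sketch of the default, non-shuffled split that torch.utils.data.distributed.DistributedSampler performs with drop_last=False: pad the index list by repeating leading indices until its length is a multiple of the world size, then let each rank take every num_replicas-th index starting at its own rank. The function name shard_indices is hypothetical and only used for illustration.

```python
import math
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Indices one rank would see (hypothetical helper mirroring DistributedSampler, shuffle=False)."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    if dataset_len == 0:
        return []
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by repeating leading indices so every rank gets the same number of samples.
    padding = total_size - dataset_len
    if padding > 0:
        indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Strided split: rank r takes indices r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    for r in range(2):
        print(r, shard_indices(10, num_replicas=2, rank=r))
    # 0 [0, 2, 4, 6, 8]
    # 1 [1, 3, 5, 7, 9]
```

In the test itself, the real sampler would be passed as DataLoader(dataset, sampler=DistributedSampler(dataset)), with sampler.set_epoch(epoch) called at the start of each epoch when shuffling is enabled.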

Syntax error in mappings_utils.py when installing OSLO

How to reproduce

python setup.py install

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
Extracting oslo_core-3.0.0-py3.7.egg to /opt/conda/lib/python3.7/site-packages
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/transformers/mapping_utils.py", line 141
    OPT=[
       ^
SyntaxError: invalid syntax


I changed it to the following to continue my tests:

       "OPT": [
            Column("q_proj", "k_proj", "v_proj", "fc1"),
            Row("out_proj", "fc2"),
            Update("embed_dim", "num_heads"),
            Head("lm_head", "score"),
        ]
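The error presumably arises because OPT=[...] is keyword-argument syntax, which is valid inside a call such as dict(...) but not inside a {...} literal, where each key must be an expression followed by a colon. A minimal check with Python's built-in compile(), using toy mapping strings rather than the real contents of mapping_utils.py:

```python
def parses(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<mapping>", "eval")
        return True
    except SyntaxError:
        return False


# Toy stand-ins for the mapping (entries abbreviated to empty lists).
broken = '{"GPT2": [], OPT=[]}'      # keyword syntax inside a dict literal
fixed = '{"GPT2": [], "OPT": []}'    # quoted key, as in the fix above
call_style = "dict(GPT2=[], OPT=[])"  # keyword syntax is fine inside a call

print(parses(broken), parses(fixed), parses(call_style))  # False True True
```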

SP parameter device type error

How to reproduce

Environment

  • OS : CentOS 7.9
  • Python version : 3.7
  • Transformers version : 4.21.3
  • Whether to use Docker:
  • Misc.:

Description

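from transformers import GPT2Config, GPT2LMHeadModel

# configs, parallel_context and SequenceDataParallel come from the OSLO test setup.
model_no_sp = GPT2LMHeadModel(GPT2Config.from_pretrained(configs["model_name"])).cuda()
# model_sp is left on the CPU before being wrapped, which triggers the device check.
model_sp = GPT2LMHeadModel(GPT2Config.from_pretrained(configs["model_name"]))

model_sp = SequenceDataParallel(
    model_sp,
    parallel_context=parallel_context,
)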

The error is raised in the __init__ of _DistributedDataParallel because the parameters are on the CPU rather than the GPU.

  • The parameter device_type check in _DistributedDataParallel needs to be removed.

'TrainingArguments' object has no attribute 'parallel_mode' when running mBart test

How to reproduce

python ./tests/transformers/models/mbart/test_training.py

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
python ./tests/transformers/models/mbart/test_training.py Reusing dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|███████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 682.67it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 68/68 [00:01<00:00, 52.15ba/s]
100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 43.15ba/s]
100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 42.94ba/s]
You are using a model of type bart to instantiate a model of type mbart. This is not supported for all configurations of models and can yield errors.
Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['encoder.layer_norm.bias', 'decoder.layer_norm.weight', 'encoder.layer_norm.weight', 'decoder.layer_norm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using a model of type bart to instantiate a model of type mbart. This is not supported for all configurations of models and can yield errors.
Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['encoder.layer_norm.bias', 'decoder.layer_norm.weight', 'encoder.layer_norm.weight', 'decoder.layer_norm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PyTorch: setting up devices
Traceback (most recent call last):
  File "./tests/transformers/models/mbart/test_training.py", line 94, in <module>
    fp16=False,
  File "./tests/transformers/models/mbart/test_training.py", line 44, in train
    eval_dataset=dataset["validation"],
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/transformers/trainer.py", line 186, in __init__
    if len(args.parallel_mode) > 0:
AttributeError: 'TrainingArguments' object has no attribute 'parallel_mode'

The problem seems to be that the parallel_mode property in training_args.py is commented out (line 989):

# @property
# def parallel_mode(self):
#     """
#     The current mode used for parallelism if multiple GPUs/TPU cores are available. One of:
#
#     - ParallelMode.NOT_PARALLEL: no parallelism (CPU or one GPU).
#     - ParallelMode.NOT_DISTRIBUTED: several GPUs in one single process (uses torch.nn.DataParallel).
#     - ParallelMode.DISTRIBUTED: several GPUs, each having its own process (uses
#       torch.nn.DistributedDataParallel).
#     - ParallelMode.TPU: several TPU cores.
#     """
#     # if is_torch_tpu_available():
#     #     return ParallelMode.TPU
#     # elif is_sagemaker_mp_enabled():
#     #     return ParallelMode.SAGEMAKER_MODEL_PARALLEL
#     # elif is_sagemaker_dp_enabled():
#     #     return ParallelMode.SAGEMAKER_DATA_PARALLEL
#     if self.local_rank != -1:
#         return ParallelMode.DISTRIBUTED
#     elif self.n_gpu > 1:
#         return ParallelMode.NOT_DISTRIBUTED
#     else:
#         return ParallelMode.NOT_PARALLEL
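The traceback shows the OSLO trainer evaluating len(args.parallel_mode) > 0, so whatever attribute is restored there appears to need to support len(). Until that is settled, the check could be made defensive. A minimal sketch, with has_parallel_mode as a hypothetical helper name and SimpleNamespace standing in for the arguments object:

```python
from types import SimpleNamespace


def has_parallel_mode(args) -> bool:
    """Hypothetical guard: True only if `args` defines a non-empty parallel_mode."""
    # getattr with a default avoids the AttributeError seen in the traceback.
    modes = getattr(args, "parallel_mode", None)
    return modes is not None and len(modes) > 0


print(has_parallel_mode(SimpleNamespace()))                          # False
print(has_parallel_mode(SimpleNamespace(parallel_mode=[])))          # False
print(has_parallel_mode(SimpleNamespace(parallel_mode=["tensor"])))  # True
```

This only hides the symptom; the underlying fix is still to define parallel_mode on the arguments object the OSLO trainer receives.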

TODO: Redesign ZeRO modules

Describe a TODO feature

  • We currently use a copy of FairScale's code to implement ZeRO. We need to analyze this code further and refactor it into a functional design.

Error when installing OSLO

How to reproduce

python setup.py install

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
# python setup.py install
Traceback (most recent call last):
  File "setup.py", line 18, in <module>
    long_description=open("README.md").read(),
FileNotFoundError: [Errno 2] No such file or directory: 'README.md'

Workaround applied locally: add a README.md file.
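An alternative to committing a placeholder README is to make setup.py tolerate a missing file. A minimal sketch with a hypothetical helper name, which also resolves the path relative to setup.py rather than the current working directory:

```python
from pathlib import Path
from typing import Optional


def read_long_description(
    filename: str = "README.md", base_dir: Optional[str] = None, default: str = ""
) -> str:
    """Read the README for setup(long_description=...), or return `default` if it is missing."""
    # Resolve relative to the directory containing setup.py, not the caller's cwd.
    root = Path(base_dir) if base_dir is not None else Path(__file__).resolve().parent
    readme = root / filename
    if readme.is_file():
        return readme.read_text(encoding="utf-8")
    return default
```

In setup.py this would replace open("README.md").read() with long_description=read_long_description().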

Add mapping for OSLO models

Describe a TODO feature

  • Add mappings for OSLO models so that the vocab-parallel cross entropy loss can be tested.

fused_bias_gelu is missing when calling BertModel

How to reproduce

python test_modeling_bert.py

Environment

  • OS : Ubuntu

  • Python version : 3.7.14

  • Transformers version : 4.22.1

  • Whether to use Docker: No

  • Misc.:
    This bug was introduced by #30, which removed all fused kernels.

  • BERT and RoBERTa still use onn.fused_bias_gelu.

Error message

AttributeError: module 'oslo.torch.nn' has no attribute 'fused_bias_gelu'
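For reference while the fused kernel is missing, the operation its name describes is GeLU applied to x + bias. The sketch below is an unfused, pure-Python version that assumes the tanh approximation used by Megatron-style bias_gelu kernels; whether OSLO's removed kernel used exactly this approximation is an assumption, and bias_gelu_reference is a hypothetical name.

```python
import math
from typing import List

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)  # about 0.7978845608


def bias_gelu_reference(x: List[float], bias: List[float]) -> List[float]:
    """Unfused gelu(x + bias) with the tanh approximation (hypothetical helper)."""
    out = []
    for xi, bi in zip(x, bias):
        y = xi + bi
        out.append(0.5 * y * (1.0 + math.tanh(SQRT_2_OVER_PI * (y + 0.044715 * y ** 3))))
    return out


print(bias_gelu_reference([1.0, -1.0, 0.5], [0.0, 0.0, -0.5]))
```

In PyTorch 1.12 or later, torch.nn.functional.gelu(x + bias, approximate="tanh") computes the same expression on tensors without a custom kernel.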

Implement vocab parallel cross entropy loss

Describe a requested feature

  • Implement cross entropy for vocab-parallel logits in tensor parallel 1D, 2D, 2p5D, and 3D.
  • Implement test codes.

Expected behavior

>>> criterion = VocabParallelCrossEntropyLoss(parallel_context)
>>> loss = criterion(vocab_parallel_logits, targets)
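A loss over vocab-parallel logits can be computed without gathering the full vocabulary on any rank. The usual Megatron-style decomposition needs three reductions: a max of the logits, a sum of exp(logit - max), and the target logit, which only the rank owning that slice of the vocabulary contributes. The pure-Python sketch below simulates the ranks as a list of logit shards for a single token, with plain max() and sum() standing in for the all-reduce calls; the function names are hypothetical and this is not OSLO's VocabParallelCrossEntropyLoss.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Reference: -log softmax(logits)[target] on the full, unsharded vocabulary."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def vocab_parallel_cross_entropy(shards: List[List[float]], target: int) -> float:
    """Same loss when each shard holds one rank's contiguous slice of the vocabulary.

    max() and sum() over shards stand in for all-reduce(MAX) and all-reduce(SUM).
    """
    # 1. Global max for numerical stability (all-reduce MAX of per-rank maxima).
    global_max = max(max(shard) for shard in shards)
    # 2. Global sum of exp(logit - max) (all-reduce SUM of per-rank partial sums).
    sum_exp = sum(sum(math.exp(v - global_max) for v in shard) for shard in shards)
    # 3. Target logit: only the rank whose vocab range contains `target` contributes
    #    a non-zero value; the others contribute 0 (all-reduce SUM).
    target_logit = 0.0
    start = 0
    for shard in shards:
        end = start + len(shard)
        if start <= target < end:
            target_logit += shard[target - start]
        start = end
    return global_max + math.log(sum_exp) - target_logit


logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0]
shards = [logits[:3], logits[3:]]  # two simulated tensor-parallel ranks
for t in range(len(logits)):
    assert abs(cross_entropy(logits, t) - vocab_parallel_cross_entropy(shards, t)) < 1e-9
print("sharded loss matches the full-vocabulary loss")
```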

TODO: Fix pipeline parallelism bugs

Describe a TODO feature

  • Currently, when pipeline parallelism is run on a large model, the resulting gradient values differ from the expected ones. This issue should be addressed.

Milestone: OSLO 2.1

We will release OSLO 2.1 at this milestone, before pipeline parallelism is complete, and will then integrate the features available in version 2.1 into Hugging Face Transformers. cc @stas00

Fix _FullyShardedDataParallelMapping when running test_fsdp.py

How to reproduce

python -m torch.distributed.run --nproc_per_node=2 --master_port=2333 ./tests/torch/nn/parallel/data_parallel/test_fsdp.py

Environment

  • OS :
  • Python version :
  • Transformers version :
  • Whether to use Docker:
  • Misc.:

The problem is in _fsdp, where _FullyShardedDataParallelMappingForHuggingFace is used instead of _FullyShardedDataParallelMapping:

from oslo.transformers.mapping_utils import (
    _FullyShardedDataParallelMappingForHuggingFace,
)

FSDP returns different loss values with ZeRO stages 2 and 3

How to reproduce

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nnodes=1 --nproc_per_node=2  ./tests/torch/nn/parallel/data_parallel/test_fsdp.py --zero-stage 2

Environment

  • OS : ubuntu18.04
  • Python version : python3.7
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:

PatrickStar for ZeRO

Describe a TODO feature

  • Add the PatrickStar chunk manager to the ZeRO case.

No module named oslo.transformers.data when running mBart test.

How to reproduce

python ./tests/transformers/models/mbart/test_training.py

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
# python ./tests/transformers/models/mbart/test_training.py
 
Traceback (most recent call last):
  File "./tests/transformers/models/mbart/test_training.py", line 2, in <module>
    from oslo.transformers.trainer import Trainer as OTrainer
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/transformers/trainer.py", line 37, in <module>
    from .data.data_collator import (
ModuleNotFoundError: No module named 'oslo.transformers.data'

I added an __init__.py to the oslo/transformers/data folder to continue with the tests.
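A likely root cause, assuming setup.py discovers packages with setuptools.find_packages() (its contents are not shown here): find_packages() only returns directories that contain an __init__.py, so a subpackage without one is silently left out of the built egg even though its files exist in the source tree. A minimal sketch that reproduces this on a throwaway toy layout (packages_found is a hypothetical helper):

```python
import tempfile
from pathlib import Path
from typing import List

from setuptools import find_packages


def packages_found(with_data_init: bool) -> List[str]:
    """Build a toy oslo/transformers/data tree and return what find_packages() sees."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        (base / "oslo" / "transformers" / "data").mkdir(parents=True)
        (base / "oslo" / "__init__.py").touch()
        (base / "oslo" / "transformers" / "__init__.py").touch()
        if with_data_init:
            (base / "oslo" / "transformers" / "data" / "__init__.py").touch()
        return sorted(find_packages(where=root))


print(packages_found(False))  # ['oslo', 'oslo.transformers']
print(packages_found(True))   # ['oslo', 'oslo.transformers', 'oslo.transformers.data']
```

The same mechanism would explain the missing oslo.torch.experimental module reported below.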

Integrate ZeroDDP and ShardedModelv2 from Colossal-AI

Describe a TODO feature

There are two versions of ZeRO support, ZeroDDP and ShardedModelv2:

  1. Check whether the two can be merged into one.
  2. Otherwise, add a function that chooses one of them based on a flag (not exposed directly to users).

Assignees

  • Dongsung and Hyen

Add a description of how to use fused_scale_softmax

Describe a TODO feature

  • It is hard to tell how to use the fused scale mask softmax:
    • What is the scale value, and how is it used in the attention layer?
    • There is no test case checking the result when scale is not 1.0.
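In Megatron-LM's FusedScaleMaskSoftmax, for example, the input scores are multiplied by scale, masked positions are filled with -10000.0, and a softmax is taken over the last dimension; there, scale is the layer-number coefficient of query-key layer scaling, which undoes an extra division applied to the attention scores for fp16 stability. Whether OSLO's fused_scale_softmax follows this exactly is an assumption. A pure-Python reference for one row of attention scores (hypothetical function name), which could also serve as the expected value in a scale != 1.0 test:

```python
import math
from typing import List, Optional


def scale_mask_softmax_reference(
    scores: List[float],
    scale: float = 1.0,
    mask: Optional[List[bool]] = None,
    mask_value: float = -10000.0,
) -> List[float]:
    """softmax(scale * scores) with masked positions filled (hypothetical helper)."""
    # 1. Scale the raw attention scores.
    x = [s * scale for s in scores]
    # 2. Positions where mask is True are replaced by a large negative value.
    if mask is not None:
        x = [mask_value if m else v for v, m in zip(x, mask)]
    # 3. Numerically stable softmax over the row.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


probs = scale_mask_softmax_reference([1.0, 2.0, 3.0], scale=2.0, mask=[False, False, True])
print(probs)
```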

TODO: Make DP + EP available

Describe a TODO feature

  • Our current Expert Parallelism (MoE) feature cannot be used together with data parallelism. We will make it work with data parallelism and adopt a new design that can reduce the communication volume by a further factor of 1.5.

coloDDP integration

Describe a TODO feature

  • Integrate coloDDP for PatrickStar and ZeroDDP:
  1. Port the coloDDP class.
  2. Add test code for coloDDP.

Assignees

  • Dongsung Kim

TODO: Modify wrapper design to functional

Describe a TODO feature

When multiple parallelizations are overlapped, the wrapper-style design leads to several undesirable results.

Design notes

1. The old design

class TensorParallel:
    def __init__(self, model, ...):
        self.module = model
        self.xxx_for_tp = xxx

class PipelineParallel:
    def __init__(self, model, ...):
        self.module = model
        self.yyy_for_pp = yyy

model = XXXModel.from_pretrained(...)
model = TensorParallel(model)
model = PipelineParallel(model)

2. Problems

  • 2.1. Accessibility
model.module.module.module.xxx_for_tp <--- far too deeply nested
model.generate <--- unavailable
model.save_pretrained <--- unavailable
  • 2.2. Checkpoints
"transformer.0.attn.q_proj.weight" => "module.module.module.transformer.0.attn.q_proj.weight"

3. New design: a class-like function

def TensorParallel(model, parallel_context, ...):
    # do something
    return model
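The difference between the two designs shows up in attribute access and in checkpoint keys. The toy sketch below uses plain Python classes rather than torch.nn.Module; WrapperTP mimics how nn.Module prefixes a child module's parameter names with the attribute it is stored under, and every name here is hypothetical:

```python
from typing import Any, Dict


class ToyModel:
    """Stand-in for a Hugging Face model (hypothetical)."""

    def __init__(self) -> None:
        self.params: Dict[str, float] = {"transformer.0.attn.q_proj.weight": 0.0}

    def state_dict(self) -> Dict[str, float]:
        return dict(self.params)

    def generate(self) -> str:
        return "generated text"


class WrapperTP:
    """Old design: wrap the model, so every key gains a "module." prefix."""

    def __init__(self, model: Any) -> None:
        self.module = model

    def state_dict(self) -> Dict[str, float]:
        return {"module." + k: v for k, v in self.module.state_dict().items()}


def tensor_parallel(model: Any, parallel_context: Any = None) -> Any:
    """New design: modify the model in place and return the same object."""
    # Parallel-specific state is attached to the model itself instead of a wrapper.
    model.xxx_for_tp = parallel_context
    return model


wrapped = WrapperTP(WrapperTP(ToyModel()))
print(list(wrapped.state_dict()))     # ['module.module.transformer.0.attn.q_proj.weight']
print(hasattr(wrapped, "generate"))   # False

functional = tensor_parallel(ToyModel(), parallel_context="ctx")
print(list(functional.state_dict()))  # ['transformer.0.attn.q_proj.weight']
print(functional.generate())          # generated text
```

Because the functional version returns the original object, methods such as generate and the original checkpoint key names stay available, which is what problems 2.1 and 2.2 call for.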

Fix some code errors

  • Rename ddp -> _ddp and fsdp -> _fsdp.
  • Fix wrapper loading from save_pretrained.
  • Remove tracing inputs in the PP functional wrapper.

No module named 'oslo.torch.experimental' when running mbart test

How to reproduce

python ./tests/transformers/models/mbart/test_training.py

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
File "./tests/transformers/models/mbart/test_training.py", line 2, in <module>
    from oslo.transformers.trainer import Trainer as OTrainer
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/transformers/trainer.py", line 27, in <module>
    from oslo.torch.nn.parallel.data_parallel import (
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/torch/nn/parallel/__init__.py", line 1, in <module>
    from oslo.torch.nn.parallel.data_parallel import *
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/torch/nn/parallel/data_parallel/__init__.py", line 4, in <module>
    from oslo.torch.nn.parallel.data_parallel.fully_sharded_data_parallel import (
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/torch/nn/parallel/data_parallel/fully_sharded_data_parallel.py", line 56, in <module>
    from oslo.torch.nn.parallel.data_parallel._flatten_params_wrapper import (
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/torch/nn/parallel/data_parallel/_flatten_params_wrapper.py", line 34, in <module>
    from oslo.torch.experimental.nn.ssd_offload import SsdFlatParameter
ModuleNotFoundError: No module named 'oslo.torch.experimental'

I added an __init__.py to the experimental folder to continue with the tests.

Error on test_modeling_bert.py

How to reproduce

python tests/transformers/models/bert/test_modeling_bert.py

Environment

  • OS : Amazon Linux 2
  • Python version : 3.7.10
  • Transformers version : 4.21.3
  • Whether to use Docker: No
  • Misc.: slurm interactive
    return forward_call(*input, **kwargs)
  File "/fsx/loopinf/oslo-1/oslo/torch/nn/modules/linear.py", line 32, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)

Fix sorting error in the allocate_param function

  • When sorting the dictionary, use sorted(dict.items(), key=lambda item: str(item[0])) rather than sorted(dict, key=lambda x: x[0]). The keys are enums, which are not comparable, so they need to be converted to str first.
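A plain Enum defines no ordering, so sorting its members directly raises TypeError, while sorting by their string form works. A minimal sketch with a hypothetical enum standing in for the real dictionary keys:

```python
from enum import Enum


class ParallelMode(Enum):
    """Hypothetical enum standing in for the real dictionary keys."""

    DATA = 1
    TENSOR = 2
    PIPELINE = 3


groups = {ParallelMode.TENSOR: "tp", ParallelMode.DATA: "dp", ParallelMode.PIPELINE: "pp"}

try:
    sorted(groups)  # plain Enum members do not support "<"
except TypeError as exc:
    print("TypeError:", exc)

# Sort by the string form of each key, e.g. "ParallelMode.DATA".
ordered = sorted(groups.items(), key=lambda item: str(item[0]))
print([value for _, value in ordered])  # ['dp', 'pp', 'tp']
```

Sorting by str gives a deterministic but alphabetical order; if the enum values are themselves comparable, key=lambda item: item[0].value would sort by value instead.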

wandb module not found when running test_mlm.py

How to reproduce

python ./tests/transformers/models/electra/test_mlm.py

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
python ./tests/transformers/models/electra/test_mlm.py 
Traceback (most recent call last):
  File "./tests/transformers/models/electra/test_mlm.py", line 8, in <module>
    import wandb
ModuleNotFoundError: No module named 'wandb'

I'm running these tests on a clean machine where wandb is not installed.
If wandb is a required library, should it be included in setup.py?
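If wandb is only needed for logging, the test could treat it as an optional dependency instead. A minimal sketch, with optional_import as a hypothetical helper name:

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import `name` if it is installed, otherwise return None (hypothetical helper)."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None
```

The test script would then use wandb = optional_import("wandb") and only call wandb.init(...) when the result is not None. Alternatively, setup.py could list wandb under an extras_require group for tests, so that it is installed only on request.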

No _TensorParallelMappingForHuggingFace

How to reproduce

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nnodes=1 --nproc_per_node=2  ./tests/torch/nn/parallel/data_parallel/test_ddp.py

The bug comes from the latest change, which renamed _TensorParallelMappingForHuggingFace to _ParallelMapping. It occurs when parallel_context is called (a tensor_parallel import issue).

Environment

  • OS : 18.04
  • Python version : 3.7
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:
