torch-mac-linux-error's Introduction

Torch-Mac-Linux-Error

This repo contains everything needed to reproduce a prediction error between a Mac and a Linux machine.

Context

While trying to train an autoregressive model on Linux, we noticed that the network wasn't learning anything. We then tried running it on a Mac, more or less at random, and observed outstanding performance. The results are consistent when executed on the same OS. The results obtained when the same weights are used on both machines are given below:

Linux

The expected results on a Linux machine are as follows:

[Plots: linux_vx, linux_vy, linux_vz]

The expected 1st prediction when running the run.py script:

tensor([-192.7449, 255.3847, 153.4133, 605.0937, 1240.0849, 2499.9711], dtype=torch.float64)

Mac

[Plots: mac_vx, mac_vy, mac_vz]

The expected 1st prediction when running the run.py script:

tensor([-0.0719, -0.0403, -0.0102, -0.0413,  0.1258, -0.0873], dtype=torch.float64)
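To make the size of the discrepancy concrete, the two first predictions quoted above can be compared element by element. The sketch below is not part of the repo: it uses plain Python lists holding the values copied from the two outputs, and the helper name `max_abs_diff` is hypothetical.

```python
# Values copied from the Linux and Mac outputs shown above.
linux_pred = [-192.7449, 255.3847, 153.4133, 605.0937, 1240.0849, 2499.9711]
mac_pred = [-0.0719, -0.0403, -0.0102, -0.0413, 0.1258, -0.0873]


def max_abs_diff(a, b):
    """Return the largest element-wise absolute difference (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("predictions must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


diff = max_abs_diff(linux_pred, mac_pred)
print(f"largest absolute difference: {diff:.4f}")
```

The gap is several orders of magnitude larger than what floating-point rounding differences between platforms would usually produce in float64.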

Install

There is a Python environment file that can be installed with conda. Note that environment files are installed with conda env create, not conda create:

conda env create --name env_name -f environment.yml

If the install fails for some reason, the error has been reproduced with torch 2.0 through 2.4, so feel free to try it with any of those versions.
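Since the behaviour depends on the machine, it can help to record which platform and torch build produced a given result. The following is a minimal sketch, not part of the repo; `environment_report` is a hypothetical helper, and the torch details are only collected if torch can be imported.

```python
import platform
import sys


def environment_report():
    """Collect basic platform facts; torch details are added only if torch imports."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # torch is not installed in this environment
    else:
        report["torch"] = torch.__version__
        # torch.__config__.show() returns a string describing the build (BLAS, OpenMP, ...).
        report["torch_build"] = torch.__config__.show()
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on both machines and attaching the output to a bug report makes it easier to tell which builds were actually compared.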

Running

There are two scripts to generate the results:

  1. run.py: loads the model trained on the Mac machine, together with the training data, and runs the model.
  2. train.py: trains the model with a fixed seed to get the same results as on the Mac machine.

Both scripts also save the results as a PDF for a quick visual check.
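How train.py fixes its seed is not shown here. As an assumption, a common pattern is to seed every random number generator in one place; the sketch below uses a hypothetical `seed_everything` helper and only touches NumPy and torch if they are installed.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG and, if available, NumPy and torch (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch is optional for this sketch


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```

A fixed seed makes repeated runs on the same machine comparable, which matches the observation that results are consistent within one OS.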

More info

What has been tried:

  • Tried on 3 different Linux machines (18.04, 20.04, 22.04).
  • Tried with different PyTorch versions: 2.0, 2.1, 2.2, 2.3, 2.4.
  • Looked at different linear algebra backends (MKL, OPENBLAS, ADVANCED) on Mac and Linux.
  • Reproduced the results on 2 different Mac machines.

torch-mac-linux-error's People

Contributors

nicolayp

torch-mac-linux-error's Issues

Setup error in macOS with Apple M3 Pro

I got an error using the command from the README:

conda create --name torch_test -f environment.yml
Channels:
 - pytorch
 - defaults
 - conda-forge
Platform: osx-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - environment.yml

Current channels:

  - https://conda.anaconda.org/pytorch
  - defaults
  - https://conda.anaconda.org/conda-forge

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Running the following command instead gets further, but the environment still cannot be solved:

conda env create -f environment.yml

Channels:
 - pytorch
 - nvidia
 - conda-forge
 - defaults
Platform: osx-64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages:
                                                                                              
Preparing transaction: done                                                                   
Verifying transaction: done                                                                   
Executing transaction: done                                                                   
Channels:                                                                                     
 - pytorch                                                                                    
 - nvidia                                                                                     
 - conda-forge                                                                                
 - defaults                                                                                   
Platform: osx-64                                                                              
Collecting package metadata (repodata.json): done                                             
Solving environment: failed                                                                   
Channels:                                                                                     
 - pytorch                                                                                    
 - nvidia                                                                                     
 - conda-forge                                                                                
 - defaults                                                                                   
Platform: osx-64                                                                              
Collecting package metadata (repodata.json): done                                             
Solving environment: failed                                                                   
                                                                                              
LibMambaUnsatisfiableError: Encountered problems while solving:                               
  - nothing provides requested _openmp_mutex 4.5**                                            
  - nothing provides requested cuda-cudart 11.8.89**                                          
  - nothing provides requested cuda-cupti 11.8.87**
  - nothing provides requested cuda-libraries 11.8.0**
  - nothing provides requested cuda-nvrtc 11.8.89**
  - nothing provides requested cuda-nvtx 11.8.86**
  - nothing provides requested cuda-runtime 11.8.0**
  - nothing provides requested libcublas 11.11.3.6**
  - nothing provides requested libcufft 10.9.0.58**
  - nothing provides requested libcufile 1.9.1.3**
  - nothing provides requested libcurand 10.3.5.147**
  - nothing provides requested libcusolver 11.4.1.48**
  - nothing provides requested libcusparse 11.7.5.86**
  - nothing provides requested libgcc-ng 13.2.0**
  - nothing provides requested libgomp 13.2.0**
  - nothing provides requested libnpp 11.8.0.86**
  - nothing provides requested libnsl 2.0.1**
  - nothing provides requested libnvjpeg 11.9.0.86**
  - nothing provides requested libstdcxx-ng 13.2.0**
  - nothing provides requested pytorch 2.3.0**
  - nothing provides cuda 11.8.* needed by pytorch-cuda-11.8-h8dd9ede_2
  - nothing provides requested torchaudio 2.3.0**
  - nothing provides requested torchvision 0.18.0**

Could not solve for environment specs
The following packages are incompatible
├─ _openmp_mutex 4.5**  does not exist (perhaps a typo or a missing channel);
├─ cuda-cudart 11.8.89**  does not exist (perhaps a typo or a missing channel);
├─ cuda-cupti 11.8.87**  does not exist (perhaps a typo or a missing channel);
├─ cuda-libraries 11.8.0**  does not exist (perhaps a typo or a missing channel);
├─ cuda-nvrtc 11.8.89**  does not exist (perhaps a typo or a missing channel);
├─ cuda-nvtx 11.8.86**  does not exist (perhaps a typo or a missing channel);
├─ cuda-runtime 11.8.0**  does not exist (perhaps a typo or a missing channel);
├─ libcublas 11.11.3.6**  does not exist (perhaps a typo or a missing channel);
├─ libcufft 10.9.0.58**  does not exist (perhaps a typo or a missing channel);
├─ libcufile 1.9.1.3**  does not exist (perhaps a typo or a missing channel);
├─ libcurand 10.3.5.147**  does not exist (perhaps a typo or a missing channel);
├─ libcusolver 11.4.1.48**  does not exist (perhaps a typo or a missing channel);
├─ libcusparse 11.7.5.86**  does not exist (perhaps a typo or a missing channel);
├─ libgcc-ng 13.2.0**  does not exist (perhaps a typo or a missing channel);
├─ libgomp 13.2.0**  does not exist (perhaps a typo or a missing channel);
├─ libnpp 11.8.0.86**  does not exist (perhaps a typo or a missing channel);
├─ libnsl 2.0.1**  does not exist (perhaps a typo or a missing channel);
├─ libnvjpeg 11.9.0.86**  does not exist (perhaps a typo or a missing channel);
├─ libstdcxx-ng 13.2.0**  does not exist (perhaps a typo or a missing channel);
├─ pytorch-cuda 11.8**  is not installable because it requires
│  └─ cuda 11.8.* , which does not exist (perhaps a missing channel);
├─ pytorch 2.3.0**  does not exist (perhaps a typo or a missing channel);
├─ torchaudio 2.3.0**  does not exist (perhaps a typo or a missing channel);
└─ torchvision 0.18.0**  does not exist (perhaps a typo or a missing channel).
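For triage, the directly requested specs that the solver could not find can be pulled out of a log like the one above. This is a minimal sketch, not a conda feature: the `missing_specs` helper and the regular expression are assumptions based only on the line format visible in this output.

```python
import re

# A few lines copied from the solver output above.
LOG = """\
LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides requested _openmp_mutex 4.5**
  - nothing provides requested cuda-cudart 11.8.89**
  - nothing provides requested pytorch 2.3.0**
  - nothing provides cuda 11.8.* needed by pytorch-cuda-11.8-h8dd9ede_2
"""

# Matches "nothing provides requested <name> <version>**" lines only.
PATTERN = re.compile(r"nothing provides requested (\S+) (\S+?)\*\*")


def missing_specs(log_text):
    """Return (package, version) pairs for directly requested specs that were not found."""
    return PATTERN.findall(log_text)


for name, version in missing_specs(LOG):
    print(name, version)
```

Many of the missing names (the cuda-* packages, libgcc-ng, libstdcxx-ng) are packages one would expect on a Linux CUDA setup, which suggests the environment file pins Linux-specific packages. That reading is an inference from the log, not something the issue states.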
