lightning-ai / ecosystem-ci
Automate issue discovery for your projects against Lightning nightly and releases.
License: Apache License 2.0
update Makefile to run any and all integrations
enable full local testing, to emulate the cloud solution...
inspiration/parallel solution as https://github.com/PyTorchLightning/lightning-tutorials/blob/main/Makefile
Mention the contact person in the particular Slack message
The user can safely ignore everything that is not related to them...
add contacts with names to the project's YAML config, then mention these people in the message
contact:
  slack:
    - Borda
  email: ...
add a command for parsing with arg --channel slack
NeMo tests its own CI on a PyTorch container from NGC (versioned as YY.MM), and these containers are generally available on other cloud providers too. Note that once PyTorch has a public release, it usually takes at least one month for the next container to actually include it. Until then, the current container ships an alpha release of PyTorch with some cherry-picked changes rather than the full public release.
This can cause cases where improper version checking (using distutils instead of packaging.version.Version) fails these alpha version comparisons and causes PTL inside the container to pick incorrect code paths. So the ecosystem CI will work fine ... but when you run it on a PyTorch container released by Nvidia (i.e. on most cloud providers) it may fail (and not just NeMo, anything that uses PTL and hits that code path).
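The alpha-version pitfall can be illustrated with a minimal sketch. It assumes the packaging library is installed. The version string "1.10.0a0+0aef44c" and the helper name meets_minimum are made up for illustration and are not taken from PTL or from any NGC container.

```python
from packaging.version import Version

# Illustrative only: an alpha build string of the kind a vendor container might report.
container_torch = Version("1.10.0a0+0aef44c")
public_release = Version("1.10.0")

# PEP 440 orders pre-releases before the final release, so a plain
# "is torch >= 1.10.0?" check evaluates to False for the alpha build.
naive_check = container_torch >= public_release


def meets_minimum(installed: str, minimum: str, use_base_version: bool = False) -> bool:
    """Compare versions, optionally ignoring pre-release and local segments."""
    installed_version = Version(installed)
    if use_base_version:
        # base_version drops suffixes such as "a0" and "+0aef44c".
        installed_version = Version(installed_version.base_version)
    return installed_version >= Version(minimum)
```

Whether comparing on the base version is the right call depends on whether the alpha build really contains the feature being gated, which is exactly what a test run inside the container would reveal.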
So maybe on a separate test prior to release, test the ecosystem CI on the latest public NGC pytorch container (or really any cloud container which has pytorch built into it). Ofc this is a big task so it's just a suggestion.
For a current example of exactly how we have to patch for such an issue right now (wrt Pytorch 1.10, NGC Container 21.01 and Pytorch Lightning 1.5.9), https://github.com/NVIDIA/NeMo/blob/8e15ba43ba0a17b456d3bfa09444574ef1faa301/Jenkinsfile#L70-L76 due to an issue regarding torchtext.
For an extreme case of exactly how bad things become - we had to adaptively install torch, PTL and nemo dependencies based on whether the install occurred inside a container or not.. https://github.com/NVIDIA/NeMo/blob/r1.0.0rc1/setup.py#L107-L146
Maybe test the ecosystem CI (or just even PTL alone) on the latest public NGC pytorch container (or really any cloud container which has pytorch built into it). Ofc this is a big task so it's just a suggestion.
Apart from manual patching of PTL source at install time, we haven't found any better solution than to wait it out for a month or two before the container actually contains the latest code from the latest torch release.
The slack bot does not run on other OS:
Error: Container action is only supported on Linux
https://github.com/PyTorchLightning/ecosystem-ci/runs/4601419524?check_suite_focus=true
the Slack integration shall be running anywhere...
See alternatives:
copy all tests for a project to _integration
instead of the repo root
thinking about having
dependencies:
  - name: pytorch-lightning
    HTTPS: https://github.com/PyTorchLightning/pytorch-lightning.git
    checkout: release/1.5.x
and also:
dependencies:
  - name: Cython
    checkout: 3.56
which would do pip install Cython==3.56
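One possible way to turn such a dependency entry into an install command is sketched below. The function name pip_install_command is hypothetical, and installing the Git form through pip's git+URL@ref syntax is an assumption; the proposal only spells out the name==version case.

```python
from typing import Dict, List


def pip_install_command(dependency: Dict[str, str]) -> List[str]:
    """Build a pip command for one entry of the proposed `dependencies` list."""
    name = dependency["name"]
    checkout = dependency.get("checkout")
    url = dependency.get("HTTPS")
    if url:
        # Assumption: a Git URL is installed straight from the repository,
        # pinned to the requested branch, tag, or commit.
        target = f"git+{url}"
        if checkout:
            target += f"@{checkout}"
    elif checkout:
        # No URL: treat `checkout` as a released version on the package index.
        target = f"{name}=={checkout}"
    else:
        target = name
    return ["pip", "install", target]
```

With the two entries above, this yields pip install git+https://github.com/PyTorchLightning/pytorch-lightning.git@release/1.5.x and pip install Cython==3.56.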
simple cleaning in make or anywhere
When a change touches only particular configs, we want to build only those configs, since in principle all configs shall be completely independent and so isolated...
We may use GH CLI, ask if the change is only the one config and then skip all others...
Our CLI: input PR number and the config path and return bool if the config shall be build
Logic: build a particular config if all changes are only in configs and this config is one of those changed
GH action: for execution steps add if: ...
with skip if no need for building
we want to be as efficient as possible so do not build anything which is not really needed
also, taking too many resources as the ecosystem grows may drain the pool for our other projects, as the pool is shared org-wide
the targeted build will speed up PR checks and lower the costs
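The decision logic above could look roughly like the following sketch. It takes the list of changed files as input rather than a PR number, because fetching that list (for example through the GH CLI) is left out here. The function name should_build and the default directory name "configs" are assumptions.

```python
from pathlib import PurePosixPath
from typing import Iterable


def should_build(changed_files: Iterable[str], config_path: str, configs_dir: str = "configs") -> bool:
    """Return True if the given config needs to be built for this change set."""
    changed = [PurePosixPath(path) for path in changed_files]
    root = PurePosixPath(configs_dir)
    # True only when there is at least one change and every change lives under configs_dir.
    only_configs = bool(changed) and all(root in path.parents for path in changed)
    if not only_configs:
        # Something outside the configs changed (or nothing is known), so build everything.
        return True
    # Only configs changed: build this one only if it is among them.
    return PurePosixPath(config_path) in changed
```

The boolean result could then feed the if: condition of the execution steps so that unaffected configs are skipped.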
While the current container tests are run against bare python, there are many important libraries that require conda to be available to properly install dependencies. Therefore conda support would be the request.
A use case for this in NeMo is Numba - we utilize it for custom CUDA kernels for RNNT loss, custom spec augment kernels and in the future for jit compiled CPU code as well. We have plenty of unit tests that are optional (and hence skipped) when these are not installed.
Another is Pynini (for WFST based text normalization and inverse text normalization) - however since the graph building takes significant time if not cached, we do not recommend performing this CPU test during your ecosystem tests. It is just an example.
The reason we prefer Numba to be installed from conda (whereas it does have a pip install path) is that often an increment in numba version is accompanied by an increment in the llvmlite library too - and this causes significant issues. llvmlite is a library that will often throw errors during upgrade, and it can be forced via --force-reinstall (for pip) but that brings about an unstable environment. Conda bypasses all of this.
Install miniconda on the fly (or use a base container that supports conda).
Leave conda tests out, and request ecosystem members to make their tests skippable if dependencies are not installed. NeMo can do this (and currently does skip numba tests if it's not installed or doesn't support a recent cuda version), but it may be cumbersome on a large scale.
It would be awesome if we could set the CI up to open an issue on the corresponding repository if something were to break.
Even better would be to have a stack trace showing what broke! Something like below:
Hi! I'm the PyTorchLightning EcoSystem-CI bot.
I've detected an incompatibility with your repository and Lightning's release (1.5.x). I've attached the stack trace below for help in debugging.
Any additional breakages could update the same issue!
Hi, I'm a contributor of https://github.com/intel-analytics/BigDL
Our project is using pytorch lightning and we want to use this CI, but there are some issues:
I noticed this CI always runs pip install . in the root directory of target_repository:
ecosystem-ci/actions/assistant.py
Line 178 in 2ba4864
Our project has multiple packages which can be installed, so it doesn't contain a setup.py in the root directory.
I hope this CI can add an option to control in which directory it will run pip install .
i.e. change the current commands:
git clone {url}
cd {repo_name}
pip install .
to:
git clone {url}
cd {repo_name}
cd {new option to specify the installation path}
pip install .
Thanks!
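The requested option could be sketched as below. The function name install_commands and the parameter install_dir are hypothetical; this is not how actions/assistant.py is currently written.

```python
from typing import List, Optional


def install_commands(url: str, repo_name: str, install_dir: Optional[str] = None) -> List[str]:
    """Generate the clone-and-install steps, optionally installing from a sub-directory."""
    commands = [f"git clone {url}", f"cd {repo_name}"]
    if install_dir:
        # Hypothetical option: step into the package directory before installing.
        commands.append(f"cd {install_dir}")
    commands.append("pip install .")
    return commands
```

Leaving install_dir unset reproduces the current three commands, so existing configs would keep working unchanged.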
let's be open and transparent and have it as an extra item in the config:
build_commands:
  - pip install Cython==3.56
  - conda update numba --silent
more flexibility on building environment, eventually download some external/public data needed for testing
see: Lightning-AI/pytorch-lightning#14137 (comment)
Showcase all passing integrations on a special PL page
Get more visibility of active collaborations
enable some external messaging on CI failure via email
Expand the reach of notified users as not everyone is using Slack, and/or there are too many messages...
check the following actions as options:
if we want to be progressive in compatibility, we shall also report on deprecation warnings and eventually treat them as failures...
as an alternative, we can give the user an option to set it as strict
Compatibility for the future
pass pytest argument -W error::DeprecationWarning in https://github.com/PyTorchLightning/ecosystem-ci/blob/c544b7bc10fbef6f6b498805dd08837eb1ba96a2/actions/_config.yaml#L54
https://docs.pytest.org/en/latest/how-to/capture-warnings.html#controlling-warnings
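What the strict mode would do can be shown with the standard warnings module, which is the mechanism behind pytest's -W option. This is a minimal sketch; legacy_api and call_strict are made-up names for illustration.

```python
import warnings


def legacy_api() -> int:
    """Stand-in for a library call that is scheduled for removal."""
    warnings.warn("legacy_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_strict(func):
    """Call func with DeprecationWarning escalated to an exception."""
    with warnings.catch_warnings():
        # Same effect as the "error::DeprecationWarning" filter, limited to this block.
        warnings.simplefilter("error", DeprecationWarning)
        return func()
```

Calling call_strict(legacy_api) raises DeprecationWarning instead of returning, so a test that touches a deprecated API fails rather than passing with a warning.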