ray-project / ray-educational-materials

This is a suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray.

License: Apache License 2.0

Jupyter Notebook 96.43% Python 3.57%

deep-learning distributed-machine-learning ray-distributed ray-tune ray ray-train ray-data ray-serve generative-ai llm

ray-educational-materials's Issues

[Suggestion]: Reorganize this repository under consistent directories centered around workflows.

Please share your suggestion here

As the number of notebooks grows, it becomes more and more difficult to surface what users are interested in. Right now, the directories are named after either the relevant library (e.g. "Ray Core") or the type of data (e.g. "Computer_vision_workloads").

At the very least, these conventions should be consistent and, ideally, centered around workflows that developers would relate to. In addition, the README should be improved to better describe this repository and to direct readers to the relevant modules more quickly.

[Suggestion]: Ray use cases section should split simple scaling vs advanced use cases

Please share your suggestion here

Currently the list of use cases in https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb contains the following:

Exoshuffle
Building a custom feature engineering library
Alpa
RLlib / FIFA
Multi-model serving
RL training / Riot
ML platform / Shopify
ML platform / Spotify

This is skewed toward advanced use cases, which I don't think accurately reflects the entire target audience of Ray. I think it would be productive to break this down into two categories:

Scaling simple ML workloads
- Batch inference on CPUs and GPUs (Core / Data)
- Parallel training of many small models / Distributed training of large models (Core / Train)
- Managing parallel experiments and hyperparameter tuning (Tune)
- Serving model pipelines or multiple models (Serve)
- Reinforcement Learning (RLlib)
- ML platform use cases (Shopify, Spotify)
Implementing advanced ML workloads
- Alpa
- Exoshuffle
- Custom feature eng library
- RL training / Riot / FIFA

[Bug]: Introduction to Ray AIR Serve Code Snippet Wrong

Notebook with bug

Introduction to Ray AIR

What happened?

The serve code snippet is the tune one, and should be swapped out.

Environment info

n/a

Issue Severity

Low: Minor problem.

[Suggestion]: NLP_workloads/Text_generation/LLM_finetuning_and_batch_inference.ipynb Preprocessor

Please share your suggestion here

Should the labels in preprocess_function be the encoded output? It seems to use input_ids as the labels instead of the output.
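To make the question concrete, here is a minimal sketch of the two labeling conventions being contrasted. Which one is appropriate depends on the model family used in the notebook, which this issue does not state. Everything below is hypothetical: `toy_tokenize` stands in for a real tokenizer, and the `instruction` / `output` column names are assumptions for illustration only.

```python
def toy_tokenize(text):
    """Hypothetical stand-in for a real tokenizer: one integer id per word."""
    return [sum(ord(c) for c in token) % 1000 for token in text.split()]


def preprocess_seq2seq(batch):
    # Encoder-decoder convention: inputs encode the prompt,
    # labels encode the expected output text.
    return {
        "input_ids": [toy_tokenize(t) for t in batch["instruction"]],
        "labels": [toy_tokenize(t) for t in batch["output"]],
    }


def preprocess_causal(batch):
    # Decoder-only convention: prompt and output are concatenated,
    # and labels are a copy of input_ids.
    ids = [
        toy_tokenize(prompt + " " + answer)
        for prompt, answer in zip(batch["instruction"], batch["output"])
    ]
    return {"input_ids": ids, "labels": [list(x) for x in ids]}


batch = {"instruction": ["Translate hello"], "output": ["bonjour"]}
seq2seq = preprocess_seq2seq(batch)
causal = preprocess_causal(batch)
print(seq2seq["labels"] != seq2seq["input_ids"])
print(causal["labels"] == causal["input_ids"])
```

If the notebook trains an encoder-decoder model, the first form is the one this issue is asking for; if it trains a decoder-only model, copying input_ids is the usual convention.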

[Suggestion]: It's better to test the examples in the educational materials

Please share your suggestion here

predictions_dataset = predictor.predict(data=dataset, batch_size=1)

If I run on a GPU server, this line raises a RayTaskError. It seems the returned segmentation_maps_postprocessed has to be moved to the CPU and converted to NumPy, and `num_gpus_per_worker=1` has to be set. It took me a long time to realize the example has this issue. For a newbie, even a minor issue may lead to confusion.

Thanks

[Bug]: Broken link to Dask in the Overview of Ray notebook

Notebook with bug

https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb

What happened?

The link to "Dask on Ray" takes you to a 404 page:

https://docs.ray.io/en/latest/data/dask-on-ray.html

It looks like it should be this page:

https://docs.ray.io/en/latest/ray-more-libs/dask-on-ray.html

Environment info

N/A

Issue Severity

Low: Minor problem.

[Suggestion]: incorporate feedback from "Overview of Ray" dry run

Please share your suggestion here

Here is a list of small changes to make based on feedback from the "Overview of Ray" dry run:

include an object store visualization under the section "Put data in the object store"
change the naming of training and testing set components to be more readable
redirect use case links to YouTube videos rather than our site
lower the number of models to be trained
start with n_estimators at 8 and then increment by 8 to achieve a more satisfying convergence
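The last item can be sketched as a simple schedule. `NUM_MODELS` is a hypothetical placeholder, since the feedback only says the number of models should be lowered, not what it should be.

```python
# Hypothetical number of models to train; the feedback only says "lower".
NUM_MODELS = 6

# Start at 8 estimators and grow by 8 per model.
n_estimators_schedule = [8 * (i + 1) for i in range(NUM_MODELS)]
print(n_estimators_schedule)
```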

[suggestion] batch inference module - merge sections to better present Ray AIR

Please share your suggestion here

Merge Datasets and BatchPredictor approaches into one: "Distributed batch inference with Ray AIR".

Datasets approach is more basic; BatchPredictor is more specialized, easy to use and feature rich as it also:

supports various predictors (TorchPredictor, HFPredictor)
handles framework-native batch conversions
gives options to resume operations from an AIR checkpoint for prediction, to select or keep columns, etc.

Note in this section that BatchPredictor calls dataset.map_batches() under the hood. From that perspective they are similar.

Ray Website "Try It Out" Quick Start with Ray AIR Colab Error on Import

Notebook with bug

https://colab.research.google.com/github/ray-project/ray-educational-materials/blob/main/Introductory_modules/Quickstart_with_Ray_AIR_Colab.ipynb

What happened?

Description
Running the "try it out" colab on the website fails with import error.
AttributeError: 'NoneType' object has no attribute 'replace'
Using the latest version of xgboost-ray (0.1.18) fixes the problem.

Link
https://colab.research.google.com/github/ray-project/ray-educational-materials/blob/main/Introductory_modules/Quickstart_with_Ray_AIR_Colab.ipynb

Environment info

ray==2.3.0 xgboost_ray==0.1.15

Issue Severity

Low: Minor problem.

[Bug]: Ray_Core/ray_core_1_remote_functions.ipynb invalid import

Notebook with bug

https://github.com/ray-project/ray-educational-materials/blob/main/Ray_Core/Ray_Core_1_Remote_Functions.ipynb

What happened?

The first line of example 3 includes the following import: import tasks_helper_utils as t_utils. But, tasks_helper_utils is not a real library.

Environment info

ray, version 2.7.0, Python 3.11.5, MacOS Monterey 12.2.1

Issue Severity

Low: Minor problem.

[Bug]: ray.air checkpoints has moved to ray.train checkpoints

Notebook with bug

Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb

What happened?

The import, as well as other dependencies, needs to be fixed for the checkpoint-related changes.

#from ray.air import Checkpoint
from ray.train import Checkpoint

Further, Checkpoint.from_dict() does not work and fails with:

AttributeError: The new ray.train.Checkpoint class does not support from_dict(). Instead, only directories are supported.
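One possible workaround, sketched under assumptions: write the dictionary to a directory yourself and build the checkpoint from that directory. The `save_state` / `load_state` helpers and the `state.json` file name are hypothetical, and JSON only suits plain values; real model weights would need the framework's own save function. The Ray import is guarded so the sketch still runs where Ray is not installed.

```python
import json
import os
import tempfile

try:
    # Import path taken from the issue above.
    from ray.train import Checkpoint
except ImportError:
    Checkpoint = None  # Ray is not available; the file helpers below still run.

STATE_FILE = "state.json"  # hypothetical file name chosen for this sketch


def save_state(state: dict, directory: str) -> None:
    """Write the dictionary into the directory as a JSON file."""
    with open(os.path.join(directory, STATE_FILE), "w") as f:
        json.dump(state, f)


def load_state(directory: str) -> dict:
    """Read the dictionary back from the directory."""
    with open(os.path.join(directory, STATE_FILE)) as f:
        return json.load(f)


state = {"epoch": 3, "learning_rate": 0.001}
checkpoint_dir = tempfile.mkdtemp()
save_state(state, checkpoint_dir)

if Checkpoint is not None:
    # Directory-based replacement for the old Checkpoint.from_dict(state) call.
    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    with checkpoint.as_directory() as local_dir:
        restored = load_state(local_dir)
else:
    restored = load_state(checkpoint_dir)

print(restored)
```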

Environment info

Ray 2.10.0
Python 3.10.13
Ubuntu

Issue Severity

None

[Suggestion]: add "Part 3" to the Overview of Ray

Please share your suggestion here

Add Part 3, that will consist of small coding exercises:

Work with Object store

add object with ray.put()
print returned object reference
use ray.get() to access value of the object.
Mention that tasks and actors return futures that are references as well.

Compute pi digits
Use this docs example to show a highly parallel computational job: computing the digits of pi.
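A rough sketch of what both exercises could look like together. It assumes the docs example meant here is a Monte Carlo estimate of pi rather than an exact digit computation, and `NUM_TASKS`, `SAMPLES_PER_TASK`, and `count_inside_circle` are hypothetical names. The Ray import is guarded, so without Ray the same function runs in a plain loop.

```python
import random

try:
    import ray
except ImportError:
    ray = None  # fall back to a sequential loop so the sketch still runs

NUM_TASKS = 4              # hypothetical values chosen for this sketch
SAMPLES_PER_TASK = 50_000


def count_inside_circle(num_samples: int, seed: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


if ray is not None:
    ray.init(ignore_reinit_error=True)

    # Exercise 1: put a value in the object store and inspect the reference.
    samples_ref = ray.put(SAMPLES_PER_TASK)
    print(samples_ref)           # an object reference, not the value itself
    print(ray.get(samples_ref))  # the stored value

    # Exercise 2: each .remote() call returns a future, which is also a reference.
    remote_count = ray.remote(count_inside_circle)
    futures = [remote_count.remote(samples_ref, seed) for seed in range(NUM_TASKS)]
    counts = ray.get(futures)
    ray.shutdown()
else:
    counts = [count_inside_circle(SAMPLES_PER_TASK, seed) for seed in range(NUM_TASKS)]

# The fraction of points inside the quarter circle approximates pi / 4.
pi_estimate = 4 * sum(counts) / (NUM_TASKS * SAMPLES_PER_TASK)
print(pi_estimate)
```

Seeding each task separately keeps the result the same whether the tasks run in parallel or in the fallback loop.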

What's the meaning of these sentences in "Part 5: Distributed batch inference with Ray Core API"?

When using Ray, you can pass objects as arguments to remote functions. Ray will automatically store these objects in the local object store (on the worker node where the function is running) using the ray.put() function. This makes the objects available to all local tasks. However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function.

To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects.

I am confused by the wording about ray.put():

"However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function "
"To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects."

Which sentence should I follow?
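As far as I understand, the two sentences describe two ways of passing the same object rather than conflicting advice: the first is what happens when the object itself is passed on every call, the second is the recommended alternative. A minimal sketch, where the `model` dictionary is a hypothetical stand-in for the notebook's model and feature extractor, and the Ray import is guarded so the code also runs without Ray:

```python
try:
    import ray
except ImportError:
    ray = None  # without Ray the sketch runs the same function locally

# Hypothetical stand-in for the notebook's model and feature extractor.
model = {"weight": 2.0, "payload": bytes(1_000_000)}


def predict(model: dict, x: float) -> float:
    """Toy inference step that only uses the model's weight."""
    return model["weight"] * x


inputs = [1.0, 2.0, 3.0]

if ray is not None:
    ray.init(ignore_reinit_error=True)
    remote_predict = ray.remote(predict)

    # First sentence: passing the object itself makes Ray store it implicitly
    # on every call, so three calls mean three copies of the large object.
    by_value = ray.get([remote_predict.remote(model, x) for x in inputs])

    # Second sentence: store the object once, then pass only the reference.
    model_ref = ray.put(model)
    by_reference = ray.get([remote_predict.remote(model_ref, x) for x in inputs])
    ray.shutdown()
else:
    by_value = by_reference = [predict(model, x) for x in inputs]

print(by_value == by_reference)
```

Both variants return the same results; the difference is only in how many copies of the large object end up in the object store.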

[Suggestion]: add descriptions on how many Actors are needed given my cluster

Please share your suggestion here

Help Ray users understand how to estimate the number of Actors and the amount of compute needed to achieve performant batch prediction. Mention the following:

actor defaults (1 cpu) and how to change it
how to assign GPU to actors
total number of actors as a function of number of cpus or gpus in the cluster.
for large clusters, mention the good practice of limiting the number of CPUs made available on the head node (docs).
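The kind of estimate this suggestion asks for could be sketched as below. `estimate_max_actors` and its parameters are hypothetical names, and the result is only a rough ceiling because it ignores how resources are split across nodes.

```python
import math


def estimate_max_actors(
    cluster_cpus: float,
    cluster_gpus: float,
    cpus_per_actor: float = 1.0,   # default of 1 CPU per actor, as stated above
    gpus_per_actor: float = 0.0,
    head_node_cpus_excluded: float = 0.0,
) -> int:
    """Rough upper bound on the number of actors that can run at the same time.

    In Ray, the per-actor requests correspond to the num_cpus and num_gpus
    arguments of @ray.remote(...) or .options(...).
    """
    if cpus_per_actor <= 0 and gpus_per_actor <= 0:
        raise ValueError("at least one per-actor resource must be positive")
    usable_cpus = cluster_cpus - head_node_cpus_excluded
    limits = []
    if cpus_per_actor > 0:
        limits.append(math.floor(usable_cpus / cpus_per_actor))
    if gpus_per_actor > 0:
        limits.append(math.floor(cluster_gpus / gpus_per_actor))
    return max(0, min(limits))


# CPU-only cluster with default actors.
print(estimate_max_actors(cluster_cpus=128, cluster_gpus=0))
# One GPU per actor: the GPU count becomes the limit.
print(estimate_max_actors(cluster_cpus=64, cluster_gpus=8, gpus_per_actor=1))
# Two CPUs per actor, with 16 head-node CPUs kept out of scheduling.
print(estimate_max_actors(cluster_cpus=64, cluster_gpus=0, cpus_per_actor=2,
                          head_node_cpus_excluded=16))
```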

[Bug]: UnidentifiedImageError in Ray_Core/Ray_Core_1_Remote_Functions.ipynb

Notebook with bug

Ray_Core_1_Remote_Functions.ipynb

What happened?

Running this cell gives

UnidentifiedImageError: cannot identify image file '**/ray-educational-materials/Ray_Core/task_images/stennis.jpg'

Environment info

Ray: 2.3.1
Python: 3.10.12
OS: Ubuntu 22.04

Issue Severity

Minor

[Suggestion]: No link to ray dashboard as instructed

Please share your suggestion here

hi,

I don't see any link to the Ray dashboard as instructed:

[Bug]: Halt due to resources are not available

Example 3: How to use Ray distributed tasks for image transformation and computation

What happened?

When I run "run_distribued", I get the following errors:

In my case I set the batch to 100, but even when I set it to 35, the errors are still raised.

I am new to Ray and cannot figure out what is going on. What resources are unavailable, and why does the system halt?

Environment info

System: Centos 7
CPUs: 128
Ray: 2.3
python 3.9

Issue Severity

None

[Suggestion]: Batch inference module: improve comparison table at the end of module

Please share your suggestion here

Add more rows to the table:

level of control
exposed or hidden parallelism
stateless / stateful
pre and post processing options
how much you should know about ray to use it?
flexibility and ease-of-use dimensions

Add more content to the table:

ideal use case
when to use? why?

[Suggestion]: Better examples for ML practitioner

Please share your suggestion here

ML practitioner examples -> add scalable training and parallel training examples. Training many models in parallel

[Suggestion]: batch prediction module: merge Actors and ActorPool sections

Please share your suggestion here

Merge Actors and ActorPool approaches into one.

As ActorPool is a utility, it can be presented as a convenience wrapper that is easy to work with. It provides load balancing and Actor management so that Ray users do not need to implement these themselves (as presented in the Actors section).

[Bug]:

Notebook with bug

LLM_finetuning_and_batch_inference.ipynb

What happened?

I get the following errors while running this cell:
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
    datasets={
        "train": train_dataset,
        "evaluation": validation_dataset,
    },
    run_config=RunConfig(
        checkpoint_config=CheckpointConfig(
            num_to_keep=1,
            checkpoint_score_attribute="eval_loss",
            checkpoint_score_order="min",
        ),
    ),
    preprocessor=batch_preprocessor,
)

Environment info

ray 2.8 python3.9

Issue Severity

High: It blocks me from completing my task.

[Suggestion]: Improve Readability of Ray Serve Use Case Image

Please share your suggestion here

The collection of diagrams for the Ray Serve use case under the section "Multi-model composition for model serving" is illegible and cluttered. Replace this image with a more readable diagram whenever one becomes available.

[Bug]: Failing to read AWS S3 file(s)

Notebook with bug

https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Introduction_to_Ray_AI_Runtime.ipynb

What happened?

Failed to execute the following Python code:

# Read Parquet file to Ray Dataset.
dataset = ray.data.read_parquet(
    "s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet"
)

Environment info

Python version: 3.11.3
Ray version: 2.5.0

Issue Severity

High: It blocks me from completing my task.
