
davidadsp / generative_deep_learning_2nd_edition


The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

Home Page: https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/

License: Apache License 2.0

Languages: Jupyter Notebook 94.68%, Shell 1.13%, Python 4.05%, Dockerfile 0.15%
Topics: chatgpt, dalle2, data-science, deep-learning, diffusion-models, generative-adversarial-network, gpt-3, machine-learning, python, stable-diffusion

generative_deep_learning_2nd_edition's People

Contributors

davidadsp, dsanr, faisito


generative_deep_learning_2nd_edition's Issues

Minor: Typos in README

Looks like some files got renamed but the README wasn't updated accordingly.

The two docker compose commands in the README still reference docker-compose-gpu.yml, but the file is now named docker-compose.gpu.yml.

Confused about RealNVP (book text vs. code: forward and inverse passes)

I have a printed copy of the book.

Page 178 says that the forward pass is z = x * exp(s) + t.
Page 179 says that the backwards/inverse pass is x = (z - t) * exp(-s).

This matches some other guides, such as this one.

On page 183 the book says: "if training=True, we move forward (from data to latent space). If training=False, we move backwards through the layers (from latent space to data)."

If training=True, then direction=-1.

If we check the for loop, for i in range(self.coupling_layers)[::direction]:, then with -1 in place of direction we iterate over the layers backwards, not forward as the text says.

Let's focus on the training=False case because it is simpler: then direction is 1. Based on the for loop above, we move forward through the layers (even though the text says that training=False means moving backwards).

With direction = 1, the gate (gate = (direction - 1) / 2) equals 0.

If we substitute direction = 1 and gate = 0 into:
(x * tf.exp(direction * s) + direction * t * tf.exp(gate * s))

we get x * tf.exp(s) + t, which is what page 178 calls the forward pass.

Going back to training=True: then direction = -1 and gate = -1. If we substitute these values into
(x * tf.exp(direction * s) + direction * t * tf.exp(gate * s))

we get x * tf.exp(-s) - t * tf.exp(-s), which is (z - t) * exp(-s), i.e. the inverse pass.
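
A quick numeric check of this substitution (a minimal sketch, with scalar values for x, s and t chosen arbitrarily):

import tensorflow as tf

x, s, t = 2.0, 0.5, 1.0  # arbitrary scalars

def coupling(value, s, t, direction):
    # the transformation used in the coupling layer, as quoted above
    gate = (direction - 1) / 2
    return value * tf.exp(direction * s) + direction * t * tf.exp(gate * s)

z = x * tf.exp(s) + t                        # forward pass as written on page 178
print(float(coupling(x, s, t, direction=1)), float(z))        # identical values
x_back = (z - t) * tf.exp(-s)                # inverse pass as written on page 179
print(float(coupling(z, s, t, direction=-1)), float(x_back))  # identical values, equal to x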

So basically, if training is True we are doing the inverse pass (from latent space to data), while if training is False we are doing the forward pass (from data to latent space).

What am I getting wrong?

Not getting the same results as the book in the WGAN notebook

There are examples in the book of what the WGAN should generate after 25 epochs of training.
(screenshot: the book's example of faces generated after 25 epochs)

But when I train the model, these are the generated samples at the 25th epoch of training.
(screenshot: my generated samples at the 25th epoch)

I tried changing many hyperparameters (e.g., the learning rate), but I never succeeded in getting a model that generates the type of faces in the book's example, even after 200 training epochs.

It looks similar to issue #13. I had assumed that my failure in the DCGAN chapter to get a model that generates LEGO bricks was due to the GAN training problems explained later in that chapter, but this seems to be a different problem common to both chapters.

Has anyone succeeded in obtaining a good model?

How did you create Jsb16thSeparated.npz?

Hey David,
I ran your script and got this data, and I would love to know how you created it from MIDI/.wav into .npz format. To my understanding, the .npz is produced after creating the embeddings. Looking forward to hearing from you.
Thanks,
Andy
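
Not an answer to how it was built, but a quick way to see what the archive actually contains (a minimal numpy sketch; the array names and shapes depend entirely on how the file was created):

import numpy as np

data = np.load("Jsb16thSeparated.npz", allow_pickle=True)
for name in data.files:
    arr = data[name]
    print(name, arr.shape, arr.dtype)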

Docker image build failing on my Windows machine

I was having a problem building the Docker image on my Windows machine.

304.4 E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libh/libhttp-date-perl/libhttp-date-perl_6.05-1_all.deb 403 connecting to archive.ubuntu.com:80: connecting to 91.189.91.82:80: dial tcp 91.189.91.82:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 91.189.91.82 80]
304.4 E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libs/libsoup2.4/libsoup-gnome2.4-1_2.70.0-1_amd64.deb 403 (same connection error) [IP: 91.189.91.82 80]
304.4 E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libx/libxdamage/libxdamage1_1.1.5-2_amd64.deb 403 (same connection error) [IP: 91.189.91.82 80]
304.4 E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gtk+3.0/libgtk-3-common_3.24.20-0ubuntu1.1_all.deb 403 (same connection error) [IP: 91.189.91.82 80]

I was able to fix it with

RUN apt-get --allow-releaseinfo-change update
before the RUN apt-get update command in Dockerfile.gpu.

Applying CGANs to more than 2 classes

Following the book, I've been trying to apply the conditional WGAN to the Fashion-MNIST dataset with 10 classes, but I'm facing an error. I changed the model accordingly for both the generator and the critic, and instead of one-hot encoding the labels I'm passing them in just as numbers. What changes am I supposed to make? I've tried a few but none work.
And if I try to fit it normally, this error occurs. Does anyone know how to implement it?
(screenshot of the error attached)
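
Not an official fix, but a minimal sketch of the usual way to condition on 10 classes: one-hot encode the integer labels, concatenate the one-hot vector onto the generator's latent input, and broadcast it into extra channels for the critic. The image size, channel count and latent dimension below are assumptions, and the first layers of the generator and critic must be resized to match:

import tensorflow as tf

NUM_CLASSES = 10
IMAGE_SIZE = 32   # assumption: Fashion-MNIST resized/padded to 32x32
Z_DIM = 100       # assumption: latent dimension

labels = tf.constant([3, 7, 0])                               # integer class labels
one_hot = tf.one_hot(labels, depth=NUM_CLASSES)               # (batch, 10) instead of raw numbers

# Generator input: latent noise with the one-hot label appended
z = tf.random.normal((labels.shape[0], Z_DIM))
generator_input = tf.concat([z, one_hot], axis=-1)            # (batch, 110)

# Critic input: one-hot label broadcast into image-sized channels
images = tf.random.normal((labels.shape[0], IMAGE_SIZE, IMAGE_SIZE, 1))
label_channels = tf.reshape(one_hot, (-1, 1, 1, NUM_CLASSES))
label_channels = tf.tile(label_channels, (1, IMAGE_SIZE, IMAGE_SIZE, 1))
critic_input = tf.concat([images, label_channels], axis=-1)   # (batch, 32, 32, 11)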

Lesson No. 3: Variational autoencoder code for CelebA dataset takes too long to load

Specifically, this code:

train_data = utils.image_dataset_from_directory(
    "celeba-dataset/img_align_celeba/img_align_celeba",
    labels=None,
    color_mode="rgb",
    image_size=(64, 64),
    batch_size=128,
    shuffle=True,
    seed=42,
    interpolation="bilinear",
)

takes about 20-30 minutes to load all of the 57,000 images from the 'img_align_celeba' directory. Is everyone facing this issue, or is it just me?

Edit: Attached an image for preview. It took 21 minutes this time, as measured with the time module in Python:

(screenshot attached)

Docker problem

(screenshot of the error attached)

A strange problem: do I need to run some other commands to prepare?

Kaggle 401

Kaggle dataset downloaders return 401 - Unauthorized
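
Not a confirmed diagnosis, but a 401 from Kaggle usually means missing or stale API credentials. Assuming the downloaders go through the kaggle CLI, it reads its token from ~/.kaggle/kaggle.json (generated via "Create New API Token" on your Kaggle account page); regenerating the token, running chmod 600 ~/.kaggle/kaggle.json, and re-running the script is worth a try.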

Suggestion: use Kubernetes with GKE Autopilot instead of VMs to run book examples on a cloud GPU

This repo provides instructions on how to set up a GCP cloud VM instance with a GPU to run the examples.
I would like to recommend taking it further and using GKE Autopilot for GPU workloads instead of VMs.
Some benefits are:

  • GKE Autopilot's pay-per-use model ensures cost efficiency. Applying workloads via kubectl apply is simple, and pod deletion when idle is effortless.
  • Leverage service-based load balancing to expose Jupyter Lab, eliminating the need for port forwarding.
  • Maintenance/upgrades are managed seamlessly by GKE Autopilot, freeing users from routine system upkeep.
  • Adopting Kubernetes, a scalable and industry-standard platform, equips readers with practical experience, setting them ahead of a docker compose on a VM setup.

This is how I deployed the examples to GKE Autopilot:

  1. Build and push the Docker image:
IMAGE=<your_image> # you can also skip this step and use bulankou/gdl2:20230715 that I built
docker build -f ./docker/Dockerfile.gpu -t $IMAGE .
docker push $IMAGE
  2. Create a GKE Autopilot cluster with all default settings.
  3. Apply the following K8s manifest (kubectl apply -f <yaml>). Make sure to update <IMAGE> below. Also note the cloud.google.com/gke-accelerator: "nvidia-tesla-t4" node selector and the autopilot.gke.io/host-port-assignment annotation, which ensure that we pick the right node type and enable host ports on Autopilot.
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    autopilot.gke.io/host-port-assignment: '{"min":6006,"max":8888}'
  labels:
    service: app
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: "nvidia-tesla-t4"
  containers:
    - command: ["/bin/sh", "-c"]
      args: ["jupyter lab --ip 0.0.0.0 --port=8888 --no-browser --allow-root"]
      image: <IMAGE>
      name: app
      ports:
        - containerPort: 8888
          hostPort: 8888
        - containerPort: 6006
          hostPort: 6006
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          cpu: "18"
          memory: "18Gi"
      tty: true
  restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: LoadBalancer
  ports:
    - name: "8888"
      port: 8888
      targetPort: 8888
    - name: "6006"
      port: 6006
      targetPort: 6006
  selector:
    service: app
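
Once the pod is running, the endpoint comes from the LoadBalancer service defined above (a usage note assuming this manifest; the external IP can take a minute to be assigned): run kubectl get service app, then browse to http://<EXTERNAL-IP>:8888 for Jupyter Lab and http://<EXTERNAL-IP>:6006 for TensorBoard.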

Issue with AdamW on Apple M1

There seems to be a problem when trying to run the denoising diffusion models on Apple Silicon. This seems to be the related keras issue: keras-team/tf-keras#176
See also: https://developer.apple.com/forums/thread/729732

I was able to get it running using the legacy.Adam optimizer:

ddm.compile(
    # optimizer=optimizers.experimental.AdamW(
    #     learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY
    # ),
    optimizer=optimizers.legacy.Adam(
        learning_rate=LEARNING_RATE
    ),
    loss=losses.mean_absolute_error,
)

Not sure how this affects the quality of the model, though...

DCGAN training gone wrong

Hi,

When running the first notebook, as-is, from chapter 4, I am getting very odd results.

During training I noticed huge swings in the accuracy/loss of the generator/discriminator, and after about 240 epochs all hell breaks loose, with the discriminator apparently starting to predict all images as fake (fake discriminator accuracy stays at 1, while real discriminator accuracy stays at 0).

I'm trying to wrap my head around what may be happening here. Given that I have not altered the code, the only difference would be the seeds for the random noise used as input to the generator, but I doubt that this could cause such wide differences compared to the results presented in the chapter (the chapter's graphs are quite smooth compared to my run).

It doesn't seem like a case of the discriminator overpowering the generator, even though that appears to happen in some earlier epochs where the discriminator accuracy peaks at 1 (or maybe it is discriminator overpowering, given that it predicts all images produced by the generator as fake). I don't really understand why the discriminator would suddenly become "perfect" in terms of accuracy (while it had about the same accuracy for real/fake during training) and stay there after epoch 240, nor why it swung so widely during training or plateaued at 1 for some epochs before suddenly dropping.

Could there be a "bug" that slipped into the code?
Do you have any intuition as to what may have gone wrong?
And more generally, as a novice practitioner, what should the thought process be for troubleshooting a training run gone bad like this?

Thanks!

(three screenshots of the training metrics attached)

Setting up Google Cloud VM

I've been trying for a while to set up and run the Docker container on a Google Cloud VM. I cannot resolve an issue I keep having at the final step. I can build the Docker image in the VM instance without issues, but whenever I try to run it with docker compose -f docker-compose.gpu.yml up I get the following error message:

[+] Running 2/1
✔ Network generative_deep_learning_2nd_edition_default Created
✔ Container generative_deep_learning_2nd_edition-app-1 Created
Attaching to generative_deep_learning_2nd_edition-app-1

Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

I cannot figure out what might be going wrong. Any suggestions would greatly help!
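
Not a confirmed fix, but the error "could not select device driver "nvidia" with capabilities: [[gpu]]" usually means Docker on the host cannot find the NVIDIA container runtime, i.e. the NVIDIA Container Toolkit is not installed (or Docker was not restarted after installing it). Checking that nvidia-smi works on the VM itself, installing nvidia-container-toolkit per NVIDIA's install guide, and then restarting Docker is the usual remedy; the same applies to the similar report in the next issue below.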

docker issue

I've built the docker image successfully. However, when using docker-compose -f docker-compose-gpu.yml up, it returns: Starting generative_deep_learning_2nd_edition_app_1 ... error

ERROR: for generative_deep_learning_2nd_edition_app_1 Cannot start service app: could not select device driver "nvidia" with capabilities: [[gpu]]

ERROR: for app Cannot start service app: could not select device driver "nvidia" with capabilities: [[gpu]]

Can't create VM instance

I can't create a VM instance in the Google Cloud Console. It says "n1-standard-4 VM instance with nvidia-tesla-t4 accelerator(s) is currently unavailable in the us-east1-c zone." I've tried other zones, same issue.

docker-compose.gpu.yml error upon running

I'm on Windows with an NVIDIA 4070 and seeing the below error when trying to launch Docker with docker-compose.gpu.yml. Any idea how to resolve this?

PS C:\Projects\GenerativeDeepLearning> docker compose -f docker-compose.gpu.yml up
[+] Running 2/0
✔ Network generativedeeplearning_default Created 0.0s
✔ Container generativedeeplearning-app-1 Created 0.1s
Attaching to app-1
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 2, stdout: , stderr: fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7fcd133aad54]

runtime stack:
runtime.throw({0x5286a1?, 0x6d?})
/usr/local/go/src/runtime/panic.go:992 +0x71
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:802 +0x389

goroutine 1 [syscall]:
runtime.cgocall(0x4f48d0, 0xc00017d958)
/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc00017d930 sp=0xc00017d8f8 pc=0x40523c
github.com/NVIDIA/go-nvml/pkg/dl._Cfunc_dlopen(0x9c8820, 0x1)
_cgo_gotypes.go:113 +0x4d fp=0xc00017d958 sp=0xc00017d930 pc=0x4ee78d
github.com/NVIDIA/go-nvml/pkg/dl.(*DynamicLibrary).Open(0xc00017da30)
/go/src/nvidia-container-toolkit/vendor/github.com/NVIDIA/go-nvml/pkg/dl/dl.go:55 +0x74 fp=0xc00017d9d0 sp=0xc00017d958 pc=0x4ee994
gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/info.(*infolib).HasNvml(0xc00012c1e0?)
/go/src/nvidia-container-toolkit/vendor/gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/info/info.go:47 +0x85 fp=0xc00017da68 sp=0xc00017d9d0 pc=0x4eed85
github.com/NVIDIA/nvidia-container-toolkit/internal/info.ResolveAutoMode({0x54f5c8, 0x6333e0}, {0xc000138157?, 0x52974f?})
/go/src/nvidia-container-toolkit/internal/info/auto.go:42 +0x1bb fp=0xc00017db18 sp=0xc00017da68 pc=0x4ef53b
main.doPrestart()
/go/src/nvidia-container-toolkit/cmd/nvidia-container-runtime-hook/main.go:77 +0xdd fp=0xc00017df08 sp=0xc00017db18 pc=0x4f2e7d
main.main()
/go/src/nvidia-container-toolkit/cmd/nvidia-container-runtime-hook/main.go:176 +0x11e fp=0xc00017df80 sp=0xc00017df08 pc=0x4f43de
runtime.main()
/usr/local/go/src/runtime/proc.go:250 +0x212 fp=0xc00017dfe0 sp=0xc00017df80 pc=0x4368d2
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00017dfe8 sp=0xc00017dfe0 pc=0x460981: unknown

"docker compose build" not working on Mac (M1 Max)

I constantly get this error, although I installed pkg-config, added /opt/homebrew and /opt/homebrew/bin to PATH, and set HDF5_DIR=/opt/homebrew/opt:

46.21 Building h5py requires pkg-config unless the HDF5 path is explicitly specified using the environment variable HDF5_DIR. For more information and details, see https://docs.h5py.org/en/stable/build.html#custom-installation
46.21 error: pkg-config probably not installed: FileNotFoundError(2, 'No such file or directory')
46.21 [end of output]
46.21
46.21 note: This error originates from a subprocess, and is likely not a problem with pip.
46.21 ERROR: Failed building wheel for h5py
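
Not a confirmed fix, but the message suggests pip's build step cannot see pkg-config or the HDF5 prefix; note that HDF5_DIR should point at the hdf5 formula's own prefix rather than /opt/homebrew/opt. A sketch of what usually lets the h5py wheel build on Apple Silicon:

brew install pkg-config hdf5
export HDF5_DIR="$(brew --prefix hdf5)"
pip install h5py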

ModuleNotFoundError: No module named 'notebooks'

I am running this on Windows (not in Docker). I got the following error:

  ---------------------------------------------------------------------------
  ModuleNotFoundError                       Traceback (most recent call last)
  Cell In[3], line 5
        2 import matplotlib.pyplot as plt
        4 from tensorflow.keras import layers, models, optimizers, utils, datasets
  ----> 5 from notebooks.utils import display
  
  ModuleNotFoundError: No module named 'notebooks'

when it runs "from notebooks.utils import display".

I tried to run "pip install notebooks", and I got:

ERROR: Could not find a version that satisfies the requirement notebooks (from versions: none)
ERROR: No matching distribution found for notebooks

Searching Google doesn't help. Please help. Thanks!
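
Not an official answer, but notebooks here is not a PyPI package; it refers to the notebooks/ directory of this repository (the one containing the utils module), so the import works when Jupyter is started from the repository root. A minimal workaround sketch if the notebook is launched from somewhere else (the clone path below is hypothetical):

import sys
sys.path.append(r"C:\path\to\Generative_Deep_Learning_2nd_Edition")  # hypothetical clone location
from notebooks.utils import display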

Docker issue (with port)

Hi --

I am trying to build the Docker image on a Mac M1. I think I've done everything correctly, but I get the following error. Any ideas?

WARN[0000] The "JUPYTER_PORT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "JUPYTER_PORT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "TENSORBOARD_PORT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "TENSORBOARD_PORT" variable is not set. Defaulting to a blank string. 
WARN[0000] The "JUPYTER_PORT" variable is not set. Defaulting to a blank string. 
services.app.ports array items[0,1] must be unique
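
Not an official answer, but the "services.app.ports array items[0,1] must be unique" error appears because JUPYTER_PORT and TENSORBOARD_PORT both default to an empty string, so the two port mappings in the compose file end up identical. Assuming the compose file reads these variables from a .env file in the repository root (the variable names come from the warnings above; the values below are just the usual Jupyter and TensorBoard defaults), creating one should clear the error:

JUPYTER_PORT=8888
TENSORBOARD_PORT=6006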

How to call the downloader scripts

Hello,

I am having issues running any notebook that depends on downloaded files (from VA_CalebAFaces to MuseGAN).

It seems that my app/data folder is empty; however, I cannot find any call in the notebooks that downloads the files.

But I do see the downloader scripts in app/scripts.

I guess this is an easy fix, or I might be missing something obvious? :)
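
Not an official answer, but judging from the commands listed in the "Notes on running the notebooks on Windows 11" issue further down this page, the downloader scripts are meant to be run manually before opening the notebooks, for example: bash scripts/download.sh faces (swapping the dataset name as needed).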

"Docker compose" does nothing

Hi,

I'm stuck when following the instructions in the README file:

docker compose -f docker-compose.gpu.yml up
The new 'docker compose' command is currently experimental. To provide feedback or request new features please open issues at https://github.com/docker/compose-cli
services.app.ports array items[0,1] must be unique

... and nothing happens.

What am I missing?

I encounter this issue after following the setup, when I execute the first line of the notebook

Any idea how to solve the issue? I followed the CPU-only instructions.

2023-04-28 17:00:53.581481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-28 17:00:53.767800: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-04-28 17:00:53.775889: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-28 17:00:53.775906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-28 17:00:53.933240: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-04-28 17:00:55.197049: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-28 17:00:55.197129: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-28 17:00:55.197135: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Notes on running the notebooks on Windows 11

Hello,

I would like to list possible necessary steps for those who want to run the notebooks on Windows 11.

You need to install Docker Desktop and follow the instructions.

Everything should run well up until downloading the data; at that point a few extra steps are needed.

In my case, my additional steps were:

1. In the cloned repo directory, run in a terminal: "wsl -l"

You need Ubuntu to be the default, so that it shows:

Ubuntu (Default)
docker-desktop
docker-desktop-data

1a. If you do not have Ubuntu, run "wsl --install".

1b. After Ubuntu has installed, run "wsl -s Ubuntu" to make it the default.

2. Go to Docker Desktop's settings and turn on the option "Add the *.docker.internal names to the host's etc/hosts file (Requires password)".

3. Before downloading the data, be aware that some scripts have DOS rather than UNIX line breaks, which means you will have to convert those scripts' line endings.

One way of doing this is by installing and running dos2unix on the scripts (as explained here: https://stackoverflow.com/questions/11616835/r-command-not-found-bashrc-bash-profile).

So, first run "apt install dos2unix".

And then run, for instance, "dos2unix scripts/downloaders/download_bach_cello_data.sh".

4. You can now download the data by running:

bash scripts/download.sh faces
bash scripts/download.sh bricks
bash scripts/download.sh recipes
bash scripts/download.sh flowers
bash scripts/download.sh wines
bash scripts/download.sh cellosuites
bash scripts/download.sh chorales

I hope this can help anyone running on Windows. I have written this from memory, so there might be a typo or missing detail.

Thank you
