Comments (7)
FYI what works for me:
```bash
#!/bin/bash
# Resolve this script's directory, then its parent (the project root).
export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR" || exit 1
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

# Paths below are relative to PROJECT_DIR.
export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
    2>&1 | tee ~/output.log

# Wait for Enter before exiting (keeps the terminal window open).
read
```
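The script resolves the tokenizer, checkpoint, and input paths relative to PROJECT_DIR, so a wrong path only shows up once Python starts loading. A minimal pre-flight check can fail faster; this is a sketch, and `check_inputs` is a hypothetical helper, not something shipped with lwm.

```bash
#!/bin/bash
# Hypothetical helper (not part of lwm): report every missing path and
# return non-zero if any of them does not exist.
check_inputs() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: call after `cd "$PROJECT_DIR"` and the exports, before python3:
# check_inputs "$llama_tokenizer_path" "$vqgan_checkpoint" "$lwm_checkpoint" "$input_file" || exit 1
```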
But I haven't gotten video to work yet. It probably doesn't accept .mp4 input.
Also, --mesh_dim='!1,-1,32,1' always seems off; it either has to be chosen for your hardware or removed.
I wish the creators had provided minimal runnable examples using the scripts.
from lwm.
.mkv format works for me.
Thanks for sharing, @pseudotensor! I was also wondering whether the .mp4 video format is supported.
Is the .avi video format supported?
I ran into the same problem: it cannot process .mp4 files.
Re: ".mkv format works for me." Would you mind sharing your script? I tried .mkv but still got the same error. Thank you for your help.
The mesh_dim argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then mesh_dim should be 1,1,8,1. The default of 32 might be too high if your machine doesn't have 32 devices.
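The maintainer's rule of thumb can be turned into a small shell helper. This is a sketch under assumptions: `mesh_dim_for_tp` is a hypothetical name, it only covers the pure tensor-parallel layout from the 8-GPU example (every axis except the third fixed at 1), and counting GPUs with `nvidia-smi -L` assumes an NVIDIA machine.

```bash
#!/bin/bash
# Hypothetical helper: build a --mesh_dim value for pure tensor parallelism,
# mirroring the maintainer's "1,1,8,1 for 8 GPUs" example.
mesh_dim_for_tp() {
    local n="$1"
    # Require a positive integer device count.
    if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
        echo "device count must be a positive integer, got '$n'" >&2
        return 1
    fi
    echo "1,1,${n},1"
}

# Usage on an NVIDIA machine (`nvidia-smi -L` prints one line per GPU):
# num_gpus=$(nvidia-smi -L | wc -l | tr -d ' ')
# mesh_dim=$(mesh_dim_for_tp "$num_gpus") || exit 1
# then add --mesh_dim="$mesh_dim" to the python3 -m lwm.vision_chat call.
```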
Regarding supported video files, the code here (Line 84 in 0f441d3) just uses decord to read the video, so any video format that works with decord should work.
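Since decord does the reading, one way to test a particular file (.mp4, .mkv, .avi) is to open it with decord directly before running lwm. A minimal sketch, assuming decord is installed in the same Python environment as lwm; `check_video_with_decord` is a hypothetical helper, and it only checks that decord can open the file and count its frames, not that the rest of the lwm pipeline accepts it.

```bash
#!/bin/bash
# Hypothetical helper: try to open a video with decord and print its frame count.
# Exits non-zero if decord is missing or cannot decode the file.
check_video_with_decord() {
    python3 - "$1" <<'EOF'
import sys
import decord

# VideoReader raises an exception if decord cannot open/decode the file.
vr = decord.VideoReader(sys.argv[1])
print(f"{sys.argv[1]}: {len(vr)} frames")
EOF
}

# Usage:
# check_video_with_decord clip.mp4 && echo "decord can read clip.mp4"
```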