Hi, I'm trying to run run_vision_chat.sh

vision chat error

Minyoung1005 commented on July 20, 2024

vision chat error

from lwm.

Comments (7)

pseudotensor commented on July 20, 2024

FYI what works for me:

#!/bin/bash

# Resolve the script and project directories, then run from the project root.
export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR" || exit 1
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

# Paths to the downloaded checkpoint files and the input image.
export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

# Note: no --mesh_dim flag is passed here.
python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
2>&1 | tee ~/output.log
# Wait for Enter before exiting.
read

But I haven't gotten video to work yet. It probably doesn't accept mp4 input.

Also, the --mesh_dim='!1,-1,32,1' \ line always seems off: it either has to be chosen for your setup or removed.

I wish the creators gave minimal running examples using the scripts.
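
A quick pre-flight check can rule out a wrong path before the model starts loading. The sketch below is only an illustration: the helper name `find_missing` is hypothetical (not part of lwm), and the paths are simply the ones from the script above, assumed to be relative to the directory you run it from.

```python
from pathlib import Path

# Paths copied from the script above; adjust them to your own download location.
REQUIRED_PATHS = {
    "llama_tokenizer_path": "LWM-Chat-1M-Jax/tokenizer.model",
    "vqgan_checkpoint": "LWM-Chat-1M-Jax/vqgan",
    "lwm_checkpoint": "LWM-Chat-1M-Jax/params",
    "input_file": "taylor.jpg",
}


def find_missing(paths, base_dir="."):
    """Return the names whose paths do not exist under base_dir.

    Hypothetical helper for illustration; not part of lwm.
    """
    base = Path(base_dir)
    return [name for name, rel in paths.items() if not (base / rel).exists()]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PATHS)
    for name in missing:
        print(f"missing: {name} -> {REQUIRED_PATHS[name]}")
    if not missing:
        print("all paths found")
```

Run it from the project root before launching the shell script. It only checks that the paths exist, not that the checkpoints are valid.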

mileyan commented on July 20, 2024

.mkv format works for me.
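
If you want to try the same thing with an existing .mp4, one option is to remux it into an .mkv container with ffmpeg. This is a sketch under assumptions, not a confirmed fix: whether the container is what matters to lwm is unverified, reports in this thread are mixed, the helper name `build_remux_command` is hypothetical, and `clip.mp4` is a placeholder file name. The ffmpeg options used (`-i` for the input, `-c copy` to copy streams without re-encoding) are standard.

```python
import subprocess
from pathlib import Path


def build_remux_command(src, dst_suffix=".mkv"):
    """Build an ffmpeg command that copies the streams of src into a new container.

    Hypothetical helper for illustration; the output file sits next to the input.
    """
    src = Path(src)
    dst = src.with_suffix(dst_suffix)
    # -i: input file; -c copy: copy audio/video streams without re-encoding.
    return ["ffmpeg", "-i", str(src), "-c", "copy", str(dst)]


if __name__ == "__main__":
    cmd = build_remux_command("clip.mp4")  # "clip.mp4" is a placeholder name
    print(" ".join(cmd))
    # Uncomment to actually run the conversion (requires ffmpeg on PATH):
    # subprocess.run(cmd, check=True)
```

The resulting .mkv path would then go into input_file in the script.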

Minyoung1005 commented on July 20, 2024

Thanks for sharing, @pseudotensor! I was also wondering whether the .mp4 video file format is unsupported.

cyj95 commented on July 20, 2024

Is the .avi video format supported?

commented on July 20, 2024

I have the same problem. It cannot process .mp4 files.

commented on July 20, 2024

.mkv format works for me.

Would you mind sharing your script? I tried to use .mkv but still got the same error. Thank you for your help.

wilson1yan commented on July 20, 2024

The mesh_dim argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then mesh_dim should be 1,1,8,1. The default of 32 might be too high if your machine doesn't have 32 devices.
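
As a rough illustration of that rule, the sketch below builds the flag value from a device count. Both helper names are hypothetical (not part of lwm). It assumes the third axis is the tensor-parallel one, as in the 1,1,8,1 example, and that a fully specified mesh should multiply out to the number of devices. It does not handle the '!' prefix or -1 entries seen in the default value.

```python
import math


def tensor_parallel_mesh_dim(n_devices):
    """Return a mesh_dim string that places all devices on the third axis.

    Hypothetical helper; assumes the third axis is the tensor-parallel one.
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return f"1,1,{n_devices},1"


def mesh_size(mesh_dim):
    """Product of the axis sizes in a fully specified mesh_dim string."""
    return math.prod(int(part) for part in mesh_dim.split(","))


if __name__ == "__main__":
    n_devices = 8  # on a real machine this could come from jax.device_count()
    mesh_dim = tensor_parallel_mesh_dim(n_devices)
    print(f"--mesh_dim='{mesh_dim}'")
    # Sanity check: the mesh should cover exactly the available devices.
    assert mesh_size(mesh_dim) == n_devices
```

On a single-GPU machine this gives 1,1,1,1. The default value's third axis of 32 would need 32 devices under the same assumption.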

Regarding supported video files, the code here:

LWM/lwm/vision_chat.py

Line 84 in 0f441d3