Comments (7)

pseudotensor commented on July 20, 2024

FYI what works for me:

#!/bin/bash

# Resolve this script's directory and the project root (its parent directory).
export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR" || exit 1
# Make the lwm package importable; avoid a leading ':' when PYTHONPATH is unset.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$PROJECT_DIR"

# Model files from LWM-Chat-1M-Jax, relative to PROJECT_DIR.
export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

# Ask a question about a single image; stdout and stderr also go to ~/output.log.
python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
    2>&1 | tee ~/output.log
# Wait for Enter before exiting so the terminal window stays open.
read

But I haven't gotten video to work yet. It probably doesn't accept .mp4 input.

Also, the --mesh_dim='!1,-1,32,1' argument always seems to be wrong; it either has to be set to match your hardware or removed.

I wish the creators had provided minimal runnable examples using the scripts.

from lwm.

mileyan commented on July 20, 2024

.mkv format works for me.

Minyoung1005 commented on July 20, 2024

Thanks for sharing, @pseudotensor! I was also wondering whether the .mp4 video format is supported.

cyj95 commented on July 20, 2024

Is the .avi video format supported?

(username not shown) commented on July 20, 2024

I have the same problem: it cannot process .mp4 files.

(username not shown) commented on July 20, 2024

> .mkv format works for me.

Would you mind sharing your script? I tried to use .mkv but still got the same error. Thank you for your help.

wilson1yan commented on July 20, 2024

The mesh_dim argument depends on the number of devices you use for inference. For tensor parallelism over 8 GPUs, mesh_dim should be 1,1,8,1. The default of 32 may be too high if your machine has fewer than 32 devices.
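A mesh_dim value can be sanity-checked against the hardware before launching. The sketch below assumes, from the EasyLM-style mesh setup LWM appears to build on rather than from any documented spec, that the comma-separated values must multiply to the total device count, that a single -1 is inferred the way numpy's reshape infers it, and that a leading '!' (as in '!1,-1,32,1') is a flag that does not change the shape. resolve_mesh_dim is a hypothetical helper, not part of lwm.

```python
import math


def resolve_mesh_dim(mesh_dim: str, n_devices: int) -> tuple[int, ...]:
    """Resolve a mesh_dim string such as '1,1,8,1' or '!1,-1,32,1'.

    Assumptions (not taken from lwm itself): the product of all axes must equal
    n_devices, at most one axis may be -1 (inferred, like numpy reshape), and a
    leading '!' is a flag that does not affect the shape.
    """
    dims = [int(x) for x in mesh_dim.lstrip("!").split(",")]
    if dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    if any(d == 0 or d < -1 for d in dims):
        raise ValueError("axis sizes must be positive or -1")
    known = math.prod(d for d in dims if d != -1)
    if -1 in dims:
        # The inferred axis only exists if the fixed axes divide the device count.
        if n_devices % known != 0:
            raise ValueError(f"{mesh_dim!r} cannot be laid out on {n_devices} devices")
        dims = [n_devices // known if d == -1 else d for d in dims]
    elif known != n_devices:
        raise ValueError(f"{mesh_dim!r} uses {known} devices, but {n_devices} are available")
    return tuple(dims)
```

On an 8-GPU machine, resolve_mesh_dim('1,1,8,1', 8) returns (1, 1, 8, 1), while resolve_mesh_dim('!1,-1,32,1', 8) raises, consistent with the point that 32 is too high for fewer than 32 devices. In a live session the device count can come from jax.device_count().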

Regarding supported video files, the code here:

vr = decord.VideoReader(f, ctx=decord.cpu(0))

just uses decord to read the video, so any video format that works for decord should work.
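Since format support comes down to decord, the .mp4/.mkv/.avi questions can be answered for a specific file by asking decord directly. The sketch below uses only existing decord calls (VideoReader, len, indexing, get_batch(...).asnumpy()); the helper names are hypothetical, and the evenly spaced sampling is an assumption about how a limit like --max_n_frames=8 might be applied, not a copy of lwm's code.

```python
def evenly_spaced_indices(n_total: int, n_frames: int) -> list[int]:
    """Pick up to n_frames frame indices spread evenly over [0, n_total - 1]."""
    if n_total <= 0 or n_frames <= 0:
        return []
    if n_total <= n_frames:
        return list(range(n_total))
    if n_frames == 1:
        return [0]
    # Integer arithmetic keeps the first index at 0 and the last at n_total - 1.
    return [i * (n_total - 1) // (n_frames - 1) for i in range(n_frames)]


def can_decode(path: str) -> bool:
    """Return True if decord can open the file and decode its first frame."""
    import decord  # imported lazily so the helper above works without decord

    try:
        vr = decord.VideoReader(path, ctx=decord.cpu(0))
        if len(vr) == 0:
            return False
        vr[0]  # force decoding of one frame, not just parsing the container
        return True
    except Exception:  # treat any decord failure as "not decodable"
        return False


def load_frames(path: str, max_n_frames: int = 8):
    """Decode up to max_n_frames evenly spaced frames as a (T, H, W, 3) uint8 array."""
    import decord

    vr = decord.VideoReader(path, ctx=decord.cpu(0))
    indices = evenly_spaced_indices(len(vr), max_n_frames)
    return vr.get_batch(indices).asnumpy()
```

Running can_decode on the exact file that fails (for example a .mp4 or .avi) separates "decord cannot read this file" from problems elsewhere in the vision_chat pipeline.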
