Giter Club home page Giter Club logo

easy-gpt4o's Introduction

easy-gpt4o

Blog Link: Easy-GPT4o - reproduce GPT-4o in less than 200 lines

Easy-GPT4O opensource version: use OpenAI older API implements GPT-4o in less than 200 lines of code.

Motivation

Why I start this project? This is just a toy project and a simple demo. I want to prove some ideas in this project:

  • Developers can build their own GPT-4o using existing APIs. By leveraging available tools, developers can easily access the capabilities of advanced models.
  • End-to-end models provide low latency but limited customization. This project explores the trade-off between latency and customization, highlighting the benefits and limitations of each approach.
  • The combined power of multiple models can outperform a single multimodal model. This project demonstrates the effectiveness of a collaborative approach, leveraging the collective intelligence of various models to achieve superior results.

Prerequisites

  • Python 3.6 or higher
  • OpenAI Python package (openai)
  • FFmpeg (for audio extraction)

Installation

  1. Clone the repository:

    git clone https://github.com/Chivier/easy-gpt4o
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Download and install FFmpeg from the official website: https://ffmpeg.org/

Usage

# Set your own openai api
export OPENAI_API_KEY=xxxxxxx
python main.py input_video.mp4 output_audio.mp3

Replace input_video.mp4 with the path to your input video file, and output_audio.mp3 with the desired path to save the output audio file.

How to make it happen

image
  • Extracts audio from a video file
  • Transcribes the audio using OpenAI Whisper API
  • Generates image descriptions for key frames in the video using OpenAI GPT-4 Turbo API
  • Combines the audio transcription and image descriptions into a comprehensive response
  • Converts the response to speech using OpenAI TTS API

Demo

Demo 1

a.mp4
a1.mov
a2.mov

Demo 2

b.mp4
b.mov

TODO

  • Open-source Model Replace OpenAI API
  • Streaming video processing
  • Use RAG store long period memory

easy-gpt4o's People

Contributors

chivier avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.