Pickpod

Integrated tools that transcribe internet audio to text, extract unpopular views, and pick podcasts for you.

Pickpod helps you build your private wiki efficiently.

This repository contains:

  1. A Python package that makes it easy to run the specified tasks.

  2. A Streamlit app that provides a web UI to manage your podcast library.

  3. Several usage examples of the package that run complete tasks on target audio.

Welcome to our commercial deployment, Pickpod, implemented in Java with a microservice architecture.

Compared to the personal open-source prototype in this repository, the commercial version offers stronger performance and more stable service.

Background

The goals for Pickpod are:

  1. High-quality integration with yt-dlp, faster-whisper, and pyannote-audio, so that users can quickly obtain a transcript of the audio by simply providing a link or a local file.

  2. Convenient use of the LISTEN NOTES Podcast API and the Claude API. Once the necessary settings are in place and a task is created, Pickpod regularly fetches the list of podcasts the user is interested in, according to the specified release period, so that transcription can be completed in batches. Pickpod then picks podcasts by evaluating the extracted keywords, summaries, and views. Users can review and modify the recommendations, and the podcast ranking updates shortly afterwards.

  3. Rapid deployment for local environments, so that when the user launches the project, all features are easily accessible in the browser.
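
The selection step in goal 2 can be pictured with a small sketch. This is not Pickpod's actual scoring logic, which this README does not describe: the `Episode` class, the `score` and `rank` functions, and the keyword-overlap rule are all hypothetical, chosen only to show how extracted keywords could drive a ranking that changes when the user edits their interests.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical stand-in for a transcribed podcast episode."""
    title: str
    keywords: set[str] = field(default_factory=set)


def score(episode: Episode, interests: set[str]) -> float:
    """Fraction of the user's interests that appear in the episode keywords."""
    if not interests:
        return 0.0
    return len(episode.keywords & interests) / len(interests)


def rank(episodes: list[Episode], interests: set[str]) -> list[Episode]:
    """Sort episodes from most to least relevant; ties keep their input order."""
    return sorted(episodes, key=lambda e: score(e, interests), reverse=True)


# Made-up sample data for illustration only.
episodes = [
    Episode("Cooking basics", {"food", "kitchen"}),
    Episode("LLM research update", {"llm", "agents", "research"}),
    Episode("AI and cooking robots", {"food", "agents"}),
]
ranked = rank(episodes, {"llm", "agents"})
print([e.title for e in ranked])
```

Calling `rank` again with a modified interest set reorders the same episodes, which mirrors the idea of the ranking updating after the user adjusts a recommendation.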

Install

Because yt-dlp strongly recommends ffmpeg and ffprobe, you need to install the ffmpeg binary on your system before installing Pickpod.

You can refer to the installation method provided by pydub, or go to the ffmpeg download page and ffmpeg compilation guide for more.
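
Before installing Pickpod, it can help to confirm that both binaries are actually on your PATH. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is made up for illustration.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required binaries that are not on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available")
```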

Moreover, please see the note in pyannote-audio about obtaining a Hugging Face access token for more information on using speaker-diarization.

If you need to filter the list of podcasts to be batch transcribed based on customized rules, or use an LLM to analyze the transcribed text, please refer to the API documentation provided by Listen Notes and Anthropic, respectively, to obtain the necessary access keys.
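
One common way to keep these keys out of your source files is to read them from environment variables. The sketch below assumes nothing about Pickpod itself: the variable names in `KEY_NAMES` and the helper `load_keys` are hypothetical, so pick whatever names suit your setup.

```python
import os


def load_keys(names, environ=os.environ):
    """Read access keys from the environment; report which are unset or empty."""
    found = {name: environ[name] for name in names if environ.get(name)}
    missing = [name for name in names if name not in found]
    return found, missing


# Hypothetical variable names, not defined by Pickpod.
KEY_NAMES = ["HUGGING_FACE_KEY", "LISTEN_NOTES_KEY", "CLAUDE_KEY"]

keys, missing = load_keys(KEY_NAMES)
for name in missing:
    print(f"{name} is not set; features that need it will be unavailable")
```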

❗️Warning

According to our experiments, the latest branch of pyannote-audio, which uses torch>=2.0.0, may fail to detect the GPU and run only on the CPU, so Pickpod requires pyannote.audio==2.1.1 with torch==1.13.1.

Because Pickpod strictly pins the versions of the Python packages it uses, installing pyannote-audio or other packages may automatically resolve conflicts by removing packages you installed earlier, such as torch>=2.0.0. To avoid unnecessary conflicts or damage to your environment, we strongly recommend installing Pickpod in a brand-new Python environment or a Python virtual environment.
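
To see whether an existing environment already matches these pins, and whether you are inside a virtual environment, a check along the following lines may help. It is a sketch using only the standard library; the helper names `in_virtual_env` and `check_pins` are made up, and the pinned versions are copied from the warning above.

```python
import sys
from importlib import metadata

# Pins taken from the warning above.
PINNED = {"pyannote.audio": "2.1.1", "torch": "1.13.1"}


def in_virtual_env() -> bool:
    """True inside a venv/virtualenv; conda environments are not detected."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_pins(pinned=PINNED, get_version=metadata.version):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu117" when comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


print("virtual environment:", in_virtual_env())
for name, (installed, ok) in check_pins().items():
    print(f"{name}: installed={installed} matches_pin={ok}")
```

A package reported as `installed=None` is simply absent, which is the expected state in a fresh environment.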

Python

You don't need this source code if you just want to use the package. Just run:

$ pip install --upgrade pickpod

If you want to modify the package, install from source with:

$ pip install ./pickpod

If you want to run the Streamlit app that provides a web UI, install its requirements from source and start it with:

$ pip install -r ./pickpod/app/requirements.txt
$ streamlit run ./pickpod/app/Home.py --server.port 8051

Then visit http://127.0.0.1:8051 in your local browser.
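
If the page does not load, a quick check that something is answering on that port can narrow down the problem. This is a minimal sketch with the standard library only; the helper name `app_is_up` is made up, and the URL and port are the ones used above.

```python
import urllib.error
import urllib.request


def app_is_up(url="http://127.0.0.1:8051", timeout=2.0) -> bool:
    """Return True if an HTTP server answers the URL with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


print("Streamlit app reachable:", app_is_up())
```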

Installation in a typical environment

We chose nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 as a typical system environment for a trial installation of Pickpod. The Docker image has the following base configuration:

$ python3 -V

  Python 3.10.12


$ nvidia-smi

  Tue Aug 15 08:06:56 2023
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |                               |                      |               MIG M. |
  |===============================+======================+======================|
  |   0  NVIDIA GeForce ...  On   | 00000000:65:00.0 Off |                  N/A |
  |  0%   43C    P8    23W / 370W |   1481MiB / 24576MiB |      0%      Default |
  |                               |                      |                  N/A |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                                  |
  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
  |        ID   ID                                                   Usage      |
  |=============================================================================|
  +-----------------------------------------------------------------------------+

First, we update the package index and upgrade the installed packages, then install ffmpeg, python3-pip, and other essential tools.

$ sudo apt update && sudo apt upgrade -y
$ sudo apt-get -y install cmake libsndfile1 ffmpeg python3-pip

We can verify if ffmpeg is installed successfully in the following way:

$ ffmpeg -version

  ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100

After downloading the source code and installing it with pip, which runs setup.py, we can import Pickpod in Python.

$ git clone https://github.com/shixiangcap/pickpod.git
$ pip install ./pickpod

Usage

Run a Pickpod task on internet audio

from pickpod.config import TaskConfig
from pickpod.doc import AudioDocument
from pickpod.task import PickpodTask

HUGGING_FACE_KEY = "YOUR_HUGGING_FACE_KEY"

# For example: https://www.youtube.com/watch?v=xxxxxxxxxxx
audio_url = "YOUR_AUDIO_URL_ON_INTERNET"

# Set the audio information
audio_doc = AudioDocument(audio_url=audio_url)
# Configure the Pickpod task
task_config = TaskConfig(key_hugging_face=HUGGING_FACE_KEY, pipeline=True)
# Initialize the Pickpod task
pickpod_task = PickpodTask(audio_doc, task_config)
# Start the Pickpod task
pickpod_task.pickpod_with_url()
# Print the result of the Pickpod task
print(pickpod_task.audio_doc.__dict__)
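
Printing `__dict__` directly can be hard to read, especially for long transcripts or non-ASCII titles. As a sketch, the attributes can be rendered as indented JSON instead. The helper `to_pretty_json` and the stand-in `demo` object with its fields are hypothetical; in practice you would pass `pickpod_task.audio_doc`, and any attribute that JSON cannot represent is shown via `str()`.

```python
import json
from types import SimpleNamespace


def to_pretty_json(obj) -> str:
    """Serialize an object's attributes; fall back to str() for other types."""
    return json.dumps(vars(obj), ensure_ascii=False, indent=2, default=str)


# Stand-in object with made-up fields; replace with pickpod_task.audio_doc.
demo = SimpleNamespace(title="示例", duration=12.5, tags={"a"})
print(to_pretty_json(demo))
```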

Run a Pickpod task on a local file

from pickpod.config import TaskConfig
from pickpod.doc import AudioDocument
from pickpod.task import PickpodTask

HUGGING_FACE_KEY = "YOUR_HUGGING_FACE_KEY"

# For example: xxxxxxxxxxx.m4a
audio_path = "YOUR_LOCAL_FILE_PATH"

# Set the audio information
audio_doc = AudioDocument(audio_path=audio_path)
# Configure the Pickpod task
task_config = TaskConfig(key_hugging_face=HUGGING_FACE_KEY, pipeline=False)
# Initialize the Pickpod task
pickpod_task = PickpodTask(audio_doc, task_config)
# Start the Pickpod task
pickpod_task.pickpod_with_local()
# Save the result of the Pickpod task
pickpod_task.audio_doc.save_as_json()
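
The same calls can be wrapped in a loop to process a whole folder of local files. This is a sketch under stated assumptions: the helpers `find_audio_files` and `transcribe_folder` and the suffix set are made up for illustration, and the Pickpod calls are exactly the ones shown above, with no claim about which audio formats Pickpod accepts.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".m4a", ".mp3", ".wav"}  # assumed set; adjust as needed


def find_audio_files(folder):
    """Return audio files directly inside `folder`, in sorted path order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def transcribe_folder(folder, hugging_face_key):
    """Run one local Pickpod task per audio file and save each result."""
    # Imported here so that file discovery works without Pickpod installed.
    from pickpod.config import TaskConfig
    from pickpod.doc import AudioDocument
    from pickpod.task import PickpodTask

    for path in find_audio_files(folder):
        audio_doc = AudioDocument(audio_path=str(path))
        task_config = TaskConfig(key_hugging_face=hugging_face_key, pipeline=False)
        task = PickpodTask(audio_doc, task_config)
        task.pickpod_with_local()
        task.audio_doc.save_as_json()
```

A call such as `transcribe_folder("./audio", "YOUR_HUGGING_FACE_KEY")` would then process every matching file in turn.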

Examples

Modification of user configuration and task options

A Pickpod task returning results continuously during execution

A complete transcription result of an audio file

For the YouTube video Introducing GPT-4, Pickpod produces the JSON file afeb5810-25ee-426d-aa88-7b58484d4c6f.json

For the 小宇宙 (Xiaoyuzhou) podcast EP 35. ICML现场对话AI研究员符尧:亲历AI诸神之战,解读LLM前沿研究,Llama 2,AI Agents (an on-site ICML conversation with AI researcher Fu Yao on the AI "battle of the gods", frontier LLM research, Llama 2, and AI Agents), Pickpod produces the JSON file 93aa3140-300d-4af6-9d9c-2c41e9095821.json
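
To get a first look at such a result file without assuming anything about its schema, you can load it and list its top-level fields by type. The helper `summarize_result` below is a hypothetical name, and the sketch makes no claim about which fields Pickpod writes.

```python
import json
from pathlib import Path


def summarize_result(json_path):
    """Load a saved result and describe its top-level fields by type."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Example call, assuming the file exists in the working directory:
# print(summarize_result("afeb5810-25ee-426d-aa88-7b58484d4c6f.json"))
```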

Related Efforts

  • yt-dlp - A youtube-dl fork with additional features and fixes.

  • faster-whisper - Faster Whisper transcription with CTranslate2.

  • pyannote-audio - Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding.

Maintainers

@shixiangcap

Contributing

Feel free to dive in! Open an issue or submit PRs.

License

MIT © shixiangcap

Contributors

edwardoll, yzhang1918
