create-speech-to-text-pipeline / pipeline

A deployable tool that sends and receives text and audio files to and from a data lake, applies transformations in a distributed manner, and loads the results into a warehouse in a format suitable for training a speech-to-text model.

License: MIT License

Languages: Jupyter Notebook 99.61%, HTML 0.04%, CSS 0.01%, JavaScript 0.12%, Python 0.21%, Shell 0.01%, Dockerfile 0.01%
Topics: apache-airflow, apache-kafka, apache-spark, kafka-js, kafka-python, pyspark, reactjs, amazon-msk, amazon-s3-storage

Speech-to-Text Data Collection

Data Capture Pipeline

Pipeline Diagram

Directory Structure

.
├── airflow
│   ├── dags
│   │   ├── extract_load.py
│   │   └── scripts
│   │       ├── dataloader.py
│   │       ├── db_connection.py
│   │       ├── __init__.py
│   │       └── schema
│   │           └── amharicnews.sql
│   ├── data
│   │   └── AmharicNewsDataset.csv
│   ├── docker-compose.yaml
│   └── logs
│       └── scheduler
│           └── latest -> /opt/airflow/logs/scheduler/2022-10-05
├── backend
│   └── dummy.txt
├── frontend
│   ├── dummy.txt
│   ├── frontend
│   │   ├── package.json
│   │   ├── package-lock.json
│   │   ├── public
│   │   │   ├── favicon.ico
│   │   │   ├── index.html
│   │   │   ├── logo192.png
│   │   │   ├── logo512.png
│   │   │   ├── manifest.json
│   │   │   └── robots.txt
│   │   ├── README.md
│   │   └── src
│   │       ├── App.css
│   │       ├── App.js
│   │       ├── App.test.js
│   │       ├── index.css
│   │       ├── index.js
│   │       ├── logo.svg
│   │       ├── reportWebVitals.js
│   │       └── setupTests.js
│   └── proto.png
├── img
│   ├── logo.png
│   └── pipelineDiagram.png
├── LICENSE
├── logging
│   └── dummy.txt
├── notebook
│   └── Amharic_news_Classification.ipynb
├── README.md
├── requirements.txt
├── screenshots
│   ├── airflowscreenshoot.png
│   └── design diagram.png
└── testing
    ├── dummy.txt
    └── test_dataloading.py

17 directories, 39 files
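
The tree lists a data loader (airflow/dags/scripts/dataloader.py), a schema file (schema/amharicnews.sql), and a CSV dataset (data/AmharicNewsDataset.csv), but their contents are not shown here. As a rough illustration of what an extract-and-load step of this kind might look like, the sketch below reads CSV rows and inserts them into a table using only the Python standard library. The table name, the column names (headline, category, article), and the use of SQLite are all assumptions made for the example, not the project's actual schema or database.

```python
import csv
import io
import sqlite3

# Hypothetical schema: the project's real columns are defined in
# schema/amharicnews.sql, which is not shown in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS news (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    headline TEXT NOT NULL,
    category TEXT,
    article TEXT
)
"""


def load_csv(conn, csv_file):
    """Create the table if needed, insert every CSV row, return the row count."""
    conn.execute(SCHEMA)
    reader = csv.DictReader(csv_file)
    rows = [(r["headline"], r["category"], r["article"]) for r in reader]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO news (headline, category, article) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# In-memory stand-ins for the CSV file and the target database.
sample = io.StringIO(
    "headline,category,article\n"
    "h1,sports,body one\n"
    "h2,politics,body two\n"
)
conn = sqlite3.connect(":memory:")
inserted = load_csv(conn, sample)
total = conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]
print(inserted, total)
```

To try it against a real file, pass an open file handle instead of the in-memory sample, for example open("airflow/data/AmharicNewsDataset.csv", newline="", encoding="utf-8"), after adjusting the assumed column names to match the file's header.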

Run Locally

Clone the project

  git clone https://github.com/create-speech-to-text-pipeline/pipeline

Go to the project directory

  cd pipeline

Install dependencies

  pip3 install -r requirements.txt

Set up pipeline

  python3 setup.py

Screenshots

App Screenshot

Authors
