create-speech-to-text-pipeline / pipeline

A deployable tool that sends and receives text and audio files to and from a data lake, applies transformations in a distributed manner, and loads the results into a warehouse in a format suitable for training a speech-to-text model

License: MIT License

Languages: Jupyter Notebook 99.61%, HTML 0.04%, CSS 0.01%, JavaScript 0.12%, Python 0.21%, Shell 0.01%, Dockerfile 0.01%
Topics: apache-airflow, apache-kafka, apache-spark, kafka-js, kafka-python, pyspark, reactjs, amazon-msk, amazon-s3-storage

pipeline's People

Contributors

akrobi, haylemicheal, kaydeejr, mohammedesamaldin, nahomhmichael, yonamg

pipeline's Issues

Create a JavaScript tag

Create a Kafka cluster

  • Following "Installing a Kafka Cluster and Creating a Topic" (Hands-on Labs, A Cloud Guru), set up a cluster on your assigned AWS machine.

  • Your cluster will be responsible for creating a Delta Lake: an S3 bucket that stores the Spark-transformed streaming data produced when users read the texts you showed them. Hint: you will write code that generates an ID for a randomly selected text and its audio equivalent, receives an ID from an API, and sends back as JSON the ID + audio to Kafka like URL
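
The hint above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the project's actual code: the names `CORPUS`, `pick_text`, and `build_audio_message` are hypothetical, and base64-encoding the audio inside the JSON body is an assumption the issue does not specify. Delivering the message to Kafka is deliberately left out.

```python
import base64
import json
import random
import uuid

# Hypothetical stand-in for the text corpus shown to users.
CORPUS = [
    "first sample sentence",
    "second sample sentence",
    "third sample sentence",
]


def pick_text(corpus, rng=random):
    """Pick a random text and attach a freshly generated ID to it."""
    text = rng.choice(corpus)
    return {"id": str(uuid.uuid4()), "text": text}


def build_audio_message(text_id, audio_bytes):
    """Pair a text ID with its recorded audio and serialize the pair as JSON.

    Raw bytes are not valid JSON, so the audio is base64-encoded here
    (an assumption; the issue does not say how audio should be encoded).
    """
    payload = {
        "id": text_id,
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return json.dumps(payload)


if __name__ == "__main__":
    item = pick_text(CORPUS)
    # Placeholder bytes standing in for a real recording.
    message = build_audio_message(item["id"], b"\x00\x01fake-audio")
    print(message)
```

A consumer can recover the original bytes with `base64.b64decode` after parsing the JSON.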

Planning and design

  • Build or simulate a Kafka event source for the text corpus. Recommended reading: "Breaking News: Everything Is An Event! (Streams, Kafka And You)" (florimond.dev).

  • Develop an overview of your approach and document it. Explain why you chose this approach and these tools, and how the approach will provide a good data source for the client's speech-to-text ML engine. Explain the purpose of each tool; you should be able to defend each choice if asked why it was used instead of simple Python code.
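
One way to simulate the event source from the first bullet, without a running broker, is a generator that turns corpus entries into serialized events. The sketch below is an assumption-laden illustration: the function name `simulate_text_events` and the event fields (`event_id`, `timestamp`, `text`) are hypothetical and not taken from the project.

```python
import itertools
import json
import time
import uuid


def simulate_text_events(corpus, delay_seconds=0.0):
    """Yield one JSON-encoded event per corpus entry, cycling endlessly.

    Each event is UTF-8 bytes, the form a Kafka producer typically sends.
    """
    for text in itertools.cycle(corpus):
        event = {
            "event_id": str(uuid.uuid4()),  # unique per emitted event
            "timestamp": time.time(),       # seconds since the epoch
            "text": text,
        }
        yield json.dumps(event).encode("utf-8")
        if delay_seconds:
            time.sleep(delay_seconds)  # optional pacing between events


if __name__ == "__main__":
    sample = ["hello world", "good morning"]
    # Take only three events from the endless stream.
    for raw in itertools.islice(simulate_text_events(sample), 3):
        print(raw)
```

Against a real cluster, each yielded value could be handed to a producer, for example kafka-python's `KafkaProducer.send(topic, value=raw)`; the topic name would be whatever the cluster setup defines.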

EDA

A Jupyter notebook that illustrates your data exploration with professional plots: readable axis labels, titles, and legends, and a good choice of colors.

Backend

Prepare API endpoints for Kafka using Flask.
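
A minimal sketch of such an endpoint is shown below. It is not the project's implementation: the `/audio` route, the `audio-submissions` topic name, and the `create_app` and `InMemoryProducer` names are all hypothetical. The app accepts any object with a `send(topic, value)` method, so a kafka-python `KafkaProducer` configured with a JSON `value_serializer` could be passed in place of the in-memory stand-in.

```python
from flask import Flask, jsonify, request


def create_app(producer, topic="audio-submissions"):
    """Build a Flask app that forwards posted JSON to `producer`.

    `producer` is any object exposing send(topic, value); the topic name
    is a placeholder chosen for this sketch.
    """
    app = Flask(__name__)

    @app.route("/audio", methods=["POST"])
    def post_audio():
        # silent=True returns None instead of raising on malformed JSON.
        body = request.get_json(silent=True)
        if not body or "id" not in body or "audio" not in body:
            return jsonify({"error": "expected JSON with 'id' and 'audio'"}), 400
        producer.send(topic, body)
        # 202 Accepted: the message is queued, not yet processed.
        return jsonify({"status": "queued", "id": body["id"]}), 202

    return app


class InMemoryProducer:
    """Stand-in that records messages instead of contacting a broker."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


if __name__ == "__main__":
    # Runs the development server with the in-memory stand-in.
    create_app(InMemoryProducer()).run(port=5000)
```

Injecting the producer keeps the route testable with Flask's `test_client()` without a live Kafka cluster.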
