
podcast-semanticsearch

Semantic search on podcast transcripts using Weaviate

Dataset: 300 Podcast transcripts from Changelog
Vectorization module: sentence-transformers/msmarco-distilroberta-base-v2

Set-up Guide

  1. Set up Weaviate: docker-compose up -d*
  2. Install the Weaviate client: pip install weaviate_client==3.2.2
  3. Import the data: python3 import.py**
  4. Query the data: open console.semi.technology in Chrome or Safari and connect to http://localhost:9999. Click on Query Module to start querying with GraphQL.

*If you cannot connect, change port 9999 in docker-compose.yml and import.py to a different value (such as 8888).
**The import could take up to 3 hours 🙂
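
Before importing or querying, it can help to confirm that the container is actually reachable on the chosen port. The following is a minimal sketch using only the Python standard library. It relies on Weaviate's readiness endpoint, /v1/.well-known/ready, which this README does not mention; the helper names ready_url and weaviate_is_ready are made up for illustration.

```python
import urllib.request


def ready_url(base_url):
    """Return the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def weaviate_is_ready(base_url, timeout=5):
    """Return True if the instance answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


# 9999 is the port from the guide; 8888 is the fallback suggested in the footnote.
for port in (9999, 8888):
    print(ready_url("http://localhost:%d" % port))
```

Calling weaviate_is_ready("http://localhost:9999") after docker-compose up -d should return True once the container has finished starting. If it keeps returning False, that is a hint to try the port change described above.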

Example Queries:

Suppose we want to listen to some Changelog episodes that discuss GraphQL. We can list the matching episode titles (and transcripts too) via nearText for the concept "Episode about graphql":

[Screenshot of the query, 2022-03-29]

The Changelog #255 is Why is GraphQL so cool?
The Changelog #297 is Prisma and the GraphQL data layer
The Changelog #316 is REST easy, GraphQL is here
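
The query behind these results survives only as a screenshot, so here is a rough sketch of how such a nearText query might be assembled as a GraphQL string. The class name Podcast and the property title are assumptions, since the schema created by import.py is not shown here; the limit of 3 simply matches the number of results listed.

```python
import json


def near_text_query(class_name, concepts, properties, limit=3):
    """Build a Weaviate GraphQL Get query that ranks objects by nearText."""
    # json.dumps produces a double-quoted, escaped list that is also valid GraphQL.
    concepts_literal = json.dumps(concepts)
    fields = " ".join(properties)
    return "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }" % (
        class_name,
        concepts_literal,
        limit,
        fields,
    )


# "Podcast" and "title" are hypothetical names; use the ones defined in import.py.
query = near_text_query("Podcast", ["Episode about graphql"], ["title"])
print(query)
```

The printed string can be pasted into the Query Module of the console, or sent to the instance programmatically.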

Well, that was quite simple. In fact, an ordinary podcast search engine could have returned the same results.
So how about listing some episodes about web development, but in the context of Python rather than JavaScript?
In addition to nearText for the concept "Episode about web development", we also add the moveTo (for "python") and moveAwayFrom (for "javascript") arguments:

[Screenshot of the query]

The Changelog #301 is Python at Microsoft
The Changelog #229 is Python, Django, and Channels
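
As a sketch of how the moveTo and moveAwayFrom arguments fit into the nearText filter, the snippet below builds the corresponding GraphQL string. Again, Podcast and title are assumed names, and the force value of 0.9 is an illustrative guess; the value used in the original screenshot is not recoverable.

```python
import json


def move_clause(name, concepts, force):
    """Render a moveTo or moveAwayFrom clause for a nearText filter."""
    return "%s: {concepts: %s, force: %s}" % (name, json.dumps(concepts), force)


def near_text_with_moves(class_name, concepts, move_to, move_away_from,
                         properties, force=0.9, limit=2):
    """Build a nearText query that is pulled toward some concepts and pushed away from others."""
    near_text = ", ".join([
        "concepts: %s" % json.dumps(concepts),
        move_clause("moveTo", move_to, force),
        move_clause("moveAwayFrom", move_away_from, force),
    ])
    return "{ Get { %s(nearText: {%s}, limit: %d) { %s } } }" % (
        class_name,
        near_text,
        limit,
        " ".join(properties),
    )


# "Podcast", "title", and force=0.9 are assumptions made for illustration.
query = near_text_with_moves(
    "Podcast",
    ["Episode about web development"],
    move_to=["python"],
    move_away_from=["javascript"],
    properties=["title"],
)
print(query)
```

The force parameter controls how strongly the search vector is shifted, so lowering it should bring back more general web development episodes.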

Let's say that listening to the GraphQL and Python episodes has inspired us to create a Machine Learning startup. We would now like to listen to CEOs and founders, but from the field of Machine Learning or Data Science rather than vanilla Web Development:

[Screenshot of the query: aiCeo]

The Practical AI #149 is Trends in data labeling (With CEO of Label Studio)
The Changelog #305 is Putting AI in a box at MachineBox (With founders of MachineBox)
The Practical AI #134 is Apache TVM and OctoML (With CEO and co-founder of OctoML)
The Practical AI #148 is Stellar inference speed via AutoNAS (With CEO and co-founder of Deci)
The Practical AI #141 is Towards stability and robustness (With CTO of BeyondMinds)
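
The same queries can be run without the console by posting them to the instance's /v1/graphql endpoint. The sketch below uses only the standard library and assumes the local URL from the setup guide; run_graphql and extract_titles are hypothetical helper names, Podcast and title are assumed schema names, and the sample dictionary is hand-written to mirror the shape of a Weaviate Get response rather than captured from a real run.

```python
import json
import urllib.request


def run_graphql(base_url, query, timeout=30):
    """POST a GraphQL query to Weaviate and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_titles(result, class_name="Podcast", field="title"):
    """Pull one field out of each object under data.Get.<class_name>."""
    data = result.get("data") or {}
    objects = (data.get("Get") or {}).get(class_name) or []
    return [obj[field] for obj in objects if field in obj]


# Hand-written sample in the response shape, using two titles from the list above.
sample = {
    "data": {
        "Get": {
            "Podcast": [
                {"title": "Trends in data labeling"},
                {"title": "Apache TVM and OctoML"},
            ]
        }
    }
}
print(extract_titles(sample))
```

With a running instance, extract_titles(run_graphql("http://localhost:9999", query)) would return the titles for any query string built for the assumed schema.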

Contributors

pkdyn
