Giter Club home page Giter Club logo

embedding_search's Introduction

README

Motivation

I am faszinated by radioWissen podcasts from BR2. They provide a wide range of topics: History, literature, music, mythology, religions, philosophy, psychology, nature, biology, economy and politics. Currently (2019-10-14) there are 2068 podcasts online. Therefore it is sometimes hard to find podcasts that I want to listen to based on quite specific query. Since I like the idea of embeddings to represent content, I have written a function to inspire me what I can listen to next.

The embeddings are based on feed-forward Neural-Net Language Models with 128-dimensional embedding vectors from tfhub.dev.

Usage of radioWissen search function (radioWissen usage)

The idea is to get an inspiration which podcasts -- with similar content -- are available.

Find top_n closest podcasts to search_string.

$ python quick_search_radiowissen.py --search_string="Klassizismus" --top_n=15
Searching for best matches...
                                                    Klassizismus
Gottfried Semper - Architekt und Revolutionär           0.731763
Hieronymus Bosch - Im Lustgarten der Dämonen            0.691967
Humanismus, Renaissance, Reformation - Aufbruch...      0.679882
Béla Bartók- Meister der Moderne                        0.666955
Henri de Toulouse-Lautrec - Genie der Plakatkunst       0.666286
Igor Strawinsky - Der Schock der Moderne                0.662641
Architekt und Bauhausgründer - Walter Gropius           0.661809
Jugendstil - Natur als Kunst, Schönheit als Rev...      0.657266
Ludwig II. - Der Mondkönig                              0.654857
Merkantilismus - Zölle für den Adelsprunk               0.649388
Versailles - Musik am Hof des Sonnenkönigs              0.649136
Renaissance-Gärten - Kunstwerk aus Himmel, Erde...      0.648772
Herrlichkeit und Katzenjammer - Die Epoche der ...      0.648647
Der Englische Garten - Münchens grünes Herz             0.646806
Das frühe Byzanz - Ein christliches Imperium im...      0.645438

Most of the results are related to architecture or at least art.

Demo (general usage)

Use the code for your own data.

Find clostest match from list of sentences (or list of single words) to a search query.

1. Imports

#Suppress tensorflow infos, deprecation warnings
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
tf.logging.set_verbosity(1000)
tf.__version__ # Tested with '1.13.1', does not work with 2.x
# 
import embeddings as es

2. Get Embeddings

Embed sentences or a list of words.

e = es.TfHubEmbedder()
data = ["Auto", "Bus", "Caesar"]
e.embed(data)
print("Embeddings' shape: {}".format(e.embeddings.shape))
e.save("data/demo_emb.npy", "data/demo_titles.npy")
e.load("data/demo_emb.npy", "data/demo_titles.npy")
> Embeddings' shape: (3, 128)

3. Find ClosestEmbedding

Compare embeddings from (2) with your search_string.

ce = es.ClosestEmbedding(e.embeddings, e.titles)
sim_df = ce.find_closest(search_string="VW")
ce.plot_result()
print(sim_df)
>               VW
> Auto    0.720517
> Bus     0.602985
> Caesar  0.468566

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.