Giter Club home page Giter Club logo

rasa_lookup_demo's Introduction

Phrase Matcher Demo

This is a simple demo of the new lookup table feature in rasa_nlu. See the blog post accompanying this repository here

The goal is to show how lookup tables may improve entity extraction under certain conditions and also give some advice on using this feature effectively.

This repo contains two demos:

  1. A simple restaurant example with very few training examples and only one entity.
  2. A medium-sized company name extraction example with a few thousand examples and several entities.

Running the demo.

No installation is necessary although you must have rasa_nlu installed and version > 0.13.3 or above.

To run one or both of the demos:

python run_lookup.py <demo_key>

where <demo_key> is one of {food, company}. If <demo_key> is ommitted, it will run both of the demos.

Code Structure

data/ holds the training data and lookup tables for each of the demos.

models/ is where the models are persisted.

configs/ holds the rasa_nlu configs to do the baseline evaluation and the lookup table evaluation.

img/ stores plots and outputs from the runs.

Cleaning lookup tables

The script filter_lookup.py may be used to clean up lookup tables by removing any elements that match with a cross-list.

You can call this scripy by running

python filter_lookup.py <lookup_in> <cross_list> <lookup_out>

<lookup_in> is a lookup table with newline-separated elements.

<cross_list> is either a comma or newline-separated list of elements that you'd like to remove from <lookup_in>

<lookup_out> is the name of the file that you'd like to write the filtered list to.

Speed Testing

We include the directory speed_test/ for testing the speed of training as a function of the lookup table size.

This generates random lookup tables and times each component of the training and evaluation process. We use the company dataset data/company/company_train_lookup.py.

cd speed_test
python time_lookups.py

See speed_test/README.md for more details.

Ngrams

A simple ngrams tester is included and can be run by

python run_ngrams.py

This loads two lookup tables, data/company/pos_ngrams.txt & data/company/neg_ngrams.txt, each containing ngrams that were found to be influential to classifying phrases as company names. We then compute the f1 score as a function of random noise injected into the entities. The 'noise' value is the probability of a character flip in each character of each company entity in the test set.

This gives the following plot

rasa_lookup_demo's People

Contributors

twhughes avatar metcalfetom avatar akelad avatar akshay2000 avatar imsurinder90 avatar llermaly avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.