padatious's Introduction

Padatious

An efficient and agile neural network intent parser. Padatious is a core component of Mycroft AI.

Features

  • Intents are easy to create
  • Requires a relatively small amount of data
  • Intents run independent of each other
  • Easily extract entities (e.g. Find the nearest gas station -> place: gas station)
  • Fast training with a modular approach to neural networks

Getting Started

Installing

Padatious requires the following native packages to be installed:

  • FANN (with dev headers)
  • Python development headers
  • pip3
  • swig

Ubuntu:

sudo apt-get install libfann-dev python3-dev python3-pip swig python3-fann2

Next, install Padatious via pip3:

pip3 install padatious

Padatious also works in Python 2 if you are unable to upgrade.

Example

Here's a simple example of how to use Padatious:

program.py

from padatious import IntentContainer

container = IntentContainer('intent_cache')
container.add_intent('hello', ['Hi there!', 'Hello.'])
container.add_intent('goodbye', ['See you!', 'Goodbye!'])
container.add_intent('search', ['Search for {query} (using|on) {engine}.'])
container.train()

print(container.calc_intent('Hello there!'))
print(container.calc_intent('Search for cats on CatTube.'))

container.remove_intent('goodbye')

Run with:

python3 program.py

Learn More

Further documentation can be found at https://mycroft.ai/documentation/padatious/

padatious's People

Contributors

forslund, krisgesling, matthewscholefield, nielstron, penrods, repodiac, stratus-ss, tadashi-hikari

padatious's Issues

test_train_timeout_subprocess fail randomly

Hello,

The test in question fails with roughly 30% probability in our build system. I have extracted the following test case:

import random
from time import monotonic

from padatious.intent_container import IntentContainer

cont = IntentContainer('temp')
# Two intents with 300 samples each; every sample is five random single-letter words
cont.add_intent('a',
        [' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)])
cont.add_intent('b',
        [' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)])

for x in range(10):
    a = monotonic()
    # Training is expected to hit the 0.1 s timeout and report failure
    assert not cont.train_subprocess(timeout=0.1)
    b = monotonic()
    # Wall-clock time spent in train_subprocess(), in seconds
    print(b - a)

When I ran it, I got, for example:

 0.47674093791283667
 0.5609202678315341
 0.5488572919275612
 6.474134984891862
 0.4769664751365781
 0.45290810498408973
 0.470392829971388
 0.4690805918071419
 0.46847033803351223
 0.4608854129910469
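
A quick way to summarise these timings is sketched below. It assumes the 1 second limit that the upstream test applies (`assert b - a <= 1`); the variable names are only for illustration.

```python
from statistics import median

# Timings copied from the run above, in seconds
timings = [
    0.47674093791283667, 0.5609202678315341, 0.5488572919275612,
    6.474134984891862, 0.4769664751365781, 0.45290810498408973,
    0.470392829971388, 0.4690805918071419, 0.46847033803351223,
    0.4608854129910469,
]

# Runs that would break a 1 second limit on train_subprocess()
slow = [t for t in timings if t > 1]

print(f"median: {median(timings):.3f} s")
print(f"runs over 1 s: {len(slow)} of {len(timings)}")  # runs over 1 s: 1 of 10
```

Every run takes several times longer than the requested 0.1 s timeout, and one run is an outlier of more than six seconds.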

First word of a match gets chopped off sometimes

Describe the bug

I'm working on a feature addition to the Spotify skill that works with the following intent:

list (all) (albums|records) by {artist}

For values of artist that are multiple words, the first word in this phrase gets chopped off:

(Pdb) message.data
{'artist': 'might be giants', 'utterance': 'list albums by they might be giants', 'utterances': ['list albums by they might be giants']}

Words after entity get matched too

I have the following setup:

code.intent:
(code|error) (|is) {code}

code.entity:
###

Example Phrase:
How is code 404 named?

In this case "code.intent" triggers as expected only with 3-digit numbers, but it also captures all words following the entity. So in this example message.data['code'] returns "404 named" instead of only "404".
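
Until the matching itself is fixed, one possible workaround is to trim the captured value after the fact. The sketch below is plain Python post-processing, not part of Padatious; `leading_code` is a hypothetical helper name.

```python
import re


def leading_code(value):
    """Return the leading three-digit number of a captured value, or None."""
    # \b makes sure the number is exactly three digits long
    match = re.match(r'\s*(\d{3})\b', value)
    return match.group(1) if match else None


print(leading_code('404 named'))  # 404
```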

LICENSE and tests/

Could you please consider adding LICENSE and tests/ to the pythonhosted.org tarball?

Support named-group entity matches

I have a long, auto-generated list of units, such as [meter, mile, amp, ampere], etc. The list is created through a combination of auto-generation and hand-editing.

At the moment I need to create copies of the file, for example unitFrom.entity and unitTo.entity. Here is some sample vocab:

How many {unitTo} is {unitFrom}
How many {unitFrom} are in a {unitTo}
How many {unitFrom} in a {unitTo}
What is {unitFrom} in {unitTo}

Duplicating the files is a bad solution, because if they get out of sync at any point it could create some very confusing and hard-to-debug issues. Using symlinks is also confusing and presumes that Mycroft will only ever be deployed on Linux.

I think the best solution would be to allow named capture groups, perhaps something like

How many {unit:to} is {unit:from}
How many {unit:from} are in a {unit:to}
How many {unit:from} in a {unit:to}
What is {unit:from} in {unit:to}
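
To make the proposal concrete, here is a minimal sketch of how such placeholders could be split into an entity name and a role. The `{entity:role}` syntax is only the suggestion from this issue, not something Padatious supports, and `parse_placeholders` is a hypothetical helper.

```python
import re

# Hypothetical syntax from the proposal above: {entity} or {entity:role}
PLACEHOLDER = re.compile(r'\{(\w+)(?::(\w+))?\}')


def parse_placeholders(template):
    """Return (entity, role) pairs; the role defaults to the entity name."""
    return [(entity, role or entity)
            for entity, role in PLACEHOLDER.findall(template)]


print(parse_placeholders('How many {unit:to} is {unit:from}'))
# [('unit', 'to'), ('unit', 'from')]
```

With this split, both placeholders would refer to a single unit.entity file while still producing two distinct keys in the result.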

Basic Install Failing

After installing those things needed from the documentation:
sudo apt-get install libfann-dev python3-dev python3-pip swig

When I then go to run this:
pip3 install padatious

I get this error:

Collecting padatious
  Downloading https://files.pythonhosted.org/packages/33/c1/a54ac3f8fe5fac7fc9537beb90576673a660f3da9147e1317adf6e4c3cfb/padatious-0.4.7.tar.gz
Collecting fann2 (from padatious)
  Downloading https://files.pythonhosted.org/packages/80/a1/fed455d25c34a62d4625254880f052502a49461a5dd1b80854387ae2b25f/fann2-1.1.2.tar.gz (66kB)
    100% |████████████████████████████████| 71kB 4.6MB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 92, in <module>
        build_swig()
      File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 85, in build_swig
        find_fann()
      File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 73, in find_fann
        raise Exception("Couldn't find FANN source libs!")
    Exception: Couldn't find FANN source libs!
    Looking for FANN libs...

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-mui8t_t6/fann2/

This happens on both WSL (Windows Subsystem for Linux) and a DigitalOcean droplet running the latest version of Ubuntu. Did the documentation leave something out, or did something break?

Unable to use Padatious on Mac OS X

Log files
import pathlib, padatious
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/__init__.py", line 15, in <module>
from .intent_container import IntentContainer
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/intent_container.py", line 25, in <module>
from padatious.entity import Entity
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/entity.py", line 17, in <module>
from padatious.simple_intent import SimpleIntent
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/simple_intent.py", line 15, in <module>
from fann2 import libfann as fann
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/__init__.py", line 4, in <module>
from fann2 import libfann
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/libfann.py", line 13, in <module>
from . import _libfann
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/_libfann.cpython-37m-darwin.so, 2): Library not loaded: libdoublefann.2.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/_libfann.cpython-37m-darwin.so
Reason: image not found

Environment (please complete the following information):

  • Device type: Macbook air 2020
  • OS: Mac OS X Catalina

No module named padatious

When I try to start Mycroft with the CLI, I get this error: No module named padatious. However, it is already installed.
[Screenshot 2023-10-01 110924]

How can I fix it?

Entity matching more than it should

For an intent file like this:
(Start|Set) (a|) 5 minute timer (called|for) {name}.
The phrase:
"start a 5 minute timer called lasagna"
Returns the entity "name" as "called lasagna".

However it does correctly parse:
"set a 5 minute timer called lasagna"
Returning the entity "name" as "lasagna".

how to load model

I am working on this and am unable to load the model. Please help me out.

Cannot save the container model after training

This is the sample code. After training the container, is there any way to save the model?

from padatious import IntentContainer

container = IntentContainer('intent_cache')
# Add intents and entities here, just like regex
# Documentation at: https://mycroft.ai/documentation/padatious/
container.add_intent('greeting', ['(Hi | Goodbye | Good | Hello) (| there!) {greeting}', 
                                  'Hello.'])
container.add_intent('fired',['fired','cancel','{person} (are | is) (terminated | fired).',
                             '(I | We) do not (need | want) your services (now | anymore).',
                             '{person} (are | services) (are not | not | now) (needed | canceled) (now | anymore |).'])

container.train()

I have tried using pickle and dill:

import pickle
import dill

with open('padatious_model', 'wb') as fp:  # throws: can't pickle swigPy objects
    pickle.dump(container, fp)

with open("dillable_config.pkl", "wb") as f:  # throws: can't pickle swigPy objects
    dill.dump(container, f)

Is there another way to save this Padatious model after training, then load and use it later, to avoid retraining every time?

pip3 issue finding libfann-dev libraries in Ubuntu 18.06

The issue can be resolved by searching the directory '/usr/lib/x86_64-linux-gnu/', which is where libfann-dev installs. Alternatively, running the command 'sudo ln -s /usr/lib/x86_64-linux-gnu/fann -d /lib/' will symbolically link the libraries into pip's path.

Parsing Datetime

Hello,

Thanks for the wonderful software. Is there a way to declare something like a datetime entity, which outputs a UTC timestamp or similar?

For example, if I type “What is the weather in London on Friday?”, it would give me the weather intent and also convert “Friday” to a datetime string.

Thanks
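
As far as the examples on this page show, extracted entities come back as plain strings, so one option is to convert the value after matching. The sketch below does that with the standard library only; `next_weekday_utc` is a hypothetical helper, and it assumes the entity holds an English weekday name.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday',
            'friday', 'saturday', 'sunday']


def next_weekday_utc(day_name, now=None):
    """Return the next occurrence of day_name (after today) as an ISO 8601 UTC string."""
    now = now or datetime.now(timezone.utc)
    target = WEEKDAYS.index(day_name.strip().lower())
    # Number of days to add, always between 1 and 7
    days_ahead = (target - now.weekday() - 1) % 7 + 1
    day = now + timedelta(days=days_ahead)
    # Normalise to midnight so only the date part carries meaning
    return day.replace(hour=0, minute=0, second=0, microsecond=0).isoformat()


reference = datetime(2023, 10, 1, 11, 0, tzinfo=timezone.utc)  # a Sunday
print(next_weekday_utc('Friday', now=reference))  # 2023-10-06T00:00:00+00:00
```

An unknown weekday name raises ValueError here, which a caller would need to handle.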

Intent does not handle apostrophes properly

Given the following intent:

add {Food} to (| (the | my)) {ShoppingList} (| list) (under {Category} |)

The shopping list is parsed incorrectly.
Using the phrase:
add temperature sensors to steve's projects

The intent parser produces
steve ' s projects instead of steve's projects

The utterance shows correct parsing:

~~~~50788 | __main__:handle_utterance:72 | Utterance: ["add temperature sensors to steve's projects"]

However, message.data shows that shoppinglist has been parsed incorrectly:

 {'food': 'temperature sensors', 'shoppinglist': "steve ' s projects", 'utterance': "add temperature sensors to steve's projects"}

This obviously causes errors or unmatched entities.

The error comes from match_data.py. This statement:

def detokenize(self):
        self.sent = ' '.join(self.sent)

combined with the fact that self.sent is split like so:

 'sent': ['add', 'something', 'to', 'steve', "'", 's', 'projects'], 'matches': {}, 'conf': 0.0}

causes the error. One solution, which could be refined, is:

    @staticmethod
    def handle_apostrophes(old_sentence):
        """Join tokens with spaces, keeping apostrophes attached to their neighbours."""
        new_sentence = ''
        apostrophe_present = False

        for word in old_sentence:
            if word == "'":
                # Attach the apostrophe directly to the previous token
                apostrophe_present = True
                new_sentence += word
            elif apostrophe_present:
                # Attach the token after an apostrophe without a space
                new_sentence += word
                apostrophe_present = False
            elif len(new_sentence) > 0:
                new_sentence += " " + word
            else:
                new_sentence = word
        return new_sentence


    # Converts parameters from lists of tokens to one combined string
    def detokenize(self):
        self.sent = self.handle_apostrophes(self.sent)

        new_matches = {}
        for token, sent in self.matches.items():
            new_token = token.replace('{', '').replace('}', '')
            new_matches[new_token] = self.handle_apostrophes(sent)
        self.matches = new_matches
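
The joining logic can be tried outside Padatious with a standalone function. This is a minimal sketch of the same idea using the token list from this issue; `join_tokens` is a hypothetical name, not part of match_data.py.

```python
def join_tokens(tokens):
    """Join tokens with spaces, but keep apostrophes attached to their neighbours."""
    result = ''
    after_apostrophe = False
    for token in tokens:
        if token == "'":
            # No space before an apostrophe
            result += token
            after_apostrophe = True
        elif after_apostrophe or not result:
            # No space right after an apostrophe or at the very start
            result += token
            after_apostrophe = False
        else:
            result += ' ' + token
    return result


tokens = ['add', 'something', 'to', 'steve', "'", 's', 'projects']
print(' '.join(tokens))     # add something to steve ' s projects
print(join_tokens(tokens))  # add something to steve's projects
```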

Test failure in test_train_timeout_subprocess

On Alpine Linux, some architectures (not all of them, e.g. it works fine on x86_64) seem to fail on test_train_timeout_subprocess:

============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /builds/PureTryOut/aports/testing/py3-padatious/src/padatious-0.4.8
collected 36 items

tests/test_all.py ......                                                 [ 16%]
tests/test_container.py ....sF.........                                  [ 58%]
tests/test_entity_edge.py .                                              [ 61%]
tests/test_id_manager.py ....                                            [ 72%]
tests/test_intent.py ..                                                  [ 77%]
tests/test_match_data.py ..                                              [ 83%]
tests/test_train_data.py .                                               [ 86%]
tests/test_util.py .....                                                 [100%]

=================================== FAILURES ===================================
______________ TestIntentContainer.test_train_timeout_subprocess _______________

self = <tests.test_container.TestIntentContainer object at 0xffff8f5d54c0>

    def test_train_timeout_subprocess(self):
        self.cont.add_intent('a', [
            ' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)
        ])
        self.cont.add_intent('b', [
            ' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)
        ])
        a = monotonic()
        assert not self.cont.train_subprocess(timeout=0.1)
        b = monotonic()
>       assert b - a <= 1
E       assert (3178089.64288705 - 3178088.391084973) <= 1

tests/test_container.py:149: AssertionError
=========================== short test summary info ============================
FAILED tests/test_container.py::TestIntentContainer::test_train_timeout_subprocess
=================== 1 failed, 34 passed, 1 skipped in 5.89s ====================

I'd expect the assertion to hold with those values, since the difference should be less than 1, but maybe it doesn't like the . for some reason?
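
Evaluating the subtraction from the failing assertion suggests that the dot is not the problem: the two `monotonic()` readings are about 1.25 seconds apart, which is over the 1 second limit. A quick check with the values copied from the output above:

```python
# time.monotonic() readings taken from the assertion output, in seconds
start = 3178088.391084973
end = 3178089.64288705

elapsed = end - start
print(f"elapsed: {elapsed:.3f} s")  # elapsed: 1.252 s
print(elapsed <= 1)                 # False, so the assertion fails
```

So the call to train_subprocess(timeout=0.1) took longer than the test allows on that machine.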

Trouble with (|the) in .intent file

Hi

I forked a skill to https://github.com/aussieW/nature-sound-skill which I am trying to improve but I am having an issue with the .intent file under one specific condition.

The .intent file contains:

play (|{sound}) relaxation music
listen to {sound} relaxation music
relax (with|to) {sound}
relax to the sound of (|the) {sound}
play (|some) relaxing (music|sounds|{sound})
listen to (|some) relaxing (music|sounds|{sound})

I understand from a recent conversation on Mattermost that the last two lines probably don't work yet.

{sound} represents one of a number of available mp3 files.

The .entity file is dynamically constructed from the available mp3 files. In this case it contains:

dawn chorus
rainy river
ocean waves
rainforest
hot spring
urban thunderstorm
tropical storm

My problem relates to line 4, 'relax to the sound of (|the) {sound}'. If 'the' is used in any request it always ends up as part of {sound}.

e.g. 'relax to the sound of the rainforest' results in {sound} = 'the rainforest'
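
As a stopgap, the captured value could be cleaned up in the skill before it is used. The sketch below drops a leading "the" and checks the result against the entity values listed above; it is plain Python post-processing rather than a fix in Padatious, and `normalise_sound` is a hypothetical helper.

```python
# Values from the .entity file shown above
SOUNDS = ['dawn chorus', 'rainy river', 'ocean waves', 'rainforest',
          'hot spring', 'urban thunderstorm', 'tropical storm']


def normalise_sound(captured, known=SOUNDS):
    """Drop a leading 'the' and return the matching known sound, or None."""
    words = captured.lower().split()
    if words and words[0] == 'the':
        words = words[1:]
    candidate = ' '.join(words)
    return candidate if candidate in known else None


print(normalise_sound('the rainforest'))  # rainforest
```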

IntentContainer.instantiate_from_disk() method does not load data for padaos regex patterns

Hi,

I contributed the instantiate_from_disk() functionality some time ago (short recap: you can load and reuse trained models from disk by using cached contents that have been stored externally).

Unfortunately, I recently found a bug in it. I have already fixed it with a workaround of sorts. The details and background follow.

Background:

  • As I found earlier and noted in a source-code comment, the padaos (https://github.com/MycroftAI/padaos) regex "compiler" also has to recompile its patterns when trained models are loaded from disk.
  • The current padaos interface does not allow patterns to be persisted to disk and reloaded the way instantiate_from_disk() does. That may need a separate code change and pull request, which can be discussed on the padaos repository.
  • The bug was that the call self.padaos.add_entity(name, lines) in the add_entity method of intent_container.py received empty lines: instantiate_from_disk() passed an empty list, because reloading from disk (usually) needs no training data. Unfortunately, in this case it does, because padaos has to recompile its patterns.
  • The workaround (i.e. the fix) is to pass the original training data for both entities and intents to the padaos compiler, so that it has all the text it needs to produce the required patterns.

Implications: If no contents for entities and intents are provided to the padaos compiler, most entities embedded in intents are not detected and some intents are completely wrong most of the time. So it is rather severe.

Solution: As mentioned above, I would like to contribute the fix via another pull request. The fix is also covered by an improved unit test (test_container.test_instantiate_from_disk()).
