act-scio2's People

Contributors

frbor, geirskjo, martineian

act-scio2's Issues

KeyError when trying to run scio-analyze on stdin

I'm currently trying to get act-scio2 to work on a number of text documents, following the README instructions. Installing the various packages went well, but I get an error when running scio-analyze on stdin input, as in the provided example:

$ echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
[2020-06-24 12:00:19] app=scio-analyze level=INFO msg=Waiting for work on stdin
[2020-06-24 12:00:19] app=scio-analyze level=ERROR msg=Got LookupError. If nltk data is missing, run scio-nltk-download, which should download all nltk data to ~/nltk_data.
[2020-06-24 12:00:19] app=scio-analyze level=ERROR msg=Task exception was never retrieved
future: <Task finished coro=<async_main() done, defined at /nr/samba/user/plison/anaconda3/lib/python3.7/site-packages/act/scio/analyze.py:98> exception=KeyError('threat_actor')>
Traceback (most recent call last):
  File "/nr/samba/user/plison/anaconda3/lib/python3.7/site-packages/act/scio/analyze.py", line 128, in async_main
    await task
  File "/nr/samba/user/plison/anaconda3/lib/python3.7/site-packages/act/scio/analyze.py", line 75, in analyze
    await asyncio.gather(*tasks)
  File "/nr/samba/user/plison/anaconda3/lib/python3.7/site-packages/act/scio/plugins/threatactor_pattern.py", line 28, in analyze
    ini['threat_actor']['alias'] = os.path.join(self.configdir, ini['threat_actor']['alias'])
  File "/nr/samba/user/plison/anaconda3/lib/python3.7/configparser.py", line 958, in __getitem__
    raise KeyError(key)
KeyError: 'threat_actor'

Any idea how to fix this?
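
For reference, the traceback is consistent with the plugin's configuration file either not being found or lacking a [threat_actor] section: configparser's read() silently skips files it cannot open, so the parser stays empty and the later lookup raises KeyError. The sketch below reproduces that behaviour with the standard library only and shows one way to surface a clearer error. The function name load_alias_path and the file name example.ini are hypothetical and not part of act-scio2; only the 'threat_actor' section and 'alias' key are taken from the traceback.

```python
import configparser
import os
import tempfile


def load_alias_path(config_file: str, configdir: str) -> str:
    """Return the alias path joined onto configdir, or raise a descriptive error.

    Hypothetical helper for illustration; not the act-scio2 implementation.
    """
    ini = configparser.ConfigParser()
    # read() returns the list of files it actually parsed;
    # files that are missing or unreadable are skipped without an error.
    parsed = ini.read(config_file)
    if not parsed:
        raise FileNotFoundError(f"config file not found or unreadable: {config_file}")
    if not ini.has_section("threat_actor"):
        raise KeyError(f"no [threat_actor] section in {config_file}")
    return os.path.join(configdir, ini["threat_actor"]["alias"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as configdir:
        # example.ini is a made-up name and is deliberately never created
        missing = os.path.join(configdir, "example.ini")
        try:
            load_alias_path(missing, configdir)
        except FileNotFoundError as err:
            print(err)
```

If this is what is happening, checking that the configuration directory used by scio-analyze exists and contains the plugin's .ini file would be a reasonable first step.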

Performance of NLP modules

This is more a suggestion than an actual issue, but I see that much of the processing time is spent running NLP modules from NLTK. I would recommend dropping NLTK altogether: it is now a fairly outdated piece of software, and it was never designed for processing large document collections (it was primarily meant for educational purposes, as part of NLP courses).

I am a big fan of spaCy (www.spacy.io). It is very fast and reliable, contains all the standard NLP modules you might need (tokeniser, POS tagger, lemmatiser, NER, parser, etc.) and has a developer-friendly interface. The accuracy of these NLP modules is also definitely better than NLTK's (spaCy relies on deep learning models). Plus, spaCy now supports a dozen languages or so. Give it a try and you will never look back at NLTK :-)
