nicolay-r / arekit


Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML

Home Page: https://nicolay-r.github.io/arekit-page/

License: MIT License

Python 99.92% Shell 0.08%
bert datasets frames language-models neural-networks nlp pandas pandas-dataframe prompt prompting relation-extraction sentiment-analysis tensorflow

arekit's Introduction

Hi, I'm Nicolay! 👋

  • My personal website on GitHub has more information about me
  • Combine it with track-and-field 🏃‍♂️, ⛷️ and 🌊🏄‍♂️

The most recent

arekit's People

Contributors

nicolay-r, trellixvulnteam

arekit's Issues

Separate experiments data_io provider into serialization and training stages

The idea comes from how SynonymsCollection is used: it is (possibly) useless while training neural networks on already serialized data, but it is important at the input data serialization stage.
Therefore, the DataIO API of the experiment subfolder needs to be split into:

  1. SerializationData
  2. TrainingData

We are also considering renaming DataIO -> BaseData.
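A minimal sketch of the proposed split, assuming a shared abstract BaseData (the suggested rename of DataIO) with two stage-specific subclasses. The property names labels_count, synonyms and samples_dir are hypothetical placeholders, not AREkit's actual API. The point it illustrates is that only SerializationData carries a synonyms collection, so the training stage never has to load one.

```python
from abc import ABC, abstractmethod


class BaseData(ABC):
    """Hypothetical shared base, following the proposed DataIO -> BaseData rename."""

    @property
    @abstractmethod
    def labels_count(self) -> int:
        """A setting both stages need, e.g. the number of sentiment labels."""


class SerializationData(BaseData):
    """Resources used only while serializing the input documents."""

    def __init__(self, synonyms, labels_count: int):
        self._synonyms = synonyms  # needed here to group synonymous entities
        self._labels_count = labels_count

    @property
    def labels_count(self) -> int:
        return self._labels_count

    @property
    def synonyms(self):
        return self._synonyms


class TrainingData(BaseData):
    """Resources used to train on already serialized samples: no synonyms at all."""

    def __init__(self, samples_dir: str, labels_count: int):
        self._samples_dir = samples_dir
        self._labels_count = labels_count

    @property
    def labels_count(self) -> int:
        return self._labels_count

    @property
    def samples_dir(self) -> str:
        return self._samples_dir
```

With this layout, a training script would construct only TrainingData("out/samples", labels_count=3), so the synonyms loading cost disappears from that stage entirely.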

Entity -- provide GroupIndex property

Add a property to the Entity class that returns either an int or None.
Why this is needed:

  1. It lets us stop using the synonyms collection when synonyms were already applied while the collection was compiled (the entities are already annotated) => GroupIndex will provide that index.
  2. It yields correct synonym links, since lookups in the collection currently rely on lemmatization, which distorts the result.
  3. It speeds up data loading and processing (by disabling the synonyms collection).
  4. It removes synonym reading from RuAttitudes.
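A minimal sketch of the GroupIndex idea, assuming an Entity that optionally stores a precomputed synonym-group index. The names group_index and group_entities are hypothetical and do not come from AREkit. Entities with an index can be grouped directly without any lemmatized lookup; entities without one (None) are the only case where a SynonymsCollection would still be needed.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Entity:
    """Sketch of an entity that may carry a precomputed synonym-group index."""

    def __init__(self, value: str, group_index: Optional[int] = None):
        self.value = value
        self._group_index = group_index

    @property
    def group_index(self) -> Optional[int]:
        # an int when synonyms were resolved at annotation time, otherwise None
        return self._group_index


def group_entities(entities: List[Entity]) -> Dict[int, List[str]]:
    """Group synonymous entities by GroupIndex without a SynonymsCollection lookup."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.group_index is None:
            # no precomputed index: a SynonymsCollection lookup would be needed here
            continue
        groups[entity.group_index].append(entity.value)
    return dict(groups)
```

For example, group_entities([Entity("USA", 0), Entity("United States", 0), Entity("Russia", 1), Entity("Moscow")]) returns {0: ['USA', 'United States'], 1: ['Russia']}.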

SynonymsCollection considered in ReadOnly mode only

Since the evaluation process maps model results onto the etalon (gold-standard) results, synonyms have to be used during evaluation.
They are also used to initialize the etalon collection.

In general, it is important to have a read-only synonyms collection that covers the entries of input examples from a variety of source types, such as train, test, and (dev), simultaneously.
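A minimal sketch of a read-only synonyms collection built once from every split, assuming synonyms arrive as lists of equivalent strings per split. The class ReadOnlySynonymsCollection and the helper collect_groups are hypothetical names for illustration. Immutability comes from wrapping the internal index in types.MappingProxyType, a read-only view from the Python standard library.

```python
from types import MappingProxyType
from typing import Dict, Iterable, List, Mapping


class ReadOnlySynonymsCollection:
    """Sketch: synonym groups are fixed at construction and cannot be changed later."""

    def __init__(self, groups: Iterable[Iterable[str]]):
        index: Dict[str, int] = {}
        for group_id, group in enumerate(groups):
            for value in group:
                # a value keeps the first group it was listed in
                index.setdefault(value, group_id)
        # MappingProxyType is a read-only view over the dict
        self._index: Mapping[str, int] = MappingProxyType(index)

    def contains(self, value: str) -> bool:
        return value in self._index

    def get_group_index(self, value: str) -> int:
        return self._index[value]

    def are_synonyms(self, a: str, b: str) -> bool:
        return (self.contains(a) and self.contains(b)
                and self._index[a] == self._index[b])


def collect_groups(*splits: List[List[str]]) -> List[List[str]]:
    """Merge the synonym groups of all splits (train, test, dev) into one list."""
    return [group for split in splits for group in split]
```

Building it as ReadOnlySynonymsCollection(collect_groups(train, test, dev)) gives one collection that serves all source types at once, and any later attempt to add an entry fails instead of silently diverging between stages.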

The source of synonyms for OpinionCollection (Trusted/Non trusted)

It is necessary to specify whether the source of synonyms is trusted or not.
Here, 'trusted' means that the synonyms collection was obtained from the same corpus as the opinions, i.e. we guarantee that no duplicated synonymous opinions occur within a document.
Otherwise, we can skip (omit) duplicated opinions during OpinionCollection initialization.
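A minimal sketch of how a trusted/non-trusted flag could drive OpinionCollection initialization for a single document, assuming opinions are plain (source, target, label) tuples and a group_of function maps an entity value to its synonym group. The synonyms_trusted parameter and the choice to raise on duplicates in trusted mode are assumptions made for illustration; the issue itself only states that duplicates are guaranteed absent in that case.

```python
from typing import Callable, List, Set, Tuple

# (source, target, label): a hypothetical stand-in for an opinion object
Opinion = Tuple[str, str, int]


class OpinionCollection:
    """Sketch: how duplicates are handled depends on the synonyms source."""

    def __init__(self, opinions: List[Opinion],
                 group_of: Callable[[str], int],
                 synonyms_trusted: bool):
        self._opinions: List[Opinion] = []
        seen: Set[Tuple[int, int]] = set()
        for source, target, label in opinions:
            key = (group_of(source), group_of(target))
            if key in seen:
                if synonyms_trusted:
                    # a trusted source guarantees there are no synonymous duplicates
                    raise ValueError(f"duplicated opinion: {source} -> {target}")
                # non-trusted source: omit the duplicate and keep the first opinion
                continue
            seen.add(key)
            self._opinions.append((source, target, label))

    def __len__(self) -> int:
        return len(self._opinions)
```

With groups = {"USA": 0, "United States": 0, "Russia": 1}, the opinions ("USA", "Russia", -1) and ("United States", "Russia", -1) collapse to one entry when synonyms_trusted=False, while synonyms_trusted=True treats the pair as a broken guarantee and raises.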

OpinionOperation -- remove SynonymsCollection property.

Instead, use a dedicated experiment method that we refer to during experiment data serialization.
This allows us to separate two kinds of SynonymsCollection usage:

  1. For results evaluation (OpinionOperations)
  2. For the additional search of synonymous entities during the serialization stage (described above)
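A minimal sketch of the separation, assuming a plain dict from value to synonym-group id stands in for SynonymsCollection. The Experiment class and the methods opinion_operations, get_serialization_synonyms and same_opinion are hypothetical names, not AREkit's API. OpinionOperations receives the evaluation synonyms through its constructor and exposes no synonyms property, while serialization asks the experiment for its own collection.

```python
from typing import Dict, Tuple

# value -> synonym group id; a hypothetical stand-in for SynonymsCollection
SynonymsIndex = Dict[str, int]


class OpinionOperations:
    """Evaluation-side operations: synonyms are passed in, not exposed as a property."""

    def __init__(self, eval_synonyms: SynonymsIndex):
        self._eval_synonyms = eval_synonyms

    def _key(self, value: str) -> object:
        # values without a synonym group are compared as plain strings
        return self._eval_synonyms.get(value, value)

    def same_opinion(self, a: Tuple[str, str], b: Tuple[str, str]) -> bool:
        """Match a predicted (source, target) pair against an etalon pair."""
        return all(self._key(x) == self._key(y) for x, y in zip(a, b))


class Experiment:
    """Sketch: each stage asks the experiment for the collection it needs."""

    def __init__(self, eval_synonyms: SynonymsIndex,
                 serialization_synonyms: SynonymsIndex):
        self._eval_synonyms = eval_synonyms
        self._serialization_synonyms = serialization_synonyms

    def opinion_operations(self) -> OpinionOperations:
        # 1. collection for results evaluation
        return OpinionOperations(self._eval_synonyms)

    def get_serialization_synonyms(self) -> SynonymsIndex:
        # 2. collection for the extra synonymous-entity search at serialization time
        return self._serialization_synonyms
```

Keeping the two collections behind separate entry points means the evaluation side can stay read-only and fixed, while the serialization side can be swapped or dropped without touching OpinionOperations.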
