
Comments (4)

peteriz commented on August 15, 2024

Hi @amit8121
The data loader used in the NER tutorial is called SequentialTaggingDataset (I also corrected a typo in the tutorial). It ingests a whitespace-separated token/tag format in which the first token on each line is the word and the remaining tokens are its tags. Sentences are separated by a blank line. For example, given the sentence 'a b c':
a tag_a another_tag
b tag_a no_tag
c tag_b no_tag

Using the loader, you can specify the sentence length, the maximum word length, and which column to use as the labels (for your specific problem).
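To make the file format concrete, here is a minimal sketch of a parser for it. This is not the SequentialTaggingDataset implementation; the function name `read_tagged_sentences` and its `tag_col` parameter are hypothetical and only illustrate how the label column could be selected.

```python
from typing import Iterable, List, Tuple


def read_tagged_sentences(
    lines: Iterable[str], tag_col: int = 1
) -> List[Tuple[List[str], List[str]]]:
    """Parse 'word tag [tag ...]' lines; a blank line ends a sentence.

    tag_col is the whitespace-separated column used as the label
    (column 0 is the word itself). Hypothetical helper, not library code.
    """
    sentences = []
    tokens, tags = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: close the current sentence, if any.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        tokens.append(parts[0])
        tags.append(parts[tag_col])
    if tokens:
        # The last sentence may not be followed by a blank line.
        sentences.append((tokens, tags))
    return sentences


sample = """a tag_a another_tag
b tag_a no_tag
c tag_b no_tag
"""

parsed = read_tagged_sentences(sample.splitlines(), tag_col=1)
print(parsed)  # [(['a', 'b', 'c'], ['tag_a', 'tag_a', 'tag_b'])]
```

Passing `tag_col=2` would pick the second tag column (`another_tag`, `no_tag`, `no_tag`) instead.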

Alternatively, you can write a custom data loader for your data files and convert the data into the format required by the NER model (strings converted to ints, padded, etc.).
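As a rough illustration of the "strings converted to ints, padded" step, a custom loader might do something like the sketch below. The reserved ids (0 for padding, 1 for unknown), right-side padding, and the helper names `build_vocab` and `encode_and_pad` are all assumptions made for the example; check what the NER model actually expects before relying on them.

```python
from typing import Dict, Iterable, List

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for unseen items


def build_vocab(sequences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct string an integer id, starting after the reserved ids."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for item in seq:
            vocab.setdefault(item, len(vocab) + 2)
    return vocab


def encode_and_pad(seq: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map strings to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(item, UNK_ID) for item in seq][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


sentences = [["a", "b", "c"], ["b", "d"]]
vocab = build_vocab(sentences)
encoded = [encode_and_pad(s, vocab, max_len=4) for s in sentences]
print(encoded)  # [[2, 3, 4, 0], [3, 5, 0, 0]]
```

The same two helpers can be applied to the tag sequences to get integer labels of matching length.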

from nlp-architect.

kumar-nilesh-101 commented on August 15, 2024

This sample from my dataset may be helpful:

British	B-NRP
Foreign	B-PRO
Secretary	I-PRO
Malcolm	B-PER
Rifkind	I-PER
said	O
on	O
Tuesday	B-DYN
from	O
Pakistan	B-LOC
his	O
government	B-ORG
would	O
only	O
take	O
action	O
against	O
the	O
planned	O
Islamists	B-NRP
gathering	O
in	O
London	B-LOC
if	O
British	B-NRP
law	O
was	O
broken	O
.	O

This is a single sentence in which each token (word) is on a separate line. A sentence may contain many entities. The model takes data in BIO format: 'B' marks the first word of a named entity, 'I' marks a subsequent word inside the same entity, and 'O' marks a word that is not part of any entity. For example, Foreign Secretary is a named entity (a profession), so it is tagged Foreign B-PRO and Secretary I-PRO. Separate the word and its tag with a tab, and separate sentences with a blank line.
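To show how the B/I/O tags group words into entities, here is a small sketch that reads the entity spans back out of a tagged sentence. The helper name `bio_spans` is hypothetical, and treating a stray 'I-' tag (one with no matching 'B-' before it) as outside any entity is a choice made for this example.

```python
from typing import List, Tuple


def bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs from BIO-tagged tokens."""
    spans = []
    current_tokens: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the previous one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # 'O' or a stray 'I-' tag: close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["British", "Foreign", "Secretary", "Malcolm", "Rifkind", "said"]
tags = ["B-NRP", "B-PRO", "I-PRO", "B-PER", "I-PER", "O"]
entities = bio_spans(tokens, tags)
print(entities)
# [('British', 'NRP'), ('Foreign Secretary', 'PRO'), ('Malcolm Rifkind', 'PER')]
```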


prakashwarbler commented on August 15, 2024

How can I convert text into BILOU format? Kindly suggest some alternatives.


peteriz commented on August 15, 2024

@prakashwarbler we don't have a BILOU parser or data loader in NLP Architect (you're welcome to contribute 😃).
You can write a script to translate your tags to BILOU format (more info here)
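Such a script could look roughly like the sketch below, assuming the input is already BIO-tagged as in the sample earlier in this thread. It uses the usual BILOU convention (B = begin, I = inside, L = last, O = outside, U = unit-length entity); the function name `bio_to_bilou` is hypothetical and not part of NLP Architect.

```python
from typing import List


def bio_to_bilou(tags: List[str]) -> List[str]:
    """Convert one sentence's BIO tags to BILOU tags."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, _, label = tag.partition("-")
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            bilou.append(("B-" if continues else "U-") + label)
        elif prefix == "I":
            bilou.append(("I-" if continues else "L-") + label)
        else:
            raise ValueError(f"not a BIO tag: {tag!r}")
    return bilou


bio = ["B-NRP", "B-PRO", "I-PRO", "B-PER", "I-PER", "O"]
converted = bio_to_bilou(bio)
print(converted)
# ['U-NRP', 'B-PRO', 'L-PRO', 'B-PER', 'L-PER', 'O']
```

Run it once per sentence (tags between blank lines), so that an entity at the end of one sentence is never joined to the start of the next.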

