
nlp-tutorial's Introduction

NLP Tutorial


A list of NLP (Natural Language Processing) tutorials built on PyTorch.

Table of Contents

A step-by-step tutorial on how to implement simple models and adapt them to real-world NLP tasks.

Text Classification

This repo provides a simple PyTorch implementation of text classification, with simple annotation. It uses the HuffPost news corpus, in which each article is labeled with its category. The classification model trained on this dataset identifies the category of a news article from its headline and description.
Keyword: CBoW, LSTM, fastText, Text categorization

This text classification tutorial trains a transformer model on the IMDb movie review dataset for sentiment analysis. It provides a simple PyTorch implementation, with simple annotation.
Keyword: Transformer, Sentiment analysis

This repo provides a simple PyTorch implementation of question-answer matching. It uses a corpus from Stack Exchange to build embeddings for entire questions. Using those embeddings, we find questions similar to a given question and show the answers that correspond to the similar questions found.
Keyword: CBoW, TF-IDF, LSTM with variable-length sequences

This repo provides a simple Keras implementation of TextCNN for text classification. It uses a movie review corpus written in Korean. The model trained on this dataset identifies the sentiment of a review from its text.
Keyword: TextCNN, Sentiment analysis


Neural Machine Translation

This neural machine translation tutorial trains a seq2seq model on many thousands of English-French sentence pairs to translate from English to French. It provides an intrinsic/extrinsic comparison of various sequence-to-sequence (seq2seq) models in translation.
Keyword: sequence-to-sequence network (seq2seq), Attention, Autoregressive, Teacher-forcing

This neural machine translation tutorial trains a Transformer model on many thousands of French-English sentence pairs to translate from French to English. It provides a simple PyTorch implementation, with simple annotation.
Keyword: Transformer, SentencePiece


Natural Language Understanding

This repo provides a simple PyTorch implementation of a neural language model for natural language understanding. It implements unidirectional and bidirectional language models, and pre-trains language representations from unlabeled text (Wikipedia corpus).
Keyword: Autoregressive language model, Perplexity

nlp-tutorial's People

Contributors

lyeoni


nlp-tutorial's Issues

question-answer-matching missing file

Hi Lyeoni,

First of all, thank you very much for making these tutorials. They are really interesting!

I am trying to run the question-answer-matching tutorial and reproduce your evaluation. Unfortunately, I can't download the Posts.xml file from Git LFS, as it looks like your subscription no longer allows downloads.
By any chance, do you have that file hosted somewhere else? That would allow me to run the evaluation with your trained model.

Thanks a lot, and have a nice day! :-)

Using the classifier

Hi
After saving the model in news-category-classification, how do you actually use it to predict text classification?
Can you put up an example, please?

Arabic to Urdu Machine Translation

@lyeoni

In the case I want to train an Arabic to Urdu Machine Translation:

  • is that attainable using this project?
  • what options should be set in training?
  • do you suggest another github project?

Little improvements for right indexes in vocabulary dictionaries

Hi, @lyeoni!
You have written great tutorials. I really appreciate them.
We can make a small improvement with a single line. Please take a look.
Here, we fill the first key-value items of stoi and itos with the special tokens.
I suggest inserting this line before the loop:
special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])
If self.unk_token is not set but self.bos_token is, the indices in the dictionary become wrong. So we need to filter out the None values first.
Input
vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;
Wrong Output
'<bos>': 1 ' ': 1, 'hi': 2, 'bear': 3, ...
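
A minimal sketch of the suggested fix, assuming a simplified stand-in for the tutorial's Vocab class. The function name build_vocab and its arguments are hypothetical and are not taken from the repository:

```python
def build_vocab(words, unk_token=None, bos_token=None, eos_token=None, pad_token=None):
    """Build stoi/itos mappings, giving the special tokens the lowest indices."""
    # Drop unset (None) special tokens first so the indices stay contiguous from 0.
    special_tokens = [t for t in (unk_token, bos_token, eos_token, pad_token) if t is not None]
    stoi = {token: index for index, token in enumerate(special_tokens)}

    # Regular words continue from the next free index.
    for word in words:
        if word not in stoi:
            stoi[word] = len(stoi)

    itos = {index: token for token, index in stoi.items()}
    return stoi, itos


stoi, itos = build_vocab(['hi', 'bear', 'hi'], bos_token='<bos>')
print(stoi)  # {'<bos>': 0, 'hi': 1, 'bear': 2}
```

With the None values filtered out before enumeration, no two tokens can share an index, whichever special tokens are set.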

How can I fully utilize the GPU?

Thanks for your code!

I found that the GPU is not fully utilized during training. Is there a way to add batching so that more pairs are trained in one iteration?
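
One common way to process more pairs per step is to group them into padded mini-batches. The following is a minimal sketch, assuming each pair is already a (source, target) tuple of token-index lists. The helper name make_batches and the padding index 0 are assumptions, not part of the tutorial's train.py:

```python
def make_batches(pairs, batch_size, pad_index=0):
    """Yield (sources, targets) mini-batches padded to the longest sequence in each batch."""
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        max_src = max(len(src) for src, _ in chunk)
        max_tgt = max(len(tgt) for _, tgt in chunk)
        # Right-pad every sequence so all rows in the batch have equal length.
        sources = [src + [pad_index] * (max_src - len(src)) for src, _ in chunk]
        targets = [tgt + [pad_index] * (max_tgt - len(tgt)) for _, tgt in chunk]
        yield sources, targets


pairs = [([5, 6, 7], [8, 9]), ([5], [8, 9, 10]), ([4, 4], [3])]
batches = list(make_batches(pairs, batch_size=2))
print(batches[0])  # ([[5, 6, 7], [5, 0, 0]], [[8, 9, 0], [8, 9, 10]])
```

Each padded batch can then be converted to a tensor and moved to the GPU. The encoder, decoder, and loss would also need to handle a batch dimension and ignore the padded positions, which this sketch does not cover.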

num_samples should be a positive integer value, but got num_samples=0

python train.py --epochs 12 --batch_size 2 --learning_rate .001 --hidden_size 64 --n_layers 1 --dropout_p .1

number of trained word vectors of data/glove.6B.100d.txt: 400000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    train_loader = DataLoader(dataset=qa_train, batch_size=config.batch_size, shuffle=True, num_workers=4)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Please let me know what the exact issue is.
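
The traceback shows RandomSampler rejecting a dataset of length 0, so the most likely cause is that qa_train ended up empty, for example because the data file was missing or every row was filtered out during preprocessing. Below is a minimal sketch of a guard that fails earlier with a clearer message. ensure_non_empty is a hypothetical helper, not part of train.py:

```python
def ensure_non_empty(dataset, name="dataset"):
    """Return the dataset unchanged, or raise a descriptive error if it has no samples."""
    if len(dataset) == 0:
        raise ValueError(
            f"{name} is empty: check that the data files exist and that "
            "preprocessing did not filter out every sample."
        )
    return dataset


# Example with plain lists standing in for dataset objects.
print(len(ensure_non_empty([("question", "answer")], name="qa_train")))  # 1
```

Calling ensure_non_empty(qa_train, "qa_train") just before the DataLoader is constructed would point at the data-loading step rather than at the sampler.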

neural-machine-translation - nmt ZeroDivisionError: integer division or modulo by zero

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('D:/nlp-tutorial/neural-machine-translation/nmt/train.py', wdir='D:/nlp-tutorial/neural-machine-translation/nmt')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 254, in <module>
    trainiters(pairs, encoder, decoder, n_iters)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 184, in trainiters
    train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]
ZeroDivisionError: integer division or modulo by zero
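
The failing expression is n_iters % len(train_pairs), which raises ZeroDivisionError when train_pairs is empty. The likely cause is therefore that no sentence pairs were loaded, or that all of them were filtered out. Below is a minimal sketch of the same sampling step with an explicit check. extend_with_remainder is a hypothetical name, and only the list comprehension is taken from the traceback:

```python
import random


def extend_with_remainder(train_pairs, n_iters):
    """Append n_iters % len(train_pairs) randomly chosen pairs, as the quoted line does."""
    if not train_pairs:
        raise ValueError(
            "train_pairs is empty: check the path to the translation data "
            "and any length or prefix filters applied to the pairs."
        )
    extra = [random.choice(train_pairs) for _ in range(n_iters % len(train_pairs))]
    return train_pairs + extra


pairs = [("go .", "va !"), ("run !", "cours !"), ("hi .", "salut !")]
print(len(extend_with_remainder(pairs, n_iters=5)))  # 5
```

Printing len(pairs) right after the data is loaded is a quick way to confirm whether this is the cause.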

Question about validation accuracy

Thanks for your great work! I learned a lot. However, I have a question.
I trained the model for 7 epochs, reaching a training accuracy of 95.2 and a test (validation) accuracy of 85.2.
Is that normal? Could the final test (validation) accuracy be higher after more epochs? Thanks!

typo in preprocessing?

Hi,
In the cleaning function in the script nlp-tutorial/news-category-classifcation/preprocessing.py,
line 21 is written as text = re.sub(r'[!]{2,}', '?', text) # multiple ?s -> ?. The first argument should contain ? instead of !, so the line should be text = re.sub(r'[?]{2,}', '?', text) # multiple ?s -> ?.
Am I correct?
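
A quick check of the two patterns using only Python's re module. The sample string is made up for illustration:

```python
import re

text = "Really??? Wow!!!"

# Pattern as written on line 21: collapses runs of '!' into a single '?'.
as_written = re.sub(r'[!]{2,}', '?', text)

# Pattern as proposed: collapses runs of '?' into a single '?'.
as_proposed = re.sub(r'[?]{2,}', '?', text)

print(as_written)   # Really??? Wow?
print(as_proposed)  # Really? Wow!!!
```

Only the proposed pattern matches the behavior described in the comment (multiple ?s -> ?).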
