NLP Course

Free hands-on course with the implementation (in Python) and description of several Natural Language Processing (NLP) algorithms and techniques.

Although it is not intended to have the formal rigor of a book, we tried to be as faithful as possible to the original algorithms and methods, adding variants only when they were necessary for didactic purposes.

Quick Start

If you want to play with these notebooks online without having to install any library or configure hardware, you can use the following service:

  • Open In Colab

What is NLP?

This is a Natural Language Processing project built with Python frameworks. NLP is a discipline at the intersection of computer science, artificial intelligence and cognitive logic, whose objective is for machines to read and understand our language in order to support decision making.

Content

1. NLP with spaCy
  • Read natural text of a book in Spanish
  • Create an NLP model with spaCy
  • Working with POS tags, named entities (NEs) and sentences
2. Semantic Enrichment of Entities
  • Semantic Enrichment
  • SPARQL
  • DBpedia
3. Spell Checker/Corrector
  • Spell Checker from scratch
  • Spell Checker using PySpellChecker class
4. Word Embedding with Gensim
  • Read natural text of a book in English
  • Tokenize and remove Stopwords
  • Create a Word2Vec model
  • Plot similar words
  • Export similarity between the words
5. Relationship between Words
  • Networks and Force System
  • d3.js
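
As an illustration of the "Spell Checker from scratch" topic listed above, here is a minimal sketch in the style of Peter Norvig's well-known spelling corrector. It is not taken from the course notebooks: the toy corpus and the function names (`words`, `edits1`, `correction`) are assumptions made for this example, and the notebooks may differ in both data and details.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correction(word, counts):
    """Pick the most frequent known word within two edits of `word`."""
    def known(candidates):
        return {w for w in candidates if w in counts}

    candidates = (
        known([word])
        or known(edits1(word))
        or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
        or {word}  # nothing known nearby: return the input unchanged
    )
    return max(candidates, key=lambda w: counts[w])


# Toy corpus (an assumption); a real run would count words from a full book.
COUNTS = Counter(words("the quick brown fox jumps over the lazy dog the dog barks"))

print(correction("teh", COUNTS))
print(correction("dogg", COUNTS))
```

Word frequencies act as a crude language model here: among candidates at the same edit distance, the most common word in the corpus wins, so the quality of the corrections depends heavily on the size of the text used to build `COUNTS`.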

Data

Books in plain text, in both English and Spanish. Entities are enriched with data from DBpedia.
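
To give an idea of what enrichment from DBpedia can look like, the sketch below builds a SPARQL query that looks up the abstract of an entity by its label and sends it to the public DBpedia endpoint with SPARQLWrapper. The helper names (`build_abstract_query`, `fetch_abstract`) and the choice of `dbo:abstract` as the enrichment field are assumptions for this example; the notebooks may query different properties.

```python
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(entity_label, lang="en"):
    """Build a SPARQL query for the abstract of the resource with this label.

    Hypothetical helper: the properties queried here are an assumption,
    not necessarily the ones used in the course notebooks.
    """
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = entity_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?resource ?abstract WHERE {{
  ?resource rdfs:label "{escaped}"@{lang} ;
            dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "{lang}")
}} LIMIT 1
""".strip()


def fetch_abstract(entity_label, lang="en"):
    """Run the query against DBpedia; returns the abstract text or None.

    Needs network access and the sparqlwrapper package.
    """
    # Imported lazily so the query builder works without the dependency.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
    sparql.setQuery(build_abstract_query(entity_label, lang))
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else None


if __name__ == "__main__":
    print(build_abstract_query("Gabriel García Márquez", lang="es"))
```

Matching on an exact `rdfs:label` is the simplest strategy but also the most brittle: entity names extracted from a book often need normalization or disambiguation before they match a DBpedia label.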

Python Dependencies

    conda install -c conda-forge spacy
    python -m spacy download en_core_web_sm
    python -m spacy download es_core_news_sm
    conda install -c conda-forge sparqlwrapper
    pip install pyspellchecker
    conda install -c anaconda gensim
    conda install -c conda-forge wordcloud
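
After installing, a quick way to confirm that the environment is complete is to ask Python which of these packages it can see. This is only a convenience sketch using the standard library; the `installed_versions` helper is a hypothetical name, and the package list is simply copied from the commands above.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
REQUIRED = ["spacy", "SPARQLWrapper", "pyspellchecker", "gensim", "wordcloud"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Note that this only checks the Python packages; the spaCy language models (`en_core_web_sm`, `es_core_news_sm`) are downloaded separately with the `python -m spacy download` commands.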

Software Version

  • Python 3.8.5
  • spaCy 3.0.5

Contributing and Feedback

Any kind of feedback or suggestion is greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc.). If you want to contribute to the course, you can do so through a pull request (PR).

Author

  • Created by Andrés Segura Tinoco
  • Created on June 04, 2019
  • Updated on July 04, 2021

License

This project is licensed under the terms of the MIT license.

Acknowledgments

I would like to thank Project Gutenberg for sharing the books in English and Peter Norvig for the spell checker algorithm.
