
nlp_quickbook's Introduction

Natural Language Processing Notebooks

Written for Practicing Engineers

This work builds on the outstanding existing work on Natural Language Processing, ranging from classics like Jurafsky's Speech and Language Processing to more modern work such as The Deep Learning Book by Ian Goodfellow et al.

While those are great introductory textbooks for college students, this is intended for practitioners to quickly read, skim, select what is useful, and then proceed. The notebooks are divided into 7 logical themes.

Each section builds on ideas and code from previous notebooks, but you can fill in the gaps mentally and jump directly to what interests you.

Chapter 01

Introduction To Text Processing, with Text Classification

  • Perfect for Getting Started! We learn better with code-first approaches

Chapter 02

  • Text Cleaning notebook: a code-first approach with supporting explanation. Covers some simple ideas like:
    • Stop words removal
    • Lemmatization
  • Spell Correction notebook: covers almost everything you need to get started with spell correction, similar-word problems, and so on
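
As a rough illustration of the ideas listed for this chapter, here is a minimal sketch of stop word removal followed by edit-distance spell correction. It is not taken from the notebooks: the tiny `STOP_WORDS` set, the toy vocabulary, and the helper names `tokenize`, `remove_stop_words`, `edits1` and `correct` are all assumptions made for the example, and it uses only the standard library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def edits1(word):
    """Return every string that is one edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, vocab_counts):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in vocab_counts:
        return word
    candidates = [w for w in edits1(word) if w in vocab_counts]
    if not candidates:
        return word
    return max(candidates, key=vocab_counts.get)


tokens = tokenize("The cat is sitting in the gardn")
content = remove_stop_words(tokens)

# Toy word-frequency table standing in for counts from a real corpus.
vocab = Counter(["cat", "garden", "garden", "sitting", "dog"])

print(content)
print([correct(t, vocab) for t in content])
```

In practice the vocabulary counts would come from a large corpus, and a library stop-word list would replace the hand-written set.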

Chapter 03

Leveraging Linguistics is an important part of any practitioner's toolkit. Using spaCy and textacy, we look at two interesting challenges and how to tackle them:

  • Redacting names
    • Named Entity Recognition
  • Question and Answer Generation
    • Part of Speech Tagging
    • Dependency Parsing
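
The name-redaction idea can be sketched without loading a model, assuming the entity spans have already been produced by an NER step. In spaCy, each entity in `doc.ents` exposes `start_char`, `end_char` and `label_`, which map onto the tuples used here. The `redact` function, the mask string, and the hand-written spans are hypothetical and only illustrate the masking step.

```python
def redact(text, entities, labels=("PERSON",), mask="[REDACTED]"):
    """Replace every entity span whose label is in `labels` with `mask`.

    `entities` is an iterable of (start_char, end_char, label) tuples,
    assumed to come from a named entity recognizer and not to overlap.
    """
    # Work from right to left so that earlier character offsets stay valid.
    for start, end, label in sorted(entities, reverse=True):
        if label in labels:
            text = text[:start] + mask + text[end:]
    return text


text = "Alice met Bob in Paris."
# Hand-written spans standing in for NER output.
entities = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "GPE")]

print(redact(text, entities))
```

Masking from the end of the string backwards avoids recomputing offsets after each replacement.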

Chapter 04

Text Representations is about converting text to numerical representations, also known as vectors

  • Covers the popular celebrities (word2vec, fastText and doc2vec) and document similarity using them
  • Programmer's Guide to gensim
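
To make "text to vectors" concrete, here is a minimal sketch that uses plain bag-of-words counts and cosine similarity. This is a simpler technique than word2vec, fastText or doc2vec and is not the gensim API; the function names `bow_vector` and `cosine` and the three toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply",
]

# A fixed word order gives every document a vector of the same length.
vocabulary = sorted({w for d in docs for w in d.split()})
vectors = [bow_vector(d, vocabulary) for d in docs]

print(cosine(vectors[0], vectors[1]))  # documents sharing several words
print(cosine(vectors[0], vectors[2]))  # documents sharing no words
```

Learned embeddings replace these sparse counts with dense vectors, but the document-similarity step (comparing two vectors with cosine similarity) stays the same.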

Chapter 05

Modern Methods for Text Classification is simple, exploratory and talks about:

  • Simple classifiers from scikit-learn and how to optimize them
  • How to combine and ensemble them for increased performance
  • Builds intuition for ensembling - so that you can write your own ensembling techniques
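
One simple way to build that intuition is hard majority voting over the predictions of several classifiers. The sketch below is an assumption-laden illustration, not the notebook's code: the `majority_vote` helper and the three prediction lists are made up, and scikit-learn's `VotingClassifier` offers a ready-made version of the same idea.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model predictions by taking the most common label per sample.

    `predictions_per_model` is a list of equal-length lists, one per model,
    aligned so that position i always refers to the same sample.
    """
    ensembled = []
    for votes in zip(*predictions_per_model):
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled


# Hypothetical predictions from three classifiers on four samples.
model_a = ["pos", "neg", "pos", "neg"]
model_b = ["pos", "pos", "pos", "neg"]
model_c = ["neg", "neg", "pos", "pos"]

print(majority_vote([model_a, model_b, model_c]))
```

Using an odd number of models on a two-class problem avoids ties; other setups need an explicit tie-breaking rule.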

Chapter 06

Deep Learning for NLP is less about fancy data modeling, and more about engineering for Deep Learning

  • From-scratch code tutorial with Text Classification as an example
  • Using PyTorch and torchtext
  • Write our own data loaders, pre-processing, training loop and other utilities
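
The core of a hand-written data loader is batching variable-length sequences and padding them to a common length. The sketch below shows that step in plain Python so it runs without PyTorch or torchtext. The `batch_iterator` name, the `pad_id` value and the toy examples are assumptions; a real pipeline would convert each batch to tensors before the training loop consumes it.

```python
import random


def batch_iterator(examples, batch_size, pad_id=0, shuffle=True, seed=None):
    """Yield (padded_token_ids, labels) batches from (token_ids, label) pairs."""
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for i in range(0, len(order), batch_size):
        batch = [examples[j] for j in order[i:i + batch_size]]
        # Pad every sequence in the batch to the longest one in that batch.
        max_len = max(len(ids) for ids, _ in batch)
        padded = [ids + [pad_id] * (max_len - len(ids)) for ids, _ in batch]
        labels = [label for _, label in batch]
        yield padded, labels


# Toy dataset: token id sequences of different lengths with binary labels.
examples = [([4, 7, 2], 1), ([5], 0), ([3, 9], 1)]
batches = list(batch_iterator(examples, batch_size=2, shuffle=False))

for padded, labels in batches:
    print(padded, labels)
```

Padding per batch instead of to a global maximum keeps batches small when most sequences are short.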

Chapter 07

Building your own Chatbot from scratch in 30 minutes. We use this to explore unsupervised learning and put together several of the ideas we have already seen.

  • simpler, direct problem formulation instead of the complicated chatbot tutorials commonly seen
  • intents, responses and templates in chatbot parlance
  • hacking a word-based similarity engine to work with little to no training samples
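
The intent-and-response formulation can be sketched with a very small matcher. This example scores a message against a few example phrases per intent using word overlap (Jaccard similarity), which is a stand-in for whatever similarity engine the notebook actually builds. The `INTENTS` table, the `FALLBACK` text, the threshold and all function names are hypothetical.

```python
import re

# Hypothetical intents: a few example phrases and one canned response each.
INTENTS = {
    "greeting": {
        "examples": ["hello there", "hi", "good morning"],
        "response": "Hello! How can I help?",
    },
    "hours": {
        "examples": ["when are you open", "what are your opening hours"],
        "response": "We are open 9am to 5pm.",
    },
}
FALLBACK = "Sorry, I did not understand that."


def words(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def reply(message, intents=INTENTS, threshold=0.2):
    """Answer with the response of the best-matching intent, or a fallback."""
    query = words(message)
    best_intent, best_score = None, 0.0
    for name, spec in intents.items():
        score = max(jaccard(query, words(ex)) for ex in spec["examples"])
        if score > best_score:
            best_intent, best_score = name, score
    if best_intent is None or best_score < threshold:
        return FALLBACK
    return intents[best_intent]["response"]


print(reply("hi"))
print(reply("what hours are you open"))
print(reply("tell me a joke"))
```

Because matching relies only on a handful of example phrases per intent, no training step is needed; swapping `jaccard` for a word-vector similarity would let it match paraphrases that share no words.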

nlp_quickbook's People

Contributors

dependabot[bot], nirantk, theainerd


nlp_quickbook's Issues

anaconda-client version might be outdated

Running pip install -r requirements.txt in the virtual environment produces this error:

Collecting anaconda-client==1.6.14 (from -r requirements.txt (line 1))
  ERROR: Could not find a version that satisfies the requirement anaconda-client==1.6.14 (from -r requirements.txt (line 1)) (from versions: 1.1.1, 1.2.2)
ERROR: No matching distribution found for anaconda-client==1.6.14 (from -r requirements.txt (line 1))

I did some digging and found that PyPI currently lists only two versions, 1.1.1 and 1.2.2 (https://pypi.org/project/anaconda-client/#history). I changed the pin to 1.2.2 and it worked fine.

nlp = spacy.load('en'), error appeared

When I run nlp = spacy.load('en') in Jupyter, I get the following error.


OSError Traceback (most recent call last)
in ()
1 #python -m spacy download en as Administrator
----> 2 nlp = spacy.load('en')

C:\Users\Public\Anaconda3\lib\site-packages\spacy\__init__.py in load(name, **overrides)
13 if depr_path not in (True, False, None):
14 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 15 return util.load_model(name, **overrides)
16
17

C:\Users\Public\Anaconda3\lib\site-packages\spacy\util.py in load_model(name, **overrides)
117 elif hasattr(name, 'exists'): # Path or Path-like to model data
118 return load_model_from_path(name, **overrides)
--> 119 raise IOError(Errors.E050.format(name=name))
120
121

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

You suggested running "python -m spacy download en as Administrator", but that gave me the following error:
(base) C:\Users\Public>python -m spacy download en as Administrator
Collecting as
Could not find a version that satisfies the requirement as (from versions: )
No matching distribution found for as

(base) C:\Users\Public>python
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> -m spacy download en as Administrator
  File "<stdin>", line 1
    -m spacy download en as Administrator
^
SyntaxError: invalid syntax
>>> exit()


Please review and advise.
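
One likely reading of the output above is that "as Administrator" described how to open the command prompt and was never part of the command: pip tried to install a package called "as", and the Python prompt rejected the line because it is a shell command, not Python. Under that assumption, the sketch below builds the download command for the interpreter that is currently running, so the model lands in the same environment that Jupyter uses. The helper names are hypothetical. The `en` shortcut matches the spaCy version in the traceback; newer spaCy releases dropped shortcut names in favour of full package names such as `en_core_web_sm`.

```python
import subprocess
import sys


def spacy_download_command(model="en"):
    """Build the spaCy download command for the current Python interpreter."""
    return [sys.executable, "-m", "spacy", "download", model]


def download_model(model="en"):
    """Run the download; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(spacy_download_command(model), check=True)


if __name__ == "__main__":
    # Print the command only; call download_model() to actually run it.
    print(" ".join(spacy_download_command("en")))
```

Calling `download_model()` from a notebook cell, or typing the printed command into a command prompt opened with administrator rights, should both have the same effect, assuming spaCy itself is already installed in that environment.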
