CMU LTI Low Resource NLP Bootcamp 2020

This page collects the materials for a low-resource natural language and speech processing bootcamp held by the Carnegie Mellon University Language Technologies Institute in May 2020. The bootcamp was held virtually for some visitors to the institute, but we are making the videos and materials available for those interested in learning on their own. It comes in 8 parts, each with a lecture video and example exercises that you can do to expand your knowledge.

1. NLP Tasks

This lecture by Graham Neubig gives a high-level overview of a variety of NLP tasks (slides).

The exercise has participants download spaCy and look at the types of linguistic output generated in its tutorial. Participants also examine the Universal Dependencies treebanks to see which other languages have annotated data of the kind that spaCy's analysis produces.
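
Universal Dependencies treebanks are distributed in the CoNLL-U format, where each token is one line of ten tab-separated columns, comment lines start with `#`, and a blank line ends a sentence. The following is a minimal sketch of reading that format with only the Python standard library. The `parse_conllu` helper and the three-token sample are hypothetical, written for this illustration, and are not taken from the bootcamp exercise or from a real treebank.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Return a list of sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # not a well-formed token line
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written toy sample (an assumption, not real treebank data).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

for token in parse_conllu(SAMPLE)[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

The same loop can be pointed at any downloaded `.conllu` file by reading it as UTF-8 text and passing the contents to `parse_conllu`.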

2. Linguistics - Phonology and Morphology

This lecture by David Mortensen gives some linguistic background on phonology and morphology (slides).

The exercise has participants use epitran to generate phonetic transcriptions of words, and try to read some words written in the International Phonetic Alphabet (IPA).

3. Machine Translation

This lecture by Antonis Anastasopoulos explains machine translation, both phrase-based and neural (slides).

The exercise runs through tutorials on word alignment with fast-align, and neural machine translation with JoeyNMT using data from the Latvian-English translation task at WMT.
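
To give a feel for what a word aligner learns, here is a minimal sketch of IBM Model 1 trained with expectation-maximization. This is a deliberately simpler relative of what fast-align implements (fast-align is a reparameterization of IBM Model 2), and it omits the NULL word. The function names and the three-sentence German-English toy corpus are assumptions for illustration; they are not the WMT Latvian-English data used in the exercise.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SentencePair = Tuple[List[str], List[str]]


def train_ibm1(pairs: List[SentencePair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(f, e)] = P(source word f | target word e) with EM."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(src_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: collect expected co-occurrence counts.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer for this source word
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def align(fs: List[str], es: List[str], t: Dict[Tuple[str, str], float]) -> List[Tuple[int, int]]:
    """Link each source position to its most probable target position."""
    return [
        (i, max(range(len(es)), key=lambda j: t.get((f, es[j]), 0.0)))
        for i, f in enumerate(fs)
    ]


# Toy corpus (an assumption, chosen only to make the behaviour visible).
CORPUS: List[SentencePair] = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

table = train_ibm1(CORPUS)
for fs, es in CORPUS:
    print(fs, es, align(fs, es, table))
```

Even though no pair is labeled with alignments, the co-occurrence of "das" with "the" in two sentences is enough for EM to pull the remaining words toward their counterparts.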

4. Linguistics - Syntax and Morphosyntax

This lecture by Lori Levin explains aspects of linguistics related to syntax and morphosyntax (slides).

The exercise consists of creating an interlinear gloss for the language of your choice.
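
An interlinear gloss is usually laid out in three lines: the segmented source words, a morpheme-by-morpheme gloss aligned under each word, and a free translation. The sketch below shows one way to produce that layout as plain text. The `format_gloss` helper is hypothetical, and the Spanish example is a simplified gloss written for this illustration rather than material from the exercise.

```python
from typing import List


def format_gloss(words: List[str], glosses: List[str], translation: str) -> str:
    """Lay out an interlinear gloss with each gloss aligned under its word."""
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    for word, gloss in zip(words, glosses):
        # Hyphens mark morpheme boundaries, so both lines must have the same number.
        if word.count("-") != gloss.count("-"):
            raise ValueError(f"morpheme count mismatch: {word!r} vs {gloss!r}")

    # Column width is the longer of the word and its gloss.
    # Note: len() counts code points, so combining marks may need extra care.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]

    def pad(items: List[str]) -> str:
        return "  ".join(item.ljust(width) for item, width in zip(items, widths)).rstrip()

    return "\n".join([pad(words), pad(glosses), f"'{translation}'"])


print(format_gloss(
    ["Los", "gato-s", "duerm-en"],
    ["the.M.PL", "cat-PL", "sleep-3PL"],
    "The cats sleep.",
))
```

Checking that the word and gloss lines contain the same number of hyphens catches the most common slip when glossing by hand, a morpheme segmented on one line but not on the other.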

5. Neural Representation Learning

This lecture by Pengfei Liu explains various methods for learning neural representations of language (slides).

The exercise, by Antonis Anastasopoulos, introduces how to learn word representations with fastText, use them for simple text classification, and find similar words.
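
Finding similar words with learned representations typically comes down to ranking the vocabulary by cosine similarity to a query vector. The sketch below shows that ranking step in plain Python. The three-dimensional vectors are hand-picked toy values (an assumption), not embeddings produced by fastText, and `nearest` is a hypothetical helper rather than part of the fastText API.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word: str, vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k words whose vectors are closest to the query word's vector."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors chosen by hand so that animals and vehicles form two groups.
TOY_VECTORS: Dict[str, List[float]] = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

for neighbor, score in nearest("cat", TOY_VECTORS, k=2):
    print(f"{neighbor}\t{score:.3f}")
```

With real embeddings the dictionary would hold one vector per vocabulary word, and for large vocabularies the ranking is usually done with matrix operations rather than a Python loop.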

6. Multilingual NLP

This lecture by Yulia Tsvetkov explains how you can train multilingual NLP systems that work in many different languages (slides).

The exercise, by Chan Park, provides two Jupyter notebooks: one explains how to train a Naive Bayes classifier for classification across languages, and the other introduces how to use multilingual BERT for cross-lingual classification.
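
As a rough illustration of how a Naive Bayes text classifier works, the sketch below implements a multinomial model with add-one smoothing over character bigrams and applies it to a toy English/Spanish language identification task. The task, the features, the six training sentences, and the class and function names are all assumptions for illustration; the notebooks' own data and setup are not reproduced here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts: Counter = Counter(labels)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(feature | label) over the text's features."""
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for feature in char_ngrams(text):
            # Unseen n-grams fall back to the smoothed floor of 1 / denom.
            score += math.log((counts[feature] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Toy training data (an assumption, far too small for real use).
TRAIN_TEXTS = [
    "the cat sat on the mat",
    "where is the train station",
    "this is a good book",
    "el gato está en la casa",
    "dónde está la estación de tren",
    "este es un buen libro",
]
TRAIN_LABELS = ["en", "en", "en", "es", "es", "es"]

model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
for sentence in ["the dog is in the house", "la casa es buena"]:
    print(sentence, "->", model.predict(sentence))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen bigram from ruling out a class entirely.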

7. Speech Synthesis

This lecture by Alan Black explains speech synthesis, the generation of speech from text (slides: overview, building voices, unwritten languages).

The exercise demonstrates how you can build your own talking clock using your voice in a language of your choice; instructions are available here and here.
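
A talking clock has to turn the current time into a sentence before any voice can speak it. The sketch below covers only that text-generation step, for English. The phrasing template and the `time_to_phrase` and `number_to_words` names are assumptions for illustration and are not taken from the linked instructions; recording prompts and building the voice are outside what it shows.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 59."""
    if not 0 <= n <= 59:
        raise ValueError("only 0..59 is supported")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def time_to_phrase(hour: int, minute: int) -> str:
    """Render a 24-hour time as a sentence a synthesizer could read aloud."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0..23 and minute 0..59")
    hour12 = hour % 12 or 12            # 0 and 12 are both spoken as "twelve"
    period = "a m" if hour < 12 else "p m"
    if minute == 0:
        middle = "o'clock"
    elif minute < 10:
        middle = f"oh {ONES[minute]}"   # e.g. 9:05 -> "nine oh five"
    else:
        middle = number_to_words(minute)
    return f"it is {number_to_words(hour12)} {middle} {period}"


print(time_to_phrase(9, 45))
```

Because the set of possible sentences is small and regular, a limited domain like this is a convenient first target when building a voice in a new language: the template and number words are the only parts that need translating.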

8. Speech Recognition

This lecture by Bhiksha Raj explains speech recognition, the conversion of speech into textual transcriptions (slides).

The exercise, by Hira Dhamyal, demonstrates how to build a speech recognition system in Kaldi, specifically focusing on the mini-librispeech example.
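
Speech recognition output is commonly scored by word error rate (WER): the word-level edit distance between the reference transcript and the system's hypothesis, divided by the number of reference words. The sketch below computes it with the standard dynamic-programming edit distance. `word_error_rate` is a hypothetical helper written for this illustration, not Kaldi's own scoring script, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.3f}")
```

Since insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.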
