text-mining

Unstructured Data Analysis (Graduate) @Korea University

Notice

Schedule

Topic 1: Introduction to Text Mining

  • The usefulness of large amounts of text data and the associated challenges
  • Overview of text mining methods

Topic 2: From Texts to Data

  • Obtain texts to analyze
  • Text data collection through APIs and web scraping

Topic 3: Natural Language Processing

  • Introduction to NLP
  • Lexical analysis
  • Syntax analysis
  • Other topics in NLP
  • Reading materials
    • Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48-57. (PDF)
    • Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug), 2493-2537. (PDF)
    • Young, T., Hazarika, D., Poria, S., & Cambria, E. (2017). Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709. (PDF)

Topic 4-1: Document Representation I: Classic Methods

  • Bag of words
  • Word weighting
  • N-grams
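
As a rough illustration of the three classic representations listed above, the following sketch builds bag-of-words counts, applies one common TF-IDF variant (idf = log(N / df)), and extracts bigrams. The whitespace tokenizer, the toy documents, and all function names are assumptions made for this example, not material from the course.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(docs):
    """Map each document to a Counter of raw term frequencies."""
    return [Counter(tokenize(d)) for d in docs]

def tf_idf(bows):
    """Weight raw counts by idf = log(N / df); other variants exist."""
    n_docs = len(bows)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bow in bows for term in bow)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in bow.items()}
        for bow in bows
    ]

# Toy corpus (hypothetical).
docs = ["the cat sat", "the dog sat", "the cat ran"]
bows = bag_of_words(docs)
weights = tf_idf(bows)
bigrams = ngrams(tokenize(docs[0]), 2)
```

With this weighting, a term that appears in every document (here "the") receives weight zero, while a term unique to one document (here "dog") receives the largest weight.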

Topic 4-2: Document Representation II: Distributed Representation

  • Word2Vec
  • GloVe
  • FastText
  • Doc2Vec

Topic 5: Dimensionality Reduction

  • Dimensionality Reduction
  • Supervised Feature Selection
  • Unsupervised Feature Extraction: Latent Semantic Analysis (LSA) and t-SNE
  • R Example

Topic 6: Document Similarity & Clustering

  • Document similarity metrics
  • Clustering overview
  • K-Means clustering
  • Hierarchical clustering
  • Density-based clustering
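
To make two of the items above concrete, here is a minimal sketch of cosine similarity between document vectors and of K-Means (Lloyd's algorithm) with squared Euclidean distance. The initial centroids are passed in explicitly to keep the run deterministic; the toy vectors and function names are assumptions for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def k_means(points, centroids, n_iter=10):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy 2-D "document vectors" (hypothetical).
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels, centroids = k_means(points, centroids=[points[0], points[2]])
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the loop stops once assignments no longer change; the fixed iteration count here is a simplification.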

Topic 7: Document Classification I

  • Document classification overview
  • Naive Bayesian classifier
  • k-Nearest Neighbor classifier
  • Classification tree
  • Support Vector Machine (SVM)
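
As one possible illustration of the first classifier in this list, the sketch below implements a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on a tiny labeled corpus. The training sentences, labels, and function names are invented for the example, and ignoring out-of-vocabulary words at prediction time is a design choice of this sketch rather than a requirement of the method.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(doc, model):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for token in doc.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothed log likelihood of the token given the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (hypothetical).
train_docs = [
    "great movie loved it",
    "wonderful great acting",
    "terrible boring movie",
    "awful boring plot",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict("great acting", model)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once documents contain more than a handful of words.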

Topic 8: Document Classification II

  • Introduction to neural networks
  • Recurrent neural network-based document classification
  • Convolutional neural network-based document classification

Topic 9-1: Topic Modeling I

  • Topic modeling overview
  • Probabilistic Latent Semantic Analysis: pLSA
  • LDA: Document Generation Process

Topic 9-2: Topic Modeling II

  • LDA Inference: Gibbs Sampling
  • LDA Evaluation
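
The inference step named above can be sketched as a collapsed Gibbs sampler: each token's topic is resampled in proportion to (document-topic count + alpha) × (topic-word count + beta) / (topic total + V × beta), with the token's own assignment removed from the counts first. This is a minimal sketch under assumed hyperparameters (alpha, beta, iteration count) and a toy corpus, none of which come from the course materials.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on pre-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]   # n(k, v)
    topic_total = [0] * n_topics                            # n(k)
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab

# Toy tokenized corpus (hypothetical).
docs = [
    ["apple", "banana", "fruit", "apple"],
    ["goal", "match", "team", "goal"],
    ["fruit", "banana", "apple"],
    ["team", "match", "goal"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
```

The returned count matrices can be normalized (after adding alpha and beta respectively) to estimate the document-topic and topic-word distributions; a real run would use far more iterations and discard an initial burn-in period.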

Contributors

  • pilsung-kang
