Text Mining

Assignments | Lab Worksheets | Syllabus

Overview

This course introduces students to the knowledge discovery process and methods used to mine patterns from a collection of text. We will critically review text mining methods developed in the knowledge discovery and databases, information science, and computational linguistics communities. Students will develop proficiency with modeling text through individual projects.

How can computers read? When we look at a paragraph of text, we have a set of skills to understand and interpret it: What is the message? Is it an argument? What is the sentiment? Computers don't have the same context or literacy. Their language is quantitative. Through text mining, this course will equip you with the skills to understand text through computing.
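
As a small illustration of what "quantitative" means here, the sketch below turns a sentence into word counts using only the Python standard library. The function name `word_counts` and the simple regular-expression tokenizer are assumptions made for this example, not part of the course materials; real projects usually rely on more careful tokenization.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into word tokens, and count each token.

    Hypothetical helper for illustration only: the tokenizer keeps runs of
    letters and apostrophes and discards everything else.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sample = "What is the message? Is it an argument? What is the sentiment?"
counts = word_counts(sample)

# Show the three most frequent tokens and how often each occurs.
print(counts.most_common(3))
```

Once text is represented as counts like these, it can be compared, aggregated, and visualized at scales that close reading cannot reach.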

Text mining is most useful in the new affordances that it allows. In most cases, the tools of text mining aren't meant to replace 'close reading'; they give us new ways to ask questions - about literature, news, scholarship, correspondence, etc. - and are best applied in service of that novelty. Computing allows for:

  • Scale: Computers compare poorly to us in their ability to interpret meaning, but the things they can do may be applied at enormous scales. If you're interested in hundreds of books, thousands of web pages, or millions of tweets, simply reading them is infeasible.
  • Re-contextualization: With text mining, you take apart texts and put them together in new ways. These give you new ways to understand information in a text or appreciate a book. Likewise, breaking down text to data also provides new comparative or critical tools. For example, we can understand what makes Jane Austen's books different from her contemporaries, or attribute authorship for anonymous or pseudonymous writing.
  • Summarization: Aggregation, extraction, and visualization all serve to report patterns to you. For example, text summarization models can extract the takeaway points from a set of medical literature.

A few final notes on course philosophy.

First, the broad view of text mining can encompass many disciplinary approaches. This course hews closely to the sub-area referred to as text analysis, which treats text mining in the service of qualitative questions. This is closest to the treatments in the digital humanities and computational social sciences.

For this course, you will be expected to learn new programming skills. Note that this is not a programming course. We will cover a subset of skills in Python that pertain to data science. Most of the time, your needs will be served by tinkering with and modifying code examples that I provide for you.

I understand the time constraints of being a student. To account for the time you will spend in this course learning new tools and writing code, I have tried to keep reading and writing loads reasonable.

Success in this course comes through many little steps. The assignments are small but frequent. If you look at the entire outline of ideas and skills in this course, it may seem overwhelming. However, taken one step at a time, learning the language of text mining won't be scary.

Pre- and Co-requisites

An introductory-level database and programming course, or permission of the instructor.

Required Texts

This course incorporates readings from a variety of sources. Readings will be openly accessible and posted on or linked from the course website. In addition to individual essays and papers, we will also return repeatedly to the following texts:

Schedule

  • Week 1: Introduction
  • Week 2: Fundamentals
  • Week 3: Features
  • Week 4: Text Mining for Art and Criticism
  • Week 5: Documentation Access; Natural Language Processing 1 - Part of Speech Tagging
  • Week 6: Natural Language Processing 2 - Information Extraction and Dependency Parsing
  • Week 7: Classification 1
  • Week 8: Classification 2
  • Week 9: Clustering
  • Week 10: Topic Modeling and Dimensionality Reduction 1
  • Week 11: Topic Modeling 2; Sentiment Analysis
  • Week 12: Visualization
  • Week 13: Word Embeddings
  • Week 14: What's Next: Remainder Notes from Text Mining

The week-to-week syllabus, with readings, slides, and schedule notes, is on the Syllabus page.

Assignments

  • 30% Lab Tasks - Due Weekly
  • 20% Small Assignments
    • 10% - Twitter Bot Assignment
    • 10% - Topic Modelling Assignment
  • 35% Text Mining Project
    • 5% Problem Statement
    • 5% Literature review + 5% Data collection
    • 20% Final report
  • 15% Participation
    • 5% Attendance
    • 10% Forum posts, comments, class engagement

Details are on the Assignments page.
