ptm's Introduction

Practical Introduction to Text Mining

Lecturer(s):

Dr. Andreas Niekler is a researcher at Leipzig University. He studied media technology at HTWK Leipzig and the University of the West of Scotland. After two years working as a freelance programmer and teacher at the Leipzig School of Media, he joined the NLP group at Leipzig University. In his dissertation project he worked with automated methods for content and topic analyses in news-oriented text sources.

Dr. Arnim Bleier is a postdoctoral researcher in the Department of Computational Social Science at GESIS. His research interests are in the fields of Natural Language Processing and Computational Social Science. In collaboration with social scientists, he develops Bayesian models for the content, structure and dynamics of social phenomena.

Date: 18.04 - 19.04.2018

Contents

The workshop provides an introduction to Natural Language Processing (NLP) with a special emphasis on the analysis of job advertisements. NLP techniques enable researchers to describe the contents of a collection of documents or to filter a huge collection for specific thematic aspects. This workshop concentrates on the basic concepts needed for quantitative text analysis. It starts with issues related to data import, moves on to frequency analysis and continues with co-occurrence analysis. Participants attend short theoretical lectures and are given R scripts to compute their own models in exemplary tutorials.

Goals

The course is targeted at labour market researchers and researchers from the humanities who are interested in analyzing large textual data sets. Participants will learn about the opportunities and limits of text mining methods for analyzing qualitative and quantitative aspects of large text collections. With example scripts provided in the programming language R, participants will learn how to carry out the individual steps of such an analysis on a corpus of job advertisements. We cover a range of text mining methods, starting with simple lexicometric measures such as word frequencies, key term extraction and co-occurrence analysis. Furthermore, we provide a short overview of more complex machine learning approaches such as topic models and supervised text classification. The goal is to provide a broad overview of technologies that are already established in the social sciences and that have the potential to be used in NLP-based labour market research.

Requirements

The workshop is hands-on and we will use the programming language R. Thus, we strongly recommend some basic knowledge of R. If you already have some knowledge of another programming language, learning R will be easy for you. However, since R is a statistical programming language, some of its concepts differ considerably from those of other languages. Participants without basic knowledge of R are strongly advised to learn at least a little in preparation for the course. For this, we provide links to material and online tutorials prior to the course through the GESIS e-learning platform.

For a very brief overview of common R commands see: Basic R functions

Tutorials

The course consists of tutorials originally described in this paper.

  1. Processing of textual data
  2. Frequency analysis
  3. Key term extraction
  4. Co-occurrence analysis
  5. Model inference
  6. Model application and visualization
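As a rough illustration of what tutorials 2 and 4 are about, the following minimal sketch counts term frequencies and document-level co-occurrences. It is written in Python for brevity and is not taken from the workshop's R scripts; the toy corpus, the `tokenize` helper and all variable names are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical toy corpus; the workshop itself uses a corpus of job advertisements.
documents = [
    "We are hiring a data analyst with R skills",
    "Data analyst wanted: strong R and statistics skills",
    "Nurse wanted for night shifts",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


tokenized = [tokenize(doc) for doc in documents]

# Frequency analysis: how often each term occurs across the whole corpus.
term_frequency = Counter(token for doc in tokenized for token in doc)

# Co-occurrence analysis: in how many documents two terms appear together.
# Pairs are stored in alphabetical order so that (a, b) and (b, a) coincide.
cooccurrence = Counter()
for doc in tokenized:
    for pair in combinations(sorted(set(doc)), 2):
        cooccurrence[pair] += 1

print(term_frequency.most_common(5))
print(cooccurrence[("analyst", "data")])
```

Raw counts like these are only a starting point; a fuller analysis would typically add preprocessing such as stop word removal and would apply statistical significance measures to the co-occurrence counts.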

License

These notebooks are built on tm4ss - Text Mining for Social Scientists and Digital Humanists by Gregor Wiedemann and Andreas Niekler.

Wiedemann, Gregor; Niekler, Andreas (2017): Hands-on: a five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH), GSCL 2017, Berlin.

Funded partially by the German Research Foundation (DFG). FKZ/project number: 324867496.

ptm's Issues

Security Vulnerability

Dear @arnim or @bitnik,
could you please fix this vulnerability?
Thanks.

1 jupyterhub vulnerability found in binder/requirements.txt

Remediation

Upgrade jupyterhub to version 0.9.6 or later. For example:

jupyterhub>=0.9.6

Always verify the validity and compatibility of suggestions with your codebase.

Details

CVE-2019-10255 (moderate severity)
Vulnerable versions: < 0.9.6
Patched version: 0.9.6

An Open Redirect vulnerability for all browsers in Jupyter Notebook before 5.7.8 and some browsers (Chrome, Firefox) in JupyterHub before 0.9.6 allows crafted links to the login page, which will redirect to a malicious site after successful login. Servers running on a base_url prefix are not affected.
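
To make the version range above concrete, here is a minimal Python sketch that checks whether a given jupyterhub version string falls below the patched release. The helper names `parse_version` and `is_vulnerable` are hypothetical, and the parser assumes plain dotted numeric versions such as `0.9.4`; it does not handle pre-release or other suffixes.

```python
# Patched release named in the advisory above.
PATCHED_VERSION = "0.9.6"


def parse_version(version):
    """Turn a dotted numeric version such as '0.9.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version):
    """Return True if the version is older than the patched release."""
    return parse_version(version) < parse_version(PATCHED_VERSION)


for candidate in ["0.9.4", "0.9.6", "0.10.0", "1.0.0"]:
    status = "vulnerable" if is_vulnerable(candidate) else "patched"
    print(candidate, status)
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" would sort before "0.9.6" and be misreported as vulnerable.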
