QTS – Quick Text Stats

QTS (Quick Text Stats) is a command-line tool based on NLTK that extracts basic measures from raw text using language-specific models. The output language matches the language of the text (i.e. English for English texts, Italian for Italian texts).

Currently supported languages

  • Italian
  • English

Currently included features

  • Number of sentences and tokens
  • Token/sentence ratio
  • Number of characters (with and without whitespaces)
  • Number of word types
  • Type/Token ratio
  • Number of content vs. functional words
  • Part-of-Speech distribution (e.g. number of NOUNs, VERBs, etc.)*
  • Part-of-Speech/sentence ratio (e.g. number of NOUNs per sentence)*

* English only (due to NLTK limitations). Uses the universal tagset.
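To make the measures listed above concrete, here is a minimal sketch of how they can be computed. It is not the code in QTS.py: it swaps NLTK's language-specific models for a naive regex tokenizer so that it runs with the standard library alone, and the function names `basic_stats` and `pos_distribution` are hypothetical. The Part-of-Speech helper assumes the sentences have already been tagged (for example with universal tags) and only does the counting.

```python
import re
from collections import Counter


def basic_stats(text):
    """Surface measures from raw text (naive regex tokenization, not NLTK)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a token is a run of word characters or apostrophes;
    # punctuation is dropped and tokens are lowercased before counting types.
    tokens = re.findall(r"[\w']+", text.lower())
    types = set(tokens)
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "tokens_per_sentence": len(tokens) / len(sentences) if sentences else 0.0,
        "chars_with_spaces": len(text),
        "chars_without_spaces": len(re.sub(r"\s", "", text)),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }


def pos_distribution(tagged_sentences):
    """Count tags and tags per sentence from pre-tagged (word, tag) pairs."""
    counts = Counter(tag for sent in tagged_sentences for _, tag in sent)
    n = len(tagged_sentences)
    per_sentence = {tag: c / n for tag, c in counts.items()} if n else {}
    return counts, per_sentence


if __name__ == "__main__":
    print(basic_stats("The cat sat. The dog ran!"))
    tagged = [
        [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
        [("Dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
    ]
    print(pos_distribution(tagged))
```

A real tokenizer handles abbreviations, clitics and punctuation differently, so the counts from this sketch will not match the tool's output exactly.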

Usage

$ python3 QTS.py it your_raw_text.txt  # Italian
$ python3 QTS.py en your_raw_text.txt  # English

Upcoming features

  • Extended language support (French, Spanish, Russian)
  • Dependency relation distribution (e.g. number of OBJ relations)
  • Dependency relation/sentence ratio (e.g. number of OBJ relations per sentence)
  • Other tagsets for PoS-tagging (e.g. PennTreebank tagset)
  • Readability indexes
  • Number of paragraphs(?)

Contributors

  • andreafailla
