Word_frequency_Lists_ITA

Handy frequency lists for Italian lexical words, calculated from the ItWac corpus (Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E., 2009).

Contents:

NOUNS

  • itwac_nouns_lemmas_notail_2_0_0.csv: List of word forms tagged as NOUNS. The minimum token frequency in this list is 3. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (zipf). Encoding: utf-8. Calculated using countlemma_v2.0.0.R.

  • itwac_nouns_lemmas_raw_2_0_0.zip: List of word forms tagged as NOUNS. The minimum token frequency in this list is 1. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (zipf). Encoding: utf-8. Calculated using countlemma_v2.0.0.R.
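The three frequency columns are related to one another. As a minimal sketch, assuming the common definition of the Zipf scale (log10 of frequency per billion words, which equals log10(fpmw) + 3), they can be derived from a raw count as follows. The corpus size used here is a placeholder for illustration, not the actual ItWac token count, and the R scripts in this repository may compute the values differently (for example, with smoothing).

```python
import math


def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Frequency per million words (fpmw) from a raw token count."""
    return raw_count * 1_000_000 / corpus_tokens


def zipf(fpmw: float) -> float:
    """Zipf value, assumed here to be log10(fpmw) + 3."""
    return math.log10(fpmw) + 3


# Hypothetical corpus size, for illustration only (NOT the real ItWac size).
CORPUS_TOKENS = 1_000_000_000

example_fpmw = per_million(2_500, CORPUS_TOKENS)
print(example_fpmw, round(zipf(example_fpmw), 2))
```

Under this definition, a word that occurs once per million words has a Zipf value of 3, and each additional unit corresponds to a tenfold increase in frequency.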

VERBS

  • itwac_verbs_lemmas_notail_2_1_0.csv: List of word forms tagged as lexical VERBS (no auxiliary verbs). Contains: wordform, lemma, POS, modality, POS2 (ideally, functional verbs), frequency (raw), frequency per million words (fpmw), frequency (zipf). Encoding: utf-8. Calculated using countlemma_verb_2_1_0.R.

  • itwac_verbs_list_of_lemmas_2_1_0.csv: List of lemmas from most to least represented across lexical VERB wordforms. Encoding: utf-8. Calculated using countlemma_verb_2_1_0.R.
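A lemma ranking of this kind can be rebuilt from a word-form list. The sketch below takes one plausible reading of "most to least represented", namely the number of distinct word forms attested for each lemma. That reading is an assumption, as are the sample rows, which are made up for illustration. The original script may rank lemmas differently, for instance by summed token frequency.

```python
from collections import Counter

# Hypothetical (wordform, lemma) pairs standing in for rows of the verb list.
ROWS = [
    ("mangio", "mangiare"),
    ("mangia", "mangiare"),
    ("mangiamo", "mangiare"),
    ("dormo", "dormire"),
    ("dorme", "dormire"),
    ("corre", "correre"),
]


def rank_lemmas(rows):
    """Return (lemma, number of distinct word forms) pairs, most represented first."""
    forms_per_lemma = Counter(lemma for _wordform, lemma in set(rows))
    return forms_per_lemma.most_common()


for lemma, n_forms in rank_lemmas(ROWS):
    print(lemma, n_forms)
```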

ADJECTIVES

  • itwac_adj_lemmas_notail_2_1_0.csv: List of word forms tagged as ADJ. The minimum token frequency in this list is 3. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (zipf). Encoding: utf-8. Calculated using countlemma_adj.R.

  • itwac_adj_lemmas_raw_2_1_0.zip: List of word forms tagged as ADJ. The minimum token frequency in this list is 1. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (zipf). Encoding: utf-8. Calculated using countlemma_adj.R.

CODE

  • countlemma_v2.0.0.R: Code used to produce a frequency list of all the NOUN forms present in ItWac, tagged for POS and lemma. This version handles large files faster than v_1.

  • countlemma_verb_2_1_0.R: Code used to produce a frequency list of all the VERB forms present in ItWac, tagged for POS, lemma, and modality.

  • countlemma_adj.R: Code used to produce a frequency list of all the ADJ forms present in ItWac, tagged for POS and lemma.
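The "notail" and "raw" lists described above differ in their minimum token frequency (3 versus 1). As a rough sketch, a "raw" list could be trimmed to a "notail" one by dropping rows below the threshold. The column names, POS tags, and counts in the sample below are invented for illustration, so check the header of the actual CSV files before adapting it.

```python
import csv
import io

MIN_FREQ = 3  # threshold stated for the "notail" lists

# Tiny in-memory stand-in for a "raw" list; header names and counts are made up.
RAW_CSV = """wordform,lemma,POS,freq
casa,casa,NOUN,120
case,casa,NOUN,45
casetta,casetta,NOUN,2
casona,casona,NOUN,1
"""


def drop_tail(rows, min_freq=MIN_FREQ, freq_column="freq"):
    """Keep only rows whose raw frequency is at least min_freq."""
    return [row for row in rows if int(row[freq_column]) >= min_freq]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
notail = drop_tail(rows)
print([row["wordform"] for row in notail])
```

To work with a real file, replace the `io.StringIO` wrapper with `open(path, encoding="utf-8", newline="")`, since the lists are stated to be UTF-8 encoded.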
