The lftk from brucewlee

Student at the University of Pennsylvania.

I'm interested in language, intelligence, and measuring things.

See my open-source projects below and my papers on Scholar.

lftk's Issues

Is this function correct for calculating NERF?

Hello!

I am trying to estimate text readability with the NERF formula. However, the LingFeat library failed for me because of a dependency problem, as outlined in this GitHub issue.
As a workaround, I tried the LFTK library, which worked.

However, the feature names differ between LingFeat and LFTK, so I am not sure that my mapping between them is correct. I also could not find the 'Constituency Parse Tree Height' feature in LFTK, so for that one feature I used the tree-structure code from LingFeat, which worked.

Could you please confirm whether this mapping is correct, especially for 'Constituency Parse Tree Height'?

Thank you in advance.

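import math

import lftk
import spacy
from supar import Parser

# retrieve() is not defined in this snippet. It is assumed to be LingFeat's
# tree-structure (TrSF) feature extractor; the module path is an assumption
# and may differ between LingFeat versions.
from lingfeat._Syntactic.TrSF import retrieve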


# load models
nlp = spacy.load('en_core_web_sm')
SuPar = Parser.load('crf-con-en')

def preprocess(doc, short=False, see_token=False, see_sent_token=True):
    n_token = 1
    n_sent = 1
    token_list = []       # lowercased lemmas of the counted tokens
    sent_token_list = []  # per-sentence lists of alphabetic tokens
    sent_list = []        # raw sentence strings

    # count tokens and sentences, and build the lists
    for sent in doc.sents:
        n_sent += 1
        sent_list.append(sent.text)
        temp_list = []
        for token in sent:
            if token.text.isalpha():
                temp_list.append(token.text)
                # with short=False, words shorter than 3 characters are not counted
                if short or len(token.text) >= 3:
                    n_token += 1
                    token_list.append(token.lemma_.lower())
        # sentences with 3 or fewer alphabetic tokens are dropped
        if len(temp_list) > 3:
            sent_token_list.append(temp_list)

    result = {"n_token": n_token, "n_sent": n_sent}
    if see_token:
        result["token"] = token_list
    if see_sent_token:
        result["sent_token"] = sent_token_list
    return result

def calculate_nerf(f):
    # f: LFTK features plus the LingFeat tree-height feature 'to_TreeH_C'
    n_sent = f['t_sent']
    pos_sum = f['n_noun'] + f['n_verb'] + f['n_num'] + f['n_adj'] + f['n_adv']
    term_aoa_freq = (0.04876 * f['t_kup'] - 0.1145 * f['t_subtlex_us_zipf']) / n_sent
    term_pos_tree = (0.3091 * pos_sum + 0.1866 * f['n_noun'] + 0.2645 * f['to_TreeH_C']) / n_sent
    term_unique = (1.1017 * f['t_uword']) / math.sqrt(f['t_word'])
    return term_aoa_freq + term_pos_tree + term_unique - 4.125


text = 'This is simple example sentence. This is another example sentence.'
doc = nlp(text)

# initiate LFTK extractor by passing in doc
LFTK = lftk.Extractor(docs = doc)
LFTK.customize(stop_words=True, punctuations=False, round_decimal=3)

preprocessed_features = preprocess(doc, short=False, see_token=False, see_sent_token=True)
# the returned dict includes the tree-height feature 'to_TreeH_C' used in calculate_nerf
TrSF = retrieve(SuPar, preprocessed_features['sent_token'])
feature_keys = ['t_kup', 't_subtlex_us_zipf', 't_sent', 'n_noun', 'n_verb', 'n_adj', 'n_adv', 'n_num', 't_uword', 't_word']

extracted_features = LFTK.extract(features = feature_keys)
extracted_features.update(TrSF)

# convert to float
extracted_features = {k: float(v) for k, v in extracted_features.items()}

print(calculate_nerf(extracted_features))
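Written out as an equation, calculate_nerf in the snippet computes the following. This only restates the snippet's own coefficients and LFTK/LingFeat keys so the mapping can be checked term by term; it does not confirm that they match the published NERF definition.

```latex
\mathrm{NERF} =
  \frac{0.04876\,\mathit{t\_kup} - 0.1145\,\mathit{t\_subtlex\_us\_zipf}}{\mathit{t\_sent}}
  + \frac{0.3091\,(\mathit{n\_noun} + \mathit{n\_verb} + \mathit{n\_num} + \mathit{n\_adj} + \mathit{n\_adv})
          + 0.1866\,\mathit{n\_noun} + 0.2645\,\mathit{to\_TreeH\_C}}{\mathit{t\_sent}}
  + \frac{1.1017\,\mathit{t\_uword}}{\sqrt{\mathit{t\_word}}} - 4.125
```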

Whl file

Hello,
I was able to build the package and create the .whl file, but I could not import it.
Could you please check why, or provide a working file?

The .whl file is here:
https://drive.google.com/drive/folders/1SnDOXHBCpyBH1nSO72lQgqB1QDjiOxz2?usp=sharing

I need to install it without an internet connection and without the .tar.gz file. Could you please suggest how?
Thank you
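A wheel is an ordinary zip archive, so one quick check is whether the built file actually contains the package directory. One common cause of this symptom is a wheel that ships only its .dist-info metadata: it installs without error, but the import then fails. The helper names below are hypothetical, and "lftk" as the package directory is an assumption based on `import lftk`.

```python
import zipfile


def wheel_top_level_entries(wheel_path):
    """Return the sorted top-level names stored inside a .whl archive."""
    with zipfile.ZipFile(wheel_path) as whl:
        names = whl.namelist()
    return sorted({name.split("/", 1)[0] for name in names})


def wheel_contains_package(wheel_path, package="lftk"):
    """Check that the wheel ships <package>/__init__.py (assumed package layout)."""
    with zipfile.ZipFile(wheel_path) as whl:
        return f"{package}/__init__.py" in whl.namelist()
```

For the offline install itself, pip can install a local wheel without contacting PyPI, e.g. `pip install --no-index path/to/<wheel file>.whl`; any dependencies must also be available locally, for example as wheels in a directory passed with `--find-links`.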

bug in the `total_number_of_unique_words_no_lemma` function

Hi, thank you for your awesome work! I would like to bring to your attention a possible bug in the implementation of the total_number_of_unique_words_no_lemma function.

In lines 194-195 of lftk/foundation/wordsent.py, total_number_of_unique_words_no_lemma still lemmatizes the words, even though it is supposed to skip lemmatization. This makes it almost identical to total_number_of_unique_words.

As a result, corr_ttr comes out the same as corr_ttr_no_lem.
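To illustrate the expected difference, here is a minimal sketch that counts unique words with and without lemmatization and feeds both into a corrected type-token ratio. It is not LFTK's code: the function names are hypothetical, tokens are given as (text, lemma) pairs (as one could build from spaCy with `[(t.text, t.lemma_) for t in doc if t.is_alpha]`), and it assumes corr_ttr follows Carroll's definition, unique words / sqrt(2 × total words).

```python
import math


def unique_words(tokens, use_lemma):
    """Count distinct word types from (text, lemma) pairs.

    use_lemma=False counts lowercased surface forms (the "no_lemma" variant);
    use_lemma=True counts lowercased lemmas.
    """
    index = 1 if use_lemma else 0
    return len({pair[index].lower() for pair in tokens})


def corrected_ttr(n_unique, n_total):
    """Carroll's corrected type-token ratio: types / sqrt(2 * tokens)."""
    return n_unique / math.sqrt(2 * n_total)


tokens = [("run", "run"), ("runs", "run"), ("running", "run"), ("ran", "run")]
no_lem = unique_words(tokens, use_lemma=False)  # 4 distinct surface forms
lem = unique_words(tokens, use_lemma=True)      # 1 distinct lemma
print(corrected_ttr(no_lem, len(tokens)), corrected_ttr(lem, len(tokens)))
```

With a correct no-lemma count, the two ratios differ whenever several inflected forms share a lemma; if both counts lemmatize, they collapse to the same value, which matches the reported symptom.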
