ruts's Issues

Error when computing statistics on short texts

A simple example:

from ruts import DiversityStats
ds = DiversityStats('саид, ты опять абдулле насолил?').get_stats()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 163, in get_stats
    'dttr': self.dttr,
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 119, in dttr
    return calc_dttr(self.words)
  File "/usr/local/lib/python3.6/dist-packages/ruts/diversity_stats.py", line 300, in calc_dttr
    return log10(n_words)**2 / (log10(n_words) - log10(n_lexemes))
ZeroDivisionError: float division by zero

Tested on version 0.5.0.
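The traceback points at the denominator `log10(n_words) - log10(n_lexemes)`, which is zero whenever every word in the text is unique (here, 5 words and 5 lexemes). Below is a minimal sketch of a guard, using the same formula as in the traceback. The name `safe_dttr` and the choice to return `0.0` in the degenerate case are assumptions made for illustration, not the library's actual fix.

```python
from math import log10


def safe_dttr(n_words: int, n_lexemes: int) -> float:
    """Hypothetical helper: the formula from the traceback plus a zero guard."""
    if n_words <= 0 or n_lexemes <= 0:
        # log10 is undefined for non-positive counts; treat as "no diversity".
        return 0.0
    denominator = log10(n_words) - log10(n_lexemes)
    if denominator == 0:
        # All words are unique, so the original expression divides by zero.
        # Returning 0.0 is an assumed convention for this sketch.
        return 0.0
    return log10(n_words) ** 2 / denominator


print(safe_dttr(5, 5))     # the short text from the report: 5 words, all unique
print(safe_dttr(100, 10))  # a non-degenerate case
```

Returning a sentinel keeps `get_stats()` usable on short texts; raising a descriptive exception or returning `float('nan')` would be equally defensible choices.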

Difference between ruts.DiversityStats and the standalone functions

I noticed some strange behavior: every value differs between the two.
t1 = 'Бальзам хороший, но пришёл один а не два, как написано '

import ruts
ds = ruts.DiversityStats(t1)
ds.get_stats()

{'ttr': 1.0,
'rttr': 3.162277660168379,
'cttr': 2.23606797749979,
'httr': 1.0,
'sttr': 0,
'mttr': 0.0,
'dttr': 0,
'mattr': 1.0,
'msttr': 1.0,
'mtld': 0.0,
'mamtld': 1.0,
'hdd': -1,
'simpson_index': 0,
'hapax_index': 0}

vs

print('ttr', ruts.diversity_stats.calc_ttr(t1))
print('rttr', ruts.diversity_stats.calc_rttr(t1))
print('cttr', ruts.diversity_stats.calc_cttr(t1))
print('httr', ruts.diversity_stats.calc_httr(t1))
print('sttr', ruts.diversity_stats.calc_sttr(t1))
print('mttr', ruts.diversity_stats.calc_mttr(t1))
print('dttr', ruts.diversity_stats.calc_dttr(t1))
print('mattr', ruts.diversity_stats.calc_mattr(t1))
print('msttr', ruts.diversity_stats.calc_msttr(t1))
print('mtld', ruts.diversity_stats.calc_mtld(t1))
print('mamtld', ruts.diversity_stats.calc_mamtld(t1))
print('hdd', ruts.diversity_stats.calc_hdd(t1))
print('simpson_index', ruts.diversity_stats.calc_simpson_index(t1))
print('hapax_index', ruts.diversity_stats.calc_hapax_index(t1))

ttr 0.4
rttr 2.9664793948382653
cttr 2.0976176963403033
httr 0.7713465066366824
sttr 0.5314553128319692
mttr 0.1313826679597258
dttr 7.611354035728222
mattr 0.41
msttr 0.42
mtld 14.338133470257823
mamtld 12.708333333333334
hdd 0.4587105249530551
simpson_index 15.0
hapax_index 319.06649307394474
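One explanation that fits these numbers: the traceback in the previous issue shows the class calling `calc_dttr(self.words)`, so the standalone functions appear to expect a sequence of words. Given the raw string `t1`, they would iterate over its characters instead. The sketch below does not call ruts at all. It uses hypothetical `ttr` and `rttr` helpers and a simple regex tokenizer (an assumption, not the library's tokenizer) to show that counting words reproduces the `ttr` and `rttr` values from `get_stats()`, while counting characters reproduces the values printed by the standalone calls.

```python
import re
from math import sqrt

t1 = 'Бальзам хороший, но пришёл один а не два, как написано '


def ttr(units):
    """Type-token ratio: distinct units divided by total units."""
    units = list(units)
    return len(set(units)) / len(units)


def rttr(units):
    """Root TTR: distinct units divided by the square root of total units."""
    units = list(units)
    return len(set(units)) / sqrt(len(units))


# Assumed tokenization for illustration only: lowercase runs of word characters.
words = re.findall(r"\w+", t1.lower())
# What a function sees if it iterates over the raw string.
chars = list(t1)

print('words:', len(words), ttr(words), rttr(words))
print('chars:', len(chars), ttr(chars), rttr(chars))
```

If this is indeed the cause, passing a list of words rather than the raw string to the `calc_*` functions should bring the two sets of results into line.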

[Feature request] A "normalization"/scaling option in Basic stats

I suggest adding an option to report most of the BasicStats() statistics as normalized (relative) values: divide every word count, except the total word count itself, by that total, and do the same for the character counts.
c_letters, c_syllables, n_complex_words, n_monosyllable_words, n_polysyllable_words, n_long_words, n_simple_words, n_unique_words would be divided by n_words.
n_letters, n_punctuations, n_spaces would be divided by n_chars.
It is more convenient to get these values directly in the output than to do the division yourself.
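Until such an option exists, the division can be done on the caller's side. The following is a rough sketch that works on a plain dictionary of statistics. The function `normalize_stats` is hypothetical, the sample values are made up for illustration, and treating `c_letters`/`c_syllables` as mappings whose values get scaled is an assumption about their shape.

```python
from collections.abc import Mapping

# Keys taken from the request above, grouped by their denominator.
WORD_KEYS = (
    "c_letters", "c_syllables", "n_complex_words", "n_monosyllable_words",
    "n_polysyllable_words", "n_long_words", "n_simple_words", "n_unique_words",
)
CHAR_KEYS = ("n_letters", "n_punctuations", "n_spaces")


def _scale(value, total):
    """Divide a number, or every value of a mapping, by total (0.0 if total is 0)."""
    if isinstance(value, Mapping):
        return {key: _scale(item, total) for key, item in value.items()}
    return value / total if total else 0.0


def normalize_stats(stats):
    """Return a copy of stats with word and character counts made relative."""
    result = dict(stats)
    groups = ((WORD_KEYS, stats.get("n_words", 0)), (CHAR_KEYS, stats.get("n_chars", 0)))
    for keys, total in groups:
        for key in keys:
            if key in stats:
                result[key] = _scale(stats[key], total)
    return result


# Made-up sample values, not real library output.
sample = {
    "n_words": 10, "n_chars": 55, "n_unique_words": 10, "n_long_words": 3,
    "n_letters": 43, "n_punctuations": 2, "n_spaces": 10,
    "c_letters": {2: 3, 7: 2},
}
normalized = normalize_stats(sample)
print(normalized)
```

The totals `n_words` and `n_chars` are left untouched, so the absolute counts can still be recovered from the normalized output.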
