
naivebayes's Introduction

Hi there 👋

I'm Michal, a data and open-source enthusiast with a background in statistics. I pursued my academic interests at the University of Vienna and Duke University, where I explored statistical analysis and programming.

Open source contributions

In my free time, I work on my own open-source projects and contribute to existing ones. Notably, I contribute to oHMMed, within the field of population genetics, and to AIRSHIP, which focuses on visualizing simulation results from clinical trials.

I'm dedicated to open source, driven by the belief in the benefits of collaborative development, transparent sharing of knowledge in the software community, and the personal growth that comes with it.

One of the interesting aspects of my journey has been the growing popularity of the naivebayes R package I developed, which has surpassed 300K downloads and is used in published, high-quality scientific research. It is rewarding to see it become a valuable tool in the R community, particularly for those diving into machine learning, as well as in the scientific community.

I am currently working on...

I am actively developing easyPlot, an intuitive graphical user interface (GUI) for ggplot2. easyPlot lets users create four fundamental types of graphs (scatterplots, histograms, boxplots, and bar charts) with just a few clicks.



naivebayes's People

Contributors

majkamichal


naivebayes's Issues

How to use additional density()-parameters for naive_bayes() tuning

I was wondering whether additional parameters of the stats::density() function can be used when calling naive_bayes().

I am applying the naive_bayes() classifier to a mixed-variable data set in which most of the numeric predictors are non-negative. For that reason, a log-normal distribution, or a KDE that assigns zero probability to values < 0, seems like a good choice in my case.
The stats::density() function, which you use for the KDE in naive_bayes(), has the arguments 'from' and 'to' that bound the estimation range, so it could ensure that probabilities for values < 0 are zero.

Is it possible to make use of these arguments when calling naive_bayes() with usekernel = TRUE?

Many thanks in advance for any reply, and best regards,
André
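
For reference, a minimal sketch of what this could look like, assuming naive_bayes() forwards extra arguments via ... to stats::density() when usekernel = TRUE (the data and argument choices below are purely illustrative):

library(naivebayes)

# A strictly non-negative numeric predictor
set.seed(1)
train <- data.frame(x = rexp(200))
cls <- factor(rbinom(200, 1, 0.5))

# Restrict the KDE support to [0, Inf) via density()'s 'from' argument;
# 'to' could cap the upper end of the estimation grid analogously.
fit <- naive_bayes(x = train, y = cls, usekernel = TRUE, from = 0)
fit$tables$x  # class-conditional density estimates starting at 0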

Numerical underflow in predict.naive_bayes

Naive Bayes is vulnerable to numerical underflow in the prediction step if the dimensionality of the predictors is much larger than the number of observations. For example, consider the following:

library(naivebayes)

# Wide problem: many more predictors (k) than observations (n)
n <- 100; k <- 2000
X <- matrix(rnorm(n * k), nrow = n)
b <- rnorm(k)
eta <- drop(X %*% b)
y <- rbinom(n, 1, plogis(eta))

# 80/20 train/test split
tr_idx <- 1:floor(0.8 * n)
Xtrn <- X[tr_idx, ]
ytrn <- y[tr_idx]
Xtst <- X[-tr_idx, ]
ytst <- y[-tr_idx]

fit <- naive_bayes(Xtrn, ytrn, usekernel = TRUE)
preds <- predict(fit, Xtst, type = "prob")
head(preds)  # mostly NaN

I believe this is due to the implementation of the log-sum-exp operation. If these lines are replaced with the equivalent, more numerically stable functions from the matrixStats package, such as logSumExp() and/or rowLogSumExps(), the underflow issue should go away.
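
For reference, the stable computation can be sketched in plain R as follows; this mirrors the "max trick" that matrixStats::logSumExp() implements in compiled code (function names here are illustrative):

# Stable log-sum-exp: subtract the maximum before exponentiating so the
# dominant term never underflows to 0.
log_sum_exp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))
}

# Turn per-class log-joint scores (log prior + summed log-densities) into
# posterior probabilities without ever forming the raw products.
posterior_from_log_joint <- function(log_joint) {
  exp(log_joint - apply(log_joint, 1, log_sum_exp))
}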

log(p) = -Inf

In predict.naive_bayes, around lines 46 and 50, p can be 0.
A guard such as p[p == 0] <- threshold should be added so that log(p) does not become -Inf.
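
A sketch of that guard on a toy vector (the threshold value is purely illustrative):

# Replace zero probabilities with a small positive threshold before taking
# logs, so log(p) stays finite.
threshold <- 0.001
p <- c(0.2, 0, 0.8)   # example probability vector containing a zero
p[p == 0] <- threshold
log(p)                # finite everywhere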

plot crashes when missing data present in trainingset

This works as expected

library(naivebayes)
m <- naive_bayes(Species ~ Sepal.Width, data=iris)
plot(m)

This crashes

iris$Sepal.Width[1] <- NA
m <- naive_bayes(Species ~ Sepal.Width, data=iris)
plot(m)

Error in seq.default(r[1], r[2], length.out = 512) : 
  'from' must be a finite number

Great package btw. I love how the naive_bayes interface is modeled after base R!

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] naivebayes_0.9.3

loaded via a namespace (and not attached):
[1] compiler_3.5.2 tools_3.5.2    yaml_2.2.0    
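
The error is consistent with the plotting range being computed from data that still contains NA, so seq() receives a non-finite 'from'; that diagnosis is an assumption on my part. A user-side workaround is to drop incomplete rows before fitting:

library(naivebayes)

# Reproduce the setup, then remove the incomplete row before fitting.
iris2 <- iris
iris2$Sepal.Width[1] <- NA
m <- naive_bayes(Species ~ Sepal.Width, data = iris2[complete.cases(iris2), ])
plot(m)  # plots normally, since the range of Sepal.Width is now finite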

Extracting feature importance

Excellent package! The most accessible approach to NB classification that I've found.

I'm wondering if there is a way to extract feature weight/importance from the model? I didn't see any relevant accessors nor any obvious slots in the naive_bayes object.
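
The fitted object stores the class-conditional distributions in fit$tables, but the package does not expose an importance accessor. One common, model-agnostic workaround is permutation importance; a sketch with hypothetical helper names (not part of the naivebayes API):

# The drop in accuracy after shuffling one feature, averaged over a few
# repetitions, is taken as that feature's importance.
permutation_importance <- function(fit, X, y, n_rep = 5) {
  acc <- function(data) mean(predict(fit, data) == y)
  base <- acc(X)
  sapply(names(X), function(j) {
    mean(replicate(n_rep, {
      Xp <- X
      Xp[[j]] <- sample(Xp[[j]])  # break the feature-target association
      base - acc(Xp)
    }))
  })
}

# Example usage:
# fit <- naive_bayes(Species ~ ., data = iris)
# permutation_importance(fit, iris[, -5], iris$Species)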

Error when feeding just 1 predictor into Naivebayes model

> i <- 2
> nbmodel <- naive_bayes(data = trainset, y = trainset$label, x = trainset[2:i], usekernel = TRUE)
> nbmodel_predict <- predict(nbmodel, as.vector(x_test))
Warning message:
In t(log_sum) + log(prior) :
  Recycling array of length 1 in array-vector arithmetic is deprecated.
  Use c() or as.vector() instead.

I suppose the package does not expect to handle a data set with just one feature? Or am I misunderstanding some fundamental concept here?
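
For what it's worth, here is a self-contained single-predictor example. The assumption is that the warning comes from the one-column subset losing its matrix/data.frame structure, so keeping the predictor as a data frame (rather than coercing new data with as.vector()) sidesteps it:

library(naivebayes)

set.seed(1)
train <- data.frame(label = factor(rbinom(100, 1, 0.5)), x1 = rnorm(100))

# Single-bracket indexing keeps a one-column data frame rather than a vector.
fit <- naive_bayes(x = train["x1"], y = train$label, usekernel = TRUE)

# New data is also kept as a one-column data frame.
newdata <- data.frame(x1 = rnorm(10))
predict(fit, newdata)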
