aeda's People

Contributors

daryabusen, migraber, tuanle618

aeda's Issues

Check whether all analyses can handle NAs in the data frame

I noticed that cluster analysis does not work with NAs. I will randomly insert NAs into the testthat/base_finishReport file and check whether every analysis can handle them.

In general: how should we handle missing values? Omit them, or do some kind of NA imputation, such as the column average?
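
To make the two options concrete, here is a minimal sketch in base R. The helper imputeMean is a hypothetical name, not an existing function of the package, and column-mean imputation is only one of several possible strategies.

```r
# Small slice of airquality (ships with base R) that contains NAs.
df = airquality[1:10, c("Ozone", "Solar.R", "Wind")]

# Option 1: drop every row that contains at least one NA.
df.omit = na.omit(df)

# Option 2 (hypothetical helper): replace NAs in numeric columns
# by the mean of the observed values in that column.
imputeMean = function(data) {
  for (col in names(data)) {
    x = data[[col]]
    if (is.numeric(x) && anyNA(x)) {
      x[is.na(x)] = mean(x, na.rm = TRUE)
      data[[col]] = x
    }
  }
  data
}
df.imputed = imputeMean(df)

# Omitting loses rows, imputing keeps all of them.
nrow(df.omit)
nrow(df.imputed)
```

Omitting is the simplest choice but shrinks the data, while imputing keeps every row at the cost of distorting the column's variance. Whichever we pick should probably be applied in one place, before the individual analyses run.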

makeTask

I think we should create a function makeTask that has some of the features of the current makeCorrTask (basically only ID and data, and we can add getDataTypes as well).
Its result would be assigned a class such as ReportTask, and all other tasks would then inherit from it.
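
A rough sketch of how this could look with S3 classes. Everything here is an assumption for illustration: makeCorrTaskSketch is a made-up name, the data.types field is only a stand-in for what getDataTypes returns, and the real constructors may take more arguments.

```r
# Hypothetical base constructor: stores only what every task needs.
makeTask = function(id, data) {
  stopifnot(is.character(id), length(id) == 1L, is.data.frame(data))
  structure(
    list(
      id = id,
      data = data,
      # Stand-in for getDataTypes: first class of every column.
      data.types = vapply(data, function(x) class(x)[1L], character(1L))
    ),
    class = "ReportTask"
  )
}

# A specialised task adds its own fields and prepends its own class,
# so it still inherits from "ReportTask".
makeCorrTaskSketch = function(id, data, method = "pearson") {
  task = makeTask(id, data)
  task$method = method
  class(task) = c("CorrTask", class(task))
  task
}

# Methods written for the base class work for every derived task.
print.ReportTask = function(x, ...) {
  cat("Task:", x$id, "with", ncol(x$data), "columns\n")
  invisible(x)
}

corr.task = makeCorrTaskSketch("corr.report", airquality)
print(corr.task)
inherits(corr.task, "ReportTask")
```

The argument checks then live in makeTask only, and each specific task constructor just adds what it needs.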

Move in-body args to formal defaults

In some functions, such as makeClusterAnalysis, there are args that should be accessible to the user (for example the eps arg of the dbscan method), but since they are hard-coded in the function body the user can't override them. If we move them to the defaults (the function's formals), they become freely accessible.
So we should check the functions for such problems.
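
The difference, shown with two toy functions. Both are hypothetical stand-ins that only return their settings; they do not call dbscan or makeClusterAnalysis.

```r
# Before: eps is fixed inside the body, so callers cannot change it.
clusterFixed = function(data) {
  eps = 0.5
  list(method = "dbscan", eps = eps, n = nrow(data))
}

# After: eps is a formal argument with the old value as its default.
clusterFlexible = function(data, eps = 0.5) {
  list(method = "dbscan", eps = eps, n = nrow(data))
}

clusterFlexible(iris)$eps           # default behaviour is unchanged
clusterFlexible(iris, eps = 1)$eps  # but the user can now override it

# formals() lists what is exposed, which helps when auditing functions.
names(formals(clusterFixed))
names(formals(clusterFlexible))
```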

Add check for installed packages

At the moment packages are loaded via require to prevent loading the same package multiple times.
-> If a package is not installed, the user gets an error like: function foo not found. So it would be better to check whether the package is available (installed) and, if not, throw a more meaningful error.
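
One possible way to do the check, using base R's requireNamespace. The helper name requirePackage and the wording of the message are assumptions.

```r
# Hypothetical helper: stop with a readable message if a package is missing.
requirePackage = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required but not installed. ", pkg),
      sprintf("Please install it, e.g. with install.packages(\"%s\").", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# An installed package passes silently.
requirePackage("stats")

# A missing package gives a clear error instead of "function not found".
res = tryCatch(
  requirePackage("notARealPackage123"),
  error = function(e) conditionMessage(e)
)
res
```

Note that requireNamespace loads the namespace without attaching it, so functions would then be called as pkg::fun rather than by bare name.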

Condense commands

At the moment we need multiple commands to produce one finished rmd file:

my.report.task = makeReportTask(id = "test.report", data = airquality, target = "Wind")
basic.report = makeBasicReport(my.report.task, data = airquality)

Here two commands are needed.

my.creport.task = makeCorrTask(id = "corr.report", data = airquality)
my.creport = makeCorr(my.creport.task)
corr.report = makeCorrReport(my.creport, type = "CorrPlot")

And here three commands are needed.
Should we condense this so that we always need only one? Or does this have low priority?
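
If we go for a single command, a thin wrapper might be enough. In the sketch below the three *Stub functions are placeholders so the example runs on its own; they are not the real makeCorrTask, makeCorr and makeCorrReport, and fastCorrReport is a made-up name.

```r
# Stand-ins for the real package functions, only so the sketch is runnable.
makeCorrTaskStub = function(id, data) {
  list(id = id, data = data)
}
makeCorrStub = function(task) {
  c(task, list(cor = cor(task$data, use = "pairwise.complete.obs")))
}
makeCorrReportStub = function(analysis, type = "CorrPlot") {
  c(analysis, list(type = type))
}

# Hypothetical one-call wrapper that chains the three steps.
fastCorrReport = function(id, data, type = "CorrPlot") {
  task = makeCorrTaskStub(id, data)
  analysis = makeCorrStub(task)
  makeCorrReportStub(analysis, type = type)
}

corr.report = fastCorrReport("corr.report", airquality)
names(corr.report)
```

The step-by-step functions would stay available for users who want the intermediate objects, so the wrapper is only a convenience.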

Add generic text to summary reports

I have started adding generic text to the numeric, categorical and cluster analysis reports.
Please run the test examples for all reports and think about what we might add. I believe we might even add some background on the methods applied (MDS, PCA, factor analysis).
@daryabusen Please add more generic text for PCA: the method applied and some background on what PCA does in general.

Child Structure + Plots

I think it would be better to organize the child so that each plot has its own section:

\```{r}
plot1
\```
\```{r}
plot2
\```

This way it is easier to add a title and generic text.

We should think about a proper object to handle this issue
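
As a starting point, the child could be assembled from small section objects, one per plot. This is only a sketch: makePlotSection and its arguments are hypothetical, and the heading level is an arbitrary choice.

```r
# Build the chunk delimiter programmatically (three backticks).
fence = strrep("`", 3)

# Hypothetical helper: one titled section with generic text and one plot chunk.
makePlotSection = function(title, text, plot.code) {
  c(
    paste("###", title),
    "",
    text,
    "",
    paste0(fence, "{r}"),
    plot.code,
    fence,
    ""
  )
}

sections = list(
  makePlotSection("Histogram", "Distribution of Wind.", "hist(airquality$Wind)"),
  makePlotSection("Boxplot", "Spread of Wind.", "boxplot(airquality$Wind)")
)

# Flatten to the lines of the child rmd and show them.
child = unlist(sections)
writeLines(child)
```

Each section carries its own title and text, so adding or reordering plots does not touch the other chunks.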

INFO: Weird random.seed for MDS and PCA Reports

When trying to run fastReport I noticed that the IDs of the PCA and MDS reports are the same. I believe that the functions involved (prcomp() for PCA and cmdscale() for MDS, but also isoMDS and maybe the other methods in makeMDSTask()) might set the seed after execution to the same value. When calling makeReport(pca.result) and makeReport(mds.result), both report IDs are identical. I investigated further and found that when another report, for example numsum, is created between MDS and PCA, and a further report such as catsum is created after PCA, the IDs of numsum and catsum are also the same. I think this confirms my suspicion that makeMDS and makePCA, which run right before the makeReport step, somehow set the seed to the same number.

Reproducible error:

#start with clean R-session CTRL+SHIFT+F10

devtools::load_all()

set.seed(1)

my.mds.task = makeMDSTask(id = "swiss", data = swiss)
mds.analysis = makeMDSAnalysis(my.mds.task)
mds.report = makeReport(mds.analysis)

cluster.task = makeClusterTask(id = "iris", data = iris,
  method = "cluster.kmeans")
cluster.analysis = makeClusterAnalysis(cluster.task)
cluster.report = makeReport(cluster.analysis)

pca.task = makePCATask(id = "iris.test", data = iris, center = TRUE, target = "Species")
pca.result = makePCA(pca.task)
pca.report = makeReport(pca.result)

#compare IDs
cluster.report$report.id
#[1] "T6cG3IC7CQJg3pcu"

mds.report$report.id
#[1] "oWz26cG3IC7CQJg3"

pca.report$report.id
#[1] "oWz26cG3IC7CQJg3"

#remove workspace
rm(list = ls())
#start with new session, CTRL+SHIFT+F10

devtools::load_all()

set.seed(1)

my.mds.task = makeMDSTask(id = "swiss", data = swiss)
mds.analysis = makeMDSAnalysis(my.mds.task)
mds.report = makeMDSAnalysisReport(mds.analysis)

pca.task = makePCATask(id = "iris.test", data = iris, center = TRUE, target = "Species")
pca.result = makePCA(pca.task)
pca.report = makePCAReport(pca.result)

cluster.task = makeClusterTask(id = "iris", data = iris,
  method = "cluster.kmeans")
cluster.analysis = makeClusterAnalysis(cluster.task)
cluster.report = makeClusterAnalysisReport(cluster.analysis)

mds.report$report.id
#[1] "oWz26cG3IC7CQJg3"

pca.report$report.id
#[1] "oWz26cG3IC7CQJg3"

cluster.report$report.id
#[1] "T6cG3IC7CQJg3pcu"

###try even more reports:
rm(list=ls())


#clean r session

devtools::load_all()

#try different seed
set.seed(10)

#for MDS try even another method
my.mds.task = makeMDSTask(id = "swiss", data = swiss, method = "isoMDS")
mds.analysis = makeMDSAnalysis(my.mds.task)
mds.report = makeReport(mds.analysis)

num.sum.task = makeNumSumTask("iris.test", iris, target = "Species")
num.sum = makeNumSum(num.sum.task)
num.sum.report = makeReport(num.sum)

pca.task = makePCATask(id = "iris.test", data = iris, center = TRUE, target = "Species")
pca.result = makePCA(pca.task)
pca.report = makeReport(pca.result)

cat.sum.task = makeCatSumTask("iris.test", iris, target = "Species")
cat.sum = makeCatSum(cat.sum.task)
cat.sum.report = makeReport(cat.sum)

cluster.task = makeClusterTask(id = "iris", data = iris,
  method = "cluster.kmeans")
cluster.analysis = makeClusterAnalysis(cluster.task)
cluster.report = makeReport(cluster.analysis)

mds.report$report.id
#[1] "oWz26cG3IC7CQJg3"

num.sum.report$report.id
#[1] "mcu73QN9ORHKrj73"

pca.report$report.id
#[1] "oWz26cG3IC7CQJg3" ---> SAME

cat.sum.report$report.id
#[1] "mcu73QN9ORHKrj73" ---> catsum now has the same report ID as numsum, which was called right after mds

cluster.report$report.id
#[1] "T6cG3IC7CQJg3pcu"

For now I set the seed to 89 in makeReport.PCAObj and makePCAReport, so that a different seed is used and the issue is worked around.
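
The mechanism suspected above can be reproduced in isolation. The sketch below assumes that the report ID is drawn from R's global random number generator and that some step calls set.seed internally; neither is confirmed for the package, and all function names are hypothetical.

```r
# Hypothetical ID generator that draws from R's global RNG.
randomId = function(n = 16L) {
  paste(sample(c(letters, LETTERS, 0:9), n, replace = TRUE), collapse = "")
}

# Stand-in for an analysis step that resets the seed internally.
analysisWithFixedSeed = function() {
  set.seed(123)
  invisible(NULL)
}

analysisWithFixedSeed(); id.a = randomId()
analysisWithFixedSeed(); id.b = randomId()
identical(id.a, id.b)  # TRUE: both IDs start from the same RNG state

# Possible remedy: restore the caller's RNG state when the function exits.
analysisPreservingSeed = function() {
  if (exists(".Random.seed", envir = globalenv())) {
    old.seed = get(".Random.seed", envir = globalenv())
    on.exit(assign(".Random.seed", old.seed, envir = globalenv()))
  }
  set.seed(123)
  invisible(NULL)
}

set.seed(1)
analysisPreservingSeed(); id.c = randomId()
analysisPreservingSeed(); id.d = randomId()
identical(id.c, id.d)  # the RNG stream continues, so the IDs differ
```

If this is indeed the cause, restoring the RNG state (or not seeding inside the analysis functions at all) would avoid hard-coding a second seed such as 89.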

makeNumSum output description

Maybe we should provide a description of the columns? mean, min, max, ... are clear, but lower/upper bound, for example, aren't without looking into the code.
