datamicroarray's People

Contributors

ramhiser

datamicroarray's Issues

describe_data() deviates from datasets

chiaretti:
describe_data() says n = 111, p = 12625 and K = 2.
But when I load the dataset, I get a matrix of dimension 128 x 12625 and the class vector has 6 levels.

golub:
describe_data() says K = 3, but the class vector has only 2 levels.

shipp:
describe_data() says n = 58 and p = 6817, but data matrix has dimension 77 x 7129.
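
A minimal sketch of how such discrepancies could be checked systematically. The helper `check_dims()` is hypothetical and not part of the package; it assumes each data set is a list with a matrix `x` (observations in rows) and a class vector `y`, as in the reports above. A toy object stands in for a real data set.

```r
# Hypothetical helper: compare the documented n, p, and K with the loaded object.
check_dims <- function(dataset, n, p, K) {
  actual <- c(
    n = nrow(dataset$x),              # observations are rows
    p = ncol(dataset$x),              # features are columns
    K = nlevels(factor(dataset$y))    # number of distinct classes
  )
  expected <- c(n = n, p = p, K = K)
  data.frame(
    quantity  = names(expected),
    described = unname(expected),
    actual    = unname(actual),
    match     = unname(expected == actual)
  )
}

# Toy stand-in for a data set: 5 observations, 3 features, 3 classes.
toy <- list(
  x = matrix(0, nrow = 5, ncol = 3),
  y = factor(c("a", "b", "c", "a", "b"))
)

# Deliberately wrong description for n and K, so two rows are flagged.
report <- check_dims(toy, n = 4, p = 3, K = 2)
print(report)
```

With a real data set, the described values would come from describe_data() and the object from data(); the exact column names returned by describe_data() are not assumed here.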

Add Adam et al. (2002) prostate cancer data set

The mass-spec prostate cancer data set is mentioned in the Levina et al. (2008) Annals of Applied Statistics paper entitled "Sparse estimation of large covariance matrices via a nested Lasso penalty".

Investigate improved storage of data sets

Currently, we store each data set in the /data folder so that it is installed when the package is installed. Given that some of the data sets are quite large, it would be desirable to load these data sets in a more dynamic manner.

For example, the data sets could be stored on GitHub, with only a few select (small) data sets included when the package is first installed. A helper function would then download the additional data sets on demand.
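
One way such a download helper might look, as a rough sketch only. The names `dataset_url()` and `fetch_dataset()`, the base URL, and the one-object-per-.RData-file layout are all assumptions made for illustration; none of them exist in the package.

```r
# Hypothetical location of the hosted files; the real URL is an assumption.
base_url <- "https://example.com/datamicroarray/data"

# Build the download URL for a data set, assuming one .RData file per data set.
dataset_url <- function(name, base = base_url) {
  paste0(base, "/", name, ".RData")
}

# Download a data set on first use, then load it from the local cache.
# Assumes the .RData file contains a single object named after the data set.
fetch_dataset <- function(name, cache_dir = tempdir(), base = base_url) {
  dest <- file.path(cache_dir, paste0(name, ".RData"))
  if (!file.exists(dest)) {
    download.file(dataset_url(name, base), destfile = dest, mode = "wb")
  }
  env <- new.env()
  load(dest, envir = env)
  get(name, envir = env)
}
```

Caching in a directory that persists across sessions (rather than `tempdir()`) would avoid repeated downloads; the choice of cache location is left open here.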

problems with loading library

Running library(datamicroarray) in the Vim plugin Nvim-R for the first time, I get the following error:

Error in read.table(idx, sep = "\t", quote = "", stringsAsFactors = FALSE) : 
  no lines available in input

The traceback points to:

5: stop("no lines available in input")
4: read.table(idx, sep = "\t", quote = "", stringsAsFactors = FALSE)
3: GetFunDescription(packlist)
2: nvim.bol(paste0(bdir, "omnils_", p, "_", pvi), p, TRUE)
1: nvimcom:::nvim.buildomnils("datamicroarray")

Running library(datamicroarray) again in the same R session does not reproduce the problem.

See also the Nvim-R issue tracker: jalvesaq/Nvim-R#277

Examine Gravier (2010) data set

Måns Thulin from Uppsala University sent the following email to me:

I am now planning to use the Gravier (2010) data to illustrate a new method in a paper, but was wondering if perhaps some of the patients in the study have been misclassified in your R package. According to the Gravier et al. paper and your description of the data on the wiki, there should be 111 patients labelled "good" and 57 labelled "poor". However, when I import the data into R, I get the following:

summary(gravier$y)
good poor
106 62

The numbers of patients (168) and features (2,905) are correct, but there seems to be a problem with the class labels. Have 5 "good" patients been labelled as "poor" or is there in fact a misprint in the Gravier et al. paper? Any insights that you could provide regarding this would be deeply appreciated!

Add data set descriptions to github Wiki

  • Alon et al. (1999)
  • Bhattacharjee et al. (2001)
  • Chiaretti et al. (2004)
  • Golub et al. (1999)
  • Gordon et al. (2002)
  • Gravier et al. (2010)
  • Khan et al. (2001) - SRBCT
  • Oberthuer et al. (2006) - Neuroblastoma
  • Pomeroy et al. (2002) - CNS
  • P53
  • Shipp et al. (2002)
  • Singh et al. (2002)
  • van't Veer et al. (2002)
  • West et al. (2001)
  • Yeoh et al. (2002) - St. Jude

Update GEO/ArrayExpress scripts to use Bioconductor

Before I learned about the GEOquery and ArrayExpress packages on Bioconductor, I was downloading the data manually.

Update the scripts for each of the following data sets:

  • Borovecki (2005)
  • Christensen (2009)
  • Gravier (2010)

Add helper function to list all data sets

This function will be useful in simulations where we wish to apply classifiers to every data set collected. For example, if we are interested only in data sets with K = 2, we need only load these data sets.

The function should list the following:

  • Object name, e.g., the Golub et al. (1999) data set is located in golub
  • The number of classes, K
  • The sample size, N
  • The number of features, p
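
The listing described above could be sketched roughly as follows. The function `summarize_datasets()` is hypothetical, and it assumes each data set is a list with a matrix `x` (observations in rows) and a class vector `y`; toy objects stand in for the real data sets.

```r
# Hypothetical helper: summarize a named list of data sets.
summarize_datasets <- function(datasets) {
  data.frame(
    name = names(datasets),
    K = vapply(datasets, function(d) nlevels(factor(d$y)), integer(1)),
    N = vapply(datasets, function(d) nrow(d$x), integer(1)),
    p = vapply(datasets, function(d) ncol(d$x), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy stand-ins for two data sets with different numbers of classes.
toy <- list(
  two_class   = list(x = matrix(0, 4, 10), y = factor(c("a", "a", "b", "b"))),
  three_class = list(x = matrix(0, 6, 5),  y = factor(c("a", "b", "c", "a", "b", "c")))
)

summary_table <- summarize_datasets(toy)

# Select only the data sets with K = 2, as in the simulation use case.
binary <- summary_table$name[summary_table$K == 2]
print(summary_table)
```

Returning a data frame keeps the filtering step (for example, K = 2 only) a one-line subset in simulation code.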

Add data set descriptions to help.r

  • Alon et al. (1999)
  • Bhattacharjee et al. (2001)
  • Chiaretti et al. (2004)
  • Golub et al. (1999)
  • Gordon et al. (2002)
  • Gravier et al. (2010)
  • Khan et al. (2001) - SRBCT
  • Oberthuer et al. (2006) - Neuroblastoma
  • Pomeroy et al. (2002) - CNS
  • P53
  • Shipp et al. (2002)
  • Singh et al. (2002)
  • van't Veer et al. (2002)
  • West et al. (2001)
  • Yeoh et al. (2002) - St. Jude

Chin data set not exported as numeric matrix

The matrix is exported as a character matrix:

> library(datamicroarray)
> data(chin)
> chin$x[, 1]
  [1] "10.169815" "10.565664" "9.589976"  "10.324175" "9.784195"  "8.969536"
  [7] "10.973057" "11.399529" "10.798559" "9.685487"  "12.051186" "10.030907"
 [13] "10.307187" "11.320309" "10.404591" "10.833785" "9.735426"  "10.797899"
 [19] "10.627682" "10.631056" "10.335046" "9.758425"  "10.472313" "10.469456"
 [25] "10.266331" "9.789535"  "10.788861" "11.191206" "10.337112" "10.871078"
 [31] "9.896685"  "9.651166"  "10.793316" "10.475492" "9.740225"  "10.437926"
 [37] "9.941238"  "10.752173" "11.025179" "10.449146" "10.502874" "9.887005"
 [43] "10.324535" "11.92731"  "10.011643" "9.074154"  "9.650978"  "10.960044"
 [49] "11.080833" "10.730092" "10.144769" "10.258973" "11.342681" "11.20937"
 [55] "10.439279" "9.872279"  "10.067042" "10.843696" "9.799298"  "10.762967"
 [61] "11.250308" "10.739098" "10.967985" "10.139285" "10.482729" "11.012492"
 [67] "10.839745" "11.115753" "10.995832" "10.024971" "10.111507" "11.373869"
 [73] "10.818594" "11.437675" "10.709085" "11.275032" "10.537405" "10.175087"
 [79] "10.822135" "9.781922"  "9.165403"  "10.538037" "8.688913"  "10.582591"
 [85] "10.726001" "10.150915" "10.373924" "10.986752" "11.470086" "10.666458"
 [91] "10.65508"  "11.493546" "10.419414" "10.164545" "9.44763"   "10.199079"
 [97] "10.612112" "10.538597" "10.92159"  "11.112414" "9.917373"  "10.352251"
[103] "10.749506" "10.191069" "10.953824" "7.211304"  "9.702876"  "10.076364"
[109] "11.080688" "10.278594" "11.371984" "10.271792" "10.553228" "10.193828"
[115] "11.170514" "10.349621" "10.679596" "10.031797"
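
A possible fix, sketched on a small toy matrix rather than the real chin object: changing the storage mode converts the entries to numeric while keeping the dimensions and dimnames. The row and column names below are made up for illustration.

```r
# Toy character matrix built from the first few values shown above.
x_chr <- matrix(
  c("10.169815", "10.565664", "9.589976", "10.324175"),
  nrow = 2,
  dimnames = list(c("s1", "s2"), c("g1", "g2"))  # hypothetical names
)

# Convert in place; unlike as.numeric(), this preserves dim and dimnames.
x_num <- x_chr
storage.mode(x_num) <- "double"

print(is.numeric(x_num))
print(dim(x_num))
```

Applied to the package, the same conversion would be run on chin$x before the data set is saved.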

bug by install_github("datamicroarry","ramey")

Dear Ramhiser,

I wanted to install the datamicroarray package, but unfortunately I always get an error.

It looks like this:
install.packages("devtools") # okay
devtools::install_github("ramey/datamicroarry") # but here:

Downloading github repo ramey/datamicroarry@master
Error in download(dest, src, auth) : client error: (404) Not Found

Could you help me with this?

With kind regards,
SHASHANK K S

Add van't Veer et al. (2002) breast cancer data set

The data set is given in the nki object in the breastCancerNKI package on Bioconductor.

Type abstract(nki), and notice that there should be "151 had lymph-node-negative disease, and 144 had lymph-node-positive disease." However, in exprs(nki), there are 337 observations. It is unclear which of these 337 observations are the 151 + 144 = 295 described in the abstract.

Finalize data set descriptions in wiki

Finalize the wiki description of each of the following data sets:

  • Alon (1999)
  • Borovecki (2005)
  • Burczynski (2006)
  • Chiaretti (2004)
  • Chin (2006)
  • Chowdary (2006)
  • Christensen (2009)
  • Golub (1999)
  • Gordon (2002)
  • Gravier (2010)
  • Khan (2001)
  • Pomeroy (2002)
  • Shipp (2002)
  • Singh (2002)
  • Sorlie (2001)
  • Su (2002)
  • Subramanian (2005)
  • Tian (2003)
  • West (2001)
  • Yeoh (2002)

Add data sets from this PLOS ONE paper

This PLOS ONE paper provides a table of 19 two-population microarray data sets. I have gathered the majority of these data sets into the datamicroarray package, but below I have listed several of the papers that I missed. (The data set number provided in Table 2 is given in square brackets.)

  1. Hodges (2006) HD Caudate [2]
  2. Hodges (2006) HD Cerebellum [4]
  3. Okada (2003) Liver Cancer [10]
  4. Beer (2002) Lung Cancer [12]
  5. Iizuka (2003) Liver Cancer [13]
  6. Dhanasekaran (2001) Prostate Cancer [14]
  7. Gruvberger (2001) Breast Cancer [15]
  8. Berchuck (2005) Ovarian Cancer [17]
  9. Zapala (2005) Neural Tissue [18]
