bhklab / orcestra

ORCESTRA is a new web application that enables users to search, request and manage pharmacogenomic datasets (PSets).

License: Apache License 2.0

JavaScript 98.18% CSS 1.58% HTML 0.24%

orcestra's People

Contributors

mattbocc

orcestra's Issues

Update TCGA on orcestra to be from XenaBrowser

Microarray type for GDSC does not affect search/filtering

Similar to #20, changing the type of microarray in GDSC does not change which download options are shown.

Does use of ORCESTRA API endpoints break backwards compatibility in PharmacoGx?

I just noticed this while trying to respond to a user issue for PharmacoGx.

When we migrated the availablePSets and downloadPSet functions to query the ORCESTRA API, we started forcing all package users, from that update onward, to use the most recent PharmacoSet objects. This may break backwards compatibility if, for example, we migrate the data in the @molecularProfiles slot to a MultiAssayExperiment. It may even break across Bioconductor releases if the object serialization in R changes from one release to another (e.g., as happened from version 3.5 to 3.6), preventing users from loading PharmacoSets without updating their version of R.

Since all the previously released PharmacoSets are available on Zenodo, would it be possible to version the API URL so that users of previous PharmacoGx releases download the appropriate PharmacoSets? We could then update the URL for each release to avoid breaking backwards compatibility. That way, users get the version of each PharmacoSet that matches their BiocManager version.

Let me know your thoughts on this.

Chris
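A minimal sketch of the versioning idea, written in Python for illustration: each Bioconductor release is pinned to its own API path, so older PharmacoGx clients keep receiving the PSets they were built against. Only the base URL comes from this page; the `/v1`-style path segment, the release-to-version mapping, and the function name are hypothetical and are not ORCESTRA's actual API.

```python
# Hypothetical versioned endpoints: the path scheme and the release mapping
# below are illustrative assumptions, not ORCESTRA's real API.
BASE_URL = "http://www.orcestra.ca/api"

# Pin each (hypothetical) Bioconductor release to the API version its
# PharmacoGx release was built against.
RELEASE_TO_API_VERSION = {
    "3.11": "v1",
    "3.12": "v2",
}
LATEST_API_VERSION = "v2"


def available_psets_url(bioc_release: str) -> str:
    """Return the versioned 'available PSets' endpoint for a Bioconductor release.

    Releases missing from the mapping (e.g. newer ones) fall back to the
    latest API version.
    """
    version = RELEASE_TO_API_VERSION.get(bioc_release, LATEST_API_VERSION)
    return f"{BASE_URL}/{version}/psets/available"
```

With this design, adding a new release only means adding one mapping entry; old entries never change, so clients pinned to them keep downloading the same PSets.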

BeatAML SMILES missing

Hello,

I am using the PharmacoGx R package to access the BeatAML dataset. For the other datasets in PharmacoGx, SMILES are associated with each compound, but this is not the case for BeatAML. Since the PSets in PharmacoGx are version-controlled via ORCESTRA, I am reaching out to you for help. Is there a way to access this information?

Search function does not seem to incorporate drug sensitivity filtering status into its search of the PSet table.

If you do not select the filtered checkbox, filtered datasets are still returned.
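A minimal sketch of the expected search behaviour, in Python for illustration: a dataset is returned only if its filtering status matches the checkbox, so unticking it excludes filtered datasets. The record fields `name` and `filtered` and the function name are hypothetical, not ORCESTRA's actual schema.

```python
from typing import Iterable, List


def search_psets(psets: Iterable[dict], query: str, filtered: bool) -> List[dict]:
    """Return PSets matching the search text AND the 'filtered' checkbox state.

    `name` and `filtered` are hypothetical record fields used for illustration.
    """
    needle = query.lower()
    return [
        pset
        for pset in psets
        # Require an exact match on the filtering status, so unticking the
        # checkbox excludes filtered datasets instead of ignoring the flag.
        if needle in pset["name"].lower() and pset["filtered"] == filtered
    ]
```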

Annotation of eset in BeatAML is also incorrectly set

The mutation data is also incorrectly annotated in BeatAML:

metadata(SE)$annotation needs to be equal to "mutation"

Wrong GDSC array in canonical PSets

Should be fixed ASAP.

gCSI 2019 does not have published statistics included in the dataset

The gCSI dataset does not include any of Marc Hafner's precomputed GR metrics. The pipeline needs to be fixed.

Problem with gene symbols in CCLE (and possibly other PSets)

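library(PharmacoGx)

# Assumes CCLE is a PharmacoSet already loaded in the session
ss1 <- fNames(CCLE, mDataType = "mutation")
ss2 <- featureInfo(CCLE, mDataType = "mutation")[, "Symbol", drop = TRUE]

# Cross-tabulate whether the two vectors agree on which entries are NA
table(is.na(ss1) == is.na(ss2))

FALSE  TRUE
   83  1584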

These two vectors should be identical, but 83 of the 1,667 entries disagree on whether the symbol is missing.
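The R check compares only where values are missing. A stricter check reports every position where the two vectors differ at all. A minimal Python sketch, with plain lists standing in for the two R vectors and None standing in for NA; the function name is hypothetical.

```python
from typing import List, Optional


def symbol_mismatches(names: List[Optional[str]], symbols: List[Optional[str]]) -> List[int]:
    """Return the 0-based positions where feature names and annotated symbols differ.

    None stands in for R's NA; two missing values count as agreement.
    """
    if len(names) != len(symbols):
        raise ValueError("both vectors must have the same length")
    return [i for i, (a, b) in enumerate(zip(names, symbols)) if a != b]
```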

Broken links on statistics page to each dataset

Clicking on a dataset on the statistics page (https://orcestra.ca/Stats) takes me to "undefined".

Integrate non-immunotherapy datasets (Roche RAAN challenge)

Nine datasets that were curated as part of Roche RAAN Phase I need to be added to ORCESTRA. The data objects (SEs) are already uploaded to Zenodo.
Code: https://github.com/bhklab/Clinical-Trial-SE/blob/master/ClinicalTrial_SE_curation.Rmd
Data: the nine curated datasets are available as R SummarizedExperiment objects on Zenodo (see the Open Access datasets section of the clinical trial curation documentation).

Incorrect mutation data for gCSI_2017

While investigating an issue opened by a user on GitHub, I discovered a problem with the gCSI_2017 mutation data. The matrix appears to contain expression values instead of the strings that summarizeMolecularProfiles needs in order to work:

> molecularProfiles(gCSI_2017, 'mutation')[1:5, 1:5]
         NCI-H358   NCI-H292   NCI-H522   NCI-H650    NCI-H23
ARID1A -0.4137651 -0.4137651 -0.4137651 -0.4137651 -0.4137651
JAK1   -0.2579865 -0.2579865 -0.2579865 -0.2579865 -0.2579865
MSH2   -0.2113687 -0.2113687 -0.2113687 -0.2113687 -0.2113687
MSH6   -0.3179615 -0.3179615 -0.3179615 -0.3179615 -0.3179615
NFE2L2 -0.1223739 -0.1223739 -0.1223739 -0.1223739 -0.1223739

As a result, summarizeMolecularProfiles returns nonsensical results:

> assay(summarizeMolecularProfiles(gCSI_2017, 'mutation', summary.stat='and'), 1)[1:5, 1:5]
       NCI-H358 NCI-H292 NCI-H522 NCI-H650 NCI-H23
ARID1A "1"      "1"      "1"      "1"      "1"    
JAK1   "1"      "1"      "1"      "1"      "1"    
MSH2   "1"      "1"      "1"      "1"      "1"    
MSH6   "1"      "1"      "1"      "1"      "1"    
NFE2L2 "1"      "1"      "1"      "1"      "1"

Please see PharmacoGx issue #71 for more information on the user's code and the sessionInfo for their R environment.
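A minimal validation sketch, in Python for illustration, assuming (as the issue describes) that mutation calls should be strings or missing: it flags any numeric cell so the problem is caught before summarizeMolecularProfiles runs. The nested-dict layout standing in for the assay matrix and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def non_string_cells(matrix: Dict[str, Dict[str, object]]) -> List[Tuple[str, str]]:
    """Return (gene, cell line) pairs whose value is neither a string nor missing.

    `matrix` maps gene -> {cell line -> value}; this nested-dict layout is a
    stand-in for the R assay matrix, chosen only for illustration.
    """
    bad = []
    for gene, row in matrix.items():
        for cell_line, value in row.items():
            # Mutation calls should be strings; None represents a missing call.
            if value is not None and not isinstance(value, str):
                bad.append((gene, cell_line))
    return bad
```

For the gCSI_2017 excerpt shown, every cell (e.g. ARID1A in NCI-H358, value -0.4137651) would be flagged.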

Fusion data missing for CCLE

BeatAML PSet does not have PSet@annotation$version set to 2, but it should

downloadPSet tries to convert the molecular profiles to SummarizedExperiments, but fails because they already are SummarizedExperiments. This is fixed by setting the version to 2 or higher.
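The rule in this issue, sketched in Python for illustration: conversion should run only for PSets whose annotation version is below 2. The dict stands in for `PSet@annotation`, the function name is hypothetical, and treating a missing version as 1 is an assumption.

```python
def needs_se_conversion(annotation: dict) -> bool:
    """Return True if a PSet's molecular profiles still need converting to
    SummarizedExperiments.

    PSets with annotation version >= 2 already store SummarizedExperiments,
    so conversion is skipped for them. A missing version is treated as 1
    (an assumption made for this sketch).
    """
    return annotation.get("version", 1) < 2
```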

NCI60 RNA data misannotated

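I am not sure what happened in the creation of the NCI60 PSet, but the feature info of the RNA-seq data is completely misaligned with the row names of the object. For example, the row named ERBB2 is annotated with the symbol OR2L2 (ENSG00000203663) in both rnaseq.comp and rnaseq.iso: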

> rowData(molecularProfiles(NCI60)$rnaseq.comp["ERBB2",])
DataFrame with 1 row and 9 columns
              gene_id hugo_symbol entrez_gid    cytoband          gene_name_url
          <character> <character>  <numeric> <character>            <character>
ERBB2 ENSG00000203663       OR2L2      26246        1q44 http://www.genenames..
              entrez_gid_url      genomic_coord_url       gene_description
                 <character>            <character>            <character>
ERBB2 http://www.ncbi.nlm... https://www.ncbi.nlm.. olfactory receptor f..
                 ensembl_tid
                 <character>
ERBB2 ENST00000642011|ENST..
> rowData(molecularProfiles(NCI60)$rnaseq.iso["ERBB2",])
DataFrame with 1 row and 9 columns
              gene_id hugo_symbol entrez_gid    cytoband          gene_name_url
          <character> <character>  <numeric> <character>            <character>
ERBB2 ENSG00000203663       OR2L2      26246        1q44 http://www.genenames..
              entrez_gid_url      genomic_coord_url       gene_description
                 <character>            <character>            <character>
ERBB2 http://www.ncbi.nlm... https://www.ncbi.nlm.. olfactory receptor f..
                 ensembl_tid
                 <character>
ERBB2 ENST00000642011|ENST..

The PSet needs to be fixed.

BeatAML PSet is missing tissueid in cellInfo.

Yes, it's not very informative, but it is a required column for PharmacoGx to work.
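A minimal pre-release check, in Python for illustration, assuming cellInfo's column names are available as a list. Only `tissueid` is named as required in this issue, so the default list holds just that column and can be extended; the function name is hypothetical.

```python
from typing import Iterable, List, Sequence


def missing_columns(columns: Iterable[str], required: Sequence[str] = ("tissueid",)) -> List[str]:
    """Return the required cellInfo columns that are absent, in `required` order."""
    present = set(columns)
    return [col for col in required if col not in present]
```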

Rename Gender to Sex in all PSets

Hello ORCESTRA team,

I am moving this issue here from PharmacoGx since we source all PSets from ORCESTRA now:

@bhaibeka

"Gender" in cellInfo should be replaced by "Sex" in all PSets

Thanks,
Chris
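Applied across PSets, the rename is a column-name substitution. A minimal Python sketch operating on a list of cellInfo column names (the function name is hypothetical); it refuses to proceed if a PSet already has both columns, since silently dropping one could lose data.

```python
from typing import List


def rename_gender_to_sex(columns: List[str]) -> List[str]:
    """Return cellInfo column names with "Gender" replaced by "Sex".

    Raises ValueError if both columns already exist.
    """
    if "Gender" in columns and "Sex" in columns:
        raise ValueError("cellInfo already has both 'Gender' and 'Sex' columns")
    return ["Sex" if col == "Gender" else col for col in columns]
```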

Update gCSI2018 pipeline to use SummarizedExperiments

Title is self-explanatory.

Expired SSL Certificate

Hi @mnakano,

I am getting an invalid SSL certificate error when accessing https://orcestra.ca/.

Best,
Chris

Non-UTF-8 byte in CCLE drugInfo

There is a non-UTF-8 byte in drugInfo(CCLE)[4, 2], the 'Compound..brand.name.' column; I think it is probably a TM symbol. It breaks a number of things, such as reading the table as a .csv in Python, as well as some R show methods.

We should have a general mechanism to ensure that only valid UTF-8 strings are stored in a PSet. Base R already provides a utility for this: iconv.

We could do something like:

DF$column <- iconv(DF$column, to='UTF-8', sub='')
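For consumers reading the table outside R, the same idea in Python: decode as UTF-8 and drop any bytes that are not valid UTF-8. A minimal sketch with a hypothetical helper name; like `sub=''`, it discards the offending byte rather than recovering the intended character.

```python
def to_valid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping invalid byte sequences.

    Roughly analogous to iconv(x, to = 'UTF-8', sub = '') in R, where
    non-convertible bytes are removed rather than replaced.
    """
    return raw.decode("utf-8", errors="ignore")
```

When reading a whole file, `open(path, encoding="utf-8", errors="ignore")` applies the same policy.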

Submitting without login fails?

I tried to submit GDSC1 with the old array and no filtering, without logging in, and got this error:

Error in Request Process
Cannot read property 'name' of undefined

Available PSets API is broken when `canonical=FALSE`

Hi @mnakano and @anthfm,

It looks like this API URL is broken: http://www.orcestra.ca/api/psets/available.

As a result, availablePSets() in PharmacoGx is breaking when canonical=FALSE.

Could you please look into this?