nf-encyclopedia's Issues

Tests fail locally

Hello there,

I noticed that tests fail when running them locally with this error:

Command error:
  Unable to find image 'talusbio/nf-encyclopedia:latest' locally
  docker: Error response from daemon: pull access denied for talusbio/nf-encyclopedia, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

because they use this line:

process.container = "nf-encyclopedia"

and therefore fail: even though ghcr.io/talusbio/nf-encyclopedia:latest has been pulled, that tag doesn't match the bare "nf-encyclopedia" name.

Notably, that line is also what makes the tests pass in CI/CD, because right before the tests run the image is built with:

/usr/bin/docker buildx build --iidfile /tmp/docker-build-push-V0tuRX/iidfile --tag nf-encyclopedia:latest --load --metadata-file /tmp/docker-build-push-V0tuRX/metadata-file .

https://github.com/TalusBio/nf-encyclopedia/actions/runs/3192943440/jobs/5210985034#step:5:102

So maybe a complete solution would be to look for an environment variable and, when it is set, use a config that defines the container as just "nf-encyclopedia"; otherwise, use "ghcr.io/talusbio/nf-encyclopedia:latest".
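A minimal sketch of that idea in nextflow.config (the environment variable name here is made up, so treat this as a rough illustration rather than the final fix):

  // Use the locally built tag when a CI-style env variable is set (variable name is hypothetical);
  // otherwise fall back to the published GHCR image.
  if (System.getenv("NF_ENCYCLOPEDIA_LOCAL_IMAGE")) {
      process.container = "nf-encyclopedia:latest"
  } else {
      process.container = "ghcr.io/talusbio/nf-encyclopedia:latest"
  }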

LMK what you think!

TL;DR:
Tests fail if you didn't build the image on the same machine; we can either document it or fix it :P

Add Documentation

We need better documentation in addition to the README. After a conversation with @cia23, here are some things to add, with likely more to come:

Parameter documentation

  • This could take the form of a JSON file, like what is used by nf-core (example). The JSON file is easy to parse, but unfortunately does not live in the config file.
  • Alternatively, this could take the form of special comments incorporated into the config file. I'm thinking something akin to doxygen for C++ or roxygen2 for R. The downside here would be that we'd need to write a parser for it to build the documentation. A possible format is sketched below.
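To make the second option concrete, here is a hypothetical sketch of what such comments could look like in the config. The "//'" syntax and the descriptions are invented, the parameter names are taken from the example run command elsewhere in this tracker, and we would still need to write the parser:

  params {
      //' @param input  CSV file listing the mzML.gz files and their chrlib flags.
      input = null

      //' @param dlib   EncyclopeDIA spectral library (.dlib) to search against.
      dlib = null

      //' @param fasta  Protein FASTA used for the search.
      fasta = null
  }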

Running the pipeline

  • We should have an example command for folks to run.
  • We need to better document prerequisites like Docker.

Unable to access jarfile /code/encyclopedia.jar

I am running the pipeline via a WSL2 setup and am getting the following error:

./nextflow run TalusBio/nf-encyclopedia -r latest --input input.csv --dlib proteins.dlib --fasta proteins.fasta
N E X T F L O W  ~  version 22.10.2
Launching `https://github.com/TalusBio/nf-encyclopedia` [dreamy_minsky] DSL2 - revision: 63c5d914a2 [latest]
executor >  local (2)
[-        ] process > CONVERT_TO_MZML:MSCONVERT                         -
[-        ] process > BUILD_CHROMATOGRAM_LIBRARY:ENCYCLOPEDIA_SEARCH    -
[-        ] process > BUILD_CHROMATOGRAM_LIBRARY:ENCYCLOPEDIA_AGGREGATE -
[fe/53691a] process > PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (1)             [100%] 2 of 2, failed: 2 ✘
[-        ] process > PERFORM_QUANT:ENCYCLOPEDIA_AGGREGATE              -
[-        ] process > MSSTATS                                           -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (2)'

Caused by:
  Process `PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (2)` terminated with an error exit status (1)

Command executed:

  gzip -df mz600-604.210712_ratio_22m_01_058.mzML.gz
  java -Djava.aws.headless=true -Xmx31G -jar /code/encyclopedia.jar \
      -i mz600-604.210712_ratio_22m_01_058.mzML \
      -f proteins.fasta \
      -l proteins.dlib \
      -percolatorVersion v3-01 -quantifyAcrossSamples true -scoringBreadthType window \
       \
  | tee mz600-604.210712_ratio_22m_01_058.mzML.local.log
  gzip mz600-604.210712_ratio_22m_01_058.mzML.features.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: Unable to access jarfile /code/encyclopedia.jar

Work dir:
  /home/ash022/work/eb/9560810abcdc8abe743dda3e406ae4

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

The inputs are from the base repo's test data folder: https://github.com/TalusBio/nf-encyclopedia/tree/main/tests/data

cat input.csv
file, chrlib
mz600-604.210712_ratio_22m_01_046.mzML.gz, false
mz600-604.210712_ratio_22m_01_058.mzML.gz, false

Any ideas how to proceed?
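One thing worth checking (a guess, not a confirmed diagnosis): /code/encyclopedia.jar is a path inside the container image, so if the processes are running directly on the WSL2 host without Docker, the jar won't exist. A minimal config sketch to force the container to be used:

  // Make sure processes run inside the published image rather than on the bare host,
  // since /code/encyclopedia.jar only exists inside the container.
  docker.enabled = true
  process.container = "ghcr.io/talusbio/nf-encyclopedia:latest"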

[Bug] MSstats drops some proteins

We've observed that MSstats is dropping some proteins. At first, we suspected this was due to too much missing data for a peptide. Now, after @cia23's discussions with the Drug Disco team, it seems to be a bug.

Currently, we perform an inner join on the proteins from the EncyclopeDIA proteins.txt and peptides.txt outputs, under the assumption that the Protein columns should always yield a 1-to-many match and that all peptides with a protein accepted at 1% FDR would be accounted for. Unfortunately, this assumption is wrong: the Protein column in peptides.txt does not use the same protein groups as the Protein column in proteins.txt, so the join drops some proteins.

The fix here is to use the PeptideSequences column in proteins.txt to map proteins to peptides, which will be a headache; a rough sketch of the mapping is below.
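To make the proposed fix concrete, here is a rough sketch of the mapping logic in Groovy, just to show the join; the real fix would live in whatever script prepares the MSstats input. The tab-separated layout, the ";" delimiter inside PeptideSequences, and the "Peptide" column name in peptides.txt are all assumptions:

  // Build a peptide -> protein group map from proteins.txt using PeptideSequences,
  // instead of inner-joining the two Protein columns.
  def proteinRows = new File("proteins.txt").readLines()*.split("\t")
  def pHeader  = proteinRows.head().toList()
  def groupIdx = pHeader.indexOf("Protein")
  def pepsIdx  = pHeader.indexOf("PeptideSequences")

  def pepToGroups = [:].withDefault { [] as Set }
  proteinRows.tail().each { row ->
      row[pepsIdx].split(";").each { pep -> pepToGroups[pep.trim()] << row[groupIdx] }
  }

  // Re-assign each row of peptides.txt to the group(s) its sequence maps to.
  def peptideRows = new File("peptides.txt").readLines()*.split("\t")
  def seqIdx = peptideRows.head().toList().indexOf("Peptide")   // column name assumed
  peptideRows.tail().each { row ->
      def groups = pepToGroups[row[seqIdx]]
      // ... emit one MSstats input row per matched protein group ...
  }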

Find the difference between the EncyclopeDIA CLI and GUI

We've noticed significant differences between results obtained with the EncyclopeDIA GUI and CLI. Unfortunately, talking with Brian and Seth hasn't revealed anything that could be the cause.

Here's how we should find the problem:

  1. Create a small mzML file to iterate with. I think the best way to do this is to take a normal file and filter for 1-2 DIA windows using msconvert.
  2. Verify that we can reproduce our problems with this smaller file.
  3. Add a print statement to the EncyclopeDIA codebase to see exactly what parameters are being used by the CLI. @ricomnl has already made some progress on this. @ricomnl - do you know if we can add it as some kind of "debug level" logging in the official EncyclopeDIA version?

I hope this small file is also one we could incorporate into our unit tests.

[TODO] Add process capturing QC metrics

We want to add a process to the pipeline that captures a set of QC metrics and then writes these to a DB.

Current idea:

  • Change unique_peptides_proteins → qc_task.
  • Within qc_task, read the wide elib(s), quant_peptides, and quant_proteins outputs (a rough process sketch is below).
  • Calculate unique peptides and proteins, %CV across DMSO samples, and gopher values.
  • For now, drop the results in S3 → show them in the data standard report.
  • Then store them in NoSQL in Scispot.
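A rough Nextflow process sketch of the idea (the process name, the calc_qc_metrics.py script, and the output format are all placeholders, not a final design):

  process QC_TASK {
      input:
      path wide_elib
      path quant_peptides
      path quant_proteins

      output:
      path "qc_metrics.json"

      script:
      """
      calc_qc_metrics.py \\
          --elib ${wide_elib} \\
          --peptides ${quant_peptides} \\
          --proteins ${quant_proteins} \\
          --output qc_metrics.json
      """
  }

The S3 upload and the Scispot/NoSQL write would then consume qc_metrics.json downstream.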
