talusbio / nf-encyclopedia
A NextFlow pipeline for chromatogram library DIA proteomics workflows
License: Apache License 2.0
Hello there,
I noticed that tests fail when running them locally with this error:
Command error:
Unable to find image 'talusbio/nf-encyclopedia:latest' locally
docker: Error response from daemon: pull access denied for talusbio/nf-encyclopedia, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
because they use this line:
nf-encyclopedia/conf/test.config
Line 10 in 6b469ea
and therefore fail because even though ghcr.io/talusbio/nf-encyclopedia:latest had been pulled, the tag is not matched.
Notably, that line also makes the tests pass in CICD because right before the test runs, the image is built using:
/usr/bin/docker buildx build --iidfile /tmp/docker-build-push-V0tuRX/iidfile --tag nf-encyclopedia:latest --load --metadata-file /tmp/docker-build-push-V0tuRX/metadata-file .
https://github.com/TalusBio/nf-encyclopedia/actions/runs/3192943440/jobs/5210985034#step:5:102
So maybe a complete solution would entail checking environment variables: when running in CI, use a config file that contains only "nf-encyclopedia" as the container definition; otherwise, use "ghcr.io/talusbio/nf-encyclopedia:latest".
LMK what you think!
TLDR:
Tests fail if you didn't build the image on the same computer. We can either document it or fix it :P
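The selection logic proposed above could look roughly like the following. This is only a sketch in Python to show the decision, not how the pipeline's config is actually written. It assumes the CI runs on GitHub Actions, which sets `GITHUB_ACTIONS=true` on its runners; the function name `pick_container` is hypothetical.

```python
import os

# Tag produced by the `docker buildx build --tag nf-encyclopedia:latest` step in CI.
LOCAL_BUILD_IMAGE = "nf-encyclopedia:latest"
# Published image that a local user would have pulled.
REGISTRY_IMAGE = "ghcr.io/talusbio/nf-encyclopedia:latest"


def pick_container(env=None):
    """Return the container name the tests should use (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: GitHub Actions runners export GITHUB_ACTIONS="true".
    if env.get("GITHUB_ACTIONS") == "true":
        return LOCAL_BUILD_IMAGE
    return REGISTRY_IMAGE


if __name__ == "__main__":
    print(pick_container())
```

With this approach, a local run falls back to the registry image, while CI keeps using the image it just built.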
Eventually we should update the README to include details about this workflow, including what the workflow does and how to test it locally.
Currently, we support notifications through email. However, it would be nice to support notifications through Slack, like nf-core pipelines such as quantms do.
We need better documentation in addition to the README. After a conversation with @cia23, here are some things to add, with likely more to come:
I am running the pipeline in a WSL2 setup and getting the following error:
./nextflow run TalusBio/nf-encyclopedia -r latest --input input.csv --dlib proteins.dlib --fasta proteins.fasta
N E X T F L O W ~ version 22.10.2
Launching `https://github.com/TalusBio/nf-encyclopedia` [dreamy_minsky] DSL2 - revision: 63c5d914a2 [latest]
executor > local (2)
[- ] process > CONVERT_TO_MZML:MSCONVERT -
[- ] process > BUILD_CHROMATOGRAM_LIBRARY:ENCYCLOPEDIA_SEARCH -
[- ] process > BUILD_CHROMATOGRAM_LIBRARY:ENCYCLOPEDIA_AGGREGATE -
[fe/53691a] process > PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (1) [100%] 2 of 2, failed: 2 ✘
[- ] process > PERFORM_QUANT:ENCYCLOPEDIA_AGGREGATE -
[- ] process > MSSTATS -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (2)'
Caused by:
Process `PERFORM_QUANT:ENCYCLOPEDIA_SEARCH (2)` terminated with an error exit status (1)
Command executed:
gzip -df mz600-604.210712_ratio_22m_01_058.mzML.gz
java -Djava.aws.headless=true -Xmx31G -jar /code/encyclopedia.jar \
-i mz600-604.210712_ratio_22m_01_058.mzML \
-f proteins.fasta \
-l proteins.dlib \
-percolatorVersion v3-01 -quantifyAcrossSamples true -scoringBreadthType window \
\
| tee mz600-604.210712_ratio_22m_01_058.mzML.local.log
gzip mz600-604.210712_ratio_22m_01_058.mzML.features.txt
Command exit status:
1
Command output:
(empty)
Command error:
Error: Unable to access jarfile /code/encyclopedia.jar
Work dir:
/home/ash022/work/eb/9560810abcdc8abe743dda3e406ae4
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
The inputs are from base repo folder https://github.com/TalusBio/nf-encyclopedia/tree/main/tests/data
cat input.csv
file, chrlib
mz600-604.210712_ratio_22m_01_046.mzML.gz, false
mz600-604.210712_ratio_22m_01_058.mzML.gz, false
Any ideas how to proceed?
Currently, conditions that contain spaces or begin with numbers will cause the pipeline to fail. Instead, we should be able to gracefully handle these cases.
I think changing check.names to FALSE in read.table() should fix this:
Line 56 in 6b469ea
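For context on why check.names matters: when it is TRUE, R's read.table() passes column names through make.names(), which rewrites anything that is not a syntactically valid R name. The Python sketch below is a rough approximation of that renaming for ASCII input only (it ignores reserved words and locale handling), and `make_names_approx` is a hypothetical helper, not part of the pipeline. It is meant to show how a condition name could stop matching its original spelling.

```python
import re


def make_names_approx(name):
    """Rough ASCII-only approximation of R's make.names()."""
    # A valid name starts with a letter, or with a dot not followed by a digit.
    starts_ok = re.match(r"[A-Za-z]|\.(?![0-9])", name) is not None
    if not starts_ok:
        name = "X" + name
    # Any remaining character outside letters, digits, "." and "_" becomes ".".
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


if __name__ == "__main__":
    for condition in ["DMSO", "1uM drug", "high dose"]:
        print(repr(condition), "->", repr(make_names_approx(condition)))
```

A condition such as "1uM drug" comes out as "X1uM.drug", which would no longer match the condition as written in the input, presumably the source of the failure.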
We've observed that MSstats is dropping some proteins. At first, we suspected this was due to too much missing data for a peptide. Now, after @cia23's discussions with the Drug Disco team, it seems to be a bug.
Currently, we perform an inner join on the proteins from the EncyclopeDIA proteins.txt and peptides.txt output, under the assumption that the Protein columns should always yield a 1-to-many match and all peptides with a protein accepted at 1% FDR would be accounted for. Unfortunately, this assumption is wrong: the Protein column in peptides.txt does not contain the protein groups in the Protein column from proteins.txt, leading to missing proteins.
The fix here is to use the PeptideSequences column in proteins.txt to map proteins to peptides, which will be a headache.
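A toy illustration of the difference between the two approaches, in plain Python. The rows are made up, and several details are assumptions rather than facts about the EncyclopeDIA output: that peptides.txt has a column named Peptide, and that both protein groups and PeptideSequences are delimited with ";". The helper names are hypothetical.

```python
# Made-up rows; the real tables have many more columns.
proteins = [
    {"Protein": "P1;P2", "PeptideSequences": "PEPTIDEA;PEPTIDEB"},  # a protein group
    {"Protein": "P3", "PeptideSequences": "PEPTIDEC"},
]
peptides = [
    {"Peptide": "PEPTIDEA", "Protein": "P1"},
    {"Peptide": "PEPTIDEB", "Protein": "P2"},
    {"Peptide": "PEPTIDEC", "Protein": "P3"},
]


def inner_join_on_protein(proteins, peptides):
    """Current behaviour: keep peptides whose Protein string matches exactly."""
    accepted = {row["Protein"] for row in proteins}
    return {(p["Protein"], p["Peptide"]) for p in peptides if p["Protein"] in accepted}


def map_by_peptide_sequences(proteins, peptides, sep=";"):
    """Proposed fix: expand PeptideSequences (delimiter is an assumption)."""
    known = {p["Peptide"] for p in peptides}
    pairs = set()
    for row in proteins:
        for seq in row["PeptideSequences"].split(sep):
            if seq in known:
                pairs.add((row["Protein"], seq))
    return pairs


joined = inner_join_on_protein(proteins, peptides)
mapped = map_by_peptide_sequences(proteins, peptides)

if __name__ == "__main__":
    print("inner join keeps:", sorted({prot for prot, _ in joined}))
    print("PeptideSequences mapping keeps:", sorted({prot for prot, _ in mapped}))
```

In this toy case the inner join silently drops the "P1;P2" group because neither "P1" nor "P2" equals the group string, while the PeptideSequences mapping retains it.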
We've noticed significant differences between results obtained with the EncyclopeDIA GUI and CLI. Unfortunately, talking with Brian and Seth hasn't revealed anything that could be the cause.
Here's how we should find the problem:
I hope that this small file is one that we could incorporate for unit tests as well.
LMK if you want me to take a look at this!
We want to add a process to the pipeline that captures a set of QC metrics and then writes these to a DB.
Current idea:
Change unique_peptides_proteins → qc_task
Within the qc_task, read all the wide elib, the quant_peptides, and the quant_proteins
Calculate unique peptides and proteins, %cv across dmso samples, gopher values
For now drop it in S3 → show it in the data standard-report
Then store it in noSQL in scispot
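As a starting point for the qc_task, here is a minimal Python sketch of two of the metrics listed above: unique peptide and protein counts, and the median %CV across DMSO samples. Everything about the input shape is an assumption (one dict per peptide with Peptide, Protein, and one intensity per sample), %CV is taken as 100 × sample standard deviation / mean, and the function names are hypothetical. Gopher values and the S3 or DB upload are not attempted here.

```python
import statistics


def percent_cv(values):
    """Percent coefficient of variation; None if it cannot be computed."""
    if len(values) < 2:
        return None
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return 100.0 * statistics.stdev(values) / mean


def qc_metrics(rows, dmso_samples):
    """Summarize quantified peptides (assumed shape: one dict per peptide)."""
    cvs = []
    for row in rows:
        cv = percent_cv([row[sample] for sample in dmso_samples])
        if cv is not None:
            cvs.append(cv)
    return {
        "unique_peptides": len({row["Peptide"] for row in rows}),
        "unique_proteins": len({row["Protein"] for row in rows}),
        "median_dmso_cv": statistics.median(cvs) if cvs else None,
    }


if __name__ == "__main__":
    example = [
        {"Peptide": "A", "Protein": "P1", "dmso_1": 100.0, "dmso_2": 100.0},
        {"Peptide": "B", "Protein": "P1", "dmso_1": 90.0, "dmso_2": 110.0},
    ]
    print(qc_metrics(example, ["dmso_1", "dmso_2"]))
```

Returning a flat dict keeps the result easy to serialize as JSON for S3 or a document store later.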
The output for msstats.proteins.txt contains the row names, which causes problems for Excel users. We should just remove them.