Comments (5)
[wfmash::map] Reference = [/data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Query = [/data/HLA/DRB1-3123.fa.gz]
[wfmash::map] Kmer size = 19
[wfmash::map] Window size = 30
[wfmash::map] Segment length = 5000 (read split allowed)
[wfmash::map] Block length min = 25000
[wfmash::map] Chaining gap max = 100000
[wfmash::map] Percentage identity threshold = 70%
[wfmash::map] Skip self mappings
[wfmash::map] Mapping output file = /dev/stdout
[wfmash::map] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[wfmash::map] Execution threads = 16
[wfmash::skch::Sketch::build] minimizers picked from reference = 10970
[wfmash::skch::Sketch::index] unique minimizers = 2797
[wfmash::skch::Sketch::computeFreqHist] Frequency histogram of minimizers = (1, 19) ... (22, 1)
[wfmash::skch::Sketch::computeFreqHist] With threshold 0.001%, consider all minimizers during lookup.
[wfmash::map] time spent computing the reference index: 0.00974391 sec
[wfmash::skch::Map::mapQuery] mapped 0.00% @ 0.00e+00 bp/s elapsed: 00:00:00:00
[wfmash::skch::Map::mapQuery] mapped 100.00% @ 3.26e+05 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
[wfmash::skch::Map::mapQuery] count of mapped reads = 11, reads qualified for mapping = 12, total input reads = 12, total input bp = 163416
[wfmash::map] time spent mapping the query: 5.02e-01 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 70 -n 11 -k 19 -H 0.001 -X -t 16 --tmp-base /data/HLA/outputtt /data/HLA/DRB1-3123.fa.gz --approx-map
0.09s user 0.02s system 24% cpu 0.51s total 63704Kb max memory
python3 /usr/local/bin/scripts/paf2net.py -p /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.mappings.wfmash.paf
0.03s user 0.00s system 97% cpu 0.04s total 10636Kb max memory
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-x0sxurwi because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Detected 2 communities.
python3 /usr/local/bin/scripts/net2communities.py -e /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.mappings.wfmash.paf.edges.list.txt -w /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.mappings.wfmash.paf.edges.weights.txt -n /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.mappings.wfmash.paf.vertices.id2name.txt --accurate-detection --output-prefix /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e
0.83s user 2.05s system 519% cpu 0.55s total 72840Kb max memory
samtools faidx /data/HLA/DRB1-3123.fa.gz gi|568815592:32578768-32589835 gi|568815551:3814534-3830133 gi|568815561:3988942-4004531 gi|568815567:3779003-3792415 gi|568815569:3979127-3993865 gi|28212469:126036-137103 gi|28212470:131613-146345 gi|157702218:147985-163915 gi|528476637:32549024-32560088
0.00s user 0.00s system 100% cpu 0.00s total 3368Kb max memory
samtools faidx /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.0.fa
0.00s user 0.00s system 20% cpu 0.01s total 3192Kb max memory
samtools faidx /data/HLA/DRB1-3123.fa.gz gi|568815529:3998044-4011446 gi|345525392:5000-18402 gi|29124352:124254-137656
0.00s user 0.00s system 50% cpu 0.00s total 3484Kb max memory
samtools faidx /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.1.fa
0.00s user 0.00s system 28% cpu 0.00s total 3128Kb max memory
pggb -i /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.0.fa \
-o /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.0.fa.out \
-p 5000 -l 25000 -p 70 -n 12 -K 19 -F 0.001 \
-k 19 -f 0 -B 10000000 \
-H 12 -j 0 -e 0 -G 700,900,1100 -P 1,19,39,3,81,1 -O 0.001 -d 100 -Q Consensus_ \
--threads 16 --poa-threads 16
pggb -i /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.1.fa \
-o /data/HLA/outputtt/DRB1-3123.fa.gz.6deb21e.community.1.fa.out \
-p 5000 -l 25000 -p 70 -n 12 -K 19 -F 0.001 \
-k 19 -f 0 -B 10000000 \
-H 12 -j 0 -e 0 -G 700,900,1100 -P 1,19,39,3,81,1 -O 0.001 -d 100 -Q Consensus_ \
--threads 16 --poa-threads 16
from pggb.
Is this intended? I would expect 1 community.
from pggb.
With -p 98, several communities. But with -p 70 it should be one, from my understanding.
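To make the expectation above concrete, here is a minimal Python sketch using made-up pairwise identities. The names `toy_identities` and `count_components` are hypothetical, and this is plain connected components over a thresholded graph, not the community detection that net2communities.py performs. It only illustrates how a stricter identity threshold removes edges and fragments the mapping graph.

```python
# Toy pairwise identities (percent) between five hypothetical sequences.
# These numbers are invented for illustration, not taken from the run above.
toy_identities = {
    ("a", "b"): 99.0,
    ("b", "c"): 98.5,
    ("a", "c"): 98.8,
    ("d", "e"): 99.2,
    ("c", "d"): 85.0,  # the only link between the two groups
}


def count_components(identities, min_identity):
    """Count connected components after dropping edges below min_identity."""
    nodes = {n for pair in identities for n in pair}
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), ident in identities.items():
        if ident >= min_identity:
            parent[find(u)] = find(v)  # merge the two components

    return len({find(n) for n in nodes})


components_at_70 = count_components(toy_identities, 70)
components_at_98 = count_components(toy_identities, 98)
print(components_at_70, components_at_98)
```

Being in one connected component at -p 70 is necessary for a single community but does not guarantee it, since community detection can still split a connected graph.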
from pggb.
That is quite a small (and short) dataset for applying community detection.
However, it doesn't seem like a bug. Those sequences are more similar to each other than to all the others, so they form a community of their own. I suppose they tend to stay together also with different values of -p.
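A small Python sketch of why a connected graph can still be split: assuming the detection maximizes a modularity-like score over mapping weights (an assumption about net2communities.py, not something confirmed in this thread), a partition that separates a tightly linked group can score higher than keeping everything together. The edge weights and the `modularity` helper below are hypothetical.

```python
def modularity(edges, membership):
    """Weighted Newman modularity of a partition of an undirected graph.

    edges: dict mapping (u, v) pairs to positive weights (each pair listed once).
    membership: dict mapping each node to a community label.
    """
    total = sum(edges.values())  # m, the total edge weight

    # Node strength = sum of the weights of the edges touching the node.
    strength = {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w

    # Weight of the edges that fall inside a single community.
    inside = sum(w for (u, v), w in edges.items() if membership[u] == membership[v])

    # Total strength per community, used for the expected-weight term.
    per_community = {}
    for node, s in strength.items():
        label = membership[node]
        per_community[label] = per_community.get(label, 0.0) + s

    expected = sum((s / (2 * total)) ** 2 for s in per_community.values())
    return inside / total - expected


# Invented weights: two tight groups joined by a single weaker mapping.
toy_edges = {
    ("a", "b"): 0.99, ("a", "c"): 0.99, ("b", "c"): 0.99,
    ("d", "e"): 0.99,
    ("c", "d"): 0.75,
}
one_community = {n: 0 for n in "abcde"}
two_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

q_one = modularity(toy_edges, one_community)
q_two = modularity(toy_edges, two_communities)
print(q_one, q_two)
```

In this toy case the single-community partition scores 0 and the split scores higher, so a modularity-based method would report two communities even though every sequence is reachable from every other.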
from pggb.
I see. The longer the better. But how long? LPA? E. coli? Yeast?
Should one already know that there are several chromosomes in the dataset?
Can you somehow quantify when to apply the community detection?
from pggb.