usda-vs / vsnp
vSNP -- validate SNPs
License: GNU General Public License v3.0
Hello, I've run vSNP step 1, and the program stopped after printing this message: "reading *_unaligned_R2.fastq not such file or directory". Why does this happen? Did step 1 finish correctly anyway?
Thanks in advance,
Since GATK4 was released, the system calls to "gatk" need to be changed to "gatk3".
Hi Tod,
I think that adding these 2 options would make the usage of the pipeline more flexible:
I would actually use those 2 options often. We rarely have lots of new strains to process here...
Thanks,
Marco
vsnp_fasta_gbk_gff_by_acc.py -b -a <ncbi accession>
downloads the short version of the GenBank file rather than the long version. vSNP_step2.py needs the long version to annotate tables. The current workaround is to download the long version manually from a web browser and place it in the dependency folder. Remember to check the file encoding (UTF-8, LF line endings).
Hello,
I'm facing a problem running the first part of the vSNP pipeline.
I have Mycobacterium bovis raw reads from a paired-end run on an Illumina MiniSeq, and I want to use Mycobacterium bovis AF2122 as the reference. I ran the following command:
vSNP_step1.py -r1 47104-2018_S10_L001_R1_001.fastq.gz -r2 47104-2018_S10_L001_R2_001.fastq.gz -r Mycobacterium_AF2122
and the analysis starts properly until I get this message:
Traceback (most recent call last):
File "/home/[email protected]/anaconda3/envs/vSNP/bin/vSNP_step1.py", line 122, in <module>
align_reads.align()
File "/home/[email protected]/anaconda3/envs/vSNP/bin/vsnp_alignment_vcf.py", line 376, in align
group_reporter = GroupReporter(zero_coverage_vcf, ref_option)
File "/home/[email protected]/anaconda3/envs/vSNP/bin/vsnp_group_reporter.py", line 30, in __init__
defsnp_iterator = iter(defining_snps.iteritems())
File "/home/[email protected]/anaconda3/envs/vSNP/lib/python3.9/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'iteritems'
After that nothing happens, and my folder looks like this:
Sometimes the command line also prints this: Minimum k-mer coverage is 7
and the analysis seems to keep going (?), but not always.
How can I solve this problem?
I hope I explained this clearly.
Thank you in advance,
Valentina
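The AttributeError in the traceback above is consistent with pandas 2.0, which removed `Series.iteritems()` in favor of `Series.items()`. What follows is a minimal sketch of a version-tolerant replacement for the failing call. The helper name `iter_pairs` is hypothetical and not part of vSNP, and the report does not show the installed pandas version, so treating it as 2.x is an assumption.

```python
def iter_pairs(obj):
    """Return an iterator of (key, value) pairs from a dict-like or Series-like object.

    Prefers .items(), which exists on dicts and on pandas Series, and falls
    back to .iteritems() for objects that only offer the older name.
    """
    method = getattr(obj, "items", None)
    if method is None:
        # Raises AttributeError if the object offers neither method.
        method = getattr(obj, "iteritems")
    return iter(method())


# Stand-in for the pandas Series named defining_snps in vsnp_group_reporter.py.
# The keys and values below are made up purely for illustration.
defining_snps = {"NC_002945.4:1057": "group-A", "NC_002945.4:4480": "group-B"}
defsnp_iterator = iter_pairs(defining_snps)
print(next(defsnp_iterator))
```

Pinning pandas below 2.0 in the conda environment would be another way around the error, at the cost of staying on an older release.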
When using the "all" option on a large number of VCF files (4460), I get the following error.
`All_VCFs table dimensions: (4298, 63111)
All_VCFs RAxML running...
ERROR: missing ')' at line 0 near '5MIDNRdeerMontm_zc'
cat: write error: Broken pipe
All_VCFs Getting map quality...
All_VCFs annotating from annotation dictionary... D20181218_1709
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/bioinfo/vSNP/functions.py", line 2720, in get_snps
excelwriter(out_sort) #***FUNCTION CALL #sort
File "/home/bioinfo/vSNP/functions.py", line 2889, in excelwriter
wb.close()
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 306, in close
self._store_workbook()
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/site-packages/xlsxwriter/workbook.py", line 677, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/zipfile.py", line 1645, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/zipfile.py", line 1378, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/zipfile.py", line 1488, in _open_to_write
self._writecheck(zinfo)
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/zipfile.py", line 1604, in _writecheck
" would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/bioinfo/vSNP/vSNP.py", line 227, in <module>
functions.run_script2(arg_options)
File "/home/bioinfo/vSNP/functions.py", line 1696, in run_script2
for samples_in_fasta in pool.map(get_snps, directory_list, itertools_repeat(arg_options), chunksize=5):
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
for element in iterable:
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/home/bioinfo/miniconda3/envs/vsnp/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
zipfile.LargeZipFile: Filesize would require ZIP64 extensions`
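The final LargeZipFile error comes from Python's zipfile module, which raises it when a file being added exceeds the classic ZIP size limit (about 2 GiB) and ZIP64 is not enabled. XlsxWriter workbooks provide a `use_zip64()` method for this case. What follows is a minimal sketch, assuming the `wb` object closed in `excelwriter` is an `xlsxwriter.Workbook`. The helper name `close_with_zip64` is hypothetical and not part of vSNP.

```python
import os
import tempfile


def close_with_zip64(workbook):
    """Enable ZIP64 extensions on an XlsxWriter workbook, then close it.

    workbook is expected to be an xlsxwriter.Workbook (or any object that
    exposes use_zip64() and close()). use_zip64() has to be called before
    close(), because close() is the step that writes the zip container.
    """
    workbook.use_zip64()
    workbook.close()


if __name__ == "__main__":
    try:
        import xlsxwriter
    except ImportError:
        print("xlsxwriter is not installed; nothing to demonstrate")
    else:
        # Write a tiny workbook to a temporary folder with ZIP64 allowed.
        path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
        wb = xlsxwriter.Workbook(path)
        wb.add_worksheet().write(0, 0, "sample")
        close_with_zip64(wb)
        print("wrote", path)
```

In functions.py this would stand in for the bare `wb.close()` call shown in the traceback. Whether spreadsheet software can comfortably open a table of this size is a separate question.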
Hi, I have installed vSNP following your instructions.
Because of an error suggesting that the setup was not correct, I had to uninstall samtools using conda. In the process it removed a few extra dependencies, including pysam. I then reinstalled both pysam and samtools and tried vSNP.py on the Mycobacterium test dataset.
The run ended with this output:
runtime: 0:17:34.115335:
average_coverage: 13.8
time_stamp: 2018-12-03_20-33-19
sample_name: 13-1941
species: af
reference_sequence_name: NC_002945.4
R1size: 31.0MB
R2size: 38.8MB
allbam_mapped_reads: 286,205
genome_coverage: 99.02%
ave_coverage: 13.8
ave_read_length: 227.2
unmapped_reads: 1871
unmapped_assembled_contigs: 745
good_snp_count: 675
mlst_type: N/A
octalcode: 640013777377600
sbcode: N/A
hexadecimal_code: 68-0-5F-7E-FF-60
binarycode: 1101000000000010111111111110111111111100000
Q_ave_R1: 34.7
Q30_R1: 89.9%
Q_ave_R2: 27.4
Q30_R2: 41.4%
Path to cumulative stat summary file not found
runtime: 0:19:52.687740:
See files, vSNP has finished alignments
What does it mean when the path to the cumulative stat summary file is not found? Is that bad?
Thomas
Hello Tod,
I'm a grad student studying how best to genotype Mycobacterium bovis, and the vSNP code provides an excellent way to study this workflow. Are there any scientific papers that informed your decisions when creating vSNP? I would like to have a paper to complement my study of the program.
The script vcftofasta.sh calls for a file named "ALL_WGS.xlsx" at line 244, but the file is not present in the dependencies. Could you please add it?
"script2" has changed a bit since last time I used it. I'm trying to implement all the new changes in my local installation.
The annotation issue is caused by the join notation used in the gbk files. For example, the line:
complement(join(4727880..4728107,1..738))
Annotation works if the join is removed, turning the above into:
complement(4727880..4728107)
When fixing this, search the gbk for the keyword join
to also find lines such as this:
join(4727880..4728107,1..738)
This line will also break table annotation.
Simply deleting the line from the gbk fixes it.
Be aware that these "fixes" will cause some regions of the genome to go unidentified; however, this should be a very small amount.
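The search for join locations described above can be sketched in a few lines. This is a minimal illustration that only reports the offending lines and does not edit the file. The function name `find_join_locations` is hypothetical, and the feature keys in the sample text (gene, CDS) are placeholders around the two location strings quoted above.

```python
import re

# Matches feature locations that use join(...), with or without complement(...).
JOIN_PATTERN = re.compile(r"\bjoin\(")


def find_join_locations(gbk_text):
    """Return (line_number, stripped_line) for every line that contains join(."""
    hits = []
    for number, line in enumerate(gbk_text.splitlines(), start=1):
        if JOIN_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Sample text for illustration only; a real run would read the gbk file instead.
example = """\
     gene            complement(join(4727880..4728107,1..738))
     gene            1000..2000
     CDS             join(4727880..4728107,1..738)
"""

for number, line in find_join_locations(example):
    print(number, line)
```

Reporting line numbers rather than rewriting the file keeps the manual review step in place, which matters because each change removes annotation for part of the genome.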
Hello,
I am having trouble installing/using this software.
OS: CentOS Linux release 7.9.2009 (Core)
Conda version: 4.11.0
Steps taken to install:
conda create -n vsnp
source activate vsnp
conda install -c defaults -c bioconda -c conda-forge vsnp
cd
git clone https://github.com/USDA-VS/vSNP_reference_options.git
vsnp_path_adder.py -d ~/vSNP_reference_options
This installs the software, but upon using vSNP_step1.py, I get:
### SRR Making indexes...
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
conda list
output:
# packages in environment at /home/bc06026/.conda/envs/vsnp:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
abyss 2.0.2 h51208dd_5 bioconda
asciitree 0.3.3 py_2
bc 1.07.1 h7f98852_0 conda-forge
biopython 1.78 py39h7f8727e_0
blas 1.0 mkl
bokeh 2.4.2 py39h06a4308_0
bottleneck 1.3.4 py39hce1f21e_0
brotli 1.0.9 he6710b0_2
bwa 0.7.17 h7132678_9 bioconda
bzip2 1.0.8 h7b6447c_0
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.3.29 h06a4308_0
certifi 2021.10.8 py39h06a4308_2
click 8.0.4 py39h06a4308_0
cloudpickle 2.0.0 pyhd3eb1b0_0
cycler 0.11.0 pyhd3eb1b0_0
cytoolz 0.11.0 py39h27cfd23_0
dask 2022.2.1 pyhd3eb1b0_0
dask-core 2022.2.1 pyhd3eb1b0_0
dbus 1.13.18 hb2f20db_0
distributed 2022.2.1 pyhd3eb1b0_0
expat 2.4.4 h295c915_0
fasteners 0.16.3 pyhd3eb1b0_0
fontconfig 2.13.1 h6c09931_0
fonttools 4.25.0 pyhd3eb1b0_0
freebayes 0.9.21.7 0 bioconda
freetype 2.11.0 h70c0345_0
fsspec 2022.2.0 pyhd3eb1b0_0
giflib 5.2.1 h7b6447c_0
glib 2.69.1 h4ff587b_1
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
h5py 3.6.0 py39ha0f2276_0
hdf5 1.10.6 hb1b8bf9_0
heapdict 1.0.1 pyhd3eb1b0_0
htslib 1.14 h9093b5e_0 bioconda
humanize 3.10.0 pyhd3eb1b0_0
icu 58.2 he6710b0_3
intel-openmp 2021.4.0 h06a4308_3561
jinja2 3.0.3 pyhd3eb1b0_0
joblib 1.1.0 pyhd3eb1b0_0
jpeg 9d h7f8727e_0
kiwisolver 1.3.2 py39h295c915_0
krb5 1.19.2 hac12032_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libcurl 7.80.0 h0b77cf5_0
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h7f8727e_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1d223b6_14 conda-forge
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 11.2.0 h1d223b6_14 conda-forge
libnghttp2 1.46.0 hce63b2e_0
libpng 1.6.37 hbc83047_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 11.2.0 he4da1e4_14 conda-forge
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h7f8727e_2
libwebp 1.2.2 h55f646e_0
libwebp-base 1.2.2 h7f8727e_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
libzlib 1.2.11 h166bdaf_1014 conda-forge
locket 0.2.1 py39h06a4308_2
lz4-c 1.9.3 h295c915_1
make 4.2.1 h1bed415_1
markupsafe 2.0.1 py39h27cfd23_0
matplotlib 3.5.1 py39h06a4308_1
matplotlib-base 3.5.1 py39ha18d171_1
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py39h7f8727e_0
mkl_fft 1.3.1 py39hd3c417c_0
mkl_random 1.2.2 py39h51133e4_0
mpi 1.0 openmpi
msgpack-python 1.0.2 py39hff7bd54_1
munkres 1.1.4 py_0
ncurses 6.3 h7f8727e_2
networkx 2.7.1 pyhd3eb1b0_0
numcodecs 0.9.1 py39h295c915_0
numexpr 2.8.1 py39h6abb31d_0
numpy 1.21.2 py39h20f2e39_0
numpy-base 1.21.2 py39h79a1101_0
openjdk 11.0.13 h87a67e3_0
openmpi 4.0.2 hb1b8bf9_1
openssl 1.1.1n h7f8727e_0
packaging 21.3 pyhd3eb1b0_0
pandas 1.4.1 py39h295c915_1
pandoc 2.12 h06a4308_0
partd 1.2.0 pyhd3eb1b0_1
pcre 8.45 h295c915_0
perl 5.26.2 h14c3975_0
picard 2.18.29 0 bioconda
pillow 9.0.1 py39h22f2fdc_0
pip 21.2.4 py39h06a4308_0
pomegranate 0.14.4 py39h9a67853_0
psutil 5.8.0 py39h27cfd23_1
py-cpuinfo 8.0.0 pyhd3eb1b0_1
pyparsing 3.0.4 pyhd3eb1b0_0
pyqt 5.9.2 py39h2531618_6
pysam 0.16.0.1 py39h051187c_3 bioconda
python 3.9.11 h12debd9_2
python-dateutil 2.8.2 pyhd3eb1b0_0
python_abi 3.9 2_cp39 conda-forge
pytz 2021.3 pyhd3eb1b0_0
pyvcf 0.6.8 py39hde42818_1002 conda-forge
pyyaml 6.0 py39h7f8727e_1
qt 5.9.7 h5867ecd_1
raxml 8.2.12 hec16e2b_4 bioconda
readline 8.1.2 h7f8727e_1
regex 2022.3.15 py39h7f8727e_0
samtools 1.15 h3843a85_0 bioconda
scikit-allel 1.3.5 py39hde0f152_1 conda-forge
scikit-learn 1.0.2 py39h51133e4_1
scipy 1.7.3 py39hc147768_0
seaborn 0.11.2 pyhd3eb1b0_0
setuptools 58.0.4 py39h06a4308_0
sip 4.19.13 py39h295c915_0
six 1.16.0 pyhd3eb1b0_1
sortedcontainers 2.4.0 pyhd3eb1b0_0
sqlite 3.38.2 hc218d9a_0
tabixpp 1.1.0 hb264ae4_8 bioconda
tblib 1.7.0 pyhd3eb1b0_0
threadpoolctl 2.2.0 pyh0d69192_0
tk 8.6.11 h1ccaba5_0
toolz 0.11.2 pyhd3eb1b0_0
tornado 6.1 py39h27cfd23_0
typing_extensions 4.1.1 pyh06a4308_0
tzdata 2022a hda174b7_0
vcflib 1.0.3 hecb563c_1 bioconda
vsnp 2.03 hdfd78af_2 bioconda
wheel 0.37.1 pyhd3eb1b0_0
xlrd 2.0.1 pyhd3eb1b0_0
xlsxwriter 3.0.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zarr 2.8.1 pyhd3eb1b0_0
zict 2.0.0 pyhd3eb1b0_0
zlib 1.2.11 h166bdaf_1014 conda-forge
zstd 1.4.9 haebb681_0
Attempts to resolve this:
conda install -c bioconda samtools=1.9 --force-reinstall
but I'm getting: UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
Package libgcc-ng conflicts for:
python=3.9 -> zlib[version='>=1.2.11,<1.3.0a0'] -> libgcc-ng[version='>=10.3.0|>=7.2.0']
python=3.9 -> libgcc-ng[version='>=7.3.0|>=7.5.0']
Package ncurses conflicts for:
python=3.9 -> ncurses[version='>=6.2,<7.0a0|>=6.3,<7.0a0']
python=3.9 -> readline[version='>=8.0,<9.0a0'] -> ncurses[version='>=6.1,<7.0a0']
Package _libgcc_mutex conflicts for:
samtools=1.9 -> libgcc-ng[version='>=7.3.0'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']
python=3.9 -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']
The following specifications were found to be incompatible with your system:
- feature:/linux-64::__glibc==2.17=0
- feature:|@/linux-64::__glibc==2.17=0
- samtools=1.9 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.17
I tried making a new environment and installing vsnp with conda install -c defaults -c bioconda -c conda-forge vsnp python=3.7
After a very long time it did give me a y/n prompt to proceed with the install, but samtools was at 1.7 and I still got the same libcrypto.so.1.0.0 error.
I also tried downloading the most recent .tar.bz2 from https://anaconda.org/bioconda/vsnp/files, editing the meta.yaml to change python >=3.7 to python=3.7, and running conda-build .
but got:
conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {'vcflib', 'pysam', 'freebayes', 'scikit-allel', 'picard', 'raxml', 'samtools', 'abyss', 'bwa', 'pyvcf'}
Please advise the best way to resolve this and let me know if you need any other information. Thank you for your help.
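The conda list output shows openssl 1.1.1n. OpenSSL 1.1.x ships libcrypto.so.1.1 rather than libcrypto.so.1.0.0, which would be consistent with the loader failing to find the file samtools asks for. What follows is a minimal sketch for checking which libcrypto files the active environment actually provides. It assumes a Linux conda environment with CONDA_PREFIX set by activation, and the function name `list_libcrypto` is hypothetical.

```python
import os
from pathlib import Path


def list_libcrypto(prefix):
    """Return the sorted names of libcrypto shared libraries under <prefix>/lib."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("libcrypto.so*"))


if __name__ == "__main__":
    # CONDA_PREFIX points at the active environment once it has been activated.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("CONDA_PREFIX is not set; activate the environment first")
    else:
        print(list_libcrypto(prefix))
```

If the list contains no libcrypto.so.1.0.0, the samtools build in the environment was linked against a different OpenSSL than the one installed, and the two packages would need to come from a compatible set of channels.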