suwonglab / arcsv
Complex structural variant detection from WGS data
License: MIT License
I am having difficulty running the example provided on GitHub. I get the following error:
[run] ref files {'reference': 'example/reference.fa', 'gap': 'example/gaps.bed'}
[run] calling SVs in 20:0-250000
[parse_bam] extracting approximate library stats
[parse_bam] read_len: 100; rough_insert_median: 367.0
[library_stats] processed 200000 reads (75932 chunks) for each lib
[library_stats] processed 400000 reads (145049 chunks) for each lib
[library_stats] processed 600000 reads (210832 chunks) for each lib
[library_stats] processed 800000 reads (272950 chunks) for each lib
[library_stats] processed 1000000 reads (345156 chunks) for each lib
Traceback (most recent call last):
File "/Users/ebattist/Library/Python/3.11/bin/arcsv", line 156, in <module>
main()
File "/Users/ebattist/Library/Python/3.11/bin/arcsv", line 26, in main
run(args)
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/call_sv.py", line 93, in run
call_sv(opts, inputs, reference_files)
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/call_sv.py", line 161, in call_sv
pb_out = parse_bam(opts, reference_files, bamfiles)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 122, in parse_bam
als = extract_approximate_library_stats(opts, bam, rough_insert_median)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 87, in extract_approximate_library_stats
insert_pmf = [pmf_kernel_smooth(il, 0, opts['insert_max_mu_multiple'] * mu,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 87, in <listcomp>
insert_pmf = [pmf_kernel_smooth(il, 0, opts['insert_max_mu_multiple'] * mu,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 467, in pmf_kernel_smooth
pct = np.percentile(a_trunc, (25, 75))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<array_function internals>", line 200, in percentile
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4205, in percentile
return _quantile_unchecked(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4473, in _quantile_unchecked
return _ureduce(a,
^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 3752, in _ureduce
r = func(a, **kwargs)
^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4639, in _quantile_ureduce_func
result = _quantile(arr,
^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4756, in _quantile
result = _lerp(previous,
^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4575, in _lerp
lerp_interpolation = asanyarray(add(a, diff_b_a * t, out=out))
~~~~~~~~~^~~
File "/opt/homebrew/lib/python3.11/site-packages/numpy/matrixlib/defmatrix.py", line 218, in __mul__
return N.dot(self, asmatrix(other))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<array_function internals>", line 200, in dot
ValueError: shapes (2,976007) and (2,1) not aligned: 976007 (dim 1) != 2 (dim 0)
This is very likely due to a difference in package versions.
Could you send me the exact package requirements and the version of Python you used? The setup.py only specifies the version of pysam.
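The last frames of the traceback pass through numpy/matrixlib/defmatrix.py, which suggests that the array handed to np.percentile is an np.matrix rather than a plain ndarray. A minimal sketch of one possible workaround, assuming that is the cause: flatten the data to a one-dimensional ndarray before taking percentiles. The helper name iqr_bounds is hypothetical and is not part of arcsv.

```python
import numpy as np


def iqr_bounds(values):
    """Return the 25th and 75th percentiles of `values` as plain floats.

    Hypothetical helper: np.asarray(...).ravel() turns an np.matrix (or any
    array-like) into a 1-D ndarray, so np.percentile never goes through
    matrix multiplication.
    """
    flat = np.asarray(values, dtype=float).ravel()
    q25, q75 = np.percentile(flat, (25, 75))
    return float(q25), float(q75)


# Toy data stored as a 1 x 5 np.matrix, standing in for the insert sizes.
insert_sizes = np.matrix([[1.0, 2.0, 3.0, 4.0, 5.0]])
bounds = iqr_bounds(insert_sizes)
print(bounds)
```

Pinning an older NumPy release in a fresh environment may also avoid the error, but which versions arcsv was developed against is not stated in this thread.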
Hi,
Thank you for developing this wonderful tool.
I have learned that arcsv can call complex SVs in each sample. Is it possible to merge the complex SV calls from all samples into one file and then use the merged SVs to genotype all the samples? Alternatively, if I have identified some complex SVs by comparing two genomes, could I use arcsv to genotype these SVs in population samples?
Thank you
Best wishes,
Songtao Gui
Hi, this looks very useful. Will it work with CRAM files as input? Thanks.
Please consider depending on igraph instead of python-igraph. The latter name has been deprecated for nearly two years on PyPI and will soon stop receiving updates. See igraph/python-igraph#699 for details.
Ref: https://github.com/search?q=repo%3ASUwonglab%2Farcsv%20python-igraph&type=code
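A sketch of what the change could look like in setup.py, assuming the dependency is declared through install_requires (the actual contents of arcsv's setup.py are not shown in this thread, so the list below is illustrative only). The import name is igraph under either distribution name, so the arcsv source itself should not need to change.

```python
# Hypothetical fragment of setup.py: only the distribution name changes.
install_requires = [
    "igraph",  # previously listed as "python-igraph" (deprecated name on PyPI)
]

print(install_requires)
```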
Hi,
I am very interested in testing arcsv, as I heard about it at ASHG 2022 in Los Angeles, CA. However, I am struggling to install it on a cluster where I have little support and no admin rights. Is a container version of arcsv available?
Thanks for your help,
Best regards,
Tatiana
.local/lib/python3.4/site-packages/arcsv/bamparser_streaming.py", line 56
if not_primary(aln) or aln.mpos < start or aln.mpos >= endor aln.is_duplicate:
^
SyntaxError: invalid syntax
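The caret position was lost in the paste, but the fused token endor on the reported line looks like a missing space between end and or. A minimal sketch of the condition with the space restored follows. The not_primary function here is a stand-in, since its real definition in arcsv is not shown, and should_skip is a hypothetical wrapper added only to make the example runnable.

```python
from types import SimpleNamespace


def not_primary(aln):
    # Stand-in for arcsv's helper (assumption): secondary or supplementary.
    return aln.is_secondary or aln.is_supplementary


def should_skip(aln, start, end):
    # Same condition as the reported line, with "end or" written as two tokens.
    return (not_primary(aln) or aln.mpos < start or aln.mpos >= end
            or aln.is_duplicate)


# Fake alignment record carrying only the attributes the condition reads.
aln = SimpleNamespace(is_secondary=False, is_supplementary=False,
                      mpos=150, is_duplicate=False)
inside = should_skip(aln, 100, 200)   # mate position lies inside [100, 200)
outside = should_skip(aln, 100, 150)  # mate position equals end, so excluded
print(inside, outside)
```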
I have been using arcsv to genotype SVs in a series of samples aligned with BWA without problems. Now I am processing a series of samples generated with 10X and aligned with emerald, and I get the following error:
/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[run] ref files {'reference': '/media/NFS2/refdata-b37-2.1.0/fasta/genome.fa', 'gap': '/media/NFS/Carles/SV/tools/arcsv/resources/GRCh37_gap.bed'}
[run] calling SVs in 2:0-243199373
Traceback (most recent call last):
File "/home/carleshf/miniconda2/envs/py36/bin/arcsv", line 156, in <module>
main()
File "/home/carleshf/miniconda2/envs/py36/bin/arcsv", line 26, in main
run(args)
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/call_sv.py", line 93, in run
call_sv(opts, inputs, reference_files)
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/call_sv.py", line 161, in call_sv
pb_out = parse_bam(opts, reference_files, bamfiles)
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 109, in parse_bam
bam_has_unmapped = has_unmapped_records(bam)
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 491, in has_unmapped_records
if any([a.is_unmapped and a.qname == aln.qname for a in alns]):
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 491, in <listcomp>
if any([a.is_unmapped and a.qname == aln.qname for a in alns]):
File "/home/carleshf/miniconda2/envs/py36/lib/python3.6/site-packages/arcsv/bamparser_streaming.py", line 430, in <genexpr>
return itertools.chain.from_iterable(b.fetch(*o1, **o2) for b in self.bamlist)
File "pysam/libcalignmentfile.pyx", line 855, in pysam.libcalignmentfile.AlignmentFile.fetch (pysam/libcalignmentfile.c:11188)
File "pysam/libcalignmentfile.pyx", line 783, in pysam.libcalignmentfile.AlignmentFile.parse_region (pysam/libcalignmentfile.c:10755)
ValueError: start out of range (-1)
I don't think the warning has any impact on the caller, but I don't understand what the problem with the BAM files is. Any help is welcome!
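The ValueError is raised by pysam while parsing the region passed to fetch, with a start of -1. A minimal sketch of one defensive fix, assuming the start coordinate is computed by subtracting an offset from an alignment position: clamp the window to the contig bounds before fetching. The function clamp_region and its parameters are hypothetical and are not part of arcsv or pysam.

```python
def clamp_region(start, end, contig_length):
    """Clamp a 0-based, half-open window to [0, contig_length).

    Hypothetical helper: a negative start (for example, position 0 minus 1)
    becomes 0, and an end past the contig becomes the contig length.
    """
    clamped_start = max(0, start)
    clamped_end = min(contig_length, end)
    if clamped_start >= clamped_end:
        raise ValueError("empty region after clamping")
    return clamped_start, clamped_end


# Chromosome 2 length taken from the log line "[run] calling SVs in 2:0-243199373".
region = clamp_region(-1, 100, 243199373)
print(region)
```

The clamped pair would then be passed to fetch in place of the raw coordinates.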
Hi,
I am running arcsv to call complex SVs. The error is raised at line 22 of the function below, on the line I have marked with stars.
import numpy as np  # needed for np.floor; align_strings is defined elsewhere in arcsv

def sv_affected_len(path, blocks):
    # ref_path = list(range(0, 2 * len(blocks)))
    n_ref = len([x for x in blocks if not x.is_insertion()])
    ref_block_num = list(range(n_ref))
    ref_string = ''.join(chr(x) for x in range(ord('A'), ord('A') + n_ref))
    print('ref_string: {0}'.format(ref_string))
    path_block_num = []
    path_string = ''
    for i in path[1::2]:
        block_num = int(np.floor(i / 2))
        path_block_num.append(block_num)
        if i % 2 == 1:  # forward orientation
            path_string += chr(ord('A') + block_num)
        else:  # reverse orientation
            path_string += chr(ord('A') + block_num + 1000)
    print('path_string: {0}'.format(path_string))  # *** error is raised here ***
    affected_idx_1, affected_idx_2 = align_strings(ref_string, path_string)
    affected_block_1 = set(ref_block_num[x] for x in affected_idx_1)
    affected_block_2 = set(path_block_num[x] for x in affected_idx_2)
    affected_blocks = affected_block_1.union(affected_block_2)
    affected_len = sum(len(blocks[i]) for i in affected_blocks)
    return affected_len
Please help, thanks!
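The post does not include the error message, so the following is only a guess. The function encodes reverse-oriented blocks as chr(ord('A') + block_num + 1000), which produces non-ASCII characters, and printing those can raise UnicodeEncodeError when standard output uses an ASCII-only encoding. A minimal sketch of one way to make the debug print safe under that assumption follows. The helper ascii_safe is hypothetical and is not part of arcsv.

```python
def ascii_safe(text):
    """Return `text` with non-ASCII characters replaced by \\uXXXX escapes."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


# Block 0 in forward orientation, then block 0 in reverse orientation,
# built the same way as path_string in the function above.
path_string = chr(ord("A") + 0) + chr(ord("A") + 0 + 1000)
safe = ascii_safe(path_string)
print("path_string: {0}".format(safe))
```

If this is the cause, setting the environment variable PYTHONIOENCODING=utf-8 before running arcsv may also avoid the error without changing the code.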