fred-2 / optitype

Precision HLA typing from next-generation sequencing data

License: BSD 3-Clause "New" or "Revised" License

Python 96.28% Dockerfile 3.72%

optitype's Issues

Optitype crashes with "IndexError: in the future, 0-d boolean arrays will be interpreted as a valid boolean index"

Looking at the commits, this might already be fixed. However, can we get a release so I can update the Bioconda package?

+ OptiTypePipeline.py --config /tmp/tmp.7tWxKjFp3A/tmp.giJXfvqJlQ/tmp.d/optitype.ini --verbose --input /tmp/tmp.7tWxKjFp3A/tmp.giJXfvqJlQ/tmp.d/reads_left.fastq /tmp/tmp.7tWxKjFp3A/tmp.giJXfvqJlQ/tmp.d/reads_right.fastq --dna --outdir /tmp/tmp.7tWxKjFp3A/tmp.giJXfvqJlQ/out.tmp
Traceback (most recent call last):
  File "/tmp/tmp.7tWxKjFp3A/tmpl00sjz26/.snakemake/conda/7a1aa8ba/bin/OptiTypePipeline.py", line 427, in <module>
    coverage_mat = ht.calculate_coverage(plot_variables, features, hlatype, features_used)
  File "/tmp/tmp.7tWxKjFp3A/tmpl00sjz26/.snakemake/conda/7a1aa8ba/share/optitype-1.2-0/hlatyper.py", line 626, in calculate_coverage
    coverage[bool(i_mismatches)][i_pairing-1][i_hitcount-1][i_pos-1:i_pos-1+i_read_length] += 1
IndexError: in the future, 0-d boolean arrays will be interpreted as a valid boolean index
+ rm -rf /tmp/tmp.7tWxKjFp3A/tmp.giJXfvqJlQ
Traceback (most recent call last):
  File "/home/mholtgre/Development/open_pipeline/tools/cubi_wrappers/cubi_wrappers/wrappers/optitype/.snakemake.nq3mkf2a.wrapper.py", line 119, in <module>
    """)
  File "/bioconda/2017-03/miniconda3/lib/python3.4/site-packages/snakemake/shell.py", line 80, in __new__
    raise sp.CalledProcessError(retcode, cmd)
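The failing line indexes `coverage` with `bool(i_mismatches)`. NumPy stopped treating a Python bool index as position 0 or 1 (it is read as a 0-d boolean mask instead), which is what the `IndexError` warns about. The reporter suspects this is already fixed upstream, and the actual fix may differ; the following is only a minimal sketch of the kind of change involved, assuming `coverage` is a 4-d NumPy array indexed in the order shown in the traceback. `add_read_coverage` is a hypothetical helper, not OptiType's code.

```python
import numpy as np


def add_read_coverage(coverage, i_mismatches, i_pairing, i_hitcount, i_pos, i_read_length):
    """Increment per-position coverage for one read (hypothetical helper).

    `coverage` is assumed to be indexed as
    [has_mismatches, pairing, hitcount, position], mirroring hlatyper.py;
    pairing, hitcount and position are 1-based as in the original line.
    """
    # Convert the flag to an explicit 0/1 integer: a Python bool used as an
    # index is interpreted by recent NumPy as a boolean mask, not a position.
    mismatch_idx = int(bool(i_mismatches))
    start = i_pos - 1
    # One tuple index addresses the target slice directly instead of the
    # chained coverage[a][b][c][...] lookups.
    coverage[mismatch_idx, i_pairing - 1, i_hitcount - 1, start:start + i_read_length] += 1
    return coverage
```

For example, with `cov = np.zeros((2, 2, 2, 20), dtype=int)`, calling `add_read_coverage(cov, 3, 1, 2, 5, 4)` marks positions 4 to 7 of `cov[1, 0, 1]`.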

AttributeError: 'IndexedVarWithDomain' object has no attribute 'reset'

Hello,

I received the above error running on example data. Do you have any advice about how I can proceed? Thanks!

Command:
python OptiTypePipeline.py -i test/exome/NA11995_SRR766010_1_fished.fastq --verbose --dna --outdir /home/ubuntu/OptiType-master

Config:
[MAPPING]

# please specify the razerS3 binary path

RAZERS3=/usr/bin/razers3
THREADS=4

[LIBRARIES]
RNA_REF=/home/ubuntu/OptiType-master/data/hla_reference_rna.fasta
DNA_REF=/home/ubuntu/OptiType-master/data/hla_reference_dna.fasta
ALLELES=/home/ubuntu/OptiType-master/data/alleles.h5

[OPTIMIZATION]

# the solver has to be supported by Coopr

SOLVER=cbc
THREADS=1

Log:
0:00:00.33 Mapping NA11995_SRR766010_1_fished.fastq to GEN reference...
0:00:15.47 Generating binary hit matrix.
0:00:15.47 Loading alleles and read IDs from /home/ubuntu/OptiType-master/2015_09_27_22_16_19/2015_09_27_22_16_19_0.sam...
0:00:16.37 11179 alleles and 1909 reads found.
0:00:16.37 Initializing mapping matrix...
0:00:16.38 1909x11179 mapping matrix initialized. Populating 1344422 hits from SAM file...
10% completed
20% completed
30% completed
40% completed
50% completed
60% completed
70% completed
80% completed
90% completed
100% completed
0:03:20.32 1344422 elements filled. Matrix sparsity: 1 in 15.87

0:03:22.42 temporary pruning of identical rows and columns

0:03:22.51 Size of mtx with unique rows and columns: (434, 1021)
0:03:22.51 determining minimal set of non-overshadowed alleles
/home/ubuntu/.local/lib/python2.7/site-packages/pandas/util/decorators.py:13: FutureWarning: diff is deprecated. Use difference instead
FutureWarning)

0:03:23.70 Keeping only the minimal number of required alleles (125,)

0:03:23.70 Creating compact model...

0:03:23.83 Initializing OptiType model...
Traceback (most recent call last):
  File "OptiTypePipeline.py", line 322, in <module>
    result = op.solve(args.enumerate)
  File "/home/ubuntu/OptiType-master/model.py", line 142, in solve
    self.__instance.x.reset()
AttributeError: 'IndexedVarWithDomain' object has no attribute 'reset'
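The error suggests that the installed Pyomo no longer provides `reset()` on indexed variables, while the config above still refers to Coopr, Pyomo's older name. That is an inference from the traceback, not something confirmed in the thread. Pinning the Pyomo version OptiType was developed against is one option; another is to clear the variable values explicitly. A minimal sketch of the latter, assuming Pyomo's usual indexed-`Var` behaviour (iterating yields indices, and each `var[idx]` has a writable `.value`); `clear_indexed_var` is a hypothetical helper and is only assumed to match what the removed `reset()` did.

```python
def clear_indexed_var(var):
    """Clear every value of an indexed variable (hypothetical helper).

    Assumes Pyomo-style semantics: iterating `var` yields its indices and
    each `var[idx]` exposes a writable `.value` attribute.
    """
    for idx in var:
        # None marks the variable as having no value; this is assumed to be
        # equivalent to the reset() call that no longer exists.
        var[idx].value = None
```

In `model.py`, the failing line would then become `clear_indexed_var(self.__instance.x)`.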

ERR031857

One of the samples not correctly predicted by OptiType is ERR031857, where A*02:06 is misclassified as A*02:01.

Even when expanding the results (-e 5), the correct solution is not found:

    A1  A2  B1  B2  C1  C2  Reads   Objective                                                                                        
0   A*02:01 A*11:01 B*07:05 B*54:01 C*07:02 C*01:02 413 394.4049999999999
1   A*02:01 A*11:01 B*07:05 B*55:02 C*07:02 C*01:02 407 388.6749999999999
2   A*02:01 A*11:01 B*54:01 B*07:02 C*07:02 C*01:02 400 381.97999999999985
3   A*02:01 A*11:01 B*55:02 B*07:02 C*07:02 C*01:02 394 376.24999999999983
4   A*02:01 A*11:01 B*07:05 B*55:04 C*07:02 C*01:02 392 374.34000000000003

Curiously, others, e.g. Major et al. (2013), were able to predict the correct HLA types for this sample.

Does anybody have an idea why this happens?
(I would be interested to know whether OptiType2 can handle this case.)

Python 3 support

Would you consider a patch adding Python 3 support to OptiType? It would greatly simplify things at my workplace, since OptiType would then integrate better with the rest of our stack.

"ValueError: cannot reindex from a duplicate axis"

Hello,

I received the following error. I am not familiar with Python, so I am unable to investigate further. I have run OptiType successfully on many similar files and have encountered this error only once. Thank you in advance for your consideration.

Traceback (most recent call last):
  File "/home/ubuntu/OptiType-master/OptiTypePipeline.py", line 342, in <module>
    coverage_mat = ht.calculate_coverage(plot_variables, features, hlatype, features_used)
  File "/home/ubuntu/OptiType-master/hlatyper.py", line 505, in calculate_coverage
    hit_counts[reads]):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/series.py", line 561, in __getitem__
    return self._get_with(key)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/series.py", line 604, in _get_with
    return self.reindex(key)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/series.py", line 2151, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1773, in reindex
    method, fill_value, copy).__finalize__(self)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1790, in _reindex_axes
    fill_value=fill_value, copy=copy, allow_dups=False)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1876, in _reindex_with_indexers
    copy=copy)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 3150, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pandas/core/index.py", line 1860, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

System information:

Linux version 3.19.0-31-generic (buildd@lcy01-07) (gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13) ) #36-Ubuntu SMP Wed Oct 7 15:04:02 UTC 2015

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 1220.812
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4862.53
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39

MemTotal: 165051848 kB
MemFree: 7424336 kB
MemAvailable: 111855276 kB
Buffers: 111620 kB
Cached: 102348516 kB
SwapCached: 9496 kB
Active: 91282240 kB
Inactive: 63390396 kB
Active(anon): 51573908 kB
Inactive(anon): 640368 kB
Active(file): 39708332 kB
Inactive(file): 62750028 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1048572 kB
SwapFree: 766880 kB
Dirty: 120940 kB
Writeback: 0 kB
AnonPages: 52205360 kB
Mapped: 84072 kB
Shmem: 428 kB
Slab: 2388540 kB
SReclaimable: 2310464 kB
SUnreclaim: 78076 kB
KernelStack: 11584 kB
PageTables: 111768 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 83574496 kB
Committed_AS: 77935112 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 439564 kB
VmallocChunk: 34274108676 kB
HardwareCorrupted: 0 kB
AnonHugePages: 17944576 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 133116 kB
DirectMap2M: 167770112 kB

siblings : 20
core id : 8
cpu cores : 10
apicid : 49
initial apicid : 49
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs :
bogomips : 4862.53
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
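In the traceback, `hit_counts[reads]` falls through to `Series.reindex`, and that pandas version refuses to reindex when the Series' own index contains repeated labels. So `hit_counts` evidently had duplicate index entries for this one file; why (for example, the same read ID appearing twice) is not established in the report. A minimal sketch for detecting and dropping such duplicates, assuming `hit_counts` is a pandas Series indexed by read ID; `dedupe_index` is a hypothetical helper, and whether keeping the first occurrence is right for OptiType's coverage plot is an open question.

```python
import pandas as pd


def dedupe_index(series, keep="first"):
    """Return `series` with repeated index labels dropped (hypothetical helper).

    Reindexing fails on an index containing duplicates, so keep only one
    entry per label; `keep` is passed to Index.duplicated.
    """
    return series[~series.index.duplicated(keep=keep)]
```

Checking `hit_counts.index.is_unique` before the lookup would show whether this is the cause for the failing sample.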

Create a version tag

We'd like to experiment with OptiType, and I'd like to pin a particular version of the code. Instead of relying on a commit, would it be possible to have a tagged version?

OptiType throwing error: Invalid constraint expression

I'm getting the following error message. Any idea what is causing it?

mapping with 8 threads...

 0:01:18.24 Mapping filtered_fished.fastq to GEN reference...

 0:01:45.95 Generating binary hit matrix.
0:01:45.97 Loading optitype_outdir/2017_04_25_02_50_30/2017_04_25_02_50_30_1.bam started. Number of HLA reads loaded (updated every thousand):

 0:01:46.04 672 reads loaded. Creating dataframe...
0:01:46.12 Dataframes created. Shape: 672 x 11179, hits: 6952 (11622), sparsity: 1 in 646.39

 0:01:51.76 temporary pruning of identical rows and columns

 0:01:52.06 Size of mtx with unique rows and columns: (43, 54)
0:01:52.06 determining minimal set of non-overshadowed alleles

 0:01:52.19 Keeping only the minimal number of required alleles (4,)

 0:01:52.19 Creating compact model...

starting ilp solver with 1 threads...

 0:01:52.20 Initializing OptiType model...
Welcome to the CBC MILP Solver 
Version: 2.9 
Build Date: Mar  7 2017 

command line - /n/sw/fasrcsw/apps/Core/Cbc/2.9-fasrc01/bin/cbc -printingOptions all -import /tmp/tmpm4vfz6ss.pyomo.lp -import -stat=1 -solve -solu /tmp/tmpm4vfz6ss.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
 CoinLpIO::readLp(): Maximization problem reformulated as minimization
Current default (if $ as parameter) for import is /tmp/tmpm4vfz6ss.pyomo.lp
Presolve 16 (-8) rows, 11 (-3) columns and 37 (-13) elements
Statistics for presolved model
Original problem has 8 integers (8 of which binary)
Presolved problem has 6 integers (6 of which binary)
==== 3 zero objective 7 different
1 variables have objective of -116
1 variables have objective of -101
2 variables have objective of -5
3 variables have objective of 0
2 variables have objective of 0.045
1 variables have objective of 0.909
1 variables have objective of 1.044
==== absolute objective values 7 different
3 variables have objective of 0
2 variables have objective of 0.045
1 variables have objective of 0.909
1 variables have objective of 1.044
2 variables have objective of 5
1 variables have objective of 101
1 variables have objective of 116
==== for integers 2 zero objective 4 different
1 variables have objective of -116
1 variables have objective of -101
2 variables have objective of -5
2 variables have objective of 0
==== for integers absolute objective values 4 different
2 variables have objective of 0
2 variables have objective of 5
1 variables have objective of 101
1 variables have objective of 116
===== end objective counts


Problem has 16 rows, 11 columns (8 with objective) and 37 elements
Column breakdown:
4 of type 0.0->inf, 1 of type 0.0->up, 0 of type lo->inf, 
0 of type lo->up, 0 of type free, 0 of type fixed, 
0 of type -inf->0.0, 0 of type -inf->up, 6 of type 0.0->1.0 
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0, 
0 of type E other, 0 of type G 0.0, 1 of type G 1.0, 
0 of type G other, 10 of type L 0.0, 1 of type L 1.0, 
4 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other, 
0 of type Free 
Continuous objective value is -224.957 - 0.00 seconds
Cgl0004I processed model has 16 rows, 11 columns (6 integer (6 of which binary)) and 37 elements
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of -224.957
Cbc0038I Relaxing continuous gives -224.957
Cbc0038I Before mini branch and bound, 6 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of -224.957 - took 0.00 seconds
Cbc0012I Integer solution of -224.957 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective -224.957, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -224.957 to -224.957
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                -224.95700000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.01

Total time (CPU seconds):       0.00   (Wallclock seconds):       0.01

Traceback (most recent call last):
  File "/n/regal/nowak_lab/immunotherapy/OptiType/OptiTypePipeline.py", line 405, in <module>
    result = op.solve(args.enumerate)
  File "/n/regal/nowak_lab/immunotherapy/OptiType/model.py", line 188, in solve
    self.__instance.c.add(expr >= 1)
  File "/n/scrb152/Software/Python/py35/lib/python3.5/site-packages/pyomo/core/base/constraint.py", line 1188, in add
    cdata = self._check_skip_add(self._nconstraints + 1, expr)
  File "/n/scrb152/Software/Python/py35/lib/python3.5/site-packages/pyomo/core/base/constraint.py", line 895, in _check_skip_add
    self._data[index].name))
ValueError: Invalid constraint expression. The constraint expression resolved to a trivial Boolean (False) instead of a Pyomo object. Please modify your rule to return Constraint.Infeasible instead of False.
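Pyomo reports that the expression passed to `c.add(expr >= 1)` collapsed to the plain Python value `False`. One way this can happen (an inference from the log, which shows only 4 alleles kept) is that the sum on the left has no variable terms left: an empty `sum` is the integer `0`, and `0 >= 1` is simply `False`. A minimal sketch of a guard, where `add_exclusion_cut` and its arguments are hypothetical and not OptiType's code:

```python
def add_exclusion_cut(constraints, terms):
    """Add the cut `sum(terms) >= 1` unless it contains no variables.

    `constraints` is assumed to behave like a Pyomo ConstraintList (it has
    an `add` method); `terms` are the variable expressions to sum.
    Returns True if a cut was added, False if enumeration should stop.
    """
    relation = sum(terms) >= 1  # with no terms this is `0 >= 1`, i.e. False
    if isinstance(relation, bool):
        # A plain bool means no variable was involved, so the cut cannot
        # exclude anything; stop rather than hand Pyomo a raw boolean.
        return False
    constraints.add(relation)
    return True
```

With real Pyomo variables the comparison yields a relational expression, not a bool, so the cut is added as before.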

invalid literal for int() with base 10: 'c'

This is my first time running OptiType, so forgive me if I missed something obvious. I am getting an odd parsing error. It also occurs when running the test dataset.

Command:
python OptiTypePipeline.py -i ./sample_opti.1.fq ./sample_opti.2.fq --dna -c ./config.ini -v -o ./optitype_work/

Output:

...
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 24641

Solving LP relaxation...
GLPK Simplex Optimizer, v4.59
24641 rows, 13057 columns, 387578 non-zeros
      0: obj =  -0.000000000e+00 inf =   6.000e+00 (6)
      6: obj =  -5.000000000e-02 inf =   0.000e+00 (0)
*   500: obj =   3.181086340e+04 inf =   2.501e-14 (5968)
*  1000: obj =   3.630040867e+04 inf =   2.998e-15 (5832) 1
*  1500: obj =   3.858366231e+04 inf =   1.865e-14 (5697) 1
*  2000: obj =   4.039086154e+04 inf =   8.253e-15 (5604)
*  2500: obj =   4.164897250e+04 inf =   8.357e-15 (5524) 1
*  3000: obj =   4.272261500e+04 inf =   5.965e-15 (5427) 1
*  3500: obj =   4.343465417e+04 inf =   8.882e-15 (5356) 2
*  4000: obj =   4.420333917e+04 inf =   6.217e-15 (5259) 1
*  4500: obj =   4.513628286e+04 inf =   1.753e-14 (5178) 4
*  5000: obj =   4.557089286e+04 inf =   0.000e+00 (5148) 1
*  5500: obj =   4.586324125e+04 inf =   0.000e+00 (5136) 1
*  6000: obj =   4.677771714e+04 inf =   4.199e-14 (5013) 1
*  6500: obj =   4.804854833e+04 inf =   1.110e-16 (4827) 3
*  7000: obj =   4.926832833e+04 inf =   0.000e+00 (4595) 2
*  7500: obj =   4.933832833e+04 inf =   0.000e+00 (4564)
*  8000: obj =   4.940799500e+04 inf =   3.331e-15 (4515)
*  8500: obj =   4.947966167e+04 inf =   0.000e+00 (4492) 1
*  9000: obj =   4.955066167e+04 inf =   0.000e+00 (4462)
*  9500: obj =   5.095554333e+04 inf =   0.000e+00 (4141) 2
* 10000: obj =   5.227048333e+04 inf =   0.000e+00 (3768) 1
* 10500: obj =   5.347663333e+04 inf =   0.000e+00 (3333) 1
* 11000: obj =   5.448885333e+04 inf =   0.000e+00 (2891)
* 11500: obj =   5.521412000e+04 inf =   0.000e+00 (2429) 1
* 12000: obj =   5.594569333e+04 inf =   0.000e+00 (1965)
* 12500: obj =   5.638794833e+04 inf =   0.000e+00 (1485)
* 13000: obj =   5.676556000e+04 inf =   0.000e+00 (1004)
* 13500: obj =   5.715105500e+04 inf =   0.000e+00 (516)
* 14000: obj =   5.752945500e+04 inf =   0.000e+00 (34) 1
* 14037: obj =   5.755625833e+04 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+ 14037: mip =     not found yet <=              +inf        (1; 0)
+ 14038: >>>>>   5.755594000e+04 <=   5.755594000e+04   0.0% (2; 0)
+ 14038: mip =   5.755594000e+04 <=     tree is empty   0.0% (0; 3)
INTEGER OPTIMAL SOLUTION FOUND
Time used:   12.8 secs
Memory used: 75.9 Mb (79626336 bytes)
Writing MIP solution to '/var/folders/76/zt8rzbc5077bnxl8pvhpw6d4wvz933/T/tmpH4rygQ.glpk.raw'...
37709 lines were written
invalid literal for int() with base 10: 'c'
Traceback (most recent call last):
  File "~/GitHub/OptiType/OptiTypePipeline.py", line 373, in <module>
    result = op.solve(args.enumerate)
  File "~/GitHub/OptiType/model.py", line 149, in solve
    res = self.__solver.solve(self.__instance, options={}, tee=self.__verbosity)
  File "~/anaconda2/lib/python2.7/site-packages/pyomo/opt/base/solvers.py", line 578, in solve
    result = self._postsolve()
  File "~/anaconda2/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 161, in _postsolve
    results = self.process_output(self._rc)
  File "~/anaconda2/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 220, in process_output
    self.process_soln_file(results)
  File "~/anaconda2/lib/python2.7/site-packages/pyomo/solvers/plugins/solvers/GLPK.py", line 445, in process_soln_file
    raise ValueError(msg)
ValueError: Error parsing solution data file, line 1

When I open up the MIP solution tmp file it has the format of:

c Problem:    
c Rows:       24642
c Columns:    13058
c Non-zeros:  387579
c Status:     INTEGER OPTIMAL
c Objective:  x13058 = 57555.94 (MAXimum)
c
s mip 24642 13058 o 57555.9400000032
i 1 2
i 2 2
i 3 2
i 4 2
...

I guess the 'c' lines in the file are being parsed as integers, but I am not sure why.
Any ideas?
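The raw file starts with `c` comment lines, followed by an `s` status line and `i` value lines, and the traceback shows Pyomo's GLPK plugin calling `int()` on that leading `c`. This is consistent with the GLPK version in use (4.59) writing a solution format that the installed Pyomo version does not parse, in which case upgrading Pyomo or using a GLPK version it supports would be the fix; the thread does not confirm this. To illustrate the format itself, a minimal sketch that reads only the record types visible in the excerpt; `parse_glpk_mip` is hypothetical and ignores any other record type.

```python
def parse_glpk_mip(lines):
    """Parse the record types shown in the excerpt (hypothetical helper).

    `c` lines are comments; the `s` line holds the problem kind, row and
    column counts, a status letter and the objective; `i` lines hold an
    ordinal and a value. Other record types are ignored here.
    """
    header, values = None, {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "c":
            continue  # comment line: int('c') is what failed in the traceback
        if fields[0] == "s":
            header = {
                "kind": fields[1],
                "rows": int(fields[2]),
                "cols": int(fields[3]),
                "status": fields[4],
                "objective": float(fields[5]),
            }
        elif fields[0] == "i":
            values[int(fields[1])] = float(fields[2])
    return header, values
```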

Creation of tmp files not working

Hi,

When running OptiType, the tmp files that should be created as input for the solver (I'm using Cbc) are not written, and I'm trying to figure out why. Before I go hunting through the code, does anyone know what creates these tmp files, and can anyone venture a guess as to why they are not being written? I'll post code if people think it's needed.

Thanks,

-todd
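The CBC logs in other reports on this page show the solver input written to files such as `/tmp/tmpm4vfz6ss.pyomo.lp`, i.e. in the system temporary directory. Assuming Pyomo resolves that directory the same way Python's `tempfile` module does (the `TMPDIR`, `TEMP` or `TMP` environment variables, then platform defaults), a quick check that the location exists and is writable might look like this; `check_temp_dir` is a hypothetical helper.

```python
import os
import tempfile


def check_temp_dir():
    """Report where Python puts temporary files and whether writing there works."""
    tmpdir = tempfile.gettempdir()  # honours TMPDIR/TEMP/TMP, then platform defaults
    try:
        # Create and remove a file with the same suffix seen in the CBC logs.
        fd, path = tempfile.mkstemp(suffix=".pyomo.lp", dir=tmpdir)
        os.close(fd)
        os.remove(path)
        return tmpdir, True
    except OSError:
        return tmpdir, False
```

Running it in the same environment (same user, same environment variables) as OptiType is what makes the result meaningful.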

Starting with bam files

Hi,

Is there any way for OptiType to take BAM files directly, without having to run bam2fq?

Thanks,

-todd
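The thread does not say whether OptiType accepts arbitrary BAM input, so a common route is to convert the BAM back to paired FASTQ first. A minimal sketch, assuming `samtools` is on `PATH` (samtools is swapped in here for the `bam2fq` mentioned above); `bam_to_fastq_cmds` and `run_commands` are hypothetical helpers.

```python
import subprocess


def bam_to_fastq_cmds(bam, out_prefix):
    """Build samtools commands that turn a paired-end BAM into two FASTQ files.

    The BAM is first sorted by read name so that mates are adjacent, which
    `samtools fastq` needs in order to write them to separate files.
    """
    name_sorted = f"{out_prefix}.namesorted.bam"
    return [
        ["samtools", "sort", "-n", "-o", name_sorted, bam],
        ["samtools", "fastq",
         "-1", f"{out_prefix}_1.fastq", "-2", f"{out_prefix}_2.fastq",
         "-0", "/dev/null", "-s", "/dev/null", "-n", name_sorted],
    ]


def run_commands(cmds):
    """Run each command in turn, stopping on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

After `run_commands(bam_to_fastq_cmds("sample.bam", "sample"))`, the two files `sample_1.fastq` and `sample_2.fastq` can be passed to `OptiTypePipeline.py -i`.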

Docker image is not reproducible, and making biodocker the user is restrictive

The Docker image can't be rebuilt locally because the source image biodckr/biodocker no longer exists. Also, making biodocker the USER means that a biodocker user must exist on whatever machine the container runs on, or the data and folders must be readable and writable by everyone.

Question about hla_reference_dna.fasta

Hello,

I was looking through the given fasta files under data directory and noticed that some of the sequences provided in there are combinations of coding sequence and non-coding sequence.

What puzzles me is that in some of them the exon and intron sources have different first two digits (e.g. HLA07296_HLA00097 HLA-A_33:53 (introns from HLA-A_31:01:02)). It also seems that the set of exon/intron combinations sharing the first two digits is not exhaustive.

Is there a logic to how these alleles are combined? (And if I were to update this fasta to more recent alleles from the HLA db, what logic should I use to combine the alleles?)

Thanks.
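To see how often the exon and intron sources differ in their first field, one could scan the headers. A minimal sketch, assuming every combined entry's header has the exact shape of the example quoted above; the regular expression is written for that example only, and `parse_chimeric_header` is a hypothetical helper.

```python
import re

# Matches headers shaped like the quoted example; any other header shape
# in the file returns None.
HEADER_RE = re.compile(
    r"^(?P<ids>\S+)\s+HLA-(?P<gene>\w+?)_(?P<exon>[\d:]+)"
    r"\s+\(introns from HLA-(?P<intron_gene>\w+?)_(?P<intron>[\d:]+)\)"
)


def parse_chimeric_header(header):
    """Split a combined header into its IDs, exon allele and intron allele."""
    m = HEADER_RE.match(header.lstrip(">"))
    if m is None:
        return None
    exon_group = m["exon"].split(":")[0]      # first field, e.g. "33"
    intron_group = m["intron"].split(":")[0]  # first field, e.g. "31"
    return {
        "ids": m["ids"].split("_"),
        "exon_allele": f'{m["gene"]}*{m["exon"]}',
        "intron_allele": f'{m["intron_gene"]}*{m["intron"]}',
        "same_first_field": exon_group == intron_group,
    }
```

For the quoted header this gives `A*33:53` with introns from `A*31:01:02` and `same_first_field` set to `False`.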

weird HLA results

I'm using optitype/1.3.1, specifying solver=cbc in config.ini and using hla_reference_dna.fasta. I prefilter my reads using razers3. This is the output I'm getting with my own data.

        A1      A2      B1      B2      C1      C2      Reads   Objective
0       HLA00001        HLA00037        HLA00344        HLA00180        HLA00433        HLA00401        4305    4072.53

EDIT:
This is the output I get using the test data shipped with the code.

        A1      A2      B1      B2      C1      C2      Reads   Objective
0       HLA00001        HLA00001        HLA00146        HLA00381        HLA00433        HLA00430        1156    1135.192

The stdout
filtering for hla region reads for R1
convert filtered bam1 to fastq1
filtering for hla region reads for R2
convert filtered bam1 to fastq1
run hla typing

mapping with 16 threads...

0:00:00.51 Mapping EVL35_1.fastq to GEN reference...

0:00:25.57 Mapping EVL35_2.fastq to GEN reference...

0:00:58.46 Generating binary hit matrix.
Warning: PySam not available on the system. Falling back to primitive SAM parsing.
0:00:58.46 Loading alleles and read IDs from 01-filter-hla-read/output/EVL35/2018_01_18_18_39_13/2018_01_18_18_39_13_1.sam...
0:01:03.47 11179 alleles and 6968 reads found.
0:01:03.47 Initializing mapping matrix...
0:01:03.47 6968x11179 mapping matrix initialized. Populating 1549982 hits from SAM file...
10% completed
20% completed
0:04:55.05 1549982 elements filled. Matrix sparsity: 1 in 50.26
Warning: PySam not available on the system. Falling back to primitive SAM parsing.
0:04:55.42 Loading alleles and read IDs from 01-filter-hla-read/output/EVL35/2018_01_18_18_39_13/2018_01_18_18_39_13_2.sam...
0:05:00.29 11179 alleles and 6989 reads found.
0:05:00.29 Initializing mapping matrix...
0:05:00.30 6989x11179 mapping matrix initialized. Populating 1519093 hits from SAM file...
10% completed
0:08:47.45 1519093 elements filled. Matrix sparsity: 1 in 51.43
0:08:48.94 Alignment pairing completed. 6164 paired, 1561 unpaired, 34 discordant

0:08:52.71 temporary pruning of identical rows and columns

0:08:52.96 Size of mtx with unique rows and columns: (983, 890)
0:08:52.96 determining minimal set of non-overshadowed alleles

0:08:55.61 Keeping only the minimal number of required alleles (77,)

0:08:55.61 Creating compact model...

starting ilp solver with 1 threads...

0:08:55.93 Initializing OptiType model...
Welcome to the CBC MILP Solver
Version: 2.8
Build Date: Aug 5 2015
Revision Number: 2210

command line - /risapps/rhel6/cbc/2.8/bin/cbc -printingOptions all -import /tmp/tmp2d8uhj.pyomo.lp -import -stat=1 -solve -solu /tmp/tmp2d8uhj.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
Coin0009I CoinLpIO::readLp(): Maximization problem reformulated as minimization
Current default (if $ as parameter) for import is /tmp/tmp2d8uhj.pyomo.lp
Presolve 845 (-1) rows, 494 (-1) columns and 3059 (-1) elements
Statistics for presolved model

Problem has 845 rows, 494 columns (458 with objective) and 3059 elements
Column breakdown:
208 of type 0.0->inf, 1 of type 0.0->up, 0 of type lo->inf,
0 of type lo->up, 0 of type free, 0 of type fixed,
0 of type -inf->0.0, 0 of type -inf->up, 285 of type 0.0->1.0
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0,
0 of type E other, 0 of type G 0.0, 6 of type G 1.0,
0 of type G other, 624 of type L 0.0, 0 of type L 1.0,
215 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other,
0 of type Free
Continuous objective value is -4072.53 - 0.01 seconds
Cgl0004I processed model has 839 rows, 494 columns (285 integer) and 2982 elements
Cbc0038I Solution found of -4072.53
Cbc0038I Before mini branch and bound, 285 integers at bound fixed and 25 continuous
Cbc0038I Mini branch and bound did not improve solution (0.02 seconds)
Cbc0038I After 0.02 seconds - Feasibility pump exiting with objective of -4072.53 - took 0.00 seconds
Cbc0012I Integer solution of -4072.53 found by feasibility pump after 0 iterations and 0 nodes (0.02 seconds)
Cbc0001I Search completed - best objective -4072.530000000001, took 0 iterations and 0 nodes (0.02 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -4072.53 to -4072.53
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value: -4072.53000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.03
Time (Wallclock seconds): 0.03

Total time (CPU seconds): 0.03 (Wallclock seconds): 0.04

0:08:56.29 Result dataframe has been constructed...
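Both result tables in this report list IMGT/HLA accession IDs (such as `HLA00001`) instead of allele names. If the reference FASTA headers pair each accession with its allele name, as IMGT/HLA FASTA files typically do (for example `>HLA:HLA00001 A*01:01:01:01`, followed by other fields), the IDs can be translated with a lookup table. This is a sketch under that assumption; check the actual headers of `hla_reference_dna.fasta` first. `accession_to_name` and `translate_row` are hypothetical helpers.

```python
def accession_to_name(fasta_lines):
    """Map IMGT/HLA accessions to allele names using FASTA header lines."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # header without an allele name
        # Accept both "HLA:HLA00001" and a bare "HLA00001".
        accession = fields[0].split(":")[-1]
        mapping[accession] = fields[1]
    return mapping


def translate_row(row, mapping):
    """Replace accessions with allele names, leaving unknown values untouched."""
    return [mapping.get(value, value) for value in row]
```

Seeing accessions in the output at all may point to a mismatch between the reference FASTA and the allele table the pipeline uses, but the thread does not settle that.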

KeyError

Hi,

I've been running OptiType mostly successfully; however, a KeyError is sometimes raised, which stops the program. It only affects some samples, and I haven't found a common link between them. I've included the error message and was wondering whether you had encountered this before.

Traceback (most recent call last):
  File "[base]/OptiType/OptiTypePipeline.py", line 315, in <module>
    r = result_4digit[["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]]
  File "[base]/python/lib/python2.7/site-packages/pandas/core/frame.py", line 1672, in __getitem__
    return self._getitem_array(key)
  File "[base]/python/lib/python2.7/site-packages/pandas/core/frame.py", line 1716, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "[base]/python/lib/python2.7/site-packages/pandas/core/indexing.py", line 1085, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['A1' 'A2'] not in index"

Thanks
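The failing line selects a fixed list of columns from `result_4digit`, and pandas raises `KeyError` when any of them is absent, here `A1` and `A2`. A plausible reading, not confirmed in the thread, is that no HLA-A allele ended up in that sample's solution. A minimal sketch of a tolerant selection using `DataFrame.reindex`, which inserts missing columns as NaN instead of raising; `select_result_columns` is a hypothetical helper.

```python
import pandas as pd

RESULT_COLUMNS = ["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]


def select_result_columns(result, columns=RESULT_COLUMNS):
    """Return `result` restricted to `columns`, adding any missing ones as NaN."""
    missing = [c for c in columns if c not in result.columns]
    if missing:
        # Report the gap instead of failing, so the affected loci are visible.
        print("warning: no values for columns: " + ", ".join(missing))
    return result.reindex(columns=columns)
```

This only avoids the crash; it does not explain why a locus received no call.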

Allele Database is Old

The files in the data folder are more than 3 years old. Packaging an up-to-date version of IMGT/HLA would be appreciated. Mis-annotated alleles have been fixed and new alleles added since 2014.

Role of aligner in determining HLA type

Hi,

I have used both razers3 and the yara mapper as aligners for OptiType, and I am wondering whether it would work with other aligners. Is there a specific setting or sweet spot for mismatches or clipping?

It seems that OptiType is sensitive to spurious alignments, i.e. badly aligned reads, which can lead to a false HLA type once OptiType counts the reads. On the other hand, it could benefit from aligners that include more reads and therefore more information to discern the HLA types.

Would there be any issues if BWA-MEM were used to align the reads instead, and would the scripts have a problem with that, since they are designed to work with razers3 and yara?

I have seen some differences in the HLA types predicted with different aligners, and I think such a comparison would be a good way to show the robustness of OptiType.
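One way to experiment with another aligner while keeping badly aligned reads out is to filter its SAM output by edit distance before OptiType sees it. A minimal sketch, assuming the aligner writes the standard `NM:i:` tag; the threshold is a free parameter to tune, not a value recommended by OptiType, and whether OptiType's loader accepts BWA-MEM output at all is not answered here. `filter_sam_by_mismatches` is a hypothetical helper.

```python
def filter_sam_by_mismatches(sam_lines, max_nm):
    """Yield header lines and alignments whose NM (edit distance) is at most `max_nm`.

    Unmapped records and records without an NM tag are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & 0x4:
            continue  # malformed record, or unmapped (flag bit 0x4)
        nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
        if nm is not None and nm <= max_nm:
            yield line
```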

Optitype not predicting HLA-C alleles

I ran optitype, and everything looked fine, but the output file did not give any predictions for the HLA-C alleles. Here is the output log:

mapping with 8 threads...

 0:01:14.27 Mapping filtered_fished.fastq to GEN reference...

 0:03:26.66 Generating binary hit matrix.
0:03:26.70 Loading optitype_outdir/2017_04_25_02_02_55/2017_04_25_02_02_55_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...
 0:03:26.97 1255 reads loaded. Creating dataframe...
0:03:27.18 Dataframes created. Shape: 1255 x 11179, hits: 22073 (22250), sparsity: 1 in 630.55

 0:03:33.52 temporary pruning of identical rows and columns

 0:03:33.63 Size of mtx with unique rows and columns: (50, 59)
0:03:33.63 determining minimal set of non-overshadowed alleles

 0:03:33.78 Keeping only the minimal number of required alleles (12,)

 0:03:33.78 Creating compact model...

starting ilp solver with 1 threads...

 0:03:33.83 Initializing OptiType model...
Welcome to the CBC MILP Solver 
Version: 2.9 
Build Date: Mar  7 2017 

command line - /n/sw/fasrcsw/apps/Core/Cbc/2.9-fasrc01/bin/cbc -printingOptions all -import /tmp/tmpext6iepo.pyomo.lp -import -stat=1 -solve -solu /tmp/tmpext6iepo.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
 CoinLpIO::readLp(): Maximization problem reformulated as minimization
Current default (if $ as parameter) for import is /tmp/tmpext6iepo.pyomo.lp
Presolve 103 (-9) rows, 61 (-3) columns and 285 (-15) elements
Statistics for presolved model
Original problem has 37 integers (37 of which binary)
Presolved problem has 35 integers (35 of which binary)
==== 8 zero objective 32 different
==== absolute objective values 32 different
==== for integers 7 zero objective 17 different
==== for integers absolute objective values 17 different
===== end objective counts


Problem has 103 rows, 61 columns (53 with objective) and 285 elements
Column breakdown:
25 of type 0.0->inf, 1 of type 0.0->up, 0 of type lo->inf, 
0 of type lo->up, 0 of type free, 0 of type fixed, 
0 of type -inf->0.0, 0 of type -inf->up, 35 of type 0.0->1.0 
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0, 
0 of type E other, 0 of type G 0.0, 3 of type G 1.0, 
0 of type G other, 73 of type L 0.0, 0 of type L 1.0, 
27 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other, 
0 of type Free 
Continuous objective value is -429.083 - 0.00 seconds
Cgl0004I processed model has 102 rows, 61 columns (35 integer (35 of which binary)) and 279 elements
Cbc0038I Initial state - 0 integers unsatisfied sum - 1.44329e-15
Cbc0038I Solution found of -429.083
Cbc0038I Relaxing continuous gives -429.083
Cbc0038I Before mini branch and bound, 35 integers at bound fixed and 5 continuous
Cbc0038I Mini branch and bound did not improve solution (0.01 seconds)
Cbc0038I After 0.01 seconds - Feasibility pump exiting with objective of -429.083 - took 0.00 seconds
Cbc0012I Integer solution of -429.083 found by feasibility pump after 0 iterations and 0 nodes (0.01 seconds)
Cbc0001I Search completed - best objective -429.083, took 0 iterations and 0 nodes (0.01 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -429.083 to -429.083
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                -429.08300000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.01
Time (Wallclock seconds):       0.02

Total time (CPU seconds):       0.02   (Wallclock seconds):       0.03

The output file looks like this:

	A1	A2	B1	B2	C1	C2	Reads	Objective
0	A*11:01	A*11:01	B*07:02	B*07:02			433	429.083

Any idea why this is?

KeyError - '%s not in index' % objarr[mask] (pandas)

Hello,
My colleague is getting this error:

  File "/apps/RH7U2/gnu/OptiType/1.3.1/OptiTypePipeline.py", line 415, in <module>
    r = result_4digit[["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]]
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/apps/RH7U2/gnu/python/2.7.13/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['nof_reads' 'obj'] not in index"

pandas 0.20.3

Please advise.
Thank you.
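pandas raises this `KeyError` when a list-based column selection names columns that the DataFrame does not have. Here `result_4digit` contained neither `nof_reads` nor `obj`, which suggests the solver step did not return the expected result table (an inference; the thread does not show why). While debugging, a helper like the hypothetical `select_result_columns` below (not part of OptiType) makes the failure self-describing by listing the columns that are actually present:

```python
import pandas as pd

# Column names copied from the failing line 415 of OptiTypePipeline.py.
WANTED = ["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]


def select_result_columns(df, wanted=WANTED):
    """Return df[wanted], raising a KeyError that lists every missing column."""
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        raise KeyError("result table lacks columns %s; available columns: %s"
                       % (missing, list(df.columns)))
    return df[wanted]


if __name__ == "__main__":
    # Toy table with allele columns only, mimicking the failure above.
    toy = pd.DataFrame({col: ["?"] for col in WANTED[:6]})
    try:
        select_result_columns(toy)
    except KeyError as exc:
        print(exc)
```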

[E::hts_open_format] Failed to open file out/2017_11_18_07_05_37/2017_11_18_07_05_37_1.bam

prateek@cpu:~/dhwani$ sudo docker run -v test:/test -t fred2/optitype -i NA11995_SRR766010_1_fished.fastq NA11995_SRR766010_1_fished.fastq -d -o out
sudo: unable to resolve host cpu: Connection timed out
[E::hts_open_format] Failed to open file out/2017_11_18_07_05_37/2017_11_18_07_05_37_1.bam
Traceback (most recent call last):
  File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 299, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/usr/local/bin/OptiType/hlatyper.py", line 186, in pysam_to_hdf
    sam = pysam.AlignmentFile(samfile, sam_or_bam)
  File "pysam/libcalignmentfile.pyx", line 444, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 621, in pysam.libcalignmentfile.AlignmentFile._open
IOError: [Errno 2] could not open alignment file out/2017_11_18_07_05_37/2017_11_18_07_05_37_1.bam: No such file or directory

optitype halts

Hi, I was wondering whether you have a fix for the following issue. I get this error:

0:00:14.95 Mapping 4.R1.fished.fastq to NUC reference...

0:00:22.52 Mapping 4.R2.fished.fastq to NUC reference...

0:00:31.25 Generating binary hit matrix.
0:00:31.26 Loading OptiType_RNA/2017_03_16_11_54_33/2017_03_16_11_54_33_1.bam started. Number of HLA reads loaded (updated every thousand):

0:00:31.26 0 reads loaded. Creating dataframe...
Traceback (most recent call last):
  File "optitype-1.0/OptiType/OptiTypePipeline.py", line 267, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "optitype-1.0/OptiType/hlatyper.py", line 230, in pysam_to_hdf
    pos_df = pd.DataFrame.from_items(hits.iteritems()).T
  File "python-2.7.10/lib/python2.7/site-packages/pandas/core/frame.py", line 1046, in from_items
    keys, values = lzip(*items)
ValueError: need more than 0 values to unpack

This happens with the following command for two of five datasets:

python2.7 OptiTypePipeline.py \
--config optitype_config.txt \
-i 4.R1.fished.fastq \
4.R2.fished.fastq \
--rna \
-v \
-o ~/OptiType_RNA
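The log shows `0 reads loaded`, so `hits` was empty; `DataFrame.from_items` then runs `keys, values = lzip(*items)` with nothing to unzip, and unpacking an empty result into two names raises this `ValueError`. In other words, the crash is a symptom of no reads mapping in the earlier step rather than a pandas problem. A minimal sketch of a guard that reports the real cause (the function name and message are hypothetical, not part of OptiType):

```python
def split_items(items):
    """Unzip (key, value) pairs; fail with an explanatory message if there are none."""
    items = list(items)
    if not items:
        # zip(*[]) yields nothing, so `keys, values = zip(*items)` would raise
        # "need more than 0 values to unpack" (Python 2) or
        # "not enough values to unpack" (Python 3).
        raise ValueError("no HLA reads were loaded from the BAM file; "
                         "check the mapping step and the input FASTQ files")
    keys, values = zip(*items)
    return list(keys), list(values)
```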

Error message for too few reads

Hi, I have been getting

Traceback (most recent call last):
  File "/data/rozencompute2/a0073895/optitest/OptiType/OptiTypePipeline.py", line 304, in <module>
    "in your config file (currently %.3f), because you may need to resort to using unpaired reads.") % unpaired_weight
TypeError: not enough arguments for format string

even after I changed the unpaired weights in the config.ini file to 1. I am wondering whether this error message appears when there are few reads in the input FASTQ, or whether something else is happening.
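Python raises `TypeError: not enough arguments for format string` when a `%`-format string contains more placeholders (or an unescaped literal `%`) than the values supplied. Only the last line of the statement is visible in the traceback, so the assumption here is that the full message has a second placeholder while `% unpaired_weight` supplies just one value. If so, the error would be raised whenever this warning branch is reached, regardless of the configured weight. A sketch with a hypothetical message text:

```python
# Hypothetical two-placeholder message; the real OptiType wording is not visible in the traceback.
TEMPLATE = ("Only %d unpaired reads are available; consider raising unpaired_weight "
            "in your config file (currently %.3f).")


def reproduce_error(unpaired_weight):
    """Apply a single value to a two-placeholder template and return the error text."""
    try:
        TEMPLATE % unpaired_weight  # fills %d, then runs out of values for %.3f
    except TypeError as exc:
        return str(exc)
    return None


def build_message(n_unpaired, unpaired_weight):
    """Correct form: pass a tuple with one value per placeholder."""
    return TEMPLATE % (n_unpaired, unpaired_weight)
```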

Optitype mapping error?

Hi,

I'm running OptiType in RNA mode on fastq.gz files. I get a long, cryptic error that I'm not able to figure out. Can someone help me interpret it? See the command and error below:

Thanks,

-todd

OptiType-master/OptiTypePipeline.py --rna --verbose --config /Biomarker/ngs/software/OptiType/OptiType-master/config.ini -i /ts19/ngs/studies/ngs_000230/fastq/TB2-EM11595_R1.fastq.gz /ts19/ngs/studies/ngs_000230/fastq/TB2-EM11595_R2.fastq.gz -o /ts19/ngs/studies/ngs_000230/neoantigen_analysis/optitype

0:00:03.82 Mapping TB2-EM11595_R1.fastq.gz to NUC reference...
/home/mi/esiragusa/seqan/include/seqan/basic/basic_exception.h:368 FAILED! (Uncaught exception of type seqan::UnexpectedEnd: Unexpected end of input.)

stack trace:
0 [0xa9d3f7] /Biomarker/ngs/software/razers3/razers3-3.4.0-Linux-x86_64/bin/razers3()
1 [0xab37a6] __cxxabiv1::terminate(void (*)()) + 0x6
2 [0xab37d3] /Biomarker/ngs/software/razers3/razers3-3.4.0-Linux-x86_64/bin/razers3()
3 [0xab491e] /Biomarker/ngs/software/razers3/razers3-3.4.0-Linux-x86_64/bin/razers3()
4 [0x76b253] /Biomarker/ngs/software/razers3/razers3-3.4.0-Linux-x86_64/bin/razers3()
5 [0x7ce297] void seqan::readRecord<seqan::String<char, seqan::Alloc >, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5Q>, seqan::Alloc >, seqan::String<char, seqan::Alloc >, seqan::Iter<seqan::VirtualStream<char, seqan::Tagseqan::Input_, std::char_traits >, seqan::StreamIterator<seqan::Tagseqan::Input_ > > >(seqan::String<char, seqan::Alloc >&, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5Q>, seqan::Alloc >&, seqan::String<char, seqan::Alloc >&, seqan::Iter<seqan::VirtualStream<char, seqan::Tagseqan::Input_, std::char_traits >, seqan::StreamIterator<seqan::Tagseqan::Input_ > >&, seqan::Tagseqan::TagFastq_) + 0x77
6 [0x81924c] bool seqan::loadReads<MyFragStoreConfig, seqan::FragmentStoreConfig, seqan::RazerSOptions<seqan::RazerSSpec<false, false> > >(seqan::FragmentStore<MyFragStoreConfig, seqan::FragmentStoreConfig >&, seqan::FormattedFile<seqan::Tagseqan::TagFastq_, seqan::Tagseqan::Input_, void>&, seqan::RazerSOptions<seqan::RazerSSpec<false, false> >&) + 0x4ec
7 [0xa9c205] int mapReads<seqan::RazerSSpec<false, false> >(seqan::StringSet<seqan::String<char, seqan::Alloc >, seqan::Owner<seqan::Tagseqan::Default_ > >&, seqan::StringSet<seqan::String<char, seqan::Alloc >, seqan::Owner<seqan::Tagseqan::Default_ > >&, seqan::RazerSOptions<seqan::RazerSSpec<false, false> >&) + 0x4c5
8 [0x76287f] main + 0x22f
9 [0x31e101ed1d] __libc_start_main + 0xfd
10 [0x763039] /Biomarker/ngs/software/razers3/razers3-3.4.0-Linux-x86_64/bin/razers3()

0:00:03.99 Mapping TB2-EM11595_R2.fastq.gz to NUC reference...
/home/mi/esiragusa/seqan/include/seqan/basic/basic_exception.h:368 FAILED! (Uncaught exception of type seqan::UnexpectedEnd: Unexpected end of input.)

0:00:04.85 Generating binary hit matrix.
Traceback (most recent call last):
  File "/Biomarker/ngs/software/OptiType/OptiType-master/OptiTypePipeline.py", line 275, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/Biomarker/ngs/software/OptiType/OptiType-master/hlatyper.py", line 177, in pysam_to_hdf
    sam = pysam.AlignmentFile(samfile, sam_or_bam)
  File "pysam/calignmentfile.pyx", line 311, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4929)
  File "pysam/calignmentfile.pyx", line 480, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:6905)
IOError: file /ts19/ngs/studies/ngs_000230/neoantigen_analysis/optitype/TB2-EM11595/2018_03_02_10_16_01_1.bam not found
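`seqan::UnexpectedEnd: Unexpected end of input` means RazerS3 reached the end of its input while still reading a FASTQ record. Because mapping failed for both mates, no BAM was written, hence the final `IOError`. One plausible cause (an assumption, not confirmed in this thread) is a truncated or malformed `.fastq.gz` file; another is a RazerS3 build that cannot read gzipped input, in which case decompressing the files first is worth trying. A quick standard-library check for the first possibility (the helper name is hypothetical):

```python
import gzip
import zlib


def fastq_gz_looks_complete(path, chunk_size=1 << 20):
    """Decompress `path` fully and return (ok, n_lines).

    ok is False if the gzip stream is truncated or corrupt, or if the
    number of lines is not a multiple of 4 (one FASTQ record = 4 lines).
    """
    n_lines = 0
    last = b""
    try:
        with gzip.open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                n_lines += chunk.count(b"\n")
                last = chunk[-1:]
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended before the gzip end-of-stream marker (truncated file).
        # OSError (including gzip.BadGzipFile) / zlib.error: corrupt or non-gzip data.
        return False, n_lines
    if last and last != b"\n":
        n_lines += 1  # count a final line that lacks a trailing newline
    return n_lines % 4 == 0, n_lines
```

For example, `fastq_gz_looks_complete("TB2-EM11595_R1.fastq.gz")` returning `False` would point at the input file rather than at OptiType.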

Memory issues with Razers3

I have a 28GB bam file, and I'm running into memory issues with Razers3 even when I allocate 32GB of RAM for this job. Do you have any recommendations on what to do? Thanks.

AttributeError: type object 'PluginGlobals' has no attribute 'add_env'

Dear all,

As far as I know, I have installed OptiType and the required software and libraries; however, I got the following error:

C02NQ30CG3QT:OptiType-master jimene01$ python OptiTypePipeline.py --help
Error loading 'pyutilib.component' entry points: 'type object 'PluginGlobals' has no attribute 'add_env''
Traceback (most recent call last):
  File "OptiTypePipeline.py", line 108, in <module>
    from model import OptiType
  File "/Users/jimene01/Documents/ResearchPlacement/Project/OptiType/OptiType-master/model.py", line 15, in <module>
    import coopr.environ
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/coopr/environ/__init__.py", line 48, in <module>
    import_packages()
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/coopr/environ/__init__.py", line 38, in import_packages
    do_import(pname)
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/coopr/environ/__init__.py", line 20, in do_import
    __import__(pname, globals(), locals(), [], -1)
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/coopr/pyomo/__init__.py", line 16, in <module>
    from pyomo.environ import *
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/pyomo/environ/__init__.py", line 13, in <module>
    import pyomo.core
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/pyomo/core/__init__.py", line 10, in <module>
    from pyomo.util.plugin import PluginGlobals
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/pyomo/util/__init__.py", line 10, in <module>
    from pyomo.util._task import pyomo_api, PyomoAPIData, PyomoAPIFactory
  File "/Users/jimene01/anaconda/lib/python2.7/site-packages/pyomo/util/_task.py", line 26, in <module>
    plugin.PluginGlobals.add_env("pyomo")
AttributeError: type object 'PluginGlobals' has no attribute 'add_env'

I would highly appreciate your help troubleshooting my installation. I noticed that I have the required software, but for many packages the version differs from the one listed in the requirements, and I do not know whether that affects OptiType.

Moreover, I am not quite sure whether RazerS3 and CBC work correctly... To be honest, this is the first time I have installed something that depends on so many other programs, and I am not sure everything works together correctly.

I would highly appreciate any help that may be provided,
Alejandro

Permission error occurred when using the Docker image

When using the Docker image, I get an error that seems to be a permission error from the mkdir function. I googled it and tried the suggested solution of adding '--privileged=true', but it still doesn't work. My system is Ubuntu 16.04.3 with Docker CE. Is there anything I can do to solve it?

Traceback (most recent call last):
  File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 264, in <module>
    os.makedirs(out_dir)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/data/2017_12_31_08_56_44'

Another error I encountered is with the software installed via Bioconda; the error is:

0:12:16.95 Generating binary hit matrix.
Traceback (most recent call last):
  File "/home/zjd/miniconda3/envs/python2.7/bin/OptiTypePipeline.py", line 303, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/home/zjd/miniconda3/envs/python2.7/share/optitype-1.2.1-0/hlatyper.py", line 186, in pysam_to_hdf
    sam = pysam.AlignmentFile(samfile, sam_or_bam)
AttributeError: 'module' object has no attribute 'AlignmentFile'

OptiType crashing with message "this is indicative of a SERIOUS ERROR in the expression reuse detection scheme."

OptiType is crashing with the following error message from the log:

mapping with 8 threads...

 0:00:59.80 Mapping PGDX3144N_HLA_filtered_fished.fastq to GEN reference...

 0:03:48.28 Generating binary hit matrix.
0:03:48.31 Loading ../OptiType_output/2017_03_07_16_43_12/2017_03_07_16_43_12_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...5K...6K...7K...8K...9K...10K...11K...12K...13K...14K...
 0:04:18.66 14383 reads loaded. Creating dataframe...
0:04:21.58 Dataframes created. Shape: 14383 x 11179, hits: 4063907 (4063907), sparsity: 1 in 39.56

 0:04:37.63 temporary pruning of identical rows and columns

 0:04:39.04 Size of mtx with unique rows and columns: (767, 1332)
0:04:39.04 determining minimal set of non-overshadowed alleles

 0:04:55.87 Keeping only the minimal number of required alleles (168,)

 0:04:55.91 Creating compact model...

starting ilp solver with 1 threads...

 0:04:58.69 Initializing OptiType model...
ERROR: Rule failed when generating expression for objective read_cov:
	RuntimeError: Expression entered generate_expression() with too few references (-1<0); this is indicative of a SERIOUS ERROR in the expression reuse detection scheme.
ERROR: Constructing component 'read_cov' from data=None failed:
	RuntimeError: Expression entered generate_expression() with too few references (-1<0); this is indicative of a SERIOUS ERROR in the expression reuse detection scheme.

The explicit error message is:

Traceback (most recent call last):
  File "OptiTypePipeline.py", line 404, in <module>
    config.get("ilp", "solver"), threads, verbosity=VERBOSE)
  File "/n/regal/nowak_lab/immunotherapy/OptiType/model.py", line 97, in __init__
    model.reconst[a] * model.x[a] for a in model.L), sense=maximize)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/block.py", line 484, in __setattr__
    self.add_component(name, val)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/block.py", line 890, in add_component
    val.construct(data)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/objective.py", line 307, in construct
    tmp = _init_rule(_self_parent)
  File "/n/regal/nowak_lab/immunotherapy/OptiType/model.py", line 96, in <lambda>
    rule=lambda model: sum(model.occ[r] * (model.y[r] - model.beta * (model.re[r])) for r in model.R) - sum(
  File "/n/regal/nowak_lab/immunotherapy/OptiType/model.py", line 96, in <genexpr>
    rule=lambda model: sum(model.occ[r] * (model.y[r] - model.beta * (model.re[r])) for r in model.R) - sum(
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/numvalue.py", line 460, in __sub__
    return generate_expression(_sub,self,other)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/expr_coopr3.py", line 1028, in generate_expression
    other = _generate_expression__clone_if_needed(other, 0)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/pyomo/core/base/expr_coopr3.py", line 918, in _generate_expression__clone_if_needed
    % ( getrefcount(obj) - UNREFERENCED_EXPR_COUNT, ))
RuntimeError: Expression entered generate_expression() with too few references (-1<0); this is indicative of a SERIOUS ERROR in the expression reuse detection scheme.

Do you know what the problem could be?

Class II?

Dear main contributors @andras86 @b-schubert

Thank you for releasing this efficient open-source code.

From my few experiments, it works well for Class I. Nice! :-)
Obviously, more is better :-) and Class II should be nicer.

From what I have quickly read in your paper and your code, your proposed strategy should work (more or less) with any class. Right?
Less in the sense that the Integer Linear Program will be harder and harder to solve when increasing the number of alleles, since the matrix of constraints will be large (or extremely large).

Therefore, I would like to ask:

Is it still planned to release an updated version ?

If yes, is it available anywhere (a branch, a repo, etc.)? Even a half-cooked version would help.
If no, are there documented attempts? What worked and what failed? That would save some time by avoiding already known traps.

All the best

Pyomo-supported ILP solver

Hello,

OptiType returned the following message while calling the module model.py:

Invalid option '-s'; try /group/bioinformatics/software/GLPK/4.61/bin/glpsol --help
ERROR: "[base]/site-packages/pyomo/opt/base/solvers.py", 599, solve
	Solver (asl) returned non-zero return code (1)
ERROR: "[base]/site-packages/pyomo/opt/base/solvers.py", 602, solve
	See the solver log above for diagnostic information.

This is my command:

module load anaconda3
source activate python2.7

module load GLPK/4.61 HDF5/1.10.0-patch1 samtools/1.2 bwa/0.7.15 sambamba/0.5.6

python ./OptiTypePipeline.py -i ./test/exome/NA11995_SRR766010_1_fished.fastq ./test/exome/NA11995_SRR766010_2_fished.fastq -o test -d -v

source deactivate

I was not able to install RazerS and CPLEX on our server, so I went with bwa and GLPK instead. I modified OptiTypePipeline.py slightly to use bwa alignment commands instead of the default RazerS.

The command-line solver for GLPK is glpsol, which I also changed in config.ini. Could that be why the error was reported? The modified OptiTypePipeline.py, config file, commands and log files are attached in case they help: myfiles.zip

Any help would be highly appreciated. Thanks much in advance :)

Best,
Riyue (Sunny)
The University of Chicago

Mess with the version

Hello!
In the readme the version is 1.3.1 (2014).
But in releases we can see only 1.2.1. I think it would be better to fix this.
And inside the OptiTypePipeline.py we can see:

Date: April 2014
Version: 1.0

A problem with RazerS 3.1

Hi,
When I installed OptiType, I found that the RazerS3 website could not be accessed, and there seems to be no other way to download RazerS3. Could you help?

OSError: [Errno 13] Permission denied: '/local'

I know somebody down here had the same problem.

I'm trying to use OptiType in Docker:
docker run -v /local/pVACtools/:/data/ -t fred2/optitype --input SRR2672972_1.fastq SRR2672972_2.fastq --rna -o /local/pVACtools/Optitype/RNA_control

And I get this:

Traceback (most recent call last):
  File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 235, in <module>
    os.makedirs(args.outdir)        
  File "/usr/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/local'

I assume that Python 2.7 is also in the Docker image. I've tried to run this with sudo and it keeps showing the same errors.

Thanks
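Inside the container, only the mounted volume is visible: with `-v /local/pVACtools/:/data/`, the host directory `/local/pVACtools` appears as `/data`, and creating `/local` inside the container fails with the permission error shown in the traceback. Paths passed to `-i` and `-o` therefore presumably need to be container paths such as `/data/Optitype/RNA_control`. A hypothetical helper (not part of OptiType or Docker) for the translation:

```python
import posixpath


def to_container_path(host_path, host_mount, container_mount="/data"):
    """Map a host path to the path seen inside a container started with
    `docker run -v HOST_MOUNT:CONTAINER_MOUNT ...`.

    Raises ValueError if host_path lies outside host_mount, because such a
    path is not visible inside the container at all.
    """
    host_mount = posixpath.normpath(host_mount)
    host_path = posixpath.normpath(host_path)
    rel = posixpath.relpath(host_path, host_mount)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is outside the mounted directory %s" % (host_path, host_mount))
    return posixpath.normpath(posixpath.join(container_mount, rel))
```

For the command above, `to_container_path("/local/pVACtools/Optitype/RNA_control", "/local/pVACtools/")` gives `/data/Optitype/RNA_control`.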

Unable to get docker container to work

Steps I've taken:

Filter out MHC region from WGS BAM file
- samtools view -b subject1.bam chr6:29940260-33086201 > subject1.mhc.bam
Sort BAM file and extract pair reads as fastq files
- samtools sort -n -l 1 subject1.mhc.bam subject1.mhc.sorted
- bedtools bamtofastq -i subject1.mhc.sorted.bam -fq subject1.mhc-end1.fq -fq2 subject1.mhc-end2.fq
Use razers3 to filter out reads as suggested
- razers3 --percent-identity 90 --max-hits 1 --distance-range 0
  --output subject1.mhc-raz1.sam OptiType/data/hla_reference_dna.fasta subject1.mhc-end1.fq
- razers3 --percent-identity 90 --max-hits 1 --distance-range 0
  --output subject1.mhc-raz2.sam OptiType/data/hla_reference_dna.fasta subject1.mhc-end2.fq
Converting SAM files to fastq
- cat subject1.mhc-raz1.sam | grep -v ^@ | awk '{print "@"$1"\n"$10"\n+\n"$11}' > subject1.mhc-raz1.fastq
- cat subject1.mhc-raz2.sam | grep -v ^@ | awk '{print "@"$1"\n"$10"\n+\n"$11}' > subject1.mhc-raz2.fastq
Finally I'm trying to run the OptiType container on the two fastq files.
- docker run -v /home/biodocker:/data/ -t fred2/optitype -i /home/biodocker/subject1.mhc-raz1.fastq /home/biodocker/subject1.mhc-raz2.fastq -d -o /home/biodocker
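For reference, the `grep`/`awk` SAM-to-FASTQ conversion step above can also be done in Python. This sketch, with the hypothetical helper `sam_to_fastq`, keeps the same behavior: it skips `@` header lines and writes SAM columns 1, 10 and 11 (QNAME, SEQ, QUAL) as FASTQ records. Like the `awk` one-liner, it does not reverse-complement reads that aligned to the reverse strand, and it assumes QUAL is present (not `*`).

```python
def sam_to_fastq(sam_path, fastq_path):
    """Write QNAME, SEQ and QUAL (SAM columns 1, 10, 11) of each alignment as FASTQ.

    Returns the number of records written.
    """
    n_written = 0
    with open(sam_path) as sam, open(fastq_path, "w") as out:
        for line in sam:
            if line.startswith("@"):  # header lines, same as `grep -v ^@`
                continue
            fields = line.rstrip("\n").split("\t")  # SAM is tab-separated
            if len(fields) < 11:
                continue  # skip blank or malformed lines
            name, seq, qual = fields[0], fields[9], fields[10]
            out.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
            n_written += 1
    return n_written
```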

Here is the error I'm getting:

docker run -v /home/biodocker:/data/ -t fred2/optitype -i /home/biodocker/subject1.mhc-raz1.fastq /home/biodocker/subject1.mhc-raz2.fastq -d -o /home/biodocker
Traceback (most recent call last):
  File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 299, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/usr/local/bin/OptiType/hlatyper.py", line 186, in pysam_to_hdf
    sam = pysam.AlignmentFile(samfile, sam_or_bam)
  File "pysam/libcalignmentfile.pyx", line 397, in pysam.libcalignmentfile.AlignmentFile.__cinit__ (pysam/libcalignmentfile.c:5831)
  File "pysam/libcalignmentfile.pyx", line 558, in pysam.libcalignmentfile.AlignmentFile._open (pysam/libcalignmentfile.c:7556)
IOError: file `/home/biodocker/2017_05_10_23_04_02/2017_05_10_23_04_02_1.bam` not found

Any thoughts on what I may be doing wrong? Any advice or suggestions would be greatly appreciated.

Inconsistency with verification data

Hey guys.

I'm developing geneotyper, which has a similar purpose to OptiType. When comparing it with OptiType, I noticed some inconsistencies between the verification data I found on the 1000 Genomes FTP site ( ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140725_hla_genotypes/20140702_hla_diversity.txt ) and the data you use in your supplementary table S2.

For example: In your table you have...

A*02:07 but 1000 genomes have 02:01 for samples NA18537, NA18550, NA18552, NA18573, NA18943, NA18966, NA18987, and NA19007.
B*27:05 but 1000 genomes have 27:03/27:52/27:09 for sample NA12005.
C*02:10 but 1000 genomes have 02:02 for samples NA18504, NA18505, NA18507, NA18522, and NA18870.

What's the reason behind this inconsistency? I'm sorry if you mention it somewhere in your article, but I couldn't find it.

Cheers,
Hannes

ConfigParser.NoSectionError: No section: 'LIBRARIES'

Hello,

I ran the script with the provided exome test files and got these error messages. I would greatly appreciate it if you can have a look and help me troubleshoot the issues.

[jc2545@login-0-1 exome]$ python ~/programs/OptiType-master/OptiTypePipeline.py -i NA11995_SRR766010_1_fished.fastq NA11995_SRR766010_2_fished.fastq -d -v -o .
Traceback (most recent call last):
  File "/home/jc2545/programs/OptiType-master/OptiTypePipeline.py", line 195, in <module>
    ALLELE_HDF = config.get("LIBRARIES", "ALLELES")
  File "/home/jc2545/python/lib/python2.7/ConfigParser.py", line 607, in get
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'LIBRARIES'

Here is my config.ini file

[jc2545@login-0-0 OptiType-master]$ cat config.ini 
[MAPPING]
#please specify the razerS3 binary path
RAZERS3=/home/jc2545/programs/razers3-3.4.0-Linux-x86_64/bin/razers3
THREADS=8

[LIBRARIES]
RNA_REF=./data/hla_reference_rna.fasta
DNA_REF=./data/hla_reference_dna.fasta
ALLELES=./data/alleles.h5

[OPTIMIZATION]
#the solver has to be supported by Coopr
SOLVER=cbc
THREADS=1

I believe all the required software and libraries are installed.

[jc2545@login-0-1 exome]$ python ~/programs/OptiType-master/OptiTypePipeline.py --help
usage: OptiType [-h] --input INPUT [INPUT ...] (--rna | --dna) [--beta BETA]
            [--enumerate ENUMERATE] --outdir OUTDIR [--verbose]

OptiType: 4-digit HLA typer

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT [INPUT ...], -i INPUT [INPUT ...]
                        Fastq files with fished HLA reads. Max two files (for
                        paired-end)
  --rna, -r             Specifiying the mapped data as RNA.
  --dna, -d             Specifiying the mapped data as DNA.
  --beta BETA, -b BETA  The beta value for for homozygosity detection.
  --enumerate ENUMERATE, -e ENUMERATE
                         The number of enumerations.
  --outdir OUTDIR, -o OUTDIR
                        Specifies the out directory to which all files should
                        be written
  --verbose, -v         Set verbose mode on.
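The `config.ini` shown above looks correct, but the command was started from the `exome` directory. `ConfigParser.read()` silently skips files it cannot open and returns the list of files it did parse, so if this OptiType version opens `config.ini` relative to the current working directory (an assumption; the relevant code is not shown), the parser ends up empty and the first `config.get("LIBRARIES", ...)` raises `NoSectionError`. Running the command from the OptiType directory would be a quick test. A Python 3 sketch (`configparser`; the Python 2 module is `ConfigParser`) with hypothetical helper names that makes this failure explicit:

```python
import configparser
import os


def load_config(path, required_section="LIBRARIES"):
    """Read an OptiType-style INI file and fail loudly if it was not found
    or lacks the required section."""
    config = configparser.ConfigParser()
    parsed = config.read(path)  # files actually read; missing files are skipped silently
    if not parsed:
        raise FileNotFoundError("config file not found: %s" % os.path.abspath(path))
    if not config.has_section(required_section):
        raise KeyError("section [%s] missing in %s" % (required_section, path))
    return config


def config_next_to_script(filename="config.ini"):
    """Resolve the config file relative to this script instead of the current directory."""
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)
```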

Error at determining minimal set of non-overshadowed alleles

Hello,

I am using OptiType with Python 2.7.10. After installing some modules, I could run the analysis:
python /OptiType/OptiTypePipeline.py -d -v -i $curDir/NeoEpitopePrediction/${Tumor}_1.fastq $curDir/NeoEpitopePrediction/${Tumor}_2.fastq -o $curDir/NeoEpitopePrediction/HLA/
until the "determining minimal set of non-overshadowed alleles" step:

0:00:00.38 Mapping Sample_214310406_T-AL-O_1.fastq to GEN reference...

0:00:19.11 Mapping Sample_214310406_T-AL-O_2.fastq to GEN reference...

0:00:38.76 Generating binary hit matrix.
0:00:38.76 Loading alleles and read IDs from /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_11_59_59/2017_04_25_11_59_59_0.sam...
0:00:40.06 11179 alleles and 2016 reads found.
0:00:40.06 Initializing mapping matrix...
0:00:40.07 2016x11179 mapping matrix initialized. Populating 1077618 hits from SAM file...
10% completed
20% completed
30% completed
40% completed
50% completed
60% completed
70% completed
80% completed
90% completed
100% completed
0:03:35.02 1077618 elements filled. Matrix sparsity: 1 in 20.91
0:03:44.25 Loading alleles and read IDs from /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_11_59_59/2017_04_25_11_59_59_1.sam...
0:03:45.11 11179 alleles and 2177 reads found.
0:03:45.11 Initializing mapping matrix...
0:03:45.12 2177x11179 mapping matrix initialized. Populating 992781 hits from SAM file...
10% completed
20% completed
30% completed
40% completed
50% completed
60% completed
70% completed
80% completed
90% completed
100% completed
0:06:28.36 992781 elements filled. Matrix sparsity: 1 in 24.51

0:06:40.68 temporary pruning of identical rows and columns

0:06:40.71 Size of mtx with unique rows and columns: (312, 446)
0:06:40.71 determining minimal set of non-overshadowed alleles

Could this problem be related to the solver? I tried both solvers, cbc and glpk, which I added to my $PATH.

Thank you in advance for your help

KeyError when using BAM files

Hello,
I get the following error when I use BAM files as an input for OptiType. Is there a restriction on the chromosome notation or does this error have a different cause?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1434, in _has_valid_type
    error()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [chr1] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/software/install/OptiType/OptiTypePipeline.py", line 355, in <module>
    alleles_to_keep = list(filter(is_frequent, binary.columns))
  File "/home/software/install/OptiType/OptiTypePipeline.py", line 142, in is_frequent
    return table.loc[allele_id]['4digit'] in freq_alleles and table.loc[allele_id]['flags'] == 0 or (table.loc[allele_id]['locus'] in 'HGJ')
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1328, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1551, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1442, in _has_valid_type
    error()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1429, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [chr1] is not in the [index]'
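`table.loc[allele_id]` fails on `chr1`, which suggests the BAM's reference sequences are genome chromosomes rather than the HLA allele sequences that OptiType's allele table is indexed by, i.e. the BAM was probably aligned to a whole-genome reference instead of OptiType's own HLA reference (an inference from the traceback, not confirmed in the thread). A hypothetical pre-flight check comparing the BAM's reference names with the allele table index:

```python
def unknown_references(reference_names, allele_ids, show=5):
    """Return (first `show` unknown names, total unknown count) for reference
    names that do not occur among the known allele IDs."""
    known = set(allele_ids)
    unknown = [name for name in reference_names if name not in known]
    return unknown[:show], len(unknown)
```

With pysam, a BAM's reference names are available as `pysam.AlignmentFile(path, "rb").references`; the allele IDs would come from the index of OptiType's allele table. Any unknown names mean the BAM cannot be used as-is.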

Ambiguous allele combinations or ambiguous results in general

Hello all,

I was wondering how OptiType handles ambiguous results. I would expect that the solver returns something along the lines of "no single best solution found" or similar. Or will all best results be reported?

Further, is this a problem at all? You have benchmarked quite a lot of datasets; have you ever seen such a case, or is it too rare to worry about? I just saw the -e switch, so I can't say yet for our benchmarks.

Some background

From Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification http://dx.doi.org/10.1186%2F1479-5876-2-30

However, one of the inherent problems with this typing method is the interpretation of ambiguous allele combinations which occur when two or more different allele combinations produce identical sequences.

Example: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC517951/figure/F1/

The complete list can be found here: http://www.ebi.ac.uk/ipd/imgt/hla/ambig.html

[External] GLPK error 4.5x

With Pyomo 5.x the following error occurs for GLPK <= 4.5x (see Pyomo/pyomo#146)

Writing MIP solution to `/tmp/tmpDUGZvJ.glpk.raw'...
474 lines were written
ERROR: Expecting 's' row after 'c' rows
Traceback (most recent call last):
  File "OptiTypePipeline.py", line 405, in <module>
    result = op.solve(args.enumerate)
  File "/home/travis/build/FRED-2/OptiType/model.py", line 153, in solve
    res = self.__solver.solve(self.__instance, options={}, tee=self.__verbosity)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/pyomo/opt/base/solvers.py", line 610, in solve
    result = self._postsolve()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 268, in _postsolve
    results = self.process_output(self._rc)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 330, in process_output
    self.process_soln_file(results)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/pyomo/solvers/plugins/solvers/GLPK.py", line 362, in process_soln_file
    raise ValueError(msg)
ValueError: Error parsing solution data file, line 2

Solver (glpk) returned non-zero return code

0 packages

biopython==1.64
Coopr 3.5.8787 (CPython 2.7.6 on Linux 3.13.0-37-generic)
matplotlib==1.3.1
pandas==0.13.1
solver: glpk 4.35

1 run the example command on Ubuntu 14.04 LTS

python OptiTypePipeline.py -i ./test/exome/NA11995_SRR766010_1_fished.fastq ./test/exome/NA11995_SRR766010_2_fished.fastq -d -v -o ./test/exome/

2 Warnings and Errors:

WARNING: No construction rule or expression specified for constraint 'c'     
Invalid option `--threads'; try /usr/local/bin/glpsol --help    
ERROR: "[base]/dist-packages/coopr/opt/base/solvers.py", 448, solve     
    Solver (glpk) returned non-zero return code (1)   
ERROR: "[base]/dist-packages/coopr/opt/base/solvers.py", 451, solve   
    See the solver log above for diagnostic information.    
glp_read_lp: reading problem data from `/tmp/tmpA9lW95.pyomo.lp'...   
/tmp/tmpA9lW95.pyomo.lp:3620: warning: lower bound of variable `x481' redefined    
/tmp/tmpA9lW95.pyomo.lp:3620: warning: upper bound of variable `x481' redefined    
...   (multiple lines similar warnings)

3 Results

    A1  A2  B1  B2  C1  C2  Reads   Objective
0   A*01:01 A*01:01 B*08:01 B*57:01 C*07:01 C*06:02 1156    1135.192

Error when running with GLPK solver

Hi,

When running against the test data, I'm getting the below error that I think traces back to GLPK but I'm not sure. Can someone help me possibly debug this issue?

Here's the shell script I used to run it:

#!/bin/bash

export SAMTOOLS=/Biomarker/ngs/software/samtools/samtools-1.2/bin
export GLPK=/Biomarker/ngs/software/glpk/glpk-4.59/bin
export PATH=$SAMTOOLS:$GLPK:$PATH
export HDF5_DIR=/Biomarker/ngs/software/HD5/hdf5-1.8.16-linux-centos7-x86_64-gcc483-shared
export LD_LIBRARY_PATH=/Biomarker/ngs/software/HD5/hdf5-1.8.16-linux-centos7-x86_64-gcc483-shared/lib

/Biomarker/ngs/software/bin/python OptiType-master/OptiTypePipeline.py -i OptiType-master/test/exome/NA11995_SRR766010_1_fished.fastq OptiType-master/test/exome/NA11995_SRR766010_2_fished.fastq --dna --verbose --config OptiType-master/config.ini -o OptiType-master/test/exome/

The head of the .raw file looks like this:

c Problem:
c Rows: 450
c Columns: 282
c Non-zeros: 1715
c Status: INTEGER OPTIMAL
c Objective: x282 = 1135.192 (MAXimum)
c
s mip 450 282 o 1135.192
i 1 1
i 2 2
i 3 2
i 4 1
i 5 1
i 6 1
i 7 1
i 8 2
i 9 2
i 10 1
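The head above shows that this .raw file is line-prefixed: `c` lines are comments, the `s` line summarizes the solution (problem type, rows, columns, a status code, and the objective; `o` presumably means optimal, matching the INTEGER OPTIMAL comment), and `i` lines carry per-row values. A reader that expects the first token of the file to be a number fails on the very first `c`, which is consistent with the "invalid literal for int() with base 10: 'c'" message in the log below. A minimal sketch of a tolerant reader, assuming only the line types visible here; read_glpk_raw is a hypothetical helper, not Pyomo's parser.

```python
def read_glpk_raw(text):
    """Parse the c/s/i lines of a glpsol --write file (only the line types seen above)."""
    summary = None
    row_values = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "c":
            continue  # blank line or comment line
        if fields[0] == "s":
            # s <type> <rows> <cols> <status> <objective>, e.g. "s mip 450 282 o 1135.192"
            _, kind, rows, cols, status, obj = fields[:6]
            summary = {"type": kind, "rows": int(rows), "cols": int(cols),
                       "status": status, "objective": float(obj)}
        elif fields[0] == "i":
            # i <row index> <value>
            row_values[int(fields[1])] = float(fields[2])
        # Any other line types are ignored in this sketch
    return summary, row_values


SAMPLE = """c Problem:
c Rows: 450
c Columns: 282
c Non-zeros: 1715
c Status: INTEGER OPTIMAL
c Objective: x282 = 1135.192 (MAXimum)
c
s mip 450 282 o 1135.192
i 1 1
i 2 2
i 3 2
"""

summary, rows = read_glpk_raw(SAMPLE)
print(summary["status"], summary["objective"])  # o 1135.192

# A parser that assumes a numeric first token hits the error seen in the log
try:
    int(SAMPLE.split()[0])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'c'
```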

ERROR (at the bottom):

0:00:01.08 Mapping NA11995_SRR766010_1_fished.fastq to GEN reference...

0:00:31.21 Mapping NA11995_SRR766010_2_fished.fastq to GEN reference...

0:00:57.64 Generating binary hit matrix.
0:00:57.66 Loading OptiType-master/test/exome/2016_03_23_16_57_45/2016_03_23_16_57_45_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...
0:01:00.97 1909 reads loaded. Creating dataframe...
0:01:01.22 Dataframes created. Shape: 1909 x 11179, hits: 688669 (1249465), sparsity: 1 in 17.08
0:01:01.60 Loading OptiType-master/test/exome/2016_03_23_16_57_45/2016_03_23_16_57_45_2.bam started. Number of HLA reads loaded (updated every thousand):
1K...
0:01:04.73 1876 reads loaded. Creating dataframe...
0:01:04.92 Dataframes created. Shape: 1876 x 11179, hits: 657359 (1192811), sparsity: 1 in 17.58
0:01:05.67 Alignment pairing completed. 1681 paired, 359 unpaired, 32 discordant

0:01:11.14 temporary pruning of identical rows and columns

0:01:11.32 Size of mtx with unique rows and columns: (496, 776)
0:01:11.32 determining minimal set of non-overshadowed alleles

0:01:13.67 Keeping only the minimal number of required alleles (62,)

0:01:13.67 Creating compact model...

0:01:13.82 Initializing OptiType model...
GLPSOL: GLPK LP/MIP Solver, v4.59
Parameter(s) specified in the command line:
--write /tmp/tmpGZXIuT.glpk.raw --wglp /tmp/tmpmXCoNz.glpk.glp --cpxlp /tmp/tmpWPTOBn.pyomo.lp
Reading problem data from '/tmp/tmpWPTOBn.pyomo.lp'...
/tmp/tmpWPTOBn.pyomo.lp:3620: warning: lower bound of variable 'x1' redefined
/tmp/tmpWPTOBn.pyomo.lp:3620: warning: upper bound of variable 'x1' redefined
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
3791 lines were read
Writing problem data to '/tmp/tmpmXCoNz.glpk.glp'...
3276 lines were written
GLPK Integer Optimizer, v4.59
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
Preprocessing...
2 hidden packing inequaliti(es) were detected
95 hidden covering inequaliti(es) were detected
444 rows, 280 columns, 1705 non-zeros
170 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 6.000e+00 ratio = 6.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 444
Solving LP relaxation...
GLPK Simplex Optimizer, v4.59
444 rows, 280 columns, 1705 non-zeros
0: obj = -0.000000000e+00 inf = 5.000e+00 (5)
5: obj = -3.000000000e-02 inf = 0.000e+00 (0)

241: obj = 1.135192000e+03 inf = 3.064e-14 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
241: mip = not found yet <= +inf (1; 0)
241: >>>>> 1.135192000e+03 <= 1.135192000e+03 0.0% (1; 0)
241: mip = 1.135192000e+03 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.7 Mb (722870 bytes)
Writing MIP solution to '/tmp/tmpGZXIuT.glpk.raw'...
741 lines were written
invalid literal for int() with base 10: 'c'
WARNING: Solver does not support multi-threading. Please change the config file accordingly. Falling back to single-threading.
GLPSOL: GLPK LP/MIP Solver, v4.59
Parameter(s) specified in the command line:
--write /tmp/tmpz_UceC.glpk.raw --wglp /tmp/tmpW8xrDS.glpk.glp --cpxlp /tmp/tmphE7GB3.pyomo.lp
Reading problem data from '/tmp/tmphE7GB3.pyomo.lp'...
/tmp/tmphE7GB3.pyomo.lp:3620: warning: lower bound of variable 'x1' redefined
/tmp/tmphE7GB3.pyomo.lp:3620: warning: upper bound of variable 'x1' redefined
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
3791 lines were read
Writing problem data to '/tmp/tmpW8xrDS.glpk.glp'...
3276 lines were written
GLPK Integer Optimizer, v4.59
450 rows, 282 columns, 1715 non-zeros
171 integer variables, all of which are binary
Preprocessing...
2 hidden packing inequaliti(es) were detected
95 hidden covering inequaliti(es) were detected
444 rows, 280 columns, 1705 non-zeros
170 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 6.000e+00 ratio = 6.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 444
Solving LP relaxation...
GLPK Simplex Optimizer, v4.59
444 rows, 280 columns, 1705 non-zeros
0: obj = -0.000000000e+00 inf = 5.000e+00 (5)
5: obj = -3.000000000e-02 inf = 0.000e+00 (0)
241: obj = 1.135192000e+03 inf = 3.064e-14 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
241: mip = not found yet <= +inf (1; 0)
241: >>>>> 1.135192000e+03 <= 1.135192000e+03 0.0% (1; 0)
241: mip = 1.135192000e+03 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.7 Mb (722870 bytes)
Writing MIP solution to '/tmp/tmpz_UceC.glpk.raw'...
741 lines were written
invalid literal for int() with base 10: 'c'
Traceback (most recent call last):
File "OptiType-master/OptiTypePipeline.py", line 374, in
result = op.solve(args.enumerate)
File "/Biomarker/ngs/software/OptiType/OptiType-master/model.py", line 150, in solve
res = self.__solver.solve(self.__instance, options={}, tee=self.__verbosity)
File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/base/solvers.py", line 578, in solve
result = self._postsolve()
File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 161, in _postsolve
results = self.process_output(self._rc)
File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/opt/solver/shellcmd.py", line 220, in process_output
self.process_soln_file(results)
File "/Biomarker/ngs/software/python/latest/lib/python2.7/site-packages/pyomo/solvers/plugins/solvers/GLPK.py", line 445, in process_soln_file
raise ValueError(msg)
ValueError: Error parsing solution data file, line 1

AssertionError: Index length did not match values

Hello!

I'm running into an error at the "Result dataframe has been constructed..." stage.

Here is the command I used:
python ~/src/OptiType/OptiTypePipeline.py -v -i nebula_finished.fastq --dna -o . &> run_log.txt

My config.ini:

[MAPPING]
#please specify the razerS3 binary path
RAZERS3=/home/ubuntu/src/razers3-3.4.0-Linux-x86_64/bin/razers3
THREADS=8

[LIBRARIES]
RNA_REF=./data/hla_reference_rna.fasta
DNA_REF=./data/hla_reference_dna.fasta
ALLELES=./data/alleles.h5

[OPTIMIZATION]
#the solver has to be supported by Coopr
SOLVER=cbc
THREADS=1
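The settings above can be read with Python's standard configparser. A minimal sketch, assuming the section and key names shown in this config.ini; parse_optitype_config is a hypothetical helper, not OptiType's own loader. The thread-count check mirrors the "Solver does not support multi-threading" warning and the "Invalid option `--threads'" glpsol error quoted in this thread; treating GLPK as the only single-threaded solver is an assumption based on those logs.

```python
import configparser
import warnings

# Assumption: GLPK is the only solver in this thread known to reject a thread count.
SINGLE_THREADED_SOLVERS = {"glpk"}


def parse_optitype_config(text):
    """Parse config.ini text laid out like the example above into a flat dict."""
    cfg = configparser.ConfigParser()  # '#' lines are treated as comments by default
    cfg.read_string(text)
    opt = cfg["OPTIMIZATION"]
    solver = opt["SOLVER"].strip().lower()
    threads = opt.getint("THREADS", fallback=1)
    if solver in SINGLE_THREADED_SOLVERS and threads > 1:
        warnings.warn(f"{solver} does not support multi-threading; using THREADS=1")
        threads = 1
    return {
        "razers3": cfg["MAPPING"]["RAZERS3"],
        "mapping_threads": cfg["MAPPING"].getint("THREADS", fallback=1),
        "rna_ref": cfg["LIBRARIES"]["RNA_REF"],
        "dna_ref": cfg["LIBRARIES"]["DNA_REF"],
        "alleles": cfg["LIBRARIES"]["ALLELES"],
        "solver": solver,
        "solver_threads": threads,
    }
```

Usage would look like parse_optitype_config(pathlib.Path("config.ini").read_text()).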

And the run log:

0:00:02.66 Mapping nebula_finished.fastq to GEN reference...

0:05:23.24 Generating binary hit matrix.
0:05:23.24 Loading alleles and read IDs from ./2015_02_23_21_31_27/2015_02_23_21_31_27_0.sam...
0:05:27.01 11179 alleles and 13842 reads found.
0:05:27.01 Initializing mapping matrix...
0:05:27.02 13842x11179 mapping matrix initialized. Populating 4135583 hits from SAM file...
    10% completed
    20% completed
    30% completed
    40% completed
    50% completed
    60% completed
    70% completed
    80% completed
    90% completed
    100% completed
0:49:01.96 4135583 elements filled. Matrix sparsity: 1 in 37.42

0:50:19.10 temporary pruning of identical rows and columns

0:50:19.82 Size of mtx with unique rows and columns: (2163, 1384)
0:50:19.82 determining minimal set of non-overshadowed alleles

0:50:24.99 Keeping only the minimal number of required alleles (184,)

0:50:24.99 Creating compact model...

0:50:25.35 Initializing OptiType model...
WARNING: No construction rule or expression specified for constraint 'c'
Welcome to the CBC MILP Solver 
Version: 2.8.7 
Build Date: Dec 28 2013 

command line - /usr/bin/cbc -printingOptions all -import /tmp/tmp8qUjdt.pyomo.lp -import -stat=1 -solve -solu /tmp/tmp8qUjdt.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
Coin0009I  CoinLpIO::readLp(): Maximization problem reformulated as minimization
Current default (if $ as parameter) for import is /tmp/tmp8qUjdt.pyomo.lp
Presolve 2401 (-1) rows, 1379 (-1) columns and 19535 (-1) elements
Statistics for presolved model


Problem has 2401 rows, 1379 columns (1323 with objective) and 19535 elements
Column breakdown:
597 of type 0.0->inf, 1 of type 0.0->up, 0 of type lo->inf, 
0 of type lo->up, 0 of type free, 0 of type fixed, 
0 of type -inf->0.0, 0 of type -inf->up, 781 of type 0.0->1.0 
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0, 
0 of type E other, 0 of type G 0.0, 6 of type G 1.0, 
0 of type G other, 1791 of type L 0.0, 0 of type L 1.0, 
604 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other, 
0 of type Free 
Continuous objective value is -6923.74 - 0.04 seconds
Cgl0004I processed model has 2395 rows, 1379 columns (781 integer) and 19351 elements
Cbc0038I Solution found of -6923.74
Cbc0038I Before mini branch and bound, 781 integers at bound fixed and 26 continuous
Cbc0038I Mini branch and bound did not improve solution (0.08 seconds)
Cbc0038I After 0.08 seconds - Feasibility pump exiting with objective of -6923.74 - took 0.01 seconds
Cbc0012I Integer solution of -6923.74 found by feasibility pump after 0 iterations and 0 nodes (0.08 seconds)
Cbc0001I Search completed - best objective -6923.739999999943, took 0 iterations and 0 nodes (0.09 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -6923.74 to -6923.74
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                -6923.74000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.11
Time (Wallclock seconds):       0.11

Total time (CPU seconds):       0.12   (Wallclock seconds):       0.12


0:50:26.55 Result dataframe has been constructed...
Traceback (most recent call last):
  File "/home/ubuntu/src/OptiType/OptiTypePipeline.py", line 325, in <module>
    coverage_mat = ht.calculate_coverage(plot_variables, features, hlatype, features_used)
  File "/home/ubuntu/src/OptiType/hlatyper.py", line 505, in calculate_coverage
    hit_counts[reads]):
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 641, in __getitem__
    return self._get_with(key)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 688, in _get_with
    return self.reindex(key)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 2646, in reindex
    return self._reindex_with_indexers(new_index, indexer, copy=copy, fill_value=fill_value)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 2650, in _reindex_with_indexers
    return Series(new_values, index=index, name=self.name)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 492, in __new__
    subarr.index = index
  File "properties.pyx", line 74, in pandas.lib.SeriesIndex.__set__ (pandas/lib.c:29541)
AssertionError: Index length did not match values

Error loading 'pyutilib.component' entry points: 'type object 'PluginGlobals' has no attribute 'push_env''

Hi
I am trying to use OptiType and get the error:

Error loading 'pyutilib.component' entry points: 'type object 'PluginGlobals' has no attribute 'push_env''
Traceback (most recent call last):
  File "OptiTypePipeline.py", line 124, in <module>
    from model import OptiType
  File "/home/usr/OptiType/model.py", line 19, in <module>
    from pyomo.environ import ConcreteModel, Set, Param, Var, Binary, Objective, Constraint, ConstraintList, maximize
  File "/home/python2.7/site-packages/pyomo/environ/__init__.py", line 14, in <module>
    from pyomo.core import *
  File "/home/python2.7/site-packages/pyomo/core/__init__.py", line 11, in <module>
    from pyomo.util.plugin import PluginGlobals
  File "/home/python2.7/site-packages/pyomo/util/__init__.py", line 11, in <module>
    from pyomo.util._task import pyomo_api, PyomoAPIData, PyomoAPIFactory
  File "/home/python2.7/site-packages/pyomo/util/_task.py", line 21, in <module>
    import pyutilib.workflow
  File "/home/python2.7/site-packages/pyutilib/workflow/__init__.py", line 11, in <module>
    pyutilib.component.core.PluginGlobals.push_env("pyutilib.workflow")
AttributeError: type object 'PluginGlobals' has no attribute 'push_env'

All dependencies are already installed.
Thanks for the help.
Eli
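The traceback fails inside pyutilib while Pyomo is being imported, which suggests (an inference, not confirmed in the thread) that the installed Pyomo and PyUtilib releases do not match. A minimal diagnostic sketch using the standard importlib.metadata module (Python 3.8+); installed_versions is a hypothetical helper. The traceback above comes from Python 2.7, where this module does not exist; there, pip show pyomo pyutilib reports the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for pkg, ver in installed_versions(["Pyomo", "PyUtilib"]).items():
        print(pkg, ver or "not installed")
```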

Segmentation fault (core dumped)

OptiType was installed with conda install optitype, and it can print the help message, but it cannot process even small inputs (HLA1_1.fastq and HLA1_2.fastq, 31k each).
The command is: python OptiTypePipeline.py --input HLA1_1.fastq HLA1_2.fastq --rna -v -o ./test/rna/
and the error output is as follows:

mapping with 4 threads...
0:00:04.45 Mapping HLA1_1.fastq to NUC reference...
0:00:10.85 Mapping HLA1_2.fastq to NUC reference...
Segmentation fault (core dumped)

How can I deal with this? Thanks.

Error pyomo

Hi.

I'm trying to use OptiType, but I get some warnings and errors. I assume they are related to Pyomo, but I'm not sure how to solve them. Should I downgrade? I'm using version 5.2.

WARNING: Constant objective detected, replacing with a placeholder to prevent solver failure.
ERROR: Expecting 's' row after 'c' rows
WARNING: Solver does not support multi-threading. Please change the config file accordingly. Falling back to single-threading.
WARNING: Constant objective detected, replacing with a placeholder to prevent solver failure.
ERROR: Expecting 's' row after 'c' rows
Traceback (most recent call last):
  File "/home/.local/lib/python3.5/site-packages/pyomo/solvers/plugins/solvers/GLPK.py", line 351, in process_soln_file
    raise ValueError("Expecting 's' row after 'c' rows")
ValueError: Expecting 's' row after 'c' rows

Thanks for your time.

Getting a "not in index" error

Hello,
OptiType keeps giving me the same error for a group of samples. This is single-end, 100 bp data. Normally the software works great for me, so I don't think it's my version or setup. I can send you a ~1 MB FASTQ file to reproduce the problem if you like.
Command:

python OptiTypePipeline.py -i ${path}/temp2_fished1.fastq --rna --o ${path} -v

Error:

Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part = 408
Solving LP relaxation...
GLPK Simplex Optimizer, v4.49
408 rows, 248 columns, 2006 non-zeros
      0: obj =   0.000000000e+00  infeas =  1.000e+00 (0)
*     1: obj =   0.000000000e+00  infeas =  0.000e+00 (0)
*   192: obj =   3.090929000e+03  infeas =  0.000e+00 (0)
OPTIMAL SOLUTION FOUND
Integer optimization begins...
+   192: mip =     not found yet <=              +inf        (1; 0)
+   192: >>>>>   3.090929000e+03 <=   3.090929000e+03   0.0% (1; 0)
+   192: mip =   3.090929000e+03 <=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.7 Mb (727283 bytes)
Writing MIP solution to `/tmp/tmpGkiOZD.glpk.raw'...
672 lines were written

0:14:51.33 Result dataframe has been constructed...
Traceback (most recent call last):
  File "OptiTypePipeline.py", line 315, in <module>
    r = result_4digit[["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]]
  File "~/common/python/2.7.6/lib/python2.7/site-packages/pandas/core/frame.py", line 1781, in __getitem__
    return self._getitem_array(key)
  File "~/common/python/2.7.6/lib/python2.7/site-packages/pandas/core/frame.py", line 1825, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "~/common/python/2.7.6/lib/python2.7/site-packages/pandas/core/indexing.py", line 1140, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['B1' 'B2'] not in index"
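The KeyError says the result dataframe had no B1/B2 columns at the point where OptiTypePipeline.py selects them by a fixed list, so the whole selection fails. A minimal sketch of a more tolerant selection with pandas DataFrame.reindex(columns=...), which keeps the requested order and fills absent columns with NaN; the dataframe is hypothetical (values borrowed from the results table earlier in this thread, with the B columns dropped), and this is not the project's actual fix.

```python
import pandas as pd

WANTED = ["A1", "A2", "B1", "B2", "C1", "C2", "nof_reads", "obj"]

# Hypothetical result row without B-locus columns, reproducing the failure mode
result_4digit = pd.DataFrame(
    {"A1": ["A*01:01"], "A2": ["A*01:01"], "C1": ["C*07:01"], "C2": ["C*06:02"],
     "nof_reads": [1156], "obj": [1135.192]}
)

# result_4digit[WANTED] raises KeyError because B1/B2 are absent;
# reindex returns all requested columns, with missing ones filled by NaN.
r = result_4digit.reindex(columns=WANTED)
print(r.columns.tolist())
```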

Can I use an already-mapped, SNP-called VCF with OptiType?

This is Yijia Li from Yunnan Province Stem Cell Bank, China.
We are very interested in using OptiType for HLA typing.
However, may I ask whether I can use an already-mapped, SNP-called VCF for this instead of the original FASTQ files?
Thanks very much.

HLA class II prediction

Hello again,

The OptiType paper says that "[OptiType] can be easily adapted to predict genotypes for loci other than HLA-I such as HLA-II". I was wondering whether it's just a matter of changing some parameters in OptiTypePipeline.py or whether something more is needed. In any case, I am very interested in predicting HLA class II, so I'd like to know how achievable this would be.

Best,
Alejandro

IOError

Hi,

I was wondering if you could please help. I installed OptiType and tried to run it following the instructions, but it fails with this error:

python /Users/markg14/software/OptiType-master/OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/

mapping with 4 threads...

0:00:02.83 Mapping CRC_81_N_1_fished.fastq to NUC reference...

0:00:04.49 Mapping CRC_81_N_2_fished.fastq to NUC reference...

0:00:05.09 Generating binary hit matrix.
Traceback (most recent call last):
  File "/Users/markg14/software/OptiType-master/OptiTypePipeline.py", line 298, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/Users/markg14/software/OptiType-master/hlatyper.py", line 186, in pysam_to_hdf
    sam = pysam.AlignmentFile(samfile, sam_or_bam)
  File "pysam/libcalignmentfile.pyx", line 351, in pysam.libcalignmentfile.AlignmentFile.__cinit__ (pysam/libcalignmentfile.c:5200)
  File "pysam/libcalignmentfile.pyx", line 544, in pysam.libcalignmentfile.AlignmentFile._open (pysam/libcalignmentfile.c:7366)
IOError: file ./test/rna/2017_03_02_15_59_39/2017_03_02_15_59_39_1.bam not found
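The traceback shows that the BAM file the pipeline expected under the timestamped output directory was never created, so the underlying failure most likely happened earlier, during mapping, rather than in pysam (an inference from the log, which prints no mapping error). A minimal sketch that checks for the expected files before loading them; the <outdir>/<run>/<run>_<n>.bam layout is taken from the path in the error message, and missing_bams is a hypothetical helper.

```python
from pathlib import Path


def missing_bams(outdir, run_id, n_inputs):
    """Return the expected <outdir>/<run_id>/<run_id>_<i>.bam paths that do not exist."""
    run_dir = Path(outdir) / run_id
    expected = [run_dir / f"{run_id}_{i}.bam" for i in range(1, n_inputs + 1)]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Paired-end run from the command above: two inputs, so _1.bam and _2.bam
    for p in missing_bams("./test/rna", "2017_03_02_15_59_39", 2):
        print("missing:", p)
```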

Any help would be greatly appreciated.

Thank you

Mark

Docker container is not building

HDF5 resources are unavailable

OptiType error with gzipped FASTQs

Hi @andras86,

I realized that RazerS3 3.4 has problems with gzipped FASTQ files due to the SeqAn issue mentioned in #987. Are there other aligners I can use in place of RazerS3 with OptiType? It seems that RazerS3 is no longer maintained.
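One workaround, if the aligner cannot read compressed input, is to decompress the reads before calling OptiType. A minimal sketch using Python's standard gzip and shutil modules; gunzip_fastq and the file names are hypothetical, and this sidesteps rather than fixes the RazerS3/SeqAn problem.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(src, dst_dir):
    """Decompress src (e.g. reads_1.fastq.gz) into dst_dir and return the new path."""
    src = Path(src)
    name = src.name[:-3] if src.name.endswith(".gz") else src.name + ".unzipped"
    dst = Path(dst_dir) / name
    # Stream in chunks so large FASTQ files are never held in memory at once
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The returned plain-text paths can then be passed to OptiTypePipeline.py -i in place of the .fastq.gz files, at the cost of extra disk space for the uncompressed copies.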
