clicumu / doepipeline
A Python package for optimizing processing pipelines using statistical design of experiments (DoE).
License: MIT License
This is only indirectly related to doepipeline, but issues cannot be opened on a fork, so I am opening it here.
I tried to install pyDOE2 with Python 2.7.14 in a conda environment, but I get the following error:
$ python setup.py install
File "setup.py", line 11
SyntaxError: Non-ASCII character '\xc3' in file setup.py on line 11, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
This is due to the non-ASCII letter in the family name. After I changed it, setup.py complains about the encoding argument, which I think was introduced in Python 3.
$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 16, in <module>
long_description=read('README.md'),
File "setup.py", line 5, in read
with open(fname, encoding=encoding) as f:
TypeError: 'encoding' is an invalid keyword argument for this function
Removing the argument solves the issue and pyDOE installs, but the file is then read without an explicit encoding.
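One way to keep the encoding and still install under Python 2.7 is sketched below. It assumes setup.py only needs a `read` helper like the one in the traceback. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3, and a PEP 263 coding line at the top of the file addresses the first error.

```python
# -*- coding: utf-8 -*-
# The coding line above (PEP 263) lets Python 2 accept non-ASCII characters in this file.
import io


def read(fname, encoding='utf-8'):
    """Return the decoded contents of fname on Python 2.7 and Python 3."""
    # io.open takes an encoding argument on both major versions,
    # unlike the built-in open() of Python 2.
    with io.open(fname, encoding=encoding) as f:
        return f.read()
```

With this helper, `long_description=read('README.md')` can stay unchanged in the `setup()` call.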
I was asked about the possibility to set environment variables when running remotely. This does not currently work at all since paramiko executes each command in an isolated session.
I suggest solving this by prepending the environment-variable settings and the setup-script execution to each executed command, in the same way as the directory change is prepended now:
if execution_dir:
    cd = [path for path in execution_dir
          if not 'cd {path}'.format(path=path) in command]
    prefix = '. ./.bash_profile; cd {path};'.format(path=posixpath.join(*cd))
else:
    prefix = '. ./.bash_profile;'
full_command = prefix + command
Reading .bash_profile could then also be specified as a "setup script". This would generalize the current functionality, since the package currently assumes that Bash is the shell used on the server.
BaseSSHExecutor can simply override BasePipelineExecutor._set_env_variables to build a prefix that can be fetched in execute_command.
Sidenote:
I also think that the remote executor could override BasePipelineExecutor._cd in a similar manner, to instead keep track of the current location, which can be fetched as a script prefix. This would tidy up the now butt-ugly conditional checking in BaseSSHExecutor.execute_command.
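A minimal sketch of such a prefix builder follows. The function name `build_command_prefix` and its parameters are hypothetical and not part of doepipeline; it only illustrates composing setup scripts, exported variables and the directory change into one string that is prepended to every command.

```python
import posixpath
import shlex


def build_command_prefix(env_variables=None, setup_scripts=None, execution_dir=None):
    """Build a shell prefix to prepend to every remotely executed command."""
    parts = []
    # Source each setup script first (for example ./.bash_profile).
    for script in setup_scripts or []:
        parts.append('. {0}'.format(shlex.quote(script)))
    # Export variables; sorting keeps the prefix deterministic.
    for name, value in sorted((env_variables or {}).items()):
        parts.append('export {0}={1}'.format(name, shlex.quote(str(value))))
    # Change directory last so relative script paths resolve from the login directory.
    if execution_dir:
        parts.append('cd {0}'.format(shlex.quote(posixpath.join(*execution_dir))))
    return ''.join(part + '; ' for part in parts)


full_command = build_command_prefix(
    env_variables={'THREADS': 4},
    setup_scripts=['./.bash_profile'],
    execution_dir=['work', 'exp_1'],
) + 'ls'
print(full_command)
```

Because every command carries its full prefix, the isolated sessions opened by paramiko no longer matter, and the shell is only assumed to understand `.`, `export` and `cd`.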
A typo made me realize that the predicted optima can get stuck outside the tested values if the init_min/init_max values lie outside the range given by min and max. We should test for that to catch mistakes in the config file.
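Such a check could look roughly like the sketch below. It assumes each factor is a dict with the keys `min`, `max`, `low_init` and `high_init`; the function name is hypothetical, and the real config keys may differ.

```python
def check_init_within_bounds(factors):
    """Raise ValueError if a factor's initial range lies outside [min, max]."""
    for name, spec in factors.items():
        low, high = spec['low_init'], spec['high_init']
        if low >= high:
            raise ValueError('{0}: low_init must be smaller than high_init'.format(name))
        # 'min' and 'max' are treated as optional here; a missing bound means unbounded.
        if low < spec.get('min', float('-inf')) or high > spec.get('max', float('inf')):
            raise ValueError(
                '{0}: initial range [{1}, {2}] is outside the allowed min/max'.format(
                    name, low, high))


# A valid factor passes silently.
check_init_within_bounds({'A': {'min': 10, 'max': 150, 'low_init': 40, 'high_init': 100}})
```

Running the check when the config is loaded would surface the typo before any experiment is started.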
Great package. I've just begun tinkering at a very low level to make sure my intuition reflects what the code is doing. I've set up a two-factor experiment with a normal distribution as the response. For example:
# standard
import pandas as pd
import sys,os
import numpy as np
# plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.colors as colors
import matplotlib.cm as cmx
# stats
from scipy.stats import multivariate_normal
# doe
from doepipeline.designer import ExperimentDesigner
import doepipeline
# logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def generate_response(dfi,rv):
    return 1000*rv.pdf(dfi)
number_of_iterations = 5 # number of iterations to try
mu_x = 60
variance_x = 500
mu_y = 75
variance_y = 500
minx = 40
maxx = 100
miny = 45
maxy = 110
responses = {"fraction":{"criterion":"maximize"}}
factors = {
    "A": {
        "min": 10,
        "max": 150,
        "low_init": minx,
        "high_init": maxx,
    },
    "B": {
        "min": 10,
        "max": 150,
        "low_init": miny,
        "high_init": maxy,
    }
}
# create grid and multivariate normal
x = np.linspace(minx,maxx,100)
y = np.linspace(miny,maxy,100)
X, Y = np.meshgrid(x,y)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X; pos[:, :, 1] = Y
rv = multivariate_normal([mu_x, mu_y], [[variance_x, 0], [0, variance_y]])
Z = generate_response(pos,rv)
# view response surface for factors A & B
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X,Y,Z)
ax.set_xlabel("A")
ax.set_ylabel("B")
ax.set_zlabel("response")
fig.savefig("response.png",bbox_inches='tight',dpi=300)
plt.show()
I now want to optimize iteratively, as doepipeline does, and hopefully converge on the solution set by mu_x and mu_y.
exp = ExperimentDesigner(factors,
                         'fullfactorial2levels',
                         responses,
                         model_selection='greedy',
                         skip_screening=True,
                         shrinkage=0.9)
# create first design (skipping screening)
df = exp.new_design()
dfstart = df.copy()
factors =[]
optimal = []
designs = []
best = []
designs.append(df)
for niters in range(number_of_iterations):
    r_0 = generate_response(df,rv)
    fractioni = pd.DataFrame.from_dict({"fraction":r_0})
    bi = exp.get_best_experiment(df,fractioni)
    fi = exp.update_factors_from_optimum(bi)
    opti = exp.get_optimal_settings(fractioni)
    df = exp.new_design()
    best.append(bi)
    factors.append(fi)
    designs.append(df)
    print("Iteration",niters+1)
Now I create a simple function to loop through each experimental design at each step. I made one or two modifications to return the model and optima to inspect interactively in my notebook.
def plot_search(dflist,Zin,optima,expi):
    fig,ax = plt.subplots(figsize=(10,8))
    falpha = np.linspace(0.3,0.9,len(dflist))[::-1]
    falphar = np.linspace(0.3,0.9,len(dflist))
    plt.plot(dfstart['A'],dfstart['B'],'rs',ms=5,label='start')
    jet = cm = plt.get_cmap('viridis')
    cNorm = colors.Normalize(vmin=0, vmax=len(dflist))
    scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
    xyb = []
    ax.plot(mu_x,mu_y,'yo',label='peak response (optimal)')
    cbar = ax.imshow(Z,origin='lower',cmap='Blues',extent=[minx,maxx,miny,maxy],aspect='auto')
    for idx,dfin in enumerate(dflist):
        # set rectangle locations based on new design
        width = np.max(dfin["A"])-np.min(dfin["A"])
        height = np.max(dfin["B"])-np.min(dfin["B"])
        corner_y = np.min(dfin["B"])
        corner_x = np.min(dfin["A"])
        colorVal = scalarMap.to_rgba(idx)
        if idx != 0:
            rect = plt.Rectangle((corner_x,corner_y),width,height,linewidth=3,edgecolor=colorVal,facecolor='none')
        else:
            rect = plt.Rectangle((corner_x,corner_y),width,height,ls='--',linewidth=2,edgecolor='r',facecolor='none')
        ax.add_patch(rect)
        # obtain the best factors
        xyb.append([best[idx-1]['factor_settings']["A"],best[idx-1]['factor_settings']["B"]])
        # label the iteration in the center of each square
        ax.text(corner_x+width/2.,corner_y+height/2.,idx+1,fontsize=15,fontweight='bold',color=colorVal,va='center',ha='center')
    xyb = np.array(xyb)
    best_exp = expi._best_experiment
    plt.plot(best_exp['optimal_x']['A'],best_exp['optimal_x']['B'],'gx',ms=15,label='best')
    ax.set_xlabel("A")
    ax.set_ylabel("B")
    plt.colorbar(cbar)
    plt.xlim([minx,maxx])
    plt.ylim([miny,maxy])
    plt.subplots_adjust(left=0.1,right=0.99,top=0.9,bottom=0.1)
    plt.savefig("search.png",bbox_inches='tight',dpi=300)
    plt.grid()
# plot each experimental design proposed
plot_search(designs,Z,[mu_x,mu_y],exp)
r_0 = generate_response(df,rv)
fractioni = pd.DataFrame.from_dict({"fraction":r_0})
dfoptimal,model,prediction = exp.get_optimal_settings(fractioni)
plt.legend(loc='best')
plt.savefig("search.png",bbox_inches='tight',dpi=300)
plt.show()
The dashed rectangle marks the initial design, and the rectangles numbered 1 to 6 are the successive iterations. The yellow dot is the hard-set optimum that the search should find.
A few questions:
exp.get_best_experiment?
get_best_experiment and get_optimal_settings can only provide experimental conditions that have already been tested, and not actually interpolated optimal responses? I would expect the system, after maybe three or four iterations, to predict an optimum closer to the true optimum, no? I guess I'm a bit confused over the nomenclature.
model.summary() being quite desirable, it still didn't converge (see below).
In designer.py at _new_optimization_design(self), I can't see how it uses OLS to generate a new set of experimental settings, at least in my intuitive arrangement here:
fractioni = pd.DataFrame.from_dict({"fraction":r_0})
bi = exp.get_best_experiment(df,fractioni)
fi = exp.update_factors_from_optimum(bi)
opti = exp.get_optimal_settings(fractioni)
df = exp.new_design()
Lastly, I get the following:
Given mu_x = 60 and mu_y = 75, I'm just wondering if I've done something wrong, set it up incorrectly, or taken the pipeline into an area it isn't best suited for. Apologies for any silly errors; I'm just trying to understand what's going on, as the examples provided have a certain overhead to getting started. A very simple, purely Pythonic example would be greatly appreciated. Thanks for any help you might provide.
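As a point of comparison for the question about interpolated optima, the sketch below fits a second-order polynomial by ordinary least squares with NumPy and solves for its stationary point. This is not doepipeline's implementation. The response is a hypothetical noise-free quadratic with its peak at A=60, B=75, chosen so the result is easy to check, and the design uses three levels per factor because squared terms cannot be estimated from two levels alone.

```python
import numpy as np

# Three levels per factor, so that squared terms can be estimated.
A, B = np.meshgrid([40.0, 70.0, 100.0], [45.0, 77.5, 110.0])
a, b = A.ravel(), B.ravel()

# Hypothetical noise-free response with its peak at A=60, B=75.
y = 10.0 - 0.01 * (a - 60.0) ** 2 - 0.02 * (b - 75.0) ** 2

# Model: y = c0 + c1*A + c2*B + c3*A^2 + c4*B^2 + c5*A*B
X = np.column_stack([np.ones_like(a), a, b, a ** 2, b ** 2, a * b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Setting the gradient to zero gives a 2x2 linear system for the stationary point.
H = np.array([[2 * coef[3], coef[5]],
              [coef[5], 2 * coef[4]]])
optimum = np.linalg.solve(H, -coef[1:3])
print(optimum)
```

A predicted optimum obtained this way can lie between tested settings, which is the distinction from simply picking the best experiment that was run.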
We discussed earlier finding a better way of submitting/distributing jobs. I have used snakemake quite a lot, and I think using their API could be an option for a more stable solution. They support several schedulers plus Kubernetes.
Below is a small example I found.
#!/usr/bin/env python3
"""
rule all:
    input:
        "reads.counts"
rule unpack_fastq:
    '''Unpack a FASTQ file'''
    output: "{file}.fastq"
    input: "{file}.fastq.gz"
    resources: time=60, mem=100
    params: "{file}.params"
    threads: 8
    log: 'unpack.log'
    shell:
        '''zcat {input} > {output}
        echo finished 1>&2 {log}
        '''
rule count:
    '''Count reads in a FASTQ file'''
    output: counts="{file}.counts"
    input: fastq="{file}.fastq"
    run:
        n = 0
        with open(input.fastq) as f:
            for _ in f:
                n += 1
        with open(output.counts, 'w') as f:
            print(n / 4, file=f)
"""
In pure python this is equivalent to the following code.
workflow.include("pipeline.conf")
shell.prefix("set -euo pipefail;")
@workflow.rule(name='all', lineno=6, snakefile='.../Snakefile')
@workflow.input("reads.counts")
@workflow.norun()
@workflow.run
def __all(input, output, params, wildcards, threads, resources, log, version):
    pass
@workflow.rule(name='unpack_fastq', lineno=17, snakefile='.../Snakefile')
@workflow.docstring("""Unpack a FASTQ file""")
@workflow.output("{file}.fastq")
@workflow.input("{file}.fastq.gz")
@workflow.resources(time=60, mem=100)
@workflow.params("{file}.params")
@workflow.threads(8)
@workflow.log('unpack.log')
@workflow.shellcmd(
"""zcat {input} > {output}
echo finished 1>&2 {log}
"""
)
@workflow.run
def __unpack_fastq(input, output, params, wildcards, threads, resources, log, version):
    shell("""zcat {input} > {output}
echo finished 1>&2 > {log}
"""
    )
@workflow.rule(name='count', lineno=52, snakefile='.../Snakefile')
@workflow.docstring("""Count reads in a FASTQ file""")
@workflow.output(counts = "{file}.counts")
@workflow.input(fastq = "{file}.fastq")
@workflow.run
def __count(input, output, params, wildcards, threads, resources, log, version):
    n = 0
    with open(input.fastq) as f:
        for _ in f:
            n += 1
    with open(output.counts, 'w') as f:
        print(n / 4, file=f)
### End of output from snakemake --print-compilation
workflow.check()
print("Dry run first ...")
workflow.execute(dryrun=True, updated_files=[])
print("And now for real")
workflow.execute(dryrun=False, updated_files=[], resources=dict())
Another option that I have used earlier is ipython-cluster-helper, but there are probably other options available.
Modified the test pipeline in an attempt to run the pipeline locally on the KAW server.
Modified batch_execution.py:
from doepipeline.generator import PipelineGenerator
#from doepipeline.executor import SSHExecutor
from doepipeline.executor import LocalExecutor
import os
import yaml
if __name__ == '__main__':
    generator = PipelineGenerator.from_yaml('/media/data/daniel/doe/doepipeline_testscript/pipeline.yaml')
    designer = generator.new_designer_from_config()
    design = designer.new_design()
    pipeline = generator.new_pipeline_collection(design)
    executor = LocalExecutor(workdir='test_pipeline', execution_type='serial', base_command='nohup {script}')
    executor.execute_command('cd test_pipeline; ls | grep "[0-9]" | xargs rm -r')
    results = executor.run_pipeline_collection(pipeline)
    optimum = designer.update_factors_from_response(results)
    pass
$ (doe_pipeline)daniel@kaw:/media/data/daniel/doe/doepipeline_testscript$ python batch_execution.py
Traceback (most recent call last):
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/generator.py", line 22, in __init__
self._validate_config(config)
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/generator.py", line 221, in _validate_config
'job specified with SLURM but SLURM project-name is missing'
AssertionError: job specified with SLURM but SLURM project-name is missing
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "batch_execution.py", line 10, in <module>
generator = PipelineGenerator.from_yaml('/media/data/daniel/doe/doepipeline_testscript/pipeline.yaml')
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/generator.py", line 55, in from_yaml
return cls(config, *args, **kwargs)
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/generator.py", line 24, in __init__
raise ValueError('Invalid config: ' + str(e))
ValueError: Invalid config: job specified with SLURM but SLURM project-name is missing
Does this boil down to line 220 in generator.py?
assert (any('SLURM' in job.keys() for job in jobs) and 'SLURM' in config_dict),\
'job specified with SLURM but SLURM project-name is missing'
The YAML file I use does not contain 'SLURM'.
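The quoted assertion has the form `A and B`, so it is false whenever no job mentions SLURM, even though that case should be valid. A minimal sketch of the intended check, written as an implication, follows. The function name is hypothetical and the config layout is assumed from the quoted lines only.

```python
def validate_slurm_config(config_dict, jobs):
    """Require a top-level SLURM section only when some job asks for SLURM."""
    uses_slurm = any('SLURM' in job for job in jobs)
    if uses_slurm and 'SLURM' not in config_dict:
        raise ValueError('job specified with SLURM but SLURM project-name is missing')


# A config without any SLURM entries passes.
validate_slurm_config({'design': {}}, [{'script': 'run.sh'}])
```

An equivalent one-line form of the condition would be `not uses_slurm or 'SLURM' in config_dict`.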
I run into the following error. I'm using the same version of pyDOE as before
Traceback (most recent call last):
File "/media/data/daniel/doe_2018/doepipeline/bin/doepipeline", line 76, in <module>
designer = generator.new_designer_from_config()
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/doepipeline-0.1-py3.6.egg/doepipeline/generator.py", line 77, in new_designer_from_config
return designer_class(factors, design_type, responses, *args, **kwargs)
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/doepipeline-0.1-py3.6.egg/doepipeline/designer.py", line 179, in __init__
self._design_matrix = matrix_designer(n)
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/doepipeline-0.1-py3.6.egg/doepipeline/designer.py", line 167, in <lambda>
'ccf': lambda n: pyDOE.ccdesign(n, (0, 1), face='ccf'),
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/pyDOE/doe_composite.py", line 147, in ccdesign
H1 = ff2n(n)
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/pyDOE/doe_factorial.py", line 115, in ff2n
return 2*fullfact([2]*n) - 1
File "/media/data/db/data/anaconda2/envs/doepipeline/lib/python3.6/site-packages/pyDOE/doe_factorial.py", line 78, in fullfact
rng = lvl*range_repeat
TypeError: 'numpy.float64' object cannot be interpreted as an integer
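The last frame fails on the list repetition `lvl*range_repeat`, which needs an integer count. The sketch below reproduces that class of error in plain Python, under the assumption that the count was produced by true division, which always yields a float on Python 3. It is an illustration, not a copy of the pyDOE code, and both function names are hypothetical.

```python
def repeat_levels_true_division(levels, total_runs):
    # "/" always yields a float on Python 3, and a list cannot be
    # repeated a float number of times.
    repeats = total_runs / len(levels)
    return levels * repeats


def repeat_levels_floor_division(levels, total_runs):
    # "//" keeps the count an int, so the repetition works.
    repeats = total_runs // len(levels)
    return levels * repeats


print(repeat_levels_floor_division([0, 1], 8))
try:
    repeat_levels_true_division([0, 1], 8)
except TypeError as error:
    print('TypeError:', error)
```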
It would be convenient to be able to see which version is installed by running doepipeline --version.
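With argparse this could be sketched as below. The `__version__` value and the `make_parser` function are placeholders, since how the doepipeline script builds its parser is not shown here.

```python
import argparse

# Placeholder; in a real package this would be imported from the package itself.
__version__ = '0.1'


def make_parser():
    parser = argparse.ArgumentParser(prog='doepipeline')
    # The built-in "version" action prints the string and exits with status 0.
    parser.add_argument('--version', action='version',
                        version='%(prog)s ' + __version__)
    return parser
```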
UPPMAX seems to cap the number of qos=short jobs that you can have in the queue. I think this number is set to 10. I receive the following error when trying to submit a CCC design with three factors (14 exp):
$ python gatk_snp_execute.py
Traceback (most recent call last):
File "gatk_snp_execute.py", line 18, in <module>
results = executor.run_pipeline_collection(pipeline)
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/base.py", line 199, in run_pipeline_collection
self.run_jobs(job_steps, experiment_index, env_variables, **kwargs)
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/mixins.py", line 334, in run_jobs
_, stdout, _ = self.execute_command(command, job_name=exp_name)
File "/media/data/db/data/anaconda/envs/doe_pipeline/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/remote.py", line 159, in execute_command
raise CommandError('\n'.join(err))
doepipeline.executor.base.CommandError: sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
This should be possible to fix with a try/except-clause when submitting the jobs. If the job submission is rejected the job should remain in an internal queue for submission later.
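A rough sketch of such an internal queue is shown below. `SubmissionRejected` and the `submit` callable are hypothetical stand-ins for the executor's error type and its sbatch call. A rejected job is put back at the front of the queue and retried after a delay.

```python
import time
from collections import deque


class SubmissionRejected(Exception):
    """Hypothetical error raised when the scheduler refuses a job."""


def submit_all(commands, submit, retry_delay=60.0, sleep=time.sleep):
    """Submit every command, re-queueing those the scheduler rejects."""
    pending = deque(commands)
    job_ids = []
    while pending:
        command = pending.popleft()
        try:
            job_ids.append(submit(command))
        except SubmissionRejected:
            # Put the job back and wait for running jobs to leave the queue.
            pending.appendleft(command)
            sleep(retry_delay)
    return job_ids
```

As written, the loop retries forever if the scheduler never accepts the job, so a real implementation would probably also want an upper limit on retries.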
There's been trouble running a pipeline remotely on UPPMAX. After a few iterations the following error usually pops up. Should be fixed by a reconnect I suspect.
Socket exception: An existing connection was forcibly closed by the remote host (10054)
Traceback (most recent call last):
File "manta_sv.py", line 49, in <module>
results = executor.run_pipeline_collection(pipeline)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\base.py", line 200, in run_pipeline_collection
self.run_jobs(job_steps, experiment_index, env_variables, **kwargs)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\mixins.py", line 353, in run_jobs
self.wait_until_current_jobs_are_finished()
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\base.py", line 239, in wait_until_current_jobs_are_finished
status, msg = self.poll_jobs()
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\remote.py", line 247, in poll_jobs
return mixins.SlurmExecutorMixin.poll_jobs(self)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\mixins.py", line 377, in poll_jobs
__, stdout, __ = self.execute_command(cmd)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\remote.py", line 153, in execute_command
stdin, stdout, stderr = self._client.exec_command(full_command)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\client.py", line 341, in exec_command
chan = self._transport.open_session(timeout=timeout)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\transport.py", line 618, in open_session
timeout=timeout)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\transport.py", line 739, in open_channel
raise e
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\transport.py", line 1608, in run
ptype, m = self.packetizer.read_message()
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\packet.py", line 386, in read_message
header = self.read_all(self.__block_size_in, check_rekey=True)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\paramiko\packet.py", line 249, in read_all
x = self.__socket.recv(n)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
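A reconnect-and-retry wrapper could be sketched as follows. The `run` and `reconnect` callables are hypothetical placeholders for the executor's command execution and for re-establishing the SSH client. paramiko can also raise its own SSHException, which is left out here to keep the sketch free of dependencies.

```python
import time


def execute_with_reconnect(run, reconnect, command, max_attempts=3, delay=5.0,
                           sleep=time.sleep):
    """Run a command, reconnecting and retrying if the connection drops."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run(command)
        except (ConnectionError, EOFError):
            # ConnectionResetError (WinError 10054) is a subclass of ConnectionError.
            if attempt == max_attempts:
                raise
            sleep(delay)
            reconnect()
```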
Changes made in this commit introduce a discrepancy between the arguments actually used for an experiment and the arguments doepipeline thinks were used. All factor values are rounded and converted to ints before they are substituted into the template script. This is because HaplotypeCaller (and probably other software) takes int arguments and cannot handle floats.
I think in the first round we could try the following setup for the scaffolding optimization.
Organism
Assembly
Scaffolder
I have a problem getting my pipeline to work correctly using SLURM. The same pipeline works nicely with the local executor in serial mode. I am using Uppmax (/proj/nobackup/b2015353/scaffolding/) with those files.
The output indicates that the job failed but it seems that it finished correctly.
Andreas-MBP-6:scaffolding_optimization andreassjodin$ python links_execute_1.py
/Users/andreassjodin/anaconda/lib/python3.5/site-packages/pyDOE-0.3.8-py3.5.egg/pyDOE/doe_factorial.py:78: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
design:
   KMER  DVALUE
0  15.0  1000.0
1  25.0  1000.0
2  15.0  4000.0
3  25.0  4000.0
4  15.0  2500.0
5  25.0  2500.0
6  20.0  1000.0
7  20.0  4000.0
8  20.0  2500.0
LINKSScaffolder_exp_7 has failed. (exit code 127:0)
Traceback (most recent call last):
File "links_execute_1.py", line 19, in <module>
results = executor.run_pipeline_collection(pipeline)
File "/Users/andreassjodin/anaconda/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/base.py", line 199, in run_pipeline_collection
self.run_jobs(job_steps, experiment_index, env_variables, **kwargs)
File "/Users/andreassjodin/anaconda/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/mixins.py", line 349, in run_jobs
self.wait_until_current_jobs_are_finished()
File "/Users/andreassjodin/anaconda/lib/python3.5/site-packages/doepipeline-0.1-py3.5.egg/doepipeline/executor/base.py", line 246, in wait_until_current_jobs_are_finished
raise PipelineRunFailed(msg)
doepipeline.executor.base.PipelineRunFailed: LINKSScaffolder_exp_7 has failed. (exit code 127:0)
I'm not sure what I did wrong, so advice on how to fix it would be helpful.
I'm currently trying to optimize parameters for Manta, a structural variant caller. There's a recurring issue that experiments fail, for example experiment 19 below:
RunManta_exp_19 has failed. (exit code 1:0)
Traceback (most recent call last):
File "manta_sv.py", line 31, in <module>
results = executor.run_pipeline_collection(pipeline)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\base.py", line 199, in run_pipeline_collection
self.run_jobs(job_steps, experiment_index, env_variables, **kwargs)
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\mixins.py", line 349, in run_jobs
self.wait_until_current_jobs_are_finished()
File "C:\Users\dasw0002\AppData\Local\Continuum\Anaconda\envs\doepipeline\lib\site-packages\doepipeline-0.1-py3.5.egg\doepipeline\executor\base.py", line 246, in wait_until_current_jobs_are_finished
raise PipelineRunFailed(msg)
doepipeline.executor.base.PipelineRunFailed: RunManta_exp_19 has failed. (exit code 1:0)
Checking the Manta log file I can see this:
[2016-09-27T13:24:09.175970] [m196.uppmax.uu.se] [61581_1] [TaskManager] Completed command task: 'generateCandidateSV_0066' launched from master workflow
[2016-09-27T13:24:52.698109] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] Unhandled Exception in TaskManager-Thread
[2016-09-27T13:24:52.909386] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] Traceback (most recent call last):
[2016-09-27T13:24:52.910425] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1660, in run
[2016-09-27T13:24:52.911376] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self._startTasks()
[2016-09-27T13:24:52.912096] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 526, in wrapped
[2016-09-27T13:24:52.912850] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] return f(self, *args, **kw)
[2016-09-27T13:24:52.913684] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1818, in _startTasks
[2016-09-27T13:24:52.914829] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self._launchTask(task)
[2016-09-27T13:24:52.916007] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1762, in _launchTask
[2016-09-27T13:24:52.917214] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] trun = self._getCommandTaskRunner(task)
[2016-09-27T13:24:52.918028] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1745, in _getCommandTaskRunner
[2016-09-27T13:24:52.918808] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] task.setRunstate)
[2016-09-27T13:24:52.919517] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1137, in __init__
[2016-09-27T13:24:52.920267] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] BaseTaskRunner.__init__(self, runStatus, taskStr, sharedFlowLog, setRunstate)
[2016-09-27T13:24:52.921161] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1041, in __init__
[2016-09-27T13:24:52.921949] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self.setInitialRunstate()
[2016-09-27T13:24:52.922960] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1079, in setInitialRunstate
[2016-09-27T13:24:52.923728] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self.setRunstate("running")
[2016-09-27T13:24:52.924557] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 1076, in setRunstate
[2016-09-27T13:24:52.925421] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self._setRunstate(*args, **kw)
[2016-09-27T13:24:52.926591] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 526, in wrapped
[2016-09-27T13:24:52.927825] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] return f(self, *args, **kw)
[2016-09-27T13:24:52.928734] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 2110, in setRunstate
[2016-09-27T13:24:52.929669] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] self.tdag.writeTaskStatus()
[2016-09-27T13:24:52.930562] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 526, in wrapped
[2016-09-27T13:24:52.931612] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] return f(self, *args, **kw)
[2016-09-27T13:24:52.932358] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 2475, in writeTaskStatus
[2016-09-27T13:24:52.933449] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] forceRename(tmpFile, self.taskStateFile)
[2016-09-27T13:24:52.934858] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] File "/sw/apps/bioinfo/manta/1.0.0/milou/lib/python/pyflow/pyflow.py", line 170, in forceRename
[2016-09-27T13:24:52.935823] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] os.rename(src,dst)
[2016-09-27T13:24:52.936617] [m196.uppmax.uu.se] [61581_1] [TaskManager] [ERROR] OSError: [Errno 2] No such file or directory
[2016-09-27T13:25:07.711376] [m196.uppmax.uu.se] [61581_1] [WorkflowRunner] [ERROR] Workflow terminated due to unhandled exception in TaskManager
I believe doepipeline should have some kind of error-checking feature that detects that an experiment has failed and restarts it. I think it is unlikely that this kind of spontaneous failure is restricted to Manta, and it could be a major issue for the usability of doepipeline in a range of different optimization problems.
For the other kind of problem, where the site you are performing your experiments at (in this case Uppmax) becomes unavailable, whether due to connection trouble or a planned downtime of the resource, I think there needs to be a feature that lets the user resume the optimization after the last completed iteration.
/Daniel
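Both ideas, retrying a failed run and resuming after the last completed iteration, can be sketched together as below. Everything here is hypothetical: `run_iteration` stands for whatever executes one iteration, `RuntimeError` stands in for the real failure signal, and the JSON state file is just one possible way to persist progress.

```python
import json
import os


def load_completed(path):
    """Return results of iterations finished in an earlier session."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return json.load(handle)


def run_with_checkpoint(run_iteration, n_iterations, path, max_retries=2):
    """Run iterations, retrying failures and saving progress after each one."""
    completed = load_completed(path)
    for index in range(len(completed), n_iterations):
        for attempt in range(max_retries + 1):
            try:
                result = run_iteration(index)
                break
            except RuntimeError:
                # Hypothetical failure signal; re-raise once retries are used up.
                if attempt == max_retries:
                    raise
        completed.append(result)
        with open(path, 'w') as handle:
            json.dump(completed, handle)
    return completed
```

Calling `run_with_checkpoint` again with the same state file skips the iterations already stored there.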
I have tested the code and I think we need a replacement for the quick fix f5298d6. Too many programs are picky about being fed the correct type. I think the best solution, as suggested by @RickardSjogren, would be to add an int/float definition field to the YAML file.
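Assuming the YAML file gains a per-factor type field, the casting step could look roughly like this. The function name and the `'int'`/`'float'` labels are made up for illustration, and the KMER and DVALUE factor names are only used as sample data.

```python
def cast_factor_values(settings, factor_types):
    """Cast each factor value to its declared numeric type before substitution."""
    casters = {'int': lambda value: int(round(value)), 'float': float}
    # Factors without a declared type default to float here.
    return {name: casters[factor_types.get(name, 'float')](value)
            for name, value in settings.items()}


used = cast_factor_values({'KMER': 19.6, 'DVALUE': 2500.0}, {'KMER': 'int'})
print(used)
```

Storing the returned values back in the design, rather than only in the script, would keep what doepipeline thinks was run in line with what was actually run.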