
deepsignal's People

Contributors

huangnengcsu, pengni


deepsignal's Issues

Error when using newly trained model

Hi Peng,

I got the following error when using my latest custom model for calling methylation:

2020-07-15 10:42:05.610543: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key BDGRU_rnn/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key BDGRU_rnn/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/deepsignal/call_modifications.py", line 168, in _call_mods_q
    saver.restore(sess, model_path)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key BDGRU_rnn/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/bin/deepsignal", line 11, in <module>
    sys.exit(main())
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/deepsignal/deepsignal.py", line 353, in main
    args.func(args)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/deepsignal/deepsignal.py", line 89, in main_call_mods
    batch_size, learning_rate, class_num, nproc, is_gpu, f5_args)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/deepsignal/call_modifications.py", line 409, in call_mods
    p.start()
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/tools/python/3.6.3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/deepsignal/call_modifications.py", line 167, in _call_mods_q
    saver = tf.train.Saver()
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6/deepsignal-0.1.6_venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key BDGRU_rnn/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

It produces this error, but the jobs keep running in the background (with no apparent activity). This is the first time I have encountered this error.

Thanks for your support,

Paul
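
In case it helps others hitting the same NotFoundError: this kind of message usually means the variable names saved in the checkpoint do not match the names the current graph builds (here, the "BDGRU_rnn/..." scope). A minimal diagnostic sketch using the TF 1.x checkpoint reader, with a placeholder path, is:

# List the variables stored in a TF 1.x checkpoint to compare them
# against the names the graph expects (e.g. "BDGRU_rnn/bw/...").
import tensorflow as tf

ckpt_prefix = "my_model_dir/bn_17.sn_360.epoch_7.ckpt"  # placeholder checkpoint prefix
reader = tf.train.NewCheckpointReader(ckpt_prefix)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)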

HX1 Data availability

Hi @PengNi ,
I was wondering if your HX1 nanopore sequencing data is available somewhere?
I could not find a link to the data in the paper.

Thanks,
Vahid.

Results of extract features step

Hi, I am trying to reproduce your example. According to your guide, the output of the extract-features step is a tab-delimited text file. I am confused by two of its columns, "strand" and "read_strand": what is the difference between them? In my run, "read_strand" is always t in the output file; is that something I missed?

Get position of targeted base in reads event table

Hello PengNi,

I would like to investigate the signal stored in the event table of resquiggled reads.
To that end, I thought starting from the extracted_features file would be my best shot.
However, I am missing one piece of information: the position of the targeted base in the event table of each read. For example, this could be the start and end positions of the 17-mer in the event table.

Do you know how/where I could recover this information?

Could it be added to the extracted_features files?

Thanks a lot,

Paul
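
For readers wanting to dig into the same thing: the per-base event boundaries written by tombo resquiggle can be read directly from the corrected group inside each fast5. A minimal sketch with h5py, assuming single-read fast5s and the default group names used elsewhere in this thread (the field layout may vary between tombo versions):

# Read the tombo resquiggle event table (per-base segments) from a fast5 with h5py.
import h5py

fast5_path = "read.fast5"  # placeholder
with h5py.File(fast5_path, "r") as f5:
    events = f5["/Analyses/RawGenomeCorrected_000/BaseCalled_template/Events"]
    # offset of the first event into the raw signal
    print("read_start_rel_to_raw:", events.attrs["read_start_rel_to_raw"])
    for row in events[:5]:
        # tombo stores (norm_mean, norm_stdev, start, length, base) per event/base
        print(row)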

Add a warning or more checks when deleting the content of the model directory

Hi there,

I'm opening this issue because I think these lines in the train_model.py file might be dangerous.

The first time I used this script, I wanted to save the custom model at the root of my deepsignal working directory, so I set the --model_dir value to . and it deleted everything in my current folder without any warning (fortunately I recovered everything).

Best,
Paul

model_path

So when I try to run:
deepsignal call_mods --input_path fast5s.al.CpG.signal_features.17bases.rawsignals_360.tsv --model_path model.CpG.R9.4_1D.human_hx1.bn17.sn360/bn_17.sn_360.epoch_7.ckpt --result_file fast5s.al.CpG.call_mods.tsv --nproc 10 --is_gpu no
the model path is wrong. From the model archive I downloaded from:
http://bioinformatics.csu.edu.cn/resources/softs/nipeng/DeepSignal/index.html
I get a directory named:
model.CpG.R9.4_1D.human_hx1.bn17.sn360
but when I go into that directory there is no file named:
bn_17.sn_360.epoch_7.ckpt
Which of the files should I be using for the second half of the model_path?
[screenshot of the model directory listing]
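
For anyone else confused by this: a TF 1.x checkpoint is typically saved as several files (.index, .meta, .data-00000-of-00001) plus a small checkpoint file, and the value passed to --model_path is the shared prefix (bn_17.sn_360.epoch_7.ckpt) rather than a file that exists verbatim on disk. A minimal sketch to resolve the prefix recorded in the directory, assuming the unpacked model directory above:

# Resolve the checkpoint prefix recorded in the "checkpoint" file of the model directory.
import tensorflow as tf

model_dir = "model.CpG.R9.4_1D.human_hx1.bn17.sn360"  # unpacked model directory
ckpt_prefix = tf.train.latest_checkpoint(model_dir)
print(ckpt_prefix)  # a prefix such as ".../bn_17.sn_360.epoch_7.ckpt", not a literal file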

Meaning of cent_signals

Hi,

May I ask what cent_signals means exactly?

Also, with reference to your extract-features module, how are the signal mean, std and signal len calculated for each base in each k-mer after resquiggling? I noticed that the TSV output of the extract-features module contains a list of comma-separated values in the signal mean, std and signal len columns.
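
For context while waiting for an answer, one plausible way such per-base statistics can be computed (an illustrative sketch, not necessarily the package's actual code) is to group the normalized raw signal by the resquiggle event boundaries and take the mean, standard deviation and number of signal points of each base in the k-mer:

# Illustrative only: per-base signal mean/std/length from event-segmented normalized signal.
import numpy as np

def per_base_stats(norm_signal, event_starts, event_lengths):
    """norm_signal: 1-D normalized raw signal; starts/lengths: one entry per base."""
    means, stds, lens = [], [], []
    for start, length in zip(event_starts, event_lengths):
        seg = norm_signal[start:start + length]
        means.append(float(np.mean(seg)))
        stds.append(float(np.std(seg)))
        lens.append(int(length))
    return means, stds, lens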

Expected results for example data set?

Hi,

Running deepsignal locally, I used the example data/model located here and got these results:
[screenshot of the reported accuracy]

I was wondering whether the accuracy is supposed to be this low for the example dataset?

Extracting methy_label from fast5 files

How does the extract_features function get the true label for each candidate target base?
I have been going over the code but could not figure out how it determines whether a candidate target is methylated or not.

The get_refloc_of_methysite_in_motif function seems to return all the candidate targets, but I could not figure out how you get the label for each targeted base (input).
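
As an illustration of the general idea (not a claim about the package's exact implementation): the candidate positions come from scanning the reference for the motif, and, as far as I understand, the label itself is simply the --methy_label value supplied at extraction time, i.e. the user asserts whether the sample is methylated or not. A toy motif scanner looks like this:

# Toy sketch: 0-based positions of the modified base for a fixed motif in a sequence.
def motif_target_positions(seq, motif="CG", mod_loc=0):
    positions = []
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            positions.append(i + mod_loc)
    return positions

print(motif_target_positions("ACGTCGA"))  # [1, 4]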

download problems

I can't download model.CpG.R9.4_1D.human_hx1.bn17.sn360.tar.gz;
the error is: "Forbidden: You don't have permission to access /~luofeng/deepsignal/model.CpG.R9.4_1D.human_hx1.bn17.sn360.tar.gz on this server."
Can you help me?

Error while calling mod and assigning threads

Hi @PengNi
I am calling modifications from the extracted feature file. After running for a while, the software gives me this error:
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
I would appreciate it if you could help me with this matter.
BTW: another problem I face is that no matter how many threads I assign to deepsignal via nproc when extracting features or calling mods on CPU, it takes all the available threads.
Thanks,
Vahid.
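
On the thread-count point: if the extra threads come from the underlying numerical libraries (OpenMP/BLAS) rather than from deepsignal's own worker processes, capping the per-process thread pools before launching sometimes helps, e.g.

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

This is only a guess at the cause; the nproc behaviour itself would still need a fix in deepsignal.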

running "deepsignal extract" without tombo.index

Hi!

I'm running a large analysis on DNA data (about 15,000,000 reads).
I was running tombo resquiggle when my server suddenly crashed, and only half of the reads had been correctly resquiggled.
I checked which reads had been resquiggled with a custom Python script using h5py and relaunched tombo resquiggle on the others.
However, for the reads that had been correctly resquiggled before the server crashed, no tombo.index was generated.
Since it took almost 3 days to resquiggle those reads, it would be painful to re-run the resquiggle on the entire dataset, so I was wondering whether the tombo.index is actually required for running deepsignal extract (and downstream steps), or whether it is not essential.

Thanks a lot in advance
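
For anyone doing the same kind of recovery, checking whether a read already carries resquiggle results boils down to looking for the corrected group inside the fast5, which is roughly what the custom script above does. A minimal sketch, assuming single-read fast5s and the default group names:

# Check whether a single-read fast5 already contains tombo resquiggle results.
import h5py

def is_resquiggled(fast5_path,
                   corrected_group="RawGenomeCorrected_000",
                   basecall_subgroup="BaseCalled_template"):
    with h5py.File(fast5_path, "r") as f5:
        return "/Analyses/{}/{}/Events".format(corrected_group, basecall_subgroup) in f5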

Warning and missing checkpoint with model training on gpu

Hi Peng,

So I have a couple of new questions regarding some issues I had with model training on gpu.

  • First, I had a warning that I'm not sure I understood, with a 20M-sample training file and 10K samples for validation. This is the GPU I use:
name: Tesla K40m major: 3 minor: 5 memoryClockRate(GHz): 0.745
totalMemory: 11.17GiB freeMemory: 11.07GiB

This is the warning:

2020-03-29 17:42:04.757305: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.87GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

So I guess it's a memory issue? Do you know whether it affects computing speed or anything else?

The training process was stopped after 2 days (I work in a Slurm environment), during epoch 3. The issue is that I only have checkpoints for epochs 0 and 1. Do you think this is related to the memory issue?

  • This brings me to my second issue. This time I trained with a 10M-sample file. It was the first time the training finished on its own; these are the last lines of the output:
================ epoch 5 best accuracy: 0.835, best accuracy: 0.835
training finished, costs 161783.9 seconds..

Unfortunately, I only have checkpoints up to epoch 4:

model_checkpoint_path: "bn_17.sn_360.epoch_4.ckpt"
all_model_checkpoint_paths: "bn_17.sn_360.epoch_0.ckpt"
all_model_checkpoint_paths: "bn_17.sn_360.epoch_1.ckpt"
all_model_checkpoint_paths: "bn_17.sn_360.epoch_2.ckpt"
all_model_checkpoint_paths: "bn_17.sn_360.epoch_3.ckpt"
all_model_checkpoint_paths: "bn_17.sn_360.epoch_4.ckpt"

I was hoping you could tell me a bit more about when the training should end. I am using the default parameters. I also had one warning during this training:

/usr/local/bioinfo/src/DeepSignal/deepsignal-0.1.6-gpu/deepsignal-0.1.6-gpu_venv/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Thanks a lot for your help!

Paul

Training a new model

Hi!

Problem:
I was trying to train a new model and I have the following issues (FYI, before answering check out my potential cause at the bottom):

  1. After training, the output directories (specified with the --model_dir and --log_dir flags) are created, but they are empty.

  2. The best accuracy for all epochs is 0.0, i.e. the output looks like this:

================ epoch 0 best accuracy: 0.000, best accuracy: 0.000
================ epoch 1 best accuracy: 0.000, best accuracy: 0.000
================ epoch 2 best accuracy: 0.000, best accuracy: 0.000
================ epoch 3 best accuracy: 0.000, best accuracy: 0.000
================ epoch 4 best accuracy: 0.000, best accuracy: 0.000

Sometimes, but not always, even though I am using the same input files, I also get the following warning right before those 0.0 accuracy stats:

PATH/anaconda3/envs/deepsignal/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)

Commands Used to run training:

deepsignal train --train_file PATH/trainingSet.TOP1000.tsv --valid_file PATH/validationSet.TOP1000.tsv --model_dir PATH/models/testModel --log_dir PATH/models/testModel_log

Files used in the above example (download available up to two weeks from now):
https://transfer.sh/PboqP/trainingSet.TOP1000.tsv
https://transfer.sh/hINl0/validationSet.TOP1000.tsv

Possible reason of the problem:
In the above example I was actually using files with only 1000 lines for both the training and validation sets, in order to see how the program behaves. The same issue occurs if the top 10k lines of each file are used. Of course this was only for testing, not for creating the best model; nevertheless, I was expecting some results. Is it possible that the problems described above are related to the small number of samples? Are you able to reproduce it?

I would appreciate your help.
Thanks,
Wojciech

Question about train dataset

Hi,
I have a question about your training dataset. In a previous issue you suggested that other public datasets can be used for training (#7 (comment)).
My question is: do these datasets need base-calling first, or can I just start from the tombo re-squiggle step? For the experimental data generated by myself, do I need to do the base-calling before using tombo? And can I use Guppy rather than Albacore? Can DeepSignal use the base-calling results of Guppy?
Many thanks.

How long does it take to finish 30X human data?

Hi,

I want to know how long it takes to finish methylation calling on 30X 1D human WGS data with the 4 Titan V GPUs you used.

More specifically, I want to know how you obtained the running time in Table S3 of your paper.

In Table S3, for the 30X NA12878 test data, you wrote that it takes 7366.1 min, which corresponds to about 5 days.

Does it mean that it takes 5 days even with 4 Titan V GPUs?

Jinyoung

process interrupt

When two deepsignal processes run on the same data at the same time, once one finishes normally, the other is interrupted/stopped.

Problem while running deepsignal on GPU

Hi PengNi,

I am trying to run deepsignal on our HPC GPU, but I get this error:

# ===============================================
## parameters: 
input_path:
	/home/ls760/nanopore/us/scripts/test_area/ls760/methylation_pipeline/VWD1047/tmp
model_path:
	/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/methylation_models/deepsignal_human/model.CpG.R9.4_1D.human_hx1.bn17.sn360/bn_17.sn_360.epoch_7.ckpt
is_cnn:
	yes
is_rnn:
	yes
is_base:
	yes
kmer_len:
	17
cent_signals_len:
	360
batch_size:
	512
learning_rate:
	0.001
class_num:
	2
result_file:
	Nanopore_methylationanalysis.tsv_call_mods.tsv
recursively:
	yes
corrected_group:
	RawGenomeCorrected_000
basecall_subgroup:
	BaseCalled_template
reference_path:
	/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/scripts/test_area/ls760/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
is_dna:
	yes
normalize_method:
	mad
methy_label:
	1
motifs:
	CG
mod_loc:
	0
f5_batch_num:
	100
positions:
	None
nproc:
	10
is_gpu:
	yes
# ===============================================
898913 fast5 files in total..
parse the motifs string..
read genome reference file..
read position file if it is not None..
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
write_process started..
2020-03-03 14:44:05.613202: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-03-03 14:44:24.483271: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key modelem/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
Process Process-9:
Traceback (most recent call last):
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key modelem/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/call_modifications.py", line 171, in _call_mods_q
    saver.restore(sess, model_path)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key modelem/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/bin/deepsignal", line 11, in <module>
    load_entry_point('deepsignal==0.1.7', 'console_scripts', 'deepsignal')()
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/deepsignal.py", line 423, in main
    args.func(args)
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/deepsignal.py", line 87, in main_call_mods
    f5_args)
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/call_modifications.py", line 393, in call_mods
    is_rnn, is_base, is_cnn)
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/call_modifications.py", line 339, in _call_mods_from_fast5s_gpu
    p_call_mods_gpu.start()
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ls760/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/rds/project/who1000/rds-who1000-wgs10k/WGS10K/data/projects/nanopore/us/resources/envs/ont/deepsignalenv_gpu/lib/python3.6/site-packages/deepsignal-0.1.7-py3.6.egg/deepsignal/call_modifications.py", line 170, in _call_mods_q
    saver = tf.train.Saver()
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/ls760/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key modelem/bw/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Searching a bit on the internet, it looks like it is a model problem. Would you agree? Do you have any idea how to solve it?

Thanks,
Luca

Questions about plant DNA methylation

Hi Ni Peng,
I am impressed by your deepsignal tools.
Our focus is on DNA methylation in plants, which includes not only CG methylation but also CHH and CHG. What's more, there is little plant nanopore data available for training a model. Our current strategy for data analysis is to use Nanopolish for CG methylation detection and the tombo alternative model for 6mA, CHG and CHH detection, but we don't know the accuracy of the results. Do you have any suggestions for this pipeline if we want to use deepsignal for methylation calling?

Best,
Zhao Long

It would be very nice if you could add cuda option to DeepSignal

Hi @PengNi ,
One thing I noticed that could be a good enhancement to deepsignal is a CUDA device option.
We have 7 GPUs, and when I run deepsignal on the GPU server it distributes the job across all GPUs and only uses part of their RAM and processing power, so it would be very useful if you could add options to select the GPU number and the amount of threads and RAM for each GPU in deepsignal.

Thanks,
Vahid
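
Until such an option exists, a common workaround for pinning a TensorFlow process to a single card is the CUDA_VISIBLE_DEVICES environment variable, e.g.

CUDA_VISIBLE_DEVICES=0 deepsignal call_mods --is_gpu yes ...

This only restricts which GPUs the process can see; per-GPU memory limits and thread counts would still need support inside deepsignal itself.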

Installation Issue

Dear Team,
I am facing a problem installing deepsignal.
TensorFlow 1.8.0 is not getting installed properly.
Please share a link where I can download tensorflow==1.8.0.

[screenshot of the TensorFlow installation error]

Training motif specific models

Hi,

I would like to use deepsignal to detect 6mA modifications, and I need some clarification about the process because I am not sure I understood it correctly.

From previous questions I understood I should start with training a model.

I actually have runs covering two samples of a bacterium: one is native, so I consider the motif I am interested in to be fully methylated; the other is a mutant for which we have confirmed the absence of methylation at the same motif.

Here is the process I have in mind:

  1. Blend the fast5s from both samples
  2. Split the blended fast5s in two so that I have a --train_file dataset and a --valid_file dataset ready for extraction
  3. Use deepsignal extract on both groups of blended fast5s with (e.g.) --motifs GATC and --mod_loc 1 (1 being the position of A in GATC)
    At this step I don't understand the use of --methy_label. How should I choose a value when I have just blended the positive and negative samples?
  4. Give the extraction output TSV files to the deepsignal train command?

I feel I am missing something here!

Thanks in advance!
Paul.
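
For what it's worth, one reading of the intended workflow (an assumption, to be confirmed by the authors) is that the two samples should not be blended before extraction: features would be extracted from the native sample with --methy_label 1 and from the mutant sample with --methy_label 0, and the two labelled TSVs would then be concatenated, shuffled, and split into the --train_file and --valid_file sets. A minimal sketch of that last step:

# Concatenate two labelled feature TSVs, shuffle, and split into train/valid files.
import random

def merge_and_split(pos_tsv, neg_tsv, train_out, valid_out, valid_fraction=0.01, seed=123):
    lines = []
    for path in (pos_tsv, neg_tsv):
        with open(path) as fh:
            lines.extend(fh.readlines())
    random.seed(seed)
    random.shuffle(lines)
    n_valid = int(len(lines) * valid_fraction)
    with open(valid_out, "w") as fh:
        fh.writelines(lines[:n_valid])
    with open(train_out, "w") as fh:
        fh.writelines(lines[n_valid:])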

add a "region" option

Maybe it would be better to add a "--region" option to the extract_features and call_methy modules, to only extract and test the base modifications in the regions of interest.

insertion methylation level

Hi,
Is there any way to detect DNA modifications in insertions (one type of SV) using deepsignal?
Or can only DNA modifications within reference genome regions be detected?

Thank you.

Yeeok Kang

Nanopore data for NA12878

Hi DeepSignal,

I really like your model design for Nanopore methylation calling, but I have some questions and would really appreciate your help:

  1. Did you only use the original .fast5 files from the NA12878 dataset (https://www.ebi.ac.uk/ena/data/view/PRJEB13021) for the paper and then do the basecalling on your own? Have you considered the rel6 genomic DNA release?

  2. Is there any specific reason for using Albacore? I think Albacore was deprecated by Nanopore after R9.4 and Guppy is now the main basecaller.

  3. 4mC and 5mC exist at the same time in bacteria and cannot be distinguished by oxBS-seq because of its mechanism. How does the model deal with this issue?

  4. You are using high-confidence 5mC sites from oxBS-seq to label your Nanopore reads for training, but the problem is that 100% 5mC sites should be very rare in NA12878, and in most cases a CpG site is a mixture of 5mC and unmethylated C at the genome level. How do you solve this problem?

  5. Can you provide more details on calculating accuracy at the read level? Do you calculate the percentage of correct methylation calls in a read? How about other evaluation metrics such as sensitivity, specificity and AUC at the read level?

  6. Since Nanopore has now released R10.3, is your method compatible with the latest version?

Thank you so much for your help!

Best,
Ziwei

Your paper circle plot

Hi @PengNi ,
Sorry for this silly question, but I was wondering how you made the circle plot in your paper.
As you mentioned in the paper, you binned the genome into 1 Mb regions; did you also bin the methylation frequency file and take the mean/sum of the methylation frequencies in those 1 Mb regions?

I am trying to reproduce a similar plot, but mine looks completely different.
Thanks,
Vahid.
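
In case it is useful while waiting for an answer, binning a per-site frequency table into 1 Mb windows can be done along these lines (the column names below are assumptions about the frequency file, not a documented format):

# Illustrative: mean methylation frequency per 1 Mb bin from a per-site frequency table.
import pandas as pd

freq = pd.read_csv("methyl_freq.tsv", sep="\t", header=None,
                   names=["chrom", "pos", "strand", "freq"])  # assumed columns
freq["bin"] = freq["pos"] // 1000000
binned = freq.groupby(["chrom", "bin"])["freq"].mean().reset_index()
binned["start"] = binned["bin"] * 1000000  # bin start coordinate, for plotting
print(binned.head())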

How to calculate accuracy and other metrics by evaluate_mods_call.py?

Hi @PengNi ,

I would be very grateful if you could help me with how to calculate accuracy.
I now have bisulfite data and the methylation_frequency data from DeepSignal. How should I use your script to calculate accuracy?
It says I need two result files, one containing calls for methylated sites and another containing calls for unmethylated sites. What are the calls for methylated sites (is it the methylation frequency file?), and what are the calls for unmethylated sites?

Many thanks,
Vahid.

Model training, feature extraction error.

Hi @PengNi ,

I am trying to use DeepSignal to train a model.
I found the zero-based positions of CpGs that are modified or not, and then used these position files to extract the features to train a model on. But when I use the extract module, it does not extract any features.

Here is my command:
deepsignal extract -i fast5s/ --reference_path hg38.fa --methy_label 1 --positions position.tsv -o extracted_featur.tsv

and here is the head of my position file:
1	3287543
1	3287547
1	3287551
1	3287555
1	3287559
1	3287563
1	3287567
1	3287571
1	3287575
1	3287579

My position file does not have a header.

However, when I do not pass the --positions option to the command, it does extract the features.

Thanks,
Vahid.

Model training outputs many files and is still running

Hi @PengNi ,

My deepsignal train script has now been running on a CPU machine for about 6 days.
I see several files in the output directory. What are these different files for, and which one is the final trained model?
BTW, the script is still running.

Many thanks,
Vahid.

How to calculate methylation frequencies from predict probability

Hi,
I have a question about the correlation analysis between the methylation frequencies of CpGs calculated by DeepSignal/Nanopolish and those from bisulfite sequencing.

When you calculate the Pearson coefficient, how do you compute the methylation frequencies at the genome level from the DeepSignal model? I think your model provides the predicted methylated probability P+ and unmethylated probability P- of each tested site in the genome, but not the methylation frequencies. So do you assume that the methylated probability P+ is equal to the methylation frequency (5mC%) at each target site?

How about Nanopolish? Since Nanopolish uses the log-likelihood ratio to make a methylation call for each site, how do you convert the ratio into methylation frequencies to make them comparable to BS-seq?

Thank you so much for your help!
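
For context on how such a frequency is commonly derived (the usual convention, not necessarily the authors' exact script): per-read calls at a site are thresholded into methylated/unmethylated, and the site-level frequency is the fraction of covering reads called methylated. A minimal sketch over the call_mods output, assuming the ninth column is the per-read called label as in the example rows quoted later on this page:

# Illustrative: aggregate per-read calls into per-site methylation frequencies.
from collections import defaultdict

counts = defaultdict(lambda: [0, 0])  # (chrom, pos, strand) -> [methylated reads, total reads]
with open("fast5s.al.CpG.call_mods.tsv") as fh:
    for line in fh:
        fields = line.rstrip("\n").split("\t")
        chrom, pos, strand = fields[0], fields[1], fields[2]
        called_label = int(fields[8])  # assumed: 1 = called methylated
        counts[(chrom, pos, strand)][0] += called_label
        counts[(chrom, pos, strand)][1] += 1

for (chrom, pos, strand), (meth, total) in counts.items():
    print(chrom, pos, strand, total, meth / total, sep="\t")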

Output BED format of call_mods_freq results

Add an option/function to output the call_mods_freq result in BED format.
Related to call_modification_frequency.py and call_modification_frequency2.py.
Add a coverage cutoff for the output sites too.
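
Until this lands, a rough interim conversion could look like the sketch below, assuming a per-site frequency file with chromosome, position, strand, coverage and frequency columns (adjust the indices to the actual layout of the call_mods_freq output):

# Illustrative: write BED6-like lines from a per-site frequency table with a coverage cutoff.
def freq_to_bed(freq_tsv, bed_out, min_coverage=5,
                chrom_i=0, pos_i=1, strand_i=2, cov_i=3, freq_i=4):  # assumed column indices
    with open(freq_tsv) as fin, open(bed_out, "w") as fout:
        for line in fin:
            f = line.rstrip("\n").split("\t")
            if int(f[cov_i]) < min_coverage:
                continue  # apply the coverage cutoff
            start = int(f[pos_i])
            fout.write("\t".join([f[chrom_i], str(start), str(start + 1),
                                  f[freq_i], f[cov_i], f[strand_i]]) + "\n")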

Clarification about filter_samples_by_positions.py

Hi Peng,

So I am ready to train a model using samples extracted and labelled from high-confidence bisulfite sites. I figured that the filter_samples_by_positions.py script is intended for this purpose.
However, some basic clarifications would be great:

  • Because there is a --label argument, could you confirm that I should use the script twice, once to extract and label methylated samples and once for unmethylated samples?

  • I see the position file should include only the chromosome and the genomic position (chromosome\tpos_in_forward_strand). Does that mean I also have to run the script twice to extract samples matching the forward and reverse strands?

So I should build 4 lists of positions, two for the forward strand (methylated and unmethylated) and two for the reverse strand, then combine the outputs and shuffle them?

Best,

Paul

multiprocessing

  1. Testing on CPUs, when using multiprocessing.Process(), even with only one process, multiple cores (in the extract_features module) or all cores (in the call_modifications module) are used. Better to figure out why and fix it, but not urgent.

  2. Complete the call_modifications module so that it takes fast5s as input, and test it on CPU.

  3. Need to test on GPU, to see whether the call_modifications module runs as expected.

  4. Need to figure out how to check inside the Python code whether tensorflow is running on GPU or CPU, so that the nproc param can be set properly in the two (GPU/CPU) cases, because on GPU nproc may not be set to >1 (see the sketch below).

After commit 3fa49c4.
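
On point 4, a quick runtime check that works with the TF 1.x API (a sketch, not yet wired into the code):

# Check at runtime whether TensorFlow can see a GPU (TF 1.x APIs).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices()])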

how to train a model

Hi Deepsignal,

I am impressed by the high sensitivity and accuracy of deepsignal in calling methylation sites. I would very much like to try it in my study. Here I have a few questions.

  1. Deepsignal only provides a human CpG model. What I want is to extract all methylation motifs (not only CpG) of all methylation types (6mA, 5mC, 4mC) from microorganisms, so it seems I have to train a custom model. Am I right?

  2. deepsignal extract can extract features for training. Could you please explain a little about what exactly is extracted?

  3. I have tried deepsignal extract on the example yeast data. The methy_label of all positions is '1'. Does '1' mean that the position will be used for training? What exactly does '1' mean?

  4. If the result of deepsignal extract is used for training a model, how does deepsignal know which base is methylated?

  5. deepsignal extracts selected motifs with the same mod_loc. If I want to extract all types of motifs (probably with different mod_loc values), including novel motifs, does this mean that deepsignal extract is not applicable to my case?

  6. For training a model, if the input is a pool of all methylation types, is there a requirement on the number of samples per type, or per specific motif of a type?

  7. Could you please give some advice on how to prepare the files for training a model?

Thank you so much.
Shangjin

subprocess won't stop

Subprocesses in the extract_features module do not stop as expected when the data contains millions of reads; they all stay in the S (interruptible sleep) state.

When there are only thousands of reads, it looks fine.

Tested on v0.1.2 (commit 7dcebae).

self._sem._semlock._get_value() not implemented error

Would this error cause a problem with the returned values, or should I just ignore it?

 # Raises NotImplementedError on Mac OSX because of broken sem_getvalue()
    return self._maxsize - self._sem._semlock._get_value()
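
The error comes from multiprocessing.Queue.qsize(), which is not implemented on macOS because sem_getvalue() is broken there; it should not affect the data actually passed through the queue. If qsize() is only used for progress or batching decisions, guarding the call is a reasonable workaround:

# Workaround sketch: fall back gracefully when Queue.qsize() is unavailable (macOS).
def safe_qsize(q, default=0):
    try:
        return q.qsize()
    except NotImplementedError:  # sem_getvalue() is broken on Mac OS X
        return default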

How to get my files to basecall

I am trying to get my files into the right format to run deepsignal, and I am having a lot of trouble. My fast5 files need to be annotated with the basecalls from my fastq files. In the instructions it says "If the basecall results are saved as fastq, run [tombo preprocess annotate_raw_with_fastqs]". I have done that, but I am getting this error:
[screenshot of the error]
Help?

wget link for data

Please provide a link to download the data with wget from Google Drive.

Don't import tensorflow when extracting features

Hi!

This would just be a small quality-of-life improvement. Currently, deepsignal.py imports train_model and call_mods even when I am actually calling extract. train_model and call_mods then want to import tensorflow, and tensorflow depends on libcuda, none of which is required by the feature extraction command.

This can be a bit problematic in a compute cluster environment where you have dedicated GPU nodes (which have libcuda) and non-GPU nodes (which don't have libcuda). It forces me to run the feature extraction on a GPU node, despite it not using a GPU, thus reserving a valuable resource I'm not actually using.

Thanks a lot!
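
One low-effort way to get this behaviour would be to defer the heavy imports into the subcommand handlers, e.g. (a sketch against a hypothetical layout of deepsignal.py, not an actual patch):

# Sketch: import tensorflow-dependent modules only inside the subcommands that need them.
def main_extract(args):
    # extract_features is the tensorflow-free path; the function name here is illustrative
    from deepsignal import extract_features
    extract_features.extract(args)

def main_call_mods(args):
    # tensorflow gets imported only when this code path is actually taken
    from deepsignal import call_modifications
    call_modifications.call_mods(args)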

Multi_read Fast5s

Hi,
Is there any possibility that you could update DeepSignal to accept multi-read fast5 files?
When the number of samples is high, it is painful to convert multi-read files to single-read files and produce hundreds of millions of files.
Thanks,
Vahid.
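
Until native multi-read support exists, the usual interim route is the multi_to_single_fast5 converter from ont_fast5_api, e.g. (flag names from memory; check the tool's help for your version):

multi_to_single_fast5 -i multi_read_fast5_dir -s single_read_fast5_dir -t 10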

Deepsignal not running on GPU

Hi,
I'm running deepsignal with the following command:
deepsignal call_mods --input_path deepSignal_NPmeT1_fast5_CpG_signalFeature.tsv --model_path deepSignalModel/model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt --result_file deepSignal_NPmeT1_fast5.CpG.call_mods.tsv --nproc 10 --is_gpu yes

It is running (with many warnings), but for some reason, rather than running on the GPU, it is running on the CPU(s) and taking a very long time. I thought this might be similar to issue #38; however, I checked and I am using the model for deepsignal v0.1.7 (which is the version I seem to be running).

I'm pasting the warnings, below.
Thank you for your help,
Avital

/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/miniconda3/envs/deepsignalenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])

===============================================

parameters:

input_path:
deepSignal_NPmeT1_fast5_CpG_signalFeature.tsv
model_path:
/home/bioinformatics/faige/nanopore/deepSignalModel/model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt
is_cnn:
yes
is_rnn:
yes
is_base:
yes
kmer_len:
17
cent_signals_len:
360
batch_size:
512
learning_rate:
0.001
class_num:
2
result_file:
deepSignal_test.tsv
recursively:
yes
corrected_group:
RawGenomeCorrected_000
basecall_subgroup:
BaseCalled_template
reference_path:
None
is_dna:
yes
normalize_method:
mad
methy_label:
1
motifs:
CG
mod_loc:
0
f5_batch_num:
100
positions:
None
nproc:
10
is_gpu:
yes

===============================================

write_process started..
2020-04-03 01:02:23.840665: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-04-03 01:02:23.841327: I tensorflow/core/common_runtime/process_util.cc:63] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

Feature extraction

Hi @PengNi
Do you consider mismatches with the reference during feature extraction?
As you know, the error rate of nanopore reads is high, and there may be many cytosines in a read that are not actually C when compared to the reference. I was wondering whether deepsignal extracts features only for cytosines that are both called C by nanopore and are C in the reference?

Thanks,
Vahid.

Result verification

Hi there,

I recently used Deepsignal to detect DNA methylation and obtained the result shown below.

1       387268  -       4855350 079a30f7-9f4c-4a7a-ad5b-39e3ef61530b    t       0.565757        0.43424305      0       CTGCCTGTCGTGGGCGG
1       2781788 -       2460830 0802b481-faa3-4aa6-9572-54fa77053c81    t       0.057231583     0.9427684       1       CCCTGATCCGGACGGAA
1       2780877 -       2461741 0802b481-faa3-4aa6-9572-54fa77053c81    t       0.06506838      0.9349316       1       ACGCAGCGCGAGGATCT

This is the result from E. coli genome sequencing, and I used the .ckpt from model.GATC.R9_2D.tem.puc19.bn17.sn360.tar.gz for --model_path.

From this output snippet, I understand that the last two rows show a methylated C at the 9th position of the 17-mers. I also confirmed the predicted methylation positions against the genome positions given in the 2nd column.

However, I observe that G_A_TC, which is the recognition motif of one of the DNA methyltransferases (the methylated nucleotide in the motif is flanked by _), does not overlap the predicted methylated nucleotide. Interestingly, the motif usually appears elsewhere in the 17-mers.

I also checked the other recognition motifs, i.e. A_A_CGTCG, C_C_[A/T]GG and ATGC_A_T, of the different DNA methyltransferases mentioned in PMC4231299. None of the expected methylated nucleotides (flanked by _) in these motifs fall at the predicted methylation position (the 9th nucleotide of the 17-mers), but I found these motifs frequently popping up at various other positions in the 17-mers.

So I doubt how reliable the prediction is and whether I am interpreting the result correctly.
It is quite a detailed question, and I would be glad to receive any feedback.
It is quite a detailed question and I would be glad to receive any feedback.

Best,

Wannisa
