xryanglab / ribocode Goto Github PK

View Code? Open in Web Editor NEW

46.0 46.0 14.0 308 KB

release version

License: MIT License

Python 99.35% Shell 0.65%

bioinformatics orfs peptides ribosome-profiling

ribocode's People

Contributors

Stargazers

Watchers

Forkers

zhengtaoxiao yf0205 sunqiangzai zhaoyuehrb marcasriv kennyluo4 sitaoz silhouetteq sherkinglee huguanjing zachery001

ribocode's Issues

请问如何进行lncRNA的ORF鉴定

prepare_transcripts这一步用gencode.lncRNA.gtf生成的文件中，cds文件就已经是空的了，后续用metaplots直接报错no orf出不了结果了

上面是lncRNA的gtf生成的，下面是完整gtf生成的

No obviously periodicity are detected from bam file when running metaplots step

When I process metaplots, the log printed that:

No obviously periodicity are detected from bam file SRR22462570/SRR22462570.Aligned.toTranscriptome.out.bam,
it could be due to poor quality sequencing.
Please check the metagene plots and try again by lowering the value of frame0_percent

Even thought used the following parameters
metaplots -a $genecode.v44.ribo.anno -r $sample.Aligned.toTranscriptome.out.bam -o $sample.ribocode -f0_percent 0.01
The log still reported the same contents, which causes RiboCode to interrput

Error when writing transcripts to file

Hi,
RiboCode runs on my dataset from Cryptococcus neoformas, then fails during the writing to gtf. This meant that ORF calling succeeded and wrote a complete .txt file, but for the .gtf output left large gaps of entire chromosomes or half-chromosomes.

Error message was:
Errors: 88% has finished! Writing the results to file ..... error when transform the transcript interval to genomic! Traceback (most recent call last): File "/usr/local/bin/RiboCode", line 10, in <module> sys.exit(main()) File "/usr/local/lib/python3.6/dist-packages/RiboCode/RiboCode.py", line 63, in main output_gtf=output_gtf, output_bed=output_bed) File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 455, in main write_to_gtf(gene_dict, transcript_dict, orf_results, collapsed_orf_idx, outname) File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 192, in write_to_gtf exon_ivs = transcript_iv_transform(tobj, orf_iv) File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 285, in transcript_iv_transform exons_ivs.append(Interval_from_directional(exons_bound[i],exons_bound[i+1],strand))

I suggest three possible ways to help with this.

Report which transcript / exon has failed.
Put some exception-handling code around that failure so that remaining transcripts are still written.
Write out the ORFs in transcript co-ordinates first, which would be better for my application anyway.

I can provide access to data and annotations if that helps to debug!

Thanks
Edward

Whether multiple mapping affect ribocode result

Sorry to bother you.
I followed your workflow use STAR arguments --quantMode TranscriptomeSAM and --outFilterMultimapNmax 1,this is the command looks like

STAR --outFilterType BySJout --runThreadN 10 --outFilterMismatchNmax 2 \
--genomeDir /reference/GRCm38/STAR \
--readFilesIn /my/rmrRNA/${sample}_trim_norrna.fq  \
--outFileNamePrefix ${sample} \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM GeneCounts \
--outFilterMultimapNmax 1 --outFilterMatchNmin 16 --alignEndsType EndToEnd

the output Aligned.toTranscriptome.out.bam file still have multiple mapping sequence ( NH:i > 1 ) , because STAR wiil output all records in this toTranscriptome bam file , Does these multiple mapping records affect the Ribocode result to find uORF ?

Error, the references in bam are different from transcriptome annotation

Hi,

I got the following error when running my script:

Error, the references in bam are different from transcriptome annotation

It seems to throw this as my reads were aligned to the transcriptome and have the transcript version ID instead of just the transcript ID. Is it possible for the output of the prepare_transcript function to include the version number?

Thanks

Error in GTFupdate

Hi, I want to use a special gtf file and want to process these files through GTFupdate, but I find that there will be some errors.
The following is the content of my gtf file：

chr1	Cufflinks	exon	11872	12227	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	12613	12721	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	13225	14412	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000002.2"
chr1	Cufflinks	exon	11874	12227	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	12595	12721	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	13403	13655	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"
chr1	Cufflinks	exon	13661	14409	.	+	.	gene_id "NONHSAG000001.2";gene_name "NONHSAG000001.2";transcript_id "NONHSAT000003.2"

I modified it with reference to the format of GTF_update.rst, but there are still the following errors;

Traceback (most recent call last):
  File "/home/leelee/miniconda3/envs/p3/bin/GTFupdate", line 10, in <module>
    sys.exit(main())
  File "/home/leelee/miniconda3/envs/p3/lib/python3.7/site-packages/RiboCode/GTF_update.py", line 117, in main
    gset,sourted_gset_keys = GroupGeneSubsets(args.gtfFile)
  File "/home/leelee/miniconda3/envs/p3/lib/python3.7/site-packages/RiboCode/GTF_update.py", line 34, in GroupGeneSubsets
    gid=field_dict["attr"]["gene_id"]
KeyError: 'gene_id'

How can i solve this problem？
Thanks,
LeeLee

psites_number table is not generated with v1.2.15

I think I still encounter an issue that is similar to issue#32
i.e., I did not get the psites_number table in the hdf5 file.

  group           name       otype dclass   dim
0     /        p_sites H5I_DATASET   VLEN 46826
1     / transcript_ids H5I_DATASET STRING 46826

Also the dimension of the p_sites table seems to be wrong.
Otherwise the code ran okay and generated other files.

This job is running on skl-119 on Wed May 17 23:21:46 EDT 2023
0% has finished! ^M2% has finished! ^M4% has finished! ^M6% has finished! ^M9% has finished! ^M11% has finished! ^M13% has finished! ^M15% has finished! ^M17% has finished! ^M19% has finished! ^M21% has finished! ^M23% has finished! ^M26% has finished! ^M28% has finished! ^M30% has finished! ^M32% has finished! ^M34% has finished! ^M36% has finished! ^M38% has finished! ^M41% has finished! ^M43% has finished! ^M45% has finished! ^M47% has finished! ^M49% has finished! ^M51% has finished! ^M53% has finished! ^M56% has finished! ^M58% has finished! ^M60% has finished! ^M62% has finished! ^M64% has finished! ^M66% has finished! ^M68% has finished! ^M70% has finished! ^M73% has finished! ^M75% has finished! ^M77% has finished! ^M79% has finished! ^M81% has finished! ^M83% has finished! ^M85% has finished! ^M88% has finished! ^M90% has finished! ^M92% has finished! ^M94% has finished! ^M96% has finished! ^M98% has finished! ^M[2023-05-17 23:29:14] Finished!
        Loading transcripts.pickle ...
        Reading bam file: /mnt/home/larrywu/CTRL_arabidopsis/data/RiboCode_STAR/ribo_mapped/D1//star_D1_Aligned.toTranscriptome.out.bam......
Finished reading bam file!

Any suggestions for how to deal with this issue? Thanks!

can't locate attribute: psites_number

I'm trying to debug a workflow that runs RiboCode. I tried using the latest container from quay.io and this is the error I'm running into. Any guidance on what I should look into first?

Loading transcripts.pickle ...
Loading Psites from xxxxx.transcriptome.dedup......
_psites.hd5Traceback (most recent call last):
File "/usr/local/bin/RiboCode", line 10, in
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 107, in psites_count
tpsites_sum,total_psites_number = read_bam(configList[0])
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 45, in read_bam
tpsites,psites_number = load_psites(name + "_psites.hd5" )
File "/usr/local/lib/python3.9/site-packages/RiboCode/process_bam.py", line 31, in load_psites
psites_number = fin.attrs["psites_number"]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 56, in getitem
attr = h5a.open(self._id, self._e(name))
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5a.pyx", line 80, in h5py.h5a.open
KeyError: "Can't open attribute (can't locate attribute: 'psites_number')"

Unknown Attribute Error

Hi,

I am getting a weird AttributeError with not much description of what's going wrong:

Traceback (most recent call last):
File "/home/bwee/.local/bin/metaplots", line 9, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'metaplots')()
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 241, in main
meta_analysis(gene_dict,transcript_dict,args)
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 227, in meta_analysis
distancePlot(distance_to_start_count,distance_to_stop_count,pre_psite_dict,length_counter,args.outname + sampleName)
File "/home/bwee/.local/lib/python2.7/site-packages/RiboCode/metaplots.py", line 108, in distancePlot
with PdfPages(outname + ".pdf") as pdf:
AttributeError: exit

Could someone help me understand what's going wrong?

Thanks,
Brendan

Read counts issues

Is this package only counts the unique mapped reads in the setp "ORFcount"？

Change readme from pip to pip3

Hey, since RiboCode requires Python 3.6, it is better to set default in readme to pip3, as this makes it more failsafe for users who have both. If there is not something I am missing ?

prepare_transcripts failed

Traceback (most recent call last):
File "/home/dr/anaconda2/bin/prepare_transcripts", line 10, in
sys.exit(main())
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 308, in processTranscripts
gene_dict,transcript_dict = readGTF(gtfFile)
File "/home/dr/anaconda2/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 187, in readGTF
gene => transcript => exon (or CDS)" % i)
RiboCode.prepare_transcripts.ParsingError: Error in line 0. The annotation in GTF file should be three-level hierarchy of gene => transcript => exon (or CDS)

I am getting this error and I am new to bioinformatics kindly resolve....

ModuleNotFoundError: No module named 'fasta'

Hi,
I got a ModuleNotFoundError when testing the RiboCode installation with RiboCode_onestep -V
Can you help?

$ RiboCode_onestep -V
Traceback (most recent call last):
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/bin/RiboCode_onestep", line 5, in <module>
    from RiboCode.RiboCode_onestep import main
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/RiboCode/RiboCode_onestep.py", line 16, in <module>
    from .prepare_transcripts import *
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/RiboCode/prepare_transcripts.py", line 17, in <module>
    from pyfasta import Fasta
  File "/nexus/posix0/MAGE-flaski/service/projects/data/Bioinformatics/bit_pipe_ribosome_profiling/libraries/venv3/lib/python3.9/site-packages/pyfasta/__init__.py", line 3, in <module>
    from fasta import Fasta, complement, DuplicateHeaderException
ModuleNotFoundError: No module named 'fasta'

my package list

$ pip3 list
Package            Version
------------------ ---------
AGEpy              0.8.2
autopaths          1.6.0
bcrypt             3.2.0
biomart            0.9.2
biopython          1.78
certifi            2020.12.5
cffi               1.14.5
chardet            4.0.0
charset-normalizer 3.1.0
click              7.1.2
coloredlogs        15.0
colormath          3.0.0
cryptography       3.4.6
cycler             0.10.0
decorator          4.4.2
et-xmlfile         1.0.1
future             0.18.2
h5py               3.1.0
HTSeq              0.13.5
humanfriendly      9.1
idna               2.10
ipaddress          1.0.23
jdcal              1.4.1
Jinja2             2.11.3
joblib             1.0.1
kiwisolver         1.3.1
lzstring           1.0.4
Markdown           3.3.4
MarkupSafe         1.1.1
matplotlib         3.3.4
minepy             1.2.6
multiqc            1.9
networkx           2.5
numpy              1.20.1
openpyxl           3.0.6
pandas             1.2.2
paramiko           2.7.2
patsy              0.5.1
Pillow             8.1.0
pip                23.0.1
plumbing           2.11.2
py                 1.11.0
pybedtools         0.8.1
pycparser          2.20
pyfasta            0.5.2
PyNaCl             1.4.0
pyparsing          2.4.7
pysam              0.16.0.1
python-dateutil    2.8.1
pytz               2021.1
PyYAML             5.4.1
requests           2.25.1
retry              0.9.2
RiboCode           1.2.15
scikit-learn       0.24.1
scipy              1.6.1
seaborn            0.11.1
setuptools         57.5.0
sh                 2.0.3
simplejson         3.17.2
six                1.15.0
spectra            0.0.11
statsmodels        0.12.2
suds-jurko         0.6
threadpoolctl      2.1.0
tqdm               4.65.0
urllib3            1.26.3
Wand               0.6.5
wheel              0.38.4
xlrd               2.0.1
XlsxWriter         1.3.7

Error in <Preparing the transcripts annotation files> step

ribocode was installed by conda and run in python2.7 env.
when i run this command prepare_transcripts -g ~/data/reference/IRGSP/IRGSP.gtf -f ~/data/reference/IRGSP/IRGSP.fa -o annotation/

then i get fellow error:
Traceback (most recent call last):
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/bin/prepare_transcripts", line 10, in
sys.exit(main())
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 311, in processTranscripts
genomic_seq = GenomeSeq(genomeFasta)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/RiboCode/prepare_transcripts.py", line 205, in init
self.fh = Fasta(filename, key_fn = get_chrom)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/pyfasta/fasta.py", line 73, in init
flatten_inplace)
File "/home/wuyuechao/data/bio-tools/anaconda3/envs/ribocode/lib/python2.7/site-packages/pyfasta/records.py", line 48, in prepare
idx = cPickle.load(fh)
ValueError: unsupported pickle protocol: 4

please help thx!

h5py TypeError: No conversion path for dtype: dtype('<U18')

RiboCode -a RiboCode_annot -c metaplots_pre_config.txt -l no -g -o RiboCode_ORFs_result

Loading transcripts.pickle ...
Reading bam file: /n/jobspace/bbcore/schaffer_rpf_abby_min6/two_batches/rpf/alignments_transcriptome/20210125_Index_3_JS8600_S11_R1_001_Aligned.toTranscriptome.out.bam......
Traceback (most recent call last):
File "/home/panh/miniconda3/envs/ribocode/bin/RiboCode", line 10, in
sys.exit(main())
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/RiboCode.py", line 40, in main
tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 118, in psites_count
tpsites,psites_number = read_bam(configData)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 80, in read_bam
write_psites(tpsites,psites_number, name + "_psites.hd5")
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/process_bam.py", line 21, in write_psites
fout.create_dataset("transcript_ids",data=list(tpsites.keys()),dtype=ds)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/h5py/_hl/group.py", line 136, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/home/panh/miniconda3/envs/ribocode/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 170, in make_new_dset
dset_id.write(h5s.ALL, h5s.ALL, data)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 212, in h5py.h5d.DatasetID.write
File "h5py/h5t.pyx", line 1654, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1715, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U18')

psites HDF5 file fails to be created

My run failed for my dataset, that has all genes set to equal length.
What happens is this (I here display the partial hdf5 file):

  group           name       otype dclass        dim
0     /        p_sites H5I_DATASET   VLEN 551 x 1000
1     / transcript_ids H5I_DATASET STRING       1000

You see the "psites_number" table is not made, because it fails at inserting the "p_sites" table properly.
I think the newest h5py with anaconda (3.6.0) version does not work now as the code is implemented,
or am I wrong here?

ERROR is:

Traceback (most recent call last):
  File "/home/roler/anaconda3/envs/ribocode_env/bin/RiboCode", line 10, in <module>
    sys.exit(main())
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 84, in read_bam
    write_psites(tpsites,psites_number, name + "_psites.hd5")
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 26, in write_psites
    fout.create_dataset("p_sites",data=list(tpsites.values()),dtype=dt, compression="gzip")
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 145, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 784, in h5py._conv.ndarray2vlen
AttributeError: 'int' object has no attribute 'dtype'

I tried to make the h5 file from scratch, but it still fails with this error:

Traceback (most recent call last):
  File "/home/roler/anaconda3/envs/ribocode_env/bin/RiboCode", line 10, in <module>
    sys.exit(main())
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 49, in read_bam
    tpsites,psites_number = load_psites(name + "_psites.hd5" )
  File "/home/roler/anaconda3/envs/ribocode_env/lib/python3.9/site-packages/RiboCode/process_bam.py", line 35, in load_psites
    psites_number = fin["psites_number"].value
AttributeError: 'Dataset' object has no attribute 'value'

It now says there is no ".value" getter, which might be because ".value" is deprecated?
Relevant link: https://stackoverflow.com/questions/67409919/attributeerror-dataset-object-has-no-attribute-value

Alternative Start Codons

Hi,

I was wondering how to add an alternative start codon list in the RiboCode step. Thank you for your help!

Best,
Qidi

Error in 'RiboCode'

Hi,

When I try to run RiboCode:

RiboCode -a annot/ -c config.txt -l no -g -o result/

I got this error:

Loading transcripts.pickle ...
Reading bam file: newstarAligned.toTranscriptome.out.bam......
Finished reading bam file!
100% has finished!
Writing the results to file .....
Traceback (most recent call last):
File "/usr/local/bin/RiboCode", line 11, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'RiboCode')()
File "/usr/local/lib/python3.6/dist-packages/RiboCode/RiboCode.py", line 63, in main
output_gtf=output_gtf, output_bed=output_bed)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 379, in main
write_result(orf_results,outname)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/detectORF.py", line 156, in write_result
header = "\t".join(list(orf_results[0].keys())[:-1])
IndexError: list index out of range

I really don't know how to deal with this problem.Could you show me some advice?

error in metaplots step

When I run metaplots, it gives an error:
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

I don't know where the problem is, please help, thanks.

psite HDF5 file is saved in the wrong location

Correct me if I am wrong, but currently, the hdf5 file with psite information is saved in working dir and not in defined output-dir, this makes it fail if output-dir is not the working directory for the call.

This was tested on the latest version through Conda.

ValueError: unsupported pickle protocol: 5 in metaplots

when i use the metaplots function of ribocode 1.2.14， it show the error:, how can i slove it ,thank you in advance!!!

Create metaplot file and predict the P-site locations ...
Loading transcripts.pickle ...
Traceback (most recent call last):
File "/home/zuozd/miniconda3/envs/ribocode/bin/metaplots", line 10, in
sys.exit(main())
File "/home/zuozd/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 241, in main
gene_dict,transcript_dict = load_transcripts_pickle(os.path.join(args.annot_dir,"transcripts.pickle"))
File "/home/zuozd/miniconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/prepare_transcripts.py", line 293, in load_transcripts_pickle
gene_dict, transcript_dict = pickle.load(fin)
ValueError: unsupported pickle protocol: 5

Question on adjusted P values reported in collapsed output

Dear developers,

It seems that the adjusted p values in the collapsed output was calculated from filtered p values. Shouldn't multiple testing correction be done before filtering p-values with the default pvalue cutoff of 0.05? To do the multiple testing correction correctly, could I set --pval-cutoff 1 and then filter the output by adjusted_pval < 0.05 manually?

> collapsed[order(-pval_combined)]
      pval_combined adjusted_pval
    1:  4.993873e-02  4.993873e-02
    2:  4.977925e-02  4.978278e-02
    3:  4.968042e-02  4.968748e-02
    4:  4.958320e-02  4.959377e-02
    5:  4.957438e-02  4.958848e-02
   ---                            
14073: 6.443817e-267 1.814192e-263
14074: 3.203795e-276 1.127496e-272
14075:  0.000000e+00  0.000000e+00
14076:  0.000000e+00  0.000000e+00
14077:  0.000000e+00  0.000000e+00

Besides, I noticed that there are two parameters on calculating combined p-values:

  --dependence_test {none,mic,pcc}
                        the method for measuring the dependence between frame1 and frame2. This test could help determine whether the combined p-values should be ajusted to account for the dependence between two test (i.e. F0 vs F1 and F0 vs F2). mic: Maximal Information
                        Coefficient; pcc: Pearson Correlation Coefficient.
  --stouffer_adj {none,nyholt,liji,gao,galwey}
                        the method for adjustment the cominbed p-values to account for the dependence between two tests (i.e. F0 vs F1 and F0 vs F2). see details at: https://search.r-project.org/CRAN/refmans/poolr/html/stouffer.html

What's the recommended way to set the two parameters?

Thanks in advance.

Error in 'prepare_transcripts'

Hi,

When I try to run prepare_transcripts with command line:

prepare_transcripts -g Arabidopsis_thaliana.TAIR10.41.gtf -f tair10.fa -o tair10_annot/

I got this error:

Preparing annotation files ...
Reading the GTF file: Arabidopsis_thaliana.TAIR10.41.gtf .......
Traceback (most recent call last):
File "/usr/local/bin/prepare_transcripts", line 11, in
load_entry_point('RiboCode==1.2.10', 'console_scripts', 'prepare_transcripts')()
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 388, in main
processTranscripts(args.genomeFasta,args.gtfFile,args.out_dir)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 308, in processTranscripts
gene_dict,transcript_dict = readGTF(gtfFile)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 174, in readGTF
field_dict = parsing_line(line)
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 102, in parsing_line
field_dict = {"chrom": intern(chrom),"source":source,"feature": intern(feature),"iv":iv,"attr":parsing_attr(attr)}
File "/usr/local/lib/python3.6/dist-packages/RiboCode/prepare_transcripts.py", line 74, in parsing_attr
k,v = i.strip().split(" ",1)
ValueError: not enough values to unpack (expected 2, got 1)

The gtf file was downloaded from http://plants.ensembl.org/index.html
Could you show me some advice about this error?
Have a nice day.

ValueError: Can't transform the genomic interval, please check!

when i use the command "prepare_transcripts",it show "ValueError: Can't transform the genomic interval, please check!",I use the "GTFupdate",it also show the same error,how can i slove this? thanks!

No obviously periodicity are detected from bam file, it could be due to poor quality sequencing. Please check the metagene plots and try again by lowering the value of frame0_percent

I am using the QC data to do my best analysis and I am not getting the following error, how can I change the p-value or how can I fix this problem？
error:No obviously periodicity are detected from bam file, it could be due to poor quality sequencing. Please check the metagene plots and try again by lowering the value of frame0_percent

thank you!

Empty "transcripts_cds.txt" file

Hi, the "transcripts_cds.txt" file generated is empty, how should I fix it? Thanks!

Here is GTF

A01	phytozomev13	gene	34694907	34695456	.	-	.	gene_id "Gohir.A01G126666.v2.1"; gene_name "Gohir.A01G126666.v2.1";
A01	phytozomev13	transcript	34694907	34695456	.	-	.	gene_id "Gohir.A01G126666.v2.1"; gene_name "Gohir.A01G126666.v2.1"; transcript_id "Gohir.A01G126666.1.v2.1";
A01	phytozomev13	exon	34694907	34695456	0	-	.	gene_id "Gohir.A01G126666.v2.1"; transcript_id "Gohir.A01G126666.1.v2.1"; Name "Gohir.A01G126666";
A01	phytozomev13	gene	119528395	119531897	.	+	.	gene_id "Gohir.A01G229700.v2.1"; gene_name "Gohir.A01G229700.v2.1";
A01	phytozomev13	transcript	119528395	119531897	.	+	.	gene_id "Gohir.A01G229700.v2.1"; gene_name "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1";
A01	phytozomev13	exon	119528395	119528785	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119528884	119529005	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529125	119529262	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529365	119529507	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";
A01	phytozomev13	exon	119529601	119529736	0	+	.	gene_id "Gohir.A01G229700.v2.1"; transcript_id "Gohir.A01G229700.1.v2.1"; Name "Gohir.A01G229700";

And here is the script

(ribocode) [hugj2006@bigram2 translatome2022]$ prepare_transcripts -g td1.gtf -f ../cottonLeaf/refGenomes/TM1utx_26.fasta -o RiboCode_annot
[2022-06-08 05:12:58] Preparing annotation files ...
	Loading transcripts.pickle ...
[2022-06-08 05:13:09] The step of preparing transcript annotation has been completed.
(ribocode) [hugj2006@bigram2 translatome2022]$ 
(ribocode) [hugj2006@bigram2 translatome2022]$ ls -lh RiboCode_annot/
total 166M
-rw-rw-r--+ 1 hugj2006 domain users    0 Jun  8 04:53 transcripts_cds.txt
-rw-rw-r--+ 1 hugj2006 domain users  71M Jun  8 04:53 transcripts.pickle
-rw-rw-r--+ 1 hugj2006 domain users 186M Jun  8 04:53 transcripts_sequence.fa

Detect C-terminal extensions with RiboCode

Hi,

I was wondering if it will be possible to identify C-terminal extensions using RiboCode. I've noticed that N-terminal extensions / truncations are indeed identified by RiboCode (they fall in the category 'annotated', but they have different start codon than the main annotated ORF).

Thank you very much,

Kind regards,

Marina

error in GTFupdate

GTFupdate error， I download the annotation from NCBI. The same problem is present in maize and rice

Incorrect handling of hdf5 file for repeated runs of RiboCode

Ribocode does not correctly handle the hdf5 file it creates to store temperary data, resulting in identical results for multiple runs that are ran from the same folder.

For example, running this script resulted in all identical ORF libraries for each of the datasets.

while IFS= read -r folder; do
    echo "$folder"
    mkdir "$folder/GRCh38_110/ribocode/"
    RiboCode_onestep -g "$assembly/Homo_sapiens.GRCh38.110.gtf" \
                     -f "$assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
                     -r "${folder}/GRCh38_110/aligned_tran.bam" \
                     -l no \
                     -o "${folder}/GRCh38_110/ribocode/ribo" \
                     -f0_percent 0.5
done < "$1"

I solved this by removing the hdf5 file created after each run

while IFS= read -r folder; do
    echo "$folder"
    mkdir "$folder/GRCh38_110/ribocode/"
    rm aligned_tran_psites.hd5                                               <-- remove file
    RiboCode_onestep -g "$assembly/Homo_sapiens.GRCh38.110.gtf" \
                     -f "$assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
                     -r "${folder}/GRCh38_110/aligned_tran.bam" \
                     -l no \
                     -o "${folder}/GRCh38_110/ribocode/ribo" \
                     -f0_percent 0.5
done < "$1"

RiboCode nf-core module

Hi Developers!

I am one of the developers of the nf-core/riboseq pipeline that we have started work on recently. It is now recognised as 'in-development" by the nf-core community but we still have a lot of work to do. I am trying to develop a RiboCode module so that the pipeline can support ORF calling as an option.

Firstly, I wanted to let you know about this effort and secondly, I have a few suggestions regarding the usability of the tool.

GTF parsing - this could be made to work on a wider range of GTF variants. Much of this can be handled by the GTFupdate subtool. eg GTFupdate should sort the gtf file
Switch to pyfaidx - Brent Pedersen the developer of pyfasta archived it back in 2018. This should not be too arduous but unfortunately it is not a straight swap.

I am happy to help with these and will likely make PRs here for them as I need them

GTF specification should include start and stop codons

I think, the input gtf to metaplots and prepare_transcripts needs start_codon and stop_codon features in order to recognize something as an annotated CDS; CDS features are not enough. Could this be added as an option in GTF_update.rst?

Importantly, could this be clarified explicitly in the gtf specification in README.md?

pickle.load() AttributeError: 'module' object has no attribute 'Gene'

Hi,

I ran your tool following the guidelines but when I try to run the Ribocode.py bit, I am getting this error:
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'Gene'

It seems to stem from pickle.load()

Let me know if you have any suggestions.

Best

prepare_transcripts not taking multiple CDSs into account

Hi,

I'm trying to run prepare_transcripts to extract transcripts and CDSs from a GTF file. The problem is that this GTF file contains transcripts with multiple CDSs and, when checking the output file 'transcripts_cds.txt' from prepare_transcripts, it seems to contain only one CDS for each transcript. Does RiboCode support transcripts with multiple CDSs at all? Any way around this?

different transcripts of one gene

Hi!
I'm sorry to bother you again. I don't know how to deal with different transcripts of one gene when quantification and study 3-nt periodicity in translation. Do you have some suggestions? Thank you very much!

remove PCR duplicate

Hi!
I'm sorry to bother you. I followed Ribocoed workflow, but I don't know whether need to remove PCR duplicate after STAR alignment. Could you give me some suggestions? Thank you very much!

KeyError while reading BAM

Dear developers,

I'm having the following error message:

Traceback (most recent call last):
  File "/home/huf/.local/bin/RiboCode", line 11, in <module>
    sys.exit(main())
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/RiboCode.py", line 40, in main
    tpsites_sum, total_psites_number = process_bam.psites_count(configIn.configList,transcript_dict,thread_num=1)
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/process_bam.py", line 111, in psites_count
    tpsites_sum,total_psites_number = read_bam(configList[0])
  File "/home/huf/.local/lib/python2.7/site-packages/RiboCode/process_bam.py", line 77, in read_bam
    tpsites[tid][t_psite] += 1
KeyError: 'ENSMUST00000035606.8'

I even sorted the bam, but it gave me the same error for a different transcript Id. Do you know where possibly goes wrong?

Best wishes,
Fengyuan

ImportError: No module named Tkinter

The Tkinter is needed when import the matplotlib module. If the following error is returned, pls check whether the tkinter package is installed. On Linux platform, users can install this package using follow command:
apt-get install python-tk
or
yum install python-tk

Traceback (most recent call last):
File "/usr/bin/metaplots", line 7, in
from RiboCode.metaplots import main
File "/usr/lib64/python2.7/site-packages/RiboCode/metaplots.py", line 7, in
import matplotlib.pyplot as plt
File "/usr/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 115, in
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/init.py", line 32, in pylab_setup
globals(),locals(),[backend_name],0)
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in
from six.moves import tkinter as Tk
File "/usr/lib/python2.7/site-packages/six.py", line 203, in load_module
mod = mod._resolve()
File "/usr/lib/python2.7/site-packages/six.py", line 115, in _resolve
return _import_module(self.mod)
File "/usr/lib/python2.7/site-packages/six.py", line 82, in _import_module
import(name)
ImportError: No module named Tkinter

RiboCode_onestep crashes when calling detectORF

When trying to use RiboCode_onestep I come across this error below:

Finished reading bam file!
Traceback (most recent call last):
File "/usr/local/bin/RiboCode_onestep", line 10, in
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/RiboCode/RiboCode_onestep.py", line 76, in main
detectORF.main(gene_dict=gene_dict, transcript_dict=transcript_dict, annot_dir = "annot",
TypeError: main() missing 3 required positional arguments: 'dependence_test', 'stouffer_adj', and 'pval_adj'

My command line looks like this:
Singularity> RiboCode_onestep -g Homo_sapiens.GRCh38.104.gtf -f Homo_sapiens.GRCh38.dna.primary_assembly.fa -r my.transcriptome.dedup.bam -l no -o RiboCode_ORFs_result

ImportError: cannot import name 'PPoly' from partially initialized module 'scipy.interpolate' (most likely due to a circular import) (/home/zhai2/miniconda3/lib/python3.8/site-packages/scipy/interpolate/init.py)

when i use the command : metaplots -a -r ,it show
ImportError: cannot import name 'PPoly' from partially initialized module 'scipy.interpolate' (most likely due to a circular import) (/home/zhai2/miniconda3/lib/python3.8/site-packages/scipy/interpolate/init.py)

how can i slove this ? thankyou

请问关于ORF预测结果中‘Psites_frame0_RPKM’代表什么意思？

请问Psites_frame0_RPKM代表的位于0读码框架的reads计算的ORF的表达量吗，不包含读码框1和2的reads

No R script supp

I just wanted to ask for the supplementary R file for the data analysis. I followed the the link but the document is not available anymore.

Thanks!

Question on handling multimappers

Dear RiboCode developers,

According to the documentation, it is recommended to include outFilterMultimapNmax 1 parameter in STAR alignment to exclude non-unique alignments and reduce noise for downstream analyses.

In case of default outFilterMultimapNmax 10 setting, how does RiboCode handle non-unique alignments? Are they included in P-site estimating and ORF detection? Does RiboCode differentiate between primary and secondary alignment flags when dealing with multi-mapped reads?

Thank you very much!

plot_orf_density error,it show the following arguments are required: --start-codon

plot_orf_density error,it show the following arguments are required: --start-codon,but i have added the start condn:
plot_orf_density -a totalreads_out/ -c metaplots_pre_config.txt -t itf00g40750.t4 -s 65 -e 295 --start-condon ATG --plot-annotated-orf yes
how can i slove?
thank you

Generating figures with matplotlib when DISPLAY variable is undefined or invalid

When running the "metaplots" or "plot_orf_density" command, some users received errors similar to the following:

"raise RuntimeError('Invalid DISPLAY variable')"
_"tkinter.TclError: no display name and no $DISPLAY environment variable"

The main problem is that default backend of matplotlib is unavailable. The solution is to modify the backend. A very simple solution is to set the MPLBACKEND environment variable, either for your current shell or for a single script:

export MPLBACKEND="module://my_backend"

Giving below are non-interactive backends, capable of writing to a file:

Agg
PS
PDF
SVG
Cairo
GDK

TypeError: '<' not supported between instances of 'str' and 'NoneType'

When i used metaplots -a RiboCode_annot -r HEK293Aligned.toTranscriptome.out.bam, an error occurred:
[2022-07-28 21:05:31] Create metaplot file and predict the P-site locations ...
Loading transcripts.pickle ...
Traceback (most recent call last):
File "/home/user/BGM/lit/anaconda3/envs/ribocode/bin/metaplots", line 10, in
sys.exit(main())
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 241, in main
meta_analysis(gene_dict,transcript_dict,args)
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 181, in meta_analysis
filter_tids = filter_transcript(gene_dict,transcript_dict)
File "/home/user/BGM/lit/anaconda3/envs/ribocode/lib/python3.7/site-packages/RiboCode/metaplots.py", line 48, in filter_transcript
level = list(sorted(levels))[0]
TypeError: '<' not supported between instances of 'str' and 'NoneType'
how can i solve this? thanx!!

xryanglab / ribocode Goto Github PK

ribocode's People

Contributors

Stargazers

Watchers

Forkers

ribocode's Issues

When I process metaplots, the log printed that:

No obviously periodicity are detected from bam file SRR22462570/SRR22462570.Aligned.toTranscriptome.out.bam, it could be due to poor quality sequencing. Please check the metagene plots and try again by lowering the value of frame0_percent

Recommend Projects

Recommend Topics

Recommend Org

No obviously periodicity are detected from bam file SRR22462570/SRR22462570.Aligned.toTranscriptome.out.bam,
it could be due to poor quality sequencing.
Please check the metagene plots and try again by lowering the value of frame0_percent