josuebarrera / genera

genEra is a fast and easy-to-use command-line tool that estimates the age of the last common ancestor of protein-coding gene families.

License: GNU General Public License v3.0

Shell 88.33% R 11.67%

bioinformatics comparative-genomics phylostratigraphy gene-age genera founder-gene gene-family genomics founder-events

genera's People

Contributors

Stargazers

Watchers

Forkers

yuzhenpeng rocesv lotharukpongjs proginski eulium

genera's Issues

GenEra run fails at Step 2 - Rearranged ncbi_lineages file not found

Hello,

Really interesting software! I am just giving it a go and trying to determine the gene ages for the human proteome. I am using the FASTA file of canonical human proteins from UniProt and a Swiss-Prot DIAMOND database for this purpose. I have also already generated the ncbi_lineages file using ncbitax2lin. The command I used was:

genEra -q proteomes/processed/HUMAN.fa -t 9606 -b data/swissprot -d data/taxdump -i true -o output -n 100 -r output/ncbi_lineages_2023-04-12.csv

Step 1 runs correctly, and so does the part of step 2 that creates the tmp_column_* files. However, the run fails where the rearranged ncbi_lineages file is created, specifically at line 829 of genEra.

It seems that either the file is not created or there is an issue opening it later. I can't tell which for sure, because the program exits before I can check. The output is as follows:

STARTING STEP 2: GENERATING TAXONOMIC DATABASE FOR THE PHYLOSTRATIGRAPHIC ASSIGNMENT OF YOUR GENES
--------------------------------------------------
Using the raw "ncbi_lineages" file provided by the user. Skiping ncbitax2lin
--------------------------------------------------
Rearranging the raw "ncbi_lineages" file by taxonomic hierarchy
wget: /software/centos7/devel/anaconda/3/lib/libuuid.so.1: no version information available (required by wget)
/home/.conda/envs/genEra/bin/genEra: line 829: /home/projects/GenEra/tmp_9606_7724/tmp_arranged_output/ncbi_lineages_2023-04-12.csv: No such file or directory
awk: fatal: cannot open file `/home/projects/GenEra/tmp_9606_7724/tmp_arranged_output/ncbi_lineages_2023-04-12.csv' for reading (No such file or directory)
--------------------------------------------------
Extracting all the lineages that match more than 10 percent of your query proteins
awk: cmd. line:1: fatal: cannot open file `/home/projects/GenEra/tmp_9606_7724/tmp_arranged_output/ncbi_lineages_2023-04-12.csv' for reading (No such file or directory)
--------------------------------------------------
Collapsing the phylostrata that are not represented in your DIAMOND results
--------------------------------------------------
Generating the species-tailored database
/home/.conda/envs/genEra/bin/genEra: line 886: /home/projects/GenEra/tmp_9606_7724/tmp_9606_output/ncbi_lineages_2023-04-12.csv: No such file or directory
/home/.conda/envs/genEra/bin/genEra: line 894: output/9606_output/ncbi_lineages_2023-04-12.csv: No such file or directory
rm: cannot remove ‘/home/projects/GenEra/tmp_9606_7724/tmp_arranged_output/ncbi_lineages_2023-04-12.csv’: No such file or directory
rm: cannot remove ‘/home/projects/GenEra/tmp_9606_7724/tmp_9606_output/ncbi_lineages_2023-04-12.csv’: No such file or directory
rm: cannot remove ‘/home/projects/GenEra/tmp_9606_7724/tmp_column_*’: No such file or directory

  ERROR: The species-tailored database is empty! please send me an email to figure out what might be the issue ([email protected])
  Exiting

Can you please help me with this?

Thanks!
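The paths in this log look as if the temporary file names are built by gluing a fixed prefix (tmp_arranged_, tmp_9606_) directly onto the value given to -r: tmp_arranged_ followed by output/ncbi_lineages_2023-04-12.csv gives exactly the tmp_arranged_output/... path that cannot be opened, because no directory of that name exists. This is an inference from the error messages, not from genEra's source. The sketch below reproduces that concatenation with a hypothetical tmp_name helper to show why a bare file name would avoid the problem.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mimicking how the log suggests temporary paths are built:
# tmp directory + "/" + fixed prefix + the raw -r argument, joined as plain strings.
# (Inferred from the error messages above, not taken from genEra's source code.)
tmp_name() {
  local tmp_dir="$1" prefix="$2" lineages_arg="$3"
  printf '%s/%s%s\n' "$tmp_dir" "$prefix" "$lineages_arg"
}

tmp_dir="/home/projects/GenEra/tmp_9606_7724"

# -r with a directory part: the prefix is glued onto "output/", producing a path
# inside a directory named "tmp_arranged_output" that was never created.
broken="$(tmp_name "$tmp_dir" tmp_arranged_ output/ncbi_lineages_2023-04-12.csv)"

# -r as a bare file name: the same concatenation yields a file directly in tmp_dir.
fixed="$(tmp_name "$tmp_dir" tmp_arranged_ ncbi_lineages_2023-04-12.csv)"

echo "with directory: $broken"
echo "bare file name: $fixed"
```

If this reading is right, a possible workaround is to place (or symlink) the lineages file in the directory genEra is launched from and pass it as -r ncbi_lineages_2023-04-12.csv, with no directory component.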

May I use GenEra to study the genomic phylostratigraphy of Mycobacterium tuberculosis?

Hi,

GenEra is so great that I would like to use it to study the genomic phylostratigraphy of Mycobacterium tuberculosis.

GenEra appears to have been designed for eukaryotes, but could it also be used for prokaryotes?

Best,

Chengtao

Trouble with step2

Hello, I want to do a genomic phylostratigraphy analysis of vertebrate data, and I started with human data.
The first step (DIAMOND) worked, and I got two temporary files, "9606_Diamond_results.bout" and "tmp_9606.abc".
However, the second step produced these errors:

Collapsing the phylostrata that are not represented in your DIAMOND results

WARNING: The following phylostrata were collapsed due to lack of sufficient genomic data:
[Homo sapiens] [Homo] [Homininae] [Hominidae] [Hominoidea] [Catarrhini] [Simiiformes] [Haplorrhini] [Primates] [Euarchontoglires] [Boreoeutheria] [Eutheria] [Theria] [Mammalia] [Amniota] [Tetrapoda] [Dipnotetrapodomorpha] [Sarcopterygii] [Euteleostomi] [Teleostomi] [Gnathostomata] [Vertebrata] [Craniata] [Chordata] [Deuterostomia] [Bilateria] [Eumetazoa] [Metazoa] [Opisthokonta] [Eukaryota] [cellular organisms]
If you want to include them in your analysis, please add the necessary taxa in a custom database (-a or -f), making sure their last common ancestor to the query species can be assigned to that specific taxonomic level in the NCBI taxonomy database

Is this error caused by how I built the database? The nr database itself works; I had used it for blastp before.
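Every phylostratum down to [cellular organisms] being collapsed would be consistent with the DIAMOND hits carrying no usable taxonomy IDs, for example if the database was built without a taxonomy mapping. That is only one possibility, but it is cheap to rule out. A minimal sketch, assuming the results file uses the column order genEra requests from DIAMOND (qseqid sseqid evalue bitscore staxids, as in the DIAMOND command quoted later on this page), with a hypothetical count_missing_taxids helper:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: count DIAMOND hits whose 5th tab-separated column (staxids)
# is empty or 0, i.e. hits that carry no taxonomy ID usable for phylostratigraphy.
count_missing_taxids() {
  awk -F'\t' '$5 == "" || $5 == "0" { n++ } END { print n + 0 }' "$1"
}

# Toy results file with made-up rows, only to exercise the function.
demo="$(mktemp)"
printf 'q1\tXP_1\t1e-10\t50.1\t9606\nq1\tXP_2\t1e-08\t45.0\t0\nq2\tXP_3\t1e-05\t40.2\t\n' > "$demo"
missing="$(count_missing_taxids "$demo")"
rm -f "$demo"
echo "hits without a taxonomy ID: $missing"
```

On real data the argument would be the step 1 results file (e.g. 9606_Diamond_results.bout in the temporary directory); a count close to the total number of lines would point at the database build.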

NA for all gene ages

Hi again @josuebarrera!

I'm trying to run GenEra on a longest-isoform proteome of Arabidopsis, and while the DIAMOND (v2.1.6) search step seems to complete successfully, the final output has NA for all genes in gene_ages.tsv. I believe there's a known issue with DIAMOND failing to propagate scientific names correctly during DB construction, but the taxonomy IDs are present and the output file contains them as expected.

Furthermore, I've spot-checked the DIAMOND output manually and in those cases, results for the genes assigned NA are present in the bout file, e.g.:

~/data$ grep 'AraTha-rna-NM_001084197.2' 3702_gene_ages.tsv
AraTha-rna-NM_001084197.2	Absent from the DIAMOND/MMseqs2 results	NA	NA

~/data/tmp_3702_23548$ zgrep 'AraTha-rna-NM_001084197.2' 3702_Diamond_results.bout.gz
AraTha-rna-NM_001084117.3\tgene-AT1G23147\tNC_003070.9\t+\t8205934:8206206\t91\t8205934-8206206	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	9.45e-09	47.8	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	2.38e-67	196	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_001084905.2\tgene-AT4G11653\tNC_003075.7\t-\t7037086:7037358\t91\t7037086-7037358	1.70e-24	87.8	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_001084638.2\tgene-AT3G04735\tNC_003074.8\t-\t1292408:1292725\t106\t1292408-1292725	1.41e-07	45.1	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_001084117.3\tgene-AT1G23147\tNC_003070.9\t+\t8205934:8206206\t91\t8205934-8206206	2.08e-07	44.3	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_001335782.1\tgene-AT2G22055\tNC_003071.7\t+\t9379578:9379817\t80\t9379578-9379817	6.58e-07	42.7	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	AraTha-rna-NM_104838.2\tgene-AT1G61566\tNC_003070.9\t-\t22717265:22717492\t76\t22717265-22717492	6.75e-06	40.0	3702
AraTha-rna-NM_104838.2\tgene-AT1G61566\tNC_003070.9\t-\t22717265:22717492\t76\t22717265-22717492	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	5.70e-06	40.0	3702
AraTha-rna-NM_001335782.1\tgene-AT2G22055\tNC_003071.7\t+\t9379578:9379817\t80\t9379578-9379817	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	5.85e-07	42.7	3702
AraTha-rna-NM_001084638.2\tgene-AT3G04735\tNC_003074.8\t-\t1292408:1292725\t106\t1292408-1292725	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	6.50e-07	43.5	3702
AraTha-rna-NM_001084905.2\tgene-AT4G11653\tNC_003075.7\t-\t7037086:7037358\t91\t7037086-7037358	AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	1.40e-23	85.5	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	NP_001077666.1	6.20e-63	195	3702;1240361
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	CAA0267533.1	3.59e-62	193	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	VYS48050.1	5.98e-61	190	3702
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_002874722.1	1.31e-38	134	59689;81972
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG7552807.1	3.59e-36	128	1240361
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG7557481.1	2.01e-35	126	45249
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_019084701.1	1.62e-21	91.3	90675
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	NP_001078374.1	3.12e-20	87.8	3702;1240361
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG7620113.1	8.93e-20	86.7	45249
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG7579121.1	3.51e-17	80.1	45249;1240361
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_024010228.1	1.77e-16	78.2	72664
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_002883850.1	4.06e-16	77.4	59689;81972
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_019088361.1	2.39e-15	75.5	90675
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	EOA22711.1	1.18e-14	74.3	81985
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	ESQ38323.1	1.38e-13	70.5	72664
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_002874709.1	8.73e-12	66.2	81972
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	XP_020869967.1	3.56e-09	60.1	59689;81972
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG7598041.1	2.79e-08	57.8	45249
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	RID64494.1	1.75e-06	52.8	3708;3711
AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9\t+\t13049164:13049433\t90\t13049164-13049433	KAG2273089.1	9.48e-06	50.8	52824

Any idea what might be going on? Thanks!
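One detail in the output above: the query IDs in the .bout file contain literal \t sequences (a backslash followed by t), whereas gene_ages.tsv lists the gene as AraTha-rna-NM_001084197.2, i.e. the ID cut at the first \t. If some step interprets \t as a real tab, the IDs in the two files would no longer match, which could explain the "Absent from the DIAMOND/MMseqs2 results" label. This is a guess, not a confirmed cause. A minimal sketch of a pre-processing step, using a hypothetical clean_headers function, that trims each FASTA header at the first literal \t or whitespace before rerunning:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-processing: on header lines only, drop everything from the first
# literal backslash-t sequence (or first whitespace) onwards, so query IDs contain
# nothing that a downstream tool might reinterpret as a tab.
clean_headers() {
  sed -E '/^>/ s/(\\t|[[:space:]]).*$//' "$1"
}

# Toy FASTA with a header in the style shown above (the sequence is made up).
demo="$(mktemp)"
printf '%s\n' '>AraTha-rna-NM_001084197.2\tgene-AT1G35467\tNC_003070.9' 'MKTAYIAKQR' > "$demo"
cleaned="$(clean_headers "$demo")"
rm -f "$demo"
printf '%s\n' "$cleaned"
```

After trimming, it is worth confirming that the IDs are still unique, e.g. grep '^>' on the cleaned file piped through sort | uniq -d should print nothing.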

split results

Hi,

Is your feature request related to a problem? Please describe.
As the latest release of the human genome, with its ~145k CDS, produces ~630 GB of results, and as the v1.4.0 help says that around 200 GB of RAM is needed for 180 GB of results, it seems that ~700 GB of RAM would be needed to complete the analysis with the -F option.
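The estimate above is a linear extrapolation from the ratio quoted in the v1.4.0 help (about 200 GB of RAM per 180 GB of results). As a quick sketch with a hypothetical estimate_ram_gb helper; the linear scaling is the reporter's assumption, not a documented property of genEra:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: scale RAM linearly with the size of the DIAMOND results,
# using the ratio quoted from the v1.4.0 help (200 GB of RAM for 180 GB of results).
estimate_ram_gb() {
  awk -v results="$1" 'BEGIN { printf "%.0f\n", results * 200 / 180 }'
}

human_estimate="$(estimate_ram_gb 630)"   # ~630 GB of results for the human CDS set
echo "estimated RAM: ${human_estimate} GB"
```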

Describe the solution you'd like
Once step 1 (and possibly step 2) is complete, is it possible to manually split the input FASTA and the DIAMOND results to improve each chunk's performance? (I'm not saying it won't also require a lot of RAM ;) )

Describe alternatives you've considered
I just tried something like this:

faSplit sequence cds_from_genomic.faa 10 cds_from_genomic
grep ">" cds_from_genomic04.fa | sed -E "s/>(.*)/^\1\t/" > cds_from_genomic04.txt
grep -f cds_from_genomic04.txt tmp_9606_18134/9606_Diamond_results.bout > cds_from_genomic04.bout # This step, of course, is "expensive"
genEra \
-t 9606 \
-q cds_from_genomic04.fa \
-n 40 \
-p cds_from_genomic04.bout \
-c 9606_ncbi_lineages.csv \
-r ncbi_lineages_2023-07-12.csv

The chunk has 87 CDS and, of course, it ran turbo-fast.
The ages assigned to the CDS were the same as when the entire original FASTA was used.
So, is it possible to do this, and could it be of any interest?
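The grep -f step can be slow because every line of the results file is tested against every pattern. A possible alternative, sketched here under the assumption that each query ID is the first whitespace-delimited word of its FASTA header and the first tab-separated column of the .bout file, is a single awk pass that loads the chunk's IDs into a hash and keeps only exact matches on that column (subset_hits is a hypothetical name):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical alternative to "grep -f": keep the DIAMOND hits belonging to one
# FASTA chunk by exact match on the first tab-separated column (qseqid).
# Assumes each query ID is the first word of its FASTA header.
subset_hits() {
  # $1: FASTA chunk, $2: full DIAMOND results file; matching hits go to stdout
  awk -F'\t' '
    NR == FNR {                          # first file: collect the chunk IDs
      if (substr($0, 1, 1) == ">") {
        split(substr($0, 2), words, /[ \t]/)
        ids[words[1]] = 1
      }
      next
    }
    $1 in ids                            # second file: print hits from the chunk
  ' "$1" "$2"
}

# Toy inputs (made-up IDs and values) to exercise the function.
fa="$(mktemp)"; bout="$(mktemp)"
printf '>geneA some description\nMKT\n>geneC\nMAA\n' > "$fa"
printf 'geneA\tXP_1\t1e-10\t50.1\t9606\ngeneB\tXP_2\t1e-08\t45.0\t9606\ngeneC\tXP_3\t1e-05\t40.2\t10090\n' > "$bout"
subset="$(subset_hits "$fa" "$bout")"
rm -f "$fa" "$bout"
printf '%s\n' "$subset"
```

On the files above, subset_hits cds_from_genomic04.fa tmp_9606_18134/9606_Diamond_results.bout > cds_from_genomic04.bout would stand in for the two grep lines; each call still reads the whole results file once, but without per-line regex matching.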

Paul

Problem with setting up Diamond databases

Hi, I am stuck at this step:
diamond makedb --in nr --db nr --taxonmap prot.accession2taxid --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp --memory-limit 100

I got the error:
Error: Option is not permitted for this workflow: memory-limit

But if I run it without that option:
diamond makedb --in nr --db nr --taxonmap prot.accession2taxid --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp

I got the error:
Accession parsing rules triggered for database seqids (use --no-parse-seqids to disable):
UniRef prefix 0
gi|xxx| prefix 0
xxx| prefix 31111
|xxx suffix 31111
.xxx suffix 918334653
:PDB= suffix 0

Loading taxonomy names... [1.24s]
Loaded taxonomy names for 2514621 taxon ids.
Loading taxonomy mapping file...
Error opening file prot.accession2taxid: No such file or directory

Any advice on this? Many thanks!

Best,
CW
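The last line of the log says that the file given to --taxonmap does not exist under that name. NCBI distributes the protein accession-to-taxid mapping gzip-compressed (for example prot.accession2taxid.gz or prot.accession2taxid.FULL.gz), and the DIAMOND documentation describes --taxonmap as taking the gzip-compressed file, so pointing it at the downloaded .gz file may be all that is missing. A small sketch with a hypothetical find_taxonmap helper that picks whichever candidate actually exists:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first candidate taxon-map file that exists,
# or report an error if none of them does.
find_taxonmap() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no taxon-map file found among: $*" >&2
  return 1
}

# Demo in a scratch directory where only the .FULL.gz variant was downloaded.
workdir="$(mktemp -d)"
touch "$workdir/prot.accession2taxid.FULL.gz"
taxonmap="$(find_taxonmap "$workdir/prot.accession2taxid" \
                          "$workdir/prot.accession2taxid.gz" \
                          "$workdir/prot.accession2taxid.FULL.gz")"
rm -rf "$workdir"
echo "would pass --taxonmap $(basename "$taxonmap")"
# e.g. diamond makedb --in nr --db nr --taxonmap "$taxonmap" \
#        --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
```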

v1.4.0 : no tmp .bout files

Dear genEra developers,

Describe the bug
The CDS of A. thaliana that I am using won't be dated.
I have already succeeded in using genEra v1.4.0 with a subset of H. sapiens CDS.
Now, using the attached FASTA, it does not work, even when providing 500 GB of RAM for 262 GB of results.
Note that I performed the same analysis (same command) with v1.2.0 and it went perfectly fine (except that it took longer, of course).
I would have bet the problem was caused by the "|" character in the middle of the CDS names, but it worked with the previous version.

To Reproduce
Steps to reproduce the behaviour:

genEra \
-t 3702 \
-q CDS/cds_from_genomic.faa \
-b /diamonddb/NR_DB/nr \
-n 75 \
-r ncbi_lineages_2023-07-12.csv

Expected behaviour
Instead, the ages are not assigned:
#gene phylostratum rank taxonomic_representativeness
lcl|NC_000932.1_cds_NP_051037.1_48181 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051038.1_48226 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051039.1_48182 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051040.2_48183 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051041.1_48184 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051042.1_48185 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051043.1_48186 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051044.1_48187 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051045.1_48188 Absent from the DIAMOND/MMseqs2 results NA NA

Screenshots or code
Here are the last lines of the err file (16 MB of similar 'No such file or directory' lines):

awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_177334.1_10947.bout': No such file or directory
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_565027.1_10948.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003071.7_cds_NP_001323584.1_19320.bout': No such file or directory
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
...
[mclIO] writing </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
.......................................
[mclIO] wrote native interchange 48227x48227 matrix with 4144755 entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
[mclIO] wrote 48227 tab entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.tab>
[mcxload] tab has 48227 entries
[mclIO] reading </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mcl>
.......................................
[mclIO] read native interchange 48227x8569 matrix with 48227 entries

Session info:

singularity build --fakeroot genEra_intraS.sif docker://josuebarrera/genera
cds_from_genomic.tar.gz

Paul
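The missing files in the log all have names built from CDS IDs containing "|" (e.g. tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout). "|" is a legal character in Linux file names, so it is not certain that it causes the failure, but because it is special to the shell it is a plausible suspect if a command string is evaluated somewhere. To test that idea, a minimal sketch with a hypothetical sanitize_pipes function that replaces "|" with "_" in FASTA header lines only:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical workaround: replace every "|" in FASTA header lines with "_",
# leaving sequence lines untouched.
sanitize_pipes() {
  sed -E '/^>/ s/\|/_/g' "$1"
}

# Toy FASTA using one of the IDs listed above (the sequence is made up).
demo="$(mktemp)"
printf '%s\n' '>lcl|NC_000932.1_cds_NP_051037.1_48181' 'MKTAYIAKQR' > "$demo"
sanitized="$(sanitize_pipes "$demo")"
rm -f "$demo"
printf '%s\n' "$sanitized"
```

Because two IDs differing only by "|" versus "_" would collide after this substitution, checking the sanitized headers for duplicates (sort | uniq -d) before running genEra would be prudent.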

`ncbitax2lin` memory fix

Hi @josuebarrera,

Two things:

You note in the README that you have uploaded a compressed lineages file for people having memory issues with ncbitax2lin, but I can't find the link or the file itself. Have I overlooked something?
I was able to avoid the ncbitax2lin memory issue by modifying the fmt.py file from ncbitax2lin to reduce the number of workers in the call to concurrent.futures.ProcessPoolExecutor() (which defaults to the number of CPUs on the system): I added max_workers=<n>, where <n> is a lower number, e.g.:

with concurrent.futures.ProcessPoolExecutor() as executors:
becomes
with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executors:
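The one-line change above can also be applied with sed, which avoids editing the installed file by hand. A sketch, assuming fmt.py contains the exact string ProcessPoolExecutor() as quoted above; limit_workers is a hypothetical helper, and the path to fmt.py depends on where ncbitax2lin is installed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: cap the number of worker processes used by ncbitax2lin by
# rewriting "ProcessPoolExecutor()" to "ProcessPoolExecutor(max_workers=N)".
# A backup of the original file is kept next to it with a .bak suffix.
limit_workers() {
  # $1: path to fmt.py, $2: maximum number of worker processes
  sed -i.bak "s/ProcessPoolExecutor()/ProcessPoolExecutor(max_workers=$2)/" "$1"
}

# Demo on a throwaway file containing the relevant line.
demo_dir="$(mktemp -d)"
printf '%s\n' '    with concurrent.futures.ProcessPoolExecutor() as executors:' > "$demo_dir/fmt.py"
limit_workers "$demo_dir/fmt.py" 2
patched="$(cat "$demo_dir/fmt.py")"
rm -rf "$demo_dir"
printf '%s\n' "$patched"
```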

DIAMOND search step segfault

First, thanks for the software! I've run a similar (less thorough) analysis "by hand" in the past, and it's exciting to have this sort of pipeline made more accessible.

I'm trying to run GenEra following the installation setup described in the documentation, using the RefSeq human proteome against a local copy of the nr database. I'm running on a small server with 32 GB of RAM (could that be part of the issue?), Ubuntu 22.10, DIAMOND v2.1.6 and GenEra v1.1.0.

The first step runs for ~6 hours and then produces the following error:

/home/glarue/miniconda3/envs/genEra/bin/genEra: line 701: 131901 Segmentation fault      (core dumped) diamond blastp --${SENSITIVITY} --query ${QUERY_FASTA} --db ${NR_DB} --outfmt 6 qseqid sseqid evalue bitscore staxids --evalue ${EVALUE} --max-target-seqs 0 --threads ${THREADS} --out ${TMP_PATH}/${NCBITAX}_Diamond_prefiltered_results.bout --quiet ${DIAMONDOPTS}

  ERROR: DIAMOND didn't run properly, verify that the database was built correctly
  Exiting

The _results.bout file seems to have been created successfully (~500 MB), as has an .abc file, but the _prefiltered_results.bout file is empty. My first thought was that this could be a memory issue, but the search itself seems to have completed successfully, so I'm not sure that makes sense.

Any help is appreciated. Thanks!
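DIAMOND's memory use is governed mainly by its --block-size (-b) and --index-chunks (-c) options: according to the DIAMOND documentation, memory use grows roughly in proportion to the block size, and a larger number of index chunks lowers it at some cost in speed. One way to test the memory hypothesis on a 32 GB machine is to rerun the same search by hand with a smaller block size and see whether it still crashes. The sketch below only assembles and prints such a command (a dry run); the file names, e-value, thread count, sensitivity mode and the values 1 and 8 are placeholders, not genEra's defaults:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry run: rebuild the DIAMOND call from the log with lower-memory
# settings. All file names and numeric values below are placeholders.
query="human_refseq_proteome.faa"
db="nr"
out="9606_Diamond_prefiltered_results.bout"

# Smaller --block-size means less memory per pass; more --index-chunks means
# less memory for the seed index (both values are placeholders to tune).
cmd=(
  diamond blastp --sensitive
  --query "$query" --db "$db"
  --outfmt 6 qseqid sseqid evalue bitscore staxids
  --evalue 1e-5 --max-target-seqs 0 --threads 8
  --block-size 1 --index-chunks 8
  --out "$out"
)

# Print the command instead of running it; replace with "${cmd[@]}" to execute.
printf '%q ' "${cmd[@]}"
echo
```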

Comparison of protein families from multiple species

Thank you for this intriguing tool. I have predicted the functional annotations (amino acid sequences) of different microbial eukaryotes from genomic data, and I would like to compare their gene families. Do you think this is within the scope of the tool, for example comparing the protein sequences of multiple species?

-a option: can we concatenate custom output with a precomputed BLAST against nr?

Hi,

I mentioned this in a comment, but it seems better to ask a proper question about it:

Let's imagine I launched genEra on a single proteome (-q), and now I would like to add some extra proteomes with -a.
As step 1 took a long time (diamond blastp vs nr), I would rather not rerun it.

Is it possible to launch
diamond blastp --query single_proteome --db extra_proteomes_db -o extra_Diamond_results.bout --outfmt 6 qseqid sseqid evalue bitscore --evalue ${EVALUE} --max-target-seqs 0 # and then,
cat extra_Diamond_results.bout ${TAXID}_Diamond_results.bout > tmp
mv tmp ${TAXID}_Diamond_results.bout
genEra ... -a extra_prot.tsv -p ${TAXID}_Diamond_results.bout
?

From what I have understood, it would be fine...
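One thing worth checking before passing the concatenated file with -p: the custom search above requests four columns (qseqid sseqid evalue bitscore), while the DIAMOND command genEra runs against nr, quoted in the segfault issue above, also writes staxids, so the merged file would mix 4- and 5-column rows. Whether genEra tolerates that together with -a is not stated here. A quick sketch, with a hypothetical column_counts helper, that tallies rows by their number of tab-separated fields:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: print "<number of fields> <number of rows>" for a
# tab-separated file, sorted by field count.
column_counts() {
  awk -F'\t' '{ c[NF]++ } END { for (n in c) print n, c[n] }' "$1" | sort -n
}

# Toy merged file (made-up rows): two 5-column nr hits, one 4-column custom hit.
demo="$(mktemp)"
printf 'q1\tXP_1\t1e-10\t50.1\t9606\nq1\tXP_2\t1e-08\t45.0\t10090\nq1\tCUSTOM_1\t1e-06\t42.0\n' > "$demo"
counts="$(column_counts "$demo")"
rm -f "$demo"
printf '%s\n' "$counts"
```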

Docker image with version tag

Is your feature request related to a problem? Please describe.
I would like to integrate this tool into Galaxy. To specify the tool version when using the Docker images, an image tagged with the corresponding tool version would be needed.

Describe the solution you'd like
It would be great if you could include a docker image with the tag corresponding to the tool version.
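For reference, publishing a versioned tag alongside latest usually takes one extra tag and push per release. A dry-run sketch, assuming the image is built from a Dockerfile in the repository root and using 1.4.0 (the version mentioned elsewhere on this page) as an example; the tag format is an assumption and the commands are only printed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical release step (dry run): build the image, tag it with the release
# version as well as "latest", and push both tags. Commands are only printed.
image="josuebarrera/genera"
version="1.4.0"            # example release number; tag format is an assumption
versioned_ref="${image}:${version}"

release_cmds=(
  "docker build -t ${image}:latest ."
  "docker tag ${image}:latest ${versioned_ref}"
  "docker push ${versioned_ref}"
  "docker push ${image}:latest"
)
printf '%s\n' "${release_cmds[@]}"
```

A Galaxy tool wrapper could then pin the container as josuebarrera/genera:1.4.0 instead of relying on latest.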

HPC tuning

Hi Josue,

I'm Paul, we met at the SMBE poster session last month.
Thanks again for genEra!

I suppose HPC clusters are the most suitable platform for your tool, at least for processing entire, large genomes, so here are two questions:
1 - Do you have a rough idea of genEra's memory consumption? In other words, is there any point in providing it with hundreds of GB of RAM?
2 - Regarding the potential read/write bottleneck: can we assume that most of the computation is done in the temporary directory (-x argument), or are many files read from or written to the working directory directly? I was considering keeping my working directory on a storage space and putting only the temporary directory directly on the HPC node.

I'm starting with human, mouse and co. I'll let you know how it goes ;)
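For question 2, a common pattern is to point the temporary directory (-x) at node-local scratch and keep inputs and outputs on shared storage. A sketch with a hypothetical pick_scratch helper; it assumes the scheduler exports a node-local TMPDIR (this is site-specific) and only prints the genEra command, with placeholder input names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create a private working directory on node-local scratch,
# assuming the scheduler sets TMPDIR to it; otherwise fall back to /tmp.
pick_scratch() {
  local base="${TMPDIR:-/tmp}"
  mktemp -d "${base%/}/genera_XXXXXX"
}

scratch="$(pick_scratch)"

# Placeholder inputs on shared storage; -x sends genEra's temporary files to scratch.
genera_cmd=(genEra -q proteome.fa -t 9606 -b nr -x "$scratch")
printf '%q ' "${genera_cmd[@]}"
echo

# Once the real run has finished, the scratch directory can be removed.
rm -rf "$scratch"
```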

Trouble with installation

Hi,
I would love to give this a try, but I am currently struggling with the installation. This bit doesn't work for me: I think the file somehow gets corrupted, and I cannot gunzip it:

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr{.gz,.gz.md5} && md5sum -c *.md5
gunzip nr.gz

Thanks!
Laura
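The NCBI nr archive is very large, so an interrupted or truncated download is a common cause of gunzip errors. A minimal sketch with a hypothetical verify_gz helper that checks the archive against its .md5 file and then runs gzip -t (which tests integrity without extracting) before gunzip is attempted; if the check fails, wget -c can resume the partial download instead of restarting it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if the archive matches its checksum file and
# gzip can read it to the end. $1: the .gz file, $2: its .md5 file, which lists
# the archive by file name, so the check is run from the archive's directory.
verify_gz() {
  ( cd "$(dirname "$1")" && md5sum -c "$(basename "$2")" ) && gzip -t "$1"
}

# Demo on a tiny archive instead of nr.gz.
demo_dir="$(mktemp -d)"
printf 'example\n' | gzip > "$demo_dir/demo.gz"
( cd "$demo_dir" && md5sum demo.gz > demo.gz.md5 )
if verify_gz "$demo_dir/demo.gz" "$demo_dir/demo.gz.md5" > /dev/null; then intact=yes; else intact=no; fi

# Simulate an interrupted download by truncating the archive.
head -c 10 "$demo_dir/demo.gz" > "$demo_dir/partial" && mv "$demo_dir/partial" "$demo_dir/demo.gz"
if verify_gz "$demo_dir/demo.gz" "$demo_dir/demo.gz.md5" > /dev/null 2>&1; then truncated=yes; else truncated=no; fi
rm -rf "$demo_dir"
echo "intact archive passes: $intact; truncated archive passes: $truncated"
```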

Possible bug with the Ensembl protein format

Dear Josué,
Hello, I ran into another bug. When I use an Ensembl genome, such as the mouse genome GRCm38.92, it doesn't work. I had already used this GRCm38.92 file to run DIAMOND and it worked, so this may be a genEra bug. This is the output I got (tax10090.txt):

Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed.
I tried adding '--ignore-warnings' to the end of the command:
nohup time genEra -q /mnt/data4/disk/yxj/EmbroGenesis/Ref/Sequences/Mm.fa -t 10090 -b /mnt/data4/disk/yxj/diamond_nr/nr -d /mnt/data4/disk/yxj/diamond_nr/taxdump --ignore-warnings > /mnt/data4/disk/yxj/result/tax10090_2.txt &
and I got: ERROR: One or more invalid arguments.
Then I tried another version, GRCm38.86, and got the same error.
By the way, the NCBI RefSeq format works well.
Best regards,
Xujiang
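The DIAMOND error suggests that the Ensembl file passed with -q contains nucleotide sequences: Ensembl publishes both cDNA/CDS FASTA files and peptide (pep) FASTA files for each genome, and only the latter are protein sequences. The --ignore-warnings flag named in the error is a DIAMOND option, which genEra itself rejected as an invalid argument. A hedged sketch with a hypothetical looks_like_dna helper to check a FASTA file before running genEra:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: print "dna" if every sequence line of a FASTA file uses only
# the letters A, C, G, T, U or N (any case), and "protein" otherwise.
looks_like_dna() {
  awk '
    /^>/ { next }
    {
      seq = toupper($0)
      gsub(/[ \t\r]/, "", seq)           # ignore stray whitespace
      gsub(/[ACGTUN]/, "", seq)          # drop nucleotide letters
      if (length(seq) > 0) other = 1
    }
    END { print (other ? "protein" : "dna") }
  ' "$1"
}

# Toy files with made-up sequences.
nt="$(mktemp)"; aa="$(mktemp)"
printf '>tx1\nACGTNNACGTacgt\n' > "$nt"
printf '>p1\nMKTAYIAKQRQISFVK\n' > "$aa"
nt_result="$(looks_like_dna "$nt")"
aa_result="$(looks_like_dna "$aa")"
rm -f "$nt" "$aa"
echo "nucleotide file: $nt_result; protein file: $aa_result"
```

If the check reports dna for Mm.fa, switching to the corresponding Ensembl pep file (and, as with the RefSeq run, keeping one protein per gene if desired) would be the first thing to try.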