abacus-gene / paml

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. Please report only **technical issues** on this repository (e.g., compiling, programs abort or do not run at all, etc.). Problems with input data and general questions should be posted at https://groups.google.com/g/pamlsoftware?pli

License: GNU General Public License v3.0

C 99.19% Makefile 0.12% Visual Basic 6.0 0.69%

paml's Introduction

Phylogenetic Analysis by Maximum Likelihood

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. The programs are written in ANSI C.

paml's Issues

# seqs in tree file does not match. Read as the nexus format.

Hello,
I found that paml 4.10.6 behaves differently from paml 4.9j. I get the error

seqs in tree file does not match. Read as the nexus format.

in 4.10, but not in 4.9. Is there a reason why?
Thanks - N. Grundmann
AZI2.tre.txt
AZI2_U.fasta.txt
codeml.ctl.txt

Error after update to 4.10.7

Hi,

I've encountered an error in codeml after updating to 4.10.7, where the program runs almost to completion but then errors out with "error: end of tree file". The example below works fine in 4.9j, but does not work with 4.10.7:

controlfile:
      seqfile = test_codeml_alignment.phy
      outfile = test_codeml_null.out
     treefile = test_codeml_tree.nwk
        noisy = 3
      verbose = 1
      seqtype = 1
    CodonFreq = 2
        ndata = 1
        clock = 0
        model = 2
      NSsites = 2
        icode = 0
    fix_omega = 1
        omega = 1
    cleandata = 0

alignment:
6 87 sequence_1 ATGCAATCATATGCTTCTGCCATGTTAAGCGTATTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGAATATTCCTGCTCTCCGG sequence_2 ATGCAATCATATGCTTCTGCCATGTTAAGCGTATTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGAATATTCCTGCTCTCCGG sequence_3 ATGCAATCATATGCTTCTGCCATGTTAAGCGTGTTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGCATATTCCTGCTCTCCGG sequence_4 ATGCAATCATATGCTTCTGCCATGTTAAGCGTATTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGAATATTCCTGCTCTCCGG sequence_5 ATGCAATCATATGCTTCTGCCATGTTAAGCGTATTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGAATATTCCTGCTCTCCGG sequence_6 ATGCAATCATATGCTTCTGCCATGTTAAGCGTATTTAACACTGATGGTTACAGTCCAGCTGCGCAACAGAATATTCCTGCTCTCCGG

tree:
(sequence_6,sequence_3,((sequence_2,sequence_5),(sequence_4,sequence_1 #1)));

I'll just keep running 4.9j for now, but would be great to know if there's something that I'm missing here!

Thanks!

Axel
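When a tree-file error appears, one quick sanity check is to confirm that the names in the tree and in the alignment agree. The following is a minimal sketch, not part of PAML: the function names are hypothetical, it assumes a sequential alignment with one sequence per line, and it treats `#1`-style marks as branch labels to be ignored. It does not reproduce how codeml itself parses these files.

```python
import re


def newick_leaf_names(newick):
    """Return the set of leaf labels found in a Newick string."""
    # Remove quoted annotations and branch/clade marks such as '#1' or '$1'.
    cleaned = re.sub(r"'[^']*'", "", newick)
    cleaned = re.sub(r"[#$]\s*\d+(\.\d+)?", "", cleaned)
    # A leaf label directly follows '(' or ',' and stops at a delimiter.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", cleaned))


def alignment_names(text):
    """Return sequence names, assuming one 'name sequence' pair per line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs = int(lines[0].split()[0])
    return {ln.split()[0] for ln in lines[1:1 + n_seqs]}


def compare_names(newick, alignment_text):
    """Return (names only in the tree, names only in the alignment)."""
    tree = newick_leaf_names(newick)
    aln = alignment_names(alignment_text)
    return sorted(tree - aln), sorted(aln - tree)


if __name__ == "__main__":
    tree = "(sequence_6,sequence_3,((sequence_2,sequence_5),(sequence_4,sequence_1 #1)));"
    aln = "2 4\nsequence_1 ATGC\nsequence_2 ATGC\n"
    print(compare_names(tree, aln))
```

Two empty lists mean the name sets agree, which rules out one common cause of tree-reading errors but says nothing about version-specific parsing differences.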

set model for amino acid sequence in mcmctree

Hello,
I tried to use mcmctree to estimate the divergence times among several species, but I couldn't find any tutorials concerning the model settings for amino acid sequences.
The best-fit model for my data is JTT+G4+F, as selected by modeltest-ng.
How should I set the model in the ctl file?

      seed = -1
   seqfile = 125_single_copy_gene.phy
  treefile = input_tree.txt
  mcmcfile = mcmc.txt
   outfile = representative_approx

     ndata = 1
   seqtype = 2  * 0: nucleotides; 1:codons; 2:AAs
   usedata = 3    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
     clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
   RootAge = <2000  * safe constraint on root age, used if no fossil for root.

     model = 0    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85  **(How should I set this model?)**
     alpha = 0.5    * alpha for gamma rates at sites
     ncatG = 5    * No. categories in discrete gamma

 cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

   BDparas = 0.1 0.1 0.1    * birth, death, sampling

kappa_gamma = 6 2 * gamma prior for kappa
alpha_gamma = 1 1 * gamma prior for alpha
rgene_gamma = 2 2000 1 * gamma prior for overall rates for genes
sigma2_gamma = 1 10 1 * gamma prior for sigma^2 (for clock=2 or 3)
finetune = 1: 0.1 0.1 0.1 0.1 0.1 0.1 * auto (0 or 1) : times, musigma2, rates, mixing, paras, FossilErr
print = 1
burnin = 3000000
sampfreq = 100
nsample = 100000

*** Note: Make your window wider (100 columns) before running the program.

Thanks.

Inconsistent LRT results for site models after altering parameters

Hello PAML Community,

I am currently working with an alignment of 54 primate sequences, using codeml NSsites models 0, 1, 2, 3, 4, 7, and 8. After upgrading from PAML version 4.10.5 to 4.10.7, I noticed that in the control file distributed with the package (v. 4.10.7), the three parameters "cleandata," "fix_blength," and "method" are commented out with asterisks. We want to use “cleandata=1” (because a few sequences have missing data in a couple of places) and “fix_blength=0” (because our tree topologies are reliable but our branch lengths are not). We have no preference with regard to “method”. We were unsure what the default settings are when the three parameters are preceded by * and thus ignored, so we ran some tests. We also wanted to check convergence against the previous runs (v. 4.10.5), so we modified the three parameters and found that changing them made significant differences in the results of the LRT.
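To see exactly which options are active in two control files, it can help to list them programmatically. This is a minimal sketch with a hypothetical helper, `parse_ctl`, that relies only on the convention visible in the control files quoted on this page: everything after a `*` on a line is a comment. It cannot tell you which default codeml falls back to when an option is commented out.

```python
def parse_ctl(text):
    """Return {option: value} for the active (uncommented) lines of a control file."""
    options = {}
    for raw in text.splitlines():
        # Everything from the first '*' onward is treated as a comment.
        line = raw.split("*", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def diff_ctl(text_a, text_b):
    """Return {option: (value_a, value_b)} for options that differ or are missing."""
    a, b = parse_ctl(text_a), parse_ctl(text_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    run1 = "cleandata = 1\nfix_blength = 0\nmethod = 1\n"
    run2 = "cleandata = 1\nfix_blength = 0\n*method = 1\n"
    print(diff_ctl(run1, run2))
```

A value of `None` in the diff means the option is absent or commented out in that file, which makes it easy to confirm that two runs differ only in the intended setting.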

Table 1: LRT results of tests with different settings.

| Run | Version | Settings for cleandata, fix_blength, and method | LRT of M1 vs M2 | LRT of M7 vs M8 |
|---|---|---|---|---|
| 1 | 4.10.7 Linux | cleandata = 1, fix_blength = 0, method = 1 | insignificant (LR = -0.06244) | insignificant (LR = -0.73016) |
| 2 | 4.10.7 Linux | cleandata = 1, fix_blength = 0, `*method = 1` | significant (LR = 10.185022) | significant (LR = 20.272906) |
| 3 | 4.10.7 Linux | cleandata = 1, fix_blength = 0, method = 0 | insignificant (LR = 5.331764) | significant (LR = 12.917131999998674) |

*The other parameters are all kept the same for the three runs, except for cleandata, fix_blength, and method. As an example, the control file for run 1 is attached below, and the aligned sequences and tree file we used are attached in the zip file, along with the control files for all three runs.

Figure 1: The control file of run 1

Questions:

1. Since the parameter method = 1 specifies a newer algorithm than method = 0, we thought it made sense to set it to 1. However, changing it from 0 to 1 (runs 1 and 3, above) changes one of the LR values significantly, which concerns me. What sorts of issues with the data, or real phenomena, could lead to method = 0/1 making such a big difference? What other tests could I run to gain more confidence in the results?
2. Simply ignoring the three parameters (as in the control file that comes with the package) also changes the LRT from insignificant (with the parameters we chose, run 1) to significant. Given our test runs above, this seems to be due to “method=1” vs “method=0”, so this question is probably related to the previous one: in such situations, where a simple change in parameters leads to drastic changes in the results, how should one proceed? Note: we have also run different methods that estimate dN/dS, such as the FEL and MEME methods from hyphy, and got significant results and some sites with evidence for dN/dS>1. We are aware that those models/methods have many differences, but perhaps this helps in solving this mystery.

We are very grateful for any insights or suggestions that anyone has.
Best regards
paml_files.zip
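For reference, the significance calls for the LR values reported in this issue can be reproduced with a few lines of Python. This sketch rests on two assumptions that are not stated in the post: that each reported LR is the usual statistic 2(lnL1 - lnL0), and that it is compared with a chi-square distribution with 2 degrees of freedom, as is commonly done for M1 vs M2 and M7 vs M8. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no external library is needed. The function name is hypothetical.

```python
import math


def lrt_pvalue_df2(statistic):
    """P-value of an LRT statistic against a chi-square distribution with 2 df."""
    if statistic < 0:
        # For nested models the alternative cannot fit worse than the null,
        # so a negative statistic points to an optimisation problem.
        raise ValueError("negative LRT statistic: check convergence of both models")
    return math.exp(-statistic / 2.0)


if __name__ == "__main__":
    for label, lr in [("run 2, M1 vs M2", 10.185022), ("run 3, M1 vs M2", 5.331764)]:
        print(label, round(lrt_pvalue_df2(lr), 4))
    try:
        lrt_pvalue_df2(-0.06244)  # run 1, M1 vs M2
    except ValueError as err:
        print("run 1:", err)
```

The negative values in run 1 are worth noting on their own: with nested models, a negative statistic means at least one of the two fits did not reach its maximum, so rerunning with different starting values is a reasonable first check.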

segmentation fault

I get a 'segmentation fault' error with the 4.10 version. I can successfully run the ORF1b data, but get this error with the ORF1a data. There are about 500 sequences in total, and the same tree is used for the ORF1b and ORF1a data.

Abrupt termination

Hi,

I have been using PAML / codeml with success for the most part, but have run into issues with a small number of analyses (~ 100 out of 5000).

The CTL file looks like this:

      seqfile = OG6203_CDS.paml           
     treefile = TREE.nwk                         
      outfile = OUT

        noisy = 3
      verbose = 1

      seqtype = 1
        ndata = 1
        icode = 0
    cleandata = 0

        model = 2
          NSsites = 0
    CodonFreq = 2
          estFreq = 0
        clock = 0
    fix_omega = 0
        omega = 0.5

and I'm varying the input tree file and the output name.

Case 1
If I run codeml using as input a tree file with 4 trees, it ends up abruptly:

TREE #  4:  (((((((((10, 15), 7), (((8, 3), 9), ((14, 17), 13))), 11), 1), 2), ((((5, 19), 18), 6), 16)), 12), 4);   MP score: -1
This is a rooted tree.  Please check!

In some of the other failed runs, it may end on TREE # 3 (it is not always on the same tree).

Case 2
Running codeml using 2 trees (Tree #4 and another one) finishes successfully.

Case 3
Testing with 3 trees, and putting Tree #4 at the beginning finishes successfully.

Case 4
Testing with 4 trees, and putting Tree #4 at the beginning also finishes successfully.

In the attached file you will find the corresponding inputs used at each case. Re-running the first case always results in the software stopping at the same point.

Currently using paml-4.10.6

Any input regarding this issue will be greatly appreciated!

Thanks.

error for run: codeml codem.ctl

Hello, when I run codeml codeml.ctl, it shows:
ns = 7 ls = 1643893
Reading sequences, sequential format..
Reading seq # 1: Arur
Reading seq # 2: Daca
Reading seq # 3: Dape
Reading seq # 4: Lesa

Error: EOF?.

Please tell me how I should solve this problem. Thank you very much.
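An end-of-file error while reading sequences often means that a sequence is shorter than the length declared in the header. The sketch below is a hypothetical stand-alone checker, not PAML's own reader: it assumes a plain sequential file whose first line holds the number of sequences and the sequence length (the `ns` and `ls` values echoed in the log above), with each name followed by its sequence on one or more lines.

```python
def check_sequential_alignment(text):
    """Return None if every sequence has the declared length, else a message
    describing the first problem found."""
    lines = text.splitlines()
    ns, ls = (int(x) for x in lines[0].split()[:2])
    pos = 1
    for index in range(ns):
        # Skip blank lines between records.
        while pos < len(lines) and not lines[pos].strip():
            pos += 1
        if pos >= len(lines):
            return "sequence %d: file ended before a name was found" % (index + 1)
        parts = lines[pos].split(None, 1)
        name = parts[0]
        seq = "".join(parts[1].split()) if len(parts) > 1 else ""
        pos += 1
        # Keep reading continuation lines until the declared length is reached.
        while len(seq) < ls and pos < len(lines):
            seq += "".join(lines[pos].split())
            pos += 1
        if len(seq) != ls:
            return "%s: %d characters read, header declares %d" % (name, len(seq), ls)
    return None


if __name__ == "__main__":
    print(check_sequential_alignment("2 4\nArur  ACGT\nDaca  AC\n"))
```

Because a short sequence causes the following lines to be read as its continuation, only the first reported problem is reliable; fix it and run the check again.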

High Sequence Count (>2000) Causes Extended Computation Time in codeml

Hi,
Thank you very much for providing such outstanding software. I analysed over 2000 sequences and performed site model, branch model, and branch-site model analyses using codeml. The computation has been running for a week, and I've observed that it operates as a single thread on Linux.
I wanted to ask whether codeml supports multi-threading, as enabling this feature could significantly reduce the computation time, especially when dealing with a large number of sequences.
Thanks.

MCMCtree fails to summarise the mcmc sample or to create FigTree tree file

After the MCMC iteration is finished the program may fail to summarise the MCMC sample and sometimes with an error message such as "Abort trap: 6" on macos or .

MCMCTREE in paml version 4.10.7, June 2023 Reading main tree. error: raise NS?(base)

After constructing a Maximum Likelihood (ML) tree using the amino acid sequences of 16 single-copy genes from 808 microbial genomes, I conducted a molecular clock analysis using this ML tree and the amino acid alignment file. However, when using the complete dataset, the following error occurred, while using data from 500 genomes allowed the process to run normally.

MCMCTREE in paml version 4.10.7, June 2023

Reading options from mcmctree.ctl..
finetune is deprecated now.

Reading main tree.

error: raise NS?(base)

How can I solve this problem? Additionally, which parameters in the mcmctree.ctl file should I pay attention to? I have seen some articles using the WAG protein substitution model, but it seems unavailable in this version. Here is my mcmctree.ctl file.
mcmctree.txt

Linux packaging

Not an issue; I just want to let you know that I'm packaging PAML for openSUSE at https://build.opensuse.org/package/show/home:vojtaeus/paml Everything works well, so thank you. :-)

Same lnLs in Model A and corresponding null model of the Branch-Site models

hi, guys!
I'm having trouble with codeml for positive selection
Model A:

Null model:

But the results of both are always the same

crazy!

Thanks,
Jinloong

Root-to-tip dN/dS

Hi,
I am using codeml and calculated the dN/dS of an alignment with a user tree. How do I get the root-to-tip parameters for all the tips ?
Thank you

How to identify positive selected sites using Branch site model

Hello, Professor
I am running /data/01/user157/software/paml4.8/bin/codeml test.ctl to identify positively selected genes. I can see the positively selected sites in the foreground Hgl clade, which is marked by the blue box in the picture. However, the codeml output shows no positively selected sites for the gene.

I have uploaded the input files: sequence file (cds.paml), ctl file (branch-site.Hgl.nofix.ctl), tree file (tree.Hgl) and the output file (branch-site.Hgl.nofix.mlc) for your viewing. I do not really understand the BEB test; could you give me any suggestions?
Thank you very much!
branch-site.Hgl.nofix.ctl.txt
branch-site.Hgl.nofix.mlc.txt
cds.paml.txt
tree.Hgl.txt

help! my mcmctree doesn't work!

When I use mcmctree to analyze protein sequences and import in.BV, I get an error: error strange calibration. Where could the problem be? I have been modifying the tree file for several days.

pamp "err:PathwayMP 0 != 4".

mcmctree/tipdate analysis is broken

The program is not reading the date information from the sequences and prints out 0 dates for sampled sequences.

ndata of MCMCtree analysis using Single Copy orthologs in multiple genomes

Dear Dr. Ziheng Yang,

Hello, I am Hyeonseon Park, currently conducting research on the genome sequencing and comparative genomics of plants in the Poaceae family. To infer the divergence times of these species, I am planning to perform an MCMCTree analysis. I have identified around 300 single-copy orthologs across 11 species.

I am wondering whether I should set ndata=300 in the control (.ctl) file for this analysis. I have noticed in other research papers that multiple single-copy orthologs are concatenated and analyzed with ndata=1. Could you please explain the difference between these approaches and advise on which method is more appropriate?

Thank you for developing such insightful software for evolutionary biology research.

Best regards,

Hyeonseon Park

Mcmctree not running when using a Mac M1 silicon

Hello everyone! :)

I just installed PAML on my Mac (Mac mini (M1, 2020)). Baseml and the other programs work, but mcmctree gives me an error related to hardware incompatibility: "zsh: illegal hardware instruction /Users/palomaruizdd/paml/bin/mcmctree", after reading the tree correctly.

I have tried using Rosetta 2 (and I have checked the path and the permissions, which are fine), and I have compiled and tried two different versions of PAML (4.8 and 4.10), but the error persists. Any clues? Is there a version of mcmctree compatible with the Mac M1?

I previously used it by installing it with miniconda, but it no longer seems to be available.

Thank you for all the help!!
Paloma

bump version number

hi Ziheng,

I just stumbled on this git repo - nice!

I downloaded v4.10.6.tar.gz from the 'releases' section of the git repo just now, compiled it, and noticed that the version info is out of date when I run codeml. I can see it is out of date in the src/paml.h file, too:

#define pamlVerStr "paml version 4.10.0, September 2020"

(the precompiled version doesn't work on my system, but I imagine that will show the wrong version info too)

all the best - hope you are well,

Janet

Dr. Janet Young

Malik lab
http://research.fhcrc.org/malik/en.html

Division of Basic Sciences
Fred Hutchinson Cancer Center
1100 Fairview Avenue N., A2-025,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 4512
email: jayoung ...at... fredhutch.org

Dating soft bound broken

The examples in examples/TipDate.HIV2/ and examples/TipDate.FluH1/ do not work. When trying any of the five mcmctree control files in those directories, the following error is printed on the screen:

Trace/BPT trap: 5

and the MCMC run is aborted before starting. This was tested on a MacBook air M2.

$ uname -a
Darwin roraima.home 23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:51:37 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T8112 arm64

Some of the control files were tested with mcmctree 4.9j and they appear to be running correctly.

# seqs in tree file does not match. Read as the nexus format.

Codeml from PAML 4.10.6 is not correctly processing the site models. Using our examples or your GitHub examples always leads to the same error: "# seqs in tree file does not match. Read as the nexus format." This error is also reproduced using codeml from PAML 4.10.3. However, the same data sets are processed correctly by codeml from 4.10.0 or older.
Please fix this PAML 4.10 issue.
Best regards, Juergen Schmitz

AZI2_fasta.txt
AZI2_tree.txt
codeml_ctl.txt

‘Error: strange calibration?.’ in the second round of mcmctree

I used a tree file with calibrations for the first round of mcmctree, and generated out.BV.
But when I use the out.BV as in.BV and the same tree file for the second round of mcmctree, it fails and reports ‘Error: strange calibration?.’

The tree file is:
30 1
(((Dre,Cau),((Ots,Omy),((Oni,(Pny,Nbr)'B(14.1,30.1)'),(Osi,Ola)))),((Cpi,Gga),(((Ptr,Hsa),(((Fca,(Pti,Ple)'B(11.6,14.7)'),((Clud,(Cluf,Clufgsd)),Vvu)),((Cwa,Ssc),((Bta,(Bmu,Bgr)),(Chi,Oar))))),(Mau,(Mmu,Mca)))));

CodeML Site model err

The error is "Warning: Hessian matrix may be unreliable for zero branch lengths", and then the program is killed.
I do not know what to do about it.
Thanks for your help!

problems about "multiple NSsites models"

Hi Professor:
When I run multiple NSsites models in one go by specifying several models on the NSsites line in codeml.ctl (NSsites = 0 1 2 3 7 8), only the results for NSsites=0 are output, and "# seqs in tree file does not match. Read as the nexus format" is printed on the screen.
How can I run multiple NSsites models in one analysis?
Thank you very much for your reading and replying!

model 8 - newer versions don't always converge - choosing extreme dN/dS instead

hi there,

I recently upgraded from a quite old version (4.9a) to version 4.10.6.

I noticed that for some of our 'favorite' genes, the newer PAML version does NOT give robust evidence for sitewise positive selection (M8 versus M8a or M7, also M2 versus M1) when old PAML did. One such gene is MxA where we've got good experimental evidence for functional differences at the sites PAML picks out with BEB (nice!).

I think I have a small clue about what's going on. Bottom line is that the older version is better at converging on something like this for the positively selected class in M8 and M2 - I'll call this result A:

(p1 =   0.02162) w =   4.94585

but the newer version sometimes (not always) chooses something like this instead - I'll call this result B:

(p1 =   0.00082) w = 999.00000

From a biology standpoint, result A makes more sense - result B is too extreme to make practical sense (p1 often works out to less than one codon). Also from a numerical standpoint, result A seems better supported:

Result B (newer version) has a much worse ML (no better than M7 or M8a) than result A (significant LRT against with M8-8a, M8-7 and M2-1).
When I see result A, it's usually robust to using different codon models (2 or 3) as well as different starting omega (0.4 or 3). When I see result B, it's less robust - sometimes changing codon model actually rescues it to result A (but because the finding changes with different starting parameters, we would have thrown out that result).
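When screening many genes, estimates like result B can be flagged automatically so that those runs are repeated with other starting values. The sketch below is a hypothetical helper: the regular expression matches only lines shaped like the two examples quoted above, the bound of 999 is simply the value seen in result B, and `n_codons` is an assumed input giving the alignment length in codons.

```python
import re

# Matches lines such as "(p1 =   0.00082) w = 999.00000".
ESTIMATE = re.compile(r"\(p1\s*=\s*([0-9.]+)\)\s*w\s*=\s*([0-9.]+)")


def flag_boundary_estimates(text, n_codons, w_upper=999.0):
    """Return a list of (p1, w, reasons) for estimates that look suspicious."""
    flagged = []
    for match in ESTIMATE.finditer(text):
        p1, w = float(match.group(1)), float(match.group(2))
        reasons = []
        if w >= w_upper:
            reasons.append("w at upper bound")
        if p1 * n_codons < 1.0:
            reasons.append("class covers less than one codon")
        if reasons:
            flagged.append((p1, w, reasons))
    return flagged


if __name__ == "__main__":
    output = "(p1 =   0.02162) w =   4.94585\n(p1 =   0.00082) w = 999.00000\n"
    for p1, w, reasons in flag_boundary_estimates(output, n_codons=600):
        print(p1, w, "; ".join(reasons))
```

Flagging is only a screening step; whether a flagged run reflects a convergence problem still has to be judged by comparing log-likelihoods across runs, as described in the post.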

I'm seeing result B more often with version 4.10.6 than 4.9a. I've also tested versions 4.9g, 4.9h, 4.9j - I think they also come up with result B more often.

Did something change between 4.9a and 4.9g about the constraints on estimating p1 and omega of the positively selected class?

I think for our lab we'll stick to v4.9a for now, but I know that's getting old.

I'll attach a couple of alignments that demonstrate the behavior. Hope I'm making some sense here - happy to chat if I'm not.

thanks!

Janet

Can we use snp only to estimate divergence time using mcmctree?

Hi,

Can we use snp only (without any non-poly sites) to estimate divergence time using mcmctree? Why or why not?

Best,
Kun

baseml does not parse dates when using a clock model

I apologize if this is not the sort of issue to post here, but I think it is related to a bug in the code and not my use of the program. (Please close this without comment if you disagree!)

I was not able to get baseml to parse tip dates when using a clock model. The output always returned tip dates of 0 for all tips.

I think this is because GetTipDate in treesub.c iterates over the species in stree and, as far as I can tell, this is 0 in baseml (and maybe codeml as well). In my own experiments I was able to get baseml to parse the dates if I instead iterate over com.ns and use nodes instead of stree.nodes. I don't think this is a very robust solution and it probably breaks other parts of the code, but I hope it helps point to a straightforward solution.

tree file not found.

Hello,

I tried to use the example alignment HIVenvSweden.txt with the following control file codeml.ctl:

   seqfile = HIVenvSweden.txt
   outfile = results.out
    noisy = 0  
   verbose = 0  
   runmode = -2  
   seqtype = 1  
  CodonFreq = 2  
    model = 0
   NSsites = 0

And I get the following error with codeml

CODONML in paml version 4.10.5, March 2022

tree file  not found.

dS > 1

Hi,
I get weird results, with a dS of 26 for a neutral model. How can this be possible? Isn't dS supposed to be pS divided by the number of positions where a synonymous mutation is possible?

Thanks,

Reg. free ratio of #999

Dear Dr. Yang,
I understand you have answered this question several times in the Google group. Could you kindly clarify how to interpret this ratio: is a branch with #999 an error, or can it indicate positive selection?

Thanking you!

Regards,
Dr. Prabhakaran S, Scientist, NIPGR.

error: edid 253 / 253 patterns (codeml)

Hello, when running codeml (paml v4.10.7) on MacBook Pro M1, the program quits during NSsites Model 8 with the following error:

error: edid 253 / 253 patterns 4:02(base) ~/desktop/paml$

I've included the output from running the codeml command, the control file, and the multiple sequence alignment, and the tree. Any help would be greatly appreciated.
err.txt
paml_out.txt
consensAlign.ordered.phylip.txt
consensAlign.ordered.phylip.treefile.txt
codeml.ctl.txt

develop a python test base

mcmctree terminates with "Resetting lnL"

remove compiler warning messages

edit the code to remove compiler warning messages.

Wrong version number output in 4.10.7 (mac)

Hello,
I noticed today that the 4.10.7 version for mac reports itself as version 4.10.6. I also checked that this is so in the src/paml.h file:

$ grep version paml.h
#define pamlVerStr "paml version 4.10.6, November 2022"

I am assuming this is just a matter of fixing this message, but I wanted to bring it to your attention and also confirm that this version for mac is indeed 4.10.7?

Thank you

Errors when running codeml 4.10.7 for mac M1

Hello,

Recently, I started running codeml using M7 and M8 to detect positive selection from the latest release PAML 4.10.7 for mac M1. Model 7 was running with no problems, but M8 kept giving me the same error:

# seqs in tree file does not match.  Read as the nexus format.
Error: tree err1: EOF.

However, when I run codeml using the same control file, tree file, and sequence alignment file using 4.10.7 for linux, it ran smoothly without errors. I wonder if there is any issue specific for the mac M1 release. Could you please check? Thanks so much!

Best,
Sung-Ya

Doing ASR with PAML, but all the ancestral sequences I get have the same length?

Dear Doc. Yang,

I don't know why all the ancestral sequences I got from paml4.6 have the same length, whether I use my own data or the example data in paml4.6.
The following is my output (I changed some sites of the example data stewart.aa to check whether there was a dash in the ancestral sequences; the sites I changed are marked with []):

List of extant and reconstructed sequences

10    130

Langur KIFERCELAR TLKKLGLDGY KGVSLANWVC LAKWESGYNT EATNYNPGDE STDYGIFQIN SRYWCNNGK[-] PGAVDACHIS CSALLQNNIA DAVACAKRVV SDPQGIRAWV AWRNHCQNKD VSQYVKGCGV
Baboon KIFERCELAR TLKRLGLDGY RGISLANWVC LAKWESDYNT QATNYNPGDQ STDYGIFQIN SHYWCNDGK[-] PGAVNACHIS CNALLQDNIT DAVACAKRVV SDPQGIRAWV AWRNHCQNRD VSQYVQGCGV
Human KVFERCELAR TLKRLGMDGY RGISLANWMC LAKWESGYNT RATNYNAGDR STDYGIFQIN SRYWCNDGK- PGAVNACHLS CSALLQDNIA DAVACAKRVV RDPQGIRAWV AWRNRCQNRD VRQYVQGCGV
Rat KTYERCEFAR TLKRNGMSGY YGVSLADWVC LAQHESNYNT QARNYDPGDQ STDYGIFQIN SRYWCNDGK- PRAKNACGIP CSALLQDDIT QAIQCAKRVV RDPQGIRAWV AWQRHCKNRD LSGYIRNCGV
Cow KVFERCELAR TLKKLGLDGY KGVSLANWLC LTKWESSYNT KATNYNPSSE STDYGIFQIN SKWWCNDGK- PNAVDGCHVS CSELMENDIA KAVACAKKIV SE-QGITAWV AWKSHCRDHD VSSYVEGCTL
Horse KVFSKCELAH KLKAQEMDGF GGYSLANWVC MAEYESNFNT RAFNGKNANG SSDYGLFQLN NKWWCKDNK- RSSSNACNIM CSKLLDENID DDISCAKRVV RDPKGMSAWK AWVKHCKDKD LSEYLASCNL
node #7 KVFERCELAR TLKRLGMDGY RGISLANWVC LAKWESNYNT QATNYNPGDQ STDYGIFQIN SRYWCNDGKL PGAVNACHIS CSALLQDNIA DAVACAKRVV RDPQGIRAWV AWRNHCQNRD VSQYVQGCGV
node #8 KVFERCELAR TLKRLGMDGY RGISLANWVC LAKWESGYNT QATNYNPGDQ STDYGIFQIN SRYWCNDGKL PGAVNACHIS CSALLQDNIA DAVACAKRVV RDPQGIRAWV AWRNHCQNRD VSQYVQGCGV
node #9 KIFERCELAR TLKRLGLDGY RGISLANWVC LAKWESGYNT QATNYNPGDQ STDYGIFQIN SRYWCNDGKL PGAVNACHIS CSALLQDNIA DAVACAKRVV SDPQGIRAWV AWRNHCQNRD VSQYVQGCGV
node #10 KVFERCELAR TLKRLGMDGY RGISLANWVC LAKWESNYNT QATNYNPGDE STDYGIFQIN SKWWCNDGKL PGAVNACHIS CSELLEDNIA DAVACAKRVV RDPQGITAWV AWRNHCQDRD VSQYVQGCGL

############################################################
codeml.ctl file like following:
seqfile = stewart.aa * sequence data filename
treefile = stewart.trees * tree structure file name
outfile = mlc * main result file name

    noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
  verbose = 1  * 0: concise; 1: detailed, 2: too much
  runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
               * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

  seqtype = 2  * 1:codons; 2:AAs; 3:codons-->AAs
CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

   ndata = 10
  clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
 aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a

aaRatefile = ../dat/jones.dat * only used for aa seqs with model=empirical(_F)
* dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

  model = 2
             * models for codons:
                 * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
             * models for AAs or codon-translated AAs:
                 * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                 * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
             * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
             * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
             * 13:3normal>0

  icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
  Mgene = 0
             * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
             * AA: 0:rates, 1:separate

fix_kappa = 0 * 1: kappa fixed, 0: kappa to be estimated
kappa = 2 * initial or fixed kappa
fix_omega = 0 * 1: omega or omega_1 fixed, 0: estimate
omega = .4 * initial or fixed omega, for codons or codon-based AAs

fix_alpha = 1 * 0: estimate gamma shape parameter; 1: fix it at alpha
alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
Malpha = 0 * different alphas for genes
ncatG = 8 * # of categories in dG of NSsites models

  getSE = 0  * 0: don't want them, 1: want S.E.s of estimates

RateAncestor = 1 * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

Small_Diff = .5e-6
cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)?

fix_blength = -1 * 0: ignore, -1: random, 1: initial, 2: fixed
method = 0 * Optimization method 0: simultaneous; 1: one branch a time
* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.

Error: you should specify # seqs in the tree file.

Hi, there is an error when I run mcmctree. Could you give me some help?
this is my tree:
((Dicentrarchus_labrax,(Larimichthys_crocea,(Acanthopagrus_latus,Sparus_aurata)))'>2.1',((Epinephelus_lanceolatus,Sander_lucioperca#),(Lates_calcarifer,(Oreochromis_niloticus,(Danio_rerio,(Oryzias_latipes,Oryzias_melastigma))))),(Lateolabrax_maculatus,Micropterus_salmoides));

this is error log:
mcmctree mcmctree3.ctl
MCMCTREE in paml version 4.10.0, September 2020

Reading options from mcmctree3.ctl..
finetune is deprecated now.
Reading master tree.

Error: you should specify # seqs in the tree file.

mcmctree terminated with errors "resetting lnL" when usedata =2

Hello, I have run into a problem when using the approximate likelihood calculation during the MCMC. Please help me.

My phylogenetic tree has over 100 taxa and I want to shorten the time needed to estimate divergence times, so I use the approximate likelihood calculation. The first step, "usedata = 3", runs smoothly, but in the next step, "usedata = 2", messages like the following appear in the nohup.out file: 910275614542723227185095639040.000000 = 910275614542725760459886034944.000000? Resetting lnL
Similar messages appear five times. Please check my attached files, which include 4 input files (input.tre, input.phy, mcmctree2.ctl, in.BV) and 4 output files (nohup.out, mcmc.out, mcmc.txt, SeedUsed). I would sincerely appreciate your advice.
input.tre.zip
input.phy.zip
mcmctree2.ctl.zip
output files.zip

Look forward to your reply!
Best regards,

Mingyue YE

TipDate option in baseml and codeml broken (since 4.9i) and not printing out date estimates

ndata in baseml/codeml: read one tree for each dataset

Read one tree for each dataset/alignment, and allow the possibility that the number of species/sequences may vary among datasets.

Something goes wrong when species IDs are numeric.

Hi,

When species names are numeric (and larger than the number of species, >122 in my case), mcmctree cannot run normally. It seems that only v4.7a/b can handle this situation.
For example
122 1
((((((((((((((((38,110),80),40),(((1,44),10),36)),(((((116,((84,85),86)),30),Hic),107),87)),((122,108),88)),(120,125)),121),((119,123),124)),((66,79),55)),81),22),(((((((((39,(((11,98),12),((47,(62,70)),48))),43),((((103,109),74),(32,100)),(7,95))),3),(15,19)),(46,56)),((((20,75),(35,37)),41),61)),93),(((((((((((((((54,52),53),33),17),50),(60,94)),((((59,(64,(105,9))),101),(2,63)),(34,(71,90)))),6),(((((((4,72),5),31),29),102),27),89)),106),49),(57,16)),(104,8)),23),28))),((82,99),97)),((((51,(((((((((58,112),76),((18,77),67)),69),(113,114)),(118,13)),111),92),83)),((21,24),26)),91),((25,((42,73),(68,78))),96))),45)'>0.0688<0.2096';

mcmctree mcmctree100w.ctl
MCMCTREE in paml version 4.9j, February 2020

Reading options from mcmctree100w.ctl..
finetune is deprecated now.
Reading master tree.

Seq/species #1 (Hic) occurs more than once in the tree

and when I change Hic to 130

MCMCTREE in paml version 4.9j, February 2020

Reading options from mcmctree100w.ctl..
finetune is deprecated now.
Reading master tree.

Seq/species #1 (130) occurs more than once in the tree

and when I add R (or any other letter) before each species ID, everything works normally.

So I think this is a bug (except in v4.7a/b).

Best,
Kun
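The workaround described in this report, adding a letter in front of every numeric species ID, can be scripted. This is a minimal sketch with a hypothetical function name: it rewrites only purely numeric leaf labels in the Newick string, leaves quoted calibrations such as '>0.0688<0.2096' untouched, and should be applied to the tree line only, not to the header line. The sequence file needs the same renaming so that names still match.

```python
import re


def prefix_numeric_labels(newick, prefix="R"):
    """Prefix purely numeric leaf labels, leaving quoted annotations unchanged."""
    # Split out quoted calibrations so that numbers inside them are not touched.
    parts = re.split(r"('[^']*')", newick)
    renamed = []
    for part in parts:
        if part.startswith("'"):
            renamed.append(part)
        else:
            # A numeric leaf label follows '(' or ',' and precedes ',', ')' or ':'.
            renamed.append(re.sub(r"(?<=[(,])(\d+)(?=[,):])", prefix + r"\1", part))
    return "".join(renamed)


if __name__ == "__main__":
    tree = "((38,110),(Hic,(120,125)))'>0.0688<0.2096';"
    print(prefix_numeric_labels(tree))
```

Branch lengths are not affected because they follow a colon rather than an opening parenthesis or comma.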

`#elif with no expression` in treesub.c function ‘DistanceMatNuc’

I'm using gcc 7.3.0 and get a compilation error from baseml.c

cc  -O3 -Wall -Wno-unused-result -c baseml.c
In file included from baseml.c:137:0:
treesub.c: In function ‘DistanceMatNuc’:
treesub.c:2144:6: error: #elif with no expression
 #elif
      ^
make: *** [baseml.o] Error 1

The error seems to be because #elif is used without defining elif condition in treesub.c:

#if(1)
         if (fout) fprintf(fout, " %6.3f", t*com.ls);
#elif
         if (fout) fprintf(fout, " %9.6f", com.ls);
#endif

I think you may have this conditional to keep the old code in the elif branch around; if so, this change may work:

#if(1)
         if (fout) fprintf(fout, " %6.3f", t*com.ls);
#elif(1)
         if (fout) fprintf(fout, " %9.6f", com.ls);
#endif

free-ratio doesn't work?

Hi! Does the free-ratio model still work for paml4.10?

I can't find any description of the free-ratio in paml4.10.

Thank you for your reply!