lsd2's People

Contributors

bqminh, tothuhien

lsd2's Issues

Date file with the format yyyy-mm-dd

Dear Tothuhien,

How can I use a date file with the format:

tip_1 2020-01-15
tip_2 2020-02-26
tip_3 2019-12-31
tip_4 2020-06-45

to obtain a dated tree?
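A minimal sketch, assuming the goal is to turn these dates into the decimal-year numbers used in the date file (e.g. 1999.2 in the README example). The rule year + (day_of_year - 1) / days_in_year is an assumption, not necessarily LSD2's internal convention, although it matches the interval b(1970.0,1970.9972602739726) used for 1970 in another issue on this page. The function names are hypothetical. Note that 2020-06-45 (tip_4) is not a valid calendar date, since June has 30 days, so the sketch reports it instead of converting it.

```python
import calendar
from datetime import date


def iso_to_decimal_year(text: str) -> float:
    """Convert an ISO 8601 date 'YYYY-MM-DD' to a decimal year.

    Assumed convention: year + (day_of_year - 1) / days_in_year, so that
    1 January is exactly the year number. LSD2's own conversion may differ.
    """
    d = date.fromisoformat(text)  # raises ValueError for invalid dates such as 2020-06-45
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_of_year = d.timetuple().tm_yday
    return d.year + (day_of_year - 1) / days_in_year


def convert_date_lines(lines):
    """Convert 'name YYYY-MM-DD' lines into 'name decimal_year' lines.

    Returns (converted_lines, error_messages) so that every bad date is reported.
    """
    converted, errors = [], []
    for line in lines:
        name, value = line.split()
        try:
            converted.append(f"{name} {iso_to_decimal_year(value):.6f}")
        except ValueError as exc:
            errors.append(f"{name}: {value!r} is not a valid date ({exc})")
    return converted, errors


rows = ["tip_1 2020-01-15", "tip_2 2020-02-26", "tip_3 2019-12-31", "tip_4 2020-06-45"]
converted, errors = convert_date_lines(rows)
print("\n".join(converted))
print("\n".join(errors))
```

The converted lines would still need the leading line with the number of temporal constraints, as shown in the README example of the date file.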

Better error message for all unknown dates

Hi Hien,

I have a corner-case example: a tree with a few tips, and a date file that contains some dates for ids that are not in the tree and some very large date intervals (all identical) for the ids that are in the tree (the result of a naming problem ;) ).
I ran LSD2 as follows:

lsd2 -i tree.nwk -d dates.tab -v 2 -c -f 1000 -o tree_lsdated2 -r a -e 3 -s 1000

and obtained a segmentation fault:

TREE 1
*PROCESSING:
Reading the tree ... 
Parameter to adjust variances was set to 0.000684512 (settable via option -b)
Calculating the outlier nodes ...
Segmentation fault (core dumped)

What I would like to obtain instead is something like: "There are not enough temporal constraints provided to date this tree".

The tree and the dates are here.
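A hypothetical pre-check (the function name and data layout are assumptions, not LSD2 code) showing how such a case could be caught before outlier detection: keep only the dates whose ids are tips of the tree, and stop with a readable message when those dates carry no temporal signal. The rule "at least two distinct constraints" is a simplifying assumption; it ignores other information, such as a root date given with -a.

```python
def check_temporal_constraints(tree_tips, dates):
    """Fail early with a readable message instead of crashing later.

    tree_tips: iterable of tip labels in the tree.
    dates: dict mapping an id to a (lower, upper) interval; a point date has lower == upper.
    """
    matched = {tip: dates[tip] for tip in tree_tips if tip in dates}
    distinct = set(matched.values())
    # Assumed heuristic: identical intervals on every tip cannot separate rate from time.
    if len(distinct) < 2:
        unused = sorted(set(dates) - set(matched))
        raise ValueError(
            "There are not enough temporal constraints provided to date this tree: "
            f"{len(matched)} tip(s) of the tree have dates ({len(distinct)} distinct), "
            f"and {len(unused)} dated id(s) are not in the tree."
        )
    return matched


# Mimics the situation described above: informative dates only for ids missing from the tree.
tips = ["seqA", "seqB", "seqC"]
dates = {
    "A": (2001.0, 2001.0), "B": (2005.0, 2005.0),
    "seqA": (0.0, 3000.0), "seqB": (0.0, 3000.0), "seqC": (0.0, 3000.0),
}
try:
    check_temporal_constraints(tips, dates)
except ValueError as err:
    print(err)
```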

date format

Hi,

Could you tell me how to format the file with the dates?
It works with years, but I don't understand how to write a date that includes the month and possibly the day.
Thanks,

Cyril

Allow the user to choose -s behaviour

Hi Hien,

I have a full-genome alignment of length 3,985,129, and it's a shame that it gets replaced by 1000 for the CI estimation: the CIs I get are unreasonably large for the dates, e.g. 2015.26 [1995.59 - 2016.04], while they are non-existent for the rates, 1.47261e-06 [1.47261e-06; 1.47261e-06].

It would be better in my opinion to keep 1000 as a default but use the user-supplied value if it's available.

Cheers,
Anna
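A sketch of why the replaced value matters, assuming the CI simulation draws branch lengths from a Poisson model of substitutions (the log of another issue on this page shows "Computing confidence intervals using sequence length ..."). If a branch of length b over s sites carries Poisson(b·s) substitutions, the relative standard deviation of its length is 1/sqrt(b·s), so using 1000 instead of 3,985,129 inflates that noise by a factor of about sqrt(3985.129), roughly 63. The helper names and the default constant are hypothetical; `effective_seq_length` encodes the behaviour proposed in this issue.

```python
import math

DEFAULT_SEQ_LENGTH = 1000  # the fallback value mentioned in this issue


def effective_seq_length(user_value=None, default=DEFAULT_SEQ_LENGTH):
    """Proposed behaviour: use the -s value when supplied, otherwise the default."""
    return default if user_value is None else user_value


def poisson_relative_sd(branch_length, seq_length):
    """Relative SD of a branch-length estimate when the number of substitutions
    on the branch is Poisson with mean branch_length * seq_length."""
    expected_substitutions = branch_length * seq_length
    return 1.0 / math.sqrt(expected_substitutions)


for s in (effective_seq_length(), effective_seq_length(3_985_129)):
    print(s, round(poisson_relative_sd(1e-3, s), 4))
```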

Handling unresolved trees

Hi Hien,

Would it be possible to add support for trees with polytomies in LSD2? Currently, when I try to date one, I get:

...
Estimating the root position locally around the given root ...
lsd2: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

If I randomly resolve the polytomies by adding zero-length branches, everything works.

Otherwise, at least a more informative error message should be added, so that the user knows what to do :)

Thanks,
Anna
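The workaround described in this issue, randomly resolving polytomies with zero-length branches, can be sketched as follows. The `Node` class and the function names are hypothetical and unrelated to LSD2's internal tree structures; note that the sketch also resolves a multifurcating root.

```python
import random
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)  # compare nodes by identity, not by value
class Node:
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def resolve_polytomies(node: Node, rng: random.Random) -> None:
    """Resolve every multifurcation into a binary subtree by repeatedly joining
    two random children under a new internal node with a zero-length branch."""
    for child in node.children:
        resolve_polytomies(child, rng)
    while len(node.children) > 2:
        # pick two distinct positions and pop the higher index first
        i, j = sorted(rng.sample(range(len(node.children)), 2), reverse=True)
        first = node.children.pop(i)
        second = node.children.pop(j)
        node.children.append(Node(length=0.0, children=[second, first]))


def iter_nodes(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def to_newick(node: Node) -> str:
    label = f"{node.name}:{node.length:g}"
    if not node.children:
        return label
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + label


root = Node(children=[Node("A", 0.1), Node("B", 0.2), Node("C", 0.3), Node("D", 0.4)])
resolve_polytomies(root, random.Random(1))
print(to_newick(root) + ";")
```

Because the added branches have length zero, the total tree length and the tip set are unchanged; only the topology becomes binary.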

Details about confidence intervals

Hi Authors,

Thanks for your great work.
I am using IQ-TREE with default parameters to estimate the dates of the nodes. According to IQ-TREE: the confidence intervals are estimated based on a mixture of Poisson and lognormal distributions for a relaxed clock model.

Is it a 95% or a 99% confidence interval?

Thanks so much for your kind help.
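For reference, a generic equal-tailed percentile interval over simulated replicates (the number of which appears to be set by -f, e.g. -f 100 or -f 1000 in commands on this page) can be sketched as below. This is a textbook nearest-rank construction, not LSD2's code, and which level LSD2 or IQ-TREE actually reports is not stated here; the `level` parameter is only there to show how the two candidate answers differ.

```python
import math


def percentile_interval(samples, level=0.95):
    """Equal-tailed nearest-rank percentile interval of a sample.

    level=0.95 keeps the central 95% of the values, level=0.99 the central 99%.
    """
    xs = sorted(samples)
    if not xs:
        raise ValueError("no samples")
    n = len(xs)
    alpha = (1.0 - level) / 2.0
    # round() guards against floating-point noise such as 2.5000000000000022
    lo_rank = max(1, math.ceil(round(alpha * n, 9)))
    hi_rank = min(n, math.ceil(round((1.0 - alpha) * n, 9)))
    return xs[lo_rank - 1], xs[hi_rank - 1]


replicates = list(range(1, 101))  # stand-in for 100 simulated node dates
print(percentile_interval(replicates, 0.95), percentile_interval(replicates, 0.99))
```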

malloc: Incorrect checksum for freed object ...: probably modified after being freed

I cloned the GitHub repo, compiled it, and tried LSD2 with a simple example:

~/software/lsd2/src/lsd2 -i example.phy.treefile -r a -s 1998 -c

However, there is a crash:

TREE 1
*PROCESSING:
Reading the tree ... 
Using the median branch length 0.0981519 to adjust variances ...
Minimum branch length of time scaled tree was set to 0/365 = 0
Estimating the root position on all branches using fast method ...
lsd2(11564,0x1153f9dc0) malloc: Incorrect checksum for freed object 0x7fa3cdd06690: probably modified after being freed.
Corrupt value: 0x0
lsd2(11564,0x1153f9dc0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Interestingly, this does not happen every time, only after a few runs.

Moreover, when I tried the homebrew version, everything works fine:

TREE 1
*PROCESSING:
Reading the tree ... 
Estimating the root position on all branches using fast methode ...
*WARNINGS:
- The results correspond to the estimation of relative dates when T[mrca]=0 and T[tips]=1
*RESULTS:
- Dating results:
 rate 0.298264, tMRCA 0, objective function 148.416

TOTAL ELAPSED TIME: 0.006538 seconds

Do you know what happened? The tree file is attached.
example.phy.treefile.txt

Thanks!
Minh

Issue with CI calculation on a datafile with constraints on internal nodes (v2.3)

Hi Hien,

I tried to redo an old analysis (which worked with lsd2 v1.4.2.2) and ran into a segmentation fault.

Here is the tree: rooted_tree.nwk and the dates file: lsd2.dates, which has some constraints on some internal nodes, looking like:

mrca(CU2281-16,CU443-12)	b(1900,2008.62977767797)

With lsd2 v1.4.2.2, I was running the following command (successfully):

lsd2 -i rooted_tree.nwk -d lsd2.dates -v 2 -c -s 3894 -e 3 -f 100

With lsd v2.3, I can run the dating without CIs with no problem:

lsd2 -i rooted_tree.nwk -d lsd2.dates -v 2 -s 3894 -e 3

However, adding CIs (-f 100):

lsd2 -i rooted_tree.nwk -d lsd2.dates -v 2 -s 3894 -e 3 -f 100

leads to the following error:

TREE 1
*PROCESSING:
Reading the tree ... 
Collapse 23 (over 384) internal branches having branch length <= 0.000128403
 (settable via option -l)
Parameter to adjust variances was set to 0.013743 (settable via option -b)
Calculating the outlier nodes with Zscore threshold 3 (setable via option -e)...
Minimum branch length of time scaled tree (settable via option -u and -U): 0
Dating under temporal constraints mode ...
Re-estimating using variances based on the branch lengths of the first run ...
Computing confidence intervals using sequence length 3894 and a lognormal
 relaxed clock with mean 1, standard deviation 0.2 (settable via option -q)
Segmentation fault (core dumped)

If I remove the constraints from the date file, everything works smoothly.

Cheers,
Anna

Outliers not being removed

Hi Hien,

There seems to be a problem with outlier removal in LSD2 v1.4.2.3:
I'm running it as

lsd2 -i {input_tree} -d {input_dates} -v 2 -c -s {sequence_length} -o {output_name} -f 1000 -e 3 

It detects 32 outliers, but they are still present in the output tree {output_name}.date.nexus.

My tree and dates are here.

Accepting dates in ISO format

Right now, the date file for the -d option must contain dates as numbers.

Is it possible to accept dates in ISO 8601 format, such as "2020-04-17" for 17 April 2020?

This is a common format for storing sample dates, and it is quite handy. Also, many users wouldn't know how to convert back and forth between it and the numeric format that LSD supports.

Thanks!

Segmentation fault when all the dates are imprecise

Hi Hien,

I have a data set where the only sampling information available is the year, so I specified all the dates as intervals, e.g. b(1970.0,1970.9972602739726) for 1970, and then ran LSD2 (v1.4.2.2) as

lsd2 -i {input_tree} -d {input_dates} -v 2 -c -s {sequence_length} -f 1000 -o {work_dir} -e 3

What I get is a segmentation fault:

TREE 1
*PROCESSING:
Reading the tree ... 
Calculating the outlier nodes ...
bash: line 2: 13495 Segmentation fault      (core dumped) 

I suspect it might be due to the fact that intervals are not taken into account for outlier detection. In any case at least a more informative error message would be helpful.

Thanks,
Anna

Confidence interval values

Hi again Hien,

I have another question about confidence intervals. I have been getting some strange values on some of the nodes; see the attached tree for an example. The root node is supposed to be at -65 (65 Mya). The next node up from it is: 2.33388e-310, 2.33388e-310. I am guessing it has to do with the small branch lengths? There are other nodes like it in this tree. Is there a parameter I can adjust so that these numbers make sense?

I have also been trying this in the newest version of iqtree2 (covid), and I get similar errors in both.

Thanks so much again!

LSD14.nexus.pdf
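One observation that may help narrow this down: 2.33388e-310 is smaller than `sys.float_info.min` (2.2250738585072014e-308, the smallest positive normal double), so it is a subnormal floating-point number. Next to a root at -65, such a value is more likely a numerical artefact (for example an uninitialised or underflowed value) than a real date, although the cause is not established here. The sketch below, with hypothetical function names, flags such values in an output file's text.

```python
import re
import sys

# Matches plain and scientific-notation numbers such as -65, 1943.85 or 2.33388e-310.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")


def is_subnormal(value: float) -> bool:
    """True for non-zero doubles smaller in magnitude than the smallest normal double."""
    return value != 0.0 and abs(value) < sys.float_info.min


def find_subnormals(text: str):
    """Return the numeric tokens in text (e.g. a NEXUS output file) that are subnormal."""
    return [tok for tok in NUMBER_RE.findall(text) if is_subnormal(float(tok))]


print(find_subnormals("CI=[2.33388e-310,2.33388e-310] root -65"))
```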

Allow YYYY-MM-DD for tip and root dates

With the new support for the YYYY-MM-DD date format, can you please allow -a rootDate and -z tipDate to accept this format?

Also, these options currently require an integer. Can you relax this so that a real number is also accepted?

Thanks a lot for your work, Hien!

Segmentation fault

Dear Hien,
I'm trying to run the latest versions of LSD2 (1.4.2, 1.4.2.1) and getting a segmentation fault:

> lsd2 -i ./rooted_tree.nwk -d ./lsd2.dates -v 2 -c -s 3512 -f 1000 -o ./lsd2_wd -e 3

TREE 1
*PROCESSING:
Reading the tree ... 
Calculating the outlier nodes ...
Segmentation fault (core dumped)

while the previous version (1.4) was working fine on the same data (available here).

Add functionality to include LSD2 as a library

It would be great to be able to call LSD2 from within another program, because it is often used as a downstream analysis after tree reconstruction. An API for calling LSD2 as a library is therefore desirable; it should meet the following requirements:

  1. Converting input/output to C++ streams:
    One main issue is input/output, which right now only works with files. It would be better if the program allowed input/output from memory, to speed up the interface. However, this is not easy because LSD2 currently uses C's FILE data structure.

One solution is to convert everything to C++ istream for input and ostream for output. This refactoring would allow us to pass everything to the library via in-memory structures such as stringstream, without relying on external disk files.

  2. Ensuring thread safety:
    The API should be thread-safe, as it might be called from different threads (e.g. to date many trees in parallel). For example, the functions should not use global or static variables.

Is it possible to do that?
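The two requirements above can be illustrated with a small sketch. The issue proposes C++ istream/ostream; the same idea is shown here with Python text streams, where `io.StringIO` plays the role of stringstream. The function names are hypothetical, and treating `#` as a comment marker follows the annotations in the README example of the date file (whether LSD2 itself accepts comments is not stated). Because the functions keep no global or static state, separate calls can run on different threads.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, TextIO


def read_date_constraints(stream: TextIO) -> Dict[str, str]:
    """Parse an LSD-style date file from any text stream (file or StringIO).

    The first non-empty line is the number of constraints; each following line
    is '<target> <value>'. Only local variables are used, so calls are independent.
    """
    rows = [line.split("#", 1)[0].strip() for line in stream]
    rows = [row for row in rows if row]
    count = int(rows[0])
    constraints = {}
    for row in rows[1:count + 1]:
        target, value = row.split(None, 1)
        constraints[target] = value.strip()
    return constraints


def write_date_constraints(constraints: Dict[str, str], out: TextIO) -> None:
    """Write constraints back in the same format to any text stream."""
    out.write(f"{len(constraints)}\n")
    for target, value in constraints.items():
        out.write(f"{target} {value}\n")


text = "2\nA 1999.2  # a tip date\nmrca(A,B) u(2000.12)\n"
# Parse the same in-memory file on several threads at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_date_constraints, [io.StringIO(text) for _ in range(8)]))

out = io.StringIO()
write_date_constraints(results[0], out)
print(out.getvalue())
```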

Option -n broken

I described the problem here; initially I thought it was a problem with the R wrapper.

Any version after 1.7.1 breaks when using -n != 1.
This problem carries over to IQ-TREE and the R version.

Is this okay for me?

(base) lixingguangtekiMacBook-puro:bin lixingguang$ ./lsd2_mac -i 755.newick.tre -d 755.date -s 8462 -v 2 -f 1000

TREE 1
*PROCESSING:
Reading the tree ...
Collapse 2 (over 753) internal branches having branch length <= 5.90877e-05
(settable via option -l)
Parameter to adjust variances was set to 0.029848 (settable via option -b)
Minimum branch length of time scaled tree (settable via option -u and -U): 0
Dating under temporal constraints mode ...
Re-estimating using variances based on the branch lengths of the first run ...
Computing confidence intervals using sequence length 8462 and a lognormal
relaxed clock with mean 1, standard deviation 0.2 (settable via option -q)
*RESULTS:

  • Dating results:
    rate 0.00261091, tMRCA 1945.75 , objective function 0.910066
  • Results of the second run with variances based on results of the first run:
    rate 0.00263291, tMRCA 1943.85, objective function 0.951829
  • Results with confidence intervals:
    rate 0.00263291 [0.00237043; 0.00297681], tMRCA 1943.85 [1930.17; 1953.32], objective function 0.951829

TOTAL ELAPSED TIME: 58.9098 seconds
(base) lixingguangtekiMacBook-puro:bin lixingguang$

Understanding date notation in date file

Hi @tothuhien,
I'm looking at your examples for date notation in the date file and could use help getting this right for my analysis.

The example on the readme:

5			# number of temporal constraints
A 1999.2		# the date of A is 1999.2
B 2000.1		# the date of B is 2000.1
C l(1990.5)		# the date of C is at least 1990.5
D b(1998.21,2000.5)	# the date of D is between 1998.21 and 2000.5
mrca(A,B,C) u(2000.12)	# the date of the most recent ancestor of A,B, and C is at most 2000.12

I have an ancestral node, the MRCA of a monophyletic group, that is at least 48.7 MYA old, but its maximum age is not known. Reading the example above, I would think this should be represented as l(-48.7): the node is at least -48.7, but could be older (more negative).

Comparing runs of lsd2 that use l(-48.7) and u(-48.7), I think I have it backwards and should be using u(-48.7): l(-48.7) results in an estimated node age of -48.7, but u(-48.7) gives an older estimated age, -157.

Am I thinking about "at least" and "at most" in your examples backwards?
Thank you!
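A small worked check, based only on the README wording quoted above: l(x) means the date is at least x (x or later) and u(x) means it is at most x (x or earlier). With past times written as negative numbers, a node that is at least 48.7 million years old has a date of -48.7 or earlier, i.e. u(-48.7), which is consistent with the results reported above. The function names are hypothetical.

```python
import math
import re

BOUND_RE = re.compile(r"^([blu])\(([^)]*)\)$")


def constraint_interval(value: str):
    """Map a date-file value to a closed interval (lower, upper) on the date axis,
    following the README: l(x) = at least x, u(x) = at most x, b(x,y) = between."""
    value = value.strip()
    match = BOUND_RE.match(value)
    if match is None:
        x = float(value)  # a plain date such as 1999.2
        return (x, x)
    kind = match.group(1)
    args = [float(a) for a in match.group(2).split(",")]
    if kind == "l":
        return (args[0], math.inf)   # x or later (younger)
    if kind == "u":
        return (-math.inf, args[0])  # x or earlier (older)
    return (args[0], args[1])        # b(x, y)


def satisfies(date: float, value: str) -> bool:
    lower, upper = constraint_interval(value)
    return lower <= date <= upper


print(satisfies(-157.0, "u(-48.7)"), satisfies(-157.0, "l(-48.7)"))
```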

Request for date range specification

A nice feature of LSD is to allow users to specify the range of the dates in case of uncertainty. So you have b(x,y) for date range (x,y); u(x) for (-inf, x) and l(x) for the range (x,+inf). While this is OK, I think this notation of b, u, l is not intuitive and quite difficult to remember.

Why not just use the same math notation I wrote above?

An even easier solution for biologists (who are not mathy) is to use the range notation in R, because many biologists nowadays know R:

x:y for the range (x,y)

You can extend this format, e.g. x: or x:NA if the upper bound is +inf, and :y or NA:y if the lower bound is -inf (NA for not available).

I prefer this format, which is more user-friendly.

What do you think? Is it possible to implement this in LSD?

Perhaps you can still allow the b, u, l notation at the same time, for backward compatibility.
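The proposed notation could be supported as a thin translation layer in front of the existing b/l/u parser, which keeps the old notation working for backward compatibility. A minimal sketch, with a hypothetical function name, following the mapping described in this issue:

```python
def range_to_lsd(token: str) -> str:
    """Translate the proposed R-like range notation into LSD's b/l/u notation.

    'x:y' -> b(x,y), 'x:' or 'x:NA' -> l(x), ':y' or 'NA:y' -> u(y);
    a token without ':' is treated as a plain date and returned unchanged.
    """
    token = token.strip()
    if ":" not in token:
        float(token)  # validate; raises ValueError if not a number
        return token
    lower_txt, upper_txt = (part.strip() for part in token.split(":", 1))
    has_lower = lower_txt not in ("", "NA")
    has_upper = upper_txt not in ("", "NA")
    if has_lower and has_upper:
        if float(lower_txt) > float(upper_txt):
            raise ValueError(f"empty range: {token}")
        return f"b({lower_txt},{upper_txt})"
    if has_lower:
        float(lower_txt)
        return f"l({lower_txt})"
    if has_upper:
        float(upper_txt)
        return f"u({upper_txt})"
    raise ValueError(f"range without bounds: {token}")


for tok in ["1998.21:2000.5", "1990.5:NA", "NA:2000.12", "1999.2"]:
    print(tok, "->", range_to_lsd(tok))
```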

Error with setting root date

Hello-
I am trying to date the root node with -a b(-65,-55) and get a syntax error. Is it possible to give the root node an interval?

~/bigdata/lsd2/src/lsd2 -i mytree6.tree -f 100 -s 1748014 -a b(-65,-55) -z 0 -u 0 -l -1 -o ~/bigdata/CH3-150/LCOMP_8_outfiles/LSD_OUT/LSD12

Thanks,
Dylan
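One likely cause, assuming the command was typed at a bash prompt: the unquoted parentheses in b(-65,-55) are shell metacharacters, so bash itself reports a syntax error before lsd2 even runs. Quoting the argument fixes the shell side; whether a given lsd2 version accepts an interval for -a is a separate question not answered here. The sketch uses Python's `shlex.join` to produce a correctly quoted command line.

```python
import shlex

args = [
    "lsd2", "-i", "mytree6.tree", "-f", "100", "-s", "1748014",
    "-a", "b(-65,-55)",  # '(' and ')' are special to bash and must be quoted
    "-z", "0", "-u", "0", "-l", "-1",
]

# A version of the command that is safe to paste into bash.
command_line = shlex.join(args)
print(command_line)

# Alternatively, subprocess.run(args) passes each argument directly to lsd2
# without a shell, so no quoting is needed (not executed here).
```

The -o argument is left out on purpose: `shlex.join` would quote a path starting with ~, which stops the shell from expanding it to the home directory.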

More flexible date format

LSD allows the date format YYYY-MM-DD, which is very useful. Thanks!

However, some samples in my data are dated 2020-01, meaning that we don't know the exact day, only that the sample was taken in January 2020. Would it be possible to accept this format?

I can of course pre-process these dates into the range from 2020-01-01 to 2020-01-31, but that's a bit tedious because different months have different numbers of days...

So could you allow YYYY-MM, to specify just the year and month of the sample? LSD would then convert it internally into a range and automatically account for the uncertainty.

I also have other samples with just YYYY, i.e. we only know the year of the sample. For example, 2019 really means the range from 2019-01-01 to 2019-12-31.

Is it also possible to accept this format? I know this is ambiguous. But perhaps, if at least one other taxon has a YYYY-MM-DD date, a plain YYYY could be interpreted as the whole year; only when no taxon has a YYYY-MM-DD date would YYYY be taken as a plain number.

Sorry for all the requests... I'm just thinking about how to make LSD as user-friendly as possible. Hopefully more people will then use your software.

Thanks!
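A minimal sketch of the requested expansion, with hypothetical function names: YYYY and YYYY-MM become a b(lower,upper) interval running from the first to the last calendar day they cover, and YYYY-MM-DD stays a single date. The day-to-decimal rule year + (day_of_year - 1) / days_in_year is an assumption; it reproduces the interval b(1970.0,1970.9972602739726) used for 1970 in another issue on this page. This sketch always reads a bare YYYY as the whole year; the ambiguity rule proposed above would need extra logic.

```python
import calendar
from datetime import date


def _decimal_year(d: date) -> float:
    # Assumed convention: a day maps to year + (day_of_year - 1) / days_in_year.
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year


def partial_date_bounds(text: str):
    """Return (lower, upper) decimal-year bounds for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        first, last = date(parts[0], 1, 1), date(parts[0], 12, 31)
    elif len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]  # handles 28/29/30/31-day months
        first, last = date(year, month, 1), date(year, month, last_day)
    elif len(parts) == 3:
        first = last = date(*parts)
    else:
        raise ValueError(f"unsupported date: {text}")
    return _decimal_year(first), _decimal_year(last)


def to_lsd_value(text: str) -> str:
    """Format the bounds as a plain date or as LSD's b(lower,upper) interval."""
    lower, upper = partial_date_bounds(text)
    return f"{lower:.6f}" if lower == upper else f"b({lower:.6f},{upper:.6f})"


for sample in ["2020-01-15", "2020-01", "2019"]:
    print(sample, "->", to_lsd_value(sample))
```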

Relaxed Clock Parameters

I'm very excited about the ability to estimate confidence intervals using a relaxed clock! Do you have any suggestions for how to estimate an appropriate standard deviation for the UCLD based on the data?

Thanks!
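As a starting point for choosing a value, it can help to see what a given standard deviation implies. The sketch assumes that the "lognormal relaxed clock with mean 1, standard deviation 0.2 (settable via option -q)" printed by LSD2 refers to the mean and SD of the rate multiplier on its natural scale; that reading is an assumption. It converts mean and SD into the parameters of the underlying normal distribution, gives the central 95% range of multipliers, and checks the parameters by simulation. It does not estimate the SD from data.

```python
import math
import random
import statistics


def lognormal_params(mean: float, sd: float):
    """(mu, sigma) of the underlying normal for a lognormal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def central_range(mean: float = 1.0, sd: float = 0.2, z: float = 1.959963984540054):
    """Central 95% range of the rate multipliers (z is the normal 97.5% quantile)."""
    mu, sigma = lognormal_params(mean, sd)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


def simulate_rate_multipliers(n: int, mean: float = 1.0, sd: float = 0.2, seed: int = 1):
    """Draw n multipliers from that lognormal to check the parameters."""
    mu, sigma = lognormal_params(mean, sd)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def summarize(xs):
    return statistics.fmean(xs), statistics.stdev(xs)


print(central_range(1.0, 0.2))
print(summarize(simulate_rate_multipliers(100_000)))
```

Under this reading, the default SD of 0.2 means that about 95% of branches have rate multipliers between roughly 0.67 and 1.45.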

Compiling error in Windows

Would you like to support LSD2 in Windows?

I'm using clang for Windows to compile LSD, and there is a compilation error (see the attached screenshot): it complains that the time() function is not declared. I'll try to fix it.

Screen Shot 2020-04-05 at 1 35 07 pm

Request to change mrca notation

LSD allows the user to specify the date of an ancestral node via mrca(A,B,C) for the most recent common ancestor of A, B, C. To be honest, the mrca notation is quite difficult to remember.

How about simply using a list A,B,C?

Together with my suggestion about the date range format, the example may look like:

5			# number of temporal constraints
A 1999.2		# the date of A is 1999.2
B 2000.1		# the date of B is 2000.1
C 1990.5:NA		# the date of C is at least 1990.5
D 1998.21:2000.5	# the date of D is between 1998.21 and 2000.5
A,B,C NA:2000.12	# the date of the most recent ancestor of A,B, and C is at most 2000.12

What do you think?
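A minimal sketch of how the list notation could be mapped onto the existing one, with hypothetical function names. It handles only the target column: A,B,C becomes mrca(A,B,C), a single name is left alone, and lines already using mrca(...) pass through for backward compatibility. The value column is copied unchanged. One design caveat: with this notation a comma can no longer appear inside a tip name.

```python
def convert_target(target: str) -> str:
    """Proposed 'A,B,C' -> existing 'mrca(A,B,C)'; single names and mrca(...) are unchanged."""
    if target.startswith("mrca("):
        return target
    names = [name.strip() for name in target.split(",")]
    if any(not name for name in names):
        raise ValueError(f"empty taxon name in {target!r}")
    return f"mrca({','.join(names)})" if len(names) > 1 else names[0]


def convert_line(line: str) -> str:
    """Rewrite '<target> <value>' with the target in mrca(...) notation.
    The value column is copied unchanged (this sketch does not translate it)."""
    target, value = line.split(None, 1)
    return f"{convert_target(target)} {value.strip()}"


for row in ["A 1999.2", "A,B,C u(2000.12)", "mrca(A,B) l(1990.5)"]:
    print(convert_line(row))
```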
