jiscah / sequoia Goto Github PK
View Code? Open in Web Editor NEWR package for pedigree inference based on SNP data
R package for pedigree inference based on SNP data
Hi Jisca,
I'm taking your advice and simulating pedigrees to test the use of sequoia for my intended use case. I'm running many many replicates across some parameters of interest (depth of simulated pedigree, probability of a female choosing to reproduce asexually, tolerance for sib-mating) and I've tested my pedigree simulation function quite well.
I have encountered the following error while trying to compile a markdown file for some results:
Quitting from lines 2031-2148 (analysis.Rmd)
Error in SeqParSib(ParSib = "sib", FortPARAM, GenoM, LhIN = LifeHistData, :
ERROR! *** setParTmp: Sibship number out of bounds *** Please report this bug.
Calls: <Anonymous> ... suppressMessages -> withCallingHandlers -> sequoia -> SeqParSib
Execution halted
This does not happen every time. In this case, it happened on the 41st replicate scenario. Unfortunately because it occurred while compiling a markdown file, I cannot retrieve the genotype matrix and pedigree than produced the error.
I am not at all sure what this means but it does instuct me to report the bug, so here I am!
Best,
Ollie
Dear Jisca,
I know that you are working hard on a new version of sequoia. Maybe It is the time to suggest you an option :
1- Option suggestion :
Sometimes i works with population without overlapping generations (hatchery process)... Indeed, I am always confident that parents are contained N generation and individuals of N+1 are offsprings of N individuals. One potential problem of such design is that as the generation goes the inbreeding increase and decrease our power of reconstructing the pedigree.
2- My trick to analyse without overlapping generations :
Some context elements to understand the design. I am analyzing a 4 generation massale selection populations. Individuals at generation N are challenged against a virus, and the survivors are mated together (the exact pedigree is unknow) to produce N+1 individuals and so on.
Actually, I analyze the dataset by take the individuals of generation N (I set the "By" column in Lfh table to 1 and sex with 4) and for N+1 individuals (I assign set the "By" column in Lfh table to 2 and sex with 4). By iteration, I get the whole pedigree with less wholes compare to the strategy where i analyze the dataset with everybody with all the generation.
Do you have an smart/better idea ?
I hope my frenchy english is good enough to explain my problem and solution.
Dear Jisca,
Again It is me.
Everything is ok with the new version of Sequoia, but I remark that is impossible to export a .Rdata after a sequoia run.
The error message is weird :
##-----------------------------------------------------------------------------
EstimateAssignmentPerformance <- function (geno=NA,ped=NA,lfh=NA,nb_repetition=NA,nb_loci=
[... more lines with Sequoia in it ...]
}
Run the function with small numbers
test <- EstimateAssignmentPerformance(geno=geno, ped=ped1,lfh=lfh1, nb_repetition=1 , nb_loci=c(5,10), ErroR=0.01 )
[... output from the function ...]
Trying to export test dataset as test.Rdata
name <- paste("test",version,".Rdata",sep="")
mypath4 <- file.path("G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug",name,fsep = .Platform$file.sep)
print(mypath4)
the output of this line is not OK... I should have only one line
[1] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testx86_64-w64-mingw32.Rdata"
[2] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testx86_64.Rdata"
[3] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testmingw32.Rdata"
[4] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testx86_64, mingw32.Rdata"
[5] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test.Rdata"
[6] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test3.Rdata"
[7] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test4.1.Rdata"
[8] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test2017.Rdata"
[9] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test06.Rdata"
[10] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test30.Rdata"
[11] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/test72865.Rdata"
[12] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testR.Rdata"
[13] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testR version 3.4.1 (2017-06-30).Rdata"
[14] "G:/IFREMER/02-PROJET/GENOYSTER/07_Sequoia_bug/ReproduceBug/testSingle Candle.Rdata"
save(test,file=mypath4)
Error in gzfile(file, "wb") : invalid 'description' argument
In addition: Warning message:
In if (!nzchar(file)) stop("'file' must be non-empty string") :
the condition has length > 1 and only the first element will be used
##-----------------------------------------------------------------------------
I attach folder to reproduce the bug.
Cheers,
JB
Hi,
I'm trying to run Sequoia on an Ubuntu server (16.04, running Microsoft R Open 3.3.2, with the current github version of Sequoia, 0.5).
This is my command:
ParOUT <- sequoia(GenoFile = GenoDataFile,LifeHistFile = "LifeHistFile_assigned_SA.txt", NumSibRounds=10, MaxSibshipSize=10000, MaxMismatch = 20, WriteFiles = TRUE)
And that's the output:
At line 7488 of file Sequoia.f90
Fortran runtime error: No such file or directory
I do get AgePriors.txt and SequoiaSpecs.txt in sequoiaOUT/ folder, so these functions work.
Same function with the same data works on windows.
Thanks, Ido
From the example in the vignette, the call to sequoia with MaxSubIter=1, you get an error that it can't find ErrM in the DuplicateCheck function. However, ErrM is an element of ParOUT, so you should make sure to pass ParOUT$ErrM to DuplicateCheck. Here is code that gets the error and shows a work-around.
`
library(sequoia")
data(Ped_HSg5, LH_HSg5)
set.seed(1000)
Geno <- SimGeno(Ped = Ped_HSg5, nSnp = 200)
Geno[1:5,1:5]
ParOUT <- sequoia(GenoM = Geno, LifeHistData = LH_HSg5, MaxSibIter = 0)
names(ParOUT)
table(is.na(ParOUT$PedigreePar[,2]))
SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1)
SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1, Err=1e-4)
ErrM <- ParOUT$ErrM
SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1)
`
Hi,
Could you please clarify what version of R I should install to run sequoia?
I have tried 4.2.0 and 4.0.0 and in both cases I get the message "package ‘sequoia’ is not available (for R version 4.0.0)".
Thank you so much!!!!!!!!
Leo
Hi.
I am running sequoia for 4702 genotyped animals with 30K SNP, 500 SNP, and 200 SNP.
4702 animals are the same birth year and have no pedigree, they are from fish cages, produced by m x n parents but I did not know how many. I would like to test if there is a ped construct for the data.
It seemed Sequoied is running non-stop for almost a week without any output with 200SNPs even in my Mac (16 GBs) and HPC (5 cores, 124 Gbs). I am using Sequoia_notR.
Could you please help me with how to faster the run or/and parameter setting for my data?
Much appreciated!
Vu Nguyen
Dear Jisca,
I am trying to make power analysis on various data quality (mendelian errors and segregation distortions).
My "R loops for" are ok, but as soon as I uncomment sequoia part, the program crashes sometimes.
You could download ".Rdata" (geno + lfh +ped) and the R code to reproduce this error.
For example, I able to run very large problem using sequoia (2000 snp across 200 individuals) alone... but as soon sequoia is embledded in a loop for, bug could appear with a rather unpredictable fashion. This make bug tracking quite tedious.
Initially I was suspecting a problem with Rstudio 0.9 and R 3.4.0. This morning, I reinstall R 3.4.1 and the lastest Rstudio 1.0.153.
With the previous install, R session is crashing when I embedded sequoia in a for loop, the bug is not stable sometimes it happen, sometimes not.
My ideas (I am not a specialist) :
I am searching stables solutions to make a power analysis (avoid to crashes a the end of 1000 iterations).
1- I will re-install sequoia using " install_github("JiscaH/sequoia")".
2- Use the same code on different computer (linux).
3- Use sequoia from Fortran version (on linux wrapped in bash script)... But the documentation is scarce, especially for a fortran newbies as me. Could you provide even rough scripts of fortran equivalent of :
-sequoia(GenoM = GM, LifeHistData = lfh, MaxSibIter = 0, Err=0.1, MaxMismatch = threshold,quiet=TRUE)
-sequoia(GenoM = GM, SeqList=ParOUT, MaxSibIter=10 , MaxSibshipSize=30,quiet=TRUE)
-PedCompare(Ped1=ped,Ped2=SeqOUT$Pedigree).
Best regards,
JB
Hi Jisca,
I am sure that you have a R trick to access to dam object stored in compareOUT$Counts...
compareOUT <- PedCompare(Ped1=ped1,Ped2=SeqOUT$Pedigree)
dam <- compareOUT$Counts[1] ## NOT WORKING
dam <- unlist(compareOUT$Counts)[1] ## NOT WORKING
best regards,
JB
Hi,
I'm running a large dataset in sequoia that is exiting with the following error:
Sibships - Initial Total LL :
[1] -232355
Round 01 end, Total LogLik:
[1] -228607.6
No. dams, sires (incl. dummies):
[1] 362 362
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: SeqParSib(ParSib = "sib", Specs = Specs, GenoM = GenoM, LhIN = LifeHistData[, 1:3], AgePriors = Ag$
2: sequoia(GenoM = GenoM, LifeHistData = LH, MaxSibIter = 10, MaxMismatch = 10, UseAge = "yes", FindM$
An irrecoverable exception occurred. R is aborting now ...
/var/spool/slurmd/job34109895/slurm_script: line 25: 164736 Segmentation fault (core dumped) Rscript $
Other, smaller, data files complete normally. I have reproduced this error across R console v3.5.1 and RStudio v1.1.456 on my MAC running Sierra, as well as on an HPC. In all cases, the package runs normally most of the way through the analysis before exiting with the seg fault error. Hoping to find a fix for this, thanks for any suggestions!
Dear Jisca,
First, thanks for writing such a useful package.
I have encountered an unusual behaviour which I don't know how to resolve. I have genotypes of parents and offspring, and for the vast majority of them (48/51) sequoia correctly assigns the offspring using the main sequoia()
function. The scenario is simple: several controlled monogamous crosses where maternal and paternal ID is exactly known. There are three for whom the mother cannot be assigned even though the father is correctly inferred and all of the other offspring in that family are assignable.
When I use the CalcPairLL()
function specifying "PO" as the focal relationship for all pairs (each pair being a unique combination of possible parents and the orphan offspring) the output shows that for most pairs all relationships are considered impossible (LL = 777.00). For instance here is the output table for one such offspring:
ID1 ID2 Sex1 Sex2 AgeDif focal patmat dropPar1 dropPar2 PO FS HS GP FA HA U TopRel LLR
C9_310521_04 C01220 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -242.01 U NA
C9_310521_04 C10223 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -222.22 U NA
C9_310521_04 C01222 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -232.22 U NA
C9_310521_04 C01232 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -229.85 U NA
C9_310521_04 C01223 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -228.17 U NA
C9_310521_04 C01213 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -236.77 U NA
C9_310521_04 C01230 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -217.27 U NA
C9_310521_04 C01224 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -227.89 U NA
C9_310521_04 C01211 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -222.45 U NA
C9_310521_04 C01231 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -224.64 U NA
C9_310521_04 C10133 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -243.79 U NA
C9_310521_04 C01221 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -231.01 U NA
C9_310521_04 C01219 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -248.07 U NA
C9_310521_04 C01227 3 2 1 PO 2 none none -205.81 -211.12 -211.18 777 -211.32 -211.18 -224.29 PO 5.31
C9_310521_04 C01233 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -215.57 U NA
C9_310521_04 C01228 3 1 1 PO 1 none none 777.00 999.00 999.00 999 999.00 999.00 -219.18 U NA
C9_310521_04 C01234 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -227.95 U NA
C9_310521_04 C01210 3 2 1 PO 2 none none 777.00 999.00 999.00 999 999.00 999.00 -251.84 U NA
In this case, the father is correctly assigned (C01227 in column ID2) but for any other potential parent, the relationship is considered impossible and no LL is calculated. The true mother here is C01228. I confess I have no idea what this means for how the software is behaving. Can you please advise?
Best,
Oliver
Edit:
I should add that when using the primary function sequoia()
these three offspring have no mother assigned to them, rather than an incorrect mother. Here is a sample output:
$PedigreePar
id dam sire LLRdam LLRsire LLRpair OHdam OHsire MEpair
C9_310521_04 <NA> C01227 NA 4.60 NA NA 0 NA
I have a dataset of tortoises that I'm running through sequoia
, then plotting with pedtools
. I'm using the true birth years of the tortoises as a prior (birth years range from 1960-2020), but unfortunately the sibship analysis run by sequoia
allows parent-offspring relations between tortoises that are only 1-3 years apart. In reality, we know it takes ~10 years for a tortoise to become reproductively active. When I change birth year to "generation" (e.g. 1, 2, 3, 4, etc.), I get many more spurious relationships (five tortoises become their own ancestors).
It seems like using the true birth years as a prior, then just manually changing the resulting pedigree from impossible parent-offspring relationships to sibling relationships resolves this issue. I wonder, however, how complicated it would be to include generation time or minimum age at reproduction, as another optional parameter.
Thanks for the great package. It's proving very useful in my work.
From the example in the vignette, the call to sequoia with MaxSubIter=1, you get an error that it can't find ErrM in the DuplicateCheck function. However, ErrM is an element of ParOUT, so you should make sure to pass ParOUT$ErrM to DuplicateCheck. Here is code that gets the error and shows a work-around.
library(sequoia") data(Ped_HSg5, LH_HSg5) set.seed(1000) Geno <- SimGeno(Ped = Ped_HSg5, nSnp = 200) Geno[1:5,1:5] ParOUT <- sequoia(GenoM = Geno, LifeHistData = LH_HSg5, MaxSibIter = 0) MaxSibIter = 0) names(ParOUT) table(is.na(ParOUT$PedigreePar[,2])) args(sequoia) SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1) SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1, Err=1e-4) ErrM <- ParOUT$ErrM SeqOUT <- sequoia(GenoM = Geno, SeqList = ParOUT, MaxSibIter = 1)
Dear Jisca,
I am trying to reconstruct pedigree (at least parents-offsprings relationship) in a lab population under experimental infection during 4 generation. I only know the age of individual... In some cases (limited) I know the sex of individual. The good thing is that the design is not with overlapping generation.
My intuition is telling me, that is a format error... but I am not more idea.
Error in SeqParSib("par", Specs, GenoM, LifeHistData, AgePriors, Parents = NULL, :
'AgePriors' matrix should have size nAgeClasses * 8
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
I added to files to reproduce the error (I am on window 7 64 bit, using sequoia 0.9.3).
geno2_debug.txt
lfh2_debug.txt
Cheers,
JB
Dear, thank you for releasing this package which has been very helpful for my analysis.
I have searched for information on this and have not been able to find out if it is possible to tell sequoia in any way cross pairs as a priori information. I have pairs of males and females that we know were crossed, but we need to assign which offspring were generated by those crosses. In doing so without prior, and unfortunately due to a low amount of high quality SNPs we receive (app 50), the generated assignment does not correspond correctly to the crosses we know a priori, i.e. offspring are assigned to mothers and fathers that we know were not outcrossed.
Is there a way to incorporate this information into the analysis? Thank you in advance for your help.
Hi,
I am trying to get sequoia for R, and I am running into some errors. Any help would be appreciated.
Thank you!
Downloading GitHub repo JiscaH/sequoia@stable
✔ checking DESCRIPTION meta-information ... /5j3hs53s2cn0lhr1ps49slfw0000gn/T/RtmpNYHr2z/remotes1c093ea83438/JiscaH-sequoia-85ea741/DESCRIPTION’ ...
─ cleaning src
Warning: file 'sequoia/cleanup' did not have execute permissions: corrected
Warning: file 'sequoia/cleanup' did not have execute permissions: corrected
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.