pievos101 / popgenome

An Efficient Swiss Army Knife for Population Genomic Analyses in R

R 65.37% C 19.12% C++ 15.51%

population-genomics snps genome-analysis

popgenome's Issues

Question about readMS results

Dear professor,
Thank you for developing PopGenome, which is the most useful package for me.
I have MSMS simulation results stored in a txt file, and I use 'readMS' to read them.

Could you please tell me how I can extract the positions (on a 0-1 scale) of each MSMS replicate from the GENOME.class object?
Thank you for your help and time.
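
One possible workaround, sketched here under assumptions, is to read the positions straight from the simulation output rather than from the GENOME.class object. The sketch assumes the standard ms-style text layout, where each replicate starts with a `//` line and carries one `positions:` line. The helper name `read_ms_positions` and the example data are hypothetical and not part of PopGenome.

```r
# Hypothetical helper (not a PopGenome function): return one numeric vector
# of 0-1 scaled positions per replicate of an ms/msms text output file.
read_ms_positions <- function(path) {
  lines <- readLines(path)
  starts <- which(grepl("^//", lines))          # each replicate begins with "//"
  ends <- c(starts[-1] - 1, length(lines))      # and runs to the next "//" or EOF
  lapply(seq_along(starts), function(i) {
    block <- lines[starts[i]:ends[i]]
    pos_line <- grep("^positions:", block, value = TRUE)
    # Replicates without segregating sites have no "positions:" line
    if (length(pos_line) == 0) return(numeric(0))
    fields <- strsplit(trimws(sub("^positions:", "", pos_line[1])),
                       "[[:space:]]+")[[1]]
    as.numeric(fields)
  })
}

# Made-up two-replicate example written to a temporary file
ms_text <- c(
  "ms 3 2 -t 2", "1 2 3", "",
  "//", "segsites: 2", "positions: 0.1250 0.7500", "01", "10", "11", "",
  "//", "segsites: 1", "positions: 0.5000", "1", "0", "0"
)
tmp <- tempfile(fileext = ".txt")
writeLines(ms_text, tmp)
positions <- read_ms_positions(tmp)
print(positions)
```

Because the list is built per `//` block, element i corresponds to replicate i even when some replicates have no segregating sites.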

Extracting total non-synonymous and synonymous sites per alignment

Hi @pievos101

Is it possible to extract the total number of predicted non-synonymous and synonymous sites from a "GENOME" object created from an in-frame FASTA alignment? I can get the SNPs of either category, but not the invariant sites.

I've also sent this question through your Weebly site; sorry for the duplication.

Best wishes,
Jack

PopGenome Not Reading In FASTA File

I'm trying to analyze my alignment of a gene (CDS) with multiple species and multiple individuals per species. When using the 'readData()' function in the PopGenome package, I can see that it's reading in my file because it shows the correct number of sites, but it doesn't show anything else (# gaps, unknowns, trans/transv ratio, etc.).

> get.sum.data(Croc_AVP.class)
                                           n.sites n.biallelic.sites n.gaps n.unknowns n.valid.sites n.polyallelic.sites trans.transv.ratio
ExonCap-Crocodylus_AVP_outNT_MKTtest.fasta     858                 0      0          0             0                   0                NaN

I tried dos2unix to convert my FASTA file to Unix line breaks, mirroring the PopGenome example FASTA files, but to no avail.

Which method is implemented to calculate site.FST (Fst value per SNP)?

@pievos101
Hi, thank you for the excellent tool.

I would like to ask about site.FST.

Which method is implemented to calculate Fst values per SNP?
Weir & Cockerham, or something like that?

I would like to calculate Fst values for a haploid genome.

Best regards,
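
For reference while the question is open, here is a minimal sketch of one simple per-site estimator for haploid samples, a Nei-style Fst = (H_T - H_S) / H_T based on allele frequencies. This is an illustration only: it is not Weir & Cockerham, and nothing in this thread confirms that it matches what site.FST computes. The function name `site_fst` and the toy data are hypothetical.

```r
# Hypothetical helper: Nei-style Fst at a single site for haploid samples.
# alleles: one allele per sample; pop: population label per sample.
site_fst <- function(alleles, pop) {
  # Expected heterozygosity (gene diversity): 1 - sum of squared frequencies
  het <- function(a) 1 - sum((table(a) / length(a))^2)
  h_t <- het(alleles)                        # diversity in the pooled sample
  if (h_t == 0) return(NA_real_)             # monomorphic site: Fst undefined
  h_pop <- as.numeric(tapply(alleles, pop, het))
  n_pop <- as.numeric(table(pop))            # same (sorted) order as tapply
  h_s <- weighted.mean(h_pop, n_pop)         # mean within-population diversity
  (h_t - h_s) / h_t
}

pop <- rep(c("A", "B"), each = 4)
fst_fixed  <- site_fst(c(0, 0, 0, 0, 1, 1, 1, 1), pop)  # fixed difference
fst_shared <- site_fst(c(0, 1, 0, 1, 0, 1, 0, 1), pop)  # identical frequencies
fst_mixed  <- site_fst(c(0, 0, 0, 1, 0, 1, 1, 1), pop)  # partial differentiation
print(c(fst_fixed, fst_shared, fst_mixed))
```

Different estimators (and different sample-size corrections) can give noticeably different per-site values, which is why knowing the implemented method matters here.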

Archived on CRAN

Hi Bastian,
as PopGenome is currently archived on CRAN, our package coala, which suggests PopGenome, has also been archived. Will PopGenome be back on CRAN any time soon, so that it makes sense to keep the PopGenome import functions in coala and the vignette that explains the interface?
Best wishes and many thanks,
Dirk

How do you specify a genetic code for analyses like MKT?

Hello! I would like to use the MKT function in PopGenome with a mitochondrial dataset. Because this involves assigning sites as synonymous or nonsynonymous, it is important for me to be able to specify that I need to use the vertebrate mitochondrial code, not the standard code. I cannot find a way to do this. Is it possible, or can this MKT function only be used on standard data?
Thanks!
-Teresa
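
To illustrate why the choice of code matters for this request, the sketch below builds the standard code (NCBI translation table 1) and the vertebrate mitochondrial code (table 2) in base R and shows a codon pair whose classification flips between them. It does not use or configure PopGenome; the helper `is_synonymous` is hypothetical.

```r
# Codons in NCBI order: first base varies slowest, bases ordered T, C, A, G
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)

# Amino-acid strings for translation tables 1 and 2 ("*" marks a stop codon)
standard  <- paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                    "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG")
vert_mito <- paste0("FFLLSSSSYY**CCWW", "LLLLPPPPHHQQRRRR",
                    "IIMMTTTTNNKKSS**", "VVVVAAAADDEEGGGG")

code_std  <- setNames(strsplit(standard, "")[[1]], codons)
code_mito <- setNames(strsplit(vert_mito, "")[[1]], codons)

# Codons that are read differently under the two codes
differing <- codons[code_std != code_mito]
print(rbind(standard = code_std[differing], vert_mito = code_mito[differing]))

# Hypothetical helper: is a codon change synonymous under a given code?
is_synonymous <- function(codon1, codon2, code) {
  unname(code[codon1] == code[codon2])
}
ata_atg_std  <- is_synonymous("ATA", "ATG", code_std)   # Ile vs Met
ata_atg_mito <- is_synonymous("ATA", "ATG", code_mito)  # Met vs Met
```

An ATA/ATG polymorphism would therefore be counted as non-synonymous under the standard code but synonymous under the vertebrate mitochondrial code, which changes the counts that feed an MK test.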

Unable to read (compressed) vcf

Dear PopGenome admin,

I am unable to read or process any (small) VCF file with the PopGenome package (on Windows and on a CentOS HPC).
The VCF contains only about 1000 SNPs.

The traceback from the HPC is as follows:

loc <- "/ddn1/vol1/site_scratch/leuven/330/vsc33060/VCF/vcf_compressed/load"
GENOME.class <- readData(loc, format = "VCF", SNP.DATA = T, include.unknown=T)
| : | : | 100 %
|
*** caught segfault ***
address (nil), cause 'unknown'

Traceback:
1: myReadVCF(filepath)
2: PopGenread(liste[xx], format)
3: doTryCatch(return(expr), name, parentenv, handler)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
7: try(PopGenread(liste[xx], format), silent = TRUE)
8: readData(loc, format = "VCF", SNP.DATA = T, include.unknown = T)

Thank you for your insights,
With kind regards,
Frederik Van Daele

Inconsistent results with other software

Hello,

After importing 100 sequences from a FASTA file, the neutrality statistics are inconsistent with those from other software and R packages. The number of segregating sites for each population is one lower than the number calculated by other software. Is this a bug?
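
One thing worth ruling out, offered as an assumption rather than a diagnosis of this report, is that tools differ in how they treat alignment columns containing gaps or unknown bases. The base-R sketch below counts segregating sites on a made-up alignment under both conventions; the function `count_segregating` and the data are hypothetical and do not call PopGenome.

```r
# Made-up alignment: rows are sequences, columns are sites
aln <- rbind(
  s1 = c("A", "C", "G", "T", "A"),
  s2 = c("A", "T", "G", "T", "A"),
  s3 = c("A", "C", "G", "N", "G"),
  s4 = c("A", "C", "G", "C", "A")
)

# Hypothetical helper: count sites with more than one observed nucleotide.
# drop_incomplete = TRUE skips any column holding a gap or unknown base;
# drop_incomplete = FALSE ignores only the affected cells.
count_segregating <- function(aln, drop_incomplete) {
  valid <- c("A", "C", "G", "T")
  seg <- apply(aln, 2, function(col) {
    incomplete <- any(!col %in% valid)
    if (drop_incomplete && incomplete) return(FALSE)
    length(unique(col[col %in% valid])) > 1
  })
  sum(seg)
}

s_strict  <- count_segregating(aln, drop_incomplete = TRUE)
s_lenient <- count_segregating(aln, drop_incomplete = FALSE)
print(c(strict = s_strict, lenient = s_lenient))
```

In this toy case the fourth column is polymorphic but contains an `N`, so the two conventions differ by exactly one segregating site.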

Suggestion for speed up

Could you please provide a method to clear the calculated statistics? This would avoid re-reading the data and speed up the program.

RNDmin

Hi there,

In the introgression.stats method, is there a way to calculate Z-scores for the RNDmin metric (as for the D and f statistics)?

I have a sister-taxon introgression scenario I wish to test and this would be most helpful!

Bryan

Error with vcf file

Dear Bastian,

When I read in a VCF file, I encountered the error below. Could you please tell me how to solve it?

VCF_readIntoCodeMatrix :: Malformed GT field!
Error in 1:numusedcols : NA/NaN argument
Calls: readVCF

Thank you so much for your attention and participation.

Best wishes,
Xiaomeng

Add CSI index support

We had a request from a BCFtools user for help splitting the chromosomes in a VCF file into smaller scaffolds, which can then be indexed in TBI format. The user intends to use the indexed VCF file in a PopGenome analysis but currently cannot, because some chromosomes in the file are longer than 2²⁹-1, the upper limit of the TBI format, and PopGenome apparently does not accept CSI indices, which could accommodate these large chromosomes.

You can find a minimal description of the CSI format here.
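
As a small worked example of the limit quoted above, the sketch below computes 2²⁹-1 and flags which chromosomes would exceed it. The chromosome names and lengths are made up for illustration.

```r
# Upper coordinate limit of the TBI format as stated in this issue
tbi_max <- 2^29 - 1
print(format(tbi_max, big.mark = ","))

# Hypothetical chromosome lengths in base pairs
chrom_lengths <- c(chr1 = 250e6, chr2 = 600e6, chr3 = 536870911)

# Chromosomes that cannot be covered by a TBI index without splitting
needs_split <- names(chrom_lengths)[chrom_lengths > tbi_max]

# Minimum number of pieces each chromosome would have to be cut into
n_pieces <- ceiling(chrom_lengths / tbi_max)
print(needs_split)
print(n_pieces)
```

Here only the 600 Mbp chromosome crosses the limit and would need at least two pieces, while a chromosome of exactly 2²⁹-1 bp still fits.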