thierrygosselin / stackr

stackr: an R package to run stacks software pipeline

Home Page: http://thierrygosselin.github.io/stackr/

R 100.00%

radseq rad genotype-likelihoods filter genetics genomics genomic-data-analysis genomics-visualization radseq-data gbs genotyping-by-sequencing stackr

stackr's Introduction

stackr: an R package to run stacks software pipeline

This is the development page of stackr.

What’s the difference with running stacks directly in the terminal?

Besides running stacks within R, not much. Tiny differences here and there speed up my RADseq workflow:

The philosophy of working by project with pre-organized folders.
Some important steps are parallelized.
Do you have more than one sequencing chip/lane? This workflow will save you lots of time.
Technical replicates, inside or across chips/lanes, are managed uniquely.
Noise reduction.
Data normalization.
Nightmares because of a crashed computer/cluster/server? stackr manages the stacks unique integers (previously called SQL IDs) throughout the pipeline. This is integrated from the start, making it a breeze to restart your pipeline after a crash!
Mismatch testing: the de novo mismatch threshold series is integrated inside run_ustacks, and stackr produces tables and figures automatically.
Catalog: for projects with larger sample sizes, breaking down the catalog into several separate cstacks steps makes the pipeline more robust if your computer/cluster/server crashes.
Logs generated by stacks are read and transferred into human-readable tables/tibbles. Detecting problems is easier.
Summary of different stacks modules: available automatically inside the stackr pipeline, but also available to users who didn't use stackr to run stacks.
For me, all this = increased reproducibility.

Who’s it for?

It’s currently developed with my own projects in mind.
To help collaborators to get the most out of stacks.

It's not for R or stacks beginners. stacks-related issues should be raised on the stacks Google group.

Installation

To try out the dev version of stackr, copy/paste the code below:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("thierrygosselin/stackr")
library(stackr)

Citation:

To get the citation, inside R:

citation("stackr")

Web site with additional info: http://thierrygosselin.github.io/stackr/

Life cycle

stackr is maturing, but in order to make the package better, changes are inevitable. Argument names are very stable and follow stacks development closely.

Philosophy, major changes and deprecated functions/arguments are documented in life cycle section of functions.
The latest changes are documented in changelog, versions, new features and bug history
issues and contributions

Stacks modules and RADseq typical workflow

The stackr package provides wrapper functions to run the stacks modules process_radtags, ustacks, cstacks, sstacks, rxstacks and populations inside R.

Below, a flow chart shows the stacks modules and the corresponding stackr functions.


stackr's Issues

Error installing stackr

When installing stackr from RStudio using the line devtools::install_github("thierrygosselin/stackr"), I get the following error:

installing source package 'stackr' ...
** R
** inst
** preparing package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called 'assertthat'
ERROR: lazy loading failed for package 'stackr'
removing 'C:/Users/uno/Documents/R/win-library/3.4/stackr'
Error: Command failed (1)

I have the latest versions of R (x64), RStudio and Rtools.
The full log is attached.

Error.R.txt
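
The log points at a missing dependency (assertthat) rather than at stackr itself. Below is a minimal sketch of one way to check for it before retrying, assuming the packages are available from CRAN. missing_packages is a hypothetical helper written for this note; it is not part of stackr or devtools.

```r
# Hypothetical helper (not part of stackr): return the packages from `pkgs`
# that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# 'assertthat' is the package named in the error message above.
to_install <- missing_packages(c("assertthat", "devtools"))
print(to_install)

# Only install and retry in an interactive session, so that sourcing this
# script never triggers downloads by accident.
if (interactive()) {
  if (length(to_install) > 0) install.packages(to_install)
  devtools::install_github("thierrygosselin/stackr")
}
```

If other packages are reported missing on the next attempt, the same check can be repeated with their names.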

Reading in fastq files with vroom

Hi Thierry,

I was having a problem reading in fastq files to generate read depth plots. I kept receiving an error message that vroom could not guess the delimiter.

Setting delim = "/n" in the code for both read_depth_plot and clean_fq seemed to do the trick.

Cheers,

Alex
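
A FASTQ file has no column delimiter: each record is four consecutive lines (identifier, sequence, a "+" separator and the quality string), which is likely why delimiter guessing fails. Below is a minimal base R sketch of reading such a file line by line, using a tiny made-up file. read_fastq_lines is a hypothetical helper and does not reflect how read_depth_plot or clean_fq are implemented.

```r
# Write a tiny, made-up FASTQ file (two reads) to a temporary location.
fq_lines <- c("@read1", "ACGT", "+", "IIII",
              "@read2", "ACGTAC", "+", "IIIIII")
fq_path <- tempfile(fileext = ".fq")
writeLines(fq_lines, fq_path)

# Hypothetical helper: read a FASTQ file as plain lines and reshape the
# four-line records into one row per read.
read_fastq_lines <- function(path) {
  x <- readLines(path)
  stopifnot(length(x) %% 4 == 0)  # every record must have exactly 4 lines
  m <- matrix(x, ncol = 4, byrow = TRUE)
  data.frame(
    id = m[, 1],
    sequence = m[, 2],
    quality = m[, 4],
    read_length = nchar(m[, 2]),
    stringsAsFactors = FALSE
  )
}

fq <- read_fastq_lines(fq_path)
print(fq)
```

For real data the files are much larger and usually compressed, so a faster line reader would be preferable. The point here is only that the file is treated as lines, not as delimited columns.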

vcf2dadi

Hi -

I'm having trouble using the vcf2dadi() function. First, the example code gives me the error "Error in as_data_frame(.) : Not a graph object" when I run the bit of code assigned to id.vcf.

Second, it's not clear how to generate the files needed for assigning an outgroup. Do I need to run stacks populations on both ingroup and the outgroup separately to generate a fasta + sumstats file for each? I've tried various renditions of this but am being told there are 0 common markers between my in- and out- group.

Thanks for your help!

Example project info file for paired end sequencing with >1 barcode and multiple lanes

Hi,

it would be very helpful if you could provide an example project info file as the instructions are a bit unclear about what it should look like if one has multiple plates/lanes in the paired-end scenario. For example: should the file have another column called LANES?

Similarly, it would be very helpful to see how to add multiple barcodes per sample (i.e., scenario 5 in section 4.1.1 of the Stacks manual). Should they be in the same column, separated with some delimiter? Or should there be two columns for barcodes? If the latter, what should their names be?

A simple example file would clarify this :-)

This looks like a very exciting way to "tidy up" a typically messy Stacks workflow. Thanks a lot!

error in summary_ustacks

Hi Thierry,
I am using stackr to make my figures, but I now get the error below. I run stacks on a cluster and am trying to produce the summaries in my local RStudio. Can you help me with this?
Thank you for your time

ustacks.summary <- summary_ustacks(
  ustacks.folder = "./denovo_M1/",
  parallel.core = parallel::detectCores() - 1,
  filename = "./denovo_M1/",
  verbose = TRUE)
#######################################################################
##################### stackr::summary_ustacks #########################
#######################################################################
Removing these catalog files from the summary:
catalog.alleles.tsv.gz
catalog.calls
catalog.fa.gz
catalog.snps.tsv.gz
catalog.tags.tsv.gz
Summarizing 8 ustacks (snps, tags, alleles) files...

Summarizing information...
| | 0%, ETA NA
Error: Argument 1 must have names
In addition: Warning message:
In .stackr_parallel(X = sample.name, FUN = summarise_ustacks, mc.cores = parallel.core, :
scheduled cores encountered errors in user code
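
The error itself does not say which sample failed. One way to narrow it down, sketched under the assumption that ustacks wrote the usual three files per sample (sample.tags.tsv.gz, sample.snps.tsv.gz, sample.alleles.tsv.gz), is to check the folder for incomplete sets, for example after an interrupted transfer from the cluster. check_ustacks_files is a hypothetical helper, not a stackr function, and an incomplete set is only one possible cause of this error.

```r
# Hypothetical helper: list samples in `folder` and flag those that do not
# have all three ustacks files (tags, snps, alleles). Catalog files are skipped.
check_ustacks_files <- function(folder) {
  pattern <- "\\.(tags|snps|alleles)\\.tsv(\\.gz)?$"
  files <- list.files(folder, pattern = pattern)
  files <- files[!grepl("^catalog\\.", files)]
  sample <- sub(pattern, "", files)
  type <- sub(paste0("^.*", pattern), "\\1", files)
  counts <- table(
    factor(sample),
    factor(type, levels = c("alleles", "snps", "tags"))
  )
  incomplete <- rownames(counts)[rowSums(counts > 0) < 3]
  list(samples = rownames(counts), incomplete = incomplete)
}

# Demonstration on a made-up folder: sample "a" is complete, "b" is not.
demo_dir <- tempfile("denovo_demo_")
dir.create(demo_dir)
demo_files <- c("a.tags.tsv.gz", "a.snps.tsv.gz", "a.alleles.tsv.gz",
                "b.tags.tsv.gz", "catalog.tags.tsv.gz")
invisible(file.create(file.path(demo_dir, demo_files)))

check <- check_ustacks_files(demo_dir)
print(check)
```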

PDF of help files/documentation, examples of integrated code

Hello,

This package is amazing in its completeness. I'm wondering if there is anywhere that I can find anyone's code, who has used the package for a series of analyses, so that I can get a sense for how to implement the workflow that is provided, and how to integrate the various functions (perhaps from the Benestan paper)? Also, I've read through a whole bunch of the help files for a whole lot of functions, but is there a way that I can get all the functions in a single PDF, sort of as documentation?

Many thanks for this great package.
Ella

Some problem around colony input files

Dear Thierry,

I am using your package to feed a large amount of data to COLONY. Your package is the only way to pass data from stacks to COLONY.

1- First, I use an awk script to filter out loci without polymorphisms:
awk '$2 > 0 {print}' "$src_root"/14_reassignation/input/batch_2.haplotypes.tsv > "$src_root"/14_reassignation/input/TRIM.haplotypes.tsv

2- I pass the filtered file to the haplo2colony function of your R package:

res <- haplo2colony("/media/XXX/TRIM.haplotypes.tsv",
  blacklist.id = NULL, whitelist.loci = NULL,
  sample.markers = 5, 1, 2, pop.select = "all",
  allele.freq = FALSE, inbreeding = 0, mating.sys.males = 0,
  mating.sys.females = 0, clone = 0, run.length = 2, analysis = 1,
  allelic.dropout = 0, error.rate = 0.02, print.all.colony.opt = FALSE,
  imputations = FALSE, imputations.group = "populations", num.tree = 100,
  iteration.rf = 10, split.number = 100, verbose = TRUE,
  parallel.core = 2, filename = "/home/XXX/colony/colony2_v1.dat")
3- I rename colony2_v1.dat to colony2.dat in the colony directory.
4- I get an error when running colony2s.ifort.out:
jean-baptiste@ordi[colony] mv ./colony2_v1.dat ./colony2.dat [ 6:43]
jean-baptiste@ordi[colony] ./colony2s.ifort.out [ 6:43]

COLONY, Version 2.0.6.2, Build 20160825, Expire Date 20180825
Copyright (C) by Jinliang Wang, Institute of Zoology, Zoological Society of London
Email: [email protected]

Opening & reading data input file: colony2.dat
Marker 2 has the same ID, 169, as marker 1
Errors in DATA. Insufficient data or incorrect format.
Please check DATA and format and then re-run the program
Program stopped in subroutine StopOnDataError

5- After looking into the COLONY user manual and the attached file (I modified the extension),
colony2.txt
I found that on line 23 the locus name (header) is duplicated. After deleting all duplicates by hand, I got a new (and more severe) error:

jean-baptiste@ordi[jean-baptiste] cd ~/colony [ 6:36]
jean-baptiste@ordi[colony] ./colony2s.ifort.out [ 6:36]

COLONY, Version 2.0.6.2, Build 20160825, Expire Date 20180825
Copyright (C) by Jinliang Wang, Institute of Zoology, Zoological Society of London
Email: [email protected]

Opening & reading data input file: colony2.dat
Reading offspring genotype data...
forrtl: Is a directory
forrtl: severe (30): open failure, unit 10, file /home/jean-baptiste/colony/
Image PC Routine Line Source
colony2s.ifort.ou 0000000000633E04 Unknown Unknown Unknown
colony2s.ifort.ou 00000000006493AB Unknown Unknown Unknown
colony2s.ifort.ou 000000000042AE18 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000423E26 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000401EF6 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000401E7E Unknown Unknown Unknown
colony2s.ifort.ou 00000000006E47A4 Unknown Unknown Unknown

Since COLONY2 input files are quite painful to build, I would be very happy to get any input.
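
COLONY stops in the first run because two markers share the ID 169. Before writing or hand-editing the input file, the marker IDs can be checked for duplicates in R. Below is a small sketch on a made-up vector of IDs (only 169 comes from the error message above). Whether COLONY accepts the suffixed names is an assumption that should be checked against its manual.

```r
# Made-up marker IDs; "169" is the duplicated ID reported by COLONY.
marker_ids <- c("169", "169", "205", "311", "205")

# Which IDs occur more than once?
duplicated_ids <- unique(marker_ids[duplicated(marker_ids)])
print(duplicated_ids)

# One possible repair: keep the first occurrence and add a numeric suffix
# to later ones, so that every marker name is unique.
unique_ids <- make.unique(marker_ids, sep = "_")
print(unique_ids)
```

Renaming only makes the header valid. If the duplicated IDs come from the same locus being written twice, removing the extra columns would be the more appropriate fix.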

Where to download the file project.info.turtle.tsv

version of R

Hi,
I would like to know which version of R the package is available for. I have RStudio 1.4 and I cannot install this package.

Cheers
Maria
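
RStudio's version number is separate from the version of R it runs. Below is a short sketch for checking the R version from the console. The 3.5.0 threshold is a made-up placeholder, not stackr's documented requirement.

```r
# Version of R itself (not of RStudio).
r_version <- getRversion()
cat("R version:", as.character(r_version), "\n")

# Compare against a minimum version. "3.5.0" is a hypothetical threshold
# used for illustration only.
meets_minimum <- r_version >= "3.5.0"
print(meets_minimum)
```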

problem running tsv2bam

Hi Thierry,
I have problems running the tsv2bam command.
I got the following error:

For progress, look in the log file:
09_log_files/[email protected]
tsv2bam completed

Moving/Renaming stacks tsv2bam log file:

Merging BAM files with SAMtools to generate a catalog.bam file...
Number of bam files to merge: 0
Error: Tibble columns must have compatible sizes.

Size 0: Existing data.
Size 2: Column SPLIT_VEC.
ℹ Only values of size one are recycled.
Run rlang::last_error() to see where the error occurred.
In addition: Warning messages:
1: In system2(command = "tsv2bam", args = command.arguments, stdout = tsv2bam.log.file) :
error in running command
/bin/sh: tsv2bam: command not found

I tried to figure out the problem, do you have any idea how to solve this?

Best
Maria
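
The warning "/bin/sh: tsv2bam: command not found" suggests that the shell started by R cannot find the stacks executable, so no BAM files were produced and the later merge step received zero files. Below is a minimal sketch for checking which stacks modules are visible from the R session. find_stacks_modules is a hypothetical helper, not a stackr function, and the module list is only an example.

```r
# Hypothetical helper: look up stacks executables on the PATH seen by R.
# Sys.which() returns an empty string for commands that are not found.
find_stacks_modules <- function(modules = c("ustacks", "cstacks", "sstacks",
                                            "tsv2bam", "gstacks", "populations")) {
  paths <- unname(Sys.which(modules))
  data.frame(
    module = modules,
    path = paths,
    found = nzchar(paths),
    stringsAsFactors = FALSE
  )
}

print(find_stacks_modules())
```

If a module is reported as not found although it works in a terminal, the PATH of the R session can differ from the PATH of the login shell. Comparing Sys.getenv("PATH") with the terminal's PATH is a reasonable next step.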

Problem with run_process_radtags

Hi Thierry,

I have trouble running the cstacks command line with my samples, but it worked with RADproc. I was already using stackr for fasqcr, but I would like to use stackr for the whole workflow instead of switching from command line to R.

However, I have trouble with run_process_radtags. I am doing paired-end RADseq, so my project.info file is expected to have four columns: Barcodes, Individuals, Forward and Reverse. Each of my individuals is identified by a specific pair of barcodes (one at each end of the sequence). How am I supposed to give both barcodes with only one column?

Thank you in advance,

Teddy Urvois
