angsd-wrapper / angsd-wrapper
Utilities for analyzing next generation sequencing data.
License: MIT License
How many iterations does the Inbreeding Coefficient method run? I read the documentation but could not find this information. I am running it on a large dataset; it has been running for more than 5 days and has reached iteration 126. The settings are max_iters: 1500 and min_iters: 10, so at this rate it cannot finish before the job hits the 7-day limit on our server. Is there a way to change these values? Does the program need to complete all 1500 iterations for the results to be usable?
Hello,
I am trying to install this tool on Linux 4.15.0-147-generic #151-Ubuntu. The error I am getting is
abcHWE.cpp: In member function ‘void abcHWE::HWE_EM(double*, double*, int)’:
abcHWE.cpp:272:8: error: ‘isnan’ was not declared in this scope
if(isnan(newFreq1*0.5 + newFreq2)){
^~~~~
abcHWE.cpp:272:8: note: suggested alternative:
In file included from analysisFunction.h:6:0,
from abc.h:8,
from abcFreq.h:2,
from abcHWE.cpp:17:
/usr/include/c++/7/cmath:639:5: note: ‘std::isnan’
isnan(_Tp __x)
^~~~~
Makefile:47: recipe for target 'abcHWE.o' failed
make: *** [abcHWE.o] Error 1
This error was reported previously. I installed libgeos-dev, but that did not solve the problem:
stelo@H4:~/sw/angsd-wrapper$ sudo apt install libgeos-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libgeos-dev is already the newest version (3.6.2-1build2).
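The compiler note in the log above points at `std::isnan` as the alternative, so installing another system library is unlikely to change this particular error. One possible workaround, sketched here under the assumption that the bare `isnan(` call is the only problem in that file, is to qualify the call in the ANGSD source before rebuilding. The helper name `qualify_isnan` is made up for this example.

```shell
#!/usr/bin/env bash
# Rewrite bare "isnan(" calls to "std::isnan(" on stdin, leaving calls that
# are already qualified alone: the character before "isnan" must not be ':'
# or part of an identifier.
qualify_isnan() {
    sed -E 's/(^|[^:_[:alnum:]])isnan\(/\1std::isnan(/g'
}

# Preview on the line quoted in the error message:
printf '%s\n' 'if(isnan(newFreq1*0.5 + newFreq2)){' | qualify_isnan

# To patch the file itself (the location of abcHWE.cpp inside the
# dependencies directory is an assumption; keep a backup):
# cp abcHWE.cpp abcHWE.cpp.bak
# qualify_isnan < abcHWE.cpp.bak > abcHWE.cpp
```

Directly nested calls such as `isnan(isnan(x))` would need a second pass, which should not matter for the single line reported here.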
After installing angsd-wrapper, I get the following error when I try to run shiny. Any suggestions?
MBP-de-Joanna:angsd-wrapper joannamalukiewicz$ angsd-wrapper shiny graphing
angsd-wrapper running from /Applications/angsd-wrapper
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:base’:
format.pval, units
Attaching package: ‘ape’
The following object is masked from ‘package:Hmisc’:
zoom
Attaching package: ‘DT’
The following objects are masked from ‘package:shiny’:
dataTableOutput, renderDataTable
Error: With R version 3.5 or greater, install Bioconductor packages using BiocManager; see https://bioconductor.org/install
Execution halted
Hello,
I realize that there is active development going on but wanted to make sure you were aware of an issue that is currently preventing install / compilation:
read_data.cpp:1:10: fatal error: gsl/gsl_rng.h: No such file or directory
Hopefully this will be a quick fix?
Thanks!
-A
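The message above means the compiler could not find the GSL headers. A quick way to check whether they are installed is sketched below; the helper name `has_header` and the two include directories are assumptions for illustration. On Debian or Ubuntu the header normally comes from the `libgsl-dev` package.

```shell
#!/usr/bin/env bash
# Print the full path of a header if it exists under any of the given
# include roots; return non-zero otherwise.
# Usage: has_header gsl/gsl_rng.h /usr/include /usr/local/include
has_header() {
    local header=$1; shift
    local dir
    for dir in "$@"; do
        if [ -f "${dir}/${header}" ]; then
            echo "${dir}/${header}"
            return 0
        fi
    done
    return 1
}

if ! has_header gsl/gsl_rng.h /usr/include /usr/local/include; then
    echo "gsl/gsl_rng.h not found; on Debian/Ubuntu try installing libgsl-dev" >&2
fi
```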
I am trying to use a '.clst' cluster file to color-code subgroups, but I must be coding the file incorrectly. I should have only one 'gross' cluster, but three appear (see image). Thanks in advance. Loving this wrapper!
FID IID CLUSTER
HeliagrossG5_11_4 1 gross
HeliagrossG9_9_4 1 ang
HeliavertA5_11_7 1 vert
HeliavertE11_9_8 1 miss
HeliavertE12_9_5 1 miss
HeliavertG2_9_3 1 vert
HeliavertG3_9_7 1 vert
HeliavertG4_9_6 1 miss
HeliavertH12_11_3 1 miss
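Before looking at the plotting side, it may help to tabulate which labels the file actually contains. The sketch below counts samples per label in the third column; `count_clusters` and the file name in the usage comment are placeholders. For the excerpt above it should list four labels (ang, gross, miss, vert), with a single sample labelled 'gross'.

```shell
#!/usr/bin/env bash
# Count how many samples carry each label in the third column of a
# whitespace-delimited cluster file, skipping the header row.
count_clusters() {
    awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}

# Example: count_clusters samples.clst   (file name is a placeholder)
```

If a label shows up twice in the output while looking identical, hidden characters such as Windows line endings are one possible cause.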
Can I somehow add the -minIndDepth filter (only use a site if at least minInd samples have this minimum depth) to the Common_config file or other config files? I have found it to be very important for dealing with missing data when comparing ancient and modern genomes.
I would also like to add these filters, which I have also found to be important for my dataset:
-setMinDepth
-setMaxDepth
-rmTrans
-minMaf
-SNP_pval 1e-6
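The SFS wrapper script quoted further down this page merges a variable called `ADVANCED_ARGS` into the ANGSD command line through Arg_Zipper.sh, so one possible route is to put the extra filters there. This is a sketch that assumes the config file for the method you are running exposes `ADVANCED_ARGS` in the same way. Every threshold below is a placeholder rather than a recommendation, and `-doCounts 1` is added on the assumption that ANGSD's depth filters need per-site counts.

```shell
# Hypothetical config fragment. The variable name ADVANCED_ARGS comes from the
# wrapper script; all numeric values are placeholders to be replaced.
ADVANCED_ARGS="-doCounts 1 -minIndDepth 2 -setMinDepth 10 -setMaxDepth 500 -rmTrans 1 -minMaf 0.05 -SNP_pval 1e-6"
```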
Just want to report what I think is a bug in the code for the Fst wrapper. It seems that the FST.sh script moves the shared.pos file one directory up. However, at the stage where the script is preparing the files for graphing, it still calls the file from ${OUT}/shared.pos. If I'm reading this right, I think shared.pos is at the same level as ${OUT} rather than within it. This triggers an error at the end of the FST wrapper.
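Until the script is fixed, a defensive lookup along these lines could work around the mismatch. It is only a sketch: `find_shared_pos` is a made-up helper, and it assumes the file ends up either inside `${OUT}` or exactly one directory above it, as described in the report.

```shell
#!/usr/bin/env bash
# Print the path of shared.pos, looking first inside the output directory
# and then one directory up. Returns non-zero if neither location has it.
find_shared_pos() {
    local out=$1
    local candidate
    for candidate in "${out}/shared.pos" "$(dirname "${out}")/shared.pos"; do
        if [ -f "${candidate}" ]; then
            echo "${candidate}"
            return 0
        fi
    done
    echo "shared.pos not found in ${out} or its parent" >&2
    return 1
}

# Example: SHARED_POS=$(find_shared_pos "${OUT}")
```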
Hi there,
I wanted to install angsd-wrapper locally. Unfortunately, setting up the dependencies was not as easy as described on GitHub (https://github.com/ANGSD-wrapper/angsd-wrapper). I got the following error message when I ran “./angsd-wrapper setup dependencies”:
############################################################
abcHWE.cpp: In member function ‘void abcHWE::HWE_EM(double*, double*, int)’:
abcHWE.cpp:272:8: error: ‘isnan’ was not declared in this scope
if(isnan(newFreq1*0.5 + newFreq2)){
^~~~~
abcHWE.cpp:272:8: note: suggested alternative:
In file included from analysisFunction.h:6:0,
from abc.h:8,
from abcFreq.h:2,
from abcHWE.cpp:17:
/usr/include/c++/7/cmath:639:5: note: ‘std::isnan’
isnan(_Tp __x)
^~~~~
Makefile:47: recipe for target 'abcHWE.o' failed
make: *** [abcHWE.o] Error 1
############################################################
OS: Windows 10 subsystem Linux
Version: Linux version 4.4.0-17134-Microsoft ([email protected]) (gcc version 5.4.0 (GCC) )
Command used: just following the setup instruction on https://github.com/ANGSD-wrapper/angsd-wrapper.
Can somebody help me to solve the problem? Thanks!
Hi guys,
I am here to report a problem I had when running ./angsd-wrapper shiny graphing. After doing the whole tutorial (which is very helpful, thanks for that!), I tried to run shiny graphing and got this:
~/Desktop/programs/angsd-wrapper$ ./angsd-wrapper shiny graphing
angsd-wrapper running from /home/carlos/Desktop/programs/angsd-wrapper
Loading required package: methods
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:base’:
format.pval, units
Error: package or namespace load failed for ‘ape’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/carlos/Desktop/programs/angsd-wrapper/.RLibs/ape/libs/ape.so':
libRlapack.so: cannot open shared object file: No such file or directory
Execution halted
However, this directory exists:
~/Desktop/programs/angsd-wrapper$ ls /home/carlos/Desktop/programs/angsd-wrapper/.RLibs/ape/libs
ape.so
Re-installing the 'ape' package in R did not change the output. It was installed automatically into my R 3.4 library:
> install.packages('ape')
Installing package into ‘/home/carlos/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
...
* DONE (ape)
The downloaded source packages are in
‘/tmp/RtmpgvgvS3/downloaded_packages’
I tried adding /home/carlos/R/x86_64-pc-linux-gnu-library/3.4 to the PATH variable in my .bashrc and setting LD_LIBRARY_PATH, but the result is still the same:
export PATH="/home/carlos/R/x86_64-pc-linux-gnu-library/3.4/:$PATH"
export LD_LIBRARY_PATH=/home/carlos/Desktop/programs/angsd-wrapper/.RLibs/ape/libs/ape.so:${LD_LIBRARY_PATH}
I saw a similar issue here: mojaveazure/angsd-wrapper#47 and realized I had also installed anaconda on my PC. I tried to deactivate it (by commenting out the line as #export PATH="/home/carlos/miniconda3/condabin/:$PATH" in .bashrc), but I still had the same issue.
What could be wrong?
Best regards and thanks
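Two things stand out in the output above. The library the loader cannot find is libRlapack.so, not ape.so, and LD_LIBRARY_PATH takes directories rather than paths to .so files. A sketch for locating the directory that holds libRlapack.so follows; `lib_dir_of` is a made-up helper, and it assumes the active R installation ships its own libRlapack.so, which is not true of every build.

```shell
#!/usr/bin/env bash
# Print the directory containing the first file named $2 found under $1.
# Returns non-zero if nothing is found.
lib_dir_of() {
    local root=$1 name=$2 hit
    hit=$(find "${root}" -name "${name}" -print -quit 2>/dev/null)
    if [ -z "${hit}" ]; then
        return 1
    fi
    dirname "${hit}"
}

# Sketch of use; "R RHOME" prints the R installation directory.
# r_lib=$(lib_dir_of "$(R RHOME)" libRlapack.so)
# export LD_LIBRARY_PATH="${r_lib}:${LD_LIBRARY_PATH}"
# ldd /home/carlos/Desktop/programs/angsd-wrapper/.RLibs/ape/libs/ape.so
```

If the search finds nothing, the copy of ape in .RLibs may have been built against a different R (for example the conda one). In that case, reinstalling ape into .RLibs with the R you actually run is worth trying.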
I followed the installation instructions and the tutorial instructions but am running into an error when I try to run the Site Frequency Spectrum.
angsd-wrapper SFS ./Site_Frequency_Spectrum_Config
This is my output. It appears to fail when trying to find the file needed to fold (or not fold) the spectrum.
WRAPPER: Zipping advanced arguments onto basic ones
-> angsd version: 0.911-44-g1c0ebb6 (htslib: 1.3.1-30-gbb03b02) build(Oct 31 2021 11:04:52)
-> Reading fasta: /mnt/steelhead/remote/Sophie/Programs/angsd-wrapper/Example_Data/Sequences/Tripsacum_TDD39103.fa
-> Reading fasta: /mnt/steelhead/remote/Sophie/Programs/angsd-wrapper/Example_Data/Sequences/Zea_mays.AGPv3.30.dna_sm.chromosome.10.fa
-> (Using Filipe G Vieira modification of: abcSaf.cpp)
-> Parsing 11 number of samples
-> Region lookup 1/1
-> We have now allocated approximately 10 Megabytes of raw nodes to the nodepool
-> Printing at chr: 10 pos:17551496 chunknumber 1100
-> We have now allocated approximately 20 Megabytes of raw nodes to the nodepool
-> Printing at chr: 10 pos:19386992 chunknumber 2000 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=11
-> Printing at chr: 10 pos:22395913 chunknumber 3200 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
[emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
[emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
-> Printing at chr: 10 pos:24004662 chunknumber 3600 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=11
-> Printing at chr: 10 pos:24908040 chunknumber 4000
-> Done reading data waiting for calculations to finish
-> Done waiting for threads
-> npools:26 unfreed tnodes before clean:0
-> Output filenames:
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.arg"
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.mafs.gz"
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.geno.gz"
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.gz"
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.pos.gz"
->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.idx"
-> Sun Oct 31 12:08:56 2021
-> Arguments and parameters for all analysis are located in .arg file
[ALL done] cpu-time used = 199.08 sec
[ALL done] walltime used = 130.00 sec
-> Version of fname:/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.idx is:2
-> Assuming .saf.gz file: /mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.gz
-> Assuming .saf.pos.gz: /mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.pos.gz
-> Problem opening file: '-fold'
Looking at the wrapper shell script (Site_Frequency_Spectrum.sh), it appears to fail in the final section, in the middle of a series of pipes to the final file. That file does get written to my scratch directory, but it is empty.
#!/usr/bin/env bash
set -e
set -o pipefail
# Load variables from supplied config file
source "$1"
# Are we using Common_Config? If so, source it
if [[ -f "${COMMON}" ]]
then
source "${COMMON}"
fi
# Where is angsd-wrapper located?
SOURCE=$2
# Where is ANGSD?
ANGSD_DIR=${SOURCE}/dependencies/angsd
# Variables created from transforming other variables
# The number of individuals in the taxon we are analyzing
N_IND=$(wc -l < "${SAMPLE_LIST}")
# How many inbreeding coefficients are supplied?
N_F=$(wc -l < "${SAMPLE_INBREEDING}")
# For ANGSD, the actual sample size is twice the number of individuals, since each individual has two chromosomes.
# The individual inbreeding coefficents take care of the mismatch between these two numbers
# Perform a check to see if number of individuals matches number of inbreeding coefficients
if [ "${N_IND}" -ne "${N_F}" ]
then
echo "Mismatch between number of samples in ${SAMPLE_LIST} and ${SAMPLE_INBREEDING}"
exit 1
fi
# Check to see if ancestral state is supplied: If not, polarize samples using
# the reference sequence and generate folded saf.
if [ ! -f "${ANC_SEQ}" ]
then
echo "Ancestral state data not found, using reference sequence to polarize alignment data. BAQ will likewise not be calculated."
if [ ! -f "${REF_SEQ}" ]
then
echo "No reference sequence supplied, unable to perform calculations."
exit 2
else
ANC_SEQ=$REF_SEQ
REF_SEQ=
BAQ=0
FOLD=1
fi
else
FOLD=0
fi
# Create outdirectory
OUT="${SCRATCH}"/"${PROJECT}"/SFS
mkdir -p "${OUT}"
# Now we actually run the command, this creates a binary file that contains the prior SFS
if [[ -f "${OUT}"/"${PROJECT}"_SFSOut.mafs.gz ]] && [ "$OVERRIDE" = "false" ]
then
echo "WRAPPER:maf already exists and OVERRIDE=false, skipping angsd -bam..."
else
# Do we have a regions file?
if [[ -f "${REGIONS}" ]]
then
WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
-out "${OUT}"/"${PROJECT}"_SFSOut \
-indF "${SAMPLE_INBREEDING}" \
-doSaf "${DO_SAF}" \
-uniqueOnly "${UNIQUE_ONLY}" \
-anc "${ANC_SEQ}" \
-minMapQ "${MIN_MAPQ}" \
-minQ "${MIN_BASEQUAL}" \
-nInd "${N_IND}" \
-minInd "${MIN_IND}" \
-baq "${BAQ}" \
-ref "${REF_SEQ}" \
-GL "${GT_LIKELIHOOD}" \
-P "${N_CORES}" \
-doMajorMinor "${DO_MAJORMINOR}" \
-doMaf "${DO_MAF}" \
-doGeno "${DO_GENO}" \
-rf "${REGIONS}" \
-doPost "${DO_POST}")
# Are we missing a definiton for regions?
elif [[ -z "${REGIONS}" ]]
then
WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
-out "${OUT}"/"${PROJECT}"_SFSOut \
-indF "${SAMPLE_INBREEDING}" \
-doSaf "${DO_SAF}" \
-uniqueOnly "${UNIQUE_ONLY}" \
-anc "${ANC_SEQ}" \
-minMapQ "${MIN_MAPQ}" \
-minQ "${MIN_BASEQUAL}" \
-nInd "${N_IND}" \
-minInd "${MIN_IND}" \
-baq "${BAQ}" \
-ref "${REF_SEQ}" \
-GL "${GT_LIKELIHOOD}" \
-P "${N_CORES}" \
-doMajorMinor "${DO_MAJORMINOR}" \
-doMaf "${DO_MAF}" \
-doGeno "${DO_GENO}" \
-doPost "${DO_POST}")
# Assuming a single region was defined in config file
else
WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
-out "${OUT}"/"${PROJECT}"_SFSOut \
-indF "${SAMPLE_INBREEDING}" \
-doSaf "${DO_SAF}" \
-uniqueOnly "${UNIQUE_ONLY}" \
-anc "${ANC_SEQ}" \
-folded "${FOLD}" \
-minMapQ "${MIN_MAPQ}" \
-minQ "${MIN_BASEQUAL}" \
-nInd "${N_IND}" \
-minInd "${MIN_IND}" \
-baq "${BAQ}" \
-ref "${REF_SEQ}" \
-GL "${GT_LIKELIHOOD}" \
-P "${N_CORES}" \
-doMajorMinor "${DO_MAJORMINOR}" \
-doMaf "${DO_MAF}" \
-doGeno "${DO_GENO}" \
-doPost "${DO_POST}" \
-r "${REGIONS}")
fi
fi
# Check for advanced arguments, and overwrite any overlapping definitions
FINAL_ARGS=($(source "${SOURCE}/Wrappers/Arg_Zipper.sh" "${WRAPPER_ARGS}" "${ADVANCED_ARGS}"))
# DEBUGGING
# echo "Wrapper arguments: ${WRAPPER_ARGS}" 1<&2
# echo -e "Final arguments:" ${FINAL_ARGS} 1<&2
"${ANGSD_DIR}"/angsd "${FINAL_ARGS[@]}"
"${ANGSD_DIR}"/misc/realSFS \
"${OUT}"/"${PROJECT}"_SFSOut.saf.idx \
-P "${N_CORES}" \
-fold "${FOLD}" \
> "${OUT}"/"${PROJECT}"_DerivedSFS.graph.me
I can also include my configuration file (Site_Frequency_Spectrum_Config) if helpful; it also points the script to another configuration file in the same directory (Common_Config). I'm wondering whether anyone else has run into this error while working through the tutorial. I am trying to figure out whether this is a file path issue, or whether the SFS is not running correctly and there is some other error in the output that I am not identifying.
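On the empty output file: the shell creates the redirect target before realSFS starts, so an empty _DerivedSFS.graph.me only shows that the step was reached, not that a path is wrong. The message "Problem opening file: '-fold'" reads as if realSFS treated -fold as a file name. One possible explanation, which is a guess from the log, is that the bundled build (0.911 according to the output) does not recognize that option. The sketch below shows one way to avoid leaving empty result files behind; `run_to_file` is a hypothetical helper, not part of angsd-wrapper.

```shell
#!/usr/bin/env bash
# Run a command with stdout sent to a temporary file, and move the result
# into place only if the command succeeded and wrote something.
# Usage: run_to_file OUTPUT_PATH command [args...]
run_to_file() {
    local dest=$1; shift
    local tmp="${dest}.tmp"
    if "$@" > "${tmp}" && [ -s "${tmp}" ]; then
        mv "${tmp}" "${dest}"
    else
        rm -f "${tmp}"
        echo "command failed or produced no output: $*" >&2
        return 1
    fi
}

# Example with the names from the script above:
# run_to_file "${OUT}/${PROJECT}_DerivedSFS.graph.me" \
#     "${ANGSD_DIR}"/misc/realSFS "${OUT}/${PROJECT}_SFSOut.saf.idx" -P "${N_CORES}"
```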