laninsky / gbs_snp_filter
filtering SNPs based on LD and HWE status
License: MIT License
Hello,
I have come across an error at the HWE step. The error message reads,
"Error in names(hwetablebin) <- c("snp", popnames) :
'names' attribute [9] must be the same length as the vector [1]
Execution halted
ls: cannot access *pop.vcf: No such file or directory"
I do get the HWE table in the outfiles, but the HWE.vcf does not get written.
I would appreciate any suggestions.
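The message says `names(hwetablebin)` has 9 names (`"snp"` plus 8 population names) but the table itself has only one column. One thing worth ruling out is a mismatch between the samples in the VCF header and the population assignments. This is a minimal sketch under two assumptions: population membership comes from a Stacks-style tab-separated popmap (`sample<TAB>population`), and the VCF is plain text. The helper names (`read_popmap`, `vcf_samples`, `check_popmap`) are hypothetical and not part of gbs_snp_filter.

```python
import csv
from collections import Counter


def read_popmap(lines):
    """Parse a Stacks-style popmap: one 'sample<TAB>population' pair per line."""
    popmap = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"popmap line has no population column: {row!r}")
        popmap[row[0].strip()] = row[1].strip()
    return popmap


def vcf_samples(lines):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def check_popmap(vcf_lines, popmap_lines):
    """Report samples missing from either file and the sample count per population."""
    popmap = read_popmap(popmap_lines)
    samples = vcf_samples(vcf_lines)
    missing_from_popmap = [s for s in samples if s not in popmap]
    missing_from_vcf = [s for s in popmap if s not in samples]
    per_pop = Counter(popmap[s] for s in samples if s in popmap)
    return missing_from_popmap, missing_from_vcf, dict(per_pop)


# Toy example (made-up sample names):
vcf = ["##fileformat=VCFv4.2\n",
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA1\tA2\tB1\n"]
popmap = ["A1\tpopA\n", "A2\tpopA\n", "B1\tpopB\n", "C1\tpopC\n"]
print(check_popmap(vcf, popmap))  # ([], ['C1'], {'popA': 2, 'popB': 1})
```

Two populations in the toy output (popA and popB) versus three declared in the popmap is the kind of discrepancy that could produce a names/columns length mismatch. This is a hypothesis to check, not a confirmed cause.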
Dear Alana,
Hope you are fine.
This is Omid again, and I'm sorry to bother you with a new error. I am working on the same samples as before, but today I generated a VCF file containing two populations, and when I ran your package to filter it I got the following error. I would be very grateful if you could help me get past it.
Parsed with column specification:
cols(
CHR_A = col_character(),
BP_A = col_double(),
SNP_A = col_character(),
CHR_B = col_character(),
BP_B = col_double(),
SNP_B = col_character(),
R2 = col_double(),
X8 = col_logical()
)
Parsed with column specification:
cols(
CHR_A = col_character(),
BP_A = col_double(),
SNP_A = col_character(),
CHR_B = col_character(),
BP_B = col_double(),
SNP_B = col_character(),
R2 = col_double(),
X8 = col_logical()
)
Warning messages:
1: Missing column names filled in: 'X8' [8]
2: Missing column names filled in: 'X8' [8]
[1] "Up to 2 out of 13 pairwise LD comparisons"
Error in if (is.na(SNP_record[(j - 1), 8])) { :
argument is of length zero
Execution halted
I will also send you my files through Dropbox.
Thank you very much in advance.
Regards,
Omid
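The `Missing column names filled in: 'X8' [8]` warnings mean readr found one more field in the data rows of each .ld file than in its 7-name header (`CHR_A ... R2`). Trailing whitespace at the end of each PLINK `--r2` output line would produce exactly this. That is an assumption here; the files themselves were not inspected. Splitting on runs of whitespace avoids the phantom column. This is a minimal sketch; `read_ld_table` is a hypothetical helper, not part of gbs_snp_filter.

```python
def read_ld_table(lines):
    """Parse a PLINK-style .ld table, ignoring leading/trailing whitespace.

    str.split() with no argument splits on runs of whitespace and discards
    empty strings, so a trailing space cannot create an extra 8th column.
    """
    rows = iter(lines)
    try:
        header = next(rows).split()
    except StopIteration:
        raise ValueError("empty .ld file") from None
    records = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}: {line!r}")
        record = dict(zip(header, fields))
        if "R2" in record:
            record["R2"] = float(record["R2"])  # r^2 as a number
        records.append(record)
    return records


# Toy .ld content with padding and trailing spaces (made-up SNP IDs):
lines = [" CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 \n",
         "   1   100   1_5   1   150   1_9   0.82 \n"]
print(read_ld_table(lines)[0]["R2"])  # 0.82
```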
Should write a check to make sure the vcf file is in the GT:DP:other_stuff format
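A check like the one described in the note above could look roughly like this. It is a sketch that assumes the package needs `GT` first and `DP` second in the FORMAT column (the `GT:DP:other_stuff` layout). The function name `check_format_field` is hypothetical.

```python
def check_format_field(vcf_lines, required=("GT", "DP")):
    """Return (line_number, FORMAT) for every data line whose FORMAT column
    does not begin with the required keys in order (e.g. GT:DP:...)."""
    bad = []
    for lineno, line in enumerate(vcf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            bad.append((lineno, None))  # no FORMAT/sample columns at all
            continue
        keys = fields[8].split(":")
        if tuple(keys[: len(required)]) != tuple(required):
            bad.append((lineno, fields[8]))
    return bad


vcf = ["##fileformat=VCFv4.2\n",
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
       "1\t100\t3:73:+\tA\tG\t.\tPASS\t.\tGT:DP:AD\t0/1:12:6,6\n",
       "1\t200\t4:10:+\tC\tT\t.\tPASS\t.\tGT:AD:DP\t0/0:9,0:9\n"]
print(check_format_field(vcf))  # [(4, 'GT:AD:DP')]
```

Reporting line numbers instead of stopping at the first failure lets a user see at once whether the whole file or only a few records use a different layout.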
Tweak the readme to point out which vcf is the one to use, because people are not necessarily going to scroll through the novel that is the readme :P
Dear Alana,
Hope you are doing well.
This is Omid Jafari. Previously I was working with the Stacks v1 pipeline, and you helped me solve an error with your package by updating it. Now I have generated a VCF file with Stacks v2, in which the ID column has changed, and I think the error comes from that. I should mention that my pipeline was reference-based.
Error: 'populations.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted
In the GBS_SNP_filter.txt file, I changed the last line from _.* to :.*, and the script then runs a bit further but fails again with an error:
There were 50 or more warnings (use warnings() to see the first 50)
[1] "Up to 1 out of 5 populations"
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, :
length of 'dimnames' [2] not equal to array extent
Calls: unlist -> lapply -> FUN -> which -> Ops.data.frame -> matrix
In addition: Warning message:
Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `enframe(name = NULL)` instead.
This warning is displayed once per session.
Execution halted
ls: cannot access *pop.vcf: No such file or directory
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
X1 = col_character()
)
Error: 'populations.recode.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted
I'll share my original VCF file, popmap.txt and GBS_SNP_filter.txt, and would be very grateful if you could help me get past this error.
I should add that I think the error comes from this particular VCF file. When I run the package on other .vcf files (also generated with Stacks 2), with the same change to the last line of GBS_SNP_filter.txt described above, it runs smoothly, so I think something is wrong with that VCF file.
Regards,
Omid
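To see why the default `_.*` pattern has no effect on Stacks 2 IDs while `:.*` does, here is a small sketch of what removing each pattern does to IDs of the form quoted in this thread (`3:73:+`). It assumes the package deletes the regex match from the ID to recover the locus name, as the `gsub(parameters[8, 1], "", ...)` call in a traceback below suggests. Python's `re.sub` stands in for R's `gsub`. Both replace every match, and a greedy `.*` matches from the first separator to the end of the ID.

```python
import re

# Stacks 2-style IDs (locus:column:strand); values are illustrative.
stacks2_ids = ["3:73:+", "3:88:+", "12:40:-"]


def strip_locus(snp_id, pattern):
    """Delete every match of `pattern`, leaving the locus part of the ID."""
    return re.sub(pattern, "", snp_id)


for pattern in ("_.*", ":.*"):
    print(pattern, [strip_locus(i, pattern) for i in stacks2_ids])
# _.* ['3:73:+', '3:88:+', '12:40:-']
# :.* ['3', '3', '12']
```

With `_.*` nothing matches, so every SNP keeps its full ID and no two SNPs share a locus name. With `:.*` the two SNPs on locus 3 collapse to the same name.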
Warning messages:
1: Missing column names filled in: 'X8' [8]
2: Missing column names filled in: 'X8' [8]
3: Missing column names filled in: 'X8' [8]
Error in gsub(parameters[8, 1], "", as.matrix(temp %>% select(!!parameters[7, :
incorrect number of dimensions
Execution halted
I suspect it has something to do with no regex pattern being present for this dataset; will drill into it more tomorrow.
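The traceback above fails while evaluating `parameters[8, 1]`, the eighth line of GBS_SNP_filter.txt. That is consistent with the suspicion that the regex line is absent. Below is a hedged sketch of a guard that reads a one-value-per-line parameter file and returns `None` for the regex when line 8 is missing or blank. The helper name and the "no stripping" fallback are assumptions, not the package's current behaviour.

```python
def read_parameters(lines, regex_line=8):
    """Read GBS_SNP_filter.txt-style parameters (one value per line).

    Returns (values, regex), where regex is None when the regex line is
    missing or blank (hypothetical fallback: leave locus IDs unchanged).
    """
    values = [line.strip() for line in lines]
    # Drop trailing blank lines so a final newline is not counted as a value.
    while values and not values[-1]:
        values.pop()
    regex = None
    if len(values) >= regex_line and values[regex_line - 1]:
        regex = values[regex_line - 1]
    return values, regex


six_lines = ["test.vcf\n", "0.0\n", "1\n", "0.05\n", "0.5\n", "1\n"]
print(read_parameters(six_lines)[1])  # None
```

A calling script could then skip the ID-stripping step, or stop with a clear message, instead of failing inside `gsub` with `incorrect number of dimensions`.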
Hi laninsky,
Thank you very much for your sweet package!
I am running into an issue:
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
.default = col_character(),
`1864` = col_double(),
`15` = col_double()
)
See spec(...) for full column specifications.
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
object 'ALT' not found
Calls: %>% ... mutate.tbl_df -> mutate_impl -> str_count -> stri_count_regex
In addition: Warning message:
Duplicated column names deduplicated: '.' => '._1' [6], '0/0:25:25,0:40:-0.00,-9.18,-124.38' => '0/0:25:25,0:40:-0.00,-9.18,-124.38_1' [17], '0/0:18:18,0:40:-0.00,-7.07,-90.18' => '0/0:18:18,0:40:-0.00,-7.07,-90.18_1' [24], '0/0:23:23,0:40:-0.00,-8.57,-114.61' => '0/0:23:23,0:40:-0.00,-8.57,-114.61_1' [26], '0/0:40:40,0:40:-0.00,-13.69,-197.66' => '0/0:40:40,0:40:-0.00,-13.69,-197.66_1' [30], '0/0:28:28,0:40:-0.00,-10.08,-139.03' => '0/0:28:28,0:40:-0.00,-10.08,-139.03_1' [31], '0/0:30:30,0:40:-0.00,-10.68,-148.81' => '0/0:30:30,0:40:-0.00,-10.68,-148.81_1' [32], '0/0:15:15,0:40:-0.00,-6.17,-75.53' => '0/0:15:15,0:40:-0.00,-6.17,-75.53_1' [34], '0/0:35:35,0:40:-0.00,-12.19,-173.23' => '0/0:35:35,0:40:-0.00,-12.19,-173.23_1' [36], '0/0:23:23,0:40:-0.00,-8.57,-114.61' => '0/0:23:23,0:40:-0.00,-8.57,-114.61_2' [37], '0/0:14:14,0:40:-0.00,-5.86,-70.64' => '0/0:14:14,0:40:-0.00,-5.86,-70.64_1' [39], '0/0:18:18,0:40:-0.00,-7.07,-90.18' => '0/0:18:18,0:40:-0.00,-7.07,-90.18_2' [43], '0/0:19:1 [... truncated]
Execution halted
ls: cannot access *pop.vcf: No such file or directory
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
X1 = col_character()
)
Grateful for any help!
My parameters file:
cat GBS_SNP_filter.txt
test.vcf
0.0
1
0.05
0.5
1
File attached: test.vcf.txt (this is test.vcf, renamed with a .txt extension so that the upload would be accepted).
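In the log above, readr built column names from values such as `1864`, `15` and `0/0:25:25,0:40:...`, and then `ALT` could not be found. This suggests that a data line was read as the header, i.e. the `#CHROM POS ID REF ALT ...` line was not where the script expected it. That is an inference from the log, not a confirmed diagnosis. The sketch below is a pre-flight check that locates the header line and confirms the nine fixed VCF columns are present. `check_vcf_header` is a hypothetical helper.

```python
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def check_vcf_header(lines):
    """Return the 0-based index of the '#CHROM' header line.

    Raises ValueError if the header is missing, if a data line appears
    before it, or if its first nine tab-separated columns are not the
    fixed VCF columns.
    """
    for index, line in enumerate(lines):
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            if columns[:9] != REQUIRED:
                raise ValueError(f"unexpected header columns: {columns[:9]}")
            return index
        raise ValueError(f"line {index + 1} appears before the #CHROM header")
    raise ValueError("no #CHROM header line found")


good = ["##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
        "1\t1864\t15:10:+\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:25\n"]
print(check_vcf_header(good))  # 1
```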
I'm putting together the GBS_SNP_filter.txt file and am a bit confused about line 8: the locus ID regex pattern. I have a reference-aligned VCF file output from Stacks v2. The #CHROM column is the scaffold the locus was assembled against, followed by the POS and ID columns. The locus ID is in the ID column but also carries additional information, separated by colons. For example (columns #CHROM, POS, ID): ENA|CAJHIB020000001|CAJHIB020000001.1 13970 3:73:+.
From the ReadMe, it looks like I need a regex pattern because of how the locus ID column is formatted. I'm a bit confused about what that pattern would look like in my case, and whether the colons in the ID column will be a problem, given what the ReadMe says ("the locus name should not have a colon in it, because everything following the colon will be stripped away following the LD step").
I would appreciate any guidance on this!
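If everything from the first colon onward is stripped, as the ReadMe quote above describes, then a Stacks 2 ID such as `3:73:+` reduces to its leading number `3`. All SNPs from the same locus would then share one name. The small sketch below illustrates that grouping. It assumes the `:.*` pattern and a `locus:column:strand` reading of the Stacks 2 ID, and it is not the package's own code.

```python
import re
from collections import defaultdict


def group_by_locus(snp_ids, pattern=":.*"):
    """Map each locus name (ID with `pattern` removed) to the SNP IDs it contains."""
    loci = defaultdict(list)
    for snp_id in snp_ids:
        loci[re.sub(pattern, "", snp_id)].append(snp_id)
    return dict(loci)


print(group_by_locus(["3:73:+", "3:91:+", "8:12:-"]))
# {'3': ['3:73:+', '3:91:+'], '8': ['8:12:-']}
```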
Warning message:
funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas:
# Simple named list:
list(mean = mean, median = median)
# Auto named with `tibble::lst()`:
tibble::lst(mean, median)
# Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(origcolnumber)` instead of `origcolnumber` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
Hi,
This is Omid. I am working on GBS data for a fish species, and the VCF file was generated with the Stacks pipeline. I am using your package for filtering, and it seems to work well up to the LD filtering step. I should add that I get the .ld files for each population, but no final .vcf file is written. Here is the error I got; I would be very grateful if you could help me solve it.
Error in -removepops : invalid argument to unary operator
Execution halted
Regards,
Omid
Dear Alana,
Hope you are fine.
I just have a question about the LD option in your GBS_SNP_filter package: if two loci are in LD, will both of them be removed, or will one of them be retained?
Thank you for all your support.
Regards,
Omid
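For context on the question above, a common way to handle a pair of SNPs in LD is to drop only one of them and keep the other. PLINK's `--indep-pairwise` pruning works in this spirit. The sketch below is a simple greedy version of that idea. It is a generic illustration, not a description of what gbs_snp_filter does. The tie-breaking rule (keep the SNP with fewer missing genotypes) and the 0.8 threshold are assumptions.

```python
def greedy_ld_prune(pairs, missing, threshold=0.8):
    """Drop one SNP from each pair whose r^2 exceeds `threshold`.

    pairs:   iterable of (snp_a, snp_b, r2)
    missing: dict SNP -> number of missing genotypes (decides which SNP to drop)
    Returns the set of SNPs removed.
    """
    removed = set()
    # Resolve the most strongly linked pairs first.
    for snp_a, snp_b, r2 in sorted(pairs, key=lambda p: p[2], reverse=True):
        if r2 <= threshold or snp_a in removed or snp_b in removed:
            continue  # weak pair, or already resolved by an earlier removal
        # Keep the SNP with less missing data; on a tie, drop snp_b.
        drop = snp_a if missing.get(snp_a, 0) > missing.get(snp_b, 0) else snp_b
        removed.add(drop)
    return removed


pairs = [("s1", "s2", 0.95), ("s2", "s3", 0.90), ("s4", "s5", 0.20)]
missing = {"s1": 3, "s2": 0, "s3": 1}
print(sorted(greedy_ld_prune(pairs, missing)))  # ['s1', 's3']
```

In this toy case `s2` is linked to both `s1` and `s3` but is retained, while one member of each strongly linked pair is removed and the weakly linked pair is left alone.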