gbs_snp_filter's People

Contributors: laninsky, sethmusker

gbs_snp_filter's Issues

Error using HWE.R

Hello,
I have come across an error at the HWE step. The error message reads,
"Error in names(hwetablebin) <- c("snp", popnames) :
'names' attribute [9] must be the same length as the vector [1]
Execution halted
ls: cannot access *pop.vcf: No such file or directory"

I do get the HWE table in the outfiles, but the HWE.vcf does not get written.
I would appreciate any suggestions.

Error in generating the .ld.vcf file (the final step)

Dear Alana,
Hope you are fine.
This is Omid again; sorry to report a new error. I am working on the same samples as before, but today I generated a vcf file that includes two populations, and when I used your package to filter it I got the following error. I would be very grateful if you could help me resolve it.

Parsed with column specification:
cols(
  CHR_A = col_character(),
  BP_A = col_double(),
  SNP_A = col_character(),
  CHR_B = col_character(),
  BP_B = col_double(),
  SNP_B = col_character(),
  R2 = col_double(),
  X8 = col_logical()
)
Parsed with column specification:
cols(
  CHR_A = col_character(),
  BP_A = col_double(),
  SNP_A = col_character(),
  CHR_B = col_character(),
  BP_B = col_double(),
  SNP_B = col_character(),
  R2 = col_double(),
  X8 = col_logical()
)
Warning messages:
1: Missing column names filled in: 'X8' [8]
2: Missing column names filled in: 'X8' [8]
[1] "Up to 2 out of 13 pairwise LD comparisons"
Error in if (is.na(SNP_record[(j - 1), 8])) { :
  argument is of length zero
Execution halted

I will also send you my files through Dropbox.

Thank you very much in advance.

Regards,

Omid

Add some guidance on what vcf to use

Tweak the readme to point out what vcf is the one to use, because people are not necessarily going to scroll through the novel that is the readme :P

Error in generating .HWE.vcf file

Dear Alana,
hope you are doing well.
This is Omid Jafari. Previously I was working with the Stacks ver. 1 pipeline, and you helped me solve an error in using your package by updating it. Now I have generated a vcf file with Stacks ver. 2, in which the ID column has changed, and I think the error is related to that. I should mention that my pipeline was reference-genome based.

Error: 'populations.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted

In the GBS_SNP_filter.txt file, on the last line, I changed _.* to :.* and it runs a bit further, but then I get another error.

There were 50 or more warnings (use warnings() to see the first 50)
[1] "Up to 1 out of 5 populations"
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  :
  length of 'dimnames' [2] not equal to array extent
Calls: unlist -> lapply -> FUN -> which -> Ops.data.frame -> matrix
In addition: Warning message:
Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `enframe(name = NULL)` instead.
This warning is displayed once per session.
Execution halted
ls: cannot access *pop.vcf: No such file or directory
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
  X1 = col_character()
)
Error: 'populations.recode.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted

I'll share my original vcf file, popmap.txt and GBS_SNP_filter.txt, and would be very grateful if you could help me get past this error.

I should add that I think the error comes from my vcf file: when I run the package on other .vcf files (generated with Stacks 2) with the same change to the last line of GBS_SNP_filter.txt (as mentioned above), it works smoothly, so I think something is wrong with that particular vcf file.

Regards,
Omid

Failing after PLINK for stickleback test data

Warning messages:
1: Missing column names filled in: 'X8' [8] 
2: Missing column names filled in: 'X8' [8] 
3: Missing column names filled in: 'X8' [8] 
Error in gsub(parameters[8, 1], "", as.matrix(temp %>% select(!!parameters[7,  : 
  incorrect number of dimensions
Execution halted

Suspect it is something to do with no regex pattern being present for this dataset - drill into it more tomorrow.

unknown issue

Hi laninsky,

thank you very much for your sweet package!

I am running into an issue

Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
  .default = col_character(),
  `1864` = col_double(),
  `15` = col_double()
)
See spec(...) for full column specifications.
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  object 'ALT' not found
Calls: %>% ... mutate.tbl_df -> mutate_impl -> str_count -> stri_count_regex
In addition: Warning message:
Duplicated column names deduplicated: '.' => '._1' [6], '0/0:25:25,0:40:-0.00,-9.18,-124.38' => '0/0:25:25,0:40:-0.00,-9.18,-124.38_1' [17], '0/0:18:18,0:40:-0.00,-7.07,-90.18' => '0/0:18:18,0:40:-0.00,-7.07,-90.18_1' [24], '0/0:23:23,0:40:-0.00,-8.57,-114.61' => '0/0:23:23,0:40:-0.00,-8.57,-114.61_1' [26], '0/0:40:40,0:40:-0.00,-13.69,-197.66' => '0/0:40:40,0:40:-0.00,-13.69,-197.66_1' [30], '0/0:28:28,0:40:-0.00,-10.08,-139.03' => '0/0:28:28,0:40:-0.00,-10.08,-139.03_1' [31], '0/0:30:30,0:40:-0.00,-10.68,-148.81' => '0/0:30:30,0:40:-0.00,-10.68,-148.81_1' [32], '0/0:15:15,0:40:-0.00,-6.17,-75.53' => '0/0:15:15,0:40:-0.00,-6.17,-75.53_1' [34], '0/0:35:35,0:40:-0.00,-12.19,-173.23' => '0/0:35:35,0:40:-0.00,-12.19,-173.23_1' [36], '0/0:23:23,0:40:-0.00,-8.57,-114.61' => '0/0:23:23,0:40:-0.00,-8.57,-114.61_2' [37], '0/0:14:14,0:40:-0.00,-5.86,-70.64' => '0/0:14:14,0:40:-0.00,-5.86,-70.64_1' [39], '0/0:18:18,0:40:-0.00,-7.07,-90.18' => '0/0:18:18,0:40:-0.00,-7.07,-90.18_2' [43], '0/0:19:1 [... truncated]
Execution halted
ls: cannot access *pop.vcf: No such file or directory
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
  X1 = col_character()
)

grateful for any help!

My parameters file:

cat GBS_SNP_filter.txt


test.vcf
0.0
1
0.05
0.5
1

File attached (test.vcf, renamed to test.vcf.txt so that the upload is accepted): test.vcf.txt
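In the log above, the column names that were read are values from a data record ('1864', '15', '.', and genotype strings), and the script then fails because no column called ALT exists. One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the '#CHROM' header line is missing from the vcf or was not the line treated as the header. A minimal Python sketch for checking this (the helper names find_vcf_header and header_problems are hypothetical and not part of the package):

```python
# Fixed columns that the VCF specification requires on the '#CHROM' header line.
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def find_vcf_header(lines):
    """Return the column names from the '#CHROM' line, or None if it is absent."""
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines come before the header
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")
        break  # reached a record (or other text) without seeing a header
    return None


def header_problems(lines):
    """List the header problems found; an empty list means none were detected."""
    header = find_vcf_header(lines)
    if header is None:
        return ["no '#CHROM' header line before the first record"]
    return [f"missing column {name}" for name in REQUIRED if name not in header]


# Made-up example lines, not taken from the attached file.
good = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t15\t.\tA\tG\t40\tPASS\t.\tGT\t0/0\n",
]
bad = good[:1] + good[2:]  # same file with the header line dropped

print(header_problems(good))  # []
print(header_problems(bad))   # ["no '#CHROM' header line before the first record"]
```

To run it on a real file, pass the open file handle, for example header_problems(open("test.vcf")).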

locus ID regex pattern when column has colons

I'm putting together the GBS_SNP_filter.txt file and am a bit confused about line 8: the locus ID regex pattern. I have a reference-aligned vcf file output from Stacks v2. The #CHROM column is the scaffold the locus was assembled against, followed by POS and ID columns. The locus ID is in the 'ID' column but also has additional information, separated by colons. For example: (columns #CHROM, POS, ID) - ENA|CAJHIB020000001|CAJHIB020000001.1 13970 3:73:+.

From the ReadMe, it looks like I need to use a regex pattern because of the formatting of the locus ID column. I'm a bit confused about what that would look like in my case and if the colons in the ID column will be a problem, given what's mentioned in the ReadMe ("the locus name should not have a colon in it, because everything following the colon will be stripped away following the LD step").

I would appreciate any guidance on this!
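To illustrate what a pattern on line 8 would do to the ID quoted above, here is a minimal Python sketch (not the package's R code). It assumes the pattern is deleted from the ID value and whatever remains is used as the locus name, which is what the gsub(parameters[8, 1], "", ...) call visible in another report's traceback suggests. The helper name strip_locus_suffix and the underscore-style ID are hypothetical.

```python
import re


def strip_locus_suffix(snp_id: str, pattern: str) -> str:
    """Delete the first match of `pattern` from a VCF ID, leaving the locus name."""
    return re.sub(pattern, "", snp_id, count=1)


# ID value quoted in the question (Stacks v2, reference-aligned).
print(strip_locus_suffix("3:73:+", r":.*"))  # 3

# Hypothetical ID in which an underscore separates locus and position.
print(strip_locus_suffix("3_73", r"_.*"))    # 3
```

Under that assumption, the pattern :.* would reduce 3:73:+ to locus 3. These two simple patterns behave the same way in R's gsub and in Python's re module; whether the colon handling after the LD step still causes trouble is a question for the maintainer.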

Address some deprecations in the code

Warning message:
funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(origcolnumber)` instead of `origcolnumber` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.

Error in -removepops

Hi,
This is Omid. I am working on GBS data for a fish species, and the vcf file was generated with the Stacks pipeline. I am using your package for filtering, and I think it works well up to the LD filtering step. I should add that I have the .ld files for each population, but no final file in .vcf format is produced. Here is the error I got; I would be very grateful if you could help me solve it.

Error in -removepops : invalid argument to unary operator 
Execution halted

Regards,
Omid

ld performance in the package

Dear Alana,
Hope you are fine.
I just have a question about the LD option in your GBS_SNP_filter package. If two loci are in LD, are both of them removed, or is one of them retained?

Thank you for all your support.

Regards,
Omid
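The thread does not record how the package handles this, so the following is only a sketch of one common convention (greedy pruning that keeps one locus from each linked pair), not a description of what GBS_SNP_filter does. It assumes pairwise records of the form (SNP_A, SNP_B, R2), matching the columns in the PLINK .ld output shown in other reports; the function name, locus names and threshold are hypothetical.

```python
def prune_ld_pairs(pairs, r2_threshold):
    """Greedy pruning: for each pair at or above the threshold whose two
    members are both still retained, drop SNP_B and keep SNP_A."""
    removed = set()
    for snp_a, snp_b, r2 in pairs:
        if r2 >= r2_threshold and snp_a not in removed and snp_b not in removed:
            removed.add(snp_b)
    return removed


# Made-up pairwise comparisons for illustration.
pairs = [
    ("locus1", "locus2", 0.95),
    ("locus2", "locus3", 0.90),
    ("locus4", "locus5", 0.10),
]

print(sorted(prune_ld_pairs(pairs, 0.5)))  # ['locus2']
```

In this example locus2 is dropped because of its link to locus1, and locus3 is then kept because its only linked partner has already been removed. A stricter scheme that removes both members of every linked pair would drop locus1, locus2 and locus3 instead.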
