vcf_ref_extraction

README

The Python script 'vcf_ref_extraction.py' is available on GitHub (github.com/kbeigel/vcf_ref_extraction). It takes a .vcf file as input and extracts the reference-genome base at every site that was identified as a SNP. The bases are kept in the same order as they appear in the .vcf (which is also the order of the SNPs in the PHYLIP file of sample SNP data) and are written to a text file. Using Mesquite (v. 3.61), the output reference file (.txt or FASTA) was added back into the PHYLIP file (Maddison and Maddison 2019). As a side note, the script could be adapted to extract other information stored in the .vcf file, such as the counts and identities of insertions and deletions that appeared in the variant-calling process (these are not part of the output SNP dataset; SNPs are the result of TYP == SUB).

1. Extraction of reference sequence from .vcf

The script was written and designed to be run in the PyCharm IDE Community Edition 2019.3.3 with the Python 3.8 interpreter on Windows 10. Execution may take a couple of minutes because .vcf files can be large. The script contains instructions for specifying the input .vcf file and designating an output file name. It also prints (to the Run window) the number of characters output, the number of characters omitted, and the total number of characters.
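The extraction step can be illustrated with a minimal sketch. This is not the code in vcf_ref_extraction.py: the function name `extract_ref_bases` is hypothetical, and the sketch assumes a SNP is any record with a single-base REF and only single-base A/C/G/T ALT alleles, whereas the actual script may select sites differently (for example, on the TYP field). It relies only on the standard tab-separated VCF column order, in which REF is the fourth column and ALT the fifth.

```python
def extract_ref_bases(vcf_lines):
    """Return (ref_bases, kept, omitted) for an iterable of VCF text lines.

    Assumption: a record counts as a SNP when REF is one base and every
    ALT allele is a single A/C/G/T base. Other records are counted as omitted.
    """
    bases = []
    omitted = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        is_snp = len(ref) == 1 and all(a in {"A", "C", "G", "T"} for a in alts)
        if is_snp:
            bases.append(ref)  # keep file order so sites line up with the PHYLIP data
        else:
            omitted += 1
    return "".join(bases), len(bases), omitted


# Small in-memory example; a real run would pass an open file handle instead.
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.",
    "chr1\t20\t.\tAT\tA\t50\tPASS\t.",   # deletion, omitted
    "chr1\t30\t.\tC\tT,G\t50\tPASS\t.",  # multi-allelic SNP, kept
]
ref_bases, kept, omitted = extract_ref_bases(demo_lines)
print(ref_bases, kept, omitted, kept + omitted)  # prints: AC 2 1 3
```

Reading the file line by line, rather than loading it whole, keeps memory use low for large .vcf files.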

2. What to do with reference bases after extraction

The reference bases in the output file can then be added to the SNP data PHYLIP file containing the sample data. The order of sites is retained (the SNP bases are concatenated in .vcf order). Adding the reference to the SNP data file can be done with a program such as Mesquite v3.61: open the SNP PHYLIP/FASTA file in Mesquite and add the reference sequence from a .txt or FASTA file.
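If the reference is to be imported as FASTA rather than plain text, the concatenated bases only need a header line. The following is a small sketch of that formatting step; the helper `to_fasta`, the record name "reference", and the line width are illustrative assumptions and are not part of the original script.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record, wrapping lines at `width` bases."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(wrapped) + "\n"


# Hypothetical record name and a short made-up sequence for illustration.
record = to_fasta("reference", "ACGTAC", width=4)
print(record, end="")
```

The resulting text can be saved to a file and opened in Mesquite alongside the sample SNP data.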
