wyeomyia-smithii-project's Introduction

WyGen Repository:

This repository contains scripts I wrote as a graduate research assistant on the Wyeomyia smithii genome project in Dr. Elizabeth Cooper's lab at UNC Charlotte. In this project I identified and aged transposable elements (TEs) within the W. smithii genome annotation and six other mosquito genome annotations, then visualized the results.

https://elizcooperlab.com

Part A:

  1. repeatModeler.slurm

Bash script that identifies the TE families in the Wyeomyia smithii genome assembly. It requires the Wyeomyia smithii genome annotation and the RepeatModeler module. I repeated this step with the genome annotations of the six other mosquito species.

  1. add-names.py

Python script that adds a species name identifier to each header in the FASTA files of TE families that RepeatModeler identified for each of the seven species. It requires the RepeatModeler output.
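As an illustration of the header-tagging step, here is a minimal sketch. It is not the code in add-names.py: the function name `add_species_tag`, the `Wsmithii` tag, and the example header layout (`>name#Class/Family`) are assumptions made for the example.

```python
def add_species_tag(fasta_lines, species):
    """Return the FASTA lines with a species tag prefixed to every header.

    Assumption: headers start with ">" and the tag is joined with "_".
    """
    tagged = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: insert the species tag right after ">"
            tagged.append(f">{species}_{line[1:]}")
        else:
            # Sequence line: keep unchanged
            tagged.append(line)
    return tagged


# Hypothetical input records, for illustration only
example = [
    ">rnd-1_family-12#LTR/Gypsy",
    "ACGTACGT",
    ">rnd-1_family-40#DNA/hAT",
    "TTGACA",
]
for line in add_species_tag(example, "Wsmithii"):
    print(line)
```

Tagging the headers up front keeps each family traceable to its species once the seven FASTA files are combined or compared.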

  1. get-families.py

Python script that extracts the TE families present in each genome. It requires the FASTA files with the added species identifiers. I repeated this step with the genome annotations of the six other mosquito species.

  1. get-family-counts.py

Python script that counts how many times each TE family occurs within each species. It requires the output from get-families.py. I repeated this step with the six other mosquito species.
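The counting step can be sketched as follows. This is an assumption-laden illustration, not the contents of get-family-counts.py: it assumes headers of the form `>Species_name#Class/Family`, and the function name `count_families` is hypothetical.

```python
from collections import Counter, defaultdict


def count_families(headers):
    """Count TE family labels per species from tagged FASTA headers.

    Assumption: the species tag is the text before the first "_" and the
    family label is the first token after "#".
    """
    counts = defaultdict(Counter)
    for header in headers:
        name, _, classification = header.lstrip(">").partition("#")
        species = name.split("_", 1)[0]
        # Headers without a "#" label are grouped under "Unknown"
        family = classification.split()[0] if classification.strip() else "Unknown"
        counts[species][family] += 1
    return counts


# Hypothetical headers, for illustration only
headers = [
    ">Wsmithii_rnd-1_family-12#LTR/Gypsy",
    ">Wsmithii_rnd-1_family-13#LTR/Gypsy",
    ">Aaegypti_rnd-1_family-2#DNA/hAT",
]
for species, families in count_families(headers).items():
    for family, n in families.items():
        print(species, family, n, sep="\t")
```

A per-species table of this shape is the kind of input a multi-value bar chart needs, with one row per species and one column per family.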

  1. longest_transcripts.sh

Bash script that selects the primary transcripts for later input into OrthoFinder. It requires a proteome file for each investigated species.

  1. OrthoScript.sh

Bash script that runs OrthoFinder on the primary transcripts found in the previous step. The output of interest is Species_Tree/SpeciesTree_rooted.txt.

  1. iTOL annotation editor

Load SpeciesTree_rooted.txt into iTOL, then use the annotation editor to produce a multi-value bar chart of the TE family counts.

Part B:

  1. ltrharvest.sh

Bash script that runs ltrharvest to locate unfragmented LTRs in the mosquito genome annotations. It requires the genome annotations in FASTA format.

  1. gff-format.py

Python script that reformats the ltrharvest output so that it works with getfasta. It requires the ltrharvest output.
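The exact changes gff-format.py makes are not described here. One common adjustment when preparing GFF3 features for sequence extraction is converting them to BED-style coordinates: GFF3 positions are 1-based and inclusive, while BED positions are 0-based and half-open. The sketch below shows only that conversion; the function name `gff_to_bed`, the feature type `long_terminal_repeat`, and the example row are assumptions.

```python
def gff_to_bed(gff_lines, feature_type="long_terminal_repeat"):
    """Convert GFF3 feature rows of one type into BED-style tuples.

    Returns (seqid, start0, end, name, score, strand), where start0 is the
    0-based start. The default feature type is an assumption.
    """
    bed = []
    for line in gff_lines:
        # Skip blank lines, directives ("##...") and comments ("#...")
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # 1-based inclusive start -> 0-based start; the end stays the same
        bed.append((seqid, start - 1, end, cols[8], ".", strand))
    return bed


# Hypothetical GFF3 content, for illustration only
gff = [
    "##gff-version 3",
    "seq0\tLTRharvest\tlong_terminal_repeat\t1001\t1300\t.\t+\t.\tParent=LTR_retrotransposon1",
]
for row in gff_to_bed(gff):
    print("\t".join(str(field) for field in row))
```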

  1. getfasta.sh

Bash script that extracts the sequences of the elements located by ltrharvest. It requires the FASTA genome annotations and the output from gff-format.py.

  1. ltr-seqs.py

Python script to extract the long terminal repeats. The getfasta.sh output is required for this script.

  1. ltr-files.py

Python script to produce one file for each set of LTR sequences. The ltr-seqs.py output is required for this script.

  1. clustal-loop.sh

Bash script that aligns each of the LTR files produced in the previous step. It requires the directory containing all of the previous outputs.

  1. TE_aging.R

R script to determine and visualize the age of the identified LTRs. The aligned outputs from the previous step are required for this script.
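The method used in TE_aging.R is not described here. A common way to date an LTR element relies on the fact that its two LTRs are identical at insertion and then diverge independently, so the age is roughly T = K / (2r), where K is the corrected divergence between the two aligned LTRs and r is the substitution rate per site per year. The Python sketch below applies the Jukes-Cantor correction; the function name `ltr_age` and the rate value are placeholders, not values taken from this project.

```python
import math


def ltr_age(ltr5, ltr3, rate):
    """Estimate insertion age in years from two aligned LTR sequences.

    Uses the Jukes-Cantor distance K = -3/4 * ln(1 - 4p/3) and T = K / (2 * rate).
    Alignment columns with gaps or ambiguous bases are ignored.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    if mismatches == 0:
        return 0.0
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    k = -0.75 * math.log(1 - 4 * p / 3)
    return k / (2 * rate)


# Placeholder substitution rate per site per year (an assumption for this example)
EXAMPLE_RATE = 1e-8
print(ltr_age("ACGTACGTAC", "ACGTACGTAT", EXAMPLE_RATE))
```

The estimate scales inversely with the chosen rate, so any ages computed this way are only as reliable as the rate assumed for the species.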

wyeomyia-smithii-project's People

Contributors

lydia-holley
