Giter Club home page Giter Club logo

rhynchosporium_tree's Introduction

Rhynchosporium Phylogeny

Phylogenetic tree reconstruction using available Rhynchosporium commune WGS data.


Data Preparation

Download WGS data and genome from NCBI

Search for Rhynchosporium DNA data on SRA and download metadata (SraRunTable.txt). Using ENA, generate a script to download all accessions from BioProjects PRJNA3276561 and PRJNA419548, and download to cluster:

bash scripts/ena-file-download-selected-files-20240216-1125.sh
bash scripts/ena-file-download-selected-files-20240216-1138.sh

Download R. commune UK7 data and genome, and uncompress:

conda activate ncbi_datasets
datasets download genome accession GCA_900074885.1 --filename GCA_900074885.1.zip

Quality Control

QC raw sequencing data:

for file in *gz; do 
    sbatch ../scripts/fastqc.sh $file
done

multiqc fastqc/

Trim adapters and low quality sequence, and QC trimmed reads:

mkdir trimmed && cd trimmed

# Trim
for file in ../reads/*_1.fastq.gz; do 
    file2=$(ls $file | sed s/"_1.fastq.gz"//g)
    sbatch ../scripts/trimmomatic_pe.sh $file "$file2"_2.fastq.gz
done

# QC trimmed reads
for file in *gz; do 
    sbatch ../scripts/fastqc.sh $file
done

multiqc fastqc/

Rename reads to "Run_geo_loc_name_country_Collection_Date_readDirection.fastq.gz":

python scripts/rename_reads.py SraRunTable.txt trimmed/ renamed_reads/

Phylogenetic Tree

Assemble phylogenetic tree based on SNPs.

Map reads to UK7 genome

Index UK7 genome:

bwa index R_commune_UK7.fasta

Map reads to genome:

mkdir alignment && cd alignment

for file in ../renamed_reads/*_F.fastq.gz; do 
    file2=$(ls $file | sed s/"_F.fastq.gz"//g)
    sbatch ../scripts/bwa-mem.sh ../UK7/R_commune_UK7.fasta $file "$file2"_R.fastq.gz
done

Sort and index alignments:

for file in *sam; do
    sbatch ../scripts/samtools_sort.sh $file
done

Variant Calling

Index genome for variant calling:

samtools faidx R_commune_UK7.fasta

Generate pileup and call variants using bcftools 'c' model:

mkdir variant_calling && cd variant_calling

# Generate file containing full paths of BAM files
for file in ../alignment/*bam; do 
    realpath $file >> rhynchosporium.list
done

sbatch ../scripts/bcftools_call.sh ../UK7/R_commune_UK7.fasta rhynchosporium.list

Variant Filtering

Check total number of variants:

bcftools view -H rhynchosporium.bcf | wc -l 
# 5911174

Remove sites with less than 10 supporting reads:

bcftools view rhynchosporium.bcf | vcfutils.pl varFilter -d10 - > rhynchosporium_filt.vcf

# Check number of remaining variants
bcftools view -H rhynchosporium_filt.vcf | wc -l 
# 4784455

Calculate average and maximum QUAL scores:

grep -v '^#' rhynchosporium_filt.vcf | awk 'BEGIN {max=0} {sum+=$6; if ($6>max) {max=$6}} END {print "Average qual: "sum/NR "\tMax qual: " max}'
# Average qual: 743.252   Max qual: 999

Check number of high quality (QUAL>800) SNPs and export to new file:

bcftools view rhynchosporium_filt.vcf -H -e 'QUAL<800' --types snps | wc -l 
# 3426167

bcftools view rhynchosporium_filt.vcf -e 'QUAL<800' --types snps > rhynchosporium_highqual.vcf

Reconstruct Phylogenetic Tree

Convert vcf format to phylip file2:

python ../scripts/vcf2phylip.py -i ./rhynchosporium_highqual.vcf

Remove invariant sites from the phylip file2:

python ../scripts/ascbias.py -p ./rhynchosporium_highqual.min4.phy -o rhynchosporium_highqual.phy

# Check total samples and SNP sites
head -n1 rhynchosporium_highqual.phy
# 146 3417572

Run RAxML-ng:

sbatch ../scripts/raxml.sh rhynchosporium_highqual.phy

Visualise Tree

Visualise tree using iTOL.

Phylogenetic Tree with Branch Distances

Phylogenetic Tree without Branch Distances


References

1. Mohd-Assaad N, McDonald BA, Croll D. Genome-Wide Detection of Genes Under Positive Selection in Worldwide Populations of the Barley Scald Pathogen. Genome Biol Evol. 2018 Apr 1;10(5):1315-1332. doi: 10.1093/gbe/evy087. PMID: 29722810; PMCID: PMC5972619

2. https://rdtarvin.github.io/IBS2019_Genomics-of-Biodiversity/main/2019/08/05/09-raxml-epi.html

rhynchosporium_tree's People

Contributors

rj-price avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.