Prevalence analyses comments

Real Data Analysis With coiaf

This repository stores the real data analysis conducted to test the software package {coiaf}.

Data Source

We analysed samples from the MalariaGEN Plasmodium falciparum Community Project [1], which provides genomic data from over 7,000 P. falciparum samples collected between 2002 and 2015 in 28 malaria-endemic countries across Africa, Asia, South America, and Oceania. Detailed information about the data release, including brief descriptions of the contributing partner studies and study locations, is available in the supplementary material of MalariaGEN et al. [1].

Project Structure

.
├── analysis
│   ├── estimation-comparison.Rmd
│   └── pf6_analysis.Rmd
├── data-outputs
│   ├── core-genome.rds
│   ├── data_dims.rds
│   ├── rmcl_estimation.rds
│   └── seq-error
│       ├── seq_0.01.rds
│       ├── seq_0.05.rds
│       ├── seq_0.1.rds
│       ├── seq_0.15.rds
│       └── seq_0.2.rds
├── download
│   ├── 00_Pf6_vcf_filtering.Rmd
│   ├── 01_create_coiaf_inputs.Rmd
│   ├── 02_run_rmcl.Rmd
│   └── core-genome.tsv
├── figures
│   ├── cluster-locations.png
│   ├── coi-world.png
│   ├── comparison.png
│   ├── continuous-region.png
│   ├── deploid.png
│   ├── discrete-region.png
│   ├── fws.png
│   ├── grouped-prevalence.png
│   ├── log-prevalence.png
│   ├── silhoutte.png
│   └── varying-seq-error.png
├── metadata
│   ├── deploidibd_data.rdata
│   ├── pf6_meta.Rmd
│   └── pf6_meta.rds
├── raw-regions
└── scripts
    ├── coi-region.R
    ├── combine-raw-regions.R
    ├── data-dimensions.R
    ├── rmcl-estimation.R
    └── slurm-region.R

References

1. MalariaGEN, Ahouidi A, Ali M, Almagro-Garcia J, Amambua-Ngwa A, Amaratunga C, et al. An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome Open Research. 2021;6:42. doi:10.12688/wellcomeopenres.16168.1

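	library(ggplot2)
	# Estimated COI against prevalence on a log10 x-axis, one panel per method.
	# `prev_fws` must provide the columns `prev`, `COI`, and `Method`.
	ggplot(data = prev_fws, aes(x = prev, y = COI, color = Method)) +
	  # Jitter vertically so that overlapping COI values stay visible
	  geom_point(alpha = 0.3, position = position_jitter(height = 0.2)) +
	  scale_x_log10(
	    breaks = c(0.01, 0.1, 1),
	    labels = c(0.01, 0.1, 1),
	    limits = c(0.01, 1)
	  ) +
	  facet_grid(~Method) +
	  labs(x = "Log10 Prevalence", y = "Estimated COI") +
	  scale_color_discrete(
	    name = "Estimation Method",
	    labels = c("Discrete Variant Method", "THE REAL McCOIL")
	  ) +
	  theme_coiaf()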

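	# Run loci_sample_subset() in parallel; mclapply() comes from the parallel package.
	# `loci_sample_subset()`, `region_pos`, and `obj_list` are not defined in this snippet.
	out <- parallel::mclapply(
	  X = unlist(region_pos[2:24]),
	  FUN = function(i) {
	    loci_sample_subset(
	      loci = obj_list[[i]]$loci,
	      vcf = obj_list[[i]]$vcf,
	      chrom = obj_list[[i]]$chrom,
	      vcfout = obj_list[[i]]$vcfout,
	      samples = obj_list[[i]]$samples
	    )
	  },
	  mc.cores = 14
	)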

	# `l` is a character vector of VCF file paths (not defined in this snippet).
	for (j in 4:24) {
	  # VCF files belonging to region j
	  lj <- grep(paste0("\\d\\d_v3.*reg_", j, "\\.vcf"), l, value = TRUE)
	  # Output path: the first input path without "_01_v3", with doubled slashes collapsed
	  O <- gsub("//", "/", gsub("_01_v3", "", lj[1]))

	  # concatenate our vcfs with bcftools
	  system(paste("bcftools concat -O z -o", O, paste(lj, collapse = " ")))

	  # read that in and save it out
	  vcf_rds <- file.path(here::here(), paste0("analysis/data/derived/snp_selections/vcf_final_reg_", j, ".rds"))
	  newvcfR <- vcfR::read.vcfR(O)
	  saveRDS(newvcfR, vcf_rds)
	}

	```{r exclude patients}
	# Samples present in complete_predictions but absent from rmcl_coi_out
	exclude <- setdiff(unique(complete_predictions$name), unique(rmcl_coi_out$name))
	compare_data <- complete_predictions %>%
	  dplyr::filter(!name %in% exclude) %>%
	  # Drop samples whose rmcl_med value equals 25
	  dplyr::filter(rmcl_med != 25) %>%
	  dplyr::relocate(Region, .after = dplyr::last_col())
	```

	```{r load meta}
	meta <- readRDS(here::here("metadata", "pf6_meta.rds"))

	patient_lat_long <- dplyr::left_join(
	  compare_data, meta,
	  by = c("name" = "Sample")
	)
	```

bailey-lab / coiaf-real-data
