Maize_GxE_Prediction

This is the source code for our paper:
Fernandes, I.K., Vieira, C.C., Dias, K.O.G. et al. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. Theor Appl Genet 137, 189 (2024). https://doi.org/10.1007/s00122-024-04687-w.

Citation:

@article{fernandes2024,
  author={Fernandes, Igor K. and Vieira, Caio C. and Dias, Kaio O. G. and Fernandes, Samuel B.},
  title={Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials},
  journal={Theoretical and Applied Genetics},
  year={2024},
  month={Jul},
  day={23},
  volume={137},
  number={8},
  pages={189},
  issn={1432-2242},
  doi={10.1007/s00122-024-04687-w},
  url={https://doi.org/10.1007/s00122-024-04687-w}
}

How to reproduce the results

Before reproducing the results, note the following:

  • You will need a lot of disk space to run all experiments
  • The scripts were run on an HPC cluster using SLURM, so you may need to rename the job partitions to match the HPC cluster you use (check the .sh files)
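To see which partitions the job scripts request before submitting anything, a quick search along these lines may help. It is only a sketch: it assumes the scripts declare the partition with an `#SBATCH --partition=` or `#SBATCH -p` directive, which this README does not confirm, and `list_partitions` is a hypothetical helper name.

```shell
# List the partition directives found in the SLURM job scripts of a directory.
# Assumption: partitions are set via "#SBATCH --partition=..." or "#SBATCH -p ...".
list_partitions() {
  dir="${1:-.}"
  grep -HnE '^#SBATCH[[:space:]]+(--partition|-p)([=[:space:]]|$)' "$dir"/*.sh
}

list_partitions . || echo "no partition directives found"
```

Each match is printed as `file:line:directive`, which makes it easy to edit the partition names by hand.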

Clone repository and download the data

After cloning the repository, download the data here, extract it, and put both Training_Data and Testing_Data folders inside the data folder. Unzip the VCF file Training_Data/5_Genotype_Data_All_Years.vcf.zip.

The folder structure should be as follows:

Maize_GxE_Prediction/
├── data/
│   ├── Training_Data/
│   └── Testing_Data/
├── src/
├── logs/
├── output/
│   ├── cv0/  
│   ├── cv1/
│   └── cv2/
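If some of these folders are missing after cloning, the layout above can be prepared and checked with a sketch like the following. `prepare_layout` is a hypothetical helper, not part of the repository; it assumes it is safe to create `logs/` and `output/cv0..cv2` when they do not exist, and it only reports (never creates) the two data folders.

```shell
# Create the log/output folders and report whether the data folders are in place.
# Pass the repository root as the first argument (defaults to the current directory).
prepare_layout() {
  root="${1:-.}"
  mkdir -p "$root/data" "$root/logs" \
           "$root/output/cv0" "$root/output/cv1" "$root/output/cv2"
  status=0
  for d in Training_Data Testing_Data; do
    if [ -d "$root/data/$d" ]; then
      echo "ok: data/$d"
    else
      echo "missing: data/$d"
      status=1
    fi
  done
  return "$status"
}
```

Running `prepare_layout .` from the repository root returns a non-zero status until both `Training_Data` and `Testing_Data` are inside `data/`.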

Setup conda and R packages

Install the conda environment:

conda env create -f environment.yml

Install R packages:

# from CRAN
install.packages("arrow")
install.packages("data.table")
install.packages("AGHmatrix")
install.packages("devtools")
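# asreml is used for BLUEs and FA; ASReml-R is commercial software, so if this
# call fails you may need to install it with a license from the vendor instead
install.packages("asreml")

# from GitHub source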
setRepositories(ind = 1:2)
devtools::install_github("samuelbfernandes/simplePHENOTYPES")

Preprocessing

  1. Create BLUEs:
JOB_BLUES=$(sbatch --parsable 1-job_blues.sh)
  2. Create datasets for cross-validation schemes:
JOB_DATASETS=$(sbatch --dependency=afterok:$JOB_BLUES --parsable 2-job_datasets.sh)
  3. Filter VCF and create kinship matrices (you will need vcftools and plink here):
JOB_GENOMICS=$(sbatch --dependency=afterok:$JOB_DATASETS --parsable 3-job_genomics.sh)
  4. Create Kronecker products between environmental and genomic relationship matrices (this will take a few hours):
JOB_KRON=$(sbatch --dependency=afterok:$JOB_GENOMICS --parsable 4-job_kroneckers.sh)
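Because each job only starts after the previous one succeeds, a missing tool in step 3 would stall the rest of the chain. A minimal pre-flight check, assuming the tools are expected on `PATH` in the submitting shell (on some clusters they are loaded via modules inside the job scripts instead), could look like this. `require_tools` is a hypothetical helper, and including `Rscript` in the list is an assumption.

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_tools sbatch vcftools plink Rscript || echo "install or load the missing tools first"
```

Once the chain is submitted, `squeue -u "$USER"` shows which jobs are still pending on their dependency.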

Models

  1. Fit E models:
for i in {1..10}; do sbatch --export=seed=${i} --job-name=Eseed${i} --output=logs/job_e_seed${i}.txt 5-job_e.sh; done
  2. Fit G and G+E models:
for i in {1..10}; do sbatch --export=seed=${i} --job-name=Gseed${i} --output=logs/job_g_seed${i}.txt 6-job_g.sh; done
  3. Fit GxE models (this will take several hours):
for i in {1..10}; do sbatch --export=seed=${i} --job-name=GxEs${i} --output=logs/job_gxe_seed${i}.txt --dependency=afterok:$JOB_KRON --parsable 7-job_gxe.sh; done
  4. Fit GBLUP FA(1) models (this will take several hours):
for i in {1..10}; do sbatch --export=seed=${i} --job-name=faS${i} --output=logs/job_fa_seed${i}.txt 8-job_fa.sh; done
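The loops above submit 40 jobs in total (4 model families, 10 seeds each), so it can be handy to see which ones have not produced a log yet. The sketch below relies only on the `--output` file names used in those loops; `missing_seed_logs` is a hypothetical helper. SLURM typically creates the output file when a job starts, so a listed file usually means the job is still queued or was never submitted.

```shell
# Print the expected per-seed log files that do not exist yet.
# Prefixes follow the --output patterns used in the submission loops.
missing_seed_logs() {
  log_dir="${1:-logs}"
  for prefix in job_e job_g job_gxe job_fa; do
    for i in 1 2 3 4 5 6 7 8 9 10; do
      f="$log_dir/${prefix}_seed${i}.txt"
      [ -f "$f" ] || echo "$f"
    done
  done
}

missing_seed_logs logs
```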

Some files in output will be big, particularly the Kronecker files, so you might want to delete them once you no longer need them.
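To find the candidates for cleanup, one option is to list files above a size threshold. This is a sketch that assumes a `find` supporting size suffixes such as `G` (GNU and BSD `find` do); `large_files` is a hypothetical helper and the 1 GiB default is an arbitrary choice, not a figure from this project.

```shell
# List files under a directory that exceed a size threshold (find -size syntax).
# Defaults: look in output/ for files larger than 1 GiB.
large_files() {
  dir="${1:-output}"
  min_size="${2:-+1G}"
  find "$dir" -type f -size "$min_size" -exec du -h {} +
}
```

For example, `large_files output +500M` lists every file in `output` larger than 500 MiB together with its disk usage.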


Results (optional)

We can check some results directly from the terminal. Here are some examples:

Check some GxE results:

find logs/ -name 'gxe_*' | xargs grep 'RMSE' | head

Store SVD explained variances:

find logs/ -name '*cv*' | xargs grep 'variance' > logs/svd_explained_variance.txt

Check accuracy of GBLUP FA(1) models in CV0:

grep -F '[1]' logs/fa_cv0*

Check which models are done for GxE in one of the repetitions:

cat logs/job_gxe_seed6.txt
