Lesson on Cloud/HPC for genomics
Contributors:
- Chris Fields
- Bob Freeman
- Adina Howe
- Andréa Matsunaga
- Bertrand Néron
- Francesco Tabaro
- Walid Gharib
- Philip Lijnzaad
- Mateusz Kuzak
- Kalle Happonen
Course objectives:
Large datasets often create computational bottlenecks. In the life sciences, next-generation sequencing (NGS) datasets are typically large and require specialized processing software (e.g. FastQC, Bowtie, TopHat).
It is therefore useful to give end users access to powerful machines on which the software they need is already installed. HPC and cloud computing make this possible.
Many European countries have their own clouds for scientific use, for example:
- Dutch Cloud: SURFsara
- Finnish Cloud: CSC - IT Center for Science (https://www.csc.fi/en)
- French Cloud: IFB - French Institute of Bioinformatics (http://www.france-bioinformatique.fr/)
- Swiss Life Sciences Cluster: SIB - Swiss Institute of Bioinformatics (http://www.vital-it.ch/)
Unfortunately, users in each country need access to their own national cluster, and each cluster normally has its own interface and requires specific knowledge.
Fortunately, the course instructors will create accounts for you and give you remote access via SSH. You will learn how to manage your data in the cloud and how to estimate how much computing power your analysis needs.
Prerequisites:
- Learners need to be familiar with the basic Unix command line.
- Windows users should have Git Bash (a Unix command line interface for Windows) installed. See also the setup instructions on the workshop's home page.
- Users of Unix-based systems (e.g. Ubuntu, Mac OS X) only need to open their terminal application.
What you will learn from this course:
- Connect to a remote machine (in this case, your respective cloud)
- Copy data from your local machine to your virtual machine in the cloud
- Check disk space, quotas and other dependencies
- Install the relevant software in your virtual machine
- Run the software on the data you already copied from your local machine
- Fetch your results from the cloud when done, and visualize them on your local machine
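
The workflow listed above can be sketched as a short Python script that only builds and prints the `ssh` and `scp` command lines, without running them. The user name, host name, file names and directories are hypothetical placeholders, not values from this course; the real ones come from your instructors. The local disk-space check uses Python's `shutil.disk_usage`; on the virtual machine, `df -h` gives similar information, while quota commands differ between sites.

```python
import shlex
import shutil

# Hypothetical placeholders: replace with the account details
# provided by your course instructors.
USER = "learner01"
HOST = "cloud.example.org"
REMOTE = f"{USER}@{HOST}"


def ssh_command(remote_command=None):
    """Build the argument list for an SSH login or a single remote command."""
    cmd = ["ssh", REMOTE]
    if remote_command:
        cmd.append(remote_command)
    return cmd


def upload_command(local_path, remote_dir):
    """Build an scp call copying a local file or directory to the virtual machine."""
    return ["scp", "-r", local_path, f"{REMOTE}:{remote_dir}"]


def download_command(remote_path, local_dir):
    """Build an scp call copying results from the virtual machine back home."""
    return ["scp", "-r", f"{REMOTE}:{remote_path}", local_dir]


def free_gib(path="."):
    """Return the free space, in GiB, on the local filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1024**3


# File and directory names below are illustrative only.
steps = [
    ssh_command(),                           # connect to the remote machine
    upload_command("reads.fastq", "data/"),  # copy input data to the cloud
    ssh_command("df -h ."),                  # check disk space remotely
    download_command("results/", "."),       # fetch the results back
]

for step in steps:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(step))

print(f"Free local disk space: {free_gib():.1f} GiB")
```

To execute a step rather than print it, the same argument list could be passed to `subprocess.run(step, check=True)` once your SSH access is set up.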