Hitchhiker's Guide to PCA:
Demystifying dimensionality reduction in R/Bioconductor

Lauren Hsu, Aedin Culhane

Description

This workshop will provide a beginner's guide to principal component analysis (PCA). It will cover the relationship between PCA and singular value decomposition (SVD), the different forms of PCA, and fast PCA for single-cell data. We will describe how to detect artifacts and how to select an appropriate number of components. The focus will be on SVD and PCA applied to single-cell data.

Principal component analysis (PCA) is a key step in many bioinformatics pipelines. In this interactive session we will take a deep dive into the various implementations of singular value decomposition (SVD) and PCA, clarify how the two methods relate, and demonstrate where they are equivalent and where they differ. We will also discuss how to interpret their outputs, along with some common pitfalls and sources of confusion in using these methods.
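As a minimal sketch of the relationship between the two methods, the example below runs PCA with `prcomp` and then reproduces the same quantities from an SVD of the column-centered matrix. The `USArrests` data set is only an assumption made for illustration (it ships with R); it is not part of the workshop material.

```r
# Illustrative data (an assumption): 50 observations by 4 variables
X <- as.matrix(USArrests)

# PCA via prcomp: columns are centered, and left unscaled here
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# SVD of the same column-centered matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
s  <- svd(Xc)

# Singular values map to component standard deviations via d / sqrt(n - 1)
sdev_from_svd <- s$d / sqrt(nrow(X) - 1)

# Scores are U %*% diag(d); loadings are the right singular vectors V
scores_from_svd <- s$u %*% diag(s$d)

# Column signs are arbitrary, so compare absolute values
all.equal(pca$sdev, sdev_from_svd)
all.equal(abs(unname(pca$x)), abs(scores_from_svd))
all.equal(abs(unname(pca$rotation)), abs(s$v))
```

The comparisons use absolute values because each component is only defined up to a sign flip, which is one of the common sources of confusion when results from two implementations are placed side by side.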

Pre-requisites

A basic understanding of R syntax is helpful but not required. No prior knowledge of PCA is necessary.

Workshop Participation

We invite audience members to engage with questions and examples from their own workflows. R notebooks will also be available in advance so that participants can run the code interactively during the workshop.

R / Bioconductor packages used

  • stats (prcomp, princomp) and base (svd)
  • FactoMineR
  • ade4
  • irlba
  • ggplot2
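To illustrate how two of the listed functions can give slightly different numbers on the same data, here is a small sketch comparing `prcomp` and `princomp`. It assumes the built-in `USArrests` data set purely as an example; the variable names are hypothetical.

```r
X <- as.matrix(USArrests)   # example data set that ships with R (an assumption)
n <- nrow(X)

pc_prcomp   <- prcomp(X)    # SVD of the centered data, variance divisor n - 1
pc_princomp <- princomp(X)  # eigen decomposition of the covariance matrix, divisor n

# The standard deviations differ only by the factor sqrt((n - 1) / n)
sdev_rescaled <- unname(pc_princomp$sdev) * sqrt(n / (n - 1))
all.equal(pc_prcomp$sdev, sdev_rescaled)

# The proportion of variance explained does not depend on the divisor
var_explained <- pc_prcomp$sdev^2 / sum(pc_prcomp$sdev^2)
round(var_explained, 3)
```

The proportion of variance explained is also a common starting point for deciding how many components to keep, for example by inspecting where it levels off.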

Time outline

  • Set-up + package installation (5 min)
  • Introduction to matrix factorization and PCA [conceptual] (15 min)
  • Interactive demonstration of methods (25 min)
  • Potential pitfalls, interpreting outputs, and how to decide what’s right for your pipeline (15 min)

Workshop goals and objectives

Upon completion of this workshop, we expect participants to have gained an understanding of how to apply PCA and other SVD-based methods in research.

Learning goals

  1. Understand how PCA works, the variations of PCA, and how it relates to SVD
  2. Suggest appropriate use cases for these dimensionality reduction techniques
  3. Select appropriate methods for use in bioinformatics pipelines

Learning objectives

  1. Describe the similarities and differences between the different implementations of PCA and SVD in R/Bioconductor
  2. Perform PCA/SVD on real data
  3. Create plots to interpret PCA/SVD outputs, including diagnosing problems such as the arch/horseshoe effect
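The arch/horseshoe effect mentioned in the last objective can be reproduced on idealized data. The sketch below is not taken from the workshop notebooks: it simulates samples along a single hypothetical gradient, with each feature peaking at a different position, and all names and parameter values (`gradient`, `optima`, `width`) are assumptions chosen for illustration.

```r
# Simulated data: 100 samples placed evenly along one underlying gradient
gradient <- seq(0, 1, length.out = 100)

# 60 features, each peaking at a different point along (and slightly beyond) the gradient
optima <- seq(-0.5, 1.5, length.out = 60)
width  <- 0.15

# Rows are samples, columns are features; each feature has a bell-shaped response
X <- outer(gradient, optima, function(g, o) exp(-(g - o)^2 / (2 * width^2)))

pca <- prcomp(X, center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]
pc2 <- pca$x[, 2]

# PC1 follows the gradient; PC2 is close to a quadratic function of it (the arch)
cor_pc1_gradient <- abs(cor(pc1, gradient))
cor_pc2_arch     <- abs(cor(pc2, (gradient - 0.5)^2))

# Plot only in an interactive session
if (interactive()) {
  plot(pc1, pc2, xlab = "PC1", ylab = "PC2",
       main = "Arch effect on simulated gradient data")
}
```

In this setup the second component carries no new structure; it is a curved function of the same gradient that the first component already captures, which is why an arch in a PC1 versus PC2 plot is usually read as an artifact rather than a second biological signal.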
