
CS2952G-Team-4 (Blue Genes)

The Hi-C Super-resolution problem: a survey and analysis of deep learning methods for enhancing experimental Hi-C data

Jiaqi Zhang, Tyler Defroscia and Michael Nisenzon


Abstract

In recent years, Hi-C experiments have been widely used to analyze chromatin interactions. However, many applications of Hi-C data are limited by the low resolution of the available data, which hurts downstream analysis. To address this problem, deep learning models such as convolutional neural networks (CNNs) and generative adversarial networks (GANs) have been used to enhance data resolution, owing to their effectiveness in various image processing tasks. The predictions of these models do increase the data resolution, but the training cost increases significantly as well. Previous papers pay little attention to comparing the computational resources required by different models. Moreover, these models treat Hi-C data as images and use correlation and image-based metrics to evaluate the similarity between their predictions and the high-resolution counterparts. Therefore, the feasibility of these models in real applications is doubtful. In this paper, we conduct comprehensive experiments to compare most of the existing models for enhancing Hi-C resolution and use four biologically plausible similarity scores to evaluate their predictions. Based on the experimental results, we give guidance on how to choose among the various methods to best fit the application requirements and the available computational resources. (If possible, we may also build a new model.)

Project Goal

[a] Use four Hi-C-based measurements: GenomeDISCO, Hi-C Spector, HiCRep, and QuASAR-Rep. These metrics were proposed three years ago, so it would be better to also use more recently proposed measurements if any are available.

[b] Compare various deep learning models using the above-mentioned metrics. Compare their performance on various downstream analyses. Compare their required computational resources.

[c] Give guidance on how to choose a model for a Hi-C analysis application.
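As a rough illustration of goals [a] and [b], the sketch below compares two contact matrices distance by distance and records the time and memory used by a single call. It is a simplified stand-in and not an implementation of GenomeDISCO, Hi-C Spector, HiCRep, or QuASAR-Rep. The function names `diagonal_correlations` and `profile_call` and the toy matrices are hypothetical and are not part of this project.

```python
import time
import tracemalloc

import numpy as np


def diagonal_correlations(a, b, max_offset):
    """Pearson correlation between two contact matrices on each diagonal.

    Diagonal k holds the contacts between bins that are k bins apart, so
    each score compares the matrices at one genomic distance. This is a toy
    score, not one of the published reproducibility metrics.
    """
    scores = []
    for k in range(max_offset + 1):
        x = np.diagonal(a, offset=k)
        y = np.diagonal(b, offset=k)
        # Correlation is undefined for constant or single-element diagonals.
        if x.size < 2 or x.std() == 0 or y.std() == 0:
            scores.append(float("nan"))
        else:
            scores.append(float(np.corrcoef(x, y)[0, 1]))
    return scores


def profile_call(fn, *args):
    """Return (result, seconds, peak_bytes) for one call of fn.

    tracemalloc only sees host memory allocated through Python, so GPU
    memory used by a deep learning framework is not counted here.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy symmetric "high-resolution" matrix and a noisy "prediction" of it.
    high = rng.poisson(5.0, size=(50, 50)).astype(float)
    high = (high + high.T) / 2
    predicted = high + rng.normal(0.0, 0.1, size=high.shape)
    scores, seconds, peak = profile_call(diagonal_correlations, high, predicted, 10)
    print(scores, seconds, peak)
```

For a real comparison, `profile_call` would wrap each model's training or prediction step, and the published metrics would replace the toy score.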


Paper List

Literature review: (https://github.com/JQZ-Brown/CS2952G-Team-4/blob/master/Literature%20Review.pdf)

GAN and its variations

[1] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.

[2] Wang, Xintao, et al. "ESRGAN: Enhanced super-resolution generative adversarial networks." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

Paper [2] has a detailed literature review in its "Related Work" section. I recommend referring to it to learn more about super-resolution GANs.

Deep learning models used for enhancing the Hi-C resolution

[3] Li, Zhilan, and Zhiming Dai. "SRHiC: A Deep Learning Model to Enhance the Resolution of Hi-C Data." Frontiers in Genetics 11 (2020): 353.

[4] Liu, Tong, and Zheng Wang. "HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data." Bioinformatics 35.21 (2019): 4222-4228.

[5] Hong, Hao, et al. "DeepHiC: A generative adversarial network for enhancing Hi-C data resolution." PLoS computational biology 16.2 (2020): e1007287.

There are not many deep learning models for enhancing Hi-C resolution.

Non-deep-learning models

[6] Zhang, Shilu, et al. "In silico prediction of high-resolution Hi-C interaction matrices." Nature communications 10.1 (2019): 1-18.

Carron et al. "Boost-HiC: Computational enhancement of long-range contacts in chromosomal contact maps"

The method in [6] approaches the problem from another perspective, using traditional machine learning models such as a random forest. There may be other non-deep-learning models; you could search for them. I think matrix completion (imputation) is a good place to start.

Downstream analysis tasks over Hi-C data

[7] Zhang, Yan. "Investigate Genomic 3D Structure Using Deep Neural Network." (2017).

This is the dissertation of the author of HiCPlus. It introduces some downstream analyses in detail.

Other resources that might be related

[8] Li, Wenyuan, et al. "Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data." Bioinformatics 31.6 (2015): 960-962.

[9] Data Production and Processing Standard of the Hi-C Mapping Center (https://www.encodeproject.org/documents/75926e4b-77aa-4959-8ca7-87efcba39d79/@@download/attachment/comp_doc_7july2018_final.pdf)

[10] Djekidel, Mohamed Nadhir, Yang Chen, and Michael Q. Zhang. "FIND: difFerential chromatin INteractions Detection using a spatial Poisson process." Genome research 28.3 (2018): 412-422.


Paper Draft:

A LaTeX project has already been created on Overleaf. You can edit it through this link: (https://www.overleaf.com/5522244455tjwbxqgsqcsh). The project uses the Bioinformatics journal format (https://academic.oup.com/bioinformatics/pages/submission_online). If you've never used LaTeX before, just let me know. We can find a way to work together with doc files.


TODO:

[1] First draft (due Oct. 30)

[2] Review (due Nov. 2)

[3] Idea presentation (due Nov. 5)
