wangdi2014

followers: 5.0 following: 2.0 repos: 1.5K gists: 2.0

Type: User

Location: Potomac, Maryland

wangdi2014's Projects

covid-19-data

An ongoing repository of data on coronavirus cases and deaths in the U.S.

This project analyzes the complete genome of SARS-CoV-2, the virus that causes COVID-19, taken from the Wuhan-Hu-1 isolate sample. I cleaned the genome sample to obtain an RNA sequence and verified the number of base pairs in the virus. Using the concept of Kolmogorov complexity, I estimated how small a compressed version of the genome can be: the "LZMA" algorithm compressed it into an 8.412 kb file, which gives an upper bound on the genome's Kolmogorov complexity. Then I converted the RNA sequence into a DNA string in order to work with codons. This let me map codons to the 20 standard amino acids and translate the genome into a protein sequence. Next, I wrote a decoder that converts the genome into a reading-frame sequence. From this reading-frame sequence, I extracted the polypeptides and long-chain polypeptides in the virus. Then I analyzed the open reading frames (ORFs) of SARS-CoV-2, which encode 10 different proteins responsible for the synthesis and catalytic processes of the virus in the human body. Finally, I verified the lengths of the 10 proteins (including ORF1a, ORF1b, Spike Glycoprotein, Membrane, ORF6, ORF7a, ORF8, and ORF10), so this project reproduces these scientific findings using data science concepts.
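The steps described above (compress the sequence, convert RNA to DNA, translate codons, split at stop codons) can be sketched in a few lines of Python. This is only an illustration, not the repository's actual code: the function names `compressed_size`, `rna_to_dna`, `translate`, and `polypeptides` are hypothetical, the toy sequence stands in for the real Wuhan-Hu-1 genome, and the 8.412 kb figure quoted above is not reproduced here. The only assumptions about biology are the standard genetic code and the convention that a polypeptide starts at a methionine (M) and ends at a stop codon.

```python
import lzma

# Standard genetic code in TCAG order (NCBI translation table 1); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def compressed_size(seq: str) -> int:
    """Size in bytes of the LZMA-compressed sequence.

    Any compressor's output size is an upper bound (up to a constant)
    on the Kolmogorov complexity of its input.
    """
    return len(lzma.compress(seq.encode("ascii")))


def rna_to_dna(rna: str) -> str:
    """Rewrite an RNA string in the DNA alphabet (U becomes T)."""
    return rna.upper().replace("U", "T")


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1, or 2) into amino-acid letters.

    Codons containing unknown symbols are translated as 'X'.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def polypeptides(protein: str, min_len: int = 1) -> list[str]:
    """Return M-initiated chains that are terminated by a stop codon."""
    chains = []
    # The last piece after split() has no closing stop codon, so it is dropped.
    for segment in protein.split("*")[:-1]:
        start = segment.find("M")
        if start != -1 and len(segment) - start >= min_len:
            chains.append(segment[start:])
    return chains


if __name__ == "__main__":
    toy_rna = "AUGGCCUAAAUGUUUUGA"  # toy example, not real viral sequence
    dna = rna_to_dna(toy_rna)
    protein = translate(dna)
    print(protein)                # MA*MF*
    print(polypeptides(protein))  # ['MA', 'MF']
    print(compressed_size("ACGT" * 1000), "bytes after LZMA")
```

Raising `min_len` in `polypeptides` is one simple way to keep only long chains, which is roughly what an ORF search does; running `translate` with `frame` set to 0, 1, and 2 covers the three forward reading frames.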

covid19

ACE2 expression and cigarette exposure

covid19-detection-using-chest-x-ray

COVID-19 detection in chest X-ray images using a convolutional neural network.

cpgstats

cpgStats is a C application that parses a CpG dinucleotides file and produces summary statistics and BED file annotation.

cpi_prediction

Code for compound-protein interaction (CPI) prediction based on a graph neural network (GNN) for compounds and a convolutional neural network (CNN) for proteins.

cpp-high-performance

C++ High Performance, published by Packt

cptac3-rna-related-pipeline

Gene/transcript expression; Fusion; de novo assembly

cptac3_splicing

CPTAC3 RNA-seq splicing pipeline

cptac_methylation

Methylation array analysis pipeline for CPTAC

crc_meta

Code and analysis results for the CRC shotgun meta-analysis

create-pptc-pdx-oncoprints

As part of an overall strategy for improving therapies for childhood cancers, the PPTC seeks to develop models for the types of tumors that will be encountered in early-phase clinical testing by establishing patient-derived xenografts (PDXs) from high-risk childhood cancers refractory to current standard-of-care treatments. Genomic profiling of these models is required to enable PPTC investigators to develop robust "responder hypotheses" when drug activity is observed. With funding provided by Alex's Lemonade Stand Foundation, we genomically characterize a major subset of 286 PDX models. We use whole exome sequencing, transcriptome sequencing, and SNP arrays to characterize the tumor models. The focus on DNA and RNA sequencing data mirrors the current standard practice in most clinical diagnostics labs that use these technologies to detect the spectrum of targetable mutations, gene amplifications, and gene fusion events relevant to preclinical drug development.