A collection of samples from my independent and collaborative projects.
Main Tools
- Python (NumPy, Polars, pandas, scikit-learn, Seaborn, Matplotlib, Plotly, PyMongo, NLTK, Gensim, TextBlob)
- R (plumber, plotly, ggplot2, dplyr, tidyr, caret, gganimate, rpart, data.table, reticulate, e1071)
- SQL (e.g., BigQuery, SQLite)
Feel free to contact me via LinkedIn
Current Resume
- NYC Tree Census Project (R and Python)
- My main contributions:
- Data collection and exploratory data analyses
- Data cleaning and kNN data imputation
- K-means Clustering (Unsupervised) and elbow method evaluation
- Classification models (Supervised)
- Logistic Regression, Naive Bayes, Decision Tree, Neural Network
- Model interpretation using LIME
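The kNN imputation step above can be illustrated with a minimal, standard-library sketch. This is not the project's code. It assumes numeric rows with `None` marking a missing value and uses plain Euclidean distance over the features both rows observed. The function name `knn_impute` is hypothetical. scikit-learn's `KNNImputer` (with its `n_neighbors` parameter) is a library implementation of the same idea.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows with each None replaced by the mean of that
    column over the k nearest rows that observed it.

    Distance is plain Euclidean over the columns observed in both rows,
    a simplification of the NaN-aware distance used by library imputers.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                # Only rows that observed column j can donate a value.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                donors.append((dist, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            # Distances and donor values come from the original rows, so
            # earlier imputations never feed into later ones.
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],   # third feature missing
    [0.9, 1.9, 2.9],
    [10.0, 10.0, 10.0],
]
result = knn_impute(data, k=2)
print(round(result[1][2], 2))  # prints 2.95, the mean of 3.0 and 2.9
```

Averaging over a few neighbours, rather than copying the single closest row, makes the filled value less sensitive to one noisy record.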
- COVID-19 Research Articles - NLP Project (R and Python)
- My main contributions:
- Exploratory data analyses
- Data cleaning and text preprocessing
- Topic modeling (LDA) and sentiment analysis code
- Interactive Plotly visuals
- Resulting publication (co-first author):
- Dornick, C., Kumar, A., Seidenberger, S., Seidle, E., & Mukherjee, P. (2021). Analysis of Patterns and Trends in COVID-19 Research. Procedia Computer Science, 185, 302-310. https://doi.org/10.1016/j.procs.2021.05.032
- Network Data Visualization (Python and Gephi)
- Student conversion and retention: Interactive visuals and animated GIF (R)*
- NLP - Survey Data (Python)*
- Text Cleaning
- LDA (Unsupervised)
- Sentiment Analysis
- Word frequencies by n-gram (unigrams, bigrams, trigrams)
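The word-frequency step above can be sketched with the standard library alone. This is an illustrative sketch, not the project's pipeline. It assumes each survey response is a plain string and uses a simple regex tokenizer in place of the project's own text cleaning. The function name `ngram_counts` and the sample responses are hypothetical.

```python
import re
from collections import Counter


def ngram_counts(responses, n):
    """Count n-word sequences, keeping each response separate so that
    no n-gram spans two different answers."""
    counts = Counter()
    for response in responses:
        # Simple tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", response.lower())
        # Zipping n shifted views of the token list yields consecutive n-tuples.
        grams = zip(*(tokens[i:] for i in range(n)))
        counts.update(" ".join(gram) for gram in grams)
    return counts


responses = ["The survey was easy", "The survey was short"]
for n in (1, 2, 3):
    print(n, ngram_counts(responses, n).most_common(2))
```

Counting per response avoids spurious bigrams such as "easy the" that would appear if all answers were concatenated first.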
- Dashboard Usage Analysis (R)*
- Market Basket Analysis (Unsupervised)
- Visualizations (parallel coordinates plots, frequency plots, network graphics)
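Market basket analysis scores item pairs by support, confidence, and lift. The following is a minimal standard-library sketch restricted to one-item-to-one-item rules. It treats each dashboard session as a "basket" of pages viewed, which is an assumption about how the usage data could be framed. The page names, sessions, and function name `pair_rules` are hypothetical. In R, the arules package's `apriori()` reports the same measures for larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Score every one-item-to-one-item rule X -> Y by support, confidence, lift.

    support(X, Y) = share of baskets containing both items
    confidence    = support(X, Y) / support(X)
    lift          = confidence / support(Y)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe, and sort so each pair has one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    # Highest lift first: lift > 1 means X and Y co-occur more than chance.
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Hypothetical sessions: the set of dashboard pages each user opened.
sessions = [
    {"overview", "enrollment"},
    {"overview", "enrollment", "retention"},
    {"overview", "retention"},
    {"enrollment"},
]
for x, y, support, confidence, lift in pair_rules(sessions):
    print(f"{x} -> {y}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

Here "overview" and "retention" come out with lift 4/3, so users who open one of those pages are more likely than average to open the other.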
*Real data are intentionally excluded