Developed a k-nearest neighbors (k-NN) model to predict soccer players' future contracts from variables including performance statistics, age, and market potential.
Used Python with pandas and NumPy to clean and merge the data, ensuring a reliable dataset for model training and validation.
Implemented the k-NN model in scikit-learn to forecast key contract terms such as transfer value and salary from player data, with the aim of producing accurate predictions relevant to real-world applications.
Developed and tuned a k-nearest neighbors (k-NN) regression model to predict power outage durations from influential factors such as cause, geography, and population metrics, using a dataset of detailed records on past outage events.
Performed data cleaning and exploratory data analysis to identify key predictors, handle outliers, and ensure data quality, including encoding categorical variables and selecting relevant quantitative features for training.
Evaluated the model with root mean squared error (RMSE) and R-squared, ran a fairness analysis comparing predictions across U.S. regions, and refined the model by adding features such as peak energy-consumption hours and population metrics to improve prediction accuracy.
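The k-NN regression and evaluation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the feature columns, the toy data, and k=2 are hypothetical, and features are z-scored because k-NN relies on raw distances. In practice scikit-learn offers the same pieces as `KNeighborsRegressor`, `mean_squared_error`, and `r2_score`; the stdlib version just makes the mechanics visible.

```python
import math
from statistics import mean, pstdev

def standardize(train_X, X):
    """Z-score each column of X using the training set's column means and standard deviations."""
    cols = list(zip(*train_X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def knn_predict(train_X, train_y, x, k=3):
    """Predict a target as the mean of the k nearest training targets (Euclidean distance)."""
    neighbors = sorted(
        (math.dist(row, x), target) for row, target in zip(train_X, train_y)
    )
    return mean(target for _, target in neighbors[:k])

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical toy data: [population in millions, severe-weather flag] -> outage hours.
train_X = [[1.0, 0], [1.2, 0], [5.0, 1], [5.5, 1], [9.0, 1], [8.5, 0]]
train_y = [2.0, 3.0, 30.0, 36.0, 60.0, 20.0]
test_X = [[1.1, 0], [5.2, 1]]
test_y = [3.0, 30.0]

# Scale test rows with the training statistics so no test information leaks in.
train_s = standardize(train_X, train_X)
test_s = standardize(train_X, test_X)
preds = [knn_predict(train_s, train_y, x, k=2) for x in test_s]
print(preds, rmse(test_y, preds), r_squared(test_y, preds))
```

The same RMSE and R-squared functions could be applied separately to each U.S. region's test rows to compare error across groups, which is one simple way to frame a fairness check.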
Conducted data analysis for a study of the impact of COVID-19 on air quality in San Diego County, using Python, pandas, and NumPy for data cleaning and merging, and Matplotlib and Seaborn for visualization.
Investigated correlations between pandemic trends and NO2 levels using linear regression models in Python, and interpreted the results to characterize the pandemic's environmental effects.
Collaborated on a multidisciplinary team, contributing to research, data interpretation, and ethical analysis, and applied communication and technical skills to support the project's validity.
Analyzed a dataset of major U.S. power outages from January 2000 to July 2016 to examine the relationship between weather conditions and outage duration, focusing on the effect of colder weather on outage length.
Cleaned the data and performed exploratory data analysis to isolate the relevant variables, applied hypothesis testing to the relationship between outage duration and cold weather, and communicated the findings clearly.
Used statistical methods, including permutation tests, to test the hypothesis that colder weather is associated with longer power outages, with the goal of informing strategies for improving infrastructure resilience and emergency response during adverse weather.
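A one-sided permutation test of the kind described above can be sketched as follows. This is a minimal illustration, not the project's analysis: the difference in mean duration as the test statistic, the durations, and the cold / not-cold split are all hypothetical assumptions. Under the null hypothesis the labels are exchangeable, so shuffling them many times builds the null distribution of the statistic.

```python
import random
from statistics import mean

def permutation_test(cold, not_cold, n_permutations=10_000, seed=0):
    """One-sided permutation test on the difference in mean outage duration.

    H0: cold and non-cold outages come from the same duration distribution.
    H1: cold-weather outages last longer on average.
    Returns (observed difference in means, p-value).
    """
    rng = random.Random(seed)
    observed = mean(cold) - mean(not_cold)
    pooled = list(cold) + list(not_cold)
    n_cold = len(cold)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the cold / not-cold labels
        diff = mean(pooled[:n_cold]) - mean(pooled[n_cold:])
        if diff >= observed:
            as_extreme += 1
    # Fraction of shuffles at least as extreme as the observed statistic.
    return observed, as_extreme / n_permutations

# Hypothetical outage durations in hours, split by a hypothetical "cold" threshold.
cold_hours = [48, 60, 72, 36, 90]
other_hours = [12, 24, 18, 30, 20]
observed, p_value = permutation_test(cold_hours, other_hours)
print(f"observed difference = {observed:.1f} h, p-value = {p_value:.4f}")
```

A small p-value only indicates that longer outages in cold weather are unlikely under random labeling; it shows association, not that cold weather causes the longer durations.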
Developed an interactive globe in JavaScript with D3 that lets users hover over a country to reveal its COVID-19 cases, GDP, and population.
Animated the globe to rotate at a constant rate and added controls that let users zoom in and out of countries and pan across the globe.
Analyzed large datasets on eBike accidents in the County of San Diego using Python, pandas, and NumPy to identify trends and perform statistical analysis, and visualized the findings in Tableau.
Reviewed the literature on eBike safety and synthesized key research findings to assess whether eBikes have a higher accident-to-user ratio than conventional bicycles.
Revised newsletter surveys to gather more useful feedback from nonprofit organizations and medical institutions, in order to assess the health benefits and associated costs of eBike use for riders.