mirugwe1 Goto Github PK

followers: 2 following: 3 repos: 21 gists: 0

Name: Mirugwe_Alex

Type: User

Company: Makerere University School of Public Health

Bio: A Data Scientist with research interests in Machine Learning, Deep Learning, Computer Vision, and NLP.

Twitter: mirugwealex1

Location: Kampala, Uganda

Blog: mirugwe.com

👋 Hello there! Welcome to my GitHub profile. I'm thrilled to have you here! This is the place where I showcase my projects, share my code, and collaborate with other amazing developers and enthusiasts like yourself. Feel free to explore, learn, and contribute. Let's dive in!

About Me

I'm a passionate data scientist with a strong background in developing and deploying AI-powered solutions. I have a deep love for data and enjoy solving complex problems using cutting-edge technologies. My journey in the field of data science has been an exciting and continuous learning experience.

What You'll Find Here

In this repository, you'll find a diverse collection of projects, experiments, and research work that reflect my interests and expertise. I believe in the power of open source and strive to make my code accessible, well-documented, and easy to understand. I hope you'll find value in exploring these projects and perhaps even find inspiration for your own endeavors.

🌱 I’m currently learning PyTorch and generative AI modeling
📝 I regularly write articles on the application of machine learning and deep learning in healthcare
💬 Ask me about Python, R, TensorFlow, Machine Learning, Deep Learning
📫 How to reach me [email protected]

Connect with me:

I love connecting with fellow data scientists, researchers, and enthusiasts. If you have any questions, ideas, or just want to say hello, feel free to reach out to me. You can find me on various platforms:

Languages and Tools:

Mirugwe_Alex's Projects

accurate-occupancy-detection-of-an-office-room-from-light-temperature-humidity-and-co2-measurement

This project develops, validates, and tests several statistical classification models that predict whether an office room is occupied from five features: temperature (°C), light (lx), humidity (%), CO2 (ppm), and humidity ratio. The data is modeled with logistic regression, a classification tree, bagging, random forest, and gradient boosted trees. The models were trained and then evaluated against validation and test sets, using confusion matrices to obtain classification and misclassification rates. The logistic model was trained with the glmnet R package, the classification tree with the tree package, the bagging and random forest models with randomForest, and the gradient boosted model with gbm. The random forest model achieved the best accuracy, with a classification rate of 93.21% on the test set. Light was the most significant variable for predicting occupancy in all five models.
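The classification and misclassification rates mentioned above follow directly from a confusion matrix. The project itself uses R; the following is only a minimal Python sketch of that arithmetic, and the labels in it are hypothetical, not data from the project.

```python
def confusion_matrix(actual, predicted):
    """Count the four confusion-matrix cells for binary labels (1 = occupied)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def classification_rate(cm):
    """Share of observations on the diagonal (correctly classified)."""
    total = cm["tp"] + cm["tn"] + cm["fp"] + cm["fn"]
    return (cm["tp"] + cm["tn"]) / total


# Hypothetical occupancy labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
rate = classification_rate(cm)
misclassification_rate = 1 - rate
print(cm, rate, misclassification_rate)
```

The misclassification rate is simply one minus the classification rate, so a single confusion matrix yields both figures.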

bentoml_mlops

HIV Viral Load Predictive Model with BentoML for MLOps

bird_detection

This repository hosts all the scripts used to implement the bird detection models. We use the CNN-based Faster R-CNN, Single Shot Detector (SSD), and YOLOv3 meta-architectures with ResNet-101, MobileNet, Inception ResNet v2, and VGG-16 feature extraction (backbone) networks.

cervical_cancer_screening

This repository contains the implementation of deep learning Convolutional Neural Network (CNN) algorithms for cervical cancer screening. The algorithms aim to assist in the early detection and classification of cervical cancer from digital cervical images.

covid19

covid19_data_analysis

Animated graphs and maps of COVID-19 data.

custering-analysis-in-r

This assignment investigates whether there are regional patterns in the spread of the COVID-19 virus by applying cluster analysis to country-level COVID-19 data collected from the Our World in Data website. The dataset has 30 variables related to COVID-19 cases for 208 countries and covers the period from the start of the pandemic to 2 September 2020. The analysis was done in R using hierarchical clustering, K-means, and Partitioning Around Medoids (PAM). A model of six clusters was built, and silhouette plots were used to assess the quality of the clustering. The hierarchical clustering model produced the highest average silhouette width, 0.85. Because countries on the same continent have been affected differently by the virus, the clustering models did not group countries regionally. Instead, countries were clustered according to how they have been hit by the coronavirus pandemic.
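The average silhouette width used above to compare clusterings can be computed by hand. The sketch below is a plain-Python illustration that assumes Euclidean distance and at least two clusters; the points and labels are hypothetical and are not the COVID-19 data.

```python
import math


def average_silhouette_width(points, labels):
    """Mean silhouette score over all points (requires at least two clusters)."""
    scores = []
    for i, point in enumerate(points):
        # Collect distances from this point to every other point, per cluster.
        dists = {}
        for j, other in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(point, other))
        own = dists.get(labels[i])
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(d) / len(d) for lab, d in dists.items() if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical two-dimensional points in two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
width = average_silhouette_width(points, labels)
print(round(width, 3))
```

Values close to 1 indicate compact, well-separated clusters, which is why a higher average width is read as a better clustering.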

data_protection

This repository contains code for the AES and RSA data encryption algorithms.

deep-learning-using-r-keras

This assignment uses the TensorFlow-based Keras R package to build multiple models on regression and classification data.

dimensional-reduction-pca-isomap-multi-dimensional-scaling-and-knn-modelling.

The goal of this project is to apply different dimensionality reduction methods, i.e. Principal Component Analysis (PCA), metric Multidimensional Scaling (MDS), and IsoMap, to two datasets in order to determine which methods and parameters work best on different types of data. The first is the MNIST handwritten digits dataset, consisting of greyscale images of the digits 5 and 8, each represented by a one-dimensional vector of 785 columns. The second is the Wisconsin Diagnostic Breast Cancer dataset (WDBC; source: UCI Machine Learning), consisting of 569 data points classified as either malignant or benign. We used the KNN algorithm to evaluate the performance of these methods: KNN models were built on both the original and the dimensionally reduced data to classify digits in the MNIST data or the patient's cancer status in the WDBC data, and the difference in the results was used to evaluate the impact of dimensionality reduction on accuracy. For the MNIST data, IsoMap slightly improved the classification rate, by only **0.4** percentage points, from **98.5%** to **98.9%**. PCA and metric MDS did not improve performance: the classification rate fell from **98.5%** to **96.75%** for both methods. For the breast cancer data, performance improved only with the PCA-reduced data, on which the model classified the patients' breast cancer status with **100%** accuracy. The other reduction methods left the classification accuracy unchanged at **92.04%**, the value obtained with the original data.
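The evaluation idea, comparing a KNN classification rate on the original features with the rate on reduced features, can be sketched as follows. This is only an illustration in Python: the data is hypothetical, and dropping a coordinate is a stand-in for a real reduction method such as PCA, MDS, or IsoMap.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classification_rate(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_y)


# Hypothetical two-feature data with labels 5 and 8 (not the MNIST data).
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [5, 5, 5, 8, 8, 8]
test_x = [(1.1, 1.0), (5.1, 5.0)]
test_y = [5, 8]

full_rate = classification_rate(train_x, train_y, test_x, test_y)

# Stand-in "reduction": keep only the first coordinate of every point.
reduced_train = [(x[0],) for x in train_x]
reduced_test = [(x[0],) for x in test_x]
reduced_rate = classification_rate(reduced_train, train_y, reduced_test, test_y)

print(full_rate, reduced_rate)
```

The difference between the two rates is the quantity the project uses to judge whether a reduction method helped or hurt.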

kaggle-s--world-happiness-report-analysis

The datasets I have chosen are the 2015-2019 World Happiness Report datasets from Kaggle. They give the happiness rank and happiness score of 155 countries based on seven factors: family, life expectancy, economy, generosity, trust in government, freedom, and dystopia residual. The sum of these seven factors gives the happiness score, and the higher the happiness score, the lower the happiness rank. A higher value of each factor therefore means a higher level of happiness, and each factor can be read as the extent to which it contributes to happiness. Dystopia is the opposite of utopia and has the lowest happiness level. It serves as a reference point showing how far each country is from being the least happy country.

llm_chatbot

Health-based chatbot powered by an LLM

machine-learning-webapp

AutoML Web App for predicting tips using Python, Pandas Profiling, Streamlit, and PyCaret

mirugwe

mirugwe1

recommendation-systems

The goal of this project was to build recommender systems that predict the rating a user will give to a book and recommend books that users might enjoy, based on their past book evaluations. Three collaborative filtering approaches were used: item-based collaborative filtering, user-based collaborative filtering, and matrix factorization. The accuracy of the matrix factorization recommender system was assessed using cross-validation. The item-based and user-based systems recommend books based on the cosine similarity between books or users. In User-Based Collaborative Filtering (UBCF), books are recommended on the assumption that users with similar preferences will rate books similarly. In Item-Based Collaborative Filtering (IBCF), the assumption is that users will prefer books that are similar to other books they like. Information about users and books was stored in a matrix that was modeled and used to make predictions (the recommendations). The matrix factorization recommender system was assessed to determine the influence of adding L2 regularization and bias terms. L2 regularization did not improve the performance of the model, while adding the bias greatly improved it, giving the lowest RMSE of **0.033**. Finally, a model that ensembles the predictions from UBCF, IBCF, and matrix factorization was created and evaluated using the RMSE.
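Two quantities carry most of the description above: the cosine similarity between rating vectors and the RMSE of predicted ratings. A minimal Python sketch of both follows; the user names and ratings are hypothetical, and treating unrated books as 0 is an assumption of this sketch rather than something stated for the project.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ratings of four books by three users (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 2, 0, 1]
carol = [1, 0, 5, 4]

sim_alice_bob = cosine_similarity(alice, bob)
sim_alice_carol = cosine_similarity(alice, carol)
error = rmse([3, 5], [4, 4])
print(sim_alice_bob, sim_alice_carol, error)
```

In a user-based system, the higher similarity between the first two users would make one user's ratings the stronger basis for recommendations to the other.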

restaurant-tipping-linear-regression-model

The goal of this project is to build a linear model that predicts the average tip, in dollars, a waiter can expect to earn given the predictor variables: total bill paid, day, gender of the customer (sex), time of the party, smoker, and size of the party. This was achieved using linear regression. The dataset of 200 observations and 7 variables was split into training and testing sets in an 8:2 ratio. The model was fitted on the training set using R's lm() function and evaluated on the testing set using the predict() function. The model's fit was analyzed in depth to understand how well it describes the data. Lasso regularization was used to improve the model and to identify the most important predictors of the tip amount. An interaction between size and smoker was also included in the final model, which greatly improved its fit.
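The project fits its model with R's lm() on several predictors. As a simplified illustration only, the Python sketch below performs an 8:2 split and an ordinary least squares fit with a single predictor (total bill). The data, seed, and function names are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_simple_linear(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical (total bill, tip) pairs lying exactly on tip = 0.5 + 0.15 * bill.
rows = [(float(bill), 0.5 + 0.15 * bill) for bill in range(10, 30)]

train, test = train_test_split(rows)
intercept, slope = fit_simple_linear(train)

# Evaluate on the held-out rows with the mean absolute error.
test_error = sum(abs(y - (intercept + slope * x)) for x, y in test) / len(test)
print(intercept, slope, test_error)
```

Because the hypothetical data is noise-free, the fitted coefficients recover the generating line and the test error is essentially zero. Real tipping data would not behave this way.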
