Sanyam Gujral's Projects
A basic game similar to Galaga, written in Python using the Pygame library.
War Games was a cartoon series proposed around 2007 and developed by Bluefields Creative. The notion was to generate a series of short episodes, in the vein of the series Star Wars: Clone Wars. Newbie is a rookie Colonial Marine in a future where the Alien threat is widespread and feared by all. Marine training now involves HOT "Bug Hunt" scenarios, HOT meaning this is not a simulation: rookies are dropped into a semi-controlled environment with live ammunition and REAL aliens (though tethered with remote stun collars). Each episode of the animated series would start with a mini-episode of Newbie and his training exercises, which get more elaborate each week. The animated piece at bluefieldscreative.com represents the first training exercise, pitting Newbie, with very limited ammo, in the middle of an abandoned terraforming station that is home to one single alien: a very dangerous game of one-on-one hide-and-seek. Of course, by the end of the first season, Newbie would be integrated into the main storyline and getting his first taste of real action in the war against the bugs, but until then, these mini-episodes follow his training.
This is an example of Elastic App Search with the App Search Python Client.
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
This repository contains all the files related to the final project for the Master in Data Science, VI Edition, held by Kschool Madrid. The objective of this project is the implementation of a book recommender system, using the data made available by Goodreads, a website where users can register and rate the books they have read, sharing their ratings and opinions with other readers.

The approach chosen is to generate a system that recommends books using the information inherent in users' ratings. So, rather than predicting the ratings that each user would give to all the books included in this analysis that they haven't read yet (and hence trying to reduce the error in those predictions), I have chosen instead to generate a system that gives relevant recommendations to each user based on a certain measure of similarity between the books they have already read and rated and the books they haven't read but other users have.

In order to successfully run the code, please download all the .csv files included in this repository, place them in your own working directory, and then change the path at the beginning of each script (os.chdir("/Users/678094/Desktop/Goodreads")) so that it points to the working directory where the files have just been saved. I have replicated this part at the beginning of each notebook because it is not compulsory to run all three notebooks in order to build the recommender system: notebooks 01 and 02 are optional; they serve to scrape additional data used in the analysis from the Goodreads website. The same data is saved as .csv files and loaded again in notebook 03. So the logical sequence of the code consists in running notebooks 01, 02 and 03 respectively, but in case you want to skip the scraping part and go directly to the recommender system, you can run notebook 03 independently, changing the working directory accordingly as indicated above.
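The item-similarity approach described above can be sketched as follows. This is a minimal, hypothetical illustration on a tiny in-memory rating matrix, not the project's actual code (which works on the Goodreads CSVs): books are compared by the cosine similarity of their rating columns, and unread books are scored against the books the user has already rated.

```python
# Minimal sketch of item-based collaborative filtering on a toy
# user x book rating matrix (0 means "not rated").
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 0, 0],
    [4, 5, 0, 0, 1],
    [0, 0, 5, 4, 2],
    [0, 1, 4, 5, 0],
], dtype=float)

# similarity between books, computed from the rating columns
book_sim = cosine_similarity(ratings.T)

def recommend(user_idx, top_n=2):
    """Score unread books by their similarity to the books the user rated."""
    user_ratings = ratings[user_idx]
    scores = book_sim @ user_ratings        # weight similarities by ratings
    scores[user_ratings > 0] = -np.inf      # never recommend already-read books
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))  # books most similar to what user 0 liked
```

The real system differs in scale and in the exact similarity measure, but the core idea (rank a user's unread books by similarity to their rated ones, instead of predicting ratings) is the same.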
Manage your Ruby application's gem dependencies
Predicting the quality of red wine using machine learning algorithms for regression analysis, data visualization and data analysis.

Context: The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality ; I just shared it on Kaggle for convenience. (If I am mistaken and the public license type disallowed me from doing so, I will take this down if requested.)

Content: For more information, read [Cortez et al., 2009]. Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Tips: An interesting thing to do, aside from regression modelling, is to set an arbitrary cutoff for your dependent variable (wine quality), e.g. 7 or higher getting classified as 'good/1' and the remainder as 'not good/0'. This allows you to practice hyperparameter tuning on e.g. decision tree algorithms, looking at the ROC curve and the AUC value. Without doing any kind of feature engineering or overfitting, you should be able to get an AUC of .88 (without even using a random forest algorithm). KNIME is a great GUI tool that can be used for this.
1 - File Reader (for csv) to a Linear Correlation node and to an Interactive Histogram node for basic EDA.
2 - File Reader to a Rule Engine node to turn the 10-point scale into a dichotomous variable (good wine vs. the rest); the rule to put in the Rule Engine is something like:
$quality$ > 6.5 => "good"
TRUE => "bad"
3 - Rule Engine node output to the input of a Column Filter node to filter out your original 10-point feature (this prevents leaking).
4 - Column Filter node output to the input of a Partitioning node (your standard train/test split, e.g. 75%/25%; choose 'random' or 'stratified').
5 - Partitioning node train-data split output to the input of a Decision Tree Learner node.
6 - Partitioning node test-data split output to the input of a Decision Tree Predictor node.
7 - Decision Tree Learner node output to the model input of the Decision Tree Predictor node.
8 - Decision Tree Predictor output to the input of a ROC node (here you can evaluate your model based on the AUC value).

Inspiration: Use machine learning to determine which physicochemical properties make a wine 'good'!

Acknowledgements: This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality ; I just shared it on Kaggle for convenience. (If I am mistaken and the public license type disallowed me from doing so, I will take this down at first request.) I am not the owner of this dataset. Please include this citation if you plan to use this database: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
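The same workflow can be reproduced outside KNIME. Here is a minimal scikit-learn sketch of the binarize-then-classify recipe above, using a random stand-in for the 11 wine features (the real exercise loads winequality-red.csv), so the AUC printed here is illustrative, not the .88 mentioned above:

```python
# Sketch of the KNIME workflow in scikit-learn: binarize quality at 6.5,
# split, fit a decision tree, and evaluate with ROC AUC.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))                                # 11 physicochemical inputs
quality = (X[:, 10] * 2 + 6 + rng.normal(size=500)).round()   # fake 0-10 score

# Rule Engine step: quality > 6.5 => "good" (1), otherwise "bad" (0)
y = (quality > 6.5).astype(int)

# Partitioning step: 75%/25% stratified split
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Decision Tree Learner / Predictor steps
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# ROC step: evaluate with AUC
auc = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.2f}")
```

Swapping the synthetic `X`/`quality` for the real CSV columns is the only change needed to run this on the actual data.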
An end-to-end PyTorch framework for image and video classification
Open Source Code of Conduct at Twitter
My CodeChef solutions
Hobby
Complete-Life-Cycle-of-a-Data-Science-Project
Data Source: https://www.kaggle.com/dalpozz/creditcardfraud/data

It is a CSV file containing 31 features; the last feature is used to classify whether a transaction is a fraud or not.

Information about the data set: The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features are not provided and more background information about the data is also not available. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise. Given the class imbalance ratio, we recommend measuring accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion-matrix accuracy is not meaningful for unbalanced classification. The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

Flow of Project: We performed exploratory data analysis on the full data, then removed outliers using "LocalOutlierFactor", and finally trained a KNN classifier to predict whether a transaction is fraud or not.
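The LOF-then-KNN flow can be sketched in a few lines. This is a hypothetical illustration on a small synthetic stand-in for the transaction features (the real project loads creditcard.csv, and the contamination value here is an arbitrary choice for the demo):

```python
# Sketch of the project flow: remove outliers with LocalOutlierFactor,
# then train a KNN classifier on the cleaned data.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_genuine = rng.normal(0, 1, size=(980, 5))
X_fraud = rng.normal(3, 1, size=(20, 5))       # rare positive class
X = np.vstack([X_genuine, X_fraud])
y = np.array([0] * 980 + [1] * 20)

# Step 1: drop the points LOF flags as outliers (-1 marks an outlier)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
inlier_mask = lof.fit_predict(X) == 1
X_clean, y_clean = X[inlier_mask], y[inlier_mask]

# Step 2: train/test split and KNN classification
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clean, y_clean, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```

Note that on the real, highly imbalanced data, accuracy alone is misleading; as stated above, AUPRC is the recommended metric.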
We have also applied t-SNE to visualize the fraud and genuine transactions in 2-D.

How to Run the Project: In order to run the project, just download the data from the above-mentioned source, then run any file.

Prerequisites: You need to have the following software and libraries installed on your machine before running this project: Python 3 and Anaconda (it will install the IPython notebook and most of the needed libraries, such as sklearn, pandas, seaborn, matplotlib, numpy and scipy). Installing Python 3: https://www.python.org/downloads/ Anaconda: https://www.anaconda.com/download/
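The t-SNE step reduces the high-dimensional PCA features to two coordinates that can be scattered on a plot. A minimal sketch, again on random stand-in data rather than the real transactions:

```python
# Project high-dimensional transaction features into 2-D with t-SNE
# so fraud and genuine points can be plotted and compared visually.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # stand-in for the V1..V28 features

X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # one (x, y) pair per transaction
```

In the notebook, `X_2d[:, 0]` and `X_2d[:, 1]` would then be passed to a matplotlib scatter plot, colored by the 'Class' label.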
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Computer Vision module for detecting the emotion, age and gender of a person in any given image, video or real-time webcam feed. A custom VGG16 model was developed and trained on open-source facial datasets downloaded from Kaggle and IMDB. OpenCV, dlib and Keras were used to aid facial detection and video processing.
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform which further improve model performance and its generalization abilities.
Deploying a Sentiment Analysis Model on Amazon SageMaker: this project deploys a sentiment analysis model based on recurrent neural networks using the Amazon AWS SageMaker tool. The notebook and Python files provided here result in a simple web application which interacts with a deployed recurrent neural network performing sentiment analysis on movie reviews. In the final architecture, AWS API Gateway and AWS Lambda functions are used as well.
Detectron2 is FAIR's next-generation platform for object detection and segmentation.
Streamlit Web App to predict the onset of diabetes based on diagnostic measures
Predict whether a patient has diabetes. This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict, based on diagnostic measurements, whether a patient has diabetes. I conducted a brief exploratory data analysis, then ran three models: logistic regression, linear SVC and gradient boosting classification. Links [1] - https://www.kaggle.com/uciml/pima-indians-diabetes-database
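The three-model comparison above can be sketched as follows. This is a hypothetical example on a synthetic stand-in shaped like the Pima data (768 rows, 8 diagnostic features), not the project's actual notebook:

```python
# Fit and compare the three classifiers mentioned above on a
# synthetic binary-classification dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVC": LinearSVC(max_iter=5000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

On the real Kaggle CSV, `X` and `y` would come from the eight measurement columns and the `Outcome` column respectively.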
Handwritten Digit Recognition using OpenCV, sklearn and Python