  • 👋 Hi, I'm @vaitybharati
  • 👀 I'm interested in Data Science, Machine Learning and Artificial Intelligence
  • 🌱 I'm currently mastering Python, Tableau, R, MySQL, Azure, Apache Spark, Hadoop, SAS, Artificial Intelligence and Deep Learning
  • 💞️ I'm looking to collaborate on all topics related to Data Science, Machine Learning and Artificial Intelligence.
  • 📫 You can reach me by email at [email protected]

Vaitybharati's Projects

p19.-hypothesis-testing-2-proportion-t-test-students-jobs-in-2-states-

Hypothesis-Testing-2-Proportion-T-test-Students-Jobs-in-2-States. Assume the null hypothesis Ho: p1 - p2 = 0, i.e. p1 = p2. The alternate hypothesis is then Ha: p1 ≠ p2.

Explanation of a Bernoulli RV: np.random.binomial(n=1, p=p, size=size). Suppose you perform an experiment with two possible outcomes: either success or failure. Success happens with probability p, while failure happens with probability 1-p. A random variable that takes value 1 in case of success and 0 in case of failure is called a Bernoulli random variable. Here n = 1, because each student is checked once for success or failure (placement or no placement, i.e. 1 trial); p = probability of success; size = number of times the check is repeated (e.g. for 247 students, each checked once, size = 247).

Explanation of a Binomial RV: np.random.binomial(n, p, size), where n = number of trials when the variable is not a Bernoulli RV. For example, to count how many sixes you get when you roll a die 10 times, use n = 10 and p = 1/6; size is the number of repetitions of the experiment 'die rolled 10 times', so if it is repeated 18 times, size = 18.

As (p_value = 0.7255) > (α = 0.05), we fail to reject the null hypothesis, i.e. p1 = p2 cannot be ruled out. There is no significant difference between the population proportions of state1 and state2 who report that they have been placed immediately after education.
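A minimal sketch of the sampling calls and a pooled two-proportion z-test, assuming NumPy is available. The placement probabilities, the second sample size (308) and the helper name `two_proportion_ztest` are made up for illustration; they are not the project's data or code.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placement rates and second sample size (not the project's data).
n1, n2 = 247, 308
state1 = rng.binomial(n=1, p=0.37, size=n1)  # one Bernoulli trial per student
state2 = rng.binomial(n=1, p=0.39, size=n2)

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

z, p_value = two_proportion_ztest(int(state1.sum()), n1, int(state2.sum()), n2)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}: {decision}")

# Binomial with n > 1: number of sixes in 10 rolls, experiment repeated 18 times.
sixes = rng.binomial(n=10, p=1 / 6, size=18)
print(sixes)
```

A p-value above α only means the data do not contradict Ho; it does not prove that the two proportions are equal.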

p20.-hypothesis-testing-anova-test---iris-flower-dataset

Hypothesis Testing Anova Test - Iris Flower dataset. ANOVA F-test statistic: analysis of variance between more than 2 samples or columns. Assume the null hypothesis Ho: all sample population means are the same. The alternate hypothesis is then Ha: at least one population mean is different. As (p_value ≈ 0) < (α = 0.05), reject the null hypothesis, i.e. at least one population mean differs across the samples.
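To make the F statistic concrete, here is a small sketch that computes it by hand with NumPy. The three groups are made-up numbers rather than the Iris measurements, and `one_way_anova_f` is a hypothetical helper name.

```python
import numpy as np

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size

    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up measurements for three groups (not the Iris data).
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means are spread out relative to the noise within each group. SciPy's `scipy.stats.f_oneway` returns the same statistic together with its p-value.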

p21.-hypothesis-testing-chi2-test-athletes-and-smokers-

Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers. Assume the null hypothesis Ho: independence of the categorical variables (being an athlete and smoking are not related). The alternate hypothesis is then Ha: dependence of the categorical variables (being an athlete and smoking are related). As (p_value = 0.00038) < (α = 0.05), reject the null hypothesis, i.e. the categorical variables are dependent. Thus being an athlete and smoking are significantly related.
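The chi-square test of independence compares observed counts with the counts expected under Ho. The sketch below handles only a 2x2 table (1 degree of freedom), applies no continuity correction, and uses invented counts; `chi2_independence_2x2` is a hypothetical helper, not the project's code.

```python
import math
import numpy as np

def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no correction)."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    # Expected counts if the row and column variables were independent.
    expected = row_totals @ col_totals / table.sum()
    chi2 = ((table - expected) ** 2 / expected).sum()
    # With 1 degree of freedom the chi-square tail equals a two-sided normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected

# Invented counts: rows = athlete / non-athlete, columns = smoker / non-smoker.
observed = [[10, 20], [30, 40]]
chi2, p_value, expected = chi2_independence_2x2(observed)
print(f"chi2 = {chi2:.4f}, p-value = {p_value:.4f}")
print(expected)
```

For larger tables the degrees of freedom are (rows - 1) * (columns - 1) and the p-value needs the general chi-square distribution, for example via `scipy.stats.chi2_contingency`, which applies a continuity correction to 2x2 tables by default.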

p22.-hypothesis-testing-chi2-test-human-gender-and-choice-of-pets-

Hypothesis-Testing-Chi2-Test-Human-Gender-and-Choice-of-Pets. Assume the null hypothesis Ho: human gender and choice of pets are independent (not related). The alternate hypothesis is then Ha: human gender and choice of pets are dependent (related). As (p_value = 0.1031) > (α = 0.05), we fail to reject the null hypothesis, i.e. the data are consistent with independence of the categorical variables. Thus there is no significant evidence of a relation between human gender and choice of pets.

p23.-eda-1

EDA (Exploratory Data Analysis) - 1: loading the datasets, data type conversions, removing duplicate entries, dropping a column, renaming a column, outlier detection, missing values and imputation (numerical and categorical), scatter plots and correlation analysis, transformations, and automatic EDA methods (Pandas Profiling and Sweetviz).
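Several of these steps can be sketched with pandas on a tiny made-up frame. The column names and values are illustrative assumptions, and the 1.5 * IQR rule is just one common way to flag outliers; the project may use different data and methods.

```python
import numpy as np
import pandas as pd

# Small made-up frame; column names and values are illustrative only.
df = pd.DataFrame({
    "Temp C": ["20", "22", "22", "35", "21", "23", "24"],
    "Ozone": [41.0, np.nan, np.nan, 168.0, 36.0, 38.0, 40.0],
    "Weather": ["S", "C", "C", None, "S", "S", "C"],
    "Unused": [0, 0, 0, 0, 0, 0, 0],
})

df["Temp C"] = df["Temp C"].astype(int)      # data type conversion
df = df.drop_duplicates()                    # remove duplicate rows
df = df.drop(columns=["Unused"])             # drop a column
df = df.rename(columns={"Temp C": "Temp"})   # rename a column

# Outlier detection with the 1.5 * IQR rule.
q1, q3 = df["Ozone"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Ozone"] < q1 - 1.5 * iqr) | (df["Ozone"] > q3 + 1.5 * iqr)]

# Imputation: median for numerical columns, mode for categorical columns.
df["Ozone"] = df["Ozone"].fillna(df["Ozone"].median())
df["Weather"] = df["Weather"].fillna(df["Weather"].mode()[0])

print(outliers)
print(df)
print(df[["Temp", "Ozone"]].corr())          # correlation analysis
```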

p26.-supervised-ml---multiple-linear-regression---cars-dataset

Supervised-ML---Multiple-Linear-Regression---Cars-dataset. Model the MPG of a car based on other variables. EDA, correlation analysis, model building, model testing, model validation techniques, collinearity problem check, residual analysis, model deletion diagnostics (checking for outliers or influencers) using two techniques: 1. Cook's Distance and 2. Leverage value, improving the model, model re-build, re-check and re-improve - 2, model re-build, re-check and re-improve - 3, final model, model predictions.
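The two deletion diagnostics can be computed directly from an ordinary least squares fit. This sketch uses synthetic data in place of the cars dataset, and the 3p/n leverage cutoff is a rule of thumb assumed here for illustration, not necessarily the threshold the project uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic predictors and response standing in for the cars data.
n = 30
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])           # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares fit
residuals = y - A @ beta
p = A.shape[1]                                 # number of fitted parameters

# Leverage: diagonal of the hat matrix H = A (A^T A)^-1 A^T.
leverage = np.einsum("ij,ji->i", A, np.linalg.solve(A.T @ A, A.T))

# Cook's distance from the residuals, the leverage and the residual variance.
mse = residuals @ residuals / (n - p)
cooks_d = residuals**2 / (p * mse) * leverage / (1 - leverage) ** 2

print("coefficients:", np.round(beta, 3))
print("most influential row:", int(np.argmax(cooks_d)))
print("rows above the 3p/n leverage cutoff:", np.where(leverage > 3 * p / n)[0])
```

Rows with a large Cook's distance or high leverage are candidates for inspection before the model is rebuilt without them.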

p27.-supervised-ml---multiple-linear-regression---toyoto-cars

Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, correlation analysis, model building, model testing, model validation techniques, collinearity problem check, residual analysis, model deletion diagnostics (checking for outliers or influencers) using two techniques: 1. Cook's Distance and 2. Leverage value, improving the model, model re-build, re-check and re-improve - 2, model re-build, re-check and re-improve - 3, final model, model predictions.

p29.-unsupervised-ml---hierarchical-clustering-univ.-

Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.
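A rough sketch of the normalise, link and cut steps, assuming SciPy is installed. The data are synthetic rather than the university dataset, and complete linkage with two clusters is an arbitrary choice made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)

# Two well-separated synthetic groups standing in for the numerical columns.
data = np.vstack([
    rng.normal(0, 0.3, size=(5, 3)),
    rng.normal(5, 0.3, size=(5, 3)),
])

# Min-max normalisation so every column lies in [0, 1].
normalized = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Build the merge tree, then cut it into two flat clusters.
Z = linkage(normalized, method="complete", metric="euclidean")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` is what `scipy.cluster.hierarchy.dendrogram` draws (with matplotlib installed); reading the dendrogram is one way to pick the number of clusters to cut into.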

p30.-unsupervised-ml---k-means-clustering-non-hierarchical-clustering-univ.-

Unsupervised-ML---K-Means-Clustering-Non-Hierarchical-Clustering-Univ. Use an Elbow graph to find the optimum number of clusters (K value) from a range of K values. The K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion (WCSS). Plot the range of K values against WCSS to get the Elbow graph for choosing K (the number of clusters).
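To show where the WCSS values in an Elbow graph come from, here is a bare-bones Lloyd's k-means in NumPy run over a range of K. It is a teaching sketch on synthetic blobs, not the project's code; `kmeans_wcss` is a hypothetical helper, and a library implementation such as scikit-learn's `KMeans` (which exposes `inertia_`) would normally be used instead.

```python
import numpy as np

def kmeans_wcss(X, k, n_iter=50, seed=0):
    """Run a basic Lloyd's k-means and return the final WCSS (inertia)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()

# Three synthetic blobs; the "elbow" is expected near K = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in (0, 5, 10)])

k_values = range(1, 7)
wcss = [kmeans_wcss(X, k) for k in k_values]
for k, w in zip(k_values, wcss):
    print(k, round(float(w), 2))
```

Plotting `k_values` against `wcss` gives the Elbow graph; the K where the curve stops dropping sharply is the usual choice.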

p31.-unsupervised-ml---dbscan-clustering-wholesale-customers-

Unsupervised-ML---DBSCAN-Clustering-Wholesale-Customers. Import libraries, import dataset, normalize heterogeneous numerical data by applying StandardScaler fit_transform to the dataset, DBSCAN clustering (noisy samples are given the label -1), adding the cluster labels to the dataset.
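A short sketch of the scale-then-cluster flow with scikit-learn, assuming it is installed. The two spending columns are synthetic, and the `eps` and `min_samples` values are illustrative guesses rather than the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Synthetic spend columns on very different scales, plus one far-away point.
dense = np.column_stack([rng.normal(100, 5, 40), rng.normal(5000, 200, 40)])
far_point = np.array([[400.0, 20000.0]])
X = np.vstack([dense, far_point])

# Standardise each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# eps and min_samples are illustrative guesses, not tuned values.
model = DBSCAN(eps=0.8, min_samples=5).fit(X_scaled)
labels = model.labels_

print(labels)  # noise points carry the label -1
```

Scaling first matters because DBSCAN's `eps` is a single distance threshold shared by all columns.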

p32.-unsupervised-ml---association-rules-data-mining-titanic-

Unsupervised-ML---Association-Rules-Data-Mining-Titanic. Data preprocessing: as the data is in categorical format, One Hot Encoding is used to convert it into numerical format. Apriori algorithm: frequent item sets and association rules. A leverage value of 0 indicates independence; its range is [-1, 1]. A high conviction value means that the consequent is highly dependent on the antecedent; its range is [0, inf]. A lift ratio > 1 indicates an influential rule for selecting the associated transactions.
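The rule metrics mentioned above follow directly from support counts. This pure-Python sketch computes them for one rule on eight invented transactions; the item names and `rule_metrics` are hypothetical and not taken from the Titanic data or the project's code.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift, leverage and conviction for one rule A -> C."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    support_a = sum(a <= t for t in transactions) / n
    support_c = sum(c <= t for t in transactions) / n
    support_ac = sum((a | c) <= t for t in transactions) / n

    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c   # 0 means independence
    conviction = (
        float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    )
    return {
        "support": support_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Invented one-hot style transactions (not the Titanic data).
transactions = [
    {"Sex=Female", "Survived=Yes"},
    {"Sex=Female", "Survived=Yes"},
    {"Sex=Female", "Survived=Yes"},
    {"Sex=Female", "Survived=No"},
    {"Sex=Male", "Survived=No"},
    {"Sex=Male", "Survived=No"},
    {"Sex=Male", "Survived=No"},
    {"Sex=Male", "Survived=Yes"},
]

metrics = rule_metrics(transactions, {"Sex=Female"}, {"Survived=Yes"})
print(metrics)
```

In this toy data the rule has a lift of 1.5, so the antecedent and consequent occur together more often than independence would predict.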

p33.-unsupervised-ml---pca-data-mining-univ-

Unsupervised-ML---PCA-Data-Mining-Univ. Import dataset, converting data to a NumPy array, normalizing the numerical data, applying PCA fit_transform to the dataset, PCA components matrix or covariance matrix, variance of each principal component, final dataframe, visualization of the principal components, eigenvectors and eigenvalues for a given matrix.
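The link between PCA and eigenvectors can be sketched in plain NumPy: standardise the data, take the eigen-decomposition of the covariance matrix, and project. The data here are synthetic, and this is an illustration of the idea rather than the library call the project uses.

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated synthetic data standing in for the numerical columns.
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

# Normalise each column to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix (eigh: symmetric input).
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort components by decreasing variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained_ratio = eigenvalues / eigenvalues.sum()
scores = X_std @ eigenvectors[:, :2]   # data projected onto the first two PCs

print("explained variance ratio:", np.round(explained_ratio, 3))
print("scores shape:", scores.shape)
```

Each eigenvalue is the variance captured by its component, so the cumulative `explained_ratio` shows how many components are worth keeping.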

p35.-unsupervised-ml---recommendation-system-data-mining-movies-

Unsupervised-ML-Recommendation-System-Data-Mining-Movies. Recommend movies based on the ratings: Sort by User IDs, number of unique users in the dataset, number of unique movies in the dataset, Impute those NaNs with 0 values, Calculating Cosine Similarity between Users on array data, Store the results in a dataframe format, Set the index and column names to user ids, Slicing first 5 rows and first 5 columns, Nullifying diagonal values, Most Similar Users, extract the movies which userId 6 & 168 have watched.
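The similarity steps listed above can be sketched with NumPy on a tiny invented ratings matrix. The user ids and ratings are made up, and the real project works on a dataframe pivoted from the movies dataset.

```python
import numpy as np

# Rows = users, columns = movies; NaN marks a movie the user has not rated.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 1.0, np.nan],
    [np.nan, 1.0, 5.0, 4.0],
    [1.0, np.nan, 4.0, 5.0],
])
user_ids = [1, 2, 3, 4]  # hypothetical ids

# Impute missing ratings with 0, then compute user-user cosine similarity.
filled = np.nan_to_num(ratings, nan=0.0)
norms = np.linalg.norm(filled, axis=1, keepdims=True)
similarity = (filled @ filled.T) / (norms @ norms.T)

# Nullify the diagonal so a user is never matched with themselves.
np.fill_diagonal(similarity, 0.0)

# For every user, pick the other user with the highest similarity.
most_similar = {
    user_ids[i]: user_ids[j] for i, j in enumerate(similarity.argmax(axis=1))
}
print(np.round(similarity, 3))
print(most_similar)
```

Movies rated by a user's most similar neighbour but not yet by the user are the natural recommendation candidates.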

p36.-supervised-ml---decision-tree---c5.0-entropy-iris-flower-

Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower-Using Entropy Criteria - Classification Model. Import Libraries and data set, EDA, Apply Label Encoding, Model Building - Building/Training Decision Tree Classifier (C5.0) using Entropy Criteria. Validation and Testing Decision Tree Classifier (C5.0) Model
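The entropy criterion scores a candidate split by how much it reduces label entropy (the information gain). This pure-Python sketch uses six invented petal lengths, not the Iris dataset, and the helper names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    weighted = (
        len(left) * entropy(left) + len(right) * entropy(right)
    ) / len(labels)
    return entropy(labels) - weighted

# Invented petal lengths and species labels.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

gain_clean = information_gain(petal_length, species, threshold=2.0)
gain_mixed = information_gain(petal_length, species, threshold=4.6)
print(f"gain at 2.0: {gain_clean:.3f}, gain at 4.6: {gain_mixed:.3f}")
```

A tree builder evaluates many such thresholds and keeps the one with the highest gain; in scikit-learn this criterion is selected with `DecisionTreeClassifier(criterion="entropy")`.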

r1

R Basics Tutorial-1

r2

R2 - Decision Making statements in R

r3

R3 - Joins and Applying Functions in R
