Vaitybharati's Projects
Assignment-07-K-Means-Clustering-Airlines. Perform clustering (K means clustering) for the airlines data to obtain optimum number of clusters. Draw the inferences from the clusters obtained. The file EastWestAirlinescontains information on passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
Assignment-08-PCA-Data-Mining-Wine data. Perform Principal component analysis and perform clustering using first 3 principal component scores (both heirarchial and k mean clustering(scree plot or elbow curve) and obtain optimum number of clusters and check whether we have obtained same number of clusters with the original data (class column we have ignored at the begining who shows it has 3 clusters)
Association-Rules-Data-Mining-Books. Apriori Algorithm, Association rules with 10% Support and 70% confidence, Association rules with 20% Support and 60% confidence, Association rules with 5% Support and 80% confidence, visualization of obtained rule.
Association Rules Data Mining (Groceries). Converting the data frame into a list of lists, Using Transactionencoder to transform this dataset into a logical data frame, Building the data frame: rows are logical and columns are the items that have been purchased, Print Column names, We need to drop nan column from the data frame, Most popular items, Top 10 Popular items, Barplot visualization of popular items, Apriori Algorithm: Association rules with 5% Support and 70% confidence, Association rules with 1% Support and 80% confidence, Visualization of obtained rule.
Assignment-09-Association-Rules-Data-Mining-my_movies. Apriori Algorithm. Association rules with 10% Support and 70% confidence. Association rules with 5% Support and 90% confidence. Lift Ratio > 1 is a good influential rule in selecting the associated transactions. Visualization of obtained rule.
Q11) Suppose we want to estimate the average weight of an adult male in Mexico. We draw a random sample of 2,000 men from a population of 3,000,000 men and weigh them. We find that the average person in our sample weighs 200 pounds, and the standard deviation of the sample is 30 pounds. Calculate 94%,98%,96% confidence interval?
Below are the scores obtained by a student in tests 34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56. Find mean, median, variance, standard deviation. What can we say about the student marks?
Data _set: Cars.csv Calculate the probability of MPG of Cars for the below cases. MPG <- Cars$MPG a. P(MPG>38) b. P(MPG<40) c. P (20<MPG<50)
Q 21) Check whether the data follows normal distribution a) Check whether the MPG of Cars follows Normal Distribution
Check Whether the Adipose Tissue (AT) and Waist Circumference(Waist) from wc-at data set follows Normal Distribution
Q 22) Calculate the Z scores of 90% confidence interval,94% confidence interval, 60% confidence interval for Adipose Tissue (AT) and Waist Circumference(Waist) from wc-at data set
Q 23) Calculate the t scores of 95% confidence interval, 96% confidence interval, 99% confidence interval for sample size of 25
Q 24) A Government company claims that an average light bulb lasts 270 days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an average of 260 days, with a standard deviation of 90 days. If the CEO's claim were true, what is the probability that 18 randomly selected bulbs would have an average life of no more than 260 days
Q7) For Points,Score,Weigh: Find Mean, Median, Mode, Variance, Standard Deviation, and Range and also Comment about the values/ Draw some inferences. Use Q7.csv file
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data Cars speed and distance Use Q9_a.csv
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data SP and Weight(WT) Use Q9_b.csv
Assignment-10-Recommendation-System-Data-Mining-books. Recommend a best book based on the ratings: Sort by User IDs, number of unique users in the dataset, number of unique books in the dataset, converting long data into wide data using pivot table, replacing the index values by unique user Ids, Impute those NaNs with 0 values, Calculating Cosine Similarity between Users on array data, Store the results in a dataframe format, Set the index and column names to user ids, Nullifying diagonal values, Most Similar Users, extract the books which userId 162107 & 276726 have watched, extract the books which userId 276729 & 276726 have watched.
Assignment-11-Text-Mining-01-Elon-Musk, Perform sentimental analysis on the Elon-musk tweets (Exlon-musk.csv), Text Preprocessing: remove both the leading and the trailing characters, removes empty strings, because they are considered in Python as False, Joining the list into one string/text, Remove Twitter username handles from a given twitter text. (Removes @usernames), Again Joining the list into one string/text, Remove Punctuation, Remove https or url within text, Converting into Text Tokens, Tokenization, Remove Stopwords, Normalize the data, Stemming (Optional), Lemmatization, Feature Extraction, Using BoW CountVectorizer, CountVectorizer with N-grams (Bigrams & Trigrams), TF-IDF Vectorizer, Generate Word Cloud, Named Entity Recognition (NER), Emotion Mining - Sentiment Analysis.
NLP: Sentiment Analysis or Emotion Mining on Amazon Product Reviews - Part-1. Let’s learn the NLP techniques to perform Sentiment Analysis or Emotion Mining on extracted Product Reviews from Amazon. Part-1 covers Text preprocessing and Feature extraction, the next part covers Sentiment Analysis or Emotion Mining on text corpus. https://medium.com/@vaitybharati/nlp-sentiment-analysis-or-emotion-mining-on-amazon-product-reviews-part-1-428d43112027
Text-Mining-Amazon-Reviews-using-Scrapy. Ever wondered? Life would be easier if there could be ways to know how well your product performs and what do people feel about your product? The Solution -Text Mining Techniques. https://medium.com/@vaitybharati/text-mining-how-to-extract-amazon-reviews-using-scrapy-5bd709cb826c
Plot the data, find the outliers and find out μ,σ,σ^2
The time required for servicing transmissions is normally distributed with mean = 45 minutes and SD = 8 minutes. The service manager plans to have work begin on the transmission of a customer’s car 10 minutes after the car is dropped off and the customer is told that the car will be ready within 1 hour from drop-off. What is the probability that the service manager cannot meet his commitment?
The current age (in years) of 400 clerical employees at an insurance claims processing center is normally distributed with mean = 38 and Standard deviation =6. For each statement below, please specify True/False. If false, briefly explain why.A. More employees at the processing center are older than 44 than between 38 and 44. B. A training program for employees under the age of 30 at the center would be expected to attract about 36 employees.
Let X ~ N(100, 20^2). Find two values, a and b, symmetric about the mean, such that the probability of the random variable taking a value between them is 0.99.
Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2) respectively. Both the profits are in $ Million. Answer the following questions about the total profit of the company in Rupees. Assume that $1 = Rs. 45 A. Specify a Rupee range (centered on the mean) such that it contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in Rupees) for the company C. Which of the two divisions has a larger probability of making a loss in a given year?
In January 2005, a company that monitors Internet traffic (WebSideStory) reported that its sampling revealed that the Mozilla Firefox browser launched in 2004 had grabbed a 4.6% share of the market. I. If the sample were based on 2,000 users, could Microsoft conclude that Mozilla has a less than 5% share of the market? II. WebSideStory claims that its sample includes all the daily Internet users. If that’s the case, then can Microsoft conclude that Mozilla has a less than 5% share of the market?
Auditors at a small community bank randomly sample 100 withdrawal transactions made during the week at an ATM machine located near the bank’s main branch. Over the past 2 years, the average withdrawal amount has been $50 with a standard deviation of $40. Since audit investigations are typically expensive, the auditors decide to not initiate further investigations if the mean transaction amount of the sample is between $45 and $55. What is the probability that in any given week, there will be an investigation?
Association-Rules
Bagging-boosting-stacking
Basics-of-R-Tutorial 1