Vaitybharati's Projects
The following data are numbers of passengers on flights of Delta Air Lines between San Francisco and Seattle over 33 days in April and early May. 128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125, 128, 121, 129, 130, 131, 127, 119, 114, 134, 110, 136, 134, 125, 128, 123, 128, 133, 132, 136, 134, 129, 132
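The entry above does not say which statistics the exercise asks for. As an assumption, the sketch below computes the usual descriptive measures (mean, median, sample standard deviation) for the 33 passenger counts using only Python's standard library.

```python
import statistics

# Passenger counts copied from the problem statement (33 days).
passengers = [
    128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125,
    128, 121, 129, 130, 131, 127, 119, 114, 134, 110, 136,
    134, 125, 128, 123, 128, 133, 132, 136, 134, 129, 132,
]

mean = statistics.mean(passengers)
median = statistics.median(passengers)
sample_sd = statistics.stdev(passengers)  # sample formula, divides by n - 1

print(f"n = {len(passengers)}")
print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"sample standard deviation = {sample_sd:.2f}")
```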
Fortune published a list of the 10 largest “green companies”—those that follow environmental policies. Their annual revenues, in $ billions, are given below. Find the mean, variance, and standard deviation of the annual revenues.
Solution to Aczel problems practice (1-66, 1-67, 1-68, 1-69, 1-70)
Solution to Aczel problems practice (1-71, 1-72, 1-73)
Solution to Aczel problems practice (1-74, 1-75)
The following table shows changes in bad loans and in provisions for bad loans, from 2005 to 2006, for 19 lending institutions. Verify the reported averages, and find the medians. Which measure is more meaningful, in your opinion? Also find the standard deviation and identify outliers for change in bad loans and change in provision for bad loans
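The table for this exercise is not reproduced here, so the sketch below uses made-up numbers. It shows one common way to flag outliers, the 1.5 × IQR rule, and why the median can be more meaningful than the mean when an outlier is present. The choice of rule and the helper name `iqr_outliers` are assumptions, not taken from the original solution.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

# Hypothetical percentage changes, NOT the data of the 19 institutions.
changes = [10, 12, 11, 13, 12, 11, 10, 12, 13, 50]

print("mean:", statistics.mean(changes))      # pulled upward by the value 50
print("median:", statistics.median(changes))  # barely affected by it
print("standard deviation:", statistics.stdev(changes))
print("outliers:", iqr_outliers(changes))
```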
Solution to Aczel problems practice (1-78, 1-79)
The future Euroyen is the price of the Japanese yen as traded in the European futures market. The following are 30-day Euroyen prices on an index from 0 to 100%. Find mean, variance, standard deviation and the median.
Solution to Aczel problems practice (1-82, 1-83)
The following data are annualized returns on a group of 15 stocks. 12.5, 13, 14.8, 11, 16.7, 9, 8.3, -1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5
The following data are the total 1-year return, in percent, for 10 midcap mutual funds
Following are the numbers of daily bids received by the government of a developing country from firms interested in winning a contract for the construction of a new port facility
Data: 23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209
24. TABLE 1–1 Boston Condominium Data
Asking_Price  Number_of_Bedrooms  Number_of_Bathrooms  Direction_Facing  Washer/Dryer?  Doorman?
$709,000      2                   1                    E                 Y              Y
812,500       2                   2                    N                 N              Y
980,000       3                   3                    N                 Y              Y
830,000       1                   2                    W                 N              N
850,900       2                   2                    W                 Y              N
Data in 100 dollars: 7090, 8125, 9800, 8300, 8509
Anova
An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at 5% significance level. Please state the assumptions and tests that you carried out to check validity of the assumptions. Cutlets.csv
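A minimal sketch of a two-sample comparison for this kind of question, assuming Welch's t-test (which does not require equal variances). The column layout of Cutlets.csv is not given here, so the diameters below are hypothetical, and `welch_t` is an illustrative helper rather than the original solution.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's test."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof

# Hypothetical cutlet diameters, NOT the contents of Cutlets.csv.
unit_a = [7.0, 7.2, 6.9, 7.1, 7.3]
unit_b = [6.8, 7.0, 7.1, 6.9, 7.2]

t_stat, dof = welch_t(unit_a, unit_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The t statistic still has to be turned into a p-value using the t distribution with `dof` degrees of freedom and compared against 0.05. With SciPy installed, `scipy.stats.ttest_ind(unit_a, unit_b, equal_var=False)` returns both the statistic and the p-value.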
ANOVA F-test statistics. A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at 5% significance level.
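The F statistic behind a one-way ANOVA can be sketched in a few lines. The TAT values below are hypothetical and the function name `one_way_anova_f` is an assumption. The statistic is the ratio of between-group to within-group mean squares.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical TAT samples for four laboratories, NOT the hospital's data.
labs = [
    [185, 170, 176, 182],
    [165, 185, 177, 190],
    [176, 198, 201, 215],
    [166, 160, 171, 158],
]
f_stat, df_b, df_w = one_way_anova_f(labs)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

The null hypothesis of equal means is rejected at the 5% level when the p-value of this F statistic falls below 0.05. With SciPy installed, `scipy.stats.f_oneway(*labs)` reports the statistic and the p-value directly.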
Chi2 contingency independence test. Assume the null hypothesis H0: the categorical variables are independent (male-female buyer ratios are similar across regions; they do not vary and are not related). The alternative hypothesis is then Ha: the categorical variables are dependent (male-female buyer ratios are NOT similar across regions; they vary and are somewhat or significantly related).
Chi2 contingency independence test. Q4. TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and it has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at 5% significance level and help the manager draw appropriate inferences.
Chi2 contingency independence test. Fantaloons sales managers commented that the % of males versus females walking into the store differs based on the day of the week. Analyze the data and determine whether there is evidence at 5% significance level to support this hypothesis.
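The chi-square tests of independence above all follow the same pattern: compare observed counts with the counts expected under independence. The sketch below assumes a plain list-of-rows contingency table. The counts are hypothetical, and `chi_square_independence` is an illustrative name.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical audit counts for 4 centers (rows: defective, error-free).
observed = [
    [30, 25, 35, 28],
    [270, 275, 265, 272],
]
chi2, dof = chi_square_independence(observed)
critical_5pct_df3 = 7.815  # chi-square critical value for df = 3, alpha = 0.05
print(f"chi2 = {chi2:.3f}, df = {dof}")
print("reject H0" if chi2 > critical_5pct_df3 else "fail to reject H0")
```

The hard-coded critical value only applies to tables with 3 degrees of freedom. For other table shapes, `scipy.stats.chi2_contingency(observed)` returns the statistic, the p-value, the degrees of freedom and the expected counts.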
Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time -> Predict delivery time using sorting time. Build a simple linear regression model by performing EDA and do necessary transformations and select the best model using R or Python. EDA and Data Visualization, Feature Engineering, Correlation Analysis, Model Building, Model Testing and Model Predictions using simple linear regression.
Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike -> Build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, do the necessary transformations, and select the best model using R or Python. EDA and Data Visualization. Correlation Analysis. Model Building. Model Testing. Model Predictions.
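Both simple linear regression assignments ask for a fitted line, some transformations, and a way to pick the best model. A minimal sketch under stated assumptions: ordinary least squares written out by hand, a log transform of the predictor as the only candidate transformation, R^2 as the selection criterion, and made-up data in place of the real files.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical sorting times and delivery times, NOT the assignment data.
sorting_time = [2, 3, 4, 6, 8, 10]
delivery_time = [9, 12, 14, 17, 19, 21]

candidates = {
    "x": sorting_time,
    "log(x)": [math.log(v) for v in sorting_time],
}
for name, feature in candidates.items():
    slope, intercept = fit_line(feature, delivery_time)
    score = r_squared(feature, delivery_time, slope, intercept)
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.3f}, R^2={score:.4f}")
```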
Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.
Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for profit of 50_startups data. Do transformations for getting better predictions of profit and make a table containing R^2 value for each prepared model. R&D Spend -- research and development spend in the past few years; Administration -- spend on administration in the past few years; Marketing Spend -- spend on marketing in the past few years; State -- states from which data is collected; Profit -- profit of each state in the past few years.
Assignment-06-Logistic-Regression. Output variable -> y
y -> Whether the client has subscribed a term deposit or not
Binomial ("yes" or "no")
Attribute information
For bank dataset
Input variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
# related with the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
# other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: "yes","no")
8. Missing Attribute Values: None
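To illustrate the idea behind the logistic regression assignment, here is a toy sketch: a single numeric feature, a "yes"/"no" target encoded as 1/0, and plain gradient descent on the log-loss. The data, the single-feature setup and the function names are assumptions made for illustration. The real dataset has 16 inputs, including categorical ones that would need encoding first.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(weight * x + bias) by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # gradient of the log-loss
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias

# Hypothetical call durations (minutes) and outcomes, NOT the bank data.
durations = [0, 1, 2, 3, 4, 5]
labels = ["no", "no", "no", "yes", "yes", "yes"]
targets = [1 if label == "yes" else 0 for label in labels]

weight, bias = fit_logistic(durations, targets)
for minutes in (1, 4):
    prob = sigmoid(weight * minutes + bias)
    print(f"duration={minutes}: P(yes) = {prob:.3f}")
```

In practice a library model such as scikit-learn's `LogisticRegression` would usually replace the hand-written loop.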
Assignment-07-Clustering-Hierarchical-Airlines. Perform clustering (hierarchical) for the airlines data to obtain the optimum number of clusters. Draw the inferences from the clusters obtained. Data Description: The file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
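Agglomerative (bottom-up) hierarchical clustering can be sketched without any library: start with every point in its own cluster and repeatedly merge the two closest clusters. The version below assumes single linkage and Euclidean distance. The points are hypothetical, already-scaled values, not the EastWestAirlines data.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

# Hypothetical two-feature observations for a handful of passengers.
passengers = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0)]
for cluster in single_linkage(passengers, 2):
    print(cluster)
```

This sketch needs the number of clusters up front. To choose it, one would normally inspect a dendrogram, for example with `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram`.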
Assignment-07-DBSCAN-Clustering-Crimes. Perform Clustering for the crime data and identify the number of clusters formed and draw inferences.
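DBSCAN groups points that lie in dense regions and marks isolated points as noise, so the number of clusters is an output rather than an input. The following is a compact sketch of the algorithm in plain Python. The parameter names `eps` and `min_samples` mirror scikit-learn's `DBSCAN`, the points are hypothetical, and the crime data would first need scaling.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nearby = neighbours(j)
            if len(nearby) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nearby)
    return labels

# Hypothetical 2-D points: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
print("clusters found:", len(set(labels) - {-1}))
```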