Here is a step-by-step explanation of the SAS code:
Step 1: The code starts by turning on the ods graphics
option and specifying the location of the librisk
library.
Step 2: The proc import
procedure is used to import a dataset named Credit_Data.csv
from a specified location and create a new dataset named loan_data
in the librisk
library.
Step 3: The data
step is used to manipulate the data and create new variables such as Delinquency
, fico_avg
, last_fico_avg
, and income_category
.
Step 4: The proc contents
procedure is used to generate a report on the contents of the risk_data
dataset.
Step 5: The proc means
procedure is used to generate descriptive statistics for the numeric variables in the risk_data
dataset.
Step 6: The proc freq
procedure is used to generate frequency tables for the character variables in the risk_data
dataset.
Step 7: The proc sgplot
procedure is used to generate box plots and bar plots to visualize the relationship between different variables and the target variable (Delinquency
).
Step 8: The code then removes outliers from the annual_inc
variable and generates a histogram to visualize its distribution.
Step 9: The code also generates a donut plot to visualize the distribution of the purpose
variable.
Step 10: The code uses the proc surveyselect
procedure to generate a random sample of 10000 observations from the income_no_out
dataset.
Step 11: The proc sgscatter
procedure is used to generate a scatter plot to visualize the relationship between two numeric variables (EMI
and Loan_Amount
) and the target variable (Delinquency
).
Step 12: The code then performs statistical data analysis using procedures such as proc corr
, proc glm
, and proc univariate
.
Step 13: The code creates a new dataset named loan_data
that contains only the relevant variables for further analysis.
Step 14: The code uses the proc stdize
procedure to standardize some of the numeric variables in the loan_data
dataset.
Step 15: The code then removes observations with missing values from the loan_data
dataset.
Step 16: The code uses the proc reg
procedure to fit a linear regression model to predict the target variable (Delinquency
) using a set of predictor variables.
Step 17: The code uses the proc logistic
procedure to fit a logistic regression model to predict the target variable (Delinquency
) using a set of predictor variables, perform stepwise selection, and generate a ROC curve.
I hope this format makes it easier for you to understand how this SAS code works!