The original data-set had 81 variables and 113,937 observations. I narrowed my data-set down to 17 variables.
I will try to determine what factors influence borrower cost. I selected the BorrowerAPR as an estimate of total borrower cost, as it measures the effective rate after fees. Prosper Score and Credit Score were selected as I suspect they will heavily influence the loan costs. I selected variables related to employment, income and loan amount as they should be considered when determining a borrower's ability to repay their loan. Inquiries in the last six months and delinquencies were selected to see how any negative information would be factored into loan prices. Finally, I selected the loan origination quarter to see if interest rates had fluctuated over the time frame of the data-set.
The borrower APR ranges from 0.7% to 51.2% with a median of 21.0%. There is a spike in the number of loans priced near 35.8%.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15630 0.20980 0.21880 0.28380 0.51230 25
Most common APRs:
##
## 0.35797 0.35643 0.37453
## 3672 1644 1260
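The summary and frequency count above can be reproduced along these lines. This is a minimal base R sketch on a small invented vector standing in for `loanSummary$BorrowerAPR`; the values are illustrative only and are not taken from the data-set.

```r
# Toy stand-in for loanSummary$BorrowerAPR (values invented for illustration)
apr <- c(0.35797, 0.35797, 0.35797, 0.35643, 0.35643, 0.37453, 0.20980, NA)

# Min, quartiles, mean, max and the count of missing values
print(summary(apr))

# Count each distinct APR (table() drops NA by default), most common first
apr_counts <- sort(table(apr), decreasing = TRUE)
top_aprs <- head(apr_counts, 3)
print(top_aprs)
```

Counting exact values is what reveals the spike: a histogram shows a tall bar near 35.8%, while the table shows that the bar comes from a single repeated rate.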
The prosper score histogram shows a relatively normal distribution.
The credit scores show a relatively normal distribution with a median of 699.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 19.0 679.0 699.0 704.6 739.0 899.0 591
The median available bankcard credit is $4,100, with a heavily right-skewed distribution (the mean is $11,210 and the maximum is $646,300). Transforming available credit by log10, after adding one so that borrowers with zero available credit are included, produces a roughly normal distribution with an additional spike for borrowers with no available credit.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 880 4100 11210 13180 646300 7544
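The plus-one log transform can be sketched as follows. The vector is an invented stand-in for `loanSummary$AvailableBankcardCredit`, and the bin width is an arbitrary choice for illustration.

```r
# Toy stand-in for loanSummary$AvailableBankcardCredit (invented values)
credit <- c(0, 0, 880, 4100, 13180, 646300)

# log10(0) is -Inf, so add one first; zero credit then maps to exactly 0
log_credit <- log10(credit + 1)

# Bin the transformed values; drop plot = FALSE to draw the histogram
h <- hist(log_credit, breaks = seq(0, 6, by = 0.5), plot = FALSE)
print(data.frame(bin_start = head(h$breaks, -1), count = h$counts))
```

Because every borrower with no available credit lands on the same transformed value of 0, they form the separate spike to the left of the main bell-shaped mass.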
The debt to income ratio shows a roughly normal distribution around a median of 0.22, aside from a long right tail of outliers reaching as high as 10.01.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
The median loan size was $6,500.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The number of loans accelerated significantly in the latter part of the data-set.
I narrowed the original data-set from 81 variables to 17 by eliminating variables that were similar to other variables in the data-set and variables that were not related to my inquiry into what determines borrowing cost.
My narrowed data-set originally included three factor variables. I then added an additional factor variable (Prosper Score) in order to create additional box plot visualizations. The remaining variables are numeric or integer.
I am interested in measuring the main factors in determining borrower cost, for which I will use the BorrowerAPR as a measure. I suspect credit score and income to be two of the primary factors in determining borrower cost. I also anticipate that the Prosper Score will be a primary factor. However, I suspect that the Prosper Score and Credit Score will be so highly correlated that they may be effectively redundant.
Delinquencies, home ownership status and the length of employment duration may also help determine borrower costs. It is also possible that rates changed over time, thus the loan closing date could be a factor.
I converted the Prosper score into a new ProsperScoreFactor variable so I could use various visualizations on the data. I converted the Income range factor variable into a number, so I could perform correlation calculations. I ordered the Income Range factor variable so that it would be easier to understand in the visualizations. I also modified the loan origination quarter variable so that it could be sorted in chronological order.
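These conversions might look roughly like the sketch below. Only `ProsperScore` and `ProsperScoreFactor` are names taken from this analysis; the column names `IncomeRange`, `IncomeRangeNum` and `LoanOriginationQuarter`, the income labels, and the "Q1 2014" quarter format are assumptions made for illustration.

```r
# Toy data; column names other than ProsperScore are assumed
loans <- data.frame(
  ProsperScore = c(4, 9, 6),
  IncomeRange = c("$50,000-74,999", "$1-24,999", "$100,000+"),
  LoanOriginationQuarter = c("Q4 2013", "Q1 2014", "Q2 2013"),
  stringsAsFactors = FALSE
)

# Numeric score -> factor, so each score gets its own box in a box plot
loans$ProsperScoreFactor <- factor(loans$ProsperScore)

# Order the income labels from lowest to highest (labels are assumed)
income_levels <- c("Not employed", "$0", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+")
loans$IncomeRange <- factor(loans$IncomeRange, levels = income_levels,
                            ordered = TRUE)

# Integer code of each level, usable in correlation calculations
loans$IncomeRangeNum <- as.integer(loans$IncomeRange)

# Build a "2013 Q4" style key so that quarters sort chronologically
q <- loans$LoanOriginationQuarter
key <- paste(substr(q, 4, 7), substr(q, 1, 2))
loans$LoanOriginationQuarter <- factor(q, levels = unique(q[order(key)]),
                                       ordered = TRUE)
print(loans)
```

Treating the income range as an integer code assumes the bands are evenly spaced, which they are not, so correlations computed from it are best read as rank-like indicators rather than exact measures.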
The Borrower APR shows the strongest correlations with Prosper score, credit score, loan amount and available credit.
The correlation between loan amount and Prosper Score is likely due to self-selection bias: the most creditworthy borrowers are the only group that can borrow large dollar amounts, and they are also the most likely to demand and receive lower rates.
As expected, there is a strong inverse correlation between interest rate and Prosper Score, with a correlation coefficient of -0.67. The higher the Prosper Score (indicating lower risk), the lower the interest rate.
##
## Pearson's product-moment correlation
##
## data: loanSummary$ProsperScore and loanSummary$BorrowerAPR
## t = -260.93, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6709351 -0.6634688
## sample estimates:
## cor
## -0.6672187
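Output of this kind comes from `cor.test()`. The sketch below runs it on simulated data, so the numbers it prints will not match the figures above; the variable names and the simulated relationship are assumptions for illustration.

```r
set.seed(1)
n <- 500

# Simulated Prosper-style scores and an APR that falls as the score rises
score <- sample(1:11, n, replace = TRUE)
apr <- 0.40 - 0.02 * score + rnorm(n, sd = 0.05)

# Blank out 20 APR values to mimic missing data
apr[sample(n, 20)] <- NA

# cor.test() keeps only complete pairs, so df = (complete pairs) - 2
result <- cor.test(score, apr)
print(result)
```

The handling of missing values is why the degrees of freedom differ between the tests in this report: pairs involving the Prosper Score have fewer complete observations than pairs involving the credit score.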
There appears to be a correlation between credit ratings and interest rates. This relationship is what I anticipated, with lower rates for borrowers with higher credit ratings and a correlation coefficient of -0.43.
##
## Pearson's product-moment correlation
##
## data: loanSummary$CreditScoreRangeUpper and loanSummary$BorrowerAPR
## t = -160.21, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4344422 -0.4249487
## sample estimates:
## cor
## -0.4297073
Borrowers with higher incomes tend to have lower borrowing costs.
The following plot shows the rise and retraction of the mean and median interest rates through the data-set’s time frame.
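The per-quarter averages behind such a plot can be computed as in this sketch. The data are invented, and `LoanOriginationQuarter` is an assumed column name for the ordered quarter factor.

```r
# Toy loans with an ordered quarter factor (invented values)
loans <- data.frame(
  LoanOriginationQuarter = factor(
    c("Q1 2013", "Q1 2013", "Q2 2013", "Q2 2013", "Q2 2013"),
    levels = c("Q1 2013", "Q2 2013"), ordered = TRUE
  ),
  BorrowerAPR = c(0.20, 0.30, 0.10, 0.20, 0.36)
)

# Mean and median APR for each quarter, in chronological order
mean_apr <- tapply(loans$BorrowerAPR, loans$LoanOriginationQuarter,
                   mean, na.rm = TRUE)
median_apr <- tapply(loans$BorrowerAPR, loans$LoanOriginationQuarter,
                     median, na.rm = TRUE)

trend <- data.frame(quarter = names(mean_apr),
                    mean = as.numeric(mean_apr),
                    median = as.numeric(median_apr))
print(trend)
```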
Since Prosper score seems to have a strong influence on borrowing cost, what influences the Prosper score?
How strong is the relationship between credit rating and Prosper Score? They are only moderately correlated, with a correlation coefficient of 0.37. There are several outliers, and the Prosper Score does not have as strong a linear relationship with the Credit Score as I anticipated. The median Credit Score is unchanged across Prosper Scores 2 through 5, and again across Prosper Scores 6 through 8.
##
## Pearson's product-moment correlation
##
## data: loanSummary$ProsperScore and loanSummary$CreditScoreRangeUpper
## t = 115.93, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3639411 0.3755582
## sample estimates:
## cor
## 0.3697641
Stated monthly income has a weak relationship with the Prosper Score, with a correlation coefficient of only 0.08.
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and ProsperScore
## t = 24.069, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07566126 0.08902693
## sample estimates:
## cor
## 0.0823478
Prosper Score and available bankcard credit have a moderate correlation coefficient of 0.31.
##
## Pearson's product-moment correlation
##
## data: loanSummary$ProsperScore and loanSummary$AvailableBankcardCredit
## t = 96.29, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3077796 0.3199109
## sample estimates:
## cor
## 0.3138581
APR appears to be correlated with the Prosper Score, credit score, loan amount and available bank credit, but does not appear to be affected by employment duration.
The Prosper score is correlated with the credit score and available credit but (surprisingly) does not appear to be affected by stated monthly income.
The strongest relationship was, not surprisingly, between the Borrower APR and the Prosper Score. This makes sense, as the Prosper Score is displayed prominently on the website and is thus likely the primary consideration in the risk-weighting decisions of investors/lenders.
Borrowers in higher income ranges tend to have lower APRs and higher credit scores. Additionally, the regression line of APR on credit score is steeper for higher income ranges, meaning that as their credit scores increase, these borrowers benefit from a greater reduction in rates than those with lower incomes would.
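One way to compare the slopes numerically is to fit a separate linear model of APR on credit score within each income range. The sketch below does this on simulated data in which the higher income group is given a steeper slope by construction; the income labels, slopes and noise level are all assumptions, so it illustrates the method rather than the result.

```r
set.seed(42)

# Simulate one income group with a chosen APR-vs-credit-score slope
make_group <- function(label, slope, n = 200) {
  credit <- runif(n, 600, 850)
  data.frame(
    IncomeRange = label,
    CreditScoreRangeUpper = credit,
    BorrowerAPR = 0.50 + slope * (credit - 600) + rnorm(n, sd = 0.02)
  )
}

loans <- rbind(make_group("$25,000-49,999", -0.0004),
               make_group("$100,000+", -0.0008))

# Fit APR ~ credit score separately per income range and keep the slope
slopes <- sapply(split(loans, loans$IncomeRange), function(d) {
  coef(lm(BorrowerAPR ~ CreditScoreRangeUpper, data = d))[["CreditScoreRangeUpper"]]
})
print(slopes)
```

A more negative slope means each additional credit score point is associated with a larger drop in APR for that group.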
Each level of Prosper Score tends to have a similar slope toward lower APRs as credit scores increase. Borrowers with higher Prosper Scores tend to have lower APRs and higher credit scores.
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeUpper and BorrowerAPR
## t = -160.21, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4344422 -0.4249487
## sample estimates:
## cor
## -0.4297073
As with the prior plots, higher income ranges tend to have higher Prosper Scores and lower APRs.
##
## Pearson's product-moment correlation
##
## data: ProsperScore and BorrowerAPR
## t = -260.93, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6709351 -0.6634688
## sample estimates:
## cor
## -0.6672187
The dashed line below represents the overall median APR, and the colored lines represent the median APRs for each Prosper Score.
The median APRs for each Prosper Score tend to follow the overall median APR across income levels, with the exception of unemployed borrowers.
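The medians behind a plot like this can be tabulated with `tapply()`. The sketch uses a small invented data frame, and `IncomeRange` is an assumed column name.

```r
# Toy loans: two income ranges, two Prosper Scores (invented values)
loans <- data.frame(
  IncomeRange = rep(c("$25,000-49,999", "$100,000+"), each = 4),
  ProsperScore = rep(c(3, 3, 8, 8), times = 2),
  BorrowerAPR = c(0.30, 0.32, 0.18, 0.20, 0.26, 0.28, 0.12, 0.14)
)

# Median APR for every income range / Prosper Score combination
by_score <- tapply(loans$BorrowerAPR,
                   list(loans$IncomeRange, loans$ProsperScore), median)

# Overall median APR per income range (the dashed line)
overall <- tapply(loans$BorrowerAPR, loans$IncomeRange, median)

print(by_score)
print(overall)
```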
There is a weak upward trend between stated monthly income and credit score. The relationship is not as strong as I anticipated, with a correlation coefficient of 0.11. The correlation coefficient between the Prosper Score and stated monthly income is an even weaker 0.08.
##
## Pearson's product-moment correlation
##
## data: loanSummary$CreditScoreRangeUpper and loanSummary$StatedMonthlyIncome
## t = 36.54, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1021433 0.1136511
## sample estimates:
## cor
## 0.1079008
##
## Pearson's product-moment correlation
##
## data: loanSummary$StatedMonthlyIncome and loanSummary$ProsperScore
## t = 24.069, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07566126 0.08902693
## sample estimates:
## cor
## 0.0823478
Borrowers with more available bankcard credit tend to have lower APRs and are more likely to be homeowners. The correlation coefficient between available bankcard credit and APR is -0.35.
##
## Pearson's product-moment correlation
##
## data: loanSummary$AvailableBankcardCredit and loanSummary$BorrowerAPR
## t = -121.44, df = 106390, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3541924 -0.3436378
## sample estimates:
## cor
## -0.3489261
While borrowers who are homeowners seem to have higher credit scores in general, home ownership does not appear to have a significant effect on the Prosper Score.
Similar to the APR, borrowers with higher incomes and credit scores tend to have higher Prosper scores.
Borrowers with higher available bank credit tend to have higher income ranges and higher Prosper scores.