Linear Discriminant Analysis (LDA) | Ken Li, FRM

A scoring model is a family of statistical tools developed from qualitative and quantitative empirical data that determines the appropriate parameters and variables for predicting default. Linear discriminant analysis (LDA) is one of the most popular statistical methods used for developing scoring models. An LDA-based model is a reduced-form model due to its dependency on exogenous variable selection, the default composition, and the default definition. A scoring function is a linear function of variables produced by an LDA. The variables are chosen based on their estimated contribution to the likelihood of default and come from an extensive pool of qualitative features and accounting ratios. The contributions (i.e., weights) of each accounting ratio to the overall score are represented by Altman's Z-score. Although there are many discriminant analysis methods, the one referenced in this topic is the ordinary least squares method.

LDA categorizes firms into two groups: the first represents performing (solvent) firms and the second represents defaulting (insolvent) firms. One of the challenges of this categorization is whether it is possible to predict, prior to default, which firms will be solvent and which will be insolvent. A Z-score is assigned to each firm at some point prior to default on the basis of both financial and nonfinancial information. A Z cut-off point is used to separate the two groups, although it is imperfect because solvent and insolvent firms may have similar scores. This may lead to incorrect classifications.

Altman proposed the following LDA model:
[latex]Z = 1.21x_1 +1.40x_2 + 3.30x_3 + 0.6x_4 + 0.999x_5[/latex]

where:

[latex]x_1[/latex] = working capital / total assets

[latex]x_2[/latex] = accrued capital reserves / total assets

[latex]x_3[/latex] = EBIT / total assets

[latex]x_4[/latex] = equity market value / face value of term debt

[latex]x_5[/latex] = sales / total assets

In this model, the higher the Z-score, the more likely it is that a firm will be classified in the group of solvent firms. The Z-score cut-off (also known as the discriminant threshold) was set at Z = 2.675. The model was used not only to plug in current values to determine a Z-score, but also to perform stress tests to show what would happen to each component (and its associated weighting) if a financial factor changed.
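As a rough illustration of both uses, the sketch below computes a Z-score from the five ratios and then re-computes it after stressing one ratio. The coefficients come from the equation above. The firm's ratio values, the dictionary keys, and the size of the shock are hypothetical and chosen only for illustration.

```python
# Weights of the five ratios, copied from the Z-score equation above.
# The dictionary keys are illustrative names, not part of the model.
WEIGHTS = {
    "working_capital_to_assets": 1.21,   # x1
    "reserves_to_assets": 1.40,          # x2
    "ebit_to_assets": 3.30,              # x3
    "equity_mv_to_debt": 0.6,            # x4
    "sales_to_assets": 0.999,            # x5
}


def contributions(ratios):
    """Return each ratio's weighted contribution to the Z-score."""
    return {name: WEIGHTS[name] * ratios[name] for name in WEIGHTS}


def z_score(ratios):
    """Sum the weighted contributions into a single Z-score."""
    return sum(contributions(ratios).values())


# Hypothetical firm (assumed values, not taken from the text).
firm = {
    "working_capital_to_assets": 0.20,
    "reserves_to_assets": 0.30,
    "ebit_to_assets": 0.10,
    "equity_mv_to_debt": 1.50,
    "sales_to_assets": 1.20,
}

base = z_score(firm)

# Stress test: assume EBIT / total assets halves, everything else unchanged.
stressed_firm = dict(firm, ebit_to_assets=0.05)
shocked = z_score(stressed_firm)

print(f"base Z = {base:.4f}, stressed Z = {shocked:.4f}")
```

Because the score is linear, the change in Z equals the coefficient times the change in the ratio (here 3.30 × 0.05 = 0.165), which is what makes this kind of one-factor stress test easy to read.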

Another example of LDA is the RiskCalc model, which was developed by Moody's. It incorporates variables that span several areas, such as financial leverage, growth, liquidity, debt coverage, profitability, size, and assets. The model is tailored to individual countries, with the model for a country like Italy driven by the positive impact on credit quality of factors such as higher profitability, higher liquidity, lower financial leverage, strong activity ratios, high growth, and larger company sizes.

With LDA, one of the main goals is to optimize variable coefficients such that Z-scores minimize the inevitable overlapping zone between solvent and insolvent firms. For two groups of borrowers with similar Z-scores, the overlapping zone is a risk area where firms may end up incorrectly classified. Historical versions of LDA would sometimes consider a gray area, allowing for three Z-score range interpretations to determine who would be granted funding: very safe borrowers, very risky borrowers, and the middle ground of borrowers that merited further investigation. In the current world, LDA incorporates the two additional objectives of measuring default probability and assigning ratings.
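The three-range interpretation can be sketched as a simple function. The text does not give the bounds of the gray area, so `lower` and `upper` are left as parameters, and the values used in the example calls are purely hypothetical.

```python
def classify_with_gray_area(z, lower, upper):
    """Map a Z-score to one of three ranges.

    `lower` and `upper` bound the gray area; both are assumptions
    supplied by the caller, not values given in the text.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if z < lower:
        return "very risky"
    if z > upper:
        return "very safe"
    return "further investigation"


# Hypothetical gray area between 1.5 and 3.0.
for z in (0.9, 2.2, 3.4):
    print(z, classify_with_gray_area(z, lower=1.5, upper=3.0))
```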

The process of fitting empirical data into a statistical model is called calibration. LDA calibration involves quantifying the probability of default by using statistical-based outputs of ratings systems and accounting for differences between the default rates of samples and the overall population. This process implies that more work is still needed, even after the scoring function is estimated and Z-scores are obtained, before the model can be used. In the case of the model being used simply to accept or reject credit applications, calibration simply involves adjusting the Z-score cut-off to account for differences between sample and population default rates. In the case of the model being used to categorize borrowers into different ratings classes (thereby assigning default probabilities to borrowers), calibration will include a cut-off adjustment and a potential rescaling of Z-score default quantifications.

Because of the relative infrequency of actual defaults, a more accurate model can be derived by attempting to create more balanced samples with relatively equal (in size) groups of both performing and defaulting firms. However, the risk of equalizing the sample group sizes is that the model applied to a real population will tend to overpredict defaults. To protect against this risk, the results obtained from the sample must be calibrated. If the model is only used to classify potential borrowers into performing versus defaulting firms, calibration will only involve adjusting the Z cut-off using Bayes' theorem to equate the frequency of defaulting borrowers per the model to the frequency in the actual population.

Prior probabilities represent the probability of default when there is no collected evidence on the borrower. Prior probabilities [latex]q_{insolv}[/latex] and [latex]q_{solv}[/latex] represent the prior probabilities of insolvency and solvency, respectively. One proposed solution is to adjust the cut-off point by the following relation:

[latex]\ln\left(\frac{q_{solv}}{q_{insolv}}\right)[/latex]

If it is the case that the prior probabilities are equal (which would occur in a balanced sample), there is no adjustment needed to the cut-off point (i.e., relation is equal to 0). If the population is unbalanced, an adjustment is made by adding an amount from the relation just shown to the original cut-off quantity.

For example, assume a sample exists where the cut-off point is 1.00. Over the last 20 years, the average default rate is 3.75% (i.e., [latex]q_{insolv}[/latex] = 3.75%). This implies that [latex]q_{solv}[/latex] is equal to 96.25%, and the relation will dictate that we must add [latex]\ln\left(\frac{96.25\%}{3.75\%}\right)[/latex] or 3.25 to the cut-off point (1.00 + 3.25 = 4.25).
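The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, assuming a default prior of 3.75% (so a solvency prior of 96.25%) and the sample cut-off of 1.00 from the example. The function name `prior_adjustment` is illustrative.

```python
import math


def prior_adjustment(q_insolv):
    """Return ln(q_solv / q_insolv), the amount added to the sample cut-off."""
    q_solv = 1.0 - q_insolv
    return math.log(q_solv / q_insolv)


sample_cutoff = 1.00                    # cut-off from the sample
adjustment = prior_adjustment(0.0375)   # population default rate of 3.75%
adjusted_cutoff = sample_cutoff + adjustment

print(f"adjustment = {adjustment:.2f}, adjusted cut-off = {adjusted_cutoff:.2f}")
```

With equal priors (`prior_adjustment(0.5)`), the logarithm is zero and the cut-off is left unchanged, which matches the balanced-sample case described above.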

The risk is the potential misclassification of borrowers, leading to unfavorable decisions: rejecting a borrower that is in fact solvent, or accepting a borrower that ends up defaulting. In the case of the first borrower, the cost of the error is an opportunity cost ([latex]COST_{solv/insolv}[/latex]). In the case of the second borrower, the cost is the loss given default ([latex]COST_{insolv/solv}[/latex]). These costs are not equal, so the correct approach may be to adjust the cut-off point to account for these different costs by adjusting the relation equation as follows:

[latex]\ln\left(\frac{q_{solv} \times COST_{solv/insolv}}{q_{insolv} \times COST_{insolv/solv}}\right)[/latex]

Extending the earlier example, imagine the current assessment of loss given default is 50% and the opportunity cost is 20%. The cut-off score will require an adjustment of: [latex]\ln\left(\frac{96.25\% \times 20\%}{3.75\% \times 50\%}\right) = 2.33[/latex].
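The cost-weighted adjustment can be reproduced in the same way. The sketch below assumes the figures from the example (a 3.75% default prior, a 20% opportunity cost, and a 50% loss given default). The function and parameter names are illustrative.

```python
import math


def cost_adjusted_shift(q_insolv, cost_solv_insolv, cost_insolv_solv):
    """Return ln((q_solv * COST_solv/insolv) / (q_insolv * COST_insolv/solv)).

    cost_solv_insolv: cost of rejecting a solvent borrower (opportunity cost).
    cost_insolv_solv: cost of accepting a defaulting borrower (loss given default).
    """
    q_solv = 1.0 - q_insolv
    return math.log((q_solv * cost_solv_insolv) / (q_insolv * cost_insolv_solv))


shift = cost_adjusted_shift(0.0375, cost_solv_insolv=0.20, cost_insolv_solv=0.50)
print(f"cost-adjusted shift = {shift:.2f}")
```

When the two costs are equal they cancel, and the expression falls back to the prior-only adjustment. Here the loss given default exceeds the opportunity cost, so the shift (2.33) is smaller than the prior-only value of 3.25.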

The cut-off point selection is very sensitive to factors such as overall credit portfolio profile, the market risk environment, market trends, funding costs, past performance/budgets, and customer segment competitive positions.

Note that LDA models typically offer only two decisions: accept or reject. Modern internal rating systems, which are based on the concept of default probability, require more options for decisions.