LO 46.4: Explain how to validate the calibration and the discriminatory power of a | Ken Li, FRM

LO 46.4: Explain how to validate the calibration and the discriminatory power of a rating model.
Validating Calibration
The validation process looks at the variances between the expected PDs and the actual default rates.
The Basel Committee (2005a) suggests the following tests for calibration:
Binomial test.
Chi-square test (or Hosmer-Lemeshow).
Normal test.
Traffic lights approach.
The binomial test looks at a single rating category at a time, while the chi-square test looks at multiple rating categories at a time. The normal test looks at a single rating category for more than one period, based on a normal distribution of the time-averaged default rates. It rests on two key assumptions: (1) the mean default rate has minimal variance over time and (2) default events are independent. The traffic lights approach involves backtesting in a single rating category for multiple periods. Because each of the tests has some shortcomings, the overall conclusion is that no truly strong calibration tests exist at this time.
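As a rough illustration of the first two tests, the sketch below computes a one-sided binomial p-value for a single rating category and a chi-square (Hosmer-Lemeshow style) statistic across several categories. The function names, the input layout, and the choice of a one-sided upper tail are assumptions made for this example, not something the reading prescribes; both calculations also assume independent defaults, which is one of the shortcomings noted above.

```python
from math import exp, lgamma, log


def binomial_tail_pvalue(n_obligors: int, n_defaults: int, forecast_pd: float) -> float:
    """P(D >= n_defaults) when D ~ Binomial(n_obligors, forecast_pd).

    A small value suggests the forecast PD understates the default rate
    observed in this rating category. Requires 0 < forecast_pd < 1.
    """
    log_p, log_q = log(forecast_pd), log(1.0 - forecast_pd)
    total = 0.0
    for k in range(n_defaults, n_obligors + 1):
        # Log of the binomial coefficient via lgamma avoids huge integers.
        log_coeff = lgamma(n_obligors + 1) - lgamma(k + 1) - lgamma(n_obligors - k + 1)
        total += exp(log_coeff + k * log_p + (n_obligors - k) * log_q)
    return min(total, 1.0)


def chi_square_statistic(grades) -> float:
    """Sum over rating categories of (defaults - n*PD)^2 / (n*PD*(1-PD)).

    `grades` is an iterable of (n_obligors, n_defaults, forecast_pd) tuples.
    Larger values indicate a worse fit between forecast PDs and outcomes.
    """
    return sum((d - n * p) ** 2 / (n * p * (1.0 - p)) for n, d, p in grades)


if __name__ == "__main__":
    # Hypothetical portfolio: (obligors, observed defaults, forecast PD) per category.
    portfolio = [(500, 4, 0.005), (800, 20, 0.02), (300, 25, 0.06)]
    for n, d, p in portfolio:
        print(n, d, p, binomial_tail_pvalue(n, d, p))
    print("chi-square statistic:", chi_square_statistic(portfolio))
```

The statistic would then be compared with a chi-square critical value; the appropriate degrees of freedom and significance level are left to the validator's judgment here.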
Validating Discriminatory Power
The validation process is performed ex post using backtesting of defaulting and non-defaulting items. Therefore, a longer forecast period requires that the forecast period begin further away from t = 0 and from the time the data is collected.
Validating discriminatory power can be done using the following four methods as outlined by the Basel Committee (2005a):
Statistical tests (e.g., Fisher's r², Wilks' λ, and Hosmer-Lemeshow).
Migration matrices.
Accuracy indices (e.g., Lorenz's concentration curves and Gini ratios).
Classification tests (e.g., binomial test, Type I and II errors, chi-square test, and normality test).
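To make the accuracy indices more concrete, the following sketch computes a Gini ratio (often called the accuracy ratio) from model scores. It relies on the standard identity Gini = 2 × AUC − 1, where AUC is the probability that a randomly chosen defaulter is scored as riskier than a randomly chosen non-defaulter. The function names and the convention that a higher score means higher risk are assumptions for this example.

```python
def auc(defaulter_scores, non_defaulter_scores) -> float:
    """Share of (defaulter, non-defaulter) pairs ranked correctly.

    Assumes a higher score means higher risk; tied pairs count one half.
    """
    if not defaulter_scores or not non_defaulter_scores:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for d in defaulter_scores:
        for g in non_defaulter_scores:
            if d > g:
                wins += 1.0
            elif d == g:
                wins += 0.5
    return wins / (len(defaulter_scores) * len(non_defaulter_scores))


def gini_ratio(defaulter_scores, non_defaulter_scores) -> float:
    """Gini ratio: 1 for perfect discrimination, 0 for a random model."""
    return 2.0 * auc(defaulter_scores, non_defaulter_scores) - 1.0


if __name__ == "__main__":
    # Hypothetical backtest: scores split by realized outcome.
    print(gini_ratio([0.8, 0.3], [0.5, 0.1]))
```

The pairwise loop is quadratic in sample size, which is fine for a small backtest; a rank-based calculation would be the usual choice for large portfolios.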
The frequency distribution of errors is key to assessing the model's forecasting reliability. With regard to error rates, validation requires an assessment of error tolerance, its calibration, and its financial impact (e.g., a false positive or Type I error increases losses, and a false negative or Type II error increases opportunity costs).
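The trade-off between the two error types can be sketched for a given cutoff score. In this example a Type I error is taken to be a defaulter that the model accepted (a credit loss) and a Type II error a non-defaulter that the model rejected (an opportunity cost). The cutoff rule, the function names, and the two cost parameters are hypothetical and serve only to show how error rates translate into financial impact.

```python
def classification_errors(scores, defaulted, cutoff):
    """Return (type_i_rate, type_ii_rate) for a reject-if-score>=cutoff rule.

    Type I rate: share of actual defaulters scored below the cutoff (accepted).
    Type II rate: share of actual non-defaulters scored at or above it (rejected).
    """
    defaulters = [s for s, d in zip(scores, defaulted) if d]
    survivors = [s for s, d in zip(scores, defaulted) if not d]
    if not defaulters or not survivors:
        raise ValueError("need both defaulters and non-defaulters")
    type_i = sum(s < cutoff for s in defaulters) / len(defaulters)
    type_ii = sum(s >= cutoff for s in survivors) / len(survivors)
    return type_i, type_ii


def error_cost(scores, defaulted, cutoff, loss_per_missed_default, margin_per_rejected_good):
    """Total cost of both error types under hypothetical per-item costs."""
    missed = sum(1 for s, d in zip(scores, defaulted) if d and s < cutoff)
    rejected = sum(1 for s, d in zip(scores, defaulted) if not d and s >= cutoff)
    return missed * loss_per_missed_default + rejected * margin_per_rejected_good


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.6, 0.1]
    defaulted = [True, True, False, False]
    print(classification_errors(scores, defaulted, cutoff=0.5))
    print(error_cost(scores, defaulted, 0.5, 100.0, 5.0))
```

Moving the cutoff lowers one error rate at the expense of the other, which is why the error tolerance has to be set with the relative costs in mind.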
Basel Committee on Banking Supervision (2005a), Studies on the Validation of Internal Rating Systems, Working Paper No. 14, Basel, Switzerland.
2018 Kaplan, Inc.
Topic 46 Cross Reference to GARP Assigned Reading – De Laurentis, Maino, and Molteni, Chapter 5
Key Concepts
LO 46.1 To validate a rating model, a financial institution must confirm the reliability of the results produced by the model and that the model still meets the financial institution's operating needs and any regulatory requirements. The tools and approaches to validation are regularly reassessed and revised to stay current with the changing market and operating environment.
Best practices for the roles of internal organizational units in the validation process include active involvement of senior management and the internal audit group. In general, all staff involved in the validation process must have sufficient training to perform their duties properly.
With regard to independence, the validation group must be independent from the groups that are developing and maintaining validation models and the group(s) dealing with credit risk. The validation group should also be independent of the lending group and the rating assignment group. Ultimately, the validation group should not report to any of those groups. Given that validation is mainly done using documentation received from groups dealing with model development and implementation, the quality of the documentation is important. Controls must be in place to ensure that there is sufficient breadth, transparency, and depth in the documentation provided.
LO 46.2 There are five key areas regarding rating systems that are analyzed during the qualitative validation process: (1) obtaining probabilities of default, (2) completeness, (3) objectivity, (4) acceptance, and (5) consistency.
Quantitative validation comprises the following areas: (1) sample representativeness, (2) discriminatory power, (3) dynamic properties, and (4) calibration.
LO 46.3 Defaults are the key constraint in terms of creating sufficiently large data sets for model development, rating quantification, and validation purposes.
With regard to sample size and sample homogeneity, it is difficult to create samples from a population over a long period using the same lending technology. Lending technology is most likely to change. Unfortunately, the changes result in less consistency between the data used to create the rating model and the population to which the model is applied.
The time horizon of the data may be problematic because the data should take into account a full credit cycle. If it is less than a full cycle, the estimates will be biased by the favorable or unfavorable stages during the selected period within the cycle.
Validating data quality focuses on the stability of the lending technology and the degree of calibration required to infer sample results to the population.
LO 46.4 Validating calibration looks at the variances between the expected probabilities of default and the actual default rates. Tests of calibration include (1) binomial test, (2) chi-square test (or Hosmer-Lemeshow), (3) normal test, and (4) traffic lights approach.
Validating discriminatory power involves backtesting of defaulting and non-defaulting items. Tests of discriminatory power include (1) statistical tests, (2) migration matrices, (3) accuracy indices, and (4) classification tests.
Concept Checkers
1. Which of the following statements regarding the model validation process is most accurate?
A. The validation process places equal importance on quantitative and qualitative validation.
B. The validation group could be involved with the rating system design and development process.
C. The quantitative validation process involves an analysis of structure and model methodology.
D. The breadth and depth of validation should be commensurate primarily with the dollar value of the loans outstanding.
2. Which of the following areas of quantitative validation would focus on rating system stability?
A. Calibration.
B. Discriminatory power.
C. Dynamic properties.
D. Sample representativeness.
3. The increasing use of heuristic rating models versus statistical rating models would most likely be covered under which area of qualitative validation?
A. Acceptance.
B. Completeness.
C. Consistency.
D. Objectivity.
4. Which of the following statements regarding the validation of data quality is correct?
A. Data should be created from a full credit cycle.
B. Validating central tendency in the long term is done through normality testing.
C. In practice, it is necessary to create samples from a population over a five-year period using the same lending technology.
D. To make inferences about the population from the samples used in a model, it is necessary to calibrate appropriately and do in-sample testing.
5. Which of the following methods would most likely be used to validate both the calibration and the discriminatory power of a rating model?
A. Accuracy indices.
B. Classification tests.
C. Migration matrices.
D. Traffic lights approach.
Concept Checker Answers
1. B The validation group could be involved with the rating system design and development process as long as sufficient controls are in place to ensure independence. For example, the internal audit group could confirm that the validation group is acting independently.
There is more emphasis on qualitative validation than on quantitative validation. Structure and model methodology are dealt with under qualitative validation, not quantitative. The breadth and depth of validation are not primarily focused on the dollar value of the loans outstanding; they take a broader approach by considering the type of credit portfolios analyzed, the complexity of the financial institution, and the level of market volatility.
2. C Dynamic properties include rating system stability and attributes of migration matrices. Calibration looks at the relative ability to estimate probability of default (PD). Discriminatory power is the relative ability of a rating model to accurately differentiate between defaulting and non-defaulting entities for a given forecast period. Sample representativeness is demonstrated when a sample from a population is taken and its characteristics match those of the total population.
3. A Heuristic models are more easily accepted since they mirror past experience and the credit assessments tend to be consistent with cultural norms. In contrast, statistical models are less easily accepted given the high technical knowledge demands to understand them and the high complexity that creates challenges when interpreting the output.
Completeness refers to the sufficiency in number of factors used for credit granting purposes since many default-based models use very few borrower characteristics. In contrast, statistical-based models allow for many borrower characteristics to be used. Consistency refers to models making sense and being appropriate for their intended use. For example, statistical models may produce relationships between variables that are nonsensical, so the process of eliminating such variables increases consistency. Objectivity is achieved when the rating system can clearly define creditworthiness factors with the least amount of interpretation required, choosing between judgment-based versus statistical-based models.
4. A If data is created from less than a full credit cycle, the estimates will be biased by the favorable or unfavorable stages during the selected period within the cycle.
Validating central tendency in the long term is done through backtesting and stress testing. In practice, it is almost impossible to have credit rules and regulations remain stable for even five years of a credit cycle. To make inferences about the population, it is necessary to use out-of-sample testing whereby the observations are created from the same lending technology but were not included in the development sample.
5. B Classification tests include the binomial test, chi-square test, and normality test. Those tests are used to analyze discriminatory power and calibration.
Accuracy indices and migration matrices are used only for discriminatory power. The traffic lights approach is used only for calibration.
The following is a review of the Operational and Integrated Risk Management principles designed to address the learning objectives set forth by GARP. This topic is also covered in:
Model Risk
Topic 47
Exam Focus
Models are indispensable in modern finance for quantifying and managing asset-liability risk, credit risk, market risk, and many other risks. Models rely on a range of data inputs based on a combination of historical data and risk assumptions, and are critical in managing risk exposures and financial positions. However, models rely on the accuracy of inputs, and errors give rise to model risk. Model risk can range from errors in inputs and assumptions to errors in implementing or incorrectly interpreting a model, and can result in significant losses to market participants. For the exam, be able to identify and explain common model errors, model implementation and valuation issues, and model error mitigation techniques. Also, be familiar with the two case studies discussed related to model risk: Long-Term Capital Management and the London Whale incident.
Sources of Model Risk