Couret.Venter
From BattleActs Wiki
'''Reading''': Couret, J. and Venter, G., "Using Multi-Dimensional Credibility to Estimate Class Frequency Vectors in Workers Compensation"


'''Synopsis''': It can be difficult to obtain credible data for low-frequency events. In this article Couret &amp; Venter look at Workers' Compensation and how the different injury severity types can be used to improve the predictive power of the data, since there is often only a small (possibly random) difference between an accident being, say, fatal or resulting in a permanent injury.
 
The article comes in three parts. First, we introduce the equations which define the multi-dimensional credibility setup. Then, we discuss how to estimate the required credibilities in order to produce estimates of the Workers' Compensation class means by injury type. Lastly, we look at testing the effectiveness of the multi-dimensional credibility technique.


==Study Tips==
<!--This article is best read after ''[[Robertson.HazardGroups]]'' because it makes use of many of the Workers' Compensation terms introduced there.
-->
This is a challenging article as it is densely written, so we recommend you read the wiki before turning to the source material. If you get stuck on part of the article, skip ahead for a bit and circle back later. It may take two or three readings for the concepts and algebra to come together.


The material hasn't come up very often lately and there is considerable variation in how the CAS has previously tested it. Make sure you can do all of the prior exam questions and know the concepts well. However, if your exam has a question on this material, you may want to leave it towards the end of the exam to focus on easier points first.


'''Estimated study time''': 16 hours ''(not including subsequent review time)''


==BattleTable==
Based on past exams, the '''main things''' you need to know ''(in rough order of importance)'' are:


* Be able to apply the <span style="color:green;">'''Sum of Squared Errors Test'''</span> to a given set of results.
* Be able to apply the <span style="color:blue;">'''Multi-dimensional credibility equations'''</span> to the data.
* Be able to apply the '''Quintiles Test''' in the context of this paper.
* Be able to briefly describe Couret & Venter's results.
 
{|class="wikitable"
|-
| Questions from the Fall 2019 exam are held out for practice purposes. (They are included in the CAS practice exam.)
|}


: {| class='wikitable' style='width: 1000px;'


|-
! style='width: 175px;' | reference !! style='width: 200px;' | part (a) !! style='width: 200px;' | part (b) !! style='width: 200px;' | part (c) !! part (d)
 
|- style="border-bottom: 2px solid;"
|| [https://www.battleacts8.ca/8/pdf/Exam_(2015_2-Fall)/(2015_2-Fall)_(05).pdf <span style='font-size: 12px; background-color: yellow; border: solid; border-width: 1px; border-radius: 5px; padding: 2px 5px 2px 5px; margin: 5px;'>E</span>] <span style='color: red;'>'''(2015.Fall #5)'''</span>
|| <span style="color:blue;">'''Multi-dimensional Credibility'''</span> <br> - calculate ratio
|| '''Quintile Test''' <br> - describe process
|| <span style="color:green;">'''Sum of Squared Errors Test'''</span> <br> - shortcomings
| style="background-color: lightgrey;" |


|-
|| [https://www.battleacts8.ca/8/pdf/Exam_(2014_2-Fall)/(2014_2-Fall)_(01).pdf <span style='font-size: 12px; background-color: yellow; border: solid; border-width: 1px; border-radius: 5px; padding: 2px 5px 2px 5px; margin: 5px;'>E</span>] <span style='color: red;'>'''(2014.Fall #1)'''</span>
|| '''Quintile Test''' <br> - evaluate
|| <span style="color:green;">'''Sum of Squared Errors Test'''</span> <br> - apply &amp; recommend
| style="background-color: lightgrey;" |
| style="background-color: lightgrey;" |


|- style="border-bottom: 2px solid;"
|| [https://www.battleacts8.ca/8/pdf/Exam_(2014_2-Fall)/(2014_2-Fall)_(04).pdf <span style='font-size: 12px; background-color: yellow; border: solid; border-width: 1px; border-radius: 5px; padding: 2px 5px 2px 5px; margin: 5px;'>E</span>] <span style='color: red;'>'''(2014.Fall #4)'''</span>
|| '''Statistical Considerations''' <br> - apply in context
| style="background-color: lightgrey;" |
| style="background-color: lightgrey;" |
| style="background-color: lightgrey;" |


|- style="border-bottom: 2px solid;"
|| [https://www.battleacts8.ca/8/pdf/Exam_(2013_2-Fall)/(2013_2-Fall)_(03).pdf <span style='font-size: 12px; background-color: yellow; border: solid; border-width: 1px; border-radius: 5px; padding: 2px 5px 2px 5px; margin: 5px;'>E</span>] <span style='color: red;'>'''(2013.Fall #3)'''</span>
|| '''Holdout Sample''' <br> - describe purpose
|| '''Holdout Sample''' <br> - recommend
|| <span style="color:green;">'''Sum of Squared Errors Test'''</span> <br> - evaluate
| style="background-color: lightgrey;" | ''Trends'' <br> - ''[[Mahler.Credibility]]''


|- style="border-bottom: 2px solid;"
|| [https://www.battleacts8.ca/8/pdf/Exam_(2012_2-Fall)/(2012_2-Fall)_(05).pdf <span style='font-size: 12px; background-color: yellow; border: solid; border-width: 1px; border-radius: 5px; padding: 2px 5px 2px 5px; margin: 5px;'>E</span>] <span style='color: red;'>'''(2012.Fall #5)'''</span>
|| <span style="color:blue;">'''Multi-dimensional Credibility'''</span> <br> - determine appropriateness
|| '''Expected Loss''' <br> - calculate
| style="background-color: lightgrey;" |
| style="background-color: lightgrey;" |


|}

<!-- ******** BattleBar Code ******** -->
{|style="border: solid; color:lightgrey; border-radius:10px; border-width:2px; align:center;"
|-
<!-- ******** Full BattleQuiz ******** -->
|style="padding:2px"|[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=all<span style="font-size: 20px; background-color: lightgreen; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 10px;">'''Full BattleQuiz]'''</span>
<!--                                    -->
<!-- ******** Excel BattleQuiz ******** -->
<!--
|style="padding:2px"|[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=all<span style="color: red; font-size: 20px; background-color: lightgreen; border: solid; border-width: 1px; border-radius: 10px; border-color: darkblue; padding: 2px 10px 2px 10px; margin: 0px;">'''''Excel BattleQuiz]'''''</span>
-->
<!--                                    -->
<!-- ******** Excel PowerPack Files ******** -->
|style="padding:2px"|[[BattleActs_PowerPack#ppCouret| <span style="color: white; font-size: 12px; background-color: indigo; border: solid; border-width: 2px; border-radius: 10px; border-color: indigo; padding: 1px 3px 1px 3px; margin: 0px;">'''''Excel Files '''''</span>]]
<!--                                    -->
<!-- ******** Forum ******** -->
|style="padding:2px"|[https://battleacts8.ca/8/forum/index.php?p=/categories/couret-venter<span style="font-size: 12px; background-color: lightgrey; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 0px;">'''Forum'''</span>]
<!--                                    -->
<!-- ******** Formula Sheet ******** -->
<!--
|style="padding:2px"|[https://battleacts8.ca/8/forum/categories/couret-venter<span style="font-size: 12px; color: darkblue; background-color: lightblue; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 0px;">'''Formula Sheet'''</span>]
-->
|}
<span style="color: red;">'''You must be <u>logged in</u> or this will not work.'''</span>


==In Plain English!==
Claim counts for workers compensation classes are unreliable for serious injuries because of the low frequencies involved. However, serious injury types are correlated with other injuries as the situations which cause fatal (F), permanent total (PT), and major permanent partial (Major) injuries are usually similar. A small change in the situation may result in a significantly different outcome. So a class with a lot of major injuries probably has a higher than average likelihood for permanent total and fatal injuries.
 
Couret &amp; Venter derive a multivariate correlated credibility by estimating the population mean for each injury type by class using a linear function of the sample means for all of the injury types in the class. The coefficients of the linear function are estimated by minimizing the expected squared error.
 
They apply this method to ratios of claim counts by injury type to temporary total impairment (TT) claim counts. That is, they treat a temporary total injury as an exposure which could have produced a higher severity claim (F, PT, Major, or Minor). Let ''V'', ''W'', ''X'', and ''Y'' be the <u>observed ratios</u> for injury types F, PT, Major, and Minor. The paper assumes the distribution of claim counts by injury type is parametrizable for each class but the parameters are unknown. Let ''v<sub>i</sub>'', ''w<sub>i</sub>'', ''x<sub>i</sub>'', and ''y<sub>i</sub>'' be the population (hypothetical) mean ratios. Rather than writing out all of the highly similar equations for each of the four serious injury types we focus on the permanent total injuries (PT) which uses the variable ''W''. You should be able to translate the equations from permanent total injuries to any of the other serious injury types.
 
The observed sample claim count ratio of permanent total (PT) to temporary total (TT) for class ''i'' at time ''t'' is given by <math>m_{i,t}\cdot W_{i,t}=\displaystyle\sum_{j=1}^{m_{i,t}}\left(w_i+\epsilon_{j,t}\right)</math>. Here, there are <math>m_{i,t}</math> TT claims, and the <math>\epsilon_{j,t}</math> are independent perturbations with mean zero and standard deviation <math>\sigma_{W_i}</math> which vary by class but not time. Hence each TT claim is considered an exposure which may or may not produce a PT claim.
 
Rearranging the equation gives <math>W_{i,t}=w_i+\displaystyle\frac{1}{m_{i,t}}\cdot\sum_{j=1}^{m_{i,t}}\epsilon_{j,t}</math> and <math>Var(W_{i,t}\;|\; w_i)=\displaystyle\frac{\sigma^2_{W_i}}{m_{i,t}}</math>.  


Hence, the more TT claims, the smaller the random fluctuations of the annual observed class ratio <math>W_{i,t}</math> from its population mean since <math>\frac{1}{m_{i,t}}</math> goes to 0 as <math>m_{i,t}</math> increases.
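This shrinking-variance effect is easy to verify numerically. Below is a minimal simulation sketch (the function name and all numbers are ours, not the paper's): each of the ''m'' TT claims contributes the class mean plus an independent mean-zero perturbation, and the variance of the observed ratio shrinks roughly in proportion to 1/''m''.

```python
import random

def simulate_observed_ratio(w_i, sigma_w, m, trials=5000, seed=0):
    """Simulate W_{i,t}: each of m TT claims contributes w_i plus an
    independent mean-zero perturbation with standard deviation sigma_w."""
    rng = random.Random(seed)
    observed = []
    for _ in range(trials):
        total = sum(w_i + rng.gauss(0.0, sigma_w) for _ in range(m))
        observed.append(total / m)
    mean = sum(observed) / trials
    var = sum((x - mean) ** 2 for x in observed) / (trials - 1)
    return mean, var

# Quadrupling the TT claim count cuts the variance of the observed
# ratio to roughly a quarter: Var ~ sigma_w^2 / m.
```

Running this with ''m'' = 25 versus ''m'' = 100 reproduces <math>Var(W_{i,t}\;|\;w_i)=\sigma^2_{W_i}/m_{i,t}</math> to within simulation noise.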


{|class="wikitable"
|-
|<span style="color:#236AB9;">'''Key Assumption'''</span>
The variance of the observed F, PT, Major, and Minor claim ratios decreases as the number of TT claims increases. That is, the more TT claims (exposures) a class has, the more stable its observed ratios of the other claim types should be.
|}


Let ''W<sub>i</sub>'' be the sample class mean ratio <u>over all time</u>. Assume there are ''N'' independent time periods. Then <math>W_i=\displaystyle\frac{\sum_{t=1}^N m_{i,t}W_{i,t}}{\sum_{t=1}^N m_{i,t}}</math>.


Let ''m<sub>i</sub>'' be the sum of <math>m_{i,t}</math> over all time, and ''m'' be the sum over all classes ''i'' of ''m<sub>i</sub>''. Then <math>Var(W_i \;|\; w_i)=\displaystyle\frac{\sigma^2_{W_i}}{m_i}</math>.
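As a quick sketch of this pooling (the helper name and numbers are ours):

```python
def pooled_class_ratio(m_by_year, W_by_year):
    """W_i = sum_t(m_{i,t} * W_{i,t}) / sum_t(m_{i,t}): the claim-count-
    weighted average of the annual observed ratios for one class."""
    weighted_sum = sum(m * W for m, W in zip(m_by_year, W_by_year))
    return weighted_sum / sum(m_by_year)
```

For example, two years with 10 and 30 TT claims and observed ratios 0.10 and 0.20 pool to (10 &times; 0.10 + 30 &times; 0.20) / 40 = 0.175.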


The same calculations are also performed for each of the 7 hazard groups. These will become the complements of credibility. The hazard groups use a subscript ''h'', so <math>V_h</math> is the observed mean fatal claim ratio for hazard group ''h''.


By forming a linear combination of the sample means for the injury types within class ''i'', we account for correlations between injury types. For instance, focusing on fatal claims, ''V'', we are estimating the population mean for fatal claims in class ''i'', <math>v_i</math>, by the following equation: <math>\hat{v_i}=V_h + b_{v,i}(V_i-V_h)+c_{v,i}(W_i-W_h)+d_{v,i}(X_i-X_h)+e_{v,i}(Y_i-Y_h)</math>.


Remember, <math>V_h</math> is the observed mean fatal claim ratio for the hazard group ''h'' which contains class ''i''.


Similar equations are formed for the class estimates of ''W'', ''X'', and ''Y''.
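Once the credibilities are known, applying the estimator is mechanical. A sketch (all coefficient and ratio values below are invented for illustration):

```python
def md_credibility_estimate(hg_mean, class_means, hg_means, coefs):
    """v_hat = V_h + b(V_i - V_h) + c(W_i - W_h) + d(X_i - X_h) + e(Y_i - Y_h)."""
    return hg_mean + sum(k * (cls - hg)
                         for k, cls, hg in zip(coefs, class_means, hg_means))

# Hypothetical inputs: class ratios (V_i, W_i, X_i, Y_i), hazard group
# ratios (V_h, W_h, X_h, Y_h), and fitted coefficients (b, c, d, e).
v_hat = md_credibility_estimate(
    hg_mean=0.010,
    class_means=(0.012, 0.030, 0.200, 0.500),
    hg_means=(0.010, 0.025, 0.180, 0.450),
    coefs=(0.10, 0.05, 0.02, 0.01),
)
```

Each injury type's class estimate is pulled toward the hazard group mean, with small nudges from how the ''other'' injury types deviate from their hazard group means.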


The coefficients <math>b_{v,i}, c_{v,i}, d_{v,i}, e_{v,i}, \ldots, b_{y,i}, c_{y,i}, d_{y,i}, e_{y,i}</math> are the multi-dimensional credibilities we need to estimate. Observe that if there were no correlation between the injury types, the cross-type credibilities would be zero and the above equation would reduce to credibility weighting the observed class mean for each injury type against the observed mean for the hazard group containing the class.


''Alice: "So far everything has been quite abstract. Let's remedy that by looking at part (a) of the following example."''
: <span class="newwin">[https://www.battleacts8.ca/8/pdf/Q5_2015.pdf <span style="color: white; font-size: 12px; background-color: green; border: solid; border-width: 2px; border-radius: 10px; border-color: green; padding: 1px 3px 1px 3px; margin: 0px;">'''''Set up and Solve the Multi-dimensional Credibility Equation'''''</span>]</span>


===Credibility Considerations===
Couret &amp; Venter use multivariate B&uuml;hlmann-Straub credibility, which minimizes squared error. We want to minimize the expected squared error over all classes between the linear combination and the hypothetical mean. That is, in the case of permanent total injuries (variable ''w''), minimize <math>E\left[\left(a+bV_i+cW_i+dX_i+eY_i-w_i\right)^2\right]</math>.


The coefficients <math>a, b, c, d, e</math> are determined as follows:


Repeating for ''c'', ''d'', and ''e'' yields three similar equations. Taken together, they form the following matrix equation:
<math>\left(\begin{array}{c}Cov(V_i,w_i)\\Cov(W_i,w_i)\\Cov(X_i,w_i)\\Cov(Y_i,w_i)\end{array}\right)=C\cdot\left(\begin{array}{c}b_{w,i}\\c_{w,i}\\d_{w,i}\\e_{w,i}\end{array}\right)</math>, where ''C'' is the covariance matrix of the class by injury-type sample means.
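In practice this is one small linear solve per injury type. A NumPy sketch with made-up covariance values:

```python
import numpy as np

# Hypothetical values: C is the 4x4 covariance matrix of the class sample
# means (V_i, W_i, X_i, Y_i); rhs holds Cov(V_i, w_i), ..., Cov(Y_i, w_i).
C = np.array([
    [4.0, 1.0, 1.0, 0.5],
    [1.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 6.0, 1.5],
    [0.5, 1.0, 1.5, 4.0],
]) * 1e-4
rhs = np.array([1.0, 2.0, 1.5, 0.8]) * 1e-4

b, c, d, e = np.linalg.solve(C, rhs)  # credibilities for the w-equation
```

Repeating with the right-hand side built from <math>v_i</math>, <math>x_i</math>, or <math>y_i</math> gives the credibilities for the other injury types.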


The difficulty now is in estimating the covariances. Couret &amp; Venter note <math>Var(V_i\;|\;v_i)=\displaystyle\frac{\sigma^2_{V_i}}{m_i}</math>, where <math>\sigma^2_{V_i}</math> is the process variance for ''V<sub>i</sub>''. The unconditional variance of ''V<sub>i</sub>'' is <math>\frac{EPV_V}{m_i}+VHM_V</math>. The latter term is the ''variance of hypothetical means'' for ''V'' which is the variance of the means ''v<sub>i</sub>'' for the unobserved classes of ''V''. The first term is the ''expected process variance'' which is <math>E[\sigma^2_{V_i}]</math>. Note this is independent of ''i'' because the expectation is over all classes.


{|class="wikitable"
|-
|<span style="color:#236AB9;">'''Key Assumption'''</span>
The observed injury ratios for any year for each type of injury are the class injury ratio plus a random, independent perturbation.
|}


Couret &amp; Venter conclude it is sufficient to estimate the off-diagonal elements by the sample covariances.


The first entry on the leading diagonal of the matrix is <math>Var(V_i)=\frac{EPV_V}{m_i}+VHM_V</math>. Subsequent leading diagonal entries follow by replacing ''V'' with ''W'', ''X'' or ''Y''.

Consequently, formulas are required to estimate ''EPV'' and ''VHM''. The paper uses formulas due to Dean (2005). In the following, recall a hat denotes an estimate.

====Formulas for ''EPV'' and ''VHM''====
<math>\hat{EPV_V}=\frac{\sum_{i=1}^R\sum_{t=1}^N m_{it}\left(V_{it}-V_i\right)^2}{R(N-1)}</math> and <math>\hat{VHM_V}=\frac{\sum_{i=1}^Rm_i\left(V_i-V\right)^2-(R-1)\cdot\hat{EPV_V}}{m-\frac{1}{m}\cdot\sum_{i=1}^Rm_i^2}</math>


Lastly, <math>Cov(V_i,W_i)=\sum_{i=1}^R\frac{\left(V_i-V_h\right)\cdot\left(W_i-W_h\right)\cdot m_i}{m_h}</math>.


Here, ''R'' is the number of classes in the hazard group that contains class ''i''. ''N'' is the number of years used in the data set.


You should make sure you have these formulas memorized and know how to apply them.


Note that ''V'' is a weighted average of the ''V<sub>i</sub>'' 's with weights ''m<sub>i</sub>''. Also, the variance of hypothetical means (VHM) is estimated using the sample variance from each class. This may be negative. If that happens, then set it equal to 0. When it is set equal to 0, the expected process variance accounts for all of the observed variation. That is, there are no individual risk differences.
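A direct transcription of the estimators above (a sketch; the function and variable names are ours). Here `m[i][t]` holds the TT claim counts and `V[i][t]` the observed ratios for class `i` in year `t`; note the floor at zero for a negative VHM estimate:

```python
def estimate_epv_vhm(m, V):
    """Estimate EPV and VHM for one injury type across R classes and N years."""
    R, N = len(m), len(m[0])
    m_i = [sum(row) for row in m]
    m_tot = sum(m_i)
    # Class means V_i and the overall claim-count-weighted mean V.
    V_i = [sum(mt * vt for mt, vt in zip(mr, vr)) / sum(mr)
           for mr, vr in zip(m, V)]
    V_bar = sum(mi * vi for mi, vi in zip(m_i, V_i)) / m_tot
    epv = sum(m[i][t] * (V[i][t] - V_i[i]) ** 2
              for i in range(R) for t in range(N)) / (R * (N - 1))
    vhm = (sum(mi * (vi - V_bar) ** 2 for mi, vi in zip(m_i, V_i))
           - (R - 1) * epv) / (m_tot - sum(mi ** 2 for mi in m_i) / m_tot)
    return epv, max(vhm, 0.0)  # a negative VHM estimate is floored at zero
```

With identical class means and all variation within years, the VHM estimate comes out negative and is floored at zero, matching the remark about no individual risk differences.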
 
[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=1<span style="font-size: 20px; background-color: aqua; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 10px;">'''mini BattleQuiz 1]'''</span> <span style="color: red;">'''You must be <u>logged in</u> or this will not work.'''</span>


===Performance Testing===
This section of the paper is best skim-read after reading the key points below.


Couret &amp; Venter had 7 policy years of untrended and undeveloped workers compensation data available. They discarded the most recent year as they believed it to be too immature. The data is examined in two ways: by hazard group/class and by injury type. They use the sum of squared errors to measure the performance of the predictions. The predictions are made three ways: by using the hazard group mean, using the "raw" class mean, and using the multi-dimensional credibility process.

To facilitate testing, a holdout sample is created from the data for the odd report years. The hazard group means, class means, and multi-dimensional credibility estimates are all calculated on the even report years.


Couret &amp; Venter note that incident ratios are impacted by unknown effects due to changes in the portfolio of individual insurance policies over time. They also note there is considerable volatility in the class ratios at the state level which means improving the estimate of the mean may only produce a small reduction in the sum of squared errors.
The multi-dimensional credibility procedure is designed to minimize the expected deviation between the true class mean and its sample estimator over the same period. By using the even years to predict the odd years (a form of holdout sample), there is a disconnect between the minimized expectation and the sample statistic.


Three testing approaches taken by Couret &amp; Venter (shown for fatal injuries):
# Hazard group method: <math>SSE=\sum_{\mbox{all classes}}(V_h-V_{i,\mbox{holdout}})^2</math>
# Raw class data method: <math>SSE=\sum_{\mbox{all classes}}(V_i-V_{i,\mbox{holdout}})^2</math>
The lowest sum of squared errors (SSE) is best.


''Alice: "This sounds complicated but don't worry - it's relatively easy to apply. Let's walk through an example now."''
: <span class="newwin">[https://www.battleacts8.ca/8/pdf/Q2_2011.pdf <span style="color: white; font-size: 12px; background-color: green; border: solid; border-width: 2px; border-radius: 10px; border-color: green; padding: 1px 3px 1px 3px; margin: 0px;">'''''Calculate the Sums of Squared Errors'''''</span>]</span>
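The SSE comparison itself can be sketched in a few lines (all class ratios below are made up for illustration):

```python
def sse(predictions, holdout):
    """Sum of squared errors between predicted and holdout-year class ratios."""
    return sum((p - a) ** 2 for p, a in zip(predictions, holdout))

# Hypothetical ratios for three classes in one hazard group:
holdout_ratios = [0.011, 0.008, 0.015]
hazard_group_pred = [0.010, 0.010, 0.010]   # method 1: hazard group mean everywhere
raw_class_pred = [0.013, 0.005, 0.020]      # method 2: training-year class ratios

sse_hg = sse(hazard_group_pred, holdout_ratios)
sse_raw = sse(raw_class_pred, holdout_ratios)
```

Whichever set of predictions yields the lowest SSE against the holdout years is preferred.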


Their credibility method performs only slightly better than the hazard group method. Possible reasons include class data being volatile across years and the estimators being fit to the training dataset rather than the test dataset. The latter point is really saying there could be material differences between even and odd years in the averages. A better approach is to normalize the data by hazard group by year to eliminate differences between hazard groups in the train vs test datasets.
# For each of the three approaches taken by Couret &amp; Venter, analyze the sum of squared errors.


====Why use a quintiles test?====


We could compute the sum of squared errors for each of the methods at the individual class level. However, even after grouping classes into hazard groups, there is a large amount of variation between classes. This noise makes it hard to determine if the credibility method is performing better. By grouping into quintiles and calculating sums of squared errors based on the quintile statistics, we reduce this variation so the data is more credible and we can make a better comparison of the methods.


One way the CAS can test this material is to have you interpret the results of a quintiles test to determine whether or not you should use the multi-dimensional credibility technique or some other method. Take 5 minutes to attempt the following past exam question now.
 
: <span class="newwin">[https://www.battleacts8.ca/8/pdf/Q5_2012.pdf <span style="color: white; font-size: 12px; background-color: green; border: solid; border-width: 2px; border-radius: 10px; border-color: green; padding: 1px 3px 1px 3px; margin: 0px;">'''''Evaluate the Multi-dimensional Credibility Technique'''''</span>]</span>


Another way this can be tested is in parts b and c of 2015 Q5. Give them a try now!
: <span class="newwin">[https://www.battleacts8.ca/8/pdf/Q5_2015.pdf <span style="color: white; font-size: 12px; background-color: green; border: solid; border-width: 2px; border-radius: 10px; border-color: green; padding: 1px 3px 1px 3px; margin: 0px;">'''''Describe the Quintiles Test in the Context of Couret &amp; Venter'''''</span>]</span>


===Couret &amp; Venter's Results===
Using the quintiles test, the multi-dimensional credibility procedure produced a much lower sum of squared errors for all hazard groups/injury types except for hazard group A. Couret &amp; Venter suggest this is because hazard group A is highly homogeneous so different injury types aren't prevalent or are not very predictive of other injury types.


==Pop Quiz Answers==
[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=2<span style="font-size: 20px; background-color: aqua; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 10px;">'''mini BattleQuiz 2]'''</span> <span style="color: red;">'''You must be <u>logged in</u> or this will not work.'''</span>
 
<!-- ******** BattleBar Code ******** -->
{|style="border: solid; color:lightgrey; border-radius:10px; border-width:2px; align:center;"
|-
<!-- ******** Full BattleQuiz ******** -->
|style="padding:2px"|[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=all<span style="font-size: 20px; background-color: lightgreen; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 10px;">'''Full BattleQuiz]'''</span>
<!--                                    -->
<!-- ******** Excel BattleQuiz ******** -->
<!--
|style="padding:2px"|[https://battleacts8.ca/8/FC.php?selectString=**&filter=both&sortOrder=natural&colorFlag=allFlag&colorStatus=allStatus&priority=importance-high&subsetFlag=miniQuiz&prefix=Couret&suffix=Venter&section=all&subSection=all&examRep=all&examYear=all&examTerm=all&quizNum=all<span style="color: red; font-size: 20px; background-color: lightgreen; border: solid; border-width: 1px; border-radius: 10px; border-color: darkblue; padding: 2px 10px 2px 10px; margin: 0px;">'''''Excel BattleQuiz]'''''</span>
-->
<!--                                    -->
<!-- ******** Excel PowerPack Files ******** -->
|style="padding:2px"|[[BattleActs_PowerPack#ppCouret| <span style="color: white; font-size: 12px; background-color: indigo; border: solid; border-width: 2px; border-radius: 10px; border-color: indigo; padding: 1px 3px 1px 3px; margin: 0px;">'''''Excel Files '''''</span>]]
<!--                                    -->
<!-- ******** Forum ******** -->
|style="padding:2px"|[https://battleacts8.ca/8/forum/index.php?p=/categories/couret-venter<span style="font-size: 12px; background-color: lightgrey; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 0px;">'''Forum'''</span>]
<!--                                    -->
<!-- ******** Formula Sheet ******** -->
<!--
|style="padding:2px"|[https://battleacts8.ca/8/forum/categories/couret-venter<span style="font-size: 12px; color: darkblue; background-color: lightblue; border: solid; border-width: 1px; border-radius: 10px; padding: 2px 10px 2px 10px; margin: 0px;">'''Formula Sheet'''</span>]
-->
|}
<span style="color: red;">'''You must be <u>logged in</u> or this will not work.'''</span>

Latest revision as of 11:32, 17 June 2024


Estimated study time: 16 Hours (not including subsequent review time)

BattleTable

Based on past exams, the main things you need to know (in rough order of importance) are:

  • Be able to apply the Sum of Squared Errors Test to a given set of results.
  • Be able to apply the Multi-dimensional credibility equations to the data.
  • Be able to apply the Quintiles Test in the context of this paper.
  • Be able to briefly describe Couret & Venter's results.
Questions from the Fall 2019 exam are held out for practice purposes. (They are included in the CAS practice exam.)
| reference | part (a) | part (b) | part (c) | part (d) |
| E (2015.Fall #5) | Multi-dimensional Credibility - calculate ratio | Quintile Test - describe process | Sum of Squared Errors Test - shortcomings | |
| E (2014.Fall #1) | Quintile Test - evaluate | Sum of Squared Errors Test - apply & recommend | | |
| E (2014.Fall #4) | Statistical Considerations - apply in context | | | |
| E (2013.Fall #3) | Holdout Sample - describe purpose | Holdout Sample - recommend | Sum of Squared Errors Test - evaluate | Trends - Mahler.Credibility |
| E (2012.Fall #5) | Multi-dimensional Credibility - determine appropriateness | Expected Loss - calculate | | |

In Plain English!

Claim counts for workers compensation classes are unreliable for serious injuries because of the low frequencies involved. However, serious injury types are correlated with other injuries as the situations which cause fatal (F), permanent total (PT), and major permanent partial (Major) injuries are usually similar. A small change in the situation may result in a significantly different outcome. So a class with a lot of major injuries probably has a higher than average likelihood for permanent total and fatal injuries.

Couret & Venter derive a multi-dimensional credibility estimator: the population mean for each injury type by class is estimated using a linear function of the sample means for all of the injury types in the class. The coefficients of the linear function are estimated by minimizing the expected squared error.

They apply this method to ratios of claim counts by injury type to temporary total (TT) claim counts. That is, they treat a temporary total injury as an exposure which could have produced a higher severity claim (F, PT, Major, or Minor). Let V, W, X, and Y be the observed ratios for injury types F, PT, Major, and Minor. The paper assumes the distribution of claim counts by injury type is parametrizable for each class but the parameters are unknown. Let vi, wi, xi, and yi be the population (hypothetical) mean ratios. Rather than writing out four sets of highly similar equations, one per injury type, we focus on permanent total injuries (PT), which use the variable W. You should be able to translate the equations from permanent total injuries to any of the other injury types.

The observed sample claim count ratio of permanent total (PT) to temporary total (TT) for class i at time t is given by [math]m_{i,t}\cdot W_{i,t}=\displaystyle\sum_{j=1}^{m_{i,t}}\left(w_i+\epsilon_{j,t}\right)[/math]. Here, there are [math]m_{i,t}[/math] TT claims, and the [math]\epsilon_{j,t}[/math] are independent perturbations with mean zero and standard deviation [math]\sigma_{W_i}[/math] which vary by class but not time. Hence each TT claim is considered an exposure which may or may not produce a PT claim.

Rearranging the equation gives [math]W_{i,t}=w_i+\displaystyle\frac{1}{m_{i,t}}\cdot\sum_{j=1}^{m_{i,t}}\epsilon_{j,t}[/math] and [math]Var(W_{i,t}\;|\; w_i)=\displaystyle\frac{\sigma^2_{W_i}}{m_{i,t}}[/math].

Hence, the more TT claims, the smaller the random fluctuations of the annual observed class ratio [math]W_{i,t}[/math] from its population mean since [math]\frac{1}{m_{i,t}}[/math] goes to 0 as [math]m_{i,t}[/math] increases.
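A minimal Python sketch of this point (the process variance below is a made-up illustrative value, not from the paper):

```python
# The conditional variance of the observed PT-to-TT ratio W_{i,t} is
# sigma_{W_i}^2 / m_{i,t}: each of the m_{i,t} TT claims is one exposure.
SIGMA2_W = 0.04   # assumed (hypothetical) process variance for the class

def var_W_given_w(m_it, sigma2=SIGMA2_W):
    """Var(W_{i,t} | w_i) for a class-year with m_it TT claims."""
    return sigma2 / m_it

print(var_W_given_w(10))     # few TT claims: a noisy observed ratio
print(var_W_given_w(1000))   # many TT claims: ratio hugs the class mean w_i
```

As the TT count grows from 10 to 1,000, the variance of the observed ratio falls by a factor of 100.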

Key Assumption

The variance of the observed F, PT, Major, and Minor claim ratios decreases as the number of TT claims increases. That is, the more TT claims a class has, the more stable its observed ratios of the other claim types to TT should be.

Let Wi be the sample class mean ratio over all time. Assume there are N independent time periods. Then [math]W_i=\displaystyle\frac{\sum_{t=1}^N m_{i,t}W_{i,t}}{\sum_{t=1}^N m_{i,t}}[/math].

Let mi be the sum of [math]m_{i,t}[/math] over all time, and m be the sum over all classes i of mi. Then [math]Var(W_i \;|\; w_i)=\displaystyle\frac{\sigma^2_{W_i}}{m_i}[/math].

The same calculations are also performed for each of the 7 hazard groups; these will become the complements of credibility. The hazard groups use a subscript h, so [math]V_h[/math] is the observed mean fatal claim ratio for hazard group h.

By forming a linear combination of the sample means for the injury types within class i, we account for correlations between injury types. For instance, focusing on fatal claims, V, we are estimating the population mean for fatal claims in class i, [math]v_i[/math], by the following equation: [math]\hat{v_i}=V_h + b_{v,i}(V_i-V_h)+c_{v,i}(W_i-W_h)+d_{v,i}(X_i-X_h)+e_{v,i}(Y_i-Y_h)[/math].

Remember, [math]V_h[/math] is the observed mean fatal claim ratio for the hazard group h which contains class i.

Similar equations are formed for the class estimates of W, X, and Y.

The coefficients [math]b_{v,i}, c_{v,i}, d_{v,i}, e_{v,i}, \ldots, b_{y,i}, c_{y,i}, d_{y,i}, e_{y,i}[/math] are the multi-dimensional credibilities we need to estimate. Observe that if there were no correlation between the injury types, the cross-type credibilities would be zero and the above equation would reduce to credibility weighting the observed injury type mean for the class against the observed injury type mean for the hazard group containing the class.
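As a concrete illustration of the estimator, here is a small Python sketch; every numeric value is hypothetical:

```python
# The class estimate of the fatal ratio v_i: the hazard group mean plus
# credibility-weighted deviations of the class sample means from the
# hazard group means, across all four injury types.
def estimate_v(V_h, W_h, X_h, Y_h, V_i, W_i, X_i, Y_i, b, c, d, e):
    """v_hat = V_h + b(V_i-V_h) + c(W_i-W_h) + d(X_i-X_h) + e(Y_i-Y_h)."""
    return (V_h + b * (V_i - V_h) + c * (W_i - W_h)
                + d * (X_i - X_h) + e * (Y_i - Y_h))

# With zero credibilities the estimate collapses to the hazard group mean:
print(estimate_v(0.01, 0.02, 0.10, 0.30,
                 0.015, 0.05, 0.12, 0.33,
                 b=0.0, c=0.0, d=0.0, e=0.0))   # -> 0.01
```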

Alice: "So far everything has been quite abstract. Let's remedy that by looking at part (a.) of the following example."

Set up and Solve the Multi-dimensional Credibility Equation

Credibility Considerations

Couret & Venter use multivariate Bühlmann-Straub credibility, which minimizes expected squared error. That is, in the case of permanent total injuries (variable w), we minimize the expected squared error over all classes between the linear combination and the hypothetical mean: [math]E\left[\left(a+bV_i+cW_i+dX_i+eY_i-w_i\right)^2\right][/math].

The coefficients [math]a, b, c, d, e[/math] are determined as follows:

Differentiating the equation with respect to a and setting equal to 0 yields [math]a=-E\left[bV_i+cW_i+dX_i+eY_i-w_i\right][/math]. Substituting this back into the linear combination gives: [math]E[w_i]+b\left(V_i-E[V_i]\right)+c(W_i-E[W_i])+d(X_i-E[X_i])+e(Y_i-E[Y_i])[/math].

Since [math]E[w_i][/math] is an unconditional mean, it applies across all classes and can be estimated by the hazard group mean. Hence, if a class has no credibility (i.e. [math]b=c=d=e=0[/math]), the hazard group ratio is used.

Also note c corresponds to the traditional credibility factor Z when only injury type PT is considered. This is because [math]cW_i+(1-c)E[W_i]=E[w_i]+c(W_i-E[W_i])[/math] as [math]E[w_i]=E[W_i][/math].

Differentiating the linear combination with respect to b and setting equal to 0 yields [math]aE[V_i]+E[V_i(bV_i+cW_i+dX_i+eY_i-w_i)]=0[/math]. Substituting in the previous expression for a and rearranging gives the following equation: [math]Cov(V_i,w_i)=bVar(V_i)+cCov(V_i,W_i)+dCov(V_i,X_i)+eCov(V_i,Y_i)[/math].

Repeating for c, d, and e yields three similar equations. Taken together, they form the following matrix equation: [math]\left(\begin{array}{c}Cov(V_i,w_i)\\Cov(W_i,w_i)\\Cov(X_i,w_i)\\Cov(Y_i,w_i)\end{array}\right)=C\cdot\left(\begin{array}{c}b_{w,i}\\c_{w,i}\\d_{w,i}\\e_{w,i}\end{array}\right)[/math], where C is the covariance matrix of the class by injury-type sample means.
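Once the covariances are estimated, solving for the credibilities is just a 4x4 linear system. A minimal numpy sketch, with made-up covariance values purely for illustration:

```python
# Solve C @ [b, c, d, e]^T = [Cov(V_i,w_i), Cov(W_i,w_i), Cov(X_i,w_i), Cov(Y_i,w_i)]^T
# for the PT credibilities. All numbers below are invented.
import numpy as np

# Covariance matrix of the class sample means (order: V, W, X, Y)
C = 1e-4 * np.array([[4.0, 1.0, 1.0, 0.0],
                     [1.0, 5.0, 1.0, 1.0],
                     [1.0, 1.0, 6.0, 1.0],
                     [0.0, 1.0, 1.0, 4.0]])
# Covariances of each sample mean with the hypothetical mean w_i
cov_with_w = 1e-4 * np.array([1.0, 2.0, 1.0, 1.0])

b, c, d, e = np.linalg.solve(C, cov_with_w)   # multi-dimensional credibilities
```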

The difficulty now is in estimating the covariances. Couret & Venter note [math]Var(V_i\;|\;v_i)=\displaystyle\frac{\sigma^2_{V_i}}{m_i}[/math], where [math]\sigma^2_{V_i}[/math] is the process variance for Vi. The unconditional variance of Vi is [math]\frac{EPV_V}{m_i}+VHM_V[/math]. The latter term is the variance of hypothetical means for V, i.e. the variance of the unobserved class means vi. The first term is the expected process variance, [math]E[\sigma^2_{V_i}][/math]. Note this is independent of i because the expectation is taken over all classes.

Key Assumption

The observed injury ratios for any year for each type of injury are the class injury ratio plus a random, independent perturbation.

Couret & Venter conclude it is sufficient to estimate the off-diagonal elements by the sample covariances.

The first entry on the leading diagonal of the matrix is [math]Var(V_i)=\frac{EPV_V}{m_i}+VHM_V[/math]. Subsequent leading diagonal entries follow by replacing V with W, X or Y.

Consequently, formulas are required to estimate EPV and VHM. The paper uses formulas due to Dean (2005). In the following, recall a hat denotes an estimate.

Formulas for EPV and VHM

[math]\hat{EPV_V}=\frac{\sum_{i=1}^R\sum_{t=1}^N m_{it}\left(V_{it}-V_i\right)^2}{R(N-1)}[/math] and [math]\hat{VHM_V}=\frac{\sum_{i=1}^Rm_i\left(V_i-V\right)^2-(R-1)\cdot\hat{EPV_V}}{m-\frac{1}{m}\cdot\sum_{i=1}^Rm_i^2}[/math]

Lastly, [math]Cov(V_i,W_i)=\sum_{i=1}^R\frac{\left(V_i-V_h\right)\cdot\left(W_i-W_h\right)\cdot m_i}{m_h}[/math].

Here, R is the number of classes in the hazard group that contains class i. N is the number of years used in the data set.
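These estimators can be sketched in Python as follows; the two-class, two-year data in the example call is made up purely for illustration:

```python
# Sketch of Dean's EPV and VHM estimators for one injury type (V).
# V_it[i][t] = observed ratio for class i in year t; m_it[i][t] = TT counts.
def epv_vhm(V_it, m_it):
    R, N = len(V_it), len(V_it[0])            # classes, years
    m_i = [sum(row) for row in m_it]          # TT counts by class
    m = sum(m_i)                              # total TT count
    # class sample means, TT-weighted over years
    V_i = [sum(mm * vv for mm, vv in zip(m_it[i], V_it[i])) / m_i[i]
           for i in range(R)]
    V_bar = sum(m_i[i] * V_i[i] for i in range(R)) / m
    epv = sum(m_it[i][t] * (V_it[i][t] - V_i[i]) ** 2
              for i in range(R) for t in range(N)) / (R * (N - 1))
    vhm = (sum(m_i[i] * (V_i[i] - V_bar) ** 2 for i in range(R))
           - (R - 1) * epv) / (m - sum(mi ** 2 for mi in m_i) / m)
    return epv, max(vhm, 0.0)                 # a negative VHM estimate is set to 0

epv, vhm = epv_vhm([[0.0, 0.2], [0.4, 0.6]], [[1, 1], [1, 1]])
print(epv, vhm)   # approximately 0.02 and 0.07
```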

You should make sure you have these formulas memorized and know how to apply them.

Note that V is a weighted average of the Vi 's with weights mi. Also, the variance of hypothetical means (VHM) is estimated using the sample variance from each class. This may be negative. If that happens, then set it equal to 0. When it is set equal to 0, the expected process variance accounts for all of the observed variation. That is, there are no individual risk differences.

mini BattleQuiz 1

Performance Testing

This section of the paper is best skim-read after reading the key points below.

Couret & Venter had 7 policy years of untrended and undeveloped workers compensation data available. They discarded the most recent year as they believed it to be too immature. The data is examined in two ways: by hazard group/class and by injury type. They use the sum of squared errors to measure the performance of the predictions. The predictions are made in three ways: using the hazard group mean, using the "raw" class mean, and using the multi-dimensional credibility process.

To facilitate testing, a holdout sample is created using the data for the odd report years. The hazard group means, class means, and multi-dimensional credibility estimates are all calculated on the even report years.

Couret & Venter note that incident ratios are impacted by unknown effects due to changes in the portfolio of individual insurance policies over time. They also note there is considerable volatility in the class ratios at the state level which means improving the estimate of the mean may only produce a small reduction in the sum of squared errors.

The multi-dimensional credibility procedure is designed to minimize the expected deviation between the true class mean and its sample estimator over the same period. By using the even years to predict the odd years (a form of holdout sample), there is a disconnect between the minimized expectation and the sample statistic.

Three testing approaches taken by Couret & Venter (shown for fatal injuries):

  1. Hazard group method: [math]SSE=\sum_{\mbox{all classes}}(V_h-V_{i,\mbox{holdout}})^2[/math]
  2. Raw class data method: [math]SSE=\sum_{\mbox{all classes}}(V_i-V_{i,\mbox{holdout}})^2[/math]
  3. Credibility method: [math]SSE=\sum_{\mbox{all classes}}(\hat{V_i}-V_{i,\mbox{holdout}})^2[/math]

The lowest sum of squared errors (SSE) is best.
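A quick sketch of the comparison, with invented class-level ratios (three classes, fatal injuries):

```python
# Compare the three prediction approaches against the odd-year holdout sample.
def sse(preds, holdout):
    return sum((p - a) ** 2 for p, a in zip(preds, holdout))

V_holdout = [0.012, 0.030, 0.021]   # odd-year (test) class ratios
V_hg      = [0.020, 0.020, 0.020]   # hazard group mean applied to each class
V_raw     = [0.010, 0.045, 0.015]   # raw even-year class means
V_cred    = [0.015, 0.032, 0.019]   # multi-dimensional credibility estimates

for name, preds in [("hazard group", V_hg), ("raw class", V_raw),
                    ("credibility", V_cred)]:
    print(name, sse(preds, V_holdout))   # smallest SSE is best
```

In this made-up example the credibility estimates win, but on real data the three methods must be compared empirically, which is the point of the test.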

Alice: "This sounds complicated but don't worry - it's relatively easy to apply. Let's walk through an example now."

Calculate the Sums of Squared Errors

Their credibility method performs only slightly better than the hazard group method. Possible reasons include class data being volatile across years and the estimators being fit to the training dataset rather than the test dataset. The latter point means there could be material differences between the even-year and odd-year averages. A better approach is to normalize the data by hazard group and year to eliminate differences between hazard groups in the training vs test datasets.

Another key point is their decision to test on ranked portfolios of state-class combinations. This is based on the quintiles test.

The Quintiles Test

We'll apply the quintiles test to a single injury type, V (fatal injuries), within a fixed hazard group.

  1. Split the data into a training data set and a test data set (also known as the holdout sample).
  2. Sort the classes in the training data set into ascending order based on the credibility estimate [math]\hat{v_i}[/math]. Superimpose this ordering on the classes in the test data set.
  3. Group the classes into quintiles using the TT injury counts. That is, split the ordered data set into 5 pieces such that each piece contains approximately the same number of TT injuries.
  4. Calculate [math]V_\mbox{quintile}[/math] and [math]V_\mbox{quintile,holdout}[/math]. Calculate [math]\hat{v}_\mbox{quintile}[/math] and [math]\hat{v_h}[/math] as weighted averages of [math]\hat{v_i}[/math], using the TT injury counts as the weights.
  5. For each of the three approaches taken by Couret & Venter, analyze the sum of squared errors.
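Step 3 above can be sketched as follows; the assignment rule shown (placing each class in the quintile containing the midpoint of its cumulative TT count) is one reasonable implementation, not necessarily the paper's exact procedure, and the data is made up:

```python
# Assign sorted classes to 5 groups with roughly equal TT claim counts.
def quintiles_by_tt(tt_counts):
    total = sum(tt_counts)
    groups, cum = [], 0
    for m in tt_counts:
        cum += m
        # quintile containing the midpoint of this class's cumulative TT count
        groups.append(min(4, int(5 * (cum - m / 2) / total)))
    return groups

print(quintiles_by_tt([10, 10, 10, 10, 10]))   # -> [0, 1, 2, 3, 4]
```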

Why use a quintiles test?

We could compute the sum of squared errors for each of the methods at the individual class level. However, even after grouping classes into hazard groups, there is a large amount of variation between classes. This noise makes it hard to determine if the credibility method is performing better. By grouping into quintiles and calculating sums of squared errors based on the quintile statistics, we reduce this variation so the data is more credible and we can make a better comparison of the methods.

One way the CAS can test this material is to have you interpret the results of a quintiles test to determine whether or not you should use the multi-dimensional credibility technique or some other method. Take 5 minutes to attempt the following past exam question now.

Evaluate the Multi-dimensional Credibility Technique

Another way this can be tested is in parts b and c of 2015 Q5. Give them a try now!

Describe the Quintiles Test in the Context of Couret & Venter

Couret & Venter's Results

Using the quintiles test, the multi-dimensional credibility procedure produced a much lower sum of squared errors for all hazard groups/injury types except for hazard group A. Couret & Venter suggest this is because hazard group A is highly homogeneous so different injury types aren't prevalent or are not very predictive of other injury types.
