Couret.Venter

From BattleActs Wiki

Reading: Couret, J. and Venter, G., "Using Multi-Dimensional Credibility to Estimate Class Frequency Vectors in Workers Compensation"

Synopsis: To follow...

Study Tips

Still to come...

Read this after Robertson.HazardGroups

Estimated study time: 3 hours (not including subsequent review time)

BattleTable

Based on past exams, the main things you need to know (in rough order of importance) are:

  • fact A...
  • fact B...
reference part (a) part (b) part (c) part (d)
E (2018.Spring #1)

In Plain English!

Claim counts for workers compensation classes are unreliable for serious injuries because of the low frequencies involved. However, serious injury types are correlated with other injuries as the situations which cause fatal (F), permanent total (PT), and major permanent partial (Major) injuries are usually similar. It is generally only a small difference in the situation that results in a significantly different outcome. So a class with a lot of major injuries probably has a higher than average likelihood for permanent total and fatal injuries.

Couret and Venter derive a multivariate correlated credibility by estimating the population mean for each injury type by class using a linear function of the sample means for all of the injury types in the class. The coefficients of the linear function are estimated by minimizing the expected squared error.

They apply this method to ratios of claim counts by injury type to temporary total (TT) claim counts. That is, they treat a temporary total injury as an exposure which could have produced a higher-severity claim (F, PT, Major, or Minor). Let V, W, X, and Y be the observed ratios for injury types F, PT, Major, and Minor. The paper assumes the distribution of claim counts by injury type is parametrizable for each class, but the parameters are unknown. Let vi, wi, xi, and yi be the population (hypothetical) mean ratios. Then the observed sample claim count ratio of permanent total (PT) to temporary total (TT) for class i at time t satisfies [math]m_{i,t}\,W_{i,t}=\sum_{j=1}^{m_{i,t}}\left(w_i+\epsilon_{i,t,j}\right)[/math]. Here, there are [math]m_{i,t}[/math] TT claims, and the [math]\epsilon_{i,t,j}[/math] are independent perturbations with mean zero and standard deviation [math]\sigma_{W_i}[/math], which varies by class but not by time. Hence each TT claim is considered an exposure which may or may not produce a PT claim.

Rearranging the equation gives [math]W_{i,t}=w_i+\frac{1}{m_{i,t}}\sum_{j=1}^{m_{i,t}}\epsilon_{i,t,j}[/math] and [math]Var(W_{i,t}\;|\; w_i)=\frac{\sigma^2_{W_i}}{m_{i,t}}[/math]. Hence, the more TT claims, the smaller the random fluctuations of the annual observed class ratio [math]W_{i,t}[/math] around its population mean.
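
To make this concrete, here is a minimal Python simulation (all parameter values are assumed for illustration only) showing that the sampling variance of the observed ratio shrinks in proportion to the number of TT claims:

```python
import numpy as np

rng = np.random.default_rng(0)
w_i = 0.05     # hypothetical PT-to-TT ratio for the class (assumed value)
sigma = 0.2    # per-claim perturbation std dev, sigma_{W_i} (assumed value)

def observed_ratio(m, n_sims=5000):
    """Simulate W_{i,t}: the mean of m per-TT-claim outcomes w_i + eps."""
    eps = rng.normal(0.0, sigma, size=(n_sims, m))
    return (w_i + eps).mean(axis=1)

var_small = observed_ratio(m=10).var()     # few TT claims
var_large = observed_ratio(m=1000).var()   # many TT claims
# Both track sigma^2 / m, so var_large is far smaller than var_small.
```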

A key assumption of Couret and Venter's analysis is that the variance of the observed F, PT, Major, and Minor ratios decreases as the number of TT claims increases. That is, the more TT claims a class has, the more stable its observed ratios of the other injury types should be.

Let Wi be the sample class mean ratio over all time. Assume there are N independent time periods. Then [math]W_i=\frac{\sum_{t=1}^N m_{i,t}W_{i,t}}{\sum_{t=1}^N m_{i,t}}[/math].

Let mi be the sum of [math]m_{i,t}[/math] over all time, and m be the sum over all classes i of mi. Then [math]Var(W_i \;|\; w_i)=\frac{\sigma^2_{W_i}}{m_i}[/math].
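
A quick numeric sketch of the pooled class mean, using toy claim counts and ratios (all values assumed for illustration):

```python
import numpy as np

# Toy data for one class: TT counts and observed PT/TT ratios by year (assumed)
m_it = np.array([40, 55, 30, 60, 45])            # m_{i,t}, N = 5 years
W_it = np.array([0.04, 0.06, 0.03, 0.05, 0.07])  # W_{i,t}

m_i = m_it.sum()                   # total TT claims for the class
W_i = (m_it * W_it).sum() / m_i    # claim-count-weighted class mean ratio
# Var(W_i | w_i) = sigma^2_{W_i} / m_i: pooling years shrinks the variance further.
```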

The same calculations are also performed for each of the 7 hazard groups discussed in Robertson.HazardGroups. These will become complements of credibility.

By forming a linear combination of the sample means for the injury types within the class, we account for correlations between injury types. That is, we are estimating Vi by the following equation: [math]\hat{V_i}=V_h + b_{v,i}(V_i-V_h)+c_{v,i}(W_i-W_h)+d_{v,i}(X_i-X_h)+e_{v,i}(Y_i-Y_h)[/math]. Here, Vh is the true mean of the hazard group which contains class i. Similar equations are formed for the class estimates of W, X, and Y.
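
As a concrete illustration of the estimator, with made-up hazard-group means, class sample means, and coefficients (none of these numbers come from the paper):

```python
# Hypothetical hazard-group means (complements of credibility) - assumed values
V_h, W_h, X_h, Y_h = 0.010, 0.020, 0.100, 0.300
# Class sample mean ratios (assumed values)
V_i, W_i, X_i, Y_i = 0.015, 0.025, 0.120, 0.280
# Credibility coefficients for the V estimate (assumed values)
b, c, d, e = 0.10, 0.05, 0.15, 0.02

# Credibility estimate of the class fatal ratio: start at the hazard group
# mean and adjust for how each injury type deviates from that group mean
V_hat = V_h + b * (V_i - V_h) + c * (W_i - W_h) + d * (X_i - X_h) + e * (Y_i - Y_h)
```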

So far everything has been quite abstract. Let's remedy that by looking at question 5a on the 2015 exam. Insert 2015.Q5 PDF

The coefficients [math]b_{v,i}, c_{v,i}, d_{v,i}, e_{v,i}, \ldots, b_{y,i}, c_{y,i}, d_{y,i}, e_{y,i}[/math] are the credibilities we need to estimate.

Couret & Venter use multivariate Bühlmann-Straub credibility, which minimizes expected squared error. We want to minimize, over all classes, the expected squared difference between the linear combination and the hypothetical mean. That is, minimize [math]E\left[\left(a+bV_i+cW_i+dX_i+eY_i-w_i\right)^2\right][/math].

The coefficients [math]a, b, c, d, e[/math] are determined as follows:

Differentiating the equation with respect to a and setting equal to 0 yields [math]a=-E\left[bV_i+cW_i+dX_i+eY_i-w_i\right][/math]. Substituting this back into the linear combination gives: [math]E[w_i]+b\left(V_i-E[V_i]\right)+c(W_i-E[W_i])+d(X_i-E[X_i])+e(Y_i-E[Y_i])[/math].

Since [math]E[w_i][/math] is unconditional, it applies across all classes and so can be used to estimate a hazard group mean. Hence, if a class has no credibility, (i.e. [math]b=c=d=e=0[/math]) the hazard group ratio is used.

Also note c corresponds to the traditional credibility factor Z when only injury type PT is considered. This is because [math]cW_i+(1-c)E[W_i]=E[w_i]+c(W_i-E[W_i])[/math] as [math]E[w_i]=E[W_i][/math].
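
The identity is easy to verify numerically (values assumed for illustration):

```python
# Verify c*W_i + (1 - c)*E[W_i] = E[w_i] + c*(W_i - E[W_i]) when E[w_i] = E[W_i]
EW = 0.02   # E[W_i] = E[w_i] (assumed value)
W_i = 0.03  # observed class sample mean (assumed value)
c = 0.4     # credibility coefficient, i.e. the traditional Z (assumed value)

lhs = c * W_i + (1 - c) * EW   # traditional Z-weighted form
rhs = EW + c * (W_i - EW)      # Couret & Venter's form
```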

Differentiating the expected squared error with respect to b and setting equal to 0 yields [math]aE[V_i]+E\left[V_i\left(bV_i+cW_i+dX_i+eY_i-w_i\right)\right]=0[/math]. Substituting in the previous expression for a and rearranging gives the following equation: [math]Cov(V_i,w_i)=b\,Var(V_i)+c\,Cov(V_i,W_i)+d\,Cov(V_i,X_i)+e\,Cov(V_i,Y_i)[/math].

Repeating for c, d, and e yields three similar equations. Taken together, they form the following matrix equation: [math]\left(\begin{array}{c}Cov(V_i,w_i)\\Cov(W_i,w_i)\\Cov(X_i,w_i)\\Cov(Y_i,w_i)\end{array}\right)=C\cdot\left(\begin{array}{c}b\\c\\d\\e\end{array}\right)[/math], where C is the covariance matrix of the class by injury-type sample means.
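
In practice the coefficients come from solving this 4×4 linear system. A minimal sketch, with an assumed (invented) covariance matrix and right-hand side:

```python
import numpy as np

# Assumed covariance matrix C of the class sample means (V, W, X, Y)
C = np.array([
    [4.0, 1.0, 0.5, 0.2],
    [1.0, 3.0, 0.8, 0.3],
    [0.5, 0.8, 2.0, 0.4],
    [0.2, 0.3, 0.4, 1.0],
]) * 1e-4
# Assumed covariances of each sample mean with the hypothetical mean w_i
rhs = np.array([0.8, 1.5, 0.6, 0.2]) * 1e-4

coefs = np.linalg.solve(C, rhs)   # solves C @ (b, c, d, e) = rhs
b, c, d, e = coefs
```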

The difficulty now is in estimating the covariances. Couret & Venter note [math]Var(V_i\;|\;v_i)=\frac{\sigma^2_{V_i}}{m_i}[/math], where [math]\sigma^2_{V_i}[/math] is the process variance for Vi. The unconditional variance of Vi is [math]\frac{EPV_V}{m_i}+VHM_V[/math]. The latter term is the variance of hypothetical means for V, i.e. the variance of the unobservable class means vi. The numerator of the first term is the expected process variance, [math]EPV_V=E[\sigma^2_{V_i}][/math]. Note this is independent of i because the expectation is taken over all classes.

The key assumption here is the observed injury ratios for any year for each type of injury are the class injury ratio plus a random, independent perturbation. Couret & Venter conclude it is sufficient to estimate the off-diagonal elements by the sample covariances.

The leading diagonal of the matrix is estimated by [math]Var(V_i)=\frac{EPV_V}{m_i}+VHM_V[/math], and similarly for W, X, and Y. Consequently, formulas are required to estimate the EPV and VHM. The paper uses formulas due to Dean (2005). In the following, recall a hat denotes an estimate.

Formulas for EPV and VHM

[math]\hat{EPV_V}=\frac{\sum_{i=1}^R\sum_{t=1}^N m_{it}\left(V_{it}-V_i\right)^2}{R(N-1)}[/math] and [math]\hat{VHM_V}=\frac{\sum_{i=1}^Rm_i\left(V_i-V\right)^2-(R-1)\cdot\hat{EPV_V}}{m-\frac{1}{m}\cdot\sum_{i=1}^Rm_i^2}[/math]

Lastly, [math]\hat{Cov}(V_i,W_i)=\sum_{i=1}^R\frac{\left(V_i-V_h\right)\cdot\left(W_i-W_h\right)\cdot m_i}{m_h}[/math].

Here, R is the number of classes in the hazard group containing class i, and N is the number of time periods (years of data).

You should make sure you have these formulas memorized and know how to apply them.

Note that V is a weighted average of the Vi 's with weights mi. Also, the expected process variance (EPV) is estimated using the sample variance from each class. The VHM estimate may be negative. If that happens, then set it equal to 0. When the VHM is set equal to 0, the expected process variance accounts for all of the observed variation. That is, there are no individual risk differences, and the class receives no credibility.
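
The estimation formulas above can be sketched for a single injury type as follows, using toy data (R = 3 classes, N = 4 years, all values assumed):

```python
import numpy as np

# Toy data: R = 3 classes, N = 4 years (all values assumed)
m = np.array([[30., 40., 35., 45.],
              [60., 55., 50., 65.],
              [20., 25., 30., 25.]])       # TT counts m_{it}
V = np.array([[0.04, 0.06, 0.05, 0.05],
              [0.03, 0.04, 0.03, 0.04],
              [0.08, 0.06, 0.07, 0.09]])   # observed ratios V_{it}

R, N = m.shape
m_i = m.sum(axis=1)                        # per-class TT totals
V_i = (m * V).sum(axis=1) / m_i            # class mean ratios
m_tot = m_i.sum()
V_bar = (m_i * V_i).sum() / m_tot          # overall weighted mean

EPV = (m * (V - V_i[:, None]) ** 2).sum() / (R * (N - 1))
VHM = ((m_i * (V_i - V_bar) ** 2).sum() - (R - 1) * EPV) \
      / (m_tot - (m_i ** 2).sum() / m_tot)
VHM = max(VHM, 0.0)   # a negative VHM estimate is floored at zero
```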

Performance Testing

This section of the paper is best skim-read after reading the key points below.

Couret & Venter had 7 policy years of untrended and undeveloped workers compensation data available. They discarded the most recent year as they believed it to be too immature. The data is examined in two ways: by hazard group/class and by injury type. They use the sum of square errors to measure the performance of the predictions. The predictions are made three ways: by using the hazard group mean, using the even report years to predict the odd report years, and using the credibility process.

Couret & Venter note that incident ratios are impacted by unknown effects due to changes in the portfolio of individual insurance policies over time. They also note there is considerable volatility in the class ratios at the state level which means improving the estimate of the mean may only produce a small reduction in the sum of squared errors.

The multi-dimensional credibility procedure is designed to minimize the expected deviation between the true class mean and its sample estimator over the same period. By using the even years to predict the odd years (a form of holdout sample), there is a disconnect between the minimized expectation and the sample statistic.

Three testing approaches taken by Couret & Venter

  1. Hazard group method: [math]SSE=\sum_{\mbox{all classes}}(V_h-V_{i,\mbox{holdout}})^2[/math]
  2. Raw class data method: [math]SSE=\sum_{\mbox{all classes}}(V_i-V_{i,\mbox{holdout}})^2[/math]
  3. Credibility method: [math]SSE=\sum_{\mbox{all classes}}(\hat{V_i}-V_{i,\mbox{holdout}})^2[/math]

The lowest sum of squared errors (SSE) is best.
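
A toy comparison of the three SSE measures (all holdout values and predictions are invented for illustration):

```python
import numpy as np

# Invented holdout ratios and predictions for four classes
V_hold = np.array([0.050, 0.030, 0.080, 0.045])  # odd-year (holdout) ratios
V_hg   = np.full(4, 0.050)                       # 1. hazard group mean
V_raw  = np.array([0.060, 0.020, 0.100, 0.040])  # 2. raw even-year class data
V_cred = np.array([0.053, 0.028, 0.085, 0.046])  # 3. credibility estimates

def sse(pred):
    """Sum of squared errors against the holdout sample."""
    return ((pred - V_hold) ** 2).sum()
# In this invented example the credibility method attains the lowest SSE.
```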

While this sounds complicated, it is relatively easy to apply. Let's walk through 2011 Q2 now. Insert 2011.Q2 PDF

Their credibility method performs only slightly better than the hazard group method. Possible reasons include class data being volatile across years and the estimators being fit to the training dataset rather than the test dataset. The latter point is really saying there could be material differences between even and odd years in the averages. A better approach is to normalize the data by hazard group by year to eliminate differences between hazard groups in the train vs test datasets.

Another key point is their decision to test on ranked portfolios of state-class combinations. This is based on the quintiles test.

The Quintiles Test

We'll apply the quintiles test to a single injury type within a fixed hazard group - we'll use V (Fatal injuries).

  1. Split the data into a training data set and a test data set (also known as the holdout sample).
  2. Sort the classes in the training data set into ascending order based on the credibility estimate [math]\hat{v}_i[/math]. Superimpose this ordering on the classes in the test data set.
  3. Group the classes into quintiles using the TT injury counts. That is, split the ordered data set into 5 pieces such that each piece contains approximately the same number of TT injuries.
  4. Calculate [math]V_{\mbox{quintile}}[/math] and [math]V_{\mbox{quintile,holdout}}[/math]. Calculate [math]\hat{v}_{\mbox{quintile}}[/math] and [math]\hat{v}_h[/math] as weighted averages of [math]\hat{v}_i[/math] using the TT injury counts for the weights.
  5. For each of the three approaches taken by Couret & Venter, analyze the sum of squared errors.
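
Steps 2 and 3 - sorting by the credibility estimate and splitting into TT-count-balanced quintiles - can be sketched as follows (toy estimates and counts, assumed for illustration):

```python
import numpy as np

# Toy data: credibility estimates and TT counts for 10 classes (assumed)
v_hat = np.array([0.04, 0.01, 0.09, 0.02, 0.06, 0.10, 0.03, 0.07, 0.05, 0.08])
tt    = np.array([90, 100, 115, 80, 95, 100, 120, 105, 110, 85])

order = np.argsort(v_hat)     # step 2: sort classes by the estimate
cum = np.cumsum(tt[order])    # running TT counts in sorted order
# Step 3: assign quintiles so each holds roughly 1/5 of the TT claims
quintile = np.minimum((cum - 1) * 5 // cum[-1], 4)
# Quintile means would then be TT-weighted averages within each group.
```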

One way the CAS can test this material is to have you interpret the results of a quintiles test to determine whether or not you should use the multi-dimensional credibility technique or some other method. Take 5 minutes to attempt the following past exam question now. Insert 2012.Q5 PDF

Couret & Venter's Results

Using the quintiles test, the multi-dimensional credibility procedure produced a much lower sum of squared errors for all hazard groups/injury types except for hazard group A. Couret & Venter suggest this is because hazard group A is highly homogeneous so different injury types aren't prevalent or are not very predictive of other injury types.

Pop Quiz Answers