Mahler.Credibility

From BattleActs Wiki

Reading: Mahler, H. C., "An Example of Credibility and Shifting Risk Parameters"

Synopsis: You are tested on the paper itself but not on the appendices.

The paper deals with optimally combining different years of historical data when experience rating. The usual application of credibility, namely Data × Credibility + Prior Estimate × Complement of Credibility, is used. The prior estimate is normally the average experience for a class of insureds, but it could also be the relative loss potential of the insured compared to the class average.

The goal is to understand how changes in the risk parameters over time result in older years of data being given less credibility in experience rating than would otherwise be expected.

Study Tips

To follow...

Estimated study time: x mins, or y hrs, or n1-n2 days, or 1 week,... (not including subsequent review time)

BattleTable

Based on past exams, the main things you need to know (in rough order of importance) are:

  • fact A...
  • fact B...
reference part (a) part (b) part (c) part (d)
E (2018.Fall #01)
E (2015.Fall #04)
E (2013.Fall #03)
E (2012.Fall #03)

In Plain English!

The efficiency of an experience rating plan is the reduction in expected squared error due to the use of the plan. The lower the expected squared error, the higher the efficiency.

Mahler uses data on baseball games to illustrate his points. He uses elementary statistics (the binomial distribution and the normal approximation) to conclude that baseball teams do have significant differences between them over the years of data. Because the teams genuinely differ, experience rating should be able to predict future performance with some accuracy.

Next, Mahler asks if the differences in the win/loss record for a fixed team over time can be explained by random fluctuations from the same underlying distribution. He concludes the observed results cannot be explained this way, so the parameters of the distribution which describes the number of losses for a team in a year are changing over time.

One method for testing whether parameters shift over time is the standard chi-squared test. Mahler groups the data into 5-year periods (other lengths could also be used) by team and uses the chi-squared test to check whether each team could have the same underlying mean over the entire dataset. If a team didn't change its losing percentage over time, then its observed losing percentage should be approximately normally distributed around its long-term average. The chi-squared statistic is defined as [math]\displaystyle\sum_{i=1}^n\frac{\left(\mbox{Actual Loss}_i - \mbox{Expected Loss}_i\right)^2}{\mbox{Expected Loss}_i}[/math]. For a fixed team with n grouped periods, there are [math]n-1[/math] degrees of freedom because the expected losses must be estimated from the data.
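
To make this concrete, here is a minimal sketch in Python of the chi-squared statistic for one team's grouped periods. The loss counts are invented for illustration (they are not Mahler's data) and the variable names are ours.

<pre>
import numpy as np
from scipy.stats import chi2

# Hypothetical losses per grouped period for one team (illustrative only).
actual = np.array([380, 405, 362, 390, 371, 410])

# Under the null hypothesis the team's loss rate is constant over time,
# so the expected losses in every period equal the overall average.
expected = np.full_like(actual, actual.mean(), dtype=float)

statistic = np.sum((actual - expected) ** 2 / expected)
df = len(actual) - 1          # one degree of freedom lost estimating the mean
p_value = chi2.sf(statistic, df)

print(f"chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
</pre>

A small p-value would indicate the team's underlying loss rate has genuinely shifted over time.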

Another method is to compute the average correlation between risks.

  1. Fix the distance, t, between years.
  2. Consider the set of all pairs of losing percentages which are t years apart for a fixed baseball team. Calculate the correlation coefficient for this dataset.
  3. Repeat step 2 for each team and then average the resulting set of correlation coefficients for the fixed t value.
  4. Repeat steps 2 and 3 for new t values.

For reference, the correlation coefficient is defined as [math]\frac{Cov(X,Y)}{\sigma_X\cdot\sigma_Y} =\frac{E\left[\left(X-\mu_X\right)\left(Y-\mu_Y\right)\right]}{\sigma_X\cdot\sigma_Y}[/math].
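
Here is a minimal sketch of the averaging procedure above (Python; the team data is invented for illustration and the helper name is ours):

<pre>
import numpy as np

def avg_correlation(losing_pct_by_team, t):
    # For each team, correlate losing percentages that are exactly t years
    # apart, then average the correlations across teams.
    corrs = []
    for pcts in losing_pct_by_team:     # one array of annual losing % per team
        x, y = pcts[:-t], pcts[t:]      # all pairs separated by t years
        corrs.append(np.corrcoef(x, y)[0, 1])
    return np.mean(corrs)

# Illustrative data: two teams, ten years of losing percentages each.
teams = [
    np.array([0.45, 0.48, 0.52, 0.50, 0.47, 0.55, 0.53, 0.49, 0.46, 0.51]),
    np.array([0.60, 0.58, 0.55, 0.57, 0.54, 0.52, 0.50, 0.53, 0.56, 0.54]),
]
for t in (1, 2, 3):
    print(t, round(avg_correlation(teams, t), 3))
</pre>

Note (as discussed next) that small observed correlations are not necessarily statistically significant.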

Since we are given a lot of years of baseball data, for small t values the set of pairs t years apart is relatively large. As t increases, the volume of data decreases. Mahler notes that a non-zero observed correlation is not necessarily statistically significant as the 95% confidence interval about 0 is approximately [math]\pm0.1[/math] and this increases as the number of data points decreases.

Looking at the results shown in the paper, we observe the correlation method yields correlations which are significantly greater for years that are closer together (small t values) than for years further apart. In fact, after approximately 10 years there is no distinguishable correlation between years. This leads Mahler to conclude that recent years are more useful than older years for predicting the future.

Let's see how the CAS might test this in an integrative question. Read through the following solution to 2018 Q1(a) and at a later date make sure you can fill in all the details from memory. Insert 2018.Q1a PDF

Statement of the Problem

Let X be the quantity we want to estimate. Let Yi be various known estimates (the estimators) and estimate X as a weighted linear combination of the estimators, i.e. [math]X=\sum_i Z_i\cdot Y_i[/math]. The goal is to find the optimal set of weights {Zi} that produces the best estimate of the future losing percentage.

Four Simple Solutions

Mahler eventually covers six solutions to the problem. We'll follow in his footsteps and begin with four easy options.

  1. Assume every risk is average, so set the predicted mean equal to 50%. This assumes every game has a winner (ties are negligible). It also ignores all historical data, so gives 0% credibility to the data.
  2. Assume the previous year repeats itself. This gives 100% credibility to the previous year.
  3. Credibility weight the previous year with the grand mean.
    • The grand mean is some external estimate that is independent of the data. In this case it is 50% because we assume equal likelihood of a win or loss.
    • Cases 1 and 2 are special cases of this, corresponding to 0% and 100% credibility respectively.
  4. Give equal weight to the most recent N years of data.
    • This can be further extended by calculating the credibility, Z of the N years of data and then giving each prior year weight [math]\frac{Z}{N}[/math] and weight [math]1-Z[/math] to the grand mean.

There are various choices for determining the credibility used in the fourth option. Bühlmann, Bayesian or classical limited fluctuation credibility methods could be used.
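
Here is a minimal sketch of the four options (Python; the loss history, the grand mean of 50%, and the credibility Z = 30% are all assumed purely for illustration):

<pre>
import numpy as np

# Ten hypothetical years of losing percentages for one team, oldest first.
history = np.array([0.55, 0.53, 0.56, 0.52, 0.54, 0.51, 0.53, 0.50, 0.52, 0.49])
grand_mean = 0.50        # external prior estimate: every risk is average

# Method 1: assume the risk is average -> 0% credibility to the data.
est_1 = grand_mean

# Method 2: assume last year repeats itself -> 100% credibility to it.
est_2 = history[-1]

# Method 3: credibility-weight the latest year against the grand mean.
Z = 0.30                 # assumed credibility, for illustration only
est_3 = Z * history[-1] + (1 - Z) * grand_mean

# Method 4: equal weight Z/N to each of the most recent N years,
# with the balance 1 - Z going to the grand mean.
N = 5
est_4 = (Z / N) * history[-N:].sum() + (1 - Z) * grand_mean

print(est_1, est_2, round(est_3, 4), round(est_4, 4))
</pre>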

To see how the CAS can test this material in an integrative question, read through the following PDF solution to 2018 Q1(b). Insert 2018.Q1b PDF At a later date, you should return to this exam and see if you can fill in all the details from memory.

Three Criteria for Deciding Between Solutions

  1. Least squares credibility (Bühlmann and Bayesian methods): minimize the mean squared error.
    • The sum of squared errors (SSE) is defined as [math]SSE=\sum_{\mbox{team}}\left(\sum_{\mbox{years}}\left(X_{est,team}-X_{actual,team}\right)^2\right)[/math]
    • The mean squared error is defined as [math]\frac{SSE}{\# \mbox{ teams}\cdot\#\mbox{ years}}[/math].
  2. Small chance of large error (classical credibility)
    • When the criterion is met, there is probability P that the estimate departs from the expected mean by no more than k percent.
    • We can't directly observe a team's true loss potential, and it varies over time, so simply averaging over time doesn't give the correct quantity to compare against.
    • Rephrased: we're looking for credibilities which minimize [math]\mbox{Pr}\left(\frac{|X_{est,team}-X_{actual,team}|}{X_{est,team}}\gt k\%\right)[/math], where k% is some predetermined threshold.
  3. Meyers/Dorweiler Credibility:
    • Calculate the correlation, using the Kendall tau statistic, between two quantities: the ratio of actual losing percentage to predicted losing percentage, and the ratio of the predicted losing percentage to the overall average losing percentage. A correlation near zero indicates the plan is performing well. (A short numerical sketch of all three criteria follows this list.)
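
A minimal numerical sketch of the three criteria (Python; the predicted and actual values and the 5% threshold are invented for illustration):

<pre>
import numpy as np
from scipy.stats import kendalltau

# Illustrative predicted and actual losing percentages (teams x years).
predicted = np.array([[0.52, 0.54, 0.51], [0.47, 0.46, 0.48]])
actual    = np.array([[0.55, 0.50, 0.53], [0.44, 0.49, 0.47]])
grand_mean = 0.50

# Criterion 1: mean squared error over all teams and years.
sse = np.sum((predicted - actual) ** 2)
mse = sse / predicted.size

# Criterion 2: chance of a relative error larger than a threshold k.
k = 0.05
prob_large_error = np.mean(np.abs(predicted - actual) / predicted > k)

# Criterion 3 (Meyers/Dorweiler): Kendall tau between actual/predicted
# and predicted/grand-mean ratios; values near zero are desirable.
tau, _ = kendalltau((actual / predicted).ravel(),
                    (predicted / grand_mean).ravel())

print(round(mse, 5), round(prob_large_error, 2), round(tau, 3))
</pre>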

Section 8 of Mahler's paper applies each of these methods in turn to the baseball data. It's best given a quick skim-read. One key takeaway is found in Section 8.3: As the number of previous years, N, used as estimators increases, the credibility of the entire N year period decreases when using Bühlmann or Bayesian techniques. This is counterintuitive at first because actuaries typically expect credibility to increase as the volume of data is increased. However, Mahler points out the result is actually what we should expect given the parameters are shifting significantly over time. This effect is also seen using classical credibility but isn't seen when Meyers/Dorweiler credibility is used.

A question Mahler considers in Section 8.5 is: what constitutes a significant reduction in the mean squared error when using Bühlmann or Bayesian techniques? In the appendices (Alice: "Remember, you're not tested on the appendices..."), Mahler derives a theoretical limit on the reduction in mean squared error that can be achieved by credibility weighting two estimates. The limit is relative to the smaller of the squared errors resulting from placing either 100% or 0% weight on the data. When N years of data are used and the distributional parameters do not shift over time, the best possible reduction in squared error is [math]\frac{1}{2\cdot(N+1)}[/math]. For [math]N=1[/math] this is a 25% reduction, or, as stated in the paper, the credibility weighted estimate has a mean squared error that is at least 75% of the mean squared error from giving either 100% or 0% weight to the data. Mahler uses this to conclude that his measured reduction of the squared error to 83% for the baseball data set is significant.

Since there is less benefit to including older data (due to shifts in the distributional parameters), delays in receiving the data result in a reduction in the accuracy of experience rating. As expected, the mean squared error increases rapidly at first and then tapers off (increases at a decreasing rate) as the data delay widens. Consequently, the credibility of the data decreases at a decreasing rate as the time delay between the latest available data and the prediction time increases.

More General Solutions

We now cover the last two methods for assigning credibility weights.

Method 5: Single exponential smoothing

Here we assign geometrically decreasing weights. The latest year gets weight Z, the next year weight [math]Z\cdot(1-Z)[/math], and so on, so the k-th most recent year gets weight [math]Z\cdot(1-Z)^{k-1}[/math]. The second-to-oldest year therefore gets weight [math]Z\cdot(1-Z)^{n-2}[/math], while the oldest year gets weight [math](1-Z)^{n-1}[/math]. Note that the weights sum to 1.

Method 5 is an example of a mathematical filter. The usual approach of giving a set of data credibility Z and the complement, 1-Z, to the prior estimate is an application of this with [math]n=1[/math].
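
A minimal sketch of the geometric weights (Python; Z = 0.4 and n = 5 are arbitrary illustrative choices and the function name is ours):

<pre>
import numpy as np

def smoothing_weights(Z, n):
    # Most recent year first: Z, Z(1-Z), ..., Z(1-Z)^(n-2), then the
    # oldest year takes the remainder (1-Z)^(n-1).
    w = [Z * (1 - Z) ** k for k in range(n - 1)]
    w.append((1 - Z) ** (n - 1))
    return np.array(w)

w = smoothing_weights(Z=0.4, n=5)
print(np.round(w, 4), w.sum())   # the weights sum to 1
</pre>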

Method 6:

Generalizing method 5, we assign weights [math]Z_1,\ldots,Z_N[/math] to the N years of data and then the balance, [math]1-\sum_i Z_i[/math], to the prior estimate, M.

Mahler recommends choosing M to be the grand mean, that is the mean of the entire dataset.

The more years of data used, the harder it is to determine an optimal set of weights [math]Z_1,\ldots,Z_N[/math] for this method.

Section 10 of the paper applies the credibility techniques to the 5th and 6th methods. Only the latest year of data and the previous estimate are considered for method 5. The credibilities aren't particularly sensitive for Bühlmann/Bayesian or limited fluctuation credibility. However, Meyers/Dorweiler credibility suggests optimal results are obtained using low Z values.

The more general weights used in the 6th method are assessed numerically for least squares credibility only, due to the complexity of the calculations. Mahler looks at using two or three years of data and observes that most of the weight is placed on the most recent year. It's recommended you skim the results in the paper.

Equations for Least Squares Credibility

Reading the conclusion of the Mahler paper makes this section feel like it should be the most testable. However, to date it appears there haven't been any exam questions on it...

The variance of a dataset may be split into two parts: the variance between risks, which Mahler denotes by [math]\tau^2[/math], and the variance within risks, which Mahler writes as [math]\delta^2+\zeta^2[/math]. Here, [math]\delta^2[/math] is the process variance and [math]\zeta^2[/math] is the parameter variance (variance due to changes in the distribution parameters over time). This subdivision of the within-risk variance isn't necessary for computing the credibilities here.

Mahler applies method 6, that is, year Xi is given weight Zi and the balance is assigned to the grand mean M for the entire dataset. This leads to the following equation for the expected squared error between the prediction and the observation: [math]\displaystyle\sum_{i=1}^N\sum_{j=1}^N Z_iZ_j(\tau^2+C(|i-j|)) - 2\sum_{i=1}^N Z_i(\tau^2+C(N+\Delta-i)) +\tau^2 +C(0)[/math], where [math]\tau^2[/math] is the between variance, C(k) is the covariance of data for the same risk k years apart (the within covariance), C(0) is the within variance, Δ is the length of time between the latest year of data used and the year being estimated, and N is the number of years of data being used.

This is viewed as a second degree polynomial for the credibility weights Zi. Most likely you would be given this equation on the exam if it was needed.
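
A minimal sketch transcribing the formula above into Python (the covariance function, τ², weights, and delay below are illustrative assumptions, not Mahler's values):

<pre>
def expected_squared_error(Z, tau2, C, delta):
    # Z = (Z_1, ..., Z_N) are the credibility weights, tau2 the between
    # variance, C(k) the within covariance k years apart, delta the delay.
    N = len(Z)
    err = tau2 + C(0)
    for i in range(1, N + 1):
        err -= 2 * Z[i - 1] * (tau2 + C(N + delta - i))
        for j in range(1, N + 1):
            err += Z[i - 1] * Z[j - 1] * (tau2 + C(abs(i - j)))
    return err

# Illustrative use: three years of data, a two-year delay, toy covariances.
C = lambda k: 0.0013 if k == 0 else 0.0006 - 0.00005 * (k - 1)
print(expected_squared_error(Z=[0.2, 0.3, 0.4], tau2=0.001, C=C, delta=2))
</pre>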

Setting the partial derivatives of the above equation with respect to each weight to zero leads to N equations in N unknowns, so the system can be solved exactly. In particular, it can also be solved in the special case where the total assigned credibility is distributed uniformly across the N years. Note that this method can produce credibilities which do not decline monotonically and can even be negative. Negative weights allow more emphasis to be given to other years, with an associated reduction in expected squared error.

The credibilities may also be restricted so they sum to 1 and hence place no weight on the grand mean. The resulting system of equations may be solved using Lagrange multipliers. In the paper, Mahler's results show the most recent year receives the greatest credibility. However, a monotonic decline is not guaranteed, nor are the weights necessarily all greater than zero.

At the end of Section 11, Mahler notes his results are reasonable for estimating the least squares credibility in baseball. However, the transfer of these results to another area is dependent on the covariance structure of the proposed dataset. Details are discussed in the appendix which is out of scope for the exam.

The three types of credibility considered in Mahler's paper can be grouped into two categories. Bühlmann/Bayesian credibility and limited fluctuation credibility both attempt to eliminate large errors. Bühlmann/Bayesian credibility achieves this because large errors have a disproportionate impact on the sum of squared errors. Limited fluctuation credibility does so by minimizing the probability of errors greater than a selected threshold.

However, Meyers/Dorweiler credibility may be viewed as concerned with the pattern of the errors. Large errors are not a problem as long as there is no pattern relating them to the experience rating modifications. Mahler concludes it's important to understand what any criterion tests and that there are hazards to solely relying on any one method.

Mahler's Ratemaking Example

At the end of the paper, Mahler gives a brief overview of an application to ratemaking. The idea is to combine the five most recent annual loss ratios to produce a rate level indication, i.e. [math]N=5[/math]. We're told to assume there are three years between the latest year of data and the future average date of loss, i.e. [math]\Delta=3[/math]. By assuming a covariance structure, placing no weight on the grand mean (because in ratemaking, what would our best guess for the grand mean be?), and requiring the credibilities to sum to 1, the method of Lagrange multipliers gives N equations for the N credibilities (plus the sum-to-one constraint and the multiplier λ). The equations in Mahler's example are:

  • [math]Z_1C(0)+Z_2C(1)+Z_3C(2)+Z_4C(3)+Z_5C(4)=C(7)+\frac{\lambda}{2} [/math]
  • [math]Z_1C(1)+Z_2C(0)+Z_3C(1)+Z_4C(2)+Z_5C(3)=C(6)+\frac{\lambda}{2} [/math]
  • [math]Z_1C(2)+Z_2C(1)+Z_3C(0)+Z_4C(1)+Z_5C(2)=C(5)+\frac{\lambda}{2} [/math]
  • [math]Z_1C(3)+Z_2C(2)+Z_3C(1)+Z_4C(0)+Z_5C(1)=C(4)+\frac{\lambda}{2} [/math]
  • [math]Z_1C(4)+Z_2C(3)+Z_3C(2)+Z_4C(1)+Z_5C(0)=C(3)+\frac{\lambda}{2} [/math]

This system can be solved using standard linear algebra techniques once the C(k) are known. The C(k) describe the covariance structure between times that are k years apart. Here's the data used in Mahler's example

Separation in Years    Covariance (units of 0.00001)
0 130
1 60
2 55
3 50
4 45
5 40
6 35
7 30

We should read this as [math]C(1)=60\cdot 0.00001 =0.0006[/math] for example.

The example Mahler gives is a bit large to expect on the exam. If you solve the equations carefully you should get (in order from the oldest year to the most recent) 11.6%, 13.4%, 17.3%, 23.8%, and 33.9%, with the more recent data getting the larger weights.
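
If you want to check the arithmetic, here is a short Python sketch that builds and solves the system above (the layout of the unknowns is our own; the covariances come from the table above):

<pre>
import numpy as np

# Covariances C(0)..C(7) from the table above, in units of 0.00001
# (the common scale factor does not affect the resulting weights).
C = [130, 60, 55, 50, 45, 40, 35, 30]

N, delta = 5, 3   # five years of data, three years to the prediction date

# Unknowns: Z_1..Z_5 and mu = lambda/2.  Each equation above reads
#   sum_j Z_j C(|i-j|) = C(N + delta - i) + mu,  for i = 1..N,
# and the final row imposes the constraint that the Z_i sum to 1.
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
for i in range(1, N + 1):
    for j in range(1, N + 1):
        A[i - 1, j - 1] = C[abs(i - j)]
    A[i - 1, N] = -1.0                 # move mu to the left-hand side
    b[i - 1] = C[N + delta - i]
A[N, :N] = 1.0
b[N] = 1.0

Z = np.linalg.solve(A, b)[:N]
print(np.round(Z, 3))   # approximately [0.116 0.134 0.173 0.238 0.339]
</pre>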

If this is asked on the exam, it's more likely to be for two or three years. Go ahead and try solving the following problem: Insert Mahler.Rating PDF

Mahler concludes his work is directly applicable to ratemaking situations where the risk is being compared to a broader average, such as when experience rating individual risks or calculating class or territory relativities. While in his baseball example the grand mean was always 50% (nearly every game has a winner and a loser), this isn't necessarily true for insurance applications.

Mahler notes his estimation methods are always balanced. That is, when the dataset is not subdivided (by class or territory for instance), the estimate produces the average for the entire dataset.

Finally (and perhaps most importantly), Mahler concludes that when the distributional parameters are changing over time, less weight should be given to older years of data and it is more important to minimize the delay in receiving the data.

Pop Quiz Answers