GLM_Validation2
Hi, in the solution, for the "Vertical Dist.", I just wanted to confirm that it's not the max minus the min, but always the 10th decile minus the 1st decile?
BTW, I believe there is an error in the formula for "Avg Actual A", as the numbers look a bit small at first glance.
Thank you in advance!
Comments
Yes, for the vertical distance it should be the 10th decile minus the 1st decile. This is because we're ordering the risks from best to worst according to our model. So if the model is performing well, then the largest vertical distance will be between the 10th and 1st deciles. If it's occurring elsewhere then that's a red flag!
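As a rough illustration of the vertical distance check, here is a minimal Python sketch. The decile values are made-up numbers, not the ones in the solution spreadsheet, and the function names are hypothetical. It assumes the distance is measured on the average actual values, with the deciles ordered from best to worst by the model's prediction.

```python
import math

# Hypothetical average actual values by decile (made-up numbers),
# ordered from decile 1 (best risks per the model) to decile 10 (worst).
avg_actual = [0.42, 0.48, 0.55, 0.61, 0.66, 0.73, 0.81, 0.90, 1.02, 1.18]


def vertical_distance(by_decile):
    """10th decile minus 1st decile (last minus first), not max minus min."""
    return by_decile[-1] - by_decile[0]


def max_min_spread(by_decile):
    """Largest gap between any two deciles, wherever they fall."""
    return max(by_decile) - min(by_decile)


def extremes_at_ends(by_decile):
    """True when the largest gap is the one between the 1st and last deciles."""
    return math.isclose(vertical_distance(by_decile), max_min_spread(by_decile))


# Same values with deciles 1-2 and 9-10 swapped: the extremes no longer sit
# at the ends, which is the red flag described above.
shuffled = [0.48, 0.42, 0.55, 0.61, 0.66, 0.73, 0.81, 0.90, 1.18, 1.02]

print(vertical_distance(avg_actual), extremes_at_ends(avg_actual))
print(vertical_distance(shuffled), extremes_at_ends(shuffled))
```

When the two measures agree, the 10th-minus-1st distance is also the largest gap anywhere on the chart. When they disagree, the largest gap is occurring elsewhere.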
You're correct about the error. Our column references had shifted over one too far. I've uploaded a new version which addresses this (and makes a minor tweak to an input so the result remains extreme).
For part b):
1) Predictive accuracy: how did you arrive at the conclusion that "Model B tracks them better than Model A"?
2) For monotonicity: do we refer to "Avg Actual A" to see if the numbers consistently increase?
Great questions, thank you.
1) There is a fair amount of judgment here. What matters most is how you defend your answer. For each model, I looked at the absolute value of the difference between the average predicted and average actual values in each decile (the delta columns in the solution). I then counted the number of deciles in which this difference was smaller for Model A than for Model B. This was 4 out of 10. So I concluded that Model B is generally closer to the actuals than Model A. I also tried looking at the total variation, which is the sum of these absolute values. This measure actually favors Model A, but it can be misleading because a single large miss is all it takes for a model to appear significantly worse than it is.
2) Yes, we should always refer to the average actuals because the way we constructed the deciles means the predicted values will always increase.
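The two checks above can be sketched in Python as follows. All numbers are made up for illustration and are not taken from the solution spreadsheet. The function names are hypothetical, and the sketch assumes each delta is the absolute difference between a decile's average predicted and average actual value. The data is chosen so that Model A is closer in 4 of 10 deciles yet has the smaller total, because Model B has one large miss in the last decile.

```python
# Hypothetical decile averages (made-up numbers). Each model has its own
# deciles, so each has its own average actuals.
pred_a = [95, 108, 126, 164, 181, 203, 225, 255, 268, 346]
actual_a = [100, 120, 135, 150, 170, 190, 215, 240, 280, 330]
pred_b = [106, 111, 126, 140, 159, 184, 224, 232, 265, 280]
actual_b = [98, 125, 133, 152, 168, 195, 210, 245, 275, 340]


def abs_deltas(avg_predicted, avg_actual):
    """Absolute predicted-vs-actual difference for each decile."""
    return [abs(p - a) for p, a in zip(avg_predicted, avg_actual)]


def count_closer(deltas_first, deltas_second):
    """Number of deciles where the first model's delta is strictly smaller."""
    return sum(d1 < d2 for d1, d2 in zip(deltas_first, deltas_second))


def reversals(avg_actual):
    """Decile numbers (1-based) where the average actual drops below the
    previous decile's value, i.e. where monotonicity is broken."""
    return [i + 1 for i in range(1, len(avg_actual))
            if avg_actual[i] < avg_actual[i - 1]]


deltas_a = abs_deltas(pred_a, actual_a)
deltas_b = abs_deltas(pred_b, actual_b)

# Check 1a: count of deciles where Model A is closer than Model B.
a_closer = count_closer(deltas_a, deltas_b)

# Check 1b: sum of absolute deltas, which one large miss can dominate.
total_a, total_b = sum(deltas_a), sum(deltas_b)

# Check 2: monotonicity is judged on the average actuals, not the predictions.
print(a_closer, total_a, total_b, reversals(actual_a), reversals(actual_b))
```

With this made-up data the count favors Model B while the totals favor Model A, which is why the choice of measure needs to be defended. An empty list from `reversals` means the average actuals increase across every decile.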