BattleCard
For the card: "The output of a GLM contains the following "EngineSize:small, BodyType:van". Which variable and category is the associated coefficient relative to?"
Shouldn't the coefficient describe the effect of having BodyType van with a small engine size relative to the effect of having BodyType van with the base class of engine size?
I understand that the discount/relativity for the interaction term is relative to a small engine size with the base class of body type, but a coefficient is different from a discount/relativity.
Comments
The coefficient for "EngineSize:small, BodyType:van" is used to determine the discount associated with having BodyType = Van when you have a small engine. This discount is relative to the risks with small engine size and base body type. So the coefficient is relative to the base class for body type.
The magnitude of the discount for small engine size with body type van can be compared to the size of the discount for the base class engine size with body type van.
We highly recommend working through the algebra in the sprinkler interaction example in the text/wiki if you haven't already. https://battleacts8.ca/8/wiki/index.php?title=Goldburd.Interactions#Interacting_Two_Categorical_Variables
If you replace occupancy class with EngineSize (just assume there are 4 sizes) and sprinkler status with BodyType (assume Van or All Other for simplicity), you can write out the equation for each type of risk and then compute the discounts while holding EngineSize fixed and varying BodyType. This helps illustrate why the coefficients are relative to the base class for BodyType.
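A minimal Python sketch of that exercise, assuming a log link, four hypothetical engine sizes ("Size1" as the base level), a two-level BodyType ("All Other" as the base level, plus "Van"), and made-up coefficient values chosen only for illustration (they are not from the text or the wiki). Holding EngineSize fixed and switching BodyType from All Other to Van, the intercept and the EngineSize main effect cancel, leaving the Van main effect plus that engine size's interaction coefficient.

```python
import math

# Hypothetical coefficients for illustration only (log link assumed).
# Base levels: EngineSize = "Size1", BodyType = "All Other".
INTERCEPT = 0.0
ENGINE = {"Size1": 0.0, "Size2": 0.10, "Size3": -0.20, "Size4": 0.30}
VAN = -0.10  # BodyType:Van main effect
# EngineSize x BodyType:Van interaction; the base engine size has no term.
INTERACTION = {"Size1": 0.0, "Size2": -0.05, "Size3": -0.25, "Size4": 0.05}


def linear_predictor(size: str, is_van: bool) -> float:
    """Sum of the coefficients that apply to one risk."""
    eta = INTERCEPT + ENGINE[size]
    if is_van:
        eta += VAN + INTERACTION[size]
    return eta


def van_relativity(size: str) -> float:
    """Van vs. All Other relativity, holding EngineSize fixed."""
    return math.exp(linear_predictor(size, True) - linear_predictor(size, False))


for size in ENGINE:
    rel = van_relativity(size)
    print(f"{size}: van relativity {rel:.4f} ({1 - rel:.1%} discount)")
```

The EngineSize main effect never appears in the van relativity: it applies to both body types at the same engine size, so it cancels in the difference of linear predictors.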
I understand that the discount calculated from the coefficient is relative to risks with a small engine size and the base body type, but doesn't the coefficient estimate in the GLM output itself describe the magnitude of the discount relative to the size of the discount for the base-class engine size with body type van?
I believe this is also why the wiki, in the sprinkler example, says "The p-value of 0.005 indicates there is a significant difference in the indicated magnitude of the sprinkler discount between class 2 and class 1". The p-value and the coefficients are both outputs of the GLM, and aren't they different from a discount/relativity?
In the interacting-two-categorical-variables example, calculating the sprinkler discount for class 2 requires two GLM coefficients (the sprinklered coefficient and the occupancy class 2: sprinklered yes coefficient). The same is true for class 1, except that one of those coefficients is 0.000 because it's the base level. These discounts are relative to a risk with no sprinklers.
If you're talking about the discounts relative to one another among risks that all have sprinklers, then yes, only one coefficient is needed to calculate it. Keep in mind, though, that we're talking about the discounts in relative rather than absolute terms here, because the overall base risk is class 1 with no sprinklers.
Right, exactly. So isn't the battlecard question "Which variable and category is the associated coefficient relative to?" referring to the interaction term coefficient only, which is exactly what you described as "discounts relative between risks that all have sprinklers," requiring only one coefficient to calculate?
If that is the case, shouldn't the interaction term coefficient be relative to risks with the base-class engine size and body type van, instead of small engine size with the base-class body type (the current battlecard answer)?
Thanks for the clarification.
The battlecard is correct. In this case, the GLM output would contain a separate entry for EngineSize: Small, say -0.300, and suppose the EngineSize: Small; BodyType: Van coefficient is -0.250.
In absolute terms, we have some base class for engine size and some base class for body type. Assuming a log-link function, small engines that aren't also vans get a relativity of e^(-0.300) (~26% discount). Risks that are vans with small engines get a relativity of e^(-0.300 - 0.250), which is about a 42% discount.
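A quick check of that arithmetic in Python, assuming a log link and the hypothetical coefficients from this post. The post gives no BodyType: Van main effect, so the sketch assumes it is 0, as the calculation above implicitly does.

```python
import math

SMALL = -0.300        # hypothetical EngineSize: Small coefficient
SMALL_X_VAN = -0.250  # hypothetical EngineSize: Small; BodyType: Van coefficient
# Assumption: the BodyType: Van main effect is 0, as in the post's arithmetic.

# Relativities relative to the base engine size and base body type.
small_not_van = math.exp(SMALL)
small_van = math.exp(SMALL + SMALL_X_VAN)

print(f"small, not van: {small_not_van:.4f} ({1 - small_not_van:.1%} discount)")
print(f"small, van:     {small_van:.4f} ({1 - small_van:.1%} discount)")
# small, not van: 0.7408 (25.9% discount)
# small, van:     0.5769 (42.3% discount)
```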
So the -0.250 coefficient is allowing us to see how the discount for small engines varies according to body type (relative to the base class for body type).
Using the same coefficients as yours, and further supposing the coefficient for BodyType:Van is -0.1:
Based on the interaction term, we can agree non-small engine size is the base class for engine size and non-van is the base class for body type.
Following the sprinkler example, I calculate the relativity for body type van with a non-small engine size as e^(-0.1), a ~9.5% discount compared to a risk with a non-small engine size that is not a van.
The relativity for a risk that has a small engine and is a van is e^(-0.1-0.25), a 29.5% discount compared to a risk with a small engine that is not a van.
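The same two relativities in Python, assuming a log link and the coefficients from this post (EngineSize:small = -0.3, BodyType:van = -0.1, interaction = -0.25). Both comparisons hold EngineSize fixed, so the EngineSize:small coefficient appears on both sides of the small-engine comparison and cancels.

```python
import math

SMALL = -0.30        # EngineSize:small
VAN = -0.10          # BodyType:van
SMALL_X_VAN = -0.25  # EngineSize:small, BodyType:van


def eta(small: bool, van: bool) -> float:
    """Linear predictor without the intercept, which cancels in every ratio."""
    total = 0.0
    if small:
        total += SMALL
    if van:
        total += VAN
    if small and van:
        total += SMALL_X_VAN
    return total


# Van vs. not van, holding engine size fixed.
van_rel_not_small = math.exp(eta(False, True) - eta(False, False))
van_rel_small = math.exp(eta(True, True) - eta(True, False))

print(f"not small: {van_rel_not_small:.4f} ({1 - van_rel_not_small:.1%} discount)")
print(f"small:     {van_rel_small:.4f} ({1 - van_rel_small:.1%} discount)")
# not small: 0.9048 (9.5% discount)
# small:     0.7047 (29.5% discount)
```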
So, the coefficient -0.25 is measuring the difference in the magnitude of discount relative to the base class for engine size.
If the interaction term is BodyType:Van; EngineSize:Small, then it will be as you described. It seems you are treating Predictor A and Predictor B the other way round.
Please correct me if I am wrong.
Let's formulate the GLM output using our figures:
Coefficient                        Estimate (ignore std. error and p-value for now)
(Intercept)                        c
EngineSize:small                   -0.3
BodyType:van                       -0.1
EngineSize:small, BodyType:van     -0.25
Assume a log-link function is used. Assume EngineSize consists of {"Small", "Not Small"} and BodyType consists of {"Van", "Not Van"}.
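A minimal sketch of that table as a Python dictionary, assuming the log link and two-level variables just described. The intercept c is not given, so the sketch takes it as a parameter (0 below is an arbitrary placeholder); the function returns e^(linear predictor) for each of the four cells.

```python
import math

COEF = {
    "EngineSize:small": -0.30,
    "BodyType:van": -0.10,
    "EngineSize:small, BodyType:van": -0.25,
}


def fitted(c: float, small: bool, van: bool) -> float:
    """exp of the intercept plus every coefficient that applies to the risk."""
    eta = c
    if small:
        eta += COEF["EngineSize:small"]
    if van:
        eta += COEF["BodyType:van"]
    if small and van:
        eta += COEF["EngineSize:small, BodyType:van"]
    return math.exp(eta)


c = 0.0  # placeholder intercept; the relativities do not depend on it
for small in (False, True):
    for van in (False, True):
        label = f"{'Small' if small else 'Not Small'}, {'Van' if van else 'Not Van'}"
        print(f"{label}: {fitted(c, small, van):.4f}")
# Not Small, Not Van: 1.0000
# Not Small, Van: 0.9048
# Small, Not Van: 0.7408
# Small, Van: 0.5220
```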
The relativity for small vans is e^(c-0.3-0.1-0.25).
The BodyType:van coefficient of -0.1 describes the impact of being a van body type on the "Not Small" engine class.
Risks that are vans and not small engine size have relativity e^(c-0.1).
So the effect of being a small van relative to a not-small van is e^(c-0.3-0.1-0.25) / e^(c-0.1) = e^(-0.3-0.25). This leaves only the interaction term, -0.25, and the EngineSize:small term, -0.3, thus indicating how the BodyType variable influences the non-base-class category of the EngineSize variable.
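A short numerical check of that ratio, assuming the same log link and coefficients. Two arbitrary, hypothetical intercepts are tried to show that c drops out of the comparison.

```python
import math

SMALL, VAN, SMALL_X_VAN = -0.30, -0.10, -0.25

for c in (0.0, 2.5):  # arbitrary intercepts
    small_van = math.exp(c + SMALL + VAN + SMALL_X_VAN)
    not_small_van = math.exp(c + VAN)
    ratio = small_van / not_small_van
    print(f"c={c}: ratio {ratio:.4f}, log-ratio {math.log(ratio):.2f}")
# c=0.0: ratio 0.5769, log-ratio -0.55
# c=2.5: ratio 0.5769, log-ratio -0.55
```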
So the coefficient for the interaction term is the effect of being a BodyType:van versus a BodyType:"Not Van" on an EngineSize:small risk. That is, it is the impact of the BodyType variable, not the EngineSize variable.