[Figure: Mixture model example from Wikipedia]
The other day I was emailing with a friend about my new paper, in which I use something called a "mixture model" to analyze the variation in treatment effects. I'm pretty excited about the idea of using mixture models in this way, but my friend was somewhat dismissive.
The idea of a mixture model is pretty straightforward. Imagine that the data we observe come from the four distributions pictured to the right, each equally likely, and that we look at the probability of an outcome less than -2. For the purple type the probability is 0.5, and for the red type it is 0. For the green type it is about 0.04, and for the blue type it is about 0.1. So in our data we should see an outcome less than -2 with probability (0.5 + 0 + 0.04 + 0.1)/4 = 0.16, or about sixteen percent of the time.
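The arithmetic can be checked with a few lines of Python. This is only a sketch: the per-type probabilities are the rough values quoted in the text (using the 0.04 that appears in the sum for the green type), and the variable names are made up for illustration.

```python
# Probability of an outcome below -2 for each hidden type.
# These are the approximate values quoted in the text, not fitted numbers.
p_below = {"purple": 0.5, "red": 0.0, "green": 0.04, "blue": 0.1}

# Assumption from the text: the four types are equally likely.
weights = {name: 1 / len(p_below) for name in p_below}

# The observed probability is the weighted average of the type probabilities.
p_observed = sum(weights[name] * p_below[name] for name in p_below)

print(round(p_observed, 2))
```

The printed value is the sixteen percent we would expect to see in the data.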
The statistical question of interest is this: if we observe outcomes less than -2 sixteen percent of the time, can we use that information to decompose the observed probability into the four underlying types that generate the data? Can we tell from our data how many hidden types there are? Can we tell the proportion of each type? Can we determine the distribution for each type?
The answer is no.
It is easy to see this. Imagine that instead of the purple type having a probability of 0.5 that the outcome is less than -2, it is 0.4, and the probability for the blue type is instead 0.2. The observed probability in the data will again be 0.16. With four equally likely types, we can arbitrarily change the underlying type probabilities as long as they sum to 0.64. All such possibilities are consistent with what we observe. Similarly, we can change the weights and the probabilities in many different ways and still get the sixteen percent we see in the data.
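To make the point concrete, here is a small Python sketch that evaluates a few different combinations of weights and type probabilities. The first two are the ones described in the text; the last two (a two-type mixture with unequal weights, and a single type) are hypothetical alternatives invented purely for illustration, as is the helper name mixture_prob.

```python
def mixture_prob(weights, probs):
    """Observed probability implied by type weights and per-type probabilities."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    return sum(w * p for w, p in zip(weights, probs))

equal = [0.25] * 4  # four equally likely types: purple, red, green, blue

candidates = {
    "original": (equal, [0.5, 0.0, 0.04, 0.1]),
    "purple 0.4, blue 0.2": (equal, [0.4, 0.0, 0.04, 0.2]),
    # Hypothetical alternatives, not taken from the figure:
    "two types, unequal weights": ([0.8, 0.2], [0.1, 0.4]),
    "single type": ([1.0], [0.16]),
}

results = {name: mixture_prob(w, p) for name, (w, p) in candidates.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Every candidate produces the same observed probability, so this one number cannot tell us how many types there are, what their weights are, or what their distributions look like.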
OK, so if we can't decompose our observed data into the underlying distributions, what is the point?
The interesting thing about these models is that with certain information and certain assumptions about how the data is generated, it is possible to decompose the data into the underlying distributions.
Unfortunately, it is very common to estimate these models when the data do not contain enough information to do the decomposition. Often the resulting decomposition comes from arbitrary and non-credible assumptions made by the researcher rather than from any actual information in the data. Worse, it is often unclear how much of what we learn about the distribution is due to information in the data and how much is due to the researcher's arbitrary assumptions.
In 1977, the mathematical statistician Joseph Kruskal worked out, in this paper, sufficient conditions for the data to provide enough information for the observed distribution to be decomposed into the underlying distributions. That is, Kruskal presented a set of conditions under which the data, and not the arbitrary assumptions of the researcher, provide enough information for the decomposition. More recently, in this paper, the signal engineer Nikos Sidiropoulos and co-authors presented necessary conditions on the data for the decomposition to be possible.
My new paper supposes that there are different types of people, where not only may these different types have different outcomes, but the treatment being tested may also have different effects on them. When we test a new drug using randomized controlled trials, we generally present the results aggregated over the different types. If we find that the drug increases survival, we do not know if it increases survival for some people, most people, or all people. The objective of the statistical analysis is to uncover the different hidden types and ultimately to target particular drugs to particular sub-groups of the population. The hope is that this can be done without resorting to arbitrary and non-credible assumptions. My friend remains skeptical.