Finite mixture models seem to hold much promise. They are currently used for everything from predicting which movie you would like to watch to analyzing gene-expression data (see Geoffrey McLachlan's page). I am interested in how they can be used to measure variation in treatment effects (see previous post).

The question here is whether there are different types of stage 3 colon cancer patients. The ultimate question is whether different types of patients should receive different therapy, but for now it is simply whether the patients differ.

The two charts are created from the Moertel et al. (1990) data on adjuvant chemotherapy for stage 3 colon cancer patients. See more on this data here. The estimation method is "npEM", a kernel-density-based "EM-like" algorithm in the R package "mixtools". I added the calculation of the bounds myself.
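For readers who want to see the mechanics without R, here is a rough Python sketch of the idea behind an npEM-style fit. It is not the mixtools implementation, and it does not compute the bounds. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, the starting values (split by the first column), the fixed iteration count, and the names `npem_sketch` and `weighted_kde` are all my own assumptions. The key feature it shares with npEM is that each type's density is the product of one nonparametric density per signal, re-estimated each round with weights equal to the current posterior type probabilities.

```python
import numpy as np


def weighted_kde(points, data, weights, bandwidth):
    """Gaussian kernel density estimate at `points`, with each data point
    weighted by `weights` (which should sum to 1)."""
    u = (points[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernel @ weights / bandwidth


def npem_sketch(X, n_components=2, n_iter=50):
    """Rough npEM-style fit for an (n, d) array X whose d columns are assumed
    independent within each latent type. Returns (mixing weights, posteriors)."""
    n, d = X.shape
    # Assumption: Silverman's rule-of-thumb bandwidth per column, pooled over types.
    bandwidths = 1.06 * X.std(axis=0) * n ** (-1 / 5)
    # Assumption: start by splitting patients into equal-sized groups by column 0.
    ranks = X[:, 0].argsort().argsort()
    z = np.eye(n_components)[(ranks * n_components) // n]
    for _ in range(n_iter):
        # Mixing weights are the average posterior type probabilities.
        lam = z.mean(axis=0)
        dens = np.ones((n, n_components))
        for k in range(n_components):
            w = z[:, k] / z[:, k].sum()
            for j in range(d):
                # Type-k density of column j, weighted by posterior membership;
                # multiplying across columns encodes conditional independence.
                dens[:, k] *= weighted_kde(X[:, j], X[:, j], w, bandwidths[j])
        # E-step-like update: posterior probability of each type for each patient.
        num = lam * dens
        z = num / num.sum(axis=1, keepdims=True)
    return z.mean(axis=0), z
```

In the colon cancer setting, `X` would have two columns: days of survival and number of positive lymph nodes. One caveat: the sketch treats survival as fully observed, whereas in the trial many survival times are cut off by the end of follow-up.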

The first chart shows the probability distribution over days of survival for two latent types of patients, Type Green and Type Red. The separation between the lines for each type reflects the fact that the estimation method cannot determine the distribution exactly; it can only bound it. The second chart presents the distribution over the number of positive lymph nodes for Type Green and Type Red patients.

These charts are based on a finite mixture model that assumes there are at most two types of patients (fewer is allowed). The model also assumes there are two "signals" of a patient's latent type: the number of days of survival after entering the trial, and the number of positive lymph nodes the patient has.
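In symbols (the notation is mine), write S for days of survival, N for the number of positive lymph nodes, and λ for the share of Type Green patients. The model says the joint distribution of the two signals is a weighted mixture of the two types, and that within each type the signals are independent:

```latex
f(s, n) = \lambda \, f_G(s) \, g_G(n) + (1 - \lambda) \, f_R(s) \, g_R(n), \qquad 0 \le \lambda \le 1
```

Here f_G and f_R are the survival distributions for each type, and g_G and g_R are the distributions of the lymph node count. Setting λ to 0 or 1 gives the one-type case.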

The model assumes that although survival and the number of positive lymph nodes are statistically related, neither causes the other. Instead, the unobserved type drives the relationship: once we know a patient's type, survival and the number of positive lymph nodes are independent. See American Cancer Society on lymph nodes and cancer. This assumption is potentially testable using this data. We can estimate the same model separately in the other two treatment arms, and because patients were randomly assigned, the mixture of latent types should be the same across all arms.
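A small simulation shows how a latent type can create a strong correlation between two signals that are unrelated within each type. Everything here is hypothetical: the 50/50 type split, the exponential survival times, the Poisson lymph node counts, and their means are illustrative choices, not estimates from the Moertel et al. data.

```python
import numpy as np


def simulate_two_types(n, share_green, rng):
    """Hypothetical data in which the latent type drives both signals,
    but the signals are independent once the type is known."""
    is_green = rng.random(n) < share_green
    # Illustrative parameters only, not estimates from the trial.
    mean_days = np.where(is_green, 3000.0, 800.0)
    mean_nodes = np.where(is_green, 2.0, 8.0)
    days = rng.exponential(mean_days)
    nodes = rng.poisson(mean_nodes)
    return is_green, days, nodes


rng = np.random.default_rng(42)
is_green, days, nodes = simulate_two_types(20_000, 0.5, rng)

# Pooled over types, survival and node count are clearly negatively correlated...
overall = np.corrcoef(days, nodes)[0, 1]
# ...but within each type the correlation is close to zero.
within_green = np.corrcoef(days[is_green], nodes[is_green])[0, 1]
within_red = np.corrcoef(days[~is_green], nodes[~is_green])[0, 1]
print(f"overall: {overall:.2f}, Green only: {within_green:.2f}, Red only: {within_red:.2f}")
```

Running the same simulation with a different treatment effect on survival but the same `share_green` mimics the randomized arms: a correctly specified mixture model should recover a similar share of each type in every arm.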

The charts use the "observation" arm of the trial. Even under the assumptions above, the data from this arm cannot perfectly distinguish the latent types. Still, the charts suggest that there are at least two types of stage 3 colon cancer patients.

Type Green patients tend to live longer (in fact, the trial ends before most of these patients die) and generally have fewer than five positive lymph nodes. Type Red patients generally have lower survival probabilities and tend to have many more positive lymph nodes.

While this analysis is very rough, it raises an important question: should all stage 3 colon cancer patients receive adjuvant chemotherapy? We know from the trial results that the "average" patient lives longer with adjuvant chemotherapy. But is that average patient Red, Green, or some mixture of both?
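To see why the average can hide a lot, note that the average effect of chemotherapy is a weighted average of the type-specific effects, with weights equal to the share of each type. With λ the share of Type Green patients and purely hypothetical effects (not estimates from the trial) of a 100-day loss for Type Green and a 500-day gain for Type Red:

```latex
\Delta_{\text{avg}} = \lambda \, \Delta_G + (1 - \lambda) \, \Delta_R
\quad\Longrightarrow\quad
0.5 \times (-100) + 0.5 \times 500 = 200 \text{ days}
```

A positive average gain of 200 days would then coexist with a loss for every Type Green patient.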