Chris On Statistics: 2014

Monday, July 7, 2014

RCTs are Necessary to Determine Causality. Right? Wrong.

It seems to be a widely held belief that randomized control trials are preferable if one is interested in determining causality. Causality is something of a slippery concept, but let us use the following definition: Drug A causes an increase in survival if drug A increases survival for at least ONE person relative to drug B.

I think this is a minimal requirement. If it is not true that drug A increases survival for at least one person relative to drug B, then drug A certainly does not cause an increase in survival.

If we observe the proportion of patients who survive on drug A and the proportion of patients who survive on drug B, can we determine if drug A causes an increase in survival?

No. We don't have enough information.

If more patients who take drug A live longer than patients who take drug B, we cannot (without more information) determine why they lived longer and whether it had anything to do with drug A.

What is the minimum amount of information we would need to determine if drug A causes greater survival? The answer is somewhat technical, but there are two cases where we could make the determination.

The first case is where we observe patient survival for a representative group of patients who only had drug B available. For example, prior to 2004 stage III colon cancer patients received 5FU as adjuvant therapy as oxaliplatin had yet to be approved by the FDA.

The second case is where we are willing to assume that patients are assigned to the drug for which their survival probability is higher. This is what is called a "behavioral" assumption. Such assumptions are relatively common in economics, but generally frowned upon outside of economics.

In both of these cases it is possible to determine whether drug A causes greater survival relative to drug B by simply looking at the probability of survival between those patients that take drug A and those patients that take drug B.

It is not necessary to have some ideal randomized control trial if we are able to observe survival for a group of representative patients who had no access to drug A or if we are willing to assume that patients are not assigned to the drug that is more likely to kill them.

Tuesday, June 17, 2014

Solving the Wrong Problem

For the last 100 years or so statisticians and econometricians have spent all of their energy solving the wrong statistical problem.

In statistics we are interested in determining what happens to some outcome of interest (Y) after making a change to some other observable variable (X). For example, we are interested in increasing survival from colon cancer (Y to Y') using some new drug treatment (X to X'). The problem is there is some unobservable characteristic of the patient or the drug or both (U) that may determine both patient survival and the use of the new drug treatment.

There are two statistical problems.

The first one, the one we spend all our energy on, is called "confounding." In the picture above this problem is represented by the line running from U to X. The unobserved characteristic of the patient is determining the treatment the patient receives. In this paper, there seems to be a tendency for oxaliplatin to be given to healthier patients which may explain the survival difference between the oxaliplatin group and the non-oxaliplatin group. To solve this problem we spend millions and millions of dollars every year to run randomized control trials. In economics, we devise fancy and clever ways of overcoming the confounding with instrumental variables.

The second problem doesn't really have a name. I will call it "mediating." This problem is represented in the picture by the line from U to Y. The problem is that there may be some unobserved characteristic of the patient that is mediating the effect of the treatment on the patient's survival. Oxaliplatin may have greater effect on survival for younger patients relative to older patients (see here). We do not spend any money or much time devising ways to solve this problem. In fact, we often give up before we start by saying that it is impossible because it is not possible to observe the same patient's outcome under two different treatments.

The problem with spending all our time on the first problem is that once we solve it we are still no closer to solving the second problem and we still don't know what will happen to any patient when given the treatment being studied.

In the graph to the left, the outcome (Y) is a function of both the treatment (X) and the unobserved patient characteristic (U). Although the experiment removes the line between U and X, the line between U and Y remains.

We can conduct as many experiments as we like and still be no closer to knowing what will happen when we give the treatment to a new patient because we don't know anything about that patient's unobserved characteristics.
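A small worked example may make this concrete. The sketch below uses made-up patient counts (not taken from any trial) to build two hypothetical populations of 100 patients. Each patient has a hidden type U that fixes whether they survive on the old treatment and on the new one. The names `WORLD_1`, `WORLD_2`, `trial_results` and `helped_and_harmed` are purely illustrative.

```python
# Hypothetical populations of 100 patients each (made-up counts, for illustration).
# Each hidden type U is a pair (survives_on_old, survives_on_new).
WORLD_1 = {(1, 1): 50, (0, 0): 40, (0, 1): 10, (1, 0): 0}
WORLD_2 = {(1, 1): 20, (0, 0): 10, (0, 1): 40, (1, 0): 30}


def trial_results(world):
    """Survivors per 100 patients that an ideal RCT would show for each arm."""
    old = sum(n for (y_old, _), n in world.items() if y_old == 1)
    new = sum(n for (_, y_new), n in world.items() if y_new == 1)
    return old, new


def helped_and_harmed(world):
    """Counts of patients helped and harmed; never observable in trial data."""
    return world[(0, 1)], world[(1, 0)]


for name, world in (("world 1", WORLD_1), ("world 2", WORLD_2)):
    old, new = trial_results(world)
    helped, harmed = helped_and_harmed(world)
    print(f"{name}: old arm {old}, new arm {new}, helped {helped}, harmed {harmed}")
```

Both worlds give identical arm-level survival in the experiment, yet in the first nobody is harmed by the new treatment and in the second 30 of 100 patients are. The experiment alone cannot tell them apart.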

Saturday, June 14, 2014

I Love Polynomials

The statistician, Andrew Gelman, hates polynomials. I love them.

Polynomials are really very cool and they have a lot of very nice properties.

One of the most important properties they have in statistics is that they form a "ring". This is an algebraic term meaning that any two elements of the set may be added, subtracted or multiplied and the corresponding outcome is also an element of the set. Fractions form a ring. If you take any fraction and multiply it by any other fraction you get a fraction (a rational number). Numbers (natural numbers) do not form a ring. 4 - 5 = -1 is not a natural number.

What is the big deal about rings?

The big deal is that if we have a set of continuous functions on a closed and bounded space that form a ring (and that set includes the constant functions and can tell any two points of the space apart), then that set can approximate any (that is any and all) continuous function on the aforementioned space. In English: if we want to approximate a continuous function, then we can't do any better than to use a polynomial. For those interested, check out the Stone-Weierstrass Theorem.
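To see the approximation claim at work, here is a minimal sketch using Bernstein polynomials, one classical constructive route to Weierstrass's theorem on the interval [0, 1]. This is not least squares and not the method of any paper mentioned here. The target function `kinked` and the grid size are arbitrary choices made for illustration.

```python
import math


def bernstein(f, degree, x):
    """Evaluate the Bernstein polynomial of f of the given degree at x in [0, 1]."""
    return sum(
        f(k / degree) * math.comb(degree, k) * x**k * (1 - x) ** (degree - k)
        for k in range(degree + 1)
    )


def max_error(f, degree, grid_points=101):
    """Largest absolute gap between f and its Bernstein polynomial on an even grid."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return max(abs(f(x) - bernstein(f, degree, x)) for x in grid)


def kinked(x):
    """A continuous but non-smooth target function (an arbitrary choice)."""
    return abs(x - 0.5)


for degree in (10, 50, 200):
    print(degree, round(max_error(kinked, degree), 4))
```

The worst-case error shrinks as the degree grows, even though the target has a kink that no single polynomial can match exactly.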

Polynomials are also unbelievably easy to estimate. We can just do ordinary least squares regression and voilà.

So any function you want to estimate can be approximated with a polynomial and they are really easy to estimate. What is not to love?

How do you know that the polynomial you are estimating is really equal to the function you are interested in approximating? Well. You don't.

Gelman points out that there is a tendency to estimate very high order polynomials and he discusses the implications in this unpublished working paper. The problem is that the data may not allow such polynomials to be identified. The result is that many important coefficient estimates are simply made up numbers.

I show in this paper that if you have a sequence of polynomials and a data set with enough information to accurately estimate each polynomial in the sequence, then that sequence converges to the function of interest. I also show in a Monte Carlo experiment (with made up numbers) that if there is not enough information in the data to accurately estimate a high order polynomial, the approximation error is very large.

The corollary is that if you have a polynomial that is a poor approximation of the function of interest then there is not enough information in the data to accurately estimate the high order polynomial.

Sunday, June 8, 2014

RCTs are Like Looking for Money Under the Street Light

Mutt and Jeff (June 3 1942)

Rubin (1974) argues that we should prefer randomized control trials (RCTs) to observational data because the "causal effect" of the treatment is measured from the RCT data under mild assumptions.

Rubin defines the "causal effect" of a new treatment as the difference between the outcome of the patient when she is given the new treatment and the outcome of the patient if she had received the current standard of care. Rubin acknowledges that the difference is not observed and is in fact unobservable. A patient can only ever receive one treatment and so it is not possible to observe the outcome in two alternative treatments.

Like the man in the top hat, Rubin suggests looking for the information in the light. In Rubin's case the "light" is provided by the RCT, which measures the average treatment effect under mild assumptions. Rubin argues that the average treatment effect is a measure of the difference in outcomes for the "typical" patient. If we take "typical" to mean that it is true for some reasonably sized group of patients, then there is no reason to believe that the "typical treatment effect" will even have the same sign as the average treatment effect. The average treatment effect averages over the difference in outcomes for each of the patients. If some patients benefit from the treatment and some patients are harmed by the treatment then the average treatment effect may be positive or negative depending on the relative sizes of the two patient groups and the relative sizes of the benefits or harms.

If the average treatment effect is positive then we know for certain that there exists at least one patient for whom the new treatment was better than the existing treatment. That is it. That is all we know for certain. It may be that all patients are better off with the new treatment or it may be that (almost) all patients are worse off with the new treatment. The average treatment effect is observed by the light of the RCT but it tells us very little about what we are looking for.

Saturday, May 31, 2014

If Wishes Were Horses, Beggars Would Ride

The National Cancer Institute (of the NIH) announced this week a large reorganization of its clinical trial system. As part of the reorganization it announced smaller budgets for running clinical trials. This reorganization has been coming down the pike for a while now and the smaller budgets are a matter of fact given reduced funding from Washington.

What I found disturbing in the announcement was the repeated claim that new technologies in oncology drugs would reduce the need for large clinical trials.

According to the announcement

Although the screening tests may need to be performed on very large numbers of patients to find those whose tumors exhibit the appropriate molecular profile, the numbers of patients required for interventional studies are likely to be smaller than what was required in previous trials. That is because the patient selection is based on having the target for the new therapy, leading to larger differences in clinical benefit (such as how long patients live overall or live without tumor progression) between the intervention and control groups.

It is true that breakthrough advances such as the AIDS cocktail or Gleevec can show themselves to be enormously effective even in small trials, but that doesn't mean that we should expect all new drugs or treatments coming into development to be breakthroughs.

It is unclear to me why we should expect that targeted therapies should have larger effects on survival. I see why we should expect targeted therapies to be more targeted and thus only likely to work for a small subset of patients with very particular genetic mutations in their tumor. But even if the therapy works on the "bench" it may not have the same effect once it is put into humans. As the announcement states, targeted therapies will require greater amounts of genetic screening in order to find the right patients. Moreover, the total population of patients with a particular genetic mutation may be extremely small. The future of targeted therapies may well involve smaller clinical trials, but I think NCI is being rather optimistic in believing that we won't need large trials. Today's "Daily News" from ASCO presents a very different view from Don Berry.

Gleevec is the poster boy for the new age of targeted therapy, but it is a drug that seems to be the exception rather than the rule. Gleevec was able to solve a very particular genetic problem for a very particular class of cancer patients. The genetic problems in most common cancers seem to be substantially more complicated and do not seem amenable to single target therapies. In colon cancer, genetic testing is required for certain drugs, not because these drugs have amazing breakthrough effects, but rather because they don't seem to work when certain genetic mutations are present (see here).

In the meantime, non-targeted therapies like immunological therapies are starting to be developed. Will these therapies also require smaller trial sizes?

Let's hope NCI gets its wishes and all future drugs are breakthrough therapies that don't require large clinical trials and beggars can finally ride.

Wednesday, May 21, 2014

A New Way to Solve Confounding?

IV Graph from Imbens (2014)

Confounding refers to the statistical problem that there is some unobserved characteristic of the patient that is determining both the patient's observed treatment and the patient's outcome.

For example, this study shows that older stage III colon cancer patients are much less likely to receive oxaliplatin as an adjuvant therapy than younger patients. This may be the reason that in the Medicare data, oxaliplatin is associated with bigger survival effects than in the randomized control trials. The Medicare data suffers from a confounding problem. Doctors of sicker patients may be less willing to prescribe oxaliplatin because of its side effect profile. The observed difference in survival may not be due to the use of oxaliplatin, it may simply be the fact that the non-oxaliplatin patients are sicker.

In the graph to the right, the unobserved variable (patient "sickness") is represented by the red U. The patient's treatment (oxaliplatin or not) is represented by the black X and the patient's survival is represented by the black Y. We would like to know whether there is a blue line from X to Y, representing treatment effect of using oxaliplatin on survival. But we can't determine the treatment effect because U is affecting both X and Y through the red lines from U to X and U to Y. Sicker patients are less likely to get oxaliplatin (red line from U to X) and sicker patients have lower survival (red line from U to Y).

A standard way to solve the confounding problem is to observe (or introduce) a fourth variable (Z) which is called an "instrumental variable." As the graph shows, the instrument is some observed characteristic of the patient that determines the patient's treatment choice but is unrelated to the patient's unobserved characteristic or the patient's survival. In randomized control trials the instrument is the random number generating process that is used to assign patients to treatment arms.

In the Medicare data on the use of oxaliplatin, the instrument may be the date of the diagnosis. Patients diagnosed earlier were much less likely to receive oxaliplatin than patients diagnosed at a later date. By looking at changes in survival over the time period of the introduction of oxaliplatin we can determine the causal effect of oxaliplatin on survival (assuming no other major changes to treatment during the same time period).
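The logic of an instrument can be sketched with simulated data. Everything below is made up for illustration: the unobserved "sickness" `u`, the binary instrument `z` (standing in for an early or late diagnosis date), the treatment rule, and the assumption that the drug has the same effect (`true_effect`) on every patient. The instrumental-variable estimate used is the simple Wald ratio, not any estimator from the papers discussed.

```python
import random


def simulate(n=200_000, true_effect=1.0, seed=0):
    """Simulate (z, x, y) triples with an unobserved confounder u ("sickness")."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)              # unobserved sickness
        z = rng.random() < 0.5           # instrument: late (True) or early diagnosis
        x = u < (1.0 if z else -1.0)     # healthier, later-diagnosed patients get treated
        y = true_effect * x - u + rng.gauss(0, 1)   # sicker patients do worse
        rows.append((z, x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated patients."""
    return mean(y for _, x, y in rows if x) - mean(y for _, x, y in rows if not x)


def wald_estimate(rows):
    """Wald ratio: outcome gap across z divided by treatment gap across z."""
    y_gap = mean(y for z, _, y in rows if z) - mean(y for z, _, y in rows if not z)
    x_gap = mean(x for z, x, _ in rows if z) - mean(x for z, x, _ in rows if not z)
    return y_gap / x_gap


rows = simulate()
print("naive:", round(naive_estimate(rows), 2), "IV:", round(wald_estimate(rows), 2))
```

In this simulation the naive comparison overstates the effect because treated patients are healthier, while the Wald ratio lands close to the assumed effect. With effects that differ across patients the ratio would measure something narrower.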

An alternative way to solve the confounding problem is to measure all the confounding characteristics. If we observe U then we can simply measure the effect of X and U on Y. If we observe the co-morbidities of the patient we can measure the relationship between the co-morbidities and the use of oxaliplatin on survival. The problem with this approach is that we may not observe all the confounding factors.

A new paper of mine (see discussion here) suggests an alternative approach. Instead of attempting to directly measure U, we infer U from observable characteristics of the patient. Instead of attempting to directly measure the "sickness" of the patient, we look at observable characteristics of the patient like their age and use those signals to determine the distribution of the patient's latent sickness type.

This mixture model approach has the advantage of not requiring instruments and not requiring that we observe every possible characteristic of the patient that may be determining the treatment choice.

Saturday, May 17, 2014

Can Mixture Models Cure Cancer?

Mixture model example from Wikipedia

The other day I was emailing with a friend about my new paper in which I use something called a "mixture model" to analyze the variation in treatment effects. I'm pretty excited about the idea of using mixture models in this way, but my friend was somewhat dismissive.

The idea of a mixture model is pretty straightforward. Imagine observing data drawn from a mixture in which the four distributions pictured to the right are all equally likely, and looking at the probability of an outcome less than -2. For the purple type the probability is 0.5, for the red type it is 0. For the green type it is about 0.04 and for the blue type it is about 0.1. So in our data we should see an outcome less than -2 with probability (0.5 + 0 + 0.04 + 0.1)/4 = 0.16 (about sixteen percent of the time).

The statistical question of interest is if we observe outcomes less than -2 sixteen percent of the time, can we use this information to decompose the observed probability into the four underlying types that generate the observed data? Can we tell from our data how many hidden types there are? Can we tell the proportion of each type? Can we determine the distribution for each type?

The answer is no.

It is easy to see this. Imagine that instead of the purple type having a probability of 0.5 that the outcome is less than -2, it is 0.4 and the probability for the blue type is instead 0.2. The observed probability in the data will again be 0.16. We can arbitrarily change the underlying type probabilities, as long as the aggregate is 0.64. All such possibilities are consistent with what we observe. Similarly, we can change the weights and the probabilities in many different ways and still get the observed sixteen percent we see in the data.
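The arithmetic can be checked in a few lines. The per-type probabilities are the ones used in the formula in the text; the function name `observed_probability` is just an illustrative helper.

```python
def observed_probability(type_probs, weights=None):
    """Probability of the outcome in the pooled data, given per-type probabilities."""
    if weights is None:
        weights = [1 / len(type_probs)] * len(type_probs)   # equally likely types
    return sum(w * p for w, p in zip(weights, type_probs))


# Order: purple, red, green, blue (values from the formula in the text).
original = [0.5, 0.0, 0.04, 0.1]
alternative = [0.4, 0.0, 0.04, 0.2]   # purple lowered, blue raised

print(round(observed_probability(original), 4))
print(round(observed_probability(alternative), 4))
```

Two different sets of type-level probabilities produce the same pooled probability, so the pooled number alone cannot pin down the types.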

OK, so if we can't decompose our observed data into the underlying distributions, what is the point?

The interesting thing about these models is that with certain information and certain assumptions about how the data is generated, it is possible to decompose the data into the underlying distributions.

Unfortunately, it is very very common to estimate these models when there is not enough data to do the decomposition. Often the resulting decomposition is coming from arbitrary and non-credible assumptions made by the researcher rather than any actual information in the data. Worse, it is often unclear how much of what we know about the distribution is due to information in the data and how much is due to the arbitrary assumptions of the researcher.

In 1977, a mathematical statistician, Joseph Kruskal, worked out in this paper sufficient conditions for the data to provide enough information for the observed distribution to be decomposed into the underlying distributions. That is, Kruskal presented a set of conditions for when the data, and not arbitrary assumptions of the researcher, would provide enough information for the decomposition. More recently, in this paper, signal engineer Nikos Sidiropoulos and co-authors presented necessary conditions on the data for the decomposition to be possible.

My new paper thinks of there being different types of people, where not only may these different people have different outcomes, but the treatment being tested may have different effects. When we test a new drug using randomized control trials we generally present the results aggregated over the different types. If we find that the drug increases survival, we do not know if it increases survival for some people, all people, or most people. The objective of the statistical analysis is to uncover the different hidden types and ultimately to target particular drugs to particular sub-groups of the population. The hope is that this can be done without resorting to arbitrary and non-credible assumptions. My friend remains skeptical.

Thursday, May 15, 2014

Can Netflix Cure Cancer?

Neil Hunt of Netflix discusses a number of issues related to the value of randomized control trials to cure cancer. While Hunt is a bit off on the number of people that take part in clinical trials and about Gleevec, it is a very interesting talk.

Monday, May 12, 2014

Confounded by Confoundedness

Data based on the Big Mac Index by The Economist magazine, January 2012.

Over the last few weeks I have been confounded by a couple of articles on the health effects of diet.

One article presented the results of a large diet survey where health outcomes such as death were matched to the participants who filled out the diet survey. The article suggested that various healthy diet choices led to major positive health outcomes like not dying. The article didn't discuss the obvious concerns and skepticism we should have about such a study. On the contrary, the article interviewed various people who said the results bolstered policy recommendations such as food pyramids.

The second article was from a New York Times blogger, George Johnson. Johnson was attending the American Association for Cancer Research conference and noted that there was little work at the conference in which large diet surveys were matched with health outcomes. Johnson stated that the lack of interest was due to the problems of interpreting such work.

Observational studies such as large diet surveys with matched health outcomes data suffer from obvious confounding issues. However, I'm confounded by both the unthinking acceptance of such statistical analysis and the absolute dismissal of such statistical analysis. Isn't there some middle ground?

For anyone trained in economics the middle ground solution is to find "instruments." Instruments are not bagpipes; rather, they are observable variables for which it is plausible to make the following claim: there is some observable characteristic of survey participants that is associated with diet choice but not with the health outcome of interest. For example, some participants may live in France.

When one lives in France, one tends to have access to French food and French food prices. For example, a French Big Mac is 510 calories, while a US Big Mac is 550 calories. The US Big Mac is 8% bigger. The US Big Mac costs $4.20, while the French Big Mac costs $4.43 according to the Big Mac Index. So if you live in France you get to buy smaller Big Macs for more money than if you live in the US.

The French tend not to be as fat as Americans (see OECD slide here). One reason is that they face different food choices and different food prices. Another is that they have different preferences. Apparently, there exist people in France who do not like Big Macs.

Economists, Aviv Nevo, Rachel Griffith and Pierre Dubois, analyzed differences in food consumption in France and the UK in a forthcoming paper in the American Economic Review. The authors find that part of the explanation for different food consumption patterns is prices, but part is preferences.

To the extent that diet is determined by the prices faced by the survey participant, it is likely to be unrelated to the participant's preferences and other lifestyle choices like not smoking.

If we know the geographical location of a survey participant, we can start to unravel whether diet causes cancer or whether diet and cancer are related through other lifestyle choices like smoking.

Friday, May 9, 2014

Inferring Heterogeneous Treatment Effects

People are different. Different people have different cancers and those cancers require different treatments. While most oncologists and cancer researchers understand this, it does not stop them from insisting that we must rely on statistical techniques that are not able to measure the effect of these differences on treatment outcomes.

Standard statistical analysis of randomized control trials provides unbiased estimates of the average treatment effect. Standard statistical analysis of randomized control trials provides no information about the variation in the treatment effect across the patient population.

To observe variation in treatment effect we would have to observe each patient's outcome with both the treatment that they received and the treatment that they did not receive. Economists call this the "counter-factual" problem. The treatment the patient did not receive is counter to the fact. If in some imaginary world we were able to observe both the factual outcome and the counter-factual outcome then we would look at the difference between them and measure the distribution of the differences. We would be able to measure the distribution of the treatment effect. As we live in the real world we do not directly observe the distribution of the treatment effect.

While we cannot directly observe the variation in the treatment effect, we may be able to infer it.

The idea is that the heterogeneity in the treatment effect is being determined by some characteristic of the patient that is unobserved by the statistician. Further, patients can be categorized into latent or hidden classes according to this unobserved characteristic. These hidden classes can be inferred from observing certain characteristics of the patients that we know to be associated with the hidden classes. It has been shown that if we observe a number of patient characteristics that are all associated with the hidden classes then we may be able to uncover the hidden classes.

Analyses based on these ideas have been successfully used in econometrics, psychometrics, biostatistics and computer science.

Here is my foray into the use of a technique called non-negative matrix factorization. I use the technique to analyze heterogeneous treatment effects in adjuvant therapy for colon cancer.

Tuesday, May 6, 2014

Adaptive Clinical Trials

Don Berry

At the recent NCCN conference, I was introduced to the idea of an "adaptive clinical trial." There are a number of ideas that get amalgamated into this general description. One example is that the weighted random assignment into each arm of the trial changes over time in response to trial results.

For example, the trial may begin by assigning each patient to one of the two arms with probability one half. However, if after some initial period the data suggest that patients in one of the two arms have greater survival, the weight on that arm increases. It may be that if the probability of surviving 6 months is 5 percentage points higher for the new treatment, then the split of new patients between the new treatment arm and the other arm moves to 60-40. If at a year the difference remains, then the split may move to 70-30. However, if the difference shrinks, the split may fall back to 50-50. On the other hand, if patients in the new treatment arm have one-year survival that is 10 percentage points higher, then the split may move to 80-20.
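The example can be written down as a small rule. This is only a sketch of the illustrative numbers in the paragraph above, not a real adaptive design; the function names `new_arm_share` and `assign`, the thresholds and the sequence of interim differences are all hypothetical.

```python
import random


def new_arm_share(diff_pp, previous_share=50):
    """Percent of new patients sent to the new-treatment arm (hypothetical rule).

    diff_pp is the survival advantage of the new arm in percentage points.
    """
    if diff_pp >= 10:
        return 80
    if diff_pp >= 5:
        return min(previous_share + 10, 70)   # step up while the gap persists
    return 50                                 # gap has shrunk: back to even odds


def assign(share, rng):
    """Randomly assign one patient; assignment stays random at every share."""
    return "new" if rng.random() * 100 < share else "standard"


share = 50
for diff in (5, 5, 3, 10):   # made-up interim differences
    share = new_arm_share(diff, share)
    print(diff, share)
```

Note that only the weights change. Each individual patient is still assigned by a random draw, which is why the design does not by itself reintroduce confounding.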

Before talking about the advantages or disadvantages of such a trial design, it is important to remember that current trials accrue patients over time and large trials may take years before they have enough patients. During this time, information on the value of the new treatment is being carefully accumulated, stored and kept away from prying eyes. The FDA strongly prefers that data not be available to researchers so as not to lead to biased trial results. That said, trials often have a data committee who are responsible for looking at the data and deciding whether or not there are severe enough safety problems such that the trial should be stopped.

From a statistical point of view, the trial design has no obvious disadvantages and may have some advantages. Assignment is still random, and as long as no information on the value of the various treatments leaks to patients or doctors, the trial results should remain unconfounded and unbiased. One advantage of the trial design is that it may reduce the selection-into-study bias. Potential patients may be much more willing to sign up for a trial where they know that the probability of being assigned the better arm is increasing in the difference between the trial arms.

There are two possible problems. One is that the power of the trial may be reduced because one of the arms is getting a smaller proportion of the patients. The other is associated with so-called "bandit problems." It is theoretically possible that through statistical coincidence, patients on one of the trial arms start doing very poorly. This will cause the weights to move away from that arm to the other trial arm. It may be that the initial poor results would be overwhelmed by later data had the assignment remained at 50-50, but because of the adaptive design there aren't enough patients being assigned to that trial arm for the overwhelming to occur. While this is not a bias problem, it may mean a greater probability of false-positive trial results.
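The loss of power from unequal arms can be roughed out with a standard formula. For a fixed total number of patients, the variance of a difference in arm means is proportional to 1/n_new + 1/n_standard. The sketch below assumes equal outcome variance in both arms and a split that is fixed for the whole trial, which is a simplification of an adaptive design; `se_inflation` is an illustrative name.

```python
import math


def se_inflation(share_new):
    """Standard error of the difference in arm means, relative to a 50-50 split.

    Assumes a fixed total sample size and equal outcome variance in both arms.
    """
    unequal = 1 / share_new + 1 / (1 - share_new)
    balanced = 1 / 0.5 + 1 / 0.5
    return math.sqrt(unequal / balanced)


for share in (0.5, 0.6, 0.7, 0.8):
    print(share, round(se_inflation(share), 3))
```

Under these assumptions an 80-20 split inflates the standard error by a quarter relative to 50-50, so more patients are needed for the same precision.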

The FDA raises a number of concerns with such trial designs in this report.

Saturday, May 3, 2014

"Causal Effects" and Randomized Control Trials

One reason for wanting to conduct a randomized control trial is that it provides evidence of the causal effect of a new drug or treatment. However, if we define "causal effect" as the difference in outcomes a patient would receive on the new drug and the outcome the patient would have received on the alternative, then it is not clear RCTs can provide this information.

Don Rubin points out in this paper that the problem is that we cannot observe the same patient's outcome for both the new drug and for the alternative. We are limited to observing only the patient's outcome for the treatment that the patient actually received.

We cannot observe the "causal effect."

Rubin states that all is not lost because it is possible to measure the average treatment effect with an ideal randomized control trial. The problem is that the average treatment effect may not provide information on the effect of the treatment on the average patient, the majority of patients or even a plurality of patients. The average treatment effect averages over the treatment effects of the different patients.

In the comment to this post, Bill provides an example in which 99% of patients live one month shorter on the new drug and 1% of patients live 200 months longer. Such a trial will show that on average patients live about 1 month longer on the new drug (0.99 × (-1) + 0.01 × 200 = 1.01). This example shows that the average does not have to reflect the outcome for the majority of patients or even for a plurality of patients.
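Bill's example is easy to reproduce with 100 hypothetical patients, reading the "200 months" as a gain of 200 months for the one patient who benefits. The variable names are illustrative.

```python
from statistics import mean, median

# Change in survival (months) on the new drug for 100 hypothetical patients.
effects = [-1] * 99 + [200]

average_effect = mean(effects)
share_harmed = sum(e < 0 for e in effects) / len(effects)

print("average effect:", average_effect)
print("median effect:", median(effects))
print("share harmed:", share_harmed)
```

The average effect is positive while the median individual effect is negative and nearly every patient is worse off.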

In most cancer trials the authors present the difference in median survival between the treatment arms. I show in this unpublished working paper that, for arbitrary differences in median survival, it is easy to find examples in which the individual outcome of almost every patient (every patient except one) is the opposite of the median difference.

So a positive average treatment effect does not imply that even a significant proportion of patients will benefit from the drug, and a positive difference in median survival does not imply that a significant proportion of patients will benefit from the drug.

What about other measures, hazard ratios, Kaplan-Meier plots, can any information from the randomized control trial tell us if any reasonable number of patients will benefit from the drug?

Hazard ratios are based on "regression techniques" that make strong parametric assumptions that may not be true in practice. But even assuming that the parametric assumptions were correct the results only show that some positive number of patients will have some positive benefit from the drug. Like with the average treatment effect, a proportional hazard ratio is calculated by "averaging" over patients, some of whom may benefit from the drug and some of whom may not.

Kaplan-Meier plots are potentially representations of the marginal probability of survival for each treatment. If the trial does not suffer from attrition bias or participation bias then the difference in the survival curve at each point in time provides an estimate of the minimum number of people who would benefit from the drug.
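The "minimum number" can be made explicit with a standard bounding argument. Given only the two survival probabilities at a time t, the share of patients who are alive at t on the new drug but would not have been on the old one is at least the gap between the curves and at most the smaller of "alive on new" and "dead on old". The sketch below assumes the curves are unbiased estimates; the function name and the input values are made up.

```python
def benefit_bounds(surv_new, surv_old):
    """Bounds on the share of patients alive at time t only if given the new drug.

    surv_new, surv_old: survival probabilities at time t read off each arm's curve.
    """
    lower = max(0.0, surv_new - surv_old)    # fewest who must benefit
    upper = min(surv_new, 1.0 - surv_old)    # most who could benefit
    return lower, upper


# Made-up values: 70% alive at t on the new drug, 60% on the old one.
print(benefit_bounds(0.70, 0.60))
```

The lower bound is what the curves guarantee. Anything between the two bounds is consistent with the same pair of curves.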

If there is attrition bias or selection-into-sample bias then the results from the Kaplan-Meier plots can still be used to provide the minimum number of people who would benefit from the drug.

Wednesday, April 30, 2014

The Future is Now

Last Friday I had the opportunity to attend the National Comprehensive Cancer Network (NCCN) Policy Summit: Designing Clinical Trials in the Era of Multiple Biomarkers and Targeted Therapies. Dr Alan Venook opened the conference by discussing a new era in which we have drugs tailored to each patient's tumor. Venook discussed many issues that may arise in this exciting future.

What struck me about the discussion is that the future is now. According to a recent article in the Washington Post, the FDA has been lobbied to approve a new drug for Duchenne muscular dystrophy based on a twelve-person study. According to the article, the drug may help about 2,000 boys alive in the US today and approximately 1 in 30,000 boys born.

How large the trial needs to be to satisfy the FDA depends on how effective the drug is, but if say 500 boys are needed, the trial would have to accrue at least 25% of the disease population.

This may be the future. Drugs designed to target very, very specific disease populations will necessarily aim at very small disease populations, populations so small that it may be impossible to design large enough studies to test the drug's effectiveness. How does the FDA approve such drugs?

Thursday, April 24, 2014

Testing Between Causal and Spurious Effects

Tom Cruise in the movie Top Gun
that was set in Fallon NV.

In the late 1990s there was a spike in childhood leukemia cases in the town of Fallon NV, the famed home of Top Gun. What was the cause of the spike? We still don't know.

There are three possibilities:

1. The spike was due to environmental factors such as the arsenic in the drinking water or the exposure to the heavy metal tungsten.

2. The spike was related to the fact that the town had a large number of Navy personnel or some other set of unknown characteristics of the town's population.

3. The spike was a statistical fluke.

If we are interested in determining whether the leukemia spike was due to environmental factors then we can think of the relationship between the environment and leukemia as a "causal" relationship or a "spurious" relationship.

Let X represent the environment of Fallon, Y the number of leukemia cases, and U some unobserved cause of both a family's location in Fallon and leukemia. It could be that X is directly determining Y or that U is determining both X and Y.

Causal relationship

Spurious relationship

How could we distinguish between the two possibilities? In both cases we will see that a family's location choice and a family's likelihood of having a child with leukemia are correlated.

Judea Pearl argues we should conduct an experiment. That is we should introduce a policy to purposely change X. If we move families out of Fallon or assign them to other locations and see a reduction in leukemia cases among families not located in Fallon, then we know that Fallon is the cause of the spike. If we don't see any change in the likelihood that children in these families get leukemia then we know the relationship between Fallon's environment and leukemia cases is spurious. That is, we can rule out (1) and know it may be due to some other cause (2) or a statistical fluke (3).
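A small simulation may make the logic of the experiment concrete. It is a sketch under invented assumptions: the function `simulate`, the two "worlds", and every probability in it (the 0.8/0.2 location propensities and the 0.10/0.05 leukemia rates) are hypothetical and wildly exaggerated so the pattern is visible. They are not estimates for Fallon.

```python
import random


def simulate(world, randomize, n=200_000, seed=0):
    """Return the difference in leukemia rates between families with X=1
    (in Fallon) and X=0 (elsewhere). All probabilities are made up."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [cases, families]
    for _ in range(n):
        u = rng.random() < 0.5  # unobserved family characteristic U
        if randomize:
            x = rng.random() < 0.5  # the experiment: location set by coin flip
        else:
            x = rng.random() < (0.8 if u else 0.2)  # U drives location choice
        if world == "causal":
            p = 0.10 if x else 0.05  # the environment X raises risk
        else:
            p = 0.10 if u else 0.05  # spurious world: only U raises risk
        y = rng.random() < p
        counts[int(x)][0] += y
        counts[int(x)][1] += 1
    return counts[1][0] / counts[1][1] - counts[0][0] / counts[0][1]


for world in ("causal", "spurious"):
    observed = simulate(world, randomize=False)
    experiment = simulate(world, randomize=True)
    print(f"{world:8s} observed gap: {observed:+.3f}  experimental gap: {experiment:+.3f}")
```

In the observational data both worlds show a gap between Fallon and non-Fallon families. Only when location is assigned by the coin flip does the gap vanish in the spurious world and persist in the causal one.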

What if there is both a causal relationship and a spurious relationship? That is, what if there is something in the environment of Fallon that is leading to increases in leukemia, but the magnitude and direction of the effect are being mediated by some unobserved characteristic such as a family's propensity to be in the Navy? In this case Pearl's experiment still determines whether there is a directed arrow from X to Y, but we learn nothing about how that relationship is being mediated by U.

If we were able to randomly assign families to Fallon NV, then we could determine that something in Fallon's environment is increasing the likelihood of a child in the family having leukemia. What we don't learn is whether there are other factors that either mitigate or propagate the effect of Fallon's environment on the propensity to get childhood leukemia.

Pearl's experiment allows us to determine whether the relationship is causal or spurious. It does not provide information on the appropriate policy response to the problem.

Tuesday, April 22, 2014

Drug Increases Survival in Averaged Cancer Patient!

Lake Wobegon where all the children are above average.

You will never see the above heading in a newspaper article on a new breakthrough drug. This may be unfortunate, because it is one of the more accurate headlines you will read.

In a comment to this post, Bill provides an example of a drug that decreases survival for 99% of patients by 1 month and increases survival for 1% of patients by 200 months.

If you punch that into Google you get that the average treatment effect (Rubin's typical causal effect) is an increase in survival of 1 month (approximately). So while 99% of patients are made worse off by the drug, the 1% of patients do so well that they pull up the average so that Bill's "crappy" drug comes out smelling like roses.
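For anyone who would rather not use Google, here is the same arithmetic as a short Python sketch. The variable names are mine; the numbers are Bill's.

```python
# Bill's hypothetical drug: (share of patients, change in survival in months).
groups = [
    (0.99, -1.0),   # 99% of patients live one month less
    (0.01, 200.0),  # 1% of patients live 200 months longer
]

# The average treatment effect is the share-weighted mean of individual effects.
ate = sum(share * effect for share, effect in groups)

# Share of patients who are made worse off by the drug.
share_harmed = sum(share for share, effect in groups if effect < 0)

print(f"ATE: {ate:+.2f} months; share harmed: {share_harmed:.0%}")
```

The average comes out positive even though nearly every patient is harmed, which is the whole point of the example.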

While Bill's example is an extreme, the point is more general. The average treatment effect does not provide evidence on how the treatment will affect individual patients. If a particular treatment increases the probability of survival by 50 percentage points for half the patients and decreases it by 10 percentage points for the other half of patients, we will find that the average increase in survival is 20 percentage points. A treatment that increases survival by 20 percentage points is a major breakthrough in cancer research.

In their analysis of 5-Fu as an adjuvant treatment for Stage III colon cancer, Moertel et al (1990) find that 5-Fu is associated with a 20 percentage point increase in the probability of survival at 4 years. The authors recommend that the treatment be provided to all patients. As shown in this post, the treatment effect for older patients is vastly different from that for younger patients. Older patients see a large treatment effect, while younger patients see a small or non-existent treatment effect.

Thursday, April 10, 2014

Balance vs Randomization

Australian soldiers playing the two-coin toss game "two up".
Two up requires balance prior to randomization.

According to a 2010 report by the National Research Council (National Academies), the "primary benefit from randomizing clinical trial participants into treatment and control groups comes from balancing the distributions of known and unknown characteristics among these groups prior to study treatments."

It is a simple matter to see that this is not true and that randomization does not imply balancing.

Remember back to gym class and those times where the teams were chosen at random and how mad you got because Bobby and Sue were on the other team and you only got Steve and how unfair that was (and there is no way Jesse was as good as Sue). Or the time where the gym teacher made an effort to balance the two teams by carefully pairing people of equal ability and assigning them to different teams.

Random assignment does not imply balancing and balancing implies non-random assignment.
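The gym class story is easy to replay on a computer. The sketch below is only an illustration: the ability scores are invented draws from a standard normal distribution, and the function name `imbalance_after_randomization` is mine. It splits the same 20 players into two random teams many times and records how far apart the teams' average abilities end up.

```python
import random


def imbalance_after_randomization(abilities, rng):
    """Randomly split players into two equal teams; return the gap in mean ability."""
    shuffled = abilities[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    team_a, team_b = shuffled[:half], shuffled[half:]
    return abs(sum(team_a) / len(team_a) - sum(team_b) / len(team_b))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
abilities = [rng.gauss(0, 1) for _ in range(20)]  # hypothetical ability scores

gaps = [imbalance_after_randomization(abilities, rng) for _ in range(10_000)]
mean_gap = sum(gaps) / len(gaps)
share_large = sum(g > 0.5 for g in gaps) / len(gaps)
print(f"mean gap: {mean_gap:.2f}, share of draws with gap > 0.5: {share_large:.1%}")
```

Every one of these splits is perfectly random, and the gaps are generally not zero. Balance holds only on average across many hypothetical re-draws, not in the one draw a trial actually gets.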

Of course the authors of the report, who are highly respected statisticians, know this. The authors are more careful in some other parts of the report, putting "probabilistically" in parentheses before "balances" in another similar paragraph. Moreover, when the authors do discuss the technical reason for randomized control trials they cite the arguments by Rubin (discussed in an earlier post).

The problem with perpetuating the myth that balancing is implied by randomization is that lay people and regulators may look askance at unbalanced studies, mistaking "unbalanced" for "non-random". Worse, we may see reporting bias and publication bias because studies with unbalanced populations or unbalanced treatment arms are held back. Worse still, we may see (or not see) efforts to "balance" the trial through non-random assignment of patients to treatment arms.

So next time you see a well-balanced study, beware. It may not be random and the results may be biased.

Tuesday, April 8, 2014

Variation in Treatment Effects

Figure 1

The figures on the left present the Kaplan-Meier survival plots for various adjuvant treatments for colon cancer from a trial conducted in the late 1980s.

Figure 1: Survival probabilities at each point in time for patients that were 61 years old and over. It shows that patients who were in the 5-FU trial arm had an average survival probability that was twenty percentage points higher than patients in the other two trial arms at the 3,000 day point.

Figure 2

Figure 2: Survival probabilities at each point in time for patients that were 60 years or younger. It shows that patients in each of the three trial arms had similar survival probabilities.

These graphs were produced using R and the "colon" data in the "survival" package. The patients were split into two equal groups by age.

The data is from the Moertel et al (1990) paper. While the authors discussed some subset analysis, they do not produce these graphs nor note the variation in survival by age.

The main results presented in the original study are discussed in this post.

Monday, April 7, 2014

Angus Deaton: Epidemiology, randomised trials, and the search for what w...

Angus Deaton presents a lecture discussing various issues and concerns with the use of RCTs in economics and more broadly in other fields including epidemiology. The lecture was given in honor of John Snow.

Saturday, April 5, 2014

National Academies Report On Attrition Bias in Randomized Control Trials: A Missed Opportunity

In 2010, the National Research Council of the National Academies published a report titled "The Prevention and Treatment of Missing Data in Clinical Trials." While acknowledging there is a problem is a great first step, the report fell somewhat short.

In any longitudinal study (a study that occurs over time), like a randomized control trial analyzing cancer treatments on survival, there is going to be attrition from the study. Over time, people will leave the study for many different types of reasons. Some reasons people leave a study have no effect on statistical inference. For example, a patient or a patient's spouse may get a work transfer to a different location without access to a study center. However, there are some reasons why people leave a study that may have a large impact on statistical inference. For example, a patient may leave the study simply because they feel the treatment is not working. It is this second reason for leaving that is associated with "attrition bias."

The report makes a number of very good points. It has some relatively simple and easy to implement suggestions for how to adjust trial design to reduce or better account for attrition bias. It makes it clear that if the attrition is "non-random" then any assumptions that the researcher or statistician makes about how the data is missing cannot be tested or verified. I was also pleasantly surprised to see that the report discussed a number of ideas that have been developed in economics including "Heckman selection" models, instrumental variables, and local average treatment effects.

Even so, there were two recommendations I didn't see, but would have liked to have:

1. Present bounds. Econometrician Charles Manski and bio-statistician James Robins (in his paper on the treatment effect of AZT on AIDS patients) introduced the idea of bounding the average treatment effect when faced with variables "missing not at random" in the late 1980s. It would have been nice to see this idea mentioned as a possible solution.
2. Discuss the implications. If there is concern about bias, that concern should be raised by the researchers. The researchers should discuss the implications of the results and their policy recommendations.
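To give a flavor of the first recommendation, here is a minimal sketch of worst-case bounds for a binary outcome such as survival to 4 years. It is written in the spirit of the bounding idea rather than taken from any particular paper. The function names and all of the patient counts are hypothetical.

```python
def worst_case_bounds(n_survived, n_died, n_missing):
    """Bound an arm's survival probability without assuming anything
    about why outcomes are missing."""
    n = n_survived + n_died + n_missing
    lower = n_survived / n                # every missing patient died
    upper = (n_survived + n_missing) / n  # every missing patient survived
    return lower, upper


def ate_bounds(arm_new, arm_old):
    """Bound the difference in survival probability (new minus old)."""
    lo_new, hi_new = worst_case_bounds(*arm_new)
    lo_old, hi_old = worst_case_bounds(*arm_old)
    # Smallest effect: new arm at its worst, old arm at its best; and vice versa.
    return lo_new - hi_old, hi_new - lo_old


# Hypothetical arms: (survived, died, lost to follow-up)
new_arm = (70, 20, 10)
old_arm = (50, 40, 10)
low, high = ate_bounds(new_arm, old_arm)
print(f"effect on survival probability is between {low:+.2f} and {high:+.2f}")
```

The width of the interval grows directly with the share of patients lost to follow-up, which makes the cost of attrition visible to the reader instead of hiding it inside an untestable assumption.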

Wednesday, April 2, 2014

Unconfoundedness. It's What's for Dinner.

Confounded Graph

Unconfoundedness is a silly name for a thing. Particularly a thing that is so important. A thing that leads us to spend billions of dollars on randomized control trials every year, while perfectly good observational data lies forlorn and unloved in the databases of CMS, hospitals and insurance companies.

So what is this "unconfoundedness"?

Unconfoundedness is the state of not being confounded.

Obviously.

To understand unconfoundedness, it is necessary to understand confoundedness. Consider the graph to the right. We are interested in the causal effect of X on Y, where X may represent the colon cancer treatment FolFox (5-Fu and oxaliplatin) and Y represents survival of colon cancer patients. We would like to know how much of an increase in survival colon cancer patients get when they are given FolFox versus 5-Fu alone. If we observed data from Medicare patients, as in this paper, we might think that FolFox has a big effect on survival. The problem is that patients are not choosing randomly between 5-Fu and FolFox. Patients and their doctors may have information about their own characteristics (U) and that information may be determining their choice of treatment (the arrow from U to X).

So when we see that patients on FolFox do much better than patients on 5-Fu, it may be because patients who are older or frailer, together with their doctors, are choosing to forgo the oxaliplatin and its associated side effects. The observed difference in outcomes may be due to the treatment, or it may be due to the characteristics of the patients that are choosing each of the treatments and have nothing at all to do with the effect of the treatment itself.

Unconfounded Graph

How do we get rid of this confounding effect?

One way is to randomize patient assignment to treatment. This is what is done in randomized control trials. This act of random assignment removes the arrow from U to X (see graph to the left). Treatment choice (X) is no longer decided by unobserved patient characteristics (U).

Is randomization the only way to get unconfoundedness?

No. There are many ways for data sets to have the unconfoundedness property. The important thing is that the choice of treatment is unrelated to unobserved characteristics of the patients that may be associated with different observable outcomes. For example, prior to oxaliplatin getting FDA approval, there was very little use of the drug by colon cancer patients. Economists call observational data that satisfies unconfoundedness "natural experiments."

Why is unconfoundedness good?

Technically, unconfoundedness allows the researcher to measure the "marginal" distribution of outcomes conditional on the treatment. The observed distribution of outcomes conditional on treatment choice is an unbiased estimate of the marginal distribution of outcomes conditional on treatment.

OK. Why is what-ever-you-said good?

Well, we may be able to use the marginal distribution of outcomes to calculate information about the treatment effect (the causal effect). For example, if we observe the "average" conditional on the treatment choice, then we can measure Rubin's "typical causal effect" (of course in cancer we don't observe the average survival). More generally, we can use these estimates to bound the proportion of patients who will have a positive treatment effect.

Tuesday, April 1, 2014

New Drug May or May Not Increase Life Expectancy in Mice!!

Kaplan-Meier plot of survival for mice
from Mitchell et al (2014)

So there goes my theory on why medical journals present median survival. Here we have a study where all the patients (they are mice) die, and yet the study still reports the (mostly) useless difference in median survival.

The study looked at the effect of the drug SRT1720 on life expectancy of mice and was recently reported in CELL. 400 mice were allocated to 4 treatment arms - standard diet, standard diet plus SRT1720, high-fat diet, and high-fat diet with SRT1720.

As I said, all the mice do the right thing and die and so we know the mean effect of SRT1720 on survival or we would if the authors had reported it. The authors do report that the average effect is "significant" for mice on both diets. I don't know if they mean it is statistically significant or medically significant. We also learn that SRT1720 is associated with an 8% increase in survival for the SD mice and a 22% increase in survival for the HD mice. We aren't told if these are statistically significantly different from zero.

While it is not discussed, we see from the picture that at about 85 weeks, the average increase in survival probability is twenty percentage points for the HD mice and ten percentage points for the SD mice. Although at 140 weeks, the average increase in the probability of survival due to the drug is approximately zero.

The fact that the curves come together at the end suggests that the drug affects different mice differently. Again this is not discussed by the authors, but we can infer from the figure that for at least 30% of mice on a high fat diet the drug increases survival (see discussion in this post). However, for some mice (that live a long time) the drug has no effect on survival.

As for diet, we see that it has a very large effect. For the non-drugged mice, switching from HD to SD increases the probability of survival at 85 weeks by about 40 percentage points, while at 140 weeks it increases the probability of survival by between 5 and 10 percentage points.

So if you are a mouse, you may want to cut down on the fat.

Monday, March 31, 2014

ATE: What Is It and Why Should You Care?

Arthur Dent is from Earth, which The Hitchhiker's Guide
to the Galaxy describes as "mostly harmless."

ATE is economics shorthand for the "average treatment effect" or what Rubin calls the "typical causal effect." Many think of it as the "causal effect."

ATE is the raison d'etre of randomized control trials.

I believe that researchers should provide policy makers, doctors and patients with more information than the average treatment effect, but not all economists agree.

In their book "Mostly Harmless Econometrics", Joshua Angrist and Jorn-Steffen Pischke state,

Even a randomized trial with perfect compliance fails to reveal the distribution of [difference in treatment outcomes]. This does not matter for average treatment effects since the mean of a difference is the difference in means. But all other features of the distribution ... are hidden because we never get to see both [treatment outcomes] for any one person. The good news for applied econometricians is that the difference in marginal distributions, is usually more important than the distribution of treatment effects because comparisons of aggregate economic welfare typically require only the marginal distributions of [treatment outcomes] and not the distribution of their difference.

It is not exactly clear what Angrist and Pischke mean, but there is some support for their argument. In his book Public Policy in an Uncertain World, Charles Manski shows that a policy maker who maximizes a standard social welfare function (trust me, it is something economists think policy makers may theoretically do) should only be interested in ATE.

Actual policy makers do not only care about ATE. Consider the drug Vioxx. It worked pretty well "on average." It was just the small matter of severely harming a few patients that got it into trouble.

It may be more important to remember your towel than to know the ATE.

Saturday, March 29, 2014

Average Probability of Survival Effect

In a previous post, I claimed that it was not possible to estimate the average causal effect on survival for treatments in cancer. It is not possible because average survival is not observed.

Moertel et al (1990)

In another post, I pointed out that the widely reported median difference in survival has no content. This is because it is easy to come up with examples in which treatment A has higher median survival than treatment B, and yet almost all patients would live longer on treatment B.

Is there some information that can be garnered from a randomized control trial in cancer that is both measurable and would provide information to regulators, patients and doctors on the likely effect of a treatment?

There is. It is the "average probability of survival effect".

Consider the figure to the right. It presents survival probabilities (Kaplan-Meier plots) for the effect of adjuvant chemotherapy for stage III colon cancer patients. The study was interested in determining whether adjuvant chemotherapy would increase survival for colon cancer patients. Consider the 4 year mark. At that point in time approximately 50% of the standard of care arm (observation) had survived, while approximately 70% of patients in the combination with 5-FU arm had survived.

If there is no biased attrition and no biased selection into the study, then we have unbiased estimates of the average probability of surviving to 4 years when given no chemo after surgery (50%) and the average probability of surviving to 4 years when given 5-FU after surgery (70%). As the difference in averages is equal to the average difference we know that for the average stage III patient, taking a 5-FU based adjuvant chemotherapy increases the 4 year survival probability by twenty percentage points.

Of course, we may not be average and the policy implications of the measure are not clear, but those are discussions for another time.

Friday, March 28, 2014

Is There Any Theoretical Justification for Randomized Control Trials in Cancer Research?

No.

There is no theoretical justification for using randomized control trials to test the effectiveness of treatments for cancer in humans.

Donald Rubin

To be clear, the question is not whether it is good to have careful studies or to do replicable analysis. Those things are good. The question here is whether randomizing treatment assignment provides any information over and above some other treatment assignment mechanisms. To be even more clear, the question is not whether randomized control trials are justified in general. The question is whether they are justified in measuring survival in cancer research.

When justifying the use of randomized control trials statisticians generally point to Harvard's Donald Rubin and his seminal paper "Estimating Causal Effects of Treatments from Randomized and Nonrandomized Studies." In the paper, Rubin states that "...given the choice between a randomized study and an equivalent nonrandomized study, one should choose the data from the experiment..."

Why?

Rubin says that we should be interested in the causal effect of some treatment. If we were interested in the causal effect on survival of a new drug, Rubin would define that to be the difference between a patient's survival on the new drug and a patient's survival on the alternative treatment (perhaps the standard of care). I have no problem with that definition.

But Rubin notes that this difference, the causal effect, cannot be observed.

Houston, we have a problem.

So what to do.

This is where the rabbit goes into the hat. Watch carefully.

Rubin states that instead of the causal effect (which is not observed) we should instead be interested in the "typical" causal effect. OK. I'm with you. Typical sounds reasonable.

Rubin then states that an "obvious" definition of "typical" is the average difference. Perhaps. Rubin then points out that due to the linearity of averages, the average difference is equal to the difference in the average outcome for each treatment. Further, due to the unconfounded nature of ideal randomized control trials, the average outcome of each treatment arm is an unbiased estimate of the average outcome of each treatment.
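For readers who like to see the trick written down, the argument can be sketched as follows. The notation is my assumption, not Rubin's: Y_i(1) and Y_i(0) are patient i's outcomes on the new drug and on the alternative, Y_i is the outcome actually observed, and T_i indicates which treatment the patient was assigned.

```latex
\begin{align}
\text{ATE} &= \mathbb{E}\left[Y_i(1) - Y_i(0)\right] \\
           &= \mathbb{E}\left[Y_i(1)\right] - \mathbb{E}\left[Y_i(0)\right] \\
           &= \mathbb{E}\left[Y_i \mid T_i = 1\right] - \mathbb{E}\left[Y_i \mid T_i = 0\right]
\end{align}
```

The second line is the linearity of averages. The third line is where unconfoundedness does its work: with random assignment, the patients in each arm are a random sample of all patients, so the observed average in each arm stands in for the average outcome under that treatment.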

Bob's your uncle.

If we are willing to concede that the average causal effect is the appropriate measure, then that information is provided by an ideal randomized control trial.

Why is cancer different from every other night?

I'm glad you asked, youngest imaginary blog reader.

Cancer is different from every other night because in cancer, people die. Actually, the statistical problem is caused by them not dying. Because people don't die we have a censoring problem and we are unable to measure the average survival from each trial arm. No average, therefore no difference in averages, therefore no average difference, therefore no typical difference, and therefore no dice.
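A toy Kaplan-Meier calculation shows the censoring problem. This is a minimal sketch, not the method of any particular trial: the function `kaplan_meier` is a bare-bones version of the standard product-limit estimator, and the ten follow-up times (in months) are invented, with four patients still alive when the hypothetical study ends at 48 months.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times:  follow-up time for each patient
    events: True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical trial arm: months of follow-up and whether death was observed.
times = [6, 12, 18, 24, 30, 36, 48, 48, 48, 48]
events = [True, True, False, True, True, True, False, False, False, False]

curve = kaplan_meier(times, events)
median = next(t for t, s in curve if s <= 0.5)
final_survival = curve[-1][1]
print(f"median survival: {median} months")
print(f"estimated share still alive at last observed death: {final_survival:.1%}")
```

The median can be read off because the curve crosses 50%. The mean cannot, because the curve never reaches zero: we have no idea how long the patients who were still alive at the end will go on to live.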
