Unconfoundedness is a silly name for a thing. Particularly a thing that is so important. A thing that leads us to spend billions of dollars on randomized controlled trials every year, while perfectly good observational data lies forlorn and unloved in the databases of CMS, hospitals and insurance companies.
So what is this "unconfoundedness"?
Unconfoundedness is the state of not being confounded.
To understand unconfoundedness, it is necessary to understand confoundedness. Consider the graph to the right. We are interested in the causal effect of X on Y, where X may represent the colon cancer treatment FolFox (5-Fu and oxaliplatin) and Y represents the survival of colon cancer patients. We would like to know how much of an increase in survival colon cancer patients get when they are given FolFox versus 5-Fu alone. If we observed data from Medicare patients, like in this paper, we might think that FolFox has a big effect on survival. The problem is that patients are not choosing randomly between 5-Fu and FolFox. Patients and their doctors may have information about the patient's characteristics (U), and that information may be determining their choice of treatment (the arrow from U to X).
So when we see that patients on FolFox do much better than patients on 5-Fu, it may be because patients who are older or frailer, together with their doctors, are choosing to forgo the oxaliplatin and its associated side effects. The observed difference in outcomes may be due to the treatment, or it may be due to the characteristics of the patients who choose each treatment and have nothing at all to do with the effect of the treatment itself.
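To put numbers on this, here is a small sketch in Python. Every number in it is made up for illustration (the share of frail patients, how often each group chooses FolFox, the survival probabilities and the true effect are all assumptions, not estimates from any data). It computes the survival gap we would observe when frailty (U) drives both treatment choice and survival.

```python
# All numbers below are hypothetical, chosen only to illustrate confounding.
P_FRAIL = 0.5                             # share of patients who are frail (U)
P_TREAT = {True: 0.2, False: 0.8}         # P(chooses FolFox | frail?), the arrow U -> X
BASE_SURVIVAL = {True: 0.3, False: 0.6}   # P(survives | frail?, 5-Fu alone), the arrow U -> Y
TRUE_EFFECT = 0.10                        # assumed causal gain in survival from FolFox


def observed_survival(treated):
    """P(survives | observed treatment), averaging over U among those who chose it."""
    numerator = 0.0
    denominator = 0.0
    for frail in (True, False):
        p_u = P_FRAIL if frail else 1 - P_FRAIL
        p_x = P_TREAT[frail] if treated else 1 - P_TREAT[frail]
        p_y = BASE_SURVIVAL[frail] + (TRUE_EFFECT if treated else 0.0)
        numerator += p_u * p_x * p_y
        denominator += p_u * p_x
    return numerator / denominator


observed_gap = observed_survival(True) - observed_survival(False)
print(f"true effect {TRUE_EFFECT:.2f}, observed gap {observed_gap:.2f}")
```

With these invented numbers the observed gap is 0.28 even though the treatment only adds 0.10. The remaining 0.18 comes entirely from who chooses which treatment.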
How do we get rid of this confounding effect?
One way is to randomize patient assignment to treatment. This is what is done in randomized controlled trials. This act of random assignment removes the arrow from U to X (see graph to the left). Treatment choice (X) is no longer decided by unobserved patient characteristics (U).
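A quick simulation shows what removing that arrow does. This is only a sketch: the function name `survival_gap`, the frailty share, the choice probabilities, the survival probabilities and the true effect of 0.10 are all assumptions made up for the example. The same simulated patients are either left to choose treatment based on frailty or assigned by coin flip.

```python
import random


def survival_gap(randomize, n=100_000, true_effect=0.10, seed=1):
    """Difference in survival rates, FolFox minus 5-Fu, in simulated data.

    All probabilities are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    survived = [0, 0]   # survivors per arm: index 0 = 5-Fu alone, 1 = FolFox
    counts = [0, 0]     # patients per arm
    for _ in range(n):
        frail = rng.random() < 0.5                  # U: unobserved frailty
        if randomize:
            treated = rng.random() < 0.5            # coin flip: no arrow from U to X
        else:
            # Frail patients rarely choose the harsher combination.
            treated = rng.random() < (0.2 if frail else 0.8)
        p_survive = (0.3 if frail else 0.6) + (true_effect if treated else 0.0)
        arm = int(treated)
        counts[arm] += 1
        survived[arm] += rng.random() < p_survive
    return survived[1] / counts[1] - survived[0] / counts[0]


gap_observational = survival_gap(randomize=False)
gap_randomized = survival_gap(randomize=True)
print(f"observational gap: {gap_observational:.3f}")
print(f"randomized gap:    {gap_randomized:.3f}")
```

Under these assumptions the randomized gap should land close to the assumed true effect of 0.10, while the observational gap should sit well above it, because frailty no longer has any say in who gets which treatment.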
Is randomization the only way to get unconfoundedness?
No. There are many ways for data sets to have the unconfoundedness property. The important thing is that the choice of treatment is unrelated to unobserved characteristics of the patients that may be associated with different observable outcomes. For example, prior to oxaliplatin getting FDA approval, there was very little use of the drug by colon cancer patients, so whether a patient received it depended largely on when they were treated rather than on their own characteristics. Economists call observational data that satisfies unconfoundedness "natural experiments."
Why is unconfoundedness good?
Technically, unconfoundedness allows the researcher to measure the "marginal" distribution of outcomes conditional on the treatment. The observed distribution of outcomes conditional on treatment choice is an unbiased estimate of the marginal distribution of outcomes conditional on treatment.
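One way to write this down, as a sketch in potential-outcome notation (the symbol Y_x, meaning the outcome a patient would have if given treatment x, is an assumption added here and is not notation from the graph): if treatment choice X is independent of Y_x, then

```latex
\begin{align*}
P(Y \le y \mid X = x)
  &= P(Y_x \le y \mid X = x) && \text{(a patient given $x$ has outcome $Y = Y_x$)} \\
  &= P(Y_x \le y)            && \text{(unconfoundedness: $Y_x$ is independent of $X$)}
\end{align*}
```

The left-hand side is what we can see in the data. The right-hand side is the distribution of outcomes we would get if everyone were given treatment x.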
OK. Why is what-ever-you-said good?
Well, we may be able to use the marginal distribution of outcomes to calculate information about the treatment effect (the causal effect). For example, if we observe the "average" conditional on the treatment choice, then we can measure Rubin's "typical causal effect" (of course in cancer we don't observe the average survival). More generally, we can use these estimates to bound the proportion of patients who will have a positive treatment effect.
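As a rough sketch of both calculations, suppose the outcome is binary (say, survival to some fixed date) and unconfoundedness holds, so the observed survival rates in each arm stand in for the marginal ones. The bounds below are the standard Fréchet-Hoeffding bounds, which is one common way to bound the share of patients who benefit and not necessarily the method used in any particular paper. The function names and the survival rates of 0.70 and 0.60 are hypothetical.

```python
def average_effect(p_treated, p_control):
    """Difference in survival probabilities between the two treatments."""
    return p_treated - p_control


def benefit_bounds(p_treated, p_control):
    """Frechet-Hoeffding bounds on P(survives if treated AND dies if not).

    Only the two marginal survival probabilities are needed; the joint
    distribution of the two potential outcomes is never observed.
    """
    lower = max(0.0, p_treated - p_control)
    upper = min(p_treated, 1.0 - p_control)
    return lower, upper


# Hypothetical survival rates: 0.70 on FolFox, 0.60 on 5-Fu alone.
effect = average_effect(0.70, 0.60)
lower, upper = benefit_bounds(0.70, 0.60)
print(f"average effect: {effect:.2f}")
print(f"share of patients who benefit: between {lower:.2f} and {upper:.2f}")
```

With these invented rates the average effect is 0.10, and the share of patients who survive only because of the treatment is somewhere between 0.10 and 0.40. The averages pin down the lower end, but they cannot tell us how many patients are helped while others are harmed.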