Chris On Statistics: I Love Polynomials

The statistician Andrew Gelman hates polynomials. I love them.

Polynomials are really very cool and they have a lot of very nice properties.

One of the most important properties they have in statistics is that they form a "ring". This is an algebraic term meaning, roughly, that any two elements of the set may be added, subtracted, or multiplied, and the outcome is also an element of the set. (Division is not required.) Fractions form a ring. If you take any fraction and add, subtract, or multiply it by any other fraction, you get a fraction (a rational number). The natural numbers do not form a ring: 4 - 5 = -1 is not a natural number.
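To make the closure idea concrete for polynomials, here is a minimal sketch. It assumes a polynomial is stored as a plain list of coefficients `[a0, a1, a2, ...]`; the helper names `poly_add`, `poly_sub`, and `poly_mul` are made up for this illustration. The point is only that the sum, difference, and product of two polynomials are again coefficient lists, that is, polynomials.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Add two polynomials given as coefficient lists [a0, a1, a2, ...]."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_sub(p, q):
    """Subtract q from p, coefficient by coefficient."""
    return [a - b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_mul(p, q):
    """Multiply two polynomials by convolving their coefficients."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


p = [1, 2]     # 1 + 2x
q = [0, 1, 3]  # x + 3x^2

print(poly_add(p, q))  # [1, 3, 3]     -> 1 + 3x + 3x^2
print(poly_sub(p, q))  # [1, 1, -3]    -> 1 + x - 3x^2
print(poly_mul(p, q))  # [0, 1, 5, 6]  -> x + 5x^2 + 6x^3
```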

What is the big deal about rings?

The big deal is that if we have a set of continuous functions on a closed and bounded space that forms a ring (and that also contains the constant functions and can tell any two points of the space apart), then that set can approximate any (that is, any and all) continuous function on that space as closely as we like. In English: if we want to approximate a continuous function, a polynomial of high enough degree will do the job. For those interested, check out the Stone-Weierstrass theorem.
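Here is a small numerical illustration of that approximation claim, not a proof. It is a sketch under my own assumptions: the target function is |x| on [-1, 1] (continuous, but with a kink), the degrees 2, 8, and 32 are arbitrary, and the fit uses NumPy's `Chebyshev.fit`, which is a least squares polynomial fit written in the Chebyshev basis. A Chebyshev series of degree d is just an ordinary polynomial of degree d in a better-behaved basis.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def f(x):
    # Illustrative target: continuous on [-1, 1] but not smooth at 0.
    return np.abs(x)


# Dense grid on the closed, bounded interval [-1, 1].
x = np.linspace(-1.0, 1.0, 2001)

errors = {}
for degree in (2, 8, 32):
    # Least squares fit of a degree-`degree` polynomial to f on the grid.
    p = Chebyshev.fit(x, f(x), degree)
    # Worst-case (sup-norm) error over the grid.
    errors[degree] = float(np.max(np.abs(p(x) - f(x))))

for degree, err in errors.items():
    print(f"degree {degree:2d}: max error {err:.4f}")
```

The worst-case error should shrink as the degree grows, which is the behavior the theorem promises.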

Polynomials are also unbelievably easy to estimate. We can just do ordinary least squares regression and voilà.
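As a rough sketch of what "just do ordinary least squares" looks like, the snippet below simulates data from a quadratic and recovers the coefficients. Everything specific here is an assumption for illustration: the true coefficients, the noise level, the sample size, and the seed. The regressors are simply 1, x, and x^2, built with `np.vander`, and the fit is `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 1 + 2x - 0.5x^2 + noise (all values are illustrative).
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
true_beta = np.array([1.0, 2.0, -0.5])
y = true_beta[0] + true_beta[1] * x + true_beta[2] * x**2
y = y + rng.normal(0.0, 0.1, size=n)

# Design matrix with columns x^0, x^1, x^2.
X = np.vander(x, N=3, increasing=True)

# Ordinary least squares: minimize ||X b - y||^2 over b.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", beta_hat)
print("true coefficients:     ", true_beta)
```

The polynomial is nonlinear in x but linear in its coefficients, which is why plain OLS is all that is needed.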

So any continuous function you want to estimate can be approximated with a polynomial, and polynomials are really easy to estimate. What is not to love?

How do you know that the polynomial you are estimating is really equal to the function you are interested in approximating? Well. You don't.

Gelman points out that there is a tendency to estimate very high-order polynomials and he discusses the implications in this unpublished working paper. The problem is that the data may not allow such polynomials to be identified. The result is that many important coefficient estimates are simply made-up numbers.

I show in this paper that if you have a sequence of polynomials and a data set with enough information to accurately estimate each polynomial in the sequence, then that sequence converges to the function of interest. I also show in a Monte Carlo experiment (with made-up numbers) that if there is not enough information in the data to accurately estimate a high-order polynomial, the approximation error is very large.
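To give a feel for this kind of experiment, here is a toy Monte Carlo of my own. It is not the experiment from the paper, and every choice in it is an assumption made for illustration: the "function of interest" is sin(2πx) on [0, 1], the polynomial has degree 10, the noise standard deviation is 0.3, and the two sample sizes are 12 and 500. The helper `median_max_error` is a made-up name. It fits the polynomial by least squares with NumPy's `Polynomial.fit` and reports the median, over repeated samples, of the worst-case gap between the fitted polynomial and the true function.

```python
import numpy as np
from numpy.polynomial import Polynomial


def f(x):
    # Hypothetical function of interest; any continuous function would do.
    return np.sin(2.0 * np.pi * x)


def median_max_error(n_obs, degree, n_reps=200, noise_sd=0.3, seed=0):
    """Median over simulated data sets of max |fitted - true| on [0, 1]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)
    errors = []
    for _ in range(n_reps):
        # Draw a fresh noisy sample of size n_obs.
        x = rng.uniform(0.0, 1.0, size=n_obs)
        y = f(x) + rng.normal(0.0, noise_sd, size=n_obs)
        # Least squares polynomial fit; domain fixed so samples are comparable.
        p = Polynomial.fit(x, y, degree, domain=[0.0, 1.0])
        errors.append(np.max(np.abs(p(grid) - f(grid))))
    return float(np.median(errors))


degree = 10
err_small = median_max_error(n_obs=12, degree=degree)
err_large = median_max_error(n_obs=500, degree=degree)

print(f"degree {degree}, n = 12:  median max error {err_small:.3f}")
print(f"degree {degree}, n = 500: median max error {err_large:.3f}")
```

With only 12 observations there is barely more data than coefficients, so the fitted polynomial mostly chases noise. With 500 observations the same degree is estimated far more accurately.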

The corollary is that if you have a polynomial that is a poor approximation of the function of interest, then there is not enough information in the data to accurately estimate the high-order polynomial.

Saturday, June 14, 2014
