### Linear regression issues in astronomy

```Regression
Eric Feigelson
Classical regression model
``The expectation (mean) of the dependent (response) variable Y for a given
value of the independent variable X (or vector of variables X) is equal to a
specified mathematical function f, which depends on both X and a vector of
parameters q, plus a random error (scatter).”
All of the randomness in the model resides in this error term; the functional
relationship is deterministic with a known mathematical form.
The `error’ e is commonly assumed to be a normal (Gaussian) i.i.d. random
variable with zero mean, e = N(m,s2) = N(0,s2).
Parameter estimation & model selection
Once a mathematical model is chosen, and a dataset is provided,
then the `best fit’ parameters q are estimated by one (or more) of
the following estimation techniques:
•
•
•
•
•
Method of moments
Least squares (L2)
Least absolute deviation (L1)
Maximum likelihood estimation
Bayesian inference
Choices must be made regarding model complexity and parsimony
(Occam’s Razor):
• Does the LCDM model have a w-dot term?
• Are three or four planets orbiting the star?
Model selection methods are often needed: c2n, BIC, AIC, …
The final model should be validated against the dataset (or other
datasets) using goodness-of-fit tests (e.g. Anderson-Darling
test) with bootstrap resamples for significance levels.
Robust regression
In the 1960-70s, a major effort was made to design regression
methods that are `robust’ against the effects of non-Gaussianity and
outliers. Today, a common approach is based on M estimation that
reduces the influence of data points with large residuals. Some
common methods:
1.
Huber’s estimator Here we minimize a function with less
dependence on outliers than the sum of squared residuals:
2.
Rousseeuw;s least trimmed squares Minimize:
3.
MM estimators More complicated versions with scaled residuals
Quantile regression
When residuals are non-Gaussian and the sample is large, regressions
can be performed for different quantiles of the residuals, not just the
mean of the residuals. The methodology is complex but often
effective.
90%
50%
10%
Nonlinear astrophysical models
Long study of the radial profile of brightness in an elliptical galaxy (or
spiral galaxy bulge) as a function of distance r from the center of an
elliptical galaxy or spiral galaxy bulge. Models: Hubble’s isothermal
sphere (1930), de Vaucouleurs’ r1/4 law (1948), Sersic’s law (1967):
Nonlinear regression: Count and categorical data
Some nonlinear regressions treat problems where the values of the
response variable Y are not real numbers.
Poisson regression Y is integer valued
This occurs often in astronomy (and other fields) when one counts
objects, photons, or other events as a function of some other
variables. This includes `luminosity functions’ and `mass functions’ of
galaxies, stars, planets, etc. However, Poisson regression has never
been used by astronomers!
The simplest Poisson regression model is loglinear:
Approaches to regression with measurement error
• `Minimum chi-squared’ regression
• York’s (1966) analytical solution for bivariate linear regression
• Modified weighted least squares (Akritas & Bershady 1996)
• `Errors-in-variables’ or `latent variable’ hierarchical regression
models (volumes by Fuller 1987; Carroll et al. 2006)
• SIMEX: Simulation-Extrapolation Algorithm (Carroll et al.)
• Normal mixture model: MLE and Bayesian (Kelly 2007)
Unfortunately, no consensus has emerged.
I think Kelly’s likelihood approach is most promising.
Problems with `minimum chi-square regression
a very common astronomical regression procedure
Dataset of the form:
(bivariate data with heteroscedastic measurement errors with known variances)
Linear (or other) regression model:
Best fit parameters from minimizing the function:
The distributions of the parameters are estimated from tables of the c2
distribution (e.g. Dc2 = 2.71 around best-fit parameter values for 90% confidence
interval for 1 degree of freedom; Numerical Recipes, Fig 15.6.4)
This procedure is often called `minimum chi-square regression’ or `chisquare fitting’ because a random variable that is the sum of squared
normal random variables follows a c2 distribution. If all of the variance in
Y is attributable to the known measurement errors
and these errors
are normally distributed, then the model is valid.
However, from a statistical viewpoint ….
… this is a non-standard procedure! Pearson’s (1900) chi-square
statistic was developed for a very specific problem: hypothesis testing
for the multinomial experiment: contingency table of counts in a predetermined number of categories.
where Oi are the observed counts, and Mi are the model counts
dependent on the p model parameters q. The weights (denominator)
are completely different than in the astronomers’ procedure. R.A.
Fisher showed that X2P is asymptotically (for large n) distributed as
c2 with k-p-1 degrees of freedom.
When the astronomers’ measurement errors are estimated from the
square-root of counts in the multinomial experiment, then their
procedure is equivalent to Neyman’s version of the X2 statistic with
the observed counts in the denominator.
Why the astronomers procedure can be very uncertain!
• When the experiment does not clearly specify the number of `bins’, the
degrees of freedom in the resulting c2 distribution is ill-determined
(Chernoff & Lehmann 1954).
Astronomers often take real valued (X,Y) data and `bin’ them in
arbitrary ordered classes in X (e.g. a histogram), average the Y
values in the bin, and estimate the measurement error as
sqrt(n_bin), and apply c2 fitting. This violates the ChernoffLehmann condition.
• The astronomer often derives the denominator of the c2-like statistic
in complicated and arbitrary fashions. There is no guarantee that the
resulting `measurement errors’ account for the scatter in Y about the
fitted function.
The
values may be estimated from the noise in the astronomical
image or spectrum after complicated data reduction. Sometimes
the astronomer adds an estimated systematic error to the noise
error term, or adjusts the errors in some other fashion. The result is
arbitrary changes in the minimization procedure with no guarantee
that the statistic is c2 distributed.
```