“The goal is to turn data into information, and information into insight.”
—Carly Fiorina, former CEO of Hewlett-Packard
Readers familiar with economics, econometrics, statistics, or ML (or who have read my "Ten Commandments of Econometrics") will be well acquainted with the standard approach to linear regression estimation: Ordinary Least Squares. OLS minimizes the sum of squared residuals — the differences between actual and predicted values.
But OLS makes a critical assumption: all observations are equally informative. This assumption frequently fails when our data contains systematic differences in reliability, representativeness, or economic significance across observations.
The Motivation: Why Equal Treatment Can Mislead
Consider estimating the relationship between R&D spending and firm productivity using a dataset of technology companies. Your sample contains 100 firms: 70 are small startups with volatile earnings and limited track records, while 30 are established corporations with stable operations and comprehensive financial reporting.
Should a three-person startup that might not exist next year receive the same influence in your regression as Apple or Google? Intuitively, the answer is no. The established firms provide more reliable information about the true R&D-productivity relationship, while startup data contains more measurement error and idiosyncratic variation.
This intuition motivates weighted least squares: give more credible observations greater influence in parameter estimation.
The Mathematical Framework
Standard OLS minimizes the objective function:

\min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2
where yi represents the dependent variable (say, productivity) for firm i, and xji denotes the value of explanatory variable j (like R&D intensity) for that firm.
Weighted least squares incorporates differential reliability by minimizing:

\min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} w_i \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \right)^2
The weight wi reflects how much we trust observation i. In our firm example, we might set wi equal to firm age, number of employees, or revenue — proxies for data reliability. A ten-year-old firm with 1000 employees receives higher weight than a six-month-old startup with three employees.
The first-order conditions for this weighted problem yield:

\sum_{i=1}^{n} w_i \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ji} \right) = 0

and for each slope coefficient j = 1, … , k:

\sum_{i=1}^{n} w_i \, x_{ji} \left( y_i - \hat{\beta}_0 - \sum_{m=1}^{k} \hat{\beta}_m x_{mi} \right) = 0
These equations state that the weighted residuals sum to zero, and the weighted covariance between each explanatory variable and the residuals equals zero. Unlike OLS, however, observations with larger weights have proportionally greater influence on satisfying these moment conditions.
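These weighted moment conditions can be sketched directly in NumPy. The following is a minimal illustration with synthetic data — the firm setting, coefficients, and weights are all invented for demonstration:

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve the weighted normal equations
    (X' W X) beta = X' W y, with W = diag(w)."""
    Xw = X * w[:, None]  # scale each row i of X by w_i
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Synthetic data: y = 1 + 2x + noise, with arbitrary positive weights
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
w = rng.uniform(0.5, 2.0, size=n)      # stand-in reliability weights

beta = wls(X, y, w)
e = y - X @ beta  # residuals satisfy the weighted moment conditions
```

By construction of the normal equations, the weighted residuals sum to zero and are weighted-orthogonal to each regressor, exactly as the first-order conditions state.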
Three Practical Applications
Heteroscedasticity Correction
The most theoretically grounded use of WLS addresses heteroscedasticity — situations where error variance differs across observations. Suppose measurement error in productivity data increases with firm size, perhaps because larger firms have more complex operations that are harder to measure accurately.
If we model this as

\mathrm{Var}(\varepsilon_i) = \sigma^2 \cdot \mathrm{size}_i,

then the optimal weights are

w_i = \frac{1}{\mathrm{size}_i}.

This gives smaller firms higher weight, reflecting their lower error variance.
More generally, if

\mathrm{Var}(\varepsilon_i) = \frac{\sigma^2}{w_i},

then weighting by wi produces efficient estimates.
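A small Monte Carlo sketch makes the efficiency gain concrete. Everything below is synthetic — the "size" variable, the error model Var(ε_i) = size_i, and the weights w_i = 1/size_i are assumptions chosen to match the setup above:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 500
size = rng.uniform(1.0, 50.0, size=n)   # stand-in for firm size
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
w = 1.0 / size                           # inverse-variance weights

ols_slopes, wls_slopes = [], []
for _ in range(reps):
    eps = rng.normal(scale=np.sqrt(size))   # heteroscedastic errors
    y = 1.0 + 2.0 * x + eps
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    Xw = X * w[:, None]
    b_wls = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    ols_slopes.append(b_ols[1])
    wls_slopes.append(b_wls[1])

var_ols = np.var(ols_slopes)   # sampling variance of the OLS slope
var_wls = np.var(wls_slopes)   # sampling variance of the WLS slope
```

Both estimators are centered on the true slope of 2, but the WLS slope varies much less across simulated samples — the efficiency payoff of correct weighting.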
Precision-Based Weighting
Sometimes observations differ in inherent precision rather than error variance per se. Consider estimating the effect of class size on student test scores using district-level data. Districts with 100 students per grade provide more precise estimates than districts with 10 students per grade, simply due to sample size differences.
A natural approach sets wi equal to the number of students in district i. Districts with more students receive higher weight because their average outcomes are measured more precisely — the same logic used in meta-analysis.
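The precision logic can be checked with a quick simulation. The district sizes, test-score scale, and true mean below are invented for illustration; the key assumption is that a district mean has variance σ²/n_i:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, true_mean, n_districts = 10.0, 60.0, 50
n_students = rng.integers(10, 200, size=n_districts)  # district sizes

# Each district mean is measured with variance sigma^2 / n_i
district_means = true_mean + rng.normal(size=n_districts) * sigma / np.sqrt(n_students)

unweighted = district_means.mean()
weighted = np.average(district_means, weights=n_students)

# Theoretical sampling variances of the two estimators
var_weighted = sigma**2 / n_students.sum()
var_unweighted = sigma**2 * (1.0 / n_students).sum() / n_districts**2
```

Weighting by n_i recovers the precision of the pooled student-level mean; by the Cauchy–Schwarz inequality its variance is never larger than that of the unweighted district average.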
Economic Significance Weighting
Finally, we might weight by economic importance rather than statistical precision. When estimating the relationship between infrastructure investment and regional growth, a regression including both Wyoming and California should arguably give California much greater influence — it represents a far larger share of national economic activity.
Setting weights proportional to state GDP ensures that parameter estimates reflect the experiences of economically significant regions rather than treating all states equally.
Implementation Strategy
The critical challenge is choosing appropriate weights. The process typically follows three steps:
First, identify the source of heterogeneity. Is the problem measurement error that varies across observations? Different precision levels? Varying economic significance? The answer determines your weighting strategy.
Second, construct weights based on observable characteristics. For heteroscedasticity, run OLS first and model the squared residuals as a function of explanatory variables. If

E[\varepsilon_i^2] = \sigma^2 z_i,

where zi is some variable that predicts error variance, then set

w_i = \frac{1}{z_i}.
Third, estimate the weighted model and conduct sensitivity analysis. How do results change with alternative weighting schemes? If estimates are highly sensitive to weight specification, your conclusions may be less robust than initially apparent.
Statistical Properties and Pitfalls
Under correct specification, WLS preserves unbiasedness:

E[\hat{\beta}_j] = \beta_j

for all parameters. When weights equal the inverse error variance, WLS achieves optimal efficiency among linear unbiased estimators.
The variance of each parameter estimate becomes:

\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum_{i=1}^{n} w_i (x_{ji} - \bar{x}_j^{w})^2}

where

\bar{x}_j^{w} = \frac{\sum_{i=1}^{n} w_i x_{ji}}{\sum_{i=1}^{n} w_i}

is the weighted mean of variable j. Higher weights on observations with larger covariate variation reduce parameter uncertainty.
Misspecified weights can be worse than no weights at all! If you weight by firm size but the true source of heterogeneity is firm age, you may actually increase estimation error. This is why theoretical justification and sensitivity analysis remain crucial (see the first commandment of econometrics).
The most dangerous pitfall involves endogenous weight selection — choosing weights to achieve preferred results. This practice invalidates all standard inference procedures and transforms objective analysis into sophisticated data manipulation.
Conclusion
Weighted least squares recognizes that not all observations are created equal. When applied thoughtfully, it can substantially improve both the efficiency and relevance of regression estimates. The method shines when dealing with heteroscedastic errors, precision differences, or varying economic significance across observations.
Success requires methodological discipline: weights must be justified theoretically, chosen before examining results, and subjected to sensitivity analysis. Done properly, WLS transforms the crude assumption of equal observation importance into a more nuanced approach that reflects the underlying data structure. This exemplifies good econometric practice — using available information to improve inference while maintaining rigorous standards for causal identification.



Nice review! I think too much faith can be placed in the sampling methods of different datasets. Weighting is incredibly important for constructing summary statistics as well.
Just to add two ways of adjusting weights when you don't have them in a dataset (for anyone curious enough to look these up): Post-stratification and propensity weighting.