Ordinary least squares vs. total least squares
Let us first consider the simplest case of a single predictor variable $x$ (for simplicity, let both $x$ and $y$ be centered, so that there is no intercept). OLS fits the equation $y=\beta x$ by minimizing squared distances between observed values of $y$ and predicted values $\hat y$. TLS fits the same equation by minimizing squared distances between the $(x,y)$ points and their projection on the line. In this simplest case the TLS line is simply the first principal component of the 2D data. To find $\beta$, do PCA on the $(x,y)$ points, i.e. construct the $2\times 2$ covariance matrix $\boldsymbol\Sigma$ and find its first eigenvector $\mathbf v=(v_x, v_y)$; then $\beta=v_y/v_x$.
In Matlab:
v = pca([x y]);    % x and y are centered column vectors
beta = v(2,1)/v(1,1);
In R:
v <- prcomp(cbind(x,y))$rotation
beta <- v[2,1]/v[1,1]
By the way, this will yield the correct slope even if $x$ and $y$ were not centered (because built-in PCA functions perform centering automatically). To recover the intercept, compute $\beta_0=\bar y - \beta \bar x$.
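As a concrete illustration, here is a minimal self-contained R sketch (with simulated data; the variable names are illustrative) that computes the TLS slope and intercept as described above and compares them with OLS:
set.seed(42)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
v <- prcomp(cbind(x, y))$rotation    # PCA on the (x,y) points
beta  <- v[2,1] / v[1,1]             # TLS slope from the first eigenvector
beta0 <- mean(y) - beta * mean(x)    # intercept: beta0 = ybar - beta*xbar
c(beta0, beta)                       # TLS intercept and slope
coef(lm(y ~ x))                      # OLS intercept and slope, for comparison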
OLS vs. TLS, multiple regression
Given a dependent variable $y$ and many independent variables $x_i$ (again, all centered for simplicity), regression fits an equation
$$y=\beta_1 x_1 + \ldots + \beta_p x_p.$$
OLS does the fit by minimizing the squared errors between observed values of $y$ and predicted values $\hat y$. TLS does the fit by minimizing the squared distances between the observed $(\mathbf x, y)\in\mathbb R^{p+1}$ points and the closest points on the regression plane/hyperplane.
Note that there is no "regression line" anymore! The equation above specifies a hyperplane: it's a 2D plane if there are two predictors, 3D hyperplane if there are three predictors, etc. So the solution above does not work: we cannot get the TLS solution by taking the first PC only (which is a line). Still, the solution can be easily obtained via PCA.
As before, PCA is performed on the $(\mathbf x, y)$ points. This yields $p+1$ eigenvectors in the columns of $\mathbf V$. The first $p$ eigenvectors define the $p$-dimensional hyperplane $\mathcal H$ that we need; the last (number $p+1$) eigenvector $\mathbf v_{p+1}$ is orthogonal to it. The question is how to transform the basis of $\mathcal H$ given by the first $p$ eigenvectors into the $\boldsymbol\beta$ coefficients.
Observe that if we set $x_i=0$ for all $i\ne k$ and only $x_k=1$, then $\hat y=\beta_k$, i.e. the vector
$$(0,\ldots,1,\ldots,\beta_k)\in\mathcal H$$
lies in the hyperplane $\mathcal H$. On the other hand, we know that the last eigenvector
$$\mathbf v_{p+1}=(v_1,\ldots,v_{p+1})\perp\mathcal H$$
is orthogonal to it. Hence their dot product must be zero:
$$v_k+\beta_k v_{p+1}=0 \quad\Rightarrow\quad \beta_k=-v_k/v_{p+1}.$$
In Matlab:
v = pca([X y]);    % X is a centered n-by-p matrix, y is an n-by-1 column vector
beta = -v(1:end-1,end)/v(end,end);
In R:
v <- prcomp(cbind(X,y))$rotation
beta <- -v[-ncol(v),ncol(v)] / v[ncol(v),ncol(v)]
Again, this will yield the correct slopes even if $\mathbf x$ and $y$ were not centered (because built-in PCA functions perform centering automatically). To recover the intercept, compute $\beta_0=\bar y - \bar{\mathbf x}\boldsymbol\beta$.
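In R, continuing the snippet above (with uncentered X and y; prcomp centers internally), one way to recover the intercept is:
beta0 <- mean(y) - sum(colMeans(X) * beta)    # beta0 = ybar - xbar %*% beta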
As a sanity check, notice that this solution coincides with the previous one in the case of only a single predictor $x$. Indeed, the $(x,y)$ space is then 2D, and so, given that the first PCA eigenvector is orthogonal to the second (last) one, $v^{(1)}_y/v^{(1)}_x = -v^{(2)}_x/v^{(2)}_y$.
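This equivalence is easy to check numerically in R (with simulated x and y; the names are illustrative):
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
v <- prcomp(cbind(x, y))$rotation
v[2,1] / v[1,1]      # slope from the first eigenvector (simple-case formula)
-v[1,2] / v[2,2]     # slope from the last eigenvector (general formula); identical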
Closed form solution for TLS
Surprisingly, it turns out that there is a closed-form equation for $\boldsymbol\beta$. The argument below is taken from Sabine Van Huffel's book "The Total Least Squares Problem" (section 2.3.2).
Let $\mathbf X$ and $\mathbf y$ be the centered data matrices. The last PCA eigenvector $\mathbf v_{p+1}$ is an eigenvector of the covariance matrix of $[\mathbf X\ \mathbf y]$, and hence also of $[\mathbf X\ \mathbf y]^\top[\mathbf X\ \mathbf y]$; let $\sigma^2_{p+1}$ be the corresponding eigenvalue of the latter (i.e. the squared smallest singular value of $[\mathbf X\ \mathbf y]$). If $\mathbf v_{p+1}$ is an eigenvector, then so is any rescaling of it, in particular $-\mathbf v_{p+1}/v_{p+1}=(\boldsymbol\beta\ \ {-1})^\top$, using $\beta_k=-v_k/v_{p+1}$ from above. Writing down the eigenvector equation:
$$\begin{pmatrix}\mathbf X^\top\mathbf X & \mathbf X^\top\mathbf y\\ \mathbf y^\top\mathbf X & \mathbf y^\top\mathbf y\end{pmatrix}\begin{pmatrix}\boldsymbol\beta\\ -1\end{pmatrix}=\sigma^2_{p+1}\begin{pmatrix}\boldsymbol\beta\\ -1\end{pmatrix},$$
and computing the product on the left, we immediately get that
$$\boldsymbol\beta_\mathrm{TLS}=(\mathbf X^\top\mathbf X-\sigma^2_{p+1}\mathbf I)^{-1}\mathbf X^\top\mathbf y,$$
which is strongly reminiscent of the familiar OLS expression
$$\boldsymbol\beta_\mathrm{OLS}=(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\mathbf y.$$
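To see this formula in action, here is a small R sketch (simulated data, illustrative names) that computes the TLS coefficients both via the last PCA eigenvector and via the closed-form expression and checks that they agree:
set.seed(1)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -2, 0.5) + rnorm(n)

Xc <- scale(X, scale = FALSE)          # center the predictors
yc <- scale(y, scale = FALSE)          # center the response
Z  <- cbind(Xc, yc)

v <- prcomp(Z)$rotation                # PCA on [X y]
beta_pca <- -v[1:p, p+1] / v[p+1, p+1] # coefficients from the last eigenvector

sigma2 <- min(svd(Z)$d)^2              # squared smallest singular value of [X y]
beta_closed <- solve(t(Xc) %*% Xc - sigma2 * diag(p), t(Xc) %*% yc)

cbind(beta_pca, beta_closed)           # the two columns coincide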
Multivariate multiple regression
The same formula can be generalized to the multivariate case, but even defining what multivariate TLS does would require some algebra; see the Wikipedia article on total least squares. Multivariate OLS regression is equivalent to a bunch of univariate OLS regressions, one for each dependent variable, but this is not so in the TLS case.