Cholesky versus eigendecomposition for drawing samples from a multivariate normal distribution


16

I would like to draw samples $x \sim N(0, \Sigma)$. Wikipedia shows that one can use either a Cholesky decomposition or an eigendecomposition, i.e. $\Sigma = D_1 D_1^T$ or $\Sigma = Q \Lambda Q^T$,

and hence samples can be drawn via $x = D_1 v$ or $x = Q \sqrt{\Lambda} v$, where $v \sim N(0, I)$.
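For concreteness, here is a minimal R sketch of both sampling routes (my own illustration; the small example covariance matrix is an arbitrary choice):

# Draw one sample x ~ N(0, Sigma) by each route.
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)   # arbitrary example covariance
v <- rnorm(2)                              # v ~ N(0, I)

# Cholesky route: Sigma = D1 %*% t(D1), with D1 lower triangular
D1 <- t(chol(Sigma))                       # chol() returns the upper factor
x_chol <- D1 %*% v

# Eigendecomposition route: Sigma = Q %*% Lambda %*% t(Q)
ed <- eigen(Sigma, symmetric = TRUE)
x_eig <- ed$vectors %*% (sqrt(ed$values) * v)   # x = Q sqrt(Lambda) v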

Wikipedia suggests that they are both equally good for generating samples, but that the Cholesky method has the faster computation time. Is this true, especially numerically, when using a Monte Carlo method where the variances along the diagonal may differ by several orders of magnitude? Is there any formal analysis of this problem?


1
Damien, the best way to find out which is faster is to check it yourself in your software: Cholesky and eigendecomposition functions may vary in speed across implementations. The Cholesky way is more popular, AFAIK, but the eigen way may be potentially more flexible.
ttnphns

1
Thanks. Cholesky is $O(N^3/3)$, versus $O(N^3)$ for the eigendecomposition (Jacobi eigenvalue algorithm). However, I have two further questions: (1) What does "potentially more flexible" mean? And (2) the variances differ by several orders of magnitude ($10^4$ vs $10^9$ for the most extreme elements) - does this have a bearing on the selected algorithm?
Damien

@Damien one aspect of "more flexible" is that the eigendecomposition, which for a covariance matrix corresponds to the SVD, can be truncated to get an optimal low-rank approximation of the full matrix. The truncated SVD can be computed directly, rather than computing the full thing and then throwing out the small eigenvalues.
GeoMatt22
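To illustrate that flexibility, here is a hedged R sketch of the low-rank idea (the test matrix and the rank $k = 2$ are arbitrary choices for the example):

# Best rank-k approximation of a PSD matrix from its leading eigenpairs
# (equivalent to a truncated SVD, since eigenvalues and singular values coincide here).
set.seed(1)
A <- matrix(rnorm(25), 5, 5)
P <- crossprod(A)                            # arbitrary 5 x 5 PSD test matrix
ed <- eigen(P, symmetric = TRUE)             # eigenvalues in decreasing order
k <- 2                                       # arbitrary truncation rank
Qk <- ed$vectors[, 1:k]
Pk <- Qk %*% diag(ed$values[1:k]) %*% t(Qk)  # optimal rank-k approximation of P

For large matrices, packages such as RSpectra can compute only the leading eigenpairs directly, without forming the full decomposition.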

How about reading my answer at Stack Overflow: Obtain vertices of the ellipse on an ellipse covariance plot (created by car::ellipse)? Although the question is posed for a different application, the theory behind it is the same. You will see nice figures for the geometric explanation there.
李哲源

Answers:


12

The problem was studied by Straka et al. for the unscented Kalman filter, which draws (deterministic) samples from a multivariate normal distribution as part of the algorithm. With some luck, the results might be applicable to the Monte Carlo problem.

The Cholesky decomposition (CD) and the eigendecomposition (ED) - and, for that matter, the actual matrix square root (MSR) - are all ways in which a positive semi-definite (PSD) matrix can be factored.

Consider the SVD of a PSD matrix $P$: $P = USV^T$. Since $P$ is PSD, this is actually the same as the ED, with $P = USU^T$. Moreover, we can split the diagonal matrix by its square root: $P = U\sqrt{S}\sqrt{S}^T U^T$, noting that $\sqrt{S} = \sqrt{S}^T$.

We may now introduce an arbitrary orthogonal matrix $O$:

$$P = U\sqrt{S}\,OO^T\sqrt{S}^T U^T = (U\sqrt{S}O)(U\sqrt{S}O)^T.$$

The choice of $O$ actually affects the estimation performance, especially when the covariance matrix has strong off-diagonal elements.

The paper studied three choices of $O$ (a small verification sketch follows this list):

  • $O = I$, which corresponds to the ED;
  • $O = Q$ from the QR decomposition of $U\sqrt{S} = QR$, which corresponds to the CD; and
  • $O = U^T$, which leads to a symmetric matrix (i.e. the MSR).
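Here is the verification sketch (my own check, not from the paper): it builds $B = U\sqrt{S}O$ for each choice and confirms that $BB^T = P$ in all three cases.

# All three choices of O yield a valid factor B with B %*% t(B) equal to P.
set.seed(42)
A <- matrix(rnorm(9), 3, 3)
P <- crossprod(A)                       # arbitrary PSD test matrix
s <- svd(P)                             # for a PSD matrix this equals the ED
US <- s$u %*% diag(sqrt(s$d))           # U %*% sqrt(S)

B_ed  <- US                             # O = I: the ED factor
B_msr <- US %*% t(s$u)                  # O = U^T: symmetric square root (MSR)
R <- qr.R(qr(t(US)))                    # one way to realize the QR route: t(US) = QR
B_cd  <- t(R)                           # lower triangular, Cholesky-like (up to signs)

all.equal(B_ed  %*% t(B_ed),  P)        # TRUE
all.equal(B_msr %*% t(B_msr), P)        # TRUE
all.equal(B_cd  %*% t(B_cd),  P)        # TRUE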

After much analysis, the paper drew the following conclusions (quoting):

  • For a to-be-transformed random variable with uncorrelated elements all the three considered MDs provide identical sigma points and hence they make almost no difference on quality of the [Unscented Transform] approximation. In such a case the CD may be preferred for its low costs.

  • If the random variable contains correlated elements, the use of different [decompositions] may significantly affect quality of the [Unscented Transform] approximation of the mean or covariance matrix of the transformed random variable. The two cases above showed that the [ED] should be preferred.

  • If the elements of the to-be-transformed variable exhibit strong correlation so that the corresponding covariance matrix is nearly singular, another issue must be taken into account, which is numerical stability of the algorithm computing the MD. The SVD is much more numerically stable for nearly singular covariance matrices than the ChD.
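As a quick, informal R illustration of that last point (my own example, not from the paper): chol() typically fails outright on a numerically singular covariance matrix, while the SVD still returns a usable factor.

# A rank-deficient covariance matrix breaks chol() but not svd().
set.seed(7)
A <- matrix(rnorm(4 * 2), 4, 2)
P <- tcrossprod(A)                 # 4 x 4 but only rank 2, hence singular
try(chol(P))                       # typically errors: leading minor not positive definite
s <- svd(P)                        # succeeds; near-zero singular values reveal the rank
B <- s$u %*% diag(sqrt(s$d))       # still a usable factor
all.equal(B %*% t(B), P)           # TRUE up to floating-point error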

Reference:

  • Straka, O.; Duník, J.; Šimandl, M. & Havlík, J., "Aspects and comparison of matrix decompositions in unscented Kalman filter", American Control Conference (ACC), 2013, pp. 3075-3080.

6

Here is a simple illustration using R to compare the computation times of the two methods.

library(mvtnorm)            # rmvnorm()
library(clusterGeneration)  # genPositiveDefMat()
set.seed(1234)
mean <- rnorm(1000, 0, 1)
sigma <- genPositiveDefMat(1000)  # random 1000 x 1000 positive definite matrix
sigma <- sigma$Sigma

eigen.time <- system.time(
  rmvnorm(n=1000, mean=mean, sigma = sigma, method = "eigen")
  )

chol.time <- system.time(
  rmvnorm(n=1000, mean=mean, sigma = sigma, method = "chol")
  )

The running times are

> eigen.time
   user  system elapsed 
   5.16    0.06    5.33 
> chol.time
   user  system elapsed 
   1.74    0.15    1.90

When increasing the sample size to 10000, the running times are

> eigen.time <- system.time(
+   rmvnorm(n=10000, mean=mean, sigma = sigma, method = "eigen")
+   )
> 
> chol.time <- system.time(
+   rmvnorm(n=10000, mean=mean, sigma = sigma, method = "chol")
+   )
> eigen.time
   user  system elapsed 
   15.74    0.28   16.19 
> chol.time
   user  system elapsed 
   11.61    0.19   11.89 

Hope this helps.


3

Here's the manual, or poor-man's, prove-it-to-myself demonstration:

> set.seed(0)
> # The correlation matrix
> corr_matrix = matrix(cbind(1, .80, .2, .80, 1, .7, .2, .7, 1), nrow=3)
> nvar = 3 # Three columns of correlated data points
> nobs = 1e6 # One million observations for each column
> std_norm = matrix(rnorm(nvar * nobs),nrow=nobs, ncol=nvar) # N(0,1)   

$$\mathrm{Corr} = \begin{bmatrix} 1 & .8 & .2 \\ .8 & 1 & .7 \\ .2 & .7 & 1 \end{bmatrix}$$

$$N = \begin{bmatrix} & [,1] & [,2] & [,3] \\ [1,] & 1.0806338 & 0.6563913 & 0.8400443 \\ [2,] & 1.1434241 & 0.1729738 & 0.9884772 \\ & \vdots & \vdots & \vdots \\ [999999,] & 0.4861827 & 0.03563006 & 2.1176976 \\ [1000000,] & 0.4394551 & 1.69265517 & 1.9534729 \end{bmatrix}$$

1. SVD METHOD:

$$\big[ U_{[3\times 3]}\, \Sigma^{0.5}_{[3\times 3]}\, N^T_{[3\times 10^6]} \big]^T, \qquad \Sigma^{0.5} = \begin{bmatrix} \sqrt{d_1} & 0 & 0 \\ 0 & \sqrt{d_2} & 0 \\ 0 & 0 & \sqrt{d_3} \end{bmatrix}$$
> ptm <- proc.time()
> # Singular Value Decomposition method:
> svd = svd(corr_matrix)   
> rand_data_svd = t(svd$u %*% (diag(3) * sqrt(svd$d)) %*% t(std_norm))
> proc.time() - ptm
   user  system elapsed 
   0.29    0.05    0.34 

2. CHOLESKY METHOD:

$$\big[ \mathrm{Ch}_{[3\times 3]}\, N^T_{[3\times 10^6]} \big]^T, \qquad \mathrm{Ch} = \begin{bmatrix} c_{11} & 0 & 0 \\ c_{21} & c_{22} & 0 \\ c_{31} & c_{32} & c_{33} \end{bmatrix}$$
> # Cholesky method:
> ptm <- proc.time()
> chole = t(chol(corr_matrix))
> rand_data_chole = t(chole %*% t(std_norm))
> proc.time() - ptm
   user  system elapsed 
   0.25    0.03    0.31 
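Speed aside, a quick sanity check (my addition, output omitted) is to confirm that both sample sets reproduce the target correlation matrix up to Monte Carlo error:

> # Both should be close to the 1 / .8 / .7 / .2 pattern of corr_matrix:
> round(cor(rand_data_svd), 2)
> round(cor(rand_data_chole), 2)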

Thank you to @user11852 for pointing out that there is a better way to measure the performance difference between SVD and Cholesky, in favor of the latter, using the function microbenchmark. At his suggestion, here are the results:

microbenchmark(chol(corr_matrix), svd(corr_matrix))
Unit: microseconds
              expr     min     lq      mean  median      uq     max neval cld
 chol(corr_matrix)  24.104  25.05  28.74036  25.995  26.467  95.469   100  a 
  svd(corr_matrix) 108.701 110.12 116.27794 111.065 112.719 223.074   100   b

@user11852 Thank you. I read the microbenchmark documentation cursorily, and it really makes a difference.
Antoni Parellada

Sure, but is there a difference in estimation performance?
Damien

Good point. I haven't had time to explore the package.
Antoni Parellada
Licensed under cc by-sa 3.0 with attribution required.