




Step 1: Normalize the explanatory variables so that X₁, . . ., Xₖ have mean 0 and variance 1 over the estimation period. Thus Xᵢ* = (Xᵢ - μᵢ)/σᵢ, where μᵢ and σᵢ are the mean and standard deviation of Xᵢ, for i = 1, . . ., k.

Step 2: Pass X₁*, . . ., Xₖ* through PCA to obtain all the principal components P₁, . . ., Pₖ, along with their factor weights matrix.

Step 3: Perform an OLS regression of Y on P₁, . . ., Pₖ to obtain the intercept estimate a and slope coefficient estimates b = (b₁, . . ., bₖ).

Step 4: Use the factor weights from the PCA to convert these estimates into coefficients for the original model and their covariance matrix estimate.

To see this in mathematical terms, consider the PCA written in matrix form as in (6.1), that is,

X* = PW',

where W is the factor weights matrix. Following §A.1.2, the OLS estimated model in step 3 is

y = a + Pb + e, (6.8)

where a = (a, a, . . ., a)', and by (A.1.10) the vector b of OLS estimates is

b = (P'P)⁻¹P'y = Λ⁻¹P'y

because P'P = Λ, the k × k diagonal eigenvalue matrix of X*'X*. By (A.1.13) their covariance matrix is σ²Λ⁻¹. Now the orthogonal regression model is obtained by substituting P = X*W into (6.8). That is,

y = a + X*b* + e, (6.9)

where the coefficients b* = Wb. In terms of the non-standardized original variables, (6.9) becomes

y = c + Xd + e, (6.10)

where d = (b₁*/σ₁, . . ., bₖ*/σₖ)' = Σb*, in which Σ is a diagonal matrix with 1/σᵢ as its ith diagonal element. Therefore the orthogonal regression slope coefficients can be obtained as a simple transformation of the slope coefficients b estimated on the principal components in (6.8), using the PCA factor weights W and the diagonal matrix Σ:

d = ΣWb.

The constant in (6.10) is c = a - d'μ, where μ = (μ₁, . . ., μₖ) is the vector of means of the explanatory variables. The t-statistics and other model diagnostics for (6.10) are obtained from the (diagonal) covariance matrix of d. This is simply computed as¹³

V(d) = σ²ΣΛ⁻¹Σ,

where σ² is the variance of y and Λ is the diagonal matrix of eigenvalues of X*'X*.

13 V(d) = ΣW V(b) W'Σ = σ²ΣWΛ⁻¹W'Σ = σ²ΣΛ⁻¹Σ.

To summarize the procedure, instead of regressing Y on the set of correlated variables in X, the regression (6.8) is performed on the uncorrelated principal components. The OLS coefficients b that are estimated on the principal components are then simply transformed into coefficient estimates for the original model. In fact d = ΣWb, where Σ is a diagonal matrix with 1/σᵢ on the diagonal (σᵢ being the standard deviation of the ith explanatory variable) and W is the PCA factor weights matrix.
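As a concrete illustration of these four steps, the following is a minimal sketch in Python with NumPy. It is not code from the text: the function name is hypothetical, population standard deviations are assumed in the normalization, and σ² is estimated here from the regression residuals (an assumption; the text refers to the variance of y). The diagnostics use the expression V(d) = σ²ΣΛ⁻¹Σ quoted above.

```python
import numpy as np

def orthogonal_regression(X, y):
    """Sketch of steps 1-4 of the orthogonal regression procedure.

    X : (T, k) array of explanatory variables, y : (T,) dependent variable.
    Returns the constant c, the slope estimates d and their t-statistics.
    """
    T, k = X.shape

    # Step 1: normalize the explanatory variables to mean 0, variance 1.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    X_star = (X - mu) / sigma

    # Step 2: PCA. The eigenvectors W of X*'X* are the factor weights,
    # P = X*W are the principal components, and P'P = Lambda (diagonal).
    lam, W = np.linalg.eigh(X_star.T @ X_star)
    idx = np.argsort(lam)[::-1]                 # largest eigenvalue first
    lam, W = lam[idx], W[:, idx]
    P = X_star @ W

    # Step 3: OLS of y on the principal components: b = Lambda^-1 P'y,
    # valid because the columns of P are orthogonal and have mean zero.
    b = (P.T @ y) / lam
    a = y.mean()                                # intercept of (6.8)

    # Step 4: transform back to the original variables.
    S = np.diag(1.0 / sigma)                    # the diagonal matrix Sigma
    d = S @ W @ b                               # d = Sigma W b
    c = a - mu @ d                              # constant of (6.10)

    # Diagnostics from V(d) = sigma^2 Sigma Lambda^-1 Sigma, as in the text,
    # with sigma^2 estimated from the regression residuals (an assumption).
    resid = y - a - P @ b
    sigma2 = resid @ resid / (T - k - 1)
    t_stats = d / np.sqrt(sigma2 * np.diag(S @ np.diag(1.0 / lam) @ S))
    return c, d, t_stats
```

A library routine such as scikit-learn's PCA could equally be used for step 2; the eigendecomposition is written out here only so that W and Λ map directly onto the notation of the text.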

As an example, consider a regression of the CAC index daily returns on the daily returns to three stocks, Paribas, Société Générale and Danone. An OLS procedure using daily returns gives the following estimates, with t-statistics in parentheses:

r_CAC = 0.0003 + 0.1943 r_Paribas + 0.2135 r_SocGen + 0.2995 r_Danone. (6.11)

(1.45) (14.71) (17.21) (20.55)

If this model were to be used for portfolio optimization, the capital allocation to each stock as determined by the estimated coefficients (each coefficient divided by their sum of 0.7073) would be approximately 27.5% to Paribas, 30% to Société Générale and 42.5% to Danone. However, the stocks were highly correlated during the data period: Paribas and Société Générale had a correlation of 0.615, and Danone had a correlation of almost 0.4 with each of the other two stocks. This high level of collinearity will have affected the coefficient estimates in the OLS regression above.

The same model is estimated by an orthogonal regression, and the results are shown in Table 6.6. This is a much better model because the portfolio weights will not be distorted by multicollinearity. The coefficient estimates d = ΣWb in the last column of the table indicate that approximately 50% of the capital should actually be allocated to Paribas, only 19% to Société Générale and 31% to Danone.

Table 6.6: Calculation of orthogonal regression coefficients

          σᵢ        P1        w₁        w₂        w₃        w₁/σᵢ     w₂/σᵢ     w₃/σᵢ     d = ΣWb
Paribas   0.019969  0.01043   0.85004   0.28896   0.44038   42.56798  14.47043  22.05318  0.3984
SocGen    0.021218  -0.00008  0.84801   0.29994   -0.43694  39.96654  14.13611  -20.5929  0.1531
Danone    0.015506  -0.00003  0.70791   -0.70628  -0.00538  45.6539   -45.5488  -0.34727  0.2465

(σᵢ is the standard deviation of the ith stock return and wⱼ denotes the factor weight on the jth principal component.)
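To make the allocation arithmetic explicit, the percentages quoted in the text follow from rescaling each set of slope coefficients to sum to one. A small illustrative computation (the normalisation rule itself is inferred from the quoted figures, not stated explicitly in the text):

```python
import numpy as np

ols_coeffs  = np.array([0.1943, 0.2135, 0.2995])   # Paribas, SocGen, Danone from (6.11)
orth_coeffs = np.array([0.3984, 0.1531, 0.2465])   # d from Table 6.6

for label, coeffs in (("OLS", ols_coeffs), ("orthogonal", orth_coeffs)):
    weights = 100 * coeffs / coeffs.sum()          # allocations in percent
    print(label, np.round(weights, 1))             # roughly [27.5 30.2 42.3] and [49.9 19.2 30.9]
```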



6.4.2 Missing Data

This section will show that when data are missing from a variable that lies in a correlated system, it is possible to use PCA to fill in appropriate values for the missing variable from data that are available on the correlated variables. Values for the missing data should be created in a way that reflects the correlation in the system, and PCA is the natural way to do this.

A typical application is to a stock that has only recently been issued and so has only a short price history. Suppose the new stock has a daily return X₁, and suppose X₂, . . ., Xₖ are the daily returns on the correlated stocks.

Step 1: Perform a PCA on X₁ and X₂, . . ., Xₖ using only the most recent data (as far back as the data on X₁ allow) to obtain principal components and factor weights as in (6.2). Choose only the first m principal components for the representation, where m < k. The choice of m will depend on how highly correlated the system is.14 Save the factor weights of the representation of X₁, and denote these factor weights by w₁₁, . . ., w₁ₘ.

Step 2: Perform another PCA, this time using a long history of data on just X₂, . . ., Xₖ, and take the same number m of principal components. Call these components P₁, . . ., Pₘ. They will be time series going from the start of the data set on X₂, . . ., Xₖ up to the present.

Step 3: Recreate an artificial data history on X₁ for the same long period, using the factor weights from step 1 and the principal components from step 2, as:

X̂₁ = w₁₁P₁ + w₁₂P₂ + . . . + w₁ₘPₘ.

Step 4: To calibrate the model, decisions about which other variables to include and about the size of m are necessary. The real data on X₁ that are available should be compared with the simulated data on X̂₁ over the recent period from step 3. The variables X₂, . . ., Xₖ and the number of components m should be chosen so that the root mean square error between X₁ and X̂₁ is reasonably small.
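A minimal sketch of steps 1-3 in Python with NumPy is given below. The function names and the arrangement of the input matrices are assumptions, and the reconstructed series is left on the standardised scale of the recent-period PCA; rescaling it back to return units is left implicit, as in the text.

```python
import numpy as np

def pca_factor_weights(R, m):
    """Factor weights and component series for the first m principal
    components of the standardised columns of R (one column per asset)."""
    Z = (R - R.mean(axis=0)) / R.std(axis=0)
    lam, W = np.linalg.eigh(Z.T @ Z)
    idx = np.argsort(lam)[::-1][:m]            # keep the m largest components
    return W[:, idx], Z @ W[:, idx]

def backfill_new_stock(R_recent, R_long, m):
    """R_recent: recent returns with the new stock in column 0 followed by the
    related stocks; R_long: long history of the related stocks (same order).
    Returns an artificial standardised return history for the new stock."""
    # Step 1: PCA over the recent period; keep the new stock's factor weights.
    W_recent, _ = pca_factor_weights(R_recent, m)
    w1 = W_recent[0, :]                        # w_11, ..., w_1m

    # Step 2: PCA of the related stocks alone over the long history.
    _, P_long = pca_factor_weights(R_long, m)  # P_1, ..., P_m as time series

    # Step 3: recreate the artificial history X1_hat = w_11 P_1 + ... + w_1m P_m.
    return P_long @ w1
```

Step 4 would then compare the reconstructed series with the actual returns of the new stock over the overlapping recent period, adjusting the choice of related stocks and of m until the root mean square error is acceptably small.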

To illustrate the procedure, suppose that a new stock has been issued in the US banking sector and that daily prices are available from 2 March to 6 October 2000. First choose some related stocks that have a long price history and that are reasonably highly correlated with the new stock. For this illustration I have chosen just seven such stocks, because a PCA of these seven stock returns and the new stock return between 2 March and 6 October 2000 already gives quite good results. In fact four principal components explain about 72% of the variation in the system and the new stock return has the principal component representation:

0.85867P₁ + 0.047495P₂ + 0.091244P₃ + 0.35181P₄. (6.12)

A price series for the new stock from the beginning of 1998 was simulated by taking the returns from the seven banking stocks (all have price histories going

14 As a rule of thumb, we need to take enough principal components to explain at least 60-70% of the variation.



