
Making Profits with Data Preprocessing

Casimir C. "Casey" Klimasauskas

The primary motivation for preprocessing data is to build better models, where "better" means more consistently profitable. To this end, there are three reasons to preprocess data (each is illustrated in the sketch that follows the list):

1. To extract key features that a human analyst might use, simplifying the structure of the model.

2. To reduce model dependency on specific signal levels, improving the performance of models as markets enter new trading ranges.

3. To reshape the distribution of the data, enhancing the performance of the neural or statistical modeling techniques used to develop the models.
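As a concrete, purely illustrative sketch of these three kinds of transformation, the Python function below computes one example of each from a price series: a moving-average spread as an extracted feature, percent changes for level independence, and a tanh squashing to reshape a fat-tailed distribution. The function name, window lengths, and choice of transforms are my assumptions, not prescriptions from the chapter.

import numpy as np

def example_transforms(prices: np.ndarray) -> dict:
    """One illustrative transform per motivation above; every name and
    window length here is an assumption, not the author's method."""
    out = {}
    # 1. Feature extraction: a 5-period vs. 20-period moving-average
    #    spread, the kind of feature a human chart reader might track.
    ma5 = np.convolve(prices, np.ones(5) / 5, mode="valid")
    ma20 = np.convolve(prices, np.ones(20) / 20, mode="valid")
    out["ma_spread"] = ma5[-len(ma20):] - ma20
    # 2. Level independence: one-period percent changes instead of raw
    #    prices, so a model is less tied to a specific trading range.
    out["pct_change"] = np.diff(prices) / prices[:-1]
    # 3. Distribution reshaping: scale the changes by their standard
    #    deviation and squash them through tanh into (-1, 1).
    out["squashed"] = np.tanh(out["pct_change"] / out["pct_change"].std())
    return out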

Each of these transformations increases the number of candidate input variables, which often has an adverse effect on model performance, particularly in financial applications. Though some modeling techniques have been developed to deal effectively with large numbers of highly correlated inputs and noisy data, they are not readily available. Whenever transformations are used, it is therefore essential to select a small handful of inputs from the full set of candidates before building the model.
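One simple illustration of that selection step is a filter that ranks the candidates by the absolute value of their correlation with the target and keeps only the top few. The sketch below shows such a filter; the function name and the default cutoff of five are mine, not the chapter's. Note that on noisy data this ranking is itself vulnerable to the spurious correlations discussed in the next section.

import numpy as np

def select_top_inputs(X: np.ndarray, y: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k columns of X whose absolute linear
    correlation with the target y is largest (a simple filter method)."""
    scores = np.array([abs(np.corrcoef(y, X[:, i])[0, 1])
                       for i in range(X.shape[1])])
    # argsort is ascending, so take the last k indices and reverse them.
    return np.argsort(scores)[-k:][::-1]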

Motivation and Issues in Preprocessing

When one of our customers needs to hire a statistician, they place an ad in the local paper. Everyone who responds is required to take a test prior to an interview. The test consists of 1,000 rows by 51 columns of numbers. The objective of the test is to determine what relationships, if any, exist between the first column and the last 50 columns. The applicants have their choice of computer and statistical package to use. There is no specific time limit on the test.

Most individuals who take the test find some relationship between the first column and the last 50. All the data is random, however, and any relationships they discover are spurious. This example illustrates one of the issues that arise when using relatively small amounts of data in high-dimensional spaces.

This is exactly analogous to the problem of pulling faint signals out of noisy financial data. The relationships we want to find may be hidden by spurious correlations in the flood of available data. To pull these signals out effectively, it is essential to transform the data in ways that enhance the signals we want.

As an example of one of the problems with small amounts of data in high-dimensional spaces, Table 16.1 shows the kind of correlations and models that can be created with random data. You can replicate this experiment in Microsoft Excel® by filling 16 columns and 100 rows with random data (=rand()). Then copy the entire area and use "paste special" to paste only the values. Use the "correl()" function to compute the correlation (R) between the dependent variable (the first column) and each of the independent variables. This is the "Linear Correlation Coefficient" shown in Table 16.1. Finally, use "linest()" (the linear regression function) to regress the dependent variable on the independent variables. Read the documentation on "linest()" carefully: it returns the coefficients in reverse order, with the first one last and vice versa. These are the "Linear Regression Coefficient" values in Table 16.1. (If you decide to write a program to test this, DO NOT USE the ANSI standard random number generator found in the math library of many compilers. Instead, select one from those described in a professional reference.)

Table 16.1 Coefficients from linear regression and the linear correlation coefficient (R) for a problem with one dependent variable and fifteen independent variables

                      Linear Regression    Linear Correlation
                      Coefficient          Coefficient (R)
Y-Correlation (R)          0.394                0.382
Variable 1                 0.01                -0.03
Variable 2                 0.05                 0.05
Variable 3                -0.04                 0.02
Variable 4                 0.11                 0.09
Variable 5                -0.04                -0.01
Variable 6                 0.28                 0.28
Variable 7                -0.04                -0.01
Variable 8                -0.03                -0.04
Variable 9                -0.03                -0.06
Variable 10                0.17                 0.10
Variable 11                0.04                 0.05
Variable 12                0.05                 0.04
Variable 13                0.12                 0.11
Variable 14                0.10                 0.08
Variable 15               -0.08                -0.08

The first row of the table shows the linear correlation of the resulting formula with the dependent variable.
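For readers who would rather script this experiment than build it in a spreadsheet, the following is a minimal sketch in Python with numpy; the choice of tools is mine (the chapter describes the procedure in Excel), and the seed and print formatting are arbitrary. It fills 100 rows and 16 columns with uniform random numbers, correlates the first column with each of the other fifteen, regresses the first column on the rest, and reports the correlation of the fitted values with the "dependent" column. Even though every value is noise, that final figure typically lands well above zero, just as the 0.394 in Table 16.1 does.

import numpy as np

# 100 rows by 16 columns of uniform random noise. Column 0 plays the role
# of the "dependent" variable; columns 1-15 are the candidate inputs.
# numpy's default PCG64 generator also sidesteps the weak ANSI C rand()
# the text warns against.
rng = np.random.default_rng(0)
data = rng.random((100, 16))
y, X = data[:, 0], data[:, 1:]

# Linear correlation coefficient (R) of y with each candidate input,
# the analogue of Excel's correl().
for i in range(X.shape[1]):
    print(f"input {i + 1:2d}: R = {np.corrcoef(y, X[:, i])[0, 1]:+.2f}")

# Least-squares regression of y on all fifteen inputs plus an intercept,
# the analogue of Excel's linest().
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Correlation of the fitted values with y: the "Y-Correlation (R)" row of
# Table 16.1. On pure noise it is still typically well above zero.
y_hat = A @ coef
print(f"Y-Correlation (R) = {np.corrcoef(y, y_hat)[0, 1]:.3f}")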


