


Part IV SUMMARY

Market Analysis

• The development of every trading system should start with a rational observation about market behavior. Take care to develop and understand relationships that are genuinely meaningful, so that you do not build on random patterns. Statistical analysis will help you determine the accuracy of any inferences you draw from the data.

• Your first attempt should be to break down a price time series into these components: long-term trend, cyclical effects, seasonal effects, and residual effects. It may be easier to preprocess or model each component separately.
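The decomposition above can be sketched with a simple classical procedure: a centered moving average for the long-term trend, phase averages for the seasonal effect, and whatever remains as residual. The helper below is an illustrative sketch, not a method prescribed by the text; the function name and the moving-average choice are my own.

```python
import numpy as np

def decompose(prices, season=5):
    """Rough classical decomposition of a price series into
    trend, seasonal, and residual components (illustrative sketch).
    Edge values of the trend are biased by the convolution window."""
    prices = np.asarray(prices, dtype=float)
    # Long-term trend: centered moving average over one seasonal period.
    kernel = np.ones(season) / season
    trend = np.convolve(prices, kernel, mode="same")
    detrended = prices - trend
    # Seasonal effect: average the detrended values at each phase of the cycle.
    phase_means = np.array([detrended[i::season].mean() for i in range(season)])
    seasonal = np.tile(phase_means, len(prices) // season + 1)[: len(prices)]
    # Residual: whatever the trend and seasonal terms do not explain.
    residual = detrended - seasonal
    return trend, seasonal, residual
```

By construction the three components sum back to the original series, so each can be preprocessed or modeled separately and recombined.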

• Examine cross-market interactions and measure cross-correlation effects. Once you understand the relationship between two data streams, it is a simple process to determine which one leads the other and by how much. This allows you to take advantage of the relationship by looking for divergence between the trends that describe each.
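Determining which series leads, and by how much, can be sketched as a scan over lags for the highest cross-correlation. The sign convention and function name below are my own assumptions, not the book's notation:

```python
import numpy as np

def best_lag(x, y, max_lag=10):
    """Find the lag (in bars) at which x and y are most correlated.
    Positive result: x leads y by that many bars; negative: y leads x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    best, best_corr = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:            # align x[t] with y[t + lag]
            a, b = x[:-lag], y[lag:]
        elif lag < 0:          # align y[t] with x[t - lag]
            a, b = x[-lag:], y[:lag]
        else:
            a, b = x, y
        c = np.corrcoef(a, b)[0, 1]
        if c > best_corr:
            best, best_corr = lag, c
    return best, best_corr
```

With the lead/lag known, divergence between the two (lag-aligned) trends becomes the tradable signal.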

• Classical economic theory is based on linear, Gaussian, and stationary (non-time-varying) assumptions about market behavior. Nonlinear pricing (NP) is based on nonlinear, non-Gaussian, and nonstationary assumptions. In addition to relying on market relationships, NP assumes the strengths of these relationships may change over time, and quantifies these changes as well.

• NP practitioners quickly learn that linearity is sometimes an inaccurate paradigm. Straight lines are not good forecasting tools, nor are completely random walks. Most of the time, traded asset prices are not random; instead they persist or antipersist. That is, they tend to follow their current path or to reverse themselves.

• Persistence is not trend. Trend is a perspective in the present that looks back in time, and it is easy for a random walk to appear trending. Persistence is a perspective in the present that looks forward in time to give a "likelihood" of future price movements. It is therefore useful to measure a price series' persistence over time.
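One common way to quantify persistence is a Hurst-exponent estimate; the variance-scaling estimator below is a generic sketch I am supplying for illustration, not the specific measurement the text describes. Roughly, H > 0.5 suggests persistence, H < 0.5 antipersistence, and H ≈ 0.5 a random walk:

```python
import numpy as np

def hurst(series, max_lag=20):
    """Rough Hurst-exponent estimate from how the spread of price
    differences scales with lag (spread ~ lag**H for fractal series)."""
    series = np.asarray(series, float)
    lags = np.arange(2, max_lag)
    # Standard deviation of lagged differences at each lag.
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    # Slope of the log-log fit is the Hurst exponent.
    H, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return H
```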

• One opportunity to profit is to arbitrage between the pattern detected by NP and the market's inability to detect that pattern as accurately. Moreover, this inefficiency is unlikely to be arbitraged away because of the number of variables involved, the varying investment horizons, and the technology gap.

Data Representation and Preprocessing

• There are three reasons to preprocess your data prior to modeling.

1. To reshape the distribution of the data.

2. To extract key features that a human analyst might use, simplifying the structure of the model.




3. To reduce model dependency on specific signal levels, improving the performance of models as markets enter new trading ranges.

• A common preprocessing method is to detrend your data. If you do so, your model will often generalize well even when the market reaches record highs because detrended data focuses on the relative shape and relationships between recent activities.
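Detrending can be as simple as fitting and subtracting a straight line; this sketch (my own minimal version, using a least-squares linear fit) leaves the model looking at relative shape rather than absolute price level:

```python
import numpy as np

def detrend(prices):
    """Remove a fitted linear trend so the model sees the relative
    shape of recent activity, not the absolute price level."""
    prices = np.asarray(prices, float)
    t = np.arange(len(prices))
    slope, intercept = np.polyfit(t, prices, 1)
    return prices - (slope * t + intercept)
```

A model trained on such data generalizes even when the market reaches record highs, since a new price level shifts the fitted line rather than the features.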

• If you intend to include a spectral analysis of a time series in your advanced system, consider using wavelets instead of Fourier coefficients. Unlike Fourier analysis, which is suited to stationary data, time-frequency wavelets are suited to quasi-stationary signals, and time-scale wavelets to fractal structures.

• You can attain a uniform distribution of input values by separating the input records into bins and selecting an equal number of records from each bin. By doing so, you can frequently build systems using only a fraction of the data set. This leaves you with more data cases to test your system.
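The equal-count binning idea can be sketched as below; the bin count, sample size, and function name are illustrative choices of mine, not parameters from the text:

```python
import numpy as np

def uniform_sample(records, values, n_bins=10, per_bin=50, seed=0):
    """Build a training subset with a roughly uniform distribution of
    `values`: bucket records into equal-width bins and draw the same
    number from each bin (fewer if a bin is sparse)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    bins = np.digitize(values, edges[1:-1])   # bin index 0 .. n_bins-1
    chosen = []
    for b in range(n_bins):
        idx = np.where(bins == b)[0]
        if len(idx):
            take = min(per_bin, len(idx))
            chosen.extend(rng.choice(idx, size=take, replace=False))
    return [records[i] for i in chosen]
```

The records left out of the flattened training set remain available for out-of-sample testing.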

• Complex, overoptimized systems may work well on in-sample data but will likely fail miserably on out-of-sample data. The more complex your system is, the more data you will need to develop and test it. But because the financial markets are nonstationary (statistical characteristics change over time), distant historical data may be unusable. Therefore, keep your systems simple. Simple models are often more accurate than sophisticated models.

• One way to simplify your system is to reduce the number of input variables feeding it. There are many ways to do this. Theoretically, the perfect way is to try all possible combinations of input variables and see which collection works best. But for large numbers of variables, this is far too time consuming. Acceptable alternatives are listed in Appendix C.

• If the number of inputs has been reduced to a sufficiently small number (5 to 10), it is often possible to eliminate conflicting inputs by computing expected values in local neighborhoods and using those prototypes as training vectors. This process also tends to reduce data-distribution effects on network training.
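One minimal way to form such prototypes, assuming a coarse grid defines the "local neighborhoods" (an assumption of mine; the text does not specify the neighborhood scheme), is to quantize the inputs and average inputs and targets within each cell:

```python
import numpy as np
from collections import defaultdict

def prototypes(X, y, grid=0.5):
    """Replace conflicting training cases with local expected values:
    quantize inputs to a coarse grid, then average the inputs and
    targets within each cell, yielding one prototype per neighborhood."""
    cells = defaultdict(list)
    for xi, yi in zip(np.asarray(X, float), np.asarray(y, float)):
        cells[tuple(np.round(xi / grid))].append((xi, yi))
    Xp, yp = [], []
    for members in cells.values():
        xs, ys = zip(*members)
        Xp.append(np.mean(xs, axis=0))   # prototype input vector
        yp.append(np.mean(ys))           # expected target in this cell
    return np.array(Xp), np.array(yp)
```

Conflicting cases that land in the same cell (similar inputs, different targets) collapse into a single averaged example.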

Forecasting

• The first task in building a forecasting model is to evaluate the optimal forecast distance into the future. One method for obtaining this estimate is described in Neural Networks and Financial Forecasting (Jurik Research, 1996).

• Neural networks (NN) are more powerful than classical (linear) regression methods. Neural networks are divided into two classes: those that employ nonlinear regression modeling (Perceptron cells) and those that build a library of examples (template cells). The former class is constructed to minimize a sum-of-squared-errors (regression) criterion. As a practical matter, this class works best when all input data is normally distributed with zero mean and unit variance.
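The zero-mean, unit-variance condition is a one-line standardization (z-scoring) of each input column; a minimal sketch:

```python
import numpy as np

def standardize(X):
    """Scale each input column to zero mean and unit variance,
    the distribution regression-style networks handle best."""
    X = np.asarray(X, float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    # Guard against constant columns (zero variance).
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)
```

Save `mu` and `sigma` from the training set and reuse them on live data, so the network sees inputs on the scale it was trained with.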



Advanced Indicators and Forecasting

• Fuzzy logic (FL) is a technique that offers a practical way to transfer the value judgments and wisdom of the user to a quantitative model. The user need not know the math to create, tune, or use a set of fuzzy rules. In some cases, the user can program rules in words or sentences.

• Genetic algorithms (GAs) offer a way to optimize extremely complex systems. This compares favorably to most traditional optimization methods, such as "hill-climbing" techniques, because these cannot handle discontinuities typically present in complex systems.

• Use NN, FL, and GA technology to forecast price action, to detect patterns in the data stream, or both.

• When using them for forecasting, it is best to make your target data as stationary as possible, over the entire training set. Forget about predicting price; price has no stable mean, it wanders all over. Instead, consider the percent rate of change (PROC) of price, or the PROC of the highest high (and lowest low) of the next N price bars. The latter gives you a stable price channel forecast.
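The stable price-channel target described above can be sketched directly: the PROC from today's close to the highest high and lowest low of the next N bars. Function and variable names are my own; the target definition follows the bullet:

```python
import numpy as np

def channel_targets(high, low, close, n=5):
    """Stationary forecast targets: percent change from each bar's close
    to the highest high and lowest low of the NEXT n bars."""
    high, low, close = (np.asarray(a, float) for a in (high, low, close))
    hi_pct, lo_pct = [], []
    for t in range(len(close) - n):
        future_hi = high[t + 1 : t + 1 + n].max()
        future_lo = low[t + 1 : t + 1 + n].min()
        hi_pct.append(100.0 * (future_hi - close[t]) / close[t])
        lo_pct.append(100.0 * (future_lo - close[t]) / close[t])
    return np.array(hi_pct), np.array(lo_pct)
```

Unlike raw price, these percent targets have a stable mean over the whole training set, which is what the network needs.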

• When using them for pattern recognition, you will get best results training one model per price pattern. Use an equal number of "pattern found" and "pattern not found" examples. Vary the size and shape of the pattern to be detected. Use features to describe the price pattern that will remain unaffected when volatility or price is either high or low. Finding good invariant features is a difficult skill to master. I suggest you hire a professional for this task; overall you will save yourself time and money. For an up-to-date list of Software Consultants, see Appendix C.

• Verify that your system's parameters are robust by determining how well performance is maintained when the parameters are adjusted up or down by 10 percent. A fragile (and therefore unworkable) system will quickly lose performance under less-than-optimal parameter settings.
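The ±10 percent check can be automated as a simple sweep; `perf_fn` here is a hypothetical callable standing in for whatever backtest metric your system reports:

```python
def robustness(perf_fn, params, tweak=0.10):
    """Nudge each parameter down and up by `tweak` (default 10%) and
    report (baseline performance, worst performance seen).
    A fragile system shows a large gap between the two."""
    base = perf_fn(params)
    worst = base
    for name, value in params.items():
        for factor in (1 - tweak, 1 + tweak):
            tweaked = dict(params, **{name: value * factor})
            worst = min(worst, perf_fn(tweaked))
    return base, worst
```

If `worst` collapses relative to `base`, the parameters are overfit to the optimization data.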

• There may be occasions when you do not have thousands of data points for model testing. For this situation, consider using resampling, leave-out testing, and bootstrap methods. For additional information on these methods, see Computer Systems That Learn (Weiss & Kulikowski, 1991, Morgan-Kaufmann Publishing).
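A minimal bootstrap sketch for the small-sample case, resampling trades with replacement to put a confidence interval around mean trade return (an illustrative application; Weiss & Kulikowski cover these methods in depth):

```python
import numpy as np

def bootstrap_ci(returns, n_boot=2000, seed=0):
    """Bootstrap a 95% confidence interval for mean trade return
    when too few data points remain for a large held-out test set."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, float)
    # Resample the trade list with replacement, recording each mean.
    means = [rng.choice(returns, size=len(returns), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, 2.5), np.percentile(means, 97.5)
```

If the interval straddles zero, the sample gives no evidence the system's edge is real.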


