Efficient selection of inputs for artificial neural network models

Fernando, T.; Maier, H.; Dandy, G.; May, R.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/29312

Type:	Conference paper
Title:	Efficient selection of inputs for artificial neural network models
Author:	Fernando, T.; Maier, H.; Dandy, G.; May, R.
Citation:	MODSIM 2005 International Congress on Modelling and Simulation: Modelling and Simulation Society of Australia and New Zealand, December 2005 / Andre Zerger and Robert M. Argent (eds.): pp.1806-1812
Publisher:	mssanz
Publisher Place:	http://mssanz.org.au/modsim05/authorsE-G.htm
Issue Date:	2005
ISBN:	0975840002 9780975840009
Conference Name:	International Congress on Modelling and Simulation (16th : 2005 : Melbourne, Victoria)
Editor:	Zerger, A.; Argent, R.
Statement of Responsibility:	Fernando, T.M.K.G., H.R. Maier, G.C. Dandy and R. May
Abstract:	The selection of an appropriate subset of variables from a set of measured potential input variables for inclusion as inputs to model the system under investigation is a vital step in model development. This is particularly important in data driven techniques, such as artificial neural networks (ANNs) and fuzzy systems, as the performance of the final model is heavily dependent on the input variables used to develop the model. Selection of the best set of input variables is essential to being able to model the system under consideration reliably. When the available data set is high dimensional, it is necessary to select a subset of the potential input variables to reduce the number of free parameters in the model in order to obtain good generalization with finite data. The correct choice of model inputs is also important for improving computational efficiency. However, the topic of input selection is a difficult one. Real systems are generally complex and mostly associated with nonlinear processes. Consequently, the dependencies between output and input variables, as well as conditional dependencies between variables, are difficult to measure. Mutual information (MI) has been used successfully to measure the dependence between output and input variables. In contrast to the linear correlation coefficient, which often forms the basis of empirical input variable selection approaches, mutual information is capable of measuring dependencies based on both linear and nonlinear relationships, making it well suited for use with complex nonlinear systems. Partial mutual information (PMI) has been proposed in recent years as a means of measuring conditional dependencies between output and input variables (Sharma, 2000). It is a robust technique for selecting input variables for multivariate, nonlinear, complex natural systems, such as hydrological processes. The PMI approach is a stepwise input variable selection algorithm. 
Consequently, it is necessary to have a reliable technique to indicate whether a selected candidate variable is significant or not. The original algorithm proposed by Sharma (2000) used the bootstrap method with 100 bootstraps to obtain the 95th percentile confidence limit for the PMI. However, as pointed out by Chernick (1999), about 5,000 bootstraps are needed for simple problems and about 10,000 bootstraps for more complicated problems in order to estimate the required confidence intervals reliably. Use of such a large number of bootstraps as the stopping criterion for the PMI algorithm would decrease the computational efficiency of the algorithm significantly, probably to the point of impracticality for most realistic problems. The focus of this study is to introduce an alternative stopping criterion for PMI algorithm implementation, which is both robust and computationally efficient. As part of the proposed method, significant PMI scores are treated as outliers in the computed PMI scores. A robust outlier detection technique, the Hampel identifier (Davies and Gather, 1993), is used to evaluate the significance of selected candidate inputs. The reliability of the new technique is first investigated using two nonlinear data series where dependencies of attributes were known a priori. The new technique consistently selects the correct inputs, while being computationally efficient. The modified PMI algorithm is then applied to select inputs to forecast salinity in the River Murray at Murray Bridge, South Australia, which are used to develop an ANN model. The results obtained in this study are compared with those obtained in three previous studies which developed ANN models for the same case study. The proposed PMI algorithm identifies only 11 inputs as significant from 1323 candidate inputs. 
The resulting ANN model has the smallest number of inputs when compared with the models developed in previous studies for this case study, with no loss in predictive performance.
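
The stopping criterion described in the abstract, which treats a significant PMI score as an outlier among the computed PMI scores, can be illustrated with a minimal sketch of the Hampel identifier. This is not the authors' implementation. The function names, the example scores, the 1.4826 scaling constant (the usual choice for normally distributed data) and the cut-off of 3 are all assumptions made for illustration; the abstract does not state the values used in the paper.

```python
import numpy as np


def hampel_distances(values, scale=1.4826):
    """Distance of each value from the sample median, in units of scaled MAD.

    `scale` = 1.4826 makes the MAD comparable to a standard deviation for
    normally distributed data (an assumed, conventional choice).
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    deviation = np.abs(values - median)
    mad = np.median(deviation)
    if mad == 0.0:
        # Degenerate case: at least half the values equal the median.
        # Any value that differs at all is treated as infinitely far away.
        return np.where(deviation > 0, np.inf, 0.0)
    return deviation / (scale * mad)


def is_significant(pmi_scores, candidate_index, threshold=3.0):
    """Flag a candidate whose PMI score is an outlier among all scores.

    `threshold` = 3.0 is an assumed cut-off, not a value taken from the abstract.
    """
    distances = hampel_distances(pmi_scores)
    return bool(distances[candidate_index] > threshold)


# Hypothetical PMI scores for eight candidate inputs at one selection step.
pmi_scores = [0.02, 0.03, 0.01, 0.025, 0.015, 0.40, 0.02, 0.018]
best = int(np.argmax(pmi_scores))
print(best, is_significant(pmi_scores, best))

# Hypothetical later step: no score stands out, so selection would stop.
flat_scores = [0.02, 0.03, 0.01, 0.025, 0.015, 0.022, 0.02, 0.018]
flat_best = int(np.argmax(flat_scores))
print(flat_best, is_significant(flat_scores, flat_best))
```

In a stepwise procedure of this kind, the candidate with the highest PMI score would be added to the input set only while the check returns true. Unlike a bootstrap-based confidence limit, the check needs a single pass over the scores already computed at that step.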
Description (link):	http://www.mssanz.org.au/modsim05/
Published version:	http://www.mssanz.org.au/modsim05/papers/fernando.pdf
Appears in Collections:	Aurora harvest 2; Civil and Environmental Engineering publications; Environment Institute publications


Adelaide Research & Scholarship