Towards a Parallel Data Mining Toolbox

Peter Christen, Markus Hegland, Ole M. Nielsen, Stephen Roberts, Peter E. Strazdins and Irfan Altas, Efficient Data Mining: Scripting and Scalable Parallel Algorithms, PDDM-01: 4th International Workshop on Parallel and Distributed Data Mining, in conjunction with IPDPS'2001, San Francisco, April 2001.

Abstract:

This paper presents an approach to data mining that couples parallel applications with a scripting language, resulting in an efficient and flexible toolbox. Parallel algorithms that scale with both data size and number of processors are key to solving the ever-growing problems in data mining. At the same time, data mining applications must be flexible enough to allow interactive data exploration. With a toolbox written in a scripting language, parallel applications can be steered flexibly, which meets the data miner's need for fast, interactive data analysis. The chosen approach is discussed and first results are presented.

Keywords

data mining, thin plate splines, additive models, scripting, wavelets, parallel linear systems, symmetric indefinite systems