
 
Parallel Data Mining  

Parallel processing speeds up decision-support applications such as data mining by dividing a complex query into multiple parts and assigning each part to a separate processor. Parallel data mining algorithms running on parallel systems, such as symmetric multiprocessing (SMP) and massively parallel processing (MPP) systems, are well suited to applications with lumpy data, which data mining queries use heavily. In an SMP system, multiple processors share a common memory; MPP systems, often called "shared nothing" or distributed-memory systems, give each processor its own private memory.
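
As a minimal sketch of the divide-and-assign idea, assuming a simple aggregate query (the count and sum of records matching a predicate), the Python below splits the data into one partition per worker, runs the same sub-query on each partition, and merges the partial results at a coordinator. Each worker reads only its own partition, which mirrors the shared-nothing (MPP) style. The names `partition`, `local_query`, and `parallel_query` are hypothetical. Threads stand in for processors here; CPython threads do not run CPU-bound code in parallel, so a real system would use processes (e.g. `ProcessPoolExecutor`, which needs picklable arguments) or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def local_query(part, predicate):
    """Sub-query run by one worker on its own partition only."""
    matching = [r for r in part if predicate(r)]
    return len(matching), sum(matching)


def parallel_query(records, predicate, n_workers=4):
    """Count and sum the records matching predicate, one partition per worker."""
    parts = partition(records, n_workers)
    # Threads are a stand-in for separate processors (hypothetical setup).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_query, parts, [predicate] * n_workers))
    # The coordinator merges the partial results from all workers.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total
```

For example, `parallel_query(list(range(1, 101)), lambda x: x % 2 == 0)` returns `(50, 2550)`, the same answer a single sequential scan would give. The merge step works because count and sum are both decomposable: the global value is just the sum of the per-partition values.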

By combining data mining algorithms with parallel methods, we can implement parallel data mining methods for:

  • Association rules mining
  • Classification and regression, e.g. decision trees and neural nets
  • Clustering
  • Text mining
  • Web mining
  • Data visualization
  • Bayesian approaches
  • Genetic programming
  • Statistical inference
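
For association rules mining, one simple parallel scheme is often called count distribution: every worker holds the same candidate itemsets, counts their support in its own partition of transactions, and the partial counts are summed before the minimum-support filter. The sketch below applies that idea inside an Apriori-style loop to find frequent itemsets; rule generation from those itemsets is omitted. It is an illustration under these assumptions, not the specific algorithm of any reference below, and the function names are hypothetical. Threads stand in for processors, so this shows the data flow rather than a CPython speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_candidates(transactions, candidates):
    """Support count of each candidate itemset within one partition."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of this transaction
                counts[cand] += 1
    return counts


def next_candidates(frequent, k):
    """Apriori join and prune: size-k unions of frequent (k-1)-itemsets
    whose every (k-1)-subset is also frequent."""
    cands = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in frequent for sub in combinations(union, k - 1)
        ):
            cands.add(union)
    return cands


def parallel_frequent_itemsets(partitions, min_support):
    """Frequent itemsets with global support counts; assumes min_support >= 1."""
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    result, k = {}, 1
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        while candidates:
            # Every worker counts the same candidates on its own partition.
            partials = pool.map(count_candidates, partitions, [candidates] * len(partitions))
            total = Counter()
            for counts in partials:
                total.update(counts)  # sum the partial support counts
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            k += 1
            candidates = next_candidates(set(frequent), k)
    return result
```

Only the small candidate counts cross between workers in each pass, not the transactions themselves, which is why this layout suits distributed-memory machines.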
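
For decision-tree classification, a common data-parallel step is choosing the split attribute: each worker builds a histogram of (attribute value, class label) counts over its own partition, the histograms are summed, and the attribute with the lowest weighted Gini impurity is selected. The sketch below assumes categorical attributes, multiway splits, and records stored as dicts with a `label` key; these choices and the function names are illustrative assumptions, not the formulation used in [8] or [9]. Threads stand in for processors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_histogram(part, attr):
    """Count (attribute value, class label) pairs in one partition."""
    return Counter((rec[attr], rec["label"]) for rec in part)


def gini(class_counts):
    """Gini impurity 1 - sum(p_c^2) of a class distribution."""
    n = sum(class_counts.values())
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values()) if n else 0.0


def weighted_gini(hist):
    """Size-weighted Gini impurity of a multiway split on one attribute."""
    by_value = {}
    for (value, label), c in hist.items():
        by_value.setdefault(value, Counter())[label] += c
    total = sum(hist.values())
    return sum(sum(cc.values()) / total * gini(cc) for cc in by_value.values())


def best_split(partitions, attributes):
    """Pick the attribute with the lowest weighted Gini, using merged histograms."""
    scores = {}
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for attr in attributes:
            merged = Counter()
            for hist in pool.map(local_histogram, partitions, [attr] * len(partitions)):
                merged.update(hist)  # global histogram = sum of partition histograms
            scores[attr] = weighted_gini(merged)
    return min(scores, key=scores.get), scores
```

Because the Gini score depends only on the merged counts, every worker can discard its raw records after building its histogram for the current node.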
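
For clustering, k-means parallelizes naturally and serves here as a familiar example: with the current centroids broadcast to every worker, each worker assigns its own points to the nearest centroid and returns per-cluster coordinate sums and counts, and the coordinator divides the merged sums by the merged counts to obtain the new centroids. This sketch assumes given starting centroids and a fixed number of iterations; the function names are hypothetical, and threads stand in for processors.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def partial_sums(part, centroids):
    """Per-cluster coordinate sums and point counts for one partition."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in part:
        j = nearest(p, centroids)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]
    return sums, counts


def parallel_kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of k-means iterations over partitioned points."""
    centroids = [list(c) for c in centroids]
    k, d = len(centroids), len(centroids[0])
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for _ in range(iterations):
            # Broadcast the current centroids; each worker summarizes its partition.
            results = list(pool.map(partial_sums, partitions, [centroids] * len(partitions)))
            new_centroids = []
            for j in range(k):
                n = sum(counts[j] for _, counts in results)
                if n == 0:
                    new_centroids.append(centroids[j])  # empty cluster keeps its centroid
                else:
                    new_centroids.append(
                        [sum(sums[j][i] for sums, _ in results) / n for i in range(d)]
                    )
            centroids = new_centroids
    return centroids
```

With partitions `[[(0, 0), (0, 1)], [(1, 0), (10, 10)], [(10, 11), (11, 10)]]` and starting centroids `[(0, 0), (10, 10)]`, the result is approximately `[[0.333, 0.333], [10.333, 10.333]]`. Each iteration exchanges only k sums and counts per worker, regardless of how many points a partition holds.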
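
Many statistical inference tasks start from summary statistics such as the mean and variance, and these can be computed in parallel in a single pass over the data. Each worker returns the count n, mean, and sum of squared deviations M2 of its partition, and two partial results a and b are merged with the pairwise update of Chan, Golub, and LeVeque:

  n = n_a + n_b,  delta = mean_b - mean_a,  mean = mean_a + delta * n_b / n,  M2 = M2_a + M2_b + delta^2 * n_a * n_b / n

The sketch below assumes numeric partitions and returns the sample variance M2 / (n - 1); the function names are hypothetical, and threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def local_stats(part):
    """(count, mean, M2) for one partition; M2 is the sum of squared deviations."""
    n = len(part)
    if n == 0:
        return 0, 0.0, 0.0
    mean = sum(part) / n
    return n, mean, sum((x - mean) ** 2 for x in part)


def merge_stats(a, b):
    """Combine two (count, mean, M2) triples with the pairwise update."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def parallel_mean_variance(partitions):
    """Mean and sample variance of all values; needs at least two values in total."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_stats, partitions))
    n, mean, m2 = reduce(merge_stats, partials)
    return mean, m2 / (n - 1)
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 split as `[[2, 4, 4], [4, 5], [5, 7, 9]]`, this gives a mean of 5.0 and a sample variance of 32/7. Merging (n, mean, M2) triples avoids the cancellation problems of summing raw squares across partitions.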
 
   
References  
[1] A. A. Freitas and S. H. Lavington, Mining Very Large Databases with Parallel Processing, Kluwer Academic Publishers, 1998.
[2] T. Shintani and M. Kitsuregawa, Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy, SIGMOD '98, Seattle, WA, 1998.
[3] X. Xu, J. Jäger, and H.-P. Kriegel, A Fast Parallel Clustering Algorithm for Large Spatial Databases, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[4] D. W. Cheung and Y. Xiao, Effect of Data Distribution in Parallel Mining of Associations, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[5] Y. Xiang and T. Chu, Parallel Learning of Belief Networks in Large and Difficult Domains, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[6] D. W. Cheung, V. T. Ng, A. W. Fu, and Y. Fu, Efficient Mining of Association Rules in Distributed Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, December 1996.
[7] D. W. Cheung and Y. Xiao, Effect of Data Distribution in Parallel Mining of Associations, Data Mining and Knowledge Discovery, Vol. 3, 1999.
[8] A. Srivastava, E.-H. Han, V. Kumar, and V. Singh, Parallel Formulations of Decision-Tree Classification Algorithms, Data Mining and Knowledge Discovery, Vol. 3, No. 3, pp. 237-261, September 1999.
[9] J. Shafer, R. Agrawal, and M. Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining, Proc. of the 22nd VLDB Conference, 1996.
[10] M. J. Zaki, C. Ho, and R. Agrawal, Parallel Classification for Data Mining on Shared-Memory Multiprocessors.
[11] R. D. Lawrence et al., A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems, Data Mining and Knowledge Discovery, Vol. 3, pp. 171-195, Kluwer Academic Publishers, Netherlands, 1999.
 
   

Copyright © 2003 Huaiguo Fu