
Parallel Data Mining

Parallel processing speeds up decision-support workloads such as data mining by dividing a complex query into multiple parts and assigning each part to a separate processor. Parallel data mining algorithms run on parallel systems such as symmetric multiprocessing (SMP) and massively parallel processing (MPP) machines. In an SMP system, the processors share a common memory. MPP systems, often called "shared nothing" or distributed-memory systems, give each processor its own memory. Such parallel systems are well suited to applications with lumpy data, which is heavily used for data mining queries.
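The divide-and-assign idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase amounts, the "amount over 100" query, and the function names are hypothetical, and a thread pool stands in for the separate processors. On a real SMP machine a process pool (for example ProcessPoolExecutor) would be the closer match, and on an MPP system each part would live in a different node's memory.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, n_parts):
    """Divide the records into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def partial_query(part):
    """Run the query on one part: count and total of amounts over 100."""
    matches = [amount for amount in part if amount > 100]
    return len(matches), sum(matches)


def parallel_query(records, n_workers=4):
    """Assign each part to its own worker, then combine the partial results."""
    parts = split_into_parts(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_query, parts))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


# Hypothetical purchase amounts, used only for illustration.
purchases = [20, 150, 99, 101, 300, 45, 100, 250]
print(parallel_query(purchases, n_workers=3))  # (4, 801)
```

The query works in this form because counts and sums can be computed per part and added together afterwards. Queries whose partial results cannot be merged so simply need a more careful combining step.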

By combining data mining algorithms with parallel methods, we can implement parallel data mining methods for:

Association rules mining
Classification and regression: e.g. decision trees, neural nets, etc.
Clustering
Text Mining
Web Mining
Data visualization
Bayesian approaches
Genetic programming
Statistical inference
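
As one illustration of the first item in the list, the sketch below counts the support of item pairs in parallel. It assumes a data-partitioned scheme in which every worker counts pairs in its own share of the transactions and the local counts are then merged. The basket data and function names are hypothetical, threads stand in for processors, and the code is not taken from any of the references below.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Count every 2-item combination within one partition of transactions."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(transactions, min_support, n_workers=2):
    """Return item pairs that occur in at least min_support transactions."""
    # Round-robin partitioning: worker i gets transactions i, i+n, i+2n, ...
    parts = [transactions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local_counts = list(pool.map(count_pairs, parts))
    # Merge the local counts into global support counts.
    global_counts = sum(local_counts, Counter())
    return {pair: n for pair, n in global_counts.items() if n >= min_support}


# Hypothetical market baskets, used only for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
print(frequent_pairs(baskets, min_support=2))
```

Only the small count tables are exchanged between workers, not the transactions themselves. That is why this style of partitioning suits distributed-memory machines.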

References

[1]	A. A. Freitas and S. H. Lavington, Mining Very Large Databases with Parallel Processing, Kluwer Academic Publishers, 1998.
[2]	T. Shintani and M. Kitsuregawa, Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy, SIGMOD 98, Seattle, WA, 1998.
[3]	X. Xu, J. Jager, and H.-P. Kriegel, A Fast Parallel Clustering Algorithm for Large Spatial Databases, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[4]	D. W. Cheung and Y. Xiao, Effect of Data Distribution in Parallel Mining of Associations, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[5]	Y. Xiang and T. Chu, Parallel Learning of Belief Networks in Large and Difficult Domains, Data Mining and Knowledge Discovery, Vol. 3, No. 3, September 1999.
[6]	D. W. Cheung, V. T. Ng, A. W. Fu, and Y. Fu, Efficient Mining of Association Rules in Distributed Databases, IEEE Trans. on Knowledge and Data Engineering, Vol. 8, Dec. 1996.
[7]	A. Srivastava, E.-H. Han, V. Kumar, and V. Singh, Parallel Formulations of Decision-Tree Classification Algorithms, Data Mining and Knowledge Discovery, Vol. 3, No. 3, pp. 237-261, September 1999.
[8]	J. Shafer, R. Agrawal, and M. Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining, Proc. of the 22nd VLDB Conference, 1996.
[9]	M. J. Zaki, C. Ho, and R. Agrawal, Parallel Classification for Data Mining on Shared-Memory Multiprocessors.
[10]	R. D. Lawrence, et al., A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems, Data Mining and Knowledge Discovery, Vol. 3, Kluwer Academic Publishers, Netherlands, pp. 171-195, 1999.

Links

M. J. Zaki
Parallel Datamining and Knowledge Discovery: QUAKEFINDER
London Parallel Applications Centre Database Archive
Parallel and Distributed Intelligent Systems Lab, Columbia University
The research in the Parallel and Distributed Intelligent Systems lab focuses on applications of high performance computing and communications in the area of knowledge-based systems and database inference.
Scalable Unix Commands for Parallel Computers (http://www.mcs.anl.gov/home/lusk/ptools/)
PVM at UTK (http://netlib2.cs.utk.edu/pvm)
MPI Standard site (http://www.mcs.anl.gov/mpi/index.html)
EPCC (http://www.epcc.ed.ac.uk/epcc-tec/documents.html), on MPI, HPF, Parallel Tools Consortium, and Performance Analysis Tools for Parallel Programs
IBM MPL (http://www.mhpcc.edu/training/workshop/html/mpl/MPLIntro.html)
Distributed Algorithms and/or Distributed Systems (http://www.cwi.nl/cwi/departments/AA1/distcom/distcom.html)
Sun Microsystems Lab Technical Reports (http://www.sun.com/smli/technical-reports/index.html)
Designing and Building Parallel Programs by Ian Foster (http://www.mcs.anl.gov/home/toonen/book/book.html)
Bibliographies on Parallel Processing (http://liinwww.ira.uka.de/bibliography/Parallel/index.html)
Parallel Tools Projects Around the World (http://www.llnl.gov/ptools/projects.world.html)
Parallelism Bibliographies (file://unix.hensa.ac.uk/parallel/bibliographies)
Reports, Journals, and Societies Related to HPCC (http://www.netlib.org/nse/pubs.html)
OpenMP
Special Issue on Scalable Parallel and Distributed Data Mining
PVM source code - downloadable from netlib
Introduction to programming with PVM
CPPvm (C Plus Plus PVM) - a C++ interface to PVM 3.4 written by Steffen Goerzig. It allows sending and receiving C++ objects as well as the use of distributed C++ objects.
jPVM - version 1.1.4, which works with PVM 3.4; a native methods interface to PVM for the Java(TM) platform
JPVM - a PVM-like class library implemented in and for use with Java