Association Rule Mining


Algorithms \| References

Let I={ i₁, i₂, ..., i_n } be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T belongs to I. Note that the quantities of items bought in a transaction are not considered, meaning that each item is a binary variable representing if an item was bought. Each transaction is associated with an identifier, called TID. Let X be a set of items. A transaction T is said to contain X if and only if X `belongs to T . An association rule is an implication of the form X => Y , where X belongs to I, Y belongs to I and X intersects Y = empty. The rule X => Y holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y . The rule X => Y has support s in the transaction set D if s% of transactions in D contain X unions Y . Confidence denotes the strength of implication and support indicates the frequencies of the occurring patterns in the rule. It is often desirable to pay attention to only those rules which may have reasonably large support. The task of mining association rules is essentially to discover strong association rules in large databases. The problem of mining association rules is decomposed into the following two steps:

1. Discover the large itemsets, i.e., the sets of itemsets that have transaction support above a pre-determined minimum support s.
2. Use the large itemsets to generate the association rules for the database.

Algorithms

APRIORI

A-CLOSE

CLOSET

CHARM

CLOSET+

References

[1]	R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. Proceedings of ACM SIGMOD, pages 207-216, May 1993.
[2]	R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th International Conference on Very Large Data Bases, pages 478-499, September 1994.
[3]	R. Srikant and R. Agrawal. Mining Generalized Association Rules. Proceedings of the 21th International Conference on Very Large Data Bases, pages 407-419, September 1995.
[4]	J.-S. Park, M.-S. Chen, and P. S. Yu. An Effective Hash Based Algorithm for Mining Association Rules. Proceedings of ACM SIGMOD, pages 175-186, May, 1995.
[5]	R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, Montreal, Canada, June 1996.
[6]	M.S.Chen, J. Han and P.S. Yu. Data Mining: An Overview from Database Perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6) 866-883, December 1996.
[7]	M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, August 1997.
[8]	M. J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency (Special Issue on Data Mining), December 1999.
[9]	J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation, Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000.
[10]	Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.
[11]	K. Gouda and M. J. Zaki. Efficiently Mining Maximal Frequent Itemsets. Proc. of the IEEE Int. Conference on Data Mining, San Jose, 2001.
[12]	Roberto Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conference, 1998.
[13]	R. Agarwal, C. Aggarwal and V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing, 2001.
[14]	Q. Zou, W. W. Chu and B. Lu. Smartminer: A New Data Mining Method to Discover Frequent Patterns. The 2002 IEEE International Conference on Data Mining, December 2002.
[15]	Q. Zou, W. Chu, D. Johnson and H. Chiu. Pattern Decomposition Algorithm for Data Mining of Frequent Patterns. Journal of Knowledge and Information System, 2002.
[16]	N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Intl. Conf. on Database Theory, January 1999.
[17]	Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discovering association rules. In KDD-94: AAAI Workshop on Knowledge Discovery in Databases, pages 181-192, Seattle, Washington, July 1994
[18]	D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In Intl. Conf. on Data Engineering, Apr. 2001.
[19]	R. J. Bayardo Jr. and R. Agrawal, "Mining the Most Interesting Rules" In Proc. of the 5th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, August 1999.
[20]	R.J. Bayardo Jr., R. Agrawal, D. Gunopulos: "Constraint-Based Rule Mining in Large, Dense Databases", Proc. of the 15th Int'l Conf. on Data Engineering, Sydney, Australia, March 1999.
[21]	N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal: Efficient mining of association rules using closed itemset lattices. Journal of Information systems, 24 (1999), 25-46.
[22]	T. Shintani, M. Kitsuregawa, Parallel Mining Algorithms for Generalized Association Rules with Classfication Hierarchy, SIGMOD 98, Seattle, WA, 1998.