
Citation: Qiang Gao, Feng-Li Zhang, and Rui-Jin Wang, "Mining Frequent Sets Using Fuzzy Multiple-Level Association Rules," Journal of Electronic Science and Technology, vol. 16, no. 2, pp. 145-152, 2018. DOI: 10.11989/JEST.1674-862X.60408013
Now we are facing an increasingly wide range of data types, from the original purely Boolean attribute data sets to today's mixed-type attribute data sets. There are many association rule algorithms based on Boolean data, as well as improved versions of them[1]-[3]. A typical association rule[4] has the form
{⟨Temperature, High⟩, ⟨Sunny⟩} ⇒ {⟨Windspeed, Breeze⟩}.
Researchers have applied the fuzzy theory, founded by Professor L. A. Zadeh, to data mining. Take the following records as an example: {(Pathe horror movie, 8.9, 1), (Imagenes horror movie, 8.7, 4)}. Each record has three parts: the first is the attribute's ID (its name), the second is the score given by people, and the third is the frequency. If we suppose that the profit of the Imagenes horror movie is much higher than that of the Pathe horror movie, it cannot be neglected that the two items deserve different weights even though both frequencies are low. These attributes should have their own weights, which are the key when association rules are used to find potential relationships. In the real world, we are usually interested in an approximate description of attributes rather than in recommending other attributes from the detailed description of some attributes.
In this paper, we propose an algorithm called fuzzy multiple-level association (FMA) rules. It handles non-Boolean attributes as follows: 1) A weight-based approach and the fuzzy theory are used to deal with the frequency of an attribute and the value of an attribute, respectively. 2) The concept hierarchy theory is used to encode the concept trees and divide them into different levels. 3) An improved Eclat algorithm, called CH_Eclat, is used as the basic model for mining frequent sets. CH_Eclat uses the non-frequent 2-itemsets to cut down a large number of useless candidate sets, and it uses a method based on lost support values, called Lost_Support, to reduce the number of support-counting operations. The experiments verify that frequent sets can thus be obtained faster.
In 1993, R. Agrawal et al. first proposed the concept of association rules[5]. They also proposed the famous Apriori algorithm[6] to count supports, and J.-W. Han et al.[7] put forward the Fp-growth algorithm to avoid generating a large number of candidate sets. These two methods are based on the horizontal data format. References [8], [9], and [10] used the Apriori algorithm to mine multi-level rules. M. J. Zaki[11] proposed the Eclat algorithm, which stores the database records in the vertical format; its performance is generally better than that of algorithms on horizontal-format data, and many papers have improved it[12]-[14]. Reference [15] improved the Apriori algorithm to mine real data and compared it with the Fp-tree algorithm, but its way of counting supports is not appropriate for non-Boolean data. Reference [16] surveyed association rule mining algorithms of recent years, such as the fuzzy Fp-tree.
References [8], [9], and [10] proposed several methods for multi-level association rule mining, and the fuzzy theory was applied to association rules in [8] and [10], which describe the attributes' frequencies by fuzzy sets. Reference [8] proposed a fuzzy multiple-level mining algorithm for items with different minimum supports. Reference [9] presented a partition technique for the multi-level association rule mining problem, which reduces the I/O cost and the central processing unit (CPU) runtime. Reference [10] also used different support values at each level as well as different membership functions for each item, adopting a top-down progressively deepening approach to derive the large itemsets. However, these algorithms consider only the frequencies of the attributes; they ignore other factors that may also affect the mining, such as the users' evaluations or quantitative features.
Some other researchers have also been concerned with quantitative rules and multi-level mining[17]-[19]. In [20], M. Kaya proposed the effective mining of fuzzy multi-cross-level weighted association rules; a fuzzy data cube was devised to handle the quantitative values of dimensional attributes, so that fuzzy association rules can be mined at different levels. Since the method is based on the traditional Apriori mining algorithm, it inevitably costs much CPU runtime and has difficulty dealing with massive sparse data.
The traditional Eclat algorithm uses the vertical format to find the frequent sets. It adopts a depth-first search strategy based on the concept lattice theory, divides the search space by the prefix-based equivalence relation, and uses cross-counting to count the supports. In this paper, instead of applying the basic Eclat algorithm, we propose an improved algorithm, CH_Eclat, which is more suitable for FMA.
Theorem 1.1. Any superset of a non-frequent itemset is also a non-frequent itemset.
Theorem 1.2. Any non-empty subset of a frequent itemset is also a frequent itemset.
1) The traditional Eclat algorithm uses a depth-first search strategy, while CH_Eclat uses a breadth-first search to connect attribute sets. We make full use of the Apriori property to prune candidate sets with the non-frequent 2-itemsets, which are stored in the vertical format to record the relationships of attributes. According to Theorem 1.1, the non-frequent 2-itemsets greatly cut down the generation of non-frequent k-itemsets (k > 2). For example, suppose a transaction database Z over six attributes contains the transactions
{{A,B},{A,D},{B,E},{C,F},{D,F}}
where a pair such as {A, B} turns out to be non-frequent; then every candidate containing {A, B}, e.g. {A, B, C}, can be discarded without counting its support. A sketch of this pruning check is given below.
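A minimal Java sketch of this pruning check follows. The method and variable names are hypothetical, and the listed pairs are only assumed, for illustration, to have been found non-frequent during the k = 2 pass.

```java
import java.util.*;

public class TwoItemPruning {
    // Returns true if the candidate contains no known non-frequent 2-itemset.
    static boolean survives(List<String> candidate, Set<Set<String>> nonFrequentPairs) {
        for (int i = 0; i < candidate.size(); i++) {
            for (int j = i + 1; j < candidate.size(); j++) {
                Set<String> pair = new HashSet<>(Arrays.asList(candidate.get(i), candidate.get(j)));
                if (nonFrequentPairs.contains(pair)) {
                    return false;   // prune: a 2-subset is already known to be non-frequent
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pairs assumed (for illustration only) to have been found non-frequent at k = 2
        Set<Set<String>> nonFrequent = new HashSet<>();
        nonFrequent.add(new HashSet<>(Arrays.asList("A", "B")));
        nonFrequent.add(new HashSet<>(Arrays.asList("C", "F")));
        System.out.println(survives(Arrays.asList("A", "B", "C"), nonFrequent)); // false: contains {A, B}
        System.out.println(survives(Arrays.asList("A", "C", "E"), nonFrequent)); // true: no known bad pair
    }
}
```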
2) Combined with [21], a missing threshold called Lost_Support, based on the cross-counting computation, is used. The cross-counting method reduces the number of comparisons. For example, for the attribute set {A, B}, pointer p1 points to A's first record ID and pointer p2 points to B's first record ID. First, the value *p1 is compared with *p2. If they are equal, the common ID is counted into the support and then p1 = p1 + 1 and p2 = p2 + 1. If they are not equal and *p1 > *p2, then p2 = p2 + 1 to prepare the next comparison (and symmetrically p1 = p1 + 1 when *p1 < *p2), where * denotes pointer dereference. In this method, the number of comparison operations is determined by the lengths of the two record-ID lists.
Thus Lost_Support can be fully exploited to avoid unnecessary comparisons. For the two attributes listed in Table 1, we get
Attribute | Record ID |
A | 1, 2, 4, 5, 7, 8, 9 |
B | 1, 3, 4, 6, 7, 10, 12 |
Lost_Support(A) = A.Length − min_supp,  Lost_Support(B) = B.Length − min_supp.
When a value of A cannot be matched to any value of B, the Lost_Support of A is cut down by 1, and likewise for B. If either Lost_Support drops below 0, the counting is stopped and the relationship {A, B} is deleted. The counting operation is shown in Fig. 2.
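To make the interplay of cross-counting and Lost_Support concrete, the following Java sketch (the method name crossCount is hypothetical) intersects the two record-ID lists of Table 1. With min_supp = 5 the pair {A, B} is abandoned before the lists are exhausted, while with min_supp = 3 the full support of 3 is returned.

```java
public class CrossCount {
    // Returns the intersection size, or -1 if the Lost_Support bound is violated.
    static int crossCount(int[] a, int[] b, int minSupp) {
        int lostA = a.length - minSupp;   // Lost_Support(A)
        int lostB = b.length - minSupp;   // Lost_Support(B)
        int i = 0, j = 0, support = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {           // matching record ID: contributes to the support
                support++; i++; j++;
            } else if (a[i] < b[j]) {     // a[i] can never match: A loses one chance
                if (--lostA < 0) return -1;
                i++;
            } else {                      // b[j] can never match: B loses one chance
                if (--lostB < 0) return -1;
                j++;
            }
        }
        return support;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 4, 5, 7, 8, 9};     // record IDs of A (Table 1)
        int[] b = {1, 3, 4, 6, 7, 10, 12};   // record IDs of B (Table 1)
        System.out.println(crossCount(a, b, 5)); // -1: {A, B} is abandoned early
        System.out.println(crossCount(a, b, 3)); // 3: record IDs 1, 4, and 7 match
    }
}
```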
The CH_Eclat algorithm is described in Table 2.
CH_Eclat algorithm |
Input: |
D is the set of the transactions; |
min_supp is the minimum support threshold. |
Output: |
Frequent_Sets F. |
Description: |
H=Convert_Vertical (D);//Convert to vertical data |
F1=first_Frequent(H, min_supp); |
for(k=2; Fk–1 ≠∅; k++) do begin |
if(k=2)//Generate the non-frequent 2-itemset set to help pruning |
norelation_Set=generate_norelation(F1); |
end if |
for each Item-Set t∈ F k–1 do begin |
if(t ∉ norelation_Set && (count_supp(t)>=min_supp)) |
//count_supp() use cross count method |
Ck=add_Frequent(t); |
end if |
end for |
Fk=Ck;//Get next frequent set |
end for |
Answer = ∪k Fk; //Get all frequent sets |
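The listing above is pseudocode. A condensed, self-contained Java sketch of the same flow is given below; the data, class name, and helper methods are assumptions made for illustration, and the support counting is simplified to a plain TID-list intersection rather than the Lost_Support variant shown earlier.

```java
import java.util.*;

public class CHEclatSketch {
    public static void main(String[] args) {
        int minSupp = 2;
        // Vertical database: item -> sorted TID-list (toy data)
        Map<String, int[]> vertical = new TreeMap<>();
        vertical.put("A", new int[]{1, 2, 3, 5});
        vertical.put("B", new int[]{1, 3, 4, 5});
        vertical.put("C", new int[]{2, 4});
        vertical.put("D", new int[]{1, 3, 5});

        // F1: frequent 1-itemsets
        List<List<String>> current = new ArrayList<>();
        for (Map.Entry<String, int[]> e : vertical.entrySet())
            if (e.getValue().length >= minSupp) current.add(List.of(e.getKey()));

        Set<Set<String>> nonFrequentPairs = new HashSet<>();
        for (int k = 2; !current.isEmpty(); k++) {
            List<List<String>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i++)
                for (int j = i + 1; j < current.size(); j++) {
                    List<String> cand = join(current.get(i), current.get(j));
                    if (cand == null || containsNonFrequentPair(cand, nonFrequentPairs)) continue;
                    int supp = intersectionSize(cand, vertical);
                    if (supp >= minSupp) next.add(cand);
                    else if (k == 2) nonFrequentPairs.add(new HashSet<>(cand)); // remember for pruning
                }
            for (List<String> f : next) System.out.println(f);   // frequent k-itemsets
            current = next;
        }
    }

    // Prefix join of two (k-1)-itemsets sharing all but the last item.
    static List<String> join(List<String> x, List<String> y) {
        int n = x.size();
        if (!x.subList(0, n - 1).equals(y.subList(0, n - 1))) return null;
        List<String> r = new ArrayList<>(x);
        r.add(y.get(n - 1));
        return r;
    }

    static boolean containsNonFrequentPair(List<String> c, Set<Set<String>> bad) {
        for (int i = 0; i < c.size(); i++)
            for (int j = i + 1; j < c.size(); j++)
                if (bad.contains(new HashSet<>(Arrays.asList(c.get(i), c.get(j))))) return true;
        return false;
    }

    // Support of an itemset = size of the intersection of its TID-lists.
    static int intersectionSize(List<String> items, Map<String, int[]> vertical) {
        Set<Integer> acc = null;
        for (String it : items) {
            Set<Integer> tids = new TreeSet<>();
            for (int t : vertical.get(it)) tids.add(t);
            if (acc == null) acc = tids; else acc.retainAll(tids);
        }
        return acc == null ? 0 : acc.size();
    }
}
```

In the full algorithm, intersectionSize would be replaced by the Lost_Support cross-counting sketched above.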
Definition 2.1. A database D is a set of transaction records of the form
TID_j = {id, (t_j1, f_j1, tr_j1), (t_j2, f_j2, tr_j2), ⋯, (t_jm, f_jm, tr_jm)}.
Each record has a unique tag named id.
Definition 2.2. Set fuzzy record about
The proposed FMA algorithm uses the hierarchical coding method to obtain the coding tree and uses fuzzy sets to deal with the frequencies and the values of attributes, from which the frequent sets are obtained. The concrete steps are as follows:
Step 1. Concept hierarchy
A concept hierarchy is usually expressed in the form of a hierarchical structure tree, in which each node represents a concept and the root node ANY is a special node. A given multi-dimensional attribute serves many functions, including constructing views, generating the concept hierarchy, explaining the partial and total orders of attributes by patterns, and defining the construction of the hierarchical tree. There are many ways to construct the tree; we can divide the concepts by the data set's own features and the hierarchical model.
First, we encode the concepts on the concept tree. The code value of a concept can be calculated by (1):
ω_Ω = 10·ω_(Ω−1) + i    (1)
where Ω is the hierarchical level, ω_(Ω−1) is the code of the node's ancestor at level Ω−1 (the root node ANY is coded 0), and i is the position of the node under its ancestor at the current level, incremented by one from left to right.
Fig. 3 shows a simple case.
After applying (1), the hierarchy-coded tree is obtained, as shown in Fig. 4.
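As an illustration of (1), the following small Java sketch assigns codes top-down; the class and method names, and the portion of the tree used, are taken from the film example given later and are otherwise assumptions. The root ANY is coded 0 and is never emitted.

```java
import java.util.*;

public class ConceptCoding {
    // Code of a child = 10 * parent's code + its 1-based position among its siblings.
    static void assignCodes(int parentCode, List<String> children, Map<String, Integer> codes) {
        for (int i = 1; i <= children.size(); i++) {
            codes.put(children.get(i - 1), 10 * parentCode + i);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        assignCodes(0, List.of("Movie", "Serial", "Documentary"), codes);                 // level 1: 1, 2, 3
        assignCodes(codes.get("Movie"), List.of("Horror movie", "Social movie"), codes);  // level 2: 11, 12
        assignCodes(codes.get("Horror movie"),
                    List.of("Pathe horror movie", "Imagenes horror movie"), codes);       // level 3: 111, 112
        codes.forEach((name, code) -> System.out.println(code + "  " + name));
    }
}
```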
Step 2. Convert the frequency of an attribute into a fuzzy set
First, define the membership functions of the attributes' frequencies; the membership functions are set according to the classification of the root node's sub-nodes. The frequency f_jp is mapped to the fuzzy regions
f_jp → {R_p1, R_p2, ⋯, R_ps} (s = 1, 2, ⋯, k),
forming the fuzzy set
μ_jp(R_p1)/R_p1 + μ_jp(R_p2)/R_p2 + ⋯ + μ_jp(R_ps)/R_ps.
Step 3. Get the representative fuzzy region for the frequency
Sum the memberships of every fuzzy region R_ps over all transactions to obtain the counts Count_ps, and take the region with the maximum count:
max_Count_jp = MAX{Count_p1, Count_p2, ⋯, Count_ps}.    (2)
In this way we get the representative fuzzy region, denoted fuzz_frep_max, for the frequency of each attribute.
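The following Java sketch illustrates Steps 2 and 3 under assumed triangular membership functions for Low, Middle, and High (the actual membership functions are application-dependent and are defined later for the film example). It fuzzifies a few purchase quantities, accumulates the per-region counts, and keeps the region with the maximum count, as in (2).

```java
import java.util.*;

public class FrequencyFuzzifier {
    // Assumed triangular membership functions over the purchase quantity (illustrative only).
    static double low(double q)    { return q <= 1 ? 1 : q >= 5 ? 0 : (5 - q) / 4; }
    static double middle(double q) { return q <= 1 || q >= 9 ? 0 : q <= 5 ? (q - 1) / 4 : (9 - q) / 4; }
    static double high(double q)   { return q <= 5 ? 0 : q >= 9 ? 1 : (q - 5) / 4; }

    public static void main(String[] args) {
        // Quantities of item 111 (Pathe horror movie) in D1, D2, D4, and D6 of Table 3
        double[] quantities = {1, 3, 2, 3};
        Map<String, Double> count = new LinkedHashMap<>();
        for (double q : quantities) {
            count.merge("Low", low(q), Double::sum);
            count.merge("Middle", middle(q), Double::sum);
            count.merge("High", high(q), Double::sum);
        }
        // Representative region fuzz_frep_max = region with the maximum count, as in (2).
        String best = Collections.max(count.entrySet(), Map.Entry.comparingByValue()).getKey();
        System.out.println(count + "  ->  representative region: " + best);
    }
}
```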
Step 4. Deal with the value of an attribute
Define the membership functions of the attributes' values, divide each value domain into fuzzy regions {C_p1, C_p2, ⋯, C_ph}, and calculate the memberships λ_jp(C_ph),
forming the fuzzy set
λ_jp(C_p1)/C_p1 + λ_jp(C_p2)/C_p2 + ⋯ + λ_jp(C_ph)/C_ph.
Delete the fuzzy sets that do not satisfy the given conditions (e.g., those whose memberships sum to zero).
Step 5. Add the weight coefficients
Calculate every fuzzy attribute's final support supp(fuzz_frep_max, C_ph) with the weight vector:
Weight = {w_1, w_2, ⋯, w_m},  supp(fuzz_frep_max, C_ph) = μ_jp(R_ps) × λ_jp(C_ph) × w_p.
Finally, we obtain the processed database that we need.
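For example (taking values from the film example developed later), the fuzzy attribute ⟨111, Low⟩ evaluated as Very Good in transaction D6 has the frequency membership μ = 0.6 (Table 5), the value membership λ = 0.4 (Table 8), and the weight w_111 = 1 (Table 4), so its final support contribution is 0.6 × 0.4 × 1 = 0.24, which is the D6 entry of 111.Low (V) in Tables 9 and 10.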
Step 6. Get the frequent itemsets by CH_Eclat.
Defining the minimum support as min_supp, we calculate every fuzzy attribute's support. Every fuzzy support value lies in [0, 1], which ensures the downward closure property. According to [22], we do not need to calculate the ancestor nodes' supports separately, because they can be obtained from their children. Finally, we get the frequent itemsets.
In this part, we use an example about films to explain the application of FMA. In Table 3, the first element of each item is the product that users buy, the second is the purchase quantity, and the third is the evaluation score. We divide films into three categories: movies, serials, and documentaries, and each category has its specific attributes. Each attribute therefore has its own membership function according to its frequency. Finally, deeper data mining can be carried out by our algorithm. The original database is shown in Table 3.
TID | Items |
D1 | (Pathe horror movie,1,8.9) (Imagenes horror movie,4,8.7) (Dmci story serial,4,8.0) (LS story serial,6,9.3)(Mia-max wild life documentary,7,5.6) |
D2 | (Pathe horror movie,3,8.0) (Imagenes horror movie,3,8.2) (Gaumont social movie,1,9.0) (Dmci comic serial,5,7.8) (LS comic serial,3,6.9) (Pathe scientific documentary,4,7.0) (Mia-max scientific documentary,4,8.0) |
D3 | (Dmci story serial,7,6.9) (Dmci comic serial,8,8.8) (LS wild life documentary,5,5.8) (Pathe scientific documentary,7,7.6) |
D4 | (Pathe horror movie,2,4.3) (Dmci story serial,5,5.6)(LS wild life documentary,5,8.8) |
D5 | (Dmci story serial,5,6.6)(LS comic serial,4,7.4) |
D6 | (Pathe horror movie,3,8.2) (Imagenes horror movie,10,8.6) |
Step 1. First, we set the relationship between each schema (the coded concept ⟨r, k⟩) and its weight W, as shown in Table 4.
<r, k> | 111 | 112 | 121 | 122 | 211 | 212 | 221 | 222 | 311 | 321 | 322 | 11 | 12 | 21 | 22 | 31 | 32 | 1 | 2 | 3 |
W | 1 | 0.6 | 0.5 | 0.8 | 0.6 | 0.5 | 0.8 | 0.5 | 0.6 | 0.6 | 0.8 | 1 | 0.8 | 0.7 | 0.8 | 0.8 | 0.9 | 1 | 0.8 | 0.9 |
From the original database we get the stratification model, and then the concept tree and the coded tree are obtained, as shown in Figs. 5 and 6, respectively.
Step 2. We predefine the membership functions according to the goods' quantities and the attribute classification, as shown in Figs. 7, 8, and 9.
Then we convert the frequencies into fuzzy attributes according to the predefined membership functions. Only part of the data is shown in Table 5.
TID | Items |
D1 | (1<111,Low>)(0.4<112,Low>+0.6<112,Middle>)(0.6<211,Low>+0.4<211,Middle>)(1<311,High>)(0.2<11,Low>+0.8<11,Middle>)(0.4<21,Middle>+0.6<21,High>)(1<31,High>)(0.2<1,Low>+0.8<1,Middle>)(0.4<2,Middle>+0.6<2,High>)(1<3,High>) |
… | … |
D6 | (0.6<111,Low>+0.4<111,Middle>)(0.2<112,Middle>+0.8<112,High>)(1<11,High>)(1<1,High>) |
Step 3. Go through all the transactions and find every item's maximum cardinal number. For example, for the item {(111, Low)} we count Count_111.Low = 1 + 0.6 + 0.8 + 0.6 = 2.4, which is the maximum cardinal number of item 111. After counting, we get the preprocessed fuzzy candidate 1-itemsets shown in Table 6.
Fuzzy attribute | Support |
111. Low | 2.4 |
112. Middle | 1.2 |
11. Middle | 2 |
121. Low | 1 |
12. Low | 1 |
1. Middle | 1.8 |
211. Middle | 2.6 |
212. Middle | 0.8 |
… | … |
3. High | 3.5 |
Step 4. We construct the fuzzy concept of every attribute from its score; thus we set the fuzzy concept set {Very Good, Common, Bad}, meaning that the film is very good, common, or disliked. The membership functions are shown in Fig. 10.
We convert the scores to the vertical format and use (3) to compute the higher-level attributes' scores:
score_i = (Σ_j score_(i+1)j) / ||j||    (3)
where i denotes the level of the attribute, j indexes its child nodes, and ||j|| is the number of child nodes with a recorded score. Applying (3) to the higher-level attributes gives Table 7.
Attribute | 1 | 2 | 3 | 4 | 5 | 6 |
PHM(111) | 8.9 | 8.0 | / | 4.3 | / | 8.2 |
IHM(112) | 8.7 | 8.2 | / | / | / | 8.6 |
HM(11) | 8.8 | 8.1 | / | 4.3 | / | 8.4 |
GSM(121) | / | 9.0 | / | / | / | / |
M(1) | 8.8 | 8.6 | / | 4.3 | / | 8.4 |
DCS(221) | / | 7.0 | 8.8 | / | / | / |
… | … | … | … | … | … | … |
D(3) | 5.6 | 7.5 | 6.7 | 8.8 | / | / |
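A minimal Java sketch of the aggregation in (3) follows; the class and method names are hypothetical. It averages the child scores present in one transaction and reproduces the HM(11) entries of Table 7 for D1 and D4.

```java
import java.util.*;

public class ScoreRollup {
    // Average of the child scores that are present (null = no score in this transaction).
    static Double parentScore(List<Double> childScores) {
        double sum = 0; int n = 0;
        for (Double s : childScores) {
            if (s != null) { sum += s; n++; }
        }
        return n == 0 ? null : sum / n;
    }

    public static void main(String[] args) {
        // D1: PHM(111) = 8.9, IHM(112) = 8.7  ->  HM(11) = 8.8
        System.out.printf("%.1f%n", parentScore(Arrays.asList(8.9, 8.7)));
        // D4: PHM(111) = 4.3, IHM(112) absent ->  HM(11) = 4.3
        System.out.printf("%.1f%n", parentScore(Arrays.asList(4.3, null)));
    }
}
```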
Step 5. The details of this step are as follows.
1) Convert Table 7 into the membership table according to Fig. 10 and exclude the items whose memberships sum to zero. We then get Table 8.
Attribute | Eval. | 1 | 2 | 3 | 4 | 5 | 6 |
PHM(111. Low) | V | 1 | 0 | / | 0 | / | 0.4 |
C | 0 | 1 | / | 1 | / | 0.6 | |
IHM(112. Middle) | V | 1 | 0.4 | / | / | / | 1 |
C | 0 | 0.6 | / | / | / | 0 | |
HM(11. Middle) | V | 1 | 0.2 | / | 0 | / | 0.8 |
C | 0 | 0.8 | / | 0 | / | 0.2 | |
B | 0 | / | / | 1 | / | 0 | |
GSM(121. Low) | V | / | 1 | / | / | / | / |
SM(12. Low) | V | / | 1 | / | / | / | / |
M(1. Middle) | V | 1 | 1 | / | 0 | / | 0.8 |
C | 0 | 0 | / | 0 | / | 0.2 | |
B | 0 | 0 | / | 1 | / | 0 | |
… | … | … | … | … | … | … | … |
D(3. High) | V | 1 | 0 | 0 | 1 | / | / |
C | 0 | 1 | 0.4 | 0 | / | / | |
B | 0 | 0 | 0.6 | 0 | / | / | |
2) Combining these with the fuzzy supports in Table 5, the supports of the quantitative values of the fuzzy attributes are obtained, as shown in Table 9.
Fuzzy attribute | Eval. | 1 | 2 | 3 | 4 | 5 | 6 |
111. Low | V | 1*1 | 0 | / | 0 | / | 0.4*0.6 |
C | 0 | 1*0.6 | / | 1*0.8 | / | 0.6*0.6 | |
112. Middle | V | 1*0.6 | 0.4*0.4 | / | / | / | 1*0.2 |
C | 0 | 0.6*0.4 | / | / | / | 0 | |
11. Middle | V | 1*0.8 | 0.2*1 | / | 0 | / | 0.8*0.4 |
C | 0 | 0.8*1 | / | 0 | / | 0.2*0.4 | |
B | 0 | / | / | 1*0.2 | / | 0 | |
121. Low | V | / | 1*1 | / | / | / | / |
12. Low | V | / | 1*1 | / | / | / | / |
… | … | … | … | … | … | … | … |
3. High | V | 1*1 | 0 | 0 | 1*0.5 | / | / |
C | 0 | 1*1 | 0.4*1 | 0 | / | / | |
B | 0 | 0 | 0.6*1 | 0 | / | / | |
3) According to Tables 9 and 4, we multiply by the weights and obtain the final support of every fuzzy attribute, as shown in Table 10.
Fuzzy attribute | Eval. | 1 | 2 | 3 | 4 | 5 | 6 | supp |
111. Low | V | 1 | 0 | / | 0 | / | 0.24 | 1.24 |
C | 0 | 0.6 | / | 0.8 | / | 0.36 | 1.76 | |
112. Middle | V | 0.6 | 0.16 | / | / | / | 0.2 | 0.58 |
C | 0 | 0.24 | / | / | / | 0 | 0.14 | |
11. Middle | V | 0.8 | 0.2 | / | 0 | / | 0.32 | 1.32 |
C | 0 | 0.8 | / | 0 | / | 0.08 | 0.88 | |
B | 0 | / | / | 0.2 | / | 0 | 0.2 | |
… | … | … | … | … | … | … | … | … |
321. Middle | C | / | 1 | / | / | / | / | 0.6 |
322. High | C | / | 1 | 1 | / | / | / | 1.6 |
32. High | C | / | 1 | 1 | / | / | / | 1.8 |
3. High | V | 1 | 0 | 0 | 0.5 | / | / | 1.35 |
C | 0 | 1 | 0.4 | 0 | / | / | 1.26 | |
B | 0 | 0 | 0.6 | 0 | / | / | 0.54 | |
Step 6. Setting min_supp = 0.8 and deleting the {Bad} evaluations of the fuzzy attributes, we get the frequent 1-itemsets shown in Table 11.
Fuzzy attribute | Eval. | 1 | 2 | 3 | 4 | 5 | 6 | supp |
111. Low | V | 1 | 0 | / | 0 | / | 0.24 | 1.24 |
C | 0 | 0.6 | / | 0.8 | / | 0.36 | 1.76 | |
11. Middle | V | 0.8 | 0.2 | / | 0 | / | 0.32 | 1.32 |
C | 0 | 0.8 | / | 0 | / | 0.08 | 0.88 | |
12. Low | V | / | 0.8 | / | / | / | / | 0.8 |
1. Middle | V | 0.8 | 0.8 | / | 0 | / | 0.8 | 2.4 |
211. Middle | C | 0.24 | / | 0.48 | 0 | 0.07 | / | 0.8 |
22. Middle | C | / | 0.64 | 0 | / | 0.48 | / | 1.12 |
2. Middle | C | 0 | 0.64 | 0 | 0 | 0.48 | / | 1.12 |
31. High | V | 0.8 | / | 0 | 0.4 | / | / | 1.2 |
322. High | C | / | 0.8 | 0.8 | / | / | / | 1.6 |
32. High | C | / | 0.9 | 0.9 | / | / | / | 1.8 |
3. High | V | 0.9 | 0 | 0 | 0.45 | / | / | 1.35 |
C | 0 | 0.9 | 0.36 | 0 | / | / | 1.26 | |
The 2-frequent itemset mining: To avoid redundancy, a child node should not be connected with its ancestor nodes; for example, node 111 should not be connected with 11 or 1. A new 2-itemset is generated from two eligible fuzzy attributes by taking, in each transaction, the minimum of their fuzzy supports. The final mining results are shown in Table 12, where an entry such as "111. Low, 31. High" with evaluation VV denotes the 2-itemset {⟨111, Low⟩, ⟨31, High⟩} with both items evaluated as Very Good.
Fuzzy 2-itemset | Eval. | 1 | 2 | 3 | 4 | 5 | 6 | supp |
111. Low, 31. High | VV | 0.8 | / | / | / | / | / | 0.8 |
111. Low, 3. High | VV | 0.9 | / | / | / | / | / | 0.9 |
12. Low, 322. High | VC | / | 0.8 | / | / | / | / | 0.8 |
12. Low, 32. High | VC | / | 0.8 | / | / | / | / | 0.8 |
12. Low, 3. High | VC | / | 0.8 | / | / | / | / | 0.8 |
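When joining fuzzy attributes into 2-itemsets, the ancestor test can be read directly off the hierarchical codes, since an ancestor's code is a prefix of its descendant's code. The following small sketch (hypothetical method names) shows this check under that assumption.

```java
public class AncestorCheck {
    // True if node a is an ancestor of node b in the coded concept tree.
    static boolean isAncestor(String a, String b) {
        return !a.equals(b) && b.startsWith(a);   // e.g. 11 is an ancestor of 111
    }

    // Two fuzzy attributes may be joined only if neither is an ancestor of the other.
    static boolean mayJoin(String codeA, String codeB) {
        return !isAncestor(codeA, codeB) && !isAncestor(codeB, codeA);
    }

    public static void main(String[] args) {
        System.out.println(mayJoin("111", "11"));  // false: 11 is an ancestor of 111
        System.out.println(mayJoin("111", "31"));  // true: different branches, kept in Table 12
    }
}
```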
In this paper, we propose the FMA algorithm, which is based on the CH_Eclat algorithm. We use CH_Eclat to count the supports in the database, and the performance of support counting strongly affects the efficiency of the overall data mining. We improved the original Eclat algorithm to suit our FMA algorithm, and we tested CH_Eclat on three standard data sets to measure its performance. The test environment is an AMD A8-7650K Radeon R7 CPU (10 compute cores, 4C+6G, 3.30 GHz) with 8 GB memory and the Windows 7 OS. CH_Eclat is implemented in the Java language and debugged on the Eclipse development platform. We set different supports and recorded the run times. The data sets used in the experiments are listed in Table 13.
Data set | Transactions (Z) | Items (N) | Average transaction length (T) |
T10I4D100K | 100 000 | 1 000 | 10 |
Retail | 88 162 | 16 470 | 10.3 |
Mushroom | 8 124 | 119 | 23 |
In Figs. 11, 12, and 13, the CH_Eclat algorithm is compared with the standard Eclat algorithm and with the algorithm proposed by Z.-Y. Xiong et al.[12], denoted h_Eclat. On the retail data set, the standard Eclat algorithm's run time overflows, so its curve is not plotted. The comparison shows that CH_Eclat has better performance, especially for large and sparse data sets. With the better support counting of CH_Eclat, the FMA algorithm can find association rules quickly.
We verified the validity of FMA on real data about Beijing logistics routes. The data contain 4799 logistics routes that use different logistics modes to send goods. We mine the frequent route lines, obtain potential information from people's comments, and then give scores. After processing the original data, the goods are sent to 32 regions, including several provinces, autonomous regions, and Hong Kong, using six types of logistics transportation tools, covering 362 municipal areas, i.e., 2453 counties in all. Because the county-level logistics route data are sparse, they are not conducive to analyzing the frequent routes. According to the regional features and FMA, we use the province-city-county order to construct the levels for multi-level mining. With the supports of the logistics routes among different areas, we can find the frequency levels of the different routes; the result is shown in Fig. 14. We find that the route from Beijing to Sichuan is the most frequent: a lot of goods are sent to Sichuan from Yanqing District, Beijing.
According to the different transportation tools used in logistics, we can also find the frequency levels of the different logistics transportation tools; the result is shown in Fig. 15. We find that a lot of goods are sent to Yunnan Province and Sichuan Province by Xingtie, and that the goods sent to Hebei Province by Xingtie receive good comments.
After obtaining the frequent rules, we compared the performance of the algorithm with M. Kaya's method[20], which is based on Apriori and is denoted B_Apriori in Fig. 16. The result shows that the proposed algorithm has higher efficiency.
In this paper, we proposed an algorithm for multi-level quantitative attributes that uses weights and combines them with the fuzzy theory. It deals with different weights and different frequencies. With this method, we improved the data mining and obtained potential rules. The algorithm is much closer to the real environment, so data mining can be performed under different conditions to obtain the target information. Our example explained the usage of the algorithm and generated the potential rules. The algorithm also fits mixed-type attributes, including Boolean attributes. Its good performance has been demonstrated on the Beijing logistics route data, where it quickly mines the frequent routes. In the future, we will explore further applications of the fuzzy theory and association rules, which will promote the development of association rule mining.
[1] R. Chang and Z.-Y. Liu, "An improved Apriori algorithm," in Proc. of 2011 IEEE Intl. Conf. on Electronics and Optoelectronics, 2011, pp. 476-478.
[2] Z.-X. Hong and F.-L. Bian, "An improved Apriori algorithm based on support count matrix," Geomatics and Information Science of Wuhan University, vol. 33, no. 12, pp. 1246-1249, 2008.
[3] J. Dongre, G. L. Prajapati, and S. V. Tokekar, "The role of Apriori algorithm for finding the association rules in data mining," in Proc. of IEEE Intl. Conf. on Issues and Challenges in Intelligent Computing Techniques, 2014, pp. 657-660.
[4] B.-Q. Hu, Fuzzy Theory Foundation, Wuhan University Press, Jun. 2010, ch. 1, pp. 1-10.
[5] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in Proc. of 1993 ACM SIGMOD Conf. on Management of Data, Washington, D.C., 1993, pp. 207-216.
[6] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," Journal of Computer Science and Technology, vol. 15, no. 6, pp. 619-624, 2000.
[7] J.-W. Han, J. Pei, and Y.-W. Yin, "Mining frequent patterns without candidate generation," in Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 2000, pp. 1-12.
[8] Y.-C. Lee, T.-P. Hong, and T.-C. Wang, "Multi-level fuzzy mining with multiple minimum supports," Expert Systems with Applications, vol. 34, no. 1, pp. 459-468, Jan. 2008.
[9] P. Gautam and K. R. Pardasani, "Efficient method for multiple-level association rules in large databases," Journal of Emerging Trends in Computing and Information Sciences, vol. 2, no. 12, Dec. 2011.
[10] A. M. N. Kousari, S. J. Mirabedini, and E. Ghasemkhani, "Improvement of mining fuzzy multiple-level association rules from quantitative data," Journal of Software Engineering and Applications, vol. 5, no. 3, pp. 190-199, Mar. 2012.
[11] M. J. Zaki, "Scalable algorithms for association mining," IEEE Trans. on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372-390, 2000.
[12] Z.-Y. Xiong, P.-E. Chen, and Y.-F. Zhang, "Improvement of Eclat algorithm for association rules based on hash Boolean matrix," Application Research of Computers, vol. 27, no. 4, pp. 1323-1325, 2010.
[13] Y.-F. Zhang, Z.-Y. Xiong, X.-F. Geng, and J.-M. Chen, "Analysis and improvement of Eclat algorithm," Computer Engineering, vol. 36, no. 23, pp. 28-30, 2010.
[14] X.-M. Yu and H. Wang, "Improvement of Eclat algorithm based on support in frequent item set mining," Journal of Computers, vol. 9, no. 9, pp. 2116-2123, 2014.
[15] T.-P. Han and D.-M. Wang, "An applied research based on improved association rules," in Proc. of the 5th Intl. Conf. on Information Science and Technology, 2015, pp. 594-598.
[16] S. K. Solanki and J. T. Patel, "A survey on association rule mining," in Proc. of Intl. Conf. on Advanced Computing and Communication Technologies, 2015, pp. 212-216.
[17] T. Yang and C. Li, "A study of fuzzy quantitative items based on weighted association rules mining," in Proc. of the 2nd Intl. Conf. on Intelligent Computing and Cognitive Informatics, 2015, pp. 42-46.
[18] T. Hashem, C. F. Ahmed, M. Samiullah, S. Akther, B. S. Jeong, and S. Jeon, "An efficient approach for mining cross-level closed itemsets and minimal association rules using closed itemset lattices," Expert Systems with Applications, vol. 41, no. 6, pp. 2914-2938, 2014.
[19] C.-J. Li and T.-Q. Yang, "Effective mining of fuzzy quantitative weighted association rules," in Proc. of Intl. Conf. on E-business and E-government, 2010, pp. 1418-1421.
[20] M. Kaya and R. Alhajj, "Effective mining of fuzzy multi-cross-level weighted association rules," in Proc. of Intl. Conf. on Foundations of Intelligent Systems, 2006, pp. 399-408.
[21] P.-E. Feng, Y. Liu, Q.-Y. Qiu, and L.-X. Li, "Strategies of efficiency improvement for Eclat algorithm," Journal of Zhejiang University, vol. 47, no. 2, pp. 223-230, Feb. 2013.
[22] J.-H. Cheng and P.-F. Shi, "Fast mining multiple-level association rules," Chinese Journal of Computers, vol. 21, no. 11, pp. 1037-1041, Nov. 1998.