Qiang Gao, Feng-Li Zhang, Rui-Jin Wang. Mining Frequent Sets Using Fuzzy Multiple-Level Association Rules[J]. Journal of Electronic Science and Technology, 2018, 16(2): 145-152. DOI: 10.11989/JEST.1674-862X.60408013

Mining Frequent Sets Using Fuzzy Multiple-Level Association Rules

doi: 10.11989/JEST.1674-862X.60408013
Funds: This work was supported by the Fundamental Research Funds for the Central Universities under Grants No. ZYGX2014J051 and No. ZYGX2014J066, Science and Technology Projects in Sichuan Province under Grants No. 2015JY0178, No. 2016FZ0002, No. 2014GZ0109, No. 2015KZ002 and No. 2015JY0030, China Postdoctoral Science Foundation under Grant No. 2015M572464
More Information
  • Author Bio:

    Qiang Gao was born in Anhui Province in 1993. He received the B.S. degree from Anhui Polytechnic University, Wuhu in 2014. He is currently pursuing the Ph.D. degree with the School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu. His research interests include big data, quantum computation, and quantum storage

    Feng-Li Zhang was born in Henan Province in 1963. She received the B.S., M.S., and Ph.D. degrees in computer science from University of Electronic Science and Technology of China in 1983, 1986, and 2007, respectively. She is currently a professor with University of Electronic Science and Technology of China. Her research interests include mobile database, security of network, and big data

    Rui-Jin Wang was born in Xinjiang Uygur Autonomous Region in 1980. He received the M.S. and Ph.D. degrees from University of Electronic Science and Technology of China in 2009 and 2013, respectively. He is currently a lecturer with University of Electronic Science and Technology of China. His research interests include quantum computation, security of network, and big data.

  • Authors’ information: Q. Gao, F.-L. Zhang (corresponding author), and R.-J. Wang are with the School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054 (e-mail: qianggao@std.uestc.edu.cn; fzhang@uestc.edu.cn; ruijinwang@uestc.edu.cn)
  • Received Date: 2016-04-07
  • Rev Recd Date: 2016-11-07
  • Available Online: 2019-12-25
  • Publish Date: 2018-07-04
  • Abstract: At present, most association rule algorithms are based on Boolean attributes and single-level association rule mining. Real-world data, however, comes in many types, and multi-level and quantitative attributes are receiving more and more attention. The most important step in mining such rules is to mine frequent sets. In this paper, we propose an algorithm called fuzzy multiple-level association (FMA) rules to mine frequent sets. It is based on an improved Eclat algorithm, unlike many previously proposed algorithms that rely on the Apriori algorithm. We analyze the frequent sets of quantitative data by using fuzzy theory, dividing the concept hierarchy, and softening the boundaries of attribute values and frequencies. We describe the proposed method using vertical-format data and the improved Eclat algorithm, and we apply the algorithm to data on Beijing logistics routes. Experiments show that the algorithm performs well, with good effectiveness and high efficiency.
  • Today we face an increasingly wide range of data types, from the original purely Boolean-attribute data sets to current mixed-attribute data sets. Many association rule algorithms are based on Boolean data, along with improved variants[1]-[3]. For association rules[4], {(A, parameter1), (B, parameter1)} ⇒ {(C, parameter1)} denotes that attributes A and B together recommend C, where each parenthesized item may be a quantitative attribute carrying parameters such as a frequency. So in {(Sunny), (Temperature, 25 °C), (Wind speed, 12 km/h)}, although the attribute Sunny is Boolean, temperature and wind speed are quantitative attributes with domains such as {Low, Comfortable, High}. The values of these quantitative attributes influence the result of data mining:

    {(Temperature, High), (Sunny)} ⇒ {(Wind speed, Breeze)}.

    Researchers have applied fuzzy theory, founded by Professor L. A. Zadeh, to data mining. Take the following records as an example: {(Pathe horror movie, 8.9, 1), (Imagenes horror movie, 8.7, 4)}. Each record has three parts: The first is the attribute's ID, the second is the score given by users, and the third is the frequency. If we suppose that the profit of the Imagenes horror movie is much higher than that of the Pathe horror movie, we cannot neglect the fact that the two attributes carry different weights even when their frequencies are low. These attributes should have their own weights, which are the key for association rule methods to find potential relationships. In the real world, we are usually interested in approximate descriptions of attributes rather than in exact attribute values when recommending other attributes.

    In this paper, we propose an algorithm called fuzzy multiple-level association (FMA) rules. The algorithm handles non-Boolean attributes as follows: 1) We use a weight-based approach and apply fuzzy theory to the frequency and the value of each attribute, respectively. 2) We use concept hierarchy theory to encode concept trees and divide them into levels. 3) We use an improved Eclat algorithm, called CH_Eclat, as the basic model for mining frequent sets. CH_Eclat uses the non-frequent 2-itemset to prune a large number of useless candidate sets, and it uses a method based on lost support values, called Lost_Support, to cut down the number of support-counting operations. Experiments verify that we can therefore obtain frequent sets faster.

    In 1993, R. Agrawal et al. first proposed the concept of association rules[5]. R. Agrawal et al. proposed the famous Apriori algorithm[6] to count supports, and J.-W. Han et al.[7] put forward the FP-growth algorithm to avoid generating a large number of candidate sets. These two methods are based on the horizontal data format. References [8] to [10] used the Apriori algorithm to mine multi-level rules. M. J. Zaki[11] proposed the Eclat algorithm, which stores the database's records in the vertical format; its performance is generally better than that of horizontal-format algorithms, and many papers have improved it[12]-[14]. Reference [15] improved the Apriori algorithm to mine real data, comparing it with the FP-tree algorithm, but its way of counting supports is not appropriate for non-Boolean data. Reference [16] surveyed recent association rule mining algorithms such as the fuzzy FP-tree.

    References [8] to [10] proposed several methods for multi-level association rule mining, and fuzzy theory is applied to association rules in [8] and [10], which describe attribute frequencies by fuzzy sets. Reference [8] proposed a fuzzy multiple-level mining algorithm for items with different minimum supports. Reference [9] presented a partition technique for the multi-level association rule mining problem, which reduces the I/O cost and central processing unit (CPU) runtime. Reference [10] also used different support values at each level as well as different membership functions for each item, adopting a top-down, progressively deepening approach to derive large itemsets. But these algorithms only consider the frequencies of the attributes; they ignore other factors that may affect the mining results, such as users' evaluations or quantitative features.

    Other researchers have also studied quantitative rules and multi-level mining[17]-[19]. In [20], M. Kaya proposed effective mining of fuzzy multi-cross-level weighted association rules, introducing a fuzzy data cube that facilitates handling quantitative values of dimensional attributes, so that fuzzy association rules can be mined at different levels. Since it is based on the traditional Apriori mining algorithm, it inevitably costs much CPU runtime and has difficulty handling massive sparse data.

    The traditional Eclat algorithm uses the vertical format to find frequent sets. It follows a depth-first search strategy, is based on concept lattice theory, divides the search space by prefix-equivalence relationships, and counts supports by cross-counting. In this paper, instead of applying the basic Eclat algorithm, we propose an improved algorithm, CH_Eclat, which is more suitable for FMA.

    Theorem 1.1. Any superset of a non-frequent itemset is certainly non-frequent.

    Theorem 1.2. Any non-empty subset of a frequent itemset is also frequent.

    1) The traditional Eclat algorithm uses a depth-first search strategy, while CH_Eclat uses breadth-first search to join attribute sets. We fully exploit the Apriori property to prune candidate sets with the non-frequent 2-itemset, which stores the attribute relationships in the vertical format. According to Theorem 1.1, the non-frequent 2-itemset greatly cuts down the generation of non-frequent k-itemsets (k > 2). For example, suppose a transaction database Z has six attributes {A, B, C, D, E, F}; by mining frequent 2-itemsets, we simultaneously obtain a non-frequent 2-itemset such as

    {{A,B},{A,D},{B,E},{C,F},{D,F}}

    where each {} is an itemset from database Z. We store this set as shown in Fig. 1. We can then judge that {A, B, C} is not frequent from this non-frequent set alone, because {A, B} is non-frequent.

    Figure  1.  Non-frequent set.

    2) Combined with [21], we use a missing threshold called Lost_Support based on cross-counting. The cross-counting method reduces the number of comparisons. For the attribute set {A, B}, pointer p1 points to A's first TID and pointer p2 points to B's first TID, where * denotes pointer dereference. First, we compare the value of (*p1) with the value of (*p2). If they are equal, then p1 = p1 + 1 and p2 = p2 + 1. If they are not equal and (*p1) > (*p2), then p2 = p2 + 1, preparing the next comparison. With this method, the number of matching operations is bounded by Min{A.Length, B.Length}.

    So we can make full use of Lost_Support to reduce unnecessary comparisons. For Table 1, we get

    Table  1.  Example of vertical-style data
      Attribute    Record IDs
      A            1, 2, 4, 5, 7, 8, 9
      B            1, 3, 4, 6, 7, 10, 12
    Lost_Support(A) = A.Length - min_supp
    Lost_Support(B) = B.Length - min_supp.

    When a TID of A finds no match in B, the Lost_Support of A is cut down by 1, and likewise for B. If either Lost_Support drops below 0, we break off the counting and delete the relationship {A, B}. The counting operation is shown in Fig. 2.

    Figure  2.  Cross-counting method of Lost_Support.
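
    The following is a minimal Python sketch of this cross-counting with Lost_Support pruning, assuming sorted TID lists; the function name and the sample call (Table 1's lists with a hypothetical min_supp = 3) are ours, not from the paper.

      def intersect_with_lost_support(tids_a, tids_b, min_supp):
          """Intersect two sorted TID lists, abandoning the pair early once
          either attribute has lost more TIDs than it can afford."""
          lost_a = len(tids_a) - min_supp   # Lost_Support(A) = A.Length - min_supp
          lost_b = len(tids_b) - min_supp   # Lost_Support(B) = B.Length - min_supp
          i = j = 0
          common = []
          while i < len(tids_a) and j < len(tids_b):
              if tids_a[i] == tids_b[j]:
                  common.append(tids_a[i])
                  i += 1
                  j += 1
              elif tids_a[i] < tids_b[j]:
                  i += 1
                  lost_a -= 1               # A's TID found no match: one support lost
              else:
                  j += 1
                  lost_b -= 1               # B's TID found no match
              if lost_a < 0 or lost_b < 0:
                  return None               # {A, B} can no longer reach min_supp: prune
          return common

      # Table 1's lists with an assumed min_supp = 3:
      print(intersect_with_lost_support([1, 2, 4, 5, 7, 8, 9],
                                        [1, 3, 4, 6, 7, 10, 12], 3))  # [1, 4, 7]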

    The CH_Eclat algorithm is described in Table 2.

    Table  2.  CH_Eclat algorithm
    Input:
      D: the set of transactions;
      min_supp: the minimum support threshold.
    Output:
      Frequent_Sets F.
    Description:
      H = Convert_Vertical(D);  // convert to the vertical data format
      F1 = first_Frequent(H, min_supp);
      for (k = 2; F(k-1) ≠ ∅; k++) do begin
        if (k = 2)  // generate the non-frequent 2-itemset to help pruning
          norelation_Set = generate_norelation(F1);
        end if
        for each Item-Set t ∈ F(k-1) do begin
          if (t ∉ norelation_Set && count_supp(t) >= min_supp)
            // count_supp() uses the cross-counting method
            Ck = add_Frequent(t);
          end if
        end for
        Fk = Ck;  // get the next frequent set
      end for
      Answer = ∪k Fk  // all frequent sets
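
    Below is a self-contained Python sketch of the flow in Table 2: breadth-first candidate generation over vertical TID sets, with candidates pruned by the stored non-frequent 2-itemset (Theorem 1.1). It is a simplified illustration, not the paper's implementation: it uses plain set intersection in place of the Lost_Support cross-counting, and all names are ours.

      from itertools import combinations

      def ch_eclat(transactions, min_supp):
          """Breadth-first Eclat over vertical TID sets, pruning candidates
          that contain a known non-frequent 2-subset."""
          # Convert to the vertical format: item -> set of transaction ids
          vertical = {}
          for tid, items in enumerate(transactions):
              for item in items:
                  vertical.setdefault(item, set()).add(tid)

          frequent = {frozenset([i]): tids for i, tids in vertical.items()
                      if len(tids) >= min_supp}
          all_frequent = dict(frequent)
          non_frequent_pairs = set()        # the non-frequent 2-itemset store

          k = 2
          while frequent:
              candidates = {}
              for a, b in combinations(frequent, 2):
                  c = a | b
                  if len(c) != k:
                      continue
                  # Prune: any non-frequent 2-subset kills the candidate
                  if any(frozenset(p) in non_frequent_pairs
                         for p in combinations(c, 2)):
                      continue
                  tids = frequent[a] & frequent[b]   # count support by intersection
                  if len(tids) >= min_supp:
                      candidates[c] = tids
                  elif k == 2:
                      non_frequent_pairs.add(c)
              all_frequent.update(candidates)
              frequent = candidates
              k += 1
          return {tuple(sorted(s)): len(t) for s, t in all_frequent.items()}

      # Toy run: {A, B} is non-frequent, so {A, B, C} is never counted
      db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "C"}]
      print(ch_eclat(db, 2))
      # {('A',): 3, ('B',): 2, ('C',): 3, ('A', 'C'): 2, ('B', 'C'): 2}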

    Definition 2.1. A database Z = {TID1, TID2, ···, TIDn}T consists of n records TIDj (j = 1, 2, ···, n), where {·}T means the database is in the vertical format. Each record TIDj of database Z can be defined as

    TIDj = {id, (tj1, fj1, trj1), (tj2, fj2, trj2), ···, (tjm, fjm, trjm)}

    Each record has a unique tag named id. fjp is attribute tjp's frequency in record TIDj, and trjp is the value of attribute tjp, p = 1, 2, ···, m, where m is the number of TIDj's attributes.

    Definition 2.2. Set the fuzzy record for TIDj (j = 1, 2, ···, n). We derive k abstract concepts from the frequency fjp, corresponding to Rps (s = 1, 2, ···, k), so fjp → {Rp1, Rp2, ···, Rps}, and we call μjp(Rps) the frequency membership function of attribute tjp on TIDj. We then derive q abstract concepts from trjp, corresponding to Cph (h = 1, 2, ···, q), so trjp → {Cp1, Cp2, ···, Cph}, and we call λjp(Cph) the value membership function of attribute tjp on TIDj.
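
    For concreteness, a record in this format can be rendered as the following minimal Python structure; the class and field names are ours, and the sample values come from the first transaction of Table 3 below.

      from dataclasses import dataclass
      from typing import List

      @dataclass
      class ItemEntry:
          """One (t_jp, f_jp, tr_jp) triple from Definition 2.1."""
          attribute: str     # t_jp, e.g. an item name
          frequency: float   # f_jp, e.g. the purchase quantity
          value: float       # tr_jp, e.g. the evaluation score

      @dataclass
      class Record:
          """One record TID_j = {id, (t_j1, f_j1, tr_j1), ..., (t_jm, f_jm, tr_jm)}."""
          id: str
          items: List[ItemEntry]

      # Part of transaction D1 from Table 3:
      d1 = Record("D1", [ItemEntry("Pathe horror movie", 1, 8.9),
                         ItemEntry("Imagenes horror movie", 4, 8.7)])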

    The proposed FMA algorithm uses the hierarchical coding method to obtain the coding tree and uses fuzzy sets to deal with attribute frequencies and attribute values, from which we can get the frequent sets. The concrete steps are as follows:

    Step 1. Concept hierarchy

    Concepts are usually expressed as a hierarchical tree in which each node represents a concept and the root node ANY is a special node. A given multi-dimensional attribute supports many operations, including constructing views, generating the concept hierarchy, explaining the partial and total order of attributes by patterns, and defining the construction of the hierarchical tree. There are many ways to construct the tree; we can divide the concepts by the data set's own features and hierarchical model.

    First, we encode the concepts of the concept tree. We can use (1) to calculate a concept's code value:

    ω = 10(Ω - 1) + i    (1)

    where Ω is the hierarchical level and i is the position of the node within the current level relative to its ancestor node; the attribute code is incremented by one from left to right, and the codes encode the predefined taxonomy.

    Fig. 3 shows a simple case.

    Figure  3.  Hierarchy original tree.

    After applying (1), the hierarchy coded tree is obtained, as shown in Fig. 4.

    Figure  4.  Hierarchy coded tree.
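
    The extracted form of (1) is ambiguous, so the sketch below assumes the coding scheme that matches the codes actually appearing in Fig. 4 and Table 4: a child's code appends its 1-based sibling position i to its parent's code (code = 10 × parent + i). The tree layout, dictionary shape, and function name are ours.

      def encode_tree(node, parent_code=0):
          """Assign hierarchical codes: child i of a node with code w gets
          10*w + i, so the first child of node 11 becomes 111 (cf. Fig. 4)."""
          codes = {}
          for i, child in enumerate(node.get("children", []), start=1):
              code = parent_code * 10 + i
              codes[child["name"]] = code
              codes.update(encode_tree(child, code))
          return codes

      # A hypothetical taxonomy shaped like the film example (Fig. 5):
      taxonomy = {"name": "ANY", "children": [
          {"name": "Movie", "children": [
              {"name": "Horror movie", "children": [
                  {"name": "Pathe horror movie"},
                  {"name": "Imagenes horror movie"}]},
              {"name": "Social movie"}]},
          {"name": "Serial"},
          {"name": "Documentary"}]}
      print(encode_tree(taxonomy))
      # {'Movie': 1, 'Horror movie': 11, 'Pathe horror movie': 111,
      #  'Imagenes horror movie': 112, 'Social movie': 12,
      #  'Serial': 2, 'Documentary': 3}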

    Step 2. Convert the frequency of attribute to fuzzy set

    First, define the membership functions of the attributes' frequencies; the membership functions are set according to the classification of the root node's sub-nodes. The corresponding frequency fuzzy set of fjp is obtained as

    fjp → {Rp1, Rp2, ···, Rps} (s = 1, 2, ···, k),

    in the form

    μjp(Rp1)/Rp1 + μjp(Rp2)/Rp2 + ··· + μjp(Rps)/Rps.

    Step 3. Get representative fuzzy domain for frequency

    Obtain the statistics of each item's fuzzy set ⟨fjp, Rps⟩: Countps = Σj μjp(Rps). For every item with s fuzzy regions, find the maximum cardinal number and use it to represent the item's fuzzy region. Use (2) to calculate the maximum:

    max_Countjp = MAX{Countp1, Countp2, ···, Countps}.    (2)

    We take the region fuzz_frepmax with the maximum cardinal number, update the initial attribute, and convert it to a fuzzy attribute. Take the attribute 111 with fuzzy domain {Low, Middle, High} as an example: If High has the maximum cardinal number, we choose High as the representative, that is, the fuzzy attribute becomes 111.High. In this way, we get the pretreated candidate 1-itemset.
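
    As a rough illustration of Steps 2 and 3, the sketch below uses an assumed piecewise-linear membership function over {Low, Middle, High} (the paper's real functions are item-specific, Figs. 7 to 9) and picks the representative region with the maximum cardinal number, per (2). All names and thresholds here are ours.

      def low_middle_high(x, lo=2.0, hi=6.0):
          """Assumed memberships for {Low, Middle, High}; the paper's actual
          functions differ per item type (Figs. 7-9)."""
          if x <= lo:
              return {"Low": 1.0, "Middle": 0.0, "High": 0.0}
          if x >= hi:
              return {"Low": 0.0, "Middle": 0.0, "High": 1.0}
          m = (x - lo) / (hi - lo)          # rises from 0 to 1 across [lo, hi]
          return {"Low": max(0.0, 1 - 2 * m),
                  "Middle": 1 - abs(2 * m - 1),
                  "High": max(0.0, 2 * m - 1)}

      def representative_region(frequencies):
          """Step 3: sum memberships over all transactions (Count_ps) and
          return the region with the maximum cardinal number, as in (2)."""
          counts = {"Low": 0.0, "Middle": 0.0, "High": 0.0}
          for f in frequencies:
              for region, mu in low_middle_high(f).items():
                  counts[region] += mu
          return max(counts, key=counts.get), counts

      # Frequencies of one attribute across the transactions it appears in:
      print(representative_region([1, 3, 2, 3]))
      # ('Low', {'Low': 3.0, 'Middle': 1.0, 'High': 0.0})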

    Step 4. Dealing with value of attribute

    Define the membership functions of the attributes' values, divide each value domain into q regions, and calculate the memberships λjp(Cph). We get the fuzzy set of trjp

    trjp → {Cp1, Cp2, ···, Cph} (h = 1, 2, ···, q)

    in the form

    λjp(Cp1)/Cp1 + λjp(Cp2)/Cp2 + ··· + λjp(Cph)/Cph.

    Delete the fuzzy sets that do not satisfy the given conditions, and tjp is divided into regional attributes.

    Step 5. Add the weight coefficients

    Calculate every fuzzy attribute's final support supp(fuzz_frepmax, Cph) using the weight coefficient table. The weights are judged from expert opinion or other sources such as market rules for sales:

    Weight = {w1, w2, ···, wm}
    supp(fuzz_frepmax, Cph) = μjp(Rps) × λjp(Cph) × wp.

    Finally, we get the preprocessed database that we need.
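
    A minimal sketch of the Step 5 computation, assuming the per-transaction products μ × λ are summed before scaling by the weight (algebraically the same as weighting each product). The (μ, λ) pairs below are read from Table 9's row for attribute 111.Low with evaluation Common, with weight w(111) = 1 from Table 4, and reproduce the 1.76 that appears in Table 10.

      def weighted_support(mu_lambda_pairs, weight):
          """supp(fuzz_frep_max, C_ph) per Step 5: sum of mu * lambda over the
          transactions containing the fuzzy attribute, scaled by its weight."""
          return weight * sum(mu * lam for mu, lam in mu_lambda_pairs)

      # 111.Low / Common over transactions D2, D4, D6 (Table 9), w(111) = 1:
      print(weighted_support([(0.6, 1.0), (0.8, 1.0), (0.6, 0.6)], 1.0))  # 1.76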

    Step 6. Getting frequent itemsets by CH_Eclat.

    Defining the minimum support as min_supp, we calculate every fuzzy attribute's support. Every fuzzy membership value lies in [0, 1], which ensures the downward closure property. According to [22], we do not need to calculate the ancestor nodes' supports, because we can derive them from their children. Finally, we get the frequent itemsets.

    In this part, we show an example with films to explain the application of FMA. In Table 3, the first element of each item is the product a user buys, the second is the purchase quantity, and the third is its evaluation score. We divide films into three categories: movies, serials, and documentaries, and each type has its specific attributes. So each attribute has its own membership function according to its frequency. Finally, we can mine the data deeply with our algorithm. The original database is shown in Table 3.

    Table  3.  Original database
      TID  Items
      D1   (Pathe horror movie, 1, 8.9) (Imagenes horror movie, 4, 8.7) (Dmci story serial, 4, 8.0) (LS story serial, 6, 9.3) (Mia-max wild life documentary, 7, 5.6)
      D2   (Pathe horror movie, 3, 8.0) (Imagenes horror movie, 3, 8.2) (Gaumont social movie, 1, 9.0) (Dmci comic serial, 5, 7.8) (LS comic serial, 3, 6.9) (Pathe scientific documentary, 4, 7.0) (Mia-max scientific documentary, 4, 8.0)
      D3   (Dmci story serial, 7, 6.9) (Dmci comic serial, 8, 8.8) (LS wild life documentary, 5, 5.8) (Pathe scientific documentary, 7, 7.6)
      D4   (Pathe horror movie, 2, 4.3) (Dmci story serial, 5, 5.6) (LS wild life documentary, 5, 8.8)
      D5   (Dmci story serial, 5, 6.6) (LS comic serial, 4, 7.4)
      D6   (Pathe horror movie, 3, 8.2) (Imagenes horror movie, 10, 8.6)

    Step 1. First, we set the relationship between the schema ⟨Ω, i⟩ and the items, and then calculate the codes according to (1). In this original database, the weight of each item attribute corresponds to its price and profit, and we normalize the weights to ensure the downward closure property. The weighting factors are shown in Table 4.

    Table  4.  Weighting coefficient table
      Code  111  112  121  122  211  212  221  222  311  321  322  11  12   21   22   31   32   1  2    3
      W     1    0.6  0.5  0.8  0.6  0.5  0.8  0.5  0.6  0.6  0.8  1   0.8  0.7  0.8  0.8  0.9  1  0.8  0.9

    We derive the stratification model from the original database; the concept tree and the coded tree are then obtained as shown in Figs. 5 and 6, respectively.

    Figure  5.  Hierarchy concept tree.
    Figure  6.  Hierarchy coded tree.

    Step 2. We predefine the membership functions according to the goods' quantities and attribute classifications, as shown in Figs. 7, 8, and 9.

    Figure  7.  Membership functions for number of serials.
    Figure  8.  Membership functions for number of movies.
    Figure  9.  Membership function for number of documentary.

    Then we convert the frequencies to fuzzy attributes according to the predefined membership functions. We show only part of the data in Table 5.

    Table  5.  Fuzzy attribute table of frequency
      TID  Items
      D1   (1<111, Low>) (0.4<112, Low> + 0.6<112, Middle>) (0.6<211, Low> + 0.4<211, Middle>) (1<311, High>) (0.2<11, Low> + 0.8<11, Middle>) (0.4<21, Middle> + 0.6<21, High>) (1<31, High>) (0.2<1, Low> + 0.8<1, Middle>) (0.4<2, Middle> + 0.6<2, High>) (1<3, High>)
      D6   (0.6<111, Low> + 0.4<111, Middle>) (0.2<112, Middle> + 0.8<112, High>) (1<11, High>) (1<1, High>)

    Step 3. Go through all of the transactions and find every item's maximum cardinal number. For example, for attribute 111 we count the cardinality of {(111, Low)} over the transactions: Count111.Low = 1 + 0.6 + 0.8 = 2.4, which is the maximum cardinal number. After counting, we get the preprocessed fuzzy candidate 1-itemset in Table 6.

    Table  6.  Fuzzy candidate 1-itemset
      Fuzzy attribute  Support
      111. Low         2.4
      112. Middle      1.2
      11. Middle       2
      121. Low         1
      12. Low          1
      1. Middle        1.8
      211. Middle      2.6
      212. Middle      0.8
      3. High          3.5

    Step 4. We construct the fuzzy concepts of every attribute from its score, setting the fuzzy concept domain {Very Good, Common, Bad}: The film is good, it is average, or I do not like it. The membership functions are shown in Fig. 10.

    Figure  10.  Membership function.

    We use (3) to calculate the higher-level attributes' scores (a small sketch of this aggregation follows Table 7):

    score_i = (Σj score_(i+1)j) / ||j||    (3)

    where i is the level of the attribute, j indexes the child nodes, and ||j|| is the number of child nodes with ratings. Calculating the higher-level attributes' scores gives Table 7.

    Table  7.  Overall rating table
      Attribute  1    2    3    4    5  6
      PHM(111)   8.9  8.0  /    4.3  /  8.2
      IHM(112)   8.7  8.2  /    /    /  8.6
      HM(11)     8.8  8.1  /    4.3  /  8.4
      GSM(121)   /    9.0  /    /    /  /
      M(1)       8.8  8.6  /    4.3  /  8.4
      DCS(221)   /    7.0  8.8  /    /  /
      D(3)       5.6  7.5  6.7  8.8  /  /
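
    A minimal sketch of (3): a parent's score per user is the mean of its children's available scores, which is what Table 7 implies (e.g., HM(11) for user 4 equals PHM(111)'s 4.3 when IHM(112) is unrated). The function name is ours.

      def aggregate_scores(child_scores):
          """score_i = (sum over children of score_(i+1)j) / ||j||, per (3);
          None marks a missing rating ('/' in Table 7)."""
          rated = [s for s in child_scores if s is not None]
          return round(sum(rated) / len(rated), 1) if rated else None

      # HM(11) for user 1 from its children PHM(111) = 8.9 and IHM(112) = 8.7:
      print(aggregate_scores([8.9, 8.7]))  # 8.8, matching Table 7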

    Step 5. The details of this step are as follows.

    1) Convert Table 7 into the membership table according to Fig. 10 and exclude the items whose membership sum is zero. We get Table 8.

    Table  8.  Membership list
      Attribute         Region  1  2    3    4  5  6
      PHM(111. Low)     V       1  0    /    0  /  0.4
                        C       0  1    /    1  /  0.6
      IHM(112. Middle)  V       1  0.4  /    /  /  1
                        C       0  0.6  /    /  /  0
      HM(11. Middle)    V       1  0.2  /    0  /  0.8
                        C       0  0.8  /    0  /  0.2
                        B       0  /    /    1  /  0
      GSM(121. Low)     V       /  1    /    /  /  /
      SM(12. Low)       V       /  1    /    /  /  /
      M(1. Middle)      V       1  1    /    0  /  0.8
                        C       0  0    /    0  /  0.2
                        B       0  0    /    1  /  0
      D(3. High)        V       1  0    0    1  /  /
                        C       0  1    0.4  0  /  /
                        B       0  0    0.6  0  /  /

    2) Combining this with the fuzzy supports in Table 5, the supports of the quantitative values of the fuzzy attributes are found as Table 9.

    Table  9.  Fuzzy support table
      Attribute    Region  1      2        3      4      5  6
      111. Low     V       1*1    0        /      0      /  0.4*0.6
                   C       0      1*0.6    /      1*0.8  /  0.6*0.6
      112. Middle  V       1*0.6  0.4*0.4  /      /      /  1*0.2
                   C       0      0.6*0.4  /      /      /  0
      11. Middle   V       1*0.8  0.2*1    /      0      /  0.8*0.4
                   C       0      0.8*1    /      0      /  0.2*0.4
                   B       0      /        /      1*0.2  /  0
      121. Low     V       /      1*1      /      /      /  /
      12. Low      V       /      1*1      /      /      /  /
      3. High      V       1*1    0        0      1*0.5  /  /
                   C       0      1*1      0.4*1  0      /  /
                   B       0      0        0.6*1  0      /  /

    3) According to Tables 9 and 4, we multiply by the weights. We then get the final support of every fuzzy attribute as Table 10.

    Table  10.  Final fuzzy support degree table
      Attribute    Region  1    2     3     4    5  6     supp
      111. Low     V       1    0     /     0    /  0.24  1.24
                   C       0    0.6   /     0.8  /  0.36  1.76
      112. Middle  V       0.6  0.16  /     /    /  0.2   0.58
                   C       0    0.24  /     /    /  0     0.14
      11. Middle   V       0.8  0.2   /     0    /  0.32  1.32
                   C       0    0.8   /     0    /  0.08  0.88
                   B       0    /     /     0.2  /  0     0.2
      321. Middle  C       /    1     /     /    /  /     0.6
      322. High    C       /    1     1     /    /  /     1.6
      32. High     C       /    1     1     /    /  /     1.8
      3. High      V       1    0     0     0.5  /  /     1.35
                   C       0    1     0.4   0    /  /     1.26
                   B       0    0     0.6   0    /  /     0.54

    Step 6. Setting min_supp = 0.8 and deleting the fuzzy attributes with the {Bad} evaluation, we get the 1-frequent itemset in Table 11.

    Table  11.  1-frequent itemset
      Attribute    Region  1     2     3     4     5     6     supp
      111. Low     V       1     0     /     0     /     0.24  1.24
                   C       0     0.6   /     0.8   /     0.36  1.76
      11. Middle   V       0.8   0.2   /     0     /     0.32  1.32
                   C       0     0.8   /     0     /     0.08  0.88
      12. Low      V       /     0.8   /     /     /     /     0.8
      1. Middle    V       0.8   0.8   /     0     /     0.8   2.4
      211. Middle  C       0.24  /     0.48  0     0.07  /     0.8
      22. Middle   C       /     0.64  0     /     0.48  /     1.12
      2. Middle    C       0     0.64  0     0     0.48  /     1.12
      31. High     V       0.8   /     0     0.4   /     /     1.2
      322. High    C       /     0.8   0.8   /     /     /     1.6
      32. High     C       /     0.9   0.9   /     /     /     1.8
      3. High      V       0.9   0     0     0.45  /     /     1.35
                   C       0     0.9   0.36  0     /     /     1.26

    The 2-frequent itemset mining: To avoid redundancy, a child node should not be joined with its ancestor nodes; for example, node 111 should not be joined with 11 or 1 (see the prefix-test sketch after Table 12). New candidate sets are generated by joining every pair of fuzzy attribute sets. The final mining results are shown in Table 12. In Table 12, the item {(111, Low, Very good)} joins with {(31, High, Very good)}. From this rule, the set combining a low frequency of {Pathe horror movie} and a high frequency of {Wild life documentary} is frequent, and their evaluations are good according to the proposed method.

    Table  12.  2-frequent set
      Itemset              Regions  1    2    3  4  5  6  supp
      111. Low, 31. High   V, V     0.8  /    /  /  /  /  0.8
      111. Low, 3. High    V, V     0.9  /    /  /  /  /  0.9
      12. Low, 322. High   V, C     /    0.8  /  /  /  /  0.8
      12. Low, 32. High    V, C     /    0.8  /  /  /  /  0.8
      12. Low, 3. High     V, C     /    0.8  /  /  /  /  0.8
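
    With the codes of Step 1, the ancestor test used above reduces to a decimal prefix check. A small sketch, with function names of our choosing:

      def is_ancestor(a, b):
          """True iff code a is a proper ancestor of code b, e.g. 11 of 111."""
          if a >= b:
              return False
          while b > a:
              b //= 10          # climb one level by dropping the last code digit
          return a == b

      def may_join(a, b):
          """Candidate 2-itemsets must not pair a node with its ancestor."""
          return not is_ancestor(a, b) and not is_ancestor(b, a)

      print(may_join(111, 11))   # False: 11 is an ancestor of 111
      print(may_join(111, 31))   # True: codes from different subtrees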

    In this paper, we propose the FMA algorithm, which is built on the CH_Eclat algorithm. We use CH_Eclat to count the supports in the database, and the performance of support counting strongly affects the effectiveness of the overall data mining, so we improved the original Eclat algorithm to fit our FMA algorithm. We used three standard data sets to test the CH_Eclat algorithm and measure its performance. The test environment is: an AMD A8-7650K Radeon R7 CPU (10 compute cores, 4C+6G, 3.30 GHz), 8 GB of memory, and the Windows 7 OS. CH_Eclat is implemented in Java and debugged on the Eclipse development platform. We set different supports to obtain the run times. The data sets used in the experiments are shown in Table 13.

    Table  13.  Standard data sets
      Dataset      Z        N       T
      T10I4D100K   100 000  1 000   10
      Retail       88 162   16 470  10.3
      Mushroom     8 124    119     23
      (Z: number of transactions; N: number of distinct items; T: average transaction length)

    In Figs. 11, 12, and 13, the CH_Eclat algorithm is compared with the standard Eclat algorithm and the algorithm proposed by Z.-Y. Xiong et al.[12], denoted h_Eclat. On the Retail data set, the standard Eclat algorithm's run time exceeds the chart range, so its curve is omitted. The comparison shows that CH_Eclat has the better performance, especially on large and sparse data sets. With the better support counting from CH_Eclat, the FMA algorithm can find association rules quickly.

    Figure  11.  Result of T10I4D100K.
    Figure  12.  Result of retail.
    Figure  13.  Result of mushroom.

    We verified the validity of FMA on real data about Beijing logistics routes. The data contains 4799 logistics routes that use different logistics modes to send goods. We mine the frequent route lines, extract potential information from people's comments, and then assign scores. After processing the original data, the logistics are sent to 32 regions, including several provinces, autonomous regions, and Hong Kong, using six types of logistics transportation tools and reaching 362 municipal areas, 2453 counties in all. Because the county-level route data is sparse, it is not conducive to analyzing frequent routes directly. According to the regional features and FMA, we construct multi-level mining in province-city-county order. From the supports of the logistics routes among different areas, we can find the frequent levels of different routes; the result is shown in Fig. 14. We find that the route from Beijing to Sichuan is the most frequent: A lot of goods are sent to Sichuan from Yanqing District, Beijing.

    Figure  14.  Frequent routes under different supports.

    According to the different transportation tools used in logistics, we can find the frequent levels of the different logistics transportation tools; the result is shown in Fig. 15. We find that many goods are sent to Yunnan Province and Sichuan Province by Xingtie, and the goods sent to Hebei Province by Xingtie receive good comments.

    Figure  15.  Frequent logistics under different support.

    After obtaining the frequent rules, we analyzed the performance of the algorithm against M. Kaya's Apriori-based method[20], denoted B_Apriori, as shown in Fig. 16. The result shows that the proposed algorithm has higher efficiency.

    Figure  16.  Performance comparison.

    In this paper, we proposed an algorithm for multi-level quantitative attributes that uses weights and fuzzy theory. It handles attributes with different weights and different frequencies. With this method, we improved the data mining process and obtained potential rules. The algorithm is much closer to the real environment, so we can mine data under different conditions to obtain the target information. Our example explains the usage of the algorithm and the potential rules it generates. The algorithm also fits mixed attribute types, including Boolean attributes. Its good performance has been demonstrated on the Beijing logistics route data, where it quickly mines the frequent routes. In the future, we will explore further applications of fuzzy theory and association rules, which will promote the development of association rule mining.

    [1] R. Chang and Z.-Y. Liu, "An improved Apriori algorithm," in Proc. of 2011 IEEE Intl. Conf. on Electronics and Optoelectronics, 2011, pp. 476-478.
    [2] Z.-X. Hong and F.-L. Bian, "An improved Apriori algorithm based on support count matrix," Geomatics and Information Science of Wuhan University, vol. 33, no. 12, pp. 1246-1249, 2008.
    [3] J. Dongre, G. L. Prajapati, and S. V. Tokekar, "The role of Apriori algorithm for finding the association rules in data mining," in Proc. of IEEE Intl. Conf. on Issues and Challenges in Intelligent Computing Techniques, 2014, pp. 657-660.
    [4] B.-Q. Hu, Fuzzy Theory Foundation, Wuhan University Press, Jun. 2010, ch. 1, pp. 1-10.
    [5] R. Agrawal, T. Imielinski, and A. Swami, "Mining associations between sets of items in massive databases," in Proc. of 1993 ACM SIGMOD Conf. on Management of Data, Washington, D.C., 1993, pp. 207-216.
    [6] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," Journal of Computer Science and Technology, vol. 15, no. 6, pp. 619-624, 2000.
    [7] J.-W. Han, J. Pei, and Y.-W. Yin, "Mining frequent patterns without candidate generation," in Proc. of ACM SIGMOD Intl. Conf. on Management of Data, 2000, pp. 1-12.
    [8] Y. C. Lee, T. P. Hong, and T. C. Wang, "Multi-level fuzzy mining with multiple minimum supports," Expert Systems with Applications, vol. 34, no. 1, pp. 459-468, Jan. 2008.
    [9] P. Gautam and K. R. Pardasani, "Efficient method for multiple-level association rules in large databases," Journal of Emerging Trends in Computing and Information Sciences, vol. 2, no. 12, Dec. 2011.
    [10] A. M. N. Kousari, S. J. Mirabedini, and E. Ghasemkhani, "Improvement of mining fuzzy multiple-level association rules from quantitative data," Journal of Software Engineering and Applications, vol. 5, no. 3, pp. 190-199, Mar. 2012.
    [11] M. J. Zaki, "Scalable algorithms for association mining," IEEE Trans. on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372-390, 2000.
    [12] Z.-Y. Xiong, P.-E. Chen, and Y.-F. Zhang, "Improvement of Eclat algorithm for association rules based on hash Boolean matrix," Application Research of Computers, vol. 27, no. 4, pp. 1323-1325, 2010.
    [13] Y.-F. Zhang, Z.-Y. Xiong, X.-F. Geng, and J.-M. Chen, "Analysis and improvement of Eclat algorithm," Computer Engineering, vol. 36, no. 23, pp. 28-30, 2010.
    [14] X.-M. Yu and H. Wang, "Improvement of Eclat algorithm based on support in frequent itemset mining," Journal of Computers, vol. 9, no. 9, pp. 2116-2123, 2014.
    [15] T.-P. Han and D.-M. Wang, "An applied research based on improved association rules," in Proc. of the 5th Intl. Conf. on Information Science and Technology, 2015, pp. 594-598.
    [16] S. K. Solanki and J. T. Patel, "A survey on association rule mining," in Proc. of Intl. Conf. on Advanced Computing and Communication Technologies, 2015, pp. 212-216.
    [17] T. Yang and C. Li, "A study of fuzzy quantitative items based on weighted association rules mining," in Proc. of the 2nd Intl. Conf. on Intelligent Computing and Cognitive Informatics, 2015, pp. 42-46.
    [18] T. Hashem, C. F. Ahmed, M. Samiullah, S. Akther, B. S. Jeong, and S. Jeon, "An efficient approach for mining cross-level closed itemsets and minimal association rules using closed itemset lattices," Expert Systems with Applications, vol. 41, no. 6, pp. 2914-2938, 2014.
    [19] C.-J. Li and T.-Q. Yang, "Effective mining of fuzzy quantitative weighted association rules," in Proc. of Intl. Conf. on E-business and E-government, 2010, pp. 1418-1421.
    [20] M. Kaya and R. Alhajj, "Effective mining of fuzzy multi-cross-level weighted association rules," in Proc. of Intl. Conf. on Foundations of Intelligent Systems, 2006, pp. 399-408.
    [21] P.-E. Feng, Y. Liu, Q.-Y. Qiu, and L.-X. Li, "Strategies of efficiency improvement for Eclat algorithm," Journal of Zhejiang University, vol. 47, no. 2, pp. 223-230, Feb. 2013.
    [22] J.-H. Cheng and P.-F. Shi, "Fast mining multiple-level association rules," Chinese Journal of Computers, vol. 21, no. 11, pp. 1037-1041, Nov. 1998.