
Mining Correlation Patterns of Frequent Traversal Paths and High Utility Itemsets

Announcement type: Engineering Science 6-1

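Abstract (translated from Chinese)

With the rapid growth of online shopping platforms, shopping environments and behavior have gradually shifted from physical stores to virtual online stores. Users leave behind large amounts of browsing data while shopping online, so web mining techniques have become increasingly important and are now widely applied to business forecasting and decision support. Many data mining studies examine not only web traversal paths but also their association with purchased products in order to obtain more detailed information. However, association rules treat every item as equally important, so they cannot show whether a high utility relationship exists among items. High utility mining has therefore emerged in recent years. Because it takes item quantities and profits into account, it can discover the product combinations with the highest utility and thereby increase profit. To this end, this study proposes the CCU (Correlation Patterns of Consecutive Paths and High Utility Itemsets) algorithm, which mines correlation patterns of consecutive paths and high utility product combinations. The CCU algorithm has three phases. The first phase finds the frequent consecutive traversal paths. The second phase finds the high utility itemsets of product combinations. The third phase cross-combines the results of the first two phases to generate correlation patterns, then scans the database once more to compute the support and actual utility of each pattern. In the first two phases, a filtration mechanism prevents the generation of large numbers of candidate itemsets, which improves overall performance. The experimental results show that the CCU algorithm runs efficiently under different parameter settings.

Keywords: Web Mining, Data Mining, Utility Mining, Filtration Mechanism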
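
The contrast drawn above between frequency and utility can be illustrated with a small sketch. It assumes the common definition of itemset utility (purchased quantity times unit profit, summed over the transactions that contain every item of the itemset) and enumerates itemsets by brute force. The data, the names `PROFIT` and `TRANSACTIONS`, and the functions are hypothetical. This is not the CCU algorithm's second phase, and it has no filtration mechanism.

```python
from itertools import combinations

# Hypothetical unit profit per item (assumption, not from the paper).
PROFIT = {"A": 5, "B": 1, "C": 20}

# Hypothetical transactions: item -> purchased quantity.
TRANSACTIONS = [
    {"A": 2, "B": 10},
    {"B": 4, "C": 1},
    {"A": 1, "B": 3, "C": 2},
]


def itemset_support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Brute-force enumeration of itemsets whose utility meets the threshold."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result
```

In this toy data, item `B` appears in every transaction but contributes little utility, while `C` appears less often and contributes far more. A support threshold alone would not surface that difference.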


Abstract

In recent years, with the development of online shopping platforms, people's shopping behavior has shifted from physical stores to virtual stores, and web mining technology has become increasingly important. It has been widely applied to support business decision making based on web transactional databases and web traversal path databases. Because association rules only describe relationships among items and treat every item as equally important, they cannot reveal whether a web transactional database contains itemsets with high utility, particularly when those itemsets occur infrequently. Some studies have considered both the traversal paths of most users and the relationships among items in order to obtain more detailed information, but they did not take into account the quantities of items in transactions or their corresponding profits. They therefore could not provide users with information about high utility itemsets in web databases. In this study, we propose a novel algorithm called CCU (Correlation Patterns of Consecutive Paths and High Utility Itemsets) to discover correlation patterns that combine consecutive-path and utility aspects from web databases composed of transactions and traversal paths. The mining process is divided into three phases. In the first phase, the algorithm uses a filtration mechanism to efficiently find large consecutive sequences in the web traversal path database. In the second phase, it uses a filtration mechanism to efficiently discover the high utility itemsets in the web transactional database. In the third phase, candidate correlation patterns are generated from the large consecutive sequences and the high utility itemsets found in the first two phases, and the database is scanned once more to compute their support and utility values. Finally, the correlation patterns whose support and utility satisfy the support threshold and the utility threshold, respectively, are output.
In the experimental evaluation, the results show that the proposed CCU algorithm performs well under different parameter settings.

Keywords: Web Mining, Data Mining, Utility Mining, Filtration Mechanism
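
The first and third phases described in the abstract can be sketched naively as follows. This is only an illustration under stated assumptions, not the authors' CCU implementation: it omits the filtration mechanism, takes the high utility itemsets as given input, and assumes that a pattern's support is the number of sessions whose path contains the consecutive sequence and whose purchase contains the itemset, with utility summed over those sessions. The session layout and the names `SESSIONS`, `PROFIT`, `frequent_consecutive_paths`, and `correlation_patterns` are hypothetical.

```python
from collections import Counter

# Hypothetical sessions: a traversal path plus purchased items (item -> quantity).
SESSIONS = [
    {"path": ["home", "list", "item", "cart"], "bought": {"A": 1, "C": 1}},
    {"path": ["home", "list", "item", "cart"], "bought": {"C": 2}},
    {"path": ["home", "search", "item"], "bought": {"B": 5}},
    {"path": ["list", "item", "cart"], "bought": {"A": 2, "C": 1}},
]
PROFIT = {"A": 5, "B": 1, "C": 20}  # hypothetical unit profits


def consecutive_subpaths(path, min_len=2):
    """Return the distinct contiguous subpaths of at least min_len pages."""
    return {
        tuple(path[i:j])
        for i in range(len(path))
        for j in range(i + min_len, len(path) + 1)
    }


def contains_consecutive(path, sub):
    """True if sub occurs in path as an unbroken run of pages."""
    n = len(sub)
    return any(tuple(path[i:i + n]) == tuple(sub) for i in range(len(path) - n + 1))


def frequent_consecutive_paths(sessions, min_support):
    """Phase 1 (naive): count each subpath once per session, keep frequent ones."""
    counts = Counter()
    for s in sessions:
        counts.update(consecutive_subpaths(s["path"]))
    return {p for p, c in counts.items() if c >= min_support}


def correlation_patterns(sessions, paths, itemsets, profit, min_support, min_utility):
    """Phase 3 (naive): pair each path with each itemset, then scan the sessions."""
    results = {}
    for p in paths:
        for items in itemsets:
            support = 0
            utility = 0
            for s in sessions:
                bought = s["bought"]
                if contains_consecutive(s["path"], p) and all(i in bought for i in items):
                    support += 1
                    utility += sum(bought[i] * profit[i] for i in items)
            if support >= min_support and utility >= min_utility:
                results[(p, items)] = (support, utility)
    return results
```

A possible usage, with thresholds chosen only for the example: call `frequent_consecutive_paths(SESSIONS, 3)` to obtain the frequent paths, then pass them with candidate itemsets such as `[("C",), ("A", "C")]` to `correlation_patterns(SESSIONS, paths, itemsets, PROFIT, 3, 50)`. Each surviving key is a (path, itemset) pair and each value is its (support, utility).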


Published: 2021/09/23
Posted by: 薛淑真