Association rule learning

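Association rule learning (Cantonese: 關聯規則學習; Jyutping: gwaan1 lyun4 kwai1 zak1 hok6 zaap6; ARL) is a machine learning technique that takes a set of discrete variables as input and outputs rules describing how different values of those variables tend to "co-occur".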

Basic definitions

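Association rules can be understood as "if ..., then ..." rules that describe relationships between the variables in a dataset. For example, imagine a supermarket that has recorded everything each customer bought (the data). It can have a computer search this data for association rules and learn that:

"If a customer buys bread, there is a 90% chance they also buy butter."
"If a customer buys kimchi, there is a 70% chance they also buy white rice."
"If a customer buys a toothbrush, there is an 80% chance they also buy toothpaste."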

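In mathematical notation, a rule found by association rule analysis looks like $X\land Y\rightarrow Z$ ("if X and Y, then Z"), where each of X, Y and Z stands for "whether the customer bought a particular product".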
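A minimal Python sketch of this notation, assuming each customer's purchases are stored as a set of item names. The `Rule` type and the helper names are hypothetical, for illustration only: the "if" part $X\land Y$ holds for a customer exactly when both X and Y are in that customer's set.

```python
from typing import FrozenSet, NamedTuple, Set


class Rule(NamedTuple):
    """Hypothetical representation of a rule lhs -> rhs.

    X ∧ Y -> Z is stored as lhs = {X, Y} and rhs = {Z}.
    """
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]


def lhs_holds(rule: Rule, basket: Set[str]) -> bool:
    """'If' part: the customer bought every item on the left (X and Y)."""
    return rule.lhs <= basket


def rule_observed(rule: Rule, basket: Set[str]) -> bool:
    """'If' and 'then' parts both hold: the customer bought X, Y and Z."""
    return (rule.lhs | rule.rhs) <= basket


rule = Rule(lhs=frozenset({"X", "Y"}), rhs=frozenset({"Z"}))
print(lhs_holds(rule, {"X", "Y"}))           # True
print(rule_observed(rule, {"X", "Y"}))       # False: Z was not bought
print(rule_observed(rule, {"X", "Y", "Z"}))  # True
```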

Application example

Association rules can be very useful in marketing.

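Now imagine a team of marketing analysts who want to learn about consumers' shopping habits. They obtain data from a supermarket showing what each customer who visited on a particular day bought, so the data they have looks something like this[1][2]: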

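Customer A: lychee, beer, white rice, chicken
Customer B: lychee, beer, white rice
Customer C: cheese, beer, white rice, chicken
...

and so on. At the most basic level, the analysts can calculate the probability that each product is bought. For example, let $P(\text{lychee})$ be the probability that a customer "buys lychee"; it can be computed simply as:

$P(\text{lychee})=\frac{\text{number of customers who bought lychee}}{\text{total number of customers}}$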
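The same calculation as a minimal Python sketch, assuming each customer's basket is a set of item names and using only customers A, B and C shown above (the full dataset, elided as "...", would be larger). The function name `item_probability` is illustrative, not from any library.

```python
from typing import List, Set

# Only customers A, B and C from the example; the real dataset is larger.
transactions: List[Set[str]] = [
    {"lychee", "beer", "white rice", "chicken"},  # customer A
    {"lychee", "beer", "white rice"},             # customer B
    {"cheese", "beer", "white rice", "chicken"},  # customer C
]


def item_probability(item: str, baskets: List[Set[str]]) -> float:
    """Customers who bought `item` divided by the total number of customers."""
    bought = sum(1 for basket in baskets if item in basket)
    return bought / len(baskets)


print(f"{item_probability('lychee', transactions):.3f}")  # 0.667 (2 of 3)
print(f"{item_probability('beer', transactions):.3f}")    # 1.000 (3 of 3)
```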

The analysts can go further. The probability computed above is called an item's support[e 1]; besides computing support, they can also[2]:

Drop every product whose support is below a threshold (for example 10%) and exclude it from further analysis;
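Compute confidence[e 2]: let C and D be two of the supermarket's products. In association rule analysis, the confidence of C → D means "the probability that a customer buys D, given that they bought C". Writing $\text{support}(C\cup D)$ for the support of the itemset containing both C and D (the share of customers who bought both), this is[3]
$\text{confidence}(C\rightarrow D)=\frac{\text{support}(C\cup D)}{\text{support}(C)}=\frac{P(C\cap D)}{P(C)}=P(D\mid C)$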
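Compute lift[e 3]: confidence alone is not enough, because it ignores how many customers buy D in the first place ($P(D)$). Lift can be read as the confidence of C → D after controlling for[note 1] the support of D, i.e. the confidence divided by D's baseline probability:
$\text{lift}(C\rightarrow D)=\frac{P(C\cap D)}{P(C)\times P(D)}=\frac{P(D\mid C)}{P(D)}$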
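A minimal Python sketch of the confidence and lift formulas, assuming baskets are sets of item names and again using only customers A, B and C from the example data. The function names are illustrative, not any particular library's API.

```python
from typing import List, Set

transactions: List[Set[str]] = [
    {"lychee", "beer", "white rice", "chicken"},  # customer A
    {"lychee", "beer", "white rice"},             # customer B
    {"cheese", "beer", "white rice", "chicken"},  # customer C
]


def support(itemset: Set[str], baskets: List[Set[str]]) -> float:
    """Share of customers who bought every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(c: Set[str], d: Set[str], baskets: List[Set[str]]) -> float:
    """support(C ∪ D) / support(C) = P(D | C); assumes C occurs at least once."""
    return support(c | d, baskets) / support(c, baskets)


def lift(c: Set[str], d: Set[str], baskets: List[Set[str]]) -> float:
    """P(C ∩ D) / (P(C) * P(D)) = P(D | C) / P(D); assumes D occurs at least once."""
    return confidence(c, d, baskets) / support(d, baskets)


c, d = {"lychee"}, {"beer"}
print(f"confidence = {confidence(c, d, transactions):.2f}")  # confidence = 1.00
print(f"lift = {lift(c, d, transactions):.2f}")              # lift = 1.00
```

Here lychee → beer has confidence 1, yet its lift is only 1, because every customer in this small sample bought beer anyway: knowing that someone bought lychee tells us nothing extra about beer.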

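A lift of 1 means that whether a customer buys C has no bearing on whether they buy D. A lift greater than 1 means that buying C raises the probability of buying D; a lift less than 1 means that buying C lowers it.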
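A worked example of the three cases, using only customers A, B and C from the example data (so each probability is a count out of 3; the full dataset would give different values):

$$
\begin{aligned}
\text{lift}(\text{lychee}\rightarrow\text{beer}) &= \frac{P(\text{lychee}\cap\text{beer})}{P(\text{lychee})\times P(\text{beer})} = \frac{2/3}{(2/3)\times 1} = 1\\
\text{lift}(\text{cheese}\rightarrow\text{chicken}) &= \frac{P(\text{cheese}\cap\text{chicken})}{P(\text{cheese})\times P(\text{chicken})} = \frac{1/3}{(1/3)\times(2/3)} = 1.5\\
\text{lift}(\text{lychee}\rightarrow\text{chicken}) &= \frac{P(\text{lychee}\cap\text{chicken})}{P(\text{lychee})\times P(\text{chicken})} = \frac{1/3}{(2/3)\times(2/3)} = 0.75
\end{aligned}
$$

In this sample, buying lychee tells us nothing about beer (everyone bought beer), cheese buyers are more likely than average to buy chicken, and lychee buyers are less likely than average to buy chicken.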

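With this information, market researchers can predict customer behaviour[4][2] and use various tactics to profit from it. For example, once they know that customers often buy products X and Y together, the seller can deliberately place X and Y on the same shelf (so customers can grab both at once), discount only one of X or Y during a sale, or quietly show ads for X to customers who have bought Y, and so on[5].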

Implementation in R

As of the early 2020s, the R programming language has libraries for association rule analysis, such as the arules package[3][2].
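Libraries such as arules (whose `apriori()` function mines rules subject to minimum support and confidence) use efficient search algorithms. The Python sketch below is not that library: it is a brute-force stand-in, assuming only one item on each side of a rule, that shows the overall workflow described above (drop low-support items, compute confidence, rank by lift). The thresholds `min_support=0.3` and `min_confidence=0.6` are hypothetical, and the data is again only customers A, B and C.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float, float]  # (C, D, support, confidence, lift)


def mine_pair_rules(baskets: List[Set[str]], min_support: float,
                    min_confidence: float) -> List[Rule]:
    """Brute-force search for rules C -> D with a single item on each side."""
    n = len(baskets)
    if n == 0:
        return []
    items = sorted(set().union(*baskets))
    p = {i: sum(1 for b in baskets if i in b) / n for i in items}  # P(item)
    # Drop items whose support is below the threshold before pairing them up.
    frequent = [i for i in items if p[i] >= min_support]
    rules: List[Rule] = []
    for c, d in permutations(frequent, 2):
        both = sum(1 for b in baskets if c in b and d in b) / n  # P(C ∩ D)
        if both < min_support:
            continue
        conf = both / p[c]  # P(D | C)
        if conf < min_confidence:
            continue
        rules.append((c, d, both, conf, conf / p[d]))  # lift = P(D | C) / P(D)
    rules.sort(key=lambda r: r[4], reverse=True)  # highest lift first
    return rules


transactions: List[Set[str]] = [
    {"lychee", "beer", "white rice", "chicken"},  # customer A
    {"lychee", "beer", "white rice"},             # customer B
    {"cheese", "beer", "white rice", "chicken"},  # customer C
]

for c, d, s, conf, lft in mine_pair_rules(transactions, 0.3, 0.6):
    print(f"{c} -> {d}: support={s:.2f}, confidence={conf:.2f}, lift={lft:.2f}")
# First line printed: cheese -> chicken: support=0.33, confidence=1.00, lift=1.50
```

Checking every pair of items grows quadratically with the number of products, and rules with several items on each side grow much faster, which is why real libraries prune the search instead of enumerating it.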

See also

Notes

[note 1] See also the concept of control variables.

Glossary

[e 1] support: in ARL, this refers to "how many customers bought the item".
[e 2] confidence
[e 3] lift

References

[1] Kumbhare, T. A., & Chobe, S. V. (2014). An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies, 5(1), 927-930. "The performance of FP-growth is better than all other algorithms."
[2] (In English) An introduction to association rule mining with the R programming language, covering the three main measures used in association rule mining: support, confidence and lift.
[3] Hornik, K., Grün, B., & Hahsler, M. (2005). arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15), 1-25.
[4] Kumbhare, T. A., & Chobe, S. V. (2014). An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies, 5(1), 927-930. "The performance of FP-growth is better than all other algorithms."
[5] Ng, A., & Soo, K. (2017). Numsense! Data Science for the Layman. Annalyn Ng and Kenneth Soo.

External links

(In English) An introduction to association rule mining with the R programming language
