Data Mining

Chapter Two

Data dispersion characteristics

Center

Mean: $\bar{x} = \frac{1}{n} \sum_{i = 1}^n x_i$, $\mu = \frac{\sum x}{N}$

Median (for grouped data): $median = L_1 + \left(\frac{n/2 - (\sum freq)_l}{freq_{median}}\right) \cdot width$

Mode: $mean - mode = 3 \times (mean - median)$

Quartiles: $Q_1$ (25th percentile), $Q_3$ (75th percentile)

Variance:
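These measures are easy to verify numerically. A minimal sketch, assuming NumPy and an invented sample (neither is part of the original notes):

```python
import numpy as np

data = np.array([4, 8, 15, 16, 23, 42, 8, 15, 16, 8])  # hypothetical sample

mean = data.mean()                       # x̄ = (1/n) Σ x_i
median = np.median(data)                 # middle value of the sorted data
# Mode: the most frequent value (empirical relation: mean - mode ≈ 3 * (mean - median))
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]
q1, q3 = np.percentile(data, [25, 75])   # quartiles Q1 and Q3
variance = data.var(ddof=0)              # (1/n) Σ (x_i - x̄)^2

print(mean, median, mode, q1, q3, variance)
```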
Pixel-Oriented Visualization Techniques

Similarity and Dissimilarity

Distance measure for symmetric binary variables: $d(i, j) = \frac{r + s}{q + r + s + t}$

Here, "asymmetric" means the loss cost of the two states differs; in some data sets one state (e.g. the FP cases) is the absolute majority.

Jaccard coefficient (similarity measure for asymmetric binary variables): $sim_{Jaccard}(i, j) = \frac{q}{q + r + s}$

Minkowski distance (L-h norm): $d(i, j) = \sqrt[h]{|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h}$

Properties:
A distance that satisfies these properties is a metric.
$h = 1$: Manhattan distance $d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$

Ordinal Variables: $z_{if} = \frac{r_{if} - 1}{M_f - 1}$

Dissimilarity for objects of mixed attribute types: $d(i, j) = \frac{\sum_{f = 1}^p \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f = 1}^p \delta_{ij}^{(f)}}$

Cosine similarity $cos(d_1, d_2) = \frac{d_1 \cdot d_2}{||d_1|| \cdot ||d_2||}$ is used to evaluate the similarity of sentences.
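The Minkowski family and cosine similarity can be sketched in a few lines of Python; the vectors here are invented for illustration:

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 4.0])

def minkowski(a, b, h):
    """L-h norm distance: the h-th root of the sum of |a_k - b_k|^h."""
    return np.sum(np.abs(a - b) ** h) ** (1.0 / h)

manhattan = minkowski(x, y, 1)   # h = 1: Manhattan distance
euclidean = minkowski(x, y, 2)   # h = 2: Euclidean distance

# Cosine similarity: dot product divided by the product of the norms
cosine = x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(manhattan, euclidean, cosine)
```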
Chapter Three

Data Processing

Data cleaning, data integration, data reduction, data transformation and data discretization.

$\chi^2$ (chi-square) test: $\chi^2 = \sum \frac{(Observed - Expected)^2}{Expected}$
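As a hedged illustration of the statistic, a tiny 2×2 contingency-table example with invented counts (the expected counts come from the row/column totals under independence):

```python
import numpy as np

# Hypothetical 2x2 contingency table of observed counts
observed = np.array([[250.0, 200.0],
                     [50.0, 1000.0]])

row_sums = observed.sum(axis=1, keepdims=True)
col_sums = observed.sum(axis=0, keepdims=True)
total = observed.sum()

expected = row_sums @ col_sums / total              # expected counts under independence
chi2 = ((observed - expected) ** 2 / expected).sum()

print(chi2)  # compare against a chi-square critical value with (r-1)(c-1) degrees of freedom
```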
Correlation coefficient (Pearson's product-moment coefficient): $r_{A, B} = \frac{\sum_{i = 1}^n (a_i - \bar{A})(b_i - \bar{B})}{(n - 1) \sigma_A \sigma_B} = \frac{\sum_{i = 1}^n (a_i b_i) - n \bar{A} \bar{B}}{(n - 1) \sigma_A \sigma_B}$

$r_{A, B} > 0$ means A and B are positively correlated.
Let $a_k' = (a_k - mean(A)) / std(A)$ and $b_k' = (b_k - mean(B)) / std(B)$.

Covariance
$Cov(A, B) = E((A - \bar{A})(B - \bar{B})) = \frac{\sum_{i = 1}^n (a_i - \bar{A})(b_i - \bar{B})}{n}$
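A quick numerical check of the correlation and covariance formulas above, on an invented pair of attributes (NumPy assumed):

```python
import numpy as np

a = np.array([6.0, 5.0, 4.0, 3.0, 2.0])    # attribute A (hypothetical values)
b = np.array([20.0, 10.0, 14.0, 5.0, 5.0])  # attribute B

n = len(a)
cov_ab = ((a - a.mean()) * (b - b.mean())).sum() / n   # Cov(A, B), population form
r_ab = ((a - a.mean()) * (b - b.mean())).sum() / ((n - 1) * a.std(ddof=1) * b.std(ddof=1))

print(cov_ab, r_ab)   # r_ab > 0 -> A and B are positively correlated
# Cross-check against the library versions:
print(np.cov(a, b, bias=True)[0, 1], np.corrcoef(a, b)[0, 1])
```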
Data reduction

Unsupervised:
Supervised:
Semi-supervised:
Linear:
Nonlinear:

Dimensionality reduction (Feature reduction):
Selection: choose a best subset of size d from the available p features.

PCA

Given $\{x_1, ..., x_n\} \in \mathbb{R}^p$, the target is to find the $a$ that maximizes $var(z)$, where $z = ax$:

$$
\begin{aligned}
var(z) &= E((z - \bar{z})^2)\\
&= \frac{1}{n} \sum_{i = 1}^n (ax_i - a\bar{x})^2\\
&= \frac{1}{n} \sum_{i = 1}^n a^T(x_i - \bar{x})(x_i - \bar{x})^T a\\
&= a^T S a,\\
S &= \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})(x_i - \bar{x})^T
\end{aligned}
$$

which means
$\max_a a^T S a$, s.t. $a^T a = 1$.

$$
\begin{aligned}
L &= a^T S a - \lambda(a^T a - 1)\\
\frac{\partial L}{\partial a} &= 2Sa - 2\lambda a = 0
\end{aligned}
$$

So $\lambda$ and $a$ are an eigenvalue/eigenvector pair of $S$. Then $var(z) = a^T \lambda a = \lambda$, so the eigenvalues are chosen from largest to smallest.
Next, if we want another principal component:

$\max_{a_2} a_2^T S a_2$, s.t. $a_2^T a_2 = 1$, $cov(z^{(2)}, z^{(1)}) = 0$
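The derivation reduces PCA to an eigendecomposition of the covariance matrix $S$; a minimal NumPy sketch on random data (keeping 2 components is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # n = 100 samples, p = 5 features

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)  # S = (1/n) Σ (x_i - x̄)(x_i - x̄)^T

eigvals, eigvecs = np.linalg.eigh(S)      # eigh: S is symmetric
order = np.argsort(eigvals)[::-1]         # eigenvalues from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

A = eigvecs[:, :2]                        # first two principal directions a_1, a_2
Z = (X - x_bar) @ A                       # projected data

print(eigvals[:2])                        # var(z_k) equals the k-th eigenvalue
print(Z.var(axis=0, ddof=0))              # matches, under the same 1/n convention
```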
Main theoretical result:

LDA (Linear Discriminant Analysis)

Find a transformation $a$ such that $a^T X_1$ and $a^T X_2$ are maximally separated and each class is minimally dispersed (maximum separation):

$\max\ (a(\bar{x}_1 - \bar{x}_2))^2$, $\min\ var(z_1)$, $\min\ var(z_2)$

Suppose there exist two classes
$w_1, w_2$.

$\tilde{s}_i^2 = \sum_{y \in w_i} (y - \tilde{\mu}_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)(a^T x - a^T \mu_i)^T = \sum_{x \in w_i} a^T (x - \mu_i)(x - \mu_i)^T a = a^T S_i a$

Within-class scatter matrix: $S_W = S_1 + S_2$, so $\tilde{s}_1^2 + \tilde{s}_2^2 = a^T S_W a$

$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (a^T \mu_1 - a^T \mu_2)^2 = a^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T a = a^T S_B a$

Between-class scatter matrix: $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$
$J(a) = \frac{a^T S_B a}{a^T S_W a}$
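Maximizing $J(a)$ gives the well-known closed form $a \propto S_W^{-1}(\mu_1 - \mu_2)$; a short two-class sketch on synthetic Gaussian data (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))   # class w1
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(50, 2))   # class w2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T @ (X1 - mu1)            # per-class scatter matrices
S2 = (X2 - mu2).T @ (X2 - mu2)
SW = S1 + S2                              # within-class scatter S_W

a = np.linalg.solve(SW, mu1 - mu2)        # direction maximizing J(a)
a /= np.linalg.norm(a)

z1, z2 = X1 @ a, X2 @ a                   # projected classes are well separated
print(z1.mean(), z2.mean(), z1.var(), z2.var())
```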
Chapter Four

FP mining

Itemset: a set of one or more items.

Find all the rules $X \rightarrow Y$ with minimum support and confidence.

Closed patterns and max-patterns: a max-pattern is also a closed pattern.

Apriori

An important property: **any subset of a frequent itemset must be frequent**.

Method:
Example:

Pseudo-code: (an illustrative sketch follows after this list)
Major computational challenges:
Improving Apriori: general ideas:
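The course's own pseudo-code is not reproduced in these notes; purely as an illustration of the level-wise generate-and-prune idea, a compact sketch on invented toy transactions:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise mining: any subset of a frequent itemset must be frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}      # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # count support of the candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # candidate generation: join L_k with itself, then prune by the Apriori property
        prev = list(survivors)
        k += 1
        current = {
            a | b for a, b in combinations(prev, 2)
            if len(a | b) == k
            and all(frozenset(s) in survivors for s in combinations(a | b, k - 1))
        }
    return frequent

# Toy usage
print(apriori([{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}], min_support=2))
```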
FP-growth

Here we link a website; it [step 3, page 28] says: "Recursively mine conditional FP-trees and grow frequent patterns obtained so far. If the conditional FP-tree contains a single path, simply enumerate all the patterns."

Mining sequential patterns

Sequential patterns: GSP

Chapter Five

Decision Tree

It is derived from the perspective of probability: we can calculate the probability of every output for a given input. If we assume every condition is independent, then $P(X|C) = \prod P(X_i|C)$, so $\log P(X|C) = \sum \log P(X_i|C)$, which is why we let the cost function be $\log$. To understand this better, we can borrow a concept from thermodynamics called entropy.
$H(Y) = -\sum_{i = 1}^m p_i \log(p_i)$, where $p_i = P(Y = y_i)$
$Info(D) = -\sum_{i = 1}^m p_i \log_2(p_i)$
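A small sketch of $Info(D)$ and the information gain of a split, the quantity a decision tree uses to choose attributes (the class counts are invented for illustration):

```python
import math

def info(counts):
    """Info(D) = -Σ p_i log2(p_i) over the class distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical data set: 9 positive / 5 negative examples
info_d = info([9, 5])

# Split on some attribute into partitions with class counts (pos, neg)
partitions = [(2, 3), (4, 0), (3, 2)]
n = sum(p + q for p, q in partitions)
info_a = sum((p + q) / n * info([p, q]) for p, q in partitions)

gain = info_d - info_a     # information gain of the split
print(info_d, info_a, gain)
```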
Bayes Classification Methods

First, we know that $P(B) = \sum_{i = 1}^M P(B|A_i) P(A_i)$, and
$P(H|X) = \frac{P(X|H) P(H)}{P(X)}$

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability will be zero.
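The usual fix for that zero-probability problem is the Laplacian correction (add-one smoothing); a minimal categorical naïve Bayes sketch on an invented training set:

```python
from collections import Counter, defaultdict

# Hypothetical training data: (features, class)
train = [({"outlook": "sunny", "wind": "weak"}, "no"),
         ({"outlook": "sunny", "wind": "strong"}, "no"),
         ({"outlook": "rain", "wind": "weak"}, "yes"),
         ({"outlook": "overcast", "wind": "weak"}, "yes")]

classes = Counter(c for _, c in train)
counts = defaultdict(Counter)                       # (class, attribute) -> value counts
for x, c in train:
    for attr, val in x.items():
        counts[(c, attr)][val] += 1

def posterior(x, c, alpha=1):
    """P(C) * Π P(X_i|C) with add-alpha (Laplace) smoothing so no factor is zero."""
    p = classes[c] / sum(classes.values())
    for attr, val in x.items():
        vals = {v for cc in classes for v in counts[(cc, attr)]}
        p *= (counts[(c, attr)][val] + alpha) / (classes[c] + alpha * len(vals))
    return p

query = {"outlook": "sunny", "wind": "weak"}
print(max(classes, key=lambda c: posterior(query, c)))   # predicted class
```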
Support Vector Machines

Model Evaluation and Selection

Confusion Matrix:

Accuracy: $\frac{TP + TN}{ALL}$

Precision: $\frac{TP}{TP + FP}$

Holdout method

Cross-validation

Bootstrap

Estimating Confidence Intervals: t-test

ROC curves

Chapter Six

K-means

K-medoids: choose the closest point to the K-means center.
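To round off Chapter Six, a compact Lloyd-style K-means sketch on synthetic points; the final step snaps each center to its closest cluster member, approximating the K-medoids idea in the note above (this is an approximation, not the full PAM algorithm):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # update step: each center becomes the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(m, 0.5, size=(30, 2)) for m in (0.0, 5.0)])
centers, labels = kmeans(X, k=2)

# K-medoids-style variant: replace each mean with the cluster member closest to it
medoids = np.array([X[labels == j][np.argmin(((X[labels == j] - c) ** 2).sum(axis=1))]
                    for j, c in enumerate(centers)])
print(centers, medoids)
```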