

Chapter Two

Data dispersion characteristics


Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (sample), $\mu = \frac{\sum x}{N}$ (population)
Weighted mean: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

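Median (for grouped data): $median = L_1 + \left(\frac{n/2 - (\sum freq)_l}{freq_{median}}\right) \times width$, where $L_1$ is the lower boundary of the median interval, $(\sum freq)_l$ is the sum of the frequencies of all intervals below the median interval, $freq_{median}$ is the frequency of the median interval, and $width$ is its width.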

Mode (empirical relation for moderately skewed, unimodal data): $mean - mode \approx 3 \times (mean - median)$
mean > median, positively skewed
mean < median, negatively skewed
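A minimal Python sketch of the central-tendency formulas above. The function names and the grouped frequency table are made up for illustration, and `empirical_mode` is only the approximation from the empirical relation, not an exact mode.

```python
from statistics import mean, median


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def grouped_median(intervals, freqs):
    """Median of grouped data: L1 + (n/2 - (sum freq)_l) / freq_median * width.

    intervals: sorted list of (lower, upper) boundaries; freqs: count per interval.
    """
    n = sum(freqs)
    below = 0  # (sum freq)_l: total frequency of the intervals below the current one
    for (lower, upper), f in zip(intervals, freqs):
        if below + f >= n / 2:  # this interval holds the n/2-th value: the median interval
            return lower + (n / 2 - below) / f * (upper - lower)
        below += f
    raise ValueError("empty frequency table")


def empirical_mode(m, med):
    """From mean - mode ≈ 3 (mean - median): mode ≈ 3 * median - 2 * mean."""
    return 3 * med - 2 * m


def skew_direction(data):
    """Compare mean and median as in the notes above."""
    m, med = mean(data), median(data)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "symmetric"


print(weighted_mean([1, 2, 3], [1, 1, 2]))                                       # 2.25
print(grouped_median([(0, 10), (10, 20), (20, 30), (30, 40)], [5, 15, 20, 10]))  # 22.5
print(skew_direction([1, 2, 2, 3, 10]))                                          # positive
```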

Quartiles: $Q_1$ (25th percentile), $Q_3$ (75th percentile)
Inter-quartile range: $IQR = Q_3 - Q_1$
Five-number summary: min, $Q_1$, median, $Q_3$, max
Boxplot: the ends of the box are the quartiles, the median is marked inside the box, whiskers extend from the box, and outliers are plotted individually.
Outlier: usually, a value more than $1.5 \times IQR$ above $Q_3$ or below $Q_1$.
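A minimal sketch of the five-number summary and the $1.5 \times IQR$ outlier rule, using Python's `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and tools; the `method="inclusive"` linear interpolation used here is one assumption among several, and the data are made up.

```python
from statistics import quantiles


def five_number_summary(data):
    """(min, Q1, median, Q3, max); quartiles by linear interpolation ("inclusive")."""
    xs = sorted(data)
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, med, q3, xs[-1]


def iqr_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(data))  # (1, 3.25, 5.5, 7.75, 100)
print(iqr_outliers(data))         # [100]
```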

Unbiased estimation (sample variance): $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$
Biased estimation (population variance): $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \mu^2$
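A small sketch comparing the two estimators above on made-up data; the function names are my own, and the standard-library `statistics.variance` (divides by $n-1$) and `statistics.pvariance` (divides by $n$) can serve as a cross-check.

```python
from statistics import pvariance, variance


def sample_variance(xs):
    """Unbiased estimate: squared deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Same value via [sum x^2 - (sum x)^2 / n] / (n - 1); numerically less stable."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance(xs):
    """Divide by n: sigma^2 = (1/n) sum x^2 - mu^2 (biased if xs is only a sample)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2


xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(xs), variance(xs))       # both ≈ 4.571 (= 32/7)
print(population_variance(xs), pvariance(xs))  # both equal 4
```

The shortcut form is convenient by hand, but subtracting two large sums can lose precision, which is why the two-pass form is usually preferred in code.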

Pixel-Oriented Visualization Techniques

Similarity and Dissimilarity

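Contingency table for the binary attributes of objects i and j:

| Object i \ Object j | 1 | 0 | sum |
|---|---|---|---|
| 1 | q | r | q + r |
| 0 | s | t | s + t |
| sum | q + s | r + t | p |

Here q counts the attributes equal to 1 for both objects, r those equal to 1 for i and 0 for j, s those equal to 0 for i and 1 for j, t those equal to 0 for both, and p = q + r + s + t.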

Distance measure for symmetric binary variables: $d(i, j) = \frac{r + s}{q + r + s + t}$
Distance measure for asymmetric binary variables: $d(i, j) = \frac{r + s}{q + r + s}$

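Here, "asymmetric" means the two outcomes are not equally important (misjudging them has different costs). In some data sets one outcome dominates, e.g. matches where both values are 0 are an absolute majority and carry little information, so their count t is left out of the denominator.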

Jaccard coefficient (similarity measure for asymmetric binary variables): $sim_{Jaccard}(i, j) = \frac{q}{q + r + s}$
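A minimal sketch of the binary measures above: count q, r, s, t from two 0/1 vectors, then apply the formulas. The vectors are hypothetical. Note that the asymmetric distance is exactly $1 - sim_{Jaccard}$.

```python
def contingency(i, j):
    """Counts q, r, s, t for two equal-length 0/1 vectors (see the contingency table)."""
    q = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(i, j) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_binary_distance(i, j):
    q, r, s, t = contingency(i, j)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_distance(i, j):
    q, r, s, _ = contingency(i, j)  # negative matches t are ignored
    return (r + s) / (q + r + s)


def jaccard_similarity(i, j):
    q, r, s, _ = contingency(i, j)
    return q / (q + r + s)


i = [1, 0, 1, 0, 0, 0]
j = [1, 0, 1, 0, 1, 0]
print(contingency(i, j))                 # (2, 0, 1, 3)
print(asymmetric_binary_distance(i, j))  # 1/3 ≈ 0.333
```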

Minkowski distance ($L_h$ norm): $d(i, j) = \sqrt[h]{|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h}$


  1. $d(i, j) > 0$ if $i \neq j$, and $d(i, i) = 0$ (Positive definiteness)
  2. $d(i, j) = d(j, i)$ (Symmetry)
  3. $d(i, j) \leq d(i, k) + d(k, j)$ (Triangle Inequality)

A distance that satisfies these properties is a metric.

$h = 1$: Manhattan distance $d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$
$h = 2$: Euclidean distance $d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$
$h \rightarrow \infty$: supremum distance $d(i, j) = \lim_{h \rightarrow \infty} \left(\sum_{f=1}^{p} |x_{if} - x_{jf}|^h\right)^{\frac{1}{h}} = \max_{f=1}^{p} |x_{if} - x_{jf}|$
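A minimal sketch of the Minkowski family; passing `math.inf` as `h` is my own convention for selecting the supremum distance. The points are made up. A large finite `h` already gives a value very close to the supremum distance, which illustrates the limit above.

```python
import math


def minkowski(x, y, h):
    """L_h distance between two equal-length numeric vectors; h = math.inf gives the supremum distance."""
    if h == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)


x1, x2 = (1, 2), (3, 5)
print(minkowski(x1, x2, 1))         # Manhattan: 5.0
print(minkowski(x1, x2, 2))         # Euclidean: sqrt(13) ≈ 3.606
print(minkowski(x1, x2, math.inf))  # supremum: 3
print(minkowski(x1, x2, 50))        # ≈ 3, approaching the supremum distance
```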

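Ordinal variables: replace the rank $r_{if} \in \{1, \dots, M_f\}$ of object i on variable f by $z_{if} = \frac{r_{if} - 1}{M_f - 1}$, which maps it onto $[0, 1]$; then treat $z_{if}$ as a numeric variable.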

Attributes of mixed types: $d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$, where $\delta_{ij}^{(f)} = 0$ if $x_{if}$ or $x_{jf}$ is missing (or both are 0 and f is an asymmetric binary attribute), and $\delta_{ij}^{(f)} = 1$ otherwise.
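A small sketch of the ordinal scaling and the mixed-type formula. It assumes each per-attribute dissimilarity $d_{ij}^{(f)}$ has already been computed and scaled to $[0, 1]$; the attribute values and the ordinal scale are hypothetical.

```python
def ordinal_to_unit(rank, m_f):
    """Map rank r_if in 1..M_f onto [0, 1]: z_if = (r_if - 1) / (M_f - 1)."""
    return (rank - 1) / (m_f - 1)


def mixed_distance(terms):
    """terms: one (delta_ij_f, d_ij_f) pair per attribute f.

    delta is 0 if the attribute is skipped for this pair (e.g. a missing value), else 1;
    d is that attribute's dissimilarity, already scaled to [0, 1].
    """
    num = sum(delta * d for delta, d in terms)
    den = sum(delta for delta, _ in terms)
    return num / den


# Hypothetical ordinal scale: fair < good < excellent, so M_f = 3
z_i = ordinal_to_unit(3, 3)  # "excellent" -> 1.0
z_j = ordinal_to_unit(1, 3)  # "fair"      -> 0.0
terms = [
    (1, 0.0),             # nominal attribute, same value -> 0
    (1, abs(z_i - z_j)),  # ordinal attribute, treated as numeric after scaling
    (1, 0.5),             # numeric attribute, |x_if - x_jf| / range (made up)
    (0, 0.0),             # attribute missing for one object: delta = 0
]
print(mixed_distance(terms))  # (0 + 1 + 0.5) / 3 = 0.5
```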

Cosine similarity $\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \cdot \|d_2\|}$ is used to evaluate the similarity of sentences (or documents) represented as term-frequency vectors.
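A minimal sketch of cosine similarity on term-frequency vectors; the vocabulary and counts are hypothetical.

```python
import math


def cosine_similarity(d1, d2):
    """cos(d1, d2) = d1 · d2 / (||d1|| * ||d2||) for term-frequency vectors."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2)


# Hypothetical term counts over the vocabulary ["data", "mining", "tree"]
s1 = [1, 1, 0]
s2 = [1, 0, 1]
print(cosine_similarity(s1, s2))  # ≈ 0.5
```

Because only the angle matters, a sentence and the same sentence repeated twice (a scaled vector) get similarity 1.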

Chapter Three

Data Processing

Data cleaning, data integration, data reduction, data transformation, and data discretization.

$\chi^2$ (chi-square) test

$\chi^2 = \sum \frac{(Observed - Expected)^2}{Expected}$
The larger the $\chi^2$ value, the more likely the variables are related.
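A minimal sketch of the $\chi^2$ statistic for an r x c contingency table, where each expected count is (row sum) x (column sum) / n under the hypothesis that the two variables are independent. The counts are made up. To decide whether the variables are related, the statistic is compared with the critical value of the $\chi^2$ distribution with $(r-1)(c-1)$ degrees of freedom (3.841 for 1 degree of freedom at the 0.05 significance level).

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table (list of rows)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total  # expected count under independence
            chi2 += (o - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Made-up 2 x 2 counts: rows = attribute A (yes/no), columns = attribute B (yes/no)
table = [[250, 200],
         [50, 1000]]
chi2, dof = chi_square(table)
print(round(chi2, 2), dof)  # 507.94 1
```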

Correlation coefficient (Pearson's product moment coefficient)

$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{(n - 1) \sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} (a_i b_i) - n \bar{A} \bar{B}}{(n - 1) \sigma_A \sigma_B}$, where $\sigma_A$ and $\sigma_B$ are the sample standard deviations of A and B.

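$r_{A,B} > 0$ means A and B are positively correlated; $r_{A,B} = 0$ means they are uncorrelated (no linear relationship); $r_{A,B} < 0$ means they are negatively correlated.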

Let ${a_k}' = (a_k - mean(A)) / std(A)$ and ${b_k}' = (b_k - mean(B)) / std(B)$;
then $correlation(A, B) = \frac{A' \cdot B'}{n - 1}$ (with $std$ the sample standard deviation, as above).


$Cov(A, B) = E((A - \bar{A})(B - \bar{B})) = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n}$
$r_{A,B} = \frac{Cov(A, B)}{\sigma_A \sigma_B}$ (here $\sigma_A$, $\sigma_B$ are population standard deviations, matching the $1/n$ in $Cov$)
$Cov(A, B) = E((A - \bar{A})(B - \bar{B})) = E(A \cdot B) - \bar{A} \bar{B}$
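A small sketch checking that the covariance and correlation formulas above agree: the definition of $Cov$ vs. the shortcut $E(A \cdot B) - \bar{A}\bar{B}$, and $Cov/(\sigma_A \sigma_B)$ with population deviations vs. the z-score form with sample deviations. The data and function names are made up.

```python
import math
from statistics import mean, stdev


def covariance(a, b):
    """Cov(A, B) = sum (a_i - mean A)(b_i - mean B) / n."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def covariance_shortcut(a, b):
    """Cov(A, B) = E(A·B) - mean(A) * mean(B)."""
    n = len(a)
    return sum(x * y for x, y in zip(a, b)) / n - (sum(a) / n) * (sum(b) / n)


def pearson(a, b):
    """r = Cov(A, B) / (sigma_A sigma_B), with population standard deviations."""
    return covariance(a, b) / (math.sqrt(covariance(a, a)) * math.sqrt(covariance(b, b)))


def pearson_zscores(a, b):
    """Standardize with the sample std (n - 1), then r = A' · B' / (n - 1)."""
    n = len(a)
    ma, sa = mean(a), stdev(a)
    mb, sb = mean(b), stdev(b)
    za = [(x - ma) / sa for x in a]
    zb = [(y - mb) / sb for y in b]
    return sum(x * y for x, y in zip(za, zb)) / (n - 1)


A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]
print(pearson(A, B), pearson_zscores(A, B))  # the two forms agree (≈ 0.94)
```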

Data reduction


Unsupervised:

  1. Latent Semantic Indexing (LSI): truncated SVD
  2. Principal Component Analysis (PCA)
  3. Independent Component Analysis (ICA)
  4. Canonical Correlation Analysis (CCA)

Supervised:

  1. Linear Discriminant Analysis (LDA)

Semi-supervised:

  1. Semi-supervised Discriminant Analysis (SDA)

Linear:

  1. Latent Semantic Indexing (LSI): truncated SVD
  2. Principal Component Analysis (PCA)
  3. Linear Discriminant Analysis (LDA)
  4. Canonical Correlation Analysis (CCA)

Nonlinear:

  1. Nonlinear feature reduction using kernels
  2. Manifold learning

Dimensionality reduction (Feature reduction):

  1. Feature extraction
  2. Feature selection

Selection: choose the best subset of size d from the available p features.
Extraction: given p features (set X), extract d new features (set Z) by a linear or nonlinear combination of all p features.


PCA (Principal Component Analysis): given $\{x_1, \dots, x_n\} \subset \mathbb{R}^p$, the target is to find the $a$ that maximizes $var(z)$, where $z = a^T x$.

$$
\begin{aligned}
var(z) &= E((z - \bar{z})^2) \\
&= \frac{1}{n} \sum_{i=1}^{n} (a^T x_i - a^T \bar{x})^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} a^T (x_i - \bar{x})(x_i - \bar{x})^T a \\
&= a^T S a \\
S &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T
\end{aligned}
$$

which means $\max_a a^T S a$, s.t. $a^T a = 1$.
We use the method of Lagrange multipliers to solve the problem.

$$
\begin{aligned}
L &= a^T S a - \lambda (a^T a - 1) \\
\frac{\partial L}{\partial a} &= 2 S a - 2 \lambda a = 0
\end{aligned}
$$

So $S a = \lambda a$: $\lambda$ and $a$ are an eigenvalue/eigenvector pair of S. Then $var(z) = a^T S a = a^T \lambda a = \lambda$, so the eigenvalues are chosen from largest to smallest (the first component uses the largest one).

Next, if we want another principal component: $\max_{a_2} a_2^T S a_2$, s.t. $a_2^T a_2 = 1$, $cov(z^{(2)}, z^{(1)}) = 0$.
Since $cov(z^{(2)}, z^{(1)}) = a_2^T S a_1 = \lambda_1 a_2^T a_1$, the constraint is equivalent to $a_2^T a_1 = 0$ (for $\lambda_1 > 0$). With the Lagrangian $L = a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi\, a_2^T a_1$, setting $\frac{\partial L}{\partial a_2} = 2 S a_2 - 2 \lambda a_2 - \phi a_1 = 0$ and multiplying on the left by $a_1^T$ gives $2 a_1^T S a_2 - \phi = 0$; because $a_1^T S a_2 = \lambda_1 a_1^T a_2 = 0$, we get $\phi = 0$. Hence $S a_2 = \lambda a_2$, and since $var(z^{(2)}) = \lambda$ and $a_2$ must be orthogonal to $a_1$, $\lambda$ is the second largest eigenvalue.
Dimension reduction: with $A = [a_1, \dots, a_d] \in \mathbb{R}^{p \times d}$, $X \in \mathbb{R}^{p \times n} \rightarrow A^T X \in \mathbb{R}^{d \times n}$
Original data (reconstruction): $A^T X \in \mathbb{R}^{d \times n} \rightarrow \bar{X} = A (A^T X) \in \mathbb{R}^{p \times n}$

Main theoretical result:
The matrix A consisting of the first d eigenvectors of the covariance matrix S solves the following optimization problem (for centered data X):
$\min_{A \in \mathbb{R}^{p \times d}} \|X - A A^T X\|_F^2$, s.t. $A^T A = I_d$
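A minimal pure-Python sketch of the PCA steps above: the covariance matrix S (with the $1/n$ factor), the leading eigenvectors, the projection $z = a^T(x - \bar{x})$ and the reconstruction. To stay dependency-free it finds eigenpairs by power iteration with deflation; that is my own substitution for a library eigensolver such as `numpy.linalg.eigh`. The helper names and the 2-D points are made up.

```python
import math
import random


def covariance_matrix(X):
    """Mean vector and S = (1/n) * sum (x_i - x_bar)(x_i - x_bar)^T for n samples of length p."""
    n, p = len(X), len(X[0])
    x_bar = [sum(x[k] for x in X) / n for k in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for x in X:
        d = [x[k] - x_bar[k] for k in range(p)]
        for r in range(p):
            for c in range(p):
                S[r][c] += d[r] * d[c] / n
    return x_bar, S


def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def leading_eigenpair(S, iters=500, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(S))]  # random start vector
    for _ in range(iters):
        w = matvec(S, v)
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    lam = sum(a * b for a, b in zip(v, matvec(S, v)))  # a^T S a, i.e. var(z)
    return lam, v


def pca(X, d):
    """Top-d components as (eigenvalue, unit eigenvector) pairs, largest eigenvalue first."""
    x_bar, S = covariance_matrix(X)
    comps = []
    for _ in range(d):
        lam, a = leading_eigenpair(S)
        comps.append((lam, a))
        # Deflation: remove lam * a a^T so the next power iteration finds the next eigenpair
        S = [[S[r][c] - lam * a[r] * a[c] for c in range(len(a))] for r in range(len(a))]
    return x_bar, comps


def project(X, x_bar, A):
    """z = A^T (x - x_bar) for each sample; A is a list of d unit vectors."""
    return [[sum(a[k] * (x[k] - x_bar[k]) for k in range(len(x))) for a in A] for x in X]


def reconstruct(Z, x_bar, A):
    """x_hat = x_bar + A z (the data were centered before projecting)."""
    return [[x_bar[k] + sum(z[j] * A[j][k] for j in range(len(A))) for k in range(len(x_bar))]
            for z in Z]


X = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
     [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]  # made-up 2-D points
x_bar, comps = pca(X, 2)
print([round(lam, 4) for lam, _ in comps])  # eigenvalues of S, largest first
```

With d = 1 the squared reconstruction error equals n times the discarded eigenvalue, which is the main theoretical result above in miniature.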

LDA (Linear Discriminant Analysis)

Find a transformation $a$ such that $a^T X_1$ and $a^T X_2$ are maximally separated and each class is minimally dispersed (maximum separation).

$\max\ (a^T(\bar{x}_1 - \bar{x}_2))^2$, $\min\ var(z_1)$, $\min\ var(z_2)$
Target: $\max\ J = \frac{(a^T(\bar{x}_1 - \bar{x}_2))^2}{var(z_1) + var(z_2)}$

Suppose there exist two classes $w_1, w_2$, and
$z = a^T x$
$\tilde{\mu}_i = \frac{1}{n_i} \sum_{z \in w_i} z$
$\mu_i = \frac{1}{n_i} \sum_{x \in w_i} x$, $\tilde{\mu}_i = a^T \mu_i$
$|\tilde{\mu}_1 - \tilde{\mu}_2| = |a^T(\mu_1 - \mu_2)|$
$\tilde{s}_i^2 = \sum_{z \in w_i} (z - \tilde{\mu}_i)^2$
$J(a) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

$\tilde{s}_i^2 = \sum_{z \in w_i} (z - \tilde{\mu}_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)(a^T x - a^T \mu_i)^T = \sum_{x \in w_i} a^T (x - \mu_i)(x - \mu_i)^T a = a^T S_i a$, where $S_i = \sum_{x \in w_i} (x - \mu_i)(x - \mu_i)^T$

Within-class scatter matrix: $S_W = S_1 + S_2$, so $\tilde{s}_1^2 + \tilde{s}_2^2 = a^T S_W a$

$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (a^T \mu_1 - a^T \mu_2)^2 = a^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T a = a^T S_B a$

Between-class scatter matrix: $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$

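$J(a) = \frac{a^T S_B a}{a^T S_W a}$
Setting the gradient of $J(a)$ to zero gives the generalized eigenvalue problem $S_B a = \lambda S_W a$, with $\lambda = J(a)$;
if $S_W$ is invertible, $S_W^{-1} S_B a = \lambda a$, so the best $a$ is the eigenvector of $S_W^{-1} S_B$ with the largest eigenvalue.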
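A minimal two-class, 2-D sketch of Fisher's LDA in pure Python. It relies on the fact that $S_B a = (\mu_1 - \mu_2)\left((\mu_1 - \mu_2)^T a\right)$ always points along $\mu_1 - \mu_2$, so the eigenvector of $S_W^{-1} S_B$ is $a \propto S_W^{-1}(\mu_1 - \mu_2)$, with $\lambda = (\mu_1 - \mu_2)^T S_W^{-1} (\mu_1 - \mu_2) = J(a)$. Restricting to 2-D (an explicit 2 x 2 inverse) is a simplification of mine; the data and helper names are hypothetical.

```python
def mean_vec(X):
    n = len(X)
    return [sum(x[k] for x in X) / n for k in range(2)]


def scatter(X, mu):
    """S_i = sum over x in the class of (x - mu)(x - mu)^T (2-D only)."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in X:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                S[r][c] += d[r] * d[c]
    return S


def inv2(M):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def fisher_lda(X1, X2):
    """Two-class Fisher direction a = S_W^{-1} (mu_1 - mu_2); also returns S_W and mu_1 - mu_2."""
    mu1, mu2 = mean_vec(X1), mean_vec(X2)
    S1, S2 = scatter(X1, mu1), scatter(X2, mu2)
    SW = [[S1[r][c] + S2[r][c] for c in range(2)] for r in range(2)]
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    W = inv2(SW)
    a = [W[0][0] * diff[0] + W[0][1] * diff[1], W[1][0] * diff[0] + W[1][1] * diff[1]]
    return a, SW, diff


def fisher_criterion(a, SW, diff):
    """J(a) = a^T S_B a / a^T S_W a, with S_B = diff diff^T."""
    between = (a[0] * diff[0] + a[1] * diff[1]) ** 2
    within = sum(a[r] * SW[r][c] * a[c] for r in range(2) for c in range(2))
    return between / within


X1 = [[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]]    # made-up class w_1
X2 = [[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]]  # made-up class w_2
a, SW, diff = fisher_lda(X1, X2)
print(a, fisher_criterion(a, SW, diff))
```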

Chapter Four

Frequent pattern (FP) mining

itemset: A set of one or more items
k-itemset: $X = \{x_1, \dots, x_k\}$
(absolute) support, or support count, of $X$: the frequency or number of occurrences of an itemset $X$;
(relative) support, s: the fraction of transactions that contain $X$ (i.e., the probability that a transaction contains $X$).
An itemset $X$ is frequent if $X$'s support is no less than a minsup threshold.

Find all the rules $X \rightarrow Y$ with minimum support and confidence:
support, s: the probability that a transaction contains $X \cup Y$;
confidence, c: the conditional probability that a transaction having $X$ also contains $Y$, i.e. $c = \frac{support(X \cup Y)}{support(X)}$.
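A minimal sketch of relative support and confidence for a rule $X \rightarrow Y$; the transaction database is hypothetical.

```python
def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(x, y, transactions):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical transaction database
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(support({"B", "E"}, db))       # 0.75
print(confidence({"E"}, {"C"}, db))  # support({C, E}) / support({E}) = 0.5 / 0.75 ≈ 0.667
```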

Closed patterns and max-patterns:
An itemset $X$ is closed if $X$ is frequent and there exists no super-pattern $Y \supset X$ with the same support as $X$;
An itemset $X$ is a max-pattern if $X$ is frequent and there exists no frequent super-pattern $Y \supset X$.

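So every max-pattern is also a closed pattern: a super-pattern with the same support as $X$ would itself be frequent, and a max-pattern has no frequent super-pattern.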
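A small sketch that extracts closed patterns and max-patterns from a complete table of frequent itemsets. Checking only frequent supersets is enough for closedness, by the same argument as above. The frequent itemsets are those of a small hypothetical database with a minimum support count of 2.

```python
def closed_and_max(frequent):
    """frequent: {frozenset: support count} holding ALL frequent itemsets."""
    closed, maximal = set(), set()
    for x, sup in frequent.items():
        supersets = [y for y in frequent if x < y]  # proper frequent supersets
        if all(frequent[y] != sup for y in supersets):
            closed.add(x)
        if not supersets:
            maximal.add(x)
    return closed, maximal


freq = {frozenset(k): v for k, v in [
    ("A", 2), ("B", 3), ("C", 3), ("E", 3),
    ("AC", 2), ("BC", 2), ("BE", 3), ("CE", 2), ("BCE", 2)]}
closed, maximal = closed_and_max(freq)
print(sorted("".join(sorted(s)) for s in closed))   # ['AC', 'BCE', 'BE', 'C']
print(sorted("".join(sorted(s)) for s in maximal))  # ['AC', 'BCE']
```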


An important property: **Any subset of a frequent itemset must be frequent**.
Apriori pruning principle: if an itemset is infrequent, its supersets should not be generated or tested!


  1. Initially, scan DB once to get frequent 1-itemset;
  2. Generate length (k+1) candidate itemsets from length k frequent itemsets;
  3. Test the candidates against DB;
  4. Terminate when no frequent or candidate set can be generated;


Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
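The pseudocode above can be sketched in Python as follows. Assumptions: transactions are sets of hashable items, `min_count` is an absolute support count, and $C_{k+1}$ is generated by taking unions of two frequent k-itemsets that have size k + 1 and pruning them with the Apriori property (after pruning this gives the same candidate set as the usual prefix-based self-join). The database is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """All frequent itemsets as {frozenset: support count}; min_count is an absolute support."""
    db = [frozenset(t) for t in transactions]

    # L1: one scan of the database to count single items
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 1
    while level:
        # Generate C_{k+1}: unions of two frequent k-itemsets that have size k + 1 ...
        prev = list(level)
        candidates = set()
        for x in range(len(prev)):
            for y in range(x + 1, len(prev)):
                union = prev[x] | prev[y]
                # ... pruned by the Apriori property: every k-subset must be frequent
                if len(union) == k + 1 and all(
                    frozenset(sub) in level for sub in combinations(union, k)
                ):
                    candidates.add(union)

        # Test the candidates against the database (one scan per level)
        counts = dict.fromkeys(candidates, 0)
        for t in db:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]  # hypothetical
for itemset, count in sorted(apriori(db, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each level costs one full database scan, which is exactly the first computational challenge listed below.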

Major computational challenges:

  1. Multiple scans of transaction database
  2. Huge number of candidates
  3. Tedious workload of support counting for candidates

Improving Apriori: general ideas:

  1. Reduce passes of transaction database scans
  2. Shrink number of candidates
  3. Facilitate support counting of candidates


Here we link a website; it (step 3, page 28) says: “Recursively mine conditional FP-trees and grow frequent patterns obtained so far. If the conditional FP-tree contains a single path, simply enumerate all the patterns.”

Mining sequential patterns

sequential patterns:


Chapter Five

Decision Tree
Bayes Classification Methods
Support Vector Machines

Decision Tree

It is derived from a probabilistic perspective: we can calculate the probability of every output given the input. If we assume every condition is independent, then $P(X|C) = \prod_i P(X_i|C)$, and so $\log P(X|C) = \sum_i \log P(X_i|C)$; this is why the cost function is based on $\log$. To understand this better, we can borrow a concept from thermodynamics: entropy.

$H(Y) = -\sum_{i=1}^{m} p_i \log(p_i)$, where $p_i = P(Y = y_i)$
$H(Y|X) = \sum_x p(x) H(Y|X = x)$

$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
$Gain(A) = Info(D) - Info_A(D)$
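A small Python sketch of $Info(D)$, $Info_A(D)$ and $Gain(A)$ for a categorical attribute; the 14-tuple toy data set (one attribute, `age`) is made up for illustration.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D) = -sum p_i log2 p_i over the class distribution of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_attr(rows, attr, labels):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j), splitting D on the values of attr."""
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    n = len(labels)
    return sum(len(p) / n * info(p) for p in parts.values())


def gain(rows, attr, labels):
    return info(labels) - info_attr(rows, attr, labels)


rows = [{"age": "youth"}] * 5 + [{"age": "middle_aged"}] * 4 + [{"age": "senior"}] * 5
labels = (["yes"] * 2 + ["no"] * 3     # youth: 2 yes, 3 no
          + ["yes"] * 4                # middle_aged: 4 yes
          + ["yes"] * 3 + ["no"] * 2)  # senior: 3 yes, 2 no
print(round(info(labels), 3), round(gain(rows, "age", labels), 3))  # 0.94 0.247
```

A decision tree built this way (ID3-style) would split on the attribute with the largest gain.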

Bayes Classification Methods

First, we know that $P(B) = \sum_{i=1}^{M} P(B|A_i) P(A_i)$ (total probability) and $P(H|X) = \frac{P(X|H) P(H)}{P(X)}$ (Bayes' theorem).
Assuming all attributes are conditionally independent given the class, $P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)$.

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero.
Use Laplacian correction:

  1. Adding 1 to each case
  2. The “corrected” prob. estimates are close to their “uncorrected” counterparts
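A minimal naïve Bayes sketch for categorical attributes with the Laplacian correction: adding 1 to each of the k possible value counts of an attribute turns $\frac{count}{n_c}$ into $\frac{count + 1}{n_c + k}$ (e.g. counts 0, 990, 10 out of 1000 become 1/1003, 991/1003, 11/1003). It sums log-probabilities to avoid underflow. Taking k as the number of distinct values seen in training is an assumption; the data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def laplace(count, class_count, num_values):
    """Laplacian correction: add 1 to the count of each of the num_values possible values."""
    return (count + 1) / (class_count + num_values)


def train(rows, labels):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    values = defaultdict(set)            # attribute -> distinct values seen in training
    for row, y in zip(rows, labels):
        for attr, v in row.items():
            value_counts[(y, attr)][v] += 1
            values[attr].add(v)
    return class_counts, value_counts, values


def predict(model, row):
    """argmax_i  log P(C_i) + sum_k log P(x_k | C_i), with Laplace-corrected P(x_k | C_i)."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = log(nc / n)
        for attr, v in row.items():
            score += log(laplace(value_counts[(c, attr)][v], nc, len(values[attr])))
        if score > best_score:
            best, best_score = c, score
    return best


rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}]  # toy data
labels = ["no", "no", "yes"]
model = train(rows, labels)
print(predict(model, {"outlook": "rain"}))  # yes
```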

Support Vector Machines


Model Evaluation and Selection

Confusion Matrix:

| Actual class \ Predicted class | $C_1$ | $\neg C_1$ |
|---|---|---|
| $C_1$ | True Positive (TP) | False Negative (FN) |
| $\neg C_1$ | False Positive (FP) | True Negative (TN) |

Accuracy: $\frac{TP + TN}{ALL}$
Error rate: $\frac{FP + FN}{ALL}$
Sensitivity: $\frac{TP}{P}$, where $P = TP + FN$
Specificity: $\frac{TN}{N}$, where $N = FP + TN$

Precision: $\frac{TP}{TP + FP}$
Recall: $\frac{TP}{TP + FN}$
F measure: $\frac{2 \times Precision \times Recall}{Precision + Recall}$
F-beta measure: $\frac{(1 + \beta^2) \times Precision \times Recall}{\beta^2 \times Precision + Recall}$
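A small sketch computing the measures above from the four cells of the confusion matrix; the counts are made up and deliberately imbalanced, to show how accuracy can look high while recall stays low.

```python
def evaluate(tp, fn, fp, tn, beta=1.0):
    """Metrics from a 2 x 2 confusion matrix (C_1 is the positive class)."""
    p, n = tp + fn, fp + tn  # actual positives / negatives
    total = p + n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # same as sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / p,
        "specificity": tn / n,
        "precision": precision,
        "recall": recall,
        "f_beta": (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall),
    }


m = evaluate(tp=90, fn=210, fp=140, tn=9560)  # made-up, heavily imbalanced counts
print(m["accuracy"], m["recall"], round(m["f_beta"], 4))  # 0.965 0.3 0.3396
```

With beta = 1 the F-beta measure reduces to the F measure; beta > 1 weights recall more heavily.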

Holdout method



Estimating Confidence Intervals


ROC curves

Chapter Six



Choose the point closest to the K-means center.


Added: 2021-11-02 23:13:39  Updated: 2021-11-02 23:14:05