[Artificial Intelligence] 《Machine Learning Fundamentals》 Class Notes -- Chapter Nine Clustering
What are clustering algorithms used for?
We want to divide the data into several different clusters without using labels. For a given dataset, data in the same cluster share similar features. We can define this similarity in terms of distance. Our goal is therefore to find a partition such that the intra-cluster distance is small and the inter-cluster distance is large.
Figure1: Intra-cluster and inter-cluster distance
We need to choose an appropriate distance measure based on the samples. Here we use the Euclidean distance as the similarity measure:
$$\text{Euclidean distance: } \sqrt{\sum_{u=1}^n\|x_{iu}-x_{ju}\|^2}$$
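As a small illustration of the formula above, the distance between two samples can be computed as follows. This is a minimal sketch using only the standard library; the function name `euclidean_distance` and the tuple representation of a sample are assumptions made for the example.

```python
import math


def euclidean_distance(x_i, x_j):
    """Return the Euclidean distance between two samples of equal length."""
    if len(x_i) != len(x_j):
        raise ValueError("samples must have the same number of features")
    # Sum the squared per-feature differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```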
Figure2: Clustering
The problem we need to optimize
Divide the dataset $\{x_1,x_2,...,x_M\}$ into disjoint sets $\{S_1,S_2,...,S_K\}$. For each set $S_k$, the representative point is chosen as $\mu_k$. The loss function can be defined as:
$$E=\sum_{k=1}^{K}\sum_{x\in S_k}\|x-\mu_k\|^2$$
Therefore, our goal is to find an optimal partition that minimizes $E$. (Note: Finding the optimal solution of $E$ directly is an NP-hard problem.)
K-Means
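Assuming the procedure meant here is the standard K-means iteration (Lloyd's algorithm), each round alternates two steps. The notation reuses $S_k$, $\mu_k$ and $E$; the index $j$ is introduced only for this illustration, and ties in the assignment step can be broken arbitrarily.

$$
\begin{aligned}
\text{Assignment step:}\quad & S_k \leftarrow \{\, x_i : \|x_i-\mu_k\|^2 \le \|x_i-\mu_j\|^2 \ \text{for all } j \,\} \\
\text{Update step:}\quad & \mu_k \leftarrow \frac{1}{|S_k|}\sum_{x\in S_k} x
\end{aligned}
$$

With the $\mu_k$ held fixed, the assignment step gives every point the smallest squared distance available to it. With the $S_k$ held fixed, the mean is the point that minimizes $\sum_{x\in S_k}\|x-\mu_k\|^2$. Neither step can therefore increase the sum of squared distances $E$.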
When we perform the two K-means operations (assigning each point to its nearest representative point, then recomputing each representative point as the mean of its set), the value of $E$ is updated to $E'$ with $E'\le E$, and $E$ has a lower bound $E\ge 0$. The loss is therefore non-increasing and bounded below, so the sequence of loss values converges. This can also be proved using the EM (Expectation-Maximization) algorithm.
X-means
In the K-means algorithm, the value of $K$ is fixed. However, it is often difficult to choose the best $K$. In the X-means algorithm, users specify a range for $K$, and X-means first runs the ordinary K-means algorithm with the lower limit of the range. Based on the value of the BIC (Bayesian information criterion), X-means then decides whether to split each cluster into two.
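The split decision can be illustrated with a simplified BIC score. The sketch below assumes every cluster is a spherical Gaussian with one shared variance, which is a modelling choice made for this example and not necessarily the exact formula X-means uses. The names `bic` and `should_split` are hypothetical, and the two child clusters are assumed to come from running 2-means on the parent cluster.

```python
import math


def bic(clusters):
    """BIC-style score (higher is better) of a hard partition.

    Assumption: spherical Gaussian clusters sharing a single variance.
    Each cluster is a non-empty list of equal-length tuples.
    """
    k = len(clusters)
    dim = len(clusters[0][0])
    n = sum(len(c) for c in clusters)

    # Sum of squared distances from every point to its cluster mean.
    sse = 0.0
    for c in clusters:
        mean = [sum(p[d] for p in c) / len(c) for d in range(dim)]
        sse += sum((p[d] - mean[d]) ** 2 for p in c for d in range(dim))

    # Maximum-likelihood variance; the floor avoids log(0).
    sigma2 = max(sse / (n * dim), 1e-12)

    # Log-likelihood: mixing-weight term plus Gaussian density term.
    log_lik = sum(len(c) * math.log(len(c) / n) for c in clusters)
    log_lik -= 0.5 * n * dim * math.log(2 * math.pi * sigma2)
    log_lik -= sse / (2 * sigma2)

    # Free parameters: (k - 1) weights, k * dim mean coordinates, 1 variance.
    n_params = (k - 1) + k * dim + 1
    return log_lik - 0.5 * n_params * math.log(n)


def should_split(child_a, child_b):
    """Split the parent cluster if two children score higher than one."""
    parent = child_a + child_b
    return bic([child_a, child_b]) > bic([parent])
```

With two well-separated groups the two-cluster model scores higher and the split is kept. With evenly spread points, the penalty on the extra parameters outweighs the gain in likelihood and the parent cluster stays whole.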
Figure3: An illustration of X-means algorithm. Source: www.cs.cmu.edu
Hierarchical Clustering
Hierarchical clustering provides another way to help users choose an appropriate number of clusters. Take AGNES (AGglomerative NESting) as an example:
Figure4: An illustration of hierarchical clustering. Source: 🍉 Book
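The bottom-up merging that AGNES performs can be sketched as below. This is an illustrative version under stated assumptions: the function name `agnes`, the `linkage` parameter, and the choice of average linkage as the default are all assumptions, and the quadratic search for the closest pair is kept simple instead of efficient.

```python
import math


def agnes(points, linkage="average"):
    """Merge clusters bottom-up; return the list of merges performed.

    Each merge is recorded as (members_a, members_b, distance), where the
    members are indices into `points`.
    """
    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # closest pair
            return min(pair_dists)
        if linkage == "complete":    # farthest pair
            return max(pair_dists)
        if linkage == "average":     # mean over all pairs
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]  # start: one point per cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Remove b first (b > a) so that index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges


for left, right, d in agnes([(0.0,), (1.0,), (10.0,), (11.0,)]):
    print(left, right, d)
```

A large jump between consecutive merge distances suggests stopping before that merge. In the four-point example the last merge is far more expensive than the first two, which points to two clusters.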
Users can inspect the distances between clusters and choose a suitable $K$ themselves.
Code(K-means)
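One way K-means could be written with only the standard library is sketched here. It is not the original listing: the function name `kmeans`, the parameters `max_iter` and `seed`, the random-sample initialization, and the handling of empty clusters are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Run K-means on a list of tuples; return (centers, labels, loss_history)."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with k distinct random samples.
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    labels = None
    loss_history = []

    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels

        # Update step: move each center to the mean of its points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]

        # Loss E: sum of squared distances to the assigned centers.
        loss = sum(math.dist(p, centers[l]) ** 2
                   for p, l in zip(points, labels))
        loss_history.append(loss)

    return centers, labels, loss_history


data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels, history = kmeans(data, k=2)
print(labels)
print(history)
```

The recorded `loss_history` should never increase from one iteration to the next, which matches the $E'\le E$ argument given earlier.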
Figure5: The result of K-means