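diff --git a/assets/1606221473495.png b/assets/1606221473495.png
new file mode 100644
index 0000000..ceead42
Binary files /dev/null and b/assets/1606221473495.png differ
diff --git a/assets/1606221707427.png b/assets/1606221707427.png
new file mode 100644
index 0000000..8dadc44
Binary files /dev/null and b/assets/1606221707427.png differ
diff --git a/必备数学基础.md b/必备数学基础.md
index e44c342..4956723 100644
--- a/必备数学基础.md
+++ b/必备数学基础.md
@@ -1235,4 +1235,26 @@ notebook updated, markdown to be updated
 
 ![1606031463635](assets/1606031463635.png)
 
-To be updated; the notebook is already updated
\ No newline at end of file
+To be updated; the notebook is already updated
+
+
+
+### KMEANS Algorithm
+
+#### KMEANS Algorithm Overview
+
+Clustering concepts:
+
+- Unsupervised problem: no labels are available
+- Clustering: group similar items together
+- Difficulties: how to evaluate the result and how to tune parameters
+
+![1606221473495](assets/1606221473495.png)
+
+Basic concepts:
+
+- Number of clusters: the value K must be specified up front, i.e. how many groups to form
+- Centroid: the mean, i.e. the average of the vectors along each dimension, the most central position
+- Distance metric: Euclidean distance and cosine similarity are the common choices (standardize the data first)
+- Optimization objective: ![1606221707427](assets/1606221707427.png), make the distance from each sample to its center point (centroid) as small as possible, i.e. minimize the sum of distances from every point to its center; the smaller the sum, the more similar the points within a cluster
+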
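
The basic concepts introduced in this hunk (centroid, distance metric, optimization objective) can be illustrated with a small sketch. This is not code from the notebook. It assumes the objective shown in `assets/1606221707427.png` is the usual K-means one, the sum of squared Euclidean distances from each point to its cluster centroid, and every function name below is hypothetical.

```python
import math

def centroid(points):
    """Centroid of a cluster: the mean of the vectors along each dimension."""
    n = len(points)
    return [sum(dim) / n for dim in zip(*points)]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def objective(clusters):
    """Assumed objective: sum of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(euclidean(p, c) ** 2 for p in points)
    return total

# Four points forming two obvious groups, with K = 2.
tight = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
# The same four points, but grouped across the gap.
mixed = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]

print(objective(tight))  # 4.0
print(objective(mixed))  # 100.0
```

The grouping that keeps nearby points together gives the much smaller objective value, which matches the "smaller means more similar" reading in the notes. The sketch only scores a given assignment; it does not search for the best one.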