You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

57 lines
1.6 KiB

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

# 1. 异常检测——概述
### 问题描述
Problem Formulation
- 有一批训练数据如:{x1,x2,...,xn}
- 我们想从这批输入数据中分出类似的,或者不类似的
![1619436116620](assets/1619436116620.png)
> 类似上图找出数据中anomaly的数据这个anomaly并不表示它是有问题只是说它跟大多数数据不一样。有可能是特别好的有可能是特别坏的。
### 什么是异常
What is Anomaly?
什么是异常取决于大部分是什么
![1619436606950](assets/1619436606950.png)
> 你给它看很多雷丘,那么皮卡丘就是异常
>
> 你给它看很多皮卡丘,那么雷丘就是异常
>
> 你给它看很多神奇宝贝,那么数码宝贝就是异常
### 异常检测的应用
Applications
- Fraud Detection诈欺检测
- Training data正常刷卡行为x盗刷
- Ref: https://www.kaggle.com/ntnu-testimon/paysim1/home
- Ref: https://www.kaggle.com/mlg-ulb/credicardraud/home
- Network Intrusion Detection入侵检测
- Training data:正常连接x攻击行为
- Ref:http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
- Cancer Detection细胞检测
- Training data正常细胞x癌细胞
- Ref:http://kdd.ics.uci.edu/uciml/breast-cancer-wisconsin-data/home
### 如何分类
Binary Classification?
- Given normal data ![1619437444012](assets/1619437444012.png)
- Given anomaly ![1619437459729](assets/1619437459729.png)
- Then training a binary classifier ......
如上给它正常数据和异常数据然后自动分成Class1和Class2