Classification
- Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class:
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
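The train/test workflow above can be sketched in plain Python. This is a minimal illustration, not a method taken from these notes: the 1-nearest-neighbour classifier, the helper names (`train_test_split`, `fit_1nn`, `accuracy`), the 30% test fraction, and the toy records are all assumptions chosen for brevity.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_1nn(training_set):
    """Build a 1-nearest-neighbour model (hypothetical choice of classifier).

    The model memorises the training set and predicts the class of the
    closest training record.
    """
    def predict(attributes):
        _, nearest_class = min(
            training_set, key=lambda record: dist(record[0], attributes)
        )
        return nearest_class
    return predict


def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for attrs, label in test_set if model(attrs) == label)
    return correct / len(test_set)


# Toy records: (attribute vector, class attribute); values are made up.
records = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((0.9, 1.1), "A"), ((1.2, 1.0), "A"),
    ((5.0, 5.1), "B"), ((5.2, 4.9), "B"), ((4.8, 5.0), "B"), ((5.1, 5.2), "B"), ((4.9, 4.8), "B"),
]

train, test = train_test_split(records)
model = fit_1nn(train)           # build the model on the training set only
print(len(train), len(test))     # 7 3
print(accuracy(model, test))     # accuracy on records the model never saw
```

The key design point is that the test records are held out before the model is built, so the reported accuracy estimates performance on previously unseen records.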
Clustering
- Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
- Similarity measures:
  - Euclidean distance, if the attributes are continuous.
  - Other problem-specific measures, e.g., cosine similarity, Hamming distance, Gaussian distance.
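The similarity measures listed above each fit in a few lines. This sketch is illustrative only: the function names are hypothetical, and because the notes do not define "Gaussian distance", it is interpreted here (an assumption) as the Gaussian (RBF) kernel similarity exp(-d^2 / (2 sigma^2)) built on Euclidean distance, with an arbitrary default sigma = 1.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two continuous attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(x, y))


def gaussian_similarity(x, y, sigma=1.0):
    """Assumed reading of 'Gaussian distance': RBF kernel, 1.0 for identical points."""
    d = euclidean_distance(x, y)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


print(euclidean_distance((0, 0), (3, 4)))     # 5.0
print(cosine_similarity((1, 0), (0, 1)))      # 0.0 (orthogonal vectors)
print(hamming_distance("10110", "11100"))     # 2
print(gaussian_similarity((1, 2), (1, 2)))    # 1.0
```

Note the direction of each measure: Euclidean and Hamming are distances (smaller means more similar), while cosine and the Gaussian kernel are similarities (larger means more similar).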
Association Rule Discovery
- Given a set of records, each of which contains some number of items from a given collection:
- Produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
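One common way to make "dependency rules" concrete is to score a rule X -> Y by its support (the fraction of records containing both X and Y) and its confidence (the fraction of records containing X that also contain Y). The notes do not name these measures, so treat them, the thresholds, the restriction to one-item rules, and the market-basket records below as assumptions; the brute-force loop is a sketch, not Apriori or any other specific mining algorithm.

```python
from itertools import permutations


def support_count(itemset, transactions):
    """Number of records that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def single_item_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Enumerate rules {x} -> {y} that meet both thresholds.

    support    = count({x, y}) / number of records
    confidence = count({x, y}) / count({x})
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        both = support_count({x, y}, transactions)
        if both / n < min_support:
            continue  # too rare to be interesting
        confidence = both / support_count({x}, transactions)
        if confidence >= min_confidence:
            rules.append((x, y, both / n, confidence))
    return rules


# Hypothetical market-basket records, one set of items per record.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

for x, y, s, c in single_item_rules(transactions):
    print(f"{{{x}}} -> {{{y}}}  support={s:.2f}  confidence={c:.2f}")
```

For example, {diaper} -> {beer} holds in 3 of 5 records (support 0.6), and 3 of the 4 records containing diaper also contain beer (confidence 0.75).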
Sequential Pattern Discovery
- Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
- Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
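As a rough illustration of a timing constraint, the sketch below checks whether a candidate pattern (an ordered list of events) occurs in an object's timeline with at most `max_gap` time units between consecutive matched events, and computes the pattern's support as the fraction of objects whose timeline contains it. The `max_gap` constraint, the function names, and the toy timelines are assumptions; real sequential-pattern miners (for example GSP) also generate and prune candidate patterns, which is omitted here.

```python
def contains_pattern(timeline, pattern, max_gap):
    """True if the events of `pattern` occur in order in `timeline`, each
    strictly later than the previous matched event and at most `max_gap`
    time units after it.

    `timeline` is a list of (time, event) pairs sorted by time.
    """
    def match(start, k, prev_time):
        if k == len(pattern):
            return True  # every event in the pattern has been matched
        for i in range(start, len(timeline)):
            time, event = timeline[i]
            if prev_time is not None and time - prev_time > max_gap:
                break  # timeline is sorted, so later events are even further away
            if event == pattern[k] and (prev_time is None or time > prev_time):
                if match(i + 1, k + 1, time):  # backtrack if this choice fails later
                    return True
        return False

    return match(0, 0, None)


def pattern_support(timelines, pattern, max_gap):
    """Fraction of objects whose timeline contains the pattern."""
    hits = sum(contains_pattern(t, pattern, max_gap) for t in timelines.values())
    return hits / len(timelines)


# Hypothetical objects (e.g. customers), each with its own timeline of events.
timelines = {
    "c1": [(1, "A"), (2, "B"), (5, "C")],
    "c2": [(1, "A"), (7, "B")],
    "c3": [(2, "B"), (3, "A"), (4, "B")],
}

print(pattern_support(timelines, ("A", "B"), max_gap=3))
```

Here "c2" contains A followed by B, but the 6-unit gap violates the constraint, so only two of the three objects support the pattern.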
Regression
- Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency (e.g., the linear model Y = aX + b).
- Extensively studied in the statistics and neural network fields.
- Examples:
  - Predicting sales amounts of a new product based on advertising expenditure.
  - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  - Time series prediction of stock market indices.
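For the linear model Y = aX + b, the coefficients are commonly estimated by ordinary least squares: a = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and b = y_mean - a * x_mean. The notes do not say how the model is fitted, so least squares is an assumption here, and the advertising/sales figures are invented for illustration.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares estimates (a, b) for the model y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


# Hypothetical data: advertising expenditure (x) vs. sales amount (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.1]

a, b = fit_linear(ad_spend, sales)
print(f"sales ~= {a:.2f} * ad_spend + {b:.2f}")
print(f"predicted sales at ad_spend = 6: {a * 6 + b:.2f}")
```

A nonlinear dependency (e.g., a neural network) would replace this closed-form fit with iterative optimization, but the idea of predicting Y from the other variables is the same.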
Deviation/Anomaly Detection
- Detect significant deviations from normal behavior.
- Applications:
  - Credit card fraud detection
  - Network intrusion detection
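A very simple deviation detector learns what "normal behavior" looks like from historical values and flags new values that lie more than a chosen number of standard deviations from the historical mean (a z-score rule). This is only a sketch under stated assumptions: the z-score method, the threshold of 3, the helper names, and the transaction amounts are illustrative choices, not a recommended fraud-detection model.

```python
import statistics


def fit_baseline(history):
    """Summarise normal behaviour by the mean and sample standard deviation.

    Needs at least two history values that are not all identical.
    """
    return statistics.mean(history), statistics.stdev(history)


def detect_anomalies(values, baseline, threshold=3.0):
    """Return the values whose z-score against the baseline exceeds `threshold`."""
    mean, std = baseline
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical card-transaction amounts considered normal for one customer.
history = [20.0, 22.5, 19.0, 21.0, 23.5, 18.5, 20.5, 22.0]
baseline = fit_baseline(history)

new_transactions = [21.0, 250.0, 19.5, 24.0]
print(detect_anomalies(new_transactions, baseline))   # [250.0]
```

Fitting the baseline on past data rather than on the batch being checked keeps a single extreme value from inflating the standard deviation and hiding itself.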
Deep Learning
Graph Learning