Applied Machine Learning

Introduction

Terms

A dataset intended for analysis by a machine learning method is expected to provide features (X) and a target value/label (y).

Training and test sets are needed for a given dataset.
Model fitting produces a trained model from the training set; the trained model is then evaluated on the test set.
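
A minimal sketch of this split, assuming scikit-learn's train_test_split and a made-up dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up dataset: 100 instances, 3 features (X) and a binary label (y)
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# Hold out 25% of the data as the test set; fit on the training split,
# then evaluate the fitted model on the test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```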

There are two types of problems in machine learning: classification and regression. Both take a set of training instances and learn a mapping that produces a target value. Classification returns discrete values, while regression returns continuous values.

Overfitting and Underfitting

Generalization ability refers to an algorithm’s ability to give accurate predictions for new, previously unseen data.

Overfit models are too complex and are not likely to generalize well to new examples.
Underfit models are too simple and do not even do well on the training data.

Supervised Learning

k-Nearest Neighbor

  1. Find the most similar (closest) instances in X_train to the query point x_test.
  2. Get the corresponding labels (y_train) of those nearest instances.
  3. Predict the label for x_test by combining the labels acquired in step 2 (e.g. by majority vote).

Beyond classification, kNN can be used for regression: simply combine (e.g. average) the target values of the closest training instances.
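
A minimal sketch of the classification steps above, assuming scikit-learn's KNeighborsClassifier and the iris dataset as an illustrative example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: for each test point, find the k closest X_train instances,
# collect their y_train labels, and predict by majority vote
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on the test set

# For regression, KNeighborsRegressor averages the neighbours' target values instead
```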

Linear Model

A linear model is a sum of weighted input variables that predicts a target output value for a given input data instance
(e.g. predicting housing prices: a*x + b*y + c*z = target_value).

Input feature vector: x = (x0, x1, x2, …)
Parameters to estimate:
w = (w0, w1, w2, …), the slope (feature weights)
b, the constant bias term (intercept)
Prediction: y = w0*x0 + w1*x1 + … + b
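
As a small sketch, the prediction for a single instance is just a dot product plus the bias (the numbers below are made up):

```python
import numpy as np

x = np.array([3.0, 1.5, 2.0])   # input feature vector (x0, x1, x2)
w = np.array([0.4, -1.2, 0.7])  # estimated weights (slopes)
b = 0.5                         # estimated constant bias

y_hat = np.dot(w, x) + b        # predicted target value
print(y_hat)
```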

Least Squares

Finds the w and b that minimize the mean squared error of the model: the sum of squared differences between predicted and actual target values.

No parameters to control model complexity.

Parameters (w,b) are estimated from training data. The learning algorithm would want to minimize a loss function.

The loss function in this case is the residual sum of squares (RSS) over the training data: the sum of squared differences between predicted and actual target values.
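
A minimal least-squares fit, assuming scikit-learn's LinearRegression and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data: y is roughly 3*x0 - 2*x1 + 1 plus noise
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + 0.1 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ordinary least squares: choose w, b that minimize the RSS on the training data
lr = LinearRegression().fit(X_train, y_train)
print(lr.coef_, lr.intercept_)   # estimated w and b
print(lr.score(X_test, y_test))  # R^2 on the test set
```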

Ridge Regression

L2 Penalty: In addition to the least-squares criterion, add a regularization term on w to prevent overfitting of the model: + alpha*sum(w^2). The sum of squared weight entries is minimized to reduce the complexity of the model. Higher alpha means more regularization and simpler models.

Regularization: prevents overfitting by restricting the model and reducing its complexity.

Normalization: put all features on the same scale so that their contributions to the regularization penalty are comparable; it can also lead to faster convergence in learning. MinMax scaling does this job: compute the minimum and maximum value of each feature and transform a given feature value xi to a scaled version in [0, 1].
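
A sketch combining MinMax scaling with ridge regression; the synthetic data and alpha = 1.0 are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X = rng.rand(200, 2) * [1.0, 100.0]                     # features on very different scales
y = 3 * X[:, 0] - 0.02 * X[:, 1] + 0.1 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit MinMax scaling on the training set only, then apply the same transform to the test set
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# L2 penalty: higher alpha -> more regularization -> smaller weights, simpler model
ridge = Ridge(alpha=1.0).fit(X_train_scaled, y_train)
print(ridge.coef_, ridge.score(X_test_scaled, y_test))
```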

Lasso Regression

L1 Penalty: Instead of the sum of squares of w, lasso regression penalizes the sum of the absolute values of the coefficients.

Lasso vs Ridge: Use Ridge when there are many small/medium-sized effects. Use Lasso when only a few variables have meaningful effects (Lasso drives the remaining coefficients to zero).
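
A small Lasso sketch; the synthetic data, in which only the first feature has a real effect, is an assumption chosen to show coefficients being driven to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = 4 * X[:, 0] + 0.05 * rng.randn(200)  # only feature 0 has a real effect

# L1 penalty: most coefficients are pushed exactly to zero
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # close to [4, 0, 0, 0, 0]
```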

Polynomial Features with Linear Regression

Generate new features consisting of all polynomial combinations of the original two features (x0, x1)
(x0, x1, x0^2, x0*x1, x1^2)
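
A sketch of this expansion, assuming scikit-learn's PolynomialFeatures feeding an ordinary linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0],
              [1.0, 4.0],
              [0.5, 2.0]])       # original two features (x0, x1)
y = np.array([10.0, 20.0, 5.0])  # made-up targets

# Expand each row to [1, x0, x1, x0^2, x0*x1, x1^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly.shape)  # (3, 6)

# The expanded features are then used with an ordinary linear regression
model = LinearRegression().fit(X_poly, y)
```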

Logistic Regression

Logistic regression also uses a linear function of the inputs.
The logistic (sigmoid) function compresses the linear output into the range (0, 1), which is then thresholded to produce a discrete class label.

y = 1 / (1 + exp[-(w·x + b)])
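
A minimal sketch, assuming scikit-learn's LogisticRegression on a made-up binary problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(C=1.0).fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))  # sigmoid outputs in (0, 1)
print(clf.predict(X_test[:3]))        # thresholded class labels
```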

Linear Classifiers: Linear Support Vector Machines

f(x,w,b) = sign(w*x+b)
w (weight)
x (input value)

Results could be positive or negative.

The weight vector w defines a line (decision boundary) that separates the two groups of data.
We want the maximum-margin classifier: the line with the maximum distance to the two groups of data (the linear Support Vector Machine).

The points used to construct the line (margin) are called support vectors.

C controls the degree of regularization. Larger C means less regularization (the classifier fits the training data more closely).

To predict multiple classes,
sklearn applies binary classification to each category one by one: each class is run against all other classes (one line separates one class from all others), the one-vs-rest scheme.
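
A sketch of a linear SVM on a multi-class dataset (iris is an illustrative choice); by default scikit-learn's LinearSVC trains one binary classifier per class:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger C -> less regularization -> the fit follows the training data more closely
clf = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
print(clf.coef_.shape)            # (3, 4): one weight vector per class (one-vs-rest)
print(clf.score(X_test, y_test))
```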

Kernelized Support Vector Machines
Use a kernel to transform the data for later classification, e.g. the radial basis function (RBF) kernel:
K(x, x') = exp[-gamma * ||x - x'||^2]
Larger gamma means points have to be very close to be considered similar.
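
A sketch of an RBF-kernel SVM; the gamma and C values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2);
# larger gamma -> tighter, more complex decision boundary
clf = SVC(kernel='rbf', gamma=0.5, C=1.0).fit(X_train, y_train)
print(clf.score(X_test, y_test))
```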

Cross Validation

Use several different train/test splits and take the average of the performance scores.

eg: 3-fold: split the data into three sets. Each set is used as the test set once, with the remaining sets serving as the training set, so there are three folds and three test scores in total. A stratified cross-validation strategy is normally used so that each fold mirrors the structure of the original data (distribution of target values).
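
A sketch of 3-fold cross-validation, assuming scikit-learn's cross_val_score (which uses stratified folds for classifiers by default):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Each of the 3 folds is used as the test set once; the other 2 folds form the training set
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=3)
print(scores, scores.mean())  # one score per fold and their average
```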

Decision Trees

Use a binary tree to regress or classify data, e.g. classifying flower type based on flower length and width.

Yes/No -> Yes/No -> Groups of data

Pure Nodes: contain a pure group of data (a single class)
Mixed Nodes: might need a further decision (split)

Need to limit the depth of the tree, the number of nodes, or the minimum number of samples required to split, to prevent overfitting.
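
A sketch of a depth-limited decision tree; max_depth = 3 is an arbitrary choice to curb overfitting:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting depth / number of nodes / min samples per split
# keeps the tree from memorizing the training data
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train), tree.score(X_test, y_test))
```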
