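Data Analysis

[Python Tutorial] Data Analysis: numpy, pandas, matplotlib (bilibili)

Data analysis helps people make judgments so that they can take appropriate action.

Environment setup

1. conda

https://www.anaconda.com/download/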

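2. jupyter notebook

A tool for programming, documentation, note-taking, and presentation

Press Win+R and open the cmd command line, then type "jupyter notebook" and press Enter to start jupyter notebook

Data analysis workflow

Ask a question → prepare the data → analyze the data → draw conclusions → visualize the results

jupyter and conda

matplotlib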

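I. numpy

Features of numpy:

1. Fast

2. Convenient

3. The foundational library for scientific computing

numpy is the foundational library for scientific computing in Python. It focuses on numerical computation and is used to perform numerical operations on large, multi-dimensional arrays.

An array corresponds to a matrix in mathematics; it can be thought of as nested lists.

1. Creating arrays (matrices) with numpy

The array type in numpy is ndarray
arange: np.arange(n) gives the same result as np.array(range(n)); a quick way to generate an array
dtype: the type of the data stored in the array
astype: converts the array to another data type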

import numpy as np
import random

# Create an array with numpy; the result is of type ndarray
t1 = np.array([1,2,3])
print(t1)
print(type(t1))

"""
Output:
[1 2 3]
<class 'numpy.ndarray'>
The numpy array type is numpy.ndarray
"""

t2 = np.array(range(10))
print(t2)
print(type(t2))
"""
Output:
[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>
"""

t3 = np.arange(10) # quick way to generate an array; same result as np.array(range(10))
print(t3)
print(type(t3))
"""
Output:
[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>
"""

# Values from 4 (inclusive) to 10 (exclusive) with a step of 2
t4 = np.arange(4,10,2)
print(t4)
"""
Output:
[4 6 8]
"""

# dtype: the type of the data stored in the array
print(t3.dtype)
print(t4.dtype)
"""
int32
int32
The default integer width depends on the platform and the numpy version, not only on whether the machine is 32-bit or 64-bit.
int32 (4 bytes) is what Windows prints with numpy versions before 2.0; most 64-bit Linux and macOS installs print int64.
"""

print("*"*100)
# Data types in numpy: the stored data type can be specified

t5 = np.array(range(1,4),dtype=float)# store the data as float (float64)
print(t5)
print(t5.dtype)
t5 = np.array(range(1,4),dtype="float32")
print(t5.dtype)
t5 = np.array(range(1,4),dtype="i1")# i1 is int8: 8 bits (1 byte)
print(t5)
print(t5.dtype)
"""
****************************************************************************************************
[1. 2. 3.]
float64
float32
[1 2 3]
int8
"""

# The bool type in numpy
t6 = np.array([1,1,0,1,0,0],dtype=bool)
print(t6)
print(t6.dtype)
"""
[ True  True False  True False False]
bool
"""

# Convert the data type with astype
t7 = t6.astype("int8")
print(t7)
print(type(t7))
print(t7.dtype)
"""
[1 1 0 1 0 0]
<class 'numpy.ndarray'>
int8
"""

# Decimals in numpy
t8 = np.array([random.random() for i in range(10)])
print(t8)
print(t8.dtype)
"""
Prints 10 random decimals in the range [0, 1); the values differ on every run
[0.68144346 0.678907   0.45509393 0.90033104 0.34856001 0.39504937
 0.03622501 0.43622648 0.61767704 0.57093758]
float64
"""

# Round the decimals
t8 = np.round(t8,2)# keep two decimal places
print(t8)
"""
Sample output from a separate run:
[0.67 0.92 0.52 0.8  0.12 0.51 0.96 0.07 0.63 0.81]
"""

# Plain Python: a random decimal rounded to three places
t9 = round(random.random(),3)
print(t9)
"""
0.953
"""

# Format a random number with two decimal places (the result is a str)
t10 = "%.2f"%random.random()
print(t10)
"""
0.83
"""

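2. Array shape and array computation

1. Array shape

1. Kinds of arrays

One-dimensional array
Two-dimensional array
Three-dimensional array

2. Changing the shape of an array

reshape: reshapes the array to any compatible shape (the total number of elements must stay the same)
flatten: flattens the array into one dimension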

import numpy as np
# One-dimensional array
t1 = np.arange(12)
print(t1)
print(t1.shape)
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11]
(12,)
"""
# Two-dimensional array
t2 = np.array([[1,2,3],[4,5,6]])
print(t2)
print(t2.shape)
"""
[[1 2 3]
 [4 5 6]]
(2, 3) # an array with 2 rows and 3 columns
"""
# Three-dimensional array
t3 = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(t3)
print(t3.shape)
"""
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
(2, 2, 3)
"""

# Change the shape of an array with reshape
t4 = np.arange(12)
print(t4)
print(t4.reshape((3,4)))# a two-dimensional array with 3 rows and 4 columns
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
"""
t5 = np.arange(24)
print(t5)
print(t5.reshape((2,3,4)))# a three-dimensional array: 2 blocks of 3 rows and 4 columns
print(t5.reshape((4,6)))
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
***********************************************************************************
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
 ***********************************************************************************
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
"""
# Turn the two-dimensional array into a one-dimensional array
t5 = t5.reshape((4,6))
print(t5)
print(t5.reshape((24,)))
"""
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
"""

t6 = t5.reshape((t5.shape[0]*t5.shape[1],))# shape[0] is the number of rows of t5, shape[1] the number of columns
# t6 is t5 reshaped into a one-dimensional array of length rows*columns
print(t6)
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
"""
# flatten: flattens the array into one dimension
print(t5.flatten())
"""
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
"""

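2. Array computation

1. Array and scalar computation

2. Array and array computation (broadcasting)

Same shape: the operation is applied to corresponding elements
Different shapes:
- An array of shape (x, y) (rows, columns) and an array of shape (x, 1) or (1, y): when the row counts match, the (x, 1) column is applied to every column; when the column counts match, the (1, y) row is applied to every row
- In general, numpy compares the two shapes from the last dimension backwards; a pair of dimensions is compatible when they are equal or when one of them is 1 (a missing leading dimension counts as 1)
- If any pair of dimensions is incompatible, the arrays cannot be combined and an error is raised
- Examples (broadcasting):
  
  An array of shape (3, 3, 3) cannot be combined with an array of shape (3, 2);
  
  An array of shape (3, 3, 2) can be combined with an array of shape (3, 2);
  - (3, 3, 2) can be seen as 3 blocks, each one a 3*2 face; the (3, 2) array is applied to each block in turn, 3 times in total
  An array of shape (3, 3, 2) cannot be combined with an array of shape (3, 3), because the last dimensions (2 and 3) differ.
  - To apply a (3, 3) array across the last axis, first reshape it to (3, 3, 1)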
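
The shape rules above can be checked directly. The following is a minimal sketch; the array names a, b, and c are chosen here for illustration and do not come from the course material.

```python
import numpy as np

a = np.arange(18).reshape((3, 3, 2))   # 3 blocks of 3 rows x 2 columns
b = np.arange(6).reshape((3, 2))       # matches the trailing (3, 2) of a

# b is broadcast along axis 0, so it is added to each of the 3 blocks
print((a + b).shape)

# A (3, 3) array does not match the trailing dimensions (3, 2) of a
c = np.ones((3, 3))
try:
    a + c
    compatible = True
except ValueError as err:
    compatible = False
    print("cannot broadcast:", err)

# Adding a trailing axis of length 1 makes the shapes compatible
print((a + c.reshape((3, 3, 1))).shape)
```
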

print(t5)
"""
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
"""
print(t5+2)
print(t5*2)
print(t5/2)
print(t5/0)# no exception: numpy emits a RuntimeWarning and returns nan (for 0/0) and inf
"""
[[ 2  3  4  5  6  7]
 [ 8  9 10 11 12 13]
 [14 15 16 17 18 19]
 [20 21 22 23 24 25]]
***********************
 [[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
***********************
 [[ 0.   0.5  1.   1.5  2.   2.5]
 [ 3.   3.5  4.   4.5  5.   5.5]
 [ 6.   6.5  7.   7.5  8.   8.5]
 [ 9.   9.5 10.  10.5 11.  11.5]]
 ***********************
 [[nan inf inf inf inf inf]
 [inf inf inf inf inf inf]
 [inf inf inf inf inf inf]
 [inf inf inf inf inf inf]]
"""
# Arrays with the same shape
t7 = np.arange(100,124).reshape((4,6))
print(t7)
print(t5+t7)
print(t5*t7)
print(t5/t7)
"""
[[100 101 102 103 104 105]
 [106 107 108 109 110 111]
 [112 113 114 115 116 117]
 [118 119 120 121 122 123]]
****************************
[[100 102 104 106 108 110]
 [112 114 116 118 120 122]
 [124 126 128 130 132 134]
 [136 138 140 142 144 146]]
(the output of t5*t7 and t5/t7 is omitted)
"""
# Arrays with different shapes
t8 = np.arange(0,6)
print(t8)
print(t5-t8)
"""
[0 1 2 3 4 5]
**********************
# t8 matches the column count of t5, so it is subtracted from every row
[[ 0  0  0  0  0  0]
 [ 6  6  6  6  6  6]
 [12 12 12 12 12 12]
 [18 18 18 18 18 18]]
"""
t9 = np.arange(4).reshape((4,1))
print(t9)
print(t5-t9)
"""
[[0]
 [1]
 [2]
 [3]]
**********************
# t9 matches the row count of t5, so it is subtracted from every column
[[ 0  1  2  3  4  5]
 [ 5  6  7  8  9 10]
 [10 11 12 13 14 15]
 [15 16 17 18 19 20]]
"""

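3. Reading local data and indexing with numpy

1. Reading local data with numpy

1. Axes

In numpy an axis can be understood as a direction, numbered 0, 1, 2, ...

One-dimensional array: axis 0 only

Two-dimensional array (shape (2, 2)): axis 0 and axis 1

Axis 0: rows; axis 1: columns

Three-dimensional array (shape (2, 2, 3)): axes 0, 1, and 2

Axis 0: blocks; axis 1: rows; axis 2: columns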
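
One way to see what each axis means is to sum along it and look at the resulting shape. This is a small sketch with a made-up array t of shape (2, 2, 3), matching the three-dimensional case described above.

```python
import numpy as np

t = np.arange(12).reshape((2, 2, 3))  # 2 blocks, 2 rows, 3 columns

# Summing along an axis removes that axis from the shape
print(t.sum(axis=0).shape)  # the two blocks are added together
print(t.sum(axis=1).shape)  # the rows inside each block are added
print(t.sum(axis=2).shape)  # the columns inside each row are added
print(t.sum(axis=0))
```
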

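2. Reading data with numpy

1. Reading data from a local CSV file with numpy

CSV: comma-separated values file

Displayed as: a table

Source file: formatted text in which line breaks separate rows and commas separate columns; each line represents one record

Example:

Data source: https://www.kaggle.com/datasnaek/youtube/data

import numpy as np
us_file_path = "file path" # placeholder: replace with the path to the CSV file
t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
print(t1)

unpack=True: transposes the result, so each column of the file becomes one row of the array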
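
The effect of unpack=True can be tried without a file on disk. The sketch below feeds np.loadtxt a small in-memory text buffer; the CSV content is invented for the demonstration and stands in for the real data file.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file on disk
csv_text = "1,2,3\n4,5,6\n"

rows = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
cols = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, unpack=True)

print(rows)   # one row per line of the file, shape (2, 3)
print(cols)   # one row per column of the file, shape (3, 2)
```
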

2. Transposing in numpy

Three ways:

transpose
T
swapaxes

import numpy as np
t1 = np.arange(24).reshape((4,6))
print(t1)
t2 = t1.transpose()
print(t2)
t3 = t1.T
print(t3)
t4 = t1.swapaxes(1,0)
print(t4)
"""
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
******************************
[[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
******************************
[[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
******************************
[[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
"""

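2. Indexing and slicing in numpy

1. Read the data from a local file

2. Take the values at given rows and columns of the matrix

import numpy as np

us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"

t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t2 = np.loadtxt(us_file_path,delimiter=",",dtype="int")

print(t1)
print("*"*100)
print(t2)

# Select rows
print(t2[2])
print(t2[1,:])
print(t2[2:,:])
print(t2[[2,3,10],:])

# Select consecutive rows, starting from index 2 (the third row)
print(t2[2:])

# Select non-consecutive rows: indexes 2, 8, and 10 (the 3rd, 9th, and 11th rows)
print(t2[[2,8,10]])

# Select a column
print(t2[:,0])

# Select consecutive columns
print(t2[:,2:])

# Select non-consecutive columns
print(t2[:,[0,2]])

# Select a row and a column: the value in the third row, fourth column
print(t2[2,3])
print(type(t2[2,3]))

# Select several rows and columns: the third to fifth rows, second to fourth columns
# The result is the block where those rows and columns intersect
a = t2[2:5,1:4]
print(a)

# Select several non-adjacent points: first row, first column (0, 0) and third row, second column (2, 1)
c = t2[[0,2],[0,1]]
print(c)
# d selects the points (0, 0), (2, 1), (2, 3)
d = t2[[0,2,2],[0,1,3]]
print(d)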

3. Modifying values in numpy

1. Assigning directly to a row or column
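
Direct assignment uses the same index syntax as selection, with a value on the right-hand side. A short sketch on a made-up 4 by 6 array:

```python
import numpy as np

t = np.arange(24).reshape((4, 6))

t[1, :] = 0        # set every value in the row at index 1 to 0
t[:, 2] = -1       # set every value in the column at index 2 to -1
t[2:, 4:] = 100    # set a block: rows 2 onward, columns 4 onward
print(t)
```
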

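2. Boolean indexing in numpy

Replace the values in t2 that are less than 10 with 3

print(t2<10) # returns an array of booleans
t2[t2<10] = 3 # replace the values in t2 that are less than 10 with 3
print(t2)
print(t2[t2>20]) # select the values greater than 20 as a one-dimensional array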

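3. The numpy counterpart of the ternary operator: where

Replace the values in t that are less than 10 with 0, and all other values (10 or greater) with 10

# The numpy counterpart of the ternary operator
np.where(t<10,0,10) # where a value in t is less than 10 the result holds 0, otherwise 10; t itself is not modified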
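
A runnable version of the np.where call, as a sketch on a made-up array t. It also shows a variant that passes the array itself as the third argument, which keeps the original values where the condition is false; that variant is an addition here, not part of the original notes.

```python
import numpy as np

t = np.arange(24).reshape((4, 6))

# Scalar replacements: 0 where t < 10, otherwise 10
flags = np.where(t < 10, 0, 10)

# Array replacement: 0 where t < 10, otherwise the original value
kept = np.where(t < 10, 0, t)

print(flags)
print(kept)
print(t[0, 1], t[3, 5])  # t is unchanged
```
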

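4. Clipping in numpy: clip

Replace the values in t that are less than 10 with 10, and those greater than 18 with 18

t.clip(10,18)

Any nan in the array is left unchanged by clip

An integer array cannot hold nan, so convert the array to a floating-point type before assigning nan

import numpy as np

t1 = np.arange(24)
t2 = t1.reshape((4,6))
print(t2)
t2 = t2.astype(float) # convert to float first; only a float array can hold nan
t2[3,3] = np.nan
print(t2)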
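
Continuing from the array with one nan, the following sketch applies clip and shows that the nan entry passes through untouched while the other values are limited to the range 10 to 18. The variable names are chosen for this example.

```python
import numpy as np

t = np.arange(24).reshape((4, 6)).astype(float)
t[3, 3] = np.nan

clipped = t.clip(10, 18)   # returns a new array; t is not modified
print(clipped)
```
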

4. nan and common methods in numpy

1. Joining data

Joining two sets of data together

1. Vertical stacking

np.vstack((t1,t2))

2. Horizontal stacking

np.hstack((t1,t2))

Splitting is the reverse operation of stacking

import numpy as np

t1 = np.arange(12)
t2 = t1.reshape((2,6))
print(t2)
print("*"*100)
t3 = np.arange(12,24)
t4 = t3.reshape((2,6))
print(t4)
print("*"*100)
# Vertical stacking
t5 = np.vstack((t2,t4))
print(t5)
print("*"*100)
# Horizontal stacking
t6 = np.hstack((t2,t4))
print(t6)
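
The notes mention that splitting reverses stacking but show no code for it. A minimal sketch using np.vsplit and np.hsplit, with array names chosen for this example:

```python
import numpy as np

top = np.arange(12).reshape((2, 6))
bottom = np.arange(12, 24).reshape((2, 6))
stacked = np.vstack((top, bottom))      # shape (4, 6)

# vsplit cuts along the rows, undoing vstack
upper, lower = np.vsplit(stacked, 2)

# hsplit cuts along the columns, undoing hstack
left, right = np.hsplit(np.hstack((top, bottom)), 2)

print(upper)
print(right)
```
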

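3. Swapping rows and columns of an array

1. Swapping rows

2. Swapping columns

import numpy as np

t1 = np.arange(18)
t2 = t1.reshape((3,6))
print(t2)
# Swap the rows at index 1 and index 2
t2[[1,2],:] = t2[[2,1],:]
print(t2)
# Swap the columns at index 0 and index 2
t2[:,[0,2]] = t2[:,[2,0]]
print(t2)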

Exercise:

import numpy as np

us_data = "./youtube_video_data/US_video_data_numbers.csv"
uk_data = "./youtube_video_data/GB_video_data_numbers.csv"

# Load the data for each country
us_data = np.loadtxt(us_data,delimiter=",",dtype=int)
uk_data = np.loadtxt(uk_data,delimiter=",",dtype=int)

# Add country information
# Build a column of zeros (US) and a column of ones (UK)
zeros_data = np.zeros((us_data.shape[0],1)).astype(int)
ones_data = np.ones((uk_data.shape[0],1)).astype(int)

# Append the column of 0s to the US data and the column of 1s to the UK data
# np.hstack takes a single tuple of arrays
us_data = np.hstack((us_data,zeros_data))
uk_data = np.hstack((uk_data,ones_data))

# Stack the two data sets vertically
final_data = np.vstack((us_data,uk_data))
print(final_data)

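2. Other useful numpy methods

1. Getting the positions of the maximum and minimum values

np.argmax(t,axis=0)
np.argmin(t,axis=1)

2. Creating an array of all zeros

np.zeros((3,4))

3. Creating an array of all ones

np.ones((3,4))

4. Creating a square array (square matrix) with ones on the diagonal

np.eye(3)

import numpy as np
# Create an array of all ones
print(np.ones((3,4)).astype(int))
# Create an array of all zeros
print(np.zeros((2,3)))
# Create a square array (square matrix) with ones on the diagonal
print(np.eye(10))
# Get the positions of the maximum and minimum values
t = np.eye(4)
print(np.argmax(t,axis=0))
print(np.argmin(t,axis=1))
t[t==1] = -1
print(t)
print(np.argmax(t,axis=0))
print(np.argmin(t,axis=1))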

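5. Generating random numbers with numpy

np.random.xxx

Function	Description
rand(d0,d1,…,dn)	Creates an array of shape (d0, d1, …, dn) of uniformly distributed random floats in the range [0, 1)
randn(d0,d1,…,dn)	Creates an array of shape (d0, d1, …, dn) of random floats from the standard normal distribution (mean 0, standard deviation 1)
randint(low,high,(shape))	Picks random integers from the range [low, high), with the given shape
uniform(low,high,(size))	Produces a uniformly distributed array; low is the start value, high the end value, size the shape
normal(loc,scale,(size))	Draws samples from the given normal distribution; loc is the center (the mean of the distribution), scale the standard deviation, size the shape
seed(s)	Random seed, where s is the seed value. Computers generate pseudo-random numbers, so setting the same seed produces the same random numbers on every run

import numpy as np

print(np.random.randint(10,20,(4,5)))
print("*"*100)
# Use a random seed so that every run gives the same result
np.random.seed(10)
t = np.random.randint(0,20,(3,4))
print(t)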
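
The other functions in the table can be exercised the same way. This sketch uses arbitrary parameter values (seed 0, the range 5 to 10, mean 100, standard deviation 15) that are assumptions made for the demonstration only.

```python
import numpy as np

np.random.seed(0)                      # 0 is an arbitrary seed value
first = np.random.rand(2, 3)           # uniform floats in [0, 1)
np.random.seed(0)
second = np.random.rand(2, 3)          # same seed, so the same values
print(np.array_equal(first, second))

normal_std = np.random.randn(2, 3)                # standard normal samples
uniform_vals = np.random.uniform(5, 10, (2, 3))   # uniform floats in [5, 10)
normal_vals = np.random.normal(100, 15, (2, 3))   # mean 100, standard deviation 15
print(normal_std.shape)
print(uniform_vals)
print(normal_vals)
```
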

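6. A numpy caveat: copy and view

a=b: nothing is copied; a and b are the same object, so a change made through one name is visible through the other
a=b[:]: a view, created by slicing. A new object a is created, but its data is still held by b, so changes to the data show up in both
a=b.copy(): a copy; a and b do not affect each other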
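
The three cases can be told apart by modifying b and checking what each name sees. A minimal sketch; the names same, view, and copied are chosen for this example.

```python
import numpy as np

b = np.arange(5)

same = b            # same object
view = b[:]         # new array object that shares b's data
copied = b.copy()   # independent data

b[0] = 99
print(same[0], view[0], copied[0])

print(same is b, view is b)
print(np.shares_memory(view, b), np.shares_memory(copied, b))
```
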

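3. nan and common statistical methods in numpy

1. nan and inf in numpy

nan (not a number): not a number

inf: positive infinity

-inf: negative infinity

Two nan values are not equal to each other
np.nan!=np.nan
This property can be used to count the nan values in an array

np.count_nonzero(t!=t)
np.isnan(a) tests whether a value is nan and returns a bool (for an array, an array of bools)

np.isnan(a)
Any arithmetic operation involving nan gives nan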
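
These properties can be confirmed on a small made-up array. np.nansum, which skips nan values, is not mentioned in the original notes and is included here only for contrast with np.sum.

```python
import numpy as np

t = np.array([1.0, np.nan, 3.0, np.nan])

print(np.nan == np.nan)                 # nan is not equal to itself
print(np.count_nonzero(t != t))         # counts the nan entries
print(np.count_nonzero(np.isnan(t)))    # same count through isnan
print(np.sum(t))                        # any sum involving nan is nan
print(np.nansum(t))                     # sum that ignores nan
```
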

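2. Summing in numpy

t1 = np.arange(12).reshape((3,4))
print(t1)
print(np.sum(t1)) # sum of all elements
print(np.sum(t1,axis=0)) # sum of each column
print(np.sum(t1,axis=1)) # sum of each row

In practice, the nan values in a data set are usually replaced with the mean (or median), or the rows that contain missing values are dropped

How to compute the median or mean of a data set

Mean: for example, to get the mean of a column, sum the values in that column that are not nan and divide by their count

Median: sort the values that are not nan; with an odd count take the middle value, with an even count take the sum of the two middle values divided by two

How to delete the rows (or columns) that contain missing data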
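
Both approaches can be sketched in a few lines. This is one possible implementation under the assumption that every column has at least one value that is not nan; the array and the variable names are invented for the example.

```python
import numpy as np

t = np.arange(12).reshape((3, 4)).astype(float)
t[1, 2] = np.nan
t[2, 0] = np.nan

# Option 1: replace each nan with the mean of the non-nan values in its column
filled = t.copy()
for col in range(filled.shape[1]):
    column = filled[:, col]              # a view into filled
    missing = np.isnan(column)
    if missing.any():
        column[missing] = column[~missing].mean()

# Option 2: drop every row that contains at least one nan
rows_without_nan = t[~np.isnan(t).any(axis=1)]

print(filled)
print(rows_without_nan)
```

To drop columns instead of rows, the same mask can be built with any(axis=0) and applied as t[:, mask].
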

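3. Common statistical functions in numpy

Sum	t.sum(axis=None)
Mean	t.mean(axis=None)	Strongly affected by outliers
Median	np.median(t,axis=None)
Maximum	t.max(axis=None)
Minimum	t.min(axis=None)
Range	np.ptp(t,axis=None)	The difference between the maximum and the minimum
Standard deviation	t.std(axis=None)	Measures how far the values spread around the mean; the larger it is, the more the data fluctuates around the mean and the less stable it is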

import numpy as np

t1 = np.arange(12).reshape((3,4))
print(t1)
print(np.sum(t1,axis=1))
print(np.mean(t1,axis=1))
print(np.median(t1,axis=0))
print(np.max(t1,axis=1))
print(np.min(t1,axis=1))
print(np.ptp(t1))
print(np.ptp(t1,axis=1))

Posted: 2021-12-15 18:17:53  Updated: 2021-12-15 18:19:54
