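1. NumPy
1.1 Introduction to NumPy
- NumPy is a Python package whose name stands for "Numerical Python". It is a library consisting of multidimensional array objects (matrices) and a collection of routines for processing those arrays.
- With NumPy, developers can perform the following operations:
  - Arithmetic and logical operations on arrays.
  - Fourier transforms and routines for shape manipulation.
  - Linear algebra operations; NumPy has built-in functions for linear algebra and random number generation.
- NumPy is usually used together with SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB.
- Import the NumPy library:
import numpy as np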
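The capabilities listed above can be sketched in a few lines. This is only a minimal illustration; the array values, the 2x2 system, and the seed are arbitrary choices made for the example.

```python
import numpy as np

# Arithmetic and logical operations work element by element.
x = np.array([1.0, 2.0, 3.0, 4.0])
doubled = x * 2
mask = x > 2

# Fourier transform: the first coefficient equals the sum of the samples.
spectrum = np.fft.fft(x)

# Linear algebra: solve the system A @ sol = rhs.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
rhs = np.array([2.0, 8.0])
sol = np.linalg.solve(A, rhs)

# Random numbers from a seeded generator (the seed 0 is arbitrary).
rng = np.random.default_rng(0)
samples = rng.random(3)

print(doubled, mask)
print(spectrum[0].real, sol, samples.shape)
```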
1.2 Installing NumPy
conda env list
conda activate my_python
python -m pip install numpy scipy matplotlib ipython jupyter pandas sympy nose -i https://pypi.douban.com/simple/
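1.3 The ndarray object
- The most important object defined in NumPy is the N-dimensional array type called ndarray.
- It describes a collection of elements of the same type. Items in the collection can be accessed using zero-based indexing.
- An instance of the ndarray class can be constructed with the various array-creation routines described later in this tutorial. The basic constructor is:
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
- Several ways to create an array: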
a = np.array([1,2,3])
print(a)
b = np.array([[1,2],
[3,4]])
print(b)
d = np.zeros((3,3),dtype=int)
print(d)
e = np.arange(10,20,2)
print(e)
f = np.arange(12).reshape((3,4))
print(f)
g = np.linspace(1,10,6)
g = np.linspace(1,10,6).reshape((2,3))
print(g)
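- Declaring ndarray objects (plain assignment creates an alias of the same array; copy() creates an independent array):
a = np.array([0,1,2,3],dtype=float)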
b = a
a[0] = 0.3
print(a)
print(b is a)
print(b)
c = a
c[1:3] = [4,4]
print(c)
print(a)
a = np.array([0,1,2,3],dtype=float)
b = a.copy()
a[0] = 0.3
print(a)
print(b)
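The same distinction applies to slices. The sketch below, with made-up variable names, uses np.shares_memory to check which objects share data with the original array: a basic slice is a view, while copy() is independent.

```python
import numpy as np

a = np.arange(6)
alias = a                  # another name for the same object
view = a[1:4]              # basic slicing returns a view onto a's data
independent = a.copy()     # a separate array with its own data

view[0] = 99               # writes through to a (and therefore to alias)

print(a)
print(alias is a, np.shares_memory(a, view), np.shares_memory(a, independent))
# prints: True True False
```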
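1.4 NumPy data types
No. | Data type | Description |
---|---|---|
1 | bool_ | Boolean (True or False) stored as one byte |
2 | int_ | Default integer type, equivalent to C long; usually int32 or int64 |
3 | intc | Equivalent to C int; usually int32 or int64 |
4 | intp | Integer used for indexing, equivalent to C ssize_t; usually int32 or int64 |
5 | int8 | Byte (-128 to 127) |
6 | int16 | 16-bit integer (-32768 to 32767) |
7 | int32 | 32-bit integer (-2147483648 to 2147483647) |
8 | int64 | 64-bit integer (-9223372036854775808 to 9223372036854775807) |
9 | uint8 | 8-bit unsigned integer (0 to 255) |
10 | uint16 | 16-bit unsigned integer (0 to 65535) |
11 | uint32 | 32-bit unsigned integer (0 to 4294967295) |
12 | uint64 | 64-bit unsigned integer (0 to 18446744073709551615) |
13 | float_ | Shorthand for float64 (this alias was removed in NumPy 2.0) |
14 | float16 | Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa |
15 | float32 | Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa |
16 | float64 | Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa |
17 | complex_ | Shorthand for complex128 (this alias was removed in NumPy 2.0) |
18 | complex64 | Complex number represented by two 32-bit floats (real and imaginary parts) |
19 | complex128 | Complex number represented by two 64-bit floats (real and imaginary parts) |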
# np.int was removed in NumPy 1.24; use the built-in int or an explicit type such as np.int64
c = np.array([1,23,4],dtype=int)
print(c.dtype)
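To connect the table to code, here is a small sketch that picks an explicit type, inspects its range with np.iinfo, and converts with astype. The choice of int8 and float32 is arbitrary.

```python
import numpy as np

small = np.array([1, 23, 4], dtype=np.int8)
print(small.dtype, small.itemsize)      # int8 1

info = np.iinfo(np.int8)                # representable range of int8
print(info.min, info.max)               # -128 127

as_float = small.astype(np.float32)     # astype returns a converted copy
print(as_float.dtype, small.dtype)      # float32 int8
```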
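1.5 NumPy array attributes
- ndarray.shape: returns a tuple containing the array dimensions; it can also be used to resize the array.
- ndarray.ndim: returns the number of array dimensions.
b = np.array([[1,2],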
[3,4]])
print(b)
print('number of dim:',b.ndim)
print('shape:',b.shape)
print('size:',b.size)
- ndarray.itemsize: returns the size in bytes of each array element.
a = np.array([1,2,3],dtype=float)
b = np.array([[1,2],[3,4]],dtype=int)
print(a.itemsize)
print(b.itemsize)
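The note that shape can also resize an array may be easier to see in code. In this sketch, assigning to shape changes the array in place, whereas reshape returns a new array object and leaves the original's shape alone.

```python
import numpy as np

a = np.arange(6)
a.shape = (2, 3)           # in place: a itself becomes 2x3
b = a.reshape(3, 2)        # returns a new array object; a keeps its shape

print(a.shape, a.ndim, a.size)   # (2, 3) 2 6
print(b.shape)                   # (3, 2)
```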
1.6 Operations on NumPy arrays
- Basic operations:
a = np.array([40,50,60,70])
b = np.linspace(1,40,4)
print(a,b)
c = a - b
print(c)
c= a + b
print(c)
c = a**2
print(c)
c = 10*np.sin(a)
print(c)
print(b>10)
print(b==14)
- Matrix multiplication:
a = np.array([[1,2],
[3,4]])
b = np.arange(4).reshape((2,2))
c = a*b  # element-wise product
print(c)
c_dot = np.dot(a,b)  # matrix product
print(c_dot)
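For two-dimensional arrays, the @ operator is an alternative spelling of np.dot. The sketch below reuses the same two matrices to contrast it with the element-wise product.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.arange(4).reshape((2, 2))   # [[0, 1], [2, 3]]

elementwise = a * b                # [[0, 2], [6, 12]]
matrix = a @ b                     # [[4, 7], [8, 15]]

print(elementwise)
print(matrix)
```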
- Computing some statistics:
a = np.random.random((2,4))
print(a)
print(np.sum(a))
print(np.min(a))
print(np.max(a))
print(np.sum(a,axis=1))
print(np.min(a,axis=0))
print(np.max(a,axis=1))
a = np.arange(2,14).reshape((3,4))
print(a)
print(np.argmin(a))
print(np.argmin(a,axis=1))
print(np.argmax(a))
print(np.mean(a))
print(np.average(a))
print(np.median(a))
print(np.cumsum(a))
print(np.diff(a))
print(np.nonzero(a))
- Sorting a matrix:
a = np.arange(12,0,-1).reshape((3,4))
print(a)
print(np.sort(a))  # sorts each row (the last axis) in ascending order
- Transposing a matrix:
a = np.arange(12,0,-1).reshape((3,4))
print(a)
print(np.transpose(a))
print((a.T).dot(a))
print(a.T)
print(a[np.newaxis,:])
print(a[:,np.newaxis])
a = np.array([1,1,1])[:,np.newaxis]
b = np.array([2,2,2])[:,np.newaxis]
print(np.hstack((a,b)))
- Extracting matrix elements:
print(np.clip(a,5,9))
a = np.arange(3,15)
print(a)
print(a[3])
a = a.reshape((3,4))
print(a)
print(a[2])
print(a[2][1])
print(a[2,1])
print(a[2,:])
print(a[2,1:3])
- Iterating over a matrix:
for row in a:
    print(row)
for column in a.T:
    print(column)
print(a.flatten())
for item in a.flat:
    print(item)
- Merging matrices:
a = np.array([1,1,1])
b = np.array([2,2,2])
c = np.vstack((a,b))
print(c)
print(c.shape)
d = np.hstack((a,b))
print(d)
print(d.shape)
a = np.array([1,1,1])[:,np.newaxis]
b = np.array([2,2,2])[:,np.newaxis]
c = np.concatenate((a,b),axis=0)
print(c)
c = np.concatenate((a,a,b,b),axis=1)
print(c)
- Splitting a matrix:
a = np.arange(12).reshape((3,4))
print(a)
print(np.hsplit(a,2))
print(np.split(a,2,axis=1))
print(np.vsplit(a,3))
print(np.split(a,3,axis=0))
print(np.array_split(a,3,axis=1))
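The last call uses np.array_split because 4 columns cannot be divided into 3 equal parts. As a small sketch of the difference, np.split raises ValueError in that case, while np.array_split returns pieces of unequal width.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# np.split requires an equal division and raises ValueError otherwise.
try:
    np.split(a, 3, axis=1)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split allows uneven pieces.
parts = np.array_split(a, 3, axis=1)
widths = [p.shape[1] for p in parts]

print(split_failed, widths)   # True [2, 1, 1]
```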
2. Pandas
2.1 DataFrame and Series
import pandas as pd
import numpy as np
s = pd.Series([1,3,6,np.nan,44,1])
print(s)
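To relate the two structures named in the heading, the following sketch builds a DataFrame from a Series and reads a column back out as a Series. The column names "value" and "double" are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 6, np.nan, 44, 1], name="value")
print(s.dtype)             # float64, because NaN is a float
print(s.count())           # 5 non-missing values

# A DataFrame can be seen as a set of Series sharing one index.
df = pd.DataFrame({"value": s, "double": s * 2})
col = df["double"]         # selecting one column gives back a Series
print(type(col).__name__)  # Series
```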
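2.2 Constructing a data table
- A pandas DataFrame is a two-dimensional tabular structure with labeled rows and columns, similar to a two-dimensional array.
- DataFrame constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
  - data: the data (ndarray, Series, map, lists, dict, and similar types)
  - index: index values, also called row labels
  - columns: column labels; defaults to RangeIndex (0, 1, 2, …, n)
  - dtype: data type
  - copy: whether to copy the input data; defaults to False in older pandas versions (None since pandas 1.3)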
dates = pd.date_range('20210101',periods=6)
print(dates)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=['a','b','c','d'])
print(df)
df = pd.DataFrame(np.arange(12).reshape((3,4)))
print(df)
- Using a dictionary to specify the data layout:
df = pd.DataFrame({'a':1.,
'b':pd.Timestamp('20210101'),
'c':pd.Series(1,index=list(range(4)),dtype='float32'),
'd':np.array([3]*4,dtype='int32'),
'e':pd.Categorical(["test","train","test","train"]),
'f':'foo'})
print(df)
print(df.dtypes)
print(df.index)
print(df.columns)
print(df.values)
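2.3 Operations on data tables
- Viewing a basic description of the data table:
print(df.describe())
- Transposing the data table:
print(df.T)
- Displaying all data in the console without truncation: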
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)
- Sorting the data table:
print(df.sort_index(axis=1,ascending=False))
print(df.sort_index(axis=0,ascending=False))
print(df.sort_values(by='e'))
- Retrieving data from the table:
dates = pd.date_range('20210101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])
print(df)
print(df['a'],df.a)
print(df[0:3],df['20210101':'20210104'])
print(df.loc['20210103'])
print(df.loc[:,['a','b']])
print(df.loc['20210103',['a','b']])
print(df.iloc[3])
print(df.iloc[3,1])
print(df.iloc[:3][['a','c']])  # df.ix was removed in pandas 1.0; select rows by position, then columns by label
print(df[df.a>8])
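One detail worth illustrating from the selections above is how the two indexers treat slice ends: loc slices by label and includes both ends, whereas iloc slices by position and excludes the end. A minimal sketch using the same table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

by_label = df.loc['20210102':'20210104', 'a':'c']   # both ends included
by_position = df.iloc[1:4, 0:3]                      # end position excluded

print(by_label.shape)                  # (3, 3)
print(by_label.equals(by_position))    # True
```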
- Slicing the data table:
print(df.iloc[3:5,1:3])
print(df.iloc[[1,3,5],1:3])
- Assigning values at specific positions in the data table:
dates = pd.date_range('20210101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])
print(df)
df.loc['20210103','b'] = 100
print(df)
df.iloc[1,1] = 100
print(df)
df.iloc[4:,:] = 0
print(df)
df[df.loc[:,:] > 10] = 6
print(df)
df.loc[df.a == 0, 'a'] = 9  # a single .loc assignment avoids chained assignment, which may not modify df
print(df)
- Adding new columns:
df['e'] = np.nan
df['f'] = pd.Series([1,2,3,4,5,6],index=pd.date_range('20210101',periods=6))
print(df)
- Handling missing data (null values):
dates = pd.date_range('20210101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])
df.iloc[1,2] = np.nan
df.iloc[3,3] = np.nan
print(df)
print(df.dropna(axis=0,how='any'))
print(df.dropna(axis=0,how='all'))
print(df.fillna(value=0))
print(df.isnull())
print(df.isnull().values.any())  # True if at least one value is missing
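Besides a constant, fillna also accepts per-column values. The sketch below is one possible approach, not part of the original walkthrough: it counts the missing values in each column and fills each gap with its column mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)), columns=list('abcd'))
df.iloc[1, 2] = np.nan
df.iloc[3, 3] = np.nan

missing_per_column = df.isnull().sum()   # number of NaN values in each column
filled = df.fillna(df.mean())            # df.mean() skips NaN by default

print(missing_per_column.tolist())       # [0, 0, 1, 1]
print(filled.iloc[1, 2], filled.iloc[3, 3])
```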
2.4 Importing and exporting with pandas
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)
# Raw strings (r'...') keep the backslashes in Windows paths from being read as escape sequences
data1 = pd.read_csv(r'D:\实验结果\任务调度实验\test1.csv')
print(data1)
data2 = pd.read_excel(r'D:\实验结果\任务调度实验\蚁群运行结果1.xlsx')
print(data2)
data1.to_pickle(r'D:\实验结果\任务调度实验\student.pickle')
data3 = pd.read_pickle(r'D:\实验结果\任务调度实验\student.pickle')
print(data3)
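The paths above exist only on the author's machine. As a self-contained sketch of the same round trip, the example below writes to a temporary directory instead; the file names example.csv and example.pickle are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6, dtype=np.int64).reshape((3, 2)), columns=['a', 'b'])

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'example.csv')        # hypothetical file name
    pickle_path = os.path.join(tmp, 'example.pickle')  # hypothetical file name

    df.to_csv(csv_path, index=False)   # index=False: do not write row labels as a column
    from_csv = pd.read_csv(csv_path)

    df.to_pickle(pickle_path)
    from_pickle = pd.read_pickle(pickle_path)

print(from_csv.values.tolist() == df.values.tolist(), from_pickle.equals(df))
```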
2.5 Merging data tables
- pandas.concat
df1 = pd.DataFrame(np.ones((3,4))*0,columns=['a','b','c','d'])
df2 = pd.DataFrame(np.ones((3,4))*1,columns=['a','b','c','d'])
df3 = pd.DataFrame(np.ones((3,4))*2,columns=['a','b','c','d'])
print(df1,df2,df3)
res = pd.concat([df1,df2,df3],axis=0)
print(res)
res = pd.concat([df1,df2,df3],axis=0,ignore_index=True)
print(res)
df1 = pd.DataFrame(np.ones((3,4))*0,index=[1,2,3],columns=['a','b','c','d'])
df2 = pd.DataFrame(np.ones((3,4))*1,index=[2,3,4],columns=['b','c','d','e'])
print(df1,df2)
res = pd.concat([df1,df2],join='inner')
print(res)
res = pd.concat([df1,df2],join='inner',ignore_index=True)
print(res)
res = pd.concat([df1,df2],axis=1)
print(res)
res = pd.concat([df1,df2],axis=1).reindex(df1.index)  # join_axes was removed in pandas 1.0; reindex keeps only df1's rows
print(res)
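The effect of the join parameter can be summarised with two small frames (made up for this sketch): the default join='outer' keeps the union of the columns and fills the gaps with NaN, while join='inner' keeps only the columns both frames share.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 2)), index=[1, 2], columns=['a', 'b'])
df2 = pd.DataFrame(np.ones((2, 2)), index=[2, 3], columns=['b', 'c'])

outer = pd.concat([df1, df2], ignore_index=True)                 # default join='outer'
inner = pd.concat([df1, df2], join='inner', ignore_index=True)

print(list(outer.columns), int(outer.isnull().sum().sum()))      # ['a', 'b', 'c'] 4
print(list(inner.columns))                                       # ['b']
```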
- DataFrame.append (appending rows)
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same results.
df1 = pd.DataFrame(np.ones((3,4))*0,index=[1,2,3],columns=['a','b','c','d'])
df2 = pd.DataFrame(np.ones((3,4))*1,index=[2,3,4],columns=['a','b','c','d'])
print(df1,df2)
res = pd.concat([df1,df2],ignore_index=True)
print(res)
df3 = pd.DataFrame(np.ones((3,4))*2,index=[2,3,4],columns=['a','b','c','d'])
res = pd.concat([df1,df2,df3],ignore_index=True)
print(res)
df1 = pd.DataFrame(np.ones((3,4))*0,index=[1,2,3],columns=['a','b','c','d'])
s1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
res = pd.concat([df1,s1.to_frame().T],ignore_index=True)  # turn the Series into a one-row frame first
print(res)
df1 = pd.DataFrame(np.ones((3,4))*0,index=[1,2,3],columns=['a','b','c','d'])
df1['e'] = pd.Series([2,3,4],index=[1,2,3])
print(df1)
- pandas.merge
- Merging on columns:
df1 = pd.DataFrame(np.ones((3,2))*1,index=[0,1,2],columns=['a','b'])
df1['c'] = pd.Series([2.,3.,4.],index=[0,1,2])
df2 = pd.DataFrame(np.ones((3,2))*3,index=[0,1,2],columns=['e','f'])
df2['c'] = pd.Series([2.,3.,4.],index=[0,1,2])
print(df1)
print(df2)
df3 = pd.merge(df1,df2,on='c')
print(df3)
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
print(left)
print(right)
res = pd.merge(left, right, on='key')
print(res)
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
print(left)
print(right)
res = pd.merge(left,right,on=['key1','key2'])
print(res)
res = pd.merge(left,right,on=['key1','key2'],how='right')
print(res)
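The how parameter decides which keys survive the merge. As a compact sketch with two made-up frames, counting the result rows for each option shows the difference.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': [1, 2, 3]})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'B': [4, 5, 6]})

# inner: keys in both; left/right: all keys of that side; outer: union of keys.
rows = {how: len(pd.merge(left, right, on='key', how=how))
        for how in ['inner', 'left', 'right', 'outer']}

print(rows)   # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```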
- Merging on the index:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
'D': ['D0', 'D2', 'D3']},
index=['K0', 'K2', 'K3'])
print(left)
print(right)
res1 = pd.merge(left, right, left_index=True, right_index=True, how='outer')
print(res1)
res2 = pd.merge(left, right, left_index=True, right_index=True, how='inner')
print(res2)
- indicator: shows how each row was matched in the merge:
df1 = pd.DataFrame({'col1':[0,1], 'col_left':['a','b']})
df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res1 = pd.merge(df1, df2, on='col1', how='outer', indicator=True)
print(res1)
res2 = pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')
print(res2)
- Adding suffixes to overlapping column names so they can be told apart after the merge:
boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})
print(boys)
print(girls)
res = pd.merge(boys, girls, on='k', suffixes=['_boy', '_girl'], how='inner')
print(res)
2.6 Visualization with plot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.Series(np.random.randn(1000), index=np.arange(1000))
data = data.cumsum()
data.plot()
plt.show()
data = pd.DataFrame(np.random.randn(1000, 4), index=np.arange(1000), columns=list("ABCD"))
data = data.cumsum()
data.plot()
plt.show()
ax = data.plot.scatter(x='A', y='B', color='Blue', label="Class 1")
data.plot.scatter(x='A', y='C', color='Green', label='Class 2', ax=ax)
plt.show()