CSV files
CSV (Comma-Separated Values) is a common file format for storing batches of data. One- and two-dimensional data can be written to and read from CSV files with NumPy's np.savetxt() and np.loadtxt().
np.savetxt()
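np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
- frame: file or filename string; may be a .gz or .bz2 compressed file
- array: the array to write to the file
- fmt: the format used for each value, for example %d, %.2f, %.18e
- delimiter: the string that separates columns; the default is a single space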
In [18]: a = np.arange(100).reshape((5,20))
In [19]: a
Out[19]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99]])
In [20]: np.savetxt('a.csv',a , fmt = '%d',delimiter = ',')
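A CSV file is created in the current working directory. Note that Excel is only one way of opening it.
Right-clicking the file and viewing its properties confirms that it really is a .csv file.
Opened in Notepad, the file shows five lines of comma-separated integers, one line per array row.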
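Because a CSV file is plain text, its contents can also be checked from Python instead of Notepad. A minimal sketch, assuming a temporary directory so that no file is left behind (the path is illustrative and not part of the original session):

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape((5, 20))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")  # illustrative file name
    np.savetxt(path, a, fmt="%d", delimiter=",")
    # Read the file back as ordinary text, not as an array.
    with open(path) as f:
        lines = f.read().splitlines()

print(len(lines))  # one text line per array row
print(lines[0])    # the first row as comma-separated integers
```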
np.loadtxt()
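np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
- frame: file, filename string, or generator; may be a .gz or .bz2 compressed file
- dtype: data type of the resulting array, optional (float by default)
- delimiter: the string that separates fields; the default is any whitespace
- unpack: if True, the result is transposed so that each column can be assigned to a separate variable
In [21]: b = np.loadtxt('a.csv', dtype=int, delimiter=',')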
In [22]: b
Out[22]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99]])
In [30]: b = np.loadtxt('a.csv', dtype=int, delimiter=',', unpack=True)
In [31]: b
Out[31]:
array([[ 0, 20, 40, 60, 80],
[ 1, 21, 41, 61, 81],
[ 2, 22, 42, 62, 82],
[ 3, 23, 43, 63, 83],
[ 4, 24, 44, 64, 84],
[ 5, 25, 45, 65, 85],
[ 6, 26, 46, 66, 86],
[ 7, 27, 47, 67, 87],
[ 8, 28, 48, 68, 88],
[ 9, 29, 49, 69, 89],
[10, 30, 50, 70, 90],
[11, 31, 51, 71, 91],
[12, 32, 52, 72, 92],
[13, 33, 53, 73, 93],
[14, 34, 54, 74, 94],
[15, 35, 55, 75, 95],
[16, 36, 56, 76, 96],
[17, 37, 57, 77, 97],
[18, 38, 58, 78, 98],
[19, 39, 59, 79, 99]])
If that is not clear yet, save the transposed result and "flip" it back:
In [32]: np.savetxt('b.csv',b,fmt = '%d',delimiter = ',')
In [33]: c = np.loadtxt('b.csv', dtype=int, delimiter=',')
In [34]: c
Out[34]:
array([[ 0, 20, 40, 60, 80],
[ 1, 21, 41, 61, 81],
[ 2, 22, 42, 62, 82],
[ 3, 23, 43, 63, 83],
[ 4, 24, 44, 64, 84],
[ 5, 25, 45, 65, 85],
[ 6, 26, 46, 66, 86],
[ 7, 27, 47, 67, 87],
[ 8, 28, 48, 68, 88],
[ 9, 29, 49, 69, 89],
[10, 30, 50, 70, 90],
[11, 31, 51, 71, 91],
[12, 32, 52, 72, 92],
[13, 33, 53, 73, 93],
[14, 34, 54, 74, 94],
[15, 35, 55, 75, 95],
[16, 36, 56, 76, 96],
[17, 37, 57, 77, 97],
[18, 38, 58, 78, 98],
[19, 39, 59, 79, 99]])
In [36]: d = np.loadtxt('b.csv', dtype=int, delimiter=',', unpack=True)
In [37]: d
Out[37]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99]])
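Put differently, unpack=True returns the transpose of what loadtxt would otherwise read, which is what allows each column to be assigned to its own variable. A minimal sketch, assuming a small three-row text held in an io.StringIO buffer instead of a file on disk (the data and the names x and y are illustrative):

```python
import io

import numpy as np

text = "1,10\n2,20\n3,30\n"  # three rows, two columns

rows = np.loadtxt(io.StringIO(text), dtype=int, delimiter=",")
cols = np.loadtxt(io.StringIO(text), dtype=int, delimiter=",", unpack=True)

# With unpack=True the first axis runs over columns,
# so the result can be unpacked into one variable per column.
x, y = np.loadtxt(io.StringIO(text), dtype=int, delimiter=",", unpack=True)

print(rows.shape, cols.shape)
print(x, y)
```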
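One error came up along the way:
d = np.loadtxt('b.csv', dtype=int, unpack=True)
This call raises an error, but it runs normally once the delimiter argument is added. The reason is that loadtxt splits fields on whitespace by default, so a whole comma-separated line such as 0,20,40,60,80 is read as a single field, and that field cannot be converted to an integer. Passing delimiter=',' tells loadtxt to split on commas instead.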
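The error can be reproduced in a few lines. A minimal sketch, assuming a short comma-separated text in an io.StringIO buffer stands in for b.csv: without delimiter, loadtxt splits on whitespace, sees each line as one field, and fails to convert it to an integer.

```python
import io

import numpy as np

text = "0,20,40\n1,21,41\n"  # same comma-separated layout as b.csv

try:
    # Default delimiter is whitespace, so "0,20,40" is a single field.
    np.loadtxt(io.StringIO(text), dtype=int)
    failed = False
except ValueError as exc:
    failed = True
    print("loadtxt failed:", exc)

# Splitting on commas gives one integer per field.
ok = np.loadtxt(io.StringIO(text), dtype=int, delimiter=",")
print(ok)
```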
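Limitations of CSV files
A CSV file can only store one- and two-dimensional arrays effectively, so np.savetxt() and np.loadtxt() are limited to one- and two-dimensional arrays as well.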
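The limitation is easy to observe: np.savetxt() raises ValueError for an array with more than two dimensions. A minimal sketch, assuming a temporary file and a hand-written workaround (flatten to 2-D before saving, then reshape after loading); the workaround is an illustration, not something the original text prescribes:

```python
import os
import tempfile

import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a3.csv")  # illustrative file name
    try:
        np.savetxt(path, a3, fmt="%d", delimiter=",")
        rejected = False
    except ValueError as exc:
        rejected = True
        print("savetxt refused the 3-D array:", exc)

    # Workaround: store a 2-D view and restore the shape by hand.
    np.savetxt(path, a3.reshape(a3.shape[0], -1), fmt="%d", delimiter=",")
    restored = np.loadtxt(path, dtype=int, delimiter=",").reshape(a3.shape)

print(np.array_equal(restored, a3))
```

The original shape still has to be known when reading, which is the same bookkeeping problem that tofile() and fromfile() have.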
Storing and loading multidimensional data
tofile()
a.tofile(frame, sep='', format='%s')
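- frame: file or filename string
- sep: the string that separates elements; if it is the empty string, the file is written as binary
- format: the format used for each written value
Example 1: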
In [1]: import numpy as np
In [2]: a = np.arange(100).reshape(5,10,2)
In [3]: a.tofile('b.dat',sep = ',',format = '%d')
In [4]: a.tofile('c.dat',format = '%d')
np.fromfile()
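np.fromfile(frame, dtype=float, count=-1, sep='')
- frame: file or filename string
- dtype: the data type to read
- count: the number of elements to read; -1 reads the whole file
- sep: the string that separates elements; if it is the empty string, the file is read as binary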
In [1]: import numpy as np
In [2]: a = np.arange(100).reshape(5,10,2)
In [3]: a.tofile('b.dat',sep = ',',format = '%d')
In [4]: a.tofile('c.dat',format = '%d')
In [5]: c = np.fromfile('b.dat', dtype=int, sep=',')
In [6]: c
Out[6]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [7]: d = np.fromfile('b.dat', dtype=int, sep=',').reshape(5,10,2)
In [8]: d
Out[8]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]],
[[20, 21],
[22, 23],
[24, 25],
[26, 27],
[28, 29],
[30, 31],
[32, 33],
[34, 35],
[36, 37],
[38, 39]],
[[40, 41],
[42, 43],
[44, 45],
[46, 47],
[48, 49],
[50, 51],
[52, 53],
[54, 55],
[56, 57],
[58, 59]],
[[60, 61],
[62, 63],
[64, 65],
[66, 67],
[68, 69],
[70, 71],
[72, 73],
[74, 75],
[76, 77],
[78, 79]],
[[80, 81],
[82, 83],
[84, 85],
[86, 87],
[88, 89],
[90, 91],
[92, 93],
[94, 95],
[96, 97],
[98, 99]]])
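Points to note:
- When reading, this method requires knowing the dimensions and element type the array had when it was written.
- a.tofile() and np.fromfile() need to be used together.
- The extra information can be stored in a separate metadata file.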
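One possible way to keep that extra information is a small JSON file next to the data file. A minimal sketch, assuming a JSON sidecar with the hypothetical keys "shape" and "dtype" (the file names and the layout of the metadata are illustrative choices, not a NumPy convention):

```python
import json
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "a.dat")   # raw binary data
    meta_path = os.path.join(tmp, "a.json")  # hypothetical metadata sidecar

    a.tofile(data_path)  # empty sep: written as binary
    with open(meta_path, "w") as f:
        json.dump({"shape": a.shape, "dtype": str(a.dtype)}, f)

    # Later: read the metadata first, then the data it describes.
    with open(meta_path) as f:
        meta = json.load(f)
    restored = np.fromfile(data_path, dtype=meta["dtype"]).reshape(meta["shape"])

print(restored.shape, restored.dtype)
```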
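NumPy's convenient file storage
np.save(fname, array) or np.savez(fname, array)
- fname: file name, with the extension .npy; archives written by np.savez use .npz (np.savez_compressed writes a compressed .npz)
- array: the array variable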
np.load(fname)
- fname: file name, with the extension .npy for a single array or .npz for an archive
In [16]: a = np.arange(100).reshape(5,10,2)
In [17]: np.save('abc',a)
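Unlike tofile(), the .npy format records shape and dtype, so np.load() restores the array without any reshape. A minimal sketch of the round trip, assuming a temporary directory and the illustrative names abc.npy and arrays.npz; the second part stores two arrays in one .npz archive under the keyword names a and b:

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    # Single array: shape and dtype travel with the file.
    npy_path = os.path.join(tmp, "abc.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays in one archive, addressed by keyword name.
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, a=a, b=b)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        b_loaded = archive["b"]

print(a_loaded.shape, a_loaded.dtype)
print(names)
```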
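NumPy's random number functions
NumPy's random number sublibrary (np.random.*)
np.random functions (1)
- np.random.rand(d0, d1, ..., dn): creates an array of random floats with shape (d0, ..., dn), uniformly distributed over [0, 1)
- np.random.randn(d0, d1, ..., dn): creates an array of random floats with shape (d0, ..., dn), drawn from the standard normal distribution
- np.random.randint(low, high, shape): creates a random integer or an integer array of the given shape, with values in [low, high)
- np.random.seed(s): sets the random seed; s is the given seed value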
In [21]: import numpy as np
In [22]: a = np.random.rand(2,3,4)
In [23]: a
Out[23]:
array([[[0.83320496, 0.12055854, 0.49206112, 0.8566886 ],
[0.75550467, 0.02323388, 0.41936258, 0.07790391],
[0.73686559, 0.76060261, 0.03561822, 0.92840269]],
[[0.13533978, 0.91260566, 0.83112466, 0.08282195],
[0.72465413, 0.04355692, 0.61615688, 0.69286588],
[0.96795979, 0.70903762, 0.29605484, 0.16523414]]])
In [24]: b = np.random.randn(2,3,4)
In [25]: b
Out[25]:
array([[[ 1.12912474, -1.90537955, -0.26429729, -0.8029504 ],
[ 1.39715161, -0.70918709, -0.56475154, 0.5884456 ],
[ 0.49628045, 0.64393377, 1.54287675, -1.3668673 ]],
[[-1.6025208 , -0.69154075, -0.53622357, -0.44194084],
[-2.13685166, -0.45760608, 0.8125242 , 0.69221503],
[-0.76923649, -0.16106871, -0.58058069, -0.19658289]]])
In [26]: c = np.random.randint(3,4,5)
In [27]: c
Out[27]: array([3, 3, 3, 3, 3])
In [28]: d = np.random.randint(100,200,(3,4,5))
In [29]: d
Out[29]:
array([[[139, 194, 103, 122, 123],
[118, 102, 102, 150, 109],
[168, 127, 104, 142, 126],
[183, 125, 130, 163, 117]],
[[194, 100, 198, 105, 155],
[195, 138, 108, 177, 139],
[129, 178, 129, 160, 172],
[157, 165, 173, 166, 133]],
[[176, 109, 101, 192, 168],
[141, 160, 199, 190, 114],
[128, 125, 187, 173, 141],
[114, 178, 136, 166, 178]]])
In [31]: e = np.random.randint(100,200,(3,4))
In [32]: e
Out[32]:
array([[124, 100, 128, 149],
[193, 120, 188, 145],
[174, 188, 198, 189]])
In [33]: np.random.seed(10)
In [34]: np.random.randint(100,200,(3,4))
Out[34]:
array([[109, 115, 164, 128],
[189, 193, 129, 108],
[173, 100, 140, 136]])
In [35]: np.random.randint(100,200,(3,4))
Out[35]:
array([[116, 111, 154, 188],
[162, 133, 172, 178],
[149, 151, 154, 177]])
In [36]: np.random.seed(10)
In [37]: np.random.randint(100,200,(3,4))
Out[37]:
array([[109, 115, 164, 128],
[189, 193, 129, 108],
[173, 100, 140, 136]])
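np.random functions (2)
- shuffle(a): randomly permutes array a along its first axis (axis 0), modifying a in place
- permutation(a): returns a new array permuted along the first axis of a, leaving a unchanged
- choice(a, size, replace, p): draws elements from the one-dimensional array a with probabilities p to form a new array of shape size; replace says whether an element may be drawn more than once and defaults to True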
In [40]: import numpy as np
In [41]: a = np.random.randint(100,200,(3,4))
In [42]: a
Out[42]:
array([[116, 111, 154, 188],
[162, 133, 172, 178],
[149, 151, 154, 177]])
In [43]: np.random.shuffle(a)
In [44]: a
Out[44]:
array([[116, 111, 154, 188],
[149, 151, 154, 177],
[162, 133, 172, 178]])
In [45]: np.random.shuffle(a)
In [46]: a
Out[46]:
array([[162, 133, 172, 178],
[116, 111, 154, 188],
[149, 151, 154, 177]])
In [48]: a
Out[48]:
array([[162, 133, 172, 178],
[116, 111, 154, 188],
[149, 151, 154, 177]])
In [49]: np.random.permutation(a)
Out[49]:
array([[149, 151, 154, 177],
[162, 133, 172, 178],
[116, 111, 154, 188]])
In [50]: a
Out[50]:
array([[162, 133, 172, 178],
[116, 111, 154, 188],
[149, 151, 154, 177]])
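Note that choice() only accepts a one-dimensional array. The outputs below contain values that do not occur in the 3x4 array shown above, so a was evidently reassigned to a 1-D array in input cells that are not shown here.
In [55]: np.random.choice(a,(3,2))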
Out[55]:
array([[186, 130],
[157, 130],
[186, 131]])
In [64]: np.random.choice(a,(3,2),replace = False)
Out[64]:
array([[142, 111],
[162, 192],
[179, 196]])
In [66]: np.random.choice(a,(3,2),p = a/np.sum(a))
Out[66]:
array([[179, 111],
[196, 192],
[157, 196]])
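The three choice() calls can be run end to end on a one-dimensional array. A minimal sketch, assuming an illustrative 1-D array b and a fixed seed (used only so that repeated runs give the same draws); it also shows that a two-dimensional input is rejected:

```python
import numpy as np

np.random.seed(0)        # fixed seed, only for repeatability
b = np.arange(100, 110)  # 1-D array with ten distinct values

with_repl = np.random.choice(b, (3, 2))                    # repeats allowed
without_repl = np.random.choice(b, (3, 2), replace=False)  # all six differ
weighted = np.random.choice(b, (3, 2), p=b / np.sum(b))    # larger values slightly more likely

try:
    np.random.choice(b.reshape(2, 5), 3)
    rejected_2d = False
except ValueError as exc:
    rejected_2d = True
    print("2-D input rejected:", exc)

print(without_repl)
```

The probabilities passed as p must have the same length as the array and sum to 1, which b / np.sum(b) guarantees.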
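np.random functions (3)
- uniform(low, high, size): creates an array drawn from a uniform distribution; low is the lower bound, high is the upper bound, size is the shape
- normal(loc, scale, size): creates an array drawn from a normal distribution; loc is the mean, scale is the standard deviation, size is the shape
- poisson(lam, size): creates an array drawn from a Poisson distribution; lam is the expected rate of events, size is the shape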
In [67]: np.random.uniform(0,10,(3,4))
Out[67]:
array([[0.43097356, 8.79915175, 7.63240587, 8.78096643],
[4.17509144, 6.05577564, 5.13466627, 5.97836648],
[2.62215661, 3.00871309, 0.25399782, 3.03062561]])
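Only uniform() is demonstrated above, so here is a short sketch covering normal() and poisson() as well. The parameter values (mean 10, standard deviation 5, rate 3.0) and the fixed seed are illustrative assumptions; the large sample at the end simply checks that the sample mean and standard deviation land near loc and scale.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only for repeatability

u = np.random.uniform(0, 10, (3, 4))  # floats in [0, 10)
n = np.random.normal(10, 5, (3, 4))   # mean 10, standard deviation 5
p = np.random.poisson(3.0, (3, 4))    # non-negative integer counts, rate 3.0

# With many samples the statistics approach the parameters.
big = np.random.normal(10, 5, 100_000)
print(big.mean(), big.std())
print(p)
```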
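NumPy's statistical functions
Statistical functions provided directly by NumPy (np.*), for example np.std(), np.var() and np.average().
NumPy's statistical functions (1)
- sum(a, axis=None): computes the sum of the elements of array a along the given axis; axis is an integer or a tuple
- mean(a, axis=None): computes the mean (expected value) of the elements of array a along the given axis; axis is an integer or a tuple
- average(a, axis=None, weights=None): computes the weighted average of the elements of array a along the given axis
- std(a, axis=None): computes the standard deviation of the elements of array a along the given axis
- var(a, axis=None): computes the variance of the elements of array a along the given axis
Here axis=None is the standard default of the statistical functions: the computation then runs over all elements of the array.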
In [68]: a = np.arange(15).reshape(3,5)
In [69]: a
Out[69]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [70]: np.sum(a)
Out[70]: 105
In [71]: np.mean(a)
Out[71]: 7.0
In [72]: np.mean(a,axis = 1)
Out[72]: array([ 2., 7., 12.])
In [73]: np.mean(a,axis = 0)
Out[73]: array([5., 6., 7., 8., 9.])
In [74]: np.average(a,axis = 0,weights = [10,5,1])
Out[74]: array([2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
In [75]: np.std(a)
Out[75]: 4.320493798938574
In [76]: np.var(a)
Out[76]: 18.666666666666668
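The weighted average in In [74] can be checked by hand: each row is multiplied by its weight, the rows are summed, and the result is divided by the sum of the weights, so the first column gives (0*10 + 5*5 + 10*1) / 16 = 2.1875. A minimal sketch of that check, which also confirms that np.var() is the mean squared deviation from the mean (the names manual and var_manual are illustrative):

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])  # one weight per row, since axis=0

# Weighted average along axis 0, written out explicitly.
manual = (w[:, None] * a).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

# Variance as the mean squared deviation; std is its square root.
var_manual = np.mean((a - a.mean()) ** 2)

print(manual)
print(var_manual, np.sqrt(var_manual))
```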
NumPy 的统计函数(二)
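- min(a), max(a): compute the minimum and maximum of the elements of array a
- argmin(a), argmax(a): compute the index of the minimum and maximum element of a, counted in the flattened (one-dimensional) array
- unravel_index(index, shape): converts the one-dimensional index into a multidimensional index according to shape
- ptp(a): computes the difference between the maximum and the minimum of the elements of array a
- median(a): computes the median of the elements of array a
In [80]: a = np.arange(15,0,-1).reshape(3,5)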
In [81]: a
Out[81]:
array([[15, 14, 13, 12, 11],
[10, 9, 8, 7, 6],
[ 5, 4, 3, 2, 1]])
In [82]: np.max(a)
Out[82]: 15
In [83]: np.min(a)
Out[83]: 1
In [84]: np.argmax(a)
Out[84]: 0
In [85]: np.argmin(a)
Out[85]: 14
In [86]: np.unravel_index(np.argmin(a),a.shape)
Out[86]: (2, 4)
In [87]: np.unravel_index(np.argmin(a),(5,3))
Out[87]: (4, 2)
In [89]: np.ptp(a)
Out[89]: 14
In [90]: np.median(a)
Out[90]: 8.0
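NumPy's gradient function
np.gradient(f): computes the gradient of the elements of array f; when f is multidimensional, it returns the gradient along each dimension.
Gradient: the rate of change between consecutive values, i.e. the slope. For three consecutive X coordinates on the XY plane with Y values a, b, c, the gradient at b is (c - a)/2, where 2 is the distance between a and c. At the two ends of the array, where only one neighbour exists, the difference to that neighbour is used instead.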
In [93]: a = np.random.randint(0,100,5)
In [94]: a
Out[94]: array([77, 48, 93, 75, 86])
In [95]: np.gradient(a)
Out[95]: array([-29. , 8. , 13.5, -3.5, 11. ])
In [96]: b = np.random.randint(0,200,(3,5))
In [97]: b
Out[97]:
array([[165, 11, 149, 161, 95],
[171, 73, 168, 101, 171],
[ 90, 71, 136, 72, 156]])
In [98]: np.gradient(b)
Out[98]:
[array([[ 6. , 62. , 19. , -60. , 76. ],
[-37.5, 30. , -6.5, -44.5, 30.5],
[-81. , -2. , -32. , -29. , -15. ]]),
array([[-154. , -8. , 75. , -27. , -66. ],
[ -98. , -1.5, 14. , 1.5, 70. ],
[ -19. , 23. , 0.5, 10. , 84. ]])]
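The outputs above can be reproduced by hand from the definition. A minimal sketch, assuming unit spacing between elements (the np.gradient default) and reusing the values printed in Out[94] and Out[97]; the name manual is illustrative:

```python
import numpy as np

a = np.array([77, 48, 93, 75, 86], dtype=float)

manual = np.empty_like(a)
manual[0] = a[1] - a[0]              # left edge: difference to the only neighbour
manual[-1] = a[-1] - a[-2]           # right edge: same idea
manual[1:-1] = (a[2:] - a[:-2]) / 2  # interior: (c - a) / 2

print(manual)
print(np.gradient(a))

# For a 2-D array there is one gradient array per axis.
b = np.array([[165, 11, 149, 161, 95],
              [171, 73, 168, 101, 171],
              [90, 71, 136, 72, 156]])
g_axis0, g_axis1 = np.gradient(b)  # along rows (axis 0), then along columns (axis 1)
print(g_axis0[0])  # equals b[1] - b[0]
```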