[Python Knowledge Base] Summary of Core Python Techniques


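Chapter 1: Variables and Data Types

1.1 Variables

Variable naming rules

Names may combine letters, digits, and underscores, but must not start with a digit.
Names must not contain spaces.
Names must not be Python reserved words (keywords).
Keep names short and self-explanatory.
Use the lowercase letter l and the uppercase letter O with care; they are easily mistaken for the digits 1 and 0.
By convention (PEP 8), variable names are lowercase with words separated by underscores (snake_case), unlike the lower camel case common in some other languages.

1.2 Constants

By convention, constant names are written in all uppercase. Python does not enforce this; it is only a naming convention.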
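The naming rules above can be checked programmatically. The sketch below is only an illustration: the helper `is_valid_name` and the names `MAX_RETRIES` and `user_name` are hypothetical and do not come from the original notes.

```python
import keyword

MAX_RETRIES = 3        # constant by convention only (hypothetical name)
user_name = "alice"    # snake_case variable (hypothetical name)


def is_valid_name(name: str) -> bool:
    """Return True if name is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_name("user_name"))  # True
print(is_valid_name("2nd_user"))   # False: starts with a digit
print(is_valid_name("user name"))  # False: contains a space
print(is_valid_name("class"))      # False: reserved word
```

`str.isidentifier()` covers the character rules, and `keyword.iskeyword()` covers the reserved words.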

1.3 Strings

Common operations

Changing case

In [49]: name = 'hello world'
# title()
In [50]: name.title()
Out[50]: 'Hello World'
# upper()
In [51]: name.upper()
Out[51]: 'HELLO WORLD'
# lower() 
In [52]: name.lower()
Out[52]: 'hello world'
# capitalize(): uppercase the first character and lowercase the rest
In [53]: name.capitalize()
Out[53]: 'Hello world'

Concatenating strings

# + concatenates two strings
In [41]: str_pre = "hello"

In [42]: str_after = " world"

In [43]: str_concat = str_pre+str_after

In [44]: str_concat
Out[44]: 'hello world'

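Removing whitespace and other characters

# strip() removes the given characters from both ends of a string only; by default it removes whitespace. lstrip() and rstrip() work on the left or right end only.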
In [46]: str_trip ='000000000helloworld00000000000'

In [47]: str_trip.strip('0')
Out[47]: 'helloworld'

Searching and replacing

# count(): number of times a substring occurs
name = "hello world"
In [29]: name.count('l')
Out[29]: 3
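# capitalize(): uppercase the first character
In [30]: name.capitalize()
Out[30]: 'Hello world'
# center(width, fillchar): return a new string of length width with the original centered; the fill character defaults to a space
In [33]: name.center(20,'-')
Out[33]: '----hello world-----'
# find(): return the index of the first match, or -1 if the substring is not found
In [35]: name.find('p')
Out[35]: -1
# index(): return the index of the first match, or raise ValueError if the substring is not found
In [36]: name.index('l')
Out[36]: 2
In [37]: name.index('p')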
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-733edee1b9a6> in <module>
----> 1 name.index('p')

ValueError: substring not found
# replace(): return a copy of the string with the substring replaced
In [38]: name.replace('world','word')
Out[38]: 'hello word'
# Membership test with in
In [39]: 'word' in name
Out[39]: False

In [40]: 'world' in name
Out[40]: True

Splitting strings

# split() splits on whitespace by default and returns a list
In [26]: string_split = "I have a dream!"

In [27]: string_split.split()
Out[27]: ['I', 'have', 'a', 'dream!']

Joining strings

# join() concatenates the items of an iterable, using the string as the separator
In [20]: print(','.join('6666'))
6,6,6,6
In [22]: print(','.join(['alice','bluce','candy','duke']))
alice,bluce,candy,duke

String slicing

In [13]: string_split = "hello world"
# A full slice returns the whole string
In [14]: string_split[:]
Out[14]: 'hello world'
# Slice indexes start at 0
In [15]: string_split[0:]
Out[15]: 'hello world'
In [16]: string_split[1:]
Out[16]: 'ello world'
# A slice is a half-open interval: the start is included, the end is excluded
In [17]: string_split[:-1]
Out[17]: 'hello worl'
# Reverse a string
In [18]: string_split[::-1]
Out[18]: 'dlrow olleh'

The string module

In [8]: import string
# All uppercase letters
In [9]: string.ascii_uppercase
Out[9]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# All lowercase letters
In [10]: string.ascii_lowercase
Out[10]: 'abcdefghijklmnopqrstuvwxyz'
# All letters
In [11]: string.ascii_letters
Out[11]: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
# All digits
In [12]: string.digits
Out[12]: '0123456789'

1.4 Numbers

Common operations

# + addition
In [54]: 5+5
Out[54]: 10
# - subtraction
In [55]: 5-5
Out[55]: 0
# * multiplication
In [56]: 5*5
Out[56]: 25
# / true division (always returns a float)
In [57]: 6/5
Out[57]: 1.2
# // floor division
In [58]: 6//5
Out[58]: 1
# % modulo
In [59]: 6%5
Out[59]: 1
# ** exponentiation
In [60]: 6**2
Out[60]: 36
# ** with a fractional exponent gives a root
In [61]: 4**0.5
Out[61]: 2.0
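1.5 Comments

Single-line comments
```
# This is a single-line comment
```
Multi-line comments
```
# Using three single quotes
''' This is a multi-line comment '''
# Using three double quotes
""" This is a multi-line comment """
```
Python has no dedicated multi-line comment syntax; a triple-quoted string that is not assigned to anything is evaluated and discarded, so it is often used as one. Note the difference from a docstring: a docstring is a string placed as the first statement of a module, class, or function. It must follow the indentation of that block, and it can be read at runtime through `__doc__` or help().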
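To make the distinction concrete, here is a minimal sketch; the function `area` is a hypothetical example, not part of the original notes. Only the first string in the function body becomes the docstring.

```python
def area(width, height):
    """Return the area of a rectangle."""
    '''This string is not the first statement, so it is simply discarded.'''
    return width * height


print(area.__doc__)  # Return the area of a rectangle.
print(area(3, 4))    # 12
```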

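1.6 Standard Data Types

Number
- int
- float
- bool
- complex
String
List
Tuple
Set
Dictionary

These six standard data types are divided into mutable and immutable types.

Mutable: List, Dictionary, Set
Immutable: Number, String, Tuple

Methods on mutable types usually modify the object in place and return None, while methods on immutable types usually return a new object and leave the original unchanged.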
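A short sketch of that difference, using `list.sort()` (mutable, in place) and `str.upper()` (immutable, returns a new object). The variable names are illustrative only.

```python
numbers = [3, 1, 2]
result = numbers.sort()      # sorts in place and returns None
print(result, numbers)       # None [1, 2, 3]

greeting = "hello"
shouted = greeting.upper()   # returns a new string
print(greeting, shouted)     # hello HELLO
```

A common mistake that follows from this is writing `numbers = numbers.sort()`, which replaces the list with None.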

1.7 Input and Output

Output: print()

# 1. %-style placeholders
name = 'World'
print('hello %s'%(name))
# hello World
# 2. str.format()
print('my name is {name},age is {age}'.format(name='justin',age=18))
# 3. f-strings (recommended)
name = 'World'
print(f'hello,{name}')
# 4. Printing without a trailing newline
for i in range(0,4):
    print(i,end='')

Input: input()


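Chapter 2: Common Data Structures

2.1 list

Definition

A list is an ordered sequence of elements; the elements may have different types. List names are usually plural nouns.

Common methods

Adding elements

# The range() function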
In [113]: range(1,11)
Out[113]: range(1, 11)
In [114]: type(range(1,11))
Out[114]: range
# Build a list of numbers with range()
# Create the list of integers from 1 to 10
In [111]: numbers = list(range(1,11))
In [112]: numbers
Out[112]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Define an empty list to hold phone brands
In [62]: cellphones = []
# append() adds an element to the end and returns None
In [63]: cellphones.append('Apple')

In [64]: cellphones.append('HuaWei')

In [65]: cellphones.append('小米')

In [66]: cellphones.append('Oppo')

In [67]: cellphones
Out[67]: ['Apple', 'HuaWei', '小米', 'Oppo']
# insert() inserts an element at the given index
In [68]: cellphones.insert(0,'Nokia')
In [69]: cellphones
Out[69]: ['Nokia', 'Apple', 'HuaWei', '小米', 'Oppo']

Deleting elements

In [89]: cellphones
Out[89]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# del removes the element at an index; it is a statement and returns nothing
In [90]: del cellphones[0]
In [91]: cellphones
Out[91]: ['HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
In [92]: cellphones.insert(0,'Apple')
In [93]: cellphones
Out[93]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# pop() removes by index and returns the removed element; by default it removes the last element
In [94]: cellphones.pop()
Out[94]: 'Xiaomi'
In [95]: cellphones
Out[95]: ['Apple', 'HuaWei', 'Nokia', 'Oppo']
In [96]: cellphones.append('Xiaomi')
In [97]: cellphones
Out[97]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# Pop the last element and append the returned value again; the list ends up unchanged
In [98]: cellphones.append(cellphones.pop())
In [99]: cellphones
Out[99]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# pop() can also remove the element at a given index
In [100]: cellphones.insert(0,cellphones.pop(0))
In [101]: cellphones
Out[101]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# remove() deletes by value and returns None; if the value occurs more than once, only the first occurrence is removed. To use the removed value afterwards, store it in a variable before calling remove()
In [102]: cellphones.remove('Apple')
In [103]: cellphones
Out[103]: ['HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
In [104]: cellphones.insert(0,'Apple')
In [105]: cellphones.insert(0,'Apple')
In [106]: cellphones
Out[106]: ['Apple', 'Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']

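# remove() takes a single value; a list argument is looked up as one element, so this raises ValueError
In [107]: cellphones.remove(['Apple','Apple'])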
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-107-a2c7e0df81fd> in <module>
----> 1 cellphones.remove(['Apple','Apple'])

ValueError: list.remove(x): x not in list
# remove() deletes only the first matching element
In [108]: cellphones.remove('Apple')

In [109]: cellphones
Out[109]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']

Modifying elements

# Modify an element by index
In [72]: cellphones[3]='Xiaomi'
In [73]: cellphones
Out[73]: ['Nokia', 'Apple', 'HuaWei', 'Xiaomi', 'Oppo']

Looking up elements

# Look up an element by index
# List indexes start at 0
In [70]: cellphones[0]
Out[70]: 'Nokia'

In [71]: cellphones[1]
Out[71]: 'Apple'

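Reversing a list

In [73]: cellphones
Out[73]: ['Nokia', 'Apple', 'HuaWei', 'Xiaomi', 'Oppo']
# reverse() reverses the list in place and returns None
In [74]: cellphones.reverse()
In [75]: cellphones
Out[75]: ['Oppo', 'Xiaomi', 'HuaWei', 'Apple', 'Nokia']
# A slice with step -1 returns a reversed copy and leaves the list itself unchanged
In [76]: cellphones[::-1]
Out[76]: ['Nokia', 'Apple', 'HuaWei', 'Xiaomi', 'Oppo']
In [77]: cellphones
Out[77]: ['Oppo', 'Xiaomi', 'HuaWei', 'Apple', 'Nokia']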

List slicing

Works the same way as string slicing.

List length

# len() returns the number of elements in the list
In [78]: len(cellphones)
Out[78]: 5

In [79]: cellphones
Out[79]: ['Oppo', 'Xiaomi', 'HuaWei', 'Apple', 'Nokia']

Sorting

# list.sort() sorts the list in place and returns None; ascending by default
In [83]: cellphones.sort()
In [84]: cellphones
Out[84]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# Sort in descending order
In [85]: cellphones.sort(reverse=True)

In [86]: cellphones
Out[86]: ['Xiaomi', 'Oppo', 'Nokia', 'HuaWei', 'Apple']
# Sort in ascending order
In [87]: cellphones.sort(reverse=False)
In [88]: cellphones
Out[88]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']
# sorted(list) returns a new sorted list and leaves the original order unchanged
In [80]: sorted(cellphones)
Out[80]: ['Apple', 'HuaWei', 'Nokia', 'Oppo', 'Xiaomi']

In [81]: cellphones
Out[81]: ['Oppo', 'Xiaomi', 'HuaWei', 'Apple', 'Nokia']

Traversal

# Iterate over a list
In [115]: for cellphone in cellphones:
   ...:     print(cellphone)
     ...:
Apple
HuaWei
Nokia
Oppo
Xiaomi

Copying a list

In [124]: my_food = ['bread','milk','orange']
# Copy a list with a full slice; changes to the copy do not affect the original
In [125]: friend_food = my_food[:]

In [126]: friend_food
Out[126]: ['bread', 'milk', 'orange']

In [127]: my_food.append('apple')

In [128]: friend_food.append('pear')

In [129]: my_food
Out[129]: ['bread', 'milk', 'orange', 'apple']

In [130]: friend_food
Out[130]: ['bread', 'milk', 'orange', 'pear']
# Pitfall to avoid: plain assignment does not copy, it only binds a second name to the same list
In [131]: friend_food_copy = friend_food

In [132]: friend_food
Out[132]: ['bread', 'milk', 'orange', 'pear']

In [133]: friend_food_copy.pop()
Out[133]: 'pear'

In [134]: friend_food
Out[134]: ['bread', 'milk', 'orange']

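Do not use friend_food = my_food to copy a list: this only makes friend_food refer to the same list as my_food, so any change made through friend_food also changes the original list.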

List comprehensions

# Basic usage
In [116]: numbers = [i for i in range(1,11)]
In [117]: numbers
Out[117]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# With an if condition
In [118]: numbers = [i for i in range(1,11) if i%2==0]
In [119]: numbers
Out[119]: [2, 4, 6, 8, 10]
# Multiple for clauses
In [120]: list_e = [(e, f * f) for e in range(3) for f in range(5, 15, 5)]
In [121]: list_e
Out[121]: [(0, 25), (0, 100), (1, 25), (1, 100), (2, 25), (2, 100)]
# Nested list comprehension with several combined conditions
In [122]: list_g = [[x for x in range(g - 3, g)] for g in range(22) if g % 3 == 0 and g != 0]

In [123]: list_g
Out[123]:
[[0, 1, 2],
 [3, 4, 5],
 [6, 7, 8],
 [9, 10, 11],
 [12, 13, 14],
 [15, 16, 17],
 [18, 19, 20]]

Deep and shallow copies of a list

# Plain assignment: b is just another reference to the same list
In [1]: a = [1,2,3]
In [2]: b = a
In [3]: b
Out[3]: [1, 2, 3]
In [4]: a[2]=4
In [5]: a
Out[5]: [1, 2, 4]
In [6]: b
Out[6]: [1, 2, 4]
# Shallow copy with list.copy(): the outer list is copied, nested objects are shared
In [39]: a = [1,2,3,[4,5]]
In [40]: b = a.copy()
In [41]: a
Out[41]: [1, 2, 3, [4, 5]]
In [42]: b
Out[42]: [1, 2, 3, [4, 5]]
In [44]: a[0]=10
In [45]: a
Out[45]: [10, 2, 3, [4, 5]]
In [46]: b
Out[46]: [1, 2, 3, [4, 5]]
In [47]: a[3][0]=6
In [48]: a
Out[48]: [10, 2, 3, [6, 5]]
In [49]: b
Out[49]: [1, 2, 3, [6, 5]]
# Shallow copy with copy.copy(): only the outer list is copied; the nested list is still shared
In [1]: list1 = [1,2,3,[4,5]]
In [2]: import copy
In [3]: list2 = copy.copy(list1)
In [4]: list1
Out[4]: [1, 2, 3, [4, 5]]
In [5]: list2
Out[5]: [1, 2, 3, [4, 5]]
In [6]: list1[0]=10
In [7]: list1
Out[7]: [10, 2, 3, [4, 5]]
In [8]: list2
Out[8]: [1, 2, 3, [4, 5]]
In [9]: list1[3][0]=6
In [10]: list1
Out[10]: [10, 2, 3, [6, 5]]
In [11]: list2
Out[11]: [1, 2, 3, [6, 5]]
# Deep copy with copy.deepcopy(): nested objects are copied too, so the two lists are fully independent
In [13]: list1 =[1,2,3,[4,5]]
In [14]: list2 = copy.deepcopy(list1)
In [15]: list2
Out[15]: [1, 2, 3, [4, 5]]
In [16]: list1[0]=10
In [17]: list1
Out[17]: [10, 2, 3, [4, 5]]
In [18]: list2
Out[18]: [1, 2, 3, [4, 5]]
In [19]: list1[3][0]=6
In [20]: list1
Out[20]: [10, 2, 3, [6, 5]]
In [21]: list2
Out[21]: [1, 2, 3, [4, 5]]

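When to use

Lists suit index-based lookup and modification. They are poorly suited to frequent insertion or deletion anywhere other than the end, because every element after the affected position has to be shifted (an O(n) operation).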
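When elements are frequently added or removed at the front, one common alternative is `collections.deque` from the standard library. The original notes do not mention it; the sketch below is only an illustration of the trade-off described above.

```python
from collections import deque

queue = deque(["b", "c"])
queue.appendleft("a")    # O(1); list.insert(0, x) would shift every element
queue.append("d")        # O(1) at the right end, like list.append
first = queue.popleft()  # O(1); list.pop(0) would be O(n)

print(first)             # a
print(list(queue))       # ['b', 'c', 'd']
```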

2.2 tuple

Definition

A tuple is an immutable, ordered sequence: in effect a list whose elements cannot be added, removed, or reassigned once it has been created.

Common methods

Creating tuples

# Define an empty tuple
In [143]: tuple1 = ()
# A tuple with a single element needs a trailing comma
In [144]: tuple2 = (1,)

Deleting

A tuple has no method for removing elements, but the whole tuple can be deleted with del.
```
In [141]: del tuple1
```
Modifying

Tuples are immutable, so their elements cannot be changed, but tuples can be concatenated to build a new tuple.
```
# Concatenate tuples
In [226]: (1,2,3)+(4,)
Out[226]: (1, 2, 3, 4)
```

Looking up

Access elements by index.

In [224]: tuple1 = (1,2,3,4,5,6)

In [225]: tuple1[0]

Built-in functions for tuples

# max() (also works on lists)
In [152]: max(tuple1)
Out[152]: 3
# min()
In [153]: min(tuple2)
Out[153]: 4
# Convert a list to a tuple
In [154]: tuple([1,2,3,4,5,6])
Out[154]: (1, 2, 3, 4, 5, 6)

The zip function

# zip() pairs up the items of several iterables into tuples, here as (x, y)
In [96]: a = [1,2,3]

In [97]: b = [1,4,9]

In [98]: for x,y in zip(a,b):
    ...:     print(f'x ={x} y={y}')
    ...:
x =1 y=1
x =2 y=4
x =3 y=9
# Unpack the zip object into a list
print([*zip(a,b)])
# [(1, 1), (2, 4), (3, 9)]

Tuple operators

# Number of elements
In [145]: len((1,2,3))
Out[145]: 3
# Concatenation
In [146]: (1,2,3)+(4,5,6)
Out[146]: (1, 2, 3, 4, 5, 6)
# Repetition
In [147]: ('hi',)*4
Out[147]: ('hi', 'hi', 'hi', 'hi')
# Membership test
In [148]: 3 in (1,2,3)
Out[148]: True

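Traversing a tuple

Works the same way as for a list.

# Parentheses around a comprehension create a generator expression, not a tuple
In [138]: tuple1 = (i for i in range(1,11))
# The result is a generator object (a one-shot iterable)
In [139]: tuple1
Out[139]: <generator object <genexpr> at 0x0000022BC1B08DD0>
# Iterate over it just as you would over a tuple
In [140]: for i in tuple1: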
     ...:     print(i)
     ...:
1
2
3
4
5
6
7
8
9
10

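Tuple "comprehensions"

# Python has no tuple comprehension; parentheses give a generator expression
In [138]: tuple1 = (i for i in range(1,11))
# The result is a generator object
In [139]: tuple1
Out[139]: <generator object <genexpr> at 0x0000022BC1B08DD0>

Any comprehension form that works for a list can be written as a generator expression; pass it to tuple() to obtain an actual tuple, for example tuple(i for i in range(1,11)).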

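When to use

Compared with a list, a tuple instance uses less memory. If you are sure the object will not be modified later, a tuple is a good choice. Tuples are also commonly used for packing and unpacking values.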
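Both points can be checked with a small sketch. The variable names are illustrative, and the exact byte counts reported by `sys.getsizeof` depend on the Python version and platform, so none are shown.

```python
import sys

as_list = [1, 2, 3, 4, 5]
as_tuple = (1, 2, 3, 4, 5)

# On CPython the tuple is the smaller of the two objects
print(sys.getsizeof(as_tuple) < sys.getsizeof(as_list))

# Packing and unpacking
point = (3, 4)            # packing
x, y = point              # unpacking into two names
first, *rest = as_tuple   # starred unpacking collects the remainder into a list
print(x, y, first, rest)  # 3 4 1 [2, 3, 4, 5]
```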

2.3 `dict`

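Definition

A data structure of key-value pairs. Since Python 3.7 a dict preserves insertion order; in earlier versions the order was not guaranteed.

Common methods

Adding entries

# Create an empty dict
In [170]: alien_0 = {}
# Add key-value pairs by assignment
In [171]: alien_0['color'] = 'green'

In [172]: alien_0['point'] = 5

In [173]: alien_0
Out[173]: {'color': 'green', 'point': 5}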

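Deleting entries

# del removes a key-value pair (this session ran after the update shown under "Modifying entries")
In [176]: del alien_0['point']

In [177]: alien_0
Out[177]: {'color': 'yellow'}

Modifying entries

# Assigning to an existing key overwrites its value
In [174]: alien_0['color'] = 'yellow'

In [175]: alien_0
Out[175]: {'color': 'yellow', 'point': 5}

Looking up entries

Look up a value by its key.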
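A minimal sketch of key lookup, reusing the `alien_0` dict from above. The key 'speed' and the default 'slow' are made up for illustration. Indexing with `[]` raises KeyError for a missing key, while `get()` returns None or a supplied default.

```python
alien_0 = {'color': 'yellow', 'point': 5}

print(alien_0['color'])               # yellow
print(alien_0.get('speed'))           # None: key is missing, no error
print(alien_0.get('speed', 'slow'))   # slow: explicit default

try:
    alien_0['speed']                  # indexing a missing key raises KeyError
except KeyError as exc:
    missing = exc.args[0]
    print('missing key:', missing)    # missing key: speed
```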

Traversal

Iterating over key-value pairs

In [179]: favorite_languages
Out[179]: {'jen': 'python', 'sarah': 'c', 'edward': 'ruby', 'phil': 'python'}

In [180]: for name,language in favorite_languages.items():
     ...:     print(f"{name}'s favorite language is {language}")
     ...:
jen's favorite language is python
sarah's favorite language is c
edward's favorite language is ruby
phil's favorite language is python

Iterating over keys

In [181]: for name in favorite_languages.keys():
     ...:     print(name)
     ...:
jen
sarah
edward
phil
# Look up each value through its key
In [183]: for name in favorite_languages.keys():
     ...:     print(f"{name}'s favorite language is ",favorite_languages[name])
     ...:
jen's favorite language is  python
sarah's favorite language is  c
edward's favorite language is  ruby
phil's favorite language is  python
# Iterating over a dict yields its keys by default, so .keys() can be omitted; writing it out is more explicit and easier to read
In [184]: for name in favorite_languages:
     ...:     print(f"{name}'s favorite language is ",favorite_languages[name])
     ...:
jen's favorite language is  python
sarah's favorite language is  c
edward's favorite language is  ruby
phil's favorite language is  python

Iterating over values

In [182]: for language in favorite_languages.values():
     ...:     print(language)
     ...:
python
c
ruby
python

Iterating in sorted order

# Sort the keys in ascending order
In [189]: for name in sorted(favorite_languages.keys()):
     ...:     print(f"{name}'s favorite language is ",favorite_languages[name])
     ...:
     ...:
edward's favorite language is  ruby
jen's favorite language is  python
phil's favorite language is  python
sarah's favorite language is  c

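Dictionary comprehensions

# Pitfall: with two nested for clauses each key is overwritten repeatedly, so every value ends up as 'z'
In [190]: dict_a = {key:value for key in string.ascii_uppercase for value in string.ascii_lowercase}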

In [191]: dict_a
Out[191]:
{'A': 'z',
 'B': 'z',
 'C': 'z',
 'D': 'z',
 'E': 'z',
 'F': 'z',
 'G': 'z',
 'H': 'z',
 'I': 'z',
 'J': 'z',
 'K': 'z',
 'L': 'z',
 'M': 'z',
 'N': 'z',
 'O': 'z',
 'P': 'z',
 'Q': 'z',
 'R': 'z',
 'S': 'z',
 'T': 'z',
 'U': 'z',
 'V': 'z',
 'W': 'z',
 'X': 'z',
 'Y': 'z',
 'Z': 'z'}
# Build each value from its key
In [194]: dict_b = {key:key*key for key in range(11)}

In [195]: dict_b
Out[195]: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}
# Iterate over an iterable of (key, value) pairs
In [196]: list_phone = [('HUAWEI', '华为'), ('MI', '小米'), ('OPPO', 'OPPO'), ('VIVO', 'VIVO')]
     ...: dict_c = {key: value for key, value in list_phone}

In [197]: dict_c
Out[197]: {'HUAWEI': '华为', 'MI': '小米', 'OPPO': 'OPPO', 'VIVO': 'VIVO'}

# Keep only the items whose key is lowercase
In [203]: dict1 = {"a":10,"B":20,"C":True,"D":"hello world","e":"python教程"}
     ...: dict2 = {key:value for key,value in dict1.items() if key.islower()}

In [204]: dict2
Out[204]: {'a': 10, 'e': 'python教程'}
# Build a mapping from uppercase to lowercase letters
In [206]: dict_3 = {key:key.lower() for key in string.ascii_uppercase}

In [207]: dict_3
Out[207]:
{'A': 'a',
 'B': 'b',
 'C': 'c',
 'D': 'd',
 'E': 'e',
 'F': 'f',
 'G': 'g',
 'H': 'h',
 'I': 'i',
 'J': 'j',
 'K': 'k',
 'L': 'l',
 'M': 'm',
 'N': 'n',
 'O': 'o',
 'P': 'p',
 'Q': 'q',
 'R': 'r',
 'S': 's',
 'T': 't',
 'U': 'u',
 'V': 'v',
 'W': 'w',
 'X': 'x',
 'Y': 'y',
 'Z': 'z'}

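When to use

A dict suits lookup-heavy workloads: looking up a key takes O(1) time on average. Python also stores the attributes of objects and classes in a dict, accessible as __dict__. A dict typically uses several times the memory of a list or tuple holding the same items (the source cites three to four times), so use it with care where memory is tight.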

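2.4 set

Definition

A set is an unordered collection with no duplicate elements.

Common methods

Adding elements

# set() creates an empty set ({} would create an empty dict)
In [210]: set_a = set()

# add() adds a single element
In [211]: set_a.add('Justin')

# update() adds every element of an iterable
In [212]: set_a.update([i for i in range(1,11)])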

In [213]: set_a
Out[213]: {1, 10, 2, 3, 4, 5, 6, 7, 8, 9, 'Justin'}

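Deleting elements

# remove() raises KeyError if the element is not in the set
In [219]: set_a.remove('Justin')

In [220]: set_a
Out[220]: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# discard() does nothing if the element is not in the set
In [221]: set_a.discard('0')

In [222]: set_a
Out[222]: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Modifying elements

A set has no method for modifying an element in place. To change an element, remove it first and then add the new value.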

Looking up elements

A set cannot be accessed by index. Convert it to a list first if positional access is needed.
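A small sketch of both options, with a made-up `colors` set. Membership tests work directly on the set. For positional access the example uses `sorted()` rather than `list()`, an assumption made here so that the resulting order is deterministic.

```python
colors = {'red', 'green', 'blue'}

print('red' in colors)     # True: membership test needs no index

ordered = sorted(colors)   # a list in a well-defined order
print(ordered)             # ['blue', 'green', 'red']
print(ordered[0])          # blue
```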

Set operations

# Difference: elements in set_a but not in set_b
In [230]: set_a - set_b
Out[230]: {7, 8, 9, 10}
# Union
In [231]: set_a | set_b
Out[231]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
# Intersection
In [232]: set_a & set_b
Out[232]: {1, 2, 3, 4, 5, 6}
# Symmetric difference: elements in exactly one of the two sets
In [233]: set_a ^ set_b
Out[233]: {0, 7, 8, 9, 10}

In [234]: set_a
Out[234]: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [235]: set_b
Out[235]: {0, 1, 2, 3, 4, 5, 6}

Clearing a set and getting its length

In [236]: set_c = set_a | set_b

In [237]: set_c
Out[237]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [238]: len(set_c)
Out[238]: 11
In [241]: set_c.clear()

In [242]: len(set_c)
Out[242]: 0

Traversal

Iterate over a set with a for loop, as with a list; the iteration order is not guaranteed.

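When to use

Use a set when you only need to keep a collection of values and the values must not repeat. Adding and removing elements is allowed and is very efficient.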
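As a sketch of that use case, the example below removes duplicates from a list of made-up visitor names, first ignoring order and then keeping first-seen order with a set as the "already seen" cache.

```python
visitors = ['ann', 'bob', 'ann', 'cid', 'bob']

unique_visitors = set(visitors)   # duplicates collapse automatically
print(len(unique_visitors))       # 3

# Keep the first-seen order while using a set for fast membership checks
seen = set()
ordered_unique = []
for visitor in visitors:
    if visitor not in seen:
        seen.add(visitor)
        ordered_unique.append(visitor)
print(ordered_unique)             # ['ann', 'bob', 'cid']
```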

Chapter 3: Control Flow

3.1 if

Example

# Simple if
if True:
    pass
# if / else
if True:
    print('do something')
else:
    print('do something else')
# if / elif / else
if True:
    print('do something')
elif True:
    print('do something')
else:
    print('do something')

The following values are all falsy:
- None
- False
- any numeric zero
- 0 (int)
- 0.0 (float)
- "" (empty string)
- [] (empty list)
- () (empty tuple)
- {} (empty dict)
- set() (empty set)
```
In [209]: bool(None),bool(False),bool(0),bool([]),bool(()),bool({}),bool(""),bool(set())
Out[209]: (False, False, False, False, False, False, False, False)
```
Python 3 has no separate long integer type; int has arbitrary precision.

3.2 while

Example

count = 0
while count < 5:
    print(count, " is less than 5")
    count = count + 1
else:
    print(count, " is greater than or equal to 5")

3.3 for

Example
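The original leaves this example empty. The following is a minimal sketch of common for loop forms, with illustrative data: iterating with `enumerate`, skipping an item with `continue`, and the `else` clause, which runs only when the loop finishes without `break`.

```python
cellphones = ['Apple', 'HuaWei', 'Xiaomi']

# enumerate() yields (index, item) pairs; start=1 makes the count start at 1
for index, phone in enumerate(cellphones, start=1):
    print(index, phone)

total = 0
for n in range(1, 6):
    if n == 4:
        continue            # skip 4 and go on to the next value
    total += n
else:
    # runs because the loop ended normally (no break)
    print('loop finished without break')

print(total)                # 11  (1 + 2 + 3 + 5)
```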

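Chapter 4: Functions

4.1 Calling Functions

4.2 Defining Functions

4.3 Function Arguments

Arguments (the values passed in a call)

Parameters (the names in the function definition)

Positional arguments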

In [245]: def pos_arg(name,age):
     ...:     print(f'name is {name} age is {age}')
     ...:
In [247]: pos_arg("Justin",18)
name is Justin age is 18

Default parameter values

In [251]: def def_arg(number,n=2):
     ...:     print(number**n)
     ...:

In [252]: def_arg(2,3)
8

In [253]: def_arg(2)
4

Keyword arguments

In [254]: def key_arg(name,age):
     ...:     print(f'name is {name} age is {age}')
     ...:

In [255]: key_arg(age ='20',name = 'Justin')
name is Justin age is 20

Variable-length arguments

Arbitrary number of keyword arguments (collected into a dict)

In [272]: def fun1(height,weight,**args):
     ...:     info = {}
     ...:     info['height'] = height
     ...:     info['weight'] = weight
     ...:     for key,value in args.items():
     ...:         info[key] = value
     ...:     return info
     ...:
     ...:

In [273]: fun1(180,130,name="Justin",age=18)
Out[273]: {'height': 180, 'weight': 130, 'name': 'Justin', 'age': 18}

In [274]: fun1(180,130,name="Justin",age=18,university='清华大学')
Out[274]:
{'height': 180,
 'weight': 130,
 'name': 'Justin',
 'age': 18,
 'university': '清华大学'}

Arbitrary number of positional arguments (collected into a tuple)

In [264]: def fun(*args):
     ...:     total = 0
     ...:     for i in args:
     ...:         total += i
     ...:     return total
     ...:

In [265]: fun(1,2,3,4,5,6)
Out[265]: 21
In [266]: fun(1,2)
Out[266]: 3
In [267]: fun(1,2,3)
Out[267]: 6

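4.4 Commonly Used Functions

Higher-order functions

map

# map() applies int to every whitespace-separated token typed by the user
In [275]: a,*b = map(int,input().strip().split())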
1 2 3 4 5 6 7

In [276]: a
Out[276]: 1

In [277]: b
Out[277]: [2, 3, 4, 5, 6, 7]

reduce

In [279]: from functools import reduce
     ...: def add(x,y):
     ...:     return x+y
     ...:

In [280]: reduce(add,[1,3,5,7,9])
Out[280]: 25

filter

In [281]: def not_empty(s):
     ...:     return s and s.strip()
     ...:

In [282]: list(filter(not_empty,['A', '', 'B', None, 'C', ' ']))
Out[282]: ['A', 'B', 'C']

sorted

list3 = [-1,2,-100,3,-4,5]
print(sorted(list3,key=abs))#[-1, 2, 3, -4, 5, -100]
list3
# [-1, 2, -100, 3, -4, 5]

Anonymous functions (lambda)

list(map(lambda x:x*x,[1,2,3,4,5,6,7,8,9]))
# [1, 4, 9, 16, 25, 36, 49, 64, 81]

Built-in functions
- int
- str
- float
- type

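4.5 Storing Functions in Modules

4.6 Packing and Unpacking in Functions

In a function definition

# In a function definition, * collects all extra positional arguments into a new tuple and binds it to the parameter (conventionally args; named arg here)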
In [133]: def add(*arg):
     ...:     sum1 = 0
     ...:     for i in arg:
     ...:         sum1 +=i
     ...:     return sum1
     ...:

In [134]: add(1)
Out[134]: 1

In [135]: add(1,2)
Out[135]: 3

In [136]: add(1,2,3,4,5,6)
Out[136]: 21
# In a function definition, ** collects the keyword arguments into a dict and binds it to the parameter (conventionally kwargs; named args here)
In [137]: def fun1(**args):
     ...:     for key,value in args.items():
     ...:         print(f'key={key},value={value}')
     ...:

In [138]: fun1(a=1,b=2)
key=a,value=1
key=b,value=2

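In a function call

# In a function call, * unpacks a tuple or list into separate positional arguments
>>> def myfun(a, b):
...     print(a + b)
...
>>> n = [1, 2]
>>> myfun(*n)
3
>>> m = (1, 2)
>>> myfun(*m)
3
# In a function call, ** unpacks a dict into separate keyword arguments
>>> mydict = {'a':1, 'b': 2}
>>> myfun(**mydict)
3
# A single * on a dict unpacks only its keys, so the strings 'a' and 'b' are concatenated
>>> myfun(*mydict)
ab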

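4.7 Using `*args` and `**kwargs`

By convention, `*args` collects extra positional arguments into a tuple and `**kwargs` collects extra keyword arguments into a dict.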
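The two forms can be combined in one signature. The sketch below uses hypothetical functions `describe` and `add` to show collection in a definition and unpacking in a call.

```python
def describe(name, *args, **kwargs):
    """Return the fixed argument plus whatever extras were collected."""
    return name, args, kwargs


print(describe('Justin', 1, 2, city='Beijing'))
# ('Justin', (1, 2), {'city': 'Beijing'})


def add(a, b):
    return a + b


values = [1, 2]
params = {'a': 1, 'b': 2}
print(add(*values))    # 3: the list is unpacked into positional arguments
print(add(**params))   # 3: the dict is unpacked into keyword arguments
```

Positional parameters come first, then `*args`, then `**kwargs`; Python rejects any other order in the definition.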

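4.8 Higher-Order Functions

4.9 Decorators

Chapter 5: Object-Oriented Programming

5.1 Classes

self
__init__

5.2 Inheritance

5.3 Encapsulation

5.4 Duck Typing

5.5 Inner Classes

5.6 Decorators

5.7 Factory Methods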

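Chapter 6: Files and Directories

6.1 Common Directory Operations

Creating directories

# os.makedirs() creates the whole directory hierarchy recursively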
import os
os.makedirs('tmp/python/fileop',exist_ok=True)

exist_ok=True means no error is raised if the directory already exists.

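Deleting files or directories

# Delete a file
import os
os.remove('sdf.py')
# Delete a directory
# shutil.rmtree() recursively deletes a directory together with all its subdirectories and files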
import shutil
shutil.rmtree('tmp', ignore_errors=True)

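Copying files or directories

# Copy a file
from shutil import copyfile

# Copy d:/tools/first.py to e:/first.py
copyfile('d:/tools/first.py', 'e:/first.py')
# Copy a directory
from shutil import copytree

# Copy everything in d:/tools/aaa to e:/new/bbb
copytree('d:/tools/aaa', 'e:/new/bbb')
# Move a file or directory (source and dst are placeholder paths)
from shutil import move
move(source,dst)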

Renaming files or directories

import os

# Rename the directory d:/tools/aaa to d:/tools/bbb
os.rename('d:/tools/aaa','d:/tools/bbb')

# Rename the file d:/tools/first.py to d:/tools/second.py
os.rename('d:/tools/first.py','d:/tools/second.py')

Working with path names

In [317]: import os
# Get the file name
In [318]: os.path.basename(r'E:\oracle11gclient_X64\client\doc\index.htm')
Out[318]: 'index.htm'
# Get the directory name (raw strings keep the backslashes from being read as escapes)
In [319]: os.path.dirname(r'E:\oracle11gclient_X64\client\doc\index.htm')
Out[319]: 'E:\\oracle11gclient_X64\\client\\doc'
# Join path components
In [320]: os.path.join('temp','test',os.path.basename(r'E:\oracle11gclient_X64\client\doc\index.htm'))
Out[320]: 'temp\\test\\index.htm'

Checking whether a file or directory exists

import os
# Check whether a path (file or directory) exists
os.path.exists('d:/systems/cmd.exe')
os.path.exists('d:/systems')
# Check whether a path is a file; True means it is a file
os.path.isfile('d:/systems/cmd.exe')
# Check whether a path is a directory; True means it is a directory
os.path.isdir('d:/systems')

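File size and modification time

# Get the size and modification time of a file
# File size in bytes
>>> os.path.getsize('file1') 
3669

# Last modification time, in seconds since the epoch
>>> os.path.getmtime('file1') 
1272478234.0

# Convert the timestamp into a readable date and time
>>> import time
>>> time.ctime(os.path.getmtime('/etc/passwd'))
'Wed Apr 28 13:10:34 2010'
>>>

# Get the total size of a directory
import os
def getFileSize(filePath, size=0):
    # Walk the directory tree and add up the size of every file
    for root, dirs, files in os.walk(filePath):
        for f in files:
            size += os.path.getsize(os.path.join(root, f))
            print(f)
    return size

Getting the size of a directory simply means walking every file under it and adding up the sizes.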

Current working directory

In [286]: import os
# Get the current working directory
In [287]: cwd = os.getcwd()
# Change the current working directory
In [290]: os.chdir('E:\\')
In [291]: print(os.getcwd())
E:\

Recursively traversing all files under a directory

import os

# Target directory
targetDir = r'E:\test_makedirs'
files = []
dirs  = []

# os.walk() yields three values: dirpath, dirnames, filenames
# dirpath is the directory currently being visited
# dirnames is a list of the subdirectory names in dirpath
# filenames is a list of the file names in dirpath

for (dirpath, dirnames, filenames) in os.walk(targetDir):
    files += filenames
    dirs += dirnames

print(files) # ['aaa.txt', 'bbb.txt', 'ccc.txt', 'ddd.txt']
print(dirs) # ['a', 'b', 'c', 'd', 'aa', 'bb', 'cc', 'dd']
# Get the full path of every file under a directory
import os
# Target directory
targetDir = r'd:\tmp\util\dist\check'
for (dirpath, dirnames, filenames) in os.walk(targetDir):
    for fn in filenames:
        # Joining dirpath with each file name gives the full path
        fpath = os.path.join(dirpath, fn)

# List only the direct children of a directory (no recursion)
import os
from os.path import isfile, join,isdir
# Target directory
targetDir = r'd:\tmp\util\dist\check'
# All files
print([f for f in os.listdir(targetDir) if isfile(join(targetDir, f))])
# All directories
print([f for f in os.listdir(targetDir) if isdir(join(targetDir, f))])

# glob matches paths against a wildcard pattern
import glob
txt_files = glob.glob(r'd:\tmp\*.txt')
print(txt_files)

Practice

# Create the following files: E:\test_makedirs\a\aa\aaa.txt, E:\test_makedirs\b\bb\bbb.txt, E:\test_makedirs\c\cc\ccc.txt, E:\test_makedirs\d\dd\ddd.txt
In [353]: list1
Out[353]: ['a', 'b', 'c', 'd']

In [354]: for i in list1:
     ...:     path = os.path.join('E:\\test_makedirs',i,i*2)
     ...:     os.makedirs(path,exist_ok=True)
     ...:     with open(os.path.join(path,i*3+'.txt'),'w') as f:
     ...:         f.write(i*100)

6.2 Reading Files

Reading the whole file

targetFile = r'E:\666\file_open.txt'
with open(targetFile) as f:
    content = f.read()
    print(content.rstrip())

Reading line by line

targetFile = r'E:\666\file_open.txt'
# Iterating over a file object yields one line at a time, so "for line in f" works directly
with open(targetFile) as f:
    for line in f:
        print(line)
targetFile = r'E:\666\file_open.txt'
# Writing readlines() explicitly can be easier to read, but it loads every line into a list first
with open(targetFile) as f:
    for line in f.readlines():
        print(line)

6.3 Writing Files

Creating a new file

filename = 'programming.txt'
with open(filename, 'w',encoding='utf8') as file_object:
    file_object.write("I love programming.\n")
    file_object.write("I love creating new games.\n")

File modes: r (read only), w (write, truncating an existing file), a (append), r+ (read and write).
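The four modes can be compared side by side. This is a sketch that writes to a temporary directory so it leaves nothing behind; the file name `demo.txt` is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')        # hypothetical file name

    with open(path, 'w', encoding='utf8') as f:     # w: create or truncate
        f.write('first\n')
    with open(path, 'a', encoding='utf8') as f:     # a: append to the end
        f.write('second\n')
    with open(path, 'r+', encoding='utf8') as f:    # r+: read and write; the file must exist
        content = f.read()                          # reading moves the position to the end
        f.write('third\n')                          # so this write lands after 'second'
    with open(path, encoding='utf8') as f:          # default mode is r
        lines = f.read().splitlines()

print(content.splitlines())   # ['first', 'second']
print(lines)                  # ['first', 'second', 'third']
```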

Appending to a file

filename = 'programming.txt'
with open(filename, 'a',encoding='utf8') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

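Chapter 7: Exceptions

7.1 Exception Handling

# Example input values
first_number = '5'
second_number = '0'
# Only code that may raise an exception belongs in the try block
try:
    answer = int(first_number) / int(second_number)
# The except block runs when the try block raises the named exception
except ZeroDivisionError:
    print("You can't divide by 0!")
# Code that should run only when the try block succeeds goes in the else block
else:
    print(answer)

Using pass in an except block means the exception is silently ignored.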
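The same structure can be wrapped in a function. The sketch below is illustrative: `safe_divide` and `attempts` are hypothetical names, and the `finally` clause, which the original does not cover, is added to show code that runs whether or not an exception occurred.

```python
attempts = []


def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        result = a / b
    except ZeroDivisionError:
        return None              # handle the expected failure
    else:
        return result            # runs only when no exception was raised
    finally:
        attempts.append((a, b))  # always runs, even when returning


print(safe_divide(6, 3))   # 2.0
print(safe_divide(1, 0))   # None
print(attempts)            # [(6, 3), (1, 0)]
```

Returning None here is a deliberate choice for the example; silently swallowing an exception with `pass` hides errors, so it is usually better to handle only the specific exception you expect.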

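7.2 Logging

Common log levels

Level and description
- DEBUG: detailed information, typically useful only when diagnosing problems
- INFO: confirmation that the program is working as expected
- WARNING: something unexpected happened, or a problem may occur soon; the program still works
- ERROR: a more serious problem; the program could not perform some function
- CRITICAL: a very serious error; the program itself may be unable to continue running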
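The level set on a logger acts as a threshold: messages below it are dropped. The sketch below writes to an in-memory stream instead of a file so the result can be inspected; the logger name `demo_levels` and the messages are made up.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger('demo_levels')   # hypothetical logger name
logger.setLevel(logging.WARNING)            # threshold: WARNING and above
logger.propagate = False                    # keep output out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
logger.addHandler(handler)

logger.debug('not recorded')                # below the threshold
logger.info('not recorded either')          # below the threshold
logger.warning('disk almost full')
logger.error('could not save file')

print(stream.getvalue())
# WARNING disk almost full
# ERROR could not save file
```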

Practice

import logging
logging.basicConfig(filename='program1.log', format='%(asctime)s %(message)s', level=logging.INFO)
logging.info('Logging app started')
logging.warning('An example logging message.')
logging.warning('Another log message')
# program1.log then contains the following lines
"""
2022-02-23 15:08:34,021 Logging app started
2022-02-23 15:08:34,021 An example logging message.
2022-02-23 15:08:34,021 Another log message
"""

7.3 Storing Data

Serialization

json.dump()
Deserialization

json.load()

Practice

# Serialization and deserialization
import json


def get_stored_username():
    """Return the stored username if there is one (deserialization)."""
    filename = 'username.json'
    try:
        with open(filename) as f_obj:
            # Read the data from the file into the variable username
            username = json.load(f_obj)
    except FileNotFoundError:
        return None
    else:
        return username


def get_new_username():
    """Prompt for a username and store it (serialization)."""
    username = input("what is your name? ")
    filename = 'username.json'
    # json.dump() takes two arguments: the data to store and a file object to write it to
    with open(filename, 'w') as f_obj:
        json.dump(username, f_obj)
    return username


def greet_user():
    """Greet the user by name."""
    username = get_stored_username()
    if username:
        print("Welcome back, " + username + "!")
    else:
        username = get_new_username()
        print("We'll remember you when you come back, " + username + "!")


greet_user()

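Summary

Ways to serialize data in Python
- json
- pickle (widely used for saving models in machine learning)
- shelve
- marshal
- joblib (also widely used for saving machine learning models; supports multiple processes and is more efficient than pickle)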
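To contrast the first two entries, here is a small sketch with made-up data. pickle can round-trip Python-specific types such as tuples and sets, while json is limited to JSON-compatible types. Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.

```python
import json
import pickle

record = {'name': 'Justin', 'scores': (90, 89), 'tags': {'a', 'b'}}

# pickle preserves tuples and sets exactly
restored = pickle.loads(pickle.dumps(record))
print(restored == record)          # True

# json cannot serialize a set
try:
    json.dumps(record)
    json_failed = False
except TypeError:
    json_failed = True
print(json_failed)                 # True

# With JSON-compatible types the round trip works
json_ready = {'name': 'Justin', 'scores': [90, 89]}
print(json.loads(json.dumps(json_ready)) == json_ready)   # True
```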

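Chapter 8: Testing Code

8.1 Key Concepts

Unit test

Verifies that one specific aspect of a function's behavior is correct.
Test case

A collection of unit tests that together verify that a function behaves as required in all the situations it is expected to handle.
Full-coverage test case

A complete set of unit tests covering every possible way the function can be used.
Common assertion methods
- assertEqual(a,b): verify that a == b
- assertNotEqual(a,b): verify that a != b
- assertTrue(x): verify that x is True
- assertFalse(x): verify that x is False
- assertIn(item,list): verify that item is in list
- assertNotIn(item,list): verify that item is not in list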

8.2 Testing a Function

Function under test

# Module containing the function under test: NameFunction.py
def get_formatted_name(first, last, middle=''):
    if middle:
        full_name = first + ' ' + middle + ' ' + last
    else:
        full_name = first + ' ' + last
    return full_name.title()

Unit test

# Test module: NamesTestCase.py
import unittest
from NameFunction import get_formatted_name


# A test class must inherit from unittest.TestCase
class NamesTestCase(unittest.TestCase):
    """Tests for NameFunction.py"""

    def test_first_last_name(self):
        formatted_name = get_formatted_name('james', 'hardon')
        self.assertEqual(formatted_name, 'James Hardon')

    def test_first_last_middle_name(self):
        """Test that a middle name is handled correctly"""
        formatted_name = get_formatted_name(
            'wolfgang', 'mozart', 'amadeus')
        self.assertEqual(formatted_name, 'Wolfgang Amadeus Mozart')


if __name__ == '__main__':
    unittest.main()

8.3 Testing a Class

Class under test

class AnonymousSurvey():
    """Collect answers to an anonymous survey"""

    def __init__(self, question):
        """Store a question and prepare to store answers"""
        self.question = question
        self.responses = []

    def show_question(self):
        """Show the survey question"""
        print(self.question)

    def store_response(self, new_response):
        """Store a single response"""

        self.responses.append(new_response)

    def show_results(self):
        """Show all the responses collected"""
        print("Survey results:")
        for response in self.responses:
            print('- ' + response)

Test class

import unittest
from AnonymousSurvey import AnonymousSurvey


class SurveyTestCase(unittest.TestCase):
    """Tests for the AnonymousSurvey class"""

    def setUp(self):
        """Create a survey object and a set of responses for the test methods to use"""
        question = "What language did you first learn to speak?"
        self.my_survey = AnonymousSurvey(question)
        self.responses = ['English', 'Spanish', 'Mandarin']

    def test_store_single_response(self):
        """Test that a single response is stored properly"""

        self.my_survey.store_response(self.responses[0])
        self.assertIn(self.responses[0], self.my_survey.responses)

    def test_store_three_responses(self):
        """Test that three responses are stored properly"""

        for response in self.responses:
            self.my_survey.store_response(response)
        for response in self.responses:
            self.assertIn(response, self.my_survey.responses)


if __name__ == '__main__':
    unittest.main()

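The unittest.TestCase class provides a setUp() method, so the shared objects only need to be written once and can be used in every test method. If a TestCase class includes setUp(), Python runs it before each method whose name starts with test_, so every test method can use the objects created in setUp().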

Chapter 9: Multithreading

https://github.com/Adopat/Python-tutorial/tree/master/10.%E5%A4%9A%E7%BA%BF%E7%A8%8B

Chapter 10: Multiprocessing

https://github.com/Adopat/Python-tutorial/tree/master/11.%E5%A4%9A%E8%BF%9B%E7%A8%8B

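Chapter 11: `NumPy`

Chapter 12: Pandas

12.1 Key Concepts

12.2 Common Operations

Getting to know the data

# View the first rows and the last rows
df.head()
df.tail()
# View basic information
df.info()
# View the index
df.index
# View the columns
df.columns 
# Get the name of one column, here the second column
df.columns[1]
# Number of rows
df.shape[0]
# Number of columns
df.shape[1]
# Data type of one column
df.A.dtype  # or df.dtypes['A']
# Select one column; use df[['A','B']] to select several columns
df.A  # or df['A']
# Count how many times each value occurs in a column
df.A.value_counts()
# Number of distinct values in a column
df.A.value_counts().count()
## Example session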
In [442]: test_dict
Out[442]:
{'id': [1, 2, 3, 4, 5, 6],
 'name': ['Alice', 'Bob', 'Cindy', 'Eric', 'Helen', 'Grace '],
 'math': [90, 89, 99, 78, 97, 93],
 'english': [89, 94, 80, 94, 94, 90],
 'gender': ['male', 'male', 'fmale', 'fmale', 'fmale', 'fmale']}

In [443]: import pandas as pd
# Create a DataFrame object
In [444]: df = pd.DataFrame(test_dict)

In [445]: df
Out[445]:
   id    name  math  english gender
0   1   Alice    90       89   male
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale
# df.head() shows the first five rows by default
In [446]: df.head()
Out[446]:
   id   name  math  english gender
0   1  Alice    90       89   male
1   2    Bob    89       94   male
2   3  Cindy    99       80  fmale
3   4   Eric    78       94  fmale
4   5  Helen    97       94  fmale
# df.tail() shows the last five rows by default
In [447]: df.tail()
Out[447]:
   id    name  math  english gender
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale
# df.columns: the column names
In [449]: df.columns
Out[449]: Index(['id', 'name', 'math', 'english', 'gender'], dtype='object')
# df.index: the index
In [450]: df.index
Out[450]: RangeIndex(start=0, stop=6, step=1)
# Number of rows: df.shape[0]
In [451]: df.shape[0]
Out[451]: 6
# Number of columns: df.shape[1]; df.shape returns both rows and columns
In [452]: df.shape[1]
Out[452]: 5
# View one column
In [453]: df.gender
Out[453]:
0     male
1     male
2    fmale
3    fmale
4    fmale
5    fmale
Name: gender, dtype: object
# Count how many times each value occurs in a column
In [454]: df.gender.value_counts()
Out[454]:
fmale    4
male     2
Name: gender, dtype: int64
# df.describe() gives a quick statistical summary; by default only numeric columns are included
In [474]: df.describe()
Out[474]:
             id       math    english
count  6.000000   6.000000   6.000000
mean   3.500000  91.000000  90.166667
std    1.870829   7.456541   5.455884
min    1.000000  78.000000  80.000000
25%    2.250000  89.250000  89.250000
50%    3.500000  91.500000  92.000000
75%    4.750000  96.000000  94.000000
max    6.000000  99.000000  94.000000
# Pass include='all' to df.describe() to summarize every column, not just the numeric ones
In [475]: df.describe(include='all')
Out[475]:
              id   name       math    english gender
count   6.000000      6   6.000000   6.000000      6
unique       NaN      6        NaN        NaN      2
top          NaN  Helen        NaN        NaN  fmale
freq         NaN      1        NaN        NaN      4
mean    3.500000    NaN  91.000000  90.166667    NaN
std     1.870829    NaN   7.456541   5.455884    NaN
min     1.000000    NaN  78.000000  80.000000    NaN
25%     2.250000    NaN  89.250000  89.250000    NaN
50%     3.500000    NaN  91.500000  92.000000    NaN
75%     4.750000    NaN  96.000000  94.000000    NaN
max     6.000000    NaN  99.000000  94.000000    NaN
# Summarize a single column
In [476]: df.math.describe()
Out[476]:
count     6.000000
mean     91.000000
std       7.456541
min      78.000000
25%      89.250000
50%      91.500000
75%      96.000000
max      99.000000
Name: math, dtype: float64
# Check a column's dtype: df.A.dtype
In [457]: df.gender.dtype
Out[457]: dtype('O')

In [458]: df.dtypes['math']
Out[458]: dtype('int64')

In [459]: df.dtypes['gender']
Out[459]: dtype('O')
# count(distinct): value_counts().count() is equivalent to nunique()
In [460]: df.gender.value_counts().count()
Out[460]: 2
# The DataFrame counterpart of SQL count(distinct)
In [472]: df.gender.nunique()
Out[472]: 2
# Basic information about the DataFrame
In [462]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   id       6 non-null      int64
 1   name     6 non-null      object
 2   math     6 non-null      int64
 3   english  6 non-null      int64
 4   gender   6 non-null      object
dtypes: int64(3), object(2)
memory usage: 368.0+ bytes
# Select multiple columns
In [470]: df[['name','gender']]
Out[470]:
     name gender
0   Alice   male
1     Bob   male
2   Cindy  fmale
3    Eric  fmale
4   Helen  fmale
5  Grace   fmale
# Select one column
In [471]: df['name']
Out[471]:
0     Alice
1       Bob
2     Cindy
3      Eric
4     Helen
5    Grace
Name: name, dtype: object
# Get a column name by position
In [478]: df.columns[1]
Out[478]: 'name'

Filtering and sorting

# Filtering
In [483]: df
Out[483]:
   id    name  math  english gender
0   1   Alice    90       89   male
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale
# Rows with an English score > 90
In [485]: df.query('english>90')
Out[485]:
   id   name  math  english gender
1   2    Bob    89       94   male
3   4   Eric    78       94  fmale
4   5  Helen    97       94  fmale
# Number of distinct genders among rows with English > 90
In [486]: df.query('english>90').gender.nunique()
Out[486]: 2
# Gender distribution among rows with English > 90
In [487]: df.query('english>90').gender.value_counts()
Out[487]:
fmale    2
male     1
Name: gender, dtype: int64
# Rows with English > 90 and math > 90
In [489]: df.query('english>90&math>90')
Out[489]:
   id   name  math  english gender
4   5  Helen    97       94  fmale
# With boolean indexing, wrap each condition in its own parentheses
In [492]: df[(df['english']>90) & (df['math']>90)]
Out[492]:
   id   name  math  english gender
4   5  Helen    97       94  fmale
# Sort by math score in ascending order (ascending=True is the default)
In [495]: df.sort_values(by='math')
Out[495]:
   id    name  math  english gender
3   4    Eric    78       94  fmale
1   2     Bob    89       94   male
0   1   Alice    90       89   male
5   6  Grace     93       90  fmale
4   5   Helen    97       94  fmale
2   3   Cindy    99       80  fmale
# Sort by math score in descending order
In [496]: df.sort_values(by='math',ascending=False)
Out[496]:
   id    name  math  english gender
2   3   Cindy    99       80  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale
0   1   Alice    90       89   male
1   2     Bob    89       94   male
3   4    Eric    78       94  fmale

In [497]: df.sort_values(by='math',ascending=True)
Out[497]:
   id    name  math  english gender
3   4    Eric    78       94  fmale
1   2     Bob    89       94   male
0   1   Alice    90       89   male
5   6  Grace     93       90  fmale
4   5   Helen    97       94  fmale
2   3   Cindy    99       80  fmale
# Sort just the math column in ascending order
In [498]: df.math.sort_values()
Out[498]:
3    78
1    89
0    90
5    93
4    97
2    99
Name: math, dtype: int64
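# isin: the DataFrame counterpart of SQL IN ('Grace' is not matched below because the stored name is 'Grace ' with a trailing space)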
In [500]: df[df['name'].isin(['Alice','Bob','Grace'])]
Out[500]:
   id   name  math  english gender
0   1  Alice    90       89   male
1   2    Bob    89       94   male
In [502]: df.query("name in ['Alice','Bob']")
Out[502]:
   id   name  math  english gender
0   1  Alice    90       89   male
1   2    Bob    89       94   male
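The name 'Grace ' in the sample data carries a trailing space, so exact matching with isin skips it. A minimal sketch, assuming a small stand-in DataFrame (not the full df above), of stripping whitespace before matching:

```python
import pandas as pd

# Small stand-in for the df used above; note the trailing space in 'Grace '.
df = pd.DataFrame({
    'id': [1, 2, 6],
    'name': ['Alice', 'Bob', 'Grace '],
    'math': [90, 89, 93],
})

wanted = ['Alice', 'Bob', 'Grace']

# Exact matching misses 'Grace ' because of the trailing space.
raw_match = df[df['name'].isin(wanted)]

# Stripping whitespace first makes the comparison behave as intended.
clean_match = df[df['name'].str.strip().isin(wanted)]

print(raw_match['name'].tolist())
print(clean_match['id'].tolist())
```

Whether to strip only for the comparison (as here) or to clean the column once up front is a design choice; cleaning once is usually simpler when the column is filtered repeatedly.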
# Set the index
df1 = df.copy()
df1.set_index('id',inplace=True)

Grouping

In [512]: df
Out[512]:
   id    name  math  english gender
0   1   Alice    90       89   male
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale

In [513]: df.groupby('gender')['math'].max()
Out[513]:
gender
fmale    99
male     90
Name: math, dtype: int64

In [514]: df.groupby('gender').math.max()
Out[514]:
gender
fmale    99
male     90
Name: math, dtype: int64

In [515]: df.groupby('gender').agg({'math':'max'})
Out[515]:
        math
gender
fmale     99
male      90

In [516]: df.groupby('gender').math.agg(['max','min','mean'])
Out[516]:
        max  min   mean
gender
fmale    99   78  91.75
male     90   89  89.50
In [519]: df.groupby('gender').describe()
Out[519]:
          id                                            math                                                  english
       count mean       std  min   25%  50%   75%  max count   mean       std   min    25%   50%    75%   max   count  mean       std   min    25%   50%    75%   max
gender
fmale    4.0  4.5  1.290994  3.0  3.75  4.5  5.25  6.0   4.0  91.75  9.500000  78.0  89.25  95.0  97.50  99.0     4.0  89.5  6.608076  80.0  87.50  92.0  94.00  94.0
male     2.0  1.5  0.707107  1.0  1.25  1.5  1.75  2.0   2.0  89.50  0.707107  89.0  89.25  89.5  89.75  90.0     2.0  91.5  3.535534  89.0  90.25  91.5  92.75  94.0
# This form is recommended
In [520]: df.groupby('gender')['math'].agg(['max','min','mean'])
Out[520]:
        max  min   mean
gender
fmale    99   78  91.75
male     90   89  89.50
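When several columns need different aggregations with readable output names, pandas' named aggregation is one option. A minimal sketch on a copy of the sample data; the output column names (math_max, math_mean, english_min) are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Cindy', 'Eric', 'Helen', 'Grace'],
    'math': [90, 89, 99, 78, 97, 93],
    'english': [89, 94, 80, 94, 94, 90],
    'gender': ['male', 'male', 'fmale', 'fmale', 'fmale', 'fmale'],
})

# Each keyword is an output column: (source column, aggregation name).
summary = df.groupby('gender').agg(
    math_max=('math', 'max'),
    math_mean=('math', 'mean'),
    english_min=('english', 'min'),
)
print(summary)
```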

apply

# apply (operates on a column or a row)
In [5]: df
Out[5]:
   id    name  math  english gender
0   1   Alice    90       89   male
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale

In [6]: def transform_gender(gender):
   ...:     if gender == 'male':
   ...:         return '男'
   ...:     elif gender == 'fmale':
   ...:         return '女'
   ...:     else:
   ...:         return None
   ...:
   ...:

In [7]: df1 = df.copy()

In [8]: df1['性别']=df.gender.apply(transform_gender)

In [9]: df1
Out[9]:
   id    name  math  english gender 性别
0   1   Alice    90       89   male  男
1   2     Bob    89       94   male  男
2   3   Cindy    99       80  fmale  女
3   4    Eric    78       94  fmale  女
4   5   Helen    97       94  fmale  女
5   6  Grace     93       90  fmale  女
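# applymap (operates on every element); in pandas 2.1+ it is deprecated in favor of DataFrame.map
# Multiply every integer by 10 (df2 is not defined above; presumably a copy of df)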
In [18]: df2.applymap(lambda x :x*10 if (type(x) is int) else x)
Out[18]:
   id    name  math  english gender
0  10   Alice   900      890   male
1  20     Bob   890      940   male
2  30   Cindy   990      800  fmale
3  40    Eric   780      940  fmale
4  50   Helen   970      940  fmale
5  60  Grace    930      900  fmale
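For a plain value-to-value translation like the gender labels, Series.map with a dict is a common alternative to apply with a hand-written function. A minimal sketch, assuming a tiny stand-in DataFrame with one value ('unknown') that has no mapping:

```python
import pandas as pd

df = pd.DataFrame({'gender': ['male', 'fmale', 'unknown']})

# Lookup table replacing the if/elif chain; unmapped values become NaN.
gender_labels = {'male': '男', 'fmale': '女'}
df['性别'] = df['gender'].map(gender_labels)

print(df)
```

The dict keeps the mapping in one place; values missing from the dict come back as NaN rather than None.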

Joining

# concat: recent pandas versions discourage append() for union-style operations (it was removed in pandas 2.0); use concat() instead
In [27]: raw_data_1 = {
    ...:         'subject_id': ['1', '2', '3', '4', '5'],
    ...:         'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
    ...:         'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
    ...: 
    ...: raw_data_2 = {
    ...:         'subject_id': ['4', '5', '6', '7', '8'],
    ...:         'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
    ...:         'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
    ...: 
    ...: raw_data_3 = {
    ...:         'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    ...:         'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

In [28]: data1 = pd.DataFrame(raw_data_1)

In [29]: data2 = pd.DataFrame(raw_data_2)

In [30]: data3 = pd.DataFrame(raw_data_3)

In [31]: data1_data2 = pd.concat([data1,data2])

In [32]: data1_data2
Out[32]:
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
# ignore_index=True resets the index; the default axis=0 concatenates rows, axis=1 concatenates columns
In [33]: data1_data2 = pd.concat([data1,data2],ignore_index=True)

In [34]: data1_data2
Out[34]:
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
5          4      Billy    Bonder
6          5      Brian     Black
7          6       Bran   Balwner
8          7      Bryce     Brice
9          8      Betty    Btisan
# merge works like a SQL join
In [63]: df_3 = pd.merge(df_1,df_2,how='left',left_on='lkey',right_on='rkey')

In [64]: df_3
Out[64]:
  lkey  value rkey  height
0  foo      1  foo       5
1  foo      1  foo       8
2  bar      2  bar       6
3  baz      3  baz       7
4  foo      5  foo       5
5  foo      5  foo       8

In [65]: df_1
Out[65]:
  lkey  value
0  foo      1
1  bar      2
2  baz      3
3  foo      5

In [66]: df_2
Out[66]:
  rkey  height
0  foo       5
1  bar       6
2  baz       7
3  foo       8
# This merge form is recommended; to avoid showing a duplicate key column such as rkey, rename it before joining
In [68]: df_4 = df_1.merge(df_2,how='left',left_on='lkey',right_on='rkey')

In [69]: df_4
Out[69]:
  lkey  value rkey  height
0  foo      1  foo       5
1  foo      1  foo       8
2  bar      2  bar       6
3  baz      3  baz       7
4  foo      5  foo       5
5  foo      5  foo       8
# Columns with the same name get _x and _y suffixes automatically; customize them with suffixes=('_left','_right')
In [71]: df_2.rename(columns={'height':'value'},inplace=True)

In [72]: df_2
Out[72]:
  rkey  value
0  foo      5
1  bar      6
2  baz      7
3  foo      8

In [73]: df_4 = df_1.merge(df_2,how='left',left_on='lkey',right_on='rkey')

In [74]: df_4
Out[74]:
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8
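The following sketch puts the renaming and suffix advice together, and adds indicator=True to show which rows found a match. The frames are simplified stand-ins for df_1 and df_2 (the 'qux' key is invented so that one row has no match):

```python
import pandas as pd

df_1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'qux'], 'value': [1, 2, 3, 5]})
df_2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz'], 'value': [5, 6, 7]})

merged = df_1.merge(
    df_2.rename(columns={'rkey': 'lkey'}),  # rename first so only one key column remains
    how='left',
    on='lkey',
    suffixes=('_left', '_right'),           # replaces the default _x / _y
    indicator=True,                         # adds a _merge column: both / left_only / right_only
)
print(merged)
```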

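Summary of operations commonly used at work

1. df.isna().sum() # count missing values in every column

df.isna().sum()/df.shape[0] # missing-value ratio of every column

df['feature'].isna().sum() # count missing values in a single column

df['feature'].isna().sum()/df.shape[0] # missing-value ratio of a single column

2. loc&iloc
df.loc[a, b] # a selects rows, b selects columns
df.loc[:, :] # ':' here selects all rows and all columns
df.loc[[index], :] # index is a label from the DataFrame's index; selects the rows with that label
df.loc[index:, 'feature_1':'feature_5'] # rows from index onward, columns from feature_1 through feature_5 (inclusive)
df.loc[index] # with a single argument, loc selects rows by index label
3. df.loc[df.feature == 1, 'feature'] = 0 # conditional update (here: set every element of column feature that equals 1 to 0)
4.merge pd.merge(df_1, df_2, how='', left_on='', right_on='')
5.concat pd.concat([df_1, df_2], axis=1) # axis=1 concatenates columns; axis=0 concatenates rows
6.for index, row in df.iterrows(): iterate over each row of the DataFrame
7.Delete a column: del df['A'] (del df.A does not work)
8.Change the dtype: df = df.astype('float32')
9.Get the top n: df.feature_name.nlargest(2) df.feature_name.nsmallest(2)
10. SQL-style CASE WHEN with apply: df['a']=df.A.apply(lambda x:cun(x)) # cun is a user-defined function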
11. Handling CSV files that are too large to read at once
data = pd.read_csv(path, sep=',', iterator=True)
loop = True
chunkSize = 1000
chunks = []
index = 0
while loop:
    try:
        print(index)
        chunk = data.get_chunk(chunkSize)
        chunks.append(chunk)
        index += 1
    except StopIteration:
        loop = False

print("Iteration is stopped.")

print('Start merging')

data = pd.concat(chunks, ignore_index=True)
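If the goal is an aggregate rather than the full table, the chunks do not need to be kept in memory at all. A minimal sketch using the chunksize parameter of read_csv; the CSV text is generated in memory purely for illustration:

```python
import io
import pandas as pd

# In-memory stand-in for a large CSV file (10 rows, columns id and score).
csv_text = "id,score\n" + "\n".join(f"{i},{i % 7}" for i in range(10))

total = 0
n_rows = 0
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk['score'].sum()
    n_rows += len(chunk)

print(n_rows, total)
```

Collecting the chunks and concatenating them, as in the loop above, still needs enough memory for the whole file; aggregating per chunk avoids that.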
14. Check whether the data contains infinite values
np.isinf(data[i]).any()
15. Count occurrences of each value in a feature
data[feature_1].value_counts()
16. Remove thousands separators (commas) and convert the dtype
df['feature_1'] = ['100,000', '200,000', '300,000', '400,000']
df['feature_1'] = df['feature_1'].apply(lambda x: str(x).replace(',', '')).astype('float')
17. Pad strings to a fixed width with a custom character in pandas
df['feature_2'] = ['405', '8094', '100', '22']
df['feature_2'].str.pad(width=6, side='left', fillchar='*')
# width: target string length
# side: side to pad on, 'left' or 'right'
# fillchar: padding character
18. Filtering a DataFrame by conditions
df.loc[df['feature_1'] > 100000]
df.loc[(df['feature_1'] > 100000) & (df['feature_2'] != '100')]
df.query()
In [78]: df
Out[78]:
   id    name  math  english gender
0   1   Alice    90       89   male
1   2     Bob    89       94   male
2   3   Cindy    99       80  fmale
3   4    Eric    78       94  fmale
4   5   Helen    97       94  fmale
5   6  Grace     93       90  fmale
# Select one column for the rows that satisfy a condition
In [79]: df.loc[df.math>90,'name']
Out[79]:
2     Cindy
4     Helen
5    Grace
Name: name, dtype: object
19. Removing duplicates from a DataFrame
df['feature_3'] = [200, 300, 400, 200]
df.drop_duplicates(subset = ['feature_3'], keep='first')
# subset: de-duplicate on the given columns; if omitted, all columns are used
# keep: 'first' or 'last', i.e. keep the first or the last occurrence of each duplicate
20. Converting a dict to a DataFrame
dict_a = {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
df2 = pd.DataFrame.from_dict(dict_a, orient='index', columns=['a'])
df2.reset_index().rename(columns={'index': 'abcd', 'a': 'ABCD'})
21. apply with a condition on multiple columns
def function(a, b):
    if a >= 100000 and b == '100':
        return 1
    else:
        return 0

df['test'] = df.apply(lambda x: function(x.feature_1, x.feature_2), axis = 1)
22. Ranking within groups in pandas, similar to SQL window functions
df.groupby([feature_1])[feature_2].rank(ascending=True, method='first')
# equivalent to row_number() over(partition by feature_1 order by feature_2 asc): 1,2,3,4,5,6
df.groupby([feature_1])[feature_2].rank(ascending=True, method='min')
# equivalent to rank() over(partition by feature_1 order by feature_2 asc): 1,2,3,3,3,6 (gaps after ties)
df.groupby([feature_1])[feature_2].rank(ascending=True, method='dense')
# equivalent to dense_rank() over(partition by feature_1 order by feature_2 asc): 1,2,3,3,3,4 (no gaps)
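To see how each rank method treats ties, it helps to run them side by side on one group. A minimal sketch with invented data (column names grp and score are placeholders); method='max' is included only for contrast:

```python
import pandas as pd

df = pd.DataFrame({
    'grp': ['a'] * 6,
    'score': [10, 20, 30, 30, 30, 40],
})

def group_rank(method):
    # Rank score within each grp using the given tie-breaking method.
    return df.groupby('grp')['score'].rank(ascending=True, method=method).astype(int)

ranks = pd.DataFrame({
    'row_number': group_rank('first'),  # ties broken by order of appearance
    'rank': group_rank('min'),          # ties share the lowest rank, gaps follow
    'dense_rank': group_rank('dense'),  # ties share a rank, no gaps
    'max_rank': group_rank('max'),      # ties share the highest rank
})
print(ranks)
```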
23. Merging rows in pandas
import pandas as pd
import numpy as np
import os
os.chdir(r'C:/Users/young/Desktop')
# Read the data
df=pd.read_excel('多行合并.xlsx')
# Define the concatenation function; values are de-duplicated per field
def concat_func(x):
    return pd.Series({
        '爱好':','.join(x['爱好'].unique()),
        '性别':','.join(x['性别'].unique())
    }
    )
# Group, aggregate, and concatenate
result=df.groupby(df['姓名']).apply(concat_func).reset_index()
# Show the result
result
24. String operations: .str.contains works like SQL LIKE (the pattern is a regular expression by default)
df[df['colB'].str.contains('a|b')]
# ~ negates the condition
df[~df['colA'].isin(['A','B'])]
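One practical detail with .str.contains is missing values: without na=False the mask contains NaN and cannot be used directly for filtering. A minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['A', 'B', 'C', 'D'],
    'colB': ['apple', 'berry', None, 'cod'],
})

# 'a|b' is a regex alternation: rows whose colB contains 'a' or 'b'.
# na=False treats missing values as "no match" instead of NaN.
mask = df['colB'].str.contains('a|b', na=False)
contains_ab = df[mask]

# ~ negates a boolean mask: rows whose colA is NOT in the list.
not_in_ab = df[~df['colA'].isin(['A', 'B'])]

print(contains_ab)
print(not_in_ab)
```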
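25. Column concatenation with .str.cat; the columns being joined must have the same dtype
data['合并2']=data['姓名'].str.cat(data['性别'],sep=',').str.cat(data['身份'],sep=',')
26. Filling missing values
data['类别'] = data['类别'].fillna('others')
27. Dropping missing values: how='any' drops a row/column containing any missing value, how='all' drops only rows/columns that are entirely empty; axis=0 drops rows
data.dropna(subset=['品牌'],how='any',axis=0,inplace=True)
28. Removing duplicate rows
data.drop_duplicates(inplace=True)
29. Converting to datetime
pd.to_datetime(df['打卡时间'].str[:19],format="%Y-%m-%d %H:%M:%S")
30. Inserting a column at a specified position in pandas
df.insert(loc=0, column='#', value=df.index)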

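Comparison of SQL and pandas

resample: time-series analysis

Chapter 13: GUI Programming

13.1 Common GUI toolkits

13.2 `PythonSimpleGUI`

Chapter 14: Miscellaneous

14.1 Date and time operations

Commonly used time-handling packages

Common operations

Getting the current time

# Get the current timestamp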
In [393]: import time
In [394]: time.time()
Out[394]: 1645579485.5731077

In [395]: from datetime import datetime
# Get the current time as a datetime object; str() converts it to a string
In [396]: datetime.now()
Out[396]: datetime.datetime(2022, 2, 23, 9, 25, 24, 717509)

In [397]: str(datetime.now())
Out[397]: '2022-02-23 09:25:35.981446'

Formatting time

Time to string

In [398]: time.localtime()
Out[398]: time.struct_time(tm_year=2022, tm_mon=2, tm_mday=23, tm_hour=9, tm_min=26, tm_sec=8, tm_wday=2, tm_yday=54, tm_isdst=0)
# Time to string
In [399]: time.strftime('%Y-%m-%d %H:%M:%S',time.localtime())
Out[399]: '2022-02-23 09:27:03'

In [400]: datetime.now().strftime('%Y-%m-%d %H:%M:%S')
Out[400]: '2022-02-23 09:27:47'
# Numeric timestamp to string
In [402]: time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
Out[402]: '2022-02-23 09:51:14'

String to time

# String to time
In [404]: time.strptime('2022-02-01','%Y-%m-%d')
Out[404]: time.struct_time(tm_year=2022, tm_mon=2, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=32, tm_isdst=-1)
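# String to integer timestamp (mktime interprets the struct_time as local time, so the result depends on the time zone; the output below corresponds to UTC+8)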
In [406]: int(time.mktime(time.strptime('2022-02-01','%Y-%m-%d')))
Out[406]: 1643644800
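Because time.mktime depends on the machine's local time zone, the same string can yield different timestamps on different machines. A minimal sketch, assuming the string is meant to be interpreted as UTC, that gives a reproducible result with datetime:

```python
from datetime import datetime, timezone

# Parse the string into a naive datetime (no time zone attached yet).
naive = datetime.strptime('2022-02-01', '%Y-%m-%d')

# Assumption for this example: the string represents midnight UTC.
utc_dt = naive.replace(tzinfo=timezone.utc)
ts = int(utc_dt.timestamp())

# Convert the timestamp back, again stating the time zone explicitly.
back = datetime.fromtimestamp(ts, tz=timezone.utc)

print(ts)
print(back.strftime('%Y-%m-%d %H:%M:%S'))
```

The difference from the mktime output above (1643644800) is exactly 8 hours, i.e. the UTC+8 offset.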

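Converting an ISO-format string to local time

import dateutil.parser

# Parse the string into a datetime object
dt = dateutil.parser.isoparse('2008-09-03T20:56:35.450686+00:00')

# Convert to a datetime object in the local time zone
localdt = dt.astimezone(tz=None)
# Produce a formatted local-time string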
localdt.strftime('%Y-%m-%d %H:%M:%S')

Getting the year, month, day, and weekday

>>> from datetime import datetime
>>> datetime.now()
datetime.datetime(2018, 6, 30, 23, 3, 54, 238947)

# Year
>>> datetime.now().year
2018

# Month
>>> datetime.now().month
6

# Day
>>> datetime.now().day
30

# Hour
>>> datetime.now().hour
23

# Minute
>>> datetime.now().minute
7

# Second
>>> datetime.now().second
58

# Microsecond
>>> datetime.now().microsecond
151169

# Use the weekday() method to get the day of the week
# 0 means Monday, 1 means Tuesday, and so on
>>> datetime.now().weekday() 
5

Shifting a given date

thatDay = "2018-6-24"
from datetime import datetime,timedelta
theDay = datetime.strptime(thatDay, "%Y-%m-%d").date()

# To move 120 days forward, add timedelta(days=120)
target = theDay + timedelta(days=120)

print(target)
print(target.weekday())

# To move 120 days back, subtract timedelta(days=120)
target = theDay - timedelta(days=120)

print(target)
print(target.weekday())

Getting the Monday of the week containing a given date

thatDay = "2022-10-30"
from datetime import datetime,timedelta
# Convert the string to a date
theDay = datetime.strptime(thatDay, "%Y-%m-%d").date()

# This is the Monday of the week containing 2022-10-30: datetime.date(2022, 10, 24)
weekMonday = theDay - timedelta(days=theDay.weekday())
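The same subtraction can be wrapped in a small helper that returns both ends of the week. A minimal sketch; the function name week_bounds is made up for illustration, and weeks are assumed to run Monday through Sunday:

```python
from datetime import date, timedelta

def week_bounds(day):
    """Return (monday, sunday) of the Monday-to-Sunday week containing day."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    sunday = monday + timedelta(days=6)
    return monday, sunday

monday, sunday = week_bounds(date(2022, 10, 30))
print(monday, sunday)
```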

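Getting the number of days in a month

from calendar import monthrange
# monthrange returns a tuple
# The first element is the weekday of the first day of the month (0 = Monday)
# The second element is the number of days in the month
mr = monthrange(2011, 2)

# Number of days in February 2011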
print(mr[1])
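A common follow-up is turning the day count into the actual last date of the month. A minimal sketch; the helper name last_day_of_month is made up for illustration:

```python
from calendar import monthrange
from datetime import date

def last_day_of_month(year, month):
    """Return the last calendar date of the given month."""
    _first_weekday, days_in_month = monthrange(year, month)
    return date(year, month, days_in_month)

print(last_day_of_month(2011, 2))
print(last_day_of_month(2024, 2))  # leap year
```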

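14.2 Regular expressions

Common regular expressions

Common matching flags

re.I ignores case
re.L makes \w, \W, \b, \B, \s, \S depend on the current locale

re.M enables multi-line mode

re.MULTILINE has the same effect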

In [433]: content = '''001-苹果价格-60
     ...: 002-橙子价格-70
     ...: 003-香蕉价格-80'''
# Multi-line mode
In [434]: p1 =  re.compile(r'\d+$',re.M)

In [435]: for one in p1.findall(content):
     ...:     print(one)
     ...:
60
70
80
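# Without multi-line mode: the second positional argument of Pattern.findall is pos (the start position), not flags,
# so re.M (integer value 8) below only moves the search start; $ still matches only at the end of the string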
In [436]: p = re.compile(r'\d+$')

In [437]: for one in p.findall(content,re.M):
     ...:     print(one)
     ...:
     ...:
80
# Ordinary string matching
In [438]: content="苹果价格-60,-橙子价格-70,香蕉价格-80"

In [439]: pattern = re.compile(r'(\d+)')

In [440]: for one in pattern.findall(content):
     ...:     print(one)
     ...:
60
70
80
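Flags can be given when compiling a pattern or to the module-level functions, but not to the methods of a compiled Pattern, whose second positional argument is pos. A minimal sketch with ASCII stand-in data illustrating the difference:

```python
import re

content = "001-apple-60\n002-orange-70\n003-banana-80"

# Flag passed at compile time: $ matches at the end of every line.
compiled_multiline = re.compile(r'\d+$', re.M)
multi = compiled_multiline.findall(content)

# Pattern compiled WITHOUT flags; re.M passed to findall is taken as pos=8.
plain = re.compile(r'\d+$')
from_pos = plain.findall(content, re.M)

# The module-level function does accept flags.
module_level = re.findall(r'\d+$', content, flags=re.M)

print(multi)
print(from_pos)
print(module_level)
```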

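re.S makes . match any character, including a newline (by default . does not match a newline)

re.DOTALL has the same effect
re.U makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
re.X improves readability by ignoring whitespace and comments after #

Common methods

re.compile() compiles a regular expression and returns a Pattern object

re.findall() finds all substrings matched by the regular expression and returns them as a list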

content = '''
<div class="el">
        <p class="t1">           
            <span>
                <a>Python开发工程师</a>
            </span>
        </p>
        <span class="t2">南京</span>
        <span class="t3">1.5-2万/月</span>
</div>
<div class="el">
        <p class="t1">
            <span>
                <a>java开发工程师</a>
            </span>
		</p>
        <span class="t2">苏州</span>
        <span class="t3">1.5-2/月</span>
</div>
'''
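# (.*?) is a non-greedy group; re.DOTALL lets . match newlines; with a capture group, findall returns only the text inside the parentheses
import re
# Compile the regular expression into a Pattern object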
p = re.compile(r'class=\"t1\">.*?<a>(.*?)</a>', re.DOTALL)
for one in  p.findall(content):
    print(one)
# Python开发工程师
# java开发工程师

re.split() splits the string at each substring matched by the pattern and returns a list.
re.sub() replaces the matches in a string.
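A minimal sketch of both functions on invented strings: splitting on mixed delimiters, masking trailing digits, and reordering groups with backreferences.

```python
import re

# Split on any run of commas, semicolons, or whitespace.
raw = "apple, banana;cherry  date"
parts = re.split(r'[,;\s]+', raw)

# Replace the last four digits with asterisks.
masked = re.sub(r'\d{4}$', '****', "13812345678")

# Backreferences (\1, \2, \3) reuse captured groups in the replacement.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2022-02-23')

print(parts)
print(masked)
print(swapped)
```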

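Greedy matching

Matches as much as possible; this is the default for regex quantifiers and is often faster than non-greedy matching, although that depends on the pattern and input
Non-greedy matching

Matches as little as possible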
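The difference is easiest to see on a string with two adjacent tags. A minimal sketch with an invented HTML fragment:

```python
import re

html = '<a>Python</a><a>Java</a>'

# Greedy: .* runs to the LAST </a>, swallowing the tags in between.
greedy = re.findall(r'<a>(.*)</a>', html)

# Non-greedy: .*? stops at the FIRST </a> after each <a>.
lazy = re.findall(r'<a>(.*?)</a>', html)

print(greedy)
print(lazy)
```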

Regex reference material; online regex tester

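14.3 Reading and writing Excel

Common packages for reading and writing Excel

openpyxl

import openpyxl
# Create a new workbook
workbook = openpyxl.Workbook() 
# Write to a cell of the active worksheet
sheet = workbook.active
sheet['A1'] = 'data'
# Save the file
workbook.save('test.xlsx')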

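xlutils xlrd/xlwt

import xlrd  # read data
import xlwt  # write data
from xlutils.copy import copy  # copy a workbook read by xlrd so it can be modified
#----xlrd
# Open the Excel file
workbook = xlrd.open_workbook('myexcel.xls')
# Get a worksheet
worksheet = workbook.sheet_by_index(0)
# Read a cell
data = worksheet.cell_value(0,0)
#----xlwt
# Create a new workbook
wb = xlwt.Workbook()
# Add a worksheet
sh = wb.add_sheet('Sheet1')
# Write data
sh.write(0,0,'data')
# Save the file
wb.save('myexcel.xls')
#----xlutils
# Open the Excel file
book = xlrd.open_workbook('myexcel.xls')
# Make a writable copy
new_book = copy(book)
# Get a worksheet
worksheet = new_book.get_sheet(0)
# Write data
worksheet.write(0,0,'new data')
# Save; save() requires a file name (writing back to the same file is an assumption here)
new_book.save('myexcel.xls')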

xlsxwriter

import xlsxwriter as xw
# Create a new Excel file
workbook  = xw.Workbook('myexcel.xlsx')
# Add a worksheet
worksheet = workbook.add_worksheet()
# Write data
worksheet.write('A1',1)
# Close and save
workbook.close()

win32com

#  --`pip install pywin32`
import win32com.client
excel = win32com.client.Dispatch("Excel.Application")

# excel.Visible = True     # make Excel visible

# Absolute path of the Excel file to modify
workbook = excel.Workbooks.Open(r"d:\tmp\income1.xlsx")

# Get the sheet named 2017
sheet = workbook.Sheets('2017')

# Modify the cell at row 1, column 1
# With the COM interface, row and column numbers start at 1
sheet.Cells(1,1).Value="你好"

# Save the changes
workbook.Save()

# Close the Excel file
workbook.Close()

# Quit the Excel process
excel.Quit()

# Release related resources
sheet = None
workbook = None
excel = None
## Read/write performance comparison
import time

def byCom():
    t1 = time.time()
    import win32com.client
    excel = win32com.client.Dispatch("Excel.Application")

    # excel.Visible = True     # make Excel visible
    workbook = excel.Workbooks.Open(r"h:\tmp\ruijia\数据.xlsx")

    sheet = workbook.Sheets(2)

    print(sheet.Cells(2,15).Value)
    print(sheet.UsedRange.Rows.Count)  # number of rows

    t2 = time.time()
    print(f'Open: took {t2 - t1} seconds')

    total = 0
    for row in range(2,sheet.UsedRange.Rows.Count+1):
        value = sheet.Cells(row,15).Value
        if type(value) not in [int,float]:
            continue
        total += value

    print(total)

    t3 = time.time()
    print(f'Read data: took {t3 - t2} seconds')


def byXlrd():
    t1 = time.time()
    import xlrd

    # Load the Excel file (xlrd 2.0+ reads only .xls; .xlsx needs xlrd < 2.0)
    srcBook = xlrd.open_workbook("数据.xlsx")
    sheet = srcBook.sheet_by_index(1)

    print(sheet.cell_value(rowx=1,colx=14))
    print(sheet.nrows) # number of rows

    t2 = time.time()
    print(f'Open: took {t2 - t1} seconds')

    total = 0
    for row in range(1,sheet.nrows):
        value = sheet.cell_value(row, 14)
        if type(value) == str:
            continue
        total += value

    print(total)

    t3 = time.time()
    print(f'Read data: took {t3 - t2} seconds')

byCom()
byXlrd()

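If you only read or modify a small amount of data in a large Excel file, the Excel COM interface is much faster.

However, if you need to read a large amount of data from a large Excel file, do not use the COM interface; it is very slow.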

xlwings

import xlwings as xw
# Connect to Excel
workbook = xw.Book(r'path/myexcel.xlsx')# connect to the Excel file
# Connect to a specific cell
data_range = workbook.sheets('Sheet1').range('A1')
# Write data
data_range.value = [1,2,3]
# Save
workbook.save()

pandas

import pandas as pd 
# Read an Excel file
df = pd.read_excel(path, sheet_name='xxx', header=1)
# Save to Excel
df.to_excel(path, sheet_name='xxx')

DataNitro

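# Assign a value to a single cell
Cell('A1').value = 'data'
# Assign a value to a cell range
CellRange('A1:B2').value = 'data'

Comparison of the different ways to read and write Excel

Reference material comparing the different ways to read and write Excel

14.4 Calling other programs

14.5 Socket programming

pass

14.6 Hashing and encryption

Common hash functions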

Hash algorithm    Digest length
MD5               16 bytes
SHA1              20 bytes
SHA224            28 bytes
SHA256            32 bytes
SHA384            48 bytes
SHA512            64 bytes

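A hash function maps data (a byte string) of arbitrary length to a result of fixed length.

The data being hashed is usually called the source data, and the result is called the hash value or digest.
Properties of hashing
- The same source data hashed with the same algorithm always yields the same hash value
- However large the source data, a given hash algorithm always produces a hash value of the same length
- Different source data may produce the same hash value under the same algorithm; this is called a collision
- In general, the longer an algorithm's output, the lower its collision probability, and usually the longer the computation takes.
  
  Even for MD5, the probability of an accidental collision is negligibly small, roughly 1.47*10^-29. MD5 and SHA1 are, however, no longer safe against deliberately constructed collisions.
- Hashing is irreversible
Use cases for hash functions
- Verifying copied or downloaded files
- Verifying information (login, username/password checks)

Implementing hashing in Python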

import hashlib

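# Use MD5 here; for SHA-256, use m = hashlib.sha256() instead
m = hashlib.md5()

# The source data must be a bytes object
# A str must be converted with encode()
m.update("张三，学费已交|13ty8ffbs2v".encode())

# The hash value as a bytes object
resultBytes = m.digest()  
# The hash value as a hexadecimal string
resultHex   = m.hexdigest()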
print(resultHex)
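The digest lengths listed earlier can be checked with digest_size, and for file verification the data is usually fed to update() in chunks so that a large file never has to sit in memory. A minimal sketch; the helper name hash_stream is made up, and an in-memory BytesIO stands in for a real file opened in binary mode:

```python
import hashlib
import io

# Digest length in bytes for each algorithm.
sizes = {name: hashlib.new(name).digest_size
         for name in ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')}
print(sizes)

def hash_stream(stream, algorithm='sha256', chunk_size=8192):
    """Hash a binary stream chunk by chunk and return the hex digest."""
    h = hashlib.new(algorithm)
    # read() returns b'' at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        h.update(chunk)
    return h.hexdigest()

data = b'hello world' * 1000
chunked = hash_stream(io.BytesIO(data))
one_shot = hashlib.sha256(data).hexdigest()
print(chunked == one_shot)
```

Calling update() repeatedly is equivalent to hashing the concatenated data in one call, which is why the two results agree.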

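Common encryption algorithms
- Symmetric encryption: AES, RC4, DES, 3DES, IDEA
- Asymmetric encryption: RSA (Rivest–Shamir–Adleman)
Properties of encryption algorithms
- Encryption is reversible (the data can be decrypted); hashing is not.
- A hash algorithm produces a small hash value even from very large data, whereas encrypting large source data yields ciphertext that is also large
Use cases for encryption
- Passwordless SSH login

Implementing encryption in Python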

from cryptography.fernet import Fernet

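# Generate a key; the key is required for both encryption and decryption
key = Fernet.generate_key()
f = Fernet(key)


src = "dfdkslfjdlsjdkljg"
# The source message must be a bytes object
# A str must be converted with encode()
srcBytes = src.encode()

# Produce the encrypted bytes
token = f.encrypt(srcBytes)
print(token)

# Decrypt; the return value is a bytes object
sb = f.decrypt(token)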
print(sb.decode())

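14.7 Image processing

Generating QR codes with QRcode

Reference material on generating QR codes with QRcode
Image processing with PILLOW

Reference material on image processing with PILLOW

14.8 Building executables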

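Notes
- Packaging inside a virtual environment reduces the size of the resulting bundle
- For dynamically imported libraries, declare them with --hidden-import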
```
pyinstaller httpclient.py  --hidden-import PySide2.QtXml
```

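Common packaging command

# --icon sets the icon of the packaged executable; --workpath is the directory for temporary files; --distpath is the directory for the executable; --noconsole removes the console window (used for GUI applications)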
pyinstaller DataClean.py --noconsole --workpath e:\pybuild  --distpath e:\pybuild\dist  --icon="DataClean.ico"