Study Notes: Built-in Data Structures, Functions, and Files in Python for Data Analysis

1. Data Structures and Sequences

1.1 Tuples

A tuple is a fixed-length, immutable sequence of Python objects.

tup = 1, 2, 3
tup

(1, 2, 3)

Create a tuple whose elements are themselves tuples:

nested_tup = (1, 2, 3), (4, 5, 6)
nested_tup

((1, 2, 3), (4, 5, 6))

Convert any sequence or iterator to a tuple with tuple():

tuple([6, 0, 7])

(6, 0, 7)

tup = tuple('python')
tup

('p', 'y', 't', 'h', 'o', 'n')

Access tuple elements with square brackets []:

tup[0]

'p'

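The objects stored in a tuple may themselves be mutable, but once the tuple is created, the object held in each position cannot be replaced:

# Invalid operation: raises TypeError
tup = tuple(['exp', [0, 1], True])
tup[2] = False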

If an object inside the tuple is mutable, such as a list, that object can still be modified in place:

tup = tuple(['exp', [0, 1], True])
tup[1].append(2)
tup

('exp', [0, 1, 2], True)
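
The two cases above can be checked side by side. The following is a minimal sketch (the names reassigned and same_object are illustrative, not from the original notes): reassigning a position raises TypeError, while mutating the list stored in that position succeeds because the tuple still refers to the same list object.

```python
tup = ('exp', [0, 1], True)

# Reassigning a position in the tuple is not allowed.
try:
    tup[2] = False
    reassigned = True
except TypeError:
    reassigned = False

# Mutating the list stored inside the tuple is allowed,
# because the tuple still refers to the same list object.
inner_before = id(tup[1])
tup[1].append(2)
same_object = id(tup[1]) == inner_before

print(reassigned)   # False
print(tup)          # ('exp', [0, 1, 2], True)
print(same_object)  # True
```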

Concatenate tuples with the + operator:

('python', None, 607) + (0, 1) + ('game',)  # ('game') without the trailing comma is just a string

('python', None, 607, 0, 1, 'game')

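Multiplying a tuple by an integer with * concatenates that many copies of it (only the references are repeated; the objects themselves are not copied):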

('python', 'game') * 3

('python', 'game', 'python', 'game', 'python', 'game')

1.1.1 Unpacking Tuples

tup = (1, 2, 3)
a, b, c = tup
a
b
c

1
2
3

Unpacking nested tuples:

tup = 1, 2, 3, (6, 0, 7)
a, b, c, (x, y, z) = tup
z

7

Unpacking while iterating over a sequence of tuples or lists:

seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
	print('a = {0}, b = {1}, c = {2}'.format(a, b, c))

a = 1, b = 2, c = 3
a = 4, b = 5, c = 6
a = 7, b = 8, c = 9

Extended unpacking with a starred name (*rest):

values = 1, 2, 3, 4, 5
a, b, *rest = values
a, b

(1, 2)

rest

[3, 4, 5]

By convention, an underscore (_) names the values you do not need: a, b, *_ = values
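
As a small illustration of the unpacking forms above (a sketch; the names first, middle, and last are made up for this example), the starred name can also sit in the middle, and unpacking makes swapping two variables a one-liner:

```python
values = 1, 2, 3, 4, 5

# The starred target collects whatever the other targets leave over.
first, *middle, last = values
print(first, middle, last)  # 1 [2, 3, 4] 5

# Swapping two names without a temporary variable.
a, b = 1, 2
b, a = a, b
print(a, b)  # 2 1

# Discard everything after the first two values.
x, y, *_ = values
print(x, y)  # 1 2
```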

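1.1.2 Tuple Methods

Because a tuple cannot be modified, it has few instance methods. count() returns the number of occurrences of a value:

a = (0, 1, 1, 1, 1, 0, 1)
a.count(1)

5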

1.2 Lists

A list is a sequence whose length and contents can both change. Define one with square brackets [] or the list() type function.

list_1 = [1, 2, 3, None]
list_1

[1, 2, 3, None]

tup = ('a', 'b', 'c')
list_2 = list(tup)
list_2

['a', 'b', 'c']

list_2[2] = 'x'
list_2

['a', 'b', 'x']

Convert an iterator or generator to a list with list():

gen = range(10)
gen

range(0, 10)

list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

1.2.1 Adding and Removing Elements

append(): add an element to the end of the list

list_ap = ['a', 'b', 'c']
list_ap.append('x')
list_ap

['a', 'b', 'c', 'x']

insert(): insert an element at a given position in the list

list_in = ['a', 'b', 'c']
list_in.insert(1, 'x')
list_in

['a', 'x', 'b', 'c']

pop(): remove the element at a given index and return it

list_p = ['a', 'x', 'b', 'c']
list_p.pop(1)

'x'

list_p

['a', 'b', 'c']

remove(): find the first occurrence of a value and remove it

list_re = ['a', 'b', 'c', 'a']
list_re.remove('a')
list_re

['b', 'c', 'a']

The in keyword checks whether a list contains a value:

list_in = ['a', 'b', 'c']
'a' in list_in

True

The not in operator checks that a value is absent from a list:

list_not = ['a', 'b', 'c']
'a' not in list_not

False

1.2.2 Concatenating and Combining Lists

Two lists can be concatenated with +:

[6, None, 'peer'] + [0, 1, (1, 0)]

[6, None, 'peer', 0, 1, (1, 0)]

extend(): append multiple elements to an existing list

x = [6, 0, 7, None, 'peer']
x.extend([0, 1, (1, 0)])
x

[6, 0, 7, None, 'peer', 0, 1, (1, 0)]

With append() instead, the whole list [0, 1, (1, 0)] would be added as a single element.
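
The difference between the two methods is easiest to see on copies of the same list. This is a minimal sketch; the names base, extra, extended, and appended are illustrative only.

```python
base = [6, 0, 7, None, 'peer']
extra = [0, 1, (1, 0)]

extended = list(base)     # copy so both results start from the same list
extended.extend(extra)    # adds each element of extra

appended = list(base)
appended.append(extra)    # adds extra itself as one element

print(len(extended))  # 8
print(len(appended))  # 6
print(appended[-1])   # [0, 1, (1, 0)]
```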

1.2.3 Sorting

sort(): sort the list in place (without creating a new object)

a = [6, 0, 7]
a.sort()
a

[0, 6, 7]

Sort by string length by passing key=len:

b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

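1.2.4 Binary Search and Maintaining a Sorted List

bisect.bisect(): return the position at which a value should be inserted to keep the list sorted

import bisect
# index:
#    0  1  2  3  4  5  6
c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c, 2)

4

bisect.bisect(c, 5)

6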

bisect.insort(): insert a value at that position

bisect.insort(c, 6)
c

[1, 2, 2, 2, 3, 4, 6, 7]
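
When the list contains duplicates, it matters on which side of the equal entries the insertion point falls. The sketch below uses bisect_left and bisect_right, which the notes do not mention but which are part of the standard bisect module; the variable names are illustrative.

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

# bisect.bisect is an alias for bisect_right: the insertion point
# falls after any existing entries equal to the value.
right = bisect.bisect_right(c, 2)
# bisect_left returns the position before the equal entries.
left = bisect.bisect_left(c, 2)
print(left, right)  # 1 4

# Build a sorted list incrementally with insort.
sorted_values = []
for v in [5, 1, 4, 1, 3]:
    bisect.insort(sorted_values, v)
print(sorted_values)  # [1, 1, 3, 4, 5]
```

The bisect functions do not check that the list is actually sorted. Calling them on an unsorted list runs without error but can return a meaningless position.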

1.2.5 Slicing

Basic form: start:stop
Slice notation selects a subset of most sequence types.

seq = [1, 2, 3, 4, 5, 6, 7]
seq[1:5]

[2, 3, 4, 5]

Assigning a sequence to a slice replaces that section of the list:

seq[3: 4] = [8, 9]
seq

[1, 2, 3, 8, 9, 5, 6, 7]

Note: the element at index start is included but the element at index stop is not, so a slice contains stop - start elements.
Either start or stop can be omitted:

seq = [1, 2, 3, 4, 5, 6, 7]
seq[: 5]

[1, 2, 3, 4, 5]

seq = [1, 2, 3, 4, 5, 6, 7]
seq[3:]

[4, 5, 6, 7]

Negative indices count from the end of the sequence:

seq = [1, 2, 3, 4, 5, 6, 7]
seq[-4:]

[4, 5, 6, 7]

seq = [1, 2, 3, 4, 5, 6, 7]
seq[-6: -2]

[2, 3, 4, 5]

A step value after a second colon takes every step-th element:

seq = [1, 2, 3, 4, 5, 6, 7]
seq[: : 2]

[1, 3, 5, 7]

A step of -1 reverses the sequence:

seq = [1, 2, 3, 4, 5, 6, 7]
seq[: : -1]

[7, 6, 5, 4, 3, 2, 1]
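
The slicing rules above can be verified directly. This is a small sketch with illustrative names (piece, k), showing that a slice has stop - start elements, that it is a new list, and that negative indices count back from len(seq).

```python
seq = [1, 2, 3, 4, 5, 6, 7]

start, stop = 1, 5
piece = seq[start:stop]
print(piece)                       # [2, 3, 4, 5]
print(len(piece) == stop - start)  # True

# A slice is a new list; changing it leaves the original untouched.
piece[0] = 99
print(seq[1])  # 2

# Negative indices are equivalent to len(seq) + index.
print(seq[-4:] == seq[len(seq) - 4:])  # True

# Splitting at any index k gives two halves that rejoin to the original.
k = 3
print(seq[:k] + seq[k:] == seq)  # True
```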

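1.3 Built-in Sequence Functions

1.3.1 enumerate

enumerate(): return a sequence of (i, value) tuples, where i is the index of the element and value is the element itself

# Build a dict that maps each value to its position
list_a = ['peer', 'apple', 'banana']

# Empty dict
mapping = {}

for i, value in enumerate(list_a):
	mapping[value] = i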

mapping

{'peer': 0, 'apple': 1, 'banana': 2}

1.3.2 sorted

sorted(): return a new sorted list built from the elements of any sequence

sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

sorted('horse race')

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

1.3.3 zip

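zip(): pair up the elements of lists, tuples, or other sequences into tuples. In Python 3 it returns an iterator, so wrap it in list() to see the pairs.

seq_1 = ['a', 'b', 'c']
seq_2 = ['one', 'two', 'three']
zipped = zip(seq_1, seq_2)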
list(zipped)

[('a', 'one'), ('b', 'two'), ('c', 'three')]

zip() accepts any number of sequences; the number of tuples it produces is determined by the shortest one:

seq_1 = ['a', 'b', 'c']
seq_2 = ['one', 'two', 'three']
seq_3 = [False, True]
zipped = zip(seq_1, seq_2, seq_3)
list(zipped)

[('a', 'one', False), ('b', 'two', True)]
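
If dropping the extra elements is not what you want, the standard library offers itertools.zip_longest, which is not covered in these notes. The sketch below contrasts the two; the names short and padded are illustrative.

```python
from itertools import zip_longest

seq_1 = ['a', 'b', 'c']
seq_3 = [False, True]

# zip stops at the shortest input, silently dropping 'c'.
short = list(zip(seq_1, seq_3))
print(short)  # [('a', False), ('b', True)]

# zip_longest pads the shorter input with fillvalue (None by default).
padded = list(zip_longest(seq_1, seq_3, fillvalue=None))
print(padded)  # [('a', False), ('b', True), ('c', None)]
```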

Iterate over several sequences at once with zip(), here combined with enumerate():

seq_1 = ['a', 'b', 'c']
seq_2 = ['one', 'two', 'three']
for i, (a, b) in enumerate(zip(seq_1,seq_2)):
	print('{0}: {1}, {2}'.format(i, a, b))

0: a, one
1: b, two
2: c, three

Applied with *, zip() converts a list of rows into a list of columns:

names = [('Lebron', 'James'), ('Allen', 'Iverson'), ('Stephen', 'Curry')]
first_names, last_names = zip(*names)

first_names

('Lebron', 'Allen', 'Stephen')

last_names

('James', 'Iverson', 'Curry')

1.3.4 reversed

reversed(): iterate over the elements of a sequence in reverse order

list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

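Note: reversed() returns an iterator, so the reversed sequence is not built until it is consumed, for example by list() or a for loop.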

1.4 Dictionaries

A dict, also known as a hash map or associative array, is a flexibly sized collection of key-value pairs.

empt_dict = {}
dict_1 = {'a': 'apple', 'b': [0, 1]}
dict_1

{'a': 'apple', 'b': [0, 1]}

Insert an element into the dict:

dict_1[44] = 'peer'
dict_1

{'a': 'apple', 'b': [0, 1], 44: 'peer'}

Access an element of the dict:

dict_1['b']

[0, 1]

The in keyword checks whether the dict contains a key:

'b' in dict_1

True

The del keyword deletes a key and its value:

dict_1 = {'a': 'apple', 'b': [0, 1], 44: 'peer'}
dict_1[33] = 'cola' 
dict_1['extra_index'] = 'juice'
dict_1

{'a': 'apple', 
 'b': [0, 1], 
 44: 'peer',
 33: 'cola', 
 'extra_index': 'juice'}

del dict_1[33]
dict_1

{'a': 'apple', 
 'b': [0, 1], 
 44: 'peer',
 'extra_index': 'juice'}

pop(): delete a key and return the value that was stored under it

ret = dict_1.pop('extra_index')
ret

'juice'

dict_1

{'a': 'apple', 
 'b': [0, 1], 
 44: 'peer'}

keys(): return an iterable view of the dict's keys

dict_1 = {'a': 'apple', 'b': [0, 1], 44: 'peer'}
list(dict_1.keys())

['a', 'b', 44]

values(): return an iterable view of the dict's values

dict_1 = {'a': 'apple', 'b': [0, 1], 44: 'peer'}
list(dict_1.values())

['apple', [0, 1], 'peer']

update(): merge another dict into this one, in place

dict_1 = {'a': 'apple', 'b': [0, 1], 44: 'peer'}
dict_2 = {'b': [33, 44], 'c': 'cola'}
dict_1.update(dict_2)
dict_1

{'a': 'apple', 'b': [33, 44], 44: 'peer', 'c': 'cola'}

Note: if the dict passed to update() contains keys that already exist, the original values for those keys are overwritten.
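
Because update() changes the dict in place, the overwritten values are lost. When the original needs to survive, one option (an assumption of this sketch, not something the notes cover) is to build a new dict with ** unpacking, available since Python 3.5:

```python
dict_1 = {'a': 'apple', 'b': [0, 1], 44: 'peer'}
dict_2 = {'b': [33, 44], 'c': 'cola'}

# Build a new dict instead of modifying dict_1 in place.
# Later entries win, so dict_2's value for 'b' replaces dict_1's.
merged = {**dict_1, **dict_2}

print(merged)       # {'a': 'apple', 'b': [33, 44], 44: 'peer', 'c': 'cola'}
print(dict_1['b'])  # [0, 1]
```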

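1.4.1 Creating Dicts from Sequences

dict() accepts a sequence of 2-tuples, so zipping two sequences pairs them up as keys and values:

mapping = dict(zip(range(5), reversed(range(5))))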
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

1.4.2 Default Values

Group a list of words by first letter into a dict of lists:

# List of words
words = ['apple', 'bat', 'bar', 'atom', 'book']
# Empty dict
by_letter = {}
# Iterate over the words
for word in words:
	letter = word[0]
	if letter not in by_letter:
		by_letter[letter] = [word]
	else:
		by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

setdefault() simplifies the loop to:

by_letter = {}
for word in words:
	letter = word[0]
	by_letter.setdefault(letter, []).append(word)

The defaultdict class from the collections module does the same:

from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
	by_letter[word[0]].append(word)
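
The three approaches above produce the same grouping. The following sketch runs them side by side under separate, illustrative names (explicit, with_setdefault, with_defaultdict) and also shows one behavioral difference of defaultdict worth knowing about.

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# 1. Explicit membership test
explicit = {}
for word in words:
    letter = word[0]
    if letter not in explicit:
        explicit[letter] = [word]
    else:
        explicit[letter].append(word)

# 2. setdefault
with_setdefault = {}
for word in words:
    with_setdefault.setdefault(word[0], []).append(word)

# 3. defaultdict
with_defaultdict = defaultdict(list)
for word in words:
    with_defaultdict[word[0]].append(word)

print(explicit == with_setdefault == dict(with_defaultdict))  # True

# Unlike a plain dict, a defaultdict inserts a key on lookup.
print('z' in with_defaultdict)  # False
with_defaultdict['z']
print('z' in with_defaultdict)  # True
```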

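1.4.3 Valid Dict Key Types

Keys must be hashable, which in practice means immutable objects such as ints, floats, strings, or tuples whose elements are all hashable.
hash(): check whether an object is hashable (that is, whether it can be used as a dict key).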

hash('string')

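5023931463650008331  # the exact value differs between interpreter sessions

# Invalid operation: raises TypeError
# The element [2, 3] inside the tuple is a list, which is mutable
hash((1, 2, [2, 3]))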

To use a list as a key, convert it to a tuple first:

dict_2 = {}
dict_2[tuple([1, 2, 3])] = 5
dict_2

{(1, 2, 3): 5}
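
The hash() check can be wrapped in a small helper. is_hashable below is a hypothetical function written for this example, not part of the standard library; it simply catches the TypeError that hash() raises for unhashable objects.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('string'))        # True
print(is_hashable((1, 2, (2, 3))))  # True
print(is_hashable((1, 2, [2, 3])))  # False
print(is_hashable([1, 2, 3]))       # False
```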

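1.5 Sets

A set is an unordered collection of unique elements, each of which must be immutable (hashable).
There are two ways to create one: the set() function, or a set literal written with curly braces.

# Option 1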
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

# Option 2
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Set operation: union

a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Equivalent to a | b
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

Set operation: intersection

# Equivalent to a & b
a.intersection(b)

{3, 4, 5}

Set operation: set the contents of a to the union of a and b

# Equivalent to a |= b
a.update(b)
a

{1, 2, 3, 4, 5, 6, 7, 8}

Set elements must be immutable (hashable), so convert a list to a tuple before adding it:

my_data = [0, 1]
my_set = {tuple(my_data)}
my_set

{(0, 1)}

Set operation: check whether one set is a subset of (contained in) or a superset of (contains) another

set_a = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(set_a)

True

set_a.issuperset({1, 2, 3})

True

Set operation: equality (two sets are equal if and only if they contain the same elements)

{1, 2, 3} == {3, 2, 1}

True
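
Each set method shown above has an operator form, and the methods differ in whether they return a new set or change the set in place. The sketch below checks both points; the <= and >= operators for subset and superset are standard Python but are not mentioned in the notes, and the names c and result are illustrative.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Each method has an operator form.
print(a.union(b) == a | b)                          # True
print(a.intersection(b) == a & b)                   # True
print({1, 2, 3}.issubset(a) == ({1, 2, 3} <= a))    # True
print(a.issuperset({1, 2, 3}) == (a >= {1, 2, 3}))  # True

# union returns a new set; update changes the set in place.
c = a.copy()
result = c.union(b)
print(c == a)       # True, c is unchanged
c.update(b)
print(c == result)  # True
```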

1.6 List, Set, and Dict Comprehensions

List comprehension:

strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set comprehension:

strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

# The same result using map()
set(map(len, strings))

{1, 2, 3, 4, 6}

Dict comprehension:

strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

1.6.1 Nested List Comprehensions

Find every name in the nested list that contains two or more letters e:

all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]
result = [name for names in all_data for name in names
          if name.count('e') >= 2]
result

['Steven']

Flatten a list of integer tuples into a one-dimensional list of integers:

tuples_33 = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in tuples_33 for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

A list comprehension inside a list comprehension produces a list of lists instead:

[[x for x in tup] for tup in tuples_33]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
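
The order of the for clauses in a nested comprehension can be hard to remember. As a sketch (flattened_loop is an illustrative name), the flattening comprehension is equivalent to nested for loops written in the same left-to-right order:

```python
tuples_33 = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Comprehension: the for clauses read left to right,
# in the same order as the nested loops below.
flattened = [x for tup in tuples_33 for x in tup]

flattened_loop = []
for tup in tuples_33:
    for x in tup:
        flattened_loop.append(x)

print(flattened == flattened_loop)  # True
```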

2. Functions

Function arguments: keyword arguments must follow positional arguments.

# Regular expression module
import re

# Remove punctuation
def remove_punctuation(value):
	return re.sub('[!#?]', '', value)

# Strip whitespace, remove punctuation, and normalize capitalization
clean_ops = [str.strip, remove_punctuation, str.title]

# Clean a list of strings
def clean_strings(strings, ops):
	result = []
	for value in strings:
		for function in ops:
			value = function(value)
		result.append(value)
	return result

states = ['    Tom  ', 'Google!', 'Google', 'google', 'CuRRy', 'lebron   james##', 'Allen iverson?']

clean_strings(states, clean_ops)

['Tom',
 'Google',
 'Google',
 'Google',
 'Curry',
 'Lebron   James',
 'Allen Iverson']

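Functions can be passed as arguments to other functions, such as the built-in map(). Here only the punctuation is removed; whitespace and capitalization are left untouched:

for x in map(remove_punctuation, states):
	print(x)

    Tom  
Google
Google
google
CuRRy
lebron   james
Allen iverson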

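2.1 Anonymous (Lambda) Functions

The lambda keyword declares an anonymous function: a single expression whose result is the return value. For example, sort strings by their number of distinct letters:

strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key=lambda x: len(set(x)))
strings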

['aaaa', 'foo', 'abab', 'bar', 'card']
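
A lambda is interchangeable with a named function wherever a function is expected. In this sketch, apply_to_list and double are hypothetical helpers defined only for the illustration; sorted() is the built-in that returns a new list instead of sorting in place.

```python
def apply_to_list(some_list, f):
    """Apply f to every element and return the results as a list."""
    return [f(x) for x in some_list]

def double(x):
    return x * 2

ints = [4, 0, 1, 5, 6]
print(apply_to_list(ints, double))           # [8, 0, 2, 10, 12]
print(apply_to_list(ints, lambda x: x * 2))  # [8, 0, 2, 10, 12]

# sorted() accepts the same kind of key function and returns a new list.
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
print(sorted(strings, key=lambda s: len(set(s))))
# ['aaaa', 'foo', 'abab', 'bar', 'card']
```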

2.2 Generators

Iterating over a dict yields its keys:

dict_it = {'a': 1, 'b': 2, 'c': 3}
for key in dict_it:
	print(key)

a
b
c

An iterator is an object that yields objects to the Python interpreter when used in a context such as a for loop.

dict_it = {'a': 1, 'b': 2, 'c': 3}
dict_iterator = iter(dict_it)
list(dict_iterator)

['a', 'b', 'c']

To create a generator, use the yield keyword in a function instead of return:

def squares(n=10):
	print('Generating squares from 1 to {0}'.format(n ** 2))
	for i in range(1, n + 1):
		yield i ** 2

gen = squares()

for x in gen:
	print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100
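
Calling a generator function does not run its body; the code executes only as values are requested. The sketch below makes that visible by recording when the body starts in a list named log, which is an addition for this example and not part of the original squares().

```python
def squares(n=10):
    """Yield the squares of 1 through n, recording when the body starts."""
    log.append('started')
    for i in range(1, n + 1):
        yield i ** 2

log = []
gen = squares(3)
print(log)         # [] because nothing has run yet

print(next(gen))   # 1
print(log)         # ['started']

print(list(gen))   # [4, 9] (the remaining values)
print(list(gen))   # [] (a generator can only be consumed once)
```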

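2.2.1 Generator Expressions

A generator expression is written like a list comprehension but with parentheses instead of brackets. It can be passed directly as a function argument:

sum(x ** 2 for x in range(100))

328350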

dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

2.2.2 The itertools Module

groupby(): takes any sequence and a function, and groups consecutive elements of the sequence by the function's return value.

import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
# groupby yields (key, group iterator) pairs
for letter, group in itertools.groupby(names, first_letter):
	print(letter, list(group))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
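
Because groupby() only groups consecutive elements, 'A' appears twice in the output above. If one group per letter is wanted, a common approach is to sort by the same key first. This is a sketch under that assumption; the name grouped is illustrative.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

def first_letter(name):
    return name[0]

# Sorting by the same key places all names with the same first
# letter next to each other, so each letter appears only once.
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted(names, key=first_letter),
                                           key=first_letter)
}
print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```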

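3. Files and the Operating System

Open a file for reading or writing (by default it is opened in read-only mode 'r'):

path = 'examples/exp.txt'
with open(path) as f:
	lines = [x.rstrip() for x in f]

With a with statement, the file is closed automatically when the with block exits.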

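Mode	Meaning
r	Read-only mode
w	Write-only mode; creates a new file (erasing the data in any existing file at the same path)
x	Write-only mode; creates a new file, but fails if the path already exists
a	Append to an existing file (creating it if it does not exist)
r+	Read and write mode
b	Binary mode; added to another mode (for example 'rb' or 'wb')
t	Text mode (bytes are automatically decoded to Unicode); this is the default if no mode is specified, and it can be added to another mode (for example 'rt' or 'xt')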

Write the non-empty lines of the file to tmp.txt, then read them back:

with open(path) as src, open('tmp.txt', 'w') as handle:
	handle.writelines(x for x in src if len(x) > 1)

with open('tmp.txt') as f:
	lines = f.readlines()

lines
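
The file examples above depend on examples/exp.txt existing. The following self-contained sketch uses a temporary directory instead, so it can be run anywhere; the file names, the sample text, and the explicit encoding='utf-8' are assumptions made for this example. It also exercises the 'x' mode from the table.

```python
import os
import tempfile

# Work in a temporary directory so the example needs no existing files.
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, 'exp.txt')   # hypothetical file name
    dst_path = os.path.join(tmp_dir, 'tmp.txt')

    # 'w' creates the file; a line holding only '\n' counts as empty here.
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write('first line\n\nsecond line\n\nthird line\n')

    # Copy only the lines that contain more than a newline character.
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(line for line in src if len(line) > 1)

    with open(dst_path, encoding='utf-8') as f:
        lines = f.readlines()

    # 'x' refuses to overwrite an existing file.
    try:
        open(dst_path, 'x')
        overwrote = True
    except FileExistsError:
        overwrote = False

print(lines)      # ['first line\n', 'second line\n', 'third line\n']
print(overwrote)  # False
```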

Added: 2021-08-08 13:35:08  Updated: 2021-08-08 13:35:20
