[Python Knowledge Base] Web Crawler Study Notes: Single-Threaded vs. Multithreaded Execution, with Worked Examples

Concurrent Crawling with Multiple Threads and Multiple Processes

Blocking and Non-blocking

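Programming discussions are full of terms such as blocking, non-blocking, synchronous, and asynchronous. What do they actually mean? An everyday example helps: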

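(1) Synchronous blocking: you put a kettle on the stove and stand in front of it until the water boils. You do nothing else in the meantime; you just stand there and check every so often whether it has boiled.

(2) Synchronous non-blocking: you put the kettle on again, but instead of standing there waiting you go back to your room and browse the web. Every so often you come back to check whether the water has boiled, and leave again if it has not.

(3) Asynchronous blocking: this time you use a kettle that whistles when the water boils, so you no longer need to check it. You still put it on and stand there waiting for the whistle.

(4) Asynchronous non-blocking: you realize that since the kettle will notify you by itself, there is no reason to keep waiting next to it. You put the kettle on, go back to your room to do other things, and let the kettle tell you when the water has boiled.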
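The difference between the first two modes can be sketched in Python. This is only an illustration and not part of the original notes: the names `boil`, `wait_blocking`, and `wait_polling` are made up for the example, and the kettle is simulated with a short `time.sleep` in a background thread that signals a `threading.Event` when it is done.

```python
import threading
import time


def boil(done, seconds=0.2):
    """Hypothetical kettle: takes `seconds` to boil, then signals `done`."""
    time.sleep(seconds)
    done.set()


def wait_blocking():
    """Mode (1): stand at the stove and do nothing until the water boils."""
    done = threading.Event()
    kettle = threading.Thread(target=boil, args=(done,))
    kettle.start()
    done.wait()            # blocks here; no other work gets done
    kettle.join()
    return 0               # units of other work finished while waiting


def wait_polling():
    """Mode (2): do other things, coming back now and then to check."""
    done = threading.Event()
    kettle = threading.Thread(target=boil, args=(done,))
    kettle.start()
    other_work = 0
    while not done.is_set():   # check the kettle without blocking on it
        other_work += 1        # stand-in for "browsing the web"
        time.sleep(0.01)
    kettle.join()
    return other_work


blocking_work = wait_blocking()
polling_work = wait_polling()
print(blocking_work, polling_work)
```

In both functions the water takes the same time to boil; the only difference is whether the caller gets anything else done while waiting.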

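In Python, calling a function repeatedly in a loop is synchronous blocking: each call must finish before the next one starts, which is very inefficient when the function spends its time waiting. Try running the code below.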

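import time

start = time.time()

def hello():
    time.sleep(1)            # simulate one second of blocking work
    print('hello world')

# Call hello() ten times, one after another, on the main thread
for i in range(0, 10):
    hello()

print(time.time() - start)   # total elapsed seconds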

Output:

D:\pythonproject\venv\Scripts\python.exe D:/pythonproject/test/duoxianchen/tongbuduse.py
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
10.095674514770508

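This code took 10.095674514770508 seconds to run because it executes the function in a single thread, in blocking fashion. In kettle terms, there are ten kettles to boil and only one is boiled at a time. The kettles can instead be boiled asynchronously, several at once.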

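import time
import threading

start = time.time()

def hello():
    time.sleep(1)            # simulate one second of blocking work
    print("hello world")

# Start ten threads; each one runs hello() concurrently
for i in range(0, 10):
    t = threading.Thread(target=hello)
    t.start()

# Runs immediately: the main thread does not wait for the threads to finish
print(time.time() - start)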

Running this code gives:

D:\pythonproject\venv\Scripts\python.exe D:/pythonproject/test/duoxianchen/yibusaoshui2.py
0.0019941329956054688
hello worldhello worldhello worldhello worldhello worldhello worldhello world





hello worldhello world
hello world



Process finished with exit code 0

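This output was copied and pasted directly, not retyped, so the places where several hello world strings run together on one line are genuinely what the program printed. Notice that the elapsed time is printed on the very first line, and that it is far less than one second. So how does this code work?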

First, a for loop creates 10 threads:

t = threading.Thread(target=hello)    # create a thread

Then each thread is started:

t.start()

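This starts 10 threads that run the hello function concurrently. While they are running, the main thread carries on and executes the line that computes the elapsed time. In other words, that line runs the moment the threads have been started, without waiting for any of them to finish: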

print(time.time()-start)

The work as a whole actually takes only about 1 second, which is the 1-second sleep we forced inside hello:

time.sleep(1)
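One way to see that the threads are still running when the timing line executes is to ask each thread whether it is alive. The sketch below is an illustration under a few assumptions that are not in the original: the sleep is shortened to 0.5 seconds, the threads are kept in a list, and `time.perf_counter()` is used instead of `time.time()` for the measurement.

```python
import threading
import time


def hello():
    time.sleep(0.5)   # shortened from the article's 1 second


threads = [threading.Thread(target=hello) for _ in range(10)]

start = time.perf_counter()
for t in threads:
    t.start()

# Measured right after start(): the threads are still sleeping
alive_after_start = sum(t.is_alive() for t in threads)
elapsed_before_join = time.perf_counter() - start

# join() makes the main thread wait for each thread to finish
for t in threads:
    t.join()

alive_after_join = sum(t.is_alive() for t in threads)
elapsed_after_join = time.perf_counter() - start

print(alive_after_start, elapsed_before_join)
print(alive_after_join, elapsed_after_join)
```

The first measurement only covers the cost of starting the threads; only the measurement taken after `join()` covers the work the threads actually did.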

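However, the time measured above is not a valid result: it is fast, but it does not prove that the work finished in 1 second, because the timing line ran before the threads were done. To prove it, the code needs to be modified. This version no longer creates the threads in a loop, and it calls join() on every thread so that the main thread waits for all of them before printing the elapsed time: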

import time
import threading
start =time.time()
def hello():
    time.sleep(1)
    print("hello world")
# for i in range(0, 10):
#     hello()
# Create 10 threads by hand and start them
t1 = threading.Thread(target=hello)
t2 = threading.Thread(target=hello)
t3 = threading.Thread(target=hello)
t4 = threading.Thread(target=hello)
t5 = threading.Thread(target=hello)
t6 = threading.Thread(target=hello)
t7 = threading.Thread(target=hello)
t8 = threading.Thread(target=hello)
t9 = threading.Thread(target=hello)
t10 = threading.Thread(target=hello)

t1.start()
t2.start()
t3.start()
t4.start()
t5.start()
t6.start()
t7.start()
t8.start()
t9.start()
t10.start()

# Wait for the child threads to finish
t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join()
t8.join()
t9.join()
t10.join()

print(time.time()-start)

Running the code gives:

D:\pythonproject\venv\Scripts\python.exe D:/pythonproject/test/duoxianchen/bingfa3.py
hello worldhello worldhello world
hello worldhello worldhello worldhello worldhello world


hello worldhello world





1.007749319076538

Process finished with exit code 0

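The total elapsed time is printed at the end, and it is slightly more than 1 second. The 1 second is the sleep we forced in the code; the remaining 0.007749319076538 seconds is the overhead of running the rest of the code.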
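The same measurement does not strictly require ten hand-written variables. A minimal sketch, assuming the threads are kept in a list so they can all be joined afterwards; the `run_threads` helper, the `results` list, and the shortened 0.5-second sleep are assumptions made for this illustration and do not come from the original notes.

```python
import threading
import time


def hello(results, index, seconds):
    time.sleep(seconds)               # simulated blocking work
    results[index] = "hello world"    # record the result instead of printing it


def run_threads(count, seconds=0.5):
    """Start `count` threads, wait for all of them, return results and elapsed time."""
    results = [None] * count
    threads = [
        threading.Thread(target=hello, args=(results, i, seconds))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for every thread before timing
    return results, time.perf_counter() - start


results, elapsed = run_threads(10)
print(results.count("hello world"), elapsed)
```

Because the sleeps overlap, the elapsed time should stay close to one sleep interval rather than ten of them.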

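The hand-written version above is rather clumsy, though, since all 10 threads have to be typed out one by one, so let's move to another example.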

import threading

import time

start =time.time()
def sing():
    time.sleep(3)
    print("我喜欢唱歌")
def dance():
    time.sleep(3)
    print("我喜欢跳舞")

sing()
dance()
print(time.time()-start)

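This defines two functions, sing and dance. sing sleeps for 3 seconds and prints "我喜欢唱歌" ("I like singing"); then dance runs, sleeps for 3 seconds, and prints "我喜欢跳舞" ("I like dancing"). Running the code produces the following output: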

我喜欢唱歌
我喜欢跳舞
6.009649276733398

Run in single-threaded, blocking fashion, the code executes sing first and then dance, and takes 6.009649276733398 seconds. Now modify the code to use multiple threads.

import time
import threading
start =time.time()
def sing():
    time.sleep(3)
    print("我喜欢唱歌")
def dance():
    time.sleep(3)
    print("我喜欢跳舞")
t1 =threading.Thread(target=sing)
t2 =threading.Thread(target=dance)
t1.start()
t2.start()
t1.join()
t2.join()
print(time.time()-start)

Running the code gives:

我喜欢唱歌我喜欢跳舞

3.004112958908081

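As you can see, the code took only 3.004112958908081 seconds. sing and dance ran at the same time, each sleeping for 3 seconds, and then printed "我喜欢唱歌" and "我喜欢跳舞". Multithreading simply means creating additional threads to carry out the work. The code above can be modified to print the name of the thread doing each piece of work.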

import time
import threading
start =time.time()
def sing():
    time.sleep(3)
    # Replaces the earlier print("我喜欢唱歌"); also prints the current thread's name
    print("我喜欢唱歌，现在运行的进程是 %s"%threading.current_thread().name)
def dance():
    time.sleep(3)
    # Replaces the earlier print("我喜欢跳舞"); also prints the current thread's name
    print('我喜欢跳舞，现在运行的进程是 %s'%threading.current_thread().name)
t1 =threading.Thread(target=sing)
t2 =threading.Thread(target=dance)
t1.start()
t2.start()
t1.join()
t2.join()
print(time.time()-start)

Output:

我喜欢跳舞，现在运行的进程是 Thread-2我喜欢唱歌，现在运行的进程是 Thread-1

3.0113303661346436

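As you can see, sing and dance ran on two threads, Thread-1 and Thread-2. (The message text says 进程, "process", but the value printed is the thread name.) The two threads ran sing and dance concurrently, which avoids blocking, improves efficiency, and saves time.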
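In the multithreaded outputs above, messages from different threads sometimes land on the same line. A likely cause is that print writes the text and the trailing newline as separate steps, so another thread can write in between. One common way to keep each line intact is to hold a lock while printing. The sketch below is an assumption-laden illustration and not part of the original notes: `print_lock`, `report`, and the thread names "singer" and "dancer" are made up, the sleep is shortened, and the output goes to an in-memory buffer so the result can be inspected.

```python
import io
import threading
import time

print_lock = threading.Lock()   # hypothetical helper, not from the article


def report(message, out):
    """Sleep briefly, then print one whole line while holding the lock."""
    time.sleep(0.1)
    with print_lock:            # only one thread prints at a time
        print(f"{message} ({threading.current_thread().name})", file=out)


buffer = io.StringIO()
workers = [
    threading.Thread(target=report, args=("sing", buffer), name="singer"),
    threading.Thread(target=report, args=("dance", buffer), name="dancer"),
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# The order of the two lines may vary between runs, so sort before comparing
lines = sorted(buffer.getvalue().splitlines())
print(lines)
```

Passing `name=` to `threading.Thread` also replaces the default Thread-1 and Thread-2 labels with something more descriptive.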


Added: 2022-04-30 08:41:12  Updated: 2022-04-30 08:42:42
