To get a more intuitive sense of how a single process, multiple processes, and multiple threads perform on an IO-bound task, let's compare how long each approach takes:
Timing decorator
Partly as a review of earlier material, first write a timing decorator that measures how long a function takes to run in each scenario:
import time
from functools import wraps

def timer(is_timing: bool = True):
    """Timing decorator: print the elapsed time of the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            result = fn(*args, **kwargs)
            end_time = time.time()
            if is_timing:
                print(f"Elapsed: {end_time - start_time}")
            return result
        return wrapper
    return decorator
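As a quick check that the decorator behaves as expected, it can first be applied to a dummy function (a minimal sketch; slow_task and its one-second sleep are made up purely for illustration):

@timer(is_timing=True)
def slow_task():
    """Dummy task that just sleeps for one second."""
    time.sleep(1)

slow_task()  # prints roughly "Elapsed: 1.0..."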
Worker functions
Two functions: open_url(url) sends a request to the URL and returns the body of the response; save_page(title, page_text) saves the response content to a local file.
import requests

def open_url(url):
    """Request the URL and return the decoded response text, or None on failure."""
    try:
        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
        }
        resp = requests.get(url, headers=headers)
        print(url, resp.status_code, resp.apparent_encoding)
        resp.encoding = resp.apparent_encoding
        return resp.text
    except Exception as e:
        print(url, "request failed\n", e)
        return None

def save_page(title, page_text):
    """Write the page text to <title>.txt; skip pages that failed to download."""
    if page_text is None:
        return
    try:
        with open(f"{title}.txt", "w", encoding="utf-8") as w:
            w.write(page_text)
        print(f"{title}: saved successfully")
    except Exception as e:
        print(f"{title}: saving failed", e)
urls is a list of addresses of different websites.
Single-process run
Run the functions above in a single process and see how long it takes. Machine performance, network speed, and the response time of the target hosts all affect the final timing.
@timer(is_timing=True)
def run_url():
    """Single-process run."""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    for i, url in enumerate(urls):
        page_text = open_url(url)
        save_page(i, page_text)

if __name__ == '__main__':
    run_url()
Multi-process run
There are many ways to start multiple processes; here two of them are used:
Method 1: multiprocessing.Pool; Method 2: concurrent.futures.ProcessPoolExecutor (the ProcessPoolExecutor version is shown first; a Pool-based sketch follows it).
Example code:
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor, as_completed

@timer(is_timing=True)
def run_url_pro():
    """Multi-process run."""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(open_url, url) for url in urls]
        for i, future in enumerate(as_completed(futures)):
            executor.submit(save_page, i, future.result())

if __name__ == '__main__':
    run_url_pro()
Much faster than the single-process run.
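Method 1, multiprocessing.Pool, was mentioned above but not shown. Below is a minimal sketch of the same workflow with Pool (my own variant, reusing the Pool import from the previous block; it is not the version that produced the timings at the end):

@timer(is_timing=True)
def run_url_pool():
    """Multi-process run using multiprocessing.Pool."""
    urls = ["https://www.example.com/"] * 10  # placeholder list; substitute the real URLs from above
    with Pool() as pool:
        # download all pages in parallel; results come back in input order
        pages = pool.map(open_url, urls)
        # save the downloaded pages in parallel as well
        pool.starmap(save_page, enumerate(pages))

if __name__ == '__main__':
    run_url_pool()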
Multi-thread run
Multiple threads are started with concurrent.futures.ThreadPoolExecutor.
Example code:
from concurrent.futures import ThreadPoolExecutor

@timer(is_timing=True)
def run_url_thread():
    """Multi-thread run."""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    with ThreadPoolExecutor() as executor:
        futures = executor.map(open_url, urls)
        # pair every downloaded page with its index so all ten pages get saved
        executor.map(save_page, range(len(urls)), futures)

if __name__ == '__main__':
    run_url_thread()
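For comparison, another way to structure the threaded version (my own sketch, not the code that produced the timings below) is to bundle download and save into a single task per URL, so each page is written as soon as its download finishes:

def fetch_and_save(index, url):
    """Download one URL and immediately save the result under its index."""
    save_page(index, open_url(url))

@timer(is_timing=True)
def run_url_thread_v2():
    urls = ["https://www.example.com/"] * 10  # placeholder list; substitute the real URLs from above
    with ThreadPoolExecutor() as executor:
        # one task per URL; the with-block waits for all tasks before returning
        executor.map(fetch_and_save, range(len(urls)), urls)

if __name__ == '__main__':
    run_url_thread_v2()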
Why does the multi-threaded version take longer than the multi-process one?
Results
Single process, elapsed: 23.446341037750244
Multiple processes, elapsed: 10.064575433731079
Multiple threads, elapsed: 15.902909755706787
The task was to request 10 different URLs and save each response to a local file. Why, in my results, does the multi-threaded version take longer than the multi-process one? Is there something unreasonable in the way I wrote it?
If anything here is wrong, corrections are welcome.