A Clever Use of Scrapy's errback
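Background
Here is my situation. I want to use yield, but I don't have a URL yet. Or I want to use yield, but whatever the yielded request brings back is useless to me. If I keep requesting some real site just for that, I will put a heavy load on it. So what can I do?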
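Parameters of scrapy.Request
The url argument of scrapy.Request is required, so the first option, leaving url out, is not feasible (short of modifying Scrapy's source). The official documentation explains what each parameter means, and there I found a very interesting one: errback, a function that is called if any exception is raised while processing the request.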
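To make the constraint concrete, here is a toy stand-in, not Scrapy's actual class. The name ToyRequest is hypothetical, and it mirrors only the four parameters this post relies on. As with scrapy.Request, url has no default, so leaving it out fails immediately, while callback, errback and dont_filter are optional.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in for scrapy.Request, reduced to the parameters used in this post."""

    url: str  # required: no default value, as in scrapy.Request
    callback: Optional[Callable[[Any], Any]] = None  # would receive the response on success
    errback: Optional[Callable[[Any], Any]] = None  # would receive the failure on error
    dont_filter: bool = False  # True skips the duplicate-request filter


# Only url is mandatory; the other three fall back to their defaults.
req = ToyRequest("http://www.1.com")
print(req)

try:
    ToyRequest()  # no url at all
except TypeError as exc:
    print("url is required:", exc)
```

The real scrapy.Request accepts many more arguments (method, headers, meta, priority, cb_kwargs and others); the official documentation lists them all.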
A bold idea
What if I pass a URL that does not exist, send the request, and let errback call the method I actually want? Would that solve the problem?
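Why should this work? Roughly: when a request cannot be completed, Scrapy calls errback instead of callback. For HTTP errors, RetryMiddleware first retries retryable status codes (with the default RETRY_TIMES = 2 that means 3 attempts, matching the "failed 3 times" in the log below), and HttpErrorMiddleware then turns the final non-2xx response into an HttpError that reaches errback. The sketch below is a simplified, hypothetical model of that flow, not Scrapy's engine code; the fetch function and its status codes are assumptions for illustration.

```python
from typing import Any, Callable, Optional

# Scrapy's default settings: RETRY_TIMES = 2 (3 attempts in total) and these RETRY_HTTP_CODES
RETRY_TIMES = 2
RETRY_HTTP_CODES = {500, 502, 503, 504, 522, 524, 408, 429}


def dispatch(url: str,
             fetch: Callable[[str], int],
             callback: Optional[Callable[[int], Any]] = None,
             errback: Optional[Callable[[str], Any]] = None) -> int:
    """Toy model of one request: retry retryable statuses, then call callback or errback.

    `fetch` is a hypothetical function returning an HTTP status code.
    Returns the number of attempts made.
    """
    attempts = 0
    status = 0
    while attempts <= RETRY_TIMES:
        attempts += 1
        status = fetch(url)
        if status not in RETRY_HTTP_CODES:
            break  # success or a non-retryable error: stop retrying
    if 200 <= status < 300:
        if callback is not None:
            callback(status)  # real Scrapy passes a Response object
    elif errback is not None:
        # real Scrapy passes a twisted Failure; a string stands in for it here
        errback(f"HTTP {status} after {attempts} attempt(s)")
    return attempts


# A host that always answers 502, like www.1.com in the log below
events = []
attempts = dispatch("http://www.1.com", fetch=lambda url: 502,
                    callback=lambda status: events.append("callback"),
                    errback=lambda failure: events.append("errback"))
print(attempts, events)  # 3 ['errback']
```

If the URL ever starts answering with a 2xx status, the model routes it to callback instead, which is why adding a callback as well keeps the method running in that case.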
Let's try it
Sample code:
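```python
import scrapy


class BaiduSpider(scrapy.Spider):
    name = 'baidu'
    nums = 1  # counter starts at 1, so the first errback call prints 2

    def start_requests(self):
        for _ in range(100):
            # The response is never used; the request only serves to trigger erro_def.
            # dont_filter=True keeps the scheduler from dropping the 99 repeated requests.
            yield scrapy.Request(url='http://www.1.com', errback=self.erro_def, dont_filter=True)

    def erro_def(self, failure):
        # errback receives a twisted Failure describing why the request failed
        self.nums += 1
        print(f'我被执行了______{self.nums}')  # prints "I was executed______N"
```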
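dont_filter=True tells Scrapy not to filter out duplicate requests; without it, the 99 requests that repeat the same URL would be dropped and erro_def would run only once. And what if this URL becomes a real website someday? Simple: add a callback as well (it can point to the same method), so the method still runs when the request succeeds. The output is as follows: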
```
2022-03-15 09:59:54 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: airliveSpider)
2022-03-15 09:59:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.7.11 (default, Jul 27 2021, 09:42:29) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Windows-10-10.0.19041-SP0
2022-03-15 09:59:54 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'airliveSpider',
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'airliveSpider.spiders',
 'SPIDER_MODULES': ['airliveSpider.spiders']}
2022-03-15 09:59:54 [scrapy.extensions.telnet] INFO: Telnet Password: df69b71bcf4ea887
2022-03-15 09:59:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2022-03-15 09:59:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-03-15 09:59:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-15 09:59:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-15 09:59:54 [scrapy.core.engine] INFO: Spider opened
2022-03-15 09:59:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-15 09:59:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
我被执行了______2
我被执行了______3
我被执行了______4
我被执行了______5
我被执行了______6
我被执行了______7
我被执行了______8
我被执行了______9
[... the same pattern continues: more "Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway" errors interleaved with 我被执行了______10 through 我被执行了______97 ...]
2022-03-15 10:00:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
我被执行了______98
2022-03-15 10:00:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
2022-03-15 10:00:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.1.com> (failed 3 times): 502 Bad Gateway
我被执行了______99
我被执行了______100
```
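Although the URL returned an error every time, erro_def was still called after each request failed, so we did achieve our goal.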
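Part of what lets erro_def run once per request is dont_filter=True. Scrapy's scheduler drops any request whose fingerprint its duplicate filter has already seen, unless that request has dont_filter=True. The hypothetical sketch below models this with a plain set; the real RFPDupeFilter fingerprints the method, canonicalized URL and body, which is simplified here to (method, url).

```python
from typing import Iterable, List, Set, Tuple


class ToyDupeFilter:
    """Simplified stand-in for Scrapy's RFPDupeFilter: fingerprint = (method, url)."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, str]] = set()

    def request_seen(self, method: str, url: str) -> bool:
        fp = (method, url)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False


def schedule(urls: Iterable[str], dont_filter: bool) -> List[str]:
    """Return the URLs that would actually reach the downloader."""
    df = ToyDupeFilter()
    scheduled = []
    for url in urls:
        # Mirrors the scheduler's check: consult the filter only when dont_filter is False
        if not dont_filter and df.request_seen("GET", url):
            continue
        scheduled.append(url)
    return scheduled


urls = ["http://www.1.com"] * 100
print(len(schedule(urls, dont_filter=False)))  # 1
print(len(schedule(urls, dont_filter=True)))   # 100
```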
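Conclusion
When you want to use yield but have no use for what the request returns, you can yield a request to a URL that is bound to fail (such as a nonexistent address), set dont_filter=True so repeated requests are not dropped, and let errback call the method you want.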
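Since the trick was motivated by not wanting to burden a website, keep in mind that the placeholder requests still go over the network: in the log each one "failed 3 times" (the first attempt plus Scrapy's default 2 retries), and the whole run took about 47 seconds (09:59:54 to 10:00:41). The hypothetical replay below counts those attempts in plain Python. A generator stands in for start_requests and a loop stands in for Scrapy's engine, so the numbers come from the model, not from Scrapy. It also shows that, because nums starts at 1, the counter reaches 101 after 100 errback calls.

```python
from typing import Callable, Iterator, Tuple

# Scrapy's defaults: 2 retries (3 attempts in total) for these status codes
RETRY_TIMES = 2
RETRY_HTTP_CODES = {500, 502, 503, 504, 522, 524, 408, 429}


class ToySpider:
    """Hypothetical plain-Python replay of the sample BaiduSpider."""

    nums = 1  # same starting value as the sample spider

    def start_requests(self) -> Iterator[Tuple[str, Callable[[str], None]]]:
        for _ in range(100):
            # (url, errback) stands in for scrapy.Request(url, errback=..., dont_filter=True)
            yield "http://www.1.com", self.erro_def

    def erro_def(self, failure: str) -> None:
        self.nums += 1


def run(spider: ToySpider, fetch_status: Callable[[str], int]) -> int:
    """Drain start_requests like a (much simplified) engine; return total network attempts."""
    total_attempts = 0
    for url, errback in spider.start_requests():
        status = 0
        for _ in range(RETRY_TIMES + 1):
            total_attempts += 1
            status = fetch_status(url)
            if status not in RETRY_HTTP_CODES:
                break  # not retryable: stop here
        if not 200 <= status < 300:
            errback(f"HTTP {status}")
    return total_attempts


spider = ToySpider()
attempts = run(spider, fetch_status=lambda url: 502)  # every attempt gets 502, as in the log
print(attempts, spider.nums)  # 300 101
```

If the extra attempts matter, Scrapy's RETRY_ENABLED and RETRY_TIMES settings, or the dont_retry key in a request's meta, can be used to reduce them.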