Environment:
- Elasticsearch version: 7.10.1
- elasticsearch-analysis-ik version: 7.10.1
- Python Elasticsearch client (elasticsearch-py) version: 7.16.1
Problem:
The following code rebuilds the index using the elasticsearch-analysis-ik analyzer:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# Analyze the "title" field with the IK analyzer at both index time and search time
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}

# Drop the old index (ignoring "not found"), recreate it, then apply the mapping
es.indices.delete(index='news', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', body=mapping)
print(result)
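Instead of creating the index and then calling put_mapping, the mapping can also be sent in the body of the create-index request, so the index never exists without its IK mapping. The sketch below only builds that body; create_index_body is a hypothetical helper name, and it assumes the 7.x client's indices.create accepts a body with a top-level mappings key, as the create-index REST API does.

```python
import json

# The mapping from the post: "title" uses the IK analyzer for indexing and search.
MAPPING = {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
        }
    }
}


def create_index_body(mapping):
    """Build a create-index request body that carries the field mapping."""
    return {"mappings": mapping}


# Show the JSON that would be sent as the body of the create-index request
print(json.dumps(create_index_body(MAPPING), ensure_ascii=False, indent=2))
```

With a connected client this would be used as es.indices.create(index='news', body=create_index_body(MAPPING)), after the delete call shown above.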
The following code inserts the sample data:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

# Index each sample document into the "news" index, one request per document
for data in datas:
    es.index(index='news', body=data)
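Sending one request per document is fine for four samples; for larger batches, elasticsearch.helpers.bulk sends many documents per request. Below is a hedged sketch of the action stream it accepts, where each action names the target index in _index and carries the document in _source; bulk_actions is a hypothetical helper name. Bulk indexing still runs ik_max_word on title, so it would hit the same analyzer error shown below.

```python
def bulk_actions(index, docs):
    """Yield one bulk action per document for elasticsearch.helpers.bulk."""
    for doc in docs:
        # _op_type defaults to "index" in helpers.bulk, so it is omitted here
        yield {"_index": index, "_source": doc}


# Two of the sample documents from the post
datas = [
    {"title": "高考结局大不同", "url": "https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html"},
    {"title": "他,活出了我们理想的样子", "url": "https://new.qq.com/omn/20210821/20210821A020ID00.html"},
]

actions = list(bulk_actions("news", datas))
print(len(actions), actions[0]["_index"])
```

With a connected client: from elasticsearch import helpers, then helpers.bulk(es, bulk_actions('news', datas)). It returns a (success count, errors) tuple and, with default arguments, raises BulkIndexError when a document fails.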
Running it then fails with the following output:
/data/web-spider2/chapter04/4.7/insert_more_data.py:26: DeprecationWarning: The 'body' parameter is deprecated for the 'index' API and will be removed in a future version. Instead use the 'document' parameter. See https://github.com/elastic/elasticsearch-py/issues/1698 for more information
  es.index(index='news', body=data)
Traceback (most recent call last):
  File "/data/web-spider2/chapter04/4.7/insert_more_data.py", line 26, in <module>
    es.index(index='news', body=data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/__init__.py", line 413, in index
    return self.transport.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.TransportError: TransportError(500, 'null_pointer_exception', 'Cannot invoke "org.wltea.analyzer.dic.DictSegment.match(char[], int, int)" because "org.wltea.analyzer.dic.Dictionary.singleton._StopWords" is null')
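So there are two separate problems: a DeprecationWarning saying that the body parameter of the index API is deprecated in favor of document, and a TransportError (HTTP 500, null_pointer_exception) raised from inside the IK analyzer plugin. Only the TransportError actually stops the script; the warning is informational.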
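The stack trace ends inside the IK plugin (org.wltea.analyzer.dic.Dictionary) with its stop-word dictionary still null, which suggests the plugin's dictionaries did not load, rather than a problem in the Python client. One way to check this without writing any document is to run the analyzer directly through the _analyze API. The sketch below only builds the request body and reads tokens out of a response; the response shape (a tokens list whose entries have a token field) follows the _analyze REST API, and analyze_body and token_list are hypothetical helper names.

```python
def analyze_body(text, analyzer="ik_max_word"):
    """Request body for the _analyze API: run `text` through a named analyzer."""
    return {"analyzer": analyzer, "text": text}


def token_list(response):
    """Extract the token strings from an _analyze response, in order."""
    return [t["token"] for t in response.get("tokens", [])]


# Illustrative response of the documented shape, not real IK output
sample_response = {"tokens": [{"token": "高考"}, {"token": "结局"}]}

print(analyze_body("高考结局大不同"))
print(token_list(sample_response))
```

Against the cluster this would be token_list(es.indices.analyze(body=analyze_body('高考结局大不同'))). If the IK dictionaries failed to load, this call is likely to fail with the same null_pointer_exception, in which case the plugin's IKAnalyzer.cfg.xml, its dictionary files, and the Elasticsearch startup logs are worth checking.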
Solution:
To fix this, only the es.index part of the code needs to change:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

for data in datas:
    # document= replaces the deprecated body= parameter.
    # The index API stores this dict as-is, so the fields are saved under "doc".
    es.index(index='news', doc_type='_doc', document={"doc": data})
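It helps to check what this call actually stores. The index API saves the document argument as the document source unchanged (the {"doc": ...} wrapper is the convention of the update API, not the index API), so each document ends up with doc.title and doc.url instead of a top-level title. Because only title is mapped to ik_max_word, the dynamically mapped doc.title would use the default analyzer, which suggests the error disappears because IK is no longer invoked rather than because it was fixed. The sketch below lists a document's field paths to make this visible; field_paths is a hypothetical helper name.

```python
def field_paths(doc, prefix=""):
    """Return the dotted paths of all leaf fields in a (possibly nested) dict."""
    paths = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested dicts, which Elasticsearch maps as object fields
            paths.extend(field_paths(value, path + "."))
        else:
            paths.append(path)
    return paths


data = {
    "title": "高考结局大不同",
    "url": "https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html",
}

print(field_paths(data))           # → ['title', 'url']
print(field_paths({"doc": data}))  # → ['doc.title', 'doc.url']
```

Passing document=data instead keeps the top-level title field, and with it the ik_max_word analysis, so the IK loading problem would then need to be fixed on the server side.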