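Preface: first, the requirements. We crawl coordinates from AMap (高德地图) straight into the database, then initialize the Elasticsearch data with a scheduled job or a manual run. For how the crawling works, see "Crawling AMap coordinates".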
I. Initializing the Elasticsearch data
1. Create the index and configure its settings. Log in to Elasticsearch Head and send PUT /location with the following body:
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"ik_pinyin": {
"type": "custom",
"tokenizer": "my_pinyin"
},
"ik_smart_pinyin": {
"type": "custom",
"tokenizer": "ik_smart",
"filter": "my_pinyin"
}
},
"tokenizer": {
"my_pinyin": {
"type": "pinyin",
"keep_first_letter": true,
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"remove_duplicated_term": true,
"keep_original": true
}
},
"filter": {
"my_pinyin": {
"type": "pinyin",
"keep_first_letter": true,
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"remove_duplicated_term": true,
"keep_original": true
}
}
}
}
}
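A few words on the settings above. We want both Chinese word-segmentation search and pinyin search, so you need to install the IK analysis plugin and the pinyin analysis plugin (see "Installing the IK and pinyin analysis plugins").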
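We define two custom analyzers, ik_pinyin and ik_smart_pinyin. Why two pinyin analyzers? So that typing either jlh or pcs finds 九龙湖派出所 (Jiulonghu Police Station).

- ik_pinyin sends the whole name straight through the pinyin tokenizer my_pinyin. With these options it emits roughly the first letters of the entire name (jlhpcs), the joined full pinyin and the original text, so a prefix query for jlh matches.
- ik_smart_pinyin first splits the text into words with ik_smart. It then passes each word through my_pinyin, which here is a token filter that post-processes the words. Every word gets its own first-letter and full-pinyin tokens (jlh, pcs and so on), so a query for pcs matches.

For the individual parameters, see the pinyin plugin's usage guide ("Using the pinyin analysis plugin"), which explains them in detail.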
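To make the difference between the two analyzers concrete, here is a rough Java simulation of the tokens each one would emit for 九龙湖派出所 with the options above (keep_first_letter, keep_joined_full_pinyin, keep_original). It is not the plugin's implementation. The six-character pinyin table is hard-coded, and the ik_smart split into 九龙湖 / 派出所 is assumed rather than computed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Rough simulation (NOT the pinyin plugin's code) of the tokens produced with
 * keep_first_letter=true, keep_joined_full_pinyin=true, keep_original=true.
 * PINYIN is a hypothetical stand-in for the plugin's dictionary and only knows six characters.
 */
class PinyinTokenSketch {
    static final Map<Character, String> PINYIN = Map.of(
            '九', "jiu", '龙', "long", '湖', "hu",
            '派', "pai", '出', "chu", '所', "suo");

    // Tokens for one input: first letters, joined full pinyin, and the original text
    static Set<String> tokensFor(String text) {
        StringBuilder first = new StringBuilder();
        StringBuilder full = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            first.append(py.charAt(0));
            full.append(py);
        }
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(first.toString());
        tokens.add(full.toString());
        tokens.add(text);
        return tokens;
    }

    // ik_pinyin: the pinyin tokenizer sees the whole name at once
    static Set<String> ikPinyin(String name) {
        return tokensFor(name);
    }

    // ik_smart_pinyin: ik_smart splits first (split passed in, assumed), then the filter handles each word
    static Set<String> ikSmartPinyin(List<String> ikSmartWords) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : ikSmartWords) {
            tokens.addAll(tokensFor(word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(ikPinyin("九龙湖派出所"));
        System.out.println(ikSmartPinyin(List.of("九龙湖", "派出所")));
    }
}
```

With these tokens, two kinds of query hit:

- jlh is a prefix of jlhpcs, so the prefix query on ik_pinyin_field hits.
- pcs appears only in the word-by-word output, so only the ik_smart_pinyin field can match it as a whole token.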
2. With the custom analyzers defined, the next step is the field mapping. Send the request the same way as above: PUT /location/_mapping
{
"properties": {
"address": {
"type": "text",
"fields": {
"ik_pinyin_field": {
"type": "text",
"analyzer": "ik_pinyin",
"search_analyzer": "keyword"
},
"ik_word_field": {
"type": "text",
"analyzer": "ik_max_word"
},
"ik_smart_pinyin_field": {
"type": "text",
"analyzer": "ik_smart_pinyin",
"search_analyzer": "keyword"
}
}
},
"code": {
"type": "keyword"
},
"categoriesCode": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"ik_pinyin_field": {
"type": "text",
"analyzer": "ik_pinyin",
"search_analyzer": "keyword"
},
"ik_word_field": {
"type": "text",
"analyzer": "ik_max_word"
},
"ik_smart_pinyin_field": {
"type": "text",
"analyzer": "ik_smart_pinyin",
"search_analyzer": "keyword"
}
}
},
"_class": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"geoPoint": {
"type": "geo_point"
},
"order": {
"type": "integer"
}
}
}
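There are quite a few fields, but the goal is simple: both the address and the place name should be findable through analyzed search. Three points about the mapping:

- The coordinate field must use the geo_point type.
- A text field such as name can carry several sub-fields (fields). You can think of them as aliases that are analyzed in different ways.
- For the pinyin sub-fields, it is best to use keyword as the search analyzer. The query string is then not split up again; splitting it would make queries less precise.

The Java code that uses these fields follows.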
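If you would rather create the index and mapping from code than from Elasticsearch Head, a minimal sketch with the JDK's built-in HTTP client (Java 11+) could look like this. Several details are assumptions:

- the node address http://localhost:9200, with no authentication;
- the file names settings.json and mapping.json, which would hold the two JSON bodies shown above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch: send PUT /location and PUT /location/_mapping from Java instead of Elasticsearch Head.
 * ES_URL and the JSON file names are assumptions.
 */
class IndexSetupSketch {
    static final String ES_URL = "http://localhost:9200"; // assumed local node, no auth

    // Builds a PUT request carrying a JSON body for the given path
    static HttpRequest putJson(String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ES_URL + path))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request, prints the response body and returns the HTTP status code
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical files holding the settings and mapping JSON shown above
        String settingsJson = Files.readString(Path.of("settings.json"));
        String mappingJson = Files.readString(Path.of("mapping.json"));
        // Create the index (with its analyzers) first, then the mapping that refers to them
        System.out.println(send(putJson("/location", settingsJson)));
        System.out.println(send(putJson("/location/_mapping", mappingJson)));
    }
}
```

The order matters: the mapping references ik_pinyin and ik_smart_pinyin, which only exist once the index has been created with those settings.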
II. Java implementation
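First, add the entity class. The mapping has already been set up by hand, so there is no need to put mapping annotations on the fields. When I tried putting the geo-point annotation on the field earlier, it had no effect, so none of the geo classes worked. That was quite a trap.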
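import java.io.Serializable;

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
// Assumes Spring Data Elasticsearch's GeoPoint, which maps to the geo_point field
import org.springframework.data.elasticsearch.core.geo.GeoPoint;

@Data
@Document(indexName = "location", shards = 5)
public class GeoPointData implements Serializable {
    /**
     * Primary key
     */
    @Id
    private String id;
    /**
     * Place name
     */
    private String name;
    /**
     * Address
     */
    private String address;
    /**
     * Main address categories
     */
    private String[] categoriesCode;
    /**
     * Address sub-categories
     */
    private String[] code;
    /**
     * Location coordinates
     */
    private GeoPoint geoPoint;
    /**
     * Default recommendation order
     */
    private Integer order;
}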
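The geoPoint field is stored as a geo_point. You may write documents without going through the entity (for example straight from the crawled data). In that case, note that Elasticsearch accepts several input formats for a geo_point, and they disagree on coordinate order. Here is a small sketch of three of them, with made-up coordinates:

```java
/**
 * Three geo_point input formats for the same location.
 * Object and string forms put latitude first; the array form is [lon, lat] (GeoJSON order).
 */
class GeoPointFormats {
    // {"lat": ..., "lon": ...}
    static String asObject(double lat, double lon) {
        return "{\"lat\": " + lat + ", \"lon\": " + lon + "}";
    }

    // "lat,lon" (used as a JSON string value)
    static String asString(double lat, double lon) {
        return lat + "," + lon;
    }

    // [lon, lat]: note the reversed order
    static String asArray(double lat, double lon) {
        return "[" + lon + ", " + lat + "]";
    }

    public static void main(String[] args) {
        double lat = 31.9, lon = 118.8; // hypothetical coordinates
        System.out.println(asObject(lat, lon));
        System.out.println(asString(lat, lon));
        System.out.println(asArray(lat, lon));
    }
}
```

Getting the array order wrong silently swaps latitude and longitude. The route code later avoids the same trap by passing Y (latitude) before X (longitude) to GeoPoint.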
Querying by keyword
/**
 * Keyword search
 * @param operations Spring Data Elasticsearch operations
 * @param keyword    search text (Chinese, full pinyin or pinyin initials)
 * @param pageNum    page number, 1-based
 * @param pageSize   page size
 * @return matching hits
 */
public static SearchHits<GeoPointData> queryLocationByKeyword(ElasticsearchOperations operations, String keyword, int pageNum, int pageSize) {
    // Paging: clamp to page 1; PageRequest is 0-based
    pageNum = (pageNum < 1 ? 1 : pageNum);
    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    nativeSearchQueryBuilder.withPageable(PageRequest.of(pageNum - 1, pageSize));
    // Query conditions
    BoolQueryBuilder boolQueryBuilder = getQueryNameAndAddressBool(keyword);
    nativeSearchQueryBuilder.withQuery(boolQueryBuilder);
    // Execute and return
    return operations.search(nativeSearchQueryBuilder.build(), GeoPointData.class);
}
/**
 * Builds the query conditions
 * @param keyword search text
 * @return bool query over name and address
 */
private static BoolQueryBuilder getQueryNameAndAddressBool(String keyword) {
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    MultiMatchQueryBuilder matchWord = QueryBuilders.multiMatchQuery(keyword);
    // Chinese: IK word-segmented fields, boosted
    matchWord.field("name.ik_word_field", 2.0F)
            .field("address.ik_word_field", 2.0F);
    // Besides the best-matching field, other matching fields also add to the score
    matchWord.tieBreaker(0.3f);
    // Minimum share of query terms that must match (the computed count is rounded down)
    matchWord.minimumShouldMatch("75%");
    // Pinyin prefix on the whole-name pinyin, e.g. "jlh" matches "jlhpcs"
    PrefixQueryBuilder prefixNameBuilder = new PrefixQueryBuilder("name.ik_pinyin_field", keyword);
    PrefixQueryBuilder prefixAddressBuilder = new PrefixQueryBuilder("address.ik_pinyin_field", keyword);
    // Pinyin: whole-string full pinyin + initials, and ik_smart words converted to full pinyin + initials
    MultiMatchQueryBuilder matchPinYin = QueryBuilders.multiMatchQuery(keyword);
    matchPinYin.field("name.ik_pinyin_field")
            .field("name.ik_smart_pinyin_field")
            .field("address.ik_smart_pinyin_field")
            .field("address.ik_pinyin_field");
    // No must clause, so a document has to match at least one of the should clauses
    boolQueryBuilder.should(matchWord).should(matchPinYin).should(prefixNameBuilder).should(prefixAddressBuilder);
    return boolQueryBuilder;
}
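The code is fairly simple. The main thing to point out is the bool query: the Chinese (IK) fields get a higher weight (boost 2.0), which pushes Chinese word matches up the ranking.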
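To see what the 2.0 boost and tieBreaker(0.3f) do to the score, here is the best_fields arithmetic as described in the Elasticsearch multi_match documentation. The score is the best field's score plus tie_breaker times the sum of the other matching fields' scores. A tie_breaker of 0.0 keeps only the best field; 1.0 adds them all. The per-field scores are made-up numbers, not real BM25 output.

```java
/**
 * multi_match (best_fields) with tie_breaker:
 * score = best field score + tieBreaker * (sum of the other matching fields' scores).
 * Field scores passed in are assumed to already include their boosts.
 */
class TieBreakerSketch {
    static double combine(double[] fieldScores, double tieBreaker) {
        double best = 0, sum = 0;
        for (double s : fieldScores) {
            sum += s;
            best = Math.max(best, s);
        }
        return best + tieBreaker * (sum - best);
    }

    public static void main(String[] args) {
        double boost = 2.0;
        // Hypothetical raw scores for name.ik_word_field and address.ik_word_field
        double[] boosted = {3.0 * boost, 1.0 * boost};
        // 6.0 + 0.3 * 2.0 = 6.6
        System.out.println(combine(boosted, 0.3));
    }
}
```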
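// Minimum share of query terms that must match (the computed count is rounded down)
matchWord.minimumShouldMatch("75%");
This sets the fraction of the query's terms that must match. Take the target 九龙湖加油站 (Jiulonghu gas station): at index time it is split into 九龙湖, 加油 and 站. If you search for 南京南站加油站, the query is split into 南京, 南站, 加油 and 站. Only 2 of those 4 terms (加油 and 站) occur in the target, i.e. 50%, which is below 75%, so the document is not returned. In short, the ratio is the number of query terms that hit divided by the total number of terms your query was split into. The rest of the method just combines the clauses with should.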
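The 75% rule above can be checked with a few lines of plain Java. The token lists are the ones quoted in the text, taken as given rather than produced by IK. In Elasticsearch the percentage is applied to the query's term count and the result is rounded down. For best_fields multi_match this is evaluated per field.

```java
import java.util.List;

/**
 * minimum_should_match as a percentage: required = floor(queryTerms * percent / 100).
 * Token lists are assumed IK output, copied from the example in the text.
 */
class MinimumShouldMatchSketch {
    static int required(int queryTermCount, int percent) {
        return queryTermCount * percent / 100; // integer division rounds down
    }

    static boolean matches(List<String> queryTerms, List<String> docTerms, int percent) {
        long hits = queryTerms.stream().filter(docTerms::contains).count();
        return hits >= required(queryTerms.size(), percent);
    }

    public static void main(String[] args) {
        List<String> doc = List.of("九龙湖", "加油", "站");
        List<String> query = List.of("南京", "南站", "加油", "站");
        System.out.println(required(query.size(), 75)); // 3
        System.out.println(matches(query, doc, 75));    // false: only 2 of 4 terms hit
    }
}
```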
Nearby search
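/**
 * Nearby search: keyword query limited to 5 km around (lat, lng), nearest first.
 * The method name and parameter list are illustrative; the original shows only the body.
 */
public static SearchHits<GeoPointData> queryNearby(ElasticsearchOperations operations, String keyword, String typeCode,
        double lat, double lng, int pageNum, int pageSize) {
    // Paging: clamp to page 1; PageRequest is 0-based
    pageNum = (pageNum < 1 ? 1 : pageNum);
    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    nativeSearchQueryBuilder.withPageable(PageRequest.of(pageNum - 1, pageSize));
    // Query conditions (an overload of getQueryNameAndAddressBool that also takes typeCode; not shown)
    BoolQueryBuilder boolQueryBuilder = getQueryNameAndAddressBool(keyword, typeCode);
    // Distance filter on the geo_point field
    GeoDistanceQueryBuilder distanceQueryBuilder = new GeoDistanceQueryBuilder("geoPoint");
    distanceQueryBuilder.point(lat, lng);
    // Anything within 5 km counts as nearby
    distanceQueryBuilder.distance(5, DistanceUnit.KILOMETERS);
    // Sort by distance from (lat, lng), nearest first
    GeoDistanceSortBuilder geoDistanceSortBuilder = new GeoDistanceSortBuilder("geoPoint", lat, lng);
    geoDistanceSortBuilder.unit(DistanceUnit.METERS);
    geoDistanceSortBuilder.order(SortOrder.ASC);
    nativeSearchQueryBuilder.withSort(geoDistanceSortBuilder);
    nativeSearchQueryBuilder.withFilter(distanceQueryBuilder);
    nativeSearchQueryBuilder.withQuery(boolQueryBuilder);
    // Execute and return
    return operations.search(nativeSearchQueryBuilder.build(), GeoPointData.class);
}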
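To get a feel for what the 5 km geo_distance filter and the ascending distance sort do, here is a haversine sketch in plain Java. Haversine is a close stand-in for Elasticsearch's default arc distance, not its exact code. The Earth radius and the sample coordinates are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Haversine great-circle distance, used to mimic the 5 km filter and the ASC distance sort.
 */
class DistanceSketch {
    static final double EARTH_RADIUS_M = 6_371_008.8; // assumed mean Earth radius in metres

    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Same 5 km radius as the geo_distance filter
    static boolean isNearby(double lat, double lng, double poiLat, double poiLng) {
        return haversineMeters(lat, lng, poiLat, poiLng) <= 5_000;
    }

    public static void main(String[] args) {
        double lat = 31.9, lng = 118.8; // hypothetical search centre
        double[][] pois = {{31.95, 118.8}, {31.91, 118.8}, {32.1, 118.8}}; // hypothetical POIs
        // Nearest first, like GeoDistanceSortBuilder with SortOrder.ASC
        Arrays.sort(pois, Comparator.comparingDouble((double[] p) -> haversineMeters(lat, lng, p[0], p[1])));
        for (double[] p : pois) {
            double d = haversineMeters(lat, lng, p[0], p[1]);
            System.out.printf("%.2f,%.2f %.0f m nearby=%b%n", p[0], p[1], d, isNearby(lat, lng, p[0], p[1]));
        }
    }
}
```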
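The nearby query itself is straightforward. A harder case is this: given a route, find the points around it, like AMap's search-along-the-route feature.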
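For this we bring in JTS. The main idea is to buffer the route into a polygon, take the polygon's coordinates and query with them. The catch is that the buffer distance cannot be given in metres. It is in the same units as the coordinates (degrees), so it has to be converted. This is probably rarely needed, but it took me quite a while to get working.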
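// Needs JTS (package names as in JTS 1.15+: org.locationtech.jts.geom.*, org.locationtech.jts.operation.buffer.*)
// Build a line from the route; getCodeLine() is "lng,lat-lng,lat-..." (X = longitude, Y = latitude)
String[] split = request.getCodeLine().split("-");
Coordinate[] coordinates = new Coordinate[split.length];
for (int i = 0; i < split.length; i++) {
    String[] lngLat = split[i].split(",");
    coordinates[i] = new Coordinate(Double.parseDouble(lngLat[0]), Double.parseDouble(lngLat[1]));
}
GeometryFactory gf = new GeometryFactory();
Geometry gfLineString = gf.createLineString(coordinates);
// Buffer width: the line is in degrees, so convert metres to degrees
// (earthRadius presumably in km, bufferDistance in metres; 1000.0 avoids integer division)
double degree = (180 / earthRadius / Math.PI) * (bufferDistance / 1000.0);
// Build the buffer polygon around the line, with round end caps
BufferOp bufOp = new BufferOp(gfLineString);
bufOp.setEndCapStyle(BufferParameters.CAP_ROUND);
Geometry bg = bufOp.getResultGeometry(degree);
Coordinate[] bgCoordinates = bg.getCoordinates();
// Polygon vertices as GeoPoints for the query; GeoPoint takes (lat, lon), hence Y before X
List<GeoPoint> collect = Arrays.stream(bgCoordinates).map(item -> new GeoPoint(item.getY(), item.getX())).collect(Collectors.toList());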
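The metres-to-degrees conversion is easy to get wrong, so here it is isolated in plain Java. It assumes earthRadius is in kilometres (6371 here) and bufferDistance in metres, which is what the formula's division by 1000 implies. It also shows that away from the equator a degree of longitude covers less ground than a degree of latitude. The route code uses the latitude value in both directions, so its buffer comes out somewhat narrower east to west (about 15% at latitude 32°).

```java
/**
 * Metres-to-degrees conversion for the JTS buffer.
 * Assumes a spherical Earth with a mean radius of 6371 km.
 */
class BufferDegreesSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    // Degrees of latitude spanned by `meters` (same formula as the route code)
    static double metersToDegrees(double meters) {
        return (180 / EARTH_RADIUS_KM / Math.PI) * (meters / 1000.0);
    }

    // Degrees of longitude spanned by `meters` at latitude `lat`: larger, since meridians converge
    static double metersToLonDegrees(double meters, double lat) {
        return metersToDegrees(meters) / Math.cos(Math.toRadians(lat));
    }

    public static void main(String[] args) {
        System.out.println(metersToDegrees(1000));          // about 0.008993
        System.out.println(metersToLonDegrees(1000, 32.0)); // about 0.0106
        // Pitfall: with an int bufferDistance, bufferDistance / 1000 truncates 500 to 0
        int bufferDistance = 500;
        System.out.println(bufferDistance / 1000);          // 0
    }
}
```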
That's it. The write-up may be a bit messy; if you have any questions, leave them in the comments and let's discuss.