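Getting Started with Elasticsearch: A Hands-On Guide

Basic Concepts

Document

Elasticsearch is document-oriented: the smallest unit of data that can be indexed or searched is a document.

A document is similar to a row in a relational database. The difference is that each document in an index may have a different structure (set of fields), although a field shared by several documents must have the same data type in all of them. As a rough analogy: MySQL => Databases => Tables => Columns / Rows; Elasticsearch => Indices => Types => Documents with properties.

Type

A type is a logical container for documents, much as a table is a container for rows. Documents with different structures are best placed in different types. (Mapping types are deprecated in Elasticsearch 7.x, the version used in this guide, and removed in 8.x.)

Index

An index is a large collection of documents. Each index is stored on disk as one set of files. It holds the mapping that defines its types, and it stores the fields of all mapping types.

Shard

Because Elasticsearch is a distributed search engine, an index is usually split into pieces called shards, which are spread across multiple nodes.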
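
As a rough illustration of how a document ends up on a particular shard, the sketch below routes document ids to shards by hashing the id and taking the result modulo the number of primary shards. It is a simplification under stated assumptions: Elasticsearch applies its own hash function to the routing value (the document id by default), whereas this sketch substitutes Java's String.hashCode. The class name ShardRouting and the shard count of 3 are made up for the example.

```java
public class ShardRouting {
    // Picks a shard for a document id: hash the id, then take it modulo the
    // number of primary shards. String.hashCode is a stand-in (assumption),
    // not the hash function Elasticsearch actually uses.
    static int shardFor(String docId, int primaryShards) {
        return Math.floorMod(docId.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 3; // hypothetical shard count
        for (String id : new String[] {"1", "2", "3", "4"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created.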

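IK Analyzer

Download

Download from: https://github.com/medcl/elasticsearch-analysis-ik/releases

Installation

Place the IK analyzer files in a directory named ik under the plugins directory of the Elasticsearch installation (create the ik directory yourself).

Two Tokenization Modes

ik_smart: the coarsest-grained split (fewest tokens)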

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "一起滑雪吧"
}

Result
{
  "tokens" : [
    {
      "token" : "一起",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "滑雪",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "吧",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    }
  ]
}

ik_max_word: the finest-grained split (emits every word the dictionary can find)

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "一起滑雪吧"
}

Result
{
  "tokens" : [
    {
      "token" : "一起",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "一",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "TYPE_CNUM",
      "position" : 1
    },
    {
      "token" : "起",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "COUNT",
      "position" : 2
    },
    {
      "token" : "滑雪",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "吧",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    }
  ]
}

Extending the Dictionary

Some words are not in the built-in dictionary, so we have to add them ourselves.

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "计算机组成原理"
}

Result
{
  "tokens" : [
    {
      "token" : "计算机",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "组成",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "原理",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

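"计算机组成原理" (Principles of Computer Organization) is the name of a course and should be treated as one word, but the dictionary does not contain it, so we add it by hand.

Fix:

Open IKAnalyzer.cfg.xml in the config directory of the ik plugin:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- configure your own extension dictionary here -->
	<entry key="ext_dict"></entry>
	<!-- configure your own extension stop-word dictionary here -->
	<entry key="ext_stopwords"></entry>
	<!-- configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>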

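Create your own dictionary:
- Create a .dic file (mydic.dic) and add "计算机组成原理" to it
- Note: save the file as UTF-8

Then reference it in IKAnalyzer.cfg.xml:

<!-- configure your own extension dictionary here -->
<entry key="ext_dict">mydic.dic</entry>
<!-- configure your own extension stop-word dictionary here -->
<entry key="ext_stopwords"></entry>

Restart Elasticsearch for the change to take effect.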
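
To see why one extra dictionary entry changes the tokens, here is a toy dictionary-based segmenter. It uses forward maximum matching, a textbook technique chosen for illustration; it is not IK's actual algorithm, and the class name DictionarySegmenter and the tiny dictionary are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionarySegmenter {
    // Forward maximum matching: at each position take the longest dictionary
    // word that starts there; fall back to a single character if none does.
    static List<String> segment(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1; // single-character fallback
            for (int candidate = text.length(); candidate > pos; candidate--) {
                if (dictionary.contains(text.substring(pos, candidate))) {
                    end = candidate;
                    break;
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("计算机", "组成", "原理"));
        // Without the extra entry the text is split into three known words.
        System.out.println(segment("计算机组成原理", dictionary));
        // After adding the entry (as mydic.dic does) the whole phrase is one token.
        dictionary.add("计算机组成原理");
        System.out.println(segment("计算机组成原理", dictionary));
    }
}
```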

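RESTful Operations

Index Operations

Creating an Index

PUT /<index name>/<type name>/<document id>    (the type name will not be used in future versions)
{request body}

Test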

PUT /test1/type1/1
{
  "name": "SpringBoot",
  "age": 123
}

Defining a mapping (declare the field types by hand; otherwise Elasticsearch generates the mapping automatically)

PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday" :{
        "type": "date"
      }
    }
  }
}

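View the index details

GET test2

View the default mapping (index a document without declaring a mapping, then inspect the index)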

PUT /test3/_doc/1
{
  "name": "柠檬茶",
  "age": 12,
  "birthday": "2020-11-11"
}

GET test3

Extras

GET _cat/indices?v      (lists information about every index)

Updating

POST /test3/_doc/1/_update
{
  "doc":{
    "name": "法外狂徒张三"              //replaces the previous value 柠檬茶
  }
}

Deleting

DELETE /test3/_doc/1    (deletes the document)
DELETE /test3           (deletes the index)

Document Operations

Adding Data

PUT /mine/user/1
{
  "name": "Kim",
  "age": 20,
  "hobbit": ["篮球","技术"]
}

PUT /mine/user/3
{
  "name": "李四",
  "age": 33,
  "hobbit": ["战斗","飞行"]
}

Retrieving Data (GET)

GET mine/user/3

{
  "_index" : "mine",
  "_type" : "user",
  "_id" : "3",
  "_version" : 1,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四",
    "age" : 33,
    "hobbit" : [
      "战斗",
      "飞行"
    ]
  }
}

Updating Data: PUT / POST (recommended)

PUT /mine/user/3
{
  "name": "李四233",
  "age": 33,
  "hobbit": ["战斗","飞行"]
}

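POST /mine/user/3/_update
{
  "doc":{
    "name": "我是Kim，不是李四"
  }
}

Difference: PUT replaces the whole document, so any field left out of the request body is lost. POST with /_update changes only the fields listed under "doc". Without the /_update suffix, POST behaves like PUT.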
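
The difference can be modelled with two plain maps. This is a sketch of the semantics only, not of how Elasticsearch is implemented: the class and method names are invented for the example, and the merge here is shallow, whereas a real partial update also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PutVersusUpdate {
    // PUT-style write: the stored document becomes exactly the request body.
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // _update-style write: fields present in "doc" are overwritten, the rest are kept.
    static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Li Si");
        stored.put("age", 33);

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("name", "Kim");

        System.out.println(put(stored, change));    // {name=Kim}
        System.out.println(update(stored, change)); // {name=Kim, age=33}
    }
}
```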

Simple Lookup

GET mine/user/3

Simple Conditional Search

GET mine/user/_search?q=name:Kim

Complex Queries

Matching a Field (match)

GET mine/user/_search
{
  "query": {
    "match": {		
      "name": "Kim"
    }
  }
}

Query result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {                        //the matched documents
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.93710405,       //highest relevance score
    "hits" : [
      {
        "_index" : "mine",
        "_type" : "user",
        "_id" : "1",
        "_score" : 0.93710405,
        "_source" : {               //no _source filter was given, so every field is returned
          "name" : "Kim",
          "age" : 20,
          "hobbit" : [
            "篮球",
            "技术"
          ]
        }
      },
      {
        "_index" : "mine",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.42466223,
        "_source" : {
          "name" : "我是Kim，不是李四",
          "age" : 33,
          "hobbit" : [
            "战斗",
            "飞行"
          ]
        }
      }
    ]
  }
}

Filtering the Returned Fields

GET mine/user/_search
{
  "query": {
    "match": {
      "name": "Kim"
    }
  },
  "_source": ["name","age"]         //fields to return
}

Sorting

GET mine/user/_search
{
  "query": {
    "match": {
      "name": "Kim"
    }
  },
  "sort": [
    {
      "age": {                      //sort by the age field
        "order": "desc"             //desc is descending; asc is ascending
      }
    }
  ]
}

Pagination

GET mine/user/_search
{
  "query": {
    "match": {
      "name": "Kim"
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ],
  "from": 0,                        //offset of the first hit to return (0-based)
  "size": 1                         //number of hits to return (page size)
}
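
Because from is an offset counted in hits rather than a page number, a 1-based page number has to be converted before it is sent. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration.

```java
public class PageOffset {
    // Converts a 1-based page number into the 0-based "from" offset.
    // Page numbers below 1 are treated as page 1.
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 10)); // 0: the first page starts at the first hit
        System.out.println(from(3, 10)); // 20: the third page skips two full pages
    }
}
```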

Boolean Queries

GET mine/user/_search
{
  "query": {
    "bool": {
      "must": [                         //must behaves like AND: every clause has to match
        {                               //should behaves like OR: matching any one clause is enough
          "match": {
            "name": "Kim"
          }
        },                              //several conditions combined
        {
          "match": {
            "age": 20
          }
        }
      ]
    }
  }
}

GET mine/user/_search
{
  "query": {
    "bool": {
      "must_not": [                     //must_not: excludes documents that match the clause
        {
          "match": {
            "name": "Kim"		
          }
        }
      ]
    }
  }
}

GET mine/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Kim"
          }
        }
      ],
      "filter": {                   //filter
        "range": {                  //range filter
          "age": {                  //filter on the age field
            "gt": 10,               //greater than 10
            "lte": 20               //less than or equal to 20
          }
        }
      }
    }
  }
}
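
The way must, must_not and filter combine can be sketched with plain Java predicates over an in-memory list. This models only which documents are selected, not scoring (in Elasticsearch, filter clauses do not contribute to the relevance score). The User class, the substring test standing in for match, and the sample data are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BoolQuerySketch {
    static class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    // Keeps the users that satisfy every "must" clause, every "filter" clause,
    // and none of the "must_not" clauses; returns their names.
    static List<String> search(List<User> users,
                               List<Predicate<User>> must,
                               List<Predicate<User>> mustNot,
                               List<Predicate<User>> filter) {
        return users.stream()
                .filter(u -> must.stream().allMatch(p -> p.test(u)))
                .filter(u -> filter.stream().allMatch(p -> p.test(u)))
                .filter(u -> mustNot.stream().noneMatch(p -> p.test(u)))
                .map(u -> u.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("Kim", 20), new User("Kim Tou", 33), new User("Li", 15));
        Predicate<User> nameHasKim = u -> u.name.contains("Kim");   // stands in for match
        Predicate<User> ageRange = u -> u.age > 10 && u.age <= 20;  // gt 10, lte 20

        List<Predicate<User>> none = Collections.<Predicate<User>>emptyList();
        // must: name matches Kim; filter: 10 < age <= 20
        System.out.println(search(users,
                Collections.singletonList(nameHasKim), none,
                Collections.singletonList(ageRange))); // [Kim]
    }
}
```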

Matching Multiple Terms

GET mine/user/_search
{
  "query": {
    "match": {
      "hobbit": "技术 音"                  //separate multiple terms with spaces
    }                                     //a document is returned if it matches any one of them
  }
}

Two Field Types: text and keyword

Create the index (name is mapped as text, desc as keyword)

PUT testdb
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "desc":{
        "type": "keyword"
      }
    }
  }
}

PUT /testdb/_doc/1
{
  "name": "狂神说java name",
  "desc": "狂神说java desc"
}

PUT /testdb/_doc/2
{
  "name": "狂神说java name",
  "desc": "狂神说java desc2"
}

Test

GET _analyze
{
  "analyzer": "keyword",
  "text": "狂神说java name"
}

=============== Result ====================
{
  "tokens" : [
    {
      "token" : "狂神说java name",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "word",
      "position" : 0
    }
  ]
}

GET _analyze
{
  "analyzer": "standard",
  "text": "狂神说java name"
}

=============== Result ====================
{
  "tokens" : [
    {
      "token" : "狂",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "神",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "说",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "name",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

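Summary: the keyword analyzer does not split the text and keeps it as a single token, while the default standard analyzer breaks it into individual tokens.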

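Exact Queries (term)

A term query operates on the exact terms stored in the inverted index.

term means an exact match: the search value is not passed through an analyzer, so the index must contain that whole value as a single term.
The difference between match and term is that Elasticsearch analyzes the search text for a match query but not for a term query.

A match query is therefore the looser of the two: a document qualifies as soon as it contains any one of the tokens produced from the search text.

Test 1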

GET testdb/_search
{
  "query": {
    "term": {
      "name": {
        "value": "狂"
      }
    }
  }
}

=============== Result ====================
{
  "took" : 321,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.45665967,
    "hits" : [
      {
        "_index" : "testdb",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.45665967,
        "_source" : {
          "name" : "狂神说java name",                     //any document whose name contains the token 狂 matches
          "desc" : "狂神说java desc"
        }
      },
      {
        "_index" : "testdb",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.45665967,
        "_source" : {
          "name" : "狂神说java name",
          "desc" : "狂神说java desc2"
        }
      }
    ]
  }
}

Test 2

GET testdb/_search
{
  "query": {
    "term": {
      "desc": {
        "value": "狂神说java desc"
      }
    }
  }
}

=============== Result ====================
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "testdb",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808291,
        "_source" : {
          "name" : "狂神说java name",
          "desc" : "狂神说java desc"                      //only one hit this time
        }
      }
    ]
  }
}

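Why: name is a text field, so its value is analyzed into tokens at index time and the single token 狂 is found in both documents. desc is a keyword field, stored as one unanalyzed term, so only the complete value matches.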
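
The behaviour of the two tests can be reproduced with a toy inverted index. This is a sketch under loud assumptions: the analyze method is a crude stand-in for the standard analyzer (lowercase, split on whitespace, one token per non-ASCII character), and the class TermVersusMatch with all of its methods is invented for the example rather than taken from any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TermVersusMatch {
    // term -> ids of the documents that contain it (one instance models one field)
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Crude analyzer: lowercases, splits on whitespace, and emits each
    // non-ASCII character as its own token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toLowerCase().toCharArray()) {
            if (c < 128 && !Character.isWhitespace(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) { tokens.add(word.toString()); word.setLength(0); }
            if (!Character.isWhitespace(c)) tokens.add(String.valueOf(c));
        }
        if (word.length() > 0) tokens.add(word.toString());
        return tokens;
    }

    // text field: analyze the value and index every token
    void indexText(int id, String value) {
        for (String t : analyze(value)) index.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
    }

    // keyword field: index the whole value as a single term
    void indexKeyword(int id, String value) {
        index.computeIfAbsent(value, k -> new TreeSet<>()).add(id);
    }

    // term query: look the value up exactly as given, no analysis
    Set<Integer> term(String value) {
        return index.getOrDefault(value, Collections.<Integer>emptySet());
    }

    // match query: analyze the query text, return documents containing any token
    Set<Integer> match(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String t : analyze(query)) hits.addAll(term(t));
        return hits;
    }

    public static void main(String[] args) {
        TermVersusMatch name = new TermVersusMatch(); // plays the text field
        TermVersusMatch desc = new TermVersusMatch(); // plays the keyword field
        name.indexText(1, "狂神说java name");
        name.indexText(2, "狂神说java name");
        desc.indexKeyword(1, "狂神说java desc");
        desc.indexKeyword(2, "狂神说java desc2");

        System.out.println(name.term("狂"));              // [1, 2]
        System.out.println(desc.term("狂"));              // []
        System.out.println(desc.term("狂神说java desc")); // [1]
    }
}
```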

Exact Queries on Multiple Values

GET testdb/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "t1": {
              "value": "22"
            }
          }
        },
        {
          "term": {
            "t1": "33"
          }
        }
      ]
    }
  }
}

Highlighting

Highlight Kim wherever it matches in the name field

GET mine/user/_search
{
  "query": {
    "match": {
      "name": "Kim"
    }
  },
  "highlight": {
    "fields": {
      "name":{}
    }
  }
}

Result

"highlight" : {
          "name" : [
            "<em>Kim</em>"
          ]
        }

By default the match is wrapped in <em> tags. You can also supply your own tags:

GET mine/user/_search
{
  "query": {
    "match": {
      "name": "Kim"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",    //opening tag
    "post_tags": "</p>",                                 //closing tag
    "fields": {
      "name":{}
    }
  }
}

Result

"highlight" : {
  "name" : [
    "<p class='key' style='color:red'>Kim</p>"
  ]
}
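
What the pre and post tags do to the text can be imitated in a few lines of plain Java. This is only a sketch of the visible effect: it wraps literal, case-insensitive substring matches, whereas Elasticsearch highlights the analyzed tokens that matched the query. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {
    // Wraps every case-insensitive occurrence of term in preTag/postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String wrapped = preTag + matcher.group() + postTag;
            // quoteReplacement keeps '$' and '\' in the tags from being interpreted
            matcher.appendReplacement(out, Matcher.quoteReplacement(wrapped));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Kim", "Kim", "<em>", "</em>"));
        System.out.println(highlight("Kim", "Kim", "<p class='key' style='color:red'>", "</p>"));
    }
}
```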

Integrating with Spring Boot

Maven dependency for the native client

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>

Initialization

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

Closing the client

client.close();

Using the Project Template

Create a Spring Boot project and select the Elasticsearch module under NoSQL.

The Elasticsearch version that the generated Spring Boot project manages does not match the locally installed Elasticsearch, so override the version property yourself to keep the two identical:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.5.RELEASE</version>
    <relativePath/>
</parent>
<groupId>com.Kim</groupId>
<artifactId>es-api</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>es-api</name>
<description>Demo project for Spring Boot</description>
<properties>
    <java.version>1.8</java.version>
    <!-- override the managed Elasticsearch version to match the local installation -->
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>

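Create ElasticSearchClientConfig.java

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    //exposes the client as a Spring bean named restHighLevelClient
    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
        return client;
    }
}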

API Tests
Index Operations

Inject the client into the test class:

@Autowired
private RestHighLevelClient restHighLevelClient;

or, to inject the same bean by name under a shorter field name:

@Autowired
@Qualifier("restHighLevelClient")
private RestHighLevelClient client;

Creating an index (equivalent to PUT kim_index)

/**
 * Test creating an index.
 */
@Test
void testCreateIndex() throws IOException {
    //1. build the create-index request
    CreateIndexRequest request = new CreateIndexRequest("kim_index");
    //2. have the client execute the request and return the response
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response);
}

Checking whether an index exists

/**
 * Test looking up an index (this call only reports whether it exists).
 */
@Test
void testExistIndex() throws IOException {
    //build the get-index request
    GetIndexRequest request = new GetIndexRequest("kim_index2");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

Deleting an index

/**
 * Test deleting an index.
 */
@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("kim_index");
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    //true when the deletion was acknowledged
    System.out.println(delete.isAcknowledged());
}

Document Operations

Adding a document

/**
 * Test adding a document.
 */
@Test
void testAddDocument() throws IOException {
    //create the object to store
    User user = new User("Kim", 20);
    //create the request
    IndexRequest request = new IndexRequest("kim_index");
    //equivalent to PUT /kim_index/_doc/1
    request.id("1");
    //set the timeout (either of the two forms works)
    request.timeout(TimeValue.timeValueSeconds(1));
    request.timeout("1s");
    //put the data into the request as JSON
    request.source(JSON.toJSONString(user), XContentType.JSON);
    //send the request and read the response
    IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
    System.out.println(indexResponse.toString());
    System.out.println(indexResponse.status());
}

Checking whether a document exists

/**
 * Check whether a document exists: GET index/_doc/1
 */
@Test
void testIsExists() throws IOException {
    GetRequest getRequest = new GetRequest("kim_index", "1");
    //do not fetch _source; optional, but cheaper for a pure existence check
    getRequest.fetchSourceContext(new FetchSourceContext(false));
    boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
    System.out.println(exists);
}

Retrieving a document

/**
 * Retrieve a document.
 */
@Test
void testGetDocument() throws IOException {
    GetRequest getRequest = new GetRequest("kim_index", "1");
    GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
    //the document body as a JSON string
    System.out.println(getResponse.getSourceAsString());
    //the full response, metadata included
    System.out.println(getResponse);
}

Updating a document

/**
 * Update a document.
 */
@Test
void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("kim_index", "1");
    updateRequest.timeout("1s");
    User user = new User("KimTou", 21);
    //the fields to change, sent as the "doc" part of the update
    updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
    UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(updateResponse.status());
}

Deleting a document

/**
 * Delete a document.
 */
@Test
void testDeleteDocument() throws IOException {
    DeleteRequest deleteRequest = new DeleteRequest("kim_index", "1");
    deleteRequest.timeout("1s");
    DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(deleteResponse.status());
}

Bulk import

/**
 * Insert documents in bulk.
 */
@Test
void testBulkRequest() throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");
    ArrayList<User> userList = new ArrayList<>();
    userList.add(new User("Kim1",20));
    userList.add(new User("Kim2",20));
    userList.add(new User("Kim3",20));
    userList.add(new User("KimTou1",20));
    userList.add(new User("KimTou2",20));
    userList.add(new User("KimTou3",20));
    //add one index request per user; update and delete requests are added the same way
    for (int i = 0; i < userList.size(); i++) {
        bulkRequest.add(
                new IndexRequest("kim_index")
                        .id(""+(i+1))
                        .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
    }
    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    //false means every item succeeded
    System.out.println(bulkResponse.hasFailures());
}

Searching

/**
 * Search.
 */
@Test
void testSearch() throws IOException {
    //the search request
    SearchRequest searchRequest = new SearchRequest();
    //builds the search body
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    //query conditions can be built with the QueryBuilders helper class:
    //QueryBuilders.termQuery      exact (term) query
    //QueryBuilders.matchAllQuery  matches everything
    //Note: a term query is not analyzed. If name is a text field indexed with the
    //standard analyzer, the stored token is lowercase, so this mixed-case value may
    //return no hits; with the default dynamic mapping, name.keyword is one alternative.
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "KimTou1");
    //MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
    sourceBuilder.query(termQueryBuilder);
    //60-second timeout
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSONString(searchResponse.getHits()));
    System.out.println("====================================");
    for (SearchHit documentFields : searchResponse.getHits().getHits()) {
        System.out.println(documentFields.getSourceAsMap());
    }
}

Project Walkthrough
Project Setup

Modules to select: the three under Developer Tools, plus Web, Thymeleaf, and Elasticsearch

Set the correct versions in pom.xml

<!-- Spring Boot 2.2.5 -->
<version>2.2.5.RELEASE</version>

<!-- Elasticsearch 7.6.1 -->
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>

Add the fastjson dependency to pom.xml

Edit the application.properties configuration file

server.port=9090
# disable the Thymeleaf cache
spring.thymeleaf.cache=false

Import the front-end assets

Serving the home page

@GetMapping({"/","/index"})
public String index(){
    return "index";
}

Crawler
Add the jsoup dependency (for parsing web pages)

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

Test

public static void main(String[] args) throws IOException {
    //the page to request
    String url = "https://search.jd.com/Search?keyword=java";
    //parse the page; the Document that jsoup returns mirrors the browser's DOM Document
    Document document = Jsoup.parse(new URL(url), 30000);
    //the DOM-style lookups familiar from JavaScript are available here
    Element element = document.getElementById("J_goodsList");
    //get all li elements
    Elements elements = element.getElementsByTag("li");
    //read the content of each element; el is one li tag
    for (Element el : elements) {
        /* JD lazy-loads its images, so src is not the attribute to read:
        String img = el.getElementsByTag("img").eq(0).attr("src");*/
        String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
        String price = el.getElementsByClass("p-price").eq(0).text();
        String title = el.getElementsByClass("p-name").eq(0).text();
        System.out.println("===============================================");
        System.out.println(img);
        System.out.println(price);
        System.out.println(title);
    }
}

Wrapping it in a utility class

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

@Component
public class HtmlParseUtil {
    //quick manual test
    //public static void main(String[] args) throws IOException {
    //    new HtmlParseUtil().parseJD("心理学").forEach(System.out::println);
    //}

    /**
     * Scrapes the JD search results for a keyword into a list of Content objects.
     */
    public List<Content> parseJD(String keyword) throws IOException {
        String url = "https://search.jd.com/Search?keyword="+keyword;
        Document document = Jsoup.parse(new URL(url), 30000);
        Element element = document.getElementById("J_goodsList");
        //get all li elements
        Elements elements = element.getElementsByTag("li");
        ArrayList<Content> goodsList = new ArrayList<>();
        //read the content of each element; el is one li tag
        for (Element el : elements) {
            /* JD lazy-loads its images, so src is not the attribute to read:
            String img = el.getElementsByTag("img").eq(0).attr("src");*/
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            Content content = new Content();
            content.setTitle(title);
            content.setPrice(price);
            content.setImg(img);
            goodsList.add(content);
        }
        return goodsList;
    }
}
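
The utility concatenates the raw keyword into the query string. As a possible hardening step (not something the original code does), the keyword could be percent-encoded first so that spaces, ampersands and non-ASCII characters survive intact. The sketch below uses only the JDK; the class SearchUrlBuilder and its method are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlBuilder {
    // Builds the search URL with the keyword encoded as UTF-8 form data.
    static String searchUrl(String keyword) {
        try {
            return "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("java"));
        System.out.println(searchUrl("spring boot")); // the space is encoded as '+'
    }
}
```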

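Business Logic

Parse the scraped data and store it in an Elasticsearch index

Read that data back to implement search

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

//Content and HtmlParseUtil are this project's own classes; import them from your package

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    /**
     * 1. Parse the scraped data and store it in an Elasticsearch index.
     */
    public Boolean parseContent(String keyword) throws IOException {
        List<Content> contents = new HtmlParseUtil().parseJD(keyword);
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");
        for (Content content : contents) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                            .source(JSON.toJSONString(content), XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    /**
     * 2. Read the data back to implement search.
     */
    public List<Map<String,Object>> searchPage(String keyword,int pageNo,int pageSize) throws IOException {
        if(pageNo<=1){
            pageNo = 1;
        }

        //conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        //pagination: from is a 0-based hit offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize);
        sourceBuilder.size(pageSize);

        //exact (term) query
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        //run the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        ArrayList<Map<String,Object>> list = new ArrayList<>();
        //collect the hits
        for (SearchHit document : searchResponse.getHits().getHits()) {
            list.add(document.getSourceAsMap());
        }
        return list;
    }

}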

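Controller Layer

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    //scrape the results for a keyword and index them
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    //search the indexed data, one page at a time
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String,Object>> search(@PathVariable("keyword") String keyword,
                                           @PathVariable("pageNo") int pageNo,
                                           @PathVariable("pageSize") int pageSize) throws IOException {
        return contentService.searchPage(keyword, pageNo, pageSize);
    }

}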

Front-End / Back-End Integration

Install Vue and axios (axios handles the Ajax calls)

C:\Users\MI\Desktop\vue>npm install vue
......

C:\Users\MI\Desktop\vue>npm install axios
......

Copy the JS files of both packages into the project.

index.html

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">

<head>
<meta charset="utf-8"/>
<title>狂神说Java-ES仿京东实战</title>
<link rel="stylesheet" th:href="@{/css/style.css}"/>
</head>

<body class="pg">
<div class="page" id="app">
<div id="mallPage" class=" mallist tmall- page-not-market ">


<div id="header" class=" header-list-app">
<div class="headerLayout">
<div class="headerCon ">

<h1 id="mallLogo">
<img th:src="@{/images/jdlogo.png}" alt="">
</h1>

<div class="header-extra">


<div id="mallSearch" class="mall-search">
<form name="searchTop" class="mallSearch-form clearfix">
<fieldset>
<legend>天猫搜索</legend>
<div class="mallSearch-input clearfix">
<div class="s-combobox" id="s-combobox-685">
<div class="s-combobox-input-wrap">
<input v-model="keyword" type="text" autocomplete="off" value="dd" id="mq"
class="s-combobox-input" aria-haspopup="true">
</div>
</div>
<button type="submit" @click.prevent="searchKey" id="searchbtn">搜索</button>
</div>
</fieldset>
</form>
<ul class="relKeyTop">
<li><a>狂神说Java</a></li>
<li><a>狂神说前端</a></li>
<li><a>狂神说Linux</a></li>
<li><a>狂神说大数据</a></li>
<li><a>狂神聊理财</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>


<div id="content">
<div class="main">

<form class="navAttrsForm">
<div class="attrs j_NavAttrs" style="display:block">
<div class="brandAttr j_nav_brand">
<div class="j_Brand attr">
<div class="attrKey">
品牌
</div>
<div class="attrValues">
<ul class="av-collapse row-2">
<li><a href="#"> 狂神说 </a></li>
<li><a href="#"> Java </a></li>
</ul>
</div>
</div>
</div>
</div>
</form>


<div class="filter clearfix">
<a class="fSort fSort-cur">综合<i class="f-ico-arrow-d"></i></a>
<a class="fSort">人气<i class="f-ico-arrow-d"></i></a>
<a class="fSort">新品<i class="f-ico-arrow-d"></i></a>
<a class="fSort">销量<i class="f-ico-arrow-d"></i></a>
<a class="fSort">价格<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
</div>


<div class="view grid-nosku">

<div class="product" v-for="result in results">
<div class="product-iWrap">

<div class="productImg-wrap">
<a class="productImg">
<img :src="result.img">
</a>
</div>

<p class="productPrice">
<em>{{result.price}}</em>
</p>

<p class="productTitle">
<a v-html="result.title"> </a>
</p>

<div class="productShop">
<span>店铺：狂神说Java </span>
</div>

<p class="productStatus">
<span>月成交<em>999笔</em></span>
<span>评价 <a>3</a></span>
</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>



<script type="text/javascript" th:src="@{/js/axios.min.js}"></script>
<script type="text/javascript" th:src="@{/js/vue.min.js}"></script>
<script>

    new Vue({
        el: '#app',
        data:{
            keyword:'',     //search keyword
            results:[]      //search results
        },
        methods:{
            searchKey(){
                var keyword = this.keyword;
                console.log(keyword);
                //call the back-end endpoint
                axios.get('search/'+keyword+"/1/10").then(response=>{
                    console.log(response);
                    this.results = response.data;   //bind the data
                })
            }
        }
    })

</script>

</body>
</html>

Keyword Highlighting

/**
 * 3. Read the data back and highlight the keyword in the results.
 */
public List<Map<String,Object>> searchPageHighlightBuilder(String keyword,int pageNo,int pageSize) throws IOException {
    if(pageNo<=1){
        pageNo = 1;
    }

    //conditional search
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    //pagination: from is a 0-based hit offset, so convert the 1-based page number
    sourceBuilder.from((pageNo - 1) * pageSize);
    sourceBuilder.size(pageSize);

    //exact (term) query
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
    sourceBuilder.query(termQueryBuilder);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

    //highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    //do not require the query to have matched the field being highlighted
    highlightBuilder.requireFieldMatch(false);
    highlightBuilder.preTags("<span style='color:red'>");
    highlightBuilder.postTags("</span>");
    sourceBuilder.highlighter(highlightBuilder);

    //run the search
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

    ArrayList<Map<String,Object>> list = new ArrayList<>();
    //process the hits
    for (SearchHit document : searchResponse.getHits().getHits()) {

        Map<String, HighlightField> highlightFields = document.getHighlightFields();
        HighlightField title = highlightFields.get("title");
        //the original document
        Map<String, Object> sourceAsMap = document.getSourceAsMap();
        //if the title was highlighted, swap the highlighted text in for the original
        if(title!=null){
            Text[] fragments = title.fragments();
            StringBuilder newTitle = new StringBuilder();
            for (Text text : fragments) {
                newTitle.append(text);
            }
            //replace the field with its highlighted version
            sourceAsMap.put("title", newTitle.toString());
        }
        list.add(sourceAsMap);
    }
    return list;
}

Reference:

https://www.bilibili.com/video/BV17a4y1x7zq
