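I: Commonly used libraries
- HttpClient
- Jsoup (typically used to parse the returned HTML pages)
II: Commonly used frameworks
III: General workflow of a crawler
IV: Using HttpClient
1: Dependency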
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>
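2: GET request without parameters
// Needs imports from java.io, org.apache.http.client.methods, org.apache.http.impl.client and org.apache.http.util
public static void get() throws IOException {
    // try-with-resources closes the client and the response, releasing the connection
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        HttpGet httpGet = new HttpGet("http://www.ming3.top/");
        try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Print the response summary (status line and headers)
            System.out.println(response);
            // Read the response body as a UTF-8 string
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    }
}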
3: POST request with parameters
// Needs imports from java.io, java.util, org.apache.http, org.apache.http.client.entity,
// org.apache.http.client.methods, org.apache.http.impl.client, org.apache.http.message and org.apache.http.util
public static void post() throws IOException {
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        HttpPost httpPost = new HttpPost("http://www.ming3.top/wp-login.php");
        // Form fields sent in the request body
        List<NameValuePair> parameters = new ArrayList<>();
        parameters.add(new BasicNameValuePair("log", "eighteen"));
        parameters.add(new BasicNameValuePair("pwd", "233333338@qq.com"));
        // Encode the form explicitly as UTF-8 instead of relying on the default charset
        httpPost.setEntity(new UrlEncodedFormEntity(parameters, "UTF-8"));
        // Send a browser-like User-Agent header
        httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            System.out.println(response);
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    }
}
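4: GET requests with parameters and POST requests without parameters work in the same way, so they are not covered in detail here.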
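As a brief illustration of the GET-with-parameters case, the sketch below builds the query string with JDK classes only, so it compiles without HttpClient on the classpath. The class name `QueryUrlExample`, the helper `buildUrl`, the `example.com` URL, and the parameter names are all hypothetical. The resulting string could then be passed to `new HttpGet(...)` in the same way as in the GET example above; HttpClient's own `URIBuilder` is another option.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryUrlExample {

    // Hypothetical helper: appends URL-encoded query parameters to a base URL
    // that does not already contain a query string.
    public static String buildUrl(String baseUrl, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(entry.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("s", "java crawler");
        params.put("page", "2");
        // Prints http://example.com/?s=java+crawler&page=2
        System.out.println(buildUrl("http://example.com/", params));
    }
}
```

`URLEncoder` applies form-style encoding, so a space becomes `+`; that is normally acceptable inside a query string.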
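5: After logging in with a POST request, the server often responds with a redirect. In the case shown, the status code returned is 302, so the client has to follow the redirect and keep the cookies that the server sets.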
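One way to handle this with HttpClient 4.5 is sketched below. A `BasicCookieStore` keeps the session cookies, and `LaxRedirectStrategy` lets the client follow a 302 that answers a POST (the default strategy only follows redirects automatically for GET and HEAD). The class name `LoginSession`, the helper `newSessionClient`, and the placeholder credentials are assumptions for illustration; the login URL and the field names `log` and `pwd` come from the POST example above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class LoginSession {

    // Hypothetical helper: a client that stores cookies in the given store
    // and also follows redirects issued in response to POST requests.
    public static CloseableHttpClient newSessionClient(CookieStore cookieStore) {
        return HttpClients.custom()
                .setDefaultCookieStore(cookieStore)
                .setRedirectStrategy(new LaxRedirectStrategy())
                .build();
    }

    public static void main(String[] args) throws IOException {
        CookieStore cookieStore = new BasicCookieStore();
        try (CloseableHttpClient client = newSessionClient(cookieStore)) {
            HttpPost login = new HttpPost("http://www.ming3.top/wp-login.php");
            List<NameValuePair> form = new ArrayList<>();
            form.add(new BasicNameValuePair("log", "your-username")); // placeholder
            form.add(new BasicNameValuePair("pwd", "your-password")); // placeholder
            login.setEntity(new UrlEncodedFormEntity(form, "UTF-8"));

            try (CloseableHttpResponse response = client.execute(login)) {
                // Status of the final response, after any redirects were followed
                System.out.println(response.getStatusLine());
                EntityUtils.consume(response.getEntity());
            }
            // Cookies set during login; later requests on this client send them back
            for (Cookie cookie : cookieStore.getCookies()) {
                System.out.println(cookie.getName() + "=" + cookie.getValue());
            }
        }
    }
}
```

Reusing the same client (or the same cookie store) for later requests is what keeps the logged-in session alive. With this strategy, the request that follows a 302 is sent as a GET to the address in the `Location` header.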
P.S.:
The article "HttpClient简易使用" (a simple guide to using HttpClient) is very well written.