A Simple Web Crawler in Java

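In your spare time after work, you might want to put that time to use, for example by crawling website pages. Today I'll share a simple way to fetch a website's page.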
Tools/Materials

Eclipse

Method/Steps
1

First, create a Maven project: find Maven Project, then click Next.

2

Check "Create a simple project (skip archetype selection)", then click Next.

3

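Enter a domain-style name in the Group Id field and the project name in the Artifact Id field (this example uses com.baidu and PaChong1), then click Finish. The Maven project is now created.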

4

Next, create the package for the project (com.baidu in this example).

5

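Set up pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.baidu</groupId>
  <artifactId>PaChong1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
      <version>1.1.3</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.1.3</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <version>1.16.20</version>
    </dependency>
  </dependencies>
</project>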

6

Once that is done, the overall project skeleton is in place. Next, create a class in the com.baidu package.

7

Enter a class name. This is a matter of personal preference (this example uses HttpGetUtils).

8

Write the implementation code in the class:

package com.baidu;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class HttpGetUtils {

    public static void main(String[] args) {
        // Other pages to try: https://v.qq.com/  http://www.youku.com/
        String str = get("http://m.sunlands.com");
        System.out.println(str);
    }

    private static String get(String url) {
        String result = "";
        try {
            // Create the HttpClient instance
            CloseableHttpClient httpclient = HttpClients.createDefault();
            try {
                // Build the GET request and execute it to obtain the response
                HttpGet httpGet = new HttpGet(url);
                CloseableHttpResponse response = httpclient.execute(httpGet);
                try {
                    // Parse the body only if the request returned 200 OK
                    if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                        System.out.println(response.getStatusLine());
                        HttpEntity entity = response.getEntity();
                        // Read the result from the response input stream
                        result = readResponse(entity, "utf-8");
                    }
                } finally {
                    // Close the response before the client that produced it
                    response.close();
                }
            } finally {
                httpclient.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    private static String readResponse(HttpEntity resEntity, String charset) {
        // No body to read: return an empty string, matching get()'s default
        if (resEntity == null) {
            return "";
        }
        StringBuilder res = new StringBuilder();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new InputStreamReader(
                    resEntity.getContent(), charset));
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add it back
                res.append(line).append('\n');
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (reader != null) {
                    reader.close();
                }
            } catch (IOException e) {
                // Nothing useful can be done if closing fails
            }
        }
        return res.toString();
    }
}
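
The code in this step always decodes the page as utf-8, which can garble pages served in another encoding. As a rough sketch only (the class name CharsetSniffer and its method are hypothetical, not part of the article's code, and Java 7 or later is assumed), the charset could be taken from a Content-Type header value, falling back to UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the article's HttpGetUtils class.
class CharsetSniffer {

    /** Returns the charset named in a Content-Type value, or UTF-8 as a fallback. */
    static Charset fromContentType(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        // Parameters follow the media type, separated by ';'
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: use the default
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-1"));
        System.out.println(fromContentType("text/html"));
    }
}
```

In the article's code, the header value would presumably come from entity.getContentType(), which can be null when the server sends no Content-Type, so the null check above matters.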

9

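Run the program. If the console shows the status line (for example HTTP/1.1 200 OK) followed by the page's HTML, the fetch succeeded.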
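
Beyond reading the console by eye, one quick sanity check is to pull the page title out of the returned string. The sketch below is only an illustration: TitleCheck and extractTitle are hypothetical names, and a regular expression is adequate for a smoke test like this but is not a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for a quick sanity check; not part of the article's code.
class TitleCheck {

    // DOTALL lets the title span several lines; the match is case-insensitive
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed title text, or null if the string has no title element. */
    static String extractTitle(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sample = "<html><head><title> Example page </title></head><body></body></html>";
        System.out.println(extractTitle(sample));
    }
}
```

Passing the string returned by get(...) to extractTitle and getting a non-null value back is a reasonable hint that a real HTML page was fetched, whereas null suggests an empty or non-HTML response.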

Notes

Make sure the JDK and Maven environments are configured correctly.
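
The usual command-line checks are java -version and mvn -v. As a rough alternative that can be run from inside Eclipse, the sketch below prints which Java runtime is in use and whether a Maven launcher is on the PATH. EnvCheck and findOnPath are hypothetical names, and the launcher file names (mvn, mvn.cmd, mvn.bat) are assumptions about a typical installation.

```java
import java.io.File;

// Hypothetical diagnostic class; the name EnvCheck is an assumption.
class EnvCheck {

    /** Returns the first regular file with one of the given names in a PATH-style string, or null. */
    static File findOnPath(String pathValue, String... names) {
        if (pathValue == null) {
            return null;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : names) {
                File candidate = new File(dir, name);
                if (candidate.isFile()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
        File mvn = findOnPath(System.getenv("PATH"), "mvn", "mvn.cmd", "mvn.bat");
        System.out.println("mvn on PATH  = " + (mvn == null ? "not found" : mvn.getPath()));
    }
}
```

Eclipse's Maven integration may use its own embedded Maven, so "not found" here does not necessarily mean the project will fail to build inside the IDE.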
