如何使用Java语言实现一个网页爬虫

try { // 生成一个URL对象,要获取源代码的网页地址为:http://www.sina.com.cn url = new URL("http://www.jb51.net/article/97787.htm");// 打开URL urlConnection = (HttpURLConnection) url.openConnection();// 获取服务器响应代码 responsecode = urlConnection.getResponseCode();Stri...
如何使用Java语言实现一个网页爬虫
我给你代码
public class DEmo {
public static void match(String s1) {
Pattern p = Pattern.compile("<a(.*)>.*</a>");
Matcher m = p.matcher(s1);
while (m.find()) {
System.out.println(m.group(1));
}
}

public static void main(String args[]) {
URL url;
int responsecode;
HttpURLConnection urlConnection;
BufferedReader reader;
String line;
try {
// 生成一个URL对象,要获取源代码的网页地址为:http://www.sina.com.cn
url = new URL("http://www.jb51.net/article/97787.htm");
// 打开URL
urlConnection = (HttpURLConnection) url.openConnection();
// 获取服务器响应代码
responsecode = urlConnection.getResponseCode();
String temp = "";
if (responsecode == 200) {
// 得到输入流,即获得了网页的内容
reader = new BufferedReader(new InputStreamReader(
urlConnection.getInputStream(), "GBK"));
while ((line = reader.readLine()) != null) {
temp = temp + line;
}
System.out.println(temp);
match(temp);

} else {
System.out.println("获取不到网页的源码,服务器响应代码为:" + responsecode);
}
} catch (Exception e) {
System.out.println("获取不到网页的源码,出现异常:" + e);
}

}
}2016-12-02
mengvlog 阅读 49 次 更新于 2025-10-28 22:51:26 我来答关注问题0
檬味博客在线解答立即免费咨询

Java相关话题

Copyright © 2023 WWW.MENGVLOG.COM - 檬味博客
返回顶部