


Topic: A method for collecting search engine results, suitable for metasearch engines

Posted by admin on 2009/5/6 19:07:08

1 The simplest way to download a web page
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class Exec {
        public static void main(String args[]) {
                try {
                        URL url = new URL("http://www.baidu.com");
                        System.out.println(url.getFile());

                        // try-with-resources closes both streams even if the copy fails
                        try (InputStream is = url.openStream();
                                        FileOutputStream fos = new FileOutputStream("storedPage.html")) {
                                // read() returns -1 at end of stream; 0 is a valid data byte
                                int i = is.read();
                                while (i != -1) {
                                        fos.write(i);
                                        i = is.read();
                                }
                        }
                } catch (IOException e) {
                        // Report the failure instead of silently ignoring it
                        e.printStackTrace();
                }
        }
}
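
The listing above copies the page one byte at a time. As a minimal sketch, the same copy can be done through a byte buffer, which usually needs far fewer read calls. The class name `StreamCopySketch`, the helper `copy`, and the 8192-byte buffer size are illustrative assumptions, not part of the original post. The sketch also relies on the fact that `read` signals the end of the stream with -1, while 0 is an ordinary data byte, so the loop condition has to compare against -1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class for illustration; not part of the original post.
class StreamCopySketch {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes read, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a downloaded page; note the embedded zero byte
        byte[] page = {'<', 'p', '>', 0, '<', '/', 'p', '>'};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(page), out);
        System.out.println(copied + " bytes copied");
    }
}
```

In the downloader, `copy(url.openStream(), fos)` would take the place of the byte-by-byte loop; the in-memory streams in `main` only keep the sketch runnable without a network connection.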


 

2 Fetching search engine results
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class Exec {
        public static void main(String args[]) {
                try {
                        // The keyword is passed to the search engine in the wd query parameter
                        URL url = new URL("http://www.baidu.com/s?wd=mp3");
                        System.out.println(url.getFile());

                        // try-with-resources closes both streams even if the copy fails
                        try (InputStream is = url.openStream();
                                        FileOutputStream fos = new FileOutputStream("storedPage.html")) {
                                // read() returns -1 at end of stream; 0 is a valid data byte
                                int i = is.read();
                                while (i != -1) {
                                        fos.write(i);
                                        i = is.read();
                                }
                        }
                } catch (IOException e) {
                        // Report the failure instead of silently ignoring it
                        e.printStackTrace();
                }
        }
}
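
The listing above hard-codes the keyword `mp3` in the URL. A metasearch engine has to build such URLs for arbitrary keywords, which may contain spaces or other reserved characters, so the keyword should be percent-encoded first. The following is a minimal sketch; the class name `QueryUrlSketch` and the helper `buildQueryUrl` are hypothetical, and the choice of UTF-8 is an assumption, since the character encoding a given search engine expects may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class for illustration; not part of the original post.
class QueryUrlSketch {
    // Builds base?param=keyword with the keyword percent-encoded.
    // UTF-8 is an assumption; a search engine may expect another encoding.
    static String buildQueryUrl(String base, String param, String keyword) {
        try {
            return base + "?" + param + "=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.baidu.com/s?wd=mp3
        System.out.println(buildQueryUrl("http://www.baidu.com/s", "wd", "mp3"));
        // prints http://www.baidu.com/s?wd=java+swing
        System.out.println(buildQueryUrl("http://www.baidu.com/s", "wd", "java swing"));
    }
}
```

The resulting string can be passed to `new URL(...)` exactly as the hard-coded address is in the listing above.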


 

3 A Swing-based program for collecting search engine results
import java.awt.BorderLayout;
import java.awt.Dimension;
import java.awt.FlowLayout;
import java.awt.Toolkit;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.JTextField;
import javax.swing.ScrollPaneConstants;

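// Test class
public class Exec {
        public static void main(String[] args) {
                // Build and show the window on the Swing event dispatch thread
                javax.swing.SwingUtilities.invokeLater(() -> {
                        // Create the window
                        DemoWindow dw = new DemoWindow("Network File Download");

                        // Set the window's width and height to 1/3 of the screen's width and height,
                        // and place its top-left corner at 1/3 of the screen's width and height
                        Toolkit theKit = dw.getToolkit();
                        Dimension wndSize = theKit.getScreenSize();
                        dw.setBounds(wndSize.width / 3, wndSize.height / 3, wndSize.width / 3,
                                        wndSize.height / 3);

                        // Clicking the close button exits the program
                        dw.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

                        // Make the window visible
                        dw.setVisible(true);
                });
        }
}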


 

// Application window
class DemoWindow extends JFrame implements ActionListener {
        // Text field for entering the URL of the network file
        JTextField jtf = new JTextField(25);

        // Action button
        JButton jb = new JButton("Download");

        // Text area that displays information about the network file
        JTextArea jta = new JTextArea();

        // Scroll bars for the text area
        int v = ScrollPaneConstants.VERTICAL_SCROLLBAR_AS_NEEDED;
        int h = ScrollPaneConstants.HORIZONTAL_SCROLLBAR_AS_NEEDED;
        JScrollPane jsp = new JScrollPane(jta, v, h);

        // Layout panel
        JPanel jp = new JPanel();

        // Network file downloader
        Downloader downloader;

        // Constructor
        public DemoWindow(String title) {
                super(title);

                // Window layout
                jp.setLayout(new FlowLayout(FlowLayout.LEFT));
                jp.add(jtf);
                jp.add(jb);
                add(jp, BorderLayout.NORTH);
                add(jsp, BorderLayout.CENTER);

                // Register the event listeners; pressing Enter in the text field
                // triggers the same action as clicking the button
                jtf.addActionListener(this);
                jb.addActionListener(this);
        }

        // Respond to a button click
        public void actionPerformed(ActionEvent e) {
                // Create the downloader for the entered URL
                downloader = new Downloader(jtf.getText(), jta);

                // Start the download thread so the user interface stays responsive
                Thread thread = new Thread(downloader);
                thread.start();
        }
}


 

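// Network file downloader
class Downloader implements Runnable {
        // URL of the network file
        String urlString;

        // Text area that displays information about the network file
        JTextArea jta;

        // Constructor
        public Downloader(String urlString, JTextArea jta) {
                this.urlString = urlString;
                this.jta = jta;
        }

        // Thread body: downloads the network file
        public void run() {
                // Information about the network file
                StringBuilder info = new StringBuilder();
                try {
                        URL url = new URL(urlString);

                        // Open a connection to the URL
                        URLConnection urlConn = url.openConnection();

                        // Collect information about the network file
                        int port = (url.getPort() != -1) ? url.getPort() : url.getDefaultPort();
                        info.append("Host: " + url.getHost() + "\n");
                        info.append("Port: " + port + "\n");
                        info.append("Content type: " + urlConn.getContentType() + "\n");
                        info.append("Length: " + urlConn.getContentLength() + "\n");
                        info.append("Downloading...\n");

                        // Swing components must be updated on the event dispatch thread
                        final String text = info.toString();
                        javax.swing.SwingUtilities.invokeLater(() -> jta.setText(text));

                        // Derive the local file name from the URL path, for example
                        // index.html for http://www.baidu.com/index.html. The query string
                        // is left out because '?' is not allowed in file names on some systems.
                        String path = url.getPath();
                        String localFileName = path.substring(path.lastIndexOf("/") + 1);
                        if (localFileName.isEmpty()) {
                                // A URL such as http://www.baidu.com has no file name of its own
                                localFileName = "index.html";
                        }
                        System.out.println(localFileName);

                        // Copy the network file to the local file; both streams are
                        // closed automatically, even if the copy fails
                        try (InputStream is = urlConn.getInputStream();
                                        FileOutputStream fos = new FileOutputStream(localFileName)) {
                                int data;
                                while ((data = is.read()) != -1) {
                                        fos.write(data);
                                }
                        }

                        // Report completion only when the download succeeded
                        javax.swing.SwingUtilities.invokeLater(() -> jta.append("Download complete!"));
                } catch (Exception e) {
                        System.out.println(e.getMessage());
                        javax.swing.SwingUtilities.invokeLater(
                                        () -> jta.append("Download failed: " + e.getMessage()));
                }
        }
}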
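
The downloader above uses the `URLConnection` returned by `url.openConnection()` with its default settings. When collecting result pages from a search engine it may help to set a browser-like `User-Agent` header and explicit timeouts before reading, since a server may answer differently to the default Java client and a stalled connection would otherwise block the download thread. The following is only a sketch: the class name `ConnectionSetupSketch`, the helper `open`, the user agent string, and the timeout values are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class for illustration; not part of the original post.
class ConnectionSetupSketch {
    // Assumed user agent string; any descriptive value could be used instead.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; MetaSearchDemo/0.1)";

    // Creates a configured connection. openConnection() does not contact the
    // server yet; that happens later, for example in getInputStream().
    static URLConnection open(String urlString) throws IOException {
        URLConnection conn = new URL(urlString).openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(5000);  // milliseconds, arbitrary choice
        conn.setReadTimeout(10000);    // milliseconds, arbitrary choice
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = open("http://www.baidu.com/s?wd=mp3");
        // Only the local configuration is printed; no request is sent here
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```

In `Downloader.run()`, a call such as `open(urlString)` would stand in for `url.openConnection()`; the rest of the method could stay as it is.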

[This post was edited by the author on 2010-12-14 09:38:23]
