▍1. zhizhupc
Uses web crawler technology to automatically find news links on a specified web page.
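The listing does not show how zhizhupc itself works, so the following is only a minimal sketch of the general idea, written in Python with the standard library. The names `LinkCollector`, `find_news_links`, and `NEWS_PATTERN` are hypothetical, and the rule for what counts as a news link (a URL path that mentions "news" or contains a date) is an assumption made for illustration, not the package's actual method.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed heuristic (not taken from zhizhupc): a URL path "looks like news"
# if it mentions "news" or contains a date such as /2024/05/17/ or /20240517/.
NEWS_PATTERN = re.compile(r"news|/\d{4}[/-]?\d{2}[/-]?\d{2}/", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_news_links(html, base_url):
    """Return de-duplicated (url, text) pairs whose path matches NEWS_PATTERN."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    seen = set()
    result = []
    for url, text in collector.links:
        if NEWS_PATTERN.search(urlparse(url).path) and url not in seen:
            seen.add(url)
            result.append((url, text))
    return result
```

To run this against a live page, the HTML string could be fetched first, for example with `urllib.request.urlopen(url).read().decode()`, and then passed to `find_news_links` together with the page URL. Matching on the path only keeps a host name such as `news.example.com` from marking every link on the page as news.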
Main application areas:
• Vertical search: also known as specialized search. High-speed, high-volume, precise crawling is the strength of the focused web crawler DataScraper. It runs unattended, self-scheduled periodic batch collection 24 hours a day, 7 days a week, with resumable transfers and a software watchdog (Watch Dog) so that jobs continue without supervision.
• Mobile Internet: mobile search, mobile mashups, mobile social networking, and mobile commerce all depend on structured data content. DataScraper collects content efficiently in real time and outputs result files in XML format that are rich in semantic metadata. This supports automated data integration and processing and helps overcome the obstacles of small-screen display and high-precision information retrieval. The mobile Internet is not a subset of the Web but all of it, and MetaSeeker provides the bridge.
• Enterprise competitive intelligence gathering / data mining: commonly known as business intelligence. Noise filtering and structured conversion ensure that the data is accurate and timely. According to the description, a proprietary wide-area distributed architecture gives DataScraper unmatched reach when gathering intelligence: AJAX/JavaScript dynamic pages, server-generated dynamic pages, static pages, and various authentication mechanisms are all handled alike. The description also claims that it is far ahead of other products in microblog data collection and public opinion monitoring.
Description: A simple video tutorial on developing your own search engine with Nutch, including environment setup.
Description: A classic book on search engines and web intelligence. Many advisors in this field recommend it as required reading for information retrieval.
Description: Compass is a wrapper around Lucene. This is an application of Compass that covers index creation, search, and advanced search.
A search engine implementation based on Java and Lucene.
Source code for a small search engine, written in Java. Many people who research search now write their code in Java.
nutch-0.8, a very easy-to-use search engine tool that was released not long ago.
An example of using Lucene for on-site search.