A web crawler is a program (or script) that browses the World Wide Web in order to index pages and facilitate information retrieval. Web crawlers, also called spiders, worms, ants, robots or wanderers, use the graph structure of the Web to move from page to page. Usually they start with a list of URLs to visit (called seeds) and then follow the links found on those pages in order to spread throughout the Web. The primary use of web crawlers is to create and maintain indexes for search engines and specialized web portals. Other uses of spiders include checking links, validating HTML code and gathering information, such as harvesting e-mail addresses for spamming.
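The seed-and-follow behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular crawler is implemented: the names `crawl`, `LinkExtractor`, `fetch` and `max_pages` are made up for this example, the breadth-first visiting order is an assumption (real crawlers use many different strategies), and the pages are supplied by a caller-provided `fetch` function so the sketch does not depend on the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    `fetch` is a hypothetical caller-supplied function that maps a URL to
    its HTML text, or returns None if the page cannot be retrieved.
    Returns the visited URLs in the order they were fetched.
    """
    seeds = list(seeds)
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

For a quick try, `fetch` can simply be the `get` method of a dictionary that maps URLs to HTML strings; a real crawler would download the pages instead, and would also need to respect robots.txt and limit its request rate.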
The rapid expansion of the Internet and the increasing need to retrieve information quickly and easily have encouraged the appearance of numerous types of web crawlers, each specialized in a different domain in order to achieve better performance. Some examples of spiders are: RBSE (Eichmann, 1994), the first published web crawler, CORA, Letizia, Mapuccino, Sherlock Holmes, Google Crawler and Labrador.
References:
Kobayashi, M. and Takeda, K. (2000). "Information Retrieval on the Web"
Pant, G., Srinivasan, P. and Menczer, F. (2004). "Crawling the Web"
Wikipedia, The Free Encyclopedia: http://en.wikipedia.org/wiki/Web_crawler