Apr 21st, 2018
Arachnid is a web crawler framework written in Java. It includes a simple HTML parser that searches for links, images, and other content within HTML files or websites. It is a framework, not a complete program. It is designed to be extended, and because it is defined abstractly, some methods must be implemented by the application before it can do anything useful. For example, the handleLink() method has to be overridden in a subclass, because what should happen to each link depends on the application. If the application is meant to print a site map, that is, a list of every page on the site, handleLink() will be defined differently than if it were listing all images on the site.

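The site map versus image list contrast can be sketched as below. This is only an illustration of the override pattern, not Arachnid's actual API: the base class SketchCrawler, its crawl() method, the isImage() helper, and both subclasses are hypothetical names. Only the handleLink() name comes from the description above, and the signature used here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a crawler framework base class (NOT Arachnid's real API).
abstract class SketchCrawler {
    // Hook called once per discovered link; each application supplies its own behaviour.
    protected abstract void handleLink(String url);

    // Feeds an already-extracted list of links to the hook. No network access happens here.
    public void crawl(List<String> discoveredLinks) {
        for (String url : discoveredLinks) {
            handleLink(url);
        }
    }

    // Assumption for this sketch: an image is recognised by its file extension.
    protected static boolean isImage(String url) {
        String lower = url.toLowerCase();
        return lower.endsWith(".png") || lower.endsWith(".jpg")
                || lower.endsWith(".jpeg") || lower.endsWith(".gif");
    }
}

// Application 1: collect each distinct page URL to build a site map.
class SiteMapCrawler extends SketchCrawler {
    final List<String> pages = new ArrayList<>();

    @Override
    protected void handleLink(String url) {
        if (!isImage(url) && !pages.contains(url)) {
            pages.add(url);
        }
    }
}

// Application 2: collect only the image URLs.
class ImageListCrawler extends SketchCrawler {
    final List<String> images = new ArrayList<>();

    @Override
    protected void handleLink(String url) {
        if (isImage(url)) {
            images.add(url);
        }
    }
}
```

Both subclasses reuse the same crawl() loop and differ only in handleLink(), which is the sense in which a framework leaves the application-specific part to the user.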
It only accepts text/html files that are formatted in a specific way, because of the parsing algorithm it uses. If a page deviates from that format, it most likely will not work. This limitation of the algorithm shows the age of the framework: it was released in 2002 and last updated in 2009, so it is, as of now, an inactive project. Although it is not that old, it is poorly suited to modern web pages because of the variety of styles and formats in common use today.
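
To make the format sensitivity concrete, here is a small sketch of how a strict extraction rule can miss valid links. It is not Arachnid's parser, and the exact format Arachnid expects is not stated above. The class StrictLinkExtractor and its pattern are hypothetical, chosen only to show the general failure mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for illustration only (NOT Arachnid's algorithm).
class StrictLinkExtractor {
    // Matches only the exact form <a href="...">: lowercase tag and attribute,
    // double quotes, and no other attributes before href.
    private static final Pattern STRICT_LINK = Pattern.compile("<a href=\"([^\"]*)\">");

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = STRICT_LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1)); // group 1 is the href value
        }
        return links;
    }
}
```

Given a page containing `<a href="a.html">`, `<A HREF='b.html'>`, and `<a class="nav" href="c.html">`, this extractor finds only the first link. All three forms are common in real pages, which is the kind of variation a format-specific algorithm struggles with.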