Compare Products
Nutch features
* Fetching and parsing are done separately by default, which reduces the risk of an error corrupting the fetch or parse stage of a Nutch crawl.
* Plugins have been overhauled as a direct result of the removal of the legacy Lucene dependency for indexing and search.
* The set of plugins shipped with Nutch for processing various document types has been refined. Plain text, XML, OpenDocument (OpenOffice.org), Microsoft Office (Word, Excel, PowerPoint), PDF, RTF and MP3 (ID3 tags) are now all parsed by the Tika plugin. The only parser plugins now shipped with Nutch are Feed (RSS/Atom), HTML, Ext, JavaScript, SWF, Tika and ZIP.
* Distributed filesystem (via Hadoop)
* Link-graph database
* NTLM authentication
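The benefit of running fetching and parsing as separate stages can be illustrated with a minimal Python sketch. It is not Nutch's code (Nutch is written in Java); the in-memory `segment` dictionary and the `fetch_stage`/`parse_stage` functions are hypothetical stand-ins for a crawl segment and the two stages. It shows that once raw content has been stored by the fetch stage, a parser failure stays confined to the parse step, and parsing can be rerun without refetching.

```python
# Hypothetical sketch: fetch and parse as two independent stages.
# A "segment" here is just an in-memory dict of URL -> raw bytes (an assumption;
# it is not Nutch's on-disk segment format).
from __future__ import annotations

from typing import Callable, Iterable


def fetch_stage(urls: Iterable[str], fetcher: Callable[[str], bytes]) -> dict[str, bytes]:
    """Download raw content only; nothing is parsed here."""
    return {url: fetcher(url) for url in urls}


def parse_stage(
    segment: dict[str, bytes], parser: Callable[[bytes], str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Parse previously fetched content, recording failures per URL.

    The fetched bytes are only read, never modified, so a parser error
    cannot damage them and this stage can simply be rerun.
    """
    parsed: dict[str, str] = {}
    errors: dict[str, str] = {}
    for url, raw in segment.items():
        try:
            parsed[url] = parser(raw)
        except Exception as exc:  # confine the failure to this URL's parse
            errors[url] = f"{type(exc).__name__}: {exc}"
    return parsed, errors


# Usage with a fake fetcher: the second "page" is not valid UTF-8.
pages = {"http://a.example/": b"<p>hello</p>", "http://b.example/": b"\xff\xfe"}
segment = fetch_stage(pages, fetcher=pages.__getitem__)
parsed, errors = parse_stage(segment, parser=lambda raw: raw.decode("utf-8"))
print(parsed)          # {'http://a.example/': '<p>hello</p>'}
print(sorted(errors))  # ['http://b.example/']
```

Keeping the raw bytes costs storage, but it means a buggy or crashing parser never forces the crawler to download the same pages again.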
Darcy Ripper features
* Darcy Ripper is written entirely in Java, so it can run on any Java-enabled machine.
* Darcy Ripper provides a large number of configuration settings for the download process, so you can retrieve exactly the web resources you want.
* Downloads of web resources can be resumed, either in the same download session or in a later one.
* Support for the Cookie request header.
* Support for basic HTTP authentication.
* Request filtering based on status codes and/or regular expression matching.
* Response filtering based on content type and/or overall content.
* Visualization of download statistics and download history.
* Darcy Ripper lets you view every step of the download process: you can see each URL as it is accessed and each resource as it is processed or downloaded. Unlike most other tools, this lets you notice when something is not working as expected, stop the process, and fix the issue.
* Besides presenting the download process in real time, Darcy Ripper remembers statistics for all of your download processes and can show them to you later.
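The request and response filtering options in the list above can be sketched in Python. Darcy Ripper's actual configuration format is not described on this page, so the `DownloadFilter` class and its fields (`url_patterns`, `allowed_statuses`, `allowed_types`, `content_pattern`) are hypothetical. In this sketch, requests are filtered by URL regular expression and HTTP status code, and responses by MIME type and an optional regular expression over the body.

```python
# Hypothetical sketch of request/response filtering for a download tool.
# Field names are illustrative assumptions, not Darcy Ripper's real settings.
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class DownloadFilter:
    url_patterns: list[re.Pattern[str]] = field(default_factory=list)
    allowed_statuses: frozenset[int] | None = None    # None = accept any status
    allowed_types: frozenset[str] | None = None       # None = accept any MIME type
    content_pattern: re.Pattern[bytes] | None = None  # None = no content check

    def accept_request(self, url: str, status: int) -> bool:
        # If URL patterns are configured, at least one must match.
        if self.url_patterns and not any(p.search(url) for p in self.url_patterns):
            return False
        return self.allowed_statuses is None or status in self.allowed_statuses

    def accept_response(self, content_type: str, body: bytes) -> bool:
        # Compare only the MIME type, dropping parameters like "; charset=utf-8".
        mime = content_type.split(";", 1)[0].strip().lower()
        if self.allowed_types is not None and mime not in self.allowed_types:
            return False
        return self.content_pattern is None or self.content_pattern.search(body) is not None


# Usage: keep only PNG/JPEG images that were served with status 200.
images_only = DownloadFilter(
    url_patterns=[re.compile(r"\.(?:jpe?g|png)$", re.IGNORECASE)],
    allowed_statuses=frozenset({200}),
    allowed_types=frozenset({"image/jpeg", "image/png"}),
)
print(images_only.accept_request("http://example.com/a.png", 200))         # True
print(images_only.accept_request("http://example.com/a.png", 404))         # False
print(images_only.accept_request("http://example.com/index.html", 200))    # False
print(images_only.accept_response("image/png", b"\x89PNG"))                # True
print(images_only.accept_response("text/html; charset=utf-8", b"<html>"))  # False
```

Splitting the checks into a request-side and a response-side method keeps URL rules, which can be applied before anything is downloaded, apart from the content-type and content rules, which need the response itself.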
Attribute    | Nutch  | Darcy Ripper
Languages    | Other  | Other
Source Type  | Open   | Closed
License Type | Apache | Proprietary
OS Type      |        |
Pricing      |        |