Compare Products

80legs App vs. Apache Nutch App

Features

* It gives you the ability to customize your crawl by providing a handful of options that specify how your crawl will run.
* Your URLs are run through a variety of sanity checks to make sure they can be crawled. If they pass, they're sent to a URL queue, where they'll wait to be picked up. 80legs will automatically rate-limit how fast you crawl certain URLs so your crawl doesn't overwhelm any websites. This is one of the ways we make sure your web crawl doesn't get blocked by anyone.
* Your URLs, along with the 80app, are sent out to our massive pool of crawling nodes. Each crawling node will fetch the HTML content of a URL, run the 80app on that HTML, and return the resulting data to 80legs. This massive collection of crawling nodes is a key reason 80legs can provide such fast web crawling.
* As your crawl runs, the results from each URL crawled will be packed up and delivered to your account, where they'll wait for you to download them.
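The sanity-check, queue, and rate-limit steps described above can be illustrated with a small sketch. This is not 80legs code: the function names `is_crawlable` and `schedule`, the specific checks, and the `min_delay` value are all assumptions chosen for illustration. It only shows the general idea of rejecting bad URLs and spacing out requests to the same host.

```python
from urllib.parse import urlparse


def is_crawlable(url: str) -> bool:
    """Hypothetical sanity check: require an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def schedule(urls, min_delay=1.0):
    """Give each valid URL an earliest fetch time (in seconds) so that
    requests to the same host are at least `min_delay` apart.

    `min_delay` is an illustrative value, not a documented 80legs setting.
    """
    next_free = {}  # host -> earliest time that host may be requested again
    plan = []
    for url in urls:
        if not is_crawlable(url):
            continue  # rejected by the sanity check, never queued
        host = urlparse(url).netloc.lower()
        start = next_free.get(host, 0.0)
        plan.append((start, url))
        next_free[host] = start + min_delay
    # Stable sort: URLs with the same start time keep their input order.
    return sorted(plan, key=lambda item: item[0])
```

Calling `schedule` with several URLs on one host spreads them over time, while URLs on different hosts can start immediately. A real crawler would add many more checks (robots.txt, duplicates, URL length) that are omitted here.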

Features

* Fetching and parsing are done separately by default; this reduces the risk of an error corrupting the fetch/parse stage of a crawl with Nutch.
* Plugins have been overhauled as a direct result of the removal of the legacy Lucene dependency for indexing and search.
* The number of plugins shipped with Nutch for processing various document types has been refined. Plain text, XML, OpenDocument (OpenOffice.org), Microsoft Office (Word, Excel, PowerPoint), PDF, RTF, and MP3 (ID3 tags) are all now parsed by the Tika plugin. The only parser plugins shipped with Nutch now are Feed (RSS/Atom), HTML, Ext, JavaScript, SWF, Tika, and ZIP.
* Distributed filesystem (via Hadoop)
* Link-graph database
* NTLM authentication
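The benefit of keeping fetching and parsing separate can be shown with a minimal sketch. This is not Nutch code (Nutch is written in Java and runs its phases as separate jobs); the names `fetch_all`, `parse_all`, and `parse_title` are hypothetical, and the fetcher is passed in as a plain function so that the example needs no network access. The point is only that a parse failure leaves the stored fetch results untouched.

```python
import re


def fetch_all(urls, fetch):
    """Fetch phase: store raw bytes only; nothing is parsed here."""
    return {url: fetch(url) for url in urls}


def parse_title(raw: bytes) -> str:
    """Toy parser: extract the <title> text from UTF-8 HTML.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = raw.decode("utf-8")
    match = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_all(raw_pages, parse):
    """Parse phase: runs over already-stored content.

    A failure on one page is recorded in `errors`; the raw pages are
    never modified, so parsing can be retried without refetching.
    """
    parsed, errors = {}, {}
    for url, raw in raw_pages.items():
        try:
            parsed[url] = parse(raw)
        except Exception as exc:
            errors[url] = str(exc)
    return parsed, errors
```

Because `parse_all` only reads from the stored pages, a broken document or a buggy parser can be fixed and the parse phase rerun on its own.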

Languages

Other

Languages

Other

Source Type

Closed

Source Type

Open

License Type

Proprietary

License Type

Apache

OS Type

OS Type

Pricing

  • Free – Up to 10,000 URLs
  • Intro Plan – 29.00 USD per month (Up to 100,000 URLs)
  • Plus Plan – 99.00 USD per month (Up to 1,000,000 URLs)
  • Premium Plan – 299.00 USD per month (Up to 10,000,000 URLs)

Pricing

  • Free Trial No Card, By Quotation