Apache Nutch Scraping App

Apache Nutch

by Apache

A mature, production-ready web crawler.
Helps with: Scraping
Similar to: Web-Scraping-SDK App, Event Registry news API App, Screen Scraping App, Beautiful Soup App
Source Type: Open
License Types:
Supported OS:
Languages: Other

What is it all about?

Apache Nutch is a highly extensible and scalable open-source web crawler.

Key Features

* Fetching and parsing are done separately by default, which reduces the risk of an error corrupting the fetch/parse stage of a crawl.
* Plugins have been overhauled as a direct result of removing the legacy Lucene dependency for indexing and search.
* The set of plugins shipped with Nutch for processing various document types has been refined. Plain text, XML, OpenDocument (OpenOffice.org), Microsoft Office (Word, Excel, PowerPoint), PDF, RTF, and MP3 (ID3 tags) are now all parsed by the Tika plugin. The only parser plugins now shipped with Nutch are Feed (RSS/Atom), HTML, Ext, JavaScript, SWF, Tika, and ZIP.
* Distributed filesystem (via Hadoop)
* Link-graph database
* NTLM authentication


Pricing


Description

Free Trial No Card, By Quotation


