Compare Products

80legs App vs. Scrapy App

Features

* It gives you the ability to customize your crawl through a handful of options that specify how your crawl will run.
* Your URLs are run through a variety of sanity checks to make sure they can be crawled. If they pass, they're sent to a URL queue, where they'll wait to be picked up. 80legs will automatically rate-limit how fast you crawl certain URLs so your crawl doesn't overwhelm any websites. This is one of the ways we make sure your web crawl doesn't get blocked by anyone.
* Your URLs, along with the 80app, are sent out to our massive pool of crawling nodes. Each crawling node will fetch the HTML content of a URL, run the 80app on that HTML, and return the resulting data to 80legs. This massive collection of crawling nodes is a key reason 80legs can provide such amazingly fast web crawling.
* As your crawl runs, the results from each URL crawled are packaged up and delivered to your account, where they'll wait for you to download them.
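The sanity-check and per-site rate-limiting steps described above can be sketched in plain Python. This is a hypothetical illustration of the idea, not 80legs internals; the check rules, queue structure, and one-second interval are assumptions:

```python
import time
from collections import deque
from urllib.parse import urlparse

def sanity_check(url):
    """A basic check of the kind described: URL must have an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

class RateLimitedQueue:
    """URL queue that limits how often any single host is fetched (illustrative sketch)."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between fetches to the same host
        self.queue = deque()
        self.last_fetch = {}  # host -> time of last fetch

    def push(self, url):
        """Enqueue the URL only if it passes the sanity checks."""
        if sanity_check(url):
            self.queue.append(url)
            return True
        return False

    def pop_ready(self, now=None):
        """Return a URL whose host is not rate-limited right now, or None."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            host = urlparse(url).netloc
            if now - self.last_fetch.get(host, float("-inf")) >= self.min_interval:
                self.last_fetch[host] = now
                return url
            self.queue.append(url)  # still too soon for this host; retry later
        return None
```

A crawler loop would call `pop_ready()` repeatedly, skipping hosts that were fetched too recently, which is the behavior that keeps any one site from being overwhelmed.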

Features

* Fast and powerful - write the rules to extract the data and let Scrapy do the rest.
* Easily extensible - extensible by design; plug in new functionality easily without having to touch the core.
* Portable, Python - written in Python and runs on Linux, Windows, Mac and BSD.
* Built-in support for selecting and extracting data from HTML/XML sources using extended CSS selectors and XPath expressions, with helper methods to extract using regular expressions.
* An interactive shell console (IPython aware) for trying out the CSS and XPath expressions to scrape data, very useful when writing or debugging your spiders.
* Built-in support for generating feed exports in multiple formats (JSON, CSV, XML) and storing them in multiple backends (FTP, S3, local filesystem).
* Robust encoding support and auto-detection, for dealing with foreign, non-standard and broken encoding declarations.
* Strong extensibility support, allowing you to plug in your own functionality using signals and a well-defined API (middlewares, extensions, and pipelines).
* Wide range of built-in extensions and middlewares for handling: cookies and session handling; HTTP features like compression, authentication, caching, user-agent spoofing, robots.txt, and crawl depth restriction.
* A Telnet console for hooking into a Python console running inside your Scrapy process, to introspect and debug your crawler.
* Plus other goodies like reusable spiders to crawl sites from Sitemaps and XML/CSV feeds, a media pipeline for automatically downloading images (or any other media) associated with the scraped items, a caching DNS resolver, and much more!

Languages

Other

Languages

Python

Source Type

Closed

Source Type

Open

License Type

Proprietary

License Type

BSD

Pricing

  • Free – Up to 10,000 URLs
  • Intro Plan – 29.00 USD per month (Up to 100,000 URLs)
  • Plus Plan – 99.00 USD per month (Up to 1,000,000 URLs)
  • Premium Plan – 299.00 USD per month (Up to 10,000,000 URLs)

Pricing

  • Free