Compare Products

Darcy Ripper App vs. Scrapy App

Darcy Ripper features

* Darcy Ripper is implemented entirely in Java, so it runs on any machine with Java installed.
* It offers a large number of configuration settings for the download process, so you can retrieve exactly the web resources you want.
* Downloads of web resources can be resumed, either within the same download session or in a later one.
* Support for the Cookie request header.
* Basic WWW authentication support.
* Filtering of requests based on status codes and/or regular-expression matching.
* Filtering of responses based on their content type and/or overall content.
* Visualization of download statistics and download history.
* Darcy Ripper lets you view every step of the download process: every URL being accessed and every resource that has been processed or downloaded. Unlike most other tools, this lets you notice when something is not working as expected, stop the process, and fix the issue.
* Besides the real-time view of the download process, Darcy Ripper records statistics for all your download processes and makes them available to you.
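The request and response filtering listed for Darcy Ripper can be illustrated with a small sketch. Darcy Ripper is a Java application and its actual configuration format is not described on this page, so everything below is an assumption: the names `FilterRules`, `accept_url` and `accept_response` are hypothetical, and Python is used only because it is the language of the other product compared here. The sketch applies regular-expression include/exclude rules to URLs before fetching, then accepts a response only if its status code and media type are on an allow-list.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical illustration only: these names are NOT part of Darcy Ripper's
# real configuration or API; they just model the filtering ideas it lists.


@dataclass
class FilterRules:
    # A URL must match at least one of these regexes (if any are given).
    include_url_patterns: list[str] = field(default_factory=list)
    # A URL matching any of these regexes is always rejected.
    exclude_url_patterns: list[str] = field(default_factory=list)
    # Responses are kept only for these HTTP status codes.
    allowed_status_codes: set[int] = field(default_factory=lambda: {200})
    # Responses are kept only for these media types.
    allowed_content_types: set[str] = field(default_factory=lambda: {"text/html"})


def accept_url(url: str, rules: FilterRules) -> bool:
    """Request-side filter: decide whether a URL should be fetched."""
    if any(re.search(p, url) for p in rules.exclude_url_patterns):
        return False
    if rules.include_url_patterns:
        return any(re.search(p, url) for p in rules.include_url_patterns)
    return True  # no include rules: accept everything not excluded


def accept_response(status: int, content_type: str, rules: FilterRules) -> bool:
    """Response-side filter: keep only allowed status codes and media types."""
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status in rules.allowed_status_codes and media_type in rules.allowed_content_types


if __name__ == "__main__":
    rules = FilterRules(
        include_url_patterns=[r"\.example\.com/docs/"],
        exclude_url_patterns=[r"\.zip$"],
        allowed_content_types={"text/html", "image/png"},
    )
    print(accept_url("https://www.example.com/docs/intro.html", rules))  # True
    print(accept_url("https://www.example.com/docs/archive.zip", rules))  # False
    print(accept_response(200, "text/html; charset=utf-8", rules))  # True
```

Exclusion rules are checked first so that a broad include pattern cannot accidentally re-admit a URL the user explicitly rejected; a real tool may order or combine its rules differently.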

Scrapy features

* Fast and powerful: write the rules to extract the data and let Scrapy do the rest.
* Easily extensible: extensible by design, so you can plug in new functionality without having to touch the core.
* Portable: written in Python, it runs on Linux, Windows, Mac and BSD.
* Built-in support for selecting and extracting data from HTML/XML sources using extended CSS selectors and XPath expressions, with helper methods for extracting with regular expressions.
* An interactive shell console (IPython-aware) for trying out CSS and XPath expressions to scrape data, which is very useful when writing or debugging spiders.
* Built-in support for generating feed exports in multiple formats (JSON, CSV, XML) and storing them in multiple backends (FTP, S3, local filesystem).
* Robust encoding support and auto-detection, for dealing with foreign, non-standard and broken encoding declarations.
* Strong extensibility support, allowing you to plug in your own functionality using signals and a well-defined API (middlewares, extensions, and pipelines).
* A wide range of built-in extensions and middlewares for handling cookies and sessions; HTTP features such as compression, authentication and caching; user-agent spoofing; robots.txt; and crawl depth restriction.
* A Telnet console for hooking into a Python console running inside your Scrapy process, to introspect and debug your crawler.
* Plus other goodies, such as reusable spiders for crawling sites from Sitemaps and XML/CSV feeds, a media pipeline for automatically downloading images (or any other media) associated with the scraped items, a caching DNS resolver, and much more.

Languages

Darcy Ripper: Java
Scrapy: Python

Source Type

Darcy Ripper: Closed
Scrapy: Open

License Type

Darcy Ripper: Proprietary
Scrapy: Open source (BSD)

OS Type

Darcy Ripper: any Java-enabled machine
Scrapy: Linux, Windows, Mac and BSD

Pricing

Darcy Ripper: Free
Scrapy: Free