semanti.ca Web Article Data Extraction
What is it all about?
The goal of semanti.ca (pronounced seh-man-tee-kah) is to make the information on the Web accessible in its pure form. We are building AI-powered technology that looks at web pages and sees the information they contain. Modern web pages are noisy, and user interface fashions and technologies change constantly, but semanti.ca keeps delivering clean, normalized, and organized information from the noisy, ever-changing Web.
semanti.ca is a scalable, AI-powered API for extracting data from web articles. To extract data, semanti.ca loads a web article in a browser and reads it, just as a human would. It accurately recognizes titles, headlines, published and updated dates, images, captions, and tags. It extracts the content text and its HTML code while ignoring advertisements, design elements, and any other text or images unrelated to the main content. semanti.ca is not tailored to any specific website design or technology. It is trained on millions of web pages and can recognize the relevant elements of a page regardless of how the page was built: it actually "looks" at the page and recognizes the content using a statistical model learned from data. Furthermore, semanti.ca classifies the extracted content according to the IPTC Media Topics taxonomy and extracts key phrases from the text, which helps users organize the extracted content.
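To make the list of extracted elements concrete, here is a minimal Python sketch of how a client might model one extraction result. It is not the actual semanti.ca response format: every class, field, and key name below (ExtractedArticle, media_topics, key_phrases, and so on) is a hypothetical choice made for illustration, and the sample payload is invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ExtractedImage:
    # Hypothetical shape: an image URL plus its optional caption.
    url: str
    caption: Optional[str] = None


@dataclass
class ExtractedArticle:
    # Hypothetical field names mirroring the elements described above.
    title: str
    text: str  # main content as plain text
    html: str  # main content as HTML
    headline: Optional[str] = None
    published: Optional[datetime] = None
    updated: Optional[datetime] = None
    images: List[ExtractedImage] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    media_topics: List[str] = field(default_factory=list)  # IPTC Media Topics labels
    key_phrases: List[str] = field(default_factory=list)


def _parse_date(value: Optional[str]) -> Optional[datetime]:
    # Assumes dates arrive as ISO 8601 strings; missing dates stay None.
    return datetime.fromisoformat(value) if value else None


def article_from_dict(payload: dict) -> ExtractedArticle:
    """Build an ExtractedArticle from an already-decoded JSON-like dict."""
    return ExtractedArticle(
        title=payload["title"],
        text=payload["text"],
        html=payload["html"],
        headline=payload.get("headline"),
        published=_parse_date(payload.get("published")),
        updated=_parse_date(payload.get("updated")),
        images=[
            ExtractedImage(url=img["url"], caption=img.get("caption"))
            for img in payload.get("images", [])
        ],
        tags=list(payload.get("tags", [])),
        media_topics=list(payload.get("media_topics", [])),
        key_phrases=list(payload.get("key_phrases", [])),
    )


# Invented sample data, used only to exercise the sketch.
sample = {
    "title": "Example article",
    "text": "Body text of the article.",
    "html": "<p>Body text of the article.</p>",
    "published": "2019-03-01T12:00:00",
    "images": [{"url": "https://example.com/a.jpg", "caption": "A photo"}],
    "tags": ["example"],
    "media_topics": ["science and technology"],
    "key_phrases": ["body text"],
}

article = article_from_dict(sample)
print(article.title, article.published, article.media_topics)
```

Keeping the optional elements (headline, dates, images, tags) as optional or empty-by-default fields reflects the fact that not every article contains all of them.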