There is a vast amount of data on the Internet; in fact, it is the largest source of data in the world. Many businesses and individuals want to use this information in their projects, ranging from extracting competitors' prices to taking a snapshot of stock quotes at a particular time. The problem is that this data is stored in an unstructured way, making it difficult for computers to read and interpret automatically.
This was the problem GrabzIt was founded to solve. Our first priority was to allow people to take an exact copy of a web page by providing a service that screenshots websites as images or PDF documents, allowing a copy of an entire web page to be captured instantly.
Our next priority was to extract data from within a web page. To do this we created a service that converts web pages containing HTML tables into CSV or Excel documents. This allows computer software to easily read HTML tables, and lets users capture snapshots of them, which is useful for gathering historical data on subjects like football scores.
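To picture what this conversion involves (this is only an illustrative sketch using Python's standard library, not GrabzIt's actual implementation), an HTML table can be flattened into CSV rows by walking its tags and collecting cell text:

```python
import csv
import io
from html.parser import HTMLParser

class TableToCSV(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

# A tiny example table, e.g. historical football scores.
html = ("<table><tr><th>Team</th><th>Score</th></tr>"
        "<tr><td>Reds</td><td>2</td></tr></table>")
parser = TableToCSV()
parser.feed(html)

out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```

A production converter must also handle nested tables, `colspan`/`rowspan`, and malformed markup, which is where most of the real work lies.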
However, this doesn't provide the flexibility that many users desired, so we created the Web Scraper. This is a highly flexible tool that can extract data from any web page or PDF document by crawling over a website and extracting data as it goes. In fact, it is so powerful that it can not only extract text, images, links and files from websites, it can even extract text from images, check that a link is valid, or take screenshots of every page on a website.
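At its core, crawling a website means fetching a page, pulling out its links, and queueing any unvisited ones. The sketch below (again illustrative, not GrabzIt's code) shows that loop; to keep it self-contained, the "site" is a hypothetical in-memory map from URL to HTML standing in for real HTTP fetches:

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Record the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(site, start):
    """Breadth-first crawl over `site` (URL -> HTML), visiting each page once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)          # a real scraper would extract data here
        parser = LinkParser()
        parser.feed(site.get(url, ""))
        for link in parser.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Hypothetical three-page site held in memory.
site = {
    "/": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/">home</a>',
    "/b": '',
}
print(crawl(site, "/"))  # → ['/', '/a', '/b']
```

The `seen` set is what stops the crawler looping forever on pages that link back to each other.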
To do this, a user must specify what data to extract, most of which can be done through an online wizard. Once the Web Scraper has extracted the data, it puts it into a structured format, which is crucial for a computer to be able to read it. The formats range from CSV, Excel and HTML documents to a SQL script, which allows the data to be loaded directly into a database.
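The SQL output can be thought of as turning each extracted row into an INSERT statement. A rough sketch of that idea (the table and column names here are invented for illustration; real generators also handle types and identifier quoting):

```python
def rows_to_sql(table, columns, rows):
    """Emit one INSERT statement per extracted row, escaping single quotes."""
    stmts = []
    for row in rows:
        values = ", ".join("'" + str(v).replace("'", "''") + "'" for v in row)
        stmts.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
        )
    return "\n".join(stmts)

# Hypothetical scraped football scores loaded straight into a database table.
script = rows_to_sql("scores", ["team", "goals"], [("Reds", 2), ("Blues", 1)])
print(script)
# → INSERT INTO scores (team, goals) VALUES ('Reds', '2');
#   INSERT INTO scores (team, goals) VALUES ('Blues', '1');
```

Running the resulting script against a database gives the data the structure that raw HTML lacks.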
While many are impressed by what we do, GrabzIt wants to go further: our aim is to make the web fully machine readable, just like any other data source.