Extracting data from a non-machine-readable source, such as a website or a PDF document, and creating structured data from the result. Screen-scraping a dataset requires dedicated programming and is expensive in programmer time, so it is generally done only after all other attempts to obtain the data in structured form have failed. Legal questions may arise about whether the scraping breaches the source website’s copyright or terms of service. Source: ODH

The process of extracting data in a machine-readable format from sources that are not pure data, for example webpages or PDF documents. The term is often prefixed with the kind of source (web scraping, PDF scraping).
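As an illustration of web scraping, the following is a minimal sketch that assumes the wanted data sits in an HTML `<table>`. It pulls the cell text out row by row and re-emits it as CSV, turning presentation markup into structured data. The page content and figures are hypothetical, as are the names `TableScraper`, `scrape_table` and `to_csv`. Only the standard library's `html.parser` and `csv` modules are used. Fetching the page over the network is left out so the example runs offline.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collects the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built, or None outside a <tr>
        self._cell = None # text fragments of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments of one cell and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Keep text only while inside a table cell; headings etc. are ignored.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html: str) -> list[list[str]]:
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows: list[list[str]]) -> str:
    """Serialise scraped rows as CSV, a machine-readable format."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical page standing in for a downloaded website.
PAGE = """
<html><body>
<h1>Budget 2023</h1>
<table>
  <tr><th>Department</th><th>Spend</th></tr>
  <tr><td>Health</td><td>1,200</td></tr>
  <tr><td>Education</td><td>950</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    print(to_csv(scrape_table(PAGE)))
```

Real pages are rarely this tidy. A scraper is usually written against the layout of one particular site and breaks when that layout changes, which is part of why scraping costs so much programmer time.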

