Data journalism
The ability to work with data is an increasingly important part of a journalist’s armoury.
Almost any interesting use of data will combine data from different sources. To do this it is necessary to ensure that the different datasets are compatible: they must use the same names for the same objects, the same units or co-ordinates, etc. If the data quality is good, this process of data integration may be straightforward; if not, it is likely to be arduous.
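As a rough illustration of what making two datasets compatible can involve, the sketch below joins two toy tables that label the same country differently and record lengths in different units. Everything in it is an assumption made for the example: the alias table, the field names (`country`, `nation`, `length`, `unit`) and the figures are invented, not taken from any real dataset.

```python
# Hypothetical alias table: maps each spelling found in the sources
# to one canonical name. A real project would need a much longer list.
CANONICAL_NAMES = {
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "great britain": "United Kingdom",
}

MILES_TO_KM = 1.609344


def canonical_name(raw):
    """Return the canonical name for a raw label, or None if it is unknown."""
    return CANONICAL_NAMES.get(raw.strip().lower())


def integrate(population_rows, road_rows):
    """Join two datasets on canonical name, converting miles to kilometres."""
    merged = {}
    for row in population_rows:
        name = canonical_name(row["country"])
        if name is not None:
            merged.setdefault(name, {})["population"] = row["population"]
    for row in road_rows:
        name = canonical_name(row["nation"])
        if name is not None:
            length = row["length"]
            if row["unit"] == "miles":
                length = length * MILES_TO_KM
            merged.setdefault(name, {})["road_km"] = round(length, 1)
    return merged


# Made-up toy values, purely for illustration.
population = [{"country": "UK", "population": 100}]
roads = [{"nation": "Great Britain ", "length": 10, "unit": "miles"}]

merged = integrate(population, roads)
```

Rows whose labels are missing from the alias table are silently skipped here; in practice it is usually worth collecting them for manual review, since unmatched names are a common sign of poor data quality.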
A large amount of data transferred from one system or location to another.
Source: EU OD
A crawler is a program that visits websites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the web all have such a program, which is also known as a ‘spider’ or a ‘bot’.
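The idea can be sketched in a few lines. The example below is a minimal illustration, not how any particular search engine works: it follows links breadth-first and builds a simple index from each word to the pages that contain it. To keep it self-contained it reads from a hypothetical in-memory dictionary, `PAGES`, instead of the live web; the names `crawl`, `fetch` and `LinkAndTextParser` are likewise invented for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects link targets and text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns a mapping from word to the set of URLs containing it."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)  # fetch returns the page source, or None if unavailable
        if html is None:
            continue
        visited += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the current page
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.com/": '<a href="/data">Data</a> home page',
    "http://example.com/data": '<a href="/">Home</a> budget data',
}

index = crawl("http://example.com/", PAGES.get)
```

A crawler run against real sites would also need to fetch pages over HTTP, honour each site's robots.txt, and limit its request rate; none of that is shown here.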
Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc.
A hub for data discovery which provides a common location that lists and links to an organization’s datasets. Such a hub is often located at www.example.com/data.
Source: US OD
A collection of data elements or datasets that make sense to group together. Each community of interest identifies the Data Assets specific to supporting the needs of their respective mission or business functions. Notably, a Data Asset is a deliberately abstract concept. A given Data Asset may represent an entire database consisting of multiple distinct entity classes, or may represent a single entity class.
A system that allows outsiders to be granted access to databases without overloading either system.
Source: ODH
Data may be thought of as unprocessed atomic statements of fact. It very often refers to systematic collections of numerical information in tables of numbers such as spreadsheets or databases. When data is structured and presented so as to be useful and relevant for a particular purpose, it becomes information available for human apprehension.
Catalog Service for the Web (CSW) is an API used by geospatial systems to provide metadata in open standards, including in the FGDC-endorsed ISO 19115 schema. The CSW-provided metadata can be mapped into the Project Open Data metadata schema.
Source: US OD