A legal requirement for public bodies to provide data they hold to citizens on request, as well as proactively, unless a specific exemption applies, e.g. the data is confidential for reasons of national security, privacy, market competition or similar. Information obtained under access to information law is not automatically considered open data unless it is delivered in a machine-readable format and under an open licence. In many EU countries the right of access to information (documents) has constitutional rank and is protected by an independent redress mechanism and/or the courts. In Anglo-Saxon countries (USA, UK, etc.) the corresponding term is freedom of information.

The Asset Description Metadata Schema (ADMS) is a vocabulary for describing interoperability assets, making it possible for ICT developers to explore and search for them. ADMS allows public administrations, businesses, standardisation bodies and academia to:

  • describe semantic assets in a common way so that they can be seamlessly cross-queried and discovered by information and communications technology (ICT) developers from a single access point;
  • search, identify, retrieve and compare semantic assets to be reused, avoiding duplication and expensive design work through a single point of access;
  • keep their own system for documenting and storing semantic assets;
  • improve indexing and visibility of their own assets;
  • link semantic assets to one another in cross-border and cross-sector settings.

Source: https://joinup.ec.europa.eu/asset/adms/description, cited by EU OD.

Processing data that includes personal information so that individuals can no longer be identified in the resulting data. Anonymisation enables data to be published without breaching data protection principles. The principal techniques are aggregation and de-identification. Care must be taken to avoid data leakage that would result in individuals’ privacy being compromised. Source: ODH.

Under EU rules anonymisation means the process of changing documents into anonymous documents which do not relate to an identified or identifiable natural person, or the process of rendering personal data anonymous in such a manner that the data subject is not or no longer identifiable.

Source: OD Directive

An application programming interface, which is a set of definitions of the ways one piece of computer software communicates with another. It is a method of achieving abstraction, usually (but not necessarily) between higher-level and lower-level software.

Source: USOD

A way computer programs talk to one another, which can be understood in terms of how a programmer sends instructions between programs. For data, this is usually a means provided by the data publisher for programs or apps to read data directly over the web. The app sends the API a query asking for the specific data it needs, e.g. the time of the next bus leaving a particular stop. This allows the app to use the data without downloading the whole dataset, saving bandwidth and ensuring that the data used is the most up to date available.

Source: ODH.
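
The bus-stop example above can be sketched in Python. The endpoint URL and the response fields are hypothetical, invented purely for illustration; the point is that the app requests only the small slice of data it needs:

```python
import json
from urllib.parse import urlencode

# Hypothetical open-data API endpoint (not a real service).
BASE_URL = "https://transit.example.org/api/v1/departures"

def build_query(stop_id, limit=1):
    """Build the request URL: the app asks only for the data it needs."""
    return BASE_URL + "?" + urlencode({"stop": stop_id, "limit": limit})

# A sample JSON payload such an API might return.
sample_response = '{"stop": "4503", "departures": [{"route": "73", "time": "14:32"}]}'

def next_departure(payload):
    """Extract the next departure from the API's JSON response."""
    first = json.loads(payload)["departures"][0]
    return f"Route {first['route']} at {first['time']}"

print(build_query("4503"))
print(next_departure(sample_response))
```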

Rate limiting will be part of any API platform; without some sort of usage log and analytics showing developers where they stand, rate limits will cause nothing but frustration. Clearly show developers where they stand with daily, weekly or monthly API usage, and provide proper relief valves allowing them to scale their usage properly.

Source: US OD
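
One common (but not universal) way APIs report where a developer stands is via `X-RateLimit-*` response headers. The sketch below assumes those header names; any real API's documentation should be checked for its own convention:

```python
def rate_limit_status(headers):
    """Summarise a developer's position against their quota from response headers."""
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    return {"limit": limit, "used": limit - remaining, "remaining": remaining}

# Example headers as they might appear on an API response.
headers = {"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "250"}
print(rate_limit_status(headers))
```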

Quality API documentation is the gateway to a successful API. API documentation needs to be complete yet simple, a very difficult balance to achieve. Striking this balance takes work, and will take the work of more than one individual on an API development team to make happen.

API documentation can be written by the developers of the API, but additional edits should be made by developers who were not responsible for deploying it. The developers who built an API can easily overlook parameters and other details about which they have made assumptions.

Source: US OD

A piece of software (short for ‘application’), especially one designed to run on the web or on mobile phones and similar platforms. Apps can make network connections to large databases and thus be a powerful way of consuming open data, which may be real-time, personalised, and (using a mobile phone’s GPS) location-specific information. Crowdsourcing apps can also be used to build or improve datasets.

Source: ODH.

Complete, functioning applications built on an API are the end goal of any API owner. Make sure to present all applications built on an API in an application showcase or directory. App showcases are a great way to feature not just applications built by the API owner, but also successful integrations by ecosystem partners and individual developers.

Source: US OD

A specification that re-uses terms from one or more base standards, adding more specificity by identifying mandatory, recommended and optional elements to be used for a particular application, as well as recommendations for controlled vocabularies to be used. 

Source: https://data.europa.eu/euodp/en/developerscorner

Contains information on all databases a public body holds and maintains, with key information (metadata), including published and unpublished datasets. Public sector bodies are obliged to publish asset lists on their websites. See also Information Asset Register.

Acknowledging the source of data when using or re-publishing it. A data licence permitting the data to be used may include a requirement to attribute the source. Data subject to this restriction may still be considered open data according to the Open Definition.

Source: ODH.

A Creative Commons Licence that lets others distribute, remix, adapt, and build upon data, even commercially, as long as they credit the source for the original creation. This is the most accommodating of licenses offered. Recommended for maximum dissemination and use of licensed materials.

Source: Creative Commons

A Creative Commons Licence that lets others remix, adapt, and build upon your work even for commercial purposes, as long as they credit the author and license their new creations under the identical terms. This license is often compared to “copyleft” free and open source software licenses. All new works based on the source will carry the same license, so any derivatives will also allow commercial use. For example, this is the license used by Wikipedia.

Source: Creative Commons

A Creative Commons Licence that lets others reuse the work for any purpose, including commercially; however, it cannot be shared with others in adapted form, and credit must be provided to the author.

Source: Creative Commons.

A Creative Commons Licence that lets others remix, adapt, and build upon the data non-commercially; although their new works must also acknowledge the source and be non-commercial, they don’t have to license their derivative works on the same terms.

Source: Creative Commons.

A Creative Commons Licence that lets others remix, adapt, and build upon data non-commercially, as long as they credit the source and license their new creations under the identical terms.

Source: Creative Commons.

A Creative Commons Licence that is the most restrictive of the CC six main licenses, only allowing others to download data and share them with others as long as they credit the source, but they can’t change them in any way or use them commercially.

Source: Creative Commons.


The rate at which data can be transferred between computers. As bandwidth is limited, apps aim to download only the minimum amount of data needed to fulfil a user’s request.

Source: ODH.

A collection of data so large that it cannot be stored, transmitted or processed by traditional means. The increasing availability of and need to process such datasets (for example, huge collections of weather or other scientific data) has led to the development of specialised computer technologies, architectures and programming languages.

Source: ODH

BitTorrent is a protocol for distributing the bandwidth needed to transfer very large files among the computers participating in the transfer. Rather than downloading a file from a single source, BitTorrent allows peers to download from each other.

Source: ODH.

A download containing files from multiple collections that can be retrieved at once. Data is available in bulk if the entire dataset can be downloaded easily and efficiently to a user’s own system. Conversely, it is non-bulk if one is limited to retrieving small parts of the dataset at a time, for example a few elements per request, so that thousands or millions of requests would be needed to obtain the entire dataset. The provision of bulk access is a requirement of open data.

Source: ODH.


A catalog is a collection of datasets or web services.

Source: US OD


“No Rights Reserved” licence by Creative Commons. CC0 enables scientists, educators, artists and other creators and owners of copyright- or database-protected content to waive those interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law. Source: https://creativecommons.org/share-your-work/public-domain/cc0/

Building tools and communities, usually online, that address particular civic or social problems. Examples could be tools that help users meet like-minded people locally based on particular interests, report broken infrastructure to their local council, or collaborate to clear litter from their neighbourhood. Local-level open data is particularly useful for civic hacking projects.

Source: ODH

An open-source software platform for creating data portals, built and maintained by Open Knowledge. CKAN is used as the official data-publishing platform of around 20 national governments and powers many more local, community, scientific and other data portals. Notable features are configurable metadata, user-friendly web interface for publishers and data users, data preview, organisation-based authorisation levels, and APIs giving access to all features as well as data access. Source: ODH

A data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organisations) working to make their data open and available.

Source: http://ckan.org/

Data stored ‘in the cloud’ is handled by a hosting company, relieving the data owner of the need to manage its physical storage. Instead of being stored on a single machine, it may be stored across or moved between multiple machines in different locations, but the data owner and users do not need to know the details. The hosting company is responsible for keeping it available and accessible via the internet.

Source: ODH

Working code samples in all the top programming languages are commonplace in the most successful APIs. Documentation describes in a general way how to use an API, but code samples speak in the specific language of developers.

Source: US OD

Connectivity relates to the ability of communities to connect to the Internet, especially the World Wide Web.

Source: ODH

A web service that provides dynamic access to the page content of a website, including the title, body, and other elements of individual pages. Such an API often, but not always, functions atop a content management system.

Source: US OD

The process of automatically reading data in one file format and emitting the same data in a different format, thus making the data accessible to a wider range of applications.

Source: ODH

A legal right over intellectual property (e.g. a book) belonging to the creator of the work. While individual data (facts) cannot be copyright, a database will in general be covered by copyright protecting the selection and arrangement of data within it. Within the European Union separate ‘database rights’ protect a database where there was a substantial effort in ‘obtaining’ the data. A copyright holder may use a licence to grant other people rights in the protected material, perhaps subject to specified restrictions.

Source: ODH

The European Commission’s primary public repository and portal to disseminate information on all EU-funded research projects and their results.

Source: http://cordis.europa.eu/home_en.html

The principle of setting a price for a resource, e.g. data, that aims to recover the cost of collecting the data, as distinct from marginal cost. In the EU, cost recovery for public sector information is allowed only exceptionally, as determined by the PSI/OD Directive.

A non-profit organisation founded in 2001 that promotes re-usable content by publishing a number of standard licences, some of them open (though others include a non-commercial clause), that can be used to release content for re-use, together with clear explanations of their meaning. Website: https://creativecommons.org/

Source: ODH

A set of open standard licences determined by the Creative Commons organisation. See https://creativecommons.org/licenses/

A model in which individuals and organisations obtain goods and services (ideas, money) from a large, relatively open and often rapidly evolving group of internet users. It divides work between participants to achieve a cumulative result. The term was coined in 2006 (from ‘crowd’ plus ‘sourcing’), although the model existed before the digital age. Source: https://en.wikipedia.org/wiki/Crowdsourcing

Dividing the work of collecting a substantial amount of data into small tasks that can be undertaken by volunteers.

Source: ODH

 ‘Comma-separated values’ is a standard format for spreadsheet data. Data is represented in a plain text file, with each data row on a new line and commas separating the values on each row. As a very simple open format it is easy to consume and is widely used for publishing open data. Source: ODH

It is often used to exchange data between different applications. The CSV file format is usable by KSpread, OpenOffice Calc and Microsoft Excel spreadsheet applications. Many other applications support CSV to import or export data. Source: http://edoceo.com/utilitas/csv-file-format, cited by EU OD

It is a computer data file used for implementing the tried and true organizational tool, the Comma Separated List. The CSV file is used for the digital storage of data structured in a table of lists form. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas, and each field belongs to one table column. CSV files are often used for moving tabular data between two different computer programs (like moving between a database program and a spreadsheet program).

Source: US OD
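
The format described above (one line per row, commas between fields) can be shown with a minimal sketch using Python's standard csv module; the table contents here are invented for illustration:

```python
import csv
import io

# A small table: a header row plus two data rows (illustrative values).
rows = [["city", "population"], ["Ghent", "262219"], ["Utrecht", "361924"]]

# Write the table out as CSV text: one line per row, commas between fields.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Read it back: each line becomes a list of field values again.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])  # the first data row
```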

Catalog Service for the Web (CSW) is an API used by geospatial systems to provide metadata in open standards, including in the FGDC-endorsed ISO 19115 schema. The CSW-provided metadata can be mapped into the Project Open Data metadata schema.

Source: US OD


Data may be thought of as unprocessed atomic statements of fact. It very often refers to systematic collections of numerical information in tables of numbers such as spreadsheets or databases. When data is structured and presented so as to be useful and relevant for a particular purpose, it becomes information available for human apprehension.

Source: ODH

A value or set of values representing a specific concept or concepts. Data become “information” when analysed and possibly combined with other data in order to extract meaning and provide context. The meaning of data can vary depending on its context. The term covers all data, including, but not limited to, 1) geospatial data, 2) unstructured data, 3) structured data, etc.

Source: US OD

A system that allows outsiders to be granted access to databases without overloading either system.

Source: ODH

A collection of data elements or datasets that make sense to group together. Each community of interest identifies the Data Assets specific to supporting the needs of their respective mission or business functions. Notably, a Data Asset is a deliberately abstract concept. A given Data Asset may represent an entire database consisting of multiple distinct entity classes, or may represent a single entity class.

Source: US OD

A hub for data discovery which provides a common location that lists and links to an organization’s datasets. Such a hub is often located at www.example.com/data.

Source: US OD

Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc.

Source: ODH

A crawler is a programme that visits web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the web all have such a programme, which is also known as a ‘spider’ or a ‘bot’. Source: http://searchsoa.techtarget.com/definition/crawler; cited by EU OD

When extracting data from the web, the term ‘crawling’ is often also referred to as ‘data scraping’ or ‘harvesting’. There is a difference between these terms: crawling refers to dealing with large datasets where one can develop his or her own crawlers (or bots), which crawl to the deepest parts of the web pages. Data scraping, on the other hand, refers to retrieving information from any source (not necessarily the Web). Source: https://www.promptcloud.com/blog/data-scraping-vs-data-crawling/; cited by EU OD

A large amount of data transferred from one system or location to another.

Source: EU OD

Almost any interesting use of data will combine data from different sources. To do this it is necessary to ensure that the different datasets are compatible: they must use the same names for the same objects, the same units or co-ordinates, etc. If the data quality is good this process of data integration may be straightforward but if not it is likely to be arduous. A key aim of linked data is to make data integration fully or nearly fully automatic. Non-open data is a barrier to data integration, as obtaining the data and establishing the necessary permission to use it is time-consuming and must be done afresh for each dataset.

Source: ODH
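
A minimal sketch of the integration step described above: two datasets can only be joined once they use the same name for the same object, here a shared country code. The figures are rounded and purely illustrative:

```python
# Two datasets from different sources, keyed by the same country codes.
population = {"FR": 67_750_000, "DE": 83_200_000}
area_km2 = {"FR": 643_801, "DE": 357_022}

# Join the datasets on their shared key; only keys present in both survive.
merged = {
    code: {"population": population[code], "area_km2": area_km2[code]}
    for code in population.keys() & area_km2.keys()
}
print(merged["FR"])
```

If the sources had used different identifiers (say, country names in one and codes in the other), this join would fail until the names were reconciled, which is exactly the arduous part of data integration that linked data aims to automate.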

The ability to work with data is an increasingly important part of a journalist’s armoury. Skills needed to research and tell a good data-based story include finding relevant data, data cleaning, exploring or mining the data to understand what story it is telling, and creating good visualisations.

Source: ODH

If personal data has been imperfectly anonymised, it may be possible by piecing it together (perhaps with data available from other sources) to reconstruct the identity of some data subjects together with personal data about them. The personal data, which should not have been published (see data protection), may be said to have ‘leaked’ from the ‘anonymised’ data. Other kinds of confidential data can also be subject to leakage by, for example, poor data security measures. See de-identification.

Source: ODH

The policies, procedures, and technical choices used to handle data through its entire lifecycle from data collection to storage, preservation and use. A data management policy should take account of the needs of data quality, availability, data protection, data preservation, etc.

Source: ODH

The practice of examining large pre-existing databases in order to generate new information. ‘For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyse local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.’

Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm , cited by EU OD

Breaking a data block into smaller chunks by following a set of rules so that it can be more easily interpreted, managed or transmitted by a computer.

Source: http://www.businessdictionary.com/definition/parsing.html cited by EU OD
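
Parsing can be illustrated with a small sketch: a raw block is broken into labelled chunks by following a fixed set of rules. The `key=value;` record format here is invented for the example:

```python
def parse_record(block):
    """Split a 'key=value' record, with pairs separated by semicolons, into a dict."""
    pairs = (chunk.split("=", 1) for chunk in block.split(";") if chunk.strip())
    return {key.strip(): value.strip() for key, value in pairs}

raw = "id=42; name=Weather Station 7; status=active"
print(parse_record(raw))
```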

A web platform for publishing data. The aim of a data portal is to provide a data catalogue, making data not only available but discoverable for data users, while offering a convenient publishing workflow for publishing organisations. Typical features are web interfaces for publishing and for searching and browsing the catalogue, machine interfaces (APIs) to enable automatic publishing from other systems, and data preview and visualisation.

Source: ODH

Mandated by the General Data Protection Regulation, DPbDD is a core obligation of data controllers and data processors to ensure effective implementation of data protection principles and data subjects’ rights and freedoms. The controllers are required to implement appropriate technical and organisational measures and necessary safeguards and are obliged to demonstrate the effectiveness of implemented measures.

Sources: EDPB Guidelines 4/2019 on Article 25 Data Protection by Design and by Default Adopted on 13 November 2019

An act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data. Long-term preservation of datasets is a challenge owing to uncertainty about the future of file formats, computer architectures, storage media and network connectivity. Projects that put particular stress on data preservation take a variety of approaches to solving these problems.

Sources: ODH and http://ifdo.org/preservation/

A measure of the usefulness of data. An ideal dataset is accurate, complete, timely in publication, consistent in its naming of items and in its handling of e.g. missing data, directly machine-readable (see data cleaning), and conformant to standards of nomenclature in the field; it is published with sufficient metadata that users can easily understand, for example, who it is published by and the meaning of the variables in the dataset. Source: ODH

From a user perspective, data quality can be summarised as fitness for use/purpose. This applies not only to the technical accessibility of a dataset, but also to its legal accessibility (can the dataset be used from a legal perspective), its financial accessibility (can the user afford to pay the price), and intellectual access (does the user understand, and is the user intellectually capable of using, the dataset). ISO defined quality as “the totality of characteristics of an entity that bears its ability to satisfy stated and implied needs” (ISO 8402, 1994; see also Strong D.M., Lee Y.W., Wang R.Y. Data quality in context. Commun. ACM. 1997;40:103–110. doi: 10.1145/253769.253804).

An identified or identifiable natural person; an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

Source: General Data Protection Regulation, Article 4.1.1.

A person converting data into a usable form so that it can be easily processed with automated or semi-automated tools. Data wrangling may include further data cleaning.

Source: ODH

Any organised collection of data may be considered a database. In this sense the word is synonymous with dataset. It is a collection of data stored according to a schema and manipulated according to the rules set out in a data modelling facility. Sources: ODH and USOD

Another meaning relates to a software system for processing and managing data, including features to extend or update, transform and query the data. Examples are the open source PostgreSQL, and the proprietary Microsoft Access. Source: ODH

A right to prevent or restrict others from extracting and reusing content from a database. In the EU it is regulated by a special piece of legislation – Directive 96/9/EC on the legal protection of databases.

A collection of related sets of data that is composed of separate elements but that can be manipulated as a unit and accessed or downloaded in one or more formats. Source: EU OD

A collection of data, published or curated by a single source, and available for access or download in one or more formats. Source: https://data.europa.eu/euodp/en/developerscorner

Any organised collection of data. ‘Dataset’ is a flexible term and may refer to an entire database, a spreadsheet or other data file, or a related collection of data resources. Source: ODH

The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an extensible mark-up language (XML) file, a geospatial data file, or an image file, etc.

Source: US OD

Data Catalogue Vocabulary, an RDF vocabulary for the interoperability of data catalogues.

Source: http://www.w3.org/TR/vocab-dcat ; cited by EU OD

DCAT Application Profile, a common vocabulary for describing datasets hosted in data portals in Europe, based on DCAT.

Source: https://joinup.ec.europa.eu/asset/dcat_application_profile/description, cited by EU OD

Dublin Core Metadata Initiative, an open organisation supporting innovation in metadata design and best practices across the metadata ecology.

Source: http://dublincore.org/

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, using multiple layers to progressively extract higher-level features from the raw input. Because the computer gathers knowledge from experience, there is no need for a human operator to formally specify the knowledge the computer needs. Source: Deng, L.; Yu, D. (2014). “Deep Learning: Methods and Applications”. Foundations and Trends in Signal Processing. 7 (3–4): 1–199

A form of anonymisation where personal records are kept intact but specific identifying information, such as names, are replaced with anonymous identifiers. Compared to aggregation, de-identification carries a greater risk of data leakage: for example, if prison records include a prisoner’s criminal record and medical history, the prisoner could in many cases be identified even without their name by their criminal record, giving unauthorised access to their medical history. In other cases, this risk is absent, or the value of the un-aggregated data is so great that it is worth making de-identified data available subject to carefully designed safeguards.

Source: ODH
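
The technique can be sketched in Python: each record is kept intact while the name is replaced with an opaque identifier. A salted hash is one possible choice of identifier, used here only for illustration; as the entry above notes, real anonymisation requires a careful risk assessment, since the remaining fields may still re-identify individuals:

```python
import hashlib

SALT = "example-salt"  # illustrative; a real salt would be kept secret

def pseudonym(name):
    """Replace a name with a short, opaque, deterministic identifier."""
    return hashlib.sha256((SALT + name).encode()).hexdigest()[:8]

# One illustrative record; only the identifying field is altered.
records = [{"name": "Alice Smith", "ward": "North", "diagnosis": "flu"}]
deidentified = [{**r, "name": pseudonym(r["name"])} for r in records]

print(deidentified[0]["ward"])       # non-identifying fields kept intact
print(len(deidentified[0]["name"]))  # name replaced by an 8-character identifier
```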

The Digital Europa Thesaurus (DET) is a multilingual thesaurus covering the main subject matters of the European Commission's public communications. It has been designed to describe and index web content from across the European Commission so that this content can be retrieved, aggregated, and managed. The thesaurus is maintained by DG COMM.

Source: https://op.europa.eu/en/web/eu-vocabularies/det

An ordinary table or spreadsheet can easily represent two data dimensions: each data point has a row and a column. Plenty of real-world data has more dimensions, however: for example, a dataset of Earth surface temperature varying with position and time (two co-ordinates are required to specify the position on earth, e.g. latitude and longitude, and one to specify the time).

Source: ODH

It is not enough for open data to be published if potential users cannot find it, or even do not know that it exists. Rather than simply publishing data haphazardly on websites, governments and other large data publishers can help make their datasets discoverable by indexing them in catalogues or data portals.

Source: ODH

Digital Object Identifier, an identifier for a digital object (such as a document or dataset) that is assigned by a central registry and is therefore guaranteed to be a globally unique identifier: no two digital objects in the world will have the same DOI.

Source: ODH

A Document Type Definition is a set of rules that defines a grammar for a class of SGML-based documents (XML, HTML, …), and is thus important for interoperability and data exchange. The DTD specification, an older format lacking important functionality, has in general been superseded by the newer XML Schema format.

Documents in digital form, subject to frequent or real-time updates, in particular because of their volatility or rapid obsolescence. Data generated by sensors are typically considered to be dynamic data.

Source: OD Directive.

An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An end point indicates a specific location for accessing a service using a specific protocol and data format.

Source: US OD


European Legislation Identifier, which makes it possible to uniquely identify and access national and European legislation online, and guarantees easier access, exchange and reuse of legislation for public authorities, professional users, academics and citizens. ELI paves the way for a semantic web of legal gazettes and official journals.

Source: https://en.wikipedia.org/wiki/European_Legislation_Identifier cited by EU OD

Errors are an inevitable part of API integration, and providing not only a robust set of clear and meaningful API error response codes, but a clear listing of these codes for developers to follow and learn from is essential.

API errors are directly related to frustration during developer integration; the more friendly and meaningful they are, the greater the chance a developer will move forward after encountering one. Put a lot of consideration into your error responses and the documentation that educates developers.

Source: US OD
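
A minimal sketch of the idea: a clear, documented mapping from status codes to meaningful messages. The codes follow standard HTTP semantics, but the wording is invented and not taken from any particular API:

```python
# Illustrative mapping from HTTP status codes to developer-friendly messages.
ERROR_CODES = {
    400: "Bad request: check the query parameters against the documentation.",
    401: "Unauthorized: missing or invalid API key.",
    404: "Not found: the requested dataset or resource ID does not exist.",
    429: "Too many requests: rate limit exceeded, retry after the indicated delay.",
}

def explain(status):
    """Turn a status code into a message a developer can act on."""
    return ERROR_CODES.get(status, f"Unexpected error (HTTP {status}): contact support.")

print(explain(429))
```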

The Directive on open data and the re-use of public sector information 2019/1024/EU is a recent piece of open data legislation which replaces the former PSI Directive of 2003 and 2013. The Directive included the term ‘open data’ in its title, and broadened the scope of application to public enterprises. It introduced the concept of high-value datasets, insisting on the use of APIs and dynamic data. The OD Directive also extends to research data, albeit under a different regime. The OD Directive came into force in July 2019 and has to be transposed by the member states by July 2021.

Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE) entered into force in May 2007. It aims at supporting European Community environmental policies, and policies or activities which may have an impact on the environment, by, among other things, adopting EU Regulations on Metadata, Data Specifications, Network Services, Data and Service Sharing and Monitoring and Reporting. Together with the PSI Directive, it can be regarded as one of the first European laws to account for data user needs.

The Directive on the re-use of public sector information, 2003/98/EC, amended by 2013/37/EU, is the first EU legislation on the reuse of public sector information. It established a requirement to ensure that the public sector publishes its open data in machine-readable format and that it handles the requests of users, with the possibility of redress before an independent authority and/or a court. It restricted the possibilities for discrimination among users and for the granting of exclusive rights. It also required all member states to establish ‘practical arrangements’ for the publication of open data, which in practice led to the development of the member states’ data portals and the European data portal that connects them. The Directive is replaced by the EU Open Data Directive, which came into force in July 2019 and had to be transposed by the member states by July 2021.

An open data portal which displays data from the EU member states national open data portals, as well as the portals of the EEA countries and some candidate and neighbouring countries. It should not be confused with the EU open data portal which contains datasets from the EU institutions, agencies and services.

See: https://www.europeandataportal.eu/

EuroVoc is a multilingual, multidisciplinary thesaurus covering the activities of the EU, the European Parliament in particular. It contains terms in 23 EU languages (Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish), plus three languages of countries which are candidates for EU accession: Macedonian, Albanian and Serbian.

Source: https://op.europa.eu/en/web/eu-vocabularies/th-dataset/-/resource/dataset/eurovoc

The legal situation in which one user has an exclusive right to re-use a dataset, excluding others from such re-use. The OD Directive in general prohibits exclusive arrangements, making the re-use of documents open to all potential actors in the market, even if one or more market actors already exploit added-value products based on those documents. Exclusive rights can be granted only in exceptional situations, such as where necessary for the provision of a service in the general interest.

In artificial intelligence, an expert system is a computer system that emulates the decision-making ability of a human expert. It represents one of the currently applicable results of machine learning and a possible step towards the development of artificial intelligence.

Source: Jackson, Peter (1998), Introduction To Expert Systems (3 ed.), Addison Wesley, p. 2


Data meeting the concepts of findability, accessibility, interoperability and reusability, defined as a set of more detailed measurable principles. Designed with scientific data reusability in mind, the FAIR principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. While FAIR data can also be open data, this is not necessarily the case. Source: The FAIR Guiding Principles for scientific data management and stewardship

The description of how a file is represented on a computer disk. The format usually corresponds to the last part of the file name (‘extension’), e.g. a file in CSV format might be called schools-list.csv. The file format refers to the internal format of the file, not how it is displayed to users. E.g. CSV and XLS files are structured very differently on disk, but may look similar or identical when opened in a spreadsheet program such as Excel.

Source: ODH
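To illustrate the point that a file format describes the on-disk representation rather than the on-screen one, the following sketch parses a tiny CSV file held in memory; the table contents are invented for the example.

```python
import csv
import io

# The table serialised as CSV: on disk it is plain structured text,
# even though a spreadsheet program would render it as a grid.
raw = "name,pupils\nNorth School,240\nSouth School,310\n"

# Any CSV reader can recover the rows and columns from the text form.
rows = list(csv.DictReader(io.StringIO(raw)))
```

The same table saved as XLS would look identical in Excel but would be stored as a completely different byte layout on disk.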

A rating system for open data proposed by Tim Berners-Lee, founder of the World Wide Web. To score the maximum five stars, data must (1) be available on the Web under an open licence, (2) be in the form of structured data, (3) be in a non-proprietary file format, (4) use URIs as its identifiers (see also RDF), (5) include links to other data sources (see linked data). To score 3 stars, it must satisfy all of (1)-(3), etc. See also: https://5stardata.info/en/

Source: ODH
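The cumulative nature of the scheme (a dataset earns stars only for an unbroken run of criteria starting from the first) can be sketched as a toy scorer; the criterion names below are invented labels for the five conditions, not an official encoding.

```python
# The five criteria in order; a dataset's rating is the longest unbroken
# run of satisfied criteria starting from (1).
CRITERIA = ["open_licence", "structured", "non_proprietary",
            "uses_uris", "links_out"]

def star_rating(dataset):
    """Count stars until the first unmet criterion."""
    stars = 0
    for criterion in CRITERIA:
        if dataset.get(criterion):
            stars += 1
        else:
            break
    return stars

# e.g. an openly licensed CSV file on the web: criteria (1)-(3) hold,
# but it uses no URIs, so it scores three stars.
csv_on_web = {"open_licence": True, "structured": True,
              "non_proprietary": True}
```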

‘Friend of a friend’ is a machine-readable descriptive vocabulary of persons, their activities and their relations to other people and objects. FOAF allows groups of people to describe social networks without the need for a centralised database.

Source: https://en.wikipedia.org/wiki/FOAF_%28ontology%29 , cited by EU OD

As defined by the OD Directive, ‘formal open standard’ means a standard which has been laid down in written form, detailing specifications for the requirements on how to ensure software interoperability.

Source: OD Directive

Key EU legislation on personal data protection and privacy. The General Data Protection Regulation (EU) 2016/679 has been in force since May 2018 in the EU and EEA countries. It aims to give individuals control over their personal data, requires all data controllers to establish procedures and safeguards, and is very strict towards breaches committed by private companies.


Geographical Information System, any computer system designed to read, display, analyse and manipulate geodata.

Source: ODH

The GNU General Public License is a free, “copyleft” license for software and other kinds of works, in the sense that it protects the freedoms – the rights – of users of the licensed software, rather than focusing on the moral and material rights of authors within the recognised and almost universally accepted framework of copyright and neighbouring rights.

Source: Katulić, T., “Opportunities and pitfalls of GPL software licencing agreement from the perspective of the software developer”, Central European Conference on Information and Intelligent Systems, CECIIS 2013

The Global Positioning System, a satellite-based system which provides exact location information to any equipment with a suitable receiver (including modern smartphones). GPS is invaluable to many location-based apps, providing users with e.g. route-finding information or weather forecasts based on their current location. GPS is also a striking example of successful open data, as it is maintained by the US government and provided free of charge to anyone with a GPS receiver.

Source: ODH

A dialect of JSON with specialised features for describing geodata, and hence a popular interchange format for geodata.

Source: ODH
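A minimal sketch of what such a GeoJSON document looks like; the place name and coordinates are illustrative, and because GeoJSON is a dialect of JSON, any JSON library can read it.

```python
import json

# A minimal GeoJSON Feature: a point geometry with a coordinate pair
# (longitude first, then latitude, per the GeoJSON convention).
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [4.35, 50.85]},
    "properties": {"name": "Brussels"},
}

# Round-trip through plain JSON text, as any GIS or web tool would.
encoded = json.dumps(feature)
decoded = json.loads(encoded)
```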

Geographic data link place, time, and attributes. Some attributes are physical or environmental in nature, while others are social or economic.

Source: Longley, P.A., M.F. Goodchild, D.J. Maguire, and D.W. Rhind, 2001, Geographic information Systems and Science, Chicester, England (John Wiley and Sons Ltd), pp 64-65.

GitHub is a social coding platform allowing developers to publicly or privately build code repositories and interact with other developers around these repositories, providing the ability to download or fork a repository, as well as contribute back, resulting in a collaborative environment for software development.

Source: US OD


An event, usually over one or two days, where developers, subject experts and others come together to create apps, visualisations and prototypes that aim to address problems in a particular domain, usually making heavy use of data. Hackathons focusing on a particular collection of data are a possible form of community engagement by data publishers. The hackathon is a popular format in the open source community. Source: ODH.

An event in which computer programmers and others in the field of software development, like graphic designers, interface designers, project managers and computational philologists, collaborate intensively on software projects. Occasionally, there is a hardware component as well. Hackathons typically last between a day and a week. Some hackathons are intended simply for educational or social purposes, although in many cases the goal is to create usable software. Hackathons tend to have a specific focus, which can include the programming language used, the operating system, an application, an API, the subject, or the demographic group of the programmers. In other cases, there is no restriction on the type of software being created.

Source: USOD

Documents whose re-use is associated with important benefits for society, the environment and the economy, in particular because of their suitability for the creation of value-added services, applications and new, high-quality and decent jobs, and because of the number of potential beneficiaries of the value-added services and applications based on those datasets. Under the OD Directive such datasets should in general be free of charge, machine readable, provided via APIs and, where relevant, as bulk downloads.

Source: OD Directive.


A company that stores a customer’s data on its own (the host’s) computers and makes it available over the internet. A hosted service is one that runs and stores data on the service-provider’s computers and is accessed over the network. See also SaaS.

Source: ODH.

Data in a format that can be conveniently read by a human. Some human-readable formats, such as PDF, are not machine-readable as they are not structured data, i.e. the representation of the data on disk does not represent the actual relationships present in the data.

Source: ODH.


The minimum set of metadata elements relative to the legal decision-making process, the so-called IMMC core metadata, to be used in the data exchange between the institutions involved and the Publications Office, defined within the context of the Interinstitutional Metadata Maintenance Committee (IMMC).

Source: http://publications.europa.eu/mdr/core-metadata/  cited by EU OD

The name of an object or concept in a database. An identifier may be the object’s actual name (e.g. ‘London’ or ‘W1 1AA’, a London postcode), or a word describing the concept (‘population’), or an arbitrary identifier such as ‘XY123’ that makes sense only in the context of the particular dataset. Careful choice of identifiers using relevant standards can facilitate data integration.

Source: ODH.

A structured collection of data presented in a form that people can understand and process. Information is converted into knowledge when it is contextualised with the rest of a person’s knowledge and world model. Source: ODH.

Any communication or representation of knowledge such as facts, data, or opinions in any medium or form, including textual, numerical, graphic, cartographic, narrative, or audio-visual forms.

Source: US OD

IARs are registers specifically set up to capture and organise metadata about the vast quantities of information held by government departments and agencies. A comprehensive IAR includes databases, old sets of files, recent electronic files, collections of statistics, research and so forth.

IARs can be developed in different ways. Government departments can develop their own IARs and these can be linked to national IARs. IARs can include information which is held by public bodies but which has not yet been – and maybe will not be – proactively published. Hence, they allow members of the public to identify information which exists and which can be requested. It is important that IARs are as complete as possible, because otherwise potential re-users could be discouraged from finding or requesting datasets.

It is essential that the metadata in the IARs is comprehensive so that search engines can function effectively. In the spirit of open government data, public bodies should make their IARs available to the general public as raw data under an open licence so that civic hackers can make use of the data, for example by building search engines and user interfaces. The EU PSI Directive and the EU OD Directive require Member States to establish tools that help re-users to find documents available for re-use, such as asset lists.

Source: ODH.

The stages through which information passes, typically characterized as creation or collection, processing, dissemination, use, storage, and disposition.

Source: US OD

The information society is one in which information is the defining feature, unlike the industrial society where steam power and fossil fuels were distinguishing elements. Information society is a concept that responds to the expansion and ubiquity of information. The term has been in use since the 1970s, but has gained in popularity and is now widely used in social and political policy.

Source: Information Society https://www.oxfordreference.com/view/10.1093/oi/authority.20110803100003718

A discrete set of information resources organized for the collection, processing, maintenance, transmission, and dissemination of information, in accordance with defined procedures, whether automated or manual.

Source: US OD

The phases through which an information system passes, typically characterized as initiation, development, operation, and termination.

Source: US OD

The Directive establishing an Infrastructure for Spatial Information in the European Community (INSPIRE Directive), enacted in 2007 (2007/2/EC), established an infrastructure for spatial information in Europe to support Community environmental policies and policies or activities which may have an impact on the environment. It is based on the infrastructure operated by the Member States of the EU and it addresses 34 spatial data themes needed for environmental applications, with key components specified through technical implementing rules. To ensure that the spatial data infrastructures of the Member States are compatible and usable in a Community and transboundary context, the Directive requires that common Implementing Rules (IR) are adopted in a number of specific areas (Metadata, Data Specifications, Network Services, Data and Service Sharing and Monitoring and Reporting). These IRs are adopted as Commission Decisions or Regulations.

Source: https://inspire.ec.europa.eu/inspire-directive/2

The term intellectual property rights in the context of open data refers primarily to copyright and related rights, including sui generis forms of protection. The OD Directive does not apply to documents covered by industrial property rights, such as patents and registered designs and trademarks. The Directive neither affects the existence or ownership of intellectual property rights of public sector bodies, nor does it limit the exercise of these rights in any way beyond the boundaries set by this Directive. The obligations imposed in accordance with the OD Directive apply only insofar as they are compatible with international agreements on the protection of intellectual property rights, in particular the Berne Convention for the Protection of Literary and Artistic Works (Berne Convention), the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement) and the WIPO Copyright Treaty (WCT). Public sector bodies are expected to exercise their copyright in a way that facilitates re-use.

Source: OD Directive

A worldwide network of interconnected computer networks that use the Internet protocol suite (TCP/IP) to facilitate data transmission and exchange among several billion devices, which are logically linked together by a globally unique address space.

Source: ODH

The ability of systems to exchange information and use the exchanged information.

Source: EU OD

The European Interoperability framework highlights: organisational interoperability, legal interoperability, semantic interoperability and technical interoperability.

The Interoperable Solutions for European Public Administrations programme is a European Commission-funded programme aimed at facilitating transactions among cross-border and/or cross-sector public administrations in Europe and at supporting the development of digital solutions that enable public administrations, businesses and citizens in Europe to benefit from interoperable cross-border and cross-sector public services. ISA2 was established in 2015 to run from 2016 to 2020, as the follow-up to the ISA programme of 2010-2015.

Source: https://ec.europa.eu/isa2/isa2_en


JavaScript Object Notation, a simple but powerful format for data. It can describe complex data structures, is highly machine-readable as well as reasonably human-readable, and is independent of platform and programming language, and is therefore a popular format for data interchange between programs and systems. Source: ODH.

JavaScript Object Notation is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the most common data format used for asynchronous browser/server communication (AJAJ). Source: EUOD.

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

Source: USOD
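The interchange role described above can be shown with a short round-trip sketch; the record contents are invented for illustration.

```python
import json

# A nested structure round-tripped through JSON: the text form is both
# human-readable and trivially parsed by any mainstream language.
record = {"dataset": "air-quality", "year": 2019,
          "tags": ["environment", "open"]}

text = json.dumps(record)    # serialise to a JSON string
restored = json.loads(text)  # parse it back into native objects
```

The parsed result is structurally identical to the original, which is what makes JSON dependable as an interchange format between systems.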

JSONP or “JSON with padding” is a JSON extension wherein the name of a callback function is specified as an input argument of the underlying JSON call itself. JSONP makes use of runtime script tag injection.

Source: USOD
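The "padding" is simply the callback name wrapped around the JSON body. A minimal sketch of what a JSONP-aware server does, with an invented callback name:

```python
import json

def wrap_jsonp(callback_name, payload):
    """Pad a JSON body with the caller-supplied callback function name,
    producing a script the browser executes via script-tag injection."""
    return f"{callback_name}({json.dumps(payload)})"

# The client requested e.g. ?callback=handleData; the server responds
# with a JavaScript call rather than a bare JSON document.
response = wrap_jsonp("handleData", {"id": 1})
```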

JSON-based format to serialize Linked Data. The syntax is designed to easily integrate into deployed systems that already use JSON, and provides a smooth upgrade path from JSON to JSON-LD. It is primarily intended to be a way to use Linked Data in Web-based programming environments, to build interoperable Web services, and to store Linked Data in JSON-based storage engines.

Source: JSON-LD 1.1 specification, W3C
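The "smooth upgrade path" works because a JSON-LD document is still valid JSON. In the minimal sketch below, the `@context` maps the ordinary key `name` onto a globally unique URI (the schema.org vocabulary is used here purely as an example), so the file can be read as plain JSON or as Linked Data.

```python
import json

# A minimal JSON-LD document; any ordinary JSON parser can read it.
doc = json.loads("""
{
  "@context": {"name": "http://schema.org/name"},
  "@id": "http://example.org/dataset/1",
  "name": "Air quality measurements"
}
""")
```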


Keyhole Markup Language, an XML-based open format for geodata. KML was devised for Keyhole Earth Viewer, later acquired by Google and renamed Google Earth, but has been an international standard of the Open Geospatial Consortium since 2008.

Source: ODH.


A legal instrument by which a copyright holder may grant rights over the protected work. Data and content is open if it is subject to an explicitly-applied licence that conforms to the Open Definition. A range of standard open licences are available, such as the CCZero licence or the Creative Commons CC-BY licence, which requires only attribution.

Source: ODH.

If Project X publishes content, and wants to include content from Project Y, it is necessary that Y’s licence permits at least the same range of re-uses as X’s licence. For example, content published under a non-commercial licence cannot be included in Wikipedia, since Wikipedia’s open licence includes rights for commercial re-use which cannot be granted for the non-commercial data, an example of a failure of licences to mix well.

Source: ODH.

A form of data representation where every identifier is an http://… URI, using standard lists (see vocabulary) of identifiers where possible, and where datasets include links to reference datasets of the same objects. A key aim is to make data integration automatic, even for large datasets. Linked data is usually represented using RDF. See also five stars of open data; triple store. Source: ODH

Central to the concept of the semantic web, linked data assigns a web address, similar to a website address, to each piece of data, enabling connection of data through the web. It builds upon standard web technologies such as HTTP and URI, but rather than using them to serve web pages for human readers it extends them to share information in a way that can be read automatically by computers. The connections between linked data can grow without limitations. Linked data is particularly useful for analysing different types of data from various datasets, for example government data.

Source: EUOD

Linked data principles provide a common API for data on the web that is more convenient than many separately and differently designed APIs published by individual data suppliers. Tim Berners-Lee, inventor of the web and initiator of the linked data project, proposed the following principles upon which linked data is based:

  • use URIs to name things;
  • use HTTP URIs so that things can be referred to and looked up (dereferenced) by people and user agents;
  • when someone looks up a URI, provide useful information using open web standards such as RDF or SPARQL;
  • include links to other related things using their URIs when publishing on the web.

Source: W3C — http://www.w3.org/TR/ld-glossary/#linked-data-principles
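The first and last principles can be sketched as data held as subject-predicate-object triples whose identifiers are HTTP URIs; the `example.org` namespace and the facts below are invented for illustration, and a real system would use an RDF triple store rather than a Python list.

```python
# Illustrative namespace; every name in the data is an HTTP URI,
# so each term could in principle be looked up and linked elsewhere.
EX = "http://example.org/"

triples = [
    (EX + "London", EX + "population", 8900000),
    (EX + "London", EX + "capitalOf", EX + "UnitedKingdom"),
]

def objects_of(subject, predicate):
    """Naive lookup: all objects matching a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```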


Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

Source: Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3), 210-229.

Data in a format that can be automatically read and processed (or identified, recognised and extracted) by a computer without human intervention, including individual statements of fact and their internal structure, while ensuring no semantic meaning is lost. Examples of such file formats are CSV, JSON, XML, etc. Machine-readable data must be structured data.

Non-digital material (printed or hand-written documents) is by its nature not machine-readable, and the same holds for digital material in certain file formats, such as a PDF document containing tables of data; both are, however, human-readable (compare → human-readable). The equivalent tables in a format such as a spreadsheet would be machine-readable.

As another example, scans (photographs) of text are not machine-readable (but are human-readable), whereas the equivalent text in a format such as a simple ASCII text file is machine-readable and processable.

The appropriate machine-readable format may vary by type of data: for example, machine-readable formats for geographic data may differ from those for tabular data. Sources: ODH, US OD and OD Directive

There are two types of machine-readable data: human-readable data that is marked up so that it can also be understood by computers, for example microformats or RDFa; data formats intended principally for computers, for example RDF, XML and JSON.

Source: EU OD

The combination of multiple datasets from multiple sources to create a new service or visualisation or new information. 

Source: EU OD

If something is visible to many people then, collectively, they are more likely to find errors in it. Publishing open data can therefore be a way to improve its accuracy and data quality, especially where a good interface for reporting errors is provided. See → crowdsourcing.

Source: ODH

The additional cost incurred by supplying a single copy of a resource, e.g. data. For data to be open according to the Open Definition, it must be charged for at no more than marginal cost. Where data is available for download over the internet the marginal cost will usually be zero. There may be a small marginal cost in exceptional cases, e.g. if for reasons of size the data needs to be put on a disk and posted. The OD Directive considers costs incurred for the reproduction, provision and dissemination of documents, as well as for the anonymisation of personal data and measures taken to protect commercially confidential information, as marginal costs.

Sources: ODH, OD Directive

Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information resource. Metadata is often called data about data.

It consists of information about a dataset such as its title and description, method of collection, author or publisher, area and time period covered, licence, date and frequency of release, etc. It is essential to publish data with adequate metadata to aid both discoverability and usability of the data.

Source: ODH, EU OD and NISO — http://www.niso.org/publications/press/UnderstandingMetadata.pdf
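As a hedged illustration of what such a record might contain, here is a toy metadata sketch; the field names and the completeness check are invented for the example and do not follow any particular catalogue standard.

```python
# An illustrative metadata record of the kind a data catalogue carries.
metadata = {
    "title": "School locations 2020",
    "description": "Locations and capacities of all public schools.",
    "publisher": "Ministry of Education",
    "licence": "CC-BY 4.0",
    "issued": "2020-03-01",
    "update_frequency": "annual",
}

def is_discoverable(md):
    """Toy completeness check: discovery and lawful reuse need at least
    a title, a publisher and a licence."""
    return all(md.get(k) for k in ("title", "publisher", "licence"))
```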

The Metadata Registry is an important interoperability and standardisation tool. It registers and maintains definition data (metadata elements, named authority lists, schemas, etc.) used by the different EU institutions.

Source: EU OD and http://publications.europa.eu/mdr/index.html


Non-governmental organisation. NGOs are voluntary, non-profit organisations focussing on charitable work, community-building, campaigning, research, etc, making up a vital part of civil society.

Source: ODH

Used to denote that the holder of copyright or database rights waives all their interest in their data worldwide. It may not be applicable in all legal systems.

Source: Creative Commons.

A restriction, as part of a licence, that content cannot be freely re-used for ‘commercial’ purposes. Content or data subject to a non-commercial restriction is not open, according to the Open Definition. Such a restriction reduces economic value and causes problems with licence mixing, as well as often ruling out more than is intended (for example, it is often unclear whether educational uses are ‘commercial’). The intent of a non-commercial clause may be better captured by a share-alike requirement. See also → Licences.

Source: ODH

A principle of the OD Directive and PSI Directive requiring that any applicable conditions for the re-use of documents are non-discriminatory for comparable categories of re-use, including for cross-border re-use, and limiting the possibilities of exclusive rights. 


Open Data Readiness Assessment, a framework created by the World Bank for assessing the opportunities, obstacles and next steps to be taken in a country (especially a developing country) considering publishing government data as open data.

Source: ODH

Open Database Licence, an attempt to create an open licence for data which covers the ‘database rights’ as well as copyright itself. It does this by imposing contractual obligations on the data re-user. Unfortunately, contract law is fundamentally different from copyright law, since copyright is inherent in a work and binds all downstream users of the work, whereas a contract only binds the parties to the contract and has no force on a later re-user of re-published data. The ODbL remains useful nevertheless, and other attempts are being made to create open licences specifically for data.

Source: ODH


The Open Government Partnership, a partnership of national governments launched in 2011 with the aim of promoting open government in the member countries and collaborating on multi-lateral agreements and best practices. It covers also open data.

A formal model that allows knowledge to be represented for a specific domain. An ontology describes the types of things that exist (classes), the relationships between them (properties) and the logical ways those classes and properties can be used together (axioms).

Source: W3C — http://www.w3.org/TR/ld-glossary/#ontology as cited by EU OD

The principle that access to the published papers and other results of research, especially publicly-funded research, should be freely available to all. This contrasts with the traditional model where research is published in journals which charge subscription fees to readers. Besides benefits similar to the benefits of open data, proponents suggest that it is immoral to withhold potentially life-saving and valuable research from some readers who may be able to use or build on it. Open-access journals now exist and the interest of research funders is giving them some traction, especially in the sciences. Source: ODH

Data is open if it can be freely accessed, used, modified and shared by anyone for any purpose - subject only, at most, to requirements to provide attribution and/or share-alike. Specifically, open data is defined by the Open Definition and requires that the data be (1) Legally open: that is, available under an open (data) license that permits anyone freely to access, reuse and redistribute, and (2)  Technically open: that is, that the data be available for no more than the cost of reproduction and in machine-readable and bulk form.

Source: ODH

Open Data Commons, an Open Knowledge Foundation initiative, is the home of a set of legal tools and licenses to help you publish, provide and use open data.

An open licence specifically directed at the use of databases. You are free:

  • To share: to copy, distribute and use the database;
  • To create: to produce works from the database;
  • To adapt: to modify, transform and build upon the database;

as long as you:

  • Attribute: you must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL; for any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database;
  • Share-Alike: if you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL;
  • Keep open: if you redistribute the database, or an adapted version of it, then you may use technological measures that restrict the work (such as DRM) as long as you also redistribute a version without such measures.

Source: https://opendatacommons.org/licenses/odbl/

You are free:

  • To share: to copy, distribute and use the database;
  • To create: to produce works from the database;
  • To adapt: to modify, transform and build upon the database;

as long as you attribute.

Source: https://opendatacommons.org/licenses/by/summary/

You are free:

  • To share: to copy, distribute and use the database;
  • To create: to produce works from the database;
  • To adapt: to modify, transform and build upon the database.

The ‘as long as you’ section of the PDDL summary is intentionally left blank: the PDDL imposes no restrictions on your use of the PDDL-licensed database. If you use the PDDL for any changes you make, the work stays free and open for all. CC0 is compliant with the Science Commons protocol for open data, as is the Open Data Commons PDDL. They are both interoperable, so any data or content made available under either system can be mixed and remixed. Unlike CC0, however, the Open Data Commons system includes a set of Community Norms that can be linked with the license.

Source: https://opendatacommons.org/licenses/pddl/summary/

The Open Definition, first released by Open Knowledge in 2005, sets out under what conditions data and content is open. Both legal and technical compatibility is vital, and the Open Definition ensures that openly-licensed data can be combined successfully, avoiding a proliferation of licences and terms of use for open data leading to complexity and incompatibility. Today it is the main international standard for open data and open data licences, with an advisory council of senior open data practitioners and can be found at opendefinition.org. The Open Definition has influenced and steered other communities of practice in the open movement, including open access to publicly-funded research, open hardware, and more, as well as governments’ approach towards licences.

Source: ODH

A Web-based system that contains a data catalogue with descriptions of datasets and provides services enabling discovery and re-use of the datasets. See also → Data portal.

Source:  https://data.europa.eu/euodp/en/developerscorner

Open data ecosystem (ODE) is a concept that provides a dynamic and holistic understanding of open data provision and use. The main characteristics of open data ecosystems are that they are user-driven, inclusive, circular and skills-based (Van Loenen, Zuiderwijk et al., unpublished/submitted).

In comparison to the concept of open data infrastructure (ODI), ODE is a broader concept built upon the ODI, which represents ‘the basic physical and organisational structure and facilities needed for the functioning of an open data ecosystem’ (van Loenen, 2018: 6).

According to Zuiderwijk et al. (2014), ODE encompasses four key elements: (1) releasing and publishing open data on the internet, (2) searching, finding, evaluating and viewing data and their related licenses, (3) cleansing, analyzing, enriching, combining, linking and visualizing data, and (4) interpreting and discussing data and providing feedback to the data provider and other stakeholders. Three additional elements of ODE include (5) user pathways showing directions for how open data can be used, (6) a quality management system and (7) different types of metadata to be able to connect the elements.


van Loenen, Bastiaan, Vancauwenberghe, Glenn, Crompvoets, Joep and Dalla Corte, Lorenzo (2018). Open Data Exposed. In: van Loenen, Bastiaan, Vancauwenberghe, Glenn and Crompvoets, Joep (ed.) Open Data Exposed. Berlin: Springer, pp. 1-10.

Zuiderwijk, Anneke, Janssen, Marijn and Davis, Chris (2014). Innovation with open data: Essential elements of open data ecosystems, Information Polity 19 (1-2): 17–33.

Open data infrastructure (ODI) is a concept that explains the domain in which open data is created and used. ODI consists of a combination of social (non-technical) and technical elements which are interrelated and interact, ensuring the supply and use of open data. Technical elements refer to open data tools, technologies, standards etc., while non-technical elements include open data regulation and policies, governance and funding (van Loenen et al., 2018: 6). ODI can be defined as ‘a shared, (quasi-)public, evolving system, consisting of a collection of interconnected social elements (e.g. user operations) and technical elements (e.g. open data analysis tools and technologies, open data services) which jointly allow for the use of open government data’ (Zuiderwijk, 2015: 1).

As any other infrastructure, ODI has several necessary characteristics. First, it consists of different elements that are interconnected and interact; this refers to data, platforms and people (i.e. actors). Second, the key actors involved are open data users and providers. Third, within an ODI resources (data, information and knowledge) are exchanged. Fourth, the interoperability of the different elements of the infrastructure has to be ensured (e.g. via metadata standards). Fifth, due to the development of new technologies, ODIs evolve through time. Finally, ODIs are openly shared among a variety of actors and systems.


van Loenen, Bastiaan, Vancauwenberghe, Glenn, Crompvoets, Joep and Dalla Corte, Lorenzo (2018). Open Data Exposed. In: van Loenen, Bastiaan, Vancauwenberghe, Glenn and Crompvoets, Joep (ed.) Open Data Exposed. Berlin: Springer, pp. 1-10.

Zuiderwijk Anneke (2015) Open data infrastructures: The design of an infrastructure to enhance the coordination of open data use. Doctoral dissertation. https://repository.tudelft.nl/islandora/object/uuid%3A9b9e60bc-1edd-449a-84c6-7485d9bde012

TODO 2020 Online Training Program - Open data concepts and components - Video lecture Open data infrastructure http://science.geof.unizg.hr/todo-platform/mod/book/view.php?id=3&chapterid=3

The open data lifecycle is one of various models that have been developed for the description of open data. A lifecycle model describes the handling of the data itself, starting from its creation, through the provision of open data, to its use by various parties. Although the stages or phases of the open data lifecycle vary between the models developed by different authors, the most common phases include: discovery and acquisition, data organisation, publication, integration, analysis, re-use and storage/preservation. The existing lifecycle models can mostly be distinguished by technological perspective (data curation, big data and linked data) and stakeholder perspective (publishers and users). The open data lifecycle focuses on the operational processes of open data publication (e.g. extracting, cleaning, publishing and maintaining data), while leaving strategic processes (e.g. policy and decision making, implementation) outside its scope.

Source: Charalabidis, Yannis, Zuiderwijk, Anneke, Alexopoulos, Charalampos, Janssen, Marijn, Lampoltshammer, Thomas and Ferro, Enrico (2018). The Multiple Life Cycles of Open Data Creation and Use. In: The World of Open Data: Concepts, Methods, Tools and Experiences. Springer, Public Administration and Information Technology book series, pp. 11-31.

Open development seeks to bring the philosophy of the open movement to international development. It promotes open government, transparency of aid flows, engagement of beneficiaries in the design and implementation of development projects, and availability and use of open development data.

Source: ODH

The OpenDocument Format (ODF) is an open XML-based document file format for office applications to be used for documents containing text, spreadsheets, charts, and graphical elements. The file format makes transformations to other formats simple by leveraging and reusing existing standards wherever possible. As an open standard under the stewardship of OASIS, OpenDocument also creates the possibility for new types of applications and solutions to be developed other than traditional office productivity applications.

Source: Oasis Open Technical Committee

A file format that is platform-independent and made available to the public without any restriction that impedes the re-use of documents. Source: OD Directive

A file format with no restrictions, monetary or otherwise, placed upon its use, and which can be fully processed with at least one free/libre/open-source software tool. Patents are a common source of restrictions that make a format proprietary. Often, but not necessarily, the structure of an open format is set out in agreed standards, overseen and published by a non-commercial expert body. A file in an open format enjoys the guarantee that it can be correctly read by a range of different software programs or used to pass information between them.

Source: ODH

Open government, in line with the open movement generally, seeks to make the workings of governments transparent, accountable, and responsive to citizens. It includes the ideals of democracy, due process, citizen participation and open government data. A thorough-going approach to open government would also seek to enable citizen participation in, for example, the drafting and revising of legislation and budget-setting.

Source: ODH

Data collected, produced or paid for by the public bodies and made freely available for reuse for any purpose.

Source: EU OD

Standardised public licences available online which allow data and content to be freely accessed, used, modified and shared by anyone for any purpose, and which rely on open data formats. The EU Member States are expected to encourage the use of open licences that should eventually become common practice across the Union.

Source: OD Directive

The open movement seeks to work towards solutions of many of the world’s most pressing problems in a spirit of transparency, collaboration, re-use and free access. It encompasses open data, open government, open development, open science and much more. Participatory processes, sharing of knowledge and outputs and open source software are among its key tools. The specific definition of “open” as applied to data, knowledge and content, is set out by the Open Definition.

Source: ODH

The practice of science in accordance with open principles, including open access publishing, publication of and collaboration around research data as open data together with associated source code, and use and development of open source data processing tools.

Source: ODH

Software for which the source code is available under an open licence. Not only can the software be used for free, but users with the necessary technical skills can inspect the source code, modify it and run their own versions of the code, helping to fix bugs, develop new features, etc. Some large open source software projects have thousands of volunteer contributors. The Open Definition was heavily based on the earlier Open Source Definition, which sets out the conditions under which software can be considered open source.

Source: ODH

Computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under an open-source license that permits users to study, change, improve and at times also to distribute the software. Open source software is very often developed in a public, collaborative manner. Open source software is the most prominent example of open source development and often compared to (technically defined) user-generated content or (legally defined) open content movements.

Source: US OD

A spatial data infrastructure in which citizens, research institutions, private organizations and other businesses and non-governmental actors are recognized as key stakeholders of the infrastructure and in which public spatial data, but also private and citizen-generated spatial data is freely available to use for citizens, businesses and any other groups, without any restrictions.

Source: Mulder, A. E., Wiersma, G. and van Loenen, B. (2020). Status of National Open Spatial Data Infrastructures: a Comparison Across Continents. International Journal of Spatial Data Infrastructures Research 15: 56-87.

Generally understood as technical standards that are free from licensing restrictions. Can also be interpreted to mean standards that are developed in a vendor-neutral manner.

Sources: http://schoolofdata.org/handbook/appendix/glossary cited by EU OD; ODH

A standard developed or adopted by voluntary consensus standards bodies, both domestic and international. These standards include provisions requiring that owners of relevant intellectual property have agreed to make that intellectual property available on a non-discriminatory, royalty-free or reasonable royalty basis to all interested parties.

Source: US OD


Portable Document Format, a file format used to present and exchange documents independently of the layout software, hardware or operating system used to create them. Originally a proprietary format of Adobe Systems, PDF has been an open format since 2008 and is now an open standard maintained by the International Organisation for Standardisation. Data in PDF files is not machine-readable; see → structured data.

Source: ODH and https://acrobat.adobe.com/be/en/products/about-adobe-pdf.html

Use of statistics to predict outcomes.

Source: Geisser, S.: "Predictive Inference: An Introduction", Chapman & Hall, 1993.

The right of individuals to a private life includes a right not to have personal information about themselves made public. A right to privacy is recognised by the Universal Declaration of Human Rights and the European Convention on Human Rights; in the EU and its Member States it is recognised by the Charter of Fundamental Rights and regulated by dedicated legislation. See → GDPR.

Source: ODH

 (i) Proprietary software is owned by a company which restricts the ways in which it can be used. Users normally need to pay to use the software, cannot read or modify the source code, and cannot copy the software or re-sell it as part of their own product. Common examples include Microsoft Excel and Adobe Acrobat. Non-proprietary software is usually open source.

(ii) A proprietary file format is one that a company owns and controls. Data in this format may need proprietary software to be read reliably. Unlike an open format, the description of the format may be confidential or unpublished, and can be changed by the company at any time. Proprietary software usually reads and saves data in its own proprietary format. For example, older versions of Microsoft Excel use the proprietary XLS format; the newer XLSX format is based on the standardised Office Open XML.

Source: ODH


Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. Pseudonymization is suggested as one of the technical measures that can help with compliance with the European Union's General Data Protection Regulation and its obligations for data controllers to ensure secure processing of personal data.

Source: Rec. 26, Article 4.1.5. of the General Data Protection Regulation L 119/1, 4.5.2016
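A minimal sketch of one common pseudonymisation technique, a keyed hash over the identifying field, using only the Python standard library. The field names and the key are invented for illustration, and this on its own does not guarantee GDPR compliance; the secret key must be held separately by the data controller.

```python
import hmac
import hashlib

def pseudonymise(value: str, secret: bytes) -> str:
    # A keyed hash (HMAC): without the secret, the pseudonym cannot be
    # reversed by simply hashing a list of candidate names.
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

key = b"kept-separately-by-the-data-controller"  # illustrative only
record = {"name": "Alice Example", "age": 34}

# Replace the identifying field with an artificial identifier.
safe_record = {**record, "name": pseudonymise(record["name"], key)}
```

Because the same input always maps to the same pseudonym, records about one person can still be linked across a dataset without exposing the identity.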

Content to which copyright does not apply, for example because it has expired or it never existed, is free for any kind of use by anyone and is said to be in the public domain. CC0, one of the licences of Creative Commons, is a ‘public domain dedication’ which attempts so far as possible to renounce all rights in the work and place it in the public domain. Source: ODH

The Public Domain means data is free for use by anyone for any purpose without restriction under copyright law. Public domain is the purest form of open/free, since no one owns or controls the material in any way. For official documents in most European countries it is the most convenient choice.

Source: Creative Commons.

‘Public sector body’ means the State, regional or local authorities, bodies governed by public law or associations formed by one or more such authorities or one or more such bodies governed by public law.

Source: OD Directive

Data that is collected, produced, reproduced, processed, disseminated, or controlled by public sector bodies in many areas of their activity while accomplishing their institutional tasks. The work of government involves collecting huge amounts of data, much of which is not confidential (economic data, demographic data, spending data, crime data, transport data, etc.). The value of much of this data can be greatly enhanced by releasing it as open data, freeing it for re-use by business, research, civil society, data journalists, etc.

Sources: ODH, US OD

Any undertaking active in the areas set out in point (b) of Article 1(1) of the OD Directive over which the public sector bodies may exercise directly or indirectly a dominant influence by virtue of their ownership of it, their financial participation therein, or the rules which govern it. A dominant influence on the part of the public sector bodies shall be presumed in any of the following cases in which those bodies, directly or indirectly: (a) hold the majority of the undertaking’s subscribed capital; (b) control the majority of the votes attaching to shares issued by the undertaking; or (c) can appoint more than half of the undertaking’s administrative, management or supervisory body.

Source: OD Directive

Anyone who distributes and makes available data or other content. Data publishers include government departments and agencies, research establishments, NGOs, media organisations, commercial companies, individuals, etc.

Source: ODH


A type of question accepted by a database about the data it holds. A complex query may ask the database to select records according to some criteria, aggregate certain quantities across those records, etc. Many databases accept queries in the specialised language SQL or dialects of it. A web API allows an app to send queries to a database over the web. Compared with downloading and processing the data, this reduces both the computation load on the app and the bandwidth needed.

Source: ODH
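The idea of selecting records by a criterion and aggregating across them can be sketched with Python's built-in sqlite3 module; the table and figures are invented for illustration.

```python
import sqlite3

# In-memory table standing in for a published dataset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spending (dept TEXT, amount REAL)")
con.executemany("INSERT INTO spending VALUES (?, ?)",
                [("Health", 120.0), ("Health", 80.0), ("Transport", 50.0)])

# A query: select records matching a criterion, aggregate a quantity.
total = con.execute(
    "SELECT SUM(amount) FROM spending WHERE dept = ?", ("Health",)
).fetchone()[0]
print(total)  # 200.0
```

A web API plays the same role over HTTP: the app sends the query, the server runs it against the database and returns only the result.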


The original data, in machine-readable form, underlying any application, visualisation, published research or interpretation, etc. The expression refers to data in its original state that has not been processed, aggregated or manipulated in any other way. Raw data is also described as ‘primary’ data.

Sources: ODH, EU OD

A family of international standards for data interchange on the web. RDF is based on the idea of identifying things using web identifiers or HTTP URIs and describing resources in terms of simple properties and property values.

Source: W3C — http://www.w3.org/TR/ld-glossary/#rdf

Resource Description Framework, the native way of describing linked data. RDF is not exactly a data format; rather, there are a few equivalent formats in which RDF can be expressed, including an XML-based format. RDF data takes the form of ‘triples’ (each atomic piece of data has three parts, namely a subject, predicate and object), and can be stored in a specialised database called a triple store. 

Source: ODH

A family of specifications for a metadata model. The RDF family of specifications is maintained by the World Wide Web Consortium (W3C). The RDF metadata model is based upon the idea of making statements about resources in the form of a subject-predicate-object expression…and is a major component in what is proposed by the W3C’s Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and utilize metadata about the vast resources of the Web, in turn enabling users to deal with those resources with greater efficiency and certainty. RDF’s simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.

Source: US OD

A W3C recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within web documents.

Source: https://en.wikipedia.org/wiki/RDFa

The use of public sector information, usually by third parties, for purposes other than the initial purpose within the public task for which the data were originally collected, produced or disseminated. It is rare that data gathered for a particular purpose has no other possible uses. Once gathered, for whatever reason, data can be re-used again and again, in ways that were never envisaged when it was collected, provided only that the data-holder makes it available under an open licence to enable such re-use. Under the EU OD Directive, data can be re-used for both commercial and non-commercial purposes. An exchange of documents between public sector bodies is not considered to be a re-use of public sector information.

Sources: ODH, OD Directive

Data (such as the current location of trains on a network) which is being constantly updated, where a query needs to be against the latest version of the data.

Source: ODH

A percentage of the overall charge, in addition to that needed to recover the eligible costs, not exceeding 5 percentage points above the fixed interest rate of the ECB. It is used by the EU Open Data Directive to calculate charges above marginal costs.

Source: OD Directive

A possibility for the user to require the disclosure of a dataset by public bodies in a machine-readable format suitable for re-use. It is a formal procedure that usually requires the issuing of a formal decision. The decision of the public body is scrutinised by an independent authority or a court.

Documents in a digital form, other than scientific publications, which are collected or produced in the course of scientific research activities and are used as evidence in the research process, or are commonly accepted in the research community as necessary to validate research findings and results. Source: OD Directive

Traditionally the data was kept by researchers and only final research outputs, such as papers analysing the data, would be published. Open science holds that the data should be published, both to increase verifiability of the work and to enable it to be used in other research. The full spirit of open science collaboration demands data publication early in the project, but research culture will need to change appreciably before this becomes widespread. Research data management (RDM) is an emerging discipline that seeks best practices in handling this. Research data has been covered by the EU Directive regulating open data and the reuse of public sector information since 2019.

Source: ODH

CKAN uses this term to denote one of the individual data objects (a file such as a spreadsheet, or an API) in a dataset. Source: ODH

The physical representation of a dataset. Each resource can be a file of any kind, a link to a file elsewhere on the web or a link to an API. For example, if the data is being supplied in multiple formats or split into different areas or time periods, each file is a different ‘resource’ that should be described individually.

Source: http://www.w3.org/

A family of web feed formats (often dubbed Really Simple Syndication) used to publish frequently updated works — such as blog entries, news headlines, audio, and video — in a standardized format. An RSS document (which is called a “feed,” “web feed,” or “channel”) includes full or summarized text, plus metadata such as publishing dates and authorship.

Source: US OD


Software as a Service, i.e. a software program that runs, not on the user’s machine, but on the machines of a hosting company, which the user accesses over the web. The host takes care of associated data storage, and normally charges for the use of the service or monetises its client base in other ways.

Source: ODH

An XML schema defines the structure of an XML document: which data elements and attributes can appear in a document; how the data elements relate to one another; whether an element is empty or can include text; which types of data are allowed for specific data elements and attributes; and what the default and fixed values are for elements and attributes. It is a method for specifying constraints on XML documents. More generally, a schema is a description of the data represented within a database; the format of the description varies, but includes a table layout for a relational database or an entity-relationship diagram.

Source: US OD

Schema.org is a vocabulary for structured data on the Internet: on web pages, in email messages, etc. It covers entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Schema.org is developed by a W3C Community Group in an open community process.

Source: Schema.org

Extracting data from a non-machine-readable source, such as a website or a PDF document, and creating structured data from the result. Screen-scraping a dataset requires dedicated programming and is expensive in programmer time, so is generally done only after all other attempts to get the data in structured form have failed. Legal questions may arise about whether the scraping breaches the source website’s copyright or terms of service. Source: ODH
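A minimal sketch of screen scraping with Python's standard html.parser module; the HTML snippet and the figures in it are invented for illustration, and real pages are far messier.

```python
from html.parser import HTMLParser

class CellScraper(HTMLParser):
    """Collect the text of <td> cells from an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())

# A page fragment standing in for a non-machine-readable source.
html = "<table><tr><td>Paris</td><td>2148000</td></tr></table>"
scraper = CellScraper()
scraper.feed(html)
print(scraper.cells)  # ['Paris', '2148000']
```

The structured result (a list of cell values) can then be written out as CSV or JSON; this fragility is why scraping is a last resort compared with data published in structured form.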

The process of extracting data in machine-readable formats from sources that are not machine-readable, for example webpages or PDF documents. Often prefixed with the source (web scraping, PDF scraping).

Sources: http://en.wikipedia.org/wiki/Data_scraping, http://schoolofdata.org/handbook/appendix/glossary

Statistical data and metadata exchange, an international initiative that aims at standardising and modernising the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries.

Source: https://en.wikipedia.org/wiki/SDMX, cited by EU OD

An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g. via Sparql).

Source: W3C — http://www.w3.org/TR/ld-glossary/#semantic-web cited by EU OD

Search engine optimisation, a series of techniques which improve the visibility of a website in search engine results pages (SERP) via the natural or unpaid (‘organic’ or ‘algorithmic’) search results. The goal of such optimisation is to rank as high as possible for a certain search query.

Source: https://en.wikipedia.org/wiki/Search_engine_optimization, cited by EU OD

A computer on the internet, usually managed by a hosting company, that responds to requests from a user, e.g. for web pages or downloaded files, or to access features in a SaaS package being run on the server.

Source: ODH

Standard Generalized Markup Language (ISO 8879:1986) is an ISO standard for document representation. SGML can be used for publishing in its broadest definition, ranging from single-medium conventional publishing to multimedia database publishing. SGML can also be used in office document processing when the benefits of human readability and interchange with publishing systems are required.

It is the basis for many popular markup-based languages used on the Web (HTML) or as data storage/exchange formats (XML, KML, …).

Source: ISO 8879:1986 specification

A popular file format for geodata, maintained and published by Esri, a manufacturer of GIS software. A Shapefile actually consists of several related files. Though the format is technically proprietary, Esri publishes a full specification and Shapefiles can be read by a wide range of software, so the format functions somewhat like an open standard in practice.

Source: ODH

A license that requires users of a work to provide the content under the same or similar conditions as the original.

Source: ODH

SOAP (Simple Object Access Protocol) is a message-based protocol based on XML for accessing services on the Web. It employs XML syntax to send text commands across the Internet using HTTP. SOAP is similar in purpose to the DCOM and CORBA distributed object systems, but is more lightweight and less programming-intensive. Because of its simple exchange mechanism, SOAP can also be used to implement a messaging system.

Source: US OD

An open source enterprise search platform. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration and rich document (e.g. Word, PDF) handling.

Source: https://en.wikipedia.org/wiki/Apache_Solr

The files of computer code written by programmers that are used to produce a piece of software. The source code is usually converted or ‘compiled’ into a form that the user’s computer can execute. The user therefore never sees the original source code, unless it is published as open source.

Source: ODH

A query language similar to SQL, used for queries to a linked-data triple store.

Source: ODH

Sparql protocol and RDF query language (Sparql) defines a query language for RDF data, analogous to the Structured Query Language (SQL) for relational databases.

Source: W3C — http://www.w3.org/TR/ld-glossary/#sparql

A service that accepts Sparql queries and returns answers to them as Sparql result sets. It is best practice for dataset providers to give the URL of their Sparql endpoint to allow access to their data programmatically or through a web interface.

Source: W3C — http://www.w3.org/TR/ld-glossary/#sparql-endpoint

A table of data and calculations that can be processed interactively with a specialised spreadsheet program such as Microsoft Excel or OpenOffice Calc.

Source: ODH

Structured Query Language, a standard language used for interrogating many types of database. See → query.

Source: ODH

A published specification for, e.g., the structure of a particular file format, recommended nomenclature to use in a particular domain, a common set of metadata fields, etc. Conforming to relevant standards greatly increases the value of published data by improving machine readability and easing data integration.

Source: ODH

A set of predefined re-use conditions in a digital format, preferably compatible with standardised public licences available online.

Source: OD Directive

All data has some structure, but ‘structured data’ refers to data where the structural relation between elements is explicit in the way the data is stored on a computer disk. XML and JSON are common formats that allow many types of structure to be represented. The internal representation of, for example, word-processing documents or PDF documents reflects the positioning of entities on the page, not their logical structure, which is correspondingly difficult or impossible to extract automatically.

Source: ODH
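The point that structure is explicit in the stored form can be seen with Python's standard json module; the record is invented for illustration.

```python
import json

# Nesting and field names are part of the stored representation, so a
# program can navigate the data without guessing from page layout.
text = '{"city": "Zagreb", "population": 767131, "districts": ["Centar", "Novi Zagreb"]}'
record = json.loads(text)
print(record["districts"][0])  # Centar
```

Extracting the same facts from a PDF rendering of this record would require scraping; here they are addressable directly by name and position.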

Data that resides in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data. Although data in XML files is not fixed in locations like traditional database records, it is nevertheless structured, because the data is tagged and can be accurately identified.

Source: PC Magazine encyclopaedia — http://www.pcmag.com/encyclopedia/term/52162/ , cited by EU OD


Tab-separated values (TSV) are a very common form of text file format for sharing tabular data. The format is extremely simple and highly machine-readable.

Source: ODH
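The simplicity of the format is visible in how little code is needed to read it; Python's standard csv module handles TSV by changing the delimiter. The place names and figures are invented.

```python
import csv
import io

# One record per line, fields separated by tab characters.
tsv_text = "city\tpopulation\nZagreb\t767131\nSplit\t161312\n"
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
header, data = rows[0], rows[1:]
print(header)   # ['city', 'population']
print(data[0])  # ['Zagreb', '767131']
```

Unlike CSV, TSV rarely needs quoting rules, since tab characters seldom occur inside field values.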

A principle of the public administration and democratic political system. Governments and other public sector bodies are said to be transparent when their workings and decision-making processes are well-understood, properly documented and open to scrutiny. Transparency is one of the aspects of open government. An increase in transparency is one of the benefits of open data.

Source: ODH

Public transport routes, timetables and real time data are valuable but difficult candidates for open data. Even when they are published, data from different transit authorities and companies may not be available in compatible formats, making it difficult for third parties to provide integrated transport information. Many transport authorities distribute public transport data using the General Transit Feed Specification (GTFS) which is maintained by Google. Work on standardisation and more open data is ongoing in the sector.

Source: ODH

The ‘triples’ of RDF data can be stored in a specialised database, called a triple store, against which queries can be made in the query language SPARQL.

Source: ODH

A triplestore is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject–predicate–object, like ‘Bob is 35’ or ‘Bob knows Fred’. Much like a relational database, information is stored in a triplestore and retrieved via a query language. Unlike a relational database, a triplestore is optimised for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using RDF and other formats.

Source: http://en.wikipedia.org/wiki/Triplestore
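A toy in-memory triple store can illustrate pattern-based retrieval; the triples extend the 'Bob is 35' and 'Bob knows Fred' examples, and a real triplestore would add indexing and a full Sparql engine.

```python
def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard,
    much like a variable in a Sparql basic graph pattern."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Each triple is subject-predicate-object.
store = [
    ("Bob", "age", 35),
    ("Bob", "knows", "Fred"),
    ("Fred", "age", 41),
]

everything_about_bob = match(store, s="Bob")  # both triples about Bob
all_ages = match(store, p="age")              # Bob's and Fred's ages
```

Because every fact has the same three-part shape, one small matching function covers all queries; a relational database would need a different table layout per kind of fact.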

A simple text format for a database table. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab stop character. It is a form of the more general delimiter-separated values format.

Source: US OD


A meeting, similar to a conference, but with no agenda fixed in advance. Using various established techniques, participants jointly agree on the day what sessions will run. Some more traditional conference sessions with invited speakers may also be included. A popular format among the tech community, an unconference can be combined with or run alongside a hackathon based on open data. It is a possible method of community engagement by data publishers.

Source: ODH

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages. The Unicode Standard provides a unique number for every character, no matter what platform, device, application or language. It has been adopted by all modern software providers and now allows data to be transported through many different platforms, devices and applications without corruption. Support of Unicode forms the foundation for the representation of languages and symbols in all major operating systems, search engines, browsers, laptops, and smart phones—plus the Internet and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.).

Source: Unicode Consortium
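The "unique number for every character" is the character's code point, which Python exposes directly through `ord()` and `chr()`. A brief illustration:

```python
# Every character maps to exactly one Unicode code point,
# independent of platform, device or language.
print(ord("A"))      # code point of the Latin capital A
print(ord("é"))      # Latin small e with acute, U+00E9
print(ord("€"))      # euro sign, U+20AC
print(chr(0x1F600))  # a single emoji character, U+1F600
```

The mapping is reversible, so `chr(ord(c)) == c` holds for any character `c`.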

An identifier for an object which is guaranteed to be different from identifiers of all other objects in a collection. Within a database, every object will have a UID that is unique within the database. A UID assigned by a central registry (such as an ISBN for books, or a DOI for data) will be unique for all objects for which it is assigned. The http://… identifiers of linked data provide a technique for guaranteeing UIDs without a central authority.

Source: ODH
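Besides central registries and linked-data URIs, a common way to mint identifiers without any central authority is a UUID (universally unique identifier), available in Python's standard `uuid` module. A minimal sketch:

```python
# UUID4 values are random 128-bit numbers; the chance of two
# independently generated values colliding is astronomically small,
# so no central registry is needed.
import uuid

record_id = uuid.uuid4()
print(record_id)            # e.g. 9f1c6c4e-... (different on every run)
print(len(str(record_id)))  # the canonical text form is 36 characters
```

Identifiers like an ISBN or DOI trade this randomness for a registry that also carries meaning (publisher, resolution service), which random UUIDs do not.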

Data that is more free-form and does not follow any predefined format, hierarchical sequence or relational rules, and so is not easily readable by a machine. Examples include audio, video, images and unstructured text such as the body of an email or word-processor document. Data mining techniques are used to find patterns in, or otherwise interpret, such information. Merrill Lynch has estimated that more than 85 percent of all business information exists as unstructured data, commonly appearing in emails, memos, notes from call centres and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and web pages ("The Problem with Unstructured Data").

Source: US OD

Uniform Resource Identifier / Uniform Resource Locator. A URL is the http://… web address of some page or resource. When a URL is used in linked data as the identifier for some object, it is not strictly a locator for the object (e.g. http://dbpedia.org/page/Paris is the location of a document about Paris, but not of Paris itself), so in this context it is referred to as a URI.

Source: ODH

URI - Uniform Resource Identifier. A string that uniquely identifies virtually anything, from a physical building to abstract concepts such as colours. It may or may not be resolvable on the web.

Source: W3C — http://www.w3.org/TR/ld-glossary/#uniform-resource-identifier

URL - Uniform Resource Locator. A global identifier commonly called a web address. A URL is resolvable on the web. All HTTP URLs are URIs; however, not all URIs are URLs.

Source: W3C — http://www.w3.org/TR/ld-glossary/#uniform-resource-locator
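The structure of a URL (scheme, host, path) can be inspected with Python's standard `urllib.parse`; the DBpedia address is the one used in the URI/URL definition above.

```python
# Split a URL into its components with the standard library.
from urllib.parse import urlparse

parts = urlparse("http://dbpedia.org/page/Paris")
print(parts.scheme)  # http
print(parts.netloc)  # dbpedia.org
print(parts.path)    # /page/Paris
```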

The Unicode Standard supports three encoding forms (UTF-8, UTF-16, UTF-32) that use a common repertoire of characters. These encoding forms allow for encoding as many as a million characters. This is sufficient for all known character encoding requirements, including full coverage of all historic scripts of the world, as well as common notational systems. All three encoding forms encode the same common character repertoire and can be efficiently transformed into one another without loss of data.

UTF-8, popular for HTML and similar protocols, transforms each Unicode character into a variable-length sequence of one to four bytes.

UTF-16 is popular in many environments that need to balance efficient access to characters with economical use of storage.

UTF-32 is useful where memory space is no concern but fixed-width, single-code-unit access to characters is desired.

Source: Unicode Consortium
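The trade-offs between the three forms are easy to see by encoding the same short text in each. This sketch uses the little-endian variants (`-le`) so no byte-order mark is added; the sample word is arbitrary.

```python
# The same text in the three Unicode encoding forms. Characters in the
# ASCII range take 1 byte in UTF-8, 2 in UTF-16 and 4 in UTF-32.
text = "héllo"   # five characters, one of them outside ASCII

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

print(len(utf8), len(utf16), len(utf32))  # 6 10 20

# Decoding any form recovers the identical text: no loss of data.
print(utf8.decode("utf-8") == utf16.decode("utf-16-le") == text)
```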


A visual representation of data is often the most compelling way of communicating the data, bringing out its key features, correlations and outliers. Though many tools exist, creating a visualisation for a dataset is not an automatic process, but requires careful attention to the meaning of the variables, the relations between them and the stories inherent in the data, to design a visual representation that lets the message of the data shine through.

Source: ODH

A standard specifying the identifiers to be used for a particular collection of objects. Using standard vocabularies where they exist is key to enabling data integration. Linked data is rich in vocabularies in different topic areas.

Source: ODH

A collection of terms for a particular purpose. Vocabularies can range from simple, such as the widely used RDF Schema, FOAF and the Dublin Core Metadata Element Set, to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases and treatments. Vocabularies play a very important role in linked data, specifically to help with data integration. The use of this term overlaps with that of ‘ontology’.

Source: W3C — http://www.w3.org/TR/ld-glossary/#vocabulary


The World Wide Web, the vast collection of interlinked and linkable documents and services accessible via ‘web browsers’ over the Internet.

Source: ODH

The first generation of the World Wide Web, characterised by separate static websites rather than continually updated weblogs and social networking tools.

Source: http://en.wiktionary.org/wiki/Web_1.0

A colloquial description of the part of the World Wide Web that implements social networking, blogs, user comments and ratings, as well as related human-centred activities.

Source: W3C — http://www.w3.org/TR/ld-glossary/#web-2.0

A colloquial description of the part of the World Wide Web that implements machine-readable data and the ability to perform distributed queries and analysis on that data. It is considered synonymous with the terms ‘semantic web’ and ‘the web of data’.

Source: W3C — http://www.w3.org/TR/ld-glossary/#web-3.0

An API that is designed to work over the Internet.

Source: ODH

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description, using SOAP messages typically conveyed over HTTP with an XML serialization, in conjunction with other Web-related standards.

Source: US OD

An XML-based language (Web Services Description Language) used to describe the services a business offers and to provide a way for individuals and other businesses to access those services electronically.

Source: US OD


A proprietary spreadsheet format, the native format of the popular Microsoft Excel spreadsheet package. Older versions use .xls files, while more recent ones use the XML-based .xlsx variant.

Source: ODH

Extensible Markup Language, a simple and powerful standard for representing structured data. It is a markup language that defines a set of rules for encoding documents in a format which is both human readable and machine readable. It is a flexible language for creating common information formats and sharing both the format and content of data over the Internet and elsewhere. XML is a formatting language recommended by the World Wide Web Consortium (W3C).

Sources: https://en.wikipedia.org/wiki/XML ; ODH;  US OD
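A small example shows both the human-readable and machine-readable sides of XML. The document below is parsed with Python's standard `xml.etree.ElementTree`; the element and attribute names are invented for illustration.

```python
# Parse a small XML document and extract data from it.
import xml.etree.ElementTree as ET

doc = """<catalogue>
  <dataset id="d1"><title>Air quality</title></dataset>
  <dataset id="d2"><title>Road traffic</title></dataset>
</catalogue>"""

root = ET.fromstring(doc)
# Walk the <dataset> elements and collect their <title> text.
titles = [ds.find("title").text for ds in root.findall("dataset")]
print(titles)  # ['Air quality', 'Road traffic']
```

The same nesting rules that make the document readable to a person (matched opening and closing tags) are what let the parser recover its tree structure.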