Open Education Handbook/Creating Open Data

From Wikibooks, open books for an open world

How to open up data is covered in detail in the Open Data Handbook. There are three key rules recommended when opening up data:

  • Keep it simple. Start out small, simple and fast. There is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine, although the more datasets you can open up the better. Remember that this is about innovation: moving as rapidly as possible is good because it lets you build momentum and learn from experience. Innovation is as much about failure as success, and not every dataset will be useful.
  • Engage early and engage often. Engage with actual and potential users and reusers of the data as early and as often as you can, be they citizens, businesses or developers. This will ensure that the next iteration of your service is as relevant as it can be. Bear in mind that much of the data will not reach ultimate users directly, but rather via ‘infomediaries’: the people who take the data and transform or remix it for presentation. For example, most of us don’t want or need a large database of GPS coordinates; we would much prefer a map. Engage with infomediaries first, since they will reuse and repurpose the material.
  • Address common fears and misunderstandings. This is especially important if you are working with or within large institutions such as government. When opening up data you will encounter plenty of questions and fears. It is important to (a) identify the most important ones and (b) address them at as early a stage as possible.

Opening up data

  • Choose the dataset(s) you plan to make open. Keep in mind that you can (and may need to) return to this step if you encounter problems at a later stage.
  • Apply an open license: first determine what intellectual property rights exist in the data, then apply a suitable ‘open’ license that covers all of these rights.
  • Make the data available - in bulk and in a useful format. You may also wish to consider alternative ways of making it available such as via an API.
  • Make it discoverable - post on the web and perhaps organize a central catalogue to list your open datasets.
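
The last two steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the course records, the catalogue field names, the download URL and the choice of license identifier are all hypothetical, and a real catalogue would normally follow an established metadata vocabulary.

```python
import csv
import io
import json

# Hypothetical example records; a real dataset would come from your own systems.
COURSES = [
    {"course_id": "C001", "title": "Introduction to Statistics", "enrolled": 120},
    {"course_id": "C002", "title": "Open Education Basics", "enrolled": 85},
]


def to_csv(rows):
    """Serialise a list of dicts as CSV text, a simple bulk format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def catalogue_entry(title, download_url, license_id, fmt):
    """Build a minimal catalogue record (field names are illustrative only)."""
    return {
        "title": title,
        "download_url": download_url,
        "license": license_id,
        "format": fmt,
    }


csv_text = to_csv(COURSES)
entry = catalogue_entry(
    title="Course enrolment figures",
    download_url="https://example.org/data/courses.csv",  # placeholder URL
    license_id="CC-BY-4.0",  # assumed license choice, for illustration only
    fmt="text/csv",
)

print(csv_text)
print(json.dumps(entry, indent=2))
```

The CSV text is what would be offered for bulk download, while the JSON record is the kind of entry a central catalogue could list so that the dataset can be found.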

When making data open it's important to think about the possible ethical implications of a release. A useful resource in thinking about this is the OER Research Hub Ethics Manual.

Machine-readable data

Human users are ultimately the consumers of open data, in education as in any other domain. Direct human interaction, however, is not the only way to consume and process these data before they reach end users in a form that meets their needs. More often, software systems (applications and services) consume the data and deliver them, or a by-product of them, to the user.

Much existing content, however, is presented, or simply exists, in a form intended for people to process, such as natural language text, images and audio-visual footage. Although there are technologies that let software systems extract meaningful data from this content, a cleaner and less error-prone approach is for data providers to publish their content in a machine-readable form. In most cases, these data do not replace the natural language or audio-visual forms: on the contrary, they can be used to enhance the content presented in human-readable form in a variety of ways.
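
As a small illustration of the difference, the sketch below holds the same fact once as a sentence and once as a JSON record. The record's field names and the course capacity of 100 are assumptions made up for this example.

```python
import json

# Human-readable form: easy for a person, awkward for software to interpret.
sentence = "Open Education Basics currently has 85 enrolled learners."

# Machine-readable form of the same fact (hypothetical field names).
record_json = '{"title": "Open Education Basics", "enrolled": 85}'

record = json.loads(record_json)

# Assumed capacity, used only to show a calculation on the structured value.
CAPACITY = 100
places_left = CAPACITY - record["enrolled"]

# The structured data can still enhance the human-readable presentation.
summary = f"{record['title']}: {record['enrolled']} enrolled, {places_left} places left"
print(summary)
```

The sentence would need text extraction before software could use the number in it, whereas the JSON record can be read and reused directly.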

Common open data technologies:

Linked Data

A fundamental principle of linked data is that making a resource reachable via a URI does not prevent the same resource from being presented in another format at the same URI. Pasting the URI into a Web browser will not necessarily deliver an RDF document describing the resource, nor is a single RDF format the only one that can be delivered at that address. Thanks to HTTP content negotiation, which is widely used by Web services built in the REST architectural style, an application can negotiate on the fly, for any URI, a format that both the application itself and the data provider support.
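
The negotiation step can be sketched as a small server-side function that reads the client's HTTP Accept header and picks a format. This is a simplified sketch, not a complete implementation: the list of supported formats is hypothetical, and partial wildcards such as text/* and the specificity rules of the HTTP specification are ignored.

```python
# Formats a hypothetical data provider can serve for one resource URI.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        choices.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(choices, key=lambda c: c[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED):
    """Return the best supported media type, or None if nothing matches."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return supported[0]  # client accepts anything: use the default
        if media_type in supported:
            return media_type
    return None


# A browser and an RDF-aware client asking for the same URI.
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(negotiate("text/turtle;q=1.0, application/rdf+xml;q=0.8"))
```

With these inputs the browser-style header selects text/html and the RDF-aware client selects text/turtle, so one URI serves both a web page and an RDF description.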