Category Archives: Open Data

A Data Revolution that Works for All of Us

Many of today’s global challenges are not new. Economic inequality, the unfettered power of corporations and markets, the need to cooperate to address global problems and the unsatisfactory levels of accountability in democratic governance – these were as much problems a century ago as they remain today.

What has changed, however – and most markedly – is the role that new forms of information and information technology could potentially play in responding to these challenges.

What’s going on?

The incredible advances in digital technology mean we have an unprecedented ability to create, share and access information. Furthermore, these technologies are increasingly not just the preserve of the rich, but are available to everyone – including the world’s poorest. As a result, we are living in a (veritable) data revolution – never before has so much data – public and personal – been collected, analysed and shared.

However, the benefits of this revolution are far from being shared equally.

On the one hand, some governments and corporations are already using this data to greatly increase their ability to understand – and shape – the world around them. Others, however, including much of civil society, lack the necessary access and capabilities to truly take advantage of this opportunity. Faced with this information inequality, what can we do?

How can we enable people to hold governments and corporations to account for the decisions they make, the money they spend and the contracts they sign? How can we unleash the potential for this information to be used for good – from accelerating research to tackling climate change? And, finally, how can we make sure that personal data collected by governments and corporations is used to empower rather than exploit us?

So how should we respond?

Fundamentally, we need to make sure that the data revolution works for all of us. We believe that key to achieving this is to put “open” at the heart of the digital age. We need an open data revolution.

We must ensure that essential public-interest data is open, freely available to everyone. Conversely, we must ensure that data about me – whether collected by governments, corporations or others – is controlled by and accessible to me. And finally, we have to empower individuals and communities – especially the most disadvantaged – with the capabilities to turn data into the knowledge and insight that can drive the change they seek.

In this rapidly changing information age – where the rules of the game are still up for grabs – we must be active, seizing the opportunities we have, if we are to ensure that the knowledge society we create is an open knowledge society, benefiting the many not the few, built on principles of collaboration not control, sharing not monopoly, and empowerment not exploitation.

Save the Date – OGP Pre-Conference, London Wednesday 30th October

This Autumn the Open Government Partnership Annual Conference is coming to London and will take place on 31st October and 1st November. As a lead into the main event, OGP is planning a 1-day civil society Pre-Conference event on Wednesday 30th October and we here at the Open Knowledge Foundation will be collaborating with them on it.

An informal group discussion on open government data

The aim is for this to be informal with lots of open space and a collaboratively organized schedule with activities and discussions like:

  • What does civil society want from OGP?
  • What’s next for open government data and open government?
  • Small group conversations about challenges and what can be learnt from other initiatives such as EITI and IATI
  • Workshops and data expeditions
  • Space for individual community groups to meet, share and plan
  • Your suggestion here

If you’re interested you can pre-register now so as to be notified once registration opens and more information becomes available.

Pre-register now »

Further details coming soon!

Git (and Github) for Data

The ability to do “version control” for data is a big deal. There are various options but one of the most attractive is to reuse existing tools for doing this with code, like git and mercurial. This post describes a simple “data pattern” for storing and versioning data using those tools which we’ve been using for some time and found to be very effective.

Introduction

The ability to revision and version data – to store changes and share them with others, especially in a distributed way – would be a huge benefit to the (open) data community. I’ve discussed why at some length before (see also this earlier post) but to summarize:

  • It allows effective distributed collaboration – you can take my dataset, make changes, and share those back with me (and different people can do this at once!)
  • It allows one to track provenance better (i.e. what changes came from where)
  • It allows for sharing updates and synchronizing datasets in a simple, effective way – e.g. an automated way to get last month’s GDP or employment data without pulling the whole file again

There are several ways to address the “revision control for data” problem. The approach here is to get data into a form that lets us take existing powerful distributed version control systems designed for code, like git and mercurial, and apply them to the data. As such, the best github for data may, in fact, be github (of course, you may want to layer data-specific interfaces on top of git(hub) – this is what we do with http://data.okfn.org/).

There are limitations to this approach and I discuss some of these and alternative models below. In particular, it’s best for “small (or even micro) data” – say, under 10Mb or 100k rows. (One alternative model can be found in the very interesting Dat project recently started by Max Ogden — with whom I’ve talked many times on this topic).

However, given the maturity and power of the tooling – and its likely evolution – and the fact that so much data is small, we think this approach is very attractive.

The Pattern

The essence of the pattern is:

  1. Store data as line-oriented text, specifically as CSV[1] (comma-separated values) files. “Line-oriented text” just means that individual units of the data, such as a row of a table (or an individual cell), correspond to one line.[2]

  2. Use best-of-breed (code) version control tools like git or mercurial to store and manage the data.

Line-oriented text is important because it enables the powerful distributed version control tools like git and mercurial to work effectively (this, in turn, is because those tools are built for code which is (usually) line-oriented text). It’s not just version control though: there is a large and mature set of tools for managing and manipulating these types of files (from grep to Excel!).
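To make this concrete, here is a minimal sketch of the pattern, driven from Python. The file name and data are made up, and it assumes git is installed and configured (user.name/user.email) on your machine:

```python
import csv
import subprocess
import tempfile
from pathlib import Path

# Hypothetical example: a tiny table versioned with plain git.
# Assumes the `git` binary is installed and configured (user.name/user.email).
repo = Path(tempfile.mkdtemp())
subprocess.run(["git", "init"], cwd=repo, check=True)

rows = [
    {"country": "France", "population": 67000000},
    {"country": "Kenya", "population": 54000000},
]
with open(repo / "data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["country", "population"])
    writer.writeheader()
    writer.writerows(rows)  # one record per line -- git's diff/merge applies directly

subprocess.run(["git", "add", "data.csv"], cwd=repo, check=True)
subprocess.run(["git", "commit", "-m", "Add country population data"], cwd=repo, check=True)
```

The resulting repository can then be pushed to GitHub (or Gitorious, Bitbucket, etc.) like any code repository.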

In addition to the basic pattern, there are a few optional extras you can add:

  • Store the data in GitHub (or Gitorious or Bitbucket or …) – all the examples below follow this approach
  • Turn the collection of data into a Simple Data Format data package by adding a datapackage.json file which provides a small set of essential information like the license, sources, and schema (this column is a number, this one is a string) – see the sketch after this list
  • Add the scripts you used to process and manage data — that way everything is nicely together in one repository
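For instance, a minimal datapackage.json for the CSV above might look like the following sketch. The field values are purely illustrative; the Data Package spec defines the authoritative set of fields:

```python
import json

# Illustrative only: the exact required fields are defined by the
# Data Package spec; this just shows the kind of metadata involved.
datapackage = {
    "name": "country-populations",
    "license": "ODC-PDDL-1.0",
    "sources": [{"name": "Example statistics office", "web": "http://example.org"}],
    "resources": [
        {
            "path": "data.csv",
            "schema": {
                "fields": [
                    {"name": "country", "type": "string"},
                    {"name": "population", "type": "integer"},
                ]
            },
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(datapackage, f, indent=2)
```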

What’s good about this approach?

The set of tools that exists for managing and manipulating line-oriented files is huge and mature. In particular, powerful distributed version control systems like git and mercurial are already extremely robust ways to do distributed, peer-to-peer collaboration around code, and this pattern takes that model and makes it applicable to data. Here are some concrete examples of why it’s good.

Provenance tracking

Git and mercurial provide a complete history of individual contributions with “simple” provenance via commit messages and diffs.

Example of commit messages

Peer-to-peer collaboration

Forking and pulling data allows independent contributors to work on it simultaneously.

Timeline of pull requests

Data review

By using git or mercurial, tools for code review can be repurposed for data review.

Pull screen

Simple packaging

The repo model provides a simple way to store data, code, and metadata in a single place.

A repo for data

Accessibility

This method of storing and versioning data is very low-tech. The format and tools are both very mature and ubiquitous. For example, every spreadsheet and every relational database can handle CSV. Every unix platform has a suite of tools like grep, sed and cut that can be used on these kinds of files.
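As a tiny illustration, even the Python standard library is enough to filter one of these files (the file and column names are hypothetical):

```python
import csv

# Hypothetical file/column names: keep rows where population exceeds a threshold,
# using nothing beyond the standard library.
with open("data.csv", newline="") as f:
    big = [row for row in csv.DictReader(f) if int(row["population"]) > 60_000_000]

print(big)
```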

Examples

We’ve been using this approach for a long time: in 2005 we first stored CSVs in subversion, then in mercurial, and then, when we switched to git (and GitHub) 3 years ago, we started storing them there. In 2011 we started the datasets organization on GitHub, which contains a whole list of datasets managed according to the pattern above. Here are a couple of specific examples:

Note: most of these examples not only show CSVs being managed in GitHub but are also Simple Data Format data packages – see the datapackage.json they contain.


Appendix

Limitations and Alternatives

Line-oriented text and its tools are, of course, far from perfect solutions to data storage and versioning. They will not work for datasets of every shape and size, and in some respects they are awkward tools for tracking and merging changes to tabular data. For example:

  • Simple actions on data stored as line-oriented text can lead to a very large changeset. For example, swapping the order of two fields (= columns) leads to a change in every single line. Given that diffs, merges, etc. are line-oriented, this is unfortunate.[3]
  • It works best for smallish data (e.g. < 100k rows, < 50mb files, optimally < 5mb files). git and mercurial don’t handle big files that well, and features like diffs get more cumbersome with larger files.[4]
  • It works best for data made up of lots of similar records, ideally tabular data. In order for line-oriented storage and tools to be appropriate, you need the record structure of the data to fit with the CSV line-oriented structure. The pattern is less good if your CSV is not very line-oriented (e.g. you have a lot of fields with line breaks in them), causing problems for diff and merge.
  • CSV lacks a lot of information, e.g. information on the types of fields (everything is a string). There is no way to add metadata to a CSV without compromising its simplicity or making it no longer usable as pure data. You can, however, add this kind of information in a separate file, and this is exactly what the Data Package standard provides with its datapackage.json file.

The most fundamental limitations above all arise from applying line-oriented diffs and merges to structured data whose atomic unit is not a line (it’s a cell, or a transform of some kind like swapping two columns).

The first issue discussed above, where a simple change to a table is treated as a change to every line of the file, is a clear example. In a perfect world, we’d have both a convenient structure and a whole set of robust tools to support it, e.g. tools that recognize swapping two columns of a CSV as a single, simple change or that work at the level of individual cells.
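A quick way to see this is to diff two versions of a small table with Python’s difflib, which behaves line-by-line much as git does: editing a single cell touches one line, while swapping the two columns rewrites every line (the data below is purely illustrative):

```python
import difflib

original = [
    "country,population",
    "France,67000000",
    "Kenya,54000000",
]

# A single-cell edit changes exactly one line...
edited = list(original)
edited[2] = "Kenya,55000000"
print("\n".join(difflib.unified_diff(original, edited, lineterm="")))

# ...but swapping the two columns rewrites every line, even though it is
# conceptually one small structural change.
swapped = [",".join(reversed(line.split(","))) for line in original]
print("\n".join(difflib.unified_diff(original, swapped, lineterm="")))
```

The second diff is exactly why column-level operations merge poorly with line-oriented tools.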

Fundamentally a revision system is built around a diff format and a merge protocol. Get these right and much of the rest follows. The basic 3 options you have are:

  • Serialize to line-oriented text and use the great existing tools like git (what we’ve described above)
  • Identify the atomic structure (e.g. document) and apply diffs at that level (think CouchDB, or standard copy-on-write for an RDBMS at row level)
  • Record transforms (e.g. Refine)

At the Open Knowledge Foundation we built a system along the lines of (2) and have been involved in exploring and researching both (2) and (3) – see changes and syncing for data on dataprotocols.org. These options are definitely worth exploring — and, for example, Max Ogden, with whom I’ve had many great discussions on this topic, is currently working on an exciting project called Dat, a collaborative data tool which will use the “sleep” protocol.

However, our experience so far is that the line-oriented approach beats any currently available options along those other lines (at least for smaller sized files!).

data.okfn.org

Having already been storing data in github like this for several years, we recently launched http://data.okfn.org/ which is explicitly based on this approach:

  • Data is CSV stored in git repos on GitHub at https://github.com/datasets
  • All datasets are data packages with datapackage.json metadata
  • The frontend site is ultra-simple – it just provides a catalog and API and pulls data directly from GitHub (see the sketch after this list)
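Because the data is just files in a git repository, consuming it requires nothing more than fetching the raw file over HTTP. A sketch (the dataset name and URL are illustrative; the exact layout depends on the repository):

```python
import csv
import io
import urllib.request

# Illustrative URL: a raw CSV served straight from a repository under
# https://github.com/datasets -- the actual path depends on the dataset.
url = "https://raw.githubusercontent.com/datasets/example-dataset/master/data.csv"

with urllib.request.urlopen(url) as resp:
    text = resp.read().decode("utf-8")

rows = list(csv.DictReader(io.StringIO(text)))
print(len(rows), "rows")
```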

Why line-oriented

Line-oriented text is the natural form of code and so is supported by a huge number of excellent tools. But line-oriented text is also the simplest and most parsimonious form for storing general record-oriented data—and most data can be turned into records.

At its most basic, structured data requires a delimiter for fields and a delimiter for records. Comma- or tab-separated values (CSV, TSV) files are a very simple and natural implementation of this encoding. They delimit records with the most natural separation character besides the space, the line break. For a field delimiter, since spaces are too common in values to be appropriate, they naturally resort to commas or tabs.

Version control systems require an atomic unit to operate on. A versioning system for data can quite usefully treat records as the atomic units. Using line-oriented text as the encoding for record-oriented data automatically gives us a record-oriented versioning system in the form of existing tools built for versioning code.


  1. Note that, by CSV, we really mean “DSV”, as the delimiter in the file does not have to be a comma. However, the row terminator should be a line break (or a carriage return plus line break). 

  2. CSVs do not always have one row to one line (it is possible to have line-breaks in a field with quoting). However, most CSVs are one-row-to-one-line. CSVs are pretty much the simplest possible structured data format you can have. 

  3. As a concrete example, the merge function will probably work quite well in reconciling two sets of changes that affect different sets of records, hence lines. Two sets of changes which each move a column will not merge well, however. 

  4. For larger data, we suggest swapping out git (and e.g. GitHub) for simple file storage like s3. Note that s3 can support basic copy-on-write versioning. However, being copy-on-write, it is comparatively very inefficient. 

Shuttleworth Fellowship Quarterly Review – Feb 2012

As part of my Shuttleworth Fellowship I’m preparing quarterly reviews of what I and the Open Knowledge Foundation have been up to. So, herewith are some highlights from the last 3 months.

Highlights

  • Substantial new project support from several funders, including support for the Science and Economics working groups
  • Our CKAN Data Management System selected in 2 major new data portal initiatives
  • Continuing advance of projects across the board with several projects reaching key milestones (v1.0 or beta release, adoption by third parties)
  • Rapid expansion of chapters and local groups — e.g. London Meetup now has more than 100 participants, new chapters in Belgium and Switzerland are nearly finalized
  • Completion of major upgrade of core web-presence with new branding and theme used on http://okfn.org/ and across our network of sites (now numbering more than 40)
  • Announcement of the School of Data, which drew huge attention from the community. This will be a joint Open Knowledge Foundation / P2PU project.
  • Major strengthening of organizational capacity with new staff

Projects

Major new project support including:

CKAN and the DataHub

OpenSpending

  • Major breakthrough with achievement of simple data upload and management process – result of more than 9 months of work
  • OpenSpending now contains more than 30 datasets with ~7 million spending items (up from 2 datasets and ~200k items a year ago, and under 10 datasets and ~1.5m items just 4 months ago)
  • Substantial expansion in set of collaborators and a variety of new funding opportunities

Other Projects

  • BibServer and BibSoup, our bibliographic software and service, reached beta and have been receiving increasing attention

  • Public Domain Review celebrated its 1st Birthday. Some stats:

    • The Review now has more than 800 email subscribers and ~800 followers on Twitter
    • 20k visitors with over 40k page views per month
    • An increasing number of supporters making a monthly donation
  • Initiated a substantive collaboration on the PyBossa crowdsourcing platform with Shuttleworth Fellow Emeritus Francois Grey and his Citizen Cyberscience Centre

  • Annotator and AnnotateIt v1.0 Completed and Released

    • Annotator is now seeing uptake from several third-party projects and developers
    • Project components now have more than 100 followers on GitHub (up from ~20 in December)

Working Groups and Local Groups and Chapters

Working groups have continued to develop well:

  • New dedicated Working Group coordinator (Laura Newman)
  • Panton Fellowships run under auspices of Science Working Group
  • Funding of Economics Working Group

Rapid Chapter and local group development:

Additional items

Events and Meetings

Participated in numerous events and meetings including:

Shuttleworth Fellowship Bi-Annual Review

As part of my Shuttleworth Fellowship I’m preparing bi-annual reviews of what I — and projects I’m involved in — have been up to. So, herewith are some highlights from the last 6 months.

CKAN and theDataHub

OpenSpending

  • Two major point releases of OpenSpending software v0.10 and v0.11 (v0.11 just last week!). Huge maturing and development of the system. Backend architecture now finalized after a major refactor and reworking.
  • Community has grown significantly with now almost 50 OpenSpending datasets on theDataHub.org and growing group of core “data wranglers”
  • Spending Stories was a winner of the Knight News Challenge. Spending Stories will build on and extend OpenSpending.

Open Bibliography and the Public Domain

Open Knowledge Foundation and the Community

  • In September we received a 3 year grant from the Omidyar Network to help the Open Knowledge Foundation sustain and expand its community especially in the formation of new chapters
  • Completed a major recruitment process (Summer-Autumn 2011) to bring on more paid OKFN team members, including community coordinators, a foundation coordinator and developers
  • The Foundation participated in the launch of the Open Government Partnership and the CSO events surrounding the meeting
  • Working groups continuing to develop. Too much activity to summarize it all here but some highlights include:
    • WG Science Coordinator Jenny Molloy travelling to OSS2011 in SF to present Open Research Reports with Peter Murray-Rust
    • Open Economics WG developing an Open Knowledge Index in August
    • Open Bibliography working group’s work on a Metadata guide.
    • Open Humanities / Open Literature working group winning Inventare Il Futuro competition with their idea to use the Annotator
  • Development of new Local Groups and Chapters
    • Lots of ongoing activity in existing local groups and chapters such as those in Germany and Italy
    • In addition, interest from a variety of areas in the establishment of new chapters and local groups, for example in Brazil and Belgium
  • Start of work on OKFN labs

Meetups and Events

Talks and Events

  • Attended Open Government Partnership meeting in July in Washington DC and launch event in New York in September
  • Attended Chaos Computer Camp with other OKFNers in August near Berlin
  • September: Spoke at PICNIC in Amsterdam
  • October: Code for America Summit in San Francisco (plus meetings) – see partial writeup
  • October: Open Government Data Camp in Warsaw (organized by Open Knowledge Foundation)
  • November: South Africa – see this post on Africa@Home and Open Knowledge meetup in Cape Town

General

Talking at Legal Aspects of Public Sector Information (LAPSI) Conference in Milan

This week on Thursday and Friday I’ll be in Milan to speak at the 1st LAPSI (Legal Aspects of Public Sector Information) Primer & Public Conference.

I’m contributing to a “primer” session on The Perspective of Open Data Communities and then giving a conference talk on Collective Costs and Benefits in Opening PSI for Re-use in a session on PSI Re-use: a Tool for Enhancing Competitive Markets where I’ll be covering work by myself and others on pricing and regulation of PSI (see e.g. the “Cambridge Study” and the paper on the Economics of the Public Sector of Information).

Update: slides are up.

Community, Openness And Technology

PSI: Costs And Benefits Of Openness

Creative Commons and the Commons

Background: I first got involved with Creative Commons (CC) in 2004 soon after its UK chapter started. Along with Damian Tambini, the then UK ‘project lead’ for CC, and the few other members of ‘CC UK’, I spent time working to promote CC and its licenses in the UK (and elsewhere). By mid-2007 I was no longer very actively involved and to all intents and purposes was no longer associated with the organization. I explain this to give some background to what follows.

Creative Commons as a brand has been fantastically successful and is now very widely recognized. While in many ways this success has been beneficial for those interested in free/open material it has also raised some issues that are worth highlighting.

Creative Commons is not a Commons

Ironically, despite its name, Creative Commons – or, more precisely, its licenses – does not produce a commons. The CC licenses are not mutually compatible: for example, material with a CC Attribution-ShareAlike (by-sa) license cannot be intermixed with material licensed under any of the CC NonCommercial licenses (e.g. Attribution-NonCommercial, Attribution-NonCommercial-ShareAlike).

Given that a) the majority of CC licenses in use are ‘non-commercial’ and b) there is also large usage of ShareAlike (e.g. Wikipedia), this is an issue that affects a large set of ‘Creative Commons’ material.

Unfortunately, the presence of the word ‘Commons’ in CC’s name and the prominence of ‘remix’ in the advocacy around CC tend to make people think, falsely, that all CC licenses are in some way similar or substitutable.

The ‘Brand’ versus the Licenses

More and more frequently I hear people say (or more significantly write) things like: “This material is CC-licensed”. But as just discussed there is large, and very significant, variation in the terms of the different CC licenses. It appears that for many people the overall ‘Brand’ dominates the actual specifics of the licenses.

This is in marked contrast to the Free/Open Source software community, where even in the case of the Free Software Foundation’s licenses people tend to specify the exact license they are talking about.

Standards and interoperability are what really matter for licenses (cf the “Commons” terminology). Licensing and rights discussions are pretty dull for most people — and should be. They are important only because they determine what you and I can and can’t do, and specifically what material you and I can ‘intermix’ — possible only where licenses are ‘interoperable’.

To put it the other way round: licenses are interoperable if you can intermix freely material licensed under one of those licenses with material licensed under another. This interoperability is crucial and it is, in license terms, what underlies a true commons.

More broadly we are interested in a ‘license standard’: in knowing not only that a set of licenses are interoperable, but that they all allow certain things, for example for anyone to use, reuse and redistribute the licensed material (or, to put it in terms of freedom, that they guarantee those freedoms to users). This very need for a standard is why we created the Open Definition for content and data, building directly on the work on a similar standard (the Open Source Definition) in the Free/Open Source software community.

The existence of non-commercial

CC took a crucial decision in including NonCommercial licenses in its suite. Given the ‘Brand’ success of Creative Commons, the inclusion of NC licenses has given them a status close to, if not identical with, the truly open, commons-supporting licenses in the CC suite.

There is a noticeable difference here with the software world, where NC-style licensing is also active, but under the ‘freeware’ and ‘shareware’ names (these terms aren’t always used consistently), and with this material clearly distinguished from Free/Open Source software.

As the CC brand has grown, there is a desire by some individuals and institutions to use CC licenses simply because they are CC licenses (this is also encouraged by the baking in of CC licenses to many products and services). Faced with choosing a license, many people, and certainly many institutions, tend to go for the more restrictive option available (especially when the word commercial is in there — who wants to sanction exploitation for gain of their work by some third-party!). Thus, it is no surprise that non-commercial licenses appear to be by far the most popular.

Without the NC option, some of these people would have chosen one of the open CC licenses instead. Of course, some would not have licensed at all (or, at least, not with a CC license), sticking with pure copyright or some other set of terms. Nevertheless, the benefit of gaining a clear dividing line, and of creating brand pressure for a real commons and real openness, would have been substantial – and worth, in my opinion, the loss of the non-commercial option.

Structure and community

It is notable in the F/OSS community that most licenses, especially the most popular, are either not ‘owned’ by anyone (MIT/BSD) or are run by an organization with a strong community base (e.g. the Free Software Foundation). Creative Commons seems rather different. While there are public mailing lists, ultimately decisions regarding the licenses, and about crucial features thereof such as compatibility with 3rd-party licenses, remain with CC central, based in San Francisco.

Originally, there was a fair amount of autonomy given to country projects but over time this autonomy has gradually been reduced (there are good reasons for this — such as a need for greater standardization across licenses). This has concrete effects on the terms in licenses.

For example, for v3.0 the Netherlands team was asked to remove provisions which included things like database rights in their share-alike clause and instead standardize on a waiver of these additional rights (rights which are pretty important if you are doing data(base) licensing). Most crucially, the CC licenses reserve to Creative Commons as an organization the right to make compatibility decisions. This is arguably the single most important aspect of licensing, at least in respect of interoperability and the Commons.

Creative Commons and Data

Update: as of September 2011 there has been further discussion between Open Data Commons and Creative Commons on these matters, especially regarding interoperability and Creative Commons v4.0.

From my first involvement in the ‘free/open’ area, I’d been interested in data licensing, both because of personal projects and requests from other people.

When first asked how to deal with this I’d recommended ‘modding’ a specific CC license (e.g. Attribution-Sharealike) to include provisions for data and data(bases). However, starting from 2006 there was a strong push from John Wilbanks, then at Science Commons but with the apparent backing of CC generally, against this practice as part of a general argument for ‘PD-only’ for data(bases) (with the associated implication that the existing CC licenses were content-only). While I respect John, I didn’t really agree with his arguments about PD-only and furthermore it was clear that there was a need in the community for open but non-PD licenses for data(bases).

In late 2007 I spoke with Jordan Hatcher and learned about the work he and Charlotte Waelde were doing for Talis to draft a new ‘open’ license for data(bases). I was delighted and started helping Jordan with these licenses — licenses that became the Open Data Commons PDDL and the ODbL. We sought input from CC during the drafting of these licenses, specifically the ODbL, but the primary response we had (from John Wilbanks and colleagues) was just “don’t do this”.

Once the ODbL was finalized we then contacted CC further about potential compatibility issues.

The initial response then was that, as CC did not recommend use of its licenses (other than CCZero) for data(bases), there should not be an issue since, as with CC licenses and software, there should be an ‘orthogonality’ of activity — CC licenses would license content, F/OSS licenses would license code, and data(base) licenses (such as the ODC ones) would license data. We pressed the point and had a phone call about it with Diane Peters and John Wilbanks in January 2010, with a follow-up email detailing the issues a bit later.

We’ve also explained on several occasions to senior members of CC central our desire to hear from CC on this issue and our willingness to look at ways to make any necessary amendments to ODC licenses (though obviously such changes would be conditional on full scrutiny by the Advisory Council and consultation with the community).

No response has been forthcoming. To this date, over a year later, we have yet to receive any response from CC, despite having been promised one at least 3 times (we’ve basically given up asking).

Further to this lack of response, and without any notice to or discussion with ODC, CC recently put out a blog post in which they stated, in marked contrast to previous statements, that CC licenses were entirely suited to data. In many ways this is a welcome step (cf. my original efforts to use CC licenses for data above), but CC have made no statement about a) how they would seek to address data properly or b) the relationship of these efforts to existing work in Open Data Commons, especially re. the ODbL. One can only assume, at least in the latter case, that the omission was intentional.

All of this has led me, at least, to wonder what exactly CC’s aims are here. In particular, is CC genuinely concerned with interoperability (beyond a simple ‘everyone uses CC’) and the broader interests of the community who use and apply their licenses?

Conclusion

Creating a true commons for content and data is incredibly important (it’s one of the main things I work on day to day). Creative Commons have done amazing work in this area but as I outline above there is an important distinction between the (open) commons and CC licenses.

Many organisations, institutions, governments and individuals are currently making important decisions about licensing and legal tools – in relation to opening up everything from scientific information, to library catalogues to government data. CC could play an important role in the creation of an interoperable commons of open material. The open CC licenses (CC0, CC-BY and CC-BY-SA) are an important part of the legal toolbox which enables this commons.

I hope that CC will be willing to engage constructively with others in the ‘open’ community to promote licenses and standards which enable a true commons, particularly in relation to data where interoperability is especially crucial.

Talk at UKSG 2011 Conference

Yesterday, I was up in Harrogate at the UKSG (UK Serials Group) annual conference to speak in a keynote session on Open Bibliography and Open Bibliographic Data.

I’ve posted the slides online and iframed below.

Outline

Over the past few years, there has been explosive growth in open data, with significant uptake in government, research and elsewhere.

Bibliographic records are a key part of our shared cultural heritage. They too should therefore be open, that is, made available to the public for access and re-use under an open license which permits use and reuse without restriction (http://opendefinition.org/). Doing this promises a variety of benefits.

First, it would allow libraries and other managers of bibliographic data to share records more efficiently and improve quality more rapidly through better, easier feedback. Second, it would enable increased innovation in bibliographic services and applications, generating benefits for the producers and users of bibliographic data and the wider community.

This talk will cover the what, why and how of open bibliographic data, drawing on direct recent experience such as the development of the Open Biblio Principles and the work of the Bibliographica and JISC OpenBib projects to turn the 3 million records of the British Library’s British National Bibliography (BNB) into linked open data.

With a growing number of Government agencies and public institutions making data open, is it now time for the publishing and library community to do likewise?