Category Archives: Openness

Creative Commons and the Commons

Background: I first got involved with Creative Commons (CC) in 2004 soon after its UK chapter started. Along with Damian Tambini, the then UK ‘project lead’ for CC, and the few other members of ‘CC UK’, I spent time working to promote CC and its licenses in the UK (and elsewhere). By mid-2007 I was no longer very actively involved and to most intents and purposes was no longer associated with the organization. I explain this to give some background to what follows.

Creative Commons as a brand has been fantastically successful and is now very widely recognized. While in many ways this success has been beneficial for those interested in free/open material, it has also raised some issues that are worth highlighting.

Creative Commons is not a Commons

Ironically, despite its name, Creative Commons, or more precisely its set of licenses, does not produce a commons. The CC licenses are not all mutually compatible: for example, material under a CC Attribution-ShareAlike (by-sa) license cannot be intermixed with material under any of the CC NonCommercial licenses (e.g. Attribution-NonCommercial, Attribution-NonCommercial-ShareAlike).

Given that a) the majority of CC licenses in use are ‘non-commercial’ and b) there is also large usage of ShareAlike (e.g. Wikipedia), this issue affects a large set of ‘Creative Commons’ material.

Unfortunately, the presence of the word ‘Commons’ in CC’s name and the prominence of ‘remix’ in the advocacy around CC tend to make people think, falsely, that all CC licenses are in some way similar or substitutable.

The ‘Brand’ versus the Licenses

More and more frequently I hear people say (or more significantly write) things like: “This material is CC-licensed”. But as just discussed there is large, and very significant, variation in the terms of the different CC licenses. It appears that for many people the overall ‘Brand’ dominates the actual specifics of the licenses.

This is in marked contrast to the Free/Open Source software community, where even in the case of the Free Software Foundation’s licenses people tend to specify the exact license they are talking about.

Standards and interoperability are what really matter for licenses (cf the “Commons” terminology). Licensing and rights discussions are pretty dull for most people — and should be. They are important only because they determine what you and I can and can’t do, and specifically what material you and I can ‘intermix’ — possible only where licenses are ‘interoperable’.

To put it the other way round: licenses are interoperable if you can intermix freely material licensed under one of those licenses with material licensed under another. This interoperability is crucial and it is, in license terms, what underlies a true commons.

More broadly we are interested in a ‘license standard’: in knowing not only that a set of licenses are interoperable, but that they all allow certain things, for example that anyone may use, reuse and redistribute the licensed material (or, to put it in terms of freedom, that they guarantee those freedoms to users). This very need for a standard is why we created the Open Definition for content and data, building directly on the work on a similar standard (the Open Source Definition) in the Free/Open Source software community.

The existence of non-commercial

CC took a crucial decision in including NonCommercial licenses in their suite. Given the ‘Brand’ success of Creative Commons, the effect of including NC licenses has been to give them a status close to, if not identical with, that of the truly open, commons-supporting licenses in the CC suite.

There is a noticeable difference here with the software world, where non-commercial licensing is also active, but under the ‘freeware’ and ‘shareware’ labels (though these terms aren’t always used consistently), and with such material clearly distinguished from the output of the Free/Open Source software community.

As the CC brand has grown, some individuals and institutions want to use CC licenses simply because they are CC licenses (something also encouraged by the baking-in of CC licenses to many products and services). Faced with choosing a license, many people, and certainly many institutions, tend to go for the most restrictive option available (especially when the word commercial is in there — who wants to sanction exploitation of their work for gain by some third party!). Thus, it is no surprise that non-commercial licenses appear to be by far the most popular.

Without the NC option, some of these people would have chosen one of the open CC licenses instead. Of course, some would not have licensed at all (or at least not with a CC license), sticking with pure copyright or some other set of terms. Nevertheless, the benefit of a clear dividing line, and of brand-pressure for a real commons and real openness, would have been substantial, and worth, in my opinion, the loss of the non-commercial option.

Structure and community

It is notable in the F/OSS community that most licenses, especially the most popular, are either not ‘owned’ by anyone (MIT/BSD) or are run by an organization with a strong community base (e.g. the Free Software Foundation). Creative Commons seems rather different. While there are public mailing lists, ultimately decisions regarding the licenses, and about crucial features thereof such as compatibility with third-party licenses, remain with CC central, based in San Francisco.

Originally, a fair amount of autonomy was given to country projects, but over time this autonomy has gradually been reduced (there are good reasons for this, such as the need for greater standardization across licenses). This has concrete effects for the terms in the licenses.

For example, for v3.0 the Netherlands team was asked to remove provisions, such as the inclusion of database rights in their share-alike clause, and instead standardize on a waiver of these additional rights (rights which are pretty important if you are doing data(base) licensing). Most crucially, the CC licenses reserve to Creative Commons as an organization the right to make compatibility determinations. This is arguably the single most important aspect of licensing, at least in respect of interoperability and the Commons.

Creative Commons and Data

Update: as of September 2011 there has been further discussion between Open Data Commons and Creative Commons on these matters, especially regarding interoperability and Creative Commons v4.0.

From my first involvement in the ‘free/open’ area, I’d been interested in data licensing, both because of personal projects and requests from other people.

When first asked how to deal with this I’d recommended ‘modding’ a specific CC license (e.g. Attribution-ShareAlike) to include provisions for data and data(bases). However, starting from 2006 there was a strong push from John Wilbanks, then at Science Commons but with the apparent backing of CC generally, against this practice, as part of a general argument for ‘PD-only’ for data(bases) (with the associated implication that the existing CC licenses were content-only). While I respect John, I didn’t really agree with his arguments for PD-only, and furthermore it was clear that there was a need in the community for open but non-PD licenses for data(bases).

In late 2007 I spoke with Jordan Hatcher and learned about the work he and Charlotte Waelde were doing for Talis to draft a new ‘open’ license for data(bases). I was delighted and started helping Jordan with these licenses — licenses that became the Open Data Commons PDDL and the ODbL. We sought input from CC during the drafting of these licenses, specifically the ODbL, but the primary response we had (from John Wilbanks and colleagues) was simply “don’t do this”.

Once the ODbL was finalized we then contacted CC further about potential compatibility issues.

The initial response was that, as CC did not recommend use of its licenses (other than CC0) for data(bases), there should not be an issue: as with CC licenses and software, there should be an ‘orthogonality’ of activity — CC licenses would license content, F/OSS licenses would license code, and data(base) licenses (such as the ODC ones) would license data. We pressed the point and had a conference call with Diane Peters and John Wilbanks in January 2010, with a follow-up email detailing the issues a little later.

We’ve also explained on several occasions to senior members of CC central our desire to hear from CC on this issue and our willingness to look at ways to make any necessary amendments to ODC licenses (though obviously such changes would be conditional on full scrutiny by the Advisory Council and consultation with the community).

No response has been forthcoming. To this date, over a year later, we have yet to receive any response from CC, despite having been promised one at least three times (we’ve basically given up asking).

Further to this lack of response, and without any notice to or discussion with ODC, CC recently put out a blog post in which they stated, in marked contrast to previous statements, that CC licenses were entirely suited to data. In many ways this is a welcome step (cf. my original efforts to use CC licenses for data above), but CC have made no statement about a) how they would seek to address data properly or b) the relationship of these efforts to existing work in Open Data Commons, especially the ODbL. One can only assume, at least in the latter case, that the omission was intentional.

All of this has led me, at least, to wonder what exactly CC’s aims are here. In particular, is CC genuinely concerned with interoperability (beyond a simple ‘everyone uses CC’) and the broader interests of the community who use and apply their licenses?

Conclusion

Creating a true commons for content and data is incredibly important (it’s one of the main things I work on day to day). Creative Commons have done amazing work in this area but, as I outline above, there is an important distinction between the (open) commons and CC licenses.

Many organisations, institutions, governments and individuals are currently making important decisions about licensing and legal tools – in relation to opening up everything from scientific information, to library catalogues to government data. CC could play an important role in the creation of an interoperable commons of open material. The open CC licenses (CC0, CC-BY and CC-BY-SA) are an important part of the legal toolbox which enables this commons.

I hope that CC will be willing to engage constructively with others in the ‘open’ community to promote licenses and standards which enable a true commons, particularly in relation to data where interoperability is especially crucial.

Shuttleworth Fellowship – Activity in the Last 3 Months

As part of my Shuttleworth Fellowship I’m preparing quarterly reports on what I’ve been up to. So, herewith are some highlights from the last 3 months. (Previous update – Sept-Dec)

Talks and Events

Projects

General

Talking at British Library about Open Shakespeare

This Thursday James Harriman-Smith and I will be heading over to the British Library to give a talk on Open Shakespeare and possibilities for “Open Literature”.

Update: Slides from the Open Shakespeare presentation

Outline

This talk will introduce http://www.openshakespeare.org/ — an innovative new approach to Shakespeare’s works, and, eventually, any literary text. The website is, as far as we know, unique in providing both public domain texts and open tools for the analysis of Shakespeare.

One such tool, the annotator, will be a special focus of the presentation, since it offers the potential for producing the first ever critical edition of Shakespeare compiled by thousands and with no restrictions on how it is used.

As well as exploring the technical challenges presented by such a website and such tools, we will also speak more generally on the open-source movement and its impact on literary studies, the problems posed and opportunities offered by openness, and, finally, the future evolution of the project itself.

Open Shakespeare Annotation Sprint

Cross-posted from Open Knowledge Foundation blog.

Tomorrow we’re holding the first Open Shakespeare Annotation ‘Sprint’. We’ll be getting together online and in-person to collaborate on critically annotating a complete Shakespeare play with all our work being open.

All of Shakespeare’s texts are, of course, in the public domain, and therefore already ‘open’. However, most editions of Shakespeare people actually use (and purchase) are ‘critical’ editions, that is, texts together with notes and annotations that explain or analyze the text, and for these critical editions no open version yet exists. This weekend we’re aiming to change that!

Using the annotator tool we now have a way to work collaboratively online to add and develop these ‘critical’ additions, and the aim of the sprint is to fully annotate one complete play. Anyone can get involved, from lay Shakespeare-lover to English professor: all you’ll need is a web browser and an interest in the Bard. And even if you can’t make it, you can vote right now on which play we should work on!

Using specially-designed annotation software we intend to print an edition of Shakespeare unlike any other, incorporating glosses, textual notes and other information written by anyone able to connect to the Open Shakespeare website.

Work begins with a full-day annotation sprint on Saturday 5th February, which will take place online as well as at in-person meetups. Anyone can organize a meetup, and we’re organizing one at the University of Cambridge English Faculty (if you’d like to hold your own please just add it to the etherpad linked above).

The Public Domain in 2011

According to http://publicdomainworks.net/ (which I helped build) there were 661 people whose works entered the public domain in 2011:

http://publicdomainworks.net/stats/year/2011

Of course, I should immediately state that this is a fairly crude calculation based on a simple life+70 model and therefore not applicable to e.g. the US with its 1923 cut-off (for those interested in the details of computing public domain status, you can find lots more here: http://wiki.okfn.org/PublicDomainCalculators).
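
To make the life+70 rule concrete, here is a minimal sketch (in Python, using hypothetical helper names) of the kind of check a public domain calculator performs; real calculators, as described on the wiki page above, have to handle many more jurisdiction-specific rules:

LIFE_PLUS = 70  # the simple 'life plus 70 years' copyright term

def public_domain_year(death_year):
    # Copyright runs to the end of the 70th year after the author's death,
    # so works open up on 1 January of the following year.
    return death_year + LIFE_PLUS + 1

def is_public_domain(death_year, year):
    return year >= public_domain_year(death_year)

# Nathanael West died in 1940, so under life+70 his works entered
# the public domain on 1 January 2011.
assert public_domain_year(1940) == 2011
assert is_public_domain(1940, 2011)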

The figure is also a significant underestimate — to do these calculations you need lots of information about authors, their death dates and their works. This kind of bibliographic metadata has, until fairly recently, been very hard to come by in an open data form, and so we have been limited to doing calculations with only a relatively small subset of all works (though, it should be said, we do have many of the most ‘important’ authors).

Thankfully this is now changing, thanks to institutions like the British Library opening up their data, so we should see a much extended list for 2011 some time in the next few months (if you’re interested in open bibliographic data, you should join the Open Knowledge Foundation’s Open Bibliographic Data Working Group).

Launch of the Public Domain Review

Lastly, I have an exciting announcement. Thanks to the work of my Open Knowledge Foundation colleague Jonathan Gray, we’re pleased to announce the launch of the Public Domain Review to celebrate Public Domain Day 2011:

http://publicdomainreview.okfn.org/

As Jonathan explains in the blog post:

The 1st of January every year is Public Domain Day, when new works enter the public domain in many (though unfortunately not all) countries around the world.

To celebrate, the Open Knowledge Foundation is launching the Public Domain Review, a web-based review of works which have entered the public domain:

Each week an invited contributor will present an interesting or curious work with a brief accompanying text giving context, commentary and criticism. The first piece takes a look at works by Nathanael West, whose works enter the public domain today in many jurisdictions.

You can sign up to receive the review in your inbox via email. If you’re on Twitter, you can also follow @publicdomainrev. Happy Public Domain Day!

CKAN v1.2 Released together with Datapkg v0.7

This is a cross-post of the release announcement originally put up on the OKFN Blog.


We’re delighted to announce CKAN v1.2, a new major release of the CKAN software. This is the largest iteration so far, with 146 tickets closed, and includes some really significant improvements, most importantly a new extension/plugin system, SOLR search integration, caching and INSPIRE support (more details below). The extension work is especially significant as it now means you can extend CKAN without having to delve into any core code.

In addition there are now over 20 CKAN instances running around the world and CKAN is being used in official government catalogues in the UK, Norway, Finland and the Netherlands. Furthermore, http://ckan.net/ — our main community catalogue — now has over 1500 data ‘packages’ and has become the official home for the LOD Cloud (see the lod group on ckan.net).
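
As a quick illustration of using a catalogue like ckan.net programmatically, here is a minimal sketch in Python; the /api/search/package endpoint and the ‘count’/‘results’ response keys are assumptions based on the CKAN search API of this era, so check the API documentation of the instance you are querying:

import json
from urllib.parse import quote
from urllib.request import urlopen

CKAN_URL = "http://ckan.net"

def search_packages(query, limit=10):
    # Search a CKAN catalogue for data 'packages' matching the query.
    # Assumes the search API lives at /api/search/package and returns
    # JSON with 'count' and 'results' keys.
    url = "%s/api/search/package?q=%s&limit=%d" % (CKAN_URL, quote(query), limit)
    with urlopen(url) as response:
        return json.load(response)

found = search_packages("gold prices")
print("%d matching packages" % found["count"])
for name in found["results"]:
    print(" -", name)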

We’re also aiming to provide a much more integrated ‘datahub’ experience with CKAN. Key to this is the provision of a ‘storage’ component to complement the registry/catalogue component we already have. Integrated storage will support all kinds of important functionality, from automated archival of datasets to dataset cleaning with Google Refine.

We’ve already been making progress on this front with the launch of a basic storage service at http://storage.ckan.net/ (back in September) and the development of the OFS bucket storage library. The functionality is still at an alpha stage and integration with CKAN is still limited so improving this area will be a big aim for the next release (v1.3).

Even in its alpha stage, we are already making use of the storage system, most significantly in the latest release of datapkg, our tool for distributing, discovering and installing data (and content) ‘packages’. In particular, the v0.7 release (more detail below) includes upload support, allowing you to store (as well as register) your data ‘packages’.

Highlights of CKAN v1.2 release

  • Package edit form: attach package to groups (#652) & revealable help
  • Form API – Package/Harvester Create/New (#545)
  • Authorization extended: authorization groups (#647) and creation of packages (#648)
  • Extension / Plug-in interface classes (#741)
  • WordPress twentyten compatible theming (#797)
  • Caching support (ETag) (#693)
  • Harvesting GEMINI2 metadata records from OGC CSW servers (#566)

Minor:

  • New API key header (#466)
  • Group metadata now revisioned (#231)

All tickets

Datapkg Release Notes

A major new release (v0.7) of datapkg is out!

There’s a quick getting started section below (also see the docs).

About the release

This release brings major new functionality to datapkg especially in regard to its integration with CKAN. datapkg now supports uploading as well as downloading and can now be easily extended via plugins. See the full changelog below for more details.

Get started fast

# 1. Install: (requires python and easy_install)
$ easy_install datapkg
# Or, if you don't like easy_install:
$ pip install datapkg
# (or even install from the raw source)

# 2. [optional] Take a look at the manual
$ datapkg man

# 3. Search for something
$ datapkg search ckan:// gold
gold-prices -- Gold Prices in London 1950-2008 (Monthly)

# 4. Get some data
# This will result in a csv file at /tmp/gold-prices/data
$ datapkg download ckan://gold-prices /tmp

# 5. Store some data
# Edit the gold prices csv making some corrections
$ cp gold-prices/data mynew.csv
$ edit mynew.csv
# Now upload back to storage
$ datapkg upload mynew.csv ckan://mybucket/ckan-gold-prices/mynew.csv

Find out more » — including how to create, register and distribute your own ‘data packages’.

Changelog

  • MAJOR: Support for uploading datapkgs (upload.py)
  • MAJOR: Much improved and extended documentation
  • MAJOR: New sqlite-based DB index giving support for a simple, central, ‘local’ index (ticket:360)
  • MAJOR: Make datapkg easily extendable

    • Support for adding new Index types with plugins
    • Support for adding new Commands with command plugins
    • Support for adding new Distributions with distribution plugins
  • Improved package download support (also now pluggable)

  • Reimplement url download using only the python std lib (removing the urlgrabber requirement and simplifying installation)
  • Improved spec: support for db type index + better documentation
  • Better configuration management (especially internally)
  • Reduce dependencies by removing usage of PasteScript and PasteDeploy
  • Various minor bugfixes and code improvements

Credits

A big hat-tip to Mike Chelen and Matthew Brett for beta-testing this release and to Will Waites for code contributions.

Open Government Data Goes Global – OGDCamp Keynote

This is the keynote I gave as the opening to Open Government Data Camp 2010. Accompanying slides.

Keynote

Hello and Welcome!

I’m Rufus Pollock from the Open Knowledge Foundation. We’re delighted to have such a great group of people here, and many thanks to all of you who have come, especially if you’ve travelled a long way.

And thanks of course to all of our sponsors, who have kindly supported the travel expenses of those who could not otherwise afford to get here.

I hardly need to tell you that things are really moving now in the world of open government data. After the pioneering work here in the UK and in the US, dozens of countries have now launched OGD initiatives and dozens more have things on the boil. Hundreds of developers have developed hundreds of web applications using open government data.

Projects like Gapminder are enabling us to gain new insights into everything from development trends to flu pandemics. The pioneering They Work For You (which enables anyone to easily follow a given person or topic in parliament) now has sister projects everywhere from Chile to Lithuania. People are creating maps and mobile applications to enable you to find your nearest hospital, park, toilet, or anti-social behaviour order (ASBO).

Projects built using OGD let you do everything from plotting the quickest and most scenic bicycle route between two given locations, to finding places where you can afford to live that are within a certain travel time of your work, to finding out where large public subsidies are disbursed to different companies across different regions, to finding out where your pennies are spent each day.

At the Foundation we have a slogan to keep ourselves from getting carried away: ideas are cheap, implementation is costly. While it’s often easy to come up with an interesting idea, the hard part is building it, and building it really well.

That said, the world of open government data is powered by good ideas. All of the amazing projects which people have built will have once started out with a small and simple question: wouldn’t it be cool if?

Wouldn’t it be cool if I could be emailed every time someone plans to build something new in the area that I live in?

Wouldn’t it be cool if I could be alerted every time my parliamentary representative says something about an issue I’m interested in?

Wouldn’t it be cool if I could look up on my phone the journey time between any two locations in Europe using only land-based public transport?

Wouldn’t it be cool if I could cross-reference data on working hours, well-being and weather for 100 different countries in a single click?

And so on!

‘Great!’ I hear you say. But how can we actually do any of these things? Ideas are very important, but how can we make sure this stuff gets built? Well, there is no straightforward, comprehensive formula for making sure that all this stuff gets done. But in order to ensure that people can get started we need to do a couple of basic things:

1. Use an open license

By using an open license you are giving people certainty — and letting them know — that they can freely use, reuse and redistribute that material, and this is essential if you want people to come along and do interesting things with it.

Without an open license we’re living in a world of confusing signals — and that means data-jams! We really need to give people a big green light to let them know that they can take that data and make things with it. The second thing we need to do is:

2. Provide raw machine readable data

What does that mean? Well if material is originally in a database or spreadsheet format then make it available like that (or in the closest raw form) — and please don’t create a PDF or serve it up only via some fancy Shiny Front End (which is lovely for anyone who knows how to work it, but absolutely useless for anyone who doesn’t).

As I wrote back in 2007: Give Us The Raw Data and Give It To Us Now.

Over the next couple of days we’re going to be gathering interesting examples of why this matters at http://rawdatanow.com/. So if you have any cunning ideas for how to explain what raw data is to non-technical folks in government, or if you have any good anecdotes (the uglier the better), then please let us know!

These two things in themselves are not very hard. It really boils down to being very explicit about letting people reuse stuff (spell it out!), and dumping whatever database files you have on a server somewhere and linking to it somewhere where people will find it.

The main challenge is in convincing government that this is a Good Idea. How we can do this is something that we’d definitely like to encourage you to talk to each other about over the course of the event. What are the difficulties in your country? How did you overcome them? What do we need to do this well? Good examples? Good old-fashioned evangelism? This event is about sharing your answers to these kinds of questions.

Another big theme of this event will be answering the question: OK, so we’ve opened up the data, now what?

Suppose you already have public bodies releasing a nice set of raw data on a web server under an open license: what do we do now?

Well, first off we need to make sure the data is easy to find, and easy to reuse and to do this we may want to start a data catalogue, like data.gov, data.gov.uk, or data dot dot dot dot.

Data catalogues are an essential part of the plumbing for an ecosystem of open data.

At the Open Knowledge Foundation we are working hard on CKAN, which is an open source system currently used in data.gov.uk and over 20 catalogues around the world. If you’d like to talk to us about starting up a catalogue in your country, please come and say hello. We’d be delighted to help you set one up!

We are currently working hard to connect together and federate different catalogues so we can more easily pull data together from lots of different countries with a few clicks of the mouse. And always remember why we’re doing this — a data catalogue, or even open data itself, is a means to an end: a way to make it easier for us to build tools and services that make our world, in ways big or small, a better place.

Finally, let’s remember this is just beginning — and we should enjoy the journey!

Speaking at PICNIC 10 in Amsterdam

This week I’m going to be in Amsterdam at PICNIC ’10 speaking about open data — what it is, why it’s good and how we can go about growing the open data ecosystem.

If you’re in Amsterdam — at PICNIC or otherwise — and interested in open data do get in touch.

Update: slides have now been posted – enjoy!