Category Archives: Academic

Talking at Legal Aspects of Public Sector Information (LAPSI) Conference in Milan

This week on Thursday and Friday I’ll be in Milan to speak at the 1st LAPSI (Legal Aspects of Public Sector Information) Primer & Public Conference.

I’m contributing to a “primer” session on The Perspective of Open Data Communities and then giving a conference talk on Collective Costs and Benefits in Opening PSI for Re-use in a session on PSI Re-use: a Tool for Enhancing Competitive Markets where I’ll be covering work by myself and others on pricing and regulation of PSI (see e.g. the “Cambridge Study” and the paper on the Economics of the Public Sector of Information).

Update: slides are up.

Community, Openness And Technology

PSI: Costs And Benefits Of Openness

Speaking at CCSR in Manchester about Open Data

This Tuesday (25th Jan) I’ll be giving a seminar at Manchester University’s Centre for Census and Survey Research on Open Data.

The seminar is 4-5pm and there’ll be an open-data workshop/discussion from 12-4. If you’re interested in working with open data generally or any Open Knowledge Foundation projects specifically — for example Where Does My Money Go? or CKAN — you’re very welcome to come along to any part of it.

Seminar Info

Title: Open Data: What, Why, How!

There has been growing interest in many circles, and especially in government, regarding ‘open data’. In this talk I explain what open data is, what its attraction is — especially for public sector information — from an economic and policy standpoint, and finally how governments and others can adopt this new approach.

Is Google the next Microsoft? Monopoly, Competition, Regulation and Antitrust in Online Search

My paper Is Google the next Microsoft? Competition, Welfare and Regulation in Online Search has been published in the December issue of the Review of Network Economics.

With recent antitrust and competition authority interest in Google’s monopoly — such as the announcement on Nov 30th of an official probe of Google by the European competition authorities [1] — the paper’s publication could not have come at a more appropriate time. The first version of this paper was put out in 2008, so it has also proved reasonably prescient!


Beginning from nothing fifteen years ago, online search is today a multi-billion dollar business, and search engine providers such as Google have become household names.

While search has become increasingly ubiquitous, it has also grown increasingly dominated by a single firm: Google. For example, today in the UK Google accounts for 90% of all searches, and in many other countries Google has a similar lead over its rivals.

In the paper I investigate why the search engine market is so concentrated and what implications this has for us both now and in the future. Concluding that monopoly is currently a likely outcome, I look at how competition could be promoted and a dominant search engine regulated.

To summarize the main points:

(a) Search engines provide ordinary users with a ‘free’ service, gaining something extremely valuable in exchange: ‘attention’. With attention in ever more limited supply — after all, each of us has at most 24 hours available in each day — access to that attention is correspondingly valuable, especially for those who have products or services to advertise. Thus, though web search engines do not charge users, they can retail the attention generated by their service to those who are willing to pay for access to it.

(b) The search engine market is already extremely concentrated. In many countries a single firm (usually Google) possesses a market share an order of magnitude larger than its rivals’. As stated, in the UK Google already holds over 90% market share. However, it is also noteworthy that there are some marked variations: in China, for example, Google trails the leader.

(c) Competition issues are likely to become more serious as this dominance becomes established. It is important to realise that while search appears ‘free’, we do pay indirectly via the charges to advertisers — who must in turn recoup that money from consumers. A dominant search engine may have incentives to distort its ‘results’ in ways that increase its own profits but harm society — for example, by suppressing organic search results that would substitute for, or harm, associated ‘sponsored’ results (adverts).

(d) There are a number of approaches that regulators and policy-makers could take to protect against these adverse consequences. For example, policy-makers could look at ways to separate the ‘software’ and ‘service’ parts of a search engine’s activity, or, less dramatically, they could set up a regulatory body to review search result rankings and choices.

Conclusion: it will be increasingly necessary for there to be some form of oversight, possibly extending to formal regulation, of the search engine market. In several markets monopoly, or near monopoly, already exists and there is every reason to think this situation will persist. Left unchecked by competition, the private interests of a search engine and the interests of society as a whole will diverge and, thus, left entirely unregulated, the online search market will develop in ways that are harmful to the general welfare.

  1. Financial Times, Brussels launches formal Google probe (Nov 30 2010); update: Google’s clout raises concerns in France, International Herald Tribune, Dec 15 2010, p. 21.

Papers on the Size and Value of EU Public Domain

I’ve just posted two new papers on the size and ‘value’ of the EU Public Domain. These papers are based on research done as part of the Public Domain in Europe (EUPD) Research Project (which has now been submitted).

  • Summary Slides Covering Size and Value of the Public Domain – Talk at COMMUNIA in Feb 2010
  • The Size of the EU Public Domain

    This paper reports results from a large recent study of the public domain in the European Union. Based on a combination of catalogue and survey data our figures for the number of items (and works) in the public domain extend across a variety of media and provide one of the first quantitative estimates of the ‘size’ of the public domain in any jurisdiction. We find that for books and recordings the public domain is around 10-20% of published extant output and would consist of millions and hundreds of thousands of items respectively. For films the figure is dramatically lower (almost zero). We also establish some interesting figures relevant to the orphan works debate such as the number of catalogue entries without any identified author (approximately 10%).

  • The Value of the EU Public Domain

    This paper reports results from a large recent study of the public domain in the European Union. Based on a combination of catalogue, commercial and survey data we present detailed figures both on the prices (and price differences) of in-copyright and public domain material and on the usage of that material. Combined with the estimates for the size of the EU public domain presented in the companion paper, our results allow us to provide the first quantitative estimate for the ‘value’ of the public domain (i.e. welfare gains from its existence) in any jurisdiction. We also find clear, and statistically significant, differences between the prices of in-copyright and public-domain material in the two areas for which we have significant data: books and sound recordings in the UK. Patterns of usage indicate a significant demand for public domain material, but limitations of the data make it difficult to draw conclusions on the impact of entry into the public domain on demand.

The results on price differences are particularly striking as, to my knowledge, this is by far the largest analysis done to date. More significantly, the results clearly show that the claim in the Commission’s impact assessment that there was no price effect of copyright (compared to the public domain) was wrong. That claim was central to the impact assessment and to the proposal to extend copyright term in sound recordings (a claim that was based on a single study, using a very small sample, performed by PwC as part of a music-industry-sponsored piece of consultancy for submission to the Gowers review).

Public Sector Transparency Board

As announced on Friday on the UK Government’s website, I am one of the members of the UK Government’s newly formed Public Sector Transparency Board.

From the announcement:

The Public Sector Transparency Board, which was established by the Prime Minister, met yesterday for the first time.

The Board will drive forward the Government’s transparency agenda, making it a core part of all government business and ensuring that all Whitehall departments meet the new tight deadlines set for releasing key public datasets. In addition, it is responsible for setting open data standards across the whole public sector, listening to what the public wants and then driving through the opening up of the most needed data sets.

Chaired by Francis Maude, the Minister for the Cabinet Office, the other members of the Transparency Board are Sir Tim Berners-Lee, inventor of the World Wide Web, Professor Nigel Shadbolt from Southampton University, an expert on open data, Tom Steinberg, founder of mySociety, and Dr Rufus Pollock from Cambridge University, an economist who helped found the Open Knowledge Foundation.

In the words of Francis Maude:

“In just a few weeks this Government has published a whole range of data sets that have never been available to the public before. But we don’t want this to be about a few releases, we want transparency to become an absolutely core part of every bit of government business. That is why we have asked some of the country’s and the world’s greatest experts in this field to help us take this work forward quickly here in central government and across the whole of the public sector.”

The Size of the Public Domain (Without Term Extensions)

We’ve looked at the size of the public domain extensively in earlier posts.

The basic takeaway from the analysis was that, based on library catalogue data for books in the UK, approximately 15-20% of works are in the public domain — with public domain works being pretty old (70 years plus, due to the life+70 nature of copyright).

An interesting question to ask, then, is: how large would the public domain be if copyright had not been extended from its original length of 14 years, with a possible 14-year renewal (14+14), set out in the Statute of Anne back in 1710? And how does this compare with the situation back when 14+14 was in full swing, say in 1795?

Furthermore, what if copyright today were a simple 15 years — the point estimate for the optimal term of copyright found in my paper on this subject? Well, here’s the answer:

|                   | Today | 1795 (14+14) | Today (14+14) | Today (15y) |
|-------------------|-------|--------------|---------------|-------------|
| Total Items       | 3.46m | 179k         | 3.46m         | 3.46m       |
| No. Public Domain | 657k  | 140k         | 1.2m          | 2.59m       |
| % Public Domain   | 19    | 78           | 52            | 75          |

Number and percentage of public domain works under various scenarios, based on Cambridge University Library catalogue data.

That’s right folks: based on the data available, if copyright had stayed at its Statute of Anne level, 52% of the books available today would be in the public domain, compared to an actual level of 19%. That’s around 600,000 additional items that would be in the public domain, including works like Virginia Woolf’s (d. 1941) The Waves, Salinger’s The Catcher in the Rye (pub. 1951) and Marquez’s Chronicle of a Death Foretold (pub. 1981).

For comparison, in 1795, 78% of all extant works were in the public domain — a figure we would be close to matching if copyright were a simple 15 years (in that case the public domain would be a substantial 75%).
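The percentages above follow directly from the catalogue counts; a minimal sketch of the arithmetic for three of the scenarios (counts are the rounded figures reported here):

```python
# Public domain share under a scenario: pd_items / total_items.
# Counts are the (rounded) Cambridge University Library catalogue figures.
scenarios = {
    "Today": (657_000, 3_460_000),
    "1795 (14+14)": (140_000, 179_000),
    "Today (15y)": (2_590_000, 3_460_000),
}

for name, (pd_items, total) in scenarios.items():
    share = 100 * pd_items / total
    print(f"{name}: {share:.0f}% public domain")
```

Running this reproduces the 19%, 78% and 75% figures in the table.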

To put this in visual terms, what the public domain is missing out on as a result of copyright extension is the yellow region in the following figure: the set of works that would be in the public domain under 14+14 but aren’t under current copyright!


The Public Domain of books today (red), under 14+14 (yellow), and published output (black)

Update: I’ve posted the main summary statistics file including per-year counts. I’ve also started a CKAN data package: eupd-data for this EUPD-related data.

UK Government Plans to Open Up Data

Yesterday, in a speech on “Building Britain’s Digital Future”, UK Prime Minister Gordon Brown announced wide-ranging plans to open up UK government data. In addition to a general promise to extend the existing commitments to “make public data public” the PM announced:

  • The opening up of a large and important set of transport data (the NaPTAN dataset)
  • A commitment to open up a significant amount of Ordnance Survey data from the 1st April (though details of which datasets have not yet been specified)
  • By the Autumn, an online “domesday book” giving “an inventory of all non-personal datasets held by departments and arms-length bodies”
  • A new “institute” for web science headed by Tim Berners-Lee and Nigel Shadbolt and with an initial £30m in funding

This speech is a significant indication of a further commitment to the “making public data public” policy announced in the Autumn.

It’s great to see this as, a year ago, it seemed as if government policy was set to largely ignore the research in the Models of Public Sector Information Provision by Trading Funds report (authored by myself, David Newbery and Professor Bently back in 2008), whose basic conclusion was that government data which is digital, bulk and ‘upstream’ should be made available at marginal cost.

More detailed excerpts (with emphasis added)

Opening up data

In January we launched data.gov.uk, a single, easy-to-use website to access public data. And even in the short space of time since then, the interest this initiative has attracted – globally – has been very striking. The site already has more than three thousand data sets available – and more are being added all the time. And in the past month the Office for National Statistics has opened up access for web developers to over two billion data items right down to local neighbourhood level.

The Department for Transport and the transport industry are today making available the core reference datasets that contain the precise names and co-ordinates of all 350 thousand bus stops, railway stations and airports in Britain.

Public transport timetables and real-time running information is currently owned by the operating companies. But we will work to free it up – and from today we will make it a condition of future franchises that this data will be made freely available.

And following the strong support in our recent consultation, I can confirm that from 1st April, we will be making a substantial package of information held by Ordnance Survey freely available to the public, without restrictions on re-use. Further details on the package and government’s response to the consultation will be published by the end of March.

e-Domesday Book

And I can also tell you today that in the autumn the Government will publish online an inventory of all non-personal datasets held by departments and arms-length bodies – a “domesday book” for the 21st century.

The programme will be managed by the National Archives and it will be overseen by a new open data board which will report on the first edition of the new domesday book by April next year. The Government will then produce its detailed proposals including how this work can be extended to the wider public sector.

To inform the continuing development of making public data public, the National Archives will produce a consultation paper on a definition of the “public task” for public data, to be published later this year.

The new domesday book will for the first time allow the public to access in one place information on each set of data including its size, source, format, content, timeliness, cost and quality. And there will be an expectation that departments will release each of these datasets, or account publicly for why they are not doing so.

Any business or individual will be free to embed this public data in their own websites, and to use it in creative ways within their own applications.


So our goal is to replace this first generation of e-government with a much more interactive second generation form of digital engagement which we are calling Mygov.

Companies that use technology to interact with their users are positioning themselves for the future, and government must do likewise. Mygov marks the end of the one-size-fits-all, man-from-the-ministry-knows-best approach to public services.

Mygov will constitute a radical new model for how public services will be delivered and for how citizens engage with government – making interaction with government as easy as internet banking or online shopping. This open, personalised platform will allow us to deliver universal services that are also tailored to the needs of each individual; to move from top-down, monolithic websites broadcasting public service information in the hope that the people who need help will find it – to government on demand.

And rather than civil servants being the sole authors and editors, we will unleash data and content to the community to turn into applications that meet genuine needs. This does not require large-scale government IT Infrastructure; the ‘open source’ technology that will make it happen is freely available. All that is required is the will and willingness of the centre to give up control.

Policy Recommendations in the Area of Innovation, Creativity and IP

I was recently asked to put together a short document outlining my main policy recommendations in the area of “innovation, creativity and IP”. Below is what I prepared.

General IP Policy

Recommendation: IP policy, and more generally innovation policy, should aim at the improvement of the overall welfare of UK society and citizens and not just at promoting innovation and creativity

Innovation is, of course, a major factor in the improvement of societal welfare — but not the only factor, access to the fruits of that innovation is also important.

IP rights are monopolies and such monopolies when over-extended do harm rather than good. The provision of IP rights must balance the promotion of innovation and creativity with the need for adequate access to the results of those efforts both by consumers and those who would seek to innovate and create by building upon them. A policy which aims purely at maximizing innovation, via the use of IP rights, will almost certainly be detrimental to societal welfare, since it will ignore the negative consequences of extending IP on access to innovation and knowledge. As such, IP policy is about having “enough, but not too much”.

This basic point is often overlooked. To help minimize the risk of this occurring in future it is suggested that this basic purpose — of promoting the welfare of UK citizens — be explicitly embedded within the goals of organisations and departments tasked with handling policies related to innovation and IP.

Recommendation: Move away from a focus on intellectual property to look at innovation and information policy more widely

IP rights are but one tool for promoting innovation and often a rather limited one. The focus should be on the general problem — promoting societal welfare through innovation and access to innovation — not on one particular solution to that problem.

Provision and Pricing of Public Sector Information


Public sector information (PSI) is information held by a public sector organisation, for example a government department or, more generally, any entity which is majority owned and/or controlled by government. Classic examples of public sector information in most countries would include, among many others: geospatial data, meteorological information and official statistics.

While much of the data or information used in our society is supplied from outside the public sector, compared to other parts of the economy, the public sector plays an unusually prominent role. In many key areas, a public sector organization may be the only, or one among very few, sources of the particular information it provides (e.g. for geospatial and meteorological information). As such, the policies adopted regarding maintenance, access and re-use of PSI can have a very significant impact on the economy and society more widely.

Funding for public sector information can come from three basic sources: government, ‘updaters’ (those who update or register information) and ‘users’ (those who want to access and use it). Policy-makers control the funding model by setting charges to external groups (‘updaters’ or ‘users’) and committing to make up any shortfall (or receive any surplus) that results. Much of the debate focuses on whether ‘users’ should pay charges sufficient to cover most costs (average cost pricing) or whether they should be given marginal cost access — which equates to free when the information is digital. However, this should not lead us to neglect the third source of funding via charges for ‘updates’.

Policy-makers must also concern themselves with the regulatory structure in which public sector information holders operate. The need to provide government funding can raise major commitment questions, while the fact that many public sector information holders are the sole source of the information they supply raises serious competition and efficiency issues.

Recommendation: Make digital, non-personal, upstream PSI available at marginal cost (zero)

The case for pricing public sector information to users at marginal cost (equal to zero for digital data) is very strong for a number of complementary reasons. First, the distortionary costs of average rather than marginal cost pricing are likely to be high. Second, the case for hard budget constraints to ensure efficient provision and induce innovative product development is weak. As such, digital upstream public sector information is best funded out of a combination of ‘updater’ fees and direct government contributions with users permitted free and open access. Appropriately managed and regulated, this model offers major societal benefits from increased provision and access to information-based services while imposing a very limited funding burden upon government.

Recommendation: Regulation should be transparent, independent and empowered. For every public sector information holder there should be a single, clear, source of regulatory authority and responsibility, and this ‘regulator’ should be largely independent of government.

This is essential if any pricing-policy is to work well and is especially important for marginal-cost pricing where the Government may be providing direct funding to the information holder. Policy-makers around the world have had substantial experience in recent years with designing these kinds of regulatory systems and this is, therefore, not an issue that should be especially difficult to address.

Copyright Term


The optimal term of copyright has been a very live policy issue over the last decade. Recently, in the European Union, and especially in the UK, there has been much debate over whether to extend the term of copyright in sound recordings from its current 50 years.

The basic trade-off inherent in copyright is a simple one. On the one hand, increasing copyright yields benefits by stimulating the creation of new works but, on the other hand, it reduces access to existing works (the welfare ‘deadweight’ loss). Choosing the optimal term, that is the length of protection, presents these two countervailing forces particularly starkly. By extending the term of protection, the owners of copyrights receive revenue for a little longer. Anticipating this, creators of works which were nearly, but not quite, profitable under the existing term will now produce work, and this work will generate welfare for society both now and in the future. At the same time, the increase in term applies to all works including existing ones — those created under the term of copyright before extension. Extending term on these works prolongs the copyright monopoly and therefore reduces welfare by hindering access to, and reuse of, these works.

Recommendation: Reduce Copyright Term – And Certainly Do Not Extend It

Current copyright term is significantly over-extended. Calculations performed in the course of my own work indicate that the optimal copyright term is likely around 15 years and almost certainly below 40 (the breadth of the estimates here is a direct reflection of the existing data limitations, but this upper bound is still (far) below existing terms).

Even a simple present-value calculation would indicate that the incentives for creativity today offered by extra term 50 years or more in the future are negligible — while the effect on access to knowledge can be very substantial, especially when term extensions are applied retrospectively (as they almost always are).
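To see why those far-off incentives are negligible, consider the present value today of £1 of copyright revenue received 50 years from now. The discount rates below are purely illustrative, not estimates from any of the studies discussed here:

```python
# Present value of £1 received t years in the future at discount rate r:
# PV = 1 / (1 + r)**t
def present_value(r: float, t: int) -> float:
    return 1 / (1 + r) ** t

# Even at modest discount rates, revenue 50+ years out is worth pennies today.
for r in (0.05, 0.07):
    print(f"r = {r:.0%}: PV of £1 in 50 years = £{present_value(r, 50):.3f}")
```

At a 5% discount rate the £1 is worth under 9p today; at 7%, under 4p — which is why extra term added at the far end of copyright offers creators almost no incentive now.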

It is also noteworthy that recent extensions, such as that for authorial copyright in the US (the CTEA) and the proposed extension of recording copyright in the EU, have been opposed well-nigh unanimously by academic economists and other IP scholars. Policy-making in this area should be evidence-based and designed to promote the broader welfare of society as a whole. Policies that appear to reflect nothing more than special-interest lobbying will only perpetuate the “marked lack of public legitimacy” which the Gowers report lamented, discouraging those who wish to contribute constructively to future Government policy-making in these areas, and making enforcement ever harder — effective enforcement, after all, depends on consent borne of respect as well as obedience coerced through punishment.