As announced on Friday on the UK Government’s data.gov.uk, I am one of the members of the UK Government’s newly formed Public Sector Transparency Board.

From the announcement:

The Public Sector Transparency Board, which was established by the Prime Minister, met yesterday for the first time.

The Board will drive forward the Government’s transparency agenda, making it a core part of all government business and ensuring that all Whitehall departments meet the new tight deadlines set for releasing key public datasets. In addition, it is responsible for setting open data standards across the whole public sector, listening to what the public wants and then driving through the opening up of the most needed data sets.

Chaired by Francis Maude, the Minister for the Cabinet Office, the other members of the Transparency Board are Sir Tim Berners-Lee, inventor of the World Wide Web, Professor Nigel Shadbolt from Southampton University, an expert on open data, Tom Steinberg, founder of mySociety, and Dr Rufus Pollock from Cambridge University, an economist who helped found the Open Knowledge Foundation.

In the words of Francis Maude:

“In just a few weeks this Government has published a whole range of data sets that have never been available to the public before. But we don’t want this to be about a few releases, we want transparency to become an absolutely core part of every bit of government business. That is why we have asked some of the country’s and the world’s greatest experts in this field to help us take this work forward quickly here in central government and across the whole of the public sector.”

We’ve looked at the size of the public domain extensively in earlier posts.

The basic takeaway from the analysis was that, based on library catalogue data, approximately 15-20% of books in the UK were in the public domain, and that public domain works are pretty old (70 years plus, due to the life+70 nature of copyright).

An interesting question to ask, then, is: how large would the public domain be if copyright had not been extended from its original length of 14 years with a (possible) 14-year renewal (14+14), as set out in the Statute of Anne back in 1710? And how does this compare with the situation back when 14+14 was in “full swing”, say in 1795?

Furthermore, what if copyright today were a simple 15 years, the point estimate for the optimal term of copyright found in my paper on this subject? Well, here’s the answer:

Scenario             Today    1795 (14+14)   Today (14+14)   Today (15y)
Total Items          3.46m    179k           3.46m           3.46m
No. Public Domain    657k     140k           1.2m            2.59m
% Public Domain      19       78             52              75

Number and percentage of public domain works under various scenarios, based on Cambridge University Library catalogue data.

That’s right folks: based on the data available, if copyright had stayed at its Statute of Anne level, 52% of the books available today would be in the public domain, compared to an actual level of 19%. That’s around 600,000 additional items that would be in the public domain, including works like Virginia Woolf’s The Waves (Woolf d. 1941), Salinger’s The Catcher in the Rye (pub. 1951) and Marquez’s Chronicle of a Death Foretold (pub. 1981).
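The status of individual works under each scenario can be checked with a few lines of code. Below is a minimal Python sketch; it is not the method behind the table above, which works from catalogue-wide proportions rather than individual works. The function `public_domain_in` and its regime labels are hypothetical, and it simplifies by assuming protection runs to the end of its final calendar year and that every 14+14 work had its renewal taken up. It also takes 2010 as “today”, and the death years for Salinger (2010) and García Márquez (still alive in 2010) are additions not given in the post.

```python
from typing import Optional

# Hypothetical helper: simplified public domain test under three term regimes.
# Assumes protection lasts until the end of its final calendar year.
def public_domain_in(year: int, pub_year: int, death_year: Optional[int], regime: str) -> bool:
    if regime == "life+70":
        # Current UK term: 70 years after the author's death (None = author still alive)
        return death_year is not None and year > death_year + 70
    if regime == "14+14":
        # Statute of Anne: 14 years plus a 14-year renewal, assuming the renewal was taken
        return year > pub_year + 28
    if regime == "15y":
        # A flat 15 years from publication
        return year > pub_year + 15
    raise ValueError(f"unknown regime: {regime}")

# (publication year, author's death year or None if alive in 2010)
WORKS = {
    "The Waves": (1931, 1941),
    "The Catcher in the Rye": (1951, 2010),
    "Chronicle of a Death Foretold": (1981, None),
}

TODAY = 2010  # assumption: roughly when this post was written
for title, (pub, death) in WORKS.items():
    status = {r: public_domain_in(TODAY, pub, death, r) for r in ("life+70", "14+14", "15y")}
    print(f"{title}: {status}")
```

All three works come out as public domain under 14+14 and under a 15-year term, but none does under life+70: The Waves, for instance, stays in copyright in the UK until the end of 2011 (1941 + 70).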

For comparison, in 1795, 78% of all extant works were in the public domain, a figure we would come close to if copyright were a simple 15 years (in that case the public domain would be a substantial 75%).

To put this in visual terms, what the public domain is missing out on as a result of copyright extension is the yellow region in the following figure: the set of works that would be in the public domain under 14+14 but are not under current copyright!


The Public Domain of books today (red), under 14+14 (yellow), and published output (black)

Update: I’ve posted the main summary statistics file, including per-year counts. I’ve also started a CKAN data package, eupd-data, for this EUPD-related data.

Yesterday, in a speech on “Building Britain’s Digital Future”, UK Prime Minister Gordon Brown announced wide-ranging plans to open up UK government data. In addition to a general promise to extend the existing commitments to “make public data public”, the PM announced:

  • The opening up of a large and important set of transport data (the NaPTAN dataset)
  • A commitment to open up a significant amount of Ordnance Survey data from the 1st April (though which datasets has not yet been specified)
  • By the Autumn, an online e-“domesday” book giving “an inventory of all non-personal datasets held by departments and arms-length bodies”
  • A new “institute” for web science headed by Tim Berners-Lee and Nigel Shadbolt and with an initial £30m in funding

This speech is a significant indication of a further commitment to the “making public data public” policy announced in the Autumn.

It’s great to see this as, a year ago, it seemed as if government policy was set largely to ignore the research in the Models of Public Sector Information Provision by Trading Funds report (authored by myself, David Newbery and Professor Bently back in 2008), whose basic conclusion was that government data which was digital, bulk and ‘upstream’ should be made available at marginal cost.

More detailed excerpts (with emphasis added)

Opening up data

In January we launched data.gov.uk, a single, easy-to-use website to access public data. And even in the short space of time since then, the interest this initiative has attracted – globally – has been very striking. The site already has more than three thousand data sets available – and more are being added all the time. And in the past month the Office for National Statistics has opened up access for web developers to over two billion data items right down to local neighbourhood level.

The Department for Transport and the transport industry are today making available the core reference datasets that contain the precise names and co-ordinates of all 350 thousand bus stops, railway stations and airports in Britain.

Public transport timetables and real-time running information is currently owned by the operating companies. But we will work to free it up – and from today we will make it a condition of future franchises that this data will be made freely available.

And following the strong support in our recent consultation, I can confirm that from 1st April, we will be making a substantial package of information held by ordnance survey freely available to the public, without restrictions on re-use. Further details on the package and government’s response to the consultation will be published by the end of March.

e-Domesday Book

And I can also tell you today that in the autumn the Government will publish online an inventory of all non-personal datasets held by departments and arms-length bodies – a “domesday book” for the 21st century.

The programme will be managed by the National Archives and it will be overseen by a new open data board which will report on the first edition of the new domesday book by April next year. The Government will then produce its detailed proposals including how this work can be extended to the wider public sector.

To inform the continuing development of making public data public, the National Archives will produce a consultation paper on a definition of the “public task” for public data, to be published later this year.

The new domesday book will for the first time allow the public to access in one place information on each set of data including its size, source, format, content, timeliness, cost and quality. And there will be an expectation that departments will release each of these datasets, or account publicly for why they are not doing so.

Any business or individual will be free to embed this public data in their own websites, and to use it in creative ways within their own applications.

Mygov

So our goal is to replace this first generation of e-government with a much more interactive second generation form of digital engagement which we are calling Mygov.

Companies that use technology to interact with their users are positioning themselves for the future, and government must do likewise. Mygov marks the end of the one-size-fits-all, man-from-the-ministry-knows-best approach to public services.

Mygov will constitute a radical new model for how public services will be delivered and for how citizens engage with government – making interaction with government as easy as internet banking or online shopping. This open, personalised platform will allow us to deliver universal services that are also tailored to the needs of each individual; to move from top-down, monolithic websites broadcasting public service information in the hope that the people who need help will find it – to government on demand.

And rather than civil servants being the sole authors and editors, we will unleash data and content to the community to turn into applications that meet genuine needs. This does not require large-scale government IT Infrastructure; the ‘open source’ technology that will make it happen is freely available. All that is required is the will and willingness of the centre to give up control.

This Thursday (11th March) I’m speaking at the Forum Virium’s Open Up the City event in Helsinki.

This year their focus is on “open data, design, interfaces and innovation” and I’m speaking under the title “Open Data: What, Why, How?”.

I was recently asked to put together a short document outlining my main policy recommendations in the area of “innovation, creativity and IP”. Below is what I prepared.

General IP Policy

Recommendation: IP policy, and more generally innovation policy, should aim at the improvement of the overall welfare of UK society and citizens and not just at promoting innovation and creativity

Innovation is, of course, a major factor in the improvement of societal welfare, but it is not the only one: access to the fruits of that innovation is also important.

IP rights are monopolies, and such monopolies, when over-extended, do harm rather than good. The provision of IP rights must balance the promotion of innovation and creativity with the need for adequate access to the results of those efforts, both by consumers and by those who would seek to innovate and create by building upon them. A policy which aims purely at maximizing innovation, via the use of IP rights, will almost certainly be detrimental to societal welfare, since it will ignore the negative consequences of extending IP on access to innovation and knowledge. As such, IP policy is about having “enough, but not too much”.

This basic point is often overlooked. To help minimize the risk of this occurring in future it is suggested that this basic purpose — of promoting the welfare of UK citizens — be explicitly embedded within the goals of organisations and departments tasked with handling policies related to innovation and IP.

Recommendation: Move away from a focus on intellectual property to look at innovation and information policy more widely

IP rights are but one tool for promoting innovation and often a rather limited one. The focus should be on the general problem — promoting societal welfare through innovation and access to innovation — not on one particular solution to that problem.

Provision and Pricing of Public Sector Information

Background

Public sector information (PSI) is information held by a public sector organisation, for example a government department or, more generally, any entity which is majority owned and/or controlled by government. Classic examples of public sector information in most countries include, among many others, geospatial data, meteorological information and official statistics.

While much of the data or information used in our society is supplied from outside the public sector, compared to other parts of the economy, the public sector plays an unusually prominent role. In many key areas, a public sector organization may be the only, or one among very few, sources of the particular information it provides (e.g. for geospatial and meteorological information). As such, the policies adopted regarding maintenance, access and re-use of PSI can have a very significant impact on the economy and society more widely.

Funding for public sector information can come from three basic sources: government, ‘updaters’ (those who update or register information) and ‘users’ (those who want to access and use it). Policy-makers control the funding model by setting charges to external groups (‘updaters’ or ‘users’) and committing to make up any shortfall (or receive any surplus) that results. Much of the debate focuses on whether ‘users’ should pay charges sufficient to cover most costs (average cost pricing) or whether they should be given marginal cost access, which equates to free access when the information is digital. However, this should not lead us to neglect the third source of funding: charges to ‘updaters’.

Policy-makers must also concern themselves with the regulatory structure in which public sector information holders operate. The need to provide government funding can raise major commitment questions, while the fact that many public sector information holders are the sole source of the information they supply raises serious competition and efficiency issues.

Recommendation: Make digital, non-personal, upstream PSI available at marginal cost (zero)

The case for pricing public sector information to users at marginal cost (equal to zero for digital data) is very strong for a number of complementary reasons. First, the distortionary costs of average rather than marginal cost pricing are likely to be high. Second, the case for hard budget constraints to ensure efficient provision and induce innovative product development is weak. As such, digital upstream public sector information is best funded out of a combination of ‘updater’ fees and direct government contributions with users permitted free and open access. Appropriately managed and regulated, this model offers major societal benefits from increased provision and access to information-based services while imposing a very limited funding burden upon government.
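To make the distortion argument concrete, here is a textbook-style Python sketch; it is not the model used in the Trading Funds report. It assumes a hypothetical linear demand curve q = a - b*p, zero marginal cost and a fixed cost of provision. Under average cost pricing, users must pay a price that recovers the fixed cost (net of any updater fees), and the users priced out represent lost welfare (the deadweight loss). Under marginal cost pricing the price is zero, all demand is served, and the shortfall is met by government. All function names and numbers are illustrative.

```python
import math

def average_cost_price(a: float, b: float, cost_to_recover: float) -> float:
    """Lowest price p with p * (a - b*p) >= cost_to_recover, for demand q = a - b*p."""
    disc = a * a - 4 * b * cost_to_recover
    if disc < 0:
        raise ValueError("no price recovers this cost from users alone")
    # Smaller root of b*p^2 - a*p + cost = 0: covers the cost while excluding the fewest users
    return (a - math.sqrt(disc)) / (2 * b)

def deadweight_loss(b: float, price: float) -> float:
    # With zero marginal cost, the b*price users priced out lose the triangle 0.5 * price * (b * price)
    return 0.5 * b * price * price

def compare_pricing(a: float, b: float, fixed_cost: float, updater_fees: float = 0.0) -> dict:
    """Compare average cost pricing with marginal cost (zero) pricing for a digital dataset."""
    net_cost = fixed_cost - updater_fees  # what users (or government) must cover
    p = average_cost_price(a, b, net_cost)
    return {
        "avg_cost_price": p,
        "avg_cost_users_served": a - b * p,
        "avg_cost_deadweight_loss": deadweight_loss(b, p),
        "marginal_cost_users_served": a,         # price zero: all demand is met
        "marginal_cost_govt_funding": net_cost,  # shortfall made up by government
    }

print(compare_pricing(a=100, b=1, fixed_cost=1000, updater_fees=100))
```

With these made-up numbers, average cost pricing sets a price of 10, serves 90 users instead of 100 and loses 50 units of welfare. The sketch deliberately ignores the cost of raising public funds, which a fuller analysis would weigh against the deadweight loss.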

Recommendation: Regulation should be transparent, independent and empowered. For every public sector information holder there should be a single, clear, source of regulatory authority and responsibility, and this ‘regulator’ should be largely independent of government.

This is essential if any pricing-policy is to work well and is especially important for marginal-cost pricing where the Government may be providing direct funding to the information holder. Policy-makers around the world have had substantial experience in recent years with designing these kinds of regulatory systems and this is, therefore, not an issue that should be especially difficult to address.

Copyright Term

Background

The optimal term of copyright has been a very live policy issue over the last decade. Recently, in the European Union, and especially in the UK, there has been much debate over whether to extend the term of copyright in sound recordings from its current 50 years.

The basic trade-off inherent in copyright is a simple one. On the one hand, increasing copyright protection yields benefits by stimulating the creation of new works but, on the other hand, it reduces access to existing works (the welfare ‘deadweight’ loss). Choosing the optimal term, that is the length of protection, presents these two countervailing forces particularly starkly. By extending the term of protection, the owners of copyrights receive revenue for a little longer. Anticipating this, creators of works which were nearly, but not quite, profitable under the existing term will now produce them, and these works will generate welfare for society both now and in the future. At the same time, the increase in term applies to all works, including existing ones, i.e. those created under the term of copyright before the extension. Extending term on these works prolongs the copyright monopoly and therefore reduces welfare by hindering access to, and reuse of, these works.

Recommendation: Reduce Copyright Term – And Certainly Do Not Extend It

Current copyright term is significantly over-extended. Calculations performed in the course of my own work indicate that the optimal copyright term is likely around 15 years and almost certainly below 40 (the breadth of these estimates is a direct reflection of existing data limitations, but even the upper bound is still (far) below existing terms).

Even a simple present-value calculation would indicate that the incentives for creativity today offered by extra term 50 years or more in the future are negligible — while the effect on access to knowledge can be very substantial, especially when term extensions are applied retrospectively (as they almost always are).
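The present-value point is easy to check. The Python sketch below assumes a hypothetical 7% annual discount rate and a flat revenue of one unit per year, and asks how much a 20-year extension (years 51 to 70) adds to the present value of the first 50 years of protection. Both the rate and the flat revenue profile are illustrative assumptions, not estimates from my work.

```python
def discount_factor(years: int, rate: float) -> float:
    """Present value of 1 unit of revenue received `years` from now."""
    return (1 + rate) ** -years

def pv_of_stream(first_year: int, last_year: int, rate: float) -> float:
    """Present value of 1 unit per year received in each year first_year..last_year inclusive."""
    return sum(discount_factor(t, rate) for t in range(first_year, last_year + 1))

rate = 0.07                         # hypothetical real discount rate
base = pv_of_stream(1, 50, rate)    # revenue over the first 50 years of term
extra = pv_of_stream(51, 70, rate)  # revenue from a hypothetical 20-year extension
print(f"discount factor at 50 years: {discount_factor(50, rate):.3f}")
print(f"extension adds {extra / base:.1%} to present value")
```

Under these assumptions a unit of revenue 50 years out is worth about 3.4% of a unit today, and the whole 20-year extension adds only around 2.6% to the present value of the first 50 years. If revenues decline over a work’s life, as they typically do, the figure would be smaller still.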

It is also noteworthy that recent extensions, such as that for authorial copyright in the US (the CTEA) and the proposed extension of recording copyright in the EU, have been opposed well-nigh unanimously by academic economists and other IP scholars. Policy-making in this area should be evidence-based and designed to promote the broader welfare of society as a whole. Policies that appear to reflect nothing more than special-interest lobbying will only perpetuate the “marked lack of public legitimacy” which the Gowers report lamented, discouraging those who wish to contribute constructively to future Government policy-making in these areas, and making enforcement ever harder: effective enforcement, after all, depends on consent born of respect as well as obedience coerced through punishment.

The lead article of Prospect Magazine’s February issue is a piece by James Crabtree and Tom Chatfield entitled “Mashing the State”. It’s an in-depth look at the recent launch of data.gov.uk and its place in the wider context of government policy in relation to information, as well as information’s relation to governance (that “mashing” of the state …).

Where Does My Money Go gets a mention as does the “Cambridge” paper on pricing models at trading funds.

This Wednesday (27th of January) at 1pm I’m giving one of Cambridge University Library’s regular lunch-time talks on Openness and Libraries. Attendance is free and anyone can come along!

Update (28th Jan): talk is done and slides are now up.

Blurb

Over the past few years, open licensing (http://www.opendefinition.org/) has facilitated the explosive growth of a ‘knowledge commons’. To give a few prominent examples: Open Access journals, Open Educational Resources and Open Data in scientific research have all been enabled by licenses which permit material to be freely re-used and re-distributed. This outpouring of support for openness has led to an incredible rise in community-led development and innovative uses.

Bibliographic records are a key part of our shared cultural heritage and essential to anyone working with cultural materials (books, music, films etc). Opening up those records for access and re-use offers a variety of benefits.

First, it would allow libraries to share records more efficiently and improve quality more rapidly through better, easier feedback. Second, easier access to catalogue data would spur development of the multifarious services, technologies and research that use that data, including, for example, search engines, book or music websites, researchers working on information production, journalists writing on orphan works, as well as many other areas we cannot even imagine in advance.

With a growing number of Government agencies and public institutions making data open, is it now time for the library community to do likewise?

Attended an interesting talk today: “Historical Banking Crises and the Rules of the Game” by Professor Charles Calomiris, Columbia Business School. Sporadic notes below. See also this Weaving History thread on Financial Crises.

Notes

  • One crisis with 20 different explanations. Need to sort these out a little.
  • If banks are uninsured then in a recession banks cut their supply of loans
    • Banks are facing losses, need to bulk up their balance sheet and can do it either by raising equity or cutting supply of loans. Former is hard so do the latter.
  • Crises aren’t just inherent to human nature or capitalism. “Crisis propensity reflects politically determined rules of the banking game that are conducive to crises:”
    1. industry setup that determines exposure of banks to risk
    2. absence of decent (effective and incentive compatible) central-banking (NB: 2 isn’t a big problem w/o 1)
    3. subsidization of risk by govt policies
  • Panic = moments of severe sudden withdrawal that threatened the system. Observable variable: collective action by NY clearing banks
    • In US (19th and early 20th c.): 1857, 1873, 1877, 1893, 1907 [ed: missing at least 2 and may have got wrong I think]
    • All 6 crises in the US after the Civil War were preceded by a 50% increase in liabilities and a 7% drop in the stock market
    • Britain: 1825, 1836, 1847, 1857, 1866 then none for over a century
  • Solvency crisis: -ve net worth of failed bank > 1% of GDP
    • 140 examples since 1978
    • Rare in past: 4 in 1873-1913
    • Australia: 1893 (10%)
    • Argentina: 1890 (10%)
    • Norway: 1900 (3%)
    • Italy: 1893 (1%)
  • Literature has converged in last 20 years to agree that safety-net provision on balance increases instability (rather than reducing it)
  • Crucial reform in 1858 in UK following 1857 crisis. BoE would no longer intervene in bills market. In 1866 made good on this promise when largest bill discounter went bust (Overend and Gurney)
  • Crisis origins:
    • Loose money: CBs, flat yield curve … (but note not enough for a crisis on own)
    • Housing subsidies delivered by leverage. F&F have $1.6 trillion out of $3 trillion total subprime. $350 billion cost on F&F alone.
    • Huge buy-side agency problems
      • Lots of buy-side people buying poor-quality material for clients, facilitated by a big race to the bottom at the ratings agencies
    • Prudential regulation failure
  • Everyone smart knew there was a subprime crisis in mid-2006.
  • Long-term regulatory reforms
    • Micro-prudential reform: focus on measurement of risk
    • Credit rating agency reform
    • Resolution policy/TBTF Problems

Size of the Public Domain III

November 26th, 2009

Here we are going to apply the results on Public Domain “proportions” derived in our previous post and thereby obtain best estimates of the UK public domain.

The logic is simple, and similar to that in our first post in the series: we take the Public Domain proportions from Table 3 of our last post and combine them with our (conservative) estimates for output based on library catalogues. Here are the results:

Pub. Date    Items     % PD   No. PD
1400-1850    304587    100    304587
1850-1860    40970     100    40970
1860-1870    43734     100    43734
1870-1880    50564     95     48035
1880-1890    66857     90     60171
1890-1900    66883     85     56850
1900-1910    70360     65     45734
1910-1920    60489     40     24195
1920-1930    78670     25     19667
1930-1940    90576     10     9057
1940-1950    72692     6      4361
1950-1960    118251    0      0
1960-1970    262974    0      0
1970-2009    2130509   0      0
Total        3458116   19     657361

UK Public Domain Totals Based on Cambridge University Library Data. Note: as discussed in previous posts, figures from the British Library are approximately 3x larger (both for Public Domain and total items).
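The arithmetic behind the table is simply each period’s item count multiplied by its PD proportion. A short Python sketch reproducing it from the figures above follows; rounding each period’s count down is an assumption here, though it matches every row of the “No. PD” column.

```python
# (publication period, items in the CUL catalogue, estimated % public domain)
ROWS = [
    ("1400-1850", 304587, 100),
    ("1850-1860", 40970, 100),
    ("1860-1870", 43734, 100),
    ("1870-1880", 50564, 95),
    ("1880-1890", 66857, 90),
    ("1890-1900", 66883, 85),
    ("1900-1910", 70360, 65),
    ("1910-1920", 60489, 40),
    ("1920-1930", 78670, 25),
    ("1930-1940", 90576, 10),
    ("1940-1950", 72692, 6),
    ("1950-1960", 118251, 0),
    ("1960-1970", 262974, 0),
    ("1970-2009", 2130509, 0),
]

def public_domain_counts(rows):
    """Return (period, items, pct, pd_items) tuples, rounding each PD count down."""
    # Integer arithmetic avoids floating-point surprises such as 50564 * 0.95
    return [(period, items, pct, items * pct // 100) for period, items, pct in rows]

results = public_domain_counts(ROWS)
total_items = sum(items for _, items, _, _ in results)
total_pd = sum(pd for _, _, _, pd in results)
print(f"{total_pd} of {total_items} items ({total_pd / total_items:.0%}) in the public domain")
```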


Total (Black) and Public Domain (Red) Items per year based on the CUL Catalogue.

Zooming in to the pre-1960 period to get more detail:


Total (Black) and Public Domain (Red) Items per Year based on the CUL Catalogue for pre-1960 period.

I’m one of the co-organizers of a workshop on Public Domain Calculators taking place next week, on the 10th and 11th of November, at Emmanuel College, University of Cambridge.

Hosted by the Open Knowledge Foundation in association with the Centre for Intellectual Property and Information Law at the University of Cambridge, it’s a meeting of European experts on copyright and the digital public domain taking place as part of the Communia project.

The purpose of the workshop is to produce materials such as legal flow charts and public domain “algorithms” which will help with the representation of different national copyright laws and the determination of public domain status.

Details of the meeting are as follows:

Background

There is often a tendency to talk of ‘the public domain’ and of works falling out of copyright and ‘into the public domain’ – as though there is a single set of works which are out of copyright all over the world. In fact, of course, there are different national laws about the nature and duration of copyright in different types of works – and hence what is in the public domain is different in different countries.

Efforts are currently underway to build a series of public domain calculators, which will help to determine whether or not a given work is in copyright in a given jurisdiction. At the time of writing, groups and individuals in more than 17 jurisdictions are assisting in this effort.
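To give a flavour of what such a calculator encodes, here is a deliberately simplified Python sketch. The jurisdiction labels and the `Work` structure are hypothetical, and it covers only three rules: for joint works the term is counted from the death of the last surviving author, for anonymous works it is counted from publication, and protection lasts to the end of its final calendar year. Life+70 (as in the UK and EU) and life+50 (the Berne Convention minimum) serve as example terms; real national rules contain many more special cases.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical jurisdiction labels mapped to the general term in years
TERMS = {
    "life+70": 70,  # e.g. UK and EU
    "life+50": 50,  # Berne Convention minimum
}

@dataclass
class Work:
    publication_year: int
    # Death year of each author; None means that author is still alive
    author_death_years: list[Optional[int]] = field(default_factory=list)
    anonymous: bool = False

def last_protected_year(work: Work, jurisdiction: str) -> Optional[int]:
    """Final calendar year of protection, or None while an author is still alive."""
    term = TERMS[jurisdiction]
    if work.anonymous or not work.author_death_years:
        # Anonymous works (or works with no known author): term counted from publication
        return work.publication_year + term
    if any(d is None for d in work.author_death_years):
        return None
    # Joint works: term runs from the death of the last surviving author
    return max(work.author_death_years) + term

def is_public_domain(work: Work, jurisdiction: str, year: int) -> bool:
    last = last_protected_year(work, jurisdiction)
    return last is not None and year > last

joint = Work(publication_year=1950, author_death_years=[1960, 1980])  # hypothetical work
print(is_public_domain(joint, "life+70", 2040))  # False: protected until the end of 2050
print(is_public_domain(joint, "life+50", 2040))  # True: protection ended with 2030
```

The same work can thus be in the public domain in one jurisdiction and not in another, which is exactly why per-jurisdiction calculators are needed.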