Category Archives: Innovation and Intellectual Property

Patents and Access to Medicines for HIV – a looming crisis

Patents and access to medicines, from a 2013 report on AIDS by Médecins Sans Frontières:

Because millions of people need to be initiated and sustained on treatment regimens for life, it is as critical as ever to ensure ARVs [anti-retro-virals] are affordable. Competition among generic producers was instrumental in bringing down the price of the first generation of ARVs, and is one of the key reasons treatment could be scaled up to millions of people. Today, first-line ART is available for just under US$100 per person per year (ppy), which is a 99% decrease from 2000, when treatments still under patent were priced at more than $10,000 ppy.

But the situation today is different and the progress achieved is once again under threat. Key countries, especially India, where generics are produced, now grant medicine patents in order to comply with their international obligations as members of the World Trade Organization (WTO) [medicine patents re-introduced in 2005 after being abolished in 1970]. Newer ARVs are already patented in these countries, meaning that production of affordable generic medicines is now restricted, keeping monopoly prices high.

With upwards of 55 million people expected to need ARV therapy by the year 2030, global patent rules are contributing to a looming crisis as current drugs lose their effectiveness and their newer, patented replacements are priced out of reach for all but the wealthy.

Talking at Legal Aspects of Public Sector Information (LAPSI) Conference in Milan

This week on Thursday and Friday I’ll be in Milan to speak at the 1st LAPSI (Legal Aspects of Public Sector Information) Primer & Public Conference.

I’m contributing to a “primer” session on The Perspective of Open Data Communities and then giving a conference talk on Collective Costs and Benefits in Opening PSI for Re-use in a session on PSI Re-use: a Tool for Enhancing Competitive Markets where I’ll be covering work by myself and others on pricing and regulation of PSI (see e.g. the “Cambridge Study” and the paper on the Economics of the Public Sector of Information).

Update: slides are up.

Community, Openness And Technology

PSI: Costs And Benefits Of Openness

Creative Commons and the Commons

Background: I first got involved with Creative Commons (CC) in 2004 soon after its UK chapter started. Along with Damian Tambini, the then UK ‘project lead’ for CC, and the few other members of ‘CC UK’, I spent time working to promote CC and its licenses in the UK (and elsewhere). By mid-2007 I was no longer very actively involved and to most intents and purposes was no longer associated with the organization. I explain this to give some background to what follows.

Creative Commons as a brand has been fantastically successful and is now very widely recognized. While in many ways this success has been beneficial for those interested in free/open material it has also raised some issues that are worth highlighting.

Creative Commons is not a Commons

Ironically, despite its name, Creative Commons, or more precisely its licenses, do not produce a commons. The CC licenses are not mutually compatible: for example, material with a CC Attribution-Sharealike (by-sa) license cannot be intermixed with material licensed under any of the CC NonCommercial licenses (e.g. Attribution-NonCommercial, Attribution-Sharealike-Noncommercial).
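To make the incompatibility concrete, here is a toy sketch in Python. The compatibility table is a simplification of my own for illustration, not CC's official rules: the point is only that intermixing requires every pair of input licenses to be compatible, and the ShareAlike/NonCommercial pair is not.

```python
# Hypothetical, simplified compatibility table (NOT CC's official rules):
# a pair is listed if material under the two licenses can be combined
# in a single derived work.
COMPATIBLE = {
    ("CC-BY", "CC-BY"), ("CC-BY", "CC-BY-SA"), ("CC-BY", "CC-BY-NC"),
    ("CC-BY-SA", "CC-BY-SA"),
    ("CC-BY-NC", "CC-BY-NC"),
    # no ("CC-BY-SA", "CC-BY-NC") entry: ShareAlike requires the derived
    # work to carry the same license, which NonCommercial terms forbid
}

def can_intermix(licenses):
    """True if every pair of input licenses is mutually compatible."""
    ls = list(licenses)
    return all(
        (a, b) in COMPATIBLE or (b, a) in COMPATIBLE
        for i, a in enumerate(ls)
        for b in ls[i + 1:]
    )

print(can_intermix(["CC-BY", "CC-BY-SA"]))     # True
print(can_intermix(["CC-BY-SA", "CC-BY-NC"]))  # False
```

The check is pairwise on purpose: a "commons" in the sense used here is exactly a pool of material within which every such check succeeds.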

Given that a) the majority of CC licenses in use are ‘non-commercial’ and b) there is also large usage of ShareAlike (e.g. Wikipedia), this is an issue that affects a large set of ‘Creative Commons’ material.

Unfortunately, the presence of the word ‘Commons’ in CC’s name and the prominence of ‘remix’ in the advocacy around CC tends to make people think, falsely, that all CC licenses are in some way similar or substitutable.

The ‘Brand’ versus the Licenses

More and more frequently I hear people say (or more significantly write) things like: “This material is CC-licensed”. But as just discussed there is large, and very significant, variation in the terms of the different CC licenses. It appears that for many people the overall ‘Brand’ dominates the actual specifics of the licenses.

This is in marked contrast to the Free/Open Source software community, where even in the case of the Free Software Foundation’s licenses people tend to specify the exact license they are talking about.

Standards and interoperability are what really matter for licenses (cf the “Commons” terminology). Licensing and rights discussions are pretty dull for most people — and should be. They are important only because they determine what you and I can and can’t do, and specifically what material you and I can ‘intermix’ — possible only where licenses are ‘interoperable’.

To put it the other way round: licenses are interoperable if you can intermix freely material licensed under one of those licenses with material licensed under another. This interoperability is crucial and it is, in license terms, what underlies a true commons.

More broadly we are interested in a ‘license standard’: in knowing not only that a set of licenses are interoperable, but that they all allow certain things, for example for anyone to use, reuse and redistribute the licensed material (or, to put it in terms of freedom, that they guarantee those freedoms to users). This very need for a standard is why we created the Open Definition for content and data, building directly on the work on a similar standard (the Open Source Definition) in the Free/Open Source software community.

The existence of non-commercial

CC took a crucial decision in including NonCommercial licenses in their suite. Given the ‘Brand’ success of Creative Commons, the effect of including NC licenses has been to give them a status close to, if not identical with, the truly open, commons-supporting licenses in the CC suite.

There is a noticeable difference here with the software world, where non-commercial licensing is also active, but under the ‘freeware’ and ‘shareware’ labels (terms that aren’t always used consistently), and with this material clearly distinguished from the Free/Open Source software community.

As the CC brand has grown, there is a desire by some individuals and institutions to use CC licenses simply because they are CC licenses (this is also encouraged by the baking-in of CC licenses to many products and services). Faced with choosing a license, many people, and certainly many institutions, tend to go for the most restrictive option available (especially when the word commercial is in there — who wants to sanction exploitation for gain of their work by some third party!). Thus, it is no surprise that non-commercial licenses appear to be by far the most popular.

Without the NC option, some of these people would have chosen one of the open CC licenses instead. Of course, some would not have licensed at all (or, at least not with a CC license), sticking with pure copyright or some other set of terms. Nevertheless, the benefit in gaining a clear dividing line, and in creating brand-pressure for a real commons, and real openness would have been substantial, and worth, in my opinion, the loss of the non-commercial option.

Structure and community

It is notable in the F/OSS community that most licenses, especially the most popular, are either not ‘owned’ by anyone (MIT/BSD) or are run by an organization with a strong community base (e.g. the Free Software Foundation). Creative Commons seems rather different. While there are public mailing lists, ultimately decisions regarding the licenses, and about crucial features thereof such as compatibility with 3rd-party licenses, remain with CC central, based in San Francisco.

Originally, a fair amount of autonomy was given to country projects, but over time this autonomy has gradually been reduced (there are good reasons for this, such as a need for greater standardization across licenses). This has concrete effects on the terms in licenses.

For example, for v3.0 the Netherlands team was asked to remove provisions which included things like database rights in their share-alike provision and instead standardize on a waiver for these additional rights (rights which are pretty important if you are doing data(base) licensing). Most crucially, the CC licenses reserve to Creative Commons as an organization the right to determine compatibility decisions. This is arguably the single most important aspect of licensing, at least in respect of interoperability and the Commons.

Creative Commons and Data

Update: as of September 2011 there has been further discussion between Open Data Commons and Creative Commons on these matters, especially regarding interoperability and Creative Commons v4.0.

From my first involvement in the ‘free/open’ area, I’d been interested in data licensing, both because of personal projects and requests from other people.

When first asked how to deal with this I’d recommended ‘modding’ a specific CC license (e.g. Attribution-Sharealike) to include provisions for data and data(bases). However, starting from 2006 there was a strong push from John Wilbanks, then at Science Commons but with the apparent backing of CC generally, against this practice as part of a general argument for ‘PD-only’ for data(bases) (with the associated implication that the existing CC licenses were content-only). While I respect John, I didn’t really agree with his arguments about PD-only and furthermore it was clear that there was a need in the community for open but non-PD licenses for data(bases).

In late 2007 I spoke with Jordan Hatcher and learned about the work he and Charlotte Waelde were doing for Talis to draft a new ‘open’ license for data(bases). I was delighted and started helping Jordan with these licenses — licenses that became the Open Data Commons PDDL and the ODbL. We sought input from CC during the drafting of these licenses, specifically the ODbL, but the primary response we had (from John Wilbanks and colleagues) was just “don’t do this”.

Once the ODbL was finalized we then contacted CC further about potential compatibility issues.

The initial response then was that, as CC did not recommend use of its licenses (other than CCZero) for data(bases), there should not be an issue: as with CC licenses and software, there should be an ‘orthogonality’ of activity — CC licenses would license content, F/OSS licenses would license code, and data(base) licenses (such as the ODC ones) would license data. We pressed the point and had a phone call about it with Diane Peters and John Wilbanks in January 2010, with a follow-up email detailing the issues a bit later.

We’ve also explained on several occasions to senior members of CC central our desire to hear from CC on this issue and our willingness to look at ways to make any necessary amendments to ODC licenses (though obviously such changes would be conditional on full scrutiny by the Advisory Council and consultation with the community).

No response has been forthcoming. To this date, over a year later, we have yet to receive any response from CC, despite having been promised one at least three times (we’ve basically given up asking).

Further to this lack of response, and without any notice to or discussion with ODC, CC recently put out a blog post in which they stated, in marked contrast to previous statements, that CC licenses were entirely suited to data. In many ways this is a welcome step (cf. my original efforts to use CC licenses for data above), but CC have made no statement about a) how they would seek to address data properly or b) the relationship of these efforts to existing work in Open Data Commons, especially re. the ODbL. One can only assume, at least in the latter case, that the omission was intentional.

All of this has led me, at least, to wonder what exactly CC’s aims are here. In particular, is CC genuinely concerned with interoperability (beyond a simple ‘everyone uses CC’) and the broader interests of the community who use and apply their licenses?


Creating a true commons for content and data is incredibly important (it’s one of the main things I work on day to day). Creative Commons have done amazing work in this area but as I outline above there is an important distinction between the (open) commons and CC licenses.

Many organisations, institutions, governments and individuals are currently making important decisions about licensing and legal tools – in relation to opening up everything from scientific information, to library catalogues to government data. CC could play an important role in the creation of an interoperable commons of open material. The open CC licenses (CC0, CC-BY and CC-BY-SA) are an important part of the legal toolbox which enables this commons.

I hope that CC will be willing to engage constructively with others in the ‘open’ community to promote licenses and standards which enable a true commons, particularly in relation to data where interoperability is especially crucial.

Progress in the last 3 months

As part of my Shuttleworth Fellowship I’m preparing quarterly reports on what I’ve been up to. So, herewith are some highlights from the last 3 months.

Talks and Events

Open Data Projects


Papers on the Size and Value of EU Public Domain

I’ve just posted two new papers on the size and ‘value’ of the EU Public Domain. These papers are based on the research done as part of the Public Domain in Europe (EUPD) Research Project (which has now been submitted).

  • Summary Slides Covering Size and Value of the Public Domain – Talk at COMMUNIA in Feb 2010
  • The Size of the EU Public Domain

    This paper reports results from a large recent study of the public domain in the European Union. Based on a combination of catalogue and survey data our figures for the number of items (and works) in the public domain extend across a variety of media and provide one of the first quantitative estimates of the ‘size’ of the public domain in any jurisdiction. We find that for books and recordings the public domain is around 10-20% of published extant output and would consist of millions and hundreds of thousands of items respectively. For films the figure is dramatically lower (almost zero). We also establish some interesting figures relevant to the orphan works debate such as the number of catalogue entries without any identified author (approximately 10%).

  • The Value of the EU Public Domain

    This paper reports results from a large recent study of the public domain in the European Union. Based on a combination of catalogue, commercial and survey data we present detailed figures both on the prices (and price differences) of in copyright and public domain material and on the usage of that material. Combined with the estimates for the size of the EU public domain presented in the companion paper our results allow us to provide the first quantitative estimate for the `value’ of the public domain (i.e. welfare gains from its existence) in any jurisdiction. We also find clear, and statistically significant, differences between the prices of in-copyright and public-domain material in the two areas for which we have significant data: books and sound recordings in the UK. Patterns of usage indicate a significant demand for public domain material but limitations of the data make it difficult to draw conclusions on the impact of entry into the public domain on demand.

The results on price differences are particularly striking as, to my knowledge, this is by far the largest analysis done to date. More significantly, they clearly show that the claim in the Commission’s impact assessment that there was no price effect of copyright (compared to the public domain) was wrong. That claim was central to the impact assessment and to the proposal to extend copyright term in sound recordings (a claim that was based on a single study using a very small sample, performed by PwC as part of a music-industry-sponsored piece of consultancy for submission to the Gowers review).

Policy Recommendations in the Area of Innovation, Creativity and IP

I was recently asked to put together a short document outlining my main policy recommendations in the area of “innovation, creativity and IP”. Below is what I prepared.

General IP Policy

Recommendation: IP policy, and more generally innovation policy, should aim at the improvement of the overall welfare of UK society and citizens and not just at promoting innovation and creativity

Innovation is, of course, a major factor in the improvement of societal welfare — but not the only factor, access to the fruits of that innovation is also important.

IP rights are monopolies and such monopolies when over-extended do harm rather than good. The provision of IP rights must balance the promotion of innovation and creativity with the need for adequate access to the results of those efforts both by consumers and those who would seek to innovate and create by building upon them. A policy which aims purely at maximizing innovation, via the use of IP rights, will almost certainly be detrimental to societal welfare, since it will ignore the negative consequences of extending IP on access to innovation and knowledge. As such, IP policy is about having “enough, but not too much”.

This basic point is often overlooked. To help minimize the risk of this occurring in future it is suggested that this basic purpose — of promoting the welfare of UK citizens — be explicitly embedded within the goals of organisations and departments tasked with handling policies related to innovation and IP.

Recommendation: Move away from a focus on intellectual property to look at innovation and information policy more widely

IP rights are but one tool for promoting innovation and often a rather limited one. The focus should be on the general problem — promoting societal welfare through innovation and access to innovation — not on one particular solution to that problem.

Provision and Pricing of Public Sector Information


Public sector information (PSI) is information held by a public sector organisation, for example a government department or, more generally, any entity which is majority owned and/or controlled by government. Classic examples of public sector information in most countries would include, among many others: geospatial data, meteorological information and official statistics.

While much of the data or information used in our society is supplied from outside the public sector, compared to other parts of the economy, the public sector plays an unusually prominent role. In many key areas, a public sector organization may be the only, or one among very few, sources of the particular information it provides (e.g. for geospatial and meteorological information). As such, the policies adopted regarding maintenance, access and re-use of PSI can have a very significant impact on the economy and society more widely.

Funding for public sector information can come from three basic sources: government, ‘updaters’ (those who update or register information) and ‘users’ (those who want to access and use it). Policy-makers control the funding model by setting charges to external groups (‘updaters’ or ‘users’) and committing to make up any shortfall (or receive any surplus) that results. Much of the debate focuses on whether ‘users’ should pay charges sufficient to cover most costs (average cost pricing) or whether they should be given marginal cost access — which equates to free when the information is digital. However, this should not lead us to neglect the third source of funding via charges for ‘updates’.
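As a rough sketch of the accounting involved (all numbers here are invented for illustration), the choice of user charge simply determines how the fixed cost is split between users and government once updater fees are set:

```python
# Hypothetical figures for a PSI holder's annual accounts.
FIXED_COST = 10_000_000   # cost of collecting/maintaining the data
UPDATER_FEES = 4_000_000  # revenue from registration/update charges
N_USERS = 50_000          # expected number of paying users

def funding(user_charge):
    """Return (revenue from users, government top-up) for a given charge."""
    user_revenue = user_charge * N_USERS
    shortfall = max(FIXED_COST - UPDATER_FEES - user_revenue, 0)
    return user_revenue, shortfall

# Average-cost pricing: users cover whatever updater fees do not.
avg_cost_charge = (FIXED_COST - UPDATER_FEES) / N_USERS  # 120 per user
print(funding(avg_cost_charge))  # users pay it all, no top-up needed

# Marginal-cost pricing: digital data priced at zero, government fills the gap.
print(funding(0))  # (0, 6000000)
```

The identity is trivial, but it makes the policy question crisp: the debate over ‘free data’ is really a debate over who fills the shortfall line.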

Policy-makers must also concern themselves with the regulatory structure in which public sector information holders operate. The need to provide government funding can raise major commitment questions, while the fact that many public sector information holders are the sole source of the information they supply raises serious competition and efficiency issues.

Recommendation: Make digital, non-personal, upstream PSI available at marginal cost (zero)

The case for pricing public sector information to users at marginal cost (equal to zero for digital data) is very strong for a number of complementary reasons. First, the distortionary costs of average rather than marginal cost pricing are likely to be high. Second, the case for hard budget constraints to ensure efficient provision and induce innovative product development is weak. As such, digital upstream public sector information is best funded out of a combination of ‘updater’ fees and direct government contributions with users permitted free and open access. Appropriately managed and regulated, this model offers major societal benefits from increased provision and access to information-based services while imposing a very limited funding burden upon government.
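The first point, about distortionary cost, can be illustrated with a hypothetical linear demand curve (the parameters are invented; the logic is just that with zero marginal cost, any positive price excludes users whose valuation is positive but below the price, and the lost surplus grows with the square of the price):

```python
# Hypothetical linear demand for access to a digital dataset: Q(p) = A - B*p,
# with marginal cost of serving an extra user equal to zero.
A, B = 100_000, 1_000

def quantity(p):
    """Number of users who access the data at price p."""
    return max(A - B * p, 0)

def deadweight_loss(p):
    """Surplus lost relative to free access: the triangle 0.5 * B * p**2."""
    return 0.5 * B * min(p, A / B) ** 2

for price in (0, 10, 50):
    print(price, quantity(price), deadweight_loss(price))
# the loss is zero at marginal-cost (free) access and grows quadratically
```

This is why moving from average-cost to marginal-cost pricing matters more the higher the average-cost charge has to be.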

Recommendation: Regulation should be transparent, independent and empowered. For every public sector information holder there should be a single, clear, source of regulatory authority and responsibility, and this ‘regulator’ should be largely independent of government.

This is essential if any pricing-policy is to work well and is especially important for marginal-cost pricing where the Government may be providing direct funding to the information holder. Policy-makers around the world have had substantial experience in recent years with designing these kinds of regulatory systems and this is, therefore, not an issue that should be especially difficult to address.

Copyright Term


The optimal term of copyright has been a very live policy issue over the last decade. Recently, in the European Union, and especially in the UK, there has been much debate over whether to extend the term of copyright in sound recordings from its current 50 years.

The basic trade-off inherent in copyright is a simple one. On the one hand, increasing copyright yields benefits by stimulating the creation of new works but, on the other hand, it reduces access to existing works (the welfare ‘deadweight’ loss). Choosing the optimal term, that is the length of protection, presents these two countervailing forces particularly starkly. By extending the term of protection, the owners of copyrights receive revenue for a little longer. Anticipating this, creators of work which were nearly, but not quite, profitable under the existing term will now produce work, and this work will generate welfare for society both now and in the future. At the same time, the increase in term applies to all works including existing ones — those created under the term of copyright before extension. Extending term on these works prolongs the copyright monopoly and therefore reduces welfare by hindering access to, and reuse of, these works.

Recommendation: Reduce Copyright Term – And Certainly Do Not Extend It

Current copyright term is significantly over-extended. Calculations performed in the course of my own work indicate that the optimal copyright term is likely around 15 years and almost certainly below 40 (the breadth of the estimates here is a direct reflection of the existing data limitations, but this upper bound is still (far) below existing terms).

Even a simple present-value calculation would indicate that the incentives for creativity today offered by extra term 50 years or more in the future are negligible — while the effect on access to knowledge can be very substantial, especially when term extensions are applied retrospectively (as they almost always are).
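That back-of-the-envelope calculation can be made explicit. A sketch, assuming (purely for illustration) a 7% discount rate and a flat £1 of revenue per year:

```python
# Present value today of money received some years from now.
def pv(amount, years_ahead, rate=0.07):
    return amount / (1 + rate) ** years_ahead

# A 20-year extension tacked onto a 50-year term: revenue in years 51-70.
extension_value = sum(pv(1, t) for t in range(51, 71))
# The existing term: revenue in years 1-50.
base_value = sum(pv(1, t) for t in range(1, 51))

print(round(extension_value, 2))               # 0.36 -- pennies on the pound
print(round(extension_value / base_value, 3))  # 0.026 -- under 3% extra incentive
```

Twenty extra years of revenue are worth about 36p today per £1-per-year of future earnings, adding under 3% to the incentive the existing term already provides, which is the sense in which the marginal incentive of extension is negligible.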

It is also noteworthy that recent extensions, such as that for authorial copyright in the US (the CTEA) and the proposed extension of recording copyright in the EU, have been opposed well-nigh unanimously by academic economists and other IP scholars. Policy-making in this area should be evidence-based and designed to promote the broader welfare of society as a whole. Policies that appear to reflect nothing more than special-interest lobbying will only perpetuate the “marked lack of public legitimacy” which the Gowers report lamented, discouraging those who wish to contribute constructively to future Government policy-making in these areas, and making enforcement ever harder — effective enforcement, after all, depends on consent borne of respect as well as obedience coerced through punishment.

Prospect Magazine Article: Mashing the State

The lead article of Prospect Magazine’s February issue is a piece by James Crabtree and Tom Chatfield entitled “Mashing the State”. It’s an in-depth look at the recent launch and its place in the wider context of government policy in relation to information — as well as information’s relation to governance (that “mashing” of the state …).

Where Does My Money Go gets a mention as does the “Cambridge” paper on pricing models at trading funds.

Exploring Patterns of Knowledge Production

I’m posting up some work-in-progress entitled Exploring Patterns of Knowledge Production (link to full pdf) that follows up on my earlier post of a year and a bit ago. Below I’ve excerpted the introduction plus the list of motivational questions. Comments (and critique) very welcome!

Exploring Patterns of Knowledge Production Paper ‘Alpha’ (pdf)


In what follows the term ‘knowledge’ is used broadly to signify all forms of information production, including those involved in technological innovation, cultural creativity and academic advance.

Today, thanks to rapid advances in IT, we have available substantial datasets pertaining both to the extent and the structure of knowledge production across disciplines, space and time.

Especially recent is the availability of good ‘structural’ data — that is data on the linkages and relationships of different pieces of knowledge, for example as provided by citation information. This new material allows us to explore the “patterns of knowledge production” in deeper and richer ways than ever previously possible and often using entirely new methods.

For example, it has long been accepted that innovation and creativity are cumulative processes, in which new ideas build upon old. However, other than anecdotal and case-study material provided by historians of ideas and sociologists of science there has been little data with which to study this issue — and almost none of a comprehensive kind that would make possible a systematic examination.

However, the recent availability of comprehensive databases containing ‘citation’ information has allowed us to begin really examining the extent to which new work (be it a new technology as represented by a patent, or a new idea in academia as represented by a paper) builds upon old.

Similar opportunities present themselves in relation to identifying the creation of new fields of research or technology, and tracing their evolution over time. Here the existence of extensive “structural information” as presented, for example, by citation databases, enables new systematic approaches. For example, can new fields be identified (or perhaps defined) as points in ‘knowledge space’ far away from the existing loci of effort? Or, alternatively, by the nature of their connections to the existing body of work?

Structural information of this kind can also be used in charting other changes in the life-cycle of knowledge creation. For example, to offer a specific conjecture, a field entering decline, though still exhibiting a similar level of output (papers etc) and even citations to a field in rude health, may display a citation structure which is markedly different — for example, more clustered within the field itself. Thus, by using this additional structural information we may be able to gain insights not available with simpler approaches.

At the same time, structure must also play a central role in any attempt to estimate knowledge-related ‘output’ measures. This is of course not true for other forms of ‘output’, for example that of corn or steel, where we have relatively well-defined objective measures available: tonnes of such-and-such a quality.

But knowledge is different: the most obvious metrics, such as number of patents or papers produced, seem entirely inadequate: one particular innovation or paper may be ‘worth’ as much as a hundred or a thousand others.

The issue here is that, compared to corn or steel, knowledge is extremely inhomogeneous, or put slightly differently, quality (or significance) differs very substantially across the individual pieces of knowledge (papers, patents etc).

Thus, any serious attempt to measure the progress of knowledge must find some way to do this quality-adjustment, and structural information seems essential to this.

What specific questions might we explore with such datasets?

The following is a (non-exhaustive) list of the kinds of questions one might explore using these new datasets:

  • Can we use structure to infer information about quality of individual items? Clearly the answer is yes, for example by using a citation-based metric where a work’s value is estimated based on its citation by others.
  • Can we then use this information, together with the more global structure of the production network, to gain a better idea of total (quality-adjusted) output? This would allow one to chart progress, or the lack of it, over time.
  • Can we use structural information to investigate the life-cycle of fields? For example, can we see fields ‘dying out’ or the onset of diminishing returns? Can we see new fields coming into existence and their initial growth patterns?
  • What about productivity per capita and its variation across the population? It is likely that one would need to focus here within a discipline as it would be difficult to directly compare across disciplines, at least when using quality adjusted productivity.
  • Do the structures of knowledge production vary over time and across disciplines and does this have implications for their productivity? Can we compare the structure of evolution in technology or economics with that in ‘natural’ evolution and, if not, what are the primary differences?
  • How do other (observable) attributes related to the producers of knowledge (their collaboration with others, their geographical location) affect the structures we observe and the associated outcomes (output, productivity) already discussed above?
  • Do different policies (for example openness vs. closedness — weak vs. strong IP) have implications for the structure of production and hence for output and productivity?
  • Is knowledge production (in a particular area) ergodic or path-dependent? Crudely: do we always end up in the same place or do small shocks have large long-term effects?


Update: 2011-01-31: have now broken out data work into dedicated repos on bitbucket.

The Knowledge Commons is Different

I was looking again recently at “Understanding the Knowledge Commons” which I had perused previously.

While reading the introductory chapter by Hess and Ostrom I came across:

People started to notice behaviors and conditions on the web — congestion, free riding, conflict, overuse, and “pollution” — that had long been identified with other types of commons. They began to notice that this new conduit of distributing information was neither a private nor strictly a public resource.

I think they are absolutely right to consider the analogies of “knowledge commons” with traditional commons. However, and at the same time, I think it essential to emphasize that “knowledge commons” are also fundamentally different.

The key difference here is in the nature of the underlying good that makes up the commons: in traditional cases the good is some physical resource — seas, rivers, land — to which usage is shared (either de facto or de jure), while in the knowledge case, well, it’s knowledge!

Now physical resources are by their nature ‘rival’ (or ‘subtractable’ as the authors put it), that is your usage and my usage are substitutes — your usage reduces the amount available for me to use and, when we are close to capacity, is strictly rival — either I use it or you use it. Knowledge, however, is a classic example of a non-rival resource: when you learn something from me I’ve lost nothing but you’ve gained something.

This means, for example, that the classic ‘tragedy’ of the commons where overuse leads to destruction of the resource is simply not possible for a knowledge commons — in fact, knowledge is like some magical food from a fairytale where the more it’s used the more of it there is!

The more useful ‘commons’ analogy for knowledge is not in relation to use but to production and the ‘free-rider’ problems that can arise where something must be done by a team or community. The issue here is that a separation appears between your effort (private) and the resulting outcome (shared), which may lead to an under-supply of effort and ‘free-riding’ on the efforts of others (if there are ten people on guard duty late at night, one can probably take a nap without endangering the city, but if all ten of them do it the result could be disastrous).
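The guard-duty arithmetic is worth making explicit (assuming, purely for the sake of the example, that each guard naps independently with probability p): the individually tempting nap is nearly harmless, but the collective risk rises steeply.

```python
# Probability that all n guards are asleep at once, if each naps
# independently with probability p (a deliberately crude model).
def all_asleep(p, n=10):
    return p ** n

for p in (0.1, 0.5, 0.9):
    print(p, all_asleep(p))
# rare napping makes disaster vanishingly unlikely; habitual napping
# by everyone (free-riding on everyone else's vigilance) does not
```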


1. Before any misunderstanding arises I should make clear that the authors also acknowledge the role of rival/non-rival distinction — Ostrom, in fact, was one of the ‘coiners’ of the term rivalry. However, the article’s overall focus is on the analogies with the traditional commons.

2. Jamie Boyle has talked about the “second enclosure movement”. Though it is interesting to make this analogy, I think references to the original enclosure movement are unfortunate for two reasons. First, it reinforces the mistaken analogy between knowledge and physical goods. Second, the evidence that the original enclosure movement was bad isn’t very compelling (in fact, it probably delivered net benefits).