As announced on Friday on the UK Government’s data.gov.uk, I am one of the members of the UK Government’s newly formed Public Sector Transparency Board.

From the announcement:

The Public Sector Transparency Board, which was established by the Prime Minister, met yesterday for the first time.

The Board will drive forward the Government’s transparency agenda, making it a core part of all government business and ensuring that all Whitehall departments meet the new tight deadlines set for releasing key public datasets. In addition, it is responsible for setting open data standards across the whole public sector, listening to what the public wants and then driving through the opening up of the most needed data sets.

Chaired by Francis Maude, the Minister for the Cabinet Office, the other members of the Transparency Board are Sir Tim Berners-Lee, inventor of the World Wide Web, Professor Nigel Shadbolt from Southampton University, an expert on open data, Tom Steinberg, founder of mySociety, and Dr Rufus Pollock from Cambridge University, an economist who helped found the Open Knowledge Foundation.

In the words of Francis Maude:

“In just a few weeks this Government has published a whole range of data sets that have never been available to the public before. But we don’t want this to be about a few releases, we want transparency to become an absolutely core part of every bit of government business. That is why we have asked some of the country’s and the world’s greatest experts in this field to help us take this work forward quickly here in central government and across the whole of the public sector.”

Yesterday, in a speech on “Building Britain’s Digital Future”, UK Prime Minister Gordon Brown announced wide-ranging plans to open up UK government data. In addition to a general promise to extend the existing commitments to “make public data public” the PM announced:

  • The opening up of a large and important set of transport data (the NaPTAN dataset)
  • A commitment to open up a significant amount of Ordnance Survey data from the 1st April (though exactly which datasets has not yet been specified)
  • By the Autumn an online e-”domesday” book giving “an inventory of all non-personal datasets held by departments and arms-length bodies”
  • A new “institute” for web science headed by Tim Berners-Lee and Nigel Shadbolt and with an initial £30m in funding

This speech is a significant indication of a further commitment to the “making public data public” policy announced in the Autumn.

It’s great to see this because a year ago it seemed as if government policy was set to largely ignore the research in the Models of Public Sector Information Provision by Trading Funds report (authored by myself, David Newbery and Professor Bently back in 2008), whose basic conclusion was that government data which was digital, bulk and ‘upstream’ should be made available at marginal cost.

More detailed excerpts (with emphasis added)

Opening up data

In January we launched data.gov.uk, a single, easy-to-use website to access public data. And even in the short space of time since then, the interest this initiative has attracted – globally – has been very striking. The site already has more than three thousand data sets available – and more are being added all the time. And in the past month the Office for National Statistics has opened up access for web developers to over two billion data items right down to local neighbourhood level.

The Department for Transport and the transport industry are today making available the core reference datasets that contain the precise names and co-ordinates of all 350 thousand bus stops, railway stations and airports in Britain.

Public transport timetables and real-time running information is currently owned by the operating companies. But we will work to free it up – and from today we will make it a condition of future franchises that this data will be made freely available.

And following the strong support in our recent consultation, I can confirm that from 1st April, we will be making a substantial package of information held by ordnance survey freely available to the public, without restrictions on re-use. Further details on the package and government’s response to the consultation will be published by the end of March.

e-Domesday Book

And I can also tell you today that in the autumn the Government will publish online an inventory of all non-personal datasets held by departments and arms-length bodies – a “domesday book” for the 21st century.

The programme will be managed by the National Archives and it will be overseen by a new open data board which will report on the first edition of the new domesday book by April next year. The Government will then produce its detailed proposals including how this work can be extended to the wider public sector.

To inform the continuing development of making public data public, the National Archives will produce a consultation paper on a definition of the “public task” for public data, to be published later this year.

The new domesday book will for the first time allow the public to access in one place information on each set of data including its size, source, format, content, timeliness, cost and quality. And there will be an expectation that departments will release each of these datasets, or account publicly for why they are not doing so.

Any business or individual will be free to embed this public data in their own websites, and to use it in creative ways within their own applications.

Mygov

So our goal is to replace this first generation of e-government with a much more interactive second generation form of digital engagement which we are calling Mygov.

Companies that use technology to interact with their users are positioning themselves for the future, and government must do likewise. Mygov marks the end of the one-size-fits-all, man-from-the-ministry-knows-best approach to public services.

Mygov will constitute a radical new model for how public services will be delivered and for how citizens engage with government – making interaction with government as easy as internet banking or online shopping. This open, personalised platform will allow us to deliver universal services that are also tailored to the needs of each individual; to move from top-down, monolithic websites broadcasting public service information in the hope that the people who need help will find it – to government on demand.

And rather than civil servants being the sole authors and editors, we will unleash data and content to the community to turn into applications that meet genuine needs. This does not require large-scale government IT Infrastructure; the ‘open source’ technology that will make it happen is freely available. All that is required is the will and willingness of the centre to give up control.

This Thursday (11th March) I’m speaking at the Forum Virium’s Open Up the City event in Helsinki.

This year their focus is on “open data, design, interfaces and innovation” and I’m speaking under the title “Open Data: What, Why, How?”.

This Wednesday (27th of January) at 1pm I’m giving one of Cambridge University Library’s regular lunch-time talks on Openness and Libraries. Attendance is free and anyone can come along!

Update (28th Jan): talk is done and slides are now up.

Blurb

Over the past few years, open licensing (http://www.opendefinition.org/) has facilitated the explosive growth of a ‘knowledge commons’. To give a few prominent examples: Open Access journals, Open Educational Resources and Open Data in scientific research have all been enabled by licenses which permit material to be freely re-used and re-distributed. This outpouring of support for openness has led to an incredible rise in community-led development and innovative uses.

Bibliographic records are a key part of our shared cultural heritage and essential to anyone working with cultural materials (books, music, films etc). Opening up those records for access and re-use would offer a variety of benefits.

First, it would allow libraries to share records more efficiently and improve quality more rapidly through better, easier feedback. Second, easier access to catalogue data would spur development of the multifarious services, technologies and research that use that data, including, for example, search engines, book or music websites, researchers working on information production, journalists writing on orphan works, as well as many other areas we cannot even imagine in advance.

With a growing number of Government agencies and public institutions making data open, is it now time for the library community to do likewise?

Open Notebook Social Science

October 22nd, 2009

The other day I posted up some work-in-progress on the subject of patterns of knowledge production.

That material is still in a fairly preliminary state. However, releasing it in this form was a conscious decision and part of an ongoing attempt on my part to practice a more open “release early, release often” approach to research.

In doing this I’m drawing direct inspiration from the open source and open notebook (science) communities and seeking to engage in what might be termed open notebook social science!

I think most researchers (including myself) feel a reluctance to put out material that isn’t at a reasonable level of maturity. While there are some good reasons for this, I think the main motivations are less positive, and are primarily to do with fear: be it of criticism or that your ideas are “taken” by others. While such fears can have some basis, it seems to me the benefits of an open approach — in terms of visibility, dissemination, and potential for collaboration — significantly outweigh any of the associated risks.

Over the last year, I’ve already been making some effort to move in this direction but from this point on I’m aiming to do this more thoroughly and methodically. A first step in this will be to put all the “patterns” and data online.

The Open Knowledge Foundation’s 2009 Open Knowledge Conference (OKCon), which I help organize, will take place next Saturday 28th March – less than a week away.

Full details including programme can be found either in this blog post or on the OKCon home page.

As usual this will be a fun and informal day so if you’re free this Saturday and interested in “Open” stuff come along to UCL and take part.

I should also add that on the two days before (Thursday + Friday) there is the 5th COMMUNIA Workshop, on Accessing, Using, Reusing Public Sector Content and Data. It is being co-organized by the Open Knowledge Foundation together with the London School of Economics and takes place at LSE (all thanks to the tireless work of Jonathan Gray and Prodromos Tsiavos!).

As a member of the Econometric Society I received the following announcement yesterday:

The Council and the Fellowship of the Econometric Society have both voted in favor of a plan for the Society to publish two open-access journals: Quantitative Economics (QE) and Theoretical Economics (TE). All voting Council members were in favor of the proposal. Among the active Fellows, 277 (66.4% of the total) cast their ballots, with 240 votes (86.6%) in favor, 30 (10.8%) against, and 7 (2.5%) abstentions. An announcement together with a description of the new journals may be found in http://www.econometricsociety.org/news1.asp?ref=81 .

QE will be started from scratch and its first issue is planned for 2010. TE has been published by the Society for Economic Theory (http://econtheory.org/ ), but is to be adopted by the Econometric Society later this year. The first issue in 2010 will be the first one as a Society journal.

This is great news.

Recent Work on Open Economics

January 23rd, 2009

Over the Christmas break I had a chance to make some substantial improvements/additions to our Open Economics project, including:

  1. Improved javascript graphing.
  2. Extended Millennium Development Goals package and added web interface.
  3. First efforts at ‘Where Does My Money Go’

More details on each of these can be found below. We’d also be delighted to hear from anyone interested in getting involved, especially with the last item, so if you are interested do get in touch.

1. Updated javascript graphing package to use flot.

This also allows us to use javascript to make the graphing more interactive, in particular to select the chart type and the series to plot. See e.g. the data on Daily Wages of Thatchers in the Middle Ages or Wheat, barley, oat, mutton and wool prices, and agricultural wages, 1500-1849.

2. Improved Millennium Development Goals package/dataset and added a web interface.

Extended ‘packagization’ of the MDG data by creating a mini-domain model and an associated sql version of data in addition to the existing csv normalized-tabular version of the data:

http://knowledgeforge.net/econ/svn/trunk/econdata/mdg/db.py

This is much more convenient for analysis (e.g. finding all countries which have at least one entry for any of these 3 series between 1995 and 2005 …). It is also essential for:

New web interface for Millennium Development Goals

Using the sql version of the data it was easy to build a quick-and-dirty web interface that enables one to browse and view the data quickly:

http://www.openeconomics.net/mdg/

For example, here’s a chart and data showing “Children under 5 moderately or severely underweight, percentage” for Afghanistan, China, India, United States:

http://www.openeconomics.net/mdg/view?commit=Show+Values&series=559&countries=4&countries=156&countries=356&countries=840
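As a rough illustration of why the sql version helps, here is a minimal sketch of the kind of query mentioned above (countries with at least one entry for any of three series between 1995 and 2005). The table name `observation`, its columns, the series ids and the placeholder rows are all assumptions made for this example; the actual model in db.py may well look different.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation (country TEXT, series_id INTEGER, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("Country A", 1, 1997, 10.0),  # matches
        ("Country A", 2, 2001, 11.0),  # matches again (deduplicated by DISTINCT)
        ("Country B", 3, 2005, 20.0),  # matches (end year is inclusive)
        ("Country C", 2, 1990, 30.0),  # right series, wrong year
        ("Country D", 9, 2000, 40.0),  # right year, wrong series
    ],
)


def countries_with_data(conn, series_ids, start_year, end_year):
    """Return countries with at least one entry for any of the given series
    in the inclusive year range."""
    placeholders = ",".join("?" for _ in series_ids)
    sql = (
        "SELECT DISTINCT country FROM observation "
        f"WHERE series_id IN ({placeholders}) AND year BETWEEN ? AND ? "
        "ORDER BY country"
    )
    return [row[0] for row in conn.execute(sql, [*series_ids, start_year, end_year])]


result = countries_with_data(conn, [1, 2, 3], 1995, 2005)
print(result)
```

Doing the same against the normalized csv file would mean loading everything and filtering by hand, which is the inconvenience the sql version removes.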

3. First efforts at ‘Where Does My Money Go’

There are two parts to this project: a) getting the data on government revenue/expenditure, and b) displaying it nicely in a web interface.

Part (a) is encapsulated in a new ukgovfinances dataset:

http://knowledgeforge.net/econ/svn/trunk/econdata/ukgovfinances/

Using this data we have made a (small) start on the web interface:

http://www.openeconomics.net/wdmmg/

The Open Knowledge Foundation (which I’m involved in) is co-organizing with MySociety and OPSI, a Workshop on Finding and Re-using Public (Sector) Information.

The event takes place this Saturday (1st of November) at the London Knowledge Lab near Holborn in London. Full details are in this OKFN blog post and you can sign up on the wiki page:

http://okfn.org/wiki/PublicInformation

One of the active Open Knowledge Foundation projects is Open Economics. A substantial part of that effort ends up being data acquisition and ‘cleaning’: getting hold of economic data, parsing it into (computer) usable form and adding it to the Store. (Wouldn’t it be nice if that data was already nicely packaged up or at least in a decent raw form …).

Once this job is done, the data is there in a nice clean state for others to use — plus we can draw some nice graphs (as we will see below). As an illustration of this process, we’ll look at one particular dataset acquired earlier this year when, motivated by the large increases in commodity prices and the concerns expressed regarding their impact, I decided to see what data I could dig up on food prices (starting with Wheat).

As usual, it was US government material that was most easily available (in a decent format) and I decided to start off with historical information on wheat to be found in the Wheat Yearbook, in particular the contents of:

http://www.ers.usda.gov/data/wheat/yearbook/WheatYearbookTables-Recent.xls

While the data was available (and open — since US Govt provided) it was in a format that was not immediately computer usable (lots of blank lines etc). Thus, the first step was to parse this into standard csv file format (see script here) and then upload this to Open Economics. The result:

http://www.openeconomics.net/store/517d7c4e-3cb7-4e8f-aaa1-745dd665ad1f
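To give a flavour of the cleaning step, here is a minimal sketch (not the actual script linked above). It assumes the spreadsheet has already been read into a list of rows and that every real data row starts with a four-digit year; the column names and the numbers in the sample are placeholders, not values from the Wheat Yearbook.

```python
import csv
import io


def is_year(cell):
    """True if the cell looks like a four-digit year (an assumption about the layout)."""
    return len(cell) == 4 and cell.isdigit()


def clean_rows(raw_rows, header):
    """Drop blank lines, titles and notes; keep rows that start with a year."""
    cleaned = [header]
    for row in raw_rows:
        cells = [str(cell).strip() for cell in row]
        if not any(cells):
            continue  # blank line
        if not is_year(cells[0]):
            continue  # title, embedded header or source note
        cleaned.append(cells[: len(header)])
    return cleaned


def to_csv(rows):
    """Serialise rows to a csv string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder input mimicking a messy spreadsheet export.
raw = [
    ["Table title (illustrative layout only)", "", ""],
    ["", "", ""],
    ["Year", "Acreage", "Yield"],
    ["1866", " 1.0", "2.0 "],
    [],
    ["1867", "3.0", "4.0"],
    ["Source: placeholder note", "", ""],
]

csv_text = to_csv(clean_rows(raw, ["year", "acreage", "yield"]))
print(csv_text)
```

A real spreadsheet will have more quirks (footnote markers, merged cells, several tables per sheet), so the row filter is the part most likely to need adapting.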

Not only do we now have nice clean data but, thanks to plotkit, Open Economics has javascript graphing so without any more effort we can automatically have graphs of the resulting material. Not only does this allow us to answer our original question (see Fig 4) but these graphs also tell a fascinating historical story:

US Wheat: 1866 – 2007

NB: if the figures are too small click through for the full-size versions on Open Economics (the dates at the bottom run from 1866 to 2007)

Figure 1: Output (Millions of Bushels)

US Wheat Data

First up is output. As can be seen here, output rose steadily (approximately linearly) up until the First World War. It then stayed flat or even fell during the inter-war period: the Great Depression and the Dust Bowl can be seen in the sharp dip in the early 1930s. Following the Second World War output rose, accelerating (exponentially?) up until the early 1980s, since when it has flattened out, even declining (with sharp variations) up to the present.

Looking at these raw output figures, the immediate question one asks (at least as an economic historian) is: what underlying causes drove these changes in output? In particular, output is the product of two factors, total acreage in use and yield (average output per acre), so it would be interesting to see time-series for them as well. Fortunately this data is also available:

Figure 2: Acreage (Millions of Acres)

US Wheat Data

The first thing to note is that these series start in 1866, the year after the American Civil War ended. This was a period of great westward expansion in cultivation in the United States — the “Opening of the Prairies”. The graph bears graphic witness to these changes: we can see that harvested acreage tripled between 1866 and the outbreak of WWI in 1914.

This massive expansion was to have a profound effect far outside of the US: food prices dropped around the world due to the increase in supply. In Western Europe this led to a ‘Great Depression’ in agriculture right up until the First World War (which in turn had a significant effect on European politics, creating protectionist alliances between peasants and landowners in many European countries). It also assisted industrialization by keeping the price of bread low for the fast-growing industrial proletariat.

However, by the end of WWI most of the acreage that could be cultivated was already in use. After that point, while there has been variation in planted acreage (perhaps driven by substitution between wheat and other crops) there has been no long term trend (whether increasing or decreasing). Thus, while the increase in output up to WWI can be largely explained by increases in acreage under cultivation [^1] the large increases in output in the post-WWII period can’t be. This brings us then to the second major factor in explaining changes in output: yields.

[^1]: a crude eyeballing suggests that output increased somewhere between 3 and 4 times between 1866 and WWI. This is in line with the increase in acreage. That said, diminishing returns arguments (best land is cultivated first) would suggest that simply maintaining yield per acre on a vastly increased acreage would have required some improvement in underlying productivity.

Figure 3: Yield (Bushels / Acre)

US Wheat Data

One could not ask for a sharper confirmation of our previous hypothesis than Figure 3. As it shows, average yields were almost perfectly flat from 1866 up until the end of the Second World War. From that point yields took off, growing sharply but at an almost constant rate up until the mid 70s, after which the growth rate slowed substantially (though yields still continued to grow, albeit with increased variability). In concrete terms this corresponded to a rise in yield from around 12 bushels per acre at the end of WWII to somewhere around 35 bushels per acre in the 70s, and around 40 today.

To put this most starkly: there was a roughly 3-fold increase in yields in this 30 year period. Again this is a particularly ‘graphic’ testament to the ‘green revolution’ of the post-war period which was driven largely by the development and adoption of new corn varieties (hybrid corn), fertilizers etc.
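The arithmetic behind these statements can be sketched in a few lines. The yields are the rough figures read off the graph above (about 12 and about 35 bushels per acre); treating the period as exactly 30 years and acreage as exactly flat are simplifying assumptions for illustration, not figures from the dataset.

```python
def fold_increase(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start, end, years):
    """Constant yearly growth rate that takes start to end over the given years."""
    return (end / start) ** (1 / years) - 1


def output_fold(acreage_fold, yield_fold):
    """Output = acreage x yield, so fold changes in the two factors multiply."""
    return acreage_fold * yield_fold


yield_start = 12.0  # bushels per acre, roughly the end of WWII (eyeballed)
yield_end = 35.0    # bushels per acre, roughly the mid 70s (eyeballed)
years = 30          # assumed length of the period

yield_fold = fold_increase(yield_start, yield_end)
rate = annual_growth_rate(yield_start, yield_end, years)

# Assumption: acreage roughly flat over the period (fold change of 1.0),
# so the whole change in output comes from yields.
implied_output_fold = output_fold(1.0, yield_fold)

print(f"yield fold increase: {yield_fold:.2f}")    # about 2.92
print(f"annual growth rate:  {rate:.1%}")          # about 3.6%
print(f"implied output fold: {implied_output_fold:.2f}")
```

The same multiplication explains the pre-WWI period the other way round: there acreage roughly tripled while yields stayed flat, so output grew about in line with acreage.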

Figure 4: Price ($ per Bushel)

US Wheat Data

Lastly we come to price. Here, despite substantial fluctuations, the basic trends fit with our historical intuition. There is little change between 1866 and WWI, a sharp rise during the war, a substantial decline in the inter-war period, then another sharp rise during WWII (wars are good for farmers!) followed by stabilization (or even slight decline) until the mid 1970s, when there is another sharp rise. Following that there is substantial variation but no great change until the present, when the line shoots up again (doubling from around $3 per bushel to somewhere near $6 in a year).

As basic economics tells us, price should reflect the interaction of supply and demand. The marked stability of price over long periods (particularly those where supply has increased) suggests that demand has matched supply (or vice-versa) fairly well over this period (one might also need to take account of the fact that there may have been substantial government intervention to stabilize prices).

Given that supply has risen substantially through the whole period, and especially since WWII (see Fig 1) this means that demand has also been climbing sharply. This is true: world population has increased at least 5x since 1850 and roughly tripled since WWII (in addition many people, especially in developed countries have increased their per-capita consumption, by eating more and better — as well as wasting more).

It would be interesting to imagine what would have happened if this kind of population increase, particularly that since WWII, had occurred without the massive increase in yields shown in Figure 3 (part of the answer may be that population would not have increased so much …). Certainly the price increases seen recently may reflect the kind of growing surplus of demand over supply that we would have seen without the ‘green revolution’. As such, they may be signals of the significant readjustments that will be needed in the near future, whether that be increases in supply, reductions in demand or more efficient use of existing supplies.