Category Archives: Open Knowledge Foundation

A Data Revolution that Works for All of Us

Many of today’s global challenges are not new. Economic inequality, the unfettered power of corporations and markets, the need to cooperate to address global problems and the unsatisfactory levels of accountability in democratic governance – these were as much problems a century ago as they remain today.

What has changed, however – and most markedly – is the role that new forms of information and information technology could potentially play in responding to these challenges.

What’s going on?

The incredible advances in digital technology mean we have an unprecedented ability to create, share and access information. Furthermore, these technologies are increasingly not just the preserve of the rich, but are available to everyone – including the world’s poorest. As a result, we are living in a (veritable) data revolution – never before has so much data – public and personal – been collected, analysed and shared.

However, the benefits of this revolution are far from being shared equally.

On the one hand, some governments and corporations are already using this data to greatly increase their ability to understand – and shape – the world around them. Others, however, including much of civil society, lack the necessary access and capabilities to truly take advantage of this opportunity. Faced with this information inequality, what can we do?

How can we enable people to hold governments and corporations to account for the decisions they make, the money they spend and the contracts they sign? How can we unleash the potential for this information to be used for good – from accelerating research to tackling climate change? And, finally, how can we make sure that personal data collected by governments and corporations is used to empower rather than exploit us?

So how should we respond?

Fundamentally, we need to make sure that the data revolution works for all of us. We believe that key to achieving this is to put “open” at the heart of the digital age. We need an open data revolution.

We must ensure that essential public-interest data is open, freely available to everyone. Conversely, we must ensure that data about me – whether collected by governments, corporations or others – is controlled by and accessible to me. And finally, we have to empower individuals and communities – especially the most disadvantaged – with the capabilities to turn data into the knowledge and insight that can drive the change they seek.

In this rapidly changing information age – where the rules of the game are still up for grabs – we must be active, seizing the opportunities we have, if we are to ensure that the knowledge society we create is an open knowledge society, benefiting the many not the few, built on principles of collaboration not control, sharing not monopoly, and empowerment not exploitation.

Announcing a Leadership Update at Open Knowledge

Today I would like to share some important organisational news. After 3 years with Open Knowledge, Laura James, our CEO, has decided to move on to new challenges. As a result of this change we will be seeking to recruit a new senior executive to lead Open Knowledge as it continues to evolve and grow.

As many of you know, Laura James joined us to support the organisation as we scaled up, and stepped up to the CEO role in 2013. It has always been her intention to return to her roots in engineering at an appropriate juncture, and we have been fortunate to have had Laura with us for so long – she will be sorely missed.

Laura has made an immense contribution and we have been privileged to have her on board – I’d like to extend my deep personal thanks to her for all she has done. Laura has played a central role in our evolution as we’ve grown from a team of half-a-dozen to more than forty. Thanks to her commitment and skill we’ve navigated many of the tough challenges that accompany “growing-up” as an organisation.

There will be no change in my role (as President and founder) and I will be here both to continue to help lead the organisation and to work closely with the new appointment going forward. Laura will remain in post, continuing to manage and lead the organisation, assisting with the recruitment and bringing the new senior executive on board.

For a decade, Open Knowledge has been a leader in its field, working at the forefront of efforts to open up information around the world and and see it used to empower citizens and organisations to drive change. Both the community and original non-profit have grown – and continue to grow – very rapidly, and the space in which we work continues to develop at an incredible pace with many exciting new opportunities and activities.

We have a fantastic future ahead of us and I’m very excited as we prepare Open Knowledge to make its next decade even more successful than its first.

We will keep everyone informed in the coming weeks as our plans develop, and there will also be opportunities for the Open Knowledge community to discuss. In the meantime, please don’t hesitate to get in touch with me if you have any questions.

Candy Crush, King Digital Entertainment, Offshoring and Tax

Sifting through the King Entertainment F-1 filing with the SEC for their IPO (Feb 18 2014) I noticed the following in their risk section:

The intended tax benefits of our corporate structure and intercompany arrangements may not be realized, which could result in an increase to our worldwide effective tax rate and cause us to change the way we operate our business. Our corporate structure and intercompany arrangements, including the manner in which we develop and use our intellectual property and the transfer pricing of our intercompany transactions, are intended to provide us worldwide tax efficiencies [ed: for this I read – significantly reduce our tax-rate by moving our profits to low-tax jurisdictions …]. The application of the tax laws of various jurisdictions to our international business activities is subject to interpretation and also depends on our ability to operate our business in a manner consistent with our corporate structure and intercompany arrangements. The taxing authorities of the jurisdictions in which we operate may challenge our methodologies for valuing developed technology or intercompany arrangements, including our transfer pricing, or determine that the manner in which we operate our business does not achieve the intended tax consequences, which could increase our worldwide effective tax rate and adversely affect our financial position and results of operations.

It is also interesting how they have set up their corporate structure going “offshore” first to Malta and then to Ireland (from the “Our Corporate Information and Structure” section):

We were originally incorporated as Midasplayer.com Limited in September 2002, a company organized under the laws of England and Wales. In December 2006, we established Midasplayer International Holding Company Limited, a limited liability company organized under the laws of Malta, which became the holding company of Midasplayer.com Limited and our other wholly-owned subsidiaries. The status of Midasplayer International Holding Company Limited changed to a public limited liability company in November 2013 and its name changed to Midasplayer International Holding Company p.l.c. Prior to completion of this offering, King Digital Entertainment plc, a company incorporated under the laws of Ireland and created for the purpose of facilitating the public offering contemplated hereby, will become our current holding company by way of a share-for-share exchange in which the existing shareholders of Midasplayer International Holding Company p.l.c. will exchange their shares in Midasplayer International Holding Company p.l.c. for shares having substantially the same rights in King Digital Entertainment plc. See “Corporate Structure.”

Here’s their corporate structure diagram from the “Corporate Structure” section (unfortunately barely readable in the original as well …). As I count it there are 19 different entities with a chain of length 6 or 7 from base entities to primary holding company.

Northern Mariana Islands Retirement Fund Bankruptcy

Back on April 17 2012 the Northern Mariana Islands Retirement Fund attempted to file for bankruptcy under Chapter 11. There was some pretty interesting reading in their petition for bankruptcy including this section (para 10) which suggests some pretty bad public financial management (emphasis added): “Debtor has had difficulty maintaining healthy funding levels due to […]

Open Data Maker Night London No 3 – Tuesday 16th July

The next Open Data Maker Night London will be on Tuesday 16th July 6-9pm (you can drop in any time during the evening). Like the last two it is kindly hosted by the wonderful Centre for Creative Collaboration, 16 Acton Street, London.

Look forward to seeing folks there!

What

Open Data Maker Nights are informal events focused on “making” with open data – whether that’s creating apps or insights. They aren’t a general meetup – if you come, expect to get pulled into actually building something, though we won’t force you!

Who

The events usually have short introductory talks about specific projects and suggestions for things to work on – it’s absolutely fine to turn up knowing nothing about data or openness or tech as, there’ll an activity for you to help and someone to guide you in contributing!

Organize your own!

Not in London? Why not organize your own Open Data Maker night in your city? Anyone can and it’s easy to do – find out more »

data.okfn.org – update no. 1

This is the first of regular updates on Labs project http://data.okfn.org/ and summarizes some of the changes and improvements over the last few weeks.

1. Refactor of site layout and focus.

We’ve done a refactor of the site to have stronger focus on the data. Front page tagline is now:

We’re providing key datasets in high quality, easy-to-use and open form

Tools and standards are there in a clear supporting role. Thanks to all the suggestions and feedback on this and welcome more - we’re still iterating.

2. Pull request data workflow

There was a nice example of the pull request data workflow being used (by a complete stranger!): https://github.com/datasets/house-prices-uk/pull/1

3. New datasets

For example:

Looking to contribute data check out the instructions http://data.okfn.org/about/contribute#data and the outstanding requests: https://github.com/datasets/registry/issues

4. Tooling

5. Feedback on standards

There’s been a lot of valuable feedback on the data package and json table schema standards including some quite major suggestions (e.g. substantial change to JSON Table Schema to align more closely with JSON Schema - thx to jpmckinney)

Next steps

There’s plenty more coming up soon in terms of data and the site and tools.

Get Involved

Anyone can contribute and its easy – if you can use a spreadsheet you can help!

Instructions for getting involved here: http://data.okfn.org/about/contribute

Update on PublicBodies.org – a URL for every part of Government

This is an update on PublicBodies.org - a Labs project whose aim is to provide a “URL for every part of Government”: http://publicbodies.org/

PublicBodies.org is a database and website of “Public Bodies” – that is Government-run or controlled organizations (which may or may not have distinct corporate existence). Examples would include government ministries or departments, state-run organizations such as libraries, police and fire departments and more.

We run into public bodies all the time in projects like OpenSpending (either as spenders or recipients). Back in 2011 as part of the “Organizations” data workshop at OGD Camp 2011, Labs member Friedrich Lindenberg scraped together a first database and site of “public bodies” from various sources (primarily FoI sites like WhatDoTheyKnow, FragDenStaat and AskTheEU).

We’ve recently redone the site converting the sqlite DB to simple flat CSV files:

The site itself is now super-simple flat-files hosted on s3 (build code here). Here’s an example of the output:

The simplicity of CSV for data plus simple templating to flat-files is very attractive. There are some drawbacks such as changes to primary template resulting in a full rebuild and upload of ~6k files so, especially as the data grows, we may want to look into something a bit nicer but for the time being this works well.

Next Steps

There’s plenty that could be improved e.g.

  • More data - other jurisdictions (we only cover EU, UK and Germany) + descriptions for the bodies (this could be a nice crowdcrafting app)
  • Search and Reconciliation (via nomenklatura)
  • Making it easier to submit corrections or additions

The full list of issues is on github here: https://github.com/okfn/publicbodies/issues

Help is most definitely wanted! Just grab one of the issues or get in touch

Quick and Dirty Analysis on Large CSVs

I’m playing around with some large(ish) CSV files as part of a OpenSpending related data investigation to look at UK government spending last year – example question: which companies were the top 10 recipients of government money? (More details can be found in this issue on OpenSpending’s things-to-do repo).

The dataset I’m working with is the consolidated spending (over £25k) by all UK goverment departments. Thanks to the efforts of of OpenSpending folks (and specifically Friedrich Lindenberg) this data is already nicely ETL’d from thousands of individual CSV (and xls) files into one big 3.7 Gb file (see below for links and details).

My question is what is the best way to do quick and dirty analysis on this?

Examples of the kinds of options I was considering were:

  • Simple scripting (python, perl etc)
  • Postgresql - load, build indexes and then sum, avg etc
  • Elastic MapReduce (AWS Hadoop)
  • Google BigQuery

Love to hear what folks think and if there are tools or approaches they would specifically recommend.

The Data

Cleaning up Greater London Authority Spending (for OpenSpending)

I’ve been working to get Greater London Authority spending data cleaned up and into OpenSpending. Primary motivation comes from this question:

Which companies got paid the most (and for doing what)? (see this issue for more)

I wanted to share where I’m up to and some of the experience so far as I think these can inform our wider efforts - and illustrate the challenges just getting and cleaning up data. I note that the code and README for this ongoing work is in a repo on github: https://github.com/rgrp/dataset-gla

Data Quality Issues

There are 61 CSV files as of March 2013 (a list can be found in scrape.json).

Unfortunately the “format” varies substantially across files (even though they are all CSV!) which makes using this data real pain. Some examples:

  • no of fields and there names vary across files (e.g. SAP Document no vs Document no)
  • number of blank columns or blank lines (some files have no blank lines (good!), many have blank lines plus some metadata etc etc)
  • There is also at least one “bad” file which looks to be an excel file saved as CSV
  • Amounts are frequently formatted with “,” making them appear as strings to computers.
  • Dates vary substantially in format e.g. “16 Mar 2011”, “21.01.2011” etc
  • No unique transaction number (possibly document number)

They also switched from monthly reporting to period reporting (where there are 13 periods of approx 28d each).

Progress so far

I do have one month loaded (Jan 2013) with a nice breakdown by “Expenditure Account”:

http://openspending.org/gb-local-gla

Interestingly after some fairly standard grants to other bodies, “Claim Settlements” comes in as the biggest item at £2.3m

Progress on the Data Explorer

This is an update on progress with the Data Explorer (aka Data Transformer).

Progress is best seen from this demo which takes you on a tour of house prices and the difference between real and nominal values.

More information on recent developments can be found below. Feedback is very welcome - either here or the issues https://github.com/okfn/dataexplorer.

House prices tutorial

What is the Data Explorer

For those not familiar, the Data Explorer is a HTML+JS app to view, visualize and process data just in the browser (no backend!). It draws heavily on the Recline library and features now include:

  • Importing data from various sources (the UX of this could be much improved!)
  • Viewing and visualizing using Recline to create grids, graphs and maps
  • Cleaning and transforming data using a scripting component that allows you to write and run javascript
  • Saving and sharing: everything you create (scripts, graphs etc) can be saved and then shared via public URL.

Note, that persistence (for sharing) is to Gists (here’s the gist for the House Prices demo linked above). This has some nice benefits such as versioning; offline editing (clone the gist, edit and push); and bl.ocks.org-style ability to create a gist and have it result in public viewable output (though with substantial differences vs blocks …).

What’s Next

There are many areas that could be worked on – a full list of issues is in github. The most important I think at the moment are:

I’d very interested in people’s thoughts on the app so far and what should be done next and code contributions are also very welcome (the app has already benefitted from the efforts of many people including the likes of Martin Keegan and Michael Aufreiter to the app itself; and from folks like Max Ogden, Friedrich Lindenberg, James Casbon, Gregor Aisch, Nigel Babu (and many more) in the form of ideas, feedback, work on Recline etc).