Author Archives: Rufus Pollock

Putting Open at the Heart of the Digital Age

Introduction

I’m Rufus Pollock.

In 2004 I founded a non-profit called Open Knowledge.

The mission we set ourselves was to open up all public interest information – and see it used to create insight that drives change.

What sort of public interest information? In short, all of it. From big issues like how our government spends our taxes or how fast climate change is happening to simple, everyday things like when the next bus is arriving or the exact address of that coffee shop down the street.

For the last decade, we have been pioneers and leaders in the open data and open knowledge movement. We wrote the original definition of open data in 2005 and we’ve helped unlock thousands of datasets. We’ve built tools like CKAN, which powers dozens of open data portals such as data.gov in the US and data.gov.uk in the UK. And we’ve created a network of individuals and organizations in more than 30 countries who are all working to make information open because they want to drive insight and change.

But today I’m not here to talk specifically about Open Knowledge or what we do.

Instead, I want to step back and talk about the bigger picture. I want to talk to you about the digital age, where all that glitters is bits, and why we need to put openness at its heart.

Gutenberg and Tyndale

To do that, I first want to tell you a story. It’s a true story and it happened a while ago – nearly 500 years ago. It involves two people. The first is Johannes Gutenberg. In 1450 Gutenberg invented this: the printing press. Like the Internet in our own time, it was revolutionary. It is estimated that before the printing press was invented, there were just 30,000 books in all of Europe. Fifty years later, there were more than 10 million. Revolutionary, then, though it moved at the pace of the fifteenth century, a pace of decades not years. Over the next five hundred years, Gutenberg’s invention would transform our ability to share knowledge and help create the modern world.

The second is William Tyndale. He was born in England around 1494, so he grew up in the world of Gutenberg’s invention.

Tyndale followed the classic path of a scholar at the time and was ordained as a priest. In the 1510s, when he was still a young man, the Reformation had not yet happened and the Pope was supreme ruler of a united church across Europe. The Church – and the papacy – guarded its power over knowledge, forbidding the translation of the Bible from Latin so that only its official priests could understand and interpret it.

Tyndale had an independent mind. There’s a story that he got into an argument with a local priest. The priest told him:

“We are better to be without God’s laws than the Pope’s.”

Tyndale replied:

“If God spare my life ere many years, I will cause the boy that drives the plow to know more of the scriptures than you!”

What Tyndale meant was that he would open up the Bible to everyone.

Tyndale made good on his promise. Having fled abroad to avoid persecution, between 1524 and 1527 he produced the first printed English translation of the Bible which was secretly shipped back to England hidden in the barrels of merchant ships. Despite being banned and publicly burnt, his translation spread rapidly, giving ordinary people access to the Bible and sowing the seeds of the Reformation in England.

However, Tyndale did not live to see it. In hiding because of his efforts to liberate knowledge, he was betrayed and captured in 1535. Convicted of heresy for his work, on 6 October 1536 he was strangled and then burnt at the stake in the prison yard of Vilvoorde Castle, just north of modern-day Brussels. He was just over 40 years old.

Internet

So let’s fast forward to today – or, not quite today, the late 1990s.

I go to college and I discover the Internet.

It just hit me: wow! I remember days spent just surfing around. I’d always been an information junkie, and I felt like I’d found this incredible, never-ending information funfair.

And I realised that I was going to grow up at a special moment, the transition to an information age. We’d be living in this magical world where the main thing we create and use – information – could be instantaneously and freely shared with everyone on the whole planet.

But … why Openness

So, OK the Internet’s awesome …

Bet you haven’t heard that before!

BUT … – and this is the big but.

The Internet is NOT my religion.

The Internet – and digital technology – are not enough.

I’m not sure I have a religion at all, but if I believe in something in this digital age, I believe in openness.

This talk is not about technology. It’s about how putting openness at the heart of the digital age is essential if we really want to make a difference, really create change, really challenge inequity and injustice.

Which brings me back to Tyndale and Gutenberg.

Tyndale revisited

Because, you see, the person that inspired me wasn’t Gutenberg. It was Tyndale.

Gutenberg created the technology that laid the groundwork for change. But the printing press could very well have been used to pump out more Latin bibles, which would then only have made it easier for local priests to be in charge of telling their congregations the word of God every Sunday. More of the same, basically.

Tyndale did something different. Something so threatening to the powers that be that he was executed for it.

What did he do? He translated the Bible into English.

Of course, he needed the printing press. In a world of hand-copying by scribes or painstaking woodcut printing, it wouldn’t make much difference if the Bible was in English or not because so few people could get their hands on a copy.

But the printing press was just the means: it was Tyndale’s work putting the Bible in everyday language that actually opened it up. And he did this with the express purpose of empowering and liberating ordinary people – giving them the opportunity to understand, think and decide for themselves. This was open knowledge as freedom, open knowledge as systemic change.

Now I’m not religious, but when I talk about opening up knowledge I am coming from a similar place: I want anyone and everyone to be able to access, build on and share that knowledge for themselves and for any purpose. I want everyone to have the power and freedom to use, create and share knowledge.

Knowledge power in the 16th century was controlling the Bible. Today, in our data-driven world, it’s much broader: it’s about everything from maps to medicines, sonnets to statistics. It’s about opening up all the essential information and building insight and knowledge together.

This isn’t just dreaming – we have inspiring, concrete examples of what this means. Right now I’ll highlight just two: medicines and maps.

Example: Medicines

Every day, millions of people around the world take billions of pills and other medicines.

Whether those drugs actually do you good – and what side effects they have – is obviously essential information for researchers, for doctors, for patients, for regulators – pretty much everyone.

We have a great way of assessing the effectiveness of drugs: randomized controlled trials, in which a drug is compared to its next best alternative.

So all we need is all the data on all those trials (this would be non-personal information only – any information that could identify individuals would be removed). In an Internet age you’d imagine that this would be a simple matter – we just need all the data openly available and maybe some way to search it.

You’d be wrong.

Many studies, especially negative ones, are never published: the vast majority of studies are funded by industry, which uses restrictive contracts to control what gets published. Even where pharmaceutical companies are required to report on the clinical trials they perform, the regulator often keeps the information secret or publishes it as 8,000-page PDFs, each page hand-scanned and unreadable by a computer.

If you think I’m joking, I’ll give just one very quick example, which comes straight from Ben Goldacre’s Bad Pharma. In 2007 researchers in Europe wanted to review the evidence on a diet drug called rimonabant. They asked the European regulator for access to the original clinical trials information submitted when the drug was approved. For three years they were refused access on a variety of grounds. When they did get access, this is what they got initially: that’s right, 60 pages of blacked-out PDF.

We might think this was funny if it weren’t so deadly serious: in 2009, just before the researchers finally got access to the data, rimonabant was removed from the market on the grounds that it increased the risk of serious psychiatric problems and suicide.

This situation needs to change.

And I’m happy to say something is happening. Working with Ben Goldacre, author of Bad Pharma, we’ve just started the OpenTrials project. This will bring together all the data, on all the trials and link it together and make it open so that everyone from researchers to regulators, doctors to patients can find it, access it and use it.

Example: Maps

Our second example is maps. If you were looking for the “scriptures” of this age of digital data, you might well pick maps, or, more specifically the geographic data on which they are built. Geodata is everywhere: from every online purchase to the response to the recent earthquakes in Nepal.

Though you may not realize it, most maps are closed and proprietary – you can’t get the raw data that underpins the map, you can’t alter it or adapt it yourself.

But since 2004 a project called OpenStreetMap has been creating a completely open map of the planet – raw geodata and all. Not only is it open for access and reuse, the database itself is built collaboratively by hundreds of thousands of contributors from all over the world.

What does this mean? Just one example. Because of its openness, OpenStreetMap is perfect for rapid updating when disaster strikes – showing which bridges are out, which roads are still passable, what buildings are still standing. For example, when a disastrous earthquake struck Nepal in April this year, volunteers updated 13,199 miles of roads and 110,681 buildings in under 48 hours, providing crucial support to relief efforts.

The Message not the Medium

To repeat then: technology is NOT teleology. The medium is NOT the message – and it’s the message that matters.

The printing press made possible an “open” bible but it was Tyndale who made it open – and it was the openness that mattered.

Digital technology gives us unprecedented potential for creativity, for sharing, for freedom. But these possibilities are just that: possible, not inevitable. Technology alone does not make the choice for us.

Remember that we’ve been here before: the printing press was revolutionary but we still ended up with a print media that was often dominated by the few and the powerful.

Think of radio. If you read how people talked about it in the 1910s and 1920s, it sounds like the way we talk about the Internet today. Radio was going to revolutionize human communications and society. It was going to enable a peer-to-peer world where everyone could broadcast, and it was going to allow new forms of democracy and politics. What happened? We got a one-way medium, controlled by the state and a few huge corporations.

Look around you today.

The Internet’s costless transmission can – and does – create information empires and information robber barons just as easily as it creates digital democracy and information equality.

We already know that this technology offers unprecedented opportunities for surveillance, for monitoring, for tracking. It can just as easily exploit us as empower us.

We need to put openness at the heart of this information age, and at the heart of the Net, if we are really to realize its possibilities for freedom, empowerment, and connection.

The fight, then, is for the soul of this information age, and we have a choice.

A choice of open versus closed.

Of collaboration versus control.

Of empowerment versus exploitation.

It’s a long road ahead – longer perhaps than our lifetimes. But we can walk it together.

In this 21st century knowledge revolution, William Tyndale isn’t one person. It’s all of us, making small and big choices: from getting governments and private companies to release their data, to building open databases and infrastructures together, from choosing apps on your phone that are built on open to using social networks that give you control of your data rather than taking it from you.

Let’s choose openness, let’s choose freedom, let’s choose the infinite possibilities of this digital age by putting openness at its heart.

Thank you.

Poem – Untitled #23 – Written on a Train

Written on a train, some years ago

The tail-end of dusk,
Its softness making beauty of the world

The distant horizon pinked up in pastels
Beckons to eternity

While leftward lies darkness
Gathering all to her endless embrace.

The trees now shorn to subtlety
Are framed against the remnants of the sky

Trees, hedges, houses
All are soft shadows of themselves

And even a car-park sheathed in raucous lights
Can not offend.

The hemisphere runs in colours
Tuned to some resonance within

That calls out to savannah’s long ago
Beauty is leaving us and –

I cannot, cannot hold this,
Cannot make enough of my mind space

To seize this fleeting figure of the world
And make it fast.

Slow Tech

A thought from a recent Digital Supper: just as there is “Slow Food”, do we need a “Slow Technology” movement – technology at a human pace?

A key difference is that this can’t work through individual action alone – though that would help. If tech were to evolve more slowly, we would need coordinated action.

Open Knowledge appoints Pavel Richter as new CEO

I am delighted to announce we have found the newest member of the Open Knowledge team: Pavel Richter joins us as our new CEO!

Pavel Richter

Pavel’s appointment marks a new chapter in the development of Open Knowledge, which, over the last ten years, has grown into one of the leading global organisations working on open data and open knowledge in government, research, and culture.

Pavel has a rich and varied background including extensive time both in business and in the non-profit sector. In particular, Pavel brings his experience from over five years as the Executive Director of Wikimedia Deutschland: under his leadership, it grew to more than 70 staff, an annual budget of nearly 5 million Euros, and initiated major new projects such as Wikidata. Pavel’s engagement follows an extensive international search, led by a team including members of the Board of Directors as well as a Community Representative.

Personally, I am delighted and excited to welcome Pavel as CEO. This appointment represents an important step in the development of Open Knowledge as an organisation and community. Over the last decade, and especially in the last five years, we have achieved an immense amount.

Going forward one of our most important opportunities – and challenges – will be to forge and catalyse a truly global movement to put openness at the heart of the information age. Pavel’s experience, insight and passion make him more than equal to this task and I am thrilled to be able to work with him, and support him, as he takes on this role.

Open Data Can Speed up Research – Andy Beck of Harvard Medical School

Dr Andy Beck of Harvard Medical School in a Reddit AMA thread:

Interesting question. I think there is a lot of value in actually showing the utility of open data, by using it creatively to answer important research questions. There are now huge public databases available and growing everyday (e.g., https://tcga-data.nci.nih.gov/tcga/ , http://www.ncbi.nlm.nih.gov/geo/). I think it’s powerful to show a student that using open data they can answer a question in 5 minutes that previously may have taken an entire PhD dissertation to complete. In addition, to advocating through use of data, supporting high quality open access journals is also a great way to advocate. [Source]

All of our lock-in fears prove justified – Twitter

Having acquired Gnip, Twitter is cutting off bulk access (the “firehose”) for everyone else – see e.g. Datasift’s announcement and the piece on Re/code.

Twitter have also been gradually shutting off access, or tightening control over it, for the last few years. E.g. they shut down RSS, then changed their API terms of use and became increasingly aggressive in enforcing them.

The direction of travel for these “free” services was always fairly predictable – after all, somehow they’ve got to make money whilst providing a “web-scale” service. But there’s nothing like an existence proof to give a distant, predictable reality the immediacy that justifies action.

Of course, the tough thing is that the very reason we all use Facebook or Twitter or even Google is their immense direct and indirect network effects. That’s what makes it so hard for us individually to do much. However, as the need to monetise and protect their monopolies grows, I think we are nearing the tipping point where we get some interesting innovation and disruption.

For a good review, see: http://stratechery.com/2015/twitter-might/ whose final paragraphs I especially like:

Twitter’s story in many respects makes me think of Google: both companies started out benefiting greatly from openness and the power of both connecting users to what they were interested in and opening up powerful APIs to developers. The monetization model is even similar: note the AdSense reference above. Over time, though, Google has pulled more and more of its utility onto its own pages (and the revenue balance in the company has followed), just as Twitter focused on its own apps, and now Google is even starting to eat its best customers like travel websites and insurance agents (members-only), just like Twitter ate Datasift.

Frankly, the arc of both companies is simultaneously understandable and saddening to me. I’ve loved them both for the ways they have connected me to truly new ideas and new people, and it’s frustrating to see the growth imperative push both companies to turn increasingly inwards. One does wonder if they might find salvation in each other.

Grey dawn, you welcome not my spirit to the day

Grey dawn, you welcome not my spirit to the day.
Locked deep in winter’s embrace, the depths of January
Are moribund of hope, and I can but think on Spring
To keep from despair and an endless sojourn in the soft arms of sleep.

The day does not begin but seeps in, in sluggish batches from the East.
The watery light of a half-begotten sun
Has barely strength enough to banish night and makes us only think
Ever of indoors, indoors!

Why weighs my spirit so this season’s lack?
There is good to take in it I’m sure, yet here,
Stood here, this Janus’d morn, with heaven swathed in grey
I cannot find it, and must survive with heavy heart
             these bleak mid-winter days.

Enlightened [TV Series]

I have nearly finished the first series of Enlightened, a TV Series created by Laura Dern and Mike White. The series is extraordinary – even in a world where TV series have become (over the last ten years) a predominant form of entertainment and art.

It is not an easy or fun series – which probably accounts for its cancellation after just two seasons (I’m sort of amazed it got made in the first place – I imagine Laura Dern had something to do with it). In fact, it is often profoundly sad (and funny) as we witness the small tragedies (and ironies) that attend upon Amy (Laura Dern) and those around her. Amy herself is a great tragi-comic creation who remains all too human and un-enlightened despite the sojourn at a meditation retreat that opens the series.

The best way to sum the series up is to imagine that Raymond Carver had switched from writing short-story miniatures of the small desolations and tragedies of suburban America and made TV instead: Enlightened is what he might have produced.

Wanted – Data Curators to Maintain Key Datasets in High-Quality, Easy-to-Use and Open Form

Wanted: volunteers to join a team of “Data Curators” maintaining “core” datasets (like GDP or ISO-codes) in high-quality, easy-to-use and open form.

  • What is the project about: Collecting and maintaining important and commonly-used (“core”) datasets in high-quality, standardized and easy-to-use form - in particular, as up-to-date, well-structured Data Packages.
    The “Core Datasets” effort is part of the broader Frictionless Data initiative.
  • What would you be doing: identifying and locating core (public) datasets, cleaning and standardizing the data and making sure the results are kept up to date and easy to use
  • Who can participate: anyone can contribute. Details on the skills needed are below.
  • Get involved: read more below or jump straight to the sign-up section.

What is the Core Datasets effort?

Summary: Collect and maintain important and commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages).

Core = important and commonly-used datasets e.g. reference data (country codes) and indicators (inflation, GDP)

Curate = take existing data and provide it in high-quality, reliable, and easy-to-use form (standardized, structured, open)
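To make “curate” concrete: a Data Package is essentially the data files plus a `datapackage.json` descriptor. The sketch below builds a minimal Tabular Data Package descriptor in Python. It follows our reading of the Data Package specification (`name`, `licenses`, `resources`, each resource with a `path` and a table `schema` of `fields`); the function `make_descriptor`, the dataset name, file path, columns and licence choice are all hypothetical, so check the field names against the current spec before relying on it.

```python
import json


def make_descriptor(name, title, csv_path, fields, license_id="ODC-PDDL-1.0"):
    """Build a minimal Tabular Data Package descriptor as a plain dict.

    `fields` is a list of (field_name, field_type) pairs, e.g. ("Code", "string").
    The default licence ID is an assumption chosen for illustration.
    """
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_id}],
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "mediatype": "text/csv",
                # The table schema lists each column and its type.
                "schema": {"fields": [{"name": n, "type": t} for n, t in fields]},
            }
        ],
    }


# Hypothetical core dataset: country names and codes.
descriptor = make_descriptor(
    name="country-codes",
    title="Country Codes",
    csv_path="data/country-codes.csv",
    fields=[("Name", "string"), ("Code", "string")],
)

# In practice this text would be saved as datapackage.json next to the data.
print(json.dumps(descriptor, indent=2))
```

Keeping the descriptor as a plain dict means the same structure can be generated by a script, edited by hand in a text editor, or checked by other tools.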

What Roles and Skills are Needed

We need a variety of roles from identifying new “core” datasets to packaging the data to performing quality control (checking metadata etc).
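Quality control often starts with a simple question: does the data actually match its metadata? The sketch below checks a CSV's header and row widths against the `schema.fields` of a descriptor. The function name `check_csv` and the sample data are hypothetical, and it assumes one resource per package and no quoted cells spanning multiple lines; it illustrates the idea and is not the project's actual tooling.

```python
import csv
import io


def check_csv(descriptor, csv_text):
    """Compare a CSV's header and rows with the first resource's schema.

    Returns a list of problem descriptions (empty if nothing was found).
    Line numbers assume no quoted cells span multiple lines.
    """
    fields = descriptor["resources"][0]["schema"]["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    if header != expected:
        problems.append(f"header {header} does not match schema fields {expected}")
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append(f"line {line_no}: expected {len(expected)} cells, got {len(row)}")
    return problems


# Hypothetical descriptor and data for illustration.
descriptor = {"resources": [{"schema": {"fields": [{"name": "Name"}, {"name": "Code"}]}}]}
print(check_csv(descriptor, "Name,Code\nFrance,FR\nGermany,DE\n"))  # → []
print(check_csv(descriptor, "Name,Code\nFrance\n"))  # → ['line 2: expected 2 cells, got 1']
```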

Core Skills - at least one of these skills will be needed:

  • Data Wrangling Experience. Many of our source datasets are not complex (just an Excel file or similar) and can be “wrangled” in a spreadsheet program. We therefore recommend at least one of:
    • Experience with a spreadsheet application such as Excel or (preferably) Google Docs, including use of formulas and (desirably) macros (you should at least know how you could quickly convert a cell containing ‘2014’ to ‘2014-01-01’ across 1000 rows)
    • Coding for data processing (especially scraping) in one or more of Python, JavaScript or Bash
  • Data sleuthing - the ability to dig up data on the web (specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, you know how to find the URL a form posts to)
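As an illustration of the kind of wrangling meant above, here is the ‘2014’ to ‘2014-01-01’ conversion done in Python rather than in a spreadsheet. It is a minimal sketch: it assumes the data is CSV text with a header row, the function names, column names and numbers are hypothetical toy values, and any cell that is not a bare four-digit year is left unchanged.

```python
import csv
import io
import re

# A bare four-digit year such as "2014".
YEAR = re.compile(r"\d{4}")


def year_to_iso_date(value):
    """Turn a bare year like '2014' into '2014-01-01'; leave anything else unchanged."""
    value = value.strip()
    return f"{value}-01-01" if YEAR.fullmatch(value) else value


def convert_column(csv_text, column):
    """Apply year_to_iso_date to one named column and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = year_to_iso_date(row[column])
        writer.writerow(row)
    return out.getvalue()


# Toy data: the same loop would handle 1000 rows just as well as two.
source = "Year,GDP\n2013,100\n2014,105\n"
print(convert_column(source, "Year"))
```

Leaving non-matching cells untouched is a deliberate choice: it avoids silently mangling values that are already full dates or are not dates at all.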

Desirable Skills (the more the better!):

  • Data vs Metadata: know the difference between data and metadata
  • Familiarity with Git (and GitHub)
  • Familiarity with a command line (preferably bash)
  • Know what JSON is
  • Mac or Unix is your default operating system (will make access to relevant tools that much easier)
  • Knowledge of Web APIs and/or HTML
  • Use of curl or similar command line tool for accessing Web APIs or web pages
  • Scraping using a command line tool or (even better) by coding yourself
  • Know what a Data Package and a Tabular Data Package are
  • Know what a text editor is (e.g. notepad, textmate, vim, emacs, …) and know how to use it (useful for both working with data and for editing Data Package metadata)

Get Involved - Sign Up Now!

We are looking for volunteer contributors to form a “curation team”.

  • Time commitment: Members of the team commit to at least 8-16h per month (though this will be an average - if you are especially busy with other things one month and do less, that is fine)
  • Schedule: There is no schedule, so you can contribute at any time that is good for you - evenings, weekends, lunch-times etc
  • Location: all activity will be carried out online so you can be based anywhere in the world
  • Skills: see above

To register your interest fill in the following form. Any questions, please get in touch directly.

Want to Dive Straight In?

Can’t wait to get started as a Data Curator? You can dive straight in and start packaging the already-selected (but not packaged) core datasets. Full instructions here:

http://data.okfn.org/roadmap/core-datasets#contribute