Author Archives: Rufus Pollock

Amazon Twitch Acquisition – Paying 70x Sales

Just an aside from reading the recent Amazon 10-Q. In Note 4 on acquisitions they state:

On September 25, 2014, we acquired Twitch Interactive, Inc. (“Twitch”) for approximately $842 million in cash, as adjusted for the assumption of options and other items. During the nine months ended September 30, 2014, we acquired certain other companies for an aggregate purchase price of $20 million. Acquisition activity for the nine months ended September 30, 2013 was not material. We acquired Twitch because of its community and the live streaming experience it provides. The primary reasons for our other 2014 acquisitions were to acquire technologies and know-how to enable Amazon to serve customers more effectively.

and then in the pro-forma section they add:

The acquired companies were consolidated into our financial statements starting on their respective acquisition dates. The aggregate net sales and operating loss of the companies acquired was $12 million and $3 million for the nine months ended September 30, 2014.

This means that Amazon acquired Twitch for approximately 70x sales! (The earnings multiple is negative since, it would appear, Twitch was losing money.)
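The multiple is simple arithmetic on the two figures quoted above. Note the $12 million of net sales covers all of the 2014 acquisitions, not just Twitch, so if anything this is a lower bound on the multiple:

```python
purchase_price = 842_000_000   # cash paid for Twitch
nine_month_sales = 12_000_000  # aggregate net sales of all companies acquired in the period

multiple = purchase_price / nine_month_sales
print(round(multiple, 1))  # roughly 70x
```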

A Data Revolution that Works for All of Us

Many of today’s global challenges are not new. Economic inequality, the unfettered power of corporations and markets, the need to cooperate to address global problems and the unsatisfactory levels of accountability in democratic governance – these were as much problems a century ago as they remain today.

What has changed, however – and most markedly – is the role that new forms of information and information technology could potentially play in responding to these challenges.

What’s going on?

The incredible advances in digital technology mean we have an unprecedented ability to create, share and access information. Furthermore, these technologies are increasingly not just the preserve of the rich, but are available to everyone – including the world’s poorest. As a result, we are living in a (veritable) data revolution – never before has so much data – public and personal – been collected, analysed and shared.

However, the benefits of this revolution are far from being shared equally.

On the one hand, some governments and corporations are already using this data to greatly increase their ability to understand – and shape – the world around them. Others, however, including much of civil society, lack the necessary access and capabilities to truly take advantage of this opportunity. Faced with this information inequality, what can we do?

How can we enable people to hold governments and corporations to account for the decisions they make, the money they spend and the contracts they sign? How can we unleash the potential for this information to be used for good – from accelerating research to tackling climate change? And, finally, how can we make sure that personal data collected by governments and corporations is used to empower rather than exploit us?

So how should we respond?

Fundamentally, we need to make sure that the data revolution works for all of us. We believe that key to achieving this is to put “open” at the heart of the digital age. We need an open data revolution.

We must ensure that essential public-interest data is open, freely available to everyone. Conversely, we must ensure that data about me – whether collected by governments, corporations or others – is controlled by and accessible to me. And finally, we have to empower individuals and communities – especially the most disadvantaged – with the capabilities to turn data into the knowledge and insight that can drive the change they seek.

In this rapidly changing information age – where the rules of the game are still up for grabs – we must be active, seizing the opportunities we have, if we are to ensure that the knowledge society we create is an open knowledge society, benefiting the many not the few, built on principles of collaboration not control, sharing not monopoly, and empowerment not exploitation.

Announcing a Leadership Update at Open Knowledge

Today I would like to share some important organisational news. After 3 years with Open Knowledge, Laura James, our CEO, has decided to move on to new challenges. As a result of this change we will be seeking to recruit a new senior executive to lead Open Knowledge as it continues to evolve and grow.

As many of you know, Laura James joined us to support the organisation as we scaled up, and stepped up to the CEO role in 2013. It has always been her intention to return to her roots in engineering at an appropriate juncture, and we have been fortunate to have had Laura with us for so long – she will be sorely missed.

Laura has made an immense contribution and we have been privileged to have her on board – I’d like to extend my deep personal thanks to her for all she has done. Laura has played a central role in our evolution as we’ve grown from a team of half-a-dozen to more than forty. Thanks to her commitment and skill we’ve navigated many of the tough challenges that accompany “growing-up” as an organisation.

There will be no change in my role (as President and founder) and I will be here both to continue to help lead the organisation and to work closely with the new appointment going forward. Laura will remain in post, continuing to manage and lead the organisation, assisting with the recruitment and bringing the new senior executive on board.

For a decade, Open Knowledge has been a leader in its field, working at the forefront of efforts to open up information around the world and see it used to empower citizens and organisations to drive change. Both the community and original non-profit have grown – and continue to grow – very rapidly, and the space in which we work continues to develop at an incredible pace with many exciting new opportunities and activities.

We have a fantastic future ahead of us and I’m very excited as we prepare Open Knowledge to make its next decade even more successful than its first.

We will keep everyone informed in the coming weeks as our plans develop, and there will also be opportunities for the Open Knowledge community to discuss. In the meantime, please don’t hesitate to get in touch with me if you have any questions.

A Data API for Data Packages in Seconds Using CKAN and its DataStore

dpm the command-line ‘data package manager’ now supports pushing (Tabular) Data Packages straight into a CKAN instance (including pushing all the data into the CKAN DataStore):

dpm ckan {ckan-instance-url}

This allows you, in seconds, to get a fully-featured web data API – including JSON and SQL-based query APIs:

dpm ckan demo


Once you have a nice web data API like this we can very easily create data-driven applications and visualizations. As a simple demonstration, there’s the CKAN Data Explorer (example with IMF data - see below).
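For instance, the JSON query API is just an HTTP GET against CKAN's `datastore_search` action. Here's a sketch of building such a query URL against the DataHub; the `resource_id` is a placeholder you'd replace with the one shown on your dataset's resource page:

```python
from urllib.parse import urlencode

# Placeholder resource id -- substitute the id from your CKAN dataset page.
base = "http://datahub.io/api/3/action/datastore_search"
params = {"resource_id": "your-resource-id", "limit": 5}
url = base + "?" + urlencode(params)
print(url)
# Fetching this URL returns JSON with a "result" object containing "records".
```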

Where Can I Find a CKAN instance to Upload to?

If you’re looking for a CKAN site to upload your Data Packages to we recommend the DataHub which is community-run and free. To upload to the DataHub you’ll want to:

  1. Configure the DataHub CKAN instance in your .dpmrc

    [ckan.datahub]
    url = http://datahub.io/
    apikey = your-api-key
    
  2. Upload your Data Package

    dpm ckan datahub --owner_org=your-organization
    

You have to set the owner organization as all datasets on the DataHub need an owner organization.

One I Did Earlier

Here’s a live example of one “I did earlier”:

Context: a big motivation (personally) for doing this is that I’d like to see a nice web data API available for the “Core” Data Packages we’re creating as part of the Frictionless Data effort. If you’re interested in helping, get in touch.

Labs newsletter: 5 June, 2014

Welcome back to the OKFN Labs! Members of the Labs have been building tools, visualizations, and even new data protocols—as well as setting up conferences and events. Read on to learn more.

If you’d like to suggest a piece of news for next month’s newsletter, leave a comment on its GitHub issue.

commasearch

Thomas Levine has been working on an innovative new approach to searching tabular data, commasearch.

Unlike a normal search engine, where you submit words and get pages of words back, with commasearch, you submit spreadsheets and get spreadsheets in return.

What does that mean, and how does it work? Check out Thomas’s excellent blog post “Pagerank for Spreadsheets” to learn more.

GitHub diffs for CSV files

Submitted by Paul Fitzpatrick.

GitHub has added CSV viewing support in their web interface, which is fantastic, but it still doesn’t handle changes well. If you use Chrome, and want lovely diffs, check out James Smith’s CSVHub extension (blogpost and screenshot). The diffs are produced using the daff library, available in javascript, ruby, php, and python3.

Textus Wordpress plugin

Update from Iain Emsley.

The Open Literature project to provide a Wordpress plugin back-end for the Textus viewer has made new progress.

This project’s goal was to keep the existing Textus frontend—which has been split off as its own project by Rufus Pollock—and replace the backend with a Wordpress plugin, to make it easier to deploy. A version of this plugin backend is now available.

The new plugin acts as a stand-alone module that can be enabled and disabled as required by the administrative user. It creates a new Wordpress post type called “Textus” which is available as part of the menu, giving the user a place to upload text and annotation files using the Media uploader.

If you are interested in the project, check out its issues and discussion on the Open Humanities list.

Data protocols: updates

Data Protocols, the Labs’s set of lightweight standards and patterns for open data, has had a couple of interesting developments.

The JSON Table Schema protocol has just added support for constraints (i.e. validation), thanks to Leigh Dodds. This adds a constraints attribute containing requirements on the content of fields. See the full list of valid constraints on the JSON Table Schema site.
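As a sketch (the field name and constraint values here are illustrative; see the spec for the authoritative list), a constrained field in a schema might look like:

```json
{
  "name": "age",
  "type": "integer",
  "constraints": {
    "required": true,
    "minimum": 0,
    "maximum": 120
  }
}
```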

The Data Package Manager tool for Data Packages is shaping up nicely: the install and init commands have now been implemented. You can see an animated GIF of the former in the issue thread.

AnnotatorJS: new home

Annotator is “an open-source JavaScript library to easily add annotation functionality to any webpage”.

The project now lives on its own domain at annotatorjs.org. Check it out and see how easy it is to add comments and notes to your pages!

csv,conf

Data makers everywhere will want to check out csv,conf, a fringe event of Open Knowledge Festival 2014 taking place in Berlin on 15 July.

csv,conf is a non-profit community conference that will “bring together data makers/doers/hackers from backgrounds like science, journalism, open government and the wider software industry to share tools and stories”.

Tickets are $75, $50 with an OKFest ticket. If you can make it to Berlin in July and you’re into “advancing the art of data collaboration”, come join in!

Steve Wynn on Impact of QE on Businesses and Consumers

Saw this nugget buried in a recent earnings call of Wynn Resorts Management. This is Steve Wynn responding to a caller question:

Well, we finished our financing recently. The last tranche was a $750,000 — $750 million bond. We sold it at 5.09 with no covenants nonrecourse to the parent. And that brought our total financing for Cotai to $3,850,000,000 at an average cost of 3.3%. Or to put it another way, we rented the $3.85 billion for $125 million.

Now on one hand, as a businessman, I’m thrilled. Never dreamt that we would see anything so tasty and wonderful as that. On the other hand, it’s a reflection of questionable fiscal and monetary policy in the United States that is artificially depressed interest rates because of quantitative easing by the Fed, which is also sort of killing the value of the dollar and the living standard of the working people.

So the good news is, if you’re a high-class borrower with good credit rating, this is one of the most tastiest seasons of all time for 2 reasons. You’re borrowing money at artificially depressed rates. And you’re most likely going to pay them back with 85-cent dollars.

It’s a perfect storm for a businessperson unless you look at the truth of the matter and the impact it has on your customers and your employees. And that’s a much darker story. It doesn’t lend itself to a soundbite, but it’s — for every businessman in America and any economist that has their heads screwed on right, it’s an ominous situation.

But in terms of our moment in history, in commercial history and our projects in Cotai, along with our colleagues in the industry, it’s nirvana. Capital structure now is — these are mostly at the Venetian and the Wynn, things of beauty. They’re lovely, better than you could ever want. I mean, they’ve got everything, low interest rates, long maturities, low covenants. What else do you want? I mean, it’s great.

If you look at it from our point of view, look at it from a consumers’ point of view or a working person’s point of view, who’s paying for all this cheap money? Well, right now, the Fed is. I thought Bernie Madoff went to jail for that. But anyway, that’s my answer about your capital structure.

CSV Conf 2014 – for Data Makers Everywhere

Announcing CSV,Conf - the conference for data makers everywhere which takes place on 15 July 2014 in Berlin.

This one day conference will focus on practical, real-world stories, examples and techniques of how to scrape, wrangle, analyze, and visualize data. Whether your data is big or small, tabular or spatial, graphs or rows this event is for you.

Key Info

CSV,Conf is run in conjunction with the week-long Open Knowledge Festival.

What Is It About?

Building Community

We want to bring together data makers/doers/hackers from backgrounds like science, journalism, open government and the wider software industry to share tools and stories.

For those who love data

CSV Conf is a non-profit community conference run by some folks who really love data and sharing knowledge. If you are passionate about data and its application to society, then you should join us!

Big and small

This isn’t a conference just about spreadsheets. We are curating content about advancing the art of data collaboration, from putting your CSV on GitHub to producing meaningful insight by running large scale distributed processing.

Colophon: Why CSV?

This conference isn’t just about CSV data. But we chose to call it CSV Conf because we think CSV embodies certain important qualities that set the tone for the event:

  • Simplicity: CSV is incredibly simple - perhaps the simplest structured data format there is
  • Openness: the CSV ‘standard’ is well-known and open - free for anyone to use
  • Easy to use: CSV is widely supported - practically every spreadsheet program, relational database and programming language in existence can handle CSV in some form or other
  • Hackable: CSV is text-based and therefore amenable to manipulation and access from a wide range of standard tools (including revision control systems such as git, mercurial and subversion)
  • Big or small: CSV files can range from under a kilobyte to gigabytes and its line-oriented structure means it can be incrementally processed – you do not need to read an entire file to extract a single row.
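That incremental, row-at-a-time quality is easy to see in code. A minimal Python sketch that streams rows without ever loading the whole file:

```python
import csv
import io

# An in-memory stand-in for a (possibly huge) CSV file.
data = io.StringIO("name,score\nalice,3\nbob,5\n")

total = 0
for row in csv.DictReader(data):  # rows are parsed one at a time
    total += int(row["score"])

print(total)  # 8
```

The same loop works unchanged on a multi-gigabyte file opened with `open(...)`, since the reader never holds more than one row in memory.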

More informally:

CSV is the data Kalashnikov: not pretty, but many [data] wars have been fought with it and even kids can use it. @pudo (Friedrich Lindenberg)

CSV is the ultimate simple, standard data format - streamable, text-based, no need for proprietary tools etc @rufuspollock (Rufus Pollock)

[The above is adapted from the “Why CSV” section of the Tabular Data Package specification]

Candy Crush, King Digital Entertainment, Offshoring and Tax

Sifting through the King Entertainment F-1 filing with the SEC for their IPO (Feb 18 2014) I noticed the following in their risk section:

The intended tax benefits of our corporate structure and intercompany arrangements may not be realized, which could result in an increase to our worldwide effective tax rate and cause us to change the way we operate our business. Our corporate structure and intercompany arrangements, including the manner in which we develop and use our intellectual property and the transfer pricing of our intercompany transactions, are intended to provide us worldwide tax efficiencies [ed: for this I read – significantly reduce our tax-rate by moving our profits to low-tax jurisdictions …]. The application of the tax laws of various jurisdictions to our international business activities is subject to interpretation and also depends on our ability to operate our business in a manner consistent with our corporate structure and intercompany arrangements. The taxing authorities of the jurisdictions in which we operate may challenge our methodologies for valuing developed technology or intercompany arrangements, including our transfer pricing, or determine that the manner in which we operate our business does not achieve the intended tax consequences, which could increase our worldwide effective tax rate and adversely affect our financial position and results of operations.

It is also interesting how they have set up their corporate structure going “offshore” first to Malta and then to Ireland (from the “Our Corporate Information and Structure” section):

We were originally incorporated as Midasplayer.com Limited in September 2002, a company organized under the laws of England and Wales. In December 2006, we established Midasplayer International Holding Company Limited, a limited liability company organized under the laws of Malta, which became the holding company of Midasplayer.com Limited and our other wholly-owned subsidiaries. The status of Midasplayer International Holding Company Limited changed to a public limited liability company in November 2013 and its name changed to Midasplayer International Holding Company p.l.c. Prior to completion of this offering, King Digital Entertainment plc, a company incorporated under the laws of Ireland and created for the purpose of facilitating the public offering contemplated hereby, will become our current holding company by way of a share-for-share exchange in which the existing shareholders of Midasplayer International Holding Company p.l.c. will exchange their shares in Midasplayer International Holding Company p.l.c. for shares having substantially the same rights in King Digital Entertainment plc. See “Corporate Structure.”

Here’s their corporate structure diagram from the “Corporate Structure” section (unfortunately barely readable in the original as well …). As I count it there are 19 different entities with a chain of length 6 or 7 from base entities to primary holding company.

Labs newsletter: 20 March, 2014

We’re back with a bumper crop of updates in this new edition of the now-monthly Labs newsletter!

Textus Viewer refactoring

The TEXTUS Viewer is an HTML + JS application for viewing texts in the format of TEXTUS, Labs’s open source platform for collaborating around collections of texts. The viewer has now been stripped down to its bare essentials, becoming a leaner and more streamlined beast that’s easier to integrate into your projects.

Check out the demo to see the new Viewer in action, and see the full usage instructions in the repo.

JSON Table Schema: foreign key support

The JSON Table Schema, Labs’s schema for tabular data, has just added an important new feature: support for foreign keys. This means that the schema now provides a method for linking entries in a table to entries in a separate resource.
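As a sketch of what that looks like in a schema (the resource and field names here are made up; check the spec for the exact syntax):

```json
{
  "fields": [
    {"name": "state", "type": "string"}
  ],
  "foreignKeys": [
    {
      "fields": "state",
      "reference": {
        "resource": "states",
        "fields": "code"
      }
    }
  ]
}
```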

This update has been in the works for a long time, as you can see from the discussion thread on GitHub. Many thanks to everyone who participated in that year-long discussion, including Jeff Allen, David Miller, Gunnlaugur Thor Briem, Sebastien Ballesteros, James McKinney, Paul Fitzpatrick, Josh Ferguson, Tryggvi Björgvinsson, and Rufus Pollock.

Renaming of Data Explorer

Data Explorer is Labs’s in-browser data cleaning and visualization app—and it’s about to get a name change.

For the past four months, discussion around the new name has been bubbling. As of right now, Rufus Pollock is proposing to go with the new name DataDeck.

What do you think? If you object, now’s your chance to jump in the thread and re-open the issue!

On the blog: SEC EDGAR database

Rufus has been doing some work with the Securities and Exchange Commission (SEC) EDGAR database, “a rich source of data containing regulatory filings from publicly-traded US corporations including their annual and quarterly reports”. He has written up his initial findings on the blog and created a repo for the extracted data.

This is an interesting example of working with XBRL, the popular XML framework for financial reporting. You can find several good Python libraries for working with XBRL in Rufus’s message to the mailing list.

Labs Hangout: today!

Labs Hangouts are a fun and informal way for Labs members and friends to get together, discuss their work, and seek out new contributions—and the next one is happening today (20 March) at 1700-1800 GMT!

If you want to join in, visit the hangout Etherpad and record your name. The URL of the Hangout will be announced on the Labs mailing list as well as reported on the pad.

Get involved

Want to join in Labs activities? There’s lots to do!

Leave an idea on the Ideas Page, or visit the Labs site to learn more about how you can join the community.

The SEC EDGAR Database

This post looks at the Securities and Exchange Commission (SEC) EDGAR database. EDGAR is a rich source of data containing regulatory filings from publicly-traded US corporations including their annual and quarterly reports:

All companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free. [from the SEC website]

This post introduces the basic structure of the database, and how to get access to filings via ftp. Subsequent posts will look at how to use the structured information in the form of XBRL files.

Note: an extended version of the notes here plus additional data and scripts can be found in this SEC EDGAR Data Package on Github.

Human Interface

See http://www.sec.gov/edgar/searchedgar/companysearch.html

Bulk Data

EDGAR provides bulk access via FTP: ftp://ftp.sec.gov/ - official documentation. We summarize here the main points.

Each company in EDGAR gets an identifier known as the CIK which is a 10 digit number. You can find the CIK by searching EDGAR using a company name or stock market ticker.

For example, searching for IBM by ticker shows us that the CIK is 0000051143.

Note that leading zeroes are often omitted (e.g. in the ftp access) so this would become 51143.

Next each submission receives an ‘Accession Number’ (acc-no). For example, IBM’s quarterly financial filing (form 10-Q) in October 2013 had accession number: 0000051143-13-000007.

FTP File Paths

Given a company with CIK (company ID) XXX (omitting leading zeroes) and document accession number YYY (acc-no on the search results), file paths are of the form:

/edgar/data/XXX/YYY.txt

For example, for the IBM data above it would be:

ftp://ftp.sec.gov/edgar/data/51143/0000051143-13-000007.txt

Note, if you are looking for a nice HTML version you can find it in the Archives section with a similar URL (just add -index.htm):

http://www.sec.gov/Archives/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm
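Putting the pieces together, building both the FTP URL and the HTML index URL from a CIK and accession number is a small helper (a sketch following the patterns just described):

```python
def edgar_urls(cik, acc_no):
    """Build the FTP text URL and HTML index URL for an EDGAR filing.

    cik    -- company id with leading zeroes omitted (e.g. 51143)
    acc_no -- accession number with dashes (e.g. '0000051143-13-000007')
    """
    ftp = "ftp://ftp.sec.gov/edgar/data/%s/%s.txt" % (cik, acc_no)
    # The Archives directory uses the accession number with dashes stripped.
    html = "http://www.sec.gov/Archives/edgar/data/%s/%s/%s-index.htm" % (
        cik, acc_no.replace("-", ""), acc_no)
    return ftp, html

ftp_url, html_url = edgar_urls(51143, "0000051143-13-000007")
print(ftp_url)
print(html_url)
```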

Indices

If you want to get a list of all filings you’ll want to grab an Index. As the help page explains:

The EDGAR indices are a helpful resource for FTP retrieval, listing the following information for each filing: Company Name, Form Type, CIK, Date Filed, and File Name (including folder path).

Four types of indexes are available:

  • company — sorted by company name
  • form — sorted by form type
  • master — sorted by CIK number
  • XBRL — list of submissions containing XBRL financial files, sorted by CIK number; these include Voluntary Filer Program submissions

URLs are like:

ftp://ftp.sec.gov/edgar/full-index/2008/QTR4/master.gz

That is, they have the following general form:

ftp://ftp.sec.gov/edgar/full-index/{YYYY}/QTR{1-4}/{index-name}.[gz|zip]

So for XBRL in the 3rd quarter of 2010 we’d do:

ftp://ftp.sec.gov/edgar/full-index/2010/QTR3/xbrl.gz
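A small helper covers all four index types (again just a sketch of the URL pattern above):

```python
def index_url(year, quarter, name, ext="gz"):
    """URL for an EDGAR full index file.

    name -- one of 'company', 'form', 'master' or 'xbrl'
    ext  -- 'gz' or 'zip'
    """
    return "ftp://ftp.sec.gov/edgar/full-index/%d/QTR%d/%s.%s" % (
        year, quarter, name, ext)

print(index_url(2010, 3, "xbrl"))  # the XBRL example above
```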

CIK lists and lookup

There’s a full list of all companies along with their CIK code here: http://www.sec.gov/edgar/NYU/cik.coleft.c

If you want to look up a CIK or company by its ticker you can do the following query against the normal search system:

http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm&Find=Search&owner=exclude&action=getcompany&output=atom

Then parse the atom to grab the CIK. (If you prefer HTML output just omit output=atom).
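That query URL is easy to assemble programmatically (the ticker is a parameter; extracting the CIK from the returned atom feed is then an XML-parsing or regex exercise):

```python
from urllib.parse import urlencode

def cik_lookup_url(ticker, output="atom"):
    """Build the EDGAR company-search URL for a ticker or company name."""
    params = {
        "CIK": ticker,
        "Find": "Search",
        "owner": "exclude",
        "action": "getcompany",
    }
    if output:
        params["output"] = output  # omit for HTML output
    return "http://www.sec.gov/cgi-bin/browse-edgar?" + urlencode(params)

print(cik_lookup_url("ibm"))
```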

There is also a full-text company name to CIK lookup here:

http://www.sec.gov/edgar/searchedgar/cik.htm

(Note this does a POST to a ‘text’ API at http://www.sec.gov/cgi-bin/cik.pl.c)