Labs newsletter: 5 June, 2014

Welcome back to the OKFN Labs! Members of the Labs have been building tools, visualizations, and even new data protocols—as well as setting up conferences and events. Read on to learn more.

If you’d like to suggest a piece of news for next month’s newsletter, leave a comment on its GitHub issue.

commasearch

Thomas Levine has been working on an innovative new approach to searching tabular data, commasearch.

Unlike a normal search engine, where you submit words and get pages of words back, with commasearch, you submit spreadsheets and get spreadsheets in return.

What does that mean, and how does it work? Check out Thomas’s excellent blog post “Pagerank for Spreadsheets” to learn more.

GitHub diffs for CSV files

Submitted by Paul Fitzpatrick.

GitHub has added CSV viewing support in their web interface, which is fantastic, but it still doesn’t handle changes well. If you use Chrome and want lovely diffs, check out James Smith’s CSVHub extension (blog post and screenshot). The diffs are produced using the daff library, available in JavaScript, Ruby, PHP, and Python 3.

Textus Wordpress plugin

Update from Iain Emsley.

The Open Literature project to provide a Wordpress plugin back-end for the Textus viewer has made new progress.

This project’s goal was to keep the existing Textus frontend—which has been split off as its own project by Rufus Pollock—and replace the backend with a Wordpress plugin, to make it easier to deploy. A version of this plugin backend is now available.

The new plugin acts as a stand-alone module that can be enabled and disabled as required by the administrative user. It creates a new Wordpress post type called “Textus” which is available as part of the menu, giving the user a place to upload text and annotation files using the Media uploader.

If you are interested in the project, check out its issues and discussion on the Open Humanities list.

Data protocols: updates

Data Protocols, the Labs’s set of lightweight standards and patterns for open data, has had a couple of interesting developments.

The JSON Table Schema protocol has just added support for constraints (i.e. validation), thanks to Leigh Dodds. This adds a constraints attribute containing requirements on the content of fields. See the full list of valid constraints on the JSON Table Schema site.
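To make the idea concrete, here is a minimal Python sketch of how a consumer might check a single value against a field's constraints. The field descriptor, the constraint names used (required, minimum, maximum) and the check_value helper are assumptions for illustration only; the JSON Table Schema site remains the authority on what is valid.

```python
# Hypothetical field descriptor; the constraint names are assumed for illustration.
field = {
    "name": "age",
    "type": "integer",
    "constraints": {"required": True, "minimum": 0, "maximum": 150},
}


def check_value(field, value):
    """Return a list of human-readable constraint violations for one value."""
    constraints = field.get("constraints", {})
    errors = []
    if value is None:
        # A missing value only matters when the field is marked as required.
        if constraints.get("required"):
            errors.append("%s is required" % field["name"])
        return errors
    if "minimum" in constraints and value < constraints["minimum"]:
        errors.append("%s is below the minimum" % field["name"])
    if "maximum" in constraints and value > constraints["maximum"]:
        errors.append("%s is above the maximum" % field["name"])
    return errors
```

Calling check_value(field, 200) would report one violation, while check_value(field, 30) would report none.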

The Data Package Manager tool for Data Packages is shaping up nicely: the install and init commands have now been implemented. You can see an animated GIF of the former in the issue thread.

AnnotatorJS: new home

Annotator is “an open-source JavaScript library to easily add annotation functionality to any webpage”.

The project now lives on its own domain at annotatorjs.org. Check it out and see how easy it is to add comments and notes to your pages!

csv,conf

Data makers everywhere will want to check out csv,conf, a fringe event of Open Knowledge Festival 2014 taking place in Berlin on 15 July.

csv,conf is a non-profit community conference that will “bring together data makers/doers/hackers from backgrounds like science, journalism, open government and the wider software industry to share tools and stories”.

Tickets are $75, $50 with an OKFest ticket. If you can make it to Berlin in July and you’re into “advancing the art of data collaboration”, come join in!

Steve Wynn on Impact of QE on Businesses and Consumers

Saw this nugget buried in a recent earnings call of Wynn Resorts Management. This is Steve Wynn responding to a caller question:

Well, we finished our financing recently. The last tranche was a $750,000 — $750 million bond. We sold it at 5.09 with no covenants nonrecourse to the parent. And that brought our total financing for Cotai to $3,850,000,000 at an average cost of 3.3%. Or to put it another way, we rented the $3.85 billion for $125 million.

Now on one hand, as a businessman, I’m thrilled. Never dreamt that we would see anything so tasty and wonderful as that. On the other hand, it’s a reflection of questionable fiscal and monetary policy in the United States that is artificially depressed interest rates because of quantitative easing by the Fed, which is also sort of killing the value of the dollar and the living standard of the working people.

So the good news is, if you’re a high-class borrower with good credit rating, this is one of the most tastiest seasons of all time for 2 reasons. You’re borrowing money at artificially depressed rates. And you’re most likely going to pay them back with 85-cent dollars.

It’s a perfect storm for a businessperson unless you look at the truth of the matter and the impact it has on your customers and your employees. And that’s a much darker story. It doesn’t lend itself to a soundbite, but it’s — for every businessman in America and any economist that has their heads screwed on right, it’s an ominous situation.

But in terms of our moment in history, in commercial history and our projects in Cotai, along with our colleagues in the industry, it’s nirvana. Capital structure now is — these are mostly at the Venetian and the Wynn, things of beauty. They’re lovely, better than you could ever want. I mean, they’ve got everything, low interest rates, long maturities, low covenants. What else do you want? I mean, it’s great.

If you look at it from our point of view, look at it from a consumers’ point of view or a working person’s point of view, who’s paying for all this cheap money? Well, right now, the Fed is. I thought Bernie Madoff went to jail for that. But anyway, that’s my answer about your capital structure.

CSV Conf 2014 – for Data Makers Everywhere

Announcing CSV,Conf - the conference for data makers everywhere which takes place on 15 July 2014 in Berlin.

This one-day conference will focus on practical, real-world stories, examples and techniques of how to scrape, wrangle, analyze, and visualize data. Whether your data is big or small, tabular or spatial, graphs or rows, this event is for you.

Key Info

CSV,Conf is run in conjunction with the week long Open Knowledge Festival.

What Is It About?

Building Community

We want to bring together data makers/doers/hackers from backgrounds like science, journalism, open government and the wider software industry to share tools and stories.

For those who love data

CSV Conf is a non-profit community conference run by some folks who really love data and sharing knowledge. If you are as passionate as we are about data and its application to society, then you should join us!

Big and small

This isn’t a conference just about spreadsheets. We are curating content about advancing the art of data collaboration, from putting your CSV on GitHub to producing meaningful insight by running large scale distributed processing.

Colophon: Why CSV?

This conference isn’t just about CSV data. But we chose to call it CSV Conf because we think CSV embodies certain important qualities that set the tone for the event:

  • Simplicity: CSV is incredibly simple - perhaps the simplest structured data format there is
  • Openness: the CSV ‘standard’ is well-known and open - free for anyone to use
  • Easy to use: CSV is widely supported - practically every spreadsheet program, relational database and programming language in existence can handle CSV in some form or other
  • Hackable: CSV is text-based and therefore amenable to manipulation and access from a wide range of standard tools (including revision control systems such as git, mercurial and subversion)
  • Big or small: CSV files can range from under a kilobyte to gigabytes, and their line-oriented structure means they can be incrementally processed – you do not need to read an entire file to extract a single row.
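The incremental-processing point can be illustrated with a short Python 3 sketch using the standard csv module. The find_first_row helper and the sample data are made up for illustration; the point is only that rows are read one at a time, so scanning can stop as soon as the row of interest turns up.

```python
import csv
import io


def find_first_row(stream, column, value):
    """Scan a CSV stream row by row and return the first matching row.

    Rows are read lazily, so scanning stops as soon as a match is found.
    """
    for row in csv.DictReader(stream):
        if row[column] == value:
            return row
    return None


# In-memory sample data (made up for illustration); a large file opened
# with open(path, newline="") would be handled the same way.
sample = io.StringIO("place,temperature\nGalway,1\nBerkeley,6\nBerlin,3\n")
match = find_first_row(sample, "place", "Berkeley")
```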

More informally:

CSV is the data Kalashnikov: not pretty, but many [data] wars have been fought with it and even kids can use it. @pudo (Friedrich Lindenberg)

CSV is the ultimate simple, standard data format - streamable, text-based, no need for proprietary tools etc @rufuspollock (Rufus Pollock)

[The above is adapted from the “Why CSV” section of the Tabular Data Package specification]

Candy Crush, King Digital Entertainment, Offshoring and Tax

Sifting through the King Entertainment F-1 filing with the SEC for their IPO (Feb 18 2014) I noticed the following in their risk section:

The intended tax benefits of our corporate structure and intercompany arrangements may not be realized, which could result in an increase to our worldwide effective tax rate and cause us to change the way we operate our business. Our corporate structure and intercompany arrangements, including the manner in which we develop and use our intellectual property and the transfer pricing of our intercompany transactions, are intended to provide us worldwide tax efficiencies [ed: for this I read - significantly reduce our tax-rate by moving our profits to low-tax jurisdictions ...]. The application of the tax laws of various jurisdictions to our international business activities is subject to interpretation and also depends on our ability to operate our business in a manner consistent with our corporate structure and intercompany arrangements. The taxing authorities of the jurisdictions in which we operate may challenge our methodologies for valuing developed technology or intercompany arrangements, including our transfer pricing, or determine that the manner in which we operate our business does not achieve the intended tax consequences, which could increase our worldwide effective tax rate and adversely affect our financial position and results of operations.

It is also interesting how they have set up their corporate structure going “offshore” first to Malta and then to Ireland (from the “Our Corporate Information and Structure” section):

We were originally incorporated as Midasplayer.com Limited in September 2002, a company organized under the laws of England and Wales. In December 2006, we established Midasplayer International Holding Company Limited, a limited liability company organized under the laws of Malta, which became the holding company of Midasplayer.com Limited and our other wholly-owned subsidiaries. The status of Midasplayer International Holding Company Limited changed to a public limited liability company in November 2013 and its name changed to Midasplayer International Holding Company p.l.c. Prior to completion of this offering, King Digital Entertainment plc, a company incorporated under the laws of Ireland and created for the purpose of facilitating the public offering contemplated hereby, will become our current holding company by way of a share-for-share exchange in which the existing shareholders of Midasplayer International Holding Company p.l.c. will exchange their shares in Midasplayer International Holding Company p.l.c. for shares having substantially the same rights in King Digital Entertainment plc. See “Corporate Structure.”

Here’s their corporate structure diagram from the “Corporate Structure” section (unfortunately barely readable in the original as well …). As I count it there are 19 different entities with a chain of length 6 or 7 from base entities to primary holding company.

Labs newsletter: 20 March, 2014

We’re back with a bumper crop of updates in this new edition of the now-monthly Labs newsletter!

Textus Viewer refactoring

The TEXTUS Viewer is an HTML + JS application for viewing texts in the format of TEXTUS, Labs’s open source platform for collaborating around collections of texts. The viewer has now been stripped down to its bare essentials, becoming a leaner and more streamlined beast that’s easier to integrate into your projects.

Check out the demo to see the new Viewer in action, and see the full usage instructions in the repo.

JSON Table Schema: foreign key support

The JSON Table Schema, Labs’s schema for tabular data, has just added an important new feature: support for foreign keys. This means that the schema now provides a method for linking entries in a table to entries in a separate resource.
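As a rough illustration of what such a link allows, the Python sketch below checks that every value in a foreign key field has a matching entry in the referenced resource. The schema fragment, its key names (foreignKeys, reference, resource, fields), the sample data and the dangling_references helper are all assumptions made for this example; check the specification for the exact structure.

```python
# Hypothetical schema fragment; the key names are assumed for illustration.
schema = {
    "fields": [{"name": "city"}, {"name": "state-code"}],
    "foreignKeys": [
        {
            "fields": "state-code",
            "reference": {"resource": "states", "fields": "code"},
        }
    ],
}

# Made-up sample resources, keyed by resource name.
resources = {
    "states": [{"code": "CA"}, {"code": "NY"}],
}
cities = [
    {"city": "Berkeley", "state-code": "CA"},
    {"city": "Galway", "state-code": "XX"},
]


def dangling_references(rows, schema, resources):
    """Return rows whose foreign key value has no match in the referenced resource."""
    broken = []
    for key in schema.get("foreignKeys", []):
        ref = key["reference"]
        # Collect the set of values that rows are allowed to point at.
        valid = {r[ref["fields"]] for r in resources[ref["resource"]]}
        broken.extend(row for row in rows if row[key["fields"]] not in valid)
    return broken
```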

This update has been in the works for a long time, as you can see from the discussion thread on GitHub. Many thanks to everyone who participated in that year-long discussion, including Jeff Allen, David Miller, Gunnlaugur Thor Briem, Sebastien Ballesteros, James McKinney, Paul Fitzpatrick, Josh Ferguson, Tryggvi Björgvinsson, and Rufus Pollock.

Renaming of Data Explorer

Data Explorer is Labs’s in-browser data cleaning and visualization app—and it’s about to get a name change.

For the past four months, discussion around the new name has been bubbling. As of right now, Rufus Pollock is proposing to go with the new name DataDeck.

What do you think? If you object, now’s your chance to jump in the thread and re-open the issue!

On the blog: SEC EDGAR database

Rufus has been doing some work with the Securities and Exchange Commission (SEC) EDGAR database, “a rich source of data containing regulatory filings from publicly-traded US corporations including their annual and quarterly reports”. He has written up his initial findings on the blog and created a repo for the extracted data.

This is an interesting example of working with XBRL, the popular XML framework for financial reporting. You can find several good Python libraries for working with XBRL in Rufus’s message to the mailing list.

Labs Hangout: today!

Labs Hangouts are a fun and informal way for Labs members and friends to get together, discuss their work, and seek out new contributions—and the next one is happening today (20 March) at 1700-1800 GMT!

If you want to join in, visit the hangout Etherpad and record your name. The URL of the Hangout will be announced on the Labs mailing list as well as reported on the pad.

Get involved

Want to join in Labs activities? There’s lots to do! Possibilities for contribution include:

And much much more. Leave an idea on the Ideas Page, or visit the Labs site to learn more about how you can join the community.

The SEC EDGAR Database

This post looks at the Securities and Exchange Commission (SEC) EDGAR database. EDGAR is a rich source of data containing regulatory filings from publicly-traded US corporations including their annual and quarterly reports:

All companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free. [from the SEC website]

This post introduces the basic structure of the database, and how to get access to filings via ftp. Subsequent posts will look at how to use the structured information in the form of XBRL files.

Note: an extended version of the notes here plus additional data and scripts can be found in this SEC EDGAR Data Package on Github.

Human Interface

See http://www.sec.gov/edgar/searchedgar/companysearch.html

Bulk Data

EDGAR provides bulk access via FTP: ftp://ftp.sec.gov/ - official documentation. We summarize here the main points.

Each company in EDGAR gets an identifier known as the CIK, which is a 10-digit number. You can find the CIK by searching EDGAR using a name or stock market ticker.

For example, searching for IBM by ticker shows us that the CIK is 0000051143.

Note that leading zeroes are often omitted (e.g. in the ftp access) so this would become 51143.

Next each submission receives an ‘Accession Number’ (acc-no). For example, IBM’s quarterly financial filing (form 10-Q) in October 2013 had accession number: 0000051143-13-000007.

FTP File Paths

Given a company with CIK (company ID) XXX (omitting leading zeroes) and document accession number YYY (acc-no on search results), file paths are of the form:

/edgar/data/XXX/YYY.txt

For example, for the IBM data above it would be:

ftp://ftp.sec.gov/edgar/data/51143/0000051143-13-000007.txt
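The same path rule can be written as a small Python helper. This is only a sketch: filing_url is a hypothetical name, and the URL pattern is simply the one described above.

```python
def filing_url(cik, accession_number):
    """Build the FTP URL of a filing from a CIK and an accession number."""
    # Leading zeroes of the CIK are omitted in FTP paths.
    short_cik = str(int(cik))
    return "ftp://ftp.sec.gov/edgar/data/%s/%s.txt" % (short_cik, accession_number)


ibm_url = filing_url("0000051143", "0000051143-13-000007")
```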

Note: if you are looking for a nice HTML version, you can find it in the Archives section with a similar URL (note the extra folder and the -index.htm suffix):

http://www.sec.gov/Archives/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm

Indices

If you want to get a list of all filings you’ll want to grab an Index. As the help page explains:

The EDGAR indices are a helpful resource for FTP retrieval, listing the following information for each filing: Company Name, Form Type, CIK, Date Filed, and File Name (including folder path).

Four types of indexes are available:

  • company — sorted by company name
  • form — sorted by form type
  • master — sorted by CIK number
  • XBRL — list of submissions containing XBRL financial files, sorted by CIK number; these include Voluntary Filer Program submissions

URLs are like:

ftp://ftp.sec.gov/edgar/full-index/2008/QTR4/master.gz

That is, they have the following general form:

ftp://ftp.sec.gov/edgar/full-index/{YYYY}/QTR{1-4}/{index-name}.[gz|zip]

So for XBRL in the 3rd quarter of 2010 we’d do:

ftp://ftp.sec.gov/edgar/full-index/2010/QTR3/xbrl.gz
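For scripted downloads, the general form above translates directly into a Python helper. The function name index_url and its defaults are assumptions for this sketch; only the URL pattern comes from the description above.

```python
def index_url(year, quarter, index_name="master", extension="gz"):
    """Build the URL of a quarterly EDGAR full index file."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return "ftp://ftp.sec.gov/edgar/full-index/%d/QTR%d/%s.%s" % (
        year, quarter, index_name, extension)
```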

CIK lists and lookup

There’s a full list of all companies along with their CIK code here: http://www.sec.gov/edgar/NYU/cik.coleft.c

If you want to look up a CIK or company by its ticker you can do the following query against the normal search system:

http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm&Find=Search&owner=exclude&action=getcompany&output=atom

Then parse the atom to grab the CIK. (If you prefer HTML output just omit output=atom).
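The query above can also be assembled in code. Here is a minimal Python 3 sketch, where company_lookup_url is a hypothetical helper name and the parameters are exactly those that appear in the URL above:

```python
from urllib.parse import urlencode


def company_lookup_url(ticker, atom=True):
    """Build the EDGAR company search URL for a ticker or CIK."""
    params = [
        ("CIK", ticker),
        ("Find", "Search"),
        ("owner", "exclude"),
        ("action", "getcompany"),
    ]
    if atom:
        # Ask for Atom output; leave this out to get HTML instead.
        params.append(("output", "atom"))
    return "http://www.sec.gov/cgi-bin/browse-edgar?" + urlencode(params)
```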

There is also a full-text company name to CIK lookup here:

http://www.sec.gov/edgar/searchedgar/cik.htmL

(Note this does a POST to a ‘text’ API at http://www.sec.gov/cgi-bin/cik.pl.c)

Labs newsletter: 20 February, 2014

The past few weeks have seen major improvements to the Labs website, another Open Data Maker Night in London, updates to the TimeMapper project, and more.

Labs Hangout: today

The next Labs online hangout is taking place today in just a few hours—now’s your chance to sign up on the hangout’s Etherpad!

Labs hangouts are informal online gatherings held on Google Hangout at which Labs members and friends get together to discuss their work and to set the agenda for Labs activities.

Today’s hangout will take place at 1700 - 1800 GMT. Check the hangout pad for more details, and watch the pad for notes from the meeting.

Crowdcrafting at Citizen Cyberscience Summit 2014

In today’s other news, Labs’s Daniel Lombraña González is presenting Crowdcrafting at the Citizen Cyberscience Summit 2014. You can read more about his presentation here.

Crowdcrafting is an open-source citizen science platform that “empowers citizens to become active players in scientific projects by donating their time in order to solve micro-task problems”. Crowdcrafting has been used by institutions including CERN, the United Nations, and the National Institute of Space Research of Brazil.

Labs site updates

Labs has been discussing improving the website for some time now, and the past weeks have seen many of those proposed improvements being put into action.

One of the biggest changes is a new projects page. Besides having a beautiful new layout, the new projects page implements filtering by tags, language, and more.

The site now also features a reciprocal linking of users and projects. The projects page now shows projects’ maintainers (n.b. plural!), and users pages now show which projects users contribute to (e.g. Andy Lulham’s page highlights his Data Pipes contributions).

TimeMapper improvements

TimeMapper is a Labs project allowing you to create elegant timelines with inline maps from Google Spreadsheets in a matter of seconds.

A number of improvements have been made to TimeMapper:

Open Data Maker Night February

Two weeks ago today, the ninth Open Data Maker Night London was hosted by Andy Lulham. This edition was a mapping special, featuring OpenStreetMap contributor Harry Wood.

Open Data Maker Nights are informal, action-oriented get-togethers where things get made with open data. Visit the Labs website for more information on them, including info on how to host your own.

DataPackage + Bubbles

In last week’s newsletter, you heard about Štefan Urbánek’s abstract data processing framework Bubbles. Štefan just notified the OKFN Labs list that he has created a demo of Bubbles using Data Packages, Labs’s simple standard for data publication.

“The example is artificial”, Štefan says, but it highlights the power of the Bubbles framework and the potential of the Data Package format.

Get involved

We’re always looking for new contributions at the Labs. Read about how you can join, and see the Ideas Page to get in on the ground floor of a Labs project—or just join the Labs mailing list to participate by offering feedback.

Labs newsletter: 30 January, 2014

From now on, the Labs newsletter will arrive through a special announce-only mailing list, newsletter@okfnlabs.org, more details on which can be found below.

Keep reading for other new developments including the fifth Labs Hangout, the launch of SayIt, and new developments in the vision of “Frictionless Data”.

New newsletter format

Not everyone who wants to know about Labs activities wants or needs to observe those activities unfolding on the main Labs list. For friends of Labs who just want occasional updates, we’ve created a new, Sendy-based announce-only list that will bring you a Labs newsletter every two weeks.

Everyone currently subscribed to okfn-labs@lists.okfn.org has been added to the new list. To join the new announce list, see the Labs Contact page, where there’s a form.

Labs Hangout no. 5

Last Thursday, Andy Lulham hosted the fifth OKFN Labs Hangout. The Labs Hangouts are a way for people curious about Labs projects to informally get together, share their work, and talk about the future of Labs.

For full details, check out the minutes from the hangout. Highlights included:

SayIt

SayIt, an open-source tool for publishing and sharing transcripts, has just been launched by Poplus. At last week’s Labs Hangout, Tom Steinberg of mySociety (one half of Poplus, alongside Ciudadano Inteligente) shared some of the motivations behind the creation of the tool, which was also discussed on the okfn-discuss mailing list.

As Tom explained, mySociety’s They Work For You has proven the popularity of transcript data. But making the transcripts available in a nice way (e.g. with a decent API) has so far called for bespoke software development. SayIt is designed to encourage “nice” publication as the starting-point—and to serve as a pedagogical example of what a good data publication tool looks like.

Frictionless data: vision, roadmap, composability

We’ve heard about Rufus’s vision for an ecosystem of “frictionless data” in the past. Now the discussion is starting to get serious. data.okfn.org now hosts two key documents generated through the conversation:

  • the vision: what will create a dynamic, productive, and attractive open data ecosystem?
  • the roadmap: what has to happen to bring this vision to life?

The new roadmap is a particularly lucid overview of how the frictionless data vision connects with concrete actions. Would-be creators of this new ecosystem should consult the roadmap to see where to join in.

Discussion on the Labs list has also generated some interesting insights. Data Unity’s Kev Kirkland discussed his work with Semantic Web formalization of composable data manipulation processes, and Štefan Urbánek made a connection with his work on “abstracting datasets and operations” in the ETL framework Bubbles.

On the blog: OLAP part two

Last week, Štefan Urbánek wrote us an introduction to Online Analytical Processing. Shortly afterwards, he followed up with a second post taking a closer look at how OLAP data is structured and why.

Check out Štefan’s post to learn about how OLAP represents data as multidimensional “cubes” that users can slice and dice to explore the data along its many dimensions.
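For a quick feel of what slicing and aggregating a cube means, here is a toy Python sketch. It is not taken from the post: the fact table, the dimension names and the aggregate helper are all made up for illustration.

```python
from collections import defaultdict

# Made-up fact table: each row has two dimensions (year, place) and one measure.
facts = [
    {"year": 2013, "place": "Galway", "amount": 10},
    {"year": 2013, "place": "Berkeley", "amount": 20},
    {"year": 2014, "place": "Galway", "amount": 5},
]


def aggregate(facts, dimension, measure="amount", **slice_by):
    """Sum a measure per value of one dimension, optionally slicing first."""
    totals = defaultdict(int)
    for fact in facts:
        # "Slice": keep only facts that match every fixed dimension value.
        if all(fact[k] == v for k, v in slice_by.items()):
            totals[fact[dimension]] += fact[measure]
    return dict(totals)
```

Here aggregate(facts, "year") totals the measure per year, while aggregate(facts, "place", year=2013) first slices the cube down to 2013 and then totals per place.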

TimeMapper improvements

Andy Lulham has started working on TimeMapper, Labs’s easy-to-use tool for the creation of interactive timelines linked to geomaps.

Some of the improvements he has made so far have been bugfixes (e.g. preventing overflowing form controls, fixing the template settings file), but one of them is a new user feature: a way to change the starting event on a timeline, so that timelines don’t always have to start at the beginning.

Get involved

Want to get involved with Labs’s projects? Now is a great time to join in! Check out the Ideas Page to see some of the many things you can do once you join Labs, or just jump on the Labs mailing list and take part in a conversation.

Labs newsletter: 16 January, 2014

Welcome back from the holidays! A new year of Labs activities is well underway, with long-discussed improvements to the Labs projects page, many new PyBossa developments, a forthcoming community hangout, and more.

Labs projects page

Getting the Labs project page organized better has been high on the agenda for some time now. In the past little while, significant progress has been made. New improvements to the project page include:

Oleg Lavrosky, Daniel Lombraña González, and Andy Lulham have all contributed to this development—and work is still ongoing, with further enhancements to attributes and more work on the UI still to come.

Lots of PyBossa milestones

PyBossa has achieved so many milestones since the last newsletter that it’s hard to know where to begin.

PyBossa v0.2.1 was released by Daniel Lombraña González, becoming a more robust service through the inclusion of a new rate-limiting feature for API calls. Alongside rate limits, the new PyBossa has improved security through the addition of a secure cookie-based solution for posting task runs. Full details can be found in the documentation.

Daniel also released a new PyBossa template for annotating pictures. The template, which incorporates the Annotorious.JS JavaScript library, “allow[s] anyone to extract structured information from pictures or photos in a very simple way”.

The Enki package for analyzing PyBossa applications was also released over the break. Enki makes it possible to download completed PyBossa tasks and associated task runs, analyze them with Pandas, and share the result as an IPython Notebook. Check out Daniel’s blog post on Enki to see what it’s about.

New on the blog

We’ve had a couple of great new contributions on the Labs blog since the last newsletter.

Thomas Levine has written about how he parses PDF files, lovingly exploring a problem that all data wranglers will encounter and gnash their teeth over at least a few times in their lives.

Stefan Urbanek, meanwhile, has written an introduction to OLAP, “an approach to answering multi-dimensional analytical queries swiftly”, explaining what that means and why we should take notice.

Dānabox

Labs friend Darwin Peltan reached out to the list to point out that his friend’s project Dānabox is looking for testers and general feedback. Labs members are invited to pitch in by finding bugs and breaking it.

Dānabox is “Heroku but with public payment pages”, crowdsourcing the payment for an app’s hosting costs. Dānabox is open source and built on the Deis platform.

Community hangout

It’s almost time for the Labs community hangout. The Labs hangout is the regular event where Labs members meet up online to discuss their work, find ways to collaborate, and set the agenda for the weeks to come.

When will the hangout take place? Rufus proposes moving the hangout from the 21st to the 23rd. If you want to participate, leave a comment on the thread to let Labs know what time would work for you.

Get involved

Labs is the Labs community, no more and no less, and you’re invited to become a part of it! Join the community by coding, blogging, kicking around ideas on the Ideas Page, or joining the conversation on the Labs mailing list.

Convert data between formats with Data Converters

Data Converters is a command line tool and Python library making routine data conversion tasks easier. It helps data wranglers with everyday tasks like moving between tabular data formats—for example, converting an Excel spreadsheet to a CSV or a CSV to a JSON object.

The current release of Data Converters can convert between Excel spreadsheets, CSV data, and JSON tables, as well as some geodata formats (with additional requirements).

Its smart parser can guess the types of data, correctly recognizing dates, numbers, strings, and so on. It works as easily with URLs as with local files, and it is designed to handle very large files (bigger than memory) as easily as small ones.
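To give a feel for what type guessing involves, here is a simplified Python sketch. It is not Data Converters' actual implementation: the guess_type helper, the single date format it tries and the "Float" type name are assumptions for illustration.

```python
from datetime import datetime


def guess_type(value):
    """Guess a simple type name for one string cell."""
    # Try the strictest numeric type first, then fall back to looser ones.
    for caster, name in ((int, "Integer"), (float, "Float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    try:
        # Only ISO-style dates are recognized in this sketch.
        datetime.strptime(value, "%Y-%m-%d")
        return "DateTime"
    except ValueError:
        return "String"
```

A real parser would look at many values per column and try many more formats before settling on a type.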

Data Converters homepage

Converting data

Converting an Excel spreadsheet to a CSV or a JSON table with the Data Converters command line tool is easy. Data Converters is able to read XLS(X) and CSV files and to write CSV and JSON, and input files can be either local or remote.

dataconvert simple.xls out.csv
dataconvert out.csv out.json

# URLs also work
dataconvert https://github.com/okfn/dataconverters/raw/master/testdata/xls/simple.xls out.csv

Data Converters will try to guess the format of your input data, but you can also specify it manually.

dataconvert --format=xls input.spreadsheet out.csv

Instead of writing the converted output to a file, you can also send it to stdout (and then pipe it to other command-line utilities).

dataconvert simple.xls _.json  # JSON table to stdout
dataconvert simple.xls _.csv   # CSV to stdout

Converting data files can also be done within Python using the Data Converters library. The dataconvert convenience function shares the dataconvert command line utility’s file reading and writing functionality.

from dataconverters import dataconvert
dataconvert('simple.xls', 'out.csv')
dataconvert('out.csv', 'out.json')
dataconvert('input.spreadsheet', 'out.csv', format='xls')

Parsing data

Data Converters can do more than just convert data files. It can also parse tabular data into Python objects that capture the semantics of the source data.

Data Converters’ various parse functions each return an iterator over the records of the source data along with a metadata dictionary containing information about the data. The records returned by parse are not just (e.g.) split strings: they’re hash representations of the contents of the row, with column names and data types auto-detected.

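import dataconverters.xls as xls
# Open in binary mode, since XLS is a binary format
with open('simple.xls', 'rb') as f:
    # parse returns an iterator over row dicts plus a metadata dict
    records, metadata = xls.parse(f)
    print(metadata)
    print([r for r in records])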
=> {'fields': [{'type': 'DateTime', 'id': u'date'}, {'type': 'Integer', 'id': u'temperature'}, {'type': 'String', 'id': u'place'}]}
=> [{u'date': datetime.datetime(2011, 1, 1, 0, 0), u'place': u'Galway', u'temperature': 1.0}, {u'date': datetime.datetime(2011, 1, 2, 0, 0), u'place': u'Galway', u'temperature': -1.0}, {u'date': datetime.datetime(2011, 1, 3, 0, 0), u'place': u'Galway', u'temperature': 0.0}, {u'date': datetime.datetime(2011, 1, 1, 0, 0), u'place': u'Berkeley', u'temperature': 6.0}, {u'date': datetime.datetime(2011, 1, 2, 0, 0), u'place': u'Berkeley', u'temperature': 8.0}, {u'date': datetime.datetime(2011, 1, 3, 0, 0), u'place': u'Berkeley', u'temperature': 5.0}]

What’s next?

Excel spreadsheets and CSVs aren’t the only kinds of data that need converting.

Data Converters also supports geodata conversion, including converting between KML (the format for geographical data used in Google Maps and Google Earth), GeoJSON, and ESRI Shapefiles.

Data Converters’ ability to convert between tabular data may also grow, adding JSON support on the input side and XLS(X) support on the output side—as well as new conversions for XML, SQL dumps, and SPSS.

Visit the Data Converters home page to learn how to install Data Converters and its dependencies, and check out Data Converters on GitHub to see how you can contribute to the project.