A phrase I’ve been using in talks recently:
Data is a platform not a commodity: you build on it rather than sell it. And that’s why it should be open.
Why once this nothingness where once a city
Who will answer? Only the wind.
From Laslett ‘Philippe Ariès and “La Famille”’ p.83 (quoted in Eisenstein, p.131):
The actual reality, the tangible quality of community life in earlier towns or villages … is puzzling … and only too susceptible to sentimentalisation. People seem to want to believe that there was a time when every one belonged to an active, supportive local society, providing a palpable framework for everyday life. But we find that the phenomenon itself and its passing — if that is what, in fact happened– perpetually elude our grasp.
I was looking again recently at “Understanding the Knowledge Commons” which I had perused previously.
While reading the introductory chapter by Hess and Ostrom I came across:
People started to notice behaviors and conditions on the web – congestion, free riding, conflict, overuse, and “pollution” – that had long been identified with other types of commons. They began to notice that this new conduit of distributing information was neither a private nor strictly a public resource.
I think they are absolutely right to consider the analogies of “knowledge commons” with traditional commons. However, and at the same time, I think it essential to emphasize that “knowledge commons” are also fundamentally different.
The key difference here is in the nature of the underlying good that makes up the commons: in traditional cases the good is some physical resource — seas, rivers, land — to which usage is shared (either de facto or de jure), while in the knowledge case, well, it’s knowledge!
Now physical resources are by their nature ‘rival’ (or ‘subtractable’ as the authors put it), that is your usage and my usage are substitutes — your usage reduces the amount available for me to use and, when we are close to capacity, is strictly rival — either I use it or you use it. Knowledge, however, is a classic example of a non-rival resource: when you learn something from me I’ve lost nothing but you’ve gained something.
This means, for example, that the classic ‘tragedy’ of the commons, where overuse leads to destruction of the resource, is simply not possible for a knowledge commons. In fact, knowledge is like some magical food from a fairytale where the more it’s used the more of it there is!
The more useful ‘commons’ analogy for knowledge is not in relation to use but to production and the ‘free-rider’ problems that can arise where something must be done by a team or community. The issue here is that a separation appears between your effort (private) and the resulting outcome (shared), which may lead to an under-supply of effort and ‘free-riding’ on the efforts of others (if there are ten people on guard duty late at night, one can probably take a nap without endangering the city, but if all ten of them do it then it could be disastrous).
1. Before any misunderstanding arises I should make clear that the authors also acknowledge the role of the rival/non-rival distinction (Ostrom, in fact, was one of the ‘coiners’ of the term rivalry). However, the article’s overall focus is on the analogies with the traditional commons.
2. Jamie Boyle has talked about the “second enclosure movement”. Though the analogy is interesting, I think the reference to the original enclosure movement is unfortunate for two reasons. First, it reinforces the mistaken analogy between knowledge and physical goods. Second, the evidence that the original enclosure movement was bad isn’t very compelling (in fact, it probably delivered net benefits).
In doing research for the EU Public Domain project (as here and here) we are often handling large datasets, for example one national library’s list of pre-1960 books stretched to over 4 million items. In such a situation, an algorithm’s speed (and space) can really matter. To illustrate, consider our ‘loading’ algorithm — i.e. the algorithm to load MARC records into the DB, which had the following steps:
The first part of this worked great: on a 1 million record load we averaged between 8s and 25s (depending on hardware, DB backend etc) per thousand records with speed fairly constant throughout (so that’s between 2.5 and 7.5h to load the whole lot). Unfortunately, at the consolidate stage we ran into problems: for a 1 million item DB there were several hundred thousand consolidations and we were averaging only 900s per 1000 consolidations! (This also scaled significantly with DB size: a 35k record DB averaged 55s per 1000.) This would mean a full run would require several days! Even worse, because of the form of the algorithm (all the consolidations for a given person were done as a batch) we ran into memory issues on big datasets with some machines.
To address this we switched to performing “consolidation” on load, i.e. when creating each Item for a catalogue entry we’d search for existing authors who matched the information we had on that record. Unfortunately this had a huge impact on the load: time grew superlinearly and had already reached 300s per 1000 records at the 100k mark having started at 40 — Figure 1 plots this relationship. By extrapolation, 1M records would take 100 hours plus — almost a week!
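The back-of-envelope figures in the last two paragraphs can be checked with a few lines of arithmetic. This is only a rough sketch: the figure of 300,000 consolidations is an assumed stand-in for “several hundred thousand”, and the two growth models for the on-load approach (cost per 1000 records rising linearly throughout, or freezing at the 100k level) are illustrative assumptions, not fits to the actual data.

```python
def hours(seconds: float) -> float:
    """Convert seconds to hours."""
    return seconds / 3600.0


def batch_time_s(n_items: int, secs_per_thousand: float) -> float:
    """Total seconds when every 1000 items cost the same amount of time."""
    return n_items / 1000 * secs_per_thousand


def load_time_linear_rate_s(n_records, start_rate, rate_at_mark, mark):
    """Total seconds if the cost per 1000 records grows linearly with DB size.

    The rate is start_rate at 0 records and rate_at_mark at `mark` records;
    the total is the integral of rate(n) / 1000 from 0 to n_records.
    """
    slope = (rate_at_mark - start_rate) / mark
    return (start_rate * n_records + slope * n_records ** 2 / 2) / 1000


def load_time_flat_after_mark_s(n_records, start_rate, rate_at_mark, mark):
    """Optimistic variant: the rate stops growing once `mark` is reached."""
    first_part = load_time_linear_rate_s(mark, start_rate, rate_at_mark, mark)
    remainder = batch_time_s(n_records - mark, rate_at_mark)
    return first_part + remainder


# Batch consolidation: ASSUMED 300k consolidations at 900s per 1000.
consolidation_hours = hours(batch_time_s(300_000, 900))

# Consolidate-on-load: 40s per 1000 at the start, 300s per 1000 at 100k records.
optimistic_hours = hours(load_time_flat_after_mark_s(1_000_000, 40, 300, 100_000))
pessimistic_hours = hours(load_time_linear_rate_s(1_000_000, 40, 300, 100_000))

print(f"batch consolidation: {consolidation_hours:.0f}h")
print(f"load, rate frozen at 100k level: {optimistic_hours:.0f}h")
print(f"load, rate keeps growing linearly: {pessimistic_hours:.0f}h")
```

Under these assumptions the batch consolidation comes out at roughly three days, and the “100 hours plus” estimate for loading sits between the two crude bounds.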
At this point we went back to the original approach and tried optimizing the consolidation, first by switching to pure SQL and then by adding some indexes on join tables (I’d always thought that foreign keys were auto-indexed but it turned out not to be the case!). The first of these changes solved the memory issues, while the second resolved the speed problems, providing a speedup of more than 30x (30s per 1000 rather than 900s) and reducing the processing time from several days to a few hours.
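To illustrate the foreign-key point, here is a minimal sketch using SQLite through Python’s standard `sqlite3` module. SQLite is only a stand-in (the backend we actually used may behave differently), and the table, column and index names are hypothetical. It shows that declaring a foreign key on a join table does not by itself create an index on that column, and that the query plan changes once one is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical join table: the foreign keys are declared but not indexed.
    CREATE TABLE item_person (
        item_id INTEGER REFERENCES item(id),
        person_id INTEGER REFERENCES person(id)
    );
    """
)

QUERY = "SELECT item_id FROM item_person WHERE person_id = ?"


def query_plan(connection, sql, params=()):
    """Return SQLite's query plan as a single string of detail messages."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (1,))

# Explicitly index the foreign key column used for lookups.
conn.execute(
    "CREATE INDEX idx_item_person_person_id ON item_person (person_id)"
)
plan_after = query_plan(conn, QUERY, (1,))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index is created the plan is a scan of the whole join table; afterwards it is a search that names the new index. Checking the plan like this is a cheap way to confirm that a lookup is actually indexed before committing to a long run.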
Many more examples of this kind of issue could be provided. However, this one already serves to illustrate the two main points:
Both of these have a significant impact on the speed, and form, of the development process. First, because one has to spend time optimizing and profiling — which like all experimentation is time-consuming. Second because longer run-times directly impact the rate at which results are obtained and development can proceed — often bugs or improvements only become obvious once one has run on a large dataset, plus any change to an algorithm that alters output requires that it be rerun.
Last Thursday I attended a talk by Frederick Scherer at the [Judge] entitled: “Deregulatory Roots of the Current Financial Crisis”. Below are some sketchy notes.
It goes bad:
What is to be done:
This was an excellent presentation though, as was intended, it was more a summary of existing material than a presentation of anything “new”.
Not sure I was convinced by the “remember history” logic. It is always easy to be wise after the event and say “Oh look how similar this all was to 1929”. However, not only is this unconvincing analytically (it is really hard to fit trends in advance with any precision, since every business cycle is different), but before the event there are always plenty of people (and lobbyists) arguing that everything is fine and we shouldn’t interfere. Summary: Awareness of history is all very well but it does not provide anything like the precision to support pre-emptive action. As such it is not really clear what “awareness of history” buys us.
More convincing to me (and one could argue this still has some “awareness of history” in it) are actions like the following:
Worry about incentives in general and the principal-agent problem in particular. Try to ensure long-termism and prevent overly short-term and high-powered contracts (which essentially end up looking like a call option).
Since incentives can be hard to regulate directly one may need to work via legislation that affects the general structure of the industry (e.g. Glass-Steagall).
Summary: banking should be a reasonably dull profession with skill-adjusted wage rates similar to other sectors of the economy. If things get too exciting it is an indicator that incentives are out of line and things are likely to go wrong (quite apart from the inefficiency of having all those smart people pricing derivatives rather than doing something else!)
Be cautious regarding financial innovation especially where new products are complex. New products have little “track record” on which to base assessments of their benefits and risks and complexity makes this worse.
In particular, complexity worsens the principal-agent problem for “regulators” both within and outside firms (how can I decide what bonus you deserve if I don’t understand the riskiness and payoff structure of the products you’ve sold?). Valuation of many financial products such as derivatives depends heavily, and subtly, on assumptions regarding the distribution of returns of underlying assets (stocks, bonds etc).
If it is not clear what innovation, and complexity, are buying us we should steer clear, or at least be very cautious. As Scherer pointed out (in response to a question), there is little evidence that the explosion in variety and complexity of financial products since the 80s has actually done anything to make finance more efficient, e.g. by reducing the cost of capital to firms. Of course, it is very difficult to assess the benefits of innovation in any industry, let alone finance, but the basic point that the 1940s through 1970s (dull banking) saw as much “growth” in the real economy as the 1980s-2000s (exciting banking) should make us think twice about how much complexity and innovation we need in financial products.
Finally, and on a more theoretical note, I’d also like to have seen more discussion about exactly why standard backward recursion/rational market logic fails here and what implications the answers have for markets and their regulation. In particular, one would like to know why knowledge of a bubble’s existence in period T doesn’t lead to its unwinding (and hence by backward recursion to its unwinding in period T-1, and then T-2 etc until the bubble never existed). There are various answers to this in the literature based on things like herding, presence of noise investors, uncertainty about termination, but it would be good to have a summary, especially as regards welfare implications (are bubbles good?), and what policy interventions different theories prescribe.
I’m posting up an essay on “Discounting and Self-Control” (pdf). The essay, which I haven’t really touched for over a year, is still in its early stages but having lacked the time to do much on it over the last year, and going on the motto of “release early, release often”, I’m posting it up as a form of alpha version.
… then must you speak
Of one that loved not wisely, but too well;
Of one not easily jealous, but, being wrought,
Perplex’d in the extreme; of one whose hand,
Like the base Judean, threw a pearl away
Richer than all his tribe; …
An agent’s intertemporal choices depend on a variety of factors, most prominently, their valuation of future payoffs as encapsulated in a discount function. However, it is also clear that factors such as self-control may also play an important role, and given the similarity of impact, a confounding one. We explore the literature on this issue as well as examining what occurs when those with higher time-preference (whether arising from discounting or self-control) also enjoy their consumption more.
The exercise of will, especially in the form of self-control, has long been recognized as central to human existence, experience, and morality. Over the last few decades there has been increasing interest in the issue from a scientific perspective. At the same time, it has also long been appreciated that humans (and other animals) make trade-offs between the present and the future — as well as between different points in the future, and that events taking place closer to the present are given greater weight than those which are more distant. Traditionally, at least in economics, this type of behaviour has been subsumed under the heading of discounting.
Both of these factors, self-control and discounting, affect behaviour, and choices, in relation to outcomes which do not (all) take place in the present. However they are distinct. Specifically, consider a very simple case of two outcomes A and B where B occurs after A (for example, A might be one ice cream today and B an ice cream and a doughnut tomorrow). Self-control issues arise where one prefers B over A but is unable to execute on this preference and therefore actually takes (‘chooses’) A. By contrast, in the discounting case A is actually preferred over B and therefore is chosen (freely) by the decision maker.
It would seem important to keep these two aspects of decision making clearly separated. While lack of ‘self-control’ is usually seen as disadvantageous and a reason for adopting various ‘commitment strategies’ — for example, by opting to remove various items from the choice set (having no cigarettes in the house) — the simple preference for the present over the future incorporated in the discounting model would seem to generate no such difficulties.
However, empirically it may prove rather difficult to do so. As shown by the simple example above the same observed ‘choice’ for A (one ice cream today) over B (ice cream plus doughnut tomorrow) can be the result of two very different processes. Thus if we only observe choices, and not the underlying preferences and/or the process by which the choice is arrived at, it may be impossible to distinguish the two.
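A toy sketch may make the identification problem concrete. Everything in it is an illustrative assumption rather than a model from the essay: the payoff numbers, the single-period discount factor, and the crude all-or-nothing treatment of self-control.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    discount_factor: float   # weight placed on tomorrow's payoff (0 to 1)
    has_self_control: bool   # can the agent act on a preference for waiting?


def preferred_option(agent: Agent, payoff_a: float, payoff_b: float) -> str:
    """What the agent actually prefers: A today or the discounted B tomorrow."""
    return "B" if agent.discount_factor * payoff_b > payoff_a else "A"


def observed_choice(agent: Agent, payoff_a: float, payoff_b: float) -> str:
    """What an outside observer sees the agent take."""
    preference = preferred_option(agent, payoff_a, payoff_b)
    if preference == "B" and not agent.has_self_control:
        # Self-control failure: prefers B but ends up taking A.
        return "A"
    return preference


# ASSUMED payoffs: A = one ice cream today, B = ice cream plus doughnut tomorrow.
PAYOFF_A, PAYOFF_B = 1.0, 1.5

heavy_discounter = Agent(discount_factor=0.5, has_self_control=True)
weak_willed = Agent(discount_factor=0.9, has_self_control=False)

for label, agent in [("heavy discounter", heavy_discounter),
                     ("weak-willed", weak_willed)]:
    print(label,
          "prefers", preferred_option(agent, PAYOFF_A, PAYOFF_B),
          "chooses", observed_choice(agent, PAYOFF_A, PAYOFF_B))
```

Both agents are observed choosing A, yet only the first actually prefers it. Data on choices alone cannot tell them apart, which is exactly the difficulty described above.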
It is perhaps for this reason that these distinct aspects are sometimes conflated. Consider, for example, Mischel et al 1989 which is entitled “Delay of Gratification in Children” and summarizes much of Mischel’s pioneering work in this area. Mischel’s approach is clearly more oriented along the self-control aspect, and this is borne out in the types of experiments conducted (more on this below). Nevertheless they state (p.934) “The obtained concurrent associations [between treatments and delay] are extensive, indicating that such preferences reflect a meaningful dimension of individual differences, and point to some of the many determinants and correlates of decisions to delay (18).” Here the orientation towards self-control has become a general “decision to delay” and this is borne out by the associated footnote (18) which references related literature in other disciplines and is worth quoting in its entirety:
Lots of people have been up in arms about a letter sent out by Ordnance Survey about the “Use of Google Maps for display and promotion purposes”. With titles like “Are the Show Us A Better Way winners safe from Ordnance Survey?” (Guardian), “Home Secretary’s crime maps not allowed say Ordnance Survey” (localgov.co.uk) or “The mapping mess – Google v OS” (bbc.co.uk) these seemed to indicate some particularly unreasonable behaviour by OS.
However, after actually reading the original OS letter I’m far from convinced. In essence OS say:
Much of the discussion centred on the last of these items: what is derived data? OS state:
Simply put, Ordnance Survey derived data is any data created using Ordnance Survey base data. For example, if you capture a polygon or a point or any other feature using any Ordnance Survey data, either in its data form or as a background context to the polygon/point/other feature capture, this would constitute derived data.
It should also be borne in mind that data from other suppliers may be based on Ordnance Survey material, and thus the above considerations may still apply. We therefore recommend that you verify whether any third-party mapping you use may have been created in some way from Ordnance Survey data before displaying it on Google Maps.
NOTE: Again, the answer to this question is based on our understanding of which of Google’s standard terms and conditions we believe would apply. In the event that Google is prepared to offer you terms and conditions which do not involve you purporting to grant Google a licence of Ordnance Survey base or derived data, we would have no objection to your hosting such data on top of Google Maps in this scenario.
My understanding of this is that if you extract the geodata from an OS map (i.e. polygon, points, features) by some extraction method (such as tracing) then that’s derived data and OS can control what you do. This is pretty standard: if I copy text from a book by typing it out longhand I’m still infringing copyright.
However, this does not mean that if I’m using OS maps as a base-layer and, for example, generate a lat-long by clicking at some particular point (say to indicate where I live, or where a crime happened), then that lat-long is ‘derived’ data.
Now, of course, this could be a fine line: if I happened to click on a bunch of points, say to indicate a walk I went on, and these also showed the route of a road there could be debate as to whether I’m infringing the OS rights in the feature or not.
Nevertheless, the basic principle (as I understand it) is clear: geodata created when using OS tools and maps is always yours unless it is directly replicating the underlying OS data. If this interpretation is correct then this whole debate is a bit of a storm in a teacup and projects such as crime-mapping or providing a loofinder aren’t at any risk from OS’s licensing terms.
It would be interesting to chart over time the progress of open-source, standards compliant, Mozilla-type web browsers (e.g. Firefox) versus Microsoft’s Internet Explorer. As is often the case in other areas, it is not easy to get good (open) data over a reasonable time period. The graph below shows browser market share as measured by the browser usage of visitors to the W3Schools website (data source on Open Economics plus the code to extract original data into this usable form).
Given the source, and therefore the bias towards more technically savvy users, these figures probably overstate Firefox’s market share somewhat, though the overall trend is probably largely correct. What we see is a steady and continuing increase in Mozilla (Firefox) market share ever since Firefox’s launch in Autumn 2004 and a concomitant decline in market share of IE (the little dip for Firefox at the end appears to be directly attributable to the launch of Google’s Chrome). What is particularly interesting is that, at least for W3Schools users, we are almost at the point where there are as many people using Firefox as IE. This is significant for several reasons.
First, because of its level of usage it will no longer be possible for websites to only ‘work in IE’; instead they will always have to work in Firefox as well. This is good both for Firefox and for standards-compliant browsers more generally (while, of course, Firefox itself is not perfectly standards compliant it has traditionally been much better than IE).
Second, it is an (unusual) example of a case where dominance has not been maintained. Generally a firm with established dominance in a given area is able to maintain it: witness the robustness of Microsoft’s established dominance in other areas. By contrast, in this market, as the graph shows, Firefox has almost drawn level with IE and may soon surpass it if the trend of the last few years continues.
The human problem of ‘scarce resources and unlimited wants’ is oft-posited as a primary motivation for studying economics. As this phrase makes clear, ‘wants’ (‘preferences’ to use the more usual terminology) are a central part of what we study, and the existence, and stability, of those ‘wants/preferences’ therefore merit serious consideration.[^1]
[^1]: It is interesting how the term ‘preference’ is studiously neutral, and almost anodyne in comparison with a term such as ‘want’, ‘desire’ or even ‘need’, each of which is a potential synonym. One might imagine, and this is simply conjecture, that the term was intentionally adopted in order to remove any overtone of judgement. After all, across most cultures and over much of human history, the formation and satisfaction of ‘preferences’ has been a process laden with ethical, and religious, significance.
Few of us have difficulty accepting the fundamental nature of our desire for food and shelter. However, many of us might have greater difficulties assigning the same fundamentality to the desire for a particular brand of designer perfume or a digital music player. In fact, it is unclear to what extent one can want what one has never known (or conceived of), and thus, while it is not difficult to imagine any human desiring food and shelter — especially when they are absent, it is hard to imagine a stone-age nomad, say, even being able to conceive of designer perfume or iPods (let alone feel their lack).
It is also telling that so many of the consumer goods, especially those away from the necessity end of the spectrum, appear to require active promotion to the public. Of course it is true, as economists are particularly fond of pointing out, that advertising has an informational component — simply letting you know about the existence and attributes of products. However, it is also hard to deny that advertising also has a substantial ‘persuasive’ component, operating either to create preferences or alter existing ones.
If so this has important implications. In particular, it strongly suggests that our wants aren’t simply given but are, at least to some extent, formed by our experience and choices.[^2] This raises some deep and important questions for economists to answer, questions with a major bearing on the state and direction of many modern societies. It also has some direct connections with one of the oldest, and most philosophical, of the world’s religious traditions: Buddhism. Central to Buddhist teaching are the Four Noble Truths. Succinctly put these are, in order:[^3]
[^2]: In economics jargon: preferences are endogenous (i.e. determined within the system) rather than exogenous (fixed externally, e.g. by ‘nature’). The study of endogenous preferences is certainly not new. See for example the review of Bowles (1998) or the early incorporation of changeable preferences into the ‘traditional’ framework by Becker and Stigler (1977).
[^3]: These translations of the Dhammacakkappavattana Sutta are taken from http://www.accesstoinsight.org/tipitaka/sn/sn56/sn56.011.piya.html
Why is this teaching relevant here? First, observe a commonality: both economics and Buddhism take unsatisfied ‘wants’ (or ‘cravings’) as a source of unhappiness. But how do we go about solving this problem? Here economics and Buddhism part ways, and rather dramatically, with the Four Noble Truths presenting a path to the achievement of well-being which is almost diametrically opposite to that advocated by economics.
Specifically, the ‘economics’ approach is based on taking preferences as given and focusing on generating the goods to satisfy them. By contrast, Buddhism sees ‘wants’ as ultimately unsatisfiable, and instead proposes that the way to well-being is not to satisfy them but to relinquish them: while some ‘cravings’ can be temporarily satisfied more will always be generated, and moreover some fundamental desires, such as the wish not to die, cannot be addressed in the material world.
Put starkly: economic thought directs our energies to satisfying our wants, taking them as given, while Buddhism directs those self-same energies to altering our wants, and views most attempts to satisfy them by obtaining ever more ‘things’ as inevitably doomed to failure (in fact, as actively counter-productive, since more ‘wants’ are generated by the very process of satisfaction).