Proposal in Brazil to Legalize Non-Commercial File-Sharing and Monetize P2P
September 2nd, 2010
Pedro Paranaguá points me to a proposal for monetizing P2P file-sharing in Brazil.
The proposal has been submitted as part of Brazil’s open public consultation to review its copyright law. As he summarizes it for non-Portuguese speakers like myself (though Google Translate did not do a bad job!):
Basically, non-commercial file sharing will be authorized – should the proposal be accepted and passed into law. Each broadband user will pay a R$3 (or US$1.71) fee together with her/his monthly Internet Service Provider (ISP) bill. The ISP will collect the fees and distribute it to a collecting society comprised of authors’ associations that will then distribute the collected fees to authors, composers, and so on in the proportion that the works are downloaded.
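As a rough illustration of the "in the proportion that the works are downloaded" step, here is a minimal pro-rata sketch. The function name, the subscriber count and the download figures are all made up for the example; only the R$3 monthly fee comes from the proposal, and the real scheme would obviously need to handle measurement and administration costs.

```python
def distribute_fees(total_fees, downloads):
    """Split total_fees among works in proportion to their download counts."""
    total_downloads = sum(downloads.values())
    if total_downloads == 0:
        # Nothing was downloaded, so nothing is paid out
        return {work: 0.0 for work in downloads}
    return {work: total_fees * n / total_downloads
            for work, n in downloads.items()}


MONTHLY_FEE = 3.0   # R$ per broadband user, from the proposal
SUBSCRIBERS = 100   # hypothetical number of broadband users

pool = MONTHLY_FEE * SUBSCRIBERS
# Hypothetical download counts for two works
shares = distribute_fees(pool, {"song_a": 2, "song_b": 1})
print(shares)
```

With twice as many downloads, the first work receives twice the payout of the second.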
Public Sector Transparency Board
June 28th, 2010
As announced on Friday on the UK Government’s data.gov.uk, I am one of the members of the UK Government’s newly formed Public Sector Transparency Board.
From the announcement:
The Public Sector Transparency Board, which was established by the Prime Minister, met yesterday for the first time.
The Board will drive forward the Government’s transparency agenda, making it a core part of all government business and ensuring that all Whitehall departments meet the new tight deadlines set for releasing key public datasets. In addition, it is responsible for setting open data standards across the whole public sector, listening to what the public wants and then driving through the opening up of the most needed data sets.
Chaired by Francis Maude, the Minister for the Cabinet Office, the other members of the Transparency Board are Sir Tim Berners-Lee, inventor of the World Wide Web, Professor Nigel Shadbolt from Southampton University, an expert on open data, Tom Steinberg, founder of mySociety, and Dr Rufus Pollock from Cambridge University, an economist who helped found the Open Knowledge Foundation.
In the words of Francis Maude:
“In just a few weeks this Government has published a whole range of data sets that have never been available to the public before. But we don’t want this to be about a few releases, we want transparency to become an absolutely core part of every bit of government business. That is why we have asked some of the country’s and the world’s greatest experts in this field to help us take this work forward quickly here in central government and across the whole of the public sector.”
The Size of the Public Domain (Without Term Extensions)
May 26th, 2010
We’ve looked at the size of the public domain extensively in earlier posts.
The basic takeaway from the analysis was that, based on library catalogue data, approximately 15-20% of books in the UK were in the public domain, with public domain work being pretty old (70 years plus, due to the life+70 nature of copyright).
An interesting question to ask then is: how large would the public domain be if copyright had not been extended from its original length of 14 years with (possible) 14 year renewal (14+14) set out in the Statute of Anne back in 1710? And how does this compare with the situation back when 14+14 was in “full swing”, say, in 1795?
Furthermore, what if copyright today were a simple 15 years, the point estimate for the optimal term of copyright found in my paper on this subject? Well, here’s the answer:
| | Today | 1795 (14+14) | Today (14+14) | Today (15y) |
|---|---|---|---|---|
| Total Items | 3.46m | 179k | 3.46m | 3.46m |
| No. Public Domain | 657k | 140k | 1.2m | 2.59m |
| %tage Public Domain | 19 | 78 | 52 | 75 |
Number and percentage of public domain works under various scenarios, based on Cambridge University Library catalogue data.
That’s right folks: based on the data available, if copyright had stayed at its Statute of Anne level, 52% of the books available today would be in the public domain compared to an actual level of 19%. That’s around 600,000 additional items that would be in the public domain including works like Virginia Woolf’s (d. 1941) The Waves, Salinger’s Catcher in the Rye (pub. 1951) and Márquez’s Chronicle of a Death Foretold (pub. 1981).
For comparison, in 1795 78% of all extant works were in the public domain. A figure which we’d be close to having if copyright was a simple 15 years (in that case the public domain would be a substantial 75%).
To put this in visual terms, what the public domain is missing out on as a result of copyright extension is the yellow region in the following figure: those are the works that would be public domain under 14+14 but aren’t under current copyright!
The Public Domain of books today (red), under 14+14 (yellow), and published output (black)
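To make the fixed-term scenarios concrete, here is a minimal sketch of the check for a single work. It is not the code used for the table: the function name is invented, and it assumes every work gets the full 28 years under 14+14 (i.e. the renewal is always taken) and that terms are counted in whole years from publication.

```python
def in_public_domain_fixed_term(pub_year, current_year, term_years):
    """True once more than term_years have passed since publication."""
    return current_year - pub_year > term_years


MAX_14_PLUS_14 = 28  # assumption: the 14 year renewal is always taken
SIMPLE_TERM = 15     # the simple 15 year scenario

examples = [
    ("Catcher in the Rye", 1951),
    ("Chronicle of a Death Foretold", 1981),
]
for title, pub_year in examples:
    status = in_public_domain_fixed_term(pub_year, 2010, MAX_14_PLUS_14)
    print(title, pub_year, status)
```

Both examples come out as public domain in 2010 under 14+14, whereas under the current life+70 rule neither is.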
Update: I’ve posted the main summary statistics file including per-year counts. I’ve also started a CKAN data package: eupd-data for this EUPD-related data.
I was recently asked to put together a short document outlining my main policy recommendations in the area of “innovation, creativity and IP”. Below is what I prepared.
General IP Policy
Recommendation: IP policy, and more generally innovation policy, should aim at the improvement of the overall welfare of UK society and citizens and not just at promoting innovation and creativity
Innovation is, of course, a major factor in the improvement of societal welfare, but it is not the only factor: access to the fruits of that innovation is also important.
IP rights are monopolies and such monopolies when over-extended do harm rather than good. The provision of IP rights must balance the promotion of innovation and creativity with the need for adequate access to the results of those efforts both by consumers and those who would seek to innovate and create by building upon them. A policy which aims purely at maximizing innovation, via the use of IP rights, will almost certainly be detrimental to societal welfare, since it will ignore the negative consequences of extending IP on access to innovation and knowledge. As such, IP policy is about having “enough, but not too much”.
This basic point is often overlooked. To help minimize the risk of this occurring in future it is suggested that this basic purpose — of promoting the welfare of UK citizens — be explicitly embedded within the goals of organisations and departments tasked with handling policies related to innovation and IP.
Recommendation: Move away from a focus on intellectual property to look at innovation and information policy more widely
IP rights are but one tool for promoting innovation and often a rather limited one. The focus should be on the general problem — promoting societal welfare through innovation and access to innovation — not on one particular solution to that problem.
Provision and Pricing of Public Sector Information
Background
Public sector information (PSI) is information held by a public sector organisation, for example a government department or, more generally, any entity which is majority owned and/or controlled by government. Classic examples of public sector information in most countries would include, among many others, geospatial data, meteorological information and official statistics.
While much of the data or information used in our society is supplied from outside the public sector, compared to other parts of the economy, the public sector plays an unusually prominent role. In many key areas, a public sector organization may be the only, or one among very few, sources of the particular information it provides (e.g. for geospatial and meteorological information). As such, the policies adopted regarding maintenance, access and re-use of PSI can have a very significant impact on the economy and society more widely.
Funding for public sector information can come from three basic sources: government, ‘updaters’ (those who update or register information) and ‘users’ (those who want to access and use it). Policy-makers control the funding model by setting charges to external groups (‘updaters’ or ‘users’) and committing to make up any shortfall (or receive any surplus) that results. Much of the debate focuses on whether ‘users’ should pay charges sufficient to cover most costs (average cost pricing) or whether they should be given marginal cost access, which equates to free access when the information is digital. However, this should not lead us to neglect the third source of funding via charges for ‘updates’.
Policy-makers must also concern themselves with the regulatory structure in which public sector information holders operate. The need to provide government funding can raise major commitment questions, while the fact that many public sector information holders are the sole source of the information they supply raises serious competition and efficiency issues.
Recommendation: Make digital, non-personal, upstream PSI available at marginal cost (zero)
The case for pricing public sector information to users at marginal cost (equal to zero for digital data) is very strong for a number of complementary reasons. First, the distortionary costs of average rather than marginal cost pricing are likely to be high. Second, the case for hard budget constraints to ensure efficient provision and induce innovative product development is weak. As such, digital upstream public sector information is best funded out of a combination of ‘updater’ fees and direct government contributions with users permitted free and open access. Appropriately managed and regulated, this model offers major societal benefits from increased provision and access to information-based services while imposing a very limited funding burden upon government.
Recommendation: Regulation should be transparent, independent and empowered. For every public sector information holder there should be a single, clear, source of regulatory authority and responsibility, and this ‘regulator’ should be largely independent of government.
This is essential if any pricing-policy is to work well and is especially important for marginal-cost pricing where the Government may be providing direct funding to the information holder. Policy-makers around the world have had substantial experience in recent years with designing these kinds of regulatory systems and this is, therefore, not an issue that should be especially difficult to address.
Copyright Term
Background
The optimal term of copyright has been a very live policy issue over the last decade. Recently, in the European Union, and especially in the UK, there has been much debate over whether to extend the term of copyright in sound recordings from its current 50 years.
The basic trade-off inherent in copyright is a simple one. On the one hand, increasing copyright yields benefits by stimulating the creation of new works but, on the other hand, it reduces access to existing works (the welfare ‘deadweight’ loss). Choosing the optimal term, that is the length of protection, presents these two countervailing forces particularly starkly. By extending the term of protection, the owners of copyrights receive revenue for a little longer. Anticipating this, creators of work which were nearly, but not quite, profitable under the existing term will now produce work, and this work will generate welfare for society both now and in the future. At the same time, the increase in term applies to all works including existing ones — those created under the term of copyright before extension. Extending term on these works prolongs the copyright monopoly and therefore reduces welfare by hindering access to, and reuse of, these works.
Recommendation: Reduce Copyright Term – And Certainly Do Not Extend It
Current copyright term is significantly over-extended. Calculations performed in the course of my own work indicate that optimal copyright term is likely around 15 years and almost certainly below 40 (the breadth of the estimates here is a direct reflection of the existing data limitations, but this upper bound is still far below existing terms).
Even a simple present-value calculation would indicate that the incentives for creativity today offered by extra term 50 years or more in the future are negligible — while the effect on access to knowledge can be very substantial, especially when term extensions are applied retrospectively (as they almost always are).
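The present-value point can be sketched in a few lines. This is an illustration only: the function name and the discount rates below are assumptions chosen for the example, not figures from any particular study.

```python
def present_value(amount, years_ahead, discount_rate):
    """Value today of `amount` received `years_ahead` years from now."""
    return amount / (1 + discount_rate) ** years_ahead


# Illustrative (assumed) annual discount rates
for rate in (0.03, 0.06, 0.10):
    pv = present_value(1.0, 50, rate)
    print(f"discount rate {rate:.0%}: 1.00 received in 50 years is worth {pv:.4f} today")
```

Even at a modest 6% rate, a unit of revenue arriving 50 years out is worth only around a twentieth of its face value today, and that is before allowing for the fact that sales of most works decline over time.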
It is also noteworthy that recent extensions, such as that for authorial copyright in the US (the CTEA) and the proposed extension of recording copyright in the EU, have been opposed well-nigh unanimously by academic economists and other IP scholars. Policy-making in this area should be evidence-based and designed to promote the broader welfare of society as a whole. Policies that appear to reflect nothing more than special-interest lobbying will only perpetuate the “marked lack of public legitimacy” which the Gowers report lamented, discouraging those who wish to contribute constructively to future Government policy-making in these areas, and making enforcement ever harder. Effective enforcement, after all, depends on consent born of respect as well as obedience coerced through punishment.
Talking at Cambridge University Library on Openness and Libraries
January 25th, 2010
This Wednesday (27th of January) at 1pm I’m giving one of Cambridge University Library’s regular lunch-time talks on Openness and Libraries. Attendance is free and anyone can come along!
Update (28th Jan): talk is done and slides are now up.
Blurb
Over the past few years, open licensing (http://www.opendefinition.org/) has facilitated the explosive growth of a ‘knowledge commons’. To give a few prominent examples: Open Access journals, Open Educational Resources and Open Data in scientific research have all been enabled by licenses which permit material to be freely re-used and re-distributed. This outpouring of support for openness has led to an incredible rise in community-led development and innovative uses.
Bibliographic records are a key part of our shared cultural heritage and essential to anyone working with cultural materials (books, music, films etc). Opening up those records for access and re-use offers a variety of benefits.
First, it would allow libraries to share records more efficiently and improve quality more rapidly through better, easier feedback. Second, easier access to catalogue data would spur development of the multifarious services, technologies and research that use that data, including, for example, search engines, book or music websites, researchers working on information production, journalists writing on orphan works, as well as many other areas we cannot even imagine in advance.
With a growing number of Government agencies and public institutions making data open, is it now time for the library community to do likewise?
Argentina Extends Copyright Term in Recordings
January 14th, 2010
Apparently, on the 11th of December 2009, Argentina extended copyright term in recordings from 50 to 70 years (see e.g. here, here and here).
Instead of the real reasons for extension — propping up the profits of a handful of multinational record labels and their shareholders (at the expense of everyone else) — the usual disingenuous justifications were once again being trotted out by music industry representatives.
First up was (all quotes from the Billboard article):
The investment argument
“I would like to thank all those who supported this new law which will benefit the music community in Argentina,” tango master Leopoldo Federico, president of AADI, said in a statement. “It will improve incentives to invest in future recordings and also helps older performers who had faced losing their rights just when they need them the most.”
…
John Kennedy, chairman and chief executive of IFPI, also welcomed the legislation. “I am delighted that Argentina has strengthened the rights of performers and producers by extending the term of protection,” he said in a statement. “Argentina has a strong musical heritage and this reform means that producers will have a greater incentive to invest in the next generation of local talent.“
But wait a moment: “producers” are already getting 50 years of monopoly protection. How much extra incentive are those 20 extra years going to provide?
Let’s do some simple calculations.
First off remember this is about incentives, which means it is about expected payoffs at the point of investment, i.e. when the recording is created. As such we should be dealing with “present value” figures, i.e. total revenue in “today’s terms”.
To work out the effect of an extension we then need a) an idea of what future sales look like relative to today (the cultural decay rate) and b) a way of putting future revenue in today’s terms (the discount rate). The industry’s own analysis (commissioned for the Gowers review in the UK) used a nominal discount rate of 12.3% (pre-tax) and cultural decay rates of 3-20% (in nominal terms it appears). Let’s be generous and take the lowest possible cultural decay rate of 3%. Combined with the 12.3% discount rate this means that, on average, revenue is dropping at a substantial 15.3% a year!
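For those who prefer a formula, the calculation can be written in closed form. This is a sketch that mirrors the script in the Colophon; the symbols are labels introduced here for convenience. Let $\delta$ be the cultural decay rate, $r$ the discount rate, $T$ the existing term and $E$ the extension, and weight revenue in year $t$ by $d^t$ with $d = 1/(1+\delta+r)$. The proportional increase in present-value revenue is then:

```latex
\frac{\Delta R}{R}
  = \frac{d^{T} \sum_{t=0}^{E} d^{t}}{\sum_{t=0}^{T} d^{t}}
  = \frac{d^{T}\left(1 - d^{E+1}\right)}{1 - d^{T+1}},
\qquad d = \frac{1}{1+\delta+r}
```

With $\delta = 0.03$, $r = 0.123$, $T = 50$ and $E = 20$ we have $d = 1/1.153$, so $d^{T}$ is already below 0.001 and the whole ratio comes out at roughly 0.0008.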
Running this through a bit of basic maths (and I mean really basic — code inline below) we find that the 20 year extension will deliver a tiny 0.08% increase in revenues. Even halving the nominal discount rate to a very low figure like 6% only pushes up the revenue gain to just over 1% (1.1%). For those who like things visually here’s a picture:

Aside: Of course there will be a lot of variation from the average — note that the relevant variation is not between hits and duds (as these may experience exactly the same decay!) but between records which go on selling at a reasonably steady rate and those which fade away fairly quickly. However, an “investor”, such as a record label, tends to “invest” in a whole “portfolio” of records precisely in order to reduce this “risky” variability (and in any case greater risk implies a higher discount rate assuming the investor is risk averse). As such the average revenue increase is precisely what an “investor” will use when making decisions such as how many recordings to fund.
Next up was:
The pension for performers argument
“I would like to thank all those who supported this new law which will benefit the music community in Argentina,” tango master Leopoldo Federico, president of AADI, said in a statement. “It will improve incentives to invest in future recordings and also helps older performers who had faced losing their rights just when they need them the most.“
But life expectancy in Argentina is 75 years — and is probably shorter for most performers who are old today. So, unless a performer is especially prolific in their teens, 50 years of copyright monopoly is already enough to cover them in their old(er) age.
And anyway, haven’t performers heard about pensions or saving for the future? Everyone else has. I don’t expect the plumber I pay today to fix my sink to come back in 50 years asking for additional payment for a pension plan! Instead I expect the plumber to save some of the income received today to use in retirement.
Moreover, as the calculations above should make clear, copyright income 50+ years in the future from recordings today is likely (on average) to be tiny (0.08% of the revenue received during the first 50 years!). As such there is no way the average performer could rely on income from a 20 year term extension 50 years in the future to support them in their old age. Just like everyone else they will need to save some of the income during that first 50 years.
Aside: in fact it is more like 10 years or even five years since, for most recordings, the vast majority of the revenue they will ever generate will come in the first 5 or 10 years after release.
Last up we had:
The cultural argument
Javier Delupí, CAPIF’s executive director, added: “This new law is good news for Argentine culture. It promotes the creation of new music and safeguards the rights of performers and producers both here and abroad.”
But:
- The investment argument is completely invalid (see above) and hence there won’t be any “promoting the creation of new music”.
- In fact, to the contrary, the extension will impede the creation of new works by reducing the public domain on which all creators can and do build.
- Moreover, an extension transfers money to (older and already successful) performers away from younger and less well-known ones.
- Depending on how comparison of terms is implemented, an extension actually harms the balance of payments of the enacting country (e.g. the UK loses out from a term extension in recordings).
So, no, term extensions aren’t good for (Argentine) culture — though they may be good for CAPIF (Representando a la Industria Argentina de la Música).
Conclusion
It’s time we start calling a spade a spade: this term extension is a simple, and highly inefficient, subsidy to the major record labels plus, perhaps, a few, already highly successful, performers, which is paid for by the general populace.
If it can command widespread assent in that form, then, fine, let it pass! But I sincerely doubt that it can. If so, then the passage of such bills is nothing more or less than a straightforward “robbery upon the public”, in the 150 year-old words of Henry Warburton, radical opponent of the UK’s term extension of the 1840s.
Colophon
Here’s the python script used for the revenue calculations above, together with the code to generate the figure.
```python
#!/usr/bin/env python
import math


def extra_revenue(term, extension, decay, irate):
    """Print the % increase in present-value revenue from a term extension."""
    # Combined per-year discount factor (cultural decay plus discount rate)
    dfactor = 1/(1+decay+irate)

    def geometric(df, NN):
        # Sum of the geometric series 1 + df + ... + df**NN
        return (1-df**(NN+1))/(1-df)

    # Present value of revenue under the existing term
    total = geometric(dfactor, term)
    # Present value of the revenue earned during the extension
    textension = dfactor**term * geometric(dfactor, extension)
    increase = textension/total
    print('Term, Extension, decay, irate: %s %s %s %s' % (term, extension,
        decay, irate))
    print('Percentage increase: %s' % (100*increase))


extra_revenue(50, 20, 0.03, 0.123)
extra_revenue(50, 20, 0.05, 0.123)
extra_revenue(50, 20, 0.03, 0.06)
extra_revenue(50, 20, 0.04, 0.06)


def visualize():
    import matplotlib.pyplot as pyplot
    # normalize main square to 10x10 = 100
    # (align='edge' keeps the square spanning x=0..10 on current matplotlib)
    pyplot.bar(0, 10, width=10, align='edge', fc='red', alpha=0.6)
    # small square with area 0.08, i.e. 0.08% of the main square
    edge = math.sqrt(0.08)
    pyplot.bar(14, edge, width=edge, bottom=5, align='center', fc='blue', alpha=0.6)
    # reference square with area 1, i.e. 1% of the main square
    pyplot.bar(14, 1, width=1, bottom=1, align='center', fc='blue', alpha=0.6)
    pyplot.figtext(0.15, 0.7, 'Present Value of Revenue\nUnder Existing\n50y Term',
        multialignment='center', va='top')
    pyplot.figtext(0.65, 0.7, 'PV of Extra Revenue\nfrom 20y Extension',
        multialignment='center', va='top')
    pyplot.figtext(0.7, 0.4, '1% of Existing\n Revenue',
        multialignment='center', va='top')
    # hack to get rid of axes ...
    ax = pyplot.gca()
    ax.set_frame_on(False)
    pyplot.yticks([],[])
    pyplot.xticks([],[])
    fig = pyplot.figure(1)
    fig.set_size_inches(5, 3)
    pyplot.savefig('revenue_impact.png')


visualize()
print('Saved image to disk')
```
Historical Banking Crises and the Rules of the Game
December 7th, 2009
Attended an interesting talk today: “Historical Banking Crises and the Rules of the Game” by Professor Charles Calomiris, Columbia Business School. Sporadic notes below. See also this Weaving History thread on Financial Crises.
Notes
- One crisis with 20 different explanations. Need to sort these out a little.
- If banks are uninsured then in a recession banks cut their supply of loans
- Banks are facing losses, need to bulk up their balance sheet and can do it either by raising equity or cutting supply of loans. Former is hard so do the latter.
- Crises aren’t just inherent to human nature or capitalism. “Crisis propensity reflects politically determined rules of the banking game that are conducive to crises:”
  1. industry setup that determines exposure of banks to risk
  2. absence of decent (effective and incentive compatible) central-banking (NB: 2 isn’t a big problem w/o 1)
  3. subsidization of risk by govt policies
- Panic = moments of severe sudden withdrawal that threatened the system. Observable variable: collective action by NY clearing banks
  - In US (19th and early 20th c.): 1857, 1873, 1877, 1893, 1907 [ed: missing at least 2 and may have got wrong I think]
  - All 6 US crises post Civil War were preceded by a 50% increase in liabilities and a 7% drop in the stock market
  - Britain: 1825, 1836, 1847, 1857, 1866 then none for over a century
- Solvency crisis: -ve net worth of failed bank > 1% of GDP
  - 140 examples since 1978
  - Rare in past: 4 in 1873-1913
    - Australia: 1893 (10%)
    - Argentina: 1890 (10%)
    - Norway: 1900 (3%)
    - Italy: 1893 (1%)
- Literature has converged in last 20 years to agree that safety-net provision on balance increases instability (rather than reducing it)
- Crucial reform in 1858 in UK following 1857 crisis. BoE would no longer intervene in bills market. In 1866 made good on this promise when largest bill discounter went bust (Overend and Gurney)
- Crisis origins:
  - Loose money: CBs, flat yield curve … (but note not enough for a crisis on own)
  - Housing subsidies delivered by leverage. F&F have $1.6 trillion out of $3 trillion total subprime. $350 billion cost on F&F alone.
  - Huge buy-side agency problems
    - Lots of buy-side people buying poor quality material for clients, facilitated by a big race-to-the-bottom at ratings agencies
  - Prudential regulation failure
- Everyone smart knew there was a subprime crisis in mid-2006.
- Long-term regulatory reforms
  - Micro-prudential reform: focus on measurement of risk
  - Credit rating agency reform
  - Resolution policy/TBTF Problems
Size of the Public Domain III
November 26th, 2009
Here we are going to apply the results on Public Domain “proportions” derived in our previous post and thereby obtain best estimates of the UK public domain.
The logic is simple, and similar to that in our first post in the series: we will take the Public Domain proportions from Table 3 of our last post and combine with our (conservative) estimates for output based on library catalogues. Here are the results:
| Pub. Date | Items | % PD | No. PD |
|---|---|---|---|
| 1400-1850 | 304587 | 100 | 304587 |
| 1850-1860 | 40970 | 100 | 40970 |
| 1860-1870 | 43734 | 100 | 43734 |
| 1870-1880 | 50564 | 95 | 48035 |
| 1880-1890 | 66857 | 90 | 60171 |
| 1890-1900 | 66883 | 85 | 56850 |
| 1900-1910 | 70360 | 65 | 45734 |
| 1910-1920 | 60489 | 40 | 24195 |
| 1920-1930 | 78670 | 25 | 19667 |
| 1930-1940 | 90576 | 10 | 9057 |
| 1940-1950 | 72692 | 6 | 4361 |
| 1950-1960 | 118251 | 0 | 0 |
| 1960-1970 | 262974 | 0 | 0 |
| 1970-2009 | 2130509 | 0 | 0 |
| Total | 3458116 | 19 | 657361 |
UK Public Domain Totals Based on Cambridge University Library Data. Note, as discussed in previous posts, figures from British Library are approximately 3x larger (both for Public Domain and total items).
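The arithmetic behind the table is simply items times public domain proportion, summed over periods. Here is a short sketch that reproduces the totals from the rows above; the variable and function names are my own, and rounding down is an assumption that happens to match the "No. PD" column.

```python
# (publication period, items, % in the public domain), copied from the table
ROWS = [
    ("1400-1850", 304587, 100),
    ("1850-1860", 40970, 100),
    ("1860-1870", 43734, 100),
    ("1870-1880", 50564, 95),
    ("1880-1890", 66857, 90),
    ("1890-1900", 66883, 85),
    ("1900-1910", 70360, 65),
    ("1910-1920", 60489, 40),
    ("1920-1930", 78670, 25),
    ("1930-1940", 90576, 10),
    ("1940-1950", 72692, 6),
    ("1950-1960", 118251, 0),
    ("1960-1970", 262974, 0),
    ("1970-2009", 2130509, 0),
]


def public_domain_count(items, pct_pd):
    """Estimated public domain items, rounded down (integer arithmetic)."""
    return items * pct_pd // 100


total_items = sum(items for _, items, _ in ROWS)
total_pd = sum(public_domain_count(items, pct) for _, items, pct in ROWS)
overall_pct = round(100 * total_pd / total_items)
print(total_items, total_pd, overall_pct)
```

The printed totals should agree with the final row of the table.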

Total (Black) and Public Domain (Red) Items per year based on the CUL Catalogue.
Zooming in to the pre-1960 period to get more detail:

Total (Black) and Public Domain (Red) Items per Year based on the CUL Catalogue for pre-1960 period.
Public Domain Calculators Workshop
November 6th, 2009
I’m one of the co-organizers of a workshop on Public Domain Calculators taking place next week, on the 10th and 11th of November, at Emmanuel College, University of Cambridge.
Hosted by the Open Knowledge Foundation in association with the Centre for Intellectual Property and Information Law at the University of Cambridge, it’s a meeting of European experts on copyright and the digital public domain taking place as part of the Communia project.
The purpose of the workshop is to produce materials such as legal flow charts and public domain “algorithms” which will help with the representation of different national copyright laws and the determination of public domain status.
Details of the meeting are as follows:
- When: 10-11th November 2009
- Where: Emmanuel College, Cambridge
- Wiki: http://wiki.okfn.org/PublicDomainCalculators/Meeting
- Participate: Free but space is limited. If you are interested in coming, email the organizers at: info@okfn.org
Background
There is often a tendency to talk of ‘the public domain’ and of works falling out of copyright and ‘into the public domain’ – as though there is a single set of works which are out of copyright all over the world. In fact, of course, there are different national laws about the nature and duration of copyright in different types of works – and hence what is in the public domain is different in different countries.
Efforts are currently underway to build a series of public domain calculators – which will help to determine whether or not a given work is in copyright in a given jurisdiction. At the time of writing groups and individuals in more than 17 jurisdictions are assisting in this effort.
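To give a flavour of what such a calculator does, here is a deliberately over-simplified sketch covering only a "life plus N years" rule, with protection running to the end of the calendar year. The function names are invented for illustration and this is not the project's code; a real calculator also has to deal with work type, publication status, anonymous works, joint authorship and so on, which is exactly what the flow charts are for.

```python
def first_public_domain_year(death_year, term_years=70):
    """First calendar year in which the work is out of copyright.

    Assumes protection lasts until the end of the calendar year that falls
    term_years after the author's death (a life+70 style rule).
    """
    return death_year + term_years + 1


def is_public_domain(death_year, current_year, term_years=70):
    """True if the work is out of copyright in current_year under this rule."""
    return current_year >= first_public_domain_year(death_year, term_years)


# An author who died in 1941, checked against 2009 under life+70
print(first_public_domain_year(1941))
print(is_public_domain(1941, 2009))
```

Changing `term_years` is the crudest possible way of representing a different jurisdiction, which is why the same work can be public domain in one country and still in copyright in another.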
Author “Significance” From Catalogue Data
November 5th, 2009
Continuing the series of posts on analyzing catalogue data, here are some stats on author “significance” as measured by the number of book entries (‘items’) for that author in the Cambridge University Library catalogue from 1400-1960 (there being 1m+ such entries).
I’ve termed this measure “significance” (with intentional quotes) as it co-mingles a variety of factors:
- Prolificness — how many distinct works an author produced (since usually each work will get an item)
- Popularity — this influences how many times the same work gets reissued as a new ‘item’ and the library decision to keep the item
- Merit — as for popularity
The following table shows the top 50 authors by “significance”. Some of the authors aren’t real people but entities such as “Great Britain. Parliament” and for our purposes can be ignored. What’s most striking to me is how closely the listing correlates with the standard literary canon. Other features of note:
- Shakespeare is number 1 (2)
- Classics (Latin/Greek) authors are well-represented with Cicero at number 2 (4), Horace at 5 (9) followed by Homer, Euripides, Ovid, Plato, Aeschylus, Xenophon, Sophocles, Aristophanes and Euclid.
- Surprise entries (from a contemporary perspective): Hannah More, Oliver Goldsmith, Gilbert Burnet (perhaps accounted for by his prolificity).
- Also surprising are the limited entries from the 19th century UK, with only Scott (26), Dickens (28) and Byron (41).
| Rank | No. of Items | Name |
|---|---|---|
| 1 | 3112 | Great Britain. Parliament. |
| 2 | 1154 | Shakespeare, William |
| 3 | 1076 | Church of England. |
| 4 | 973 | Cicero, Marcus Tullius |
| 5 | 825 | Great Britain. |
| 6 | 766 | Catholic Church. |
| 7 | 721 | Erasmus, Desiderius |
| 8 | 654 | Defoe, Daniel |
| 9 | 620 | Horace |
| 10 | 599 | Aristotle |
| 11 | 547 | Voltaire |
| 12 | 539 | Virgil |
| 13 | 527 | Swift, Jonathan |
| 14 | 520 | Goethe, Johann Wolfgang Von |
| 15 | 486 | Rousseau, Jean-Jacques |
| 16 | 479 | Homer |
| 17 | 444 | Milton, John |
| 18 | 388 | Sterne, Laurence |
| 19 | 387 | England and Wales. Sovereign (1660-1685 : Charles II) |
| 20 | 386 | Euripides |
| 21 | 372 | Ovid |
| 22 | 358 | Goldsmith, Oliver |
| 23 | 358 | Plato |
| 24 | 351 | Wang |
| 25 | 349 | Alighieri, Dante |
| 26 | 338 | Scott, Walter (Sir) |
| 27 | 326 | More, Hannah |
| 28 | 322 | Dickens, Charles |
| 29 | 315 | Aeschylus |
| 30 | 304 | Burnet, Gilbert |
| 31 | 302 | Luther, Martin |
| 32 | 295 | Dryden, John |
| 33 | 290 | Xenophon |
| 34 | 280 | Sophocles |
| 35 | 262 | Pope, Alexander |
| 36 | 259 | Fielding, Henry |
| 37 | 258 | Li |
| 38 | 250 | Calvin, Jean |
| 39 | 248 | Zhang |
| 40 | 247 | Aristophanes |
| 41 | 247 | Byron, George Gordon Byron (Baron) |
| 42 | 247 | Bacon, Francis |
| 43 | 247 | Chen |
| 44 | 245 | Terence |
| 45 | 241 | Euclid |
| 46 | 235 | Augustine (Saint, Bishop of Hippo.) |
| 47 | 232 | Burke, Edmund |
| 48 | 223 | Johnson, Samuel |
| 49 | 222 | Bunyan, John |
| 50 | 222 | De la Mare, Walter |
Top 50 authors based on CUL Catalogue 1400-1960
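The counting behind a ranking like this is straightforward. The sketch below is not the repository code, just an illustration of the idea: the record format (author, title pairs) and the sample entries are hypothetical.

```python
from collections import Counter


def items_per_author(records):
    """Count catalogue items per author.

    `records` is an iterable of (author, title) pairs, one pair per item.
    """
    return Counter(author for author, _title in records)


# Hypothetical sample records, for illustration only
sample = [
    ("Shakespeare, William", "Hamlet"),
    ("Shakespeare, William", "Hamlet"),  # a reissue counts as a separate item
    ("Defoe, Daniel", "Robinson Crusoe"),
]

counts = items_per_author(sample)
for rank, (author, n_items) in enumerate(counts.most_common(), start=1):
    print(rank, n_items, author)
```

Because every reissue is its own item, the count mixes prolificness with popularity, which is why "significance" stays in quotes.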
The other thing we could look at is the overall distribution of titles per author (and how it varies with rank — a classic “is it a power law” question). Below are the histogram (NB log scale for counts) together with a plot of rank against count (which equates, v. crudely, to a transposed plot of the tail of the histogram …). In both cases it looks (!) like a power-law is a reasonable fit given the (approximate) linearity but this should be backed up with a proper K-S test.
Histogram of items-per-author distribution (log-log)
Rank versus no. of items (log-log)
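A quick way to put a number on the "approximate linearity" is to fit a straight line to log(count) against log(rank). The sketch below does that with plain least squares on the first ten counts from the table above. It is an eyeball check only and no substitute for the K-S test; the function name is my own.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    `counts` must be sorted in descending order, so that rank 1 comes first.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Item counts for ranks 1-10, taken from the top 50 table
top10 = [3112, 1154, 1076, 973, 825, 766, 721, 654, 620, 599]
print(loglog_slope(top10))
```

For an exact power law, count proportional to rank to the power -a, the slope would be exactly -a. A good-looking straight line here is still weak evidence, since many heavy-tailed distributions look linear on a log-log plot.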
TODO
- K-S tests
- Extend data to present day
- Check against other catalogue data
- Look at occurrence of people in title names
- Look at when items appear over time
Colophon
Code to generate the table and graphs is in the open Public Domain Works repository, specifically the method ‘person_work_and_item_counts’ in this file: http://knowledgeforge.net/pdw/hg/file/tip/contrib/stats.py


