The Size of the Public Domain
This post continues the work begun in this earlier post on “Estimating Information Production and the Size of the Public Domain”. Update (2009-07-17): there is now a follow-up post.
Having already obtained estimates of the number of items (publications) produced each year based on library catalogue data, our next step is to convert this into an estimate of the “size” of the public domain. (NB: as already discussed, “size” could mean several different things. Here, at least to start with, we’re going to take the simplest and crudest approach and equate size with the number of publications/items.)
The natural, and most obvious, approach here is to go through our 1 million+ items and compute their public domain status (as discussed in this earlier post). Unfortunately, as detailed there, this is problematic because we often have insufficient information in library catalogues with which to compute PD status with certainty — in particular, author death dates are frequently absent. Thus, it will be necessary to fall back on some approximate method.
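To make the difficulty concrete, here is a minimal sketch of a per-item status check under the life plus 70 rule. The function name `pd_status` and its arguments are hypothetical, not taken from the earlier post, and the sketch assumes that the term runs to the end of the calendar year 70 years after the author's death. It returns `None` when no death date is recorded, which is the case that forces the approximate method.

```python
from typing import Optional

# Life plus 70 years, as stated in the post for the EU.
COPYRIGHT_TERM_YEARS = 70


def pd_status(death_year: Optional[int], current_year: int) -> Optional[bool]:
    """Return True/False when PD status can be decided, None when it cannot.

    Hypothetical helper: assumes the term expires at the end of the
    calendar year that falls 70 years after the author's death.
    """
    if death_year is None:
        # Death date absent from the catalogue record: status unknown.
        return None
    return current_year > death_year + COPYRIGHT_TERM_YEARS


# Example: an author who died in 1900 is PD in 2009, one who died in
# 1950 is not, and a record without a death date cannot be classified.
statuses = [pd_status(1900, 2009), pd_status(1950, 2009), pd_status(None, 2009)]
```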
For example, we can base PD status on simple publication dates: if a book was published, say, 140 years ago, it is very likely to be in the public domain. For it to be in copyright, its author must have lived at least 70 years after the book came out (remember that copyright lasts for life plus 70 years in the EU). Conversely, any publication less than 70 years old is almost certainly not in the public domain. For periods in between we can assume that some proportion of publications are PD, starting close to zero for more recent items and rising towards one for older ones. A calculation along those lines is provided in the following table:
| Start | End | Items | % PD | Number PD |
|---|---|---|---|---|
| 1400 | 1870 | 389291 | 100 | 389291 |
| 1870 | 1880 | 50564 | 95 | 48035 |
| 1880 | 1890 | 66857 | 90 | 60171 |
| 1890 | 1900 | 66883 | 80 | 53506 |
| 1900 | 1910 | 70360 | 50 | 35180 |
| 1910 | 1920 | 60489 | 30 | 18146 |
| 1920 | 1930 | 78670 | 10 | 7867 |
| 1930 | 1940 | 90576 | 5 | 4528 |
| Total | | 873690 | 71 | 616724 |
Number of UK Public Domain Publications (Based on Cambridge University Library Catalogue Data)
So, based on the assumptions regarding PD proportions given in the table, there are somewhat over 600 thousand PD books according to the holdings of Cambridge University Library (of which well over half, approx. 390k, are from before 1870). The British Library dataset is approx. 4x as big as that of Cambridge University Library, and the numbers scale up roughly proportionately, giving a total of over 2.4 million items.
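The arithmetic behind the table and the scaled-up figure can be reproduced with a short script. This is only a sketch: the item counts and percentages are copied from the table, the variable names are my own, and rounding each band down to a whole item is an assumption that happens to match the published per-band numbers.

```python
# (start year, end year, items, assumed % PD), copied from the table.
BANDS = [
    (1400, 1870, 389291, 100),
    (1870, 1880, 50564, 95),
    (1880, 1890, 66857, 90),
    (1890, 1900, 66883, 80),
    (1900, 1910, 70360, 50),
    (1910, 1920, 60489, 30),
    (1920, 1930, 78670, 10),
    (1930, 1940, 90576, 5),
]

# Approximate size of the British Library dataset relative to
# Cambridge University Library, as stated in the post.
BL_SCALE = 4


def pd_count(items: int, pct: int) -> int:
    """Items assumed PD in a band, rounded down to a whole item."""
    return items * pct // 100


total_items = sum(items for _, _, items, _ in BANDS)
total_pd = sum(pd_count(items, pct) for _, _, items, pct in BANDS)
pd_share = total_pd / total_items      # overall PD proportion
bl_estimate = total_pd * BL_SCALE      # rough British Library figure

print(total_items, total_pd, round(pd_share, 2), bl_estimate)
```

Changing the assumed percentages in `BANDS` shows how sensitive the total is to them: the pre-1870 band alone contributes about 390k items regardless, so the assumptions mostly affect the remaining 230k or so.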
Of course, this is a fairly crude approach based purely on publication date, and it could be improved in a variety of ways, most notably by using the authorial birth date information which is usually present in catalogue data (we can also use death date information where present). This will be the subject of the next post. (Update 2009-07-17: the post is up here.)