Friday, 14 July 2017

How much newspaper content is digitized?

There is a lot of digitized newspaper content out there in the world, but it still seems to be a small fraction of the total newspaper collections worldwide (possibly less than 5% of all English-language content, by my guesswork). This post highlights some sources and asks for more information on how much newspaper content exists that is yet to be digitized.

A quick look at the Library of Congress Historic American Newspapers site shows 154,205 titles and 12 million pages of searchable digitized newspapers. The British Newspaper Archive currently shows 20 million pages.

And yet... I feel these digitized collections are still only a fragment of the newspaper resources waiting to be digitized. They reflect the challenge of building a digitized corpus when there is so much printed material and so few resources for digitization. The British Library alone holds approximately 450 million pages of printed material, of which roughly 18 million pages have been digitized (per a tweet from Luke McKernan). If there is that much left to do at the British Library, how much more is there to do elsewhere? The simple answer is that we don't really know, and the question is confounded by a number of issues.
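As a rough check, the two British Library figures can be turned into a proportion. This is a minimal Python sketch that simply takes both numbers as given; they are approximations from a tweet, not audited counts.

```python
# Approximate British Library figures quoted above (estimates, not audited counts).
total_pages = 450_000_000      # printed pages held
digitized_pages = 18_000_000   # pages digitized so far

share = digitized_pages / total_pages        # fraction of pages digitized
remaining = total_pages - digitized_pages    # pages not yet digitized

print(f"Digitized: {share:.1%}")
print(f"Still to digitize: {remaining:,} pages")
```

On these figures the British Library has digitized about 4% of its printed pages, leaving some 432 million still to do. That sits comfortably with the "less than 5%" guess above, although that guess covers all English-language newspaper content rather than a single library.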

These confounding issues are best highlighted in the European Newspaper Survey Report, written by Alastair Dunning for The European Library / Europeana Foundation in 2012. The report states:

"Over half of the libraries (27 out of 47, 57%) have a cut off date beyond which they will not publish digitised newspapers on the web. Most frequently, this is based on a 70 year sliding scale, meaning that content after 1942 is inaccessible in digital form. 23% (11 out of 47) had an agreement with a rights organisation so that in-copyright digitised newspapers could be published. However, this tended to be restricted to individual titles rather than collective agreements for complete collections."

"There can be no denying the extent of newspaper digitisation undertaken in Europe. Libraries managed to identify nearly 130m pages of digitised content comprising nearly 24,000 titles (129,041,663 of pages and 23,987 titles were the precise figures obtained). The number is likely to be much higher in reality. Because of the vast size of their collections and the cursory nature of their cataloguing, there were six libraries unable to give a number of titles, and seven who could not give a definite number of pages...

The number of pages digitised is impressive. Yet where it was possible to compare the number of titles or pages digitised against the actual size of the physical collections, the ongoing challenge of creating an entirely digital library of newspaper holdings was reinforced. Only 12 (26%) of the libraries had digitised more than 10% of their collection (either in terms of titles or page numbers), and only two of those had done more than 50% - the consortium of libraries represented by the Biblioteca Virtual de Prensa Histórica (58% of their pages were digitised) and the National Library of Turkey, unique for having digitised its entire collection of 800,000 pages and 845 titles."

There are other sources of data on newspaper digitization as well. The National Library of Australia has often led the way in newspaper digitization and in opening up collections through crowdsourcing. Its resources page contains strategic indicators and plans that illustrate its decision-making process for digitizing newspapers. Trove states: "It has been estimated that approximately 7,700 newspaper titles have been published within Australia. Trove provides access to over 20 million pages from over 1000 Australian newspapers". Does this suggest that around 1/7 of Australian newspapers have been digitized? There is really no good estimate that can be inferred from these numbers other than "there is more to do".
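The "1/7" reading comes from comparing title counts. Here is a minimal Python sketch of that arithmetic, assuming "over 1000" is taken as exactly 1,000 (so the result is a lower bound) and using the 7,700 estimate that Trove quotes:

```python
# Figures quoted from Trove: an estimate of titles ever published in Australia,
# and a lower bound ("over 1000") on the titles available on Trove.
titles_published = 7_700
titles_on_trove = 1_000

title_share = titles_on_trove / titles_published
ratio = titles_published / titles_on_trove   # "1 in N" titles

print(f"At least {title_share:.0%} of titles, about 1 in {ratio:.1f}")
```

This is a share of titles, not of pages: one title may run to a handful of issues and another to a century of dailies, and a title on Trove need not be digitized in full. The 20 million pages cannot be turned into a page-based share without an estimate of the total number of Australian newspaper pages, which neither figure supplies.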

Newspaper digitization is a valuable public good, but there are a number of gaps, not least in contemporary newspapers, where libraries are constrained by copyright or legal deposit restrictions.

So some questions:
  • Do we now have a critical mass of digitized newspapers?
  • How much newspaper content (as a proportion of the total print content) is needed before we can say we have enough to satisfy (let's say) 80% or more of the public information need for this material?
  • Are volume-based, numeric measures in any way useful for deciding what is needed and how much is enough? Or are they too blunt an instrument to measure how well information needs are being satisfied?
  • What are the true indicators of the value of digitized newspapers?

If you have any ideas, sources or comments, please feel free to add them here.

Thanks to @lukemckernan @eurnews @Ajprescott and @conal_tuohy for their Twitter comments that have fed into this blog post.


4 comments:

  1. Simon, I have no current data, but my PhD was in this field (nearly 20 years ago) and it's an area I'm interested in. Microfilming newspapers was hugely unpopular with researchers. You can find my thesis in the repositories of both Aberystwyth University and, now, City University. It's called Newspapers and Historical Research.

    1. Great, thanks! Nice to have the background perspective. See also Paul Gooding's book Historic Newspapers in the Digital Age (2017) for a current perspective.

  2. The Center for Research Libraries maintains an excellent directory of other newspaper digitisation efforts around the world, sorted by country. Though it is likely incomplete (as any such list would be), it is the best listing I know of.

    http://icon.crl.edu/digitization.php

  3. Here are some data from the Municipal Archive of Girona*, in case they are useful.

    - 2,835,857 pages digitised from 28 titles, roughly 90% of our newspaper collection.
    - Titles span 1795 to 2015 (with continuous coverage from 1887).
    - Languages: Catalan and Spanish.
    - 2016: 465,004 pages consulted online.
    - The newspaper search page received 27% of all consultations of our website, significantly more than our thousands of online images or textual documents.

    http://www.girona.cat/sgdap/cat/premsa.php
    http://www.girona.cat/sgdap

    http://www.europeana.eu/portal/en/search?f%5BPROVIDER%5D%5B%5D=AthenaPlus&q=DATA_PROVIDER%3A%22Ajuntament+de+Girona%22

    So this is a very relevant issue for us.

    * Girona is a municipality in Catalonia with about 100,000 inhabitants.
