Tuesday, 7 July 2015

Digitization links with Digital Humanities

This post sketches the relationship between digitization and the Digital Humanities. It considers the processes involved in digitization, how these map to John Unsworth's scholarly primitives, and the research benefits of digitization for the digital humanities.

In the Oxford English Dictionary, digitization refers to the ‘action or process of digitizing; the conversion of analogue data (esp. in later use images, video, and text) into digital form’ (OED Online, 2015). This definition provides the focus for this post, which views digitization as the intricate and multi-faceted material process of converting analogue forms of information storage into digital bits. The OED traces the term back to the 1950s, in relation to computing science; it can be spelt with an “s” or a “z” depending on a person's linguistic origins and preferences. Digitization is sometimes used interchangeably with ‘digital imaging’ or ‘scanning’, but these are merely mechanisms within the process: they capture a digital picture by sampling the original and mapping it onto a grid of squares known as picture elements (pixels). Many other processes exist within digitization, and each could also be characterised as a decision to be made, for instance:
  • Assessment and selection of originals for digitization
  • Feasibility testing, costing, and piloting
  • Copyright clearance and intellectual property rights management
  • Preparation of original materials, including conservation
  • Benchmarking of processes and technologies
  • Digital capture, including scanning, digital imaging, OCR, digital recording for audio/video
  • Quality assessment and assurance
  • Metadata for discovery, data management, preservation and administration
  • Storage solutions for long-term preservation and sustainability of the digitized content
  • Delivery mechanisms to get the end digitized content to the user
  • Workflow processes to manage the flow of activity effectively
  • Project management (of crucial importance) to ensure time, money, risk and deliverables are well managed
This non-exhaustive list covers only the overarching processes, and no digitization activity should proceed without a plan, and chosen technologies, for each of them. Underlying these will be a whole range of other issues specific to each type of original material, to the information goals of the digitization and to its intended functional outcomes. Questions that frequently need answering before digitization starts include: what handling is appropriate for the originals; how much physical and format variation is there; are the catalogues and indexes adequate; and what skills and infrastructure are needed?

Because digitization is a process, it is often treated as a neutral technology or an inherently beneficent activity. Considering the varied facets and components of these processes shows that digitization has so many aspects that its impact is intrinsically linked to the wider context of its application. Digitization, especially of cultural heritage, brings ‘...a curious and unprecedented fusion of technology, imagination, necessity, philosophy and production which is continuously creating new images, many of which are changing the culture within which we live’ (Colson and Hall 1992: 75). Michelle Pickover, curator of manuscripts at the University of the Witwatersrand in South Africa, argues that ‘Cyberspace is not an uncontested domain. The digital medium contains an ideological base – it is a site of struggle’ (Pickover 2005). It remains important to maintain a critical appraisal of the criteria used to select materials for digitization, and of the ways in which choices made during the digitization process affect digital humanities research opportunities.

Memory institutions (such as libraries, archives and museums) have historically focused on archiving, managing and preserving what can be termed containers of information: boxed letters, reports, documents, paintings, film or photographs. These collections frequently form the backbone of the primary sources used by humanities scholars in their research, and are thus an important corpus for digital humanists too. These physical, primary carriers of recorded information and knowledge are a form of semantic memory, and in the past they were the main focus of archivists' and librarians' efforts to enable description and discovery.

One of the core benefits of digitization for the digital humanities has been the growth of digital content that may be investigated, parsed, re-used and mined for humanistic research purposes. From a digital humanities perspective, a digitized resource should therefore support what John Unsworth described as a ‘list of functions (recursive functions) that could be the basis for a manageable but also useful tool-building enterprise’, namely (but not exclusively):
  • Discovering
  • Annotating
  • Comparing
  • Referring
  • Sampling
  • Illustrating
  • Representing (Unsworth, 2000: 1)
These functions correspond to uses of digitized content that the general public wants as much as scholars do. Digitization for the digital humanities should therefore address these functional requirements, both for scholarly reasons and because doing so will attract a wider base of use and appreciation.

Digitization takes primary sources beyond the book and the laboratory. Digitized sources bring simulated practical experiments in science, as well as rare and fragile artefacts, into the classroom to support teaching.
“The First World War Poetry Archive makes it easier to discuss the creative process in greater detail with students than was traditionally possible. My teaching is enhanced because there is much more primary source material freely available, especially the full colour images of all the manuscript variants of a poem. This represents a significant benefit to students, teachers and researchers.”
Dr Stuart Lee, University of Oxford
The increasing availability of digitized resources allows educational institutions to provide students with more varied, more accessible and richer teaching materials than ever before. This encourages a more exploratory, research-based approach to teaching and learning. Entirely new kinds of topics and courses can be studied, new modes of assessment are possible, and students are given a richer educational experience.
“Not only an invaluable resource for studying Chopin’s music, but potentially a means of studying music in general—perhaps with application beyond music, too.”
Professor Nicholas Cook, University of Cambridge, on the Online Chopin Variorum Edition
Research benefits accrue when we invest in deepening our understanding of the world and build upon the intellectual legacy of previous generations.

Digitized resources transform the research process:
  • New areas of research are enabled.
  • Rich research content is now widely accessible through innovative interfaces and user-friendly research tools.
  • The researcher can now ask questions that were previously not feasible.
  • Researchers can engage in a new process of discovery and focus their intellect on analysis rather than data collation.
A bedrock of scholarship is the ability to share, discuss and reference thoughts, ideas, and discoveries. Scholars require access to the accumulated knowledge of human endeavour to move research and discovery forward rather than in circles.

Costly collections of early books like Early English Books Online (EEBO) and Eighteenth-Century Collections Online (ECCO), together offering some 50 million pages of works in English published between 1475 and 1800, can be made widely accessible to students and researchers in institutions large and small.
“Early English Books Online itself has transformed research into early English literature. It has democratised the research process by extending this facility to individuals and institutions without easy access to specialist libraries.”
Dr Sarah Carpenter, University of Edinburgh
“I would not be able to do my research without the use of EEBO. Moreover, I am looking at lots of medical recipes and need to assess quickly what the ingredients were thought to be useful for. The ability to search full texts to find these ingredients with ease and speed is crucial to my work. It saves me a lot of time not to have to read the whole document to find one herb.”
Jennifer Evans, PhD Medical History student, Exeter University
Primary source material is vital to scholarly research, but in some cases the source material may have been artificially separated and physically dispersed across the globe. In the past, scholars had to travel to libraries around the world if they wanted to compare sources, which was costly, time-consuming and inefficient. Digitization can reunify these primary sources and, once they are brought together again, enables new tools to support research across them.

Jane Austen’s Fiction Manuscripts Digital Edition presents all of Austen’s manuscripts, allowing them to be viewed side by side for the first time in 150 years.
“Jane Austen’s Fiction Manuscripts Digital Edition offers unprecedented opportunities for new scholarship, particularly in exploring the creative laboratory of her novels, so far an under examined area of Austen studies. It also makes the manuscript sources freely available to the wider public.”
Professor Kathryn Sutherland, University of Oxford
Chopin’s First Editions Online unites all of the first impressions of Chopin's first editions in an unprecedented virtual collection, thereby giving musicians and musicologists direct access to the most important primary source materials relevant to the composer's music.
“This digital resource is the only complete collection of the Chopin first editions, which otherwise are scattered across the globe... users have at their fingertips source materials which they otherwise would never have sight of, or only with considerable difficulty and concomitant expense.”
Professor John Rink, Royal Holloway, University of London
Oral histories are an especially powerful means of connecting personal stories with digitized content to create a wider contextual research and knowledge framework.
“With Harry Patch's death, any direct living connection to these [First World War soldiers'] records has finally been severed and marks the passing of this significant period in British military activity into history. Digitising these records makes them accessible to people around the world, many of whom had ancestors who served in the ‘war to end all wars’, and who will now be able to discover so much more about them.”
William Spencer, Military Records Specialist at The National Archives
For a fuller exploration of the benefits of digitization, please see the evidence provided in my work below:


Colson, F. and Hall, W. (1992) ‘Educational Systems: Pictorial Information Systems and the Teaching Imperative’. In Thaller, M. (ed.) Images and Manuscripts in Historical Computing. Scripta Mercaturae Verlag, pp. 73–86.

OED Online (2015) ‘digitization, n.’ June 2015. Oxford University Press. http://www.oed.com/view/Entry/240886 (accessed 7 July 2015).

Pickover, M. (2005) ‘Negotiations, Contestations and Fabrications: The Politics of Archives in South Africa Ten Years After Democracy’, Innovation, 30, pp. 1–11.

Unsworth, J. (2000) ‘Scholarly Primitives: What methods do humanities researchers have in common, and how might our tools reflect this?’ Paper presented at the Humanities Computing: Formal Methods, Experimental Practice symposium, King's College London.

All other quotes from:
Inspiring Research, Inspiring Scholarship: The Value and Impact of Digitised Resources for Learning, Teaching, Research and Enjoyment, JISC, 2011. Available at http://www.kdcs.kcl.ac.uk/innovation/inspiring.html


  1. Excellent post and information Simon. I would only add that the interface by which users access digitized content is becoming increasingly important. So much so that if you do an outstanding job with imaging, cataloguing, workflows and core technologies but have a bad interface, the utilization of your digitized collection might be minimal. So much cultural material ends up on Tumblr and Pinterest not because they are great retrieval mechanisms but because they are visually compelling and socially engaging. They also offer users the ability to see materials from varied sources side by side, while most digitized collections are singular silos that cannot be searched or compared with similar materials in outside collections.

    1. Thanks Caleb, very good point. You highlight a burgeoning issue that may be worthy of another blog post: the difference between the dataset and the web interface, and how much responsibility a digital humanities project has to maintain and provide both at high levels of sophistication. It's a knotty problem, as it engages with the cost of sustaining both and with the intellectual responsibilities of those who have created the content.
