Text analysis using the Old Bailey API & Annotated Books Online

The Old Bailey Online provides digitised proceedings of the Old Bailey from 1674 to 1913. It offers a general search function; however, using the open API allows the user to query the results in a more specific way, “undrilling” to modify a query, or breaking the query down into further subcategories. Using the API also allows results to be exported to the online reference management software Zotero, and to Voyant for further visualisation.

For my search, I used the keyword “Camberwell” (where I live), with gender of the defendant set to “female”, and punishment category set to “Death”. This returned 8 (highly interesting!) results.

[Image: screenshot of a segment from the Old Bailey Online]

I exported these texts to Voyant, and the resulting word cloud looked like this:

[Image: Voyant word cloud of the Old Bailey results]

The prominent words, “child”, “mr”, “mrs”, “death”, “house”, “room”, “seen”, “know” and “said”, paint an eerie picture of domestic mishap, which would definitely be a good starting point if you were looking for inspiration for a Victorian murder mystery. Aside from that, the word cloud doesn’t give you the kind of information you’d expect a researcher to be looking for while using this tool: you don’t get any picture of what kinds of crimes these women committed, or of the kinds of evidence presented at court. This does seem to be one of those situations Jacob Harris mentions in his blog post at Nieman Lab, wherein the use of the word cloud doesn’t provide much in the way of insight.
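Underneath, a word cloud is really just a picture of word counts, which goes some way to explaining why it says so little about context. A minimal sketch of that counting step in Python might look like the following. This is not how Voyant itself is implemented; the stop-word list is a tiny hypothetical one of my own, and the sample sentence is made up rather than quoted from the proceedings.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch; Voyant ships its own, much longer list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "i", "it",
              "that", "her", "she", "he"}


def top_words(text, n=10):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Lower-case everything and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample text, not a quotation from the Old Bailey proceedings.
sample = "The child was in the room. Mrs Smith said the child was seen in the house."
print(top_words(sample, 3))
```

Because the count throws away word order and surrounding sentences, “child” and “death” can both come out large without telling you how they relate to each other.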

I was interested to read that these court proceedings were digitised through a process of text rekeying. Earlier texts were manually typed twice by two different typists, and then the transcripts were compared by a computer, with editing performed manually. Later texts were keyed once, with the second version being created using OCR software, and the texts once again compared and manually corrected. In my place of work I use OCR on PDFs uploaded to Moodle in order to make them accessible for visually impaired students, so that they can use text-to-speech software. This is a time-consuming process, especially if the original text is old and the print quality not very good (we have students studying Olde English and Witchcraft, and the OCR software really doesn’t like their texts). In some ways it was pleasing to know that there just *isn’t* the technology out there at the moment to make this task easy, as demonstrated by the laborious processes performed by the people behind the Old Bailey Online. I am glad to know that in my place of work we aren’t just wasting our time with all our manual editing; at present it seems this is the only way!
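To get a feel for the “compared by a computer” step, here is a minimal sketch in Python using the standard library’s difflib to flag the places where two transcriptions of the same passage disagree. It is only an illustration of the general idea, not the software the Old Bailey Online team actually used; the function name and the two sample lines are my own inventions.

```python
import difflib


def find_disagreements(first_keying, second_keying):
    """Return (words_in_first, words_in_second) pairs where two transcripts differ."""
    a = first_keying.split()
    b = second_keying.split()
    # Compare word by word; autojunk is switched off so that frequent words
    # are never silently ignored in longer texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Each non-equal span is something a human editor would check.
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Made-up sample lines standing in for two keyings of the same passage.
version_one = "the prisoner was seen in the house"
version_two = "the prifoner was seen in the houfe"

for left, right in find_disagreements(version_one, version_two):
    print(f"check: {left!r} vs {right!r}")
```

The computer only finds the disagreements; deciding which version is right is still a job for a person, which is exactly the manual editing described above.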

Later in the DITA lab, I looked at Universiteit Utrecht’s Digital Humanities Lab, specifically at their text mining research projects, and chose to explore the project Annotated Books Online. This project digitises early modern books with handwritten annotations, marking the text up in order to separate out the annotations themselves for closer inspection. Annotations can be highlighted with different colours, and have transcriptions added to them. Well, that was the theory anyway. The first time I used ABO I could highlight the annotations and get them to change colour; for some reason, however, I haven’t been able to do so since.

This research project really appealed to me. I have always found marginalia interesting, and I like that the present-day reader can, in a sense, “interact” with the annotator of the past by “doing stuff” with their scribblings in the margin. Considering these texts are quite old and no doubt delicate, it’s a treat to be able to manipulate them in this way (well, it would be if I could get the annotation features to work for me again!).