Thoughts about open data and the future of librarianship

These are the most common words I have used on this blog since I began writing it at the beginning of October. Looking at this representation, exported from Voyant Tools, I feel I must have been on the right track. From a writerly point of view, it was even more interesting to leave a few of the stop words in, as the result showed me which boring words I tend to overuse. “Particularly” seems to be one of them… I might have to give this some thought before I hand in my final essay!

In the final lab of DITA we attempted to obtain Twitter metrics reports. However, given that I have only Tweeted a handful of times since the course began, my results were singularly uninteresting, and I couldn’t get the program to work properly, so I won’t publish the results here. This is not to say I haven’t been using Twitter during this time. As well as following up my classmates’ links and suggestions, I have used it to track the protests after the Ferguson verdict, read feedback and comments from students using the library I work in, find out the details of various incidents I have passed while cycling to work, and discover what some of my favourite bands are doing. The data I have generated and accessed covers vast swathes of my life, and it has made me realise how useful open access to data, via APIs and beyond, can be for people developing apps that help us in our daily lives. It also scares me a bit when I look at the ways companies such as Uber are using data to invade people’s privacy.

The move towards open data generated from research has been prominent at the university where I work. Well, talk of open data has been prominent; whether the university eventually sets up a data repository similar to its current institutional repository remains to be seen. It is increasingly recognised that researchers who provide their raw data will, as the Open Data Initiative says, contribute “economic, environmental, and social value” to society. If research is publicly funded, it stands to reason that the public should have access to the results. And as I mentioned before, the value of being able to use this kind of data, and to mash up different applications, shouldn’t be underestimated, considering the kinds of things people are creating, such as one woman’s mission to make it easy for people to locate public toilets in Denmark. Someone needs to do that for London!

What has been interesting, and slightly uncomfortable for me, throughout the last 10 weeks of DITA is that while on the one hand I can definitely see the need for librarians and information specialists to get a handle on these kinds of technologies, on the other hand this work seems to run very much parallel to what companies and corporations (such as Uber) are doing. The difference, I guess, is that we’re not in it to (necessarily) make money off people, but much of it does feel a bit like we’re learning business analyst tools. In fact, a friend of mine recently got a job working with “big data” at a company that does market research and the like for various big companies. We’ve been able to share a lot of knowledge in the last few weeks, and while I realise it is reactionary (and probably a bit technophobic) of me to feel uncomfortable, part of me thinks: “I am training to be a librarian after all, not to help car companies sell cars!”

But I think this will be the (future) role of librarians: to help the public gain or retain control of their own information and understand what is being done with their data, to navigate copyright limitations and, in an academic context, to promote useful data analysis tools to students. To that end I am pleased to have been given these leads to follow up, and I look forward to integrating them into my work within the library.


Word clouds: “mullets of the internet”? What would Tupac say?

Jeffrey Zeldman’s description of word clouds as the “mullets of the internet” made me laugh. I’ve never found them particularly attractive to look at. Still, Wordle, Many Eyes and Voyant were fun to use and, like the Altmetric doughnuts, made the data in the otherwise eye-strainingly dull Excel spreadsheets much easier to get my head around, though I’m not sure how useful they are beyond giving a very general picture of a situation.

Even so, we used data collected from our altmetrics work in the last DITA lab, and a few things were revealed to me. First, as I mentioned in my previous blog post, I used Altmetric to perform a keyword search for “Aotearoa”. When I looked at the results, it seemed that some of the journal articles, blog posts, Tweets etc. it gathered did not contain the word Aotearoa, and the results felt a bit random. However, running the titles from the Altmetric data (exported to Excel) through Voyant produced the following word cloud:


with the word “Aotearoa” (as well as “Zealand”) showing very prominently. This led me to realise that I had probably dismissed my Altmetric results too quickly: on further inspection they were more relevant than I thought, and the word cloud was more than just a colourful mullet! (And yes, I did forget to apply the stop words, which is why “and” and “of” appear so frequently. Oops.)
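For anyone curious what that stop-word step actually does, here is a minimal Python sketch of a word-frequency count with an optional stop-word filter. It is only an illustration of the general idea, not how Voyant works internally: the tiny `STOP_WORDS` set, the made-up titles and the `word_frequencies` helper are all hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list; Voyant ships its own,
# much longer lists, which this does not reproduce.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def word_frequencies(titles, stop_words=frozenset()):
    """Count lower-cased word tokens across a list of titles,
    skipping any token that appears in stop_words."""
    counts = Counter()
    for title in titles:
        # \w+ grabs runs of letters/digits/underscores as "words".
        for token in re.findall(r"\w+", title.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


# Invented titles for illustration only, not my real Altmetric export.
titles = [
    "Health of Aotearoa and New Zealand",
    "The history of Aotearoa",
]

print(word_frequencies(titles).most_common(3))
print(word_frequencies(titles, STOP_WORDS).most_common(3))
```

Without the filter, “of” ties with “aotearoa” at the top of the list; with it, “aotearoa” stands alone, which is the same effect that let “and” and “of” crowd into my cloud.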

I also gathered Altmetric data using the keyword “Bicycle” and exported these to Voyant as well. This screenshot shows the kinds of information Voyant pulled out for me:

One of the most useful features is being able to select a word from the corpus, in this case “helmets”, and see every instance of it in the bottom right of the screen, surrounded by the context of its sentence (which can be expanded). This is useful when the word a researcher is looking for is more ambiguous than “helmets” or “Aotearoa” and might be mentioned in a context irrelevant to the subject being studied. This more granular way of looking at the data helps ensure that the researcher gets an accurate picture of how the words are being used in the text, with minimal effort.
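The “word in context” view is usually called keyword-in-context (KWIC). Below is a rough Python sketch of the general idea, assuming plain whitespace-separated text: it finds each occurrence of a keyword and keeps a few words on either side. The `keyword_in_context` function and the sample sentences are hypothetical, and this is not Voyant’s own code.

```python
import re


def keyword_in_context(text, keyword, width=5):
    """Return (left context, matched word, right context) for each
    occurrence of keyword, with up to `width` words on either side."""
    words = text.split()
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        # Strip leading/trailing punctuation so "helmets," still matches.
        if re.sub(r"^\W+|\W+$", "", word).lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


# Invented sample text, not taken from the real Altmetric data.
sample = ("Cyclists who wear helmets may feel safer. "
          "Whether helmets reduce injuries is still debated.")

for left, hit, right in keyword_in_context(sample, "helmets", width=3):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>30} [{hit}] {right}")
```

Lining the keyword up in a column like this is what makes it quick to scan for the odd usage that doesn’t belong.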

I still can’t say I am convinced by the usefulness of the word cloud, or even 100% sold on text analysis when it is approached in this quantitative way. I did my undergraduate degree in English literature, so Franco Moretti’s concept of distant reading, which employs graphical and quantitative visualisations of a text, is new to me (though it would have been REALLY helpful when writing those essays on Victorian literature!). But I was interested in Julie Meloni’s blog post at the Chronicle of Higher Education about using word clouds to engage students. I used to work in a youth library, and many of the teenagers I worked with were very interested in poetry and expressive language. “The Rose that Grew from Concrete” by Tupac Shakur was (perhaps unsurprisingly) one of the most popular books in the library. In a bid to get the kids to engage with how poems are written, I photocopied some of Tupac’s poetry and whited out some of the more visceral words. The kids then had to imagine, guess or decide which words belonged in the spaces. I just used Voyant on poems from “The Rose that Grew from Concrete”, and I think this would’ve been a hit with all those emo teenagers at the library: