It seems that every week I’m introduced to more programs that will supposedly make my life easier. Sometimes they do, but not all of them have turned out that way. In some cases, I think that has to do with flaws in the programs; at other times, I wonder if it’s just my lack of experience that prevents me from taking full advantage of a new opportunity.
We were introduced to a variety of digital tools this week: topic modeling programs, Wordle/Tagxedo, Google Ngrams, and Voyant Tools. Each of these programs has a different purpose, and each produced a different reaction from me.
Perhaps what impressed me least were Google Ngrams and the Java Topic Modeling Tool. In the case of the ngrams, it’s difficult for me to picture how I could make use of them. At first they struck me as quite untrustworthy and of little use. I learned a little more after reading this page, but I still have some reservations. The claims that I would feel comfortable making using the Ngram Viewer are limited, and I would need to spend more time trying to understand it before I use it for anything substantial.

The same is true of topic modeling. It’s one of the hot phrases in digital humanities, but why? The topics generated are unwieldy and difficult to understand, and I want to know more about how exactly the computer “finds” them. I got confused when my classmates uploaded the same texts into the tool, asked for the same number of topics with the same settings, and got different results. I was shocked to learn that topic modeling is a “random” process: the algorithm begins from a randomized starting point, so two runs with identical settings can land on different topics. How then can I possibly use topic modeling to make any claims? I can’t guarantee that I can generate the same results, I can’t guarantee that anyone else can generate my results, and it’s quite possible for someone to use the exact same program to generate results that undermine mine. Given these problems, I’m struggling to find something redeeming about topic modeling. As with the Ngram Viewer, I think I would need someone who uses this technique often to show me how and when it is useful. And I would want to be very careful not to make claims with these tools that they cannot actually support (as Scott Weingart warns in this post).
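To see why the “randomness” matters, here is a minimal toy sketch of just the random-initialization step that algorithms like LDA (the one behind most topic modeling tools) perform before refining topics. The word list, topic count, and function are all made up for illustration; the point is only that fixing the seed makes a run repeatable, while leaving it unfixed does not.

```python
# Toy sketch: topic-model algorithms like LDA start by randomly assigning
# each word to a topic, then iteratively refine those assignments. Two runs
# with the same settings can therefore diverge unless the random seed is
# fixed. This shows only the initialization step, not a full topic model.
import random

words = ["churches", "workers", "hamilton", "josephus", "rome", "sources"]

def initial_assignment(seed, n_topics=2):
    """Randomly assign each word to one of n_topics topics."""
    rng = random.Random(seed)
    return {w: rng.randrange(n_topics) for w in words}

# Same seed -> identical starting point -> a reproducible run.
print(initial_assignment(seed=42) == initial_assignment(seed=42))  # True

# Different seeds -> usually different starting points, hence different topics.
print(initial_assignment(seed=1))
print(initial_assignment(seed=2))
```

This is presumably why my classmates got different topics from identical inputs: unless the tool exposes and fixes its random seed, each run begins somewhere different.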
I was a lot happier with Wordle/Tagxedo, which were used to give the class a quick demonstration of word clouds. I like them because they’re very easy to use. They can’t tell us very much, but at least I know exactly what I can use them to say.
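Part of why word clouds feel trustworthy is that there is so little going on underneath: a word cloud is essentially a picture of word frequencies, with bigger words appearing more often. A minimal sketch of that underlying count, using a made-up snippet in place of a real paper:

```python
# A word cloud is essentially a visualization of word frequencies.
# This sketch computes the counts a tool like Wordle would draw from.
import re
from collections import Counter

text = "Working-class churches served working-class neighbourhoods in Hamilton."

# Lowercase, keep only alphabetic runs, and drop very short tokens
# (a rough stand-in for the stopword filtering these tools apply).
tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]
freqs = Counter(tokens)

print(freqs.most_common(3))
```

With so simple a pipeline, there is little room for the tool to mislead, which is exactly why the claims it supports are modest but safe.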
In this week’s testing, Voyant Tools definitely takes first prize. It took me a little while to wrap my head around, but that got easier after watching these tutorial videos. I was especially happy to learn that the appearance and function of Voyant are largely customizable. It is sophisticated but not impossible to learn; it gives me one place to do multiple kinds of analysis, and it gives me the options that the other programs didn’t.

I tested Voyant by customizing the skin, then uploading three samples of my written work: a paper about working-class churches in early-twentieth-century Hamilton, the proposal for that paper, and an oddball, a paper about the historiography of Flavius Josephus. With Voyant I was easily able to spot the discrepancies between the Josephus paper and the others. I could compare the three documents together, or explore each one separately. When I looked at my working-class churches paper, I was able to track my use of a particular author’s work by clicking on their last name and noting where in the document instances of that name peaked. I did notice a few flaws. Some of the tools I attempted to use needed a plugin that I don’t have, though that wasn’t exactly Voyant’s fault. Word counts were thrown off by my footnotes (e.g., the computer didn’t recognize that “immigrants14” belongs in the same group as “immigrants”). But at least these errors were easy to detect. Finally, something I can really use!
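The footnote problem, at least, seems fixable before upload. Here is a hedged sketch of one way to strip footnote numbers that got glued onto words during export, so that “immigrants14” is counted together with “immigrants”. The regex is my assumption about how the markers are attached; a real export might need a different pattern.

```python
# Sketch: strip footnote numbers fused onto words (e.g. "immigrants14")
# before handing text to a frequency tool, so word counts aren't split
# between "immigrants" and "immigrants14".
import re

text = "Many immigrants14 joined these churches, and immigrants often led them."

# Remove any run of digits that directly follows a letter and ends a word.
# Assumption: footnote markers always appear as trailing digits on a word.
cleaned = re.sub(r"(?<=[A-Za-z])\d+\b", "", text)

print(cleaned)
print(cleaned.count("immigrants"))
```

A cleanup pass like this would not fix everything, but it would remove the most obvious distortion I found in my word counts.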
So what can I use Voyant Tools for? Firstly, I can use it to quickly understand a new corpus of documents. Secondly, I can use it to find patterns within large amounts of text – and to spot discrepancies that I should investigate further. (That’s exactly what this researcher did to find trends in a database of post-apocalyptic fiction.) I would still be careful not to overstate any claims I make using Voyant, but the fact that I can move between the visualizations and statistics it produces and the original full text makes me feel that I have a better understanding of what it’s doing.
There are plenty of online examples of researchers and teachers using Voyant, but I do wonder if it could have other applications. Could teachers use it not only for research, but to detect unauthorized student collaboration in papers? Could businesses analyze large quantities of customer reviews and identify specific areas for improvement? Can Voyant contribute to public engagement with history, or is it too technical and too far removed from public interests to be used that way? There are plenty of programs out there offering new possibilities to the world. The problem is, not all of them are all they’re cracked up to be, and there are too many people like me who don’t quite understand how to use the good ones to their full potential.