Time to practice my command line skills again! What is Mandeville up to now?
Part 2: Mandeville Makes Some Friends
During the past few sessions of Digital Research Methods, we learned how to build up search queries so we could download batches of files with wget. I thought I would try to get some more texts from the Internet Archive to join my copy of “The Travels of Sir John Mandeville.” I quickly ran into a problem. Unlike the class example, where we searched for “Natural history – Juvenile literature,” there was no subject category called “Medieval travel literature.” I was able to find a copy of Marco Polo’s travels to download, but it wasn’t listed under a “travel writing” category, so I couldn’t use it to find other similar books. It seems that not all of the materials uploaded to the Internet Archive have the same kinds of metadata. I also didn’t want to download everything with the keyword “travel,” because I only wanted the medieval texts. So I couldn’t actually find a ready-made corpus of documents to acquire.
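To keep track of what a search query actually looks like before handing it to a download tool, here is a minimal Python sketch that assembles an Internet Archive search URL. It assumes the Archive's advanced search endpoint (advancedsearch.php) and its q, fl[], rows and output parameters; the function name build_search_url and the example query are my own, not something from class, and the class workflow with wget may differ.

```python
from urllib.parse import urlencode

# Assumption: the Internet Archive advanced search endpoint.
BASE_URL = "https://archive.org/advancedsearch.php"


def build_search_url(query, rows=50):
    """Return a search URL that asks for item identifiers as JSON.

    `query` uses the Archive's field:value syntax, for example
    'subject:travel' or 'title:(Marco Polo) AND mediatype:texts'.
    """
    params = [
        ("q", query),              # the search expression itself
        ("fl[]", "identifier"),    # only return each item's identifier
        ("rows", rows),            # how many results to request
        ("output", "json"),        # machine-readable results
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query; nothing is downloaded, the URL is only printed.
    print(build_search_url("title:(Marco Polo) AND mediatype:texts", rows=10))
```

Printing the URL first makes it easy to paste into a browser and check how many results come back, and whether the metadata is consistent, before downloading anything.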
Nevertheless, I did build a query to download the two volumes of Marco Polo. They’re huge books, with over 10,000 lines at the beginning of Volume 1 just in introductory material, tables, lists of plates and historical essays. I started to trim the header and footer, remove punctuation, convert everything to lowercase and separate the words onto their own lines, but then I realized that I didn’t need to do any of that in order to use swish-e. I also realized that trimming the header and footer wouldn’t get rid of all the extraneous material, as the book contains footnotes at the end of each chapter. (If I want to analyze this book, but I only want the text of the actual Marco Polo narrative, how can I get rid of the notes? Is there any way to do this other than to skim it and manually pick out line number ranges for deletion?) Instead I took the original Mandeville text, put it into a folder with the Polo books, burst them all into 20-line sections, renamed the sections and set up a swish-e configuration file to index them. Now instead of using grep, I can use swish-e to rank my search results by relevance, as well as to show me which section of which book contains my search terms. The only problem is that swish-e can’t search across the 20-line sections, only within them, so search terms that fall on either side of a section break won’t be found together. That might end up being an issue.
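One possible way around the section-boundary problem would be to burst the books into sections that overlap by a few lines, so that words on either side of a break still end up together in at least one section. This is only a sketch of that idea in Python, not what I actually did; the function name burst and the 5-line overlap are assumptions chosen for illustration.

```python
def burst(lines, size=20, overlap=5):
    """Split a list of lines into sections of `size` lines.

    Each section repeats the last `overlap` lines of the previous one.
    Returns a list of (first_line_number, section_lines) pairs, with
    line numbers counted from 1.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    sections = []
    for start in range(0, len(lines), step):
        sections.append((start + 1, lines[start:start + size]))
        # Stop once a section has reached the end of the text.
        if start + size >= len(lines):
            break
    return sections


if __name__ == "__main__":
    # Stand-in text: 50 numbered lines instead of a real book.
    sample = ["line %d" % n for n in range(1, 51)]
    for first, section in burst(sample):
        print(first, len(section))
```

The trade-off is that the same passage can turn up in two neighbouring sections, so a search may report it twice.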
One thing that I would like to know how to do is to share folders and files between my virtual machine and my Windows laptop. That way I would be able to move documents I currently have into the virtual machine to experiment with. It would also be interesting to see what results I would get if I uploaded my travel book sections into Voyant, a set of tools I just started to learn. I like the visualizations Voyant creates with texts, and I think it would help me understand what I’m doing on the command line if I could visualize it somehow.