The Digital Travels of John Mandeville: 2

Time to practice my command line skills again! What is Mandeville up to now?

Part 2: Mandeville Makes Some Friends

During the past few sessions of Digital Research Methods, we learned how to build up search queries so we could download batches of files with wget. I thought I would try to get some more texts from the Internet Archive to join my copy of “The Travels of Sir John Mandeville.” I quickly ran into a problem. Unlike the class example, where we searched for “Natural history – Juvenile literature,” there was no category called “Medieval travel literature.” I was able to find a copy of Marco Polo’s travels to download, but it wasn’t listed under a “travel writing” category, so I couldn’t use its category to look for other similar books. It seems that not all of the materials uploaded to the Internet Archive have the same kinds of metadata. And I didn’t want to download everything with the keyword “travel,” because I only wanted the medieval stuff. So I couldn’t actually find a whole corpus of documents to acquire.
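To make the idea of "building up a search query" a little more concrete, here is a minimal Python sketch that assembles a search URL for a subject heading. It is not the query from class. The endpoint and parameter names (`q`, `fl[]`, `rows`, `output`) reflect my understanding of the Internet Archive's advanced search page and should be treated as assumptions, as should the exact spelling of the subject string and the function name `build_search_url`, which is made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Internet Archive's advanced search page.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(subject, rows=50):
    """Return a search URL asking for the identifiers of texts with this subject."""
    params = [
        ("q", f'subject:"{subject}" AND mediatype:texts'),
        ("fl[]", "identifier"),  # only ask for each item's identifier
        ("rows", str(rows)),     # maximum number of results to return
        ("output", "json"),      # ask for machine-readable output
    ]
    # urlencode percent-escapes the quotes, colons and brackets for us.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# The subject heading here is an assumed spelling of the class example.
url = build_search_url("Natural history -- Juvenile literature", rows=10)
print(url)
```

A query like this only finds what the metadata exposes. If nobody has tagged a book with a matching subject, as with my medieval travel texts, no amount of query-building will turn it up.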

Nevertheless, I did build a query to download the two volumes of Marco Polo. They’re huge books: the introductory material at the beginning of Volume 1 alone (tables, lists of plates and historical essays) runs to over 10,000 lines. I went to trim the header and footer, remove punctuation, reduce capitalization and separate words into lines, but then I realized that I didn’t need to do that in order to use swish-e. I also realized that trimming the header and footer wouldn’t get rid of extraneous material, as the book contains footnotes at the end of each chapter. (If I want to analyze this book, but I only want the text of the actual Marco Polo narrative, how can I get rid of the notes? Is there any way to do this other than to skim it and manually pick out line number ranges for deletion?)

Instead, I took the original Mandeville text, put it into a folder with the Polo books, burst them all into 20-line sections, renamed the sections and set up a swish-e configuration file to index them. Now, instead of using grep, I can use swish-e to show me the relevance of my search results, as well as the precise location of my search terms in each book. The only problem is that swish-e can’t search across the 20-line sections, only within them, so a phrase that straddles a section boundary won’t be found. That might end up being an issue.
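Here is a minimal sketch in Python of what "bursting" a text into 20-line sections amounts to. It is not the command-line tool itself, and the helper names `burst` and `section_name`, along with the file-naming pattern, are hypothetical and only there for illustration.

```python
def burst(lines, size=20):
    """Split a list of lines into consecutive sections of at most `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def section_name(stem, index, width=4):
    """Build a hypothetical file name such as 'mandeville_0003.txt'."""
    return f"{stem}_{index:0{width}d}.txt"


# Stand-in for a real book: 45 numbered lines.
book = [f"line {n}" for n in range(1, 46)]
sections = burst(book)

# Report the name and length of each section; the last one is shorter.
for index, section in enumerate(sections, start=1):
    print(section_name("mandeville", index), len(section))
```

The sketch also shows where the limitation comes from: line 20 and line 21 land in different sections, so anything that runs from one to the other is split between two separately indexed files.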

One thing that I would like to know how to do is to share folders and files between my virtual machine and my Windows laptop. That way I would be able to move documents I currently have into the virtual machine to experiment with. It would also be interesting to see what results I would get if I uploaded my travel book sections into Voyant, a set of tools I just started to learn. I like the visualizations Voyant creates with texts, and I think it would help me understand what I’m doing on the command line if I could visualize it somehow.
