Distant Reading

In this post, I will discuss Linn’s loss of innocence using Voyant Tools. While experimenting with Voyant Tools, I was able to see the correlations between many different words. The first word I looked at was “shot”. Since Linn was a soldier in the Civil War, I expected him to have seen a lot of gruesome battles, and “shot” was the first word that came to mind when I thought of the Civil War. However, when I searched for “shot”, the word came up only 11 times in the diary. The only places where “shot” was significant were the passages in which Linn talked about people getting wounded. Though the word did not appear often, it did show Linn’s loss of innocence as he witnessed more injuries and death. Lenig’s and Buskirk’s injuries seemed to take a toll on Linn, as he mentioned them twice in his diary, on April 15th and April 19th. Since the word was not used as often as I expected, I moved on to “sick” and “hospital”.

Remembering that Linn mentioned many people getting sick throughout the diary, I decided that “sick” and “hospital” would give a better idea of Linn’s loss of innocence. Using Voyant Tools, I was able to see the connections between “sick” and “hospital”. “Hospital” was used 16 times and “sick” was used 38 times. Looking at the graph, we can see that the frequencies of these words follow very similar trends. This correlates directly with the horrible conditions of the war. As the war progressed, the use of “hospital” increased. This suggests that injuries and illness played a big part in this war and in Linn’s loss of innocence.
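For readers who want to check counts like these outside Voyant Tools, here is a minimal Python sketch of whole-word counting. The function names and the sample sentence are hypothetical (the sample is not taken from the diary), and the tokenizer is a simple assumption, so the numbers it gives for a real text may not match Voyant's exactly.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def count_terms(text, terms):
    """Return how many times each term occurs as a whole word."""
    counts = Counter(tokenize(text))
    return {term: counts[term.lower()] for term in terms}

# Hypothetical sample text, not taken from the diary.
sample = "Two men were sick today. One sick man went to the hospital."
print(count_terms(sample, ["sick", "hospital", "shot"]))
```

Counting whole words matters here: a plain substring search for “sick” would also match “sickness” and “seasick”, which would inflate the total.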

The connections between these words bring me to a new question: What words are connected with these? Using Links, I tried to see how they intertwine. However, seeing that they never directly intertwine, I decided to try different variations of “sick”. I used “sickness”, “disease”, and “illness”. Finally, with “disease”, I was able to find a direct connection with “hospital”.
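The idea behind this kind of link can be sketched in Python as counting which words appear within a few words of a keyword. This is only an approximation under stated assumptions: the `collocates` function, the window size, and the sample text are hypothetical, and the Links tool in Voyant applies its own settings, which may differ.

```python
import re
from collections import Counter

def collocates(text, keyword, window=3):
    """Count the words that appear within `window` tokens of `keyword`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    nearby = Counter()
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the keyword itself
                    nearby[tokens[j]] += 1
    return nearby

# Hypothetical sample text, not taken from the diary.
sample = ("Disease filled the hospital. Days later the men sailed "
          "home and nobody was sick.")
links = collocates(sample, "hospital", window=3)
print(links["disease"], links["sick"])
```

In this made-up sample, “disease” falls inside the window around “hospital” while “sick” does not, which is the same kind of pattern described above: two words can both be frequent without ever appearing close together.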

Seeing these connections, I learned that distant reading can be very useful for understanding a large document in a short time. Using these tools, we can process a large amount of information that would otherwise take a lot of work and time.

 

Hypothesis and Distant Reading

I could not think of anything to support or contradict the hypothesis given. So instead I used Voyant Tools to choose a word that would pose a question or two for me to think about. I started with “Cossack” because I thought it would be interesting to see which words linked to it, and then I thought “sick” would work really well as a word to connect to it.

So rather than finding data to support or contradict Professor Jakacki’s hypothesis, I made my own hypothesis by looking at the two words, “Cossack” and “sick”. My hypothesis is that the words “Cossack” and “sick” will consistently appear together, because James Merrill Linn wrote that people were always getting sick aboard the ship. This was either because they got seasick out on the rough ocean, or because they were all stuck in a tight space together for a long period of time, so diseases spread faster.

First, I started by seeing how many times the two words appear in the text. “Cossack” appears 46 times, while “sick” shows up only 38 times. Then I decided to look at the relative frequencies of the two words compared to one another. The first thing I noticed was how extreme the frequencies were; they never really remained flat for a long period of time. The result of comparing them was also not what I expected. In segments 1, 2, 3, and 4, the two words did not have even slightly similar trends; they were complete opposites. However, in segments 5, 7, 8, and 9, they had either identical trends or very similar ones, as the trend graph shows. Although this somewhat supports my hypothesis, it does not support it completely.
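A comparison like this can be approximated in Python by cutting the text into equal segments and computing each word's relative frequency (occurrences divided by the number of words) in every segment. This is a sketch under assumptions: the `segment_frequencies` function, the default of 10 segments, and the sample sentence are hypothetical, and Voyant may divide and tokenize the diary differently.

```python
import re

def segment_frequencies(text, term, segments=10):
    """Return the relative frequency of `term` in each equal-sized segment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    freqs = []
    for k in range(segments):
        # Proportional boundaries keep the segments as even as possible.
        start = k * len(tokens) // segments
        end = (k + 1) * len(tokens) // segments
        chunk = tokens[start:end]
        freqs.append(chunk.count(term.lower()) / len(chunk) if chunk else 0.0)
    return freqs

# Hypothetical sample text, not taken from the diary.
sample = "We boarded the Cossack. Many were sick, very sick."
print(segment_frequencies(sample, "cossack", segments=2))
print(segment_frequencies(sample, "sick", segments=2))
```

Putting the two lists side by side makes it easy to see, segment by segment, whether the words rise and fall together or move in opposite directions.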

The word “sick” was used most often in the third segment of the text, where almost all of the men were getting sick from the rocking of the boat. In one diary entry, “sick” was used 4 times, and in the entry right after it, the word was used 5 times. “Cossack” is used, as I expected, most often when the men are aboard the ship, but also when they are about to board it or have just gotten off. The word frequencies overlap especially in segments 5 and 8 because that is when the men in the army and their prisoners were getting very sick aboard the ship.

Distant reading was very helpful for looking up those two words. Although I did not get a clear answer as to whether my hypothesis was accurate, I think it was accurate in a broad sense, and distant reading definitely helped to show that.