On Distant Reading

For my investigation into the Linn letters, I used a set of words that I initially hypothesized would have some relation to the internal changes within Linn himself. The Jakacki hypothesis states that a profound shift can be found halfway through the text, one that reveals a deep change and loss of innocence for Linn. Using the words “received,” “brother,” and “mother,” I was able to gain some insight into Linn’s relationship to his own family and into the level of acknowledgment and mutual interest between Linn and those closest to him. The visualization revealed little other than that mentions of family increase near the beginning and end of the letters. I found little evidence to either verify or refute the original hypothesis; even carefully chosen words reveal few patterns, as do carefully selected sets of words. I believe that Linn’s writing style makes it exceptionally difficult to derive deeper trends within his texts. My own investigation did reveal that he rarely mentions any family member except immediately after he left and just before he came back, which is hardly evidence for the hypothesis.

The technique of using transcribed text and visual analysis tools might be helpful in some situations. Unfortunately, I have found little evidence that those methods reveal any more information in this particular case. This does not mean that distant reading methods are entirely ineffective, as the Cirrus word cloud did illustrate the relationships between words and their frequency. Small trends, like the appearance of the words “boat” and “wounded,” did match up with the general shape of Linn’s experiences. Overall, however, this particular exercise revealed little evidence to support the hypothesis.




Blog Post 2: Making Inferences and Drawing Conclusions

While trying to test the validity of Professor Jakacki’s hypothesis, a new question came to my mind regarding the words “board” and “Cossack.” I want to know how closely “board” and “Cossack” correlate with each other in Linn’s writing, and whether Linn uses these words more when on land or out at sea. To answer these questions, I will plug both words into Voyant (http://voyant-tools.org/).

To start answering my questions, I first plug “board” into the Cirrus. After doing this, one can see that “board” appears 69 times. By looking at the corpus reader, one can see where in the text “board” is most frequent. Looking to the right of the screen at the word trends panel, one can tell that the usage of “board” spikes in segments 4, 5 and 7. These spikes fall on January 6th, January 25th and February 9th (a Sunday, to be exact). One must also note the sharp drop in the usage of “board” in segment 6 (February 6th), which falls between January 25th and February 9th.
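The per-segment counts read off the word trends panel can also be approximated by hand. The following is a minimal sketch, assuming the text is split into segments of roughly equal length by word count; Voyant's own segmentation may differ, and the sample sentence and the function names `tokenize` and `segment_counts` are made up for illustration rather than taken from the Linn diary or from Voyant.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def segment_counts(text, term, n_segments=10):
    """Count how often `term` appears in each of n roughly equal segments.

    Assumption: segments are equal slices by token count, which may not
    match how Voyant divides a document.
    """
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // n_segments))  # ceiling division
    counts = []
    for i in range(n_segments):
        chunk = tokens[i * size:(i + 1) * size]
        counts.append(chunk.count(term.lower()))
    return counts

# Hypothetical sample text, not a quotation from the diary.
sample = ("board the board now we stayed ashore today "
          "back on board tonight")
print(segment_counts(sample, "board", n_segments=3))  # [2, 0, 1]
```

A spike in the trends panel corresponds to a segment whose count is much higher than its neighbors, and a drop to one that is much lower.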

To completely answer my question, I must now plug “Cossack” into Voyant and look for the correlation. “Cossack” appears 46 times throughout the diary, and its frequency tends to follow the same pattern as that of “board.” To me, this is not a surprise, because both words have to do with ships, and more likely than not, when Linn is talking about ships, he is talking about the Cossack. “Cossack” experiences its spikes on January 6th (segment 4) and February 9th (segment 7). The only difference between the two words is where the largest decline occurs: for “Cossack,” it happens on January 9th (segment 3).

After looking at both words in Voyant, I can now see that there is a strong correlation between “board” and “Cossack.” On top of this, the usage of these words spikes when Linn is on board a ship (most of the time, the Cossack). In my opinion, Linn tends to use words regarding ships when he is in fact on a ship. However, there are many other possible reasons why Linn does this (mood, weather, time of day). The only real way of knowing would be to speak to Linn himself.
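A “strong correlation” judged by eye from two trend lines can be given a number with a Pearson correlation coefficient over the per-segment counts. This is only a sketch of that idea: the two series below are invented placeholder values, not the actual counts for “board” and “Cossack,” and `pearson` is a hand-written helper rather than anything Voyant provides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Placeholder per-segment counts (hypothetical, for illustration only).
word_a = [2, 3, 1, 9, 8, 2, 10]
word_b = [1, 2, 0, 7, 5, 1, 8]
print(round(pearson(word_a, word_b), 2))
```

A value near 1 means the two words rise and fall together across segments, a value near 0 means no linear relationship, and a value near -1 means one rises where the other falls. With only a handful of segments, even a high value should be read cautiously.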

Hypothesis and Distant Reading

I could not think of anything to support or contradict the hypothesis given. So instead I used Voyant Tools to choose a word that would pose a question or two for me to think about. I started with “Cossack” because I thought it would be interesting to see which words linked to it, and then I thought “sick” would connect to it really well.

So rather than finding data to support or contradict Professor Jakacki’s hypothesis, I made my own hypothesis by looking at the two words “Cossack” and “sick.” My hypothesis is that the words “Cossack” and “sick” will consistently appear together, because James Merrill Linn wrote about how people were always getting sick aboard the ship. This was either because they got seasick out on the rough ocean, or because they were all stuck in a tight space together for a long period of time, so diseases spread faster.

First, I started by seeing how many times the two words appear in the text. "Cossack" appears 46 times, while "sick" shows up only 38 times. Then I decided to look at the relative frequencies of the two words compared to one another. The first thing I noticed was how extreme the swings in frequency were; they never really remained flat for a long period of time. The result of comparing them was also not what I expected. In segments 1, 2, 3, and 4 the two words did not have even slightly similar word trends; they were complete opposites. However, in segments 5, 7, 8 and 9 they had either identical word trends or very similar ones, as the word trends graph shows. This supports my hypothesis somewhat, but not completely.

The word "sick" was used most in the third segment of the text, where almost all of the men were getting sick from the rocking of the boat. In one diary entry, "sick" was used 4 times, and in the entry right after it, 5 times. "Cossack" is used, as I expected, most often when they are aboard the ship, but also when they are about to board it or have just gotten off. The word frequencies overlap especially in segments 5 and 8, because that is when the men in the army and their prisoners were getting very sick aboard the ship.
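Whether two words “appear together” can also be checked entry by entry instead of segment by segment. The sketch below assumes the diary has already been split into a list of entries, which is a step not described in this post; the four entries shown are invented examples, and `contains_word` and `cooccurrence` are hypothetical helper names.

```python
import re

def contains_word(entry, word):
    """True if `word` appears as a whole word in the entry, ignoring case.

    Note: whole-word matching means "sick" does not match "seasick".
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, entry, flags=re.IGNORECASE) is not None

def cooccurrence(entries, word_a, word_b):
    """Return (entries containing both words, entries containing either)."""
    both = sum(1 for e in entries
               if contains_word(e, word_a) and contains_word(e, word_b))
    either = sum(1 for e in entries
                 if contains_word(e, word_a) or contains_word(e, word_b))
    return both, either

# Invented entries for illustration; not quotations from the diary.
entries = [
    "Many men sick on the Cossack.",
    "Went ashore today.",
    "Back aboard the Cossack.",
    "Several still sick in camp.",
]
print(cooccurrence(entries, "Cossack", "sick"))  # (1, 3)
```

If most of the entries that mention either word mention both, that would support the hypothesis more directly than similar trend lines do, since two words can peak in the same segment without ever sharing an entry.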

Distant reading was very helpful for looking up those two words. Although I did not get a clear answer as to whether my hypothesis was accurate, I think it was accurate in a broad sense, and distant reading definitely helped to show that.