On Distant Reading

For my investigation into the Linn letters, I used a set of words that I initially hypothesized would have some relation to the internal changes within Linn himself. The Jakacki hypothesis states that a profound shift can be found halfway through the text, one that reveals a deep change and loss of innocence for Linn. Using the words “received,” “brother,” and “mother,” I was able to gain some insight into Linn’s relationship with his own family and into the level of acknowledgment and mutual interest between Linn and those closest to him. The visualization revealed little other than that mentions of family increase near the beginning and end of the letters. I found little evidence to either verify or refute the original hypothesis; carefully chosen words reveal few patterns, as do carefully selected sets of words. I believe that Linn’s writing style makes it exceptionally difficult to derive deeper trends from his texts. My own investigation did reveal that he rarely mentions any family member except immediately after he left and before he came back, which is hardly evidence for the hypothesis.

The technique of using transcribed text and visual analysis tools might be helpful in some situations. Unfortunately, I have found little evidence that these methods reveal any more information in this particular case. This does not mean that distant reading methods are entirely ineffective, as the Cirrus word cloud did illustrate the relationships between words and their frequency. Small trends, like the appearances of the words “boat” and “wounded,” did match up with the general shape of Linn’s experiences. Overall, however, this particular exercise revealed little evidence to support the hypothesis.

 

 

 

Distant Reading

In this post, I will discuss Linn’s loss of innocence with the use of Voyant Tools. While experimenting with Voyant Tools, I was able to see the correlations between many different words. The first word I looked at was “shot”. Because Linn was a soldier in the Civil War, I expected him to have seen a lot of gruesome battles, and “shot” was the first word that came to my mind when thinking of the Civil War. However, when I searched for “shot”, the word came up only 11 times in the diary. The only places where the word “shot” was significant were those where Linn talked about people getting wounded. Though the word was not used often, it did show Linn’s loss of innocence as he witnessed more injuries and death. Lenig’s and Buskirk’s injuries seemed to take a toll on Linn, as he mentioned them twice in his diary, on April 15th and April 19th. Since the word was not used as often as I expected, I moved on to “sick” and “hospital”.


Remembering that Linn mentioned a lot of people getting sick throughout the diary, I decided that “sick” and “hospital” would give a better idea of Linn’s loss of innocence. Using Voyant Tools, I was able to see the connections between “sick” and “hospital”. “Hospital” was used 16 times and “sick” was used 38 times. Looking at the graph, we can see that the frequencies of these words follow very similar trends. This correlates directly with the horrible conditions of the war. As the war progressed, the use of “hospital” increased. This suggests that injuries and illness played a big part in this war and in Linn’s loss of innocence.
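For readers who want to check counts like these outside Voyant Tools, here is a minimal sketch in Python. It is not how Voyant itself counts words; the function name `count_terms`, the simple tokenizer, and the sample sentence are all assumptions made for illustration, and the sample text is not taken from Linn's diary.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term in text."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return {term: counts[term.lower()] for term in terms}

# Hypothetical sample text, invented for illustration only.
sample = "Two men sick today. The hospital is full, and more sick men arrive."
print(count_terms(sample, ["sick", "hospital"]))
```

Because the sketch matches whole words only, “sick” and “sickness” would be counted separately, which may differ from a search that uses wildcards.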

The connections between these words bring me to a new question: What other words are connected with them? Using Links, I tried to see how they intertwine. However, seeing that they never directly intertwine, I decided to try different variations of “sick”. I used “sickness”, “disease”, and “illness”. Finally, with “disease”, I was able to find a direct connection with “hospital”.

Seeing these connections, I learned that distant reading can be very useful in understanding a large document in a short time. Using these tools, we can process a large amount of information that would otherwise take a lot of work and time.

 

Hypothesis and Distant Reading

I could not think of anything to support or contradict the given hypothesis. So instead I used Voyant Tools to choose a word that would pose a question or two for me to think about. I started with “Cossack” because I thought it would be interesting to see which words linked to it, and then I thought “sick” would connect to it really well.

So rather than finding data to support or contradict Professor Jakacki’s hypothesis, I made my own hypothesis by looking at the two words “Cossack” and “sick”. My hypothesis is that the words “Cossack” and “sick” will consistently appear together, because James Merrill Linn wrote that people were always getting sick aboard the ship. This was either because they got seasick out on the rough ocean or because they were all stuck in a tight space together for a long period of time, so diseases spread faster.

First, I started by seeing how many times the different words appear in the text. “Cossack” appears 46 times, while “sick” shows up only 38 times. Then I decided to look at the relative frequencies of the two words compared to one another. The first thing I noticed was how extreme the frequencies were; they never really remained flat for a long period of time. The result of comparing them was also not what I expected. In segments 1, 2, 3, and 4, the two words did not have even slightly similar trends; they were complete opposites. However, in segments 5, 7, 8, and 9, they had either identical or very similar trends, as the trends graph showed. Although this somewhat supports my hypothesis, it does not support it completely.
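The segment-by-segment comparison above can be sketched in a few lines of Python. This is only an approximation of a trends graph, not Voyant's own implementation: the function name `segment_frequencies`, the tokenizer, the way the text is split into equal parts, and the sample sentence are all assumptions, and the sample is not from Linn's diary.

```python
import re

def segment_frequencies(text, term, segments=10):
    """Return the relative frequency of term in each of `segments` equal parts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    term = term.lower()
    n = len(tokens)
    freqs = []
    for i in range(segments):
        # Integer boundaries split the tokens into nearly equal chunks.
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        # An empty chunk (very short text) gets a frequency of 0.0.
        freqs.append(chunk.count(term) / len(chunk) if chunk else 0.0)
    return freqs

# Hypothetical sample text, invented for illustration only.
sample = "The Cossack sailed. Men sick, very sick today."
print(segment_frequencies(sample, "sick", segments=4))
print(segment_frequencies(sample, "Cossack", segments=4))
```

Printing the two lists side by side makes it easy to see in which segments both words rise together and in which they move in opposite directions.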

The word “sick” was used most in the third segment of the text, where almost all of the men were getting sick from the rocking of the boat. In one diary entry, “sick” was used 4 times, and in the one right after, it was used 5 times. “Cossack” is used, as I expected, most often when they are aboard the ship, but also when they are about to board it or have just gotten off. The word frequencies overlap especially in segments 5 and 8 because that is when the men in the army and their prisoners were getting very sick aboard the ship.

Distant reading was very helpful in this situation for looking up those two words. Although I did not get a clear answer as to whether my hypothesis was accurate, I think it was accurate in the broad sense, and distant reading definitely helped to show that.

Blog Post II: Drawing Conclusions and Asking Questions

Thinking through the hypothesis, the first words I decided to take a look at were “guns” and “arms.”  My initial thought was that maybe the usage of these terms would change as Linn went in and out of battle, either in frequency or in connotation.  A pattern did show up while looking at the trend chart, though it wasn’t what I was expecting to find.  While the frequency of these terms did increase once he got to the battlefront, there was no real change in his tone after the first battle was over.  These words didn’t really help me answer this question, so I decided to move on to a new search term.


 

I decided that if the instruments of war weren’t going to give me my answer, I would have to take a step back.  I used the word trends graph to look at the uses of the word “war,” and later to compare it with “battle.”  It was this search that gave me my more interesting results.  Most of the 15 uses of the word “war” happened in the first half of the diary, while he was in Maryland before departing and while aboard the Cossack.  There is a long break where “war” does not appear, from the start of the first battle until after it had ended, and the gap was broken by the simple sentence “War is horrible.”  This very much seemed to support Professor Jakacki’s hypothesis, at least in part.  Linn did not have such a negative view of war before that point; something had changed in his perception.  After that revealing search, I added a plot of “battle” to my graph.  This revealed an entirely different trend.  The two words, “battle” and “war,” seemed to show up somewhat exclusive of each other; their frequencies were inversely related.
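One way to put a number on an inverse relationship like this is a Pearson correlation between the two words' per-segment counts, where a value near -1 means that one word tends to rise as the other falls. The sketch below is only an illustration of that idea, not something Voyant Tools reported for this diary: the function name `pearson` is my own, and the counts are invented, not measured from Linn's text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-segment counts, invented for illustration only.
war_counts = [4, 3, 0, 0, 2]
battle_counts = [0, 1, 5, 4, 1]
print(round(pearson(war_counts, battle_counts), 2))
```

With only a handful of segments and small counts, a figure like this is suggestive at best, so it works better as a prompt for closer reading than as proof.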


This made me ask another question: Why are these two topics discussed exclusively of each other?  Was Linn unable to see the bigger picture of the war in the heat of battle?  Was he not imagining the battles that lay ahead when he was on campaign?  These are questions that I could seek to answer in the future.

Utilizing Voyant Tools for Distant Reading

Voyant is a great resource for finding trends in specific documents. In particular, I will be using “Collocate Clusters” to make connections between words and ideas in a compiled series of diary entries by James Merrill Linn. In his diary, Linn writes, “War is horrible. I first saw the pomp & circumstance – the battle field – the dead and wounded now the prison ship.” The hypothesis poses the question, “For Linn, is this a turning point where he loses his innocence?” Using Voyant to see relationships between words, I will try to draw conclusions about this hypothesis.


Relationship between “boat” and “men” in Linn’s diary entries

At first, I tried using Word Cloud to look at trends in the diary. The two words that stood out to me were “boat” and “men”. “Boat” did not appear to be as prominent as other words, as it was used only 81 times in the diary entries. However, after transcribing a diary page about Linn’s experience boating, I knew there were many words related to “boat” in the diary, including “men”, “captain”, and “regiment”. Instead of using Word Cloud, I decided to look at the relationship between “boat” and other common words. To do so, I added the compiled Linn diary entries and edited my settings to apply stop words. Then I typed in “boat” to see the first few connections. As a result, I found that “men” was not only one of the most common words used in the entire document but was also related to “boat” in the diary.
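The kind of connection described above, where one word keeps turning up near another, can be approximated by counting the words that fall within a few positions of a keyword. The sketch below is a rough stand-in for a collocate view and not Voyant's actual algorithm: the function name `collocates`, the window size, the tiny stop-word set, and the sample sentences are all assumptions, and the sample is not from Linn's diary.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5, stop_words=frozenset()):
    """Count words appearing within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Take up to `window` tokens on each side of this occurrence.
        neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        # Skip stop words and the keyword itself.
        counts.update(w for w in neighbors if w not in stop_words and w != keyword)
    return counts

# Hypothetical sample text and stop words, invented for illustration only.
sample = "The men rowed the boat. The wounded men lay in the boat."
result = collocates(sample, "boat", window=3, stop_words={"the", "in"})
print(result.most_common())
```

A wider window finds more connections but also more coincidental ones, so the window size is worth adjusting to the question being asked.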


Relationship between “boat”, “men”, and “wounded” in Linn’s diary entries

Next, I wanted to look at connections with one of the words used in the given quote by Linn. I chose “wounded”, mostly because I remember transcribing it in my specific diary entry.  I typed “wounded” in the search bar at the top, hoping to find connections with “boat” and “men”. I found that “wounded” was not as commonly used in the diary as “men”: “wounded” was used only 31 times, whereas “men” was used 133 times. Although “wounded” was not used as often, it was connected to “men”. Therefore, “wounded” was indirectly connected to “boat”, because “men” and “boat” had a stronger connection.

This is useful information for distant reading because the connecting words and the sizes of the words show how often Linn used them, as well as the major and minor connections between those words. Unfortunately, this resource does not help me come to a conclusion about Linn’s loss of innocence because it does not reveal any trends. For example, the hypothesis asks whether Linn lost his innocence halfway through the transcription, but I am unable to draw any conclusions because there is no time frame for the connections. This means that I cannot easily find where and when in the document these words were used. Word Cloud may be more useful for finding trends, but Links is better for making connections and seeing how words relate within a document. Using both of these tools together could be extremely beneficial, both by making common connections between words or ideas and by showing where the words appear in the document and how often they are used. Because I could not draw any conclusions relative to the hypothesis, I am posing a question about distant reading in general: When doing distant reading, is it better to begin by making connections between words or by finding specific trends or patterns in the words? I believe that these distant reading tools go hand in hand; however, depending on what you are searching for, one can be more helpful than the other. In our case, when analyzing Linn’s diary, both Word Cloud and Links could be used together to find the best result.