Using Voyant Tools for Distant Reading

Voyant is a useful resource for finding trends in documents. In particular, I will be using “Collocate Clusters” to make connections between words and ideas in a compiled series of diary entries by James Merrill Linn. In his diary, Linn writes, “War is horrible. I first saw the pomp & circumstance – the battle field – the dead and wounded now the prison ship.” The hypothesis poses the question, “For Linn, is this a turning point where he loses his innocence?” Using Voyant to see relationships between words, I will see whether I can draw any conclusions about this hypothesis.

Screen Shot 2014-09-24 at 3.09.49 PM

Relationship between “boat” and “men” in Linn’s diary entries

At first, I tried using the word cloud to look at trends in the diary. The two words that stood out to me were “boat” and “men”. Boat did not appear as prominent as other words, since it was used only 81 times in the diary entries. However, after transcribing a diary page about Linn’s experience boating, I knew that many words in the diary related to boat, including men, captain, and regiment. Instead of using the word cloud, I decided to look at the relationship between boat and other common words. I added the compiled Linn diary entries and edited my settings to apply stop words. Then, I typed in boat to see the first few connections. The result showed that men was not only one of the most common words in the entire document, but was also connected to boat in the diary.
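The kind of connection described above can be approximated by counting which words fall near a keyword once stop words are removed. The following is only a minimal sketch of that idea, not Voyant's actual computation: the function name `collocates`, the window size, the stop-word list, and the sample sentence are all assumptions made up for illustration, not text from Linn's diary.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5, stop_words=frozenset()):
    """Count words that appear within `window` tokens of each `keyword` hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            neighbor = tokens[j]
            # Skip the keyword itself and any stop words.
            if j != i and neighbor != keyword and neighbor not in stop_words:
                counts[neighbor] += 1
    return counts

# Invented sample text, used only to show the call.
sample = "The men rowed the boat to shore. The captain left the boat and the men followed."
stops = {"the", "to", "and"}
print(collocates(sample, "boat", window=3, stop_words=stops).most_common(3))
```

A wider window finds more connections but also more noise, so the window size is a judgment call rather than a fixed rule.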

Screen Shot 2014-09-24 at 3.47.56 PM

Relationship between “boat”, “men”, and “wounded” in Linn’s diary entries

Next, I wanted to look at connections with one of the words in the quote by Linn given above. I chose “wounded”, mostly because I remember transcribing it in my own diary entry. I typed “wounded” in the search bar at the top, hoping to find connections with boat and men. Wounded was not as common in the diary as men: wounded was used only 31 times, whereas men was used 133 times. Although wounded was not used as often, it was connected to men. Wounded was therefore indirectly connected to boat, through the stronger connection between men and boat.

This is useful information for distant reading because the connecting words and the sizes of the words show how often Linn used them and the major and minor connections between them. Unfortunately, this resource does not help me reach a conclusion about Linn’s loss of innocence because it does not reveal any trends. The hypothesis asks whether Linn lost his innocence halfway through the transcription, but I am unable to draw any conclusions because there is no time frame for the connections. This means that I cannot easily find where and when in the document these words were used. Word cloud may be more useful for finding trends, but Links is better for making connections and seeing how words relate within a document. Using both tools together could be extremely beneficial, both for making common connections between words or ideas and for showing where the words appear in the document and how often they are used. Because I could not draw any conclusions about the hypothesis, I am posing a question about distant reading in general. When doing distant reading, is it better to begin by making connections between words or by finding specific trends or patterns in the words? I believe that these distant reading tools go hand in hand; however, depending on what you are searching for, one can be more helpful than the other. In our case, when analyzing Linn’s diary, Word Cloud and Links could be used together to find the best result.

Can Distant Reading Prove a Hypothesis?

With respect to Professor Jackacki’s hypothesis about James’ perspective, I have used Voyant tools to attempt to either affirm or deny that James shows a loss of innocence roughly halfway through our class transcription. The words I chose for my distant reading raised new questions for me.

 

Screen Shot 2014-09-24 at 1.37.51 PM

The first word I selected was “night.” This word appears 82 times throughout the text. When I clicked on the word in the Cirrus tool and viewed it in the Word Trends tool, I did not notice any patterns whatsoever. Even during the time period Dr. J. addresses, night was used at a consistent frequency. It also did not appear much different from the beginning diary transcripts. Distant reading using this word alone did not help with Professor Jackacki’s research question.

 

 

Screen Shot 2014-09-24 at 1.38.10 PM

However, when I added a second word, “boat,” I noticed something distinct and interesting: the two trends were almost mirror images of each other. There seemed to be a negative correlation between talking about night and talking about boats. Whenever Linn was talking about a boat, typically the Cossack, he did not seem to be mentioning night. This leads me to the question: when Linn is on board the Cossack, is he writing at night and therefore not mentioning night, or does all the action seem to take place during daylight hours? Why is there this strange relationship between “boat” and “night”?
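A "mirror image" impression like this one can be checked with a number by splitting the text into equal segments, counting each word per segment, and computing a Pearson correlation between the two series. This is a rough sketch under stated assumptions: the helper names `segment_counts` and `pearson`, the choice of ten segments, and the file name in the usage comment are hypothetical, and Voyant's own Trends tool may segment the text differently.

```python
import re
from math import sqrt

def segment_counts(text, word, segments=10):
    """Raw count of `word` in each of `segments` equal-sized slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    return [tokens[i:i + size].count(word)
            for i in range(0, size * segments, size)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

# Hypothetical usage, assuming the diary has been saved as a plain-text file:
# diary = open("linn_diary.txt", encoding="utf-8").read()
# r = pearson(segment_counts(diary, "night"), segment_counts(diary, "boat"))
```

A value near -1 would agree with the mirror-image impression, while a value near 0 would suggest the pattern is weaker than it looks on the graph. With only ten segments, any such value should be read cautiously.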

 

These two words do not necessarily reveal anything about Linn’s experience in battle. I only know that his talk about his boat, the Cossack, does not seem to correlate with his innocence or lack thereof. The same holds true when Linn mentions night. Unfortunately, I was unable to affirm or refute Professor Jackacki’s research question and hypothesis; maybe other words will be able to. These two words do guide me toward the thought that Linn may have a pattern in the time of day he writes his diary entries.

I benefitted from using Voyant tools and feel that once I am able to come up with a better research question, I will be more successful with the program. Distant reading is a very intriguing concept and, although I was “unsuccessful” in reaching a conclusion about the accuracy of Professor Jackacki’s hypothesis, I have used this as an experience that can improve my skills in this field.

Wounded and Battle

The two words that I picked to look at were wounded and battle. I wanted to pick words that would reflect his feelings during battle and would show whether the way he talked about the battles changed over time. Instead of picking random words, I wanted to find words that had some sort of connection, and wounded and battle seemed like they could have some useful links. When I looked closer, it was obvious that a link was present.

Screen Shot 2014-09-24 at 1.41.20 PM

The graph above shows the number of appearances of both of my words over the progression of his writings. The second half of his writings had dramatically more appearances of both words. In fact, wounded does not show up once in the entire first half of the writing, but shows up thirty-four times in the second half. Battle is also used drastically more often in the second half than in the first. For me, the pattern in the appearance of these words could show a change in Linn. Both of the words that I picked are connected to pain, war, and conflict, so I would assume they would lead to a change in the way that Linn sees things. His time spent in the war made him hard, resulting in the loss of innocence, which in turn led to the change in the way that he writes and what he chooses to write about.
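The first-half versus second-half comparison can also be made outside of Voyant by splitting the text at its midpoint and counting each word on either side. This is only a sketch of the idea: the function name `counts_by_half` and the sample sentences are invented for illustration and are not taken from Linn's diary, and splitting by word count is an assumption (splitting by diary date might give a different picture).

```python
import re

def counts_by_half(text, words):
    """Return {word: (count in first half, count in second half)}."""
    tokens = re.findall(r"[a-z']+", text.lower())
    mid = len(tokens) // 2  # split by token count, not by date
    first, second = tokens[:mid], tokens[mid:]
    return {w: (first.count(w), second.count(w)) for w in words}

# Invented sample text, used only to show the call.
sample = ("We drilled in camp and rested. "
          "After the battle the wounded lay near the battle line.")
print(counts_by_half(sample, ["wounded", "battle"]))
```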

I also found it interesting to look at the words that were connected to battle and wounded in the Links tool. They were both connected to the word company, which suggests that he was often talking about his soldier companions when discussing battles and the wounded. The words left, exhausted, and killed were also all in this cluster. These words all have connotations that could play a part in Linn’s transformation. Although I would say a deeper look might be required to reach a strong conclusion about his change, my dive into these two words seemed to fit that conclusion.

Screen Shot 2014-09-24 at 1.36.51 PM

Links: Connections between my words

Linn’s Loss of Innocence?

In this blog post I will present my hypothesis in response to the question being asked.

I think Linn does have a loss of innocence because he’s becoming more complacent and less sensitive to the terrible things happening to his fellow soldiers around him. I believe he had to adapt in this way, so he could emotionally survive this horrible war experience. When you’re exposed to bad things over and over again, you usually become less sensitive and hardened to them. This is a common survival and coping mechanism.

I can demonstrate Linn’s loss of innocence with the word “saw” in the word trends. In the first battle he uses the word “saw” frequently, but less often in subsequent battles. He uses this word to describe things he is looking at with his innocent eyes, for instance when he “saw” a friend. He is using this word in a friendly and warm context. It seems apparent that as he got further into the war, he used the word “saw” less often because it was too painful for him to see what was happening around him. He was changing his focus and looking at things differently. He was less naïve and less sensitive, and he “saw” less.

When I used the word “wounded” in the word trends tool, I didn’t believe it was helpful or insightful. This is because the frequency pattern of the word “wounded” did not correlate with my thesis that Linn became less sensitive and had a loss of innocence. The word “wounded” speaks to empathy and sensitivity and I would expect its use to decline as Linn became more hardened. There is a frequency spike toward the end of Linn’s diary, which may be related to the battles becoming more frequent and more intense. This certainly would have resulted in many more people getting wounded. The word pattern seems more related to the intensification of the war rather than his loss of innocence.

Below is a screenshot of the word trends graph. The first thing I did was type the word “saw” into the “search” bar. Shortly after, a graph appeared showing the relative frequencies of the word. Then I typed the word “wounded” into the search bar. I could then see the relative frequencies of both “wounded” and “saw”.

Screen shot 2014-09-23 at 4.12.48 PM

Word Trends Graph showing frequency of “saw” and “wounded”

 

Below is a screenshot of the keywords in context tool. After I typed the word “saw” into the “search” bar, all of the ways the word “saw” was used in the diary showed up. This tool was extremely helpful for seeing the context of the word “saw”.
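A keywords-in-context listing is simple enough to sketch by hand: for each occurrence of the keyword, show a few words on either side. The version below is an illustrative approximation, not the tool's own code. The function name `kwic`, the bracket formatting, the context width, and the sample sentences are assumptions made up for this example.

```python
import re

def kwic(text, keyword, width=4):
    """List each occurrence of `keyword` with `width` words of context per side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample text, used only to show the call.
sample = "I saw a friend at camp. Later I saw the field."
for line in kwic(sample, "saw", width=2):
    print(line)
```

Reading the lines side by side is what makes it possible to judge whether a word is used in a warm or a grim context, which a frequency count alone cannot show.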

 

Screen shot 2014-09-24 at 1.30.49 PM

The key word “saw” in context

 

The value of distant reading is that it gives you a different perspective on a text, providing both a quantitative and a qualitative approach to language. This helps to highlight word usage, frequencies and patterns. Distant reading is quantitative because it computes word frequency, and it is qualitative because it shows you how he is using a word and in what context. While this approach is helpful, it doesn’t tell the whole story; it misses subtleties and messages that can only be grasped when reading the entire passage in a fluid way.

For example, as the war gets worse and more people are getting wounded, he uses the word “wounded” more frequently. This doesn’t mean he’s more or less sensitive. The circumstances of war can be so strong that they practically force the frequency of a particular word, without regard to values and attitudes.

Analysis of Distant Reading

In this post I will examine distant reading. Distant reading definitely has its benefits, but can it help to prove or refute a hypothesis? I am wondering whether, about halfway through the text, James illustrates a profound shift in perception, and whether he demonstrates a loss of innocence. I am going to use Voyant tools, a website that allows for analysis through distant reading, to see if I can gather some evidence to answer this question.

Screen Shot 2014-09-24 at 10.35.47 AM

Word cloud

When I inserted the diary of Linn into Voyant tools, a word cloud appeared with words that are used commonly throughout the text. I scanned this word cloud in search of words that would relate to my question. The first word that stuck out to me was sick. I figured that an increase in people getting sick might change James’s perception and might also cause him to lose his innocence, which is why I chose to analyze sick as my first word. According to the word trend, Linn did not write often about sickness in the very beginning. However, there are two huge peaks. If there were one peak in the center of this plot, that would give pretty good evidence to support my hypothesis. Unfortunately this is not the case, and the decrease that occurs between the two peaks leaves me with confusing data. Why was there a sudden decrease before Linn picked back up and started writing more frequently about sickness again? Although I have a few questions about the data, it does show me that from the beginning to the end there is definitely an increase in Linn’s writing about illness. This increase might have been a factor that caused Linn’s perception to shift; however, we cannot know for sure.

Screen Shot 2014-09-24 at 1.40.39 PM

The second word I decided to take a closer look at was battle. I tried to get in the head of Linn and I decided that if I were him, battle would definitely be something that would alter my perception and take away my innocence. The word trend shows me that at the very beginning, battle was hardly ever written about. There was a slow increase, followed by a huge peak. The peak appears roughly halfway through the entry, which would support my hypothesis. Battle became a huge part of Linn’s writing at this point and stayed important to him throughout the rest of the time he was there. It seems that once Linn began to focus on battle, he could not stop writing about it. This trend provides me with pretty good evidence that something changed about halfway through his journals, and that he had a shift in the material he chose to write about. Both of these words have similar frequencies, with sick appearing 35 times and battle appearing 38 times. The word trends for sick and battle show me that both of these things became more and more prominent in his life, which illustrates a change. Although this prominence is not enough to prove my hypothesis, it does support it.

This exercise taught me how helpful distant reading can truly be. If I were using close reading, this task would definitely have been much harder and more time consuming. I would have had to read through the whole text and pay close attention to a shift in attitude. It would have been nearly impossible to track certain words such as battle and sick while doing a close reading. The word trends were extremely helpful in analyzing the text: because they show the frequency of specific words, I can easily see important shifts. This exercise truly opened my eyes, and I have a new appreciation for distant reading as a means of analysis.