Personal accounts in letters in comparison to factual information in diary entries

For our final project, Mary Medure and I collaborated to compare and contrast James Merrill Linn’s diary entries with his letters to his mother and his brother, John. We wanted to focus on the content of his diary entries and letters rather than on tools that document his locations. Thus, instead of mapping, we each chose to transcribe a different letter that would eventually be tagged in TEI and converted into a Digital Edition. Mary and I chose letters written around the same time so that we could compare their content. We also wanted both letters to fall within the same time frame as the diary entries we had transcribed earlier this semester. Mary transcribed the letter to John dated February 11, 1862; the diary entries she had already transcribed covered February 8-12, 1862. I transcribed the letter to Linn’s mother dated February 19, 1862, and had already transcribed the diary entries from February 5-7, 1862. We then used Voyant Tools to compare the words Linn used most often in his diary entries and in his letters.


Transcription Difficulty of Letter to Mother on February 19, 1862

During the transcription process, Mary and I separately transcribed the two pages of each letter and then worked together on the words we could not decipher. We read the letters aloud to each other to make better sense of Linn’s experiences. Some words were still illegible, however, so we went to the library archives to read the letters firsthand. In Linn’s letter to his mother, Mary and I could not read the words at the end of each page because of the binding of the documents. In the archives we could not bend or fold the pages to see the full words, so we had to make educated guesses based on the context of each sentence. Pierazzo makes a useful point here: “[j]udgment is necessarily involved in deciding what is in fact present [in the manuscript], as when an ambiguously formed character resembles two different letters; but the transcriber’s goal is to make an informed decision about what is actually inscribed at each point (Meulen and Tanselle, 1999, p. 201)” (465). Even after a second look at the documents in the archives, Mary and I still had to make educated contextual guesses about several words for the document to make sense. For example, the screenshot on the left shows the word “tomatoes” cut off. In this section of the letter Linn is talking about food, and only “toma-” is legible, so I used the context of the sentence to work out the word that was cut off at the end of the page.

Color Coding of Events and Affiliation in Letter to Mother February 19, 1862


After transcribing, we started tagging the words that we felt were most important to include. To make tagging simpler, we color coded the words by category: person/people, place, affiliation, object, state, trait, event, date, time, and military role. We did not color code our diary entries to the same extent. We found that affiliation and person/people were important enough to be separate categories. For instance, we consider “Americans” to be an affiliation because it is a group of people associated with a specific location. We also categorized “war” and “battles” as events rather than places because they take place at many different locations. I did not have “event” as a category in the diary entry I transcribed because there Linn refers to battles by their actual names. I believe that in writing to his mother he refers to battles generally because he is not using the letters as a record of his specific locations and events.


Color Coding of Descriptions and States in Letter to Mother February 19, 1862

After color coding, we noticed that the majority of the words we highlighted were descriptions and states of well-being. Descriptions are highlighted in turquoise, and states, including weather and emotions, are highlighted in gray. In writing to his mother, Linn focuses more on his personal experiences and his emotional responses to the war overall. After color coding the letters, we tagged the highlighted words and transferred the document to Oxygen to make a Digital Edition.


Letters to Mother and John most commonly used words

Voyant is a great tool for comparing the content of different documents, so Mary and I decided to use it to compare the diary entries with the letters. First, we took our transcriptions of Linn’s letters to his mother and to his brother, John, to find the most commonly used words. I noticed that he frequently used “hope,” “remember,” “little,” and “home.” These words express how he feels and how he reacts to his surroundings rather than naming specific locations and people. He refers to “home” (Lewisburg) frequently, which makes sense because he is writing to his mother and brother. Generic terms like “men” and “company” are also common, because his letters home represent his personal experiences rather than a collection of the places he travels to or the people he encounters.
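Voyant’s word cloud sizes each word by how often it appears once stop words are filtered out. The short Python sketch below illustrates that counting step under my own assumptions: the tokenizer (lowercased runs of letters) and the tiny STOP_WORDS set are hypothetical stand-ins, not Voyant’s actual tokenizer or stop list, and the function name top_words is my own.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; Voyant's real list is much longer.
STOP_WORDS = {"the", "and", "a", "an", "i", "is", "now", "of", "to", "in", "was"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Treat lowercase runs of letters (and apostrophes) as words; punctuation
    # such as "&" or dashes is simply skipped.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    # Linn's own sentence, used here only as a small sample input.
    quote = ("War is horrible. I first saw the pomp & circumstance – "
             "the battle field – the dead and wounded now the prison ship.")
    print(top_words(quote, n=3))
```

On a single sentence every remaining word occurs once, so the ranking is flat; on a full transcription, the repeated words are the ones that rise to the top, which is what the cloud shows visually.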


Linn’s diary entries most commonly used words

After analyzing our transcriptions of Linn’s letters to his mother and brother, Mary and I combined our diary entry transcriptions to see the most commonly used words. We noticed that military men of various ranks appear throughout his diary entries. Linn refers to specific people such as General Burnside, Captain Bennet, and many others. “Battle” appears in both the letters and the diary entries; however, it is significantly larger in the diary word cloud, indicating that he used it more often there. This supports the hypothesis that Linn’s diary entries are more of a personal record of places and people, whereas his letters to his family are more about his emotional experiences throughout the war.

Transcribing Linn’s letters to his mother and John from around the same time as his previously transcribed diary entries gave Mary and me the evidence to argue that Linn’s diary entries are a personal collection, kept for himself, of the locations he traveled to and the people he encountered along the way. In contrast, Linn’s letters to his mother and John are more generic and express his feelings about the war rather than a series of places and people. Color coding helped us significantly, as it supported our hypothesis that Linn’s writing to his mother and brother was more emotional and personal, whereas his diary entries were a collection of people and places for himself to remember later. To visualize the contrast between the diary entries and the letters written to family, Voyant is a great tool for giving the viewer a general idea of the premise and themes of each document. Overall, this project gave me a much better understanding of James Merrill Linn’s purpose in writing what he did in both his diary entries and his letters home.

Here are the links to my final TEI product!

Digital edition:

Works Cited
Linn, James Merrill. Diary. February 5-7, 8-12, 1862. MS. Bucknell University Archives and Special Collections, Lewisburg, PA.
Linn, James Merrill. Letter to John. February 11, 1862. MS. Bucknell University Archives and Special Collections, Lewisburg, PA.
Linn, James Merrill. Letter to Mother. February 19, 1862. MS. Bucknell University Archives and Special Collections, Lewisburg, PA.
Pierazzo, Elena. “A Rationale of Digital Documentary Editions.” Literary and Linguistic Computing 26.4 (2011): 463-477.



On Distant Reading

For my investigation into the Linn letters, I used a set of words that I initially hypothesized to be related to internal changes within Linn himself. The Jakacki hypothesis states that a profound shift can be found halfway through the text, revealing a deep change and loss of innocence in Linn. Using the words “received,” “brother,” and “mother,” I was able to gain some insight into Linn’s relationship with his own family and into the degree of acknowledgment and mutual interest between him and those closest to him. The visualization revealed little other than that mentions of family increase near the beginning and end of the letters. I found little evidence to either verify or refute the original hypothesis; carefully chosen words reveal few patterns, as do carefully selected sets of words. I believe that Linn’s writing style makes it exceptionally difficult to derive deeper trends from his texts. My own investigation did reveal that he rarely mentions any family member except immediately after he left and before he came back, which is hardly evidence for the hypothesis.

The technique of using transcribed text with visual analysis tools might be helpful in some situations. Unfortunately, I found little evidence that these methods reveal any more information in this particular case. This does not mean that distant reading methods are entirely ineffective, as the Cirrus word cloud did illustrate the relationships between words and their frequencies. Small trends, like the appearances of the words “boat” and “wounded,” did match the general shape of Linn’s experiences. Overall, however, this exercise revealed little evidence to support the hypothesis.




Hypothesis and Distant Reading

I could not think of anything to support or contradict the hypothesis given, so instead I used Voyant Tools to choose a word that would raise a question or two for me to think about. I started with “Cossack” because I thought it would be interesting to see which words linked to it, and then I thought “sick” would connect to it really well.

So rather than finding data to support or contradict Professor Jakacki’s hypothesis, I made my own hypothesis about two words, “Cossack” and “sick”: I expected them to appear together consistently, because James Merrill Linn wrote about how people were always getting sick aboard the ship. This was either because they got seasick on the rough ocean, or because, with everyone stuck in a tight space for a long period of time, diseases spread faster.

First, I checked how many times each word appears in the text: “Cossack” appears 46 times, while “sick” shows up only 38 times. Then I compared the relative frequencies of the two words to one another. The first thing I noticed was how extreme the frequencies were; they never remained flat for long. The result of the comparison was also not what I expected. In segments 1, 2, 3, and 4 the two words did not have even slightly similar trends; they were complete opposites. However, in segments 5, 7, 8, and 9 their trends were either identical or very similar, as shown below. Although this somewhat supports my hypothesis, it does not do so completely.

“Sick” was used most in the third segment of the text, where almost all of the men were getting sick from the rocking of the boat. In one diary entry “sick” was used four times, and in the entry right after it, five times. As I expected, “Cossack” is used most often when they are aboard the ship, but also when they are about to board it or have just gotten off. The word frequencies overlap especially in segments 5 and 8 because that is when the men in the army and their prisoners were getting very sick aboard the ship.
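Voyant’s trend graph splits the text into segments and plots each word’s relative frequency in each one. The sketch below shows that calculation in Python, assuming equal-length segments counted in words and a default of ten segments (I believe this matches Voyant’s default, but treat it as an assumption). segment_trend is a hypothetical helper, not a Voyant function, and the sample sentence is made up.

```python
import re


def segment_trend(text, word, segments=10):
    """Relative frequency of `word` in each of `segments` equal slices of text.

    Each value is (occurrences of word in the slice) / (words in the slice),
    so slices of slightly different length remain comparable.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    trend = []
    for i in range(segments):
        # Integer boundaries spread any remainder evenly across the slices.
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        trend.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return trend


if __name__ == "__main__":
    # Invented sample text, not from Linn's diary.
    sample = "sick men aboard the Cossack sick again the ship rolled"
    print(segment_trend(sample, "sick", segments=2))
    print(segment_trend(sample, "cossack", segments=2))
```

Reading the two lists side by side, segment by segment, is the same comparison made by eye on the trend graph.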

Distant reading was very helpful for looking into these two words. Although I did not get a clear answer about whether my hypothesis was accurate, I think it holds in a broad sense, and distant reading helped to support that.

Blog Post II: Drawing Conclusions and Asking Questions

Thinking through the hypothesis, the first words I decided to look at were “guns” and “arms.”  My initial thought was that maybe the usage of these terms would change as Linn went in and out of battle, either in frequency or in connotation.  A pattern did show up in the trend chart, though it wasn’t what I was expecting to find.  While the frequency of these terms did increase once he got to the battlefront, there was no real change in his tone after the first battle was over.  These words didn’t really help me answer the question, so I decided to move on to a new search term.



I decided that if the instruments of war weren’t going to give me my answer, I would have to take a step back.  I used the word trends graph to compare the uses of the words “war” and “battle.”  This search gave me more interesting results.  Most of the 15 uses of the word “war” occur in the first half of the diary, while he was in Maryland before departing and while aboard the Cossack.  Once the first battle begins, “war” does not appear for a long stretch; the gap is broken only after the battle has ended, by the simple sentence “War is horrible.”  This seemed to support Professor Jakacki’s hypothesis, at least in part.  Linn did not have such a negative view of war before that point; something had changed in his perception.  After that revealing search, I added “battle” to my graph.  This revealed an entirely different trend.  The two words, “battle” and “war,” seemed to appear largely exclusive of each other; their frequencies were inversely related.
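Whether two trend lines are inversely related can also be expressed as a single number: the Pearson correlation of their segment values is close to -1 when one word tends to rise as the other falls. The Python sketch below is my own illustration, not a Voyant feature I used; the function name pearson and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up relative frequencies for two words across five segments.
    war = [0.004, 0.003, 0.0, 0.0, 0.001]
    battle = [0.0, 0.001, 0.005, 0.004, 0.002]
    print(round(pearson(war, battle), 2))
```

A single coefficient hides where the two words diverge, so it would complement the graph rather than replace it.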


This made me ask another question: Why does Linn discuss these two topics separately?  Was he unable to see the bigger picture of the war in the heat of battle?  Was he not imagining the battles that lay ahead when he was on campaign?  These are questions I could seek to answer in the future.

Utilizing Voyant for Distant Reading

Voyant is a great resource for finding trends in specific documents. In particular, I will be using “Collocate Clusters” to make connections between words and ideas in a compiled set of diary entries by James Merrill Linn. In Linn’s diary, he writes, “War is horrible. I first saw the pomp & circumstance – the battle field – the dead and wounded now the prison ship.” The hypothesis poses the question, “For Linn, is this a turning point where he loses his innocence?” Using Voyant to see relationships between words, I will analyze whether I can draw any conclusions about this hypothesis.


Relationship between “boat” and “men” in Linn’s diary entries

At first, I tried using the word cloud to look at trends in the diary. The two words that stood out to me were “boat” and “men.” “Boat” did not appear as prominent as other words, since it was used only 81 times in the diary entries. However, having transcribed a diary page about Linn’s experience on a boat, I knew that many words in the diary related to boat, including men, captain, and regiment. So instead of using the word cloud, I decided to look at the relationship between “boat” and other common words. I added the compiled Linn diary entries and edited my settings by adding stop words. Then I typed in “boat” to see the first few connections. As a result, I found that “men” was not only one of the most common words in the entire document but was also connected to “boat” in the diary.
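Voyant’s Links view connects a keyword to the words that occur near it. A minimal Python sketch of that idea counts these neighbors (collocates), assuming a fixed window of words on each side; the window size, the stop words, the collocates function, and the example sentence are all my own hypothetical choices, not Voyant’s settings.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5, stop_words=frozenset()):
    """Count words within `window` tokens on either side of each `keyword`.

    Stop words and the keyword itself are skipped, so the result resembles
    the neighbors a links view would draw around the keyword.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for neighbor in tokens[lo:i] + tokens[i + 1:hi]:
            if neighbor != keyword and neighbor not in stop_words:
                counts[neighbor] += 1
    return counts


if __name__ == "__main__":
    # Invented sentence, not from Linn's diary.
    text = "The men boarded the boat and the men were sick"
    print(collocates(text, "boat", window=3, stop_words={"the", "and", "were"}))
```

The window size matters: a wider window links “boat” to more distant words, which is one reason indirect connections like the one through “men” can appear.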


Relationship between “boat”, “men”, and “wounded” in Linn’s diary entries

Next, I wanted to look at connections with one of the words in the quotation from Linn. I chose “wounded,” mostly because I remembered transcribing it in my own diary entry. I typed “wounded” in the search bar at the top, hoping to find connections with “boat” and “men.” I found that “wounded” was much less common in the diary than “men”: “wounded” was used only 31 times, whereas “men” was used 133 times. Although “wounded” was not used as often, it was connected to “men.” Therefore, “wounded” was indirectly connected to “boat,” because “men” and “boat” had a stronger connection.

This is useful information for distant reading because the connecting words and their sizes show how often Linn used them, as well as the major and minor connections between them. Unfortunately, this resource does not help me reach a conclusion about Linn’s loss of innocence because it does not reveal any trends over time. For example, the hypothesis asks whether Linn lost his innocence halfway through the text, but I am unable to draw any conclusions because the connections have no time frame. This means that I cannot easily find where and when in the document these words were used. The word cloud may be more useful for finding trends, but Links is better for making connections and seeing how words relate within a document. Using both tools together could be extremely beneficial: one makes connections between words or ideas, and the other shows where the words appear in the document and how often they are used.

Because I could not draw any conclusions about the hypothesis, I am posing a question about distant reading in general. When doing distant reading, is it better to begin by making connections between words or by finding specific trends or patterns of the words? I believe that these distant reading tools go hand in hand; however, depending on what you are searching for, one can be more helpful than the other. In our case, when analyzing Linn’s diary, Word Cloud and Links could be used together to get the best result.