Analyzing Transcription with Tagging

Using close reading as a tool to analyze the transcription helped us better understand the text. In class, we used two techniques: categorizing words by color and encoding with TEI. Both were very useful for categorizing important words, especially TEI. By tagging words, we examined every bit of information they might offer. Pierazzo states that “no transcription, however accurate, will ever be able to represent entirely the source document” (464). Although we can’t represent the source entirely, we can at least capture as much of its information as possible.

Screen Shot 2014-10-26 at 9.31.15 PM

Categorizing words by colors was a very interesting technique. It is simple yet efficient in highlighting significant words. We tagged words by categories (people, places, events, traits, states, etc.) and highlighted them in different colors. As simple as it sounds, we encountered a lot of problems. We had to define what is and what isn’t tag-worthy. The categories were a problem in themselves. We had many arguments about what should go in which category. For example, we had to decide whether “Cossack” should be a place or an object. Like “Cossack”, many words were on the verge of two different categories. Overall, it was interesting to see how everyone chose to tag and how Linn chose to write down his observations. There was more tagging for people and objects than for anything else. Linn seems to be more concerned with physical things.

Screen Shot 2014-10-26 at 10.19.14 PM

TEI changes the way we can analyze text. Like the colorization technique, TEI allows us to categorize words, but with far more options for tagging significant words. Pierazzo quotes Driscoll on the possibilities of TEI: “to all intents and purposes there is no limit to the information one can add to a text—apart, that is, from the limits of the imagination” (466). While encoding with TEI, I had a lot of trouble deciding how many different tags I needed to analyze a word. We had a lot of options, but we also had a lot of words. With TEI, I found myself tagging more words than with the colorization, including many that were not significant. However, by tagging them, I was able to learn everything I could, from the physical state of an object to the time and place.

The collaborative process worked to our advantage. Because we were able to work with each other, we made sure that we had the same guidelines for tagging these words. Pierazzo says that the opinion of the editor changes the interpretation of the transcription. By deciding together how to tag certain words, we arrived at similar interpretations of the text, which kept us from deviating from the accepted guidelines.

Blog IV: TEI, XML, and Close Reading

The density of brown descriptive terms in the second day of the diary.

Through the markup of Linn’s diary entries and close examination of the words and phrases he used to express himself, I have developed a deeper understanding of Linn’s words and have begun to formulate new questions based on the last two weeks’ exercises. I consider myself lucky that my page of Linn’s diary contained two days’ worth of writing. This has allowed me, through markup of descriptive terms, to witness how Linn’s writing style changed from day to day along with his emotions at the time. By looking at the density of negative descriptive terms, I was able to pick out a distinct change in Linn’s tone between the 7th and the 8th. More specifically, negative descriptors were roughly three times as dense on the 8th as they were on the 7th. I inferred from this that Linn’s mood dropped dramatically between the two days, likely a result of the incessant rain and cold weather he had to sleep in. This kind of revelation is possible through the features Pierazzo describes as “Semantics,” the markup of “dates, names of people, of places, keywords.” Without close reading and markup of the text, I would never have noticed this subtle change, nor really understood Linn’s feelings on these days.
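
A density comparison like this one can be approximated in a few lines of Python. This is only a rough sketch: the word list, the two sample sentences, and the function name `descriptor_density` are all invented for illustration and are not taken from Linn’s diary or from my actual markup.

```python
import re


def descriptor_density(text, descriptors):
    """Return the fraction of words in `text` that appear in `descriptors`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in descriptors)
    return hits / len(words)


# Hypothetical descriptor list and invented sentences, not Linn's text.
NEGATIVE = {"cold", "wet", "miserable", "dreary"}
day_one = "We marched to camp and pitched the tents before a cold supper"
day_two = "Cold and wet all night, a miserable dreary morning"

# How many times denser the second day is in negative descriptors.
ratio = descriptor_density(day_two, NEGATIVE) / descriptor_density(day_one, NEGATIVE)
print(round(ratio, 2))
```

Dividing by the total word count matters here: a longer entry would otherwise look more negative simply because it contains more words.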


Collaborating with the rest of the class in creating a standardized markup style gave me insight into the workings of editorial boards, specifically how long the editorial decision process takes. As a group of ten, we spent the better part of 15 minutes discussing the benefits of labeling boats as objects or places. Both sides of the argument made good points, and we found it difficult to come to a consensus. I think this illustrates a point made by Elena Pierazzo: “objectivity is not very productive or helpful in the case of transcription and subsequently of diplomatic edition… it is argued here for informed, circumspect, documented, scholarly interpretation.” There was no right answer in the debate we had. It was a matter of weighing the facts in front of us and making a subjective decision, a decision that was in part based on what information we wanted the markup to carry. We ended up marking named boats as objects because we wanted readers to know that they were only referred to as places in specific circumstances. This is an example of the purpose of a digital edition as defined by Pierazzo: it is meant “to achieve the scholarly purpose of the edition–a purpose which, by definition, varies.”

A selection of my original markup...

... vs. the same selection after group editing.

What I Learned From Tagging

Learning how to mark up our documents and then applying what we learned to our journal entries has given me a deeper understanding of the way that Linn writes about the war. Although the process was tricky and frustrating at points, purely because of my lack of experience, I believe that it brought focus to the specific types of things that Linn talks about when he is writing. For example, when going through the version of the Google Doc that was marked up with colors, it was clear that some of the colors were used more than others. I would say that blue and orange were the two most used, while purple, brown, and cyan were the least used. This gives us insight into Linn’s writing style, which focuses on people and objects. Although he is descriptive in some places, he sometimes jumps from topic to topic, which is why we see less cyan, brown, and purple.

A lot of what Pierazzo talks about in her piece was visible in our process. For example, there was a large variety in the amount of tagging that occurred, with some people tagging most words, while some just picked out the important ones. This resonates with Pierazzo’s article when she says, “So, we must have limits, and limits represent the boundaries within which the hermeneutic process can develop” (466). One of my paragraphs is below (A), and I chose to tag only the words that I thought were important and relevant.

Screen Shot 2014-10-26 at 9.21.11 PM

Screen Shot 2014-10-26 at 9.20.56 PM

Although I do not think I made a mistake in being sparse, other people heavily marked up their entries (B), which made me think about why they considered those words important compared to how I chose to select my words. Again, this links back to Pierazzo when she says that a digital edition includes words and sections that are “considered meaningful to the editors” (475) and “that one cannot declare once and for all which features should be included” (475). The degree to which each person marked up their piece was one of the most interesting factors when I looked over everyone else’s entries.

Screen Shot 2014-10-26 at 10.09.45 PM

I also learned a lot in the editorial process, primarily that it is harder than I thought to reach conclusions on basic questions like whether a boat is a place or an object. When we talked about the cossack and different ways to go deeper in tagging, it changed the way that I thought about tagging and about my reading. When tagging my own entry, I had a deep internal struggle about how to tag battery, considering that, like cossack, it could be both. My struggle was then externalized when we came to class and discussed cossack. When talking about what to mark up and what not to mark up, Pierazzo says that it “depends either on the particular vision that we have of a particular manuscript or on practical constraints” (465). For me, the idea of particular vision is why we disagreed. I saw battery as a place, and when asked about cossack it made sense to me that it would be a place too. Boats represent places for me, but someone made the point that to Linn they are objects, not places, and that makes sense to me. By making it an object but adding the type boat, we were able to come to a consensus. However, considering that we spent so much time arguing over one word, I dread to think what it must be like to go through an entire edited text. Overall, I thought that this exercise was interesting. It gave me a better look into the kinds of words and descriptions that Linn uses and taught me some new, useful skills.

Things I learned through tagging

The process of marking up my transcription was definitely very helpful, as it allowed me to make observations that I would not otherwise have made. The first step was for us to tag people, places, objects, events, etc. in our own diary entry. Before doing the markup in XML, we made a class Google document with all of our diary entries in order. Each category (people, places, objects, etc.) had its own color, and we were instructed to highlight the words accordingly. For me, this was the most useful step. It was during this step that I decided which words were important enough to be highlighted. For example, a person was referred to in Linn’s entry as “gentleman,” but I decided that he was someone Linn saw in passing and did not need to be marked up.

Screen Shot 2014-10-26 at 5.45.56 PM

Another helpful part of this step was that when my classmates and I had all finished the markup, I was able to scroll through the document and see which color was the most prominent. It turned out that blue and orange, which represented people and objects, were the two most common colors. On the other hand, red, which represented events, was probably the least common. This allowed me to observe that Linn did not view the specific events, accomplishments, or defeats of the battle as significant to write about, but instead focused on the people and objects that directly involved him on a day-to-day basis.

Lastly, by scrolling through the document I was able to see that each person chose to focus on tagging different word types. For example, there were some diary entries that had numerous purple markups (dates and times) and others that had none. I do not think that this difference came from Linn. It came from the students’ different ideas of what was important. This observation connects heavily to the Pierazzo reading. Pierazzo focused a lot on how the digital medium allows for greater possibilities for representation, which proved to be true. Additionally, I was able to see the large role that individuality and perspective play in marking up documents, as Pierazzo discussed. By actually completing markups and comparing mine to those of my classmates, I now agree with Pierazzo’s statement that “a digital edition includes features of the original document that are considered meaningful to the editors” (475). The digital edition is exactly that, but it may be difficult to understand this without actually going through the process yourself.

After highlighting in the Google document, we used XML to tag the words. Personally, I think it is significantly harder to make observations in this medium. The Google document allowed for both close and distant reading, whereas reading the raw XML only lends itself easily to close reading. I definitely used this method: for each word that I tagged, I first analyzed its importance in terms of Linn and his entry. Based on my analysis, I decided whether the word was worth tagging. This connects to another central topic of Pierazzo’s article, which was “when to stop.” Since the digital world does not place many limitations on editors, how do editors know when enough is enough? Personally, I believe it is better to under-tag than to over-tag, because if every other word is tagged it is harder to see what is truly meaningful.
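
One possible way to get a more distant view back out of the XML is to count how often each tag appears, much like scanning the Google document for the most prominent color. The sketch below uses Python’s standard `xml.etree.ElementTree` module. The sample sentence is invented and the element names (`persName`, `placeName`, `name`) are assumptions about what a markup like ours might use, not the class’s agreed scheme.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample sentence; the element names are assumptions for illustration.
SAMPLE = """<p>
  <persName>Beaver</persName> and <persName>Ripley</persName> boarded the
  <name type="object">Cossack</name> near <placeName>camp</placeName>.
</p>"""


def tag_counts(xml_text):
    """Count how many times each element name occurs below the root element."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el is not root)


counts = tag_counts(SAMPLE)
for tag, n in counts.most_common():
    print(tag, n)
```

The sample has no namespace. In a real TEI file the elements usually sit in the TEI namespace, so ElementTree would report each tag with the namespace URI in braces in front of the name.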

Another aspect of this project that was an eye-opener for me was the class debate. During this class, I felt like I was at an editorial staff meeting. We sat in a circle comparing specific words that some of us had tagged as different word types. For example, cossack was a word that was hotly debated. A portion of the class felt that cossack was a place, but others argued that it was an object. It was interesting to take part in this debate and, in the end, to agree on one of the two. As a class we decided to mark cossack as an object. We came to this conclusion because, although cossack is sometimes mentioned as a place Linn is going to, this is not always the case. However, it cannot be disputed that cossack is always an object, since it is a boat. I thought it was very interesting to see how much passion was put into this argument over tagging one single word.

I also found that this act of collaboration was helpful in enhancing my TEI file. Prior to this class, I did not go into detail on any of my tags. I merely used the word categories given to me, without identifying anything further. As a class we agreed that Beaver was someone of importance based on how frequently he was discussed throughout the diary entries. Since he was important, we decided to give him an attribute. As a group we thought it was appropriate to give Beaver the type military.

Screen Shot 2014-10-26 at 5.59.56 PM

I definitely had a lot of fun doing this project, and I learned a lot about digital editions and the battles that editors can face in the process of publishing. Sometimes freedom is a bad thing because it can be difficult to place limits on oneself. Although a digital edition will never be the same as its source document, I enjoyed trying to preserve it as much as I could. For example, in the TEI the line breaks match those of the original copy. I also kept Linn’s abbreviations, such as his ampersands. Although there are some aspects that cannot be replicated, such as the specific spacing between his written words, it is important to maintain as much as the digital medium allows.

Importance of Tagging in TEI

Close reading is a great tool to help categorize people, places, events, and more within a specific text. Using TEI, we analyzed Linn’s diary by choosing which words to tag. For example, one of our class discussions was about whether “cossack” should be tagged as a place or an object. I argued that a cossack, which is a type of boat, is always an object but, depending on the context of the sentence, can be a place too. Cossack appears frequently in Linn’s diary, so we knew that we needed to tag it. We decided to tag it as an object because in some instances in the diary it is not a place.

Screen Shot 2014-10-26 at 3.52.50 PM

However, we resolved the place vs. object dilemma by categorizing it as an object while also specifying what kind of object it is. Thus, we gave cossack an object type of “boat”. By consulting with my peers, I realized that there can be multiple perspectives on a word, a phrase, or even an entire document. Cossack is a great example of a word that can be interpreted differently depending on its context. I may feel strongly that cossack is an object, but others can interpret it differently. Collaborating throughout Linn’s diary will allow our class to agree on how to classify words, which will also help reconcile different opinions and interpretations.
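
To make the object-plus-type idea concrete, here is a small Python sketch that reads a tagged sentence and pulls out every object of a given kind. The encoding is an assumption: it uses a `name` element with `type="object"` and `subtype="boat"`, which may differ from the exact tag and attribute our class used. The sentence itself and the helper name `names_by_subtype` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sentence; the element and attribute names are assumed, not the class's scheme.
SAMPLE = """<p>We went on board the <name type="object" subtype="boat">Cossack</name>
and carried a <name type="object" subtype="tent">tent</name> ashore.</p>"""


def names_by_subtype(xml_text, subtype):
    """Return the text of every <name> element whose subtype attribute matches."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("name") if el.get("subtype") == subtype]


boats = names_by_subtype(SAMPLE, "boat")
print(boats)
```

Recording the kind of object in an attribute keeps the broad category stable while still letting a reader pick out only the boats, which is the compromise the class settled on.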

In general, marking up the transcription has helped me better understand Linn’s context and circumstances. For instance, we each started separating the people in the database by Union and Confederate army. Most of the people are Union, which is to be expected because Linn is part of the Union army and talks about the military men surrounding him. I also learned a little more about the men in the specific diary entry I transcribed.

Screen Shot 2014-10-26 at 4.13.46 PM

I thought that Alcot, Ripley and Prawe were all part of the Union army, but they were actually reporters who were supposedly neutral during the war. Knowing that they were not directly involved in the war helped clarify the context of the diary entry. As shown above, Alcot, Ripley and Prawe are reporters for the Herald & Inquirer. Before we started categorizing people, I assumed they were part of the military and was confused about why a newspaper company was mentioned. Now the context of this diary entry makes more sense!

In her essay “A Rationale of Digital Documentary Editions,” Pierazzo discusses the process of selecting what to tag. One of the most challenging aspects of tagging in TEI is knowing when to stop. You could essentially tag everything, but that is very time-consuming and does not distinguish significant phrases or words from less important ones. Pierazzo writes, “…we might conclude that one possible and tempting answer to the question ‘where to stop’ could be ‘nowhere’, as there are potentially infinite sets of facts to be recorded” (466). This causes wide variation in interpretation. If there is no limit, one would think there is essentially no shared structure or set of guidelines across different editions. Although there may not be a hard limit, “the vast majority of decisions we make in this realm are decisions on which all (or most) competent readers agree or seem likely to agree (p. 196)” (466). Pierazzo makes the point that most tagging decisions are (almost) universally accepted and understood. There is room for interpretation, but the tags are not completely random, so there is some order when tagging words. Additionally, Pierazzo feels that when tagging, it is important to consider your audience. She writes, “to achieve the purpose of the edition and meet the editors’ needs, one needs to ask which features bear a cognitive value, that is, which are relevant from a scholarly point of view” (469). This shows that the person marking up the document must consider the audience and make thoughtful, educated decisions when tagging. Although there is no limit or “correct” way to tag words, Pierazzo believes that there are ways to make tagging somewhat orderly and structured while still leaving room for different interpretations.