Analyzing Transcription with Tagging

Using close reading as a tool to analyze the transcription helped us better understand the text. In class, we used two techniques: categorizing words by color and encoding them in TEI. Both were very useful for categorizing important words, especially TEI. By tagging words, we analyzed every bit of information they might offer. Pierazzo states that "no transcription, however accurate, will ever be able to represent entirely the source document" (Pierazzo 464). Although we can't represent the document entirely, we can at least capture as much of its information as possible.

Categorizing words by color was a very interesting technique. It is simple yet efficient at highlighting significant words. We tagged words by category (people, places, events, traits, states, etc.) and highlighted each category in a different color. As simple as it sounds, we encountered a lot of problems. We had to define what is and isn't tag-worthy, and the categories were a problem in themselves. We had many arguments about which words belonged in which category. For example, we had to decide whether "Cossack" should be a place or an object. Like "Cossack," many words sat on the border between two categories. Overall, it was interesting to see how everyone chose to tag and how Linn chose to write down his observations. There was more tagging for people and objects than for anything else. Linn seems to have been more concerned with physical things.

TEI changes the way we can analyze text. Like the colorization technique, TEI allows us to categorize words, but with a much wider variety of options. With TEI, we have endless options for tagging significant words. In Pierazzo's article, Driscoll, commenting on the possibilities of TEI, states that "to all intents and purposes there is no limit to the information one can add to a text—apart, that is, from the limits of the imagination" (466). While encoding with TEI, I often struggled to decide how many different tags a single word needed. We had a lot of options, but we also had a lot of words. With TEI, I found myself tagging more words than with the colorization, including many that were not significant. However, by tagging them, I was able to learn everything I could, from the physical state of an object to the time and place.

The collaborative process worked to our advantage. Because we could work with each other, we made sure that we all followed the same guidelines for tagging. Pierazzo notes that the editor's opinion changes the interpretation of the transcription. By agreeing on how to tag certain words, we arrived at similar interpretations of the text, which kept us from deviating from the guidelines we had accepted.

How Tagging Helped Me

Marking up a transcription forces you to take a deeper look at the diary entry, mainly because you're analyzing it even more closely. It has allowed me to understand the details of Linn's diary better, especially who the people are and how they relate to John Linn and to the Civil War. For example, I was not sure whether I had transcribed the word "twit" correctly, but once I looked up its definition, it helped me better understand the context of the text around it. It was particularly helpful to work with my classmates to figure out who someone mentioned in a diary was, because that person most likely appeared in another student's diary as well. The process definitely helped me understand how edited texts are produced, and it is not easy. Our editorial board had some particular issues with resolving disputes over places vs. objects. Although my diary post did not contain any of the words being argued over, one question that was constantly discussed was whether we should tag regiments as people. We did finally decide, and we all compromised on how to tag many words. It all came down to judgment, and "Judgment is necessarily involved in deciding what is in fact present… but the transcriber's goal is to make an informed decision about what is actually inscribed at each point" (Pierazzo 465). Thankfully, it was a collaborative effort that the whole class was able to work on together via computers and the Internet.
Pierazzo puts it in a way that makes sense: "An electronic edition is like an iceberg, with far more data potentially available than is actually visible on the screen, and this is at the same time a great opportunity and a temptation to overdo things. When so many possibilities exist, there is a danger of technological considerations of what can be done taking priority over intellectual considerations of what is actually desirable or necessary in any particular case" (Pierazzo 467). Making decisions about transcribing has the potential to take an incredibly long time. As Pierazzo notes, "We all know how important economic considerations are in our decision-making processes; almost all of our research projects are funded for a specific time-span and budget, and so it is fundamental to ensure that the transcription (and encoding) is feasible within this lifetime" (Pierazzo 469). Although we don't have to worry about a budget, it would not have been a good use of time to give every single word an in-depth tag, and we did a good job of tagging words without going overboard.

Blog IV: TEI, XML, and Close Reading

The density of brown descriptive terms in the second day of the diary.


Through the markup of Linn's diary entries and close examination of the words and phrases he used to express himself, I have developed a deeper understanding of Linn's words and have begun to formulate new questions based on the last two weeks' exercises. I consider myself lucky that my page of Linn's diary contained two days' worth of writing. Marking up descriptive terms let me see how Linn's writing style changed from day to day, and with his emotions at the time. By looking at the density of negative descriptive terms, I was able to pick out a distinct change in Linn's tone between the 7th and the 8th. More specifically, negative descriptors were roughly three times as dense on the 8th as on the 7th. From this I inferred that Linn's mood dropped dramatically between the two days, likely as a result of the incessant rain and cold weather he had to sleep in. This kind of revelation is possible through the features Pierazzo describes as "Semantics," the markup of "dates, names of people, of places, keywords." I would never have noticed this subtle change, or really understood Linn's feelings on these days, without close reading and markup of the text.
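To make the density comparison concrete, here is a minimal Python sketch of how it could be computed from markup. It assumes each dated entry is a `<div>` and each negative descriptor is wrapped in a hypothetical `<term type="negative">` element; those names and the two sample sentences are placeholders I made up, not Linn's actual text or our class's actual scheme. Density here is simply the number of tagged terms divided by the total number of words in the entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment: one <div> per dated entry, with negative
# descriptors wrapped in <term type="negative">. Element names and sentences
# are illustrative placeholders, not Linn's text or our actual markup.
SAMPLE = """
<body>
  <div type="entry" n="7">
    <p>Marched along the road in <term type="negative">cold</term> weather and camped by the river tonight.</p>
  </div>
  <div type="entry" n="8">
    <p><term type="negative">Wet</term> and <term type="negative">miserable</term>, slept in the <term type="negative">cold</term> rain.</p>
  </div>
</body>
"""


def negative_density(entry):
    """Return tagged negative descriptors per word for one entry."""
    words = "".join(entry.itertext()).split()  # all text with the tags stripped
    negatives = entry.findall(".//term[@type='negative']")
    return len(negatives) / len(words) if words else 0.0


root = ET.fromstring(SAMPLE.strip())
# Map each entry's day number (the n attribute) to its density.
densities = {div.get("n"): negative_density(div) for div in root.iter("div")}

for day, density in densities.items():
    print(f"Entry {day}: {density:.3f} negative terms per word")
```

Comparing densities rather than raw counts keeps a longer entry from looking gloomier just because it contains more words.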


Collaborating with the rest of the class in creating a standardized markup style gave me insight into the workings of editorial boards, specifically how long the editorial decision process takes. As a group of ten, we spent the better part of 15 minutes discussing the benefits of labeling boats as objects or as places. Both sides made good points, and we found it difficult to come to a consensus. I think this illustrates a point made by Elena Pierazzo: "objectivity is not very productive or helpful in the case of transcription and subsequently of diplomatic edition… it is argued here for informed, circumspect, documented, scholarly interpretation." There was no right answer in our debate. It was a matter of weighing the facts in front of us and making a subjective decision, one based partly on what information we wanted the markup to carry. We ended up marking named boats as objects because we wanted readers to know that they were referred to as places only in specific circumstances. This is an example of the purpose of a digital edition as Pierazzo defines it: it is meant "to achieve the scholarly purpose of the edition–a purpose which, by definition, varies."

A selection of my original markup...


... vs. the same selection after group editing.


What I Learned From Tagging

Learning how to mark up our documents, and then applying what we learned to our journal entries, has given me a deeper understanding of the way Linn writes about the war. Although the process was tricky and frustrating at points, purely because of my lack of experience, I believe it brought focus to the specific types of things Linn talks about when he writes. For example, in the color-coded version of our Google Doc, it was clear that some colors were used more than others. For me, blue and orange were the two most used, while purple, brown, and cyan were the least used. This says something about Linn's writing because it gives us insight into his style, which focuses on people and objects. Although he is descriptive in some places, he sometimes jumps from topic to topic, which is why we see less cyan, brown, and purple.

A lot of what Pierazzo talks about in her piece was visible in our process. For example, the amount of tagging varied widely: some people tagged most words, while others picked out only the important ones. This resonates with Pierazzo's article when she says, "So, we must have limits, and limits represent the boundaries within which the hermeneutic process can develop" (466). One of my paragraphs is below (A); I chose to tag only the words that I thought were important and relevant.

Although I don't think I made a mistake in being sparse, other people heavily marked up their entries (B), which made me think about why they considered those words important compared to how I chose my words. Again, this links back to Pierazzo when she says that a digital edition includes words and sections that are "considered meaningful to the editors" (475) and "that one cannot declare once and for all which features should be included" (475). The degree to which each person marked up their piece was one of the most interesting factors when I looked over everyone else's entries.


I also learned a lot in the editorial process, primarily that it is harder than I thought to reach conclusions on basic questions like whether a boat is a place or an object. When we were talking about the Cossack and different ways to go deeper in tagging, it changed the way I thought about tagging and about my reading. When tagging my own entry, I had a deep internal struggle about how to tag "battery," since, like "Cossack," it could be both. My struggle was then externalized when we came to class and discussed the Cossack. On what to mark up and what not to, Pierazzo says that it "depends either on the particular vision that we have of a particular manuscript or on practical constraints" (465). For me, the idea of particular vision is why we disagreed. I saw a battery as a place, and when asked about the Cossack, it made sense to me that it would be a place too. Boats represent places for me, but someone made the point that to Linn they are objects, not places, and that makes sense to me. By making the Cossack an object but adding the type "boat," we were able to come to a consensus. However, considering that we spent so much time arguing over one word, I dread to think what it must be like to work through an entire edited text. I found this process interesting; it gave me a better look into the kinds of words and descriptions Linn uses and taught me some useful new skills.
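As a rough illustration of that compromise, the Python sketch below encodes a vessel name as an object that also carries a boat subtype, then pulls every boat back out of a passage. Using TEI's generic `<name>` element with `type="object"` and `subtype="boat"` is my assumption about how the convention could be expressed, not necessarily how our class encoded it; the sample sentence and the helper names `tag_vessel` and `find_vessels` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_vessel(vessel):
    """Wrap a vessel name following the agreed convention: categorized as an
    object, with a subtype recording that it is a boat.
    (The element and attribute choices are assumptions for illustration.)"""
    el = ET.Element("name", {"type": "object", "subtype": "boat"})
    el.text = vessel
    return el


def find_vessels(root):
    """Return the text of every <name> tagged as an object of subtype boat."""
    return [
        el.text
        for el in root.iter("name")
        if el.get("type") == "object" and el.get("subtype") == "boat"
    ]


# Hypothetical sentence, built element by element.
p = ET.Element("p")
p.text = "We went aboard the "
vessel = tag_vessel("Cossack")
vessel.tail = "."
p.append(vessel)

print(ET.tostring(p, encoding="unicode"))
print(find_vessels(p))
```

Keeping "object" as the main category while recording "boat" in a subtype means a reader can still find every boat, even though none of them is filed as a place.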

On Tagging and Markup

The process of marking up a text itself reveals some of its key inherent aspects. As I built my TEI file, I saw the structure of the text as dictated by the appearance of objects and places and by their relationships and definitions. For example, I noticed how few times places were mentioned in the text. Only one place, "the road," was mentioned within Linn's narrative in my section of text; events were tied to names, not places. After going through the process, it is evident that the marked-up file is inherently different from the original. One problem I encountered was determining how much information and detail to include in my markup. If I did too little, I might lose important meaning or details; marking up too much risked obscuring the original text. Pierazzo comments on this dilemma and notes that a transcription can be viewed as a model of a physical object. Since a physical object contains an infinite number of details, one might be tempted to create a model that aspires to be as close to the original as possible; however, a model is useless unless it is a simplification of what it models. Pierazzo suggests that the balance between too much and too little detail can be struck by applying a "grid of features," a hierarchy of which characteristics of a text are important.

This is an example of the levels of detail the TEI markup language allows. The transcriber must decide what is important.


The collaborative process changed the way I thought about how edited works are produced. In tagging words, a tag that seems obvious to one person might not be obvious to another, and neither would have reasons that completely disprove the other's view. Compromise was the eventual outcome; collaborative texts are thus built on compromises between differing viewpoints. When considering others' viewpoints on the same text, one gains an appreciation of how one's own editing is the result of one's own interpretation.

Elena Pierazzo describes this as part of a larger phenomenon encountered in digital transcription: the essential role of the transcriber in choosing what to bring to light. Pierazzo claims that a text in its original form contains an infinite number of facts and that transcription is "a substantially interpretive act," since only a finite number of those facts can be presented in a marked-up transcription.

For example, place vs. object was a significant issue raised in class. Is the Cossack, a ship, a place or an object? Some considered the Cossack an object, as all ships are objects; others considered it a place, since it behaves like one in Linn's narrative, much like a house. The finished collaborative transcription is very different from one made individually. Pierazzo points to this distinction in her description of a diplomatic edition.

Pierazzo claims that a direct transcription is "a derivative document that holds a relationship with the transcribed document." A diplomatic edition, however, is a "formal presentation of such a derivative document" that is proofread, corrected, and peer reviewed before it is published.


The finished product is the result of collaboration and is inherently different from the original individual text.