How Tagging Helped Me

Marking up a transcription forces you to take a deeper look at the diary entry, simply because you are analyzing it even more. It has allowed me to understand the details of Linn’s diary better, especially who the people are and how they relate to John Linn and to the Civil War. For example, I was not sure if I had transcribed the word “twit” correctly, but once I had looked up the definition, it helped me better understand the context of the text around it. It was particularly helpful to work with my classmates to figure out who someone mentioned in a diary was, because that person was most likely in another student’s diary as well. It definitely allowed me to understand better how edited texts are produced, and it is not easy. Our editorial board had some particular issues with resolving disputes over places vs. objects. Although my diary post did not contain any of the words being argued over, one question that was constantly discussed was whether we should tag regiments as people. We did finally decide, and we all compromised on how to tag many of the words. It all had to do with judgment, and “Judgment is necessarily involved in deciding what is in fact present… but the transcriber’s goal is to make an informed decision about what is actually inscribed at each point” (Pierazzo 465). Thankfully, it was a collaborative effort that the whole class was able to work on together via computers and the Internet.
Pierazzo put it in a way that makes sense by saying, “An electronic edition is like an iceberg, with far more data potentially available than is actually visible on the screen, and this is at the same time a great opportunity and a temptation to overdo things. When so many possibilities exist, there is a danger of technological considerations of what can be done taking priority over intellectual considerations of what is actually desirable or necessary in any particular case” (Pierazzo 467). Making decisions about transcribing has the potential to take an incredibly long time. “We all know how important economic considerations are in our decision-making processes; almost all of our research projects are funded for a specific time-span and budget, and so it is fundamental to ensure that the transcription (and encoding) is feasible within this lifetime” (469). Although we don’t have to worry about a budget, it would not be a good use of time to give every single word an in-depth tag, and we did a good job of tagging words without going overboard.

What I Learned From Tagging

Learning how to mark up our documents and then applying what we learned to our journal entries has given me a deeper understanding of the way that Linn writes about the war. Although the process was tricky and frustrating at points purely because of my lack of experience, I believe that it brought focus to the specific types of things that Linn talks about when he is writing. For example, when going through the version of the Google Doc that was marked up with colors, it was clear that some of the colors were used more than others. I would say that blue and orange were the two most used, while purple, brown, and cyan were the least used. This gives us insight into Linn’s writing style, with its focus on people and objects. Although he is descriptive in some places, he sometimes jumps from topic to topic, which is why we see less cyan, brown, and purple.

A lot of what Pierazzo talks about in her piece was visible in our process. For example, there was a large variety in the amount of tagging that occurred, with some people tagging most words, while some just picked out the important ones. This resonates with Pierazzo’s article when she says, “So, we must have limits, and limits represent the boundaries within which the hermeneutic process can develop” (466). One of my paragraphs is below (A), and I chose to tag only the words that I thought were important and relevant.

Although I do not think I made a mistake in being sparse, other people heavily marked up their entries (B), which made me think about why they considered those words important compared to how I chose my words. Again, this links back to Pierazzo when she says that a digital edition includes words and sections that are “considered meaningful to the editors” (475) and “that one cannot declare once and for all which features should be included” (475). The degree to which each person marked up their piece was one of the most interesting factors when I looked over everyone else’s entries.

I also learned a lot in the editorial process, primarily that it is harder than I thought to come to conclusions on basic questions like whether a boat is a place or an object. When we were talking about the cossack and different ways to go deeper in tagging, it changed the way that I thought about this tagging and my reading. When tagging mine, I had a deep internal struggle about how to tag battery, considering that, like cossack, it could be both. My struggle was then externalized when we came to class and discussed cossack. When talking about what to mark up and what not to mark up, Pierazzo says that it “depends either on the particular vision that we have of a particular manuscript or on practical constraints” (465). For me, the idea of particular vision is why we disagreed. I saw battery as a place, and when asked about cossack it made sense to me that it would be a place too. Boats represent places for me, but someone made the point that to Linn they are objects, not places, and that makes sense to me. By making it an object but adding the type boat, we were able to come to a consensus. However, considering that we spent so much time arguing over one word, it makes me dread what it must be like to go through an entire edited text. I thought that this was interesting; it gave me a better look into the kinds of words and descriptions that Linn uses, and it taught me some useful new skills.

Things I Learned Through Tagging

The process of marking up my transcription was definitely very helpful, as it allowed me to make observations that I would not have otherwise made. The first step was for us to tag people, places, objects, events, etc. in our own diary entry. Before doing the markups in XML, we made a class Google document with all of our diary entries in order. Each category (people, places, objects, etc.) had its own color, and we were instructed to highlight the words accordingly. For me, this was the most useful step. It was during this step that I decided which words were important enough to be highlighted. For example, a person was referred to in Linn’s entry as “gentleman,” but I decided that he was someone Linn saw in passing and did not need to be marked up.

Another helpful part of this step was that when each of my classmates and I finished the markups I was able to scroll through the document and see which color was the most prominent. It turned out that blue and orange, which represented people and objects, appeared to be the two most seen colors. On the other hand, red represented events and this was probably the most seldom seen color. This allowed me to observe that Linn did not view the specific events, accomplishments, or defeats of the battle as significant to write about, but instead Linn focused on the people and objects that directly involved him on a day-to-day basis.

Lastly, by scrolling through the document I was able to see that each person chose to focus on tagging different word types. For example, there were some diary entries that had numerous purple markups (dates and times) and others that had zero. I do not think that this difference came about because of Linn; rather, it occurred because of the students’ different ideas of what was important. This observation connects heavily to the Pierazzo reading. Pierazzo focused a lot on how the digital medium allows for greater possibilities for representation, which proved to be true. Additionally, I was able to see the large role that individuality and perspective play in marking up documents, which Pierazzo discussed. By actually completing markups and comparing mine to those of my classmates, I now agree with Pierazzo’s statement that “a digital edition includes features of the original document that are considered meaningful to the editors” (475). The digital edition is exactly so, but it may be difficult to understand this without actually going through the process for yourself.

After highlighting in the Google document, we used XML to tag the words. Personally, I think it is significantly harder to make observations in this medium. This is because the Google document allowed for both close and distant reading analyses to be made, which cannot be done using the XML. In XML only close reading analysis can be easily made. I definitely used this method: for each word that I tagged, I first analyzed its importance in terms of Linn and his entry. Based on my analysis I decided whether the word was worth tagging. This connects to another central topic of Pierazzo’s article, which was “when to stop.” Since the digital world does not place many limitations on the editors, how do the editors know when enough is enough? Personally, I believe it is better to under-tag than over-tag, because if every other word is tagged it is harder to see what is truly meaningful.

Another aspect of this project that was an eye-opener for me was the class debate. During this class, I felt like I was at an editorial staff meeting. We were sitting in a circle comparing specific words that some of us had tagged as different word types. For example, cossack was a word that was hotly debated. A portion of the class felt that cossack was a place, but others argued that it was an object. It was interesting to take part in this debate and, in the end, to agree on one of the two. As a class we decided to mark cossack as an object. We came to this conclusion because, although cossack is sometimes mentioned as a place to which Linn is going, this is not always the case. However, it cannot be disputed that cossack is always an object, since it is a boat. I thought it was very interesting to see how much passion was put into this argument over tagging one single word.

I also found that this act of collaboration was helpful in enhancing my TEI file. Prior to this class, I did not go into detail on any of my tags. I merely used the word categories given to me, without identifying anything further. As a class we agreed that Beaver was someone of importance based on how frequently he was discussed throughout the diary entries. Since he was important, we decided to give him an attribute. As a group we thought it was appropriate to give Beaver the type military.
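
As a rough illustration of what such an attribute could look like, here is a minimal sketch. The class's actual encoding is not shown in this post, so the choice of the TEI persName element with a type attribute is an assumption, and the sentence is invented for the example.

```xml
<!-- Hypothetical sentence, not a quotation from Linn's diary -->
<p>Dined today with
  <!-- The type attribute adds detail beyond the bare "person" category -->
  <persName type="military">Beaver</persName>.</p>
```

With an attribute like this, a reader (or a script) could later pick out the military figures from everyone else who is tagged as a person.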

I definitely had a lot of fun doing this project, and I learned a lot about digital editions and the battles that editors can face in the process of publishing. Sometimes freedom is a bad thing because it can be difficult to place limits on oneself. Although a digital edition will never be the same as its source document, I enjoyed trying to preserve it as much as I could. For example, in the TEI the line breaks match those of the original copy. I also kept Linn’s abbreviations, such as his ampersands. Although there are some aspects that cannot be replicated, such as the specific spacing between his written words, it is important to maintain as much as the digital allows.
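
A small sketch of how line breaks and ampersands can be preserved in TEI follows. The two lines of text are invented for the example and are not taken from Linn's diary. The lb element is the standard TEI way to mark where a new line begins in the source, and an ampersand has to be written as &amp;amp; in any XML file.

```xml
<!-- Hypothetical lines, not a quotation from Linn's diary -->
<!-- Each lb marks the start of a new line on the manuscript page -->
<!-- The entity &amp; stands for the ampersand the diarist wrote -->
<p><lb/>Rained all day &amp; most of the night
<lb/>wrote letters home &amp; read the papers</p>
```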

Importance of Tagging in TEI

Close reading is a great tool to help categorize people, places, events, and more within a specific text. Using TEI, we analyzed Linn’s diary by choosing what words to tag. For example, one of our class discussions consisted of whether or not “cossack” should be tagged as a place or object. I argued that a cossack, which is a type of boat, is always an object but depending on the context of the sentence, it can be a place, too. In Linn’s diary, cossack was frequently used so we knew that we needed to tag it. We decided to tag it as object because in some instances in the diary, cossack wasn’t always a place.

However, we resolved the place vs. object dilemma by categorizing it as an object but by also specifying what kind of object it is. Thus, we specified cossack by placing an object type tag as “boat”. By consulting with my peers, I realized that there can be multiple different perspectives and outlooks of a word, phrase or even an entire document. Cossack is a great example of a word that can be interpreted differently depending on its context. I may feel strongly that cossack is an object, but others can interpret it differently. Collaborating throughout Linn’s diary will allow our class to determine and classify words, which will also help clarify different opinions and interpretations.
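
The compromise described above might be encoded along these lines. This is only a sketch: the element and attribute names the class actually used are not given in this post, so the generic TEI rs (referencing string) element with type and subtype attributes is an assumption, and the sentence is invented for the example.

```xml
<!-- Hypothetical sentence, not a quotation from Linn's diary -->
<p>We went aboard the
  <!-- type records the agreed category; subtype says what kind of object it is -->
  <rs type="object" subtype="boat">Cossack</rs>
  this morning.</p>
```

Keeping the category and the kind of object in separate attributes means the word stays an object everywhere, while the subtype still tells a reader that it is a boat and could serve as a location in a given sentence.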

In general, marking up the transcription has helped me better understand the context and circumstances of Linn. For instance, we individually started separating the people in the database by Union and Confederate army. Most of the people are Union, which is to be expected because Linn is part of the Union army and talks about the military men surrounding him. I also learned a little more about the men in the specific diary entry I transcribed. I thought that Alcot, Ripley and Prawe were all part of the Union army, but they were actually reporters for the Herald & Inquirer who were supposedly neutral during the war. Knowing that they were not directly involved in the war helped clarify the context of the diary entry. Before we started categorizing people, I assumed they were part of the military and I was confused about why a newspaper company was mentioned. Now the context of this diary entry makes more sense!

In Pierazzo’s essay “A Rationale of Digital Documentary Editions”, she discusses the process of tagging selection. One of the most challenging aspects of specifying by tagging in TEI is knowing when to stop. You could essentially tag everything but that’s very time-consuming and does not distinguish significant phrases or words from less important ones. Pierazzo writes, “…we might conclude that one possible and tempting answer to the question ‘where to stop’ could be ‘nowhere’, as there are potentially infinite sets of facts to be recorded” (466). This causes a wide variation in interpretation. If there’s no limit, then one would think there is essentially no structure or guidelines between different articles. Although there may not be a hard limit, “the vast majority of decisions we make in this realm are decisions on which all (or most) competent readers agree or seem likely to agree (p. 196)” (466). Pierazzo makes the point that the tags made are (almost) universally acceptable and understood. There is room for interpretation, but the tags are not completely random. Therefore, there is some order when tagging words. Additionally, Pierazzo feels that when tagging, it is important to consider your audience. She writes, “to achieve the purpose of the edition and meet the editors’ needs, one needs to ask which features bear a cognitive value, that is, which are relevant from a scholarly point of view” (469). This demonstrates that the person marking up the document must consider the audience and make thoughtful, educated decisions when tagging. Although there’s no limit or “correct” way to tag words, Pierazzo believes that there are ways to make it somewhat orderly and structured while also having room for different interpretation.

Week Four Assignments, Readings, Exercises

Monday 9/22

  • Reading: Complete reading transcribed Linn diary
  • Discussion: keywords from distant reading as metadata
  • Lab: More complex analytical/visualization tools (cross-corpus analysis)

Tuesday 9/23

Wednesday 9/24

  • Discussion: Distant reading & blog assignment wrap-up

Blog post #2 due (11pm)

Friday 9/26

  • Reading: Watch “Secrets of the Dead: The Lost Diary of Dr. Livingstone” (available to stream on our course Moodle site)
  • Timemapper exercise introduced
  • Research topics chosen