Text Conversion

The novel we chose was unfortunately not available in plain text, only as an EPUB. To convert it to plain text we used calibre, then read through the result for spelling and OCR errors before creating structural markup. Our text is divided into five stories, each containing a fictionalized account and a briefer historical summary. These are further divided by paragraph and sentence tags. At the top of the text is a TEI header that includes bibliographic information as well as a character list. We decided to make our document TEI conformant because, if we were ever to publish the text and bring it out of "the great unread," compliance with standards of textual markup like the Text Encoding Initiative is important to archiving and attributing our work. After creating a schema based on the parameters included in the TEI, we set about creating an ODD, "One Document Does it all," which is an important element of a TEI-conformant document. After several misguided adventures involving ROMA and other outdated services to create ODDs, we eventually used the guide provided by to create our ODD almost "automagically."


To answer our questions about the use of descriptive language and verbs in the different sections of the work, we marked up our text with a word tag, w, which carried a type attribute, either descLang or verb, as well as a subtype attribute indicating which kind of verb or descriptive language was being used. After some time using just active and be verbs, we expanded our markup to include linking verbs, which we had initially neglected to take into account. Our markup of adjectives and adverbs was more consistent. We also tagged quotations and character names, both for rendition in HTML and for potential future research.
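
As a rough illustration of how markup like this can be queried, the sketch below counts w elements by their type and subtype attributes using Python's standard library. The element name w and the type values descLang and verb come from our markup as described above; the subtype values (adjective, adverb, active, be, linking), the div wrapper, and the sample sentences are assumptions invented for the example and are not taken from the novel. The sample also omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample in the spirit of the markup described above.
# Assumptions: the subtype values and the <div type="story"> wrapper are
# hypothetical, and the sentences are NOT quoted from the novel.
SAMPLE = """<text>
  <div type="story" n="1">
    <p>
      <s>The <w type="descLang" subtype="adjective">young</w> prince
         <w type="verb" subtype="be">was</w>
         <w type="descLang" subtype="adverb">very</w>
         <w type="descLang" subtype="adjective">brave</w>.</s>
      <s>He <w type="verb" subtype="active">ran</w> and
         <w type="verb" subtype="linking">seemed</w>
         <w type="descLang" subtype="adjective">young</w>.</s>
    </p>
  </div>
</text>"""


def count_word_tags(xml_text):
    """Count <w> elements, keyed by their (type, subtype) attribute pair."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # This sample has no namespace. In a namespaced TEI file the tag would
    # need to be written as "{http://www.tei-c.org/ns/1.0}w" instead of "w".
    for w in root.iter("w"):
        counts[(w.get("type"), w.get("subtype"))] += 1
    return counts


if __name__ == "__main__":
    for (w_type, w_subtype), n in sorted(count_word_tags(SAMPLE).items()):
        print(w_type, w_subtype, n)
```

Keying the counts on the pair of attributes keeps verbs and descriptive language in one table while still letting each subtype be read off separately.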


As we were completing our markup, we started noticing a strong correlation between the descriptive language used in each story and the gender of the characters being discussed. We quickly became interested in this correlation and decided to make it the main focus of our research. Once we had settled on this focus, we were able to compile lists of the adjectives used in each story and their frequency. From there, we selected the top adjectives that the tales had in common and represented these in a radar chart. We chose a radar chart because we wanted to compare multiple tales at the same time, and the data lent itself to this kind of visualization. Because not all of the stories shared the same adjectives, however, we knew the radar chart alone would not portray the whole picture. By creating bar charts, we were able to show the top adjectives for each particular story and to show patterns within a single tale rather than a comparison across multiple tales. By having both the radar and the bar charts, we were able to offer two different takes on the way language works throughout Illustrious Children.
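
The two selections described above (adjectives common to every tale for the radar chart, and the top adjectives of each individual tale for the bar charts) can be sketched as follows. This is a minimal illustration under stated assumptions, not a record of the exact procedure we followed: the function names, story labels, and word lists are all hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def adjective_counts(stories):
    """Map each story label to a Counter of its (lowercased) adjectives."""
    return {label: Counter(word.lower() for word in words)
            for label, words in stories.items()}


def shared_top_adjectives(counts, n):
    """Adjectives present in every story, ranked by combined frequency."""
    if not counts:
        return []
    shared = set.intersection(*(set(c) for c in counts.values()))
    totals = {adj: sum(c[adj] for c in counts.values()) for adj in shared}
    # Highest total first; alphabetical order breaks ties.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [adj for adj, _ in ranked[:n]]


def top_per_story(counts, n):
    """The n most frequent adjectives of each story, with their counts."""
    return {label: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
            for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical placeholder data, not drawn from the novel.
    stories = {
        "story_a": ["little", "brave", "little", "young", "poor"],
        "story_b": ["young", "little", "beautiful", "young", "poor"],
    }
    counts = adjective_counts(stories)
    print(shared_top_adjectives(counts, 2))  # axes for a radar chart
    print(top_per_story(counts, 1))          # bars for per-story charts
```

Restricting the radar chart to the intersection guarantees that every axis has a value for every tale, which is also why the per-story lists are needed to recover adjectives that only some tales use.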