{"id":167,"date":"2014-04-24T08:02:36","date_gmt":"2014-04-24T08:02:36","guid":{"rendered":"http:\/\/uahost.uantwerpen.be\/bdmp\/EncodingManual\/?page_id=167"},"modified":"2015-10-02T15:05:00","modified_gmt":"2015-10-02T15:05:00","slug":"markup","status":"publish","type":"page","link":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/general-remarks\/markup\/","title":{"rendered":"First Things First: Markup"},"content":{"rendered":"<h3 style=\"text-align: justify;\">What is Markup?<\/h3>\n<p style=\"text-align: justify;\">For a Scholarly Digital Edition such as the <a title=\"BDMP\" href=\"http:\/\/www.beckettarchive.org\">Beckett Digital Manuscript Project<\/a>\u00a0(BDMP) to work,\u00a0the texts in the edition&#039;s corpus will need to be computer readable: only then can we take full advantage of all the possibilities the digital medium has to offer. To make our texts computer readable, we transcribe them into a descriptive markup language called XML. So the\u00a0first thing you will need\u00a0to know before you can start transcribing manuscripts, really, is what markup is.<\/p>\n<p class=\"p1\" style=\"text-align: justify;\"><span class=\"s1\">As is explained in the <a title=\"TEI\" href=\"http:\/\/www.tei-c.org\/index.xml\">Text Encoding Initiative<\/a>&#039;s &#039;<a title=\"Gentle Introduction to XML\" href=\"http:\/\/www.tei-c.org\/release\/doc\/tei-p5-doc\/en\/html\/SG.html\">Gentle Introduction to XML<\/a>\u2019, the concept of electronic markup is derived from a practice dating back to the age of print, in which manuscripts were annotated with instructions for compositors or typists, explaining how the text should be printed or laid out (xxvii). By extension, markup can be regarded as a system that indicates how a\u00a0text should be presented (or\u00a0read). We can even go a step further: as James H. Coombs, Allen H. Renear, and Steven J. DeRose already argued\u00a0in their influential paper &#039;<a title=\"Coombs et al. 
(pdf)\" href=\"http:\/\/cpe.njit.edu\/dlnotes\/CIS\/CIS732_447\/Cis732_6R.pdf\">Markup Systems and the Future of Scholarly Text Processing<\/a>\u2019 in 1987: &#039;Whenever an author writes anything, he or she &#034;marks it up&#034;&#039; (934). <\/span><\/p>\n<p class=\"p1\" style=\"text-align: justify;\"><span class=\"s1\">Because basic writing conventions such as the use of capitals, punctuation, or spacing can all be regarded as the most minimal of layout and reading instructions, markup can be said to form an inextricable part of the writing process itself. But there are of course many different ways to mark up text that go far beyond these basic writing conventions. In their paper, Coombs et al. distinguished six types of markup: punctuational markup, presentational markup, procedural markup, descriptive markup, referential markup, and metamarkup (935-937).<\/span><\/p>\n<h5 class=\"p1\" style=\"text-align: justify;\">Punctuational markup<\/h5>\n<p class=\"p1\" style=\"text-align: justify;\">Punctuational markup is the type of markup that every writer uses: spaces to separate words from one another, full stops to distinguish between sentences, capitals to mark the beginning of sentences, titles, names, etc. Without punctuational markup, all texts would be written in\u00a0<strong>scriptio continua<\/strong> &#8211; one long, uninterrupted string of characters.<\/p>\n<h5 class=\"p1\" style=\"text-align: justify;\">Presentational markup<\/h5>\n<p class=\"p1\" style=\"text-align: justify;\">Presentational markup is a similar form of markup that is not applied to the words of the text itself, but to how those words are presented: pagination, indentation, white spaces, heading formats, etc. 
It is markup at the page level, rather than at the word level.<\/p>\n<h5 class=\"p1\" style=\"text-align: justify;\">Procedural markup<\/h5>\n<p class=\"p1\" style=\"text-align: justify;\">While\u00a0punctuational and presentational markup\u00a0are intended for human readers (to facilitate the reading process), procedural markup\u00a0is intended for the computer. This type of markup consists of codes that a computer will interpret as formatting <em>instructions<\/em> (such as:\u00a0<strong>leave a whitespace here<\/strong> or\u00a0<strong>change the font here<\/strong>). In a WYSIWYG (What You See Is What You Get) input environment like Word, for instance, the codes that make up these instructions are hidden from the human reader\u2019s sight, and translated into presentational markup instead.<\/p>\n<h5 class=\"p2\" style=\"text-align: justify;\">Descriptive markup<\/h5>\n<p class=\"p2\" style=\"text-align: justify;\"><span class=\"s1\">Descriptive markup, then, does not tell the computer <em>what to do<\/em>, but rather what the text <em>is<\/em>\u00a0(such as:\u00a0<strong>this section of the text is a paragraph<\/strong>, <strong>this section of the text is a quote<\/strong>, or\u00a0<strong>this section of the text is emphasized<\/strong>). To tell the computer which sections of the text are which, descriptive markup languages use\u00a0<a title=\"Tags\" href=\"http:\/\/localhost:8888\/EncodingManual\/?page_id=169\">tags<\/a> that mark the beginning and end of each section. 
For example:<\/span><\/p>\n<div class=\"panel panel-info \"><div class=\"panel-body\">\n<pre><code>&lt;quote&gt;To be, or not to be: that is the question.&lt;\/quote&gt;<\/code><\/pre>\n<\/div>\n<\/div>\n<h5 class=\"p1\"><span class=\"s1\"><b>Referential markup and metamarkup<\/b><\/span><\/h5>\n<p class=\"p1\" style=\"text-align: justify;\"><span class=\"s1\">For our purposes, the most important thing about these last two types of markup is that they exist to make the descriptive markup work. As such, they are mainly used as reference points for the computer: referential markup is used to refer to information that is external to the marked-up document, and metamarkup entails a set of instructions that tells the computer what the different elements in the descriptive markup mean, how they can be used, and how they should be formatted.<\/span><\/p>\n<p class=\"p1\" style=\"text-align: justify;\"><span class=\"s1\">Descriptive markup has two significant advantages over procedural markup. Firstly, it makes formatting a text much easier, because it allows the author to declare how all instances of a specific class of textual elements should be formatted (such as: <strong>indent every paragraph<\/strong> or <strong>italicize all emphasized text<\/strong>), and to change those declarations (and thus the text\u2019s formatting) at any point in the writing process. Say we have a text in which book titles and emphasis are both rendered in italics. And say we want to change this, by rendering all the book titles in bold instead. If we used a descriptive markup language to encode the text, doing so would be a piece of cake: just go to the metamarkup file that contains the declaration that book titles should be rendered in italics, and change it to bold. If we used procedural markup to encode the text, however, changing the formatting of all the book titles automatically would be impossible. 
In a Word file, for instance, we would have to go over every italicized string of characters, determine whether it constitutes a book title or not, and change the formatting accordingly.\u00a0<\/span><\/p>\n<p class=\"p2\" style=\"text-align: justify;\">The second advantage of descriptive markup languages over procedural markup languages is that identifying the different sections of text adds a layer of &#039;meaning&#039; to the text that can be recognized, processed, and analyzed by a computer in a way that is\u00a0impossible with procedural markup. In our descriptively marked-up example, for instance, it would be relatively easy to produce a graph that displays how often the author used emphasis in her text, and to compare her results to those of other authors. If her text had been marked up using procedural markup, on the other hand, this would again be impossible, because book titles and emphasis would both simply be italicized, and there would be no (easy) way to automatically filter the book titles out of the graph\u2019s results.<\/p>\n<p><a name=\"Standards\"><\/a><\/p>\n<h3 class=\"p2\" style=\"text-align: justify;\">Standards for Markup Languages<\/h3>\n<p class=\"p2\" style=\"text-align: justify;\">Descriptive markup languages come in all shapes and sizes. To encode the quote from Hamlet above, I used a <code>&lt;quote&gt;<\/code> tag to mark the quote; but if I wanted to, I could just as easily have used a\u00a0<code>&lt;q&gt;<\/code> tag instead. As long as you are consistent, you are free to choose your own tags. But the problem is, of course, that if everyone uses their own tag-set to encode their texts, nobody will be able to use and process texts that are encoded by others. To solve this problem, the Text Encoding Initiative developed a standard for encoding texts, using <strong>XML<\/strong> (eXtensible Markup Language). 
That is why we use TEI-compliant XML to encode our texts.<\/p>\n<p class=\"p2\" style=\"text-align: justify;\">The TEI&#039;s goal is to provide an encoding standard that suits the needs of any type of text. As a result, the TEI&#039;s\u00a0tag-set is huge, and its\u00a0accompanying <a title=\"TEI Guidelines\" href=\"http:\/\/www.tei-c.org\/release\/doc\/tei-p5-doc\/en\/html\/index.html\">documentation<\/a> can be daunting.\u00a0<strong>But no-one needs to use\u00a0all the tags<\/strong>. As Lou Burnard explained in his recent monograph\u00a0<a title=\"What is the TEI\" href=\"http:\/\/books.openedition.org\/oep\/426\"><em>What is the Text Encoding Initiative?<\/em><\/a>, deciding which tags your text will need is the first step of\u00a0any text encoding project (9). Like any type of text,\u00a0our specific corpus of genetic materials only needs a fraction\u00a0of the tags that the TEI has to offer. This encoding manual will therefore serve to explain which of the TEI&#039;s tags we use at the BDMP, and how, both for those who collaborate on one of the BDMP&#039;s upcoming modules and for those who are interested in our encoding schema in general.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Markup? For a Scholarly Digital Edition such as the Beckett Digital Manuscript Project\u00a0(BDMP) to work,\u00a0the texts in the edition&#039;s corpus will need to be computer readable: only then can we take full advantage of all the possibilities the digital medium has to offer. 
To make our texts computer readable, we transcribe them into [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":19,"menu_order":1,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"class_list":["post-167","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/pages\/167","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/comments?post=167"}],"version-history":[{"count":3,"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/pages\/167\/revisions"}],"predecessor-version":[{"id":1416,"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/pages\/167\/revisions\/1416"}],"up":[{"embeddable":true,"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/pages\/19"}],"wp:attachment":[{"href":"https:\/\/bdmpmanual.uantwerpen.be\/index.php\/wp-json\/wp\/v2\/media?parent=167"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}