Entities – BDMP Encoding Manual

For your XML transcription to be universally readable, it needs to declare the character encoding it uses. The encoding is usually either encoding="ISO-8859-1" or encoding="UTF-8". You can find this information in the XML declaration on the very first line of your XML document:

[xml] <?xml version="1.0" encoding="utf-8"?>[/xml]

In practice, this means that all characters that are not standard keyboard characters (which include letters with or without accents, numbers, and punctuation) cannot be used directly in the transcription. Instead, symbols and other non-standard characters need to be transcribed by means of their entity reference. An entity reference consists of an opening part ('&'), a reference number (preceded by '#') or a name, and a closing part (always ';'). Here are some examples:

[table]
,symbol,meaning,code
,©,copyright sign,"&#169;"
,←,leftwards arrow,"&#8592;"
,η,greek small letter eta,"&#951;"
,œ,small oelig,"&#339;"
,✓,check mark,"&#10003;"
,⁁,caret insertion point (addition sign),"&#8257;"
,‸,caret (smaller addition sign),"&#8248;"
,₰,deleatur (deletion sign),"&#8368;"
[/table]
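
The number in a numeric reference is simply the character's Unicode code point written in decimal. As a minimal sketch (the function names `numeric_reference` and `encode_symbols` are illustrative and not part of any BDMP tooling), the following Python snippet derives such a reference for a symbol and applies it to a short string:

```python
def numeric_reference(char: str) -> str:
    """Return the decimal character reference for one character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the Unicode code point, e.g. 169 for the copyright sign.
    return f"&#{ord(char)};"


def encode_symbols(text: str, symbols: str) -> str:
    """Replace every character listed in `symbols` by its numeric reference.

    Characters not listed (ordinary letters, digits, punctuation) are kept.
    """
    return "".join(numeric_reference(c) if c in symbols else c for c in text)


print(numeric_reference("©"))           # &#169;
print(encode_symbols("cœur ✓", "œ✓"))   # c&#339;ur &#10003;
```

Only the characters passed in `symbols` are replaced, so you decide which ones to treat as non-standard; accented letters, for instance, stay as they are unless you list them.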

Sometimes you might also need to encode symbols that are normally reserved for the XML encoding itself, such as the ampersand sign (&) or angle brackets (< or >). To make sure the computer does not interpret these symbols as markup, you should also transcribe them as entities whenever they occur in the text:

[table]
,symbol,meaning,code
,&,ampersand,"&amp;"
,<,'less than' sign,"&lt;"
,>,'greater than' sign,"&gt;"
[/table]
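
If you prepare text programmatically before pasting it into a transcription, these three replacements do not have to be made by hand. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module; the sample sentence and the helper name `escape_reserved` are made up for illustration:

```python
from xml.sax.saxutils import escape, unescape


def escape_reserved(text: str) -> str:
    """Replace &, < and > by their entity references.

    escape() handles the ampersand first, so an '&' that is introduced
    by the other replacements is never escaped a second time.
    """
    return escape(text)


sample = "Smith & Sons <1850>"
encoded = escape_reserved(sample)
print(encoded)            # Smith &amp; Sons &lt;1850&gt;
print(unescape(encoded))  # Smith & Sons <1850>
```

Note that running the function on text that already contains entity references would escape their ampersands again, so it should only be applied to raw, unencoded text.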

If you come across other symbols that need to be transcribed, you can find the entity codes for most characters here. Alternatively, you can go to amp-what.com and describe the symbol in its search engine.