There is no standard electronic format for specifying the translation component of a braille system for use by braille transcription software or related purposes. This article describes BrailleSpec, a proposed electronic format that has been tested for specifying print-to-braille translation according to English Braille American Edition (EBAE). This format has a number of advantages, outlined below.
The version of the BrailleSpec format described here accommodates the translation rules of EBAE but will likely need extensions for other braille systems. I welcome feedback on needed improvements.
This new format is not simply an XML-tagged version of a standard translation table. Its main significance is that it is more complete: by providing considerably more information than current translation tables contain, it supports both improved translation accuracy and new features. As an example of improved accuracy, the user need only add a few lines for a translation application to have the information needed to automatically produce specialized translation tables, such as those some braille systems use to translate proper names. Examples of new features easily supported by the BrailleSpec format are user-specified systems of graded braille and summary reports on contraction usage in translated documents.
Since this proposed XML format is quite simple in comparison with ZedAI, I've chosen to use DTDs rather than Schemas to define it. These DTDs are detailed below.
The current braille specification format uses a single, simple master file that references a number of supporting files via XInclude statements. Here is its DTD:
<!ELEMENT brailleSystem (identifier, xi:include+)>
<!ATTLIST brailleSystem xmlns:xi CDATA #IMPLIED>
<!ELEMENT identifier (#PCDATA)>
<!ELEMENT xi:include EMPTY>
<!ATTLIST xi:include href CDATA #REQUIRED>

And here are the first few lines of the master file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE brailleSystem SYSTEM "brailleSystem.dtd" [ ]>
<brailleSystem xmlns:xi="http://www.w3.org/2001/XInclude">
  <identifier>English Braille American Edition, 2007 Update</identifier>
  <xi:include href="cells.xml"/>
  <xi:include href="signs.xml"/>
  ...
</brailleSystem>
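A translation application first has to assemble the master file and its supporting files into a single document. The Python sketch below shows one way to do this with the standard library's xml.etree.ElementInclude module, which expands xi:include elements in the XInclude namespace. The FILES dictionary and the one-cell, one-sign contents it holds are hypothetical stand-ins for the real supporting files. With real files on disk, passing loader=None makes ElementInclude fall back to its default loader, which opens each href as a file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Master file from above (DOCTYPE omitted for brevity).
MASTER = """<brailleSystem xmlns:xi="http://www.w3.org/2001/XInclude">
  <identifier>English Braille American Edition, 2007 Update</identifier>
  <xi:include href="cells.xml"/>
  <xi:include href="signs.xml"/>
</brailleSystem>"""

# Hypothetical in-memory stand-ins for the supporting files.
FILES = {
    "cells.xml": '<cells><cell><ABrl>A</ABrl><Unicode dots="1">\u2801</Unicode></cell></cells>',
    "signs.xml": '<signs><sign type="postPunc"><print>.</print><braille>4</braille></sign></signs>',
}

def memory_loader(href, parse, encoding=None):
    """Loader with the same call shape as ElementInclude.default_loader."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text

def load_braille_system(master_text, loader=memory_loader):
    root = ET.fromstring(master_text)
    # Replace each xi:include element with the root element of the referenced file.
    ElementInclude.include(root, loader=loader)
    return root

system = load_braille_system(MASTER)
print(system.findtext("identifier"))
print([child.tag for child in system])  # ['identifier', 'cells', 'signs']
```

Keeping each kind of data in its own file means a program that needs only, say, the cells can parse cells.xml directly without touching the rest.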
The individual supporting files are modular, each specifying one particular type of data.
Typical translation tables combine the data that is represented here in two separate files, the signs file and the restrictions file. However, these tables do not typically include the additional data represented in the other supporting files.
The first supporting file specifies the representation of the braille cells used in the other supporting files by reference to the corresponding Unicode Braille Patterns. This approach lets the user employ any desired representation. Here I've chosen to use North American ASCII Braille, as I find it the easiest to verify.
This is the complete DTD for the cells file:
<!ELEMENT cells (cell+)>
<!ELEMENT cell (ABrl, Unicode)>
<!ELEMENT ABrl (#PCDATA)>
<!ELEMENT Unicode (#PCDATA)>
<!ATTLIST Unicode dots CDATA #IMPLIED>

Here are the first few lines of the cells.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cells SYSTEM "cells.dtd" [ ]>
<cells>
  <cell><ABrl>A</ABrl><Unicode dots="1">⠁</Unicode></cell>
  <cell><ABrl>B</ABrl><Unicode dots="12">⠃</Unicode></cell>
  <cell><ABrl>C</ABrl><Unicode dots="14">⠉</Unicode></cell>
  <cell><ABrl>D</ABrl><Unicode dots="145">⠙</Unicode></cell>
  ...

Note that if a Unicode-compatible simulated braille font is installed, the simulated braille glyphs will be displayed when a cells.xml file is viewed in a browser.
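Because every cell in cells.xml is tied to a Unicode Braille Pattern, software can both check the file and convert braille written in the chosen representation into Unicode braille. The sketch below relies on the standard layout of the Unicode Braille Patterns block, where the character for a cell is U+2800 plus a bitmask in which dot n sets bit n-1. The function names are illustrative and not part of BrailleSpec.

```python
import xml.etree.ElementTree as ET

# First few cells from cells.xml (DOCTYPE omitted).
CELLS_XML = """<cells>
  <cell><ABrl>A</ABrl><Unicode dots="1">\u2801</Unicode></cell>
  <cell><ABrl>B</ABrl><Unicode dots="12">\u2803</Unicode></cell>
  <cell><ABrl>C</ABrl><Unicode dots="14">\u2809</Unicode></cell>
  <cell><ABrl>D</ABrl><Unicode dots="145">\u2819</Unicode></cell>
</cells>"""

def dots_to_unicode(dots):
    """Unicode Braille Patterns start at U+2800; dot n sets bit n-1."""
    mask = 0
    for d in dots:
        mask |= 1 << (int(d) - 1)
    return chr(0x2800 + mask)

def load_cells(xml_text):
    """Map each ASCII Braille character to its Unicode braille pattern,
    checking the optional dots attribute against the pattern itself."""
    table = {}
    for cell in ET.fromstring(xml_text).iter("cell"):
        abrl = cell.findtext("ABrl")
        uni = cell.find("Unicode")
        dots = uni.get("dots")
        if dots is not None and dots_to_unicode(dots) != uni.text:
            raise ValueError(f"cell {abrl!r}: dots {dots} do not match {uni.text!r}")
        table[abrl] = uni.text
    return table

def to_unicode(ascii_braille, table):
    """Render braille written in the chosen representation as Unicode braille."""
    return "".join(table[ch] for ch in ascii_braille)

cells = load_cells(CELLS_XML)
print(to_unicode("DAB", cells))  # ⠙⠁⠃
```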
The second supporting file specifies the official print-to-braille replacements of the braille system. By contrast, a typical undifferentiated translation table contains both official and ad hoc replacements. This file should be modified only to fix unintended errors or to reflect changes in the official rules of the targeted braille system.
This is the DTD for the signs file, except for the full specification of the optional choiceSign section, which is used to simplify application to American English braille. (The type attribute of a sign element plays a role similar to what is called an opcode in liblouis and similar translation software.)
<!ENTITY % shared1 SYSTEM "setSigns.dtd" >
%shared1;
<!ELEMENT signs (sign+, altsign*, indicatorSign+, choiceSign*)>
<!ELEMENT sign (print, braille, DotlessBraille*)>
<!ATTLIST sign type (%signtype;) #REQUIRED uname CDATA #IMPLIED unique (no) #IMPLIED>
<!ELEMENT altsign (print, braille, DotlessBraille*)>
<!ATTLIST altsign type (numeric|smartApos|lower|altPrimes) #REQUIRED>
<!ELEMENT print (#PCDATA)>
<!ELEMENT braille (#PCDATA)>
<!ELEMENT DotlessBraille (#PCDATA)>
<!ELEMENT indicatorSign (braille, DotlessBraille*)>
<!ATTLIST indicatorSign name (%indicatorNames;) #REQUIRED unique (no) #IMPLIED>
Since the signs.xml file is a significant part of the specification, it is worth examining an example in detail. Note that the file can contain up to four different kinds of top-level elements.
First, because it is desirable from the standpoint of implementation to be able to use the contents of the print elements as keys to the corresponding replacements, it turns out (at least for English braille) that one needs two separate types of sign elements, sign and altsign. The reason is that in a few cases, including the print period, English braille uses a different replacement depending on the semantics of the character. Here is a fragment of the file with examples of both types of elements.
...
<sign type="accentedLetter" uname="cap A with grave">
  <print>À</print>
  <braille>A</braille>
  <DotlessBraille>À</DotlessBraille>
</sign>
...
<sign type="initialLetterContraction">
  <print>day</print>
  <braille>"D</braille>
  <DotlessBraille>ãä</DotlessBraille>
</sign>
<sign type="initialLetterContraction">
  <print>there</print>
  <braille>"!</braille>
  <DotlessBraille>čĎ</DotlessBraille>
</sign>
...
<sign type="postPunc">
  <print>.</print>
  <braille>4</braille>
  <DotlessBraille>.</DotlessBraille>
</sign>
...
<!-- Decimal Point, not period -->
<altsign type="numeric">
  <print>.</print>
  <braille>.</braille>
  <DotlessBraille>ȣ</DotlessBraille>
</altsign>
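The sketch below illustrates why sign and altsign are kept apart. Each can be loaded into its own lookup table keyed on the print text, so the period and the decimal point never collide. It assumes the translator has already decided, by some means outside this sketch, when a period is used as a decimal point, and that it passes that decision in as a context value. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the signs.xml fragment above.
SIGNS_XML = """<signs>
  <sign type="initialLetterContraction"><print>day</print><braille>"D</braille></sign>
  <sign type="postPunc"><print>.</print><braille>4</braille></sign>
  <altsign type="numeric"><print>.</print><braille>.</braille></altsign>
</signs>"""

def load_signs(xml_text):
    """Build two lookup tables keyed on the contents of <print>: one for
    sign elements and one for altsign elements (keyed by type as well).
    In this sketch a later duplicate key simply overwrites an earlier one."""
    root = ET.fromstring(xml_text)
    signs = {el.findtext("print"): el.findtext("braille")
             for el in root.findall("sign")}
    altsigns = {(el.get("type"), el.findtext("print")): el.findtext("braille")
                for el in root.findall("altsign")}
    return signs, altsigns

def replacement(print_text, signs, altsigns, context=None):
    """Use an altsign when the caller supplies its semantics (e.g. 'numeric');
    otherwise fall back to the ordinary sign."""
    if context is not None and (context, print_text) in altsigns:
        return altsigns[(context, print_text)]
    return signs[print_text]

signs, altsigns = load_signs(SIGNS_XML)
print(replacement(".", signs, altsigns))             # 4  (period)
print(replacement(".", signs, altsigns, "numeric"))  # .  (decimal point)
```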
Now let's examine the elements in more detail. Each sign element must have a type attribute whose value is chosen from the specified list, so that groups of replacement elements with the same type value can be referenced elsewhere. The actual attribute values could be arbitrary text but, as these examples illustrate, it can be quite useful to choose text with mnemonic significance to persons familiar with the braille system being specified. The user can employ the optional uname attribute to better identify an unfamiliar replacement. (The unique attribute is used to help address some inconsistencies between different specifications for American English braille.)
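Once the file is parsed, grouping by type is straightforward. The following Python sketch collects the print keys of each type. As a hypothetical illustration of a user-specified grade of braille, it also keeps only the signs whose types the user has enabled. The enabled_signs helper and the particular selection of types are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SIGNS_XML = """<signs>
  <sign type="initialLetterContraction"><print>day</print><braille>"D</braille></sign>
  <sign type="initialLetterContraction"><print>there</print><braille>"!</braille></sign>
  <sign type="postPunc"><print>.</print><braille>4</braille></sign>
</signs>"""

def group_by_type(xml_text):
    """Collect the print keys of all sign elements sharing a type value."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml_text).findall("sign"):
        groups[el.get("type")].append(el.findtext("print"))
    return dict(groups)

def enabled_signs(xml_text, allowed_types):
    """Hypothetical graded-braille filter: keep only signs whose type is allowed."""
    return {el.findtext("print"): el.findtext("braille")
            for el in ET.fromstring(xml_text).findall("sign")
            if el.get("type") in allowed_types}

print(group_by_type(SIGNS_XML))
# {'initialLetterContraction': ['day', 'there'], 'postPunc': ['.']}
print(enabled_signs(SIGNS_XML, {"postPunc"}))  # {'.': '4'}
```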
The print and braille elements are, of course, the actual replacement rule, with the braille cell or cells represented according to the specification in the cells.xml file. The optional DotlessBraille element specifies the corresponding glyph code(s) in the DotlessBraille font. This element can also be used to provide unique character codes for each distinct use of the same braille cell.
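If each DotlessBraille code sequence identifies exactly one use of a cell, as described above, DotlessBraille text can be mapped back to the signs that produced it without ambiguity. This Python sketch assumes that uniqueness and applies greedy longest-match decoding to the codes in the signs.xml fragment. The decode function and the (element, print) labels are illustrative.

```python
# DotlessBraille codes from the signs.xml fragment, labelled with the element
# kind and print text that supply them.
DOTLESS = {
    "À": ("sign", "À"),
    "ãä": ("sign", "day"),
    "čĎ": ("sign", "there"),
    ".": ("sign", "."),
    "ȣ": ("altsign", "."),  # decimal point, not period
}

def decode(dotless_text, table=DOTLESS):
    """Split DotlessBraille text into the signs that produced it,
    trying the longest code sequence first at each position."""
    longest = max(len(code) for code in table)
    result, i = [], 0
    while i < len(dotless_text):
        for n in range(min(longest, len(dotless_text) - i), 0, -1):
            chunk = dotless_text[i:i + n]
            if chunk in table:
                result.append(table[chunk])
                i += n
                break
        else:
            raise ValueError(f"no sign uses the code {dotless_text[i]!r}")
    return result

print(decode("čĎãä."))
# [('sign', 'there'), ('sign', 'day'), ('sign', '.')]
```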
The third type of top-level element in the signs file is the indicatorSign element. Braille indicators are not actually replacements but, rather, markup unique to braille. Even so, it is again desirable from the standpoint of implementation to include their representations here. (One important feature currently missing from this braille specification is a generic method for encoding the rules for the use of braille indicators.)
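Indicator representations are loaded the same way as signs. However, the decision about when to use an indicator still has to be written into the translator itself, which is the gap noted in the parenthetical above. In this sketch the indicator names capital and doubleCapital are hypothetical, since the real names come from the %indicatorNames; list, which is not reproduced here. The braille values are the EBAE capital sign (dot 6, written "," in ASCII Braille) and its doubled form for a word written entirely in capitals.

```python
import xml.etree.ElementTree as ET

# Hypothetical indicator names; "," is dot 6 in ASCII Braille.
INDICATORS_XML = """<signs>
  <indicatorSign name="capital"><braille>,</braille></indicatorSign>
  <indicatorSign name="doubleCapital"><braille>,,</braille></indicatorSign>
</signs>"""

def load_indicators(xml_text):
    """Map each indicator name to its braille representation."""
    return {el.get("name"): el.findtext("braille")
            for el in ET.fromstring(xml_text).findall("indicatorSign")}

def capital_prefix(word, indicators):
    # The (simplified) rule for WHEN to use each indicator is hard-coded here,
    # because the specification does not yet encode indicator rules.
    if len(word) > 1 and word.isupper():
        return indicators["doubleCapital"]
    if word[:1].isupper():
        return indicators["capital"]
    return ""

indicators = load_indicators(INDICATORS_XML)
print(capital_prefix("Day", indicators))  # ,
print(capital_prefix("EBAE", indicators))  # ,,
```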
The next supporting file is the optional restrictions file. With contracted braille, it can be necessary to restrict the use of certain contractions in certain words in order to enhance readability. A still-common way of implementing these restrictions, originally proposed in 1970 by Dr. Jonathan Millen of Mitre Corporation, is to add ad hoc replacement rules similar to the official replacement rules. (This and an alternative approach are described in more detail in a separate article.)
It is typical to include these ad hoc replacement rules together with the official rules in a single translation table. Separating them makes it easier for the user to identify which rules can be changed as necessary to improve translation accuracy. It also makes it easier to represent the ad hoc rules in terms of the official rules.
Having both official and ad hoc rules in a single undifferentiated table has other disadvantages besides inconveniencing the user. For example, it has confused persons otherwise unfamiliar with the braille system, who may incorrectly believe the ad hoc rules to be part of the official system. Also, having all of the rules in a single table (or file) makes it difficult to use translation algorithms that don't employ the ad hoc rules. (In my experience, a person familiar with a braille system can, in an hour or so of concentrated effort, edit an undifferentiated translation table containing both types of rules so as to separate the official rules from the ad hoc ones. Of course, once this has been done, one can easily develop a simple application to reconstruct the original table as necessary.)
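The reconstruction mentioned in the parenthetical above can be sketched in a few lines. The sketch assumes the official rules are available as a print-to-braille dictionary, and that each ad hoc rule is available as a list of official print keys, which is the form the restrictions file uses. The output, a list of (print, braille, origin) tuples, is an illustrative choice and not the layout of any particular translation table. The sample data are the letters involved in the "dunghill" restriction shown later, plus the gh contraction (dots 126, written "<" in ASCII Braille).

```python
# Official replacements (print -> braille), as they would come from signs.xml.
OFFICIAL = {"g": "G", "h": "H", "i": "I", "l": "L", "gh": "<"}  # "<" is dots 126

# Ad hoc rules expressed in terms of official replacements:
# input text -> sequence of official print keys.
AD_HOC = {"ghill": ["g", "h", "i", "l", "l"]}

def merged_table(official, ad_hoc):
    """Rebuild a single undifferentiated table, tagging each rule's origin.
    Raises KeyError if an ad hoc rule references a non-official key."""
    table = [(p, b, "official") for p, b in official.items()]
    for text, parts in ad_hoc.items():
        braille = "".join(official[p] for p in parts)
        table.append((text, braille, "ad hoc"))
    return table

for rule in merged_table(OFFICIAL, AD_HOC):
    print(rule)
```

Because the origin tag is kept, the same data can be split back into official and ad hoc rules at any time.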
Here is the DTD:
<!ENTITY % shared1 SYSTEM "setSigns.dtd" >
%shared1;
<!ENTITY % signEl "sign|altsign">
<!ELEMENT restrictions (restriction+)>
<!ELEMENT restriction (input, use)>
<!ATTLIST restriction type (%signtype;) #REQUIRED example CDATA #IMPLIED>
<!ELEMENT input (#PCDATA)>
<!ELEMENT use (print+)>
<!ELEMENT print (#PCDATA)>
<!ATTLIST print type (%signEl;) "sign" >

Here are two examples of actual restriction elements used in American English braille:
<restriction type="beginningPartWord" example="dispirit">
  <input>dispirit</input>
  <use>
    <print type="sign">d</print>
    <print>i</print>
    <print>spirit</print>
  </use>
</restriction>
<restriction type="midEndPartWord" example="dunghill">
  <input>ghill</input>
  <use>
    <print>g</print>
    <print>h</print>
    <print>i</print>
    <print>l</print>
    <print>l</print>
  </use>
</restriction>

Note that the intent of the DTD for restriction elements is that each individual replacement used in a restriction be identical to an official replacement identified by the print element of a sign or altsign element in the signs.xml file. (Ensuring that this is the case has to be done by the implementing software.)
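The DTD alone cannot guarantee that each print element in a use block names an official replacement, so the check has to be done in code. The following Python sketch resolves restrictions against a parsed signs file and raises an error for any reference that is not official. ElementTree does not apply DTD default attribute values, so the sketch supplies the default type of "sign" itself. The type value letter on the sample sign elements is a placeholder, since the %signtype; list is not shown here.

```python
import xml.etree.ElementTree as ET

# Placeholder type "letter"; the real value comes from the %signtype; list.
SIGNS_XML = """<signs>
  <sign type="letter"><print>g</print><braille>G</braille></sign>
  <sign type="letter"><print>h</print><braille>H</braille></sign>
  <sign type="letter"><print>i</print><braille>I</braille></sign>
  <sign type="letter"><print>l</print><braille>L</braille></sign>
</signs>"""

RESTRICTIONS_XML = """<restrictions>
  <restriction type="midEndPartWord" example="dunghill">
    <input>ghill</input>
    <use><print>g</print><print>h</print><print>i</print><print>l</print><print>l</print></use>
  </restriction>
</restrictions>"""

def official_keys(signs_xml):
    """Official replacements, split by element kind: {'sign': {...}, 'altsign': {...}}."""
    root = ET.fromstring(signs_xml)
    return {kind: {el.findtext("print"): el.findtext("braille") for el in root.findall(kind)}
            for kind in ("sign", "altsign")}

def resolve_restrictions(restrictions_xml, official):
    """Check that every <print> in a <use> names an official replacement and
    return input text -> braille built only from those replacements."""
    resolved = {}
    for r in ET.fromstring(restrictions_xml).findall("restriction"):
        parts = []
        for p in r.find("use").findall("print"):
            kind = p.get("type", "sign")  # DTD default, not applied by ElementTree
            if p.text not in official[kind]:
                raise ValueError(f"{r.findtext('input')!r}: {p.text!r} is not an official {kind}")
            parts.append(official[kind][p.text])
        resolved[r.findtext("input")] = "".join(parts)
    return resolved

official = official_keys(SIGNS_XML)
print(resolve_restrictions(RESTRICTIONS_XML, official))  # {'ghill': 'GHILL'}
```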
Referencing the official replacements in this way is necessary for keeping track of which contractions are used and how often, and also for supporting graded braille.
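As a hypothetical illustration of usage tracking, suppose a translator records the print key of every official replacement it applies, including those inside resolved restrictions. A usage report then reduces to counting. The recorded list and the SIGN_TYPES lookup, which would be built from the type attributes in signs.xml, are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical lookup from print key to the type attribute of its sign element.
SIGN_TYPES = {
    "day": "initialLetterContraction",
    "there": "initialLetterContraction",
    ".": "postPunc",
}

def usage_report(applied):
    """Count how often each official replacement, and each type, was used.
    `applied` lists the print keys of the replacements in the order applied."""
    by_sign = Counter(applied)
    by_type = Counter(SIGN_TYPES[key] for key in applied)
    return by_sign, by_type

by_sign, by_type = usage_report(["there", "day", ".", "day"])
print(by_sign.most_common(1))               # [('day', 2)]
print(by_type["initialLetterContraction"])  # 3
```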
An alternative or extension to the use of ad hoc replacements for handling exceptions to contraction rules is to use one or more dictionary files containing user-specified print-to-braille translations such as those that typically appear in an appendix of braille transcription manuals. The proposed XML format uses this simple DTD for the data file which specifies the names of the dictionary files:
<!ENTITY % dictionaryFormat "oldStyle" >
<!ELEMENT exceptions (file+)>
<!ELEMENT file (#PCDATA)>
<!ATTLIST file format (%dictionaryFormat;) #REQUIRED>

Files that use the "oldStyle" format specify the translations in terms of the official braille replacements in the signs file. As with restrictions, requiring that the official replacements be referenced makes it possible to keep track of contraction usage.
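Reading the exceptions data file is simple. The only check a program can make from the DTD alone is that each file uses a known format. This Python sketch performs that check and returns the dictionary file names. The names shown are hypothetical, and because the layout of an oldStyle dictionary file is not given here, the sketch does not attempt to read the files themselves.

```python
import xml.etree.ElementTree as ET

KNOWN_FORMATS = {"oldStyle"}  # the only value the DTD currently allows

# Hypothetical file names.
EXCEPTIONS_XML = """<exceptions>
  <file format="oldStyle">names.xml</file>
  <file format="oldStyle">places.xml</file>
</exceptions>"""

def dictionary_files(xml_text):
    """Return (file name, format) pairs, rejecting unknown formats."""
    files = []
    for el in ET.fromstring(xml_text).findall("file"):
        fmt = el.get("format")
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"{el.text!r}: unsupported dictionary format {fmt!r}")
        files.append((el.text.strip(), fmt))
    return files

print(dictionary_files(EXCEPTIONS_XML))
# [('names.xml', 'oldStyle'), ('places.xml', 'oldStyle')]
```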
First posted September 2, 2010. Contact info at dotlessbraille dot org
Updated version posted September 23, 2010.