(This page concludes the previous page.)
The previous page described the first part of a new XML format for specifying braille translation systems. That first part contains basically the same data that appears in a typical "translation table" but organizes it in a more useful way. This page describes the remaining part of this new XML format. This remaining part contains new data that makes it possible for a braille translation application to carry out a variety of new functions including providing specialized translation for tagged items.
The sets.xml file is the first major new feature of the new XML format. This file provides the user with a simple mechanism for specifying arbitrary subsets of the replacement rules in the signs.xml file and, optionally, for including the information in the files listed in the exceptions.xml file.
The ability to specify a particular subset of replacement rules is intended to support applications which can create and use different tables for translating special items such as proper names without requiring the user to supply these tables directly. The sets.xml file can also be used in conjunction with the rules.xml file to support translating certain items with special algorithms as well as with specially-constructed translation tables.
Sometimes using a different algorithm can be an alternative to using a different table. For example, uncontracted braille can be produced from the same table as contracted braille by using a special algorithm that is restricted to replacing a single print character at a time. Alternatively, uncontracted braille can be produced using the same algorithm as for contracted braille simply by using a special table which doesn't have any contractions.
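The equivalence between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `greedy_translate`, the `max_len` parameter, and the toy tables, whose bracketed outputs such as `<the>` are placeholders rather than real braille cells. It is not the XTrans implementation.

```python
def greedy_translate(word, table, max_len=None):
    """Replace print sequences left to right, longest match first.

    max_len (hypothetical parameter) caps how many print characters a
    single replacement may cover; max_len=1 gives the restricted
    one-character-at-a-time algorithm.
    """
    out, i = [], 0
    while i < len(word):
        longest = len(word) - i
        if max_len is not None:
            longest = min(longest, max_len)
        for n in range(longest, 0, -1):
            piece = word[i:i + n]
            if piece in table:
                out.append(table[piece])
                i += n
                break
        else:
            raise ValueError(f"no replacement for {word[i]!r}")
    return out


# Toy tables: each letter maps to itself, and "<the>" stands in for a contraction.
LETTERS = {c: c for c in "abcdefghijklmnopqrstuvwxyz"}
CONTRACTED = {**LETTERS, "the": "<the>"}

contracted = greedy_translate("then", CONTRACTED)
by_algorithm = greedy_translate("then", CONTRACTED, max_len=1)  # same table, restricted algorithm
by_table = greedy_translate("then", LETTERS)                    # same algorithm, table without contractions
```

In this toy setting, restricting the algorithm and stripping the table give the same uncontracted result, while the unrestricted call on the full table uses the contraction.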
Here is the DTD for the sets.xml file. Sets of replacement rules can be defined by referencing one or more groups of sign or altsign replacement rules by the common value of their type attribute and/or by referencing individual replacement rules by the contents of their print elements. Also, for convenience where not all of the rules in a given group are needed, the DTD includes elements for removing individual replacement rules which had been included as part of a group.
<!ENTITY % shared1 SYSTEM "setSigns.dtd" >
%shared1;
<!ENTITY % ruleTypes SYSTEM "ruleTypes.dtd" >
%ruleTypes;
<!ELEMENT sets (set+, setWFiles+)>
<!ELEMENT set (signType*, addSign*, removeSign*, addAltSign*, removeAltSign*)>
<!ATTLIST set name (justBrlWords |%startingPW; |%midEndPW; |%interiorPW; |%endPW; |%specialWordSet; |%singleChars;) #REQUIRED>
<!ELEMENT setWFiles (fileName+, signType*, addSign*, removeSign*, addAltSign*, removeAltSign*)>
<!ATTLIST setWFiles name (%words;) #REQUIRED>
<!ELEMENT fileName (#PCDATA)>
<!ELEMENT signType EMPTY>
<!ATTLIST signType typeId (%signtype;) #REQUIRED >
<!ELEMENT addSign (#PCDATA)>
<!ELEMENT removeSign (#PCDATA)>
<!ELEMENT addAltSign (#PCDATA)>
<!ELEMENT removeAltSign (#PCDATA)>
Typically in a braille translation system all of the replacements are in a single list with each replacement somehow flagged to identify the contexts where that replacement may be used. While this strategy may save a bit of computer memory, it isn't needed for that purpose on modern computers. Here we see that many of the same replacements are included in the two example replacement sets, one which includes all replacements that can be used at the start of an ordinary word and one which includes all replacements that can be used in the interior of an ordinary word.
<set name="defaultStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="oneSyllableShortform"/>
  <signType typeId="shortform"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
  <addSign>-</addSign>
</set>
<set name="defaultInteriorPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="finalLetterContraction"/>
  <signType typeId="oneSyllableShortform"/>
  <signType typeId="shortform"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="midPartWord"/>
  <signType typeId="midEndPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
  <addSign>-</addSign>
</set>

As an example of a specialized set, compare the following set of replacements which can be used at the start of proper names with the corresponding set for ordinary words.
<set name="namesStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
</set>
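How an application might turn such set definitions into concrete replacement lists can be sketched in Python. This is only a sketch under stated assumptions: the helper names `load_sets` and `resolve`, the `SIGNS` list, and the type assigned to each print string (including the `punctuation` type) are hypothetical, since the structure of signs.xml is described on the previous page rather than here. The fragment is parsed without DTD validation, and the alternate-sign elements are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# The two "starting" sets from the text, wrapped in a <sets> root element.
SETS_XML = """
<sets>
  <set name="defaultStartingPW">
    <signType typeId="largesign"/>
    <signType typeId="initialLetterContraction"/>
    <signType typeId="oneSyllableShortform"/>
    <signType typeId="shortform"/>
    <signType typeId="anywherePartWord"/>
    <signType typeId="beginningPartWord"/>
    <signType typeId="letter"/>
    <signType typeId="accentedLetter"/>
    <addSign>-</addSign>
  </set>
  <set name="namesStartingPW">
    <signType typeId="largesign"/>
    <signType typeId="initialLetterContraction"/>
    <signType typeId="anywherePartWord"/>
    <signType typeId="beginningPartWord"/>
    <signType typeId="letter"/>
    <signType typeId="accentedLetter"/>
  </set>
</sets>
"""


def load_sets(xml_text):
    """Map each set name to (type ids, added print strings, removed print strings)."""
    sets = {}
    for s in ET.fromstring(xml_text).findall("set"):
        types = {e.get("typeId") for e in s.findall("signType")}
        added = {e.text for e in s.findall("addSign")}
        removed = {e.text for e in s.findall("removeSign")}
        sets[s.get("name")] = (types, added, removed)
    return sets


def resolve(set_def, signs):
    """Return the print strings a set selects from (print, type) pairs."""
    types, added, removed = set_def
    return {p for p, t in signs if (t in types or p in added) and p not in removed}


# Hypothetical stand-in for signs.xml; the type assignments are illustrative only.
SIGNS = [("the", "largesign"), ("about", "shortform"), ("a", "letter"), ("-", "punctuation")]

SETS = load_sets(SETS_XML)
only_in_default = SETS["defaultStartingPW"][0] - SETS["namesStartingPW"][0]
default_start = resolve(SETS["defaultStartingPW"], SIGNS)
names_start = resolve(SETS["namesStartingPW"], SIGNS)
```

Here `only_in_default` holds the two shortform types, which is the difference between the ordinary-word set and the proper-name set shown above, apart from the individually added hyphen.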
The rules.xml file is perhaps the most significant feature of the new XML format. A translation rule is an association between a set of replacement rules and a translation algorithm. The rules file provides the user with a mechanism for associating appropriate sets of replacement rules with any of the translation rules or translation algorithms that are implemented in the target braille translation application.
In typical documents, the majority of words can be correctly translated to contracted braille by using the standard translation algorithm or rule which is the one that translates a word from left to right by continually replacing the longest possible print sequence with its locally eligible braille replacement. This "longest eligible" algorithm is used in EBAE for translating ordinary words, proper names, and the component parts of compound words albeit with different sets of replacements for proper names and the non-leading parts of compound words than for ordinary words and the leading parts of compound words. We saw earlier how the sets.xml file supports the creation of various sets of replacements for use with this standard rule.
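The "longest eligible" idea can be sketched in a few lines of Python. This is a simplified illustration, not the XTrans code: the function name, the way a candidate's position selects the start, interior, or end set, and the toy tables with bracketed placeholder outputs are all assumptions.

```python
def longest_eligible(word, whole_word, start_pw, mid_pw, end_pw):
    """Translate left to right, always taking the longest eligible replacement.

    Assumption: a candidate starting at the first character is looked up in
    the start set, one ending at the last character in the end set, and any
    other candidate in the interior set.
    """
    if word in whole_word:
        return [whole_word[word]]
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # longest candidate first
            if i == 0:
                table = start_pw
            elif j == len(word):
                table = end_pw
            else:
                table = mid_pw
            piece = word[i:j]
            if piece in table:
                out.append(table[piece])
                i = j
                break
        else:
            raise ValueError(f"no eligible replacement at position {i}")
    return out


# Toy replacement sets: letters map to themselves, and bracketed strings
# are placeholders for contractions, not real braille.
LETTERS = {c: c for c in "abcdefghijklmnopqrstuvwxyz"}
WHOLE = {"the": "<the>"}
START = {**LETTERS, "the": "<the>"}
MID = {**LETTERS, "er": "<er>"}
END = {**LETTERS, "ed": "<ed>"}

there = longest_eligible("there", WHOLE, START, MID, END)
bothered = longest_eligible("bothered", WHOLE, START, MID, END)
```

Because the toy `<the>` replacement appears only in the start set, it is used in "there" but not inside "bothered", which shows how the choice of sets, rather than the algorithm, controls eligibility.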
However, most documents contain a few special types of words such as letter words, homonyms, and hesitations that cannot be translated correctly by the standard translation algorithm even with a special translation table; correct translations of these words require special translation algorithms as well as special translation tables. It isn't feasible to specify actual algorithms in an XML input file but if an application does implement certain special algorithms, it is certainly feasible to specify the replacements those algorithms should employ and the situations under which they should be used.
The XTrans translator implements six different braille translation algorithms. These are supported by the following DTD which specifies the information required by each of these algorithms.
<!ENTITY % shared SYSTEM "shared.dtd" >
%shared;
<!ENTITY % ruleTypes SYSTEM "ruleTypes.dtd" >
%ruleTypes;
<!ELEMENT rules (longestEligible+, specialWords*, byCharacters+, bySyllables*, hesitate*, homographs*)>
<!ELEMENT longestEligible (wholeWord, startPW, midPW, endPW)>
<!ATTLIST longestEligible name (%longestEligibleRuleNames;) #REQUIRED >
<!ELEMENT specialWords (wordSet)>
<!ATTLIST specialWords name (%specialWordsRuleNames;) #REQUIRED >
<!ELEMENT byCharacters (charSet+)>
<!ATTLIST byCharacters name (%byCharactersRuleNames;) #REQUIRED >
<!ELEMENT bySyllables (startPW, midendPW)>
<!ATTLIST bySyllables name (%bySyllablesRuleNames;) #REQUIRED >
<!ELEMENT hesitate (startPW, startPWnoLow, midPW, midPWnoLow, endPW)>
<!ATTLIST hesitate name (%hesitateRuleNames;) #REQUIRED identified CDATA #IMPLIED>
<!ELEMENT homographs (wholeWord)>
<!ATTLIST homographs name (%homographsRuleNames;) #REQUIRED>
<!ELEMENT wholeWord EMPTY>
<!ATTLIST wholeWord set (%words;) #REQUIRED >
<!ELEMENT startPW EMPTY>
<!ATTLIST startPW set (%startingPW;) #REQUIRED >
<!ELEMENT midPW EMPTY>
<!ATTLIST midPW set (%interiorPW;) #REQUIRED >
<!ELEMENT endPW EMPTY>
<!ATTLIST endPW set (%endPW;) #REQUIRED >
<!ELEMENT wordSet EMPTY>
<!ATTLIST wordSet set (%specialWordSet;) #REQUIRED >
<!ELEMENT charSet EMPTY>
<!ATTLIST charSet set (%singleChars;) #REQUIRED >
<!ELEMENT midendPW EMPTY>
<!ATTLIST midendPW set (%midEndPW;) #REQUIRED >
<!ELEMENT startPWnoLow EMPTY >
<!ATTLIST startPWnoLow set (%startingPW;) #REQUIRED >
<!ELEMENT midPWnoLow (#PCDATA)>
<!ATTLIST midPWnoLow set (%interiorPW;) #REQUIRED >
Here are three examples of rules that can be used with XTrans. The first two examples use the same standard translation algorithm but with different sets of replacement rules. The last example uses a specialized translation algorithm with the required sets of specialized replacement rules.
<longestEligible name="default">
  <wholeWord set="defaultWords"/>
  <startPW set="defaultStartingPW"/>
  <midPW set="defaultInteriorPW"/>
  <endPW set="defaultEndingPW"/>
</longestEligible>
<longestEligible name="properNames">
  <wholeWord set="defaultNames"/>
  <startPW set="namesStartingPW"/>
  <midPW set="defaultInteriorPW"/>
  <endPW set="defaultEndingPW"/>
</longestEligible>
<bySyllables name="syllabified">
  <startPW set="syllabifiedPWStart"/>
  <midendPW set="syllabifiedPWMidEnd"/>
</bySyllables>

(The reason that both the default and properNames rules can use some of the same replacement sets is that XTrans doesn't allow shortforms to be used as replacements other than at the start of a word. It handles these cases as individual exceptions via one of the exceptions files.)
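A translation rule, as an association between an algorithm and named replacement sets, maps naturally onto a small lookup structure. The following Python sketch parses the three example rules without DTD validation. The helper name `load_rules` and the tuple layout it returns are assumptions for illustration, not part of BrailleSpec or XTrans.

```python
import xml.etree.ElementTree as ET

# The three example rules from the text, wrapped in a <rules> root element.
RULES_XML = """
<rules>
  <longestEligible name="default">
    <wholeWord set="defaultWords"/>
    <startPW set="defaultStartingPW"/>
    <midPW set="defaultInteriorPW"/>
    <endPW set="defaultEndingPW"/>
  </longestEligible>
  <longestEligible name="properNames">
    <wholeWord set="defaultNames"/>
    <startPW set="namesStartingPW"/>
    <midPW set="defaultInteriorPW"/>
    <endPW set="defaultEndingPW"/>
  </longestEligible>
  <bySyllables name="syllabified">
    <startPW set="syllabifiedPWStart"/>
    <midendPW set="syllabifiedPWMidEnd"/>
  </bySyllables>
</rules>
"""


def load_rules(xml_text):
    """Map rule name -> (algorithm element name, {slot element: set name})."""
    rules = {}
    for rule in ET.fromstring(xml_text):
        slots = {slot.tag: slot.get("set") for slot in rule}
        rules[rule.get("name")] = (rule.tag, slots)
    return rules


RULES = load_rules(RULES_XML)
algorithm, slots = RULES["properNames"]

# Sets shared by the default and properNames rules.
shared = set(RULES["default"][1].values()) & set(slots.values())
```

The `shared` value contains the interior and ending sets, the two that the default and properNames rules have in common.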
Finally we come to how BrailleSpec interfaces with print document markup such as ZedAI. ZedAI provides for a lot of markup intended to support braille production. But, of course, this markup isn't useful unless the targeted braille production application knows what to do when it encounters the markup.
A BrailleSpec file uses a very simple mechanism for communicating how a BrailleSpec-enabled translation system should translate marked-up words. It simply matches the text used as markup, specified here as the text contents of an element named userTag, with one of the named, user-specified, translation rules defined in the just-described rules file.
<!ENTITY % shared SYSTEM "shared.dtd" >
%shared;
<!ELEMENT semanticTags (semanticTag+)>
<!ELEMENT semanticTag (userTag, rule)>
<!ELEMENT userTag (#PCDATA)>
<!ELEMENT rule EMPTY>
<!ATTLIST rule name (%longestEligibleRuleNames; |%byCharactersRuleNames; |%bySyllablesRuleNames; |%hesitateRuleNames; |%homographsRuleNames; ) #REQUIRED >
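The matching of markup text to rule names can be sketched as a simple dictionary lookup in Python. The sample semanticTags content is hypothetical: the userTag values `name` and `syllables` are invented for illustration, and the rule names are borrowed from the earlier examples. The helper names and the fallback to a rule called `default` are likewise assumptions rather than documented XTrans behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical semanticTags file; the userTag values are illustrative only.
SEMANTIC_TAGS_XML = """
<semanticTags>
  <semanticTag>
    <userTag>name</userTag>
    <rule name="properNames"/>
  </semanticTag>
  <semanticTag>
    <userTag>syllables</userTag>
    <rule name="syllabified"/>
  </semanticTag>
</semanticTags>
"""


def load_tag_map(xml_text):
    """Map the text of each userTag element to the name of its translation rule."""
    tag_map = {}
    for entry in ET.fromstring(xml_text).findall("semanticTag"):
        user_tag = entry.findtext("userTag").strip()
        tag_map[user_tag] = entry.find("rule").get("name")
    return tag_map


def rule_for(markup_text, tag_map, fallback="default"):
    """Pick the rule for a marked-up word; unmarked or unknown markup gets the fallback."""
    return tag_map.get(markup_text, fallback)


TAG_MAP = load_tag_map(SEMANTIC_TAGS_XML)
```

With this map, `rule_for("name", TAG_MAP)` selects the `properNames` rule, while markup that has no entry falls back to the assumed `default` rule.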
The modular structure of the BrailleSpec format makes it easy to add further information to support the features of a particular braille translator. For example, the BrailleSpec fileset used with the latest version of XTrans includes the specification for the BANA Computer Braille Code, which is a simple auxiliary braille code often used in conjunction with EBAE.
As another possibility, it would be straightforward to specify a file containing the information describing a system of graded or beginner braille such as US Patterns or UK Learner Braille.
Braille systems are unavoidably complex because their goal is to convey print documents as meaningfully and unambiguously as possible within the constraints imposed by use of a limited character set and by the terseness necessary for efficient tactile reading. In fact, braille systems are becoming more complex as print documents become more complex.
The ZedAI specification for print source documents is intended to support braille production via adequate markup. This article describes one way to design the braille translation function of a braille production application so as to take advantage of ZedAI markup to produce accurate braille translations.
First posted September 2, 2010. Contact info at dotlessbraille dot org
Updated version posted September 23, 2010