PROOF-READING THE WEBSTER: SUGGESTIONS
============================================
This file specifically relates to proof-reading the 1913 Webster,
and supplements the general information in the file titled
"Contributors' Procedures".
---------------------------------------------
The basic proofreading is intended to bring the electronic
version of the 1913 Webster into conformity with the printed
version. However, we are also building a supplemented version,
to be available free on-line as well, which we call the CIDE, the
"Collaborative International Dictionary of English". As an aid in
preparing the supplemented version, proofreaders who notice errors in
the definitions, or missing senses of words, or definitions which
are out of date and should be updated, are encouraged to make
notes (perhaps to write updated definitions, if appropriate),
and send those in a separate file from the proofed copy, which
should accurately represent the printed version. If you wish, you
may make a separate copy of the entry with the updated information,
or write a new definition missing from the original.
What you need to do as proofreader is to load the
electronic version into a word processor on your own computer, and
compare the text visible on the computer terminal, word for word, with
the corresponding paper copy (which I will send through the mail), to
see if the electronic version has any errors, i.e. to see if it is not
a faithful reproduction of the printed version. If there is an error
in the electronic version, it should be corrected directly in the
electronic version itself. A commercial "at" sign (@) should be
added at the location where any correction is made, so that the locus of
the correction will be easy to find. When any significant portion is
done (at least one numbered page), the corrected version may be sent
back to me by email. Or you could wait for several pages to be finished
before sending any.
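Before sending a corrected section back, it can be helpful to list every
place where an "at" sign was added, as a final check against the paper
copy. The following is only a minimal sketch in Python; the function name
find_correction_marks is hypothetical and not part of any project tool,
and it assumes that "@" is used in the file only as a correction mark.

```python
def find_correction_marks(text):
    """Return (line_number, column) pairs, both 1-based, for every '@'."""
    marks = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "@":
                marks.append((line_no, col))
    return marks


if __name__ == "__main__":
    # Illustrative text only; not taken from the dictionary files.
    sample = "first line, unchanged\nsecond line with a @correction"
    for line_no, col in find_correction_marks(sample):
        print(f"correction mark at line {line_no}, column {col}")
```

Note that the "original had an error" comments also contain "@@", so
those locations will be listed as well.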
The text file for the Webster's 1913 dictionary is not in graphical
form; it is plain ASCII, i.e. it has no special characters at all,
just the usual 26 lower-case and 26 upper-case letters, numbers, and
punctuation, so it can be viewed on any computer. It uses only the
characters with codes 1 to 127; so to represent the special characters
and variations in type (such as Italic or bold fonts), there are special
marks that make the text look somewhat unusual and complicated. In order
to indicate these, special field-tags are used, somewhat similar but not
always identical to those used in SGML (Standard Generalized Markup
Language), with opening and closing tags for each field (closing tags are
identical to opening tags, except that they have a forward slash
after the left angle-bracket). Tags are of the form:
This section is in italic type and this part is not.
We also use field-tags to indicate specific parts of a
definitional entry, such as Entry for the headwords, or
n. for "part of speech = noun". But it should take only
a little experience to recognize the common tags, and in most cases
*** the field tags can be ignored ***. The point is to concentrate on
the text between the tags, to be sure it reproduces the original printed
text.
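Since the file is meant to contain only characters with codes 1 to 127,
a word processor that silently inserts other bytes (for example "smart"
quotes) would break that convention. A rough way to check a proofed file
is sketched below in Python; the function name non_ascii_positions is an
illustrative assumption, not an existing project utility.

```python
def non_ascii_positions(data: bytes):
    """Return (offset, byte_value) for every byte outside the range 1..127."""
    return [(i, b) for i, b in enumerate(data) if not 1 <= b <= 127]


if __name__ == "__main__":
    # Illustrative bytes only: one stray byte with value 0x91 in the middle.
    sample = b"plain text \x91 more text"
    for offset, value in non_ascii_positions(sample):
        print(f"byte {value} (hex {value:02x}) at offset {offset}")
```

To check a real file, its contents can be read in binary mode
(open(name, "rb").read()) and passed to the same function.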
In the master file for the 1913 Webster, the tagged fields such
as headword () are not otherwise marked with respect to the
actual font used, and when they are converted to a format which has
varied fonts, they can be marked as having any font the user
desires. Thus, headwords are in Webster 1913 printed with a
bold font slightly larger than the main text, but anyone
printing or displaying this dictionary on a computer screen can
easily convert all headwords into any font they desire. This is
the virtue of using such tags -- they allow great flexibility
in printing or display style.
The full text of the electronic version of the 1913 Webster,
containing the tags, is available by ftp from the Project Gutenberg
site. Those who are proof-reading the dictionary will receive
electronic versions of the sections they will proof-read, either
by email or on a floppy disk. Other sections can be downloaded
from Project Gutenberg if desired.
The most important part of the proofreading is to be sure that
the headword and definition are entered correctly. The headword
has only three characters that may not be obvious: the double
quote (") used to represent the strong accent of the printed version,
the left single quote (`) used to represent the light accent of the
printed version, and the asterisk (*) used to represent the short dash
in the printed version, which only indicates syllable breaks. One
example of this is the word "abacination", A*bac`i*na"tion.
A headword may begin with \'d8, which is an entity code for a double
vertical bar, indicating that this word was taken directly from
a foreign language without modification. In many words in the printed
version there is a dash longer than the short dash indicating a syllable
break; this longer dash indicates a true hyphenated word, and that
hyphen is represented by the usual text hyphen (-). Within
the definition, bounded by the .. marks, there may be an
... section, giving an illustration of usage, and words
marked with ..., which should be the same as the headword
(perhaps in a plural or conjugated form), in italic font. After a
little practice, these conventions should become quite
familiar and unobtrusive, and it should be possible to concentrate on
looking at the text parts for errors.
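To make the headword notation concrete, the sketch below (in Python)
strips the three marks described above, the double quote, the left
single quote, and the asterisk, to recover the plain spelling, and also
splits the headword at those marks. The function names are hypothetical.
The sketch leaves a true hyphen (-) in place and does not handle a
leading \'d8 code.

```python
import re

# The three marks used inside a headword: syllable break (*),
# strong accent ("), and light accent (`).
MARKS = re.compile(r'[*"`]')


def plain_headword(marked):
    """Remove accent and syllable marks, leaving the ordinary spelling."""
    return MARKS.sub("", marked)


def syllables(marked):
    """Split a marked headword at every accent or syllable mark."""
    return [part for part in MARKS.split(marked) if part]


if __name__ == "__main__":
    hw = 'A*bac`i*na"tion'
    print(plain_headword(hw))
    print(syllables(hw))
```

Comparing the plain spelling with the printed headword is a quick way to
catch a mistyped letter that the accent marks might otherwise hide.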
In addition to the tags marking special "fields" in the dictionary,
there are codes for special characters not contained within the
usual 127-character ASCII set. For example, special characters such
as the French accented characters or the German umlaut characters (and
many others) are indicated in one of two ways: if such characters
are included in the special font table (see file titled "Webster fonts"),
they may have a hexadecimal code of the form:
\'91
which means that the character may be encoded by a byte with the
hexadecimal value 91 (decimal value = 145). This specific character
is the "ae ligature", found in a lot of species names. This character
may also be represented as "2. Nonessential; not necessarily belonging;
incidental; as, songs are accidental to
play.
In this example, the "as" section is not a proper part of the
definition itself. The field tags the use of the headword
in an example phrase, and the word within the field is
always italicised in the original.
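The hexadecimal form of the special character codes described earlier
(for example \'91, a byte with decimal value 145) can be picked out and
converted mechanically when looking a code up in the special character
file. The following is a small sketch in Python under that assumption
only; the function name escape_values is hypothetical, and the sketch
does not cover the second way of representing special characters.

```python
import re

# A backslash, an apostrophe, and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def escape_values(text):
    """Return (code_as_written, decimal_value) for each \\'xx code in text."""
    return [(m.group(0), int(m.group(1), 16)) for m in HEX_ESCAPE.finditer(text)]


if __name__ == "__main__":
    sample = r"\'91 and \'d8"
    for code, value in escape_values(sample):
        print(f"{code} = decimal {value}")
```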
A list of the fields used, and their significance, will be
found in the file titled "Field Marks for Webster 1913".
Many of these are rarely used, so trying to memorize them is
not likely to be productive. As one gets some experience
comparing copies of the original pages with the field-tagged
electronic version, one will likely get familiar with the usage
of the main tags fairly quickly.
The task will be much clearer when you have the printed pages
in front of you and can compare the electronic and printed versions.
I will also be sending you the files listing the field tags and
their meanings, and another file with the special character codes.
These two files are mostly for reference, and in proofreading the
definitions will seldom need to be consulted. The etymologies
have a higher density of special characters, and if you also
proofread the etymologies, it may be necessary to consult the
special character file for unfamiliar codes. The print in the original
dictionary is on the small side in some places, and I find a good
wide (4-5 inch) magnifying glass to be almost necessary to be
sure I can recognize the printed characters where they may be
smeared.
Just reading through the definitions, rather than comparing word
for word with the paper copy, can pick up a lot of errors, since some
of the typos will be obvious. But many of the typing errors are
due to missing words, especially "to", "for", "a", "the". Also
you may see "of" where "or" should be, and vice versa. Where such
errors occur, the text may sound a little strange, and that is
a clue to look carefully at the paper copy, since it is usually
not possible to tell what is incorrect just by looking at the
erroneous electronic version. Occasionally a whole line may have
been omitted.
=============================================================
Possible Typographical errors in the original printed version?
---------------------------------------------------------------
The original *printed* text of the 1913 Webster is remarkably free of
typographical errors, but one can still find a few rare ones,
every dozen pages or so. Where there is an apparent
typographical error in the original, the editor should correct it
in the proof-read file (provided that it is clearly an error).
More common than real typos are places where the print didn't take,
and small white spaces force the reader to guess at what was there.
In a few places, some unusual words or words used in unusual
senses at first appeared to be in error, but a check of that word
in the original dictionary showed that it was in fact correctly
used. Where a real error in the original occurs, a note indicating
that the *original* had an error should be placed at the end of the
paragraph, in a separate "comment" paragraph of the form:
<-- @@ the word "fighter" in the original was spelled "figter" -->
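Comment paragraphs of this form are easy to collect afterwards, for
instance to review all suspected errors of the original in one pass. A
minimal sketch in Python follows; it assumes every such note opens with
"<-- @@" and closes with "-->" exactly as in the example above, and the
function name original_error_notes is hypothetical.

```python
import re

# Matches "<-- @@ ... -->" and captures the note text between the markers.
NOTE = re.compile(r"<--\s*@@\s*(.*?)\s*-->", re.DOTALL)


def original_error_notes(text):
    """Return the text of every '<-- @@ ... -->' comment found in text."""
    return [m.group(1) for m in NOTE.finditer(text)]


if __name__ == "__main__":
    sample = '<-- @@ the word "fighter" in the original was spelled "figter" -->'
    for note in original_error_notes(sample):
        print(note)
```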
In some cases, because of the poor quality of the printing,
certain characters may be missing. For example, in some "field
of knowledge" tags (marked as (xxx)) there may be a
period missing from the abbreviation: thus, we sometimes see
"(Naut )" where normally it is written "(Naut.)" In such cases,
we assume that this was not an error of the editors, but an
imperfection in the printing process, and we include the expected
period, *without* any added comment.
===========================================================
Typos in the electronic version
------------------------------
The types of errors found are various, and to find all the
errors, it will be necessary to read the full text as typed, and
the text of the original. For the definitions, an error will
usually be signalled by some peculiar or incomprehensible
character of the typed text. This may indicate an incorrect or
missing word, or, in some cases, a missing line of type. It is
not possible to rely on a spell-checker, since most of the errors
are omissions, and many of the words which were mis-typed are in
fact legitimate words.
The largest number of errors, counting individual characters,
is caused by omission of words: often particles or
prepositions, but also longer words which are unexpectedly and
apparently randomly omitted. Thus, one will often find that
the typists have omitted words like "the", "a", "to", or "of".
Other whole words may be omitted, and sometimes whole lines were
skipped. In a few cases, an entire definition was left out.
In misspellings, errors of more than one character are common,
and the correct word may be only approximately similar to the
word as typed. For example, "arguments," may be found, where the
original has "augments"; "expert" in place of "except",
"obstructed" in place of "obscured", "specter" in place of
"scepter". Even greater differences between the typed and
correct word will be found: "diatribes" replaced "doctrines" in one
case (I wonder if that was a deliberate political statement!).
===============================================================
***NOTE*** Only continue reading this *after* you have had some
experience proofreading, and if you decide that you also want to
look for errors in the tags. They will not be as common as typographical
errors in the text, and this will not be necessary to make the
text conform to the original.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Errors in field-tagging
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The placement of the field-tags will also occasionally contain
errors, which, of course, do not reflect errors in the
representation of the actual text, since the original text did
not contain the field tags. The field tags were placed by
automatic macros, and errors and inconsistencies may be
found in the placement. A proof-reader may decide to ignore the
finer problems raised by the field-tagging, except in those cases
where it is apparent that a field-tag is erroneous. For those
who are willing to tackle this aspect of proof-reading, the
following common problems may be found:
Absent tag:
This is most noticeable when a word *italicised* in the
original has *no* tag in the electronic version. This was
a typing error, and should be corrected. The proofreader only
needs to add an error mark (@) at that location, and I will
insert the proper tag there.
Definition field
----------------------------
The convention adopted here is that the definition field should
not include the usage tags (e.g. [Obs.]) or the
\'bd--\'b8 quotation fields when they occur at the end of the
definition. These latter two fields were included by the marking macro
in many cases, and should be excluded by moving the tag
from *after* the [] or \'bd-\'b8 to *in front of* either
of these fields.
In some cases the "definition" field has only "p.
p. of xyz.", and for now this entire phrase will be kept
within the definition; the part-of-speech field is, in
these cases, part of the definition rather than in front of it,
even though the "p. p." is in the location where it could be
interpreted as the "pos" field for that entry. As a result,
such entries will *not* have a "pos" field *outside* of the
definition field, and the POS within that field will also be
interpreted as the POS of the entry. Such entries are
recognizable by the pattern: ...
In a few cases the pattern occurs:
n.; pl. of ...
In such cases, the POS of the entry could be corrected to n. pl.
===========================================================
Species names
---------------------------
The names of species, genera, and higher divisions of
biological classes which appeared in *italics* were tagged by
the "species" field tag, e.g. Drosophila
melanogaster. The automatic marking of these was very
inconsistent, since it was not always obvious which italicised
words were species and which were not. If the proofreader
notices a species (or genus, etc.) name that is marked by
an tag, rather than a tag, the tag should
be substituted for the tag. In some cases where the
names of taxonomic genera or orders are given in italic font, they
are marked as ... or ... respectively,
but this has not been done consistently.
================================================================
The "as" field
-------------------------
The automatic marking of the fields, which give examples
of word use within the first sentence of the definition,
sometimes failed because of typographical errors or unusual
formats in a definition. If the proofreader notices such a usage
example not marked by ... tags, those tags should
be added. The example headword contained between those tags
is, in the original, in italics, and may be marked with
... tags in the electronic version. Those
tags should be changed to .... Sometimes in the
electronic version the italicised example word will have *no* field
tags around it, and in those cases the proofreader should add
in the tags, to signal the italicisation.
===============================================================
Unmarked italic words
---------------------
In many cases, the words marked by italics in the original
have no special field tags in the electronic version. This is
a typographical error of the electronic version which should
be corrected. If the proofreader recognizes the functional type
of field which the italics represent (such as or ),
those specific tags should be used. Otherwise, italics of unrecognized
functional significance should be marked with the ... tags.
========================================================
Authors and authorities
----------------------------------
In some segments, such as the collocation segments
or the notes, an or field *follows* the
closing or . The closing or tag
should be moved to *behind* the or .
==============================================================
Usage marks
-----------
Most but not all fields tagged as will be after the definition,
within square brackets. The actual usage terms themselves are
in italic font (e.g. "Obs." for "obsolete", "R." for "rare",
"Prov. Eng." for "provincial England", "U.S.", etc.), but there may
also be some comments that are not in italics. In this first version,
the italicised words are not distinguished from the non-italicised
words, and this typographical inconsistency will need to be
corrected en masse at a later stage. The proofreader needs only
to be certain that such usage comments in square brackets after a
definition are marked with the ... tags.