PROOF-READING THE WEBSTER: SUGGESTIONS
============================================

This file relates specifically to proof-reading the 1913 Webster, and supplements the general information in the file titled "Contributors' Procedures".

---------------------------------------------

The basic proofreading is intended to bring the electronic version of the 1913 Webster into conformity with the printed version. However, we are also building a supplemented version, also to be available free on-line, which we call the CIDE, the "Collaborative International Dictionary of English". To help in preparing the supplemented version, proofreaders who notice errors in the definitions, missing senses of words, or definitions which are out of date and should be updated are encouraged to make notes (and, where appropriate, to write updated definitions). Send those notes in a separate file from the proofed copy, which should accurately represent the printed version. If you wish, you may make a separate copy of an entry with the updated information, or write a new definition for a sense missing from the original.

What needs to be done is this: load the electronic version into a word processor on your own computer, and compare the text on the screen, word for word, with the corresponding paper copy (which I will send through the mail), to see whether the electronic version has any errors, i.e. whether it is not a faithful reproduction of the printed version. If there is an error in the electronic version, correct it directly in the electronic version itself, and add a commercial at sign (@) at the location of each correction, so that the locus of the correction will be easy to find. When any significant portion is done (at least one numbered page), the corrected version may be sent back to me by email; or you could wait until several pages are finished before sending any.
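If you would like to check a corrected section before sending it back, a small script can list every place that carries the @ correction mark. The following is only a minimal Python sketch, not one of the project's tools; the function name `find_correction_marks` and the file name in the comment are illustrative.

```python
import sys


def find_correction_marks(lines):
    """Yield (line_number, column, text) for every '@' correction mark.

    Line and column numbers start at 1, as most editors display them.
    """
    for lineno, line in enumerate(lines, start=1):
        col = line.find("@")
        while col != -1:
            yield lineno, col + 1, line.rstrip("\n")
            col = line.find("@", col + 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical file name): python find_marks.py proofed_section.txt
    with open(sys.argv[1], encoding="ascii", errors="replace") as f:
        for lineno, col, text in find_correction_marks(f):
            print(f"{lineno}:{col}: {text}")
```

Each hit is printed as line:column followed by the line itself, so the corrections can be reviewed one by one.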
The text file for the 1913 Webster's dictionary is not in graphical form. It is plain ASCII, i.e. it has no special characters at all, just the usual 26 lower-case and 26 upper-case letters, numbers, and punctuation, so it can be viewed on any computer. The full text of the electronic version of the 1913 Webster, containing the tags, is available by ftp from the Project Gutenberg site.

Because the file uses only the characters with codes 1 to 127, special characters and variations in type (such as italic or bold fonts) are represented by special marks, which make the text look somewhat unusual and complicated. To indicate these features, field-tags are used, somewhat similar but not always identical to those used in SGML (Standard Generalized Markup Language), with opening and closing tags for each field. A closing tag is identical to its opening tag, except that it has a forward slash after the left angle-bracket. For example, a phrase printed in italics is enclosed between an opening italic tag and the matching closing italic tag, while the ordinary text that follows it carries no tags.

We also use field-tags to indicate specific parts of a definitional entry, such as a tag marking the headword of each entry, or a part-of-speech tag around "n." (for "noun"). It should take only a little experience to recognize the common tags, and in most cases *** the field tags can be ignored ***. The point is to concentrate on the text between the tags, to be sure it reproduces the original printed text.

In the master file for the 1913 Webster, the tagged fields, such as the headword field, are not otherwise marked with respect to the actual font used, and when they are converted to a format which has varied fonts, they can be given any font the user desires. Thus, headwords in the 1913 Webster are printed in a bold font slightly larger than the main text, but anyone printing or displaying this dictionary on a computer screen can easily convert all headwords to any font they prefer. This is the virtue of using such tags: they allow great flexibility in printing or display style.
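Because the file is restricted to character codes 1 to 127, and every opening tag has a matching closing tag with a slash, both rules can be checked mechanically, and the tags can be stripped out to leave only the text to read against the printed page. A minimal Python sketch follows; it assumes tags are written exactly as `<name>` and `</name>` with no attributes, and the tag names it handles are generic rather than the dictionary's actual field list.

```python
import re

# An opening tag such as <name> or a closing tag such as </name>.
# Comment paragraphs of the form <-- ... --> do not match this pattern.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def non_ascii_positions(text):
    """Return (offset, code) for characters outside the 1-127 range."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not 1 <= ord(ch) <= 127]


def unbalanced_tags(text):
    """Return a list of messages about tags that do not pair up."""
    problems = []
    stack = []  # (tag name, offset) of opening tags not yet closed
    for m in TAG_RE.finditer(text):
        is_closing, name = m.group(1) == "/", m.group(2)
        if not is_closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}> at offset {m.start()}")
    problems.extend(f"unclosed <{name}> at offset {pos}" for name, pos in stack)
    return problems


def strip_tags(text):
    """Remove the field tags, leaving the text to compare with the page."""
    return TAG_RE.sub("", text)
```

For example, `unbalanced_tags("<x>some text")` reports one unclosed tag, and `strip_tags` turns a tagged definition into plain running text.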
Those who are proof-reading the dictionary will receive electronic versions of the sections they will proof-read, either by email or on a floppy disk. Other sections can be downloaded from Project Gutenberg if desired.

The most important part of the proofreading is to be sure that the headword and definition are entered correctly. The headword contains only three characters whose meaning may not be obvious: the double quote (") representing the strong accent of the printed version, the left single quote (`) representing the light accent, and the asterisk (*) representing the short dash of the printed version, which only indicates a syllable break. One example is the word "abacination", written A*bac`i*na"tion. A headword may begin with \'d8, an entity code for a double vertical bar, indicating that the word was taken directly from a foreign language without modification. In the printed version, many words also contain a dash longer than the short dash that marks a syllable break; this longer dash indicates a truly hyphenated word, and it is represented by the usual text hyphen (-).

Within the definition field there may be an "as" section giving an illustration of usage, containing words marked as examples; these should be the same as the headword (perhaps in a plural or conjugated form) and are printed in italic font. After a little practice, these conventions should become quite familiar and unobtrusive, and it should be possible to concentrate on looking at the text parts for errors.

In addition to the tags marking special "fields" in the dictionary, there are codes for special characters not contained within the usual 127-character ASCII set.
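To see how the headword marks relate to the ordinary spelling, here is a small Python sketch that removes them and splits a headword into syllables. It assumes, as in the example A*bac`i*na"tion, that an accent mark stands where the syllable dash would be; the function names are illustrative only.

```python
import re

# Headword marks described above:
#   '"'  strong accent    '`'  light accent    '*'  syllable break
# A real hyphen '-' belongs to the spelling of the word and is kept.
HEADWORD_MARKS = re.compile(r'[*"`]')


def plain_spelling(marked):
    """Return the headword with accent and syllable marks removed."""
    return HEADWORD_MARKS.sub("", marked)


def syllables(marked):
    """Split a marked headword into syllables.

    Accent marks are treated as syllable boundaries, because an accent
    takes the place of the short dash in the printed version.
    """
    return [part for part in HEADWORD_MARKS.split(marked) if part]


print(plain_spelling('A*bac`i*na"tion'))  # Abacination
print(syllables('A*bac`i*na"tion'))       # ['A', 'bac', 'i', 'na', 'tion']
```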
For example, special characters such as the French accented characters or the German umlaut characters (and many others) are indicated in one of two ways. If such a character is included in the special font table (see the file titled "Webster fonts"), it may have a hexadecimal code of the form \'91, which means that the character may be encoded by a byte with the hexadecimal value 91 (decimal value 145). This specific character is the "ae ligature", found in many species names. This character may also be represented as "...".

An example of a definition containing an "as" section:

2. Nonessential; not necessarily belonging; incidental; as, songs are accidental to play.

In this example, the "as" section is not a proper part of the definition itself. Within it, a separate field tags the use of the headword in the example phrase, and the word within that field is always italicised in the original.

A list of the fields used, and their significance, will be found in the file titled "Field Marks for Webster 1913". Many of these are rarely used, so trying to memorize them is not likely to be productive. As you gain experience comparing copies of the original pages with the field-tagged electronic version, you will likely become familiar with the main tags fairly quickly. The task will be much clearer when you have the printed pages in front of you and can compare the electronic and printed versions.

I will also be sending you the files listing the field tags and their meanings, and another file with the special character codes. These two files are mostly for reference; when proofreading the definitions you will seldom need to consult them. The etymologies have a higher density of special characters, and if you also proofread the etymologies, it may be necessary to consult the special character file for unfamiliar codes.
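When an unfamiliar code turns up, it can help to list every code in a passage with its decimal value before looking it up. The Python sketch below assumes codes always have the form backslash, apostrophe, two hexadecimal digits (as in \'91); its small lookup table holds only the two codes mentioned in this file, and everything else is left to the special-character table.

```python
import re

# A special-character code: backslash, apostrophe, then two hex digits.
CODE_RE = re.compile(r"\\'([0-9A-Fa-f]{2})")

# Only the codes described in this file; see "Webster fonts" for the rest.
KNOWN_CODES = {
    "91": "ae ligature",
    "d8": "double vertical bar (word taken unchanged from a foreign language)",
}


def special_codes(text):
    """Return (code, decimal value, description) for each code in text."""
    found = []
    for m in CODE_RE.finditer(text):
        hex_digits = m.group(1).lower()
        description = KNOWN_CODES.get(hex_digits, "see the special-character table")
        found.append((m.group(0), int(hex_digits, 16), description))
    return found


for code, value, description in special_codes(r"\'d8Ab \'91"):
    print(code, value, description)
```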
The print in the original dictionary is on the small side in some places, and I find a good wide (4-5 inch) magnifying glass almost necessary to be sure I can recognize the printed characters where they may be smeared.

Just reading through the definitions, rather than comparing them word for word with the paper copy, can pick up a lot of errors, since some of the typos will be obvious. But many of the typing errors are missing words, especially "to", "for", "a", and "the". You may also see "of" where "or" should be, and vice versa. Where such errors occur, the text may sound a little strange, and that is a clue to look carefully at the paper copy, since it is usually not possible to tell what is wrong just by looking at the erroneous electronic version. Occasionally a whole line may have been omitted.

=============================================================
Possible typographical errors in the original printed version?
---------------------------------------------------------------

The original *printed* text of the 1913 Webster is remarkably free of typographical errors, but one can still find a rare one, about every dozen pages or so. Where there is an apparent typographical error in the original, it should be corrected in the proof-read file (provided that it is clearly an error). More common than real typos are places where the print didn't take, and small white spaces force the reader to guess at what was there. In a few places, unusual words, or words used in unusual senses, at first appeared to be in error, but a check of that word in the original dictionary showed that it was in fact used correctly.
Where a real error in the original occurs, a note indicating that the *original* had an error should be placed at the end of the paragraph, in a separate "comment" paragraph of the form:

<-- @@ the word "fighter" in the original was spelled "figter" -->

In some cases, because of the poor quality of the printing, certain characters may be missing. For example, in some "field of knowledge" tags (marked as (xxx)) the period may be missing from the abbreviation: we sometimes see "(Naut )" where it is normally written "(Naut.)". In such cases we assume that this was not an error of the editors but an imperfection in the printing process, and we include the expected period *without* any added comment.

===========================================================
Typos in the electronic version
------------------------------

The errors found are of various types, and to find all of them it is necessary to read the full typed text alongside the text of the original. In the definitions, an error will usually be signalled by some peculiar or incomprehensible feature of the typed text. This may indicate an incorrect or missing word or, in some cases, a missing line of type. It is not possible to rely on a spell-checker, since most of the errors are omissions, and many of the mistyped words are in fact legitimate words.

The largest number of errors, counting individual characters, are caused by omitted words: often particles or prepositions, but also longer words which are unexpectedly and apparently randomly omitted. Thus, one will often find that the typists have omitted words like "the", "a", "to", or "of". Other whole words may be omitted, and sometimes whole lines were skipped. In a few cases, an entire definition was left out.

In misspellings, errors of more than one character are common, and the correct word may be only approximately similar to the word as typed.
For example, "arguments" may be found where the original has "augments"; "expert" in place of "except"; "obstructed" in place of "obscured"; "specter" in place of "scepter". Even greater differences between the typed and correct word will be found: "diatribes" replaced "doctrines" in one case (I wonder if that was a deliberate political statement!).

===============================================================
***NOTE*** Only continue reading this *after* you have had some experience proofreading, and if you decide that you also want to look for errors in the tags. Such errors will not be as common as typographical errors in the text, and correcting them is not necessary to make the text conform to the original.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Errors in field-tagging
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The placement of the field-tags will also occasionally contain errors, which, of course, do not reflect errors in the representation of the actual text, since the original text did not contain the field tags. The field tags were placed by automatic macros, and errors and inconsistencies may be found in their placement. A proofreader may decide to ignore the finer problems raised by the field-tagging, except where a field-tag is clearly erroneous. For those who are willing to tackle this aspect of proof-reading, the following common problems may be found.

Absent tag: This is most noticeable when a word *italicised* in the original has *no* tag in the electronic version. This was a typing error, and should be corrected. The proofreader only needs to add an error mark (@) at that location, and I will insert the proper tag there.

Definition field
----------------------------

The convention adopted here is that the definition field should not include the usage tags (e.g. [Obs.]) or the \'bd--\'b8 quotation fields when they occur at the end of the definition.
These latter two fields were included by the marking macro in many cases, and should be excluded by moving the closing definition tag from *after* the [] or \'bd-\'b8 to *in front of* either of these fields.

In some cases the "definition" field has only "p. p. of xyz.", and for now this entire phrase will be kept within the definition. In these cases the part-of-speech field is part of the definition rather than in front of it, even though the "p. p." is in the location where it could be interpreted as the "pos" field for that entry. As a result, such entries will *not* have a "pos" field *outside* the definition field, and the POS within that field will also be interpreted as the POS of the entry. Such entries are recognizable by the pattern: ...

In a few cases the pattern "n.; pl. of ..." occurs; the POS of such an entry could be corrected to "n. pl.".

===========================================================
Species names
---------------------------

The names of species, genera, and higher divisions of biological classification which appear in *italics* were tagged with the "species" field tag, e.g. Drosophila melanogaster. The automatic marking of these was very inconsistent, since it was not always obvious which italicised words were species names and which were not. If the proofreader notices a species (or genus, etc.) name that is marked by an italic tag rather than a species tag, the species tag should be substituted for the italic tag. In some cases where the names of taxonomic genera or orders are given in italic font, they are marked with genus or order tags respectively, but this has not been done consistently.

================================================================
The "as" field
-------------------------

The automatic marking of the "as" fields, which give examples of word use within the first sentence of the definition, sometimes failed because of typographical errors or unusual formats in a definition. If the proofreader notices such a usage example not marked by "as" tags, those tags should be added.
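A rough way to find candidates for a missing "as" field is to search for the word "as," that opens a usage example and check whether an opening tag comes right before it. This Python sketch assumes the field is written `<as>as, ...</as>`, with the tag immediately before the word "as"; that tag spelling is an assumption, so check it against "Field Marks for Webster 1913". Every hit is only a candidate to compare with the printed page.

```python
import re

# ASSUMPTION: the "as" field is written <as>as, ...</as>.  Change AS_TAG
# if the field list gives a different name.
AS_TAG = "as"

# "as," as a separate word, so that "has," and "was," do not match.
USAGE_START = re.compile(r"\bas,\s")


def untagged_usage_examples(text, tag=AS_TAG):
    """Return offsets of 'as, ' phrases not directly preceded by <tag>."""
    opening = f"<{tag}>"
    return [m.start() for m in USAGE_START.finditer(text)
            if not text.endswith(opening, 0, m.start())]
```

Applied to the untagged definition "incidental; as, songs are accidental to play.", it reports the position of "as,"; once the tags are in place, nothing is reported.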
The example headword contained within the "as" section is, in the original, in italics, and may be marked with italic tags in the electronic version. Those italic tags should be changed to the tags that mark the headword in an example phrase. Sometimes in the electronic version the italicised example word will have *no* field tags around it, and in those cases the proofreader should add the tags, to signal the italicisation.

===============================================================
Unmarked italic words
---------------------

In many cases, the words italicised in the original have no special field tags in the electronic version. This is a typographical error in the electronic version which should be corrected. If the proofreader recognizes the functional type of field which the italics represent (such as a species name or an example word), those specific tags should be used. Otherwise, italics of unrecognized functional significance should be marked with the italic tags.

========================================================
Authors and authorities
----------------------------------

In some segments, such as the collocation segments or the notes, an author or authority field *follows* the closing tag of the segment. That closing tag should be moved to *after* the author or authority field.

==============================================================
Usage marks
-----------

Most, but not all, fields tagged as usage marks will be after the definition, within square brackets. The usage terms themselves are in italic font (e.g. "Obs." for "obsolete", "R." for "rare", "Prov. Eng." for "provincial England", "U.S.", etc.), but there may also be some comments that are not in italics. In this first version, the italicised words are not distinguished from the non-italicised words, and this typographical inconsistency will need to be corrected en masse at a later stage. The proofreader needs only to be certain that such usage comments in square brackets after a definition are marked with the usage-mark tags.
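To check that bracketed usage comments carry their tags, one can list every square-bracketed comment that is not enclosed in the usage-mark tags. This Python sketch assumes the tag is written `<mark>` and that the square brackets sit inside the tags, as in `<mark>[Obs.]</mark>`; both are assumptions to verify against the field list. Brackets elsewhere (for instance in etymologies) will also be listed and can simply be skipped.

```python
import re

# ASSUMPTION: a usage mark is tagged as <mark>[Obs.]</mark>, with the square
# brackets inside the tags.  Change MARK_TAG if the field list differs.
MARK_TAG = "mark"

# A square-bracketed comment; italic or other tags inside it are allowed.
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def untagged_usage_marks(text, tag=MARK_TAG):
    """Return bracketed comments that are not enclosed in <tag>...</tag>."""
    opening, closing = f"<{tag}>", f"</{tag}>"
    return [m.group(0) for m in BRACKETED.finditer(text)
            if not (text.endswith(opening, 0, m.start())
                    and text.startswith(closing, m.end()))]
```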