e-text (from "electronic text"; sometimes written as etext) is a general term for any document that is read in digital form, and especially a document that is mainly text. For example, a computer-based book of art with minimal text, or a set of photographs or scans of pages, would not usually be called an "e-text". An e-text may be a binary or a plain text file, viewed with any open source or proprietary software. An e-text may have markup or other formatting information, or not. An e-text may be an electronic edition of a work originally composed or published in other media, or may be created in electronic form originally. The term is usually synonymous with e-book.
E-text origins
E-texts, or electronic documents, have been around since long before the Internet, the Web, and specialized e-book reading hardware. Roberto Busa began developing an electronic edition of Aquinas in the 1940s, while large-scale electronic text editing, hypertext, and online reading platforms such as Augment and FRESS appeared in the 1960s. These early systems made extensive use of formatting, markup, automatic tables of contents, hyperlinks, and other information in their texts, and some (such as FRESS) supported graphics as well as text.
"Just plain text"
In some communities, "e-text" is used much more narrowly, to refer to electronic documents that are, so to speak, "plain vanilla ASCII". This means not only that the document is a plain text file, but also that it carries no information beyond "the text itself": no representation of bold or italics, or of paragraph, page, chapter, or footnote boundaries. Michael S. Hart, for example, argued that this "is the only text mode that is easy on both the eyes and the computer". Hart pointed out that proprietary word-processor formats made texts grossly inaccessible, but that objection does not apply to standard, open data formats. The narrow sense of "e-text" is now uncommon, because the notion of "just vanilla ASCII", attractive at first glance, has turned out to have serious difficulties:
First, this narrow type of "e-text" is limited to the unaccented letters of the English alphabet. Not even Spanish ñ or the accented vowels used in many European languages can be represented (unless awkwardly and ambiguously as "~n" or "a'"). Asian, Slavic, Greek, and other writing systems cannot be represented at all.
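The character-set limitation can be illustrated with a minimal Python sketch. The helper names `fits_in_ascii` and `tilde_workaround` are hypothetical, and the "~n" substitution is only one assumed form of the workaround mentioned above, not a documented convention of any particular archive.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text is in the 7-bit ASCII repertoire."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def tilde_workaround(text: str) -> str:
    """Hypothetical convention: write the letter ñ as the two characters '~n'."""
    return text.replace("ñ", "~n")


print(fits_in_ascii("ano"))   # True
print(fits_in_ascii("año"))   # False: ñ has no ASCII code

# The workaround is ambiguous: a text that already contained the characters
# "~n" becomes indistinguishable from one that contained ñ.
print(tilde_workaround("año") == tilde_workaround("a~no"))  # True
```

Because two different originals collapse to the same ASCII string, a reader or program cannot reliably recover the original spelling.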
Second, diagrams and pictures cannot be accommodated, and many books have at least some such material; often it is essential to the book.
Third, "e-texts" in this narrow sense have no reliable way to distinguish "the text" from other things that occur in a work. For example, page numbers, page headers, and footnotes might be omitted, or might simply appear as additional lines of text, perhaps with blank lines before and after (or not). An ornate separator line might be represented instead by a line of asterisks (or not). Chapter and section titles, likewise, are just additional lines of text: they might be detectable by capitalization if they were all caps in the original (or not). Even discovering which conventions (if any) were used makes each book a new research or reverse-engineering project.
In consequence of this, such texts cannot be reliably re-formatted. A program cannot reliably tell where footnotes, headers or footers are, or perhaps even paragraphs, so it cannot re-arrange the text, for example to fit a narrower screen, or read it aloud for the visually impaired. Programs might apply heuristics to guess at the structure, but this can easily fail.
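As a rough illustration of how such heuristics fail, the sketch below guesses that any line written entirely in capitals is a heading. The function `guess_headings` and the sample lines are invented for this example; they are not taken from any real e-text or tool.

```python
def guess_headings(lines):
    """Heuristic: treat a line as a heading if all of its letters are uppercase."""
    return [ln.strip() for ln in lines if ln.strip().isupper()]


sample = [
    "CHAPTER I",
    "",
    "It was a dark night.",
    "",
    "STOP!",            # shouted dialogue, not a heading
    "",
    "Chapter Two",      # a real heading, but not in capitals
    "",
    "The morning came.",
]

print(guess_headings(sample))  # ['CHAPTER I', 'STOP!']
```

The guess produces one false positive ("STOP!") and misses "Chapter Two" entirely, which is the kind of failure that explicit structural markup avoids.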
Fourth, and perhaps surprisingly important, a "plain-text" e-text affords no way to represent information about the work. For example, is it the first or the tenth edition? Who prepared it, and what rights do they reserve or grant to others? Is this the raw version straight off a scanner, or has it been proofread and corrected? Metadata relating to the text is sometimes included with an e-text, but there is by this definition no way to say whether or where it is present. At best, the text of the title page might be included (or not), perhaps with centering imitated by indentation.
Fifth, texts with more complicated information cannot really be handled at all. Examples include a bilingual edition, or a critical edition with footnotes, commentary, critical apparatus, cross-references, or even the simplest tables. This leads to endless practical problems: for example, if the computer cannot reliably distinguish footnotes, it cannot find a phrase that a footnote interrupts.
Even raw OCR output from a scanner usually carries more information than this, such as the use of bold and italic. If this information is not kept, it is expensive and time-consuming to reconstruct; more sophisticated information, such as which edition was scanned, may not be recoverable at all.
In actuality, even "plain text" uses some kind of "markup", usually control characters, spaces, tabs, and the like: spaces between words, or two returns and five spaces to mark a paragraph. The main difference from more formal markup is that "plain texts" use implicit, usually undocumented conventions, which are therefore inconsistent and difficult to recognize.
The narrow sense of e-text as "plain vanilla ASCII" has fallen out of favor. Nevertheless, many such texts are freely available on the Web, perhaps as much because they are easily produced as because of any purported portability advantage. For many years Project Gutenberg strongly favored this model of text but, with time, has begun to develop and distribute more capable forms such as HTML.
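The interrupted-phrase problem can be sketched in a few lines of Python. The page content, the `(kind, text)` tuples, and the function name `contains_phrase` are all assumptions made for illustration; the labelled version merely stands in for any format that marks footnotes explicitly.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """Search for phrase after collapsing all whitespace runs to single spaces."""
    return phrase in " ".join(text.split())


# Plain-text page: the footnote from the bottom of the page is just another line.
plain_page = (
    "the quick brown\n"
    "1. A footnote from the bottom of the page.\n"
    "fox jumps over the lazy dog\n"
)

# The same page with each line explicitly labelled (hypothetical structure).
labelled_page = [
    ("body", "the quick brown"),
    ("footnote", "1. A footnote from the bottom of the page."),
    ("body", "fox jumps over the lazy dog"),
]
body_only = "\n".join(text for kind, text in labelled_page if kind == "body")

print(contains_phrase(plain_page, "quick brown fox"))  # False
print(contains_phrase(body_only, "quick brown fox"))   # True
```

Once the footnote is identified as such, it can be set aside before searching, so the phrase that spans it is found.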
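A small Python sketch shows why undocumented conventions are hard to rely on. It assumes two invented sample texts, one separating paragraphs with blank lines and one marking them only by a five-space indent; `split_on_blank_lines` is a hypothetical helper written for the first convention.

```python
import re


def split_on_blank_lines(text: str):
    """Split text into paragraphs, assuming a blank line separates them."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


blank_line_style = "First paragraph.\n\nSecond paragraph."
indent_style = "     First paragraph.\n     Second paragraph."

print(len(split_on_blank_lines(blank_line_style)))  # 2
print(len(split_on_blank_lines(indent_style)))      # 1: the indent convention is missed
```

A program written for one implicit convention silently misreads a text that follows another, and nothing in the file says which convention applies.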
See also
- Text file
- e-book
- Electronic paper
- Digital library
- Online Books Page
- Distributed Proofreaders
- L'Association des Bibliophiles Universels
References
- Reading and Writing the Electronic Book. Nicole Yankelovich, Norman Meyrowitz, and Andries van Dam. IEEE Computer 18(10), October 1985. http://dl.acm.org/citation.cfm?id=4407
- Michael S. Hart
- Coombs, James H.; Renear, Allen H.; DeRose, Steven J. (November 1987). "Markup systems and the future of scholarly text processing". Communications of the ACM. 30 (11). ACM: 933–947. doi:10.1145/32206.32209. S2CID 59941802.
External links
- Scholarly Electronic Publishing Bibliography