------------------------------
Subject: 0. about this FAQ: the psychology of etext
The process of assembling references for this FAQ, and tracing each topic back to its origins (often hundreds or thousands of years ago), has led me to a somewhat novel set of generalizations about human communication:
And no other medium of human communication is so ideally *self-documenting* for the purpose of research as etext-communication-- if computers are to be used in the research, *every* communication we study must be transcribed first into etext. So we can ask a vast general question:
Could the study of _etext_ be the hidden gateway to a new paradigm for psychological science?
The history of etext reflects all the intricacies of *human communication*. So this FAQ will try to take this broad view of etext-theory, even as it offers the conventional _overview of Net resources_ in every area of etext-- a very broad and rich domain that includes file formats, info-retrieval, linguistics, optical scanning, conservation, copyright, cryptography, hypertext, and e-sociology. Etexts will be viewed as a microcosm of human psychology, with a sequence of development that embodies all aspects of the psyche *in a uniquely well-documented fashion*: from conception and composition of the message, thru transmission and response.
My theory of FAQs is that they should offer a compact summary of a topic-area, with carefully chosen links to Net resources for further exploration. (My ideal 'Yahoo' would consist entirely of such FAQs, arranged hierarchically.) In the course of researching this FAQ, I found that all the individual topics were very well-explored on the Net, but I found no single source trying to bring them all together.
This FAQ will also mount a couple of soapboxes:
1) that etexts can and should be formatted for accessibility via the *lowest* common denominator of display technology: 80-column ascii, or '80a'
2) that improved software for analysing etexts is the most interesting challenge in artificial intelligence today, but by no means an easy one.
Also, past experience has shown that a FAQ can sometimes turn an underutilised newsgroup into a community... and alt.etext seems to me underutilised!
------------------------------
Subject: 1a. etext history: preelectric prehistory
"...King Roderick O'Conor, the paramount chief polemarch and last >preelectric< king of all Ireland..." James Joyce, 1923 draft for Finnegans Wake
*Electrons and electricity* are a uniquely malleable medium for the transmission of messages-- thus we can speak of electronic communication or 'e-communication'. When such messages are for the eye alone, they may be called 'e-video' (both still images and moving pictures), for the ear 'e-audio'. But by far the most efficient form of e-communication uses language encoded in a phonetic alphabet: etext.
Even before there was spoken language, there were symbol systems in nature, with DNA encoding an alphabet of some 20 amino acids, accumulating over the ages a repository of survival skills. Another pre-speech alphabet remains to be discovered: the unsolved puzzle of the *memory molecule* (which may yet prove to be DNA/RNA). Chimps without language still display an extraordinarily subtle range of intuition that must be encoded in these alphabets of gene and memory.
Spoken language (or 's-text', produced and understood at speeds of up to 300 words per minute) must be tens of thousands of years old. It may have originated as a way to communicate warnings or desires, to teach skills, to woo mates, to appease gods, or to entertain with stories. And though language has never been perfect, especially for describing inner intuitions, at rare times it is still capable of extremely subtle effects, especially in poetry.
Handwritten languages (h-text, always slower than s-text for writing, but considerably faster for reading) are about 5000 years old, phonetic alphabets (h-audio, in a way!) less than 4000 years old. Handdrawn images like cave paintings (h-video) go back tens of thousands of years, while h-audio annotations of music go back more than 1000 years.
H-images allowed a considerable degree of communication at a distance both in space and time. The first symbolic tokens seem to have been 'bills of lading' sent with shipments of goods, to discourage pilferage. Later, political leaders kept in touch via writing, and monuments honored gods and monarchs. Myths were recorded, and proverbs. Writing on clay was often preserved, but other early writing was usually lost to fire or water. (The dry climate of Egypt and the mideast allowed some discarded scrolls to survive.)
But lost, in h-text transmission, were gesture and tone of voice, the possibility of immediate interactive feedback, and of showing or pointing directly at an object under discussion. Both spoken and written messages might be misunderstood, or retransmitted with errors, leading to further misunderstandings. The process of composition became more deliberate and less spontaneous, permitting one to substitute second thoughts by recopying.
Most written messages would have been between people who'd already met, but when this wasn't true, the text carried an enormous burden of conveying the personal integrity of the author. For this reason, it might be delivered by an intermediary representing the author, or accompanied by a cover letter from someone known to both. *Publication* involved offering an original copy in a public place, often with a human to read it to the illiterate. Handwriting must have become a significant substitute for intonation, with bad handwriting becoming associated with untrustworthiness. (The physical text conveys a general 'vibe' about the author.) The possibility of private messages being intercepted led to codes (forms of which had even been used in s-text, to prevent eavesdropping).
[A not-too-poncy overview of poncy theories of orality vs literacy]
<URL:http://www.aber.ac.uk/~dgc/litoral.html>
[Links to poncier approaches]
The limitations of the eye as compared to the ear required that the continuous 'line' of speech be broken into multiple lines of text. Only 2600 years ago did this process standardize, around the Mediterranean, on left-to-right lines (rather than back-and-forth, up-and-down, etc). This convention also enabled conventions for lists and tables, etc, combining some of the advantages of (two-dimensional) pictures with the one-dimensional *line* of speech. Conventions for breaking lines between words evolved when spaces between words were added in x, vowels in x, paragraphs in x. The lowercase roman alphabet for h-text was distinguished from the uppercase only 1300 years ago, while punctuation and spelling weren't standardised until x. Conventions for distinguishing quoted speech from text were explored in x. Titles for texts were introduced in x.
Mechanical printing (p-text, eventually reaching more than 200 words/min for inputting, with the electric keyboard, and virtually without limit for the rate of text-output) goes back to Gutenberg less than 600 years ago. (P-video with limited text, via woodcuts, goes back more than 1000 years in China. P-audio had to wait until the 19th century, for music boxes and player pianos.) McLuhan argues (with amazing historical detail) in his "Gutenberg Galaxy" that the creation of multiple identical copies of uniform lines of type, drained of the individual expressiveness of handwriting, created an illusion of *authority* that had profound social implications.
[McLuhan links]
With the printing press, 'publication' took on a new meaning. While the earliest era of printing was limited to religious works and reprinted classics, the potential for wider readership inspired not just new writing but new thinking, and publication represented a courageous choice to move ideas into the public sphere. And people who had read the same work began to share a new sort of bond, even forming a new sort of text-community. But readers might respond to a published work as a letter, and reply with fanmail or a 'poison pen' letter (now called a flame). In all likelihood, some early readers became obsessive about some authors, maybe even stalking them in some way. Piracy of original works now became a serious problem, as did political and religious censorship. Newspapers and advertising were among the early innovations of print.
But the power of printing was confined by its expense to a very limited number of people, and most writing remained as h-text not p-text, at least until the typewriter became popular after the US Civil War. *Typesetting* for the printing press was a serious skilled craft, requiring sophisticated knowledge, including complex principles of hyphenation, and subtle esthetic judgment, especially for letterspacing. Who but typesetters would notice that letter frequencies are uneven, the most common in English being 'Etaoin Shrdlu' according to one popular mnemonic (or 'Esarin Tulom' in French)?
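The unevenness of letter frequencies is easy to check by machine. What follows is a minimal Python sketch, not a reconstruction of any typesetter's method: the function name and the sample sentence are invented for illustration, and a sample this short will not reproduce the 'etaoin shrdlu' order, which only emerges from a large body of English text.

```python
from collections import Counter

def letter_ranking(text):
    """Return the letters found in text, most frequent first."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Sort by descending count; break ties alphabetically so the
    # result is the same on every run.
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))

# Made-up sample text; substitute any long English etext here.
sample = "the quick brown fox jumps over the lazy dog and then the end"
print(letter_ranking(sample))
```

Fed a whole book rather than one sentence, the ranking should settle near the typesetters' mnemonic, though the exact order varies from text to text.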
Various systems were improvised for marking up proofs as they were proofread; corrections usually required breaking up the lines of type and resetting them to get the spacing correct. Businesses also employed droves of clerks to copy documents by hand, and secretaries to take dictation. These secondhand copies might then be proofread and marked up for recopying.
Occasional early text analysis was undertaken by hand: dictionaries (1746), thesauruses (1852), and concordances, with special emphasis on the Bible and Shakespeare. Mystics like Ramon Llull, and the Kabalists, explored text combinatorics in the middle ages. Comparative analyses of different versions of various texts were undertaken.
Other mechanical formats for p-text included punchcards for weaving-patterns (1801), Babbage's difference engine (proposed 1832, built 1853) which was meant to have printed output via a soft metal plate, and his analytical engine (1834), with programs stored on punched cards. The first successful mechanical typewriter was introduced in 1867, and the QWERTY keyboard in 1874. By 1889 there were mechanical calculators with keyboard input and printed output, and the 1890 US Census used Hollerith punchcards, read electronically. The phonograph provided p-audio in 1877, allowing dictating machines in 1881, which merged p-audio into the p-text path. P-audio became e-audio with wire recording in 1898, and tape in 1928.
[History of typewriters, with an amazing bibliography]
<URL:http://xavier.xu.edu:8000/~polt/tw-history.html>
[Another, from Popular Mechanics]
<URL:http://popularmechanics.com:80/popmech/spec/9608SFACM.html>
------------------------------
Subject: 1b. etext history: from the telegraph to the computer
The first electronic encoding of the alphabet (ie etext) was Morse's telegraph code in 1837, the first published etext in 1844 his "WHAT HATH GOD WROUGHT?". His original transmitter used metal templates for each letter to transmit the patterns to a receiver printing dots and dashes on a paper drum, but telegraph operators found it more efficient to key by hand and translate by ear (up to 75 words/min receiving, less than 40 sending).
Later, automated transmitters and receivers (starting with punched paper tape in 1915) increased those speeds to 100 words/minute and created the first body of etexts in enduring form (mostly destroyed for privacy reasons, alas!). Direct printed output from Morse was introduced in the 1920s. Electric typewriters were explored beginning in 1902, including one that transmitted Morse in 1908, but they only became popular after WW2. IBM's classic Selectric line (with a single ball as the print element) debuted in 1961, along with a line of dictation machines (merging e-audio into the p-text path).
The Morse code had included 44 characters via patterns of dots and dashes of varying length: the 26 letters of the alphabet, the ten digits, plus eight punctuation marks: .,:;-/?" (Why did 'STOP' replace '.' for sentence ends?) So the variations of fontsize, typeface, styles like bold and italic, and even upper/lowercase were now squeezed out. While Morse in itself doesn't specify upper or lower case, transcriptions standardized on using all upper case-- which may even then have created that sense of urgent shouting that's familiar in cyberspace today. (I assume there was some way to signal 'backspace' for errors?)
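How encoding into dots and dashes squeezes out upper/lowercase can be shown in a few lines of Python. This is only a sketch under stated assumptions: it uses the later International Morse alphabet rather than Morse's original code (whose dashes of varying length are harder to show in ascii), it covers letters only, and the function name and the ' / ' word separator are conventions chosen for this example.

```python
# International Morse for the 26 letters (a later standard, not Morse's
# original code; digits and punctuation are left out of this sketch).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def to_morse(message):
    """Encode letters as Morse; case is lost and other characters dropped."""
    words = message.upper().split()
    # One space between letters, ' / ' between words (a display convention).
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words
    )

print(to_morse("What hath God wrought"))
```

Since 'What' and 'WHAT' produce identical output, anyone transcribing the signal back to paper has to pick a case, which fits the all-caps convention described above.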
The more uniform five-bit Baudot code, introduced in 1875 for teletype, and for stock tickers, combined keyboard input and printed output creating the illusion of a long-distance typewriter. This code introduced the concept of _control_characters_ for its remaining six non-alphabetic values (32 - 26 = 6), two of which were effectively 'shift-on' and 'shift-off', with the shifted values representing digits, fractions from 1/8 to 7/8 for stock-quotes, punctuation .,-/":&$, 'bell' and 'stop'. Two other Baudot control characters introduced the distinction between carriage-return (move the carriage back to the start of the line) and linefeed (move the paper up one line)-- a split that still causes problems. (So Baudot must sometimes have overprinted lines by using the cr without the lf? Or were they just trying to avoid extra attempts to return the carriage???)
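The 'shift-on'/'shift-off' idea amounts to a decoder that remembers which case it is in. Here is a rough Python sketch of that state machine. Everything specific in it is an assumption made for illustration: the code values, the tiny four-entry table, and the names LTRS and FIGS are NOT the real Baudot assignments.

```python
# Hypothetical 5-bit table, for illustration only (not real Baudot values).
# Each code maps to a (letters-case, figures-case) pair.
LTRS, FIGS = 0b11111, 0b11011   # assumed values for shift-off and shift-on
TABLE = {
    0b00001: ("E", "3"),
    0b00010: ("A", "-"),
    0b00011: ("T", "5"),
    0b00100: (" ", " "),
}

def decode(codes):
    """Decode a sequence of 5-bit codes, tracking the current shift state."""
    shifted = False
    out = []
    for code in codes:
        if code == LTRS:
            shifted = False          # back to letters
        elif code == FIGS:
            shifted = True           # following codes mean figures
        else:
            out.append(TABLE[code][1 if shifted else 0])
    return "".join(out)

message = [0b00011, 0b00001, 0b00010, FIGS, 0b00001, 0b00011, LTRS, 0b00010]
print(decode(message))
```

The cost of the trick is visible in the design: the meaning of a code depends on everything sent before it, so one garbled shift character scrambles the rest of the message until the next shift arrives.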
Telegraph offices were always two-way, while stock tickers never were, and teletypes only infrequently (I imagine). And both interposed an operator at each end of any 'conversation', with a consequent timelag (except for chatting between the operators themselves-- surely the first flamewar involved two bored telegraph operators!). Eavesdropping by 'tapping' the unprotected lines was not unknown, and the military invented codes to allow for radio transmission of secret messages.
Opening up the shortwave radio bands for amateurs (hams) created the first worldwide etext-based society, but restrictions on political or religious content, plus the difficulty of conversation, kept the range of discussion very limited (weather, sports, technology, hobbies). Also, no privacy is possible (without establishing a code), and like 'talk' on the Internet, radiotelegraphy displays every keytap as it's made, including errors, with no provision for proofreading or second thoughts before you press 'enter'. Because of the slower speed of Morse code, hams innovated a shorthand of *numbers* for common phrases, later picked up by CBers: "10-4" has entered popular speech, along with "what's your 20?". Hams traditionally exchange snailmail postcards to confirm contacts, a tradition with no parallel on the Net.
Broadcast radio, though, quickly became commercialized in the US, as would television, denying individuals the power to express themselves in that medium.
Photography automated the creation of p-video images in the first half of the 19th century, with motion added around 1895. Facsimile machines used telegraph lines to transmit the first e-video, along with 'teletype art' that used alphabet characters to draw pictures. Television was developed in a series of steps between 1908 and the 1930s, providing moving e-video, and videotape recording followed in 1956.
Meanwhile, the telephone (1876) and voice-transmission by radio had allowed etext to 'revert back' to e-audio both by wire and in most shortwave bands. Telegraphy's cost-advantage over long distance telephony gradually dwindled to nothing (though as early as 1926 Nikola Tesla foresaw the electronic delivery of newspapers to every home). So after WW2, etext moved almost entirely into the domain of computers.
[History of online services with Tesla quote]
<URL:http://www.ege.edu.tr/mirror/presno/bok/2.html>
------------------------------
Subject: 1c. etext history: computers and word processing
From the first, computers spoke a language of logic and mathematics, and (with the 'stored program' concept) introduced the novel possibility of *executing* a text: they can be viewed as *text machines*, automating parts of the thought-process. Two-valued variables (bits) were clustered to represent wider ranges of values, requiring problems to be analysed into discrete, definite dimensions. Programmers had to create new languages to address questions that had never been articulated before. And since the mind itself is also an information processor, many of these proved to be metaphors for subtle psychological realities as well.
Temporary storage of electronic data was achieved using relays (1935), sliding metal (1938), vacuum tubes (1939), capacitors (1941), CRTs (1948), mercury tanks (1949), diodes (1950), and eventually transistors (1953) and integrated circuits (1958). Permanent storage involved punched paper tape (1943?), punchcards, punched 35mm film (1938), magnetic cores and drums (1947), and eventually magnetic tape (1951), floppy disks (1971), hard disks (1973) and CD-ROMs (1983). Display options extended from lightbulbs (1938) to punchcards (1941) to teletypewriters and printers to CRTs (1956?) to plotters. Input devices included teletype (1940), plugboard (1945), cardpunch, lightpen (1951), mouse (1963). Computer languages evolved from direct rewiring to machine language to assembly language (1948) to compiled higher-level languages (1952).
[Brader's Chronology of Digital Computing Machines (to 1952)]
Typewriters had made little allowance for errors except for the backspace key (forcing the invention of whiteout and erasable bond paper), but in 1947 Friden introduced the Flexowriter, an electric typewriter with paper tape storage. Errors could be corrected by stopping the tape during replay and manually typing the correction-- a technology that also enabled personalized form letters for mass mailings. The Flexowriter became the input device of choice for many of the first computers, allowing offline composition of texts-- especially the programs themselves. In 1962 Teletype offered its Model 33 keyboard and punched-tape terminal, which would play the Flexowriter's role for many early microcomputers, and in 1964 IBM added magnetic tape storage to the Selectric (upgraded to magnetic cards in 1969), inspiring the expressions 'word processing' and 'text processing'.
The most economical alternative to paper tape was punched cards, in use in various forms since 1801. By breaking up an etext into cards of a fixed length (eventually standardized at 80 characters, a convention that was carried over to CRTs), punchcards made error correction trivial-- you simply retyped the relevant cards. A printed 'listing' of a deck of cards made analysis and modification easier, and protected you in case cards were destroyed or shuffled. IBM's cardpunches in the 1960s, the 026 and 029, printed out the text above the punchholes, for easier proofreading, and even allowed a limited degree of programming to make card numbers increment automatically, and skip or copy fields, etc.
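The card discipline described above (fixed 80-column records, automatic card numbers, protection against a shuffled deck) can be imitated in a short Python sketch. The details are assumptions for illustration: the function names are invented, and reserving columns 73-80 for an eight-digit sequence number is one common convention, not something every installation followed.

```python
CARD_WIDTH = 80   # columns per card, as in the text
TEXT_WIDTH = 72   # assumption: columns 73-80 hold the sequence number

def to_cards(lines, step=10):
    """Turn text lines into 80-column card images with sequence numbers."""
    cards = []
    for i, line in enumerate(lines, start=1):
        if len(line) > TEXT_WIDTH:
            raise ValueError(f"line {i} does not fit on one card")
        seq = f"{i * step:08d}"     # numbering by tens leaves room for inserts
        cards.append(line.ljust(TEXT_WIDTH) + seq)
    return cards

def unshuffle(cards):
    """Restore a dropped deck to order by sorting on the sequence field."""
    return sorted(cards, key=lambda card: card[TEXT_WIDTH:])

deck = to_cards(["LINE ONE", "LINE TWO", "LINE THREE"])
dropped = [deck[2], deck[0], deck[1]]
print(unshuffle(dropped) == deck)
```

Numbering by tens is a choice made for this sketch: it lets a new card be slipped in between two old ones without renumbering the deck.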
Programmers created the first significant body of etexts because their programs were created *as* text files (called 'source code') that were then automatically translated into machine language. Gradually, other sorts of text were added, eg program documentation and the occasional computer-science thesis. Starting around 1962, special formatting programs were introduced to automate refinements like line-justification, centering, doublespacing and margins, that would otherwise have had to be laboriously worked out by hand. Around 1964, Jerry Saltzer's RUNOFF introduced the idea of 'dot commands' mixed with the output text (on separate cards, or on lines starting with a period) to define the formatting.
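The 'dot command' idea is simple enough to sketch in Python: any line starting with a period is an instruction to the formatter, and everything else is text to be filled into lines. The two commands below (.center and .blank) are hypothetical stand-ins chosen for this example and are not RUNOFF's actual command set; the function name and default width are likewise assumptions.

```python
import textwrap

def format_text(source, width=40):
    """Fill text to `width`, obeying dot commands that sit on their own lines.

    .center  centers the next text line (hypothetical command)
    .blank   emits one empty line       (hypothetical command)
    """
    out, paragraph, center_next = [], [], False

    def flush():
        # Emit the words collected so far as filled lines.
        if paragraph:
            out.extend(textwrap.wrap(" ".join(paragraph), width))
            paragraph.clear()

    for line in source.splitlines():
        if line.startswith("."):            # a command, not output text
            flush()
            if line == ".center":
                center_next = True
            elif line == ".blank":
                out.append("")
        elif center_next:
            out.append(line.strip().center(width).rstrip())
            center_next = False
        else:
            paragraph.extend(line.split())
    flush()
    return "\n".join(out)

print(format_text(".center\nA TITLE\n.blank\nsome text typed\nin short lines"))
```

Because the commands travel inside the etext itself, the same source can be re-run at a different width without retyping anything, which is the labor the formatters were meant to save.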
Typesetters since 1932 had been using a 6-bit code called TTS (TeleTypeSetter) that supported proportional fonts and used paper tape for etext storage (formal texts only, no informal conversation, for obvious reasons). In 1945 these began to be upgraded to phototypesetting, still using TTS and paper tape, but the process was mostly mechanical, not electronic. The first computer program to automate typesetting was x in 196x. RCA and IBM offered very expensive, mainframe-driven typesetting in the 1960s, and by 1972 at least a dozen newspapers were using PDP-8's to do typesetting with descendants of RUNOFF, outputting TTS.
[Old Phototypesetter Tales]
<URL:http://www.slip.net/~graphion/oldtype.html>
CRTs (in black and white, greenscreen, and later amber), invented in the 30s, were gradually optimised for computer output (and allowed etext display novelties like inverse video and blinking text). (IBM couldn't choose at first between white-on-black and black-on-white, so included a hardware toggle.) For the first time in history, the shapes of letters were built from a matrix of dots stored in ROM (cf needlework?). The first offered only upper case.
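To make the 'matrix of dots' concrete, here is a small Python sketch that stores one letter as rows of bits and paints it with ascii. The 5x7 grid size and the particular dot pattern for 'A' are assumptions drawn up for this example, not copied from any real terminal's character ROM.

```python
# A hypothetical 5x7 dot pattern for 'A': one integer per row, one bit per
# dot (not taken from any actual character generator).
GLYPH_A = [
    0b01110,
    0b10001,
    0b10001,
    0b11111,
    0b10001,
    0b10001,
    0b10001,
]

def render(rows, width=5):
    """Turn rows of bits into text lines, '#' for a lit dot."""
    return [
        format(row, f"0{width}b").replace("0", " ").replace("1", "#")
        for row in rows
    ]

print("\n".join(render(GLYPH_A)))
```

Seven small integers per character is all the storage such a letterform needs.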
Before the introduction of CRTs, realtime 'online' communication with the computer meant reading its typed replies off a *printing* terminal. As CRTs became more common in the 1960s, specialized programs for editing were introduced that allowed texts to be edited while still in the computer's memory, but these were mostly line-oriented, requiring one to indicate by number which line one wished to change. (EDLIN for MS-DOS, written in 1980, was the last major dinosaur in this lineage, still included with Windows95?)
[Annotated manual for a 1963 'word processor']
<URL:http://world.std.com/~dpbsmith/tj2.html>
[An afc thread on early editors]
<URL:http://ars-www.uchicago.edu/~eric/afc/wp>
TVEDIT by Brian Tolliver (?) was the first 'screen editor' in 1963, followed by the Rand Editor in 1967. Both were repeatedly modified and ported to other operating systems. Doug Engelbart's NLS (Augment) system was demoed in 1968, with screen-editing using a mouse, leading directly to the first WYSIWYG wp, Bravo on the Xerox Alto, in 1974. A bitmapped font-editor for the Alto was built in 1972, but these innovations were so far ahead of the pack that their immediate influence was very small.
[Engelbart on being overlooked in 1968]
<URL:http://innovate.si.edu/history/engel/engel13.htm>
[Alto page]
[Giant GIF of the Alto, not showing WYSIWYG]
<URL:http://ei.cs.vt.edu/~history/Alto.GIF>
[Xerox history?]
<URL:http://www.spies.com/aek/xerox.html>
Programmers also had to handle text-strings *within* their programs, eg for labelling printed output, and these techniques were used in writing their editors and formatters as well. (An early programmers' quandary: to save strings in uniform 80-character records, or in variable-lengths, marked with a zero at the end, or with a length count at the beginning? Such decisions sometimes depended on the convenience for a given system, but like QWERTY may endure today.)
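The three options in that quandary can be laid side by side in Python, using byte strings to stand in for raw memory. This is a sketch only: the function names are invented, ascii is assumed, and the one-byte length count (which caps strings at 255 characters) is just one of several possible widths.

```python
import struct

def fixed_record(s, width=80):
    """Pad the string with blanks so every record is exactly `width` bytes."""
    data = s.encode("ascii")
    if len(data) > width:
        raise ValueError("string too long for record")
    return data.ljust(width, b" ")

def zero_terminated(s):
    """Variable length, with a zero byte marking the end."""
    return s.encode("ascii") + b"\x00"

def length_prefixed(s):
    """Variable length, with a one-byte count in front (255 bytes at most)."""
    data = s.encode("ascii")
    return struct.pack("B", len(data)) + data

label = "TOTAL"
print(len(fixed_record(label)), zero_terminated(label), length_prefixed(label))
```

Each choice has a price: fixed records waste space on blanks, the zero marker means the string itself can never contain a zero byte, and the length count sets a hard ceiling on size.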
As new hardware and software were created, their requirements for character sets also led to compatibility problems. While typesetters and electric-typewriter-based printers (like the Flexowriter) supported both upper and lower case text, punchcards supported only upper case and required formatting codes to distinguish lower. A six-bit uppercase character set was standardised by ASA in 1963, followed by the seven-bit ECMA standard in 1965 that led to ASCII (see below). IBM took a different path from Binary Coded Decimal (BCD) to EBCDIC (Extended Binary Coded Decimal Interchange Code).
[a short history of ascii]
<URL:http://www.tbi.net/~jhall/history1.html#ascii>
[the origins of ASCII]
<URL:http://search.dejanews.com/getdoc.xp?recnum=%3c592eag$lr7@reader.seed.net.tw%3e&server=db96q5>
[why EBCDIC was so strange]
<URL:http://search.dejanews.com/getdoc.xp?recnum=11817283&threaded=1&server=db96q4>
As the typewriter had become a symbol of the secretary (and of the professional writer), computer printing began as a symbol of science, and of big business's data processing departments. The gentle 'tap tap' of typewriters was replaced there by the extreme decibel levels of card punches and lineprinters. But even by printing out columns of numbers-- and even more with the first *graphs* drawn using character graphics-- the computer was retracing the early discoveries of handwriting in L->R lines.
[Folklore thread on early printers]
<URL:http://search.dejanews.com/dnquery.xp?search=thread&filter=&svcclass=dnold&threaded=1&recnum=%3cE0Eo84.F1s@midway.uchicago.edu%3e%231/1>
While programmers don't often need fancy formatting options, they do love efficiency, so their text editors innovated many sophisticated capabilities like search-and-replace, macros, and version control. (They seem to take a degree of pride in user-unfriendliness, as well...) As early as 1962, a sophisticated language called TECO (Tape Editor and COrrector) by Murphy and Greenblatt was used to make editing more efficient, allowing the construction of *macros* for the first time.
Since TECO's birth in 1962, a body of macros had been accumulating and evolving, leading Richard Stallman and others around 1975 to create a unified set of macros that created the illusion of a window onto the text itself, with editing commands producing immediate visible results-- ultimately dubbed the Emacs word-processor, later rewritten by Stallman in a dialect of LISP. Because Emacs originated as macros, it embodied from the first a unique quality of *infinite extensibility* that makes it the text-hacker's choice to this day (see below).
[Les Earnest on early wps]
<URL:http://maelstrom.stjohns.edu/CGI/wa.exe?A2=ind9608C&L=cyhist&P=R2192>
[Links for history of MS Word]
<URL:http://www.wordinfo.com/links/other.htm>
[Emacs code to emulate TECO, with language summary]
<URL:ftp://ftp.mindlink.net/pub/teco/usc-archive/gnuteco.el>
[TECO macro (cf line noise) for calculating pi]
<URL:http://wwwis.cs.utwente.nl:8080/~faase/Ha/Pi_TECO_macro.html>
[Multics Emacs history]
<URL:http://www.lilli.com/mepap.html>
Unix, invented in 1969, quickly became the preferred platform of serious hackers, because it encouraged the development of software tools that worked together in a highly customizable way. Always brilliant, these were often user-interface disasters, simply because they assumed a programmer-mindset in their users. (One example: unreadable 'man pages' for online documentation.) Curiously, the first recorded use of Unix (after the developers themselves) was a word-processing system for patent applications.
Floppy disks, introduced by IBM in 1971, gradually made word processing more practical by allowing etexts to be easily archived and transported. It was in 1971 that Michael Hart conceived Project Gutenberg, to use the power of electronic texts for public education, setting the goal of 10,000 texts offered by the year 2000. Wang, who'd scored a great success with a typesetter in 1964, and followed up in 1971 with a dedicated word processor with a 133k tape, added disk storage and a CRT in 1976-- as did IBM. (~$20k?) A textbook, "Word Processing" by Rosen and Fielden, was published in 1977. Qyx, IBM, and Olivetti introduced all-electronic daisywheel typewriters in 1978. And by 1980, IBM's Displaywriter system included spellchecking.
[A brief history and overview of word processing]
<URL:http://www.css.msu.edu/WordProcessing.html>
Don Lancaster had described a "TV Typewriter" for hobbyists in 1973, but the first microcomputer word processor, Michael Shrayer's Electric Pencil, wasn't created until 1976, for the Altair. Audio tape cassettes were a popular form of storage. It was quickly followed by a vast field of competitors experimenting with new features. The Apple ][ was limited by offering only upper case and 40 columns from 1977 to 1980?. WordStar for CP/M premiered in 1979 with intricate control-key combinations that were beloved by touchtypists who could remember them. WordPerfect for the Data General also premiered in 1979 and began setting a new standard for attentiveness to the desires of its users, achieving spectacular success when it was ported to the PC in 1984.
['Pete' Peterson has written a revealing history of WordPerfect called "Almost Perfect"]
[Capt Crunch on the creation of EasyWriter]
<URL:http://www.vcomm.net/~crunch/ibmstory/>
[AppleWriter and AppleWorks, detailed descriptions]
WordPerfect was one of the first wp's for micros to allow text to be inserted anywhere in the document, with the text below it being (almost) immediately reformatted on the screen. (Applewriter 2 in 1981 may have been an earlier precursor.) Wang, and its popular software clone MultiMate, insisted on opening up a special insertion-area, even for a single-letter insertion. WordPerfect also offered a 'Reveal Codes' mode that showed hidden formatting codes (a feature that ought to be universal by now).
Probably the most important factor in choosing a word-processor in this period was how its output looked: could it do proportional spacing, etc. This depended on the combined capabilities of the video display and the printer. Epson introduced its popular MX-80 dot matrix printer in 1978, introducing a graphic standard that made fonts and faces possible (but only slowly!) under software control.
IBM's first PCs had offered business users only monochrome text, no graphics, so while 'on-screen formatting' was possible (indents, centering, line- and page-breaks), WYSIWYG was not, at first (What You See on screen Is What You Get on the printer: fontfaces, sizes, and styles). Improving printers inspired the first etext WYSIWYG, requiring video displays that supported high-resolution bitmapped graphics, and (in the mid-80s) requiring major rewrites of word processor code to support PostScript output. Xerox's Alto computer had supported WYSIWYG text since 1974 in its wp Bravo, which led to the 'modeless' BravoX, whose designer Charles Simonyi developed Microsoft Word for the PC in 1983. MacWrite debuted with the Macintosh in 1984, a year that also saw the HP Laserjet, followed by Pagemaker and Apple's LaserWriter in 1985.
Toshiba's 24-pin printers briefly set a new standard, replaced by much quieter and cheaper inkjet technology in 19. The proliferation of incompatible printer-control codes was a continuing headache for early word-processor programmers-- WordPerfect's success was partly due to their outstanding effort to support every known printer.
[Newsgroups]
<URL:news:comp.periphs.printers>
<URL:news:comp.laser-printers>
<URL:news:comp.fonts>
In the era before the PC, every sort of word processing platform had a different, incompatible file format, and converting from one to another was an enormous headache. The success of the IBM PC after 1981 briefly made the 360k, 5.25" floppy a reliable media-standard, but formatting codes remain only minimally standardized to this day.
[A partial inventory of incompatible tape and diskette formats]
<URL:http://www.shaffstall.com/prod_iii.html>
[an exhaustive list of file formats]
<URL:http://www.pivar.com/form/index.html>
integrated software 1983
While the mouse-based click-and-drag GUI was a splendid advance for most applications, for word processing it has serious unresolved problems: it's just too hard to accurately mark the start and end of a text passage with the mouse!
Microsoft Windows
Windows Help
Mac balloon help
annotated interface
------------------------------
Subject: 1d. etext history: networking, WYSIWYG, and the future
Since the first computers stored files on punchcards and paper tape, protecting their privacy was not an electronic issue. But as magnetic storage on tape and disk became economical, and as time-sharing operating systems were developed, individuals had to have passwords to protect their privacy. But there was also the possibility of shared files that anyone could add comments to (possibly evolving from a graffiti-like academic tradition of written debate on bulletin boards). This led quickly to the idea of local electronic 'bulletin boards' which evolved by 1979 into the Usenet newsgroup protocol.
'threading' = computer multitasking (mid-50s) *or* hypertext
Mortimer Adler's "Great Conversation" views library as threaded hyperdocument
topic-drift, subjectlines
file-hierarchies
Simple though it may seem, private email is actually considerably more complex than public shared-files, and appeared only after 1970. It would have been possible to transfer private files via public spaces, but only with the risk of interception. It would be easy to allow all users to place files in others' private directories (write-only access), but this places burdens of file-management on the recipient that make various sorts of mailbombing too easy, so email required that users be able to place the private file in a separate location with its own file-management, that could be accessed at the recipient's convenience.
Systems that allowed many simultaneous users at terminals allowed 'talk'. The first commercial modems in 1965 extended this capability over phonelines: "The Community Memory Project was very much about computer-mediated communication, but the emphasis was on placing terminals in a rich social space. The first terminal (a teletype in a cardboard box) had a full time-human attendant (barker & trainer) and frequently a line waiting to use noisy thing. The result was an interesting mix of cyberspace with flesh and blood, where the biological space was usually dominant. There were some fascinating exceptions to this which were the CM crew's first hints of the way Gibson's poorly imagined, but very accurately tagged "consensual hallucination" would develop."
Possibly the greatest feature inherent in etext is its *searchability*. Searchable shared read-only file access goes back all the way to 1954, when an etext search service was offered by the Naval Ordnance Test Station at China Lake, California. In 1960 a database of medical literature called MEDLARS was launched. Search also allows easy counting and sorting: in 1967 the 'Brown Corpus' used etext to tabulate the frequencies of a million words of text (50k different words, most frequently: the-of-and-to-a-in-that-is-was-he). The Bible was a leading subject for early computer text analysis.
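The counting-and-sorting idea is easy to try at small scale. What follows is only a minimal Python sketch, not the method used for the Brown Corpus: the sample sentence, the function name word_frequencies, and the rule for what counts as a 'word' are all my own assumptions for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    # Assumption: a 'word' is a run of letters, optionally with
    # internal apostrophes (so "don't" counts as one word).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

# A made-up sample; a real corpus would be read from a file.
sample = "The cat sat on the mat. The dog sat on the cat."
freq = word_frequencies(sample)

# List the words from most to least frequent.
for word, count in freq.most_common(3):
    print(word, count)
```

Even this toy version shows why function words like 'the' rise to the top of any frequency table.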
Scifi had been portraying humanoid robots for decades, and machine translation was expected to deliver quick returns, but instead revealed the vast hidden complexities of ordinary language.
Computer programs to offer text interactively, simulating a human conversation, have their roots in the first limited-audience computer-based education dating from 1960. Eliza dates from 1966, the first adventure games from 1976.
"My impression is that the Plato system was the first were a sense of cyberspace developed and then probably Murray Turoff's EIES system in NJ. I am not sure of the dates for USENET and it's news groups, but the size of its community in the late 70's makes me believe it was the first city constructed in cyberspace."
1972: "PLATO IV got going on a really big mainframe Cyber-series machine, simultaneous terminals at physically different locations drove the development of online communication systems such as Notes, talkomatic, and the wealth of multi-player interactive games which really were PLATO's greatest achievement."
Also around this time were the very early interactive e-video-games, including checkers, chess, tic-tac-toe, and Spacewar, whose graphic display allowed players to pilot their spaceships around the screen in realtime. (The mouse was invented in 1965, vastly increasing the sense of *gesture* in communicating with the computer. Sutherland's drawing program 'Sketchpad' had used a lightpen in 1962. So programmers were now faced with the challenge of creating a simulated, graphical world that persuades the intuition to relax control via gesture.)
Networking (since the SAGE air-defense system in the late 50s) led to innovation in e-society: commercial modems (300 baud) for timesharing around 1965 allowed teletypes to talk over phone lines, followed by PLATO in the mid-60s, ARPANET from 1969, email from about 1970, Ethernet from 1973, the first BBS from 1978, Hayes' first 300 baud modem, CompuServe, and Usenet newsgroups in 1979, 1200 baud in 1980, the French Minitel network and BITNET from 1981, the Internet from 1983, and 2400 baud in 1984. Email and newsgroups on the Internet, dominated by technical wizards who were outsiders to traditional social conventions, created a new sort of communication, quickly extending far beyond tech talk, and establishing its own standards of argument (and humor :^). Net-wide decisions were made using an unusually open procedure based around RFCs.
The next evolution was networked computers, where continuous contact is made possible by dedicated lines. "...a strong case for the Ethernet at Xerox PARC, circa 1973, as being the first system that was ubiquitous in an organization and that was net-centered, rather than mainframe/timesharing centered."
1963: American Airlines SABRE System (first airline reservation system)
"Many of the features of rich computer-mediated communication--communication over short and long distance; a very large user community; multiple venues for communication; the use of handles for anonymity; support for extended communication by email; etc.--were present in the early- and mid-1970s on CDC Cyber computers through the use of TALK (popularly known as "X,TALK") and PPC (written by Mike Huck?). X,TALK and PPC were forerunners to IRC. One of the home-brew email systems was called +WRITE+ and was written by someone with the handle Aragorn (I think)."
"One of the projects at PCC was a program called "Public Caves". It was the ancient ancestor of today's "MUD"s -- Multi-user Dungeons. There were knockoffs of this program at Berkeley around 1980, and I did an implementation..."
1979: IBM's Audio Typing Unit offers text-to-speech
talking dolls
Eliza: 1966
TI Speak'n'Spell
AI offered parsing of syntax, but 'grammar checkers' based on this technology proved mostly useless.
The Text Encoding Initiative
RTF
[origins of SGML]
mailing lists SFLOVERS 1979
netnews, Canter and Siegel, spam, alt.*
fax
Kurzweil's OCR from 1976
barcodes
voicemail, digital tv
pagers, GPS
laptops
1984 saw the publication of Hans Gabler's computerized edition of James Joyce's Ulysses, attempting to make sense of a vast body of typos and modifications, which at first overawed the critics, but gradually came to be rejected as deeply flawed by false theories and imperfect execution.
MS Bookshelf on CD-ROM, and HyperCard, in 1987.
1988: Tetris released
the surprise success of the Internet via the WorldWide Web
incompatibility of HTML and 80a
academics miss the boat by attempting status-display, posting jargon, PostScript (3 Mb for 30 pages!)
AOL and Netscape ignore 80-column convention
Microsoft's character set
1991: Lotus Development abandons plans for Lotus MarketPlace
1991: The ban on business is lifted on the Internet?
1993: Newton PDA, MS Encarta multimedia encyclopedia
Pico
streams:
input streams: telegraph key, teletype kb, paper tape, cardpunch, joystick, mouse, ocr scanner
display streams: telegraph receiver, teletype printer, cardpunch, lineprinter, crt, plotter, 9-pin dot-matrix, 24-pin dot matrix, laser, inkjet
transmission streams: modems 110, 300, 1200, 2400, 14.4k, 28.8k, 56k
volatile storage: relay, mercury tube, magnetic core, vacuum tube, transistor
archival storage: paper tape, punched card, magnetic tape, hard drive, floppy drive, ROM (game cartridges, often called 'tapes'), bubble memory, cd-rom
------------------------------
Subject: 2. character sets: ASCII and Unicode
character sets Morse: Baudot: TTS: ASCII-US EBCDIC: ISO 8859-1:
ASCII table:
Only the numbers from 32 to 126 (20 to 7E hex) are defined as *printable* characters (the others are defined as control codes):
      0 1 2 3 4 5 6 7 8 9 A B C D E F
    =--------------------------------
  2 |   ! " # $ % & ' ( ) * + , - . /   <- 20 hex is the
  3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ?      blankspace
  4 | @ A B C D E F G H I J K L M N O
  5 | P Q R S T U V W X Y Z [ \ ] ^ _
  6 | ` a b c d e f g h i j k l m n o      7F is non-printing
  7 | p q r s t u v w x y z { | } ~     <- in the US ("rubout")
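For anyone who would rather regenerate the chart than trust a pasted copy, here is a minimal Python sketch. The function name ascii_rows and the exact column spacing are my own choices, so treat the layout as illustrative only.

```python
def ascii_rows():
    """Return the printable ASCII chart as a list of text rows."""
    # Header row: the low hex digit of each character code.
    rows = ["    " + " ".join("0123456789ABCDEF")]
    for high in range(2, 8):
        cells = []
        for low in range(16):
            code = high * 16 + low
            # Only 32..126 are printable; 7F hex (127) is shown as a blank.
            cells.append(chr(code) if 32 <= code <= 126 else " ")
        rows.append(f"{high:X} | " + " ".join(cells))
    return rows

for row in ascii_rows():
    print(row)
```

The high hex digit labels each row and the low digit each column, so 'A' at row 4, column 1 is 41 hex (65 decimal).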
Unfortunately, this narrow standard ignored the needs of many other cultures:
the British 'pound' sign, letters with accents in French and Scandinavian
alphabets, etc., which led each to introduce slight, incompatible
modifications to the standard.
Though not without its problems, ASCII remains today almost universally accepted as the common basis for encoding the roman alphabet. When other characters are required, they can be included by substitution or extension. For example, utilising the eighth bit allows 128 more characters to be defined, though many different conflicting schemes have been implemented, by IBM, ANSI, Commodore, Atari, etc. (Through the devious efforts of SubGeniuses Matt Householder and Candi Strecker, the Atari 800 character set even included the face of Bob Dobbs, split between two neighboring values.)
The dominant standard, which supports most European character sets, is called ISO 8859-1. The WWWeb universally (?) supports this standard. It adds: no-break space, soft-hyphen, inverted exclamation and question marks, cents, pounds, yen, currency-sign, broken bar, section-sign, macron, some fractions, degree, copyright, registered, not-sign, plus-minus, some superscripts, micro, pilcrow, middle dot, multiplication and division signs, various letters with diacritical marks: grave, acute, and circumflex accents, tildes, diaeresis, ring-above, oblique bar, cedilla, stroke, and extra letters: eth, thorn, sharp s, and the ae ligature.
[a superb overview of character sets]
<URL:http://daffy.robelle.com/smugbook/char.html>
[An informative FAQ about internat'l 8-bit standards, biased towards Unix]
<URL:http://www.vlsivie.tuwien.ac.at/mike/i18n.html>
[various ASCII charts]
<URL:http://www.tbi.net/~jhall/ascii1.html>
[GIFs of IBM's extended ASCII symbols]
[more extended ascii tables]
<URL:http://ssrl.rtp.com:443/library/Data_Formats/ASCII/>
An ambitious attempt to extend the system to 16 bits (65,536 possible values, 34,168 actually defined) covering all the alphabets of the world, and all the most important special symbols, is called Unicode, already adopted in Windows NT. (By a corollary to Godel's theorem, however, artists will always find ways to subvert such efforts towards completeness.)
[Unicode overview]
<URL:http://www.asca.com/unicode.html>
[Unicode stuff]
[an amazing 440k text-only description of the Unicode character set]
[a critique of Unicode for Chinese]
[a terrifying 42k list of officially recognized international character sets]
<URL:http://www.isi.edu/in-notes/iana/assignments/character-sets>
[a terrifyingly complex utility for converting among them]
<URL:http://sizif.mf.uni-lj.si/linux/cee/recoding.html>
[some frivolous unofficial Unicode sections for creations like Dr Seuss, Tolkien, Ferengi and Klingon]
[Unicode links]
While Unicode retains the conventional namings for the ASCII characters from 0 to 127, it demands twice as much storage to express them: 16 instead of 8 bits. And for historical reasons, the Internet has traditionally respected a *seven-bit* limit for most transactions, so eight-bit schemes have an uncertain future. (When transferring files, the 'binary' setting is required for eight-bit files, while 'text' assumes seven-bit ASCII.) So schemes that work within the seven-bit ASCII character set, however inelegantly, should be explored-- for example, representing diacritical marks with extra punctuation before or after the affected base-letter: Go:del's theorem. (Again, I believe it's a mistake to demand too much elegance, completeness, or even perfect consistency. A degree of makeshift is often the most appropriate course.)
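As a rough illustration of the 'extra punctuation' idea, here is a minimal Python sketch that rewrites accented letters as a base letter followed by an ASCII mark. Only the colon-for-diaeresis convention comes from the example above; the rest of the MARKS table and the function name to_seven_bit are hypothetical, and a real scheme would need to be agreed on by its users.

```python
import unicodedata

# Hypothetical mapping from combining marks to ASCII punctuation.
# Only the colon for the diaeresis follows the FAQ's own example.
MARKS = {
    "\u0308": ":",   # diaeresis
    "\u0301": "'",   # acute accent
    "\u0300": "`",   # grave accent
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0327": ",",   # cedilla
}

def to_seven_bit(text):
    """Rewrite accented letters as base letter plus trailing punctuation."""
    out = []
    # NFD splits each accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Anything without a mapping becomes '?', so output stays 7-bit.
            out.append(MARKS.get(ch, "?"))
    return "".join(out)

print(to_seven_bit("G\u00f6del's theorem"))
```

The scheme is deliberately makeshift: it is ambiguous wherever the punctuation already appears in the text, which is the price of staying within seven bits.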
------------------------------
Subject: 3. Internet etext standards
The Internet seven-bit ASCII standard has shaped a variety of emergent standards since ARPANET in 1969, evolving towards an optimally universal format for communicating to a very broad range of hardware display capabilities:
Software developed in ignorance of these conventions (especially by Microsoft, where Internet connections were forbidden for many years as a security risk) produces characteristic problems: lines that stretch beyond the 80th column, 'curlyquotes' that may display like PthisQ instead of "this", or ANSI graphics characters that may display in any number of ways. And a special problem is that Microsoft chose to break text lines with both a CR and an LF (carriage return and linefeed), causing them to display on the Net with a "^M" at the end of each line. (All word processors ought to have a 'Net preview' mode that reveals these in advance.) Internet convention also recommended that character sets leave values 128-159 unused because they're transformed into semi-dangerous control codes 0-31 if the Net strips their eighth bits.
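A 'Net preview' check of this kind is not hard to sketch. The Python below is only my guess at what such a mode might report, with hypothetical function names net_preview and strip_eighth_bit; it flags over-long lines, stray CRs, and characters outside printable ASCII, and it simulates a seven-bit channel to show how 128-159 collapse into control codes.

```python
def net_preview(text, width=80):
    """Return warnings about features that may display badly on the Net."""
    warnings = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A CR left before the LF is what shows up as ^M.
        if line.endswith("\r"):
            warnings.append(f"line {number}: trailing CR (would show as ^M)")
            line = line[:-1]
        if len(line) > width:
            warnings.append(f"line {number}: {len(line)} columns, over {width}")
        # Report the first character outside printable ASCII (tabs allowed).
        for ch in line:
            if ord(ch) > 126 or (ord(ch) < 32 and ch != "\t"):
                warnings.append(f"line {number}: non-ASCII or control code {ord(ch)}")
                break
    return warnings

def strip_eighth_bit(data):
    """Simulate a seven-bit channel by clearing the top bit of each byte."""
    return bytes(b & 0x7F for b in data)

print(net_preview("ok\r\n" + "x" * 81))
print(list(strip_eighth_bit(bytes([128, 159]))))
```

The second call shows the arithmetic behind the convention: clearing the eighth bit subtracts 128, so 128 becomes 0 and 159 becomes 31.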
Unix's case-sensitivity has led to dilemmas regarding the usual rules of capitalization: is it go2net or Go2Net? filenames vs document titles (vs document summaries) John_-_Winston johnwinston john.winston
-> ^ | arrows <- | V
[Smiley FAQ]
<URL:http://www.newbie.net/JumpStations/SmileyFAQ.html>
[an outdated inquiry into the CR/LF/^M problem]
[Newsgroup: comp.mail (.*)]
<URL:news:comp.mail>
[Newsgroup: alt.comp.blind-users]
<URL:news:alt.comp.blind-users>
[Newsgroup: comp.speech]
<URL:news:comp.speech>
[FAQ-format FAQ]
<URL:ftp://rtfm.mit.edu/pub/usenet/news.answers/faqs/minimal-digest-format>
[Newsgroup: news.answers (FAQs)]
<URL:news:news.answers>
The conventions of monospaced fonts with the ascii character set have allowed a degree of experimentation with ascii-art and ascii page-layout, most commonly in the login screens of BBS's. Decent tools for this have yet to be written, but there is one fine program called Figlet that offers dozens of enlarged ascii fonts.
[Figlet home page]
<URL:http://st-www.cs.uiuc.edu/users/chai/figlet.html>
[Newsgroup: rec.arts.ascii]
<URL:news:rec.arts.ascii>
[Newsgroup: alt.ascii-art]
<URL:news:alt.ascii-art>
[Newsgroup: alt.ascii-art.animation]
<URL:news:alt.ascii-art.animation>
------------------------------
Subject: 4. Programming conventions
The close association between the Internet, Unix, and the C programming language has led to some of their notations becoming common parlance, eg "*" and "?" as wildcards, and "!=" for not-equals (instead of the more common "<>"). It's unfortunate that there's so little standardization even in the common conventions of programming languages, eg the delimiters for comments:
C: /*comment*/
Fortran: ! comment
Pascal: {comment}
Ada: -- comment
LISP: ; comment
Perl: # comment
PostScript: % comment
RTF: \comment comment
HTML/SGML: <!-- comment -->
In another direction, the 'literate programming' movement is trying to
embed sourcecode within the prose of its own documentation.
Programmers have also led the way in studying the problems of version-control, tracking the history of revisions in a document, with conventions for annotating changes and utilities for analysing the differences between two similar documents. Groupware extends this to track multiple authors and editors. (Lotus Notes probably dominates this field.)
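To give a feel for what a difference-analysing utility reports, here is a minimal sketch using the difflib module from Python's standard library. The two 'drafts' are invented sample lines, and the labels draft1 and draft2 are placeholders.

```python
import difflib

# Two invented versions of the same short document, one line per entry.
old = ["The quick brown fox", "jumps over the lazy dog."]
new = ["The quick red fox", "jumps over the lazy dog."]

# unified_diff marks removed lines with '-', added lines with '+',
# and unchanged context lines with a leading space.
diff = list(difflib.unified_diff(old, new,
                                 fromfile="draft1", tofile="draft2",
                                 lineterm=""))
for line in diff:
    print(line)
```

Version-control systems store or display changes in roughly this form, which is also a reasonable plain-ascii convention for annotating revisions by hand.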
Bibliographers have begun exploring conventions for annotating the evolution of manuscripts. In my own 'genetic' studies of James Joyce, I use pure ascii:
?uncertain   *** unreadable   <insertion>   [deletion]   | linebreak   || pagebreak
[Newsgroup: comp.groupware]
[Compact Composer (CoCo), a system for encoding music as ASCII]
<URL:http://platine.ulb.ac.be/MUS/syntax.html>
------------------------------
Subject: 5. Structural markup
The Text Encoding Initiative (TEI) uses a different approach, based on SGML's system of <emphasis>tags</emphasis>, also used for HTML. These files must be processed/translated before posting them to the Internet, and it's unclear to me whether AI and NLP have progressed far enough to choose tags that are really useful.
SGML was formalized in 1986 with the goal of enabling *structural* as opposed to graphical markup, with tags designating abstractions like <title> instead of formats like 24-point Palatino bold.
[SGML history by Goldfarb]
<URL:http://www.sil.org/sgml/sgmlhist0.html>
The opposite pole is represented by RTF, PDF, PostScript, and TeX/dvi. These try to offer fine control over font faces, sizes, styles and colors, margins, centering and justification, paging, tables, embedded images, etc.
HTML3 tries to offer a compromise in the form of stylesheets that will allow the author to *suggest* the formatting... but I'm unconvinced there's much gain from the intervening layer of structural markup.
SGML's malign effect on HTML: Containers for anchors and paragraphs no whitespace control, no pagebreaks
The primary competing standards currently are: HTML, SGML, RTF, PDF, PostScript, and TeX/dvi. The many advantages these offer, though-- in formatting, hyperlinks, and graphics-- could easily be captured in footnotes at the end of each 80a document, allowing this single file, without modification, to be posted to netnews, or emailed, or transmitted via telnet, or fingered in a .plan file, or (in theory) published via the WWWeb.
[Newsgroups:]
<URL:news:comp.text.sgml>
<URL:news:sci.lang>
<URL:news:sci.lang.translation>
<URL:news:alt.comp.linguistics>
<URL:news:comp.speech>
<URL:news:comp.ai.doc-analysis.misc>
<URL:news:comp.ai.nat-lang>
------------------------------
Subject: 6. 'Procedural' markup
RTF
PDF
PostScript
TeX/dvi
troff
setext
doc, wpd
formatted text
html standard (comp.infosystems.www.authoring.*, alt.html)
fonts, styles
paging, margins
images
RTF specs in txt format (!??) and various RTF tools
Ian Feldman's setext model for nonintrusive markup.
Enriched text format
A gov't clearinghouse of standards for info interchange.
[Clearinghouse for links to data-format info]
<URL:http://ssrl.rtp.com:443/library/Data_Formats/>
[Newsgroups]
<URL:news:comp.text>
<URL:news:comp.text.tex>
<URL:news:comp.text.pdf>
<URL:news:comp.text.desktop>
<URL:news:comp.text.frame>
<URL:news:comp.text.interleaf>
<URL:news:alt.aldus.pagemaker>
<URL:news:comp.lang.postscript>
<URL:news:comp.publish.prepress>
------------------------------
Subject: HTML
[Newsgroup: comp.infosystems.www.authoring.html]
<URL:news:comp.infosystems.www.authoring.html>
[Newsgroup: alt.html (.*)]
<URL:news:alt.html>
------------------------------
Subject: 7. File conversion and compression
compression
[Newsgroup: comp.compression]
<URL:news:comp.compression>
A very thorough compression FAQ.
FTPable decompression utilities for various platforms (MS-DOS, Mac, Windows, Unix, etc) for these filetypes: arc, ark, arj, base64, bck, cpt, ddi, exe, F, gif, gz, ha, hap, hpk, hqx, jam, lha, lzh, MIME, pak, pit, pp, ?q?, rar, sea, sdn, shar, sit, sqz, tar, tar.Z, tar-z, taz, tar.gz, tgz, tar-gz, tar.z, td0, uc2, Y, z, Z, zip, zoo, and ??_
A shorter page for arc, arj, hqx, cpt, gz, lha, lzh, MIME, shk, sit, tar, uu, z, Z, zip, zoo
[Newsgroup: alt.comp.dataconversion]
<URL:news:alt.comp.dataconversion>
[Newsgroup: alt.comp.compression]
<URL:news:alt.comp.compression>
formats
basic
.zip
.hqx
.tar
.Z
The idea of structural markup is to facilitate complex info-retrieval tasks, but this area is just beginning to be explored with regard to prose (rather than simple text-fields in databases). It's not at all certain that the TEI's categories will prove to be the most useful ones.
------------------------------
Subject: 8. Word-processors
[A good Nisus page]
<URL:http://www.mcelhearn.com/nisusmain.html>
[Mark of the Unicorn, Inc.]
<URL:http://www.motu.com/>
[Applewriter history]
<URL:http://www.hypermall.com/History/AH18.html>
[AppleWorks history]
<URL:http://www.hypermall.com/History/AH19.html>
[a thorough Wordstar page]
<URL:http://www.e-z.net/~paul/wswin/>
[A very thorough WordPerfect Page]
<URL:http://raf.rutgers.edu/dmitriy/wordperfect/wp.htm>
[A short hotlist of MS Word pages]
<URL:http://www1.tpgi.com.au/users/twhelan/otherwp.html>
[Detailed 1995 comparison of Mac's Word, WordP, and Nisus]
<URL:http://www.macworld.com/pages/march.95/Feature.449.html>
[How Microsoft Word Handles Formatting]
<URL:http://www.knowhow.com/wwfmt1a.htm>
[Large segmented hotlist of MS Word links]
<URL:http://www.wordinfo.com/links/default.htm>
[A clearinghouse of wp-related resources]
<URL:http://www.in.net/~smschill/softcomm.htm>
[Lots of word-processor links]
<URL:http://www.hic.net/goliad/wordbk.htm>
[A Mac word processing page]
<URL:http://www.astro.nwu.edu/lentz/mac/software/wp.html>
macro languages
[an AltaVista pattern for finding wp macros]
<URL:http://www.altavista.digital.com/cgi-bin/query?pg=q&q=%2bmacros+%2bword+%2bdownload>
Deja News Author Profile on dski@cameonet.cameo.com.tw (Dan Strychalski)
WPCORP-L@LISTSERV.ACSU.BUFFALO.EDU - Archives
Defense of WordStar
detailed MacWordPerfect review
[PCWorld's wp tips]
<URL:http://www.pcworld.com/software/word_processing/>
[Windows magazine's wp reviews]
<URL:http://www.winmag.com/ibg/wp/wprefs.htm>
[Yahoo's wp page]
<URL:http://www.yahoo.com/Business_and_Economy/Companies/Computers/Software/Desktop_Publishing/Word_Processing/>
[GNU Emacs Lisp reference manual]
<URL:http://funnelweb.utcc.utk.edu/~harp/gnu/elisp/elisp_toc.html>
[Matt Neuburg's paean to Nisus]
<URL:http://www.ssrc.hku.hk/tb-issues/TidBITS-116.html>
[A paean to word processing]
<URL:http://www.tcp.ca/1996/96June/96JuneEd/Edletter/Edletter.html>
[Newsgroups]
<URL:news:comp.editors>
<URL:news:alt.comp.editors.batch>
<URL:news:comp.text>
<URL:news:comp.emacs>
[Newsgroup: gnu.emacs (.*)]
<URL:news:gnu.emacs>
<URL:news:alt.lucid-emacs.help>
<URL:news:comp.lang.perl>
<URL:news:comp.lang.icon>
<URL:news:comp.lang.snobol>
<URL:news:alt.lang.teco>
<URL:news:alt.religion.emacs>
<URL:news:bit.listserv.techwr-l>
<URL:news:bit.listserv.wpcorp-l>
<URL:news:misc.writing>
------------------------------
Subject: 9. Info-retrieval
The domain of info-retrieval ranges from simple searches in popular word processors, to the much more sophisticated 'grep' facility introduced in Unix, to word-processor macro languages, to Personal Information Managers (PIMs) that allow freeform personal textbases, to WWWeb search engines like AltaVista that index every word in tens of millions of webpages, to natural language understanding (NLU) projects that try to extract the underlying meaning from ordinary prose.
GREP:
[A tutorial on regular expressions]
<URL:http://www.lib.uchicago.edu/keith/tcl-course/topics/regexp.html>
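As a taste of what grep-style searching looks like, here is a minimal Python sketch built on the standard re module. The function name grep and the sample lines are my own; Unix grep itself has many more options and a slightly different pattern dialect.

```python
import re

def grep(pattern, lines):
    """Return (line number, line) pairs that match the regular expression."""
    regex = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, start=1)
            if regex.search(line)]

text = ["Subject: 2. character sets",
        "ASCII table:",
        "Subject: 3. Internet etext standards"]

# Find numbered section headers: 'Subject: ' at the start of a line,
# then one or more digits and a period.
for n, line in grep(r"^Subject: \d+\.", text):
    print(n, line)
```

The pattern describes a *shape* of text rather than a fixed string, which is what lifts grep above the simple searches in popular word processors.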
The AltaVista search engine overthrew old concepts of info-retrieval by offering a free, fast concordance of a huge portion of the WWWeb. Its designers made many thoughtful choices as well, to offer the fastest and most flexible retrieval possible. Apparently their index:
This last makes it very easy for individual sites to piggyback a 'private' concordance of their own pages by offering a direct link to AltaVista that includes a preset value for 'host:'. (Similar tricks might turn AV into a selective concordance just for, eg, the Net's literary etexts.)
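The piggyback trick amounts to building a query address with 'host:' already filled in. The Python sketch below is an assumption-laden illustration: the base address and the pg/q parameters are copied from the AltaVista link in section 8 and may no longer work, while www.example.com and the function name site_search_url are placeholders of mine.

```python
from urllib.parse import urlencode

# Base address copied from the AltaVista link in section 8; the service
# and its parameters may have changed, so treat this as illustrative.
ALTAVISTA = "http://www.altavista.digital.com/cgi-bin/query"

def site_search_url(host, words):
    """Build a query address restricted to one host via 'host:'."""
    # Each search word is prefixed with '+' to mark it as required.
    query = " ".join([f"host:{host}"] + [f"+{w}" for w in words])
    # urlencode escapes ':' and '+' and turns spaces into '+'.
    return ALTAVISTA + "?" + urlencode({"pg": "q", "q": query})

# www.example.com stands in for a real site's hostname.
print(site_search_url("www.example.com", ["joyce", "ulysses"]))
```

A site would embed such an address in a link or form, leaving only the search words for the visitor to supply.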
As useful as this is, it conspicuously lacks the ability to track down, eg, a biographical sketch of a particular celebrity, because there's no reliable sequence of words that uniquely characterizes biographical sketches! This general challenge is being explored by many different groups in many different ways, but it centers on the primary unsolved puzzle of artificial intelligence (AI): finding a neat universal map of human concepts. So a quick solution is unlikely.
[Newsgroup: comp.theory.info-retrieval]
<URL:news:comp.theory.info-retrieval>
alt.internet.search alt.fan.dejanews
The Icon Programming Language
Introduction to Text Analysis
Bible Analysis for Scholars
------------------------------
Subject: 10. Scanning/OCR
A complementary process to document search is the correction of inputting errors, eg spellchecking. Even without the benefit of proofreading AI, optical character recognition (OCR) software has progressed far enough in the last few years to scan printed documents into text files with 99% accuracy, so long as they're clearly printed. As the AI continues to improve, OCR programs should become better at guessing ambiguous words, and eventually this should allow them to transcribe even spoken language into accurate copy.
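Guessing at a misread word can be approximated, very crudely, by looking for the nearest entry in a wordlist. This Python sketch uses difflib.get_close_matches from the standard library; the tiny vocabulary, the 0.7 cutoff, and the function name suggest are arbitrary assumptions, and real OCR packages surely use richer models.

```python
import difflib

def suggest(word, vocabulary, cutoff=0.7):
    """Return the closest known word, or the word itself if none is close."""
    if word in vocabulary:
        return word
    # get_close_matches ranks candidates by string similarity (0.0 to 1.0).
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# A made-up wordlist; a real one would hold tens of thousands of entries.
vocabulary = ["modern", "scanner", "printed", "document"]

# 'rn' misread for 'm' is a classic scanning confusion.
print(suggest("docurnent", vocabulary))
```

Similarity alone cannot choose between two equally close real words, which is where the hoped-for AI would have to weigh the surrounding context.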
Scanning tips from a commercial site
[Newsgroup: comp.ai.doc-analysis.ocr]
<URL:news:comp.ai.doc-analysis.ocr>
[Newsgroup: comp.periphs.scanners]
<URL:news:comp.periphs.scanners>
[Newsgroup: alt.comp.periphs.scanners]
<URL:news:alt.comp.periphs.scanners>
------------------------------
Subject: 11. Epublishing: archives, etc
The body of etexts freely available on the Internet is now growing exponentially. It began with Project Gutenberg in 1971, which now offers about 1000 texts.
etexts
clearinghouse for etext sites
Project Gutenberg plain-vanilla etexts since 1971, now ~1000 etc
A convenient quick index to PG txt versions.
copyright
[A superb collection of essays on copyright in electronic publishing, from the Educom Review]
<URL:http://www.educom.edu/web/pubs/review/legPolIndex.html>
[Newsgroups]
<URL:news:alt.ezines>
<URL:news:comp.publish.cdrom>
<URL:news:comp.publish.electronic.misc>
<URL:news:comp.publish.electronic.>
<URL:news:alt.cdrom>
<URL:news:alt.cdrom.reviews>
------------------------------
Subject: 12. Hardware
------------------------------
Subject: 13. Hypertext
vannevar bush's memex, alan kay's dynabook, ted nelson's xanadu
conversion/ compression
hypertext (alt.hypertext)
ms help
display (comp.human-factors)
black-on-white?
handhelds
sony bookman
a huge, ugly well-linked but unannotated
bibliography on e-publishing
ocr (comp.ai.doc-analysis.ocr)
effectiveness, best packages
ezines
footernoise model
concordances (alt.internet.search, alt.fan.dejanews)
markup
conversion and compression
info-retrieval
word processors
scanning/ocr
spellcheck
keyboards, ergonomics, rsi
portables
etext archives
epublishing
copyright
publicity
status
being linked, linking
server games: SGML->HTML (UVa)
etext style innovation
hypertext
interactive fiction
etext history
vannevar bush's memex, alan kay's dynabook, ted nelson's xanadu
standards
ascii
conversion/ compression
hypertext (alt.hypertext)
ms help
[Newsgroups]
<URL:news:alt.hypertext>
<URL:news:comp.infosystems.www.authoring.site-design>
<URL:news:comp.multimedia>
<URL:news:alt.multimedia.director>
<URL:news:alt.multimedia.cu-seeme>
<URL:news:alt.multimedia.toolbook>
<URL:news:alt.authorware>
<URL:news:comp.sys.mac.hypercard>
[Newsgroup: rec.arts.int-fiction]
<URL:news:rec.arts.int-fiction>
[Newsgroup: rec.games.int-fiction]
<URL:news:rec.games.int-fiction>
[Newsgroup: alt.irc (.*)]
<URL:news:alt.irc>
[Newsgroup: alt.pub.callahans (alt.pub.*)]
<URL:news:alt.pub.callahans>
display (comp.human-factors)
black-on-white?
handhelds
sony bookman
[Newsgroup: comp.sys.palmtops]
<URL:news:comp.sys.palmtops>
[Newsgroup: comp.sys.pen]
<URL:news:comp.sys.pen>
[Newsgroup: soc.libraries.talk]
<URL:news:soc.libraries.talk>
archiving
Electronic Records
[A chatty intro to 'metadata' for indexing]
<URL:http://info.lib.uh.edu/pr/v6/n4/capl6n4.html>
Hermit's Electronic Publishing Page
Some near approaches:
[The Educom Review archives]
[PACS Review subject-index]
<URL:http://info.lib.uh.edu/pr/bysub.htm>
Voice of the Shuttle: Technology of Writing Page
BOBBI'S PLACE
Instructional Technology Courses
The History of Computers & The Internet
Fang's Media History Timeline
Timeline of Microcomputers
AN INCOMPLETE, BUT FASCINATING, HISTORY OF DIGITAL MEDIA
Probert E-Text Encyclopaedia
Network-Based Electronic Publishing of Scholarly Works: A Selective Bibliography
Educom-Educom Review
Graphical User Interface History
THE RISE OF THE GRAPHICAL USER INTERFACE
Visual Design for the User Interface, Part 1
The Virtual Community by Howard Rheingold: Chapter Three
THE HISTORY OF COMPUTING
Al's Xerox Workstation Collection
WordStar: A Writer's Word Processor
Suggestions for Software Designers
The Unofficial WordPerfect Web Site
The Unofficial Microsoft Word for Windows Home Page
KnowHow - How Microsoft Word Handles Formatting - for WordPerfect Escapees
CYHIST archives -- August 1996, week 3 (#33)