Jun 29, 2004

chaos in verse, anyone?

I was bored and typing random phrases into Google. Okay, googlewhacking, which was an addiction for three days once (I had a great list, but it's gone, daddy, gone).

What to make of this, then, when using "disordered poetry" as a search phrase:

The number of idols of death, or physical causes of Lipari, at most of Rome. The practice of charity and of the other. In the object of Plato, since, so disordered poetry, he afterwards corrected as success. As well as mute in consequence with less to the number. But it is chosen by such imperfect circumstances of the general scandal for the most eminent saints. How various and alarming signs of Cyprian, how shall find at last and jealousy. A temporary gospel, which neither their proportion.


It seems like a translation of a translation of Gibbon's Decline and Fall. Digging slightly deeper confirms this suspicion: "The text here is generated randomly using words from the famous Chapter XV of Gibbon's Decline and Fall of the Roman Empire."

To generate a sentence, the script first chooses an initial word and then chooses the subsequent words one by one, always taking the preceding word and the corresponding probability distribution into account. Eventually, one of the end-of-sentence punctuation symbols (such as ., ? and !) might be generated, in which case the sentence is complete. Alternatively, after a certain number of words has been generated, the script will try to choose, among the last few words, a suitable one where the sentence might end. It will try to choose a word that is the most likely to be found at the end of a sentence; or, if all of the last few words are very unlikely to appear at the ends of sentences, it will choose the word whose part-of-speech is the most likely to appear at the end of a sentence.
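For the curious, here is a rough Python sketch of what that description seems to amount to: a word-by-word chain where each next word is drawn according to how often it followed the previous one. This is my own guess at it, not the site's actual script. The names `build_model`, `end_scores` and `generate_sentence`, the 12-word limit and the 4-word lookback are all made up for illustration, and the part-of-speech fallback is left out entirely.

```python
import random
from collections import Counter, defaultdict

END_MARKS = {".", "?", "!"}

def build_model(tokens):
    """Count, for each word, how often each other word follows it."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers

def end_scores(tokens):
    """Fraction of a word's occurrences that are directly followed by an end mark."""
    total = Counter(tokens)
    at_end = Counter(p for p, n in zip(tokens, tokens[1:]) if n in END_MARKS)
    return {w: at_end[w] / total[w] for w in total if w not in END_MARKS}

def generate_sentence(followers, scores, rng, max_words=12, lookback=4):
    # Assumption: the initial word is picked uniformly from words that have followers.
    starts = [w for w in followers if w not in END_MARKS]
    words = [rng.choice(starts)]
    while len(words) < max_words:
        options = followers.get(words[-1])
        if not options:
            break
        # Draw the next word in proportion to how often it followed the previous one.
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt in END_MARKS:
            return " ".join(words) + nxt
        words.append(nxt)
    # No end mark came up: cut after whichever of the last few words
    # most often ends a sentence in the source text.
    tail_start = max(0, len(words) - lookback)
    best = max(range(tail_start, len(words)), key=lambda i: scores.get(words[i], 0.0))
    return " ".join(words[: best + 1]) + "."

# Toy corpus, with punctuation already split off as separate tokens.
tokens = "the cat sat . the dog sat . the cat ran !".split()
followers = build_model(tokens)
scores = end_scores(tokens)
print(generate_sentence(followers, scores, random.Random(0)))
```

Feed it the tokenized text of Chapter XV instead of the toy corpus and you would presumably get something in the spirit of the paragraph above.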

To generate a paragraph of text, the number of sentences in it is chosen randomly, and then the required number of sentences is generated. Each sentence is generated independently from the others; that is, no effort is made to produce sentences or paragraphs that appear "related". (Perhaps one might choose a few words for each page, and then increase their probability in the hope that they will occur more often on this page, and thus make the sentences appear more connected.)
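The paragraph step is simple enough to sketch too. Again, this is only my reading of the description, not the real script: `generate_paragraph`, `boost`, the 2-to-6 sentence range and the boost factor are hypothetical, and `toy_sentence` is a stand-in for whatever actually produces a sentence.

```python
import random

def generate_paragraph(make_sentence, rng, min_sentences=2, max_sentences=6):
    """Pick a random sentence count, then generate each sentence independently."""
    count = rng.randint(min_sentences, max_sentences)
    return " ".join(make_sentence(rng) for _ in range(count))

def boost(weights, topic_words, factor=3.0):
    """Return a copy of a word -> weight mapping with the chosen topic words made likelier."""
    return {w: wt * factor if w in topic_words else wt for w, wt in weights.items()}

def toy_sentence(rng):
    # Stand-in sentence generator, only here so the sketch runs on its own.
    words = ["idols", "of", "Rome", "and", "charity"]
    return " ".join(rng.sample(words, 3)) + "."

print(generate_paragraph(toy_sentence, random.Random(1)))
print(boost({"Rome": 1.0, "charity": 2.0}, {"charity"}))
```

The `boost` helper is the parenthetical idea: scale up the weights of a few per-page words before sampling, so they turn up more often and the sentences look a bit more connected.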


Too bad the random sentences aren't exactly grammatically correct; that algorithm needs a little tweaking.

This is definitive proof that linguists, especially those with computer aptitude, ought to be deprived of their free time.
