The Mormon Organon again welcomes guest blogger David H. Bailey! David is a researcher in the High-Performance Computational Research Department at the Lawrence Berkeley Laboratory in Berkeley, California. He is a leading figure in the field of high-performance scientific computing and has written over 100 scientific papers in that area, but to Mormon audiences he is best known for his insightful writings about Mormonism and science. Welcome, David!
A fundamental precept of evolutionary biology is that a combination of random variation and natural selection is the principal driving force of evolution. The consensus of the vast majority of biologists is that over the course of many generations, species have diverged and adapted to their local environments, thus producing the remarkable variety of life presently seen on earth. In contrast, skeptics of evolution, including many in the creationist and intelligent design communities, assert that whereas natural biological processes may result in minor changes in a single species over time, nothing fundamentally new can arise from “random” evolution.
Some writers have drawn the analogy to English text. For example, David Foster, in a book skeptical of evolution, discusses and then refutes an argument he attributed to Thomas Huxley, namely that a few monkeys typing randomly for millions of millions of years would type all the books in the British Museum. Foster asserts that even a single line of 50 characters could not be produced in this way, since there are at least 8.5 x 10^(49) alphabetic strings of length 50; thus generating a specific given string “at random” is unlikely even over billions of years.
In response to Foster, biologist Gert Korthof points out that Huxley could not possibly have told this story in 1860, because typewriters were not commercially available until 1874. Furthermore, as both Gert Korthof and Peter Olofsson have noted, this type of argument suffers from failing to define precisely what should truly be counted as “surprising.” To correctly assess the odds of such an occurrence, one should not calculate the probability of one particular outcome (every individual outcome may be equally improbable), but instead the probability of all events in a given class.
Along this line, Oxford biologist Richard Dawkins has described a simple computer program he wrote to generate the Shakespearean sentence “Methinks it is like a weasel,” starting from a randomly generated character string. His program achieved its goal in 41 evolution-like iterations, where at each iteration the population of “sentences” was scored based on how many letters were in agreement with the target phrase. Selective “breeding” improved the score of the best sentence until there were no errors.
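For readers who want to try this kind of targeted search themselves, here is a minimal Python sketch in the spirit of the program described above. It is not Dawkins' code: the population size, the mutation rate, the 27-character alphabet, and the choice to keep the parent in each generation are all my own assumptions, so the number of iterations it needs will generally differ from the figure he reported.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # assumed alphabet: 26 letters plus space


def score(candidate, target=TARGET):
    """Number of positions where the candidate agrees with the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy the parent, replacing each character at random with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def weasel(pop_size=100, rate=0.05, seed=0, max_generations=10_000):
    """Breed mutated copies of the best sentence until it matches the target."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while best != TARGET and generation < max_generations:
        generation += 1
        offspring = [mutate(best, rate, rng) for _ in range(pop_size)]
        # Assumption: the parent stays in the pool, so the best score never drops.
        best = max(offspring + [best], key=score)
    return best, generation


if __name__ == "__main__":
    sentence, generations = weasel()
    print(sentence, "after", generations, "generations")
```

Note that the scoring function here looks directly at the target phrase, which is exactly the feature criticized in the next paragraph.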
While this is an interesting exercise, it has significant flaws, some of which Dawkins himself acknowledged. To begin with, his experiment involved only a single “species.” Secondly, Dawkins’ process was defined by a single pre-specified target, whereas biological evolution is governed instead by a complicated “fitness landscape” involving hundreds of interacting factors. Finally, Dawkins’ experiment progressed to a fixed future goal, whereas real biological evolution does not operate with any future goal in mind – each step must bestow some advantage.
A Computational Experiment
I thought it would be interesting to explore whether an evolutionary computing approach can generate not just a single, short, targeted phrase of the kind Dawkins produced, but a significant volume of text segments that are typical, say, of some genre of English literature. To that end, I wrote a computer program that begins by constructing a set of 1024 segments of text, each 64 characters long. The individual characters are chosen at random according to the natural distribution of individual characters in Charles Dickens’ novel Great Expectations. Some examples:
o ao ,fludoy aocueu feidh,iaemehaiheyh daneny shpesaems y nhte
nrtnnbaa.nn hymeo t fiilunnw nt t,ntehg eu y’ t h l dieosea ii
mbdsoee lueleciro ,ynaeenetg itln h srw l,pn uf svee,ee a’l sl
snd etke snoymnra lhs gdnu,nmrs e trlhueafpraa.c.ys f yjser g
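The starting population described above can be sketched in a few lines of Python. This is an illustration only, not the program used for the experiment: the function name `random_segments` is hypothetical, and `SAMPLE` is a one-sentence stand-in (the novel's opening line, lowercased) for the full text of Great Expectations.

```python
import random
from collections import Counter

# Stand-in corpus (assumption): in the experiment this would be the whole novel.
SAMPLE = (
    "my father's family name being pirrip, and my christian name philip, "
    "my infant tongue could make of both names nothing longer or more "
    "explicit than pip."
)


def random_segments(corpus, n_segments=1024, length=64, seed=0):
    """Draw segments whose characters follow the corpus's character frequencies."""
    rng = random.Random(seed)
    counts = Counter(corpus)
    chars = list(counts)
    weights = [counts[c] for c in chars]
    return [
        "".join(rng.choices(chars, weights=weights, k=length))
        for _ in range(n_segments)
    ]


if __name__ == "__main__":
    for segment in random_segments(SAMPLE)[:4]:
        print(segment)
```

Because each character is drawn independently, the output has roughly the right mix of letters and spaces but no word structure at all, just like the four examples above.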
The program then scores each segment as follows. It finds the longest consecutive match between character positions 1 through 16 of the segment and any 16-character stretch of the text of Great Expectations. This check is then repeated for positions 2 through 17 of the segment, then for positions 3 through 18, and so on until the end of the segment is reached. The sum of the match lengths for these checks is the score for the given 64-character segment. Note that this scoring function has no specific future target; it only measures how typical the given segment is of text in Great Expectations. In other words, Great Expectations plays the role of the “fitness landscape.”
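One way to read this scoring rule is sketched below in Python. Two details are my own assumptions, since the description above leaves them open: a "match" is taken to be the longest prefix of each 16-character window that occurs anywhere in the corpus, and only full-length windows (starting at positions 1 through 49) are scored. The names `prefix_match_length` and `score_segment` are hypothetical, and `CORPUS` is a one-sentence stand-in for the full novel.

```python
# Stand-in corpus (assumption): the novel's opening line, lowercased.
CORPUS = (
    "my father's family name being pirrip, and my christian name philip, "
    "my infant tongue could make of both names nothing longer or more "
    "explicit than pip."
)


def prefix_match_length(window, corpus):
    """Length of the longest prefix of `window` that occurs somewhere in `corpus`."""
    k = 0
    while k < len(window) and window[: k + 1] in corpus:
        k += 1
    return k


def score_segment(segment, corpus, window=16):
    """Sum the match lengths over every full-length window of the segment."""
    return sum(
        prefix_match_length(segment[i : i + window], corpus)
        for i in range(len(segment) - window + 1)
    )


if __name__ == "__main__":
    print(score_segment(CORPUS[:64], CORPUS))  # a segment copied from the corpus
    print(score_segment("z" * 64, CORPUS))     # a segment with no matching characters
```

Under these assumptions a 64-character segment has 49 windows, so the score ranges from 0 to 49 x 16 = 784. The repeated substring search is slow on a full novel; a real implementation would want an index of the corpus.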
Evolutionary iterations are then initiated: First, the top-scoring segments are permitted to “mate” (i.e., randomly exchange 4-character strings beginning at positions 1, 5, 9, etc.) with another segment chosen at random from among the top-scoring segments. Then random changes are made to these strings, much in the spirit of mutations observed in real biology. After these “mutations” have been performed, each resulting segment is scored, and the segments are sorted according to their new scores. This cycle repeats until 10,000 iterations have been performed. At the end of these iterations, the highest-scoring segment is taken to be the result of the trial, and the other 1023 segments are discarded. The computer program ran for 24,576 repetitions of the process described above, thus generating 24,576 segments of length 64 characters each.
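The mate, mutate, score, and sort cycle can be sketched as follows. Again this is only an illustration under stated assumptions: the fraction of segments treated as "top-scoring", the mutation rate, the equal-probability choice of each 4-character block, and the decision to carry the top segments over unchanged are not specified above and are my own choices. The scoring function is passed in as a parameter so that the sketch stands alone.

```python
import random

BLOCK = 4  # exchange unit: 4-character blocks starting at positions 1, 5, 9, ...


def mate(a, b, rng):
    """Build a child by taking each aligned 4-character block from a or b at random."""
    return "".join(
        (a if rng.random() < 0.5 else b)[i : i + BLOCK]
        for i in range(0, len(a), BLOCK)
    )


def mutate(segment, alphabet, rate, rng):
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(alphabet) if rng.random() < rate else ch for ch in segment
    )


def evolve(population, score, alphabet, iterations,
           top_fraction=0.25, rate=0.01, seed=0):
    """Run the mate/mutate/score/sort cycle and return the best segment."""
    rng = random.Random(seed)
    pop = sorted(population, key=score, reverse=True)
    n_top = max(2, int(len(pop) * top_fraction))
    for _ in range(iterations):
        top = pop[:n_top]
        # Assumption: top segments survive unchanged; the rest are replaced.
        children = [
            mutate(mate(rng.choice(top), rng.choice(top), rng), alphabet, rate, rng)
            for _ in range(len(pop) - n_top)
        ]
        pop = sorted(top + children, key=score, reverse=True)
    return pop[0]
```

In the experiment described above, `population` would hold 1024 random segments, `iterations` would be 10,000, and the whole run would be repeated 24,576 times, keeping one segment per run.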
Many segments generated by the program, such as these four examples, contain syntax errors and nonsensical or misspelled words:
had i learn a lesson – looked at the stars, and held the gate.
i felt as if he were a surgeon or a dentistrate in the table.
did, in a comfortable about it and hear a triale beside her.
he is sure to be executed on mond another in the mire of time.
Many other segments, such as these four, are syntactically acceptable but don’t make much sense:
and gloves, and as there no one and between his countenance.
asked me why i wanted it and at her, said i, almost in a french
for three in the station that he was in it rather resented.
at remained, all these reasons for my part, he were a file.
But other segments are entirely reasonable, and could easily pass as fragments of literary text. Along this line, I constructed the quiz below, then had it administered to some college students at a large university. They were told only that some of these twenty segments of English text are extracted from the writings of Charles Dickens, and some are computer generated.
1. up at it for an instant. but he was down on the rank wet grass,
2. or do any such job, i was favoured with the employment. in order,
3. at the fire as she took up her work again, and said she would be
4. the monster was even careless as to the word that i had him so.
5. as to go with him to his father’s house on a visit, that i might
6. fitted it to nothing and get the ashes between me to the last.
7. as no relation into another that it is the same room – a little
8. a separation to be made for the desolater, like the man he was.
9. we said that as you put it in your pocket very glad to get it, you
10. that he had treated him to a little bee, he was to call the
11. if he had for a time such an interest here and contented me.
12. great iron coat-tails, as he had done, and then ran to that.
13. he saw me going to ask him anything, he looked at me with his glass
14. on my objecting to this retreat, he took us into another room with
15. been born on there, or that i had the greatest indignature.
16. the chimney as though it could not bear to go out into such a night
17. later to settle to anything i had hesitated as to the sound.
18. the greatest slight and injury that could be done to the many far
19. of it on the hearth close to the fear that she had done rather
20. out of my thoughts for a few moments together since the hiding had
The reader is invited to try to identify which of these are authentic snippets of Dickens’ writings and which are computer-generated segments produced by the scheme described above, without consulting any references. The answers are given in the Appendix below.
Looking collectively at the 66 sets of responses that I received for this quiz, the average number of correct responses per item is 40 out of 66 (60.6%), which is not a great deal higher than the 33 correct responses (50%) that one would expect from random guessing. If we look at “majority vote” statistics, the majority of the 66 respondents is correct for most items, but wrong for items #8, 9, 11, 13, and 20. Every computer-generated item drew at least 18 incorrect responses out of 66.
It is important to note that none of the 24,576 segments produced by the computer program coincides with any 64-character segment of Great Expectations. In other words, the computer program is not merely “regurgitating” portions of the input text file. What’s more, none of these 24,576 computer-generated segments coincides with any other of the 24,576 segments in more than 17 consecutive characters, even when shifts are allowed – all 24,576 generated segments are substantially distinct. In addition, the computer program constructed numerous legitimate English words that do not appear anywhere in Great Expectations. Some examples:
administer, agitate, allowing, arrangers, assail, assessed, attenuated,
attraction, auctioned, baroness, batter, bellow, breather, chastened,
coached, conspire, contentions, credited, deceived, descension, despot,
detained, detriment, discriminate, dispensable, dispenses, distances,
easiness, elected, enhance, formations, foundered, generate, generation,
gentile, glisten, gradation, handler, hitches, inconvenient, increase,
intentionally, intentioned, intimations, iterate, lacerate, liberate,
liberated, likened, mattered, mediated, migration, ministered, mission,
necessitated, operated, positioned, possibilities, powered, prostrate,
releases, remonstration, renderings, retirements, retreated, searches,
session, silenced, simmer, situations, slinging, soothings, spheres,
statements, steamed, steers, straits, stratified, stressed, teased,
tendered, termination, thickens, threatenings, threshes, torments,
traitors, trench, utters, wandered, wither, weathers
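The two checks described before this word list (how long a run of characters two segments share, and which words do not occur in the input text) can be sketched in Python as follows. The function names are hypothetical and this is not the code used for the experiment. Note that `novel_words` reports every letter string absent from the corpus, including nonsense strings and words cut off at a segment boundary, so picking out the legitimate English words still requires a dictionary or a human reader.

```python
import re


def longest_common_run(a, b):
    """Length of the longest run of consecutive characters shared by a and b,
    at any pair of offsets (i.e., shifts are allowed)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def novel_words(generated_segments, corpus):
    """Letter strings that appear in the generated segments but not in the corpus."""
    corpus_words = set(re.findall(r"[a-z]+", corpus.lower()))
    found = set()
    for segment in generated_segments:
        found.update(re.findall(r"[a-z]+", segment.lower()))
    return sorted(found - corpus_words)
```

Comparing all pairs of 24,576 segments this way means roughly 300 million comparisons, so a full check would call for a faster method than this quadratic sketch.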
Recall that 24,576 distinct 64-character segments of text were generated by the computer program, for a total of 1,572,864 bytes. Note that this figure is higher than the length of the computer program (17,622 bytes) plus the length of the Great Expectations input file (994,587 bytes), which total 1,012,209 bytes. In other words, the computer program generated 1.55 times as much text as the combined input data file and computer program. After compressing these files using a well-known compression utility (as a measure of underlying information), the ratio is still 1.46.
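The uncompressed figures above are easy to verify, and the compressed comparison can be sketched in the same way. In the snippet below the byte counts come from the paragraph above; the use of Python's zlib module is my own stand-in, since the compression utility is not named here, so a zlib-based ratio would not necessarily reproduce the 1.46 figure.

```python
import zlib

N_SEGMENTS, SEGMENT_LEN = 24_576, 64
PROGRAM_BYTES, INPUT_BYTES = 17_622, 994_587

output_bytes = N_SEGMENTS * SEGMENT_LEN      # 1,572,864
input_total = PROGRAM_BYTES + INPUT_BYTES    # 1,012,209
raw_ratio = output_bytes / input_total       # about 1.55


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level (assumed utility)."""
    return len(zlib.compress(data, 9))


def compressed_ratio(output: bytes, program: bytes, corpus: bytes) -> float:
    """Ratio of compressed output size to compressed program-plus-corpus size."""
    return compressed_size(output) / (
        compressed_size(program) + compressed_size(corpus)
    )


if __name__ == "__main__":
    print(output_bytes, input_total, round(raw_ratio, 2))
```

To use `compressed_ratio`, one would read the generated text, the program source, and the input file as bytes and pass them in that order.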
A computer program based on methodology developed in the genetic programming community is indeed able to generate English text segments reminiscent of Dickens’ writing. At the least, some of the better resulting text segments are sufficiently good to fool human judges in an informal test: college students were correct in distinguishing true Dickens from computer-generated segments only about 61% of the time (on average).
Obviously a full-scale computer simulation of biological evolution would have to be much more sophisticated. It would have to incorporate thousands of species and millions of individual organisms, together with full details of a complicated and changing environment. Such a simulation is well beyond the scope of what could be done today even on the most powerful supercomputers. Nonetheless, it is clear that if the claims of creationist and intelligent design scholars (namely that “random” evolution cannot generate truly novel information) have any substance, we should be able to see evidence of this phenomenon even in modest simulations of evolutionary processes, such as the one described in this note.
But we do not see the claimed effect. Instead, we see results very much in keeping with principles of evolution that have been established in the field for many years, harkening back to the original mathematical models of evolution presented by Fisher in the 1920s. Evolution does generate novel information.
Appendix
In the exercise presented above, these items are authentic Dickens:
1, 2, 3, 5, 9, 13, 14, 16, 18, 20
These items are produced by the computer program:
4, 6, 7, 8, 10, 11, 12, 15, 17, 19
Full details and references are available here: