Human Knowledge Compression Contest Frequently Asked Questions & Answers
In [1], HTML and XML documents, the logical constructs known as character data and [2] [3] consist of sequences of characters, in which each [4] can manifest directly (representing itself), or can be [5] by a series of characters called a [6] reference, of which there are [7] types: a numeric character reference and a character entity [8]. This article lists the [9] entity [10] that are valid in [11] and [12] documents.

If you do not understand any English, there is little hope that you can fill in the missing words. You may assign a high probability that the missing words equal some of the other words, and a small probability to all other strings of letters. Working at the level of string pattern matching, you may conjecture [11]=HTML and [12]=XML, since they precede the word "documents", and similarly [9]=character, since it precedes "entity". If you do understand English, you would probably also easily guess that [4]=character, [5]=represented, and [7]=two, where [4] requires understanding that characters are "in" sequences, [5] requires understanding conjugation, and [7] requires understanding the concept of counting. Guessing [8]=reference probably requires some understanding of the contents of the article, which then easily implies [10]=references. As a markup language expert, you "know" that [2]=attribute, [3]=values, and [6]=character, and you may conjecture [1]=SGML.

So clearly, the more you understand, the more you can delete from the text without loss (this idea is behind Cloze-style reading comprehension and led to the Billion Word Imputation challenge). If a program reaches the reconstruction capabilities of a human, it should be regarded as having the same understanding. (Sure, many will continue to define intelligence as what a machine can't do, but that will make them ultimately non-intelligent.) (Y cn ccmplsh lt wtht ndrstndng nd ntllgnc, bt mr wth).
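The lowest rung of this ladder, pure string pattern matching, can be illustrated with a small sketch. It is only a toy under stated assumptions, not a method used by the contest: the names `guess_blanks` and `best_guesses`, the reserved `unseen_mass` of 0.1, and the choice of a right-hand context of at most two words are all hypothetical. For each blank, the sketch looks for visible words that precede the same right-hand context elsewhere in the passage, falls back to plain word frequencies when no match exists, and keeps a small probability for all other strings.

```python
import re
from collections import Counter

# The cloze passage from the text, with numbered blanks [1]..[12].
PASSAGE = (
    "In [1], HTML and XML documents, the logical constructs known as "
    "character data and [2] [3] consist of sequences of characters, in "
    "which each [4] can manifest directly (representing itself), or can be "
    "[5] by a series of characters called a [6] reference, of which there "
    "are [7] types: a numeric character reference and a character entity "
    "[8]. This article lists the [9] entity [10] that are valid in [11] "
    "and [12] documents."
)

BLANK = re.compile(r"\[\d+\]")
TOKEN = re.compile(r"\[\d+\]|[A-Za-z]+")  # blanks or plain words


def guess_blanks(text, unseen_mass=0.1, max_context=2):
    """Return {blank number: {candidate word: probability}}.

    Hypothetical baseline: no grammar, no meaning, only matching of the
    words that follow each blank against the rest of the passage.
    """
    tokens = TOKEN.findall(text)
    is_blank = [BLANK.fullmatch(t) is not None for t in tokens]
    filled = list(tokens)  # blanks get replaced by their best guess
    unigram = Counter(t for t, b in zip(tokens, is_blank) if not b)
    guesses = {}

    # Go right to left so the context after a blank is already filled in.
    for i in range(len(tokens) - 1, -1, -1):
        if not is_blank[i]:
            continue
        counts = Counter()
        for n in range(max_context, 0, -1):  # longest context first
            context = filled[i + 1 : i + 1 + n]
            if len(context) < n:
                continue
            for j in range(len(tokens) - n):
                if not is_blank[j] and tokens[j + 1 : j + 1 + n] == context:
                    counts[tokens[j]] += 1
            if counts:
                break
        if not counts:
            counts = unigram  # no pattern found: any visible word
        total = sum(counts.values())
        # The remaining unseen_mass is left for "all other strings".
        dist = {w: (1 - unseen_mass) * c / total for w, c in counts.items()}
        guesses[int(tokens[i][1:-1])] = dist
        filled[i] = max(dist, key=dist.get)
    return guesses


def best_guesses(guesses):
    """Pick the most probable candidate for every blank."""
    return {k: max(d, key=d.get) for k, d in guesses.items()}


if __name__ == "__main__":
    for number, word in sorted(best_guesses(guess_blanks(PASSAGE)).items()):
        print(f"[{number}] = {word}")
```

Under these assumptions the sketch arrives at [11]=HTML, [12]=XML, and [9]=character, the same conjectures the text attributes to pattern matching. Blanks such as [5] and [7] have no matching context anywhere in the passage, so the sketch can only fall back to word frequencies there, which is exactly where understanding of English would be needed.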
© 2000 by ... Marcus Hutter