Human Knowledge Compression Contest
Frequently Asked Questions & Answers
Compress the 100MB file enwik8 to less than the current record of about 16MB


Frequently Asked Questions

What is this contest about?

The contest is about compressing human world knowledge as well as possible. A prize of initially 50'000€ is attached to the contest. If your compressor compresses the 100MB file enwik8 x% better than the current record, you receive x% of the prize. The contest is motivated by the fact that compression ratios can be regarded as intelligence measures. See http://prize.hutter1.net/ for details.

Is the compression contest still ongoing?

Yes it is (as of 2020), and I intend to fund it even beyond my death. It has simply become harder to beat the previous record, which is why there have been fewer winners in recent years. If I should ever change my mind, I will update this website.

Where do I start? How do I develop a competitive compressor?

Minimal programming skills are required to write programs that compress files somewhat, but to win the contest, you first need to understand the most important existing data compression ideas, concepts, and basic algorithms, and then the state-of-the-art compressors which build on these (by refining, adapting, and combining them). Maybe you have a great new idea of your own, but by itself, without combining it with existing ideas, it has no chance of winning. You can get a first glimpse of the vast field of data compression from Matt Mahoney's Data Compression Explained. If this is too dense, a gentler and more comprehensive introduction to (lossless) data compression and overview of existing compressors is (Chapters 1-6 of) the Handbook of Data Compression, after which you should be able to re-implement and tinker with the most important existing algorithms, and understand current state-of-the-art compressors such as PAQ. Most modern compression algorithms are based on arithmetic coding driven by estimated probabilistic predictions. To fully appreciate them you need some background in information theory, machine learning, and probability and statistics, at the level of university courses/books. For a more light-weight start, I recommend you implement some simple run-length coding, then Lempel-Ziv, and if you are able to, the elegant but fiendishly subtle-to-implement CTW. If you have an idea of your own, great, but to be able to win, you need to combine it with current state-of-the-art compressors, which are messy combinations of many ideas. Unfortunately the source code of the past winning entries (as of 2017) is not publicly available, but the PAQ source, on which they build, is. Matt Mahoney's webpage lists and explains the most recent (and older) Data Compression Programs. Use the discussion forum for questions and general discussion or to get feedback on your ideas. Good luck!
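As a concrete warm-up for the run-length coding suggested above, here is a minimal, purely illustrative sketch in Python (a competitive compressor looks nothing like this; it replaces such ad-hoc schemes with probabilistic models driving an arithmetic coder):

    # Minimal run-length coder: b"aaabcc" -> [(3, 97), (1, 98), (2, 99)] and back.
    def rle_encode(data: bytes):
        runs = []
        for b in data:
            if runs and runs[-1][1] == b:
                runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
            else:
                runs.append((1, b))               # start a new run
        return runs

    def rle_decode(runs) -> bytes:
        return bytes(b for count, b in runs for _ in range(count))

    data = b"aaabcc"
    assert rle_decode(rle_encode(data)) == data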

What is (artificial) intelligence?

Intelligence has many faces, like creativity, solving problems, pattern recognition, classification, learning, induction, deduction, building analogies, optimization, surviving in an environment, language processing, knowledge, and many more. A formal definition incorporating all or at least most aspects of intelligence is difficult but not impossible. Informally, intelligence is an agent's ability to achieve goals in a wide range of environments. The key concepts for a formal definition are compression and utility maximization; the other aspects are emergent phenomena.

What does compression have to do with (artificial) intelligence?

One can prove that the better you can compress, the better you can predict; and being able to predict [the environment] well is key to being able to act well. Consider the sequence of 1000 digits "14159...[990 more digits]...01989". If it looks random to you, you can neither compress it nor predict the 1001st digit. If you realize that they are the first 1000 digits of π, you can compress the sequence and predict the next digit. While the program computing the digits of π is an example of a one-part self-extracting archive, the impressive Minimum Description Length (MDL) principle is a two-part coding scheme akin to a (parameterized) decompressor plus a compressed archive. If M is a probabilistic model of the data D, then the data can be compressed (to an archive of) length log(1/P(D|M)) via arithmetic coding, where P(D|M) is the probability of D under M. The decompressor must know M, hence has length L(M). One can show that the model M that minimizes the total length L(M)+log(1/P(D|M)) leads to the best predictions of future data. For instance, the quality of natural language models is typically judged by their perplexity, which is essentially a compression ratio. Finally, sequential decision theory tells you how to exploit such models M for optimal rational actions. Indeed, integrating compression (=prediction) into sequential decision theory (=stochastic planning) can serve as the theoretical foundation of super-intelligence (brief introduction, comprehensive introduction, full treatment with proofs).
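To make the code-length formula concrete, here is a small, purely illustrative Python sketch that computes the ideal code length log(1/P(D|M)) = -log2 P(D|M) of a byte string under a simple adaptive order-0 model; an actual arithmetic coder realizes this length to within a few bits:

    import math
    from collections import Counter

    def ideal_code_length_bits(data: bytes) -> float:
        """-log2 P(D|M) for an adaptive order-0 byte model with add-one (Laplace)
        smoothing; an arithmetic coder achieves this length up to a small overhead."""
        counts, total, bits = Counter(), 0, 0.0
        for symbol in data:
            p = (counts[symbol] + 1) / (total + 256)   # predictive probability of the next byte
            bits += -math.log2(p)                      # its contribution to the code length
            counts[symbol] += 1                        # update the model after coding the symbol
            total += 1
        return bits

    print(ideal_code_length_bits(b"to be or not to be"))

Since this model M is just the few fixed lines of code above, L(M) is tiny; richer models trade a larger L(M) for a smaller log(1/P(D|M)).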

What is/are (developing better) compressors good for?

The most obvious application of (better) data compressors is of course to save (more) memory. The most popular large data types are images, music, and movies, which are typically compressed lossily, but no less important are data from scientific experiments, vast amounts of user data, documents, medical records, software, etc., which require exact storage and transmission and hence lossless compression, as e.g. in zip files. For rarely accessed data, esp. backup files, even slow (de)compressors may be acceptable, saving a couple of percent in a multi-billion dollar industry. Similarly for data transmission, most prominently mobile internet with limited-capacity channels, higher compression effectively results in faster and/or more data transmission. The largest gains could be achieved on (semi)synthetic data by smart compressors that reverse-engineer the algorithms that produced the data. Another important application is prediction or time-series forecasting, with weather forecasting and stock-market forecasting being classical examples. The better you can compress, the better you can predict. Indeed one can show that Solomonoff's universal predictor, which is essentially equivalent to Kolmogorov's universal best possible compressor, is the best possible predictor for any task. Natural Language Processing models (for translation, virtual assistants, text analytics, free-form search, etc.) heavily rely on and measure their performance in terms of compression (log perplexity). Better compression requires understanding, but the reverse is true too: finding more compact representations of some data can lead to a better understanding, or in some sense is understanding. Scientists do that all the time by hand, but imagine an algorithm that finds regularities in (obscure) time-series; inspecting the algorithm can then help in understanding the time-series. In simple cases this is exactly what Sloane's On-Line Encyclopedia of Integer Sequences does. Philosophically, the process of inducing=inferring=learning models from data is called 'induction'. While philosophers still ponder how and why induction is possible, scientists are rarely troubled: in practice they use regularization where necessary, and in theory Solomonoff's theory solves induction as well as logic serves deduction. Finally, AI is increasingly about building agents that understand their environment (learning/training models inductively) and base their decisions on the predicted consequences of their actions, all related to compression and the primary motivation for this contest.

The contest encourages developing special purpose compressors tailored towards enwik8, rather than general purpose compressors.

The top compressors may be somewhat tuned to enwik8, but the major progress will not come from coding e.g. thousands of grammatical and semantic rules to compress frequent phrases well. A compressor that learns these rules from the data itself will lead to better compression. Such a compressor will compress other text corpora well too.

Why lossless compression?

Doesn't a lossless compressor waste resources on coding random noise and (spelling) errors in enwik8, in contrast to lossy compressors like the human brain?   ---   Not really, for the following reasons:
  1. The test data is very clean. Misspelled words and grammatical errors are very rare. This is one reason why we chose this data set. See http://mattmahoney.net/dc/textdata.html for an analysis of the data.
  2. Even if the corpus contained a lot of noise, lossless compression would still be the right way to go. One can show that among the shortest codes for a noisy data corpus, there is a two-part code of length L(A)+L(B), where A contains all "useful" information and B contains all noise. The theory is called "algorithmic statistics", while in practice two-part MDL is used: a probabilistic model M plus the log-likelihood of the data D under the model: CodeLength(D) = -log P(D|M) + L(M). Or briefly: noise does not at all harm the strong relation between compression and understanding/intelligence/predictability/etc.
  3. It is not always clear what counts as errors/noise and what as useful data. Consider e.g. keyboard-layout, hence country-related, typos like y versus z. There is probably a correlation between the type of Wikipedia entry and the type of typo. A compressor finding such a correlation will compress better. Figuring out this correlation can lead to useful insight. Simply correcting the errors would miss this piece of human "culture". Hence it is not clear how to set up a fair lossy compression contest.
  4. Compressors can always ignore errors or noise: the decompressor first reconstructs an error-corrected enwik8, then a stored table of corrections is used to reintroduce the errors. In this way any lossy compressor can be converted into a lossless one (see the sketch after this list).
  5. All compressors have to deal with the same noise and errors. The extra information needed to encode the exact text is small, and is the same for all contestants. In this sense lossless compression is not harder than lossy compression.
  6. Human brains are probably lossy compressors, but this does not invalidate the strong relation between lossless compression and AI. The competition is not about imitating or emulating a human brain. It is about creating rational intelligences. Suitably storing what today looks like noise and errors does not harm and allows future reconsideration.
  7. Lossless compression implies AI, but lossy compression doesn't.
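The following minimal Python sketch illustrates point 4 above; lossy_compress and lossy_decompress are hypothetical placeholders for any lossy (de)compressor pair, and for simplicity the sketch assumes the lossy reconstruction has the same length as the original (a real wrapper would store a general diff/patch):

    def lossless_from_lossy(data: bytes, lossy_compress, lossy_decompress):
        """Wrap a lossy (de)compressor into a lossless scheme by additionally
        storing a table of corrections."""
        archive = lossy_compress(data)
        approx = lossy_decompress(archive)
        assert len(approx) == len(data)
        # positions where the lossy reconstruction differs from the original
        corrections = [(i, b) for i, (a, b) in enumerate(zip(approx, data)) if a != b]
        return archive, corrections          # together they form the lossless archive

    def reconstruct(archive, corrections, lossy_decompress) -> bytes:
        approx = bytearray(lossy_decompress(archive))
        for i, b in corrections:             # reintroduce the "errors"/noise exactly
            approx[i] = b
        return bytes(approx)

If the corrections are few (the corpus is clean), their cost is small, so little is lost by insisting on losslessness.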

Why aren't cross-validation or train/test-set used for evaluation?

A common way of evaluating machine learning algorithms is to split the data into a training set and a test set, learn e.g. the parameters of a Neural Network (NN) on the train set and evaluate its performance on the test set. While this method, and similarly its extension to cross-validation, can work in practice, it is not a fool-proof method for evaluation: In the training phase, the algorithm could somehow manage to "effectively" store the information contained in the test set and use it to predict the test set without the desired generalization capability. This can happen in a number of ways:
  1. The test set could be very similar or in the extreme case identical to the train set, so even without access to the test set, the algorithm effectively has access to the information in the test set via the train set. For instance, if you downloaded all images from the internet and randomly split them into a train and a test set, most images would be in both sets, since most images appear multiple times online. Similarly if you downloaded all text. Admittedly, Wikipedia should be less prone to repetition, since it is curated.
  2. The algorithm could accidentally contain test-set information, though statistically this is very unlikely, and it would only be a problem if the HKCP received an enormous number of submissions, or if contestants optimized their algorithms based on test-set performance.
  3. The contestant could cheat and simply hide the test set in the algorithm itself. This could be circumvented by keeping the test set secret, but one could never be sure whether it has leaked, a grain of doubt will always remain, and even if not, ...
  4. if the test set is taken from a public source like Wikipedia, a gargantuan NN could be trained on all of Wikipedia or the whole Internet. Limiting the size of the decompression algorithm prevents this. Indeed this is the spirit of the compression metric used here.
On the other hand, including the size of the decompressor rules out many SOTA batch NNs, which are often huge; but maybe they only appear better than HKCP records due to some of the effects 1-4 above. The solution is to train online (see next FAQ) or to go to larger corpora such as enwik9, which seems to be a representative sample of human knowledge.

How to achieve small code length with huge Neural Networks.

Large Neural Networks (NNs) can be trained to achieve excellent text-compression performance, including on enwik8, but they are not competitive for the HKCP, since the networks often have (tens of) millions of (real-valued) parameters (weights and biases), i.e. the models, and hence the decompressors, are huge, possibly even larger than enwik8 itself. On the other hand, an untrained NN usually has a simple description: the structure of the network, the activation function, the learning algorithm, some hyper-parameters, and a deterministic (pseudo-random or zero) weight initialization -- all code that is explicitly written by a human, which by necessity is not much, as long as no data-heavy libraries such as dictionaries are included. In this case it is possible to include only the smallish code, and not the millions of trained weight values, in the decompressor, provided the NN is trained online (rather than in batch) as follows (similarly for other small-ROM(=:code) large-RAM algorithms): For t=1,2,3,...:
- train the NN only on data up to time t,
- use it to probabilistically predict the (t+1)st data item,
- employ an arithmetic (de)coder based on this prediction.
It is more practical to do block prediction with an exponential schedule such as t=1,2,4,8,16,... (or a bit finer), likely with comparable compression. Plenty of details, like dynamic hyper-parameter choices, are likely to make this batch-to-online conversion less straightforward. The online performance will be inferior to the batch train/test performance, since the smaller t is, the worse the prediction, but this is the price to pay for a fair, incorruptible, and universal comparison.
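The following Python sketch (purely illustrative; Model with train/predict_prob is a hypothetical interface) spells out this online scheme with the exponential block schedule; the sum of -log2 of the predicted probabilities is the length in bits that an arithmetic coder driven by these predictions would produce, and only the short training code, not the trained weights, has to ship with the decompressor, since the decompressor repeats the same training:

    import math

    def online_code_length_bits(data, Model):
        """Prequential evaluation: repeatedly train on the prefix seen so far,
        then code the next block with the resulting predictions."""
        model = Model()
        bits, t, block = 0.0, 0, 1
        while t < len(data):
            end = min(t + block, len(data))
            for i in range(t, end):
                p = model.predict_prob(data[:i], data[i])   # P(next symbol | prefix)
                bits += -math.log2(max(p, 1e-12))           # ideal arithmetic-code length
            model.train(data[:end])                         # retrain on the longer prefix
            t, block = end, block * 2                       # exponential schedule 1,2,4,8,...
        return bits

(A real implementation would not re-slice the prefix at every step; this sketch is only meant to make the accounting explicit.)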

Why don't you allow using some fixed default background knowledge database?

Because Wikipedia contains enough human knowledge in the text itself: if a probably more structured "database of knowledge" were advantageous, some advanced statistical and/or other smart but relatively short algorithms could extract and create it from Wikipedia. If this turns out to be impossible, it would indicate that enwik8 is not a representative sample of human knowledge, and enwik9 or other sources should be used.

Why is "understanding" of the text or "intelligence" needed to achieve maximal compression?

Consider the following little excerpt from a Wikipedia page, where 12 words have been replaced by the placeholders [1]-[12].
In [1], HTML and XML documents, the logical constructs known as character data and [2] [3] consist of sequences of characters, in which each [4] can manifest directly (representing itself), or can be [5] by a series of characters called a [6] reference, of which there are [7] types: a numeric character reference and a character entity [8]. This article lists the [9] entity [10] that are valid in [11] and [12] documents.
If you do not understand any English, there is little hope that you can fill in the missing words. You may assign a high probability to the missing words equalling some of the other words, and a small probability to all other strings of letters. Working at a string pattern-matching level, you may conjecture [11]=HTML and [12]=XML, since they precede the word "documents", and similarly [9]=character, preceding "entity". If you do understand English, you would probably further easily guess that [4]=character, [5]=represented, and [7]=two, where [4] requires understanding that characters are "in" sequences, [5] needs understanding of conjugation, and [7] needs understanding the concept of counting. To guess [8]=reference probably needs some understanding of the contents of the article, which then easily implies [10]=references. As a markup language expert, you "know" that [2]=attribute, [3]=values, and [6]=character, and you may conjecture [1]=SGML. So clearly, the more you understand, the more you can delete from the text without loss. If a program reaches the reconstruction capabilities of a human, it should be regarded as having the same understanding. (Sure, many will continue to define intelligence as whatever a machine can't do yet, but that will ultimately make them non-intelligent). (Y cn ccmplsh lt wtht ndrstndng nd ntllgnc, bt mr wth).

Why do you focus on text?

Visual (and audio and tactile) knowledge precedes linguistic knowledge and seems crucial to ground symbolic knowledge.

While the first part is true, the second is questionable. We decided not to include photo, video, and audio material in the file for the following reasons: Natural text, in particular novels, contains ample descriptions of 3-dimensional scenes (the white house right of the beautiful tree under which ...). It is possible to extract the meaning of nouns, (directional) prepositions, and all other words from a sufficiently large corpus of text in an unknown language. Meaning is codified by relating words to other words (house-artificial, house-residence, house-right-tree), so a (spatial) representation of the world can be established without ever having seen a single picture. This is akin to being able to "visualize" abstract mathematics, given enough exposure and skill. So it is plausible that any knowledge that can be demonstrated over a text-only channel, as in the Turing test, can also be learned over a text-only channel, as e.g. evidenced by the blind and deaf writer Helen Keller. Given that higher (conscious) cognition is essentially symbolic, and a symbolic representation and understanding can be extracted from text alone, it is plausible that textual information is sufficient for the purpose of this contest. Inclusion of photo, video, and audio material would require the contestants to deal with, and improve upon, not only state-of-the-art textual but also many kinds of non-textual compressors, and to handle terabytes of video. Most of the compression would be about modeling physics rather than cognition. This would put an enormous extra burden on the contestants, and, given the arguments above, without real benefit.

What is the ultimate compression of enwik8?

The Kolmogorov complexity K(x) of a string x is defined as the length of the shortest program (self-extracting archive) that computes x. K(x) itself cannot be computed, only approximated from above, namely by finding better and better compressions; but even if we reach K(x), we will never know whether we have done so. For a text string like enwik8, Shannon's estimate of about 1 bit per character suggests that enwik8 might be compressible down to roughly 12MB.
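In standard notation (U a fixed universal Turing machine, ℓ(p) the length of program p), the definition and Shannon's back-of-the-envelope estimate of roughly 1 bit per character read:

    K(x) = \min\{\, \ell(p) : U(p) = x \,\}, \qquad
    10^8~\text{characters} \times 1~\text{bit/char} = 1.25\times 10^7~\text{bytes} \approx 12~\text{MB}.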

Can you prove the claims you make in the answers to the FAQ above?

Most of the assertions above can be formalized and proven, but some mathematical background is necessary. Good science books to start with are M. Li & P. M. B. Vitanyi, Kolmogorov Complexity, 2008, and M. Hutter, Universal Artificial Intelligence, 2005. For a broader reading recommendation, see here.

The PAQ8 compressors are already so good that it will be difficult to beat them.

Yes, it will not be easy (nobody said it would be), but there is likely a lot of room for improvement. PAQ8 models text only. There has been a lot of research in language modelling (mostly for speech recognition) at the syntactic and semantic levels, but these are usually offline models on preprocessed text where words are mapped to tokens and messy details like capitalization, punctuation, formatting, and rare words are removed. So far, nobody has figured out how to integrate these two approaches, but when that happens we will have a very powerful text compressor. Also, from Shannon's estimate that human text contains about 1 bit of information per character, enwik8 should be compressible down to about 12MB. Meanwhile, after 12 years, PAQ8 has been beaten 4 times, by ~14% in total, which is over 1% per year.

There are lots of non-human language pieces in the file.

Enwik8 is about 75% clean text and 25% "artificial" data of various forms (tables, hypertext links, XML, etc). As long as there is some natural language text (and enwik8 is 75% such text), compressing it is a hard AI problem, regardless of whatever else is present in the file. The "artificial" data is also human-generated and belongs to the human knowledge base, so it is not unreasonable to ask contestants to model it too. Ideas of filtering out the "artificial" data or using other, apparently cleaner, text sources did not prove superior.

Including the decompressor size encourages obfuscation

Doesn't adding the decompressor length to the length of the archive encourage developing unreadable short decompressors, rather than smart compressors based on understanding the text corpus enwik8?

The typical size of current decompressors is less than 100KB, so obscuring them by making them shorter gains you at most a 0.1% = 100KB/100MB advantage (not enough to be eligible for the prize). On the other hand, for fairness it is necessary to include the size of the decompressor. Take a compressor like PAQ8H that contains 800KB of tokens: clearly it can achieve better compression than one starting from scratch. If you're not convinced by this argument, consider an extreme "decompressor" of size 100MB that simply outputs enwik8 byte by byte (from a zero-byte archive), thus achieving a compressed size of 0.

Why do you use the 100MB enwik8 and not 1GB or all of Wikipedia?

Indeed, there are good arguments for the 1GB corpus. You can find them at http://mattmahoney.net/dc/rationale.html. Nevertheless we decided to start initially with the 100MB file enwik8 rather than the 1GB file enwik9 for the following reasons:
  1. Although enwik9 would not be a hardship in the long run, it is a hardship now. The contest has its highest participation when the prize is launched, so the file size should be appropriate for moderate-cost home computers.
  2. 100MB already contains a lot of human knowledge.
  3. Typical decompressors are significantly less than 1% of 100MB, so there is no problem here.
  4. Enwik8 could still be downloaded by an analog modem.
  5. It's easy to upgrade to enwik9 later.
  6. Many standard compressors crash on enwik9 or start thrashing.
  7. It allows non-streaming offline (de)compressors that need the whole file in RAM in a straightforward way. Enwik9 would have to be processed block-wise unless many GB of RAM are available.
  8. Maybe the most compelling reason: we want to encourage smart compressors. Smart compressors can easily take a ridiculously long time even on small corpora, so if we went for enwik9, we would lose some of the smarter ideas for (de)compressors.

Why are you limiting validation to less than 10 hours on systems with less than 1GB RAM?

Some of the best minds in the world are limited by the computers they can afford. Typical PCs and laptops bought in 2006 are equipped with 1GB RAM or less, let alone older systems and those in developing countries. One of our most successful contestants wrote us: "I still don't have access to a PC with 1 Gb or more, have found only 512 Mb computer, but it's about 7 or 8 km away from my home (~20 minutes on the bicycle I use), thus I usually test on a 5 Mb stub of enwik8, with -5, using only ~210 Mb". Upgrading to 2GB for about 100€ sounds cheap (for many but not all) compared to the prize money involved and the effort invested. But let's assume 1000 potential participants (sure, fewer will succeed), and, say, half of them first need to upgrade their machines; then overall 500×100€ = 50'000€ is invested in hardware, i.e. the prize supports the memory-chip industry, and overall the participants suffer a loss, which is definitely not the aim of the prize. So 1GB RAM for the whole system (800MB working RAM) was chosen as a widely affordable configuration. Also the overall ratio of 1/10 : 1 : 10 GB of enwik8 : RAM : HD seems balanced.
So far no aspirant has had problems with the time constraint. The future smarter compressors we are aiming at will probably need a lot more time to "meditate" about enwik8 in order to compress it well. For this and other reasons it is allowed to send us the already-compressed archive instead of the compressor. Only the decompressor needs reasonable speed. Since decompression is much easier than compression, the time bound is hopefully not too severe; if it is, let us know. Instead of a hard time bound, one could think of penalizing slow decompressors. In theory one should penalize a decompressor by one bit for every factor of two it is slower than its competitor (keyword: Levin complexity); in practice one would need a much stronger penalty.
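For reference, the one-bit-per-factor-of-two rule corresponds to Levin's time-bounded complexity (ℓ(p) the program length, time(p) its running time on the universal machine U):

    Kt(x) = \min_{p:\,U(p)=x} \bigl\{\, \ell(p) + \log_2 \mathrm{time}(p) \,\bigr\}.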

The total prize is not exactly 50'000€

Right! The prize formula is a compromise between many considerations (simplicity, fairness, risk, and others). Since it is in principle impossible to know what the ultimate compression of enwik8 will be, a prize formula leading to an exact total payout of 50'000€ is impossible, so no formula can tell in advance what total amount will be awarded. Theoretically you can win more than 50'000€: for instance, if you halve the current record, then halve your own record, and then halve it once more, you would receive 3×25'000€. Realistically (based on past experience) we expect at most a 3% improvement per year for the next 10 years, but you may be the one having a 20% breakthrough tomorrow. In theory, the current formula also allows optimizing your profit by sending e.g. three 3% improvements at monthly intervals, rather than one 9% improvement. You would gain a scant 136€, at the risk that someone sneaks in. (An alternative prize formula Z×ln(L/S) would prevent this gain.) The total payout will be roughly 50'000€ if enwik8 can be compressed down to 7MB (in 3% steps), about the lower bound of Shannon's estimate of 0.6 bit per character.
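For illustration, with the proportional payout stated in the first answer (an x% improvement earns x% of the prize fund Z = 50'000€; L is the previous record, S your new one), each halving in the example above pays the same amount:

    \text{payout} = Z \cdot \frac{L-S}{L}, \qquad
    S = \tfrac{L}{2} \;\Rightarrow\; \text{payout} = \tfrac{Z}{2} = 25'000\,€~\text{per halving}.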

Is 100MB 100×2^20 byte or 10^8 byte?

Most computer scientists use the prefixes K(ilo) for 2^10, M(ega) for 2^20, and G(iga) for 2^30, while the International System of Units (SI, 7th edn. 1998, p. 103) defines K(ilo), M(ega), G(iga) as 10^3, 10^6, 10^9, respectively, and explicitly forbids misusing them for powers of 2. So the answer depends on whether you are a geek or a pedant. A good solution is to regard K/M/G as standing for "roughly" 10^3 or 2^10, etc., and to avoid using them where more precision is needed, as is done on the contest page. Enwik8 is 10^8 bytes. See MAA Online, Counting by Twos (1999), for other solutions and references. In any case this is tangential to the contest.

Why do you require Windows or Linux executables?

There are myriad options we could allow for the contest. The related Large Text Compression Benchmark, for instance, is very flexible in this respect. This is convenient for contributors, but makes a fair comparison more difficult and fraud easier. A single platform (e.g. Turing's universal machine) would be ideal. In real life, Windows and Linux executables cover most needs. Most programs are written in C(++), which compiles neatly to self-contained, reasonably sized executables.

Why do you require submission of documented source code?

A primary goal of this contest is to increase awareness of the relation between compression and (artificial) intelligence, and to foster the development of better compressors. The (ideas and insights behind the) submitted (de)compressors should in turn help to create even better compressors and ultimately in developing smarter AIs. Up until 2017 the source code was not required for participation in the contest, and has also not been released voluntarily. The past submissions are therefore useless to others and the ideas in them may be lost forever. Furthermore this made it difficult for other contestants to beat the (as of 2017) four-time winner Alexander Rhatushnyak. Making the source available should rectify these problems. Therefore, as of 2018, the source code is required, which should help to revive the contest, make it easier to build improved compressors by combining ideas, foster collaboration, and ultimately lead to better AI. Contributors can still copyright their code or patent their ideas, as long as non-commercial use, and in particular use by other future contestants, is not restricted.

Under which license can/shall I submit my code?

Any of the Open Source Initiative (OSI)-approved licenses is acceptable. In case of doubt, consider choosing a license as permissive as you feel comfortable with. We prefer Unlicense over MIT over Apache over Mozilla over GNU, but all of them are acceptable. Simply put a statement such as "This Code is licensed under UNLICENSE https://unlicense.org" somewhere in your code.

What if I can (significantly) beat the current record?

In this case, submit your code and win the award and/or copyright your code and/or patent your ideas. You should be able to monetize your invention beyond the HKCP. This happened to the first winner, a Russian who always had to cycle 8km to a friend to test his code because he did not even have a suitable computer, and who now has a lucrative job at QTR in Canada. I cannot directly help you with your aspirations, but the HKCP award on your CV plus a report that clearly explains your code, your algorithm, and the ideas behind them, should make you an attractive employee and/or your patent a valuable asset. The mp3 patent (the most famous lossy compressor for music) for instance, made millions of dollars from licensing fees.

How can I produce self-contained or smaller decompressors?

Is Artificial General Intelligence possible?

There have been many arguments for and against the possibility of creating Artificial General Intelligence. If AGI is not possible, what's the relevance of this contest?

The primary argument that AGI is possible is the computational hypothesis (supported by physical evidence) that the universe, including organic matter and human brains, is computable. Estimates of the complexity of a human brain look promising too, although they are more controversial. The second argument is the steady progress (in hardware and software) over the last century in solving increasingly complex AI problems. Third, all arguments against AGI so far (at least those worth refuting) have been refuted; most are not even plausible in the first place. The primary reasons for their perseverance seem to be fear of change, robophobia, lack of imagination, and carbon chauvinism. Scientific arguments include no-free-lunch theorems, alleged experimental evidence against Ockham's razor, the Lucas/Penrose Gödel-type argument, and some others. Philosophical arguments that a machine cannot have free will, imagination, emotions, or a soul, and cannot be conscious, creative, etc., have been put forward. Actually, one can find claims that 'a machine will never be able to do X' for literally hundreds of X (many of them meanwhile refuted). Further, there are all kinds of religious arguments. Finally, even if the grand AGI goal is denied, the close relation between compression, prediction, and intelligence is undeniable, and hence this contest will in any case advance the field of text understanding and AI. And the conceptual problem of AGI is already solved.

Is Ockham's razor and hence compression sufficient for AGI?

Ockham's razor roughly says that if two theories are equally well supported by the data, one should favor the simpler one. The principle is well supported theoretically and experimentally, and there is no other similarly general and powerful principle that could replace or augment it. It has proven invaluable for understanding our world. Until other necessary or sufficient principles are found, it is prudent to accept Ockham's razor as the foundation of inductive reasoning. Kolmogorov complexity quantifies the concept of simplicity/complexity. Together with sequential decision theory, this can serve as a foundation for AGI. Indeed, Ockham's razor together with experimental design and logical reasoning are even sufficient founding principles of science itself. So far, all attempts to discredit the universal role of Ockham's razor have failed. Such arguments include no-free-lunch theorems and some questionable experimental evidence.

I have other questions or am not satisfied with the answer

Matt Mahoney's page on the rationale of the contest may help you. It explains why compression is equivalent to AI, why text compression is AI-Hard, what's the problem with other evaluation measures, how much compression can be achieved, why Wikipedia, why only English, why only text and not a variety of data types, why include the decompressor size, why not rank by speed and memory, what has been achieved so far, and others. If all this doesn't help, read or post your questions at the H-Prize newsgroup.
