50'000€ Prize for Compressing Human Knowledge
(widely known as the Hutter Prize)

Compress the 100MB file enwik8 to less than the current record of about 16MB

News: Alexander Rhatushnyak is also the third winner! Congratulations!
... the contest continues ...

Being able to compress well is closely related to intelligence, as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 100MB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage the development of intelligent compressors/programs as a path to AGI.

The Task

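Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than about 16MB. More precisely: the total size S of your submission (compressed archive plus decompressor) must be smaller than the current record L = 15'949'688 bytes, and running the decompressor must reproduce enwik8 exactly, without any outside information. Remark: You can download the zipped version enwik8.zip of enwik8 here. Please find more details, including constraints and relaxations, at http://prize.hutter1.net/hrules.htm.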

Motivation

This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers make a living from analyzing data and finding compact models). So compressors beating the current "dumb" compressors need to be smart(er). Since the prize aims to stimulate the development of "universally" smart compressors, we need a "universal" corpus of data. Arguably, the online encyclopedia Wikipedia is a good snapshot of Human World Knowledge. So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik8 is a hopefully representative 100MB extract from Wikipedia.
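To make the link between regularities and file size concrete: a model that assigns probability p to the next byte lets an arithmetic coder spend about -log2(p) bits on it, so a model that captures more structure yields a smaller file. The toy sketch below assumes a plain adaptive order-k context model with add-one smoothing; it is much simpler than the paq8-family compressors in the tables below and is not their actual design. The function name `code_length_bits` and the sample text are illustrative only.

```python
import math
from collections import defaultdict

def code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length (bits) of `data` under a toy adaptive order-`order`
    context model with add-one smoothing over all 256 byte values.
    An arithmetic coder driven by this model would come very close to it."""
    counts = defaultdict(lambda: [0] * 256)  # context -> frequency of each next byte
    totals = defaultdict(int)                # context -> bytes seen after it
    bits = 0.0
    for i, byte in enumerate(data):
        ctx = data[max(0, i - order):i]      # the preceding `order` bytes (fewer at the start)
        p = (counts[ctx][byte] + 1) / (totals[ctx] + 256)
        bits -= math.log2(p)                 # cost of coding this byte
        counts[ctx][byte] += 1               # update after coding, exactly as a decoder could
        totals[ctx] += 1
    return bits

text = b"the quick brown fox jumps over the lazy dog. " * 200
for k in (0, 1, 2):
    print(f"order {k}: {code_length_bits(text, k) / 8:,.0f} bytes (raw: {len(text):,})")
```

On this repetitive sample, the order-2 model, which conditions on the two previous bytes, needs far fewer bits than the order-0 model, which only learns byte frequencies; both beat the 8 bits per byte of the raw input.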

Detailed Rules for Participation

Previous Records

Author                 Date         Decompressor  Total Size  Compr. Factor  RAM    Time  Award      Sponsor
?                      ?            ?             ?           ?              ?      ?     ?          You?
Alexander Rhatushnyak  23 May 2009  decomp8 ...   15'949'688  6.27           936MB  ~9h   1614€      Marcus Hutter
Alexander Rhatushnyak  14 May 2007  paq8hp12 -7   16'481'655  6.07           936MB  9h    1732€      Marcus Hutter
Alexander Rhatushnyak  25 Sep 2006  paq8hp5 -7    17'073'018  5.86           900MB  5h    3416€      Marcus Hutter
Matt Mahoney           24 Mar 2006  paq8f -7      18'324'887  5.46           854MB  5h    pre-prize  -
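The Award column is consistent with an award of 50'000€ × (1 - S/L), rounded to the nearest euro, where S is the new total size and L the previous record. Combined with the 1500€ minimum stated in the disclaimer below, this can be sketched as follows. The formula is inferred from the table rather than quoted from the rules, and `award_eur` is a hypothetical helper.

```python
PRIZE_POOL_EUR = 50_000  # headline prize from the page title
MIN_CLAIM_EUR = 1_500    # smallest claimable prize, per the disclaimer

def award_eur(new_size: int, previous_record: int) -> int:
    """Hypothetical payout for total size S = new_size against record L.

    Assumes award = 50'000 EUR * (1 - S/L), rounded to the nearest euro,
    and that claims below the minimum pay nothing.
    """
    if new_size >= previous_record:
        return 0  # no improvement, no prize
    payout = round(PRIZE_POOL_EUR * (1 - new_size / previous_record))
    return payout if payout >= MIN_CLAIM_EUR else 0

# Successive records from the table above, oldest first.
records = [18_324_887, 17_073_018, 16_481_655, 15_949_688]
for prev, new in zip(records, records[1:]):
    print(f"L={prev:,}  S={new:,}  award={award_eur(new, prev)} EUR")
```

Running it prints awards of 3416, 1732 and 1614 EUR for the three records, matching the table. A 1% improvement such as paq8hp6's (16'902'200 against L=17'073'018) would come to about 500€ and therefore falls below the minimum.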

More Information

History

Committee

Donations

We would like to increase the prize with the help of donations. Currently we can only accept pledges of over 1000€, i.e. the donor commits to paying up to the pledged amount to one or more future winners. In return, the donor will be acknowledged by having their name placed beside the winner's in the table of records, unless they wish to remain anonymous. If you are considering becoming a sponsor of the prize (or have questions or suggestions regarding it), please contact one of the committee members above for more information, or fill out and return the pledge form (PDF / ASCII). Please regard this form as a suggestion only; we are open to other arrangements, in particular establishing a real fund.

Frequently Asked Questions (FAQ)

Contestants

So far we have received the submissions below. Each submission is (or was) open for public comment and verification for 30 days before an award decision is made. Comments should be sent to the Hutter Prize newsgroup or by email to members of the prize committee.
Date       Author                 Decompressor  Compression Options  Size of archive  Size of decompr.  Total Size  %Improve 1-S/L  Compr. Factor  Bits/Char  Memory  Time   Note
23.May'09  Alexander Rhatushnyak  decomp8       archive8.bin enwik8  15'932'968       16'720            15'949'688  3.2%            6.27           1.276      936MB   ~9h    Meets all prize criteria. Third winner!
22.Apr'09  Alexander Rhatushnyak  decomp8       archive8.bin enwik8  15'970'425       16'252            15'986'677  3.0%            6.26           1.279      924MB   9h     3.0% improvement over new baseline paq8hp12
14.May'07  Alexander Rhatushnyak  paq8hp12      -7                   16'381'959       99'696            16'481'655  3.5%            6.07           1.319      936MB   9h     Meets all prize criteria. Second winner!
...        "                      ...           ...                  ...              ...               ...         ...             ...            ...        ...     ...    ...
6.Nov'06   Alexander Rhatushnyak  paq8hp6       -7                   16'731'800       170'400           16'902'200  1%              5.92           1.352      941MB   5h     1% improvement over new baseline paq8hp5
25.Sep'06  Alexander Rhatushnyak  paq8hp5       -7                   16'898'402       174'616           17'073'018  6.8%            5.86           1.366      900MB   5h     Meets all prize criteria. First winner!
10.Sep'06  Alexander Rhatushnyak  paq8hp4       -7                   17'039'173       206'336           17'245'509  5.9%            5.80           1.380      803MB   5h     Superseded by paq8hp5
3.Sep'06   Alexander Rhatushnyak  paq8hp3       -7                   17'241'280       178'468           17'419'748  4.9%            5.74           1.394      742MB   5h     Superseded by paq8hp4
28.Aug'06  Alexander Rhatushnyak  paq8hp2       -7                   17'390'460       205'276           17'595'736  4.0%            5.68           1.408      747MB   5h     Superseded by paq8hp3
21.Aug'06  Alexander Rhatushnyak  paq8hp1       -7                   17'566'769       206'764           17'773'533  3.0%            5.63           1.422      748MB   5h     Superseded by paq8hp2
20.Aug'06  Alexander Rhatushnyak  paq8hkcc      -7                   17'597'599       244'224           17'841'823  2.6%            5.60           1.427      747MB   5h     Superseded by paq8hp1
16.Aug'06  Dmitry Shkarin         durilca0.5h   -m1650 -o21 -t2      17'958'687       86'016            18'044'703  1.5%            5.54           1.444      1650MB  30min  Fails to meet the reasonable memory limitations
16.Aug'06  Rudi Cilibrasi         raq8g         -7                   18'132'399       34'816            18'167'215  0.9%            5.50           1.453      1089MB  7h     Fails to meet the 1% hurdle, among other criteria
24.Mar'06  Matt Mahoney           paq8f         -7                   18'289'559       35'328            18'324'887  0%              5.46           1.466      854MB   5h     Pre-prize baseline

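The time for decompression/compression is estimated for a 2 GHz Pentium 4. The percent (%) improvement 1-S/L is measured against the baseline L (previous record) in force at the time of submission: L=18'324'887 up to the first award, L=17'073'018 from the first award up to the second, and L=16'481'655 after the second award. More details on the (de)compressors can be found here.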
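The derived columns can be recomputed from the two measured sizes. The sketch below assumes that enwik8 is exactly 10^8 bytes and that Bits/Char means bits per byte of enwik8; these definitions are inferred from the listed values, and `table_columns` is a hypothetical helper.

```python
ENWIK8_BYTES = 100_000_000  # enwik8: first 10^8 bytes of an English Wikipedia XML dump

def table_columns(archive: int, decompressor: int, baseline: int) -> dict:
    """Recompute the derived columns of one contestants-table row."""
    total = archive + decompressor  # S: archive plus decompressor is what gets judged
    return {
        "total_size": total,
        "improve_pct": round(100 * (1 - total / baseline), 1),  # 1 - S/L, in percent
        "compr_factor": round(ENWIK8_BYTES / total, 2),          # original / compressed
        "bits_per_char": round(8 * total / ENWIK8_BYTES, 3),     # bits per enwik8 byte
    }

# paq8hp12 row, measured against the baseline L = 17'073'018 (paq8hp5)
print(table_columns(16_381_959, 99_696, baseline=17_073_018))
```

For the paq8hp12 row this gives a total of 16'481'655 bytes, a 3.5% improvement over L=17'073'018, a compression factor of 6.07 and 1.319 bits per character, as listed.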

Links

Warning: The average quality of the posts in the discussion groups and mailing lists is very low. Most participants don't know the underlying scientific concepts and some have not even read the rationale behind the contest. For a cleaned summary consult the frequently asked questions. The competition was also announced or discussed in many blogs.

Disclaimer: Copying and distribution of this page (http://prize.hutter1.net) is permitted, provided the source is cited. The prize will be paid if the solution reflects the spirit of the contest. In particular, decompressors that (secretly) receive any kind of "outside" information are forbidden. Also, in order to verify your claim, we need to be able to run your executable on our machines within reasonable space and time constraints. Payment of the prize cannot be legally enforced. The smallest claimable prize is 1500€. After an award, the baseline L in the prize formula will be updated to the new record. Rules may change at any time to meet the goals of fairness, accuracy, maximizing public participation, and recognizing existing practice. July 2006.