
== Stop grinding bullshit, start compressing real-world data!

We hereby propose a blockchain-based cryptocurrency whose consensus mechanism is based on improvement in lossless compression of a fixed, large piece of real-world data.

== History and motivation

Open, long-term data compression contests have existed before. The best known are the Hutter Prize for Lossless Compression of Human Knowledge and the Calgary Compression Challenge. Particularly relevant is Matt Mahoney's ZPAQ compression method, which decompresses by executing arbitrary bytecode; this effectively makes it a standard for compression contests.

The connection between data compression and machine learning is clear: machine learning, particularly training compact neural networks, is effectively data compression (into a particular format).

One concrete class of fast (de)compression algorithms is obvious: an (integer-arithmetic) neural network predicts the next byte/chunk in the data stream, and only the small prediction error gets stored. ZPAQ already mixes neural networks into its operation. Moreover, “the intention of [the Hutter Prize] is to encourage development of intelligent compressors/programs as a path to AGI.”

== Technical rules

There needs to be a fixed virtual machine (preferably specialized to plain data decompression, but it could even be Bitcoin Script or the EVM).

A valid compression solution — like in ZPAQ — is bytecode that executes to reproduce the original data.
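Verifying such a solution is straightforward in outline. In the sketch below, `run_vm` is a stand-in for the (unspecified) fixed VM, modeled as a Python callable purely for illustration; a step limit bounds verification cost.

```python
# Hypothetical verification step: execute the submitted bytecode on the fixed
# VM under a step limit, then check that the output matches the hash of the
# original corpus. `run_vm` is assumed to return None if the limit is exceeded.

import hashlib

def verify_solution(bytecode: bytes, run_vm, target_hash: str,
                    step_limit: int) -> bool:
    output = run_vm(bytecode, step_limit)
    if output is None:  # ran out of steps: reject without further work
        return False
    return hashlib.sha256(output).hexdigest() == target_hash
```

Comparing against a hash rather than the full corpus keeps validators from needing the decompressed output resident in memory all at once (they can hash it streamingly).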

Unfortunately, plain bytecode is not cryptographically secured and can simply be copied. There therefore needs to be an underlying layer (e.g. another blockchain) that can seal an attestation linking an author to the new compressed data; the data itself must stay blinded (only its hash published) until the linkage is practically finalized.
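This is a standard commit-reveal pattern. A minimal sketch, assuming SHA-256 and a simple author-id/salt/solution binding (the exact on-chain encoding is not specified by the proposal):

```python
# Commit-reveal sketch. Phase 1: publish only the commitment digest, binding
# the author to the solution without revealing it. Phase 2: once the
# commitment is finalized on-chain, reveal salt and solution; anyone can
# recompute the digest and check the linkage.

import hashlib, os

def commit(author_id: bytes, solution: bytes) -> tuple[bytes, bytes]:
    salt = os.urandom(32)  # random salt prevents brute-forcing small solutions
    digest = hashlib.sha256(author_id + salt + solution).digest()
    return digest, salt    # publish digest now; keep salt + solution secret

def verify_reveal(digest: bytes, author_id: bytes, salt: bytes,
                  solution: bytes) -> bool:
    return hashlib.sha256(author_id + salt + solution).digest() == digest
```

A copier who sees only the digest learns nothing about the bytecode, and cannot re-commit it under their own identity once the original commitment is finalized first.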

The value of a solution should depend on both the compressed size and the decompression time (so that verifiers never face solutions that take practically forever to decompress); lower is better on both counts.

To avoid “seeding”, the initial solution does not imply any rewards.

The submitter of a solution earns points equal to the improvement in value.
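The scoring rules above can be sketched as follows. The exact value function is left open by the proposal; this assumes, purely for illustration, a weighted sum of compressed size and decompression time.

```python
# Illustrative scoring: value combines compressed size and decompression time
# (lower is better); a submitter earns the improvement over the best prior
# value. The initial seed solution sets the baseline and earns nothing.

def value(compressed_size: int, decompress_seconds: float,
          time_weight: float = 1.0) -> float:
    return compressed_size + time_weight * decompress_seconds

def reward(prev_best_value: float, new_value: float) -> float:
    return max(0.0, prev_best_value - new_value)  # points = improvement
```

The `time_weight` parameter is a free design choice: it sets how many bytes of size reduction one second of extra decompression time is worth.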

Also, as a cryptocurrency, the author of a new, better solution gets to decide which arbitrary transactions are bundled into the block containing the solution. The block size varies with the time elapsed since the previous block (see the next section as to why).
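A minimal sketch of such a block, with illustrative field names (this is not a proposed wire format):

```python
# Minimal block sketch under the stated rules: a block carries the sealed
# solution commitment plus arbitrary transactions chosen by the solver, with
# a transaction allowance that grows with time since the previous block.

from dataclasses import dataclass, field

@dataclass
class Block:
    prev_hash: bytes
    solution_commitment: bytes        # attestation from the commit phase
    transactions: list = field(default_factory=list)

def tx_allowance(seconds_since_last_block: float,
                 rate_per_second: float = 0.1) -> int:
    """Variable block size: longer gaps between blocks allow more transactions."""
    return int(seconds_since_last_block * rate_per_second)
```

Scaling the allowance with elapsed time keeps transaction throughput roughly steady even when blocks arrive irregularly, which the Analysis section argues is inevitable here.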

== Analysis

Analogous to PoW-based cryptocurrency systems with exponentially increasing difficulty (“reward halving”), it will be easiest to cut down the size of the data while it is not yet heavily compressed, favoring early adopters of this new kind of cryptocurrency.

To avoid extreme early disproportionality, the base compression solution should be initialized with a state-of-the-art compression algorithm cranked up to maximum settings, e.g. lrzip-next (see some statistics).

Copying of solutions (for the purposes of improving upon them) is clearly allowed, although submissions can effectively be black boxes.

As a not-quite-intended consequence, such a game could also spark the development of automated copying-and-improvement systems (imagine: “Hey, GPT-X, take this compression program, analyze it, change its parameters to allocate more resources for compression, and parallelize it to run on a server farm!”).

Unfortunately, this is a system where one cannot, in general, expect any regularity in the appearance of new blocks: after a while, huge blocks will get added only whenever a mathematical breakthrough happens.

The fixed piece of data should be a concatenation of very diverse kinds of constituent data (plain text, natural and technical images, audio and video, documents in various file formats, software packages, executable files, databases (social, genomic, etc.), etc.), to leave plenty of room for improvement.
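Assembling such a corpus can be sketched simply; recording a manifest of per-file offsets and hashes (an assumption here, not part of the proposal) keeps the composition auditable.

```python
# Sketch of corpus assembly: concatenate diverse files into one fixed blob
# and record where each constituent lives and what it hashed to.

import hashlib

def build_corpus(files: list[tuple[str, bytes]]) -> tuple[bytes, list[dict]]:
    corpus, manifest, offset = bytearray(), [], 0
    for name, blob in files:
        manifest.append({"name": name, "offset": offset, "size": len(blob),
                         "sha256": hashlib.sha256(blob).hexdigest()})
        corpus += blob
        offset += len(blob)
    return bytes(corpus), manifest
```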

== Novelty in light of existing literature

There is a large body of research on decentralized machine learning, e.g. see Decentral and Incentivized Federated Learning Frameworks: A Systematic Literature Review. However, all such approaches appear to incentivize collaborative machine learning via plain payouts of an external cryptocurrency. Our approach, by contrast, builds a whole Bitcoin-like cryptocurrency around the compression task itself.