There were two technical comments on my post Turing-complete vs Turing-universal, both about my claim that a prior based on lambda calculus would be weaker than a prior based on Turing machines, reflecting the idea that Turing-completeness is a weaker requirement than Turing-universality.
Here, my aim is to address those comments by explicitly constructing what I had in mind.
To briefly review, the universal semimeasure is a probability distribution we can define by feeding random coinflips to a universal Turing machine (UTM).[1] We can think of the coin-flip inputs as an infinite string, or we can imagine that we flip a coin to determine the next input whenever the machine advances the input tape. In any case, the resulting distribution over outputs is the universal semimeasure.
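To make this concrete, here is a toy Python illustration (my own construction, not anything from the post). A "machine" is modeled as a generator that signals when it wants the next input bit; the sampler supplies a coin flip on demand, which matches the picture of flipping a coin whenever the machine advances the input tape. The toy machine here just copies input to output; a universal machine would instead interpret the flips as a program.

```python
import random

def copy_machine():
    # Toy machine: copies its input tape to its output tape.
    # A UTM would instead treat the input bits as a program.
    while True:
        bit = yield 'input'   # request the next input bit
        yield bit             # write it to the output tape

def sample_prefix(machine_factory, n_bits):
    """Sample an output prefix by feeding coin flips to a machine on demand."""
    machine = machine_factory()
    out = []
    event = next(machine)
    while len(out) < n_bits:
        if event == 'input':
            event = machine.send(random.getrandbits(1))  # flip a coin
        else:
            out.append(event)          # machine emitted an output bit
            event = next(machine)
    return out

print(sample_prefix(copy_machine, 8))  # eight uniformly random bits
```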
The probabilities of the "next output bit" being 1 or 0 do not always sum to one, because the UTM may halt, or it may go into an infinite loop. The Solomonoff prior is defined from the universal semimeasure through the Solomonoff normalization, which isn't typical normalization (scaling everything at once), but rather iteratively scales up every p(next-bit|string).[2]
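To make the normalization concrete, here is a minimal sketch in Python (my own illustration). It assumes we are handed a function M(bits) returning the unnormalized semimeasure of a finite bit string; in reality M is only lower-semicomputable, so treat this as a conceptual sketch rather than something you could run against the true universal semimeasure.

```python
# Minimal sketch of Solomonoff normalization. Assumes a function
# M(bits) -> float giving the unnormalized semimeasure of a bit string
# (in reality only lower-semicomputable, so M is an oracle here).

def normalized_prob(M, bits):
    """Solomonoff-normalized probability of `bits` (a list of 0s and 1s)."""
    p = 1.0
    prefix = []
    for b in bits:
        # The two continuations may sum to less than M(prefix), because
        # the machine can halt or diverge. Rescale this conditional so
        # p(0|prefix) + p(1|prefix) = 1 (assumes the total is nonzero).
        total = M(prefix + [0]) + M(prefix + [1])
        p *= M(prefix + [b]) / total
        prefix.append(b)
    return p
```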
My controversial claim from the earlier post:
As a consequence, Turing-completeness is an inherently weaker property. This has important learning-theoretic consequences. For example, the prototypical example of a Turing-complete formalism is lambda calculus. The prototypical example of a prior based on Turing machines is Solomonoff's prior. Someone not familiar with the distinction between Turing-complete and Turing-universal might naively think that a prior based on lambda calculus would be equally powerful. It is not so. Solomonoff's prior guarantees a constant Bayes loss compared to the best computable prior for the job. In contrast, a prior based on lambda calculus can guarantee only a multiplicative loss.
Here is what I had in mind:
First, we choose a prior over lambda terms, μ. Any will do, but it is reasonable to choose a length-based prior, since that keeps us conceptually close to Solomonoff. Vanessa's comment on the earlier post already suggests one such μ, so I'll just quote it here:
First, we choose some reasonable complexity measure C on lambda terms, such as:
- For a variable x, we define C(x) := 1
- For two terms t, s, we define C(ts) := C(t) + C(s)
- For a term t and a variable x, we define C(λx.t) := C(t) + 1
Denote the set of lambda-terms by Λ. We then choose β>0 s.t. ∑_{t∈Λ} e^{−βC(t)} ≤ 1 [...] and make the prior probabilities of different lambda terms proportional to e^{−βC(t)}.
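For concreteness, here is that complexity measure in a few lines of Python (the tuple-based term representation is my own illustrative choice, not anything from Vanessa's comment):

```python
import math

# Lambda terms as tuples: ('var', name), ('app', t, s), ('lam', name, body).
# This representation is just for illustration.

def C(term):
    tag = term[0]
    if tag == 'var':                      # C(x) := 1
        return 1
    if tag == 'app':                      # C(t s) := C(t) + C(s)
        return C(term[1]) + C(term[2])
    if tag == 'lam':                      # C(λx.t) := C(t) + 1
        return 1 + C(term[2])
    raise ValueError(f"not a term: {term!r}")

def prior_weight(term, beta):
    # Unnormalized prior weight e^(-beta*C(t)); beta must be chosen large
    # enough that these weights sum to at most 1 over all terms.
    return math.exp(-beta * C(term))

# Example: 1 := λf.λx.f x has complexity (1 + 1) + 1 + 1 = 4.
one = ('lam', 'f', ('lam', 'x', ('app', ('var', 'f'), ('var', 'x'))))
assert C(one) == 4
```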
Now, I need to specify the correlate of the (potentially infinite) output tape, and the (potentially infinite) input tape filled with random bits.
The output is not too difficult. I'll use the encoding listed on Wikipedia:
0 := λf.λx.x
1 := λf.λx.f x
PAIR := λx.λy.λf.f x y
NIL := λx.λy.λz.y
A sequence, such as 110, can then be encoded as a linked list: (PAIR 1 (PAIR 1 (PAIR 0 NIL))). A finite lambda term can create an infinite sequence like this when we run it. So far so good.
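As a sanity check (my own addition, not from the post), these encodings can be written directly as Python lambdas, which lets us verify that peeling apart the linked list recovers the intended bit string:

```python
# The encodings above as Python lambdas.
ZERO = lambda f: lambda x: x                    # 0 := λf.λx.x
ONE  = lambda f: lambda x: f(x)                 # 1 := λf.λx.f x
PAIR = lambda x: lambda y: lambda f: f(x)(y)    # PAIR := λx.λy.λf.f x y
NIL  = lambda x: lambda y: lambda z: y          # NIL := λx.λy.λz.y

def decode(term, max_bits=16):
    """Read bits off a Church-encoded list until NIL (or until max_bits)."""
    bits = []
    for _ in range(max_bits):
        # Probe the list: (PAIR h t) applied to our handler returns a
        # ('pair', h, t) tag; NIL swallows the handler and both dummy
        # arguments, returning the first dummy, 'nil'.
        probe = term(lambda h: lambda t: lambda a: lambda b: ('pair', h, t))('nil')('nil')
        if probe == 'nil':
            break
        _, head, term = probe
        bits.append(head(lambda n: n + 1)(0))   # Church bit -> 0 or 1
    return bits

seq = PAIR(ONE)(PAIR(ONE)(PAIR(ZERO)(NIL)))     # the list for 110
assert decode(seq) == [1, 1, 0]
```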
Now, what about the infinite random inputs?
What I had in mind was the following:
If we just sampled from μ, and then ran the result,
- we might get a finite sequence such as (PAIR 1 (PAIR 1 (PAIR 0 NIL)))
- we might get an infinite sequence, which looks similar but just keeps going forever: (PAIR 1 (PAIR 1 (PAIR 0 ...)))
- we might get something else which is not any of these things, EG, (PAIR 1 (PAIR 1 (PAIR 0 γ))) where γ is some arbitrary lambda term which is fully reduced (finished running) but doesn't conform to our target output format.
My proposal in the third case is to take the outer part as valid-so-far partial output (in this example, 110), and then feed γ another random lambda term sampled from μ. This is my analogue of feeding the UTM another coin-flip each time it advances the input tape.[3]
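Putting the whole procedure together, here is a pseudocode-flavored sketch in Python (my own rendering of the above, not a tested implementation). All of the helpers are hypothetical stand-ins: sample_mu draws a term from μ, reduce normalizes a term (and may fail to terminate, so a real implementation would have to dovetail reduction steps rather than call it directly), match_pair / match_nil / decode_bit inspect the reduced term against the output format above, and terms use the tuple representation from the earlier sketch.

```python
# Conceptual sketch only: sample_mu, reduce, match_pair, match_nil, and
# decode_bit are hypothetical helpers, and reduce may never return.

def sample_output_bits(sample_mu, reduce, match_pair, match_nil, decode_bit):
    """Generate output bits, stopping at NIL (halting) or looping forever."""
    term = reduce(sample_mu())
    while True:
        if match_nil(term):                    # well-formed end of list: halt
            return
        head_tail = match_pair(term)
        if head_tail is not None:              # (PAIR b rest): emit one bit
            head, tail = head_tail
            yield decode_bit(head)             # valid-so-far partial output
            term = reduce(tail)
        else:                                  # ill-formed remainder γ: feed
            term = reduce(('app', term, sample_mu()))  # it a fresh term from μ
```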
This still gives us a semimeasure on bit sequences, since lambda-term evaluation can halt with a NIL or fail to halt. If we want, we can apply Solomonoff-normalization to get a full measure, just like we did for Turing machines.
This completes the definition of the "lambda calculus prior" I had in mind.
Multiplicative Loss
To make my claim about this prior precise:
The "universal" in "universal semimeasure" does not refer to the Turing-universality which was discussed in the earlier post. Rather, it refers to being dominant in its class, namely the lower-semicomputable semimeasures. (We can more properly call it the "universal lower-semicomputable semimeasure", but "universal semimeasure" is typically enough for other algorithmic information theorists to know what is being referred to.)
More formally, "dominant in its class" means: given another lower-semicomputable semimeasure q, a universal semimeasure p dominates q, IE, there exists α>0 such that for any binary string x:
p(x) ≥ αq(x)

This translates to a constant bound on log-loss relative to q.
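Spelling out that last step: taking negative logarithms of both sides gives

−log p(x) ≤ −log q(x) + log(1/α)

so the log-loss of p on any string x exceeds the log-loss of q by at most the constant log(1/α), no matter how long x gets.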
It is easy to see that my lambda calculus prior l satisfies a weaker form of dominance, namely: for every lower-semicomputable semimeasure q, there exist α,β>0 such that:
l(x) ≥ αq(x)^β

This translates to a multiplicative bound on log-loss relative to q. I say this is "easy to see" because, given a stochastic Turing machine that implements q, we can convert it to a regular Turing machine that implements q when fed random coin-flips, and then translate this to a lambda-term. The reason this gives only weak dominance, not true dominance, is that the procedure I described for generating new random lambda-terms is only like coin-flips with some probability: we can generate lambda-terms other than 0 or 1.
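Spelling out the loss bound in the same way: taking negative logarithms gives

−log l(x) ≤ β·(−log q(x)) + log(1/α)

so instead of a constant additive overhead, the log-loss of l is bounded by β times the log-loss of q (plus a constant). When β>1, this overhead grows in proportion to q's cumulative log-loss, which is why I call it a multiplicative bound.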
I think of it like this: imagine if, when using the universal semimeasure, you were also allowed to look for patterns in the random coin-flips. Maybe you think to yourself "there have been a lot of zeros lately; I'm going to guess that the next ten random bits will be zeroes." (This corresponds to generating a lambda term which expands to a sequence of 0s.)
This might sound more powerful, but it is actually a handicap: it means there is no way to lock in a specific probabilistic hypothesis.
I haven't provided a proof that my lambda calculus prior isn't dominant over the universal semimeasure (which would make it a universal semimeasure in its own right); I just think it probably isn't, based on the above intuitions. So, I could be wrong there.
Even if I'm correct in my technical claim, there's the question of whether I'm "wrong in spirit" due to choosing an unnatural lambda-calculus prior. Both of the comments on my earlier post seem to suggest this. Quoting Cole's comment in full:
I think probabilistic lambda calculus is exactly equivalent to monotone Turing machines, so in the continuous case relevant to Solomonoff induction there is no difference. It’s “unfair” to compare the standard lambda calculus to monotone Turing machines because the former does not admit infinite length “programs.”
My personal sense is that my argument is still "fair" -- I've addressed the concern about infinite-length "programs" by feeding in new random lambda terms, potentially forever. However, there's plenty of room to disagree by claiming that I've set up my lambda-calculus prior in a dumb way. The point I'm making is (like Turing-completeness) ultimately informal and subject to some interpretation. However, I still feel like it has solid legs to stand on: Turing-completeness (unlike Turing-universality) allows a translation step, so that we can simulate binary computations in some other formalism; this translation step may introduce large differences in description-lengths. I'm trying to illustrate that by showing how description-length priors in different formalisms may not be equally powerful.
- ^
Specifically, a monotone universal Turing machine, as defined in, EG, Universal Learning Theory by Marcus Hutter (Encyclopedia of Machine Learning and Data Mining). These are Turing machines with an input tape which can only be moved in one direction, an output tape which can only be moved in one direction, and some number of bidirectional working tapes.
I am also restricting the alphabet of the machines to binary.
- ^
See, EG, Solomonoff Induction Violates Nicod's Criterion by Leike and Hutter.
- ^
Perhaps a cleaner version of this would contain an actual infinite sequence of random lambda-terms, rather than feeding them in one at a time like this, but I don't think it makes a big difference?
