Published on October 26, 2025 8:33 PM GMT
This post is a comment on Natural Latents: Latent Variables Stable Across Ontologies by John Wentworth and David Lorell. It assumes some familiarity with that work and does not attempt to explain it. Instead, I present an alternative proof that was developed as an exercise to aid my own understanding. While the original theorem and proof are written in the language of graphical models, mine instead uses the language of information theory. My proof has the advantage of being algebraically succinct, while theirs has the advantage of developing the machinery to work directly with causal structures. Very often, seeing multiple explanations of a fact helps us understand it, so I hope someone finds this post useful.
Specifically, we are concerned with their Theorem 1 (Mediator Determines Redund): both the older Iliad 1 version for stochastic latents and the newer arXiv version for deterministic latents. I will translate each theorem into the language of information theory: Wentworth & Lorell's assumptions will imply mine, while their conclusions will be equivalent to mine. The equivalences follow from the d-separation criterion and the fact that (conditional) independence is equivalent to zero (conditional) mutual information.
In our version of the new theorem, $\Lambda$ is a mediator between two subsets $X_A$ and $X_B$ of the data, meaning that it captures essentially all of the information shared between them, whereas $\Lambda'$ is a redund between $X_A$ and $X_B$, meaning it is essentially determined by $X_A$ alone and by $X_B$ alone, so it carries only information common to both.[1]
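To make these definitions concrete, here is a minimal sketch in Python (the shared-bit construction, variable names, and helper functions are my own illustration, not taken from the original post). A fair bit $Z$ is copied into both $X_A$ and $X_B$ alongside independent noise bits, and $\Lambda = \Lambda' = Z$; enumerating the joint distribution shows that $Z$ is simultaneously an exact mediator and an exact redund.

```python
import itertools
from collections import defaultdict
from math import log2

# Toy joint distribution (illustrative, not from the original post):
# a shared fair bit Z plus independent noise bits N_A, N_B,
# with X_A = (Z, N_A), X_B = (Z, N_B), and Lambda = Lambda' = Z.
joint = defaultdict(float)
for z, na, nb in itertools.product([0, 1], repeat=3):
    xa, xb, lam = (z, na), (z, nb), z
    joint[(xa, xb, lam)] += 1 / 8  # uniform over the 8 bit patterns

def H(dist):
    """Shannon entropy (in bits) of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idxs):
    """Marginal distribution over the coordinates listed in idxs."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return out

def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return H(marginal(joint, target + given)) - H(marginal(joint, given))

def cond_mi(joint, a, b, given):
    """I(a : b | given) = H(a | given) - H(a | b, given)."""
    return cond_entropy(joint, a, given) - cond_entropy(joint, a, b + given)

# Coordinates of an outcome: 0 = X_A, 1 = X_B, 2 = Lambda (= Lambda').
print(cond_mi(joint, [0], [1], [2]))   # I(X_A : X_B | Lambda)  -> 0.0, exact mediator
print(cond_entropy(joint, [2], [0]))   # H(Lambda' | X_A)       -> 0.0, exact redund
print(cond_entropy(joint, [2], [1]))   # H(Lambda' | X_B)       -> 0.0, exact redund
```

The same representation (a dictionary from outcomes to probabilities, plus small entropy and marginalization helpers) is reused in the numerical sanity checks below.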
New Theorem 1 (deterministic latents)
Let $A, B$ be disjoint subsets of $\{1, \dots, n\}$.
Suppose the random variables $X_1, \dots, X_n, \Lambda, \Lambda'$ satisfy the following, where $X_S := (X_i)_{i \in S}$ for $S \subseteq \{1, \dots, n\}$:
$\Lambda$ Mediation: $I(X_A : X_B \mid \Lambda) \le \epsilon_{\mathrm{med}}$,
$\Lambda'$ Redundancy: $H(\Lambda' \mid X_A) \le \epsilon_{\mathrm{red}}$ and $H(\Lambda' \mid X_B) \le \epsilon_{\mathrm{red}}$.
Then, $H(\Lambda' \mid \Lambda) \le \epsilon_{\mathrm{med}} + 2\epsilon_{\mathrm{red}}$.
Proof
$H(\Lambda' \mid \Lambda)$
$= H(\Lambda' \mid X_B, \Lambda) + I(\Lambda' : X_B \mid \Lambda)$ by definition of conditional mutual information,
$\le H(\Lambda' \mid X_B) + I(X_A : X_B \mid \Lambda) + H(\Lambda' \mid X_A)$ since conditioning reduces entropy and $I(\Lambda' : X_B \mid \Lambda) \le I((\Lambda', X_A) : X_B \mid \Lambda) = I(X_A : X_B \mid \Lambda) + I(\Lambda' : X_B \mid X_A, \Lambda) \le I(X_A : X_B \mid \Lambda) + H(\Lambda' \mid X_A)$,
$\le \epsilon_{\mathrm{med}} + 2\epsilon_{\mathrm{red}}$ by Redundancy and Mediation.
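As a numerical sanity check on this bound, the sketch below (again a toy construction of my own, with illustrative parameter values) perturbs the shared-bit example: $X_A = (Z, C_A)$ and $X_B = (Z, C_B)$ share a weakly correlated pair of side bits that $\Lambda = Z$ does not capture, and $\Lambda' = Z \oplus F$ is a slightly corrupted copy of $Z$, so that $\epsilon_{\mathrm{med}}$, $\epsilon_{\mathrm{red}}$, and $H(\Lambda' \mid \Lambda)$ are all strictly positive. Computing every quantity exactly from the enumerated joint distribution confirms $H(\Lambda' \mid \Lambda) \le \epsilon_{\mathrm{med}} + 2\epsilon_{\mathrm{red}}$.

```python
import itertools
from collections import defaultdict
from math import log2

# Noisy variant of the shared-bit example (parameter values are illustrative):
# X_A = (Z, C_A), X_B = (Z, C_B) where C_A and C_B are weakly correlated side
# bits not captured by Lambda = Z, and Lambda' = Z xor F is a corrupted redund.
p_flip, p_disagree = 0.05, 0.4

joint = defaultdict(float)
for z, ca, d, f in itertools.product([0, 1], repeat=4):
    p = (0.25  # Z and C_A are independent fair bits
         * (p_disagree if d else 1 - p_disagree)
         * (p_flip if f else 1 - p_flip))
    xa, xb = (z, ca), (z, ca ^ d)    # X_A, X_B
    lam, lam_prime = z, z ^ f        # Lambda (mediator), Lambda' (approximate redund)
    joint[(xa, xb, lam, lam_prime)] += p

# Same entropy helpers as in the earlier sketch.
def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idxs):
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return out

def cond_entropy(joint, target, given):
    return H(marginal(joint, target + given)) - H(marginal(joint, given))

def cond_mi(joint, a, b, given):
    return cond_entropy(joint, a, given) - cond_entropy(joint, a, b + given)

# Coordinates of an outcome: 0 = X_A, 1 = X_B, 2 = Lambda, 3 = Lambda'.
eps_med = cond_mi(joint, [0], [1], [2])        # I(X_A : X_B | Lambda)
eps_red = max(cond_entropy(joint, [3], [0]),   # H(Lambda' | X_A)
              cond_entropy(joint, [3], [1]))   # H(Lambda' | X_B)
lhs = cond_entropy(joint, [3], [2])            # H(Lambda' | Lambda)

print(f"{lhs:.4f} <= {eps_med + 2 * eps_red:.4f}")
assert lhs <= eps_med + 2 * eps_red + 1e-9
```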
Old Theorem 1 (stochastic latents)
Suppose the random variables $X_1, \dots, X_n, \Lambda, \Lambda'$ satisfy the following, where $X := (X_1, \dots, X_n)$ and $X_{-j}$ denotes $X$ with the $j$-th coordinate removed:
Independent Latents: $I(\Lambda : \Lambda' \mid X) \le \epsilon_{\mathrm{ind}}$,
$\Lambda$ Mediation: $I(X_j : X_{-j} \mid \Lambda) \le \epsilon_{\mathrm{med}}$ for all $j$,
$\Lambda'$ Redundancy: $I(\Lambda' : X_j \mid X_{-j}) \le \epsilon_{\mathrm{red}}$ for all $j$.
Then, $I(\Lambda' : X \mid \Lambda) \le n(\epsilon_{\mathrm{ind}} + \epsilon_{\mathrm{med}} + \epsilon_{\mathrm{red}})$.
Proof
First, we have
$I(\Lambda' : X_j \mid X_{-j}) - I(\Lambda' : X_j \mid \Lambda, X_{-j})$
$= I(\Lambda' : X_j : \Lambda \mid X_{-j})$ by definition of 3-way interaction information,
$= I(\Lambda' : \Lambda : X_j \mid X_{-j})$ by symmetry of 3-way interaction information,
$= I(\Lambda' : \Lambda \mid X_{-j}) - I(\Lambda' : \Lambda \mid X_j, X_{-j})$ again by the definition,
$\ge -I(\Lambda' : \Lambda \mid X_j, X_{-j})$ since mutual information is nonnegative,
$\ge -\epsilon_{\mathrm{ind}}$ by Independent Latents, since $(X_j, X_{-j}) = X$.
Therefore, for each $j$ (writing $X_{<j} := (X_1, \dots, X_{j-1})$),
$I(\Lambda' : X_j \mid \Lambda, X_{<j})$
$\le I((\Lambda', X_{-j}) : X_j \mid \Lambda)$ by the mutual information chain rule and monotonicity, since $X_{<j}$ is contained in $X_{-j}$,
$= I(X_{-j} : X_j \mid \Lambda) + I(\Lambda' : X_j \mid \Lambda, X_{-j})$ by the mutual information chain rule,
$\le I(X_{-j} : X_j \mid \Lambda) + I(\Lambda' : X_j \mid X_{-j}) + \epsilon_{\mathrm{ind}}$ by the above derivation,
$\le \epsilon_{\mathrm{ind}} + \epsilon_{\mathrm{med}} + \epsilon_{\mathrm{red}}$ by Mediation and Redundancy.
The result now follows from the chain rule $I(\Lambda' : X \mid \Lambda) = \sum_{j=1}^{n} I(\Lambda' : X_j \mid \Lambda, X_{<j})$, summing this bound over $j = 1, \dots, n$.
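The stochastic-latents bound can be sanity-checked the same way. In the sketch below (an illustrative toy model of my own, not from the original post), $Z$ is a fair bit, $X_j = Z \oplus E_j$ for independent noise bits $E_j$, $\Lambda = Z \oplus G$ is a slightly corrupted mediator, and $\Lambda' = Z$; each $\epsilon$ is taken to be the exact value of the corresponding quantity in the assumptions, and the conclusion $I(\Lambda' : X \mid \Lambda) \le n(\epsilon_{\mathrm{ind}} + \epsilon_{\mathrm{med}} + \epsilon_{\mathrm{red}})$ is verified numerically.

```python
import itertools
from collections import defaultdict
from math import log2

# Toy model for the stochastic-latents bound (parameter values are illustrative):
# Z is a fair bit, X_j = Z xor E_j with independent noise bits E_j,
# Lambda = Z xor G is a slightly corrupted mediator, and Lambda' = Z.
n = 2
p_noise, p_corrupt = 0.1, 0.05

joint = defaultdict(float)
for z, e1, e2, g in itertools.product([0, 1], repeat=4):
    p = (0.5  # Z is a fair bit
         * (p_noise if e1 else 1 - p_noise)
         * (p_noise if e2 else 1 - p_noise)
         * (p_corrupt if g else 1 - p_corrupt))
    x1, x2 = z ^ e1, z ^ e2            # X_1, X_2
    lam, lam_prime = z ^ g, z          # Lambda, Lambda'
    joint[(x1, x2, lam, lam_prime)] += p

# Same entropy helpers as in the earlier sketches.
def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idxs):
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return out

def cond_entropy(joint, target, given):
    return H(marginal(joint, target + given)) - H(marginal(joint, given))

def cond_mi(joint, a, b, given):
    return cond_entropy(joint, a, given) - cond_entropy(joint, a, b + given)

# Coordinates of an outcome: 0..n-1 = X_1..X_n, n = Lambda, n+1 = Lambda'.
X, LAM, LAMP = list(range(n)), [n], [n + 1]

def others(j):
    return [i for i in X if i != j]    # indices of X_{-j}

eps_ind = cond_mi(joint, LAM, LAMP, X)                          # I(Lambda : Lambda' | X)
eps_med = max(cond_mi(joint, [j], others(j), LAM) for j in X)   # I(X_j : X_{-j} | Lambda)
eps_red = max(cond_mi(joint, LAMP, [j], others(j)) for j in X)  # I(Lambda' : X_j | X_{-j})
lhs = cond_mi(joint, LAMP, X, LAM)                              # I(Lambda' : X | Lambda)

print(f"{lhs:.4f} <= {n * (eps_ind + eps_med + eps_red):.4f}")
assert lhs <= n * (eps_ind + eps_med + eps_red) + 1e-9
```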
[1] Since probabilistic models are often only defined in terms of a latent structure, you might find it philosophically suspect to impose a joint distribution on all variables including the latents. If so, feel free to replace the random variables with their specific instantiations: the derivations go through almost identically, with Kolmogorov complexity and algorithmic mutual information replacing the Shannon entropy and mutual information, respectively.