Blog about software - ordep.dev 10月02日
深入理解默克尔树
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

默克尔树是一种数据结构,用于高效总结和验证大量数据的完整性,使用户能够验证接收到的响应的真实性。它最初用于一次性签名和认证公钥分发,主要目的是总结大量数据并验证特定数据是否属于更大的数据集。默克尔树利用单向哈希函数,通过递归计算节点哈希值,形成二叉树结构,最终生成默克尔根,便于验证数据完整性。在分布式系统中,默克尔树可用于检测副本不一致性、减少数据传输量,支持对等文件共享和持久化数据结构。

🔹默克尔树的核心功能是高效总结和验证大量数据的完整性,通过递归计算节点哈希值形成二叉树结构,最终生成默克尔根(Merkle Root),便于验证数据真实性。

🔹默克尔树利用单向哈希函数(如MD5、SHA-3、SHA-256)对数据块进行哈希处理,将数据块转换为叶子节点,并通过拼接子节点哈希值计算父节点哈希值,逐层构建树结构。

🔹默克尔树的审计证明(Merkle Audit Proof)用于验证特定数据块是否属于更大的数据集,通过提供从数据块到默克尔根所需的中间节点哈希值,证明数据块的归属性。

🔹在分布式系统中,默克尔树可用于检测副本不一致性,通过比较不同副本的默克尔根或中间节点哈希值,发现并修复数据差异,提高数据一致性。

🔹默克尔树支持对等文件共享,用户只需获取文件树的默克尔根和所需数据块,无需下载完整数据集,从而减少数据传输量,提高共享效率。

This is a transcript of my talk on Diving into Merkle Trees that I will giveat Lambda Days and ScaleConf Colombia. Slides and video should be up soon!

Introduced in 1979 by Ralph C. Merkle in his Thesis: Secrecy, Authentications,and Public Key Systems, the Merkle Tree, also known as a binary hash tree, is adata structure used for efficiently summarizing and verifying the integrity oflarge sets of data enabling users to verify the authenticity of their receivedresponses.

“The general idea in the new system is to use aninfinite tree of one-time signatures. […] Each node of the tree performs threefunctions: (1) it authenticates the left sub-node (2) it authenticates the rightsub-node (3) it signs a single message.”

Initially, it was used for the purpose of one-time signatures and authenticatedpublic key distribution, namely providing authenticated responses as to thevalidity ofhttps://ordep.dev/assets/images/digital-signature.pngssets/images/digital-signature.png" alt="digital-signature">

Ralph C. Merkle described a new digital signature that was able to sign anunlimited number of messages and the signature size would increaselogarithmically as a function of the number of messages signed.

At this point we can identify the two main purposes of a Merkle Tree: (1)summarize large sets of data and (2) verify that a specific piece of databelongs to a larger data set.

One-Way Hashing Functions

Before diving into one-time signatures lets first get confortable withone-way functions. Usually, a one-way function is a mathematical algorithmthat take inputs and provide unique outputshttps://ordep.dev/assets/images/one-way-hashing-functions.pngg src="/assets/images/one-way-hashing-functions.png" alt="one-way-hashing-functions">

A one-way function F is a function that iseasy to compute, but difficult to invert. Given x and F, it is easy tocompute y=F(x), but given y and F, it is effectively impossible to computex.

One-way hashing functions are especially useful within Merkle Trees for twoobvious reasons; storage and privacy.

With systems that contain massive amounts of data, the benefits of beingable to store and identify data with a fixed length output can create vaststorage savings and help to increase efficiency.

The person who computes y=F(x) is the only person who knows x. If y ispublicly revealed, only the originator of y knows x, and can choose toreveal or conceal x at his whim!

One-time Signatures

Also in 1979, Leslie Lamport published his concept of One-time Signatures.Most signature schemes rely in part on one-way functions, typically hashfunctions, for their security proofs. The beauty of Lamport scheme was that thissignature was only relying on the security of these one-way functions!

One time signatures are practical between a single pairof users who are willing to exchange the large amount of data necessary but theyare not practical for most applications without further refinements.

If 1000 messages are to be signed before new publicauthentication data is needed, over 20,000,000 bits or 2.5 megabytes must bestored as public information.

If B had to keep 2.5 megabytes of data for 1000 other users, B would have tostore 2.5 gigabytes of data. With further increases in the number of users, orin the number of message each user wants to be able to sign, the system wouldeventually become burdensome.

Improving One-time Signatures

Merkle focused on how to eliminate the huge storage requirements in the Lamportmethod and proposed an improved One-time Signature that reduced the size ofsigned messages by almost a factor of 2.

This improved method was easy to implement and cutted the size of the signedmessage almost in half, although this was still too large for most applications;instead of storing 2.5 gigabytes of data, B only had to store

The method is called tree authentication because it’s computation forms a binarytree of recursive calls. Using this method, requires only log2 ntransmissions. A close look at the algorithm will reveal that half thetransmissions are redundant since we’re able to compute a given parent node Afrom their children A1 and A2, so there’s really no need to send A.

How to compute a Merkle Root?

Given that we have a data file represented by a set of blocks [L1, L2].

We start by applying a one-way hashing function to https://ordep.devhttps://ordep.dev/assets/images/building-blocks-02.pngntext highlighter-rouge">h(📄L1) = 9ec4.

The next step is to apply the same function to h(📄L2) = 7e6a.

To calculate the parent node, we need always to concatenate both child hashesh(📄L1) and h(📄L2) before applying, once again, the h(h(📄L1) || h(📄L2)) = aea9.

At this point we know the building blocks of a Merkle Tree; let’s represent itin Elixir.

Building a Merkle-Tree

In order to build a Merkle Tree, we need to define three new types: Leaf,Node, and the MerkleTree itself. Let’s start by defining Leaf – itshould contain the hash and the value of a given data block.

defmodule MerkleTree.Leaf do  defstruct [:hash, :value]  @type hash :: String.t  @type value :: String.t  @type t :: %MerkleTree.Leaf{    hash: hash,    value: value  }end

The next type is Node – it should contain the left and right childnodes, and the hash value of the concatenation of both child hashes.

defmodule MerkleTree.Node do  defstruct [:hash, :left, :right]  @type hash :: String.t  @type left :: MerkleTree.Node.t | MerkleTree.Leaf.t  @type right :: MerkleTree.Node.t | MerkleTree.Leaf.t  @type t :: %MerkleTree.Node{    hash: hash,    left: left,    right: right  }end

And, finally, the Merkle Tree itself.

defmodule MerkleTree do  defstruct [:root]  @type root :: MerkleTree.Node.t  @type t :: %MerkleTree{    root: root  }end

Hashing the data blocks

The first step to build a Merkle Tree is to hash the data blocks andconverting them to leaves. In order to hash something we need to define a newmodule Crypto with a single function hash that should accept an input and isresponsible for encoding it using the appropriate one-way hashing function.

defmodule MerkleTree.Crypto do  def hash(input, type \\ :sha256) do    type    |> :crypto.hash("#{input}")    |> Base.encode16  endend

Having blocks = ["L1", "L2", "L3", "L4"], the expected output would be:

[  %MerkleTree.Leaf{    value: "L1",    hash: "DFFE8596427FC50E8F64654A609AF134D45552F18BBECEF90B31135A9E7ACAA0"  },  %MerkleTree.Leaf{    value: "L2",    hash: "D76354D8457898445BB69E0DC0DC95FB74CC3CF334F8C1859162A16AD0041F8D"  },  %MerkleTree.Leaf{    value: "L3",    hash: "842983DE8FB1D277A3FAD5C8295C7A14317C458718A10C5A35B23E7F992A5C80"  },  %MerkleTree.Leaf{    value: "L4",    hash: "4A5A97C6433C4C062457E9335709D57493E75527809D8A9586C141E591AC9F2C"  }]

By defining a function new that accepts blocks, we should be able to hashthe data blocks and convert them into Leafs.

defmodule MerkleTree do  alias MerkleTree.Leaf  alias MerkleTree.Crypto  def new(blocks) do    blocks    |> Enum.map(&%Leaf.build(&1, Crypto.hash(&1)))  endend

Where Leaf.build/2 is just:

defmodule MerkleTree.Leaf do  def build(value, hash) do    %Leaf{value: value, hash: hash}  endend

Calling the function above MerkleTree.new ["L1", "L2", "L3", "L4"] shouldyield the expected output. Although, we’re not done yet.

Hashing the nodes

Remember that for creating a Node we need to join both child hashes andapply the one-way hashing function once again?

defmodule MerkleTree.Node do  alias MerkleTree.Crypto    def new(nodes) do    nodes    |> Enum.map_join(&(&1.hash))    |> Crypto.hash    |> build(nodes)  end    def build(hash, [left, right]) do    %Node{hash: hash, left: left, right: right}  endend

That’s basically what new is doing before calling build(nodes). Once wehave the Node hash value, we’re ready to create a new Node with hash,left, and right. As an example, by calling the function above with these twoleaves:

MerkleTree.Node.new([  %MerkleTree.Leaf{    value: "L1",    hash: "DFFE8596427FC50E8F64654A609AF134D45552F18BBECEF90B31135A9E7ACAA0"  },  %MerkleTree.Leaf{    value: "L2",    hash: "D76354D8457898445BB69E0DC0DC95FB74CC3CF334F8C1859162A16AD0041F8D"  }])

Would yield the following Node:

%MerkleTree.Node{  hash: "8C569660D98A20D59DE10E134D81A8CE55D48DD71E21B8919F4AD5A9097A98C8",  left: %MerkleTree.Leaf{    value: "L1",    hash: "DFFE8596427FC50E8F64654A609AF134D45552F18BBECEF90B31135A9E7ACAA0"  },  right: %MerkleTree.Leaf{    value: "L2",    hash: "D76354D8457898445BB69E0DC0DC95FB74CC3CF334F8C1859162A16AD0041F8D"  }}

All the way up

Having a way to build a Node from a pair of nodes, we can now make use ofrecursion to calculate the remaining nodes up to the Merkle root. Let’scomplete the new function:

defmodule MerkleTree do  alias MerkleTree.Leaf  alias MerkleTree.Crypto  def new(blocks) do    blocks    |> Enum.map(&%Leaf.build(&1, Crypto.hash(&1)))    |> build  endend

Currently, this new function it’s yielding a list of leaves. Let us define abuild function that accepts that list of leaf nodes with the goal ofgrouping it into several pairs of leaf nodes in order to build the parentNode by concatenating both left and right child hashes.

defp build(nodes) do  nodes  |> Enum.chunk_every(2)  |> Enum.map(&MerkleTree.Node.new(&1))  |> buildend

Note that we’re making use of tail recursion to build our Merkle Tree from theground up to the root. Still, we need to stop that recursive processing.

defp build([root]) do  %MerkleTree{root: root.hash, tree: root}enddefp build(nodes) do  nodes  |> Enum.chunk_every(2)  |> Enum.map(&MerkleTree.Node.new(&1))  |> buildend

By pattern matching a single element root in the list of nodes, we’re now ableto stop the processing and return a MerkleTree.

The final new function would look like this:

defmodule MerkleTree do  alias MerkleTree.Leaf  alias MerkleTree.Crypto  def new(blocks) do    blocks    |> Enum.map(&%Leaf.build(&1, Crypto.hash(&1)))    |> build  endend

Finally, by calling the function MerkleTree.new ["L1", "L2", "L3", "L4"] wouldyield the following result:

%MerkleTree{  root: "B8EC6582F5B8ED1CDE7712275C02C8E4CC0A2AC569A23F6B7738E6B69BF132E6",  tree: %MerkleTree.Node{    hash: "B8EC6582F5B8ED1CDE7712275C02C8E4CC0A2AC569A23F6B7738E6B69BF132E6",    left: %MerkleTree.Node{      hash: "8C569660D98A20D59DE10E134D81A8CE55D48DD71E21B8919F4AD5A9097A98C8",      left: %MerkleTree.Leaf{        value: "L1",        hash: "DFFE8596427FC50E8F64654A609AF134D45552F18BBECEF90B31135A9E7ACAA0"      },      right: %MerkleTree.Leaf{        value: "L2",        hash: "D76354D8457898445BB69E0DC0DC95FB74CC3CF334F8C1859162A16AD0041F8D"      }    },    right: %MerkleTree.Node{      hash: "29C5146A0AABBC4444D91087D91D2637D8EB4620A686CF6179CCD7A0BFB9B8EF",      left: %MerkleTree.Leaf{        value: "L3",        hash: "842983DE8FB1D277A3FAD5C8295C7A14317C458718A10C5A35B23E7F992A5C80"      },      right: %MerkleTree.Leaf{        value: "L4",        hash: "4A5A97C6433C4C062457E9335709D57493E75527809D8A9586C141E591AC9F2C"      }    }  }}

Audit Proof

As we already know, we can use a Merkle Tree to verify that a specific piece ofdata belongs to a larger data set. The Merkle audit proof is the missing nodesrequired to compute all of the nodes between the data block and the Merkle root.If a Merkle audit proof fails to prodhttps://ordep.devhttps://ordep.dev/assets/images/audit-proof.pngmatches theoriginal Merkle root hash, it means that our data block is not present in thetree.

In this example, we need to provide a proof that the data block L1 exists inthe tree. Since we already know the hash value of L1, we’ll need the hashvalue of L2 in order to compute P1. Now that we are able to compute P1 wefinally need to get P2 to compute R. In this specific case the Merkle auditproof is a list of nodes [H2, P2].

The use of tree authentication is now fairly clear. A given userA transmits R to another user B. A then transmits the authentication path forYi. B knows R, the root of the authentication tree, by prior arrangement. B canthen authenticate Yi, and can accept any Ynth from A as genuine.

How they are useful?

Merkle trees are especially useful in distributed, peer-to-peer systems wherethe same data should exist in multiple places. By using Merkle Trees we candetect inconsistencies between replicas, reduce the amount of transferred dataenabling peer-to-peer file sharing, and maintaining several versions of the sametree, also called persistent data-structures.

Detect inconsistencies

Having a data file represented by a data structure we’re able to detectinconsistencies between replicas of that same tree. Take for example threereplicas of the same Mhttps://ordep.devhttps://ordep.dev/assets/images/replicas-00.pnge root nodes we can makesure that those trees are not the same, or in this case, there areinconsistencies between them.

By using an Anti-entropy mechanismonly the data neededto repair the inconsistent tree.

To compare the state of two nodes, they exchange the corresponding Merkle Treesby levelhttps://ordep.devhttps://ordep.dev/assets/images/replicas-02.pngn the tree if the corresponding hashes aredifferent. If two corresponding leaf nodes have different hashes, then there areobjects which must be repaired.

This is actually used by Dynamo, Riak, and Cassandra to repair bad replicas!

Peer-to-peer file sharing

The principal advantage of Merkle Trees is that each branch of the tree can bechecked independently without requiring nodes to download thhttps://ordep.dev/assets/images/peer-to-peer-01.pngpeer-to-peer file sharing another good use for Merkle Trees, wherewe start by fetching the root of the tree from a trusted source to access agiven file.

Since we can fetch single parts of a tree, reducing the amount of transferreddata, we then fetch chunks of data from untrusted sources.

We start by fetching L3 and deriving its hash, b2d0. To allow us to get tothe root, we must fetch the hash value from the right leaf, 8f14. With thesetwo nodes, we can derive the next hash value, 165f. By fetching the lasthash, 165f, to derive the root hash,which is indeed 9cee.

We were building a partial tree having just the root hash and a given datablock. If the root computed from the audit path matches the true root, then theaudit path is proof that the data block exists in the tree.

L4, thebranch that links to it must calculate new hashes all the way up to theroot, although, the other branches stay intact.

Wrapping up

Merkle trees are just binary trees containing an infinite number ofcryptographic hashes where leaves contain hashes of data blocks and nodescontain hashes of their children. They also produce a Merkle Root that summarizesthe entire data set and that’s publicly distributed across the network and can easilyprove that a given data block exists in the tree.

You can find them in Cassandra, IPFS, Riak, Ethereum, Bitcoin, Open ZFS, andmuch more. Also, you have lots of papers to read as well if you want to diveeven deeper.

Have fun!

References

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

默克尔树 数据完整性 单向哈希函数 审计证明 分布式系统 对等文件共享 持久化数据结构
相关文章