Blobs

A blob is an object that represents the contents of a file. So that similar files can share storage, blobs are trees made of leaf and branch objects. A leaf holds raw bytes, and a branch holds an array of child blobs, each of which is either a leaf or another branch.

type Blob = Leaf | Branch;
type Leaf = {
  bytes: Uint8Array;
};
type Branch = {
  children: Array<Blob>;
};
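These definitions are enough to recover a file's contents by concatenating the leaves in order. The following is a minimal sketch over these plain types, not Tangram's runtime API. The helper names `isLeaf`, `blobSize`, and `blobBytes` are hypothetical, and the sketch assumes a branch stores its children in file order.

```typescript
// Treat this file as a module so the local `Blob` type shadows the global `Blob`.
export {};

type Blob = Leaf | Branch;
type Leaf = {
  bytes: Uint8Array;
};
type Branch = {
  children: Array<Blob>;
};

// Only leaves carry `bytes`, so this property check tells the variants apart.
function isLeaf(blob: Blob): blob is Leaf {
  return "bytes" in blob;
}

// Total number of bytes the blob represents, summed over all of its leaves.
function blobSize(blob: Blob): number {
  return isLeaf(blob)
    ? blob.bytes.length
    : blob.children.reduce((sum, child) => sum + blobSize(child), 0);
}

// Reassemble the contents by copying leaves in depth-first, left-to-right order.
function blobBytes(blob: Blob): Uint8Array {
  const out = new Uint8Array(blobSize(blob));
  let offset = 0;
  const visit = (node: Blob): void => {
    if (isLeaf(node)) {
      out.set(node.bytes, offset);
      offset += node.bytes.length;
    } else {
      node.children.forEach(visit);
    }
  };
  visit(blob);
  return out;
}

// Usage: a branch whose second child is itself a branch.
const enc = new TextEncoder();
const tree: Blob = {
  children: [
    { bytes: enc.encode("0\n1\n") },
    { children: [{ bytes: enc.encode("2\n") }, { bytes: enc.encode("3\n") }] },
  ],
};
console.log(JSON.stringify(new TextDecoder().decode(blobBytes(tree)))); // "0\n1\n2\n3\n"
```

Nesting does not change the result: a reader only needs the leaves in order, so the tree's shape can be chosen freely to maximize sharing.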

You can create a blob on the command line:

$ echo "Hello, World\!" | tg blob create
lef_01c8sr6jyef0bxp7j03f7emawb8q0p008nt9dn343c0qq345bba710

This blob is small, so a single leaf is enough to represent it. Try creating a larger one:

$ for i in $(seq 0 999999); do echo $i; done | tg blob create
bch_01bn61ywpt8vsqkerfzpkzk7qgg3gsqx54dasvv4hnzwwx76qvq8g0
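This larger input produces a branch because its bytes are split across several leaves. This section does not say how Tangram arranges those leaves into branches, so the following is only a sketch under stated assumptions. It takes an ordered list of chunks and a hypothetical fan-out limit `MAX_CHILDREN`, and `buildTree` is likewise a hypothetical name. One chunk yields a single leaf. More chunks are grouped into branches, level by level, until one root remains.

```typescript
// Treat this file as a module so the local `Blob` type shadows the global `Blob`.
export {};

type Blob = Leaf | Branch;
type Leaf = { bytes: Uint8Array };
type Branch = { children: Array<Blob> };

// HYPOTHETICAL: maximum number of children per branch, for illustration only.
const MAX_CHILDREN = 4;

// Build a blob from ordered chunks: a single chunk becomes a leaf; otherwise
// consecutive nodes are grouped into branches until one root remains.
function buildTree(chunks: Uint8Array[]): Blob {
  let level: Blob[] = chunks.map((bytes): Leaf => ({ bytes }));
  if (level.length === 0) {
    // ASSUMPTION: an empty file is represented as an empty leaf.
    return { bytes: new Uint8Array(0) };
  }
  while (level.length > 1) {
    const next: Blob[] = [];
    for (let i = 0; i < level.length; i += MAX_CHILDREN) {
      next.push({ children: level.slice(i, i + MAX_CHILDREN) });
    }
    level = next;
  }
  return level[0];
}

// Usage: one chunk stays a leaf; ten chunks become a two-level tree.
const small = buildTree([new Uint8Array([1, 2, 3])]);
const large = buildTree(Array.from({ length: 10 }, (_, i) => new Uint8Array([i])));
console.log("bytes" in small, "children" in large); // true true
```

With ten chunks and a fan-out of four, the first level has three branches (4, 4, and 2 leaves), and the root is a branch over those three.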

You can view the object tree using tg view:

$ tg view bch_01bn61ywpt8vsqkerfzpkzk7qgg3gsqx54dasvv4hnzwwx76qvq8g0

You can also manipulate blobs directly in Tangram TypeScript:

let blob = await tg.blob("Hello, ", "World!");

This creates a blob that is a branch with two children, each of which is a leaf.

Content-Defined Chunking

To minimize the amount of data stored on disk and transferred over the network, blobs with similar content should produce similar object trees. Tangram achieves this with a technique called content-defined chunking. As the bytes are read, a rolling hash is computed over a small window of recent bytes, and whenever the hash matches a fixed value, the current chunk ends. Because each boundary depends only on the bytes near it rather than on its offset in the file, a small edit in the middle of a large file moves only the boundaries close to the edit, so most of the objects in the tree remain unchanged.
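The following is a minimal sketch of content-defined chunking, not Tangram's implementation. It uses a Gear-style rolling hash, the hash family used by FastCDC, and the table seed, `MIN_SIZE`, `MAX_SIZE`, and `BOUNDARY_BITS` are hypothetical parameters. A boundary is declared where the top `BOUNDARY_BITS` bits of the 32-bit hash are zero. Those bits depend only on the most recent 32 bytes, so an insertion disturbs only the chunks around it.

```typescript
// HYPOTHETICAL parameters, chosen only for illustration.
const MIN_SIZE = 64; // never cut before this many bytes (>= 32, so the hash window is full)
const MAX_SIZE = 1024; // always cut once a chunk reaches this many bytes
const BOUNDARY_BITS = 8; // cut with probability ~1/256 per byte past MIN_SIZE

// Gear table: a fixed pseudo-random 32-bit value per byte value, generated by
// xorshift32 from a constant seed so that chunking is deterministic.
const GEAR = new Uint32Array(256);
{
  let x = 0x9e3779b9;
  for (let i = 0; i < 256; i++) {
    x ^= x << 13;
    x ^= x >>> 17;
    x ^= x << 5;
    GEAR[i] = x >>> 0;
  }
}

// Split `data` into content-defined chunks (views into the original buffer).
function chunk(data: Uint8Array): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  let start = 0;
  let hash = 0;
  for (let i = 0; i < data.length; i++) {
    // Gear rolling hash: each step shifts left by one bit, so a byte's
    // contribution drops out of the 32-bit hash after 32 more bytes.
    hash = ((hash << 1) + GEAR[data[i]]) >>> 0;
    const length = i - start + 1;
    const atBoundary = length >= MIN_SIZE && (hash >>> (32 - BOUNDARY_BITS)) === 0;
    if (atBoundary || length >= MAX_SIZE) {
      chunks.push(data.subarray(start, i + 1));
      start = i + 1;
      hash = 0;
    }
  }
  if (start < data.length) chunks.push(data.subarray(start));
  return chunks;
}

// Usage: insert three bytes into the middle of pseudo-random data and count
// how many chunks of the edited data already existed in the original.
const original = new Uint8Array(20000);
let s = 12345;
for (let i = 0; i < original.length; i++) {
  s = (Math.imul(s, 1103515245) + 12345) >>> 0; // simple LCG, illustration only
  original[i] = s >>> 24;
}
const edited = new Uint8Array(original.length + 3);
edited.set(original.subarray(0, 10000), 0);
edited.set([1, 2, 3], 10000);
edited.set(original.subarray(10000), 10003);

const key = (c: Uint8Array) => Array.from(c).join(",");
const known = new Set(chunk(original).map(key));
const editedChunks = chunk(edited);
const reused = editedChunks.filter((c) => known.has(key(c))).length;
console.log(`${reused} of ${editedChunks.length} chunks reused`);
```

Chunks before the edit are identical. The chunk containing the edit changes, and within a chunk or two the boundaries fall back into step with the original, so the chunks after that are shared as well. A fixed-size chunker would instead shift every boundary after the insertion.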