Module geode

Chunk-based file storage implementation, intended as a building block for a DHT or a similar system.

The API supports file insertion and retrieval. Removal is intentionally unsupported: files should be removed externally, after which running garbage_collect() is all that is required to clean things up.

The filesystem hierarchy consists of two directories: files and chunks. The chunks directory stores files of at most MAX_CHUNK_SIZE bytes each, where each filename is the BLAKE3 hash of the chunk’s contents. The files directory stores metadata about a full file, whose contents can be retrieved by concatenating its chunks in order. The filename of a file in files is the BLAKE3 hash of its chunk hashes, taken in the correct order.

The hierarchy might look like the following:

/files/B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX
/files/...
/chunks/2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
/chunks/CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP
/chunks/...

In the above example, the contents of B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX may be:

2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP

This means that to retrieve B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX, we concatenate, in the listed order, the files under /chunks whose filenames are the hashes found above. The files in /chunks hold arbitrary data, and concatenating them reconstructs the original file.
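
The insert and retrieve flow described above can be sketched with an in-memory stand-in for the two directories. This is an illustration under stated assumptions, not Geode's actual API: `DemoStore`, `demo_hash` and `DEMO_CHUNK_SIZE` are hypothetical names, the hash is std's `DefaultHasher` used as a non-cryptographic placeholder for BLAKE3, and the file identifier is computed over the concatenated hash strings as a simplification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Tiny chunk size so the example is easy to follow (hypothetical value;
/// the documented MAX_CHUNK_SIZE is 256 KiB).
const DEMO_CHUNK_SIZE: usize = 4;

/// Placeholder for BLAKE3. NOT cryptographic; used only to stay std-only.
fn demo_hash(data: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// In-memory stand-in for the `chunks` and `files` directories.
#[derive(Default)]
struct DemoStore {
    /// chunk hash -> chunk contents
    chunks: HashMap<String, Vec<u8>>,
    /// file hash -> ordered list of chunk hashes
    files: HashMap<String, Vec<String>>,
}

impl DemoStore {
    /// Split `data` into chunks, store each chunk under its hash, and
    /// record the ordered hash list under the file hash.
    fn insert(&mut self, data: &[u8]) -> String {
        let mut chunk_hashes = Vec::new();
        for chunk in data.chunks(DEMO_CHUNK_SIZE) {
            let hash = demo_hash(chunk);
            self.chunks.insert(hash.clone(), chunk.to_vec());
            chunk_hashes.push(hash);
        }
        // The file name is a hash over the chunk hashes, in order.
        let file_hash = demo_hash(chunk_hashes.concat().as_bytes());
        self.files.insert(file_hash.clone(), chunk_hashes);
        file_hash
    }

    /// Rebuild a file by concatenating its chunks in the recorded order.
    /// Returns `None` if the metadata or any chunk is missing.
    fn get(&self, file_hash: &str) -> Option<Vec<u8>> {
        let chunk_hashes = self.files.get(file_hash)?;
        let mut out = Vec::new();
        for hash in chunk_hashes {
            out.extend_from_slice(self.chunks.get(hash)?);
        }
        Some(out)
    }
}
```

Calling `insert` on a byte slice returns the file identifier, and passing that identifier to `get` yields the original bytes back, provided every referenced chunk is still present.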

Note that multiple files can share the same chunks, which gives a naive form of deduplication. Chunks are therefore not considered specific to a single file, and garbage collection treats chunks and files independently of each other.
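
To make the chunk sharing concrete, here is a minimal sketch in which two inputs with a common prefix end up referencing the same stored chunks. The names `insert_chunks` and `demo_hash` are hypothetical, the chunk size is passed in explicitly, and std's `DefaultHasher` again stands in for BLAKE3 purely to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Placeholder for BLAKE3. NOT cryptographic; illustration only.
fn demo_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Record the chunk ids of `data` in `stored` (the shared chunk set) and
/// return the ids in order. A chunk that is already present is not stored
/// twice. `chunk_size` must be non-zero, otherwise `chunks()` panics.
fn insert_chunks(stored: &mut HashSet<u64>, data: &[u8], chunk_size: usize) -> Vec<u64> {
    data.chunks(chunk_size)
        .map(|chunk| {
            let id = demo_hash(chunk);
            stored.insert(id);
            id
        })
        .collect()
}
```

With a chunk size of 4, inserting `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` produces two lists of three chunk ids each, while the shared set only grows to four entries because the first two chunks are identical.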

Structs§

ChunkedFile
ChunkedFile is a representation of a file we’re trying to retrieve from Geode.
Geode
Chunk-based file storage interface.

Constants§

CHUNKS_PATH 🔒
Path prefix where file chunks are stored
FILES_PATH 🔒
Path prefix where file metadata is stored
MAX_CHUNK_SIZE
Defined maximum size of a stored chunk (256 KiB)
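
As a rough illustration of what the 256 KiB limit implies, the sketch below computes how many chunks a file of a given length would need and how large the final chunk would be. The helper names `chunk_count` and `last_chunk_len` are hypothetical, the constant is redefined locally so the snippet stands alone, and treating an empty input as zero chunks is an assumption of this sketch.

```rust
/// 256 KiB, matching the documented value; redefined here so the sketch
/// compiles on its own.
const MAX_CHUNK_SIZE: usize = 262_144;

/// Number of chunks needed for `len` bytes (ceiling division).
fn chunk_count(len: usize) -> usize {
    len / MAX_CHUNK_SIZE + usize::from(len % MAX_CHUNK_SIZE != 0)
}

/// Size in bytes of the final chunk, or 0 for an empty input.
fn last_chunk_len(len: usize) -> usize {
    match len % MAX_CHUNK_SIZE {
        // An exact multiple means the last chunk is completely full.
        0 if len > 0 => MAX_CHUNK_SIZE,
        rem => rem,
    }
}
```

For example, a 1,000,000-byte file needs four chunks: three full ones (786,432 bytes) and a final chunk of 213,568 bytes.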

Functions§

hash_to_string
read_until_filled
smol::fs::File::read does not guarantee that the buffer will be filled, even if the buffer is smaller than the file. This is a workaround: it reads the stream until the buffer is full or until the end of the stream is reached.
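
The read loop behind this workaround can be sketched with the synchronous `std::io::Read` trait. This is an analogue written for illustration, not the actual async function: `read_until_filled_sync` and `OneByteReader` are hypothetical names, and the real signature may differ.

```rust
use std::io::{self, Read};

/// Keep reading until `buf` is full or the reader reports end of stream.
/// Returns the number of bytes written into `buf`.
fn read_until_filled_sync<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buf.len() {
        match reader.read(&mut buf[total..]) {
            Ok(0) => break, // end of stream
            Ok(n) => total += n,
            // Retry if the read was interrupted before any data arrived.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}

/// Reader that hands out at most one byte per call, to mimic short reads.
struct OneByteReader<'a>(&'a [u8]);

impl Read for OneByteReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}
```

Even though `OneByteReader` never returns more than one byte per call, a 4-byte buffer read from a 6-byte source ends up completely filled, and a second call returns the remaining 2 bytes before hitting the end of the stream.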