Chunk-based file storage implementation. This is a building block for a DHT or something similar.
The API supports file insertion and retrieval. There is intentionally no remove support. File removal should be handled externally, and then it is only required to run garbage_collect() to clean things up.
The filesystem hierarchy consists of two directories: files and chunks.
chunks stores files of at most MAX_CHUNK_SIZE each, where the filename is the BLAKE3 hash of the chunk’s contents.
files stores metadata about a full file, which can be retrieved by concatenating its chunks in order. The filename of a file in files is the BLAKE3 hash of its chunk hashes in the correct order.
It might look like the following:
/files/B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX
/files/...
/chunks/2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
/chunks/CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP
/chunks/...
In the above example, the contents of /files/B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX may be:
2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP
This means that, in order to retrieve B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX, we need to concatenate the files under /chunks whose filenames are the hashes found above. The contents of the files in /chunks are arbitrary data, and by concatenating them we can retrieve the original file.
It is important to note that multiple files can use the same chunks. This is a naive form of deduplication: a chunk is not considered specific to a single file, so garbage collection treats chunks and files independently of each other.
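The layout described above can be sketched in memory. This is a minimal illustration, not the module's implementation: MemStore, placeholder_hash, insert and get are hypothetical names, two HashMaps stand in for the files and chunks directories, and the standard library's DefaultHasher stands in for BLAKE3 so the sketch needs no external crates. How the chunk hashes are combined into the file hash (joined with newlines here) is also an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// 256 KiB, matching the documented value of MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: usize = 262_144;

/// Placeholder content hash. This is NOT BLAKE3; it only keeps the
/// sketch dependency-free. The real module names entries by BLAKE3 hash.
fn placeholder_hash(data: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory stand-in for the two on-disk directories.
#[derive(Default)]
struct MemStore {
    /// chunk hash -> chunk contents (the "chunks" directory)
    chunks: HashMap<String, Vec<u8>>,
    /// file hash -> ordered list of chunk hashes (the "files" directory)
    files: HashMap<String, Vec<String>>,
}

impl MemStore {
    /// Splits `data` into chunks, stores each chunk once, and records
    /// the ordered chunk hashes under the file hash.
    fn insert(&mut self, data: &[u8]) -> String {
        let mut chunk_hashes = Vec::new();
        for chunk in data.chunks(MAX_CHUNK_SIZE) {
            let hash = placeholder_hash(chunk);
            // Identical chunks share one entry: the naive deduplication.
            self.chunks
                .entry(hash.clone())
                .or_insert_with(|| chunk.to_vec());
            chunk_hashes.push(hash);
        }
        // Assumption: the file hash covers the chunk hashes, in order.
        let file_hash = placeholder_hash(chunk_hashes.join("\n").as_bytes());
        self.files.insert(file_hash.clone(), chunk_hashes);
        file_hash
    }

    /// Rebuilds a file by concatenating its chunks in order.
    /// Returns None if the file entry or any of its chunks is missing.
    fn get(&self, file_hash: &str) -> Option<Vec<u8>> {
        let hashes = self.files.get(file_hash)?;
        let mut out = Vec::new();
        for hash in hashes {
            out.extend_from_slice(self.chunks.get(hash)?);
        }
        Some(out)
    }
}
```

Because chunks are keyed only by their contents, inserting a file whose first two chunks are identical stores that chunk once while the file entry still lists its hash twice. This is why a chunk cannot be assumed to belong to a single file.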
Structs§
- ChunkedFile: ChunkedFile is a representation of a file we’re trying to retrieve from Geode.
- Geode: Chunk-based file storage interface.
Constants§
- CHUNKS_PATH 🔒: Path prefix where file chunks are stored
- FILES_PATH 🔒: Path prefix where file metadata is stored
- MAX_CHUNK_SIZE: Defined maximum size of a stored chunk (256 KiB)
Functions§
- hash_to_string
- read_until_filled: smol::fs::File::read does not guarantee that the buffer will be filled, even if the buffer is smaller than the file. This is a workaround: it reads the stream until the buffer is full or the end of the stream is reached.
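The read-until-filled idea can be sketched with the standard library's blocking std::io::Read trait. This is a synchronous analogue under stated assumptions, not the module's async function: the name read_until_filled_sync and its signature are hypothetical, and Trickle is an illustrative reader that deliberately returns short reads.

```rust
use std::io::{self, Read};

/// Reads from `reader` until `buf` is full or the stream ends.
/// Returns the number of bytes written into `buf`.
/// Hypothetical synchronous counterpart of read_until_filled.
fn read_until_filled_sync<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buf.len() {
        match reader.read(&mut buf[total..]) {
            // A zero-length read signals the end of the stream.
            Ok(0) => break,
            // A short read is fine: keep going from where we stopped.
            Ok(n) => total += n,
            // Interrupted reads are retried, as std's read_exact does.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}

/// Illustrative reader that hands out at most `step` bytes per call,
/// mimicking a source that does not fill the buffer in one read.
struct Trickle<'a> {
    data: &'a [u8],
    step: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.step.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```

Unlike Read::read_exact, this loop treats reaching the end of the stream before the buffer is full as a normal outcome and reports how many bytes were read. That suits the final chunk of a file, which is usually shorter than MAX_CHUNK_SIZE.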