I’ve done something similar. Here’s a rough outline of the approach:
We built a chunked storage system on top of Forge's storage API to handle JSON datasets that exceed the 240 KiB value limit. The main challenge was making it safe for multiple users/processes to read and write simultaneously without corrupting data.
High-Level Approach:
- Split large objects into chunks, plus a dedicated metadata block used to track them
- Use atomic write patterns and validation to ensure data consistency
- Implement cleanup mechanisms for failed operations
Key Concurrency Mechanisms:
UUID Isolation: Each multi-part write gets a unique UUID appended to its chunk keys (e.g. data_chunk_0_abc123), plus a full set of metadata. This keeps chunks from different concurrent writes from mixing: even if two processes write to the same logical key simultaneously, their chunks stay separate. The metadata includes the format version, app version, timestamp, whether the write spans one chunk or many, and lastly the actual data.
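In simplified form (TypeScript, with placeholder field names rather than our exact schema), the key layout and record shapes look something like this:

```typescript
import { randomUUID } from 'crypto';

// Fields shared by the main metadata record and by every chunk record.
// Names are illustrative, not our exact production schema.
interface WriteMetadata {
  formatVersion: number; // version of the chunking format itself
  appVersion: string;    // version of the app that performed the write
  writeId: string;       // UUID shared by the metadata and every chunk of this write
  timestamp: number;     // epoch millis, identical across the whole write
  chunkCount: number;    // whether the write is 1 chunk or many
  totalSize: number;     // size of the serialized payload, used for validation
}

// Each chunk carries the shared metadata plus its slice of the payload.
interface ChunkRecord extends WriteMetadata {
  index: number;
  data: string;
}

// Per-chunk keys for one logical key, e.g. "data_chunk_0_abc123".
function chunkKeys(logicalKey: string, writeId: string, chunkCount: number): string[] {
  return Array.from({ length: chunkCount }, (_, i) => `${logicalKey}_chunk_${i}_${writeId}`);
}

console.log(chunkKeys('data', randomUUID(), 3));
```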
Write-Then-Metadata Pattern: Always write all the chunks first, then the main metadata last. The main metadata record is what establishes ownership of the logical key, so readers never see incomplete data: they either get the complete dataset or nothing at all.
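A minimal sketch of the write path, assuming Forge's storage.set from @forge/api and the simplified key/metadata layout above (chunk size and field names are illustrative, not our exact code):

```typescript
import { storage } from '@forge/api';
import { randomUUID } from 'crypto';

// Kept well below the 240 KiB value limit to leave headroom for JSON escaping
// and the metadata fields stored alongside each chunk. Tune for your data.
const CHUNK_SIZE = 180 * 1024;

// Write every chunk first; only then publish the main metadata record under the
// logical key. The metadata write is the "commit": a reader that sees it can be
// sure all of the chunks it references already exist.
async function chunkedSet(logicalKey: string, value: unknown): Promise<void> {
  const serialized = JSON.stringify(value);
  const writeId = randomUUID();
  const timestamp = Date.now(); // shared by metadata and chunks so reads can cross-check
  const chunkCount = Math.max(1, Math.ceil(serialized.length / CHUNK_SIZE));

  for (let i = 0; i < chunkCount; i++) {
    await storage.set(`${logicalKey}_chunk_${i}_${writeId}`, {
      writeId,
      index: i,
      timestamp,
      totalSize: serialized.length,
      data: serialized.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE),
    });
  }

  // Last step: hand ownership of the logical key to this write.
  await storage.set(logicalKey, {
    formatVersion: 1,
    writeId,
    timestamp,
    chunkCount,
    totalSize: serialized.length,
  });
}
```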
Chunk Validation: When reading, we validate that every chunk’s metadata (UUID, timestamp, total size, etc.) matches the main metadata. If any chunk is from a different write operation, we delete the entire corrupted dataset and return null.
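The read/validate path, under the same assumptions (simplified; storage.get and storage.delete are the only API calls used):

```typescript
import { storage } from '@forge/api';

interface MetaRecord { writeId: string; timestamp: number; chunkCount: number; totalSize: number; }
interface ChunkRecord extends MetaRecord { index: number; data: string; }

// Load the main metadata, then every chunk, and check that each chunk belongs
// to the same write (same UUID, timestamp, total size). Any mismatch means a
// partial or mixed write: delete the whole dataset and return null.
async function chunkedGet(logicalKey: string): Promise<unknown | null> {
  const meta = (await storage.get(logicalKey)) as MetaRecord | undefined;
  if (!meta) return null;

  const parts: string[] = [];
  for (let i = 0; i < meta.chunkCount; i++) {
    const chunk = (await storage.get(`${logicalKey}_chunk_${i}_${meta.writeId}`)) as ChunkRecord | undefined;
    if (
      !chunk ||
      chunk.writeId !== meta.writeId ||
      chunk.timestamp !== meta.timestamp ||
      chunk.totalSize !== meta.totalSize
    ) {
      await deleteDataset(logicalKey, meta); // corrupt: remove everything
      return null;
    }
    parts.push(chunk.data);
  }
  return JSON.parse(parts.join(''));
}

async function deleteDataset(logicalKey: string, meta: MetaRecord): Promise<void> {
  await storage.delete(logicalKey); // drop the metadata first so readers stop resolving the key
  for (let i = 0; i < meta.chunkCount; i++) {
    await storage.delete(`${logicalKey}_chunk_${i}_${meta.writeId}`);
  }
}
```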
Jittered Rate Limiting: Multiple processes hitting Forge’s rate limits would normally retry simultaneously, making the problem worse. We add random jitter to retry delays, spreading the load over time.
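The jitter itself is just a wrapper around the storage calls; something along these lines (the delay constants are illustrative, not tuned production values):

```typescript
// Retry with exponential backoff plus random jitter so that concurrent
// processes hitting the same rate limit don't all retry at the same instant.
// In practice you'd only retry on rate-limit errors (e.g. a 429 response),
// not on every failure; that check is omitted here for brevity.
async function withJitteredRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      const base = 250 * 2 ** attempt;     // 250ms, 500ms, 1s, 2s, ...
      const jitter = Math.random() * base; // spreads retries across the window
      await new Promise((resolve) => setTimeout(resolve, base + jitter));
    }
  }
}

// Usage: wrap any storage call that might get rate limited.
// const value = await withJitteredRetry(() => storage.get('some_key'));
```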
Cleanup Process: A scheduled weekly background task removes orphaned chunks from failed writes based on age, preventing storage bloat.
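For the cleanup, the interesting part is the rule for deciding what is safe to delete. A sketch (in practice the chunk listing would come from Forge's storage query API inside a scheduled trigger; here it's passed in so the example stays focused on the age check):

```typescript
import { storage } from '@forge/api';

const MAX_ORPHAN_AGE_MS = 7 * 24 * 60 * 60 * 1000; // anything older than a week is fair game

interface ChunkEntry {
  key: string;                                   // e.g. "data_chunk_0_<uuid>"
  value: { writeId: string; timestamp: number };
}

// Delete chunks that (a) are not referenced by any current metadata record and
// (b) are old enough that no in-flight write could still be producing them.
// `chunkEntries` and `liveWriteIds` are assumed to be gathered by the caller.
async function sweepOrphans(chunkEntries: ChunkEntry[], liveWriteIds: Set<string>): Promise<number> {
  const now = Date.now();
  let deleted = 0;
  for (const { key, value } of chunkEntries) {
    const orphaned = !liveWriteIds.has(value.writeId);
    const oldEnough = now - value.timestamp > MAX_ORPHAN_AGE_MS;
    if (orphaned && oldEnough) {
      await storage.delete(key);
      deleted++;
    }
  }
  return deleted;
}
```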
Task Isolation: Each operation gets a unique taskId, creating separate namespaces so concurrent operations don’t interfere.
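The namespacing is nothing fancy, just a prefix on every key the operation touches (taskId here stands in for whatever per-operation identifier you already have):

```typescript
// Prefix every key an operation touches with its taskId so two concurrent
// operations can never collide, independent of the per-write UUIDs above.
function namespacedKey(taskId: string, key: string): string {
  return `${taskId}:${key}`;
}

// e.g. namespacedKey('export-42', 'data') -> 'export-42:data'
```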
Downsides:
- We're dealing with JSON, and to split it into chunks we serialize it and store each chunk as an escaped string, which adds undesirable bloat to the size of each stored value.
- It’s irritating to have a process to clean up orphaned chunks.
- It feels like a lot of code for something that seems like it should be simple.
Would appreciate any feedback on the approach including potential issues or suggestions to make it more efficient.