RFC-72 : Forge Object Store

RFCs are a way for Atlassian to share what we’re working on with our valued developer community.

It’s a document for building shared understanding of a topic. It expresses a technical solution, but can also communicate how it should be built or even document standards. The most important aspect of an RFC is that a written specification facilitates feedback and drives consensus. It is not a tool for approving or committing to ideas, but more so a collaborative practice to shape an idea and to find serious flaws early.

Please respect our community guidelines: keep it welcoming and safe by commenting on the idea not the people (especially the author); keep it tidy by keeping on topic; empower the community by keeping comments constructive. Thanks!


Summary of Project:

This RFC proposes the introduction of an Object Store on the Forge platform to support the storage of large objects, addressing current limitations and enhancing app capabilities.

  • Publish: October 29, 2024
  • Discuss: November 19, 2024
  • Resolve: December 3, 2024

Problem

Forge storage currently supports both a Key-Value store and a Custom-Entity store. These features enable you to store structured data without the need to set up underlying databases, addressing concerns such as data residency and other requirements that the platform accommodates out of the box.

However, Forge storage does have certain limitations regarding object sizes and depth. These constraints can hinder some applications from fully utilizing the capabilities offered by the Forge platform.

Examples of use cases impacted by these limitations include:

  • Applications that generate reports exceeding 240 KiB
  • Applications that manage deeply nested or oversized objects, forcing developers to chunk their data

While some of these restrictions can be circumvented, doing so introduces additional complexity to your application design and requires time to develop patterns that address size limitations.
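To make that workaround concrete, here is a minimal sketch of the kind of chunking logic apps end up writing today. The 240 KiB constant mirrors the limit quoted above; the key-naming convention mentioned in the comments (e.g. `report:<id>:<n>`) and the surrounding Forge storage calls are hypothetical and omitted — only the byte-splitting logic is shown.

```typescript
// Per-key budget matching the ~240 KiB limit discussed in this RFC.
const CHUNK_BYTES = 240 * 1024;

// Split a string into chunks of at most `maxBytes` UTF-8 bytes,
// never splitting inside a single code point.
function chunkString(value: string, maxBytes: number = CHUNK_BYTES): string[] {
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;
  for (const ch of value) {
    const chBytes = encoder.encode(ch).length;
    if (currentBytes + chBytes > maxBytes && current.length > 0) {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += chBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Reassembling on read is a simple join, e.g. after fetching the
// hypothetical keys report:<id>:0 .. report:<id>:n from Forge storage.
function joinChunks(chunks: string[]): string {
  return chunks.join("");
}
```

Every read and write now fans out across multiple keys, which is exactly the accidental complexity an Object Store would remove.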

The Forge team recognizes this as a critical issue to resolve in order to promote the development of more Forge apps directly on the Forge platform. By addressing these limitations, we aim to eliminate the need to egress data and enhance trust, which is particularly important for our largest enterprise customers as they transition to the cloud.


Proposed Solution

The proposed Object Store will provide a scalable storage solution capable of handling large objects. Key features include:

  • Enhanced Storage Capabilities: Support for larger object sizes, reducing the need for data egress and unlocking more complex use-cases.
  • Security and Compliance: Secure access in compliance with data protection standards, ensuring safe and reliable data storage.
  • Familiar to developers: A storage interface that will feel familiar to developers who have used similar solutions, such as Amazon S3 and Google Filestore. Forge apps using this solution will have proposed capabilities such as put and get.
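No concrete API has been published yet, so the following TypeScript sketch is purely illustrative: the `ObjectStore` interface, its method names and options, and the in-memory stand-in are assumptions extrapolated from the put/get capabilities mentioned above, not an announced Forge API.

```typescript
// Hypothetical shape of an S3-like object store API (all names assumed).
interface ObjectStore {
  put(
    key: string,
    data: Uint8Array,
    opts?: { contentType?: string; ttlSeconds?: number },
  ): Promise<void>;
  get(key: string): Promise<Uint8Array | undefined>;
  delete(key: string): Promise<void>;
  list(prefix: string): Promise<string[]>;
}

// Minimal in-memory stand-in: useful as a local test fake while the
// real platform API is still under discussion.
class InMemoryObjectStore implements ObjectStore {
  private objects = new Map<string, Uint8Array>();

  async put(key: string, data: Uint8Array): Promise<void> {
    this.objects.set(key, data);
  }
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.objects.get(key);
  }
  async delete(key: string): Promise<void> {
    this.objects.delete(key);
  }
  async list(prefix: string): Promise<string[]> {
    return [...this.objects.keys()].filter((k) => k.startsWith(prefix)).sort();
  }
}
```

Coding against an interface like this would let apps swap the fake for the real store once the API ships.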

Limits for Object Store

We are currently evaluating the limits and quotas associated with object store access for each app installation.

We are considering the implementation of several measures and aim to gain insights from this RFC. By understanding your use cases more thoroughly, we can better define what the numbers (X/Y) should represent.

  • Single Object Size Limitations
  • Per Invocation Operation Limits
    • A maximum of X number of (or size of) object downloads
    • A maximum of Y number of (or size of) object uploads
  • Data Transfer Quotas for each app installation
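As an illustration only, per-invocation operation limits could be accounted for along these lines. The X/Y values are left as constructor parameters because the actual numbers are undecided, and the class and error messages are assumptions, not proposed platform behaviour.

```typescript
// Hypothetical per-invocation quota tracker (names and limits assumed).
class InvocationQuota {
  private downloads = 0;
  private uploads = 0;

  constructor(
    private readonly maxDownloads: number, // "X" in the RFC
    private readonly maxUploads: number, // "Y" in the RFC
  ) {}

  // Count one object download; throw once the limit is exceeded.
  recordDownload(): void {
    if (++this.downloads > this.maxDownloads) {
      throw new Error("per-invocation download limit exceeded");
    }
  }

  // Count one object upload; throw once the limit is exceeded.
  recordUpload(): void {
    if (++this.uploads > this.maxUploads) {
      throw new Error("per-invocation upload limit exceeded");
    }
  }
}
```

A size-based variant would accumulate bytes instead of counts; either way, the platform would likely enforce this server-side rather than leaving it to app code.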

Asks

We appreciate your engagement and look forward to feedback on this RFC that will help shape the Object Store offering. Specifically, the Forge team would love insights on:

  1. What specific scenarios or applications would benefit from an Object Store solution, such as handling large-scale data storage or media content management?
  2. What security measures do you expect to be implemented to ensure safe and controlled access to stored objects, considering encryption and access policies?
  3. Do you perform any administrative or maintenance tasks on stored objects, and would features like automatic expiration (TTL) be beneficial for managing data lifecycle?
  4. What is your projected data storage requirement per customer over the next 12 months? How often do you anticipate these objects will be read or written in your application?
  5. Are there scenarios in your application where multiple users or processes might attempt to update the same object simultaneously? If so, how do you currently handle such concurrent updates?
  6. Does your use case require versioned storage and retrieval of objects, and how would you utilise object versioning to manage data changes?
  7. What additional functionalities or operations would you expect from the Object Store solution to support your application needs, such as metadata management or data replication?
  8. How might the proposed storage limits and quotas implementation impact your application, and what strategies would you employ to optimise resource usage?

Your feedback is invaluable in shaping the Forge Object Store to better serve the developer community.

While this RFC is specifically about the Forge Object Store solution, we understand that it might also be important for existing Connect apps.

If you are a developer of a Connect app, we would love for you to participate in the App Data Migrations Survey. This survey will help us design the app data migration experience from Connect/Remote to Forge, as well as understand your requirements for Object Store better by understanding areas like data size, shape, etc. The survey is thoughtfully designed to be short and focused.

Take the survey now!


Thanks for this RFC, looks very promising!

What specific scenarios or applications would benefit from an Object Store solution, such as handling large-scale data storage or media content management?

We’d mostly use it for storing temporary files generated by our apps which users then need to be able to download. We currently use S3 pre-signed URLs for this.

What security measures do you expect to be implemented to ensure safe and controlled access to stored objects, considering encryption and access policies?

We’d expect the same encryption as for data stored inside the host app. DARE support is required by our customers. Wrt. access policy: Individual files must be scoped to user accounts so only a user owning a file can download it.

Do you perform any administrative or maintenance tasks on stored objects, and would features like automatic expiration (TTL) be beneficial for managing data lifecycle?

Automatic expiration is a must-have. We currently expire after 1d, but prevent access even earlier (2h).

What is your projected data storage requirement per customer over the next 12 months? How often do you anticipate these objects will be read or written in your application?

This is very different for our tenants.
Data from today: Max value per tenant: ~32 GB, average ~84 MB.
Data is typically written once, and read once or twice (if automatic download in browser failed)

Are there scenarios in your application where multiple users or processes might attempt to update the same object simultaneously? If so, how do you currently handle such concurrent updates?

We currently don’t have such scenarios.

Does your use case require versioned storage and retrieval of objects, and how would you utilise object versioning to manage data changes?

Not required

What additional functionalities or operations would you expect from the Object Store solution to support your application needs, such as metadata management or data replication?

We’d need some sort of control over the mimetype served by the download endpoint to enforce downloads in browsers (rather than displaying the files).

How might the proposed storage limits and quotas implementation impact your application, and what strategies would you employ to optimise resource usage?

If quotas prevent creation / download of the files we’d store, then our application is broken.
The download of these files is the main purpose of several apps and for the others it would impair key functionality.
The usage depends on the end user and the amount of data (Confluence pages, attachments, …) they want to process with our apps, therefore we have very little influence on the resource usage.


Hello @SunandanGokhroo ,

We have a couple of Apps, so it is a bit hard to squeeze all into one answer, so I try to mainly take the vantage point of our Survey App (Multivote & Enterprise Survey).

The current storage size limit is quite low, so we can run into problems storing definitions and responses to our surveys - I know DC clients with long definitions that would fail in the Cloud. But we are talking more in the range of a couple of hundred KBs, maybe some MBs.

I would expect it to behave like the aforementioned storage systems.

We don’t, but we are about to release a possibility for users to do it. As the lifespan of a survey is driven by the users (i.e., they decide how long it should run, and how long they want to store the raw data), it might even be detrimental.

This is really hard to tell, but I don’t see many peaks. For reading: there might be peaks during the hot phase of a vote (which could then maybe be cached); afterwards I would anticipate roughly 2-3 times a month at most.

Yes, when updating a definition. As the storage is eventually consistent, we just live with this for now, and it is inherently racy. Having no atomic updates is really a pain - I would love to have at least the possibility to update only when the previous version meets some condition (like a version number).

See above - we live with the racy behaviour. With versioned storage, we could handle this more gracefully, but also easily add features like undo.

Right now, I don’t see much. The very tight limits (invocation size- and time-wise) and the eventual consistency without any way to interact with it are the major factors. This also hinders upgrading the data when a new version rolls out.

Additionally, we would want an easier way to migrate from the other storage solutions to the new one - Custom Entities would have been nice for us, but there is no good migration path: you have to be really careful with the tight limits when reacting to an app update trigger, and we cannot force older versions to be upgraded (see RFC-71: Improving Connect to Forge and Forge Platform Upgrades - #21 by EckhardMaass), so the codebase becomes a mess…

Right now, if it is too much, it won’t work. In another app, to temporarily store something, we split up the file and stored the splits. This is a really ugly workaround!

Best
Eckhard


Thank you so much @EckhardMaass and @jens for sharing your use cases and helping us understand the requirements better. We really do appreciate this, as it will help us in building something useful for the developer community.

For reference of a key-value store in the open market: Limits · Cloudflare Workers KV

Hi @SunandanGokhroo,

we’re currently evaluating internally if this blob storage could also be an option for another use case.

Key requirements (in addition to the ones mentioned above):

  • Uploaded files should ideally be accessible to the end user’s browser in custom UIs without us proxying data through a function (e.g. displaying an uploaded image)
  • We’d need some sort of a collection concept for the stored files, similar to a directory structure
  • Listing of collection contents
  • Listing of such collections or alternatively listing all objects in the storage by name match. In our use case each of the collection / directory instances would contain a file with a static name (e.g. metadata.json), so we could search for those and then list the contents of the containing directory etc…

Feel free to reach out if you’re interested in more details 🙂

Thanks,
Jens

Hi @SunandanGokhroo,
Sorry for the slow response.

  1. We store Connect webhook payloads in S3 for downstream processing. We also store large function invocation payloads in S3 (bulk operations and also large user-generated input payloads). User generated content with app is also stored in S3.
  2. Encryption at rest - tenant isolated access controls.
  3. Yes and yes. Occasionally we need to process all stored objects to update their schemas. We have specific lifecycle policies for webhooks and some function invocation payloads, as they are only required to be stored for short periods of time.
  4. For long-term storage, volume is low (MBs per tenant), but for the short-term payloads mentioned in 1. the volume is high (multiple GBs per tenant). Writes dominate the workload significantly for webhooks. Reads dominate for user-generated content.
  5. N/a
  6. Yes for long term storage of user generated content.
  7. Listing stored objects must be possible. We also need a really fast and reliable way of replicating data into Forge Storage to aid with any migration from Connect into Forge Storage.
  8. We currently use lifecycle policies in S3 to optimise resource usage. Storage and quota limits must reflect the volume and frequency of data generated by the host application (Jira/Confluence) and its processing.

Thanks,
Jon
