Feedback request: Forge complex queries through custom entities

RodolfoCandido · July 5, 2022, 11:28pm

Hi there, I’m Rodolfo, an engineer at Atlassian. Our team is working on improvements to the Forge platform’s hosted storage capabilities. We’re planning to introduce the ability to create complex queries through custom entities in Forge and we’d love your feedback on it.

How Forge hosted storage works right now

At the moment, Forge hosted storage only allows you to perform query operations on the key defined by you, the developer. It doesn’t let you query based on the JSON values stored against each key.

As a result, you can only perform basic CRUD operations on hosted storage. In addition, Forge hosted storage query only supports the startsWith condition.
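
To make the limitation concrete, here is a minimal in-memory sketch of what a value-based lookup looks like when only key-prefix queries are available. It does not use the Forge API: the `store` map and the helpers `queryByKeyPrefix` and `findBooksByAuthor` are hypothetical names used only for illustration.

```javascript
// In-memory stand-in for key/value storage; NOT the Forge API.
const store = new Map([
  ["book-1", { title: "The Shining", author: "Stephen King", publishedYear: 1977 }],
  ["book-2", { title: "Dune", author: "Frank Herbert", publishedYear: 1965 }],
  ["book-3", { title: "It", author: "Stephen King", publishedYear: 1986 }],
  ["config-theme", { value: "dark" }],
]);

// The only kind of query available: match on the key prefix.
function queryByKeyPrefix(prefix) {
  return [...store.entries()]
    .filter(([key]) => key.startsWith(prefix))
    .map(([key, value]) => ({ key, value }));
}

// A value-based lookup has to read every "book-" record and filter in app code.
function findBooksByAuthor(author) {
  return queryByKeyPrefix("book-").filter(({ value }) => value.author === author);
}

const kingBooks = findBooksByAuthor("Stephen King");
console.log(kingBooks.map(({ key }) => key));
```

Every record under the prefix is read even though only some of them match, which is the cost that indexes on values are meant to remove.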

What we’re planning to do

We’re planning to introduce new ways of structuring data and indexes based on how you want to query it.

We’ll let you define up to 5 custom entities per app. Each entity will have:

Up to 50 attributes
Up to 5 indexes

After defining these entities and indexes, you’ll be able to perform complex queries. This means you can query data through our storage API based on the indexes pointing to your data values, and not just the key field.

We’re also planning to add new conditions to support complex queries – more on this in the “New conditions” section below.

How this will affect you

Because this new capability will coexist with the existing untyped key/value storage, there’s no impact on your app’s data.

However, with this new ability, you’ll be able to start defining custom entities in your manifest.yml file. Each entity will be identified by name and will have a list of attributes. The indexes section allows you to define your access patterns.

app:
  id: <app id here>

  storage:
    entities:
      - name: books
        attributes:
          - title: string
          - author: string
          - publishedYear: float
        indexes:
          - author
          - publishedYear
          - name: by-author-and-published-year
            partition: author
            range: publishedYear
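
As a rough sketch, the planned limits could be checked against an entity definition like this. It assumes the manifest has already been parsed into plain objects with the same shape as the YAML; `LIMITS` and `checkEntityLimits` are hypothetical names, not part of Forge.

```javascript
// Planned limits quoted in the post.
const LIMITS = { entities: 5, attributes: 50, indexes: 5 };

// The "books" entity, as it might look after parsing the manifest.
const books = {
  name: "books",
  attributes: [{ title: "string" }, { author: "string" }, { publishedYear: "float" }],
  indexes: [
    "author",
    "publishedYear",
    { name: "by-author-and-published-year", partition: "author", range: "publishedYear" },
  ],
};

// Returns a list of human-readable violations; an empty list means the
// definition fits within the planned limits.
function checkEntityLimits(entities) {
  const problems = [];
  if (entities.length > LIMITS.entities) {
    problems.push(`too many entities: ${entities.length} > ${LIMITS.entities}`);
  }
  for (const entity of entities) {
    const attributeCount = (entity.attributes || []).length;
    const indexCount = (entity.indexes || []).length;
    if (attributeCount > LIMITS.attributes) {
      problems.push(`${entity.name}: too many attributes (${attributeCount})`);
    }
    if (indexCount > LIMITS.indexes) {
      problems.push(`${entity.name}: too many indexes (${indexCount})`);
    }
  }
  return problems;
}

console.log(checkEntityLimits([books]));
```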

Attribute types

Each attribute can be one of four data types:

string
float
boolean
any

Note, the data type any is not supported for queries, so it can’t be used in any part of the indexes definition.
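
The type rules above can be sketched as plain runtime checks. This is an illustration only: `TYPE_CHECKS`, `matchesAttributes`, `canIndex`, and the extra `metadata` attribute are hypothetical, and treating missing attributes as allowed is an assumption the post does not confirm.

```javascript
// Runtime checks for the four proposed attribute types.
const TYPE_CHECKS = {
  string: (v) => typeof v === "string",
  float: (v) => typeof v === "number" && Number.isFinite(v),
  boolean: (v) => typeof v === "boolean",
  any: () => true, // accepts anything; the post says it cannot be used in indexes
};

// Hypothetical attribute map: attribute name -> declared type.
const bookAttributes = {
  title: "string",
  author: "string",
  publishedYear: "float",
  metadata: "any",
};

// Assumption: attributes missing from the value are allowed; only the
// attributes that are present get type-checked.
function matchesAttributes(attributes, value) {
  return Object.entries(attributes).every(
    ([name, type]) => !(name in value) || TYPE_CHECKS[type](value[name])
  );
}

// An attribute can appear in an index only if it is declared and not `any`.
function canIndex(attributes, name) {
  return name in attributes && attributes[name] !== "any";
}

console.log(canIndex(bookAttributes, "author"), canIndex(bookAttributes, "metadata"));
```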

Indexes

You can specify two kinds of indexes:

Simple index
Named index

A simple index is based on a single attribute on which you can apply range conditions when you query on this index (see below for details).

A named index can take an additional attribute that will be used to partition the data before any query conditions are applied.

We expect that most use cases will be solved with simple indexes. However, for large datasets, the partitions in named indexes will reduce the number of entities each query traverses, which should improve performance and enable more advanced access patterns.
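
One way to picture the difference between the two kinds of index is the in-memory model below. It is a conceptual sketch, not a description of how Forge stores data; `buildSimpleIndex` and `buildNamedIndex` are hypothetical helpers.

```javascript
const books = [
  { title: "The Shining", author: "Stephen King", publishedYear: 1977 },
  { title: "Dune", author: "Frank Herbert", publishedYear: 1965 },
  { title: "It", author: "Stephen King", publishedYear: 1986 },
  { title: "Neuromancer", author: "William Gibson", publishedYear: 1984 },
  { title: "11/22/63", author: "Stephen King", publishedYear: 2011 },
];

function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Simple index: one sorted list over the whole entity.
function buildSimpleIndex(items, attribute) {
  return [...items].sort((a, b) => compare(a[attribute], b[attribute]));
}

// Named index: items grouped by the partition attribute, and each group
// sorted by the range attribute.
function buildNamedIndex(items, partitionAttribute, rangeAttribute) {
  const partitions = new Map();
  for (const item of items) {
    const key = item[partitionAttribute];
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(item);
  }
  for (const group of partitions.values()) {
    group.sort((a, b) => compare(a[rangeAttribute], b[rangeAttribute]));
  }
  return partitions;
}

const byYear = buildSimpleIndex(books, "publishedYear");
const byAuthorAndYear = buildNamedIndex(books, "author", "publishedYear");
const kingPartition = byAuthorAndYear.get("Stephen King");
console.log(byYear.length, kingPartition.length);
```

In this model, a range condition on the simple index has the whole entity as its search space, while the same condition on the named index only looks inside one partition.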

The existing API for untyped key/value will still work as normal, but we will extend it to allow you to specify which entity your operation is for, like this:

// Set
await storage.entity("books").set("book-1656551689", {
  title: "The Shining",
  author: "Stephen King",
  publishedYear: 1977
});

// Get
await storage.entity("books").get("book-1656551689");

// Delete
await storage.entity("books").delete("book-1656551689");

// Query on simple index
await storage
  .query()
  .entity("books")
  .index("author")
  .range(isEqualTo("Stephen King"))
  .limit(20)
  .getMany();

// Query on named index
await storage
  .query()
  .entity("books")
  .index("by-author-and-published-year")
  .partition("Stephen King")
  .range(isGreaterThan(2010), 'DESC')
  .filter("title", contains("au"))
  .limit(20)
  .cursor('')
  .getMany();

Partition

The partition() method is only required for named indexes; otherwise, it should not be used. The query will return only results where the partition attribute of the index exactly matches the argument value.

Range

The range() method is the recommended way to filter your data with the new conditions listed below. Because the range attribute is also used for sorting, the second parameter sets the order of the results:

ASC (default)
DESC

Filter

The filter() method provides further filtering capabilities for all other typed attributes (excluding any).

New conditions

We’re planning to support the following conditions. Some of them will be only supported in certain parts of the query.

Supported for range:
- Equal to
- Less than / Less than or equal to
- Greater than / Greater than or equal to
- Between
- Begins with (already supported as startsWith)

Supported for filtering:
- All the conditions supported for range, plus:
- Not equal to
- Contains / Not contains
- Exists / Not exists
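
For illustration, the conditions listed above can be modelled as plain predicate factories. Only `isEqualTo`, `isGreaterThan`, `between`, and `contains` appear by name in this thread; every other name, the inclusive bounds of `between`, and the string-only behaviour of `contains` are assumptions made for this sketch.

```javascript
const isEqualTo = (expected) => (v) => v === expected;
const isNotEqualTo = (expected) => (v) => v !== expected;
const isLessThan = (bound) => (v) => v < bound;
const isLessThanOrEqualTo = (bound) => (v) => v <= bound;
const isGreaterThan = (bound) => (v) => v > bound;
const isGreaterThanOrEqualTo = (bound) => (v) => v >= bound;
// Assumption: both bounds are inclusive.
const between = (low, high) => (v) => v >= low && v <= high;
const beginsWith = (prefix) => (v) => typeof v === "string" && v.startsWith(prefix);
// Assumption: substring match on strings only.
const contains = (part) => (v) => typeof v === "string" && v.includes(part);
const notContains = (part) => (v) => !(typeof v === "string" && v.includes(part));
const exists = () => (v) => v !== undefined;
const notExists = () => (v) => v === undefined;

// Conditions allowed in range(), per the list above.
const rangeConditions = {
  isEqualTo, isLessThan, isLessThanOrEqualTo,
  isGreaterThan, isGreaterThanOrEqualTo, between, beginsWith,
};

// filter() accepts everything range() does, plus the extra conditions.
const filterConditions = {
  ...rangeConditions,
  isNotEqualTo, contains, notContains, exists, notExists,
};

console.log(Object.keys(rangeConditions).length, Object.keys(filterConditions).length);
```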

Known limitations

Filter operations

The response of queries using filter() might result in pages with fewer items than the limit you specify in the query (or even no items at all) even though there may be more items remaining. You can access these items with additional calls using the cursor.

This can happen because filtering is applied in memory, after the range operation has returned data from the database. As a result, queries that use filter() are less performant and can return partially filled or empty pages. You can tell that no items remain by checking whether the nextCursor attribute is undefined.
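
A pagination loop that tolerates short or empty pages might look like the sketch below. The page fetcher is an in-memory simulation, not the Forge API: `makeFetchPage` and `collectAll` are hypothetical helpers, and the `{ results, nextCursor }` response shape is an assumption based on the `nextCursor` attribute mentioned above.

```javascript
// Simulates a backend that reads `limit` items per page and applies the
// filter afterwards, so pages can come back short or empty.
function makeFetchPage(items, predicate, limit) {
  return async function fetchPage(cursor) {
    const start = cursor === undefined ? 0 : Number(cursor);
    const scanned = items.slice(start, start + limit); // what the range step returns
    const results = scanned.filter(predicate); // filter applied in memory
    const next = start + limit;
    return { results, nextCursor: next < items.length ? String(next) : undefined };
  };
}

// Keeps requesting pages until nextCursor is undefined, regardless of how
// many items each page contains.
async function collectAll(fetchPage) {
  const all = [];
  let cursor;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.results);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}

const titles = ["Carrie", "Misery", "It", "Faust", "Cujo", "Laura"];
const fetchPage = makeFetchPage(titles, (title) => title.includes("au"), 2);

collectAll(fetchPage).then((matches) => console.log(matches));
```

The key point is that the loop stops on `nextCursor`, not on an empty page: in this simulation the first page is empty even though later pages contain matches.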

Feedback

We would love to hear your feedback about these plans, so we can adjust our solution as needed. In particular, we’re interested in:

Whether your app needs higher limits than our planned ones – 5 different entities, 50 attributes each, 5 indexes.
The type of query conditions that will be most useful to your development plans.
Whether you have more complex needs than what we’re planning to provide.
Any clarifications you need in addition to the information above to understand the feature. This will help us ensure we can provide the right level of documentation and support.

Feel free to comment on this post and let’s keep the conversation going!

You can also submit feature requests here.

Will this proposed solution address your current pain points in using Forge hosted storage and its data structuring/querying capabilities?

Yes, this will address my pain points in using Forge-hosted storage
No, this will not address my pain points using Forge-hosted storage (please provide more details in the comments)


clement_garin · July 6, 2022, 7:06am

Nice! Would definitely solve our query performance issues, thanks to the filters.
Now the expected question: when can we expect these features to be available (maybe an EAP?)

EulogioGutierrez · July 6, 2022, 7:59am

Hi Rodolfo,

One initial question/clarification related to the range method sorting

Say you have a “date” attribute (number), with a simple index on it, and you want to do a .index("date-index").range(between(d1, d2), 'DESC'), and you have more than the max number of results you can return in a query (currently 20, I think).

Is the ASC/DESC sorting done when getting the results, or after some 20 results have been picked up and returned (like your filtering limitation)?

Thanks,

clouless · July 6, 2022, 8:06am

awesome!!!

SvenHe · July 6, 2022, 9:59am

@RodolfoCandido thanks for the great news.

I would like to know if this implementation will handle manifest changes to custom entities or their attributes, since such changes are pretty common in a growing app.

Changes like:

Adding/removing new attributes
Renaming attributes/entities
Changing an attribute’s type

Some kind of migration from a deprecated entity to a new one would be useful in the future.

RodolfoCandido · July 7, 2022, 3:43am

It’s great to hear that the proposed solution will solve your problems Clément!

Our main focus at the moment is validating the solution, to ensure everything’s on the right track.
We’re hoping to do an early access program around Oct-Nov 2023, but we will keep the community updated on the progress.

Thank you very much for your feedback!

RodolfoCandido · July 7, 2022, 3:49am

Hi Eulogio,

Your entire dataset will first be sorted (in this case by the date attribute) and then the range is applied, meaning that if there’s more data to be retrieved (through pagination) the order will be kept.
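
As a small in-memory illustration of that ordering guarantee (a sketch only, with a hypothetical `queryPage` helper and a numeric offset standing in for the real cursor):

```javascript
// Sorts the whole dataset first, applies the range, then cuts pages, so the
// order stays consistent across pages.
function queryPage(items, { attribute, low, high, order = "ASC", limit, cursor = 0 }) {
  const direction = order === "DESC" ? -1 : 1;
  const matching = [...items]
    .sort((a, b) => direction * (a[attribute] - b[attribute]))
    .filter((item) => item[attribute] >= low && item[attribute] <= high);
  const results = matching.slice(cursor, cursor + limit);
  const next = cursor + limit;
  return { results, nextCursor: next < matching.length ? next : undefined };
}

const events = [5, 1, 9, 3, 7, 11].map((date) => ({ date }));
const query = { attribute: "date", low: 3, high: 9, order: "DESC", limit: 2 };

const page1 = queryPage(events, query);
const page2 = queryPage(events, { ...query, cursor: page1.nextCursor });
console.log(page1.results, page2.results);
```

Concatenating the pages gives the same sequence as sorting and filtering the full dataset in one go.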

I hope it clarifies things, thank you for your question!

RodolfoCandido · July 7, 2022, 3:58am

That’s a great question Sven, thank you for that!

At this point we are not going to provide support for breaking changes (i.e. removing attributes, changing types, removing entities), but your concern is on our radar and we are going to iterate to provide a more robust solution that should address this.

We will share more details about this with the community when we have planned a solution for it.

SvenHe · July 7, 2022, 6:57am

Hi @RodolfoCandido,

Is 2023 correct? It seems a bit too far away.

Best regards,
Sven

clement_garin · July 7, 2022, 6:57am

Just to be sure: 2023 and not 2022 ?

RodolfoCandido · July 7, 2022, 7:06am

Apologies, it’s Oct-Nov 2022

(I was going to refer to Q2 FY23 but then decided to share based on calendar year to make it easier to understand).

danielwester · July 7, 2022, 9:24am

On the surface, this looks promising. I am concerned about the limits though:

We’ll let you define up to 5 custom entities per app. Each entity will have:

Up to 50 attributes

Up to 5 indexes

Why limit us to 5 custom entities? This means that you’re thinking that we might only be storing 5 types of JSON objects in Forge Storage? Once you start with configs and “schema” type of things - the number of object types gets large quite fast (all of which really need to be queryable if they’re attached to a project, issue etc).

Because of this limitation - I’m not sure if we would be able to make use of this feature. Any chance of us being able to have that limitation be larger than 5 entities? Ideally there is no limit on the number of entities that can be indexed, but if there has to be a limit - for large functionality apps - I would imagine 20 wouldn’t be too far off (we’d be at 10 today - so I’m future proofing myself). Of course the workaround is for us to create indexing entities in Forge storage to get around this limitation (but that seems like a fragile approach).

RodolfoCandido · July 8, 2022, 1:50am

The initial number of up to 5 entities was defined to make sure we can handle all existing apps and their environments, and maintain our service reliability.

We do intend to increase those limits in future releases (after the EAP), but we will need to assess what those new limits are going to be. We will definitely take your use case of up to 20 entities into consideration.

–

On another note, would the limit in the number of attributes also be a problem for your use case?

Thank you very much for your feedback, it’s very important to us!

SushantBista · November 30, 2022, 4:52am

Hi everyone, I wanted to provide a quick timeline update on this project.

Our team is still progressing with development work; however, due to some technical challenges and dependencies we are trying to sort out, the EAP release will be delayed. At this stage, we are still firming up the timelines (looking like the end of March), and I will provide an update in January on how we are going and on any changes to the plan.

I apologise for any inconvenience caused due to this shift in dates. Please let me know if you have any questions.

Thanks,
Sushant

JulianWolf · December 24, 2022, 11:27am

Hey @SushantBista,

It is a pity that there haven’t been any updates to the Forge Storage API this year. It’s really been the largest roadblock for me and others when it comes to building more complex Forge apps. We started a new Forge app with egress to its own database a few days ago and I would have loved to avoid this.

If there are any updates or other insights on this please keep us posted. I would also love to hop on a call with the team to give feedback and to see what features to expect.

Happy holidays

Julian

SushantBista · January 9, 2023, 1:54am

Hey @JulianWolf, Happy New Year! Apologies for the delayed response. I was on leave.

I apologise for the delays on updates to Forge storage API. We are actively working on it, however there have been some challenges we are trying to sort out. I understand this is a big roadblock and it is one of our top priorities to address.

Yes, it would be great to have a call with you on this. I will DM you to organise a time.

Thanks,
Sushant

NathanReddy · February 21, 2023, 9:57pm

Hi Atlassian Team,

Any ETA for this feature? We are working on a Forge App and can really really use this feature. We are working towards our first release around May 31st 2023 and would like to know what your timeline is, so we can plan for either leveraging this feature or working around it.

If you want us to connect so we can explain our use case, and load / query scenarios, we would be happy to meet you and explain.

thanks

Nathan

ryan · February 22, 2023, 6:34pm

I hate to sound so negative because I’m sure the team has put a lot of work into this already.

I think this is the completely wrong direction. It feels like a terrible mashup between document based storage and SQL based storage.

If it’s SQL-based storage, you should host a real DB like Postgres or MySQL and just impose limitations and give the developer root access to their own database and step back. It’s been done successfully for decades by hundreds of hosting providers.

If it’s document-based storage, the storage should be schema-less because this is what everyone uses nowadays. Also, declaring entities introduces a migration problem. What happens to the data on v2 of your add-on when you need to add a field to your entity? It feels like you are creating more complexity than required for the developer.

Schema-less has been the state of the art in JSON-based document storage for years. You can do indexes, queries, etc. on schema-less JSON documents. I know you guys are on Amazon. Please look at their DocumentDB product. If you are already using something like that for Forge storage, why are you making it more complicated than it needs to be? Just create a thin layer on top of an existing MongoDB-like product. This is pretty much the de facto standard.

SushantBista · February 24, 2023, 4:33am

Hi @NathanReddy, great to hear this feature will be useful for your app. In terms of ETA, we are currently still targeting the end of March for an EAP. However, we are still working through some technical challenges, so there is a possibility this will get pushed a bit. We will be able to confirm once we progress more with our internal testing. Please also note that, as an EAP release, it will not be recommended for production deployment. We would love to connect further to better understand your use cases and how we can support you in the interim.

Thanks,
Sushant