Working with Atlassian Forge SQL often raises important questions for developers building apps at scale: How does Forge SQL handle large datasets? How many rows can you actually manage effectively within the platform’s limitations? And most importantly, are there optimization strategies to push these boundaries further?
Recently, I explored exactly these questions by experimenting with a database containing over 600,000 rows, testing the performance limits of Forge SQL queries against Atlassian’s strict constraints—such as the 5-second execution time and 16MB response size limit.
In this post, I’ll share my findings, including insights into:
- How Forge SQL behaves with hundreds of thousands of records
- The real impact of platform-imposed limits on query performance
- Practical query optimization techniques to handle large-scale data effectively
If you’ve ever wondered how far you can go with Forge SQL—or how you might optimize queries for maximum performance—read on to discover what’s possible and how to achieve it.
To measure and improve query performance in Forge SQL, we’ll leverage two powerful TiDB tools: `EXPLAIN` and `EXPLAIN ANALYZE`. These tools provide deep insights into query execution plans, memory usage, and runtime performance, and help identify bottlenecks when handling large datasets.
`EXPLAIN` lets you view the query plan without actually executing it, revealing the sequence of operations, scan methods, join strategies, and index usage. `EXPLAIN ANALYZE` goes deeper by executing the query, providing actual execution times, detailed memory consumption, disk operations, and other runtime metrics. This makes it invaluable for diagnosing performance issues and verifying the effectiveness of optimizations.
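For context, here is a minimal sketch of how an execution plan can be captured from app code. It assumes the plain `@forge/sql` client (the example repository itself uses forge-sql-orm), and the `sql.prepare(...).execute()` shape follows the Forge SQL API as I understand it—treat it as a sketch rather than the repository’s actual code:

```typescript
// Sketch: capturing an execution plan from inside a Forge resolver.
// Assumes the plain @forge/sql client; the example app itself uses forge-sql-orm.
import sql from "@forge/sql";

export async function explainAnalyze(query: string) {
  // EXPLAIN ANALYZE actually executes the statement, so only pass queries
  // that are safe to run (e.g. SELECTs).
  const result = await sql.prepare(`EXPLAIN ANALYZE ${query}`).execute();
  // Each returned row describes one operator: estimated vs. actual rows,
  // task type (root / cop[tikv]), execution time, memory and disk usage.
  console.log(JSON.stringify(result.rows, null, 2));
  return result.rows;
}
```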
This step-by-step guide is based on a practical example that demonstrates real-world optimizations:
forge-sql-orm-example-query-analyses on GitHub — fully implemented with Drizzle SQL ORM and designed specifically for Atlassian Forge apps.
Setting Up the Example: Table Structure and Data
To clearly illustrate how Forge SQL performance behaves and how to optimize queries, we’ll use the following realistic database schema, compatible with TiDB/MySQL:
Table definitions:
CREATE TABLE IF NOT EXISTS category (
id varbinary(16) DEFAULT NULL,
name varchar(255) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
CREATE TABLE IF NOT EXISTS product (
id varbinary(16) DEFAULT NULL,
name varchar(255) NOT NULL,
category_id varbinary(16) NOT NULL,
category_name varchar(255) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
CREATE TABLE IF NOT EXISTS order_item (
id varbinary(16) DEFAULT NULL,
product_id varbinary(16) NOT NULL,
product_name varchar(255) NOT NULL,
quantity int NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
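Since the companion example is built with Drizzle SQL ORM, the same tables can also be described as a Drizzle schema. The sketch below is an illustration, not the repository’s actual code; the file and variable names are assumptions:

```typescript
// drizzle-schema.ts (name assumed): the three tables above in Drizzle's MySQL dialect.
import { int, mysqlTable, timestamp, varbinary, varchar } from "drizzle-orm/mysql-core";

export const category = mysqlTable("category", {
  id: varbinary("id", { length: 16 }),
  name: varchar("name", { length: 255 }).notNull(),
  createdAt: timestamp("created_at").notNull().defaultNow(),
});

export const product = mysqlTable("product", {
  id: varbinary("id", { length: 16 }),
  name: varchar("name", { length: 255 }).notNull(),
  categoryId: varbinary("category_id", { length: 16 }).notNull(),
  categoryName: varchar("category_name", { length: 255 }).notNull(),
  createdAt: timestamp("created_at").notNull().defaultNow(),
});

export const orderItem = mysqlTable("order_item", {
  id: varbinary("id", { length: 16 }),
  productId: varbinary("product_id", { length: 16 }).notNull(),
  productName: varchar("product_name", { length: 255 }).notNull(),
  quantity: int("quantity").notNull(),
  createdAt: timestamp("created_at").notNull().defaultNow(),
});
```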
Installing and Preparing the Environment
To reproduce this optimization journey in your own Forge environment, follow these straightforward steps:
- Deploy and install the Forge app:
forge register
forge deploy
forge install
- Run the schema migration and populate the database (only needed once):
forge webtrigger -f runSchemaMigration
This script will:
- Create the tables defined above.
- Populate them with realistic test data:
Table | Records |
---|---|
category | 1,100 |
product | 200,000 |
order_item | 400,000 |
Note: Initial population may take 10–15 minutes. Wait until you see the following confirmation log in your Forge tunnel or Atlassian console:
DATABASE INITIALIZED with orderItems=400000, products=200000, catalogs=1100
- Open the installed Forge app in your environment to start experimenting with query performance.
Analyzing Performance: The Problematic Query
Here’s the real-world scenario we’ll analyze, a typical pagination query joining three large tables:
SELECT
`category`.`name` AS `a_categoryname_name`,
`product`.`name` AS `a_productname_name`,
`order_item`.`quantity` AS `a_orderitemquantity_quantity`
FROM `order_item`
LEFT JOIN `product` ON `order_item`.`product_id` = `product`.`id`
LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
ORDER BY `order_item`.`created_at` ASC
LIMIT 10 OFFSET 50000;
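For reference, this is roughly how such a pagination query is built with Drizzle ORM. It is a sketch based on the schema above; `db` is assumed to be the Drizzle instance provided by forge-sql-orm, and the actual repository code may differ:

```typescript
import { asc, eq } from "drizzle-orm";
import { category, orderItem, product } from "./drizzle-schema";

// db is assumed to be the Drizzle database instance exposed by forge-sql-orm.
const page = await db
  .select({
    categoryName: category.name,
    productName: product.name,
    quantity: orderItem.quantity,
  })
  .from(orderItem)
  .leftJoin(product, eq(orderItem.productId, product.id))
  .leftJoin(category, eq(product.categoryId, category.id))
  .orderBy(asc(orderItem.createdAt))
  .limit(10)
  .offset(50000); // the deeper the page, the more rows must be scanned and sorted first
```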
Optimizing Query Performance Step-by-Step 
In this post, I’ll take you through 4 practical stages, demonstrating how to analyze and gradually optimize Forge SQL queries for large datasets. Although I already had the optimal solution in mind when I began this journey, I’ve intentionally structured the analysis step-by-step to illustrate clearly how to identify and resolve performance bottlenecks.
At each stage, we’ll move closer to the optimal query, learning along the way why certain approaches perform better than others. Let’s start with the initial query analysis and work, step by step, toward the most efficient solution:
Stage 1: Diagnosing the Performance Problem
We’ll start by running our original query and inspecting its performance with `EXPLAIN ANALYZE`. This initial step is crucial for understanding exactly why the query is slow and where its main bottlenecks are.
The Query Under Analysis
SELECT
`category`.`name` AS `a_categoryname_name`,
`product`.`name` AS `a_productname_name`,
`order_item`.`quantity` AS `a_orderitemquantity_quantity`
FROM `order_item`
LEFT JOIN `product` ON `order_item`.`product_id` = `product`.`id`
LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
ORDER BY `order_item`.`created_at` ASC
LIMIT 10 OFFSET 50000;
EXPLAIN ANALYZE Summary
Here’s what the initial execution revealed in terms of performance:
Operator | Actual Rows | Task | Memory Usage | Disk Usage | Notes |
---|---|---|---|---|---|
TableFullScan_44 | 400,000 | cop[tikv] | 3.83 MB | N/A | Full scan on order_item |
TableFullScan_46 | 200,000 | cop[tikv] | 6.58 MB | N/A | Full scan on product |
TableFullScan_48 | 1,100 | cop[tikv] | 37.5 KB | N/A | Full scan on category |
HashJoin_33 | 50,010 | root | 5.08 MB | 3.05 MB | Joins product ← order_item |
HashJoin_23 | 50,010 | root | 116.2 KB | 54.8 KB | Joins category ← product |
TopN_36 | 50,010 | root | 6.58 MB | 0 Bytes | Sort & applies OFFSET |
TopN_18 | 10 | root | 2.47 MB | 0 Bytes | Final LIMIT |
Key Insights & Performance Issues
- The query execution took over 750ms, processed more than 50,000 rows, and consumed significant amounts of memory, even triggering temporary disk usage.
- The database executed multiple full table scans, which is particularly problematic for the `order_item` (400K rows) and `product` (200K rows) tables.
- Joins were performed using expensive HashJoins, contributing heavily to memory usage.
- The `TopN` operation for sorting and pagination is memory-intensive. As `OFFSET` grows, memory usage and execution time increase significantly.
Conclusion:
This query is clearly not scalable. Despite retrieving only 10 rows, the database has to scan, sort, and join tens of thousands of rows in memory. This inefficient approach quickly exhausts Forge’s memory and time constraints, making it unsuitable for production use at scale.
In fact, due to its slowness, this query is automatically logged to the TiDB `slow_query` table. You can view and analyze such slow queries either directly through the database or via the Atlassian Developer Console in your Forge app dashboard.
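If you want to inspect these entries programmatically, TiDB surfaces them through the `INFORMATION_SCHEMA.SLOW_QUERY` view. The sketch below assumes that view is queryable from Forge app code, which I have not verified; the Developer Console remains the officially supported path:

```typescript
import sql from "@forge/sql";

// Sketch: pulling the most recent slow queries from TiDB's slow query log.
// Column names follow TiDB's INFORMATION_SCHEMA.SLOW_QUERY documentation.
const result = await sql
  .prepare(
    `SELECT Time, Query_time, Mem_max, Query
       FROM INFORMATION_SCHEMA.SLOW_QUERY
      ORDER BY Time DESC
      LIMIT 20`
  )
  .execute();
console.log(result.rows);
```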
Stage 2: Adding Indexes to Improve Performance
To address the costly `TableFullScan` operations identified in Stage 1, I added several indexes designed specifically to optimize the joins and sorting involved in our problematic query:
Creating the Indexes
I executed the following DDL statements through the Forge SQL client to add relevant indexes:
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_catalog_id ON category(id)');
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_order_id ON order_item(id)');
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_order_item_product_id ON order_item(product_id)');
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_order_item_created_name_qty ON order_item (created_at ASC, product_id, quantity)');
These indexes specifically target:
- Foreign key relations (`product_id` in `order_item` and `id` in `category`).
- Sorting and filtering columns (`created_at`, `product_id`, and `quantity`).
Re-running the Query: Improved Performance
After applying these indexes, we ran our original query again using `EXPLAIN ANALYZE`. The execution plan improved significantly:
- `IndexHashJoin` and `IndexLookUp` replaced the previously expensive HashJoins and table scans.
- `IndexFullScan` was utilized, removing unnecessary full-table scans from execution.
- Total execution time dropped from ~750ms to approximately 414ms, a noticeable improvement.
Updated EXPLAIN ANALYZE Results
Operator | Actual Rows | Task | Memory Usage | Disk Usage | Notes |
---|---|---|---|---|---|
IndexFullScan_61 | 72,160 | cop[tikv] | N/A | N/A | Uses index (`created_at`, `product_id`, `quantity`) |
HashJoin_45 | 50,010 | root | 5.04 MB | 0 Bytes | order_item → product join |
IndexHashJoin_75 | 50,010 | root | 1.83 MB | N/A | product → category join via index |
TopN_83 | 50,010 | root | 2.46 MB | 0 Bytes | Pagination (sort & offset) before final limit |
Projection_15 | 10 | root | 968 Bytes | N/A | Final projection |
Conclusion & Remaining Issue:
Indexes clearly improved query performance—execution time almost halved—but memory usage remained problematic, still exceeding 5MB for even moderate pagination offsets.
Memory Limit Warning
When increasing the offset to 100,000, we encountered a critical error:
Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.
Why does this happen?
Even though our query uses optimized index paths, the database still needs to perform sorting and offset pagination on tens of thousands of rows before applying the LIMIT. As a result, memory consumption becomes unsustainable at larger offsets.
Stage 3: Rewriting Queries with Common Table Expressions (CTEs)
To eliminate the heavy performance cost associated with large OFFSETs, I rewrote the query using a Common Table Expression (CTE, the `WITH` clause). The core idea: retrieve just the desired subset of rows from the large `order_item` table first, then join this small subset with the related tables.
CTE-based Query:
WITH withQuery AS (
SELECT id, product_id, product_name, quantity, created_at
FROM order_item
ORDER BY created_at ASC
LIMIT 10 OFFSET 350000
)
SELECT category.name, withQuery.quantity, product.name
FROM withQuery
LEFT JOIN product ON withQuery.product_id = product.id
LEFT JOIN category ON category.id = product.category_id;
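With Drizzle ORM, the same CTE can be expressed via `$with`/`with`. Again, this is a sketch under the assumed schema and `db` instance from the earlier snippets, not the repository’s exact code:

```typescript
import { asc, eq } from "drizzle-orm";
import { category, orderItem, product } from "./drizzle-schema";

// Step 1: the CTE fetches only the ten order_item rows we actually need.
const withQuery = db.$with("withQuery").as(
  db
    .select()
    .from(orderItem)
    .orderBy(asc(orderItem.createdAt))
    .limit(10)
    .offset(350000)
);

// Step 2: join that tiny result set against product and category.
const rows = await db
  .with(withQuery)
  .select({
    categoryName: category.name,
    quantity: withQuery.quantity,
    productName: product.name,
  })
  .from(withQuery)
  .leftJoin(product, eq(withQuery.productId, product.id))
  .leftJoin(category, eq(category.id, product.categoryId));
```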
Updated EXPLAIN ANALYZE Results:
Operator | Actual Rows | Task | Memory Usage | Disk Usage | Time | Notes |
---|---|---|---|---|---|---|
TopN (CTE subquery) | 10 | root | 15.7 MB | 32.5 MB | 426.7ms | Sorts & applies OFFSET/LIMIT on order_item |
TableFullScan (order_item) | 400,000 | cop[tikv] | 8.08 MB | N/A | 22.2ms | Full table scan, major bottleneck |
HashJoin | 10 | root | 1.26 KB | 1.06 KB | 441.8ms | Joins order_item (CTE) → product |
TableFullScan (product) | 200,000 | cop[tikv] | 6.58 MB | N/A | 13.6ms | Full scan on product |
IndexHashJoin | 10 | root | 1.89 KB | N/A | 442.1ms | Joins product → category via index |
Projection | 10 | root | 62.0 KB | N/A | 442.1ms | Final projection |
Positive results:
- Stable final memory: Small joins now use minimal memory (<100 KB).
- Performance improvement: execution time is still significant (~440ms), but the query stays within Forge’s limits even at this large offset.
Critical Bottleneck Remaining:
Despite the improvement, we still see significant issues:
- `TableFullScan` on `product` (200K rows, 6.58 MB memory)
- `TableFullScan` on `order_item` (400K rows, ~8 MB memory and ~32 MB disk usage for sorting)
These full table scans significantly limit scalability, especially with larger tables.
Stage 4: Achieving Optimal Performance
To fully eliminate the remaining performance bottlenecks, particularly the costly `TableFullScan` on the `product` table, I introduced two additional indexes:
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_product_category_id ON product(category_id)');
await forgeSQLClient.executeDDL('CREATE INDEX IF NOT EXISTS idx_product_id ON product(id)');
We then reran our optimized query utilizing the Common Table Expression (CTE):
WITH withQuery AS (
SELECT id, product_id, product_name, quantity, created_at
FROM order_item
ORDER BY created_at ASC
LIMIT 10 OFFSET 350000
)
SELECT category.name, withQuery.quantity, product.name
FROM withQuery
LEFT JOIN product ON withQuery.product_id = product.id
LEFT JOIN category ON category.id = product.category_id;
Final EXPLAIN ANALYZE Summary
Operator | Actual Rows | Time | Memory Usage |
---|---|---|---|
Projection | 10 | 16.2ms | 25.0 KB |
IndexHashJoin (product→category) | 10 | 16.2ms | 2.23 KB |
IndexHashJoin (order_item→product) | 10 | 16ms | 37.6 KB |
IndexReader & Limit | 350,496 | 14.7ms | 3.83 MB |
IndexFullScan (order_item) | 373,024 | 14.6ms | N/A |
Conclusion:
- Execution time drastically improved—down to just 16ms, demonstrating optimal performance.
- Memory consumption significantly reduced, staying comfortably within Forge limits (under 100 KB at the join level).
- All previous `TableFullScan` operations have been completely replaced by efficient index operations.
General Impressions and Recommendations
During this optimization journey with Forge SQL, I identified several important insights and recommendations:
- Query Analysis Challenges:
  - Analyzing queries that fail due to out-of-memory errors is particularly challenging. The only practical diagnostic approach is using preliminary execution plans (`EXPLAIN`), but interpreting these plans without actually running the queries remains difficult.
- Limitations of Slow Query Logs:
  - The slow query log helps identify problematic queries, but without detailed execution plans, its usefulness is limited. For thorough diagnostics, developers must manually execute queries using `EXPLAIN` directly in the database.
- Flexibility in Data Insertion:
  - Forge SQL allows large batch inserts, up to approximately 6,000 records per insert operation in my example. This flexibility greatly simplifies the initialization and maintenance of large datasets (see the chunked-insert sketch after this list).
- Disabled Strict Mode and Constraints:
  - Forge SQL currently operates with strict mode and data integrity constraints disabled. For instance, declaring a column as `VARCHAR(11)` doesn’t enforce the input length; instead, the data is silently truncated without explicit errors. Warnings about such truncations can only be retrieved by separate queries; it would be more convenient to have them directly visible in the Atlassian developer console.
- Foreign Key Behavior:
  - Foreign keys function solely as indexes without enforcing referential integrity. Consequently, you can insert arbitrary values into foreign key fields, potentially resulting in empty JOIN results if mismatches occur.
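As an illustration of the batch-insert point above, here is a sketch of chunked inserts with Drizzle. The ~6,000-row chunk size is simply what worked in my test runs; `db` and the schema import are the same assumptions as in the earlier sketches:

```typescript
import { orderItem } from "./drizzle-schema";

// Insert a large dataset in chunks small enough for a single Forge SQL statement.
const CHUNK_SIZE = 6000; // roughly the batch size that worked in my tests

async function insertInChunks(rows: (typeof orderItem.$inferInsert)[]) {
  for (let i = 0; i < rows.length; i += CHUNK_SIZE) {
    // db is assumed to be the Drizzle database instance exposed by forge-sql-orm.
    await db.insert(orderItem).values(rows.slice(i, i + CHUNK_SIZE));
  }
}
```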