Over the past month, I introduced a new observability layer inside my library forge-sql-orm.
The goal was simple: make Forge SQL measurable and transparent — without breaking Runs on Atlassian compliance and without accessing any customer data.
As part of my Codegeist project, I integrated this layer into a real Forge app and connected it to PostHog.
This instantly gave me end-to-end visibility into resolver performance across environments, tenants, and app versions - all while staying fully within the platform boundary.
Why Observability Is Especially Hard in Forge SQL
Forge SQL is multitenant — the physical database is shared across many customers, and each tenant gets its own logical slice of data.
In practice, this creates two major challenges:
1. Tenants can have radically different dataset sizes
A resolver that runs in 50–100 ms for one customer may take hundreds of milliseconds, or even seconds, for another.
And you have no way to know this ahead of time:
- you cannot see the tenant’s actual table sizes
- you cannot log into the underlying database
- you cannot estimate index selectivity per tenant
- you can receive slow-query entries from TiDB, but only after a tenant has already experienced degraded performance
2. Platform-level analytics are available - but not enough
Forge SQL exposes some low-level database metrics:
- slow-query logs
- cluster statistics
- execution summaries
However, these analytics are:
- not tied to specific resolvers
- not correlated with app versions or environments
- not connected to payload size or resolver logic
- not continuous (visible only when TiDB marks a query as “slow”)
- not designed to show trends, regressions, or per-tenant behavior
They help diagnose extreme cases, but they’re not sufficient for understanding how your application performs in real-world multi-tenant conditions.
You only see what your resolver sees — nothing more.
Why That Becomes a Real Problem
As schemas grow and joins become more complex, behavior becomes unpredictable:
- A query with a perfect execution plan can still be slow for a large tenant.
- Pagination with large OFFSET becomes inconsistent between customers.
- A new join may be harmless in dev but catastrophic for a tenant with millions of rows.
- Regression detection is impossible without telemetry — you cannot see if performance worsened after a release.
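To make the pagination point concrete, here is a minimal sketch (plain SQL built in TypeScript; the notes table and its columns are hypothetical, not part of forge-sql-orm) contrasting OFFSET pagination with keyset pagination:

```typescript
// Offset pagination: the database still scans and discards `pageSize * page`
// rows before returning results, so deep pages get slower as a tenant's
// table grows.
function offsetPage(pageSize: number, page: number): string {
  return `SELECT id, title FROM notes ORDER BY id LIMIT ${pageSize} OFFSET ${pageSize * page}`;
}

// Keyset pagination: the index seek starts at the last seen id, so the cost
// stays roughly constant regardless of how many rows the tenant has.
function keysetPage(pageSize: number, lastSeenId: number): string {
  return `SELECT id, title FROM notes WHERE id > ${lastSeenId} ORDER BY id LIMIT ${pageSize}`;
}
```

Keyset pagination turns "skip N rows" into an index seek, which is why the same resolver can behave consistently across small and large tenants.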
Because all real data lives inside Atlassian infrastructure, the app developer has almost no visibility into how SQL behaves “out in the wild.”
This is exactly the gap that the new observability layer is designed to fill.
What the New Observability Layer Provides
The layer automatically captures performance characteristics for every resolver invocation.
Aggregated total DB execution time
All SQL statements executed by the resolver contribute to a single aggregated DB time metric.
Tiered logging thresholds
- Debug when total DB time > 1000 ms
- Warn when total DB time > 2000 ms
Thresholds can be tuned per resolver.
Automatic SQL plan dump
When total DB time exceeds 2000 ms, the ORM prints full execution plans for all queries executed inside the resolver directly into Forge logs.
This helps diagnose:
- unexpected TableFullScan
- heavy IndexJoin
- missing indexes
- window functions with large memory needs
- skewed statistics for a large tenant
- inefficient pagination
Performance telemetry sent to allowed analytics tools (e.g., PostHog)
Only anonymized metadata is sent:
- cloudId
- environment
- resolverName
- appVersion
- totalDbExecutionTime
- totalResponseSize
This enables weekly insights, multi-tenant comparisons, and regression detection — fully compliant and PII-free.
Automatic timeout and out-of-memory diagnostics
The observability layer also detects and analyzes failures such as:
- “Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query.”
- “The provided query took more than 5000 milliseconds to execute.”
When these errors occur, the ORM retrieves and logs execution plans for the failed queries — making it possible to understand the issue even without access to tenant data.
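The exact detection mechanism lives inside forge-sql-orm, but the idea can be sketched as simple matching on the two error messages quoted above (the function name and structure here are illustrative, not the library's actual API):

```typescript
type SqlFailureKind = "oom" | "timeout" | "other";

// Classify a Forge SQL error by its message so the appropriate diagnostics
// (such as dumping execution plans) can be triggered afterwards.
function classifySqlFailure(error: Error): SqlFailureKind {
  const message = error.message;
  if (message.includes("exceeding the allowed memory limit")) {
    return "oom";
  }
  if (message.includes("took more than") && message.includes("milliseconds")) {
    return "timeout";
  }
  return "other";
}
```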
Configurable by Design
One important aspect of this observability layer is that it is not global.
It is resolver-level, meaning every resolver can define its own behavior independently.
Each resolver can configure:
- custom thresholds
- warning levels
- plan-dump behavior
- analytics logic
- additional metadata
- sampling rules
- environment-specific overrides
- or disable observability entirely
This flexibility is powered by a simple callback structure:
executeWithMetadata(
  async () => {
    // resolver logic
  },
  async (totalDbExecutionTime, totalResponseSize, printQueriesWithPlan) => {
    // your custom logic here
  },
);
Because the logic lives at the resolver level, the system is:
- easy to tune
- lightweight
- platform-safe
- precise and predictable
Each resolver gets exactly the level of observability it needs — no more, no less.
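One way to implement the sampling rules mentioned above is hash-based tenant sampling, so that a given cloudId is consistently in or out of the sample and sampled tenants produce continuous time series instead of gaps. This helper is a hypothetical sketch, not part of forge-sql-orm:

```typescript
// Deterministically decide whether to emit telemetry for a tenant.
// Hashing the cloudId keeps the decision stable across invocations,
// unlike Math.random(), which would scatter events across all tenants.
function shouldSample(cloudId: string, sampleRatePercent: number): boolean {
  let hash = 0;
  for (let i = 0; i < cloudId.length; i++) {
    hash = (hash * 31 + cloudId.charCodeAt(i)) >>> 0;
  }
  return hash % 100 < sampleRatePercent;
}
```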
Even If Analytics Are Disabled — Observability Still Works
Some customers disable analytics or block outbound requests entirely.
In this case, no telemetry is sent to PostHog (or any analytics tool), but the observability layer still provides full visibility.
All essential diagnostic information remains available directly in the logs:
- totalDbExecutionTime
- totalResponseSize
- per-query execution details
- full SQL plans
- timeout and OOM diagnostics
- resolver-level warnings and thresholds
This means that even if analytics events never leave the customer’s infrastructure, the customer can still provide logs that fully explain:
- which resolver was slow
- which queries were executed
- what each plan looked like
- why the slowdown happened (bad plan, full scan, index join, statistics skew, etc.)
Observability does not depend on outbound analytics — telemetry is optional, but diagnostics are always available.
100% inside the Forge boundary
- No external storage
- No outbound data beyond anonymized telemetry
- No PII
- No customer content
Runs on Atlassian — by design.
Integration Example
1. manifest.yml (permissions)
permissions:
  external:
    fetch:
      backend:
        - address: "*.posthog.com"
          category: analytics
          inScopeEUD: false
2. Wrapping a resolver
resolver.define('Test Resolver', async (req: Request) => {
  const resolverName = 'Test Resolver';
  return FORGE_SQL_ORM.executeWithMetadata(
    async () => {
      return ... // resolver logic
    },
    async (totalDbExecutionTime, totalResponseSize, printQueriesWithPlan) => {
      await ANALYTIC_SERVICE.sendAnalytics(
        "sql_resolver_performance",
        resolverName,
        req.context.cloudId,
        { totalDbExecutionTime, totalResponseSize },
      );
      if (totalDbExecutionTime > 2000) {
        console.warn(
          `Resolver ${resolverName} has high database execution time: ${totalDbExecutionTime}ms`,
        );
        await printQueriesWithPlan();
      } else if (totalDbExecutionTime > 1000) {
        console.debug(
          `Resolver ${resolverName} has elevated database execution time: ${totalDbExecutionTime}ms`,
        );
      }
    },
  );
});
3. Sending analytics to PostHog
const appContext = getAppContext();
const properties = {
  resolverName,
  cloudId,
  envName: appContext.environmentType,
  envId: appContext.environmentAri.environmentId,
  version: appContext.appVersion,
  parsedVersion: this.parseVersion(appContext.appVersion),
  totalDbExecutionTime: data.totalDbExecutionTime,
  totalResponseSize: data.totalResponseSize,
  eventVersion: 1,
};

await fetch("https://eu.i.posthog.com/capture/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    api_key: process.env.ANALYTICS_API_KEY,
    event: eventName,
    distinct_id: cloudId,
    timestamp: new Date().toISOString(),
    properties: properties,
  }),
});
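The snippet above calls this.parseVersion but does not show it. A minimal sketch of what such a helper could look like, assuming each version segment stays below 10000 (other formats would need adjustment):

```typescript
// Convert "major.minor.patch" into a single sortable integer so PostHog
// can compute max(parsedVersion) to find the latest version per tenant.
function parseVersion(version: string): number {
  const [major = 0, minor = 0, patch = 0] = version
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
  return major * 10000 * 10000 + minor * 10000 + patch;
}
```

Encoding the version as a number matters because a plain string comparison would put "10.0.0" before "2.0.0".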
This is enough to build dashboards that reflect real-world performance across tenants.
4. PostHog query for weekly resolver performance
SELECT
  count(uuid) AS event_count,
  properties.envId,
  properties.cloudId,
  AVG(properties.totalDbExecutionTime) AS avgTime,
  concat(properties.cloudId, ':', properties.envName, ':', properties.resolverName) AS resolverName,
  max(properties.parsedVersion) AS latest_version,
  max(timestamp) AS last_seen_at
FROM events
WHERE event = 'sql_resolver_performance'
  AND properties.eventVersion = 1
  AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY
  properties.envId,
  properties.cloudId,
  properties.envName,
  properties.resolverName
HAVING avgTime > 500;
Putting It to the Test
To verify the full pipeline, I intentionally left a performance bottleneck in place:
SELECT SLEEP(2)
This pushed the total DB time above the 2000 ms threshold.
Here’s what happened:
- In PostHog, I immediately saw a spike in totalDbExecutionTime for the affected resolver.
- In the Forge logs, I found the detailed execution plans for every query the resolver ran.
- The execution plan pointed directly to the problematic place in the code.
After removing the artificial delay, latency dropped from ~2200 ms → ~150 ms.
From Black Box to Glass Box
Before, Forge SQL was essentially a black box.
Now, with observability integrated directly into forge-sql-orm, it becomes:
- diagnosable
- measurable
- predictable
- transparent
This observability layer is lightweight, compliant, and extremely helpful when building applications with complex schemas, heavy joins, and tenant-specific performance patterns.
Try It Yourself
Codegeist project:
https://github.com/vzakharchenko/Forge-Secure-Notes-for-Jira
forge-sql-orm repository:
https://github.com/vzakharchenko/forge-sql-orm
If you want to integrate it or explore how it works — happy to help.