Practical SQL Observability for Forge Apps with forge-sql-orm

vzakharchenko · November 26, 2025, 7:04pm

Thanks again for the valuable feedback, @varun - I’ve shipped several improvements to forge-sql-orm based directly on our discussion. Here’s a quick overview of what changed.

1. New deterministic default mode (TopSlowest)

The default behavior no longer depends on CLUSTER_STATEMENTS_SUMMARY.

forge-sql-orm now logs the exact SQL digests executed inside the resolver, giving deterministic diagnostics even for long-running logic.

By default it prints:

the single slowest query, and
optionally that query’s execution plan (showSlowestPlans: true)

Configurable like this:

{
  topQueries: 2,          // how many slowest queries to analyze
  showSlowestPlans: true, // re-executes them with EXPLAIN ANALYZE
}

If showSlowestPlans is enabled, the library re-executes these queries with EXPLAIN ANALYZE.
If it is disabled, it prints only the SQL and its timing.
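
The selection step can be pictured with a minimal sketch. This is not forge-sql-orm's internal code: QueryTiming, pickSlowest and describeSlowest are hypothetical names, and the sketch only builds the log lines instead of actually running EXPLAIN ANALYZE.

```typescript
// Hypothetical record of one executed statement; not a forge-sql-orm type.
interface QueryTiming {
  sql: string;
  durationMs: number;
}

// Return the `topQueries` slowest statements, slowest first.
function pickSlowest(timings: QueryTiming[], topQueries: number = 1): QueryTiming[] {
  return timings
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.durationMs - a.durationMs)
    .slice(0, Math.max(0, topQueries));
}

// Build log lines for the slowest statements. When showSlowestPlans is on,
// each statement is followed by the EXPLAIN ANALYZE text that would be re-executed.
function describeSlowest(
  timings: QueryTiming[],
  options: { topQueries?: number; showSlowestPlans?: boolean } = {},
): string[] {
  const topQueries = options.topQueries ?? 1;
  const showSlowestPlans = options.showSlowestPlans ?? true;
  const lines: string[] = [];
  for (const q of pickSlowest(timings, topQueries)) {
    lines.push(`${q.durationMs} ms: ${q.sql}`);
    if (showSlowestPlans) {
      lines.push(`EXPLAIN ANALYZE ${q.sql}`);
    }
  }
  return lines;
}
```

Because the timings are collected inside the resolver itself, the result does not depend on what the cluster-wide summary tables still hold.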

(Screenshots: log output with the plan enabled and with the plan disabled.)

2. SummaryTable mode (optional)

SummaryTable mode still exists, but now works as an advanced diagnostic option.

It uses a short memory window:

summaryTableWindowTime: 15000 // 15s default

If resolver execution exceeds this window, forge-sql-orm automatically falls back to TopSlowest, avoiding stale diagnostics.

This keeps SummaryTable useful for fresh metadata, but avoids relying on it for long workflows.
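
The fallback rule can be expressed in a few lines. This is a sketch of the decision only, assuming elapsed resolver time is measured in milliseconds; resolveEffectiveMode is a hypothetical helper, not a function exported by the library.

```typescript
type QueryPlanMode = "TopSlowest" | "SummaryTable";

// Decide which mode to use once the resolver has finished.
// SummaryTable is kept only if the resolver stayed inside the window.
function resolveEffectiveMode(
  requested: QueryPlanMode,
  elapsedMs: number,
  summaryTableWindowTime: number = 15000,
): QueryPlanMode {
  if (requested === "SummaryTable" && elapsedMs > summaryTableWindowTime) {
    return "TopSlowest"; // summary data may already be stale or evicted
  }
  return requested;
}
```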

3. Updated API

Here is the updated API with all configuration options:

executeWithMetadata(
  async () => {
    // resolver logic
  },
  async (totalDbTime, totalResponseSize, printPlan) => {
    // your custom logic:
    // analytics, thresholds, alerts, logging, etc.
    // e.g.: if (totalDbTime > 1000) await printPlan();
  },
  {
    mode?: QueryPlanMode;            // "TopSlowest" | "SummaryTable" (default: TopSlowest)
    summaryTableWindowTime?: number; // ms window for SummaryTable (default: 15000)
    topQueries?: number;             // number of slowest queries to print (default: 1)
    showSlowestPlans?: boolean;      // print EXPLAIN ANALYZE in TopSlowest mode (default: true)
  }
);

Everything is opt-in and Forge-safe by design.
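
To make the callback shape concrete, here is a self-contained sketch of a threshold handler. It does not import the library: fakeExecuteWithMetadata is a local stand-in that only mimics the callback signature shown above and is fed made-up metrics, and the 1000 threshold assumes totalDbTime is in milliseconds.

```typescript
type PrintPlan = () => Promise<void>;
type OnMetadata = (
  totalDbTime: number,
  totalResponseSize: number,
  printPlan: PrintPlan,
) => Promise<void>;

// Local stand-in (hypothetical): runs the resolver, then calls the handler
// with the metrics passed in by the caller instead of real measurements.
async function fakeExecuteWithMetadata<T>(
  resolver: () => Promise<T>,
  onMetadata: OnMetadata,
  fakeMetrics: { totalDbTime: number; totalResponseSize: number },
  log: string[],
): Promise<T> {
  const result = await resolver();
  await onMetadata(fakeMetrics.totalDbTime, fakeMetrics.totalResponseSize, async () => {
    log.push("plan printed");
  });
  return result;
}

const SLOW_DB_TIME = 1000; // assumed to be milliseconds

// Handler that always records the metrics and prints the plan only when slow.
function makeThresholdHandler(log: string[]): OnMetadata {
  return async (totalDbTime, totalResponseSize, printPlan) => {
    log.push(`db=${totalDbTime} size=${totalResponseSize}`);
    if (totalDbTime > SLOW_DB_TIME) {
      await printPlan();
    }
  };
}
```

With the real library, the handler would be passed as the second argument of executeWithMetadata, and printPlan would be the function the library supplies.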

4. Timeout & OOM post-mortem diagnostics

For catastrophic SQL failures, the library performs an immediate post-mortem lookup.

Right after a Timeout or OOM, TiDB’s metadata is still in memory, so forge-sql-orm extracts the actual plan of the failing query before eviction can occur.

Case A: Timeout

“The provided query took more than 5000 milliseconds to execute…”

(Screenshot: Timeout Error)

forge-sql-orm automatically logs the execution plan of the failing query.

Case B: Out of Memory (OOM)

“Your query has been cancelled due to exceeding the allowed memory limit…”

(Screenshot: Out of Memory Error)

The library captures the memory-heavy plan that triggered the crash.

Why this matters

No re-execution required - avoids triggering the same timeout or OOM again.
Plans come from the real execution - captured with actual data distribution and bind parameters.
No tenant data is exposed - metadata only, fully compliant.
Runs entirely inside the Forge boundary - no privileged access or special APIs.
Works reliably even for complex, deeply nested SQL workloads - joins, pagination chains, window functions, etc.

This gives developers a safe way to understand severe failures without privileged access.
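
For anyone handling these failures by hand, the first step is telling the two cases apart. The sketch below classifies an error by matching the visible parts of the two messages quoted above; it assumes the message text reaches the app unchanged, and classifySqlFailure is a hypothetical helper rather than part of the library.

```typescript
type SqlFailureKind = "timeout" | "oom" | "other";

// Classify a thrown value by the wording of the two error messages quoted above.
function classifySqlFailure(error: unknown): SqlFailureKind {
  const message = error instanceof Error ? error.message : String(error);
  if (/took more than \d+ milliseconds to execute/i.test(message)) {
    return "timeout";
  }
  if (/exceeding the allowed memory limit/i.test(message)) {
    return "oom";
  }
  return "other";
}
```

A catch block around the resolver could call this and trigger the post-mortem lookup only for "timeout" and "oom", rethrowing everything else untouched.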

5. Why developer-side observability matters

This configurability, implemented on the application side, lets developers enable observability exactly where it’s needed:

instrumentation at the resolver, long-running function, or scheduler level
custom thresholds
selective plan printing
sampling
environment rules
optional analytics

And importantly:

You don’t need my library to do any of this.
Developers can implement the same pattern manually; forge-sql-orm simply makes it easier and more consistent.
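
As an illustration of the manual route, here is a small sketch that combines a threshold, sampling and an environment rule into one decision. PlanPolicy and shouldPrintPlan are hypothetical names, the environment strings are example values, and the random source is injectable so the behaviour can be tested.

```typescript
// Hypothetical policy object; all field names are illustrative.
interface PlanPolicy {
  thresholdMs: number;           // only consider invocations slower than this
  sampleRate: number;            // fraction of slow invocations to log, 0..1
  enabledEnvironments: string[]; // e.g. ["DEVELOPMENT", "STAGING"]
}

// Decide whether this invocation should print its execution plan.
function shouldPrintPlan(
  totalDbTime: number,
  environment: string,
  policy: PlanPolicy,
  random: () => number = Math.random,
): boolean {
  const envAllowed = policy.enabledEnvironments.some((e) => e === environment);
  if (!envAllowed) return false;
  if (totalDbTime <= policy.thresholdMs) return false;
  return random() < policy.sampleRate;
}
```

The same function can be called from a resolver, a long-running function or a scheduled trigger, which keeps the rules in one place.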

Developer-side observability naturally complements platform-level observability.

Together, they enable building Forge apps with deep SQL execution paths: complex joins, multi-stage pagination, window functions, large OFFSET workflows — while still maintaining transparency and safety.

Curious to hear your thoughts, @varun:

Does this direction seem reasonable from the platform perspective?
Or would you recommend approaching any part of it differently?