RFCs are a way for Atlassian to share what we’re working on with our valued developer community.
It’s a document for building shared understanding of a topic. It expresses a technical solution, but can also communicate how it should be built or even document standards. The most important aspect of an RFC is that a written specification facilitates feedback and drives consensus. It is not a tool for approving or committing to ideas, but rather a collaborative practice to shape an idea and to find serious flaws early.
*Please respect our community guidelines: keep it welcoming and safe by commenting on the idea, not the people (especially the author); keep it tidy by staying on topic; empower the community by keeping comments constructive. Thanks!
For the avoidance of doubt, the Atlassian Developer Terms govern any feedback you provide, and any sample code we provide is deemed to be part of the “Atlassian Platform” under that agreement.*
Summary
This project aims to enable the consumption of app invocation metrics by third-party tools.
- Publish: 12 July 2023
- Discuss: 19 July 2023
- Resolve: 2 Aug 2023
Problem
Currently, app invocation metrics can be consumed only in the developer console. This project aims to build an API that gives users the ability to use third-party tools to:
- group and filter metrics by different attributes, such as appVersion, contextAri, functionKey, moduleKey, errorType, and more
- set highly configurable alerts on metrics (defining SLIs and SLOs as necessary)
- integrate with incident response tools, like Opsgenie, PagerDuty, and more
We intend to add any new metrics that we make available via the developer console to this API. We’re looking for feedback to make sure we’re building the best possible solution.
Proposed solution
As part of this project, we’re planning to provide an API that returns invocation metrics in OTLP protobuf JSON format. A few terms used extensively throughout this RFC:
- OpenTelemetry is an Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs
- The OpenTelemetry Protocol (OTLP) specification describes the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors and telemetry backends.
- The OpenTelemetry Collector (OTEL) offers a vendor-agnostic implementation of how to receive, process, and export telemetry data. It removes the need to run, operate, and maintain multiple agents/collectors, scales well, and supports open source observability data formats (e.g. Jaeger, Prometheus, Fluent Bit) sending to one or more open source or commercial back-ends.
Authentication with Atlassian GraphQL API
Note: The Atlassian account making the request has to be the same account that owns the Forge app.
Follow the steps to authenticate with the Atlassian GraphQL (AGG) API.
To get started using Basic authentication:
- Copy your API token from your Atlassian account.
- Include the token and your email in the header of your GraphQL request.
- Pass the X-ExperimentalApi header. This is required because the Forge Metrics API is still in an experimental state and is subject to change.
- Provide a custom User-Agent header. This helps differentiate traffic coming from the developer console from traffic coming from your own export service. We recommend using this value: ForgeMetricsExportServer/1.0.0
Sample AGG Query
query Ecosystem($appId: ID!, $query: ForgeMetricsOtlpQueryInput!) {
  ecosystem {
    forgeMetrics(appId: $appId) {
      exportMetrics(query: $query) {
        ... on ForgeMetricsOtlpData {
          resourceMetrics
        }
        ... on QueryError {
          message
          identifier
          extensions {
            statusCode
            errorType
          }
        }
      }
    }
  }
}
Sample AGG Query Variables
{
  "appId": "ari:cloud:ecosystem::app/8ce114f4-d82c-45e2-b4fb-c6a0751d7d57",
  "query": {
    "filters": {
      "environments": ["8cb293d5-be08-47ae-a75c-95b89da5ad1d"],
      "interval": {
        "start": "2023-06-18T02:55:00.000Z",
        "end": "2023-06-18T02:57:00.000Z"
      },
      "metrics": ["FORGE_BACKEND_INVOCATION_LATENCY", "FORGE_BACKEND_INVOCATION_COUNT", "FORGE_BACKEND_INVOCATION_ERRORS"]
    }
  }
}
Sample AGG Query Headers
{
"Authorization": "Basic base64<email:token>",
"User-Agent": "ForgeMetricsExportServer/1.0.0",
"X-ExperimentalApi": "ForgeMetricsQuery"
}
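The headers above can be assembled programmatically before issuing the query. A minimal Python sketch, assuming the standard library only; `build_agg_headers` is an illustrative helper (not part of any Atlassian SDK), and the email and token values are placeholders:

```python
import base64

def build_agg_headers(email: str, api_token: str) -> dict:
    """Build the request headers described above: Basic auth plus the
    experimental-API and custom User-Agent headers."""
    credentials = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    return {
        "Authorization": f"Basic {credentials}",
        "User-Agent": "ForgeMetricsExportServer/1.0.0",
        "X-ExperimentalApi": "ForgeMetricsQuery",
        "Content-Type": "application/json",
    }

headers = build_agg_headers("me@example.com", "my-api-token")
# The resulting dict can be passed to any HTTP client alongside the
# sample query and variables shown above.
```

The same headers work with any HTTP client; only the Authorization value changes per user.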
Sample AGG Query Response
{
"data": {
"ecosystem": {
"forgeMetrics": {
"exportMetrics": {
"resourceMetrics": [
{
"resource": {},
"schemaUrl": "https://opentelemetry.io/schemas/1.9.0",
"scopeMetrics": [
{
"metrics": [
{
"name": "forge_backend_invocation_count",
"description": "",
"sum": {
"aggregationTemporality": 1,
"dataPoints": [
{
"asInt": 70,
"attributes": [
{
"key": "appId",
"value": {
"stringValue": "8ce114f4-d82c-45e2-b4fb-c6a0751d7d57"
}
},
{
"key": "appVersion",
"value": {
"stringValue": "4.64.0"
}
},
{
"key": "contextAri",
"value": {
"stringValue": "ari:cloud:confluence::site/13095d29-407d-47ec-aa57-76764a470f36"
}
},
{
"key": "environmentId",
"value": {
"stringValue": "8cb293d5-be08-47ae-a75c-95b89da5ad1d"
}
},
{
"key": "functionKey",
"value": {
"stringValue": "updateStatusTitle"
}
}
],
"startTimeUnixNano": "1687497375656000000",
"timeUnixNano": "1687497375662000000"
}
]
},
"unit": "s"
},
{
"name": "forge_backend_invocation_errors",
"description": "",
"sum": {
"aggregationTemporality": 1,
"dataPoints": [
{
"asInt": 0,
"attributes": [
{
"key": "appId",
"value": {
"stringValue": "8ce114f4-d82c-45e2-b4fb-c6a0751d7d57"
}
},
{
"key": "appVersion",
"value": {
"stringValue": "5.1.0"
}
},
{
"key": "contextAri",
"value": {
"stringValue": "ari:cloud:compass::site/6a9ea14f-759d-4f4a-b3ac-11395d8bf519"
}
},
{
"key": "environmentId",
"value": {
"stringValue": "8cb293d5-be08-47ae-a75c-95b89da5ad1d"
}
},
{
"key": "errorType",
"value": {
"stringValue": "UNHANDLED_EXCEPTION"
}
},
{
"key": "functionKey",
"value": {
"stringValue": "process-app-event"
}
},
{
"key": "moduleKey",
"value": {
"stringValue": "app-event-webtrigger"
}
}
],
"startTimeUnixNano": "1687488960000000000",
"timeUnixNano": "1687489020000000000"
}
]
},
"unit": "s"
}
]
}
]
}
]
}
}
}
}
}
Notes
- Try the AGG API here: GraphQL Gateway
- Each API call retrieves at most 15 minutes of metrics. This limit is enforced to keep the number of data points in each API response manageable.
- The preferred approach is to fetch data periodically, for example, every 3 or 5 minutes.
- A rate limit of 5 calls per minute per user token applies.
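Because each call covers at most 15 minutes, backfilling a longer time range means splitting it into windows before querying. A small illustrative sketch (the function and constant names are hypothetical):

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

# Hypothetical cap mirroring the per-call limit described above.
MAX_WINDOW = timedelta(minutes=15)

def iter_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (start, end) intervals no longer than 15 minutes,
    covering [start, end) without gaps or overlaps."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + MAX_WINDOW, end)
        yield cursor, nxt
        cursor = nxt

# A 40-minute range splits into three windows:
# 02:00-02:15, 02:15-02:30, 02:30-02:40
windows = list(iter_windows(
    datetime(2023, 6, 18, 2, 0), datetime(2023, 6, 18, 2, 40)))
```

Each window would then become one `interval` in the query variables, with calls spaced out to stay under the 5-calls-per-minute rate limit.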
Expected partner flow when consuming metrics
Partner Server
To consume the Atlassian GraphQL API programmatically and ingest metrics into a monitoring tool in near real time, we envisage partner infrastructure having the following two components:
CronJob Service
The CronJob service periodically polls the exposed GraphQL endpoint for the required metrics. The AGG endpoint returns a response in the OTLP protobuf JSON standard format. That response is then pushed as-is to the OTEL sidecar running alongside the cron service. A few possible approaches:
- Serverless framework: If using AWS infrastructure, you can configure a Lambda function to be executed every “x” minutes. A similar configuration should be possible with GCP Cloud Functions as well. A sample Lambda configuration can look like the following:
Sample lambda configuration
MyLambdaFunction:
Type: AWS::Lambda::Function
Properties:
FunctionName: MyLambdaFunction
Runtime: nodejs14.x
Handler: index.handler
Code:
S3Bucket: my-function-bucket
S3Key: my-function-package.zip
Layers:
- !Ref OTelLambdaLayer
Environment:
Variables:
OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/config.yml
MyScheduledRule:
Type: AWS::Events::Rule
Properties:
Description: My scheduled rule
ScheduleExpression: rate(3 minutes)
State: ENABLED
Targets:
- Arn: !GetAtt MyLambdaFunction.Arn
Id: MyLambdaTarget
- Server framework: If using AWS infrastructure, you can set up a dedicated EC2 instance running a server that polls the AGG API every “x” minutes. This can be a VM if running in an on-premises data center.
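Whichever scheduling approach is used, the handler body is the same: query AGG, unwrap `data.ecosystem.forgeMetrics.exportMetrics`, and forward that object unchanged to the local collector. A Python sketch under those assumptions (the helper names are illustrative, the sidecar URL matches the setup described later, and error handling is minimal):

```python
import json
from urllib import request as urlrequest

# OTLP/HTTP receiver of the sidecar collector (illustrative default).
OTEL_SIDECAR_URL = "http://localhost:4318/v1/metrics"

def extract_otlp_payload(agg_response: dict) -> dict:
    """Unwrap the OTLP body from the AGG response envelope.
    Raises if the union resolved to QueryError instead of OTLP data."""
    body = agg_response["data"]["ecosystem"]["forgeMetrics"]["exportMetrics"]
    if "resourceMetrics" not in body:
        raise RuntimeError(f"AGG returned an error: {body}")
    return body

def forward_to_sidecar(agg_response: dict) -> None:
    """POST the OTLP JSON as-is to the collector's HTTP receiver."""
    payload = extract_otlp_payload(agg_response)
    req = urlrequest.Request(
        OTEL_SIDECAR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urlrequest.urlopen(req)
```

The same unwrap-and-forward logic works inside a Lambda handler or a long-running server loop; only the scheduling wrapper differs.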
OTEL Collector/Sidecar
Running an OTEL Collector involves configuring the following three components:
- Receiver: A receiver, which can be push- or pull-based, is how data gets into the OTEL Collector. We’ll use the OTLP receiver, which can accept export calls via HTTP/JSON. The AGG response is compatible with the format this receiver accepts.
- Processors: Processors run on data between reception and export. Processors are optional, but some are recommended.
- Exporters: An exporter, which can be push- or pull-based, is how you send data to one or more backends or destinations. All supported exporters can be found here.
A few approaches to run the OTEL collector, with a serverless or server framework as suitable:
- Serverless framework: If using AWS infrastructure, we can leverage the OTEL Lambda layer. For GCP or Azure, an equivalent concept can be used as applicable.
Sample lambda with lambda layer configuration
Resources:
OTelLambdaLayer:
Type: AWS::Lambda::LayerVersion
Properties:
LayerName: OTelLambdaLayer
Description: My OTEL Lambda layer
Content:
S3Bucket: my-layer-bucket
S3Key: my-layer-package.zip
CompatibleRuntimes:
- nodejs14.x
MyLambdaFunction:
Type: AWS::Lambda::Function
Properties:
FunctionName: MyLambdaFunction
Runtime: nodejs14.x
Handler: index.handler
Code:
S3Bucket: my-function-bucket
S3Key: my-function-package.zip
Layers:
- !Ref OTelLambdaLayer
Environment:
Variables:
OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/config.yml
MyScheduledRule:
Type: AWS::Events::Rule
Properties:
Description: My scheduled rule
ScheduleExpression: rate(3 minutes)
State: ENABLED
Targets:
- Arn: !GetAtt MyLambdaFunction.Arn
Id: MyLambdaTarget
- Server framework: Run the OTEL collector as a sidecar Docker container on the same VM/EC2 server responsible for cron scheduling.
a. Create a sample otel-collector-config.yaml file in the repository as needed. Assuming signalfx is the external monitoring tool (AKA exporter), the config file should look similar to:
Sample otel-collector-config.yaml file
receivers:
otlp:
protocols:
http:
exporters:
signalfx:
# Access token to send data to SignalFx.
access_token: <access_token>
# SignalFx realm where the data will be received.
realm: us1
# Timeout for the send operations.
timeout: 30s
processors:
batch:
service:
pipelines:
metrics:
receivers: [otlp]
processors: [batch]
exporters: [signalfx]
b. Build a Docker image from the open source OTEL collector image (GitHub: open-telemetry/opentelemetry-collector-contrib, the contrib repository for the OpenTelemetry Collector) using the command: docker build . -t otel-sidecar:v1
Sample dockerfile
FROM otel/opentelemetry-collector-contrib:latest
# Copy the collector configuration file into the container
COPY otel-collector-config.yaml /etc/otel-collector-config.yaml
# Start the collector with the specified configuration file
CMD ["--config=/etc/otel-collector-config.yaml"]
c. Run the above Docker image: docker run -p 4318:4318 otel-sidecar:v1. This will spin up the OTEL sidecar at http://localhost:4318
d. Make an HTTP POST request with the response of the above AGG API endpoint (i.e. response.data.ecosystem.forgeMetrics.exportMetrics) to the sidecar running at http://localhost:4318/v1/metrics on the same server
Sample HTTP POST curl request to OTEL sidecar
curl --location --request POST 'localhost:4318/v1/metrics' \
--header 'Content-Type: application/json' \
--data-raw '<response.data.ecosystem.forgeMetrics.exportMetrics>'
e. Metrics should now be visible in the configured monitoring tool (SignalFx in the above case).
Feedback
While we would appreciate any feedback, we’re especially interested in learning more about:
- Will the proposed feature allow you to easily consume the metrics that we make available? If not, what would be your preferred method and why?
- What functionality (alerting, integration, advanced filters) will you configure once you have the metrics in your third-party tool?
- Once the initial version with the invocation metrics is released, which metrics should we add next?