Enabling Metrics
Metrics are not enabled by default. To turn them on you need all of the following:
- Enable the `prometheus` feature flag (`CONVOY_ENABLE_FEATURE_FLAG=prometheus` or `--enable-feature-flag=prometheus`).
- Set the metrics backend to Prometheus: JSON `metrics.metrics_backend: "prometheus"`, env `CONVOY_METRICS_BACKEND=prometheus`, or CLI `--metrics-backend=prometheus`.
- When using `convoy.json` or environment variables, set `metrics.enabled` / `CONVOY_METRICS_ENABLED=true` together with `metrics.metrics_backend: "prometheus"` / `CONVOY_METRICS_BACKEND=prometheus`. Pure CLI use can instead pass `--metrics-backend=prometheus`, which (with the `prometheus` feature flag) enables the merged metrics configuration loaded for `server` and `agent`.
- Ensure your license allows Prometheus export; the metrics handler enforces entitlements.
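As a sketch, a minimal `convoy.json` fragment combining these settings might look like this (only the `metrics` block is shown; merge it into your existing config):

```json
{
  "metrics": {
    "enabled": true,
    "metrics_backend": "prometheus"
  }
}
```

The `prometheus` feature flag itself is still set separately, via `CONVOY_ENABLE_FEATURE_FLAG=prometheus` or `--enable-feature-flag=prometheus`.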
enabling convoy metrics using flags
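For example, starting both processes with flags only (the config path is a placeholder; both subcommands load the merged metrics configuration as described above):

```shell
convoy server --config convoy.json \
  --enable-feature-flag=prometheus \
  --metrics-backend=prometheus

convoy agent --config convoy.json \
  --enable-feature-flag=prometheus \
  --metrics-backend=prometheus
```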
enabling convoy metrics using env vars
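The same setup with environment variables (a sketch; export these in the environment of each Convoy process):

```shell
export CONVOY_ENABLE_FEATURE_FLAG=prometheus
export CONVOY_METRICS_BACKEND=prometheus
export CONVOY_METRICS_ENABLED=true

convoy server --config convoy.json
```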
Convoy exposes `GET /metrics` on each process you run. Typical split deployments use `convoy server` (control plane API, default HTTP port 5005) and `convoy agent` (data plane: ingest, queue consumers, and data-plane HTTP including `/metrics`, default `agent_port` 5008). For example, `docker-compose.dev.yml` maps web → 5005 and agent → 5008. Both can register the shared Prometheus registry when Redis and Postgres are available. Export still requires a license that allows Prometheus metrics; the handler enforces this.
Example scrape configuration
Point Prometheus at each Convoy process you care about (replace host, port, and labels). The metrics path is always `/metrics`.
prometheus.yml fragment
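A sketch of a `prometheus.yml` fragment scraping both processes (job names, hostnames, the 15s interval, and the `env` label are placeholders to adapt):

```yaml
scrape_configs:
  - job_name: "convoy-server"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["convoy-server.internal:5005"]
        labels:
          env: production
  - job_name: "convoy-agent"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["convoy-agent.internal:5008"]
        labels:
          env: production
```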
Scrape ports come from `server.http.port` for `convoy server` (often 5005) and `server.http.agent_port` / `AGENT_PORT` for `convoy agent` (often 5008, matching the dev compose layout).
Example PromQL queries
Illustrative only; adjust label selectors to match your deployment. Histogram queries use the `_bucket` / `_sum` / `_count` suffixes for `convoy_end_to_end_latency`.
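A few illustrative queries built from the metric names documented below (time windows and label values are assumptions):

```promql
# Ingest rate per project over the last 5 minutes
sum by (project) (rate(convoy_ingest_total[5m]))

# Ingest error ratio across all projects
sum(rate(convoy_ingest_error[5m])) / sum(rate(convoy_ingest_total[5m]))

# p95 end-to-end latency per endpoint, in seconds
histogram_quantile(0.95,
  sum by (endpoint, le) (rate(convoy_end_to_end_latency_bucket[5m])))
```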
Grafana and dashboards
Use Grafana (or any Prometheus-backed UI) to track whether observability keeps pace with the product and stays trustworthy.

Dashboard and query hygiene
- Version and review: Treat dashboard JSON like code: store it in git when possible, review changes in PRs, and note which Convoy version each dashboard targets. When upgrading Convoy, re-check panels that use concrete metric or label names (see the tables below and `internal/pkg/metrics` in the Convoy repo).
- Metric contract: Prefer recording rules or documented PromQL snippets next to dashboards so renames or label changes surface during review, not only in production.
- Environment parity: If staging and production differ (extra labels, fewer scrapes, or different scrape intervals), document that on the dashboard or in your internal runbook so comparisons stay honest.
- Scrape health: Alert on Prometheus `up` for each `job` that scrapes Convoy, on `scrape_samples_scraped` collapsing to zero for critical jobs, and on rule evaluation errors if you use recording rules.
- Series sanity: Compare `convoy_ingest_total` rates to the traffic you expect; sudden flatlines or orders-of-magnitude drift often indicate scrape, network, or process issues, not just low traffic.
- Cardinality: High-cardinality labels (many unique `endpoint`, `source`, or `project` values) increase cost; use Grafana's Explore or the Prometheus TSDB status page to spot exploding label sets after config or feature changes.
- Postgres-backed gauges: Queue depth metrics refresh on `metrics.prometheus_metrics.sample_time`; stale values can reflect the sampling interval, DB load, or query timeouts (`CONVOY_METRICS_QUERY_TIMEOUT` / `query_timeout`), not only empty queues.
- Noise vs signal: Pair alerts on error rates or backlog with minimum traffic thresholds (for example, a `for:` duration, or a separate guard on ingest rate) to avoid flapping on idle environments.
- Runbook links: Add a runbook URL or on-call note to each alert annotation (Grafana Alerting, Alertmanager `annotations.runbook_url`, etc.) describing first steps: check Convoy process health, Redis, Postgres, recent deploys, and the relevant section of this metrics doc.
- License-gated metrics: If `/metrics` returns minimal or empty output after a license change, verify entitlements before chasing infrastructure issues.
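The scrape-health and series-sanity checks above can be sketched as Prometheus alerting rules (thresholds, durations, the `convoy-agent` job name, and the runbook URL are assumptions to adapt):

```yaml
groups:
  - name: convoy-scrape-health
    rules:
      - alert: ConvoyScrapeDown
        expr: up{job="convoy-agent"} == 0
        for: 5m
        annotations:
          runbook_url: https://example.internal/runbooks/convoy-metrics
          summary: Prometheus cannot scrape the Convoy agent /metrics endpoint.
      - alert: ConvoyIngestFlatline
        expr: sum(rate(convoy_ingest_total[10m])) == 0
        for: 30m
        annotations:
          runbook_url: https://example.internal/runbooks/convoy-metrics
          summary: No events ingested; check scrape, network, and process health before assuming low traffic.
```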
Ingest counters and end-to-end latency
These are registered from `internal/pkg/metrics/data_plane.go` when Prometheus is enabled and the license allows export. Labels: `project` and `source` on ingest counters; `project` and `endpoint` on the histogram.
| Name | Type | Description |
|---|---|---|
| `convoy_ingest_total` | Counter | Total number of events ingested |
| `convoy_ingest_success` | Counter | Total number of events successfully ingested and consumed |
| `convoy_ingest_error` | Counter | Total number of errors during event ingestion |
| `convoy_end_to_end_latency` | Histogram | Total time (in seconds) an event spends in Convoy (recorded per delivery) |
Some builds also define a `convoy_ingest_latency` histogram (per project); your build may or may not register it on `/metrics`, so confirm by scraping.
Queue depth and backlog (Redis and Postgres)
These come from custom collectors, not from `data_plane.go`. When metrics are enabled, `RegisterQueueMetrics` attaches the Redis queue and Postgres implementations to the same registry, so they appear alongside the series above on `/metrics` for that process. In server + agent deployments, queue and ingest series are normally observed on the agent scrape target (data plane); the control server exposes its own `/metrics` for whatever it registers.
Postgres-backed values are refreshed on a sample interval (`metrics.prometheus_metrics.sample_time`). Depending on version and schema, queries may use materialized views or live SQL; see the server release notes if you upgrade.
Redis (Asynq) queues
| Name | Type | Labels | Description |
|---|---|---|---|
| `convoy_event_queue_scheduled_total` | Gauge | `status` | Tasks waiting on the create-event queue (queue size minus completed/archived). |
| `convoy_event_workflow_queue_match_subscriptions_total` | Gauge | `status` | Tasks waiting on the workflow queue used when matching subscriptions. |
Postgres (events and deliveries)
| Name | Type | Labels | Description |
|---|---|---|---|
| `convoy_event_queue_total` | Gauge | `project`, `source`, `status` | Counts derived from events (or materialized views when present). |
| `convoy_event_queue_backlog_seconds` | Gauge | `project`, `source` | Age in seconds of the oldest pending work for that project/source. |
| `convoy_event_delivery_queue_total` | Gauge | `project`, `project_name`, `endpoint`, `status`, `event_type`, `source`, `organisation_id`, `organisation_name` | Tasks in the delivery pipeline per endpoint and dimensions. |
| `convoy_event_delivery_queue_backlog_seconds` | Gauge | `project`, `endpoint`, `source` | Oldest pending delivery backlog per endpoint (seconds). |
| `convoy_event_delivery_attempts_total` | Gauge | `project`, `endpoint`, `status`, `http_status_code` | Delivery attempts grouped by outcome and HTTP status. |
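For example, two sketches over these gauges (remember they are point-in-time samples refreshed on the sample interval, so avoid `rate()` on them; label selectors are placeholders):

```promql
# Five endpoints with the oldest pending delivery backlog, in seconds
topk(5, convoy_event_delivery_queue_backlog_seconds)

# Current delivery-pipeline tasks per project and endpoint
sum by (project, endpoint) (convoy_event_delivery_queue_total)
```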
Tracing
Convoy can emit application traces (separate from product telemetry in Mixpanel). Configure the tracer under `tracer` in `convoy.json` (see Configuration) or use the environment variables below; they map to `TracerConfiguration` in the Convoy server config.
- Provider: `CONVOY_TRACER_PROVIDER=otel|sentry|datadog` (CLI: `--tracer-type`).
- OpenTelemetry: `CONVOY_OTEL_COLLECTOR_URL` (collector gRPC URL), `CONVOY_OTEL_SAMPLE_RATE`, `CONVOY_OTEL_INSECURE_SKIP_VERIFY`, and optional `CONVOY_OTEL_AUTH_HEADER_NAME` / `CONVOY_OTEL_AUTH_HEADER_VALUE` (same values as JSON `tracer.otel.otel_auth.header_name` / `header_value`).
- Sentry: `CONVOY_SENTRY_DSN`, `CONVOY_SENTRY_SAMPLE_RATE`, `CONVOY_SENTRY_ENVIRONMENT`, `CONVOY_SENTRY_DEBUG`.
- Datadog: `CONVOY_DATADOG_AGENT_URL` (requires the Datadog tracing entitlement on the license).
tracer otel fragment
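A sketch of the `tracer` block in `convoy.json`. The nested key names besides `tracer.otel.otel_auth.header_name` / `header_value` are inferred from the environment variables above, so verify them against the Configuration reference for your Convoy version:

```json
{
  "tracer": {
    "type": "otel",
    "otel": {
      "collector_url": "otel-collector.internal:4317",
      "sample_rate": 1.0,
      "insecure_skip_verify": false,
      "otel_auth": {
        "header_name": "Authorization",
        "header_value": "Bearer <token>"
      }
    }
  }
}
```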
Sampling follows `sample_rate` / `CONVOY_OTEL_SAMPLE_RATE`; not every code path may emit spans on every request. Span names emitted from the data plane (agent) today include (non-exhaustive): `event.creation.success`, `event.creation.error`, `dynamic.event.creation.success`, `dynamic.event.creation.error`, `dynamic.event.subscription.matching.error`, and `meta_event_delivery`. New releases may add or rename spans; confirm in your trace backend.
> [!WARNING]
> Feature flags in Convoy were reimplemented on a per-feature basis. The following flags/configs are no longer valid:
> - `--feature-flag=experimental`
> - `export CONVOY_FEATURE_FLAG=1`