Instrumenting Python services with OpenTelemetry and Grafana Cloud

OpenTelemetry instrumentation appears straightforward at first: install the packages, set OTEL_EXPORTER_OTLP_ENDPOINT, configure the Grafana Cloud auth header, run the service under opentelemetry-instrument, and wait for traces to appear.

That establishes transport. It does not, by itself, produce a useful observability system.

The more important work begins after telemetry is successfully exported. Consider a Python backend that runs ASGI services, includes long-lived async background workers, emits structured logs through structlog, and deploys more than one runtime from the same repository. Grafana Cloud is the destination, but the core engineering concerns are process boundaries, signal quality, cardinality control, and deciding which telemetry should not be emitted yet.

Export configuration is only the starting point

The runtime configuration should usually be small and explicit:

ENV OTEL_RESOURCE_ATTRIBUTES=service.name=api \
    OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
    OTEL_TRACES_EXPORTER=otlp \
    OTEL_LOGS_EXPORTER=otlp \
    OTEL_METRICS_EXPORTER=none

Grafana Cloud provides the OTLP endpoint and authentication header. OpenTelemetry already knows how to export OTLP. The main implementation risks are usually not in that handshake.

They are in questions such as:

Which process actually initializes instrumentation?
Are logs being exported as records or as already-rendered JSON strings?
Which service name shows up in Grafana when two runtimes share the same repo?
Are metrics worth turning on before we understand their cardinality?
Will local development become unreadable once the OTel env vars exist?

These decisions determine whether telemetry remains usable after the first successful export.

Initialize inside the worker, not only before it

The first non-obvious part is process startup. An application might be started with an auto-instrumentation wrapper:

CMD ["opentelemetry-instrument", "granian", "--interface", "asgi", "--workers", "1", "--http", "auto", "--host", "0.0.0.0", "--port", "8000", "app.entrypoint:app"]

Wrapping the server command is necessary, but it may not be sufficient. ASGI servers have their own import and worker model. If instrumentation happens in the wrong process, OpenTelemetry can appear to be enabled while the application worker remains the place where imports, middleware, clients, and background tasks actually run.

One fix is to route the ASGI import path through a small OTel-aware entrypoint module:

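from observability import initialize_otel_worker

# Initialize first so instrumentors can patch libraries before the
# application imports them.
initialize_otel_worker()

from yourapp.web import app  # noqa: E402 (import must follow initialization)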

The initializer is intentionally idempotent:

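import os


def initialize_otel_worker() -> None:
    # Idempotency guard. The flag lives in os.environ, so child processes
    # started after this point inherit it.
    if os.environ.get("OTEL_WORKER_INITIALIZED") == "1":
        return

    # _otel_export_enabled() is an application helper defined elsewhere.
    if not _otel_export_enabled():
        return

    # Imported lazily so the OTel packages are only needed when export is on.
    from opentelemetry.instrumentation.auto_instrumentation import initialize

    initialize()
    os.environ["OTEL_WORKER_INITIALIZED"] = "1"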

This small entrypoint makes worker initialization explicit, avoids double instrumentation, and lets each runtime opt into the same behavior without making deployment commands more fragile.

The lesson: for Python web services, “I ran opentelemetry-instrument” is not a complete answer. Check where your application is imported, where workers are forked or spawned, and whether instrumentation is happening in the process that actually handles work.

Pre-fork worker caution

This is especially important for pre-fork servers. For example, a Gunicorn deployment might be started like this:

gunicorn myapp.main:app --workers 4

With Python auto-instrumentation, multiple pre-fork workers can break metrics export. The issue is that key OpenTelemetry SDK components assume background threads and locks that do not survive fork() cleanly. In particular, PeriodicExportingMetricReader uses a thread to periodically flush metrics to the exporter; after a fork, each child can inherit references to thread and lock state that is no longer valid. The OpenTelemetry troubleshooting guide calls this out directly and links to related issues #2767 and #3307, as well as Python’s long-running fork-and-lock issue 6721.

If you need multiple worker processes, validate the metric path specifically. Traces and logs may appear healthy while metrics silently fail or become inconsistent.
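To make the hazard concrete, here is a stdlib-only illustration of one common mitigation: start background threads lazily and tie them to the process ID that created them, so a forked child never relies on thread state inherited from its parent. This is not the OpenTelemetry SDK's implementation. PerProcessFlusher and its methods are hypothetical names that stand in for any component with a periodic flush thread.

```python
import os
import threading
from typing import Callable, Optional


class PerProcessFlusher:
    """Calls `flush` periodically on a thread owned by the current process."""

    def __init__(self, flush: Callable[[], None], interval_seconds: float = 5.0) -> None:
        self._flush = flush
        self._interval = interval_seconds
        self._owner_pid: Optional[int] = None
        self._stop = threading.Event()

    def ensure_started(self) -> bool:
        """Start the flush thread unless this process already started one.

        Returns True when a new thread was started. Not thread-safe: call it
        from a single startup path in each worker.
        """
        pid = os.getpid()
        if self._owner_pid == pid:
            return False
        # First call in this process. After fork() the PID differs, so a child
        # creates a fresh Event and thread instead of reusing inherited ones.
        self._stop = threading.Event()
        self._owner_pid = pid
        threading.Thread(target=self._run, args=(self._stop,), daemon=True).start()
        return True

    def _run(self, stop: threading.Event) -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(self._interval):
            self._flush()

    def stop(self) -> None:
        self._stop.set()
        self._owner_pid = None
```

The point of the sketch is the PID check: anything created before the fork is treated as belonging to the parent. Whatever mechanism a real deployment uses, the validation step is the same, which is to confirm that each worker process, not only the master, is exporting metrics.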

Logs should remain structured records

The second common failure mode is logging.

Before OTel, JSON logs to stdout were enough. A log aggregator can ingest JSON lines, and humans can still read local console output. Once OpenTelemetry logs enter the picture, the shape changes. The logging instrumentation wants stdlib LogRecord objects with attributes. If structlog renders a JSON string too early, OTel may export the log message, but the useful fields become trapped inside a string.

Production OpenTelemetry logging should use a processor chain that preserves fields:

if otel_log_export_active:
    processors = shared_processors + [
        _prefix_log_record_reserved_fields,
        structlog.stdlib.render_to_log_kwargs,
    ]

The important part is render_to_log_kwargs. It passes structured fields through the stdlib logging path instead of flattening them into text first.

This also exposed a less obvious problem: some perfectly reasonable application field names collide with reserved LogRecord attributes. For example, name is already a logging attribute. If name="example" reaches the stdlib logger through extra, it collides with the logger name, and the stdlib logger raises a KeyError rather than overwrite an existing LogRecord attribute.

The fix was not to ban normal words from logs. It was to prefix reserved collisions:

def _prefix_log_record_reserved_fields(_, __, event_dict):
    for key, value in pop_reserved_log_fields(event_dict).items():
        event_dict[f"field_{key}"] = value
    return event_dict

That turns name="example" into field_name="example" while preserving the actual logger name.

The lesson: if you use structured logging, test the actual path into OTel. Do not only check that a line appears in Grafana. Check that fields are queryable as fields.

Development output should stay boring

It is tempting to make local development match production exactly. In practice, OTel env vars leak into dev shells, Docker Compose files, and one-off commands. If that makes every local log line JSON, debugging gets worse.

The logging setup treats development as a separate presentation layer:

otel_log_export_active = otel_logs_enabled() and not ENVIRONMENT.is_dev

In dev, even if OTEL_LOGS_EXPORTER=otlp is present, console logs stay human-readable. Production gets the OTel-friendly stdlib path. The tests assert both behaviors.
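As an illustration of that switch and the two behaviors worth asserting, here is a small sketch. The article only shows otel_logs_enabled() and ENVIRONMENT.is_dev, so the APP_ENV variable, its accepted values, and the default of "development" when it is unset are all assumptions made for this example.

```python
import os
from typing import Mapping

# Hypothetical environment names; adjust to however the app models ENVIRONMENT.
_DEV_ENVIRONMENTS = frozenset({"dev", "development", "local"})


def otel_logs_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True when OTEL_LOGS_EXPORTER names at least one real exporter."""
    exporters = environ.get("OTEL_LOGS_EXPORTER", "").split(",")
    return any(e.strip().lower() not in ("", "none") for e in exporters)


def is_dev(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: an unset APP_ENV means a developer machine.
    return environ.get("APP_ENV", "development").strip().lower() in _DEV_ENVIRONMENTS


def otel_log_export_active(environ: Mapping[str, str] = os.environ) -> bool:
    """Use the OTel-friendly logging path only outside development."""
    return otel_logs_enabled(environ) and not is_dev(environ)


# The two behaviors the tests should pin down.
production = {"OTEL_LOGS_EXPORTER": "otlp", "APP_ENV": "production"}
leaked_into_dev = {"OTEL_LOGS_EXPORTER": "otlp", "APP_ENV": "dev"}
assert otel_log_export_active(production) is True
assert otel_log_export_active(leaked_into_dev) is False
```

Defaulting to development keeps local output readable when nothing is configured, at the cost that a production deployment must set the environment explicitly.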

This distinction is operationally important. Observability code is infrastructure code. If it makes the normal edit-run-debug loop harder, developers will route around it, remove fields, or avoid adding instrumentation.

The lesson: production telemetry and local readability are different product requirements. Treat them separately.

Service names are part of your data model

Grafana Cloud will happily ingest everything under whatever service.name you give it. That makes service naming feel like a label you can fix later.

You can fix it later, but every dashboard, trace query, alert, and mental model starts depending on it immediately.

If the repository has more than one deployable runtime, each runtime should get a deliberate service name. The public API might use:

OTEL_RESOURCE_ATTRIBUTES=service.name=api

A worker or internal dashboard might use:

OTEL_RESOURCE_ATTRIBUTES=service.name=worker

Those names are not just decoration. They decide whether Grafana shows one giant service with mixed routes and logs or two operationally distinct services. If you choose too granularly, everything fragments. If you choose too coarsely, traces become hard to navigate.
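One way to keep service names deliberate is to check them at startup. The sketch below is an assumption-laden example, not part of the original setup: resolve_service_name and require_service_name are hypothetical helpers, and the allowed set simply reuses the api and worker names from above. It reads the standard OTEL_SERVICE_NAME variable first, since that takes precedence over service.name in OTEL_RESOURCE_ATTRIBUTES, and it does not decode percent-encoded values.

```python
import os
from typing import Iterable, Mapping, Optional


def resolve_service_name(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured service name, or None if none is set."""
    explicit = environ.get("OTEL_SERVICE_NAME", "").strip()
    if explicit:
        return explicit
    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs.
    for pair in environ.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, separator, value = pair.partition("=")
        if separator and key.strip() == "service.name" and value.strip():
            return value.strip()
    return None


def require_service_name(
    allowed: Iterable[str], environ: Mapping[str, str] = os.environ
) -> str:
    """Fail fast when a runtime starts with a missing or unexpected name."""
    name = resolve_service_name(environ)
    allowed_names = set(allowed)
    if name not in allowed_names:
        raise RuntimeError(
            f"service.name is {name!r}; expected one of {sorted(allowed_names)}"
        )
    return name
```

Calling require_service_name({"api", "worker"}) early in each entrypoint turns a forgotten or mistyped name into a startup error instead of a stray service in Grafana.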

The lesson: choose service names based on runtime ownership and operational behavior, not repository layout alone.

Metrics are not automatically worth it

One of the most important configuration choices can be disabling a signal:

OTEL_METRICS_EXPORTER=none

This can be deliberate.

Traces and logs had immediate value. Metrics were less obvious because automatic metrics can create a lot of series quickly, especially when HTTP routes, clients, database calls, task labels, or model names get involved. Grafana Cloud can handle serious volume, but your bill and your dashboards still depend on cardinality discipline.

Disabling metrics was not an anti-metrics stance. It was a sequencing choice: get traces and logs correct first, then add curated metrics where they answer a real operational question.

Good early metrics are boring:

request latency by normalized route, method, and status class
queue depth by queue name
background worker loop success and failure counts
external provider latency by provider, not by user or prompt
database pool saturation

Bad early metrics are exciting:

labels with user IDs
labels with task IDs
labels with raw URLs
labels with model prompts, email subjects, or arbitrary error text
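The difference between the two lists can be enforced in code. The following sketch assumes a small allowlist of label keys and a helper that collapses status codes into classes. ALLOWED_LABEL_KEYS, status_class, and bounded_labels are illustrative names, not part of any OpenTelemetry API.

```python
from typing import Dict, Mapping

# Only low-cardinality dimensions may become metric labels.
ALLOWED_LABEL_KEYS = frozenset({"route", "method", "status_class"})


def status_class(status_code: int) -> str:
    """Collapse a status code into its class, for example 404 -> '4xx'."""
    return f"{status_code // 100}xx"


def bounded_labels(candidate: Mapping[str, object]) -> Dict[str, str]:
    """Drop every label whose key is not on the allowlist."""
    return {
        key: str(value)
        for key, value in candidate.items()
        if key in ALLOWED_LABEL_KEYS
    }


labels = bounded_labels(
    {
        "route": "/items/{item_id}",  # route template, not the raw URL
        "method": "GET",
        "status_class": status_class(404),
        "user_id": "u-81723",  # unbounded, so it is dropped
    }
)
print(labels)
# {'route': '/items/{item_id}', 'method': 'GET', 'status_class': '4xx'}
```

An allowlist fails closed: a new high-cardinality field added at a call site is dropped until someone decides it belongs on a metric.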

The lesson: OpenTelemetry makes it easy to emit metrics. That does not make every dimension a metric label.

Background workers need first-class traces

HTTP traces are the obvious win. A request comes in, middleware creates a span, downstream HTTP and database clients attach child spans, and Grafana shows a waterfall.

But a lot of backend behavior does not live inside an HTTP request. The backend described here has background consumers, schedulers, recovery tasks, PostgreSQL LISTEN/NOTIFY loops, and long-lived async workers. Those paths are often where the incidents are.

Auto-instrumentation will not magically turn every business workflow into a meaningful trace. It can show library calls, but it cannot know that “recover orphaned tasks after restart” is one operation, or that “wake a scheduled task from a database notification” is one operation.

A useful pattern is to keep auto-instrumentation for the substrate and add manual spans around workflow boundaries:

with tracer.start_as_current_span("worker.startup_recovery"):
    await recover_interrupted_jobs()

The same applies to queue consumers and notification handlers. If the unit of work is not an HTTP request, create a unit-of-work span yourself.
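A small wrapper can make that pattern repeatable across consumers and handlers. This is a sketch: traced_unit_of_work is a hypothetical helper, and it takes the tracer as a parameter so the example itself has no dependencies. In a real worker the tracer would typically come from trace.get_tracer(__name__) after from opentelemetry import trace.

```python
from typing import Any, Awaitable, Callable, Mapping, Optional


async def traced_unit_of_work(
    tracer: Any,
    span_name: str,
    work: Callable[[], Awaitable[Any]],
    attributes: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Run one unit of background work inside its own span.

    `tracer` is expected to behave like an OpenTelemetry Tracer, that is, to
    provide start_as_current_span() as a context manager yielding a span
    with set_attribute().
    """
    with tracer.start_as_current_span(span_name) as span:
        # Keep attributes low-cardinality where they may feed dashboards.
        for key, value in (attributes or {}).items():
            span.set_attribute(key, value)
        return await work()
```

A queue consumer might call await traced_unit_of_work(tracer, "queue.consume", lambda: handle(message), {"queue.name": "emails"}), where handle and the attribute name are placeholders. With the OpenTelemetry tracer's default settings, an exception raised by work() is recorded on the span before it propagates.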

The lesson: auto-instrumentation gives you plumbing traces. Product and worker traces still need names chosen by humans.

Context fields beat clever log messages

Before observability tooling is in place, logs often encode context directly in the message:

Search completed for request abc123 with status 200 in 34ms

That is readable, but it is less useful than structured fields:

logger.info(
    "Search request completed",
    request_id=request_id,
    client_trace_id=client_trace_id,
    status_code=response.status_code,
)

The second version lets Grafana filter by request_id, correlate with traces, group by status, and keep the event name stable. It also lets you change presentation later without rewriting every log line.

The request middleware binds context fields at the start of the request, including a generated request_id, route path, method, and optional client trace headers. That context then follows downstream logs without every call site repeating it.

The lesson: once logs are structured and queryable, the message should be boring and stable. Put variability in fields.

Test the observability code

Telemetry code often goes untested because it feels like configuration. That is a mistake.

The logging path has tests for three things that were easy to regress:

production OTel logs emit JSON-compatible structured fields
reserved LogRecord attributes are renamed instead of dropped
development keeps console output readable even when OTel env vars are present

These tests protect real behavior. A broken logging processor can silently turn every useful field into an opaque string. A reserved attribute collision can drop context at the moment it is needed. A dev/prod branch can regress because it is not exercised locally in the same way it runs in CI or production.
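A shape test can be very small. The sketch below uses only the standard library: a handler that captures LogRecord objects, one test that checks fields arrive as attributes, and one that documents the failure mode of pre-rendered JSON. In a real suite the log call would go through the application's structlog processor chain. CapturingHandler and capture_one are names made up for this example.

```python
import logging
from typing import Callable


class CapturingHandler(logging.Handler):
    """Collects records so tests can inspect their shape."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def capture_one(log_call: Callable[[logging.Logger], None]) -> logging.LogRecord:
    """Run log_call against an isolated logger and return the single record."""
    handler = CapturingHandler()
    logger = logging.getLogger("telemetry_shape_test")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        log_call(logger)
    finally:
        logger.removeHandler(handler)
    assert len(handler.records) == 1
    return handler.records[0]


def test_fields_arrive_as_record_attributes() -> None:
    record = capture_one(
        lambda log: log.info(
            "Search request completed",
            extra={"request_id": "abc123", "status_code": 200},
        )
    )
    assert record.getMessage() == "Search request completed"
    assert record.request_id == "abc123"
    assert record.status_code == 200


def test_prerendered_json_traps_fields_in_the_message() -> None:
    record = capture_one(
        lambda log: log.info('{"event": "Search request completed", "request_id": "abc123"}')
    )
    # The field exists only inside the string, so it is not queryable.
    assert not hasattr(record, "request_id")
```

Both functions follow pytest naming, so they can be dropped into an existing test module. The second test is a guard against a regression that otherwise looks fine in the log viewer.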

The lesson: test the shape of emitted telemetry, not just the business code that creates it.

Key takeaways

The main lesson is that the exporter is only a transport mechanism. The quality of the observability system depends on what the application sends through it.

Service names should reflect operational boundaries, not just repository layout. Logs should arrive as structured records, not pre-rendered strings. Metrics should be added deliberately, with close attention to worker models and cardinality. Auto-instrumentation is useful, but it does not remove the need to name important background workflows with manual spans.

Connecting OpenTelemetry to Grafana Cloud is usually not the hard part. The hard part is deciding what information deserves to become operational data.

OpenTelemetry provides a wide telemetry pipeline, and Grafana Cloud provides a place to query it. The engineering work is ensuring the pipeline carries facts that can drive operational decisions rather than a high-volume record of low-value events.