Instrumenting Claude Code with OpenTelemetry

Claude Code orchestrates multiple LLM calls, tool executions, and caching decisions per conversation turn. You see the final output, not the machinery. How many LLM calls did that response actually take? What got cached and what didn’t? How much did that session cost? Which tool calls were expensive? When a response feels slow, you’re left guessing — was it the model, a slow tool, or multiple round trips? Is it looping mindlessly?

Claude Code ships with built-in OpenTelemetry instrumentation.

It’s off by default. Turn it on, point it at a collector, and you get full distributed traces of every session — LLM calls, tool executions, token usage, cache behavior. The instrumentation already exists. You just need to enable it and plug in your observability stack.

Setting Up the Collector to Receive Data

Skip the config files and the docker-compose plumbing. Use the grafana/otel-lgtm all-in-one image. It’s Grafana, Tempo, Loki, Mimir, and an OTel collector pre-wired together. No configuration needed. One command:

docker run -d --name lgtm \
  -p 3000:3000 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 3200:3200 \
  -v /<your persistent path to>/grafana-data:/data \
  grafana/otel-lgtm

The ports:

  • 3000: Grafana UI — where you’ll view your traces
  • 4317: OTLP gRPC — standard OTel endpoint, not used here
  • 4318: OTLP HTTP/protobuf — what Claude Code uses
  • 3200: Tempo API — used later for querying traces programmatically

The -v mount persists your data across container restarts. Without it, you lose your traces every time you restart the container.

Why this image? Everything works out of the box, and for local development and experimentation that’s exactly what you want.
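Before wiring anything up, it helps to confirm the container is healthy. A sketch: /ready is Tempo’s standard readiness endpoint and /v1/traces is the standard OTLP HTTP ingest path, both assumed to be wired as in the default image.

```shell
# Endpoints exposed by the lgtm container (default wiring assumed).
TEMPO_READY_URL="http://localhost:3200/ready"
OTLP_TRACES_URL="http://localhost:4318/v1/traces"

echo "Tempo readiness: ${TEMPO_READY_URL}"
echo "OTLP HTTP traces ingest: ${OTLP_TRACES_URL}"

# With the container running:
#   curl -s "${TEMPO_READY_URL}"   # Tempo answers "ready" once it is up
```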

For Quarkus Developers: You Already Have One

If you’re a Quarkus developer running quarkus dev, you might already have this collector running. The quarkus-observability-devservices-lgtm extension spins up the exact same grafana/otel-lgtm container as a devservice:

<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-observability-devservices-lgtm</artifactId>
  <scope>provided</scope>
</dependency>

The devservice auto-configures OpenTelemetry for your Quarkus application. To add Claude Code traces to the mix, set OTEL_EXPORTER_OTLP_ENDPOINT (see below) to point at the devservice’s mapped OTLP HTTP port. Check your dev mode logs or the Dev UI for the ephemeral port — it’s usually http://localhost:<some-high-port>.

The payoff: Claude Code traces and application traces in the same Grafana instance. When you’re building or debugging a Quarkus app in dev mode with Claude, you see both what Claude did and what your app did in the same place, minimizing context switching.
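Pointing Claude Code at the devservice is a one-variable change. A sketch — the port number below is hypothetical; read the real ephemeral port from your quarkus dev logs or the Dev UI:

```shell
# Redirect Claude Code's telemetry to the Quarkus LGTM devservice.
# 49215 is a made-up example port; substitute the one from your dev logs.
DEVSERVICE_OTLP_PORT=49215
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:${DEVSERVICE_OTLP_PORT}"
echo "OTLP endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}"
```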

Enabling Telemetry in Claude Code

Configuration happens in two layers: what to emit and where to send it.

Layer 1: System environment variables control what gets emitted. Set these in your shell before starting Claude Code:

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1
export OTEL_LOG_USER_PROMPTS=1 # include prompt text
export OTEL_LOG_TOOL_DETAILS=1 # include tool parameters
export OTEL_LOG_TOOL_CONTENT=1 # include full tool input/output

The first two are gates: CLAUDE_CODE_ENABLE_TELEMETRY turns on basic telemetry, CLAUDE_CODE_ENHANCED_TELEMETRY_BETA enables the detailed instrumentation you actually want. The last three control granularity. Without them, you get spans and timings but not the content. With them, you get prompt text, tool parameters, and full tool input/output in your traces.

Privacy note: Those last three flags mean your prompts and tool outputs end up in your Tempo instance. Fine for local development. Think carefully before pointing this at a shared collector.

Layer 2: ~/.claude/settings.json controls where telemetry goes. Add an env block with standard OTel SDK environment variables:

{
  "env": {
    "OTEL_TRACES_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"
  }
}

These are standard OpenTelemetry SDK variables. The env key in settings.json injects them into the Claude Code process. You could set them in your shell instead, but settings.json keeps them scoped to Claude and out of your global environment.

Reading Your First Trace

Say hello to Claude Code in the terminal.

Open Grafana at http://localhost:3000. Navigate to Explore, select the Tempo data source, and search for recent traces. You should see a trace from your Claude Code session.
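You can also confirm traces are arriving from the command line. A sketch querying Tempo’s search API on the port mapped earlier; the jq filter is illustrative:

```shell
# List the most recent traces Tempo has ingested.
TEMPO_SEARCH_URL="http://localhost:3200/api/search?limit=5"
echo "Query: ${TEMPO_SEARCH_URL}"
# With the container running:
#   curl -s "${TEMPO_SEARCH_URL}" | jq '.traces[] | {traceID, rootTraceName}'
```

The trace IDs listed there should match what Grafana shows in Explore.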

Click into it. The span hierarchy shows what actually happened under the hood:

  • Session span at the top — the entire conversation session
  • Turn spans as children — each conversation turn, one per user message
  • LLM calls and tool executions as leaf spans — the actual work

LLM spans carry the details you care about: model name, token counts broken down by input, output, cache read, and cache create. Latency for each call. Tool spans show tool name, execution time, and if you enabled the detail flags, full input and output.

My trace id: 197337e269ade02f7cb7d7fef8c9fc6c

The key insight: what looked like one “response” from Claude was actually multiple LLM calls with tool executions interleaved. You asked a question, Claude read files, called tools, made decisions, generated responses, all orchestrated across several round trips to the model. The trace makes the orchestration visible. You see the caching behavior, the token usage per call, which tools ran, how long they took. You see the machinery.

That’s the shift. Before, you had a black box. Now you have a timeline that you can measure.

Closing the Loop: Claude Queries Its Own Traces

Tempo exposes an API on port 3200. With a Tempo MCP server configured, Claude can query this API directly. Not just view traces in a dashboard, but analyze them, calculate costs, extract insights.
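Before involving an MCP server, you can hit that same API yourself. A sketch, assuming the default Tempo port and using the trace ID from the session above; the jq filter is illustrative:

```shell
# Fetch a single trace by ID from Tempo's HTTP API -- the same data
# the MCP server exposes to Claude.
TRACE_ID="197337e269ade02f7cb7d7fef8c9fc6c"
TRACE_URL="http://localhost:3200/api/traces/${TRACE_ID}"
echo "Trace URL: ${TRACE_URL}"
# With the container running:
#   curl -s "${TRACE_URL}" | jq '.batches[].scopeSpans[].spans[].name'
```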

I added the Tempo MCP server to Claude simply by telling it to use the MCP server on port 3200. Claude figures out the rest.
Then I asked:

Can you list the token cost per span in this trace: 197337e269ade02f7cb7d7fef8c9fc6c

Claude queried the Tempo MCP server, pulled the span data, extracted token counts from span attributes, calculated costs using Opus 4.6 pricing ($15/$75/$1.50/$18.75 per 1M input/output/cache-read/cache-create tokens), and produced this:

Span     Input   Output   Cache Read   Cache Create   Cost
LLM #1   3       193      19,621       63             $0.0451
LLM #2   3       112      19,684       711            $0.0513
LLM #3   1       170      20,395       136            $0.0459
LLM #4   1       617      20,531       323            $0.0831
LLM #5   1       241      20,854       2,032          $0.0875
Total    9       1,333    101,085      3,265          $0.3130

Look at the numbers. Cache reads dominate the token volume at 101K tokens, but they’re cheap at $1.50 per million. Output tokens are the real cost driver at $75 per million. Five LLM calls for what felt like a single interaction. Total cost: $0.31.
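The arithmetic Claude performed is easy to check. A sketch of the per-call cost math using the per-million prices quoted above:

```shell
# Cost of one LLM call, at $15 / $75 / $1.50 / $18.75 per 1M tokens
# for input / output / cache-read / cache-create.
cost() {
  awk -v in_t="$1" -v out_t="$2" -v cr_t="$3" -v cc_t="$4" \
    'BEGIN { printf "%.4f", (in_t*15 + out_t*75 + cr_t*1.5 + cc_t*18.75) / 1e6 }'
}

cost 3 193 19621 63       # LLM #1 from the table: prints 0.0451
echo
cost 9 1333 101085 3265   # the table's totals: prints 0.3130
echo
```

Both results match the table, which is a useful sanity check on what the model computed from the span attributes.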

The insight isn’t new if you’re paying attention to cost. LangChain4J already provides this data for some models, but here Claude analyzed its own execution telemetry and surfaced the cost per step.

That’s the payoff. Observability isn’t just for dashboards. It’s data. And if the data is accessible via an API or an MCP server, as it is here, Claude can reason about it. Query traces, analyze patterns, calculate costs, debug performance issues, all without leaving the conversation. The system that generates the traces can now analyze them.

For the full reference on all available metrics, events, trace attributes, and configuration options, check the official monitoring documentation.


Note: I wrote the post assisted by Claude Code.

Observability in Quarkus 3

Observability of a software system can be described as the capability to allow a human to ask and answer questions about it. To help developers and support engineers understand how their applications behave, Quarkus 3.3 includes many improvements to its main observability-related extensions.

For the full article please see the Quarkus Blog

TomEE vs SpringBoot vs Quarkus


There’s a new framework in the Java world: Quarkus. I’ve decided to compare how it behaves in relation to Apache TomEE and Spring Boot. I’m going to measure artefact size and compilation, test, and deploy times while using Java 8 and Maven.

To compare the frameworks I will create an application with a realistic set of features. It will use REST endpoints, Fault Tolerance and an in-memory database with JPA. The source code is available on GitHub in the java-framework-compare project.

The starting point for each of the subprojects:

The Apache TomEE 8.0.0-M2 Microprofile flavour comes from the MicroProfile Starter by selecting Config and Fault Tolerance APIs.

The Spring Boot 2.1.3 application comes from the Spring Initializr page by adding web, jpa and H2 dependencies.

The Quarkus 0.11.0 app starting point is humbler. It comes from the rest-json example in the quarkus-quickstarts bundle.

The baseline

All starter projects have different bundled dependencies. Just out of curiosity, these are the sizes and the build and startup times out of the box, without an application running:

Platform                Build Time (s)   Start Time (s)   Size (MB)
Apache TomEE 8.0.0-M2   5.454            3.789            44
Spring Boot 2.1.3       2.248            2.936            16.7
Quarkus 0.11.0          1.890            0.623            10.5

This is important for later evaluating the impact of our changes against the baseline in a realistic project.

The application

To start, I used the implementation coming from the Quarkus rest-json quick-start example. This bundles a set of tests and REST endpoints I was able to successfully port to SpringBoot and TomEE.

The integration tests use RestAssured and JUnit 5 Jupiter and are equivalent across the three projects. They needed some tweaks on SpringBoot: getting the random port and configuring the RestAssured serialization was not obvious.

On TomEE I had to use JUnit 4 because Arquillian, the integration test framework commonly used there, does not support JUnit 5 yet.

I’ve added Timeout and Fallback fault tolerance features to a method: on Quarkus and TomEE using MicroProfile, and on SpringBoot using Netflix’s Hystrix. Hystrix, commonly used in SpringBoot, is in maintenance mode, and I’m not aware of a replacement on the Spring side. As a comparison, the MicroProfile spec has added support for CompletionStage and will be doing work around reactive code soon.

I tried to use JPA with an H2 database but couldn’t make it work in Quarkus, so I dropped it in the others too but kept all the dependencies. The documentation is still sketchy and I need more research time.

 

Conclusions

Platform                Build Time (s)   Build Time with tests (s)   Start Time (s)   Size (MB)
Apache TomEE 8.0.0-M2   6.178            15.304                      4.993            44
Spring Boot 2.1.3       3.358            13.348                      6.799            46.9
Quarkus 0.11.0          2.533            7.153                       0.767            23.4

It looks like build time without tests increased by ~1s in all projects.

The build with tests on Quarkus takes half the time of the other two. Artefact size is also half of the other two. Start time is almost one order of magnitude below. This is very significant. Please note that the start time can be further reduced by using native code with GraalVM.

SpringBoot startup time and uber war size increase very significantly (>2x) when you add real functionality to it.

TomEE starts faster than SpringBoot, and the uber jar size is very stable for this use case: no change.

The online resources on Quarkus are rare compared with TomEE or SpringBoot, leading to a lot of trial and error.

 

Further work

Make JPA with H2 work and measure the performance impact.

Add Docker and Kubernetes and measure deployment times in the cloud.

Generate native artefacts for Quarkus and measure the effect.

 

Tech notes

Build time is calculated as the average of 3 consecutive executions of the base project using “mvn clean install -DskipTests”.

Start time is calculated as the average of 3 consecutive executions of the generated uber jar/war. In the case of Quarkus it’s the runner+lib folder. Example: “java -jar target/spring-boot-0.0.1-SNAPSHOT.war”.

The size is of the generated uber jar/war. In the case of Quarkus it’s the runner+lib folder.

All this was executed on a Lenovo ThinkPad T460 with an Intel i7, 32GB of RAM and a one-third-empty 512GB SSD.
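The averaging procedure from the notes above can be sketched as a small script. This assumes a GNU date with nanosecond support; the actual mvn invocation is left commented out so the sketch runs standalone.

```shell
# Average three timed runs of a command (here, the build from the notes).
runs=3
total_ns=0
for i in $(seq "$runs"); do
  t0=$(date +%s%N)
  # mvn clean install -DskipTests >/dev/null
  t1=$(date +%s%N)
  total_ns=$(( total_ns + (t1 - t0) ))
done
avg_ms=$(( total_ns / runs / 1000000 ))
echo "average time: ${avg_ms} ms"
```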

Photo taken on Saturday near Penacova, Portugal