Cees Bos

My gists about Observability

Observability enthusiast and Grafana Champion, always looking for ways to improve software with better observability and what new insights can be gained. Delivering reliable software is always my priority. Combining existing tools often results in new ways to get an even better view. I work for OpenValue as a software & observability engineer and SRE.
    May 12 – 15, 2025

    How I solved production issues with OpenTelemetry (and how you can too)

    May 12 – 15, 2025 - Cologne, Germany
    Talk details: https://schedule.jcon.one/session/781034

    Abstract

    Ensuring the reliability of your Java applications is critical in today's fast-paced world. But how do you identify and fix production issues before they get worse? With cloud-native applications, it can be even more difficult because you can't log into the system to get some of the data you need. The answer lies in observability - and in particular, OpenTelemetry.

    In this session, I'll show you how I used OpenTelemetry to solve several production problems. You'll learn how I uncovered critical issues that were invisible without the right telemetry data - and how you can do the same. OpenTelemetry provides the tools you need to understand what's happening in your application in real time, from tracking down hidden bugs to uncovering system bottlenecks. These solutions have significantly improved our applications' performance and reliability.

    A key concept we will use is traces. Architecture diagrams often don't tell the whole story, especially in microservices landscapes. I'll show you how traces can help you build a service graph and save you hours in a crisis. A service graph gives you an overview and helps to find problems.
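To make the service graph idea concrete, here is a minimal sketch in plain Java. It is not how Tempo or Grafana build their service graph, and it does not use the OpenTelemetry SDK. The `Span` record, the `buildEdges` helper and the sample span IDs are hypothetical names chosen for illustration; only the service names `alpha` and `beta` echo the demo services. The assumption is that every span carries its own ID, its parent's ID and the name of the service that emitted it. Whenever a child span belongs to a different service than its parent, that parent/child pair is counted as a call between two services.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ServiceGraphSketch {

    // Hypothetical, simplified span: real OpenTelemetry spans carry far more data.
    record Span(String spanId, String parentSpanId, String serviceName) {}

    // Counts caller -> callee edges by looking at parent/child spans
    // that were emitted by different services.
    static Map<String, Integer> buildEdges(List<Span> spans) {
        Map<String, Span> byId = new HashMap<>();
        for (Span span : spans) {
            byId.put(span.spanId(), span);
        }

        Map<String, Integer> edges = new TreeMap<>();
        for (Span span : spans) {
            if (span.parentSpanId() == null) {
                continue; // root span: nothing called it
            }
            Span parent = byId.get(span.parentSpanId());
            if (parent == null) {
                continue; // parent span missing from this batch
            }
            if (parent.serviceName().equals(span.serviceName())) {
                continue; // internal span: same service, no edge
            }
            String edge = parent.serviceName() + " -> " + span.serviceName();
            edges.merge(edge, 1, Integer::sum);
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up sample trace, for illustration only.
        List<Span> trace = List.of(
            new Span("1", null, "alpha"),  // incoming request on alpha
            new Span("2", "1", "alpha"),   // internal span inside alpha
            new Span("3", "2", "beta"),    // alpha calls beta
            new Span("4", "1", "beta"),    // alpha calls beta again
            new Span("5", "3", "beta")     // internal span inside beta
        );

        buildEdges(trace).forEach((edge, count) ->
            System.out.println(edge + " (" + count + " calls)"));
        // prints: alpha -> beta (2 calls)
    }
}
```

Tracing backends do this aggregation for you at scale. The sketch only shows why the parent/child links in trace data are enough to recover which service calls which, even when the architecture diagram is out of date.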

    Whether you're new to observability or a seasoned professional, this session will give you practical insights and tools to improve your application's observability and change the way you handle production issues. Solving problems is much easier with the right data at your fingertips.

    Slides

    All shared slides from Cees Bos

    Details about code examples

    An overview of the demo setup I used during the presentation:

    The k6 scripts and the code of the Alpha and Beta services can be found here:
    🖵 https://github.com/cbos/solving-problems-with-opentelemetry

    The OpenTelemetry collector, Prometheus, Tempo, Loki and Grafana are part of the Observability Toolkit:
    🖵 https://github.com/cbos/observability-toolkit

    OpenTelemetry documentation

    As mentioned during the talk, OpenTelemetry has a lot of good documentation.
    Here are some links:

    And some more links on zero code/auto-instrumentation:

    • 🗎 Java agent
    • 🗎 Annotations to create internal spans for example
    • 🗎 API to enrich your spans and create custom metrics

    📣 Application Observability Code Challenges

    With the Application Observability Code Challenges you can practice on your own.
    The goals of the challenges:

    • 🎉 - Have fun!
    • 🖵 - Learn to understand the behaviour of the code
    • 📈 - Learn how to use observability tools to understand code behaviour
    • 🔍 - Spot the unexpected behaviour!
    • 🤗 - Practice and learn!
    • 🎁 - Share your findings and solution, either as a comment or as a pull request

    Here is a list of available code challenges:

    | Challenge | Description | Explanation |
    |-----------|-------------|-------------|
    | 1 | A simple Spring Boot application with a REST endpoint implemented in Jersey/JAX-RS with unexpected behaviour. Can you find out what is happening based on the observability data? | step by step revealed |

    JCON Talk announcement
