AWS Observability vs OpenTelemetry: What I Learned

Ashish Suman · 8 min read

Why I Explored OpenTelemetry

For the past 9 years, every AWS project I worked on used CloudWatch and X-Ray. It was automatic — spin up services, observability comes built-in. No complaints.

Then came a project with a twist: the application needed to run across multiple clouds. AWS-native observability simply wasn't an option.

That led me to explore alternatives — both paid and open-source. After analyzing several options, we landed on OpenTelemetry. The paid tools were impressive, but we didn't want to trade one vendor lock-in for another.


What I Still Like About CloudWatch/X-Ray

Let me be clear: CloudWatch and X-Ray are excellent tools. Here's where they shine:

Zero setup friction. You can get up and running in no time. Almost no code required — everything works out of the box.

Native integration. CloudWatch talks to Lambda, API Gateway, DynamoDB, and most other AWS services with little or no configuration. It just works.

Perfect for getting started. When you're building an MVP or early-stage product, you don't need a complex observability pipeline. You need to ship. CloudWatch lets you do that.


Where CloudWatch Falls Short

After years of using it, I've hit some consistent pain points:

Customization is hard. The visualization is rigid. Widget limitations and cross-account/cross-region constraints get frustrating as your system grows.

Connecting the dots is painful. Correlating metrics, logs, and traces in a single view requires significant configuration and code. It's possible, but not seamless.
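To make "connecting the dots" concrete: the usual trick, whatever the tooling, is to stamp every log line with the ID of the trace it belongs to, so a log search and a trace view can be joined on one key. Below is a minimal standard-library sketch of that idea. It does not use the OpenTelemetry SDK, and the names `current_trace_id`, `TraceIdFilter`, `build_logger`, and `handle_request` are hypothetical, chosen only for illustration.

```python
import contextvars
import logging
import secrets
from typing import Optional, TextIO

# Trace ID of the request currently being handled ("-" when there is none).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Annotate every log record with the active trace ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


def build_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Return a logger whose output lines carry a trace_id field."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger) -> str:
    """Simulate one request: open a trace, log inside it, close the trace."""
    trace_id = secrets.token_hex(16)  # 32 hex characters, the width of a W3C trace ID
    token = current_trace_id.set(trace_id)
    try:
        logger.info("payment authorized")
        return trace_id
    finally:
        current_trace_id.reset(token)  # restore the previous value
```

With the ID on every line, finding the logs for a slow trace becomes a single filter on `trace_id` in the log backend rather than a timestamp-matching exercise.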

These aren't deal-breakers for simple architectures. But when you're running distributed systems across environments, they start to compound.


Setting Up OpenTelemetry

For our stack, we chose:

  • Prometheus for metrics

  • Jaeger for traces

  • OpenSearch for logs

  • Grafana for visualization

OpenTelemetry has become an industry standard with strong community support and integrations with virtually every observability tool on the market.

What surprised me: the configuration is simple yet powerful. It covers not just the application layer but the underlying system as well. OpenTelemetry itself stores nothing: it collects telemetry and exports each signal to a specialized backend (metrics to Prometheus, traces to Jaeger, logs to OpenSearch), and Grafana ties it all together with end-to-end request lifecycle visualization.
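For readers who have not seen a Collector configuration, here is a rough sketch of how one signal per backend is typically wired. The real Collector reads YAML; the structure is written as a Python dict here only so a small check can run against it. Every hostname and port is a placeholder, the exporter names and the layout of the `opensearch` settings are my assumptions about the contrib exporters and should be checked against their READMEs, and none of this is the configuration actually used on the project.

```python
# Collector-style configuration as a Python dict (illustrative, not the project's real config).
COLLECTOR_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {
        # Exposes an endpoint that Prometheus scrapes (placeholder address).
        "prometheus": {"endpoint": "0.0.0.0:8889"},
        # Jaeger accepts OTLP directly (placeholder host).
        "otlp/jaeger": {"endpoint": "jaeger.internal:4317"},
        # Assumed settings layout; verify against the exporter's README.
        "opensearch": {"http": {"endpoint": "https://opensearch.internal:9200"}},
    },
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["prometheus"],
            },
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp/jaeger"],
            },
            "logs": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["opensearch"],
            },
        }
    },
}


def undefined_components(config: dict) -> list:
    """List 'pipeline.section.name' for every component a pipeline uses but the config never defines."""
    missing = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for name in spec.get(section, []):
                if name not in config.get(section, {}):
                    missing.append(f"{pipeline}.{section}.{name}")
    return missing
```

The checker is the kind of guard that is handy once the setup is automated: a pipeline that names an exporter nobody defined is caught before deployment rather than at Collector startup.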

Setup time: A few hours to get a working pipeline. We've since automated the entire setup with Ansible, making it repeatable across environments.


The Real Comparison

Here's how the two approaches stack up in practice:

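| Dimension      | CloudWatch / X-Ray | OpenTelemetry |
|----------------|--------------------|---------------|
| Setup time     | Almost none        | A few hours   |
| Customization  | Hard               | Easy          |
| Cost           | $$$                | $             |
| Multi-cloud    | No                 | Yes           |
| Debugging      | Easy               | Easy          |
| Learning curve | Easy               | Easy          |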

Where OpenTelemetry wins: cloud-agnostic observability without vendor lock-in, with the same monitoring capabilities on-premises as in the cloud. When we needed identical observability for internal applications running on on-prem servers, the OTel stack worked flawlessly.

Where CloudWatch wins: Quick deployment on AWS when you want an efficient, no-code monitoring solution.


The Operational Reality

Running your own observability stack isn't free. Here's what I've learned:

Index management is painful. Managing indices for logs and traces in OpenSearch requires ongoing attention. It's not set-and-forget.

Reliability requires planning. Early on, Prometheus stopped accepting requests due to high call volume. Once we started batching requests, it stabilized. But it was a reminder: you're now responsible for your monitoring infrastructure.
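The batching idea is easy to show in isolation. This is a minimal sketch, not the mechanism we actually deployed: items are buffered and handed to a sender in groups, flushing when the buffer is full or when its oldest item has waited too long. The `Batcher` class, its parameters, and the default limits are hypothetical.

```python
import time
from typing import Any, Callable, List, Optional


class Batcher:
    """Buffer items and pass them to `send` in groups instead of one call per item."""

    def __init__(
        self,
        send: Callable[[List[Any]], None],
        max_size: int = 100,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_size = max_size
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Any] = []
        self._oldest: Optional[float] = None  # arrival time of the first buffered item

    def add(self, item: Any) -> None:
        """Buffer one item; flush if the batch is full or has waited too long."""
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(item)
        if len(self._buffer) >= self._max_size or self._expired():
            self.flush()

    def _expired(self) -> bool:
        # Age is only checked when an item arrives; a production batcher
        # would also flush from a background timer.
        return (
            self._oldest is not None
            and self._clock() - self._oldest >= self._max_age_s
        )

    def flush(self) -> None:
        """Send whatever is buffered, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._oldest = None
            self._send(batch)
```

Usage is a matter of wrapping the expensive call: `Batcher(send=post_to_backend, max_size=500)` turns 500 network requests into one, where `post_to_backend` stands in for whatever function ships data to the backend. Remember to call `flush()` on shutdown so the last partial batch is not lost.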

Monitoring the monitor. We use Grafana alerts to notify us of any downtime in the observability pipeline itself. Yes, you need to monitor your monitoring.
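Underneath any "monitor the monitor" alert sits the same simple logic: each component reports a heartbeat, and silence beyond a threshold is treated as downtime. The sketch below shows only that logic, not a Grafana alert rule. The function name, the component names, and the 120-second threshold are assumptions for illustration.

```python
from typing import Dict, List


def stale_components(
    last_seen: Dict[str, float], now: float, max_silence_s: float = 120.0
) -> List[str]:
    """Return, sorted by name, the components whose last heartbeat is older than max_silence_s."""
    return sorted(
        name for name, seen_at in last_seen.items() if now - seen_at > max_silence_s
    )


# Hypothetical heartbeat timestamps, in seconds.
heartbeats = {"collector": 1000.0, "prometheus": 1090.0, "jaeger": 850.0}
print(stale_components(heartbeats, now=1100.0))  # prints ['jaeger']
```

The important design point is that the check must run somewhere that does not depend on the pipeline it is watching; otherwise an outage silences the alert too.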

Cost comparison: A self-hosted OpenTelemetry stack is cheaper than most paid solutions. There are no restrictions on application count, call volume, or data retention, so retention depends entirely on your needs and the storage you are willing to run. Maintenance has its overhead, but so does running any production system.


Team Adaptation

The team was happy. Using the same tooling everywhere meant consistent knowledge across environments. Same dashboards, same queries, same debugging workflows — whether troubleshooting AWS, another cloud, or on-prem.

Skills required: Prometheus and Grafana experience was important for our team. Jaeger and OpenSearch were easier to pick up.

Small teams: It depends entirely on the application's architecture and roadmap. A distributed, multi-cloud application in maintenance mode can actually be managed by a small team if the automation is solid. However, for a 2-3 person team building a fresh AWS-only MVP, the overhead of OTel might be a distraction.


My Decision Framework

When a CTO asks me "CloudWatch or OpenTelemetry?", I ask three questions:

  • Where will your applications run? AWS only, or multiple environments?

  • Is AWS the only cloud you're targeting? Now and in the future?

  • Are you willing to invest in monitoring infrastructure right now?

My rule of thumb:

  • If you're targeting AWS only and it's a new product, the AWS observability stack gets you up and running in no time.

  • If you have a mature product with multiple microservices and don't want vendor lock-in, choose OTel.

For my next greenfield project: It depends. For serverless development, the AWS observability stack is still a perfect fit. But if I'm building a distributed system with multi-cloud support, OpenTelemetry will be my default choice.
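The questions and rule of thumb above can be written down as a small function. This is one possible reading of the framework, not a formal rule: the `Project` fields, their names, and the order in which the conditions are checked are my interpretation for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    aws_only: bool                 # everything runs on AWS, now and in the planned future
    new_product: bool              # MVP or early-stage product
    serverless: bool               # primarily serverless workloads
    can_run_monitoring_infra: bool # team is willing to operate its own stack right now


def recommend(project: Project) -> str:
    """Map the answers to the three questions onto a starting recommendation."""
    if not project.aws_only:
        # Multi-cloud or on-prem rules out the AWS-native stack.
        return "OpenTelemetry"
    if (
        project.new_product
        or project.serverless
        or not project.can_run_monitoring_infra
    ):
        return "CloudWatch/X-Ray"
    # Mature AWS-only product whose team can invest in avoiding lock-in.
    return "OpenTelemetry"


print(recommend(Project(aws_only=True, new_product=True, serverless=False,
                        can_run_monitoring_infra=False)))  # prints CloudWatch/X-Ray
```

Treat the output as a conversation starter: architecture and roadmap still decide the edge cases.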


The Future of Observability

Every major paid monitoring tool now supports OpenTelemetry. That tells you where the industry is heading. The community support is massive and growing.

OpenTelemetry is becoming the standard — not because it's free, but because it solves real problems around portability and vendor independence.


Final Thoughts

Use CloudWatch/X-Ray when you need to hit the ground running on AWS with zero setup friction. Use OpenTelemetry when you need a mature, cloud-agnostic standard that grows with your multi-cloud or on-prem architecture without vendor lock-in.

One thing most people get wrong about observability: it's not a silver bullet. It gives you insight, but at the end of the day, it's still a developer's responsibility to write performant code.

If I could only have one observability signal (logs, metrics, or traces), I'd choose traces. Any application can emit text-based logs, but the ability to see an end-to-end request lifecycle is irreplaceable for debugging distributed systems.
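What a trace gives you is structure: every span records its parent, so the whole request can be rebuilt as a tree. Here is a toy sketch of that reconstruction. The span records, service names, and durations are invented for illustration, and real span data carries far more fields.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical spans from one request; "parent" is None for the root span.
SPANS = [
    {"id": "a", "parent": None, "name": "GET /checkout", "ms": 120},
    {"id": "b", "parent": "a", "name": "auth-service", "ms": 15},
    {"id": "c", "parent": "a", "name": "payment-service", "ms": 90},
    {"id": "d", "parent": "c", "name": "db.query", "ms": 70},
]


def render_trace(spans: List[dict]) -> List[str]:
    """Return the spans as indented lines, children nested under their parent."""
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span['name']} ({span['ms']} ms)")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render_trace(SPANS)))
```

Even in this toy form, the tree makes it obvious that most of the 120 ms sits in the database query under the payment service, which is exactly the kind of answer flat logs make you work for.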

Any regrets going the OpenTelemetry route? None so far.


What drove your observability strategy? Running OpenTelemetry in production — how are you managing collector infrastructure and reliability? I'd love to hear your experience.

Ashish Suman

Solution Architect with 18+ years building production systems. I write about what breaks at 3 AM.
