Micrometer Gauges, Datadog and Kubernetes

Jeff Dwyer

I didn't think I'd be writing this.

I really thought it would be a 3 line commit. All I wanted to know was how many streaming connections I had. Datadog was already set up and I was happily sending metrics to it, so I figured I'd just add a gauge and be done with it.

But here we are.

A Basic Gauge

Micrometer is a "Vendor-neutral application observability facade", which is Java speak for "a common library of metrics stuff like Counters, Timers, etc." If you want a basic "what is the level of X over time", a gauge is the meter you are looking for.

Here's a basic example of using a Gauge. This is a Micronaut example, but it is pretty generalizable.

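import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micronaut.scheduling.annotation.Scheduled;
// jakarta.inject assumes Micronaut 3+; older versions use javax.inject.
import jakarta.inject.Inject;
import jakarta.inject.Singleton;
import java.util.concurrent.atomic.AtomicInteger;

@Singleton
public class ConfigStreamMetrics {

    // Micrometer only keeps a weak reference to the gauge's state object,
    // so hold a strong reference to it in this field.
    private final AtomicInteger projectConnections;

    @Inject
    public ConfigStreamMetrics(MeterRegistry meterRegistry) {
        // gauge(...) returns the AtomicInteger it was given; setting it updates the gauge.
        projectConnections =
            meterRegistry.gauge(
                "config.broadcast.project-connections",
                Tags.empty(),
                new AtomicInteger()
            );
    }

    @Scheduled(fixedDelay = "1m")
    public void recordConnections() {
        // calculateConnections() is application-specific and omitted here.
        projectConnections.set(calculateConnections());
    }
}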

OK, with that code in place, and feeling pretty sure that calculateConnections() was returning a consistent value, you can imagine how I felt looking at a chart of my gauge value going all over the place from 0 to 1 to 2 (it should just be 2).

[Chart: gauge value all over the place]

Why is my gauge not working?

What is happening here? The gauge is all over the place. It made sense to me that taking the avg was going to be wrong: if I have 2 servers, I don't want the average of the gauge on each of them, I want the sum. But I'm charting the sum() here, and that doesn't explain what's happening.

The Key

The key is remembering how statsd with tagging works and discovering some surprising behavior from a default DataDog setup.

Metrics from Micrometer come out looking like config.broadcast.project-connections.connections:0|g|#statistic:value,type:grpc.

As an aside, I'd highly recommend setting up a quick clone of statsd locally that just outputs to stdout when you're trying to get this all working.
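If you don't have a statsd clone handy, something like the following is roughly what I mean. This is a minimal sketch, not a real statsd: it assumes your metrics are being sent as plain UDP datagrams, and it assumes port 8125 (the conventional statsd port) unless you pass another one. The class name StatsdEcho is made up for this example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: prints every UDP datagram it receives to stdout.
final class StatsdEcho {

    // Blocks until one datagram arrives and returns its payload as text.
    static String receiveOnce(DatagramSocket socket) throws IOException {
        byte[] buffer = new byte[8192];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);
        return new String(
            packet.getData(), packet.getOffset(), packet.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 8125 is where your app is sending statsd traffic.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8125;
        try (DatagramSocket socket = new DatagramSocket(port)) {
            System.out.println("Listening for statsd packets on UDP port " + port);
            while (true) {
                System.out.println(receiveOnce(socket));
            }
        }
    }
}
```

Run it, point your app's statsd host at localhost, and you can read the raw lines (name, value, type and tags) as they go by.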

The "aha" is that all of these metrics get aggregated based on just that string. So suppose two servers send:

Server 1: config.broadcast.project-connections.connections:99|g|#statistic:value,type:grpc

Server 2: config.broadcast.project-connections.connections:0|g|#statistic:value,type:grpc

A gauge expects a single value at any given point, so what we end up with here is a heisengauge that could be either 0 or 99. Our sum doesn't work because we don't have two data points to sum across. We just have one value that is flapping back and forth.
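To convince myself of this, it helps to model it. The following is a toy sketch, not how Datadog is actually implemented: it assumes a gauge series is identified by its metric name plus its tag string, and that the most recent sample for a series overwrites the previous one. The class GaugeCollision and the pod_name values are made up for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of gauge aggregation (an assumption for illustration, not Datadog's code).
final class GaugeCollision {

    // Series identity is everything except the value: metric name plus tag string.
    static String seriesKey(String statsdLine) {
        int colon = statsdLine.indexOf(':');
        int tagStart = statsdLine.indexOf("|#");
        String name = statsdLine.substring(0, colon);
        String tags = tagStart >= 0 ? statsdLine.substring(tagStart + 2) : "";
        return name + "{" + tags + "}";
    }

    // The value sits between the first ':' and the first '|' after it.
    static int value(String statsdLine) {
        int colon = statsdLine.indexOf(':');
        int pipe = statsdLine.indexOf('|', colon);
        return Integer.parseInt(statsdLine.substring(colon + 1, pipe));
    }

    // A gauge keeps one value per series, so later samples overwrite earlier ones.
    static Map<String, Integer> latestPerSeries(List<String> lines) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (String line : lines) {
            latest.put(seriesKey(line), value(line));
        }
        return latest;
    }

    // What a sum() over the gauge can see: one value per distinct series.
    static int sumAcrossSeries(List<String> lines) {
        return latestPerSeries(lines).values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        String metric = "config.broadcast.project-connections.connections";
        List<String> untagged = List.of(
            metric + ":99|g|#statistic:value,type:grpc",
            metric + ":0|g|#statistic:value,type:grpc");
        List<String> tagged = List.of(
            metric + ":99|g|#statistic:value,type:grpc,pod_name:pod-a",
            metric + ":0|g|#statistic:value,type:grpc,pod_name:pod-b");

        System.out.println(sumAcrossSeries(untagged)); // prints 0: one series, last write wins
        System.out.println(sumAcrossSeries(tagged));   // prints 99: two series, values add up
    }
}
```

With identical strings there is only one series, so the "sum" is just whichever server reported last. Once each server's line differs by a tag, there are two series and the sum means something.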

The gotcha

Now we know what's up, and it's definitely a sad state of affairs. What we want is to output a different key per pod and then sum across those. But why aren't these metrics getting tagged with the pod?

It turns out that the Micronaut Micrometer Datadog reporter (https://micronaut-projects.github.io/micronaut-micrometer/latest/guide/#metricsAndReportersDatadog) hits Datadog directly, not my local Datadog agent, which is normally responsible for adding these host and pod tags.

Since it goes straight there and we aren't explicitly sending a pod or host tag, these metrics are clobbering each other.

Two solutions

1) Point your metrics to your datadog agent and get the host tags that way

This makes a lot of sense, but I wasn't able to get it working easily.

2) Set CommonTags Yourself

The other solution is to calculate the same Datadog hostname that the Datadog agent uses and manually add that as a common tag on our MeterRegistry. Doing that looks like this:

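import io.micrometer.core.instrument.Tag;
import io.micrometer.datadog.DatadogMeterRegistry;
import io.micronaut.configuration.metrics.aggregator.MeterRegistryConfigurer;
import io.micronaut.configuration.metrics.annotation.RequiresMetrics;
import io.micronaut.context.annotation.Property;
import io.micronaut.core.annotation.Order;
import io.micronaut.core.order.Ordered;
// jakarta.inject assumes Micronaut 3+; older versions use javax.inject.
import jakarta.inject.Singleton;
import java.util.ArrayList;
import java.util.List;

@Order(Integer.MAX_VALUE)
@Singleton
@RequiresMetrics
public class MetricFactory
    implements MeterRegistryConfigurer<DatadogMeterRegistry>, Ordered {

    @Property(name = "gcp.project-id")
    protected String projectId;

    @Override
    public void configure(DatadogMeterRegistry meterRegistry) {
        // Each tag is only added when its environment variable is set.
        List<Tag> tags = new ArrayList<>();
        addIfNotNull(tags, "env", "MICRONAUT_ENVIRONMENTS");
        addIfNotNull(tags, "service", "DD_SERVICE");
        addIfNotNull(tags, "version", "DD_VERSION");
        addIfNotNull(tags, "pod_name", "POD_ID");

        // Construct the hostname that the Datadog agent uses: <node name>.<GCP project id>
        if (System.getenv("SPEC_NODENAME") != null) {
            final String hostName =
                "%s.%s".formatted(System.getenv("SPEC_NODENAME"), projectId);
            tags.add(Tag.of("host", hostName));
        }

        // Common tags are attached to every meter reported through this registry.
        meterRegistry.config().commonTags(tags);
    }

    private void addIfNotNull(List<Tag> tags, String tagName, String envVar) {
        if (System.getenv(envVar) != null) {
            tags.add(Tag.of(tagName, System.getenv(envVar)));
        }
    }

    @Override
    public Class<DatadogMeterRegistry> getType() {
        return DatadogMeterRegistry.class;
    }
}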

Passing the node and pod names in required some Kubernetes YAML work so that both were available as environment variables.

spec:
  containers:
    - image: gcr.io/-----
      name: -----------
      env:
        - name: SPEC_NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

Wrap

With all of that in place we're finally in a good place. Our gauges are independently gauging and our sum is working as expected.

Yay
