Troubleshooting

Common issues and solutions for the metrics package.

Metrics Not Appearing

OTLP Provider

Symptoms:

  • Metrics not visible in collector
  • No data in monitoring system
  • Silent failures

Solutions:

1. Call Start() Before Recording

The OTLP provider requires Start() to be called before recording metrics:

recorder := metrics.MustNew(
    metrics.WithOTLP("http://localhost:4318"),
    metrics.WithServiceName("my-service"),
)

// IMPORTANT: Call Start() before recording
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

// Now recording works
_ = recorder.IncrementCounter(ctx, "requests_total")

2. Check OTLP Collector Reachability

Verify the collector is accessible:

# Test connectivity
curl http://localhost:4318/v1/metrics

# Check collector logs
docker logs otel-collector

3. Wait for Export Interval

OTLP exports metrics periodically (default: 30s):

// Reduce interval for testing
recorder := metrics.MustNew(
    metrics.WithOTLP("http://localhost:4318"),
    metrics.WithExportInterval(5 * time.Second),
    metrics.WithServiceName("my-service"),
)

Or force immediate export:

if err := recorder.ForceFlush(ctx); err != nil {
    log.Printf("Failed to flush: %v", err)
}

4. Enable Logging

Add logging to see what’s happening:

recorder := metrics.MustNew(
    metrics.WithOTLP("http://localhost:4318"),
    metrics.WithLogger(slog.Default()),
    metrics.WithServiceName("my-service"),
)

Prometheus Provider

Symptoms:

  • Metrics endpoint returns 404
  • Empty metrics output
  • Server not accessible

Solutions:

1. Call Start() to Start Server

recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithServiceName("my-service"),
)

// Start the HTTP server
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

2. Check Actual Address

If strict port mode is not enabled, the server may bind to a different port than requested:

address := recorder.ServerAddress()
log.Printf("Metrics at: http://%s/metrics", address)

3. Verify Firewall/Network

Check if port is accessible:

# Test locally
curl http://localhost:9090/metrics

# Check from another machine
curl http://<server-ip>:9090/metrics

Stdout Provider

Symptoms:

  • No output to console
  • Metrics not visible

Solutions:

1. Wait for Export Interval

Stdout exports periodically (default: 30s):

recorder := metrics.MustNew(
    metrics.WithStdout(),
    metrics.WithExportInterval(5 * time.Second),  // Shorter interval
    metrics.WithServiceName("my-service"),
)

2. Force Flush

if err := recorder.ForceFlush(ctx); err != nil {
    log.Printf("Failed to flush: %v", err)
}

Port Conflicts

Symptoms:

  • Error: address already in use
  • Metrics server fails to start
  • Different port than expected

Solutions:

1. Use Strict Port Mode (Production)

Fail explicitly if the port is unavailable:

recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithStrictPort(),  // Fail if 9090 unavailable
    metrics.WithServiceName("my-service"),
)

2. Check Port Usage

Find what’s using the port:

# Linux/macOS
lsof -i :9090
netstat -tuln | grep 9090

# Windows
netstat -ano | findstr :9090

3. Use Dynamic Port (Testing)

Let the system choose an available port:

recorder := metrics.MustNew(
    metrics.WithPrometheus(":0", "/metrics"),  // :0 = any available port
    metrics.WithServiceName("test-service"),
)
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

// Get actual port
address := recorder.ServerAddress()
log.Printf("Using port: %s", address)

4. Use Testing Utilities

For tests, use the testing utilities with automatic port allocation:

func TestMetrics(t *testing.T) {
    t.Parallel()

    // Automatically finds an available port.
    recorder := metrics.TestingRecorderWithPrometheus(t, "test-service")
    _ = recorder // Use the recorder in test code...
}

Custom Metric Limit Reached

Symptoms:

  • Error: custom metric limit reached
  • New metrics not created
  • Warning in logs

Solutions:

1. Increase Limit

recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithMaxCustomMetrics(5000),  // Increase from default 1000
    metrics.WithServiceName("my-service"),
)

2. Monitor Usage

Track how many custom metrics are created:

const maxLimit = 1000 // Match your WithMaxCustomMetrics setting (default 1000)

count := recorder.CustomMetricCount()
log.Printf("Custom metrics: %d/%d", count, maxLimit)

// Expose as a metric
_ = recorder.SetGauge(ctx, "custom_metrics_count", float64(count))

3. Review Metric Cardinality

Check if you’re creating too many unique metrics:

// BAD: High cardinality (unique per user)
_ = recorder.IncrementCounter(ctx, "user_"+userID+"_requests")

// GOOD: Low cardinality (use labels)
_ = recorder.IncrementCounter(ctx, "user_requests_total",
    attribute.String("user_type", userType),
)

4. Consolidate Metrics

Combine similar metrics:

// BAD: Many separate metrics
_ = recorder.IncrementCounter(ctx, "get_requests_total")
_ = recorder.IncrementCounter(ctx, "post_requests_total")
_ = recorder.IncrementCounter(ctx, "put_requests_total")

// GOOD: One metric with label
_ = recorder.IncrementCounter(ctx, "requests_total",
    attribute.String("method", "GET"),
)

What Counts as a Custom Metric?

Counts:

  • Each unique metric name created with IncrementCounter, AddCounter, RecordHistogram, SetGauge

Does NOT count:

  • Built-in HTTP metrics
  • Different label combinations of same metric
  • Re-recording same metric name
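For example, this illustrative snippet creates only two custom metrics; the label variation and the repeat call reuse existing instruments:

// Custom metric #1
_ = recorder.IncrementCounter(ctx, "orders_total")

// Same name with a label: still metric #1
_ = recorder.IncrementCounter(ctx, "orders_total",
    attribute.String("region", "eu"),
)

// Re-recording the same name: still metric #1
_ = recorder.IncrementCounter(ctx, "orders_total")

// New name: custom metric #2
_ = recorder.SetGauge(ctx, "queue_depth", 7)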

Metrics Server Not Starting

Symptoms

  • Start() returns error
  • Server not accessible
  • No metrics endpoint

Solutions:

1. Check Context

Ensure context is not canceled:

ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
defer cancel()

// Use context with Start
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

2. Check Port Availability

See Port Conflicts section.

3. Enable Logging

recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithLogger(slog.Default()),
    metrics.WithServiceName("my-service"),
)

4. Check Permissions

Ensure your process has permission to bind to the port (on Linux, ports below 1024 require root or the CAP_NET_BIND_SERVICE capability).
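A quick pre-flight bind check using only the standard library (an illustrative sketch, not part of the metrics API) surfaces permission errors before the recorder starts:

// Try binding the port directly to see the underlying error.
ln, err := net.Listen("tcp", ":9090")
if err != nil {
    // Typically "permission denied" for privileged ports,
    // or "address already in use" for conflicts.
    log.Fatalf("cannot bind to :9090: %v", err)
}
_ = ln.Close() // Release the port before starting the recorder.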

Invalid Metric Names

Symptoms

  • Error: invalid metric name
  • Metrics not recorded
  • Reserved prefix error

Solutions:

1. Check Naming Rules

Metric names must:

  • Start with a letter (a-z, A-Z)
  • Contain only: letters, numbers, underscores, dots, hyphens
  • Not use reserved prefixes: __, http_, router_
  • Maximum 255 characters

Valid:

_ = recorder.IncrementCounter(ctx, "orders_total")
_ = recorder.IncrementCounter(ctx, "api.v1.requests")
_ = recorder.IncrementCounter(ctx, "payment-success")

Invalid:

_ = recorder.IncrementCounter(ctx, "__internal")      // Reserved prefix
_ = recorder.IncrementCounter(ctx, "http_custom")     // Reserved prefix
_ = recorder.IncrementCounter(ctx, "router_gauge")    // Reserved prefix
_ = recorder.IncrementCounter(ctx, "1st_metric")      // Starts with number
_ = recorder.IncrementCounter(ctx, "my metric!")      // Invalid characters

2. Handle Errors

Check for naming errors:

if err := recorder.IncrementCounter(ctx, metricName); err != nil {
    log.Printf("Invalid metric name %q: %v", metricName, err)
}

High Memory Usage

Symptoms:

  • Excessive memory consumption
  • Out of memory errors
  • Slow performance

Solutions:

1. Reduce Metric Cardinality

Limit unique label combinations:

// BAD: High cardinality
_ = recorder.IncrementCounter(ctx, "requests_total",
    attribute.String("user_id", userID),        // Millions of values
    attribute.String("request_id", requestID),  // Always unique
)

// GOOD: Low cardinality
_ = recorder.IncrementCounter(ctx, "requests_total",
    attribute.String("user_type", userType),    // Few values
    attribute.String("region", region),          // Few values
)

2. Exclude High-Cardinality Paths

handler := metrics.Middleware(recorder,
    metrics.WithExcludePatterns(
        `^/api/users/[0-9]+$`,      // User IDs
        `^/api/orders/[a-z0-9-]+$`, // Order IDs
    ),
)(mux)

3. Reduce Histogram Buckets

// BAD: Too many buckets (15)
metrics.WithDurationBuckets(
    0.001, 0.005, 0.01, 0.025, 0.05,
    0.1, 0.25, 0.5, 1, 2.5,
    5, 10, 30, 60, 120,
)

// GOOD: Fewer buckets (6)
metrics.WithDurationBuckets(0.01, 0.1, 0.5, 1, 5, 10)

4. Monitor Custom Metrics

count := recorder.CustomMetricCount()
if count > 500 {
    log.Printf("WARNING: High custom metric count: %d", count)
}

Performance Issues

HTTP Middleware Overhead

Symptom: Slow request handling

Solution: Exclude high-traffic paths:

handler := metrics.Middleware(recorder,
    metrics.WithExcludePaths("/health"),  // Called frequently
    metrics.WithExcludePrefixes("/static/"),  // Static assets
)(mux)

Histogram Recording Slow

Symptom: High CPU usage

Solution: Reduce bucket count (see High Memory Usage).
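For example, the trimmed bucket set from that section:

metrics.WithDurationBuckets(0.01, 0.1, 0.5, 1, 5, 10)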

Global State Issues

Symptoms:

  • Multiple recorder instances conflict
  • Unexpected behavior with multiple services
  • Global meter provider issues

Solutions:

1. Use Independent Recorders (Default)

By default, recorders do NOT set the global meter provider:

// These work independently
recorder1 := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithServiceName("service-1"),
)

recorder2 := metrics.MustNew(
    metrics.WithStdout(),
    metrics.WithServiceName("service-2"),
)

2. Avoid WithGlobalMeterProvider

Only use WithGlobalMeterProvider() if you need:

  • OpenTelemetry instrumentation libraries to use your provider
  • otel.GetMeterProvider() to return your provider

// Only if needed
recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithGlobalMeterProvider(),  // Explicit opt-in
    metrics.WithServiceName("my-service"),
)

Thread Safety

All Recorder methods are thread-safe. No special handling needed for concurrent access:

// Safe to call from multiple goroutines
go func() {
    _ = recorder.IncrementCounter(ctx, "worker_1")
}()

go func() {
    _ = recorder.IncrementCounter(ctx, "worker_2")
}()

Shutdown Issues

Graceful Shutdown Not Working

Solution: Use proper timeout context:

shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

if err := recorder.Shutdown(shutdownCtx); err != nil {
    log.Printf("Shutdown error: %v", err)
}

Metrics Not Flushed on Exit

Solution: Always defer Shutdown():

recorder := metrics.MustNew(
    metrics.WithOTLP("http://localhost:4318"),
    metrics.WithServiceName("my-service"),
)
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

defer func() {
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := recorder.Shutdown(shutdownCtx); err != nil {
        log.Printf("Shutdown error: %v", err)
    }
}()

Testing Issues

Port Conflicts in Parallel Tests

Solution: Use testing utilities with dynamic ports:

func TestHandler(t *testing.T) {
    t.Parallel() // Safe with TestingRecorder

    // Uses stdout, no port needed
    recorder := metrics.TestingRecorder(t, "test-service")
    _ = recorder // Exercise the recorder in test code...

    // Or with Prometheus (dynamic port):
    // recorder := metrics.TestingRecorderWithPrometheus(t, "test-service")
}

Server Not Ready

Solution: Wait for server:

recorder := metrics.TestingRecorderWithPrometheus(t, "test-service")

err := metrics.WaitForMetricsServer(t, recorder.ServerAddress(), 5*time.Second)
if err != nil {
    t.Fatal(err)
}

Getting Help

If you’re still experiencing issues:

  1. Check logs: Enable logging with WithLogger(slog.Default())
  2. Review configuration: Verify all options are correct
  3. Test connectivity: Ensure network access to endpoints
  4. Check version: Update to latest version
  5. File an issue: GitHub Issues

Quick Reference

Common Patterns

Production Setup:

recorder := metrics.MustNew(
    metrics.WithPrometheus(":9090", "/metrics"),
    metrics.WithStrictPort(),
    metrics.WithServiceName("my-api"),
    metrics.WithServiceVersion(version),
    metrics.WithLogger(slog.Default()),
)

OTLP Setup:

recorder := metrics.MustNew(
    metrics.WithOTLP("http://localhost:4318"),
    metrics.WithServiceName("my-service"),
)
// IMPORTANT: Call Start() before recording
if err := recorder.Start(ctx); err != nil {
    log.Fatal(err)
}

Testing Setup:

func TestMetrics(t *testing.T) {
    t.Parallel()
    recorder := metrics.TestingRecorder(t, "test-service")
    _ = recorder // Test code...
}

Next Steps