setting up cpu and memory alerts for containerized applications

the short answer

setting up cpu and memory alerts for containerized applications means choosing thresholds that catch real problems early without generating noise from normal operational variation. with gromitor, you set per-container thresholds in the dashboard — no alertmanager yaml, no PagerDuty routing rules — and get notified by email or in-app when a container breaches them. the key is picking the right thresholds, which depends on whether your container's resource usage is expected to be spiky or steady.

virtual machines tend to have relatively stable resource baselines. containers are designed to be ephemeral, dense, and variable. a container running a node.js API might idle at 0.5% cpu, spike to 80% during a request burst, and return to baseline, all within a few seconds. alerting strategies that work for VMs (a flat 70% cpu threshold that fires on any single reading) produce constant noise in container environments.

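memory is especially important for containers because it has a hard limit. when a container's usage reaches its memory limit and the kernel can't reclaim enough (page cache, for example), the cgroup OOM killer terminates a process inside it, usually taking the whole container down. there's typically no swap to fall back on (Kubernetes nodes usually run with swap disabled) and no graceful degradation. an alert at 80% of the memory limit is your early warning to act before the OOM killer does.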
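
a minimal python sketch of the 80% check, assuming you've already fetched and parsed one response from the Docker Engine API's container stats endpoint (`GET /containers/{id}/stats`). subtracting inactive file cache roughly mirrors what the `docker stats` cli reports, since the kernel can reclaim that cache before the OOM killer acts. the function name and sample numbers are illustrative, not part of gromitor.

```python
def memory_pct_of_limit(stats: dict) -> float:
    """Memory usage as a percentage of the container's memory limit.

    `stats` is one parsed Docker Engine API stats response. Inactive file
    cache is subtracted because the kernel can reclaim it before OOM-killing.
    """
    mem = stats["memory_stats"]
    usage = mem["usage"]
    detail = mem.get("stats", {})
    # cgroup v1 exposes total_inactive_file; cgroup v2 exposes inactive_file
    inactive = detail.get("total_inactive_file", detail.get("inactive_file", 0))
    if inactive < usage:
        usage -= inactive
    # note: if no memory limit is set, Docker reports host memory as the limit
    return usage / mem["limit"] * 100


# illustrative values: 900 MiB used, 80 MiB of it reclaimable cache, 1 GiB limit
sample = {
    "memory_stats": {
        "usage": 900 * 2**20,
        "limit": 1024 * 2**20,
        "stats": {"inactive_file": 80 * 2**20},
    }
}
pct = memory_pct_of_limit(sample)
print(f"{pct:.1f}% of limit, alert: {pct >= 80}")
```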

52% of on-call engineers report receiving more than 10 alert notifications per week that required no action, a primary driver of alert fatigue. Source: PagerDuty State of Digital Operations Report, 2023

threshold strategy by workload type

the right thresholds depend on what the container does. for stateless web services, alert on sustained cpu above 70–80% for 5+ minutes (short spikes are normal) and memory above 80% of limit at any point. for stateful workloads like databases and caches, alert on cpu above 60% sustained (these workloads shouldn't be compute-bound) and memory above 70% of limit (memory growth in databases often precedes crashes).
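
the same guidance expressed as data, as a sketch. the `AlertRule` type, the rule names, and `breaches()` are hypothetical, not gromitor's configuration format; 75% stands in for the 70–80% range, and the 5-minute window on stateful workloads is an assumption, since the text only says "sustained".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlertRule:
    cpu_pct: Optional[float]   # % of cpu limit; None disables the cpu check
    cpu_sustain_min: int       # minutes cpu must stay above cpu_pct
    mem_pct: Optional[float]   # % of memory limit, checked at any point


# hypothetical rules; 75% is an assumed midpoint of "70-80%", and the
# 5-minute window for stateful workloads is an assumption
RULES = {
    "stateless-web": AlertRule(cpu_pct=75.0, cpu_sustain_min=5, mem_pct=80.0),
    "stateful": AlertRule(cpu_pct=60.0, cpu_sustain_min=5, mem_pct=70.0),
}


def breaches(rule: AlertRule, cpu_pct: float, cpu_minutes_above: float,
             mem_pct: float) -> List[str]:
    """Return which metrics breach `rule`.

    cpu_minutes_above: how long cpu has stayed continuously above rule.cpu_pct.
    """
    hits = []
    if (rule.cpu_pct is not None and cpu_pct > rule.cpu_pct
            and cpu_minutes_above >= rule.cpu_sustain_min):
        hits.append("cpu")
    if rule.mem_pct is not None and mem_pct > rule.mem_pct:
        hits.append("memory")
    return hits
```

for example, `breaches(RULES["stateless-web"], 92, 2, 85)` flags only memory: cpu is well above 75%, but only for two minutes.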

for batch jobs and workers, cpu alerting is often not useful, since high cpu is expected and normal. focus alerting on memory (does it return to baseline after the job?) and on job completion signals at the application level. sidecar containers (log shippers, proxies) should alert on any sustained cpu above 20%, and queue consumers should alert on memory growth over time, which can indicate a consumer that isn't draining the queue.
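
one way to answer the "does it return to baseline?" question, sketched in python under assumptions: you have memory samples (in MB) from before the job started and after it finished, and landing within 10% of the pre-job level counts as "back to baseline". the function name and the tolerance are illustrative.

```python
from statistics import median
from typing import Sequence


def returned_to_baseline(pre_job_mb: Sequence[float], post_job_mb: Sequence[float],
                         tolerance: float = 0.10) -> bool:
    """True if memory after the job settles within `tolerance` of the pre-job level.

    Medians are used so a single noisy sample doesn't flip the result.
    """
    baseline = median(pre_job_mb)
    settled = median(post_job_mb)
    return settled <= baseline * (1 + tolerance)


# illustrative samples: memory stayed ~130 MB higher after the job
if not returned_to_baseline([210, 205, 212], [340, 338, 345]):
    print("worker memory did not return to baseline after the job")
```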

tuning thresholds to reduce noise

the biggest alerting mistake is setting thresholds on day one and never revisiting them. spend the first two weeks watching your containers' actual behavior in the gromitor dashboard without any alerts set, and note the typical cpu and memory range during peak and off-peak hours. set your initial thresholds well above that range, say 2x the peak baseline, but keep them below 100% of the limit: a memory threshold at the limit can never fire before the OOM killer acts. then adjust downward as you get a feel for what's anomalous vs. normal.
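
a sketch of turning two weeks of observations into a starting threshold. it assumes "peak baseline" means the 95th percentile of your samples (so a few one-off outliers don't inflate it) and caps the result at 95% of the limit; both choices, and the function name, are assumptions to adjust.

```python
from statistics import quantiles
from typing import Sequence


def initial_threshold(samples_pct: Sequence[float], multiplier: float = 2.0,
                      cap_pct: float = 95.0) -> float:
    """Starting threshold: ~2x the observed peak, capped below the limit.

    "Peak" is approximated as the 95th percentile of the baseline samples.
    quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    Needs at least two samples.
    """
    p95 = quantiles(samples_pct, n=20)[-1]
    return min(p95 * multiplier, cap_pct)
```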

alert fatigue is a real operational risk. an alert that fires too often becomes background noise. the gromitor dashboard helps with this because you can look at the historical trend for any container and see whether a threshold would have produced too many false positives over the past 24 hours.
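
the same what-if check can be sketched offline, assuming you have the last 24 hours of samples for a container as percentages (however you export them). it counts how many times a plain threshold would have fired, treating each upward crossing as one alert. names and sample values are illustrative.

```python
from typing import Sequence


def count_alerts(samples_pct: Sequence[float], threshold: float) -> int:
    """Count how many times a plain threshold would have fired.

    Each upward crossing counts as one alert; staying above it doesn't re-fire.
    """
    alerts, above = 0, False
    for value in samples_pct:
        if value > threshold and not above:
            alerts += 1
        above = value > threshold
    return alerts


last_24h = [12, 18, 81, 85, 40, 22, 83, 30, 15]  # illustrative samples
for t in (80, 90):
    print(f"threshold {t}%: {count_alerts(last_24h, t)} alerts")
```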

integrating alerts into your workflow

in-app alerts are useful when you're actively watching the gromitor dashboard. email alerts are better for off-hours coverage: you get a notification in your inbox when something crosses a threshold overnight, and you can assess severity in the morning. for teams that need immediate paging, webhook delivery is on gromitor's roadmap; once available, it would enable integration with PagerDuty, OpsGenie, and Slack.

for related guidance, see "container memory alerts across multiple cloud environments" for multi-cloud alert configuration and "how to monitor docker cpu usage in real-time" for the cpu monitoring fundamentals in more depth.

how it works

  1. watch the baseline first

    run for two weeks with no alerts and note the typical cpu and memory range for each container during peak and off-peak hours.

  2. pick thresholds by workload type

    stateless APIs: cpu 70–80% sustained for 5m, memory 80% of limit; databases/caches: cpu 60% sustained, memory 70% of limit; batch jobs: skip cpu and check that memory returns to baseline.

  3. set the alert in gromitor

    open a container's alerts tab and choose the metric, a threshold value, a sustained duration, and a delivery method (in-app, email, or both).

  4. use name patterns and tune down

    cover groups like `worker-1..10` with one name-pattern rule, then adjust thresholds downward as you learn what's normal vs. anomalous.
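
gromitor's exact name-pattern syntax isn't spelled out here, so this python sketch assumes `worker-1..10` means "names from worker-1 through worker-10" and shows how such a range rule could be matched. the regex, the function name, and the fallback to exact matching are all hypothetical.

```python
import re

# hypothetical syntax: "<prefix><start>..<end>", e.g. "worker-1..10"
RANGE_PATTERN = re.compile(r"^(?P<prefix>.*?)(?P<start>\d+)\.\.(?P<end>\d+)$")


def pattern_matches(pattern: str, container_name: str) -> bool:
    """True if container_name falls inside a range pattern like worker-1..10.

    Patterns without a range fall back to an exact name match.
    """
    m = RANGE_PATTERN.match(pattern)
    if m is None:
        return pattern == container_name
    prefix = m.group("prefix")
    if not container_name.startswith(prefix):
        return False
    suffix = container_name[len(prefix):]
    # isdecimal() rejects empty strings and non-numeric suffixes
    return suffix.isdecimal() and int(m.group("start")) <= int(suffix) <= int(m.group("end"))


names = ["worker-1", "worker-7", "worker-10", "worker-11", "web-1"]
print([n for n in names if pattern_matches("worker-1..10", n)])
```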

frequently asked

should i set cpu alerts as percentage of host or percentage of container limit?
percentage of container limit is more meaningful for catching misconfigured workloads — a container at 100% of its cpu limit is throttled regardless of what the host is doing. percentage of host is useful for capacity planning — knowing that your containers are collectively using 80% of host cpu helps with provisioning decisions. gromitor shows both views.
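
a sketch of both views computed from one Docker Engine API stats response, assuming `precpu_stats` is populated (it's empty on the very first read). `system_cpu_usage` counts host cpu time across all cores, so the delta ratio is the fraction of the whole host; multiplying by `online_cpus` converts it to cores. the cpu limit is passed in as `nano_cpus`, which is what `docker run --cpus=2` stores (as 2000000000) in `HostConfig.NanoCpus`; function names and sample values are illustrative.

```python
def cores_used(stats: dict) -> float:
    """Cores the container used between the two samples in one stats response."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if cpu_delta < 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * cpu["online_cpus"]


def pct_of_host(stats: dict) -> float:
    """Share of total host cpu capacity (capacity planning view)."""
    return cores_used(stats) / stats["cpu_stats"]["online_cpus"] * 100


def pct_of_limit(stats: dict, nano_cpus: int) -> float:
    """Share of the container's own cpu limit (100% means it's being throttled).

    If the limit was set with --cpu-quota/--cpu-period instead of --cpus,
    NanoCpus is 0 and the limit in cores is CpuQuota / CpuPeriod.
    """
    return cores_used(stats) / (nano_cpus / 1e9) * 100


# illustrative sample: a 4-cpu host, container used 1 core over the interval
sample = {
    "cpu_stats": {"cpu_usage": {"total_usage": 4_000_000_000},
                  "system_cpu_usage": 104_000_000_000, "online_cpus": 4},
    "precpu_stats": {"cpu_usage": {"total_usage": 3_000_000_000},
                     "system_cpu_usage": 100_000_000_000},
}
print(pct_of_host(sample), pct_of_limit(sample, 2_000_000_000))
```
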
how do i avoid waking up on call for a 30-second cpu spike during a cron job?
use a sustained duration on your alerts. gromitor's sustained duration setting requires the threshold to be exceeded for a rolling window (e.g. 5 minutes) before the alert fires. a 30-second spike won't trigger a 5-minute sustained alert, but a runaway process that has been at 95% cpu for 10 minutes will.
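
a minimal sketch of how a sustained-duration rule behaves, assuming evenly spaced samples (every 10 seconds here) and that "sustained" means every sample in the window is above the threshold. this is an illustration, not gromitor's evaluator.

```python
from typing import Optional, Sequence


def first_alert_index(samples_pct: Sequence[float], threshold: float,
                      sustain_samples: int) -> Optional[int]:
    """Index of the sample at which a sustained-duration alert would fire.

    Fires only once the threshold has been exceeded for `sustain_samples`
    consecutive samples; any dip to or below the threshold resets the count.
    """
    run = 0
    for i, value in enumerate(samples_pct):
        run = run + 1 if value > threshold else 0
        if run >= sustain_samples:
            return i
    return None


SAMPLE_SECONDS = 10                    # assumed sampling interval
window = 5 * 60 // SAMPLE_SECONDS      # 5-minute window = 30 samples

cron_spike = [5] * 20 + [95] * 3 + [5] * 20   # 30 seconds at 95%
runaway = [5] * 10 + [95] * 60                # 10 minutes at 95%
print(first_alert_index(cron_spike, 80, window))  # None: the spike is too short
print(first_alert_index(runaway, 80, window))     # 39: the 30th consecutive sample above 80%
```
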
can i alert on cpu throttling specifically, not just cpu usage?
not currently. gromitor alerts on cpu utilization percentage. cpu throttling (where the container is being rate-limited by the kernel because it exceeded its cpu quota) is a related but distinct metric: it comes from the cgroup's throttling counters (how many cpu scheduling periods the container was throttled in) rather than from utilization, and gromitor doesn't currently alert on it.
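
gromitor doesn't alert on this, but if you want to watch throttling yourself, the Docker Engine API's stats response includes a `throttling_data` object (`periods`, `throttled_periods`, `throttled_time`) in both `cpu_stats` and `precpu_stats`. a minimal sketch of the throttled fraction; the function name and sample numbers are illustrative, and any threshold you alert on is your call.

```python
def throttled_fraction(stats: dict) -> float:
    """Fraction of cpu scheduling periods in which the container was throttled.

    Uses the cumulative counters in throttling_data from a Docker Engine API
    stats response, diffed against the previous sample in precpu_stats.
    """
    cur = stats["cpu_stats"]["throttling_data"]
    prev = stats["precpu_stats"]["throttling_data"]
    periods = cur["periods"] - prev["periods"]
    if periods <= 0:
        return 0.0  # no cpu quota set, or no time elapsed between samples
    return (cur["throttled_periods"] - prev["throttled_periods"]) / periods


# illustrative sample: throttled in 25 of the last 100 periods
sample = {
    "cpu_stats": {"throttling_data": {"periods": 1100, "throttled_periods": 75}},
    "precpu_stats": {"throttling_data": {"periods": 1000, "throttled_periods": 50}},
}
print(f"throttled in {throttled_fraction(sample):.0%} of periods")
```
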
what if i have a container whose memory usage legitimately grows over time?
some containers (JVM applications with generational GC, for example) have memory that grows until GC kicks in and drops it back down — a sawtooth pattern. for these, alert on memory trend rather than absolute value: if memory has been growing for 30+ minutes without a GC drop, that's worth an alert. gromitor's trend-based alerting handles this case better than a simple threshold.
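
a rough python sketch of "growing for 30+ minutes without a GC drop", assuming one memory sample per minute (so `window=30`) and treating any fall of more than 10% below the running peak as a GC drop. the function and both parameters are assumptions, not how gromitor's trend alerting works internally.

```python
from typing import Sequence


def growing_without_gc_drop(samples_mb: Sequence[float], window: int,
                            drop_fraction: float = 0.10) -> bool:
    """True if memory rose across the last `window` samples with no GC-style drop.

    A "drop" is any sample more than `drop_fraction` below the running peak,
    the sawtooth you'd expect when a collection frees memory.
    """
    recent = samples_mb[-window:]
    if len(recent) < window:
        return False  # not enough history yet
    peak = recent[0]
    for value in recent[1:]:
        if value < peak * (1 - drop_fraction):
            return False  # memory was reclaimed: normal sawtooth
        peak = max(peak, value)
    return recent[-1] > recent[0]
```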

Published June 9, 2026 · Last updated June 16, 2026
