threshold strategy by workload type
the right thresholds depend on what the container does. for stateless web services, alert on sustained cpu above 70–80% for 5+ minutes (short spikes are normal) and memory above 80% of limit at any point. for stateful workloads like databases and caches, alert on cpu above 60% sustained (these workloads shouldn't be compute-bound) and memory above 70% of limit (memory growth in databases often precedes crashes).
for batch jobs and workers, cpu alerting is often not useful, because high cpu is expected and normal. focus alerting on memory (does it return to baseline after the job?) and on job completion signals at the application level. for sidecar containers (log shippers, proxies), alert on any sustained cpu above 20%. for queue consumers, alert on memory growth over time, which can indicate a consumer that is not draining the queue.
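the starting points above can be kept in one place so they are easy to review and adjust. the following is a minimal Python sketch, not a gromitor config format: the workload keys, the `StartingThresholds` class, and the `thresholds_for` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartingThresholds:
    cpu_pct: Optional[float]     # sustained cpu percent; None means no cpu alert
    memory_pct: Optional[float]  # percent of memory limit; None means no fixed threshold
    note: str = ""


# values come from the guidance above; the dictionary keys are hypothetical labels
STARTING_THRESHOLDS = {
    "stateless_web": StartingThresholds(70.0, 80.0, "cpu 70-80% sustained for 5+ minutes"),
    "stateful": StartingThresholds(60.0, 70.0, "databases and caches"),
    "batch": StartingThresholds(None, None, "check memory returns to baseline; watch job completion"),
    "sidecar": StartingThresholds(20.0, None, "log shippers, proxies"),
    "queue_consumer": StartingThresholds(None, None, "watch memory growth over time"),
}


def thresholds_for(workload: str) -> StartingThresholds:
    """Look up the starting thresholds for a workload type."""
    try:
        return STARTING_THRESHOLDS[workload]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}") from None
```

for example, `thresholds_for("stateful").memory_pct` gives `70.0`, while `thresholds_for("batch").cpu_pct` is `None` because cpu alerting is skipped for batch jobs.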
tuning thresholds to reduce noise
the biggest alerting mistake is setting thresholds on day one and never revisiting them. spend the first two weeks watching your containers' actual behavior in the gromitor dashboard without any alerts set. note the typical range for cpu and memory during peak and off-peak hours. set your initial thresholds well above that typical range — say, 2x the peak baseline — and adjust downward as you get a feel for what's anomalous vs. normal.
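as a rough illustration of the "2x the peak baseline" rule, here is a small Python sketch. it assumes you have written down a handful of peak-hour readings as percentages; treating the median of those readings as the "peak baseline" and capping the result at 100% are assumptions made for this example, and `initial_threshold` is a hypothetical helper, not a gromitor feature.

```python
from statistics import median


def initial_threshold(peak_hour_samples, multiplier=2.0, cap=100.0):
    """Suggest a first threshold from readings observed during peak hours.

    peak_hour_samples: percentages (of the container limit) noted during peak hours.
    The median is used as the "peak baseline" so one odd reading does not dominate
    (an assumption for this sketch). The result is capped because a percentage of
    the limit above 100 could never fire.
    """
    if not peak_hour_samples:
        raise ValueError("need at least one sample")
    baseline = median(peak_hour_samples)
    return min(baseline * multiplier, cap)
```

with peak-hour cpu readings of 30, 32, 35, 31 and 90, the median is 32, so the suggested starting threshold is 64%. from there you would adjust downward as described above.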
alert fatigue is a real operational risk. an alert that fires too often becomes background noise. the gromitor dashboard helps with this because you can look at the historical trend for any container and see whether a threshold would have produced too many false positives over the past 24 hours.
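the same check can be done by hand on exported readings. the sketch below counts how many times a threshold would have fired over a series of evenly spaced samples, optionally requiring several consecutive samples above the threshold. `count_alerts` is a hypothetical helper, and counting consecutive samples is only one plausible way to model a sustained duration; it is not a description of how gromitor evaluates alerts internally.

```python
def count_alerts(samples, threshold, sustained_samples=1):
    """Count alert firings over evenly spaced historical readings.

    An alert fires once per breach episode, at the moment the value has been
    above `threshold` for `sustained_samples` consecutive readings.
    """
    if sustained_samples < 1:
        raise ValueError("sustained_samples must be at least 1")
    alerts = 0
    run = 0  # length of the current streak above the threshold
    for value in samples:
        if value > threshold:
            run += 1
            if run == sustained_samples:
                alerts += 1
        else:
            run = 0
    return alerts
```

comparing the count for a few candidate thresholds and durations over the same history gives a quick sense of which combination would have been noisy.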
integrating alerts into your workflow
in-app alerts are useful when you're actively watching the gromitor dashboard. email alerts are better for off-hours coverage: you get a notification in your inbox when something crosses a threshold overnight, and you can assess severity in the morning. for teams that need immediate paging, gromitor's roadmap includes webhook delivery, which will enable integration with PagerDuty, Opsgenie, and Slack.
for related guidance, the article "container memory alerts across multiple cloud environments" covers multi-cloud alert configuration, and the article "how to monitor docker cpu usage in real-time" covers the cpu monitoring fundamentals in more depth.
how it works
- 01
watch the baseline first
run for two weeks with no alerts and note the typical cpu and memory range for each container during peak and off-peak hours.
- 02
pick thresholds by workload type
stateless APIs: cpu 70–80% sustained 5m, memory 80% of limit; databases/caches: cpu 60% sustained, memory 70%; batch jobs: skip cpu, watch memory return to baseline.
- 03
set the alert in gromitor
open a container's alerts tab, choose the metric, threshold value, a sustained duration, and a delivery method (in-app, email, or both).
- 04
use name patterns and tune down
cover groups like `worker-1..10` with one name-pattern rule, then adjust thresholds downward as you learn what's normal vs. anomalous.
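to preview which containers a name-pattern rule would cover, you can test a pattern against your container names before saving the rule. the sketch below uses shell-style wildcards from Python's standard `fnmatch` module. gromitor's actual pattern syntax is not specified here, so the `worker-*` form and the `containers_matching` helper are assumptions for illustration only.

```python
from fnmatch import fnmatchcase


def containers_matching(pattern, names):
    """Return the container names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["worker-1", "worker-10", "web-1", "worker"]
covered = containers_matching("worker-*", names)
# covered is ["worker-1", "worker-10"]; "worker" has no "-" suffix, so it is excluded
```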
frequently asked
- should i set cpu alerts as percentage of host or percentage of container limit?
- percentage of container limit is more meaningful for catching misconfigured workloads — a container at 100% of its cpu limit is throttled regardless of what the host is doing. percentage of host is useful for capacity planning — knowing that your containers are collectively using 80% of host cpu helps with provisioning decisions. gromitor shows both views.
- how do i avoid waking up on call for a 30-second cpu spike during a cron job?
- use a sustained duration on your alerts. gromitor's sustained duration setting requires the threshold to be exceeded for a rolling window (e.g. 5 minutes) before the alert fires. a 30-second spike won't trigger a 5-minute sustained alert, but a runaway process that has been at 95% cpu for 10 minutes will.
- can i alert on cpu throttling specifically, not just cpu usage?
- not currently. gromitor alerts on cpu utilization percentage. cpu throttling (where the kernel rate-limits the container because it exceeded its cpu quota) is a related but distinct metric. it is tracked in cgroup-level counters such as the number of throttled periods and the total throttled time, not in the utilization percentage that gromitor alerts on.
- what if i have a container whose memory usage legitimately grows over time?
- some containers (JVM applications with generational GC, for example) have memory that grows until GC kicks in and drops it back down — a sawtooth pattern. for these, alert on memory trend rather than absolute value: if memory has been growing for 30+ minutes without a GC drop, that's worth an alert. gromitor's trend-based alerting handles this case better than a simple threshold.
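the trend rule from the last answer ("growing for 30+ minutes without a GC drop") can be sketched in a few lines of Python. this is an illustration of the idea, not gromitor's implementation: the `memory_growing_without_drop` helper, the evenly spaced samples, and the 10% figure used to recognize a GC-style drop are all assumptions.

```python
def memory_growing_without_drop(samples, window, min_drop_fraction=0.10):
    """Return True if memory rose over the last `window` readings with no sharp drop.

    samples: evenly spaced memory readings, oldest first.
    window: number of most recent readings to inspect (for example, 30 readings
            at one per minute covers 30 minutes).
    min_drop_fraction: a fall of at least this fraction between two consecutive
            readings is treated as a GC-style drop (assumed value).
    """
    if window < 2 or len(samples) < window:
        return False  # not enough history to judge a trend
    recent = samples[-window:]
    for prev, cur in zip(recent, recent[1:]):
        if prev > 0 and (prev - cur) / prev >= min_drop_fraction:
            return False  # memory was released at some point in the window
    return recent[-1] > recent[0]
```

a steadily climbing series returns `True`, while a sawtooth that drops back during the window returns `False`, which is the distinction a fixed threshold cannot make.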
Published June 9, 2026 · Last updated June 16, 2026