Icinga 2 dependencies, downtimes and host/service unreachability

There are a few gotchas you have to be aware of when working with Icinga 2 dependencies and downtimes.

Gotcha 1a: Downtimes and dependencies are independent of each other

Intuitively, I had expected downtime to always traverse down the parent-child dependency tree. It doesn’t; traversal is opt-in. Setting the ScheduledDowntime.child_options attribute to DowntimeTriggeredChildren or DowntimeNonTriggeredChildren enables it. (In Icingaweb2, these options are called “schedule triggered downtime for all child hosts” and “schedule non-triggered downtime for all child hosts”, respectively.) With one of these options set, runtime downtime objects are also created for (grand)child hosts, but not for their services (see Gotcha 1b).
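A minimal sketch of a ScheduledDowntime object that opts in to child downtimes. The parent host name A, the author, the comment and the Sunday-night window are placeholder assumptions; only child_options and its values come from the text above.

object ScheduledDowntime "A-maintenance" {
  host_name = "A"                  // placeholder parent host
  author = "icingaadmin"           // placeholder author
  comment = "Maintenance of A and the hosts behind it"
  ranges = {
    "sunday" = "02:00-04:00"       // placeholder window
  }
  // Opt in: also create downtimes for (grand)child hosts of A.
  // Services on those child hosts are still not covered (Gotcha 1b).
  child_options = "DowntimeTriggeredChildren"
}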

Gotcha 1b: Downtimes never traverse to child hosts’ services

In Icingaweb2, when you schedule downtime for a host and choose to also “schedule (non-)triggered downtime for all child hosts”, the services on those child hosts are excluded. The “All Services” toggle applies only to the host you are scheduling downtime for. There is an Icingaweb2 feature request to address this (still open as of May 5, 2020). So far, the only attempt to implement the Icinga 2 side of this was shot down by the Icinga maintainers on the grounds that it would make things too complex; @dnsmichi would rather rethink the current options.

If you want to make it easy to schedule downtime for a dependency chain involving numerous hosts and/or services, I recommend putting all dependent objects into a single HostGroup and/or ServiceGroup, so that you can Shift-select them in Icingaweb2 and schedule the downtime in one batch. In the worst case, you then have to select the objects of each group separately and plan the bulk downtime twice. In some cases a HostGroup alone may do (because Icingaweb2 lets you include downtime for all of a host’s services), but this won’t be sufficient if you have services that depend directly on other services rather than on hosts.
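A minimal sketch of such groups for a chain of hosts A, B and C. The group names chain-A-hosts and chain-A-services, the display names and the host list are hypothetical; the ServiceGroup here simply collects every service on those hosts, so you may want a narrower assign where in practice.

// Hypothetical group for all hosts in the dependency chain
object HostGroup "chain-A-hosts" {
  display_name = "Dependency chain behind A (hosts)"
  assign where host.name in [ "A", "B", "C" ]
}

// Hypothetical group for all services on those hosts
object ServiceGroup "chain-A-services" {
  display_name = "Dependency chain behind A (services)"
  assign where host.name in [ "A", "B", "C" ]
}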

In the configuration, it’s not possible at all to include all of a host’s services in a ScheduledDowntime object. Here, however, that’s not really an issue: it’s enough to abstract your downtime particulars into a template and apply it to both the Host and the Service objects that are to be affected.
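A minimal sketch of that approach, assuming a chain of hosts A, B and C and an arbitrary Sunday-night window. The downtime particulars live in a ScheduledDowntime template, and two apply rules import it: one for the hosts and one for all services on those hosts. The template name, rule names, author and time range are hypothetical.

// Shared downtime particulars (placeholder values)
template ScheduledDowntime "chain-A-window" {
  author = "icingaadmin"
  comment = "Maintenance of A and its dependants"
  ranges = {
    "sunday" = "02:00-04:00"
  }
}

// Downtime for the hosts themselves ...
apply ScheduledDowntime "chain-A-maintenance" to Host {
  import "chain-A-window"
  assign where host.name in [ "A", "B", "C" ]
}

// ... and for every service on those same hosts
apply ScheduledDowntime "chain-A-maintenance" to Service {
  import "chain-A-window"
  assign where host.name in [ "A", "B", "C" ]
}

Because the Service rule matches on host.name, every service on those hosts gets the same downtime window, which is what child_options alone does not give you.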

Gotcha 2a: Child hosts will (almost) never become UNREACHABLE when the parent host fails and Dependency.disable_checks == true

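object Host "A" {
  check_command = "hostalive"
  address = "192.0.2.1"   // placeholder address
}

object Host "B" {
  check_command = "hostalive"
  address = "192.0.2.2"   // placeholder address
}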

object Dependency "B-needs-A" {
  parent_host_name = "A"
  child_host_name = "B"
  disable_notifications = true
  disable_checks = true
}

In this example, when host A goes down, the B-needs-A dependency is activated and notifications about B are suppressed (because disable_notifications = true). However, because checks are also disabled, host B never becomes UNREACHABLE, unless you manually trigger a check via the Icingaweb2 interface.

This means that any service on the child host (B in this example) will still generate notifications, because the (default) host-service dependencies are not activated until the child host becomes UNREACHABLE. (Of course, any other non-UP state of the child host would also activate the host-service dependencies.) The same goes for grandchild hosts.

So, if you want a child host to become UNREACHABLE when the parent host fails, Dependency.disable_checks must be false. Even then, the child host becomes UNREACHABLE only once its own check fails.
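For comparison, here is the B-needs-A dependency from the example above with checks left enabled. Since false is the default for disable_checks, simply omitting the line has the same effect; it is spelled out here only for emphasis.

object Dependency "B-needs-A" {
  parent_host_name = "A"
  child_host_name = "B"
  disable_notifications = true
  // Keep checking B while A is down: B's next failed check then
  // marks it UNREACHABLE, which in turn activates the host-service
  // dependencies for B's services.
  disable_checks = false   // the default, shown for emphasis
}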

Gotcha 2b: Grandchild dependencies don’t become active until the child/parent in between them fails

Dependencies are always between a parent and a child. Icinga never traverses further along the tree to determine that a grandchild should be UNREACHABLE rather than DOWN.

Take the following setup:

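object Host "A" {
  check_command = "hostalive"
  address = "192.0.2.1"   // placeholder address
}

object Host "B" {
  check_command = "hostalive"
  address = "192.0.2.2"   // placeholder address
}

object Host "C" {
  check_command = "hostalive"
  address = "192.0.2.3"   // placeholder address
}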

object Dependency "B-needs-A" {
  parent_host_name = "A"
  child_host_name = "B"
  disable_notifications = true
}

object Dependency "C-needs-B" {
  parent_host_name = "B"
  child_host_name = "C"
  disable_notifications = true
}

If host A fails, host B doesn’t become UNREACHABLE until its check_command returns a not-OK status. The same goes for hosts B and C. And, despite disable_notifications = true, problems with host C will generate notifications as long as host B is UP. Therefore, to avoid needless notifications, you must always make sure that the hosts go down in the order of the dependency chain. You can do this by tuning check_interval, max_check_attempts, and retry_interval. Also, make sure that disable_checks is always false for any intermediate host or service in the dependency chain!
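A rough sketch of such tuning for hosts A and B. This is a variant of their definitions, not an addition to them, and all values (including the placeholder addresses) are illustrative assumptions rather than recommendations. The idea: with the Dependency attribute ignore_soft_states at its default of true, only a HARD parent state counts, so A should reach HARD DOWN before B runs out of check attempts. With these numbers, A needs at most check_interval + (max_check_attempts - 1) × retry_interval = 60s + 15s = 75s to go HARD after failing, while B needs at least (max_check_attempts - 1) × retry_interval = 3 × 30s = 90s. This ignores check execution time and scheduling jitter, so leave a wider margin in practice. C would follow the same pattern with an even longer window than B.

object Host "A" {
  check_command = "hostalive"
  address = "192.0.2.1"     // placeholder address
  // Parent: detect the failure quickly and go HARD early
  check_interval = 1m
  retry_interval = 15s
  max_check_attempts = 2
}

object Host "B" {
  check_command = "hostalive"
  address = "192.0.2.2"     // placeholder address
  // Child: more attempts before going HARD, so that A is
  // already HARD DOWN by the time B's attempts run out
  check_interval = 1m
  retry_interval = 30s
  max_check_attempts = 4
}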

