...
Creating a new Event Profile
Each new ZEN Master environment provides the ‘Default’ Event Profile, which cannot be edited. However, the default profile can be cloned to create new editable profiles.
...
The Event Profile table is split into sections based on object type, and within each type, rules are grouped by similar state issues.
...
With a new or cloned Event Profile, each individual rule can be edited to reconfigure the escalation policy so that it best suits the needs of production and monitoring operations teams.
Whenever you modify an event rule, be sure to click the Save button. You can also see the save status of the rule next to its title. The system will confirm “Not Saved” or “Modified” after saving.
...
After an event rule has been modified, it can be set back to its default values by using the ‘Reset’ button.
...
Event Profile Alert Rule Types
All rules within the Event Profile fall under one of two rule types:
Boolean Rule Type
Threshold Rule Type
Boolean Rule Type
The "Boolean rule type" is used for static states where there is no meaningful scale of degradation; the state is simply ‘OK’ or not.
...
Broadcaster - Configuration - Error Rule
For example, the Broadcaster Configuration Error rule is triggered when a broadcaster’s configuration does not match the configuration expected by ZEN Master. Historical data and previous confirmation results are not used in this case. When this error occurs, it must be addressed.
This type of rule offers three options that can be enabled/disabled:
Generate OK Notification - If enabled, the system will generate an ‘OK’ alert email and send it to the users and groups who have email privileges for this object.
Generate Error Notification - If enabled, the system will generate an ‘Error’ alert email and send it to the users and groups who have email privileges for this object.
Ignore object state for notification generation - ZEN Master objects can have multiple issues, e.g., a Broadcaster with high CPU and a full hard drive. An object already in an error state may later raise an error for a second reason. By default, the initial email notification is considered sufficient, but in cases where a specific issue warrants a second email, select this option to force a notification even if a previous email has been sent.
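The interaction of the three options can be illustrated with a minimal Python sketch. This is not ZEN Master code: the class, field, and function names are hypothetical, and the assumption that ‘OK’ emails depend only on the first option is ours.

```python
from dataclasses import dataclass


@dataclass
class BooleanRule:
    # Hypothetical field names mirroring the three options described above.
    generate_ok_notification: bool = True
    generate_error_notification: bool = True
    ignore_object_state: bool = False


def should_notify(rule: BooleanRule, new_rule_state: str, object_already_in_error: bool) -> bool:
    """Decide whether an alert email would be generated for a rule state change."""
    if new_rule_state == "OK":
        # Assumption: OK emails depend only on the 'Generate OK Notification' option.
        return rule.generate_ok_notification
    if new_rule_state == "Error":
        if not rule.generate_error_notification:
            return False
        # An object already in error has been reported once; a second email
        # is sent only when the rule ignores the object state.
        return rule.ignore_object_state or not object_already_in_error
    raise ValueError(f"unexpected state: {new_rule_state!r}")
```

With the defaults, a second error on an object that is already in error produces no email; enabling the third option forces one.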
Boolean Rule Type with Escalation Control
Some Boolean rules provide further controls for how the system manages and alerts on error states within a definable time window. These rules are designed to serve two purposes:
To address flapping states - Flapping states occur when a problem is intermittent, for example, when a monitored object jumps continually between good and bad states. Without this control, users can be inundated with email.
To address continuous warning states - The rule guarantees that an object will not sit in a continual warning state without ever escalating to an Error status or generating an alert email. It ensures that operators are made aware of the issue.
The "Escalation Rule" provides additional controls under the Event Type toggle to tune ZEN Master's propensity to either subdue warnings or escalate them to full Errors and send an email notification.
When the Event Type toggle is set to ‘Error’, the rule will immediately escalate an issue to an Error status, bypassing any intermediate warning state.
...
When it is set to ‘Warning’, the rule provides two additional controls:
Event Escalation Count – the number of events needed to occur within a window for the system to escalate to a full Error.
Escalation/de-escalation window – specifies the timeframe, in minutes, within which to escalate or de-escalate.
...
If the event type configuration is set to ‘Warning’, the following will occur:
The object status will elevate to a ‘Warning’ upon the first and any subsequent continuous occurrences of the event.
It will return to ‘OK’ if the state recovers.
Subsequent events within the allotted time window will again limit the escalation to a Warning status.
The object will escalate to an Error status when the total number of event occurrences within the configured time window first reaches the Event Escalation Count. If the event does not occur again within the time window, the object will de-escalate back to an OK state.
If the event type configuration is set to ‘Error’ the following will occur:
The event is treated as an Error upon its first occurrence. When the event type is set to Error, there is no de-escalation process.
A notification is sent.
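The difference between the two Event Type settings can be sketched in Python as a counter over a sliding time window. This is an illustrative model only: the class and method names are hypothetical, timestamps are plain minutes, and each call is assumed to represent one distinct event occurrence.

```python
from collections import deque


class EscalationRule:
    """Hypothetical model of a Boolean rule with escalation control."""

    def __init__(self, event_type="Warning", escalation_count=4, window_minutes=10):
        self.event_type = event_type
        self.escalation_count = escalation_count
        self.window_minutes = window_minutes
        self.events = deque()  # timestamps (in minutes) of recent events

    def on_event(self, now_minutes):
        """Return the status that results from one new event occurrence."""
        if self.event_type == "Error":
            # 'Error' bypasses the warning stage entirely.
            return "Error"
        self.events.append(now_minutes)
        # Forget events that have fallen out of the escalation window.
        while self.events and now_minutes - self.events[0] >= self.window_minutes:
            self.events.popleft()
        if len(self.events) >= self.escalation_count:
            return "Error"
        return "Warning"
```

For example, with a count of 4 and a 10-minute window, events at minutes 0, 2, 5, and 8 give Warning three times and then Error, while events at minutes 0, 4, 8, and 12 never escalate because the first event has left the window by the time the fourth arrives.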
The ‘Track only, do not update object state’ option specifies that any escalation event is to be registered in the event log only. This option is useful when the issue occurrence itself is not a primary reason to alert the operator, but is useful information to log for later. If the event type configuration is set to ‘Track only’, the following will occur:
The ZEN Master alert system will obey the escalation settings, as described above.
The issue triggering the rule, and any resulting escalation, will not escalate the object itself to a warning or error.
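A small Python sketch of the ‘Track only’ behavior follows. It assumes, purely for illustration, that an object's status is the most severe status among its rules; the function name and the tuple layout are hypothetical.

```python
SEVERITY = {"OK": 0, "Warning": 1, "Error": 2}


def summarize(rule_results):
    """rule_results: list of (rule_name, rule_state, track_only) tuples.

    Returns (object_state, event_log). Track-only rules are logged but do
    not contribute to the object state.
    """
    # Every non-OK result is registered in the event log.
    event_log = [(name, state) for name, state, _ in rule_results if state != "OK"]
    # Only rules that are NOT track-only can change the object state.
    counted = [state for _, state, track_only in rule_results if not track_only]
    object_state = max(counted, key=SEVERITY.__getitem__, default="OK")
    return object_state, event_log
```

Here a track-only rule in Error still appears in the log, but the object stays at whatever status its other rules dictate.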
The Threshold Rule Type
The Threshold Rule provides controls for events where both a warning threshold and an error threshold are relevant. There are two categories of Threshold Rule.
Some metrics cannot be noisy or flap above and below the defined threshold. For example, the broadcaster license expiration warning can be set to a specific number of days in advance of expiration. The number of days remaining on the license can only decrease.
...
In such a case the object is escalated to a warning status when the first threshold is met and escalated to an error with corresponding notification emails when the second threshold is met.
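For a metric that only moves in one direction, the mapping from value to status is a pair of comparisons. The Python sketch below uses made-up thresholds of 30 and 7 days; the function name and default values are assumptions for illustration, not ZEN Master defaults.

```python
def license_state(days_remaining, warning_days=30, error_days=7):
    """Map days left on a license to a status using two thresholds."""
    if days_remaining <= error_days:
        return "Error"      # second threshold met
    if days_remaining <= warning_days:
        return "Warning"    # first threshold met
    return "OK"
```

Because the remaining days can only decrease, the status can only move from OK to Warning to Error, so no de-escalation handling is needed.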
The second category of Threshold Rules is used for volatile metrics and provides controls for events where both thresholds are relevant, and de-escalation can occur, such as CPU Usage.
For this type of event, ZEN Master will again escalate the object to a Warning or Error status when the corresponding threshold is crossed. However, once the value falls back and stays below the threshold for the specified de-escalation window duration, the object will de-escalate back to the state of the most recent confirmation result.
When user notifications are enabled, they will occur on ‘Error’ and/or ‘OK’ events.
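The behavior of a volatile threshold can be sketched in Python as follows. Escalation is immediate, while de-escalation waits until readings have stayed below the current level for the whole de-escalation window. The class name, the thresholds of 80 and 95, and the 5-minute window in the usage example are hypothetical.

```python
SEVERITY = {"OK": 0, "Warning": 1, "Error": 2}


class VolatileThreshold:
    """Hypothetical two-threshold rule with a de-escalation window."""

    def __init__(self, warning, error, deescalation_minutes):
        self.warning = warning
        self.error = error
        self.deescalation_minutes = deescalation_minutes
        self.state = "OK"
        self.lower_since = None  # when readings first dropped below the current state

    def _level(self, value):
        if value >= self.error:
            return "Error"
        if value >= self.warning:
            return "Warning"
        return "OK"

    def observe(self, now_minutes, value):
        level = self._level(value)
        if SEVERITY[level] >= SEVERITY[self.state]:
            # Escalate (or hold) immediately and restart the quiet period.
            self.state = level
            self.lower_since = None
        else:
            if self.lower_since is None:
                self.lower_since = now_minutes
            # De-escalate only after a full window of lower readings,
            # adopting the level of the most recent reading.
            if now_minutes - self.lower_since >= self.deescalation_minutes:
                self.state = level
                self.lower_since = None
        return self.state
```

With `VolatileThreshold(80, 95, 5)`, a CPU spike to 97 escalates to Error at once, and the object returns to OK only after readings have stayed low for five minutes. A second spike inside that period restarts the wait.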
Applying Event Profiles
Once defined, an Event Profile can be applied individually to each resource such as Channels, Broadcasters, Sources, etc. in ZEN Master. The “Events Configuration Profile” pull-down is shown when creating or editing an object in ZEN Master, where the appropriate event profile can be selected.
...
If an actively applied profile is edited and saved, the change will propagate to all objects using the event profile. If a new profile is created, it can be applied to many objects at a time, using the multi-select and edit capability on the list pages for sources, broadcasters, etc. In either case, changing an object's event profile will not cause a restart. In addition, the change will not affect any active escalations, i.e., if an object is in an ‘Error’ state and its profile is edited to disable the actively triggered rule, the object will stay in Error until the issue is resolved.
Escalation Rule Example
Consider a healthy Zixi Source that suddenly encounters intermittent packet loss over the course of several minutes, which eventually recovers. Notifications for packet loss, resulting in CC Errors, are controlled by the “Source – Analysis – TR101 P1 Transport” Rule of the Event Profile, which is configured as follows:
...
The escalation window is set to 10 minutes and the escalation count to 4 events. According to the rule, notification emails should be sent for both Error and OK states.
The resulting behavior of the monitoring system is as follows:
...
ZEN Master checks the source. The source is actively streaming, and packets are delivered. For each confirmation during the rolling escalation window, the response is good.
...
The source has recovered, and at the next check the response indicates that the source is good. ZEN Master returns the source to a good state.
...
Further packet loss occurs, and the following checks return non-positive results indicating a source exhibiting CC Errors. ZEN Master returns the source to a warning state. Note that the cumulative event total is now 2, and that consecutive non-positive results are considered a single ongoing event.
...
Monitoring of the source continues, returning several more CC Error results between two consecutive sets of good responses. The event count has reached 3 when the following check returns another non-positive result. The event count within the window is now 4, meeting the escalation requirement. ZEN Master now escalates the state of the source to an Error. Due to the notification settings, an Alert email is sent to users with notification privileges on the source.
...
The next and subsequent confirmations return positive results, but the Source state has already met the Error criteria within the window and remains in Error.
...
Once the Error criteria are met, further responses indicating CC Errors within the window will cause the Error state on the Source to persist.
...
The source recovers, and subsequent checks return a positive state. After a full de-escalation window of good responses, ZEN Master returns the source to a Good status. Due to the notification settings, an Alert email is sent to all users with notification privileges on the source.
...
In scenario 2, the healthy source encounters sustained interruption before recovery.
The healthy source begins to return non-positive responses indicating CC Errors, and ZEN Master places the source in a warning state. The cumulative event count is 1.
Packet loss is ongoing for almost the full duration of the escalation window, and the confirmations continue to return non-positive responses indicating CC Errors. The event count remains 1 and the warning state persists.
Upon another non-positive response, once the full duration of the escalation window has been reached, ZEN Master escalates the state of the source to an Error. Note that in this scenario the event count did not exceed one, but continual non-positive events for the duration of the window result in an Error state. Due to the notification settings, an email Alert is generated to all users with notification privileges on the source.
The source recovers, and subsequent confirmations return a good state. After a full de-escalation window of good responses (i.e., 10 minutes and 4 event counts), ZEN Master finally returns the source to an OK state. ZEN Master generates an Alert email to all users with notification privileges on the source.
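Both scenarios can be replayed with a small Python simulation. It is a sketch of the behavior described in this section, not ZEN Master's implementation. The class name is hypothetical, checks are assumed to arrive once per minute, and de-escalation from Error is assumed to require a full window of consecutive good checks.

```python
class SourceMonitor:
    """Hypothetical tracker for one escalation rule on one source."""

    def __init__(self, escalation_count=4, window_minutes=10):
        self.escalation_count = escalation_count
        self.window = window_minutes
        self.state = "OK"
        self.event_starts = []   # start times of distinct events
        self.bad_since = None    # start of the ongoing bad run, if any
        self.good_since = None   # start of the current good run, if any

    def check(self, now, ok):
        """Process one check result taken at minute `now`."""
        if ok:
            self.bad_since = None
            if self.good_since is None:
                self.good_since = now
            if self.state == "Warning":
                self.state = "OK"  # a warning clears as soon as the source recovers
            elif self.state == "Error" and now - self.good_since >= self.window:
                self.state = "OK"  # a full window of good responses
                self.event_starts.clear()
            return self.state
        self.good_since = None
        if self.bad_since is None:
            # Consecutive bad results count as a single ongoing event.
            self.bad_since = now
            self.event_starts.append(now)
        # Keep only the events that started inside the rolling window.
        self.event_starts = [t for t in self.event_starts if now - t < self.window]
        sustained = now - self.bad_since >= self.window
        if self.state != "Error":
            if len(self.event_starts) >= self.escalation_count or sustained:
                self.state = "Error"
            else:
                self.state = "Warning"
        return self.state


# Scenario 1: four separate bursts of CC Errors within ten minutes.
flapping = SourceMonitor()
bad_minutes = {1, 3, 4, 6, 8}
scenario_1 = [flapping.check(t, t not in bad_minutes) for t in range(10)]

# Scenario 2: one continuous run of CC Errors lasting the whole window.
sustained = SourceMonitor()
scenario_2 = [sustained.check(t, False) for t in range(11)]
```

In scenario 1 the source alternates between Warning and OK until the fourth event at minute 8 escalates it to Error. In scenario 2 the event count never exceeds one, yet the source escalates to Error at minute 10, when the bad run has lasted the full window.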
...