One of the main aspects of monitoring connected products is their health status and the presence of failures.
Failure events can be used to identify the presence of critical problems, for instance the product is blocked, down, or unavailable.
A Failure is described by:
- Name: the name of the failure (e.g. TAPE_ERROR_1).
- Title: the user-friendly name of the event (e.g. Tape blocked).
- Description: the event description (e.g. The tape is blocked due to the presence of.....).
- Group: the name of the group to which this event belongs. The group can be used to filter and group events in the DPS pages.
- Topics: the set of topics to which this event relates (e.g. CONNECTION, UPTIME, MAINTENANCE). Topics are predefined and can be used to filter and group events in the DPS pages.
- Active/Clear Conditions: the metric-based or property-based conditions used to activate or deactivate the event instance. For property-based conditions, only properties of type DATE can be used.
- Troubleshooting: a set of remedies the user can follow to fix the problem on their own.
- Options: a set of variables that let the user customize the thresholds for event triggering.
- Alert: how users are notified of the event.
Note that a FAILURE event is associated by default with CRITICAL severity and the FAILURE category.
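To make the descriptor easier to picture, here is a minimal sketch of the fields listed above as a Python dataclass. The class name `FailureDefinition` and its attribute names are hypothetical and chosen for illustration only; they are not the platform's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureDefinition:
    """Hypothetical in-memory model of a Failure (not the platform's schema)."""

    name: str                                         # e.g. "TAPE_ERROR_1"
    title: str                                        # e.g. "Tape blocked"
    description: str = ""
    group: Optional[str] = None                       # used to filter and group events
    topics: List[str] = field(default_factory=list)   # e.g. ["UPTIME", "MAINTENANCE"]
    severity: str = "CRITICAL"                        # default for FAILURE events
    category: str = "FAILURE"                         # default for FAILURE events


tape_error = FailureDefinition(
    name="TAPE_ERROR_1",
    title="Tape blocked",
    topics=["UPTIME", "MAINTENANCE"],
)
```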
Creating a Failure
To add a new Failure event, you should:
- Enter the Events / Failures page.
- Select the Location or Thing Definitions tab, depending on where you want to create the event.
- In case of Thing Definition, select the Thing Definition to edit.
- Press the Add Event button.
- Provide the required information.
- Press the Save button and edit the additional information, if needed.
Editing a Failure
A Failure event is described by the following sections:
General
The main section, which describes the alert through these properties:
- Name: the name of the alert; it is a free value (e.g. TEMP WARNING).
- Title: the title shown for the alert in the alert list.
- Description: the text describing the problem that occurred.
- Severity: fixed to CRITICAL.
- Category: fixed to FAILURE.
By selecting the Limit the alert visibility depending on the user type checkbox, it is possible to profile the alert visibility to a specific set of User Types.
By selecting the Limit this event computation on a specific time period checkbox, it is possible to define a Start and End day of the year when the event can be activated.
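As an illustration of how a Start/End day-of-year limit can behave, the sketch below checks whether a date falls inside such a period. It rests on two assumptions that are not confirmed by this article: both boundary days are included, and a period whose Start is later than its End wraps around the new year. The function name `in_active_period` is hypothetical.

```python
from datetime import date
from typing import Tuple


def in_active_period(today: date, start: Tuple[int, int], end: Tuple[int, int]) -> bool:
    """Return True if today's (month, day) lies in the inclusive period.

    Assumption: a period whose start is later than its end wraps around
    the new year (e.g. 1 November to 31 March).
    """
    current = (today.month, today.day)
    if start <= end:
        return start <= current <= end
    return current >= start or current <= end


# A winter period from 1 November to 31 March
winter = ((11, 1), (3, 31))
print(in_active_period(date(2024, 1, 15), *winter))
print(in_active_period(date(2024, 6, 1), *winter))
```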
Within the event Title and Description, you can use placeholders to include information about the thing, the event, and measures.
Dashboard
- Event Template: the template used to display the event details in a dedicated dashboard.
Active Condition
This section allows defining when the event must be activated.
Note that an event is evaluated periodically and is triggered only when the active condition changes from false to true and remains true for at least one evaluation interval (e.g. 60 seconds). As a result, metric values that last less than one evaluation interval may go undetected and fail to trigger the event.
In addition, you can specify the Minimum Active Time, that is, how long the condition must remain true before the event is activated.
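The evaluation behavior described above can be sketched as a small simulation. This is a simplified model, not the platform's implementation: the function `activation_ticks` is hypothetical, the Minimum Active Time is expressed as a number of evaluation ticks, and the event is assumed to clear as soon as the condition turns false.

```python
def activation_ticks(samples, min_active_ticks=1):
    """Return the evaluation ticks at which the event would activate.

    samples: condition results (True/False), one per evaluation tick.
    min_active_ticks: consecutive True ticks required before activation
        (a stand-in for Minimum Active Time, expressed in ticks).
    """
    activations = []
    streak = 0       # consecutive True evaluations so far
    active = False   # whether the event is currently active
    for tick, value in enumerate(samples):
        if value:
            streak += 1
            if not active and streak >= min_active_ticks:
                active = True
                activations.append(tick)
        else:
            streak = 0
            active = False  # assumption: cleared when the condition is false
    return activations


# One sample per evaluation interval (e.g. every 60 seconds)
samples = [False, True, True, False, True]
print(activation_ticks(samples))                      # activates at ticks 1 and 4
print(activation_ticks(samples, min_active_ticks=2))  # activates at tick 2 only
```

A spike that starts and ends between two evaluations never appears in `samples`, which is why short-lived values may not trigger the event.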
The condition can be defined by selecting between:
- Manual: the event is activated manually by using the Manual Event Reporting widget. Once the event has been manually activated, it can be cleared automatically through the Clear Condition or manually through the Clear action within the Active Alert List widget.
- Simple: you can select a metric, predicate, and specify a value that can be static (e.g. Temperature 1 > 100 °C), read from another metric (e.g., Temperature 1 > Temperature 2), or read from a property (e.g. Temperature > thing.properties.maxTemperature).
- Expression: you can select one or more metrics and properties and combine the values into a mathematical expression (e.g. (temp1 + temp2) / 2).
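The difference between Simple and Expression conditions can be sketched in Python as follows. This only illustrates the logic; it is not the platform's condition syntax. The helper names, the `values` mapping, and the threshold of 100 applied to the expression are assumptions made for the example.

```python
import operator

# Predicates available to the hypothetical simple condition
PREDICATES = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def simple_condition(values, metric, predicate, threshold):
    """Compare a metric with a threshold.

    threshold is either a number (static value) or the name of another
    metric or property available in the `values` mapping.
    """
    right = values[threshold] if isinstance(threshold, str) else threshold
    return PREDICATES[predicate](values[metric], right)


def expression_condition(values):
    """Expression example: the mean of two temperatures exceeds 100 (assumed threshold)."""
    return (values["temp1"] + values["temp2"]) / 2 > 100


values = {"temp1": 110, "temp2": 95, "maxTemperature": 105}
print(simple_condition(values, "temp1", ">", 100))               # static value
print(simple_condition(values, "temp1", ">", "temp2"))           # another metric
print(simple_condition(values, "temp2", ">", "maxTemperature"))  # a property
print(expression_condition(values))
```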
Clear Condition
This section allows defining when the event must be cleared.
If unspecified, the negated active condition is used instead.
The condition can be defined by selecting between:
- Simple: you can select a metric, predicate, and specify a value that can be static (e.g. Temperature 1 > 100 °C), read from another metric (e.g., Temperature 1 > Temperature 2), or read from a property (e.g. Temperature > thing.properties.maxTemperature).
- Expression: you can select one or more metrics and properties and combine the values into a mathematical expression (e.g. (temp1 + temp2) / 2).
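To show the effect of leaving the Clear Condition unspecified, the sketch below compares the default (the negated active condition) with an explicit, lower clear threshold. The functions `next_state` and `run`, and the thresholds 100 and 90, are hypothetical values chosen for illustration.

```python
def next_state(active, value, activate_above=100.0, clear_below=None):
    """Return the new active state after one evaluation.

    If clear_below is None, the clear condition falls back to the
    negation of the active condition (value <= activate_above).
    """
    if not active:
        return value > activate_above
    if clear_below is None:
        return value > activate_above   # cleared as soon as the active condition is false
    return not (value < clear_below)    # stays active until the explicit clear condition holds


def run(readings, clear_below=None):
    """Apply next_state to a sequence of readings and collect the states."""
    state, states = False, []
    for value in readings:
        state = next_state(state, value, clear_below=clear_below)
        states.append(state)
    return states


readings = [101, 95, 85]
print(run(readings))                  # default: cleared at 95
print(run(readings, clear_below=90))  # explicit clear condition: cleared at 85
```

With an explicit clear threshold below the activation threshold, the event does not flap when the metric hovers around 100.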
Technical Description and Remedies
Here you can provide a more technical description of the event that has occurred, along with its causes and impacts. Optionally, you can provide a set of remedies the user can follow to fix the problem on their own.
Remedies are presented to the user through the Thing Troubleshooting widget.
Options
This section allows defining options that are used within the event definition and whose values can be redefined by the end-user within the page by using the thing-options widget.
For more details, see the Options article.
Alert
This section allows defining whether users must be notified of the event through an Alert.
By clicking the Add Alert button, you can create a new Alert for the event.
Event Refactoring
If you have mistakenly defined a failure that should instead be an anomaly, you can convert the event by clicking the Convert to Anomaly button at the bottom of the page.