Differences

This shows you the differences between two versions of the page.

wsrtinfo:sensoralarmdescription [2008/06/26 09:44] arnoschoenmakers
wsrtinfo:sensoralarmdescription [2009/08/24 07:11] (current) arnoschoenmakers
Line 1: Line 1:
 ====== Description of sensorAlarm ======
 
-SensorAlarm pools the database filled by ''readsensor'' and checks for alarm conditions. Alarms are given based on criteria. The table with all criteria and actions is given below.
+Program ''sensorAlarm'' polls the database filled by ''readsensor'' and checks for alarm conditions. Alarms are raised based on criteria that are provided in a file. The table with all criteria and actions is given [[#alarmconditions|below]].
 + 
 +===== Location of the code ===== 
 + 
 +The code for ''sensorAlarm'' can be found in CVS, in the module ''wsrt_mac/alarms''. The following files are used by the ''sensorAlarm'' program:   
 +   * ''alarmActions.py'' -> class to execute shell-commands and send mail to the observer on duty. 
 +   * ''class_sensorAlarm.py'' -> class that does most of the things 
 +   * ''DBConnection.py'' -> class for interfacing with a MySQL database  
 +   * ''log.py'' -> defines a function to print a log message with date/time stamp. 
 +   * ''sensorAlarm.py'' -> Main program 
 + 
 +The program can be started by executing ''sensorAlarm.py''; no commandline options are needed for a default run. It should be started by user ''monitor'' on the WSRT monitor host (now wop10). The files should be present in directory ''/wsrt/bin'' on wop10. 
 + 
 +===== Component description ===== 
 + 
 +==== alarmActions class ==== 
 + 
 +The alarmActions class handles the execution of shell-commands. It provides a mechanism to stop a command after it has been running for a certain amount of time (timeout). The default timeout is one second. 
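
The timeout mechanism described above can be sketched as follows. This is a minimal illustration, not the code from ''alarmActions.py'': it assumes Python 3 and uses the standard ''subprocess'' module instead of the ''popen2''/''signal'' combination used by the original program, and the function name ''run_with_timeout'' is hypothetical.

```python
import subprocess


def run_with_timeout(command, timeout=1.0):
    """Run a shell command and report whether it succeeded.

    Returns (ok, reason). The default timeout of one second follows the
    description in the text; everything else here is an assumption.
    """
    try:
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return False, "timeout after %.1f s" % timeout
    if result.returncode != 0:
        return False, "exit status %d" % result.returncode
    return True, "ok"
```

A caller would notify the observer (for example via the sendMail program) whenever the first element of the returned tuple is false.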
 + 
 +In case of problems with executing a command (timeout, or a non-zero exit status) the observer will be notified by email/SMS. This uses the [[wsrtinfo:sendmaildescription|sendMail]] program.  
 + 
 +Log messages are written to a logfile (see ''log.py'').
 + 
 +==== sensorAlarm class ==== 
 + 
 +This class is described in the file ''class_sensorAlarm.py''. This class holds all the methods that do the actual checking and alarming. 
 + 
 +The class reads the file holding the alarmconditions. At predefined intervals (to be set at startup of the main program) it will go through the list of alarmconditions and evaluate each condition to be true or false. In case a condition is true, an alarm must be raised. That involves: 
 +  * Executing the action that is described in the alarmconditions file 
 +  * Setting the alarmstatus of all the associated sensors to ''on'' in the Sensor database(s). 
 + 
 +Before obtaining the values, we first check whether the sensor database values are recent enough to trust. The limit is 10 minutes: the alarmconditions are evaluated only if the database contains values less than 10 minutes old. All values are taken from the local MySQL database (the one running on wop10).
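
A minimal sketch of this freshness check, assuming the timestamp of the newest database row is available as a ''datetime'' object. The names ''MAX_AGE'' and ''is_recent'' are hypothetical and do not come from the original code.

```python
from datetime import datetime, timedelta

# Limit taken from the text: values must be less than 10 minutes old.
MAX_AGE = timedelta(minutes=10)


def is_recent(latest_timestamp, now=None, max_age=MAX_AGE):
    """Return True if the newest sensor value is young enough to trust."""
    if now is None:
        now = datetime.now()
    return now - latest_timestamp < max_age
```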
 + 
 +The ''alarmconditions.cfg'' file has a column that gives the number of individual samples/measurements of a sensor value to take into account. The samples are combined as follows: 
 +  * In case of 1 or 2 measurements, we just take the sample (or the mean of the two). 
 +  * In case of more than 2 samples, we determine the time interval between the latest two samples, and allow only values that are younger than this time interval multiplied by the number of samples to take into account. This is done so that when the database is not filled for a while (e.g., due to maintenance, or a software problem), only the newer values from after the interruption are included. 
 +  * Of all the 'valid' values, the median is calculated (instead of the average), so we are less vulnerable to occasional large outliers (e.g., in case of a sensor readout problem). 
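
The selection rules above can be sketched as follows. This is an illustration under stated assumptions, not the code from ''class_sensorAlarm.py'': samples are assumed to be ''(timestamp_in_seconds, value)'' pairs, the age of a sample is assumed to be measured relative to the newest sample, and the function name ''representative_value'' is hypothetical.

```python
from statistics import median


def representative_value(samples, count):
    """Combine the newest `count` samples into one value.

    samples: list of (timestamp_in_seconds, value) pairs, in any order.
    count:   number of samples to take into account (config file column).
    """
    # Newest first, limited to the configured number of samples.
    latest = sorted(samples, key=lambda s: s[0], reverse=True)[:count]
    if not latest:
        raise ValueError("no samples available")
    if len(latest) <= 2:
        # One sample: the sample itself; two samples: their mean.
        values = [value for _, value in latest]
        return sum(values) / len(values)
    newest_time = latest[0][0]
    # Interval between the two latest samples, scaled by the sample count.
    max_age = (newest_time - latest[1][0]) * count
    valid = [value for t, value in latest if newest_time - t < max_age]
    return median(valid)
```

With samples at 0, 60, 3000, 3060 and 3120 seconds and ''count'' set to 5, only the three samples after the gap are used, so stale values from before the interruption do not influence the result.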
 + 
 +==== DBConnection class ==== 
 + 
 +This class is the interface with a MySQL database. A default constructor connects to the local MySQL server and the database ''Sensor'', but by providing other parameters in the constructor other servers and databases can be reached as well. It has methods to connect, to check if a connection is still valid, to run a 'select'-type query and to run an 'update'-type query. 
 + 
 + 
 +==== sensorAlarm main program ==== 
 + 
 +The main program can be found in file ''sensorAlarm.py''.
 + 
 +It handles the following things: 
 +  * Interpret commandline options. 
 +  * Creation of a logfile with a unique name in directory ''/log/alarms''
 +  * Daemonize itself so the user can close the shell after starting the program. 
 +  * Run a never-ending loop in which alarmconditions are checked at regular intervals. 
 + 
 +=== Command-line options === 
 + 
 +To handle the commandline options we use the module [[http://docs.python.org/lib/module-optparse.html|optparse]]. This module provides simple functions to define and parse commandline options. The available options are:<code> 
 +  -h, --help            show this help message and exit 
 +  -iINTERVAL, --interval=INTERVAL 
 +                        Check interval time in seconds (prefer multiples of 60) 
 +  -fFILE, --inputfile=FILE 
 +                        Name of input alarmconditions config file 
 +  -n, --nodaemonize     Don't run as a daemon process 
 +</code> 
 +The default interval is 120 seconds. The default alarmconditions config file is ''/wsrt/config/alarmconditions.cfg''.
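
An option parser that produces the options listed above could look roughly like this. The option names, help texts and defaults are taken from this page; the ''dest'' names and the function name ''build_parser'' are assumptions, since the original source is not shown here.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for the documented sensorAlarm options."""
    parser = OptionParser()
    parser.add_option(
        "-i", "--interval", dest="interval", type="int", default=120,
        metavar="INTERVAL",
        help="Check interval time in seconds (prefer multiples of 60)")
    parser.add_option(
        "-f", "--inputfile", dest="inputfile",
        default="/wsrt/config/alarmconditions.cfg", metavar="FILE",
        help="Name of input alarmconditions config file")
    parser.add_option(
        "-n", "--nodaemonize", dest="daemonize", action="store_false",
        default=True, help="Don't run as a daemon process")
    return parser


# Example: check every 60 seconds and stay in the foreground.
options, args = build_parser().parse_args(["-i", "60", "-n"])
```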
 + 
 +=== Logfile === 
 + 
 +The logfile is written only when the process is daemonized; otherwise the log messages are written to the terminal. The logfile ends up in directory ''/log/alarms''. Logfiles are named ''sensorAlarm.yyyymmddThhmmss'', i.e., with the date/time stamp of creation attached. A softlink to the latest logfile, named ''sensorAlarm.log'', is always created as well.
 + 
 +Also, there is a file called ''sensorAlarm.pid'' that holds the process ID (PID) of the daemonized process. This can be used to kill the process if needed. 
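
A sketch of how the logfile name, the softlink and the PID file could be created. The function names are hypothetical, and placing ''sensorAlarm.pid'' in the log directory is an assumption: this page does not state where the PID file is written.

```python
import os
import time


def create_logfile(log_dir, now=None):
    """Create sensorAlarm.yyyymmddThhmmss and point sensorAlarm.log at it."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.localtime(now))
    path = os.path.join(log_dir, "sensorAlarm." + stamp)
    open(path, "a").close()
    link = os.path.join(log_dir, "sensorAlarm.log")
    # Replace a softlink left behind by a previous run.
    if os.path.islink(link) or os.path.exists(link):
        os.remove(link)
    os.symlink(os.path.basename(path), link)
    return path


def write_pidfile(directory):
    """Store the PID of the current process in sensorAlarm.pid."""
    pid_path = os.path.join(directory, "sensorAlarm.pid")
    with open(pid_path, "w") as handle:
        handle.write("%d\n" % os.getpid())
    return pid_path
```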
 + 
 +=== The main loop === 
 + 
 +Before we start the loop, we read the alarmconditions file, set up the database connection handlers for two databases (two, as the alarm flag needs to be updated both locally and on the WSRT MySQL server), and reset all alarms in these two databases. 
 + 
 +In the main loop, we do the following things: 
 +  * Check if the local database can still be reached. If not, raise an alarm by sending an email/SMS to the observer. 
 +  * Check if there is recent (less than 10 minutes old) data in the database. If not, raise an alarm by sending an email/SMS to the observer. 
 +  * Check all alarmconditions. If an alarmcondition is met, it will be reported and acted upon only once. If the alarmcondition is over, this will be reported as well (and the alarm flag in the database will be retracted). 
 +  * Wait until the next requested checktime. 
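
The "report only once" behaviour of the third step can be sketched with a set that remembers which alarmconditions are currently active. This is an assumed bookkeeping scheme for illustration; the function name ''update_alarm_states'' and the message texts are hypothetical.

```python
def update_alarm_states(active, results, report):
    """Report newly raised and newly cleared alarms exactly once.

    active:  set of conditions currently in alarm (updated in place).
    results: dict mapping each condition to True (met) or False (not met).
    report:  callable that receives one message string per state change.
    """
    for condition, is_met in results.items():
        if is_met and condition not in active:
            active.add(condition)
            report("ALARM raised: %s" % condition)
        elif not is_met and condition in active:
            active.discard(condition)
            report("ALARM cleared: %s" % condition)
    return active
```

Calling this once per check interval with the freshly evaluated conditions yields one message when a condition becomes true and one when it ends, no matter how many intervals it stays true in between.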
 + 
 +==== Used Python modules ==== 
 + 
 +For ''sensorAlarm'', we have used the following Python modules: 
 +  * For string handling: ''string'' 
 +  * For date and time issues: ''time'' and ''datetime'' 
 +  * For system calls and such : ''os'', ''sys'' and ''popen2'' 
 +  * For signal handling (timeouts): ''signal'' 
 +  * For parsing commandline options: ''optparse'' (see [[http://docs.python.org/lib/module-optparse.html|here]]).  
 +  * For MySQL connections: ''MySQLdb'' (see [[http://mysql-python.sourceforge.net/MySQLdb.html| the manual]] and [[https://sourceforge.net/projects/mysql-python| the project page]]). 
 +  
 + 
 +==== The configuration file ==== 
 + 
 +The ''alarmconditions.cfg''-file has lines like this:<code> 
 + 
 +... 
 +CONSwarm > 27.0 # 5 # sendMail obs 'Console too warm' 
 +CONSwarm > 30.0 # 5 # sendMail obs 'Console meltdown' 
 +CONSwarm < 15.0 # 5 # sendMail obs 'Console too cold' 
 +... 
 +Sup-He-RT0 + Ret-He-RT0 < 15 # 5 # sendMail stiepel 'He-pressure RT0 too low' 
 +Sup-He-RT1 + Ret-He-RT1 < 15 # 5 # sendMail stiepel 'He-pressure RT1 too low' 
 +Sup-He-RT2 + Ret-He-RT2 < 15 # 5 # sendMail stiepel 'He-pressure RT2 too low' 
 +... 
 +</code> 
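
Judging from the example lines above, each line consists of a condition, the number of samples and an action, separated by ''#''. A minimal parser and evaluator under that assumption is sketched below. It only handles what the examples show (sensor names joined by ''+'', followed by ''<'' or ''>'' and a threshold); the real program may support more, and both function names are hypothetical.

```python
import operator

# Only the comparison operators that appear in the example lines.
COMPARATORS = {"<": operator.lt, ">": operator.gt}


def parse_alarm_line(line):
    """Split a config line into (condition, number_of_samples, action)."""
    condition, samples, action = [p.strip() for p in line.split("#", 2)]
    return condition, int(samples), action


def evaluate_condition(condition, values):
    """Evaluate e.g. 'Sup-He-RT0 + Ret-He-RT0 < 15' against sensor values.

    values: dict mapping sensor names to their current (combined) value.
    Tokens are split on whitespace because sensor names contain hyphens.
    """
    tokens = condition.split()
    compare = COMPARATORS[tokens[-2]]
    threshold = float(tokens[-1])
    sensor_names = [t for t in tokens[:-2] if t != "+"]
    total = sum(values[name] for name in sensor_names)
    return compare(total, threshold)
```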
 +\\ 
 + 
 + 
 + 
 + 
 +  
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
  
  
 ===== Alarmconditions =====
 
-This is the Alarm conditions table that is used in the software.
+This is the Alarm conditions table that is used in the software. It originates from the file ''/wsrt/config/alarmconditions.cfg'' on wop10.
 
 ^ Sensor ^ Type/Unit^ Alarm Condition(s) ^ Nr. of samples ^ Mail/SMS who ^ (Software) Action ^
 | CCwarm | Temp/C| > 25 | 5 | obs | send mail/SMS |
-| | | > 28 | 5 | obs | Stop observation/power down |
+| | | > 28 | 5 | obs | send mail/SMS |
 | | | < 15 | 5 | obs | send mail/SMS |
 | CCcold | Temp/C | | | | |
 | PUMAwarm | Temp/C | > 28 | 5 | obs | send mail/SMS |
-| | | > 31 | 5 | obs | Stop observation/power down |
+| | | > 31 | 5 | obs | send mail/SMS |
 | | | < 15 | 5 | obs | send mail/SMS |
 | PUMAcold | Temp/C | | | | |
 | DZBwarm | Temp/C | > 27 | 5 | obs | send mail/SMS |
-| | | > 30 | 5 | obs | Stop observation/power down |
+| | | > 30 | 5 | obs | send mail/SMS |
 | | | < 15 | 5 | obs | send mail/SMS |
 | DZBcold | Temp/C | | | | |
 | CONSwarm | Temp/C | > 27 | 5 | obs | send mail/SMS |
-| | | > 30 | 5 | obs | Stop observation/power down |
+| | | > 30 | 5 | obs | send mail/SMS |
 | | | < 15 | 5 | obs | send mail/SMS |
 | CONScold | Temp/C |  | |  | |
 | WATERcold | Temp/C | | | | |
 | WATERwarm | Temp/C | > 15 | 5 | obs | send mail/SMS |
-| | | > 18 | 5 | obs | Stop observation/power down |
+| | | > 18 | 5 | obs | send mail/SMS |
 | | | < 10 | 5 | obs | send mail/SMS |
 | CVwarm | Temp/C | | | | |
