
Overview

GPM is a distributed monitoring system for clusters and Grid nodes. The system is built on a hierarchical design.

The monitoring system is based on the concept of a metric. A metric is the basic entity: a quantity that can be described by a single value at any point in time. A metric's name is split into parts by the ':' symbol.
There are several types of metrics:
Host metrics
These metrics describe the state of a host. The name of a host metric contains 4 logical sections:
host:<GridNode or Cluster Name>:<Host Name>:<Host metric name>
For example: 'host:UNN_Cluster:s-cw0-00:mem:avail'. Here the host metric name is 'mem:avail', so the last section can itself contain the ':' symbol.

Job metrics
These metrics describe job information. The name of a job metric contains 6 logical sections:
job:<Job Identifier (JOBID)>:<GridNode or Cluster Name>:<Host Name>:<Process identifier (PID)>:<Job metric name>
For example: 'job:12e28850-520d-11db-9ecc-c11185f8066c:UNN_Cluster:s-cw0-00:0000:mem:usage'. Here the job metric name is 'mem:usage'.

Other metrics
These metrics can describe other information (for example, from Grid services). The logical structure of their names is not specified, because it depends on many factors. We plan to add service information.
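
The naming scheme above can be illustrated with a short parsing sketch. It is not part of the GPM distribution: the function name parse_metric_name and the returned dictionary keys are hypothetical, and it assumes, based only on the two examples above, that everything after the fixed sections belongs to the metric name.

```python
def parse_metric_name(name):
    """Split a GPM metric name into its logical sections (illustrative only)."""
    parts = name.split(":")
    kind = parts[0]
    if kind == "host" and len(parts) >= 4:
        return {
            "type": "host",
            "node": parts[1],
            "host": parts[2],
            # Assumption: the metric name is everything after the host name,
            # so it may contain ':' itself (for example 'mem:avail').
            "metric": ":".join(parts[3:]),
        }
    if kind == "job" and len(parts) >= 6:
        return {
            "type": "job",
            "job_id": parts[1],
            "node": parts[2],
            "host": parts[3],
            "pid": parts[4],
            # Assumption: same rule as for host metrics.
            "metric": ":".join(parts[5:]),
        }
    # The structure of other metrics is not specified, so keep the name whole.
    return {"type": "other", "metric": name}
```

For the host example above, this sketch yields the node 'UNN_Cluster', the host 's-cw0-00' and the metric name 'mem:avail'.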
The system is component-based. The components are:
Monitoring Manager
The Manager is the main component of the monitoring system. It is the controller that organizes the work of the system. The Manager receives information from the Monitoring Agents and stores it. Client applications request monitoring information from the Manager.
New feature:
Several managers can now be configured to store duplicate copies of the data. This prevents data loss during a manager restart or a server reboot. To use this feature, you must set the same time on all of the servers involved. The data can be requested from any of the duplicated managers. Once all managers are in the restored state, each of them returns the same data.
See System Configuration for more details.
Agents
A Monitoring Agent collects system information and information about the tasks running on a host. An Agent is installed on each host you want to monitor. It is a system-dependent application, so there are different versions of the Agent for Microsoft Windows and Unix.

Clients
Monitoring Clients are intended for analyzing the monitoring information. A Client connects to the Manager and requests the necessary information, which can be presented as a table or as a diagram.
Web service
The Monitoring Service is a component that provides easy access to monitoring data. It must be installed in a Globus Java WS Core container.
A system demo is available at the GPM system site.

Installation and running


Requirements for the system

Local machines

  1. The Windows machine
    Operating system: Microsoft Windows 2000/XP/2003
  2. The Linux machine: support is planned

Storage machine (manager)

  1. Java 1.5 (5.0) or later

The Web server

  1. PHP 4.0 with socket support
  2. Network access to the manager machines (TCP/IP v4)

Client machine

  1. Java 1.5 (5.0) or later
  2. A browser with applet support, for access through the web server
  3. Network access to the storage machine (TCP/IP v4), for access with the standalone client

The Web service machine

  1. Globus Java WS Core 4.0.3 or above (http://www.globus.org/toolkit/downloads/)
  2. Network access to the manager machines (TCP/IP v4)

System Configuration

Manager (the storage machine)

The manager.properties file (in the ./Distrib/manager directory) contains the parameters for the Manager.

Starting system on Windows machines

Configure the startup script (gpmenv.bat)

  1. Add the path to gpmenv.bat to the PATH environment variable.
  2. Set the name of the host where the Monitoring Manager runs. Use localhost if the Monitoring Manager runs on the same machine. This value is used by the agent startup script.
    set GPM_MANAGER_HOST=ws-2k-110-05
  3. Set the Manager port used by the agents. It must match the port in manager.properties.
    set GPM_MANAGER_PORT=8999
  4. Set the system base directory variable to the install directory.
    set GPM_BASE_DIR=C:\GPM\distrib
  5. Set the directory that contains the GPM *.jar files. Specify the ./Distrib/release directory.
    set GPM_RELEASE_DIR=Y:\GPM\distrib\release
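
Before starting the Manager or the agents, it can be useful to check that the four variables set in gpmenv.bat are visible in the environment. The following Python sketch is not part of GPM: the function name check_gpm_env is hypothetical, and the only checks it makes (each variable is non-empty, and the port is a valid TCP port number) are assumptions about what a sensible configuration looks like.

```python
import os

# The variable names are taken from gpmenv.bat as described above.
REQUIRED_VARS = (
    "GPM_MANAGER_HOST",
    "GPM_MANAGER_PORT",
    "GPM_BASE_DIR",
    "GPM_RELEASE_DIR",
)


def check_gpm_env(env=None):
    """Return a list of problems found in the GPM environment variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(name + " is not set")
    port = env.get("GPM_MANAGER_PORT", "").strip()
    # Assumption: the port must be a decimal number in the TCP range 1..65535.
    if port and not (port.isdecimal() and 1 <= int(port) <= 65535):
        problems.append("GPM_MANAGER_PORT is not a valid TCP port: " + port)
    return problems


if __name__ == "__main__":
    for problem in check_gpm_env():
        print(problem)
```

An empty result means that all four variables are set and the port value looks plausible. It does not confirm that the port matches the one in manager.properties.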

Starting the Manager on the storage machine

Run the MonitoringManager.java.win.bat script in the ./manager folder.

Starting Agents on the local machines

Run the ma.bat script in the ./Distrib/ma_win or ./Distrib/ma_linux folder. The parameters '. "" ""' configure the producer of external metric values. The first parameter is the working directory in which the scripts are run. The second and third parameters are the names of the scripts used to obtain the values of external metrics. For more information about the scripts, see the files in the ./Distrib/ma_win/scripts or ./Distrib/ma_linux/scripts subdirectory.
The last parameters on the command line are the manager addresses.
Example command line:
MonitoringAgent GridNode_or_ClusterName workdir hostinfo_script processinfo_script server01:port01 server02:port02
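
The parameter order in the example can be captured in a small helper that assembles the argument list. This is only a sketch: the function name build_agent_command is hypothetical, the defaults mirror the '. "" ""' parameters mentioned above, and requiring at least one manager address is an assumption. How the list is passed on to ma.bat depends on your installation.

```python
def build_agent_command(node_name, managers, workdir=".",
                        hostinfo_script="", processinfo_script=""):
    """Assemble the agent argument list in the order shown in the example.

    managers is a sequence of (host, port) pairs.
    """
    # Assumption: an agent without any manager address is not useful.
    if not managers:
        raise ValueError("at least one manager address is required")
    addresses = ["%s:%d" % (host, port) for host, port in managers]
    return (
        ["MonitoringAgent", node_name, workdir,
         hostinfo_script, processinfo_script]
        + addresses
    )
```

Called with the node name 'UNN_Cluster' and two managers, the helper returns the node name, the working directory, the two script names and then one host:port entry per manager.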

Obtaining the collected information

Run gpm-monitoringclient.jar in the release folder. You can also configure the Web Frontend for access through a browser.

The Web server installation

  1. Configure the manager.
  2. If a firewall is used, configure it so that the manager, or each manager in the list, is accessible.
  3. Place the files from the Distrib/web directory in your web server's folder.
  4. Specify the applet parameters (in the file client.html):
    • baseURL: the HTTP address used to run the PHP scripts
  5. Specify the GridNode applet parameters (in the file gridnode.html):
    • baseURL: the HTTP address used to run the PHP scripts
    • GridNode: the name of a Cluster or a Grid Node
  6. Set the manager host and port in the PHP script (gate/gpm_config.php):
    $gpm_managers = the list of manager addresses (host:port). Example: "MANAGER1:8999 MANAGER2:8999"
  7. If the zlib module is enabled in PHP, you can enable deflate compression of the transferred data (gate/gpm_config.php):
    $gpm_use_compression = true;
    $gpm_compression_level = 9;
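
The manager list in gate/gpm_config.php is a single string of space-separated host:port entries. The Python sketch below shows one way such a string could be validated before it is pasted into the configuration. It is not the code used by the PHP gate: the function name parse_manager_list is hypothetical, and the strict host:port check is an assumption.

```python
def parse_manager_list(value):
    """Parse a string such as "MANAGER1:8999 MANAGER2:8999" into (host, port) pairs."""
    managers = []
    for entry in value.split():
        # Split on the last ':' so that the port is always the final part.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError("expected host:port, got %r" % entry)
        managers.append((host, int(port)))
    return managers
```

An entry without a port, such as a bare host name, raises ValueError instead of being silently accepted.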

The Web service installation in Globus container

  1. Configure the manager.
  2. If a firewall is used, configure it so that the manager, or each manager in the list, is accessible.
  3. Install and configure Globus Java WS Core 4.0.3 or above (link: http://www.globus.org/toolkit/)
  4. Deploy the service using the GAR package /distrib/release/gpm-services.gar. For more information, see the Globus Toolkit documentation.
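
The firewall step can be verified from the web server or web service machine with a plain TCP connection test. The sketch below is an illustration and not a GPM tool: the function name manager_reachable and the 3-second default timeout are assumptions. It only checks that a TCP/IP v4 connection to the given port can be opened, not that a Manager is actually answering on it.

```python
import socket


def manager_reachable(host, port, timeout=3.0):
    """Return True if an IPv4 TCP connection to host:port can be opened."""
    # AF_INET restricts the check to TCP/IP v4, as listed in the requirements.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except OSError:
            # Covers refused connections, timeouts and name resolution errors.
            return False
    return True
```

Run the check once for every manager in the list. A False result usually points to a firewall rule, a wrong port, or a manager that is not running.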

Clients

The GPM system includes an advanced client part. System clients carry out the following tasks:
The system has the following kinds of clients:

Commandline tools

The following command-line tools work with the Manager:

FlushManagerCache

FlushManagerCache is a command-line utility that lets you stop the Manager without losing the cached values.
Usage:
FlushManagerCache managerHost managerPort

DestroyMetrics

DestroyMetrics is a command-line utility that destroys metrics. This operation is needed to free storage space and to remove useless information about jobs that were destroyed long ago.
Usage:
DestroyMetrics managerHost managerPort metricsTemplate

GetRequestValue

GetRequestValue is a command-line utility that requests information from the Manager. You must specify the request file.
Usage:
GetRequestValue managerHost managerPort requestFile [parameters]

SubscribeRequest

SubscribeRequest is a command-line utility that lets you monitor the value of the specified request. You must also specify the manager address and port.
Usage:
SubscribeRequest managerHost managerPort heartbeat requestFile [parameters]

DumpManagerInfo

DumpManagerInfo is a command-line utility that lets you check the current state of the Manager. You must specify the manager address and port.
Usage:
DumpManagerInfo managerHost managerPort
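
All five tools share the same leading arguments (managerHost, managerPort), so their usage lines can be summarized in one table. The Python sketch below builds an argument list from that table and rejects calls that do not match the usage lines above. It is an illustration only: the names TOOL_ARGUMENTS and build_tool_command are hypothetical, and how the tools are actually launched (script, wrapper or Java class) is not covered here.

```python
# Required arguments after managerHost and managerPort, per the usage lines.
TOOL_ARGUMENTS = {
    "FlushManagerCache": (),
    "DestroyMetrics": ("metricsTemplate",),
    "GetRequestValue": ("requestFile",),
    "SubscribeRequest": ("heartbeat", "requestFile"),
    "DumpManagerInfo": (),
}

# Tools whose usage line ends with the optional [parameters].
TOOLS_WITH_EXTRA_PARAMETERS = {"GetRequestValue", "SubscribeRequest"}


def build_tool_command(tool, manager_host, manager_port, *args):
    """Return the argument list for one of the command-line tools."""
    if tool not in TOOL_ARGUMENTS:
        raise ValueError("unknown tool: " + tool)
    required = TOOL_ARGUMENTS[tool]
    if len(args) < len(required):
        missing = ", ".join(required[len(args):])
        raise ValueError(tool + " is missing arguments: " + missing)
    if len(args) > len(required) and tool not in TOOLS_WITH_EXTRA_PARAMETERS:
        raise ValueError(tool + " does not accept extra parameters")
    return [tool, manager_host, str(manager_port)] + [str(a) for a in args]
```

For example, build_tool_command("DumpManagerInfo", "localhost", 8999) returns the tool name followed by the host and the port as strings.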

Future works

The project was started in January 2006. Today we have the first version of the system. A great deal of work has been done, but much remains unfinished. Our nearest plans:

Copyright

Copyright (C) 2006-2007 University of Nizhniy Novgorod

Authors

The GPM System Development Team:
Dmitry Labutin, Project Admin, labutin at users.sourceforge.net
Alexander Alekhin, Developer, alekhin at users.sourceforge.net
Denis Bogolepov, Developer, deniscmc at users.sourceforge.net
