What should I monitor on my Eclipse servers?

The basic elements that make up monitoring are:

  • Event: an occurrence that is triggered when a set condition is met.
  • Threshold: the point on a scale that must be reached to trigger a response to an event. The response can be an alert, a notification, or a script being run.
  • Notification: how an administrator is informed that something (an event or a response) has occurred.
  • Health: a set of metrics that define the state of the functionality being monitored. The administrator defines the values that represent a “healthy” state for each of their components.
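As a rough illustration of how these four elements fit together, the Python sketch below treats a reading that reaches its threshold as an event, responds with a notification, and reports an overall health state. The names `Threshold`, `evaluate`, and the `cpu_percent` metric are hypothetical and chosen only for this example; they do not come from any Eclipse tool.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical threshold: the metric value at which an event fires."""
    metric: str
    limit: float


def evaluate(thresholds, readings, notify):
    """Return "healthy" if no reading reaches its threshold, else "unhealthy".

    Each reading that reaches its limit is treated as an event, and the
    response here is a notification sent through the notify callback.
    """
    healthy = True
    for threshold in thresholds:
        value = readings.get(threshold.metric)
        if value is not None and value >= threshold.limit:
            healthy = False
            notify(f"{threshold.metric} at {value} reached threshold {threshold.limit}")
    return "healthy" if healthy else "unhealthy"


if __name__ == "__main__":
    # Example values only; an administrator defines what "healthy" means.
    state = evaluate(
        [Threshold("cpu_percent", 90.0)],
        {"cpu_percent": 95.0},
        notify=print,
    )
    print(state)
```

In practice the `notify` callback would be replaced by whatever channel the administrator uses, such as email or a paging system.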

For your Eclipse environment, we recommend monitoring the following items where applicable and possible:

General Monitoring:

  • Availability (ping test)
  • CPU (high CPU threshold alerts, tracking historical trends)
  • Memory (high RAM threshold alerts, tracking historical trends)
  • Disk space (high utilization threshold alerts, tracking historical trends)
  • Hardware (failures, power loss, firmware events, etc.)
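A disk space threshold check can be sketched with the Python standard library alone. The 90 percent limit and the function names below are assumptions for illustration; a dedicated monitoring product would normally also record the readings so that historical trends can be tracked.

```python
import shutil


def disk_percent_used(path="/"):
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100.0


def disk_alert(path="/", limit_percent=90.0):
    """Return True when utilization has reached the (assumed) alert threshold."""
    return disk_percent_used(path) >= limit_percent


if __name__ == "__main__":
    print(f"used: {disk_percent_used('/'):.1f}%  alert: {disk_alert('/')}")
```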

Database Server:

  • UniVerse (process running, responsive, spooler status)
  • SYSTEM.ADMIN (process running)
  • SOCKET.PH.SERVER (process running, listening on port 22222)
  • VSIFAX (process running, responsive, modem status)
  • JBoss (process running, listening on port 2080)
  • Samba shares (availability, read/write)
  • CUPS (print queue status)
  • Sendmail (service running, queue status)
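The "listening on port" checks above can be approximated by attempting a TCP connection. The sketch below uses the two port numbers listed for the database server; the host name `eclipse-db.example.com` and the function names are placeholders, not real values. A successful connection only shows that something is accepting connections on the port, so it complements rather than replaces a check that the process is running.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Port numbers taken from the list above.
DATABASE_SERVER_PORTS = {"SOCKET.PH.SERVER": 22222, "JBoss": 2080}


def check_ports(host, ports):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Placeholder host name; replace with your own database server.
    results = check_ports("eclipse-db.example.com", DATABASE_SERVER_PORTS)
    for name, is_up in results.items():
        print(f"{name}: {'listening' if is_up else 'NOT listening'}")
```

The same helper could be pointed at ports 80 and 443 on the Internet Gateway Server.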

Forms Server:

  • Windows share(s) (availability, read/write)
  • Formscape (availability of port or services running)
  • VSIFAX (services)

Imaging Server:

  • Windows share(s) (availability, read/write)
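One way to test a share for both availability and read/write access is to write a small probe file, read it back, and delete it. The sketch below assumes the share is already mounted or mapped to a local path on the monitoring host; the path in the usage example and the function name are hypothetical.

```python
import os
import tempfile


def share_is_read_write(directory):
    """Write a probe file into directory, read it back, then remove it."""
    payload = b"monitor-probe"
    try:
        fd, probe_path = tempfile.mkstemp(prefix="monitor_", dir=directory)
    except OSError:
        # Directory missing, share unavailable, or not writable.
        return False
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        with open(probe_path, "rb") as handle:
            return handle.read() == payload
    except OSError:
        return False
    finally:
        try:
            os.remove(probe_path)
        except OSError:
            pass


if __name__ == "__main__":
    # Hypothetical mount point; replace with the path of your own share.
    print(share_is_read_write("/mnt/imaging"))
```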

Internet Gateway Server:

  • IIS (service running, listening on ports 80 and/or 443)