Inter-server communication

iCore Process Server uses Microsoft Message Queuing (MSMQ) for internal communication between the service and worker processes on the same computer, as well as between servers in a clustered system. MSMQ provides a reliable messaging platform that helps ensure that mission-critical messages are never lost or delayed for too long.

Most queues used by iCPS are ephemeral: they are created dynamically as needed when the system or a worker process is started. Three queues, however, are global system queues. They are configured once for the whole system and must be available to, and accessible by, all servers in a clustered system. The global system queues are configured in the system settings in iCore Administrator.

The global system queues are:

  • Job Manager result NACK queue
    An administration queue to which any messages regarding new Jobs and their completion are returned if they were not delivered and received on time.
  • New events NACK queue
    An administration queue to which messages about the creation of new Events are returned if they could not be delivered and received on time. New Events created by external sources are also sent to this queue.
  • Express events queue
    The queue to which express Events created by external sources are sent. 

Note: To ensure that iCPS in a high-availability cluster keeps functioning if a single machine fails, the global system queues must be located on a clustered MSMQ service.

Many messages sent by iCPS are sent using recoverable messaging (for more information, see http://msdn.microsoft.com/en-us/library/ms704130(v=vs.85).aspx). When recoverable messaging is used, each message is written to disk on the sending computer and on every computer that forwards the message during routing, until the message is delivered to the destination computer. This ensures that the message is not lost if a machine fails or a failover occurs in a clustered MSMQ environment. It also means that on a system where a great number of Events and Jobs are generated and processed in a short period of time, message processing uses disk I/O intensively. Performance can be improved by spreading MSMQ message storage across several physical disks.
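The cost of recoverable messaging can be illustrated with a minimal sketch. This is not how MSMQ is implemented, and it does not use any MSMQ API. The DurableOutbox class and its file format are hypothetical and only show the general idea: every message is forced to disk before it is sent, and it stays on disk until delivery is confirmed.

```python
import json
import os


class DurableOutbox:
    """Hypothetical disk-backed outbox (illustration only, not MSMQ)."""

    def __init__(self, path):
        self.path = path

    def enqueue(self, message_id, body):
        """Persist a message before any attempt is made to send it."""
        record = json.dumps({"id": message_id, "body": body})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # one forced disk write per message

    def pending(self):
        """Return all messages that have not been acknowledged yet."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def acknowledge(self, message_id):
        """Drop a message once the destination has confirmed delivery."""
        remaining = [m for m in self.pending() if m["id"] != message_id]
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for m in remaining:
                f.write(json.dumps(m) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)  # atomic swap of the log file
```

Because every enqueue call waits for a disk write, a burst of Events and Jobs becomes a burst of disk I/O. After a restart, pending() returns whatever was written but not yet acknowledged, which is the property that protects messages during a machine failure.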

TCP/UDP communication

For non-mission-critical messages to and between servers, UDP and/or TCP communication is used. For example, in a clustered system a heartbeat message is periodically sent over UDP between the servers to ensure that they are running correctly. If a server receives no heartbeat from another server within a certain amount of time, that other server is assumed to have failed and another server takes over its responsibilities.
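The heartbeat timeout described above can be sketched as follows. This is a minimal illustration, not the iCPS implementation. The HeartbeatMonitor class, the 5-second timeout and the message format (the sender's identifier as UTF-8 text) are all assumptions made for the example.

```python
import socket
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value, not an iCPS default


class HeartbeatMonitor:
    """Tracks when each peer server was last heard from."""

    def __init__(self, peers, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        start = clock()
        # Treat every peer as alive at startup so it gets a full timeout window.
        self.last_seen = {peer: start for peer in peers}

    def record(self, peer):
        """Note that a heartbeat from `peer` has just arrived."""
        self.last_seen[peer] = self.clock()

    def failed_peers(self):
        """Return peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout)


def send_heartbeat(sock, sender_id, address):
    """Send one UDP datagram carrying the sender's identifier."""
    sock.sendto(sender_id.encode("utf-8"), address)


def receive_heartbeat(sock, monitor):
    """Read one datagram and record its sender; return False on timeout."""
    try:
        data, _ = sock.recvfrom(1024)
    except socket.timeout:
        return False
    monitor.record(data.decode("utf-8"))
    return True
```

A server would call send_heartbeat periodically on a UDP socket and check failed_peers() on each cycle. UDP suits this kind of message because a single lost datagram is harmless: only a sustained silence longer than the timeout counts as a failure.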

The IP address on which each server listens for incoming TCP/UDP messages is configured on the Server entity representing that machine in the iCore system. The default port is configured at the system level but can be overridden per server in the Server entity. The "IP address" property of the Server entity can be set to "0.0.0.0" to make the server listen on all available network interfaces. Messages are sent to the host name specified in the "Machine ID" property.
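The relationship between these settings can be sketched as follows. The class and field names are hypothetical stand-ins for the Server entity and the system settings, and the host names and port numbers in the usage note are made up. The sketch only shows which value is used for listening, which for sending, and how the per-server port overrides the system default.

```python
import socket
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SystemSettings:
    default_port: int  # system-level default port


@dataclass
class ServerEntity:
    machine_id: str             # host name that other servers send to
    ip_address: str             # local listen address; "0.0.0.0" = all interfaces
    port: Optional[int] = None  # per-server override of the default port


def effective_port(server: ServerEntity, system: SystemSettings) -> int:
    """Use the per-server port if set, otherwise the system default."""
    return server.port if server.port is not None else system.default_port


def listen_endpoint(server: ServerEntity, system: SystemSettings) -> Tuple[str, int]:
    """Address and port the server binds to for incoming messages."""
    return (server.ip_address, effective_port(server, system))


def send_endpoint(server: ServerEntity, system: SystemSettings) -> Tuple[str, int]:
    """Host name and port that other servers use to reach this server."""
    return (server.machine_id, effective_port(server, system))


def open_udp_listener(server: ServerEntity, system: SystemSettings) -> socket.socket:
    """Bind a UDP socket to the server's listen endpoint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_endpoint(server, system))
    return sock
```

For example, with SystemSettings(default_port=9000), a server defined as ServerEntity("icps-node-b", "10.0.0.12", port=9100) listens on 10.0.0.12 port 9100, and other servers send to icps-node-b port 9100. The host name must therefore resolve to an address that the server actually listens on.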

If iCore is configured as a high-availability cluster, it is recommended to use a separate network interface for the heartbeat communication between servers, so that this communication is not disrupted if the rest of the network experiences problems.

 
