Queueing and Data Loss in syslog Processing
(... as in "Recognizing and Compensating for")
Background
One of the main hurdles that must be overcome in any serious attempt to use the syslog protocol as a management tool is recognizing
and mitigating the data loss that is endemic to the syslog protocol's use of UDP and Linux datagram services.
The datagram protocols (Linux datagram and UDP) were specifically chosen for their 'lossy' quality (as they would have been unworkable any other way).
This does not mean, however, that data-loss issues should be ignored or papered over.
The use of UDP (or Linux datagram) sockets for syslog transmission presents the following possibilities for 'lost' data:
- Network Congestion: When routers or network trunk lines get congested, traffic is dropped as the receiver is unable to process all the incoming data. These dropped
packets may contain syslog data.
- Host congestion: If the host network stack is receiving more information than it can process, data will get dropped. Data transmitted over a connection-oriented
protocol (such as TCP or Linux streaming sockets) will be re-transmitted by the sender. UDP and Linux datagram packets are simply lost as there is no attempt or
mechanism in these protocols to ensure delivery or to retransmit data.
- Application congestion: This is probably the major source of data-loss with syslog data. The mechanism for passing data from the OS to an application is for the
OS to put that data into a small, fixed-size buffer for the application to read. If the application does not read the data fast enough, the buffer fills
and data is lost, with no way for the application to know that it lost 'potential' input data.
- When an application forwards data that it HAS managed to receive from the OS, if it uses the UDP (or Linux datagram) protocol, the network factors come back into play
and the congestion issues at the new host also come into play.
- The more data that is transmitted over the network from a syslog host, the more likely it is that network links become congested and that either the host or the
application (or both) at the point where data from multiple syslog data-streams converge will become congested -- possibly to the point of becoming unable to function.
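The application-congestion case above can be reproduced on a single machine. The following is a minimal sketch, not part of Syslogd2: the function name `demo_receive_buffer_loss` and its parameters are made up for illustration. It sends a burst of UDP datagrams over the loopback interface to a socket that is not being read, then counts how many survived in the receive buffer. The exact count depends on the operating system and its buffer settings.

```python
import socket

def demo_receive_buffer_loss(count=5000, size=1024, rcvbuf=4096):
    """Send `count` datagrams to a local UDP socket that nobody is reading,
    then drain the socket and report (sent, received)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request a small receive buffer; the OS may round this value up.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    address = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * size
    sent = 0
    for _ in range(count):
        # The send succeeds even when the receiver has no room left.
        sender.sendto(payload, address)
        sent += 1
    sender.close()

    # Only now does the "application" start reading.
    receiver.settimeout(0.2)
    received = 0
    try:
        while True:
            receiver.recv(size)
            received += 1
    except socket.timeout:
        pass  # nothing more is queued
    receiver.close()
    return sent, received

if __name__ == "__main__":
    sent, received = demo_receive_buffer_loss()
    print(f"sent={sent} received={received} lost={sent - received}")
```

Neither side sees an error: every send succeeds and the receiver simply finds fewer datagrams than were sent.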
Syslogd2 can help to mitigate all of the above 'data loss' scenarios and, in selected circumstances, entirely eliminate the data loss.
While Syslogd2 cannot do anything about network congestion loss during transmission of individual connection-less-protocol packets (between source host and receiving host), Syslogd2 CAN
(via output-filters) reduce the overall volume of traffic that is 'released' from a Syslogd2 host.
This reduction can be 95% or more relative to the traffic released by traditional syslog processors.
The resulting reduction in overall network traffic will do a LOT to reduce overall network congestion (due to syslog traffic levels) and will go a long way toward reducing
(or eliminating) the host- and application-level congestion issues of the receiving (centralized) hosts.
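The general idea of filtering before forwarding can be sketched as follows. This is a generic illustration and NOT Syslogd2's output-filter syntax or implementation: the names `severity_of` and `should_forward` and the `max_severity` threshold are assumptions made for this example. It relies only on the standard syslog PRI field, where the number in angle brackets is facility * 8 + severity and lower severity numbers are more urgent.

```python
import re

# A syslog message starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(rb"^<(\d{1,3})>")

def severity_of(message: bytes):
    """Return the severity (0 = emergency ... 7 = debug), or None if no PRI."""
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    return int(match.group(1)) % 8

def should_forward(message: bytes, max_severity: int = 4) -> bool:
    """Forward only messages at `max_severity` (warning by default) or more urgent."""
    severity = severity_of(message)
    # Messages without a parseable PRI are forwarded rather than silently dropped.
    return severity is None or severity <= max_severity

if __name__ == "__main__":
    messages = [
        b"<34>su: 'su root' failed for lonvick",  # PRI 34: auth, critical (2)
        b"<30>daemon: routine status report",     # PRI 30: daemon, info (6)
        b"<15>app: debug trace output",           # PRI 15: user, debug (7)
        b"<11>app: write failed",                 # PRI 11: user, error (3)
    ]
    forwarded = [m for m in messages if should_forward(m)]
    print(f"forwarded {len(forwarded)} of {len(messages)} messages")
```

How much traffic a filter like this removes depends entirely on the message mix at a given site.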
- TCP: Syslogd2 can transmit (and receive) syslog traffic over TCP which (as a connection-oriented protocol) WILL re-transmit any dropped packets.
TCP offers a reliable way to transmit data over 'lossy' (congested or error-prone) connections.
The use of TCP is also a preferred method of passing connections through firewalls since it allows firewalls to check the rule-base for the handshake protocol only instead
of for every packet being transmitted.
The 'downside' of using TCP is that there is a practical limit to the number of simultaneous connections a single host can maintain.
This limit is fairly high (likely in the hundreds), but nowhere near the 10,000 or more potential desktop hosts in a large company.
TCP also presents the issue of 'backed-up-data' which is when the aggregate incoming connections to a host exceed that host's ability to process the data. Since there is no
'safety-valve' with the TCP protocol, the transmitting stations will start to see increasing error-counts and dropped packets in their own right as they are unable to transmit
over the 'backed-up' TCP connection.
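The 'backed-up' condition described for TCP can be observed locally. The following is a small sketch under stated assumptions, not Syslogd2 code: the function name `bytes_queued_before_stall` and its parameters are hypothetical. A client connects to a listener on the loopback interface, the accepting side never reads, and the client sends in non-blocking mode until the operating system refuses more data. The number of bytes accepted before the stall varies with the system's TCP buffer settings.

```python
import socket

def bytes_queued_before_stall(chunk=64 * 1024, limit=256 * 1024 * 1024):
    """Send to a TCP peer that never reads; return (bytes_queued, stalled)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()  # accepted, but never read from

    client.setblocking(False)
    data = b"x" * chunk
    queued = 0
    stalled = False
    while queued < limit:  # safety cap so the loop always terminates
        try:
            queued += client.send(data)
        except BlockingIOError:
            # Send and receive buffers are full: TCP has no 'safety-valve'.
            stalled = True
            break

    client.close()
    server_side.close()
    listener.close()
    return queued, stalled

if __name__ == "__main__":
    queued, stalled = bytes_queued_before_stall()
    print(f"queued {queued} bytes before stalling: {stalled}")
```

With a blocking socket the same `send` call would wait instead of raising an error. Either way, a sender in this state has to hold or discard any new messages until the receiver catches up.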
- CAP_WORKERTHREADS / CAP_OUTPUTTHREADS
- COMMAND-TOOL
- TCP Support
- LossLess Queueing