Introduction

Syslogd2 is built to explicitly recognize that syslog data collection (especially in networked environments) can no longer be a one-size-fits-all operation.

Traditional views of syslog data collection are centered around the needs of the local host.
For these needs, there will only ever be a moderate volume of input, perhaps (at most) a dozen files (since there are only so many facilities to be logged: 12 named and 8 'local' facilities) and no significant input from external sources. A one-size-fits-all approach is therefore acceptable: if that size is large enough to handle the expected syslog generation of a server-class machine, it can also be deployed to meet the 'smaller' needs of desktops. (The syslog processor may even be written as a multi-threaded application that allows gradations of service by adjusting thread-counts, as rsyslog does.)

Attempting to transfer this view into a networked environment immediately encounters several problems:

A traditional syslog daemon (even a multi-threaded one like rsyslog) has insufficient capacity to handle input from firewalls, or from multiple simultaneously-transmitting devices. The same is true for the 'primary collection host' (the host at the 'center' of the data collection, responsible for reporting on the condition of the overall network). The larger the corporate network (and therefore the busier the firewalls, routers and switches), the more obvious this deficiency becomes.
Any syslog collector capable of handling firewall-level input would be overkill (consuming excessive resources) on desktop hosts.
Network data links do not have the same data-transmission capacity as a host's local disk controllers. This means that the volume of data written to a network connection must be DRASTICALLY reduced from what can be (and often is) written to disk. At the desktop or server level, potential congestion of the disk channel has long been compensated for (and largely prevented) by disk buffering in the operating system. At the network level, congestion of the network link degrades network performance: users experience excessive delays (if not timeouts), and network-performance complaints increase. Network-link congestion therefore represents a 'new' requirement for syslog processors: more granular control over the data to be transmitted, so that a collector can accept (and log) voluminous amounts of data while reducing the data transmitted over network links to a relative 'trickle'.
A network-management system must also allow for scalability and effective resource usage. It must be capable of scaling down to be effectively installed on every desktop and server (to collect host and application data) while also scaling up to handle multiple network devices per log-host that do not maintain their own local files. This means (potentially) a high number of input sources per concentration-point/collection-station (desktops, servers and/or network switches and routers) or (potentially) a few very high-speed input sources (firewalls, syslog collection/concentration hosts, reverse proxy servers or web-hosting equipment).
Additionally, while a local host-logging viewpoint can afford to ignore any input that is not explicitly formatted as syslog input, a network-management viewpoint (one that requires all relevant data to be forwarded to a centralized collection-point) must assume that somewhere in the organization there will virtually always be home-grown (or even commercial or open-source) applications that, for a variety of reasons, do not explicitly 'feed' their output to syslog collectors. This represents another new requirement for network-oriented syslog processors: accepting non-syslog-formatted input and integrating it into the overall network-management infrastructure.

There are other issues that surface in a network environment as well, but you get the idea: a one-size-fits-all approach to deploying syslog collectors in a network is simply not feasible.

Equally infeasible (in the long term) is any approach that attempts to mix-and-match multiple versions of syslog collectors from different sources -- especially if those different versions/variations of syslog collector use different syntax or network protocols to 'talk' to each other (or may diverge in the future).

In short: the use of desktop-oriented syslog collectors for network-management purposes has been infeasible for several decades now (pretty much since corporate networks started to be deployed, and since the invention of the internet that has become so relevant to corporate bottom-line profits). The problem has been a disconnect between those aware of the deficiencies and those with the ability to do something about them.

This is where Syslogd2 comes in....

Syslogd2 operates somewhat like an old-time country doctor who used to make house-calls. He would typically keep a large selection of medications (in large bottles) in his office, but for house-calls he would carry only a small bag containing small amounts of those medications he thought he would need. The contents (and weight) of the doctor's bag on any given house-call would therefore vary with the need without becoming unnecessarily burdensome.

In a similar way, the Syslogd2 (uncompiled) code base acts like the doctor's office, while each (compiled) binary instance contains feature-sets (the contents of the doctor's 'bag') tailored to the needs of the intended host (the doctor's patient). Deploying Syslogd2 effectively involves anticipating the needs of the various hosts it will be deployed to and customizing the feature-sets to satisfy those needs.

Effective deployment of Syslogd2 involves a handful of primary determinations, all of which (unfortunately) require some familiarity with Syslogd2. Because new users lack that familiarity, Syslogd2 ships with several pre-configured 'sample' configurations that serve two purposes: (1) to provide known configurations that support a set of demonstration walk-throughs designed to 'show off' Syslogd2's capabilities, and (2) to provide some experience with different feature-sets so users can better select options for their own customized binaries.

Selecting an Architecture

In Syslogd2 the term 'architecture' refers to how Syslogd2 is structured to process data. Scalability in Syslogd2 is achieved by expanding in three 'directions':

Multi-threading within the threadpool.
Distribution of traffic among multiple threadpools (each of which can be individually configured with different thread-counts).
Distribution of msg-processing steps among different (specialized) threadpool-types to focus the CPU cycles where they can do the most good. The 'data hand-off' ('linkage' or 'glue') between these threadpool-types can be selected by the user for large to very-large processing models.

Syslogd2 divides the processing of each syslog event into 3 phases:

Input Phase: This phase obtains a line of input from the input-connection, the time-of-receipt from the system clock, the peer-address from the input socket (if the connection is an IP socket), and the input-record from which the data was read.
Processing Phase: This phase parses the data from the input phase and prepares it for handoff to the output-phase where it will be written to the various destinations. The processing phase performs all processing steps that are common to all syslog events, excluding only those that depend on the output destination type.
Output Phase: This phase is responsible for actually writing the information to the output device and handling any output-error conditions that may occur. It also formats the output according to the type of output-device (and, in Syslogd2, according to the 'format' output option).

The 'data handoff' from ('linkage' or 'glue' between) one phase to the next can be determined by the user.

The most common handoff method is the subroutine call. The input-phase calls the processing-phase as a subroutine to process the message, and the processing-phase in turn calls the output-phase as a subroutine to actually write the data. Some may decry the subroutine call as inefficient compared to in-line programming (where the subroutine's code is placed 'in-line' within the calling routine), but the trade-off in flexibility is well worth it, especially since the subroutine linkage is intended only for low-to-moderate throughput and multiple threads can be brought to bear.
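To make the subroutine linkage concrete, here is a minimal C sketch. It is an illustration, not Syslogd2's actual code. The names `event_t`, `input_phase()`, `processing_phase()` and `output_phase()` are hypothetical. The processing phase only decodes the standard `<PRI>` prefix (facility = PRI / 8, severity = PRI % 8), and the output phase only formats a line and prints it. Each phase calls the next directly, so one thread carries a message from input all the way to output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical event record carrying the input-phase data through all phases. */
typedef struct {
    char        line[1024];   /* one line of input */
    time_t      received;     /* time-of-receipt from the system clock */
    char        peer[64];     /* peer address; empty if not an IP socket */
    int         input_id;     /* which input the line was read from */
    int         facility;     /* set by the processing phase */
    int         severity;     /* set by the processing phase */
    const char *body;         /* text following the <PRI> field */
} event_t;

static char last_out[1200];   /* last formatted line, kept for inspection */

/* Output phase: format for the destination and write it (stdout here). */
static void output_phase(const event_t *ev) {
    snprintf(last_out, sizeof last_out, "fac=%d sev=%d from=%s: %s",
             ev->facility, ev->severity,
             ev->peer[0] ? ev->peer : "local", ev->body);
    puts(last_out);
}

/* Processing phase: steps common to all destinations (here only <PRI> decoding). */
static void processing_phase(event_t *ev) {
    ev->facility = 1;          /* default user.notice (PRI 13), as in RFC 3164 */
    ev->severity = 5;
    ev->body = ev->line;
    if (ev->line[0] == '<' && isdigit((unsigned char)ev->line[1])) {
        char *end;
        long pri = strtol(ev->line + 1, &end, 10);
        if (*end == '>' && pri <= 191) {
            ev->facility = (int)(pri / 8);
            ev->severity = (int)(pri % 8);
            ev->body = end + 1;
        }
    }
    output_phase(ev);          /* subroutine linkage: processing -> output */
}

/* Input phase: capture the line plus its metadata, then hand it on. */
static void input_phase(const char *text, const char *peer, int input_id) {
    assert(text != NULL);
    event_t ev;
    memset(&ev, 0, sizeof ev);
    snprintf(ev.line, sizeof ev.line, "%s", text);
    ev.received = time(NULL);
    snprintf(ev.peer, sizeof ev.peer, "%s", peer ? peer : "");
    ev.input_id = input_id;
    processing_phase(&ev);     /* subroutine linkage: input -> processing */
}
```

The whole chain runs on the calling thread's stack, so no queueing or locking is needed between phases. The price is that a slow output device stalls the input thread that called it.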

By including the CAP_WORKERTHREADS compiler-symbol in the binary definition at compile-time, the linkage between the input-phase and the processing-phase is changed to a threadpool-transfer. The processing-phase code gains a FIFO buffer, and the input-phase gains a new parameter designating which (of potentially several) of these new 'worker-threadpools' the data from that particular input threadpool should be queued to. (The default worker-threadpool is always 0 [zero].) An architecture with separate input- and worker-threadpools may be desirable when the Linux kernel, a low-volume local Linux socket and one or more (slow-growing) text files need to be monitored, but no IP socket. This represents 3 types of threadpools (kernel, socket and tailfile) that cannot be combined but CAN 'share' a single pool of worker threads to process their (low-to-moderate combined volume of) traffic.
--> The default configuration 'syslogd2d' (demonstration) uses this architecture.
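The threadpool-transfer replaces a direct subroutine call with a queue hand-off. The sketch below is a generic bounded FIFO built on POSIX threads (one mutex plus two condition variables). It illustrates the technique and is not Syslogd2's implementation. The names `fifo_t`, `worker_pool_t` and `worker_main()` and the capacity of 64 are assumptions. An input thread would call `fifo_put()` on the worker-threadpool its configuration names (pool 0 by default), and any number of worker threads drain the same FIFO.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() in the usage example */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QCAP 64                   /* assumed FIFO capacity */

/* Bounded FIFO between input threads (producers) and worker threads (consumers). */
typedef struct {
    char           *items[QCAP];
    int             head, tail, count;
    int             closed;       /* set once no more input will arrive */
    pthread_mutex_t mu;
    pthread_cond_t  not_empty, not_full;
} fifo_t;

/* Hypothetical worker-threadpool: one FIFO shared by all of its threads. */
typedef struct {
    fifo_t          q;
    long            processed;    /* messages handled, guarded by stat_mu */
    pthread_mutex_t stat_mu;
} worker_pool_t;

static void worker_pool_init(worker_pool_t *wp) {
    memset(wp, 0, sizeof *wp);
    pthread_mutex_init(&wp->q.mu, NULL);
    pthread_cond_init(&wp->q.not_empty, NULL);
    pthread_cond_init(&wp->q.not_full, NULL);
    pthread_mutex_init(&wp->stat_mu, NULL);
}

/* Input-thread side: block while the FIFO is full, then enqueue. */
static void fifo_put(fifo_t *q, char *msg) {
    assert(msg != NULL);          /* NULL is reserved to mean "closed and drained" */
    pthread_mutex_lock(&q->mu);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    q->items[q->tail] = msg;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Worker side: wait for a message; return NULL once closed and empty. */
static char *fifo_get(fifo_t *q) {
    char *msg = NULL;
    pthread_mutex_lock(&q->mu);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->mu);
    if (q->count > 0) {
        msg = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->mu);
    return msg;
}

/* At shutdown: wake every waiting worker so it can drain the FIFO and exit. */
static void fifo_close(fifo_t *q) {
    pthread_mutex_lock(&q->mu);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Body of each worker thread; the processing phase would run on each message here. */
static void *worker_main(void *arg) {
    worker_pool_t *wp = arg;
    char *msg;
    while ((msg = fifo_get(&wp->q)) != NULL) {
        pthread_mutex_lock(&wp->stat_mu);
        wp->processed++;
        pthread_mutex_unlock(&wp->stat_mu);
        free(msg);
    }
    return NULL;
}
```

In this sketch, an input thread blocks when the FIFO is full. That is one design choice among several; a collector could instead drop overflowing messages and count them.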

By including the CAP_OUTPUTTHREADS compiler-symbol in the binary definition at compile-time, the linkage between the processing-phase and the output-phase is changed to a threadpool-transfer. The output-phase code gains a FIFO buffer, and each of these new 'output-threadpools' gains a 'summary selector-string mask'. Routing changes accordingly. Instead of comparing each message's facility + priority against every individual destination, the processing-phase compares it against the summary mask of each output-threadpool. It then queues the message to the FIFO buffer of every matching threadpool instead of writing to the output devices itself. Distributing output destinations among multiple output-threadpools makes it possible to support many more output destinations than can be supported without CAP_OUTPUTTHREADS, because (a) more (different) threads are involved in processing the output and (b) the processing-phase threads, no longer processing output, can focus on processing additional incoming messages.

It is also (now) possible to 'isolate' 'slow' output devices (such as network links) from faster connections such as local files. This reduces internal congestion caused by threads blocking each other in competition for the same output device.
--> The default configuration 'syslogd2o' (output) uses this architecture.
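The summary mask can be pictured as the bitwise OR of the selector masks of every destination assigned to an output-threadpool. In the minimal C sketch below, the representation (one byte per facility code 0-23, one bit per severity 0-7) and all names are assumptions, not Syslogd2's internal format. A message is queued to a pool only if its facility/severity bit is set in that pool's summary mask.

```c
#include <assert.h>
#include <stdint.h>

#define NFAC 24   /* facility codes 0..23 */
#define NSEV 8    /* severities 0 (emerg) .. 7 (debug) */

/* Hypothetical selector mask: one byte per facility, one bit per severity. */
typedef struct { uint8_t fac[NFAC]; } selmask_t;

/* Hypothetical output-threadpool: only its summary mask matters for routing. */
typedef struct {
    const char *name;
    selmask_t   summary;    /* OR of the masks of every destination in this pool */
} output_pool_t;

/* Select facility `fac` at severities 0..max_sev, like "mail.warning". */
static void sel_add(selmask_t *m, int fac, int max_sev) {
    assert(fac >= 0 && fac < NFAC && max_sev >= 0 && max_sev < NSEV);
    for (int s = 0; s <= max_sev; s++)
        m->fac[fac] |= (uint8_t)(1u << s);
}

/* Fold one destination's mask into its pool's summary mask. */
static void pool_add_dest(output_pool_t *p, const selmask_t *dest) {
    for (int f = 0; f < NFAC; f++)
        p->summary.fac[f] |= dest->fac[f];
}

/* Does any destination in this pool want facility `fac` at severity `sev`? */
static int pool_wants(const output_pool_t *p, int fac, int sev) {
    return (p->summary.fac[fac] >> sev) & 1u;
}

/* Return a bit set for every pool whose FIFO should receive the message. */
static unsigned route(const output_pool_t *pools, int npools, int fac, int sev) {
    unsigned hits = 0;
    for (int i = 0; i < npools; i++)
        if (pool_wants(&pools[i], fac, sev))
            hits |= 1u << i;
    return hits;
}
```

With this design, the processing phase does one mask test per output-threadpool rather than one per destination. Each pool's own threads then repeat the finer per-destination check.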

By including both CAP_WORKERTHREADS and CAP_OUTPUTTHREADS, you get a system capable of keeping up with virtually all firewalls (given sufficient CPU cores, network bandwidth for incoming data, and disk-channel capacity via a RAID disk configuration). (Note: firewall traffic up to T3 speeds should not require much additional system-performance hardware beyond sufficient disk storage for any desired logging.) This is the highest-throughput (and potentially most resource-intensive) architecture supported by Syslogd2.
--> The default configuration 'syslogd2g' (mega) uses this architecture.

In addition to the new threadpool types 'worker' and 'output', Syslogd2 supports two additional types: 'kernel-threadpool' and 'user-threadpool'. Both are optional. 'Kernel' is Linux-specific.

The Kernel Threadpool

The 'Kernel' threadpool is the recommended (more efficient) way to read kernel data. It provides two methods: polling of the /proc/kmsg pseudo-file, or system-call access (which allows logging even when /proc is not mounted).
The default is system-call.
There is never more than one instance of a 'kernel' threadpool-type built.
The new keyword for specifying kernel input and kernel input parameters is --kernel=.
The alternative to the use of CAP_KERNELTHREADS is to use CAP_TAILFILES and explicitly reference /proc/kmsg.
The alternative does not perform as well because it relies on a generic file-polling algorithm, modified to compensate for the fact that /proc/kmsg is not a 'real' file. However, it does not require the creation (or use) of the 'kernel' threadpool.
If declared and compiled into the binary, the kernel-threadpool may be disabled at run-time so it is neither created nor used.
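The sketch below illustrates the system-call method. It assumes that method corresponds to Linux's syslog(2) system call, which glibc exposes as `klogctl()` in `<sys/klog.h>`. The code reads the kernel ring buffer without touching /proc. It uses action 10 (ring-buffer size) and action 3 (a non-destructive "read all"). A long-running reader would more likely use action 2, which blocks until messages arrive and removes them from the buffer. The helper name is hypothetical. The kernel may refuse the call, for example without CAP_SYSLOG when kernel.dmesg_restrict is 1, or inside a container whose seccomp profile blocks syslog(2), so the sketch returns NULL on failure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/klog.h>

/* syslog(2) action numbers; <sys/klog.h> only declares klogctl(), so the
   values from the syslog(2) manual page are spelled out here. */
#define KLOG_ACTION_READ_ALL     3   /* SYSLOG_ACTION_READ_ALL: copy without consuming */
#define KLOG_ACTION_SIZE_BUFFER 10   /* SYSLOG_ACTION_SIZE_BUFFER: ring-buffer size */

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of the kernel
   ring buffer and store its length in *len_out, or return NULL if the
   kernel refuses the call or memory runs out. */
static char *read_kernel_ring(int *len_out) {
    assert(len_out != NULL);
    int size = klogctl(KLOG_ACTION_SIZE_BUFFER, NULL, 0);
    if (size <= 0)
        return NULL;
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;
    int n = klogctl(KLOG_ACTION_READ_ALL, buf, size);
    if (n < 0) {
        int saved = errno;          /* keep the kernel's reason for the caller */
        free(buf);
        errno = saved;
        return NULL;
    }
    buf[n] = '\0';
    *len_out = n;
    return buf;
}
```

Because no file under /proc is opened, this path keeps working on a system where /proc is not mounted. That is the advantage the system-call method has over polling /proc/kmsg.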

The User Threadpool

Whenever a message being processed by an output-phase thread is destined for a user-terminal, the output-phase processing thread (subroutine or threadpool) makes a subroutine call to find out whether the user is currently logged on and, if so, to write the message to that user's open terminal(s).

The CAP_USERTHREADS compiler-symbol converts this subroutine-linkage into a threadpool that is used by all output-phase threads.
There will never be more than one user-threadpool built for any configuration.
The user-threadpool (if declared) can be disabled at run-time to revert to the subroutine linkage between output-phase processing threads and the user-processing code.
User-threads are automatically disabled if (during parsing) Syslogd2 detects that no user-list-destinations are defined.
For systems with medium or high demand for user notifications, the use of user-threadpools will offload significant (time-consuming) processing requirements from threads in the output-phase, allowing for a smoother processing flow.
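The lookup of whether a user is logged on, and on which terminals, can be sketched with the POSIX utmpx interface (`setutxent()`, `getutxent()`, `endutxent()`), the interface traditional syslog daemons typically use for this. This is a minimal sketch rather than Syslogd2's code. The function name is hypothetical. Opening terminals with O_NONBLOCK is one assumed way to keep a stuck terminal from stalling the calling thread. Each call walks the whole login-records file and opens device files. That is the kind of time-consuming work a user-threadpool takes off the output-phase threads.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <utmpx.h>

/* Hypothetical helper: write `msg` to every terminal on which `user` is
   logged in, and return the number of terminals actually written to. */
static int write_to_user(const char *user, const char *msg) {
    assert(user != NULL && msg != NULL);
    int written = 0;
    size_t len = strlen(msg);
    struct utmpx *ut;

    setutxent();                                   /* rewind the login records */
    while ((ut = getutxent()) != NULL) {
        if (ut->ut_type != USER_PROCESS)           /* skip non-login entries */
            continue;
        if (strncmp(ut->ut_user, user, sizeof ut->ut_user) != 0)
            continue;

        /* ut_line (e.g. "pts/3") need not be NUL-terminated. */
        char tty[sizeof "/dev/" + sizeof ut->ut_line];
        snprintf(tty, sizeof tty, "/dev/%.*s", (int)sizeof ut->ut_line, ut->ut_line);

        /* O_NONBLOCK: a stalled terminal must not stall this thread. */
        int fd = open(tty, O_WRONLY | O_NOCTTY | O_NONBLOCK);
        if (fd < 0)
            continue;
        if (write(fd, msg, len) == (ssize_t)len)
            written++;
        close(fd);
    }
    endutxent();
    return written;
}
```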

Selecting a Feature-Set

Syslogd2 contains an extensive set of base features that are compiled into every binary. Additional features can be added to increase (or restrict) the base functionality. I am open to any suggestions or requests to either modify or update these features - including requests to convert current 'base' features into optional ones. A full list of the current optional features can be found on the Capabilities page of this site. A (reasonably complete) list of 'base' features can also be found at that page.

A good way to get familiar with the capabilities of Syslogd2's base and optional features is to work through the demonstrations page to see some of the features in action. The decision on which optional features to include is not final by any means -- you can always go back and modify your choices.
To see what options are included in any version of Syslogd2, execute the binary with either the '--version' (-v) or '--help' (-h, -?) options. The full list of supported optional features along with their enabled/disabled status will be included in the output of these command-line options.
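A binary can report which optional features it was built with by using a table filled in by the preprocessor, since each CAP_* symbol either is or is not defined at compile-time. The sketch below is hypothetical. The table layout, the function name and the wording of the listing are assumptions. It lists only the CAP_* symbols named on this page; the Capabilities page has the full set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature table: each entry records whether a CAP_* symbol was
   defined when this binary was compiled (e.g. via -DCAP_WORKERTHREADS). */
typedef struct { const char *name; int enabled; } feature_t;

static const feature_t features[] = {
#ifdef CAP_WORKERTHREADS
    { "CAP_WORKERTHREADS", 1 },
#else
    { "CAP_WORKERTHREADS", 0 },
#endif
#ifdef CAP_OUTPUTTHREADS
    { "CAP_OUTPUTTHREADS", 1 },
#else
    { "CAP_OUTPUTTHREADS", 0 },
#endif
#ifdef CAP_KERNELTHREADS
    { "CAP_KERNELTHREADS", 1 },
#else
    { "CAP_KERNELTHREADS", 0 },
#endif
#ifdef CAP_USERTHREADS
    { "CAP_USERTHREADS", 1 },
#else
    { "CAP_USERTHREADS", 0 },
#endif
#ifdef CAP_TAILFILES
    { "CAP_TAILFILES", 1 },
#else
    { "CAP_TAILFILES", 0 },
#endif
};

/* Print the status list (as a --version/--help handler might) and
   return how many optional features are enabled. */
static size_t print_features(FILE *out) {
    size_t enabled = 0;
    assert(out != NULL);
    fputs("Optional features:\n", out);
    for (size_t i = 0; i < sizeof features / sizeof features[0]; i++) {
        fprintf(out, "  %-20s %s\n", features[i].name,
                features[i].enabled ? "enabled" : "disabled");
        enabled += (size_t)features[i].enabled;
    }
    return enabled;
}
```

Compiling the same source with, say, `-DCAP_WORKERTHREADS` flips only that entry to "enabled". In this sketch, the listing always matches the binary it comes from.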
