Deployment Considerations for Syslogd2

(Selecting architectures and feature-sets)


Introduction


Syslogd2 is built to explicitly recognize that syslog data collection (especially in networked environments) can no longer be a one-size-fits-all operation.

Traditional views of syslog data collection are centered around the needs of the local host.
For these needs, there will only ever be a moderate volume of input, perhaps (at most) a dozen files (since there are only so many facilities to be logged: 12 named and 8 'local' facilities) and no significant input from external sources. Thus a one-size-fits-all approach is acceptable: if that size is large enough to handle the expected syslog generation of a server-class machine, it can also be deployed to meet the 'smaller' needs of a desktop. (Perhaps the syslog processor is written as a multi-threaded application that allows gradations of service by adjusting the thread counts, as in rsyslog.)

Attempting to transfer this view into a networked environment immediately encounters several problems:


There are other issues that surface in a network environment as well, but you get the idea: a one-size-fits-all approach to deploying syslog collectors in a network is simply not feasible.

Equally infeasible (in the long term) is any approach that attempts to mix-and-match multiple versions of syslog collectors from different sources -- especially if those different versions or variations of syslog collector use different syntax or network protocols to 'talk' to each other (or if they may diverge in the future).

In short: the use of desktop-oriented syslog collectors for network management purposes has been infeasible for several decades now (pretty much since corporate networks started to be deployed and since the invention of the internet, which has become so relevant to corporate bottom-line profits). The problem is that there has been a disconnect between those who are aware of the deficiencies and those with the ability to do something about them.

This is where Syslogd2 comes in....

Syslogd2 operates somewhat like an old-time country doctor who used to make house-calls. He would typically have a large selection of medications (in large bottles) in his office, but for any house-call he would carry only a small bag containing small amounts of only those medications he thought he would need. Therefore the contents (and weight) of the doctor's bag (on any given house-call) would vary with the need without becoming unnecessarily burdensome.

In a similar way, the Syslogd2 (uncompiled) code base acts like the doctor's office while each (compiled) binary instance contains feature-sets (representing the contents of the doctor's 'bag') tailored to the need of the intended host (the doctor's patient). Deploying Syslogd2 effectively involves anticipating the needs of the various hosts it will be deployed to and customizing the feature-sets to satisfy those needs.

Effective deployment of Syslogd2 involves a handful of primary determinations, all of which (unfortunately) require some familiarity with Syslogd2. Because new users lack that familiarity, Syslogd2 ships with several pre-configured 'sample' configurations that serve two purposes: (1) to provide known configurations that support a set of demonstration walk-throughs designed to 'show off' Syslogd2's capabilities and (2) to provide some experience with different feature-sets to allow users to better select options for their own customized binaries.

Selecting an Architecture

In Syslogd2 the term 'architecture' refers to how Syslogd2 is structured to process data. Scalability in Syslogd2 is achieved by expanding in three 'directions':


Syslogd2 divides the processing of each syslog event into 3 phases: an input-phase, a processing-phase and an output-phase.


The 'data handoff' (the 'linkage' or 'glue') between one phase and the next can be selected by the user.

The most common handoff method is the use of a subroutine call. The input-phase calls the processing-phase as a subroutine to process the message, which in turn calls the output-phase as a subroutine to actually write the data. While the use of a subroutine call may be decried by some as inefficient compared to in-line programming (where the content of the subroutine code is moved 'in-line' with the calling routine), the trade-off in flexibility is well worth it - especially since the subroutine call is intended only for low-to-moderate speed throughput and multiple threads can be brought to bear.
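The subroutine linkage can be pictured with a small model. This is only a sketch in Python, not Syslogd2 source: the function names are hypothetical, a list stands in for the output device, and decoding the `<PRI>` prefix in the processing-phase is an assumption made to give that phase something to do (the split into facility = PRI / 8 and severity = PRI mod 8 is the standard syslog encoding).

```python
# Illustrative model only: these names are hypothetical, not Syslogd2 identifiers.

def output_phase(record, sink):
    """Write the formatted record to its destination (a list stands in for a file)."""
    sink.append(f"{record['facility']}.{record['severity']}: {record['text']}")


def processing_phase(raw, sink):
    """Decode the <PRI> prefix, then call the output phase as a plain subroutine."""
    pri_text, _, text = raw[1:].partition(">")
    pri = int(pri_text)
    record = {"facility": pri >> 3, "severity": pri & 7, "text": text}
    output_phase(record, sink)


def input_phase(lines, sink):
    """Receive each message and hand it to the processing phase as a plain call."""
    for raw in lines:
        processing_phase(raw, sink)


log = []
input_phase(["<13>hello", "<34>disk failing"], log)
```

Because every handoff is an ordinary call, the thread that read the message also processes and writes it; nothing is queued between phases.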

By including the CAP_WORKERTHREADS compiler-symbol in the binary definition at compile-time, the linkage between the input-phase and the processing-phase is changed to a threadpool-transfer. The processing-phase code gains a FIFO buffer and the input-phase gains a new parameter to designate which (of potentially several) of these new 'worker-threadpools' the data from that particular input threadpool should be queued to. (The default worker-threadpool is always 0 [zero].) An architecture with separate input and worker-threadpools may be desirable in situations where the Linux kernel, a low-volume local Linux socket and one or more (slow-growing) text files need to be monitored, but no IP socket. This represents 3 types of threadpools (kernel, socket and tailfile) that cannot be combined but CAN 'share' a single pool of worker threads to process their (low-to-moderate combined volume of) traffic.
--> The default configuration 'syslogd2d' (demonstration) uses this architecture.
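A threadpool-transfer of this kind can be modelled as a FIFO that sits between the input threads and a shared set of worker threads. The sketch below is a Python illustration under stated assumptions, not the Syslogd2 implementation: `worker_fifos`, `input_thread` and `worker` are hypothetical names, and upper-casing the text merely stands in for real message processing.

```python
import queue
import threading

# Hypothetical model: one FIFO per worker-threadpool, indexed by pool number.
worker_fifos = {0: queue.Queue()}
processed = []
processed_lock = threading.Lock()


def worker(fifo):
    """Processing-phase thread: drain the FIFO until a None sentinel arrives."""
    while True:
        item = fifo.get()
        if item is None:
            break
        source, text = item
        with processed_lock:
            processed.append((source, text.upper()))


def input_thread(source, messages, pool=0):
    """Input-phase thread: queue to the designated worker pool (default 0)."""
    for text in messages:
        worker_fifos[pool].put((source, text))


workers = [threading.Thread(target=worker, args=(worker_fifos[0],)) for _ in range(2)]
for t in workers:
    t.start()

# Three input types that cannot share a threadpool but can share the workers.
inputs = [
    threading.Thread(target=input_thread, args=("kernel", ["k1", "k2"])),
    threading.Thread(target=input_thread, args=("socket", ["s1"])),
    threading.Thread(target=input_thread, args=("tailfile", ["t1", "t2"])),
]
for t in inputs:
    t.start()
for t in inputs:
    t.join()

for _ in workers:
    worker_fifos[0].put(None)  # one sentinel per worker, queued after all input
for t in workers:
    t.join()
```

The input threads return as soon as a message is queued, so a slow processing step delays only the workers, not the readers.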

By including the CAP_OUTPUTTHREADS compiler-symbol in the binary definition at compile-time, the linkage between the processing-phase and the output-phase is changed to a threadpool-transfer. The output-phase code gains a FIFO buffer and each of these new 'output-threadpools' gains a 'summary selector-string mask'. Routing of traffic from the processing-phase to the output-threadpools changes accordingly: each message's facility + priority is compared against the threadpool-level summary mask(s) instead of against individual destinations, and the message is queued to the matching FIFO buffers instead of actually being written to the output devices. By distributing output destinations among multiple output-threadpools, it becomes possible to support many more output destinations than can be supported without CAP_OUTPUTTHREADS because (a) more (different) threads are involved in processing the output and (b) the processing-phase threads are no longer processing output, so they can focus on processing additional incoming messages.

It is also (now) possible to 'isolate' 'slow' output devices (such as network links) from faster connections such as local files, allowing for fewer instances of internal congestion (caused by threads blocking each other in competition for the same output device).
--> The default configuration 'syslogd2o' (output) uses this architecture.
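The summary-mask routing described above can be sketched as follows. This is a Python illustration only: how Syslogd2 actually stores its selector masks is not stated here, so the one-bit-per-(facility, severity) integer, the `OutputPool` class, the `route` function and the destination names are all assumptions made for the example.

```python
import queue


def selector(pairs):
    """Build a bitmask with one bit per (facility, severity) pair; bit index = PRI."""
    mask = 0
    for facility, severity in pairs:
        mask |= 1 << (facility * 8 + severity)
    return mask


class OutputPool:
    """Hypothetical output-threadpool: a FIFO plus a summary of its destinations."""

    def __init__(self, name, destination_masks):
        self.name = name
        self.destination_masks = destination_masks
        # The summary mask is the union of every destination's selector.
        self.summary = 0
        for mask in destination_masks.values():
            self.summary |= mask
        self.fifo = queue.Queue()


def route(pri, text, pools):
    """Processing-phase side: test only pool-level summaries, then queue."""
    bit = 1 << pri
    matched = []
    for pool in pools:
        if pool.summary & bit:
            pool.fifo.put((pri, text))
            matched.append(pool.name)
    return matched


# Example destinations (hypothetical): fast local files vs. a slow network link.
files = OutputPool("files", {"/var/log/mail.log": selector([(2, 6), (2, 4)]),
                             "/var/log/auth.log": selector([(4, 2)])})
network = OutputPool("network", {"loghost": selector([(4, 2), (0, 0)])})
pools = [files, network]
```

In this model the processing-phase performs one mask test per pool rather than one per destination; matching the message against individual destinations is left to the threads that drain each pool's FIFO.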

By including both CAP_WORKERTHREADS and CAP_OUTPUTTHREADS you get a system capable of keeping up with virtually all firewalls (given sufficient CPU cores, network bandwidth for incoming data and disk-channel capacity via RAID disk configuration). (Note: firewall traffic up to T3-speeds should not require much additional system-performance hardware other than sufficient disk storage for any desired logging.) This is the highest-throughput (and potentially most resource-intensive) architecture supported by Syslogd2.
--> The default configuration 'syslogd2g' (mega) uses this architecture.

In addition to the new threadpool types 'worker' and 'output', Syslogd2 supports two additional types: 'kernel-threadpool' and 'user-threadpool'. Both are optional. 'Kernel' is Linux-specific.


The Kernel Threadpool

The 'Kernel' threadpool is the recommended (more efficient) way to read kernel data. It provides two methods: polling of the /proc/kmsg pseudo-file or system-call access (where logging can occur even when /proc is not mounted).
The default is system-call.
There is never more than one instance of a 'kernel' threadpool-type built.
The new keyword for specifying kernel input and kernel input parameters is --kernel=.
The alternative to the use of CAP_KERNELTHREADS is to use CAP_TAILFILES and explicitly reference /proc/kmsg.
The alternative does not perform as well because it relies on modifications of a generic file-polling algorithm that compensate for the fact that /proc/kmsg is not a 'real' file -- however it does not require the creation of (or use of) the 'kernel' threadpool.
If declared and compiled into the binary, the kernel-threadpool may be disabled at run-time so it is neither created nor used.


The User Threadpool

Whenever a message is being processed by an output-phase thread and is destined for a user-terminal, the output-phase processing thread (subroutine or threadpool) makes a subroutine call to find out whether the user is currently logged on and, if so, to write the message to that user's open terminal(s).

This subroutine-linkage can be converted to a threadpool that will be used by all output-phase threads with the CAP_USERTHREADS compiler-symbol.
There will never be more than one user-threadpool built for any configuration.
The user-threadpool (if declared) can be disabled at run-time to revert to the subroutine linkage between output-phase processing threads and the user-processing code.
User-threads are automatically disabled if (during parsing) Syslogd2 detects that no user-list-destinations are defined.
For systems with medium or high demand for user notifications, the use of user-threadpools will offload significant (time-consuming) processing requirements from threads in the output-phase, allowing for a smoother processing flow.

Selecting a Feature-Set

Syslogd2 contains an extensive set of base features that are compiled into every binary. Additional features can be added to increase (or restrict) the base functionality. I am open to any suggestions or requests to either modify or update these features - including requests to convert current 'base' features into optional ones. A full list of the current optional features can be found on the Capabilities page of this site. A (reasonably complete) list of 'base' features can also be found at that page.

A good way to get familiar with the capabilities of Syslogd2's base and optional features is to work through the demonstrations page to see some of the features in action. The decision on which optional features to include is not final by any means -- you can always go back and modify your choices.
To see what options are included in any version of Syslogd2, execute the binary with either the '--version' (-v) or '--help' (-h, -?) options. The full list of supported optional features along with their enabled/disabled status will be included in the output of these command-line options.
