New Concepts in Syslogd2

(Rethinking the 'Common Knowledge' of Syslog Event Processing)



Foreword


Syslogd2 implements several features based on technologies and techniques never before integrated into a system logging daemon. Some of these features are based on computer-science theory (such as capacity analysis and queueing theory), some on operations-research techniques (such as human-nature-based tendencies and traffic-flow management) and some on simple common sense, but ALL are based on experience (specifically MY experience) attempting to manage significant numbers of Linux servers and network devices.
To some extent, Syslogd2 is also an experiment to discover which features would be most useful to network- and host-administration personnel, as well as the result of about 2 decades of hobby programming...

To fully understand Syslogd2, it is necessary to question every assumption currently held about how syslog is used and configured in today's networks, as compared to its original purpose when it was first devised in the early 1970s. Syslogd2 is more of a 'feature grab-bag' and 'network-data-collector' than it is a 'one-size-fits-all' 'host-based-logger' daemon (like rsyslog or syslog-ng).

This (New Concepts) page attempts to summarize most of the new features and concepts and to provide links to the web pages where each feature or concept is discussed in more detail.

A brief history of the development of syslog processors since 1970.

Syslog-ng existed for a brief time for Linux, but due to its radically different configuration structure that only a programmer could love, it was never widely accepted.

Rsyslog introduced a multi-threaded version of a syslog daemon in the early 2000s that supports TCP/IP and IPv6 (and now also supports reading the systemd journal interface to obtain syslog messages). Rsyslog is Linux-only as it relies on system calls that are unique to the Linux OS.

Syslogd2 is being released in 2023/2024 and supports a variety of features. Written in 'C' with no external support libraries required, Syslogd2 is portable to any Unix-like OS. (It HAS been ported to Mac OS X, though that port is currently out-of-date.)

It is my hope that the additional functionality of Syslogd2 and its ability to be ported to other systems will be able to overcome the industry's usual inertia against anything new or different.

That was a pretty brief summary - wasn't it ?

Syslogd2's contribution

Syslogd2 was designed and developed expressly to address all the issues I experienced while trying to use the syslog protocol in both network and host administration environments.



New Concepts Implemented in the Base Feature-Set


Application Mode


'Application Mode' refers to the ability of Syslogd2 to run alongside another syslog daemon, sharing the same host yet not interfering with (or being interfered with by) the existing syslog processor. This might be desirable where a network-management data-collection capability is desired on a host that is under the control of a different organization or where the existing syslog processor is providing some output-formatting or other service that Syslogd2 is not designed to support.

This 'application' role allows Syslogd2 to enhance or supplement (not replace) the abilities of the host's existing syslog processor in the role of syslog data collection, processing and logging. For example, Syslogd2 might be deployed on a cloud-based or remotely-hosted server to monitor one or more log files, reporting back to 'home base' only selected events instead of the entire contents of facility/priority data streams. Alternatively, it might be deployed to collect and concentrate TCP syslog streams and log-file input on a host whose native syslog processor does not support TCP connections.

Running Syslogd2 in Application Mode is not without drawbacks. One issue is how to set up the data-transfer from the native syslog processor into Syslogd2, where filters and other processing steps can be applied for purposes of event filtering and 'noise reduction' (reducing the congestion of network-links and down-stream processors). Running in 'Application Mode' is the primary use-case for Syslogd2's support of named-pipe input.
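To make the named-pipe hand-off a little more concrete, here is a minimal Python sketch of the reading side: a process that consumes newline-terminated records from a FIFO that some other logger writes into. This is not Syslogd2's code (Syslogd2 is written in C); the function name `read_pipe_lines` and the one-record-per-line framing are assumptions made purely for illustration.

```python
import os


def read_pipe_lines(path, handle):
    """Read newline-terminated records from a named pipe (FIFO).

    Opening the FIFO blocks until a writer (for example, the host's
    native syslog daemon) opens the other end. The loop ends when every
    writer has closed the pipe. Returns the number of records handled.
    """
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as pipe:
        for line in pipe:
            record = line.rstrip("\n")
            if record:                 # skip empty lines
                handle(record)         # hand the record to the next stage
                count += 1
    return count
```

A long-running collector would normally reopen the pipe after the last writer closes it; that loop is left out to keep the sketch short.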

Not running as the primary syslog processor for a host is generally a less efficient method of syslog data collection. Many messages will have to be processed twice, and Syslogd2 may not receive 'clean' or complete data, either through administrative configuration error or because the other processor is unable to keep up with the incoming traffic and the system buffers drop messages that it is not fast enough to read.

There may also be instances where a host (perhaps not even a Linux host) processes syslog data in a manner that is 'foreign' to Syslogd2 due either to the proprietary nature of the host operating system or due to some unique capability that Syslogd2 does not possess. (One example I've already found is the unique format of the OSX operating system's syslog file output).

For all the above reasons, Syslogd2 provides the --ApplicationMode 'macro' that simplifies the process of 'converting' Syslogd2 from a system-default-syslog-service daemon to simply a background application process. '--ApplicationMode' makes the following changes as if they had been made to compile-time settings (meaning they may still be individually over-ridden by run-time settings without running afoul of Syslogd2's first-come-first-served parsing policy).


Application Mode may be invoked using either of two methods. As a Syslogd2 keyword, it is case-insensitive either way:

  1. As an independent command-line parameter on either the actual command-line or from inside the configuration file:
    --ApplicationMode
  2. As a global boolean value on the actual command-line or from the configuration file:
    ~ --enable applicationmode

The promise of Syslogd2 running in ApplicationMode on a UNIX or Linux host is not without potential failure. There are technical and implementation issues that may restrict the ability of a Syslogd2 application to receive syslog data from a legacy syslog processor running on the same host.
Because there is at least one method of communication between rsyslog (the legacy syslog processor) and Syslogd2 on Linux, it is feasible to run both rsyslog and Syslogd2 on the same host (if Syslogd2 is in 'ApplicationMode'). This allows Syslogd2 to function as a supplement to rsyslog for network-management purposes (reading log-file input, and perhaps IP or kernel input, while receiving other syslog data from rsyslog). This feasibility is enhanced because rsyslog is designed with pluggable modules for (at least) kernel input and IP support, so it appears that these input functions can be disabled on either system and enabled on the other.

Delayed Resolution


Traditional (single-threaded) syslog processors (host-loggers) assume that the network will be up and stable at all times, so if a host cannot be resolved at startup, they mark that host unusable and move on. At no time do they go back and attempt to re-resolve a hostname that did not initialize at startup. A big part of the reason for this is that most traditional syslog processors are not multi-threaded, so once they finish startup, they start a processing loop that only terminates when they shut down. To leave that processing loop for any purpose would be to stop processing syslog traffic and to potentially miss incoming traffic due to system-buffer overflows.

Syslogd2 (being multi-threaded) has a specialized threadpool set aside to run background operations while the primary threadpools are focused on processing syslog input. This new threadpool is called a 'housekeeping' threadpool because it does various 'maintenance' chores for the Syslogd2 application. Two of the 'maintenance chores' that it is called upon to do (by the parent-thread running as the main-scheduler) are to periodically check the network state and run the CheckSources routine and the CheckDestinations routine. In the event of an 'upwards' network state change (Down->Local or Local->Other), these routines call upon a 2nd pair of routines to 'walk' the list of input and output connections and to resolve any previously unresolved entries. After resolving all entries that can be resolved, the proposed connections are checked for conflicts. Those entries not marked 'in-conflict', either with the network state or with each other, are then opened. Those connections that succeed in being opened at this time (either newly resolved or previously opened, but closed due to communication error) are returned to service -- all while the primary threads continue to process incoming syslog input.
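The walk-and-resolve step described above can be sketched roughly as follows. This is an illustrative Python model, not the actual C implementation: the `Connection` class, `check_connections` and the `STATE_RANK` ordering are hypothetical names, the conflict check is omitted, and "opening" a connection is reduced to setting a flag.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed ordering of network states, lowest to highest.
STATE_RANK = {"Down": 0, "Local": 1, "Other": 2}


@dataclass
class Connection:
    hostname: str
    address: Optional[str] = None   # None means "not yet resolved"
    is_open: bool = False


def default_resolver(hostname: str) -> Optional[str]:
    """Return one address for hostname, or None if it cannot be resolved."""
    try:
        return socket.getaddrinfo(hostname, None)[0][4][0]
    except socket.gaierror:
        return None


def check_connections(connections: List[Connection], old_state: str,
                      new_state: str,
                      resolver: Callable[[str], Optional[str]] = default_resolver
                      ) -> List[Connection]:
    """On an 'upwards' state change, resolve pending entries and open them.

    Returns the connections that were returned to service on this pass.
    """
    if STATE_RANK[new_state] <= STATE_RANK[old_state]:
        return []                        # not an upward change: nothing to do
    returned = []
    for conn in connections:
        if conn.address is None:         # previously unresolved entry
            conn.address = resolver(conn.hostname)
        if conn.address is not None and not conn.is_open:
            conn.is_open = True          # stand-in for opening the socket
            returned.append(conn)
    return returned
```

In a real daemon this function would run on a housekeeping thread on a timer, so the threads that read syslog input never block on name resolution.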

The term 'Delayed Resolution' refers to the (possibly lengthy) delay between system startup and when individual IP hosts can be resolved and activated. Syslogd2's awareness of the state of the network combined with periodic attempts to resolve IP hostnames implements both delayed resolution and standard network-recovery. (Note that when DNS service is required to resolve addresses, this can only occur if the network is in state 'Other', because otherwise the DNS is assumed to be unreachable/unusable.) In addition to just checking and reopening input/output connections, CheckSources and CheckDestinations do other connection-housekeeping chores as well. Among these chores are:

Deployment

[More Information]

Deployment of Syslogd2 is a topic that many may disagree with. This is my perspective. Deployment (for purposes of this section) covers two conceptual areas:


The need to consider how a system is deployed is something that host-logging packages rarely consider since their deployment approach is simple: one daemon per host with a scope of responsibility of just that one host. It is a rare syslog daemon indeed that gives any thought or concern to the administrators that will have to support the system after deployment. How do configuration changes get updated ? Package upgrades ?

Deployment of Syslogd2 has been a consideration from its inception. The link above will take you to a page that displays some possible deployment scenarios for Syslogd2. The short version is that since Syslogd2 is a proper superset of the requirements of legacy logging daemons (with the possible exception of some of rsyslog's more esoteric output formats), it can be deployed on any Linux host as a substitute for that host's current logging daemon.

Once deployed, Syslogd2 has the facilities and capacity to collect data from text log-files or remote IP devices and to act as either a relay or as a first-receipt host to inject these additional sources into the overall syslog-management infrastructure. What makes Syslogd2 so attractive for network-management purposes ?


Using Syslogd2's SoftComment command-line option, a single file can be maintained and deployed generically to support both Syslogd2 and a legacy syslog-collection package (rsyslog) in a corporate host-management environment. Using the optional parameters to the --IncludeConfig command-line option, along with some advance planning and foresight, active corporate syslog management becomes very doable.

Deployment in a network-management mode needs a different approach since syslog-data collection from network-devices requires external host support. Syslogd2 can be deployed on Linux hosts in a distributed manner to act as these syslog collectors. Syslogd2's filters and other features provide superior syslog-collection performance, but just deploying the Linux hosts allows for distribution of other network services. The distributed 'network-service' hosts would have sufficient left-over resources (cpu-cycles, memory, disk space) to act as distributed DNS secondary servers and/or NIS secondary servers in the event of link failure, allowing users to continue working even if temporarily cut off from the primary network services.

Extra Facilities


An important feature of Syslogd2 is full support for facilities that have higher numeric values than '23' (local7) as well as some 'reserved' facilities that have no assigned name in Linux. These 'extra' facilities are currently undefined by either the operating system or the international community, which is good for network management purposes as it keeps them from becoming 'polluted' by multiple applications' direct use, leaving them 'clean' for use by network-management and logging applications. (Note: Because some non-Linux operating systems assign values to these facilities and allow their use, facility and priority values for such traffic can be preserved when processed by Syslogd2.)

By itself, this does not sound like much, (after all, what value could there possibly be in syslog facilities that no-one can ever use ?), but the ability to access these facilities in combination with Syslogd2 filters and some of the input-processing options is a very powerful tool for point-of-receipt selection and first-stage filtering of network-oriented syslog traffic.

Spoilers:


By default, Syslogd2 provides 32 additional facilities, but at compile-time this quantity can be changed to a number between 0 [zero] and 1000 inclusive. These additional facilities are referred to as 'extra' facilities in Syslogd2 (extra0 - extra31).

As an additional 'feature', Syslogd2 makes accessible the 4 reserved facilities at numeric values 12-15. Some operating systems make use of these values under various names even though there appears to have been no international consensus to formally define them. Since Linux does not define these syslog facilities, Syslogd2 refers to them as reserved0 - reserved3.
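For readers who want to see how these numbers relate to the value on the wire, the sketch below decodes a syslog PRI value (facility * 8 + severity, as defined by RFC 3164 and RFC 5424) into names. The reserved0 - reserved3 names for 12-15 come from the text above; the mapping of extra0 to facility number 24 is an assumption on my part (the text only says the extra facilities sit above 23), and the function names are hypothetical.

```python
STANDARD = ["kern", "user", "mail", "daemon", "auth", "syslog",
            "lpr", "news", "uucp", "cron", "authpriv", "ftp"]
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]
EXTRA_COUNT = 32        # default number of 'extra' facilities per the text
FIRST_EXTRA = 24        # ASSUMPTION: extra0 immediately follows local7 (23)


def facility_name(number):
    """Map a numeric facility to a name, including reserved and extra ones."""
    if 0 <= number <= 11:
        return STANDARD[number]
    if 12 <= number <= 15:
        return f"reserved{number - 12}"
    if 16 <= number <= 23:
        return f"local{number - 16}"
    if FIRST_EXTRA <= number < FIRST_EXTRA + EXTRA_COUNT:
        return f"extra{number - FIRST_EXTRA}"
    raise ValueError(f"facility {number} is out of range")


def decode_pri(pri):
    """Split a PRI value into (facility name, severity name)."""
    facility, severity = divmod(pri, 8)
    return facility_name(facility), SEVERITIES[severity]
```

Under these assumptions `decode_pri(191)` is local7.debug, the largest PRI that RFC 5424 allows, and `decode_pri(198)` would be extra0.info, which is why downstream processors that enforce the RFC range may need such traffic moved back to a standard facility.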

Hostname Management


When collecting external (non-syslog-generated) log data for down-stream processes, it is important for upstream collectors to remember that off-host processes may not be able to distinguish between syslog events collected from two instances of the same non-syslog-reporting application. This may mean that the hostname of a message can no longer be assumed if there is more than one instance of a similar reporting source on the network.

Syslogd2 provides various means of determining the contents of the hostname fields for locally-generated messages. This is in response to my experience that sometimes a host's name either does not match the company standard naming scheme or does not adequately describe the source of the syslog events.


For situations like these, it is more important for downstream applications to know the name of the host being monitored than it is to know the name of the host doing the monitoring (logging), yet whether using Syslogd2 or legacy syslog-processors, downstream processes will receive syslog-events with a hostname field indicating the 'canonical hostname' rather than the (desired) hostname of the device being monitored.

Syslogd2 supports an input-specification 'hostname' option that allows each input-specification to specify a hostname to use when processing data from that input. By itself, this 'hostname' parameter acts (through inheritance) to over-ride the global 'hostname' and 'domain' variables for entries that do not supply a hostname field. (This will likely be primarily text-files that Syslogd2 has been told to use for input.) When combined with the input-specification option 'ignore hostname', this 'default value' becomes an over-ride value to replace any pre-existing hostname.

Taking this a step further, there is nothing in the specifications that states that the host-name field of a message MUST be ONLY the 'canonical name' of the host, so some system admins may wish to treat their servers (for logging and reporting purposes) as 'subdomains', creating hostnames of the form 'app.host.domain'. For this purpose (among others), Syslogd2 supports the two variables '$h' and '$d' that may be used in the 'location' component of input-specifications or in the option-list with the 'hostname' option. '$h' is replaced during startup with the value of the global variable 'hostname' and '$d' is replaced with the contents of 'domain'.

~ --tailfile = /directory/web_log_file, hostname=myweb.$h.$d, ...
~ --tailfile = /directory/filename, hostname = callcenter1.$d, ignore = hostname ...

The second example above results in downstream processes seeing 'callcenter1.domainname' as the hostname field.
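The substitution and over-ride rules above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described on this page: `expand_hostname` and `effective_hostname` are hypothetical names, and the fallback to the global hostname when an input-specification supplies no 'hostname' option is my assumption.

```python
def expand_hostname(template, hostname, domain):
    """Replace $h and $d with the global 'hostname' and 'domain' values."""
    return template.replace("$h", hostname).replace("$d", domain)


def effective_hostname(message_host, spec_host, global_host,
                       ignore_hostname=False):
    """Pick the hostname field that downstream processes will see.

    message_host    -- hostname found in the incoming record, or None
    spec_host       -- expanded 'hostname' option of the input-specification
    global_host     -- global default, used if spec_host is not set (assumed)
    ignore_hostname -- True when the 'ignore = hostname' option is present
    """
    default = spec_host or global_host
    if ignore_hostname or not message_host:
        return default
    return message_host
```

For example, with hostname 'web01' and domain 'example.com', the template 'myweb.$h.$d' expands to 'myweb.web01.example.com', and a record that already carries a hostname keeps it unless 'ignore = hostname' is set.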

Input Processing

[More Information]

Input processing has undergone significant changes in Syslogd2.

Perhaps the biggest conceptual change to input processing is that Syslogd2 is designed around individually-configured connections (individual [IP or Linux] sockets, individual files, etc) where individualized configuration parameters are available to modify inherited global values. Typical syslog processors are designed around global configuration values with no options for customizing individual connections.

This section is probably best understood in terms of text-file input where each file may have a different format (and therefore require different input options). However, almost all the discussion in this section also applies (and is available to) all input types (IP-sockets, Linux-sockets, text-files, pipes, local-kernel). Before going further, let me 'set the stage'.

As mentioned elsewhere, all (new) Syslogd2 configuration (including input-specifications) is done in the form of command-line options.

~ --keyword [=] input-source [, comma-separated-option-list ] [## optional in-line comment]
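To show how a line of this shape breaks down, here is a small Python sketch that splits one into keyword, input-source and options. It is not Syslogd2's parser: `parse_spec` is a hypothetical name, the line is expected to begin at the '--', option names are lower-cased by assumption, and an input-source that itself contains a comma or '##' is not handled.

```python
import re


def parse_spec(line):
    """Split '--keyword [=] source [, options] [## comment]' into parts.

    Returns (keyword, source, options) where options maps each option
    name to its value, or to True for an option given without '='.
    """
    body = line.split("##", 1)[0].strip()          # drop in-line comment
    match = re.match(r"--(\w+)\s*=?\s*(.*)$", body)
    if not match:
        raise ValueError(f"not a command-line option: {line!r}")
    keyword = match.group(1).lower()               # keywords ignore case
    parts = [p.strip() for p in match.group(2).split(",") if p.strip()]
    if not parts:
        raise ValueError(f"missing input-source: {line!r}")
    source, raw_options = parts[0], parts[1:]
    options = {}
    for item in raw_options:
        name, _, value = item.partition("=")
        options[name.strip().lower()] = value.strip() or True
    return keyword, source, options
```

Given the line `--tailfile = /directory/filename, hostname = callcenter1.$d, ignore = hostname ## call-center log`, the sketch yields the keyword 'tailfile', the source '/directory/filename' and the two options 'hostname' and 'ignore'.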

Input keywords are:


Every message entering Syslogd2 undergoes 3 stages of processing. The term 'input processing' includes the first two phases of processing.


After the input-string has been fully parsed and resolved, the input record may instruct Syslogd2 to run the message fields through an input-filter.


Note that there are two ways in which an input-string's facility and priority value may be changed during input-processing (input-specification options or input-filters). There is a very subtle but significant difference between the two. In both cases, the only input-strings that could possibly be affected are those that enter Syslogd2 via the specified input specification. For both methods, the facility.priority specification '*.*' really means 'all data entering via a specified input-specification' because of the limited scope of input-specification itself. (Consider the difference between traffic coming in on the traditional address of '*.*' versus two data streams from 'localhost' and 'hostname.domain' [the external interface].)


Example: When '*.*' is sent to a filter (from perhaps a Linux-based VPN host), anything not re-directed to another facility will get filed to the default locations. This includes authentication records, or traffic using the 'daemon', 'user', 'kernel' or other facilities.

When the input options are used to send all traffic to the 'extra3' facility (and priority is not set, meaning keep the current priority values), filters can still be used for more granular control, but all traffic not re-directed by the filter will STILL have been moved to the 'extra3' facility where it can be logged, detected and reviewed.
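The difference between the two methods can be seen in a toy Python model. Everything here is illustrative: `classify` and `vpn_filter` are hypothetical names, messages are plain dictionaries, and a filter is reduced to a function that returns a new facility name or None for "no opinion".

```python
def classify(message, override_facility=None, input_filter=None):
    """Return the facility a message ends up with after input processing.

    The input-specification option (override_facility) is applied first;
    the filter may then redirect individual messages further.
    """
    facility = override_facility or message["facility"]
    if input_filter is not None:
        redirected = input_filter(message["text"])
        if redirected is not None:
            facility = redirected
    return facility


def vpn_filter(text):
    """Hypothetical filter: redirect tunnel events, leave the rest alone."""
    return "extra5" if "tunnel" in text else None
```

With the filter alone, a login record from the VPN host stays in 'auth' and is filed with every other host's authentication records. With the input option set to 'extra3' as well, the same record lands in 'extra3', while tunnel events are still redirected by the filter.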

As a consequence of Syslogd2 being network aware, each input specification may list a set of network-states in which Syslogd2 should consider it 'valid' with all other network-states being considered 'invalid'. Whenever the network is in an 'invalid' state, the interface will not initialize. This is part of the overall 'network awareness' feature intended to make syslog logging viable for Linux-on-laptops. It has the 'side effect' of allowing Syslogd2 to start up and collect data before the network is started, then to forward the collected data to a remote host after the network is up and running.

Multi-Homed Hosts


Syslogd2 allows administrators to disable the default IP address while leaving IP support enabled for other (non-default) IP addresses or ports. It considers the individual ip-address (along with identity-options 'port' and 'protocol') to be the fundamental basis for specifying IP or Linux socket connections. A major reason for this is to support multi-homed hosts, allowing different configuration settings for different interfaces.

As a network-data collection tool, Syslogd2 should be (and is) prepared to support multi-homed hosts (hosts with more than one interface connected to different network subnets) for independent collection of syslog data from each interface or subnet. One place where this is particularly useful is in DMZ scenarios where a multi-homed host may have one interface attached to the internet segment just inside the internet-specific outer firewall, a second interface connected to the extranet switch (to monitor extranet routers) and perhaps a 3rd interface connected to a web-server segment to collect data from web-servers.

In order to provide expected levels of support in the above situation, the use of the address wildcard '*' which is the standard for host-logging systems is clearly not suitable. Syslogd2 allows the administrator to disable the default internet interface of ['*',UDP/514] and to then break down the '*' wild-card address into individual, address-specific connections - each with different option-lists (including different filter settings).

This also means that Syslogd2 can support the use of a secondary address as an 'overlay' for different (distinct) sets of data-sources. For example: a given host may be collecting host data from a subset of Linux servers on its default address as part of the open-source Puppet host-management system. A second IP address is added to the host. The second address is to be used to collect data from network devices. Network-management data is kept separated from the host-management data because the secondary address is configured to shift all incoming data to 'extra4' immediately upon receipt. Filters are then applied to further break the network data down for logging into a separate set of network log files and forwarding to a different central network host using other 'extra' facilities. This second address may or may not be on the same physical interface as the first (so this may or may not technically be a 'multi-homed' host), but it will still function as if the two interfaces were separate hosts. It will also likely have a unique name in the DNS server for the convenience of humans.

Network Awareness

[More Information]

The concept of Network Awareness is that Syslogd2 frequently checks the 'state' of the network and adjusts its behaviour and working configuration based on a combination of the current network state, the desired network state and instructions in the configuration file.

This concept is somewhat experimental. The experimental parts are to see if I can technically make this work and do so in a semi-efficient manner and then to explore the resulting characteristic performance of my host to see if and what additional changes may need to be made to other OS components.

The full concept (that shows one of the possible future directions of Syslogd2) is that at any given time, the network is in one of several possible 'states':

Per-state configuration in the configuration file is currently only partially implemented, but will eventually allow any parameter to be configured differently depending on what network-state the host is determined to be in. Whenever the network state changes, some amount of (possibly total) reconfiguration needs to occur, then the new network-state needs to be set as the 'current' state until the next network state-change occurs.

This concept (as stated above) is partially implemented now. Syslogd2 recognizes the Down, Local and Other states as well as the pseudo-state of Any, which is used for configuration and also set as the default network state for configuration. So far, the only configuration components that network-state changes affect are input and output connections.

Other network-state names may be given, but the current implementation will not recognize them and will therefore not enter any user-defined state.
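A rough Python model of the currently implemented part (three real states, the 'Any' pseudo-state as the default, and unrecognized names being ignored) might look like this. The names `NetState`, `parse_states` and `is_valid` are hypothetical and are not taken from the Syslogd2 source.

```python
from enum import Enum


class NetState(Enum):
    DOWN = "Down"
    LOCAL = "Local"
    OTHER = "Other"


# 'Any' is a pseudo-state: it stands for every real state.
ANY = frozenset(NetState)


def parse_states(names):
    """Convert configured state names into a set of NetState values.

    An empty list means the default ('Any'). Names that are not
    recognized are skipped, so they can never become the current state.
    """
    if not names:
        return ANY
    result = set()
    for name in names:
        key = name.strip().lower()
        if key == "any":
            return ANY
        for state in NetState:
            if state.value.lower() == key:
                result.add(state)
    return frozenset(result)


def is_valid(valid_states, current):
    """True if a connection may be initialized in the current state."""
    return current in valid_states
```

A connection configured for 'Local, Other' would then be skipped while the network is Down and picked up on the next upward state change.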

This concept and partial implementation when combined with the spooling feature creates the Network-State Spooling feature. When combined with startup testing, it is the core of the delayed resolution feature where IP addresses do not even attempt to resolve until the network has been detected to be up and the interfaces (and DNS) are known to be active and stable.

Network awareness is also a key component of both input processing and output processing because it drives whether or not the current environment needs to be rebuilt (and attempts made to resolve previously unresolved IP references) in light of the newly-determined network-state.

Output Processing

[More Information]

Output processing in Syslogd2 has undergone even more changes than the input-processing, so just the highlights will be covered here. Please visit the link to more information for a more thorough coverage of the changes and more implementation details on these concepts.

In general, Syslogd2 output processing is designed around some common-sense observations about data-communications and the current configuration-file format and the (now obsolete) assumptions that went into that specification:

  1. As a consequence of Syslogd2 being network aware, each output may list a set of network-states in which Syslogd2 should consider it 'valid' with all other states being 'invalid'. Whenever the network is in an 'invalid' state, the interface will not initialize unless network-state-spooling is enabled.
  2. When transmitting data to a remote host or local linux socket or pipe, the data is not being transferred to an IP-address, pipe or socket but to an application. This has multiple implications:
    • The remote application may be reachable by several methods -- all of which are equally preferable as long as the data arrives at the application. An output-specification may specify a hostname, a host nickname, an IPv4 or IPv6 address, any of numerous ports the application may be listening on (as long as it treats data from all ports the same), different interfaces or secondary addresses on the same host (as long as the application is listening on those addresses and ports and treats incoming data the same way), etc. An example of this is any syslog application that listens on the '*' address for local IP input. It can be reached via '127.0.0.1', '::1', <external address>, hostname, 'localhost', 'ip6-localhost', etc.
    • Multiple output lines specifying different aliases for the same destination should be allowed in the configuration file and combined (resolved) into a single entry using a single connection with (perhaps) alternative connection methods (multiple addresses, address-families, multiple ports, etc). This is done by Syslogd2. With current code, multiple output lines specifying aliases will result in multiple connections and possibly duplicate messages being transmitted.

  3. The network support team is not perfect. The network will fail from time-to-time. The longer the network is down, the more serious the problem and the more likely it is to stay down for the next 'increasingly longer interval'.

  4. The format of a syslog configuration output-line limits the ability of newer approaches and applications to provide services due to the lack of ability to expand to meet new needs.
    • The selector-string and the location to which the data is sent are two separate parameters that serve two distinctly different purposes. Therefore, it is not unreasonable for a new syslog service (such as Syslogd2) to provide a set of options dependent on selector-string values with a different set dependent on location-values.
      • The selector-string is not a monolithic entity, but is comprised of a 'collection' of 'output-streams' that are all going to the same location. Each component of the overall selector-string may have different requirements for filtering, for the expiration-time of the data it represents, whether or not spooling is appropriate (and if so, what kind of spooling) or other parameters. Specific selector-string related options include:
        • Whether to filter the outputstream going to this particular location - and if so, which filter to use. The same selector-string-component-stream (for example: daemon.*) may be filtered differently depending on which host it is going to.
        • The time-to-live value assigned to the data carried by this selector-string-component-stream. User-activity data may be permanent while a notification of a failed network link may only be valid for 5 minutes or less (since the manager's desk-telephone will be ringing within a short time frame anyway...).
        • If the data cannot be immediately transmitted, is it important enough to spool for later transmission ?
      • The location component will also have certain data-structures and options that uniquely define it and others that it uses to support its associated selector-string components (data-streams). Some of these (location-specific) structures and options are:
        • The connection (and identity-options such as the protocol (stream vs datagram or TCP vs UDP) and IP-port). Differences in these options define an entirely different location.
        • The state of the connection is a location parameter since it is common to all associated selector-string components.
        • The name, size and action-when-full for the spool-file are location-specific, but the choice of whether or not to spool a specific message is dependent on the nature of the data selected by the selector-string.
    • Syslogd2 'extends' each output-line with an optional comma-separated option-list at the end of the line (as currently-defined). The options in this list are a mix of selector-string options and location-options pertaining to the values expressed on that line.

  5. The issues highlighted above bring up another (secondary) issue: If individual selector-string components are to be allowed to have unique options, how can this be expressed in the configuration file ? Syslogd2's 'answer' to this issue is to allow multiple output-lines specifying the same location. During the parsing phase of startup, the individual selector-string component options will be recorded for later use.

    During the build phase of startup, the output-lines will be 'resolved' and combined into one location-structure with a 'summary' facpri-mask composed from all assigned selector-string components. This 'summary' mask 're-assembles' all the selector-string component facilities and priority values into the equivalent of what would be used by traditional syslog processors. Because there will be multiple (potentially conflicting) sets of location-information for any location with multiple selector-string components (when read from different output-lines), Syslogd2 accepts location-options ONLY from the 1st location instance encountered (this will be the one with the lowest line-number unless found in multiple input-files).

    During the execution phase of operation, the first selector-string-component that matches a message (lowest-line-number) 'wins' and its parameter-set is used for output-processing.

  6. As alluded to above, Syslogd2 can apply a second filter (an output filter) to traffic that exits Syslogd2 destined for a particular destination. More accurately, just before each copy of each message leaves Syslogd2 toward its destination, Syslogd2 will apply the output-filter designated by the selector-string-component that the message matched. This output filter may instruct Syslogd2 to discard the message instead of transmitting it or to make last-minute modifications.

    Because of its placement in the processing sequence, the output filter is anticipated to routinely perform two primary tasks:
    • Reduce the total output traffic volume to reduce network congestion and to preserve the capacity of network links for user traffic. An output filter might be used to reduce the stream 'daemon.*' to just the 4 or 5 messages per hour that the target host wants to see, discarding the rest of the output-stream as 'noise'.
    • Make last-second (destination-specific) modifications to the content of the outbound message-string. An output-filter might 're-translate' traffic from facility 'extra4' to 'local7' if sending to a syslog processor without the ability to handle higher-numbered facility-identifiers. An output-filter also might add additional database-fields (name=value pairs) if sending to DBD2 (Syslogd2's sister project).

  7. Then there is the issue of TCP timeouts. For a multi-threaded application, TCP timeouts can be performance-killers because they essentially 'lock' a thread for an extended period of time (usually from several seconds up to a full minute) while waiting for a connection that may never be established.

    This can be simulated with the telnet command. If the host is running, but the application is not (or if there is no application on the selected port), an immediate connection refusal usually occurs ('telnet localhost 51'). If the host is down or at a different address, a timeout occurs instead, and this timeout takes much longer ('telnet' to a non-assigned address).

    The issue is that when the network (or host) is down or if the provided address is not correct, the timeout condition will occur when the first thread attempts to open the outbound connection for writing. The next thread that wants to write to that location has to wait for the first thread to release it, then will attempt to open the connection and time out in its turn. This will be repeated until (potentially) all threads are waiting in line for their turn to time out attempting to open this (doomed) outbound connection. Should the network fail or a destination-host go down during operations, most syslog processors will attempt to re-open a connection any time they find a closed (or failed) connection and want to write to it. This leads directly to the failure cycle above.

    As the WOPR says in the movie WarGames: "...the only winning move is not to play...". Syslogd2 output threads will make one (and only one) attempt to open (or re-open) each outbound destination. If that attempt fails, the connection is marked 'invalid'. From then on, the primary processing threads (having checked the connection-status first) will not attempt to open the connection, but will immediately spool or discard the message.

    Meanwhile, a periodic task ('CheckDestinations') will be run either by the parent thread or a thread from the housekeeping threadpool to attempt to re-open all failed connections and (if successful) clear their 'invalid' mark and unspool any spooled data over this newly-established connection.

    Because the re-open attempt is done by a separate threadpool, the primary traffic-processors avoid the 'time-out trap' described above. The housekeeping threads can 'afford' to take as long as is required to re-open as many TCP connections as are necessary because they 'have nothing better to do' (that is what they are there for). Whenever the 'invalid' status-flag is cleared (by a housekeeping thread), the operational threads will automatically resume writing to the destination again while the housekeeping thread unspools any spooled data.

  8. The last major change to output processing that should be mentioned is how Syslogd2 handles large numbers of output destinations or a mix of fast and slow destinations. The traditional approach is to have a single thread process a single message from receipt through writing that message to all applicable destinations. The downside to this approach is that as each new destination is declared, the processing time for a single message gets longer. Very quickly, the processing thread can no longer finish one message and return to read the next one before that message has been over-written by yet more unread incoming messages, resulting in data-loss that an application has no way to track.

    Syslogd2 responds to this issue with CAP_WORKERTHREADS and CAP_OUTPUTTHREADS. These two compiler symbols (if declared at compile-time) cause Syslogd2 to split the processing of each message into 3 parts as explained in the input processing section. When CAP_OUTPUTTHREADS is declared, Syslogd2 creates a new threadpool type (an output-threadpool) to which multiple destinations may be assigned. Since multiple output-threadpools may be created (each one with some number of threads and an input buffer to hold as-yet-unprocessed input), large numbers of output destinations may be divided among as many threadpools as you wish to allocate resources to.

    Just as each destination creates a 'summary' mask from the selector-string components it supports, so too does each output threadpool create a 'summary mask' from the destinations it supports. This means that when a processing thread is ready to hand a message off to output threadpools, it need only check each threadpool (one comparison per threadpool) instead of each destination or selector-string (many more comparisons). Queueing the message-data to an output-threadpool's input buffer (a memory operation) is also faster than writing to disk or to a device or network, allowing the processing thread to immediately work on the next message.
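
The two levels of 'summary' mask described in items 5 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Syslogd2's actual data structures: the bit layout in facpri_bit(), the class names (Selector, Location, OutputPool), the filter names and the example destinations are all hypothetical.

```python
from dataclasses import dataclass, field


def facpri_bit(facility: int, priority: int) -> int:
    """Assumed layout: one bit per (facility, priority) pair, 8 priorities each."""
    return 1 << (facility * 8 + priority)


@dataclass
class Selector:
    """One selector-string component plus the options from its output-line."""
    mask: int
    options: dict = field(default_factory=dict)


@dataclass
class Location:
    """One destination; selectors are kept in configuration-file order."""
    name: str
    selectors: list
    summary_mask: int = 0

    def __post_init__(self):
        # Build phase: OR all selector masks into one 'summary' mask.
        for selector in self.selectors:
            self.summary_mask |= selector.mask

    def first_match(self, bit: int):
        """Execution phase: the first (lowest-line-number) match 'wins'."""
        for selector in self.selectors:
            if selector.mask & bit:
                return selector
        return None


@dataclass
class OutputPool:
    """An output-threadpool: some locations plus an in-memory input buffer."""
    locations: list
    queue: list = field(default_factory=list)
    summary_mask: int = 0

    def __post_init__(self):
        for location in self.locations:
            self.summary_mask |= location.summary_mask


def dispatch(pools, facility, priority, text) -> int:
    """Queue the message to every pool whose summary mask matches."""
    bit = facpri_bit(facility, priority)
    queued = 0
    for pool in pools:
        if pool.summary_mask & bit:          # one comparison per threadpool
            pool.queue.append((bit, text))   # memory operation only
            queued += 1
    return queued


DAEMON, LOCAL7 = 3, 23      # standard syslog facility numbers
ERR, INFO = 3, 6            # standard syslog priority numbers

# Two output-lines naming the same (hypothetical) location, each with its own filter.
loghost = Location("loghost", [
    Selector(facpri_bit(DAEMON, ERR), {"filter": "errors-only"}),
    Selector(facpri_bit(DAEMON, ERR) | facpri_bit(DAEMON, INFO), {"filter": "summary"}),
])
local_file = Location("local7.log", [Selector(facpri_bit(LOCAL7, INFO))])
pools = [OutputPool([loghost]), OutputPool([local_file])]
```

In this sketch a 'daemon.err' message is queued to the first pool only, and within 'loghost' the first selector's options (filter 'errors-only') apply because it appears first.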

Programmable Interrupts

[Top of page]     [More Information]

Syslogd2 uses 4 named interrupts to allow admins to 'trigger' the immediate execution of certain background operations. The problem is that Syslogd2 has about 8-10 functions (and growing) that (in different scenarios), admins may want to trigger.

Syslogd2 resolves this conflict by making the four (4) interrupts configurable via global variables so they can be determined at run-time. (They can also be inspected or changed at run-time via the Command-Tool).

The four interrupts and their default settings are:

SigHUP:  RotateFiles        (kill -HUP process-id)
SigINT:  CheckFilters       (kill -INT process-id)
SigUSR1: CheckSources       (kill -USR1 process-id)
SigUSR2: CheckDestinations  (kill -USR2 process-id)
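
The idea of a run-time-configurable mapping between a fixed set of signals and a larger set of background operations can be sketched as follows. This is a minimal Python illustration, not Syslogd2's implementation: the four operation names come from the defaults above, while ACTIONS, signal_map, rebind() and install() are hypothetical names, and the operations only record that they ran.

```python
import signal

calls = []   # records which background operation ran (illustration only)

# Registry of triggerable operations; a real daemon would have more of them.
ACTIONS = {
    "RotateFiles": lambda: calls.append("RotateFiles"),
    "CheckFilters": lambda: calls.append("CheckFilters"),
    "CheckSources": lambda: calls.append("CheckSources"),
    "CheckDestinations": lambda: calls.append("CheckDestinations"),
}

# Default bindings, as listed above; changeable at run-time.
signal_map = {
    signal.SIGHUP: "RotateFiles",
    signal.SIGINT: "CheckFilters",
    signal.SIGUSR1: "CheckSources",
    signal.SIGUSR2: "CheckDestinations",
}


def on_signal(signum, frame):
    """Look up the operation currently bound to this signal and run it."""
    ACTIONS[signal_map[signum]]()


def rebind(signum, action_name):
    """Point one of the four signals at a different operation."""
    if action_name not in ACTIONS:
        raise ValueError(f"unknown operation: {action_name}")
    signal_map[signum] = action_name


def install():
    """Register the single dispatcher for all four signals."""
    for signum in signal_map:
        signal.signal(signum, on_signal)
```

After install(), 'kill -HUP process-id' would run whatever operation is currently bound to SigHUP; calling rebind(signal.SIGHUP, "CheckDestinations") changes that without touching the handler registration.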

Static Data Parameters (SD-String)

[Top of page]     [More Information]

Due to its designated role as a syslog-data-collector (part of a larger network-management system), Syslogd2 has the ability to create and forward parameters to downstream processes other than just the 'raw' syslog text. These parameters take the form of 'name=value' pairs with each pair surrounded by square brackets and with the bracketed pairs separated by whitespace (usually spaces). Together these parameters comprise what Syslogd2 refers to as the 'SD-String' (Static Data[-field] String). This string (though it may be thought of [and treated as] a separate entity or 'field' of the overall message) is (for formal purposes) the start of the free-form 'msg-string' and comes BEFORE the 'tag' field (which consists of "application-name + WS + [ '[' + pid + ']' [ + WS ]] + ': ' "). The purpose of the SD-String is to pass fixed-field data through (or from) syslog in a format that downstream applications can parse.

As a working example of this idea, Syslogd2's sister project (DBD2) can receive data from Syslogd2 containing SD-fields, extract the fields and insert the corresponding data into MySQL database table(s) in virtually any desired format. The combination of Syslogd2's filters for event selection and traffic management, the in-depth information CAP_JOURNALTHREADS obtains from systemd's journal, and DBD2's ability to convert SD-Fields to MySQL database entries makes for a very potent start of a syslog-based network-management system (or adjunct).

The Static Data Parameters concept complements (or is complemented by) Syslogd2 Filters, Variable Buffer Lengths, and CAP_JOURNALTHREADS features. Static Data Parameters also define a method for user applications to pass named data-parameters through syslog to downstream management databases.
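
The SD-String format described above (bracketed 'name=value' pairs, separated by whitespace, ahead of the tag) is simple enough for a downstream application to build or parse in a few lines. The Python sketch below shows one way to do so. It is an illustration only: the function names and field names are hypothetical, and it assumes values never contain a ']' character, since any escaping rules are not covered here.

```python
import re

# One bracketed pair, optionally preceded by whitespace: [name=value]
SD_PAIR = re.compile(r"\s*\[([^\]=\s]+)=([^\]]*)\]")


def build_sd_string(fields: dict) -> str:
    """Join name=value pairs as '[name=value] [name=value] ...'."""
    return " ".join(f"[{name}={value}]" for name, value in fields.items())


def split_sd_string(msg: str):
    """Peel leading SD pairs off a msg-string; return (fields, remainder)."""
    fields, pos = {}, 0
    while (match := SD_PAIR.match(msg, pos)):
        fields[match.group(1)] = match.group(2)
        pos = match.end()
    return fields, msg[pos:].lstrip()


msg = build_sd_string({"site": "lab1", "rack": "7"}) + " sshd[1234]: Accepted password"
fields, rest = split_sd_string(msg)
```

Here 'fields' holds the two pairs and 'rest' begins at the tag ('sshd[1234]: ...'). The pid brackets in the tag are not mistaken for an SD pair because parsing stops at the first text that does not start with '['.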



New Concepts Implemented in Optional Feature-Sets


Command-Tool

[Top of page]     [More Information]

The Syslogd2 command-tool is an ncurses-based, text-only interactive interface into an executing Syslogd2 daemon process. It provides a complete configuration-display and a limited interactive-configuration capability, both operating in real-time against the currently-executing Syslogd2 background-process.

At first, the idea of the command tool was simply to extend the functionality of the 'testconfig' configuration display from the 'pre-execution' configuration to the 'currently executing' environment. This was primarily for my own debugging purposes so I could confirm that both CheckSources and CheckDestinations were working properly as they built and re-built connections, properly accounting for conflict situations and setting all run-time flags and values as they should and so I could confirm that all data structures used at run-time were built and built correctly.

I immediately realized that my run-time configuration reporting connection could also function as a bi-directional link into the executing daemon, allowing me to obtain not only run-time configuration values (like open-file-descriptors and statistics), but also to send commands to implement simple configuration modifications.

The current version of the command-tool is still in a very embryonic stage of development, mostly because I chose to concentrate my development time on the main code of Syslogd2 and associated projects such as DBD2.

I did take some security and operational precautions in the design of the command-tool:


As of now (March 2023), command-tool supports:

Configuration Display

[Top of page]     [More Information]

It is unreasonable to expect administrators to be able to read a complex network-management configuration file and to be able to troubleshoot problems just from the image in their heads. It is equally unreasonable to attempt to develop a complex application with no way to check the correctness of the code that builds the data-structures that support that application. For both of these reasons, Syslogd2 supports the --TestConfig command-line option.

--TestConfig runs through the normal parsing and build startup phases, but just before Syslogd2 would start initializing run-time resources (such as memory-buffers, open files, sockets or pipes, semaphores, mutexes or threads), the --testconfig option causes this (possibly 2nd) instance of Syslogd2 to display the configuration it has just parsed and built on the terminal and then exit.

Because this option causes Syslogd2 to exit before executing any run-time code, it can only be issued from the actual command-line and is not supported if moved into the configuration file itself (Well - Duh !!). As a consequence of using the same configuration file as the (possibly) executing code, --TestConfig automatically 'adjusts' the following files that are used during parsing and startup:


The configuration display is sent to the terminal screen by default. Parameters to the --testconfig command allow this output to be redirected to the defined logfile (--stdout) or to the error file (--stderr). Other options to --testconfig specify which display elements to show (the default is to show all elements) and which network-state to simulate. (Note: if the network is actually in 'Down' or 'Local' state, and 'Other' is specified as the state to simulate, --testconfig will assume that all IP addresses will be resolved during the build process.)

The configuration display is quite detailed and shows the expected status of every setting and every configuration-file entry that was successfully parsed and accepted by Syslogd2. This display is quite useful for debugging configurations, experimenting with different options (especially CAP_WORKERTHREADS and CAP_OUTPUTTHREADS) or as a working reference for network-management personnel.

The --testconfig command-line option is limited, however, to ONLY the before-execution environment. Obtaining a configuration display of a currently-executing Syslogd2 instance is also possible, but the command-tool must be used instead of --testconfig.

Filters

[Top of page]     [More Information]

Syslogd2 filters can be used to select messages based on string comparisons against the message content or hostname and/or comparisons against the facility or priority values.

Once selected, a message may be dropped from further processing or modified (string modifications to content and/or hostname, and/or modifications to facility and/or priority). Filters provide a key ability to sort and filter data before it ever leaves the point of initial receipt.

Syslogd2 divides its filter capability into two components named after which type of configuration line specifies them. Filters specified by input-specifications are called 'input-filters' while those specified on output-lines are called 'output-filters'. The operational difference between the two types is when they are executed:


Input filters apply to all traffic that enters Syslogd2 via the input-specification on which they were declared. Different filters may apply to each different input source.

Input filters are especially useful for sorting incoming messages by content since they can change the facility and priority of the message before output-selector-strings are searched for matching entries. Input filters are also especially useful when monitoring external log files with the tailfile option. Since there is no need for Syslogd2 to create a duplicate log file for forensic-analysis purposes, any events that are not to be forwarded to a remote host may be discarded at an 'early' point in processing.

Output filters apply to all traffic that exits Syslogd2 via the output-line on which they are declared. Since output-filters are affiliated with the selector-string component of output-lines, different filters may apply to different output-lines even if those lines specify the same destination.

Output filters are especially useful for discarding all unnecessary events that would otherwise be transmitted over the network, congesting network-links and central syslog applications. For example: instead of transmitting 'local0.*' (the syslog configuration of a switch, router or firewall network device), Syslogd2 can apply an output filter and transmit only those events containing the phrases "user", "logon", or "username", discarding all other traffic instead of transmitting it.

Output filters can also be an alternative or supplement to input-filters when processing external log files since different output-lines may use different filters (selecting different events) from the same (external, non-syslog) input log file (whether the file is 'pre-filtered' by an input filter or not).

Since output-filters are affiliated with selector-string elements, the element 'extra3' that may contain messages re-directed by an input-filter may elect to not filter output traffic or may elect to further filter the traffic by discarding selected events.
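
The division of labor between the two filter types can be sketched in Python. This is not Syslogd2's filter syntax (which is defined in the configuration file), only an illustration of the behavior described above. The Message class, the function names and the input-filter rule are hypothetical; the three output-filter phrases come from the 'local0.*' example above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Message:
    facility: str
    priority: str
    host: str
    text: str


def input_filter(msg: Message) -> Optional[Message]:
    """Runs on receipt, before selector-strings are searched.

    Hypothetical rule: re-file web-server lines under 'extra3' so that
    output-lines can select them by facility.
    """
    if msg.text.startswith("httpd"):
        return replace(msg, facility="extra3")
    return msg


KEEP_PHRASES = ("user", "logon", "username")


def output_filter(msg: Message) -> Optional[Message]:
    """Runs just before transmission; None means 'discard, do not send'."""
    lowered = msg.text.lower()
    if any(phrase in lowered for phrase in KEEP_PHRASES):
        return msg
    return None


logon = Message("local0", "info", "fw1", "Failed logon for username admin")
noise = Message("local0", "info", "fw1", "Interface ge-0/0/1 link up")
web = Message("daemon", "info", "web1", "httpd: worker restarted")
```

With these definitions, output_filter(logon) passes the message through unchanged, output_filter(noise) returns None (nothing is transmitted), and input_filter(web) returns a copy whose facility is 'extra3'.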

Advanced Multi-Threading

[Top of page]     [More Information]

Syslogd2 addresses the issues of (1) handling high-traffic generators and (2) diminishing returns for each added thread through its 'multi-dimensional' multi-threading structure. This structure can easily allow over 100 active threads to be 'gainfully employed' with little or no blocking of each other.

Integrated Name Cache (with pre-load file)

[Top of page]     [More Information]

Traditional syslog administrators often disable the use of DNS and forego host-name resolution for incoming IP traffic. This is usually done for one of three time-tested reasons:

  1. The hosts they are receiving traffic from are not in the DNS, so attempting to resolve the hostname is a waste of time. This is more common for network-devices in larger organizations that run their own internal DNS using commercial DNS products that charge licensing fees on a per-assigned-address basis. To save money on license fees, network devices are often omitted from the DNS since the only people likely to have access to the network devices are the network engineers that use IP addresses.
  2. To avoid DNS server overloads. The load on the DNS server closely mirrors the aggregate input of the combined number of syslog servers that are doing host-resolution. For a busy network (with lots of syslog activity from lots of different hosts), this 'extra' load on the DNS server can be enough to crash the DNS service.
  3. No DNS server is available. The location of the syslog server may not have DNS service (or the DNS service may not contain correct address information), resulting in a reversion to using the /etc/hosts file as a 'static' DNS substitute. This situation is frequently found in DMZ environments and some network test-lab environments. The problem with using /etc/hosts as a substitute for DNS is the number of individual hosts that must be individually (and manually) updated for each required change because /etc/hosts files almost always contain localized (host-specific) address information.

The integrated name-cache in Syslogd2 not only allows name-resolution to occur without any additional strain on either DNS or network links, but at the same time allows 'normalization' of hostnames - even if received from secondary or tertiary addresses to which DNS may have assigned different names.

The use of a single, centrally-maintained cache-pre-load file for all syslog hosts (that does not contain localized host information) allows name-resolution for incoming data in all the above situations. When a pre-load file is used, DNS is unnecessary unless DNS is to be used to 'fill-in' any names not found in the cache-file (a process known as 'resolving cache-misses'). When a cache-file is used instead of DNS, there is no additional load placed on the network or the DNS servers as a result of syslog name resolution. When a cache-file is NOT used, the load is greatly reduced since only one DNS request per host will be made (results from the first DNS query are stored in the internal cache), so little additional load is placed on the DNS server or network regardless of the volume of traffic received by the Syslogd2 daemon.
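
The behavior described above (pre-load first, at most one DNS query per cache-miss, several addresses normalized to one name) can be sketched in Python. This is an illustration under stated assumptions, not Syslogd2's cache: the class and method names are hypothetical, and the pre-load format shown ('address name', with '#' comments, similar to /etc/hosts) is assumed for the example.

```python
import socket


def dns_reverse(addr: str) -> str:
    """Reverse-resolve an address; fall back to the address itself on failure."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:
        return addr


class NameCache:
    def __init__(self, preload_lines=(), resolver=None):
        self.names = {}             # address -> normalized hostname
        self.resolver = resolver    # None means 'never ask DNS'
        self.dns_queries = 0
        for line in preload_lines:
            parts = line.split("#", 1)[0].split()
            if len(parts) >= 2:
                self.names[parts[0]] = parts[1]

    def lookup(self, addr: str) -> str:
        name = self.names.get(addr)
        if name is None:                        # cache-miss
            if self.resolver is None:
                name = addr
            else:
                self.dns_queries += 1
                name = self.resolver(addr)
            self.names[addr] = name             # later lookups cost no DNS traffic
        return name


preload = [
    "# centrally-maintained pre-load file (hypothetical contents)",
    "10.0.0.1  core-sw1",
    "10.0.1.1  core-sw1   # secondary address, same normalized name",
]
# A stand-in resolver keeps the example off the network; dns_reverse could be used instead.
cache = NameCache(preload, resolver=lambda addr: "host-" + addr)
```

Both 10.0.0.1 and 10.0.1.1 resolve to 'core-sw1' from the pre-load data, and an address missing from the file triggers the resolver only on its first lookup.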

Named-Pipe Input

[Top of page]     [More Information]

Syslogd2 supports named-pipes as an input-source. The main reason for doing so is to better support its ApplicationMode functionality.

Named-pipes are virtually the only way to obtain data from legacy syslog processors that is not blocked by the legacy processor's assumption that enabling IP-service also entails opening an IP listening socket on all available addresses.

Because the legacy application has reserved all addresses at UDP port 514 to itself and can only transmit to port UDP/514, Syslogd2 (running on the same host) is unable to listen for IP traffic on the only port the legacy system can transmit to (UDP/514). Because of this Syslogd2 cannot reasonably expect to receive IP traffic from legacy syslog servers while running in ApplicationMode. Since legacy syslog processors do not traditionally have an option to write to Unix or Linux sockets, that leaves only their named-pipe output method.

Testing of rsyslog in Linux shows rsyslog to be an exception to the above expectations in that rsyslog can write to non-standard UDP ports using the non-standard 'hostname:port' syntax. This is especially good because rsyslog does not include the facpri field when it writes to named-pipes, making rsyslog support for named-pipes much less useful for Syslogd2.

Spooling

[Top of page]     [More Information]

Syslogd2 supports two (2) types of spooling which may be used in tandem with each other. Either type of spooling will cause a message to be written to the same, shared spool-file designated for a given destination. Spool-files and their associated locations are recorded in a reserved file in the Syslogd2 spool-directory so they are persistent across Syslogd2 restarts and system reboots.


The configuration of spooling is split between destination-specific options and selector-string options. Localized parameters for MaxSpoolFileSize, SpoolFileName and SpoolFileAction that are common to all selector-string elements configured for the same location are location-options. Localized parameters for SpoolFileAge, and whether or not to use connection-spooling (keyword: 'sf') or network-state-spooling (keyword: 'spool') are selector-string options that are set based on the data represented by each selector-string-component.

If configured, spooling is started automatically if conditions are met (wrong network state or broken connection).
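
Connection-spooling ties directly into the 'invalid' connection mark and the CheckDestinations task described under Output Processing. The Python sketch below shows that interaction in miniature. It is an illustration only: the class and function names are hypothetical, an in-memory list stands in for the spool-file, locking between threads is omitted, and the network is simulated rather than using real sockets.

```python
class Destination:
    def __init__(self, connect, spool_enabled=True):
        self.connect = connect      # returns a send(msg) callable or raises OSError
        self.send = None            # None until a connection is open
        self.invalid = False        # set after one failed open (or write) attempt
        self.spool_enabled = spool_enabled
        self.spool = []             # stands in for the on-disk spool-file
        self.discarded = 0

    def _set_aside(self, msg):
        if self.spool_enabled:
            self.spool.append(msg)
        else:
            self.discarded += 1
        return False

    def write(self, msg):
        """Called by processing threads; never retries a known-bad connection."""
        if self.invalid:
            return self._set_aside(msg)
        try:
            if self.send is None:
                self.send = self.connect()      # the one and only open attempt
            self.send(msg)
            return True
        except OSError:
            self.send, self.invalid = None, True
            return self._set_aside(msg)


def check_destinations(destinations):
    """Periodic housekeeping task: retry failed connections, then unspool."""
    for dest in destinations:
        if not dest.invalid:
            continue
        try:
            send = dest.connect()               # may block; only housekeeping waits
            while dest.spool:
                send(dest.spool[0])
                dest.spool.pop(0)
        except OSError:
            continue                            # still down; retry next period
        dest.send, dest.invalid = send, False


# Usage with a simulated network.
network = {"up": False, "open_attempts": 0}
delivered = []


def fake_connect():
    network["open_attempts"] += 1
    if not network["up"]:
        raise TimeoutError("simulated connect timeout")
    return delivered.append


dest = Destination(fake_connect)
dest.write("m1")                    # open attempt fails once; m1 is spooled
dest.write("m2")                    # no second open attempt; m2 is spooled
network["up"] = True
check_destinations([dest])          # reconnects and unspools m1, m2
dest.write("m3")                    # normal delivery resumes
```

In this run only two open attempts are made in total (one by a processing thread, one by housekeeping), and the messages arrive in their original order.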

TCP Support

[Top of page]

The support of TCP may not seem like a 'new' feature to those that have worked with rsyslog, but on Unix, BSD and OSX operating systems (rsyslog is found mostly on Linux), it will be new to some systems as Syslogd2 eventually gets ported to run on different platforms.

Syslogd2 considers TCP support to be the IP-component of the more inclusive term 'streaming protocol' which also includes (and takes its name from) the Linux socket streaming protocol. (Streaming protocol is the complement (and alternative) to the default 'datagram' protocol, the IP-component of which is 'UDP'.) Another way to consider the distinction is that streaming protocols are connection-oriented protocols, while datagram protocols are connection-less protocols.

Streaming connections are required as pre-requisites for several other features of Syslogd2. Among them are spooling and command-tool. Once declared at compile-time, stream protocol will be dormant unless configured. To configure a socket connection to use streaming rather than datagram protocol, use the 'stream' option (or one of its alias keywords: 'TCP', 's' or 'T') in the connection-specification's option list.
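
In socket-API terms, the stream/datagram choice is independent of the address family: the same option selects SOCK_STREAM for both IP (TCP) and Unix-domain sockets. The short Python sketch below shows how an option list might map onto the socket type. The keywords are those listed above; the function names are hypothetical and treating the keywords as case-sensitive is an assumption.

```python
import socket

STREAM_KEYWORDS = {"stream", "TCP", "s", "T"}   # keywords named in the text


def socket_type(options):
    """SOCK_STREAM if any stream keyword is present, else the datagram default."""
    if STREAM_KEYWORDS.intersection(options):
        return socket.SOCK_STREAM
    return socket.SOCK_DGRAM


def open_socket(family, options):
    """family is e.g. socket.AF_INET (TCP/UDP) or socket.AF_UNIX (local socket)."""
    return socket.socket(family, socket_type(options))
```

For example, socket_type(["T", "sf"]) selects a stream socket, while an option list without any stream keyword falls back to datagram.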

Text-File Input

[Top of page]     [More Information]

Text-file input is a good example of the difference between a 'host-logging' daemon and a 'syslog-data-collection' daemon. A daemon whose functional boundaries are drawn primarily by the need to log locally-generated syslog traffic to disk has little (to no) incentive to gather data from non-syslog sources (such as text-files). However, ignoring non-syslog-formatted input (or non-socket input in general) makes no sense from a standpoint of participation in any type of network- or host-management environment.


Network-management systems need to be able to support those (relatively rare) occasions when a home-grown (or even open-source/commercial) application is running on a local platform, producing a log file that is not sent to syslog.


Host-management systems frequently encounter such log files (web-server logs, MySQL/mariaDB log files, home-grown monitor scripts of all types, etc).


Syslogd2's --tailfile command-line option allows specification of a text-file and a list of options that control how that file is parsed and how Syslogd2 processes input from it.

One of the less obvious differences in processing a text-file as syslog input is whether or not the syslog-data-collector should log all received content to disk. Syslogd2 leaves this as an option for the user: Choices are log everything to local disk, log nothing to local disk or apply an output-filter and log selected events to local disk (remembering that we are reading from a file that already exists on disk).

Text-file input in Syslogd2 is supported more for the ability to capture data to process, integrate and normalize for the benefit of downstream processes than for any need to actually log a copy of existing data to local disk. The option to apply an output filter to log partial data makes some sense if the desire is to perhaps track a specific intermittent error or extract data for a specific purpose.
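
The core of text-file input (pick up only the lines added since the last read, then present each one as a syslog event) can be sketched in Python. This is not how --tailfile is implemented and it does not use its options; the function names, the 'weblog' tag and the choice of facility local0 (16) and priority info (6) are assumptions made for the example. File rotation and truncation are not handled.

```python
import os
import tempfile


def read_new_lines(path, offset=0):
    """Return (complete lines added since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1         # leave a trailing partial line for later
    text = data[:end].decode("utf-8", errors="replace")
    return text.split("\n")[:-1], offset + end


def as_syslog(line, facility, priority, tag):
    """Prefix a plain text line with an RFC 3164-style <PRI> value and a tag."""
    return f"<{facility * 8 + priority}>{tag}: {line}"


def demo():
    """Write a log file in two steps and collect what each read returns."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as f:
            f.write(b"first line\nsecond li")       # second line still incomplete
        first, offset = read_new_lines(path)
        with open(path, "ab") as f:
            f.write(b"ne\n")                        # application finishes the line
        second, offset = read_new_lines(path, offset)
    return [as_syslog(line, 16, 6, "weblog") for line in first + second]
```

The first read returns only 'first line'; the half-written second line is held back until it is complete, so no partial events reach the filters.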

Variable Buffer-Lengths

[Top of page]     [More Information]

The length of various internal buffers can be modified at run-time if the CAP_VARBUFFERS compiler-symbol was selected at compile-time. The idea behind CAP_VARBUFFERS is to allow the user to shorten buffer-lengths to save memory or to expand buffers to allow processing of extra-long strings. This capability was shown to be crucial during the development of the CAP_JOURNALTHREADS feature.

When CAP_VARBUFFERS is declared, the following new global parameters become available (all of which take a non-negative integer as a parameter):