New Concepts in Syslogd2
(Rethinking the 'Common Knowledge' of Syslog Event Processing)
Syslogd2 implements several features based on technologies and techniques never before integrated
into a system logging daemon.
Some of these features are based on computer-science theory (such as capacity-analysis,
and queueing-theory), some on operations-research techniques (such as human-nature-based tendencies and
traffic-flow-management) and some features are based on simple common-sense, but ALL are based on experience
(specifically MY experience) attempting to manage significant numbers of Linux servers and network
devices.
To some extent, Syslogd2 is also an experiment to discover what features would be
most useful to network- and host-administration personnel, as well as the result of about two decades of
hobby programming...
To fully understand Syslogd2, it is necessary to question every assumption currently held about how
syslog is used and configured in today's networks, as compared to its original purpose when it was first
devised in the early 1980s.
Syslogd2 is more of a 'feature grab-bag' and 'network-data-collector'
than it is a 'one-size-fits-all' 'host-based-logger' daemon (like rsyslog or syslog-ng).
This (New Concepts) page attempts to summarize most of the new features and concepts and to provide links
to the web pages where each feature or concept is discussed in more detail.
Syslog-ng existed for a brief time for Linux, but due to its radically different configuration
structure that only a programmer could love, it was never widely accepted.
Rsyslog introduced a multi-threaded version of a syslog daemon in the early 2000s that supports
TCP/IP and IPv6 (and now also supports reading the systemd journal interface to obtain syslog messages).
Rsyslog is Linux-only as it relies on system calls that are unique to the Linux OS.
Syslogd2 is being released in 2023/2024 and supports a variety of features.
Written in 'C' with no external support libraries required, Syslogd2 is portable to any Unix-like OS.
(It HAS been ported to Mac OS X, though that port is currently out-of-date.)
It is my hope that the additional functionality of Syslogd2 and its ability to be ported to
other systems will be able to overcome the industry's usual inertia against anything new or different.
That was a pretty brief summary - wasn't it ?
Syslogd2 was designed and developed expressly to address all the issues I experienced while trying to use the syslog protocol in both network and host administration environments.
'Application Mode' refers to the ability of Syslogd2 to run alongside another syslog daemon, sharing the same host yet not
interfering with (or being interfered with by) the existing syslog processor.
This might be desirable where a network-management data-collection capability is desired on a host that is under the control of a different organization
or where the existing syslog processor is providing some output-formatting or other service that Syslogd2 is not designed to support.
This 'application' role allows Syslogd2 to enhance or supplement (not replace) the abilities of the host's existing syslog processor
in the role of syslog data collection, processing and logging.
For example, Syslogd2 might be deployed on a cloud-based or remotely-hosted server to monitor one or more log files, reporting back
to 'home base' only selected events instead of the entire contents of facility/priority data streams. It may also be deployed
to collect and concentrate TCP syslog streams and log-file input on a host whose native syslog processor does not support TCP connections.
Running Syslogd2 in Application Mode is not without drawbacks.
One issue is how to set up the data-transfer from the native syslog processor into Syslogd2 where filters and other processing steps can be applied
for purposes of event filtering and 'noise reduction' (reducing the congestion of network-links and down-stream processors).
Running in 'Application Mode' is the primary use-case for Syslogd2's support of named-pipe input.
Not running as the primary syslog processor for a host is generally a less efficient method of syslog data collection (both because many messages
will have to be processed twice and because Syslogd2 may not receive 'clean' or complete data, either through administrative configuration error or
because the other processor is unable to keep up with the incoming traffic and the system buffers drop messages that it is not fast enough to read).
There may also be instances where a host (perhaps not even a Linux host) processes syslog data in a manner that is 'foreign' to Syslogd2
due either to the proprietary nature of the host operating system or due to some unique capability that Syslogd2 does not possess.
(One example I've already found is the unique format of the OS X operating system's syslog file output.)
For all the above reasons, Syslogd2 provides the --ApplicationMode 'macro' that simplifies the process of 'converting' Syslogd2 from
a system-default-syslog-service daemon to simply a backgrounding application process.
'--ApplicationMode' makes the following changes as if they had been made to compile-time settings (meaning they may still be individually over-ridden by run-time settings without running afoul of Syslogd2's
first-come-first-served parsing policy).
Application Mode may be invoked using either of two methods. As a Syslogd2 keyword, it is non-case-sensitive either way:
The promise of Syslogd2 running in ApplicationMode on a UNIX or Linux host is not without potential failure. There are technical
and implementation issues that may restrict the ability of a Syslogd2 application to receive syslog data from a legacy syslog processor
running on the same host.
Because there is at least one method of communication between rsyslog (the legacy syslog processor) and Syslogd2 on Linux, it is feasible
to run both rsyslog and Syslogd2 on the same host (if Syslogd2 is in 'ApplicationMode').
This allows Syslogd2 to function as a supplement to rsyslog for network-management purposes (reading log-file input, and perhaps IP or kernel input,
while receiving other syslog data from rsyslog).
This feasibility is enhanced because rsyslog is designed with pluggable modules for (at least) kernel input and IP support, so it appears that
these input functions can be disabled on either system and enabled on the other.
Traditional (single-threaded) syslog processors (host-loggers) assume that the network will be up and
stable at all times, so if a host cannot be resolved at startup, they mark that host unusable and move on.
At no time do they go back and attempt to re-resolve a hostname that did not initialize at startup.
A big part of the reason for this is that most traditional syslog processors are not multi-threaded so
once they finish startup, they start a processing loop that only terminates when they shut down.
To leave that processing loop for any purpose would be to stop processing syslog
traffic and to potentially miss incoming traffic due to system-buffer overflows.
Syslogd2 (being multi-threaded) has a specialized threadpool set aside to run background operations
while the primary threadpools are focused on processing syslog input.
This new threadpool is called a 'housekeeping' threadpool because it does various 'maintenance' chores
for the Syslogd2 application.
Two of the 'maintenance chores' that it is called upon to do (by the parent-thread running as
the main-scheduler) are to periodically check the network state and run the CheckSources routine
and the CheckDestinations routine.
In the event of an 'upwards' network state change (Down->Local or Local->Other), these routines
call upon a 2nd pair of routines to 'walk' the list of input and output connections and to resolve any
previously unresolved entries.
After resolving all entries that can be resolved, the proposed connections are checked for conflicts.
Those entries not marked 'in-conflict' either with the network state or with each other, are then opened.
Those connections that succeed in being opened at this time (either newly resolved or previously opened,
but closed due to communication error) are returned to service -- all while
the primary threads continue to process incoming syslog input.
The term 'Delayed Resolution' refers to the (possibly lengthy) delay between system startup and
when individual IP hosts can be resolved and activated.
Syslogd2's awareness of the state of the network combined with
periodic attempts to resolve IP hostnames implements both delayed resolution and standard
network-recovery.
(Note: if DNS service is required to resolve addresses, this can only occur when the network is in state 'Other',
because otherwise the DNS is assumed to be unreachable/unusable.)
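The re-resolution pass described above can be pictured roughly as follows. This is only a sketch in Python, not Syslogd2's actual C code: the `Connection` record, its `needs_dns` flag, the `recheck` function and the injected `resolve` callable are all hypothetical names chosen for illustration. The state names and the "upward change" rule come from the text.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Optional


class NetState(IntEnum):
    """Network states named in the text, ordered so 'upward' means increasing."""
    DOWN = 0
    LOCAL = 1
    OTHER = 2


@dataclass
class Connection:
    hostname: str
    needs_dns: bool = True         # hypothetical flag: name cannot be resolved locally
    address: Optional[str] = None  # None means "not yet resolved"


def recheck(conns: List[Connection], old: NetState, new: NetState,
            resolve: Callable[[str], Optional[str]]) -> int:
    """On an upward state change, retry every unresolved entry.

    Returns the number of entries resolved during this pass.
    """
    if new <= old:
        return 0                   # not an upward change: nothing to retry
    resolved = 0
    for conn in conns:
        if conn.address is not None:
            continue               # resolved on an earlier pass
        if conn.needs_dns and new != NetState.OTHER:
            continue               # DNS is assumed unreachable below 'Other'
        addr = resolve(conn.hostname)
        if addr is not None:
            conn.address = addr
            resolved += 1
    return resolved
```

Passing the resolver in as a parameter keeps the sketch testable without a network. In a real daemon this pass would run on a housekeeping thread so the input threads are never blocked by a slow lookup.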
In addition to just checking and reopening input/output connections, CheckSources and
CheckDestinations do other connection-housekeeping chores as well.
Among these chores are:
Deployment of Syslogd2 is a topic that many may disagree with. This is my perspective. Deployment (for purposes of this section) covers two conceptual areas:
The need to consider how a system is deployed is something that host-logging packages rarely consider since their deployment approach is simple: one daemon
per host with a scope of responsibility of just that one host.
It is a rare syslog daemon indeed that gives any thought or concern to the administrators that will have to support the system after deployment. How do
configuration changes get updated ? Package upgrades ?
Deployment of Syslogd2 has been a consideration from its inception. The link above will take you to a page that displays some possible deployment scenarios
for Syslogd2. The short version is that since Syslogd2 is a proper superset of the requirements of legacy logging daemons (with the possible exception
of some of rsyslog's more esoteric output formats), it can be deployed on any Linux host as a substitute for that host's current logging daemon.
Once deployed, Syslogd2 has the facilities and capacity to collect data from text log-files or remote IP devices and to act as either a relay or as a first-receipt
host to inject these additional sources into the overall syslog-management infrastructure. What makes Syslogd2 so attractive for network-management purposes ?
Using Syslogd2's SoftComment command-line option, a single file can be maintained and deployed generically to support both
Syslogd2 and a legacy syslog-collection package (rsyslog) in a corporate host-management environment. Using the optional parameters to the
--IncludeConfig command-line option and some advance planning and foresight, active corporate syslog management becomes very doable.
Deployment in a network-management mode needs a different approach since syslog-data collection from network-devices requires external host support.
Syslogd2 can be deployed on Linux hosts in a distributed manner to act as these syslog collectors.
Syslogd2's filters and other features provide superior syslog-collection performance, but just deploying the Linux hosts allows for distribution of other
network services.
The distributed 'network-service' hosts would have sufficient left-over resources (cpu-cycles, memory, disk space) to act as distributed DNS secondary servers
and/or NIS secondary servers in the event of link failure, allowing users to continue working even if temporarily cut off from the primary network services.
An important feature of Syslogd2 is full support for facilities that have higher numeric values than '23'
(local7) as well as some 'reserved' facilities that have no assigned name in Linux.
These 'extra' facilities are currently undefined by either the operating system or the international community which is good for
network management purposes as it keeps them from becoming 'polluted' by multiple applications' direct use, leaving them 'clean' for use by network-management
and logging applications.
(Note: Because some non-Linux operating systems assign values to these facilities and allow their use, facility and priority values for such traffic can be
preserved when processed by Syslogd2.)
By itself, this does not sound like much, (after all, what value could there possibly be in syslog facilities that no-one can ever use ?), but the ability
to access these facilities in combination with Syslogd2 filters and some of the input-processing options is a very powerful tool for point-of-receipt
selection and first-stage filtering of network-oriented syslog traffic.
Spoilers:
By default, Syslogd2 provides 32 additional facilities, but at compile-time this quantity can be changed to a number between 0 [zero] and 1000 inclusive.
These additional facilities are referred to as 'extra' facilities in Syslogd2 (extra0 - extra31).
As an additional 'feature', Syslogd2 makes accessible the 4 reserved facilities at numeric values 12-15.
Some operating systems make use of these values under various names even though there appears to have been no international consensus to formally define them.
Since Linux does not define these syslog facilities, Syslogd2 refers to them as reserved0 - reserved3.
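To make the numbering concrete, here is a small Python sketch of how a syslog priority value splits into facility and severity once the reserved and extra ranges are named. The formula PRI = facility * 8 + severity and the names for facilities 0-11 and 16-23 are the standard ones; the reserved0-reserved3 names for 12-15 come from the text. That extra0 sits at numeric value 24 is my assumption (the text only says the extra facilities lie above 23), and `facility_name` and `split_pri` are illustrative names, not Syslogd2 functions.

```python
from typing import Tuple

# Standard facility names for codes 0-11 (as in <sys/syslog.h> on Linux).
STANDARD = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
            6: "lpr", 7: "news", 8: "uucp", 9: "cron", 10: "authpriv", 11: "ftp"}

EXTRA_COUNT = 32  # default quantity per the text (compile-time setting)
EXTRA_BASE = 24   # ASSUMPTION: extra0 directly follows local7 (23)


def facility_name(facility: int) -> str:
    """Map a numeric facility to the name used in this page."""
    if facility in STANDARD:
        return STANDARD[facility]
    if 12 <= facility <= 15:
        return f"reserved{facility - 12}"
    if 16 <= facility <= 23:
        return f"local{facility - 16}"
    if EXTRA_BASE <= facility < EXTRA_BASE + EXTRA_COUNT:
        return f"extra{facility - EXTRA_BASE}"
    raise ValueError(f"facility {facility} is out of range")


def split_pri(pri: int) -> Tuple[str, int]:
    """Split a PRI value into (facility name, severity)."""
    facility, severity = divmod(pri, 8)
    return facility_name(facility), severity
```

Under that assumption, a message arriving with PRI 221 would decode as facility extra3 with severity 5, while the familiar PRI 134 still decodes as local0 with severity 6.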
When collecting external (non-syslog-generated) log data for down-stream processes, it is important for upstream collectors to remember that off-host processes may not
be able to distinguish between syslog events collected from two instances of the same non-syslog-reporting application.
This may mean that the hostname of a message can no longer be assumed if there is more than one instance of a similar reporting source on the network.
Syslogd2 provides various means of determining the contents of the hostname fields for locally-generated messages.
This is in response to my experience that sometimes a host's name either does not match the company standard naming scheme or does not adequately describe the source of the syslog events.
For situations like these, it is more important for downstream applications to know the name of the host being monitored than it is to know the name of the host doing
the monitoring (logging), yet whether using Syslogd2 or legacy syslog-processors, downstream processes will receive syslog-events with a hostname field indicating
the 'canonical hostname' rather than the (desired) hostname of the device being monitored.
Syslogd2 supports an input-specification 'hostname' option that allows each input-specification to specify a hostname to use when processing data from that input.
By itself, this 'hostname' parameter acts (through inheritance) to over-ride the global 'hostname' and 'domain' variables for entries that do not supply a hostname field.
(This will likely be primarily text-files that Syslogd2 has been told to use for input.)
When combined with the input-specification option 'ignore hostname', this 'default value' becomes an over-ride value to replace any pre-existing hostname.
Taking this a step further, there is nothing in the specifications that states that the host-name field of a message MUST be ONLY the 'canonical name' of the host,
so some system admins may wish to treat their servers (for logging and reporting purposes) as 'subdomains' creating hostnames of the form 'app.host.domain'.
For this purpose (among others), Syslogd2 supports the two variables '$h' and '$d' that may be used in the 'location' component of input-specifications or
in the option-list with the 'hostname' option. '$h' is replaced during startup with the value of the global variable 'hostname' and '$d'
is replaced with the contents of 'domain'.
~ --tailfile = /directory/web_log_file, hostname=myweb.$h.$d, ...
~ --tailfile = /directory/filename, hostname = callcenter1.$d, ignore = hostname ...
The second example above results in downstream processes seeing 'callcenter1.domainname' as the hostname field.
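The substitution and over-ride rules above can be sketched in a few lines of Python. The function names are hypothetical and the real implementation is in C. The sketch assumes plain textual replacement of '$h' and '$d', and it reads 'ignore = hostname' as "always use the configured name", which is how I understand the description.

```python
from typing import Optional


def expand_hostname(template: str, hostname: str, domain: str) -> str:
    """Replace $h and $d with the global 'hostname' and 'domain' values."""
    return template.replace("$h", hostname).replace("$d", domain)


def effective_hostname(msg_host: Optional[str], configured: str,
                       ignore_hostname: bool) -> str:
    """Pick the hostname field that downstream processes will see.

    Without 'ignore hostname' the configured name is only a default for
    entries that carry no hostname; with it, the configured name always wins.
    """
    if msg_host and not ignore_hostname:
        return msg_host
    return configured


# Example values for the two global variables (made up for illustration).
configured = expand_hostname("callcenter1.$d", "server1", "example.com")
print(effective_hostname("pbx7", configured, ignore_hostname=True))
```

With the made-up globals hostname 'server1' and domain 'example.com', the first example line would yield 'myweb.server1.example.com'.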
Input processing has undergone significant changes in Syslogd2.
Perhaps the biggest conceptual change to input processing is that Syslogd2 is designed around individually-configured connections (individual [IP or Linux]
sockets, individual files, etc) where individualized configuration parameters are available to modify inherited global values.
Typical syslog processors are designed around global configuration values with no options for customizing individual connections.
This section is probably best understood in terms of text-file input where each file may have a different format (and therefore require different input options).
However, almost all the discussion in this section also applies (and is available to) all input types (IP-sockets, Linux-sockets, text-files, pipes, local-kernel).
Before going further, let me 'set the stage'.
As mentioned elsewhere, all (new) Syslogd2 configuration (including input-specifications) is done in the form of command-line options.
~ --keyword [=] input-source [, comma-separated-option-list ] [## optional in-line comment]
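For readers who think in code, the general form above can be taken apart as in the following Python sketch. It is not Syslogd2's parser: `parse_input_spec` is a hypothetical name, and the sketch assumes that neither the input-source nor any option value contains a comma or the '##' comment marker. The keyword is lower-cased, on the assumption that all keywords are non-case-sensitive.

```python
import re
from typing import Dict, Tuple


def parse_input_spec(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split '--keyword [=] source [, options] [## comment]' into its parts."""
    body = line.split("##", 1)[0].strip()          # drop the in-line comment
    match = re.match(r"--(\w+)\s*=?\s*(.*)$", body)
    if match is None:
        raise ValueError(f"not an option line: {line!r}")
    keyword = match.group(1).lower()               # assumed non-case-sensitive
    parts = [part.strip() for part in match.group(2).split(",")]
    source = parts[0]
    options: Dict[str, str] = {}
    for item in parts[1:]:
        if not item:
            continue                               # tolerate a trailing comma
        name, _, value = item.partition("=")
        options[name.strip()] = value.strip()      # bare options get an empty value
    return keyword, source, options


print(parse_input_spec(
    "--tailfile = /directory/filename, hostname = callcenter1.$d, ignore = hostname ## call-center log"))
```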
Input keywords are:
Every message entering Syslogd2 undergoes 3 stages of processing. The term 'input processing' includes the first two phases of processing.
If Syslogd2 attempted to parse a time-value but failed to do so, the message will be processed using the system-clock value from time-of-receipt and both time-string and host-string will be assumed to be missing.
After the input-string has been fully parsed and resolved, the input record may instruct Syslogd2 to run the message fields through an input-filter.
Note that there are two ways in which an input-string's facility and priority value may be changed during input-processing (input-specification options or input-filters). There is a very subtle but significant difference between the two. In both cases, the only input-strings that could possibly be affected are those that enter Syslogd2 via the specified input specification. For both methods, the facility.priority specification '*.*' really means 'all data entering via a specified input-specification' because of the limited scope of input-specification itself. (Consider the difference between traffic coming in on the traditional address of '*.*' versus two data streams from 'localhost' and 'hostname.domain' [the external interface].)
Example: When '*.*' is sent to a filter (from perhaps a Linux-based VPN host), anything not re-directed to another facility will get filed to the default locations.
This includes authentication records, or traffic using the 'daemon', 'user', 'kernel' or other facilities.
When the input options are used to send all traffic to 'extra3' facility (and priority is not set meaning keep the current priority values), filters can still be
used for more granular control, but all traffic not re-directed by the filter will STILL have been moved to 'extra3' facility where it can be logged,
detected and reviewed.
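The difference between the two methods is easier to see side by side. The Python sketch below models a filter as a list of (substring, target-facility) rules and an input option as an optional forced facility that is applied before the filter runs, which is my reading of the description above. The rule format, the `process` function and the sample messages are all invented for illustration.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (substring to look for, facility to move the message to)


def process(facility: str, text: str, rules: List[Rule],
            force_facility: Optional[str] = None) -> str:
    """Return the facility a message ends up in after input processing."""
    if force_facility is not None:
        facility = force_facility   # input-specification option is applied first
    for needle, target in rules:
        if needle in text:
            return target           # the input filter re-directs a matching message
    return facility                 # unmatched traffic keeps whatever it has now


rules = [("tunnel", "extra5")]

# Filter only: unmatched traffic stays in its original facility.
print(process("auth", "login failed for bob", rules))
# Input option plus filter: unmatched traffic has already been moved to extra3.
print(process("auth", "login failed for bob", rules, force_facility="extra3"))
```

In the first call the unmatched authentication record is filed under 'auth' along with everything else on the host. In the second it lands in 'extra3', where it stays separate from local traffic.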
As a consequence of Syslogd2 being network aware, each input specification may list a set of network-states in which Syslogd2
should consider it 'valid' with all other network-states being considered 'invalid'. Whenever the network is in an 'invalid' state, the interface will not initialize.
This is part of the overall 'network awareness' feature intended to make syslog logging viable for Linux-on-laptops. It has the 'side effect' of allowing
Syslogd2 to start up and collect data before the network is started, then to forward the collected data to a remote host after the network is up and running.
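A minimal Python sketch of this validity check follows. The state names Down, Local, Other and the pseudo-state Any are the ones this page uses. The `InputSpec` record, the `usable_inputs` function and the source labels are hypothetical and do not reflect Syslogd2's configuration syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

ANY = "Any"  # pseudo-state: valid in every network state (the default)


@dataclass(frozen=True)
class InputSpec:
    source: str
    valid_states: FrozenSet[str] = frozenset({ANY})


def usable_inputs(specs: Iterable[InputSpec], state: str) -> List[str]:
    """List the sources that may initialize in the given network state."""
    return [spec.source for spec in specs
            if ANY in spec.valid_states or state in spec.valid_states]


specs = [
    InputSpec("/dev/log"),                                        # local socket
    InputSpec("udp:192.0.2.1:514", frozenset({"Local", "Other"})),
    InputSpec("tcp:192.0.2.1:514", frozenset({"Other"})),
]
for state in ("Down", "Local", "Other"):
    print(state, usable_inputs(specs, state))
```

On a laptop that boots without a network, only the local socket would be active at first. The IP inputs would come into service as the state moves upward.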
Syslogd2 allows administrators to disable the default IP address while leaving IP support enabled for other (non-default) IP addresses or ports.
It considers the individual ip-address (along with identity-options 'port' and 'protocol') to be the fundamental basis for
specifying IP or Linux socket connections.
A major reason for this is to support multi-homed hosts, allowing different configuration settings for different interfaces.
As a network-data collection tool, Syslogd2 should be (and is) prepared to support multi-homed hosts (hosts with more than one interface connected to
different network subnets) for independent collection of syslog data from each interface or subnet. One place where this is particularly useful is
in DMZ scenarios where a multi-homed host may have one interface attached to the internet segment just inside the internet-specific outer firewall,
a second interface connected to the extranet switch (to monitor extranet routers) and perhaps a 3rd interface connected to a web-server segment to
collect data from web-servers.
In order to provide expected levels of support in the above situation, the use of the address wildcard '*' which is the standard for host-logging
systems is clearly not suitable. Syslogd2 allows the administrator to disable the default internet interface of ['*',UDP/514] and to then
break down the '*' wild-card address into individual, address-specific connections - each with different option-lists (including different
filter settings).
This also means that Syslogd2 can support the use of a secondary address as an 'overlay' for different (distinct) sets of data-sources. For example:
a given host may be collecting host data from a subset of Linux servers on its default address as part of the open-source Puppet
host-management system. A second IP address is added to the host.
The second address is to be used to collect data from network devices. Network-management data is kept separated from the host-management data
because the secondary address is configured to shift all incoming data to 'extra4' immediately upon receipt. Filters are then applied to further
break the network data down for logging into a separate set of network log files and forwarding to a different central network host using other 'extra' facilities.
This second address may or may not be on the same physical interface as the first (so the host may or may not technically be 'multi-homed'), but it
will still function as if the two interfaces were separate hosts. It will also likely have a unique name in the DNS server for the convenience of humans.
The concept of Network Awareness is that Syslogd2 frequently checks the 'state' of the network and adjusts its behaviour
and working configuration based on a combination of the current network state, the desired network state and instructions in the configuration file.
This concept is somewhat experimental. The experimental parts are to see if I can technically make this work and do so in a semi-efficient manner and
then to explore the resulting characteristic performance of my host to see if and what additional changes may need to be made to other OS components.
The full concept (that shows one of the possible future directions of Syslogd2) is that at any given time, the network is in one of several possible 'states':
Per-state configuration in the configuration file is currently only partially implemented, but will eventually allow any parameter to be configured differently depending on what
network-state the host is determined to be in. Whenever the network state changes, some amount of (possibly total) reconfiguration needs to occur,
then the new network-state needs to be set as the 'current' state until the next network state-change occurs.
This concept (as stated above) is partially implemented now. Syslogd2 recognizes the Down, Local and Other states as well as the pseudo-state
of Any which is used for configuration and also set as the default network state for configuration. So far, the only configuration components that network-state
changes affect are input and output connections.
Other network-state names may be given, but the current implementation will not recognize them and will therefore not enter any user-defined state.
This concept and partial implementation when combined with the spooling feature creates the Network-State Spooling feature.
When combined with startup testing, it is the core of the delayed resolution feature
where IP addresses do not even attempt to resolve until the network has been detected to be up and the interfaces (and DNS) are known to be active and stable.
Network awareness is also a key component of both input processing and output processing because it drives
whether or not the current environment needs to be rebuilt (and attempts made to resolve previously unresolved IP references) in light of the newly-determined
network-state.
Output processing in Syslogd2 has undergone even more changes than the input-processing, so just the highlights will be covered here.
Please visit the link to more information for a more thorough coverage of the changes and more implementation details on these concepts.
In general, Syslogd2 output processing is designed around some common-sense observations about data-communications and the current configuration-file
format and the (now obsolete) assumptions that went into that specification:
Syslogd2 uses 4 named interrupts to allow admins to 'trigger' the immediate execution of certain background operations. The problem is that
Syslogd2 has about 8-10 functions (and growing) that (in different scenarios), admins may want to trigger.
Syslogd2 resolves this conflict by making the four (4) interrupts configurable via global variables so they can be determined
at run-time. (They can also be inspected or changed at run-time via the Command-Tool).
The four interrupts and their default settings are:
Due to its designated role as a syslog-data-collector (part of a larger network-management system), Syslogd2 has the ability to create and forward parameters
to downstream processes other than just the 'raw' syslog text.
These parameters take the form of 'name=value' pairs with each pair surrounded by square brackets and with the bracketed pairs separated by whitespace (usually spaces).
Together these parameters comprise what Syslogd2 refers to as the 'SD-String' (Static Data[-field] String).
This string (though it may be thought of [and treated as] a separate entity or 'field' of the overall message) is (for formal purposes) the start of the
free-form 'msg-string' and comes BEFORE the 'tag' field (which consists of "application-name + WS + [ '[' + pid + ']' [ + WS ]] + ': ' ").
The purpose of the SDString is to pass fixed-field data through (or from) syslog in a format that downstream applications can parse.
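As an illustration of what a downstream application has to do, the Python sketch below builds an SD-String from a set of parameters and peels one off the front of a message. It follows only the layout described here (bracketed 'name=value' pairs separated by whitespace, placed before the tag). The function names and the sample fields are invented, and the sketch assumes that values contain no ']' character, since no escaping rule is described on this page.

```python
import re
from typing import Dict, Tuple

# One leading "[name=value]" pair plus any whitespace that follows it.
_PAIR = re.compile(r"\[([^\]=\s]+)=([^\]]*)\]\s*")


def build_sd_string(params: Dict[str, str]) -> str:
    """Render parameters as space-separated [name=value] pairs."""
    return " ".join(f"[{name}={value}]" for name, value in params.items())


def split_sd_string(msg: str) -> Tuple[Dict[str, str], str]:
    """Peel leading [name=value] pairs off a message.

    Returns the extracted pairs and the remainder (tag plus free-form text).
    """
    params: Dict[str, str] = {}
    pos = 0
    while True:
        match = _PAIR.match(msg, pos)
        if match is None:
            break
        params[match.group(1)] = match.group(2)
        pos = match.end()
    return params, msg[pos:]


message = build_sd_string({"site": "hq", "rack": "12"}) + " sshd[411]: session opened"
print(split_sd_string(message))
```

Because matching is anchored at the front of the message and stops at the first thing that is not a bracketed pair, the '[411]' inside the tag is left alone.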
As a working example of this idea, Syslogd2's sister project (DBD2) can receive data from Syslogd2 containing SD-fields, extract the fields and
insert the corresponding data into MySQL database table(s) in virtually any desired format.
The combination of Syslogd2's filters for event selection and traffic management, CAP_JOURNALTHREADS' in-depth information from systemd's journal, and DBD2's
ability to convert SD-Fields to MySQL database entries makes for a very potent start of a syslog-based network-management system (or adjunct).
The Static Data Parameters concept complements (or is complemented by) Syslogd2 Filters, Variable Buffer Lengths,
and CAP_JOURNALTHREADS features.
Static Data Parameters also define a method for user applications to pass named data-parameters through syslog to downstream management databases.
The Syslogd2 command-tool is an ncurses-based text-only interactive interface into an executing Syslogd2 daemon process.
It provides a complete configuration display and a limited, real-time interactive-configuration
capability based on the currently-executing Syslogd2 background-process.
At first, the idea of the command tool was simply to extend the functionality of the 'testconfig' configuration display from the 'pre-execution' configuration
to the 'currently executing' environment.
This was primarily for my own debugging purposes so I could confirm that both CheckSources and CheckDestinations were working properly as they built and re-built
connections, properly accounting for conflict situations and setting all run-time flags and values as they should and so I could confirm that all data
structures used at run-time were built and built correctly.
I immediately realized that my run-time configuration reporting connection could also function as a bi-directional link into the executing daemon, allowing
me to obtain not only run-time configuration values (like open-file-descriptors and statistics), but also to send commands to implement simple configuration
modifications.
The current version of the command-tool is still in a very embryonic stage of development, mostly because I chose to concentrate my development
time on the main code of Syslogd2 and associated projects such as DBD2.
I did take some security and operational precautions in the design of the command-tool:
As of now (March 2023), command-tool supports:
It is unreasonable to expect administrators to be able to read a complex network-management configuration file and to be able to troubleshoot problems
just from the image in their heads. It is equally unreasonable to attempt to develop a complex application with no way to check the correctness of the
code that builds the data-structures that support that application. For both of these reasons, Syslogd2 supports the --TestConfig command-line
option.
--TestConfig runs through the normal parsing and build startup phases, but just before it starts initializing run-time resources (such as
memory-buffers, open files, sockets or pipes, semaphores, mutexes or threads), the --testconfig option causes this (possibly 2nd) instance of Syslogd2
to display the configuration it has just parsed and built on the terminal and then exit.
Because this option causes Syslogd2 to exit before executing any run-time code, it can only be issued from the actual command-line and is not
supported if moved into the configuration file itself (Well - Duh !!). As a consequence of using the same configuration file as the (possibly) executing code,
--TestConfig automatically 'adjusts' the following files that are used during parsing and startup:
The configuration display is sent to the terminal screen by default. Parameters to the --testconfig command allow this output to be redirected to the
defined logfile (--stdout) or to the error file (--stderr).
Other options to --testconfig specify which display elements to show (the default is to show all elements) and which network-state to simulate.
(Note if the network is actually in 'Down' or 'Local' state, and 'Other' is specified as the state to simulate, --Testconfig will assume that all
IP addresses will be resolved during the build process.)
The configuration display is quite detailed and shows the expected status of every setting and every configuration-file entry that was successfully
parsed and accepted by Syslogd2. This display is quite useful for debugging configurations, experimenting with different options (especially CAP_WORKERTHREADS
and CAP_OUTPUTTHREADS) or as a working reference for network-management personnel.
The --testconfig command-line option is limited, however, to ONLY the before-execution environment. Obtaining a configuration display of a currently-executing
Syslogd2 instance is also possible, but the command-tool must be used instead of --testconfig.
Syslogd2 filters can be used to select messages based on string comparisons against the message content or hostname and/or comparisons against the facility or priority values.
Once selected, a message may be dropped from further processing or modified (string modifications to content and/or hostname, and/or modifications to facility and/or priority).
Filters provide a key ability to sort and filter data before it ever leaves the point of initial receipt.
Syslogd2 divides its filter capability into two components named after which type of configuration line specifies them. Filters specified
by input-specifications are called 'input-filters' while those specified on output-lines are called 'output-filters'. The operational difference
between the two types is when they are executed:
Input filters apply to all traffic that enters Syslogd2 via the input-specification on which they were declared. Different filters may apply to each different input source.
Input filters are especially useful for sorting incoming messages by content since they can change the facility and priority of the message before
output-selector-strings are searched for matching entries. Input filters are also especially useful when monitoring external log files with the tailfile
option. Since there is no need for Syslogd2 to create a duplicate log file for forensic-analysis purposes, any events that are not to be forwarded to a
remote host may be discarded at an 'early' point in processing.
Output filters apply to all traffic that exits Syslogd2 via the output-line on which they are declared.
Since output-filters are affiliated with the selector-string component of output-lines, different filters may apply to different output-lines
even if those lines specify the same destination.
Output filters are especially useful for discarding all unnecessary events that would otherwise be transmitted
over the network, congesting network-links and central syslog applications. For example: instead of transmitting 'local0.*' (the syslog configuration of a
switch, router or firewall network device), Syslogd2 can apply an output filter and transmit only those events containing the phrases
"user", "logon", or "username", discarding all other traffic instead of transmitting it.
Output filters can also be an alternative or supplement to input-filters when processing external log files since different output-lines may use different
filters (selecting different events) from the same (external, non-syslog) input log file (whether the file is 'pre-filtered' by an input filter or not).
Since output-filters are affiliated with selector-string elements, an output-line that selects the element 'extra3' (which may contain messages re-directed by an input-filter) may elect
to not filter output traffic or may elect to further filter the traffic by discarding selected events.
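The keep-only output filter from the example above can be sketched in C as follows. This is a minimal illustration and not Syslogd2's actual filter code: the function name output_filter_keep and the case-sensitive substring match are assumptions made for the sketch; only the three phrases come from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Phrases taken from the example in the text. The array name is hypothetical. */
static const char *const keep_phrases[] = { "user", "logon", "username" };

/* Return true when the message contains at least one keep-phrase and
 * should be transmitted; return false when it should be discarded.
 * Assumption: matching is a plain, case-sensitive substring search. */
bool output_filter_keep(const char *msg)
{
    size_t count = sizeof keep_phrases / sizeof keep_phrases[0];

    for (size_t i = 0; i < count; i++) {
        if (strstr(msg, keep_phrases[i]) != NULL)
            return true;
    }
    return false;
}
```

An event from the network device that mentions none of the phrases never reaches the network link, which is the traffic reduction the paragraph above describes.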
Syslogd2 addresses the issues of (1) handling high-traffic generators and (2) diminishing returns for each added thread through its 'multi-dimensional' multi-threading structure. This structure can easily allow over 100 active threads to be 'gainfully employed' with little or no blocking of each other.
Traditional syslog administrators often disable the use of DNS and forego host-name resolution for incoming IP traffic. This is usually done for one of three time-tested reasons:
The integrated name-cache in Syslogd2 not only allows name-resolution to occur without any additional strain on either DNS or network links, but at the same time allows 'normalization' of hostnames, even if received from secondary or tertiary addresses to which DNS may have assigned different names.
The use of a single, centrally-maintained cache-pre-load file for all syslog hosts (that does not contain localized host information) allows name-resolution for incoming data in all the above situations. When a pre-load file is used, DNS is unnecessary unless DNS is to be used to 'fill-in' any names not found in the cache-file (a process known as 'resolving cache-misses'). When a cache-file is used instead of DNS, there is no additional load placed on the network or the DNS servers as a result of syslog name resolution. When a cache-file is NOT used, the load is still greatly reduced since only one DNS request per host will be made (results from the first DNS query are stored in the internal cache), so the load placed on the DNS server or network does not grow with the volume of traffic received by the Syslogd2 daemon.
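The cache behaviour described above (pre-loaded entries, hostname normalization across several addresses, and at most one resolver call per unknown host) can be sketched as below. This is a simplified illustration under stated assumptions, not the Syslogd2 implementation: the struct layout, the function names, the fixed table size and the demo_resolver stand-in for DNS are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 64
#define ADDR_LEN    46   /* large enough for a textual IPv6 address */
#define NAME_LEN    64

/* Hypothetical cache: a small linear table of address-to-name pairs. */
struct name_cache {
    char          addr[CACHE_SLOTS][ADDR_LEN];
    char          name[CACHE_SLOTS][NAME_LEN];
    size_t        used;
    unsigned long resolver_calls;   /* how often the resolver was invoked */
};

/* Stand-in for a DNS reverse lookup; returns 0 on success. */
typedef int (*resolver_fn)(const char *addr, char *name, size_t len);

void cache_init(struct name_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Store one pair (as a pre-load file entry would); -1 if the table is full. */
int cache_add(struct name_cache *c, const char *addr, const char *name)
{
    if (c->used >= CACHE_SLOTS)
        return -1;
    snprintf(c->addr[c->used], ADDR_LEN, "%s", addr);
    snprintf(c->name[c->used], NAME_LEN, "%s", name);
    c->used++;
    return 0;
}

/* Return the cached name for addr. On a cache-miss, ask the resolver once
 * and remember the answer; if that fails, fall back to the address text
 * (failed lookups are not cached in this sketch). */
const char *cache_lookup(struct name_cache *c, const char *addr,
                         resolver_fn resolve)
{
    char resolved[NAME_LEN];

    for (size_t i = 0; i < c->used; i++) {
        if (strcmp(c->addr[i], addr) == 0)
            return c->name[i];
    }
    if (resolve == NULL)
        return addr;
    c->resolver_calls++;
    if (resolve(addr, resolved, sizeof resolved) != 0)
        return addr;
    if (cache_add(c, addr, resolved) != 0)
        return addr;
    return c->name[c->used - 1];
}

/* Demonstration resolver; a real one might wrap getnameinfo(). */
int demo_resolver(const char *addr, char *name, size_t len)
{
    (void)addr;
    snprintf(name, len, "resolved-host");
    return 0;
}
```

Pre-loading two addresses with the same name is what produces the 'normalization' effect: traffic arriving from either address is reported under one hostname, and no resolver call is made for it.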
Syslogd2 supports named-pipes as an input-source. The main reason for doing so is to better support its ApplicationMode
functionality.
Named-pipes are virtually the only way to obtain data from legacy syslog processors that is not blocked by the legacy processor's assumption that enabling IP-service
also entails opening an IP listening socket on all available addresses.
Because the legacy application has reserved all addresses at UDP port 514 to itself and can only transmit to port UDP/514,
Syslogd2 (running on the same host) is unable to listen for IP traffic on the only port the legacy system can transmit to (UDP/514).
Because of this Syslogd2 cannot reasonably expect to receive IP traffic from legacy syslog servers while running in ApplicationMode.
Since legacy syslog processors do not traditionally have an option to write to Unix or Linux sockets, that leaves only their named-pipe output method.
Testing of rsyslog in Linux shows rsyslog to be an exception to the above expectations in that rsyslog can write to non-standard UDP ports using
the non-standard 'hostname:port' syntax. This is especially good because rsyslog does not include the facpri field when it writes to named-pipes,
making rsyslog support for named-pipes much less useful for Syslogd2.
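To illustrate why the missing facpri field matters, the sketch below decodes it, assuming 'facpri' refers to the standard leading "<N>" field of a syslog message, where N = facility * 8 + severity (as defined in RFC 3164 and RFC 5424). The function name parse_facpri is hypothetical and this is not Syslogd2's parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Decode a leading "<N>" field. On success, store the facility and
 * severity and return the number of characters consumed. Return 0 when
 * the line does not start with a valid field (for example, a line
 * written to a named-pipe without one). */
size_t parse_facpri(const char *line, int *facility, int *severity)
{
    size_t i = 1;
    int value = 0;

    if (line[0] != '<')
        return 0;
    /* Accept at most three digits: the largest valid value is 191. */
    while (i <= 3 && isdigit((unsigned char)line[i])) {
        value = value * 10 + (line[i] - '0');
        i++;
    }
    if (i == 1 || line[i] != '>' || value > 191)
        return 0;
    *facility = value / 8;   /* e.g. 16 = local0 */
    *severity = value % 8;   /* e.g. 6 = informational */
    return i + 1;
}
```

When the function returns 0, the receiver has no way to recover the original facility and priority from the line itself, which is why input that lacks the field is much less useful for selector matching.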
Syslogd2 supports two (2) types of spooling which may be used in tandem with each other. Either type of spooling will cause a message to be written to the same, shared spool-file designated for a given destination. Spool-files and their associated locations are recorded in a reserved file in the Syslogd2 spool-directory so they are persistent across Syslogd2 restarts and system reboots.
The configuration of spooling is split between destination-specific options and selector-string options.
Localized parameters for MaxSpoolFileSize, SpoolFileName and SpoolFileAction that are common to all selector-string elements
configured for the same location are location-options.
Localized parameters for SpoolFileAge, and whether or not to use connection-spooling (keyword: 'sf') or network-state-spooling
(keyword: 'spool') are selector-string options that are set based on the data represented by each selector-string-component.
If configured, spooling is started automatically if conditions are met (wrong network state or broken connection).
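The start condition can be pictured as a small decision function. The sketch below is only an illustration of the rule stated above; the struct and field names are hypothetical, and how Syslogd2 actually tracks connection and network state is not described here.

```c
#include <stdbool.h>

/* Per-selector-string spooling choices (names are hypothetical). */
struct spool_options {
    bool connection_spooling;      /* keyword 'sf' was given */
    bool network_state_spooling;   /* keyword 'spool' was given */
};

/* Current conditions for one destination (names are hypothetical). */
struct link_status {
    bool connection_up;      /* the connection to the destination is intact */
    bool network_state_ok;   /* the network is in a state that permits sending */
};

/* Return true when a message should go to the destination's spool-file
 * instead of being transmitted. */
bool should_spool(const struct spool_options *opt,
                  const struct link_status *st)
{
    if (opt->connection_spooling && !st->connection_up)
        return true;    /* broken connection */
    if (opt->network_state_spooling && !st->network_state_ok)
        return true;    /* wrong network state */
    return false;
}
```

Because both types write to the same spool-file for a destination, the function only needs to answer whether to spool, not where.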
The support of TCP may not seem like a 'new' feature to those that have worked with rsyslog, but on Unix, BSD and OSX operating systems
(rsyslog is Linux-only) it will be new as Syslogd2 eventually gets ported to run on different platforms.
Syslogd2 considers TCP support to be the IP-component of the more inclusive term 'streaming protocol' which also includes
(and takes its name from) the Linux socket streaming protocol.
(Streaming protocol is the complement (and alternative) to the default 'datagram' protocol, the IP-component of which is 'UDP'.)
Another way to consider the distinction is that streaming protocols are connection-oriented protocols, while datagram protocols
are connection-less protocols.
Streaming connections are required as pre-requisites for several other features of Syslogd2.
Among them are spooling and command-tool.
Once declared at compile-time, stream protocol will be dormant unless configured. To configure a socket connection to use streaming rather than datagram
protocol, use the 'stream' option (or one of its alias keywords: 'TCP', 's' or 'T') in the connection-specification's option list.
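As a rough sketch of how such an option list could select the socket type, the function below scans a list of option keywords and returns SOCK_STREAM when one of the four keywords named above is present, and the datagram default otherwise. The function name, the array-of-strings representation of the option list, and the exact-case comparison are assumptions for illustration; the actual option syntax is defined by Syslogd2's configuration parser.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Return SOCK_STREAM if any option is one of the stream keywords,
 * otherwise the default SOCK_DGRAM.
 * Assumption: keywords are compared exactly as spelled in the text. */
int socket_type_for_options(const char *const *options, size_t count)
{
    static const char *const stream_keywords[] = { "stream", "TCP", "s", "T" };
    size_t nkeys = sizeof stream_keywords / sizeof stream_keywords[0];

    for (size_t i = 0; i < count; i++) {
        for (size_t k = 0; k < nkeys; k++) {
            if (strcmp(options[i], stream_keywords[k]) == 0)
                return SOCK_STREAM;
        }
    }
    return SOCK_DGRAM;
}
```

The returned value could be passed as the type argument of socket(), with AF_INET or AF_INET6 giving TCP or UDP and AF_UNIX giving the corresponding local socket types.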
Text-file input is a good example of the difference between a 'host-logging' daemon and a 'syslog-data-collection' daemon. A daemon whose functional boundaries are drawn
primarily by the need to log locally-generated syslog traffic to disk has little (to no) incentive to gather data from non-syslog sources (such as text-files).
However, ignoring non-syslog-formatted input (or non-socket input in general) makes no sense from a standpoint of participation in any type of network- or host-management environment.
Network-management systems need to be able to support those (relatively rare) occasions when a home-grown (or even open-source/commercial) application is running on a local platform, producing a log file that is not sent to syslog.
Host-management systems frequently encounter such log files (web-server logs, MySQL/mariaDB log files, home-grown monitor scripts of all types, etc).
Syslogd2's --tailfile command-line option allows specification of a text-file and a
list of options that control how that file is parsed and how Syslogd2 processes input from it.
One of the less obvious differences in processing a text-file as syslog input is whether or not the syslog-data-collector should log all received content to disk.
Syslogd2 leaves this as an option for the user: Choices are log everything to local disk, log nothing to local disk or apply an output-filter and log selected
events to local disk (remembering that we are reading from a file that already exists on disk).
Text-file input in Syslogd2 is supported more for the ability to capture data to process, integrate and normalize for the benefit of downstream processes than for any need
to actually log a copy of existing data to local disk. The option to apply an output filter to log partial data makes some sense if the desire is to perhaps track a
specific intermittent error or extract data for a specific purpose.
The length of various internal buffers can be modified at run-time if the CAP_VARBUFFERS compiler-symbol was selected at compile-time.
The idea behind CAP_VARBUFFERS is to allow the user to shorten buffer-lengths to save memory or to expand buffers to allow processing of extra-long strings.
This capability was shown to be crucial during the development of the CAP_JOURNALTHREADS feature.
When CAP_VARBUFFERS is declared, the following new global parameters become available (all of which take a non-negative integer as a parameter):