What is Syslogd2 ?
Syslogd2 is a replacement for rsyslog (or another Linux syslog-processing service) that is designed
to be a 'toolkit' for network-management / host-management applications.
In designing/writing Syslogd2, I have avoided as many Linux-specific features as possible to maximize
the ability to port Syslogd2 to other platforms. Where a feature MUST use Linux-specific function
calls, that feature is either marked as optional or the function calls are bracketed by compiler
directives that will substitute alternative code when compiled on the ported OS.
By 'toolkit', I mean that Syslogd2 is a base server with many optional components that
(when paired with DBD2) provides the data-collection and
database-insertion functions that can provide a working foundation for a syslog-based full-fledged
network-/host-management reporting and management system.
The 'base' version of Syslogd2 will function as a fully-capable syslog processor if all that is
needed is basic data-collection from standard syslog inputs and logging data to disk.
To fully realize the capabilities of Syslogd2, additional features must be compiled into the binary.
There are (as of this writing) over 20 features that can be selected into the
code at compile-time.
In support of a network-management / host-management system, Syslogd2 provides the data-collection,
filter-based data-volume-reduction, data-transmission and data-extraction functions to produce a
'parameter-list' of name=value pairs that DBD2 then uses as input to create and populate
individual data-fields in one or more user-defined MySQL databases or database-tables.
To get the most out of Syslogd2, it is necessary to revise many of the assumptions and
thought-processes that surround syslog processing today. This website attempts to document all of Syslogd2's features and options.
This page and the New Concepts page are overviews
or summaries of the various ways that Syslogd2 requires users to change the way they think about syslog
processing.
A partial list of Syslogd2's most distinguishing features includes:
- Syslogd2 includes an internal name-cache that can be pre-populated by the user.
The use of this pre-populated name-cache can completely remove the need for Syslogd2 to
make DNS queries to resolve hostnames.
The ability to resolve names without DNS is particularly important for companies that do not
put their network equipment into their DNS servers or in network DMZ locations.
- If the name-cache file is enabled, but incomplete, Syslogd2 will utilize DNS
(if enabled) to 'fill-in' missing entries but will limit the DNS calls to 1-call-per-host
with the name-cache being used thereafter.
- Syslogd2 provides extensive and extremely granular filter capabilities that are
significantly involved in the implementation of the following abilities:
- Re-assignment of facility and priority values based on message content.
This allows sorting of syslog events (based on string matching) or facility/priority
values into a set of 'extra' facilities provided by Syslogd2 and also supported
by DBD2.
There may be up to 1000 of these 'extra' facilities, each of which represents a distinct
'grouping' of events such as user activity, disk-errors, usb-insertions, etc.
- Extraction of arbitrary strings from the free-form msg-string field into the parameter-list
(sdstring-field) that is ultimately passed to DBD2 or other applications.
Extraction of usernames or userids by Syslogd2 allows DBD2 to populate the
usernames into distinct database fields to be used as search fields by various reports.
- Most of Syslogd2's processing is driven by a list of (default or specified)
'input-options' assigned to each input-source.
Each option assigned to an input-source (that does not define the source itself) has a
corresponding global parameter that will be used if the individualized setting is not provided.
- In Syslogd2, an 'input-source' is defined as a single socket, a single text-file,
a kernel-input specification, a journal-input-specification, or a
named-pipe-input-specification.
A 'socket' in Syslogd2 can specify a UDP, TCP, Linux Datagram or Linux Streaming
socket.
- The focus on individual input sources allows each unique socket, file or other source
to use different input filters, different dns, name-cache and other 'default' settings.
(These 'default settings' may include different default hostnames for each text-file-input,
forced facility/priority values, etc.)
- Syslogd2 can reliably collect information from high-speed sources such as firewalls
(The Syslogd2 prototype code was designed to handle PIX debug-level logging on an OC-3
internet feed).
- All of Syslogd2's 'output-processing' is controlled
by a list of output-options.
- Syslogd2 auto-detects and parses time-formats, so can properly parse and interpret most log
file time-formats and can interpret most standard and non-standard time-representations.
Because Syslogd2 reads and uses the Linux timezone files, it can reliably convert time-zones
(when the msg-time offset is present and differs from the local host offset) and it can convert
any incoming time value format it can read into an equivalent output time-format.
If Syslogd2 cannot parse the time value, it uses the original input string instead.
- If the local Linux system is configured to use systemd, Syslogd2 can extract maximum
detailed information from systemd's journal subsystem.
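The time-handling behaviour described in the list above (try to recognise the format, convert it, and fall back to the original string when parsing fails) can be illustrated with a minimal Python sketch. The candidate formats, the `assumed_year` parameter and the choice of UTC as the output zone are assumptions made for this illustration only; Syslogd2's actual detection logic and its use of the Linux timezone files are not shown here.

```python
from datetime import datetime, timezone

def normalize_time(raw, assumed_year=2024, out_format="%Y-%m-%d %H:%M:%S%z"):
    """Re-render a timestamp in out_format, or return it unchanged if unparsable."""
    # Hypothetical candidate list: (text to parse, format to try).
    candidates = [
        (raw, "%Y-%m-%dT%H:%M:%S%z"),                     # ISO 8601 with numeric offset
        (raw, "%Y-%m-%d %H:%M:%S"),                       # common log-file format, no offset
        (f"{assumed_year} {raw}", "%Y %b %d %H:%M:%S"),   # traditional syslog, year supplied
    ]
    for text, fmt in candidates:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue                      # this format did not match; try the next one
        if parsed.tzinfo is not None:
            # An offset was present in the message, so a zone conversion is possible.
            parsed = parsed.astimezone(timezone.utc)
        return parsed.strftime(out_format)
    return raw                            # nothing matched: keep the original input string
```

For example, `normalize_time("2024-03-01T12:00:00+02:00")` is converted to UTC, while an unrecognised string such as `"not a time"` is passed through untouched.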
Syslogd2 is a replacement for whatever syslog daemon is currently running on a Linux host.
- Syslogd2 re-interprets the syslog protocols (both traditional and Version 1)
Syslogd2 is written to be portable between Linux and other Unix-like platforms.
At one point it was ported to (and running on) Apple's OSX operating system.
That portability has not been kept current due to lack of interest, but can quickly be re-instated
if the interest to do so is shown.
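The relationship between per-input-source options and their global counterparts (described in the feature list above) can be pictured as a simple layered lookup. The following Python sketch uses hypothetical option names; it only illustrates the fallback idea and is not Syslogd2's configuration syntax.

```python
from collections import ChainMap

# Hypothetical global parameters, used when an input-source does not override them.
GLOBAL_DEFAULTS = {
    "use_dns": True,
    "use_name_cache": True,
    "default_hostname": "localhost",
}

def effective_options(per_source):
    """Per-source settings shadow the globals; anything unset falls through."""
    return ChainMap(per_source, GLOBAL_DEFAULTS)

# Two input-sources with different individualized settings.
firewall_udp = effective_options({"use_dns": False})
app_logfile = effective_options({"default_hostname": "appserver01"})
```

Here `firewall_udp["use_dns"]` is the individualized value, while `firewall_udp["default_hostname"]` comes from the global defaults.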
Why Syslogd2 ?
Whenever an application fails or a network 'drops service', the first (and truthfully 'only') place administrators and engineers usually turn to for additional information is their
log files.
This is because failures rarely occur without some prior 'warning' appearing in some log file somewhere.
Usually, competing 'hypotheses' (little more than informed 'guesses') as to the cause of the outage(s) are confirmed or denied by log file entries.
Why then do we not use syslog entries for management of hosts and networks ? If we were to do so, we could (in theory) see the preliminary errors and warnings that could be fixed
to 'head off' complete systems failure. We could find minor issues that (if heeded) could improve system performance (both throughput performance and stability) and we could
track and verify compliance with a variety of corporate policies (policies such as 'no network changes outside the declared change window', 'no USB devices that could be used for
corporate espionage', etc).
The answer to why we do not use syslog for day-to-day network management is a simple one: the software tools required to effectively manage the massive amount of syslog
traffic generated by either host-based applications or network appliances on a day-to-day basis have simply never been written.
Syslogd2 was written to address that perceived need.
It was written in the belief that network-management systems should always be customized to the networks they are intended to manage (that there is no 'one-size-fits-all'
solution to either network management or host-management).
Network (and host-) management personnel should not have to duplicate (or spend lots of money on) the same (functional) code for each network-management implementation.
Syslogd2 (and its sister projects) were written to handle the 'mechanics' of data-collection and consolidation, allowing the network management teams to configure
rather than write the 'basic' (fundamental) data-collection code required for virtually all significant network management systems.
If you take the time to examine Syslogd2, I think you'll find that Syslogd2 is more about customizing itself to your requirements than it is about customizing your
requirements to its capabilities.
Syslogd2 (and its siblings) can do the 'mechanical' data collection and even convert the data into a user-defined MySQL database.
This gives network-management teams a MASSIVE head-start in their efforts to build a successful management infrastructure.
Another issue with today's network management offerings is that commercial enterprises (the larger the enterprise, the more likely this will be) push their own agendas and
'vision' of the one-size-fits-all solutions to network management issues -- visions that always seem to center around their own products or offerings (of course).
By making Syslogd2 and its siblings open-source, my intent is to throw a wrench into the existing 'money-trains' of big network companies while simultaneously providing
proof that there is no agenda behind Syslogd2. Furthermore, by making Syslogd2 open-source, I MAY be able to encourage a new service industry -- one that
exists to implement network-management solutions. Smaller companies seem to be more willing (especially at affordable prices) to put service over product sales.
For whatever reason, the IT industry has chosen to chase all sorts of non-syslog-based 'shiny silver bullets' in the name of 'network management' rather than to demand
(or produce) software that actually uses the log files (the primary and often 'only' go-to source for post-failure analysis) that are produced (if not disabled) by
applications specifically for the purpose of alerting administrators to potential failure situations.
For the purpose of my diatribe, 'shiny silver bullets' include (but are not limited to):
- 'Pingers': This is a class of commercial 'network management' (more accurately 'network monitoring') software that works by periodically 'pinging' defined IP addresses
and reporting previously-responding addresses that no longer respond. Not only does this class of software not provide any data beyond the fact that the device's network
stack is still reachable, but it actually adds to the amount of 'overhead' on the network data-links. Furthermore, in practice it actually fails because no-one actually
monitors the GUI display 24/7 to detect failed hosts and there are too many false-positive outage reports to set up automated alerts. (It's too often little or no better
for network management notifications than the telephone on the desk of the IT manager if you disregard the high number of false-positive alerts due to dropped ICMP packets in the network and host/application reboots.)
- SFlow/JFlow: These protocols are mentioned because despite the hype they receive, I have yet to figure out (or have explained to me) exactly WHAT is being measured and
how that relates to any form of network status or indicates any form of proactive action to prevent network failure. These protocols also suffer from the fact that they
appear to be vendor-specific and are not supported by host-based network services (such as load-balancers and vpn-hosts) or by other host-based services (web servers, proxy
servers, etc).
- SNMP: SNMP looks good on paper, but in practice suffers from the fact that it can only report what it was pre-configured to detect and (if not constantly updated by
the vendor(s)) can quickly become outdated by software updates and revisions. SNMP also often suffers from the requirement for one or more centralized stations to 'poll'
each device (see 'pingers' above). If an organization wants to rely on SNMP traps for notifications, current (and appropriate) 'mib' files can often not be found.
There is also the need to configure each device with the equivalent of 'login credentials' -- which can become an administrative nightmare for implementation
if thousands of devices are involved. Furthermore, many devices simply do not support the SNMP protocol and for those that do, there is virtually no information retained
on the individual hosts. Therefore outside of whatever 'SNMP traps' might get logged by the central receiver, after-failure analysis using SNMP is usually not possible.
- Syslog: Syslog has traditionally suffered from the lack of ability to effectively select and forward only events of interest to network-management stations.
Due to its use of 'facility + priority' (facpri) selection, using syslog for network management on large networks is like using a 20-pound sledgehammer to install a thumbtack
to hang a picture on (or like using a chainsaw for brain surgery).
Syslog would be a preferred method of network-management were it not for major drawbacks (actually shortcomings) in design and implementation of current syslog processors.
These processors (also referred to as 'syslog daemons') are fine for the task of logging incoming data to files, but fail miserably at any attempted use in network-management
scenarios.
This is primarily because the algorithms and purpose of these syslog processors have never been seriously questioned and these processors have never been updated with an
eye toward today's current proliferation and diversity of network devices, hosts and host-based applications.
Out of the above approaches to network-management, the obvious question is: "If syslog log files are the 'go-to' for post-failure analysis, why don't
we (the IT community) consider using syslog as our primary notification source for pending network or application failures ?"
Indeed, most IT managers and engineers do not see the irony in not using syslog events as a primary means of network management -- even though they are often the
only source of post-failure information.
After working for years on a fortune-500 campus network, I can say with certainty that there is virtually no place that a commercially-sold syslog-based system using
currently-fielded software can be attached where it can handle the requisite volume of incoming syslog events from even a fraction of the network. This shortcoming is
not the fault of the protocol itself, but rather the fault of the way the protocol is handled and implemented.
Due to the free-form nature of the syslog protocol (the contents of a syslog msg-string are intentionally undefined other than the specification that it consist of a single string composed of
printable ASCII characters), syslog is an ideal protocol for logging the unique requirements of applications and network devices because it does not enforce any structure or
limitations on what may be reported.
A syslog message string is inherently limited to a bit less than 1500 bytes by the size of a single (unfragmented) UDP packet on Ethernet (UDP being the defined protocol for syslog).
The downside of this openness is that there are no limits on how many messages may get generated per unit time (per device) nor any kind of enforcement over what the
content must look like (either format or verbiage).
There are also no enforced controls over what kinds of messages may be put into each of the 8 named priority-levels in the syslog protocol definition.
A few examples may illustrate my point:
- Because of the overall volume of traffic generated by syslog and the resulting size of log files, it became common on early UNIX systems to limit the priority of messages
that were logged to 'warning' level or above (omitting 'notice', 'info' and 'debug').
This restriction made sense when a 2-GB disk was considered 'huge' or a 5-GB disk-pack was in common use,
but this 'tradition' has persisted even after terabyte-size disks have become common on desktops.
However, much of the information that is most useful for network-management is in the priority-levels of 'notice' and 'info'.
Unfortunately, attempting to forward these traffic levels to a central host usually results in congesting the network links and overwhelming the central-host; either of which
is cause for considering a syslog-implementation a 'failure'.
- The mechanism of 'facility + priority' (the 'facpri' mechanism) currently used to select and route syslog messages to their destinations (though still acceptable for
host logging) is wholly insufficient for network management purposes.
Because it relies on the vendor's interpretation of fixed categories of events, corporations have no options to specify which specific messages to transmit
to a central host over bandwidth-limited network links. The result is an 'all-or-nothing' approach which (in large networks) results in either flooded network links or no
information being forwarded to central reporting hosts.
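To make the coarseness of 'facpri' selection concrete: in the standard syslog wire format, the leading `<PRI>` number encodes both values as facility * 8 + priority (the RFCs call the second value 'severity'). A minimal Python sketch that decodes it is shown below; the function name is hypothetical. Everything a traditional selector can act on is contained in these two small numbers, regardless of what the message text says.

```python
import re

# The 8 named priority levels, indexed by their numeric value (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>")

def decode_pri(message):
    """Return (facility_number, priority_name) from a raw syslog line, or None."""
    match = PRI_RE.match(message)
    if match is None:
        return None                       # no <PRI> header present
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity]
```

For example, `decode_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")` yields facility 4 (auth) at priority 'crit'.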
In recognition of the massive volume of available syslog data, a syslog-based network management system would need to be decentralized with remote nodes ruthlessly minimizing
the data being forwarded to the central host while simultaneously logging copious amounts of data to local disk in an easy-to-access manner (preferably individual files
for each application vs a single [mixed] file) for after-action reviews.
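The decentralized idea in the paragraph above (log everything locally per application, forward only what matters) can be sketched in a few lines of Python. The forwarding patterns and the in-memory stand-ins for log files are assumptions for illustration; they do not represent Syslogd2's filter syntax.

```python
import re
from collections import defaultdict

# Hypothetical 'events of interest' that are worth sending to the central host.
FORWARD_PATTERNS = [
    re.compile(r"link (up|down)", re.IGNORECASE),
    re.compile(r"authentication failure", re.IGNORECASE),
]

def route(events):
    """events: iterable of (application, text). Returns (local_logs, forwarded)."""
    local_logs = defaultdict(list)        # stand-in for one local file per application
    forwarded = []                        # stand-in for traffic sent to the central host
    for app, text in events:
        local_logs[app].append(text)      # everything is kept locally for after-action review
        if any(p.search(text) for p in FORWARD_PATTERNS):
            forwarded.append((app, text)) # only selected events cross the network link
    return dict(local_logs), forwarded
```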
Syslogd2 was designed and written to address all the shortcomings I am aware of in today's syslog processing implementations. It is written to be either the primary
syslog service for Linux (or other *nix hosts) or as a regular application co-existing with an existing syslog service. Syslogd2 can scale down to the smallest
laptop or up to handle high-speed firewalls (it was originally designed to handle a PIX firewall on an OC-3 network pipe at full debug level).
When Syslogd2 functions as the central syslog collection/consolidation point of a network-management system, it can be combined with its sister-projects (DBD2 and
SLP2) to extract user-specified information fields from the free-form syslog text strings to populate one or more user-defined MariaDB or MySQL databases.
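As a rough illustration of what 'extracting information fields from free-form text' means, the following Python sketch turns a message into a list of name=value pairs. The rule names, the regular expressions and the space-separated output format are all assumptions made for this example; the actual parameter-list format passed to DBD2 is not specified in this overview.

```python
import re

# Hypothetical extraction rules: field name -> pattern with one capture group.
RULES = {
    "user": re.compile(r"\buser[= ](\w+)"),
    "src_ip": re.compile(r"\bfrom (\d{1,3}(?:\.\d{1,3}){3})\b"),
}

def extract_parameters(msg):
    """Build a 'name=value name=value' string from whichever rules match."""
    pairs = []
    for name, pattern in RULES.items():
        match = pattern.search(msg)
        if match:
            pairs.append(f"{name}={match.group(1)}")
    return " ".join(pairs)
```

Given `"Failed password for user alice from 10.1.2.3 port 22"`, the sketch yields `user=alice src_ip=10.1.2.3`, which a database-insertion step could then map onto distinct columns.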
About Syslogd2
Syslogd2 has been re-imagined and re-designed to be the syslog collector that is required for today's corporate networks and their network-management needs.
It is focused not just on logging data to disk, but on obtaining and pre-processing all possible sources of information (including user-written processes) locally before
it forwards selected information to one or more central hosts to be inserted into databases (with DBD2) or sent to alerting systems.
Syslogd2 is open source so that no one company can buy it out and then smother it with proprietary intentions (patents and copyrights).
As a result of its re-design, Syslogd2 introduces many new features and implements many new concepts
not found in previous syslog processors.
- Syslogd2 is designed to be portable:
Linux-only function-calls and functions have been avoided where possible.
- Syslogd2 is written as a standard (forking) application suitable for use with either Linux systemd or non-Linux SYS-V startup scripts.
- For some features certain non-industry-standardized functions are required.
In some situations, Linux functionality is replicated and the substitute code is used in ported versions if not natively supported by the target host.
- In some cases, I just don't know what features might be missing or significantly different from system-to-system until I can access those other systems.
- In some cases where Linux-specific function-calls are the only solution (such as reading the Linux kernel), the Linux-specific
function-calls are marked so they won't be included in any non-Linux binaries.
- Some portability progress has already been made. Syslogd2 currently runs on OSX (though this port has not yet been released or recently tested).
Syslogd2 development has been focused on usage in networked environments to collect and pre-process syslog traffic at point-of-receipt (with initial emphasis on
Linux platforms).
Syslogd2 is specifically designed for use as part of an overall network-management system (a design that is a superset of the functionality of current syslog processors).
As such, Syslogd2 has been developed with several (sometimes competing) goals.
Syslogd2 is still under development and will continue to gain features and capabilities as its sister projects (DBD2 and SLP2) mature.
The following list of design goals is not in any particular order:
- Portability. This has already been discussed above.
- Minimal 'learning curve' for administrators. Syslog configuration files should (as much as feasible) look and act the same to minimize surprises due to platform differences.
- Syslogd2 is designed to be portable and (though initially Linux-only) should be easy to port to other operating systems with (minimal) effort.
- Syslogd2 configuration file syntax is a proper superset of the traditional UNIX syntax, meaning that all changes to the syslog file format over Unix take the form of
'additions' rather than 'changes'. Most importantly, the Syslogd2 syntax allows for virtually unlimited 'growth' of new features.
Its use of compiler-symbols to literally include/exclude selected code and structures from the compiled binaries allows admins to only compile the pieces they need.
- Syslogd2 abides by as many Linux (and UNIX) software standards as possible regarding file locations and file names:
- The default location of the syslog configuration file is '/etc/syslog.conf'.
- Syslogd2's ancillary configuration directory (for ancillary input files) is '/etc/syslog.d'.
- Syslogd2's default spool directory is '/var/spool/syslog'.
- Syslogd2 stores its 'pid' (process-id) file in /var/run.
- Syslogd2 attempts to maintain full 'backwards' compatibility with Unix-style system log-daemons:
- Like traditional syslog systems, Syslogd2 disables support for IP by default. The traditional command-line options '-i' or '-r' will enable IP support and the traditional '-f' option will enable the forwarding of received IP traffic to IP destinations (actually '-r' does both).
- Syslogd2 accepts (extended) traditional syntax for output-lines in the configuration file. ('Extended' in the sense of using '.=', and '.none' in selector-strings as well as the 'comma-syntax' to specify multiple facilities with a common priority value.)
- Minimize the impact of 'change' to configuration-file syntax while allowing for the addition of configuration elements to control new features and functionality.
- Syslogd2 uses a highly modular design that allows for a lot of network- and host-management-related features.
- These features need to be configured. Due to the number of features (and their options), command-line management can become unwieldy
(even limiting) due to aggregate length.
- Syslogd2's solution is to define new-feature-configurations as command-line parameters (since that 'fits' the traditional syntax),
but then to allow any command-line parameters to be re-located into the configuration file (where linear-access [of a single command-line
string] and overall length cease to be problematic).
- To identify a line as containing command-line input, Syslogd2 defines the tilde ('~') as a 'recognition character' if found
as the 1st character of a line.
- Extensions to output-lines are implemented by adding an optional 'options-list' at the end of the line.
- Documentation of the syslog file is enhanced by allowing 'in-line comments' (comments that follow configuration entries in the same line).
- To allow 'partial' roll-outs of Syslogd2, it may be desirable to define and deploy a common configuration file to be used by different syslog
processors depending on need and phase of the roll-out. This allows for a common log-file configuration across multiple machines making both network
administration and multi-host management easier (especially when multiple binaries may be involved since only one configuration file will be in use at
a fixed file-system location).
This ability may also be desirable as a 'hedge' against 'roll-backs' of binary images due to updates or during initial implementations of ported version
of Syslogd2 to UNIX systems.
- Because the addition of any syntax element to an existing syslog-file syntax may cause the existing syslog daemon to fail, Syslogd2 re-imagines
the traditional view of a 'comment' (since 'comments' are about the only syntax that all parsers agree on).
- SoftComment: This global boolean variable only affects how Syslogd2 parses files. It causes parsing to continue
after finding the first hashtag in a line. SoftComment may be enabled or disabled as often as desired during parsing.
- If the next token found in the line is a tilde ('~') the command will be read as Syslogd2-command-line parameters.
- If the next token found in the line is a valid facility name or wild-card:
- If the SoftComment boolean is currently enabled, the line will be parsed as a Syslogd2-compatible output-line.
- If the SoftComment boolean is currently disabled, the line will be considered a comment and ignored.
- Otherwise the line is considered to be a comment and is ignored.
- When SoftComment is enabled, a 2nd hashtag character ('#') marks the start of an unconditional ('hard') comment.
- Syslogd2 considers SoftComment to always be 'on' for command-line entries so the "--enable softcomment" instruction
can be 'hidden' from legacy syslog daemons.
- First-come-first-serve parsing policy: In order to catch and report on duplicated configuration settings, Syslogd2
implements a strict 'first-come-first-serve' policy when parsing variables.
Any attempt to set a variable not identified for 'multiple instance' use that was set by a previous line will result in an error-log message and
rejection of the subsequent setting.
One advantage of this policy is that Syslogd2 can create a 'null' (SoftComment) entry for an output-line that contains the Syslogd2-specific
option-list or 'extra' facilities, followed by a 'duplicate' entry that the legacy parser will see.
Modifications to the 'active' (legacy) line will then affect both service-daemons.
- --Skip <n>: This command-line option is (admittedly) only useful in the configuration file. It causes
Syslogd2 to 'skip' (ignore) the next <n> lines or to end-of-file (whichever comes first). This option is provided for those few (primarily
Linux) configuration lines that Syslogd2 simply cannot parse or properly support.
- Provide an ultra-high-speed (don't you just HATE superlatives ?) syslog processor capable of keeping up with the high syslog volumes experienced in
fast, busy firewalls and larger corporate syslog hubs. (worker-threadpools and output-threadpools)
- Provide a syslog daemon that is effective for use on laptops and work-from-home systems despite the ability of laptop and work-from-home users to access
multiple networks. (Network Awareness)
- Provide a store-and-forward capability for syslog daemons to provide 'other-end' log data after a network outage has been restored.
(CAP_SPOOLFILES)
- Provide a means of controlling the insanely large amount of syslog traffic that is currently required of any syslog-based management system and that
routinely congests network links. (Filters, Spooling)
- Provide a simple method for administrators to select and process text-file input from either existing syslog or non-syslog log files.
(CAP_TAILFILES and Filters)
- Do all the above in a cost-effective, vendor-neutral manner. (Open-source project(s) + Linux OS on PC hardware)
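The SoftComment rules listed above lend themselves to a small decision function. The Python sketch below is one possible reading of those rules, not Syslogd2's parser: the function names are hypothetical, the facility list covers only the common traditional names, and requiring a '.' in the selector token is an assumption added to keep ordinary prose comments from being mistaken for output-lines.

```python
FACILITY_NAMES = {
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr",
    "news", "uucp", "cron", "authpriv", "ftp",
} | {f"local{i}" for i in range(8)}

def is_selector(token):
    """True for tokens shaped like 'facility[,facility...].priority' or '*.priority'."""
    if "." not in token:
        return False                      # assumption: a selector always carries a priority part
    facilities = token.split(".", 1)[0]
    return all(f == "*" or f in FACILITY_NAMES for f in facilities.split(","))

def classify(line, soft_comment):
    """Classify one configuration line as 'command', 'output', 'comment' or 'unknown'."""
    text = line.strip()
    hashed = text.startswith("#")
    if hashed:
        text = text[1:].lstrip()          # keep parsing after the first hashtag
    if text.startswith("~"):
        return "command"                  # tilde lines are honoured with or without SoftComment
    tokens = text.split()
    looks_like_output = bool(tokens) and is_selector(tokens[0])
    if not hashed:
        return "output" if looks_like_output else "unknown"
    if looks_like_output and soft_comment:
        return "output"                   # a 'soft' comment that only Syslogd2 acts on
    return "comment"                      # includes '## ...' hard comments
```

With this reading, `# local3.info /var/log/x` is an output-line when SoftComment is enabled and a plain comment when it is disabled, while a legacy daemon sees a comment in both cases.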
About this site
This site is the home for on-line Syslogd2 documentation and reference material.
If you have downloaded the Syslogd2 project source code and compiled from scratch, you should find a 'docs' directory in your download
containing '.odt' files suitable for producing hard-copies. This site will always be more current than the files made available for download and off-line viewing.
However, attempting to print hardcopies from this website may lead to frustration due to the eye-saving black background-color that will cause excessive use of printer ink.
- New Concepts is a review of several features in Syslogd2 that significantly increase syslog-processor performance over traditional variants.
- Misc Topics contains (or will contain) the 'ash-and-trash' of this site and either direct content or links to:
- Compilation and Installation instructions,
- Details on the expanded configuration file syntax and additions (all of which can be expressed as comments) that allow for the configuration of Syslogd2.
- Various theoretical Computer Science topics that are normally confined to Academia, when a cursory understanding helps to understand the
implementation of Syslogd2.
- User demonstrations of various Syslogd2 features. (consisting of sample files and instructions to walk through the demos.)
- Sample Files (of filters, cache-file, config-files, etc).
- Capabilities contains a brief summary of the new features in Syslogd2's base feature set and the optional features activated by 20+
compiler symbols that are used to customize Syslogd2 to individualized needs.
Each of these compiler symbols causes the compiler to include (or exclude) code and data-structures that would otherwise result in a system too large
and complex for most applications of syslog collection.
Between selection of these compiler symbols and general configuration, Syslogd2 can scale from as small as 2 threads to a huge application
supporting hundreds of threads and tens of threadpools with over a half-dozen threadpool types.
- Syslogd2 Reference is a central reference for Syslogd2 command-line options, suboptions, global boolean values and command-syntax.
This reference also contains a summary chart of all syslog facility and priority names.
- Project Sites are links to sourceforge project locations where downloads can be obtained for my 3 projects:
- Syslogd2: This project
- DBD2 (DataBase Daemon-2): An automated database-insertion daemon for syslog data or for 'light-client' command-line input to database(s).
Code is almost ready for preliminary release, but documentation is currently almost totally lacking.
- SLP2 (SysLog Parser-2): A parser to extract name=value data from syslog events for insertion into database(s) with DBD2.
This site is still under construction, but I wanted to make as much information on Syslogd2 as available as possible as soon as possible, so please bear with me
while I build this documentation site.
Some Key Features of Syslogd2
Scalable
Syslogd2 is scalable in both capacity and feature-set.
Syslogd2 has over 20 optional features to be selected at compile-time (multiple binaries may be compiled at once - each with a different set of features). These optional features are referred to as 'capabilities' (or 'CAP_*-abilities' after the names of the compiler-symbols).
When capabilities are added to a Syslogd2 binary image, they will generally default to off (or to a 'neutral' or 'inactive' condition) until called upon by configuration.
The modular design of Syslogd2 allows for new capabilities to be added when time permits (I have a growing list).
For input, Syslogd2 provides support for reading text-files as input ('tailing' a log file), input from named-pipes, TCP and UDP input (IPv4 and IPv6) on user-specified ports, and input from arbitrary (user-defined) Linux sockets (supporting both stream and datagram protocols).
For output, Syslogd2 supports writing to arbitrary character devices (such as pseudo-terminals), TCP hosts and arbitrary (user-defined) Linux sockets (both stream and datagram protocols) in addition to the full suite of traditional syslog output (files, tty, console, udp, named-pipe, user-list, etc). Output support for IP connections includes both IPv4 and IPv6 as well as user-defined connections on user-specified ports.
In terms of capacity (throughput), Syslogd2 baselines as a multi-threaded process (much like other multi-threaded systems).
As the need grows, Syslogd2 can add not just additional threads, but entire new threadpools that allow resources to be concentrated where needed.
As the need continues to grow, Syslogd2 can divide the processing of individual messages to further increase processing efficiency, allowing reader-threads to concentrate on accepting input while processing threads focus on parsing and processing the buffered traffic. Output threadpools allow the use of many output files or connections with little to no risk of dropped (lost) input by offloading the tasks of testing and writing to devices, files, sockets and pipes.
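The division of labour between reader-threads and processing threads can be illustrated with a generic producer/consumer sketch in Python. This is a textbook pattern shown under simplifying assumptions (one reader, an in-memory queue, a trivial stand-in for parsing); it is not a description of Syslogd2's threadpool implementation.

```python
import queue
import threading

def run_pipeline(raw_lines, n_workers=2):
    """Read lines on one thread, 'parse' them on n_workers threads, return sorted results."""
    inbox = queue.Queue()                 # buffer between the reader and the workers
    results = []
    lock = threading.Lock()

    def reader():
        for line in raw_lines:            # the reader only accepts input and queues it
            inbox.put(line)
        for _ in range(n_workers):
            inbox.put(None)               # one stop marker per worker

    def worker():
        while True:
            line = inbox.get()
            if line is None:
                break
            parsed = line.strip().upper() # stand-in for real parsing and filtering work
            with lock:
                results.append(parsed)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)                # sorted because worker completion order varies
```

Because the reader never waits on parsing, a burst of input is absorbed by the queue rather than dropped, which is the property the paragraph above is describing.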
Pre-loadable Name-Cache
Syslogd2's integrated name cache was originally focused on speeding up host-name resolution while simultaneously reducing the load that syslog name-resolution places on DNS servers.
The optional (user-provided) cache-file allows the cache to be pre-loaded at startup, ideally even replacing the need for DNS services.
Since initial implementation, the cache-file has proven useful in other roles as well: ensuring that all remote output destinations can be resolved, and normalizing secondary and tertiary addresses (and hostnames / aliases) to the canonical hostname that will be sent to 'downstream processes'.
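The normalizing role of a pre-loaded cache can be sketched in a few lines of Python. The actual format of the Syslogd2 cache-file is not described here, so this example assumes a hypothetical hosts-style layout (`address canonical-name [alias ...]`); the function names are likewise invented for illustration.

```python
def load_name_cache(text):
    """Build a lookup table from hosts-style lines (assumed format):
    <address> <canonical-name> [alias ...]
    """
    cache = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        parts = line.split()
        if len(parts) < 2:
            continue                           # nothing usable on this line
        canonical = parts[1]
        # The address, the canonical name and every alias all map to
        # the single canonical hostname.
        for key in parts:
            cache[key.lower()] = canonical
    return cache

def resolve(cache, name_or_address):
    """Return the canonical hostname, or None on a cache miss
    (a real server would fall back to DNS at this point)."""
    return cache.get(name_or_address.lower())
```

Listing a host once per address, each time with the same canonical name, is enough to make secondary and tertiary addresses resolve to one hostname in this sketch.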
Spooling
Syslogd2 supports two types of spooling:
Connection Spooling: This is the 'standard' store-and-forward type of spooling that is commonly discussed when the topic of 'spooling' is brought up.
To work, connection spooling must use a connection that supports a handshake protocol (Linux streaming socket, TCP or named-pipe output) so that a broken connection can be sensed.
Whenever Syslogd2 recovers a previously-down connection, it checks for a spool-file and (if it finds one) it will transmit any (delayed) contents of the spoolfile that are still relevant to the remote host.
Network State Spooling: This is a new type of spooling. It works with any type of remote-process protocol (any type of socket or named-pipes).
The basic idea behind network-state spooling is that a syslog host may be on a network that periodically changes state while traffic input is relatively constant (it is always potentially present). Sometimes, traffic is received for a remote host when the network is not in a state that allows transmission to that host. In these cases, Syslogd2 can spool the data for transmission once the network reaches an 'acceptable' state.
- Example: Syslog is receiving input before the network is started that should be sent to a remote host (perhaps a disk is failing or a memory fault is found by the kernel's POST test).
The current network 'state' is 'down' (or perhaps there's a network issue and the state is 'local', but unable to achieve the state of 'other' (fully up and available)).
Syslogd2 can spool this information until the network starts up or otherwise becomes available, THEN transmit the data to the remote host.
- Example (Illustration only since the network-check routines that would actually enable this functionality have not yet been written): A Linux laptop is configured to forward selected events to a central loghost at the employer's location. Since this user often works from home or other locations where he accesses the internet, 'standard' connection-spooling cannot facilitate event-logging from this laptop.
Because Syslogd2 is always aware of the current 'network state', it can use network-state spooling to spool events until the laptop encounters the specific set of conditions that have been specified as defining that the laptop has connected to the employer's network. Only when the network state has been confirmed to be 'acceptable' (i.e. connected to the correct network) will Syslogd2 'unspool' the stored events to the employer's syslog collector.
Because network-state spooling is not dependent on the connection or handshake, it can work with either TCP or UDP protocols (or streaming or datagram Linux sockets), but will most likely be used over TCP together with connection spooling.
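The difference between the two spooling types can be contrasted in a short Python sketch. It illustrates the ideas only and is not Syslogd2's implementation: the class names, the in-memory lists standing in for spool-files and the `set_state` hook are all assumptions. The state names 'down', 'local' and 'other' are taken from the description above. The check for whether spooled contents are 'still relevant' is left out, and the connection spool is flushed on the next delivery attempt rather than at the moment the connection recovers.

```python
class ConnectionSpool:
    """Store-and-forward: spool when the connection itself reports failure."""

    def __init__(self, send):
        self.send = send      # hypothetical callable; raises ConnectionError when down
        self.spool = []       # in-memory stand-in for a spool-file

    def deliver(self, message):
        try:
            self.flush()                 # delayed messages go out first
            self.send(message)
        except ConnectionError:
            self.spool.append(message)   # keep it for the next attempt

    def flush(self):
        while self.spool:
            self.send(self.spool[0])     # raises if the link is still down
            self.spool.pop(0)            # drop only after a successful send


class NetworkStateSpool:
    """Spool until the network reaches an 'acceptable' state."""

    def __init__(self, transmit, acceptable=("other",)):
        self.transmit = transmit         # may be datagram or stream output
        self.acceptable = set(acceptable)
        self.state = "down"              # assumed starting state
        self.pending = []

    def submit(self, message):
        if self.state in self.acceptable:
            self.transmit(message)
        else:
            self.pending.append(message)

    def set_state(self, new_state):
        self.state = new_state
        if new_state in self.acceptable:
            pending, self.pending = self.pending, []
            for message in pending:      # unspool in arrival order
                self.transmit(message)
```

The first class depends on the transport reporting a failure, which is why it needs a handshake-capable connection. The second never asks the transport anything; it only looks at the externally supplied network state, so it works equally well in front of a datagram sender.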
Filters
Syslogd2 filters act either to modify the contents of a syslog message or in a go/no-go mode to either accept or drop (discard) a message.
All Syslogd2 filters use the same format and syntax. There are two types of filter: input filters and output filters. The difference is whether they are declared as an option to an input specification or as an option on an output line.
Each filter can mix 'default-pass' and 'default-discard' modes of operation.
Each filter can inspect or modify any field of the syslog message except the time field.
An input-filter is declared as an option to an input source (file, kernel, socket, pipe, etc).
An input filter will receive and process all traffic that is received from the source on which it is declared. The same filter may be applied to multiple sources, however each source may only declare one filter.
Input filters are applied at the point in processing after the message has been fully parsed and resolved so that 'final' parsing results are presented to the filter. They are applied just before the point where the facility+priority of the message is compared to the selector-strings of the defined output locations.
The power of input filters is that by changing the facility and/or priority values of a message being processed, that message can be re-routed to different destinations based on the contents of any component (or combination of components) of the message.
This allows (for example) the sorting of log events into different files based on the 'tag' field (the application name at the start of most messages).
It also allows the extraction, selection and redirection of any message containing the string "username: " or "user=".
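The re-routing effect of an input filter can be sketched as follows. This Python example does not use Syslogd2's filter syntax (which is not shown here); the message layout, the rule set, the discard rule and the file paths are all hypothetical, and only the idea of changing the facility before selector matching comes from the text above.

```python
def input_filter(msg):
    """Return a possibly re-labelled copy of msg, or None to discard it."""
    msg = dict(msg)                       # never modify the caller's copy
    if "debug-noise" in msg["text"]:      # go/no-go rule: drop the message
        return None
    if "username: " in msg["text"] or "user=" in msg["text"]:
        msg["facility"] = "extra5"        # redirect by message content
    elif msg["tag"] == "sshd":
        msg["facility"] = "local3"        # sort by 'tag' field
    return msg

# Selector -> destination (facility only, for brevity; paths are made up).
OUTPUTS = {
    "extra5": "/var/log/user-events.log",
    "local3": "/var/log/sshd.log",
}

def route(msg):
    """Apply the input filter, then match the result against the outputs."""
    filtered = input_filter(msg)
    if filtered is None:
        return None
    return OUTPUTS.get(filtered["facility"], "/var/log/messages")
```

The routing table itself knows nothing about message contents. All content-based decisions are made in the filter, which expresses them by changing the facility that the selectors later see.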
An output-filter is declared as an option on an output line in the configuration file and applies only to the data-stream that is being sent to that destination based on matching a particular selector-string subset. (Space here does not permit excessive details).
Output filters are applied after a destination has been determined to match a particular message. The only processing events that occur after the message is processed by an output filter (assuming the message is not discarded) are the adjustments made by the 'localhosts' and 'stripdomains' options to the hostname field and final output-line formatting prior to actually being written to the destination.
The placement and timing of the output-filters allows them to provide two key features to Syslogd2:
Traffic reduction. By acting as 'discard' filters based on message content, they can act as 'gatekeepers' by discarding all outgoing events except those specifically desired by the administrator. This can drastically reduce the overall network traffic of a syslog-based management system, and at the same time provides a control to prevent the central collection-host from being overwhelmed by the sheer volume of incoming traffic -- no matter how much is being generated or how many devices are being monitored.
It provides the ability for Syslogd2 to specify both the output format and content of traffic sent to each destination.
Syslogd2 can change the facility+priority value (or any other part) of a given message to whatever the remote host wants to see (or can handle). (Example: Syslogd2 has internally moved all events containing the string "username" to 'extra5', but the remote host does not support the 'extra5' facility, so Syslogd2 'moves' that message 'down' to 'local0' with an output filter. The same Syslogd2 sends the same message to DBD2 to be inserted into a database, but DBD2 needs to receive it with facility+priority of 'extra12.notice' for proper routing by DBD2. Syslogd2 uses a different output filter to change the same message to 'extra12.notice' for DBD2.)
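The example above, where one message leaves for two destinations with two different facility+priority values, can be sketched in Python. The facility names come from the example; the function names, the destination labels and the dictionary message layout are assumptions, and the real Syslogd2 filter syntax is not shown.

```python
def to_remote_host(msg):
    """Output filter for a host that does not know the 'extra5' facility."""
    out = dict(msg)
    if out["facility"] == "extra5":
        out["facility"] = "local0"        # 'move down' to a standard facility
    return out

def to_dbd2(msg):
    """Output filter for DBD2, which expects 'extra12.notice'."""
    out = dict(msg)
    if out["facility"] == "extra5":
        out["facility"], out["priority"] = "extra12", "notice"
    return out

# Each destination declares its own output filter (labels are made up).
DESTINATIONS = {"remote-host": to_remote_host, "dbd2": to_dbd2}

def fan_out(msg):
    """Return what each destination would receive for this message."""
    results = {}
    for name, output_filter in DESTINATIONS.items():
        filtered = output_filter(msg)     # works on a private copy
        if filtered is not None:          # None would mean 'discarded'
            results[name] = filtered
    return results
```

Each filter works on its own copy, so a change made for one destination never leaks into what another destination receives.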
Vision for Syslogd2
My hope for Syslogd2 is that it becomes a popular (even standard) tool for network administrators (and Linux administration teams) to use to collect syslog data from all their various sources.
Syslogd2's scalable design and open-source license (Affero GPL) are intended to keep it free from commercialization so it remains a cheap (free-as-in-$$$) tool that can replace standard Linux syslog services to collect, store, filter and forward syslog events at distributed collection-points throughout a network.
The goal is to prevent the network degradation that occurs today as massive amounts of syslog data overwhelm network links.
Syslogd2 can act as a vendor-agnostic syslog collection tool for even the largest networks.
While Syslogd2 specializes in data-collection and transmission, my DBD2 project specializes in inserting name=value data-fields (syslog or command-line data) into databases at relatively high speed and with maximum versatility.
This provides a ready-made 'baseline' for either home-grown or open-source (or even commercial) syslog-based management tools.