Archive the log files monthly
Go to file
2022-03-19 10:08:48 +08:00
t Fixed errors calling executables in the test suite helper. 2022-03-19 10:05:19 +08:00
.gitignore Added Build.bat to .gitignore. 2021-02-10 07:48:23 +08:00
arclog Added -Level => 9 with the gzip compression. 2022-03-19 10:08:48 +08:00
Build.PL Replaced Compress::Zlib with IO::Compress::Gzip and IO::Uncompress::Gunzip. Replaced Compress::Bzip2 with IO::Compress::Bzip2 and IO::Uncompress::Bunzip2. 2022-03-19 09:23:39 +08:00
Changes Fixed an error in Changes. 2022-03-19 09:51:24 +08:00
LICENSE Replaced the messy GNU-styled, multi-lingual documents with a single Markdown README.md. Added the GitHub repository to the document and removed the SourceForge website, my web page, and the SourceForge mailing list from the document. Removed the document license, and my PGP key from the package. Removed the signature file from the repository. Changed the license from GPLv3 to APLv2. Various fixes to avoid JetBrains IDE problem detection. 2021-02-06 19:34:55 +08:00
Makefile.PL Replaced Compress::Zlib with IO::Compress::Gzip and IO::Uncompress::Gunzip. Replaced Compress::Bzip2 with IO::Compress::Bzip2 and IO::Uncompress::Bunzip2. 2022-03-19 09:23:39 +08:00
MANIFEST.SKIP Replaced MANIFEST.SKIP with the file created by Module::Build and my local extensions. 2021-02-08 09:38:36 +08:00
README.md Replaced Compress::Zlib with IO::Compress::Gzip and IO::Uncompress::Gunzip. Replaced Compress::Bzip2 with IO::Compress::Bzip2 and IO::Uncompress::Bunzip2. 2022-03-19 09:23:39 +08:00

arclog - Archive the Log Files Monthly

Description

arclog archives the log files monthly. It strips off previous months log records from the log file, and save them to compressed archive files named logfile.yyyymm. It then saves the hard disk space and prevents potential attacks on log files.

Currently, arclog supports [Apache] access log, Syslog, [NTP], Apache 1 SSL engine log, and my own bracketed, modified ISO date/time log file formats, and gzip, bzip2, and xz compression methods. Several software projects log (or can log) in a format compatible with the Apache access log, like [CUPS], [ProFTPD], [Pure-FTPd]… etc., and arclog can archive their Apache-like log files, too.

Caution

  • Archival takes time. To reduce the time occupying the source log file, arclog copies the content of the source log file to a temporary working file and restart the source log file first. Then arclog can take its time working on the temporary working file. However, please note:

    1. If you have a huge log file (several hundreds of MBs), merely copying still takes a lot of time. You had better stop logging first, archive the log file and restart logging, to avoid racing condition in writing. If you archive the log file periodically, it shall not grow too big.

    2. If arclog stops in the middle of the execution, it will leave a temporary working file. The next time arclog runs, it stops when it sees that temporary working file. You have to process that temporary working file first. That temporary working file is merely a copy of the original log file. You can rename and archive it like an ordinary log file to solve this.

  • Do not sort unless you have a particular reason. Sorting has the following potential problems:

    1. Sorting may eat huge memory on large log files. The amount of the memory required depends on the number of records in each archived month. Modern Linux and MS-Windows kill processes that eat too much memory, but it still takes minutes, and your system hangs for that. I do not know other operating systems. Try at your own risk.

    2. The time unit of all recognized log formats is second. Log records happen in a same second are sorted by the log file order (if you are archiving several log files at a time) and then the log record order. I try to ensure that the sorted archived records are in a correct order of the happening events, but I cannot guarantee. You have to watch out if the order in a second is important.

  • Be careful on the Syslog and NTP log files: Syslog and NTP does not record the year. arclog uses Date::Parse to parse the date, which assumes the year between this month and last next month if the year is missing. For example, if today is 2001/6/8, it assumes the year between 2001/6/30 back to 2000/7/1. This is fair. However, if you do have a Syslog or NTP log file that has records older than one year, do not use arclog. It will destroy your log file.

  • If read from STDIN, please note:

    1. You must specify the output prefix if you want to read from STDIN, since what it needs is an output pathname prefix, not an output file.

    2. STDIN cannot be deleted, restarted or partially kept. If you read from STDIN, the keep mode is always "keep all". If you archive several source log files including STDIN, the keep mode will be "keep all" for all source log files, to prevent disaster.

    3. The answers of the ask mode is obtained from STDIN, too. Since you have only one STDIN, you cannot specify the ask mode while reading from STDIN. It falls back to the fail mode in that case.

  • I suggest that you install File::MMagic instead of counting on the file executable. The internal magic file of File::MMagic works better than the file executable. arclog treats everything not gzip, bzip2, or xz compressed as plain text. When a compressed log file is wrongly recognized as an image, arclog treats it as plain text, reads directly from it, and fails. This does not hurt the source log files, but is still annoying.

System Requirement

  1. Perl, version 5.8.0 or above. arclog uses 3-argument open() to duplicate file handles, which is only supported since 5.8.0. I have not successfully port this onto earlier versions yet. Please tell me if you made it.

    You can run perl -v to check your current Perl version. If you do not have Perl, or if you have an older version of Perl, you can download and install/upgrade it from the Perl website. For MS-Windows, you can download and install Strawberry Perl or ActivePerl.

  2. Required Perl modules:

    • Date::Parse

      This is used to parse the timestamp of the log records. You can download and install Date::Parse from the CPAN archive, or install it with the CPAN shell:

      cpan Date::Parse
      

      or with the CPANPLUS shell:

      cpanp i Date::Parse
      

      For Debian/Ubuntu:

      sudo apt install libtimedate-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-TimeDate
      

      For FreeBSD:

      ports install p5-TimeDate
      

      For ActivePerl:

      ppm install TimeDate
      
  3. Optional Perl modules:

    • File::MMagic

      This is used to check the file type. If this is not available, arclog tries the file executable instead. If that is not available, too, arclog judges the file type by its name suffix (extension). In that case arclog fails when reading from STDIN. You can download and install File::MMagic from the CPAN archive, or install it with the CPAN shell:

      cpan File::MMagic
      

      or with the CPANPLUS shell:

      cpanp i File::MMagic
      

      For Debian/Ubuntu:

      sudo apt install libfile-mmagic-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-File-MMagic
      

      For FreeBSD:

      ports install p5-File-MMagic
      

      For ActivePerl:

      ppm install File-MMagic
      

      The alternative file.exe for MS-Windows can be obtained from the GnuWin32 home page. Be sure to save it as file.exe somewhere in your PATH.

    • IO::Compress::Gzip and IO::Uncompress::Gunzip

      They are used to support reading/writing the gzip compressed files. It is only needed when gzip compressed files are encountered. If they are not available, arclog tries the gzip executable instead. If that is not available, too, arclog fails. You should not worry about IO::Compress::Gzip since it comes with Perl since version 5.9.3. If not, it is contained in the IO-Compress distribution. You can download and install it from the CPAN archive, or install it with the CPAN shell:

      cpan IO::Compress::Gzip
      

      or with the CPANPLUS shell:

      cpanp i IO::Compress::Gzip
      

      For Debian/Ubuntu:

      sudo apt install libio-compress-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-IO-Compress
      

      For FreeBSD:

      ports install p5-IO-Compress
      

      For ActivePerl:

      ppm install IO-Compress
      

      The alternative gzip.exe for MS-Windows can be obtained from the gzip website. Be sure to save it as gzip.exe somewhere in your PATH.

    • IO::Compress::Bzip2 and IO::Uncompress::Bunzip2

      They are used to support reading/writing the bzip2 compressed files. They are only needed when bzip2 compressed files are encountered. If they are not available, arclog tries the bzip2 executable instead. If that is not available, too, arclog fails. You should not worry about IO::Compress::Bzip2 since it comes with Perl since version 5.10.1. If not, it is contained in the IO-Compress distribution. You can download and install it from the CPAN archive, or install it with the CPAN shell:

      cpan IO::Compress::Bzip2
      

      or with the CPANPLUS shell:

      cpanp i IO::Compress::Bzip2
      

      For Debian/Ubuntu:

      sudo apt install libio-compress-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-IO-Compress
      

      For FreeBSD:

      ports install p5-IO-Compress
      

      For ActivePerl:

      ppm install IO-Compress
      

      The alternative bzip2.exe for MS-Windows can be obtained from the bzip2 website. Be sure to save it as bzip2.exe somewhere in your PATH.

    • IO::Compress::Xz and IO::Uncompress::UnXz

      They are used to support reading/writing the xz compressed files. It is only needed when xz compressed files are encountered. If it is not available, arclog tries the xz executable instead. If that is not available, too, arclog fails. They are contained in the IO-Compress-Lzma distribution. You can download and install it from the CPAN archive, or install them with the CPAN shell:

      cpan IO::Compress::Xz
      

      or with the CPANPLUS shell:

      cpanp i IO::Compress::Xz
      

      For Debian/Ubuntu:

      sudo apt install libio-compress-lzma-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-IO-Compress-Lzma
      

      For FreeBSD:

      ports install p5-IO-Compress-Lzma
      

      For ActivePerl:

      ppm install IO-Compress-Lzma
      

      The alternative xz.exe for MS-Windows can be obtained from the XZ Utils website. Be sure to save it as xz.exe somewhere in your PATH.

    • Term::ReadKey

      This is used to display the progress bar. The progress bar is a good visual feedback of what arclog is currently doing, but arclog is safe without it. You can download and install Term::ReadKey from the CPAN archive, or install it with the CPAN shell:

      cpan Term::ReadKey
      

      or with the CPANPLUS shell:

      cpanp i Term::ReadKey
      

      For Debian/Ubuntu:

      sudo apt install libterm-readkey-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-TermReadKey
      

      For FreeBSD:

      ports install p5-Term-ReadKey
      

      For ActivePerl:

      ppm install TermReadKey
      

Download

arclog is hosted is on…

You can always download the newest version of arclog from…

imacats PGP public key is at…

Install

If you are upgrading from arclog.pl 2.1.1dev4 or earlier, please read the upgrade instruction later in this document.

Install with ExtUtils::MakeMaker

% perl Makefile.PL
% make
% make test
% make install

When running make install, make sure you have the privilege to write to the installation locations. This usually requires the root privilege.

If you want to install into another location, you can set the PREFIX. For example, to install into your home when you are not root:

% perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of ExtUtils::MakeMaker for more installation options (by running perldoc ExtUtils::MakeMaker).

For MS-Windows, since make is not universally available, Module::Build is preferred to ExtUtils::MakeMaker. See the instructions below.

Install with Module::Build

% perl Build.PL
% ./Build
% ./Build test
% ./Build install

When running ./Build install, make sure you have the privilege to write to the installation locations. This usually requires the root privilege.

If you want to install into another location, you can set the --prefix. For example, to install into your home when you are not root:

% perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation options (by running perldoc Module::Build).

Upgrade Instruction

Here are a few hints for people upgrading from 2.1.1dev4 or earlier:

The Script Name is Changed from arclog.pl to arclog

This is obvious. If you have any scripts or cron jobs that are running arclog, remember to modify your script for the new name. Of course, you can rename arclog to arclog.pl. It still works.

The reason I changed the script and project name is that: A dot . in the project name is not valid everywhere. At least SourceForge dont accept it. Besides, arclog is enough for a script name under UNIX. The .pl file name suffix/extension may be convenient on MS-Windows, but MS-Windows users wont run it with explorer file name association anyway, and there is a pl2bat to convert arclog to arclog.bat, which would make more sense. The only disadvantage is that I was using UltraEdit, which depends on the file name extension for the syntax highlighting rules. I can manually set it anyway. Im using gedit on Linux now. This is not a problem anymore.

The Default Installation Location Is at /usr/bin

Also, the man page is at /usr/share/man/man1/arclog.1. This is to follow Perls standard convention, and to avoid breaking ExtUtils::MakeMaker with future versions.

When you run perl Makefile.PL or perl Build.PL, it hints a list of existing old files to be removed. Please delete them manually.

If you saved them in other places, you have to delete them yourself.

Also, if you have any scripts or cron jobs that are running arclog, remember to modify your script for the new arclog location. Of course, you can copy arclog to the original location. It still works.

The Argument of --keep and --override Options Are Required Now

Support for omitting the --keep or --override arguments are removed. This helps to avoid confusion for the log file name and the option arguments.

Options

./arclog [options] logfile… [output]
./arclog [-h|-v]
  • logfile

    The log file to be archived. Specify - to read from STDIN. You can specify multiple log files. gzip, bzip2, or xz compressed files are supported.

  • output

    The prefix of the output files. The output files are named as output.yyyymm, i.e., output.200101, output.200102. If not specified, the default is the same as the log file. You must specify this if you want to read from STDIN. You cannot specify - (STDOUT), since arclog needs a name prefix, not the output file.

  • -c, --compress method

    Specify the compression method for the archived files. Log files usually have large number of similar lines. Compress them saves you lots of disk spaces. (And this is why we want to archive them.) The following compression methods are supported:

    • g, gzip

      Compress with gzip. This is the default. arclog can use IO::Compress::Gzip to compress instead of calling gzip. This can be safer and faster for not calling foreign binaries. If IO::Compress::Gzip is not installed, it tries gzip instead. If gzip is not available, either, it fails.

    • b, bzip2

      Compress with bzip2. arclog can use IO::Compress::Bzip2 to compress instead of calling bzip2. This can be safer and faster for not calling foreign binaries. If IO::Compress::Bzip2 is not installed, it tries bzip2 instead. If bzip2 is not available, either, it fails.

    • x, xz

      Compress with xz. arclog can use IO::Compress::Xz to compress instead of calling xz. This can be safer and faster for not calling foreign binaries. If IO::Compress::Xz is not installed, it tries xz instead. If xz is not available, either, it fails.

    • n, none

      No compression at all. (Why? :p)

  • --nocompress

    Do not compress the archived files. This is equivalent to --compress none.

  • -s, --sort

    Sort the records by time (and then the record order). Sorting eats huge memory and CPU, so it is disabled by default. Refer to the description above for a detailed illustration on sorting.

  • --nosort

    Do not sort the records. This is the default.

  • -o, --override mode

    What to do with the existing archived files. The following modes are supported:

    • o, overwrite

      Overwrite existing target files. You will lose these existing records. Use with care. This is helpful if you are sure the main log file has the most complete records.

    • a, append

      Append the records to the existing target files. You may destroy the log file completely by putting irrelevant entries altogether by accident. Use with care. This is helpful if you append want to merge 2 or more log files, for example, 2 log files of different periods.

    • i, ignore

      Ignore any existing target file, and discard all the records of those months. You will lose these log records. Use with care. This is helpful if you are supplying log records for the missing months, or if you are merging the log records in a complex manner.

    • f, fail

      Stop whenever a target file exists, to prevent destroying existing files by accident. This should be mostly desired when run from some automatic mechanism, like crontab. So, this is the default if no terminal is found at STDIN.

    • ask

      Ask you what to do when a target file exists. This should be mostly desired if you are running arclog interactively. So, this is the default if a terminal is found at STDIN. The answers are read from STDIN. Since you have only one STDIN, you cannot specify this mode if you want read the log file from STDIN. In that case, it falls back to the fail mode. Also, if arclog cannot get its answer from STDIN, for example, on a closed STDIN from crontab, it falls back to fail mode.

  • -k, --keep mode

    What to keep in the source file. Currently, the following modes are supported:

    • a, all

      Keep the source file after records are archived.

    • r, restart

      Restart the source log file after records are archived.

    • d, delete

      Delete the source log file after records are archived.

    • t, this-month

      Archive and strip records of previous months off from the log file. Keep the records of this month in the source log file, to be archived next month. This is designed to be run from crontab monthly, so this is the default.

  • -d, --debug

    Show the detailed debugging messages. More -d to be more detailed.

  • -q, --quiet

    Hush! Only yell on error.

  • -h, --help

    Display the help message and exit.

  • -v, --version

    Output version information and exit.

Documentation

Type perldoc arclog to read the arclog manual.

News, Changes and Updates

Refer to the Changes for changes, bug fixes, updates, new functions, etc.

Support

The arclog project is hosted on GitHub. Address your issues on the GitHub issue tracker https://github.com/imacat/arclog/issues.

Thanks

  • Thanks to Chen-hsiu Huang for reporting the bug that $WORKING_RES was not locked when opened.

  • Thanks to SourceForge for providing compiling farm for projects to test on different platforms.

License

Copyright (C) 2001-2021 imacat.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

imacat ^_*'
2007/12/3
imacat@mail.imacat.idv.tw
https://www.imacat.idv.tw