`arclog` - Archive the Log Files Monthly
========================================
Description
-----------
`arclog` archives the log files monthly. It strips the log records of
previous months from the log file, and saves them to compressed
archive files named `logfile.yyyymm`. This saves disk space and
prevents potential attacks on log files.

Currently, `arclog` supports the [Apache] access log, Syslog, [NTP],
Apache 1 SSL engine log, and my own bracketed, modified ISO date/time
log file formats, and the gzip and bzip2 compression methods. Several
software projects log (or can log) in a format compatible with the
Apache access log, such as [CUPS], [ProFTPD], and [Pure-FTPd], and
`arclog` can archive their Apache-like log files, too.

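
As an illustration of the `logfile.yyyymm` naming, here is a minimal Python sketch. It is not taken from `arclog` itself (which is written in Perl); the function names `archive_name` and `split_by_month` and the sample records are hypothetical, and only the Apache access log timestamp is handled.

```python
import re
from collections import defaultdict

# Month abbreviations as they appear in Apache access log timestamps.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches the date part of a bracketed timestamp such as
# [10/Oct/2000:13:55:36 -0700].
TIMESTAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")


def archive_name(prefix, line):
    """Return the archive name `prefix.yyyymm` for one log record."""
    match = TIMESTAMP.search(line)
    if match is None or match.group(2) not in MONTHS:
        raise ValueError("no Apache-style timestamp in: " + line)
    year = int(match.group(3))
    month = MONTHS[match.group(2)]
    return "{}.{:04d}{:02d}".format(prefix, year, month)


def split_by_month(prefix, lines):
    """Group log records by the archive file they would belong to."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[archive_name(prefix, line)].append(line)
    return dict(buckets)


# Hypothetical sample records.
records = [
    '127.0.0.1 - - [30/Sep/2000:23:59:59 +0800] "GET / HTTP/1.0" 200 2326',
    '127.0.0.1 - - [01/Oct/2000:00:00:01 +0800] "GET /a HTTP/1.0" 200 512',
    '127.0.0.1 - - [10/Oct/2000:13:55:36 +0800] "GET /b HTTP/1.0" 404 209',
]
months = split_by_month("access_log", records)
for name in sorted(months):
    print(name, len(months[name]))
```

The sketch buckets each record by the date as written in the log line. Compression and the handling of the source log file are left out.
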
Caution
=======
* *Archival takes time*. To reduce the time spent occupying the source
  log file, `arclog` first copies the content of the source log file
  to a temporary working file and restarts the source log file. Then
  `arclog` can take its time working on the temporary working file.
  However, please note:

  1. If you have a huge log file (several hundred MB), merely
     copying it still takes a lot of time. You had better stop
     logging first, archive the log file, and then restart logging,
     to avoid race conditions in writing. If you archive the log file
     periodically, it should not grow too big.

  2. If `arclog` stops in the middle of the execution, it leaves a
     temporary working file behind. The next time `arclog` runs, it
     stops when it sees that temporary working file. You have to
     process that temporary working file first. It is merely a copy
     of the original log file, so you can rename it and archive it
     like an ordinary log file to solve this.

* Do not sort unless you have a particular reason. Sorting has the
  following potential problems:

  1. Sorting may *eat huge memory* on large log files. The amount of
     memory required depends on the number of records in each
     archived month. Modern Linux and MS-Windows kill processes that
     eat too much memory, but that still takes minutes, and your
     system hangs in the meantime. I do not know about other
     operating systems. Try at your own risk.

  2. The time unit of all recognized log formats is the *second*. Log
     records that occur within the same second are sorted by the log
     file order (if you are archiving several log files at a time)
     and then by the log record order. I try to ensure that the
     sorted archived records are in the correct order of events, but
     I cannot guarantee it. You have to watch out if the order within
     a second is important.

* Be careful with Syslog and NTP log files: Syslog and NTP do not
  record the year. `arclog` uses [Date::Parse] to parse the date,
  which, when the year is missing, assumes that the date falls within
  the twelve months ending with the current month. For example, if
  today is 2001/6/8, it assumes that the date is between 2000/7/1 and
  2001/6/30. This is fair. However, if you do have a Syslog or NTP
  log file with records older than one year, do not use `arclog`. It
  will destroy your log file.

* If you read from `STDIN`, please note:

  1. You *must* specify the output prefix if you want to read from
     `STDIN`, since `arclog` needs an output pathname prefix, not an
     output file.

  2. `STDIN` cannot be deleted, restarted or partially kept. If you
     read from `STDIN`, the keep mode is always "keep all". If you
     archive several source log files including `STDIN`, the keep
     mode is "keep all" for all source log files, to prevent
     disaster.

  3. The answers in the `ask` mode are read from `STDIN`, too. Since
     you have only one `STDIN`, you cannot specify the `ask` mode
     while reading from `STDIN`. It falls back to the `fail` mode in
     that case.

* I suggest that you install [File::MMagic] instead of counting on the
  `file` executable. The internal magic file of File::MMagic works
  better than the `file` executable. `arclog` treats everything that
  is not gzip or bzip2 compressed as plain text. When a compressed
  log file is wrongly recognized as an image, `arclog` treats it as
  plain text, reads directly from it, and fails. This does not hurt
  the source log files, but is still annoying.

[Date::Parse]: https://metacpan.org/release/TimeDate
[File::MMagic]: https://metacpan.org/release/File-MMagic
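
The year guess described above for Syslog and NTP records can be sketched in Python as follows. This only mimics the behavior as this document describes it and is not the Date::Parse code; the function name `infer_year` is hypothetical.

```python
from datetime import date


def infer_year(month, today):
    """Guess the year of a record whose timestamp has no year.

    Months up to and including the current month are placed in the
    current year; later months are placed in the previous year.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month <= today.month:
        return today.year
    return today.year - 1


today = date(2001, 6, 8)
print(infer_year(6, today))  # a June record is placed in 2001
print(infer_year(7, today))  # a July record is placed in 2000
```

Under this rule a record actually written in July 1999 would also be placed in July 2000, which is why log files with records older than one year are not safe to archive this way.
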
System Requirement
------------------
1. Perl, version 5.8.0 or above. `arclog` uses 3-argument `open()` to
   duplicate file handles, which is only supported since 5.8.0. I
   have not successfully ported this to earlier versions yet. Please
   tell me if you made it.

   You can run `perl -v` to check your current Perl version. If you
   do not have Perl, or if you have an older version of Perl, you can
   download and install/upgrade it from the [Perl website]. For
   MS-Windows, you can download and install [Strawberry Perl] or
   [ActivePerl].

2. Required Perl modules:

   * [Date::Parse]

     This is used to parse the timestamp of the log records. You can
     download and install Date::Parse from the CPAN archive, or
     install it with the CPAN shell:

         cpan Date::Parse

     or with the CPANPLUS shell:

         cpanp i Date::Parse

     For Debian/Ubuntu:

         sudo apt install libtimedate-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-TimeDate

     For FreeBSD:

         ports install p5-TimeDate

     For ActivePerl:

         ppm install TimeDate

3. Optional Perl modules:

   * [File::MMagic]

     This is used to check the file type. If this is not available,
     `arclog` tries the `file` executable instead. If that is not
     available either, `arclog` judges the file type by its name
     suffix (extension). In that case `arclog` fails when reading
     from `STDIN`. You can download and install File::MMagic from the
     CPAN archive, or install it with the CPAN shell:

         cpan File::MMagic

     or with the CPANPLUS shell:

         cpanp i File::MMagic

     For Debian/Ubuntu:

         sudo apt install libfile-mmagic-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-File-MMagic

     For FreeBSD:

         ports install p5-File-MMagic

     For ActivePerl:

         ppm install File-MMagic

     The alternative `file.exe` for MS-Windows can be obtained from
     the [GnuWin32] home page. Be sure to save it as `file.exe`
     somewhere in your `PATH`.

   * [Compress::Zlib]

     This is used to support reading/writing gzip compressed files.
     It is only needed when gzip compressed files are encountered.
     If it is not available, `arclog` tries the `gzip` executable
     instead. If that is not available either, `arclog` fails.
     Compress::Zlib comes with Perl since version 5.9.3. If your Perl
     does not include it, you can download and install it from the
     CPAN archive, or install it with the CPAN shell:

         cpan Compress::Zlib

     or with the CPANPLUS shell:

         cpanp i Compress::Zlib

     For Debian/Ubuntu:

         sudo apt install libio-compress-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-IO-Compress

     For FreeBSD:

         ports install p5-IO-Compress

     For ActivePerl:

         ppm install IO-Compress

     The alternative `gzip.exe` for MS-Windows can be obtained from
     [the gzip website]. Be sure to save it as `gzip.exe` somewhere
     in your `PATH`.

   * [Compress::Bzip2] version 2 or above.

     This is used to support reading/writing bzip2 compressed files.
     It is only needed when bzip2 compressed files are encountered.
     If it is not available, `arclog` tries the `bzip2` executable
     instead. If that is not available either, `arclog` fails.
     Notice that versions before 2 do not work, since file I/O
     compression was not implemented yet. You can download and
     install Compress::Bzip2 from the CPAN archive, or install it
     with the CPAN shell:

         cpan Compress::Bzip2

     or with the CPANPLUS shell:

         cpanp i Compress::Bzip2

     For Debian/Ubuntu:

         sudo apt install libcompress-bzip2-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-Compress-Bzip2

     For FreeBSD:

         ports install p5-Compress-Bzip2

     For ActivePerl:

         ppm install Compress-Bzip2

     The alternative `bzip2.exe` for MS-Windows can be obtained from
     [the bzip2 website]. Be sure to save it as `bzip2.exe` somewhere
     in your `PATH`.

   * [Term::ReadKey]

     This is used to display the progress bar. The progress bar is a
     good visual feedback of what `arclog` is currently doing, but
     `arclog` is safe without it. You can download and install
     Term::ReadKey from the CPAN archive, or install it with the
     CPAN shell:

         cpan Term::ReadKey

     or with the CPANPLUS shell:

         cpanp i Term::ReadKey

     For Debian/Ubuntu:

         sudo apt install libterm-readkey-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-TermReadKey

     For FreeBSD:

         ports install p5-Term-ReadKey

     For ActivePerl:

         ppm install TermReadKey

[Perl website]: https://www.perl.org
[Strawberry Perl]: https://strawberryperl.com
[ActivePerl]: https://www.activestate.com/products/perl/
[Date::Parse]: https://metacpan.org/release/TimeDate
[File::MMagic]: https://metacpan.org/release/File-MMagic
[GnuWin32]: http://gnuwin32.sourceforge.net
[Compress::Zlib]: https://metacpan.org/pod/Compress::Zlib
[the gzip website]: https://www.gzip.org
[Compress::Bzip2]: https://metacpan.org/release/Compress-Bzip2
[the bzip2 website]: http://www.bzip.org
[Term::ReadKey]: https://metacpan.org/release/TermReadKey
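
The file type check mentioned above separates gzip, bzip2 and plain text. The Python sketch below shows one way to tell them apart by their well-known leading bytes. It is not how File::MMagic or the `file` executable work internally, and the function name `detect_compression` is hypothetical.

```python
import bz2
import gzip

# Leading bytes that identify the two compressed formats.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(head):
    """Classify content by its first bytes: gzip, bzip2 or plain."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    # Everything else is treated as plain text.
    return "plain"


sample = b"Jun  8 12:00:00 host daemon: started\n"
print(detect_compression(gzip.compress(sample)))
print(detect_compression(bz2.compress(sample)))
print(detect_compression(sample))
```

Because the check looks at content rather than the file name, it also works on data read from a pipe, where there is no name suffix to inspect.
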
Download
--------
`arclog` is hosted on:

* [arclog project on GitHub]
* [arclog project on SourceForge]

You can always download the newest version of `arclog` from:

* [arclog download on SourceForge]
* [Tavern IMACAT's FTP directory]

imacat's PGP public key is at:

* [imacat's PGP key at Tavern IMACAT's]

[arclog project on GitHub]: https://github.com/imacat/arclog
[arclog project on SourceForge]: https://sf.net/p/arclog
[arclog download on SourceForge]: https://sourceforge.net/projects/arclog/files
[Tavern IMACAT's FTP directory]: https://ftp.imacat.idv.tw/pub/arclog/
[imacat's PGP key at Tavern IMACAT's]: https://www.imacat.idv.tw/me/pgpkey.asc

Install
-------
If you are upgrading from `arclog.pl` 2.1.1dev4 or earlier, please
read the upgrade instructions later in this document.

### Install with [ExtUtils::MakeMaker]

    % perl Makefile.PL
    % make
    % make test
    % make install

When running `make install`, make sure you have the privilege to
write to the installation locations. This usually requires the `root`
privilege.

If you want to install into another location, you can set the
`PREFIX`. For example, to install into your home directory when you
are not `root`:

    % perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of ExtUtils::MakeMaker for more
installation options (by running `perldoc ExtUtils::MakeMaker`).

For MS-Windows, since `make` is not universally available,
Module::Build is preferred to ExtUtils::MakeMaker. See the
instructions below.

### Install with [Module::Build]

    % perl Build.PL
    % ./Build
    % ./Build test
    % ./Build install

When running `./Build install`, make sure you have the privilege to
write to the installation locations. This usually requires the `root`
privilege.

If you want to install into another location, you can set the
`--prefix`. For example, to install into your home directory when you
are not `root`:

    % perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation
options (by running `perldoc Module::Build`).

[ExtUtils::MakeMaker]: https://metacpan.org/release/ExtUtils-MakeMaker
[Module::Build]: https://metacpan.org/release/Module-Build
Upgrade Instruction
-------------------
Here are a few hints for people upgrading from 2.1.1dev4 or earlier:

### The Script Name Is Changed from `arclog.pl` to `arclog`

This is obvious. If you have any scripts or cron jobs that are
running `arclog`, remember to modify them for the new name. Of
course, you can rename `arclog` to `arclog.pl`. It still works.

The reason I changed the script and project name is that a dot `.`
in the project name is not valid everywhere. At least SourceForge
doesn't accept it. Besides, `arclog` is enough for a script name
under UNIX. The `.pl` file name suffix/extension may be convenient on
MS-Windows, but MS-Windows users won't run it with Explorer file name
association anyway, and there is `pl2bat` to convert `arclog` to
`arclog.bat`, which would make more sense. The only disadvantage was
that I was using `UltraEdit`, which depends on the file name extension
for the syntax highlighting rules. I could set it manually anyway.
I'm using `gedit` on Linux now, so this is not a problem anymore.

### The Default Installation Location Is at `/usr/bin`

Also, the man page is at `/usr/share/man/man1/arclog.1`. This is to
follow Perl's standard convention, and to avoid breaking
ExtUtils::MakeMaker with future versions.

When you run `perl Makefile.PL` or `perl Build.PL`, it lists the
existing old files to be removed. Please delete them manually.

If you saved them in other places, you have to find and delete them
yourself.

Also, if you have any scripts or cron jobs that are running `arclog`,
remember to modify them for the new `arclog` location. Of course,
you can copy `arclog` to the original location. It still works.

### The Arguments of the `--keep` and `--override` Options Are Required Now

Support for omitting the `--keep` or `--override` argument has been
removed. This helps to avoid confusion between the log file name and
the option arguments.

Options
-------

    ./arclog [options] logfile… [output]
    ./arclog [-h|-v]

* `logfile`

  The log file to be archived. Specify `-` to read from `STDIN`.
  You can specify multiple log files. `gzip` or `bzip2` compressed
  files are supported.

* `output`

  The prefix of the output files. The output files are named as
  `output.yyyymm`, e.g., `output.200101`, `output.200102`. If not
  specified, the default is the same as the log file. You must
  specify this if you want to read from `STDIN`. You cannot specify
  `-` (`STDOUT`), since `arclog` needs a name prefix, not the output
  file.

* `-c`, `--compress method`

  Specify the compression method for the archived files. Log files
  usually have a large number of similar lines. Compressing them
  saves you a lot of disk space. (And this is why we want to archive
  them.) The following compression methods are supported:

  * `g`, `gzip`

    Compress with `gzip`. This is the default. `arclog` can use
    `Compress::Zlib` to compress instead of calling `gzip`. This can
    be safer and faster since it does not call foreign binaries. If
    `Compress::Zlib` is not installed, it tries `gzip` instead. If
    `gzip` is not available either, it fails.

  * `b`, `bzip2`

    Compress with `bzip2`. `arclog` can use `Compress::Bzip2` to
    compress instead of calling `bzip2`. This can be safer and faster
    since it does not call foreign binaries. If `Compress::Bzip2` is
    not installed, it tries `bzip2` instead. If `bzip2` is not
    available either, it fails.

  * `n`, `none`

    No compression at all. (Why? :p)

* `--nocompress`

  Do not compress the archived files. This is equivalent to
  `--compress none`.

* `-s`, `--sort`

  Sort the records by time (and then the record order). Sorting eats
  huge memory and CPU, so it is disabled by default. Refer to the
  Caution section above for details on sorting.

* `--nosort`

  Do not sort the records. This is the default.

* `-o`, `--override mode`

  What to do with the existing archived files. The following modes
  are supported:

  * `o`, `overwrite`

    Overwrite existing target files. You will lose these existing
    records. Use with care. This is helpful if you are sure the
    main log file has the most complete records.

  * `a`, `append`

    Append the records to the existing target files. You may destroy
    the log file completely by mixing in irrelevant entries by
    accident. Use with care. This is helpful if you want to merge
    two or more log files, for example, two log files of different
    periods.

  * `i`, `ignore`

    Ignore any existing target file, and discard all the records of
    those months. You will lose these log records. Use with care.
    This is helpful if you are supplying log records for the missing
    months, or if you are merging the log records in a complex manner.

  * `f`, `fail`

    Stop whenever a target file exists, to prevent destroying existing
    files by accident. This is usually what you want when `arclog`
    is run from some automatic mechanism, like `crontab`. So, this is
    the default if no terminal is found at `STDIN`.

  * `ask`

    Ask you what to do when a target file exists. This is usually
    what you want if you are running `arclog` interactively. So,
    this is the default if a terminal is found at `STDIN`. The
    answers are read from `STDIN`. Since you have only one `STDIN`,
    you cannot specify this mode if you want to read the log file
    from `STDIN`. In that case, it falls back to the `fail` mode.
    Also, if `arclog` cannot get its answer from `STDIN`, for example,
    on a closed `STDIN` from `crontab`, it falls back to the `fail`
    mode.

* `-k`, `--keep mode`

  What to keep in the source file. Currently, the following modes are
  supported:

  * `a`, `all`

    Keep the source file after records are archived.

  * `r`, `restart`

    Restart the source log file after records are archived.

  * `d`, `delete`

    Delete the source log file after records are archived.

  * `t`, `this-month`

    Archive and strip records of previous months off from the log
    file. Keep the records of this month in the source log file, to
    be archived next month. This is designed to be run from `crontab`
    monthly, so this is the default.

* `-d`, `--debug`

  Show the detailed debugging messages. Repeat `-d` for more detail.

* `-q`, `--quiet`

  Hush! Only yell on error.

* `-h`, `--help`

  Display the help message and exit.

* `-v`, `--version`

  Output version information and exit.

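
The ordering used by `--sort` (time first, then log file order, then record order) can be sketched in Python as follows. This is only an illustration of that ordering and not the code in `arclog`; the function name `sort_records` and the sample records are hypothetical.

```python
from datetime import datetime


def sort_records(files):
    """Sort records from several files by time, file order, record order.

    `files` is a list of files in command-line order. Each file is a
    list of (timestamp, text) pairs in the order they were logged.
    """
    keyed = []
    for file_index, records in enumerate(files):
        for record_index, (stamp, text) in enumerate(records):
            keyed.append((stamp, file_index, record_index, text))
    # Records within the same second keep their file and record order.
    keyed.sort(key=lambda item: item[:3])
    return [item[3] for item in keyed]


noon = datetime(2001, 6, 8, 12, 0, 0)
one_second_later = datetime(2001, 6, 8, 12, 0, 1)
first_file = [(noon, "a1"), (one_second_later, "a2"), (noon, "a3")]
second_file = [(noon, "b1")]
print(sort_records([first_file, second_file]))
# prints ['a1', 'a3', 'b1', 'a2']
```

All records are held in memory at once, which matches the caution above that sorting large log files needs a lot of memory.
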
Documentation
-------------
Type `perldoc arclog` to read the `arclog` manual.
News, Changes and Updates
-------------------------
Refer to the `Changes` file for changes, bug fixes, updates, new
functions, etc.
Support
-------
The `arclog` project is hosted on GitHub. Address your issues on the
GitHub issue tracker https://github.com/imacat/arclog/issues.
Thanks
------
* Thanks to [Chen-hsiu Huang] for reporting the bug that
  `$WORKING_RES` was not locked when opened.
* Thanks to [SourceForge] for providing the compile farm for projects
  to test on different platforms.

[Chen-hsiu Huang]: mailto:chenhsiu@gens.dhs.org
[SourceForge]: https://sf.net
License
-------
Copyright (C) 2001-2021 imacat.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
imacat ^_*'
2007/12/3
<imacat@mail.imacat.idv.tw>
https://www.imacat.idv.tw