`chklinks` - A Non-Threaded Perl Link Checker
=============================================


Description
-----------

`chklinks` is a non-threaded Perl link checker. It helps you find
broken links on your website.

`chklinks` differs from [linkchecker] in that `chklinks` is
non-threaded. It does not open many simultaneous connections to do
its job, so it will not exhaust your system resources and crash the
system in a moment. This is certainly more desirable for most
webmasters and users.

`chklinks` respects `robots.txt`. If your `robots.txt` disallows
robots and `chklinks` cannot check your website, you need to allow
`chklinks` explicitly. Add the following lines to your `robots.txt`
file to allow `chklinks`:

    User-agent: chklinks
    Disallow:

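As an illustration, a `robots.txt` that turns away every other robot but still admits `chklinks` might look like the sketch below. It assumes the usual robots exclusion convention that a robot follows the record naming it in preference to the `*` record. The blanket `Disallow: /` rule is a hypothetical example, not something `chklinks` requires.

```text
User-agent: *
Disallow: /

User-agent: chklinks
Disallow:
```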

`chklinks` uses [LWP::RobotUA] and supports the following schemes:
`http`, `https`, `ftp`, `gopher` and `file`. You can also specify a
local file. (To use `https`, you need to install [Crypt::SSLeay].
This is a requirement of LWP::RobotUA.)

`chklinks` supports cookies.

[linkchecker]: https://wummel.github.io/linkchecker
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[Crypt::SSLeay]: https://metacpan.org/release/Crypt-SSLeay


System Requirements
-------------------

1. Perl, version 5.6 or above. I have not successfully run this on
   earlier versions. Please tell me if you can.

   You can run `perl -v` to check your current Perl version. If you
   do not have Perl, or if you have an older version of Perl, you can
   download and install/upgrade it from the [Perl website]. For
   MS-Windows, you can download and install [Strawberry Perl] or
   [ActivePerl].

2. Required Perl modules:

   * [URI]

     This is used to parse and process the found URLs. You can
     download and install URI from the CPAN archive, or install it
     with the CPAN shell:

         cpan URI

     or with the CPANPLUS shell:

         cpanp i URI

     For Debian/Ubuntu:

         sudo apt install liburi-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-URI

     For FreeBSD:

         ports install p5-URI

     For ActivePerl:

         ppm install URI

   * [HTML::LinkExtor]

     This is used to extract links from the web pages.
     HTML::LinkExtor is contained in the [HTML-Parser] distribution.
     You can download and install HTML::LinkExtor from the CPAN
     archive, or install it with the CPAN shell:

         cpan HTML::LinkExtor

     or with the CPANPLUS shell:

         cpanp i HTML::LinkExtor

     For Debian/Ubuntu:

         sudo apt install libhtml-parser-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-HTML-Parser

     For FreeBSD:

         ports install p5-HTML-Parser

     For ActivePerl:

         ppm install HTML-Parser

   * [LWP::RobotUA]

     This is used to request web pages. LWP::RobotUA is contained in
     the [libwww-perl] distribution. You can download and install
     LWP::RobotUA from the CPAN archive, or install it with the CPAN
     shell:

         cpan LWP::RobotUA

     or with the CPANPLUS shell:

         cpanp i LWP::RobotUA

     For Debian/Ubuntu:

         sudo apt install libwww-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-libwww-perl

     For FreeBSD:

         ports install p5-libwww

     For ActivePerl:

         ppm install libwww-perl

3. Optional Perl modules:

   * [Crypt::SSLeay]

     This is needed by LWP::RobotUA to support HTTPS. You can
     download and install Crypt::SSLeay from the CPAN archive, or
     install it with the CPAN shell:

         cpan Crypt::SSLeay

     or with the CPANPLUS shell:

         cpanp i Crypt::SSLeay

     For Debian/Ubuntu:

         sudo apt install libcrypt-ssleay-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-Crypt-SSLeay

     For FreeBSD:

         ports install p5-Crypt-SSLeay

     For ActivePerl:

         ppm install Crypt-SSLeay

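To see which of the modules listed above are already available, a small shell function such as the following sketch can help. The name `check_perl_modules` is made up for this example. It relies only on `perl -MModule -e 1` exiting with a non-zero status when a module cannot be loaded.

```shell
#!/bin/sh
# Report whether each Perl module used by chklinks can be loaded.
# check_perl_modules is a hypothetical helper name, not part of chklinks.
check_perl_modules() {
    for module in URI HTML::LinkExtor LWP::RobotUA Crypt::SSLeay; do
        # "perl -MModule -e 1" loads the module and exits.  A failure
        # means that the module (or perl itself) is not available.
        if perl -M"$module" -e 1 2>/dev/null; then
            echo "$module: installed"
        else
            echo "$module: missing"
        fi
    done
}

check_perl_modules
```

Crypt::SSLeay is optional, so a `missing` result for it only matters if you check `https` URLs.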

[Perl website]: https://www.perl.org
[Strawberry Perl]: https://strawberryperl.com
[ActivePerl]: https://www.activestate.com/products/perl/
[URI]: https://metacpan.org/pod/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[HTML-Parser]: https://metacpan.org/release/HTML-Parser
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[libwww-perl]: https://metacpan.org/release/libwww-perl
[Crypt::SSLeay]: https://metacpan.org/pod/Crypt::SSLeay


Download
--------

`chklinks` is hosted on…

* [chklinks project on GitHub]

* [chklinks project on SourceForge]

You can always download the newest version of `chklinks` from…

* [chklinks download on SourceForge]

* [Tavern IMACAT’s FTP directory]

imacat’s PGP public key is at…

* [imacat’s PGP key at Tavern IMACAT’s]

[chklinks project on GitHub]: https://github.com/imacat/chklinks
[chklinks project on SourceForge]: https://sf.net/p/chklinks
[chklinks download on SourceForge]: https://sourceforge.net/projects/chklinks/files
[Tavern IMACAT’s FTP directory]: https://ftp.imacat.idv.tw/pub/chklinks/
[imacat’s PGP key at Tavern IMACAT’s]: https://www.imacat.idv.tw/me/pgpkey.asc


Install
-------

    % perl Makefile.PL
    % make
    % make test
    % make install

When running `make install`, make sure you have permission to
write to the installation locations. This usually requires `root`
privileges.

If you want to install into another location, you can set
`PREFIX`. For example, to install into your home directory when you
are not `root`:

    % perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of [ExtUtils::MakeMaker] for more
installation options (by running `perldoc ExtUtils::MakeMaker`).

For MS-Windows, since `make` is not universally available,
Module::Build is preferred to ExtUtils::MakeMaker. See the
instructions below.


### Install with [Module::Build]

    % perl Build.PL
    % ./Build
    % ./Build test
    % ./Build install

When running `./Build install`, make sure you have permission to
write to the installation locations. This usually requires `root`
privileges.

If you want to install into another location, you can set
`--prefix`. For example, to install into your home directory when you
are not `root`:

    % perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation
options (by running `perldoc Module::Build`).

[ExtUtils::MakeMaker]: https://metacpan.org/release/ExtUtils-MakeMaker
[Module::Build]: https://metacpan.org/release/Module-Build


Options
-------

    ./chklinks [options] URL1 [URL2 [URL3 …]]
    ./chklinks [-h|-v]

* `-1`, `--onelevel`

  Check the links on this page and stop.

* `-r`, `--recursive`

  Recursively check through this website. This is the default.

* `-b`, `--below`

  Only check the links below this directory. This is the default.

* `-p`, `--parent`

  Trace back to the parent directories.

* `-l`, `--local`

  Only check the links on this same host.

* `-s`, `--span`

  Check the links to other hosts (without recursion). This is the
  default.

* `-e`, `--exclude path`

  Exclude this path. Pages under it are checked for existence, but
  the links on them are not checked, just as if they were on a
  foreign site. Multiple `--exclude` options are OK.

* `-i`, `--include path`

  Include this path. This is the opposite of `--exclude` and cancels
  its effect. The one specified later has the higher priority.

* `-d`, `--debug`

  Display debug messages. Use multiple `--debug` options for more
  debug output.

* `-q`, `--quiet`

  Disable debug messages. This is the opposite of `--debug` and
  cancels its effect.

* `-h`, `--help`

  Display the help message and exit.

* `-v`, `--version`

  Output version information and exit.

* `URL1`, `URL2`, `URL3`

  The URLs of the websites to check.

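The options can be combined. The commands below are a sketch that uses only the options listed above. `www.example.com` and the paths are placeholders, and the exact form that `--exclude` and `--include` expect for their path argument is an assumption here. The `CHKLINKS` variable is a convenience invented for this example, so that the commands are only printed (a dry run) unless you point it at the real script.

```shell
#!/bin/sh
# Dry run by default: each command line is only printed.  Set
# CHKLINKS=./chklinks in the environment to really run the checks.
# (CHKLINKS is a variable made up for this example.)
CHKLINKS=${CHKLINKS:-echo ./chklinks}

# Check only the links found on a single page.
$CHKLINKS --onelevel https://www.example.com/index.html

# Check a whole site, but do not check links to other hosts.
$CHKLINKS --local https://www.example.com/

# Skip the links inside /archive/, except for /archive/2021/.  The later
# --include takes priority over the earlier --exclude.
$CHKLINKS --exclude /archive/ --include /archive/2021/ https://www.example.com/
```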

Notes
-----

* `chklinks` does not obey `Crawl-delay:` in `robots.txt` yet. This
  is a problem in [WWW::RobotRules], not in `chklinks` itself.

* If you encounter warnings like this:

      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.

  This is an issue of [LWP::Protocol] version ≤ 1.43 (in libwww-perl
  version ≤ 5.805) when working with HTML::Parser version ≥ 3.40 and
  Perl version ≥ 5.8. This issue is solved in LWP::Protocol
  version ≥ 1.46 (in libwww-perl version ≥ 5.806). You can upgrade
  your LWP::Protocol to the current version. If you cannot upgrade
  it, see [CPAN RT Bug#20274] for an LWP::Protocol patch on this.

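To find out whether your installation falls in the affected version range, you can print the installed versions. This is a sketch: the function name `lwp_versions` is made up here, while `$LWP::VERSION` and `$LWP::Protocol::VERSION` are the version variables of the `LWP` and `LWP::Protocol` modules.

```shell
#!/bin/sh
# Print the installed libwww-perl and LWP::Protocol versions.
# lwp_versions is a hypothetical helper name used for this sketch.
lwp_versions() {
    perl -MLWP -MLWP::Protocol \
        -e 'print "libwww-perl $LWP::VERSION, LWP::Protocol $LWP::Protocol::VERSION\n"' \
        2>/dev/null || echo "libwww-perl not found"
}

lwp_versions
```

Compare the printed versions with the ones given in the note above to decide whether an upgrade is needed.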

[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[LWP::Protocol]: https://metacpan.org/pod/LWP::Protocol
[CPAN RT Bug#20274]: https://rt.cpan.org/Public/Bug/Display.html?id=20274


Bugs
----

`chklinks` does not support authentication yet. [W3C-LinkChecker]
supports this. As a workaround, you can use the syntax
`http://user:pass@some.where.com/some/path` for Basic Authentication,
but this does not work with Digest Authentication. This practice is
not encouraged. Your password would be visible to anyone on the
system using `ps`, including hidden intruders. Also, what you type
in your shell will be saved to your shell history file.

`mailto:` URLs should be supported by checking the validity of their
DNS/MX records. Bastian Kleineidam’s [linkchecker] has support for
this.

Local file checking has only been tested on Unix and MSWin32. More
platforms should be tested, especially VMS and Mac.

[W3C-LinkChecker]: https://validator.w3.org/checklink
[linkchecker]: https://wummel.github.io/linkchecker


See Also
--------

[LWP::UserAgent], [LWP::RobotUA], [WWW::RobotRules], [URI],
[HTML::LinkExtor], Bastian Kleineidam’s [linkchecker] and
W3C-LinkChecker [checklink].

[LWP::UserAgent]: https://metacpan.org/pod/LWP::UserAgent
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[URI]: https://metacpan.org/release/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[linkchecker]: https://wummel.github.io/linkchecker
[checklink]: https://validator.w3.org/checklink


Release Notes
-------------

Please read `NEWS` for the new features and bug fixes.


Support
-------

The `chklinks` project is hosted on GitHub. Report issues on the
GitHub issue tracker at <https://github.com/imacat/chklinks/issues>.


Thanks
------

* Thanks to [SourceForge] for providing a compile farm for projects to
  test on different platforms.

* Thanks to [Stefan Seifert] for pointing out the redirection loop
  problem when cookies are not enabled. (2005-11-07)

* Thanks to [nsnake] for reporting warnings from [HTML::Parser]
  version >= 3.40 when checking UTF-8 pages. (2007-06-06)

[SourceForge]: https://sf.net
[Stefan Seifert]: mailto:stefan.seifert@atikon.com
[nsnake]: mailto:loveme1314@gmail.com
[HTML::Parser]: https://metacpan.org/pod/HTML::Parser


License
-------

Copyright (C) 2003-2021 imacat.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


imacat ^_*'
2003/5/25
<imacat@mail.imacat.idv.tw>
https://www.imacat.idv.tw