chklinks - A Non-Threaded Perl Link Checker
Description
chklinks is a non-threaded Perl link checker. It helps to find
broken links on your website.
chklinks differs from linkchecker in that chklinks is
non-threaded. It does not open many simultaneous connections to do
its job, so it won’t exhaust your system resources and crash your
system in a moment. This is more desirable for most webmasters and
users.
chklinks respects robots.txt. If you disallow robots from your
website and experience problems, you need to allow chklinks. Add
the following lines to your robots.txt file to allow chklinks:
User-agent: chklinks
Disallow:
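If you maintain robots.txt with a script, the two lines above can be added without duplicating them on repeated runs. The following is a minimal shell sketch, not part of chklinks; the function name allow_chklinks and the file path in the usage comment are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with chklinks): append the allow rule
# for chklinks to a robots.txt file unless the rule is already there.
allow_chklinks() {
    robots_file=$1
    if grep -qi '^User-agent:[[:space:]]*chklinks[[:space:]]*$' "$robots_file" 2>/dev/null; then
        echo "already present"
    else
        # A leading blank line separates the new record from earlier ones.
        printf '\nUser-agent: chklinks\nDisallow:\n' >> "$robots_file"
        echo "added"
    fi
}

# Usage (the path is an assumption):
#   allow_chklinks /var/www/html/robots.txt
```

An empty Disallow: line means nothing is disallowed for that user agent, which is why the record grants chklinks full access.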
chklinks uses LWP::RobotUA and supports the following schemes:
http, https, ftp, gopher and file. You can also specify a
local file. (To use https, you need to install Crypt::SSLeay.
This is a requirement of LWP::RobotUA.)
chklinks supports cookies.
System Requirement
- Perl, version 5.6 or above. I have not successfully run this on earlier versions. Please tell me if you can. You can run
      perl -v
  to check your current Perl version. If you do not have Perl, or if you have an older version of Perl, you can download and install/upgrade it from the Perl website. For MS-Windows, you can download and install Strawberry Perl or ActivePerl.
- Required Perl modules:
  - URI
    This is used to parse and process the found URLs. You can download and install URI from the CPAN archive, or install it with the CPAN shell:
        cpan URI
    or with the CPANPLUS shell:
        cpanp i URI
    For Debian/Ubuntu:
        sudo apt install liburi-perl
    For Red Hat/Fedora/CentOS:
        sudo yum install perl-URI
    For FreeBSD:
        ports install p5-URI
    For ActivePerl:
        ppm install URI
  - HTML::LinkExtor
    This is used to extract links from the web pages. HTML::LinkExtor is contained in the HTML-Parser distribution. You can download and install HTML::LinkExtor from the CPAN archive, or install it with the CPAN shell:
        cpan HTML::LinkExtor
    or with the CPANPLUS shell:
        cpanp i HTML::LinkExtor
    For Debian/Ubuntu:
        sudo apt install libhtml-parser-perl
    For Red Hat/Fedora/CentOS:
        sudo yum install perl-HTML-Parser
    For FreeBSD:
        ports install p5-HTML-Parser
    For ActivePerl:
        ppm install HTML-Parser
  - LWP::RobotUA
    This is used to request web pages. LWP::RobotUA is contained in the libwww-perl distribution. You can download and install LWP::RobotUA from the CPAN archive, or install it with the CPAN shell:
        cpan LWP::RobotUA
    or with the CPANPLUS shell:
        cpanp i LWP::RobotUA
    For Debian/Ubuntu:
        sudo apt install libwww-perl
    For Red Hat/Fedora/CentOS:
        sudo yum install perl-libwww-perl
    For FreeBSD:
        ports install p5-libwww
    For ActivePerl:
        ppm install libwww-perl
- Optional Perl modules:
  - Crypt::SSLeay
    This is needed by LWP::RobotUA to support HTTPS. You can download and install Crypt::SSLeay from the CPAN archive, or install it with the CPAN shell:
        cpan Crypt::SSLeay
    or with the CPANPLUS shell:
        cpanp i Crypt::SSLeay
    For Debian/Ubuntu:
        sudo apt install libcrypt-ssleay-perl
    For Red Hat/Fedora/CentOS:
        sudo yum install perl-Crypt-SSLeay
    For FreeBSD:
        ports install p5-Crypt-SSLeay
    For ActivePerl:
        ppm install Crypt-SSLeay
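Before installing, you may want to confirm that the requirements listed above are met. The following is a minimal shell sketch, not part of chklinks; the helper name check_module is an assumption made for illustration, and the checks use whichever perl is first in your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with chklinks): report whether a Perl
# module can be loaded by the perl found in PATH.
check_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# $] holds the Perl version as a number; 5.006 corresponds to Perl 5.6.
if perl -e 'exit($] >= 5.006 ? 0 : 1)' 2>/dev/null; then
    echo "Perl version: OK"
else
    echo "Perl version: too old, or perl not found"
fi

# The three required modules, then the optional one for HTTPS.
for module in URI HTML::LinkExtor LWP::RobotUA Crypt::SSLeay; do
    check_module "$module"
done
```

A "missing" result for Crypt::SSLeay only matters if you need to check https links.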
Download
chklinks is hosted on…
You can always download the newest version of chklinks from…
imacat’s PGP public key is at…
Install
% perl Makefile.PL
% make
% make test
% make install
When running make install, make sure you have permission to
write to the installation locations. This usually requires root
privileges.
If you want to install into another location, you can set the
PREFIX. For example, to install into your home directory when you
are not root:
% perl Makefile.PL PREFIX=/home/jessica
Refer to the documentation of ExtUtils::MakeMaker for more
installation options (by running perldoc ExtUtils::MakeMaker).
For MS-Windows, since make is not universally available,
Module::Build is preferred to ExtUtils::MakeMaker. See the
instructions below.
Install with Module::Build
% perl Build.PL
% ./Build
% ./Build test
% ./Build install
When running ./Build install, make sure you have permission to
write to the installation locations. This usually requires root
privileges.
If you want to install into another location, you can set the
--prefix. For example, to install into your home directory when you
are not root:
% perl Build.PL --prefix=/home/jessica
Refer to the documentation of Module::Build for more installation
options (by running perldoc Module::Build).
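After installing with a custom prefix, the chklinks script may not be on your PATH. The following is a minimal shell sketch, assuming the script ends up in the bin directory under the prefix; that is the usual layout for both installers, but it is not confirmed here. The helper name add_to_path is also an assumption, and /home/jessica is the example prefix from above.

```shell
#!/bin/sh
# Example prefix from the install instructions above.
prefix=/home/jessica

# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed, so running this twice does not duplicate the entry.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumption: the installer placed the script in $prefix/bin.
add_to_path "$prefix/bin"
export PATH

# Show where the shell now finds chklinks, if anywhere.
command -v chklinks || echo "chklinks not found in PATH"
```

To make the change permanent, the same lines can go into your shell startup file.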
Options
./chklinks [options] URL1 [URL2 [URL3 …]]
./chklinks [-h|-v]
-1, --onelevel
    Check the links on this page and stop.
-r, --recursive
    Recursively check through this website. This is the default.
-b, --below
    Only check the links below this directory. This is the default.
-p, --parent
    Trace back to the parent directories.
-l, --local
    Only check the links on this same host.
-s, --span
    Check the links to other hosts (without recursion). This is the default.
-e, --exclude path
    Exclude this path. Pages under it are checked for existence, but the links on them are not checked, just as if they were on a foreign site. Multiple --exclude options are OK.
-i, --include path
    Include this path. The opposite of --exclude; it cancels its effect. When both match, the one specified later has higher priority.
-d, --debug
    Display debug messages. Specify --debug multiple times to debug more.
-q, --quiet
    Disable debug messages. The opposite of --debug; it cancels its effect.
-h, --help
    Display the help message and exit.
-v, --version
    Output version information and exit.
URL1, URL2, URL3
    The URLs of the websites to check.
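A few typical invocations, combining the options above, might look like the following shell sketch. The URLs are placeholders, the run helper is hypothetical (it prints each command unless RUN=1 is set), and the form of the --exclude argument is an assumption, since the text above only calls it a path.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of chklinks): print the command,
# or execute it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Check only the links on a single page.
run ./chklinks -1 https://www.example.com/index.html

# Check the site recursively, but only the links on the same host.
run ./chklinks -l https://www.example.com/

# Exclude one path from link extraction (argument form is an assumption).
run ./chklinks -e /archive/ https://www.example.com/

# Check two sites with more verbose debug output.
run ./chklinks -d -d https://www.example.com/ https://www.example.org/
```

Since --recursive, --below and --span are the defaults, they do not need to be given explicitly.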
Notes
- chklinks does not obey Crawl-delay: in robots.txt yet. This is a problem in WWW::RobotRules, but not chklinks itself.
- If you encounter warnings like this:
      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.
  This is an issue of LWP::Protocol version ≤ 1.43 (in libwww-perl version ≤ 5.805) when working with HTML::Parser version ≥ 3.40 and Perl version ≥ 5.8. This issue is solved in LWP::Protocol version ≥ 1.46 (in libwww-perl version ≥ 5.806). You can upgrade your LWP::Protocol to the current version. If you cannot upgrade it, see CPAN RT Bug#20274 for an LWP::Protocol patch on this.
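To see whether your installation falls into the affected version range, you can print the versions of the modules involved. The following is a minimal shell sketch, not part of chklinks; the helper name module_version is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with chklinks): print the version of
# an installed Perl module, or "not installed" if it cannot be loaded.
module_version() {
    perl -e 'my $m = shift; eval "require $m; 1" or exit 1; print $m->VERSION, "\n"' "$1" 2>/dev/null \
        || echo "not installed"
}

echo "Perl:          $(perl -e 'print $]' 2>/dev/null)"
echo "LWP::Protocol: $(module_version LWP::Protocol)"
echo "HTML::Parser:  $(module_version HTML::Parser)"
# The LWP module carries the version of the libwww-perl distribution.
echo "libwww-perl:   $(module_version LWP)"
```

Compare the printed numbers with the ranges given in the note above to decide whether an upgrade or the patch is needed.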
Bugs
chklinks does not support authentication yet. W3C-LinkChecker
supports this. As a workaround, you can use the syntax
http://user:pass@some.where.com/some/path for Basic Authentication,
but this does not work with Digest Authentication. This practice is
not encouraged. Your password would be visible to anyone on the
system using ps, including hidden intruders. Also, what you type
in your shell will be saved to your shell history file.
mailto: URLs should be supported by checking the validity of their
DNS/MX records. Bastian Kleineidam's linkchecker supports
this.
Local file checking has only been tested on Unix and MSWin32. More platforms should be tested, especially VMS and Mac.
See Also
LWP::UserAgent, LWP::RobotUA, WWW::RobotRules, URI, HTML::LinkExtor, Bastian Kleineidam’s linkchecker and W3C-LinkChecker checklink.
Release Notes
Please read the NEWS file for new features and bug fixes.
Support
The chklinks project is hosted on GitHub. Address your issues on the
GitHub issue tracker https://github.com/imacat/chklinks/issues.
Thanks
- Thanks to SourceForge for providing a compile farm for projects to test on different platforms.
- Thanks to Stefan Seifert for pointing out the redirection loop problem when cookies are not activated. (2005-11-07)
- Thanks to nsnake for reporting warnings from HTML::Parser version >= 3.40 when checking UTF-8 pages. (2007-06-06)
License
Copyright (C) 2003-2021 imacat.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
imacat ^_*'
2003/5/25
imacat@mail.imacat.idv.tw
https://www.imacat.idv.tw