chklinks - A Non-Threaded Perl Link Checker

Description

chklinks is a non-threaded Perl link checker. It helps you find broken links on your website.

chklinks differs from linkchecker in that chklinks is non-threaded. It does not open many simultaneous connections to do its job, so it will not exhaust system resources and suddenly crash your system. For most webmasters and users, this is the more desirable behavior.
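The non-threaded approach can be pictured as a single loop over a first-in, first-out queue: one URL is checked at a time, and newly found links are appended to the end of the queue. The sketch below is a hypothetical illustration, not chklinks' actual code. It replaces network requests with an in-memory `%site` hash (an assumption made so the example runs offline) and treats a URL with no entry in that hash as a broken link.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory "website": each page maps to the links found on it.
# A real checker would fetch each URL over the network instead.
my %site = (
    '/'       => [ '/a.html', '/b.html' ],
    '/a.html' => [ '/b.html', '/missing.html' ],
    '/b.html' => [ '/' ],
);

# Check URLs one at a time from a FIFO queue, so at most one
# "request" is outstanding at any moment: no threads, no parallelism.
sub check_sequentially {
    my ($start, $site) = @_;
    my (@queue, %seen, @order, @broken);
    push @queue, $start;
    $seen{$start} = 1;
    while (defined(my $url = shift @queue)) {
        push @order, $url;
        # A missing key stands in for a failed request (e.g. HTTP 404).
        unless (exists $site->{$url}) {
            push @broken, $url;
            next;
        }
        for my $link (@{ $site->{$url} }) {
            next if $seen{$link}++;    # each URL is queued only once
            push @queue, $link;
        }
    }
    return (\@order, \@broken);
}

my ($order, $broken) = check_sequentially('/', \%site);
print "checked: @$order\n";
print "broken:  @$broken\n";
```

Because only one request is in flight at a time, the load on the checked server and on your own machine stays bounded however large the site is; the trade-off is that a large site takes longer to check.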

chklinks respects robots.txt. If your robots.txt disallows robots and this prevents chklinks from checking your website, you need to allow chklinks explicitly. Add the following lines to your robots.txt file:

User-agent: chklinks
Disallow:
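To confirm that such a robots.txt lets chklinks in while still blocking other robots, you can feed it to WWW::RobotRules, the module LWP::RobotUA uses to evaluate robots.txt. This sketch assumes a site-wide `Disallow: /` for all other robots; the host `www.example.com` and the robot name `SomeOtherBot` are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::RobotRules;

# Other robots are shut out of the whole site; the second record
# (the two lines shown above) lets chklinks in.
my $robots_txt = <<'END';
User-agent: *
Disallow: /

User-agent: chklinks
Disallow:
END

# Build the rule set that a robot with the given name would apply.
sub rules_for {
    my ($agent, $txt) = @_;
    my $rules = WWW::RobotRules->new($agent);
    $rules->parse('http://www.example.com/robots.txt', $txt);
    return $rules;
}

my $page     = 'http://www.example.com/index.html';
my $chklinks = rules_for('chklinks', $robots_txt);
my $other    = rules_for('SomeOtherBot', $robots_txt);

printf "chklinks:     %s\n", $chklinks->allowed($page) ? 'allowed' : 'disallowed';
printf "SomeOtherBot: %s\n", $other->allowed($page)    ? 'allowed' : 'disallowed';
```

The empty `Disallow:` line in the chklinks record means nothing is disallowed for that robot, so `allowed` returns true for chklinks and false for every robot that falls back to the `User-agent: *` record.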

chklinks uses LWP::RobotUA and supports the following schemes: http, https, ftp, gopher, and file. You can also specify a local file. (To use https, you need to install Crypt::SSLeay, which LWP::RobotUA requires for HTTPS.)

chklinks supports cookies.
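The agent setup described above can be approximated with LWP::RobotUA directly. This is a sketch of how such an agent might be configured, not chklinks' own source: the agent name and From address are placeholders (LWP::RobotUA requires both), `cookie_jar({})` keeps cookies in memory for a single run, and `is_protocol_supported` reports which of the listed schemes your LWP installation can handle. Whether https shows as supported depends on which SSL modules you have installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Placeholder identity: LWP::RobotUA insists on an agent name and a
# From address containing "@".
my $ua = LWP::RobotUA->new(
    agent => 'chklinks-example/0.1',
    from  => 'webmaster@example.com',
);

# An empty hash reference makes LWP create an in-memory HTTP::Cookies jar.
$ua->cookie_jar({});

# Ask LWP which of the schemes listed above it can handle on this system.
for my $scheme (qw(http https ftp gopher file)) {
    printf "%-7s %s\n", $scheme,
        $ua->is_protocol_supported($scheme) ? 'supported' : 'not supported';
}
```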

System Requirement

  1. Perl, version 5.6 or above. I have not successfully run this on earlier versions. Please tell me if you can. You can run perl -v to see your current Perl version. If you do not have Perl, or if you have an older version of Perl, you can download and install/upgrade it from the Perl website. If you are using MS-Windows, you can download and install ActiveState ActivePerl.

  2. Required Perl modules:

    • URI

      This is used to parse and process the found URLs. You can download and install URI from the CPAN archive, or install it with the CPAN shell:

      cpan URI
      

      or with the CPANPLUS shell:

      cpanp i URI
      

      For Debian/Ubuntu:

      sudo apt install liburi-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-URI
      

      For FreeBSD:

      pkg install p5-URI
      

      For ActivePerl:

      ppm install URI
      
    • HTML::LinkExtor

      This is used to extract links from the web pages. HTML::LinkExtor is contained in the HTML-Parser distribution. You can download and install HTML::LinkExtor from the CPAN archive, or install it with the CPAN shell:

      cpan HTML::LinkExtor
      

      or with the CPANPLUS shell:

      cpanp i HTML::LinkExtor
      

      For Debian/Ubuntu:

      sudo apt install libhtml-parser-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-HTML-Parser
      

      For FreeBSD:

      pkg install p5-HTML-Parser
      

      For ActivePerl:

      ppm install HTML-Parser
      
    • LWP::RobotUA

      This is used to request web pages. LWP::RobotUA is contained in the libwww-perl distribution. You can download and install LWP::RobotUA from the CPAN archive, or install it with the CPAN shell:

      cpan LWP::RobotUA
      

      or with the CPANPLUS shell:

      cpanp i LWP::RobotUA
      

      For Debian/Ubuntu:

      sudo apt install libwww-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-libwww-perl
      

      For FreeBSD:

      pkg install p5-libwww
      

      For ActivePerl:

      ppm install libwww-perl
      
  3. Optional Perl modules:

    • Crypt::SSLeay

      This is needed by LWP::RobotUA to support HTTPS. You can download and install Crypt::SSLeay from the CPAN archive, or install it with the CPAN shell:

      cpan Crypt::SSLeay
      

      or with the CPANPLUS shell:

      cpanp i Crypt::SSLeay
      

      For Debian/Ubuntu:

      sudo apt install libcrypt-ssleay-perl
      

      For Red Hat/Fedora/CentOS:

      sudo yum install perl-Crypt-SSLeay
      

      For FreeBSD:

      pkg install p5-Crypt-SSLeay
      

      For ActivePerl:

      ppm install Crypt-SSLeay
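Before installing, you can check which of these modules are already present. The script below uses only core Perl: it tries to load each module listed above with `require` and prints its version, or flags it as missing. It is a convenience sketch, not part of the chklinks distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules listed above: [ name, required? ]
my @modules = (
    [ 'URI',             1 ],
    [ 'HTML::LinkExtor', 1 ],
    [ 'LWP::RobotUA',    1 ],
    [ 'Crypt::SSLeay',   0 ],
);

# Load a module at run time; return its version ('unknown' if it has no
# $VERSION), or undef if it cannot be loaded.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

my $missing_required = 0;
for my $entry (@modules) {
    my ($module, $required) = @$entry;
    my $version = installed_version($module);
    if (defined $version) {
        printf "%-16s %s\n", $module, $version;
    }
    else {
        printf "%-16s not installed (%s)\n", $module,
            $required ? 'required' : 'optional';
        $missing_required++ if $required;
    }
}
print $missing_required
    ? "$missing_required required module(s) missing.\n"
    : "All required modules are installed.\n";
```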
      

Download

chklinks is hosted on…

You can always download the newest version of chklinks from…

imacat's PGP public key is at…

Install

chklinks uses standard Perl installation with ExtUtils::MakeMaker. Follow these steps:

% perl Makefile.PL
% make
% make test
% make install

When running make install, make sure you have permission to write to the installation location. This usually requires root privileges.

If you are using ActivePerl under MS-Windows, you should use nmake instead of make. nmake can be obtained from the Microsoft FTP site.

To install into another location, set PREFIX. For example, to install into your home directory when you are not root:

% perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of ExtUtils::MakeMaker for more installation options (by running perldoc ExtUtils::MakeMaker).

Install with Module::Build

You can install with Module::Build instead, if you prefer. Follow these steps:

% perl Build.PL
% ./Build
% ./Build test
% ./Build install

When running ./Build install, make sure you have permission to write to the installation location. This usually requires root privileges.

To install into another location, set --prefix. For example, to install into your home directory when you are not root:

% perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation options (by running perldoc Module::Build).

Options

./chklinks [options] URL1 [URL2 [URL3 …]]
./chklinks [-h|-v]
  • -1, --onelevel

    Check the links on this page and stop.

  • -r, --recursive

    Recursively check through this website. This is the default.

  • -b, --below

    Only check the links below this directory. This is the default.

  • -p, --parent

    Trace back to the parent directories.

  • -l, --local

    Only check the links on the same host.

  • -s, --span

    Check the links to other hosts (without recursion). This is the default.

  • -e, --exclude path

    Exclude this path. Pages under it are checked for existence, but the links on them are not checked, as if they were on a foreign site. You can specify --exclude multiple times.

  • -i, --include path

    Include this path. This is the opposite of --exclude and cancels its effect. When both apply, the one specified later takes priority.

  • -d, --debug

    Display debug messages. Specify --debug multiple times for more detail.

  • -q, --quiet

    Disable debug messages. This is the opposite of --debug and cancels its effect.

  • -h, --help

    Display the help message and exit.

  • -v, --version

    Output version information and exit.

  • URL1, URL2, URL3

    The URLs of the websites to check.
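The interaction of --below with --exclude and --include can be modeled as a small decision function. The sketch below is a hypothetical illustration, not chklinks' implementation: it assumes the options are matched as plain prefixes of URL paths, that the rules are kept in command-line order, and that the last matching rule wins, which is one way to read the rule that the option specified later takes priority. In this sketch, a path that fails either test is treated like a page on a foreign site: checked for existence, but its links are not followed (for --below, this treatment is itself an assumption).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --below: a path is in scope if it lies under the start page's directory.
sub is_below {
    my ($start_path, $path) = @_;
    (my $dir = $start_path) =~ s{[^/]*\z}{};    # "/docs/index.html" -> "/docs/"
    return index($path, $dir) == 0;
}

# --exclude/--include: walk the rules in command-line order; the last
# matching rule decides, so a later include overrides an earlier exclude.
sub is_followed {
    my ($path, @rules) = @_;
    my $follow = 1;
    for my $rule (@rules) {
        my ($action, $prefix) = @$rule;
        $follow = ($action eq 'include') if index($path, $prefix) == 0;
    }
    return $follow;
}

# Hypothetical rules, in the order they were given on the command line.
my @rules = (
    [ exclude => '/docs/private/' ],
    [ include => '/docs/private/public/' ],
);

for my $path ('/docs/a.html', '/docs/private/b.html',
              '/docs/private/public/c.html', '/other/d.html') {
    my $recurse = is_below('/docs/index.html', $path)
               && is_followed($path, @rules);
    printf "%-28s %s\n", $path,
        $recurse ? 'follow links' : 'check existence only';
}
```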

Notes

  • chklinks does not obey Crawl-delay: in robots.txt yet. This is a limitation of WWW::RobotRules, not of chklinks itself.

  • If you encounter warnings like this:

    Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.
    

    This is an issue in LWP::Protocol version ≤ 1.43 (libwww-perl version ≤ 5.805) when used with HTML::Parser version ≥ 3.40 and Perl version ≥ 5.8. It is fixed in LWP::Protocol version ≥ 1.46 (libwww-perl version ≥ 5.806), so upgrading LWP::Protocol to the current version resolves it. If you cannot upgrade, see CPAN RT Bug #20274 for a patch to LWP::Protocol.
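To see whether your installation is affected, you can compare the installed libwww-perl version against 5.806. This is a minimal sketch that assumes `$LWP::VERSION` holds a plain decimal version such as 5.805 or 6.05; developer releases with an underscore in the version would need extra handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# libwww-perl 5.805 and earlier ship the affected LWP::Protocol;
# 5.806 and later contain the fix.
sub lwp_needs_upgrade {
    my ($version) = @_;
    return $version < 5.806;
}

if (eval { require LWP; 1 }) {
    printf "libwww-perl %s: %s\n", $LWP::VERSION,
        lwp_needs_upgrade($LWP::VERSION)
            ? 'affected, consider upgrading'
            : 'not affected';
}
else {
    print "libwww-perl is not installed.\n";
}
```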

Bugs

chklinks does not support authentication yet. W3C-LinkChecker supports this. As a workaround, you can use the syntax http://user:pass@some.where.com/some/path for Basic Authentication, but this does not work with Digest Authentication. This practice is not encouraged: your password would be visible to anyone on this system using ps, including hidden intruders, and what you type in your shell is saved to your shell history file.

mailto: URLs are not checked yet. They could be supported by checking the validity of their DNS/MX records. Bastian Kleineidam's linkchecker supports this.

Local file checking has only been tested on Unix and MSWin32. More platforms should be tested, especially VMS and Mac.

See Also

LWP::UserAgent, LWP::RobotUA, WWW::RobotRules, URI, HTML::LinkExtor, Bastian Kleineidam's linkchecker, and W3C-LinkChecker checklink.

Release Notes

Please read the Changes file for new features and bug fixes.

Support

The chklinks project is hosted on GitHub. Please report issues on the GitHub issue tracker at https://github.com/imacat/chklinks/issues.

License

Copyright (C) 2003-2021 imacat.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

imacat ^_*'
2003/5/25
imacat@mail.imacat.idv.tw
https://www.imacat.idv.tw