`chklinks` - A Non-Threaded Perl Link Checker ============================================= Description ----------- `chklinks` is a non-threaded Perl link checker. It helps to find broken links on your website. `chklinks` differs from [linkchecker] in that `chklinks` is non-threaded. It does not raise many simultaneously connections for its job. It won’t run out of the resources and crash your system in a moment. This is certainly more desirable for most webmasters and users. `chklinks` respects `robots.txt`. If you disallow robots from your website and experience problems, you need to allow `chklinks`. Add the following lines to your `robots.txt` file to allow `chklinks`: User-agent: chklinks Disallow: `chklinks` uses [LWP::RobotUA] and supports the following schemes: `http`, `https`, `ftp`, `gopher` and `file`. You can also specify a local file. (To use `https`, you need to install [Crypt::SSLeay]. This is the requirement of LWP::RobotUA.) `chklinks` supports cookies. [linkchecker]: https://wummel.github.io/linkchecker [LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA [Crypt::SSLeay]: https://metacpan.org/release/Crypt-SSLeay System Requirement ------------------ 1. Perl, version 5.6 or above. I have not successfully run this on earlier versions. Please tell me if you can. You can run `perl -v` to check your current Perl version. If you do not have Perl, or if you have an older version of Perl, you can download and install/upgrade it from the [Perl website]. For MS-Windows, you can download and install [Strawberry Perl] or [ActivePerl]. 2. Required Perl modules: * [URI] This is used to parse and process the found URLs. You can download and install URI from the CPAN archive, or install it with the CPAN shell: cpan URI or with the CPANPLUS shell: cpanp i URI For Debian/Ubuntu: sudo apt install liburi-perl For Red Hat/Fedora/CentOS: sudo yum install perl-URI For FreeBSD: ports install p5-URI For ActivePerl: ppm install URI * [HTML::LinkExtor] This is used to extract links from the web pages. HTML::LinkExtor is contained in the [HTML-Parser] distribution. You can download and install HTML::LinkExtor from the CPAN archive, or install it with the CPAN shell: cpan HTML::LinkExtor or with the CPANPLUS shell: cpanp i HTML::LinkExtor For Debian/Ubuntu: sudo apt install libhtml-parser-perl For Red Hat/Fedora/CentOS: sudo yum install perl-HTML-Parser For FreeBSD: ports install p5-HTML-Parser For ActivePerl: ppm install HTML-Parser * [LWP::RobotUA] This is used to request web pages. LWP::RobotUA is contained in the [libwww-perl] distribution. You can download and install LWP::RobotUA from the CPAN archive, or install it with the CPAN shell: cpan LWP::RobotUA or with the CPANPLUS shell: cpanp i HTML::LinkExtor For Debian/Ubuntu: sudo apt install libwww-perl For Red Hat/Fedora/CentOS: sudo yum install perl-libwww-perl For FreeBSD: ports install p5-libwww For ActivePerl: ppm install libwww-perl 3. Optional Perl modules: * [Crypt::SSLeay] This is needed by LWP::RobotUA to support HTTPS. You can download and install HTML::LinkExtor from the CPAN archive, or install it with the CPAN shell: cpan Crypt::SSLeay or with the CPANPLUS shell: cpanp i Crypt::SSLeay For Debian/Ubuntu: sudo apt install libcrypt-ssleay-perl For Red Hat/Fedora/CentOS: sudo yum install perl-Crypt-SSLeay For FreeBSD: ports install p5-Crypt-SSLeay For ActivePerl: ppm install Crypt-SSLeay [Perl website]: https://www.perl.org [Strawberry Perl]: https://strawberryperl.com [ActivePerl]: https://www.activestate.com/products/perl/ [URI]: https://metacpan.org/pod/URI [HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor [HTML-Parser]: https://metacpan.org/release/HTML-Parser [LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA [libwww-perl]: https://metacpan.org/release/libwww-perl [Crypt::SSLeay]: https://metacpan.org/pod/Crypt::SSLeay Download -------- `chklinks` is hosted is on… * [chklinks project on GitHub] * [chklinks project on SourceForge] You can always download the newest version of `chklinks` from… * [chklinks download on SourceForge] * [Tavern IMACAT’s FTP directory] imacat’s PGP public key is at… * [imacat’s PGP key at Tavern IMACAT’s] [chklinks project on GitHub]: https://github.com/imacat/chklinks [chklinks project on SourceForge]: https://sf.net/p/chklinks [chklinks download on SourceForge]: https://sourceforge.net/projects/chklinks/files [Tavern IMACAT’s FTP directory]: https://ftp.imacat.idv.tw/pub/chklinks/ [imacat’s PGP key at Tavern IMACAT’s]: https://www.imacat.idv.tw/me/pgpkey.asc Install ------- % perl Makefile.PL % make % make test % make install When running `make install`, make sure you have the privilege to write to the installation locations. This usually requires the `root` privilege. If you want to install into another location, you can set the `PREFIX`. For example, to install into your home when you are not `root`: % perl Makefile.PL PREFIX=/home/jessica Refer to the documentation of ExtUtils::MakeMaker for more installation options (by running `perldoc ExtUtils::MakeMaker`). For MS-Windows, since `make` is not universally available, Module::Build is preferred to ExtUtils::MakeMaker. See the instructions below. ### Install with [Module::Build] % perl Build.PL % ./Build % ./Build test % ./Build install When running `./Build install`, make sure you have the privilege to write to the installation locations. This usually requires the `root` privilege. If you want to install into another location, you can set the `--prefix`. For example, to install into your home when you are not `root`: % perl Build.PL --prefix=/home/jessica Refer to the documentation of Module::Build for more installation options (by running `perldoc Module::Build`). [ExtUtils::MakeMaker]: https://metacpan.org/release/ExtUtils-MakeMaker [Module::Build]: https://metacpan.org/release/Module-Build Options ------- ./chklinks [options] URL1 [URL2 [URL3 …]] ./chklinks [-h|-v] * `-1`, `--onelevel` Check the links on this page and stops. * `-r`, `--recursive` Recursively check through this website. This is the default. * `-b`, `--below` Only check the links below this directory. This is the default. * `-p`, `--parent` Trace back to the parent directories. * `-l`, `--local` Only check the links on this same host. * `-s`, `--span` Check the links to other hosts (without recursion). This is the default. * `-e`, `--exclude path` Exclude this path. Check for their existence but not check the links on them, just like they are on a foreign site. Multiple `--exclude` are OK. * `-i`, `--include path` Include this path. An opposite of `--exclude` that cancels its effect. The latter specified has a higher priority. * `-d`, `--debug` Display debug messages. Multiple `--debug` to debug more. * `-q`, `--quiet` Disable debug messages. An opposite that cancels the effect of `--debug`. * `-h`, `--help` Display the help message and exit. * `-v`, `--version` Output version information and exit. * `URL1`, `URL2`, `URL3` The URLs of the websites to check against. Notes ----- * `chklinks` does not obey `Crawl-delay:` in `robots.txt` yet. This is a problem in [WWW::RobotRules], but not `chklinks` itself. * If you encounter warnings like this: Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114. This is an issue of [LWP::Protocol] version ≤ 1.43 (in libwww-perl version ≤ 5.805) when working with HTML::Parser version ≥ 3.40 and Perl version ≥ 5.8. This issue is solved in LWP::Protocol version ≥ 1.46 (in libwww-perl version ≥ 5.806). You can upgrade your LWP::Protocol to the current version. If you cannot upgrade it, see [CPAN RT Bug#20274] for an LWP::Protocol patch on this. [WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules [LWP::Protocol]: https://metacpan.org/pod/LWP::Protocol [CPAN RT Bug#20274]: https://rt.cpan.org/Public/Bug/Display.html?id=20274 Bugs ---- `chklinks` does not support authentication yet. [W3C-LinkChecker] supports this. As a workaround, You can use the syntax `http://user:pass@some.where.com/some/path` for Basic Authentication, but this does not work on Digest Authentication. This practice is not encouraged. Your password would be visible to anyone on this system using `ps`, including hidden intruders. Also, what you type in your shell will be saved to your shell history file. `mailto:` URLs should be supported by checking the validity of its DNS/MX record. Bastian Kleineidam's [linkchecker] have support on this. Local file checking has only been tested on Unix and MSWin32. More platforms should be tested, especially VMS and Mac. [W3C-LinkChecker]: https://validator.w3.org/checklink [linkchecker]: https://wummel.github.io/linkchecker See Also -------- [LWP::UserAgent], [LWP::RobotUA], [WWW::RobotRules], [URI], [HTML::LinkExtor], Bastian Kleineidam’s [linkchecker] and W3C-LinkChecker [checklink]. [LWP::UserAgent]: https://metacpan.org/pod/LWP::UserAgent [LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA [WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules [URI]: https://metacpan.org/release/URI [HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor [linkchecker]: https://wummel.github.io/linkchecker [checklink]: https://validator.w3.org/checklink Release Notes ------------- Please read the `NEWS` for the new functions and bug fixes. Support ------- The `chklinks` project is hosted on GitHub. Address your issues on the GitHub issue tracker https://github.com/imacat/chklinks/issues. Thanks ------ * Thanks to [SourceForge] for providing compiling farm for projects to test on different platforms. * Thanks to [Stefan Seifert] for pointing out redirection loops problem when cookies are not activated. (2005-11-07) * Thanks to [nsnake] for reporting warnings from [HTML::Parser] version >= 3.40 when checking UTF-8 pages. (2007-06-06) [SourceForge]: https://sf.net [Stefan Seifert]: mailto:stefan.seifert@atikon.com [nsnake]: mailto:loveme1314@gmail.com [HTML::Parser]: https://metacpan.org/pod/HTML::Parser License ------- Copyright (C) 2003-2021 imacat. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. imacat ^_*' 2003/5/25 https://www.imacat.idv.tw