`chklinks` - A Non-Threaded Perl Link Checker
=============================================
Description
-----------
`chklinks` is a non-threaded Perl link checker. It helps you find
broken links on your website.

`chklinks` differs from [linkchecker] in that `chklinks` is
non-threaded. It does not open many simultaneous connections to do
its job, so it won't exhaust system resources and crash your system.
For most webmasters and users, this is the more desirable behavior.

`chklinks` respects `robots.txt`. If your `robots.txt` disallows
robots and `chklinks` cannot check your website, you need to allow
`chklinks` explicitly. Add the following lines to your `robots.txt`
file:

    User-agent: chklinks
    Disallow:

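
If you maintain `robots.txt` from a deployment script, the record above can be appended without duplicating it on repeated runs. The following is a minimal shell sketch; the function name `allow_chklinks` is hypothetical and not part of `chklinks`.

```shell
#!/bin/sh
# Append an allow record for chklinks to a robots.txt file, unless a
# "User-agent: chklinks" record is already present.
# allow_chklinks is a hypothetical helper name.
allow_chklinks() {
    robots_file=$1
    # Do nothing if the file already names chklinks as a user agent.
    if [ -f "$robots_file" ] &&
        grep -qi '^User-agent:[[:space:]]*chklinks[[:space:]]*$' "$robots_file"; then
        return 0
    fi
    # The leading blank line separates the new record from earlier ones.
    # An empty "Disallow:" means nothing is disallowed for this agent.
    printf '\nUser-agent: chklinks\nDisallow:\n' >> "$robots_file"
}
```

Call it with the path to your own file, for example `allow_chklinks /var/www/html/robots.txt` (the path is only an illustration).
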
`chklinks` uses [LWP::RobotUA] and supports the following schemes:
`http`, `https`, `ftp`, `gopher` and `file`. You can also specify a
local file. (To use `https`, you need to install [Crypt::SSLeay].
This is a requirement of LWP::RobotUA.)

`chklinks` supports cookies.

[linkchecker]: https://wummel.github.io/linkchecker
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[Crypt::SSLeay]: https://metacpan.org/release/Crypt-SSLeay
System Requirements
-------------------
1. Perl, version 5.6 or above. I have not successfully run this on
earlier versions. Please tell me if you can. You can run
`perl -v` to see your current Perl version. If you do not have
Perl, or if you have an older version of Perl, you can download and
install/upgrade it from the [Perl website]. If you are using
MS-Windows, you can download and install [ActiveState ActivePerl].
2. Required Perl modules:

   * [URI]

     This is used to parse and process the found URLs. You can
     download and install URI from the CPAN archive, or install it
     with the CPAN shell:

         cpan URI

     or with the CPANPLUS shell:

         cpanp i URI

     For Debian/Ubuntu:

         sudo apt install liburi-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-URI

     For FreeBSD:

         ports install p5-URI

     For ActivePerl:

         ppm install URI

   * [HTML::LinkExtor]

     This is used to extract links from the web pages.
     HTML::LinkExtor is contained in the [HTML-Parser] distribution.
     You can download and install HTML::LinkExtor from the CPAN
     archive, or install it with the CPAN shell:

         cpan HTML::LinkExtor

     or with the CPANPLUS shell:

         cpanp i HTML::LinkExtor

     For Debian/Ubuntu:

         sudo apt install libhtml-parser-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-HTML-Parser

     For FreeBSD:

         ports install p5-HTML-Parser

     For ActivePerl:

         ppm install HTML-Parser

   * [LWP::RobotUA]

     This is used to request web pages. LWP::RobotUA is contained in
     the [libwww-perl] distribution. You can download and install
     LWP::RobotUA from the CPAN archive, or install it with the CPAN
     shell:

         cpan LWP::RobotUA

     or with the CPANPLUS shell:

         cpanp i LWP::RobotUA

     For Debian/Ubuntu:

         sudo apt install libwww-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-libwww-perl

     For FreeBSD:

         ports install p5-libwww

     For ActivePerl:

         ppm install libwww-perl

3. Optional Perl modules:

   * [Crypt::SSLeay]

     This is needed by LWP::RobotUA to support HTTPS. You can
     download and install Crypt::SSLeay from the CPAN archive, or
     install it with the CPAN shell:

         cpan Crypt::SSLeay

     or with the CPANPLUS shell:

         cpanp i Crypt::SSLeay

     For Debian/Ubuntu:

         sudo apt install libcrypt-ssleay-perl

     For Red Hat/Fedora/CentOS:

         sudo yum install perl-Crypt-SSLeay

     For FreeBSD:

         ports install p5-Crypt-SSLeay

     For ActivePerl:

         ppm install Crypt-SSLeay

[Perl website]: https://www.perl.org
[ActiveState ActivePerl]: https://www.activestate.com
[URI]: https://metacpan.org/release/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[HTML-Parser]: https://metacpan.org/release/HTML-Parser
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[libwww-perl]: https://metacpan.org/release/libwww-perl
[Crypt::SSLeay]: https://metacpan.org/release/Crypt-SSLeay
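
Before installing anything, it can help to see which of the modules listed above Perl can already load. The following is a minimal shell sketch; the function name `check_perl_modules` is hypothetical and not part of `chklinks`.

```shell
#!/bin/sh
# Report whether each named Perl module can be loaded.
# check_perl_modules is a hypothetical helper name.
# Returns non-zero if at least one module is missing.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module cannot be loaded.
        if perl -M"$module" -e 1 2>/dev/null; then
            echo "ok: $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    return "$missing"
}
```

For the modules in this section, you would run `check_perl_modules URI HTML::LinkExtor LWP::RobotUA Crypt::SSLeay` and install whatever is reported as missing.
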
Download
--------
`chklinks` is hosted on…

* [chklinks project on GitHub]
* [chklinks project on SourceForge]

You can always download the newest version of `chklinks` from…

* [chklinks download on SourceForge]
* [Tavern IMACAT's FTP directory]

imacat's PGP public key is at…

* [imacat's PGP key at Tavern IMACAT's]

[chklinks project on GitHub]: https://github.com/imacat/chklinks
[chklinks project on SourceForge]: https://sf.net/p/chklinks
[chklinks download on SourceForge]: https://sourceforge.net/projects/chklinks/files
[Tavern IMACAT's FTP directory]: https://ftp.imacat.idv.tw/pub/chklinks/
[imacat's PGP key at Tavern IMACAT's]: https://www.imacat.idv.tw/me/pgpkey.asc

Install
-------
`chklinks` uses the standard Perl installation procedure with
[ExtUtils::MakeMaker]. Follow these steps:

    % perl Makefile.PL
    % make
    % make test
    % make install

When running `make install`, make sure you have permission to
write to the installation location. This usually requires `root`
privileges.

If you are using ActivePerl under MS-Windows, use `nmake`
instead of `make`. [nmake can be obtained from the Microsoft FTP site.]

If you want to install into another location, set `PREFIX`. For
example, to install into your home directory when you are not
`root`:

    % perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of ExtUtils::MakeMaker for more
installation options (by running `perldoc ExtUtils::MakeMaker`).

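
After installing with a custom `PREFIX`, your shell may not find the script until its directory is on `PATH`. The sketch below assumes the script ends up in `$PREFIX/bin`, which is the usual MakeMaker layout but is not stated in this README; check the output of `make install` for the actual location. The function name `prepend_path` is hypothetical.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already listed.
# prepend_path is a hypothetical helper name.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # put the new directory first
    esac
    export PATH
}
```

With the example above, you could add `prepend_path /home/jessica/bin` to your shell startup file.
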
### Install with [Module::Build]

You can install with Module::Build instead, if you prefer. Follow
these steps:

    % perl Build.PL
    % ./Build
    % ./Build test
    % ./Build install

When running `./Build install`, make sure you have permission to
write to the installation location. This usually requires `root`
privileges.

If you want to install into another location, set `--prefix`. For
example, to install into your home directory when you are not
`root`:

    % perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation
options (by running `perldoc Module::Build`).

[ExtUtils::MakeMaker]: https://metacpan.org/release/ExtUtils-MakeMaker
[nmake can be obtained from the Microsoft FTP site.]: ftp://ftp.microsoft.com/Softlib/MSLFILES/nmake15.exe
[Module::Build]: https://metacpan.org/release/Module-Build
Options
-------

    ./chklinks [options] URL1 [URL2 [URL3 …]]
    ./chklinks [-h|-v]

* `-1`, `--onelevel`

  Check the links on this page and stop.

* `-r`, `--recursive`

  Recursively check through this website. This is the default.

* `-b`, `--below`

  Only check the links below this directory. This is the default.

* `-p`, `--parent`

  Trace back to the parent directories.

* `-l`, `--local`

  Only check the links on the same host.

* `-s`, `--span`

  Check the links to other hosts (without recursion). This is the
  default.

* `-e`, `--exclude path`

  Exclude this path. Pages under it are checked for existence, but
  the links on them are not checked, as if they were on a foreign
  site. Multiple `--exclude` options are allowed.

* `-i`, `--include path`

  Include this path. This is the opposite of `--exclude` and cancels
  its effect. Whichever of the two is specified later takes priority.

* `-d`, `--debug`

  Display debug messages. Repeat `--debug` for more detail.

* `-q`, `--quiet`

  Disable debug messages. This is the opposite of `--debug` and
  cancels its effect.

* `-h`, `--help`

  Display the help message and exit.

* `-v`, `--version`

  Output version information and exit.

* `URL1`, `URL2`, `URL3`

  The URLs of the websites to check.

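
If you check the same site regularly, the options above can be wrapped in a small shell function. This is only a sketch: `check_site` and the `CHKLINKS` variable are hypothetical names, the script is assumed to be installed on your `PATH` as `chklinks`, and the form of the excluded paths (`/forum/` and so on) is an assumption, since this README does not specify what `--exclude` matches against.

```shell
#!/bin/sh
# Command to run; override CHKLINKS if the script lives elsewhere.
CHKLINKS=${CHKLINKS:-chklinks}

# check_site URL [PATH...]
# Check URL recursively, stay on the same host, and pass each PATH
# as a separate --exclude option.
check_site() {
    url=$1
    shift
    # Rewrite the remaining arguments in place: each PATH is dropped
    # from the front and re-added at the back as "--exclude PATH".
    for path in "$@"; do
        set -- "$@" --exclude "$path"
        shift
    done
    "$CHKLINKS" --recursive --local "$@" "$url"
}
```

For example, `check_site https://www.example.com/ /forum/ /archive/` runs `chklinks --recursive --local --exclude /forum/ --exclude /archive/ https://www.example.com/`.
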
Notes
-----
* `chklinks` does not obey `Crawl-delay:` in `robots.txt` yet. This
  is a limitation of [WWW::RobotRules], not of `chklinks` itself.

* If you encounter warnings like this:

      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.

  This is an issue in [LWP::Protocol] version ≤ 1.43 (in libwww-perl
  version ≤ 5.805) when working with HTML::Parser version ≥ 3.40 and
  Perl version ≥ 5.8. This issue is solved in LWP::Protocol
  version ≥ 1.46 (in libwww-perl version ≥ 5.806). You can upgrade
  your LWP::Protocol to the current version. If you cannot upgrade
  it, see [CPAN RT Bug#20274] for an LWP::Protocol patch.

[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[LWP::Protocol]: https://metacpan.org/pod/LWP::Protocol
[CPAN RT Bug#20274]: https://rt.cpan.org/Public/Bug/Display.html?id=20274
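
To tell whether an installed libwww-perl is affected by the UTF-8 warning above, you can compare its version against 5.806. The following is a minimal shell sketch; the function name `lwp_is_fixed` is hypothetical, and it assumes the version is a plain decimal number such as `5.805` or `6.05`.

```shell
#!/bin/sh
# Succeed if the given libwww-perl version is 5.806 or later, the
# first release the note above lists as fixed.
# lwp_is_fixed is a hypothetical helper name.
lwp_is_fixed() {
    # The versions are plain decimals, so a numeric comparison in
    # awk is sufficient; awk exits 0 when the test is true.
    awk -v v="$1" 'BEGIN { exit !(v + 0 >= 5.806) }'
}
```

The installed version can be read from `$LWP::VERSION`, for example: `lwp_is_fixed "$(perl -MLWP -e 'print $LWP::VERSION')" || echo "libwww-perl should be upgraded"`.
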
Bugs
----
`chklinks` does not support authentication yet. [W3C-LinkChecker]
supports this. As a workaround, you can use the syntax
`http://user:pass@some.where.com/some/path` for Basic Authentication,
but this does not work with Digest Authentication. This practice is
not encouraged. Your password would be visible to anyone on the
system using `ps`, including hidden intruders. Also, what you type
in your shell is saved to your shell history file.

`mailto:` URLs should be supported by checking the validity of their
DNS/MX records. Bastian Kleineidam's [linkchecker] supports this.

Local file checking has only been tested on Unix and MSWin32. More
platforms should be tested, especially VMS and Mac.

[W3C-LinkChecker]: https://validator.w3.org/checklink
[linkchecker]: https://wummel.github.io/linkchecker
See Also
--------
[LWP::UserAgent], [LWP::RobotUA], [WWW::RobotRules], [URI],
[HTML::LinkExtor], Bastian Kleineidam's [linkchecker] and
W3C-LinkChecker [checklink].

[LWP::UserAgent]: https://metacpan.org/pod/LWP::UserAgent
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[URI]: https://metacpan.org/release/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[linkchecker]: https://wummel.github.io/linkchecker
[checklink]: https://validator.w3.org/checklink
Release Notes
-------------
Please read the `NEWS` file for new features and bug fixes.

Support
-------
The `chklinks` project is hosted on GitHub. Report issues on the
GitHub issue tracker: <https://github.com/imacat/chklinks/issues>.

Thanks
------
* Thanks to [SourceForge] for providing a compile farm for projects to
  test on different platforms.

* Thanks to [Stefan Seifert] for pointing out the redirection loop
  problem when cookies are not enabled. (2005-11-07)

* Thanks to [nsnake] for reporting warnings from [HTML::Parser]
  version ≥ 3.40 when checking UTF-8 pages. (2007-06-06)

[SourceForge]: https://sf.net
[Stefan Seifert]: mailto:stefan.seifert@atikon.com
[nsnake]: mailto:loveme1314@gmail.com
[HTML::Parser]: https://metacpan.org/pod/HTML::Parser
License
-------
Copyright (C) 2003-2021 imacat.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

imacat ^_*'
2003/5/25
<imacat@mail.imacat.idv.tw>
https://www.imacat.idv.tw