HTML Proofer
How to use a Ruby gem as a CLI tool to validate links in your static HTML, for both internal and external links.
You can run this on any directory of static HTML files, locally or with CI.
Resources
- gjtorikian/html-proofer gem on GitHub.
- html-proofer gem on RubyGems registry.
Related sections
- Install gems and run - generic recipe for setting up Ruby and gems on GH Actions.
- HTML Proofer GH Actions recipe in the GH Actions section.
Install
Install gem globally
$ gem install html-proofer --user-install
Use this if you plan to reuse the tool across your projects or run it in CI. You’ll get the latest version available.
Install gem in project
Use this if you want to include the gem in your project.
Use Bundler to add it to your Gemfile and install it.
$ bundle add html-proofer --group "test"
Or add the following to your Gemfile manually:
gem "html-proofer", "~> 3.19", :group => :test
Then later you can install it like this, locally or in CI.
$ bundle install
Use with Docker
See html-proofer on Docker Hub.
Usage
1. Build
Create your HTML output as usual.
e.g.
- Jekyll site.
$ bundle exec jekyll build
- NPM app.
$ npm run build
2. Run checks
Run the tool against your output directory, such as _site or build.
Global level
If you installed globally, then run:
$ htmlproofer _site
Project level
$ bundle exec htmlproofer _site
Running ["ScriptCheck", "LinkCheck", "ImageCheck"] on ["_site"] on *.html...
Checking 1022 external links...
Write to log files
The output can get very long, so you might want to write it to log files that are ignored by version control.
The tool prints a short summary to stdout: a count of URLs and files (a few lines only). The stderr content is the actual check breakdown, which can be very long.
Here, error output is written to links.log.
$ htmlproofer --assume-extension _site 2> links.log
It looks like if you give bad flags, the error message is printed and nothing goes to the file, even when using:
... > success.log 2> fail.log
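The same split between the two streams can be handled from Ruby, for example in a script or Rake task. This is only a sketch: `run_and_log` is a hypothetical helper name, the default file names simply mirror the ones above, and the usage line assumes `htmlproofer` is installed and on your PATH.

```ruby
require "open3"

# Hypothetical helper: runs a command and writes its stdout and stderr
# to two separate files. Returns true when the command exits with status 0.
def run_and_log(command, out_path: "success.log", err_path: "fail.log")
  stdout, stderr, status = Open3.capture3(*command)
  File.write(out_path, stdout)
  File.write(err_path, stderr)
  status.success?
end

# Example usage, assuming htmlproofer is installed and _site exists:
#   run_and_log(["htmlproofer", "--assume-extension", "_site"])
```

Passing the command as an array avoids shell quoting issues with flags such as the URL swap pattern.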
Usage
$ htmlproofer PATH [options]
If you omit the PATH, then it uses . for the current directory instead.
The docs recommend this:
$ htmlproofer _site
The output then includes _site in the reported paths, though.
For cleaner output, change into the directory first and pass . explicitly for clarity.
$ cd _site && htmlproofer .
If your site uses a subpath, as on GitHub Pages, then use the URL swap flag.
You can also run against a file.
$ htmlproofer _site/index.html
Configuration
See Configuration in the docs.
See also output from:
$ htmlproofer --help
Some flag highlights are covered below.
Log level
Set the log level, e.g. to debug with --log-level :debug
Defaults to :info but you can set :debug, :info, :warn, or :error.
URL swap
Provide one or more URL substitutions, each as an escaped regex pattern and a value to use.
From the docs: a hash containing key-value pairs of RegExp => String. It transforms URLs that match RegExp into String via gsub.
--url-swap REGEX:VALUE
--url-swap REGEX:VALUE,REGEX:VALUE,REGEX:VALUE,...
Example from the docs:
$ htmlproofer --url-swap 'wow:cow,mow:doh'
For this Cookbook site, which is built on Jekyll, the subpath is /code-cookbook/ and this causes all the internal URLs to appear broken. So replace that with / as below.
$ htmlproofer --url-swap '\/code-cookbook\/:/' _site/
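To see what the swap does to a link, the gsub behaviour described above can be tried in plain Ruby without the gem. This is a minimal sketch for illustration: `apply_url_swap` is a hypothetical helper, not an html-proofer method, and the sample paths are made up.

```ruby
# Illustration only: mimics the documented "RegExp => String via gsub" behaviour.
# apply_url_swap is a hypothetical helper, not part of html-proofer.
def apply_url_swap(url, swaps)
  swaps.reduce(url) do |result, (pattern, replacement)|
    result.gsub(pattern, replacement)
  end
end

# Same pattern as the /code-cookbook/ example above.
swaps = { %r{/code-cookbook/} => "/" }

puts apply_url_swap("/code-cookbook/recipes/index.html", swaps)
puts apply_url_swap("/assets/main.css", swaps)
```

The first path loses its subpath prefix and the second is left unchanged, since it does not match the pattern.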
URL ignore
e.g.
--url-ignore github.com
HTTP status ignore
--http-status-ignore
An array of numbers representing status codes to ignore.
Defaults to empty array.
e.g. --http-status-ignore "999,401,404"
Only 4XX
Defaults to false.
--only-4xx
Only reports errors for links that fall within the 4xx status code range.
Force HTTPS
--enforce-https
Fails a link if it’s not marked as https.
Assume extension
--assume-extension
Automatically add an extension (e.g. .html) to file paths, to allow extensionless URLs.
The docs recommend this for Jekyll.
External and internal links
Even if you use the link tag to enforce local links at build time, it is still useful for the tool to check external links, and to catch any cases where you didn’t use link.
Check internal links only.
--disable-external
If true, does not run the external link checker, which can take a lot of time.
Check external links only.
--external-only
Only checks problems with external references.
Rakefile setup
For a Jekyll site, the docs recommend the following for use with rake, if you want to go that route.
Rakefile
require 'html-proofer'

task :test do
  sh "bundle exec jekyll build"
  options = { :assume_extension => true }
  HTMLProofer.check_directory("_site", options).run
end
Then run as:
$ rake test
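The CLI flags covered in the Configuration section have Ruby equivalents that can go in the options hash. The sketch below assumes the html-proofer 3.x option names (such as `:url_swap` and `:http_status_ignore`), so check them against the version you have installed. `proofer_options` is a hypothetical helper name, and the values are taken from the examples earlier on this page.

```ruby
# Hypothetical helper: builds an options hash mirroring some of the CLI flags.
# Option names assume html-proofer 3.x; verify against your installed version.
def proofer_options(check_external: true)
  {
    :assume_extension   => true,                            # --assume-extension
    :url_swap           => { %r{/code-cookbook/} => "/" },  # --url-swap
    :http_status_ignore => [999, 401, 404],                 # --http-status-ignore
    :disable_external   => !check_external                  # --disable-external
  }
end

# In the Rakefile task, this could replace the inline options hash:
#   HTMLProofer.check_directory("_site", proofer_options(check_external: false)).run
```

Keeping the hash in one method makes it easy to run a quick internal-only check locally and a full check in CI.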