

Options & Settings

Broken Link Checker

Show Quick Icon

With this option enabled, you get a handy Quick Icon in the third-party panel on the Administrator Dashboard. With the Admin Module active, a check counter is also visible.

Cron Throttle

The minimum time between two (HTTP) cron requests. Intended as server protection.

URL Token

This token provides simple protection for the (HTTP) crons.

(Old) site URLs

This helps to correct old domain names in your content to new ones, for example if you moved to a different domain or from a development location.

 

Global Plugin Options

These options can be set globally and overridden for each (extractor) plugin.

Only extract from public visible content

Extract only links from items that have Public Access.

See the Auto login plugin for more information on checking content with access restrictions.

Only extract from published content

Extract links only from items that are in the Published state and, when Publish Up and Publish Down dates are present, where today falls between them.

Trashed containers are never extracted.

When Saved

Whenever an item is saved, the content must be scanned for links again. This can be done:

  • immediately: Re-extract links
  • on the next parse, by deleting the extracted data: Delete extracted data. Links are no longer visible in the Links Menu.
  • on the next parse, using the modified date: Do Nothing. Links remain visible in the Links Menu until the next extraction.

Some containers, like modules, do not have a modified date. Use Re-extract or Delete for those.

When Deleted

When an item is trashed, the extracted data must be removed:

  • immediately: Delete extracted data. Links are no longer visible in the Links Menu.
  • on the next parse: Do Nothing. Links remain visible in the Links Menu until the next extraction.

Delete extracted data on plugin save

Plugin settings might impact the extracted data. This option allows you to purge all data for a plugin whenever the plugin settings change.

Extracting

Extract batch size

The number of containers to extract links from per batch. A container can be an article, a custom HTML module, etc.

An item might have several containers. For example, an article has the content and the custom fields.

You can set different batch sizes for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron always extracts from five containers per request.

 

HREF Parser

This parser extracts the href attribute from HTML <a> elements.


IMG Parser

This parser extracts the src attribute from HTML <img> elements.

Extracting from <picture> or <figure> elements is not supported, nor is parsing of a srcset attribute.

In most situations, these elements and attributes are generated and added automatically, wrapped around a fallback <img> element. So if the system is set up properly, checking the src attribute of the <img> element is sufficient.
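As an illustration of what the HREF and IMG parsers do, here is a minimal PHP sketch (not the extension's actual code) that collects both kinds of link targets using the DOM extension:

<?php
// Collect link targets the way the HREF and IMG parsers do.
$html = '<p><a href="https://example.com/page">Link</a> <img src="/images/photo.jpg"></p>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings about unknown HTML5 tags

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href'); // HREF Parser: <a href="...">
}
foreach ($doc->getElementsByTagName('img') as $img) {
    $links[] = $img->getAttribute('src'); // IMG Parser: <img src="...">
}

print_r($links); // https://example.com/page and /images/photo.jpg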

 

Embed Parser

This parser extracts links from embedded content. Most content is embedded using short codes like {youtube src=''}, and there are quite a few flavours of these codes in use.

If you need a specific code please contact me.

iframe

This parser takes the src attribute of an <iframe> element.

Aimy Video Plugin

Extracts the video URL or video-id from {youtube}https://youtu.be/zvUi1VZui7E{/youtube} and {vimeo}889793639{/vimeo}.

A video-id is converted into a proper link.

src style embed (includes All Video Share)

{avsplayer src=https://youtu.be/zvUi1VZui7E}, {youtube src=https://youtu.be/zvUi1VZui7E} {vimeo src=https://youtu.be/zvUi1VZui7E}
 
The {avsplayer id=<number>} format is not yet supported.
 
If your site contains a lot of embedded videos, consider the oEmbed checker plugin.
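A minimal sketch of how such src-style short codes can be matched, using an assumed pattern rather than the plugin's actual regular expression:

<?php
// Capture the URL from {avsplayer|youtube|vimeo src=...} short codes.
$content = 'Intro {youtube src=https://youtu.be/zvUi1VZui7E} and {vimeo src=https://vimeo.com/889793639}.';

preg_match_all('/\{(?:avsplayer|youtube|vimeo)\s+src=([^}\s]+)\}/i', $content, $matches);

print_r($matches[1]); // the two video URLs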

Link Replacements

Wrong URL Encoding

Links containing, for example, spaces are not allowed in HTML; most browsers, however, will happily display them without any problems.

This happens, for example, when you upload images with spaces in their names and insert them into content.
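A quick sketch of the kind of correction this performs, encoding each path segment so the URL becomes valid (a hypothetical helper, not the component's actual routine):

<?php
// Encode each path segment, leaving the "/" separators intact.
function encodePath(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

echo encodePath('/images/my photo.jpg'); // /images/my%20photo.jpg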

Replace links

Be sure to have backups when replacing links.

Link replacement is not supported by all extractors/containers. Use the edit button to edit those manually.

The edit link might take you to a parent container, for example an article, where links might be hiding in a custom field.

More information: One Click Link replacement

Checking

Checks per batch

How many links to check in one cron batch.

Compared to extracting links, checking them is far more time-consuming.

You can set different batch sizes for the HTTP and the Command Line Interface (CLI) crons, as the latter has less of a time restriction.

Typically, a website running PHP has a timeout of 30-60 seconds; the PHP default is 30 seconds. The CLI usually has no time limit.

Both the extraction and the checking phase are built on a "Failed? No sweat, let's try again later" principle, so it does not matter much if your job hits the PHP timeout. It is cleaner, though, to stay within it.

 

The administrator Pseudo cron always checks one link per request.

 

Request Timeout

Maximum time to wait for a response when checking a link.

You can set different timeouts for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron uses the HTTP setting.

 

Report False Positives

There is a fine line between broken, maybe broken, and working. With this option, the checker marks links that appear broken but most likely work for a normal visitor with a warning.

(This option will most likely be removed in a future version.)

Use Head request

This option is intended to reduce the amount of data transferred. With this option On, the checker sends a HEAD ("are you there?") request to the remote server. Most servers respond correctly for static content like images and stylesheets.

If the response is any kind of error or redirect, that is, a response other than 200, the link is rechecked with a GET request.

Recheck after

Recheck link after hours is the recheck interval for all links, broken or working. Links marked to ignore are always skipped.

With Recheck broken link after hours you can set a shorter interval for broken links, to catch temporary failures.
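A sketch of how these two intervals could interact (hypothetical values and logic, not the component's actual code):

<?php
// Decide whether a link is due for a recheck, using a shorter interval
// for broken links. The intervals are hypothetical examples.
function isDue(int $lastChecked, bool $broken): bool
{
    $recheckAfter       = 168 * 3600; // all links: every 168 hours
    $recheckBrokenAfter = 24 * 3600;  // broken links: every 24 hours

    $interval = $broken ? $recheckBrokenAfter : $recheckAfter;

    return (time() - $lastChecked) > $interval;
}

var_dump(isDue(time() - 25 * 3600, true));  // true: broken link, due again
var_dump(isDue(time() - 25 * 3600, false)); // false: working link, not due yet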

Use Range request

This option is intended to reduce the amount of data transferred. With this option On, the checker requests only a part of the remote content. Successful Range requests have response code 206.

Usually, Range requests only work with static content like images and zip files.

If the response is any kind of error or redirect, that is, a response other than 200 or 206, the link is rechecked using a normal GET.

So with Use Head request and Use Range request enabled, the request sequence is:

  1. HEAD
    • response 200: perfect, finish checking
  2. GET with Range header
    • response 200 or 206: perfect, finish checking
  3. GET without Range header
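A minimal sketch of this fallback sequence using PHP and curl (hypothetical code, not the checker's implementation):

<?php
// Probe a URL: HEAD first, then GET with a Range header, then a plain GET.
function httpStatus(string $url, bool $head = false, bool $range = false): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, $head);       // true sends a HEAD request
    if ($range) {
        curl_setopt($ch, CURLOPT_RANGE, '0-1023'); // request the first KiB only
    }
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}

$url  = 'https://example.com/image.jpg';
$code = httpStatus($url, head: true);              // 1. HEAD
if ($code !== 200) {
    $code = httpStatus($url, range: true);         // 2. GET with Range header
}
if ($code !== 200 && $code !== 206) {
    $code = httpStatus($url);                      // 3. GET without Range header
}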

Recheck Count

How often broken links should be rechecked.

Follow Redirects

The checker will always try to get the final destination of a link. This allows for easy replacement of the links.

It either uses CURLOPT_FOLLOWLOCATION or checks a link recursively based on a redirect response code (301, 302, 303, 307) and the Location header.

If your server has allow_url_fopen restrictions, redirects cannot be followed directly by curl, so the recursive method is used.

If you want to disable following redirects completely, disable this option.

The 'Max Redirects' number controls the maximum number of hops to follow. If this number is reached, the link is reported as broken with the custom response 'Too many redirects'.
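A sketch of that recursive method (hypothetical code): follow the Location header manually, up to the Max Redirects limit:

<?php
// Follow redirects by hand using the Location target that curl reports.
function finalUrl(string $url, int $maxRedirects = 10): string
{
    for ($hop = 0; $hop < $maxRedirects; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_exec($ch);
        $code     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        curl_close($ch);

        if (!in_array($code, [301, 302, 303, 307], true) || !$location) {
            return $url; // final destination reached
        }
        $url = $location;
    }
    throw new RuntimeException('Too many redirects');
}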

 

Log Response

Depending on the response, the checker can keep the response from the remote server in the log. 

Never

Disable saving any response, reducing database size. 

Auto

Log the most common text responses, if available.

If the server responds with a 200 code to a HEAD request, no content is exchanged and therefore nothing is logged.

If the server responds with 206 - Partial Content to a Range request, only a part of the response is sent and therefore logged.

Always

Keep all the responses. This disables the Range and HEAD options.

Text

Keep all text-like responses. This disables the Range and HEAD options.

Some plugins, like the External Link Extractor, will override this setting when needed.

 

 

Unknown protocols

A website might contain links that cannot be checked, like mailto: links.

With this option, you can either show them in the reports (visible with the unchecked protocol filter) or ignore them completely.

Accept-Language

Some websites serve content based on the browser's language.

This tells the remote server which language you prefer. The automatic setting is based on your website's default language.

Cookies and Checker signature

These options are used to masquerade the checker as a browser. Some Web Application Firewalls (WAFs) respond more favourably if cookies are set and match the User-Agent.

You can create one or more custom signatures by adding a JSON file to [ROOT]/administrator/com_blc/forms/signatures.

Example:

{
    "userAgent": "Some user agent",
    "headers": [
        "Accept-Encoding: deflate, br",
        "Sec-Fetch-Dest: document",
        "Sec-Fetch-User: ?1",
        "Sec-Fetch-Mode: navigate",
        "Sec-Fetch-Site: none"
    ],
    "Accept-Language": "Language string / Optional"
}

The Accept-Language is optional; if set, it overrides the setting from the BLC configuration.
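To illustrate, a hypothetical sketch of how such a signature file could be applied to a curl handle (not the component's actual code):

<?php
$signature = json_decode(file_get_contents('signature.json'), true);

$headers = $signature['headers'];
if (!empty($signature['Accept-Language'])) {
    // The signature's Accept-Language overrides the BLC configuration.
    $headers[] = 'Accept-Language: ' . $signature['Accept-Language'];
}

$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_USERAGENT, $signature['userAgent']);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);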

Adjust Fetch header

With this option enabled, the Fetch metadata request headers are adjusted according to the URL location.

Sec-Fetch-Site

This option is set to same-site for internal links and cross-site for external links.

Sec-Fetch-Mode

Should be set in the signature to 'navigate', as if a user clicks on a link to open an HTML page, image, video or whatever.

Sec-Fetch-Dest

Should be set in the signature to 'document', as if a user views the destination in the browser.

The checker does not distinguish between links in <img> tags and <a> tags.
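A minimal sketch of the Sec-Fetch-Site adjustment (hypothetical; a complete implementation would also treat subdomains as same-site):

<?php
// Compare the link's host with the site's own host.
function secFetchSite(string $url, string $siteHost): string
{
    $host = parse_url($url, PHP_URL_HOST);
    return ($host === null || $host === $siteHost) ? 'same-site' : 'cross-site';
}

echo secFetchSite('https://example.com/page', 'example.com'); // same-site
echo secFetchSite('https://other.org/page', 'example.com');   // cross-site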

 

More info on this topic.

Valid SSL

With this option enabled, the SSL certificates of remote websites are checked. Websites with invalid certificates will be reported as broken with the pseudo response SSL Certificate Error (406).

Depending on your server's (mis)configuration, SSL validation might fail for all websites or cause timeouts.

Invalid Server certificates

The most common issues are with SSL certificates for the domain itself:

  • Expired
  • Invalid domain

These problems should be visible in your web browser as well. Hopefully the owner of the website will resolve the issue soon. 

SSL Chain problems

SSL Chain

The SSL certificate chain is the list of certificates, containing the SSL certificate itself, the intermediate certificate authorities and the root certificate authority, that enables the connecting device to verify that the SSL certificate is trustworthy. A server should send the intermediate certificate(s); however, some servers are misconfigured and only send the SSL certificate itself. Web browsers are quite forgiving; a tool like curl is not.

You can check a certificate on a website like https://www.ssllabs.com/ssltest/analyze.html. Messages like `This server's certificate chain is incomplete`, `Chain issues Incomplete` and `NOT TRUSTED` indicate an SSL chain issue.

It is possible to solve these issues on the server. This is quite technical and requires root access.

 

As SSL connection errors are not reported as HTTP response codes, the checker uses custom codes to report SSL problems.

Set Fixed SSL

It seems that some WAFs block browsers polling for different TLS versions. They expect Chrome to use TLS v1.3 without bothering about the older versions.

This option forces the checker to use the selected version only.

Most modern web servers should support TLS v1.3, so setting this option to this newest version should work fine.

As a fallback, the checker will retry the request with the default settings if the SSL connection fails (CURLE_SSL_CONNECT_ERROR (35)).
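A hypothetical curl sketch of this behaviour: force TLS v1.3 first, then retry with the default negotiation when the connection fails with error 35:

<?php
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_3); // fixed version
curl_exec($ch);

if (curl_errno($ch) === CURLE_SSL_CONNECT_ERROR) { // error 35
    // Fall back to curl's default TLS negotiation.
    curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_DEFAULT);
    curl_exec($ch);
}
curl_close($ch);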

 

Domain throttle

This limits the number of requests to your own or an external website.

Be friendly to your neighbours!

The HTTP cron and the Admin pseudo cron skip a domain if the pause threshold is not yet reached. With the CLI command, you can also choose to wait.
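A sketch of such a per-domain throttle (hypothetical, not the component's actual code):

<?php
// Skip a host if it was contacted less than $pause seconds ago.
function shouldSkip(array &$lastRequest, string $url, int $pause = 5): bool
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    if (isset($lastRequest[$host]) && time() - $lastRequest[$host] < $pause) {
        return true; // HTTP/admin cron: skip this link (the CLI could sleep() instead)
    }
    $lastRequest[$host] = time();
    return false;
}

$seen = [];
var_dump(shouldSkip($seen, 'https://example.com/a')); // false: first request
var_dump(shouldSkip($seen, 'https://example.com/b')); // true: within the pause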

Domains to ignore

This is a list of hosts that should never be checked.

For example, Facebook has a login restriction on a lot of pages.

You can either ignore these links completely or report them marked to ignore, using the What to do with these domains option.

Patterns to ignore

Pattern matching on the path of links (https://example.com/path/).

Intended to exclude internal links, but should work on all links.

As with the Domains above, you can control the reporting with What to do with matches.

Domains to ignore redirects

The field is intended for domain(s) that always redirect, for example:

  • Links to a URL shortener service
  • Affiliate links
  • Google services like Drive and Maps.

The redirects are followed to the final destination, so the checker should still report broken links (400+ Response codes) but will not show any redirects. That includes 'valid' redirects at the final destination.

Valid links will have a Redirect Ignored custom response code.

Mail Reports

Frequency

The minimum interval between two reports.

Links in Report

How many (broken) links the report should contain.

Show Only Changes

If set to yes the report will only be sent if there are new (broken) links.

If an existing link appears in a new location, the link is still considered old.

 

Run report after extract

Run the report after an HTTP or CLI extract operation.

Whether a report is sent still depends on the frequency. 

This option is not active when using the Admin pseudo cron.

 

Run report after check

Run the report after an HTTP or CLI check operation.

Whether a report is sent still depends on the frequency. 

This option is not active when using the Admin pseudo cron.

 

Recipients

One or more Joomla Users to receive the report.

The administrator component is part of the Broken Link Checker package.