Options and Settings of the Component
This page serves mostly a demonstration purpose. Please visit Administrator Component for more information.
Options & Settings
Broken Link Checker
This is the URL pointing to your homepage. It is used by the CLI interface only.
You can also specify --live-site <URL> on the command line, or place it in your configuration.php: public $live_site = '<URL>';
The live_site option is not accessible from the System Configuration. You will need direct access to configuration.php, either via FTP or the file browser in your hosting dashboard.
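For illustration, the configuration.php edit looks roughly like this (the JConfig class and $live_site property are Joomla's own; the URL is a placeholder):

```php
<?php
// configuration.php in the Joomla root - only the relevant property is shown.
class JConfig
{
    /* ... all other existing settings stay as they are ... */

    // Public URL of your homepage; the CLI has no request context, so it uses this value.
    public $live_site = 'https://www.example.com';
}
```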
With this option enabled, you get a nice Quick Icon in the 3rd party panel on the Administrator Dashboard. With the Admin Module active, a check counter will also be visible.
Minimum time between two (HTTP) cron requests. Intended as server protection; it prevents multiple simultaneous cron runs.
This token is a simple protection for the (HTTP) cron requests.
This helps to correct old domain names in your content to new ones, for example if you moved to a different domain or from a development location.
Checking links is quite resource intensive. It is relatively slow and uses outgoing and incoming connections on a server.
Since there are several methods to start the actual checking (task, CLI, HTTP and pseudo cron), multiple actions could run in parallel.
To prevent an overload of the system and possible triggering of web application firewalls, the broken link checker uses locking to prevent multiple parallel actions from executing.
As a result you might get a message like: Another instance of the broken link checker is running
By default, link checking is locked at the server level, so if you have multiple Joomla installations on a single server only one can actually check at a time.
Extracting and reporting are locked at the site level.
The broken link checker uses the GET_LOCK and RELEASE_LOCK feature of the database. That means server locks are at the database level; multiple web servers might share a single database.
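As background, a rough sketch of how such a database-level lock works, using Joomla's database API (the lock name 'blc.checking' is only an example, not necessarily the name the component uses):

```php
<?php
// Sketch: a MySQL/MariaDB named lock, which lives on the database server,
// not in the Joomla site - hence the server-level locking described above.
use Joomla\CMS\Factory;
use Joomla\Database\DatabaseInterface;

$db = Factory::getContainer()->get(DatabaseInterface::class);

// Try to acquire the lock, waiting at most 1 second.
$acquired = (int) $db->setQuery("SELECT GET_LOCK('blc.checking', 1)")->loadResult();

if ($acquired === 1) {
    try {
        // ... run the checking batch ...
    } finally {
        $db->setQuery("SELECT RELEASE_LOCK('blc.checking')")->execute();
    }
} else {
    // Another instance of the broken link checker is running
}
```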
You can change the minimum lock level. If set to none, jobs might run in parallel. This could result in database errors when inserting new links. Although ugly, this will not impact extracting and checking.
The lock does not apply when (re)checking a single link manually from the View & Fix Link page.
Global Plugin Options
This option allows you to enable or disable link replacements on a per-plugin basis.
Set the global option to Off and specific plugins to On to allow link replacement for only those plugins.
Set the global option to On and specific plugins to Off to prevent link replacement for only those plugins.
'One-click-link-replacement' is a convenient way to quickly update links. However, manually updating the link and the 'surrounding' content is often a better way to ensure the information is still consistent.
Extract only links from items that have Public Access.
See the Auto login plugin for more information on checking content with access restrictions
Extract only links from items that are in the Published state and where today is between the Publish Up and Publish Down dates, if present.
Trashed containers are never extracted.
Whenever an item is saved the content must be scanned for links again. This can be done:
- Immediately: Re-extract links
- On the next parse, by deleting the extracted data: Delete extracted data - links are not visible anymore in the Links Menu
- On the next parse, using the modified date: Do Nothing - links are visible in the Links Menu until the next extract.
Some containers, like modules, do not have a modified date. Use Re-extract or Delete.
When an item is trashed the extracted data must be removed.
- Immediately: Delete extracted data - links are not visible anymore in the Links Menu
- On the next parse: Do Nothing - links are visible in the Links Menu until the next extract
Plugin settings might impact the extracted data. This option allows you to purge all data for a plugin whenever the plugin settings change.
Extracting
The number of containers to extract links from per batch. A container can be an article, custom HTML module etcetera.
An item might have several containers. For example, an article has the content and the custom fields.
You can set different batch sizes for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron always extracts from five containers per request.
This parser extracts the href attribute from HTML <a> elements.
This parser extracts the src attribute from HTML <img> elements.
Extracting from <picture> or <figure> elements is not supported, nor is parsing of a srcset attribute.
In most situations, these elements and attributes are generated and added automatically, wrapped around a fallback <img> element. So if the system is set up properly, checking the src attribute of the <img> element is sufficient.
These parsers extract links from embedded content. Most content is embedded using shortcodes like {youtube src=''}, and there are quite a few flavors of these codes in use. Some are HTML elements.
By default, Joomla will not allow elements like iframe and embed in the editor. So if you are not using them, leave the parsers disabled to save a bit of resources.
If you need a specific code, please contact me.
iframe
This parser takes the src attribute of an iframe element
video
This parser takes the src attribute of a video element
embed
This parser takes the src attribute of an embed element
Aimy and AllVideos (by JoomlaWorks)
Extract the video URL or video-id from {youtube}https://youtu.be/zvUi1VZui7E{/youtube} and {vimeo}889793639{/vimeo}.
A video-id is converted into a proper link.
src style embed (includes All Video Share)
Link Replacements
Links containing, for example, spaces are not allowed in HTML; however, most browsers will happily show them without any problems.
For example, you upload images with spaces in their names and insert them into content. Newer Joomla versions replace the spaces with '%20'; older versions leave the spaces in the media input fields.
The fragment (the part after #) is always ignored. This avoids problems with the media input field: the URL in the media field consists of the path to the image and the Joomla source in the fragment. The URL itself is correctly encoded; however, the fragment is not.
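As a minimal sketch of what such a replacement amounts to (the file name and fragment below are made-up examples of a media-field value):

```php
<?php
// Hypothetical media-field value: a path containing a space, Joomla metadata in the fragment.
$url = 'images/my photo.jpg#joomlaImage://local-images/my photo.jpg?width=800&height=600';

// Split off the fragment; the checker ignores it and leaves it untouched.
[$path, $fragment] = explode('#', $url, 2) + [1 => ''];

// Encode only the path part, so the space becomes %20.
$encodedPath = str_replace(' ', '%20', $path);

echo $encodedPath; // images/my%20photo.jpg
```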
Be sure to have backups when replacing links.
Link replacement is not supported by all extractors/containers. Use the edit button to edit those manually.
The edit link might take you to a parent container, for example an article, while the actual link might be hiding elsewhere, like in a custom field.
More information: One Click Link replacement
Checking
Static path prefixes
These paths define the locations of static resources. On most Joomla websites, that will be the images/ folder and, in fewer cases, the templates/ folder.
How many links to check in one cron batch.
Compared to extracting links, checking them is far more time-consuming.
You can set different batch sizes for the HTTP and the Command Line Interface (CLI) crons, as the latter has less of a time restriction.
Typically, a website running PHP has a timeout (max_execution_time) of 30 - 60 seconds. The default is 30 seconds. The CLI usually has no time limit.
The max_execution_time divided by the Request Timeout setting gives a rough upper limit for the number of links that could be checked in one batch. You will find an estimate on the setup page.
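For example, with the default max_execution_time of 30 seconds and a Request Timeout of 10 seconds, at most about 30 / 10 = 3 links that actually hit the timeout fit in one batch; since most links respond much faster, the practical batch size is usually higher.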
Both the extraction and the checking phase are built as 'Failed? No sweat, let's try again later.' So it doesn't matter much if your job reaches the PHP timeout, but it is cleaner to stay within it.
The administrator Pseudo cron always checks one link per request.
Maximum time to wait for a response when checking a link.
You can set different values for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron uses the HTTP setting.
There is a fine line between broken, maybe broken and working. With this option, links that appear broken but most likely work for a normal visitor are marked as a warning.
(This option will most likely be removed in a future version.)
This option is intended to reduce the amount of data transferred. With this option On, the checker sends a HEAD (are you there) request to the remote server. Most servers respond correctly for static content like images and stylesheets.
If the response is any kind of error or redirect, thus a response other than 200, the link is rechecked with a GET request.
Recheck link after is the recheck interval for all links, broken or working. Links marked to ignore are always skipped.
With the Recheck broken link after setting, you can set a shorter interval for broken links, to catch temporary failures.
This option is intended to reduce the amount of data transferred. With this option On, the checker requests only a part of the remote content. Successful Range requests have response code 206.
Usually, Range requests only work with static content like images and zip files.
If the response is any kind of error or redirect, thus a response other than 200 or 206, the link is rechecked using a normal GET.
So with Use Head request and Use Range request enabled, the requests are:
- HEAD
  - response 200: perfect, finish checking
- GET with Range header
  - response 200 or 206: perfect, finish checking
- GET without Range header
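When both options are enabled, that fallback sequence could be sketched with PHP's curl functions as follows (an illustration of the flow above, not the component's actual code):

```php
<?php
// Sketch of the HEAD -> GET with Range -> plain GET fallback described above.
function probe(string $url): int
{
    // 1. HEAD request: no body is transferred.
    $code = request($url, ['head' => true]);
    if ($code === 200) {
        return $code;
    }

    // 2. GET with a Range header: ask for the first bytes only.
    $code = request($url, ['range' => '0-1023']);
    if ($code === 200 || $code === 206) {
        return $code;
    }

    // 3. Plain GET without a Range header.
    return request($url, []);
}

function request(string $url, array $opts): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, !empty($opts['head'])); // true sends a HEAD request
    if (!empty($opts['range'])) {
        curl_setopt($ch, CURLOPT_RANGE, $opts['range']);      // e.g. "0-1023"
    }
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    return $code;
}
```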
How often broken links should be rechecked.
The checker will always try to get the final destination of a link. This allows for easy replacement of the links.
It either uses CURLOPT_FOLLOWLOCATION or checks a link recursively based on a redirect response code (301, 302, 303, 307) and the Location header.
If your server has allow_url_fopen restrictions, redirects cannot be followed directly by curl, so the recursive method is used.
If you want to disable following redirects completely, disable this option.
The 'Max Redirects' number controls the maximum number of hops to follow. If this number is reached, the link is reported as broken with the custom response 'Too many redirects'.
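For illustration, the curl side of this could look roughly like the following sketch (not the component's actual code; the URL is a placeholder):

```php
<?php
// Sketch: let curl follow redirects itself, up to a configured maximum.
$maxRedirects = 5; // corresponds to the 'Max Redirects' setting

$ch = curl_init('https://example.com/some-old-link');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);
curl_exec($ch);

// Error 47 (CURLE_TOO_MANY_REDIRECTS) means the hop limit was reached.
if (curl_errno($ch) === CURLE_TOO_MANY_REDIRECTS) {
    // report the link as broken: 'Too many redirects'
}

// The final destination after all hops, useful for link replacement.
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
```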
Depending on the response, the checker can keep the response from the remote server in the log.
Never
Disable saving any response, reducing database size.
Auto
Log the most common text responses if available.
If the server responds with a 200 code on a HEAD request, no content is exchanged and therefore nothing is logged.
If the server responds with a 206 - Partial Content on a RANGE request, only a part of the response is sent and therefore only that part is logged.
Always
Keep all the responses. This disables the range and head options.
Text
Keep all text-like responses. This disables the range and head options.
Some plugins, like the External Link Extractor, will override this setting when needed.
A website might contain links that cannot be checked, like mailto: links.
With this option, you can either show them in the reports (visible with the unchecked protocol filter) or ignore them completely.
Some websites serve content based on the browser's language.
This tells the remote server which language you prefer. The automatic setting is based on your website's default language.
These options are used to masquerade the checker as a browser. Some Web Application firewalls (WAFs) respond friendlier if Cookies are set and match the User-Agent.
You can create one or more custom signatures by adding a JSON file to [ROOT]/administrator/com_blc/forms/signatures.
Example:
{
    "userAgent": "Some user agent",
    "headers": [
        "Accept-Encoding: deflate, br",
        "Sec-Fetch-Dest: document",
        "Sec-Fetch-User: ?1",
        "Sec-Fetch-Mode: navigate",
        "Sec-Fetch-Site: none"
    ],
    "Accept-Language": "Language string / Optional"
}
The Accept-Language key is optional; if set, it will override the setting from the BLC configuration.
With this option enabled, the Fetch metadata request headers are adjusted according to the URL location.
Sec-Fetch-Site
This option is set to same-site for internal links and cross-site for external links.
Sec-Fetch-Mode
Should be set in the signature to 'navigate', as if a user clicks on a link to open an HTML page, image, video or whatever.
Sec-Fetch-Dest
Should be set in the signature to 'document', as if a user views the destination in the browser.
The checker does not distinguish between links in an <img>-tag and <a>-tags.
With this option On , SSL certificates of remote websites are checked. Websites with invalid certificates will be reported as broken with a pseudo response SSL Certificate Error (606).
Depending on your server's (mis)configuration, SSL validation might fail for all websites or cause timeouts.
Invalid Server certificates
The most common issues are with SSL certificates for the domain itself:
- Expired
- Invalid domain
- SSL certificate problem: unable to get local issuer certificate
These problems should be visible in your web browser as well. Hopefully the owner of the website will resolve the issue soon.
SSL Chain problems
The SSL certificate chain is the list of certificates that contains the SSL certificate, intermediate certificate authorities, and root certificate authority that enables the connecting device to verify that the SSL certificate is trustworthy. A server should send the intermediate certificate(s). However, some servers are misconfigured and only send the SSL certificate itself. Web browsers are quite forgiving; a tool like curl is not.
You can check a certificate on a website like https://www.ssllabs.com/ssltest/analyze.html. Messages like `This server's certificate chain is incomplete`, `Chain issues Incomplete`, `SSL certificate problem: unable to get local issuer certificate` and `NOT TRUSTED` indicate an SSL chain issue.
It is possible to solve these issues on the server. This is quite technical and requires root access.
As SSL connection errors are not reported as HTTP response codes, the checker uses custom codes to report SSL problems.
It seems that some WAFs block browsers polling for different TLS versions. They expect Chrome to use TLS v1.3 without bothering about the older versions.
This option enforces the checker to use the selected version only.
Most modern web servers should support TLS v1.3, so setting this option to this newest version should work fine for most.
The checker will encounter older servers where the SSL Connection fails (CURLE_SSL_CONNECT_ERROR (35)). In that case the checker will retry using TLS v1.2.
- default: use TLS v1.2 or TLS v1.3
- TLS 1.2: use TLS v1.2 and if that fails, fallback to default
- TLS 1.3: use TLS v1.3 and if that fails, fallback to default.
This is an advanced feature, best left to default.
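As a sketch of the behavior described above (a simplification; it needs a PHP/libcurl combination that knows the TLS 1.3 constant):

```php
<?php
// Sketch: force TLS 1.3 first, retry with TLS 1.2 on an SSL connect error (curl error 35).
function checkWithTls(string $url, int $sslVersion): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSLVERSION, $sslVersion);
    curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);

    return $errno;
}

$errno = checkWithTls('https://example.com/', CURL_SSLVERSION_TLSv1_3);

if ($errno === CURLE_SSL_CONNECT_ERROR) {
    // Older server: retry with TLS 1.2.
    $errno = checkWithTls('https://example.com/', CURL_SSLVERSION_TLSv1_2);
}
```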
A certificate authority (CA) is a trusted entity that issues Secure Sockets Layer (SSL) certificates for websites. It acts as a kind of signature to ensure the web server's certificate is valid and issued to the owner of the domain.
If you are getting a lot of SSL Chain Errors, your server might have an outdated certificate authority (CA) bundle. In that case you could try the Certificate Bundle shipped with Joomla (Bundled), try a search on the System, or provide your own bundle.
This option sets the curl option CURLOPT_CAINFO; in addition, curl uses the CURLOPT_CAPATH option to find certificates. So this option can be used to add missing intermediate certificates.
This option has no effect on curl implementations based on the Schannel library (Windows).
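For reference, the curl side of this option looks roughly like this (the bundle path is a placeholder):

```php
<?php
// Sketch: verify peers against an explicit CA bundle file (the path is just an example).
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_CAINFO, '/path/to/cacert.pem');
curl_exec($ch);
curl_close($ch);
```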
This enables verbose logging (CURLOPT_VERBOSE).
This adds information about the SSL handshake, request and response headers to the log.
Effective after the link is checked with this option On.
This limits the number of requests to your own or an external website.
Be friendly to your neighbors!
Links are only (re)checked if the throttle time has expired. The HTTP and Scheduled Task actions will skip the throttled domain and continue with other links, fetching new links from the database until Checks per HTTP batch is reached, or all links are considered.
The admin pseudo cron will simply skip the link. If this bothers you, set the value for Pause between checks to a value larger than Internal/External Domain throttle and you will never hit a throttled domain.
For the CLI, you can either skip a throttled domain or pause checking using the option Handle throttle with CLI. If you opt to skip links, the actual number of checked links might be lower than the configured Checks per CLI batch.
This is a list of hosts that should never be checked.
For example, Facebook has a login restriction on a lot of pages.
You can either ignore these links completely or report them marked to ignore using the What to do with these domains option
Pattern matching on the path of links (https://example.com/path/).
Intended to exclude internal links, but should work on all links.
As with the Domains above, you can control the reporting with What to do with the matches
The field is intended for domain(s) that always redirect, for example:
- Links to a URL shortener service
- Affiliate links
- Google services like drive and maps.
The redirects are followed to the final destination, so the checker should still report broken links (400+ Response codes) but will not show any redirects. That includes 'valid' redirects at the final destination.
Valid links will have a Redirect Ignored custom response code.
Mail Reports
With this option enabled, the report will include all items (articles, categories etc.) where the link was found, including the anchor and links to view and edit the item.
The minimum interval between two reports.
How many links should be reported. This limit applies per 'what' (report type).
If set to yes the report will only be sent if there are new (broken) links.
If an existing link appears in a new location, the link is still considered old.
With the options: Broken, Warning, Redirect, Parked and New you can select what types of (broken) links you want to report.
Run the report after an HTTP or CLI extract operation.
Whether a report is sent still depends on the frequency.
This option is not active when using the Admin pseudo cron.
Run the report after an HTTP or CLI check operation.
Whether a report is sent still depends on the frequency.
This option is not active when using the Admin pseudo cron.
One or more Joomla Users to receive the report.
Issues
More on this at: RS Form shows wrong links