Options and Settings of the Component
This page serves mostly a demonstration purpose. Please visit Administrator Component for more information.
Options & Settings
Broken Link Checker
This is the URL pointing to your homepage. It is used by the CLI interface only.
You can also specify --live-site <URL> on the command line, or place it in your configuration.php: public $live_site = '<URL>';
The live_site option is not accessible from the System Configuration. You will need direct access to configuration.php, either via FTP or the file browser in your hosting dashboard.
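For illustration, the configuration.php edit looks roughly like this (the JConfig class and $live_site property are Joomla's own; the URL is a placeholder):

```php
<?php
// configuration.php in the Joomla root - only the relevant property is shown.
class JConfig
{
    /* ... all other existing settings stay as they are ... */

    // Public URL of your homepage; the CLI has no request context, so it uses this value.
    public $live_site = 'https://www.example.com';
}
```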
With this option enabled, you get a nice Quick Icon in the 3rd party panel on the Administrator Dashboard. With the Admin Module active, a check counter will also be visible.
Minimum time between two (HTTP) cron requests. Intended as server protection; it prevents multiple simultaneous cron runs.
This token is a simple protection for the (HTTP) cron requests.
This helps to correct old domain names in your content to new ones, for example if you moved to a different domain or from a development location.
Checking links is quite resource intensive. It is relatively slow and uses outgoing and incoming connections on a server.
Since there are several methods to start the actual checking (task, CLI, HTTP and pseudo cron), multiple actions could run in parallel.
To prevent an overload of the system and possible triggering of web application firewalls, the broken link checker uses locking to prevent multiple parallel actions from executing.
As a result you might get a message like: Another instance of the broken link checker is running
By default, link checking is locked at the server level, so if you have multiple Joomla installations on a single server only one can actually check at a time.
Extracting and reporting are locked at the site level.
The broken link checker uses the GET_LOCK and RELEASE_LOCK feature of the database. That means server locks are at the database level; multiple web servers might share a single database.
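As background, a rough sketch of how such a database-level lock works, using Joomla's database API (the lock name 'blc.checking' is only an example, not necessarily the name the component uses):

```php
<?php
// Sketch: a MySQL/MariaDB named lock, which lives on the database server,
// not in the Joomla site - hence the server-level locking described above.
use Joomla\CMS\Factory;
use Joomla\Database\DatabaseInterface;

$db = Factory::getContainer()->get(DatabaseInterface::class);

// Try to acquire the lock, waiting at most 1 second.
$acquired = (int) $db->setQuery("SELECT GET_LOCK('blc.checking', 1)")->loadResult();

if ($acquired === 1) {
    try {
        // ... run the checking batch ...
    } finally {
        $db->setQuery("SELECT RELEASE_LOCK('blc.checking')")->execute();
    }
} else {
    // Another instance of the broken link checker is running
}
```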
You can change the minimum lock level. If set to none, jobs might run in parallel. This could result in database errors when inserting new links. Although ugly, this will not impact extracting and checking.
The lock does not apply when (re)checking a single link manually from the View & Fix Link page.
Global Plugin Options
This option allows you to enable or disable link replacements on a per-plugin basis.
Set the global option to Off and specific plugins to On to allow link replacement for only those plugins.
Set the global option to On and specific plugins to Off to prevent link replacement for only those plugins.
'One-click-link-replacement' is a convenient way to quickly update links. However, manually updating the link and the 'surrounding' content is often a better way to ensure the information is still consistent.
Extract only links from items that have Public Access.
See the Auto login plugin for more information on checking content with access restrictions
Extract only links from items that are in the Published state and where today is between the Publish Up and Publish Down dates, if present.
Trashed containers are never extracted.
Whenever an item is saved the content must be scanned for links again. This can be done:
- Immediately: Re-extract links
- On the next parse, by deleting the extracted data: Delete extracted data - links are not visible anymore in the Links Menu
- On the next parse, using the modified date: Do Nothing - links are visible in the Links Menu until the next extract.
Some containers, like modules, do not have a modified date. Use Re-extract or Delete.
When an item is trashed the extracted data must be removed.
- Immediately: Delete extracted data - links are not visible anymore in the Links Menu
- On the next parse: Do Nothing - links are visible in the Links Menu until the next extract
Plugin settings might impact the extracted data. This option allows you to purge all data for a plugin whenever the plugin settings change.
Extracting
The number of containers to extract links from per batch. A container can be an article, custom HTML module etcetera.
An item might have several containers. For example, an article has the content and the custom fields.
You can set different batch sizes for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron always extracts from five containers per request.
This parser extracts the href attribute from HTML <a> elements.
This parser extracts the src attribute from HTML <img> elements.
Extracting from <picture> or <figure> elements is not supported, nor is parsing of a srcset attribute.
In most situations, these elements and attributes are generated and added automatically, wrapped around a fallback <img> element. So if the system is set up properly, checking the src attribute of the <img> element is sufficient.
These parsers extract links from embedded content. Most content is embedded using shortcodes like {youtube src=''}, and there are quite a few flavors of these codes in use. Some are HTML elements.
By default, Joomla will not allow elements like iframe and embed in the editor. So if you are not using them, leave the parsers disabled to save a bit of resources.
If you need a specific code, please contact me.
iframe
This parser takes the src attribute of an iframe element
video
This parser takes the src attribute of a video element
embed
This parser takes the src attribute of an embed element
Aimy and AllVideos (by JoomlaWorks)
Extract the video URL or video-id from {youtube}https://youtu.be/zvUi1VZui7E{/youtube} and {vimeo}889793639{/vimeo}.
A video-id is converted into a proper link.
src style embed (includes All Video Share)
Link Replacements
Links containing, for example, spaces are not allowed in HTML; however, most browsers will happily show them without any problems.
For example, you upload images with spaces in their names and insert them into content. Newer Joomla versions replace the spaces with '%20'; older versions leave the spaces in the media input fields.
The fragment (the part after #) is always ignored. This avoids problems with the media input field: the URL in the media field consists of the path to the image and the Joomla source in the fragment. The URL itself is correctly encoded; however, the fragment is not.
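As a minimal sketch of what such a replacement amounts to (the file name and fragment below are made-up examples of a media-field value):

```php
<?php
// Hypothetical media-field value: a path containing a space, Joomla metadata in the fragment.
$url = 'images/my photo.jpg#joomlaImage://local-images/my photo.jpg?width=800&height=600';

// Split off the fragment; the checker ignores it and leaves it untouched.
[$path, $fragment] = explode('#', $url, 2) + [1 => ''];

// Encode only the path part, so the space becomes %20.
$encodedPath = str_replace(' ', '%20', $path);

echo $encodedPath; // images/my%20photo.jpg
```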
Be sure to have backups when replacing links.
Link replacement is not supported by all extractors/containers. Use the edit button to edit those manually.
The edit link might take you to a parent container, for example an article, while the actual link might be hiding elsewhere, like in a custom field.
More information: One Click Link replacement
Checking
Static path prefixes
These paths define the locations of static resources. On most Joomla websites, that will be the images/ folder and, in fewer cases, the templates/ folder.
How many links to check in one cron batch.
Compared to extracting links, checking them is far more time-consuming.
You can set different batch sizes for the HTTP and the Command Line Interface (CLI) crons, as the latter has less of a time restriction.
Typically, a website running PHP has a timeout (max_execution_time) of 30 - 60 seconds. The default is 30 seconds. The CLI usually has no time limit.
The max_execution_time divided by the Request Timeout setting gives a rough upper limit for the number of links that could be checked in one batch. You will find an estimate on the setup page.
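For example, with the default max_execution_time of 30 seconds and a Request Timeout of 10 seconds, at most about 30 / 10 = 3 links that actually hit the timeout fit in one batch; since most links respond much faster, the practical batch size is usually higher.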
Both the extraction and the checking phase are built as 'Failed? No sweat, let's try again later.' So it doesn't matter much if your job reaches the PHP timeout, but it is cleaner to stay within it.
The administrator Pseudo cron always checks one link per request.
Maximum time to wait for a response when checking a link.
You can set different values for the HTTP and the CLI cron, as the latter has less of a time restriction. The administrator Pseudo cron uses the HTTP setting.
There is a fine line between broken, maybe broken and working. With this option, links that appear broken but most likely work for a normal visitor are marked as a warning.
(This option will most likely be removed in a future version.)
This option is intended to reduce the amount of data transferred. With this option On, the checker sends a HEAD (are you there) request to the remote server. Most servers respond correctly for static content like images and stylesheets.
If the response is any kind of error or redirect, thus a response other than 200, the link is rechecked with a GET request.
Recheck link after is the recheck interval for all links, broken or working. Links marked to ignore are always skipped.
With the Recheck broken link after setting, you can set a shorter interval for broken links, to catch temporary failures.
This option is intended to reduce the amount of data transferred. With this option On, the checker requests only a part of the remote content. Successful Range requests have response code 206.
Usually, Range requests only work with static content like images and zip files.
If the response is any kind of error or redirect, thus a response other than 200 or 206, the link is rechecked using a normal GET.
So with Use Head request and Use Range request enabled, the requests are:
- HEAD
  - response 200: perfect, finish checking
- GET with Range header
  - response 200 or 206: perfect, finish checking
- GET without Range header
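When both options are enabled, that fallback sequence could be sketched with PHP's curl functions as follows (an illustration of the flow above, not the component's actual code):

```php
<?php
// Sketch of the HEAD -> GET with Range -> plain GET fallback described above.
function probe(string $url): int
{
    // 1. HEAD request: no body is transferred.
    $code = request($url, ['head' => true]);
    if ($code === 200) {
        return $code;
    }

    // 2. GET with a Range header: ask for the first bytes only.
    $code = request($url, ['range' => '0-1023']);
    if ($code === 200 || $code === 206) {
        return $code;
    }

    // 3. Plain GET without a Range header.
    return request($url, []);
}

function request(string $url, array $opts): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, !empty($opts['head'])); // true sends a HEAD request
    if (!empty($opts['range'])) {
        curl_setopt($ch, CURLOPT_RANGE, $opts['range']);      // e.g. "0-1023"
    }
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    return $code;
}
```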
How often broken links should be rechecked.
The checker will always try to get the final destination of a link. This allows for easy replacement of the links.
It either uses CURLOPT_FOLLOWLOCATION or checks a link recursively based on a redirect response code (301, 302, 303, 307) and the Location header.
If your server has allow_url_fopen restrictions, redirects cannot be followed directly by curl, so the recursive method is used.
If you want to disable following redirects completely, disable this option.
The 'Max Redirects' number controls the maximum number of hops to follow. If this number is reached, the link is reported as broken with the custom response 'Too many redirects'.
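For illustration, the curl side of this could look roughly like the following sketch (not the component's actual code; the URL is a placeholder):

```php
<?php
// Sketch: let curl follow redirects itself, up to a configured maximum.
$maxRedirects = 5; // corresponds to the 'Max Redirects' setting

$ch = curl_init('https://example.com/some-old-link');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);
curl_exec($ch);

// Error 47 (CURLE_TOO_MANY_REDIRECTS) means the hop limit was reached.
if (curl_errno($ch) === CURLE_TOO_MANY_REDIRECTS) {
    // report the link as broken: 'Too many redirects'
}

// The final destination after all hops, useful for link replacement.
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
```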
Depending on the response, the checker can keep the response from the remote server in the log.
Never
Disable saving any response, reducing database size.
Auto
Log the most common text responses if available.
If the server responds with a 200 code on a HEAD request, no content is exchanged and therefore nothing is logged.
If the server responds with a 206 - Partial Content on a RANGE request, only a part of the response is sent and therefore only that part is logged.
Always
Keep all the responses. This disables the range and head options.
Text
Keep all text-like responses. This disables the range and head options.
Some plugins, like the External Link Extractor, will override this setting when needed.
A website might contain links that cannot be checked, like mailto: links.
With this option, you can either show them in the reports (visible with the unchecked protocol filter) or ignore them completely.
Some websites serve content based on the browser's language.
This tells the remote server which language you prefer. The automatic setting is based on your website's default language.
These options are used to masquerade the checker as a browser. Some Web Application firewalls (WAFs) respond friendlier if Cookies are set and match the User-Agent.
You can create one or more custom signatures by adding a JSON file to [ROOT]/administrator/com_blc/forms/signatures.
Example:
{
    "userAgent": "Some user agent",
    "headers": [
        "Accept-Encoding: deflate, br",
        "Sec-Fetch-Dest: document",
        "Sec-Fetch-User: ?1",
        "Sec-Fetch-Mode: navigate",
        "Sec-Fetch-Site: none"
    ],
    "Accept-Language": "Language string / Optional"
}
The Accept-Language key is optional; if set, it will override the setting from the BLC configuration.
With this option enabled, the Fetch metadata request headers are adjusted according to the URL location.
Sec-Fetch-Site
This option is set to same-site for internal links and cross-site for external links.
Sec-Fetch-Mode
Should be set in the signature to 'navigate', as if a user clicks on a link to open an HTML page, image, video or whatever.
Sec-Fetch-Dest
Should be set in the signature to 'document', as if a user views the destination in the browser.
The checker does not distinguish between links in an <img>-tag and <a>-tags.
With this option On , SSL certificates of remote websites are checked. Websites with invalid certificates will be reported as broken with a pseudo response SSL Certificate Error (606).
Depending on your server's (mis)configuration, SSL validation might fail for all websites or cause timeouts.
Invalid Server certificates
The most common issues are with SSL certificates for the domain itself:
- Expired
- Invalid domain
- SSL certificate problem: unable to get local issuer certificate
These problems should be visible in your web browser as well. Hopefully the owner of the website will resolve the issue soon.
SSL Chain problems
The SSL certificate chain is the list of certificates that contains the SSL certificate, intermediate certificate authorities, and root certificate authority that enables the connecting device to verify that the SSL certificate is trustworthy. A server should send the intermediate certificate(s). However, some servers are misconfigured and only send the SSL certificate itself. Web browsers are quite forgiving; a tool like curl is not.
You can check a certificate on a website like https://www.ssllabs.com/ssltest/analyze.html. Messages like `This server's certificate chain is incomplete`, `Chain issues Incomplete`, `SSL certificate problem: unable to get local issuer certificate` and `NOT TRUSTED` indicate an SSL chain issue.
It is possible to solve these issues on the server. This is quite technical and requires root access.
As SSL connection errors are not reported as HTTP response codes, the checker uses custom codes to report SSL problems.
It seems that some WAFs block browsers polling for different TLS versions. They expect Chrome to use TLS v1.3 without bothering about the older versions.
This option enforces the checker to use the selected version only.
Most modern web servers should support TLS v1.3, so setting this option to this newest version should work fine for most.
The checker will encounter older servers where the SSL Connection fails (CURLE_SSL_CONNECT_ERROR (35)). In that case the checker will retry using TLS v1.2.
- default: use TLS v1.2 or TLS v1.3
- TLS 1.2: use TLS v1.2 and if that fails, fallback to default
- TLS 1.3: use TLS v1.3 and if that fails, fallback to default.
This is an advanced feature, best left to default.
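As a sketch of the behavior described above (a simplification; it needs a PHP/libcurl combination that knows the TLS 1.3 constant):

```php
<?php
// Sketch: force TLS 1.3 first, retry with TLS 1.2 on an SSL connect error (curl error 35).
function checkWithTls(string $url, int $sslVersion): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSLVERSION, $sslVersion);
    curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);

    return $errno;
}

$errno = checkWithTls('https://example.com/', CURL_SSLVERSION_TLSv1_3);

if ($errno === CURLE_SSL_CONNECT_ERROR) {
    // Older server: retry with TLS 1.2.
    $errno = checkWithTls('https://example.com/', CURL_SSLVERSION_TLSv1_2);
}
```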
A certificate authority (CA) is a trusted entity that issues Secure Sockets Layer (SSL) certificates for websites. It acts as a kind of signature to ensure the web server's certificate is valid and issued to the owner of the domain.
If you are getting a lot of SSL Chain Errors, your server might have an outdated certificate authority (CA) bundle. In that case you could try the Certificate Bundle shipped with Joomla (Bundled), try a search on the System, or provide your own bundle.
This option sets the curl option CURLOPT_CAINFO; in addition, curl uses the CURLOPT_CAPATH option to find certificates. So this option can be used to add missing intermediate certificates.
This option has no effect on curl implementations based on the Schannel library (Windows).
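For reference, the curl side of this option looks roughly like this (the bundle path is a placeholder):

```php
<?php
// Sketch: verify peers against an explicit CA bundle file (the path is just an example).
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_CAINFO, '/path/to/cacert.pem');
curl_exec($ch);
curl_close($ch);
```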
This enables verbose logging (CURLOPT_VERBOSE).
This adds information about the SSL handshake, request and response headers to the log.
Effective after the link is checked with this option On.
This limits the number of requests to your own or an external website.
Be friendly to your neighbors!
Links are only (re)checked if the throttle time has expired. The HTTP and Scheduled Task actions will skip the throttled domain and continue with other links, fetching new links from the database until Checks per HTTP batch is reached, or all links are considered.
The admin pseudo cron will simply skip the link. If this bothers you, set the value for Pause between checks to a value larger than Internal/External Domain throttle and you will never hit a throttled domain.
For the CLI, you can either skip a throttled domain or pause checking using the option Handle throttle with CLI. If you opt to skip links, the actual number of checked links might be lower than the configured Checks per CLI batch.
This is a list of hosts that should never be checked.
For example, Facebook has a login restriction on a lot of pages.
You can either ignore these links completely or report them marked to ignore using the What to do with these domains option
Pattern matching on the path of links (https://example.com/path/).
Intended to exclude internal links, but should work on all links.
As with the Domains above, you can control the reporting with What to do with the matches
The field is intended for domain(s) that always redirect, for example:
- Links to a URL shortener service
- Affiliate links
- Google services like drive and maps.
The redirects are followed to the final destination, so the checker should still report broken links (400+ Response codes) but will not show any redirects. That includes 'valid' redirects at the final destination.
Valid links will have a Redirect Ignored custom response code.
Mail Reports
With this option enabled, the report will include all items (articles, categories etc.) where the link was found, including the anchor and links to view and edit the item.
The minimum interval between two reports.
How many links should be reported. This limit applies per 'what' (report type).
If set to yes the report will only be sent if there are new (broken) links.
If an existing link appears in a new location, the link is still considered old.
With the options: Broken, Warning, Redirect, Parked and New you can select what types of (broken) links you want to report.
Run the report after an HTTP or CLI extract operation.
Whether a report is sent still depends on the frequency.
This option is not active when using the Admin pseudo cron.
Run the report after an HTTP or CLI check operation.
Whether a report is sent still depends on the frequency.
This option is not active when using the Admin pseudo cron.
One or more Joomla Users to receive the report.
Issues
More on this at: RS Form shows wrong links