Article Extractor and Checker
You can use the Broken Link Checker extension to scan your entire Joomla content for broken links. This plugin allows you to find inaccessible links present in your Joomla articles. Schedule a scan to run after a certain interval, go through the real-time report containing status and location of broken links, and use the fix link feature to edit and fix the broken links identified during the link scan. The extension is compatible with both Joomla 4.4 and Joomla 5.
This plugin acts as both an extractor and a checker.
Extracting links from Joomla Articles
This BLC Content Plugin handles links for Joomla articles in the following locations:
- text areas of the article (introtext and fulltext)
- images from the 'images and links' tab
- URLs from the 'images and links' tab
This plugin is included in the Broken Link Checker Package.
Checking (fixing) Internal Links
Wrong categories
Joomla prefers links to other articles to be stored as relative query links in your content. For example, this article (with id 3) might contain a link to the Unsef plugin; that link looks like: index.php?option=com_content&view=article&id=7&catid=8. The Joomla SEF plugin takes care of changing this link to /extensions/unsef.
However, if the category of the destination article changes, for example from extensions to documents (catid=8 to catid=9), the link in this article is not updated. It remains catid=8.
Joomla SEF does not retrieve the current category for the article but uses the one provided in the link. With a wrong category ID, the link to an article might end up in the wrong category or result in a 404 - Page not found, depending on your SEF settings.
This plugin can (optionally) check and fix the category of article links. The links are not replaced automatically but presented as internal redirects.
Another tool to correct categories is the SEF Plus System Plugin, which will take care of correcting categories.
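The category check described above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: the CURRENT_CATEGORY lookup and the check_category function are hypothetical names, and a real checker would read the current category from the Joomla database.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical lookup: article id -> current category id.
# Article 7 was moved from category 8 to category 9.
CURRENT_CATEGORY = {7: 9}


def check_category(link: str) -> Optional[str]:
    """Return a corrected link when catid is stale, otherwise None."""
    parts = urlsplit(link)
    params = dict(parse_qsl(parts.query))
    if params.get("option") != "com_content" or params.get("view") != "article":
        return None  # not an article link
    # Both id and catid may carry an alias, for example "7:unsef".
    raw_id = params.get("id", "").split(":")[0]
    raw_catid = params.get("catid", "").split(":")[0]
    if not raw_id.isdigit():
        return None
    actual = CURRENT_CATEGORY.get(int(raw_id))
    if actual is None or raw_catid == str(actual):
        return None  # unknown article, or the category is already correct
    params["catid"] = str(actual)
    return parts.path + "?" + urlencode(params, safe=":")


print(check_category("index.php?option=com_content&view=article&id=7&catid=8"))
```

The function only proposes a target; it does not rewrite the article, which mirrors the idea of presenting the fix as an internal redirect.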
Options
Plugin
With this option enabled, the checker will find the correct category ID for each article.
To build the SEF links, Joomla needs the article alias and the category alias. For example, for this article the alias would be 'aliases' and the category alias would be 'content-extractor-options'.
Old-style Joomla links have the alias as part of the query URL:
index.php?id=200:aliases&catid=32:content-extractor-options
Newer editors insert the links without the aliases:
index.php?id=200&catid=32
The problem is that if the aliases are set but wrong, Joomla will happily create wrong SEF links. For example:
index.php?id=200:old-alias&catid=32:content-extractor-options
would go to
/options/content-extractor-options/200-old-alias
Although that works fine, it gets worse if you have the option Remove IDs from URLs enabled in the Articles Integrations Options. Then Joomla will create a link that results in a 404 page:
/options/content-extractor-options/old-alias
The broken link checker can detect and report wrong aliases. You can set the alias:
- As current: if the query has the alias set, keep it, replacing it with the correct one if needed.
- Always remove
- Always add
Adding the alias saves on a database query, per link, when rendering a page. Unless you frequently change the aliases, this is probably the best option.
Removing the alias ensures the aliases are always retrieved fresh from the database and are therefore correct. However, the Joomla routing system will not correct a wrong category ID; the SEF Plus plugin does.
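The three alias settings can be illustrated with a small Python sketch. It is not the plugin's implementation: the ARTICLE_ALIAS and CATEGORY_ALIAS tables, the mode names "current", "remove" and "add", and the function names are assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical alias tables; a real checker would query the database.
ARTICLE_ALIAS = {200: "aliases"}
CATEGORY_ALIAS = {32: "content-extractor-options"}


def _fix(value, table, mode):
    """Rewrite one 'id' or 'id:alias' value according to the chosen mode."""
    number, _, old_alias = value.partition(":")
    if not number.isdigit():
        return value  # not a numeric id, leave untouched
    if mode == "remove":
        return number
    alias = table.get(int(number))
    if alias is None:
        return value  # unknown item, nothing to correct against
    if mode == "add" or old_alias:
        # "add" always appends; "current" only corrects an alias that is present
        return f"{number}:{alias}"
    return number


def normalise_aliases(link, mode="current"):
    parts = urlsplit(link)
    params = dict(parse_qsl(parts.query))
    if "id" in params:
        params["id"] = _fix(params["id"], ARTICLE_ALIAS, mode)
    if "catid" in params:
        params["catid"] = _fix(params["catid"], CATEGORY_ALIAS, mode)
    return parts.path + "?" + urlencode(params, safe=":")


print(normalise_aliases("index.php?id=200:old-alias&catid=32:content-extractor-options"))
```

With the default "current" mode, the wrong alias old-alias is replaced by the correct one, while a link without aliases is left as it is.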
Custom Fields
Whether to enable extracting and replacing links for custom fields at all.
There are many different types of custom fields. Only a few might contain links.
- In a default configuration, the 'Editor' and 'URL' types may contain links.
  - 'Editor' is parsed as HTML.
  - 'URL' is treated as a plain text link.
- 'Textarea' and 'Text' might contain HTML with links, depending on your site's configuration.
  - For the 'Text' type it is more likely that these fields contain plain text links. See the setting Treat fields as Link (ids).
- The 'Media' fields contain plain text links.
Disabling extracting for specific types has a small performance benefit.
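As a rough illustration of the difference between parsing a field as HTML and treating it as a plain text link, consider the following Python sketch. The extract_links function, the lowercase type names and the treat_as_link flag are hypothetical and only mirror the behaviour described above.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if (tag == "a" and name == "href") or (tag == "img" and name == "src"):
                self.links.append(value)


PLAIN_TYPES = {"url", "media"}       # the whole value is the link
HTML_TYPES = {"editor"}              # the value is parsed as HTML
MIXED_TYPES = {"text", "textarea"}   # depends on the site configuration


def extract_links(field_type, value, treat_as_link=False):
    """Return the links found in one custom field value."""
    if field_type in PLAIN_TYPES or (field_type in MIXED_TYPES and treat_as_link):
        value = value.strip()
        return [value] if value else []
    if field_type in HTML_TYPES or field_type in MIXED_TYPES:
        collector = _LinkCollector()
        collector.feed(value)
        collector.close()
        return collector.links
    return []  # other field types are not expected to hold links


print(extract_links("editor", '<p><a href="https://example.org/a">A</a></p>'))
```

Note that a 'Text' field holding a bare URL yields nothing when parsed as HTML, which is why such fields need to be marked as plain text links.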
With the Replace button, you enable one-click replacement of links. Ensure you have a backup or rollback option before enabling link replacements.
If enabled, 'Text' fields are treated as HTML. However, they might be used as plain text links.
In this list, you can select which 'Text' and 'Textarea' fields should be treated as plain text.
Due to a limitation of the Joomla form system, you will always see 'Text' and 'Textarea' fields from all components. So in the Article extractor you will also see fields for Categories.
Subforms are traversed, and the fields are treated as above.
The returned values of a SQL type field might change over time, for example, if the query returns a list of published articles:
SELECT id AS value, title AS text from #__content where state = 1 ORDER BY value=0 desc, text ASC
If a selected value (article) is unpublished or deleted, the stored value in the SQL field is not valid anymore.
Joomla itself does not validate values during display.
This option allows you to check/validate custom field values against the query set in the field definition.
To perform this task, the field values are extracted, stored and displayed as a kind of link: sqlfield:://<id>/<data>
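The validation idea can be sketched in Python with an in-memory SQLite database standing in for the Joomla database. Everything here is an assumption for illustration: the jos_ prefix, the demo_connection helper and the find_stale_values function are not part of the plugin, and the query is the example above without its ORDER BY clause, since ordering does not matter for validation.

```python
import sqlite3

# "#__" is Joomla's placeholder for the table prefix.
FIELD_QUERY = "SELECT id AS value, title AS text FROM #__content WHERE state = 1"


def demo_connection(prefix="jos_"):
    """Build a throwaway table standing in for the content table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {prefix}content (id INTEGER, title TEXT, state INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {prefix}content VALUES (?, ?, ?)",
        [(1, "Published article", 1), (2, "Unpublished article", 0)],
    )
    return conn


def find_stale_values(conn, field_query, stored_values, prefix="jos_"):
    """Return the stored values that the field query no longer offers."""
    sql = field_query.replace("#__", prefix)
    allowed = {str(row[0]) for row in conn.execute(sql)}
    return [value for value in stored_values if str(value) not in allowed]


conn = demo_connection()
print(find_stale_values(conn, FIELD_QUERY, ["1", "2", "3"]))
```

Here article 2 is unpublished and article 3 does not exist, so both stored values would be reported as invalid.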
Joomla does not delete field values if a field is deleted, unpublished or not applicable anymore (for example if the categories don't match).
Preserving the values might be handy if you move articles and fields between categories, but it also clutters the database.
This option will remove obsolete values when the container is parsed.
This option does not extract or check data. It will show a message if any field values are purged.
Advanced
Plugin settings might impact the extracted data. This option allows you to purge all data for a plugin whenever the plugin settings change.
When an item is trashed the extracted data must be removed.
- immediately: Delete extracted data - Links are no longer visible in the Links Menu
- on the next parse: Do Nothing - Links are visible in the Links Menu until the next extract
Whenever an item is saved, the content must be scanned for links again. This can be done:
- immediately: Re-extract links
- on the next parse by deleting the extracted data: Delete extracted data - Links are no longer visible in the Links Menu
- on the next parse using the modified date: Do Nothing - Links are visible in the Links Menu until the next extract.
Some containers, like modules, do not have a modified date. Use Re-extract or Delete for those.
Extract links only from items that are in the Published state and for which today falls between the Publish Up and Publish Down dates, if present.
Trashed containers are never extracted.
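The publish check can be expressed as a short Python sketch. The is_extractable function and its parameter names are hypothetical; the state values follow Joomla's convention (1 published, 0 unpublished, 2 archived, -2 trashed).

```python
from datetime import datetime
from typing import Optional

PUBLISHED = 1  # Joomla's published state


def is_extractable(
    state: int,
    publish_up: Optional[datetime],
    publish_down: Optional[datetime],
    now: datetime,
) -> bool:
    """True when the item is published and 'now' lies inside its publish window."""
    if state != PUBLISHED:
        return False  # unpublished, archived and trashed items are skipped
    if publish_up is not None and now < publish_up:
        return False  # not yet published
    if publish_down is not None and now > publish_down:
        return False  # publication has ended
    return True


today = datetime(2024, 6, 1)
print(is_extractable(1, datetime(2024, 1, 1), None, today))
```

A missing Publish Up or Publish Down date is simply treated as an open-ended window.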
Extract links only from items that have Public access.
See the Auto login plugin for more information on checking content with access restrictions.
This option allows you to enable or disable link replacements on a per-plugin basis.
Set the global option to Off and specific plugins to On to allow link replacement for only those plugins.
Set the global option to On and specific plugins to Off to prevent link replacement for only those plugins.
'One-click-link-replacement' is a convenient way to quickly update links. However, manually updating the link and the 'surrounding' content is often a better way to ensure the information is still consistent.