Article Extractor and Checker

You can use the Broken Link Checker extension to scan all your Joomla content for broken links. This plugin finds inaccessible links in your Joomla articles. Schedule a scan to run at a set interval, review the real-time report showing the status and location of each broken link, and use the fix link feature to edit and repair the links identified during the scan. The extension is compatible with both Joomla 4.4 and Joomla 5.

This plugin acts as both an extractor and a checker.

Extracting links from Joomla Articles

This BLC Content Plugin handles links for Joomla articles in the following locations:

  • text areas of the article (introtext and fulltext)
  • images from the 'images and links' tab
  • URLs from the 'images and links' tab

This plugin is included in the Broken Link Checker Package.

Checking (fixing) Internal Links

Wrong categories

Joomla prefers links to other articles to be stored as relative query links in your content. For example, this article (with id 3) might contain a link to the Unsef plugin; that link looks like: index.php?option=com_content&view=article&id=7&catid=8. The Joomla SEF plugin takes care of changing this link to /extensions/unsef.

However, if the category of the destination article changes, for example from extensions to documents (catid=8 to catid=9), the link in this article is not updated. It remains catid=8.

Joomla SEF does not retrieve the current category for the article but uses the one provided in the link. With a wrong category ID, the link to an article might end up in the wrong category or result in a 404 - Page not found, depending on your SEF settings.

This plugin can (optionally) check and fix the category of article links. The links are not replaced automatically but presented as internal redirects.
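
The category check described above can be sketched as follows. This is only an illustration, not the plugin's actual code: the function name check_category and the CURRENT_CATEGORY lookup table are hypothetical stand-ins for a database lookup.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical lookup table: article id -> current category id.
# On a real site this information lives in the database.
CURRENT_CATEGORY = {7: 9}


def check_category(link: str, current_category: Dict[int, int]) -> Optional[str]:
    """Return a corrected link if the catid in the link is stale, else None."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    if query.get("option") != ["com_content"] or query.get("view") != ["article"]:
        return None  # not an article link
    if "id" not in query:
        return None
    # Values may carry an alias ("7:unsef"); only the numeric part matters here.
    article_id = int(query["id"][0].split(":")[0])
    link_catid = int(query.get("catid", ["0"])[0].split(":")[0])
    actual = current_category.get(article_id)
    if actual is None or actual == link_catid:
        return None  # unknown article, or the category is already correct
    query["catid"] = [str(actual)]
    return parts.path + "?" + urlencode(query, doseq=True, safe=":")


stale = "index.php?option=com_content&view=article&id=7&catid=8"
print(check_category(stale, CURRENT_CATEGORY))
```

The sketch returns the corrected link rather than rewriting content, which mirrors the idea of presenting the fix as an internal redirect for review.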

Another tool to correct categories is the SEF Plus System Plugin, which will take care of correcting categories.

Options

Plugin

Check the Category ID in parsed links

With this option enabled, the checker will find the correct category ID for each article.

Aliases

To build the SEF links, Joomla needs the article alias and the category alias. For example, for this article the alias would be 'aliases' and the category alias would be 'content-extractor-options'.

Old-style Joomla links have the alias as part of the query URL:

index.php?id=200:aliases&catid=32:content-extractor-options

Newer editors insert the links without the aliases:

index.php?id=200&catid=32

The problem is that if the aliases are set but wrong, Joomla will happily create wrong SEF links. For example:

 index.php?id=200:old-alias&catid=32:content-extractor-options

would go to

/options/content-extractor-options/200-old-alias

Although that works fine, it gets worse if you have the option Remove IDs from URLs enabled in the Articles Integrations Options. Then Joomla will create this link:

/options/content-extractor-options/old-alias

which results in a 404 page.

The broken link checker can detect and report wrong aliases. You can set the alias:

  • As current
    If the query has the alias set, keep it, replacing it with the correct one if needed.
  • Always remove
  • Always add

Adding the alias saves on a database query, per link, when rendering a page. Unless you frequently change the aliases, this is probably the best option.

Removing the alias ensures the aliases are always retrieved fresh from the database and are therefore correct. The Joomla routing system will not, however, correct a wrong category ID; the SEF Plus plugin does.
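
The three alias settings can be illustrated with a small sketch. It assumes two hypothetical lookup tables (ARTICLE_ALIAS and CATEGORY_ALIAS) in place of the database, and the mode names 'current', 'remove' and 'add' are labels chosen for this example only.

```python
import re

# Hypothetical lookup tables standing in for the database.
ARTICLE_ALIAS = {200: "aliases"}
CATEGORY_ALIAS = {32: "content-extractor-options"}

# Matches id=200, id=200:some-alias, catid=32, catid=32:some-alias
PARAM = re.compile(r"\b(id|catid)=(\d+)(?::([^&]*))?")


def normalise_aliases(link: str, mode: str) -> str:
    """Rewrite id/catid parameters; mode is 'current', 'remove' or 'add'."""

    def fix(match: "re.Match") -> str:
        name, number, alias = match.group(1), match.group(2), match.group(3)
        table = ARTICLE_ALIAS if name == "id" else CATEGORY_ALIAS
        correct = table.get(int(number))
        # 'add' always writes an alias; 'current' only when one was present.
        keep = mode == "add" or (mode == "current" and alias is not None)
        if keep and correct is not None:
            return f"{name}={number}:{correct}"
        return f"{name}={number}"  # 'remove', or no known alias

    return PARAM.sub(fix, link)


wrong = "index.php?id=200:old-alias&catid=32:content-extractor-options"
for mode in ("current", "remove", "add"):
    print(mode, normalise_aliases(wrong, mode))
```

In every mode a wrong alias never survives: it is either replaced with the one from the lookup table or dropped.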

Custom Fields

Enable Custom fields

Whether to enable extracting and replacing links for custom fields at all.

Custom Fields: 'type'

There are many different types of custom fields. Only a few might contain links.

  • In a default configuration, the 'Editor' type and the 'URL' types may contain links.
    • 'Editor' is parsed as HTML,
    • 'URL' as plain text link.
  • 'Textarea' and 'Text' might contain HTML with links, depending on your site's configuration.
    • For the 'Text' field it is more likely that these fields contain plain text links. See the setting Treat fields as Link (ids).
  • The 'Media' fields contain plain text links.

Disabling extracting for specific types has a small performance benefit.
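
As a rough illustration of the difference between parsing a field as HTML and as a plain text link, consider the sketch below. The type names follow the list above; the function extract_links and its treat_as_plain flag are hypothetical and only mimic the effect of the Treat fields as Link (ids) setting.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


# Assumed grouping of field types, based on the list above.
HTML_TYPES = {"editor", "textarea", "text"}
PLAIN_TYPES = {"url", "media"}


def extract_links(field_type: str, value: str, treat_as_plain: bool = False) -> List[str]:
    """Return the links found in one custom field value."""
    if field_type in PLAIN_TYPES or (treat_as_plain and field_type in HTML_TYPES):
        link = value.strip()
        return [link] if link else []
    if field_type in HTML_TYPES:
        collector = LinkCollector()
        collector.feed(value)
        return collector.links
    return []  # other field types are not expected to hold links


print(extract_links("editor", '<a href="https://example.com/a">a</a>'))
print(extract_links("text", "https://example.com/b", treat_as_plain=True))
```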

The Replace button enables one-click replacement of links. Ensure you have a backup or rollback option before enabling link replacements.

Treat fields as Link (ids)

If enabled, 'Text' fields are treated as HTML. However, they might instead hold plain text links.

In this list, you can select which 'Text' and 'Textarea' fields should be treated as plain text links.

Due to a limitation of the Joomla form system, you will always see 'Text' and 'Textarea' fields from all components. For example, in the Article extractor you will also see fields for Categories.

Custom Fields: subforms

Subforms are traversed, and their fields are treated as described above.

SQL fields

The returned values of a SQL type field might change over time, for example, if the query returns a list of published articles:

SELECT id AS value, title AS text FROM #__content WHERE state = 1 ORDER BY value=0 DESC, text ASC

If a selected value (article) is unpublished or deleted, the stored value in the SQL field is not valid anymore.

Joomla itself does not validate values during display.

This option lets you validate custom field values against the query set in the field definition.

To perform this task, the field values are extracted, stored and displayed as a kind of link: sqlfield:://<id>/<data>
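
The validation idea can be sketched with an in-memory SQLite database. Everything here is an assumption made for illustration: the table is a simplified stand-in for #__content, the query omits the ORDER BY clause, and invalid_values is a hypothetical helper rather than part of the extension.

```python
import sqlite3

# Simplified stand-in for the #__content table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT, state INTEGER)")
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, "Published article", 1), (2, "Unpublished article", 0)],
)

# The query from the field definition, without ordering.
FIELD_QUERY = "SELECT id AS value, title AS text FROM content WHERE state = 1"


def invalid_values(connection, field_query, stored_values):
    """Return the stored values that the field's query no longer offers."""
    allowed = {str(row[0]) for row in connection.execute(field_query)}
    return [value for value in stored_values if str(value) not in allowed]


# Article 2 is unpublished and article 3 does not exist.
print(invalid_values(conn, FIELD_QUERY, ["1", "2", "3"]))
```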

Purge obsolete fields

Joomla does not delete field values if a field is deleted, unpublished or not applicable anymore (for example if the categories don't match).

Preserving the values might be handy if you move articles and fields between categories, but it also clutters the database.

This option will remove obsolete values when the container is parsed.

This option does not extract or check data. It will show a message if any field values are purged.
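
A minimal sketch of the purge rule, assuming a field value is obsolete when its field is deleted, unpublished, or not assigned to the item's category. The FieldDef class and purge_obsolete function are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FieldDef:
    published: bool
    category_ids: FrozenSet[int]  # empty set = applies to every category


def purge_obsolete(values: Dict[int, str], fields: Dict[int, FieldDef], item_catid: int) -> List[int]:
    """Remove obsolete entries from values (field id -> stored value) in place.

    Returns the ids of the purged fields so that a message can be shown.
    """
    purged = []
    for field_id in list(values):
        field = fields.get(field_id)  # None means the field was deleted
        applicable = (
            field is not None
            and field.published
            and (not field.category_ids or item_catid in field.category_ids)
        )
        if not applicable:
            del values[field_id]
            purged.append(field_id)
    return purged


fields = {
    1: FieldDef(published=True, category_ids=frozenset()),
    2: FieldDef(published=False, category_ids=frozenset()),
    3: FieldDef(published=True, category_ids=frozenset({9})),
}
values = {1: "a", 2: "b", 3: "c", 4: "d"}
print(purge_obsolete(values, fields, item_catid=8), values)
```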

Advanced

Delete extracted data on plugin save

Plugin settings might affect the extracted data. This option lets you purge all data for a plugin whenever the plugin settings change.

When Deleted

When an item is trashed, the extracted data must be removed. This can be done:

  • immediately: Delete extracted data - Links are no longer visible in the Links Menu
  • on the next parse: Do Nothing - Links are visible in the Links Menu until the next extract

When Saved

Whenever an item is saved, the content must be scanned for links again. This can be done:

  • immediately: Re-extract links
  • on the next parse by deleting the extracted data: Delete extracted data - Links are no longer visible in the Links Menu
  • on the next parse using the modified date: Do Nothing - Links are visible in the Links Menu until the next extract

Some containers, such as modules, do not have a modified date. For these, use Re-extract links or Delete extracted data.
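
The three When Saved choices can be pictured as a small dispatch over a store of extracted links. This is a sketch under assumptions: the OnSave enum, the handle_save function, the dictionary store and the extract callable are all hypothetical and do not reflect the extension's internals.

```python
from enum import Enum
from typing import Callable, Dict, List


class OnSave(Enum):
    REEXTRACT = "Re-extract links"
    DELETE = "Delete extracted data"
    NOTHING = "Do Nothing"


def handle_save(
    policy: OnSave,
    item_id: int,
    content: str,
    store: Dict[int, List[str]],
    extract: Callable[[str], List[str]],
) -> None:
    """Update the store of extracted links when an item is saved."""
    if policy is OnSave.REEXTRACT:
        store[item_id] = extract(content)  # links are refreshed immediately
    elif policy is OnSave.DELETE:
        store.pop(item_id, None)  # links disappear until the next parse
    # OnSave.NOTHING: old links stay until the next parse notices the
    # newer modified date, which is why it needs a container with that date.


store = {1: ["https://old.example"]}
handle_save(OnSave.REEXTRACT, 1, "https://a.example https://b.example", store, str.split)
print(store)
```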

Only extract from published content

Extract links only from items that are in the Published state and for which today falls between the Publish Up and Publish Down dates, if present.

Trashed containers are never extracted.
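
The publication test amounts to a state check plus an optional date window. A minimal sketch, assuming the state value 1 means published (as in the SQL example on this page) and using a hypothetical function name:

```python
from datetime import datetime
from typing import Optional

PUBLISHED = 1  # assumed state value for published items


def is_extractable(
    state: int,
    publish_up: Optional[datetime],
    publish_down: Optional[datetime],
    now: datetime,
) -> bool:
    """True if the item is published and now lies inside its publish window."""
    if state != PUBLISHED:
        return False  # unpublished, archived or trashed
    if publish_up is not None and now < publish_up:
        return False  # not yet published
    if publish_down is not None and now > publish_down:
        return False  # publication has ended
    return True


now = datetime(2024, 6, 1)
print(is_extractable(1, datetime(2024, 1, 1), None, now))
print(is_extractable(1, None, datetime(2024, 5, 1), now))
```

Missing dates are treated as an open-ended window, matching the "if present" wording above.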

Only extract from public visible content

Extract links only from items that have Public access.

See the Auto login plugin for more information on checking content with access restrictions.

Link replacements

This option lets you enable or disable link replacements on a per-plugin basis.

Set the global option to Off and specific plugins to On to allow link replacement for only those plugins.

Set the global option to On and specific plugins to Off to prevent link replacement for only those plugins.
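
How the global and per-plugin options combine can be sketched as a simple override. This assumes each plugin has a third "use global" state, represented here as None; that state, the function name and the plugin names are assumptions for this example.

```python
from typing import Dict, Optional


def replacement_enabled(global_setting: bool, plugin_setting: Optional[bool]) -> bool:
    """A plugin's own setting wins; None means follow the global option."""
    return global_setting if plugin_setting is None else plugin_setting


# Global option Off, one plugin explicitly On: only that plugin may replace links.
plugins: Dict[str, Optional[bool]] = {"content": True, "modules": None, "menus": None}
allowed = [name for name, setting in plugins.items() if replacement_enabled(False, setting)]
print(allowed)
```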

'One-click-link-replacement' is a convenient way to quickly update links. However, manually updating the link and the 'surrounding' content is often a better way to ensure the information is still consistent.