Cron Jobs
The broken link checker works mostly in the background. To run its tasks automatically, you need to set up cron jobs.
There are four different options to run cron jobs.
- Using the administrator cron module. This module provides a pseudo cron that runs in the background while you are logged in. This keeps the extracted and checked links up to date while you are working in the administrator.
- Using the HTTP interface from your browser or with a cron daemon and tools like wget and curl. The exact implementation depends on your hosting provider.
- Using Joomla's built-in task scheduler.
- Using Joomla's command-line interface.
The Setup & Maintenance page in the administrator shows the links and commands to use for the HTTP and CLI cron, along with estimates for intervals and batch sizes.
Whether you can run the CLI or HTTP commands as a cron job depends on your host's configuration. Alternatively, you can head over to System ⇾ Scheduled Tasks and create a BLC Task.
Using the Command Line is the most effective method to parse and check links in batches.
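For hosts where wget or curl is not available, the HTTP option can also be driven from a small script. The following is only a minimal sketch: the function name trigger_http_cron is made up for illustration, and the URL must be the one shown on the Setup & Maintenance page (the value in the comment is a placeholder, not a real endpoint).

```python
import urllib.request


def trigger_http_cron(url: str, timeout: float = 60.0) -> int:
    """Request the cron URL once and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failing run surfaces as an exception that the cron daemon can log.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # read the whole body before closing the connection
        return response.status


# Example call (placeholder URL, copy the real one from Setup & Maintenance):
# trigger_http_cron("https://example.org/REPLACE-WITH-THE-HTTP-CRON-LINK")
```

Scheduling the script itself is still up to the cron daemon of your host.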
Cron Frequency?
Extracting frequency
With the Admin Pseudo Cron Module, changed items should be reprocessed while you are active in the administrator.
With the global or extractor setting 'Extract on Save' active, items should be reprocessed on save.
Furthermore, articles and most other content types in Joomla have a 'modified on' date, so only changed items will be reprocessed. In general, the extract cron therefore does not need a high frequency. A daily run may be enough to process items without modified dates.
If your site has a lot of unattended changes, for example through the publish-up and publish-down times of articles, you might need to decrease the interval or set the option 'Only extract from published content' to off.
Seeding after installation
After a first-time installation of the link checker, the extractor has to parse the existing content to find all links. In this phase you might need to run this cron job more frequently.
Check frequency
The frequency to recheck extracted links depends on the number of links and the recheck interval.
The Admin Pseudo Cron Module should be able to keep up with newly added links. Rechecking links should also occur when you are not working in the administrator, so you will need a cron job for that.
You will find an estimate on the Maintenance page.
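To get a feeling for how the number of links, the recheck interval and the batch size interact, here is a rough back-of-the-envelope sketch. The function and parameter names are illustrative and not part of the extension, and the model assumes every run checks a full batch. The estimate on the Maintenance page remains the reference.

```python
import math


def cron_runs_per_day(total_links: int, recheck_interval_days: float, batch_size: int) -> int:
    """Estimate how many check runs per day are needed so that every
    link is rechecked once per recheck interval (simplified model)."""
    if total_links < 0 or recheck_interval_days <= 0 or batch_size <= 0:
        raise ValueError("total_links must be >= 0, interval and batch size must be > 0")
    # Links that become due for a recheck on an average day.
    links_due_per_day = total_links / recheck_interval_days
    # Each run handles at most one batch, so round up.
    return math.ceil(links_due_per_day / batch_size)


# 6000 links, rechecked weekly, 50 links per run
print(cron_runs_per_day(6000, 7, 50))  # prints 18
```

In this example a run roughly every 80 minutes would be enough. Links that are skipped because of domain throttling are not accounted for.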
Domain throttling
To prevent flooding a remote host, there is a request limit per domain. You can change the period between two requests to the same host in the BLC Options.
If you have a lot of links to the same domain, you might need several cron runs to check them all.
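The effect of the throttle can be estimated with a simplified model. This sketch assumes a cron run lasts roughly run_seconds and that at most one request per delay_seconds goes to the same host. Both names are hypothetical, and the checker's actual scheduling may differ.

```python
import math


def runs_needed_for_domain(links_on_domain: int, delay_seconds: int, run_seconds: int) -> int:
    """Estimate the number of cron runs needed to check all links that
    point to one domain, given the per-domain request delay."""
    if links_on_domain < 0 or delay_seconds <= 0 or run_seconds <= 0:
        raise ValueError("links must be >= 0, delay and run time must be > 0")
    # At least one request per run, even if the run is shorter than the delay.
    requests_per_run = max(1, run_seconds // delay_seconds)
    return math.ceil(links_on_domain / requests_per_run)


# 30 links to the same host, 15 s between requests, runs of about 60 s
print(runs_needed_for_domain(30, 15, 60))  # prints 8
```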
Other Options
There are quite a few settings to control the cron jobs, such as batch sizes and check frequency. These options are described on the main options page.
Maximum execution time
There are no maximum execution time settings. An aborted operation will simply restart on the next run.
If multiple cron jobs run simultaneously, inserting links into the database can collide, and you might get database errors (duplicate entry).
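One common way to avoid overlapping runs is to wrap the cron command in a lock. The sketch below is not part of the extension: run_exclusively and the lock path are hypothetical, it relies on fcntl and therefore only works on Unix-like hosts, and the command itself should be the one shown on the Setup & Maintenance page.

```python
import fcntl
import subprocess
from typing import Optional, Sequence


def run_exclusively(lock_path: str, command: Sequence[str]) -> Optional[int]:
    """Run command unless another run already holds the lock.

    Returns the exit code of the command, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already taken.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # a previous cron run is still busy
        # The lock is released when lock_file is closed, even after a crash.
        return subprocess.run(command).returncode


# Example call (placeholders, copy the real command from Setup & Maintenance):
# run_exclusively("/tmp/blc-cron.lock", ["php", "REPLACE-WITH-CLI-COMMAND"])
```

Because the lock belongs to the open file, an aborted run does not leave a stale lock behind, which fits the restart behaviour described above.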