Assuming a default 'COPYTO:ThreadCount' value of 5, and a default 'COPYTO:ThreadMultiplier' value of 5, SQL Backup picks up the oldest 25 files that it needs to copy.
When any of the 5 threads are available, it assigns a file to the thread to begin the copying process.
Over a duration of the 'COPYTO:SleepIntervalInSeconds' default value of 60 seconds, it checks every 'COPYTO:WaitIntervalInSeconds' value of 5 seconds if any of the 5 threads are available again, and assigns the next file to the free thread to be copied.
Once the 60 seconds has elapsed, it repeats the above cycle i.e. picks up the oldest 25 files that need to be copied, finds a free thread, assign the file to the thread etc.
So basically, the copying process is set up to copy a maximum of 25 files over a 60 seconds period. Assuming a transaction log backup interval of 5 minutes, and assuming your server can handle copying 25 files in a minute, SQL Backup should be able to copy a maximum of 1500 files over 5 minutes.
A potential bottleneck is if all 5 threads are occupied copying large files, and your transaction log backup interval is shorter than the time it takes to copy the file e.g. it takes 6 minutes to copy each file, your backup interval is only 5 minutes, and you have 5 or more databases generating that large of a transaction log backup file. The 5 worker threads copying the files will be preoccupied with the larger files, servicing the smaller files only later, than be preoccupied again with the larger files, and eventually everything gets back-logged.
Increasing the number of worker threads won't help, as network bandwidth is a fixed resource and the existing worker threads would already be using the maximum possible bandwidth. Increasing the 'COPYTO:ExpiryIntervalInMinutes' may help, as it would allow files to stay in the copy queue longer.
Another potential bottleneck is if you are generating more than 25 backup files a minute. As SQL Backup will only copy a maximum of 25 files a minute, you will eventually end up with a back log. Increasing the 'COPYTO:ThreadCount' and/or 'COPYTO:ThreadMultiplier' values will be required to handle this load.
To answer your questions:
>> 1) It appears that the files are not being copied in the order they were created.
Two possibilities - the older files have expired because they were not copied within the first 24 hours, and the later files were copied, or the larger files took longer to complete and smaller files (although newer) assigned to a different worker thread were copied over first.
In the first case, you'll need to increase the 'COPYTO:ExpiryIntervalInMinutes' value. Assuming the bottleneck occurs only during month end, the copy queue should clear up during the beginning of the next month.
In the second case, SQL Backup's transaction restore process will eventually restore the files once they are available. You might encounter a few errors when the newer files have been copied over before the older files, but once the older files have been copied over, the files will be restored in the correct order.
>> 2) Are there best practices on config settings for the registry entries that control copy behavior, specifically: COPYTO:ExpiryIntervalInMinutes and
COPYTO:ThreadCount ?
This really depends on your backup patterns. As described above, 'COPYTO:ExpiryIntervalInMinutes' helps in keeping the files longer in the copy queue, if your current network bandwidth has already maxed out. 'COPYTO:ThreadCount' would help if you have network bandwidth to spare.
>> In addition, is there a max value allowed for COPYTO:ExpiryIntervalInMinutes?
No, there is no maximum value.
>> 3) What are the valid values for the [status] field in backupfile_copylist? Here are the ones I think I know: A = Active, E = Expired, P = Pending, S = Success.
And C = Cancelled, used when a hosted storage upload copy item is cancelled.
4) Finally, how is the data in these admin tables maintained?
The retention policy that you set up for the local history applies to the 'backupfiles_copylist' and 'backupfiles_copylist_log' tables too. From the GUI, select the server instance, right click to bring up the context menu, and select the 'Server Options' item. The retention policy is the one under 'SQL Server backup and restore history'. If you need a different retention policy, you'll need to perform the deletion manually. It's only those 2 tables that are involved.
Lastly, there are some other timing settings that you can tweak, if SQL Backup requires more than a single attempt to copy your files. You can see if this is the case by checking the 'count' column in the 'backupfiles_copylist' table. If this is happening in your case, let mw know and I'll explain the settings.
When any of the 5 threads are available, it assigns a file to the thread to begin the copying process.
Over a duration of the 'COPYTO:SleepIntervalInSeconds' default value of 60 seconds, it checks every 'COPYTO:WaitIntervalInSeconds' value of 5 seconds if any of the 5 threads are available again, and assigns the next file to the free thread to be copied.
Once the 60 seconds has elapsed, it repeats the above cycle i.e. picks up the oldest 25 files that need to be copied, finds a free thread, assign the file to the thread etc.
So basically, the copying process is set up to copy a maximum of 25 files over a 60 seconds period. Assuming a transaction log backup interval of 5 minutes, and assuming your server can handle copying 25 files in a minute, SQL Backup should be able to copy a maximum of 1500 files over 5 minutes.
A potential bottleneck is if all 5 threads are occupied copying large files, and your transaction log backup interval is shorter than the time it takes to copy the file e.g. it takes 6 minutes to copy each file, your backup interval is only 5 minutes, and you have 5 or more databases generating that large of a transaction log backup file. The 5 worker threads copying the files will be preoccupied with the larger files, servicing the smaller files only later, than be preoccupied again with the larger files, and eventually everything gets back-logged.
Increasing the number of worker threads won't help, as network bandwidth is a fixed resource and the existing worker threads would already be using the maximum possible bandwidth. Increasing the 'COPYTO:ExpiryIntervalInMinutes' may help, as it would allow files to stay in the copy queue longer.
Another potential bottleneck is if you are generating more than 25 backup files a minute. As SQL Backup will only copy a maximum of 25 files a minute, you will eventually end up with a back log. Increasing the 'COPYTO:ThreadCount' and/or 'COPYTO:ThreadMultiplier' values will be required to handle this load.
To answer your questions:
>> 1) It appears that the files are not being copied in the order they were created.
Two possibilities - the older files have expired because they were not copied within the first 24 hours, and the later files were copied, or the larger files took longer to complete and smaller files (although newer) assigned to a different worker thread were copied over first.
In the first case, you'll need to increase the 'COPYTO:ExpiryIntervalInMinutes' value. Assuming the bottleneck occurs only during month end, the copy queue should clear up during the beginning of the next month.
In the second case, SQL Backup's transaction restore process will eventually restore the files once they are available. You might encounter a few errors when the newer files have been copied over before the older files, but once the older files have been copied over, the files will be restored in the correct order.
>> 2) Are there best practices on config settings for the registry entries that control copy behavior, specifically: COPYTO:ExpiryIntervalInMinutes and
COPYTO:ThreadCount ?
This really depends on your backup patterns. As described above, 'COPYTO:ExpiryIntervalInMinutes' helps in keeping the files longer in the copy queue, if your current network bandwidth has already maxed out. 'COPYTO:ThreadCount' would help if you have network bandwidth to spare.
>> In addition, is there a max value allowed for COPYTO:ExpiryIntervalInMinutes?
No, there is no maximum value.
>> 3) What are the valid values for the [status] field in backupfile_copylist? Here are the ones I think I know: A = Active, E = Expired, P = Pending, S = Success.
And C = Cancelled, used when a hosted storage upload copy item is cancelled.
4) Finally, how is the data in these admin tables maintained?
The retention policy that you set up for the local history applies to the 'backupfiles_copylist' and 'backupfiles_copylist_log' tables too. From the GUI, select the server instance, right click to bring up the context menu, and select the 'Server Options' item. The retention policy is the one under 'SQL Server backup and restore history'. If you need a different retention policy, you'll need to perform the deletion manually. It's only those 2 tables that are involved.
Lastly, there are some other timing settings that you can tweak, if SQL Backup requires more than a single attempt to copy your files. You can see if this is the case by checking the 'count' column in the 'backupfiles_copylist' table. If this is happening in your case, let mw know and I'll explain the settings.