6.3.1. Site Sync
Site sync - both one-way and bidirectional - may fail due to conflicts which cannot be automatically resolved. This page outlines options for intervening to resolve such conflicts.
6.3.1.1. Snapshot Rotation
By default, snapdiff snapshots will always rotate, even if an error occurs during sync.
Traditionally, snapshots were rolled back on error. However, this can cause changes to be replayed inappropriately, leading to further errors and conflicts. For this reason, the default behaviour was changed to always rotate.
The downside to this approach is that some changes may be missed by sync. For example, if a network issue prevents a file from being synced, that file will not be re-synced unless or until it changes again. Note, however, that a sync run includes retries to mitigate such temporary issues.
If a sync job does fail, it will be necessary to determine the source of the error and manually resolve any issues. The job details will report on which paths failed overall. Looking at the individual task details will give more information on where and why a specific path failed.
To disable this behaviour, so that snapshots roll back on error, set the runtime field snapdiff_rotate_on_error to False.
Note
If the sync is run within a schedule, it will use the first subscribed workflow's snapdiff_rotate_on_error value. This does not apply to bidirectional_sync as it can only have one subscribed workflow.
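The selection rule in this note can be pictured with a small sketch. The data layout below (a list of workflows, each carrying a `fields` mapping) is hypothetical and is not the Ngenea Hub API. Only the field name snapdiff_rotate_on_error and its rotate-by-default behaviour come from the text above.

```python
from typing import Any, Mapping, Sequence


# Hypothetical data shape, for illustration only; not the Ngenea Hub API.
def effective_rotate_on_error(workflows: Sequence[Mapping[str, Any]]) -> bool:
    """Return the value a scheduled sync would use.

    Only the first subscribed workflow is consulted. Rotating on error is
    the default when the field is not set.
    """
    if not workflows:
        return True
    fields = workflows[0].get("fields", {})
    return bool(fields.get("snapdiff_rotate_on_error", True))


subscribed = [
    {"name": "sync-a", "fields": {"snapdiff_rotate_on_error": False}},
    {"name": "sync-b", "fields": {}},  # ignored: not the first workflow
]
print(effective_rotate_on_error(subscribed))
```

In this sketch the second workflow's settings have no effect, which mirrors why it is worth checking the first subscribed workflow when a schedule does not behave as expected.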
6.3.1.2. Manual Resolution
In some cases it is possible to resolve conflicts by manually applying changes.
For example, if a file is moved on site A and deleted on site B, sync will fail because there is no file to move (or delete, depending on the sync direction) on the target site. In this case, manually deleting the file on site A, or re-sending the file (in its new location) to site B, will resolve the conflict. Thereafter, sync will be able to run without issue.
6.3.1.3. Re-sync all
Another option is to re-sync everything from scratch. This is the safest option, as it ensures that no file changes are lost.
Note that sync will skip any files which are already in the correct state, so a re-sync won't take as long as syncing to a brand new target.
Snapdiff-based sync uses filesystem snapshots to track file changes over time. The snapdiff discovery uses a 'last snap' file to record the last snapshot which was successfully synced.
This last snap file is located at /mmfs1/.rotate/ngenea-worker.lastsnap.name.<fileset_name>.id.<fileset_id>
By removing the last snap file, the next sync run will behave as if it has not been run before, and so will sync everything. Once sync has run successfully, you can safely delete the snapshot which was previously recorded in the last snap file (before that file was deleted).
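The removal step can be sketched in Python as follows. This is a minimal sketch, not an official tool: the helper names are made up for illustration, and it assumes the last snap file holds the snapshot name as plain text, which the text above does not confirm. It returns the previously recorded name so that the old snapshot can be deleted once the re-sync has succeeded.

```python
from pathlib import Path
from typing import Optional, Union

ROTATE_DIR = Path("/mmfs1/.rotate")  # location given in the text above


def last_snap_path(fileset_name: str, fileset_id: Union[int, str],
                   rotate_dir: Path = ROTATE_DIR) -> Path:
    """Build the last snap file path for a fileset."""
    return rotate_dir / f"ngenea-worker.lastsnap.name.{fileset_name}.id.{fileset_id}"


def reset_last_snap(fileset_name: str, fileset_id: Union[int, str],
                    rotate_dir: Path = ROTATE_DIR) -> Optional[str]:
    """Remove the last snap file and return the snapshot name it recorded.

    Returns None if there is no last snap file. Assumption: the file
    contains the snapshot name as plain text.
    """
    path = last_snap_path(fileset_name, fileset_id, rotate_dir)
    if not path.exists():
        return None
    previous = path.read_text().strip()
    path.unlink()
    return previous
```

Keep a note of the returned name: it identifies the snapshot that can be removed after the next successful sync.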
6.3.1.4. Force rotate
The riskiest option is to force a snapshot rotation. This effectively says that you don't care about the current failure and just want the sync to move on.
As discussed above, this is the default behaviour. Note that this may result in some file changes not being synced. For a safer option, see Re-sync all above.
The following steps can be used as a one-off when the default rotate-on-error function has been disabled.
To force a rotate, you should first temporarily disable sync. This can be done by setting the sync schedule to disabled in the Ngenea Hub UI.
Next, create a new snapshot of the fileset. The name is expected to be of the form ngenea-worker.snapdiff.<timestamp>.
Update the 'last snap' file (described above), replacing the currently recorded snapshot name with the name of the new snapshot you just created.
This last snap file is located at /mmfs1/.rotate/ngenea-worker.lastsnap.name.<fileset_name>.id.<fileset_id>
Finally, re-enable sync. Sync will now pick up new file changes starting from the point at which the new snapshot was created.
Once sync has run successfully, you can safely delete the snapshot which was previously recorded in the last snap file (before that file was updated).
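The "update the last snap file" step could look like the sketch below. It is illustrative only: the function name is hypothetical, the file is assumed to hold the snapshot name as plain text, and creating the snapshot itself is left to the filesystem's own tooling. The sketch swaps the recorded name in place, so any surrounding whitespace in the file is preserved, and returns the old name for later clean-up.

```python
from pathlib import Path

SNAPSHOT_PREFIX = "ngenea-worker.snapdiff."  # name form given in the text above


def record_forced_rotation(last_snap_file: Path, new_snapshot: str) -> str:
    """Point the last snap file at new_snapshot and return the old name.

    Assumption: the file contains the snapshot name as plain text.
    """
    if not new_snapshot.startswith(SNAPSHOT_PREFIX):
        raise ValueError(f"snapshot name should start with {SNAPSHOT_PREFIX!r}")
    content = last_snap_file.read_text()
    previous = content.strip()
    if not previous:
        raise ValueError("last snap file is empty; nothing to replace")
    # Replace only the recorded name, keeping the rest of the file as it was.
    last_snap_file.write_text(content.replace(previous, new_snapshot, 1))
    return previous
```

Run this only while the sync schedule is disabled, and only after the new snapshot exists, so that the name written to the file always refers to a real snapshot.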
6.3.1.5. Locking Errors
Lock files are used to ensure that a sync for a given fileset will not run if one is already running.
If a sync is already running, any new sync job for the same fileset will fail, and the snapdiff task details will show an error like
SnapdiffLockError: Could not perform snapdiff as one is currently running for provided fileset
Under rare circumstances, a lock may not be correctly cleaned up, preventing syncs from running, even though there are none currently active.
In that case, the lock file can be removed manually. First, ensure that there aren't any syncs running. For extra safety, temporarily disable any scheduled syncs.
The lock file is located at /mmfs1/.rotate/snapdiff.name.<fileset_name>.id.<fileset_id>.lock
Any snapdiff lock file will automatically expire after 24 hours by default. The lifetime can be changed using the lock_threshold site setting.
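Before removing a lock by hand, it can help to check how old it is. The sketch below is illustrative only: the helper names are hypothetical, the lock's age is taken from the file's modification time (an assumption about how expiry is measured), and the threshold is expressed in seconds purely for the example, since the units of lock_threshold are not stated here.

```python
import time
from pathlib import Path
from typing import Optional, Union

DEFAULT_LOCK_LIFETIME = 24 * 60 * 60  # 24 hours, in seconds (units assumed)


def lock_path(fileset_name: str, fileset_id: Union[int, str],
              rotate_dir: Path = Path("/mmfs1/.rotate")) -> Path:
    """Build the snapdiff lock file path for a fileset."""
    return rotate_dir / f"snapdiff.name.{fileset_name}.id.{fileset_id}.lock"


def lock_age_seconds(path: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the lock's age in seconds, or None if no lock file exists.

    Assumption: age is measured from the file's modification time.
    """
    if not path.exists():
        return None
    current = time.time() if now is None else now
    return current - path.stat().st_mtime


def is_lock_expired(path: Path, lifetime: float = DEFAULT_LOCK_LIFETIME,
                    now: Optional[float] = None) -> bool:
    """True if a lock file exists and is older than the given lifetime."""
    age = lock_age_seconds(path, now)
    return age is not None and age > lifetime
```

A lock that is only minutes old is more likely to belong to a sync that is still running, so confirm that no sync is active before deleting it.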