Advanced

Usage tips

Since yarsync provides a command interface similar to that of git, one can synchronize several repositories simultaneously using myrepos.

If new data was added to several repositories simultaneously, commit the changes in one of them and synchronize it with the other. rsync should link the working directory with the commits properly. This may fail depending on how the files were actually copied (their attributes may have changed). In that case, create new commits in both repositories and manually rename them to be the same, then synchronize again to check that everything is linked properly.

For example, when we move photographs from an SD card, we want to have at least two copies of them. It is more reliable to copy the data from the original source to two repositories than to push it from one of them to the other (possible errors on the intermediate filesystem increase the risk). Make sure that the two repositories were synchronized beforehand.
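Whether a working directory is linked with a commit can be checked by comparing inodes. The following is a minimal sketch, not a part of yarsync: the function name `unlinked_files` is hypothetical, and the commit directory is passed in explicitly by the caller, since the layout of commit directories is not described in this section.

```python
import os


def unlinked_files(workdir, commit_dir):
    """Return relative paths of files in commit_dir whose counterpart
    in workdir is missing or is not a hard link to the same inode."""
    problems = []
    for root, _dirs, files in os.walk(commit_dir):
        for name in files:
            committed = os.path.join(root, name)
            rel = os.path.relpath(committed, commit_dir)
            working = os.path.join(workdir, rel)
            # os.path.samefile compares device and inode numbers,
            # so it is true only for hard links to the same file.
            linked = os.path.exists(working) and os.path.samefile(committed, working)
            if not linked:
                problems.append(rel)
    return sorted(problems)
```

An empty result means that every file in the commit shares its inode with the corresponding file in the working directory. A non-empty result lists the files that were copied rather than linked.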

Development

Community contributions are very important for free software projects. The most helpful contributions at this early stage of the project are spreading information about it and creating packages for new operating systems.

yarsync was tested on ext4, NFSv4 and SimFS on Arch Linux and CentOS. Tests on other systems would be useful.

rsync limitations

  • Millions of files will be synced very slowly.

  • rsync freezes when encountering too many hard links. Users have reported problems with repositories of 200 GB or 90 GB that contain many hard links. For the author’s repository with 30 thousand files (160 thousand with commits) and 3 GB of data, rsync works fine. If you have a large repository and want to copy it with all hard links, it is recommended to create a separate partition (e.g. LVM) and copy the filesystem as a whole. You can also remove some of the older backups.

  • rsync may create separate files instead of hard linking them. It can be fixed quickly using the hardlink executable.
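To see whether rsync has created separate files instead of hard links, one can look for identical content stored under more than one inode. The sketch below is only an illustration under stated assumptions: the function names `file_digest` and `find_unlinked_duplicates` are hypothetical, only file content is compared (permissions and timestamps are ignored), and nothing is modified on disk.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_unlinked_duplicates(top):
    """Return groups of files under top that have identical content
    but occupy different inodes. Each group is a list of path lists,
    one path list per inode."""
    by_inode = defaultdict(list)  # (device, inode) -> paths
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # symbolic links are not hard link candidates
            st = os.stat(path)
            by_inode[(st.st_dev, st.st_ino)].append(path)

    by_content = defaultdict(list)  # (size, digest) -> list of path lists
    for paths in by_inode.values():
        # Hard links share content, so hash each inode only once.
        first = paths[0]
        key = (os.path.getsize(first), file_digest(first))
        by_content[key].append(sorted(paths))

    return [sorted(group) for group in by_content.values() if len(group) > 1]
```

Each returned group lists files that could share a single inode. Hashing every file is slow on large repositories, so this is suited to a spot check of a subdirectory rather than a full scan.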

Alternatives

Free software that uses rsync includes:

  • Back In Time. See previous snapshots using a GUI.

  • Grsync, graphical interface for rsync.

  • LuckyBackup. It is written in C++ and is mostly used from a graphical shell.

  • rsnapshot, a filesystem snapshot utility. rsnapshot makes it easy to make periodic snapshots of local machines, and remote machines over ssh. Files can be restored by the users who own them, without the root user getting involved.

Other synchronization / backup / archiving software:

  • casync is a combination of the rsync algorithm and content-addressable storage. It is an efficient way to deliver and update directory trees and large images over the Internet in an HTTP and CDN friendly way. Other systems that use similar algorithms include bup.

  • Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. duplicity uses librsync and is space efficient. It supports many cloud providers. As of 2021, duplicity supports deleted files, full Unix permissions, directories, symbolic links, FIFOs, and device files, but not hard links. It can be run on Linux, macOS and Windows (under Cygwin).

  • Git-annex manages distributed copies of files using git. This is a very powerful tool written in Haskell. For each file, it can track the number of backups that contain it and their names, and it allows one to plan downloading a file to the local storage. This is its author’s use case: “I have a ton of drives. I have a lot of servers. I live in a cabin on dialup and often have 1 hour on broadband in a week to get everything I need”. I tried to learn git-annex; it was not easy, and finally I found that it doesn’t preserve timestamps (because git doesn’t) or permissions. If that suits you, there is also a list of specialized related software. git-annex can use many cloud services as special remotes, including all rclone remotes.

  • Rclone focuses on cloud and other high latency storage. It supports more than 50 different providers. As of 2021, it doesn’t preserve permissions and attributes.

Continuous synchronization software:

  • gut-sync offers a real-time bi-directional folder synchronization.

  • Syncthing. A very powerful and mature tool that works on Linux, macOS, Windows and Android. It is mostly used through a GUI (the admin panel is managed through a Web interface), but it also has a command line interface.

  • Unison is a file-synchronization tool for OSX, Unix, and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other (much as other synchronization tools do).

  • Dropbox, Google Drive, Yandex Disk and many other closed-source tools fall into this category.

ArchWiki includes several useful scripts for rsync and a list of its graphical front-ends. It also has a list of cloud synchronization clients and a list of synchronization and backup programs. Wikipedia offers a comparison of file synchronization software and a comparison of backup software. Git-annex has a list of git-related tools.