Advanced#
Usage tips#
Since yarsync allows using a command interface similar to git, one can synchronize several repositories simultaneously using myrepos.
If new data was added to several repositories simultaneously,
commit the changes in one of them and synchronize it with the other.
rsync should then link the working directory with the commits properly.
This may fail depending on how you actually copied files (they may have
changed attributes).
In this case, create new commits in both repositories
and manually rename them to be the same.
Try to synchronize to see that all is linked properly.
For example, when we move photographs from an SD card, we want to have
at least two copies of them.
It is more reliable to copy data from the original source to two repositories
than to push it from one repository to the other (possible errors on the intermediate filesystem
increase the risk). Make sure that the two repositories were synchronized beforehand.
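The copy-to-two-repositories advice above can be sketched as follows. This is only an illustration, not part of yarsync: the function names `sha256sum` and `copy_to_both` are hypothetical, and the checksum verification step is an assumption added here to show how a copy error could be detected early.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_both(source, repo_a, repo_b):
    """Copy one file from the original source into two repositories.

    Each copy is read back and compared with the checksum of the source,
    so that neither repository depends on the other being intact.
    """
    source = Path(source)
    expected = sha256sum(source)
    copies = []
    for repo in (repo_a, repo_b):
        target = Path(repo) / source.name
        # copy2 also copies the modification time and permission bits,
        # which helps to keep file attributes identical in both copies.
        shutil.copy2(source, target)
        if sha256sum(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `copy_to_both("/path/to/card/photo.jpg", "/path/to/repo1", "/path/to/repo2")` (all three paths are placeholders) returns the two new paths or raises an error if either copy differs from the source.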
Development#
Community contributions are very important for free software projects. At this early stage, the best help for the project is to spread information about it and to create packages for new operating systems.
yarsync was tested on ext4, NFSv4 and SimFS on Arch Linux and CentOS.
Tests on other systems would be useful.
Hard links#
The file system must support hard links if you plan to use commits. Multiple hard links are supported by POSIX-compliant and partially POSIX-compliant operating systems, such as Linux, Android, macOS, and also Windows NT4 and later Windows NT operating systems [Wikipedia].
Notable file systems that support hard links include [hard links and comparison of file systems from Wikipedia]:
EncFS (an Encrypted Filesystem using FUSE). Note that it doesn’t support hard links when External IV Chaining is enabled (this is enabled by default in paranoia mode, and disabled by default in standard mode).
ext2-ext4. Standard on Linux. Ext4 has a limit of 65000 hard links on a file.
HFS+. Standard on Mac OS.
NTFS. The only Windows file system to support hard links. It has a limit of 1024 hard links on a file.
SquashFS, a compressed read-only file system for Linux.
Hard links are not supported on:
FAT, exFAT. These are used on many flash drives.
Joliet (“CDFS”), ISO 9660. File systems on CDs.
The majority of modern file systems support hard links. A full list of file system capabilities can be found on Wikipedia.
One can copy data to file systems without hard links, but this will reduce the functionality of yarsync, and one should take care not to consume too much disk space if accidentally copying files instead of hard linking.
rsync limitations#
Millions of files will be synced very slowly.
rsync freezes when encountering too many hard links. Users report problems for repositories of 200 GB or 90 GB with many hard links. For the author’s repository with 30 thousand files (160 thousand with commits) and 3 GB of data, rsync works fine. If you have a large repository and want to copy it with all hard links, it is recommended to create a separate partition (e.g. LVM) and copy the filesystem as a whole. You can also remove some of the older backups.
rsync may create separate files instead of hard linking them. This can be fixed quickly using the hardlink executable.
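To see whether a copy ended up with separate files instead of hard links, one can look for files that have the same content but different inodes. The following is a minimal sketch under that assumption, with a hypothetical function name `unlinked_duplicates`. It only reports candidates and does not replace the hardlink executable. It reads each file fully into memory, so it suits small trees only.

```python
import hashlib
import os
from collections import defaultdict


def unlinked_duplicates(root):
    """Find files under `root` with equal content stored in several inodes.

    Returns a list of groups. Each group is a sorted list with one path
    per distinct inode, and all paths in a group have the same content.
    """
    seen = set()                     # (device, inode) pairs already hashed
    by_content = defaultdict(list)   # content digest -> one path per inode
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symbolic links are not hard link candidates
            info = os.stat(path)
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue  # another name of an already counted hard link
            seen.add(key)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_content[digest].append(path)
    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]
```

An empty result means that every set of identical files already shares one inode. A non-empty result lists the files that occupy disk space more than once.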
Alternatives#
Free software that uses rsync includes:
Back In Time. See previous snapshots using a GUI.
Grsync, graphical interface for rsync.
LuckyBackup. It is written in C++ and is mostly used from a graphical shell.
rsnapshot, a filesystem snapshot utility.
rsnapshot makes it easy to make periodic snapshots of local machines, and remote machines over ssh. Files can be restored by the users who own them, without the root user getting involved.
Other synchronization / backup / archiving software:
casync is a combination of the rsync algorithm and content-addressable storage. It is an efficient way to deliver and update directory trees and large images over the Internet in an HTTP and CDN friendly way. Other systems that use similar algorithms include bup.
Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. duplicity uses librsync and is space efficient. It supports many cloud providers. As of 2021, duplicity supports deleted files, full Unix permissions, directories, symbolic links, fifos, and device files, but not hard links. It can be run on Linux, macOS and Windows (under Cygwin).
Git-annex manages distributed copies of files using git. This is a very powerful tool written in Haskell. For each file, it can track the number of backups that contain it and their names, and it allows one to plan downloading a file to the local storage. This is its author’s use case: “I have a ton of drives. I have a lot of servers. I live in a cabin on dialup and often have 1 hour on broadband in a week to get everything I need”. I tried to learn git-annex, found it difficult, and finally discovered that it doesn’t preserve timestamps (because git doesn’t) or permissions. If that suits you, there is also a list of specialized related software. git-annex allows using many cloud services as special remotes, including all rclone remotes.
Rclone focuses on cloud and other high latency storage. It supports more than 50 different providers. As of 2021, it doesn’t preserve permissions and attributes.
Continuous synchronization software:
gut-sync offers a real-time bi-directional folder synchronization.
Syncthing. A very powerful and developed tool, works on Linux, MacOS, Windows and Android. Mostly uses a GUI (admin panel is managed through a Web interface), but also has a command line interface.
Unison is a file-synchronization tool for OSX, Unix, and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other (pretty much like other synchronization tools work).
Dropbox, Google Drive, Yandex Disk and many other closed-source tools fall into this category.
ArchWiki includes several useful scripts for rsync and a list of its graphical front-ends. It also has a list of cloud synchronization clients and a list of synchronization and backup programs. Wikipedia offers a comparison of file synchronization software and a comparison of backup software. Git-annex has a list of git-related tools.