
Easy data handling with git-annex


git-annex uses the powerful distributed version control system git as a distributed file system to synchronize data between several hosts or (external) storage devices such as USB sticks, backup hard disks, or even remote servers reachable via SMB or ssh. Unlike plain git, git-annex can also handle very large files: git itself only tracks small symbolic links, while git-annex stores and transfers the actual file contents separately.

Besides the usual advantages of such a solution (easy remote backups, access to all historical versions, working with other people on the same files, ...), git-annex offers even /more benefits/ for your workflows. The walkthrough page shows off the sheer beauty of this solution.

For example, if you do not want to keep all files in every repository, you can put them /somewhere/ and locate them later on:

# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (not available)
  Unable to access these remotes: usbdrive, server
  Try making some of these repositories available:
    5863d8c0-d9a9-11df-adb2-af51e6559a49  -- my home file server
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826  -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55  -- backup SATA drive
failed
# sudo mount /media/usb
# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (from usbdrive...) ok
# git commit -a -m "got a video I want to rewatch on the plane"

Moving files between repositories (for example, to free up space on the laptop) is very easy too:

# git annex move my_cool_big_file --to usbdrive
move my_cool_big_file (to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (from fileserver...)
WORM-s86050597-m1274316523--hackity_hack_and_kax 100%   82MB 199.1KB/s   07:02
ok

If you have very important files, you can tell git-annex to keep at least /n/ copies of them. The default value is one, but you could raise it to two for your diploma thesis folder and to seven for all of your mp3 files :-)
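The per-folder and per-type limits described above can be expressed with the `annex.numcopies` attribute, which git-annex reads from `.gitattributes`. The following is a minimal sketch: it uses a scratch git repository so it runs on its own, whereas in real use you would edit `.gitattributes` at the top of your annex and commit it. The folder name `thesis` and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch: per-path minimum copy counts via the annex.numcopies attribute.
# A scratch repo keeps the demo self-contained; the folder "thesis" and
# the example paths below are hypothetical.
set -e

cd "$(mktemp -d)"
git init -q

# When several patterns match a path, the later line wins.
cat > .gitattributes <<'EOF'
* annex.numcopies=1
thesis/** annex.numcopies=2
*.mp3 annex.numcopies=7
EOF

# Ask git which numcopies value applies to each path (files need not exist).
git check-attr annex.numcopies -- thesis/chapter1.tex music/song.mp3 notes.txt
```

`git check-attr` should report 2, 7 and 1 for the three paths, which is the value git-annex consults before it lets you drop a file's content.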

If you like the ideas here but have slightly different use cases, please make sure to read the page "What git-annex is not" and check out its cool links to other projects such as bup (a backup solution), ShareBox (a Dropbox replacement at an early stage of development), the Unison File Synchronizer (rsync-like syncing, no versioning), and so forth.

I am planning to test git-annex sometime, once I have learned git well enough to trust my own command line ;-)

Note: Stay tuned for my upcoming blog entry about another attempt to replace Dropbox with a secure method using DVCS-Autosync, where you do not have to trust the untrustworthy and unstable cloud.
