
per2jensen/dar-backup

dar-backup

Long-term archival backups for Linux — with integrity you can prove and repair


dar-backup is for Linux users who want serious, long-term backups — not just file copies. It automates FULL / DIFF / INCR as independent archive types built on two exceptional open-source tools:

  • dar (Disk ARchiver) — a powerful, actively maintained archiver by Denis Corbin that handles differential and incremental archives, built-in verification, catalogue databases, and precise file selection. dar is the engine that makes long-term archival practical. It deserves to be far better known than it is.
  • par2cmdline — Parchive's redundancy format can detect and repair corruption in any archive, as long as the .par2 files travel with the dar archives. A quiet but remarkable piece of technology.

dar-backup wires these two tools together into a fully automated backup system, with every archive verified and a random set (configurable) of files restored to a test directory before the job completes.

Is this for you?

✅ You back up irreplaceable data — photos, documents, home-made video — and want to be certain you can restore any file to any point in time, years from now

✅ You run backups as a normal user — root is not required, and FUSE-mounted filesystems (Nextcloud, rclone, sshfs) work correctly

✅ You want bitrot repair to travel with your archives — onto USB disks, offsite copies, and cloud storage — without depending on the original system

✅ You want unattended, scheduled backups with Discord notifications on success or failure

✅ You want a transparent, no-lock-in tool built on proven Unix components

✗ You need a GUI or Windows support

✗ You need multiple backups per day — dar-backup is designed around one backup run per day per definition (one FULL, one DIFF, one INCR)


TL;DR

# prep
sudo apt -y install dar par2 python3 python3-venv
INSTALL_DIR=/tmp/dar-backup; mkdir "$INSTALL_DIR" && cd "$INSTALL_DIR"
python3 -m venv venv    # create a virtual environment
. venv/bin/activate     # activate the virtual environment
# install and run dar-backup
pip install dar-backup
demo --install && manager --create-db
dar-backup --full-backup

dar-backup runs FULL, DIFF, and INCR backup cycles across as many backup definitions as you need (e.g. photos, documents, homevideos). After each archive it:

  1. Verifies the archive with dar -t
  2. Restore-tests a random sample of files and compares them byte-for-byte against the source
  3. Creates PAR2 redundancy files so the archive can be repaired if bitrot occurs later
  4. Notifies your Discord channel on completion or failure
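The first two checks can also be run by hand against any archive. The following is a minimal sketch, not dar-backup's own code: the function names and paths are hypothetical, while -t (test), -x (extract), -R (root directory) and -g (path to include) are standard dar options.

```shell
# Hypothetical helpers for illustration; not part of dar-backup.

# Test an archive. dar takes the archive base name, without ".1.dar".
test_archive() {
    local archive_base="$1"
    dar -t "$archive_base"
}

# Restore one file into a scratch directory and compare it with the source.
# $1 = archive base name, $2 = source root that was backed up,
# $3 = file path relative to that root, $4 = scratch directory
restore_and_compare() {
    local archive_base="$1" source_root="$2" rel_path="$3" scratch="$4"
    mkdir -p "$scratch" || return 1
    dar -x "$archive_base" -R "$scratch" -g "$rel_path" || return 1
    # cmp -s is silent and returns 0 only if the files are byte-identical
    cmp -s "$source_root/$rel_path" "$scratch/$rel_path"
}
```

A call such as restore_and_compare /mnt/backups/photos_FULL_2025-01-01 "$HOME" Pictures/img001.jpg /tmp/restore-test (all names made up) returns success only when the restored file matches the source exactly.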

Schedules are managed by systemd timers (generated for you). Catalogs of every archive are maintained by dar_manager, enabling single-file Point-in-Time Recovery without a database server.

Version 1.1.8 · reached 1.0.0 on October 9, 2025 · Changelog


A personal digital preservation system

Most backup tools are designed to survive hardware failure. dar-backup is designed to survive time.

That is a different problem.

Hardware failures are acute — they happen on a known day, with the original system still understood, the software environment still intact, and recovery procedures still fresh. Time introduces a different class of failure:

  • Format drift — tools and expectations shift over years
  • Software obsolescence — the restore environment may no longer exist
  • Silent corruption — bitrot accumulates unnoticed on disks and media
  • Forgotten procedures — institutional knowledge degrades or disappears
  • Loss of context — future operators, family members, or administrators may need to recover data without the original author present

For personal archives — family photographs, home video, research, creative work — these are the real failure modes. The data often grows more valuable with age, not less. And it is irreplaceable.

dar-backup is built around a single question: will this archive be recoverable years from now, under conditions I cannot fully predict today?


Recovery confidence over storage efficiency

Philosophy:

"Every backup is validated not when it is created, but when it is successfully restored."

This principle drives the design choices behind dar-backup.

dar-backup intentionally prioritizes recoverability over deduplication ratios or storage efficiency.

Storage costs are typically lower than the cost of data loss or recovery failure. The cost of data loss can be measured in currency or, perhaps more importantly, in the sentimental value of your history.

This priority is reflected throughout the design:

  • Independent archives — each backup is a discrete, self-contained artifact, not a node in a shared repository. Archives can be copied, moved, and verified individually.
  • Verification after every backup: dar -t runs automatically. A backup that has never been tested is an assumption, not a guarantee.
  • Restore testing after every backup — a random sample of files is extracted and compared byte-for-byte against the source. Each restore test increases confidence in recoverability through repeated, physical verification of data integrity.
  • PAR2 redundancy travels with the archive — integrity protection and repair capability are embedded in the archive set itself, not dependent on the original system.
  • No dependency on the original machine — a single static dar binary is sufficient. No installation, configuration, or version matching required.
  • Documentation as part of the system — long-term preservation requires preserving the knowledge needed to use the archives, not just the archives themselves.

Why independent archives matter

Many modern backup systems maintain a single evolving repository — a shared pool of data that stores all snapshots efficiently. This works well for scenarios where the system is intact and trusted.

Over long time horizons, the repository model carries risk. If the repository is damaged, all snapshots may be affected. Restore depends on the health of the entire system. The repository format may evolve. Tooling may change or disappear.

dar-backup uses independent, self-contained archive files instead:

  • Each archive can be copied to independent media (USB, cloud, offsite)
  • Each archive can be verified in isolation, anywhere
  • Each archive can be repaired with its accompanying PAR2 files, without the original system
  • Point-in-Time Recovery operates through dar_manager catalogs — no database server required

The result is that recovery confidence does not degrade with distance — distance being time, location, or technical context.
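Checking a copied archive in isolation needs nothing beyond par2cmdline. A minimal sketch, assuming the .par2 files sit next to the .dar slices; the function name is hypothetical and this is not how dar-backup itself is implemented:

```shell
# Hypothetical helper for illustration; not part of dar-backup.
# Verify the files covered by a PAR2 set and attempt a repair only if needed.
# $1 = path to the main .par2 file stored next to the .dar slices
verify_or_repair() {
    local par2_file="$1"
    if par2 verify "$par2_file"; then
        echo "OK: $par2_file"
    elif par2 repair "$par2_file"; then
        echo "REPAIRED: $par2_file"
    else
        echo "FAILED: $par2_file" >&2
        return 1
    fi
}
```

The same function works on a USB disk, an offsite copy, or a download from cloud storage, because par2 only needs the files in front of it.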


Features

  • FULL / DIFF / INCR backup cycles — per backup definition, independently scheduled
  • Automatic archive verification: dar -t after every backup run
  • Automatic restore test — random files extracted and compared to source after each backup; configurable excludes for cache dirs, temp files, locks
  • PAR2 redundancy — configurable coverage per backup type (FULL/DIFF/INCR); optionally stored in a separate directory (different device or offsite mount)
  • Docker based restore environment (time capsule) — optional dar-backup-image providing a fully packaged, known-working restore environment (dar-backup, dar, PAR2, and tooling). Very useful years into the future :-)
  • Point-in-Time Recovery: dar_manager catalogs let you locate and restore any file to any date across your full archive history
  • Metrics and dashboard - optional detailed metrics and dashboard
  • Can run as a normal user — no root needed; works correctly on FUSE-mounted filesystems. Root is also a first class user.
  • systemd integration — timer units generated for you with sensible default schedules
  • Discord notifications — webhook alerts on backup success or failure, from all CLI tools
  • Shell autocompletion — bash and zsh, context-aware (archive names filtered by definition)
  • Clean logging — three log files (main, command output, trace/debug), all rotating and size-capped; clean-log strips verbose dar output when not needed
  • No lock-in — standard dar archives, standard PAR2 files; restore with just the dar binary, no dar-backup installation required on the restore machine
  • 1000+ tests — unit and integration tests covering PAR2 bitrot repair, full/diff/incr restore chains, PITR verification, and edge cases; CI on every commit

✅ The author has used dar-backup for about 6 years and has been saved by it multiple times.

dar-backup stands on the shoulders of two projects that do the real work. Sincere thanks to Denis Corbin for dar, and to the Parchive team for par2. If you find dar-backup useful, consider giving those projects a star too.


Dashboard

Every backup run writes structured metrics to a SQLite database. The built-in dar-backup-dashboard command fires up datasette and opens the dashboard in your browser:

dar-backup metrics dashboard

Dashboard & metrics documentation


High-level architecture

dar-backup overview


🛡️ Built for Disaster Recovery

Unlike traditional backup utilities that fail entirely if a single byte is corrupted, dar-backup splits data processing into independent, verifiable layers:

  1. Deterministic Slicing: Backups are divided into slices of a manageable size (e.g., 10 GB), which makes transfers over networks and storage on long-term disk sets more reliable.
  2. Isolated Parity Directories: par2 recovery files can be kept in a directory separate from your data slices, optionally on a different device. If the underlying storage volume suffers bitrot, the parity blocks can be used to rebuild damaged .dar slices.
  3. Point-in-Time Recovery (PITR): Historical queries let you restore to the state captured by any backup in your archive history — one PITR per backup run, per calendar day.

Documentation

| Document | Description |
|---|---|
| Quick Guide | Get started in minutes using the demo app |
| Getting Started | Manual setup for a real installation |
| Configuration Reference | Config file, .darrc, backup definitions, config history |
| Restoring | Point-in-Time Recovery (PITR), restore examples |
| PAR2 Redundancy | Verify, repair, and create PAR2 files |
| systemd Setup | Generate and install systemd timers/services |
| Shell Autocompletion | Bash and zsh tab-completion setup |
| Dashboard & Metrics | Metrics database, Datasette, dashboard |
| dar Tips | File selection, merging archives, logging tips |
| CLI Reference | All command options, exit codes, env vars |
| Troubleshooting | Error codes, FUSE issues, special characters |
| Development | Dev setup, testing, PyPI, building dar |
| Changelog | High-level release history |
| Detailed Changelog | Per-release details |

My use case

I needed the following:

  • Back up my workstation to a remote server

  • Back up primarily photos, home-made video and different types of documents

  • I have cloud storage mounted on a directory within my home dir. The filesystem is FUSE based, which gives it a few special features

    • Back up my cloud storage (cloud is convenient, but I want control over my backups)
    • A non-privileged user can perform a mount
    • A privileged user cannot look into the filesystem --> a backup script running as root is not suitable
  • Have a simple way of restoring, possibly years into the future. 'dar' fits that scenario with a single statically linked binary (kept with the archives). There is no need to install or configure anything; restoring is simple and works well.

  • During backup archives must be tested and a restore test (however small) performed

  • Archives stored on a server with a reliable file system (easy to mount a directory over sshfs)

  • Easy to verify an archive's integrity after it has been moved around.

I do not need the encryption features of dar, as all storage is already encrypted.

My setup

  1. Primary backup to server with an ext4 file system on mdadm RAID1

  2. Secondary copies to multiple USB disks / cloud

  3. Archive integrity verification anywhere using PAR2 and dar -t.

  4. Archive repair anywhere if needed. By default dar-backup creates par2 redundancy files with 5% coverage, enough to fix localized bitrot.

  5. No dependency on original system

  6. Docker image archived alongside the dar archives

    The dar-backup Docker image packages dar-backup, dar, and par2 in a self-contained environment — a time capsule of the exact tools needed to restore your archives, years from now, without hunting for the right versions or fighting package managers.

    A small helper script, save-dar-backup-image.sh, checks the latest released image against build-history.json and saves it as a compressed tar alongside the dar archives on the backup server. Run it as a cron job or systemd timer; it is idempotent and only pulls when a new image is available.

   # Example: run daily via cron
   DOCKER_ARCHIVE_DIR=/mnt/dar/docker-archives ~/.local/bin/save-dar-backup-image.sh

The result: my rsync to USB disks on the storage server picks up the Docker image automatically, so the restore environment travels with the archives onto every offsite copy.

Why PAR2 is especially good for portable / offsite copies

PAR2 parity is:

  • Self-contained (travels with the data)
  • Format-agnostic (works on any filesystem)
  • Location-agnostic (local disk, USB, cloud object storage)
  • Tool-stable (PAR2 spec has not changed in years)

That means integrity protection moves with the archive.
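The storage cost of that protection is easy to estimate. A rough sketch, using the 5% default coverage mentioned earlier; the function name is hypothetical and real PAR2 sets add a small amount of packet overhead on top:

```shell
# Hypothetical helper for illustration; not part of dar-backup.
# Approximate size of the PAR2 recovery data for one archive.
# $1 = archive size in bytes, $2 = redundancy in percent
par2_overhead_bytes() {
    local size="$1" percent="$2"
    echo $(( size * percent / 100 ))
}
```

For a 10 GiB archive, par2_overhead_bytes 10737418240 5 gives 536870912 bytes (0.5 GiB). As a rough guide, recovery data of that size can repair damage affecting up to about 5% of the archive's blocks. With par2cmdline, the redundancy percentage is set with the -r option of par2 create.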

Design choices

My design choices are boring, proven and pragmatic:

  • mdadm handles disks
  • PAR2 handles data integrity
  • You control when and how verification happens
  • Errors have a fair chance of being diagnosed and fixed, thanks to well-known tooling.
  • No hidden magic, no lock-in

Quick Guide

Step-by-step walkthrough using the built-in demo application — install, backup, list, restore.

Quick Guide


dar-backup principles

dar-backup

dar-backup is built to prioritize getting backups taken. It loops over the backup definitions, and if a failure occurs while backing up one definition, dar-backup logs an error and moves on to the next backup definition.

There are 3 levels of backups, FULL, DIFF and INCR.

  • The author takes a FULL backup once a year. This includes all files in all directories as defined in the backup definition(s) (assuming -d was not given).

  • The author makes a DIFF once a month. The DIFF backs up new and changed files compared to the FULL backup.

    • No DIFF backups are taken until a FULL backup has been taken for a particular backup definition.
  • The author takes an INCR backup every 3 days. An INCR backup includes new and changed files compared to the DIFF backup.

    • So, a set of INCRs will contain duplicates (this might change as I become more used to using the catalog databases)

    • No INCR backups are taken until a DIFF backup has been taken for a particular backup definition.

After each backup of a backup definition, dar-backup tests the archive and then performs a few restore operations of random files from the archive (see config file). The restored files are compared to the originals to check if the restore went well.

dar-backup skips doing a backup of a backup definition if an archive is already in place. So, if you for some reason need to take a new backup on the same date, the first archive must be deleted (I recommend using cleanup).
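The rules above can be summarized in a short sketch. It is illustrative only and not dar-backup's implementation: the function names are hypothetical, and the naming pattern <definition>_<TYPE>_<YYYY-MM-DD> is inferred from the archive name used in the cleanup examples in this README.

```shell
# Hypothetical helpers for illustration; not part of dar-backup.

# Archive base name, following the pattern <definition>_<TYPE>_<YYYY-MM-DD>.
archive_name() {
    local definition="$1" type="$2" day="$3"
    echo "${definition}_${type}_${day}"
}

# Which backup type a new archive of the given type is compared against.
reference_type() {
    case "$1" in
        FULL) echo "none" ;;
        DIFF) echo "FULL" ;;
        INCR) echo "DIFF" ;;
        *)    echo "unknown type: $1" >&2; return 1 ;;
    esac
}

# Succeeds when a first slice with this name already exists,
# in which case a new backup with the same name would be skipped.
# $1 = backup directory, $2 = archive base name
already_backed_up() {
    local backup_dir="$1" name="$2"
    [ -e "${backup_dir}/${name}.1.dar" ]
}
```

With plain dar, the reference archive for a DIFF or INCR is passed with the -A option when the new archive is created.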

cleanup

The cleanup application deletes DIFF and INCR archives that are older than the thresholds set in the configuration file.

cleanup will only remove FULL archives if the option --cleanup-specific-archives is used. It requires the user to confirm deletion of FULL archives.

Use --dry-run to preview which archives, PAR2 files, and catalogs would be removed without deleting anything.

Examples:

cleanup --dry-run -d media-files --log-stdout
cleanup --dry-run --cleanup-specific-archives -d media-files media-files_INCR_2025-12-22
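
The age rule itself can be sketched as follows. This is an assumption-laden illustration, not the cleanup tool's code: the function names are hypothetical, the threshold is passed as a plain number of days rather than read from the configuration file, and GNU date is assumed.

```shell
# Hypothetical helpers for illustration; not part of dar-backup.

# Date part of a name such as media-files_INCR_2025-12-22.
archive_date() {
    echo "${1##*_}"
}

# Type part (FULL, DIFF or INCR) of the same kind of name.
archive_type() {
    local rest="${1%_*}"
    echo "${rest##*_}"
}

# Succeeds if the archive is older than max_age_days on the given day.
# $1 = archive name, $2 = max age in days, $3 = "today" as YYYY-MM-DD
is_expired() {
    local name="$1" max_age_days="$2" today="$3"
    local then_s now_s
    then_s=$(date -u -d "$(archive_date "$name")" +%s) || return 2
    now_s=$(date -u -d "$today" +%s) || return 2
    [ $(( (now_s - then_s) / 86400 )) -gt "$max_age_days" ]
}

# Only DIFF and INCR archives are candidates for automatic removal.
auto_removable() {
    case "$(archive_type "$1")" in
        DIFF|INCR) is_expired "$1" "$2" "$3" ;;
        *) return 1 ;;
    esac
}
```

With a 10-day threshold, auto_removable media-files_INCR_2025-12-22 10 2026-01-05 succeeds because the archive is 14 days old, while a FULL archive of any age is never selected.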

manager

dar has the concept of catalogs, which can be exported and optionally added to a catalog database. That database makes it much easier to restore the correct version of a backed-up file, for example when a target date has been set.

dar-backup adds archive catalogs to their databases (using the manager script). Should the operation fail, dar-backup logs an error and continues with testing and restore validation tests.
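
Under the hood this relies on dar_manager, which ships with dar. A minimal sketch of the equivalent manual steps, using dar_manager's -C (create), -B (database), -A (add archive) and -l (list) options; the wrapper function names and the database path are hypothetical and do not reflect how the manager script is written:

```shell
# Hypothetical wrappers for illustration; not part of dar-backup.

# Create an empty catalog database.
# $1 = path to the database file
create_catalog_db() {
    dar_manager -C "$1"
}

# Add one archive (base name, without slice suffix) to an existing database.
# $1 = database file, $2 = archive base name
add_to_catalog_db() {
    local db="$1" archive_base="$2"
    dar_manager -B "$db" -A "$archive_base"
}

# List the archives recorded in the database.
list_catalog_db() {
    dar_manager -B "$1" -l
}
```

In normal use the manager script takes care of these steps for every backup definition.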


How to run

Manual setup for a real installation — configuration, catalog databases, first backup.

Getting Started


Status

1.0.0 milestone reached

Version 1.0.0 was released on October 9, 2025, after extensive testing.

GPG Signing key

To increase the security and authenticity of dar-backup packages, all releases from v2-beta-0.6.18 onwards will be digitally signed using the GPG key below.


🎯 GPG Signing Key Details
Name:        Per Jensen (author of dar-backup)
Email:       dar-backup@pm.me
Primary key: 4592 D739 6DBA EFFD 0845  02B8 5CCE C7E1 6814 A36E
Signing key: B54F 5682 F28D BA36 22D7  8E04 58DB FADB BBAC 1BB1
Created:     2025-03-29
Expires:     2030-03-28
Key type:    ed25519 (primary, SC)
Subkeys:     ed25519 (S), ed25519 (A), cv25519 (E)

🎯 Where to Find Release Signatures

PyPI Does Not Host .asc Signature Files

Although the dar-backup packages on PyPI are GPG-signed, PyPI itself does not support uploading .asc detached signature files alongside .whl and .tar.gz artifacts.

Therefore, you will not find .asc files on PyPI.

Where to Get .asc Signature Files

You can always download the signed release artifacts and their .asc files from the official GitHub Releases page:

📁 GitHub Releases for dar-backup

Each release includes:

  • dar_backup-x.y.z.tar.gz

  • dar_backup-x.y.z.tar.gz.asc

  • dar_backup-x.y.z-py3-none-any.whl

  • dar_backup-x.y.z-py3-none-any.whl.asc


🎯 How to Verify a Release from GitHub
  1. Import the GPG public key:

    curl https://keys.openpgp.org/vks/v1/by-fingerprint/4592D7396DBAEFFD084502B85CCEC7E16814A36E | gpg --import
  2. Download the wheel or tarball and its .asc signature from the GitHub Releases page.

  3. Run GPG to verify it:

    gpg --verify dar_backup-x.y.z.tar.gz.asc dar_backup-x.y.z.tar.gz
    # or
    gpg --verify dar_backup-x.y.z-py3-none-any.whl.asc dar_backup-x.y.z-py3-none-any.whl
  4. If the signature is valid, you'll see:

    gpg: Good signature from "Per Jensen (author of dar-backup) <dar-backup@pm.me>"
    

🛡️ Reminder: Verify the signing subkey

Only this subkey is used to sign PyPI packages:

B54F 5682 F28D BA36 22D7  8E04 58DB FADB BBAC 1BB1

You can view it with:

gpg --list-keys --with-subkey-fingerprints dar-backup@pm.me
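
The subkey check can also be scripted. A minimal sketch, assuming gpg's machine-readable status output, where the VALIDSIG line carries the fingerprint of the key that made the signature as its first field; the function names are hypothetical:

```shell
# Hypothetical helpers for illustration; not part of dar-backup.

# Read gpg status output on stdin and print the signing-key fingerprint.
signing_fingerprint() {
    awk '$1 == "[GNUPG:]" && $2 == "VALIDSIG" { print $3; exit }'
}

# Remove the spaces used when fingerprints are printed for humans.
normalize_fingerprint() {
    echo "$1" | tr -d ' '
}

# Succeeds only if the signature is valid and was made by the expected key.
# $1 = signature file, $2 = signed file, $3 = expected fingerprint
signed_by_expected_key() {
    local actual
    actual=$(gpg --status-fd 1 --verify "$1" "$2" 2>/dev/null | signing_fingerprint)
    [ -n "$actual" ] && [ "$actual" = "$(normalize_fingerprint "$3")" ]
}
```

Called with a release file, its .asc signature and the signing subkey fingerprint shown above, the last function returns success only when both the signature and the subkey match.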

License

These scripts are licensed under the GPLv3 license. Read more here: GNU GPL3.0, or have a look at the "LICENSE" file in this repository.

Requirements

  • A Linux system
  • dar
  • parchive (par2)
  • python3
  • python3-venv

On Ubuntu, install the requirements this way:

    sudo apt install dar par2 python3 python3-venv

Homepage - GitHub

'dar-backup' package lives here: GitHub - dar-backup

Community

Please review the Code of Conduct to help keep this project welcoming and focused.

Projects these scripts benefit from

  1. The wonderful dar archiver
  2. The Parchive suite
  3. shellcheck - a bash linter
  4. Ubuntu of course :-)
  5. PyPI