view README.md @ 116:d2a7c0913ef1 default tip

Add project README
author Tom Fredrik Blenning <bfg@bfgconsult.no>
date Wed, 20 May 2026 23:25:34 +0200

# DeDupe

Qt/C++ file indexer and duplicate browser.

DeDupe scans a directory tree, records file metadata in a SQLite database, and
helps find likely duplicate files by checksum, name, size, modification time, or
filename edit distance. It includes both a Qt GUI (`DeDupe.App`) for browsing
and deleting duplicates, and command-line helpers for updating and querying the
index.

## Features

- Recursively indexes regular files without following symlinks.
- Stores paths, sizes, modification times, and SHA-1 checksums in
  `~/.DeDupe.sqlite`.
- Avoids recomputing hashes for files whose modification time has not changed.
- Removes database entries for files that no longer exist under the scanned
  prefix.
- GUI filters for duplicates by name, size, modification time, and checksum,
  plus similar names by edit distance.
- Double-clicking a file opens it with the desktop's default application.
- The context-menu delete action removes a file from disk and updates the
  database.
- Shell scripts report duplicate sets, duplicate statistics, and directory
  comparison results from the SQLite database.
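
The incremental-update behaviour in the list above (skip hashing when the modification time is unchanged, drop entries for files that have vanished under the scanned prefix) can be sketched as follows. This is a minimal illustration and not DeDupe's code: the real index lives in SQLite, whereas `Index`, `IndexEntry`, `needsRehash`, `isUnder`, and `pruneMissing` are hypothetical names operating on an in-memory map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for one row of the index.
struct IndexEntry {
    std::int64_t mtime;    // modification time, seconds since the epoch
    std::string checksum;  // hex digest recorded at the last scan
};

using Index = std::map<std::string, IndexEntry>;  // keyed by canonical path

// A file needs hashing only when it is new or its mtime differs
// from the recorded one.
bool needsRehash(const Index& index, const std::string& path, std::int64_t mtime) {
    auto it = index.find(path);
    return it == index.end() || it->second.mtime != mtime;
}

// True when `path` equals `prefix` or lies below it at a '/' boundary,
// so "/data" does not match "/database/x".
bool isUnder(const std::string& path, const std::string& prefix) {
    if (path.compare(0, prefix.size(), prefix) != 0) return false;
    return path.size() == prefix.size() || prefix.empty() ||
           prefix.back() == '/' || path[prefix.size()] == '/';
}

// Remove entries under `prefix` that were not seen in the current scan.
// Entries outside the prefix are left untouched. Returns the number removed.
std::size_t pruneMissing(Index& index, const std::string& prefix,
                         const std::set<std::string>& seen) {
    std::size_t removed = 0;
    for (auto it = index.begin(); it != index.end();) {
        if (isUnder(it->first, prefix) && seen.count(it->first) == 0) {
            it = index.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Restricting the pruning step to the scanned prefix means that indexing one directory does not discard entries recorded for other trees.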

## Requirements

The build uses CMake and Qt 4-era dependencies:

- C++ compiler (the build files check for `g++`).
- CMake 2.6.4 or newer.
- Qt 4 with QtGui, QtXml, QtSql, and QtOpenGL.
- SQLite 3 development headers and runtime.
- Boost filesystem, system, and test components.
- Optional `ccache`.
- Optional `lcov` and `genhtml` for coverage targets.

On Debian-like systems, `setup.sh` attempts to install the main dependencies:

```sh
./setup.sh
```

## Building

Use an out-of-tree build directory:

```sh
mkdir build
cd build
../setup.sh
make
```

The main build products are:

- `DeDupe.App` - Qt GUI application.
- `updateDeDupe` - command-line index updater.
- `DeDupe` - project library used by the apps and tests.

## Updating the Index

Index the current directory:

```sh
./updateDeDupe
```

Index one or more explicit paths:

```sh
./updateDeDupe /path/to/photos /path/to/archive
```

The index is stored in `~/.DeDupe.sqlite` by default. Paths are canonicalized
before indexing.

## Browsing Duplicates

Run the GUI from the build directory:

```sh
./DeDupe.App
```

The GUI scans the current directory, updates the database, and then shows
possible duplicates. The toolbar controls which matching criteria are applied:
name, size, modification time, checksum, and edit distance. The "Show full
path" menu item switches between filenames and full paths.
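
For readers unfamiliar with the edit-distance criterion, here is a minimal sketch assuming the classic Levenshtein definition. This README does not say which variant DeDupe uses, so `editDistance` is only an illustration of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
std::size_t editDistance(const std::string& a, const std::string& b) {
    // Two rolling rows of the dynamic-programming table.
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Under this definition, names such as `IMG_0001.jpg` and `IMG_0002.jpg` are one edit apart, which is the kind of near-match a small distance threshold would surface.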

## Command-Line Reports

The helper scripts query `~/.DeDupe.sqlite` directly:

```sh
scripts/duplicates.sh /path/prefix
scripts/duplicates.sh -s /path/prefix
scripts/statistics.sh /path/prefix
scripts/dircompare.sh /path/one /path/two
```

`duplicates.sh -s` strips the supplied prefix from the displayed paths.
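
Conceptually, a duplicate set is a group of files that agree on the chosen criteria. The sketch below groups by size and checksum. It is written in C++ rather than shell and SQL because the database schema is not described here, and `FileRecord` and `duplicateSets` are hypothetical names, not part of the scripts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record holding the fields the index is described as storing.
struct FileRecord {
    std::string path;
    std::uint64_t size;
    std::string checksum;
};

// Group records by (size, checksum) and keep only groups that contain
// at least two paths, i.e. likely duplicates.
std::vector<std::vector<std::string>> duplicateSets(
        const std::vector<FileRecord>& records) {
    std::map<std::pair<std::uint64_t, std::string>, std::vector<std::string>> groups;
    for (const FileRecord& r : records) {
        groups[{r.size, r.checksum}].push_back(r.path);
    }
    std::vector<std::vector<std::string>> result;
    for (auto& kv : groups) {
        if (kv.second.size() > 1) result.push_back(std::move(kv.second));
    }
    return result;
}
```

Including the size in the key is a cheap guard: two files are grouped only when both the length and the digest agree.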

## Tests and Coverage

The CMake project defines unit-test executables for the database layer,
bit-array/bit-decoder code, edit distance, Huffman structures, exception types,
and other core classes.

After building, run tests through CTest:

```sh
ctest
```

Coverage support can be enabled with:

```sh
cmake -DCOVERAGE=ON ..
make coverage_presentation
```

## Repository Status

This is a Mercurial repository. The code targets an older Qt 4/CMake toolchain,
so modern systems may need compatibility packages or small build adjustments.