view README.md @ 116:d2a7c0913ef1 default tip
Add project README
| author | Tom Fredrik Blenning <bfg@bfgconsult.no> |
|---|---|
| date | Wed, 20 May 2026 23:25:34 +0200 |
# DeDupe

Qt/C++ file indexer and duplicate browser.

DeDupe scans a directory tree, records file metadata in a SQLite database, and helps find likely duplicate files by checksum, name, size, modification time, or filename edit distance. It includes both a Qt GUI (`DeDupe.App`) for browsing and deleting duplicates, and command-line helpers for updating and querying the index.

## Features

- Recursively indexes regular files without following symlinks.
- Stores paths, sizes, modification times, and SHA-1 checksums in `~/.DeDupe.sqlite`.
- Avoids recomputing hashes for files whose modification time has not changed.
- Removes database entries for files that no longer exist under the scanned prefix.
- GUI filters for duplicate name, size, modification time, checksum, and similar names by edit distance.
- Double-click opens a file with the desktop default application.
- Context-menu delete removes a file from disk and updates the database.
- Shell scripts report duplicate sets, duplicate statistics, and directory comparison results from the SQLite database.

## Requirements

The build is based on CMake and older Qt 4-era dependencies:

- C++ compiler, tested by the build files as `g++`.
- CMake 2.6.4 or newer.
- Qt 4 with QtGui, QtXml, QtSql, and QtOpenGL.
- SQLite 3 development headers and runtime.
- Boost filesystem, system, and test components.
- Optional `ccache`.
- Optional `lcov` and `genhtml` for coverage targets.

On Debian-like systems, `setup.sh` attempts to install the main dependencies:

```sh
./setup.sh
```

## Building

Use an out-of-tree build directory:

```sh
mkdir build
cd build
../setup.sh
make
```

The main build products are:

- `DeDupe.App` - Qt GUI application.
- `updateDeDupe` - command-line index updater.
- `DeDupe` - project library used by the apps and tests.

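
As a rough illustration of the "similar names by edit distance" filter listed under Features, the sketch below computes the classic Levenshtein distance between two filenames. It is not taken from the DeDupe sources: the function name `editDistance` is hypothetical, and the project's own implementation may use a different algorithm or cost model.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the DeDupe library:
// Levenshtein distance computed with two rolling rows.
std::size_t editDistance(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Distance from the empty prefix of a to each prefix of b.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i; // delete the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min(substitution, std::min(deletion, insertion));
        }
        prev.swap(curr);
    }
    return prev[b.size()];
}
```

Two names such as `report.txt` and `report1.txt` differ by a single insertion, so a small distance threshold would flag them as similar while leaving unrelated names apart.
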
## Updating the Index

Index the current directory:

```sh
./updateDeDupe
```

Index one or more explicit paths:

```sh
./updateDeDupe /path/to/photos /path/to/archive
```

The index is stored in `~/.DeDupe.sqlite` by default. Paths are canonicalized before indexing.

## Browsing Duplicates

Run the GUI from the build directory:

```sh
./DeDupe.App
```

The GUI scans the current directory, updates the database, and then shows possible duplicates. The toolbar controls which signals are used: name, size, modification time, checksum, and edit distance. The "Show full path" menu item switches between filenames and full paths.

## Command-Line Reports

The helper scripts query `~/.DeDupe.sqlite` directly:

```sh
scripts/duplicates.sh /path/prefix
scripts/duplicates.sh -s /path/prefix
scripts/statistics.sh /path/prefix
scripts/dircompare.sh /path/one /path/two
```

`duplicates.sh -s` strips the supplied prefix from the displayed paths.

## Tests and Coverage

The CMake project defines unit-test executables for the database layer, bit-array/bit-decoder code, edit distance, Huffman structures, exception types, and other core classes.

After building, run tests through CTest:

```sh
ctest
```

Coverage support can be enabled with:

```sh
cmake -DCOVERAGE=ON ..
make coverage_presentation
```

## Repository Status

This is a Mercurial repository. The code targets an older Qt 4/CMake toolchain, so modern systems may need compatibility packages or small build adjustments.
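
For readers curious about how an index update can skip unchanged files, as described under Features, the following sketch shows one possible form of the check. It is an assumption-laden illustration, not the project's code: `IndexEntry`, `Index`, and `needsRehash` are hypothetical names, and an in-memory `std::map` stands in for the SQLite database.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for one row of the index.
struct IndexEntry {
    std::time_t mtime;    // modification time recorded at the last scan
    std::string checksum; // checksum recorded at the last scan
};

// Maps a canonical path to its stored entry (assumed layout).
typedef std::map<std::string, IndexEntry> Index;

// Returns true when the file must be hashed again: either the path is
// new to the index, or its modification time differs from the stored one.
bool needsRehash(const Index& index,
                 const std::string& path,
                 std::time_t currentMtime)
{
    Index::const_iterator it = index.find(path);
    return it == index.end() || it->second.mtime != currentMtime;
}
```

Under this scheme the expensive checksum is computed only for new or modified files; a file whose content changed without its modification time changing would be missed, which is the usual trade-off of an mtime-based check.
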
