
Dac-Man Commands and Output

Command-line

Dac-Man captures and analyzes changes in four steps: scan, index, compare, and diff. Each step has its own command, so users can run and manage the steps separately. This gives flexibility in how changes are identified and captured.
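To script the whole workflow, the four steps can be chained from Python. The sketch below only builds command lines from the options documented on this page and runs them with subprocess. The helper names dacman_pipeline and run are hypothetical. Two further assumptions apply: scan and index are run once per data path before compare and diff, following the order in which the commands are documented, and the compare step is invoked as `dacman compare`.

```python
import subprocess
from typing import List, Optional


def dacman_pipeline(oldpath: str, newpath: str,
                    stagingdir: Optional[str] = None,
                    outdir: Optional[str] = None) -> List[List[str]]:
    """Build argv lists for the four documented steps, in order (hypothetical helper)."""
    staging = ["-s", stagingdir] if stagingdir else []
    cmds: List[List[str]] = []
    # Steps 1 and 2: scan, then index, each data path separately.
    for path in (oldpath, newpath):
        cmds.append(["dacman", "scan", path] + staging)
    for path in (oldpath, newpath):
        cmds.append(["dacman", "index", path] + staging)
    # Step 3: calculate the changes between the two paths.
    cmds.append(["dacman", "compare", oldpath, newpath] + staging)
    # Step 4: retrieve the changes, optionally saving details to outdir.
    diff = ["dacman", "diff", oldpath, newpath] + staging
    if outdir:
        diff += ["-o", outdir]
    cmds.append(diff)
    return cmds


def run(cmds: List[List[str]]) -> None:
    """Run each step in turn, stopping at the first non-zero exit status."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For a dry run, print the commands with `for c in dacman_pipeline("/path/to/old/data", "/path/to/new/data", outdir="output"): print(" ".join(c))` before passing the same list to `run`.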

scan

This command scans a data path and saves its directory structure and other metadata. You can optionally specify a staging directory in which this metadata is saved.

dacman scan <path> [-s STAGINGDIR] [-i [IGNORE [IGNORE ...]]] [--nonrecursive] [--symlinks]

The options to this command are:

Option                      Meaning
-s STAGINGDIR               Directory where filesystem metadata and indexes are saved
-i [IGNORE [IGNORE ...]]    List of file types to be ignored
--nonrecursive              Do not scan the directory contents recursively
--symlinks                  Include symbolic links
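Conceptually, scanning walks the directory tree and records per-file metadata. The following is a minimal sketch of that idea, not Dac-Man's implementation. The function scan is hypothetical and records only size and modification time. It also assumes that the "file types" given to -i are filename suffixes such as .log.

```python
import os
from typing import Dict, Iterable, Tuple


def scan(root: str, ignore: Iterable[str] = (), recursive: bool = True,
         symlinks: bool = False) -> Dict[str, Tuple[int, float]]:
    """Map each file's path, relative to root, to (size in bytes, mtime)."""
    suffixes = tuple(ignore)
    meta: Dict[str, Tuple[int, float]] = {}
    # followlinks controls whether os.walk descends into symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=symlinks):
        if not recursive:
            dirnames.clear()  # emptying dirnames stops os.walk from descending
        for name in filenames:
            full = os.path.join(dirpath, name)
            if suffixes and name.endswith(suffixes):
                continue  # assumption: -i file types are filename suffixes
            if os.path.islink(full) and not symlinks:
                continue  # skip symbolic links unless requested
            if not os.path.exists(full):
                continue  # skip broken links, which cannot be stat'ed
            st = os.stat(full)
            meta[os.path.relpath(full, root)] = (st.st_size, st.st_mtime)
    return meta
```

A call such as `scan("/path/to/old/data", ignore=[".tmp"], recursive=False)` mirrors `dacman scan /path/to/old/data -i .tmp --nonrecursive` under these assumptions.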

index

This command indexes the files in a data path, mapping each file to its contents.

dacman index <path> [-s STAGINGDIR] [-m python,tigres,mpi]

The options to this command are:

Option                  Meaning
-s STAGINGDIR           Directory where filesystem metadata and indexes are saved
-m python,tigres,mpi    Index manager used to parallelize index creation. Possible values are python, mpi, and tigres. The default (manager=python) uses the Python multiprocessing module and is suitable for parallelizing on a single node. For multi-node parallelism, users can choose MPI (manager=mpi) or Tigres (manager=tigres).
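Indexing maps each file to its contents. One common way to do this is to hash each file's bytes. The sketch below assumes SHA-256 digests, which may differ from what Dac-Man actually stores. It also uses a thread pool, whereas Dac-Man's python manager uses the multiprocessing module. The names file_digest and build_index are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files are never fully loaded."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root: str, relpaths: List[str], workers: int = 4) -> Dict[str, str]:
    """Map each relative path to the SHA-256 digest of its contents, in parallel."""
    full = [os.path.join(root, p) for p in relpaths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so digests line up with relpaths.
        return dict(zip(relpaths, pool.map(file_digest, full)))
```

A thread pool keeps the example self-contained. CPython's hashlib releases the GIL while hashing large chunks, so the threads still overlap useful work. Files with identical contents map to the same digest, which is what later lets content changes be told apart from metadata-only changes.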

compare

This command examines two data paths and calculates the different types of changes between them.

dacman compare <oldpath> <newpath> [-s STAGINGDIR]

The options to this command are:

Option           Meaning
-s STAGINGDIR    Directory where filesystem metadata and indexes are saved
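The change categories that Dac-Man reports (added, deleted, modified, metadata-only, unchanged) can be illustrated with a small classifier over two indexes. This is a sketch under stated assumptions, not Dac-Man's algorithm:

- Files are matched by relative path.
- A different content digest means modified.
- A different modification time with identical content means metadata-only.

FileEntry and classify are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class FileEntry(NamedTuple):
    digest: str   # content hash, e.g. from an index step
    mtime: float  # file metadata, e.g. from a scan step


def classify(old: Dict[str, FileEntry],
             new: Dict[str, FileEntry]) -> Dict[str, List[str]]:
    """Sort relative paths into change categories (assumed definitions)."""
    changes: Dict[str, List[str]] = {
        k: [] for k in ("added", "deleted", "modified", "metaonly", "unchanged")
    }
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes["added"].append(path)
        elif path not in new:
            changes["deleted"].append(path)
        elif old[path].digest != new[path].digest:
            changes["modified"].append(path)      # contents differ
        elif old[path].mtime != new[path].mtime:
            changes["metaonly"].append(path)      # same contents, new metadata
        else:
            changes["unchanged"].append(path)
    return changes
```

With three files on each side, where one is removed, one is added, one is edited, and one is untouched, the counts come out as one added, one deleted, one modified, zero metadata-only, and one unchanged.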

diff

This command retrieves the changes between two data paths.

dacman diff <oldpath> <newpath> [-s STAGINGDIR] [-o OUTDIR] [--script SCRIPT] [--datachange] [-e default,threaded,mpi,tigres]

The options to this command are:

Option                            Meaning
-s STAGINGDIR                     Directory where filesystem metadata and indexes are saved
-o OUTDIR                         Directory where the summary of changes is saved
--script SCRIPT                   User-defined script for analyzing data changes
--datachange                      Calculate data-level changes in addition to file-level changes
-e default,threaded,mpi,tigres    Type of executor (runtime) for parallel data change capture: default, threaded, mpi, or tigres. The default option runs single-threaded. The threaded option uses the Python multiprocessing module and is suitable for parallelizing on one node. For multi-node parallelism, users can choose MPI or Tigres.
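The -e option chooses how per-file analyses are scheduled. The sketch below contrasts a sequential run, as with default, and a pooled run, as with threaded. It uses a thread pool rather than the multiprocessing module that Dac-Man's threaded option uses, and it does not cover mpi or tigres. run_analyses and changed_line_count are hypothetical. changed_line_count is only a stand-in for a user-defined analysis such as one passed with --script, whose real interface is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (old file path, new file path)


def changed_line_count(old_path: str, new_path: str) -> int:
    """Hypothetical data-level analysis: count line positions whose text differs."""
    with open(old_path) as a, open(new_path) as b:
        # zip_longest pads the shorter file with None, so extra lines count as changes.
        return sum(1 for x, y in zip_longest(a, b) if x != y)


def run_analyses(pairs: List[Pair], analyze: Callable[[str, str], object],
                 executor: str = "default", workers: int = 4) -> Dict[Pair, object]:
    """Apply an analysis to each file pair, sequentially or with a worker pool."""
    if executor == "default":
        results = [analyze(old, new) for old, new in pairs]  # one pair at a time
    elif executor == "threaded":
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda p: analyze(*p), pairs))
    else:
        raise ValueError(f"executor {executor!r} is not covered by this sketch")
    return dict(zip(pairs, results))
```

The analysis function stays the same whichever executor is chosen, so switching executors changes only how the work is scheduled, not the results.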

In addition to these four commands, Dac-Man provides two commands for cleanup and metadata management.

clean

This command removes all the indexes and cached information associated with the specified directories.

dacman clean <path> [path ...]

The arguments to this command are:

Argument           Meaning
path [path ...]    Paths to one or more data directories

metadata

This command lets users insert, retrieve, and append user-defined metadata for a data directory.

dacman metadata [-m METADATA] [-s STAGINGDIR] insert,retrieve,append <datapath>

The options to this command are:

Option                    Meaning
-s STAGINGDIR             Directory where filesystem metadata and indexes are saved
-m METADATA               User-defined metadata information
insert,retrieve,append    Action to perform on the user-defined metadata: insert, retrieve, or append
datapath                  Path to the data directory

Outputs

Dac-Man prints a summary of changes to standard output. The summary lists the number of files in each change category between the two datasets.

An example output:

Added: 1, Deleted: 1, Modified: 1, Metadata-only: 0, Unchanged: 1

To save more detailed change information, specify an output directory with -o:

dacman diff /path/to/old/data /path/to/new/data -o output

The output/ directory then contains a set of files with detailed information about the changes, along with a summary of the change information:

# output/summary

counts:
  added: 1
  deleted: 1
  metaonly: 0
  modified: 1
  unchanged: 1
versions:
  base:
    dataset_id: /path/to/old/data
    nfiles: 3
  revision:
    dataset_id: /path/to/new/data
    nfiles: 3
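The summary file and the standard-output line carry the same counts. The sketch below assumes the file has already been loaded into a dictionary, for instance with PyYAML's yaml.safe_load. It formats the one-line summary and checks that the counts agree with nfiles. The helper names stdout_summary and check_nfiles are hypothetical. The check assumes that every file falls into exactly one category and that renames are not detected as a separate category.

```python
from typing import Dict


def stdout_summary(counts: Dict[str, int]) -> str:
    """Format counts like the one-line summary printed on standard output."""
    labels = [("added", "Added"), ("deleted", "Deleted"), ("modified", "Modified"),
              ("metaonly", "Metadata-only"), ("unchanged", "Unchanged")]
    return ", ".join(f"{label}: {counts[key]}" for key, label in labels)


def check_nfiles(summary: Dict) -> bool:
    """Base files are deleted, modified, metadata-only, or unchanged;
    revision files are added, modified, metadata-only, or unchanged."""
    c = summary["counts"]
    kept = c["modified"] + c["metaonly"] + c["unchanged"]  # present in both versions
    return (summary["versions"]["base"]["nfiles"] == c["deleted"] + kept
            and summary["versions"]["revision"]["nfiles"] == c["added"] + kept)


# The example output/summary file, written out as the dictionary a YAML loader would return.
summary = {
    "counts": {"added": 1, "deleted": 1, "metaonly": 0, "modified": 1, "unchanged": 1},
    "versions": {
        "base": {"dataset_id": "/path/to/old/data", "nfiles": 3},
        "revision": {"dataset_id": "/path/to/new/data", "nfiles": 3},
    },
}
print(stdout_summary(summary["counts"]))  # Added: 1, Deleted: 1, Modified: 1, Metadata-only: 0, Unchanged: 1
print(check_nfiles(summary))              # True
```

For the example, both versions hold 3 files. In the base version, 1 deleted + 1 modified + 0 metadata-only + 1 unchanged = 3. In the revision, 1 added + 1 modified + 0 metadata-only + 1 unchanged = 3.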