Skip to content

Extending the JSON plug-in

The following example illustrates how to implement a change analysis for JSON files tailored to specific features of the data being analyzed, by creating a minimal extension of the included JSON plug-in with a more detailed output.

A complete runnable version of the entire script is available at the end of this section, and in the examples/json directory of the Dac-Man source code repository, together with the example data (shown immediately below) as two separate json files.

Test data

In this example, we perform an analysis of the changes between the files a.json and b.json:

a.json

{
   "name": "John Doe",
   "age": "30",
   "cars": {
      "car1": {
         "brand": "Ford",
         "color":  "white"
      }
   }
}

b.json

{
   "age": "33",
   "name": "Jane Doe",
   "cars": {
      "car1": {
         "brand": "",
         "color":  "black"
      },
      "car2": {
         "brand": "Tesla",
         "color": "white"
      }
   },
   "pets": {}
}

The example files contain few lines, allowing us to visually inspect the differences. - b.json contains an additional key "pets" - The values for name, age and, cars differ

Creating a specialized comparator class

In the following, we'll build an extension of the JSON plug-in tailored to the structure of this source data, and how to integrate it in a custom analysis script. We begin by creating a comparator subclass implementing our customizations by extending the JSONPlugin class:

from dacman.plugins.json import JSONPlugin

class MyJSONPlugin(JSONPlugin):
    ...

Setting the output detail level from the header

The only customization we are specifying here is varying the detail level for the outputs of the JSON plug-in.

Creating a runnable change analysis script

Creating the __main__ block

We start from creating the skeleton of the analysis script in a Python file, for instance, /home/user/my_json_analysis.py. Dac-Man analysis scripts are required to accept two command-line arguments, where arguments are the file paths to be compared.

import sys

if __name__ == '__main__':
    cli_args = sys.argv[1:]
    print(f'cli_args={cli_args}')
    file_a, file_b = cli_args[0], cli_args[1]

Implementing the change analysis with Dac-Man's API

Next, we create a Python function run_my_change_analysis(file_a, file_b) taking in the two file paths as arguments, and implementing the custom change analysis using Dac-Man's API. This allows us to integrate our customized comparator class while reusing much of the functionality provided by Dac-Man.

import sys
import dacman

def run_my_change_analysis(file_a, file_b):
    comparisons = [(file_a, file_b)]
    differ = dacman.DataDiffer(comparisons, dacman.Executor.DEFAULT)


if __name__ == '__main__':
    cli_args = sys.argv[1:]
    print(f'cli_args={cli_args}')
    file_a, file_b = cli_args[0], cli_args[1]
    run_my_change_analysis(file_a, file_b)

Integrating our custom comparator in the change analysis

The next step is to add the code for our custom comparator class MyJSONPlugin and set it as the plug-in to use for the comparison:

import sys
import dacman
from dacman.plugins.json import JSONPlugin


class MyJSONPlugin(JSONPlugin):
      output_options = {'detail_level': 2}


def run_my_change_analysis(file_a, file_b):
    comparisons = [(file_a, file_b)]
    differ = dacman.DataDiffer(comparisons, dacman.Executor.DEFAULT)
    differ.use_plugin(MyJSONPlugin)
    differ.start()


if __name__ == '__main__':
    cli_args = sys.argv[1:]
    print(f'cli_args={cli_args}')
    file_a, file_b = cli_args[0], cli_args[1]
    run_my_change_analysis(file_a, file_b)

Making the change analysis script executable

Finally, we make the script executable by adding the "shebang" line at the top, and using the chmod command to add executable permissions:

#!/usr/bin/env python3
chmod +x /home/user/my_json_analysis.py

Testing the custom change analysis

To test this change analysis with Dac-Man, navigate to the examples/json directories and run:

dacman diff a.json b.json --script /home/user/my_json_analysis.py

Tip

A runnable copy of this file is available in examples/json/my_json_analysis.py

Final Output

The final output from executing the script above is as follows:

cli_args=['a.json', 'b.json']
[INFO] Sequentially comparing dataset pairs.
Data comparator plugin = MyJSONPlugin
[INFO] Comparing a.json and b.json using MyJSONPlugin

Contents in a.json denoted with "+"
Contents in b.json denoted with "-"


-  pets: {
-  }
+  name: John Doe
-  name: Jane Doe
  cars: {
-    car2: {
-      brand: Tesla
-      color: white
-    }
    car1: {
+      brand: Ford
-      brand:
+      color: white
-      color: black
    }
  }
+  age: 30
-  age: 33
[INFO] --- Using custom detail level 2

Level 0 detail:
        a.json has -66.67% less keys than b.json


Level 1 detail:
        Total number of keys in a.json: 6
        Total number of keys in b.json: 10
        Total number of overlapping keys in both files: 6


Level 2 detail:
        JSON level 0 has
                0 unique keys for a.json
                1 unique keys for b.json
                3 keys shared between files
        JSON level 1 has
                3 unique keys for b.json
                0 unique keys for a.json
                3 keys shared between files
[INFO] Data comparison complete.