DMF Tabular Data

The DMF has some classes and methods specifically for adding tabular data, e.g. data from a spreadsheet. Before importing external data, you will usually convert it to comma-separated values (CSV), although the DMF also has its own internal JSON format that you could use. There are two related modules for handling tabular data. The tabular module is designed for any set of labeled columns, whereas the propdata module has some additional bells and whistles for IDAES “property” data. In the description that follows, both will be dealt with together, with differences highlighted in the text and notes.

Note

In the Python code, the propdata.PropertyTable class inherits from the base tabular.Table class, meaning that methods expecting a Table can also take a PropertyTable object.

Tabular data can be imported into Python objects from comma-separated values (CSV) files, JSON files, or Python dictionaries. Data files can be merged and the data can be changed through the Python API. From the Python objects, the data can be exported to CSV or JSON, or saved to the DMF as a TabularDataResource (for property data, PropertyDataResource).

Steps to get new property data into the DMF:

  1. Initialize the DMF (see the “Initialize” section of the API examples).
  2. Load the data into a tabular.Table (for property data, propdata.PropertyTable) object, using the import methods for CSV, JSON, or a Python dictionary.
  3. Create a TabularDataResource (for property data, PropertyDataResource), passing the object from the previous step into its constructor.
  4. Add the new resource to the DMF with dmf.DMF.add().

Below are more details about importing from different data sources.

Import from CSV

The expected form of CSV data is one or two files: one for metadata, the other for the data. The metadata indicates the provenance of the data, e.g., a publication or website.

Note

For property data, the metadata is especially important to identify the associated sources of the data.

Tabular data example

Here is a simple example of importing from CSV to a Table object, and from there to the DMF:

from idaes_dmf import dmf, resource, tabular

# Note: Working directory is inside the DMF `examples` tree

# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)

# Load tabular data
pd = tabular.TabularData.from_csv('wwc_input_liq_vap.csv')

# Load tabular metadata
pm = tabular.Metadata.from_csv('wwc_input_liq_vap_sources.csv')

# Create table with data/metadata combined
tbl = tabular.Table(data=pd, metadata=pm)

# Make tabular resource with table
# This will automatically save the DMF representation to
# a temporary file
rsrc = resource.TabularDataResource(tbl)

# Add the resource to the DMF
# This will copy the temporary file with the converted
# tabular data into the DMF workspace.
# (Note that for very large files this behavior can be
#  overridden to avoid the copy overhead.)
d.add(rsrc)

Property data example

This is the same as the example above, but for property data, import using PropertyTable:

from idaes_dmf import dmf, resource, propdata

# Note: Working directory is inside the DMF `examples` tree

# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)

# Load property data
# `12` is number of non-measurement columns, i.e. most of them
pd = propdata.PropertyData.from_csv('wwc_input_liq_vap.csv', 12)

# Load property metadata
# `MEA` is the user-selected "type" of the property data
pm = propdata.PropertyMetadata.from_csv('wwc_input_liq_vap_sources.csv', 'MEA')

# Create PropertyTable from data and metadata
tbl = propdata.PropertyTable(data=pd, metadata=pm)

# Create resource from PropertyTable.
#
# At this point, you can modify the resource metadata to reflect
# links to other objects, aliases, tags, a detailed description, etc.
rsrc = resource.PropertyDataResource(tbl)

# Add the resource to the DMF
d.add(rsrc)

CSV Formats

Tabular metadata

The metadata file has two columns, name and value. One name, “Source”, is required. All other names are user-defined. Below is an example CSV metadata file (line continuations added for legibility).

Source,Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.,"Density of \
Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K\
and Surface Tension of Water(1) + Monethanolamine(2) from ( \
303.15 to 333.15)K", J. Chem. Eng. Data, 2012, Vol. 57, \
pg. 1095-1103"
Notes,r is MEA weight fraction in aqueous soln. (CO2-free basis)

This might be exported from a spreadsheet that looks like the following:

Sample Property Metadata
Name Value
Source Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.,”Density of Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K and Surface Tension of Water(1) + Monethanolamine(2) from ( 303.15 to 333.15)K”, J. Chem. Eng. Data, 2012, Vol. 57 pg. 1095-1103
Notes r is MEA weight fraction in aqueous soln. (CO2-free basis)

The “Source” will be parsed according to the regular expression in the source_expr attribute of tabular.Metadata, which by default is just three parts: {authors}, “title”, {venue and year}. The year is inferred from a comma followed by optional whitespace and 4 digits.

Tabular data

The data file is a standard CSV file, with a “header” row followed by one or more value rows. The first column is always an identifier, whose value is largely ignored. Columns can either be interpreted as values or pairs of <value column>,<error column>. The default behavior differs depending on whether you are importing a generic table or property data:

  • For a generic table, the default is to see each column as a value. You can pass “error_column=True” to change this.
  • For property data, error columns are the default. You need to pass error_column=False to change this.

Note

For property data, some number of initial column pairs are called “states”, which means they are the independent variables, and the remaining columns are called “properties”, which are dependent on the values for the states.

Tabular CSV data example

Thus, the overall layout of the file is like this, for generic tabular data:

id, value1, value2, ...

OR

id, value1, error1, value2, error2, ...

Property CSV data example

The overall layout is like this, for property data:

id, state-value1, state-error1, state-value2, state-error2, ..., \
    prop-value1, prop-error1, prop-value2, prop-error2, ...

For example, here is a property data CSV with two data rows, and two columns: one state value and one property value column.

Data No.,T (K),Absolute Error,Density (g/cm3),Absolute Error
1,303.15,0,0.2,0
2,304.15,0,0.3,0

And in a spreadsheet, this example may look like this:

Sample Property Data
Data No. T (K) Absolute Error Density (g/cm3) Absolute Error
1 303.15 0 0.2 0
2 304.15 0 0.3 0

The format for the header columns is:

For state/property value colummns
<value name> (<units>), where the units are optional but recommended. Example: “Density (g/cm3)”.
For state/property error columns
<type of error> Error. Example: “Absolute Error”.

Import from JSON

The JSON representation of the tabular data, unlike the CSV representation, combines the data and metadata in a single file.

JSON import example

Here is a simple example of importing tabular data (or property data) from JSON and adding to the DMF:

from idaes_dmf import dmf, resource, propdata, tabular

# Note: Working directory is inside the DMF `examples` tree

# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)

# Create generic Table from JSON file
tbl = tabular.Table.load(open('wwc_input_liq_vap.json'))
#
# OR
#
# Create PropertyTable from (same) JSON file
tbl2 = propdata.PropertyTable.load(open('wwc_input_liq_vap.json'))

# Create resource from PropertyTable.
#
# At this point, you can modify the resource metadata to reflect
# links to other objects, aliases, tags, a detailed description, etc.
rsrc = resource.PropertyDataResource(tbl)

# Add the resource to the DMF
d.add(rsrc)

Note

Because the JSON format explicitly labels the “states” and “properties” columns, and separates error values from non-error vallues, the only real difference between importing as PropertyTable and a tabular.Table is the type of the objects that are returned.

JSON format

The “schema” of the JSON format is shown at the end of this section. But, since nobody likes reading schemas, here is a JSON version of the CSV metadata and data given above (truncated to six rows).

{
  "data": [
    {
      "name": "T",
      "units": "K",
      "values": [
        298.15,
        298.15,
        298.15,
        298.15,
        298.15,
        313.15,
      ],
      "errors": [
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
      ],
      "error_type": "absolute",
      "type": "property"
    },
    {
      "name": "CO2 Loading",
      "units": "mol CO2/MEA",
      "values": [
        0.1,
        0.21,
        0.32,
        0.44,
        0.56,
        0.1,
      ],
      "errors": [
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
      ],
      "error_type": "absolute",
      "type": "property"
    },
    {
      "name": "r",
      "units": "",
      "values": [
        0.3,
        0.3,
        0.3,
        0.3,
        0.3,
        0.3,
      ],
      "errors": [
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
      ],
      "error_type": "absolute",
      "type": "property"
    },
    {
      "name": "Density Data",
      "units": "g/cm3",
      "values": [
        1.0333,
        1.0534,
        1.0756,
        1.0964,
        1.1142,
        1.0253,
      ],
      "errors": [
        5e-05,
        5e-05,
        5e-05,
        5e-05,
        5e-05,
        5e-05,
      ],
      "error_type": "absolute",
      "type": "property"
    }
  ],
  "meta": [
    {
      "authors": "Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.",
      "date": "1970-01-01",
      "title": "Density of Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K and Surface Tension of Water(1) + Monethanolamine(2) from (303.15 to 333.15)K",
      "info": "J. Chem. Eng. Data, 2012, Vol. 57, pg. 1095-1103",
      "retrieval": "J. Morgan, date unknown",
      "notes": "r is MEA weight fraction in aqueous soln. (CO2-free basis)"
    }
  ]
}

Import from Python dictionary

Creating a new table from a Python dictionary is similar to importing from JSON, except in this case the data and metadata dicts are created separately from Python dicts, and then combined.

Import from dictionary example

The import API is shown in the following example:

from idaes_dmf import dmf, resource, tabular, propdata

 # Note: Working directory is inside the DMF `examples` tree

 # Initialize the DMF in the current directory,
 # creating '.dmf' tree if necessary.
 d = dmf.DMF(create=True)

 # Create PropertyTable from two inputs:
 #  - `data` is a dict in the same format as the 'data' secion of the JSON
 #    schema
 #  - `metadata` is a list of metadata dicts, just like the `meta`
 #    section of the JSON schema.
 tbl = tabular.Table(data=data, metadata=metadata)
 #
 # OR, for property data:
 tbl = propdata.PropertyTable(data=data, metadata=metadata)

 # Add the property data resource to the DMF
 d.add(resource.PropertyDataResource(tbl))

JSON Schemas

Schema (HTML version)

schema
  • meta List of information about the data source
    List of:
    • datatype string name of the data type
      Example:
      • MEA
    • info string Additional information about the source (i.e. publication)
      Example:
      • J. Chem. Eng. Data, 2009, Vol 54, pg. 3096-30100
    • notes string Free-form text with notes about the data
      Example:
      • r is MEA weight fraction in aqueous soln.
    • authors string Author list in format Last1, First1, Last2, First2, etc.
      Example:
      • Amundsen, T.G., Lars, E.O., Eimer, D.A.
    • title string Title of the source (e.g. publication title)
      Example:
      • Density and Viscosity of Monoethanolamine + .etc.
    • date string Date of source data
      Example:
      • 2009
  • data Measured data columns
    List of:
    • name string
      Examples:
      • Density
      • r
    • units string
      Examples:
      • mPa-s
      • K
    • values Column of numeric values
      List of:
      number
    • error_type string Type for error values
    • errors Column of numeric errors
      List of:
      number

Schema (raw JSON version)

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://idaes.org",
  "definitions": {
    "Column": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "examples": [
            "Density",
            "r"
          ]
        },
        "units": {
          "type": "string",
          "examples": [
            "mPa-s",
            "K"
          ]
        },
        "values": {
          "description": "Column of numeric values",
          "type": "array",
          "items": {
            "type": "number"
          },
          "examples": [
            "[2.6, 6.21]"
          ]
        },
        "error_type": {
          "description": "Type for error values",
          "type": "string"
        },
        "errors": {
          "description": "Column of numeric errors",
          "type": "array",
          "items": {
            "type": "number"
          },
          "examples": [
            "[0.001, 0.035]"
          ]
        },
        "type": {
          "description": "Type of column",
          "enum": [
            "state",
            "property"
          ]
        }
      },
      "required": [
        "name",
        "units",
        "values"
      ],
      "additionalProperties": false
    },
    "Metadata": {
      "type": "object",
      "properties": {
        "datatype": {
          "description": "name of the data type",
          "type": "string",
          "examples": [
            "MEA"
          ]
        },
        "info": {
          "description": "Additional information about the source (i.e. publication)",
          "type": "string",
          "examples": [
            "J. Chem. Eng. Data, 2009, Vol 54, pg. 3096-30100"
          ]
        },
        "notes": {
          "description": "Free-form text with notes about the data",
          "type": "string",
          "examples": [
            "r is MEA weight fraction in aqueous soln."
          ]
        },
        "authors": {
          "description": "Author list in format Last1, First1, Last2, First2, etc.",
          "type": "string",
          "examples": [
            "Amundsen, T.G., Lars, E.O., Eimer, D.A."
          ]
        },
        "title": {
          "description": "Title of the source (e.g. publication title)",
          "type": "string",
          "examples": [
            "Density and Viscosity of Monoethanolamine + .etc."
          ]
        },
        "date": {
          "description": "Date of source data",
          "type": "string",
          "examples": [
            "2009"
          ]
        }
      },
      "required": [
        "datatype",
        "authors",
        "title",
        "date"
      ],
      "additionalProperties": true
    }
  },
  "type": "object",
  "properties": {
    "meta": {
      "description": "List of information about the data source",
      "type": "array",
      "items": {
        "$ref": "#/definitions/Metadata"
      }
    },
    "data": {
      "description": "Measured data columns",
      "type": "array",
      "items": {
        "$ref": "#/definitions/Column"
      }
    }
  },
  "required": [
    "meta",
    "data"
  ],
  "additionalProperties": false
}