DMF Tabular Data¶
The DMF has some classes and methods specifically for adding tabular data,
e.g. data from a spreadsheet. Before importing external data, you will
usually convert it to comma-separated values (CSV), although the DMF also
has its own internal JSON format that you could use. There are two
related modules for handling tabular data. The tabular
module
is designed for any set of labeled columns, whereas the propdata
module has some additional bells and whistles for IDAES “property” data.
In the description that follows, both will be dealt with together, with
differences highlighted in the text and notes.
Note
In the Python code, the propdata.PropertyTable
class
inherits from the base tabular.Table
class, meaning that
methods expecting a Table can also take a PropertyTable object.
Tabular data can be imported into Python
objects from comma-separated values (CSV) files, JSON files, or
Python dictionaries. Data files
can be merged and the data can be changed through the Python API.
From the Python objects, the data can be exported to CSV or JSON, or saved
to the DMF as a
TabularDataResource
(for property data, PropertyDataResource
).
Steps to get new property data into the DMF:
- Initialize the DMF (see the “Initialize” section of the API examples).
- Load the data into a
tabular.Table
(for property data,propdata.PropertyTable
) object, using the import methods for CSV, JSON, or a Python dictionary.- Create a
TabularDataResource
(for property data,PropertyDataResource
), passing the object from the previous step into its constructor.- Add the new resource to the DMF with
dmf.DMF.add()
.
Below are more details about importing from different data sources.
Import from CSV¶
The expected form of CSV data is one or two files: one for metadata, the other for the data. The metadata indicates the provenance of the data, e.g., a publication or website.
Note
For property data, the metadata is especially important to identify the associated sources of the data.
Tabular data example¶
Here is a simple example of importing from CSV to a Table
object,
and from there to the DMF:
from idaes_dmf import dmf, resource, tabular
# Note: Working directory is inside the DMF `examples` tree
# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)
# Load tabular data
pd = tabular.TabularData.from_csv('wwc_input_liq_vap.csv')
# Load tabular metadata
pm = tabular.Metadata.from_csv('wwc_input_liq_vap_sources.csv')
# Create table with data/metadata combined
tbl = tabular.Table(data=pd, metadata=pm)
# Make tabular resource with table
# This will automatically save the DMF representation to
# a temporary file
rsrc = resource.TabularDataResource(tbl)
# Add the resource to the DMF
# This will copy the temporary file with the converted
# tabular data into the DMF workspace.
# (Note that for very large files this behavior can be
# overridden to avoid the copy overhead.)
d.add(rsrc)
Property data example¶
This is the same as the example above, but for property data,
import using PropertyTable
:
from idaes_dmf import dmf, resource, propdata
# Note: Working directory is inside the DMF `examples` tree
# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)
# Load property data
# `12` is number of non-measurement columns, i.e. most of them
pd = propdata.PropertyData.from_csv('wwc_input_liq_vap.csv', 12)
# Load property metadata
# `MEA` is the user-selected "type" of the property data
pm = propdata.PropertyMetadata.from_csv('wwc_input_liq_vap_sources.csv', 'MEA')
# Create PropertyTable from data and metadata
tbl = propdata.PropertyTable(data=pd, metadata=pm)
# Create resource from PropertyTable.
#
# At this point, you can modify the resource metadata to reflect
# links to other objects, aliases, tags, a detailed description, etc.
rsrc = resource.PropertyDataResource(tbl)
# Add the resource to the DMF
d.add(rsrc)
CSV Formats¶
Tabular metadata¶
The metadata file has two columns, name and value. One name, “Source”, is required. All other names are user-defined. Below is an example CSV metadata file (line continuations added for legibility).
Source,Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.,"Density of \
Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K\
and Surface Tension of Water(1) + Monethanolamine(2) from ( \
303.15 to 333.15)K", J. Chem. Eng. Data, 2012, Vol. 57, \
pg. 1095-1103"
Notes,r is MEA weight fraction in aqueous soln. (CO2-free basis)
This might be exported from a spreadsheet that looks like the following:
Name | Value |
---|---|
Source | Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.,”Density of Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K and Surface Tension of Water(1) + Monethanolamine(2) from ( 303.15 to 333.15)K”, J. Chem. Eng. Data, 2012, Vol. 57 pg. 1095-1103 |
Notes | r is MEA weight fraction in aqueous soln. (CO2-free basis) |
The “Source” will be parsed according to the regular expression in
the source_expr
attribute of tabular.Metadata
, which by default
is just three parts: {authors}, “title”, {venue and year}. The year
is inferred from a comma followed by optional whitespace and 4 digits.
Tabular data¶
The data file is a standard CSV file, with a “header” row followed
by one or more value rows. The first column is always an identifier,
whose value is largely ignored. Columns can either be interpreted as values or pairs
of <value column>,<error column>
.
The default behavior differs
depending on whether you are importing a generic table or property data:
- For a generic table, the default is to see each column as a value. You can pass “error_column=True” to change this.
- For property data, error columns are the default. You need to pass error_column=False to change this.
Note
For property data, some number of initial column pairs are called “states”, which means they are the independent variables, and the remaining columns are called “properties”, which are dependent on the values for the states.
Tabular CSV data example¶
Thus, the overall layout of the file is like this, for generic tabular data:
id, value1, value2, ...
OR
id, value1, error1, value2, error2, ...
Property CSV data example¶
The overall layout is like this, for property data:
id, state-value1, state-error1, state-value2, state-error2, ..., \
prop-value1, prop-error1, prop-value2, prop-error2, ...
For example, here is a property data CSV with two data rows, and two columns: one state value and one property value column.
Data No.,T (K),Absolute Error,Density (g/cm3),Absolute Error
1,303.15,0,0.2,0
2,304.15,0,0.3,0
And in a spreadsheet, this example may look like this:
Data No. | T (K) | Absolute Error | Density (g/cm3) | Absolute Error |
---|---|---|---|---|
1 | 303.15 | 0 | 0.2 | 0 |
2 | 304.15 | 0 | 0.3 | 0 |
The format for the header columns is:
- For state/property value colummns
<value name> (<units>)
, where the units are optional but recommended. Example: “Density (g/cm3)”.- For state/property error columns
<type of error> Error
. Example: “Absolute Error”.
Import from JSON¶
The JSON representation of the tabular data, unlike the CSV representation, combines the data and metadata in a single file.
JSON import example¶
Here is a simple example of importing tabular data (or property data) from JSON and adding to the DMF:
from idaes_dmf import dmf, resource, propdata, tabular
# Note: Working directory is inside the DMF `examples` tree
# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)
# Create generic Table from JSON file
tbl = tabular.Table.load(open('wwc_input_liq_vap.json'))
#
# OR
#
# Create PropertyTable from (same) JSON file
tbl2 = propdata.PropertyTable.load(open('wwc_input_liq_vap.json'))
# Create resource from PropertyTable.
#
# At this point, you can modify the resource metadata to reflect
# links to other objects, aliases, tags, a detailed description, etc.
rsrc = resource.PropertyDataResource(tbl)
# Add the resource to the DMF
d.add(rsrc)
Note
Because the JSON format explicitly labels the “states” and “properties” columns, and separates error values from non-error vallues, the only real difference between importing as PropertyTable and a tabular.Table is the type of the objects that are returned.
JSON format¶
The “schema” of the JSON format is shown at the end of this section. But, since nobody likes reading schemas, here is a JSON version of the CSV metadata and data given above (truncated to six rows).
{
"data": [
{
"name": "T",
"units": "K",
"values": [
298.15,
298.15,
298.15,
298.15,
298.15,
313.15,
],
"errors": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
],
"error_type": "absolute",
"type": "property"
},
{
"name": "CO2 Loading",
"units": "mol CO2/MEA",
"values": [
0.1,
0.21,
0.32,
0.44,
0.56,
0.1,
],
"errors": [
0.0,
0.0,
0.0,
0.0,
0.0,
],
"error_type": "absolute",
"type": "property"
},
{
"name": "r",
"units": "",
"values": [
0.3,
0.3,
0.3,
0.3,
0.3,
0.3,
],
"errors": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
],
"error_type": "absolute",
"type": "property"
},
{
"name": "Density Data",
"units": "g/cm3",
"values": [
1.0333,
1.0534,
1.0756,
1.0964,
1.1142,
1.0253,
],
"errors": [
5e-05,
5e-05,
5e-05,
5e-05,
5e-05,
5e-05,
],
"error_type": "absolute",
"type": "property"
}
],
"meta": [
{
"authors": "Han, J., Jin, J., Eimer, D.A., Melaaen, M.C.",
"date": "1970-01-01",
"title": "Density of Water(1) + Monoethanolamine(2) + CO2(3) from (298.15 to 413.15) K and Surface Tension of Water(1) + Monethanolamine(2) from (303.15 to 333.15)K",
"info": "J. Chem. Eng. Data, 2012, Vol. 57, pg. 1095-1103",
"retrieval": "J. Morgan, date unknown",
"notes": "r is MEA weight fraction in aqueous soln. (CO2-free basis)"
}
]
}
Import from Python dictionary¶
Creating a new table from a Python dictionary is similar to importing from JSON, except in this case the data and metadata dicts are created separately from Python dicts, and then combined.
Import from dictionary example¶
The import API is shown in the following example:
from idaes_dmf import dmf, resource, tabular, propdata
# Note: Working directory is inside the DMF `examples` tree
# Initialize the DMF in the current directory,
# creating '.dmf' tree if necessary.
d = dmf.DMF(create=True)
# Create PropertyTable from two inputs:
# - `data` is a dict in the same format as the 'data' secion of the JSON
# schema
# - `metadata` is a list of metadata dicts, just like the `meta`
# section of the JSON schema.
tbl = tabular.Table(data=data, metadata=metadata)
#
# OR, for property data:
tbl = propdata.PropertyTable(data=data, metadata=metadata)
# Add the property data resource to the DMF
d.add(resource.PropertyDataResource(tbl))
JSON Schemas¶
Schema (HTML version)¶
-
meta List of information about the data sourceList of:
-
datatype string name of the data typeExample:
- MEA
-
info string Additional information about the source (i.e. publication)Example:
- J. Chem. Eng. Data, 2009, Vol 54, pg. 3096-30100
-
notes string Free-form text with notes about the dataExample:
- r is MEA weight fraction in aqueous soln.
-
authors string Author list in format Last1, First1, Last2, First2, etc.Example:
- Amundsen, T.G., Lars, E.O., Eimer, D.A.
-
title string Title of the source (e.g. publication title)Example:
- Density and Viscosity of Monoethanolamine + .etc.
-
date string Date of source dataExample:
- 2009
-
-
data Measured data columnsList of:
-
name stringExamples:
- Density
- r
-
units stringExamples:
- mPa-s
- K
-
values Column of numeric valuesList of:number
-
error_type string Type for error values
-
errors Column of numeric errorsList of:number
-
Schema (raw JSON version)¶
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "http://idaes.org",
"definitions": {
"Column": {
"type": "object",
"properties": {
"name": {
"type": "string",
"examples": [
"Density",
"r"
]
},
"units": {
"type": "string",
"examples": [
"mPa-s",
"K"
]
},
"values": {
"description": "Column of numeric values",
"type": "array",
"items": {
"type": "number"
},
"examples": [
"[2.6, 6.21]"
]
},
"error_type": {
"description": "Type for error values",
"type": "string"
},
"errors": {
"description": "Column of numeric errors",
"type": "array",
"items": {
"type": "number"
},
"examples": [
"[0.001, 0.035]"
]
},
"type": {
"description": "Type of column",
"enum": [
"state",
"property"
]
}
},
"required": [
"name",
"units",
"values"
],
"additionalProperties": false
},
"Metadata": {
"type": "object",
"properties": {
"datatype": {
"description": "name of the data type",
"type": "string",
"examples": [
"MEA"
]
},
"info": {
"description": "Additional information about the source (i.e. publication)",
"type": "string",
"examples": [
"J. Chem. Eng. Data, 2009, Vol 54, pg. 3096-30100"
]
},
"notes": {
"description": "Free-form text with notes about the data",
"type": "string",
"examples": [
"r is MEA weight fraction in aqueous soln."
]
},
"authors": {
"description": "Author list in format Last1, First1, Last2, First2, etc.",
"type": "string",
"examples": [
"Amundsen, T.G., Lars, E.O., Eimer, D.A."
]
},
"title": {
"description": "Title of the source (e.g. publication title)",
"type": "string",
"examples": [
"Density and Viscosity of Monoethanolamine + .etc."
]
},
"date": {
"description": "Date of source data",
"type": "string",
"examples": [
"2009"
]
}
},
"required": [
"datatype",
"authors",
"title",
"date"
],
"additionalProperties": true
}
},
"type": "object",
"properties": {
"meta": {
"description": "List of information about the data source",
"type": "array",
"items": {
"$ref": "#/definitions/Metadata"
}
},
"data": {
"description": "Measured data columns",
"type": "array",
"items": {
"$ref": "#/definitions/Column"
}
}
},
"required": [
"meta",
"data"
],
"additionalProperties": false
}