Edge API User Guide

General context

The goal of the Edge package is to manage experimental data. It provides tools to create structured models of the type of data that will be stored, ingest that data to a central data catalog, and search and retrieve it. This document shows how to interact with the package using the API (separate documents show how to use the GUI application).

About the data model

The Edge data model consists of 'data model' (EntityKind) instances, each of which describes a type of real-world object (e.g., sample, instrument, image) and contains a hierarchically organized tree structure. This tree consists of data group (FactNode) branch nodes and data field (FactKind) leaf nodes. Each data field represents a particular type of information associated with the parent data model (e.g., individual recipe or performance characteristics). Each data field contains a 'configurable' (ValueConfiguration) instance, which describes the units and data type (e.g., float, int, string) of the value. A data field may optionally have additional configurables describing configurable parameters of the value (e.g., measurement temperature). While the EntityKind and FactKind classes represent the types of known data, the Entity and Fact classes represent the corresponding realized instances of them.
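The hierarchy described above can be pictured with a few plain dataclasses. This is only an illustrative sketch: the class names Configurable, DataField and DataGroup are hypothetical stand-ins rather than the Edge classes, and naming the main configurable 'value' is an assumption based on the search examples later in this guide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Configurable:
    """Stand-in for a ValueConfiguration: name, data type and units."""
    name: str
    value_type: str = "float"
    units: str = ""


@dataclass
class DataField:
    """Stand-in for a FactKind leaf node."""
    name: str
    value: Configurable
    configurables: List[Configurable] = field(default_factory=list)


@dataclass
class DataGroup:
    """Stand-in for a FactNode branch node."""
    name: str
    data_fields: List[DataField] = field(default_factory=list)
    subgroups: List["DataGroup"] = field(default_factory=list)


# A data field with one extra configurable (the measurement temperature)
tensile = DataField(
    name="Tensile Strength",
    value=Configurable("value", "float", "MPa"),
    configurables=[Configurable("Temperature", "float", "ºC")],
)
mechanical = DataGroup("Mechanical Properties", data_fields=[tensile])
evaluation = DataGroup("Evaluation", subgroups=[mechanical])
```

The real classes carry more behaviour than this; the sketch only shows how groups nest and where configurables attach.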

Connecting to a Data Catalog

Interaction with the Data Catalog is performed via an EdgeService instance. This service can be initialized in two different ways.

With a configuration file

The Edge application stores its catalog connection details in a YAML file at ~/.enthought_edge/data_catalog_config.yml. This file can also be used to connect to the catalog from outside the application:

from edge import api

service = api.connect(
    config_path='/path/to/config/file.yml',
)

Without a config file

A host URL and catalog name can be used to initialize the service:

from edge import api

service = api.connect(
    catalog_host="catalog.enthought.com",
    catalog_name="edge_demo1",
    catalog_port=5000  # Optional, defaults to 5000
)

If no authentication arguments are provided, a hatcher token will be searched for first in the environment variable $HATCHER_TOKEN and then in the ~/.edm.yaml file.

Otherwise, authentication can be provided by passing a hatcher token as an argument:

from edge import api

service = api.connect(
    catalog_host="catalog.enthought.com",
    catalog_name="edge_demo1",
    api_token='YOUR_HATCHER_TOKEN',
)

Alternatively, pass a username and password. This example uses getpass to prompt for the password so that it is not saved in the notebook:

from edge import api
from getpass import getpass

service = api.connect(
    catalog_host="catalog.enthought.com",
    catalog_name="edge_demo1",
    username='YOUR_USERNAME',
    password=getpass()
)
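The token lookup order described in this section (explicit argument, then $HATCHER_TOKEN, then ~/.edm.yaml) can be sketched as follows. This is not the Edge implementation: resolve_token and its read_token_file hook are hypothetical names, and reading the file is left to the caller because its layout is not described in this guide.

```python
import os


def resolve_token(api_token=None, environ=None, read_token_file=lambda: None):
    """Return the first token found, following the order described above."""
    # 1. An explicitly passed token always wins
    if api_token:
        return api_token
    # 2. Otherwise fall back to the HATCHER_TOKEN environment variable
    environ = os.environ if environ is None else environ
    token = environ.get("HATCHER_TOKEN")
    if token:
        return token
    # 3. Finally, ask the (hypothetical) hook that would read ~/.edm.yaml
    return read_token_file()


# Example with a fake environment, so nothing is read from the real one
print(resolve_token(environ={"HATCHER_TOKEN": "token-from-env"}))
```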

Retrieving saved data models from the catalog

Data model retrieval

data_model = service.get_entity_kind(
    'Edge_Example'  # Name of the data model
)

Looking up an object in the data model from its path

It is also possible to access the various components (data groups, data fields and configurables) of a given data model, given their paths.

The path is just a string representing the location of the object within the data model. It is the names of the ancestors of that object, each separated by a '/', plus the name of the object itself. For example, given a data model with the following structure:

- Edge_Example
    - Evaluation
        - Mechanical Properties
            * Tensile Strength (MPa)
                + Temperature (ºC)
        - Physical Properties
            * IR Spectrum

the path representing the data group 'Evaluation' would be 'Edge_Example/Evaluation', the path representing the data field 'Tensile Strength' would be 'Edge_Example/Evaluation/Mechanical Properties/Tensile Strength', and the path representing the configurable 'Temperature' would be 'Edge_Example/Evaluation/Mechanical Properties/Tensile Strength/Temperature'. Note that the units shown in the tree are not part of the path.

# Get a data group
api.lookup_data_group(
    data_model,
    'Edge_Example/Evaluation')

# Get a data field
api.lookup_data_field(
    data_model,
    'Edge_Example/Evaluation/Mechanical Properties/Tensile Strength')

# Get a configurable
api.lookup_configurable(
    data_model,
    'Edge_Example/Evaluation/Mechanical Properties/Tensile Strength/Temperature')
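To make the path convention concrete, here is a small sketch that resolves such a path against a nested dictionary. The dictionary and the lookup function are hypothetical stand-ins for illustration; the real lookup functions operate on data model objects.

```python
# Nested dictionaries standing in for the example data model
tree = {
    "Edge_Example": {
        "Evaluation": {
            "Mechanical Properties": {
                "Tensile Strength": {"Temperature": {}},
            },
            "Physical Properties": {"IR Spectrum": {}},
        },
    },
}


def lookup(tree, path):
    """Walk the tree one name at a time, starting from the root name."""
    node = tree
    for name in path.split("/"):
        node = node[name]  # raises KeyError if any ancestor is missing
    return node


# A path is the ancestor names plus the object's own name, joined by '/'
path = "/".join(
    ["Edge_Example", "Evaluation", "Mechanical Properties", "Tensile Strength"]
)
print(path)
print(lookup(tree, path))
```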

Visualising data models

The structure of a data model can be visualised by printing it. Data groups are prepended with a -, data fields are prepended with a *, configurables are prepended with a +, and children are indented underneath their parents. For the Edge_Example model built in the Data modeling section (see Creating a data model), this is the result:

>>> print(data_model)
- Edge_Example
    - Condition
        * Extruder
        * Screw Type
        * Screw Speed (rpm)
    - Evaluation
        - Mechanical Properties
            * Tensile Strength (MPa)
                + Temperature (ºC)
        - Physical Properties
            * IR Spectrum
    - Composition
        * Base (%)
        * Flame Retardant (%)
            + Purity (%)
    * Timestamp
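The printed layout follows a simple rule: one prefix per node type and four spaces of indentation per level. The sketch below reproduces that rule for a hand-written tree of tuples; render and the tuple layout are hypothetical, not how Edge builds its output.

```python
PREFIX = {"group": "-", "field": "*", "configurable": "+"}


def render(node, depth=0):
    """Return the display lines for a (kind, label, children) node."""
    kind, label, children = node
    lines = [" " * 4 * depth + f"{PREFIX[kind]} {label}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


model = ("group", "Edge_Example", [
    ("group", "Evaluation", [
        ("group", "Mechanical Properties", [
            ("field", "Tensile Strength (MPa)", [
                ("configurable", "Temperature (ºC)", []),
            ]),
        ]),
    ]),
    ("field", "Timestamp", []),
])

print("\n".join(render(model)))
```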

Data search and export

To retrieve a subset of data already in the catalog as a pandas DataFrame, use the search function. It takes the following arguments:

  • an EdgeService instance (see Connecting to a Data Catalog)

  • a list of data models to search (see Data model retrieval)

  • a list of criteria to filter the facts by, each a tuple of the form ((data field, configurable name), condition, value), where the value is what the configurable is compared to

  • a data table that sets which columns appear in the returned DataFrame (see Data Tables)

# Create the search query
# Search terms: (path to data field, configurable name, condition, value)
all_search_terms = [
    ('Edge_Example/Composition/Flame Retardant', 'value', 'Exists', None),
    ('Edge_Example/Condition/Screw Speed', 'value', '>', 1),
]

# Replace each path with its data field to build the search criteria
search_terms = []
for path, config_name, condition, value in all_search_terms:
    data_field = api.lookup_data_field(data_model, path)
    search_terms.append(((data_field, config_name), condition, value))

# Retrieve a data frame containing facts matching the search criteria
df = api.search(
    service,
    data_models=[data_model],
    criteria=search_terms,
    data_table=data_table  # Use a data table to only include the desired columns
)

# Export to .csv and .xlsx
df.to_csv('search_results.csv')
df.to_excel('search_results.xlsx')
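As a rough mental model of how such criteria filter data, the sketch below applies the two conditions used above ('Exists' and '>') to plain dictionaries. It is an illustration only: the matches function is hypothetical, combining the criteria with a logical AND is an assumption, and the full set of conditions supported by Edge is not listed in this guide.

```python
CONDITIONS = {
    # True when the field has any value at all
    "Exists": lambda actual, expected: actual is not None,
    # True when the field has a value greater than the expected one
    ">": lambda actual, expected: actual is not None and actual > expected,
}


def matches(record, criteria):
    """True if the record satisfies every (field, condition, value) term."""
    return all(
        CONDITIONS[condition](record.get(field_name), expected)
        for field_name, condition, expected in criteria
    )


records = [
    {"Flame Retardant": 5.0, "Screw Speed": 10.0},
    {"Screw Speed": 0.5},
    {"Flame Retardant": 2.0},
]
criteria = [
    ("Flame Retardant", "Exists", None),
    ("Screw Speed", ">", 1),
]

hits = [record for record in records if matches(record, criteria)]
print(hits)
```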

Tidy data

The search function returns pandas DataFrames that follow the 'tidy data' paradigm. Briefly, a 'tidy' dataset never encodes values associated with a measurement in its column headers, so columns for 'Pressure @ 20ºC' and 'Pressure @ 30ºC' would be represented instead as:

Entity Id.    Pressure (Pa)    Temperature (ºC)
001           10               20
001           15               30
002           20               20
002           25               30
003           30               40

This means that data from a single entity can be split across multiple rows. It also allows for entities which have “redundant” Facts: those with the same configuration but different values.

If you must, you can "untidy" the DataFrame:

df.pivot(
    index='Pressure (Pa)',
    columns='Entity Id.',
    values='Temperature (ºC)')

results in

Entity Id.       001    002    003
Pressure (Pa)
10               20     NaN    NaN
15               30     NaN    NaN
20               NaN    20     NaN
25               NaN    30     NaN
30               NaN    NaN    40
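A different pivot recovers the wide layout mentioned earlier, with one pressure column per temperature. The sketch below rebuilds the example table by hand rather than calling search, so the DataFrame contents are illustrative only.

```python
import pandas as pd

# The tidy example table from this section, rebuilt by hand
tidy = pd.DataFrame({
    "Entity Id.": ["001", "001", "002", "002", "003"],
    "Pressure (Pa)": [10, 15, 20, 25, 30],
    "Temperature (ºC)": [20, 30, 20, 30, 40],
})

# One row per entity, one pressure column per temperature
wide = tidy.pivot(
    index="Entity Id.",
    columns="Temperature (ºC)",
    values="Pressure (Pa)",
)
wide.columns = [f"Pressure @ {t}ºC" for t in wide.columns]
print(wide)
```

Note that pivot raises a ValueError when two rows share the same index and column values, which can happen with "redundant" Facts. In that case pandas' pivot_table, which aggregates duplicates, may be a better fit.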

Data Tables

When searching for data, a DataTable object can be used to apply further levels of filtering and customization to the output.

Creating a data table

A DataTable can first be created for the EntityKind(s) of interest:

from edge.model import DataTable

screw_type_fact_kind = api.lookup_data_field(
    data_model, 'Edge_Example/Condition/Screw Type')

data_table = api.create_data_table(
    'example_table',
    data_models=[data_model],
    data_fields=[screw_type_fact_kind]
)

Adding data fields to a data table

Next, FactKinds (and optionally a column name) can be added to the DataTable after it is created. If no column name is given, the name of the FactKind is used.

# data_model_2 is assumed to be a second data model, retrieved in the same
# way as data_model
screw_speed_fact_kind = api.lookup_data_field(
    data_model_2, 'Example 2/Condition/Screw Speed')
data_table.add_column(screw_speed_fact_kind, col_name="Screw Speed")

Saving/Loading data tables

Data Tables can also be saved and loaded via the Data Catalog.

# Saving a DataTable to the catalog. Note: Saving a DataTable requires it
# to have a unique name
service.add_data_table(data_table)

# Loading a DataTable
loaded_data_table = service.get_data_table(name='example_table')

Data import

Creating Entities and Facts

Once the data model has been constructed, the API can be used to ingest data, i.e., to create entities and facts.

Importing from an Excel File

Before using the Edge scripting API to import data from files, an ImportRecord instance needs to be created in the Data Importer of Edge. This record relates a data model to Excel files by describing how data corresponding to data fields in the model can be extracted from the spreadsheets.

Given an ImportRecord and an Excel file, we can extract data in the form of entities and their facts using the parse_file_to_entity_group function as follows:

template = api.get_import_record(
    service, data_model, 'Formulation Standard')

parsed_entity_group = api.parse_file_to_entity_group(
    excel_file, data_model, template)

This creates an EntityGroup, a collection of entities bundled with their associated facts, which contains the data from the Excel file mapped to the data model.

Creating Data Manually

It is also possible to create data without an Excel file, by creating an Entity and the Fact objects associated with it. The following code block illustrates how to do so:

# Create a new entity of the kind specified in the above data model
entity = data_model.create_entity()

# Retrieve relevant FactKind instances to make some Facts
screw_type_fact_kind = api.lookup_data_field(
    data_model, 'Edge_Example/Condition/Screw Type')
screw_speed_kind = api.lookup_data_field(
    data_model, 'Edge_Example/Condition/Screw Speed')

# Create some facts about the new entity
facts = [
    screw_type_fact_kind.create_fact(entity, value='Type A'),
    screw_speed_kind.create_fact(entity, value=10.0),
]

Saving Entities and Facts to the database

# Add the entity
service.add_entities([entity])

# Add the facts
service.add_facts(facts)

# Alternatively, bundle the entity and facts into an EntityGroup, and upload
# them together
entity_group = api.create_entity_group(data_model, [entity], facts)
service.add_entity_group(entity_group)

Note that Entity or EntityGroup instances can only be added if their parent EntityKind already exists in the database (see Saving the data model). Likewise, Fact instances can only be added once their parent Entity has been added.
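The ordering rule above (kind first, then entity, then fact) can be illustrated with a tiny in-memory stand-in. InMemoryCatalog is hypothetical and unrelated to EdgeService; in particular, raising ValueError is an assumption, since this guide does not say how the real service reports the problem.

```python
class InMemoryCatalog:
    """Hypothetical stand-in that enforces the parent-before-child order."""

    def __init__(self):
        self.kinds = set()
        self.entities = {}  # entity id -> kind name
        self.facts = []     # (entity id, field name, value)

    def add_kind(self, kind_name):
        self.kinds.add(kind_name)

    def add_entity(self, entity_id, kind_name):
        if kind_name not in self.kinds:
            raise ValueError(f"unknown kind: {kind_name}")
        self.entities[entity_id] = kind_name

    def add_fact(self, entity_id, field_name, value):
        if entity_id not in self.entities:
            raise ValueError(f"unknown entity: {entity_id}")
        self.facts.append((entity_id, field_name, value))


catalog = InMemoryCatalog()
try:
    # Too early: the parent entity has not been added yet
    catalog.add_fact("001", "Screw Speed", 10.0)
except ValueError as error:
    early_error = str(error)

# Correct order: kind, then entity, then fact
catalog.add_kind("Edge_Example")
catalog.add_entity("001", "Edge_Example")
catalog.add_fact("001", "Screw Speed", 10.0)
```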

Saving Facts which already exist in the database

If a Fact has already been saved to the database and you have modified it, save the changes with update_facts rather than add_facts:

service.update_facts(facts)

Raw data import

A command-line utility is available for importing raw data (e.g. images and other files) from structured directory hierarchies. Basic usage requires only the path of the directory containing the files to import and a fileset depth: an integer giving the depth, relative to the import path, at which files are considered part of the same logical unit for parsing. Typical basic usage may look something like:

python -m edge.io.raw_data_import --path /path/to/data --fileset-depth 3

See the API documentation for more detail and advanced usage of this utility.

See also the Endex documentation on Bulk Data Import, which is the basis for this utility.
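One plausible reading of the fileset depth is that files sharing the same first N directory levels below the import path belong to one fileset. The sketch below groups paths on that assumption. It does not reflect the utility's actual parsing logic, and group_filesets is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_filesets(root, files, fileset_depth):
    """Group file names by their first `fileset_depth` levels below root."""
    groups = defaultdict(list)
    for file_path in files:
        relative = PurePosixPath(file_path).relative_to(root)
        key = PurePosixPath(*relative.parts[:fileset_depth])
        groups[str(key)].append(relative.name)
    return dict(groups)


files = [
    "/path/to/data/batch1/sampleA/img1.tif",
    "/path/to/data/batch1/sampleA/img2.tif",
    "/path/to/data/batch1/sampleB/img1.tif",
]
filesets = group_filesets("/path/to/data", files, fileset_depth=2)
print(filesets)
```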

Data modeling

Creating a data model

If the data model creation is likely to be a one-off task, it may be easiest to create the model via the UI and save it to the catalog, and then retrieve it when required using the service (see Data model retrieval).

However, data models can also be created using the scripting API, with the following functions:

  • create_configurable creates a configurable

  • create_data_field creates a data field

  • create_data_group creates a data group

  • create_data_model creates a data model

Creating data fields

Below is an example of creating some data fields for a data model, including an extra configurable for one of those data fields:

from edge import api

timestamp_kind = api.create_data_field(
    'Timestamp', value_type='datetime')
extruder_kind = api.create_data_field(
    'Extruder', value_type='str')
screw_type_kind = api.create_data_field(
    'Screw Type', value_type='str')
screw_speed_kind = api.create_data_field(
    'Screw Speed', units='rpm', value_type='float')
ir_spectrum_kind = api.create_data_field(
    'IR Spectrum', value_type='fileset:ir')
sem_image_kind = api.create_data_field(
    'SEM Image', value_type='fileset:sem')

# Create configurable
temperature_configurable = api.create_configurable(
    'Temperature', units='ºC', value_type='float')
tensile_strength_kind = api.create_data_field(
    'Tensile Strength',
    units='MPa',
    value_type='float',
    configurables=[temperature_configurable]
)

Creating data groups

After all the data fields of the model have been created, they can be organized into data groups. Below is an example:

condition_group = api.create_data_group(
    'Condition',
    data_fields=[extruder_kind, screw_type_kind, screw_speed_kind]
)
mechanical_properties_group = api.create_data_group(
    'Mechanical Properties',
    data_fields=[tensile_strength_kind]
)
physical_properties_group = api.create_data_group(
    'Physical Properties',
    data_fields=[ir_spectrum_kind]
)

# Groups of groups work as well
evaluation_group = api.create_data_group(
    'Evaluation',
    subgroups=[mechanical_properties_group, physical_properties_group]
)

Creating a data model

Finally, the data model can be made by using the objects created above:

data_model = api.create_data_model(
    'Edge_Example',
    data_fields=[timestamp_kind],
    groups=[condition_group, evaluation_group]
)

Editing a data model

The functions above allow creation of a data model in a "bottom up" fashion, by creating configurables first, then data fields, then data groups, and finally the data model. There are also functions that allow a "top down" construction, by editing a data model which has already been constructed.

Adding children to a data model/group/field after creation

  • add_configurable adds a configurable to an already existing data field in a model

  • add_data_field adds a data field to an already existing data group in a model

  • add_data_group adds a data group to an already existing data group in a model

composition_group = api.create_data_group('Composition')
api.add_data_group(
    data_model,  # Data model to add to
    'Edge_Example',  # Path to data group within the data model (here, the root node)
    composition_group  # Data group to add
)

base_kind = api.create_data_field(
    'Base', units='%', value_type='float')
flame_retardant_kind = api.create_data_field(
    'Flame Retardant', units='%', value_type='float')
api.add_data_field(
    data_model,  # Data model to add to
    'Edge_Example/Composition',  # Path to data group within the data model
    base_kind  # Data field to add
)
api.add_data_field(
    data_model,  # Data model to add to
    'Edge_Example/Composition',  # Path to data group within the data model
    flame_retardant_kind  # Data field to add
)

purity_configurable = api.create_configurable(
    'Purity', units='%', value_type='float')
api.add_configurable(
    data_model,  # Data model to add to
    'Edge_Example/Composition/Flame Retardant',  # Path to data field within the data model
    purity_configurable  # Configurable to add
)

Editing the description of a data model

  • edit_data_model_description edits the data model description

# Edit description
api.edit_data_model_description(
    data_model, 'Insulation Manufacturing Data')

Deleting parts of the data model

  • delete_data_field removes a data field from a model

  • delete_data_group removes a data group from a model

# Delete data field
api.delete_data_field(data_model, ('Edge_Example', 'Evaluation', 'Mechanical Properties', 'Tensile Strength'))

# String as path also works
api.delete_data_field(data_model, 'Edge_Example/Condition/Screw Type')

# Delete data group
api.delete_data_group(data_model, 'Edge_Example/Condition')
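The two path forms accepted above (a tuple of names or a '/'-separated string) describe the same location. A minimal sketch of normalizing both to a single form, using a hypothetical normalize_path helper that is not part of the Edge API:

```python
def normalize_path(path):
    """Return the path as a tuple of names, whichever form it was given in."""
    if isinstance(path, str):
        return tuple(path.split("/"))
    return tuple(path)


as_string = "Edge_Example/Evaluation/Mechanical Properties/Tensile Strength"
as_tuple = ("Edge_Example", "Evaluation", "Mechanical Properties", "Tensile Strength")

# Both forms normalize to the same tuple of names
print(normalize_path(as_string) == normalize_path(as_tuple))
```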

Editing a data group

  • edit_data_group edits the name and description of a data group

Note that there is no function for editing a data field other than deleting it.

api.edit_data_group(
    data_model, 'Edge_Example/Composition', name='Beethoven',
    description='Composer')

Saving the data model

Once the model is created, it needs to be saved to the Data Catalog using the EdgeService instance initialized earlier (see Connecting to a Data Catalog).

# Add the data model
service.add_entity_kind(data_model)

Saving a model which already exists in the database

If a model has already been saved to the database and you have modified it, save the changes with update_entity_kind rather than add_entity_kind:

service.update_entity_kind(data_model)
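The choice between the two calls can be wrapped in a small helper. The sketch below uses only the add_entity_kind and update_entity_kind method names shown above; save_data_model, its already_in_catalog flag and the RecordingService stub are hypothetical, and the caller is assumed to know whether the model has been saved before.

```python
class RecordingService:
    """Hypothetical stub that records calls instead of contacting a catalog."""

    def __init__(self):
        self.calls = []

    def add_entity_kind(self, data_model):
        self.calls.append(("add", data_model))

    def update_entity_kind(self, data_model):
        self.calls.append(("update", data_model))


def save_data_model(service, data_model, already_in_catalog):
    """Use update for models saved before, add for new ones."""
    if already_in_catalog:
        service.update_entity_kind(data_model)
    else:
        service.add_entity_kind(data_model)


stub = RecordingService()
save_data_model(stub, "Edge_Example", already_in_catalog=False)  # first save
save_data_model(stub, "Edge_Example", already_in_catalog=True)   # later edit
print(stub.calls)
```

The same pattern would apply to add_facts and update_facts.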