Data uploaded to the catalog can be retrieved through the 'Data Searcher' tool, which can be accessed through the Search for data option in the welcome screen, or the menu item Tools > Search...
Connecting to the Catalog¶
Before using the searcher, it's worth checking that the application is connected to the catalog instance containing the data. If Catalog not connected appears on the right in the status bar, the connection details need to be configured.
This can be done through the Configure data catalog connection option in the welcome menu, or by the menu item File > Login to Data Catalog....
The first step when performing a search is selecting the Data Model to search against. Selecting one via the Model dropdown brings up a tree view of the Data Model's structure, on the right side of the window.
Hitting the Refresh button performs a search and populates the results table. Without specifying any filters, all entities associated with the selected Data Model, as well as their facts, will be fetched. If a large number of entities is present in the catalog, this is likely to take a while, in which case the search can be cancelled. Each row in the results table represents an entity, while the columns represent the FactKinds of the selected Data Model and their configurables.
Checking/unchecking items in the data model tree determines which columns are displayed in the results table. Selecting or deselecting a node automatically selects/deselects all items contained in it. Consequently, one can quickly display all FactKinds within a node by checking it. Please note that the selection of columns in the tree view does not affect the search behaviour: it only controls what is displayed in the results table. Running a search with no checkboxes selected will still fetch the items from the catalog, although no results will be displayed.
Building search queries¶
Additional filters can be added by hitting the + button below the Data Model tree view. Doing so spawns dropdown menus to select a FactKind, a configurable (by default, the search is performed on the value of the facts) and a condition to satisfy. In the image above, two filters have been added: the first one sets a condition for the value of the Viscosity facts, while the second one imposes a condition on the Spin speed, a configurable of Thickness. After adding filters, the results can be updated by hitting Refresh again. Note that the fetched entities satisfy all the conditions given in the filters. Filters can be removed by hitting the - next to them.
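The way several filters combine can be sketched in a few lines of Python. This is only an illustration of the "all conditions must hold" rule, not the searcher's actual implementation: the entity names, the column labels and the threshold values are made up for the example.

```python
import operator

# Hypothetical stand-in for fetched entities: one dict per entity.
entities = [
    {"name": "sample-1", "Viscosity": 1.2, "Thickness | Spin speed": 3000},
    {"name": "sample-2", "Viscosity": 0.8, "Thickness | Spin speed": 2000},
    {"name": "sample-3", "Viscosity": 1.5, "Thickness | Spin speed": 4000},
]

# Each filter is (column, comparison, reference value); values are illustrative.
filters = [
    ("Viscosity", operator.gt, 1.0),
    ("Thickness | Spin speed", operator.le, 3000),
]

def matches_all(entity, filters):
    """Return True only if the entity satisfies every filter (logical AND)."""
    return all(
        column in entity and compare(entity[column], reference)
        for column, compare, reference in filters
    )

matching = [e["name"] for e in entities if matches_all(e, filters)]
print(matching)
```

Here sample-2 fails the first condition and sample-3 fails the second, so only sample-1 is kept. Adding a filter can therefore only narrow the result set, never widen it.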
Search results can also be filtered by right-clicking a single value in the results table and using the Add Filter or New Search menus. Filters added through the Add Filter menu are appended to the existing filters, while New Search also clears any existing filters.
Saving and loading search filters¶
Complex or regularly used filters can be saved to avoid having to build them from scratch every time. Clicking the Manage button brings up a dialog where the current set of filters can be saved or previously saved ones deleted.
Saved searches can be loaded via the Filter: dropdown menu in the main searcher pane.
Searches with older data models¶
By default, the search results refer to the latest revision of a data model. A previous revision can be selected by clicking the ... button next to the Model dropdown menu.
For the example below, we created a new revision of our Spin Coating Model by removing the Solvent Properties | Solvent Boiling Point FactKind. Differences between the latest and selected revision of the data model are shown in the bottom left table of this dialog.
The revision number gets displayed on the main searcher pane if the selected version of a data model is not the latest revision.
Searches can be performed as usual with the selected Data Model, though saved search filters can only be run against the specific revision used to build them.
The Searcher offers different options to inspect the retrieved data, available by right-clicking the results table as shown in the image below. When cells within the same row are selected, a View Entity action can be clicked to open the entity represented by this row in the Entity Editor. Columns can also be sorted or plotted in a graph (see here).
Exporting search results¶
The results table can be saved to a CSV file using the Export to CSV button. A Jupyter notebook prefilled with the code needed to retrieve the same results as in the application can also be created and opened using the Open in Jupyter button.
Plotting search results¶
The Searcher also provides basic 2D plotting capabilities for quick data visualization. Data can be plotted either by right-clicking selected columns in the results table to open the context menu, or by hitting the Show Plot button and selecting the columns to plot in the X-values and Y-values dropdowns. Two additional dropdowns allow the points in the scatter plot to be colored and sized depending on the values of other columns in the results table.
Precise values of the plot points can be accessed simply by hovering over them. Plots can also be used to quickly identify an entity: when selecting a point, the corresponding row in the results table gets selected as well, and vice versa.
As with search filters, it can be convenient to save the display settings for results tables if these end up being regularly re-used.
Column selections, custom names for columns and joins between columns (see here) can be saved as Data Tables, using the Save / Save As buttons below the search results table. Data Tables must be given a unique name when saved and are retrievable from the Table: dropdown menu.
Note that the available Data Tables depend on the selected Data Model: only Data Tables compatible with it are listed in the Table: dropdown.
Often, different data models will have properties representing the same thing. A common example of this in a lab would be labelling experimental samples by an identifier. If each experimental sample has two sets of measurements taken (with two different data models describing the types of measurement taken), we may want to group the measurements which share that common identifier, rather than treating the data as being totally separate.
In our results table, all the measurements for a sample would be on a single row, with a single column for the identifier.
Any columns can be linked in this way, as long as their data type is the same (numerical, text, boolean, etc.).
Configuring a data link¶
Here, we will configure a Data Link between two columns representing a Temperature, in two different data models. The first column is from our Spin Coating Model data model and corresponds to the Temperature configurable of the Viscosity FactKind, while the second corresponds to the Temperature configurable in an Example data model. To create such a Data Link, click the Create Data Link action in the context menu that appears when right-clicking the Viscosity FactKind.
Once a Data Link is configured, it is represented in the data model tree view by the link symbol. A Data Link can be created before or after a search has been performed, though search results will not update automatically. Hitting Refresh will update the results, showing the link between the two data models.
So, given the two following data sets for the Spin Coating Model and Example data models:
A search done with an inner Data Link returns:
Since this is an inner link, data from entities without a shared value across data models are dropped. Performing an outer link retains this data, but leaves missing data points blank. In our example, there are no matches for temperatures of 10˚C or 35˚C.
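The difference between the two link types can be sketched in plain Python. This is a hedged illustration of inner versus outer joining on a shared column, not the application's own code: `link_rows`, the column names and all values below are hypothetical, chosen only so that 10 and 35 have no partner, as in the example.

```python
def link_rows(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is "inner" or "outer"."""
    rows = []
    matched_right = set()
    for left_row in left:
        partners = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        for i in partners:
            rows.append({**left_row, **right[i]})
        matched_right.update(partners)
        if not partners and how == "outer":
            # Keep the unmatched row; the other model's columns stay blank.
            rows.append(dict(left_row))
    if how == "outer":
        # Keep right-hand rows that never found a partner.
        rows.extend(dict(r) for i, r in enumerate(right) if i not in matched_right)
    return rows

# Illustrative values only, not the data sets shown in the figures.
spin_coating = [
    {"temperature": 10, "viscosity": 2.4},
    {"temperature": 20, "viscosity": 1.9},
    {"temperature": 25, "viscosity": 1.7},
]
example = [
    {"temperature": 20, "reading": 0.31},
    {"temperature": 25, "reading": 0.28},
    {"temperature": 35, "reading": 0.22},
]

inner = link_rows(spin_coating, example, "temperature", how="inner")
outer = link_rows(spin_coating, example, "temperature", how="outer")

for row in outer:
    print(row["temperature"], row.get("viscosity", ""), row.get("reading", ""))
```

With these values the inner link keeps only the rows for 20 and 25, while the outer link also keeps 10 and 35 with the missing columns left empty.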
Created data links can be removed by right-clicking on the data model tree view to open a context menu.
Note: In our case, the data linking operation is performed on the data after retrieving the relevant entities. This means the number of results indicated in the data searcher tab may not match the number of entries shown!
Data Links with duplicate data¶
If a column being linked contains duplicate values, there is no longer a clear one-to-one mapping between the entities from one data model and the other. In such a situation, for a given value in the linked column, there will be as many rows as there are combinations of the two linked entities sharing this value.
For example, we edited the Example entities so two of them have a temperature value of 25˚C. In this case, since only one entity in the Spin Coating Model has a temperature of 25˚C, only two rows need to be created (see selected rows in the image below). Take a look at the Entity Name columns: for the Spin Coating Model, the two selected rows have the same value, corresponding to the same entity, whereas these values are distinct for the Example model.
It's worth keeping in mind that if many duplicate values exist, the number of combinations can potentially grow very large.
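To get a feel for how quickly the row count grows, the number of rows produced by an inner link can be estimated from the linked column values alone: each shared value contributes (occurrences on one side) × (occurrences on the other side) rows. The helper below is a hypothetical sketch written for this page, not part of the application.

```python
from collections import Counter

def linked_row_count(left_values, right_values):
    """Count the rows an inner link produces for two linked columns."""
    left_counts = Counter(left_values)
    right_counts = Counter(right_values)
    # A Counter returns 0 for values that are absent on the other side,
    # so unmatched values contribute no rows.
    return sum(n * right_counts[value] for value, n in left_counts.items())

# The case described above: one Spin Coating entity and two Example
# entities share the value 25.
print(linked_row_count([25], [25, 25]))

# 100 duplicates on each side already give 100 * 100 rows.
print(linked_row_count([25] * 100, [25] * 100))
```

The first call gives 2, matching the two rows described above; the second gives 10000, which shows why heavily duplicated columns are best filtered before linking.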