Getting to know the ODM

Dictionary v2.2.3 Documentation v2.1.0

What are “Parts” and “Part Types”?

Within the ODM, every component or element of the model and the dictionary is called a “part”. Parts are grouped into “part types”.

Because the term “parts” refers to every component of the ODM, “part types” are a way to differentiate between parts which have different functions and structures. The three most important part types are:

  • Measures: A measurement or observation of any substance including a biological, physical or chemical substance.
  • Methods: A procedure for collecting a sample or performing a measure.
  • Attributes: A description of the who, where, when, and why of environmental surveillance.

See the parts reference document for more details.

The are additional part types to support the dictionary including:

  • Aggregations
  • Aggregation scales
  • Categories
  • Classes
  • Compartments
  • Dictionary support
  • Domains
  • Groups
  • Missingness
  • Nomenclature
  • Quality indicators
  • Specimens
  • Tables
  • Units

For information and details about all the part types, please see the parts reference document.

“Sets” and the rationale behind sets

“Sets” are a way of grouping together different possible categorical inputs within the ODM. The purpose behind sets is to group options together so that users only need to check a few options from a drop-down menu, rather than scroll through a long list. Sets are also designed so that a single part can be used in multiple sets, which avoids the need to create multiple versions of this part for these different use contexts. For example, if you’re taking a measure of concentration, that measure will likely populate the unit choice drop-down list with units from the “standard concentration unit set”. Units in this set include Milligrams per litre, parts per million, and Percent Primary Sludge. Similarly, if one were measuring the amount of oxygen in a wastewater sample they would be offered units from the “dissolved gas unit set”. The units in this set include parts per million, and Milligrams per litre. You see here how Milligrams per litre is used twice across the sets, but exists still as a single part in the parts list. Unit sets are only one type of set, with the others being:

  • Aggregation sets
  • Compartment sets
  • Quality sets
  • Specimen sets

Details about these sets are their similar - but unique - applications can be found in the parts reference document. Importantly, “category sets” are not considered a “set” like these others. See the section below for an explanation of this difference.

Why category sets are treated differently

One of the reasons why “sets” are set up and organized the way that they are, is to allow for the re-use of parts in multiple sets. Category sets are, however, an exception to this rule because the categories that make up a category set are used in only one set. Furthermore, the general sets are needed for almost every measure or method (which require unit, aggregation, quality, and compartment information), while category sets are used only in specific circumstances for specific fields. These are things such as collection metadata (example: Sample collection category set), dictionary metadata and model architecture (example: Data Type Category Set), or more detailed types of certain methods (example: Nucleic Acid Extraction Category Set). Category sets are also a distinct part type, and more information about them can be found in the parts reference document.

The “what” and “why” of Specimen IDs

Specimen IDs indicate the level at which a measure is being done. This can be: a site measure, ie. a measure at a site such as temperature or weather; a sample measure, ie. a laboratory measurement on a sample, such as gene copies of SARS-CoV-2 per mL; a person measure, ie. a measure of something at the level of an individual person, such as a blood pressure reading; or a population measure, ie. an aggregate measure of a population, such as the number of confirmed cases of a given illness. While there is currently nothing in version 2.0 of the ODM with uses the person specimen ID explicitly, it has been included to allow for the possibility of storing individual-level data in a future version. The reason specimen IDs were created was so that there could be a single manner in which measures and methods are recorded, regardless of the level at which they were performed, while still maintaining the possibility to collect site, sample, and population measures as a distinct types of metadata.

The “what” and “why” of Groups and Classes

Groups and classes, similar to sets, are ways of grouping together different measures within the ODM. Given that the ODM aims to be as robust as possible and has a very long list of possible measures, groups and classes were designed to give shorter lists of measures in the drop down lists by specifying details about the kind of measure that a user is recording. Groups and classes can work together to further specify what kinds of measures are being reported. For example, the group sarsCov2 contains many measures, but by specifying that the class is an allele or variant, the list of possible measures is pared down. Alternatively, class can be said to be non-applicable, paring down the measures in the sarsCov2 group to only be unspecified measures of the quantity of the virus.

Tables and table types within the ODM

Within the ODM there are three types of tables: program description tables, results tables, and look-up tables. These table types exist to differentiate the function of these tables and to highlight these differences to users. The different tables take different types of inputs and maintenance from users, so understanding the differences can be important.

Program description tables

Program description tables (represented in yellow in the ERD) are tables used to record metadata on the organizations, locations, methods, and appurtenance. These tables help to describe surveillance and testing programs, and are intended to be updated infrequently.

Results tables

Results tables (represented in blue in the ERD) are the tables used to record details on samples and measures. These tables record the main outcomes data and are updated daily, if not more frequently.

Look-up tables

Look-up tables (represented in green in the ERD) are the tables that are pre-programmed and pre-populated in the ODM. These hold information on sets, all parts, languages, and translation abilities. These are only updated by the ODM team in version updates.

Measures, Methods, and Attributes: Key parts

While there are many part types, there are three main part types users should be most familiar with: measures, methods, and attributes. These three have parallels with the three table types, and the differences and details of these parts are useful to understand.

Measures

Measures are actually types of measures that can be performed. These can range from temperature to the number of gene copies in a sample. The measure, or measureID, can be selected from a drop down in the templates and it specifies the kind of measure you intend to record. The actual value of the measure is then recorded in the value field of the measures table, with units and aggregation specified in the unitID and aggregationID fields.

Methods

Similar to measures, methods are types of methods that can be performed to accomplish a measurement. These can be diverse, ranging from incubation, qPCR, or nucleic acid extraction. The method itself (methodID) is selected from a drop down in the templates, specifying in general terms the kind of method the user wishes to record. From there, the value field of the methodSteps table can be populated by one of the inputs from the category set associated with that methodID. This provides a higher level of detail for a given method. For example, the methodID might be solidSep for solid separation of a sample. The value field might then be populated with cent for centrifugation, indicating more details about how the sample settling was done.

Attributes

Attributes are the largest category of part type, as these refer to most of the fields in the ODM. Attributes are fields for metadata within the ODM and range greatly in the use. They comprise everything from collection dates, to sampling period, to names.

Data quality and reportability

Within the ODM there is a qualityFlag field present in both the samples and measures tables. This field serves to highlight whether or not there is a quality issues with the sample or the measure. The quality flag also allows a user to specify the type of quality issue. This is managed through the use of quality sets, which are the sets that contain the possible quality flags for a given measure or for samples. The idea is that this provides data on any issues with a sample and an indication of the nature of that issue. Having data about the nature and presence of a quality issue is often not sufficient for decision makers who are trying to use and interpret the data. As such, there is also a reportable field which is a Boolean indicator of whether or not data can or should be reported or included in final reports and decisions.

Time periods for samples and measures

For measures, there is a field aDateStart and aDateEnd which specify the date and time that an analysis was begun and finished. This allows for the recording of greater detail around timelines for especially long, multi-day analyses. For shorter analyses, which will likely make up the bulk of reported measures, the same date can be inputted into both fields. The idea is to have the reporting tables for measures be as robust as possible to allow for various kinds of timelines.

For samples, there is collDT, collDTStart, and collDTEnd. The first field is for the collection date and time of a single grab sample, so start and end are not necessary and the other two fields can be left blank. For composite or pooled sample, the start and end date and time for collection is crucial information to know. When these two fields are populated, the collDT field can be left blank.

Dates within the ODM

There are a number of date fields within the ODM which all serve different purposes. Collection datetime (collDT) is the date a sample was collected, used only for grab samples. This field is left blank if the collection datetime start and end fields (collDTstart and collDTEnd) have been populated instead. Inversely, collection datetime start and end fields should be left blank if the collection datetime filed has been populated. Similarly, analysis datetime start and analysis datetime end (aDateStart and aDateEnd) are used to report the date and or timeline of an analysis for a measure.

Date fields that are more related to data processing and labratory infrastructure are: the last edited date (lastEdited) which indicates the last time a table, or a measure or sample details, were last edited or updated; the sent date (sentDate), or date that a sample was sent to the lab from the field; the received date (recDate), or the date the sample was recieved in the lab; and the report date (repDate), or the date that the analysis results or measures were reported. These give a greater indication of how up to date data is, but also on the speed and efficiency of the pipeline between sampling and reported results. Dates should be reported in day/month/year format to accommodate the most popular global convention.

Translation and language capabilities

The default language of the ODM is English, but French translations of all descriptive elements of the data model dictionary are also available. As additional nation states and partners adopt the ODM, we anticipate that these fields will be translated into other languages as well. The translation capacities are managed through the language look-up table (languageLUs), the translation look-up table (translationLUs), and the parts table (partsLUs). When a translation for a given part is not available, the dictionary will default to the English term. The language look-up table stores linguistic and classification codes for spoken human languages, with the most recent ISO639 code being the language ID (languageID). In the translation table, the language ID is paired with every part ID (partID), along with the label (partLabel), description (partDesc), and instructions (partInstr) translated to that language. The part IDs are linked to the full parts list which otherwise contains metadata that is coded in variables and requires no further translation.

Recording protocols

When analyzing data from various sources, it is important to understand how the data was produced. The PHES-ODM recognizes this and includes protocols as one of its entities. To store protocols efficiently in database tables, several challenges must be addressed:

  1. Protocols are comprised of a collection of steps.
  2. Protocol steps can consist of two entries:
    • They can prescribe the use of a specific quantity of something. This can be described using the different measures found in the PHES-ODM dictionary and assigning them a prescribed value and unit of measurement.

    • They can prescribe an action. The PHES-ODM calls this a method.

  3. Protocol steps must be organised in the right order to convey the meaning of the protocol.
  4. Steps may follow each other, or they can be done concurrently.
  5. Measures found inside a protocol specify the quantity of reagents, supplies and conditions involved in the realization of a method.
  6. Different protocols may use some of the same steps, but in a different order.
  7. Protocols may have steps that consist in one or several (sub)protocols.
  8. Protocols can be updated with new versions, or they can reference and build from other protocols.

To solve these constraints, the PHES-ODM stores a protocol (e.g., what it does, who developed it, when it was developed) separately from its constituent steps. These protocol steps are stored in the Protocol Steps table. The entries in the steps table are either methods or measures.

The protocol steps are linked together in their own Protocol Relationships table. Rows in the relationship table have four main attributes. The first is the protocol identifier. The three other attributes define a relationship between two steps in the protocol. The relationship is expressed in the form:

subject → relationship → object

Where the subject and the object can either be a protocol step or subprotocol. The available relationships (e.g., is_before, specifies, is_concurrent_with, etc.) allows one to organize the protocol in a more semantically meaningful way than by simply using a sequential order.

This flexible structure allows protocol steps and subprotocols to be reused in any number of protocols. The relationships between protocols, protocol orderings, and protocol steps are shown in Figure A.

Figure A: Main entities and relationships used to represent protocols

Questions and online community

If any users have additional questions or issues with the ODM, we invite them to check out our Discourse Page for discussion boards and community support. For larger issues, and to ask the ODM team to add additional fields or variables into the model, we encourage users to visit the project’s GitHub repository and to create and issue there so that a team member can respond.