C.2.1. Ontologies¶
The _models that provide the structure and meaning for the information._
The capability Ontologies (C.2.1) is part of the capability area Data Architecture in the Data Pillar.
An ontology is a set of formal definitions for the key concepts that organize and structure an organization’s information. Ontologies provide a common denominator that allows information to be shared and communicated between different use cases and stakeholders, regardless of the different sources, structures and vocabulary they might use.
An ontology is about:
- explicit meanings and relationships; the terms used are less important
- a combination of definitions in both text and logic
An ontology can be the basis of, but is broader than:
- a taxonomy
- a vocabulary
- a data or object model
- a conceptual model
- a specific serialization format
Ontologies can be expressed at different levels of sophistication, with different scopes, and in a combination of languages. The basic structures include:
- individual: a representation of a business object or item which is the subject of information to be managed. An individual has a unique identity, for example Person X or Shipment Y. Many such individuals might represent the same real-world object.
- data value: a string, number or date which represents the data.
- property: a type of data point that may be associated with individuals. An individual, a property and a value (which may be a data value or another individual) form a triple, for example personX hasBirthDate D or personX hasMother Y. Triples whose value is another individual form relationships. Properties may have generalizations; for example, hasMother is a subProperty of hasParent.
- class: a category applied to individuals that determines what you can do with them, the properties you can expect to see, and the rules that might apply. An individual may be a member of many classes, and classes may have generalizations. Note that, unlike more traditional approaches, properties are independent of classes or class membership. For example, given the triple X hasMother Y you may be able to infer that both X and Y are members of the class Person, or at least Animal.
- ontology: a grouping of the above for management and identification purposes.
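The basic structures above can be illustrated with a minimal sketch in plain Python. It is not how any particular EKG platform or ontology language works: triples are modeled as tuples, and the rule tables, the `memberOf` property name and the birth date value are assumptions made up for illustration. It shows how a property generalization and a rule attached to a property (rather than to a class) allow class membership to be inferred.

```python
# Triples as (individual, property, value) tuples. All names are illustrative.
triples = {
    ("personX", "hasMother", "personY"),
    ("personX", "hasBirthDate", "1980-01-01"),  # hypothetical data value
}

# Property generalizations: hasMother is a subproperty of hasParent.
sub_property_of = {"hasMother": "hasParent"}

# Assumed rules attached to properties, independent of any class:
# using hasParent implies that both ends are members of Person.
implied_classes = {"hasParent": ("Person", "Person")}  # (individual, value)


def infer(triples, sub_property_of, implied_classes):
    """Return the input triples plus everything the two rule tables imply."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in sub_property_of:
                new.add((s, sub_property_of[p], o))
            if p in implied_classes:
                s_cls, o_cls = implied_classes[p]
                new.add((s, "memberOf", s_cls))  # "memberOf" is a made-up name
                new.add((o, "memberOf", o_cls))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


result = infer(triples, sub_property_of, implied_classes)
for triple in sorted(result):
    print(triple)
```

In practice a standard ontology language and a reasoner would take the place of the hand-written rule tables; the sketch only makes the idea of triples and inference concrete.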
- How much of the enterprise data is covered by ontologies?
- How well is ontology coverage mapped to business need?
- To what extent are concepts independent of but mapped to terminology/vocabulary?
- Level of sophistication of textual and logic definitions
- Level of tooling available and used
- Level of training and trained people
- Level of process (including change management), guidelines and standards
- Level of modularity and reuse --- internal and external
- Extent of examples and tests
- Extent of traceability with different logical and physical data models
The criteria listed for each level below are abbreviated. Each item is shorthand for:
- documented process
- trained participants
- implemented process and/or technology
- monitoring and improvement
Maturity Level 1¶
- Minimal ontologies which could be as simple as a list of classes and properties used in graphs
- Basic metadata (definition, label) for each class and property
- Each individual (in data) has at least one explicit class
- Ontology coverage for each use case in scope of the project; the project includes a minimal number of ontologies and classes not justified by a use case
- Definitions catalogued and under change management
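Two of the Level 1 criteria above (basic metadata for each class and property, and at least one explicit class per individual) lend themselves to a simple automated check. The following is a rough sketch only: the dictionary layout, the element names and the `check_level_1` helper are hypothetical and not part of any EKG tooling.

```python
# Hypothetical minimal ontology: a list of classes and properties with metadata.
ontology = {
    "Person": {"kind": "class", "label": "Person",
               "definition": "A human being."},
    "Shipment": {"kind": "class", "label": "Shipment",
                 "definition": ""},  # definition deliberately left empty
    "hasMother": {"kind": "property", "label": "has mother",
                  "definition": "Relates a person to their mother."},
}

# Individuals found in the data, with the classes explicitly assigned to them.
individuals = {"personX": ["Person"], "shipmentY": []}


def check_level_1(ontology, individuals):
    """Return a list of human-readable problems; an empty list means success."""
    problems = []
    for name, meta in sorted(ontology.items()):
        for field in ("label", "definition"):
            if not meta.get(field, "").strip():
                problems.append(f"{name}: missing {field}")
    classes = {n for n, m in ontology.items() if m["kind"] == "class"}
    for individual, assigned in sorted(individuals.items()):
        if not any(c in classes for c in assigned):
            problems.append(f"{individual}: no explicit class")
    return problems


problems = check_level_1(ontology, individuals)
for problem in problems:
    print(problem)
```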
Maturity Level 2¶
- Ontologies expressed in a standard ontology language (could be as simple as RDF Schema)
- Common (shared or mapped) concepts across EKG projects
- Ability to see ontology usage by use cases, vocabularies and datasets
- Namespace scheme established and used for new ontologies in the EKG
- Ontology guidelines in place and implemented, including common metadata
- Documented approach for external ontologies, including selection and adaptation
- Annotated example files for documentation and training
- Test files based on use cases covering all used ontology elements
- Ontology change management includes impact analysis and stakeholder approval
- Tooling for ontology diagrams and documentation
- Automated basic checking of ontology syntax
- Access to at least one trained Ontologist
Maturity Level 3¶
- Modeling of required data and constraints by use case, including for stored and communicated data
- Automated validation of ontologies (for guideline compliance, and for logical consistency), with results as triples
- Automated testing and validation of test data with ontologies (per use case)
- Separation of concerns to support enterprise management such as bi-temporality, transactions and events
- Automated transformation of ontologies to use common serialization and metadata
- Automated checking of ontologies against different profiles (e.g. OWL-RL) to check for technology support
- Automated checking of ontologies against different best practices
- Ontology source changes linked to automated operations for testing and deployment
- Impact analysis identifies ontology breaking changes which require fixes to existing EKG data
- EKG-wide ontology browsing and searching
- Follow-your-nose UI starting from any ontology element URI [1]
- Follow-your-nose API starting from any ontology element URI [1]
- Trained ontologist available to each project (ideally via the EKG Center of Excellence)
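The impact analysis criterion above can be sketched as a comparison between two versions of an ontology, followed by a lookup of the removed elements in existing data. This is a simplified illustration under stated assumptions: an ontology version is reduced to a set of element names, the `memberOf` property and the `breaking_changes` helper are invented for the example, and a real analysis would also need to consider changed definitions and constraints.

```python
# Hypothetical element names in two versions of the same ontology.
old_terms = {"Person", "Shipment", "hasMother", "hasParent"}
new_terms = {"Person", "Consignment", "hasMother", "hasParent"}

# Existing data, as (individual, property, value) triples.
data = [
    ("shipmentY", "memberOf", "Shipment"),
    ("personX", "hasMother", "personY"),
]


def breaking_changes(old_terms, new_terms, data):
    """Map each removed ontology element to the data triples that still use it."""
    removed = old_terms - new_terms  # added elements do not break existing data
    affected = {}
    for triple in data:
        _, prop, value = triple
        for term in (prop, value):
            if term in removed:
                affected.setdefault(term, []).append(triple)
    return affected


affected = breaking_changes(old_terms, new_terms, data)
for term, uses in affected.items():
    print(f"{term} was removed but is still used by {len(uses)} triple(s)")
```

Elements that were removed but never used in data do not show up in the result, which is what separates a breaking change from a harmless one in this sketch.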
Maturity Level 4¶
- Separation of ontologies from vocabularies, with multiple vocabularies for different communities mapped to the same concepts
- Ontology architecture management process, including use of patterns and modularity
- Generation of logic into business language
- Automated fixes to existing EKG data in response to ontology breaking changes
- Basic ontology metrics and reporting, including usage in data
- Generation of ontologies/shapes for external interchange
Maturity Level 5¶
- Sophisticated ontology metrics and reporting, including trends
- Matching and differencing of ontologies from different sources
- Automated matching of ontologies with vocabularies
- Generation of validation code for external interchange
- Wizard for developing ontologies from business questions
- Inducing of ontologies from instance data
Contribution to the EKG¶
Ontologies are the basis for Principle Meaning:
The meaning of every data point must be directly resolvable to a
machine-readable definition in verifiable formal logic.
The link to precise meaning mitigates the problems created by using the same word with multiple definitions, and the difficulty of expressing conceptual nuance in multiple informal sentences. In the other direction, from ontology to vocabulary, it should be possible to generate a business glossary directly from ontologies for a given scope. Since ontologies should capture the meaning of concepts applicable to an organization, or an even broader ecosystem, the choice of concepts to include in an EKG should be driven by business use cases. Different overlapping ontologies may be included and mapped to cover different relevant aspects. Likewise, it should be possible to generate models for more conventional tools from ontologies, and to map to them, by applying technology-specific rules.
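As a rough illustration of generating a business glossary from ontologies for a given scope, the sketch below selects the in-scope elements and lists their labels and textual definitions. The dictionary layout, the element names and the `generate_glossary` helper are assumptions made up for this example, not an EKG interface.

```python
# Hypothetical ontology elements with a label and a textual definition.
ontology = {
    "Person": {"label": "Person",
               "definition": "A human being."},
    "hasMother": {"label": "has mother",
                  "definition": "Relates a person to their mother."},
    "Shipment": {"label": "Shipment",
                 "definition": "A consignment of goods in transit."},
}


def generate_glossary(ontology, scope):
    """Return glossary lines for the in-scope elements, ordered by label."""
    entries = sorted(
        ((meta["label"], meta["definition"])
         for name, meta in ontology.items()
         if name in scope),
        key=lambda entry: entry[0].lower(),  # case-insensitive ordering
    )
    return [f"{label}: {definition}" for label, definition in entries]


glossary = generate_glossary(ontology, scope={"Person", "hasMother"})
for line in glossary:
    print(line)
```

The scope here is a plain set of element names; in an EKG it would more plausibly be derived from the use cases that reference those elements.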
Semantic modeling also eliminates the problem of hard-coding assumptions about the world into a single data model. While multiple ontologies may coexist, they can be mapped and connected to each other. In a mature environment, the data modeling process drives technology implementation by defining the detailed data structures and associated APIs. These components, along with functional code, are included as part of the testing suite within the EKG to facilitate rapid deployment.
Different types of external data models are not needed in an EKG, but they can be mapped to or generated. In fact, physical data modelers are a community with their own vocabulary.
Constraints/shapes for models are applied by context (use case)---there is no Single Version of the Truth (SVOT) for the EKG as a whole. Different ontologies may be used for different contexts and mapped to each other in the underlying knowledge graph.
Ontology elements are linked to by vocabularies and mapped to other data models and datasets to provide their meaning; and from Use Cases to provide their scope. These aspects are covered by those respective capabilities.
Contribution to the Enterprise¶
Ontologies are needed to truly understand what a given set of data really means and what can be inferred from it. For example, you cannot rely on the name of a column in a spreadsheet. A deceptively simple column name such as "number of European customers" leaves open the meaning of "European" and "customer", as well as the timing (when does one start and stop being a customer?). Different sources could also have different interpretations of that same name. The benefit of ontologies is consistency, accuracy and the ability to make sound business decisions. Having the models themselves be resources that can be looked up means that all data is self-defining and carries its meaning with it. In an EKG there is no fixed set of ontologies, so it can non-disruptively incorporate additional knowledge. Ontologies allow data to be understood independently of the format or technology and of the vocabulary used in different communities, avoiding misunderstandings and battles over which word to use.
Warning: work in progress. Describe how this capability is possibly being delivered today in a non-EKG context and, optionally, what the issues are that EKG could or should improve.
Warning: work in progress. Describe how this capability would be delivered or supported using an EKG approach, making the link to the "how", i.e. the EKG/Method.
Warning: work in progress. List examples of use cases that contribute to this capability, making the link to use cases in the catalog at https://catalog.ekgf.org/use-case/..