C.3.1. Business Rules¶
Strategy and approach for managing data quality.
The capability Business Rules (C.3.1) is part of the capability area Data Quality in the Data Pillar.
Data quality is a measure of the degree to which a dataset is fit for its intended purpose. It is based on an understanding of application requirements and derived by reverse-engineering the data production process. A data quality framework is an agreed methodology that includes operational controls, governance processes, and measurement mechanisms. The framework is designed to support organizational priorities for data quality based on criticality and business value.
Data Quality business rules ensure that the data is fit for its intended purpose. Subject matter experts (SMEs) specify the criteria used to validate and enforce data integrity. The criteria are translated into agreed specifications (i.e. business rules), which are later codified for data profiling or conformity measurement. Data quality business rules can also be embedded into data capture systems to ensure validity at source.
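As a minimal sketch of how such a specification might be codified (the rule identifiers, field names, and sample records below are purely illustrative and not part of the maturity model), a business rule can be captured declaratively and then applied to records to measure conformity:

```python
# Illustrative only: a business rule captured as a declarative specification and
# codified as a check that can be run for data profiling / conformity measurement.
from dataclasses import dataclass
from typing import Callable

@dataclass
class BusinessRule:
    rule_id: str                    # agreed identifier from the rule specification
    description: str                # the SME-specified criterion in plain language
    check: Callable[[dict], bool]   # the codified form, applied per record

# Hypothetical rules for a trade dataset (ISO dates compare correctly as strings).
RULES = [
    BusinessRule("DQ-001",
                 "settlement_date must be on or after trade_date",
                 lambda r: r["settlement_date"] >= r["trade_date"]),
    BusinessRule("DQ-002",
                 "notional must be a positive amount",
                 lambda r: r["notional"] > 0),
]

def conformity(records: list[dict], rule: BusinessRule) -> float:
    """Share of records that satisfy the rule (a simple conformity measure)."""
    return sum(1 for r in records if rule.check(r)) / len(records) if records else 1.0

sample = [
    {"trade_date": "2024-01-10", "settlement_date": "2024-01-12", "notional": 1_000_000},
    {"trade_date": "2024-01-10", "settlement_date": "2024-01-09", "notional": -5},
]
for rule in RULES:
    print(rule.rule_id, f"{conformity(sample, rule):.0%}", rule.description)
```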
Warn
Work in progress
- Are the data integration activities, their systems, repositories, and connections known and tracked?
- Are data integration activities linked to the data inventory, business glossaries, and data models?
- Are all data integration input and output datasets documented, tracked, and governed? (See the sketch after this list.)
- Are there reusable standards and defined business rules for performing data integration?
- Are data integration patterns, tools, and technologies defined, governed, and used?
- Has the firm established a central data integration function (i.e. an integration Center of Excellence) to manage ETL across both internal and external data pipelines?
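The sketch below (all system, dataset, and glossary names are hypothetical) illustrates how a registry of integration activities can make these questions answerable from metadata rather than tribal knowledge:

```python
# Illustrative only: recording a data integration activity as metadata, linked to
# its input/output datasets, business glossary terms, and reusable business rules.
from dataclasses import dataclass, field

@dataclass
class IntegrationActivity:
    activity_id: str
    description: str
    source_system: str                                        # where the data is read from
    target_system: str                                        # where the data is written to
    input_datasets: list[str] = field(default_factory=list)
    output_datasets: list[str] = field(default_factory=list)
    glossary_terms: list[str] = field(default_factory=list)   # link to the business glossary
    business_rules: list[str] = field(default_factory=list)   # reusable, agreed rules

registry = [
    IntegrationActivity(
        activity_id="ETL-042",
        description="Load counterparty reference data into the reporting mart",
        source_system="crm-master",
        target_system="reporting-mart",
        input_datasets=["crm.counterparty"],
        output_datasets=["mart.dim_counterparty"],
        glossary_terms=["Counterparty", "Legal Entity Identifier"],
        business_rules=["DQ-001"],
    ),
]

# "Is every output dataset governed?" becomes a query over the registry.
governed = {"mart.dim_counterparty"}  # datasets under formal governance (illustrative)
for activity in registry:
    ungoverned = [d for d in activity.output_datasets if d not in governed]
    print(activity.activity_id, "ungoverned outputs:", ungoverned or "none")
```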
The goal is not always a single source of data, but rather the ability to choose the right authoritative source for the appropriate context.
Maturity Level 1¶
- All data sources are identified and documented for in-scope use cases
- Do we know the authoritative source for each dataset? (It should not be possible to perform integration without using an approved authoritative source.)
- Does everyone agree that we are using the right sources (the right source for every context)? This links to governance.
- Do we have an approved list of what each source feeds? This requires a precise, entity-level description of what we can obtain from an approved source, and we must know whether it is the primary source of the data in the use-case context: "For any given entity, do I have all the potential sources, and for a specific context do I know which is authorized?"
- There is a defined governance process for change management and testing (a clear picture of all the dependencies for data integration). If an authoritative source changes, do we know the downstream implications, and are they tracked and tested? (See the sketch after this list.)
- Are all technology stacks known and supported by current teams? All key systems should be under the management and governance of the organization; there should be no ghost systems that are not controlled as part of the integration process.
- Entitlement policies and classification rules (i.e. security, PII, business sensitive) are defined and verified
- Data Quality requirements are defined, documented, and verified
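The sketch below illustrates the ideas above: an approved-source registry keyed by entity and context, plus a dependency map for assessing downstream impact when an authoritative source changes. All entity, context, and system names are hypothetical.

```python
# Illustrative only: approved authoritative sources per (entity, context), plus
# the downstream systems that depend on each source.
AUTHORITATIVE_SOURCES = {
    ("Counterparty", "regulatory-reporting"): "crm-master",
    ("Counterparty", "risk-analytics"):       "risk-datamart",
    ("Trade",        "regulatory-reporting"): "trade-capture",
}

# Which downstream systems consume each source (integration dependencies).
DOWNSTREAM = {
    "crm-master":    ["reporting-mart", "client-onboarding"],
    "trade-capture": ["reporting-mart", "pnl-engine"],
}

def approved_source(entity: str, context: str) -> str:
    """Integration should not proceed without an approved authoritative source."""
    key = (entity, context)
    if key not in AUTHORITATIVE_SOURCES:
        raise LookupError(f"No approved authoritative source registered for {key}")
    return AUTHORITATIVE_SOURCES[key]

def impacted_systems(source: str) -> list[str]:
    """Systems to re-test if the authoritative source changes."""
    return DOWNSTREAM.get(source, [])

print(approved_source("Counterparty", "regulatory-reporting"))  # crm-master
print(impacted_systems("crm-master"))                           # systems to re-test
```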
Maturity Level 2¶
- All of the information above is identified, precisely defined, and on-boarded into the knowledge graph
- The organization is able to do datapoint-level lineage (a detailed and complete view of the data integration landscape)
- Start making the EKG the central point for data integration (the EKG becomes the Rosetta stone of integration): on-board systems, convert them to RDF, and integrate them into the EKG. This is defined as the data integration strategy, even if it is not yet complete (see the sketch after this list).
- All datasets that are on-boarded into the EKG come from the authoritative sources; there are no man-in-the-middle systems. The goal is to go directly from the authoritative source to the target system for in-scope use cases, obtaining the most granular data directly from the authoritative sources.
- All datasets are "self-describing datasets" (SDDs).
- Policy: all data is obtained from the EKG as the authoritative source; do not go directly to the originating source of the data.
- Entitlement policies and classification requirements are on-boarded into the EKG
- Data quality business rules are on-boarded into the EKG
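The sketch below, which assumes the rdflib Python library, shows what on-boarding a dataset into the EKG could look like: the dataset described in RDF together with its authoritative source, a classification tag, and an attached data quality rule. The `ex:` namespace and its class and property names are hypothetical illustrations, not EKGF-defined terms.

```python
# Illustrative only: on-boarding a dataset into the EKG as RDF, making it
# self-describing and linking it to its authoritative source, classification,
# and a data quality business rule. Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF, RDFS

EX = Namespace("https://example.org/ekg/")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)
g.bind("dcterms", DCTERMS)

dataset = EX["dataset/counterparty-reference"]
g.add((dataset, RDF.type, EX.SelfDescribingDataset))
g.add((dataset, DCTERMS.title, Literal("Counterparty reference data")))
g.add((dataset, DCTERMS.source, EX["system/crm-master"]))   # authoritative source
g.add((dataset, EX.classification, Literal("PII")))         # entitlement/classification tag
g.add((dataset, EX.conformsToRule, EX["rule/DQ-001"]))      # data quality business rule

rule = EX["rule/DQ-001"]
g.add((rule, RDF.type, EX.DataQualityRule))
g.add((rule, RDFS.label, Literal("settlement_date must be on or after trade_date")))

print(g.serialize(format="turtle"))   # rdflib 6+ returns a str
```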
Maturity Level 3¶
- Data is precisely defined at a granular level, expressed as formal ontologies, and on-boarded into the EKG
- All data flows are modeled, defined, and registered in the EKG (full lineage in the EKG for all in-scope use cases or applications); see the sketch after this list
- Start to make the EKG the authoritative source (setting up to facilitate the decommissioning of systems). The EKG is structured to become the “new” system for in-scope applications as soon as all connections emanate from the EKG.
- Entitlements are automatically managed and enforced
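The sketch below, again assuming rdflib, illustrates data flows registered in the EKG and full upstream lineage obtained with a transitive SPARQL property path. The dataset names and the `ex:derivedFrom` property are hypothetical.

```python
# Illustrative only: data flows registered in the EKG as "derivedFrom" triples,
# and full upstream lineage retrieved with a SPARQL 1.1 property path ("+").
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/ekg/")
g = Graph()
g.bind("ex", EX)

# Register a two-hop data flow: report <- reporting mart <- source-system extract.
g.add((EX.RegulatoryReport, EX.derivedFrom, EX.CounterpartyMart))
g.add((EX.CounterpartyMart, EX.derivedFrom, EX.CrmCounterpartyExtract))

results = g.query(
    """
    PREFIX ex: <https://example.org/ekg/>
    SELECT ?upstream WHERE {
        ex:RegulatoryReport ex:derivedFrom+ ?upstream .
    }
    """
)
for row in results:
    print(row.upstream)   # every upstream dataset in the lineage
```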
Maturity Level 4¶
- Policy: all downstream client systems use authoritative sources as the only source of information for in-scope datasets (the EKG sits in the middle of all data flows); see the sketch after this list
- All “cottage industry” systems are replaced by the EKG, and the EKG is able to meet all the requirements of any system it replaces: reporting, entitlements, and quality control
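The sketch below illustrates how such a policy could be verified against a registry of data flows; the flow records and system names are hypothetical.

```python
# Illustrative only: check that, for in-scope datasets, every downstream consumer
# reads from the EKG rather than from an originating system.
FLOWS = [
    # (dataset, provider system, consumer system)
    ("counterparty-reference", "ekg", "reporting-mart"),
    ("counterparty-reference", "crm-master", "client-onboarding"),  # policy violation
    ("trade-events", "ekg", "pnl-engine"),
]
IN_SCOPE = {"counterparty-reference", "trade-events"}

violations = [
    (dataset, provider, consumer)
    for dataset, provider, consumer in FLOWS
    if dataset in IN_SCOPE and provider != "ekg"
]
for dataset, provider, consumer in violations:
    print(f"{consumer} reads {dataset} from {provider}; it should read from the EKG")
```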
Maturity Level 5¶
No further requirements.
Value¶
Warn
Work in progress. Explain how the EKG contributes value and how this capability or capability area enables higher levels of maturity for the EKG (which in turn provides more value to the business).
Traditional Approach¶
Warn
Work in progress. Explain how things are done today in a non-EKG context.
EKG Approach¶
Warn
Work in progress. Explain what the given Capability or Capability Area would look like in a mature EKG context.
Use Cases¶
Warn
Work in progress. List examples of use cases that contribute to this capability, making the link to use cases in the catalog at https://catalog.ekgf.org/use-case/.