Levels
The goal is not always a single source of data - but rather the ability to choose the right authoritative source for the appropriate context.
Maturity Level 1¶
- All data sources are identified and documented for in-scope use cases
- Do we know the authoritative source for each data set (should not be able to do integration without using approved authoritative source)
- Does everyone agree that we are using the right sources (the right source for every context) --- link to governance
- Do we have an approved list of what each source feeds (precise description at the entity level that
we can get from an approved source---must know if this is the primary source of the data per the
use case context).
"For any given entity do I have all the potential sources and for a specific context do I know which is authorized." - There is a defined governance process for change management and testing (clear picture of all the dependencies for data integration). If there are changes to authoritative sources---do we know the downstream implications (tracked and tested)
- Are all technology-stacks known and supported by current teams (are all key systems under the management and governance of the organization---should not have ghost systems that are not controlled as part of the integration process)
- Entitlement policies and classification rules (i.e. security, PII, business sensitive) are defined and verified
- Data Quality requirements are defined, documented, and verified
Maturity Level 2¶
- All information (above) are identified, precisely defined, and on-boarded into the knowledge graph
- Able to do datapoint lineage (detailed and complete view of the data integration landscape)
- Start making the EKG the central point for data integration (the EKG becomes the Rosetta stone of integration)---onboard systems, convert to RDF, integrate into EKG (defined as the data integration strategy---not necessarily complete)
- All data sets that are on-boarded into the EKG are coming from the authoritative sources. There are no man-in-the-middle systems. The goal is direct from the authoritative source to the target system for in-scope use cases. Must get the most granular data directly from the authoritative sources.
- All datasets are "self-describing datasets" (SDDs).1
- Policy---All data is obtained from the EKG as the authoritative source. Do not go directly to the originating source of the data.
- Entitlement policies and classification requirements are on-boarded into the EKG
- Data quality business rules are on-boarded into the EKG
Maturity Level 3¶
- Data is precisely defined (granular level)---expressed as formal ontologies---and on-boarded into the EKG
- All data flows are modeled, defined, and registered in the EKG (full lineage in the EKG for all in-scope use cases or applications)
- Start to make the EKG the authoritative source (set-up to facilitate decommissioning of systems). The EKG is structured to become the “new” system for in-scope applications (as soon as all connections emanate from the EKG).
- Entitlements are automatically managed and enforced
Maturity Level 4¶
- Policy---All downstream client systems are using authoritative sources as the only source of information for in-scope datasets (EKG is in the middle of all data flows)
- All “cottage industry systems” are replaced by the EKG (and EKG is able to perform all the requirements of any system it replaced---reporting, entitlement, quality control)
Maturity Level 5¶
No further requirements.