D.2.2. Data Integration

Ability to source and integrate data (consolidate silos).

The capability Data Integration (D.2.2) is part of the capability area Technology Execution in the Technology Pillar.

Data Integration within the Technology Pillar encompasses the organizational capability to seamlessly combine data from various sources into the Enterprise Knowledge Graph (EKG). It involves the processes, technologies, and practices required to efficiently integrate data and resolve identity challenges, including legacy and EKG identifiers.

Key aspects of Data Integration within the Technology Pillar include:

  1. Data Source Connectivity: Establishing connections and interfaces to retrieve data from diverse sources, including internal databases, external systems, APIs, structured data, unstructured documents, and more. This includes the integration of data with different identifier schemes.
  2. Identity Resolution: Resolving identity challenges, including legacy identifiers and EKG identifiers, to ensure consistency and accuracy within the EKG. This involves mapping and aligning identifiers across various data sources and resolving conflicts to achieve a unified and reliable representation of entities. See also D.2.1. Identifier Resolution.
  3. Data Transformation and Harmonization: Applying transformation and harmonization techniques to ensure data from different sources, including legacy systems, are integrated and reconciled effectively within the EKG. This includes mapping legacy identifiers to EKG identifiers and maintaining referential integrity.
  4. Data Quality and Governance: Implementing data quality measures and governance practices to address identity-related issues. This involves data profiling, deduplication, and data cleansing to improve the accuracy, completeness, and reliability of entity identities within the EKG.
  5. Semantic Interoperability: Achieving semantic interoperability by harmonizing the meaning and representation of entity identities across diverse data sources. This involves leveraging ontologies, controlled vocabularies, and semantic modeling techniques to ensure consistency and accurate identity representation within the EKG.
  6. Scalability and Performance: Ensuring the data integration processes, including identity resolution, can scale to handle increasing volumes of data and perform optimally within the EKG ecosystem. This may involve adopting scalable technologies, distributed processing frameworks, and optimizing workflows to address identity resolution challenges at scale.
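As an illustration of the identity resolution and identifier mapping items above, the following is a minimal sketch, not a prescribed EKG implementation. It assumes that records sharing any legacy identifier describe the same entity, groups them with a union-find structure, and mints one identifier per group. The record shape, the `EKG_NAMESPACE` value, and the choice of deterministic UUIDs are all hypothetical and chosen only for the example.

```python
import uuid
from collections import defaultdict

# Hypothetical namespace for minting EKG identifiers (an assumption, not from the source).
EKG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/ekg/id/")


def resolve_identities(records):
    """Map every (scheme, value) legacy identifier to one EKG identifier.

    `records` is a list of dicts with a "legacy_ids" mapping of
    identifier scheme -> value. Records that share any legacy identifier
    are treated as the same entity (a simplifying assumption).
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Merge records whenever a legacy identifier has been seen before.
    first_seen = {}
    for idx, record in enumerate(records):
        for scheme, value in record["legacy_ids"].items():
            key = (scheme, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)

    # Mint one identifier per group and map each legacy identifier to it.
    mapping = {}
    for members in groups.values():
        keys = sorted(
            {(s, v) for i in members for s, v in records[i]["legacy_ids"].items()}
        )
        name = "|".join(f"{s}:{v}" for s, v in keys)
        ekg_id = str(uuid.uuid5(EKG_NAMESPACE, name))
        for key in keys:
            mapping[key] = ekg_id
    return mapping


sample_records = [
    {"source": "crm", "legacy_ids": {"crm_id": "C-17", "tax_id": "99-123"}},
    {"source": "erp", "legacy_ids": {"erp_id": "E-42", "tax_id": "99-123"}},
    {"source": "erp", "legacy_ids": {"erp_id": "E-43"}},
]
sample_mapping = resolve_identities(sample_records)
```

In this sketch the CRM and ERP records end up under one identifier because they share a tax identifier, while the third record stays separate. A real deployment would likely persist minted identifiers rather than derive them from group membership, since the derived value changes when a group gains a new legacy identifier.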

By incorporating identity resolution into the Data Integration capability, organizations can effectively address the challenges associated with legacy and EKG identifiers. This enables a cohesive and accurate representation of entities within the EKG, ensuring data integrity and supporting reliable knowledge discovery, analytics, and decision-making processes.
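The data quality measures named among the key aspects (profiling, deduplication, and cleansing) can be sketched in their simplest form as follows. This is an illustrative example under stated assumptions: records are plain dictionaries, cleansing means whitespace and case normalization only, and duplicates are exact matches on the normalized key fields. The function names `normalize`, `profile`, and `deduplicate` are hypothetical.

```python
def normalize(value):
    """Cleanse a value: trim, collapse internal whitespace, lowercase."""
    return " ".join(str(value).split()).lower()


def profile(records, fields):
    """Return the completeness of each field as a share of non-empty values."""
    total = len(records)
    completeness = {}
    for name in fields:
        filled = sum(1 for r in records if r.get(name) not in (None, ""))
        completeness[name] = filled / total if total else 0.0
    return completeness


def deduplicate(records, key_fields):
    """Keep the first record per normalized key; return (unique, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = tuple(normalize(record.get(name) or "") for name in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


sample = [
    {"name": "Acme  Corp", "country": "US"},
    {"name": "acme corp ", "country": "US"},
    {"name": "Globex", "country": ""},
]
sample_profile = profile(sample, ["name", "country"])
sample_unique, sample_duplicates = deduplicate(sample, ["name", "country"])
```

Exact matching on normalized keys is the simplest possible rule. Fuzzy or probabilistic matching is a common alternative when source systems spell the same entity differently.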

Approach

  1. Use Case-driven Approach: Defining the use cases that drive data integration projects. Each use case represents a specific business objective or information need and is itself modeled within the EKG. By identifying these use cases upfront, organizations can establish a clear understanding of "the why" behind data integration efforts.
  2. Use Case Modeling and Linkage: Assigning an EKG identifier to each use case and establishing linkages between the use case and the relevant data sources, data products, and other related entities within the EKG. This ensures that everything associated with or serving a specific use case is properly connected and traceable.
  3. Alignment with Business Objectives: Ensuring that data integration projects are aligned with the broader business objectives and strategic initiatives of the organization. By explicitly linking data integration efforts to specific use cases, organizations can prioritize and allocate resources effectively, focusing on the most critical areas that drive value and address business needs.
  4. Continuous Refinement and Adaptation: Recognizing that use cases and their associated data integration projects may evolve over time. It is essential to adopt an iterative and agile approach, allowing for continuous refinement and adaptation of the EKG's data integration strategies based on changing business requirements and emerging use cases.
  5. Documentation and Communication: Documenting the rationale, scope, and objectives of each use case and data integration project, and effectively communicating this information to relevant stakeholders. This ensures a shared understanding of "the why" behind the data integration efforts and facilitates collaboration, decision-making, and alignment across the organization.

By adopting a use case-driven approach (see EKG Method) and clearly linking data integration projects to specific use cases within the EKG, organizations can establish a strong foundation for data integration initiatives. This enables focused, purposeful, and traceable data integration efforts, ensuring that the right data is integrated for the right reasons, driving value, and supporting informed decision-making and knowledge discovery within the organization.
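The linkage between use cases and the things that serve them can be pictured as a small set of triples. The sketch below is only an illustration: the `UseCaseLinks` class, the `serves` predicate, and the `ekg:` identifiers are hypothetical and do not come from the EKG Method. It shows two traceability questions: what serves a given use case, and which integration artifacts have no use case at all.

```python
SERVES = "serves"  # hypothetical predicate name, chosen for this example


class UseCaseLinks:
    """Tiny in-memory store of (subject, predicate, use case) triples."""

    def __init__(self):
        self.triples = set()

    def link(self, subject, use_case):
        """Record that a data source, data product, or pipeline serves a use case."""
        self.triples.add((subject, SERVES, use_case))

    def serving(self, use_case):
        """Return everything linked to the given use case, sorted by identifier."""
        return sorted(
            s for s, p, o in self.triples if p == SERVES and o == use_case
        )

    def orphans(self, subjects):
        """Return the subjects that are not linked to any use case."""
        linked = {s for s, p, _ in self.triples if p == SERVES}
        return sorted(set(subjects) - linked)


links = UseCaseLinks()
links.link("ekg:dataset/crm-accounts", "ekg:use-case/customer-360")
links.link("ekg:pipeline/erp-load", "ekg:use-case/customer-360")

served = links.serving("ekg:use-case/customer-360")
unjustified = links.orphans(["ekg:pipeline/erp-load", "ekg:pipeline/hr-export"])
```

Here `unjustified` contains only the pipeline that has no recorded "why", which is the kind of gap a use case-driven approach is meant to expose. In an actual EKG these links would live in the graph itself rather than in application memory.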

Warn

Work in progress

Warn

Work in progress. Describe the five levels of maturity for this Capability.

Warn

Work in progress. Describe how this capability is possibly being delivered today in a non-EKG context and, optionally, which issues an EKG could or should improve.

Warn

Work in progress. Describe how this capability would be delivered or supported using an EKG approach, making the link to the "how", i.e. the EKG/Method.

Warn

Work in progress, list examples of use cases that contribute to this capability, making the link to use cases in the catalog at https://catalog.ekgf.org/use-case/..
