Master Data Management
The MDM Institute defines Data Governance as “the formal orchestration of people, process, and technology
to enable an organization to leverage data as an enterprise asset.” Yet, despite the critical
importance of Data Governance to an organization’s success, little support has been available from
data product vendors with their legacy tools. This tool gap impedes even the most aggressive
Data Governance initiatives as was noted by the MDM Institute in their
Enterprise Master Data Management market review: “In fact, most vendors will point to their
data steward console as the acme of their data governance capabilities. In reality, what’s needed are
formal processes, assisted by workflow software, to enable formalized decision making, documentation,
and delegation, regarding the rules rendered as part of the governance lifecycle. Another gaping hole
in data governance capabilities of the majority of MDM vendors is their inability to directly store and
execute such governance-generated procedures as part of the MDM logic that controls the software which
in turn should enforce the governance. True data governance mandates the integration of people,
process, and technologies via a formalized framework. These formal structures are inevitable as they
are the key enablers of data governance policy functions much more so than paper-based methodologies
and accelerator/frameworks.”
TECHi2 designed and built a high-performance innovative Enterprise Data Environment (EDE)
as a fully integrated set of tools, processes, and specifications that process and manage
data over a full lifecycle of data collection, transformation, integration, unification, and
publishing in its SOA framework. Data quality is built into the EDE at every level and every
step of processing with direct traceability of technical functions to governance defined business
processes and rules. It is built entirely with open architecture tools, methods, and specifications
using modern proven high performance techniques for web clients, web services, application
services, database server, metadata registries, metadata files, and secure data handling. It
uses our new innovative technology, which provides a single integrated semantic framework with a high
performance data integration and cleansing engine, all tied together by a single managed set of
data specifications. The KORS™ semantic unification framework seamlessly organizes and combines the
key knowledge, concepts, rules, metadata, and specifications of an organization into a scalable unified
data and metadata system. Never before have the separate and complicated technologies of knowledge
engineering, ontologies, rules, metadata, data models, and high-performance computing been brought together
in a practical engineering solution.
We drew on our in-depth knowledge of both the technologies used in legacy systems and those newly available
for SOA, metadata, and data engineering. The challenges of low data quality and expensive disjointed
systems are not new; indeed, the DoD and industry have attempted to solve them with technology
advances in databases, applications, and networks for decades. However, these attempts
failed to realize their potential because they made a critical error. They relied on technology alone
to solve what is inherently an integrated governance, process, and technology problem.
TECHi2 recognized this critical failure through our years of service not only as system and
EA developers but also as technical experts reviewing systems in many Federal agencies. We also
drew on our extensive expertise in the technologies themselves, which comes from key personnel who
were scientists in these technology areas and who worked with the leading research sponsors
on these technologies as they were being matured. This afforded us broad and deep knowledge
of a new arsenal of technologies based on open standards and modular extensible methods that
offer a good opportunity to solve the data quality and sharing problem in an affordable,
maintainable framework that can evolve and extend as new requirements, technologies, and processes
inevitably emerge.
The operational overview is shown in the figure as an Enterprise Architecture (DODAF OV-1) diagram, which
highlights the SOA architecture but also shows the critical design strategy of linking technology,
process, and governance throughout the EDE. Indeed, the EDE name was chosen to emphasize that it
is not just technology. That is, it is an environment of integrated but modular technology tools
operating according to authoritative rules specified by formally decreed organizational governance
and business authorities. This is most clearly seen in the EDE’s explicit data QA/QC component
in the OV-1, which acts as a quality gateway.
Key aspects of this technology are:
- Governance based: Detailed guidance is collected from published documents and by working
with governance groups following the EDE Data Unification process. These groups include the
decreed authorities in each business domain as well as higher level organizational authorities.
This guidance is converted into a set of operational rules that are documented in EA views
and EDE specifications such as code encyclopedias and data dictionaries.
All subsequent EDE work is traceable and aligned to these rules.
- Standards based: Only industry and Government standard techniques and tools are used in the
EDE open architecture. These standards include: XML files, XML schemas,
JavaScript Object Notation (JSON) data format, DHTML, HTTP and HTTPS protocols,
Digital Object Identifiers (DOI), metadata schema, FIPS 140 encryption, and many others.
- Conforms to Government and industry policies and guidance: EDE follows relevant policies for
data sharing, XML Naming and Design Rules, Security Technical Implementation Guides (STIG),
Security Reference Architecture, Protection of Sensitive Agency Information, and others
as directed by governance groups.
- Open architecture: The EDE software architecture is built in modules and services,
with internal functions using the same methods for data access that are exposed to
external users. This allows additions and changes to methods through individual modules
without complicated changes to the entire system. Even different programming languages can be
used for different modules since they are linked in their compiled form, thereby
allowing seamless integration of C#, Java, and other languages. The data and
metadata use standard storage and I/O methods (e.g., databases, XML files),
with the details of their access hidden from service customers so that they do not need
to tie their code to the physical structure of EDE components. EDE functions are accessed through
a service using an API that follows industry standards for object-based software methods.
- Scalable: EDE uses scalable techniques throughout and has been designed for high
performance distributed web operations. EDE uses low-overhead database commands (i.e., no complicated SQL predicates)
and file commands to save and retrieve data/metadata, yielding very high data I/O throughput,
which is critical in high performance transactional systems. In measurements,
a large multi-megabyte data load was processed round trip from web client to database and back
to web client in 200 msec. The transactional database connection time is minimized and
the physical database connections are pooled per industry best practice, ensuring that a high
speed conduit is always available for a transaction without the slow opening and closing of
network connections each time. Additionally, the web client request is transmitted using an
AJAX callback which minimizes both the data load between the web client and server and the
web page refresh time. For web scalability, the number of calls to the server is reduced
to the minimum necessary to retrieve or save data by performing application logic on the
web client itself with JavaScript application functions, which are programmed with
high performance DHTML methods.
- Unified data semantics: The TECHi2 EDE technology uses a data framework that
semantically integrates an organization’s core knowledge into a unified data model
with very little design and development cost through ontology templates that allow rapid
adaptation to a specific organization and application domain. This is a major technology
advance over standard data modeling methods that require extensive manual effort and typically
cannot scale to a large organization’s need to unite many disparate but equally important
functional perspectives. In fact, this is the single greatest source of failure in attempting to
use older data technology in a SOA architectural approach. The conceptual model is mapped to
the EDE logical data model which is an object based model using an entity-attribute-value (EAV)
object structure providing repeatable, consistent, flexible, extensible, and comprehensive coverage
of data requirements with direct traceability to governance rules and the conceptual model.
- Secure: We built multiple security mechanisms into EDE in all three tiers
(web client, application server, data server) and in network transmissions. All users are assigned
one or more domain based roles that control their access and application privileges per
business domain per application function, such as for workflow processing of authoritative
data artifacts (e.g. data dictionaries). The application and database servers require
encrypted (FIPS 140) transactional credentials placed in the web service SOAP header or
protected server memory. Sensitive data, including all PII, is stored encrypted and
decrypted only on the server after all access control gates have been passed, avoiding the
risk of client-side hacking of credentials. Network transmission is secured with an SSL connection.
- Metadata repository/registry: The Integrated Metadata Repository (IMR) stores both rich semantic metadata in XML files, as required by policy and industry specifications for
some types of content objects, and data information and artifacts like data dictionaries, code
encyclopedias, and workflow status. The IMR conforms to standards. This metadata is accessible
through a data access service method and can be viewed and edited in a portal.
- Light-weight application services: The EDE design enables small but highly functional
application modules offered as services. This builds on the foundation of integrated,
harmonized data with reusable functions provided as units of business logic, which
is a central approach in a SOA. We build the application services following well
defined requirements and software engineering plans documented in DODAF, BPMN, and UML models.
These application services can have sophisticated business logic with very small software
footprints (typically < 50 kB, versus > 100 MB for a monolithic application of comparable functionality).
- Rules-based data quality processing: A major component of the TECHi2 EDE technology
is the integration of comprehensive data quality processing. Data quality processing is a
foundational part of EDE and must be performed on all data. The data quality
process is part of the overall Data Unification Process. The rules defining what is acceptable
and unacceptable are collected from governance and business authorities and then placed in XML
rules files. The EDE data quality engine uses these rules to confirm or modify data element
values as they come into EDE, in either transaction or batch mode. If a value passes, it is
stored; if it fails, an exception report is produced. These rules can be sophisticated
multi-variable analyses, in contrast to the simple data quality processes found in most systems that
merely compare spelling or perform table lookups against predefined transformations.
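To make the rules-based data quality item above more concrete, the following is a minimal Python sketch of how rules held in an XML file might drive a confirm/modify/reject pass over incoming records. It is an illustration only, not the EDE engine: the XML element and attribute names, the three rule types (normalize, range, ordered), and the function names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules file; the element and attribute names are illustrative
# assumptions and do not reflect the actual EDE rules format.
RULES_XML = """
<rules>
  <rule id="R1" type="normalize" element="state" action="upper"/>
  <rule id="R2" type="range" element="age" min="0" max="120"/>
  <rule id="R3" type="ordered" first="start_year" second="end_year"/>
</rules>
"""

def load_rules(xml_text):
    """Parse the rules XML into a list of attribute dictionaries."""
    return [dict(rule.attrib) for rule in ET.fromstring(xml_text).iter("rule")]

def apply_rules(record, rules):
    """Return (cleaned_record, failed_rule_ids) for one record."""
    cleaned = dict(record)
    failures = []
    for rule in rules:
        kind = rule["type"]
        if kind == "normalize":
            # "Modify" case: trim and upper-case a text value.
            value = cleaned.get(rule["element"])
            if isinstance(value, str) and rule["action"] == "upper":
                cleaned[rule["element"]] = value.strip().upper()
        elif kind == "range":
            # Single-variable check: value must lie within [min, max].
            value = cleaned.get(rule["element"])
            low, high = float(rule["min"]), float(rule["max"])
            if value is None or not (low <= value <= high):
                failures.append(rule["id"])
        elif kind == "ordered":
            # Multi-variable check: the first field must not exceed the second.
            first = cleaned.get(rule["first"])
            second = cleaned.get(rule["second"])
            if first is None or second is None or first > second:
                failures.append(rule["id"])
    return cleaned, failures

def process_batch(records, rules):
    """Split a batch into records to store and exception reports."""
    stored, exceptions = [], []
    for record in records:
        cleaned, failures = apply_rules(record, rules)
        if failures:
            exceptions.append({"record": record, "failed_rules": failures})
        else:
            stored.append(cleaned)
    return stored, exceptions
```

In this sketch, a single transaction can be handled by calling `apply_rules` directly, while `process_batch` covers batch mode. Keeping the rules in data rather than in code is what lets governance authorities change them without modifying the engine.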
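The entity-attribute-value (EAV) structure mentioned under unified data semantics can also be sketched briefly. The Python example below uses an in-memory SQLite database purely for illustration; the table name, column names, and helper functions are assumptions and are not taken from the EDE logical data model. It shows how new attributes can be added to an entity without a schema change, and how reads and writes stay simple single-key commands.

```python
import sqlite3

def open_store():
    """Create an in-memory database with one generic EAV table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        " entity_id TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (entity_id, attribute))"
    )
    return conn

def save_entity(conn, entity_id, attributes):
    """Insert or overwrite one row per attribute of the entity."""
    conn.executemany(
        "INSERT OR REPLACE INTO eav (entity_id, attribute, value) VALUES (?, ?, ?)",
        [(entity_id, name, str(value)) for name, value in attributes.items()],
    )
    conn.commit()

def load_entity(conn, entity_id):
    """Return all attributes of an entity as a dictionary (empty if unknown)."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows.fetchall())
```

For example, calling `save_entity(conn, "person-1", {"name": "Ada"})` and later `save_entity(conn, "person-1", {"clearance": "secret"})` extends the same entity with a new attribute; no table alteration is needed, which is the flexibility an EAV structure trades against typed columns.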