        
Metadata Architecture and Standards
There have been a number of competing and sometimes overlapping data representation standards
that have attempted to address various aspects of metadata integration, interoperation, and
management in general over the last ten years. While there is some evidence of convergence
in metadata standards for data warehousing in particular, new technologies requiring increasingly
sophisticated metadata creation, exchange, and management capabilities continue to emerge,
causing de facto standards to proliferate where gaps exist. Some of the more relevant standards
include:
- Database Management. The ISO/IEC 11179 Specification and Standardization of
Data Elements [1] standard specifies basic aspects of data element composition, including
metadata. In the United States, it is supported by L8 - Metadata, a technical committee of the
National Committee for Information Technology Standards (NCITS), a standards development
organization accredited by the American National Standards Institute (ANSI). This
committee is also responsible for several ANSI standards, including X3.285 (metamodel
for the management of shareable data), as well as new proposed standards for knowledge
representation.
- Data Warehousing. The Common Warehouse Metamodel (CWM) [2] describes metadata
interchange among data warehousing, business intelligence, knowledge management and
portal technologies. CWM combines standards originally developed by the Meta Data
Coalition (MDC), including the Metadata Interchange Specification (MDIS) and Microsoft's
Open Information Model (OIM), and the Object Management Group (OMG)'s earlier version of
the CWM. The MDC OIM was designed as a technology-neutral, vendor-independent metadata
standard by data warehousing vendors. It was a flat file definition intended for use
by warehouse loading tools in batch mode through a public API, describing multiple
object types, including databases, schemas, files, and relationships, and supporting
extensions for exchanging tool-specific or proprietary metadata. The Common Warehouse
Metamodel leverages various OMG standards, including the UML (Unified Modeling Language) [3],
XMI (XML Metadata Interchange) and MOF (Meta Object Facility), and the Coalition's OIM,
and is an adopted OMG standard.
- Computer-Aided Software Engineering (CASE). The CASE Data Interchange Format (CDIF),
originally sponsored by the Electronics Industries Association and now also maintained
by the OMG, represents a collection
of standards intended to support the exchange of information among CASE tools. CDIF
provides a published set of vendor-independent, method-independent definitions for
metadata concepts in general and for modeling data and related concepts in particular,
including the CDIF Integrated Meta-model, a multifaceted, integrated, multidisciplinary
information model for modeling concepts. It supports the Integration Definition for
Function Modeling (IDEF0) [4], the Integration Definition for Information Modeling
(IDEF1x) [5], and UML, among other modeling paradigms. It also
defines standard ways of moving this information between tools without the need for
customized interfaces, including the CDIF Transfer Format, a file format to represent
models. Formal standardization of CDIF at the international level is underway
(ISO/IEC JTC1/SC7/WG11). This standards body also coordinates with the Object Management
Group and ISO JTC1/SC32 (Metadata standards, including the 11179 standard listed above).
- Web-based Document Exchange. The Resource Description Framework (RDF) Model and
Syntax Specification [W3C, 1999], sponsored by the World Wide Web Consortium (W3C), is
a foundation for processing metadata; it provides interoperability between applications
that exchange machine-understandable information on the Web. RDF emphasizes facilities
to enable automated processing of Web resources. RDF can be used in a variety of application
areas, for example in resource discovery to provide better search engine capabilities, in
cataloging for describing the content and content relationships available at a particular
Web site, page, or digital library, by intelligent software agents to facilitate knowledge
sharing and exchange, in content rating, in describing collections of pages that represent
a single logical "document", for describing intellectual property rights of Web pages,
and for expressing the privacy preferences of a user as well as the privacy policies of a
Web site. Related activities include those of the Dublin Core Metadata Initiative, the
Semantic Web's Web Ontology working group, and the DARPA Agent Markup Language Program.
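To make the RDF model concrete, the following minimal sketch represents a resource description as subject-predicate-object statements and writes them out in N-Triples, a line-based RDF syntax that the W3C defined after the 1999 specification (which itself uses an XML syntax). The resource URL and the literal values are hypothetical and chosen only for illustration; the property URIs come from the Dublin Core element set namespace. Only plain string literals are handled, and only backslashes and double quotes are escaped.

```python
from typing import NamedTuple

# Namespace of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # literal value (plain strings only in this sketch)


def to_ntriples(triples):
    """Serialize triples with literal objects as N-Triples lines."""
    lines = []
    for t in triples:
        # Escape backslashes first, then double quotes.
        literal = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{t.subject}> <{t.predicate}> "{literal}" .')
    return "\n".join(lines)


def values_of(triples, subject, predicate):
    """Return every literal value stated for one property of one resource."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


# Hypothetical resource and values, for illustration only.
PAGE = "http://example.org/reports/q1"
description = [
    Triple(PAGE, DC + "title", "Quarterly Report"),
    Triple(PAGE, DC + "creator", "Finance Team"),
]

print(to_ntriples(description))
```

Because every statement has the same three-part shape, an application can look up a property such as the title with `values_of(description, PAGE, DC + "title")` without knowing anything else about the resource.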
It is important to note that these standards overlap in some areas, diverge in others, and
leave some issues open, such as a standard approach to semantic content and context. The
ISO/IEC 11179 standard comes closest to addressing the semantic properties of documents,
databases, and other resources, but does not establish a framework for representing rules
relevant to terminology usage and conflict resolution, among other issues. Additionally,
because the standards were developed from perspectives of distinct computing domains, there
is no higher-level architecture that ties these various representation schemes together, or
that suggests one approach over another in an environment that includes multiple technologies,
diverse systems, or complex interrelationships.
Another complicating factor is that the concept of metadata itself is ambiguous. Metadata
can be used to describe a variety of distinct classes of information that play different
roles in a cross-organizational enterprise architecture or in a federation of enterprises,
such as the clients and suppliers of a third-party manufacturer. These may include:
- Descriptive metadata, sometimes called semantic or navigational metadata, or "data
about data," which is intended to provide information consumers with sufficient data
to allow them to access, browse, query, retrieve, and understand the data contained
within the resources available to them.
- Operational metadata, which, from a database or data warehousing perspective, facilitates
data extraction, transformation, movement, and loading operations, including mechanisms such as
directory services and data translation.
- Interface-specific or administrative metadata, such as the metadata that database
administrators use to manage and maintain internal tables and other structures in a database,
or metadata that describes an application programming interface.
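As a rough illustration of how a broker might keep these classes apart, the sketch below tags each metadata record with one of the three classes listed above and groups records by class. The type names, field names, and sample records are all hypothetical; none of the standards discussed here prescribes this structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataClass(Enum):
    DESCRIPTIVE = "descriptive"        # helps consumers find and understand data
    OPERATIONAL = "operational"        # drives extraction, transformation, loading
    ADMINISTRATIVE = "administrative"  # interface-specific or DBA-facing


@dataclass
class MetadataRecord:
    resource: str                      # name of the resource being described
    kind: MetadataClass
    properties: dict = field(default_factory=dict)


def group_by_class(records):
    """Return a dict mapping each metadata class to its records."""
    grouped = {kind: [] for kind in MetadataClass}
    for record in records:
        grouped[record.kind].append(record)
    return grouped


# Hypothetical records for a single table, for illustration only.
records = [
    MetadataRecord("orders", MetadataClass.DESCRIPTIVE,
                   {"title": "Customer orders"}),
    MetadataRecord("orders", MetadataClass.OPERATIONAL,
                   {"load_schedule": "nightly"}),
    MetadataRecord("orders", MetadataClass.ADMINISTRATIVE,
                   {"index": "orders_by_date"}),
]

grouped = group_by_class(records)
```

Keeping the class explicit on each record lets the same resource carry all three kinds of metadata while still allowing each kind to be exchanged through whichever standard suits it.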
No single representation standard addresses all of these classes of metadata. Yet, creating
a metadata architecture that can be leveraged by a broker (or federation of brokers) to
facilitate knowledge sharing across such diverse teams and resources requires an understanding
not only of the kinds of metadata listed above, but, where possible, of:
- The target business processes.
- Any special access mechanisms, security, or process-related requirements, including
scripting languages, application programming interfaces, message sequencing requirements,
or control and data flow requirements for the systems and repositories participating in the community.
- The relevant taxonomies; domain-, organization-, or vendor-specific nomenclature and
jargon; and mechanisms for expressing quantities and other scientific or business concepts
relevant to the environment.
- Ownership and configuration management rules for the information exchanged.
- How various users will interact with the broker and the resulting integrated
environment, including systems administration and management requirements.
- The rules for determining whether or not the information shared is
consistent, accurate, complete, and correct when brokered across applications or repositories.
- Most importantly, the terminology and business rules relevant to
the people involved in the processes themselves.
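Two of the requirements above, shared terminology and rules for completeness and correctness, can be sketched as a small broker-side check. This is a minimal sketch under stated assumptions: the term mapping, the required fields, and the quantity rule are all hypothetical, and a real broker would draw them from the participating organizations rather than hard-code them.

```python
# Hypothetical mapping from source-specific terms to a shared vocabulary.
SHARED_TERMS = {
    "cust_no": "customer_id",
    "CustomerNumber": "customer_id",
    "qty": "quantity",
    "Quantity": "quantity",
}

# Hypothetical completeness rule: every brokered record needs these fields.
REQUIRED = {"customer_id", "quantity"}


def normalize(record):
    """Rename known source-specific keys to shared terms; keep unknown keys."""
    return {SHARED_TERMS.get(key, key): value for key, value in record.items()}


def check(record):
    """Return a list of rule violations for an already normalized record."""
    problems = [f"missing {name}" for name in sorted(REQUIRED - set(record))]
    qty = record.get("quantity")
    if qty is not None and (not isinstance(qty, int) or qty < 0):
        problems.append("quantity must be a non-negative integer")
    return problems


# Two hypothetical sources that name the same concepts differently.
from_supplier = normalize({"cust_no": 7, "qty": 3})
from_client = normalize({"CustomerNumber": 7})
```

Here `check(from_supplier)` reports no problems, while `check(from_client)` reports the missing quantity, so the broker can flag incomplete information before passing it on.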
Footnotes:
1. Gilliam, Daniel W. ISO/IEC 11179-1 Final Committee Draft, "Information technology -- Specification
and standardization of data elements: Part 1: Framework for the specification and standardization of
data elements", June 1998.
2. Common Warehouse Metamodel Specification, Version 1.0, The Object Management Group, February 2001.
See also www.omg.org/cwm/.
3. Booch, Grady, Rumbaugh, James, and Jacobson, Ivar, The Unified Modeling Language User Guide,
Addison Wesley Longman, Inc., Reading, MA, 1999.
OMG Unified Modeling Language Specification, Version 1.4, Object Management Group, Inc., Needham, MA,
February 2001. See also http://www.omg.org/uml/.
4. Draft Federal Information Processing Standards Publication 183, "Integration Definition For Function
Modeling (IDEF0)", December 1993.
5. Draft Federal Information Processing Standards Publication 184, "Integration Definition For
Information Modeling (IDEF1x)", December 1993.