
The metadata universe


By Mervyn Mooi, Director at Knowledge Integration Dynamics.
Johannesburg, 8 Feb 2013

Metadata is the universe in which all data, information and process objects exist, and it is through metadata that data architectures can enable the flexible data and processing usage models required for using and managing big data. This implies that any data store or data processing technology that is based on metadata – data warehouses, for example – is also part of the big data domain. There is a counter-argument that big data is about applying new technologies to meet needs unfulfilled by traditional technologies such as data warehouses.

The argument is that the three Vs of data – velocity, variety and volume – were defined long before big data, and that they do not necessarily hold true in the big data world. Big data need not have large volume, high velocity or wide variety. In fact, using the three Vs to describe big data is a literal and technical interpretation of the term. Big data does, however, also require flexibility and quick, even real-time, response rates. Data warehouses provide rapid responses, yet are not typically flexible – although that weakness is not unique to data warehouses.

All-inclusive

Data warehouses were traditionally used to feed business intelligence systems, typically in a rigid manner relying on structured data. Big data goes beyond those fundamental data types to include data content and systems data – such as job or process run-time results and user accesses – which are not considered business data in the bottom-line sense. Systems data is, in fact, often described as metadata.
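As an illustration of that point – systems data and business data descriptions living side by side, a layer above the objects they describe – a single metadata catalogue might record both a warehouse table and the job that loads it. This is only a sketch; all names and fields here are hypothetical:

```python
# Illustrative metadata catalogue describing heterogeneous artefacts:
# a warehouse table (business data description) and an ETL job
# (systems data: run-time results). Field names are hypothetical.

catalogue = [
    {"type": "table", "name": "dw.sales_fact",
     "columns": ["date_key", "store_key", "amount"], "owner": "finance"},
    {"type": "job", "name": "load_sales_fact",
     "reads": ["staging.sales"], "writes": ["dw.sales_fact"],
     "last_runtime_secs": 312},
]

def describes(artefact_name, catalogue):
    """Return the names of every catalogue entry that references an artefact."""
    hits = []
    for entry in catalogue:
        refs = [entry.get("name")] + entry.get("reads", []) + entry.get("writes", [])
        if artefact_name in refs:
            hits.append(entry["name"])
    return hits

print(describes("dw.sales_fact", catalogue))  # both the table and the job that writes it
```

Because the table and the job are described in the same metadata layer, questions that span business and systems data – "what touches this table?" – become simple catalogue lookups.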

Data warehouses have difficulty with some basic data processing tasks, but this is more often a failure of the data integration tools and architecture than of the warehouses themselves: the warehouses remain technically capable, while the way they are deployed and used is the real problem. Architecture can be a constraining factor when placing data warehouses in the context of big data, yet the best way to retain or gain economy of resources is a strongly architected approach that employs, for example, common processing, common models and common integration. The goal is to maximise the tools and technologies already deployed, rather than add layers of complexity with new ones that are not strictly necessary.

Although architecture implies rigidity, it is not absolutely so. An architecture can be designed to be flexible, so that it can adapt to changes such as dynamic mappings, self-service information delivery and reporting, drag-and-drop report development, and sand-box, quick-win development strategies.

All in order

Flexibility designed into the architecture lends agility, the lack of which is one of the most prominently cited shortcomings of modern data warehouses. Such agility, however, is achieved within the confines of an architectural framework that allows for rapidly changing or cycling models and their uses, while retaining the structure and order that, even when flexible, are necessary for sound data management – a top priority in the data governance domain and one of the challenges of dealing with big data. Metadata does not belong to the business data and information content, but to the models, definitions, programs, scripts and specifications of all ICT artefacts and resources. It sits a layer above those artefacts and resources, which is why metadata is the universe in which all data, models, information and process objects exist – a universe that includes big data and data warehouses alike. And that is why it is in metadata that the flexibility for the new usage models big data requires is realised.
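The kind of metadata-driven flexibility described above – a dynamic mapping that lives in metadata rather than in code, so that models can change without reworking the pipeline – can be sketched roughly as follows. All names here are illustrative assumptions, not a real product's API:

```python
# A minimal sketch of a metadata-driven dynamic mapping: the transformation
# is specified as data (metadata), not hard-coded, so the target model can
# change without rewriting the pipeline. All names are hypothetical.

# Mapping metadata: target field -> (source field, transform function)
MAPPING = {
    "customer_id": ("cust_no", str),
    "full_name":   ("name", str.title),
    "revenue_zar": ("revenue", float),
}

def apply_mapping(record, mapping):
    """Project a source record onto a target model using mapping metadata."""
    return {target: transform(record[source])
            for target, (source, transform) in mapping.items()}

source_row = {"cust_no": 1042, "name": "jane doe", "revenue": "15000.50"}
print(apply_mapping(source_row, MAPPING))
```

Changing the mapping metadata changes the delivered model immediately; the processing code stays fixed, which is the agility-within-structure the architecture is meant to provide.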

The solution to realising the benefits of big data does not reside solely in the employment of big data technologies and systems, or even other technologies and tools, but rather in the architecture of the data domain which relies on reliable, consistent and available metadata to drive flexibility within the confines of good management practices and processes.
