How to build better systems – the specification

I’m going to take a little departure from building road maps to begin a new series, How to build better systems.  Applied Enterprise Architecture means a little hands-on practical application is always a good way to keep our knives sharp and our skills current. We will begin with the specification. I define a System Requirements Specification (SRS) as a collection of desired requirements and characteristics for a proposed software system design.  This includes business requirements (commonly captured in a BRD), non-functional requirements, and core architectural deliverables. The collection is intended to describe the behavior of the system to be developed, and it should include a set of use cases that describe the interactions users will have with the software.  The specification should be accompanied by a set of architectural guidelines for the project team to follow as first principles (see Nick Malik’s fine post “A practical set of EA deliverables” for good examples).  This is true (in slightly different variants) no matter which Systems Development Life Cycle we plan to use to manage the development effort, whether Waterfall, Agile, Spiral, or even Test Driven Development.  Getting this right reduces defects and errors early and remains the single most effective action we can take to improve our project outcomes.

The Challenge

We can probably all agree that defining functional and non-functional requirements is among the most critical elements of a project’s success or failure.  In fact, a Google search for the simple term “defining good requirements” returns over ninety-one (91) million hits! Yet in spite of all the good advice and fine writing available, we have all seen inaccurate, incomplete, and poorly written requirements specifications. We all know this results in poor quality products, wasted resources, higher costs, dissatisfied customers, and late project completion.  The cost to remediate these defects in development projects is staggering.  I’m going to try to keep this professional and objective, though I can’t help thinking this is probably a very good place to rant about the dismal state of affairs many of us find ourselves in today. We spend an enormous amount of energy testing executable code, yet the tests themselves are constructed on faulty assumptions or ambiguous specifications (you figure it out) that can’t be tested in the first place.  This is a common and all too familiar challenge I see everywhere in my practice. And I mean everywhere.

How bad is it?

More than 31% of all projects will be cancelled before completion. 53% of projects will take more time to complete and will cost twice as much as their original estimates. Overall, the success rate for IT projects is less than 30%. Using generally accepted quantitative findings, fixing requirements defects (errors) can eat up roughly one third of most software project budgets. For example, requirements errors have been found to consume 28% to 42.5% of total software development cost (Hooks and Farry, 2001).  Rework can consume 40–50 percent of project budgets (Boehm and Basili). In fact, faulty requirements are the root cause of 75–85 percent of rework (Leffingwell). And early detection is critical. The cost to a 1 million dollar budget at the low end of the scale can be more than $250,000. Finding and fixing defects outside the SDLC phase where they originate has its own cost (up to 400 times as much to remedy – Reifer, 2007).  It might cost $250 to $1,000 to find and fix a requirements defect during the requirements phase; if uncovered when the product is in production, repairing it can cost $100,000 to $400,000.  As Willie Sutton once said, he robbed banks “because that’s where the money is”.   And so it is here where the real opportunity lies (and has been for some time) to capture and preserve real business value.  Reducing requirements specification errors is the single most effective action we can take to improve project outcomes.

The four biggest causes of failed projects are:

  1. incomplete, missing, or poorly written requirements,
  2. lack of user input and buy-in,
  3. failure to define clear objectives, and
  4. scope creep.

All four of these root causes stem from poor requirements and specification practices. It is clear that current requirements management processes and tools are simply not working.

So what?

This is how we have always performed the work, so why change now? Crafting requirements specifications can seem like a daunting task. How can an author tell the difference between a good requirement and a bad one? How can we ensure clear, unambiguous system requirements are crafted on a consistent and repeatable basis? While many new business analysts look for templates, it is rare that a template provides enough rigor and information to help write better system requirements. Templates can help you get started, but they don’t really tell you what a good requirement looks like.  Understanding what represents good content in a requirement is a little more difficult (e.g., Specific, Measurable, Achievable, Realistic, and Timely).  Even more difficult is preventing text ambiguity from creeping into the document(s) through non-actionable, vague, or imprecise language.

There is a better way

I have stumbled across a better way to tackle this challenge. The approach uses process improvements, a few new roles, and Natural Language Processing (NLP) technology to identify defects early in the development life cycle before they become expensive to correct.

The approach I propose is inexpensive and mitigates much of the risk associated with defects and errors early in the development life cycle.  We already have many fine tools to manage the traditional requirements practice. What I intend to share with you is very different and, until very recently, widely unknown to many in the development community.  The method works across all five types of requirements defined in the IIBA’s Business Analysis Body of Knowledge:

  1. Business Requirements
  2. Stakeholder Requirements
  3. Functional Requirements
  4. Nonfunctional Requirements, and
  5. Transition Requirements

The immediate benefit is reduced costly rework (or downstream change requests) and the opportunity to build the right product the first time.  The more significant value is uncovering and reducing requirements defects early. This is the single most effective action (have I said this before?) we can take to improve our project outcomes. In the coming weeks I will address a method to tackle this challenge that is inexpensive and uncovers many of the defects and errors early in the development cycle.  I will share with you how to evaluate the following key elements:

  • Structural Completeness/Alignment (%)
  • Text Ambiguity and Context (Quality %)
  • Section Quantity/Size
  • Volatility (Change over time)
  • Lexicon Discovery (frequency distribution across documents)
  • Plain Language (readability) and complex word density

I will also illustrate how to use Natural Language Processing (NLP) and text engineering tools to automate defect identification. One of the most powerful features of this method is the immediate feedback provided to authors and designers early in the development life cycle.  They can then take action to correct the defects uncovered while they are still relatively inexpensive to remedy.  The end result is improved specifications that drive better products.  So if you are a program manager, analyst, developer, or architect, stay tuned for an approach that just may change the way you think about your writing, and improve your delivery of quality systems.
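
To make this concrete, here is a minimal sketch of the kind of automated screening I have in mind. It is not the tool set I will cover later in the series; it simply flags a handful of vague or non-actionable terms and a crude complex-word density signal. The weak-word list and thresholds are illustrative assumptions only.

```python
# A minimal requirements-screening sketch: flag vague language and give a rough
# readability signal per requirement. The term list and thresholds are
# illustrative assumptions, not an industry standard.
import re

WEAK_TERMS = [
    "as appropriate", "if possible", "etc", "and/or", "user friendly",
    "fast", "flexible", "robust", "easy", "should", "may", "tbd",
]


def screen_requirement(req_id, text):
    findings = []
    lowered = text.lower()
    for term in WEAK_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            findings.append(f"ambiguous term: '{term}'")
    words = re.findall(r"[A-Za-z']+", text)
    long_words = [w for w in words if len(w) >= 10]
    if words and len(long_words) / len(words) > 0.2:
        findings.append("high complex-word density")
    if not re.search(r"\bshall\b", lowered):
        findings.append("no 'shall' statement (may not be verifiable)")
    return req_id, findings


if __name__ == "__main__":
    print(screen_requirement(
        "REQ-042",
        "The system should be user friendly and respond fast, etc."))
```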

Design Goals

In my last post (Wide open spaces) we discussed the elegance of space based architecture platforms, based on their simplicity and power. Compared to other models for developing distributed applications, the approach offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate.  Recall that this model combines and integrates distributed caching, content-based distributed messaging, and parallel processing into a powerful architecture within a grid computing framework.

That was a mouthful. You may want to read that last sentence again carefully. And think about what this means to you as a professional practitioner.  More importantly, how this may change the way you think about application platforms in general.

Before diving into this important concept, I think it is always good idea to express our stated design goals right up front – and use these to guide the inevitable trade-offs and decisions that will need to be made along this journey. So let’s get started with a few design goals I’m comfortable with. I’m sure there are more, but this represents a good start.

The platform’s ability to scale must be completely transparent

The architecture should be based on technology that can be deployed across a grid of commodity hardware nodes, providing a scalable and adaptable platform that supports high-volume, high-performance processing. The resulting platform should tolerate failure of individual nodes, should be easy to match to changing volumes by increasing (or decreasing) the number of processing nodes, and, by virtue of its decoupled business logic, should be extensible and adaptable as the business landscape changes.

Unlike conventional application server models, our elastic application platform should not require application developers to do anything different in their code in order to scale. The developer uses a simple API that provides a vast key-value data store that looks like a large shared memory space. Under the covers, the distributed caching features of the application platform spread the data across multiple servers (e.g. using a sophisticated hash algorithm). The application developer should remain unaware of the underlying implementation that distributes the data across the servers on their behalf. In brief, the grid-enabled middleware is designed to hide the complexities of partitioning, distribution, and load balancing.
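
As a rough illustration of that idea (and not any vendor’s actual API), the sketch below hides partitioning behind a hash of the key, so the caller never knows which node holds the data. The node names and in-process dictionaries are stand-ins for real cache servers.

```python
# A minimal sketch of an elastic data grid hiding partitioning behind a simple
# key-value interface. Node selection uses a hash of the key, so application
# code never knows which server holds the data. Illustrative only.
import hashlib


class DataGridClient:
    """Looks like one big shared map to the developer."""

    def __init__(self, nodes):
        # 'nodes' are stand-ins for cache servers; here just in-process dicts.
        self._nodes = {name: {} for name in nodes}
        self._names = sorted(self._nodes)

    def _owner(self, key):
        # A deterministic hash keeps the same key on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._names[int(digest, 16) % len(self._names)]

    def put(self, key, value):
        self._nodes[self._owner(key)][key] = value

    def get(self, key, default=None):
        return self._nodes[self._owner(key)].get(key, default)


if __name__ == "__main__":
    grid = DataGridClient(["node-a", "node-b", "node-c"])
    grid.put("policy:1234", {"status": "active"})
    print(grid.get("policy:1234"))  # the caller never sees which node answered
```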

The platform provides resiliency by design

Applications must be available to customers, and expected service level objectives must be met.  The business cannot afford a single point of failure that impacts customer access to the other features and functions of the customer application suite. The platform should operate continuously and needs to be highly resilient to avoid any interruption in processing. This means the application suite cannot have any single point of failure in the software, hardware, or network. High Availability (HA) is a basic requirement: failing services and application components should continue on backup servers without service disruption.

Distributed data caches are resilient by design because they automatically replicate cached data to one or more backup servers, guided by policies defined by an administrator and executed in a consistent, controlled manner. If one server fails, another server provides the data (the more replicas, the more resilient the cache). Note that distributed data caches can be vulnerable to data center outages if all the compute servers are located in the same physical data center. To address this weakness, the distributed caching mechanism should offer WAN features to replicate and recover data across multiple physical locations. The improvement in resilience reduces the risk of expensive system downtime due to hardware or software failure, allowing the business to continue operating, albeit with reduced performance, during partial outages. An added benefit is that an architecture composed of discrete units working together enables rapid development and a controlled introduction of new features in response to changing requirements, without the need for a big-bang rollout.
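
A minimal sketch of such a replication policy might look like the following, again using in-process dictionaries as stand-ins for cache servers: every write lands on a primary plus a configurable number of backups, so readers can still find the data when one node fails.

```python
# A minimal sketch of an administrator-defined replication policy: each write
# goes to a primary plus N backup nodes so a failed node does not lose cached
# data. The node names and in-process dicts are illustrative only.
import hashlib


class ReplicatedCache:
    def __init__(self, nodes, backups=1):
        self._nodes = {n: {} for n in nodes}
        self._ring = sorted(self._nodes)
        self._backups = backups

    def _placement(self, key):
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self._ring)
        # Primary plus the next 'backups' nodes on the ring.
        return [self._ring[(start + i) % len(self._ring)]
                for i in range(self._backups + 1)]

    def put(self, key, value):
        for node in self._placement(key):
            self._nodes[node][key] = value

    def get(self, key):
        for node in self._placement(key):      # try primary, then replicas
            if key in self._nodes[node]:
                return self._nodes[node][key]
        return None

    def fail(self, node):
        self._nodes[node].clear()              # simulate losing a server
```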

The platform is prepared to meet demanding performance requirements

A key performance characteristic of distributed caches is that they store data in fast-access memory rather than on disk, although a backing store on disk may be an option. Since this data spans multiple servers, there is no bottleneck or single point of failure. An elastic application platform also provides a means to ensure that cached data tends to live on the same server where the application code is processing, reducing network latency. We can do this by implementing a “near-cache” that places data on the server running the application using that data, or by directly managing application code execution in the platform, placing code and data adjacent to one another in cache nodes on the same server.
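
Here is a minimal near-cache sketch under the same assumptions as the earlier grid client sketches: a small in-process map sits in front of the distributed cache so repeated reads avoid a network hop, while writes go through to the grid.

```python
# A minimal near-cache sketch: a small local cache in front of a distributed
# grid client (any object with get/put, such as the DataGridClient sketched
# earlier). Eviction policy and sizes are illustrative assumptions.
class NearCache:
    def __init__(self, grid, max_items=1000):
        self._grid = grid
        self._local = {}
        self._max_items = max_items

    def get(self, key):
        if key in self._local:                 # hit: served from local memory
            return self._local[key]
        value = self._grid.get(key)            # miss: fetch from the data grid
        if value is not None:
            if len(self._local) >= self._max_items:
                self._local.pop(next(iter(self._local)))  # crude eviction
            self._local[key] = value
        return value

    def put(self, key, value):
        self._grid.put(key, value)             # write through to the grid
        self._local[key] = value
```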

The platform needs to support robust integration with other data sources

Most distributed caching platforms offer read-through, write-through, and write-behind features to synchronize data in the cache with external data sources. Rather than the developer having to write this code, an administrator configures the cache to automatically read from or write to a database or other external data source whenever an application performs a data operation in the cache. Data is a valuable asset. Sharing this asset across the platform improves the ability to support better data enrichment, improve accuracy, and meet business goals.
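
The sketch below illustrates read-through and write-behind behavior. The loader and writer callables stand in for whatever database access an administrator would configure; they are assumptions here, not a product’s configuration API.

```python
# A minimal sketch of read-through and write-behind synchronization with an
# external data source. 'loader' and 'writer' are stand-ins for the database
# access an administrator would configure.
import queue
import threading


class ReadThroughWriteBehindCache:
    def __init__(self, loader, writer):
        self._store = {}
        self._loader = loader                  # e.g. SELECT by key
        self._writer = writer                  # e.g. UPSERT a row
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def get(self, key):
        if key not in self._store:             # read-through on a miss
            self._store[key] = self._loader(key)
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value               # update the cache immediately
        self._pending.put((key, value))        # persist asynchronously

    def _drain(self):
        while True:
            key, value = self._pending.get()
            self._writer(key, value)           # write-behind to the data source


if __name__ == "__main__":
    db = {}                                    # a dict standing in for a database
    cache = ReadThroughWriteBehindCache(loader=db.get, writer=db.__setitem__)
    cache.put("customer:7", {"name": "Acme"})
    print(cache.get("customer:7"))
```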

The platform’s application workload is by nature distributed

For elastic application platforms offering distributed code execution, we should consider the nature of the workload the applications will present to the servers. If we can divide the workload into units that naturally fit the distribution schemes on offer, then the greater sophistication of the distributed code execution capability can be just what is needed to turn a troublesome, resource-intensive application into one that performs well and meets expectations.

Specific application responsibilities that repeat (or are redundant) across the application architecture should be separated out.  Shared, global, or common-use application functions are sometimes referred to as “cross-cutting concerns” and support the key principle of separation of concerns.  The platform should support component designs that minimize coupling; the Law of Demeter (the Principle of Least Knowledge: only talk to your neighbors) applies, and a short sketch follows the list below. The platform should promote loose coupling by minimizing:

  • dependencies between modules (e.g. shared global variables)
  • content coupling (one module relying on another’s internal content)
  • protocol or format dependencies
  • control coupling, where one program controls another’s behavior
  • non-traceable message coupling, which can lead to dynamic, spaghetti-like results that are impossible to manage
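
Here is the promised sketch of the Law of Demeter, using hypothetical domain classes: the tightly coupled method reaches through objects it should not know about, while the loosely coupled one asks its direct collaborator for what it needs.

```python
# A minimal Law of Demeter ("only talk to your neighbors") sketch.
# The domain classes are hypothetical, used only to contrast the two styles.
class Address:
    def __init__(self, postal_code):
        self.postal_code = postal_code


class Customer:
    def __init__(self, address):
        self._address = address

    def postal_code(self):                     # expose behavior, not structure
        return self._address.postal_code


class Order:
    def __init__(self, customer):
        self._customer = customer

    # Tightly coupled: reaches through two objects it should not know about.
    def shipping_zone_coupled(self):
        return self._customer._address.postal_code[:3]

    # Loosely coupled: asks its direct collaborator for what it needs.
    def shipping_zone(self):
        return self._customer.postal_code()[:3]
```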

There are other goals I have not addressed here which we should all be familiar with, including:

  • Desire to BUY vs. Build and Maintain
  • Remain Technology and Vendor Independent
  • Promote Interoperability
  • Meet security and privacy needs

So, now we have a better idea of the design goals we are going to try to achieve. I think it is always important to take these goals to the next step in the high-level specification and begin turning them into quantifiable, actionable objectives. Remember the original strategy that has driven our design goals. The design goals should now be used to create quantifiable objectives we can plan against and measure progress toward.

Wide open spaces

Wide open spaces - Wyoming

Okay, okay – I know I should keep this blog more up to date; I have just been a little busy with my day job. Now, after a much needed rest in the last weeks of August, I can share a few things you may find especially interesting and timely.  It is no coincidence that the image accompanying this post is of wide open spaces. This is in fact where I spent the most satisfying part of my “summer vacation”.  And spaces (tuple spaces) are what I intend to share with you in the coming weeks.

As architects we have a professional responsibility to remain on the lookout for new (and sometimes revisited) ideas about how to improve, and to adopt the good ones. This is especially true when our business needs to invest in key technology changes to remain competitive and deliver the distinctive quality of service and value customers will continue to seek.

I have been pretty busy over the last year, engaged in a variety of industries where road map development and execution of next generation platforms and paradigm shifts were needed.  Many of the more difficult challenges were solved by adopting the Space-Based Architecture (SBA) pattern. This is a demonstrated pattern used to achieve near-linear scalability of stateful, high-performance applications using tuple spaces. This is not a new idea; the tuple space model was developed by David Gelernter over thirty years ago at Yale University. Implementations of tuple spaces have since been developed for Smalltalk, Java (JavaSpaces), and the .NET framework. A tuple space is an implementation of the associative memory model for parallel (distributed) computing, providing a repository of tuples that can be accessed concurrently. I know, this is a mouthful and a little too academic for me too. What this really means is that we can group processors that produce pieces of data and group processors that use the data. Producers post their data as tuples in the space, and consumers then retrieve data from the space that matches a certain pattern. This is also known as the blackboard metaphor. Tuple spaces may be thought of as a form of distributed shared memory. The model is closely related to other patterns that have proven successful in addressing the application scalability challenges faced by Google and Amazon.com (EC2), for example. The model has also been applied by many firms in the securities industry to implement scalable electronic trading applications.
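
To make the blackboard metaphor concrete, here is a minimal, single-process sketch of a tuple space: producers write tuples, and consumers take the first tuple matching a template (None acting as a wildcard). Real implementations such as JavaSpaces or GigaSpaces add distribution, transactions, and leases, none of which are shown here.

```python
# A minimal, single-process tuple space sketch illustrating the blackboard idea.
# Producers write tuples; consumers take tuples matching a template.
import threading


class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._lock = threading.Condition()

    def write(self, tup):
        with self._lock:
            self._tuples.append(tup)
            self._lock.notify_all()

    def take(self, template):
        """Remove and return the first tuple matching 'template' (None = wildcard)."""
        def matches(tup):
            return len(tup) == len(template) and all(
                t is None or t == v for t, v in zip(template, tup))
        with self._lock:
            while True:
                for tup in self._tuples:
                    if matches(tup):
                        self._tuples.remove(tup)
                        return tup
                self._lock.wait()              # block until a producer writes


if __name__ == "__main__":
    space = TupleSpace()
    space.write(("trade", "IBM", 100))
    space.write(("quote", "IBM", 128.5))
    print(space.take(("trade", None, None)))   # consumer pattern-matches a trade
```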

Before you think I have gone daft on you, I recommend you look at a commercial implementation of this at GigaSpaces.  Review the site and developer documentation and you will see how this platform embraces many of the principles of Representational State Transfer (REST), service-oriented architecture (SOA), and event-driven architecture (EDA), as well as elements of grid computing.  The beauty of space-based architecture resides in its tandem of simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate.

The pattern represents a model that combines and integrates distributed caching (Data Grid), content-based distributed messaging (Messaging Grid), and parallel processing (Processing Grid) into a powerful service-oriented architecture built on shared spaces within a grid computing framework. Research results and commercial use have shown that a large number of problems in parallel and distributed computing can be solved using this architecture. And the implications of its adoption extend well beyond high-performance On-Line Transaction Processing into other uses (including Master Data Management, Complex Event Processing, and Rules Processing, for example).

And this is what I intend to share with you in the coming weeks. 
Wide open spaces…

Modeling the MDM Blueprint – Part VI

In this series we have discussed developing the MDM blueprint by developing the Common Information (part II), Canonical (part III), and Operating (part IV) models in our work. In Part V I introduced the Reference Architecture model into the mix to apply the technical infrastructure or patterns we plan on using. The blueprint has now moved from being computation and platform independent to expressing intent through the use of more concrete, platform-specific models.  The solution specification is now documented (independent of the functional business requirements) to provide shared insight into the overall design solution.  Now it is time to bring the modeling products together and incorporate them into an MDM solution specification we can use in many ways to communicate the intent of the project.

First, the MDM blueprint specification becomes the vehicle for communicating the system’s design to interested stakeholders at each stage of its evolution. The blueprint can be used by:

  • Downstream designers and implementers to provide overall policy and design guidance. This establishes inviolable constraints (and a certain amount of freedom) on downstream development activities.
  • Testers and integrators to dictate the correct black-box behavior of the pieces that must fit together.
  • Technical managers as the basis for forming development teams corresponding to the work assignments identified.
  • Project managers as the basis for a work breakdown structure, planning, allocation of project resources, and tracking of progress by the various teams.
  • Designers of other systems with which this one must interoperate, to define the set of operations provided and required, and the protocols for their operation, that allow the interoperation to take place.

Second, the MDM blueprint specification provides a basis for performing up-front analysis to validate design decisions (or uncover deficiencies in them) and to refine or alter those decisions where necessary. The blueprint could be used by:

  • Architects and requirements engineers who represent the customer, for whom the MDM blueprint specification becomes the forum for negotiating and making trade-offs among competing requirements.
  • Architects and component designers as a vehicle for arbitrating resource contention and establishing performance and other kinds of run-time resource consumption budgets.
  • Developers using vendor-provided products from the commercial marketplace, to establish the possibilities for commercial off-the-shelf (COTS) component integration by setting system and component boundaries and establishing requirements for the required behavior and quality properties of those components.
  • Architects to evaluate the ability of the design to meet the system’s quality objectives. The MDM blueprint specification serves as the input for architectural evaluation methods such as the Software Architecture Analysis Method (SAAM), the Architecture Tradeoff Analysis Method (ATAM), and Software Performance Engineering (SPE), as well as less ambitious (and less effective) activities such as unfocused design walkthroughs.
  • Performance engineers as the formal model that drives analytical tools such as rate schedulers, simulations, and simulation generators.
  • Product line managers to determine whether a potential new member of a product family is in or out of scope, and if out, by how much.

Third, the MDM blueprint becomes the first artifact used to achieve system understanding for:

  • Technical managers as the basis for conformance checking, for assurance that implementations have in fact been faithful to the architectural prescriptions.
  • Maintainers as a starting point for maintenance activities, revealing the areas a prospective change will affect.
  • New project members, as the first artifact for familiarization with a system’s design.
  • New architects as the artifacts that (if properly documented) preserve and capture the previous incumbent’s knowledge and rationale.
  • Re-engineers as the first artifact recovered from a program understanding activity or (in the event that the architecture is known or has already been recovered) the artifact that drives program understanding activities at the appropriate level of component granularity.

Blueprint for MDM – Where this fits within a larger program

Developing and refining the MDM blueprint is typically associated with larger programs or strategic initiatives. In this last part of the series I will discuss where all this typically fits within a larger program and how to organize and plan this work within that context.  The following diagram puts our modeling efforts within the context of a larger program, drawn from a mix of actual engagements with large, global customers.  The key MDM blueprint components are highlighted with numbers representing:

  1. Common Information Model
  2. The Canonical Model
  3. The Operating Model
  4. The Reference Architecture

Program Management Design (diagram)

I have also assumed a business case exists (you have this, right?) and the functional requirements are known.  Taken together with the MDM blueprint, we now have a powerful arsenal of robust information products we can use to prepare a high quality solution specification that is relevant and can meet a wide variety of needs.  Typical uses of the MDM blueprint include:

  • Identifying all necessary components and services
  • Reviewing existing progress to validate (or uncover deficiencies in) design decisions; refine or alter those decisions where necessary
  • Preparation of detailed planning products (Product, Organization, and Work Breakdown structures)
  • Program planning and coordination of resources
  • Facilitating prioritization of key requirements – technical and business
  • Development of Request for Quotation, Request for Information products (make vs. buy)
  • Preparing funding estimates (Capital and Operating Expense) and program budget preparation
  • Understanding a vendor’s contribution to the solution and pricing accordingly (for example, repurposing the blueprint as needed in contract and licensing activities and decoupling supplier proprietary lock-in from the solution where appropriate)

We are also helping to ensure that business needs drive the solution by mitigating the impact of the dreaded Vendor Driven Architecture (VDA) on the MDM solution specification.

Summary

I hope you have enjoyed this brief journey through Modeling the MDM Blueprint and have gained something from my experience.  I’m always interested in learning from others; please let me know what you have encountered yourself, and maybe we can help others avoid the pitfalls and pain in this difficult, demanding work.  A key differentiator, and the difference between success and failure on an MDM journey, is taking the time to model the blueprint and share it early and often with the business.  This is, after all, a business project, not an elegant technical exercise.  In an earlier post I mentioned Ward Cunningham’s technical debt concept.  Recall that this metaphor means doing things the quick and dirty way sets us up with a technical debt, similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort we have to expend in future development because of the quick and dirty design choices we have made. The technical debt and resulting interest due in an MDM initiative with this kind of far-reaching impact across the enterprise is, well, unthinkable. Take the time to develop your MDM blueprint and use it to ensure success by clearly communicating business and technical intent with your stakeholders.

Modeling the MDM Blueprint – Part V

In this series we have discussed developing the MDM blueprint by creating Common Information (part II), Canonical (part III), and Operating (part IV) models in our work streams. We introduced the Operating Model into the mix to communicate how the solution will be adopted and used to realize the benefits we expect, engaging the business in a meaningful way, and hopefully to set reasonable expectations with our business partners as to what this solution will look like when deployed.

Now it is time to model and apply the technical infrastructure or patterns we plan on using. The blueprint now moves from being computation and platform independent to expressing intent through more concrete, platform-specific models.

Reference Architecture
After the initial work (the CIM, Canonical, and Operating models) is completed, then and only then are we ready to move on to the computation and platform specific models. We know how to do this well – for example, see Information service patterns, Part 4: Master Data Management architecture patterns.

At this point we have enough information to create the reference architecture. One way (there are several) to organize this content is to use the Rozanski and Woods extensions to the classic 4+1 view model introduced by Philippe Kruchten. The views describe the system from the viewpoints of different stakeholders (end-users, developers, and project managers). The four views of the model are the logical, development, process, and physical views. In addition, selected use cases or scenarios are used to demonstrate the architecture’s intent, which is why the model contains 4+1 views (the +1 being the selected scenarios).

4+1 view model (diagram)

Rozanski and Woods extended this idea by introducing a catalog of six core viewpoints for information systems architecture: the Functional, Information, Concurrency, Development, Deployment, and Operational viewpoints and related perspectives. This is elaborated in detail in their book titled “Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives”. There is much to learn from their work, I encourage you to visit the book’s web site for more information.

What we are describing here is how MDM leadership within a very large-scale organization can eventually realize five key “markers” or characteristics in the reference architecture:

- Shared services architecture evolving to process hubs;
- Sophisticated hierarchy management;
- High-performance identity management;
- A data governance-ready framework; and
- Registry, persisted, or hybrid design options in the architecture selected.

I recommend this approach; it is an exceptional way to tie the technical models back to stakeholder needs as reflected in the viewpoints, perspectives, guidelines, principles, and template models used in the reference architecture. Grady Booch said “…the 4+1 view model has proven to be both necessary and sufficient for most interesting systems”, and there is no doubt that MDM is interesting.  Once this work has been accomplished and agreed to as part of a common vision, we have several different options for proceeding. One interesting approach is leveraging this effort into the Service-Oriented Modeling Framework introduced by Michael Bell at Methodologies Corporation.

Service-Oriented Modeling
The Service-Oriented Modeling Framework (SOMF) is a service-oriented development life cycle methodology. It offers a number of modeling practices and disciplines that contribute to successful service-oriented life cycle management and modeling. It illustrates the major elements that identify the “what to do” aspects of a service development scheme. These are the modeling pillars that enable practitioners to craft an effective project plan and to identify the milestones of a service-oriented initiative, in this case crafting an effective MDM solution.

SOMF provides four major SOA modeling styles that are useful throughout a service life cycle (conceptualization, discovery and analysis, business integration, logical design, conceptual and logical architecture). These modeling styles (Circular, Hierarchical, Network, and Star) can assist us with the following modeling aspects:

- Identify service relationships: contextual and technological affiliations
- Establish message routes between consumers and services
- Provide efficient service orchestration and choreography methods
- Create powerful service transaction and behavioral patterns
- Offer valuable service packaging solutions

SOMF Modeling Styles
SOMF offers four major service-oriented modeling styles. Each pattern identifies the approaches and strategies one should consider when modeling MDM services in an SOA environment.

- Circular Modeling Style: enables message exchange in a circular fashion, rather than employing a controller to carry out the distribution of messages. The Circular Style also offers a way to affiliate services.

- Hierarchical Modeling Style: offers a relationship pattern between services for the purpose of establishing transactions and message exchange routes between consumers and services. The Hierarchical pattern enforces parent/child associations between services and lends itself to a well known taxonomy.

- Network Modeling Style: this pattern establishes “many to many” relationships between services, their peer services, and consumers, similar to RDF. The Network pattern emphasizes distributed environments and interoperable computing networks.

- Star Modeling Style: the Star pattern advocates arranging services in a star formation, in which the central service passes messages to its extending arms. The Star modeling style is often used in “multi casting” or “publish and subscribe” instances, where “solicitation” or “fire and forget” message styles are involved.

There is much more to this method; I encourage you to visit the Methodologies Corporation site (Michael is the founder) and download the tools, presentations, and articles they have shared with us.

Summary
So, based on my experience, we have to complete this modeling effort to improve the probability of success. MDM is really just another set of tools and processes for modeling and managing business knowledge of data in a sustainable way.  Take the time to develop a robust blueprint that includes Common Information (semantic, pragmatic, and logical modeling), Canonical (business rules and format specifications), and Operating models to ensure completeness.  Use these models to drive a suitable Reference Architecture to guide design choices in the technical implementation.

This is hard, difficult work. Anything worthwhile usually is. Why put the business at risk to solve this important and urgent need without our stakeholders’ understanding and real enthusiasm for shared success?  A key differentiator, and the difference between success and failure on an MDM journey, is taking the time to model the blueprint and share it early and often with the business.  This is, after all, a business project, not an elegant technical exercise.  Creating and sharing a common vision through our modeling efforts helps ensure success from inception through adoption by clearly communicating the business and technical intent of each element of the MDM program.

In the last part of the series I will be discussing where all this fits into the larger MDM program and how to plan, organize, and complete this work.

Modeling the MDM Blueprint – Part II

In part I of this series we discussed what essential elements should be included in an MDM blueprint. The important thing to remember is that MDM is a business project that requires establishing a common set of models that can be referenced independent of the technical infrastructure or patterns you plan on using. The blueprint should remain computation and platform independent until the models are completed (and accepted by the business) to support and ensure the business intent. The essential elements should include:

- Common Information Model
- Canonical Model
- Operating Model, and
- Reference Architecture (e.g. 4+1 views, viewpoints and perspectives).

We will now turn our attention to the first element, the Common Information Model.

A Common Information Model (CIM) is defined using relational, object, hierarchical, and semantic modeling methods. What we are really developing here is a rich semantic data architecture in selected business domains, using the methods below (a short sketch of the object-oriented facet follows the list):

  • Object Oriented modeling 
    Reusable data types, inheritance, operations for validating data
  • Relational
    Manage referential integrity constraints (Primary Key, Foreign Key)
  • Hierarchical
    Nested data types and facets for declaring behaviors on data (e.g. think XML Schemas)
  • Semantic models
    Ontologies defined through RDF, RDFS and OWL
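
As promised above, here is a minimal sketch of the object-oriented facet only: a reusable Party master data type with a validation operation, plus explicit identifiers that would map to relational primary and foreign keys. The attributes and rules are illustrative assumptions, not a published MDM model.

```python
# A minimal sketch of the object-oriented facet of a Common Information Model:
# a reusable 'Party' type with a validation operation, and identifiers that
# would map to relational primary/foreign keys. Illustrative assumptions only.
from dataclasses import dataclass, field


@dataclass
class Party:
    party_id: str                       # maps to the relational primary key
    name: str
    party_type: str                     # e.g. "PERSON" or "ORGANIZATION"
    roles: list = field(default_factory=list)

    def validate(self):
        """Operation on the data type: enforce the rules the model declares."""
        errors = []
        if not self.party_id:
            errors.append("party_id is required")
        if self.party_type not in {"PERSON", "ORGANIZATION"}:
            errors.append(f"unknown party_type: {self.party_type}")
        if not self.name.strip():
            errors.append("name must not be blank")
        return errors


@dataclass
class PartyRelationship:
    from_party_id: str                  # foreign keys expressing referential integrity
    to_party_id: str
    relationship_type: str
```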

I believe (others may not) that MDM truly represents the intersection of relational, object, hierarchical, and semantic modeling methods to achieve a rich expression of the reality the organization is operating in.  Expressed in business terms, this model represents a “foundational principle” or theme we can pivot around to understand each facet in the proper context.  This is not easy to pull off, but it provides a fighting chance to resolve semantic differences in a way that helps focus the business on the real matter at hand. This is especially important when developing the Canonical model introduced in the next step.

If you want to see what one of these looks like, visit the MDM Alliance Group (MAG).  MAG is a community Pierre Bonnet founded to share MDM modeling procedures and prebuilt data models.  The MDM Alliance Group publishes a set of prebuilt data models that include the usual suspects (Location, Asset, Party, Party Relationship, Party Role, Event, Period [Date, Time, Condition]), downloadable from the website, and some more interesting models like Classification (Taxonomy) and Thesaurus, organized across three domains. Although we may disagree about the “semantics”, I do agree with him that adopting this approach can help us avoid setting up siloed reference databases “…unfortunately often noted when using specific functional approaches such as PIM (Product Information Management) and CDI (Customer Data Integration) modeling”.  How true, and a very common issue I encounter often.

Another good example is the CIM developed over the years at the Distributed Management Task Force (DMTF). You can get the CIM V2.20 Schema MOF, PDF and UML at their web site and take a look for yourself. While this is not what most of us think of as MDM, they are solving for some of the same problems and challenges we face.

Even more interesting is what is happening in semantic technology. Building semantic models (ontologies) involves many of the same concepts found in the other modeling methods we have already discussed, but further extends the expressive quality we often need to fully communicate intent. For example (a short sketch follows this list):

- Ontologies can be used at run time (queried and reasoned over).
- Relationships are first-class constructs.
- Classes and attributes (properties) are set-based and dynamic.
- Business rules are encoded and organized using axioms.
- XML schemas are graphs, not trees, and can be used for reasoning.
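
Here is the short sketch promised above. It assumes the open source rdflib package is installed and declares a tiny ontology in which relationships are first-class constructs and the class hierarchy can be queried at run time; the example.org namespace and class names are illustrative only.

```python
# A minimal semantic-model sketch using rdflib (assumed installed). The
# namespace and class names are illustrative, not part of any published model.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/mdm#")   # illustrative namespace
g = Graph()
g.bind("ex", EX)

# Classes and a simple hierarchy
g.add((EX.Party, RDF.type, OWL.Class))
g.add((EX.Organization, RDF.type, OWL.Class))
g.add((EX.Organization, RDFS.subClassOf, EX.Party))

# A relationship modeled as a first-class construct (an object property)
g.add((EX.hasRole, RDF.type, OWL.ObjectProperty))
g.add((EX.hasRole, RDFS.domain, EX.Party))

# An individual with a label
g.add((EX.Acme, RDF.type, EX.Organization))
g.add((EX.Acme, RDFS.label, Literal("Acme Corporation")))

# Query the model at run time: which classes specialize Party?
for subclass in g.subjects(RDFS.subClassOf, EX.Party):
    print("Specializes Party:", subclass)

# Serialize for sharing with others (rdflib 6+ returns a string here)
print(g.serialize(format="turtle"))
```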

If you haven’t been exposed to ontology development I encourage you to grab the open source Protege Ontology Editor and discover for yourself what this all about.  And while you are there see the Protégé Wiki  and grab the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) for an example of its use in the EA world.   Or see the set of tools found at the Essential project. The project uses this tool to enter model content, based on a model pre-built for Protégé. While you are at the Protégé Wiki  grab some of the ontologies developed for use with this tool for other examples, such as the SWEET Ontologies (A Semantic Web for Earth and Environmental Terminology. Source: Jet Propulsion Laboratory).  For more on this, see my post on this tool at Essential Analytics. This is an interesting and especially useful modeling method to be aware of and an important tool to have at your disposal.

This is hard, challenging work. Doing anything worthwhile usually is.  A key differentiator, and the difference between success and failure on your MDM journey, will be taking the time to model the blueprint and sharing this work early and often with the business. We will discuss the second element of the MDM blueprint, the Canonical model, in part III. I encourage you to participate and share your professional experience.

Modeling the MDM Blueprint – Part I

Several practitioners have contributed to this complex and elusive subject (see Dan Power’s Five Essential Elements of MDM and CDI for example) and have done a good job of elaborating the essential elements.  There is one more element that is often overlooked in this field and remains a key differentiator, the difference between success and failure among the major initiatives I have had the opportunity to witness firsthand: modeling the blueprint for MDM.

This is an important first step to take, assuming the business case is completed and approved. It forces us to address the very real challenges up front, before embarking on a journey that our stakeholders must understand and support in order to succeed. Obtaining buy-in and executive support means we all share a common vision for what we are solving for.

MDM is more than maintaining a central repository of master data. The shared reference model should provide a resilient, adaptive blueprint to sustain high performance and value over time. An MDM solution should include the tools for modeling and managing business knowledge of data in a sustainable way.  This may seem like a tall order, but consider the implications if we focus on the tactical and exclude the reality of how the business will actually adopt and embrace all of your hard work. Or worse, consider asking the business to stare at a blank sheet of paper and expecting them to tell you how to rationalize and manage the integrity rules connecting data across several systems, eliminate duplication and waste, and ensure an authoritative source of clean, reliable information that can be audited for completeness and accuracy.  Still waiting?

So what is in this blueprint?

The essential thing to remember is that the MDM project is a business project that requires establishing a common information model that applies regardless of the technical infrastructure or patterns you plan to use. The blueprint should remain computation and platform independent until the Operating Model is defined (and accepted by the business), and a suitable Common Information Model (CIM) and Canonical model are completed to support and ensure the business intent. Then, and only then, are you ready to tackle the Reference Architecture.

The essential elements should include:
- Common Information Model
- Canonical Model
- Operating Model, and
- Reference Architecture (e.g. 4+1 views).

I will be discussing each of these important and necessary components of the MDM blueprint in the following weeks and encourage you to participate and share your professional experience. Adopting and succeeding at Master Data Management is not easy, and jumping into the “deep end” without truly understanding what you are solving for is never a good idea. Whether you are a hands-on practitioner, program manager, or executive planner, I can’t emphasize enough how critical modeling the MDM blueprint and sharing it with stakeholders is to success. You simply have to get this right before proceeding further.