Big Data Analytics – Unlock Breakthrough Results: (Step 4)

In this step we look a little closer at defining the critical capabilities used across the four operating models discussed in an earlier post (Big Data Analytics – Unlock Breakthrough Results: Step 3). We are going to assign relative weights to each of the critical capability groups for each operating model uncovered earlier, giving the higher weighting to the capability groupings most important to the success of each model. Having a quantified index means we can quickly evaluate each platform or tool for optimization and produce meaningful results. We already know that a set of tools and platforms ideal for Centralized Provisioning is usually unsuited for use within a Decentralized Analytics operating model. Likewise, the critical capabilities essential to Embedded Analytics are very different from those of Governed Data Discovery. Yes, some capabilities cross operating models (e.g. metadata), and some are far more important than others. So what we are doing in this step is gathering and validating the relative importance of each so that form truly does follow function. This will become increasingly clear when building the decision models to guide our actions.

What is a decision model?
A Decision Model is a new way of looking at analytics using business logic. A key enabler sandwiched between BPM and Business Rules, the captured logic knits both together to illustrate what drives the decisions in a business. Instead of trying to capture and manage the logic one business rule at a time, a Decision Model groups the information sources, knowledge, and decisions (including the rules) into their natural logical groups to create a structure that makes the model simple to capture, understand, communicate, and manage. Using this method we will apply a proven approach for solving platform and tool optimization in the same way that proven practice suggests every analytic decision be made. DMN provides the constructs needed to model decisions, so that organizational decision-making can be readily depicted in diagrams, accurately defined by business analysts, and optionally used to specify and deploy automated decision-making. The objective is to illustrate a method to address the perplexing management challenge of platform and tool optimization. In this step we are simply using an organizing principle to continue grouping and categorizing our findings, quantifying each capability in its complexity and nuance across several facets. For more on this see the OMG specification released in September 2015.

Relative Weights
The relative weights and further refinements should reflect your site-specific needs so there is less chance of friction or semantic confusion when the decision model and the findings are shared with the stakeholders. This is a collaborative exercise where the findings are shared and confirmed with both technical and business stakeholders for agreement and validation. This usually means you (as an architect) create the baseline and then iteratively refine it with the subject matter experts and business sponsors to agree on the final results or weights that will be used. This work still remains platform, tool, and vendor agnostic. We are simply trying to identify and assign quantitative measures to evaluate which function (critical capability) is most important to each operating model. A good baseline to begin with is the Gartner work published as Critical Capabilities for Business Intelligence and Analytics Platforms this summer (12 May 2015, ID: G00270381). With this we have a reasonably good way to think about form and function across the different operating models, which Gartner refers to in their work as baseline use cases. Recall that across any analytic landscape (including big data) we are most likely to encounter one or more of the four operating models:

– Centralized Provisioning,
– Decentralized Analytics,
– Governed Data Discovery, and
– OEM/Embedded Analytics.

This seems to be a sensible way to organize the decision model we are building. Thanks to Gartner we also have a pretty good way to describe and manage the fifteen (15) groups of critical capabilities to use when comparing or seeking platform and tool optimization within each model. The baseline used includes the following groups of features, functions, and enabling tools:

– Traditional Styles of Analysis
– Analytic Dashboards and Content
– IT-Developed Reports and Dashboards
– Platform Administration
– Metadata Management
– Business User Data Mash-up
– Cloud Deployment
– Collaboration and Social Integration
– Customer Services
– Development and Integration
– Ease of Use
– Embedded Analytics
– Free Form Interactive Exploration
– Internal Platform Integration
– Mobile

The purpose in all of this is to arrive at some way to quantify which capability within each operating model is more important than the others, weighting their relative importance in satisfying need. In this step we are simply starting at a baseline. We can refine the critical analytic capabilities from this baseline to meet site-specific needs before moving on to the weighting in the next step. Note these are high-level summary weights. Each capability includes a number of different values or characteristics you can refine to any level of detail you believe necessary. They should all sum to the group's value (e.g. 20% for Platform Administration within the Centralized Provisioning model) to retain the integrity of the results.

For each of the fifteen (15) groups of critical capabilities we assign weights to be used in later steps to evaluate the relative importance of each within each operating model.
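One simple way to keep this bookkeeping honest is to store the weights per operating model and check that each model's capability weights sum to 100%. A minimal sketch follows; all weight values and the abbreviated capability lists are illustrative placeholders, not the Gartner figures — derive yours from the baseline and refine them with your stakeholders.

```python
# Hypothetical baseline: operating model -> {capability group: relative weight}.
# Values are illustrative only; tailor them to your environment.
weights = {
    "Centralized Provisioning": {
        "Platform Administration": 0.20,
        "Metadata Management": 0.15,
        "IT-Developed Reports and Dashboards": 0.15,
        "Development and Integration": 0.10,
        "All Other Capability Groups": 0.40,  # remaining groups, rolled up here
    },
    "Decentralized Analytics": {
        "Free Form Interactive Exploration": 0.20,
        "Business User Data Mash-up": 0.15,
        "Ease of Use": 0.15,
        "Analytic Dashboards and Content": 0.10,
        "All Other Capability Groups": 0.40,
    },
}

def validate(model_weights, tolerance=1e-9):
    """Confirm the capability weights for each operating model sum to 100%."""
    for model, caps in model_weights.items():
        total = sum(caps.values())
        if abs(total - 1.0) > tolerance:
            raise ValueError(f"{model}: weights sum to {total:.2f}, expected 1.00")

validate(weights)  # raises ValueError if any model's weights do not sum to 1.0
```

The same check can be applied one level down, confirming the detailed characteristics within each group sum to that group's value (e.g. 20% for Platform Administration).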


Note: the weights used in this example are based on the Gartner work referred to above. I have changed the metadata weighting to reflect my experience, and leave the balance of the work to the next step, after you have tailored this baseline to your environment and are ready to apply your own weighting.

We have already seen there are very different needs required for each of the models presented. As the decision model is introduced and developed the data points for each can be used to develop quick snapshots and quantitative indexes when evaluating the form and function for each optimization in question.

The fifteen (15) critical capabilities are now assigned relative weights within each of the four operating models. We are now at a point where the analytic community profiles can be compiled to arrive at a defensible approach to quantifying the data used in the upcoming decision model. This has also helped us clarify and understand the key capabilities that drive each operating model, which can be very different, as illustrated in the following diagram.
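Once the weights are agreed, the quantified index for any platform or tool is just the weighted sum of its capability ratings. A sketch of that calculation, using hypothetical 1–5 ratings and illustrative weights (neither are the Gartner values):

```python
def weighted_score(capability_ratings, capability_weights):
    """Quantified index: sum of (capability rating x relative weight)."""
    return sum(capability_weights[cap] * rating
               for cap, rating in capability_ratings.items())

# Illustrative weights for one operating model (must sum to 1.0) and
# hypothetical 1-5 ratings for a single platform under evaluation.
weights = {"Platform Administration": 0.20, "Metadata Management": 0.15,
           "Cloud Deployment": 0.10, "Ease of Use": 0.25, "Mobile": 0.30}
ratings = {"Platform Administration": 4, "Metadata Management": 3,
           "Cloud Deployment": 5, "Ease of Use": 2, "Mobile": 3}

print(round(weighted_score(ratings, weights), 2))  # → 3.15
```

Scoring the same platform against each operating model's weights is what surfaces the mismatches: a tool that indexes well under Centralized Provisioning can score poorly under Decentralized Analytics.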


If you enjoyed this post, please share it with anyone who may benefit from reading it. And don’t forget to click the follow button to be sure you don’t miss future posts. I am planning on compiling all the materials and tools used in this series in one place, but am still unsure what form and content would be best for your professional use. Please take a few minutes and let me know what form and format you would find most valuable.

Suggested content for premium subscribers:
Big Data Analytics – Unlock Breakthrough Results: Step Four (4)
Operating Model Mind Map (for use with Mind Jet – see for more)
Analytic Core Capability Mind Map
Enterprise Analytics Mind Map
Analytics Critical Capability Workbooks
Analytics Critical Capability Glossary, detailed descriptions, and cross-reference
Reference Library with Supporting Documents

Prior Posts in this series can be found at:


Big Data Analytics – Unlock Breakthrough Results: (Step 1)

You’ve made the big data investment. You believe Nucleus Research when it says that an investment in analytics returns a whopping thirteen (13) dollars for every one (1) dollar spent. Now it’s time to realize value. This series of posts is going to provide a detailed set of steps you can take to unlock this value in a number of ways. As a simple use case I’m going to address the perplexing management challenge of platform and tool optimization across the analytic community as an example to illustrate each step. This post addresses the first of nine practical steps to take. Although lengthy, please stick with me; I think you will find this valuable. I’m going to use a proven approach for solving platform and tool optimization in the same manner that proven practice suggests every analytic decision be made. In this case I will leverage the CRISP-DM method (there are others I have used, like SEMMA from SAS) to put business understanding front and center at the beginning of this example.

Yes, I will be eating my own dog food now (this is why a cute puppy is included in a technical post and not the Hadoop elephant) and getting a real taste of what proven practice should look like across the analytic community.  Recall the nine steps to take summarized in a prior post.

1) Gather current state analytics portfolio, interview stakeholders, and compile findings.
2) Determine the analytic operating models in use.
3) Refine Critical Analytic Capabilities as defined to meet site specific needs.
4) Weight Critical Analytic Capability according to each operating model in use.
5) Gather user profiles and simple population counts for each form of use.
6) Gather platform characteristics profiles.
7) Develop platform and tool signatures.
8) Gather data points and align with the findings.
9) Assemble findings and prepare a decision model for platform and tooling optimization.

Using the CRISP-DM method as a guideline, we find that each of the nine steps corresponds to the CRISP-DM method as illustrated in the following diagram.


Note there is some overlap between understanding the business and the data. The models we will be preparing will use a combination of working papers, logical models, databases, and the Decision Model Notation (DMN) from the OMG to wrap everything together.  In this example the output product is less about deploying or embedding an analytic decision and more about taking action based on the results of this work.

Step One – Gather Current State Portfolio
In this first step we are going to gather a deep understanding of what exists already within the enterprise and learn how the work effort is organized. Each examination should include at a minimum:

  • Organization (including its primary and supporting processes)
  • Significant Data Sources
  • Analytic Environments
  • Analytic Tools
  • Underlying technologies in use

The goal is to gather the current state analytics portfolio, interview stakeholders, and document our findings. In brief, this will become an integral part of the working papers we can build on in the steps to follow.  This is an important piece of the puzzle we are solving for. Do not even think about proceeding until this is complete. Note the following diagram (click to enlarge) illustrates the dependencies between accomplishing this field work and each component of the solution.


Unlocking Breakthrough Results – Dependency Diagram

If form follows function, this is where we begin to uncover the underlying analytic processes and how the business is organized. Understanding the business by evaluating the organization will provide invaluable clues to uncover what operating models are in use.  For example, if there is a business unit organized outside of IT and reporting to the business stakeholder, you will most likely have a decentralized analytics model in addition to the centralized provisioning most analytic communities already have in place.

Start with the organization charts, but do not stop there. I recommend you get a little closer to reality in the interview process to really understand what is occurring in the community. By examining the underlying processes this will become clear. For example, what is the analytic community really doing? Do they use a standard method (CRISP-DM) or something else? An effective way to uncover this beyond the simple organization charts (which are never up to date and are notorious for mislabeling what people are actually doing) is using a generally accepted model (like CRISP-DM) to organize the stakeholder interviews. This means we can truly understand what is typically performed by whom, using what processes to accomplish their work, and where boundary conditions exist or, in the worst case, are undefined. An example is in order. Using the CRISP-DM model we see there are a couple of clear activities that typically occur across all analytic communities. This set of processes is summarized in the following diagram (click to enlarge).


Gathering the analytic inventory and organizing the interviews now becomes an exercise in knowing what to look for using this process model. For example, diving a little deeper we can now explore how modeling is performed during our interviews guided by a generally accepted method. We can structure questions around the how, who, and what is performed for each expected process or supporting activity. Following up on this line of questioning should normally lead to samples of the significant assets which are collected and managed within an analytic inventory. Let’s just start with the modeling effort and a few directed questions.

  • Which organization is responsible for the design, development, testing, and deployment of the models?
  • How do you select which modeling techniques to use? Where are the assumptions used captured?
  • How do you build the models?
  • Where do I find the following information about each model?
    •     Parameter, Variable Pooling Settings
    •     Model Descriptions
    •     Objectives
    •     Authoritative Knowledge Sources Used
    •     Business rules
    •     Anticipated processes used
    •     Expected Events
    •     Information Sources
    •     Data sets used
    •     Description of any Implementation Components needed
    •     A Summary of Organizations Impacted
    •     Description of any Analytic Insight and Effort needed
  • Are anticipated reporting requirements identified?
  • How is model testing designed and performed?
  • Is a regular assessment of the model performed to recognize decay?

When you hear the uncomfortable silence and eyes point to the floor, you have just uncovered one meaningful challenge. Most organizations I have consulted into DO NOT have an analytic inventory, much less the metadata repository (or even a simple information catalog) I would expect to support a consistent, repeatable process. This is a key finding for another kind of work effort that is outside the scope of this discussion. All we are doing here is trying to understand what is being used to produce and deploy information products within the analytic community. And is form really following function as the organization charts have tried to depict? Really?

An important note: we are not in a process improvement effort; not just yet. Our objective is focused on platform and tool optimization across the analytic community. Believing form really does follow function, it should be clear after this step which platforms and tools are enabling (or inhibiting) an effective response to this important and urgent problem across the organization.

Significant Data Sources
The next activity in this step is to gain a deeper understanding of what data is needed to meet the new demands and business opportunities made possible with big data. Let’s begin with understanding how the raw materials or data stores can be categorized. Data may be sourced from any number of sources, including one or more of the following:

  • Structured data (from tables, records)
  • Demographic data
  • Time series data
  • Web log data
  • Geospatial data
  • Clickstream data from websites
  • Real-time event data
  • Internal text data (i.e. from e-mails, call center notes, claims, etc.)
  • External social media text data

If you are lucky there will be an enterprise data model or someone in enterprise architecture who can point to the major data sources and where the system of record resides. These are most likely organized by subject area (Customer, Account, Location, etc.) and almost always include schema-on-write structures. Although the focus is big data, it is still important to recognize that the vast majority of data collected originates in transactional systems (e.g. Point of Sale). Look for curated data sets and information catalogs (better yet, an up-to-date metadata repository like Adaptive or Alation) to accelerate this task if present.

Data in and of itself is not very useful until it is converted or processed into useful information.  So here is a useful way to think about how this is viewed or characterized in general. The flow of information across applications and the analytic community from sources external to the organization can take on many forms. Major data sources can be grouped into three (3) major categories:

  • Structured Information,
  • Semi-Structured Information and
  • Unstructured Information.

While modeling techniques for structured information have been around for some time, semi-structured and unstructured information formats are growing in importance. Unstructured data presents a more challenging effort. Many believe up to 80% of the information in a typical organization is unstructured, so this must be an important area of focus as part of an overall information management strategy. It is an area, however, where the accepted best practices are not nearly as well defined. Data standards provide an important mechanism for structuring information. Controlled vocabularies are also helpful (if available) to focus on the use of standards to reduce complexity and improve reusability. When we get to modeling platform characteristics and signatures in the later steps, the output of this work will become increasingly valuable.
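In practice, tagging each inventoried source with one of the three major categories is a quick way to see where the structured/semi-structured/unstructured weight falls in your landscape. A small sketch, where the mapping of source types to categories is illustrative and will differ by site:

```python
# Illustrative mapping of the source types listed earlier into the three
# major categories; your site-specific assignments may differ.
CATEGORY_BY_SOURCE = {
    "tables/records": "structured",
    "demographic": "structured",
    "time series": "structured",
    "geospatial": "structured",
    "web logs": "semi-structured",
    "clickstream": "semi-structured",
    "real-time events": "semi-structured",
    "internal text (e-mail, call center notes)": "unstructured",
    "external social media text": "unstructured",
}

def categorize(sources):
    """Group a list of source types by their major information category."""
    grouped = {}
    for source in sources:
        category = CATEGORY_BY_SOURCE.get(source, "unknown")
        grouped.setdefault(category, []).append(source)
    return grouped

print(categorize(["clickstream", "demographic", "external social media text"]))
```

Anything that lands in "unknown" is itself a finding: a source nobody has characterized yet.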

Analytic Landscape
I have grouped the analytic environments, tools, and underlying technologies together in this step because they are usually the easiest data points to gather and compile.

  • Environments
    Environments are usually described as platforms and can take several different forms. For example, you can group these according to intended use as follows:
    – Enterprise Data Warehouse
    – Specialized Data Marts
    – Hadoop (Big Data)
    – Operational Data Stores
    – Special Purpose Appliances (MPP)
    – Online Analytical Processor (OLAP)
    – Data Visualization and Discovery
    – Data Science (Advanced Platforms such as the SAS Data Grid)
    – NLP and Text Engineering
    – Desktop (Individual Contributor; yes think how pervasive Excel and Access are)
  • Analytic Tools
    Gathering and compiling tools is a little more interesting. There is such a wide variety of tools designed to meet several different needs, and significant overlap in functions delivered exists among them. One way to approach this is to group by intended use. Try using the INFORMS taxonomy, for example, to group the analytic tools you find. Their work identified three hierarchical but sometimes overlapping groupings for analytics categories: descriptive, predictive, and prescriptive analytics. These three groups are hierarchical and can be viewed in terms of the level of analytics maturity of the organization. Recognize there are three types of data analysis:

    • Descriptive (some have split Diagnostic into its own category)
    • Predictive (forecasting)
    • Prescriptive (optimization and simulation)

This simple classification scheme can be extended to include lower-level nodes and improved granularity if needed. The following diagram depicts the simple taxonomy developed by INFORMS, widely adopted by most industry leaders as well as academic institutions.


Source: INFORMS (Institute for Operations Research and Management Science)

Even though these three groupings of analytics are hierarchical in complexity and sophistication, the boundaries between them are not clearly separable. That is, the analytics community may be using tools to support descriptive analytics (e.g. dashboards, standard reporting) while at the same time using other tools for predictive and even prescriptive analytics capability in a somewhat piecemeal fashion. And don’t forget to include the supporting tools, which may include metadata functions, modeling notation, and collaborative workspaces for use within the analytic community.
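The INFORMS-style grouping can be captured as a simple structure and used to tag each tool in the inventory, which makes the "piecemeal" overlap visible. The lower-level nodes below are illustrative examples I have added, not the official taxonomy:

```python
# A minimal sketch of an INFORMS-style taxonomy as a nested structure.
# The lower-level nodes are illustrative, not the official taxonomy.
ANALYTICS_TAXONOMY = {
    "descriptive": ["standard reporting", "dashboards", "ad hoc queries",
                    "diagnostic analysis"],
    "predictive": ["forecasting", "classification", "propensity scoring"],
    "prescriptive": ["optimization", "simulation"],
}

def classify_tool(tool_name, tool_functions):
    """Return the taxonomy groups a tool touches, given its functions."""
    return sorted({group for group, funcs in ANALYTICS_TAXONOMY.items()
                   for f in funcs if f in tool_functions})

# A single tool often spans groups, matching the piecemeal reality above.
print(classify_tool("ToolX", ["dashboards", "forecasting"]))
# → ['descriptive', 'predictive']
```

Tools that span all three groups are exactly the candidates to scrutinize when redundancy and overlap come up later in the optimization work.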

  • Underlying technologies in use
    Technologies in use can be described and grouped as follows (this is just a simple example and is not intended to be an exhaustive compilation).

    • Relational Databases
    • MPP Databases
    • NOSQL databases
      • Key-value stores
      • Document store
      • Graph
      • Object database
      • Tabular
      • Tuple store, Triple/quad store (RDF) database
      • Multi-Value
      • Multi-model database
    • Semi and Unstructured Data Handlers
    • ETL or ELT Tools
    • Data Synchronization
    • Data Integration – Access and Delivery

Putting It All Together
Now that we have compiled the important information needed, where do we put it for the later stages of the work effort? In an organization of any size this can be quite a challenge, just due to the sheer size and number of critical facets we will need later, the number of data points, and the need to re-purpose and leverage this in a number of views and perspectives.

Here is what has worked for me. First, use a mind or concept map (Mind Jet, for example) to organize and store URIs to the underlying assets. Structure, flexibility, and the ability to export and consume data from a wide variety of sources are a real plus. The following diagram illustrates an example template I use to organize an effort like this. Note the icons (notepad, paperclip, and MS-Office), even at this high level, point to a wide variety of content gathered and compiled in the fieldwork (including interview notes and observations).


Enterprise Analytics – Mind Map Example

For larger organizations without an existing Project Portfolio Management (PPM) tool or metadata repository that supports customizations (extensions, flexible data structures) it is sometimes best to augment the maps with a logical and physical database populated with the values already collected and organized in specific nodes of the map.  A partial fragment of a logical model would look something like this, where some sample values are captured in the yellow notes.


Logical Model Fragment
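To give a concrete feel for the logical-to-physical step, here is a minimal sketch realizing a fragment like this as actual tables, using SQLite so it runs anywhere. All table and column names are hypothetical examples, not a prescribed schema:

```python
import sqlite3

# A minimal sketch of a logical-model fragment realized as physical tables.
# Table and column names are hypothetical; adapt to your own model.
ddl = """
CREATE TABLE platform (
    platform_id    INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    category       TEXT               -- e.g. 'EDW', 'Hadoop', 'OLAP'
);
CREATE TABLE analytic_tool (
    tool_id        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    taxonomy_group TEXT,              -- descriptive / predictive / prescriptive
    platform_id    INTEGER REFERENCES platform(platform_id)
);
CREATE TABLE interview_finding (
    finding_id     INTEGER PRIMARY KEY,
    stakeholder    TEXT,
    note           TEXT,              -- the yellow-note sample values
    tool_id        INTEGER REFERENCES analytic_tool(tool_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
conn.execute("INSERT INTO platform (name, category) VALUES (?, ?)",
             ("Enterprise Data Warehouse", "EDW"))
print(conn.execute("SELECT name, category FROM platform").fetchall())
# → [('Enterprise Data Warehouse', 'EDW')]
```

Keeping the mind map for navigation and a small database like this for the structured data points lets you re-purpose the same facts in the decision model later without re-keying anything.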

Armed with the current state analytics landscape (processes and portfolio), stakeholders’ contributions, and the findings compiled, we are now ready to move on to the real work at hand. In step (2) we will use this information to determine the analytics operating models in use, supported by the facts.

If you enjoyed this post, please share it with anyone who may benefit from reading it. And don’t forget to click the follow button to be sure you don’t miss future posts. I am planning on compiling all the materials and tools used in this series in one place, but am still unsure what form and content would be best for your professional use. Please take a few minutes and let me know what form and format you would find most valuable.

Suggested content for premium subscribers: 
Big Data Analytics – Unlock Breakthrough Results: (Step 1)
CRISP-DM Mind Map (for use with Mind Jet, see for more)
UML for dependency diagrams.  Use with yUML (see
Enterprise Analytics Mind Map (for use with Mind Jet)
Logical Data Model (DDL; use with your favorite tool)
Analytics Taxonomy, Glossary (MS-Office)
Reference Library with Supporting Documents

Big Data Analytics – Nine Easy Steps to Unlock Breakthrough Results

An earlier post addressed one of the more perplexing challenges to managing an analytic community of any size: the irresistible urge to cling to what everyone else seems to be doing without thinking carefully about what is needed, not just wanted. This has become more important and urgent with the breathtaking speed of Big Data adoption in the analytic community. Older management styles and obsolete thinking have created needless friction between the business and their supporting IT organizations. To unlock breakthrough results requires a deep understanding of why this friction is occurring and what can be done to reduce this unnecessary effort so everyone can get back to the work at hand.

There are two very real and conflicting views that we need to balance carefully. The first, driven by the business, is concerned with just getting the job done and lends itself to an environment where tools (and even methods) proliferate rapidly. In most cases this results in overlapping, redundant, and expensive functionality. Less concerned with solving problems once, the analytic community is characterized by many independent efforts where significant intellectual property (analytic insight) is not captured and is inadvertently placed at risk.

The second view, in contrast, is driven by the supporting IT organization charged with managing and delivering supporting services across a technology portfolio that values efficiency and effectiveness. The ruthless pursuit of eliminating redundancy, leveraging the benefits of standardization, and optimizing investment drives this behavior. So this is where the friction is introduced. Until you understand this dynamic, be prepared to experience organizational behavior that seems puzzling and downright silly at times. Questions like these (yes, they are real) seem never to be resolved:

– Why do we need another data visualization tool when we already have five in the portfolio?
– Why can’t we just settle on one NoSQL alternative?
– Is the data lake really a place to worry about data redundancy?
– Should we use the same Data Quality tools and principles in our Big Data environment?

What to Do
So I’m going to share a method to help resolve this challenge and focus on what is important, so you can spend your energy solving problems rather than creating them. Armed with a true understanding of the organizational dynamics, it is now a good time to revisit the first principle that form follows function to help resolve and mitigate what is an important and urgent problem. For more on this important principle see Big Data Analytics and Cheap Suits.

This method knits together several key components and tools to craft an approach that you may find useful. The goal is to organize and focus the right resources to ensure successful Big Data Analytic programs meet expectations. Because of the amount of content delivered, I believe I will break this down into several posts, each building on the other, to keep the relative size and readability manageable. This approach seemed to work with the earlier series on Modeling the MDM Blueprint and How to Build a Roadmap, so I think I will stick to this method for now.

The Method
First let’s see what the approach looks like independent of any specific tools or methods. This includes nine (9) steps which can be performed concurrently by both business and technical professionals working together to arrive at the suggested platform and tooling optimization decisions. Each of the nine (9) steps in this method will be accompanied by a suggested tool or method to help you prepare your findings in a meaningful way. Most of these tools are quite simple; some will be a little more sophisticated. This represents a starting point on your journey and can be extended in any number of ways to create more refined uses and re-purpose the data and facts collected in this effort. The important point is that all steps are designed to organize and focus the right resources to ensure successful Big Data Analytic programs meet expectations. Executed properly, you will find a seemingly effortless way to help:

– Reduce unnecessary effort
– Capture, manage, and operationally use analytic insight
– Uncover inefficient tools and processes and take action to remedy them
– Tie results directly to business goals constrained by scope and objectives

So presented here is a simplified method to follow to compile an important body of work, supported by facts and data to arrive at any number of important decisions in your analytics portfolio.

1) Gather current state analytic portfolio, interview stakeholders, and document findings
2) Determine the analytic operating model in use (will have more than one, most likely)
3) Refine Critical Analytic Capabilities as defined to meet site specific needs
4) Weight Critical Analytic Capability according to each operating model in use
5) Gather user profiles and simple population counts for each form of use
6) Gather platform characteristics profiles
7) Develop platform and tool signatures
8) Gather data points and align with the findings
9) Assemble findings and prepare a decision model for platform and tooling optimization

The following diagram illustrates the method graphically (click to enlarge).


In a follow-up post I will dive into each step starting with gathering current state analytic portfolio, interviewing stakeholders, and documenting your findings.  Along the way I will provide examples and tools you can use to help make your decisions and unlock breakthrough results. Stay tuned…

Modeling the MDM Blueprint – Part VI

In this series we have discussed developing the MDM blueprint by developing the Common Information (Part II), Canonical (Part III), and Operating (Part IV) models in our work. In Part V I introduced the Reference Architecture model into the mix to apply the technical infrastructure or patterns we plan on using. The blueprint has now moved from being computation and platform independent to one expressing intent through the use of a more concrete, platform-specific model. The solution specification is now documented (independent of the functional Business Requirements) to provide shared insight into the overall design solution. Now it is time to bring the modeling products together and incorporate them into an MDM solution specification we can use in many ways to communicate the intent of the project.

First, the MDM blueprint specification becomes the vehicle for communicating the system’s design to interested stakeholders at each stage of its evolution. The blueprint can be used by:

  • Downstream designers and implementers to provide overall policy and design guidance. This establishes inviolable constraints (and a certain amount of freedom) on downstream development activities.
  • Testers and integrators to dictate the correct black-box behavior of the pieces that must fit together.
  • Technical managers as the basis for forming development teams corresponding to the work assignments identified.
  • Project managers as the basis for a work breakdown structure, planning, allocation of project resources, and tracking of progress by the various teams.
  • Designers of other systems with which this one must interoperate to define the set of operations provided and required, and the protocols for their operation, that allows the inter-operation to take place.

Second, the MDM blueprint specification provides a basis for performing up-front analysis to validate (or uncover deficiencies) design decisions and refine or alter those decisions where necessary. The blueprint could be used by:

  • Architects and requirements engineers who represent the customer; the MDM blueprint specification becomes the forum for negotiating and making trade-offs among competing requirements.
  • Architects and component designers as a vehicle for arbitrating resource contention and establishing performance and other kinds of run-time resource consumption budgets.
  • Developers using vendor-provided products from the commercial marketplace, to establish the possibilities for commercial off-the-shelf (COTS) component integration by setting system and component boundaries and establishing requirements for the required behavior and quality properties of those components.
  • Architects to evaluate the ability of the design to meet the system’s quality objectives. The MDM blueprint specification serves as the input for architectural evaluation methods such as the Software Architecture Analysis Method (SAAM), the Architecture Tradeoff Analysis Method (ATAM), and Software Performance Engineering (SPE), as well as less ambitious (and less effective) activities such as unfocused design walkthroughs.
  • Performance engineers as the formal model that drives analytical tools such as rate schedulers, simulations, and simulation generators.
  • Product line managers, to determine whether a potential new member of a product family is in or out of scope, and if out, by how much.

Third, the MDM blueprint becomes the first artifact used to achieve system understanding for:

  • Technical managers as the basis for conformance checking, for assurance that implementations have in fact been faithful to the architectural prescriptions.
  • Maintainers as a starting point for maintenance activities, revealing the areas a prospective change will affect.
  • New project members, as the first artifact for familiarization with a system’s design.
  • New architects as the artifacts that (if properly documented) preserve and capture the previous incumbent’s knowledge and rationale.
  • Re-engineers as the first artifact recovered from a program understanding activity or (in the event that the architecture is known or has already been recovered) the artifact that drives program understanding activities at the appropriate level of component granularity.

Blueprint for MDM – Where this fits within a larger program

Developing and refining the MDM blueprint is typically associated with larger programs or strategic initiatives. In this last part of the series I will now discuss where all this typically fits within a larger program and how to organize and plan this work within context.  The following diagram (click to enlarge and use your browser to magnify the png file) puts our modeling efforts within the context of a larger program taken from a mix of actual engagements with large, global customers.  The key MDM blueprint components are highlighted with numbers representing:

  1. Common Information Model
  2. The Canonical Model
  3. The Operating Model
  4. The Reference Architecture
Program Management Design


I have also assumed a business case exists (you have this, right?) and the functional requirements are known. Taken together with the MDM blueprint we now have a powerful arsenal of robust information products we can use to prepare a high-quality solution specification that is relevant and can be used to meet a wide variety of needs. Typically, use of the MDM blueprint may include:

  • Identifying all necessary components and services
  • Reviewing existing progress to validate (or uncover deficiencies in) design decisions; refine or alter those decisions where necessary
  • Preparation of detailed planning products (Product, Organization, and Work Breakdown structures)
  • Program planning and coordination of resources
  • Facilitating prioritization of key requirements – technical and business
  • Development of Request for Quotation, Request for Information products (make vs. buy)
  • Preparing funding estimates (Capital and Operating Expense) and program budget preparation
  • Understanding a vendor’s contribution to the solution and pricing accordingly (for example, repurposing as needed in contract and licensing activities and decoupling supplier proprietary lock-in from the solution where appropriate)

We are also helping to ensure the business needs drive the solution by mitigating the impact of the dreaded Vendor Driven Architecture (VDA) in the MDM solution specification.


I hope you have enjoyed this brief journey through modeling the MDM blueprint and have gained something from my experience. I’m always interested in learning from others; please let me know what you have encountered yourself, and maybe we can help others avoid the pitfalls and pain in this difficult, demanding work. A key differentiator and the difference between success and failure on an MDM journey is taking the time to model the blueprint and share this early and often with the business. This is, after all, a business project, not an elegant technical exercise. In an earlier reference I mentioned Ward Cunningham’s Technical Debt concept. Recall this metaphor: doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort we have to expend in future development because of the quick and dirty design choices we have made. The technical debt and resulting interest due in an MDM initiative with this kind of far-reaching impact across the enterprise is, well, unthinkable. Take the time to develop your MDM blueprint and use this product to ensure success by clearly communicating business and technical intent to your stakeholders.

Modeling the MDM Blueprint – Part IV

In parts II and III of this series we discussed the Common Information and Canonical Models. Because MDM is a business project we need to establish a common set of models that can be referenced independent of the technical infrastructure or patterns we plan on using. Now it is time to introduce the Operating Model into the mix to communicate how the solution will actually be deployed and used to realize the benefits we expect with the business in a meaningful way.

This is the most important set of models you will undertake. And it is sadly not accounted for in practice “in the wild”: rarely seen, much less achieved. This effort describes how the organization will govern, create, maintain, use, and analyze consistent, complete, contextual, and accurate data values for all stakeholders.

There are a couple of ways to do this. One interesting approach I have seen is to use the Galbraith Star Model as an organizational design framework. The model is developed within this framework to understand what design policies and guidelines will be needed to align organizational decision making and behavior within the MDM initiative. The Star model includes the following five categories:

Strategy:
Determines direction through goals, objectives, values and mission. It defines the criteria for selecting an organizational structure (for example, functional or balanced matrix). The strategy defines the ways of making the best trade-offs between alternatives.

Structure:
Determines the location of decision-making power. Structure policies can be subdivided into:
– specialization: type and number of job specialties;
– shape: the span of control at each level in the hierarchy;
– distribution of power: the level of centralization versus decentralization;
– departmentalization: the basis to form departments (function, product, process, market or geography).

In our case this will really help when it comes time to design the entitlement and data steward functions.

Processes:
The flow of information and decision processes across the proposed organization’s structure. Processes can be either vertical, through planning and budgeting, or horizontal, through lateral relationships (matrix).

Reward Systems:
Influence the motivation of organization members to align employee goals with the organization’s objectives.

People and Policies:
Influence and define employee’s mindsets and skills through recruitment, promotion, rotation, training and development.

Now before your eyes glaze over, I’m only suggesting this be used as a starting point. We are not originating much of this thought capital, only examining the impact the adoption of MDM will have on the operating model within this framework. And, more importantly, identifying how any gaps uncovered will be addressed to ensure this model remains internally consistent. After all, we do want to enable the kind of behavior we expect in order to be effective, right? A typical design sequence starts with an understanding of the strategy as defined. This in turn drives the organizational structure. Processes are based on the organization’s structure. Structure and processes define the implementation of reward systems and people policies.

The preferred sequence in this design process is composed in the following order:
a – strategy;
b – structure;
c – key processes;
d – key people;
e – roles and responsibilities;
f – information systems (supporting and ancillary);
g – performance measures and rewards;
h – training and development;
i – career paths.

The design process can be accomplished using a variety of tools and techniques. I have used IDEF, BPMN, and other process management methods and tools (including RASIC charts describing roles and responsibilities, for example). Whatever tools you elect to use, they should effectively communicate intent and be used to validate changes with the stakeholders who must be engaged in this process. Armed with a clear understanding of how the Star model works, we can turn our attention to specific MDM model elements, to include:

Master Data Life Cycle Management processes
– Process used to standardize the way the asset (data) is used across an enterprise
– Process to coordinate and manage the lifecycle of master data
– How to understand and model the life-cycle of each business object using state machines (UML)
– Process to externalize business rules locked in proprietary applications (ERP) for use with Business Rules Management Systems (BRMS) (if you are lucky enough to have one)
– Operating Unit interaction
– Stewardship (Governance Model)
– Version and variant management, permission management, approval processes.
– Context (languages, countries, channels, organizations, etc.) and inheritance of reference data values between contexts
– Hierarchy management
– Lineage (historical), auditability, traceability
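The state-machine item above can be made concrete in code. Here is a minimal Python sketch, assuming a hypothetical five-state life cycle for a master record (your governance model will define its own states, transitions, and approval rules); the transition history doubles as a simple audit trail:

```python
from enum import Enum, auto

class State(Enum):
    DRAFT = auto()
    PROPOSED = auto()
    APPROVED = auto()
    PUBLISHED = auto()
    RETIRED = auto()

# Allowed transitions for a master data record's life cycle (hypothetical)
TRANSITIONS = {
    State.DRAFT:     {State.PROPOSED},
    State.PROPOSED:  {State.APPROVED, State.DRAFT},  # rejection returns to draft
    State.APPROVED:  {State.PUBLISHED},
    State.PUBLISHED: {State.RETIRED},
    State.RETIRED:   set(),                          # terminal state
}

class MasterRecord:
    def __init__(self, key):
        self.key = key
        self.state = State.DRAFT
        self.history = [State.DRAFT]                 # audit trail / lineage

    def transition(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.name} -> {target.name} not allowed")
        self.state = target
        self.history.append(target)

record = MasterRecord("CUST-001")
record.transition(State.PROPOSED)
record.transition(State.APPROVED)
record.transition(State.PUBLISHED)
```

Encoding the legal transitions as data, rather than scattering them through application logic, is exactly the kind of externalized rule a BRMS would manage for you.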

I know this seems like a lot of work. Ensuring success and widespread adoption of Master Data Management mandates this kind of clear understanding and shared vision among all stakeholders. We do this to communicate how the solution will actually be deployed and used to realize the benefits we expect.

In many respects this is the business equivalent to the Technical Debt concept Ward Cunningham developed (we will address this in the next part on Reference Architecture) to help us think about this problem. Recall this metaphor means doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choices we have made. The same concept applies to this effort. The most elegant technical design may be the worst possible fit for the business. The interest due in a case like this is, well, unthinkable.

Take the time to get this right. You will be rewarded with enthusiastic and supportive sponsors who will welcome your efforts to achieve success within an operating model they understand.

Modeling the MDM Blueprint – Part II

In part I of this series we discussed what essential elements should be included in an MDM blueprint. The important thing to remember is that MDM is a business project that requires establishing a common set of models that can be referenced independent of the technical infrastructure or patterns you plan on using. The blueprint should remain computation and platform independent until the models are completed (and accepted by the business) to support and ensure the business intent. The essential elements should include:

– Common Information Model
– Canonical Model
– Operating Model, and
– Reference Architecture (e.g. 4+1 views, viewpoints and perspectives).

We will now turn our attention to first element, the Common Information Model.

A Common Information Model (CIM) is defined using relational, object, hierarchical, and semantic modeling methods. What we are really developing here is rich semantic data architecture in selected business domains using:

  • Object Oriented modeling 
    Reusable data types, inheritance, operations for validating data
  • Relational
    Manage referential integrity constraints (Primary Key, Foreign Key)
  • Hierarchical
    Nested data types and facets for declaring behaviors on data (e.g. think XML Schemas)
  • Semantic models
    Ontologies defined through RDF, RDFS and OWL

I believe (others may not) that MDM truly represents the intersection of relational, object, hierarchical, and semantic modeling methods to achieve a rich expression of the reality the organization is operating in. Expressed in business terms, this model represents a “foundational principle” or theme we can pivot around to understand each facet in the proper context. This is not easy to pull off, but it will provide a fighting chance to resolve semantic differences in a way that helps focus the business on the real matter at hand. This is especially important when developing the Canonical model introduced in the next step.
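To make the object-oriented facet of the CIM a little more tangible, here is a minimal Python sketch: a reusable data type that validates itself, and an inheritance hierarchy over a shared Party supertype. The type names and validation rule are illustrative only, not drawn from any published model:

```python
from dataclasses import dataclass, field

# Reusable data type carrying its own validation (the "operations for
# validating data" aspect of object-oriented modeling)
@dataclass(frozen=True)
class CountryCode:
    value: str
    def __post_init__(self):
        if len(self.value) != 2 or not self.value.isalpha():
            raise ValueError(f"not an ISO 3166 alpha-2 code: {self.value!r}")

@dataclass
class Party:                 # shared supertype
    party_id: str
    name: str

@dataclass
class Customer(Party):       # specialization inherits the Party identity
    country: CountryCode = field(default_factory=lambda: CountryCode("US"))

c = Customer(party_id="P-100", name="Acme Ltd", country=CountryCode("GB"))
```

The relational facet would layer referential-integrity constraints over the same types, and the hierarchical facet would nest them (think XML Schema complex types), which is why treating the CIM as the intersection of these methods pays off.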

If you want to see what one of these looks like, visit the MDM Alliance Group (MAG). MAG is a community Pierre Bonnet founded to share MDM modeling procedures and prebuilt data models. The MDM Alliance Group publishes a set of prebuilt data models that include the usual suspects (Location, Asset, Party, Party Relationship, Party Role, Event, Period [Date, Time, Condition]) downloadable from the website. And some more interesting models like Classification (Taxonomy) and Thesaurus, organized across three domains. Although we may disagree about the “semantics”, I do agree with him that adopting this approach can help us avoid setting up siloed reference databases “…unfortunately often noted when using specific functional approaches such as PIM (Product Information Management) and CDI (Customer Data Integration) modeling”. How true. And a very common issue I encounter.

Another good example is the CIM developed over the years at the Distributed Management Task Force (DMTF). You can get the CIM V2.20 Schema MOF, PDF and UML at their web site and take a look for yourself. While this is not what most of us think of as MDM, they are solving for some of the same problems and challenges we face.

Even more interesting is what is happening in semantic technology. Building semantic models (ontologies) include many of the same concepts found in the other modeling methods we have already discussed but further extend the expressive quality we often need to fully communicate intent. For example:

– Ontologies can be used at run time (queried and reasoned over).
– Relationships are first-class constructs.
– Classes and attributes (properties) are set-based and dynamic.
– Business rules are encoded and organized using axioms.
– Schemas are graphs, not trees, and are used for reasoning.
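The first two points, run-time reasoning over first-class relationships, can be illustrated without any ontology tooling. This is a deliberately tiny sketch, not RDF/OWL: triples are plain tuples, and one RDFS-style rule (if x is of type C and C is a subclass of D, then x is of type D) is forward-chained to a fixed point. The class and instance names are hypothetical:

```python
# Triples: (subject, predicate, object). The predicate is data, so
# relationships are first-class constructs we can query and reason over.
triples = {
    ("Customer", "subClassOf", "Party"),
    ("Supplier", "subClassOf", "Party"),
    ("acme",     "type",       "Customer"),
}

def infer_types(facts):
    """Forward-chain one RDFS-style rule until no new facts appear:
    (x type C) and (C subClassOf D)  =>  (x type D)."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(x, "type", d)
               for (x, p, c) in inferred if p == "type"
               for (c2, p2, d) in inferred if p2 == "subClassOf" and c2 == c}
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = infer_types(triples)
# ("acme", "type", "Party") is now derivable at run time, even though
# it was never asserted directly.
```

Real ontology stores (and editors like Protégé, discussed next) do this at far greater scale and expressiveness, but the principle is the same: the model itself is queryable data.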

If you haven’t been exposed to ontology development, I encourage you to grab the open source Protégé Ontology Editor and discover for yourself what this is all about. And while you are there, see the Protégé Wiki and grab the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) for an example of its use in the EA world. Or see the set of tools found at the Essential project. The project uses this tool to enter model content, based on a model pre-built for Protégé. While you are at the Protégé Wiki, grab some of the ontologies developed for use with this tool for other examples, such as the SWEET ontologies (A Semantic Web for Earth and Environmental Terminology. Source: Jet Propulsion Laboratory). For more on this, see my post on this tool at Essential Analytics. This is an interesting and especially useful modeling method to be aware of and an important tool to have at your disposal.

This is hard challenging work. Doing anything worthwhile usually is.  A key differentiator and the difference between success and failure on your MDM journey will be taking the time to model the blueprint and sharing this work early and often with the business. We will be discussing the second element of the MDM blueprint, the Canonical model in part III. I encourage you to participate and share your professional experience.

Modeling the MDM Blueprint – Part I

Several practitioners have contributed to this complex and elusive subject (see Dan Power’s Five Essential Elements of MDM and CDI for example) and have done a good job at elaborating the essential elements.  There is one more element often overlooked in this field and remains a key differentiator and the difference between success and failure among the major initiatives I have had the opportunity to witness firsthand – modeling the blueprint for MDM.

This is an important first step to take, assuming the business case is completed and approved. It forces us to address the very real challenges up front, before embarking on a journey that our stakeholders must understand and support in order to succeed. Obtaining buy-in and executive support means we all share a common vision for what we are solving for.

MDM is more than maintaining a central repository of master data. The shared reference model should provide a resilient, adaptive blueprint to sustain high performance and value over time. An MDM solution should include the tools for modeling and managing business knowledge of data in a sustainable way. This may seem like a tall order, but consider the implications if we focus on the tactical and exclude the reality of how the business will actually adopt and embrace all of your hard work. Or worse, asking the business to stare at a blank sheet of paper and expecting them to tell you how to rationalize and manage the integrity rules connecting data across several systems, eliminate duplication and waste, and ensure an authoritative source of clean, reliable information that can be audited for completeness and accuracy. Still waiting?

So what is in this blueprint?

The essential thing to remember is that the MDM project is a business project that requires establishing a common information model that applies whatever the technical infrastructure or patterns you plan on using may be. The blueprint should remain computation and platform independent until the Operating Model is defined (and accepted by the business), and a suitable Common Information Model (CIM) and Canonical model are completed to support and ensure the business intent. Then, and only then, are you ready to tackle the Reference Architecture.

The essential elements should include:
– Common Information Model
– Canonical Model
– Operating Model, and
– Reference Architecture (e.g. 4+1 views).

I will be discussing each of these important and necessary components of the MDM blueprint in the following weeks and encourage you to participate and share your professional experience. Adopting and succeeding at Master Data Management is not easy, and jumping into the “deep end” without truly understanding what you are solving for is never a good idea. Whether you are a hands-on practitioner, program manager, or executive planner, I can’t emphasize enough how critical modeling the MDM blueprint and sharing it with stakeholders is to success. You simply have to get this right before proceeding further.