How to Build a Roadmap – Gap Analysis Update

I have received a number of requests about the tools and methods used to complete the gap analysis steps from earlier posts in the series How to Build a Roadmap. In that series I discussed the specific steps required to develop a well thought out road map, where one of the key tasks was conducting a gap analysis exercise. Understanding the limits of a medium like this, I have posted this update to explore the questions in a little more detail. I believe it will be extremely useful to anyone building a meaningful road map. The internet is full of simply awful templates and tools that range from the downright silly to the extremely dangerous in their simplistic assumptions, where there is no attempt to quantify results. Even more distressing is the lack of understanding of how to use and leverage the best data sources you already have – the professionals within your own organization. Save yourself some time and read on.

Recall that the road map development identified specific actions using an overall pattern ALL road maps should follow. The steps required to complete this work:

  1. Develop a clear and unambiguous understanding of the current state
  2. Define the desired end state
  3. Conduct a Gap Analysis exercise
  4. Prioritize the findings from the Gap Analysis exercise into a series of gap closure strategies
  5. Discover the optimum sequence of actions (recognizing predecessor – successor relationships)
  6. Develop and Publish the Road Map

The Gap Analysis step discussed how to develop a robust analytic to find any significant shortcomings between the current and desired end states. We use these findings to begin developing strategy alternatives (and related initiatives) to address what has been uncovered. Our intent is to identify and quantify the difference (delta) between where we are and what we aspire to become. This exercise is critical to finding what needs to be accomplished. The gap analysis leads to a well-organized set of alternatives and practical strategies we can use to complete the remaining work. You can review the full post here.

Gap Analysis

The goal? A quick and structured way to define actionable activities that can be reviewed and approved by all stakeholders. We would like to focus on the important actions requiring attention. This includes identifying the set of related organizational, functional, process, and technology initiatives needed. The gap closure recommendations give a clear line of sight back to what needs to be accomplished to close the “delta” or gaps uncovered in the analysis.

What is needed then is a consistent, repeatable way to evaluate quickly where an organization is, where it wants to go, and the level of effort needed to accomplish its goals with some precision. In short, the delta between current and desired state is uncovered, quantified, and ready for a meaningful road map effort based on factual findings supported by real evidence captured in the field. Performing a successful gap analysis begins with defining what you are analyzing, which could be processes, products, a region, or an entire organization. Even at the overall organizational level, knowing what aspect you are analyzing is crucial to understanding the intent and findings of the effort. Quickly focusing at the desired level of detail means we can now:

  • Know where to go; what really needs attention
  • Pinpoint opportunity…quickly.
  • Uncover what is preventing or holding back an important initiative
  • Know what to do – and in what suggested order

This is where problem solving with some quick management diagnostic tools can be applied across the variety of challenges encountered when developing a road map. Using these tools to perform the gap analysis delivers quick, distinctive results and provides the key data and actionable insight needed to develop a meaningful road map. This method (and the tools) can be used to:

  • evaluate capability using a generally accepted maturity model specific to the business,
  • focus on a specific subject area or domain; Master Data Management or Business Intelligence are two examples where known proven practice can be used as a starting point to support the findings compiled,
  • assess the readiness of an important program or evaluate why it is in trouble,
  • evaluate and uncover root cause issues with a struggling project,
  • detect and measure what requires immediate attention by uncovering weaknesses where proven practice has not been followed or adopted.

Quick Management diagnostic tools
The tool set I use follows the same general pattern and structure; only the content or values differ based on how focused the effort is and what is needed to complete the work successfully. The questions, responses, and data points gathered and compiled are usually organized in a structured taxonomy of topics. See the earlier post (Define End State) for more on this. The key is using the same engine to tabulate values based on responses that can range from zero (0 – Never or No) to five (5 – Always or Yes). Of course you can customize the responses. In fact I have done this with a Program Readiness Assessment and a Big Data Analytics tool. The real point is to quantify the responses received. The engine component takes the results, builds a summary, and produces accompanying tabs where radar graph plots present the Framework, Topic, Lookup, # Questions, Current State Scores, Desired End State Scores, and common statistical results within each grouping. The tool can be extended to include MS Word document templates which then link to the findings worksheet and grab the values and charts produced to assemble a draft document ready for further editing and interpretation. If all this sounds confusing, a couple of examples may be helpful.
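To make the tabulation engine concrete, here is a minimal sketch in Python of the pattern described above. The topic names and 0–5 scores are invented placeholders for illustration only; the actual survey content comes from the question library. Only the scoring pattern (question counts, mean current and desired scores, and the delta within each grouping) reflects the method.

from statistics import mean

# Each response records a topic grouping, the current state score, and the
# desired end state score, both on the 0 (Never or No) to 5 (Always or Yes) scale.
responses = [
    ("Data Governance", 1, 4),
    ("Data Governance", 2, 5),
    ("Data Quality", 3, 4),
    ("Data Quality", 2, 5),
    ("Data Operations", 0, 3),
]

def summarize(responses):
    """Tabulate question count, mean current, mean desired, and delta per topic."""
    by_topic = {}
    for topic, current, desired in responses:
        by_topic.setdefault(topic, []).append((current, desired))
    summary = {}
    for topic, scores in by_topic.items():
        current_mean = mean(c for c, _ in scores)
        desired_mean = mean(d for _, d in scores)
        summary[topic] = {
            "questions": len(scores),
            "current": round(current_mean, 2),
            "desired": round(desired_mean, 2),
            "delta": round(desired_mean - current_mean, 2),
        }
    return summary

for topic, row in summarize(responses).items():
    print(topic, row)

The same summary rows are what feed the radar graph plots; a charting library can take the current and desired values per topic directly.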

Using the Data Maturity Model (CMMI) to Evaluate Capability
The Data Maturity Model (DMM) was developed using the principles and structure of CMMI Institute’s Capability Maturity Model Integration (CMMI)—a proven approach to performance improvement and the gold standard for software and systems development for more than 20 years. The DMM model helps organizations become more proficient in managing critical data assets to improve operations, enable analytics and gain competitive advantage.

Using this body of knowledge and a library of source questions, we can elicit current state and desired end state responses using a simple survey. This can be conducted online, in workshops, or through traditional interviews as needed. The responses are compiled and grouped to evaluate the gap closure opportunities for an organization wishing to improve its data management practices by identifying and taking action to address the shortcomings or weaknesses identified. The framework and topic structure of the 142 questions are organized to match the DMM model.

Looking closer we find the nine (9) questions used to elicit responses related to Business Glossaries within the Data Governance topic.

1) Is there a policy mandating use and reference to the business glossary?
2) How are organization-wide business terms, definitions, and corresponding metadata created, approved, verified, and managed?
3) Is the business glossary promulgated and made accessible to all stakeholders?
4) Are business terms referenced as the first step in the design of application data stores and repositories?
5) Does the organization perform cross-referencing and mapping of business-specific terms (synonyms, business unit glossaries, logical attributes, physical data elements, etc.) to standardized business terms?
6) How is the organization’s business glossary enhanced and maintained to reflect changes and additions?
7) What role does data governance perform in creating, approving, managing, and updating business terms?
8) Is a compliance process implemented to make sure that business units and projects are correctly applying business terms?
9) Does the organization use a defined process for stakeholders to give feedback about business terms?

Responses are expected to include one or more of the following values describing current state practice and what the respondent believes is a desired end state. These can simply be placed on a scale where the following values are recorded for both current and desired outcomes.

0 – Never or No
1 – Awareness
2 – Occasionally
3 – Often
4 – Usually
5 – Always or Yes

In this example, note how the relatively simple response scale maps directly to the scoring descriptions and perspective the DMM follows.

0 – No evidence of processes performed, or unknown response.

1 – Performed: Processes are performed ad hoc, primarily at the project level. Processes are typically not applied across business areas. Process discipline is primarily reactive; for example, data quality processes emphasize repair over prevention. Foundational improvements may exist, but improvements are not yet extended within the organization or maintained. Goal: Data is managed as a requirement for the implementation of projects.

2 – Managed: Processes are planned and executed in accordance with policy; employ skilled people with adequate resources to produce controlled outputs; involve relevant stakeholders; and are monitored, controlled, and evaluated for adherence to the defined process. Goal: There is awareness of the importance of managing data as a critical infrastructure asset.

3 – Defined: A set of standard processes is employed and consistently followed. Processes to meet specific needs are tailored from the set of standard processes according to the organization’s guidelines. Goal: Data is treated at the organizational level as critical for successful mission performance.

4 – Measured: Process metrics have been defined and are used for data management. These include management of variance, prediction, and analysis using statistical and other quantitative techniques. Process performance is managed across the life of the process. Goal: Data is treated as a source of competitive advantage.

5 – Optimized: Process performance is optimized through applying Level 4 analysis to the identification of improvement opportunities. Best practices are shared with peers and industry. Goal: Data is critical for survival in a dynamic and competitive market.

The key here is capturing both the current state (what is being performed now) and the desired end state capability using this tool. The difference or delta between the two values now becomes a data set we can explore with analytic tools to reveal where the greatest challenges are. In this example the clear gaps (represented in orange and red visual cues) show where we should focus our immediate attention and call for further investigation. Yellow shaded topics are less urgent. Green shaded topics don’t need the same rigor when addressing the actions needed in the road map developed in later stages.
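A minimal sketch of how those visual cues might be assigned, assuming a 0–5 scale. The thresholds here are my own illustrative assumptions and should be calibrated to your scoring model, not taken as part of the DMM:

def gap_severity(delta, scale_max=5.0):
    """Map a desired-minus-current delta to a red/orange/yellow/green cue.

    Thresholds are illustrative assumptions; tune them to your own scale.
    """
    ratio = delta / scale_max
    if ratio >= 0.6:
        return "red"     # severe gap: immediate attention and investigation
    if ratio >= 0.4:
        return "orange"  # significant gap: investigate further
    if ratio >= 0.2:
        return "yellow"  # moderate gap: less urgent
    return "green"       # minor gap: lighter rigor in the road map

print(gap_severity(3.5))  # -> red
print(gap_severity(1.5))  # -> yellow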


Specific Subject Area – Master Data Management Assessment
In this example we can extend and focus on Master Data Management using the same principles and structure of CMMI Institute’s Capability Maturity Model Integration (CMMI), adding proven practice in the Master Data Management domain. Note the framework and topic structure are far more focused to match the MDM model framework. And the library of survey questions used here (225 questions) is far more detailed and now very much focused on Master Data Management.


Using the same scoring engine, we capture both the current state (what is being performed now) and the desired end state capability. The difference or delta between the two values again becomes a data set we can explore with analytic tools to reveal where the greatest challenges are. The clear gaps (represented in orange and red visual cues) pop off the page when the size and relative distance between desired and current practice is measured. Now there is a good idea of what needs to be addressed in the road map developed in later stages.


This is a quick way to summarize our findings and give valuable clues and direction for further investigation. We can then focus on specific problem areas using detailed schedules based on the field work to date. Based on the gaps uncovered at the higher level summary (Current vs. Desired End State), further investigation should be performed by a professional with deep subject matter expertise and intimate knowledge of generally accepted proven practice. Using the same data set, we can now begin to use interactive exploration tools to uncover significant patterns and reveal further insight.



I hope this has helped readers who have asked about how to develop and use gap analysis tools to find quickly which significant delta items (the difference between current and desired states) demand further attention. I think you can see this is a valuable way to quickly gather and compile field work and capture a fairly comprehensive view of the gaps uncovered between the current and desired end state of the subject in question. This method and set of tools can be used across a variety of management challenges, both big and small. Armed with this information we can now go ahead to step four (4) and begin to prioritize the findings from the Gap Analysis exercise into a series of gap closure strategies.

This is an invaluable way to assemble and discover the best sequence of actions (recognizing predecessor – successor relationships) as we move to developing the road map. The difference (delta) between the two states (current and desired end state) is the basis for our road map. I hope this has answered many of the questions about step three (3), Conduct a Gap Analysis exercise. This is not the only way to do this, but it has become the most consistent and repeatable method I’m aware of to perform a gap analysis quickly in my practice.

If you enjoyed this post, please share it with anyone who may benefit from reading it. And don’t forget to click the follow button to be sure you don’t miss future posts. I am planning on compiling all the materials and tools used in this series in one place, but am still unsure what form and content would be best for your professional use. Please take a few minutes and let me know what form and format you would find most valuable.

Suggested content for premium subscribers:

  • Topic Area Models (for use with Mind Jet – see for more)
  • Master Data Management Gap Analysis Assessment
  • Data Maturity Management Capability Assessment
  • Analytic Practice Gap Analysis Assessment
  • Big Data Analytic Gap Analysis Assessment
  • Program Gap Analysis Assessment
  • Program Readiness Assessment
  • Project Gap Analysis Assessment
  • Enterprise Analytic Mind Map
  • Reference Library with Supporting Documents

Big Data Analytics – Unlock Breakthrough Results: (Step 3)

In this step I will dive deeper into defining the critical capabilities used across the four operating models discussed in an earlier post (Big Data Analytics – Unlock Breakthrough Results: Step 2). This may be a little boring for many and just a little too detailed for a medium like this. I believe it is important to always define your terms and create a controlled vocabulary so there is less of a chance of friction or ambiguity in the decision model we will be developing. It seems old-fashioned and a little out of date in a world where infographics and sound bites are the preferred delivery medium. So at the risk of boring many, I’m going to just put this baseline out there and reference this work later when needed.
Capability Defined
A capability is the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services. Capability describes “the what” of the activity, but not necessarily the how. Achieving success with big data means leveraging its capability to transform raw data into the intelligence needed to realize true actionable insight. Big data is part of a much larger ecosystem and should not be viewed as a stand-alone solution independent of the other platforms available to the analytic community. The other platforms should be used to expand and amplify what is uncovered in big data, using each of their respective strengths.

Thanks to Gartner, who published Critical Capabilities for Business Intelligence and Analytics Platforms this summer (12 May 2015, ID:G00270381), we have a reasonably good way to think about form and function across the different operating models, which Gartner refers to in their work as baseline use cases. Recall that across any analytic landscape (including big data) we are most likely to encounter one or more of the four operating models:

– Centralized Provisioning,
– Decentralized Analytics,
– Governed Data Discovery, and
– OEM/Embedded Analytics.

This seems to be a sensible way to organize the decision model by describing the fifteen (15) groups of critical capabilities when comparing or seeking platform and tool optimization. The baseline used includes the following capability groups:

– Traditional Styles of Analysis
– Analytic Dashboards and Content
– IT-Developed Reports and Dashboards
– Platform Administration
– Metadata Management
– Business User Data Mash-up
– Cloud Deployment
– Collaboration and Social Integration
– Customer Service
– Development and Integration
– Ease of Use
– Embedded Analytics
– Free Form Interactive Exploration
– Internal Platform Integration
– Mobile

There are other ways to view capability. See Critical Capabilities for Enterprise Data Science, written by Dr. Jerry Smith, which addresses Data Science in depth. His work represents a significant deep dive into data science and a refinement of capability expressed at a much more granular level than I suggest here. The purpose of this effort is to organize and quantify which capabilities within each operating model are more important than the others, weighting their relative importance in satisfying need. In this step we are simply starting a baseline. We can refine the critical analytic capabilities from this baseline to meet site-specific needs before moving on to the weighting in the next step.

Note: the weights used in this example are based on the Gartner work referred to above. I have changed the metadata weighting to reflect my experience, and will leave the balance of the work to the next step, after you have tailored this baseline to your environment and are ready to apply your own weightings.
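To show how the weighting will eventually be used, here is a minimal sketch of a weighted capability score for one operating model. The weights and platform scores below are invented placeholders, not the Gartner figures; only the arithmetic (a weight for each capability multiplied by a platform’s score against that capability) illustrates the decision model mechanics.

# Hypothetical weights for Centralized Provisioning; they express relative
# importance and sum to 1.0. Replace with your own tailored weightings.
weights = {
    "IT-Developed Reports and Dashboards": 0.25,
    "Platform Administration": 0.20,
    "Metadata Management": 0.20,
    "Development and Integration": 0.15,
    "Traditional Styles of Analysis": 0.10,
    "Ease of Use": 0.05,
    "Customer Service": 0.05,
}

# Hypothetical scores (0-5) for one candidate platform against each capability.
platform_scores = {
    "IT-Developed Reports and Dashboards": 4,
    "Platform Administration": 5,
    "Metadata Management": 3,
    "Development and Integration": 4,
    "Traditional Styles of Analysis": 5,
    "Ease of Use": 3,
    "Customer Service": 4,
}

weighted_fit = sum(weights[cap] * platform_scores[cap] for cap in weights)
print(f"Weighted fit for this operating model: {weighted_fit:.2f} out of 5.00")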



We have already seen there are very different needs for each of the models presented. A set of tools and platforms which are ideal for Centralized Provisioning may be completely unsuited for use within a Decentralized operating model. Critical capability essential to Embedded Analytics is very different from Governed Data Discovery. And of course there are some essential capabilities that will be shared across all operating models (e.g. metadata).

As the decision model is introduced and developed in later steps, the data points for each can be used to develop quick snapshots and quantitative indexes when evaluating for form and function. I know this seems like a lot of work. Once completed, you can always leverage this effort when solving for what can seem like a bewildering array of choices and implications. Think of this as a way to introduce a taxonomy and a controlled vocabulary so all interested stakeholders have a way to discuss and think about each choice in a meaningful way. The following descriptions and characteristics of each of the fifteen (15) critical capabilities are presented to add additional context.

IT-Developed Reports and Dashboards
Provides the ability to create highly formatted, print-ready and interactive reports, with or without parameters. IT-authored or centrally authored dashboards are a style of reporting that graphically depicts performance measures. This includes the ability to publish multi-object, linked reports and parameters with intuitive and interactive displays; dashboards often employ visualization components such as gauges, sliders, check boxes and maps, and are often used to show the actual value of the measure compared with a goal or target value. Dashboards can represent operational or strategic information. Most often found in the Centralized Provisioning and OEM/Embedded Analytics models. Key characteristics and functions to recognize include:

– Production reporting, distribution and printing
– Parameterization, filters, prompts
– Report and dashboard navigation and guided navigation
– Design environment and document layout
– Visual components, such as gauges, sliders, dials, check boxes

Analytic Dashboards and Content
The ability to create highly interactive dashboards and content with visual exploration and embedded advanced and geospatial analytics to be consumed by others. Key features and functions include:

– Information visualizations
– Disconnected exploration
– Embedded advanced analytics
– Geospatial and location intelligence
– Content authoring
– Consumer interactivity and exploration

While this is an important capability found across all operating models, it is most important to Decentralized Analytics and Governed Discovery success.

Traditional Styles of Analysis
Ad hoc query enables users to ask their own questions of the data, without relying on IT to create a report. In particular, the tools must have a reusable semantic layer to enable users to navigate available data sources, predefined metrics, and hierarchies. Online analytical processing (OLAP) enables users to analyze data with fast query and calculation performance, enabling a style of analysis known as “slicing and dicing.” Users are able to navigate multidimensional drill paths. They also have the ability to write-back values to a database for planning and “what if?” modeling. This capability could span a variety of data architectures (such as relational, multidimensional or hybrid) and storage architectures (such as disk-based or in-memory). This capability is most often realized through:

– OLAP, and
– Ad hoc queries.

Most often found in the Centralized Provisioning model, it can be useful in Governed Discovery as well.

Platform Administration
Provides the capability to secure and administer users, scale the platform, optimize performance and ensure high availability and disaster recovery. These capabilities should be common across all platform components. This capability includes:

– Architecture
– Security
– User administration
– Scalability and performance
– High availability and disaster recovery

Almost always found in the Centralized Provisioning model and, to a lesser extent, the OEM/Embedded Analytics model.

Business User Data Mashup
“Drag and drop,” user-driven data combination of different sources and the creation of analytic models to include user-defined measures, sets, groups and hierarchies. Advanced capabilities include semantic autodiscovery, intelligent joins, intelligent profiling, hierarchy generation, data lineage and data blending on varied data sources, including multistructured data. Features to identify related to this capability include:

– Business user data mashup and joins
– Business-user-defined calculations, grouping
– Data inference
– Data profiling and enrichment
– Business user data lineage

This capability group is important to Decentralized Analytics and Governed Discovery models.

Cloud Deployment
Platform-as-a-service and analytic-application-as-a-service capabilities for building, deploying and managing analytics and analytic applications in the cloud, based on data both in the cloud and on-premises. Expect the following features and functions within this group:

– Built-in data management capabilities (including data integration and data warehouse)
– Special-purpose connectors to cloud-based data sources
– Direct connect for both cloud and on-premises data sources (hybrid)
– Packaged content
– Self-service administration
– Self-service elasticity

This capability is most important in Decentralized Analytics, Governed Discovery, and Embedded models.

Collaboration and Social Integration
Enables users to share and discuss information, analysis, analytic content and decisions via discussion threads, chat, annotations and storytelling. Think of this as the communication channel or collaborative workspace. In addition to analytic content and findings, look for:

– Story telling
– Discussion threads
– Integration with social platforms
– Timelines
– Sharing and real-time collaboration

This capability is most important to Decentralized Analytics and Governed Discovery models.

Customer Service
Relationships, products and services/programs that enable clients to be successful with the products evaluated. Specifically, this includes the ways customers receive technical support or account support. This can also include ancillary tools, customer support programs (and the quality thereof), availability of user groups, and service-level agreements. Examine the service level agreements (SLAs) and discover what the analytic community is happy with – or not.

This capability is found across all operating models.

Development and Integration
The platform should provide a set of programmatic and visual tools and a development workbench for building reports, dashboards, queries and analysis. It should enable scalable and personalized distribution, scheduling, alerts, and workflow of content and applications via email, to a portal or to mobile devices. It should include the ability to embed and customize analytic platform components in a business process, application or portal.

– External platform integration
– Embedded Analytics
– Support for big data sources (including cloud)
– Developer productivity (APIs, SDKs, versioning, and multi-developer features)
– Scheduling and alerts
– Workflow and events

This group of capabilities is important to Centralized Provisioning and OEM/Embedded Analytics models.

Usability – Ease of Use
This is a combined grouping consisting of product quality, support, availability of skills, user support (which includes training, online videos, online communities and documentation) and migration difficulty. Closely related to Customer Service but different – this is all about the content available to the analytic community.

Important across all models, and especially critical to the success of Decentralized Analytics.

Embedded Analytics
This group of capabilities includes a software developer’s kit with APIs and support for open standards — for creating and modifying analytic content, visualizations and applications, and embedding them into a business process and/or an application or portal. These capabilities can reside outside the application, reusing the analytic infrastructure, but must be easily and seamlessly accessible from inside the application, without forcing users to switch between systems. The capabilities for integrating analytics with the application architecture will enable users to choose where in the business process the analytics should be embedded. Look for:

– Capability for embedding (APIs, open standards, SDKs, component libraries)
– Capability to consume common methods (e.g., Predictive Model Markup Language (PMML) and SAS/R-based models) in the metadata layer and in a report object or analysis application

This capability is important to the success of the OEM/Embedded Analytics model.

Free Form Interactive Exploration
This group of critical capabilities enables the exploration of data, manipulation of chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the data set being analyzed. This includes an array of visualization options that go beyond those of pie, bar and line charts, including heat and tree maps, geographic maps, scatter plots and other special purpose visuals. These tools enable users to analyze the data by interacting directly with a visual representation of it. What to look for?

– Interactivity and exploration
– User experience
– Information visualizations
– Disconnected exploration
– Search-based data discovery
– Data flow
– Content authoring
– In-memory interactive analysis

This capability is most important to Decentralized Analytics and Governed Discovery models.

Internal Platform Integration
A common look and feel, install, query engine, shared metadata, promotability across all platform components.

– Integration with complementary analytic capabilities
– Ability to promote business-user-generated data mashups to the systems of record
– Common security model and administration application components across the platform
– Integrated semantic/metadata layer
– Integrated and common front-end tools

This capability is most important to Centralized Provisioning, Governed Discovery, and OEM/Embedded Analytic models.

Metadata Management
Platform and supporting tools used to enable users to leverage the same systems-of-record semantic model and metadata. They should provide a robust and centralized way for administrators to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics/KPIs, and report layout objects, parameters and so on. Administrators should have the ability to promote a business-user-defined data mashup and metadata to the systems-of-record metadata.

– Promotability
– Data modeling
– Reuse
– Connectivity and data sources
– Data lineage and impact analysis

This capability is most important to Centralized Provisioning, Decentralized Analytics, and Governed Discovery models.

Mobile
Enables organizations to develop and deliver content to mobile devices in a publishing and/or interactive mode, and takes advantage of mobile devices’ native capabilities, such as touchscreen, camera, location awareness and natural-language query.

– Content authoring and information exploration
– Information display, interaction and context awareness
– Multi-device support
– Security and administration
– Offline mode exploration

This capability is important across all operating models.

So there it is. The fifteen (15) critical capabilities organized as a baseline to be used within each of the four (4) operating models. We are now at a point where the data points can be weighted and combined with the community profiles (this is coming in another step) to arrive at a sound approach to quantifying the data used in the upcoming decision model.


If you enjoyed this post, please share it with anyone who may benefit from reading it. And don’t forget to click the follow button to be sure you don’t miss future posts. I am planning on compiling all the materials and tools used in this series in one place, but am still unsure what form and content would be best for your professional use. Please take a few minutes and let me know what form and format you would find most valuable.

Suggested content for premium subscribers: 
Big Data Analytics - Unlock Breakthrough Results: Step Three (3) 
Operating Model Mind Map (for use with Mind Jet - see for more)
Analytic Core Capability Mind Map
Enterprise Analytics Mind Map 
Analytics Critical Capability Workbooks
Analytics Critical Capability Glossary, detailed descriptions, and cross-reference
Logical Data Model (XMI - use with your favorite tool)
Reference Library with Supporting Documents

Big Data Analytics – Unlock Breakthrough Results: (Step 2)

This post is part of a larger series providing a detailed set of steps you can take to unlock breakthrough results in Big Data Analytics. The simple use case used to illustrate this method will address the perplexing management challenge of platform and tool optimization. This step is used to identify the types and nature of the operating models used within the analytic community. I’m using a proven approach for solving platform and tool optimization in the same manner that proven practice suggests every analytic decision be made. Here we are simply using an organizing principle to group and categorize our findings in what can quickly become a bewildering experience (much like herding cats) in its complexity and nuance.

Recall the nine steps to take as summarized in a prior post.

1) Gather current state Analytic Portfolio, and compile findings.
2) Determine the Analytic Operating Models in use.
3) Refine Critical Analytic Capabilities as defined.
4) Weight Critical Analytic Capability according to each operating model.
5) Gather user profiles and simple population counts for each form of use.
6) Gather platform characteristics profiles.
7) Develop platform and tool signatures.
8) Gather data points and align with the findings.
9) Assemble decision model for platform and tooling optimization.

Let’s start with examining the type and nature of the analytic operating models in use. Note an organization of any size will most likely use two or more of these models for very good reasons. I have seen all of these models employed at the same organization in my own practice. When moving on to the remaining steps it will become increasingly evident that a keen understanding of the strategy, organization, technology footprint, and culture that drives the adoption of each model is invaluable. First, let’s define our terms.

What is an operating model? 
Wikipedia defines an operating model as an abstract representation of how an organization operates across a range of domains in order to accomplish its function. An operating model breaks this system into components, showing how each works together. It helps us understand the whole. In our case we are going to focus on the analytic community and use this understanding to evaluate fit when making changes, to ensure the enabling models will still work after the recommended optimization is called for. Thanks to Gartner, who published Critical Capabilities for Business Intelligence and Analytics Platforms this summer (12 May 2015, ID:G00270381), we have a reasonably good way to think about form and function across the different operating models, which Gartner refers to in their work as baseline use cases:

– Centralized Provisioning,
– Decentralized Analytics,
– Governed Data Discovery, and
– OEM/Embedded Analytics.

You may think what you will about Gartner, but I believe they have done a good job of grouping and characterizing the signatures of the four (4) operating models, using fifteen (15) critical analytic capabilities to further decompose the form and function found within each. At a summary level the capabilities are grouped as follows.

– Traditional Styles of Analysis
– Analytic Dashboards and Content
– IT-Developed Reports and Dashboards
– Platform Administration
– Metadata Management
– Business User Data Mash-up
– Cloud Deployment
– Collaboration and Social Integration
– Customer Services
– Development and Integration
– Ease of Use
– Embedded Analytics
– Free Form Interactive Exploration
– Internal Platform Integration
– Mobile

Note: Detailed descriptions and characteristics of each of the fifteen (15) critical capabilities can be found in step three (3) where I will refine the Gartner definitions of Critical Analytic Capabilities to add additional context.

Why is this important?
Each of the four models has very different needs influenced by the strategy, footprint, and culture of the organization. Each optimization will have to recognize their differences and accommodate them to remain meaningful. A set of tools and platforms which are ideal for Centralized Provisioning are usually terrible and completely unsuited for use within a Decentralized Analytics operating model. Critical capability essential to Embedded Analytics is very different from Governed Data Discovery. Yes, there are some capabilities that cross operating models (e.g. metadata), and some that are far more important than others. In general this is a truly sound way to determine where your investment in capability should be occurring – and where it is not. Along the way you will surely stumble across very clever professionals who have solved for their own operating model limitations in ways that will surprise you. And some just downright silliness; remember culture plays a real and present role in this exercise. At a minimum I would think carefully about what you uncover across the following facets or dimensions.

  • Structure draws boundaries for each analytic community, defines the horizontal mechanisms that ensure coordination and scale, and evaluates the resource levels that reflect the roles of each. It should define the high-level organization chart if form follows function. If you look carefully, the clues to help understand and classify each model are there. And note some overlap and redundancy is expected between the models.
  • Accountability describes the roles and responsibilities of the organizational entities within each model and clarifies how organizational units come together to make effective cross-enterprise analytic decisions. This is where a lot of organizational friction can occur, resulting in undefined behaviors and unnecessary ambiguity.
  • Governance refers to the configuration and cadence for discussing and resolving issues of strategy, resource allocation (including talent), performance management and other matters under each model. Note the wide variety of skills and competencies needed under each model and the potential for a rapid proliferation of tools and methods.
  • Working describes how people collaborate across the seams that lie between different models. Behavior that’s consistent with intended values is critical to effective execution. Less understood by many, remember you really can’t do effective predictive or prescriptive analytic work without the descriptive or diagnostic data sets usually prepared by others under what is typically a very different operating model.
  • Critical Capability can be determined by using the collection referred to above to balance people, process, and technology investment. The choice of operating models has implications for the type of talent and the technology platform and tool optimization required. This collection is a suggestion only (and a good one at that); in step three I will refine it further to illustrate how to extend and refine this set of capabilities.

Step Two – Determine the operating models in use
In this step we are going to gather a deep understanding of the characteristics within each operating model, where they differ, and what common components and critical capability are shared. If you read the Gartner reference, they consider metadata to be most heavily weighted in the Centralized Provisioning and Governed Discovery models. Based on my experience it is just as critical (and perhaps even more so) in the Decentralized model as well, especially in the Big Data world where tools like Alation, Adaptive, and Tamr are becoming essential to supporting discovery and self-service capability. The rest of this post will briefly describe the key characteristics of each operating model and their signature attributes, and highlight a few differences to help determine which operating models are employed.

Centralized Provisioning

The classic model used for years in the delivery of what has been referred to as business intelligence. Typically we would find tight management controls to push through centralized strategy and efficiency, usually at a high cost. Tightly managed processes for collecting and cleaning data before consumption can be found in the classic patterns associated with Extract, Transform, and Load operations into a data warehouse or mart. Most often characterized by formal processes where a developer or specialist collects business requirements from the users and then creates sanctioned reports and dashboards for them on trusted data. Centralized provisioning enables an information consumer to access their Key Performance Indicators (KPIs) from an information portal – increasingly on a mobile device or embedded in an analytic application – to measure the performance of the business. Interactivity and discovery in centrally developed content is limited to what is designed in by the content author. Seven of the fifteen most important capabilities needed in this model would include:

– IT-Developed Reports and Dashboards
– Traditional Styles of Analysis
– Platform Administration
– Development and Integration
– Metadata Management
– Ease of Use
– Customer Services

Decentralized Analytics

The opposite of centralized provisioning, this model or loose confederation encourages local optimization and entrepreneurial drive. Look for a community that rapidly and interactively explores trends or detects patterns in data sets, often from multiple sources, to identify opportunities or risks with minimal support from the IT development community. Interactivity and discovery in this model is NOT limited to what is designed in by the content authors we find in the Centralized Provisioning model. The users are the content authors. Users of platforms and tools that excel at the decentralized analytics model can explore data using highly interactive descriptive analytics (“what happened” or “what is happening”) or diagnostic analytics (“Why did something happen?”, “Where are areas of opportunity or risk?”, and “What if?”). Because of embedded advanced analytic functions offered by many vendors, users can extend their analysis to some advanced descriptive analysis (for example, clustering, segmenting and correlations) and to a basic level of predictive analytics (for example, forecasting and trends). They can also prepare their own data for analysis, reducing their reliance on IT and improving time to insight. As decentralized analytics becomes more pervasive, the risk of multiple sources of the truth emerges and information governance itself becomes a real challenge. Six of the fifteen most important capabilities needed in this model would include:

– Analytic Dashboards and Content
– Free Form Interactive Exploration
– Business User Data Mashup and Modeling
– Metadata Management
– Ease of Use
– Customer Services

Governed Data Discovery

A hybrid of centralized and decentralized, this model is best characterized by offering freedom within a framework to enhance transparency and effectiveness. This model features business users’ ability to prepare and combine data, and to explore and interact visually with this data, enabling discovery to be deployed and managed across the enterprise. With the success of data discovery tools in driving business value, there is an increasing demand to use data discovery capabilities for a broader range of analysis and an expanded set of users than previously addressed by traditional reporting and dashboards. Governed data discovery enables users to access, blend and prepare data, then visually explore, find and share patterns with minimal IT support using their own technical and statistical skills. At the same time, this model must also satisfy enterprise requirements for business-user-generated model standards, data reuse and governance. In particular, users should be able to reuse sanctioned and approved business-user-created data or data sets, derived relationships, derived business models, derived KPIs, and metrics that support analyses.

Governed data discovery can enable pervasive deployment of data discovery in the enterprise at scale without proliferating data discovery tooling sprawl. The expanded adoption of data discovery also requires analytic leaders to redesign analytics deployment models and practices, moving from an IT-centric to an agile and decentralized, yet governed and managed approach. This would include putting in place a prototype, pilot and production process in which user-generated content is created as a prototype. Some of these prototypes would need to be used in recurring analysis and promoted to a pilot phase. Successful pilots are promoted to production and operationalized for regular analysis as part of the system of record. Each step provides more rigor and structure in governance and Quality Assurance testing. Business user data mashup and modeling, administration, and metadata capabilities should be evaluated based on the following characteristics, which differentiate a Governed model from the Decentralized Analytics model discussed earlier. Pursuing the following questions will help define the differences.

– Where are permissions enabled on business models?
– Who can access shared data connections and data sets?
– Who can create and publish data sets?
– Who can access shared user work spaces to publish visualizations?
– Is there shared metadata about usage, connections and queries?
– Are usage, connections and queries monitored?
– Is there an information catalog available to enable discovery?

Eight of the fifteen most important capabilities needed in this model would include:
– Analytic Dashboards and Content
– Free Form Interactive Exploration
– Business User Data Mashup and Modeling
– Internal Platform Integration
– Platform Administration
– Metadata Management
– Ease of Use
– Customer Services

Embedded Analytics

In this model analytics (decisions, business rules, and processes) are integrated into the organization to capture economies of scale and consistency across planning, operations, and customer experience. It is most typically found where developers are using software development kits (SDKs) and related APIs to include advanced analytics and statistical functions within application products. These capabilities are used to create and modify analytic content, visualizations and applications and embed them into a business process, application or portal. Analytic functions can reside outside the application, reusing the infrastructure, but should be easily and seamlessly accessible from inside the application, without forcing users to switch between systems. The ability to integrate analytics with the application architecture will enable the analytic community to choose where in the business process the analytics should be embedded. One example of a critical capability for embedding advanced analytics would be consuming a SAS/R or PMML model to create advanced models embedded in dashboards, reports or data discovery views. Six of the fifteen most important capabilities needed in this model would include:

– Embedded (includes both developer and embedded advanced analytics)
– Cloud Deployment
– Development and Integration
– Mobile
– Ease of Use
– Customer Services

Putting It All Together
Believing form really does follow function, it should be clear after this step which operating models are driving the platforms and tools that are enabling (or inhibiting) effective performance. Using the Gartner work and the refinements I have extended it with, we can now see at a glance which core capabilities are most important to each model, as illustrated in the following diagram. This will become a key input to consider when assembling the decision model and discovering platform and tooling optimization in the later steps.

Now that this step is completed it is time to turn our attention to further refining the critical analytic capabilities as defined and begin weighting each according to their relative importance to each operating model.  It will become increasingly clear why certain critical capabilities essential to one model will be less important to another when this task is completed.

If you enjoyed this post, please share it with anyone who may benefit from reading it. And don’t forget to click the follow button to be sure you don’t miss future posts. I am planning on compiling all the materials and tools used in this series in one place, but am still unsure what form and content would be best for your professional use. Please take a few minutes and let me know what form and format you would find most valuable.

Suggested content for premium subscribers: 
Big Data Analytics - Unlock Breakthrough Results: Step Two (2) 
Operating Model Mind Map (for use with Mind Jet - see for more)
Analytic Core Capability Mind Map
Enterprise Analytics Mind Map 
Analytics Critical Capability Workbooks
Analytics Critical Capability Glossary, detailed descriptions, and cross-reference
Logical Data Model (XMI - use with your favorite tool)
Reference Library with Supporting Documents

What is Natural Language Processing?

Before proceeding with the Building Better Systems series I thought I should write a quick post over the weekend about the underlying Natural Language Processing (NLP) and text engineering technologies proposed in the solution. I have received a lot of questions about this since I posted How to build better systems – the specification and mentioned this key element, along with a few assumptions that are not quite correct about just what this technology is all about. This is completely understandable. Unless you are in voice recognition, machine learning, or artificial intelligence, NLP is not a mainstream subject many of us are familiar with. I’m not going to pretend to know the subject as well as others who are dedicated to this field. There are others who know this subject inside out and can discuss the science behind it far better than I can. Nor do I intend to go into a detailed exposition describing just how powerful and useful this technology can be. I will share with you how I was able to take advantage of the work done to date and use a couple of tools with NLP components already embedded to automate many of the labor intensive tasks usually performed when evaluating system specifications. My focus is seeking a way to automate and uncover defects in our specifications as quickly as possible. This can be accomplished by uncovering defects in our specifications related to:

  • Structural Completeness/Alignment
  • Text Ambiguity and Context
  • Section Quantity/Size
  • Volatility
  • Lexicon Discovery
  • Plain Language; word complexity density – complex words
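As a taste of the last item in this list, a complex-word density measure is easy to sketch. The three-or-more-syllable rule below uses a crude vowel-group count rather than a real syllable parser, so treat it as an illustrative assumption of the kind of metric a tool computes, not a production implementation:

import re

def syllable_estimate(word):
    """Crude syllable count: runs of vowels in the word (rough heuristic only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def complex_word_density(text):
    """Fraction of words with three or more estimated syllables."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    complex_words = [w for w in words if syllable_estimate(w) >= 3]
    return len(complex_words) / len(words)

sample = "The system shall facilitate instantaneous reconciliation of the data."
print(f"Complex word density: {complex_word_density(sample):.0%}")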

NLP and text engineering can do this for us and much more. Rather than elaborate any further, I think it is better to just see it in action with a few good examples. Fortunately the University of Illinois Cognitive Computation Group has already done this for all of us and created a site offering a little taste of what can be done. Their demonstration page using NLP has several clear demonstrations of the following key concepts:

Natural Language Analysis

  • Coreference Resolution
  • Part of Speech Tagging
  • Semantic Role Labeling
  • Shallow Parsing
  • Text Analysis

Entities and Information Extraction

  • Named Entity Recognition
  • Named Entity Recognizer (often using extended entity type sets)
  • Number Quantization
  • Temporal Extraction and Comparison


Similarity and Paraphrasing

  • Context Sensitive Verb Paraphrasing
  • LLM (Lexical Level Matching)
  • Named Entity Similarity
  • Relation Identification
  • Word Similarity

For example, say we wanted to parse and process this simple sentence:

“Time Zones are used to manage the various campaigns that are executed to ensure customers are called within the allowed campaigning window which is 8 AM through 8 PM.”

Using the text analysis demonstration at the site, we can copy and paste this simple phrase into the dialog box and submit it for processing. The results are returned in the following diagram.



The parser has identified things (entities), guarantees, purpose, verbs (use, manage, call, and execute), and time within this simple sentence. I think you can see now how powerful this technology can be when used to collect, group, and evaluate text. While this is impressive enough, think of the possibilities for using this technology to process page after page of business and system requirements.
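This post leans on the Cognitive Computation Group demo pages, but the same kind of extraction can be sketched locally with an open-source library. The example below uses spaCy, which is my own substitution for illustration (the demo site is not built on spaCy), to pull the entities, verbs, and time expressions out of the sample sentence above.

import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("Time Zones are used to manage the various campaigns that are "
        "executed to ensure customers are called within the allowed "
        "campaigning window which is 8 AM through 8 PM.")
doc = nlp(text)

# Named entities; the TIME spans should pick up "8 AM" and "8 PM".
for ent in doc.ents:
    print(ent.text, ent.label_)

# The verb lemmas the parser identifies (use, manage, execute, call, ...).
print(sorted({tok.lemma_ for tok in doc if tok.pos_ == "VERB"}))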

There is also a terrific demonstration illustrating what the Cognitive Computation Group calls The Wikification system. Use their examples or plug in your own text for processing. Using the Wikification example you can insert plain text to be parsed and processed to identify entities with Wikipedia articles. The result set returned includes the live links to visit the corresponding Wikipedia page and the categories associated with each entity. Here is an example from an architecture specification describing (at a high level) Grid Service Agent behavior.

The Grid Service Agent (GSA) acts as a process manager that can spawn and manage Service Grid processes (Operating System level processes) such as Grid Service Manager and Grid Service Container. Usually, a single GSA is run per machine. The GSA can spawn Grid Service Managers, Grid Service Containers, and other processes. Once a process is spawned, the GSA assigns a unique id for it and manages its life cycle. The GSA will restart the process if it exits abnormally (exit code different than 0), or if a specific console output has been encountered (for example, an Out Of Memory Error).

The result returned is illustrated in the following diagram. Note the parser here has categorized and selected context specific links to public Wikipedia articles (and related links) to elaborate on the objects identified where such articles exist.


The more ambitious among us (with an internal Wiki) can understand how powerful this can be. Armed with the source code this is potentially a truly wonderful application processor to link dynamic content (think system specifications or requirements) in context back to an entire knowledge base.

If you really want to get your hands dirty and dive right in, there are two widely known frameworks for natural language processing.

  • GATE (General Architecture for Text Engineering)
  • UIMA (Unstructured Information Management Architecture)

GATE is a Java suite of tools originally developed at the University of Sheffield and now used worldwide by a wide community of scientists and companies for all sorts of natural language processing tasks. It is readily available and works well with Protégé (a semantic editor) for those of you into ontology development.

UIMA (Unstructured Information Management Architecture) was originally developed by IBM but is now maintained by the Apache Software Foundation.

If you need to input plain text and identify entities (such as persons, places, and organizations) or relations (such as works-for or located-at), this may be the way to do it. Access to both frameworks is open, and there is a large community of developers supporting both initiatives.

For myself, I’m going to use a wonderful commercial product available in the cloud (and on-premise if needed) called The Visible Thread to illustrate many of the key concepts. Designed for just this need, it also comes with collaboration features already built in and a pretty handy user interface for managing the collection (the fancy word for this is corpus; look it up and impress your friends) of documents I need to evaluate. As an architect my primary mission is to reduce unnecessary effort, so it just makes sense to use this. Of course, if you want to build this yourself there are several options available using GATE or UIMA to develop your own text processor to evaluate system specification documents. I hope you will take the time to try some of the samples at the University of Illinois Cognitive Computation Group site to get a feel for what is going on under the covers as I discuss the methods and techniques around building better systems through high quality specifications in the coming weeks. You don’t need to be a rocket scientist (something I can’t relate to either) to appreciate the underlying technology and the concepts you can take advantage of to make meeting this important and urgent need a reality in your own practice.

How to build a Roadmap – Define Current State


In an earlier post (How to Build a Roadmap) I discussed the specific steps required to develop a defensible, well thought out road map that identifies specific actions using an overall pattern all roadmaps should follow. The steps required to complete this work:

  1. Develop a clear and unambiguous understanding of the current state
  2. Define the desired end state
  3. Conduct a Gap Analysis exercise
  4. Prioritize the findings from the Gap Analysis exercise into a series of gap closure strategies
  5. Discover the optimum sequence of actions (recognizing predecessor – successor relationships)
  6. Develop and Publish the Road Map

I have received a lot of questions about step one (1), which is understandable given the lack of detail about just how to quickly gather real quantifiable objectives and potential functional gaps. In the interest of simplicity, and in my attempt to keep the length of the prior post manageable, specific details about how to do this were omitted.

This post will provide a little more exposition and insight into one method I have found useful in practice. Done well, it can provide an honest and objective look in the mirror to successfully understand where we truly are as an organization and, in some cases, face the uncomfortable truth of where we need to improve. The reality of organizational dynamics and politics (we are human after all) can distort the reality we are seeking here and truly obscure the findings. I think this happens in our quest to preserve the preferred “optics” when there is no objective and shared method all stakeholders are aware of and approve before embarking down this path. In the worst case, if left in the hands of outside consultants alone or an unskilled practitioner, we risk creating more harm than good before even starting. This is why I will present a quick, structured way to gather and evaluate current state that can be reviewed and approved by all stakeholders before the activity even gets underway. Our objective is to develop a clear and unambiguous understanding of the current state. We should have a formal, well understood way to gather and evaluate the results.

Define Current State

First, we need a shared, coherent set of questions – relevant and quantifiable – we can use to draw participants into the process.  This set of questions should be able to be compiled, evaluated, and presented in a way that is easy to understand. Everyone participating in the work should be able to immediately grasp where the true gaps or shortcomings exist and why.  This is true whether we are evaluating Information Strategy, our readiness to embrace a SOA initiative, or the launch of a new business initiative. So, we need just a few key components.

  • A pool or set of relevant questions that can be answered quickly by the participants, with results that can be quantified
  • An engine to compile the results
  • A quick way to summarize the results for distribution to the stakeholders

Now this is where many of us come up short.  Where do I find the questions, and how do I make sure they are relevant? What is this engine I need to compile the results? And how do I quickly compile the results dynamically and publish them for comment every time I need to? One of the answers for me came a few years ago when I first saw the MIKE 2.0 quick assessment engine for Information Maturity.

The Information Maturity (IM) Quick Scan is the MIKE 2.0 tool used to assess current and desired Information Maturity levels within an organization. This survey instrument is broad in scope and is intended to assess enterprise capabilities as opposed to focusing on a single subject area.  Although this instrument focuses on Information Maturity, I realized quickly I had been doing something similar for years across many other domains. The real value here is the open source resource you can use to kick start your own efforts.

So what does this do?

I’m going to focus on the MIKE 2.0 tools here because they are readily available to you. The MS Office templates you need can be found on the MIKE 2.0 site.

Extending these templates into other subject areas is pretty simple once you understand how they work. The basic premise remains the same; it really is just a matter of injecting your own subject matter expertise and organizing the results in a way that makes sense to you and your organization.  So here is what you will find there.

First a set of questions organized around the following categories:

  • People/Organization considers the human side of Information Management, looking at how people are measured, motivated and supported in related activities.  Those organizations that motivate staff to think about information as a strategic asset tend to extract more value from their systems and overcome shortcomings in other categories.
  • Policy considers the message to staff from leadership.  The assessment considers whether staff are required to administer and maintain information assets appropriately and whether there are consequences for inappropriate behaviours.  Without good policies and executive support it is difficult to promote good practices even with the right supporting tools.
  • Technology covers the tools that are provided to staff to properly meet their Information Management duties.  While technology on its own cannot fill gaps in the information resources, a lack of technological support makes it impractical to establish good practices.
  • Compliance surveys the external Information Management obligations of the organization.  A low compliance score indicates that the organization is relying on luck rather than good practice to avoid regulatory and legal issues.
  • Measurement looks at how the organization identifies information issues and analyses its data.  Without measurement, it is impossible to sustainably manage the other aspects of the framework.
  • Process and Practice considers whether the organization has adopted standardized approaches to Information Management.  Even with the right tools, measurement approaches and policies, information assets cannot be sustained unless processes are consistently implemented.  Poor processes result in inconsistent data and a lack of trust by stakeholders.

The templates include an engine to compile the results and an MS Word document template to render and present the results. Because it is based on MS Office, the Assessment_Questions.xlsx, Assessment_Engine.xlsx, and Assessment_Report.docx are linked using absolute paths (hardcoded to look for the linked files in the c:\assessment folder rather than using relative links – yikes!), so you open and score the Assessment_Questions first, then the Assessment_Engine picks up these values and creates a nice tabbed interface and charts across all six subject areas. The Word document picks this up further and creates the customized report.

You can extend this basic model to include your own relevant questions in other domains (for example ESB or SOA related, or Business Intelligence).  We are going to stick with the Information Maturity quick scan for now. Note I have extended a similar model to include SOA Readiness, BI/DW, and Business Strategy Assessments.

How it Works

The questions in the quick scan are organized around six (6) key groups in this domain: Organization, Policy, Technology, Compliance, Measurement, and Process/Practice.  The results are tabulated based on responses (in the case of the MIKE 2.0 template) ranging from zero (0 – Never) to five (5 – Always).  Of course you can customize the response scale; the real point here is that we want to quantify the responses received.

The engine component takes the results, builds a summary, and produces accompanying tabs where radar plots present the Framework, Topic, Lookup, # Questions, Total Score, Average Score, and Optimum within each grouping.  The MS Word document template then links to this worksheet and grabs the values and radar charts produced to assemble the final document. If all this sounds confusing, please grab the templates and try them for yourself.
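If you would rather see the mechanics outside of Excel, here is a minimal sketch of the same engine idea in Python: average the 0–5 responses within each group and render a radar chart. The response data below is hypothetical and the script assumes the numpy and matplotlib packages are installed; this is not the MIKE 2.0 engine itself, just the concept.

```python
# A minimal "assessment engine" sketch: average hypothetical 0-5 survey
# responses per category and plot a radar chart, mirroring the MIKE 2.0 idea.
import numpy as np
import matplotlib.pyplot as plt

categories = ["Organization", "Policy", "Technology",
              "Compliance", "Measurement", "Process/Practice"]
# Hypothetical responses (0 = Never ... 5 = Always), one list per category
responses = {
    "Organization": [3, 4, 2, 3],
    "Policy": [2, 2, 3, 1],
    "Technology": [4, 3, 4, 4],
    "Compliance": [1, 2, 2, 2],
    "Measurement": [2, 1, 1, 2],
    "Process/Practice": [3, 3, 2, 3],
}

averages = [np.mean(responses[c]) for c in categories]

# Lay the categories out around the circle and close the polygon
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]
values = averages + averages[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 5)  # the optimum score is 5 (Always)
plt.title("Information Maturity Quick Scan (hypothetical data)")
plt.show()
```

The resulting chart makes the gaps obvious at a glance: categories scoring well below the optimum of 5 are exactly where attention is needed.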

Define Current State Model

Each of these six (6) perspectives is then summarized and combined into an MS Word document to present to the stakeholders.


I think you can see this is a valuable way to reduce complexity and to gather, compile, and present a fairly comprehensive view of the current state of the domain in question (in this case Information Management Maturity). Armed with this quantified information we can now proceed to step 3 and conduct a Gap Analysis exercise based on our understanding of the desired end state. The delta between these two (current and desired end state) becomes the basis for our road map development.  I hope this has answered many of the questions about step one (1), Define Current State. This is not the only way to do this, but it has become the most consistent and repeatable method I’m aware of to quickly define current state in my practice.

Modeling the MDM Blueprint – Part IV

In parts II and III of this series we discussed the Common Information and Canonical Models. Because MDM is a business project we need to establish a common set of models that can be referenced independent of the technical infrastructure or patterns we plan on using. Now it is time to introduce the Operating Model into the mix to communicate how the solution will actually be deployed and used to realize the benefits we expect with the business in a meaningful way.

This is the most important set of models you will undertake. And sadly it is rarely accounted for in practice “in the wild” – rarely seen, much less achieved. This effort describes how the organization will govern, create, maintain, use, and analyze consistent, complete, contextual, and accurate data values for all stakeholders.

There are a couple of ways to do this. One interesting approach I have seen is to use the Galbraith Star Model as an organizational design framework. The model is developed within this framework to understand what design policies and guidelines will be needed to align organizational decision making and behavior within the MDM initiative. The Star model includes the following five categories:

Strategy:
Determines direction through goals, objectives, values and mission. It defines the criteria for selecting an organizational structure (for example functional or balanced matrix). The strategy defines the ways of making the best trade-off between alternatives.

Structure:
Determines the location of decision making power. Structure policies can be subdivided into:
– specialization: type and number of job specialties;
– shape: the span of control at each level in the hierarchy;
– distribution of power: the level of centralization versus decentralization;
– departmentalization: the basis to form departments (function, product, process, market or geography).

In our case this will really help when it comes time to design the entitlement and data steward functions.

Processes:
The flow of information and decision processes across the proposed organization’s structure. Processes can be either vertical, through planning and budgeting, or horizontal, through lateral relationships (matrix).

Reward Systems:
Influence the motivation of organization members to align employee goals with the organization’s objectives.

People and Policies:
Influence and define employee’s mindsets and skills through recruitment, promotion, rotation, training and development.

Now before your eyes glaze over, I’m only suggesting this be used as a starting point. We are not originating much of this thought capital, only examining the impact the adoption of MDM will have on the operating model within this framework – and, more importantly, identifying how any gaps uncovered will be addressed to ensure this model remains internally consistent. After all, we do want to enable the kind of behavior we expect in order to be effective, right? A typical design sequence starts with an understanding of the strategy as defined. This in turn drives the organizational structure. Processes are based on the organization’s structure. Structure and processes define the implementation of reward systems and people policies.

The preferred sequence in this design process follows this order:
a – strategy;
b – structure;
c – key processes;
d – key people;
e – roles and responsibilities;
f – information systems (supporting and ancillary);
g – performance measures and rewards;
h – training and development;
i – career paths.

The design process can be accomplished using a variety of tools and techniques. I have used IDEF, BPMN, and other process management methods and tools (including, for example, RASIC charts describing roles and responsibilities). Whatever tools you elect to use, they should effectively communicate intent and be used to validate changes with the stakeholders who must be engaged in this process. Armed with a clear understanding of how the Star model works we can turn our attention to specific MDM model elements, to include:

Master Data Life Cycle Management processes
– Process used to standardize the way the asset (data) is used across an enterprise
– Process to coordinate and manage the lifecycle of master data
– How to understand and model the life-cycle of each business object using state machines (UML; see the sketch after this list)
– Process to externalize business rules locked in proprietary applications (ERP) for use with Business Rules Management Systems (BRMS), if you are lucky enough to have one
– Operating Unit interaction
– Stewardship (Governance Model)
– Version and variant management, permission management, approval processes.
– Context (languages, countries, channels, organizations, etc.) and inheritance of reference data values between contexts
– Hierarchy management
– Lineage (historical), auditability, traceability
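As promised above, here is a minimal sketch of what modeling a business object’s life-cycle as a state machine can look like in code. The states and transitions are hypothetical, invented for illustration; a real model would come out of the UML state modeling exercise with the business and its data stewards.

```python
# A minimal master-record lifecycle state machine. The states and transitions
# are hypothetical examples, not a prescribed MDM lifecycle.
from enum import Enum

class State(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    ACTIVE = "active"
    RETIRED = "retired"

# Allowed transitions: current state -> set of valid next states
TRANSITIONS = {
    State.DRAFT: {State.PENDING_APPROVAL},
    State.PENDING_APPROVAL: {State.ACTIVE, State.DRAFT},  # approve, or send back
    State.ACTIVE: {State.RETIRED},
    State.RETIRED: set(),
}

class MasterRecord:
    def __init__(self):
        self.state = State.DRAFT

    def transition(self, new_state: State) -> None:
        # Reject anything the governance model has not explicitly allowed
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"Illegal transition: {self.state} -> {new_state}")
        self.state = new_state

record = MasterRecord()
record.transition(State.PENDING_APPROVAL)
record.transition(State.ACTIVE)
print(record.state)  # State.ACTIVE
```

The point of making the transitions explicit is governance: an approval process or stewardship rule becomes a visible, testable entry in the transition table rather than a convention buried in application code.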

I know this seems like a lot of work. Ensuring success and widespread adoption of Master Data Management mandates this kind of clear understanding and shared vision among all stakeholders. We do this to communicate how the solution will actually be deployed and used to realize the benefits we expect.

In many respects this is the business equivalent of the Technical Debt concept Ward Cunningham developed (we will address this in the next part, on Reference Architecture) to help us think about this problem. Recall the metaphor: doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort we have to expend in future development because of the quick and dirty design choices we have made. The same concept applies to this effort. The most elegant technical design may be the worst possible fit for the business. The interest due in a case like that is, well, unthinkable.

Take the time to get this right. You will be rewarded with enthusiastic and supportive sponsors who will welcome your efforts to achieve success within an operating model they understand.

Modeling the MDM Blueprint – Part II

In part I of this series we discussed what essential elements should be included in an MDM blueprint. The important thing to remember is that MDM is a business project that requires establishing a common set of models that can be referenced independent of the technical infrastructure or patterns you plan on using. The blueprint should remain computation and platform independent until the models are completed (and accepted by the business) to support and ensure the business intent. The essential elements should include:

– Common Information Model
– Canonical Model
– Operating Model, and
– Reference Architecture (e.g. 4+1 views, viewpoints and perspectives).

We will now turn our attention to the first element, the Common Information Model.

A Common Information Model (CIM) is defined using relational, object, hierarchical, and semantic modeling methods. What we are really developing here is a rich semantic data architecture in selected business domains using the methods below (a small object-oriented sketch follows the list):

  • Object Oriented modeling 
    Reusable data types, inheritance, operations for validating data
  • Relational
    Manage referential integrity constraints (Primary Key, Foreign Key)
  • Hierarchical
    Nested data types and facets for declaring behaviors on data (e.g. think XML Schemas)
  • Semantic models
    Ontologies defined through RDF, RDFS and OWL
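As a small illustration of the object-oriented facet above (reusable data types, inheritance, and operations for validating data), here is a sketch in Python. The Party and Customer types are hypothetical examples invented for illustration, not a prescribed CIM.

```python
# A minimal object-oriented modeling sketch: a reusable base type, inheritance,
# and a validation operation. Party/Customer are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Party:                       # reusable data type
    name: str

    def validate(self) -> bool:    # operation for validating data
        return bool(self.name.strip())

@dataclass
class Customer(Party):             # inheritance: a Customer is-a Party
    customer_id: str = ""

    def validate(self) -> bool:
        # A Customer must satisfy the Party rules plus its own
        return super().validate() and bool(self.customer_id)

print(Customer(name="Acme Corp", customer_id="C-001").validate())  # True
```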

I believe (others may not) that MDM truly represents the intersection of relational, object, hierarchical, and semantic modeling methods to achieve a rich expression of the reality the organization is operating in.  Expressed in business terms this model represents a “foundation principle” or theme we can pivot around to understand each facet in the proper context.  This is not easy to pull off, but it provides a fighting chance to resolve semantic differences in a way that helps focus the business on the real matter at hand. This is especially important when developing the Canonical model introduced in the next step.

If you want to see what one of these looks like, visit the MDM Alliance Group (MAG).  MAG is a community Pierre Bonnet founded to share MDM modeling procedures and prebuilt data models.  The MDM Alliance Group publishes a set of prebuilt data models that include the usual suspects (Location, Asset, Party, Party Relationship, Party Role, Event, Period [Date, Time, Condition]), downloadable from the website, along with some more interesting models like Classification (Taxonomy) and Thesaurus, organized across three domains. Although we may disagree about the “semantics,” I do agree with him that adopting this approach can help us avoid setting up the siloed reference databases “…unfortunately often noted when using specific functional approaches such as PIM (Product Information Management) and CDI (Customer Data Integration) modeling”.  How true – and a very common issue I encounter.

Another good example is the CIM developed over the years at the Distributed Management Task Force (DMTF). You can get the CIM V2.20 Schema MOF, PDF and UML at their web site and take a look for yourself. While this is not what most of us think of as MDM, they are solving for some of the same problems and challenges we face.

Even more interesting is what is happening in semantic technology. Building semantic models (ontologies) involves many of the same concepts found in the other modeling methods we have already discussed, but further extends the expressive quality we often need to fully communicate intent. For example (a small runnable sketch follows this list):

– Ontologies can be used at run time (queried and reasoned over).
– Relationships are first-class constructs.
– Classes and attributes (properties) are set-based and dynamic.
– Business rules are encoded and organized using axioms.
– Schemas are graphs, not trees, and can be used for reasoning.
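Here is the small sketch promised above: a tiny ontology built with the open source rdflib package, showing a relationship (worksFor) as a first-class construct with a declared domain and range, queried at run time with SPARQL. The example.org vocabulary is hypothetical, invented for illustration.

```python
# A minimal semantic-model sketch using rdflib (pip install rdflib).
# The ex: vocabulary (Party, Organization, worksFor) is hypothetical.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/mdm#")
g = Graph()
g.bind("ex", EX)

# Classes, and a relationship declared as a first-class property
g.add((EX.Party, RDF.type, RDFS.Class))
g.add((EX.Organization, RDF.type, RDFS.Class))
g.add((EX.worksFor, RDF.type, RDF.Property))
g.add((EX.worksFor, RDFS.domain, EX.Party))
g.add((EX.worksFor, RDFS.range, EX.Organization))

# Instance data
g.add((EX.alice, RDF.type, EX.Party))
g.add((EX.acme, RDF.type, EX.Organization))
g.add((EX.alice, EX.worksFor, EX.acme))

# Queried at run time with SPARQL
query = "SELECT ?who ?org WHERE { ?who <http://example.org/mdm#worksFor> ?org }"
for row in g.query(query):
    print(row.who, "works for", row.org)
```

Notice how the relationship itself carries semantics (its domain and range) that a reasoner or query engine can exploit at run time – exactly the expressive quality the list above describes.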

If you haven’t been exposed to ontology development I encourage you to grab the open source Protégé Ontology Editor and discover for yourself what this is all about.  And while you are there, see the Protégé Wiki and grab the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) for an example of its use in the EA world.  Or see the set of tools found at the Essential project, which uses Protégé to enter model content based on a pre-built model.  While you are at the Protégé Wiki grab some of the ontologies developed for use with this tool for other examples, such as the SWEET Ontologies (A Semantic Web for Earth and Environmental Terminology; source: Jet Propulsion Laboratory).  For more on this, see my post on this tool at Essential Analytics. This is an interesting and especially useful modeling method to be aware of and an important tool to have at your disposal.

This is hard, challenging work. Doing anything worthwhile usually is.  A key differentiator – and the difference between success and failure on your MDM journey – will be taking the time to model the blueprint and share this work early and often with the business. We will discuss the second element of the MDM blueprint, the Canonical model, in part III. I encourage you to participate and share your professional experience.