Big Data Analytics and Cheap Suits

Sometimes I just want to staple my head to the carpet and wonder how to help others manage the seemingly irresistible urge to cling to what everyone else seems to be doing without thinking carefully about what is needed, not just wanted.  I will be discussing a topic I have been buried in for the last couple of years in the Big Data Analytics space, which most everyone by now is familiar with.  The technology is sound, evolving quickly, and solves problems I could not imagine attacking a decade ago.  On the other hand, the breathtaking speed of this platform adoption has left many scratching their heads and wondering why the old familiar rules of thumb and proven practice just don’t seem to work well anymore.  Less current management styles and obsolete thinking have created needless friction between the business and their supporting IT organizations.  This never ends well, but does keep me very busy.

First let’s put this challenge in perspective with a little context.  Over my career there have been a number of times when the need for efficient, cost effective data analysis has forced a change in existing technologies. The move to a relational model occurred when older methods could no longer reliably handle changes to structured data, which led to the shift toward a data storage paradigm modeled on relational algebra. This created a fundamental shift in data handling, introducing a variety of tools and techniques that made all of our lives more rewarding. The current revolution in technology referred to as Big Data has happened because the relational data model can no longer efficiently handle the current needs for analysis of large and unstructured data sets. It is not just that data is bigger than before, or any of the other Vs (Variety, Volume, Velocity, Veracity, and Volatility) others have written about.  All of these data characteristics have been steadily growing for decades. The Big Data revolution is really a fundamental shift in architecture, just as the shift to the relational model was a shift that changed all of us. This shift means building new capabilities, adopting new tools, and thinking clearly about solving the right problems with the right tools the right way.  This means we need to truly understand what critical analytic capability is needed and make a focused investment in time and energy to realize this opportunity.  This should sound familiar to any of you working in this space. Many are already answering some of the obvious questions we should address at a minimum.

– When do we use a big data platform as opposed to the other platforms available?
– What are the platform drivers or key characteristics beyond storage and advanced analytics?
– Is low latency, real time application access required?
– How about availability and consistency requirements (see the CAP theorem for more on this)?
– Workload characteristics – consistent flows or spikes?
– What is the shape of the data (e.g. structured, unstructured, and streaming)?
– Is there a need to integrate with existing data warehouse or other analytic platforms?
– How will the data be accessed by the analytic community and supporting applications?

Note that last question carefully; this is where the fun starts.

Why? There are two very real and conflicting views that we need to balance carefully.

The first, driven by the business, is concerned with just getting the job done and lends itself to an environment where tools (and even methods) proliferate rapidly. In most cases this results in overlapping, redundant, and expensive functionality.  Less concerned with solving problems once, the analytic community is characterized by many independent efforts where significant intellectual property (analytic insight) is not captured, is likely put at risk, and is not even re-used across the organization by others solving the same question.  There are very good reasons for this; it is completely understandable when the end justifies the means, and getting to the end game is the rewarded behavior. As with a cheap suit, the analytic community simply doesn’t believe one size fits all. And I agree.

The second view, in contrast, is driven by the supporting IT organization charged with managing and delivering supporting services across a technology portfolio that values efficiency and effectiveness.  The ruthless pursuit of eliminating redundancy, leveraging the benefits of standardization, and optimizing investment drives this behavior.  I think it is easy to see where the means becomes the critical behavioral driver and the end is just assumed to resolve itself.  It is just like cheap suits, which are designed to be mass-produced, use standard materials, and provide just enough (and no more) detail to get by with the average consumer (if there really is such a thing).  Is there really an average analytic consumer? No; there is not (see the user profile tool in the next post for more). And I do agree with this view as well; there are very sound reasons why this view remains valid.

So this is where the friction is introduced. Until you understand this dynamic, get ready for endless meetings, repeated discussions about capability (and what it means), and organizational behavior that seems puzzling and downright silly at times.  Questions like these (yes, these are real) never seem to be resolved.

– Why do we need another data visualization tool when we already have five in the portfolio?
– Why can’t we just settle on one NoSQL alternative?
– Is the data lake really a place to worry about data redundancy?
– Should we use the same Data Quality tools and principles in our Big Data environment?

What to Do

So I’m going to share a method to help resolve this challenge and help focus on what is important so you can expend your nervous system solving problems rather than creating them. Armed with a true understanding of the organizational dynamics, it is now a good time to revisit a first principle to help resolve what is an important and urgent problem.

First Principle: Form follows function.

The American architect Louis Sullivan coined the phrase, saying “It is the pervading law of all things organic and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. This is the law”. This has since become known by its more familiar phrase, “form follows function”.

It is truly interesting that Sullivan developed the shape of the tall steel skyscraper in late 19th Century Chicago at the very moment when technology, taste and economic forces converged and made it necessary to drop the established styles of the past. If the shape of the building was not going to be chosen out of the old pattern book something had to determine form, and according to Sullivan it was going to be the purpose of the building. It was “form follows function”, as opposed to “form follows precedent”. Sullivan’s assistant Frank Lloyd Wright adopted and professed the same principle in slightly different form perhaps because shaking off the old styles gave them more freedom and latitude.

Sound familiar? It should, for any of us actively adopting this technology. This is where the challenge of using tried and true proven practice meets the reality of shaking off the old styles and innovating where and when it is needed in a meaningful, controlled, and measured manner.

So if form follows function, let’s see what makes sense. Thanks to Gartner, who published Critical Capabilities for Business Intelligence and Analytics Platforms this summer (12 May 2015 ID:G00270381), we have a reasonably good way to think about form and function.  You may think what you will about Gartner, but I believe they have done a good job of grouping and characterizing fourteen (14) critical capabilities for analytics across four (4) different operating models (Gartner referred to them as baseline use cases) as follows.

– Centralized Provisioning
– Decentralized Analytics
– Governed Data Discovery
– OEM/Embedded Analytics

In this case capabilities are defined as “the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services”.  They grouped the capabilities in question into fourteen (14) major categories, including:

– Analytic Dashboards and Content
– Platform Administration
– Business User Data Mashup
– Cloud Deployment
– Collaboration and Social Integration
– Customer Services
– Development and Integration
– Ease of Use
– Embedded Analytics
– Free Form Interactive Exploration
– Internal Platform Integration
– IT-Developed Reports and Dashboards
– Metadata Management
– Mobile
– Traditional Styles of Analysis

Note there may be more than one operating model or baseline use case delivery scenario in use at your organization.  I just completed an engagement where three of the four operating models are in use.  This is exactly where the friction and confusion are created between IT Management and the Analytic Community. Not every problem is a nail for which a hammer is useful. A set of tools and platforms which are ideal for Centralized Provisioning are usually terrible and completely unsuited for use within a Decentralized Analytics operating model.  The critical capability essential to Embedded Analytics is very different from that of Governed Data Discovery.  Yes, there are some essentials that cross operating models (e.g. metadata), and in general this is a truly sound way to determine where your investment in capability should be occurring – and where it is not. In short, form follows function.  This is extremely helpful in establishing a common vocabulary that all stakeholders can understand when making analytic portfolio investments or simply selecting the right tool for the right job.

In a follow-up post I will provide an example and some simple tools you can use to help make these decisions and remain committed to delivering value. After all, there is another principle we should always remember. Analysis for analysis’ sake is just plain ridiculous.  Or, as Tom Davenport said, “…If we can’t turn that data into better decision making through quantitative analysis, we are both wasting data and probably creating suboptimal performance”.

Stay tuned…

Design Goals

In my last post (Wide open spaces) we discussed the elegance of using space-based architecture platforms based on their simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate.  Recall, this model combines and integrates distributed caching, content-based distributed messaging, and parallel processing into a powerful architecture within a grid computing framework.

That was a mouthful. You may want to read that last sentence again carefully. And think about what this means to you as a professional practitioner.  More importantly, how this may change the way you think about application platforms in general.

Before diving into this important concept, I think it is always a good idea to express our stated design goals right up front – and use these to guide the inevitable trade-offs and decisions that will need to be made along this journey. So let’s get started with a few design goals I’m comfortable with. I’m sure there are more, but this represents a good start.

The platform’s ability to scale must be completely transparent

The architecture should be based on technology that can be deployed across a grid of commodity hardware nodes, providing a scalable and adaptable platform that supports high-volume, high-performance processing. The resulting platform should be tolerant of failure in individual nodes, should be easy to match to changing volumes by increasing (or decreasing) the number of processing nodes, and, by virtue of its decoupled business logic, should be extensible and adaptable as the business landscape changes.

Unlike conventional application server models, our elastic application platform should not require application developers to do anything different in their code in order to scale. The developer uses a simple API that provides a vast key-value data store that looks like a large shared memory space. Underneath the covers, distributed caching features of the application platform spread the data across multiple servers (e.g. using a sophisticated hash algorithm). The application developer should remain unaware of the underlying implementation that distributes the data across the servers on their behalf. In brief, the grid-enabled middleware is designed to hide the complexities of partitioning, distributing, and load balancing.
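
To make the idea concrete, here is a minimal sketch of what "transparent" scaling looks like from the developer's side. It is not any vendor's API; the class name PartitionedStore, its methods, and the use of plain Python dicts as stand-ins for servers are all my own assumptions for illustration. The caller only sees put and get, while the hash-based placement stays hidden.

```python
import hashlib


class PartitionedStore:
    """Hypothetical key-value facade that hides which node holds each key."""

    def __init__(self, node_count):
        # Each dict stands in for the memory of one server in the grid.
        self._nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash (Python's built-in hash() is salted per process).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def node_sizes(self):
        # Exposed only so we can observe how the data was spread.
        return [len(node) for node in self._nodes]


# The application code never mentions nodes or partitions.
store = PartitionedStore(node_count=4)
for i in range(1000):
    store.put(f"customer:{i}", {"id": i})
```

Note that simple modulo placement like this remaps most keys when the node count changes; real platforms typically use something closer to consistent hashing so that adding a node moves only a fraction of the data.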

The platform provides resiliency by design

Applications must be available to customers and expected service level objectives must be met.  The business cannot afford a single point of failure to impact customer access to other features and functions of the customer applications suite otherwise available. The platform should operate continuously and needs to be highly resilient to avoid any interruption in processing. This means that the application suite cannot have any single point of failure in the software, hardware, or network. High Availability (HA) is a basic requirement.  Failing services and application components will continue on different backup servers, without service disruption.

Distributed data caches are resilient by design because they should automatically replicate data stored in the cache to one or more backup servers, guided by the policies defined by an administrator and executed in a consistent controlled manner. If one server fails, then another server provides the data (the more replicas, the more resilient the cache). Note, distributed data caches can be vulnerable to data center outages if all the compute servers are located in the same physical data center. To address this weakness, the distributed caching mechanism should offer special WAN features to replicate and recover data across multiple physical locations. The improvement in resilience reduces the risk of expensive system down-time due to hardware or software failure, allowing the business to continue operating, albeit with reduced performance, during partial outages. An added benefit is that an architecture composed of discrete units working together enables rapid development and a controlled introduction of new features in response to changing requirements, without the need for a big-bang rollout approach.
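
The replication behavior described above can be sketched in a few lines. This is a toy model under my own assumptions (the ReplicatedCache class, the ring-style choice of backups, and the fail method are all hypothetical), not how any particular product implements it. It only shows the basic contract: every write lands on a primary and its backups, and a read survives the loss of the primary.

```python
class ReplicatedCache:
    """Hypothetical cache that writes each entry to a primary and N backups."""

    def __init__(self, node_count, backups=1):
        if backups >= node_count:
            raise ValueError("need more nodes than backups")
        self._nodes = [{} for _ in range(node_count)]
        self._alive = [True] * node_count
        self._backups = backups

    def owners(self, key):
        # The primary is chosen by hash; backups are the next nodes in the ring.
        # hash() is stable within one process, which is all this sketch needs.
        first = hash(key) % len(self._nodes)
        return [(first + i) % len(self._nodes) for i in range(self._backups + 1)]

    def put(self, key, value):
        for index in self.owners(key):
            if self._alive[index]:
                self._nodes[index][key] = value

    def get(self, key):
        # Try the primary first, then each backup in turn.
        for index in self.owners(key):
            if self._alive[index] and key in self._nodes[index]:
                return self._nodes[index][key]
        raise KeyError(key)

    def fail(self, index):
        # Simulate a server crash: the node's memory is lost.
        self._alive[index] = False
        self._nodes[index].clear()


cache = ReplicatedCache(node_count=3, backups=1)
cache.put("order:1", "pending")
cache.fail(cache.owners("order:1")[0])   # lose the primary
value_after_failure = cache.get("order:1")  # served by the backup
```

With one backup the data survives a single node failure; losing both owners of a key still loses the entry, which is why the number of replicas is a policy decision.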

The platform is prepared to meet demanding performance requirements

A performance characteristic of distributed caches is that they store data in fast-access memory rather than on disk, although backing store on disk may be an option. Since this data spans multiple servers, there is no bottleneck or single point of failure. Using this advanced elastic application platform provides a means to ensure that cached data will tend to be on the same server where application code is processing, reducing network latency. We can do this by implementing a “near-cache” concept that places data on the server running the application using that data or by directly managing application code execution in the platform, placing adjacent code and data in cache nodes that are on the same server.
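
Here is a rough sketch of the near-cache idea, assuming a remote cache that exposes a dict-like get. The NearCache class, its counters, and the simple oldest-first eviction are my own illustrative choices, not a description of a specific platform.

```python
class NearCache:
    """Hypothetical small local cache placed in front of a slower remote cache."""

    def __init__(self, remote, max_entries=1000):
        self._remote = remote            # any object with a dict-like get()
        self._local = {}
        self._max_entries = max_entries
        self.local_hits = 0
        self.remote_reads = 0

    def get(self, key):
        if key in self._local:
            self.local_hits += 1
            return self._local[key]
        # Miss: pay the network hop once, then keep a local copy.
        self.remote_reads += 1
        value = self._remote.get(key)
        if value is not None:
            if len(self._local) >= self._max_entries:
                # Evict the oldest inserted entry (dicts keep insertion order).
                self._local.pop(next(iter(self._local)))
            self._local[key] = value
        return value

    def invalidate(self, key):
        # Called when the remote copy changes, so stale data is not served.
        self._local.pop(key, None)


remote = {"sku:1": "widget", "sku:2": "gadget"}   # stand-in for the data grid
near = NearCache(remote)
near.get("sku:1")   # goes to the remote cache
near.get("sku:1")   # served locally
```

The trade-off is staleness: a local copy is only as fresh as the last invalidation, so a real platform has to push invalidations or expire entries.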

The platform needs to support robust integration with other data sources

Most distributed caching platforms offer read-through, write-through, and write-behind features to synchronize data in the cache with external data sources. Rather than the developer having to write the code that does this, an administrator configures the cache to automatically read or write to a database or other external data source whenever an application performs a data operation in the cache. Data is an asset that is valuable. Sharing this asset across the platform improves the ability to support better data enrichment, improve accuracy and meet business goals.
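
The three synchronization styles can be illustrated with a small sketch. Everything here is assumed for illustration: the SyncedCache class, the write_behind flag, and the dict standing in for the external database. In a real platform this behavior would come from configuration rather than from code like this.

```python
from collections import deque


class SyncedCache:
    """Hypothetical cache in front of a backing store with a configurable write policy."""

    def __init__(self, backing_store, write_behind=False):
        self._store = backing_store      # dict standing in for a database
        self._cache = {}
        self._write_behind = write_behind
        self._pending = deque()          # queued writes for write-behind mode

    def get(self, key):
        # Read-through: a cache miss is loaded from the backing store.
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]

    def put(self, key, value):
        self._cache[key] = value
        if self._write_behind:
            # Write-behind: acknowledge now, persist later.
            self._pending.append((key, value))
        else:
            # Write-through: persist before returning.
            self._store[key] = value

    def flush(self):
        # Drain queued writes to the backing store; returns how many were written.
        written = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._store[key] = value
            written += 1
        return written


database = {"account:1": 100}
through = SyncedCache(database)
through.put("account:2", 250)            # visible in the database immediately

behind = SyncedCache(database, write_behind=True)
behind.put("account:3", 75)              # not yet in the database
pending_before_flush = "account:3" in database
behind.flush()
```

Write-behind trades durability for latency: anything still in the queue is lost if the node fails before flush runs, which is why it is usually paired with replication.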

The platform’s application workload is by nature distributed

For elastic application platforms offering distributed code execution we should consider the nature of the workload the applications will present to servers. If we can divide the workload into units that naturally fit into the distribution schemes as offered then the greater sophistication of the distributed code execution capability can be just what’s needed to turn a troublesome, resource intensive application into one that performs well and meets expectations.
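
As a small illustration of dividing a workload into units, here is a sketch that splits a list of orders into partitions and processes each partition independently. The function names and the order data are made up, and a local thread pool stands in for the grid nodes, so treat it as a shape of the idea rather than a distributed implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` buckets, round-robin."""
    buckets = [[] for _ in range(parts)]
    for position, item in enumerate(items):
        buckets[position % parts].append(item)
    return buckets


def process_partition(orders):
    """Work done independently on one node: value of its share of orders."""
    return sum(quantity * price for quantity, price in orders)


def total_value(orders, workers=4):
    # Each partition is processed on its own worker; partial results are combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_partition, partition(orders, workers)))


orders = [(2, 10), (1, 5), (3, 4)]   # (quantity, unit price) pairs
grand_total = total_value(orders)
```

This only works cleanly because each unit can be computed without knowing about the others; a workload that needs constant cross-partition chatter will not benefit in the same way.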

Specific application responsibilities that repeat (or are redundant) across the application architecture should be separated out.  Shared global or common use application functional solutions are sometimes referred to as “cross-cutting concerns” and advance the key principle of “separation of concerns”. The platform should support component designs which minimize coupling; the Law of Demeter (the Principle of Least Knowledge, or “only know your neighbor”) applies. The platform should promote loose coupling by minimizing:

  • dependency between modules (e.g. shared global variables)
  • content coupling (one module relying on another’s content)
  • protocol or format dependencies
  • control-based coupling, where one program controls another’s behavior
  • non-traceable message coupling, which can lead to dynamic, spaghetti-like results that are impossible to manage

There are other goals I have not addressed here which we should all be familiar with to include:

  • Desire to BUY vs. Build and Maintain
  • Remain Technology and Vendor Independent
  • Promote Interoperability
  • Meet security and privacy needs

So, now we have a better idea of the design goals we are going to try to achieve. I think it is always important to take these goals to the next step in the high-level specification and begin translating them into actionable, quantifiable objectives. Remember our original strategy, which has driven our design goals. The design goals should now be used to create quantifiable objectives we can plan against and measure progress toward.

Wide open spaces

Wide open spaces - Wyoming

Okay, okay – I know I should keep this blog more up to date; I have just been a little busy with my day job… and now, after a much needed rest in the last weeks of August, I can share a few things you may find especially interesting and timely.  It is no coincidence that the image accompanying this post is of wide open spaces. This is in fact where I spent the most satisfying part of my “summer vacation”.  And spaces (Tuple Spaces) are what I intend to share with you in the coming weeks.

As architects we have a professional responsibility to always remain on the look-out for new (and sometimes revisited) ideas and for ways to improve and adopt them, especially when our business needs to invest in some key technology changes to remain competitive and keep delivering the distinctive quality of service and value customers will continue to seek.

I have been pretty busy in the last year engaged in a variety of industries where road map development and execution of next generation platforms and paradigm shifts were needed.  Many of the more difficult challenges were solved by adopting a Space-Based Architecture (SBA) pattern. This is a demonstrated pattern used to achieve near linear scalability of stateful, high-performance applications using tuple spaces. This is not a new idea; the tuple space model was developed by David Gelernter over thirty years ago at Yale University. Implementations of tuple spaces have also been developed for Smalltalk, Java (JavaSpaces), and the .NET framework. A tuple space is an implementation of the associative memory model for parallel (distributed) computing that provides a repository of tuples that can be accessed concurrently. I know, this is a mouthful and a little too academic for me too. What this really means is we can group processors that produce pieces of data and group processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple spaces may be thought of as a form of distributed shared memory. The model is closely related to other patterns that have proved successful in addressing the application scalability challenge, used by Google and Amazon (EC2) for example. The model has also been applied by many firms in the securities industry to implement scalable electronic securities trading applications.
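
If the producer and consumer description still feels abstract, a toy version may help. The sketch below is a single-process, in-memory tuple space written for illustration only; the TupleSpace class and the trade tuples are my own assumptions, and it is not JavaSpaces or any commercial product. It borrows the Linda-style operation names out and rd, and uses take for the destructive read because `in` is a reserved word in Python. A None field in a pattern acts as a wildcard.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space with Linda-style operations."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, candidate):
        # None in a pattern is a wildcard; every other field must be equal.
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def _find(self, pattern):
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def out(self, tup):
        """Producer side: post a tuple into the space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, pattern, timeout=None):
        """Return a matching tuple without removing it, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                return None
            return self._find(pattern)

    def take(self, pattern, timeout=None):
        """Remove and return a matching tuple, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                return None
            found = self._find(pattern)
            self._tuples.remove(found)
            return found


space = TupleSpace()
space.out(("trade", "IBM", 100))
space.out(("trade", "MSFT", 50))

peeked = space.rd(("trade", "MSFT", None))     # read by pattern, tuple stays
claimed = space.take(("trade", None, None))    # first matching tuple is removed
```

The point to notice is that producers and consumers never reference each other; they only agree on the shape of the tuples, which is what makes it easy to add more of either.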

Before you think I have gone daft on you, I recommend you see a commercial implementation of this at Gigaspaces.  Review the site and developer documentation and you will see how this platform is used to embrace many of the principles of Representational State Transfer (REST), service-oriented architecture (SOA) and Event-driven architecture (EDA), as well as elements of grid computing.  The beauty of the space based architecture resides in its tandem of simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate.          

The pattern represents a model that combines and integrates distributed caching (Data Grid), content-based distributed messaging (Messaging Grid), and parallel processing (Processing Grid) into a powerful service oriented architecture built on shared spaces within a grid computing framework. Research results and commercial use have shown that a large number of problems in parallel and distributed computing have been solved using this architecture. And the implications of its adoption extend well beyond high performance On-Line Transaction Processing into other uses (Master Data Management, Complex Event Processing, and Rules Processing, for example).

And this is what I intend to share with you in the coming weeks. 
Wide open spaces…

Modeling the MDM Blueprint – Part VI

In this series we have discussed developing the MDM blueprint by developing the Common Information (part II), Canonical (part III), and Operating (part IV) models in our work. In Part V I introduced the Reference Architecture model into the mix to apply the technical infrastructure or patterns we plan on using. The blueprint has now moved from being computation and platform independent to one of expressing intent through the use of more concrete platform specific models.  The solution specification is now documented (independent of the functional Business Requirements) to provide shared insight into the overall design solution.  Now it is time to bring the modeling products together and incorporate them into an MDM solution specification we can use in many ways to communicate the intent of the project.

First, the MDM blueprint specification becomes the vehicle for communicating the system’s design to interested stakeholders at each stage of its evolution. The blueprint can be used by:

  • Downstream designers and implementers to provide overall policy and design guidance. This establishes inviolable constraints (and a certain amount of freedom) on downstream development activities.
  • Testers and integrators to dictate the correct black-box behavior of the pieces that must fit together.
  • Technical managers as the basis for forming development teams corresponding to the work assignments identified.
  • Project managers as the basis for a work breakdown structure, planning, allocation of project resources, and tracking of progress by the various teams.
  • Designers of other systems with which this one must interoperate to define the set of operations provided and required, and the protocols for their operation, that allows the inter-operation to take place.

Second, the MDM blueprint specification provides a basis for performing up-front analysis to validate (or uncover deficiencies in) design decisions and refine or alter those decisions where necessary. The blueprint could be used by:

  • Architects and requirements engineers who represent the customer, as the forum for negotiating and making trade-offs among competing requirements.
  • Architects and component designers as a vehicle for arbitrating resource contention and establishing performance and other kinds of run-time resource consumption budgets.
  • Development using vendor-provided products from the commercial marketplace to establish the possibilities for commercial off-the-shelf (COTS) component integration by setting system and component boundaries and establishing requirements for the required behavior and quality properties of those components.
  • Architects to evaluate the ability of the design to meet the system’s quality objectives. The MDM blueprint specification serves as the input for architectural evaluation methods such as the Software Architecture Analysis Method (SAAM), the Architecture Tradeoff Analysis Method (ATAM), and Software Performance Engineering (SPE), as well as less ambitious (and less effective) activities such as unfocused design walkthroughs.
  • Performance engineers as the formal model that drives analytical tools such as rate schedulers, simulations, and simulation generators.
  • Development product line managers to determine whether a potential new member of a product family is in or out of scope, and if out, by how much.

Third, the MDM blueprint becomes the first artifact used to achieve system understanding for:

  • Technical managers as the basis for conformance checking, for assurance that implementations have in fact been faithful to the architectural prescriptions.
  • Maintainers as a starting point for maintenance activities, revealing the areas a prospective change will affect.
  • New project members, as the first artifact for familiarization with a system’s design.
  • New architects as the artifacts that (if properly documented) preserve and capture the previous incumbent’s knowledge and rationale.
  • Re-engineers as the first artifact recovered from a program understanding activity or (in the event that the architecture is known or has already been recovered) the artifact that drives program understanding activities at the appropriate level of component granularity.

Blueprint for MDM – Where this fits within a larger program

Developing and refining the MDM blueprint is typically associated with larger programs or strategic initiatives. In this last part of the series I will now discuss where all this typically fits within a larger program and how to organize and plan this work within context.  The following diagram puts our modeling efforts within the context of a larger program taken from a mix of actual engagements with large, global customers.  The key MDM blueprint components are highlighted with numbers representing:

  1. Common Information Model
  2. The Canonical Model
  3. The Operating Model
  4. The Reference Architecture
Program Management Design

I have also assumed a business case exists (you have this, right?) and the functional requirements are known.  Taken together with the MDM blueprint we now have a powerful arsenal of robust information products we can use to prepare a high quality solution specification that is relevant and can be used to meet a wide variety of needs.  Typically, use of the MDM blueprint may include:

  • Identifying all necessary components and services
  • Reviewing existing progress to validate (or uncover deficiencies in) design decisions; refine or alter those decisions where necessary
  • Preparation of detailed planning products (Product, Organization, and Work Breakdown structures)
  • Program planning and coordination of resources
  • Facilitating prioritization of key requirements – technical and business
  • Development of Request for Quotation, Request for Information products (make vs. buy)
  • Preparing funding estimates (Capital and Operating Expense) and program budget preparation
  • Understanding a vendor’s contribution to the solution and pricing accordingly (for example, repurpose as needed in contract and licensing activities and decouple supplier proprietary lock-in from the solution where appropriate)

We are also helping to ensure the business needs drive the solution by mitigating the impact of the dreaded Vendor Driven Architecture (VDA) in the MDM solution specification.


I hope you have enjoyed this brief journey through Modeling the MDM blueprint and have gained something from my experience.  I’m always interested in learning from others, so please let me know what you have encountered yourself, and maybe we can help others avoid the pitfalls and pain in this difficult, demanding work.  A key differentiator between success and failure on an MDM journey is taking the time to model the blueprint and share this early and often with the business.  This is after all a business project, not an elegant technical exercise.  In an earlier reference I mentioned Ward Cunningham’s Technical Debt concept.  Recall this metaphor means that doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choices we have made. The technical debt and resulting interest due in an MDM initiative with this kind of far-reaching impact across the enterprise is, well, unthinkable. Take the time to develop your MDM blueprint and use this product to ensure success by clearly communicating business and technical intent with your stakeholders.

Modeling the MDM Blueprint – Part V

In this series we have discussed developing the MDM blueprint by creating Common Information (part II), Canonical (part III), and Operating (part IV) models in our work streams. We have introduced the Operating Model into the mix to communicate how the solution will be adopted and used to realize the benefits we expect with the business in a meaningful way, and hopefully to set reasonable expectations with our business partners as to what this solution will look like when deployed.

Now it is time to model and apply the technical infrastructure or patterns we plan on using. The blueprint now moves from being computation and platform independent to one of expressing intent through the use of more concrete platform specific models.

Reference Architecture
After the initial work (CIM, Canonical, and Operating models) is completed, then, and only then, are we ready to move on to the computation and platform specific models. We know how to do this well – for example see Information service patterns, Part 4: Master Data Management architecture patterns.

At this point we now have enough information to create the reference architecture. One way (there are several) to organize this content is to use the Rozanski and Woods extensions to the classic 4+1 view model introduced by Philippe Kruchten. The views are used to describe the system from the viewpoint of different stakeholders (end-users, developers and project managers). The four views of the model are the logical, development, process and physical views. In addition, selected use cases or scenarios are used to demonstrate the architecture’s intent, which is why the model contains 4+1 views (the +1 being the selected scenarios).


Rozanski and Woods extended this idea by introducing a catalog of six core viewpoints for information systems architecture: the Functional, Information, Concurrency, Development, Deployment, and Operational viewpoints and related perspectives. This is elaborated in detail in their book titled “Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives”. There is much to learn from their work, and I encourage you to visit the book’s web site for more information.

What we are describing here is how MDM leadership within very large-scale organizations can eventually realize five key “markers” or characteristics in the reference architecture:

– Shared services architecture evolving to process hubs;
– Sophisticated hierarchy management;
– High-performance identity management;
– Data governance-ready framework; and
– Registry, persisted or hybrid design options in the architecture selected.

This is recommended as an exceptional way to tie the technical models back to the stakeholders’ needs as reflected in the viewpoints, perspectives, guidelines, principles, and template models used in the reference architecture. Grady Booch said “…the 4+1 view model has proven to be both necessary and sufficient for most interesting systems”, and there is no doubt that MDM is interesting. Once this work has been accomplished and agreed to as part of a common vision, we have several different options for how to proceed. One interesting approach is leveraging this effort into the Service-Oriented Modeling Framework introduced by Michael Bell at Methodologies Corporation.

Service-Oriented Modeling
The service-oriented modeling framework (SOMF) is a service-oriented development life cycle methodology. It offers a number of modeling practices and disciplines that contribute to successful service-oriented life cycle management and modeling. It illustrates the major elements that identify the “what to do” aspects of a service development scheme. These are the modeling pillars that will enable practitioners to craft an effective project plan and to identify the milestones of a service-oriented initiative, in this case crafting an effective MDM solution.

SOMF provides four major SOA modeling styles that are useful throughout a service life cycle (conceptualization, discovery and analysis, business integration, logical design, conceptual and logical architecture). These modeling styles (Circular, Hierarchical, Network, and Star) can assist us with the following modeling aspects:

– Identify service relationships: contextual and technological affiliations
– Establish message routes between consumers and services
– Provide efficient service orchestration and choreography methods
– Create powerful service transaction and behavioral patterns
– Offer valuable service packaging solutions

SOMF Modeling Styles
SOMF offers four major service-oriented modeling styles. Each pattern identifies the various approaches and strategies that one should consider employing when modeling MDM services in a SOA environment.

– Circular Modeling Style: enables message exchange in a circular fashion, rather than employing a controller to carry out the distribution of messages. The Circular Style also offers a way to affiliate services.

– Hierarchical Modeling Style: offers a relationship pattern between services for the purpose of establishing transactions and message exchange routes between consumers and services. The Hierarchical pattern enforces parent/child associations between services and lends itself to a well-known taxonomy.

– Network Modeling Style: this pattern establishes a “many to many” relationship between services, their peer services, and consumers, similar to RDF. The Network pattern emphasizes distributed environments and interoperable computing networks.

– Star Modeling Style: the Star pattern advocates arranging services in a star formation, in which the central service passes messages to its extending arms. The Star modeling style is often used in “multi casting” or “publish and subscribe” instances, where “solicitation” or “fire and forget” message styles are involved.
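One way to picture the four styles is as different sets of message routes between services. The following is a minimal Python sketch of that idea only. It is not part of SOMF or its tooling, and the function names and service names are hypothetical, chosen purely for illustration.

```python
from itertools import permutations


def circular(services):
    """Each service passes messages to the next; the last wraps to the first."""
    n = len(services)
    return {(services[i], services[(i + 1) % n]) for i in range(n)}


def hierarchical(parent_to_children):
    """Parent/child routes only, taken from a mapping of parent -> children."""
    return {(p, c) for p, children in parent_to_children.items() for c in children}


def network(services):
    """Many to many: every service has a route to every other peer."""
    return set(permutations(services, 2))


def star(hub, arms):
    """A central service passes messages out to its extending arms."""
    return {(hub, arm) for arm in arms}


# Hypothetical MDM services, used only to show the shape of each style.
services = ["party", "location", "product"]
print(sorted(circular(services)))
print(sorted(hierarchical({"mdm_hub": ["party", "location"], "party": ["party_role"]})))
print(sorted(network(services)))
print(sorted(star("mdm_hub", services)))
```

Each function returns a set of (sender, receiver) pairs, so the styles can be compared side by side: the Circular style has no controller, the Star style routes everything through one, and the Network style has the most routes to manage.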

There is much more to this method. I encourage you to visit the Methodologies Corporation site (Michael is the founder) and download the tools, PowerPoint presentations, and articles they have shared with us.

So, based on my experience, we have to get this modeling effort completed to improve the probability that we will be successful. MDM is really just another set of tools and processes for modeling and managing business knowledge of data in a sustainable way. Take the time to develop a robust blueprint that includes Common Information (semantic, pragmatic and logical modeling), Canonical (business rules and format specifications), and Operating Models to ensure completeness. Use these models to drive a suitable Reference Architecture to guide design choices in the technical implementation.

This is hard work. Anything worthwhile usually is. Why put the business at risk by trying to solve this important and urgent need without our stakeholders’ understanding and real enthusiasm for shared success? A key differentiator, and the difference between success and failure on an MDM journey, is taking the time to model the blueprint and share this early and often with the business. This is after all a business project, not an elegant technical exercise. Creating and sharing a common vision through our modeling efforts helps ensure success from inception through adoption by communicating clearly the business and technical intent of each element of the MDM program.

In the last part of the series I will be discussing where all this fits into the larger MDM program and how to plan, organize, and complete this work.

Modeling the MDM Blueprint – Part II

In part I of this series we discussed what essential elements should be included in an MDM blueprint. The important thing to remember is that MDM is a business project that requires establishing a common set of models that can be referenced independent of the technical infrastructure or patterns you plan on using. The blueprint should remain computation and platform independent until the models are completed (and accepted by the business) to support and ensure the business intent. The essential elements should include:

– Common Information Model
– Canonical Model
– Operating Model, and
– Reference Architecture (e.g. 4+1 views, viewpoints and perspectives).

We will now turn our attention to the first element, the Common Information Model.

A Common Information Model (CIM) is defined using relational, object, hierarchical, and semantic modeling methods. What we are really developing here is a rich semantic data architecture in selected business domains using:

  • Object Oriented modeling 
    Reusable data types, inheritance, operations for validating data
  • Relational
    Manage referential integrity constraints (Primary Key, Foreign Key)
  • Hierarchical
    Nested data types and facets for declaring behaviors on data (e.g. think XML Schemas)
  • Semantic models
    Ontologies defined through RDF, RDFS and OWL
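To show how the four methods can describe the same business concept, here is a minimal Python sketch that expresses one Party in each form. It is an illustration under stated assumptions and not a prescribed CIM: the Party and Organization classes, the table and column names, the sample values, and the `ex:` prefix are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Object-oriented: a reusable type with an operation that validates its data.
@dataclass
class Party:
    party_id: int
    name: str

    def validate(self):
        if not self.name.strip():
            raise ValueError("a party needs a name")


@dataclass
class Organization(Party):  # inheritance: an Organization is a Party
    tax_id: str = ""


org = Organization(party_id=1, name="Acme Ltd", tax_id="X-100")
org.validate()

# Relational: primary and foreign keys manage referential integrity.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE party (party_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute(
    "CREATE TABLE party_role ("
    " role TEXT NOT NULL,"
    " party_id INTEGER NOT NULL REFERENCES party(party_id))"
)
db.execute("INSERT INTO party VALUES (?, ?)", (org.party_id, org.name))
db.execute("INSERT INTO party_role VALUES (?, ?)", ("supplier", org.party_id))

# Hierarchical: nested elements, the kind of structure an XML Schema describes.
party_el = ET.Element("party", id=str(org.party_id))
ET.SubElement(party_el, "name").text = org.name
party_xml = ET.tostring(party_el, encoding="unicode")

# Semantic: subject-predicate-object statements, in the spirit of RDF and RDFS.
triples = {
    ("ex:party/1", "rdf:type", "ex:Organization"),
    ("ex:Organization", "rdfs:subClassOf", "ex:Party"),
    ("ex:party/1", "ex:name", org.name),
}

print(party_xml)
```

Each form captures something the others do not: the class carries behavior, the tables reject a role that points at a missing party, the XML carries nesting, and the triples state the class relationship explicitly as data.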

I believe (others may not) that MDM truly represents the intersection of Relational, Object, Hierarchical, and Semantic modeling methods to achieve a rich expression of the reality the organization is operating in. Expressed in business terms, this model represents a “foundation principle” or theme we can pivot around to understand each facet in the proper context. This is not easy to pull off, but will provide a fighting chance to resolve semantic differences in a way that helps focus the business on the real matter at hand. This is especially important when developing the Canonical model introduced in the next step.

If you want to see what one of these looks like, visit the MDM Alliance Group (MAG). MAG is a community Pierre Bonnet founded to share MDM modeling procedures and prebuilt data models. The MDM Alliance Group publishes a set of prebuilt data models, downloadable from the website, that include the usual suspects (Location, Asset, Party, Party Relationship, Party Role, Event, Period [Date, Time, Condition]) and some more interesting models like Classification (Taxonomy) and Thesaurus organized across three domains. Although we may disagree about the “semantics”, I do agree with him that adopting this approach can help us avoid setting up siloed reference databases “…unfortunately often noted when using specific functional approaches such as PIM (Product Information Management) and CDI (Customer Data Integration) modeling”. How true. And an issue I encounter often.

Another good example is the CIM developed over the years at the Distributed Management Task Force (DMTF). You can get the CIM V2.20 Schema MOF, PDF and UML at their web site and take a look for yourself. While this is not what most of us think of as MDM, they are solving for some of the same problems and challenges we face.

Even more interesting is what is happening in semantic technology. Building semantic models (ontologies) includes many of the same concepts found in the other modeling methods we have already discussed, but further extends the expressive quality we often need to fully communicate intent. For example:

– Ontologies can be used at run time (queried and reasoned over).
– Relationships are first-class constructs.
– Classes and attributes (properties) are set-based and dynamic.
– Business rules are encoded and organized using axioms.
– XML schemas are graphs not trees, and used for reasoning.
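A few of these points can be sketched in plain Python. The example below is only an illustration and makes several assumptions: the facts, the class names, and the helper functions are hypothetical, and a real ontology would be written in RDF or OWL with proper tooling rather than as Python sets. It shows relationships stored as data, a query at run time, and a very small piece of reasoning over the class hierarchy.

```python
# Statements are (subject, predicate, object); relationships are data, not schema.
triples = {
    ("Customer", "subClassOf", "Party"),
    ("Party", "subClassOf", "Thing"),
    ("acme", "type", "Customer"),
    ("acme", "locatedIn", "Chicago"),
}


def superclasses(cls, facts):
    """Follow subClassOf links transitively (a very small piece of reasoning)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, facts):
    """Asserted types plus every type inferred through the class hierarchy."""
    asserted = {o for s, p, o in facts if s == individual and p == "type"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, facts)
    return asserted | inferred


print(sorted(types_of("acme", triples)))  # ['Customer', 'Party', 'Thing']
```

Nothing in the data says that acme is a Party. That fact is derived when the question is asked, which is the practical meaning of a model that can be queried and reasoned over at run time.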

If you haven’t been exposed to ontology development, I encourage you to grab the open source Protégé Ontology Editor and discover for yourself what this is all about. While you are there, see the Protégé Wiki and grab the Federal Enterprise Architecture Reference Model Ontology (FEA-RMO) for an example of its use in the EA world. Or see the set of tools found at the Essential project. The project uses this tool to enter model content, based on a model pre-built for Protégé. While you are at the Protégé Wiki, grab some of the ontologies developed for use with this tool for other examples, such as the SWEET Ontologies (A Semantic Web for Earth and Environmental Terminology. Source: Jet Propulsion Laboratory). For more on this, see my post on this tool at Essential Analytics. This is an interesting and especially useful modeling method to be aware of and an important tool to have at your disposal.

This is hard, challenging work. Doing anything worthwhile usually is. A key differentiator, and the difference between success and failure on your MDM journey, will be taking the time to model the blueprint and sharing this work early and often with the business. We will be discussing the second element of the MDM blueprint, the Canonical model, in part III. I encourage you to participate and share your professional experience.

Modeling the MDM Blueprint – Part I

Several practitioners have contributed to this complex and elusive subject (see Dan Power’s Five Essential Elements of MDM and CDI for example) and have done a good job at elaborating the essential elements. There is one more element that is often overlooked in this field and that remains a key differentiator, and the difference between success and failure, among the major initiatives I have had the opportunity to witness firsthand: modeling the blueprint for MDM.

This is an important first step to take, assuming the business case is completed and approved. It forces us to address the very real challenges up front, before embarking on a journey that our stakeholders must understand and support in order to succeed. Obtaining buy-in and executive support means we all share a common vision for what we are solving for.

MDM is more than maintaining a central repository of master data. The shared reference model should provide a resilient, adaptive blueprint to sustain high performance and value over time. An MDM solution should include the tools for modeling and managing business knowledge of data in a sustainable way. This may seem like a tall order, but consider the implications if we focus on the tactical and exclude the reality of how the business will actually adopt and embrace all of your hard work. Or worse, consider asking the business to stare at a blank sheet of paper and expecting them to tell you how to rationalize and manage the integrity rules connecting data across several systems, eliminate duplication and waste, and ensure an authoritative source of clean reliable information can be audited for completeness and accuracy. Still waiting?

So what is in this blueprint?

The essential thing to remember is that the MDM project is a business project that requires establishing a common information model that applies whatever technical infrastructure or patterns you plan on using. The blueprint should remain computation and platform independent until the Operating Model is defined (and accepted by the business), and a suitable Common Information Model (CIM) and Canonical model are completed to support and ensure the business intent. Then, and only then, are you ready to tackle the Reference Architecture.

The essential elements should include:
– Common Information Model
– Canonical Model
– Operating Model, and
– Reference Architecture (e.g. 4+1 views).

I will be discussing each of these important and necessary components within the MDM blueprint in the following weeks and encourage you to participate and share your professional experience. Adopting and succeeding at Master Data Management is not easy, and jumping into the “deep end” without truly understanding what you are solving for is never a good idea. Whether you are a hands-on practitioner, program manager, or an executive planner, I can’t emphasize enough how critical modeling the MDM blueprint and sharing this with the stakeholders is to success. You simply have to get this right before proceeding further.