Design Goals February 19, 2011Posted by James Parnitzke in Cloud Computing, Enterprise Architecture, Space Based Architecture, Tuple Spaces.
Tags: Cloud Computing, Reference Architecture, Space Based Architecture, Tuple Space
1 comment so far
In my last post (Wide open spaces) we discussed the elegance of using space based architecture platforms based on their simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate. Recall, this model combines and integrates distributed caching, content-based distributed messaging, and parallel processing into a powerful architecture within a grid computing framework.
That was a mouthful. You may want to read that last sentence again carefully. And think about what this means to you as a professional practitioner. More importantly, how this may change the way you think about application platforms in general.
Before diving into this important concept, I think it is always good idea to express our stated design goals right up front – and use these to guide the inevitable trade-offs and decisions that will need to be made along this journey. So let’s get started with a few design goals I’m comfortable with. I’m sure there are more, but this represents a good start.
The platform’s ability to scale must be completely transparent
The architecture should be based on technology that can be deployed across a grid of commodity hardware nodes, providing a scalable and adaptable platform that supports high-volume, high-performance processing. The resulting platform should be tolerant of failure in individual nodes, can be matched to changing volumes easily by increasing (or decreasing) the number of processing nodes and, by virtue of its decoupled business logic, is extendible and adaptable to evolve as the business landscape changes.
Unlike conventional application server models, our elastic application platform should not require application developers to do anything different in their code in order to scale. The developer uses a simple API that provides a vast key-value data store that looks like a large shared memory space. Underneath the covers distributed caching features of the application platform spread the data across multiple servers (e.g. using a sophisticated hash algorithm). The application developer should remain unaware of the underlying implementation that distributes the data across the servers on his behalf. In brief, the goal of the grid-enabled middleware is designed to hide complexities of partitioning, distributing, and load balancing.
The platform provides resiliency by design
Applications must be available to customers and expected service level objectives must be met. The business cannot afford a single point of failure to impact customer access to other features and functions of the customer applications suite otherwise available. The platform should operate continuously and needs to be highly resilient to avoid any interruption in processing. This means that the application suite cannot have any single point of failure in the software, hardware, or network. High Availability (HA) is a basic requirement. Failing services and application components will continue on different backup servers, without service disruption.
Distributed data caches are resilient by design because they should automatically replicate data stored in the cache to one or more backup servers, guided by the policies defined by an administrator and executed in a consistent controlled manner. If one server fails, then another server provides the data (the more replicas, the more resilient the cache). Note, distributed data caches can be vulnerable to data center outages if all the compute servers are located in the same physical data center. To address this weakness, the distributed caching mechanism should offer special WAN features to replicate and recover data across multiple physical locations. The improvement in resilience reduces the risk of expensive system down-time due to hardware or software failure, allowing the business to continue operating albeit with reduced performance, during partial outages. An added benefit of this architecture composed of discrete units working together would enable rapid development and a controlled introduction of new features in response to changing requirements without the need for a big-bang rollout approach.
The platform is prepared to meet demanding performance requirements
A performance characteristic of distributed caches is that they store data in fast-access memory rather than on disk, although backing store on disk may be an option. Since this data spans multiple servers, there is no bottleneck or single point of failure. Using this advanced elastic application platform provides a means to ensure that cached data will tend to be on the same server where application code is processing, reducing network latency. We can do this by implementing a “near-cache” concept that places data on the server running the application using that data or by directly managing application code execution in the platform, placing adjacent code and data in cache nodes that are on the same server.
The platform needs to support robust integratation with other data sources
Most distributed caching platforms offer read-through, write-through, and write-behind features to synchronize data in the cache with external data sources. Rather than the developer having to write the code that does this, an administrator configures the cache to automatically read or write to a database or other external data source whenever an application performs a data operation in the cache. Data is an asset that is valuable. Sharing this asset across the platform improves the ability to support better data enrichment, improve accuracy and meet business goals.
The platform’s application workload is by nature distributed
For elastic application platforms offering distributed code execution we should consider the nature of the workload the applications will present to servers. If we can divide the workload into units that naturally fit into the distribution schemes as offered then the greater sophistication of the distributed code execution capability can be just what’s needed to turn a troublesome, resource intensive application into one that performs well and meets expectations.
Specific application responsibilities that repeat (or are redundant) across the application architecture should be separated out in the application architecture. Shared global or common use application functional solutions are sometimes referred to as “cross-cutting concerns” and forward the key principle of “separation of concerns”. The platform should support component designs which minimize coupling. The law of Demeter (Principle of Least Knowledge or only know your neighbor applies). The platform should promote loose coupling by minimizing:
- dependency between modules (e.g. shared global variables)
- discouraging content coupling (one module relying on another’s content)
- protocol or format dependencies
- control based coupling where one program controls another’s behavior
- Non-traceable message coupling which can lead to a dynamic spaghetti-like results impossible to manage
There are other goals I have not addressed here which we should all be familiar with to include:
- Desire to BUY vs. Build and Maintain
- Remain Technology and Vendor Independent
- Promote Interoperability
- Meet security and privacy needs
So, now we have a better idea of the design goals we are going to try to achieve. I think it is always important to take these goals to the next step in the high-level specification in order to begin quantifying how we will meet these into actionable objectives. Remember our original strategy which has driven our design goals. The design goals now should be used to create quantifiable objectives we can plan and measure progress to.
Wide open spaces September 14, 2010Posted by James Parnitzke in Enterprise Architecture, Reference Architecture, Space Based Architecture, Tuple Spaces.
Tags: Grid Computing, Reference Architecture, Space Based Architecture, Tuple Space
1 comment so far
Okay, okay – know I should keep this blog more up to date, just have been a little busy with my day job… and now after a much needed rest in last weeks in August I can now share a few things you may find especially interesting and timely. It is no coincidence that the image accompanying this post is of wide open spaces. This is in fact where I spent most satisfying part of my “summer vacation”. And spaces (Tuple Spaces) is what I intend to share with you in the coming weeks.
As architects we have a professional responsibility to always remain on the look-out for new (and sometimes revisited) ideas about how to improve and adopt good ideas. Especially when our business needs to invest in some key technology changes to remain competitive and deliver value customers will continue to seek for its distinctive quality of service and value.
I have been pretty busy in the last year engaged in a variety of industries where road map development and execution of next generation platforms and paradigm shifts were needed. Many of the more difficult challenges were solved by adopting a Space-Based Architecture (SBA) architecture pattern. This is a demonstrated pattern used to achieve near linear scalability of stateful, high-performance applications using the tuple spaces. This is not a new idea; the tuple space model was developed by David Gelernter over thirty years ago at Yale University. Implementations of tuple spaces have also been developed for Smalltalk, Java (JavaSpaces), and the .NET framework). A tuple space is an implementation of the associative memory model for parallel (distributed) computing by providing a repository of tuples that can be accessed concurrently. I know, this is a mouthful and a little too academic for me too. What this really means is we can group processors that produce pieces of data and group processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple spaces may be thought as a form of distributed shared memory. The model is closely related to other patterns that have been proved successful in addressing the application scalability challenge used by Google and Amazon.com (EC2) for example. The model has also been applied by many firms in the securities industry for implementing scalable electronic securities trading applications for example.
Before you think I have gone daft on you, I recommend you see a commercial implementation of this at Gigaspaces. Review the site and developer documentation and you will see how this platform is used to embrace many of the principles of Representational State Transfer (REST), service-oriented architecture (SOA) and Event-driven architecture (EDA), as well as elements of grid computing. The beauty of the space based architecture resides in its tandem of simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate.
The pattern represents a model that combines and integrates distributed caching (Data Grid), content-based distributed messaging (Messaging Grid), and parallel processing (Processing Grid) into a powerful service oriented architecture built on shared spaces within a grid computing framework. Research results and commercial use have shown that a large number of problems in parallel and distributed computing have been solved using this architecture. And the implications of its adoption beyond high performance On-Line Transaction Processing extend well into other uses (including Master Data Management, Complex Event Processing, and Rules Processing for example).
And this is what I intend to share with you in the coming weeks.
Wide open spaces…
Modeling the MDM Blueprint – Part V April 16, 2009Posted by James Parnitzke in Architecture Frameworks, Master Data Management, Reference Architecture, Service Orientated Modeling.
Tags: Architecture Frameworks, Master Data Management, Reference Architecture, Service Orientated Modeling
1 comment so far
In this series we have discussed developing the MDM blueprint by creating Common Information (part II), Canonical (part III), and Operating (part IV) models in our work streams. We have introduced the Operating Model into the mix to communicate how the solution will be adopted and used to realize the benefits we expect with the business in a meaningful way. And hopefully set reasonable expectations with our business partners as to what this solution will look like when deployed.
Now it is time to model and apply the technical infrastructure or patterns we plan on using. The blueprint now moves from being computation and platform independent to one of expressing intent through the use of more concrete platform specific models.
After the initial (CIM, Canonical, and Operating models) work is completed then, and only then are we ready to move on to the computation and platform specific models. We know how to do this well – for example see Information service patterns, Part 4: Master Data Management architecture patterns.
At this point we now have enough information to create the reference architecture. One way (there are several) to organize this content is to use the Rozanski and Woods extensions to the classic 4+1 view model introduced by Philippe Kruchten. The views are used to describe the system in the viewpoint of different stakeholders (end-users, developers and project managers). The four views of the model are logical, development, process and physical view. In addition selected use cases or scenarios are used to demonstrate or show the architecture’s intent. Which is why the model contains 4+1 views (the +1 being the selected scenarios).
Rozanski and Woods extended this idea by introducing a catalog of six core viewpoints for information systems architecture: the Functional, Information, Concurrency, Development, Deployment, and Operational viewpoints and related perspectives. This is elaborated in detail in their book titled “Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives”. There is much to learn from their work, I encourage you to visit the book’s web site for more information.
What we are describing here is how MDM leadership within very large-scale organization can eventually realize the five key “markers” or characteristics in the reference architecture to include:
- Shared services architecture evolving to process hubs;
- Sophisticated hierarchy management;
- High-performance identity management;
- Data governance-ready framework; and
- Registry, persisted or hybrid design options in the architecture selected.
Recommended, this is an exceptional way to tie the technical models back to the stakeholders needs as reflected in the viewpoints, perspectives, guidelines, principles, and template models used in the reference architecture. Grady Booch said “…the 4+1 view model has proven to be both necessary and sufficient for most interesting systems”, and there is no doubt that MDM is interesting. Once this work has been accomplished and agreed to as part of a common vision, we have several different options to proceed with. One interesting approach is leveraging this effort into a Service Orientated Modeling Framework introduced by Michael Bell at Methodologies Corporation.
Service Orientated Modeling
The service-oriented modeling framework (SOMF) is a service-oriented development life cycle methodology. It offers a number of modeling practices and disciplines that contribute to a successful service-oriented life cycle management and modeling. It illustrates the major elements that identify the “what to do” aspects of a service development scheme. These are the modeling pillars that will enable practitioners to craft an effective project plan and to identify the milestones of a service-oriented initiative—in this case crafting an effective MDM solution.
SOMF provides four major SOA modeling styles that are useful throughout a service life cycle (conceptualization, discovery and analysis, business integration, logical design, conceptual and logical architecture). These modeling styles: Circular, Hierarchical, Network, and Star, can assist us with the following modeling aspects:
- Identify service relationships: contextual and technological affiliations
- Establish message routes between consumers and services
- Provide efficient service orchestration and choreography methods
- Create powerful service transaction and behavioral patterns
- Offer valuable service packaging solutions
SOMF Modeling Styles
SOMF offers four major service-oriented modeling styles. Each pattern identifies the various approaches and strategies that one should consider employing when modeling MDM services in a SOA environment.
- Circular Modeling Style: enables message exchange in a circular fashion, rather than employing a controller to carry out the distribution of messages. The Circular Style also offers a way to affiliate services.
- Hierarchical Modeling Style: offers a relationship pattern between services for the purpose of establishing transactions and message exchange routes between consumers and services. The Hierarchical pattern enforces parent/child associations between services and lends itself to a well known taxonomy.
- Network Modeling Style: this pattern establishes “many to many” relationship between services, their peer services, and consumers similar to RDF. The Network pattern accentuates on distributed environments and interoperable computing networks.
- Star Modeling Style: the Star pattern advocates arranging services in a star formation, in which the central service passes messages to its extending arms. The Star modeling style is often used in “multi casting” or “publish and subscribe” instances, where “solicitation” or “fire and forget” message styles are involved.
There is much more to this method, encourage you to visit the Methodologies Corporation site (Michael is the founder) and download the tools, power point presentations, and articles they have shared with us.
So, based on my experience we have to get this modeling effort completed to improve the probability we will be successful. MDM is really just another set of tools and processes for modeling and managing business knowledge of data in a sustainable way. Take the time to develop a robust blueprint to include Common Information (semantic, pragmatic and logical modeling), Canonical, (business rules and format specifications), and Operating Models to ensure completeness. Use these models to drive a suitable Reference Architecture to guide design choices in the technical implementation.
This is hard, difficult work. Anything worthwhile usually is. Why put the business at risk to solve this important and urgent need without our stakeholders understanding and real enthusiasm for shared success? A key differentiator and the difference between success and failure on an MDM journey is taking the time to model the blueprint and share this early and often with the business. This is after all a business project, not an elegant technical exercise. Creating and sharing a common vision through our modeling efforts helps ensure success from inception through adoption by communicating clearly the business and technical intent of each element of the MDM program.
In the last part of the series I will be discussing where all this fits into the larger MDM program and how to plan, organize, and complete this work.