Sunday, December 13, 2015

Justifying a DEVOPS Business Case - Introduction

Introduction

Communicating the business value of a DEVOPS program is one of the hardest tasks for a technical team. The key is to identify projects that have a specific impact on the business and to explain that impact in business terms. The good news is that there are five specific projects that most DEVOPS programs start developing because of clear business needs. These projects are:

  • Automated deployments,
  • Centralized logging,
  • Message tracing (also referred to as message-auditing),
  • Management tooling, and
  • Framework monitoring.

After this introductory post, I'll give a brief overview of the business value of each of these projects. Before that, though, we'll talk a little about what we mean by a modern framework and the complexity that creates the need for a DEVOPS program.

The Root Cause of DEVOPS Projects

When a vendor demonstrates their enterprise tool, their goal is to show you how well the tool works and what it can do. The demonstration is not intended to show how the tool functions in a normal corporate enterprise environment. Remember, the vendor doesn't know the nuances of your corporate environment, and attempting to replicate it is a costly endeavor they shouldn't be expected to undertake.

The complexity added by a modern corporate network is what creates the need for DEVOPS projects. While some vendors have solutions for DEVOPS projects, the need for a large corporate enterprise environment for development and testing will keep most vendors from creating solutions like those listed above.

Modern Framework Complexity

Creating an enterprise middleware framework includes the installation, clustering, and integration of a large number of tools including gateways, load-balancing tools, service-buses, Java containers, OSGi containers, DMZ tooling, management & monitoring tooling, and possibly hundreds of different communication protocols.

The enterprise middleware team's first task is to architect a solution that uses tooling from one or more vendors to implement a solid enterprise framework. Once the team has architected the framework, they install it on a minimal set of servers or virtual machines (VMs) for a proof of concept.

Once the enterprise framework has been created and tested, they make duplicates of it to create different environments. These could include an environment for developers (DEV), testers (QA), and real-time production (PRD). Normally, I also include a fourth environment for integrated unit-testing among multiple applications which I call TST.

Where DEVOPS Issues Arise

So far we've gone over the setup of a typical middleware framework with four environments in a corporate network. The issues requiring DEVOPS development, however, arise as the environments are used. Here I will describe two ways DEVOPS issues can arise.

Typical Development and Deployment Flow

A typical development flow through these environments is:
  1. The application development team (apps-team) creates an application in DEV.
  2. The apps-team deploys its code into TST and documents the process.
  3. DEVOPS takes the instructions from the apps-team and deploys from TST to QA.
  4. As bugs are found and fixed, the fixes are deployed from TST to QA again, following the apps-team documentation.
  5. Testers approve the application in QA, and DEVOPS deploys the application to PRD. In a DEVOPS environment, the apps-team provides support for this process.
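To make the flow above concrete, here is a minimal Python sketch of the promotion path. The table and the function name deployer_for are my own illustration, not part of any tool, and I am assuming the PRD deployment is promoted from QA, which the steps imply but do not state outright.

```python
# Promotion path from the flow above: (source, target, who deploys).
# The QA -> PRD source environment is an assumption; the steps only say
# that DEVOPS deploys to PRD after testers approve in QA.
PROMOTIONS = [
    ("DEV", "TST", "apps-team"),
    ("TST", "QA", "DEVOPS"),
    ("QA", "PRD", "DEVOPS"),
]


def deployer_for(source, target):
    """Return who performs a promotion, or raise if it skips an environment."""
    for src, dst, owner in PROMOTIONS:
        if (src, dst) == (source, target):
            return owner
    raise ValueError(f"promotion {source} -> {target} is not part of the flow")


print(deployer_for("DEV", "TST"))  # apps-team
print(deployer_for("TST", "QA"))   # DEVOPS
```

Writing the path down as data makes it easy to reject a shortcut such as DEV straight to PRD before anyone tries it.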

Centralized Logging

In one common scenario, messages flow through over 100 interconnected nodes. If one node fails, it will have a cascading impact on your system over time.  When one node fails, it is imperative to identify the problem quickly and fix it. The first step in this

Message Tracing

In another common scenario, a number of messages (say 50) are sent through the system in QA for load testing, and only 49 arrive at the target system. Without message tracing, someone has to go through the logs of every system manually to locate each of the 50 messages and identify 1) the failed message, 2) the system where the failure occurred, and 3) the error message.
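As a rough illustration of what message tracing buys you, here is a minimal Python sketch. It assumes every node reports a trace event of the form (message_id, node, status, detail) to one place, in the order the events happened. The node names, the message IDs, and the error text are all made up for the example.

```python
def find_failed_messages(sent_ids, trace_events, target_node):
    """Map each message that never reached target_node to its last trace event."""
    last_seen = {}
    arrived = set()
    for message_id, node, status, detail in trace_events:
        last_seen[message_id] = (node, status, detail)
        if node == target_node and status == "OK":
            arrived.add(message_id)
    return {
        message_id: last_seen.get(message_id, (None, "NEVER_SEEN", ""))
        for message_id in sent_ids
        if message_id not in arrived
    }


# Hypothetical load test: 50 messages sent, one fails on the service bus.
sent = [f"msg-{i:02d}" for i in range(1, 51)]
events = []
for message_id in sent:
    events.append((message_id, "gateway", "OK", ""))
    if message_id == "msg-17":
        events.append((message_id, "service-bus", "ERROR", "schema validation failed"))
        continue
    events.append((message_id, "service-bus", "OK", ""))
    events.append((message_id, "target", "OK", ""))

failed = find_failed_messages(sent, events, "target")
print(failed)
# {'msg-17': ('service-bus', 'ERROR', 'schema validation failed')}
```

With the trace events in one place, a single lookup answers all three questions: which message failed, where it failed, and what the error was.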

Summary

As difficult as these issues are in a pre-production environment, imagine the chaos if they occurred at a large scale. Remember, an enterprise environment can have thousands of possible nodes for a message to flow through. Even if it takes only 5 minutes per VM to check the logs, the DEVOPS team will need over 8 hours to find the bad message if there are 100 VMs. The troubles get even worse at scale.
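The arithmetic behind that estimate is easy to check. The short Python sketch below uses the 5 minutes per VM and 100 VMs from the paragraph above; the 1,000-VM case is my own extrapolation to show how the cost grows, and the function name is only illustrative.

```python
def manual_log_search_hours(vm_count, minutes_per_vm=5):
    """Hours needed to check the logs on every VM by hand, one after another."""
    return vm_count * minutes_per_vm / 60


print(round(manual_log_search_hours(100), 1))   # 8.3
print(round(manual_log_search_hours(1000), 1))  # 83.3 (assumed VM count)
```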

This is why organizational investment in DEVOPS is needed.
