
Agile service management pattern: Deploy often

I'd like to start on a set of rules for running and supporting online services in a way that takes advantage of lean production principles. This continues the line of thought stirred up by my recent exposure to ITIL, which is pretty much the IT infrastructure equivalent of the waterfall development model.

So the first pattern I came up with in that post was about keeping people involved throughout the process of planning, rolling out, and running a service, rather than having each stage handled by a different set of people and relying on the mythical "knowledge transfer" process. I'm sure there's a lot more to say on that one, but for the moment I'd like to get another idea out.

I'll call this pattern DeployOften. I've found that a good rule in most areas of life is: if you find something difficult to get right, do it more often. This is obvious in some contexts (it's called practice), but in day-to-day business it goes against the grain to seek out painful tasks and repeat them more often than is needed to get the job done. The benefit is that the more you do something, the better you get at it, and the less painful it becomes.

The principle of doing difficult things more often runs throughout agile development methodologies, an obvious example being test-driven development. Testing is usually a painful, dull process that gets skipped or skimped; agile makes it a central behaviour. In fact, the very first thing you do when coding is write automated tests.

A difficult and painful part of service management is deploying new or updated software, part of what ITIL calls the transition phase. One organisation I know of tries to release updates to their software every 3 months, and each time it's a trial. Deploying the software to the server inevitably turns up surprises, and acceptance testing by users drags on through multiple rounds as new releases are built to fix the problems it uncovers.

This happens despite automated nightly and iteration builds, which somehow never bring out the issues that surface as soon as the software reaches the staging servers. The staging servers use snapshots of current live data sets and mimic the live deployment environment more rigidly.

The DeployOften pattern suggests that the operations team should deploy each iteration build onto a production-like environment rather than waiting for the nearly-complete release. This will draw protests from the ops people, whose workloads are hardly light already. But by deploying every two weeks they will get it down to a very quick process, and they will also turn up deployment problems much earlier in the development cycle, which should raise the developers' awareness of what they need to keep in mind to make deployment easier.

ITIL can suck, but shouldn't

The pattern of my career over the past five years or so has involved moving newish, smallish internet/software companies to a post-startup hosting infrastructure. My past three companies were all small companies that developed and hosted internet applications, either for clients (in one case) or as their own products. My role in each case was to move things to a more mature infrastructure, with configuration management, monitoring, directory services, and the other pieces needed to manage a growing sprawl of servers and applications.

My focus has been much more on the technical side than on the people processes for running and supporting the infrastructures. In my current job, the team I've brought in and the infrastructure we've built have reached a decent level, although there's certainly plenty more to do technically. But looking at what we've done and what I want to do next, I've realized that, rather than moving on and doing the same thing at another company, the more interesting challenge will be to take things to the next level.

The next level for me is going beyond the technical to focus on the people and processes. The technology infrastructure is going to grow in size and sophistication, including spreading to multiple data centres globally, but the technical challenges seem like more of the same to me. The challenges that seem newer and more intriguing to me personally are more along the lines of how the hell we're going to organize and coordinate people doing development, infrastructure, and support in three, four, or more countries.

So I went on a course in ITIL version 3. Yikes. ITIL is basically a blueprint for organizing a huge IT operation with lots of bureaucratic processes, forms, and signoffs that will make it nearly impossible to get anything done, and ensure that responsibilities are divided so that nobody who is doing anything productive sees the big picture.

I don't think it has to be this way. I actually did find the course useful, although not as useful as it could have been given that most of the people on it were more interested in ticking off the certification than getting ideas on how to improve the organisations they work at. There were some pretty interesting people there, some of whom were obviously interested in fixing real problems "back home". If the course had been more of a workshop where we shared war stories and ideas, it would have rocked.

A lot of the concepts in ITIL are useful; I think it's more a matter of using your head when applying them, making sure to adapt the ideas in ways that fit your needs and objectives. It's very easy to see how organisations, especially large ones, can take the ITIL material and use it to build horribly inefficient IT structures. I've worked with companies that use ITIL this way, and the course shed light on how they got that way.

The biggest problem with ITIL is that it's presented with clearly defined "phases" of strategizing, designing, deploying ("transitioning"), operating, and improving IT services. This is an invitation to a waterfall model, where (as in at least one organisation I know of) each phase can even be run by a completely different team of people.

So one group designs the service and hands it off to another that rolls it out (tests and installs it), which then hands it off to a completely separate team that supports it. In the organisation I've encountered, the operations team hasn't got the vaguest clue about the service.

Of course the transition process involves "knowledge transfer" where the people who set up the service train the support team, but anybody who's done this stuff in the real world should know better.

Knowledge that is transferred in a handover process is never, ever, ever going to be learned as well as knowledge that comes from actually being involved throughout the whole process. Having some hands-off manager (ahem) overseeing things all the way through doesn't cut it. The people who will actually be diving into runtime problems with an application need to have gotten their hands dirty trying to install it, and should even have pitched into the meetings where the details of integrating the application into the infrastructure were worked out.

Otherwise, you're going to end up in the situation of my nameless organisation, one which is actually often held up as an exemplar of ITIL. They host an application on their servers, installed by the transition team, and their support team was trained on how to log into the server and investigate problems with it. But when users call with problems, the support people, who probably support dozens of applications, have forgotten all of this. They call the software vendor, who has no access to the servers.

Can you imagine how incompetent your organisation looks when it's clear that your support people have no idea that the application they support is run by their own company?

But I do think it's possible to take many of the ideas of ITIL and apply them in a more agile manner. A bit of Googling shows I'm not the only one who thinks so, but there doesn't seem to have been much work done on the idea, at least publicly. It's certainly something that would take a bit of thought and work.

My first thought, clearly, is that an agile IT services process would have to embrace the lean management principle of empowerment by having the "workers" (for lack of a better word) involved throughout the process.

I've also thought that the kanban approach to agile is particularly suited to a sysadmin team, since it does away with the iterations/release cycle in favour of a queue of tasks that people pull from when they find they've got spare capacity.
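To make the pull idea concrete, here's a minimal Python sketch, assuming a single shared queue and a hypothetical per-person `capacity` limit (how "spare capacity" is measured isn't something I've pinned down). `Sysadmin`, `pull_work`, `finish`, and the task names are all illustrative, not part of any real kanban tool.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sysadmin:
    name: str
    # Hypothetical limit: how many tasks this person works on at once.
    capacity: int
    in_progress: list = field(default_factory=list)

    def has_spare_capacity(self) -> bool:
        return len(self.in_progress) < self.capacity


def pull_work(backlog: deque, team: list) -> None:
    """Let each person with spare capacity pull tasks from the front of the shared queue."""
    for person in team:
        while backlog and person.has_spare_capacity():
            person.in_progress.append(backlog.popleft())


def finish(person: Sysadmin, task: str) -> None:
    """Mark a task done, freeing capacity for the next pull."""
    person.in_progress.remove(task)


# Usage: nobody is assigned work; people pull it when they have room.
backlog = deque([
    "patch web servers",
    "add disk monitoring",
    "rotate TLS certificates",
    "set up LDAP replica",
])
alice = Sysadmin("alice", capacity=1)
bob = Sysadmin("bob", capacity=2)
team = [alice, bob]

pull_work(backlog, team)
# alice now holds 1 task, bob holds 2, and one task is still queued.

finish(alice, "patch web servers")
pull_work(backlog, team)
# alice has freed up capacity, so she pulls the remaining task.
```

The point of the design is that nothing is scheduled by iteration: work only moves when someone finishes something and has room for the next item.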

Anyway, I'm looking forward to thinking this stuff through and trying out ideas over the next year. Although I'm going to be far less hands-on technically, my focus does need to involve a thorough understanding of the technical aspects of what we're doing, so I don't think I'm going to become a total suit.
