
February 19, 2007

The Most Common, and Most Costly Possible, IT Project Management Failure

So what invidious failure could it be to merit the title of most common and most costly IT project management failure? Is it staffing unqualified people? No, that can be fixed without even throwing away a version of whatever it is you're building. Is it bad requirements gathering? Not really, though that is a common failure. Usually, bad requirements gathering means you throw the first one away and build the second to the now properly-gathered requirements. You may end up embarrassed, even fired, but from a corporate point of view it's hardly the end of the world. Is it inadequate testing? Nope, though that, too, is pretty bad and pretty common. An inadequately-tested application means that your customers are debugging the application instead of your programmers. (In other words, think of almost any released Microsoft product.) The bugs that are found can be fixed with point updates — this is painful and expensive, but there are worse project management monsters to slay.

No, the most common and most costly possible IT project management failure is simply managing to dates instead of work effort. This is the gift that just keeps on giving: you pay for the software when you build it badly because you are too rushed to build it well, and again with every fix and every patch and every feature added thereafter. You pay for it over and over again, until you stop.

Let me give you a real-life example. A project I am peripherally involved with (that is, I work with the primary reviewer, and occasionally review; I neither design nor manage nor even approve) has gone through the following cycle:

  1. The first design was horrible. There were hundreds of review comments from the first design review. Many of the basic ideas behind data modeling and object-oriented programming were clearly either not understood, or were so badly expressed as to be almost comically bad. It was very, very clear that the program would be at least twice as large as it had to be, and probably larger, and that it would be so complex as to be nearly impossible to maintain with any efficiency.
  2. The right action at this point would have been to can the design, can the designers, and start over. OK, I probably would have given the designers general feedback and then had them start over, canning them if they still obviously didn't get it. Nonetheless, the key item here is that the design was clearly flawed. (Moreover, it came out during design reviews that the designers and the customers disagreed on what the requirements meant, and the designers were arguing requirements with the customer!) Clearly, the design should not have been pursued.
  3. You can already see it coming: the design was used. Why? Because there was a deadline to meet, and the prior design group that was canned had ended up pushing the project well behind schedule. But it's worse than that: not only was the design used, but the project manager decided that the number of comments on the design document meant that the document was flawed, rather than the design, and so canned the "documentation" effort until after coding. Besides, it was already partly coded, even though design wasn't done. Did I mention yet that the requirements gathering done by the first group was terrible, but was used going forward anyway because of, wait for it, lack of time in the schedule to redo it right? That, by the way, is why the designers ended up arguing requirements with the customer.
  4. Anyway, the beat goes on, and application coding is completed. Now it's time to document the design. Except no one actually knows what the design is. Thus, every design document produced creates more questions than it answers, and the review architects begin to get slowly sucked into the weeds, all the way down to the code more than once. At times, the review architects are creating diagrams to check their understanding, because the design team can't produce the diagrams themselves!
  5. In parallel with the documentation and reviews noted, the application goes through testing. When the user acceptance tests come, the application is rejected utterly. The users find so many functional and security gaps that they refuse to let it be deployed. There wasn't time to design it well, so now the company has to invest in emergency fixes and bringing on extra people and staffing 24 hours and so forth, just to get it deployed at the pilot sites (six of the 180-some sites that will eventually, theoretically, have the application).
  6. There is a follow-on project that adds significant capabilities and is scheduled for some six months out from the point that the users reject the prior version. Clearly, the user issues need to be fixed immediately to allow wider deployment. Clearly, there's no time to go fix even the most egregious flaws in the architecture and design. Clearly, there's no time to document this before we code it. I'm sure you can see where this is going. The originally-planned follow-on project is pushed out just long enough to put in an interim project, intended to fix the most critical bugs and add the most critical features required of, but not delivered by, the recently-rejected version. And because this has to happen quickly to meet the schedule, the documentation (i.e., the design work) is again pushed to the back of the effort, and no time is available for architectural fixes. I can buy that, except that there is still denial going on about the inevitable result: we are digging deeper into the hole of bad design we are already in, and for the same reasons.
  7. By this point, as we are in testing for the interim version, everyone agrees that the design is deeply flawed. The critical flaws that were called out in the original design review more than a year prior, and ignored, have now resulted in bugs that have had the application down for weeks at the pilot sites, as well as a squadron of emergency fixes to correct all kinds of previously-identified, and ignored, issues. But now we're even further behind, because we made some assumptions in November, to meet our schedule for February, that required another team to deliver something in October that they told us up front they could not deliver until January. The schedule demanded it anyway, and then, when reality delivered as promised, more slippages and emergency work followed, as night follows day. So now, let's look ahead to the next version, the one that was pushed out to slot in the interim version.
  8. Are we going to fix the architectural issues? No; no time is available in the schedule for this, because we have to implement these new features for the business, and we were supposed to have delivered them already. So we will have time to design before we code, but we will not have time to fix anything already identified. Indeed, by and large the new team (much more technically competent overall than the old team) will begin by extending the problematic parts of the old design, making the hole deeper, instead of refactoring first.
  9. There is, now, clearly a need identified for a version beyond the one already in design. There are numerous fixes that need to be put in place, as well as adding in sufficient functionality to bring us up to where the original version was supposed to be. This version is about to go into planning, and is scheduled for end-of-year delivery. Now, we've spent more than three years and millions of dollars to build something that should have been done in one year for about a million dollars, and every time we've had to fix or extend the system, we've paid more for it, and taken more time, than we needed to. Now, it seems, there is finally a version with time and scope for fixes, because we can rearchitect and refactor, then add in the functions, faster and cheaper than we can just add in the functions.
  10. By now, you should have figured this out. If we take the time to rearchitect and refactor, that's time, in the project managers' minds, that is not available for adding in the new functionality that's needed, and all the estimates tell us that every moment is necessary to meet the current schedules. No argument that we are actually shrinking the schedule and budget will be entertained, because there is no connection in the PMs' heads between code complexity and size on the one hand, and cost to maintain and extend on the other.
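
To make the argument in the last two items concrete, here is a toy model. It is a sketch, not an estimate of the actual project: every name and number in it is hypothetical, and the only assumption it encodes is the one the PMs won't accept, that the effort to add a feature scales with how tangled the existing code is.

```python
# Toy cost model. All figures are made up for illustration;
# none of them come from the project described above.

def total_cost(releases, feature_effort, complexity, growth, refactor_cost=0.0):
    """Cumulative effort over a number of releases.

    releases       -- how many releases we plan to ship
    feature_effort -- effort per release on a clean code base (person-months)
    complexity     -- multiplier on that effort (1.0 means a clean design)
    growth         -- how much the multiplier grows with each release
    refactor_cost  -- one-time, up-front effort spent before the first release
    """
    total = refactor_cost
    current = complexity
    for _ in range(releases):
        total += feature_effort * current  # tangled code makes each feature cost more
        current += growth                  # and every release tangles it a bit further
    return total


# Strategy A: keep extending the bad design (high multiplier, growing fast).
extend_only = total_cost(releases=3, feature_effort=10, complexity=3.0, growth=0.5)

# Strategy B: pay for a refactor first, then extend a cleaner design.
refactor_first = total_cost(
    releases=3, feature_effort=10, complexity=1.2, growth=0.1, refactor_cost=15
)

print(f"extend only:    {extend_only:.0f} person-months")
print(f"refactor first: {refactor_first:.0f} person-months")
```

With these made-up inputs the refactor pays for itself within the first release. Change the inputs and the answer can flip, which is exactly the conversation that should be happening instead of "there's no time in the schedule."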

Yeah, I had a terrible day at work. Why do you ask?

Posted by jeff at February 19, 2007 9:17 PM

Comments

Pretty much why I prefer to have installable and useable subsets of projects: feedback starts coming in early.

IMO a major project management problem is lack of willingness to secure funding for the complete extent of the project. Examples: insufficient funding for testing, documentation, beta testing, etc. The usual rationale for this is something that boils down to “If we tell them how much it will really cost, the project won't be approved and we'll be out of our jobs.” Remember, it's easier to explain why the project failed than it is to sell the idea of doing it right the first time.

Posted by: Dave Schuler at February 23, 2007 12:43 PM