February 28, 2007
Thoughts on Programming's Future
I came across this interview with Richard Gabriel, from 2002, that takes a wonderful view of one possible direction for programming:
So, because you can program well or poorly, and because most of it is creative (in that we don't really know what we're doing when we start out), my view is that we should train developers the way we train creative people like poets and artists. People may say, "Well, that sounds really nuts." But what do people do when they're being trained, for example, to get a Master of Fine Arts in poetry? They study great works of poetry. Do we do that in our software engineering disciplines? No. You don't look at the source code for great pieces of software. Or look at the architecture of great pieces of software. You don't look at their design. You don't study the lives of great software designers. So, you don't study the literature of the thing you're trying to build.
Second, MFA programs create a context in which you're creating while reflecting on it. For example, you write poetry while reading and critiquing other poetry, and while working with mentors who are looking at what you're doing, helping you think about what you're doing and working with you on your revisions. Then you go into writers' workshops and continue the whole process, and write many, many poems under supervision in a critical context, and with mentorship. We don't do that with software.
I was talking to Mark Strand, who is one of the first poets who mentored me, and he said, more or less, that how good you are depends on how many poems you've written in your life. About two and a half years ago, I started writing a poem a day, and I've gotten way better since I started doing that. And so, I've probably written about 1000 poems in my life so far, almost all of them in the last two years.
Compare that to how many programs someone has written before they're considered a software developer or engineer. Have they written 1000? No, they've probably written 50. So, the idea behind the MFA in software is that if we want to get good at writing software, we have to practice it, we have to have a critical literature, and we have to have a critical context.
That, more or less, is a take on programming as a creative discipline. It is the classical education approach applied to computing, and it is a beautiful and wonderful idea. The part of me that wrote a lot of poems as a young man, the part that enjoys role-playing games and fiddling with software after everyone is asleep, really likes that approach.
But the part of me that goes to work every day, and looks at building robust systems that work, realizes that that approach to training programmers will only work for a small number: it's too difficult, time-consuming, and expensive. The input is immediate, in money and hard work; the payoff lies in the far future, and is variable and nebulous. Only a very small proportion of IT people could benefit: the future leaders and mentors. It is an important step towards solving the software crisis, but not a comprehensive one.
(Background note: the software crisis is a term for the problem that it is nearly impossible to write programs that are not buggy, and even techniques like OO programming and formal design cannot help you past a few million lines, by which point the program is essentially guaranteed to be unrunnable because of bugs.)
Another approach is looking at programming languages. I had an epiphany a few weeks ago, when I realized that a common element of many of the design problems in the projects I work with is not that the developers are bad at figuring out how to factor business logic correctly, but that business logic is not an object, or a characteristic of an object, so expressing it as objects is fundamentally wrong. Object orientation is simply too simple to provide the correct abstractions beyond the model layer and the display-interaction part of the GUI (which is inherently physical). It cannot appropriately express business logic: processes, procedures, constraints, timing sequences, coupled rule sets and the like.
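To make that concrete with a hypothetical sketch (the rule names, fields, and thresholds here are all invented for illustration, not taken from any real project), a business constraint such as "large orders need approval" reads more naturally as an entry in a declarative rule set than as a method on an Order class:

```python
# Hypothetical business rules expressed as a declarative rule set
# rather than as methods scattered across domain objects.
RULES = [
    # (description, predicate over an order, resulting action)
    ("large orders need approval",
     lambda order: order["total"] > 10_000,
     "require_approval"),
    ("preferred customers ship expedited",
     lambda order: order["customer_tier"] == "preferred",
     "expedite"),
]

def evaluate(order):
    """Return every action triggered by the given order."""
    return [action for _, predicate, action in RULES
            if predicate(order)]

print(evaluate({"total": 25_000, "customer_tier": "standard"}))
```

The point of the sketch is that adding or changing a constraint means adding or editing one rule, rather than deciding which object the logic "belongs to" and reworking that class.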
In looking around for people already working on this, one of my colleagues pointed me to Sun's "metaphors" concept. I think that, from an enterprise software point of view, fixing the problem at this level — ending the artificial separation of design and coding, and allowing units of execution to run anywhere in an organic fashion (delegating deployment and execution decisions by rule set rather than by fixed implementation, for example) — would be a great step forwards. The problem that this attacks is that reusability is hard, currently. Take a simple idea of an object, and put it into a radically different infrastructure than what it was built for, and it often fails. Consider a system with a singleton logger, running on two different systems. Which gets the singleton, and how does the other communicate with it? Or do they each get one singleton, and hope nothing breaks?
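The singleton question above is easy to demonstrate in miniature. The classic pattern guarantees one instance per process, not per system; run the same code on two machines and each quietly gets its own "singleton", so any sharing between them has to be an explicit deployment decision rather than a property of the pattern. A minimal single-process sketch:

```python
class Logger:
    """Classic singleton: guarantees one instance per *process*."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.records = []
        return cls._instance

    def log(self, message):
        self.records.append(message)

first = Logger()
second = Logger()
assert first is second  # holds within one process...

# ...but two processes on two hosts would each run this code and each
# construct its own instance. The pattern itself says nothing about
# which instance is authoritative, or how the other should reach it.
```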
In the end, I think that there are three fundamental changes that have to happen before software creation can become robust and inexpensive: a better job must be done training developers, system administrators and technical management, particularly at the beginning of their careers; a better programming paradigm must be developed to simplify the creation and maintenance of large-scale, distributed mixed behavioral/data-centric applications; and we must figure out a way to get people to actually start using the various tools that have become available over the past few decades to simplify the process of system development and integration.
On that last point, two of the little-discussed reasons why engineering software is not like engineering a bridge are that project managers over bridges are usually qualified engineers in their own right, and that a project manager building a bridge would never dream of saying something like, "We're behind schedule, so we're leaving out half the supports on this one span to meet our deadlines." That happens, all the time, in software development.
Posted by jeff at 2:59 PM | Comments (2) | TrackBack
February 19, 2007
The Most Common, and Most Costly Possible, IT Project Management Failure
So what invidious failure could it be to merit the title of most common and most costly IT project management failure? Is it staffing unqualified people? No, that can be fixed without even throwing away a version of whatever it is you're building. Is it bad requirements gathering? Not really, though that is a common failure. Usually, bad requirements gathering means you throw the first one away and build the second to the now properly-gathered requirements. You may end up embarrassed, even fired, but from a corporate point of view it's hardly the end of the world. Is it inadequate testing? Nope, though that, too, is pretty bad and pretty common. An inadequately-tested application means that your customers are debugging the application instead of your programmers. (In other words, think of almost any released Microsoft product.) The bugs that are found can be fixed with point updates — this is painful and expensive, but there are worse project management monsters to slay.
No, the most common and most costly possible IT project management failure is simply managing to dates instead of work effort. This is the gift that just keeps on giving: you pay for the software when you build it badly because you are too rushed to build it well, and again with every fix and every patch and every feature added thereafter. You pay for it over and over again, until you stop.
Let me give you a real-life example. A project I am peripherally involved with (that is, I work with the primary reviewer, and occasionally review; I neither design nor manage nor even approve) has gone through the following cycle:
- The first design was horrible. There were hundreds of review comments from the first design review. Many of the basic ideas behind data modeling and object-oriented programming were clearly either not understood, or were so badly expressed as to be almost comically bad. It was very, very clear that the program would be at least twice as large as it had to be, and probably larger, and that it would be so complex as to be nearly impossible to maintain with any efficiency.
- The right action at this point would have been to can the design, can the designers, and start over. OK, I probably would have given the designers general feedback and then had them start over, canning them if they still obviously didn't get it. Nonetheless, the key item here is that the design was clearly flawed. (Moreover, it came about during design reviews that the designers and the customers disagreed on what the requirements meant, and the designers were arguing requirements with the customer!) Clearly, the design should not be pursued.
- You can already see it coming: the design was used. Why? Because there was a deadline to meet, and the prior design group, which was canned, had ended up pushing the project well behind schedule. But it's worse than that: not only was the design used, but the project manager decided that the number of comments on the design document meant that the document was flawed, rather than the design, and so canned the "documentation" effort until after coding. Besides, it was already partly coded, even though design wasn't done. Did I mention yet that the requirements gathering done by the first group was terrible, but was used going forwards anyway because of (wait for it) lack of time in the schedule to redo it right? That, by the way, is why the designers ended up arguing requirements with the customer.
- Anyway, the beat goes on, and application coding is completed. Now it's time to document the design. Except no one actually knows what the design is. Thus, every design document produced creates more questions than it answers, and the review architects begin to get slowly sucked into the weeds, all the way down to the code more than once. At times, the review architects are creating diagrams to check their understanding, because the design team can't produce the diagrams themselves!
- In parallel with the documentation and reviews noted, the application goes through testing. When the user acceptance tests come, the application is rejected utterly. The users find so many functional and security gaps that they refuse to let it be deployed. There wasn't time to design it well, so now the company has to invest in emergency fixes and bringing on extra people and staffing 24 hours and so forth, just to get it deployed at the pilot sites (six of the 180-some sites that will eventually, theoretically, have the application).
- There is a follow-on project, that adds significant capabilities and is scheduled for some six months out from the point that the users reject the prior version. Clearly, the user issues need to be fixed immediately to allow wider deployment. Clearly, there's no time to go fix even the most egregious flaws in the architecture and design. Clearly, there's no time to document this before we code it. I'm sure you can see where this is going. The originally-planned follow-on project is pushed out just long enough to put in an interim project, intended to fix the most critical bugs and add the most critical features required of, but not delivered by, the recently-rejected version. And because this has to happen quickly to meet the schedule, the documentation (ie, the design work) is again pushed to the back of the effort, and no time is available for architectural fixes. I can buy that, except that there is still denial going on about the inevitable result: we are digging deeper into the hole of bad design we are already in, and for the same reasons.
- By this point, as we are in testing for the interim version, everyone agrees that the design is deeply flawed. The critical flaws that were called out in the original design review, and ignored, more than a year prior have now resulted in bugs that have had the application down for weeks at the pilot sites, as well as a squadron of emergency fixes to correct all kinds of previously-identified, and ignored, issues. But now we're even further behind, because in November, to meet our schedule for February, we made assumptions that required another team to deliver something in October, even though they had told us up front they could not deliver it until January. The schedule demanded it anyway; when reality arrived exactly as promised, more slippages and emergency work followed, as night follows day. So now, let's look ahead to the next version, the one that was pushed out to slot in the interim version.
- Are we going to fix the architectural issues? No; no time is available in the schedule for this, because we have to implement these new features for the business, and we were supposed to have delivered them already. So we will have time to design before we code, but we will not have time to fix anything already identified. Indeed, by and large the new team (much more technically competent overall than the old team) will begin by extending the problematic parts of the old design, making the hole deeper, instead of refactoring first.
- There is, now, clearly a need identified for a version beyond the one already in design. There are numerous fixes that need to be put in place, as well as adding in sufficient functionality to bring us up to where the original version was supposed to be. This version is about to go into planning, and is scheduled for end-of-year delivery. Now, we've spent three+ years and millions of dollars to build something that should have been done in one year for about a million dollars, and every time we've had to fix or extend the system, we've paid more for it, and taken more time, than we needed to do. Now, it seems, there is finally a version with time and scope for fixes, because we can rearchitect and refactor, then add in the functions, faster and cheaper than we can just add in the functions.
- By now, you should have figured this out. If we take the time to rearchitect and refactor, that's time, in the project managers' minds, that is not available for adding in the new functionality that's needed, and all the estimates tell us that every moment is necessary to meet the current schedules. No argument that we are actually shrinking the schedule and budget will be entertained, because there is no connection in the PMs' heads between code complexity and size on the one hand, and cost to maintain and extend on the other.
Yeah, I had a terrible day at work. Why do you ask?
Posted by jeff at 9:17 PM | Comments (1) | TrackBack
February 11, 2007
The Problems with the Precautionary Principle
I was watching a Penn and Teller piece on environmentalists, courtesy of the Jawa Report, and was struck anew by the fallacies embedded in the precautionary principle. I need to differentiate between Wikipedia's definition and the way the precautionary principle is used in practice; both are wrong, and Wikipedia's more reasonable-sounding definition in fact only sounds more reasonable.
Wikipedia says the precautionary principle states that "if an action or policy might cause severe or irreversible harm to the public, in the absence of a scientific consensus that harm would not ensue, the burden of proof falls on those who would advocate taking the action." The more common way of using the precautionary principle is to claim that if there is the possibility, no matter how remote, of harm being caused by action or inaction (on certain issues), then the action or inaction which might possibly cause harm cannot be done until the proponents of the action (or inaction) can prove that no harm will result.
The first fallacy is hidden in the unstated premise: those issues to which the precautionary principle may be applied are inevitably the issues of interest to anti-capitalists, anti-industrialists, socialists, environmentalists and the like. The precautionary principle is never applied by such people to problems such as, say, Iran, or jihadi terrorism, and indeed is rejected utterly in relation to issues not of interest to the radical Left.
The second fallacy is obvious: proof is not possible in science. All that you can do is build up evidence. But even overwhelming evidence falls immediately in the face of a single convincing counter-example. This is largely a problem with the Wikipedia definition, which stresses science and absolute terms. In the more common definition, the problem is worse: a veto is given to those on the correct (ie, more "progressive") side; until you prove to their satisfaction that you can act against their interests without causing harm, you must either do what they want, or not do what they do not want, depending on the issue. In other words, the precautionary principle is a way of stopping the argument in place: you cannot act because it might cause harm, and you cannot debate the point because we refuse to accept any evidence you might offer. Ergo, we win.
But there is a third and more insidious fallacy buried in the precautionary principle: there is no such thing as an absolute lack of harm. Every decision that we take, everything that we do, causes harm in some measure to someone or something, particularly when we include those acts which we do not do. Let's take an easy example: if I spend money on environmental causes, that money is not available to feed my family, or prepare for my retirement. Now, an environmentalist might say that is true, but that the money spent on environmental causes does far more good for me (and if that argument fails, they will appeal to the greater good) than using my money for the other purpose. That would be a reasonable argument, and we could perhaps debate the point. They might even convince me in some circumstances. But the proponents of the precautionary principle implicitly exclude that cost-benefit analysis from issues to which the precautionary principle is applied. The precautionary principle sets the value of any level of harm at infinity, and the value of any countervailing cost at effectively zero. In other words, the precautionary principle assumes that any cost, no matter how massive, is worth paying for a benefit, no matter how small or arguable.
What is left out of this consideration is the opportunity cost: what else could be done with the money? Let's say, for example, that the cost of fighting global warming is $10 trillion over some arbitrary unit of time. Let's further simplify by saying that global warming is happening, it is caused by humans, it will cause disastrous effects — let's assume as a single unit of measure 1 billion human deaths — if not stopped, and that spending that money will stop the warming and prevent the ill effects. Finally, let's discard even the possibility of natural causes of global warming, and assume that the climate does not change at all absent human causes. Here we are giving every arguable point (and more) to the proponents of anthropogenic global warming. Why, then, would we not necessarily buy into spending that money in that way?
What if we could prevent 2 billion deaths by spending $1 trillion over the same time period? That would leave another $9 trillion available for other uses, and would save additional lives besides. But the precautionary principle prevents any discussion of such tradeoffs, because it assumes infinite harm and zero cost, and so any discussion of using the resources available in different ways, even to the same end, is out of bounds: even if you save 2 billion lives, they are different lives, and that 1 billion might still die. In other words, a second line of argument-stopping "we win" assumptions is built into the precautionary principle.
But that in fact is the principle's power, and why it will continue to be invoked, even — in fact, particularly — in cases like global warming where there is a significant range of variation in possible outcomes and costs of action. That it is dishonest and fallacious does not appear in any way to enter the considerations of those who rely on the precautionary principle.
Refining the Process
So I was going to define jargon first, so my IT posts would be intelligible to non-professionals. That is not practical, it turns out, because there is a huge base of jargon I have to define before each article makes sense. For the last piece I wrote like this, I went over a month with it sitting unpublished because I didn't want to write the 8 jargon entries first. So here's the new deal: if you don't understand, and want to, ask. I'll answer.
Why Software Tends to be Bad
Designing and building good software is difficult. If you doubt this, answer the following:
- Do you have/have you used more software with bad interfaces (confusing, hidden features, too much exposed functionality, weird tab orders) or good interfaces (clean, consistent, exposing what is important and hiding power-user options)?
- Do you have/have you used more software with or without noticeable bugs or design flaws? How about with or without occasional major bugs or design flaws that cause lost data, including application crashes?
- How often do you just give up on using software because you can't figure it out, or it's too cumbersome, or it behaves badly?
- How often do you need technical support to get beyond the most rudimentary features of your software?
The problem is that software is terribly prone to failure. Engineers express the degree to which a mechanical system is prone to failure in terms of the number of moving parts and the total number of parts, moving or otherwise. A moving part (think of the wheel bearings in your car) wears down over time, and thus can fail. Properly designed, a non-moving part will not fail under its design loads, but the more parts there are, the more chance that there will be some flaw in design, manufacture or assembly. The software analogy of a part, in modern languages, is a semicolon; every programmatic statement in languages like C/C++, Java, and Perl terminates with a semicolon. The software analogy of a moving part is code that can change its behavior as circumstances (such as data) change, or code only exercised under uncommon circumstances (and thus that might not be reached by test cases). The number of parts, and particularly of "moving parts", in software is far, far larger than in any mechanical system, and thus software is inherently more prone to failure than mechanical systems.
This complexity can be mitigated in many ways. These include creation and reuse of standard components for standard tasks or entities (and the consequent multiple cycles of refinement), use of standardized ways of doing common tasks, encapsulation and abstraction, proper requirements gathering and test case development/execution, rigorous unit testing that reaches every branch, automated code analysis, and good logical design practice. The three very powerful tools that have arisen from various combinations of these mitigation techniques and tools are best practices for application design, standardized design patterns, and object-oriented coding practices.
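As a toy illustration (with made-up rates) of a "moving part" and of the branch-reaching unit testing listed among the mitigations above: the express branch below changes the function's behavior based on its input, and a test suite that never passes express=True would never execute it, letting any bug in that branch ship unnoticed.

```python
def shipping_cost(weight_kg, express=False):
    """Toy pricing function; all rates are invented for illustration."""
    if express:
        # The "moving part": a branch exercised only for uncommon
        # input, and therefore easy to miss in casual testing.
        return 20.0 + 4.0 * weight_kg
    return 5.0 + 1.5 * weight_kg

# Branch-reaching tests exercise both paths, not just the common one.
assert shipping_cost(2.0) == 8.0
assert shipping_cost(2.0, express=True) == 28.0
```

In a real codebase the same discipline is enforced with coverage tools rather than inline assertions, but the principle is the same: a branch the tests never reach is a part whose failure mode is unknown.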
Sadly, in the real world, these practices are more honored in the breach. In part, this arises, in software developed by companies, from the fact that most computer people are magicians, who don't understand these tools, or bureaucrats, who don't want to pay for using them. Proper coding is expensive, and it's often difficult to convince people that it's easier and cheaper to design and code correctly once, than to redesign and recode several times.
In academic and open source software, which attracts vastly more artists, the underlying code is often wonderful, while the interfaces and reports are miserable and the software is incredibly difficult to use. I have actually heard university-based programmers say that if you can't understand their software's interface, the problem is yours, rather than the interface's. If the measure of software's utility is how widely it's used within its problem domain, academic software tends to be the least useful code written.
Of particular concern to businesses, heavyweight development methodologies are expensive, because they assume that people will make mistakes, and mitigate this tendency by making people do sufficient verification work before coding to (theoretically) ensure that mistakes are caught. Agile development is much cheaper, but only works if your people are in the top few per cent of the industry (which makes them more expensive to employ, of course), your development is done in-house, and you have good or at least well-understood business processes already in place.
Building software to do more than you need today is more expensive than building software for only your current needs. Building services and libraries saves you in the long run because you only write them once. The practical upshot is that most managers tend to want to use agile development methods even though their staff is incapable of doing so, or their business processes are immature; and most managers tend to want to ignore reusable code because their budget and schedule are based on this project, not the next one.
But academically-developed software, and much open source, is built by programmers for programmers with very little attention to usability. In some cases, such as for code libraries or faceless servers, this works very well. In others, such as for finished desktop applications, it often works very, very badly. Software that is perfect, but unusable, is not any better (except for strip mining the base code) than software that is imperfect but usable. In many ways, perfect base code with a lousy interface is worse than bad base code with a usable interface.
It is possible to build good software. But it's not common.
February 10, 2007
Can't Keep up with Catastrophe
Is it the "scientific consensus" on global warming, or the "almost unanimous" agreement of meteorologists on global cooling, or the ozone hole, or acid rain, or asteroid impact, that I'm supposed to be panicking about today? I can't keep up any more.
February 5, 2007
From Armed Liberal, check out Michael Wesch's impressive — what? Presentation? Movie? Artwork? — on Web 2.0. It happens that I'm not usually captured by the latest computer buzzwords, because my entry into the Internet was in the Spring of 1988, when not only did the web not exist, but its precursor (gopher) did not exist either. With that background, I often see the new buzzwords as old concepts repackaged. SOA? It's just the concept of services — like your web server or FTP server or mail server — where the interface is on port 80 (the web port) and is defined by description files rather than by a pre-agreed protocol. Useful, yes, but hardly earth-shattering in either concept or execution; I'd be happier to see more businesses get the first level of reuse right than to see more businesses jump on SOA — the payback is higher, and most companies haven't gotten to the level of code reuse across projects in the same group, never mind recycling of code and entities at an enterprise level.
In a way, web 2.0 is like that: it's a buzzword for a collection of web services and sites that really are rehashes of things already there. In a way.
But there is more to it than that, because it has always been the case that increasing the number of people capable of sharing information, and the amount of information they are able to share, changes the world. And web 2.0, stripped of the hype and boiled down to the common elements that tie these various sites and services together, is about one thing only: making it possible for everyone to share any information at all, any time, to anyone, without a mediator, a priest, a government official, an editor, a reviewer or anyone else in the way. That is the promise of web 2.0: universality of conversation, creation, art, life.
Now there I go sounding like the various hype-driven tech press organizations. But seriously, that is a non-trivial change. When I was a child, if someone had an idea, their capacity to share it was limited to people who knew them. Maybe a few would have type-written newsletters, perhaps for a school or company, in which they could share their idea. Even fewer, less than one in one hundred, would have the ability to get an idea consistently into the local newspaper, even as a letter to the editor (because there were so many, and such little space to print them). No more than one in a thousand — probably no more than one in one hundred thousand — could get their idea onto the radio, or on TV, or even in a book. The cost of sharing information was high; the speed was low. The barriers to entry, then, were such that relatively few ideas could be widely and quickly shared.
With web 2.0 — a buzzword I still hate — some professor in a relatively minor college has an idea, creates an expression of that idea, and it gets picked up by a guy in Canada, where it is seen by a blogger from Winds of Change, where I see it, and now you see it here. Moreover, at any or every step of the way, the idea and its expression could be passed verbatim, without notable chance of error, or modified into something completely new. I could, for example, take the idea, and make a different presentation, perhaps in the hated Powerpoint. Or I could snip the video apart and put in my own images and ideas to change the emphasis. You can do that, too. Everyone can.
I'm not one for hype, but sometimes it is justified.
Posted by jeff at 7:04 PM | TrackBack