January 25, 2006
How to Hire a Good System Administrator
I am asked, periodically, how to hire good systems administrators, DBAs and integration people. Since it seems to be a topic of somewhat general interest at least among IT managers, I decided to address it here. But since most of my readers seem to come here for the political posts, I'll put it in the extended section.
Q: What are the qualities to look for in a good admin?
A: You want to get someone who is intelligent, capable of both intuition and logic, lazy, egotistical but not subject to easy shame, and a little bit compulsive (a lot persistent).
The kinds of systems installed in an enterprise are very, very complex. It takes someone with a fair degree of intelligence to be able to remember the interconnections and interactions of the various systems, as well as the components on each one, and know where to start looking when problems surface. For the same reason, you need someone capable of intuition (no one will ever completely understand the behavior of sufficiently complex installations), so that he'll know where to start looking, and with a very structured, logical mind, so that he'll know what to investigate and what to discard.
You want someone lazy, because lazy people hate to do work. Well, you've got the computers to do the work, so why should administrators have to do things? Things break, or are just not quite polished enough. A lazy system administrator will sometimes take weeks fixing things so that they don't break again, ever, or polishing the edges on something to save himself 5 minutes of work in a day. These kinds of improvements add up, in both system stability and reduced workload. That means that you can expand your environment without expanding your work force.
You want someone egotistical, because an egotistical person does not want to admit they cannot solve a problem. So they will work quite hard to solve a problem. And again, these are very complex systems, so someone without the ego investment will often fail to solve the really odd problems, the ones that you can work around at a cost, but whose causes are not apparent. However, you can't get someone so egotistical that they cannot admit to anything that would shame them: that type covers up their mistakes, and complete openness is required in order to track down unintended side effects. (Managers can help this along by not shooting the messenger, even when he's there to tell you that he just unintentionally took down your entire production environment.)
The reason that you want someone who is a bit compulsive, and quite persistent, is that some problems hover just below the level of "must fix now", so they don't get fixed. This is, again, often the case with difficult problems of uncertain provenance that have a (painful, but workable) workaround. A good admin will use his spare time to track down and fix these problems, because they bug him, and he can't leave them alone.
Q: How can I gauge an admin's true experience level, given how misleading resumes often are?
A: I've found that the best way to do this is to offer him the root password on a really big, really fast, really new system. If he's all eager to try it out, he's not very experienced. (If he starts asking about detailed OS levels, configuration and such, that's a clue.) A mid-level admin will just accept the password and ask what the machine is used for. A very experienced admin will groan, at least inwardly, and try to figure out how to avoid having the root password: he's been here before, and it's a jading experience after a while.
Q: What are good questions or tests?
A: Well, assuming that you are not particularly proficient yourself, find someone who is. Those kinds of questions vary by system, and it's hard to generalize. There are a few things you can do even if you don't understand the systems very well yourself:
Break the system (a test system, that is) in a known way. (For UNIX machines, setting application startup files to mode 000, or an invalid owner, is a good way to do this. Or fill up the filesystem on a box with a large file, open the file in a program that holds it open (a pager or tail -f is a safer bet than an editor, since many editors read the file into memory and close it), then delete the file from the filesystem and leave that program running. See if the candidate can figure out why the filesystem is full even though there aren't any files in it.) Let the admin fix it. It doesn't necessarily matter if he does; what you are looking for is whether he goes for the right ideas or not. The cleverer the break, the better the test, and the more likely the admin will be way off at first as he looks for the simple things. (If you see hoofprints, it's a good troubleshooting technique to look for horses before you look for zebras.)
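Both of the breaks described above can be staged with a few lines of script. What follows is only a minimal sketch in Python, assuming a POSIX test box; the function names, the paths and the file size are made up for illustration. It holds the deleted file open from the script itself, which is the part that keeps the disk space allocated.

```python
import os
import tempfile


def break_startup_file(path):
    """Set a startup file to mode 000 so nothing can read or execute it.

    'path' is a hypothetical startup script on a TEST system.
    """
    os.chmod(path, 0o000)


def hold_deleted_file(directory, size_bytes):
    """Create a file of size_bytes, keep it open, then delete its name.

    Returns (open_file, old_path). While open_file stays open, the
    space stays allocated even though the directory no longer lists
    the file. Closing it (or ending the process) frees the space.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    handle = os.fdopen(fd, "w+b")
    handle.write(b"\0" * size_bytes)  # really allocate the space
    handle.flush()
    os.unlink(path)  # the name is gone; the data is not
    return handle, path


def open_file_size(handle):
    """Report the size of the still-open file, name or no name."""
    return os.fstat(handle.fileno()).st_size
```

To stage the test, call hold_deleted_file() with a size close to the free space on the test filesystem and leave the script running. A candidate who compares df against du, and then goes looking for open-but-deleted files (lsof is the usual tool), is going for the right ideas.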
Explain that each of your 45 boxes has a separate root password, that these are cryptic and random and changed monthly, and that this is done for security reasons. If he does not vociferously question your sanity, he either does not understand security or he is not assertive enough to stand up for himself when he knows he's right. Or, alternately, he figures he can get around that behind your back. In any case, if he doesn't protest, don't hire him.
Another good one is to posit a situation where a critical security patch has been made available by the vendor, and his manager is insisting it be installed immediately. What process does he use? If he starts with the production systems, or doesn't consider outage windows, he may never have worked in a true enterprise shop. The risk that an untested patch compromises data or threatens application uptime almost always outweighs the short-term risk of leaving a system unpatched, so expect protests that "now" is not a good time to install anything, unless "now" is already in an outage window and the patch has already been tested.
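The answer you are hoping to hear boils down to a simple rule, which can be written out as a sketch. This is not any real change-management tool; the Host class, its field names and the "test"/"production" labels are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Host:
    name: str
    environment: str  # hypothetical labels: "test" or "production"
    window_start: Optional[datetime] = None  # next outage window, if any
    window_end: Optional[datetime] = None


def may_patch_now(host, patch_tested, now):
    """Return True if installing the patch on this host right now is sane.

    Test boxes can take the patch at any time; that is how it gets tested.
    Production takes it only if it has been tested AND 'now' falls inside
    an outage window.
    """
    if host.environment != "production":
        return True
    if not patch_tested:
        return False
    if host.window_start is None or host.window_end is None:
        return False
    return host.window_start <= now < host.window_end
```

A candidate who describes something like this unprompted (test boxes first, production only inside a window) has probably lived through a bad patch at least once.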
Oh, and ask what kind of puzzles he liked to do when he was a kid (or likes now). Every good admin I know did puzzles as a kid: word finds, crosswords, cryptoquotes, logic puzzles — something.
Q: What did you mean before about resumes being misleading?
A: System administration is an apprenticed art. There are no formal methods to administration that are worth the time to learn, because problems are too complex, diverse and non-repetitive (well, the meaningful problems don't repeat, anyway) to reduce to a set of rules. As with auto mechanics, it's a process of accumulation of techniques and insights that leads to good results. The quality of an admin depends more on how he was mentored, or whether he was capable of self-mentoring (some are), than on how many machines of what kinds in what circumstances he has worked on. That kind of stuff doesn't show up on a resume. The best admin I ever hired had something like 6 months of formal experience, and boxes he played with at home. (He was a security guard before I hired him.) The worst admin I ever hired had 15 years' experience on UNIX systems. Don't trust resumes.
As a systems integrator who is responsible for interviewing, I appreciate your article.
One suggestion... Was the applicant musically inclined? Many of the best sysadmins I've worked with played an instrument of some sort, as a child at least. Not sure why, but if you ask around, I bet you'll find the same.
Posted by: Chris Midkiff at January 25, 2006 3:51 PM
The one I was referring to as the best hire was not musically inclined. However, I have noticed that it is often the case that good admins are. (I, though, cannot carry a tune in a bag, and can barely play the radio.)
Posted by: Jeff Medcalf at January 25, 2006 5:48 PM
As someone who will probably be on the other side of these interviews in the not too distant future, I appreciate this post too. It's given me a few things to think about, and it's comforting that I think I mostly fit your description of good admin material, minus the compulsiveness.
Posted by: Matt McIntosh at January 25, 2006 10:33 PM
There's a high cross-correlation between musical ability and engineering ability. When I was in college the highest rate of transfer between schools was between the engineering school and the music school. The first chairs in the university symphony were mostly music majors; many of the rest were engineers.
Regardless of what title or certifications you may hold, you're an apprentice for the first five years. From five to nine years you're a journeyman. Seniors have nine or more years of experience. Ads that read "Senior system administrator, three years' experience required" are at best well-intentioned idiocy. They say more about what you're willing to pay (not much) than about what level of expertise is required.
Posted by: Dave Schuler at January 26, 2006 8:52 AM