Audits
Why you need to Audit
People suck; we all make mistakes, and there's very little we can do about it except try to catch and fix as many as possible. For this reason, auditing is critical to the success of managing any operating system. Every so often, preferably in a time of relative stability, it is important to perform an audit of your system.
We perform audits to ensure that the set-up on our servers remains consistent, working and up-to-date. When something goes wrong, it is essential that you know that a machine was in a known state at a certain time.
How to Audit
We perform our audits manually, using paper and pen; we've found this works best. Our audits consist of a military-style checklist, with actual checkboxes on paper, laid out in a logical order. Ours contains simple things like checking the contents of /etc/resolv.conf and /etc/apt/sources.list, but also more important things like ensuring that unnecessary services have been disabled in inetd, and that the system has a working MTA so that mail about updates can actually reach administrators.
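As an aid to the inetd item on such a checklist, the following is a minimal sketch (not part of our procedure) that lists which services are still enabled in inetd.conf-style text, so the auditor can decide by hand whether each is still needed. The function name enabled_inetd_services and the sample text are hypothetical; the sketch assumes the usual inetd.conf layout, where disabled entries are commented out with a leading '#'.

```python
def enabled_inetd_services(conf_text):
    """Return the service names of uncommented entries in inetd.conf-style text."""
    services = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Blank lines and lines starting with '#' are comments or disabled entries.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # A complete entry has at least six fields: service name, socket type,
        # protocol, wait flag, user and server program.
        if len(fields) >= 6:
            services.append(fields[0])
    return services


# Hypothetical sample text, for illustration only.
SAMPLE = """\
# example inetd configuration
#<off># echo stream tcp nowait root internal
ftp stream tcp nowait root /usr/sbin/tcpd /usr/sbin/in.ftpd
time dgram udp wait root internal
"""

for name in enabled_inetd_services(SAMPLE):
    print("[ ] still needed? " + name)
```

On a real machine the text would be read from /etc/inetd.conf rather than from the sample string; the decision about whether a service is unnecessary stays with the auditor.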
Our audit procedure also contains adequate space on the reverse for the auditor to make notes of outstanding issues. Thus an audit will also serve as a repository of information on how any of your systems are non-standard. A glance over our audits will tell you which servers are non-Dell, which require special lilo boot arguments and which do not boot cleanly.
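A sheet of this kind, with checkboxes and room for notes, could be produced for printing with a short script. This is only a sketch under our own assumptions: render_checklist, the item wording, the date and auditor fields and the host name example-host are all hypothetical, and the four items are simply the examples mentioned above, not a complete audit.

```python
def render_checklist(hostname, items, note_lines=5):
    """Build a plain-text audit sheet with checkboxes and blank note lines."""
    lines = [
        "Audit checklist for " + hostname,
        "Date: ________  Auditor: ________",
        "",
    ]
    # One empty checkbox per item, kept in the order given.
    for item in items:
        lines.append("[ ] " + item)
    lines.append("")
    lines.append("Notes (outstanding issues, non-standard set-up):")
    # Ruled lines for handwritten notes.
    lines.extend(["_" * 60] * note_lines)
    return "\n".join(lines)


# Illustrative items only, taken from the examples in the text.
ITEMS = [
    "/etc/resolv.conf contents checked",
    "/etc/apt/sources.list contents checked",
    "unnecessary services disabled in inetd",
    "MTA working; mail about updates reaches administrators",
]

print(render_checklist("example-host", ITEMS))
```

The output is meant to be printed and filled in by hand, so the audit itself remains a pen-and-paper exercise.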
If any aspect of the audit is service-affecting, it should occur during scheduled maintenance windows or designated at-risk periods.
