Saturday, March 03, 2007

Fun with OpenLDAP, nscd, Cyrus-Imap, and Bind

After updating a server this weekend, I restarted it to verify that all services would come back up, and that's where the fun started...

The first issue I noticed was that Apache would not start. I issue a restart command:

primary ~ # /etc/init.d/apache2 restart

and I get no output, just a new prompt. But nothing is listening on port 80 or 443, and there are no running processes owned by apache.

primary ~ # netstat -anp --inet | egrep '80|443'

primary ~ # ps -eaf | grep apache

Nothing...
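A side note on that port check: egrep '80|443' matches those digits anywhere on the line, so a listener on port 8080 or a PID such as 4431 would also show up. Below is a rough sketch of a tighter filter. The listening_on helper name is made up for illustration, and it assumes the usual Linux net-tools column layout, where the local address is field 4 and the state is field 6.

```shell
# Hypothetical helper: read `netstat -an`-style lines on stdin and print only
# TCP sockets in LISTEN state whose local port is exactly $1.
# Assumes net-tools column order: proto recv-q send-q local foreign state ...
listening_on() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { print; found = 1 }
        END { exit !found }   # exit status 0 only if at least one line matched
    '
}

# Usage (not run here):
#   netstat -anp --inet | listening_on 80
#   netstat -anp --inet | listening_on 443
```

Because the function also sets its exit status, it can be used in a test such as: netstat -an --inet | listening_on 443 >/dev/null || echo "nothing on 443".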

From here, I notice that Bind isn't running and that my email is queuing up in Postfix instead of being delivered to my Cyrus-IMAP server. The logs show a lot of messages like this:

master[17957]: process 25160 exited, signaled to death by 13

...and...

Mar 2 14:53:43 primary master[15457]: process 15228 exited, signaled to death by 13
Mar 2 14:53:43 primary master[15457]: service imaps pid 15228 in READY state: terminated abnormally
Mar 2 14:53:43 primary master[15457]: process 15229 exited, signaled to death by 13
Mar 2 14:53:43 primary master[15457]: service imaps pid 15229 in READY state: terminated abnormally
Mar 2 14:53:43 primary master[15457]: process 15230 exited, signaled to death by 13
...and...
Mar 2 14:53:43 primary master[15457]: service lmtpunix pid 15318 in READY state: terminated abnormally
Mar 2 14:53:43 primary master[15457]: process 15319 exited, signaled to death by 13
Mar 2 14:53:43 primary master[15457]: service imaps pid 15319 in READY state: terminated abnormally
Mar 2 14:53:43 primary master[15457]: process 15320 exited, signaled to death by 13
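
The number at the end of those "signaled to death" messages is the signal that killed the process. On Linux, signal 13 is SIGPIPE, which a process receives when it writes to a pipe or socket whose other end has gone away. Here is a small sketch for tallying these deaths by signal. The count_signal_deaths name is hypothetical, and the log path in the usage comment is an assumption, so point it at wherever your syslog writes Cyrus messages.

```shell
# Hypothetical helper: read syslog lines on stdin and print "<signal> <count>"
# for every "signaled to death by N" message from the Cyrus master process.
count_signal_deaths() {
    awk '/signaled to death by [0-9]+$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }'
}

# Usage (not run here; the log path is an assumption):
#   count_signal_deaths < /var/log/messages
#
# To translate a signal number into its name:
#   kill -l 13
```

Knowing the signal does not identify the root cause on its own, but it does narrow the search to something the processes were trying to write to.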

At this point, I'm at a loss as to what is wrong, so I start checking config files. This doesn't make a lot of sense, since I didn't change any config files, but I do it anyway.

After a few hours of troubleshooting, I've re-installed Apache, Cyrus, and Bind and tried different versions of each. I ended up getting Bind to work by compiling it without thread support, but that is a workaround rather than a fix, as it means something else is wrong. I'm getting pretty frustrated without a working IMAP server and beginning to reconsider my decision to go with Cyrus in the first place.

Well...to make a long story short, it turns out that a few days prior to the updates, I had disabled nscd on this host to see how my OpenLDAP server would stand up to the increased load resulting from no caching. This did not seem to cause any problems, and I had made the change persistent. Anyway, after searching around on the web for a while, I discovered that another person had a similar issue with Cyrus: it would not start, and the cause was an unavailable LDAP server.

It turns out that when the system is using LDAP for authentication and nscd is not available, Cyrus, a multi-threaded Bind, and Apache (with my configuration) will not start up and run normally.
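
That combination is easy to test for before a reboot. The following is a minimal sketch, not a definitive check. It assumes LDAP lookups are wired in through the usual "ldap" source in /etc/nsswitch.conf (the post does not say how this host was set up), that pgrep is available, and all three function names are invented for illustration.

```shell
# Returns 0 if the given nsswitch.conf-style file (default /etc/nsswitch.conf)
# lists "ldap" as a source for passwd, group, or shadow. Commented-out lines
# are ignored because the pattern is anchored at the start of the line.
nss_uses_ldap() {
    grep -Eq '^[[:space:]]*(passwd|group|shadow):.*[[:space:]]ldap([[:space:]]|$)' \
        "${1:-/etc/nsswitch.conf}"
}

# Returns 0 if a process named exactly "nscd" is running (assumes pgrep exists).
nscd_running() {
    pgrep -x nscd >/dev/null 2>&1
}

# Warn and return 1 when LDAP is in use but nscd is not running.
check_ldap_cache() {
    if nss_uses_ldap "$1" && ! nscd_running; then
        echo "WARNING: NSS uses LDAP but nscd is not running" >&2
        return 1
    fi
    return 0
}

# Usage (not run here):
#   check_ldap_cache || echo "start nscd before restarting other daemons"
```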

Big LESSON LEARNED for me: always document changes, even on home servers, and make sure you restart the system after making big changes to verify that they don't affect other daemons. I learn this lesson about once a month on a new server; maybe this time it will persist.
