In one of the more recent updates I included some system profiling information which revealed the biggest consumers of system cpu time:
    38 e1000_intr                                 0.1250
    39 flush_tlb_others                           0.1625
    44 e1000_clean                                0.1719
    48 __d_lookup                                 0.1667
    48 find_vma                                   0.4286
    54 ipt_do_table                               0.0614
    60 __reacquire_kernel_lock                    0.7500
    87 lock_kernel                                1.8125
   100 _spin_unlock_irqrestore                    6.2500
   101 schedule                                   0.0295
   415 sigprocmask                                1.9952
  1084 do_sigaction                               1.8819
 16567 mwait_idle                               207.0875
 19864 total                                      0.0060

The main culprits were sigprocmask and do_sigaction, it was found that changing your cancellation state via pthreads with NPTL was manipulating the signal mask for the process - so we dropped those functions and saw a significant improvement.

However, sigprocmask and do_sigaction still held a significant position in the profiling output, and our system cpu continued to be higher than I felt was appropriate. And more importantly, it did not scale linearly, as the number of threads increased the system cpu would get more and more out of control even though not many of them were running concurrently. After some research it was found that the glibc version used in debian sarge installs and removes a SIGPIPE handler *every time you call syslog()*. And the 2.6 linux kernel does not like this at all when your process has alot of threads, it has to update signal-related data for every thread in the process every time it happens (O(N)).

Upon learning of this problem it became immediately clear that we needed to lose the local syslogd and forget syslog() exists. Just send the UDP packets straight to the syslog collection server. Lose the IPC overhead of talking to syslogd and lose this signal sickness.

Thats just what I did and below are the results.

Previous libbingo using syslog() function in the log worker thread:

New libbingo using snprintf() and UDP sendto() in the log worker thread:

One Dell 2850, doing all the pop, and all the deliveries, for > 300,000 pop accounts.