38 e1000_intr 0.1250
39 flush_tlb_others 0.1625
44 e1000_clean 0.1719
48 __d_lookup 0.1667
48 find_vma 0.4286
54 ipt_do_table 0.0614
60 __reacquire_kernel_lock 0.7500
87 lock_kernel 1.8125
100 _spin_unlock_irqrestore 6.2500
101 schedule 0.0295
415 sigprocmask 1.9952
1084 do_sigaction 1.8819
16567 mwait_idle 207.0875
19864 total 0.0060
The main culprits were sigprocmask and do_sigaction, it was found that changing your cancellation state via pthreads with NPTL was manipulating the signal mask for the process - so we dropped those functions and saw a significant improvement.
However, sigprocmask and do_sigaction still held a significant position in the profiling output, and our system cpu continued to be higher than I felt was appropriate. And more importantly, it did not scale linearly, as the number of threads increased the system cpu would get more and more out of control even though not many of them were running concurrently. After some research it was found that the glibc version used in debian sarge installs and removes a SIGPIPE handler *every time you call syslog()*. And the 2.6 linux kernel does not like this at all when your process has alot of threads, it has to update signal-related data for every thread in the process every time it happens (O(N)).
Upon learning of this problem it became immediately clear that we needed to lose the local syslogd and forget syslog() exists. Just send the UDP packets straight to the syslog collection server. Lose the IPC overhead of talking to syslogd and lose this signal sickness.
Thats just what I did and below are the results.
Previous libbingo using syslog() function in the log worker thread:
-logging.cpu.rrd.png)
New libbingo using snprintf() and UDP sendto() in the log worker thread:
One Dell 2850, doing all the pop, and all the deliveries, for > 300,000 pop accounts.