The Rat in the maze
Sometime after 2pm the load on service-00 jumped to over 100;
'top' showed slurmctld at the top.
Tried a previous solution of setting max_rpc_count to 32;
it worked temporarily: load dropped to ~80 and then went up again.
Messages in /var/log/slurmctld were showing 163 and more pending RPCs at cycle's end,
so set max_rpc_count to 163. slurmctld is no longer at the top of 'top', yet named is.
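A rough sketch of how the 163 figure could be pulled out of the log rather than read by eye. The helper name max_pending_rpcs is made up for illustration, and the pattern assumes the log lines contain the phrase "<N> pending RPCs"; adjust it if your slurmctld words the message differently.

```shell
# Print the largest "<N> pending RPCs" count found on stdin.
# Assumption: each relevant log line contains "<N> pending RPCs".
# Prints 0 when no such line is present.
max_pending_rpcs() {
    grep -oE '[0-9]+ pending RPCs' |
        awk '{ if ($1 > max) max = $1 } END { print max + 0 }'
}

# Usage (log path as in the post):
#   max_pending_rpcs < /var/log/slurmctld
```

The resulting number would then go into SchedulerParameters in slurm.conf. The Slurm documentation spells the option max_rpc_cnt, so the line would look like `SchedulerParameters=max_rpc_cnt=163`, typically followed by `scontrol reconfigure` to have slurmctld pick it up.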
named was logging a lot of queries for IPv6 addresses;
in /etc/sysconfig/named set OPTIONS="-4" so that named uses IPv4 only.
named was also getting somewhat saturated with requests from 221.6.4.77 and 221.6.4.78. An IP lookup showed these were coming from China. Dropped them in iptables.
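The exact iptables rules are not recorded here. A minimal sketch of one way to drop the two sources, assuming the rules go at the top of the INPUT chain and that all traffic from those addresses should be dropped (not just DNS). The helper name drop_rules is hypothetical; it only prints the commands so they can be reviewed before anything is applied.

```shell
# Print (but do not run) one iptables DROP rule per source IP given.
# Assumption: inserting at the top of INPUT is appropriate for this firewall.
drop_rules() {
    for ip in "$@"; do
        printf 'iptables -I INPUT -s %s -j DROP\n' "$ip"
    done
}

drop_rules 221.6.4.77 221.6.4.78
# After reviewing the output, apply as root, for example:
#   drop_rules 221.6.4.77 221.6.4.78 | sh
```

Rules added this way do not survive a reboot unless they are saved with the distribution's own mechanism.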
In the end the load stabilized.
The End.
