2018 – Day 4 – Session 3

Insights – solving every problem for good by Paul Wayper


  • Too much to check, too little time
  • What does this message mean again?
  • Too reactive

How Sysadmins fix problems

  • Read text files and command output
  • Look at them for information
  • Check this information against their knowledge
  • Decide on an appropriate solution


  • Reads text files and command output
  • Processes them into information
  • Uses the information in rules
  • Rules provide information about the solution
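
The read, process, rule, solution flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Insights API: the names parse_hosts, Finding and rule_localhost_present are made up for this example, and the check used (is "localhost" present in /etc/hosts) is just a convenient simple case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Process raw /etc/hosts text into information: address -> hostnames."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        entries.setdefault(address, []).extend(names)
    return entries


@dataclass
class Finding:
    """What a rule reports: which rule fired and the suggested solution."""
    rule: str
    advice: str


def rule_localhost_present(hosts: Dict[str, List[str]]) -> Optional[Finding]:
    """Rule: fire when no entry in the parsed hosts data names 'localhost'."""
    if any("localhost" in names for names in hosts.values()):
        return None  # nothing wrong, so nothing to report
    return Finding("localhost_missing", "Add a localhost entry to /etc/hosts")
```

The rule never touches the file itself. It only sees the parsed data, so the same parser output could feed any number of rules.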


  • Simple rule – check “localhost” is in /etc/hosts
  • Rule 2 – chronyd refuses to fix the server’s time because it is out by more than 1000s
    • Checks /var/log/messages for the error message from chrony
  • Insights rolls up all the checks against messages, so the scan is only done once
  • Rule 3 – rsyslog dropping messages
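
A rough sketch of how checks against the messages log might be rolled up into a single pass is shown below. It is not how Insights itself is implemented. The function names are invented for this example, and the search strings are placeholders, not the real wording that chronyd or rsyslog write to the log.

```python
from typing import Dict, Iterable, List

# Hypothetical placeholder substrings, NOT the real chronyd or rsyslog messages.
PATTERNS: Dict[str, str] = {
    "chrony_time_not_fixed": "EXAMPLE-chrony-offset-too-large",
    "rsyslog_dropping_messages": "EXAMPLE-rsyslog-dropped",
}


def scan_once(lines: Iterable[str], patterns: Dict[str, str]) -> Dict[str, List[str]]:
    """Read the log a single time and collect the matching lines for every rule."""
    hits: Dict[str, List[str]] = {name: [] for name in patterns}
    for line in lines:
        for name, needle in patterns.items():
            if needle in line:
                hits[name].append(line.rstrip("\n"))
    return hits


def triggered_rules(hits: Dict[str, List[str]]) -> List[str]:
    """Names of the rules that matched at least one line, in sorted order."""
    return sorted(name for name, matched in hits.items() if matched)
```

In practice the lines would come from an open file handle on /var/log/messages. Each rule then looks only at its own list of hits, so adding another rule does not mean reading the log again.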