Linux.conf.au 2015 – Day 2 – Session 3 – Sysadmin

Alerting Husbandry – Julien Goodwin

  • Obsolete alerts
    • New staff members won’t have context to know what is obsolete and should have been removed (or ignored)
  • Unactionable alerts – It is managed by another team but thought you’d like to be woken up
  • SLA Alerts – can I do something about that?
  • Bad thresholds (a server with 32 cores had a load of 4, which is not real load); disk space alerts with either too much or not enough margin
  • Thresholds only get redone after complete monitoring rebuilds
  • Hair trigger alerts (fired once at 51ms when the threshold is 50ms)
  • Not impacting redundancy ( only one of 8 web servers is down )
  • Spamming alerts: thing is down for the 2925379857th time. Even if it is important, you’ve stopped caring
  • Alerts for something nobody cares about, eg test servers
  • Most of earlier items end up in “don’t care” bucket
  • Email alerts are bad: within a few weeks the entire team will have a filter to ignore them.
  • Undocumented alerts – If it is broken, what am I supposed to do about it?
  • Document actions to take in a “playbook”
  • Alert acceptance practice: only on-callers should be accepting alerts
  • Need a way to silence it
  • Production by Fiat
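
Several of the complaints above (raw load on a many-core box, hair-trigger latency alerts, paging for one of eight redundant servers) can be sketched as small checks. This is only an illustration and not something shown in the talk: the class and function names, the three-sample window and the 0.75 redundancy fraction are all assumptions.

```python
from collections import deque


class SustainedThreshold:
    """Fire only after `needed` consecutive samples exceed `limit` (assumed policy)."""

    def __init__(self, limit, needed=3):
        self.limit = limit
        self.recent = deque(maxlen=needed)

    def observe(self, value):
        # Remember whether this sample breached the limit, then fire only
        # when the window is full and every sample in it breached.
        self.recent.append(value > self.limit)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def load_per_core(load_average, cores):
    """Normalise load by core count so 4 on a 32-core box reads as 0.125."""
    return load_average / cores


def redundancy_at_risk(healthy, total, min_fraction=0.75):
    """Page only when the healthy share of a pool drops below min_fraction."""
    return healthy / total < min_fraction


latency = SustainedThreshold(limit=50, needed=3)
fired = [latency.observe(ms) for ms in (51, 49, 51, 52, 53)]
print(fired)
print(load_per_core(4, 32), redundancy_at_risk(7, 8))
```

With these assumed settings a single 51ms sample does not fire, and one web server out of eight being down does not page anyone.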

 

 

Managing microservices effectively – Daniel Hall

  • Step one – write your own apps
  • keep state outside apps
  • not nanoservices, not milliservices
  • Each should be replaceable, independently deployable, have a single capability
  • think about dependencies, especially circular
  • Packaging
    • small
    • multiple versions on same machine
    • in dev and prod
    • maybe use docker, have local registry
    • Small performance hit compared to VMs
    • Docker is a little immature
  • Step 3 deployment
    • Fast in and out
    • Minimal human interaction
    • Recovery from failures
    • Less overhead requires less overhead
    • We use Mesos and Marathon
    • Marathon handles switches from old app to new, task failure and recovery
    • Early on the Hype Cycle
  • Extra credit – scheduling
    • Chronos within Mesos
    • A bit newish
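
The note above about circular dependencies can be checked mechanically once the service dependencies are written down. A minimal sketch, assuming the dependencies are kept as a plain mapping from service name to the services it calls; the service names and the `find_cycle` helper are hypothetical and were not part of the talk.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of service names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(service):
        state[service] = GREY
        path.append(service)
        for dep in dependencies.get(service, ()):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we have looped back.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[service] = BLACK
        return None

    for service in dependencies:
        if state.get(service, WHITE) == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# Hypothetical services, for illustration only.
services = {
    "orders": ["billing"],
    "billing": ["accounts"],
    "accounts": ["orders"],
}
print(find_cycle(services))
```

A check like this could run in a build step, so a new circular dependency is caught before deployment.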

 

Corralling logs with ELK – Mark Walkom

  • You don’t want to be your boss’s grep
  • Cluster Elasticsearch; single master at any point
  • Sizing is best determined with a single machine: see how much it can handle. Keep Java heap under 31GB
  • Lots of plugins and clients
  • APIs return JSON. ?pretty makes it look nicer. The “_cat/*” API is more command-line friendly
  • Adding a new node scales the cluster: it rebalances and grows automatically
  • Logstash: lots of filters, handles just about any format, easy to set up.
  • Kibana – graphical front end for Elasticsearch
  • Curator, logstash-forwarder, grokdebugger
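
The filtering step mentioned above turns a free-text log line into named fields that can be stored as a JSON document. The sketch below imitates that idea with a plain Python regular expression. It is not Logstash or grok itself, and the log format, field names and sample line are assumptions chosen for illustration.

```python
import json
import re

# Assumed Apache-style access log layout; the field names are hypothetical.
LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert numeric fields so they can be searched and aggregated as numbers.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


sample = '203.0.113.7 - - [13/Jan/2015:14:05:01 +1300] "GET /index.html HTTP/1.1" 200 5120'
print(json.dumps(parse_line(sample), indent=2))
```

Lines that do not match return `None`, which makes unparsed input easy to count rather than silently dropping it.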

FAI — the universal deployment tool – Thomas Lange

  • From power off to applications running
  • It is all about installing software packages
  • Central administration and control
  • no master or golden image
  • can be expanded by hooks
  • plan your installation and FAI installs the plan
  • Boot up diskless client via PXE/tftp
  • creates partitions, file systems, installs, reboots
  • groups hosts by classes, multiple classes per host etc
  • Classes can be executables, writing to standard output, can be in shell, pass variables
  • partitioning, can handle LVM, RAID
  • Project started in 1999
  • Supports Debian-based distributions including Ubuntu
  • Supports bare metal, VM, chroot, LiveCD, Golden image
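
The idea above of classes being defined by an executable that writes to standard output can be sketched as follows. This is written in Python only to show the logic (the notes mention shell); the hostname prefixes and class names are hypothetical and not taken from the talk or from FAI's shipped configuration.

```python
import socket


def classes_for(hostname):
    """Return the class names for a host, in order (hypothetical policy)."""
    classes = ["DEBIAN"]
    if hostname.startswith("web"):
        classes += ["WEBSERVER", "RAID1"]
    elif hostname.startswith("db"):
        classes += ["DATABASE", "LVM"]
    else:
        classes.append("WORKSTATION")
    return classes


if __name__ == "__main__":
    # One class name per line on standard output.
    print("\n".join(classes_for(socket.gethostname())))
```

Keeping the decision in a function separate from the printing makes the class assignment easy to test without booting a client.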

 

Documentation made complicated – Eric Burgueno

  • Incomplete, out of date, inconsistent
  • Tools – Word, LibreOffice -> SharePoint
  • SharePoint = let’s put this stuff over here so nobody will read it ever again
  • txt , markdown, html. Need to track changes
  • Files can be put in version control.
  • MediaWiki
  • Wiki – uncontrolled proliferation of pages, duplicate pages
  • Why can’t documentation be mixed in with the configuration management
  • Documentation snippets
    • Same everywhere (mostly)
    • Reusable
  • Transclusion in MediaWiki (include one page inside another)
  • Modern versions of MediaWiki have parser functions, which display different content depending on a condition
  • awesomewiki.co
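
Transclusion, as noted above, lets one snippet be written once and pulled into many pages. The toy renderer below shows the mechanism using MediaWiki's `{{:Page name}}` syntax for including a page. It is a sketch rather than how MediaWiki is implemented, and the page titles, contents and the `render` helper are made up for illustration.

```python
import re

# Matches {{:Page name}}, the wikitext form for transcluding a page.
INCLUDE = re.compile(r"\{\{:([^{}|]+)\}\}")


def render(title, pages, depth=0, max_depth=10):
    """Return the text of `title` with every {{:Other page}} expanded."""
    if depth > max_depth:
        # Guard against pages that include each other in a loop.
        raise RecursionError(f"transclusion too deep at {title!r}")

    def expand(match):
        return render(match.group(1).strip(), pages, depth + 1, max_depth)

    return INCLUDE.sub(expand, pages[title])


# Hypothetical pages: one shared snippet reused by two build documents.
pages = {
    "Snippet/NTP": "Point ntpd at ntp.example.org.",
    "Web server build": "Install nginx. {{:Snippet/NTP}}",
    "DB server build": "Install PostgreSQL. {{:Snippet/NTP}}",
}
print(render("Web server build", pages))
print(render("DB server build", pages))
```

Editing the snippet page changes every document that includes it, which is what keeps the shared parts the same everywhere.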