Linux.conf.au 2015 – Day 2 – Session 3 – Sysadmin

Alerting Husbandry – Julien Goodwin

Obsolete alerts
- New staff members won’t have the context to know what is obsolete and should have been removed (or ignored)
Unactionable alerts – It is managed by another team, but they thought you’d like to be woken up
SLA Alerts – can I do something about that?
Bad thresholds (a server with 32 cores had a load of 4; that is not load). Disk space alerts with either too much or not enough margin
Thresholds only get redone after complete monitoring rebuilds
Hair-trigger alerts (fires once at 51ms rather than 50ms)
Not impacting redundancy (only one of 8 web servers is down)
Spamming alerts, the thing is down for the 2925379857th time. Even if important you’ve stopped caring
Alerts for something nobody cares about, eg test servers
Most of earlier items end up in “don’t care” bucket
Email alerts are bad: within a few weeks the entire team will have a filter to ignore them.
Undocumented alerts – If it is broken, what am I supposed to do about it?
Document actions to take in “playbook”
Alert acceptance practice, only oncallers should be accepting alerts
Need a way to silence it
Production by Fiat
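A minimal Python sketch of three of the points above: judging load per core rather than as a raw number, requiring a sustained breach before firing instead of a hair trigger, and paging on lost redundancy only when enough of a pool is down. All names and numbers here (`load_per_core`, `SustainedThreshold`, `redundancy_alert`, the limits used in the usage lines) are illustrative assumptions, not something shown in the talk.

```python
from collections import deque


def load_per_core(load_average, cores):
    """Normalise a load average by core count so big machines are judged fairly."""
    return load_average / cores


class SustainedThreshold:
    """Fire only after `required` consecutive samples exceed `limit`."""

    def __init__(self, limit, required):
        self.limit = limit
        # Keep only the most recent `required` breach/no-breach results.
        self.recent = deque(maxlen=required)

    def observe(self, value):
        """Record one sample; return True if the alert should fire now."""
        self.recent.append(value > self.limit)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def redundancy_alert(healthy, total, min_fraction):
    """Page only when the healthy share of a pool drops below `min_fraction`."""
    return healthy / total < min_fraction


# Hypothetical usage: a 50ms latency limit that must be breached 3 times in a row.
latency = SustainedThreshold(limit=50, required=3)
fired = [latency.observe(ms) for ms in (51, 49, 51, 52, 53)]
```

With these assumed settings a single 51ms sample does not page anyone, and one web server out of eight being down stays below the paging bar if `min_fraction` is set to, say, 0.75.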

Managing microservices effectively – Daniel Hall

Corralling logs with ELK – Mark Walkom

You don’t want to be your boss’s grep
Elasticsearch cluster, single master at any point
Sizing is best determined with a single machine: see how much it can handle. Keep the Java heap under 31GB
Lots of plugins and clients
APIs return JSON. ?pretty makes it look nicer. The “_cat/*” API is more command-line friendly
Adding a new node scales the cluster; it auto-balances and grows automatically
Logstash: lots of filters, handles just about any format, easy to set up.
Kibana – graphical front end for Elasticsearch
Curator, logstash-forwarder, grokdebugger
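As a rough illustration of the JSON and `_cat` APIs mentioned above, the Python sketch below builds the request URLs and fetches them with the standard library. The base address `http://localhost:9200` and the helper names `es_url` and `fetch` are assumptions for the example; `_cluster/health` and `_cat/nodes` are used as sample endpoints.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a local Elasticsearch node; adjust for a real cluster.
BASE = "http://localhost:9200"


def es_url(base, path, **params):
    """Join a base address, an API path and optional query parameters."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch(url):
    """Return the response body as text (needs a reachable node)."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# JSON API with ?pretty for human-readable indentation.
health_url = es_url(BASE, "_cluster/health", pretty="true")
# _cat API: column-oriented plain text, with v=true adding a header row.
nodes_url = es_url(BASE, "_cat/nodes", v="true")
```

Calling `fetch(health_url)` against a running node would return indented JSON, while `fetch(nodes_url)` would return a plain-text table better suited to a terminal.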

FAI — the universal deployment tool – Thomas Lange

From power off to applications running
It is all about installing software packages
Central administration and control
no master or golden image
can be expanded by hooks
plan your installation and FAI installs the plan
Boot up diskless client via PXE/tftp
creates partitions, file systems, installs, reboots
groups hosts by classes, multiple classes per host etc
Classes can be executables writing to standard output, can be in shell, can pass variables
partitioning, can handle LVM, RAID
Project started in 1999
Supports Debian-based distributions including Ubuntu
Supports bare metal, VM, chroot, LiveCD, Golden image
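The idea of a class being an executable that writes class names to standard output can be sketched as below. The talk mentions shell; Python is used here only for illustration. The class names (`DEBIAN`, `WEBSERVER`, `DATABASE`, `LVM`) and the hostname prefixes are hypothetical, and this is not taken from FAI's own examples.

```python
import socket
import sys


def classes_for(hostname):
    """Map a hostname to a list of class names (all names are hypothetical)."""
    classes = ["DEBIAN"]  # assumed base class shared by every host
    if hostname.startswith("web"):
        classes.append("WEBSERVER")
    if hostname.startswith("db"):
        # A host can belong to several classes at once.
        classes += ["DATABASE", "LVM"]
    return classes


if __name__ == "__main__":
    # Use the first argument as the hostname, else the local machine's name.
    name = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    # One class name per line on standard output.
    print("\n".join(classes_for(name)))
```

The point of the design is that class membership is computed rather than listed by hand, so a new host picks up its classes from its name or hardware.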

Documentation made complicated – Eric Burgueno

Incomplete, out of date, inconsistent
Tools – Word, LibreOffice -> Sharepoint
Sharepoint = let’s put this stuff over here so nobody will read it ever again
txt, markdown, html. Need to track changes
Files can be put in version control.
Mediawiki
Wiki – uncontrolled proliferation of pages, duplicate pages
Why can’t documentation be mixed in with the configuration management?
Documentation snippets
- Same everywhere (mostly)
- Reusable
Transclusion in MediaWiki (include one page inside another)
Modern versions of MediaWiki have parser functions, which display different content depending on a condition
awesomewiki.co
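To make the reusable-snippet idea concrete, here is a toy Python sketch of transclusion: a page that contains `{{:Other page}}` gets the other page's text substituted in. This is not MediaWiki's parser, only an illustration of the concept; the page names and contents are made up, and the `expand` helper and its depth limit are assumptions.

```python
import re

# Matches a simple {{:Page name}} inclusion with no parameters.
TRANSCLUDE = re.compile(r"\{\{:([^{}|]+)\}\}")


def expand(text, pages, depth=0, max_depth=5):
    """Replace {{:Name}} markers with page text, recursing up to max_depth."""
    if depth >= max_depth:
        return text  # stop runaway or circular inclusion

    def replace(match):
        name = match.group(1).strip()
        if name not in pages:
            return match.group(0)  # leave unknown pages untouched
        return expand(pages[name], pages, depth + 1, max_depth)

    return TRANSCLUDE.sub(replace, text)


# Hypothetical wiki content: one snippet reused by a build page.
pages = {
    "Snippet/NTP": "Point ntpd at ntp.example.com.",
    "Server build": "Install the OS. {{:Snippet/NTP}}",
}
rendered = expand(pages["Server build"], pages)
```

Updating the snippet page once changes every page that includes it, which is what keeps documentation the same everywhere.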