Alerting Husbandry – Julien Goodwin
- Obsolete alerts
- New staff members won’t have context to know what is obsolete and should have been removed (or ignored)
- Unactionable alerts – It is managed by another team but thought you’d like to be woken up
- SLA Alerts – can I do something about that?
- Bad thresholds (a server with 32 cores had a load of 4; that is not load). Disk space alerts with either too much or not enough margin
- Thresholds only get redone after complete monitoring rebuilds
- Hair-trigger alerts (fires once at 51ms rather than 50ms)
- Not impacting redundancy ( only one of 8 web servers is down )
- Spamming alerts: the thing is down for the 2925379857th time. Even if it is important, you’ve stopped caring
- Alerts for something nobody cares about, e.g. test servers
- Most of the earlier items end up in the “don’t care” bucket
- Emails are bad: within a few weeks the entire team will have a filter to ignore them.
- Undocumented alerts – If it is broken, what am I supposed to do about it?
- Document actions to take in “playbook”
- Alert acceptance practice: only oncallers should be accepting alerts
- Need a way to silence it
- Production by Fiat
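
The threshold, hair-trigger and redundancy points above can be sketched in code. This is a minimal illustration, not something shown in the talk: the class and function names, the three-sample window and the 50% healthy fraction are all assumptions chosen for the example.

```python
from collections import deque


class SustainedAlert:
    """Fire only when the last `window` samples all exceed the threshold."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record a sample and return True if the alert should fire."""
        self.samples.append(value)
        full = len(self.samples) == self.samples.maxlen
        return full and all(s > self.threshold for s in self.samples)


def load_per_core(load_average, cores):
    """Normalise load by core count, so load 4 on 32 cores reads as 0.125."""
    return load_average / cores


def pool_needs_paging(healthy, total, min_healthy_fraction=0.5):
    """Page only when redundancy is at risk, not for a single dead node.

    The 0.5 default is an arbitrary value for this example.
    """
    return healthy / total < min_healthy_fraction
```

With a window of 3, a single 51ms sample against a 50ms threshold never fires; three consecutive breaches do. Likewise `pool_needs_paging(7, 8)` stays quiet when one of eight web servers is down.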
Managing microservices effectively – Daniel Hall
- Step one – write your own apps
- keep state outside apps
- not nanoservices, not milliservices
- Each should be replaceable, independently deployable, and have a single capability
- think about dependencies, especially circular ones
- Packaging
- small
- multiple versions on same machine
- in dev and prod
- maybe use docker, have local registry
- Small performance hit compared to VMs
- Docker is a little immature
- Step 3 – deployment
- Fast in and out
- Minimal human interaction
- Recovery from failures
- Less overhead requires less overhead
- We use Mesos and Marathon
- Marathon handles switches from old app to new, task failure and recovery
- Early on the Hype Cycle
- Extra Credit – Scheduling
- Chronos within Mesos
- A bit newish
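
The advice to think about dependencies, especially circular ones, can be checked mechanically. The sketch below is one possible approach, not something from the talk: it assumes the service dependencies are available as a plain dictionary, and the service names in the usage note are made up.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of service names, or None.

    `deps` maps each service to the services it calls.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in deps.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: we have looped back.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for service in deps:
        if colour.get(service, WHITE) == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"orders": ["billing"], "billing": ["users"], "users": ["orders"]})` returns the loop `orders`, `billing`, `users`, `orders`, while an acyclic graph returns `None`.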
Corralling logs with ELK – Mark Walkom
- You don’t want to be your boss’s grep
- Cluster Elasticsearch, single master at any point
- Sizing is best determined with a single machine: see how much it can handle. Keep the Java heap under 31GB
- Lots of plugins and clients
- APIs return JSON. ?pretty makes it look nicer. The “_cat/*” API is more command-line friendly
- Adding a new node scales the cluster: it rebalances and grows automatically
- Logstash. lots of filters, handles just about any format, easy to setup.
- Kibana – graphical front end for Elasticsearch
- Curator, logstash-forwarder, grokdebugger
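
To make the `?pretty` and `_cat` notes concrete, here is a small sketch that builds the two kinds of request URL. It assumes a node listening on `localhost:9200` (Elasticsearch's default HTTP port); the `es_url` helper is a name invented for this example.

```python
from urllib.parse import urlencode, urlunsplit

# Assumption: a local node on Elasticsearch's default HTTP port.
ES_HOST = "localhost:9200"


def es_url(path, **params):
    """Build a URL for an Elasticsearch REST endpoint (hypothetical helper)."""
    # Boolean flags such as `pretty` and `v` are sent as "true" / "false".
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return urlunsplit(("http", ES_HOST, path, query, ""))


# JSON response, indented for humans.
health_url = es_url("/_cluster/health", pretty=True)
# Tabular, shell-friendly response; `v` adds column headers.
indices_url = es_url("/_cat/indices", v=True)
```

Against a running cluster, either URL can be fetched with `urllib.request.urlopen` or plain `curl`.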
FAI — the universal deployment tool – Thomas Lange
- From power off to applications running
- It is all about installing software packages
- Central administration and control
- no master or golden image
- can be expanded by hooks
- plan your installation and FAI installs the plan
- Boot up diskless client via PXE/tftp
- creates partitions, file systems, installs, reboots
- groups hosts by classes, multiple classes per host etc
- Classes can be defined by executables that write to standard output; these can be shell scripts and can pass variables
- partitioning, can handle LVM, RAID
- Project started in 1999
- Supports Debian-based distributions including Ubuntu
- Supports bare metal, VM, chroot, LiveCD, Golden image
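
Since classes can come from any executable that writes to standard output, a class script does not have to be shell. The following is a rough sketch in Python, assuming FAI picks up the class names the script prints, as the notes describe. The hostname prefixes and the class names `BASE`, `WEBSERVER`, `DATABASE` and `RAID1` are invented for illustration and are not taken from FAI's shipped configuration.

```python
#!/usr/bin/env python3
import socket
import sys


def classes_for(hostname):
    """Map a hostname to a list of class names (all names here are made up)."""
    classes = ["BASE"]
    if hostname.startswith("web"):
        classes.append("WEBSERVER")
    elif hostname.startswith("db"):
        classes += ["DATABASE", "RAID1"]
    return classes


if __name__ == "__main__":
    # Use the first argument if given, otherwise this machine's hostname.
    hostname = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    # Print one class name per line on standard output.
    print("\n".join(classes_for(hostname)))
```

A host can end up in several classes at once, which matches the "multiple classes per host" point above.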
Documentation made complicated – Eric Burgueno
- Incomplete, out of date, inconsistent
- Tools – Word, LibreOffice -> Sharepoint
- Sharepoint = let’s put this stuff over here so nobody will read it ever again
- txt , markdown, html. Need to track changes
- Files can be put in version control.
- Mediawiki
- Wiki – uncontrolled proliferation of pages, duplicate pages
- Why can’t documentation be mixed in with the configuration management
- Documentation snippets
- Same everywhere (mostly)
- Reusable
- Transclusion in MediaWiki (include one page inside another)
- Modern versions of MediaWiki have parser functions, which display different content depending on a condition
- awesomewiki.co
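
The idea behind transclusion, writing a snippet once and including it in many pages, can be shown with a toy model. The code below is not MediaWiki's parser: it only imitates the `{{:Page name}}` include syntax over an in-memory dictionary, and the page titles and the depth limit are assumptions made for the example.

```python
import re

# Matches {{:Page name}}, the wikitext form for including another page.
INCLUDE = re.compile(r"\{\{:([^{}|]+)\}\}")


def render(title, pages, depth=0, max_depth=5):
    """Expand {{:Title}} includes in pages[title], recursively."""
    if depth > max_depth:
        # Guards against pages that include each other in a loop.
        raise RecursionError(f"include depth exceeded at {title!r}")

    def expand(match):
        return render(match.group(1).strip(), pages, depth + 1, max_depth)

    return INCLUDE.sub(expand, pages[title])
```

With `pages = {"Backup policy": "Backups run nightly.", "Web server": "Setup notes. {{:Backup policy}}"}`, calling `render("Web server", pages)` yields `Setup notes. Backups run nightly.`, and editing the one snippet updates every page that includes it.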