Alerting Husbandry – Julien Goodwin
- Obsolete alerts
- New staff members won’t have the context to know what is obsolete and should have been removed (or ignored)
 
 
- Unactionable alerts – the system is managed by another team, but someone thought you’d like to be woken up anyway
 
- SLA Alerts – can I do something about that?
 
- Bad thresholds (a server with 32 cores had a load of 4; that is not load). Disk space alerts with either too much or not enough margin
 
- Thresholds only get redone after complete monitoring rebuilds
 
- Hair-trigger alerts (fires on a single reading of 51ms when the threshold is 50ms)
 
- Alerts for failures that don’t impact redundancy (only one of 8 web servers is down)
 
- Spamming alerts: the thing is down for the 2925379857th time. Even if it is important, you’ve stopped caring
 
- Alerts for something nobody cares about, e.g. test servers
 
- Most of the earlier items end up in the “don’t care” bucket
 
- Email alerts are bad: within a few weeks the entire team will have a filter to ignore them.
 
- Undocumented alerts – If it is broken, what am I supposed to do about it?
 
- Document the actions to take in a “playbook”
 
- Alert acceptance practice: only on-callers should be accepting alerts
 
- Need a way to silence it
 
- Production by Fiat
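
A minimal sketch of three of the fixes above: scaling a load threshold by core count, requiring a sustained breach instead of firing on one sample, and paging only when redundancy is actually at risk. The function names, default limits and window size are assumptions for illustration, not anything from the talk or from a particular monitoring tool.

```python
from typing import List


def load_is_high(load_avg: float, cores: int, per_core_limit: float = 1.0) -> bool:
    """Scale the load threshold by core count instead of using a fixed number."""
    return load_avg / cores > per_core_limit


def sustained_breach(values: List[float], threshold: float, min_consecutive: int = 3) -> bool:
    """True only if the most recent `min_consecutive` samples all exceed the threshold."""
    if len(values) < min_consecutive:
        return False
    return all(v > threshold for v in values[-min_consecutive:])


def redundancy_at_risk(healthy: int, total: int, min_healthy_fraction: float = 0.75) -> bool:
    """Page only when the healthy share of a pool drops below the required fraction."""
    return healthy / total < min_healthy_fraction


# A load of 4 on 32 cores is not load.
print(load_is_high(4, 32))
# One reading of 51ms against a 50ms threshold does not fire.
print(sustained_breach([48, 51, 49], threshold=50))
# One of 8 web servers down is not worth a page.
print(redundancy_at_risk(healthy=7, total=8))
```

The right limits depend on the service; the point is only that each threshold is expressed relative to capacity, time or pool size.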
 
 
 
Managing microservices effectively – Daniel Hall
- Step one – write your own apps
 
- keep state outside apps
 
- not nanoservices, not milliservices
 
- Each should be replaceable, independently deployable, and have a single capability
 
- think about dependencies, especially circular ones
 
- Packaging
- small
 
- multiple versions on same machine
 
- in dev and prod
 
- maybe use docker, have local registry
 
- Small performance hit compared to VMs
 
- Docker is a little immature
 
 
- Step 3 – deployment
- Fast in and out
 
- Minimal human interaction
 
- Recovery from failures
 
- Less overhead requires less overhead
 
- We use Mesos and Marathon
 
- Marathon handles switches from the old app to the new one, task failures and recovery
 
- Early on the Hype Cycle
 
 
- Extra credit – scheduling
- Chronos within Mesos
 
- A bit newish
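
On the point about circular dependencies: one way to check for them is a depth-first search over a map of which service calls which. This is a sketch under assumptions; the service names and the dictionary format are hypothetical and not from the talk.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of service names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state = {name: WHITE for name in deps}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        state[name] = GREY
        path.append(name)
        for dep in deps.get(name, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so the loop is closed
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[name] = BLACK
        return None

    for name in list(deps):
        if state[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical services: orders -> billing -> users -> orders
services = {"orders": ["billing"], "billing": ["users"], "users": ["orders"]}
print(find_cycle(services))
```

Run against a real dependency map, a non-None result names the services that cannot be deployed or replaced independently of each other.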
 
 
 
Corralling logs with ELK – Mark Walkom
- You don’t want to be your boss’s grep
 
- Cluster Elasticsearch; there is a single master at any point
 
- Sizing is best determined with a single machine: see how much it can handle. Keep the Java heap under 31GB
 
- Lots of plugins and clients
 
- APIs return JSON; ?pretty makes it look nicer. The “_cat/*” API is more command-line friendly
 
- Adding a new node scales the cluster; it rebalances and grows automatically
 
- Logstash: lots of filters, handles just about any format, easy to set up.
 
- Kibana – graphical front end for Elasticsearch
 
- Curator, logstash-forwarder, grokdebugger
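
To illustrate the two API styles mentioned above, here is a small sketch that builds one JSON URL with `pretty` and one `_cat` URL. It assumes a node listening on `localhost:9200` (the default HTTP port); `es_url` and `fetch` are hypothetical helper names, and `fetch` only works with a reachable cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def es_url(path: str, base: str = BASE_URL, **params: str) -> str:
    """Build a URL for an Elasticsearch HTTP endpoint with optional query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    return url + ("?" + urlencode(params) if params else "")


def fetch(url: str) -> str:
    """GET the URL and return the body as text (needs a reachable cluster)."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# JSON API: "pretty" asks for indented, human-readable JSON.
health_url = es_url("_cluster/health", pretty="true")
# _cat API: plain-text columns meant for the terminal; "v" adds a header row.
nodes_url = es_url("_cat/nodes", v="true")

print(health_url)
print(nodes_url)
```

With a cluster running, `print(fetch(nodes_url))` would show one line per node.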
 
FAI — the universal deployment tool – Thomas Lange
- From power off to applications running
 
- It is all about installing software packages
 
- Central administration and control
 
- no master or golden image
 
- can be expanded by hooks
 
- plan your installation and FAI installs the plan
 
- Boot up diskless client via PXE/tftp
 
- creates partitions, file systems, installs, reboots
 
- groups hosts by classes, multiple classes per host etc
 
- Classes can be executables writing to standard output; they can be written in shell and can pass variables
 
- partitioning, can handle LVM, RAID
 
- Project started in 1999
 
- Supports Debian-based distributions including Ubuntu
 
- Supports bare metal, VM, chroot, LiveCD, Golden image
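
The idea of a class being an executable that writes to standard output can be sketched as below. The talk mentions shell; this version uses Python purely for illustration and assumes an interpreter is available in the install environment. The hostname rules and class names are hypothetical.

```python
import socket
from typing import List


def classes_for(hostname: str) -> List[str]:
    """Map a hostname to class names; the naming rules and classes are hypothetical."""
    classes = ["BASE"]
    if hostname.startswith("web"):
        classes.append("WEBSERVER")
    elif hostname.startswith("db"):
        classes += ["DATABASE", "LVM"]
    return classes


def main() -> None:
    # Each class name printed to standard output is meant to be picked up
    # as a class for the host being installed.
    for name in classes_for(socket.gethostname()):
        print(name)


if __name__ == "__main__":
    main()
```

Because a host can belong to several classes, the script simply prints one name per line and leaves it to the installer to combine them.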
 
 
Documentation made complicated – Eric Burgueno
- Incomplete, out of date, inconsistent
 
- Tools – Word, LibreOffice  -> Sharepoint
 
- Sharepoint = let’s put this stuff over here so nobody will read it ever again
 
- txt, markdown, html. Need to track changes
 
- Files can be put in version control.
 
- MediaWiki
 
- Wiki – uncontrolled proliferation of pages, duplicate pages
 
- Why can’t documentation be mixed in with the configuration management
 
- Documentation snippets
- Same everywhere (mostly)
 
- Reusable
 
 
- Transclusion in MediaWiki (include one page inside another)
 
- Modern versions of MediaWiki have parser functions, which display different content depending on a condition
 
- awesomewiki.co
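
To show what transclusion of reusable snippets buys you, here is a toy expander for `{{:Page name}}` includes. It only imitates the idea and is not MediaWiki's parser; the page titles, page contents and the depth limit are assumptions for illustration.

```python
import re
from typing import Dict

INCLUDE = re.compile(r"\{\{:([^{}|]+)\}\}")  # matches {{:Page name}}


def render(title: str, pages: Dict[str, str], depth: int = 0, max_depth: int = 5) -> str:
    """Expand {{:Page}} includes recursively, stopping at max_depth to avoid loops."""
    if depth > max_depth:
        return f"[include depth exceeded at {title}]"
    text = pages.get(title, f"[missing page: {title}]")

    def expand(match: "re.Match[str]") -> str:
        return render(match.group(1).strip(), pages, depth + 1, max_depth)

    return INCLUDE.sub(expand, text)


# Hypothetical wiki: one snippet reused by two runbooks.
pages = {
    "Snippet/Restart nginx": "Run `systemctl restart nginx`.",
    "Web server runbook": "If the site is down: {{:Snippet/Restart nginx}}",
    "Proxy runbook": "After editing config: {{:Snippet/Restart nginx}}",
}
print(render("Web server runbook", pages))
print(render("Proxy runbook", pages))
```

Editing the snippet once changes both runbooks, which is the "same everywhere, reusable" property from the list above.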