DevOpsDaysNZ 2018 – Day 2 – Session 3

Kubernetes

I’ll fill this in later.

Observability

Honeycomb, Sumo Logic. Use AI to look at what happened at the same time and magically correlate it
Expensive or hard to send all logs as volumes go up
What if the logging is wrong or missing?
Metrics
- Export in Prometheus format
- Read the RED and USE method papers
- Create a company schema with half a dozen metrics that all services expose
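A minimal sketch of what such a shared schema could look like, assuming the company standard is built on the RED method (Rate, Errors, Duration for every service). The metric names, the `service` label and the bucket bounds are hypothetical. The output is hand-rendered in the Prometheus text exposition format only to show its shape; a real service would use an official client library such as `prometheus_client`. USE (Utilisation, Saturation, Errors) applies to resources like CPUs and disks and isn't shown.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical request-duration bucket bounds, in seconds.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0]


@dataclass
class RedMetrics:
    """Hypothetical company-wide RED metrics that every service would expose."""
    service: str
    requests: int = 0
    errors: int = 0
    durations: List[float] = field(default_factory=list)

    def observe(self, seconds: float, ok: bool) -> None:
        """Record one handled request."""
        self.requests += 1
        if not ok:
            self.errors += 1
        self.durations.append(seconds)

    def render(self) -> str:
        """Render the metrics in the Prometheus text exposition format."""
        label = f'service="{self.service}"'
        lines = [
            "# HELP requests_total Requests handled.",
            "# TYPE requests_total counter",
            f"requests_total{{{label}}} {self.requests}",
            "# HELP request_errors_total Requests that failed.",
            "# TYPE request_errors_total counter",
            f"request_errors_total{{{label}}} {self.errors}",
            "# HELP request_duration_seconds Request latency.",
            "# TYPE request_duration_seconds histogram",
        ]
        # Histogram buckets are cumulative: each counts observations <= its bound.
        for bound in BUCKETS:
            count = sum(1 for d in self.durations if d <= bound)
            lines.append(f'request_duration_seconds_bucket{{{label},le="{bound}"}} {count}')
        lines.append(f'request_duration_seconds_bucket{{{label},le="+Inf"}} {len(self.durations)}')
        lines.append(f"request_duration_seconds_sum{{{label}}} {sum(self.durations)}")
        lines.append(f"request_duration_seconds_count{{{label}}} {len(self.durations)}")
        return "\n".join(lines) + "\n"
```

Because every service emits the same names and labels, one set of dashboards and alerts can cover all of them, filtered by `service`.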
Have an event or transaction ID that flows across all the microservices so logs can be correlated
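A minimal sketch of carrying such an ID between services using Python's standard `logging` and `contextvars` modules. The header name `X-Correlation-ID`, the two services and the log format are assumptions for illustration, and real HTTP calls are replaced by passing a headers dict.

```python
import contextvars
import io
import logging
import uuid
from typing import Dict, TextIO, Tuple

# Assumed header name; X-Correlation-ID / X-Request-ID are conventions, not a standard.
HEADER = "X-Correlation-ID"

# ID of the request currently being handled; safe across threads and asyncio tasks.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the current correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def make_logger(name: str, stream: TextIO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle(service: str, headers: Dict[str, str], logger: logging.Logger) -> Dict[str, str]:
    """Reuse the caller's ID if present, otherwise start a new one, and pass it on."""
    cid = headers.get(HEADER) or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        logger.info("%s handling request", service)
        return {HEADER: cid}  # headers to send on any downstream call
    finally:
        correlation_id.reset(token)


def demo() -> Tuple[str, str]:
    """Two hypothetical services handling one request chain."""
    buf = io.StringIO()
    log = make_logger("demo", buf)
    downstream = handle("frontend", {}, log)  # no incoming ID: generate one
    handle("payments", downstream, log)       # reuse the frontend's ID
    return downstream[HEADER], buf.getvalue()
```

Every line logged by both services starts with the same ID, so a log search for that one value returns the whole request path.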
Non-technical solutions
- Refer to previous incident logs
- Part of the deliverables for a product is SLA stats, which require logs etc.
Testing logs
- Make sure certain events produce a log
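One way to do this in Python is `unittest`'s `assertLogs`, which fails the test if no matching record is logged inside the block. `issue_refund` and its message are hypothetical stand-ins for whatever event operations relies on seeing.

```python
import logging
import unittest

logger = logging.getLogger("payments")


def issue_refund(order_id: str, amount_cents: int) -> None:
    """Hypothetical business event that must always leave a log line."""
    # ... perform the refund here ...
    logger.info("refund issued order=%s amount_cents=%d", order_id, amount_cents)


class RefundLoggingTest(unittest.TestCase):
    def test_refund_is_logged(self) -> None:
        # assertLogs fails if nothing at INFO or above is logged to "payments".
        with self.assertLogs("payments", level="INFO") as captured:
            issue_refund("A123", 500)
        # captured.output holds "LEVEL:logger:message" strings.
        self.assertIn(
            "INFO:payments:refund issued order=A123 amount_cents=500",
            captured.output,
        )
```

Run it with `python -m unittest`; if someone later removes or rewords the log line, the test breaks before the logging goes missing in production.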
Chaos Monkey

ANZ Drivetrain

Change control cares about
- Availability
- Risk
- Dependencies
- Rollback
But the team doing the change knows about all of these
Saw tools out there that seem very opinionated
Drivetrain
- Automated Checklist
- Work with Change people to create checklist
- Pipeline talks to Drivetrain and tells it what has been done
- Slack messages sent for manual changes (they log in to the app to approve)
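A rough sketch of the automated checklist idea as described in the talk, not Drivetrain's actual design: every class, field and item name below is hypothetical, and `notify` is a plain callback standing in for the Slack message rather than a real Slack API call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChecklistItem:
    name: str
    automated: bool  # True: the pipeline reports it; False: a person must approve it
    done: bool = False


@dataclass
class ChangeRecord:
    """Hypothetical change record holding a checklist agreed with the change team."""
    change_id: str
    items: Dict[str, ChecklistItem]
    notify: Callable[[str], None]  # stand-in for sending a Slack message
    audit: List[str] = field(default_factory=list)

    def report(self, name: str) -> None:
        """Called by the pipeline when an automated step has been done."""
        item = self.items[name]
        if not item.automated:
            raise ValueError(f"{name} needs manual approval")
        item.done = True
        self.audit.append(f"pipeline: {name}")

    def request_approvals(self) -> None:
        """Ask a person to approve each outstanding manual item."""
        for item in self.items.values():
            if not item.automated and not item.done:
                self.notify(f"{self.change_id}: please approve '{item.name}'")

    def approve(self, name: str, approver: str) -> None:
        """Record a manual approval made in the app."""
        self.items[name].done = True
        self.audit.append(f"{approver}: {name}")

    def ready(self) -> bool:
        return all(item.done for item in self.items.values())
```

The pipeline calls `report` for each automated item, `request_approvals` pings people for the manual ones, and the change can proceed once `ready()` is true; the `audit` list is what the change people would review.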
Looked at some other tools (e.g. Chef Automate, uDeploy)
- Forced teams to work in a certain pattern
But ServiceNow is the official corporate standard tool
- Looking at making Drivetrain fill in the ServiceNow forms
People worried about the stages in the tool often didn't realise the existing process had the same limitations
Risk is assessed at the Story and Feature level, not the release level
Not suitable for products that do huge releases every few months with a massive number of changes.