Kubernetes
I’ll fill this in later.
Observability
- Honeycomb, Sumo Logic. Use AI to look at what happened at the same time and magically correlate it
- Expensive or hard to send all logs as volumes go up
- What if the logging is wrong or missing?
- Metrics
- Export in Prometheus format
- Read up on the RED and USE methods
- Create a company schema with half a dozen metrics that all services expose
- Have an event or transaction ID that flows across all the microservices so logs can be correlated
- Non technical solutions
- Refer to previous incident logs
- Make SLA stats part of the deliverables for the product, since these require logs etc.
- Testing logs
- Make sure certain events produce a log
- Chaos Monkey
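
The metrics and logging points above can be sketched together in a few lines of Python. This is only an illustration under assumptions: the metric names, the `orders` service name and the `txn_id=` log field are made up for the example and are not from the talk. The idea is that every service exposes the same small set of RED-style counters in the Prometheus text format, and every log line carries the transaction ID so logs from different services can be joined.

```python
import logging
from collections import Counter

# Hypothetical shared schema: every service exposes these same metric names.
REQUESTS_TOTAL = "service_requests_total"
ERRORS_TOTAL = "service_request_errors_total"
DURATION_SUM = "service_request_duration_seconds_sum"

metrics = Counter()
log = logging.getLogger("orders")  # "orders" is a made-up service name


def handle_request(txn_id: str, duration_s: float, ok: bool = True) -> None:
    """Record rate/error/duration counters and log with the transaction ID."""
    metrics[REQUESTS_TOTAL] += 1
    metrics[DURATION_SUM] += duration_s
    if ok:
        log.info("request handled txn_id=%s", txn_id)
    else:
        metrics[ERRORS_TOTAL] += 1
        log.error("request failed txn_id=%s", txn_id)


def render_prometheus(service: str) -> str:
    """Render the counters as Prometheus text-format sample lines."""
    lines = []
    for name in (REQUESTS_TOTAL, ERRORS_TOTAL, DURATION_SUM):
        lines.append(f'{name}{{service="{service}"}} {metrics[name]}')
    return "\n".join(lines) + "\n"
```

The same shape also covers the "testing logs" point: a test can attach a handler to the logger, trigger the event, and assert that a line containing the expected `txn_id` was written.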
ANZ Drivetrain
- Change control cares about
- Availability
- Risk
- Dependencies
- Rollback
- But the team doing the change already knows about all of these
- Saw tools out there that seem very opinionated
- Drivetrain
- Automated Checklist
- Work with the Change people to create the checklist
- Pipeline talks to Drivetrain and tells it what has been done
- Slack messages sent for manual changes (they log in to the app to approve)
- Looked at some other tools (e.g. Chef Automate, uDeploy)
- Forced team to work in a certain pattern
- But ServiceNow is the official corporate standard tool
- Looking at making Drivetrain fill in ServiceNow forms
- People worried about stages in the tool often didn't realise the existing process had the same limitations
- Risk is assessed at the Story and Feature level, not the release level
- Not suitable for products that do huge releases every few months with a massive number of changes.
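
As a rough illustration of the automated checklist idea, here is a minimal sketch in Python. It is not Drivetrain's actual design: the class names, the item names and the `STORY-1` identifier are all hypothetical. It assumes a checklist attached to a story, where the pipeline reports automated items as done and a person approves the manual ones.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistItem:
    name: str
    automated: bool  # True: reported by the pipeline; False: approved by a person
    done: bool = False


@dataclass
class ChangeChecklist:
    story: str
    items: list[ChecklistItem] = field(default_factory=list)

    def report(self, name: str) -> None:
        """Mark an item as done (called by the pipeline or by an approver)."""
        for item in self.items:
            if item.name == name:
                item.done = True
                return
        raise KeyError(f"unknown checklist item: {name}")

    def pending_manual(self) -> list[str]:
        """Manual items still waiting for approval, e.g. to notify via Slack."""
        return [i.name for i in self.items if not i.automated and not i.done]

    def ready(self) -> bool:
        """The change can proceed only when every item is done."""
        return all(i.done for i in self.items)
```

For example, a checklist with one automated item ("unit tests passed") and one manual item ("rollback plan reviewed") is not ready after the pipeline reports the first, and `pending_manual()` returns the second until someone approves it.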