Building a zero downtime Kubernetes cluster by Feilong Wang
Feilong works for Catalyst Cloud, which is especially appealing to NZ customers who don’t want the latency of going to Australia.
Zero Downtime in K8s Context
- Downtime of the user applications
- Downtime of the k8s cluster
The ultimate goal is zero downtime for the customer applications.
User Applications
- Replicas >2 (ideally >3)
- PodDisruptionBudget with minAvailable
- Correct RollingUpdate strategy
- Connection draining (using a readinessProbe and handling SIGTERM)
- Use a preStop hook for apps that don’t handle SIGTERM
- HTTP Keep-Alive
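As a rough illustration of how the settings above fit together, here is a minimal Python sketch that builds a Deployment and a PodDisruptionBudget as plain dictionaries and prints them as JSON. The field names are standard Kubernetes API fields, but the app name `web`, the image, the port, the `/healthz` path and all the numbers are assumptions made up for this example, not values from the talk.

```python
import json

# Hypothetical labels and names, used for illustration only.
APP_LABELS = {"app": "web"}


def build_deployment(replicas: int = 3) -> dict:
    """Build a Deployment manifest with a conservative rolling update."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            "strategy": {
                "type": "RollingUpdate",
                # Bring up one new pod before taking an old one away.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    # Time allowed for preStop plus SIGTERM handling.
                    "terminationGracePeriodSeconds": 30,
                    "containers": [{
                        "name": "web",
                        "image": "example.invalid/web:1.0",  # placeholder image
                        "ports": [{"containerPort": 8080}],
                        # Traffic is only routed to pods that pass this probe.
                        "readinessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080},
                            "periodSeconds": 5,
                        },
                        # For apps that do not handle SIGTERM: wait a little
                        # so in-flight requests can finish before shutdown.
                        "lifecycle": {
                            "preStop": {"exec": {"command": ["sleep", "10"]}},
                        },
                    }],
                },
            },
        },
    }


def build_pdb(min_available: int = 2) -> dict:
    """Build a PodDisruptionBudget that keeps min_available pods running."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": "web-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": APP_LABELS},
        },
    }


if __name__ == "__main__":
    for manifest in (build_deployment(), build_pdb()):
        print(json.dumps(manifest, indent=2))
```

With 3 replicas and minAvailable set to 2, a voluntary disruption such as a node drain can evict at most one pod at a time.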
Zero Downtime for the K8s Cluster
- Planned maintenance (eg an upgrade)
- Unexpected node failure
Planned
- Cordon and drain nodes, upgrade, uncordon
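The cordon, drain, upgrade, uncordon sequence could be scripted roughly as below. This is only a dry-run sketch: it builds and prints the `kubectl` commands rather than running them, and the node name and the `./upgrade-node.sh` script are hypothetical placeholders for whatever the upgrade step actually is.

```python
import shlex


def maintenance_commands(node: str, upgrade_cmd: list[str]) -> list[list[str]]:
    """Return the commands for planned maintenance of one node, in order."""
    return [
        # 1. Stop new pods from being scheduled onto the node.
        ["kubectl", "cordon", node],
        # 2. Evict the running pods; evictions respect PodDisruptionBudgets.
        ["kubectl", "drain", node,
         "--ignore-daemonsets", "--delete-emptydir-data", "--timeout=300s"],
        # 3. Whatever the upgrade is (placeholder supplied by the caller).
        upgrade_cmd,
        # 4. Allow scheduling on the node again.
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    # "node-1" and the upgrade script are made-up names for this example.
    for cmd in maintenance_commands("node-1", ["./upgrade-node.sh", "node-1"]):
        print(shlex.join(cmd))
```

To actually run the steps, each command could be passed to `subprocess.run(cmd, check=True)` so that a failed drain stops the sequence before the upgrade starts.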
Unplanned Node Failure
- Failure detection
- Repair/Healing
- Manual or Automatic?
Detect Failure
- Detect failures from outside or inside the cluster
Draino + Cluster Autoscaler
- Draino detects node status/conditions
- Draino drains the node
- Autoscaler will remove the empty node since its workload is under 50%
- See also Node Problem Detector
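The detection and removal logic described above can be sketched in a few lines of Python. This is not Draino's or the Cluster Autoscaler's actual code: it is a simplified model that assumes a node object shaped like the Kubernetes API's `status.conditions` list, and it approximates utilisation with CPU requests only. The 0.5 threshold mirrors the "under 50%" figure in the notes.

```python
# The "under 50%" figure from the notes.
SCALE_DOWN_UTILIZATION_THRESHOLD = 0.5

# Standard node conditions that indicate a problem when their status is "True".
PROBLEM_CONDITIONS = {
    "MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable",
}


def unhealthy_reasons(node: dict) -> list[str]:
    """Return the conditions that make a node a candidate for draining."""
    reasons = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        # Ready should be "True"; "False" or "Unknown" both mean trouble.
        if ctype == "Ready" and status != "True":
            reasons.append(f"Ready={status}")
        elif ctype in PROBLEM_CONDITIONS and status == "True":
            reasons.append(f"{ctype}=True")
    return reasons


def scale_down_candidate(requested_cpu: float, allocatable_cpu: float) -> bool:
    """Simplified check: is the node's requested CPU below the threshold?"""
    if allocatable_cpu <= 0:
        return False
    return requested_cpu / allocatable_cpu < SCALE_DOWN_UTILIZATION_THRESHOLD


if __name__ == "__main__":
    node = {"status": {"conditions": [
        {"type": "Ready", "status": "Unknown"},
        {"type": "MemoryPressure", "status": "False"},
    ]}}
    print(unhealthy_reasons(node))
    # After draining, almost nothing is requested on the node.
    print(scale_down_candidate(requested_cpu=0.1, allocatable_cpu=4.0))
```

Once the node has been drained, its requested resources fall close to zero, which is what makes it eligible for removal by the autoscaler.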
Magnum AutoHealer
- Supports repairing master nodes and etcd
- Autoscaler is responsible for repairing
- The node count is predictable after repairing
- Currently only supports OpenStack but could be extended
Like, Share and Subscribe: Effective Communication of Security Advice by Serena Chen
Tools and ideas to help you communicate security advice to friends and family who are not in tech.
Security Professionals are a bubble within the Tech Bubble.
Security people tend to tell those who follow bad practices (like using Windows XP) that “we can’t help you”.
Nobody chooses to do the wrong thing and be insecure; they are trying to do the best for themselves.
What if people are bad at security not “because it is hard” but because they are not getting the right messages?
Personas
- Group 1
  - Don’t know what good practice looks like
  - Confused about what to do
- Group 2
  - Knows some good practices
  - But doesn’t do any of them (e.g. knows about password managers but doesn’t use them)
  - Not sure how to implement them
Security is a lot like exercise
- Ongoing
- More is better
- Room for improvement
- Little steps, not big steps
  - Do one update, not a huge change
- The Perfect is the enemy of the good
- Personalised for each person
How to Personalise for each person
Consider where on the following spectrums they fall
- Technological Capability
- Privacy needs
  - Don’t forget those who need to be visible
- Likely Adversaries
The Open Internet Tools Project has a big sample of personas
Lay a Path for Progression
- Couch to 5k for Security
  - Week 1 – Add a password on your phone
  - Week 2 – Change your email password
How do we communicate
- Tell, sell and shame don’t work
- Lead by example (this is what I do, you could too)
- Sell doesn’t work
  - Give people successful examples to emulate
  - Give people scripts to help them navigate
- Shame also doesn’t work
  - Shame culture means that people don’t ask for advice
  - Try asking “Hey, can I show you a better way to do this?”
“Influencers”
- Show don’t tell
- Show their mistakes
- Let you opt in and not out
- Give you a range of people to follow
- I made a YouTube channel!
  - Immediately fell back into the habit of Tell, Sell and Shame
  - To reach people requires a degree of vulnerability
  - Experts are the ones who don’t want to reveal their personal security setup
- What else happened
  - Friends asked me about my security
  - Showed people IRL my personal setup and how I got there
  - Honest about how hard it was
  - A lot of them were already clued up; seeing somebody they know actually doing it encouraged them to take the step and do it
Be Vulnerable
- Tell them how you screwed up
- People want to hear how they are not stupid for finding it hard
- Be nice to people