DevOpsDays Wellington 2016 – Day 2, Session 3

Ignites

Mrinal Mukherjee – How to choose a DevOps tool

Right Tool
– Does the job
– People will accept

Wrong tool
– Never ending Poc
– Doesn’t do the job

How to pick
– Budget / Licensing
– does it address your pain points
– Learning cliff
– Community support
– API
– Enterprise acceptability
– Config in version control?

Central tooling team
– Pro standardize, educate, education
– Constant Bottleneck, delays, stifles innovation, not in sync with teams

DevOps != Tool
Tools != DevOps

Tools facilitate it not define it.

Howard Duff – Eric and his blue boxes

Physical example of KanBan in an underwear factory

Lindsey Holmwood – Deepening people to weather the organisation

Note: Lindsey presents really fast so I missed recording a lot from the talk

His Happy, High performing Team -> He left -> 6 months later half of team had left

How do you create a resilient culture?

What is culture?
– Lots of research in organisation psychology
– Edgar Schein – 3 levels of culture
– Artefacts, Values, Assumptions

Artefacts
– Physical manifestations of our culture
– Standups, Org charts, desk layout, documentation
– actual software written
– Easiest to see and adopt

Values
– Goals, strategies and philosophise
– “we will dominate the market”
– “Management if available”
– “nobody is going to be fired for making a mistake”
– lived values vs aspiration values (People have good nose for bullshit)
– Example, cores values of Enron vs reality
– Work as imagined vs Work is actually done

Assumptions
– beliefs, perceptions, thoughts and feelings
– exist on an unconscious level
– hard to discern
– “bad outcomes come from bad people”
– “it is okay to withhold information”
– “we can’t trust that team”
– “profits over people”

If we can change our people, we can change our culture

What makes a good team member?

Trust
– Vulnerability
– Assume the best of others
– Aware of their cognitive bias
– Aware of the fundamental attribution error (judge others by actions, judge ourselves by our intentions)
– Aware of hindsight bias. Hindsight bias is your culture killer
– When bad things happen explain in terms of foresight
– Regular 1:1s
Eliminate performance reviews
Willing to play devils advocate

Commit and acting
– Shared goal settings
– Don’t solutioneer
– Provide context about strategy, about desired outcome
What makes a good team?

Influence of hiring process
– Willingness to adapt and adopt working in new team
– Qualify team fit, tech talent then rubber stamp from team lead
– have a consistent script, but be prepared to improvise
– Everyone has the veto power
– Leadership is vetoing at the last minute, thats a systemic problem with team alignment not the system
– Benefit: team talks to candidate (without leadership present)
– Many different perspectives
– unblock management bottlenecks
– Risk: uncovering dysfunctions and misalignment in your teams
– Hire good people, get out of their way

Diversity and inclusion
– includes: race, gender, sexual orientation, location, disability, level of experience, work hours
– Seek out diverse candidates.
– Sponsor events and meetups
– Make job description clear you are looking for diverse background
– Must include and embrace differences once they actually join
– Safe mechanism for people to raise criticisms, and acting on them

Leadership and Absence of leadership
– Having a title isn’t required
– If leader steps aware things should continue working right
– Team is their own shit umbrella
– empowerment vs authority
– empowerment is giving permission from above (potentially temporary)
– authority is giving power (granting autonomy)

Part of something bigger than the team
– help people build up for the next job
– Guilds in the Spotify model
– Run them like meetups
– Get senior management to come and observe
– What we’re talking about is tech culture

We can change tech culture
– How to make it resist the culture of the rest of the organisation
– Artefacts influence behaviour
– Artifact fast builds -> value: make better quality
– Artifact: post incident reviews -> Value: Failure is an opportunity for learning

Q: What is a pre-incident review
A: Brainstorm beforehand (eg before a big rollout) what you think might go wrong if something is coming up
then afterwards do another review of what just went wrong

Q: what replaces performance reviews
A: One on ones

Q: Overcoming Resistance
A: Do it and point back at the evidence. Hard to argue with an artifact

Q: First step?
A: One on 1s

Getting started, reading books by Patrick Lencioni:
– Solos, Politics and turf wars
– 5 Dysfunctions of a team

Share

DevOpsDays Wellington 2016 – Day 2, Session 2

Troy Cornwall & Alex Corkin – Health is hard: A Story about making healthcare less hard, and faster!

Maybe title should be “Culture is Hard”

@devtroy @4lexNZ

Working at HealthLink
– Windows running Java stuff
– Out of date and poorly managed
– Deployments manual, thrown over the wall by devs to ops

Team Death Star
– Destroy bad processes
– Change deployment process

Existing Stack
– VMware
– Windows
– Puppet
– PRTG

CD and CI Requirements
– Goal: Time to regression test under 2 mins, time to deploy under 2 mins (from 2 weeks each)
– Puppet too slow to deploy code in a minute or two. App deply vs Conf mngt
– Can’t use (then) containers on Windows so not an option

New Stack
– VMware
– Ubuntu
– Puppet for Server config
– Docker
– rancher

Smashed the 2 minute target!

But…
– We focused on the tech side and let the people side slip
– Windows shop, hard work even to get a Linux VM at the start
– Devs scared to run on Linux. Some initial deploy problems burnt people
– Lots of different new technologies at once all pushed to devs, no pull from them.

Blackout where we weren’t allowed to talk to them for four weeks
– Should have been a warning sign…

We thought we were ready.
– Ops was not ready

“5 dysfunctions of a team”
– Trust as at the bottom, we didn’t have that

Empathy
– We were aware of this, but didn’t follow though
– We were used to disruption but other teams were not

Note: I’m not sure how the story ended up, they sort of left it hanging.

Pavel Jelinek – Kubernetes in production

Works at Movio
– Software for Cinema chains (eg Loyalty cards)
– 100million emails per month. million of SMS and push notifications (less push cause ppl hate those)

Old Stack
– Started with mysql and php application
– AWS from the beginning
– On largest aws instance but still slow.

Decided to go with Microservices
– Put stuff in Docker
– Used Jenkins, puppet, own docker registery, rundeck (see blog post)
– Devs didn’t like writing puppet code and other manual setup

Decided to go to new container management at start of 2016
– Was pushing for Nomad but devs liked Kubernetes

Kubernetes
– Built in ports, HA, LB, Health-checks

Concepts in Kub
– POD – one or more containers
– Deployment, Daemon, Pet Set – Scaling of a POD
– Service- resolvable name, load balancing
– ConfigMap, Volume, Secret – Extended Docker Volume

Devs look after some kub config files
– Brings them closer to how stuff is really working

Demo
– Using kubectl to create pod in his work’s lab env
– Add load balancer in front of it
– Add a configmap to update the container’s nginx config
– Make it public
– LB replicas, Rolling updates

Best Practices
– lots of small containers are better
– log on container stdout, preferable via json
– Test and know your resource requirements (at movio devs teams specify, check and adjust)
– Be aware of the node sizes
– Stateless please
– if not stateless than clustered please
– Must handle unexpected immediate restarts

Share