DevOpsDaysNZ 2018 – Day 2 – Session 2

Interesting article I read today

Why Doctors Hate their Computers by Atul Gawande

Mrinal Mukherjee – A DevOps Confessional

  • Not about accidents, it is about Planned Blunders that people are doing in DevOps
  • One Track DevOps
    • From Infrastructure background
    • Job going into places, automated the low hanging fruit, easy wins
    • Column of tools on resume
    • Started becoming the bottleneck, his team was the only one who knew how the infrastructure worked.
    • Not able to “DevOps” a company since only able to fix the infrastructure, not able to fix testing etc so not dilvering everything that company expected
    • If you are the only person who understands the infrastructure you are the only one blamed when it goes wrong
    • Fixes
      • Need to take all team on a journey
      • But need to have right expectations set
      • Need to do learning in areas where you have gaps
      • DevOps is not about individual glory, Devops is about delivering value
      • HR needs to make sure they don’t reward the wrong thing
  • MVP-Driven Devops
    • Mostly working on Greenfields products that need to be delivered quickly
    • MVP = Maximum Technical Debt
    • MVP = Delays later and Security audits = Your name attached to the problem
    • Minimum Standard of Engineering
      • Test cases, Documentation, Modular
      • Peer review
    • Evolve architecture, not re-architect
  • Judgemental Devops
    • That team sucks, they are holding things up, playing a different game from us
    • Laughing at other teams
    • Consequence – Stubbornness from the other team
    • Empathy
      • Find out why things are they way they are
    • Collaborate to find common ground and improve
    • Design my system to I plan to work within constraints of the other team
Share

DevOpsDaysNZ 2018 – Day 2 – Session 1

Alison Polton-Simon – The DevOps Experiments: Reflections From a Scaling Startup

  • Software engineer at Angaza, previously Thoughtworks, “Organizational Anthropologist”
  • Angaza
    • Enable sales of life-changing products (eg solar chargers, water pumps, cook stoves in 3rd world countries)
    • Originally did hardware, changed to software company doing pay-as-you-go of existing devices
    • ~50 people. Kenya and SF, 50% engineering
    • No dedicated Ops
    • Innovate with empathy, Maximise impact
    • Model is to provide software tools to activate/deactivate products sold to peopel with low credit-scores. Plus out software around the activity like reports for distributors.
  • Reliability
    • Platform is business critical
    • Outages disrupt real people (households without light, farmers without irrigation)
    • Buildkite, Grafana, Zendesk
  • Constraints
    • Operate in 30+ countries
    • Low connectivity, 2G networks best case
    • Rural and peri-urban areas
    • Team growing by 50% in 2018 (2 eng teams in Kenya + 1 QA)
    • Most customers in timezone where day = SF night
  • Eras of experimentation
    • Ad Hoc
    • Tributes (sacrifice yourself for the stake of the team)
    • Collectives (multiple teams)
    • Product teams
  • ad Hoc – 5 eng
    • 1 eng team
    • Ops by day – you broke, you fix
    • Ops by night – Pagerduty rotation
    • Paged on all backend exception, 3 pages = amnesty
    • Good
      • Small but senior
      • JIT maturity
      • Everyone sitting next to each other
    • Bad
      • Each incident higherly disruptive
      • prioritized necessity over sustainability
  • Tribute – 5-12 eng
    • One person protecting team from interuptions
    • Introduced support person and triage
    • Expanded PD rotation
    • Good
      • More sustainable
      • Blue-Green deploys
      • Clustered workloads
    • Not
      • Headcount != horizontal scaling
      • Hard to hire
      • Customer service declined
  • Collectives 13-20 engs
    • Support and Ops teams – Ops staffed with devs
    • Other teams build roadmaps and requests
    • Teams rotate quarterly – helps onboarding
    • Good
      • Cross train ppl
      • Allow for focus
      • allowed ppl to get depth
    • Bad
      • Teams don’t op what they built
      • Quarter flies by quickly
      • Context switch is costly
      • Still a juggling act
      • 1m ramping up, 6w going okay, 2w winging down
  • Product teams  21 -? eng
    • 5 teams, 2 in Nairobi
    • Teams allighned with business virticals, KPIs
    • Dev, own and maintain services
    • Per-team tributes
    • No [Dev]Ops team
    • Intended goals
      • Independent teams
      • own what build
      • Support biz KPIs
      • cross team coordination
    • Expected Chellenges
      • Ownership without responsbility
      • Global knowledge sharing
      • Return to tribute system (2w out of the workflow)
  • Next
    • Keep growing team
    • Working groups
    • Eventual SRE
    • 24h global coverage
  • Case a “Constitution” of values that everybody who is hired signs
  • Takeaways
    • Maximise impact
      • Dependable tools over fashionable ones
      • Prefer industry-std tech
      • But get creative when necessary
    • Define what reliability means for your system
    • Evolve with Empathy
      • Don’t be dogmatic without structure
      • Serve your customers and your team
      • Adapt when necessary
      • Talk to people
      • Be explicit as to the tradeoffs you are making

Anthony Borton – Four lessons learnt from Microsoft’s DevOps Transformation

  • Microsoft starting in 1975
  • 93k odd engineers at Microsoft
    • 78k deployments per day
    • 2m commits per month
    • 4.4 builds/month
    • 500 million tests/day
  • 2018 State of Devops reports looks at Elite performers in the space
  • TFS – Team Foundation Server
    • Move product to the cloud
    • Moved on-prem to one instance
    • Each account had it’s own DB (broke stuff at 11k DBs)
  • 4 lessons
    • Customer focussed
      • Listen to customers, uservoice.com
      • Lots of team members keep eye on it
      • Stackoverflow
      • Embed with customers
      • Feedback inside product
      • Have to listen in all the channels
    • Failure is an opportunity to learn
    • Definition of done
      • Live in prod, collecting telemetary that examines hypotheses that it was created to prove
    • “For those of you who don’t know who Encarta is, look it up on Wikipedia”
  • Team Structure
    • Combined engineering = devs + testing
      • Some testers left team or organisation
    • Feature team
      • Physical team rooms
      • Cross discipline
      • 10-12 people
      • self managing
      • Clear charter and goals
      • Intact for 12-18 months
    • Sticky note exercise, people pick which teams they would like to join (first 3 choices)
      • 20% choose to change
      • 100% get the choice
  • New constants and requirements
    • Problems
      • Tests took too long – 22h to 2days
      • Tests failed frequently – On 60% passed 100%
      • Quality signal unreliable in master
    • Publish VSTS Quality vision
      • Sorted by exteranl dependancies
      • Unit tests
        • L0 – in-memory unit tests
        • L1 – More with SQL
      • Functional Tests
        • L2 – Functional tests against testable service deployment
        • L3 – Restricted class integration tests that run against production
      • 83k L0 tests run agains all pulls very fast
  • Deploy to rings of users
    • Ring 0 – Internal Only
    • Ring 1 – Small Datacentre 1-1.5m accounts in Brazil (same TZ as US)
    • Ring 2 – Public accounts, medium-large data centre
    • Ring 3 – Large internal accounts
    • Ring 4-5 – everyone else
    • Takes about a week for normal releases.
    • Binaries go out and then the database changes
    • Delays of minutes (up to 75) during the deploys to allow bugs to manafest
    • Some customers have access to feature flags
    • Customers who are risk tolerant can opt in to early deploys. Allows them to get faster feedback from people who are able to provide it
  • More features delivered in 2016 than previous 4 years. 50% more in 2017

 

Share

DevOpsDaysNZ 2018 – Day 1 – Session 4

Everett Toews – A Trip Down CI/CD Lane

I missed most of this talk. Sounded Good.

Jeff Smith – Creating Shared Contexts

  • Ideas and viewpoints are different from diff people
  • Happens in organisation, need to make sure everybody is on the same page
  • Build a shared context via conversations
  • Info exchange
  • Communications tools
  • Context Tools
  • X/Y Problem
  • Data can bridge conversations. Shared reality.
  • Use the same tools as other teams so you know what they are doing
  • Give the context to your requests, ask for them and it will automatically happen eventually.

Peter Sellars – 2018: A Build Engineers Odyssey

  • Hungry, Humble and Smart

Katrina Clokie – Testing in DevOps for Engineers

  • We can already write, so how hard can it be to write a novel?
  • Hopefully some of you are doing testing already
  • Problem is that people overestimate their testing skills, not interested in finding out anything else.
  • The testing you are doing now might be with one tool, in one spot. You are probably finding stuff but missing other things
  • Why important
    • Testing is part of you role
    • In Devops testing goes though Operations as well
    • Testing is DevOps is like air, it is all around you in every role
    • Roles of testers is to tech people to breath continuously and naturally.
  • Change the questions that you ask
    • How do you know that something is okay? What questions are you asking of your product?
    • Oracles are the ways that we recognise a problem
    • Devs ask: “Does it work how we think it should?”
    • Ops ask: “Does it work how it usually works?”
    • Devs on claims, Ops on history
    • Does it work like our competors, does it meet it’s purpose without harmful side effects, doesn’t it meet legal requirements, Does it work like similar services.
    • HICCUPPS – Testing without a Map – Michael Bolton, 2005
    • How do we compare to what other people are doing?  ( Not just a BA’s job , cause the customer will be asking a question and so should you)
    • Flip the Oracle, compare them against other things not just the usual.
    • Audit – Continuous compliance, Always think about if it works like the standards say it should.
    • These are things that the business is asking. If you ask then gain confidence of business
  • Look for Answers in Other Places
    • Number of tests: UI <  Service < Unit
    • The Test Pyramid as a bug catcher. Catch the Simple bugs first and then the subtle ones
  • Testing mesh
    • Unit tests – fine mesh
    • Intergration – Bigger/Fewer tests but cover more
    • Next few layers: End to End, Alerting, Monitoring, Logging. Each stops different types of bugs
    • Conversation should be “Where do we put our mesh?”, “How far can this bug fall?”.
    • If another layer will pick the bug up do we need a test.
  • Use Monitoring as testing
    • Push risk really late, no in all cases but can often work
  • A/B testing
    • Ops needs to monitor
    • Dev needs framework to role out and put in different options
  • Chaos Engineering
    • Start with something small, communicate well and do during daylight hours.
    • Yours customers are testing in production all the time, so why arn’t you too?
  • https://leanpub.com/testingindevops

 

Share

DevOpsDaysNZ 2018 – Day 1 – Session 3

Open Space 1 – Prod Support, who’s responsible

  • Problem that Ops doesn’t know products, devs can’t fix, product support owners not technical enough
  • Xero have embedded Ops and dev in teams. Each person oncall maybe 2 weeks in 20
  • Customer support team does everything?
  • “Ops have big graphs on screens, BI have a couple of BI stats on screens, Devs have …youtube videos”
  • Tiers support vs Product team vs Product support team
  • Tiered support
    • Single point of entry
    • lower paid person can handle easy stuff
    • Context across multiple apps
  • Product Team
    • Buck stops with someone
    • More likely to be able to action
    • Ownership of issues
    • Everyone must be enabled to do stuff
    • Everyone needs to be upskilled
  • Prod Support
    • Big skilled can fix anything team
    • Devs not keen
    • Even the best teams don’t know everything

Open Space 2 – DevOps at NZ Scale

  • Devops team, 3rd silo
    • Sometimes they are the new team doing cool stuff
    • One model is evangelism team
  • Do you want devops culture or do you just want somebody to look after your pipeline?
  • Companies often don’t know what they want to hire
  • Companies get some benefit with the tools (pipelines, agile)  but not the culture. But to get the whole benefit they need to adopt everything.
  • The Way of Ways article by John Cutler

Open Space 3 – Responding Quickly

I was taking notes on the board.

Share

DevOpsDaysNZ 2018 – Day 1 – Session 2

Mark Simpson, Carlie Osborne – Transforming the Bank: pipelines not paperwork

  • Change really is possible even in the least likely places
  • Big and risk adverse
    • Lots of paperwork and progress, very slow
  • Needed to change – In the beginning – 18 months ago
    • 6 months talking about how we could change things
  • Looked for a pilot project – Internet Banking team – ~80 people – Go-money platform
    • Big monolith, 1m lines of code
    • New release every 6 weeks
    • 10 weeks for feature from start to production
    • Release on midnight on a Friday night, 4-5 hours outage, 20-25 people.
    • Customer complaints about outage at midnight, move to 2am Sunday morning
  • Change to release every single week
    • Has to be middle of the day, no outage
    • How do we do this?
  • Took whole Internet banking team for 12 weeks to create process, did nothing else.
  • What we didn’t do
    • Didn’t replatform, no time
  • What we did
    • Jenkins – Created a single Pipeline, from commit to master all the way to projection
    • Got rid of selenium tests
    • Switched to cypress.io
      • Just tested 5 key customer journeys
    • Drivetrain – Internal App
      • Wanted to empower the teams, but lots of limits within industry/regulations
      • Centralise decision making
      • Lightweight Rules engine, checks that all the requirements have been done by the team before going to the next stage.
    • Cannery Deployments
      • Two versions running, ability to switch users to one or other
  • Learning to Break things down into small chunks
  • Change Process
    • Lots of random rules, eg mandatory standdown times
    • New change process for teams using Drivetrain, certified process no each release
  • Lots of times spent talking to people
    • Had to get lots of signoffs
  • Result
    • Successful
    • 16 weeks rather than 12
    • 28 releases in less than 6 months (vs approx 4 previously)
    • 95% less toil for each release
  • Small not Big changes
    • Now takes just 4-5 weeks to cycle though a feature
    • Don’t like saying MVP. Pitch is as quickly delivering a bit of value
    • and iterating
    • 2 week pilot, not iterations -> 8 week pilot, 4 iterations
    • Solution at start -> Solution derived over time
  • Sooner, not later
    • Previously
      • Risk, operations people not engaged until too late
      • Dev team disengaged from getting things into production
    • Now
      • Everybody engaged earlier
  • Other teams adopting similar approach

Ryan McCarvill – Fighting fires with DevOps

  • Lots of information coming into a firetruck, displayed on dashboard
  • Old System was 8-degit codes
  • Rugged server in each each truck
    • UPS
    • Raspberry Pi
    • Storage
    • Lots of different networking
  • Requirements
    • Redundant Comms
    • Realtime
    • Offline Mpas
    • Offline documentation, site reports, photos, building info
    • Offline Hazzards
    • Allow firefighters to update
    • Track appliance and firefighter status
    • Be a hub for an incident
    • Needs to be very secure
  • Stack on the Truck
    • Ansible, git, docker, .netcode, redis, 20 micoservices
  • What happens if update fails?
  • More than 1000 trucks, might be offline for months at a time
  • How to keep secure
  • AND iterate quickly
  • Pipeline
    • Online update when truck is at home
    • Don’t update if moving
    • Blue/Green updates
    • Health probes
  • Visual Studio Team Services -> Azure cont registry
  • Playbooks on git , ansible pull,
  • Nginx in front of blue/green
  • Built – there were problems
    • Some overheating
    • Server in truck taken out of scope, lost offline strategy
    • No money or options to buy new solution
  • MVP requirements
    • Lots of gigs of data, made some so only online
    • But many gigs still needed online
    • Create virtual firetruck in the sky, worked for online
    • Still had communication device – 1 core, minimum storage, locked down Linux
  • Put a USB stick in the back device and updated it
    • Can’t use a lot of resources or will inpact comms
    • Hazard search
      • Java/python app, no much impact on system
      • Re-wrote in rust, low impact and worked
      • Changed push to rsync and bash
  • Lessons
    • Automation gots us flexability to change
    • Automation gave us flexability to grow
    • Creativity can solve any problem
    • You can solve new problems with old technology
    • Sometimes the only way to get buy in is to just do it.
Share

DevOpsDaysNZ 2018 – Day 1 – Session 1

Jeff Smith – Moving from Ops to DevOps: Centro’s Journey to the Promiseland

  • Everyone’s transformation will look a little different
  • Tools are important but not the main problem (dev vs Ops)
  • Hiring a DevOps Manager
    • Just sounds like a normal manager
    • Change it to “Director of Production Operations”
  • A “DevOps Manager” is the 3rd silo on top of Dev and Ops
  • What did peopel say was wrong when he got there?
    • Paternalistic Ops view
      • Devs had no rights on instances
      • Devs no prod access
      • Devs could not create alerts
  • Fix to reduce Ops load
    • Devs get root to instances, but access to easily destroy and recreate if they broke it
    • Devs get access to common safe tasks, required automation and tools (which Ops also adopted)
    • Migrated to datadog – Single tool for all monitoring that anyone could access/update.
    • Shared info about infrastructure. Docs, lunch and learns. Pairing.
  • Expanding the scope of Ops
    • Included in the training and dev environment, CICD. Customers are internal and external
    • Used same code to build every environment
    • Offering Operation expertise
      • Don’t assume the people who write the software know the best way to run it
  • Behaviour can impact performance
    • See book “Turn the Ship around”
    • Participate in Developer rituals – Standups, Retros
    • Start with “Yes.. But” instead of “No” for requests. Assume you can but make it safe
    • Can you give me some context. Do just do the request, get the full picture.
  • Metrics to Track
    • Planned vs unplanned work
    • What are you doing lots of times.
  • What we talk about?
    • Don’t allow your ops dept to be a nanny
    • Remove nanny state but maintain operation safety
    • Monitor how your language impacts behavour
    • Monitor and track the type of work you are doing

François Conil – Monitoring that cares (the end of user based monitoring)

  • User Based monitoring (when people who are affected let you know it is down)
  • Why are we not getting alerts?
    • We are are not measuring the right thing
    • Just ignore the dashboard (always orange or red)
    • Just don’t understand the system
  • First Steps
    • Accept that things are not fine
    • Decide what you need to be measuring, who needs to know, etc. First principals
    • A little help goes a long way ( need a team with complementary strengths)
  • Actionable Alerts
    • Something Broken, User affected, I am the best person to fix, I need to fix immediately
    • Unless all 4 apply then nobody should be woken up.
      • Measured: Take to QA or performance engineers to find out the baseline
      • User affected: If nobody is affected do we care? Do people even work nights? How do you gather feedback?
      • Best person to fix: Should ops guys who doesn’t understand it be the first person to page?
      • Do it need to be fixed? – Backup environment, Too much detail in the alerts, Don’t alert on everything that is broken, just the one causing the problem
  • Fix the cause of the alerts that are happening the most often
  • You need time to get things done
    • Talk to people
    • Find time for fixes
  • You need money to get things done
    • How much is the current situation costing the company?
    • Tech-Debt Friday
Share

Audiobooks – October 2018

How to Rig an Election by Nic Cheeseman & Brian Klaas

The authors take experiences in various countries (mostly recent 3rd-world examples) as to how elections are rigged. Some advice for reducing it. 8/10

The Hound of the Baskervilles by Sir Arthur Conan Doyle. Read by Stephen Fry

Once again great reading by Fry and a great story. Works very well with all Holmes and Watson action and no giant backstory. 8/10

Our Oriental Heritage: Story of Civilization Series, Book 1 by Will Durant

Covers the early history of Egypt, the Middle East, India, China and Japan. In some cases up to the 20th Century. The book cover arts, religion and philosophy as well Kings and dates. This was written in the 1930s so has some stuff that has been superseded and out of date attitudes to race and religion in places. It long (50 hours) with another 11 volumes still to go but it is pretty good if you don’t mind these problems. 7/10

Chasing New Horizons: Inside the Epic First Mission to Pluto by Alan Stern & David Grinspoon

Stern was one of the originators and principal investigator of the mission so lots of firsthand details about all stages of the project from first conception though various versions.

Share

Audiobooks – September 2018

Lone Star: A History of Texas and the Texans by T. R. Fehrenbach

About 80% of the 40 hour book covers the period 1820-1880. Huge amounts of detail during then but skips over the rest quickly. Great stories though. 8/10

That’s Not English – Britishisms, Americanisms, and What Our English Says About Us by Erin Moore

A series of short chapters (usually one per word) about how the English language is used differently in England from the US. Fun light read. 7/10

The Complacent Class: The Self-Defeating Quest for the American Dream by Tyler Cowen

How American culture (and I’d extend that to countries like NZ) has stopped innovating and gone the safe route in most areas. Main thesis is that pressure is building up and things may break hard. Interesting 8/10

A History of Britain, Volume 2 : The British Wars 1603 – 1776 by Simon Schama

Covering the Civil War, Glorious Revolution and bits of the early empire and American revolution. A nice overview. 7/10

I Find Your Lack of Faith Disturbing: Star Wars and the Triumph of Geek Culture by A. D. Jameson

A personal account of the author’s journey though Geekdom (mainly of the Sci-Fi sort) mixed in with a bit of analysis of how the material is deeper than critics usually credit. 7/10

Share

Audiobooks – August 2018

Homo Deus: A Brief History of Tomorrow by Yuval Noah Harari

An interesting listen. Covers both history of humanity and then extrapolates ways things might go in the future. Many plausible ideas (although no doubt some huge misses). 8/10

Higher: A Historic Race to the Sky and the Making of a City by Neal Bascomb

The architects, owners & workers behind the Manhattan Trust Building, the Chrysler Building and the Empire State Building all being built New York at the end of the roaring 20s. Fascinating and well done. 9/10

The Invention Of Childhood by Hugh Cunningham

The story of British childhood from the year 1000 to the present. Lots of quotes (by actors) from primary sources such as letters (which is less distracting than sometimes). 8/10

The Sign of Four by Arthur Conan Doyle – Read by Stephen Fry

Very well done reading by Fry. Story excellent of course. 8/10

My Happy Days in Hollywood: A Memoir by Garry Marshall

Memoir by writer, producer (Happy Days, etc) and director (Pretty Woman, The Princess Diaries, etc). Great stories mostly about the positive side of the business. Very inspiring 8/10

Napoleon by J. Christopher Herold

A biography of Napoleon with a fair amount about the history of the rest of Europe during the period thrown in. A fairly short (11 hours) book but some but not exhausting detail. 7/10

Storm in a Teacup: The Physics of Everyday Life by Helen Czerski

A good popular science book linking everyday situations and objects with bigger concepts (eg coffee stains to blood tests). A fun listen. 7/10

All These Worlds Are Yours: The Scientific Search for Alien Life by Jon Willis

The author reviews recent developments in the search for life and suggests places it might be found and how missions to search them (he gives himself a $4 billion budget) should be prioritised. 8/10

Ready Player One by Ernest Cline

I’m right in the middle of the demographic for most of the references here so I really enjoyed it. Good voicing by Wil Wheaton too. Story is straightforward but all pretty fun. 8/10

Share

Audiobooks – July 2018

The Return of Sherlock Holmes by Sir Arthur Conan Doyle

I switched to Stephen Fry for this collection. Very happy with his reading of the stories. He does both standard and “character” voices well and is not distracting. 8/10

Roughing It by Mark Twain

A bunch of anecdotes and stories from Twain’s travels in Nevada & other areas in the American West. Quality varies. Much good but some stories fall flat. Verbose writing (as was the style at the time…) 6/10

Asteroids Hunters by Carrie Nugent

Spin off of a Ted talk. Covers hunting for Asteroids (by the author and others) rather than the Asteroids themselves. Nice level of info in a short (2h 14m) book. 7/10

Things You Should Already Know About Dating, You F*cking Idiot by Ben Schwartz & Laura Moses

100 dating tips (roughly in order of use) in 44 minutes. Amusing and useful enough. 7/10

Protector – A Classic of Known Space by Larry Niven

Filling in a spot in Niven’s universe. Better than many of his Known Space stories. Great background on the Pak in Hard Core package. Narrator gave everybody strong Australian accents for some reason. 7/10

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

Good book on 12 long term “deep trends” ( filtering, remixing, tracking, etc ) and how they have worked over the last and next few decades (especially in the context of the Internet). Pretty interesting and mostly plausible. 7/10

Caesar’s Last Breath: Decoding the Secrets of the Air Around Us by Sam Kean

Works it’s way though the gases in & evolution of earth’s atmosphere, their discovery and several interesting asides. Really enjoyed this, would have enjoyed 50% more of it. 9/10

Share