DevOpsDaysNZ 2018 – Day 1 – Session 2

Mark Simpson, Carlie Osborne – Transforming the Bank: pipelines not paperwork

  • Change really is possible even in the least likely places
  • Big and risk adverse
    • Lots of paperwork and progress, very slow
  • Needed to change – In the beginning – 18 months ago
    • 6 months talking about how we could change things
  • Looked for a pilot project – Internet Banking team – ~80 people – Go-money platform
    • Big monolith, 1m lines of code
    • New release every 6 weeks
    • 10 weeks for feature from start to production
    • Release on midnight on a Friday night, 4-5 hours outage, 20-25 people.
    • Customer complaints about outage at midnight, move to 2am Sunday morning
  • Change to release every single week
    • Has to be middle of the day, no outage
    • How do we do this?
  • Took whole Internet banking team for 12 weeks to create process, did nothing else.
  • What we didn’t do
    • Didn’t replatform, no time
  • What we did
    • Jenkins – Created a single Pipeline, from commit to master all the way to projection
    • Got rid of selenium tests
    • Switched to cypress.io
      • Just tested 5 key customer journeys
    • Drivetrain – Internal App
      • Wanted to empower the teams, but lots of limits within industry/regulations
      • Centralise decision making
      • Lightweight Rules engine, checks that all the requirements have been done by the team before going to the next stage.
    • Cannery Deployments
      • Two versions running, ability to switch users to one or other
  • Learning to Break things down into small chunks
  • Change Process
    • Lots of random rules, eg mandatory standdown times
    • New change process for teams using Drivetrain, certified process no each release
  • Lots of times spent talking to people
    • Had to get lots of signoffs
  • Result
    • Successful
    • 16 weeks rather than 12
    • 28 releases in less than 6 months (vs approx 4 previously)
    • 95% less toil for each release
  • Small not Big changes
    • Now takes just 4-5 weeks to cycle though a feature
    • Don’t like saying MVP. Pitch is as quickly delivering a bit of value
    • and iterating
    • 2 week pilot, not iterations -> 8 week pilot, 4 iterations
    • Solution at start -> Solution derived over time
  • Sooner, not later
    • Previously
      • Risk, operations people not engaged until too late
      • Dev team disengaged from getting things into production
    • Now
      • Everybody engaged earlier
  • Other teams adopting similar approach

Ryan McCarvill – Fighting fires with DevOps

  • Lots of information coming into a firetruck, displayed on dashboard
  • Old System was 8-degit codes
  • Rugged server in each each truck
    • UPS
    • Raspberry Pi
    • Storage
    • Lots of different networking
  • Requirements
    • Redundant Comms
    • Realtime
    • Offline Mpas
    • Offline documentation, site reports, photos, building info
    • Offline Hazzards
    • Allow firefighters to update
    • Track appliance and firefighter status
    • Be a hub for an incident
    • Needs to be very secure
  • Stack on the Truck
    • Ansible, git, docker, .netcode, redis, 20 micoservices
  • What happens if update fails?
  • More than 1000 trucks, might be offline for months at a time
  • How to keep secure
  • AND iterate quickly
  • Pipeline
    • Online update when truck is at home
    • Don’t update if moving
    • Blue/Green updates
    • Health probes
  • Visual Studio Team Services -> Azure cont registry
  • Playbooks on git , ansible pull,
  • Nginx in front of blue/green
  • Built – there were problems
    • Some overheating
    • Server in truck taken out of scope, lost offline strategy
    • No money or options to buy new solution
  • MVP requirements
    • Lots of gigs of data, made some so only online
    • But many gigs still needed online
    • Create virtual firetruck in the sky, worked for online
    • Still had communication device – 1 core, minimum storage, locked down Linux
  • Put a USB stick in the back device and updated it
    • Can’t use a lot of resources or will inpact comms
    • Hazard search
      • Java/python app, no much impact on system
      • Re-wrote in rust, low impact and worked
      • Changed push to rsync and bash
  • Lessons
    • Automation gots us flexability to change
    • Automation gave us flexability to grow
    • Creativity can solve any problem
    • You can solve new problems with old technology
    • Sometimes the only way to get buy in is to just do it.
Share