Linux.conf.au 2019 – Wednesday – Keynote

#WeAreNotWaiting: how open source is changing healthcare
– Dana Lewis

Dana Lewis
  • Getting diagnosed with a chronic disease is like being struck by lighting.
  • Insulin takes a while to kick in
  • Manual diabetes
    • Have to check level over and over again
    • Have to judge trend and decide more insulin, food, exercise, etc
    • Constantly
  • Device
    • Windows-only interface software to access
    • Alarm not great, various other limitations
  • Idea
    • Pull data off the device, create smarter app
    • Hard to do
  • First version
    • Device -> WinPC -> dropbox -> app -> pushover -> phone
  • 2nd Version
    • Press button to indicate what she is doing (eating, sleeping) when get alert
  • 3rd version
    • Hook into Insulin pump to do it automatically
    • Replace the human do they same thing over-and-over again in the loop.
    • all portable
    • Person doesn’t have to wake up to adjust, petter sleep
  • OpenAPS
    • Open Source
    • Created a list of ways it could fail (battery fails, wires come out)
    • Focusing on safety
    • Limiting dosing ability in hardware and software
    • Failing back safely to standard device operation
  • Approx 1000 users
    • 9 million+ hours of DIY closed loop experience
    • Anonymized dataset available
  • Sample child user
    • Before: 4.5 manual interventions per day by parents
    • After: 0.7 per day
  • School Child ( 5 vs 6th Grade )
    • 420 visits to school nurse ( 2.3 /day ) – 66 visits for events
    • 5 visits with OpenAPS – 3 visits (Gym related)
  • Intel Edison Platform
    • Smaller than Raspberry Pi
    • But discontinued (looking for old ones to replace)
  • Going back to Pi, now smaller
    • Has built in display with “Explorer Hat”
  • Outsiders are building stuff because they can and the traditional companies are not meeting the need. Innovate with small solution and build it up
  • Ask what little things you can do for people with similar problems. This started with just “Make a louder alarm”.

Share

Linux.conf.au 2019 – Tuesday – Session 3 – Docs Down Under

How to Avoid Meetings – Maia Sauren

Maia Sauren
  • What Distributed/International Teams involves
    • Lots of late meetings
    • Not up on the inclusive languages
    • We’d language quirks from ESL speakers
    • Taxonomy changes between fields
    • Stereotypes are incomplete
    • Ask Culture vs Guess Culture
    • Micro-cultures , down to schools, extra years.
    • When you make a private joke without context, someone is left out
    • What governance model wins (who do they raise barriers for)
    • It’s harder to change a relationship over the phone than maintain one
    • A relationship with a CoC is less fragile (shouldn’t be figured out in the fly during conflict)
    • “How do you want to have arguments?”
    • Set standards early and resolve swiftly
    • Normalise conflict resolution
    • Adulting: it’s for people who don’t want to cry even more later
    • What have you done this week you are proud of?

Disaster Recovery Book – Svetlana Marina

Svetlana Marina
  • Software Development Process
    • Speaning all time “firefighting” so no time to improve processes
  • Operation Support Model
  • Incident Model
    • Alert Raise
    • Initial Response
      • Runbook should have: Alert type -> Investigation help, exact queries into log search, links
    • Impact, escalations, SLA
      • Send email to stakeholders, keep their trust
      • Include “communication required” in runbook
    • Investigation and Damage control
      • Do what is required to fix problem, not fix the root cause
      • Message to stakeholders again
    • Through Investigation (depends on situation)
    • Fix root cause
    • Post Incident Review

The Bus Plan: Junior Staff Training – Andrew Jeffree

Andrew Jeffree
  • Staff coming into industries with lots of automation, but what when automation fails?
  • Staff only know about automation, can only run a playbook, don’t understand what it does
  • Disclaimer: Automation isn’t bad.
    • Config Managment, Custom tooling, CICD, Infrastructure as code
  • Buttons
    • Need to understand what they do
    • Sometimes they are not there
    • Can be Hard to implement a button that doesn’t exist
  • Do it the manual way
  • Training
    • Do something manually (install wordpress) and document
    • Expand what you have done
    • Tweak it
    • Break and fix it
    • Don’t be afraid to change something manually
  • Questions
    • Split for new people – 30% training , 70% work
    • Manual -> ansible -> puppet
    • Don’t want them to be stuck in the mindset that everthing we do is the best way

When Agile Doesn’t Work Anymore: Managing a Large Documentation Project – Lana Brindley

Lana Brindley
  • We all age
    • Sometimes old docs should just be binned
  • Old docs base, massive changes needed, but wanted to save it
    • Changes to toolchain too
  • Proof of concept
    • Define how big the job will be, what is involved
    • Get buy-in
  • Plan, Plan, Plan
    • Create a real timeline
    • Advertise your plan, tell everybody about it
    • Do a presentation to team for each phase
  • Research
    • Who is your audience?
    • No, really, who is your audience? – eg Sales and support may use docs more than customers
    • Lets the customers know that “somebody cares about these docs”
  • User/task analysis
    • Where do we need to focus work
    • What are the big tasks
  • Do the thing
    • You have to sit down and write it
    • You are breaking the agile system
    • You need to track your work and who doing what
    • eg this case, moving from big chapters to small topic-based docs.
    • Agree ahead of time on process, have buy-in
  • Review
    • What went on?
    • WHo not to stuff it up next time
    • Try not to blame people too much
  • Agile vs … something else
    • Sometimes Agile is not the best model
  • Tips for working outside Agile
    • Outreach
      • Especially product owners.
      • Get people on board
    • Track your work
      • Create Epics of sprints
      • Don’t go overboard
    • Should about it
      • outreach never stops
      • Present at sprint reviews
      • Brownbags

Share

Linux.conf.au 2019 – Tuesday – Session 2 – Picking a community & Mycroft AI

Finding Your Tribe: Choosing open source Communities – Cintia Del Rio

  • When started out she couldn’t find info on picked what project to work on. Crowdsourced some opinions from others
  • You need to work out why you want to volenteer for a project.
  • 3 types of code on Github
    • Source Available
      • Backed by companies (without open source as their business model)
      • Core devs from company, roadmap controlled by them
      • Limited influence from externals
      • Most communication not on public channels
      • You will be seen as guest/outsider
    • On Person Band
      • Single core maintainer, working in spare time
      • Common even for very popular libraries and tools (lots of examples from node and java ecosystem)
      • Few resource
      • Conflicts might not be handled well
    • Communities
      • Communication Channells – forums, mailing list, chat
      • Multiple core devs
      • Github org
      • Code of conduct
  • Some things to check beforehand
    • Is it dead yet?
      • Communication channels, Commits, issues, pull requests – how old, recent updates
    • How aggressive is the community?
      • Look at how a clueless user is handled
      • Declined pull requests. HOW did they handle the decline
      • ” Is the mailing list/icr/slack, is it a trash fire? ”
      • “Can you please rule” – add “Can you please” in front of a comment and does it sound nice or still mean/sarcasm?
    • Is non-coding work valued
    • Communities with translations?
    • Look at photos from the conferences
    • Grammar mistakes and typos – how are the handled?
    • Newbie tags and Getting started docs
    • Cool languages tend to attract toxic people
    • Look for Jerks in leadership
    • Ask around
  • Does it spark joy? – If not let it go.

Intro to the Open Source Voice Stack By Kathy Reid

Kathy Reid
  • In the past we have taught spreadsheets etc. We need to teach the latest thing and that is now voice interfaces
  • Worked for Mycroft, mainly using that for demo
  • Voice stack
    • Wake Word
    • Speech 2 Text (utterance)
    • Action ( command )
    • Text to Speech (dialogue )
  • Request response life-Cycle
    • Wake word
    • Intent parser over utterance
  • When Kathy just started she was first woman and first Australian so few/no samples in database and had problems understanding her.
  • Text to Speech
    • Needs to have a well speaker speaking for 40-60 hours
    • Mimic Recording Studio – List of Phrases that people need to speak to train the output
Share

Linux.conf.au 2019 – Tuesday – Session 1 – Docs Down Under Miniconf

Being Kind to 3am You – Katie McLaughlin

Katie McLaughlin
  • Not productive and operating at her best at 3am
  • But 3am’s will happen and they probably will be important
  • Essentials
    • You should have documentation, don’t keep it in your head since people are not available at 3am
    • Full doc management system might take a while
    • Must be Editable
      • Must be updateable at any time
    • Searchable
    • Have browser keywords that search confluence or github
    • Secure but discoverable by co-workers
  • Your Tools
    • Easy cache commands to use
    • Not dangerous
  • Stepping Up
    • Integrate your docs so it’ll be available and visible when people need it.
    • Alerts could link to docs for service
  • Post Mortem
    • List of commands you typed to fix it
  • Reoccurring Issues
    • Sometimes the quick fix is all you can do or is good enough. You can get back to sleep.
    • Maybe just log rotate to clean the disk. Or restart process once a week
    • Make you fix an ansible playbook you can just click
  • Learning
    • Learn new stuff so when you have chance you can do it write
  • Flag Changes
    • Handover changes to over to everyone else
  • So Empathy towards the other people (and they may show it back)
  • Audience
    • One guy gave anyone who go paged overnight $100 bill on their desk next day ( although he charged customers $150 )
    • From Fire Depts – Label everything, Have the docs come with the alert. Practice during the daylight.
    • Project IPXE – every single error message is a link to wiki page
    • Advice: Write down every command, everything you did, every output you saw. So useful for next day.

Making youself Redundant on Day One – Alexandra Perkins

Alexandra Perkins
  • Experiences
    • All Docs as facebook posts
    • All docs as comment codes
    • Word documents hidden in folders
  • Why you should document in your first weeks
    • Could you know the what the relevant questions for new people
    • You won’t remember it the first time you hear it
    • Easier for the next person
    • Inclusive and diverse workplace
  • What should you document
    • Document the stuff you find hard
    • Think about who else can use your docs
    • Stuff like: How to book leave, Who to ask about what topics, Info on workplace social events. Where lunch is.
  • How to document from the start
    • In wiki or Sharepoint
    • Word docs locally and copy it the official place once you have access
    • Saved support tickets
    • Notes to yourself on slack
    • Screenshots of slack conversations
    • Keep it simple, informal content is your friend. All the Memes!
    • Example Tutorial: “Send yourself an email and trace it though the logs”
  • Future Proofing
    • Create or Improve the place for Internal documentation
    • Everyone should be able to access and edit (regardless of technical expertise)
    • Must be searchable and editable so can be updated
    • Transfer all docs you did on your personal PC to company-wide documentation
    • Make others aware of the work you have done
    • Foster a culture of strong documentation.
      • Policy to document all newly announced changes
      • Have a rotation for the documentation person
    • Quality Internal Docs should be
      • Accessible
      • Editable
      • Searchable
      • Peer Reviewed

JIT Learning: It’s great until it isn’t – Tessa Bradbury

Tessa Bradbury
  • What should we learn?
    • There is a lot to learn
  • What is JIT Learning?
    • Write Code -> Hit an issue -> Define Problem -> Find a Solution
  • Assumption – You will ask the required questions (hit the issue)
    • Counter example: accessibility, you might not hit the problem yourself
  • Assumption – You can figure out the problem
    • Sometimes you can’t easily, you might not have the expereince and/or training
  • Assumption – You can find the solution
    • Sometimes you can’t find the solution on google
    • Sometimes you are not in Open Source, you can’t just read the code and the docs may be lacking
  • Assumption – You might not be sure the best way to write your fix
    • Best way to implement the code, if you should fix it in code
    • Or if your code has actually fixed the whole problem
  • Assumption – The benefit of getting it done now outweighs the cost of getting it wrong
    • Counter example – Security

Share

Linux.conf.au 2019 – Tuesday – Keynote: Rory Aronson

Beyond README.md

  • Had idea to automated small-scale gardening
  • Wrote up a proposal: FarmBot – 3d printer for growing your garden
    • Open Source
    • Impact > Money
    • Based on 3rd printer frame (with moving arms)
  • Created prototype 2015-2016
    • Add Web App
  • Crowdfunding and Video campaign
    • 58 Million views
    • 600k shares
    • 38,00 comments
    • $800k pre-sales, 300 orders
  • Created and shipped first version
  • But still open source

OPen Source Hardware

  • genesis.farm.bot
    • All the CAD models
    • Bill of materials
    • Everything Versioned
    • Mods and add-ons “for inspiration only” (not officially supported)
  • Open Source Community
    • Code of Conduct

Open Source Company

  • If you have a business you will have competitors
  • Competitors -> Collaborators
  • meta.farm.bot
  • Lots of numbers online (profit, bills, etc), Stuff that would be on internal wiki (like how to handle orders or do taxes) at other company is public
  • Compensation formula
  • See “Buffer’s Transparency Dashboard




Share

Donations 2018

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

I also blog about it to hopefully inspire others. See: 2017, 2016, 2015

All amounts this year are in $US

My main donations was to Givewell (to allocate to projects as they prioritize). I’m happy that they are are making efficient uses of donations.

I gave some money to the Software Conservancy to allocate across the projects (mostly open source software) they support and also to Mozilla to support the Firefox browser (which I use) and other projects.

Next were three advocacy and infrastructure projects.

and finally I gave some money to a couple of outlets whose content I consume. Signum University produce various education material around science-fiction, fantasy and medieval literature. In my case I’m following their lectures on Youtube about the Lord of the Rings. The West Wing Weekly is a podcast doing a episode-by-episode review of the TV series The West Wing.

 

Share

DevOpsDaysNZ 2018 – Day 2 – Session 4

Allen Geer, Amanda Baker – Continuously Testing govt.nz

  • Various .govt.nz sites
  • All Silverstripe and Common Web Platform
  • Many sites out of date, no automated testing, no test metrics, manual testing
  • Micro-waterfall agile
  • Specification by example (prod owner, Devops, QA)  created Gherkin tests
  • Standardised on CircleCI
  • Visualised – Spec by example
  • Prioritised feature tests
  • Ghirkinse
  • Test at start of dev process. Bake Quality in at the start
  • Visualise and display metrics, people could then improve.
  • Path to automation isn’t binary
  • Involve everyone in the team
  • Automation only works if humanised

Jules Clements – Configuration Pipeline : Ruling the One Ring

  • Desired state
  • I didn’t quite understand what he was saying

Nigel Charman – Keep Calm and Carry On Organising

  • 71 Conferences worldwide this year
  • NZ following the rules
  • Lots of help from people
  • Stuff stuff stuff

Jessica DeVita – Retrospecting our Retrospectives

  • Works on Azure DevOps
  • Post-mortems
  • What does it mean to have robust systems and resilience? Is resilience even a property? It just Is. When we fly on planes, we’re trusting machines and automation. Even planes require regular reboots to avoid catastrophic failures, and we just trust that it happen
  • CEO after a million dollar outage said “Can you get me a million dollars of learning out of this?”
  • After US Navy had accidents caused by slept deprivation switched to new watch structure
  • Postmortems are not magic, they don’t automatically make things change
  • http://stella.report
  • We dedicate a lot of time to to below the line, looking at the technology. Not a lot of conversation about above-the-line things like mental models.
  • Resilience is above the line
  • Catching the Apache SNAFU
  • The Ironies of Automation – Lisanne Bainbridge
  • Well facilitated debriefings support recalibration of mental models
  • US Forest Service – Learning Review – Blame discourages people speaking up about problems
  • We never know where the accident boundary is, only when we have crossed it.
    • SRE, Chaos Engineer and Human Factors help hadle
  • In postmortems please be mindful of judging timelines without context. Saying something happened in a short or long period of time is damanging
  • Ask “what made it hard to get that team on the phone?” , “What were you trying to achieve”
  • Etsy Debriefing Guide – lots of important questions.
  • “Moving post shallow incident data” – Adaptive Capacity Labs
  • Safety is a characteristics of Systems and not of their components
  • Ask people about their history, ask every person about what they do and how they got there because that is what shapes your culture as an organisation
Share

DevOpsDaysNZ 2018 – Day 2 – Session 3

Kubernetes

I’ll fill this in later.

Observability

  • Honeycomb, Sumologic. Use AI to look at what happened at same time and magically correlate
  • Expensive or hard to send all logs as volumes go up
  • What is the logging is wrong or missing?
  • Metrics
    • Export in prometheus format
    • Read RED and USE paper
    • Create a company schema with half a dozen metrics that all services expose
  • Had and event or transaction ID that flows across all the microservices sorry logs can be correlated
  • Non technical solutions
    • Refer to previous incident logs
    • Part of deliverables for product is SLA stats which require logs etc
  • Testing logs
    • Make sure certain events produce a log
  • Chaos Monkey

ANZ Drivetrain

  • Change control cares about
    • Avaiability
    • Risk
    • Dependencies
    • Rollback
  • But the team doing the change knows about these all
  • Saw tools out there that seem very opinated
  • Drivetrain
    • Automated Checklist
    • Work with Change people to create checklist
    • Pipeline talks to drivetrain and tells it what has been down
    • Slack messages sent for manual changes (they login to app to approve)
  • Looked at some other tools (eg chef automate, udeploy )
    • Forced team to work in a certain pattern
  • But use ServiceNow tool as official corporate standard
    • Looking at making DriveTrail fill in ServiceNow forms
  • People worried about stages in tool often didn’t realise the existing process had same limitations
  • Risk assessed at the Story and Feature level. Not release level
  • Not suitable for products that due huge released every few months with a massive number of changes.

 

 

Share

DevOpsDaysNZ 2018 – Day 2 – Session 2

Interesting article I read today

Why Doctors Hate their Computers by Atul Gawande

Mrinal Mukherjee – A DevOps Confessional

  • Not about accidents, it is about Planned Blunders that people are doing in DevOps
  • One Track DevOps
    • From Infrastructure background
    • Job going into places, automated the low hanging fruit, easy wins
    • Column of tools on resume
    • Started becoming the bottleneck, his team was the only one who knew how the infrastructure worked.
    • Not able to “DevOps” a company since only able to fix the infrastructure, not able to fix testing etc so not dilvering everything that company expected
    • If you are the only person who understands the infrastructure you are the only one blamed when it goes wrong
    • Fixes
      • Need to take all team on a journey
      • But need to have right expectations set
      • Need to do learning in areas where you have gaps
      • DevOps is not about individual glory, Devops is about delivering value
      • HR needs to make sure they don’t reward the wrong thing
  • MVP-Driven Devops
    • Mostly working on Greenfields products that need to be delivered quickly
    • MVP = Maximum Technical Debt
    • MVP = Delays later and Security audits = Your name attached to the problem
    • Minimum Standard of Engineering
      • Test cases, Documentation, Modular
      • Peer review
    • Evolve architecture, not re-architect
  • Judgemental Devops
    • That team sucks, they are holding things up, playing a different game from us
    • Laughing at other teams
    • Consequence – Stubbornness from the other team
    • Empathy
      • Find out why things are they way they are
    • Collaborate to find common ground and improve
    • Design my system to I plan to work within constraints of the other team
Share

DevOpsDaysNZ 2018 – Day 2 – Session 1

Alison Polton-Simon – The DevOps Experiments: Reflections From a Scaling Startup

  • Software engineer at Angaza, previously Thoughtworks, “Organizational Anthropologist”
  • Angaza
    • Enable sales of life-changing products (eg solar chargers, water pumps, cook stoves in 3rd world countries)
    • Originally did hardware, changed to software company doing pay-as-you-go of existing devices
    • ~50 people. Kenya and SF, 50% engineering
    • No dedicated Ops
    • Innovate with empathy, Maximise impact
    • Model is to provide software tools to activate/deactivate products sold to peopel with low credit-scores. Plus out software around the activity like reports for distributors.
  • Reliability
    • Platform is business critical
    • Outages disrupt real people (households without light, farmers without irrigation)
    • Buildkite, Grafana, Zendesk
  • Constraints
    • Operate in 30+ countries
    • Low connectivity, 2G networks best case
    • Rural and peri-urban areas
    • Team growing by 50% in 2018 (2 eng teams in Kenya + 1 QA)
    • Most customers in timezone where day = SF night
  • Eras of experimentation
    • Ad Hoc
    • Tributes (sacrifice yourself for the stake of the team)
    • Collectives (multiple teams)
    • Product teams
  • ad Hoc – 5 eng
    • 1 eng team
    • Ops by day – you broke, you fix
    • Ops by night – Pagerduty rotation
    • Paged on all backend exception, 3 pages = amnesty
    • Good
      • Small but senior
      • JIT maturity
      • Everyone sitting next to each other
    • Bad
      • Each incident higherly disruptive
      • prioritized necessity over sustainability
  • Tribute – 5-12 eng
    • One person protecting team from interuptions
    • Introduced support person and triage
    • Expanded PD rotation
    • Good
      • More sustainable
      • Blue-Green deploys
      • Clustered workloads
    • Not
      • Headcount != horizontal scaling
      • Hard to hire
      • Customer service declined
  • Collectives 13-20 engs
    • Support and Ops teams – Ops staffed with devs
    • Other teams build roadmaps and requests
    • Teams rotate quarterly – helps onboarding
    • Good
      • Cross train ppl
      • Allow for focus
      • allowed ppl to get depth
    • Bad
      • Teams don’t op what they built
      • Quarter flies by quickly
      • Context switch is costly
      • Still a juggling act
      • 1m ramping up, 6w going okay, 2w winging down
  • Product teams  21 -? eng
    • 5 teams, 2 in Nairobi
    • Teams allighned with business virticals, KPIs
    • Dev, own and maintain services
    • Per-team tributes
    • No [Dev]Ops team
    • Intended goals
      • Independent teams
      • own what build
      • Support biz KPIs
      • cross team coordination
    • Expected Chellenges
      • Ownership without responsbility
      • Global knowledge sharing
      • Return to tribute system (2w out of the workflow)
  • Next
    • Keep growing team
    • Working groups
    • Eventual SRE
    • 24h global coverage
  • Case a “Constitution” of values that everybody who is hired signs
  • Takeaways
    • Maximise impact
      • Dependable tools over fashionable ones
      • Prefer industry-std tech
      • But get creative when necessary
    • Define what reliability means for your system
    • Evolve with Empathy
      • Don’t be dogmatic without structure
      • Serve your customers and your team
      • Adapt when necessary
      • Talk to people
      • Be explicit as to the tradeoffs you are making

Anthony Borton – Four lessons learnt from Microsoft’s DevOps Transformation

  • Microsoft starting in 1975
  • 93k odd engineers at Microsoft
    • 78k deployments per day
    • 2m commits per month
    • 4.4 builds/month
    • 500 million tests/day
  • 2018 State of Devops reports looks at Elite performers in the space
  • TFS – Team Foundation Server
    • Move product to the cloud
    • Moved on-prem to one instance
    • Each account had it’s own DB (broke stuff at 11k DBs)
  • 4 lessons
    • Customer focussed
      • Listen to customers, uservoice.com
      • Lots of team members keep eye on it
      • Stackoverflow
      • Embed with customers
      • Feedback inside product
      • Have to listen in all the channels
    • Failure is an opportunity to learn
    • Definition of done
      • Live in prod, collecting telemetary that examines hypotheses that it was created to prove
    • “For those of you who don’t know who Encarta is, look it up on Wikipedia”
  • Team Structure
    • Combined engineering = devs + testing
      • Some testers left team or organisation
    • Feature team
      • Physical team rooms
      • Cross discipline
      • 10-12 people
      • self managing
      • Clear charter and goals
      • Intact for 12-18 months
    • Sticky note exercise, people pick which teams they would like to join (first 3 choices)
      • 20% choose to change
      • 100% get the choice
  • New constants and requirements
    • Problems
      • Tests took too long – 22h to 2days
      • Tests failed frequently – On 60% passed 100%
      • Quality signal unreliable in master
    • Publish VSTS Quality vision
      • Sorted by exteranl dependancies
      • Unit tests
        • L0 – in-memory unit tests
        • L1 – More with SQL
      • Functional Tests
        • L2 – Functional tests against testable service deployment
        • L3 – Restricted class integration tests that run against production
      • 83k L0 tests run agains all pulls very fast
  • Deploy to rings of users
    • Ring 0 – Internal Only
    • Ring 1 – Small Datacentre 1-1.5m accounts in Brazil (same TZ as US)
    • Ring 2 – Public accounts, medium-large data centre
    • Ring 3 – Large internal accounts
    • Ring 4-5 – everyone else
    • Takes about a week for normal releases.
    • Binaries go out and then the database changes
    • Delays of minutes (up to 75) during the deploys to allow bugs to manafest
    • Some customers have access to feature flags
    • Customers who are risk tolerant can opt in to early deploys. Allows them to get faster feedback from people who are able to provide it
  • More features delivered in 2016 than previous 4 years. 50% more in 2017

 

Share