Simon Lyall's Blog – Page 21 – New Zealand, Sysadmin, Linux, Curry, Transport

DevOps Days Auckland 2017 – Wednesday Session 1

Michael Coté – Not actually a DevOps Talk

Digital Transformation

Goal: deliver value, ~~weekly~~ reliably, with small patches
Management must be the first to fail and transform
Standardize on a platform: special snowflakes are slow, expensive and error prone (see his slide, good list of stuff that should be standardized)
Ramping up: “Pilot low-risk apps, and ramp-up”
Pair programming/working
- Half the advantage is people spend less time on reddit “research”
Don’t go to meetings
Automate compliance: have what you do automatically get logged and create compliance docs, rather than building them manually.
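A minimal sketch of the "log what you do and generate the compliance docs from it" idea. This is not from the talk: the `audited` decorator, the `deploy` function and the in-memory `AUDIT_LOG` are all hypothetical names, and a real setup would write to durable storage.

```python
import functools
import json
from datetime import datetime, timezone

AUDIT_LOG = []  # hypothetical in-memory store; real systems need durable storage


def audited(action):
    """Record every call to the wrapped function as a compliance event."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            AUDIT_LOG.append({
                "action": action,
                "args": [repr(a) for a in args],
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@audited("deploy")
def deploy(app, version):  # hypothetical operation being audited
    return f"{app}@{version} deployed"


def compliance_report():
    """Render the collected events as a JSON document."""
    return json.dumps(AUDIT_LOG, indent=2)
```

The point of the design is that the report is a by-product of doing the work, so nobody has to assemble it by hand afterwards.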
Crafting Your Cloud-Native Strategy

Sajeewa Dayaratne – DevOps in an Embedded World

Challenges on Embedded
- Hardware – resource constrained
- Debugging – OS bugs, Hardware Bugs, UFO Bugs – Oscilloscopes and JTAG connectors are your friend.
- Environment – Thermal, Moisture, Power consumption
- Deploy to product – Multi-month cycle, hard or impossible to send updates to ships at sea.
Principles of DevOps apply equally to embedded
- High Frequency
- Reduce overheads
- Improve defect resolution
- Automate
- Reduce response times
Navico
- Small Sonar, Navigation for medium boats, Displays for sail (eg America’s Cup). Navigation displays for large ships
- Dev around world, factory in Mexico
Codebase
- 5 million lines of code
- 61 Hardware Products supported – Increasing steadily, very long lifetimes for hardware
- Complex network of products – lots of products on boat all connected, different versions of software and hardware on the same boat
Architecture
- Old codebase
- Backward compatible with old hardware
- Needs to support new hardware
- Desire new features on all products
What does this mean
- Defects were found too late
- Very high cost of bugs found late
- Software stabilization taking longer
- Manual test couldn’t keep up
- Cost increasing, including opportunity cost
Does CI/CD provide answer?
- But will it work here?
- Case Study from HP. Large-Scale Agile Development by Gary Gruver
Our Plan
- Improve tools and architecture
- Build Speeds
- Automated testing
- Code quality control
Previous VCS
- Proprietary tool with limited support and upgrades
- Limited integration
- Lack of CI support
- No code review capability
Move to git
- Code reviews
- Integrated CI
- Supported by tools
Architecture
- Had a configurable codebase already
- Fairly common hardware platform (only 9 variations)
- Had runtime feature flags
- But
  - Cyclic dependencies – 1.5 years to clean these up
  - Singletons – cut down
  - Promote unit testability – worked on
  - Many branches – long lived – mega merges
Went to a single Branch model, feature flags, smaller batch sizes, testing focused on single branch
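A rough sketch of how runtime feature flags let unfinished work live on a single branch. The flag name, product names and the `FeatureFlags` class are made up for illustration; the talk did not show Navico's actual mechanism.

```python
class FeatureFlags:
    """Runtime feature flags with per-product overrides (hypothetical layout)."""

    def __init__(self, defaults, overrides=None):
        self._defaults = dict(defaults)
        self._overrides = overrides or {}

    def enabled(self, flag, product=None):
        # A product-specific setting wins over the default; unknown flags are off.
        product_flags = self._overrides.get(product, {})
        return product_flags.get(flag, self._defaults.get(flag, False))


flags = FeatureFlags(
    defaults={"new_sonar_overlay": False},
    overrides={"display_9inch": {"new_sonar_overlay": True}},
)


def render(product):
    """Same code ships everywhere; the flag decides which path runs."""
    if flags.enabled("new_sonar_overlay", product):
        return "overlay"
    return "classic"
```

Because the new code path is merged but switched off by default, there is no long-lived branch to mega-merge later.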
Improve build speed
- At the start: 8 hours to build Linux platform, 2 hours for each app, 14+ hours to build and package a release
- Options
  - Increase speed
  - Parallel Builds
- What we did
  - ccache / clcache
  - IncrediBuild
  - distcc
- 4-5 hours down to 1 hour
Test automation
- Existing tests used mock-ups of the hardware, so were not typical
- Started with micro-test
  - Unit testing (simulator)
  - Unit testing (real hardware)
- Build Tools
  - Software tools (n2k simulator, remote control)
  - Hardware tools (mimic real-world data, repurpose existing stuff)
- UI Test Automation
  - Build or Buy
  - Functional testing vs API testing
  - HW Test tools
  - Took 6 hours to do full test on hardware.
Pipeline
- Commit -> pull request
- Automated Build / Unit Tests
- Daily QA Build
Next?
- Configuration as code
- Code Quality tools
- Simulate more hardware
- Increase analytics and reporting
- Fully simulated test env for dev (so the devs don’t need the hardware)
- Scale – From internal infrastructure to the cloud
- Grow the team
Lessons Learnt
- Culture!
- Collect Data
- Get Executive Buy in
- Change your tools and processes if needed
- Test automation is the key
  - Invest in HW
  - Simulate
  - Virtualise
- Focus on good software design for Everything

DevOps Days Auckland 2017 – Tuesday Session 3

Mirror, mirror, on the wall: testing Conway’s Law in open source communities – Lindsay Holmwood

The map between the technical organisation and the technical structure.
Easy to find who owns something, don’t have to keep two maps in your head
Needs flexibility of the organisation structure in order to support flexibility in a technical design
Conway’s “Law” is really just an adage
Complexity frequently takes the form of hierarchy
Organisations that mirror perform badly in rapidly changing and innovative environments

Metrics that Matter – Alison Polton-Simon (Thoughtworks)

Metrics Mania – Lots of focus on it everywhere (fitbits, google analytics, etc)
How to help teams improve CD process
Define CD
- Software consistently in a deployable state
- Get fast, automated feedback
- Do push-button deployments
Identifying metrics that mattered
- Talked to people
- Contextual observation
- Rapid prototyping
- Pilot offering
4 big metrics
- Deploy ready builds
- Cycle time
- Mean time between failures
- Mean time to recover
Number of Deploy-ready builds
- How many builds are ready for production?
- Routine commits
- Testing you can trust
- Product + Development collaboration
Cycle Time
- Time it takes to go from a commit to a deploy
- Efficient testing (test subset first, faster testing)
- Appropriate parallelization (lots of build agents)
- Optimise build resources
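Cycle time as defined above (commit to deploy) is easy to compute once you have the two timestamps. A small sketch with invented sample data; the talk did not say how the numbers were collected or summarised, so using the median here is an assumption.

```python
from datetime import datetime
from statistics import median


def cycle_times(events):
    """Hours from commit to deploy for each (commit_time, deploy_time) pair."""
    return [(deployed - committed).total_seconds() / 3600
            for committed, deployed in events]


# Invented sample data: (commit time, deploy time)
events = [
    (datetime(2017, 10, 3, 9, 0), datetime(2017, 10, 3, 13, 30)),
    (datetime(2017, 10, 3, 10, 0), datetime(2017, 10, 3, 11, 0)),
    (datetime(2017, 10, 4, 9, 0), datetime(2017, 10, 4, 11, 0)),
]
hours = cycle_times(events)
typical = median(hours)  # median is less skewed by one stuck build than the mean
```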
Case Study
- Monolithic Codebase
- Hand-rolled build system
- Unreliable environments (tests and builds fail at random)
- Validating a Pull Request can take 8 hours
- Coupled code: isolated teams
- Wide range of maturity in testing (some no test, some 95% coverage)
- No understanding of the build system
- Releases routinely delayed (10 months!) or done “under the radar”
Focus in case study
- Reducing cycle time, increasing reliability
- Extracted services from monolith
- Pipelines configured as code
- Build infrastructure provisioned as docker and ansible
- Results:
  - Cycle time for one team 4-5h -> 1:23
  - Deploy ready builds 1 per 3-8 weeks -> weekly
Mean time between failures
- Quick feedback early on
- Robust validation
- Strong local builds
- Should not be done by reducing number of releases
Mean time to recover
- How long back to green?
- Monitoring of production
- Automated rollback process
- Informative logging
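The two failure metrics can be sketched from a list of (failed, recovered) timestamps. Definitions of MTBF vary; this version assumes it means the gap from one recovery to the next failure, which is my reading rather than something stated in the talk. The incident data is invented.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to recover: average of (recovered - failed)."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents):
    """Mean time between failures: average gap from a recovery to the next failure."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Invented sample data: (time build went red, time it was green again)
incidents = [
    (datetime(2017, 10, 1, 10, 0), datetime(2017, 10, 1, 11, 0)),
    (datetime(2017, 10, 3, 11, 0), datetime(2017, 10, 3, 14, 0)),
    (datetime(2017, 10, 6, 14, 0), datetime(2017, 10, 6, 16, 0)),
]
```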
Case Study 2
- 1.27 million lines of code
- High cyclomatic complexity
- Tightly coupled
- Long-running but frequently failing testing
- Isolated teams
- Pipeline run duration 10h -> 15m
- MTTR Never -> 50 hours
- Cycle time 18d -> 10d
- Created a dashboard for the metrics
Meaningless Metrics
- The company will build whatever the CEO decides to measure
- Lines of code produced
- Number of Bugs resolved. – real life duplicates Dilbert
- Developers Hours / Story Points
- Problems
  - Lack of team buy-in
  - Easy to game
  - Unintended consequences
  - Measuring inputs, not impacts
Make your own metrics
- Map your path to production
- Highlights pain points
- collaborate
- Experiment

DevOps Days Auckland 2017 – Tuesday Session 2

Using Bots to Scale incident Management – Anthony Angell (Xero)

Who we are
- Single Team
- Just a platform Operations team
SRE team is formed
- Ops teams plus performance Engineering team
Incident Management
- In Bad old days – 600 people on a single chat channel
- Created Framework
- What do incidents look like, post mortems, best practices
- How to make incident management easy for others?
ChatOps (Based on Hubot)
- Automated tour guide
- Multiple integrations – anything with a REST API
- Reducing time to restore
- Flexibility
Release register – API hook to when changes are made
Issue report form
- Summary
- URL
- User-ids
- how many users & location
- when started
- anyone working on it already
- Anything else to add.
Chat Bot for incident
- Populates the form and pushes to production channel, creates PagerDuty alert
- Creates new slack channel for incident
- Can automatically update status page from chat and page senior managers
- Can Create “status updates” which record things (eg “restarted server”), or “yammer updates” which get pushed to social media team
- Creates a task list automatically for the incident
- Page people from within chat
- At the end: Gives time incident lasted, archives channel
- Post Mortem
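A toy model of the incident lifecycle the bot manages: open an incident with its own channel and task list, record status updates, and report the duration at the end. It deliberately makes no Slack or PagerDuty calls; the class names, channel naming and default tasks are all hypothetical, not Xero's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    summary: str
    started: datetime
    channel: str = ""
    updates: list = field(default_factory=list)
    tasks: list = field(default_factory=list)
    ended: Optional[datetime] = None


class IncidentBot:
    # Hypothetical default checklist created for every incident
    DEFAULT_TASKS = ["Assign incident lead", "Update status page",
                     "Schedule post mortem"]

    def __init__(self):
        self.incidents = {}

    def open(self, summary, now):
        """Create the incident, its dedicated channel name and its task list."""
        incident_id = len(self.incidents) + 1
        self.incidents[incident_id] = Incident(
            summary, now, channel=f"incident-{incident_id}",
            tasks=list(self.DEFAULT_TASKS))
        return incident_id

    def status_update(self, incident_id, text, now):
        """Record a timestamped note such as 'restarted server'."""
        self.incidents[incident_id].updates.append((now, text))

    def close(self, incident_id, now):
        """Mark the incident finished and return how long it lasted."""
        incident = self.incidents[incident_id]
        incident.ended = now
        return incident.ended - incident.started
```

The timestamped updates double as the raw timeline for the post mortem.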
More integrations
- Report card
- Change tracking
- Incident / Alert portal
High Availability – dockerisation
Caching
- PagerDuty
- AWS
- Datadog

DevOps Days Auckland 2017 – Tuesday Session 1

DevSecOps – Anthony Rees

“When Anthrax and Public Enemy came together, It was like Developers and Operations coming together”

Everybody is trying to get things out fast, sometimes we forget about security
Structural efficiency and optimised flow
Compliance putting roadblock in flow of pipeline
- Even worse scanning in production after deployment
Compliance guys using Excel, Security using Shell-scripts, Developers and Operations using Code
Chef security compliance language – InSpec
- Insert Sales stuff here
inspec.io
Lots of pre-written configs available

Immutable SQL Server Clusters – John Bowker (from Xero)

Problem
- Pet Based infrastructure
- Not in cloud, weeks to deploy new server
- Hard to update base infrastructure code
110 Prod Servers (2 regions).
1.9PB of Disk
Octopus Deploy: SQL Schemas, Also server configs
Half of team in NZ, Half in Denver
- Data Engineers, Infrastructure Engineers, Team Lead, Product Owner
Where we were – The Burning Platform
- Changed mid-Migration from dedicated instances to dedicated Hosts in AWS
- Big saving on software licensing
Advantages
- Already had Clustered HA
- Existing automation
- 6 day team, 15 hours/day due to multiple locations of team
Migration had to have no downtime
- Went with node swaps in cluster
Split team. Half doing migration, half creating code/system for the node swaps
We learnt
- Dedicated hosts are cheap
- Dedicated host automation not so good for Windows
- Discovery service not so good.
- Syncing data took up to 24h due to large dataset
- Powershell debugging is hard (moving away from powershell a bit, but powershell has lots of SQL server stuff built in)
- AWS services can timeout, allow for this.
Things we Built
- Lots of Step Templates in Octopus Deploy
- Metadata Store for SQL servers – Dynamite (Python, Lambda, Flask, DynamoDB) – Hope to Open source
- Lots of PowerShell Modules
Node Swaps going forward
- Working towards making this completely automated
- New AMI -> Node swap onto that
- Avoid upgrade in place or running on old version

Linux.conf.au 2017 – Friday – Closing

Code of Conduct and Safety

Badge
- Putting preferred pronoun
- Emoji
Free Childcare
- Sponsored by Github
- Approx 10 kids
Assistance Grants
Attendees
- Breakdown by gender etc
- Roughly 25% of attendees and speakers not men
More numbers
- 104 Matrix chat users
- 554 attendees
- 2900 coffee cups
- Network claimed to 7.5Gb/s
- 1.6 TB over the week, 200Mb/s max
- 30 Session Chairs
- 12 Miniconfs
- 491 Proposals (130 more than the others)
- 6 Tutorials, 75 talks, 80 speakers
- 4 Keynote speakers
- 21 Sponsors

Linux.conf.au 2018 – Sydney

A little bit of history repeating
2001, 2007, 2018
Venue is UTS
5 minutes to food, train station
https://lca2018.org
@lca2018 on twitter
Looking for a few extra helpers

Raffle

In support of Outreachy
3 interns funded

Final Bit

Thanks to team members

Linux.conf.au 2017 – Friday – Lightning Talks

Use #lcapapers to tell Linux.conf.au what you want to see in 2018

Michael Still and Michael Davies get the Rusty Wrench award

Karaoke – Jack Skinner

Talk with random slides

Martin Krafft

Matrix
End to end encrypted communication system
No entity owns your conversations
Bridge between walled gardens (eg IRC and Slack)
In Very late Beta, 450K user accounts
Run or Write your own servers or services or client

Cooked – Pete the Pirate

How to get into Sous Vide cooking
Create home kit
Beaglebone Black
Rice cooker, fish tank air pump.
Also use to germinate seeds
Also use this system to brew beer

Emoji Archeology 101 – Russell Keith-Magee

1963 Happy face created
🙂 invented
later 🙁 invented
Only those emotions imposed by the Unicode consortium can now be expressed

The NTPsec Project – Mark Atwood

Since 2014
Forked and moved to git in 2015 from parent ntp project
1.0.0 release soon
Removed 73% of lines from classic
- Removed commandline tools
- Got rid of stuff for old OSes
- Changed to POSIX and modern coding
- removed experiments
Switch to git and bugzilla etc
Fun not painful
Welcoming community, not angry
ntpsec.org

National Computer Science Summer School – Katie Bell

Running for 22 years
Web stream, Embedded Stream
Using BBC Microbit
Lots of projects
Students in grade 10-11
Happens in January
Also 5 week long online programming competition NCSS Competition.

Blockchain – Rusty Russell

Blockchain
Blockchain
Blockchain

Go to Antarctica – Jacinta Richardson

Went Twice
Go by ship
No rain
Nice and cool
Join the government
Positions close
Go while it is still there

Cool and Awesome projects you should help with – Tim Ansell

Tomu Boards
MicroPython on FPGAs
Python Devicetree – needs a good library
QEMU for LiteX / MiSoC
NuttX for LiteX / MiSoC
QEMU for Tomu
Improving LiteX / MiSoC
Cypress FX2
Linux to LiteX / MiSoC
HDMI2USB
j.mp/timpro-lca2017

LoRa TAS – Paul Neumeyer

long range (2-3km urban 10km rural)
low power (battery ~5 years)
Unlicensed radio spectrum 915-928 MHz band (AUS)
LoRaWAN is an open standard
Ideal for IoT applications (sensing, preventative maintenance, smart)

Roan Kattouw

Different languages mix dots and commas and spaces etc to write numbers

ZeroSkip – Bron Gondwana

Crash safe embedded database
Not fast enough
Zeroskip
Append only database file
Switch files now and then
Repack old files together
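A toy illustration of the three ideas in these notes: append only, switch files now and then, repack old files together. It is not the ZeroSkip format. Each "file" is just an in-memory list, so the crash-safety part is not shown, and the class and parameter names are invented.

```python
class AppendOnlyStore:
    """Toy append-only key-value store; each segment stands in for a file."""

    def __init__(self, max_records=3):
        self.max_records = max_records
        self.segments = [[]]  # oldest first; only the last one is written to

    def put(self, key, value):
        # Never modify existing records: switch to a new segment when full.
        if len(self.segments[-1]) >= self.max_records:
            self.segments.append([])
        self.segments[-1].append((key, value))

    def get(self, key):
        # The newest record wins, so scan segments and records in reverse.
        for segment in reversed(self.segments):
            for k, v in reversed(segment):
                if k == key:
                    return v
        return None

    def repack(self):
        """Merge finished segments into one, keeping the latest value per key."""
        *old, active = self.segments
        latest = {}
        for segment in old:
            for k, v in segment:
                latest[k] = v
        self.segments = [list(latest.items()), active]
```

Reads get slower as segments pile up, which is why the occasional repack matters.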

PyCon Au – Richard Jones

Python Conference Australia
7th in Melbourne in Aug 2016 – 650 people, 96 presentations
In Melbourne on 3-8 August 2017
2017.pycon-au.org

Buying a Laptop built for Linux – Paul Wayper

Bought from System76
Designed for Linux

openQA – Aleksa Sarai

Life is too short for manual testing
Perl based framework that lets you emulate a user
Runs from console, emulates keyboard and mouse
Has screenshots
Used by SUSE and openSUSE and Fedora
Fuzzy comparison, using regular expressions
open.qa

South Coast Track – Bec, Clinton and Richard

What I did in the Holidays
6 day walk in southern Tasmania
Lots of pretty photos

Linux.conf.au 2017 – Friday – Session 2

Continuously Delivering Security in the Cloud – Casey West

This is a talk about operational excellence
Why are systems attacked? Because they exist
Resisting Change to Mitigate Risk – It’s a trap!
You have a choice
- Going fast with unbounded risk
- Going slow to mitigate risk
Advanced Persistent Threat (APT) – The breach that lasts for months
Successful attacks have
- Time
- Leaked or misused credentials
- Misconfigured or unpatched software
Changing very little, slowly, helps attackers with all three of the above
A moving target is harder to hit
Cloud-native operability lets platforms move faster
- Composable architecture (serverless, microservices)
- Automated Processes (CD)
- Collaborative Culture (DevOps)
- Production Environment (Structured Platform)
The 3 Rs
- Rotate
  - Rotate credentials every few minutes or hours
  - Credentials will leak, Humans are weak
  - “If a human being generates a password for you then you should reject it”
  - Computers should generate it, every few hours
- Repave
  - Repave every server and application every few minutes/hours
  - Implies you have things like LBs that can handle servers adding and leaving
  - Container lifecycle
    - Built
    - Deploy
    - Run
    - Stop
    - Note: No “change” step
  - A Server that doesn’t exist isn’t being compromised
  - Regularly blow away running containers
  - Repave ≠ Patch
  - uptime <= 3600
- Repair
  - Repair vulnerable runtime environments every few minutes or hours
  - What stuff will need repair?
    - Applications
    - Runtime Environments (eg rails)
    - Servers
    - Operating Systems
  - The Future of security is build pipelines
  - Try to put in credential rotation and upstream imports into your builds
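A small sketch of the Rotate and Repave rules: credentials are generated by the machine and replaced once they reach a maximum age, and a server past `uptime <= 3600` is due to be rebuilt. The one-hour credential age, the class name and the injectable clock are my assumptions for illustration, not something specified in the talk.

```python
import secrets
import time

MAX_CREDENTIAL_AGE = 3600  # seconds; assumed to mirror the uptime limit
MAX_UPTIME = 3600          # seconds, from "uptime <= 3600"


class RotatingCredential:
    """A secret that regenerates itself once it is too old."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour can be tested
        self._rotate()

    def _rotate(self):
        # Machine-generated, never chosen by a human.
        self._value = secrets.token_urlsafe(32)
        self._issued = self._clock()

    def get(self):
        if self._clock() - self._issued >= MAX_CREDENTIAL_AGE:
            self._rotate()
        return self._value


def needs_repave(uptime_seconds):
    """True once a server has outlived the allowed uptime."""
    return uptime_seconds > MAX_UPTIME
```

A leaked credential is then only useful to an attacker for at most one rotation period.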
Embracing Change to Mitigate Risk
Less of a Trap (in the cloud)

Linux.conf.au 2017 – Friday – Session 1

Adventures in laptop battery hacking – Matthew Chapman

Lenovo Thinkpad X230T
- Bought Aug 2013
- Original capacity 62 Wh – 5 hours at 12W
- Capacity down to 1.9Wh – 10 minutes
45N1079 replacement bought
- DRM on laptop claimed it was not genuine and refused to recharge it.
Batteries talk SBS protocol to laptop
SMBus port and SMClock port
- sniffed the port with logic analyser
- Using I2C protocol
- Looked at spec to see what it means
- Challenge-response authentication
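For readers unfamiliar with challenge-response authentication, here is the general shape of it. The notes do not say which algorithm the battery uses, so HMAC-SHA256 and the shared key below are stand-ins chosen purely for illustration.

```python
import hashlib
import hmac
import secrets

SHARED_KEY = b"hypothetical-shared-secret"  # placeholder, not a real key


def battery_respond(challenge, key=SHARED_KEY):
    """What a genuine battery would compute for the laptop's challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def host_authenticate(respond):
    """Laptop side: send a fresh random challenge and check the answer."""
    challenge = secrets.token_bytes(16)
    expected = hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(respond(challenge), expected)
```

Because the challenge is random each time, sniffing one exchange with a logic analyser is not enough to replay it; a third-party battery needs the secret itself.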
Options
1. Throw Away
2. Replace Cells
  - Easy to damage
  - Might not work
3. Hack firmware on battery
  - Talk at DEFCON 19
  - But this is different model from that
  - Couldn’t work out how to get to firmware
4. Added something in between
5. Update the firmware on the machine
  - Embedded Controller (EC)
  - MEC1619
Looking through the firmware for Battery Authentication
- Found routines that looked plausible
- But other stuff was encrypted
EC Update process
- BIOS update puts EC update in spare flash memory area
- Afterwards the BIOS grabs that and applies the update
Pulled apart the BIOS, found EcFwUpdateDxe.efi routine that updates the EC
- Found that stuff sent to the EC is still encrypted.
- Decryption done by flasher program
Flasher program
- Encrypted itself (decrypted by the current firmware)
- JTAG interface for flashing debug
JTAG
- Physically difficult to get to
- Luckily Russian Hackers have already grabbed a copy
The Decryption function in the Flasher program
- Appears to be blowfish
- Found the key (in expanded form) in the firmware
- Enough for the encryption and decryption
Checksums
- Outer checksum checked by BIOS
- Post-decryption sum – checked by the flasher (bricks EC if bad)
- Section checksums (also bricks)
Applying
- noop the checks in code
- noop another check that sometimes failed
- Different error message
Found a second authentication process
- noop out the 2nd challenge in the BIOS
Works!
Posted writeup, posted to hacker news
- 1 million page views
Uploaded code to github
- Other people doing stuff with the embedded controller
- No longer works on latest laptops, EC firmware appears to be signed
Anything can be broken with physical access and significant determination

Election Software – Vanessa Teague

Australian Elections use a lot of software
- Encoding and counting preferential votes
- For voting in polling places
- For voting over the internet
How do we know this software is correct
The Paper ballot box is engineered around a series of problems
- In the past people brought their own voting paper
- The Australian Ballot used in many places (eg NZ)
- French use different method with envelopes and glass boxes
- The US has had lots of problems and different ways
Four case studies in Aus
vVote: Victoria
- Vic state election 2014
- 1121 votes for overseas Australians voting in Embassies etc
- Based on Pret a Voter
- You can verify that what you voted was what went through
- Source code on bitbucket
- Crypto signed, verified, open source, etc
- Not going forward
- Didn’t get the electoral commissions input and buy-in.
- A little hard to use
iVote: NSW and WA
- 280,000 votes over Internet in 2015 NSW state election (around 5-6% of total votes)
- Vote on a device of your choosing
- Vote encrypted and sent over Internet
- Get receipt number
- Exports to a verification service. You can telephone them, give them your number and they will read back your votes
- Website used 3rd-party analytics provider with export-grade crypto
  - Vulnerable to injection of content, votes could be read or changed
  - Fixed (after 66k votes cast)
- NSW iVote really wasn’t verifiable
- About 5000 people called into service and successfully verified
- How many tried to verify but failed?
- Commission said 1.7% of electors verified and none identified any anomalies with their vote (Mar 2015)
- How many tried and failed? “in the 10s” (Oct 2015)
- Parliamentary inquiry asked how many failed? Seven or 5 (Aug 2016)
- How many failed to get any vote? 627 (Aug 2016)
- This is a failure rate of about 10%
- It is believed it was around 200 unique (later in 2016)
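The "about 10%" figure can be roughly reproduced from the numbers above, assuming the rate is failed attempts divided by all attempts (the notes do not say how it was derived).

```python
def failure_rate(failed, succeeded):
    """Share of verification attempts that failed."""
    return failed / (failed + succeeded)


# 627 failed to get any vote back; about 5000 verified successfully
rate = failure_rate(627, 5000)
```

With these inputs the rate comes out a little over 11%, consistent with "about 10%".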
Vote Counting software
Errors in NSW counting
- In NSW legislative voting, redistributed votes are selected at random
- No source code for this
- Use same source code for lots of other elections
- Re-ran some of the votes, found randomness could change results. Found one most likely cost somebody a seat, but not till 4 years later.
Recommended
- Generate the random key publicly
- Open up the source code
- The electoral people didn’t want to do this.
In the 2016 local govt count we found 2 more bugs
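A toy model of why random selection matters and why a publicly generated seed helps. This is not the NSW counting algorithm: it just draws a random sample of surplus ballots, transfers them by next preference, and repeats with different seeds to see how often each candidate ends up ahead. All names and numbers are invented.

```python
import random
from collections import Counter


def transfer_surplus(ballots, surplus, seed):
    """Pick `surplus` ballots at random and count their next preferences."""
    rng = random.Random(seed)  # a published seed makes the draw reproducible
    chosen = rng.sample(ballots, surplus)
    return Counter(chosen)


def outcomes(ballots, surplus, base_votes, seeds):
    """For each seed, record which candidate leads after the transfer."""
    winners = Counter()
    for seed in seeds:
        totals = Counter(base_votes)
        totals.update(transfer_surplus(ballots, surplus, seed))
        winners[max(totals, key=totals.get)] += 1
    return winners
```

Calling `outcomes(["B"] * 52 + ["C"] * 48, 20, {"B": 100, "C": 102}, range(50))` re-runs the same close count 50 times. With a known seed anyone can repeat a single draw exactly; without it, a result that depends on the draw cannot be checked.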
- One candidate should have won with 54% probability but didn’t
The Australian Senate Count
AEC consistently refuses to reveal the source code
The Senate data is released; you can redo the count yourself and any bugs will become evident
What about digitising the ballots?
- How would we know if that wasn’t working?
- Only by auditing the paper evidence
Auditing
- The Americans have a history of auditing the paper ballots
- But the Australian vote is a lot more complex so everything is not 100% yet
- Stuff is online

Linux.conf.au 2017 – Friday Keynote – Robert Lefkowitz

Keeping Linux Great

Previous Keynotes have posed questions; I’ll pose answers
What is the future of open source software? It has no future
FLOSS is yesterday’s gravy
- Based on where the technology is today. How would FLOSS work with punch cards?
- Other people have said similar things
- Software, Linux and similar all going down in google trends
- But “app” is going up
Lithification
- Small pieces loosely joined
- Linux used to be great because you could pipe stuff to little programs
- That is what is happening to software
- Example – share a page to another app in a mobile interface
- All apps no longer need to send mail, they just have to talk to the mail app
So What should you do?
- Vendor all your dependencies, just copy everyone else’s code into your repo (and list their names if it is BSD) so you can ship everything in one blob (eg Android)
  - Components must be >5 million or <20 LOC, only a handful of them
  - At the other end apps are smaller since they can depend on the OS or other Apps for lots of functionality so they don’t have to write it themselves.
  - Example node with thousands of dependencies
App Freedom
- “Advanced programming environments conflate the runtime with the devtime” – Bret Victor
- Open Source software rarely does that
- “It turns out that Object Orientation didn’t work out, it is another legacy we are stuck with”
- Having the source code is nice but it is not a requirement. Access to the runtime is what you want. You need to get it where people are using it.
Liberal Software
But not everyone wants to be a programmer
- 75% comes from 6 generic web applications (collection, storage, reservation, etc)
A lot of functionality requires big data or huge amounts of machines or is centralised so open sourcing the software doesn’t do anything useful
If it was useful it could be patented, if it was not useful but literary then it was just copyright

Linux.conf.au 2017 – Thursday – Session 3

Open Source Accelerating Innovation – Allison Randal

Story of Stallman and the printer
People don’t talk about the context of the story
- Stallman was living in a free software domain, proprietary software was creeping in
- Software only became subject to copyright in early 80s
First age of software – 1940s – 1960s
- Software was low value
- Software was all free and open, given away
Precursor – The 1970s
Middle Age of Software – 1980s
- Start of Windows, Mac, Oracle and other big software companies
- Also start of GNU and BSD
- Who Leads?
  - Proprietary software was seen as the innovator and always would be.
  - Free Software was seen to be always chasing after Windows
The 2000s
- Free Software caught up with Proprietary
- Used by big companies
- “Open Source” name adopted
- dot-com bubble had burst
- Web 2.0
- Economic necessity, everyone else getting it for free
- Collaborative Process – no silver bullet but a better chance
- Innovations led by open source
Software Freedoms
- About Control over our material environment
- If you don’t have other freedoms then you don’t have a free society
Modern Age of Software
- Accelerating
- Companies: in 2010 42% used OS software, in 2015 78% were using it
- Using Open Source is now just table stakes
- Competitive edge for companies is participating in OS
- More participation pushes innovation even faster
Now What?
- The New innovative companies
  - Amazing experiences
  - Augment Workers
  - Deliver cool stuff to customers
  - Use Network effects, Brand names
- Businesses making contribution to society
- Need to look at software that just doesn’t cover commercial use cases.
Next Phase
- Diversity
- Myopic monocultures – a risk because they miss the dangers
- empowered to change the rule for the better

Surviving the Next 30 Years of Free Software – Karen M. Sandler

We’re not getting any younger
Software Relicensing
- Need to get approval of authors to re-license
- Has had to contact surviving spouse and get them to agree to re-license the code
- One survivor wanted payment. Didn’t understand that code would be written out of the project.
There are surely other issues that we have not considered
Copyright Assignment is a way around it
- But not everybody likes that.
Bequeathment doesn’t work
- In some jurisdictions copyrights have to be assessed for their value before being transferred. Taxes could be owed
Who is your next of Kin?
- They might not share your OS values or even think of them
Need perpetual care of copyrights
- Debian Copyright Aggregation Projects
A Trust
- Assign copyrights today, will give you back the rights you want but these expire on your death
- Would be a registry for free software
- Companies could participate too
Recognize the opportunity with age
- A lot of people with a lot of spare time