Linux.conf.au 2018 – Day 1 – Session 3 – Developers, Developers Miniconf

Beyond Web 2.0 Russell Keith-Magee

  • Django guy
  • Back in 2005 when Django first came out
    • Web was fairly simple, click something and something happened
    • model, views, templates, forms, url routing
  • The web c 2016
    • Rich client
    • API
    • mobile clients, native apps
    • realtime channels
  • Rich client frameworks
    • reponse to increased complexity that is required
    • Complex client-side and complex server-side code
  • Isomorphic Javascript development
    • Same code on both client and server
    • Only works with javascript really
    • hacks to work with other languages but not great
  • Isomorphic javascript development
    • Requirements
    • Need something in-between server and browser
    • Was once done with Java based web clients
    • model, view, controller
  • API-first development
  • How does it work with high-latency or no-connection?
  • Part of the controller and some of the model needed in the client
    • If you have python on the server you need python on the client
    • brython, skulp, pypy.js
    • <script type=”text/pyton”>
    • Note: Not phyton being compiled into javascript. Python is run in the browser
    • Need to download full python interpreter though (500k-15M)
    • Fairly fast
  • Do we need a full python interpreter?
    • Maybe something just to run the bytecode
    • Batavia
    • Javascript implementation of python virtual machine
    • 10KB
    • Downside – slower than cpython on the same machine
  • WASM
    • Like assembly but for the web
    • Benefits from 70y of experience with assembly languages
    • Close to Cpython speed
    • But
      • Not quite on browsers
      • No garbage collection
      • Cannot manipulate DOM
      • But both coming soon
  • Example: http://bit.ly/covered-in-bees
  • But “possible isn’t enough”
  • pybee.org
  • pybee.org/bee/join

Using “old skool” Free tools to easily publish API documentation – Alec Clew

  • https://github.com/alecthegeek/doc-api-old-skool
  • You API is successful if people are using it
  • High Quality and easy to use
  • Provide great docs (might cut down on support tickets)
  • Who are you writing for?
    • Might not have english as first language
    • New to the API
    • Might have different tech expertise (different languages)
    • Different tooling
  • Can be hard work
  • Make better docs
    • Use diagrams
    • Show real code (complete and working)
  • Keep your sentence simple
  • Keep the docs current
  • Treat documentation like code
    • Fix bugs
    • add features
    • refactor
    • automatic builds
    • Cross platform support
    • “Everything” is text and under version control
  • Demo using pandoc
  • Tools
  • pandoc, plantuml, Graphviz, M4, make, base/sed/python/etc

 

Lightning Talks

  • Nic – Alt attribute
    • need to be added to images
    • Don’t have alts when images as links
    • http://bit.ly/Nic-slides
  • Vaibhav Sager – Travis-CI
    • Builds codes
    • Can build websites
    • Uses to build Resume
    • Build presentations
  • Steve Ellis
    • Openshift Origin Demo
  • Alec Clews
    • Python vs C vs PHP vs Java vs Go for small case study
    • Implemented simple xmlrpc client in 5 languages
    • Python and Go were straightforward, each had one simple trick (40-50 lines)
    • C was 100 lines. A lot harder. Conversions, etc all manual
    • PHP wasn’t too hard. easier in modern vs older PHP
  • Daurn
    • Lua
    • Fengari.io – Lua in the browser
  • Alistair
    • How not to docker ( don’t trust the Internet)
    • Don’t run privileged
    • Don’t expose your docker socket
    • Don’t use host network mode
    • Don’t where your code is FROM
    • Make sure your kernel on your host is secure
  • Daniel
    • Put proxy in front of the docker socket
    • You can use it to limit what no-priv users with socket access to docker port can do

 

Share

Linux.conf.au 2018 – Day 1 – Session 2

Manage all your tasks with TaskWarrior Paul ‘@pjf’ Fenwick

  • Lots of task management software out there
    • Tried lots
    • Doesn’t like proprietary ones, but unable to add features he wants
    • Likes command line
  • Disclaimer: “Most systems do not work for most people”
  • TaskWarrior
    • Lots of features
    • Learning cliff

Intro to TaskWarrior

  • Command line
  • Simple level can be just a todo list
  • Can add tags
    • unstructured many to many
    • Added just put putting “+whatever” on command
    • Great for searching
    • Can put all people or all types of jobs togeather
  • Meta Tags
    • Automatic date related (eg due this week or today)
  • Project
    • A bunch of tasks
    • Can be strung togeather
    • eg Travel project, projects for each trip inside them
  • Contexts (show only some projects and tasks)
    • Work tasks
    • Tasks for just a client
    • Home stuff
  • Annotation (Taking notes)
    • $ task 31 annotate “extra stuff”
    • has an auto timestamp
    • show by default, or just show a count of them
  • Tasks associated with dates
    • “wait”
    • Don’t show task until a date (approx)
    • Hid a task for an amount of time
    • Scheduled tasks urgency boasted at specific date
  • Until
    • delete a task after a certain date
  • Relative to other tasks
    • eg book flights 30 days before a conference
    • good for scripting, create a whole bunch of related tasks for a project
  • due dates
    • All sorts of things give (see above) gives tasks higher priority
    • Tasks can be manually changed
  • Tools and plugins
    • Taskopen – Opens resources in annotations (eg website, editor)
  • Working with others
    • Bugworrier – interfaces with github trello, gmail, jira, trac, bugzilla and lots of things
    • Lots of settings
    • Keeps all in sync
  • Lots of extra stuff
    • Paul updates his shell prompt to remind him things are busy
  • Also has
    • Graphical reports: burndown, calendar
    • Hooks: Eg hooks to run all sort of stuff
    • Online Sync
    • Android client
    • Web client
  • Reminder it has a steep learning curve.

Love thy future self: making your systems ops-friendly Matt Palmer

  • Instrumentation
  • Instrumenting incoming requests
    • Count of the total number of requests (broken down by requestor)
    • Count of reponses (broken down by request/error)
    • How long it took (broken down by sucess/errors
    • How many right now
  • Get number of in-progress requests, average time etc
  • Instrumenting outgoing requests
    • For each downstream component
    • Number of request sent
    • how many reponses we’ve received (broken down by success/err)
    • How long it too to get the response (broken down by request/ error)
    • How many right now
  • Gives you
    • incoming/outgoing ratio
    • error rate = problem is downstream
  • Logs
    • Logs cost tends to be more than instrumentation
  • Three Log priorities
    • Error
      • Need a full stack trace
      • Add info don’t replace it
      • Capture all the relivant variables
      • Structure
    • Information
      • Startup messages
      • Basic request info
      • Sampling
    • Debug
      • printf debugging at webcale
      • tag with module/method
      • unique id for each request
      • late-bind log data if possible.
      • Allow selective activation at runtime (feature flag, special url, signals)
    • Summary
      • Visbility required
      • Fault isolation

 

Share

Linux.conf.au 2018 – Day 1 – Session 1 – Kernel Miniconf

Look out for what’s in the security pipeline – Casey Schaufler

Old Protocols

  • SeLinux
    • No much changing
  • Smack
    • Network configuration improvements and catchup with how the netlable code wants things to be done.
  • AppArmor
    • Labeled objects
    • Networking
    • Policy stacking

New Security Modules

  • Some peopel think existing security modules don’t work well with what they are doing
  • Landlock
    • eBPF extension to SECMARK
    • Kills processes when it goes outside of what it should be doing
  • PTAGS
    • General purpose process tags
    • Fro application use ( app can decide what it wants based on tags, not something external to the process enforcing things )
  • HardChroot
    • Limits on chroot jail
    • mount restrictions
  • Safename
    • Prevents creation of unsafe files names
    • start, middle or end characters
  • SimpleFlow
    • Tracks tainted data

Security Module Stacking

  • Problems with incompatibility of module labeling
  • People want different security policy and mechanism in containers than from the base OS
  • Netfilter problems between smack and Apparmor

Container

  • Containers are a little bit undefined right now. Not a kernel construct
  • But while not kernel constructs, need to work with and support them

Hardening

  • Printing pointers (eg in syslog)
  • Usercopy

 

Share

DevOps Days Auckland 2017 – Wednesday Session 3

Sanjeev Sharma – When DevOps met SRE: From Apollo 13 to Google SRE

  • Author of Two DevOps Bookks
  • Apollo 13
    • Who were the real heroes? The guys back at missing control. The Astronaunts just had to keep breathing and not die
  • Best Practice for Incident management
    • Prioritize
    • Prepare
    • Trust
    • Introspec
    • Consider Alternatives
    • Practice
    • Change it around
  • Big Hurdles to adoption of DevOps in Enterprise
    • Literature is Only looking at one delivery platform at a time
    • Big enterprise have hundreds of platforms with completely different technologies, maturity levels, speeds. All interdependent
    • He Divides
      • Industrialised Core – Value High, Risk Low, MTBF
      • Agile/Innovation Edge – Value Low, Risk High, Rapid change and delivery, MTTR
      • Need normal distribution curve of platforms across this range
      • Need to be able to maintain products at both ends in one IT organisation
  • 6 capabilities needed in IT Organisation
    • Planning and architecture.
      • Your Delivery pipeline will be as fast as the slowest delivery pipeline it is dependent on
    • APIs
      • Modernizing to Microservices based architecture: Refactoring code and data and defining the APIs
    • Application Deployment Automation and Environment Orchestration
      • Devs are paid code, not maintain deployment and config scripts
      • Ops must provide env that requires devs to do zero setup scripts
    • Test Service and Environment Virtualisation
      • If you are doing 2week sprints, but it takes 3-weeks to get a test server, how long are your sprints
    • Release Management
      • No good if 99% of software works but last 1% is vital for the business function
    • Operational Readiness for SRE
      • Shift between MTBF to MTTR
      • MTTR  = Mean time to detect + Mean time to Triage + Mean time to restore
      • + Mean time to pass blame
    • Antifragile Systems
      • Things that neither are fragile or robust, but rather thrive on chaos
      • Cattle not pets
      • Servers may go red, but services are always green
    • DevOps: “Everybody is responsible for delivery to production”
    • SRE: “(Everybody) is responsible for delivering Continuous Business Value”
Share

DevOps Days Auckland 2017 – Wednesday Session 2

Marcus Bristol (Pushpay) – Moving fast without crashing

  • Low tolerance for errors in production due to being in finance
  • Deploy twice per day
  • Just Culture – Balance safety and accountability
    • What rule?
    • Who did it?
    • How bad was the breach?
    • Who gets to decide?
  • Example of Retributive Culture
    • KPIs reflect incidents.
    • If more than 10% deploys bad then affect bonus
    • Reduced number of deploys
  • Restorative Culture
  • Blameless post-mortem
    • Can give detailed account of what happened without fear or retribution
    • Happens after every incident or near-incident
    • Written Down in Wiki Page
    • So everybody has the chance to have a say
    • Summary, Timeline, impact assessment, discussion, Mitigations
    • Mitigations become highest-priority work items
  • Our Process
    • Feature Flags
    • Science
    • Lots of small PRs
    • Code Review
    • Testers paired to devs so bugs can be fixed as soon as found
    • Automated tested
    • Pollination (reviews of code between teams)
    • Bots
      • Posts to Slack when feature flag has been changed
      • Nags about feature flags that seems to be hanging around in QA
      • Nags about Flags that have been good in prod for 30+ days
      • Every merge
      • PRs awaiting reviews for long time (days)
      • Missing postmortun migrations
      • Status of builds in build farm
      • When deploy has been made
      • Health of API
      • Answer queries on team member list
      • Create ship train of PRs into a build and user can tell bot to deploy to each environment
Share

DevOps Days Auckland 2017 – Wednesday Session 1

Michael Coté – Not actually a DevOps Talk

Digital Transformation

  • Goal: deliver value, weekly reliably, with small patches
  • Management must be the first to fail and transform
  • Standardize on a platform: special snow flakes are slow, expensive and error prone (see his slide, good list of stuff that should be standardize)
  • Ramping up: “Pilot low-risk apps, and ramp-up”
  • Pair programming/working
    • Half the advantage is people speed less time on reddit “research”
  • Don’t go to meetings
  • Automate compliance, have what you do automatic get logged and create compliance docs rather than building manually.
  • Crafting Your Cloud-Native Strategy

Sajeewa Dayaratne – DevOps in an Embedded World

  • Challenges on Embedded
    • Hardware – resource constrinaed
    • Debugging – OS bugs, Hardware Bugs, UFO Bugs – Oscilloscopes and JTAG connectors are your friend.
    • Environment – Thermal, Moisture, Power consumption
    • Deploy to product – Multi-month cycle, hard of impossible to send updates to ships at sea.
  • Principles of Devops , equally apply to embedded
    • High Frequency
    • Reduce overheads
    • Improve defect resolution
    • Automate
    • Reduce response times
  • Navico
    • Small Sonar, Navigation for medium boats, Displays for sail (eg Americas cup). Navigation displays for large ships
    • Dev around world, factory in Mexico
  • Codebase
    • 5 million lines of code
    • 61 Hardware Products supported – Increasing steadily, very long lifetimes for hardware
    • Complex network of products – lots of products on boat all connected, different versions of software and hardware on the same boat
  • Architecture
    • Old codebase
    • Backward compatible with old hardware
    • Needs to support new hardware
    • Desire new features on all products
  • What does this mean
    • Defects were found too late
    • Very high cost of bugs found late
    • Software stabilization taking longer
    • Manual test couldn’t keep up
    • Cost increasing , including opportunity cost
  • Does CI/CD provide answer?
    • But will it work here?
    • Case Study from HP. Large-Scale Agile Development by Gary Gruver
  • Our Plan
    • Improve tolls and archetecture
    • Build Speeds
    • Automated testing
    • Code quality control
  • Previous VCS
    • Proprietary tool with limit support and upgrades
    • Limited integration
    • Lack of CI support
    • No code review capacity
  • Move to git
    • Code reviews
    • Integrated CI
    • Supported by tools
  • Archetecture
    • Had a configurable codebase already
    • Fairly common hardware platform (only 9 variations)
    • Had runtime feature flags
    • But
      • Cyclic dependancies – 1.5 years to clean these up
      • Singletons – cut down
      • Promote unit testability – worked on
      • Many branches – long lived – mega merges
  • Went to a single Branch model, feature flags, smaller batch sizes, testing focused on single branch
  • Improve build speed
    • Start 8 hours to build Linux platform, 2 hours for each app, 14+ hours to build and package a release
    • Options
      • Increase speed
      • Parallel Builds
    • What did
      • ccache.clcache
      • IncrediBuild
      • distcc
    • 4-5hs down to 1h
  • Test automation
    • Existing was mock-ups of the hardware to not typical
    • Started with micro-test
      • Unit testing (simulator)
      • Unit testing (real hardware)
    • Build Tools
      • Software tools (n2k simulator, remote control)
      • Hardware tools ( Mimic real-world data, re purpose existing stuff)
    • UI Test Automation
      • Build or Buy
      • Functional testing vs API testing
      • HW Test tools
      • Took 6 hours to do full test on hardware.
  • PipeLine
    • Commit -> pull request
    • Automated Build / Unit Tests
    • Daily QA Build
  • Next?
    • Configuration as code
    • Code Quality tools
    • Simulate more hardware
    • Increase analytics and reporting
    • Fully simulated test env for dev (so the devs don’t need the hardware)
    • Scale – From internal infrastructure to the cloud
    • Grow the team
  • Lessons Learnt
    • Culture!
    • Collect Data
    • Get Executive Buy in
    • Change your tolls and processes if needed
    • Test automation is the key
      • Invest in HW
      • Silulate
      • Virtualise
    • Focus on good software design for Everything
Share

DevOps Days Auckland 2017 – Tuesday Session 3

Mirror, mirror, on the wall: testing Conway’s Law in open source communities – Lindsay Holmwood

  • The map between the technical organisation and the technical structure.
  • Easy to find who owns something, don’t have to keep two maps in your head
  • Needs flexibility of the organisation structure in order to support flexibility in a technical design
  • Conway’s “Law” really just adage
  • Complexity frequently takes the form of hierarchy
  • Organisations that mirror perform badly in rapidly changing and innovative enviroments

Metrics that Matter – Alison Polton-Simon (Thoughtworks)

  • Metrics Mania – Lots of focus on it everywhere ( fitbits, google analytics, etc)
  • How to help teams improve CD process
  • Define CD
    • Software consistently in a deployable state
    • Get fast, automated feedback
    • Do push-button deployments
  • Identifying metrics that mattered
    • Talked to people
    • Contextual observation
    • Rapid prototyping
    • Pilot offering
  • 4 big metrics
    • Deploy ready builds
    • Cycle time
    • Mean time between failures
    • Mean time to recover
  • Number of Deploy-ready builds
    • How many builds are ready for production?
    • Routine commits
    • Testing you can trust
    • Product + Development collaboration
  • Cycle Time
    • Time it takes to go from a commit to a deploy
    • Efficient testing (test subset first, faster testing)
    • Appropriate parallelization (lots of build agents)
    • Optimise build resources
  • Case Study
    • Monolithic Codebase
    • Hand-rolled build system
    • Unreliable environments ( tests and builds fail at random )
    • Validating a Pull Request can take 8 hours
    • Coupled code: isolated teams
    • Wide range of maturity in testing (some no test, some 95% coverage)
    • No understanding of the build system
    • Releases routinely delay (10 months!) or done “under the radar”
  • Focus in case study
    • Reducing cycle time, increasing reliability
    • Extracted services from monolith
    • Pipelines configured as code
    • Build infrastructure provisioned as docker and ansible
    • Results:
      • Cycle time for one team 4-5h -> 1:23
      • Deploy ready builds 1 per 3-8 weeks -> weekly
  • Mean time between failures
    • Quick feedback early on
    • Robust validation
    • Strong local builds
    • Should not be done by reducing number of releases
  • Mean time to recover
    • How long back to green?
    • Monitoring of production
    • Automated rollback process
    • Informative logging
  • Case Study 2
    • 1.27 million lines of code
    • High cyclomatic complexity
    • Tightly coupled
    • Long-running but frequently failing testing
    • Isolated teams
    • Pipeline run duration 10h -> 15m
    • MTTR Never -> 50 hours
    • Cycle time 18d -> 10d
    • Created a dashboard for the metrics
  • Meaningless Metrics
    • The company will build whatever the CEO decides to measure
    • Lines of code produced
    • Number of Bugs resolved. – real life duplicates Dilbert
    • Developers Hours / Story Points
    • Problems
      • Lack of team buy-in
      • Easy to agme
      • Unintended consiquences
      • Measuring inputs, not impacts
  • Make your own metrics
    • Map your path to production
    • Highlights pain points
    • collaborate
    • Experiment

 

Share

DevOps Days Auckland 2017 – Tuesday Session 2

Using Bots to Scale incident Management – Anthony Angell (Xero)

  • Who we are
    • Single Team
    • Just a platform Operations team
  • SRE team is formed
    • Ops teams plus performance Engineering team
  • Incident Management
    • In Bad old days – 600 people on a single chat channel
    • Created Framework
    • what do incidents look like, post mortems, best practices,
    • How to make incident management easy for others?
  • ChatOps (Based on Hubot)
    • Automated tour guide
    • Multiple integrations – anything with Rest API
    • Reducing time to restore
    • Flexability
  • Release register – API hook to when changes are made
  • Issue report form
    • Summary
    • URL
    • User-ids
    • how many users & location
    • when started
    • anyone working on it already
    • Anything else to add.
  • Chat Bot for incident
    • Populates for an pushes to production channel, creates pagerduty alert
    • Creates new slack channel for incident
    • Can automatically update status page from chat and page senior managers
    • Can Create “status updates” which record things (eg “restarted server”), or “yammer updates” which get pushed to social media team
    • Creates a task list automaticly for the incident
    • Page people from within chat
    • At the end: Gives time incident lasted, archives channel
    • Post Mortum
  • More integrations
    • Report card
    • Change tracking
    • Incident / Alert portal
  • High Availability – dockerisation
  • Caching
    • Pageduty
    • AWS
    • Datadog

 

Share

DevOps Days Auckland 2017 – Tuesday Session 1

DevSecOps – Anthony Rees

“When Anthrax and Public Enemy came together, It was like Developers and Operations coming together”

  • Everybody is trying to get things out fast, sometimes we forget about security
  • Structural efficiency and optimised flow
  • Compliance putting roadblock in flow of pipeline
    • Even worse scanning in production after deployment
  • Compliance guys using Excel, Security using Shell-scripts, Develops and Operations using Code
  • Chef security compliance language – InSpec
    • Insert Sales stuff here
  • ispec.io
  • Lots of pre-written configs available

Immutable SQL Server Clusters – John Bowker (from Xero)

  • Problem
    • Pet Based infrastructure
    • Not in cloud, weeks to deploy new server
    • Hard to update base infrastructure code
  • 110 Prod Servers (2 regions).
  • 1.9PB of Disk
  • Octopus Deploy: SQL Schemas, Also server configs
  • Half of team in NZ, Half in Denver
    • Data Engineers, Infrastructure Engineers, Team Lead, Product Owner
  • Where we were – The Burning Platform
    • Changed mid-Migration from dedicated instances to dedicated Hosts in AWS
    • Big saving on software licensing
  • Advantages
    • Already had Clustered HA
    • Existing automation
    • 6 day team, 15 hours/day due to multiple locations of team
  • Migration had to have no downtime
    • Went with node swaps in cluster
  • Split team. Half doing migration, half creating code/system for the node swaps
  • We learnt
    • Dedicated hosts are cheap
    • Dedicated host automation not so good for Windows
    • Discovery service not so good.
    • Syncing data took up to 24h due to large dataset
    • Powershell debugging is hard (moving away from powershell a bit, but powershell has lots of SQL server stuff built in)
    • AWS services can timeout, allow for this.
  • Things we Built
    • Lots Step Templates in Octopus Deploy
    • Metadata Store for SQL servers – Dynamite (Python, Labda, Flask, DynamoDB) – Hope to Open source
    • Lots of PowerShell Modules
  • Node Swaps going forward
    • Working towards making this completely automated
    • New AMI -> Node swap onto that
    • Avoid upgrade in place or running on old version
Share

Linux.conf.au 2017 – Friday – Closing

Code of Consult and Safety

  • Badge
    • Putting prefered pronoun
    • Emoji
  • Free Childcare
    • Sponsored by Github
    • Approx 10 kids
  • Assistance Grants
  • Attendees
    • Breakdown by gender etc
    • Roughly 25% of attendees and speakers not men
  • More numbers
    • 104 Matrix chat users
    • 554 attendees
    • 2900 coffee cups
    • Network claimed to 7.5Gb/s
    • 1.6 TB over the week, 200Mb/s max
    • 30 Session Chairs
    • 12 Miniconfs
    • 491 Proposals (130 more than the others)
    • 6 Tutorials, 75 talks, 80 speakers
    • 4 Keynote speakers
    • 21 Sponsors

Linux.conf.au 2018 – Sydney

  • A little bit of history repeating
  • 2001, 2007, 2018
  • Venue is UTS
  • 5 minutes to food, train station
  • https://lca2018.org
  • @lca2018 on twitter
  • Looking for a few extra helpers

Raffle

  • In support of Outreachy
  • 3 interns funded

Final Bit

  • Thanks to team members

 

 

Share