Simon Lyall's Blog

Linux.conf.au 2017 – Tuesday Keynote – Pia Waugh

BTW: Conference Streams are online at linux.conf.au/stream

The Future of Humans – Pia Waugh

We are at a tipping point; we can’t reinvent everything or just do the past with shiny new things.

Started as a Sysadmin, which helped her see things as Systems

Trying to make active choices about the future we want

Started building tools, knowledge spread slowly
Created cities, people could specialise, knowledge spread faster
Surplus created, much went to rulers, sometimes rulers overthrown, but hierarchy stayed the same
More recently the surplus has been given to people
Last 250 years, people have seen themselves as having the power to change their future, not just be a peasant.
As resources have increased, power and resources have been distributed more widely
This has kept expanding – overthrow your boss at work
We are on the cusp of a massive skyrocket in quality of life

Citizens now have powers that were previously centralized
We are now in a time of surplus, not scarcity
Small groups and individuals can now disrupt a country, industry or company
We made up all of our society, we can make it again to reflect the present, not what was needed in the past.
Choose our own adventure or let others choose it for us. We have the option now that we didn’t previously
Most people’s eyes glaze over when they hear that.
“You can’t do that” say many people when they find out what software can do.
People switch off their creativity when they come to work.

How Could the World be better

Property
- 3D printing could print organs, food, just about anything
- Why are we protecting business models that are already out of date (e.g. copyright) when we could use these technologies to eliminate scarcity?
Work and Jobs
- Everybody is scared about technology taking jobs
- Why do we care about the loss of jobs?
- Why is the value of a person defined by a full-time job?
Transhumanism
- tattoos, piercing have been around forever
- Obsession with the human “normal”: is this a recent thing from the media?
- Society encourages people towards the Norm
- Internet has demonstrated that not everybody is normal – Rule 34
- “If you lose a leg, instead of getting a replacement leg, why not have seven legs?”
- Anyone who doesn’t meet our definition of Normal is seen as something less, even if they have amazing abilities
Spaceships
- Still takes a day to get around the planet
- If we are going to set up new worlds how are they going to run?
Global Citizenship
- People are seen through the lens of their national citizenship
- Governments are not the only representative of our rights

“How can we build a better world? Luckily we have git”
We have the power and knowledge to do things, but not all people do
If you are as powerful as the tools you use, where does that leave people who can’t use computers or program?

Systemic Change
- What does your doctor say about “scratching your itch”?
- Example: “diversity”: how do we deal with the problems that led to us not having it?
Who are you building for? Not building for?
What is the default position in society? Is it to not get knowledge and power?
What does human mean to you?
What do we value?
What assumptions and bias do you have?
How are you helping non-geeks help themselves?
What future do you want to see?

How are Systems changing? How do our policies, assumptions and laws reflect the older way?
- Scarcity -> Surplus
- Closed -> Open
- Centralised -> Distributed
- Belief -> Rationalism
- Win/Lose -> Cooperative competitive
- Nationalism -> World Citizen
- Normative Human -> Formative Human
I believe the Open Source Culture is a good model for society
But in inventing the future we have to be careful not to drag along the legacy systems and values of the past.

2017 SysAdmin Miniconf – Session 3

Turtles all the way down – Thin LVM + KVM tips and Tricks – Steven Ellis

ssd -> partition -> encryption -> LVM -> [..] -> filesystem
Lots of examples; see the online slides
https://github.com/steven-ellis/ansible-playpen

Samba and the road to 100,000 users – Andrew Bartlett

Release cycle is every 6 months
Samba 4.0 is 4 years old
4.2 and older are out of security support by the Samba team (sometimes supported by distros)
Much faster adding users to AD DC. 55k users added in 50 minutes
Performance issues, not bugs, are now the biggest area of work
- Customers deploying Samba at scale
Looking for volunteers running AD willing to run a tshark script
- What does your busy hour look like?
- What is the pattern of requests?

The School for Sysadmins Who Can’t Timesync Good and Wanna Learn To Do Other Stuff Good Too – Paul Gear

Aim is 1-10ms accuracy
Using Standard Linux reference distribution etc
Why care
- Some apps need time sync
- Log matching
Network Time Foundation needs support
NTP
- Not widely understood
- Unglamorous
- Daunting documentation
- old protocol, chequered security history
- The first Google result may not be accurate
Set clock
- step – jump clock to new time
- slew – gradually adjust the time
NTP Assumption
- There is one true time – UTC
- Nobody really has it
- bad time servers may be present
- networks change

I ran out of power on my laptop at this point so not many more notes. Paul gave a very good set of recommendations and myth-busting for those running NTP though. His notes will be online on the Sysadmin Miniconf site and he has also posted more detail online.

2017 Sysadmin Miniconf – Session 2

Running production workloads in a programmable infrastructure – Alejandro Tesch

The talk was demos of OpenStack. Scripts are online here:
https://github.com/gatesch/ansible

Managing performance parameters through systemd – Sander van Vugt

Mostly Demos in this talk too.
Using the CPUShares parameter as an example
systemd-cgtop and systemd-cgls
“systemctl show stress1.service” will show available parameters
“man 5 systemd.resource-control” gives a lot more details.

Go for DevOps – Caskey L. Dickson

SideBar: The Platform Wars are over
- Hint: We all won
- As long as have an API we are all cool
Always builds statically linked binaries, should work on just about any Linux system. Just one file.
Built-in cross compiler (e.g. for Windows, Mac) via just environment variables, e.g. “GOOS=darwin”, and “GOARCH=386” for 32-bit
Bash is great, Python is great, Go is better
Microservices are Services
No Small Systems
- Our Scripts are no longer dozens of lines long, they are thousands of lines long
- Need full software engineering
Sysops pushing buttons and running scripts are dying
Platform Specific Code
- main_linux.go, main_windows.go and the compiler finds the right one
- // +build linux darwin <– At the top of the file
“Once I got my head around channels Go really opened up for me”
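Since channels were what made Go click for the speaker, a minimal sketch of the fan-out/fan-in pattern they are typically used for: one goroutine per host runs a check, and results come back over a single channel. The host names and the `check` function are hypothetical stand-ins for whatever a DevOps script would actually probe.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type result struct {
	host string
	ok   bool
}

// checkAll runs check against every host concurrently (fan-out) and
// collects the results over one channel (fan-in). The channel is buffered
// to len(hosts) so no sending goroutine ever blocks.
func checkAll(hosts []string, check func(string) bool) map[string]bool {
	results := make(chan result, len(hosts))
	for _, h := range hosts {
		go func(h string) { // pass h explicitly so each goroutine gets its own copy
			results <- result{host: h, ok: check(h)}
		}(h)
	}
	out := make(map[string]bool, len(hosts))
	for range hosts { // exactly one receive per goroutine started
		r := <-results
		out[r.host] = r.ok
	}
	return out
}

func main() {
	// Hypothetical check: pretend anything in the "db" tier is down.
	check := func(h string) bool { return !strings.HasPrefix(h, "db") }
	status := checkAll([]string{"web1", "web2", "db1"}, check)

	hosts := make([]string, 0, len(status))
	for h := range status {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts) // map order is random; sort for stable output
	for _, h := range hosts {
		fmt.Println(h, status[h])
	}
}
```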

2017 Sysadmin Miniconf – Session 1

The Opposite of the Cloud – Tom Eastman

Koordinates Data Gateway – an appliance onsite at customers
Requirements
- A bootable image: OVA, AMI/cloud images
- Needs network access
- Sounds like an IoT device
The opposite of cloud is letting somebody outsource their stuff onto your infrastructure
Tom’s job has been making a nice and tidy appliance
What does IoT get wrong
- Don’t do updates, security patches
- Don’t treat network as hostile
- Hard to remotely admin
How to make them secure
- no default or static credentials
- reduce the attack surface
- secure all network comms
- ensure it fails securely
Solution
- Don’t treat appliances like appliances
- Treat like tightly orchestrated Linux Servers
Stick to conservative architecture
- Use standard distribution like Debian
- You can trust the standard security updates
Solution Components
- aspen: A customized Debian machine image built with Packer
- pando: orchestration server/C&C network
- hakea: A Django/Rest microservice API in charge
SaltStack command and control
- Normal orchestration stuff
- Can work as distributed command execution
- The minions on each server connect to the central node, means you don’t need to connect into a remote appliance (no incoming connections needed to appliance)
- OpenVPN as Internet transport
- Outgoing just port 443 and openvpn protocol. Everything else via OpenVPN
What is the Appliance
- A lightly mangled Debian Jessie VM image
- Easy to maintain by customer, just reboot, activate or reinstall to fix any problems.
- Appliance is running a bunch of docker containers
Appliance authentication
- Needs to connect via 443 with activation code to download VPN and Salt short-lived certificates to get started
- Auth keys only last for 24 hours.
- If I can’t reach it, it kills itself.
Hakea: REST control
- Django REST framework microservices
- Self documenting using DRF and CoreAPI Schema
DevOps Principles apply beyond the cloud

Inventory Management with Pallet Jack – Karl-Johan Karlsson

Goals
- Single source of truth
- Version control
- Scalable (to around 1000 machines, 10k objects)
Stuff stored as just a file structure
Some tools to access
Tools to export, eg to kea DHCP config
Tools as post-commit hooks for git. Pushes out update via salt etc
Various Integrations
- API
- Salt

Continuous Dashboard – Your DevOps Airbag – Christopher Biggs

Dashboards traditionally targeted at Ops
Also need to target Devs
- KPIs and
Sales and Support need to know everything too
Management want reassurance, Shipping a new feature, you have a hotline to the CEO
Customer, do you have something you are ashamed of?
- Take notice of load spikes
- Assume customers’ errors are being acted on, option to notify them when a fix happens
- What is relevant to a support call: most recent outages affecting this customer
- Remember recent behaviour of this customer
What kinds of data?
- Traditionally: system load indicators, transaction numbers etc
- Now: Business Goals, unavoidable errors, spikes of errors, location of errors, user experience metrics, health of 3rd party interfaces, App and product reviews
What should I put in dashboards
- Understand the Status-quo
- Continuously
- Look at trends over time and releases
- Think about features holistically
How to get there
- Like your data as much as your code
- Experiment with your data
- tools: nodered.org, blynk.cc, elastic
Insert Dashboards into your dev pipeline
- Code Review, CI, Unit Test, Confirm that alarms actually work via test errors
- Automate deployment
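One way to read "confirm that alarms actually work via test errors" is to unit-test the alert rule itself by feeding it synthetic error counts. A minimal Go sketch of a hypothetical spike rule: fire when the latest per-minute error count exceeds a multiple of the trailing average. The factor of 3 and floor of 10 are made-up tuning values, not anything from the talk.

```go
package main

import "fmt"

// spike reports whether the latest count is more than factor times the
// mean of the preceding samples. The floor stops tiny baselines (0 or 1
// errors a minute) from firing on every small blip.
func spike(counts []int, factor float64, floor int) bool {
	if len(counts) < 2 {
		return false // need at least one baseline sample
	}
	last := counts[len(counts)-1]
	sum := 0
	for _, c := range counts[:len(counts)-1] {
		sum += c
	}
	mean := float64(sum) / float64(len(counts)-1)
	return last >= floor && float64(last) > factor*mean
}

func main() {
	// Baseline mean is 3.4 errors/minute, so the threshold is 10.2.
	quiet := []int{3, 4, 2, 5, 3, 4}
	burst := []int{3, 4, 2, 5, 3, 40} // injected synthetic errors
	// If the burst does not fire, the alarm is broken.
	fmt.Println("normal minute fires:", spike(quiet, 3, 10))
	fmt.Println("injected burst fires:", spike(burst, 3, 10))
}
```

Running a check like this in CI means a change to the rule that silently stops it firing fails the build instead of being discovered during an outage.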
Tools
- ELK – off the shelf images, good import/export
- Node-RED – Flow based data processing, nice visual editor, built in dashboarding
- Blynk – Nice dashboards in iOS or Android. Interactive dashboard editor. Easy to share
Social Media integration
- Receive from Twitter, Facebook, app store reviews
- Post to slack and monitoring channels
- Forward to internal groups

The Sound of Silencing – Julien Goodwin

Humans know to ignore “expected” alerts during maintenance
- Hard to know what is expected vs unexpected
- Major events can lead to alert overload
Level 1 – Turn it all off
- Can work on small scale
Level 2 – Turn off a location while working on it.
- What if something happens while you are doing the work?
- May work with single-service deployments
Level 3 – Turn off the expected alerts
- Hard to get exactly right
Level 4 – Change mngt integration
- Link the generator up to the change mngt automation system
- What about changes too small to track?
- What about changes too big for a simple silence?
Level 5 – Inhibiting Alerts
- Use service-level indications to avoid alerts on expected failures
- Fire “goes nowhere” alert
Level 6 – Global monitoring and preventing over-silencing
- Alert if too many sites down
- Need to have explicit alerts to spot when somebody silences “*”
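Level 6 can be sketched as a check that runs against the list of active silences. A minimal Go sketch, assuming a simplified, hypothetical silence model (a map of label matchers where `*` matches anything) rather than any particular alert manager's API: it flags silences that mute everything, and flags the overall state when more than a chosen fraction of sites is covered by some silence.

```go
package main

import "fmt"

// Silence is a simplified, hypothetical model: label name -> value, "*" = any.
type Silence struct {
	Creator  string
	Matchers map[string]string
}

// matchesSite reports whether a silence covers (at least part of) a site.
// No "site" matcher at all means the silence applies to every site.
func (s Silence) matchesSite(site string) bool {
	v, ok := s.Matchers["site"]
	return !ok || v == "*" || v == site
}

// wildcardSilences returns silences that would mute everything:
// no matchers, or every matcher set to "*".
func wildcardSilences(silences []Silence) []Silence {
	var out []Silence
	for _, s := range silences {
		all := true
		for _, v := range s.Matchers {
			if v != "*" {
				all = false
				break
			}
		}
		if all {
			out = append(out, s)
		}
	}
	return out
}

// overSilenced reports whether more than maxFrac of sites are covered by
// at least one silence (counting partial silences as covering the site).
func overSilenced(silences []Silence, sites []string, maxFrac float64) bool {
	muted := 0
	for _, site := range sites {
		for _, s := range silences {
			if s.matchesSite(site) {
				muted++
				break
			}
		}
	}
	return float64(muted) > maxFrac*float64(len(sites))
}

func main() {
	sites := []string{"akl", "wlg", "chc", "syd"}
	silences := []Silence{
		{Creator: "alice", Matchers: map[string]string{"site": "wlg", "job": "router"}},
		{Creator: "bob", Matchers: map[string]string{"site": "*"}},
	}
	for _, s := range wildcardSilences(silences) {
		fmt.Println("wildcard silence by", s.Creator)
	}
	fmt.Println("over-silenced:", overSilenced(silences, sites, 0.5))
}
```

The important part is that these checks themselves alert through a path that the silences cannot reach.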
How to get there from here
- Incrementally
- Choose a bad alert and change it to make it better
- Regularly

Linux.conf.au 2017 – Conference Opening

Wear SunScreen
Karen Sandler introduces Outreachy and it is announced as the raffle cause for 2017
Overview of people
- 462 From Aus
- 43 from NZ
- 62 From USA
- Lots of other countries
- Gender breakdown: lots of no answers, so the stats are a bit rough
Talks
- 421 Proposals
- 80-ish talks and 6 tutorials
- Questions
  - Please ask questions during the question time
Looking for Volunteers – look at a session and click to signup
Keynotes – A quick profile
All the rooms are booked till 11pm! for BOF sessions
Lightning talks, Coffee, Lunch, dinners

Passengers vs “50 Girls 50”

Spoilers: Minor for Passengers, Major for 50 Girls 50.

In late 2016 the movie “Passengers” came out starring Jennifer Lawrence and Chris Pratt. The movie is set aboard a sleeper spaceship and the plot centers around the two lead characters waking up early. I won’t say more about the movie but there is a summary of the plot in the Wikipedia entry for the movie. You can compare it to the comic below to see the similarities and differences.

When I first saw the trailer it reminded me of a Sci-Fi comic I read years ago. Others noticed it was similar and gave the name of the comic as “50 Girls 50” by Al Williamson. I couldn’t find a summary of the short story so I thought I’d write it up here.

50 Girls 50 by Al Williamson – Plot summary

The story is a 6 page comic with one-off characters, originally published in 1953. It is set in the distant future aboard a spaceship making humanity’s first journey to a nearby star. Since the trip will take 100 years, the crew/passengers of 50 women and 50 men (hence the title) will be frozen for the whole journey. However the freezing technology used only works on a person once; if you attempt to refreeze somebody they will die.

The plot of the story is partially told through flashbacks but I’ll tell it in chronological order.

The main character is Sid, who before the voyage starts is attracted to one of the other passengers, Wendy. Wendy notices his attraction and they get together. After a time Wendy has a proposition for him. She suggests that Sid sabotage the Deep-freeze (D-F) units so that he wakes up early. He can then wake her up and they can wake up the others one at a time and “make them our slaves”.

Sid, however, has his own idea. What he wants to do is just have a series of girlfriends. He’ll set his clock for two years into the voyage. Then he will wake up Wendy and live with her for a while; when he gets tired of Wendy he will get rid of her and move on to the next girl and so on.

Once the voyage starts things go to Sid’s plan. He thaws out 2 years in, but instead of waking up Wendy he decides to thaw out Laura first. He then pretends to Laura that they both accidentally thawed out.

“Almost a year” later he gets tired of Laura, shoots her with a “Paralyzer” gun and stuffs her back in a Freeze-chamber to die.

He then prepares to wake Wendy. First he sets the ship’s clock to say they will reach the destination in 3 years, to give him enough time to get tired of Wendy. Things don’t go according to plan, however, when Wendy wakes up:

Not really a happy ending for anyone, although it is not like Sid or Wendy really deserved one.

Donations 2016

Like last year I am doing all my charity donations at once and blogging about it. The theory with doing it all at once is that it is more efficient and less impulsive, while blogging about it might encourage others to do similarly. Note that all amounts are in $US

I found one downside of doing it all at once (especially around midnight) is that my bank suspended my card for suspicious activity. All sorted out with a quick phone call though.

Once more this year I gave the majority of my money to those charities recommended by Givewell. This year instead of spreading my donation evenly among the top charities I followed their recommendations ( See right sidebar on the link above ).

$400 Against Malaria Foundation
$200 Schistosomiasis Control Initiative

Next were a series of Open Source projects, trying to concentrate on software I use:

$30 Debian
$30 Python
$30 Gnome

And some tech content or advocacy groups

Additionally I gave some money to MSF via a campaign by Zeynep Tufekci highlighting Yemen

$30 Remembering Yemen’s Children and supporting Doctors Without Borders

Hoping to do the same again next year, feel free to recommend other organizations you think might be a good place for me to donate towards. I’m thinking about

DevOpsDays Wellington 2016 – Day 2, Session 3

Ignites

Mrinal Mukherjee – How to choose a DevOps tool

Right Tool
– Does the job
– People will accept

Wrong tool
– Never-ending PoC
– Doesn’t do the job

How to pick
– Budget / Licensing
– does it address your pain points
– Learning cliff
– Community support
– API
– Enterprise acceptability
– Config in version control?

Central tooling team
– Pros: standardise, educate
– Cons: constant bottleneck, delays, stifles innovation, not in sync with teams

DevOps != Tool
Tools != DevOps

Tools facilitate it not define it.

Howard Duff – Eric and his blue boxes

Physical example of KanBan in an underwear factory

Lindsey Holmwood – Deepening people to weather the organisation

Note: Lindsey presents really fast so I missed recording a lot from the talk

His Happy, High performing Team -> He left -> 6 months later half of team had left

How do you create a resilient culture?

What is culture?
– Lots of research in organisation psychology
– Edgar Schein – 3 levels of culture
– Artefacts, Values, Assumptions

Artefacts
– Physical manifestations of our culture
– Standups, Org charts, desk layout, documentation
– actual software written
– Easiest to see and adopt

Values
– Goals, strategies and philosophies
– “we will dominate the market”
– “Management is available”
– “nobody is going to be fired for making a mistake”
– lived values vs aspirational values (people have a good nose for bullshit)
– Example: core values of Enron vs reality
– Work as imagined vs work as actually done

Assumptions
– beliefs, perceptions, thoughts and feelings
– exist on an unconscious level
– hard to discern
– “bad outcomes come from bad people”
– “it is okay to withhold information”
– “we can’t trust that team”
– “profits over people”

If we can change our people, we can change our culture

What makes a good team member?

Trust
– Vulnerability
– Assume the best of others
– Aware of their cognitive bias
– Aware of the fundamental attribution error (judge others by actions, judge ourselves by our intentions)
– Aware of hindsight bias. Hindsight bias is your culture killer
– When bad things happen explain in terms of foresight
– Regular 1:1s
Eliminate performance reviews
Willing to play devil’s advocate

Commitment and action
– Shared goal settings
– Don’t solutioneer
– Provide context about strategy, about desired outcome
What makes a good team?

Influence of hiring process
– Willingness to adapt and adopt working in new team
– Qualify team fit, tech talent then rubber stamp from team lead
– have a consistent script, but be prepared to improvise
– Everyone has the veto power
– If leadership is vetoing at the last minute, that’s a systemic problem with team alignment, not the system
– Benefit: team talks to candidate (without leadership present)
– Many different perspectives
– unblock management bottlenecks
– Risk: uncovering dysfunctions and misalignment in your teams
– Hire good people, get out of their way

Diversity and inclusion
– includes: race, gender, sexual orientation, location, disability, level of experience, work hours
– Seek out diverse candidates.
– Sponsor events and meetups
– Make job description clear you are looking for diverse background
– Must include and embrace differences once they actually join
– Safe mechanism for people to raise criticisms, and acting on them

Leadership and Absence of leadership
– Having a title isn’t required
– If the leader steps away things should continue working
– Team is their own shit umbrella
– empowerment vs authority
– empowerment is giving permission from above (potentially temporary)
– authority is giving power (granting autonomy)

Part of something bigger than the team
– help people build up for the next job
– Guilds in the Spotify model
– Run them like meetups
– Get senior management to come and observe
– What we’re talking about is tech culture

We can change tech culture
– How to make it resist the culture of the rest of the organisation
– Artefacts influence behaviour
– Artifact: fast builds -> Value: make better quality
– Artifact: post incident reviews -> Value: Failure is an opportunity for learning

Q: What is a pre-incident review?
A: Brainstorm beforehand (eg before a big rollout) what you think might go wrong if something is coming up
then afterwards do another review of what just went wrong

Q: what replaces performance reviews
A: One on ones

Q: Overcoming Resistance
A: Do it and point back at the evidence. Hard to argue with an artifact

Q: First step?
A: One on ones

Getting started: read books by Patrick Lencioni:
– Silos, Politics and Turf Wars
– The Five Dysfunctions of a Team

DevOpsDays Wellington 2016 – Day 2, Session 2

Troy Cornwall & Alex Corkin – Health is hard: A Story about making healthcare less hard, and faster!

Maybe title should be “Culture is Hard”

@devtroy @4lexNZ

Working at HealthLink
– Windows running Java stuff
– Out of date and poorly managed
– Deployments manual, thrown over the wall by devs to ops

Team Death Star
– Destroy bad processes
– Change deployment process

Existing Stack
– VMware
– Windows
– Puppet
– PRTG

CD and CI Requirements
– Goal: Time to regression test under 2 mins, time to deploy under 2 mins (from 2 weeks each)
– Puppet too slow to deploy code in a minute or two. App deploy vs config management
– Couldn’t use containers on Windows (at the time) so not an option

New Stack
– VMware
– Ubuntu
– Puppet for Server config
– Docker
– Rancher

Smashed the 2 minute target!

But…
– We focused on the tech side and let the people side slip
– Windows shop, hard work even to get a Linux VM at the start
– Devs scared to run on Linux. Some initial deploy problems burnt people
– Lots of different new technologies at once all pushed to devs, no pull from them.

Blackout where we weren’t allowed to talk to them for four weeks
– Should have been a warning sign…

We thought we were ready.
– Ops was not ready

“The Five Dysfunctions of a Team”
– Trust is at the bottom, we didn’t have that

Empathy
– We were aware of this, but didn’t follow through
– We were used to disruption but other teams were not

Note: I’m not sure how the story ended up, they sort of left it hanging.

Pavel Jelinek – Kubernetes in production

Works at Movio
– Software for Cinema chains (eg Loyalty cards)
– 100 million emails per month, millions of SMS and push notifications (fewer push because people hate those)

Old Stack
– Started with mysql and php application
– AWS from the beginning
– On largest aws instance but still slow.

Decided to go with Microservices
– Put stuff in Docker
– Used Jenkins, puppet, own docker registry, rundeck (see blog post)
– Devs didn’t like writing puppet code and other manual setup

Decided to go to new container management at start of 2016
– Was pushing for Nomad but devs liked Kubernetes

Kubernetes
– Built in ports, HA, LB, Health-checks

Concepts in Kub
– POD – one or more containers
– Deployment, DaemonSet, PetSet – Scaling of a POD
– Service- resolvable name, load balancing
– ConfigMap, Volume, Secret – Extended Docker Volume

Devs look after some kub config files
– Brings them closer to how stuff is really working

Demo
– Using kubectl to create pod in his work’s lab env
– Add load balancer in front of it
– Add a configmap to update the container’s nginx config
– Make it public
– LB replicas, Rolling updates

Best Practices
– lots of small containers are better
– log on container stdout, preferably as JSON
– Test and know your resource requirements (at movio devs teams specify, check and adjust)
– Be aware of the node sizes
– Stateless please
– if not stateless then clustered please
– Must handle unexpected immediate restarts

DevOpsDays Wellington 2016 – Day 2, Session 1

Jethro Carr – Powering stuff.co.nz with DevOps goodness

Stuff.co.nz
– “News” Website
– 5 person DevOps team

Devops
– “Something you do because Gartner said it’s cool”
– Sysadmin -> InfraCoder/SRE -> Dev Shepherd -> Dev
– Stuff in the middle somewhere
– DevSecOps

Company Structure drives DevOps structure
– Lots of products – one team != one product
– Dev teams with very specific focus
– Scale – too big, yet too small

About our team
– Mainly Ops focus
– small number compared to developers
– Operate like an agency model for developers
– “If you buy the Dom Post it would help us grow our team”
– Lots of different vendors with different skill levels and technology

Work process
– Use KanBan with Jira
– Works for Ops focussed team
– Not so great for long running projects

War Against OnCall
– Biggest cause of burnout
– focus on minimising callouts
– Zero alarm target
– Love pagerduty

Commonalities across platforms
– Everyone using compute
– Mostly Java and JavaScript
– Using Public Cloud
– Using off the shelf version control, deployment solutions
– Don’t get overly creative and make things too complex
– Proven technology that is well tried and tested and skills available in marketplace
– Classic technologies like Nginx, Java, Varnish still have their place. Don’t always need the latest fashion

Stack
– AWS
– Linux, ubuntu
– Adobe AEM Java CMS
– AWS 14x c4.2xlarge
– Varnish in front, used by everybody else. Makes ELB and ALB look like toys

How use Varnish
– Retries against backends if 500 replies, serve old copies
– split routes to various backends
– Control CDN via header
– Dynamic Configuration via puppet
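The first bullet (retry other backends on a 500, otherwise serve an old copy) is a general pattern. Varnish does this in VCL; the Go sketch below only illustrates the decision logic, with backends modelled as hypothetical functions returning a status code and body, and a plain map standing in for the cache.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical stand-in for an origin server.
type Backend func(path string) (status int, body string)

// fetch tries each backend in turn, moving on after any 5xx. If every
// backend fails it falls back to a stale cached copy, so readers still get
// a page while the origin struggles. Fresh 200s refresh the cache.
func fetch(path string, backends []Backend, cache map[string]string) (body string, stale bool, err error) {
	for _, b := range backends {
		status, body := b(path)
		if status < 500 {
			if status == 200 {
				cache[path] = body
			}
			return body, false, nil
		}
	}
	if body, ok := cache[path]; ok {
		return body, true, nil
	}
	return "", false, errors.New("all backends failed and nothing cached")
}

func main() {
	cache := map[string]string{}
	up := func(string) (int, string) { return 200, "front page v2" }
	down := func(string) (int, string) { return 503, "" }

	body, stale, _ := fetch("/", []Backend{down, up}, cache)
	fmt.Println(body, "stale:", stale) // retried onto the healthy backend

	body, stale, _ = fetch("/", []Backend{down, down}, cache)
	fmt.Println(body, "stale:", stale) // every backend down: stale copy served
}
```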

CDN
– Akamai
– Keeps online during breaking load
– 90% cache offload
– Management is a bit slow and manual

Lambda
– Small batch jobs
– Check mail reputation score
– “Download file from a vendor” type stuff
– Purge cache when static file changes
– Lambda webapps – Hopefully soon, a bit immature

Increasing number of microservices

Standards are vital for microservices
– Simple and reasonable
– Shareable vendors and internal
– flexible
– grow organically
– Needs to be detailed
– 12 factor App
– 3 languages Node, Java, Ruby
– Common deps (SQL, varnish, memcache, Redis)
– Build pipeline standardise. Using Codeship
– Standardise server builds
– Everything Automated with puppet
– Puppet building docker containers (w puppet + puppetstry)
– Std Application deployment

Init systems
– Had proliferation
– pm2, god, supervisord, sysvinit are out
– systemd and upstart are in

Always exceptions
– “Enterprise ___” is always bad
– Educating the business is a forever job
– Be reasonable, set boundaries

More Stuff at
– http://tinyurl.com/notclickbaithonest

Q: Pull request workflow
A: Largely replaced traditional review

Q: DR eg AWS outage
A: Documented process if codeship dies can manually push, Rest in 2*AZs, Snapshots

Q: Dev teams structure
A: Project specific rather than product specific.

Q: Puppet code tested?
A: Not really, Kinda tested via the pre-prod environment, Would prefer result (server spec) testing rather than low level testing of each line
A: Code team have good test coverage though. 80-90% in many cases.

Q: Load testing, APM
A: Use New Relic. Not much luck with external load testing companies

Q: What if somebody wants something non-standard?
A: Case-by-case. Allowed if needed but needs a good reason.

Q: What happens when automation breaks?
A: Documentation is actually pretty good.