The Opposite of the Cloud – Tom Eastman
- Koordinates Data Gateway – an appliance onsite at customers
- Requirements
- A bootable image: OVA, AMI/cloud images
- Needs network access
- Sounds like an IoT device
- Opposite of cloud is letting somebody outsource their stuff onto your infrastructure
- Tom’s job has been making a nice and tidy appliance
- What does IoT get wrong
- Don’t do updates, security patches
- Don’t treat network as hostile
- Hard to remotely admin
- How to make them secure
- no default or static credentials
- reduce the attack surface
- secure all networks comms
- ensure it fails securely
- Solution
- Don’t treat appliances like appliances
- Treat like tightly orchestrated Linux Servers
- Stick to conservative architecture
- Use standard distribution like Debian
- You can trust the standard security updates
- Solution Components
- aspen: A customized Debian machine image built with Packer
- pando: orchestration server/C&C network
- hakea: A Django/REST microservice API in charge
- SaltStack command and control
- Normal orchestration stuff
- Can work as distributed command execution
- The minions on each server connect to the central node, means you don’t need to connect into a remote appliance (no incoming connections needed to appliance)
- OpenVPN as Internet transport
- Outgoing: just port 443 and the OpenVPN protocol. Everything else goes via OpenVPN
- What is the Appliance
- A lightly mangled Debian Jessie VM image
- Easy to maintain by customer, just reboot, activate or reinstall to fix any problems.
- Appliance is running a bunch of docker containers
- Appliance authentication
- Needs to connect via 443 with activation code to download VPN and Salt short-lived certificates to get started
- Auth keys only last for 24 hours.
- If I can’t reach it it kills itself.
- Hakea: REST control
- Django REST framework microservices
- Self documenting using DRF and CoreAPI Schema
- DevOps principles apply beyond the cloud
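
The appliance authentication notes above (24-hour keys, shut down when the server is unreachable) can be sketched as a small decision function. This is only an illustration of the "fail securely" idea, not the speaker's implementation: the `Credential` class, `next_action` function, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the notes: auth keys only last for 24 hours.
CREDENTIAL_LIFETIME = timedelta(hours=24)


@dataclass
class Credential:
    """Hypothetical stand-in for the appliance's short-lived VPN/Salt certificate."""
    issued_at: datetime

    def is_valid(self, now: datetime) -> bool:
        return now - self.issued_at < CREDENTIAL_LIFETIME


def next_action(cred: Credential, can_reach_server: bool, now: datetime) -> str:
    """Decide what the appliance does on each check-in cycle (assumed policy)."""
    if can_reach_server:
        return "renew"      # fetch a fresh credential, the clock restarts
    if cred.is_valid(now):
        return "retry"      # still trusted, keep trying to phone home
    return "shutdown"       # fail securely: expired and unreachable


if __name__ == "__main__":
    issued = datetime.now(timezone.utc) - timedelta(hours=25)
    print(next_action(Credential(issued), can_reach_server=False,
                      now=datetime.now(timezone.utc)))
```

The point of the design is that an appliance that has been cut off (or stolen) stops being trusted on its own, without anyone needing to connect to it.
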
Inventory Management with Pallet Jack – Karl-Johan Karlsson
- Goals
- Single source of truth
- Version control
- Scalable (to around 1000 machines, 10k objects)
- Stuff stored as just a file structure
- Some tools to access
- Tools to export, eg to kea DHCP config
- Tools as post-commit hooks for git. Pushes out update via salt etc
- Various Integrations
- API
- Salt
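
To make "stuff stored as just a file structure" plus "tools to export" concrete, here is a minimal sketch of reading an inventory from directories and emitting DHCP host reservations. The layout (one directory per machine, one file per attribute, files named `mac` and `ip`) is an assumption for illustration and is not Pallet Jack's actual format or API. The output keys follow the names Kea uses for host reservations.

```python
import json
import tempfile
from pathlib import Path


def load_inventory(root: Path) -> dict:
    """Read every machine directory under root into a dict of attribute files."""
    machines = {}
    for machine_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        machines[machine_dir.name] = {
            f.name: f.read_text().strip()
            for f in sorted(machine_dir.iterdir()) if f.is_file()
        }
    return machines


def to_reservations(machines: dict) -> list:
    """Turn inventory records into DHCP host reservation entries.

    Machines missing a MAC or IP attribute are skipped.
    """
    return [
        {"hostname": name, "hw-address": attrs["mac"], "ip-address": attrs["ip"]}
        for name, attrs in machines.items()
        if "mac" in attrs and "ip" in attrs
    ]


if __name__ == "__main__":
    # Build a throwaway inventory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        host = Path(tmp) / "web01"
        host.mkdir()
        (host / "mac").write_text("52:54:00:12:34:56\n")
        (host / "ip").write_text("192.0.2.10\n")
        print(json.dumps(to_reservations(load_inventory(Path(tmp))), indent=2))
```

Because the inventory is plain files, it fits in git, and an exporter like this could run from a post-commit hook as the notes describe.
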
Continuous Dashboard – Your DevOps Airbag – Christopher Biggs
- Dashboards traditionally targeted at Ops
- Also need to target Devs
- KPIs and
- Sales and Support need to know everything too
- Management wants reassurance: when shipping a new feature, you have a hotline to the CEO
- Customer, do you have something you are ashamed of?
- Take notice of load spikes
- Assure customers their errors are being acted on, with an option to notify them when a fix happens
- What is relevant to a support call: the most recent outages affecting this customer
- Remember recent behaviour of this customer
- What kinds of data?
- Traditionally: system load indicators, transaction numbers etc
- Now: Business Goals, unavoidable errors, spikes of errors, location of errors, user experience metrics, health of 3rd party interfaces, App and product reviews
- What should I put in dashboards
- Understand the Status-quo
- Continuously
- Look at trends over time and releases
- Think about features holistically
- How to get there
- Like your data as much as your code
- Experiment with your data
- tools: nodered.org, blynk.cc, elastic
- Insert Dashboards into your dev pipeline
- Code Review, CI, Unit Test, Confirm that alarms actually work via test errors
- Automate deployment
- Tools
- ELK – off the shelf images, good import/export
- Node-RED – Flow based data processing, nice visual editor, built in dashboarding
- Blynk – Nice dashboards on iOS or Android. Interactive dashboard editor. Easy to share
- Social Media integration
- Receive from Twitter, Facebook, app store reviews
- Post to Slack and monitoring channels
- Forward to internal groups
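
The pipeline item "confirm that alarms actually work via test errors" can be sketched as a check that injects synthetic errors and verifies the alarm fires. This is a minimal illustration under assumed names: `ErrorSpikeAlarm` and `alarm_fires_on_test_errors` are hypothetical and not tied to ELK, Node-RED, or Blynk.

```python
from collections import deque


class ErrorSpikeAlarm:
    """Fires when the errors in the last `window` samples exceed `threshold`."""

    def __init__(self, window: int, threshold: int):
        self.samples = deque(maxlen=window)  # old samples drop off automatically
        self.threshold = threshold

    def record(self, error_count: int) -> bool:
        """Add one sample and return True if the alarm is firing."""
        self.samples.append(error_count)
        return sum(self.samples) > self.threshold


def alarm_fires_on_test_errors(alarm: ErrorSpikeAlarm, injected: int) -> bool:
    """CI-style check: inject synthetic errors and report whether the alarm fired."""
    return alarm.record(injected)


if __name__ == "__main__":
    alarm = ErrorSpikeAlarm(window=3, threshold=10)
    for count in (2, 3):          # normal background error rate
        alarm.record(count)
    print(alarm_fires_on_test_errors(alarm, injected=20))
```

Running a check like this alongside unit tests means a silently broken alarm shows up as a failed build rather than as a missed outage.
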
The Sound of Silencing – Julien Goodwin
- Humans know to ignore “expected” alerts during maintenance
- Hard to know what is expected vs unexpected
- Major events can lead to alert overload
- Level 1 – Turn it all off
- Can work on small scale
- Level 2 – Turn off a location while working on it.
- What if something happens while you are doing the work?
- May work with single-service deployments
- Level 3 – Turn off the expected alerts
- Hard to get exactly right
- Level 4 – Change management integration
- Link the generator up to the change management automation system
- What about changes too small to track?
- What about changes too big for a simple silence?
- Level 5 – Inhibiting Alerts
- Use service level indicators to avoid alerts on expected failures
- Fire “goes nowhere” alert
- Level 6 – Global monitoring and preventing over-silencing
- Alert if too many sites down
- Need to have explicit alerts to spot when somebody silences “*”
- How to get there from here
- Incrementally
- Choose a bad alert and change it to make it better
- Regularly
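
The higher levels above (targeted silences, plus an explicit alert when somebody silences "*") can be sketched as a small routing function. This is an illustration only: the `Alert`, `Silence`, and `route` names, the glob-style matching, and the 50% threshold are all assumptions, not the tooling described in the talk.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Alert:
    site: str
    name: str


@dataclass(frozen=True)
class Silence:
    """A silence with glob patterns, e.g. Silence("syd", "LinkDown")."""
    site_pattern: str
    name_pattern: str

    def matches(self, alert: Alert) -> bool:
        return (fnmatchcase(alert.site, self.site_pattern)
                and fnmatchcase(alert.name, self.name_pattern))


def route(alerts, silences, max_silenced_fraction=0.5):
    """Return the alerts to page on.

    Adds a meta-alert when more than `max_silenced_fraction` of the
    current alerts are silenced, so an overly broad silence is itself visible.
    """
    firing = [a for a in alerts if not any(s.matches(a) for s in silences)]
    silenced = len(alerts) - len(firing)
    if alerts and silenced / len(alerts) > max_silenced_fraction:
        firing.append(Alert(site="global", name="OverSilencing"))
    return firing


if __name__ == "__main__":
    alerts = [Alert("syd", "LinkDown"), Alert("syd", "HighLatency"),
              Alert("mel", "LinkDown")]
    print(route(alerts, [Silence("syd", "LinkDown")]))  # narrow silence
    print(route(alerts, [Silence("*", "*")]))           # silences everything
```

The meta-alert deliberately cannot be matched away by the same routing pass, which is one way to keep a wildcard silence from hiding its own side effects.
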