Is that a Cloud in your pocket – Steven Ellis
- What if you could have a demo of a stack on a phone
- or on a memory stick or a mini Raspberry Pi type PC
- Nested Virtualisation
- Hardware
- Use Linux as the host environment; it works less well on Windows and Mac
- ThinkPad, Fedora or CentOS, 128GB SSD
- Nested Virtualisation
- Huge performance boost over qemu
- Use SSD
- Enable the nested option in the kvm-intel or kvm-amd module
- Confirm SSD performance first: hdparm -t /dev/sdX
- Create a base environment for the VMs, enable vmx in the CPU features
- Make sure it uses a different network so it doesn’t clash with the networks further out
- Thin LVM
- Create a thin pool for all environments
- Thin on LVM: set "issue_discards = 1" in the LVM config
- Base image
- Doesn’t have to be minimal
- update the base regularly
- How do you build your base image?
- Thin may go weirdly wrong
- Always use kickstart to re-create it.
- Think of your use case, don’t skimp on the disk (eg 40G disk for image)
- ssh keys, Enable yum cache
- Patch once kicked
- keep a content cache, maybe with rsync or mrepo
- Turn off the VM and then use fstrim and partx to make the image nice and small.
- virt-manager can’t manage thin volumes, so DON’T add the path manually there
- Use virsh to add the path instead.
- Snapshots of snapshots give great performance on SSD
- Thin no longer activates automatically on distros
- packstack: a simple way to install a simple OpenStack setup
- LVM vs QCOW
- qcow okay for smaller images
- cloud-init with atomic
- do not snapshot a qcow image when it is running
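Before building nested guests it helps to confirm that the host's KVM module really has nesting switched on. A minimal sketch, assuming the usual Linux sysfs layout where the kvm_intel or kvm_amd module exposes a `nested` parameter reading `Y` or `1` when enabled. The function name `nested_enabled` and the `sysfs_root` argument are made up for illustration and are not from the talk.

```python
from pathlib import Path


def nested_enabled(sysfs_root="/sys"):
    """Report whether a loaded KVM module has nested virtualisation enabled.

    Returns True or False based on the module's `nested` parameter,
    or None if neither kvm_intel nor kvm_amd exposes that parameter.
    `sysfs_root` is a hypothetical knob so the check can be pointed
    at a fake directory tree when testing.
    """
    for module in ("kvm_intel", "kvm_amd"):
        param = Path(sysfs_root) / "module" / module / "parameters" / "nested"
        if param.exists():
            # Kernels report the flag as "Y"/"N" or "1"/"0".
            return param.read_text().strip() in ("Y", "1")
    return None
```

Calling `nested_enabled()` on the host returns `None` when no KVM module is loaded, which is a hint to load the module with nesting enabled before creating the base environment.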
Revisiting Unix principles for modern system automation – Martin Krafft
- SSH Botnet
- OSI of System Automation
- Transport unix style, both push and pull
- uses socat for low level data moving
- autossh <- restarts ssh connection automatically
- creates control socket
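The autossh idea of restarting a connection when it drops can be sketched as a small supervise loop. This is only an illustration of the restart pattern, not how autossh itself is implemented. The function name `supervise` and its parameters are hypothetical, and stopping on a clean exit (status 0) is an assumption made for this sketch.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run `cmd`, restarting it whenever it exits with a non-zero status.

    Stops after a clean exit (status 0) or after `max_restarts` restarts.
    Returns the list of exit codes seen, oldest first.
    """
    codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:
            break  # clean exit: assume the user meant to stop
        time.sleep(delay)  # back off briefly before reconnecting
    return codes
```

In practice `cmd` would be an ssh command line; here any command list works, for example `supervise([sys.executable, "-c", "pass"])`.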
A Gentle Introduction to Ceph – Tim Serong
- Ceph gives a storage cluster that is self healing and self managed
- 3 interfaces, object, block, distributed fs
- OSD with files on them, monitor nodes
- OSD will forward writes to other replicas of the data
- clients can read from any OSD
- Software defined storage vs legacy appliances
- Network
- Fastest you can, separate public and cluster networks
- cluster network faster than public
- Nodes
- 1-2GB RAM per TB of storage
- read the recommendations
- SSD journals to cache writes
- Redundancy
- Replication – capacity impact but usually good performance
- Erasure coding – like RAID – better space efficiency but impact in most other areas
- Adding more nodes
- tends to work
- temporary impact during rebalancing
- How to size
- understand your workload
- make a guess
- Build a 10% pilot
- refine until the target performance is achieved
- scale up the pilot
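The capacity trade-off between replication and erasure coding, and the RAM rule of thumb from the notes, can be put into numbers. A rough sketch: the function names are made up here, and the defaults of 3 replicas and a 4+2 erasure-coding profile are assumptions chosen for illustration, not figures from the talk. It ignores filesystem overhead and free-space headroom.

```python
def usable_replicated(raw_tb, replicas=3):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def usable_erasure_coded(raw_tb, k=4, m=2):
    """Usable capacity with k data chunks plus m coding chunks per object."""
    return raw_tb * k / (k + m)


def ram_range_gb(raw_tb_per_node):
    """RAM range per node from the 1-2GB per TB of storage rule of thumb."""
    return (raw_tb_per_node * 1, raw_tb_per_node * 2)
```

For 120TB of raw disk, `usable_replicated(120)` gives 40TB while `usable_erasure_coded(120)` gives 80TB, which is the space efficiency difference the notes refer to.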
Keeping Pinterest running – Joe Gordon
- Software vs service
- No stable versions
- Only one version is live
- Devs support their own service – aligns incentives, eg monitoring built in
- Testing against production traffic
- SRE at Pinterest
- Like a pit crew in F1
- firefighting at scale
- changing tires while moving
- Operation Maturity
- Operation Excellence
- Have the best practices, docs, process, improvements
- Repeatable deploys
- Visibility
- data driven company
- Lots of Time series data – TSDB
- Using ELK
- Deployments
- no impact to end user
- easy to do, every few minutes
- Canary vs Staging
- Send dark (copies) of traffic to canary box without sending anything back to user
- Bounce back to starting if problems
- Teletraan
- Rollback, hotfix, rolling deploy, staging and testing, visibility and usability
- client-server model
- pre/post download, restart, etc scripts included with every deployment
- pause/resume, various testing
- Postmortems and production readiness reviews
- Cloud is not infinite: you will often hit AWS capacity limits, or find nothing available in the region
- Need to know what you are running and whether it is efficiently used
- Open sourced tools
- mysql_utils – lots of tools to manage many DBs
- Thrift tools
- Teletraan – open sourced in Feb 2016
- github.com/pinterest
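The dark-traffic canary idea from the deployment notes above (copy real requests to a canary box, never send its answer back to the user) can be sketched in a few lines. This is a toy, synchronous illustration under stated assumptions and not Pinterest's or Teletraan's implementation. `handle_with_dark_traffic`, `production`, `canary` and `canary_errors` are hypothetical names, and a real setup would normally mirror traffic asynchronously so the canary cannot add latency.

```python
def handle_with_dark_traffic(request, production, canary, canary_errors):
    """Serve `request` from production and mirror a copy to the canary.

    Only the production response is returned to the user. The canary's
    response is discarded, and any canary failure is recorded in
    `canary_errors` instead of being raised.
    """
    response = production(request)
    try:
        canary(request)  # result deliberately thrown away
    except Exception as exc:
        canary_errors.append((request, repr(exc)))
    return response
```

A growing `canary_errors` list is the signal to roll the canary back, while users keep getting production responses throughout.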