Linux.conf.au 2015 – Day 2 – Session 3 – Sysadmin

Alerting Husbandry – Julien Goodwin

  • Obsolete alerts
    • New staff members won’t have context to know was is obsolete and should have been removed (or ignorened)
  • Unactionable alerts – It is managed by another team but thought you’d like to be woken up
  • SLA Alerts – can I do something about that?
  • Bad thresholds ( server with 32 cores had load of 4 , that is not load ), Disk space alerts either too much or not enough margin
  • Thresholds only redo after complete monitoring rebuilds
  • Hair trigger alerts ( once at 51ms not 50ms )
  • Not impacting redundancy ( only one of 8 web servers is down )
  • Spamming alerts, things is down for the 2925379857 time. Even if important you’ve stopped caring
  • Alerts for something nobody cares about, eg test servers
  • Most of earlier items end up in “don’t care” bucket
  • Emails bad, within a few weeks the entire team will have a filter to ignore it.
  • Undocumented alerts – If it is broken, what am I supposed to do about it?
  • Document actions to take in  “playbook”
  • Alert acceptance practice, only oncallers should e accepting alerts
  • Need a way to silence it
  • Production by Fiat

 

 

Managing microservices effectively – Daniel Hall

  • Step one – write your own apps
  • keep state outside apps
  • not nanoservices, not milliservices
  • Each should be replaceable, independantly deployable , have a single capability
  • think about depandencies, especially circular
  • Packaging
    • small
    • multiple versions on same machine
    • in dev and prod
    • maybe use docker, have local registry
    • Small performance hit compared to VMs
    • Docker is a little immature
  • Step 3 deployment
    • Fast in and out
    • Minimal human interaction
    • Recovery from failures
    • Less overhead requires less overhead
    • We use Meso and marathon
    • Marathon handles switches from old app to new, task failure and recover
    •  Early on the Hype Cycle
  • Extra Credit Sceduling
    • Chronos within Mesos
    • A bit newish

 

Corralling logs with ELK – Mark Walkom

  • You don’t want to be your bosses grep
  • Cluster Elastisearch, single master at any point
  • Sizing best to determine with single machine, see how much it can hadle. Keep Java heap under 31GB
  • Lots of plugins and clients
  • APIs return json. ?pretty makes it looks nicer. The ” _cat/* ” api is more command line
  • new node scales, auto balancers and grows automatic
  • Logstash. lots of filters, handles just about any format, easy to setup.
  • Kibana – graphical front end for elastisearch
  • Curator, logstash-forwarder, grokdebugger

FAI — the universal deployment tool – Thomas Lange

  • From power off to applications running
  • It is all about installing software packages
  • Central administration and control
  • no master or golden image
  • can be expanded by hooks
  • plan your installation and FAI installs the plan
  • Boot up diskless client via PXE/tftp
  • creates partitions, file systems, installs, reboots
  • groups hosts by classes, mutiple classes per host etc
  • Classes can be executables, writeing to standard output, can be in shell, pass variables
  • partitioning, can handle LVM, RAID
  • Projected started in 1999
  • Supports debian based distributions including ubuntu
  • Supports bare metal, VM, chroot, LiveCD, Golden image

 

Documentation made complicated – Eric Burgueno

  • Incomplete, out of date, inconsistent
  • Tools – Word, LibreOffice  -> Sharepoint
  • Sharepoint = lets put this stuff over here so nobody will read it ever again
  • txt , markdown, html. Need to track changes
  • Files can be put in version control.
  • Mediawiki
  • Wiki – uncontrolled proliferation of pages, duplicate pages
  • Why can’t documentation be mixed in with the configuration management
  • Documentation snippits
    • Same everywhere (mostly)
    • Reusable
  • Transclusion in mediawiki (include one page install another)
  • Modern version of mediawiki have parser functions. display different content depending on a condition
  • awesomewiki.co
Share

Linux.conf.au 2015 – Day 2 – Session 2 – Sysadmin Miniconf

Mass automatic roll out of Linux with Windows as a VM guest – Steven Sykes

  • Was late and missed the start of the talk

etcd: distributed locking and service discovery – Brandon Philips

  • /etc distributed
  • open source, failure tolerant, durable, watchable, exposed via http, runtime configurable
  • API – get/put/del  basics plus some extras
  • Applications
    • Locksmith, distributed locks used when machines update
    • Vulcan http load balancer
  • Leader Election
    • TTL and atomic operations
    • Magical stuff explained faster than I can type it.
    • Just one leader cluster-wide
  • Aims for consistence ahead of raw performance

 

Linux at the University – Randy Appleton

  • No numbers on how many students use Linux
  • Peninsula Michigan
  • 3 schools
  • Michigan Tech
    • research, 7k students, 200CS Students, Sysadmin Majors in biz school
    • Linux used is Sysadmin courses, one of two main subjects
    • Research use Linux “alot”
    • Inactive LUG
    • Scripting languages. Python, perl etc
  • Northern Michigan
    • 9k students, 140 CS Majors
    • Growing CIS program
    • No Phd Programs
    • Required for sophomore and senior network programming course
    • Optional Linux sysadmin course
    • Inactive LUG
    • Sysadmin course: One teacher, app of the week (Apache, nfs, email ), shell scripting at end, big project at the end
    • No problem picking distributions, No problem picking topics, huge problem with desperate incoming knowledge
    • Kernel hacking. Difficult to do, difficult to teach, best students do great. Hard to teach the others
  • Lake Superior State
    • 2600 students
    • 70 CS Majors
    • One professor teaches Sysadmin and PHP/MySQL
    • No LUG
    • Not a lot of research
  • What is missing
    • Big power Universities
    • High Schools – None really
    • Community college – None really
  • Usage for projects
    • Sometimes, not for video games
  • Usage for infrastructure
    • Web sites, ALL
    • Beowuld Clusters
    • Databases – Mostly
  • Obstacles
    • Not in High Schools
    • Not on laptops, not supported by Uni
    • Need to attract liberal studies students
    • Is Sysadmin a core concept – not academic enough
  • What would make it better
    • Servers but not desktops
    • Not a edu distribution
    • Easier than Eclispe , better than visual studio

Untangling the strings: Scaling Puppet with inotify – Steven McDonald

  • Around 1000 nodes at site
  • Lots of small changes, specific to one node that we want to happen quickly
  • Historically restarting the puppet master after each update
  • Problem is the master gets slow as you scale up
  • 1300 manifests, takes at least a minute to read each startup
  • Puppet internal caching very coarse, per environment basis (and they have only one prod one)
  • Multiple environments doesn’t work well at site
  • Ideas – tell puppet exactly what files have changed with each rollout (via git, inotify). But puppet doesn’t support this
  • I missed the explan of exactly how puppet parses the change. I think it is “import” which is getting removed in the future
  • Inotify seemed to be more portable and simpler
  • Speed up of up to 5 minutes for nodes with complex catalogs, 70 seconds off average agent run
  • implementation doesn’t support the future parser, re-opening the class in a seperate file is not supported
  • Available on github. Doesn’t work with current ruby-inotify ( in current master branch )

 

 

Share

Linux.conf.au – Day 2 – Session 1 – Sysadmin Miniconf

Configuration Management – A love Story – Javier Turegano

  • June 2008 – Devs want to deploy fast
  • June 2009 – git -> jenkins -> Puppet master
  • But things got pretty complicated and hard to maintain
  • Remove puppet master, puppet noop, but only happens now and then lots of changes but a couple of errors
  • Now doing manual changes
  • June 2010 – Thngs turned into a mess.
  • June 2011 – Devs want prod-like development
  • Cloud! Tooling! Chef! – each dev have their own environment
  • June 2012 – dev environments for all working in ec2
  • dev no longer prod-like. cloud vs datacentre, puppet vs chef , debian vs centos, etc
  • June 2013 – More into cloud, teams re-arranged
  • Build EC2 images and deploy out of jenkins. Eaither as AMI or as rpm
  • Each team fairly separate, doing thing different ways. Had guilds to share skills and procedures and experience
  • June 2014 – Cloudformation, Ansible used by some groups, random

Healthy Operations – Phil Ingram

  • Acquia – Enterprise Drupal as a service. GovCMS Australian Federal Government. 1/4 are remote
  • Went from working in office to working from home
  • Every week had phone call with boss
  • Talk about thing other than with work, ask home people are going, talk to people.
  • Not sleep, waking up at night, not exercising, quick to anger and negative thinking, inability to concentrate
  • Hadn’t taken more than 1 week off work, let exercise work, hobbies was computer stuff
  • In general being in Ops not as much of an option to take time off. Things stay broke until fix
  • Unable to learn via Osmosis, Timing of handing over between shifts
  • People do not understand that computers are run by people not robots
  • Methods: Turn work off at the end of the day, Rubber Ducking, exercise

Developments in PCP (Performance Co-Pilot) : Nathan Scott

  • See my slides from yesterday for intro to PCP
  • Stuff in last 12 months
    • Included in supported in RHEL 6.6 and RHEL 7
    • Regular stable releases
    • Better out of the box experience
    • Tackling some long-standing problems
  • JSON access – pmwebd , interactive web charts ( Graphite, grafana )
  • zero-install look-inside containers
  • Docker support but written to allow use by others
  • Collectors
    • Lots of new kernel metrics additions
    • New applications from web devs (memcached, DNS, web )
    • DB server additions
    • Python PMDA interfaces
  • Monitor work
    • Reporting tools
    • Web tools, GUIs
  • Also improving ease of setup
  • Getting historical data from sar, iostat
  • www.pcp.io

Security options for container implementations – Jay Coles

  • What doesn’t work: rlimits, quotas, blacklisting via ACLs
  • Capabilities: Big list that containers probably shouldn’t have
  • Cgroups – Accounting, Limiting resource usage, tracking of processes, preventing/allowing device access
  • App Armor vs selinux – Use at least one, selinux a little more featured
Share

Linux.conf.au – Day 2 – Keynote by Eben Moglen

Last spoke 10 years ago in Canberra Linux.conf.au

Things have improved in the last ten years

  • $10s of billions of value have been lost in software patent war
  • But things have been so bad that some help was acquired, so worst laws have been pushed back  a little
  • “Fear of God” in industry was enough to push open Patent pools
  • Judges determined that Patent law was getting pathological, 3 wins in Supreme court
  • Likelihood worst patent laws will be applied against free software devs has decreased
  • “The Nature of the problem has altered because the world has altered”

The Next 10 years

  • Most important Patent system will be China’s
  • Lack of rule of law in China will cause problems in environment of patents
  • Too risky for somebody too try and stop a free software project. We have “our own baseball bat” to spring back at them

The last 10 years

  • Changes in Society more important changes in software
  • 21st century vs 20th century social organisations
    • Less need for hierarchy and secrecy
    • Transparency, Participation, non-hierarchical interaction
  • OS invented that organisation structure
  • Technology we made has taken over the creation of software
  • “Where is BitKeeper now?” – Eben Moglen
  • Even Microsoft reorganises that our way of software making won
  • Long term the organisation structure change everywhere will be more important than just it’s application in Software
  • If there has been good news about politics = “we did it”, bad news = “we tried”

Our common Values

  • “Bridge entire environment between vi and emacs”

Snowden

  • Without PGP and free software then things could have been worse
  • The world would be a far more despotic place if PGP was driven underground back in 1993. Imagine today’s Net without HTTPS or SSH!
  • “We now live in the world we are afraid of”
  • “What stands between them and us is our inventions”
  • “Freedom itself depends on how we make use of the technologies we are creating.” – Eben Moglen
  • “You can’t trust what you can’t read”
  • Big power in the wrong is committed against the first law of robotics, they what technology to work for it.
  • From guy in twitter – “You can’t trust what you can’t read.” True, but if OpenSSL teaches us anything you can’t necessarily trust what you can
  • Attitudes in under-18s are a lot more positive towards him than those who are older (not just cause he looks like Harry Potter)
  • GNU Project is 30 years old, almost same age is Snowden

Oppertunity

  • We can’t control the net but opportunity to prevent others from controlling it
  • Opportunity to prevent failure of freedom
  • Society is changing, demographics under control
  • But 1.6 billion people live in China, America is committed to spying, consumer companies are committed to collecting consumer information
  • Collecting everything is not the way we want the net to work
  • We are playing for keeps now.

 

 

Share

Linux.conf.au – Day 1 – Session 3 – Containers

Building a PaaS with Docker, Kubernetes, and Hard Work – Steven Pousty

  • Slides – bit.ly/1AFGACa
  • All about Openshift
  • So why a new Paas?
  • Project Atomic – stripped down RHEL install, everything else as a container. ostree file system, same kernel as RHEL
  • Kubernetes intro
    • Kubernetes Daemon – Routing for services
    • Sceduler etc
  • Openshift
    • Built-in software defined networking – OpenVSwith , HAPRoxy load balancing etc
  • Takeaway
    • PAAS seems to be cool again

 

Galera with Docker: How Synchronous Replication and Linux Containers Mesh Together – Raghavendra Prabhu

  • I got lost in the talk

 

Cloud, Containers, and Orchestration Panel –  Katie Miller

  • Steven Pousty , Bran Philips ,
    Tycho Andersen
    Tycho Andersen
    Tycho Andersen
    Tycho Andersen

    Tycho Andersen

  • Standard is Dockers to lose and they might manage it
  • 3-4 years before we should standardise them. Need to experiment first.
  • The kernel API imposes some limits on diversity
  • Lots of other stuff

 

Share

Linux.conf.au 2015 – Day 1 – Session 2 – Containers

AWS OpsWorks Orchestration War Stories – Andrew Boag

  • Autoscaling too slow since running build-from-scratch every time
  • Communications dependencies
  • Full stack rebuild in 20-40 minutes to use data currently in production
  • A bit longer in a different region
  • Great for load testing
  • If we ere doing again
    • AMI-based better
    • OPSWorks not suitable for all AWS stacks
    • Golden master for flexable
  • Auto-Scaling
    • Not every AMI instance is Good to Go upon provisioning
    • Not a magic bullet, you can’t broadly under-provision
    • needs to be throughly load-tested
  • Tips
    • Dual factor authentication
    • No single person / credentials should be able to delete all cloud-hosted copies of your data
  • Looked at Cloudformation at start, seemed to be more work
  • Fallen out of love with OpsWorks
  • Nice distinction by Andrew Boag: he doesn’t talk about “lock-in” to cloud providers, but about “cost to exit”.   – Quote from Paul

 

Slim Application Containers from Source – Sven Dowideit

  • Choose a base image and make a local version (so all your stuff uses the same one)
  • I’d pick debian (a little smaller) unless you can make do with busybox or scratch
  • Do I need these files? (check though the Dockerfile) eg remove docs files, manpages, timezones
  • Then build, export, import and it comes all clean with just one layer.
  • If all your images use same base, only on the disk once
  • Use related images with all your tools, related to deployment image but with the extra dev, debug, network tools
  • Version the dev images
  • Minimise to 2 layers
    • look at docker-squash
    • Get rid of all the sourc code from your image, just end up with whats need, not junk hidden in layers
  • Static micro-container nginx
    • Build as container
    • export as tar , reimport
    • It crashes 🙁
    • Use inotifywait to find what extra files (like shared libraries) it needs
    • Create new tarball with those extra files and “docker import” again
    • Just 21MB instead of 1.4GB with all the build fragments and random system stuff
    • Use docker build as last stage rather than docker import and you can run nginx from docker command line
    • Make 2 tar files, one for each image, one in libs/etc, second is nginx

 

Containers and PCP (Performance Co-Pilot) –  Nathan Scott

  • Been around for 20+ years, 11 years open source, Not a big mindshare
  • What is PCP?
    • Toolkit, System level analysis, live and historical, Extensible, distributed
    • pmcd daemon on each server, plus for various functions (bit of like collectd model)
    • pmlogger, pmchart, pmie, etc talk (pull or poll) to pmcd to get data
  • With Containers
    • Use –container=  to grab info inside a container/namespace
    • Lots of work still needed. Metrics inside containers limited compared to native OS

 

The Challenges of Containerizing your Datacenter – Daniel Hall

  • Goals at LIFX
    • Apps all stateless, easy to dockerize
    • Using mesos, zookeeper, marathon, chronos
    • Databases and other stuff outside that cloud
  • Mesos slave launches docker containers
  • Docker Security
    • chroot < Docker < KVM
    • Running untrusted Docket containers are a BAD IDEA
    • Don’t run apps as root inside container
    • Use a recent kernel
    • Run as little as possible in container
    • Single static app if possible
    • Run SELinux on the host
  • Finding things
    • Lots of micoroservices, marathon/mesos moves things all over the place
    • Whole machines going up and down
    • Marathon comes with a tool that pushes it’s state into HAProxy, works fairly well, apps talk to localhost on each machines and haproxy forwards
    • Use custom script for this
  • Collecting Logs
    • Not a good solution
    • can mount /dev/log but don’t restart syslog
    • Mesos collects stdout/stderror , hard to work with and no timestamps
    • Centralized logs
    • rsyslog log to 127.0.0.1 -> haproxy -> contral machine
    • Sometimes needs to queue/drop if things take a little while to start
    • rsyslog -> logstash
    • elasticsearch on mesos
    • nginx tasks running kibana
  • Troubleshooting
    • Similar to service discover problem
    • Easier to get into a container than getting out
    • Find a container in marathon
    • Use docker exec to run a shell, doesn’t work so well on really thin containers
    • So debugging tolls can work from outside, pprof or jsonsole can connect to exposed port/pid of container
Share

Linux.conf.au 2015 – Day 1 – Session 1 – Containers

Clouds, Containers, and Orchestration Miniconf

 

Cloud Management and ManageIQ – John Mark Walker

  • Who needs management – Needs something to tie it all together
  • New Technology -> Adoption -> Proliferation -> chaos -> Control -> New Technology
  • Many technologies follow this, flies under the radar, becomes a problem to control, management tools created, management tools follow the same pattern
  • Large number of customers using hybrid cloud environment ( 70% )
  • Huge potential complexity, lots of requirements, multiple vendors/systems to interact with
  • ManageIQ
    • Many vendor managed open source products fail – open core, runt products
    • Better way – give more leeway to upstream developers
    • Article about taking it opensource on opensource.com. Took around a year from when decision was made
    • Lots of work to create a good open source project that will grow
    • Release named after Chess Grandmasters
    • Rails App

 

LXD: The Container-Based Hypervisor That Isn’t –  Tycho Andersen

  • Part of Openstack
  • Based on LXC , container based hypervisor
  • Secure by default: user namespaces, cgroups, Apparmor, etc
  • A EST API
  • A daemon that doesn’t hypervisory things
  • A framework for maintaining container based applications
  • It Isn’t
    • No network configuration
    • No storage management – But storage aware
    • Not an application container tool
    • handwavy difference between it and docker, I’m sure it makes sense to some people. Something about running an init/systemd rather than the app directly.
  • Features
    • Snapshoting – eg something that is slow to start, snapshot after just starts and deploy it in that state
    • Injection – add files into the container for app to work on.
    • Migration – designed to go fairly fast with low downtime
  • Image
    • Public and private images
    • can be published
  • Roadmap
    • MVP 0.1 released late January 2015
    • container management only

 

Rocket and the App Container Spec – Brandon Philips

  • Single binary – rkt – runs everywhere, systemd not required
  • rkt fetch – downloads and discovers images ( can run as non-root user )
  • bash -> rkt -> application
  • upstart -> rkt -> application
  • rkt run coreos.com/etcd-v2.3.1
  • multiple processes in container common. Multiple can be run from command line or specified in json file of spec.
  • Steps in launch
    • stage 0 – downloads images, checks it
    • Stage 1 – Exec as root, setup namespaces and cgroups, run systemd container
    • Stage 2 – runs actual app in container. Things like policy to restart the app
    • rocket-gc garbage collects stuff , runs periodicly. no managmanent daemon
  • App Container spec is work in progress
    • images, files, compressed, meta-data, dependencies on other images
    • runtime , restarts processes, run multiple processes, run extra procs under specified conditions
    • metadata server
    • Intended to be built with test suite to verify
Share

Links: Efficient Software, West Wing history, $20k houses, Damn boomers

Share

Another run in with the Electoral Commission

After already having trouble Electoral Commission banning photography in polling places I now get a threatening email from them.

Yesterday I made this Tweet:

 

and today I get the following email

Subject: Electoral Commission complaint – London exit poll posted on Twitter account

Dear Simon,

The Electoral Commission has received a complaint with regard to an exit poll being taken and then published on the Twitter account of @slyall. We understand that this is your Twitter account.

Under section 197(1)(d) of the Electoral Act 1993, it is an offence to conduct a public opinion poll of persons who have voted (exit polls). Section 197(1)(d) states:

197 Interfering with or influencing voters
(1) Every person commits an offence and shall be liable on conviction to a fine not exceeding $20,000 who at an election—
(d) at any time before the close of the poll, conducts in relation to the election a public opinion poll of persons voting before polling day

In order to assist the Commission in considering this complaint, could you please provide the following information:

1.         Who conducted the exit poll and when was it conducted?
2.         How did you receive this information?
3.         Any other information you believe to be of relevance to the Commission’s consideration.
4.         How you might remedy this matter.

Can you please provide the above information by 5pm, Friday 19 September 2014. In the first instance, to avoid further complaints, you may wish to remove the Twitter post.

Please telephone me if you wish to discuss this further.

Update

I replied with:

I saw this:
http://www.reddit.com/r/newzealand/comments/2gidem/kiwi_did_exit_polling_out
side_london_embassy_note/

and copied it to twitter.

I have no further knowledge of the photo or poll or the people who took it
or even if it actually took place.

and did nothing else. A couple of days later he emailed me with.

Thanks for getting back to us. The Commission understands that the original
tweet in respect of the exit poll has been removed and the Commission is not
taking any further action on the matter.

Thanks again for prompt reply which was much appreciated.

which was a little strange since neither me nor anybody else had removed anything. A little weird and one reason I don’t feel confident with these guys running voting over the Internet.

Share