Linux.conf.au 2018 – Day 5 – Keynote – Jess Frazelle

Keynote: Containers aka crazy user space fun

  • Work at Microsoft on Open Source and containers, specifically on kubernetes
  • Containers vs Zones vs Jails vs VMs
  • Containers are not a first class concept in the kernel.
    • Namespaces
    • Cgroups
    • AppArmour in LSM (prevent mounting, writing to /proc etc) (or SELinux)
    • Seccomp (syscall filters, which allowed or denied) – Prevent 150 other syscalls which are uncommon or dangerous.
      • Got list from testing all of dockerhub
      • eg CLONE, UNSHARE
      • NoNewPrivs (exposed as “AllowPrivilegeEsculation” in K8s)
      • rkt and systemd-nspawn don’t 100% follow
  • Intel Clear containers are really VMs

History of Containers

  • OpenVZ – released 2005
  • Linux-Vserver (2008)
  • LXC ( 2008)
  • Docker ( 2013)
    • Initially used LXC as a backend
    • Switched to libcontainer in v0.7
  • lmctfy (2013)
    • By Google
  • rkt (2014)
  • runc (2015)
    • Part of Open container Initiative
  • Container runtimes are like the new Javascript frameworks

Are Containers Secure

  • Yes
  • and I can prove it
  • VMs / Zones and Jails are like all the Lego pieces are already glued togeather
  • Containers you have the parts seperate
    • You can turn on and off certain namespaces
    • You can share namespaces between containers
    • Every container in k8s shares PID and NET namespaces
    • Docker has sane defaults
    • You can sandbox apps every further though
  • https://contained.af/
    • No one has managed to break out of the container
    • Has a very strict seccomp profile applied
    • You’d be better off attacking the app, but you are still running a containers default seccomp filters

Containerizing the Desktop

  • Switched to runc from docker (had to convert stuff)
  • rootless containers
  • Runc hook “netns” to do networking
  • Sandboxed desktop apps, running in containers
  • Switch from Debian to CoreOS Container Linux as base OS
    • Verify the integrity of the OS
    • Just had to add graphics drivers
    • Based on gentoo, emerge all the way down

What if we applied the the same defaults to programming languages?

  • Generate seccomp filters at build-time
    • Previously tried at run time, doesn’t work that well, something always missed
    • At build time we can ensure all code is included in the filter
    • The go compiler writes the assembly for all the syscalls, you can hijack and grab the list of these, create a seccomp filter
    • No quite that simply
      • plugins
      • exec external stuff
      • can directly exec a syscall in go code, the name passed in via arguments at runtime
  • metaparticle.io
    • Library for cloud-native applications

Linux Containers in secure enclaves (SCONE)

  • Currently Slow
  • Lots of tradeoffs or what executes where (trusted area or untrsuted area)

Soft multi-tenancy

  • Reduced threat model, users not actively malicious
  • Hard Multi-tenancy would have potentially malicious containers running next to others
  • Host OS – eg CoreOs
  • Container Runtime – Look at glasshouse VMs
  • Network – Lots to do, default deny in k8s is a good start
  • DNS – Needs to be namespaced properly or turned off. option: kube-dns as a sidecar
  • Authentication and Authorisation – rbac
  • Isolation of master and System nodes from nodes running containers
  • Restricting access to host resources (k8s hostpath for volumes, pod security policy)
  • making sure everything else is “very dumb” to it’s surroundings

 

Share