Everything Open 2025 – Day 2 – Afternoon

I skipped a couple of talks for the hallway track and other stuff

Koha – not your average library system by Aleisha Amohia

  • The name was chosen because the software was made open source as a gift to the community
  • Started in 1999
  • First fully web-based open source library system
  • Bugs and external patches soon after
  • Customizable and Configurable
  • Used in 18,000+ libraries
  • It is just a big database
    • Can be used as not just a library system
    • Can be used to catalogue other things (like documents) at organisations other than libraries
  • Configurable via CSS, fonts, languages, CMS, feature toggles, etc
  • Customisable views for each branch are possible
  • What makes it special beyond the code
    • Offline circulation
    • Supports non-ASCII characters
    • Translation capability
  • Q: Is it harder to find people to work on it since it is written in Perl, which is effectively a legacy language? – There is good onboarding and support for devs, and things still work
  • Q: What are the challenges with it being open source?
    • People worry about the quality of OSS. Fix: good, robust quality procedures
    • People think it is free – have good support that is worth paying for
    • DB backend – MySQL and MariaDB

The circle of life: The Digital Skills GitBook project by Sara King

  • Working on project for last 5 years that is in the process of winding up
  • tinyurl.com/5539zzpx <- more information
  • Starting early 2019
  • 5 years later project is coming to the end of a natural cycle
  • Context
    • Group of 60 libraries looking for projects – CAUL Digital Librarians
    • Is there a book that teaches modern not-quite-technical computer skills?
  • With Pandemic lockdowns everybody started working from home
  • Why Gitbook?
    • Having “book” in the name helped
    • Similar project using github etc
    • CAUL eventually went Pressbooks, but not till later
    • Also qualified for free version
    • Learning git was a useful thing
  • Did the community really need this? – Wasn’t checked in detail, but seemed a cool idea
  • Happened at start of pandemic
    • Everyone online
    • Supportive community was good at start of pandemic
  • Took some courses in git and other tools
  • Did a prototype book on another subject to get the hang of the tech
  • “Gave ourselves permission to not know what we were doing”
  • Created chapters of the books to give outline
    • Each chapter had 3 levels of knowledge in it: novice, proficient, advanced
  • Went public in late-2021
  • Also did code of conduct, license, contributions guidelines
  • Told people about it via various methods
  • Worked to get people to contribute ad-hoc
  • But didn’t get the amount of contributions they were expecting
  • In 2023 University libraries having problems, budgets shrinking etc
    • People leaving or too busy
    • Some used experience on the project to get new more technical jobs
  • No new people joining to replace those leaving
  • 2025 reflecting on the project
  • Process and product are different
  • We equated enthusiasm for the idea and the process with enthusiasm for the product, but people didn’t join in and weren’t super into the product
  • It was not shared a lot and didn’t get many hits
  • Goal of training people to create stuff was a big success
  • People gained lots of confidence with new tech
  • Support of CAUL was great, but no longer available
  • Next? – If people like the process maybe we should talk about that
  • Create a roadmap for other projects
  • Hand it over to somebody else? There doesn’t seem to be interest

Everything Open 2025 – Day 2 – Morning

Skill Trees: Gamifying The Hard Things by Steph Piper

  • A list of skills
  • Each area has a series of skills that can be colored in.
  • Design
    • Hexagons are good
    • Can be done in any order, hard to connect meaningfully
    • Simple, flexible milestones
  • Reception
    • First one was 3D printing & modeling
    • Tested on makerspace student staff members. Good to identify gaps
  • Benefits
    • Reduce imposter syndrome or, on the other side, overconfidence
    • Target areas for improvement
  • Online on GitHub – https://github.com/sjpiper145/MakerSkillTree
  • How to make a skill tree
    • Flexibility, not too cost restrictive, globally applicable
    • Peer reviewed
    • Final skill tree and translation
  • Book – The Learning Game by Ana Lorena Fabrega
  • Beta testing a book collecting these skill trees.
    • Book published through “Make: Magazine”
    • 68 tiles per tree, 1020 skill tiles in the book
  • Tips for writing
    • Continue to evolve and improve
    • Doing her own illustrations was a huge time saver for the publisher
    • Have confidence in your work. The publisher will only do the final publishing
  • Looking to fill the gaps
  • Working on a kids version of the book

The Token Wars: Why not everything should be open by Kathy Reid

  • The Token Wars
    • A resource conflict fought through technical, social and legal means
  • What is a token?
    • An atomic unit of text taken from a larger collection called a corpus
    • text -> subword tokens -> vectorization
    • Transformer architecture
    • Word embeddings capture semantic closeness of words
  • Scaling up to billions of tokens
    • Train the relationships between tokens based on all the text
  • The value of tokens and token economics and the actors in the token wars
    • Are they a public good?
    • No, they are rivalrous, and either excludable or non-excludable
    • LLMs in 2024 were trained on 4 orders of magnitude more data than 5 years ago.
    • Estimated 60-160 trillion tokens on the public web and some LLMs are trained on close to all of those
    • Synthetic Data especially low quality slop is polluting the Internet
    • Scrapers pick this up and train on it, concern about Model Collapse ( like a photocopy of a photocopy). Reduces the diversity of what it will produce.
  • Key actors in the token wars
  • Individual content creators
    • Included in corpus without permission
  • Platforms with user-generated content
    • Seeking to get paid for their content ( eg Reddit deal with OpenAI )
  • Archival Institutions
    • Australian National Film and Sound Archive: Maintain Trust, Transparency, Create Public Value
  • Private Companies
    • Anthropic: Model Context Protocol
  • The AI Companies
    • Have relied on fair use, although some countries don’t have that concept
    • Companies blocking the Common Crawl
  • Governments
    • Having trouble balancing interests
  • Token Tactics – Protecting your token treasure
    • Data poisoning
    • Blocking bots and scrapers
  • Data Sovereignty
  • Futures
    • Hunt for more tokens
    • Better ways to block/prevent
    • Better understanding of the collateral damage of the resource conflicts

Everything Open 2025 – Day 1 – Afternoon

The Storage Shift by Steven Ellis

  • Storage: data is critical for business
  • Requirements are always growing
  • Organisations already have existing solutions and relationships
  • Three Dimensions of data
  • Participants ( dev, ops, product ) all have different requirements and views
  • Where did you first store your data?
    • As spinning drives have gotten smaller the capacity has increased
    • Now people have small local storage and storage is not directly attached
  • Storage platforms / API driven storage
    • Block vs Files vs Object
  • Options for Kubernetes storage.
    • CSI operates on all levels
    • Able to create and destroy storage at Kubernetes speed rather than waiting for a storage admin (or even a cloud storage API)
  • Workload Examples
    • Kubevirt and Kubernetes centric but applicable elsewhere
  • What about prosumer
    • Be careful with clouds except as backups
    • zfs and btrfs
    • Stephen uses TrueNAS
    • 3 copies of all data. RAID isn’t a backup

What happened in production?! Instrumenting with OpenTelemetry by David Bell

  • A sample problem
    • Microservice based system
    • What happened in Production?
    • Errors up high, response time went bad
  • What about the logs?
    • 200s and then 500s. What does that mean?
  • Kept happening at 2pm every day. Sometimes bad, sometimes worse
  • O11y and OpenTelemetry
    • Find the internal state of a system just by asking questions
  • What about metrics
    • Pre-aggregated, No “connective tissue”, Can’t drill down
    • Answering known questions, good for alarms, graphs and dashboards
    • known-knowns and known-unknowns
  • What about Logs?
    • unstructured strings
    • Many log lines per piece of work. Maybe with a request-id, but not often
    • No schema or index, so can be quite slow to parse
    • structured logs sometimes work
    • expensive to store yourself or pay to have stored
    • But we should still log – audit logging and security logging
  • Tracing is good
    • separate tooling from logs and metrics
    • often limited fields
    • often limited traces to even look at ( just the bad ones)
  • OpenTelemetry
    • covers metrics, logs and traces
    • wide language support and auto-instrumentation out of the box
    • Easy to get started
    • wrappers and external hooks
    • distributed tracing
  • Otel Traces
    • Traces are Directed Acyclic Graphs ( DAGs) of Spans
    • Spans are sort of structured logs with required fields
    • Spans contain many attributes
    • Attributes can have high cardinality
    • Spans have high dimensionality
  • Otel isn’t for everything
    • Don’t put your secret data in it
    • Maybe not business logic
    • no guarantee on delivery ( sometimes traces get lost )
    • Not for security/audit logging
  • Sampling can be useful
    • head-based sampling ( decision made at the start, the “head”, of the trace )
    • rule-based/tail-based grabs all and keeps some that are interesting
  • Setup (for Python) – no code changes (see the sketch after these notes)
    • Install a couple of packages. One to gather, one to send
    • Send in some environment variables
    • Change the docker run command to wrap your existing code
  • Setup (code changes )
    • Import packages
    • Shove attributes into a span in code (see example code in talk)
  • Demo of App (using Honeycomb)
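From memory of the OpenTelemetry Python docs, the zero-code setup described above looks roughly like this; the service name, endpoint and app.py entry point are placeholders:

# install the auto-instrumentation distro and an OTLP exporter
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install   # adds instrumentation for libraries it finds

# tell the SDK who we are and where to send data (example values)
export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# wrap the existing entry point instead of changing code
opentelemetry-instrument python app.py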

Please don’t forget my parents! – Digital Exclusion is happening, so you all better know about it by Sae Ra Germaine

  • Various Background Stuff
  • Her parents retired to a rural property near an outer suburb of Melbourne
  • Two phone lines
  • Mobile reception only available standing outside of the house
  • Point-to-point wireless. Approx 1Mb/s but vulnerable to animals chewing through it
  • NBN
    • Originally was going to be Fiber to the premises.
    • Then it got cheaper: fiber-to-the-curb or fiber-to-the-node with copper the rest of the way
    • Today 98% are on the NBN but not everybody is well connected
    • Parents’ land line got cut off regularly due to errors
    • Then 3G got cut off. 4G at the parents’ place doesn’t really work
  • Digital Divide
    • Everything is now all online ( jobs, doctors, social services )
    • Satellite based Internet a lot more expensive than comparable options in cities
    • During covid lockdowns they were over 5km from various services which was a problem with movement restrictions
  • Libraries had to pivot during lockdowns
    • wifi hotspots outside, accepting deliveries
    • Mobile libraries provide access to government services
    • Various other stuff on libraries

Open source voice interfaces in 2025 by Kit Biggs

  • Big changes in the last 12 months
  • AI has zoomed past inflated expectations and is now in the trough of disillusionment
  • Where are we with conversation user interfaces
  • What are the steps/software needed for this?
  • Get the sound
    • Digital microphones are good and do the first rough filtering
  • Is somebody actually speaking?
    • xiao_respeaker – example software project
  • Wake word recognisers
    • Commercial software works with a “wake word” ( Hey Siri )
    • Used to be hard to do, now easier
  • Word recognition just looks for specific words
    • Getting better
  • Continuous voice recognition
    • Also better
  • Intent recognition
    • Usually hooked in with communication to outside world
  • Feedback
    • Speech Synthesis is pretty much a solved problem
  • Looking at software you can use. Not cloud based
  • Wake Word
    • Picovoice Porcupine ( non-commercial or licensed ). 16 languages
    • OpenWakeWord
      • Great docs
      • Trains on Synthetic speech
      • More than good enough
  • Speech to Text
    • OpenAI Whisper was leader
    • Lots of new ones. Look at Moonshine
  • Text to Speech
    • Piper is the stand-out, actively developed
    • Others mostly good for English only
    • Emotional synthesis is getting better
  • Hardware
    • Raspberry Pi 4 or 5
      • 5 has the ability to plug in an accelerator
    • Rockchip Arm64 with neural coprocessor
    • AI in A Box ( Radxa Rock 5A)
  • Voice on a Microcontroller, the time has arrived
  • ESP32 processor is the most common option – $10 each
    • Dev board plus microphone maybe for $20 or so
    • Can do the wake word detection and then stream audio to something with a higher spec
  • How small can you go?
    • What can you do with a small board just by itself?
    • Speech recognition on micro-controller not there yet but phrase and wake word recognition works
  • Glasses displays are looking almost there
    • Can have microphones
    • Avoid cameras to avoid privacy concerns

Everything Open 2025 – Day 1 – Morning

Keynote: Sustaining Open Source Software by Justin Warren

Good talk. I advise you to watch it on video. Good thoughts on the economics of Open Source.

Sandboxing untrusted code with WebAssembly by Katie Bell

  • Works for MongoDB. Webscale!
  • Untrusted Code
  • Example Shopify
    • Supports 3rd party apps
    • What happens when a 3rd-party app goes offline and is used by a lot of stores?
    • What if it is slow and inserts itself into the customer flow, making the experience bad?
    • Decided to host 3rd party apps in their cloud to provide better reliability
    • Shopify decided to go with webassembly
  • Some alternatives for sandboxing
    • Small VM like firecracker – 4MB memory, 125ms startup
    • Docker – Using Shared Kernel still
    • V8 Isolates – Used to isolate processes within a Chrome tab. Cloudflare runs many workers in a process, 5ms startup
    • But it is not a fair comparison. Lots of tradeoffs between security vs speed vs flexibility
  • Webassembly
    • Designed to compile big apps to run in a browser (eg photoshop)
    • Is a compile Target – .wasm binary
    • Originally designed to usually be called from javascript ( in browser )
    • Is a tiny simulated computer, very locked down, can’t interact with anything outside. Can just provide and call functions
    • When you build, the compiler will usually create a javascript wrapper to make it easier to use, so you don’t have to call the wasm directly.
  • WASI
    • An API that lets you run webassembly programs as regular programs
    • wasmtime – program to run .wasm directly
    • Keeps things sandboxed, but you can optionally grant a very limited set of capabilities, which must be explicitly provided (see the sketch after these notes)
  • Sandboxing Webassembly in the real world
    • Shopify use this. See their docs and definitions
    • Firefox and Graphite font shaping library
      • Compiled from native code into wasm to ensure memory safety rather than audit or re-write in rust
  • Is it secure?
    • Sometimes. But WASI intentionally punches holes in the sandbox, so there can be bugs
    • Wasmtime has a lot of work put into sandboxing though
    • Use multiple layers of security
  • WASI standard is in progress ( webassembly itself is fairly stable )
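As a rough illustration of how explicit those WASI capabilities are, this is what running a module with wasmtime looks like (the module name and directory are placeholders):

# run a WASI binary with no extra capabilities at all
wasmtime run app.wasm

# explicitly pre-open the current directory so the module can read and write it
wasmtime run --dir=. app.wasm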

80% faster, 70% less memory: building a new high-performance, low-cost Prometheus query engine by Joshua Hesketh, Charles Korn

  • Works at Grafana Labs on Mimir database
  • Explains time-series database. (Name+Labels)+time+number
  • Talk covers the query app which turns promql requests into a result
  • Memory used by the old software was bouncing around; it had to be over-provisioned, which wastes money, or it sends back an error to the user if it runs out of memory.
  • The Prometheus PromQL engine has little room for extensions
  • Problem
    • The Prometheus PromQL engine loads the entire series into memory before processing it further
    • Fixing this would require a rewrite.
    • Which they did
  • MQE engine
    • Loads a bunch of samples and then streams to operator(s). Then repeats a bit at a time
    • Will fall back to the Prometheus engine if a function is not yet implemented
    • Very efficient on range queries
  • He explained memory allocation strategy using pooling. I got a little lost
    • “That was a very oversimplified example”
  • query-tee
    • Send queries to two different engines and ensure they return the same result for testing
    • Has a test corpus of data they can run this over, as well as live queries. Might do fuzzy query testing in future
  • Engine is available and can be switched in via command line
    • Does fall-back if things are not implemented
    • Implements the most common queries (over 90% of actual requests)


Everything Open 2024 – Day 3 talks

Keynote: Intelligent Interfaces: Challenges and Opportunities by Aaron Quigley

  • Eye Tracking of the user
    • DiffDisplays – Eye tracking; when you looked away from a screen it froze it. When you looked back it gave you a summary/diff of what you missed
    • Brought this down to the widget level: a widget got a notification when the user was looking or away and could decide what to do
  • Change Blindness (different from attention blindness)
    • When phone far away simplify phone interface, more detail when closer
    • People don’t see details of displays slowly fading in and out as distance from display changed
  • Phone on table, screen up or screen down
    • SpeCam – A face-down phone can use its screen light to detect what it is sitting on and guess the material
    • Accuracy same/better than a proper spectrometer
  • MicroCam – Phone placed with screen face up
    • Placement aware computing
  • OmniSense
    • 360 Camera
    • Track what the user’s whole body is doing
    • Tracks what is happening all around the user. Danger sensors, context aware output
  • BreathIn control. Breath pattern to control phone
    • Use a camera in a watch position to detect hand gestures (looking at the top/back of the hand)
  • RotoSwype – Smart ring to do gesture keyboard input
  • RadarCat – Radar + Categorization
    • More socially acceptable than cameras everywhere and always on
    • Used to detect material
    • Complex pattern of reflection and absorption that returns lots of information
    • Trained on 661 features and 512 bins
    • The radar signal can even detect different colours. Different dyes interact differently
    • Can detect if people are wearing gloves
    • Application – Scales at self-checkout supermarket to detect what is being weighed
    • Radar in a shoe can recognise the surface and the layers below (carpet on wood etc)

Passwordless Linux – Passkey and External IdP support in FreeIPA by Fraser Tweedale

  • Passwords
    • Users are not diligent (weak passwords, reuse)
    • Using passwords securely imposes friction and cognitive load
    • Phishable
  • Objectives – Reduce password picking risks, phishing, friction, frequency of login
  • Alternatives
    • 2FA, Smartcard, Passkeys / WebAuthn, Web SSO Providers
  • 2FA
    • HOTP / TOTP etc
    • phishable
  • Smart Cards
    • Phishing Resistant
  • Passkeys
    • Better versions of MFA Cards
    • Phishing resistant
    • “passkey” term is a little vague
  • Web SSO
    • SAML, OAuth2
    • Using an existing account to authenticate
    • Some privacy concern
    • Keycloak, Redhat SSO, Okta, Facebook
    • Great on the web, harder in other context
  • What about our workstations?
    • pam has hooks for most of the above (Web SSO less common) or pam_sss does all
  • FreeIPA / Red Hat Identity Management
  • DEMO

Locknote: Who gets to work in STEM? And who is being left out? by Rae Johnston

  • Poor diversity affects the development of AI
  • False identification much higher by facial recognition for non-white people
  • Feed the AI more data sets?
  • Bias might not even be noticed if the developers are not diverse
  • Only around 25% of STEM people are Women
  • Only 15% of UK scientists came from working class backgrounds (35% of the population)
  • 11% of Australians don’t have access to affordable Internet or don’t use it.
  • The digital divide is narrowing but getting deeper. It is increasingly harder to function if you are not online
  • Male STEM graduates are 1.8x more likely to be in jobs that require their degree than women. Much worse for Indigenous people

Lightning Talks

  • Creating test networks with Network Namespaces (example after this list)
    • ip netns add test-lan
  • Rerap Micron
  • Haystack Storage System
    • Time-based key/value store
  • AgOpenGPS
    • Self Steering System for Tractors
  • Common Network Myths
    • End to end packet loss is the only thing that matters
    • Single broadcast domain is a SPOF, broadcast storms etc
    • Ping and ICMP is your friend. Please allow ping
    • Don’t force 1500 MTU
    • Asymmetric routing is normal
    • non-standard port number doesn’t make you secure
  • radio:console for remote radio
  • WASM
    • FileSender – Share large datasets over the Internet
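A minimal sketch of the network namespace idea from that lightning talk, using a veth pair to join the test network to the host (names and addresses are made up):

# create the namespace and a virtual ethernet pair
ip netns add test-lan
ip link add veth-host type veth peer name veth-lan

# move one end into the namespace and give both ends addresses
ip link set veth-lan netns test-lan
ip addr add 10.99.0.1/24 dev veth-host
ip link set veth-host up
ip netns exec test-lan ip addr add 10.99.0.2/24 dev veth-lan
ip netns exec test-lan ip link set veth-lan up
ip netns exec test-lan ip link set lo up

# traffic now flows between the host and the isolated test network
ping -c 1 10.99.0.2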

Everything Open 2024 – Day 2 talks

Keynote: How Adversaries Use AI by Jana Dekanovska

  • Adversary
    • Nation States
    • Ecrime
    • Hacktivism
  • Trends
    • High Profile Ecrime attacks – Ransomware -> Data extortion
    • Malware-Free Attacks – Phish, Social engineering to get in rather than malware
    • Cloud Consciousness
    • Espionage – Focused on Eastern Europe and the Middle East
    • Vulnerability Exploitation – Not just zero days; takes a while to learn to leverage vulnerabilities
    • Cloud Consciousness – Adversary knows they are in the cloud, have to operate in it.
  • Generative AI
    • Code Generation
    • Social Engineering – Helps people sound like native speakers, improves wording
    • Prompt Injection
  • Big Four States sponsoring attacks – China, North Korea, Iran, Russia
  • North Korea – Often after money
  • Russia, Iran – Concentrating on local adversaries
  • China
    • 1m personnel in Cyber Security
    • Get as much data as possible
  • Elections
    • Won’t be hacking into voting systems
    • Will be generating news, stories, content and targeting populations
  • Crime Operations
    • GenAI helps efficiency and Speed of attacks
    • Average breakout time has dropped from 10h in 2018 to 1h now
    • Members from around the world, at least one from Australia
    • Using ChatGPT to help out during intrusions to understand what they are seeing
    • Using ChatGPT to generate scripts

Consistent Eventually Replication Database by William Brown

  • Sites go down. Let’s have multiple sites for our database
  • CAP Theorem
  • PostgreSQL Database
    • Active Primary + Standby
    • Always Consistent
    • Promote passive to active in event of outage
    • Availability
    • But not partition tolerant
  • etcd
    • Nodes elect an active node which handles writes. If passive nodes go offline the others are still happy
    • If the active node fails then a new active node is elected and handles writes
    • Not available: a lone node will stop serving because it doesn’t know the state of the other nodes (dead or just unreachable)
  • Active Directory
    • If node disconnected then it will just keep serving old data
    • Reads and writes are always serviced, even if a node is out of contact with other nodes
    • Not consistent
  • Kanidm
    • identity management database
    • Want availability and partition tolerance
    • Because we want disconnected nodes to still handle reads and writes (eg for branch office that is off internet)
    • Also want to be able to scale very high, single node can’t handle all the writes
  • Building and Design
    • Simultaneous writes have to happen on multiple servers, what happens if writes overlap. Changes to same record on different servers
    • “What would Postgres do?”
    • Have nanosecond timestamps. Apply events nicely in order, only worry about conflicts. Use Lamport Clock (which only goes forward)
    • What happens if the timestamps match?
    • Servers get a uuid, timestamp gets uuid added to it so one server is slightly newer
    • Both servers can go through the process in isolation and get the same output database content
  • Lots more stuff but I got lost
    • Attribute State + CRDT
  • Most of your code will be handling the weird paths. And they must all be tested.
  • Complaint that academic papers are very hard to read. Difficult to translate into code.

Next Generation Authorisation – a developers guide to Cedar by Ricardo Sueiras

  • Authorisation is hard
  • Cedar
    • DSL around authorisation
    • Policy Language
    • Evaluation and Authorisation Engine
    • Easy to analyse
  • Authorisation Language

Managing the Madness of Cloud Logging by Alistair Chapman

  • The use case
  • All vendors put their logs in weird places and in weird sorts of ways. All differently
  • Different defaults for different events
  • Inconsistent event formats
  • Changes must be proactive – You have to turn logging on before you need it
  • Configuration isn’t static – Vendors can change the format around with little warning
  • Very easy to access the platform APIs from a VM.
  • Easy to get on a VM if you have access to the Cloud platform
  • Platform Security Tools
    • Has access to all logs and can correlate events
    • Doesn’t work well if you are not 100% using their product. ie Multi-cloud
    • Can cost a lot, requires agents to be deployed
  • Integrating with your own SIEM platform
    • Hard to push logs out to external sources sometimes
    • Can get all 3 into splunk, loki, elastic
    • You have to duplicate what the cloud provider has already done
  • Assess your requirements
    • How much do you need live correlation vs reviewing after something happened
    • Need to plan ahead
    • OCSF, OTel, ECS – Standards. Pick one and use it for everything
    • Try to log everything: audit events, performance metrics, billing
    • But obviously lots of logs cost lots of money
    • Make it actionable – Discoverability and correlation. Automation
  • Taming log Chaos
    • Learn from Incidents – What sort of thing happens, what did you need available
    • Test assumptions – eg How trusted is “internal”
    • Log your logging – How would you know it is not working
    • Document everything – Make it easier to detect deviations from norm
    • Have processes/standards for the teams generating the events (eg what tags to use)
  • Prioritise common mistakes
    • Opportunity for learning
    • Don’t forget to train the humans
  • Think Holistically
    • App security is more than just code
    • Automation and tooling will help but won’t solve everything
    • If you don’t have a security plan… Make one
  • Common problems
    • Devs will often post key to github
    • github has a feature to block common keys, must be enabled
  • Summary
    • The logs you gather must be actionable
    • Get familiar with the logs, and verify they actually work the way you think
    • Put the logs in one place if you can
    • Plan for the worst
    • Don’t let the logs overwhelm you. But don’t leave important events unlogged
    • The fewer platforms you use the easier it is

Everything Open 2024 – Day 1 talks

Developing in the open, building a product with our users by Toby Bellwood

  • The Lagoon Story
    • At amazee.io . Is Lagoon Lead
    • What is Lagoon
    • Application to Kubernetes (docker build for customer, converts to k8s)
    • Docker based
    • Based on git workflows. Mostly Drupal, WordPress, PHP and NodeJS apps
    • Presets for the extra stuff like monitoring etc
  • Why
    • Because developers are too busy to do all that extra stuff
    • and Ops would prefer it was all automated away (the right way)
  • 8 full-time team members
    • Knows a lot about application, not so much about the users (apart from Amazee.io)
    • Users: Hosting providers, Agencies, Developers
    • The Adopter: Someone using it for something else, weird use cases
    • Agencies: Need things to go out quickly, want automation, like documentation to be good. Often will need weird technologies because customers want that.
    • Developers: Just want it stable. Only worried about one project at a time. Often open-source minded
  • User Mindset
    • Building own tools using application
    • Do walking tours of the system, recorded zoom session
    • Use developer tools
    • Discord, Slack, Office Hours, Events, Easy Access to the team
  • Balance priorities
    • eg stuff customers will use even though Amazee won’t use it
  • Engaging Upstream
    • Try to be a good participant – what they would want their customers to be
    • Encourage our teams to “contribute first”. Usually works well
  • Empowering the Team
    • Contribute under your own name
    • Participate in communities
  • How to stay Open Source forever?
    • Widening the Core Contributor Group
    • Learn from others in the Community. But most companies are not open sourcing the main component of their business.
    • Unsuccessful CNCF Sandbox project

Presenting n3n – A simple Peer to Peer VPN by Hamish Coleman

  • How does it compare to other VPNs?
    • Peer to peer
    • NAT piercing
    • Not all packets need to go via the server
    • Distributed ethernet switch – gives extra features
    • Userspace except for tuntap driver which is pretty common
    • Low deployment requirements, easy to install in multiple environments
    • Relatively simple security, not super secure
  • History
    • Based off n2n (developed by the people who did ntop)
    • But they changed the license in October 2023
    • Decided to fork into a new project
    • First release of n3n in April 2024
  • Big change was they introduced a CLA (contributor licensing agreement)
  • CLAs have problems
    • Legal document
    • Needs real day, contributor hostile, asymmetry of power
    • Can lead to surprise relicensing
  • Alternatives to a CLA
  • Preserving Git history
    • Developer’s Certificate of Origin
    • Or it could be a CLA
  • Handling Changes
    • Don’t surprise your Volunteers
    • Don’t ignore your Volunteers
    • Do discuss with your Volunteers and bring them along
  • Alternatives
    • Wireguard – No NAT piercing
    • OpenVPN – Mostly client to server. Also too configurable
  • Why prefer
    • One simple access method (the speaker uses 4 different OSes)
    • A single access method
    • p2p avoids latency delays because local instances talk directly
  • Goals
    • Protocol compatibility with n2n
    • Don’t break user visible APIs
    • Incrementally clean and improve codebase
  • How it works now
    • Supernode – Central co-ordination point, public IP, Some access control, Last-resort for packet forwarding
    • Communities – Nodes join, form a virtual segment
  • IP addresses
    • Can just run a DHCP server inside the network
  • Design
    • Tries to create a full mesh of nodes
    • Multiple Supernodes for metadata
  • Added a few features from n2n
    • INI file, Help text, Tidied up the CLI options and reduced options
    • Tried to make the defaults work better
  • Built in web server
    • Status page, jsonRPC, Socket interfaces, Monitoring/Stats
  • Current State of fork
    • Still young. Another contributor
    • Only soft announced. Growing base of awareness
  • Plans
    • IPv6
    • Optimise encryption/compression
    • Improve packaging and submit to distros
    • Test coverage
    • Better NAT piercing
    • Continue improve config experience
    • Selectable tuntap drivers
    • Mobile phone support hoped for but probably some distance away
  • Speaker’s uses for software
    • Manage mother’s computer
    • Management interface for various servers around the world
    • LAN Gaming using Windows 98 machines
    • Connect back to home network to avoid region blocking
  • https://github.com/n42n/n3n

From the stone age to silicon: The Dwarf Axe guide to the evolution of technology by Steven Ellis

  • What is a “Dwarf Axe” ?
    • Snowflakes vs Dwarf Axes
    • It’s an axe that is handed down and consistently delivers a service
    • Both the head ( software ) and the handle ( hardware ) are maintained and upgraded separately. It is treated like the same platform even though it is quite different from what it was originally, but it delivers the same services
  • Keeps fairly similar services. Same box on an organisation diagram
  • Home IT
    • Phones handed down to family members. Often not getting security patches anymore
  • Enterprise IT
    • Systems kept long past their expected lifetime
    • Maintained via virtualisation
  • What is wrong with a Big Axe?
    • Too Big to Fail
    • Billion dollar projects fail.
  • Alternatives
    • Virtual Machines – Running on an Axe somewhere
    • Containers – Something big to orchestrate the containers
    • Microservices – Also needs orchestration
  • Redesign the Axe
    • The cloud – It’s just someone else’s Axe
  • Options
    • Everything as a service. 3rd party services
  • Re-use has an end-of-life
    • Modern hardware should have better (and longer) hardware support
  • Ephemeral Abstraction
    • Run anywhere
    • Scale out not up
    • Avoid single points of failure
    • Focus on the service (not the infra or the platform)
    • Use Open tools and approaches
  • Define your SOE
    • Not just your OS

Prometheus node_exporter crashed my server

I am in the middle of upgrading my home monitoring setup. I collect metrics via prometheus and query them with grafana. More details later but yesterday I ran into a little problem that crashed one of my computers.

Part of the prometheus ecosystem is node_exporter . This is a program that runs on every computer and exports cpu, ram, disk, network and other stats of the local machine back to prometheus.

One of my servers is a little HP Microserver gen7 I bought in late-2014 and installed Centos 7 on. It has a boot drive and 4 hard drives with data on it.

An HP Microserver gen7

I noticed this machine wasn’t showing up in the prometheus stats correctly. I logged in and checked, and the version of node_exporter was very old and formatting its data in an obsolete way. So I downloaded the latest version, copied it over the existing binary and restarted the service…

…and my server promptly crashes. So I reboot the server and it crashes a few seconds after the kernel starts.

Obviously the problem is with the new version of node_exporter. However node_exporter is set to start immediately after boot. So what I have to do is start Linux in “single user mode” ( which doesn’t run any services ), edit the file that starts node_exporter, and then reboot again to get the server up normally without it. I follow this guide for getting into single user mode.

After a bit of googling I come across node_exporter bug 903 ( “node_exporter creating ACPI Error with Kernel error log” ) which seems similar to what I was seeing. The main difference is that my machine crashed rather than just giving an error. I put that down to my machine running fairly old hardware, firmware and operating systems.

The problem seems to be a bug in HP’s hardware/firmware around some stats that the hardware exports. Since node_exporter is trying to get lots of stats from the hardware including temperature, cpu, clock and power usage it is hitting one of the dodgy interfaces and causing a crash.

The bug suggests disabling the “hwmon” check in node_exporter. I tried this but I was still getting a slightly different crash that looked like it was related to clock or cpu frequency. Rather than trying to trace further I disabled all the collectors and then enabled the ones I needed one by one until the stats I wanted were populated ( except for uptime, because it turns out the time stats via --collector.time were one thing that killed it ).

So I ended up with the following command line

node_exporter --collector.disable-defaults \
              --collector.filesystem \
              --collector.uname \
              --collector.vmstat \
              --collector.meminfo \
              --collector.loadavg \
              --collector.diskstats \
              --collector.cpu \
              --collector.netstat \
              --collector.netdev

which appears to work reliably.
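To sanity-check the cut-down exporter, hitting the metrics endpoint and looking for the collectors I care about is enough ( 9100 is node_exporter’s default port ):

curl -s localhost:9100/metrics | grep -E '^node_(cpu|memory|filesystem)' | head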


KVM Virtualisation on Ubuntu 22.04

I have been setting up a computer at home to act as a host for virtual machines. The machine is a recycled 10-year-old desktop with 4 cores, 32GB RAM and a 220GB SSD.

Growing the default disk

lvdisplay                                         # check the current logical volume size
lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv    # grow the LV into all free space in the VG
lvdisplay                                         # confirm the LV is now bigger
df -h
resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv       # grow the ext4 filesystem to fill the LV
df -h                                             # confirm the extra space is visible

Installing kvm and libvirt

Installing packages:

apt install qemu-kvm libvirt-bin virtinst bridge-utils cpu-checker 
apt install libvirt-daemon-system virtinst libvirt-clients bridge-utils cloud-utils

Setting up users and starting daemons

systemctl enable --now libvirtd
systemctl start libvirtd
systemctl status libvirtd
usermod -aG libvirt simon

Setting up Networking

I needed to put the instance on a static IP and then create a bridge so any VMs that were launched were on the same network as everything else at home.

I followed these articles

First remove the default networks created by KVM

~# virsh net-destroy default
Network default destroyed

~# virsh net-undefine default
Network default has been undefined

then run “ip add show” to check just the physical network is left

Back up and edit the file in /etc/netplan ( 00-installer-config.yaml in my case ) that has the config for the network.

Config created by installer:

# This is the network config written by 'subiquity'
network:
  ethernets:
    enp2s0:
      dhcp4: true
  version: 2

Replacement config:

network:
  ethernets:
    enp2s0:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      interfaces: [enp2s0]
      addresses: [192.168.1.27/24]
      routes:
        - to: default
          via: 192.168.1.254
      mtu: 1500
      nameservers:
        addresses: [122.56.237.1, 210.55.111.1]
      parameters:
        stp: true
        forward-delay: 4
      dhcp4: false
      dhcp6: false
  version: 2

Note: The format in the 20.04 doc is slightly out of date (for the default route). Corrected in my file and the following link.

I used yamllint to check the config and “netplan try” and “netplan apply” to update.

Now we can make KVM aware of this bridge. Create a scratch XML file called host-bridge.xml and insert the following:

<network>
  <name>host-bridge</name>
  <forward mode="bridge"/>
  <bridge name="br0"/>
</network>

Use the following commands to make that our default bridge for VMs:

virsh net-define host-bridge.xml
virsh net-start host-bridge
virsh net-autostart host-bridge

And then list the networks to confirm it is set to autostart:

$ virsh net-list --all
 Name          State    Autostart   Persistent
------------------------------------------------
 host-bridge   active   yes         yes

Booting a Virtual Machine

Now I want to create a Virtual machine image that I can base others I create off. I followed this guide:

Create Ubuntu 22.04 KVM Guest from a Cloud Image

First I downloaded the jammy-server-cloudimg-amd64.img from cloud-images.ubuntu.com. Note this is the one that doesn’t have “disk” in its name.

~# wget http://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

Then I grew the disk image to 10GB and copied it to where libvirt could see it.

~# qemu-img resize jammy-server-cloudimg-amd64.img +8G

~# cp jammy-server-cloudimg-amd64.img /var/lib/libvirt/images/jammy2204.img

Now I need to configure the image, especially with a user and password so I can log in. The way to do this is with cloud-init. This is a special file of commands to configure a booting virtual machine. The weird thing with KVM is that the file is on a virtual cdrom attached to the virtual machine.

First create the config

#cloud-config
system_info:
  default_user:
    name: simon
    home: /home/simon

password: hunter2
chpasswd: { expire: False }
hostname: ubuntu-22-cloud-image

# configure sshd to allow users logging in using password
# rather than just keys
ssh_pwauth: True

and save it as bootconf.txt . Then convert it to an iso and copy that to the images folder

~# cloud-localds bootconf.iso bootconf.txt
~# cp bootconf.iso /var/lib/libvirt/images/

~# virsh pool-refresh default

Now I run the program virt-manager locally. This is a graphical program that connects from my desktop over ssh to the KVM server.

I use virt manager to connect to the KVM server and create a new virtual machine

  • Machine Type should be “Ubuntu 22.04 LTS”
  • It should boot off the jammy2204.img disk
  • The bootconf.iso should be attached to the CDROM. But the machine does not need to boot off it.
  • Set networking to be “Virtual network ‘host-bridge’: Bridge Network”
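For reference, roughly the same machine can be defined from the command line with virt-install instead of the GUI. This is only a sketch; I used virt-manager, and the name, memory and CPU values here are placeholders:

virt-install \
  --name jammy-base \
  --memory 2048 --vcpus 2 \
  --os-variant ubuntu22.04 \
  --disk /var/lib/libvirt/images/jammy2204.img,device=disk,bus=virtio \
  --disk /var/lib/libvirt/images/bootconf.iso,device=cdrom \
  --network network=host-bridge \
  --import --noautoconsole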

Boot the machine and you should be able to log in to the console using the user:password you created in the cloud-config. You can then change passwords, update packages and otherwise configure the instance to your liking. Once you have finished you can shut down the machine.

To create a new VM you just need to clone the disk:

~# virsh vol-clone --pool default jammy2204.img newvm.img

and then create a new Virtual machine in virt-manager using the disk (no need for the iso since the disk has the correct passwords)


Moving my backups to restic

I’ve recently moved my home backups over to restic. I’m using restic to back up the /etc and /home folders on all machines, plus my website files and databases. Media files are backed up separately.

I have around 220 Gigabytes of data, about half of that is photos.

My Home setup

I currently have 4 regularly-used physical machines at home: two desktops, one laptop and a server. I also have a VPS hosted at Linode and a VM running on the home server. Everything is running Linux.

Existing Backup Setup

For at least 15 years I’ve been using rsnapshot for backups. rsnapshot works by keeping a local copy of the folders to be backed up. To update the local copy it uses rsync over ssh to pull down a copy from the remote machine. It then keeps multiple old versions of files by making a series of copies.

I’d end up with around 12 older versions of the filesystem (something like 5 daily, 4 weekly and 3 monthly) so I could recover files that had been deleted. To save space rsnapshot uses hard links so only one copy of a file is kept if the contents didn’t change.

I also backed up a copy to external hard drives regularly and kept one copy offsite.

The main problem with rsnapshot was that it was a little clunky. It took a long time to run because it copied and deleted a lot of files every time it ran. It is also difficult to exclude folders from being backed up, it is not compatible with any cloud-based filesystems, and it requires ssh keys that can log in to remote machines as root.

Getting started with restic

I started playing around with restic after seeing some recommendations online. As a single binary with a few commands it seemed a little simpler than other solutions. It has a push model, so it needs to be on each machine and it uploads from there to the repository.

Restic supports around a dozen storage backends for repositories. These include local filesystem, sftp and Amazon S3. When you create a repository via “restic init” it creates a simple file structure in most backends:
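Something like this ( from memory, exact details may vary with the restic version ):

$ ls -F my-restic-repo/
config  data/  index/  keys/  locks/  snapshots/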

You can then use simple commands like “restic backup /etc” to back up files there. The restic documentation site makes things pretty easy to follow.

Restic automatically encrypts backups and each server needs a key to read/write to its backups. However any key can see all files in a repository, even those belonging to other hosts.

Backup Strategy with Restic

I decided on the following strategy for my backups:

  • Make a daily copy of /etc, /home and other files for each machine
  • Keep 5 daily and 3 weekly copies
  • Have one copy of data on Backblaze B2
  • Have another copy on my home server
  • Export the copies on the home server to external disk regularly

Backblaze B2 is very similar to Amazon S3 and is supported directly by restic. It is however cheaper. Storage is 0.5 cents per gigabyte/month and downloads are 1 cent per gigabyte. In comparison AWS S3 One Zone Infrequent Access charges 1 cent per gigabyte/month for storage and 9 cents per gigabyte for downloads.

What                     Backblaze B2   AWS S3
Store 250 GB per month   $1.25          $2.50
Download 250 GB          $2.50          $22.50

AWS S3 Glacier is cheaper for storage but hard to work with and retrieval costs would be even higher.

Backblaze B2 is less reliable than S3 (they had an outage when I was testing) but this isn’t a big problem when I’m using them just for backups.

Setting up Backblaze B2

To setup B2 I went to the website and created an account. I would advise putting in your credit card once you finish initial testing as it will not let you add more than 10GB of data without one.

I then created a private bucket and changed the bucket’s lifecycle settings to only keep the last version.

I decided that for security I would have each server use a separate restic repository. This means that I use a bit of extra space, since restic can no longer deduplicate the files that are identical across most machines. I ended up using around 15% more.

For each machine I created a B2 application key and set it to have a namePrefix with the name of the machine. This means that each application key can only see files in its own folder.

On each machine I installed restic and then created an /etc/restic folder. I then added the file b2_env:

export B2_ACCOUNT_ID=000xxxx
export B2_ACCOUNT_KEY=K000yyyy
export RESTIC_PASSWORD=abcdefghi
export RESTIC_REPOSITORY=b2:restic-bucket:/hostname

You can now just run “restic init” and it should create an empty repository; check via the B2 web console to see.
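In practice that is just the following ( assuming the env file above ):

source /etc/restic/b2_env
restic init        # creates the (empty) repository in the B2 bucket
restic snapshots   # confirms access works; lists nothing yet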

I then had a simple script that runs:

source /etc/restic/b2_env

restic --limit-upload 2000 backup /home/simon --exclude-file /etc/restic/home_exclude

restic --limit-upload 2000 backup /etc /usr/local /var/lib /var/backups

restic --verbose forget --keep-last 5 --keep-daily 6 --keep-weekly 3

The “source” command loads in the api key and passwords.

The restic backup lines do the actual backup. I have restricted my upload speed to 2000 KiB/s ( roughly 16 Megabits/second ). The /etc/restic/home_exclude lists folders that shouldn’t be backed up. For this I have:

/home/simon/.cache
/home/simon/.config/Slack
/home/simon/.local/share/Trash
/home/simon/.dropbox-dist
/home/simon/Syncthing/audiobooks

as these are folders with regularly changing contents that I don’t need to backup.

The “restic forget” command removes older snapshots. I’m telling it to keep 6 daily copies and 3 weekly copies of my data, plus at least the most recent 5 no matter how old they are.

This command doesn’t actually free up the space taken up by the removed snapshots. I need to run the “restic prune” command for that. However according to this analysis the prune operation generates so many API calls and data transfers that the payback time on disk space saved can be months(!). So I only run the command approx once every 45 days. Here is the code for this:

prune_run() {
    echo "Running restic Prune"
    /usr/local/bin/restic prune --cleanup-cache --max-unused 20%
    echo " "
    touch /etc/restic/last_prune_b2
    echo "Updating restic if required"
    echo " "
    /usr/local/bin/restic self-update
}

prune_check() {
    if [[ ! -f /etc/restic/last_prune_b2 ]]; then
        touch -d "2 days ago" /etc/restic/last_prune_b2
    fi

    if [[ $(find /etc/restic/last_prune_b2 -mtime -30 -print) ]]; then
        echo "Last backup was less than 30 days ago so wont run prune"
        echo " "
    else
        echo "Chance of running prune is 1/30"
        RANDOM=$(date +%N | cut -b4-9)
        flip=$((1 + RANDOM %30))
        if [[ $flip  = 15 ]]; then
            prune_run
        fi
    fi
}

prune_check

Setting up sftp

As well as backing up to B2 I wanted to backup my data to my home server. In this case I decided to have a single repository shared by all the servers.

First of all I created a “restic” account on my server with a home of /home/restic. I then created a folder /media/backups/restic owned by the restic user.

I then followed this guide for sftp-only accounts to restrict the restic user. Relevant lines I changed were “Match User restic” and “ChrootDirectory /media/backups/restic”
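The relevant sshd_config stanza ends up looking roughly like this ( a sketch based on that guide; adjust paths to suit ):

# appended to /etc/ssh/sshd_config on the home server
cat >> /etc/ssh/sshd_config <<'EOF'
Match User restic
    ChrootDirectory /media/backups/restic
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no
EOF
systemctl restart sshd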

On each host I also needed to run “cp /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa ” and also add the host’s public ssh_key to /home/restic/.ssh/authorized_keys on the server.

Then it is just a case of creating a sftp_env file like in the b2 example above. Except this is a little shorter:

export RESTIC_REPOSITORY=sftp:restic@server.darkmere.gen.nz:shared
export RESTIC_PASSWORD=abcdefgh

For backing up my VPS I had to do another step, since it couldn’t push files to my home server. Instead I added a script that runs on the home server and uses rsync to copy down folders from the VPS to local disk. I used rrsync to restrict this script.

Once I had a local folder I ran “restic --host vps-name backup /copy-of-folder” to back up over sftp. The --host option made sure the backups were listed for the right machine.
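Put together, the home-server side looks roughly like this ( hostnames and paths are placeholders ):

# pull the folders down from the VPS (rrsync on the far end limits what this can touch)
rsync -a --delete vps.example.com:/etc/ /srv/vps-copy/etc/
rsync -a --delete vps.example.com:/home/ /srv/vps-copy/home/

# back up the local copy into the shared sftp repository under the VPS's name
source /etc/restic/sftp_env
restic --host vps-name backup /srv/vps-copy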

Since the restic folder is just a bunch of files, I’m copying it directly to an external disk which I keep outside the house.

Parting Thoughts

I’m fairly happy with restic so far. I haven’t run into too many problems or gotchas yet, although if you are starting out I’d suggest testing with a small repository to get used to the commands etc.

I have copies of keys in my password manager for recovery.

There are a few things I still have to do, including setting up some monitoring and deciding how often to run the prune operation.
