Moving my backups to restic

I’ve recently moved my home backups over to restic. I’m using restic to back up the /etc and /home folders on all machines, as well as my website files and databases. Media files are backed up separately.

I have around 220 gigabytes of data, about half of which is photos.

My Home setup

I currently have 4 regularly-used physical machines at home: two workstations, one laptop and one server. I also have a VPS hosted at Linode and a VM running on the home server. Everything is running Linux.

Existing Backup Setup

For at least 15 years I’ve been using rsnapshot for backup. rsnapshot works by keeping a local copy of the folders to be backed up. To update the local copy it uses rsync over ssh to pull down a copy from the remote machine. It then keeps multiple old versions of files by making a series of copies.

I’d end up with around 12 older versions of the filesystem (something like 5 daily, 4 weekly and 3 monthly) so I could recover files that had been deleted. To save space rsnapshot uses hard links so only one copy of a file is kept if the contents didn’t change.

I also backed up a copy to external hard drives regularly and kept one copy offsite.

The main problem with rsnapshot was that it was a little clunky. It took a long time to run because it copied and deleted a lot of files every time it ran. It is also difficult to exclude folders from being backed up, and it is not compatible with any cloud-based filesystems. It also requires ssh keys to log in to remote machines as root.

Getting started with restic

I started playing around with restic after seeing some recommendations online. As a single binary with a few commands it seemed a little simpler than other solutions. It has a push model, so it needs to be installed on each machine and it uploads from there to the archive.

Restic supports around a dozen storage backends for repositories. These include the local file system, sftp and Amazon S3. When you create an archive via “restic init” it creates a simple file structure for the repository in most backends.

You can then use simple commands like “restic backup /etc” to back up files to it. The restic documentation site makes things pretty easy to follow.

Restic automatically encrypts backups and each server needs a key to read/write to its backups. However, any key can see all files in a repository, even those belonging to other hosts.

Backup Strategy with Restic

I decided on the following strategy for my backups:

  • Make a daily copy of /etc and other files for each server
  • Keep 5 daily and 3 weekly copies
  • Have one copy of data on Backblaze B2
  • Have another copy on my home server
  • Export the copies on the home server to external disk regularly

Backblaze B2 is very similar to Amazon S3 and is supported directly by restic. It is however cheaper. Storage is 0.5 cents per gigabyte/month and downloads are 1 cent per gigabyte. In comparison AWS S3 One Zone Infrequent Access charges 1 cent per gigabyte/month for storage and 9 cents per gigabyte for downloads.

What                      Backblaze B2    AWS S3
Store 250 GB per month    $1.25           $2.50
Download 250 GB           $2.50           $22.50
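
As a rough check of the figures in the table, the arithmetic can be sketched in shell. The function name cost_usd is made up for illustration, and the per-gigabyte prices are simply the ones quoted above, which may well have changed since.

```shell
# cost_usd GB CENTS_PER_GB
# Hypothetical helper: prints the cost in US dollars of storing (for one
# month) or downloading GB gigabytes at a price given in cents per gigabyte.
cost_usd() {
    awk -v gb="$1" -v cents="$2" 'BEGIN { printf "%.2f\n", gb * cents / 100 }'
}

echo "B2 storage:  \$$(cost_usd 250 0.5)"
echo "B2 download: \$$(cost_usd 250 1)"
echo "S3 storage:  \$$(cost_usd 250 1)"
echo "S3 download: \$$(cost_usd 250 9)"
```

Swapping in a different data size or price shows quickly how much the download charge dominates if a full restore is ever needed.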

AWS S3 Glacier is cheaper for storage but hard to work with and retrieval costs would be even higher.

Backblaze B2 is less reliable than S3 (they had an outage when I was testing) but this isn’t a big problem when I’m using them just for backups.

Setting up Backblaze B2

To set up B2 I went to the website and created an account. I would advise putting in your credit card once you finish initial testing as it will not let you add more than 10GB of data without one.

I then created a private bucket and changed the bucket’s lifecycle settings to only keep the last version.

I decided that for security I would have each server use a separate restic repository. This means that I use a bit of extra space, since restic only deduplicates within a repository and so a file that is identical on several machines is stored once per machine. I ended up using around 15% more.

For each machine I created a B2 application key and set it to have a namePrefix with the name of the machine. This means that each application key can only see files in its own folder.

On each machine I installed restic and then created an /etc/restic folder. I then added the file b2_env:

export B2_ACCOUNT_ID=000xxxx
export B2_ACCOUNT_KEY=K000yyyy
export RESTIC_PASSWORD=abcdefghi
export RESTIC_REPOSITORY=b2:restic-bucket:/hostname

You can now just run “restic init” and it should create an empty repository. Check via b2 to see.

I then had a simple script that runs:

source /etc/restic/b2_env

restic --limit-upload 2000 backup /home/simon --exclude-file /etc/restic/home_exclude

restic --limit-upload 2000 backup /etc /usr/local /var/lib /var/backups

restic --verbose --keep-last 5 --keep-daily 6 --keep-weekly 3 forget

The “source” command loads in the api key and passwords.

The restic backup lines do the actual backup. I have restricted my upload speed with --limit-upload 2000, which restic reads as KiB/s (roughly 16 Megabits/second). The /etc/restic/home_exclude file lists folders that shouldn’t be backed up. For this I have:

/home/simon/.cache
/home/simon/.config/Slack
/home/simon/.local/share/Trash
/home/simon/.dropbox-dist
/home/simon/Syncthing/audiobooks

as these are folders with regularly changing contents that I don’t need to back up.
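
On the upload limit used in the script above: as far as I know, restic documents --limit-upload as a rate in KiB/s, so it is worth converting when thinking in megabits. A minimal sketch of the conversion, with helper names invented for illustration:

```shell
# kib_to_mbit KIB_PER_SEC
# Hypothetical helper: convert a rate in KiB/s to megabits per second.
kib_to_mbit() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k * 1024 * 8 / 1000000 }'
}

# mbit_to_kib MBIT_PER_SEC
# Hypothetical helper: convert megabits per second to whole KiB/s,
# rounding down, which is the form --limit-upload expects.
mbit_to_kib() {
    awk -v m="$1" 'BEGIN { printf "%d\n", int(m * 1000000 / 8 / 1024) }'
}

kib_to_mbit 2000   # the value used in the backup script
mbit_to_kib 20     # the value to pass for a 20 Mbit/s cap
```

Under that assumption, 2000 works out to about 16.4 Mbit/s, and a 20 Mbit/s cap would need a value of 2441.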

The “restic forget” command removes older snapshots. I’m telling it to keep 6 daily copies and 3 weekly copies of my data, plus at least the most recent 5 no matter how old they are.

This command doesn’t actually free up the space taken up by the removed snapshots. I need to run the “restic prune” command for that. However according to this analysis the prune operation generates so many API calls and data transfers that the payback time on disk space saved can be months(!). So for now I’m planning to run the command only occasionally (probably every few months, depending on testing).

Setting up sftp

As well as backing up to B2 I wanted to back up my data to my home server. In this case I decided to have a single repository shared by all the servers.

First of all I created a “restic” account on my server with a home of /home/restic. I then created a folder /media/backups/restic owned by the restic user.

I then followed this guide for sftp-only accounts to restrict the restic user. Relevant lines I changed were “Match User restic” and “ChrootDirectory /media/backups/restic”.

On each host I also needed to run “cp /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa” and also add the host’s public ssh key to /home/restic/.ssh/authorized_keys on the server.

Then it is just a case of creating an sftp_env file like in the b2 example above, except this one is a little shorter:

export RESTIC_REPOSITORY=sftp:restic@server.darkmere.gen.nz:shared
export RESTIC_PASSWORD=abcdefgh

For backing up my VPS I had to do another step since this couldn’t push files to my home. What I did instead was add a script that ran on the home server and used rsync to copy down folders from my VPS to a local folder. I used rrsync to restrict this script.

Once I had a local folder I ran “restic --host vps-name backup /copy-of-folder” to back up over sftp. The --host option made sure the backups were listed for the right machine.
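
The pull-then-backup flow for the VPS could be sketched roughly as below. This is only an illustration under assumptions: the function names, the DRY_RUN variable and the choice of /etc as the folder to copy are all hypothetical, and the real script (restricted via rrsync) will differ.

```shell
# run CMD [ARGS...]
# Hypothetical helper: print the command instead of executing it when
# DRY_RUN is set, so the flow can be checked without touching anything.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# backup_vps VPS_HOST LOCAL_COPY
# Pull a folder from the VPS to the home server, then back up the local
# copy while recording the snapshot under the VPS's hostname.
backup_vps() {
    vps_host="$1"
    local_copy="$2"
    # Assumption: /etc stands in for whichever folders are being copied.
    run rsync -a --delete "${vps_host}:/etc/" "${local_copy}/etc/"
    # --host makes the snapshot show up under the VPS, not the home server.
    run restic --host "$vps_host" backup "$local_copy"
}

DRY_RUN=1 backup_vps vps-name /copy-of-folder
```

With DRY_RUN set, this only prints the rsync and restic commands it would run. The sftp_env file would need to be sourced first for a real run.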

Since the restic folder is just a bunch of files, I’m copying it directly to an external disk which I keep outside the house.

Parting Thoughts

I’m fairly happy with restic so far. I have not run into too many problems or gotchas yet, although if you are starting up I’d suggest testing with a small repository to get used to the commands etc.

I have copies of keys in my password manager for recovery.

There are a few things I still have to do, including setting up some monitoring and deciding how often to run the prune operation.

Donations 2020

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

I also blog about it to hopefully inspire others. See: 2019, 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

My main donation was $750 to Givewell (to allocate to projects as they prioritize). Once again I’m happy that Givewell make efficient use of money donated. I decided this year to give a higher proportion of my giving to them than last year.

Software and Internet Infrastructure Projects

€20 to Syncthing which I’ve started to use instead of Dropbox.

$50 each to the Software Freedom Conservancy and Software in the Public Interest. Money not attached to any specific project.

$51 to the Internet Archive

$25 to Let’s Encrypt

Advocacy Organisations

$50 to the Electronic Frontier Foundation

Others including content creators

I donated $103 to Signum University to cover Corey Olsen’s Exploring the Lord of the Rings series plus other stuff I listen to that they put out.

I paid $100 to be a supporter of NZ News site The Spinoff

I also supported a number of creators on Patreon:

Talks from KubeCon + CloudNativeCon Europe 2020 – Part 1

Various talks I watched from their YouTube playlist.

Application Autoscaling Made Easy With Kubernetes Event-Driven Autoscaling (KEDA) – Tom Kerkhove

I’ve been using KEDA a little bit at work. It is a good way to scale on arbitrary metrics. At work I’m scaling pods against the length of AWS SQS queues and on a cron schedule. Lots of other options. This talk is a 9 minute intro. The small font on the screen in this talk is a bit hard to read.

Autoscaling at Scale: How We Manage Capacity @ Zalando – Mikkel Larsen, Zalando SE

  • These guys have their own HPA replacement for scaling: kube-metrics-adapter.
  • Outlines some new stuff in scaling in 1.18 and 1.19.
  • They also have a fork of the Cluster Autoscaler (although some of what it does seems to duplicate Amazon Fleets).
  • Have up to 1000 nodes in some of their clusters. Have to play with address space per node, also scale their control plane nodes vertically (control plane autoscaler).
  • Use Vertical Pod Autoscaler, especially for things like Prometheus whose load varies with the size of the cluster. Have had problems with it scaling down too fast. They have some of their own custom changes in a fork.

Keynote: Observing Kubernetes Without Losing Your Mind – Vicki Cheung

  • Lots of metrics don’t cover what you want and get hard to maintain and complex
  • Monitor core user workflows (ie just test a pod launch and stop)
  • Tiny tools
    • 1 watches for events on cluster and logs them -> elastic
    • 2 watches container events -> elastic
    • End up with one timeline for a deploy/job covering everything
    • Empowers users to do their own debugging

Autoscaling and Cost Optimization on Kubernetes: From 0 to 100 – Guy Templeton & Jiaxin Shan

  • Intro to HPA and metric types. Plus some of the newer stuff like multiple metrics
  • Vertical pod autoscaler. Good for single pod deployments. Doesn’t work well with JVM based workloads.
  • Cluster Autoscaler.
    • A few things like using prestop hooks to give pods time to shutdown
    • pod priorities for scaling.
    • --expendable-pods-priority-cutoff to not expand for low-priority jobs
    • Using the priority-expander to try and expand spots first and then fallback to more expensive node types
    • Using mixed instance policy with AWS . Lots of instance types (same CPU/RAM though) to choose from.
    • Look at PodDisruptionBudget
    • Some other CA flags like scale-down-utilization-threshold to look at.
  • Mention of Keda
  • Best return is probably tuning HPA
  • There is also another similar talk. Note the male speaker talks very slowly, so crank up the speed.

Keynote: Building a Service Mesh From Scratch – The Pinterest Story – Derek Argueta

  • Changed to Envoy as a http proxy for incoming
  • Wrote own extension to make feature complete
  • Also another project migrating to mTLS
    • Huge amount of work for Java.
    • Lots of work to repeat for other languages
    • Looked at getting Envoy to do the work
    • Ingress LB -> Inbound Proxy -> App
  • Used j2 to build the Static config (with checking, tests, validation)
  • Rolled out to put envoy in front of other services with good TLS termination default settings
  • Extra Mesh Use Cases
    • Infrastructure-specific routing
    • SLI Monitoring
    • http cookie monitoring
  • Became a platform that people wanted to use.
  • Solving one problem first and incrementally using other things. Many groups had similar problems. “Just a node in a mesh”.

Improving the Performance of Your Kubernetes Cluster – Priya Wadhwa, Google

  • Tools – Mostly tested locally with Minikube (she is a Minikube maintainer)
  • Minikube pause – Pause the Kubernetes systems processes and leave app running, good if cluster isn’t changing.
  • Looked at some articles from Brendan Gregg
  • Ran USE Method against Minikube
  • eBPF BCC tools against Minikube
  • biosnoop – noticed lots of writes from etcd
  • KVM Flamegraph – Lots of calls from ioctl
  • Theory that etcd writes might be a big contributor
  • How to tune etcd writes (updated --snapshot-count flag to various numbers but didn’t seem to help)
  • Noticed CPU spikes every few seconds
  • “pidstat 1 60”. Noticed kubectl command running often. Running “kubectl apply addons” regularly
  • Suspected addon manager running often
  • Could increase addon manager polltime but then addons would take a while to show up.
  • But in Minikube this is not a problem, because Minikube knows when new addons are added so it can run the addon manager directly rather than having it poll.
  • 32% reduction in overhead from turning off addon polling
  • Also reduced coredns number to one.
  • pprof – go tool
  • kube-apiserver pprof data
  • Spending lots of time dealing with incoming requests
  • Lots of requests from kube-controller-manager and kube-scheduler around leader-election
  • But Minikube is only running one of each. No need to elect a leader!
  • Flag to turn both off: --leader-elect=false
  • 18% reduction from reducing coredns to 1 and turning leader election off.
  • Back to looking at etcd overhead with pprof
  • writeFrameAsync in http calls
  • Theory: could increase --proxy-refresh-interval from 30s up to 120s. Good value at 70s but unsure what behavior was. Asked and didn’t appear to be a big problem.
  • 4% reduction in overhead

Linkedin putting pressure on users to enable location tracking

I got this email from Linkedin this morning. It is telling me that they are going to change my location from “Auckland, New Zealand” to “Auckland, Auckland, New Zealand“.

Email from Linkedin on 30 August 2020

Since “Auckland, Auckland, New Zealand” sounds stupid to New Zealanders (Auckland is pretty much a big city with a single job market and is not a state or similar) I clicked on the link and opened the application to stick with what I currently have.

Except the problem is that the pulldown doesn’t offer any other locations.

The only way to change the location is to click “use Current Location” and then allow Linkedin to access my device’s location.

According to the help page:

By default, the location on your profile will be suggested based on the postal code you provided in the past, either when you set up your profile or last edited your location. However, you can manually update the location on your LinkedIn profile to display a different location.

but it appears the manual method is disabled. I am guessing they have a fixed list of locations in my postcode and this can’t be changed.

So it appears that my options are to accept Linkedin’s crappy name for my location (Other NZers have posted problems with their location naming) or to allow Linkedin to spy on my location and it’ll probably still assign the same dumb name.

This basically appears to be a way for Linkedin to push users to enable location tracking. At the same time they get to force their own ideas of how New Zealand locations work on users.

Sidewalk Delivery Robots: An Introduction to Technology and Vendors

At the start of 2011 Uber was in one city (San Francisco). Just 3 years later it was in hundreds of cities worldwide including Auckland and Wellington. Dockless Electric Scooters took only a year from their first launch to reach New Zealand. In both cases the quick rollout in cities left the public, competitors and regulators scrambling to adapt.

Delivery Robots could be the next major wave to rollout worldwide and disrupt existing industries. Like driverless cars these are being worked on by several companies but unlike driverless cars they are delivering real packages for real users in several cities already.

Note: I plan to cover other aspects of Sidewalk Delivery Robots including their impact on society in a followup article.

What are Delivery Robots?

Delivery Robots are driverless vehicles/drones that cover the last mile. They are loaded with a cargo and then will go to a final destination where they are unloaded by the customer.

Indoor Robots are designed to operate within a building. An example of these is The Keenon Peanut. These deliver items to guests in hotels or restaurants. They allow delivery companies to leave food and other items with the robot at the entrance/lobby of a building rather than going all the way to a customer’s room or apartment.

Keenon Peanut

Flying Delivery Drones are being tested by several companies. Wing, which is owned by Google’s parent company Alphabet, is testing in Canberra, Australia. Amazon also had a product called Amazon Prime Air which appears to have been shelved.

Wing Flying Robot

The next size up are sidewalk delivery robots which I’ll be concentrating on in the article. Best known of these is Starship Technologies but there is also Kiwi and Amazon Scout. These are designed to drive at slow speeds on the footpaths rather than mix with cars and other vehicles on the road. They cross roads at standard crossings.

KiwiBot
Starship Delivery Robot
Amazon Scout

Finally some companies are rolling out Car sized Delivery Robots designed to drive on roads and mix with normal vehicles. The REV-1 from Refraction AI is at the smaller end, with company videos showing it using both car and bike lanes. Larger is the Small-Car sized Nuro.

REV-1
Nuro

Sidewalk Delivery Robots

I’ll concentrate most on Sidewalk Delivery Robots in this article because I believe they are the most mature and likeliest to have an effect on society in the short term (next 2-10 years).

  • In-building bots are a fairly niche product that most people won’t interact with regularly.
  • Flying Drones are close to working but it seems it will be some time before they can function safely and autonomously in a built-up environment. Cargo capacity is currently limited in most models and larger units will bring new problems.
  • Car (or motorbike) sized bots have the same problems as driverless cars. They have to drive fast and be fully autonomous in all sorts of road conditions. There is no time to ask for human help: a stopped vehicle on the road will at best block traffic and at worst be involved in an accident. These stringent requirements mean widespread deployment is probably at least 10 years away.

Sidewalk bots are much further along in their evolution and they have simpler problems to solve.

  • A small vehicle that can carry a takeaway or small grocery order is buildable using today’s technology and not too expensive.
  • Footpaths exist most places they need to go.
  • Walking pace (up to 6km/h) is fast enough to be good enough even for hot food.
  • Ubiquitous wireless connectivity enables the robots to be controlled remotely if they cannot handle a situation automatically.
  • Everything unfolds slower on the sidewalk. If a sidewalk bot encounters a problem it can just slow to a stop and wait for remote help. If that process takes 20 seconds then it is usually no problem.

Starship Technologies

Starship are the best known and most advanced vendor in the sector. They launched in 2015 and have a good publicity team.

In late 2019 Starship announced a major rollout to US university campuses “with their abundance of walking paths, well-defined boundaries, and smartphone-using, delivery-minded student bodies”. Campuses include The University of Mississippi and Bowling Green State University.

The push into college campuses was unluckily timed, with many being closed in 2020 due to Covid-19. Starship has increased delivery areas outside of campus in some places to try and compensate. It has also seen a doubling of demand in Milton Keynes. However the company laid off some workers in March 2020.

Kiwibot

Kiwibot is one of the few other companies that has gone beyond the prototype stage to servicing actual customers. It is some way behind Starship with the robots being less autonomous and needing more onsite helpers.

  • Based in Colombia with a major deployment in Berkeley, California around the UCB campus area
  • Robots cost $US 3,500 each
  • Smaller than Starship with just 1 cubic foot of capacity. Range and speed reportedly lower
  • Guided by remote control using way-points by operators in Medellín, Colombia. Each operator can control up to 3 bots.
  • On-site operators in Berkeley maintain robots (and rescue them when they get stuck).
  • Some orders delivered wholly or partially by humans
  • Concentrating on the Restaurant delivery market
  • Costs for the Business
    • Lease/rent starts at $20/day per robot
    • Order capacity 6-18/day per Robot depending on demand & distance.
    • Order fees are $1.99/order with 1-4 Kiwibots leased
    • or $0.99/order if you have 5-10 Kiwibots leased
  • Website, Kiwibot Campus Vision Video , Kiwibot end-2019 post

An interesting feature is that Kiwibot publish their prices for businesses and provide a calculator with which you can calculate break-even points for robot delivery.
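
As a rough illustration of that kind of calculation (this is not Kiwibot’s own calculator), the all-in cost per order for a single leased robot can be sketched from the prices listed above. The function name is hypothetical, and the sketch assumes the $20/day lease and the $1.99/order fee for the 1-4 robot tier.

```shell
# cost_per_order ORDERS_PER_DAY LEASE_PER_DAY FEE_PER_ORDER
# Hypothetical helper: spreads the daily lease across the day's orders
# and adds the per-order fee, giving the business's cost per delivery.
cost_per_order() {
    awk -v n="$1" -v lease="$2" -v fee="$3" \
        'BEGIN { printf "%.2f\n", (lease + n * fee) / n }'
}

cost_per_order 6 20 1.99    # quiet day, low end of quoted capacity
cost_per_order 18 20 1.99   # busy day, high end of quoted capacity
```

On these assumptions the cost works out to around $5.32 per order at 6 orders a day and around $3.10 at 18, so how busy the robot is kept matters a lot.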

As with Starship, Kiwibot was hit by Covid-19 closing College campuses. In July 2020 they announced a rollout in the city of San Jose, California in partnership with Shopify and Ordermark. The company is trying to pivot towards just building the robot infrastructure and partnering with companies that already have that [marketplace] in mind. They are also trying to crowdfund for investment money.

Amazon Scout

Amazon are only slowly rolling out their Scout Robots. The Scout is similar in appearance to the Starship vehicle but is larger.

  • Announced in January 2019
  • Weight “about 100 pounds” (about 45 kg). No further specs available.
  • A video of the development team at Amazon
  • Initially delivering to Snohomish County, Washington near Amazon HQ
  • Added Irvine, California in August 2019 but still supervised by a human
  • In July 2020 announced rollouts in Atlanta, Georgia and Franklin, Tennessee, but these will still “initially be accompanied by an Amazon Scout Ambassador”.

Other Companies

There are several other companies also working on Sidewalk Delivery Robots. The most advanced is Restaurant Delivery Company Postmates (now owned by Uber), which has its own robot called Serve that is in early testing. Video of it on the street.

Several other companies have also announced projects. None appear to be near rolling out to live customers though.

Business Model and Markets

At present Starship and Kiwibot are mainly targeting the restaurant delivery market against established players such as Uber Eats. Reasons for going for this market include

  • Established market, not something new
  • Short distances and small cargo
  • Customers unload the product quickly so no waiting around
  • Charges by existing players quite high. Ballpark costs of $5 to the customer (plus a tip in some countries) and the restaurant also being charged 30% of the bill
  • Even with the high charges drivers end up making only around minimum wage.
  • The current business model is only just working. While customers find it convenient and the delivery cost reasonable, restaurants and drivers are struggling to make money.

Starship and Amazon are also targeting the general delivery market. This requires higher capacity and also customers may not be home when the vehicle arrives. However it may be the case that if vehicles are cheap enough they could just wait till the customer gets home.

Still more to cover

This article is just a quick introduction to the Sidewalk Delivery Robots out there. I hope to do a later post covering more, including what the technology will mean for the delivery industry and for other sidewalk users as well as society in general.

Linux.conf.au 2020 – Friday – Lightning Talks and Close

Steve

  • Less opportunity for Intern type stuff
  • Trying to build team with young people
  • Internships
  • They Need opportunities
  • Think about giving a chance

Martin

  • Secure Scuttlebutt
  • p2p social web
  • more like just a protocol
  • scuttlebutt.nz
  • Protocol used for other stuff.

Emma

  • LCA from my perspective

Mike Bailey

  • Pipe-skimming
  • Enhancing UI of CLI tools
  • takes first arg in pipe and sends to the next tool

Aleks

  • YOGA Book c930
  • Laptop with e-ink display for keyboard
  • Used wireshark to look at USB under Windows
  • Created a device driver based on packets windows was sending
  • Linux recognised it as a USB Keyboard and just works
  • Added new feature and
  • github.com/aleksb

Evan

  • Two factor authentication
  • It’s hard

Keith

  • Snekboard
  • Crowdsourced hardware project
  • crowdsupply.com/keith-packard/snekboard
  • $79 campaign, ends 1 March

Adam and Ben

  • idntfrs
  • bytes are not expensive any more

William

  • Root cause of swiss cheese

Colin

  • OWASP
  • For every person they taught about a vulnerability, 2 people appeared to write vulnerable code
  • WebGoat
  • Holds your hand through the OWASP vulnerability list. Exploit and fix
  • teaching, playing to break, go back and fix
  • Forks in various languages

Leigh

  • Masculinity
  • Leave it better than you found it

David

  • Fixing NAT
  • with more NAT

Caitlin

  • Glitter!
  • conferences should be playful
  • meetups can be friendly
  • Ways to introduce job
  • Stickers

Miles

  • Lies, Damn lies and data science
  • Hipster statistics
  • LCA 2021 is in Canberra

Linux.conf.au 2020 – Friday – Session 1 – Protocols / LumoSQL

The Fight to Keep the Watchers at Bay – Mark Nottingham

Disclaimer: I am not a security person, But in some sense we are all security people.

Why Secure the Internet

  • In the beginning it was just researchers and academics
  • Snowden was a watershed moment
  • STRINT Workshop in 2014
  • It’s not just your website, it’s the Javascript that somebody is injecting in front of it.

What has happened so far?

  • http -> https
    • In 2010 even major services, demo of firesheep program to grab cookies and auth off Wifi
    • Injecting cookies in http flows
    • Needed to shift needle to https
    • http/2 had a big push to make it encrypted-only. It isn’t actually, though browsers only support it over https.
    • “Secure Contexts” cool features only https
  • Problem: Mixed Content
    • “Upgrading Insecure Requests” allow ad-hoc by pages
    • HTTPs is slow – istlsfastyet.com
    • Improvement in speed of implementations
    • Let’s Encrypt
  • Around 85-90% https as of Early 2020
  • Some people were unhappy
    • Slow Satellite internet said they needed middle boxes to optimise http over slow links
    • People who did http shared caching
  • TLS 1.2 -> TLS 1.3
    • Complex old protocol
    • Implementation monoculture
    • Outdated Crypto
    • TLS 1.3
      • Simplify where possible
      • encrypt most of handshake
      • get good review of protocol
      • At around 30%
      • Lots of implementations
    • Some unhappy. Financial institutions needed to sniff secure transactions (and had bought expensive appliances to do this)
      • They ended up forking their own protocol
  • TCP -> QUIC
    • TCP is unencrypted, lots of leaks and room for in-betweens to play around
    • QUIC – all encrypted
    • Spin Bit – single bit of data can be used by providers to estimate packet loss and delay.
  • DNS -> DOH
    • Lots of click data sold by ISPs
    • Countries hijacking DNS to block stuff
    • DNS over https can be co-located with a popular website
    • Some were unhappy
      • Lots of pushback from governments and big companies
      • Industry unhappy about concentration of DNS handling
      • Have to decide who to trust
  • SNI -> Encrypted SNI
    • Work in progress, very complex
    • South Korea unhappy, was using it to block people
  • Traffic Analysis
    • Packet length, frequency, destinations
    • TOR hard to tell. Looking at using multiplexing and fixed-length records
  • But the ends
    • Customer compromised or provider compromised (or otherwise sharing data)
  • Observations
    • Cost and Control
      • Cost: Big technology spends now obsolete
      • Control: some people want to do stuff on the network
    • We have to design the Internet to the pessimistic case
    • You can’t expose application data to the path anymore
    • Well-defined interfaces and counterbalanced roles
    • Technology and Policy need to work together and keep each other in check
    • Making some people unhappy means you need some guiding principles

LumoSQL – updating SQLite for the modern age – Dan Shearer

LumoSQL = SQLite + LMDB – WAL

SQLite

  • “Is a replacement for fopen()”
  • Key/Value stores.
    • Everyone used Sleepycat BDB – bought by Oracle and license changed
    • Many switched to LMDB (approx 2010)
  • Howard Chu 2013 SQLightning faster than SQLite but changes not adopted into SQLite

LumoSQL

  • Funded by NLNet Foundation
  • Dan Shearer and Keith Maxwell

What isn’t working with SQLite ?

  • Inappropriate/unsupported use cases
  • Speed
  • Corruption
  • Encryption

What has been done so far

  • Located code, started on github.com/LumoSQL
  • Benchmarking tool for versions matrix
  • Mapped out how the key/value store works
    • So different backend can be dropped in.
  • Fixed bugs with the port and with lmdb

What’s Next

  • First Release Feb 2020
  • Add Multiple backends
  • Implement two database advances

Linux.conf.au 2020 – Thursday – Session 3 – Software Freedom lost / Stream Processing

Open Source Won, but Software Freedom Hasn’t Yet: A Guide & Commiseration Session for FOSS activists by Bradley M. Kuhn, Karen Sandler

Larger Events elsewhere tend to be corporate sponsored so probably wouldn’t accept a talk like this

Free Software Purists

  • About 2/3 of the audience spent some time going out of their way using free software
  • A few years ago you could only use free software
  • To watch TV. I can use DRM or I can pirate. Both are problems.
  • The web is a very efficient way to install proprietary software (javascript) on your browser
  • Most people don’t even see that or think about it

Laptops

  • 2010-era Laptops are some of the last that are fully free-software
  • Later have firmware and other stuff that is all closed.
  • HTC Dream – some firmware on phone bit but rest was free software

Electronic Coupons

  • Coupons are all Digital. You need to run an app that tracks all your purchases
  • “As a Karen I sometimes ask the store to just let me have the coupon, even though it is expired”
  • Couldn’t install Disneyland App on older phones. So unable to bypass lines etc.

Proprietary dumping ground

  • Bradley had a device. Installed all the proprietary apps on it rather than his main phone
  • But it’s a bad idea since all the tracking stuff can talk to each other.

Hypocrisy of traditional free software advocacy

  • Do not criticise people for using proprietary software
  • It is almost impossible to live your life without using it
  • It should be an aspirational goal
  • Person should not be seen as a failure if they use it
  • Asking others to use it instead is worse than using it yourself
  • Karen’s Laptop: It runs Debian but it is only “98% free”

Paradox: The more FOSS there is, the less software freedom we actually have in our technology

  • There is less software freedom now than there was in 2006
  • Because everything is computerized, a lot more than 15 years ago.
  • More of what goes into Linux is what big companies want in datacentres rather than what tinkerers want in their homes.

What are the right choices?

  • Be mindful
  • Try when you can to use free software. Make small choices that support software freedom
  • Shine a light on the problem
  • Don’t let the shame you feel about using proprietary software paralyze you
  • and don’t let the problems we face overwhelm you into inaction
  • Re-prioritize your FOSS development time.
    • Is it going to give more people freedom in the world?
    • Maybe try to do a bit in your free time.
  • Support each other
  • FAIF.us podcast

Advanced Stream Processing on the Edge by Eduardo Silva

Data is everywhere. We need to be able to extract value from it

  • Put it all in a database to extract value
  • Challenge: Data comes from all sorts of places
    • More data -> more bandwidth -> more resource required
    • Delays as more data ingested
  • Challenge: lots of different formats

Ideal Tool

  • Collect from different sources
  • convert unstructured to structured
  • enrichment and filtering
  • multiple destinations like database or cloud services

Fluentbit

  • Started in 2015
  • Origins: a lightweight log processor for the embedded space
  • Ended up being used in cloud space
  • Written in C
  • Low mem and CPU
  • Pluggable architecture
  • input -> parser -> filter -> buffer -> routing -> output

Structure Messages

  • Unstructured to structured
  • Metadata
  • Can add tags to data on input, use them later for routing
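The parse-then-tag idea can be sketched in plain Python. This is not Fluent Bit's API or configuration; the log format, the regex, the tag names and the `route` table are all made up for illustration.

```python
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse(line, tag):
    """Turn one unstructured line into a structured record carrying a tag."""
    match = LINE_RE.match(line)
    if match is None:
        # Keep unparseable lines rather than silently dropping them.
        return {"tag": tag, "message": line}
    record = match.groupdict()
    record["tag"] = tag
    return record


def route(record, routes):
    """Return the output names whose tag prefix matches the record's tag."""
    return [out for prefix, out in routes if record["tag"].startswith(prefix)]


record = parse("2020-01-16T10:00:00 ERROR disk full", tag="app.web")
outputs = route(record, [("app.", "database"), ("sys.", "cloud")])
```

The tag is attached once at input time, so the routing step never has to look inside the message itself.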

Stream processing

  • Perform processing while the data is still in motion
  • Faster data processing
  • in Memory
  • No tables
  • No indexing
  • Receive structured data, expose a query language
  • Normally done centrally

Doing this on the edge

  • Offload computation from servers to data collectors
  • Only sends required data to the cloud
  • Use a SQL-like language to write the queries
  • Integrated with fluent core

Functions

  • Aggregation functions
  • Time functions
  • Timeseries functions
  • You can also write functions in Lua
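As a rough illustration of aggregating while the data is still in motion, here is a minimal Python sketch of a tumbling-window average. It does not use Fluent Bit's SQL-like query language; the record fields `ts` and `value` and the function name are assumptions for the example.

```python
def tumbling_average(records, window_seconds, key="value"):
    """Average `key` per fixed-size time window as records stream past.

    `records` is any iterable of dicts with a numeric "ts" (seconds) field,
    assumed to arrive in timestamp order. Only the running total for the
    current window is kept in memory: no tables, no indexes.
    """
    current_window = None
    total = 0.0
    count = 0
    for record in records:
        window = int(record["ts"] // window_seconds)
        if current_window is not None and window != current_window:
            # Window closed: emit one summary instead of every raw record.
            yield {"window_start": current_window * window_seconds,
                   "avg": total / count, "count": count}
            total, count = 0.0, 0
        current_window = window
        total += record[key]
        count += 1
    if count:
        # Flush the final, still-open window.
        yield {"window_start": current_window * window_seconds,
               "avg": total / count, "count": count}
```

Run on the collector, something like this would send one summary per window upstream rather than every raw record, which is the bandwidth saving the talk describes.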

Also exposes Prometheus-type metrics


Linux.conf.au 2020 – Thursday – Session 2 – Origins of X / Aerial Photography

The History of X: Lessons for Software Freedom – Keith Packard

1984 – The Origins of X

  • Everything proprietary
  • Brian Reid and Paul Asente: V Kernel -> VGTS -> W window system
    • Ported to VAXstation 100 at Stanford
    • 68k processor, 128k of VRAM
    • B&W
  • Bob Scheifler started hacking W -> X
  • Ported to Unix, made more Unix-friendly (async), renamed X

Unix Workstation Market

  • Unix was closed source
  • Vendor Unix based on BSD 4.x
  • Sun, HP, Digital, Apollo, Tektronix, IBM
  • this was when the configure program happened
  • VAXstation II
    • Color graphics 8bit accelerated
  • Sun 3/60
    • CPU drew everything on the screen

Early Unix Window System – 85-86

  • SunView dominates (actual commercial apps, desktop widgets)
  • Digital VMS/US
  • Apollo had Domain
  • Tektronix demonstrated SmallTalk
  • all only ran on their own hardware

X1 – X6

  • non-free software
  • Used Internally at MIT
  • Shared with friends informally

X10 – approx 1986

  • Almost usable
  • Ported to various workstations
  • Distribution was not all free software (had bin blobs)
    • Sun port relied on SunView kernel API
    • Digital provided binary rendering code
    • IBM PC/RT Support completed in source form

Why X11 ?

  • X10 had warts
  • rendering model was pretty terrible
  • External Windows manager without borders
  • Other vendors wanted to get involved
    • Jim Gettys and Smokey Wallace
    • Write X11, release under liberal terms
    • Working against Sun
    • Displace Sunview
    • “Reset the market”
    • Digital management agreed

X11 Development 1986-87

  • Protocol designed by a cross-org team
  • Sample implementation done mostly at DEC WRL, in collaboration with people at MIT
  • Internet not functional enough to properly collaborate, done via mail
    • Thus most of it happened at MIT

MIT X Consortium

  • Hired dev team at MIT
  • Funded by consortium
  • Members also voted on standards
    • Members stopped their own development
    • Stopped collaboration with non-members
  • We knew Richard too well – The GPL’s worst sponsor
  • Corp sponsors dedicated to non-free software

X Consortium Standards

  • XIE – X Imaging Extensions
  • PIX – Phigs Extension for X
  • LBX – Low Bandwidth X
  • Xinput (version 1)

The workstation vendors were trying to differentiate. They wanted a minimal base to build their stuff on. The standard was frozen for around 15 years. That is why X fell behind other environments as hardware changed.

X11, NeWS and PostScript

  • NeWS – Very slow but cool
  • Adobe adapted PostScript interpreter for window systems – Closed Source
  • Merged X11/NeWS server – Closed Source

The Free Unix Desktop

  • All the toolkits were closed source
  • Sunview -> XView
  • OpenView – Xt based toolkit

X Stagnates – ~1992

  • Core protocol not allowed to change
  • non-members pushed out
  • market fragments

Collapse of Unix

  • The Decade of Windows

Opening a treasure trove: The Historical Aerial Photography project by Paul Haesler

  • Geoscience Australia has inherited an extensive archive of historical photography
  • 1.2 million images from 1920 – 1990s
  • Full coverage of Aus and more (some places more than others)

Historical Archive Projects

  • Canonical source of truth is pieces of paper
  • Multiple attempts at scanning/transcription. Duplication and compounding of errors
  • Some errors in original data
  • “Historian” role to sift through and collate into a machine-readable form – usually spreadsheets
  • Data Model typically evolves over time – implementation must be flexible and open-minded

What we get

  • Flight Line Diagrams (metadata)
  • Imagery (data)
  • Lots scanned in early 1990s, but low resolution and missing data, some missed

Digitization Pipeline

  • Flight line diagram pipeline
    • High resolution scans
    • Georeferences
  • Film pipeline
    • Filmstock
    • High Resolution scans
    • Georeference images
    • Georectified images
    • Stitched mosaics + Elevation models

Only about 20% of film scanned. Lacking funding and film deteriorating

Other states have similar smaller archives (and other countries)

  • Many significantly more mature but may be locked in proprietary platforms

Stack

  • Open Data (CC BY 4.0)
  • Open Standards (RESTful, GeoJSON, STAC)
  • Open Source
  • PostgreSQL/PostGIS
  • Python3: Django REST Framework
  • Current Status: API Only. Alpha/proof-of-concept

API

  • Search for Flight runs
  • Output is GeoJSON
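To give a feel for what GeoJSON output for a flight run might look like, here is a small Python sketch that builds a FeatureCollection. The property names (`run_id`, `year`) and the sample coordinates are hypothetical; the talk notes do not describe the project's actual schema.

```python
import json


def flight_run_feature(run_id, coordinates, year):
    """Build a GeoJSON Feature for one flight run.

    `coordinates` is a list of (longitude, latitude) pairs; GeoJSON puts
    longitude first. The property names here are hypothetical.
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lon, lat in coordinates],
        },
        "properties": {"run_id": run_id, "year": year},
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


doc = feature_collection(
    [flight_run_feature("run-1", [(149.1, -35.3), (149.3, -35.3)], 1955)]
)
text = json.dumps(doc)
```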

Coming Next

  • Scanning and georeferencing (need $$$)
  • Data entry/management tools – no spreadsheets
  • Refs to other archives, federated search
  • Integration with TerriaJS/National Map
  • Full STAC once standardized

Linux.conf.au 2020 – Thursday – Session 1 – .NET to Linux / Collecting information

Engineer tested, manager approved: Migrating Windows/.NET services to Linux – Katie Bell

Works at Campaign Monitor

  • sends email spam
  • Company around since 2004

Software product generations

  • Originally a monolith
  • Windows, C# .net framework, IIS, Monolithic SQLServer
  • Went to microservices (called Reckless Microservices)
  • Windows, C# .net , OWIN Hosting / Nancy , Modular databases

Gen 2 – “Reckless” Microservice

  • Easy to create new microservices
  • and deploy etc
  • Runs in ec2

Wanted to move to tools like Docker and Kubernetes that were not well supported by Microsoft tools

Gen 3 – Docker Services

  • Linux
  • Java / Go

Lots of ways to do stuff

  • 3 different ways of doing everything
  • Confusing and big tax on developers
  • Losing knowledge about how the older Reckless stuff worked

A Crazy Idea

  • Run all the Reckless services in docker
  • Get rid of one whole generation

What does it take?

  • Move from .NET Framework to .NET Core
  • Framework very Windows specific – runtime installed at OS level
  • Core more open and cross-platform – self contained executable apps
  • But what about Mono? (Open Source .NET Framework)
    • Probably not worth the effort since Core is the way forward
  • But a lot of .NET Framework APIs not ported over to .NET Core. Some replaced by new APIs
  • .NET Standard libraries are supported on both though, and there are lots of them

What Doesn’t port to Core?

  • Libraries moved/renamed
  • Some libs dropped
  • IIS, ASP.NET replaced with ASP.NET Core + MVC
  • WCF Server communication
  • Old unmaintained libraries

Luckily Reckless was not using ASP.NET, so it shouldn’t be too hard to do. Maybe not such a crazy idea.

But most companies don’t let people spend lots of time on Tech Debt.

Asked for something small – 2 weeks of 3 people.

  • 1 week: Hacky proof of concept (getting 1 service to run in .NET Core)
  • 2nd week: Document and investigate what full project would require and have to do
  • Last Day: Time estimates
  • Found that Windows ec2 instance were 45%
  • Cost saving alone of moving from Windows to Linux justified the project
  • Pitching:
    • Demo
    • Detailed time estimates
    • Proposal with multiple options
    • Concrete benefits, cost savings, problems with rusty old infra
  • Microsoft Portability Analyzer
    • Just run across app and gives very detailed output
  • icanhasdot.net
    • Good for external dependencies

Web Hosting differences

  • OWIN Hosting vs Kestrel
  • ASP.NET Core DI

Libraries that Do support .NET Standard

  • Had to upgrade all our code to support the new versions
  • Major changes in places

OS Differences

  • case-sensitive filenames
  • Windows services, event logging
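The case-sensitivity difference is the kind of thing that can be checked before the move. Below is a minimal Python sketch that finds file references which only work on a case-insensitive filesystem. The function name and the sample paths are invented for the example and are not from the talk.

```python
def case_mismatches(referenced, actual):
    """Map each referenced path that only matches when case is ignored
    to the real on-disk spelling.

    On Windows these references resolve fine; on Linux they fail.
    """
    actual_set = set(actual)
    by_lower = {name.lower(): name for name in actual}
    return {
        ref: by_lower[ref.lower()]
        for ref in referenced
        if ref not in actual_set and ref.lower() in by_lower
    }


problems = case_mismatches(
    referenced=["Config/App.json", "logo.png"],
    actual=["config/app.json", "logo.png"],
)
```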

Libraries that did not support .NET Standard

  • Magnum – unmaintained
  • Topshelf

.NET Framework libraries can be run under .NET Core using a compatibility shim. Sometimes works but not really a good idea. Use with extreme caution

Overall Result

  • Took 6-8 months of 2-3 people
  • Everything migrated over.
  • Around 100 services
  • 78 actually running
  • 43 really needed to be migrated
  • 31 actually needed in the end
  • Estimated old hosting cost $145k/year
  • Estimated new hosting costing $70k/year
  • Actual hosting cost $15k/year
  • Got rid of almost all the extra infrastructure that was used to support Reckless. Another $25k/year saved
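The savings implied by those figures can be worked through quickly. This assumes the three hosting numbers are directly comparable annual costs and that the $25k infrastructure saving is on top of the hosting saving; the notes do not say so explicitly.

```python
# Annual figures quoted in the notes above (USD per year).
old_hosting = 145_000
estimated_new_hosting = 70_000
actual_new_hosting = 15_000
extra_infra_saved = 25_000

# Saving the team expected when pitching the project.
estimated_saving = old_hosting - estimated_new_hosting

# Saving on hosting alone, using the actual figure.
actual_hosting_saving = old_hosting - actual_new_hosting

# Assumption: the infrastructure saving is additional to hosting.
total_saving = actual_hosting_saving + extra_infra_saved

print(estimated_saving, actual_hosting_saving, total_saving)
```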

Advice for cleanup projects

  • Ask for something small
  • Test the idea
  • Demonstrate the business case
  • Build detailed time estimates

Collecting information with care by Opel Symes

The Problem

  • People build systems for people without checking that our assumptions about people are valid
  • Be aware of my assumptions, this doesn’t cover all areas

Names

  • Form “First Name” and “Last Name” -> “Dear John Smith”
  • Fields Required – should be optional
  • Should not do character checks ( blocking accents etc )
  • Check that production supports emoji... everywhere
  • MySQL character encodings: utf8mb4 is needed for full Unicode. Only available since 5.5, default in MySQL 8
  • Every database, table and text column and their defaults need to be changed to the new character set. Set connection options so things don’t get lost in transfer.
  • Personal Names around the world
  • Chinese names
  • Names can be long
  • Recommendation
    • Ask for “Full name” (where a legal name is required) and “Greeting”
    • Unicode all the way down – test with emoji
    • No Length limits
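Why emoji make a good test: they need 4 bytes in UTF-8, while MySQL's older `utf8` character set stores at most 3 bytes per character. A small Python sketch (the function name is made up) that lists the characters in a name that would not fit:

```python
def needs_four_byte_utf8(text):
    """Return the characters in `text` that take 4 bytes in UTF-8.

    These are the characters a 3-byte-per-character column cannot hold.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# Accented and CJK characters fit in 3 bytes; the emoji does not.
print(needs_four_byte_utf8("Zoë 李 😀"))
```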

Email

  • Email addresses are quite complex
  • Does it have an “@”
  • Check it is not a simple typo of a well-known email domain
  • Will it be accepted by the email sender?
  • Look for an MX record
  • Ask the SMTP server if this username is valid
  • Simple checks for common errors
  • Don’t roll your own checking, use your own mail server or the mail library that you will be using to send.
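The first two checks (is there an “@”, and is the domain a likely typo of a well-known one) can be sketched with the Python standard library. The domain list and the 0.8 similarity cutoff are illustrative assumptions. The MX and SMTP checks are left out because they need a DNS resolver and a live mail server.

```python
import difflib

# Illustrative list only; a real one would be longer.
WELL_KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "outlook.com", "yahoo.com"]


def check_email(address):
    """Return (plausible, suggestion).

    `plausible` is False only when there is no usable "@" split.
    `suggestion` is a corrected address when the domain looks like a typo
    of a well-known domain, otherwise None. The user should be asked to
    confirm a suggestion, not have it applied silently.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False, None
    domain = domain.lower()
    if domain in WELL_KNOWN_DOMAINS:
        return True, None
    close = difflib.get_close_matches(domain, WELL_KNOWN_DOMAINS, n=1, cutoff=0.8)
    suggestion = f"{local}@{close[0]}" if close else None
    return True, suggestion
```

Deliberately, nothing here tries to validate the local part: that is the job of the mail server or library that will actually send the message.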

Gender

  • Transgender vs Cisgender
  • Non-binary – Gender that isn’t male or female
  • Don’t just give the two options
  • A 3rd “other” option isn’t ideal
  • A freeform field is good.
  • Gender Alternative from Nikki Stevens
  • Instead ask if people make up an “under-represented community”

Pronouns

  • What pronouns should we use to refer to you? (he, she, they)
  • Works okay in English but may not in other languages
  • Some languages lack a gender-neutral pronoun
  • Some languages lack gender pronouns
  • pronoun.is

Titles

  • Ask for “None” but don’t actually print it “Dear None Smith”
  • Ask for Mx
  • Have a freeform field ( Dr, Count )
  • Maybe avoid titles if possible
  • Don’t choose a title for people according to gender, ask specifically.
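The “Dear None Smith” problem comes from treating the “None” choice as a printable title. A minimal Python sketch of a greeting builder that avoids it (the function and parameter names are invented for the example):

```python
def greeting_line(greeting_name, title=None):
    """Build a salutation from a free-form title and the name the person
    asked to be greeted by. A missing title or the literal choice "None"
    prints nothing.
    """
    title = (title or "").strip()
    if title.lower() == "none":
        title = ""
    parts = [part for part in (title, greeting_name.strip()) if part]
    return "Dear " + " ".join(parts)


print(greeting_line("Smith", "None"))
print(greeting_line("Alex", "Mx"))
```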

Gender – WGEA

  • The Act defines gender as male or female.
  • Others are not reported.
  • Have an explanation for people who don’t fit in the above

Data Retention

  • Make it simple to change
  • Give users options if it isn’t (eg show preferred name)

Changing Username

  • Usernames are often options
  • Changing them comes with some caveats
  • Use UUIDs to link to users rather than usernames
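A rough sketch of why this helps: if everything else refers to a user by an immutable UUID, a rename only touches the user record. The `UserStore` class below is a hypothetical in-memory example, not anything shown in the talk.

```python
import uuid


class UserStore:
    """In-memory store keyed by an immutable UUID, not by username."""

    def __init__(self):
        self._users = {}        # user_id -> {"username": ...}
        self._by_username = {}  # username -> user_id

    def create(self, username):
        if username in self._by_username:
            raise ValueError("username already taken")
        user_id = uuid.uuid4()
        self._users[user_id] = {"username": username}
        self._by_username[username] = user_id
        return user_id

    def rename(self, user_id, new_username):
        if new_username in self._by_username:
            raise ValueError("username already taken")
        old = self._users[user_id]["username"]
        del self._by_username[old]
        self._users[user_id]["username"] = new_username
        self._by_username[new_username] = user_id

    def get(self, user_id):
        return self._users[user_id]

    def find(self, username):
        return self._by_username.get(username)


store = UserStore()
author_id = store.create("alice")
post = {"author": author_id, "body": "hello"}  # links by UUID
store.rename(author_id, "alicia")
```

One of the caveats shows up here too: after the rename the old username is free for someone else to claim, so old links or mentions by name could end up pointing at the wrong person.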

Changing Emails

  • There are security implications

Deleting Data

  • Make it possible and not too hard