Tonight I was reading "I'm an American software developer and the 'broligarchs' don't speak for me" and ran across this passage, which derailed me far more than it should have:

I can buy a 32Gb USB flash drive for the price of a cup of coffee, store a compressed copy of Wikipedia on it, and throw it in my pocket like it’s no big deal. That’s incredible.

I thought, huh. That is indeed incredible! But what if I were more pedantic? I remembered the Wikipedia archive I'd used previously being around 100 GB, which is well beyond what you can throw on a 32 GB thumb drive. That said, I'm wrong more often than not, so I decided to check my memory.

I jumped over to the Kiwix Library page to investigate. For those who are unaware, Kiwix is a neat system that allows you to self-host Wikipedia and a bunch of other useful websites in case the world, or just your internet connection, goes a little haywire. The Kiwix library page search leaves a lot to be desired, though. With some quick browsing I could tell that the largest, most complete English Wikipedia archive was 100 GB. But with so many smaller alternatives, it was hard to tell what the smallest complete version was. Rather than spend five more minutes reading through the search results to find the right one, I decided to see if I could quickly scrape some data to make searching easier.

This was my first mistake.

I quickly found an Atom feed that accepted URL parameters like start, count, and category. I had data! But it was in XML. XML and I. Well. Let's just say we don't really get along. If this were JSON, I could throw jq at it and query the data I needed in a couple of minutes. But it wasn't. It was XML. Ugh. But, I wondered, perhaps there's something like jq that works with XML. Yes! It turns out xq is a thing. So I installed that and fiddled with it for all of 3 minutes before giving up on finding the right XPath expression to find the largest full archive. Then I decided that perhaps it'd be easier to just have AI code up a customized script to get the data I wanted.

This was my second mistake.

I booted up Aider in a new repo and set out to create a script to convert the Atom feed to a SQLite database I could very easily query. 19 minutes of prompting later and I had my answer. Now, I'm not one to just have an AI puke out some slop, post it on GitHub, and walk away. No sir! So I spent another 40 minutes cleaning it up and making it look nice. If you need a good tool to convert a Kiwix Atom feed to a SQLite DB, I have your back. Because I'm certain you need this tool.
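
For the curious, the general shape of an Atom-to-SQLite conversion fits in a few dozen lines of Python. This is a minimal sketch of the idea, not the actual tool. The sample feed is made up, the base URL is a placeholder, the table layout is invented for illustration, and the code assumes each entry's download link reports its size in the standard Atom length attribute, which may not match the real feed.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace

# Made-up stand-in for one page of the feed; not real catalog data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example archive, everything</title>
    <link href="https://example.invalid/all.zim" length="300"/>
  </entry>
  <entry>
    <title>Example archive, no images</title>
    <link href="https://example.invalid/nopic.zim" length="200"/>
  </entry>
  <entry>
    <title>Example archive, intros only</title>
    <link href="https://example.invalid/mini.zim" length="100"/>
  </entry>
</feed>"""


def page_url(base_url, start, count, category):
    """Build the URL for one page of the feed (base_url is a placeholder)."""
    params = {"start": start, "count": count, "category": category}
    return base_url + "?" + urlencode(params)


def parse_entries(feed_xml):
    """Return (title, url, size_bytes) tuples, one per feed entry."""
    rows = []
    for entry in ET.fromstring(feed_xml).iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.iter(ATOM + "link"):
            # Assumption: the download link carries its size in `length`.
            if link.get("length") is not None:
                rows.append((title, link.get("href"), int(link.get("length"))))
                break
    return rows


def build_db(rows, path=":memory:"):
    """Load the parsed rows into a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE archives (title TEXT, url TEXT, size_bytes INTEGER)"
    )
    conn.executemany("INSERT INTO archives VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_db(parse_entries(SAMPLE_FEED))
query = "SELECT title, size_bytes FROM archives ORDER BY size_bytes"
for title, size_bytes in conn.execute(query):
    print(title, size_bytes)
```

Once the data is in a table, "what's the smallest archive that matches X" becomes a one-line ORDER BY instead of a fight with XPath.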

An hour and a half later, I now know that a full, image-free copy of Wikipedia will set you back 53 GB, meaning you'd actually need a 64 GB flash drive. On Amazon this would cost an additional... wait. The 64 GB drives seem to be cheaper on balance than the 32 GB ones!? Never mind. Where was I?

The AI tooling made this easy. But I can't help but pause and feel like maybe I'd lost something here. Had I not had access to these tools, maybe I would have actually sat down and learned xq better. Maybe I would have refreshed my XPath skills, and maybe I wouldn't continue to give XML so much grief. Or maybe I would have realized sooner the silliness of throwing technology at a question I could have answered by reading a few more words with my own brain. In some alternate timeline, I could have just focused on what I was reading and ignored an irrelevant throwaway fact.

With new technologies, we let previous skills atrophy. With writing, we lessened memorization and discounted the oral tradition. With pocket calculators, we placed less importance on rote arithmetic. With search engines, we became less practiced at research in the library. Perhaps tools like jq and xq will be less needed in some future where we can ask AI to code anything in seconds. But just as memorization and storytelling are still valuable skills today, keeping up with such tools is likely to be useful for some time to come.

Now, I should really get back to reading that blog article. It's certainly worth my time.