jb… a weblog by Jonathan Buys

Random Strings on the Command Line

December 19, 2024

Browsing through some old files today I came across this note:

To Generate A Random String:

tr -dc A-Za-z0-9_ < /dev/urandom | head -c 8 | xargs

Curious if it still worked, I pasted it into my terminal and, unsurprisingly, was met with an error:

tr: Illegal byte sequence

The tr utility is one of the many old-school Unix programs, with a history reaching back to the early days of Unix. It stands for “translate characters”, and with the -d (delete) and -c (complement) flags combined, it should delete every input character except the letters A-Z, both upper and lower case, the digits 0 through 9, and the underscore. The “Illegal byte sequence” error means it was really not happy with the input it was getting from /dev/urandom.
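
As a quick illustration of what -dc does on ordinary text, a command like this keeps only the characters in the set and deletes everything else, including the spaces and punctuation, printing HelloWorld123 (the trailing echo just puts the newline back):

echo 'Hello, World! #123' | tr -dc 'A-Za-z0-9_'; echo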

On macOS, the pseudo-device /dev/urandom is, according to the man page, “a compatibility nod to Linux”. The device generates random bytes, so if we read it we’ll get back raw binary data that looks like:

00010101 01011001 10111101

The reason the command no longer works the way it used to is that most modern systems expect text to be encoded as UTF-8. When tr reads the stream of random bytes from /dev/urandom, it expects those bytes to form valid UTF-8 sequences it can interpret as characters. Since we are intentionally generating random bytes, a few of them might happen to decode properly, but before long we’ll hit a sequence that isn’t valid UTF-8 and get the “illegal byte sequence” error above.
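
To see the kind of raw bytes tr is choking on, you can dump a few of them with xxd; most of what comes out will not form valid UTF-8 sequences:

head -c 16 /dev/urandom | xxd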

To fix the problem, all we need to do is set LC_ALL=C before running tr:

LC_ALL=C tr -dc A-Za-z0-9_ < /dev/urandom | head -c 14 | xargs

Setting LC_ALL=C switches the locale to the POSIX “C” locale, which treats text as single-byte ASCII. That means when tr is fed a stream of random bytes, it interprets each byte as a single character according to the ASCII table, which looks something like this:

Character    ASCII Decimal    ASCII Hexadecimal    Binary Representation
A            65               41                   01000001
B            66               42                   01000010
C            67               43                   01000011

Now each byte is interpreted as a character that matches the list passed as an argument to tr.

➜ LC_ALL=C tr -dc A-Za-z0-9_ < /dev/urandom | head -c 14 | xargs

Rhac_WGis7tHzS

So, to break down each command in the pipeline:

  • tr: filters out all characters except those in the sets A-Z, a-z, 0-9, and _
  • head -c 14: displays the first 14 characters of the input from tr
  • xargs: adds a nice newline character at the end of the string, so it’s easy to copy.

This command could easily be adapted to use base64 instead of tr, which avoids the need to set LC_ALL=C, if you want a slightly different mix of characters in the string:

base64 < /dev/random | head -c 14 | xargs

Expanding head -c to 34 or so makes for a nice password generator.

In fact, I’ve wrapped this in a pgen function in my .zshrc:

pgen()
{
    base64 < /dev/random | head -c 32 | xargs
}

There are almost certainly easier ways to generate a random string in the shell, but I like this, and it works for me.


Update: The good Dr. Drang suggested ending the pipeline and running echo afterward instead of piping to xargs, for clarity, which makes a lot of sense to me. I updated the function to base64 < /dev/random | head -c 32; echo.
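
For completeness, a version of the updated pgen function might look like this; the optional length argument is an extra convenience, not part of the one-liner above:

pgen()
{
    # default to 32 characters if no length argument is given (the argument is an addition, not in the original)
    base64 < /dev/random | head -c "${1:-32}"; echo
}

Running pgen 16 would then print a 16-character string.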


The Tao of Cal

December 4, 2024

The Tao of Cal - Cal Newport

With the end of year rapidly approaching, and people finding themselves with some spare thinking time as work winds down for the holidays, I thought it might be fun to try to summarize essentially every major idea I discuss in one short primer.

I’m a big fan of Cal Newport’s work. I’d like to quote the entire post, but I’ll just post this one sentence. Read the rest of this article, then buy his books and read those too. Follow his advice and live a better life.


Godot Isn't Making it

December 4, 2024

Godot Isn’t Making it

Outside of a miracle, we are about to enter an era of desperation in the generative AI space. We’re two years in, and we have no killer apps — no industry-defining products — other than ChatGPT, a product that burns billions of dollars and nobody can really describe. Neither Microsoft, nor Meta, nor Google or Amazon seem to be able to come up with a profitable use case, let alone one their users actually like, nor have any of the people that have raised billions of dollars in venture capital for anything with “AI” taped to the side — and investor interest in AI is cooling.

Edward Zitron seems to be a rare voice of frustrated reason in the tech industry. He’s very critical of AI, and, more and more, I’m thinking rightfully so. OpenAI is spending over $2 to make $1, burning through billions with no path to profitability.

Couple that with the environmental cost of AI (and its just plain awful cousin, cryptocurrency) and the unreliability of the generated answers, and I’m wondering just where all of this goes in the next year or so.


Gross Apple Marketing

October 29, 2024

I’m not sure what’s going on over in Cupertino for them to think that any of the recent Apple Intelligence ads they’ve been running are a good idea. They’re cringy at best, and honestly just flat out insulting.

In one, a schlub writes an email to his boss and uses AI to make it sound ‘more professional’; in another, a young woman uses it to lie about remembering an acquaintance’s name. In another, the same young woman uses it to lie about reading an email from a colleague, to her face, while she’s sitting with her. In yet another, linked to recently by Scott McNulty, a woman uses AI to lie to her husband about getting him something for his birthday.

If this is what Apple thinks their AI is for, I honestly don’t know that I want any part of it.

Compare and contrast with the video I posted yesterday, and with this beautiful animation from Canonical.

I’ve watched that little animation several times, and it tells a better story in a minute twenty-five than all of Apple’s AI commercials combined.


Scout

October 28, 2024

I’m rooting for these guys. If they can pull off this truck at the $50-$60k mark, I think they are going to have a winner. I’ve been looking at electric trucks for a while, and I’m excited to see another entry in the market. And what a fantastic video:

A proper body-on-frame truck, 10,000 lb towing capacity, all electric, made in the USA. Count me in.


The Manual with Tim Walz

September 21, 2024

Love this guy. Patrick Rhone calls him folksy; I agree.


Loading and Indexing SQLite

October 19, 2023

What a difference a couple of lines of code can make.

I recognize that databases have always been a weak point for me, so I’ve been trying to correct that lately. I have a lot of experience with management of the database engines, failover, filesystems, and networking, but too little working with the internals of the databases themselves. Early this morning I decided I didn’t know enough about how database indexes worked. So I did some reading, got to the point where I had a good mental model for them, and decided I’d like to do some testing myself. I figured 40 million records was a nice round number, so I used fakedata to generate 40 million SQL inserts that looked something like this:

INSERT INTO contacts (name,email,country) VALUES ("Milo Morris","pmeissner@test.tienda","Italy");
INSERT INTO contacts (name,email,country) VALUES ("Hosea Burgess","kolage@example.walmart","Dominica");
INSERT INTO contacts (name,email,country) VALUES ("Adaline Frank","shaneIxD@example.talk","Slovenia");

I saved this as fakedata.sql, piped it into sqlite3, and figured I’d just let it run in the background. After about six hours I realized this was taking a ridiculously long time, and I estimated I’d only loaded about a quarter of the data. That’s because, by default, SQLite treats each INSERT as a separate transaction.

A transaction in SQLite is a unit of work. SQLite ensures that each write to the database is Atomic, Consistent, Isolated, and Durable, which means that for each of the 40 million lines I was piping into sqlite3, the engine was making sure the row was fully committed to the database before moving on to the next line. That’s a lot of work for a very, very small amount of data. So, I did some more reading and found a recommendation to explicitly wrap the entire load in a single transaction, so my file now looked like this:

BEGIN TRANSACTION;

INSERT INTO contacts (name,email,country) VALUES ("Milo Morris","pmeissner@test.tienda","Italy");
INSERT INTO contacts (name,email,country) VALUES ("Hosea Burgess","kolage@example.walmart","Dominica");
INSERT INTO contacts (name,email,country) VALUES ("Adaline Frank","shaneIxD@example.talk","Slovenia");

COMMIT;
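
Rather than editing a 40-million-line file by hand, the same wrapping can be done on the fly in the shell; something like this should feed the transaction markers and the inserts to sqlite3 in one stream:

# wrap the generated inserts in a single transaction without modifying the file
{ echo 'BEGIN TRANSACTION;'; cat fakedata.sql; echo 'COMMIT;'; } | sqlite3 test.db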

I set a timer and ran the import again:

➜  var time cat fakedata.sql| sqlite3 test.db
cat fakedata.sql  0.07s user 0.90s system 1% cpu 1:13.66 total
sqlite3 test.db  70.81s user 2.19s system 98% cpu 1:13.79 total

So, that went from 6+ hours to about 71 seconds. And I imagine if I did some more optimization (possibly using the Write Ahead Log?) I might be able to get that import faster still. But a little over a minute is good enough for some local curiosity testing.
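
For the record, the Write Ahead Log idea would probably look something like this; I haven’t benchmarked it, so treat it as a sketch rather than a recommendation:

# enable the write-ahead log and relax syncing on the same connection doing the load
{ printf 'PRAGMA journal_mode = WAL;\nPRAGMA synchronous = NORMAL;\n'; cat fakedata.sql; } | sqlite3 test.db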

Indexes

So… back to indexes.

An index is a way of keeping records sorted on one or more fields. Creating an index on a field in a table creates another data structure that holds each field value along with a pointer to the record it belongs to. Because that structure is kept sorted, binary searches can be performed on it instead of scanning the whole table.

One good analogy is the index of a physical book. Imagine that a book has ten chapters and each chapter has 100 pages. Now imagine you’d like to find all instances of the word “continuum” in the book. If the book doesn’t have an index, you’d have to read through every page in every chapter to find the word.

However, if the book is already indexed, you can find the word in the alphabetical list, which will then have a pointer to the page numbers where the word can be found.

The downside of an index is that it takes additional space. In the book analogy, while the book itself is 1,000 pages, we’d need another ten or so for the index, bringing the total up to 1,010 pages. It’s the same with a database: the additional index structure needs space to hold both a copy of the field being indexed and a small (4-byte, for example) pointer to the record.

Oh, and the results of creating the index are below.

SELECT * from contacts WHERE name is 'Hank Perry';
Run Time: real 2.124 user 1.771679 sys 0.322396


CREATE INDEX IF NOT EXISTS name_index on contacts (name);
Run Time: real 22.129 user 16.048308 sys 2.274184


SELECT * from contacts WHERE name is 'Hank Perry';
Run Time: real 0.003 user 0.001287 sys 0.001598
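
One way to confirm the planner is actually using the new index is EXPLAIN QUERY PLAN; the exact output wording varies between SQLite versions, but it should report a search using name_index rather than a full table scan:

sqlite3 test.db "EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE name = 'Hank Perry';"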

That’s a massive improvement. And now I know a little more than I did.


The Perfect ZSH Config

August 14, 2023

If you spend all day in the terminal like I do, you come to appreciate its speed and efficiency. I often find myself in Terminal for mundane tasks like navigating to a folder and opening a file; it’s just faster to type where I want to go than it is to click in the Finder, scan the folders for the one I want, double-click that one, scan again… small improvements to the speed of my work build up over time. That speed is multiplied by the right configuration for your shell, in my case zsh.

zsh is powerful and flexible, which means it can also be intimidating to configure yourself. Doubly so when there are multiple ‘frameworks’ available that will do the bulk of the configuration for you. I used Oh My Zsh for years, but I recently abandoned it in favor of maintaining my own configuration, using only the settings I need.

I’ve split my configuration into five files:

  • apple.zsh-theme
  • zshenv
  • zshrc
  • zsh_alias
  • zsh_functions

I have all five files in a dotfiles git repository, pushed to a private Github repository.

The zshenv file is read first by zsh when starting a new shell. It contains a collection of environmental variables I’ve set, mainly for development. For example:

export PIP_REQUIRE_VIRTUALENV=true
export PIP_DOWNLOAD_CACHE=$HOME/.pip/cache
export VIRTUALENV_DISTRIBUTE=true

The next file is zshrc, which contains the main bulk of the configurations. My file is 113 lines, so let’s take it a section at a time.

source /Users/jonathanbuys/Unix/etc/dotfiles/apple.zsh-theme
source /Users/jonathanbuys/Unix/etc/dotfiles/zsh_alias
source /Users/jonathanbuys/Unix/etc/dotfiles/zsh_functions

The first thing I do is source the other three files. The first is my prompt, which is cribbed entirely from Oh My Zsh. It’s nothing fancy, but I consider it to be elegant and functional. I don’t like the massive multi-line prompts. I find them to be far too distracting for what they are supposed to do.

My prompt looks like this:

 ~/Unix/etc/dotfiles/ [master*] 

It gives me my current path, what git branch I’ve checked out, and if that branch has been modified since the last commit.

The next two files, as their names suggest, contain aliases and functions. I have three functions and 16 aliases. I won’t go into each of them here, as they are fairly mundane and specific to my setup. The three functions print the current path of the open Finder window, use Quick Look to preview a file, and generate a UUID string.
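
The general idea behind each one looks roughly like this; the function names here are placeholders rather than the exact code in my zsh_functions file:

# rough sketches, not the exact versions from zsh_functions

# print the POSIX path of the frontmost Finder window
pfd() {
    osascript -e 'tell application "Finder" to return POSIX path of (insertion location as alias)'
}

# preview a file with Quick Look from the terminal
ql() {
    qlmanage -p "$@" &> /dev/null
}

# generate a lowercase UUID string
uuid() {
    uuidgen | tr '[:upper:]' '[:lower:]'
}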

The next few lines establish some basic settings.

autoload -U colors && colors
autoload -U zmv

setopt AUTO_CD
setopt NOCLOBBER
setopt SHARE_HISTORY
setopt HIST_IGNORE_DUPS
setopt HIST_IGNORE_SPACE

The autoload lines set up zsh to use pretty colors and enable the extremely useful zmv command for batch file renaming. The interesting parts of the setopt settings are the ones dealing with command history: SHARE_HISTORY shares command line history between open windows or tabs, while the HIST_IGNORE options keep duplicates and space-prefixed commands out of it. So if I have multiple Terminal windows open, I can browse the history of both from either window. I find myself thinking the environment is broken when this isn’t present.

Next, I set up some bindings:

  # start typing + [Up-Arrow] - fuzzy find history forward
  bindkey '^[[A' up-line-or-search
  bindkey '^[[B' down-line-or-search
  
  # Use option as meta
  bindkey "^[f" forward-word
  bindkey "^[b" backward-word
  
  # Use option+backspace to delete words
  x-bash-backward-kill-word(){
      WORDCHARS='' zle backward-kill-word
  
  }
  zle -N x-bash-backward-kill-word
  bindkey '^W' x-bash-backward-kill-word
  
  x-backward-kill-word(){
      WORDCHARS='*?_-[]~\!#$%^(){}<>|`@#$%^*()+:?' zle backward-kill-word
  }
  zle -N x-backward-kill-word
  bindkey '\e^?' x-backward-kill-word

These settings let me use the arrow keys to browse history, option + arrow keys to move one word at a time through the current command, and option + delete to delete one word at a time. Incredibly useful; I use it all the time. Importantly, this also lets me do incremental searches through my command history with the arrow keys. So, if I type aws and then arrow up, I can browse all of my previous commands that start with aws. When you have to remember commands that take 15 arguments, this is absolutely invaluable.

The next section has to do with autocompletion.

# Better autocomplete for file names
WORDCHARS=''

unsetopt menu_complete   # do not autoselect the first completion entry
unsetopt flowcontrol
setopt auto_menu         # show completion menu on successive tab press
setopt complete_in_word
setopt always_to_end

zstyle ':completion:*:*:*:*:*' menu select

# case insensitive (all), partial-word and substring completion
if [[ "$CASE_SENSITIVE" = true ]]; then
  zstyle ':completion:*' matcher-list 'r:|=*' 'l:|=* r:|=*'
else
  if [[ "$HYPHEN_INSENSITIVE" = true ]]; then
    zstyle ':completion:*' matcher-list 'm:{[:lower:][:upper:]-_}={[:upper:][:lower:]_-}' 'r:|=*' 'l:|=* r:|=*'
  else
    zstyle ':completion:*' matcher-list 'm:{[:lower:][:upper:]}={[:upper:][:lower:]}' 'r:|=*' 'l:|=* r:|=*'
  fi
fi

unset CASE_SENSITIVE HYPHEN_INSENSITIVE
# Complete . and .. special directories
zstyle ':completion:*' special-dirs true

zstyle ':completion:*' list-colors ''
zstyle ':completion:*:*:kill:*:processes' list-colors '=(#b) #([0-9]#) ([0-9a-z-]#)*=01;34=0=01'
zstyle ':completion:*:*:*:*:processes' command "ps -u $USERNAME -o pid,user,comm -w -w"
# disable named-directories autocompletion
zstyle ':completion:*:cd:*' tag-order local-directories directory-stack path-directories

# Use caching so that commands like apt and dpkg complete are useable
zstyle ':completion:*' use-cache yes
zstyle ':completion:*' cache-path $ZSH_CACHE_DIR

zstyle ':completion:*:*:*:users' ignored-patterns \
        adm amanda apache at avahi avahi-autoipd beaglidx bin cacti canna \
        clamav daemon dbus distcache dnsmasq dovecot fax ftp games gdm \
        gkrellmd gopher hacluster haldaemon halt hsqldb ident junkbust kdm \
        ldap lp mail mailman mailnull man messagebus  mldonkey mysql nagios \
        named netdump news nfsnobody nobody nscd ntp nut nx obsrun openvpn \
        operator pcap polkitd postfix postgres privoxy pulse pvm quagga radvd \
        rpc rpcuser rpm rtkit scard shutdown squid sshd statd svn sync tftp \
        usbmux uucp vcsa wwwrun xfs '_*'

if [[ ${COMPLETION_WAITING_DOTS:-false} != false ]]; then
  expand-or-complete-with-dots() {
    # use $COMPLETION_WAITING_DOTS either as toggle or as the sequence to show
    [[ $COMPLETION_WAITING_DOTS = true ]] && COMPLETION_WAITING_DOTS="%F{red}…%f"
    # turn off line wrapping and print prompt-expanded "dot" sequence
    printf '\e[?7l%s\e[?7h' "${(%)COMPLETION_WAITING_DOTS}"
    zle expand-or-complete
    zle redisplay
  }
  zle -N expand-or-complete-with-dots
  # Set the function as the default tab completion widget
  bindkey -M emacs "^I" expand-or-complete-with-dots
  bindkey -M viins "^I" expand-or-complete-with-dots
  bindkey -M vicmd "^I" expand-or-complete-with-dots
fi

# automatically load bash completion functions
autoload -U +X bashcompinit && bashcompinit


That’s a long section, but in a nutshell this lets me type one character, then hit tab, and be offered a menu of all the possible completions of that character. It is case-insensitive, so b would match both boring.txt and Baseball.txt. I can continue to hit tab to cycle through the options, and hit enter when I’ve found the one I want.

The last section sources a few other files:

[ -f ~/.fzf.zsh ] && source ~/.fzf.zsh
[ -f "/Users/jonathanbuys/.ghcup/env" ] && source "/Users/jonathanbuys/.ghcup/env" # ghcup-env
[ -s "/Users/jonathanbuys/.bun/_bun" ] && source "/Users/jonathanbuys/.bun/_bun"
source /Users/jonathanbuys/Unix/src/zsh-autosuggestions/zsh-autosuggestions.zsh
source /Users/jonathanbuys/Unix/src/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh

If I’m experimenting with Haskell, I’d like to load the ghcup-env variables. If I have bun installed (a way, way faster npm), then use that. The final two sources are for even more enhanced autosuggestions and command line syntax highlighting: typos or commands that don’t exist show up red, while good commands where zsh can find the executable show up green. The autosuggestions take commands from my history and suggest them as I type; I can press the right arrow to accept a suggestion, or keep typing to ignore it.

Taken together, I’ve been able to remove Oh My Zsh, but keep all of the functionality. My shell configuration is constantly evolving as I find ways to make things faster and more efficient. I don’t consider myself a command line zealot, but I do appreciate how this setup gets out of my way and helps me work as fast as I can think.


p.s. A lot of this configuration was taken from other sources shared around the internet, as well as the zsh documentation. I regret that I haven’t kept references to the origins of some of these configs. If I can find the links I’ll post them here.


Future Work and AI

January 26, 2023

I’ve been trying to wrap my small monkey brain around what ChatGPT will mean in the long run. I’m going to try to think this through here. In many ways the advances we’ve seen in AI this past year perpetuate the automation trend that’s existed since… well, since humans started creating technology. I’ve seen arguments at two ends of a spectrum: that AI is often wrong and unreliable and we shouldn’t use it for anything important, and that AI is so good it’s going to put us all out of jobs. As with most truths, I think the reality is somewhere in between.

It’s my opinion that AI will probably replace a lot of the jobs it can replace, but not all of them. Referring back to our discussion about the current state of Apple news sites: if a site is a content farm pumping out low-value articles for hit counts and views, I can see AI handling that. If a site is well-thought-out opinions and reviews about things around the Apple ecosystem, that I think will be safe, because it’s the person’s opinion that gives the site value.

For more enterprise-y jobs, I could see fewer low- and mid-level developers. Fewer managers, fewer secretaries, fewer creatives. Not all gone, but certainly fewer than before. If your job is to create stock photos and put together slide shows, you might want to expand your skill set a bit.

I think… the kind of jobs that will survive are the type that bring real value. The kind of value that can’t be replicated by a computer. Not just the generation of some text or code, but coming up with the why. What needs to be made, and why does it need to be made?

Maybe AI will help free us up to concentrate on solving really hard problems. Poverty, clean water, famine, climate change. Then again, maybe it’ll make things worse. I suppose in the end that’s up to us.


GPG Signing Git Commits

June 9, 2022

On my way towards completing another project, I needed to set up gpg public key infrastructure. There are many tutorials and explanations of gpg on the web, so I won’t try to explain what it is here. My goal is simply to record how I set it up for myself to securely sign my Git commits.

Most everything here I gathered from this tutorial on dev.to, but since I’m sure I’ll never be able to find it again after today, I’m going to document it here.

First, install gpg with Homebrew:

brew install gpg

Next, generate a new Ed25519 key:

gpg --full-generate-key --expert

We pick option (9) for the first prompt, Elliptic Curve Cryptography, and option (1) for the second, Curve 25519. Pick the defaults for the rest of the prompts, giving the key a descriptive name.

Once finished you should be able to see your key by running:

gpg --list-keys --keyid-format short

The tutorial recommends generating a subkey from the primary key to actually do the signing. So, we edit the primary key by running:

gpg --expert --edit-key XXXXXXX

Replacing XXXXX with the ID of your newly generated key. Once in the gpg command line, enter addkey, and again select ECC and Curve 25519 for the options. Finally, enter save to save the key and exit the command line.

Now when we run gpg --list-keys --keyid-format short we should be able to see a second key listed with the designation [S] after it. The ID will look similar to this:

sub   ed25519/599D272D 2021-01-02 [S]

We will need the part after ed25519/, in this case 599D272D. Add that to your global Git configuration file by running:

git config --global user.signingkey 599D272D

If you’d like git to sign every commit, you can add this to your config file:

git config --global commit.gpgsign true

Otherwise, pass the -S flag to your git command to sign individual commits. I’d never remember to do that, so I just sign all of them.
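
For reference, signing a single commit and then checking the signature looks something like this:

git commit -S -m "Add the new feature"
git log --show-signature -1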

Make sure that gpg is unlocked and ready to use by running:

echo "test"  | gpg --clearsign

If that fails, run export GPG_TTY=$(tty) and try again. You should be prompted to unlock GPG with the passphrase you set when creating the key. Add the export command to your ~/.zshrc to fix the issue permanently.

Finally, Github has a simple way to add gpg keys, but first we’ll need to export the public key:

gpg --armor --export 599D272D

Copy the entire output of that command and enter it into the Github console under Settings, “SSH and GPG keys”, and click on “New GPG key”. Once that’s finished, you should start seeing nice green “Verified” icons next to your commits.