The Making of Semaphore Decoder

I recently built a point-and-click visual semaphore decoder for puzzling (link). As far as I can tell, it is the only semaphore decoder online that lets you enter inputs visually (but let me know if there are others!).

I can't believe you made my pipe dream into a reality, and it only took you about two days - Johnny

For the record, it was much closer to 1.5 days for the code, but then I spent at least another half-day writing this post, so I guess he's not wrong. Here are some notes about the process.

Why?

I've been doing a lot of puzzles lately with my friend Johnny, and these puzzles often involve flag semaphore, a visual representation of letters using the position of two flags. These are often encoded as times on a clock, but sometimes also as things like tree branches or dance moves.

It's usually fairly quick to recognize a code that could be semaphore: there are two distinct lines radiating out from a single point. But even if I know that, it's still pretty tedious to actually decode it. For encodings that are easily representable by text, there are often very good online tools to solve them - for example, dCode has nice decoders for things like Morse code, Caesar cipher, ADFGVX, and so on. But none of the existing tools I found are very good for solving visual encodings like semaphore.

The standard lookup table for semaphore

It's not just that the encoding is visual. For example, the pigpen cipher is very visual, but it's comparatively painless to decode. Why? It has a visual hierarchy: even though there are four different subcharts in the lookup table, it is immediately obvious which one to look at from the shape of the code and the dot (or absence of a dot). Similarly, if you want to decode Morse code by hand, you can use a Morse decoding tree to quickly converge on the right letter.

In semaphore, there isn't a clear way to do this: no text representation, no hierarchy, no ordering. Hunting for the right shape in the lookup table is tedious. Quick, what does the semaphore with left flag horizontal and right flag at a 45-degree angle towards the ground mean? You don't really have a search strategy besides looking through every single letter. The best alternative I've seen is the "pie-slice" semaphore chart, but even that one feels slow and painful for me to read (even though in theory it should be just like the pigpen chart).

I thought the best way to do this would be a visual decoder, where you can just position the "flags" the way you want them. I couldn't find anything on the web that did this, not even close. So, I made it.

A small detour in gesture recognition

My first attempt at making this was pretty questionable in hindsight. About a year ago I found out about the $1 Unistroke Recognizer from Brad Myers' course 05-640 Interaction Techniques at CMU. It's seriously impressive - the demo on the linked site is incredible, especially when you consider that it's trained on only one example of each gesture.

So anyways, I suddenly remembered the existence of the $1 Unistroke Recognizer and thought, perfect! I'll be able to just use my mouse and squiggle an angle that vaguely resembles the semaphore position, and then it'll magically machine-learningly decode it for me. I tried it on a couple user-defined semaphore positions and it worked!

So then I made a copy of the code locally so I could persist the templates of each letter. I threw one example of each letter in and tested and... found the results were confused. Flag positions that looked nothing alike were being mistakenly recognized! Even worse, these matches had high confidence. I went back to the original paper and found a clue:

It supports configurable rotation, scale, and position invariance,

Ah! Normally this is useful: it means that you can draw a gesture at a different angle, size, and position from the template and still have it be recognized. But in my case, every single semaphore letter is a rotation of another one. We actually don't want any of these invariants, so I spent some time digging in the code to try and disable this.
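To make the rotation problem concrete, here's a rough sketch (not code from the $1 recognizer or from the site). It assumes the eight flag positions are numbered 0 to 7 in 45-degree steps, which is my own labeling. Rotating both flags together never changes the angular gap between them, so the gap is roughly all a fully rotation-invariant matcher has left to work with.

```javascript
// Assumed labeling: flag positions 0..7, one every 45 degrees around the circle.
const POSITIONS = 8;

// Every unordered pair of distinct positions is a candidate semaphore.
function allPairs() {
  const pairs = [];
  for (let a = 0; a < POSITIONS; a++) {
    for (let b = a + 1; b < POSITIONS; b++) pairs.push([a, b]);
  }
  return pairs;
}

// The smaller angular gap between the two flags, in 45-degree steps.
// Rotating both flags by the same amount leaves this value unchanged.
function gap([a, b]) {
  const d = b - a;
  return Math.min(d, POSITIONS - d);
}

// Group all pairs by gap: each group is one shape up to rotation.
function classesUnderRotation() {
  const classes = new Map();
  for (const pair of allPairs()) {
    const g = gap(pair);
    classes.set(g, (classes.get(g) || 0) + 1);
  }
  return classes;
}

const classes = classesUnderRotation();
console.log(allPairs().length); // 28
console.log([...classes]);      // [ [ 1, 8 ], [ 2, 8 ], [ 3, 8 ], [ 4, 4 ] ]
```

So the 28 possible flag pairs collapse into just four shapes once rotation is ignored, which is nowhere near enough to tell 26 letters apart.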

After doing that and adding a few more examples for each letter, accuracy improved, but it was still far lower than you would want for a computer-aided puzzling tool. The results were especially wrong for D, R, and L, which are all straight lines - I believe this is related to the "indicative angle" step in the $1 paper, which relies on the centroid of the stroke and the starting point. It's pretty hard to draw a straight line, so this metric varies wildly for straight-ish strokes that, to the human eye, look very similar.

I showed these results to Johnny as well as my roommate Ishraq, and they said almost identical things, to the effect of Cool demo, except for the part where it doesn't really work. Why are you trying to machine learn your way out of a simple problem like this? Isn't this a really small space with a small set of fixed positions? And... they were right. So I scrapped this approach - someday I'll have a legitimate reason to use the $1 Recognizer, someday - and went back to rethink the idea.

The idea

Having tried and failed at handwaving the problem away with machine learning, I then decided to actually think about the problem.

A semaphore letter (from here on shortened to "semaphore" for brevity) consists of exactly two distinct flag positions. There are only eight possible flag positions, set at 45-degree angles from one another. Since there are so few valid positions, I realized these can just be clickable elements. Once the user clicks two flag positions, it's just a (painless, computerized) table lookup to find which letter it represents.
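The table lookup can be sketched in a few lines. This is an illustration rather than the code from the site: the position numbering (0 = straight down, then clockwise as the viewer sees it) and the names LETTERS and decode are my own assumptions, and the letter assignments follow the standard flag semaphore alphabet.

```javascript
// Assumed numbering, as seen by the viewer, clockwise from straight down:
// 0 = down, 1 = down-left, 2 = left, 3 = up-left,
// 4 = up,   5 = up-right,  6 = right, 7 = down-right.
// Keys are "low,high" position pairs.
const LETTERS = {
  '0,1': 'A', '0,2': 'B', '0,3': 'C', '0,4': 'D', '0,5': 'E', '0,6': 'F', '0,7': 'G',
  '1,2': 'H', '1,3': 'I', '1,4': 'K', '1,5': 'L', '1,6': 'M', '1,7': 'N',
  '2,3': 'O', '2,4': 'P', '2,5': 'Q', '2,6': 'R', '2,7': 'S',
  '3,4': 'T', '3,5': 'U', '3,6': 'Y',
  '4,6': 'J', '4,7': 'V',
  '5,6': 'W', '5,7': 'X',
  '6,7': 'Z',
};

// Look up the letter for two flag positions, in either order.
// Returns null for the same position twice or for a pair that is not a letter
// (two of the 28 possible pairs, 3,7 and 4,5, are not letters).
function decode(a, b) {
  if (a === b) return null;
  const key = a < b ? `${a},${b}` : `${b},${a}`;
  return LETTERS[key] || null;
}

console.log(decode(0, 4)); // D
console.log(decode(6, 2)); // R
```

Sorting the pair before the lookup means the click order doesn't matter, so the table only needs one entry per letter.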

The implementation

I've written some vanilla JavaScript before, but I thought it would be a little tedious to add all the event listeners, keep track of state, and so on. I've also written a decent amount of React, but I didn't want to deal with setting up webpack or Babel or whatever for this little project. So I figured this would be a good opportunity to try and learn Vue - I glanced over the Vue introduction and liked what I saw, copy-pasted the starter code, and got started.

I wrote the basic implementation in about a day and cleaned it up in another half-day. There are just two Vue components: a flag, which represents a possibly-selectable flag using a clickable div with a position (angle) property; and the semaphore-decoder, which creates eight flags at the appropriate angles, handles the flag click events, and displays the results.
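The state the decoder has to track is tiny. Here's a framework-free sketch of one way to handle flag clicks. It's not the site's actual logic: the names createSelection, clickFlag, and isComplete are hypothetical, and dropping the oldest flag on a third click is just one plausible choice.

```javascript
// Hypothetical state: the positions (0..7) of the currently selected flags.
function createSelection() {
  return { selected: [] };
}

// Returns a new state for a click on `position`; the old state is not mutated.
function clickFlag(state, position) {
  const { selected } = state;
  if (selected.includes(position)) {
    // Clicking an already-selected flag deselects it.
    return { selected: selected.filter((p) => p !== position) };
  }
  if (selected.length === 2) {
    // Assumption: a third click drops the oldest selection.
    return { selected: [selected[1], position] };
  }
  return { selected: [...selected, position] };
}

// A semaphore is ready to decode once exactly two flags are selected.
function isComplete(state) {
  return state.selected.length === 2;
}

let state = createSelection();
state = clickFlag(state, 0);
state = clickFlag(state, 4);
console.log(isComplete(state)); // true
```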

Writing Vue is a great experience. The concepts of v-bind and v-model are easy to pick up, and the documentation overall is very helpful (and searchable, too). I only needed to go to non-official documentation once, to figure out how to attach a global keydown listener (spoiler: you can't, but there are decent workarounds). And I was slowed down a bit by simple syntax errors since I was writing in a single file with string templates, so no IDE support.

But those are both pretty trivial issues. I spent a lot more time messing with CSS to get the flags to display correctly than I did with any Vue issue. I'll definitely use Vue again the next time I have a small project like this. In particular I really like the $emit event mechanism for communicating from child to parent - it feels a lot more natural than passing callbacks in React: like throwing an exception versus passing a continuation.
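For anyone who hasn't seen $emit, here's a stripped-down sketch of the pattern using Vue's Options API with string templates. It's not the site's source: the event name flag-click, the props, and the handler names are made up for illustration, and the objects would still need to be mounted with Vue (not shown).

```javascript
// Child component: it only announces that it was clicked.
// It holds no reference to the parent and no callback prop.
const Flag = {
  props: ['position', 'selected'],
  template: '<div class="flag" :class="{ selected: selected }" @click="onClick"></div>',
  methods: {
    onClick() {
      // 'flag-click' is a hypothetical event name.
      this.$emit('flag-click', this.position);
    },
  },
};

// Parent component: it listens for the event in its template
// and decides what a click means.
const SemaphoreDecoder = {
  components: { flag: Flag },
  data() {
    return { clicked: [] };
  },
  template: `
    <div>
      <flag v-for="n in 8" :key="n" :position="n - 1"
            :selected="clicked.includes(n - 1)"
            @flag-click="onFlagClick"></flag>
    </div>`,
  methods: {
    onFlagClick(position) {
      this.clicked.push(position);
    },
  },
};
```

The React equivalent would pass an onClick function down as a prop; here the child just emits and whoever is listening picks it up.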

The interface

After the basic implementation, I had a single-letter semaphore decoder - great! But most of the time you actually need to decode several letters at a time. Clicking twice and writing the letter down on paper, or worse, alt-tabbing to type the letter, didn't seem like a good choice. So, I added a text buffer that you can push letters to by hitting the Enter key.

Originally this text buffer was a simple div, so you couldn't change the results after they were pushed. There did exist a Clear button to empty the buffer entirely, but losing progress is never fun. Johnny pointed out that it would be a lot better if it was a text input box, so you could just manipulate text in the normal ways: hit backspace to delete a single character, select-all, copy-paste, and all that.

There's also a quick link to anagram your current result on Nutrimatic, a powerful pattern-matching tool. Often the letters in the semaphore message will be jumbled, and you need to use additional information to get the correct ordering. However, it's often faster to just directly anagram the letters. Using Nutrimatic also synergizes with the textbox, because you can use Nutrimatic's pattern language syntax: like capital A (to represent any letter), if you're unsure of a symbol - and that will be anagrammed as well.
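Building that link takes only a few lines. This sketch is not the code from the site and rests on my understanding of Nutrimatic: that the search page takes its pattern in a q query parameter, and that angle brackets mean "anagram of these symbols". Please check both against Nutrimatic's own syntax notes. The name anagramUrl is hypothetical, and I'm assuming decoded letters are stored in lowercase so that a typed capital A keeps its "any letter" meaning.

```javascript
// Assumed base URL of the Nutrimatic search page.
const NUTRIMATIC_URL = 'https://nutrimatic.org/';

// Build a link that anagrams whatever is in the textbox.
function anagramUrl(text) {
  // Keep letters only, preserving case: under the assumptions above,
  // lowercase letters are literal and a capital A stands for any letter.
  const symbols = text.replace(/[^A-Za-z]/g, '');
  // Assumed Nutrimatic syntax: <...> anagrams the enclosed symbols.
  const pattern = `<${symbols}>`;
  return `${NUTRIMATIC_URL}?q=${encodeURIComponent(pattern)}`;
}

console.log(anagramUrl('tac'));   // https://nutrimatic.org/?q=%3Ctac%3E
console.log(anagramUrl('t A c')); // https://nutrimatic.org/?q=%3CtAc%3E
```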

With some more testing, I found that using Enter to add a letter to the input box had an issue: if you had the "Clear" button focused, hitting Enter would write the letter, and then trigger the clear event and immediately delete it. I initially changed the button to be a div (which can't be focused) styled to look like a button, but realized later that this problem is unavoidable for links. So, I changed the shortcut key to . (period) instead. Letter keys were out since you might want to type actual letters into the textbox, and modifier keys like Ctrl and Shift resulted in weird interactions, like accidentally double-adding the current letter if you Ctrl-Tab to a different tab.
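Here's a rough sketch of that shortcut handling, written as plain functions so the key check can be tested on its own. It's an illustration under my own assumptions, not the site's code: shouldPushLetter, attachShortcut, and pushLetter are hypothetical names, and registering the listener on window from a component lifecycle hook is just one common workaround for a global keydown listener.

```javascript
// Decide whether a keydown event should push the current letter.
function shouldPushLetter(event) {
  // Ignore browser shortcuts such as Ctrl-Tab.
  if (event.ctrlKey || event.altKey || event.metaKey) return false;
  // Ignore auto-repeat while the key is held down.
  if (event.repeat) return false;
  return event.key === '.';
}

// Attach the shortcut to a target (window, in a browser).
// Returns a function that removes the listener again.
function attachShortcut(target, pushLetter) {
  const handler = (event) => {
    if (!shouldPushLetter(event)) return;
    // Keep the period itself from being typed into the textbox.
    event.preventDefault();
    pushLetter();
  };
  target.addEventListener('keydown', handler);
  return () => target.removeEventListener('keydown', handler);
}
```

In a browser this would be something like attachShortcut(window, pushLetter) when the component is mounted, with the returned function called when it is torn down.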

Finally, I tried the site on mobile, and it was hot garbage. The flags were too thin to click on, and the text was all tiny. So I spent some time making it usable on mobile, starting with Skeleton and eventually just setting a pretty small max-width on everything, even on desktop. After I got it working on the Responsive Design Mode in Firefox DevTools, I spun up a local webserver (python -m http.server) and pulled up the site on my phone... only to see a different flavor of hot garbage. At least it was new and exciting garbage.

Now all of the flags were unreasonably tiny and unclickably close together, for some reason. I wanted to pull up DevTools on my phone, but apparently you can't do that at all. Luckily, it turns out you can do remote debugging over USB on Android! I tried it, and it turned out my mobile Firefox version was too old (68.6.0) for my browser (76.0.1) to remote debug. Then, I installed Firefox Nightly on my phone, fired it up, and found... that the site rendered correctly? And then I had nothing to actually remote debug, which is too bad.

So, the site is currently broken on Firefox on Android (at least, on my phone). I think the root cause is probably related to how CSS viewport units are handled, since that's how the flag size is calculated. But there's reason to believe that it will work in a few browser updates since it works on Nightly.

Evaluation

It's lame to claim that you made things better without backing it up, so I did a casual self-evaluation.

I tested the four following semaphore decoding methods:

  1. Regular lookup table (as above)
  2. Pie-slice lookup table
  3. My semaphore decoder, without the textbox
  4. My semaphore decoder, with the textbox

I expect this ordering to be from slowest to fastest. I don't expect to see any learning effect here - I've decoded plenty of semaphore by hand and I don't think it gets much easier (that's why I built this in the first place) - but just in case, I will start from the expected fastest method and move up towards the expected slowest.

I asked Johnny to send me four semaphore messages (generated using dCode's semaphore tool). Each message was 15-18 letters long. I used exactly one message per method, because I don't want my knowledge of the message to affect the results. I timed using my phone stopwatch, with the tables/decoder pre-opened, starting the timer right before I looked at the message and stopping after I finished decoding the last letter.

Here are the results, from fastest to slowest:

rank      predicted rank   method                time (seconds)
1 (tie)   1                decoder + textbox     64
1 (tie)   4                lookup table          64
3         3                pie-slice table       71
4         2                decoder, no textbox   72

So, pretty surprising results! Note that the gap between ranks 3 and 4 is very small, probably within error bars if I had any. The lookup table and pie-slice table did better than I expected.

There's an interesting effect I noticed while testing: if you can predict the next letter, the lookup table becomes a lot more effective. If I see a Q, I can guess that the next letter is probably a U - bigram (and longer sequence) frequencies can be used for automated code breaking, by trying different decodings and checking if the resulting plaintext has reasonable bigram frequency. Here's an example of that using quadgram frequency.
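To show what bigram scoring looks like, here's a toy sketch. It's an illustration only: the counts come from a single pangram (SAMPLE) rather than a real corpus, the names are my own, and add-one smoothing is just a simple way to keep unseen bigrams from scoring negative infinity. A real code breaker would use frequencies from a large body of text.

```javascript
// Tiny stand-in corpus; a real scorer would use much more text.
const SAMPLE = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG';

// Uppercase the text and keep letters only.
function letters(text) {
  return text.toUpperCase().replace(/[^A-Z]/g, '');
}

// Count every adjacent letter pair in the text.
function bigramCounts(text) {
  const s = letters(text);
  const counts = new Map();
  for (let i = 0; i + 1 < s.length; i++) {
    const bigram = s.slice(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }
  return counts;
}

// Build a scoring function: the sum of log-probabilities of a candidate's
// bigrams, with add-one smoothing over all 26 * 26 possible bigrams.
function makeScorer(corpus) {
  const counts = bigramCounts(corpus);
  let total = 0;
  for (const c of counts.values()) total += c;
  const vocabulary = 26 * 26;
  return (candidate) => {
    const s = letters(candidate);
    let score = 0;
    for (let i = 0; i + 1 < s.length; i++) {
      const c = counts.get(s.slice(i, i + 2)) || 0;
      score += Math.log((c + 1) / (total + vocabulary));
    }
    return score;
  };
}

const score = makeScorer(SAMPLE);
console.log(score('QU') > score('QZ')); // true
```

Higher (less negative) scores mean the text looks more like the corpus, so a solver can try several decodings and keep the best-scoring one.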

I believe this prediction effect is why the lookup table performed better than I expected. Johnny chose messages that were well-known phrases (specifically, variations on inside jokes). So, they were predictable even across word boundaries. I would expect the lookup table to perform worse in a puzzle hunt setting, where answers are less predictable, and possibly even in a scrambled order that needs to be anagrammed or reordered with additional information. Testing this idea is left as an exercise to the reader because Johnny won't generate any more messages for me and this post is already 2000 words long and I'm not even done yet.

It's also interesting to see that the decoder with no textbox is actually worse than using the pie-slice table. I think this is caused by context-switching: you need to either click to a different window to type the result, or pick up a pen and physically write down the letter. Meanwhile, with the pie-slice table you can just look back and forth and write as you go.

Finally, I found that using . to enter text was a little rough because of the lack of feedback - sometimes I had to go back and look to see if the letter actually was added, because I wasn't sure. I think in a future version I'll try adding a small audio or visual cue to make it clear when a letter is added via the . shortcut.

Conclusion

I built and iterated on a visual semaphore decoder in just a couple days with the help of my friend Johnny. It works! It's probably faster than looking them up in a table. It definitely feels better to me.

The code is a single HTML file, using Vue in an inline script. This was my first Vue project, so I probably did some questionable things. The source is available on GitHub - if you have any feedback, please open a pull request or contact me!

Thanks to Johnny Mok and Ishraq Bhuiyan for feedback on the project. Thanks to Johnny Mok for feedback on this post.