How to Recover a Forgotten Password

I’m able to write this blog post because I have just recovered my OS X login / keychain password half an hour ago. If you’re in a rush you can skip the story part and go to the guide.

My Story

I came back from a two week vacation and after 2.5 days, shortly before lunch, I couldn’t type my password anymore. I’ve been using this password for the last couple of years and entered it on average maybe 10 times per day. The visual or concrete representation I had long forgotten. I was only relying on my muscle memory and the rhythm when entering it. Now something was missing and with every failed attempt the stress level increased and I started training the wrong password.

After two days I decided to start from scratch. Clean install, reset all passwords (I still knew my email password), take care of online banking accounts, paypal, serial numbers and pass phrases for my ssh keys which were deployed on about 700 hosts (!!!) – all of which was stored in my Keychain on OS X. I still had an unencrypted backup from 6 months ago with the bulk of my old data. My online backup, which I was so proud about, was encrypted with the same forgotten password, which rendered it useless. My SSD was FileVault encrypted so the recent data was completely lost.

In parallel I started a text file and every now and then tried to tap my muscle memory for the correct password to avoid repeating the wrong one and also to see which parts of the password were the same each time.

From that I knew that the password was between 17 and 19 characters long. I was pretty confident about the first 11 and the last 2 characters which left me with 6 unknown. Luckily I could also limit the set of characters that could have appeared in that position and with healthy brute forcing effort I finally have my password back. I forgot about 2 characters.

Recovering a Forgotten Password

Step 1: Write down everything you still remember about your password

Which phrase, word or thing it was based on. Estimate how long it was. Make a couple of muscle memory attempts in a text file and try to find parts you think are correct. Try to identify the parts were you feel something is off. Were any characters doubled as in ‘aa’ or ‘pp’ or were all characters in the password unique? Did you use lower case, upper case, numbers and symbols?

Step 2: Create a Charset

Make a list of characters which were part of the password and add those characters which you would use for passwords in general. Exclude the once which you are certain that you wouldn’t use them. Put the characters in which you are unsure about. Keep that list as short as possible. Sort that character list: lowercase letters, uppercase letters, numbers, symbols

Step 3: Create a Word List

Generate permutations of your password. To do this you can download one of many tools. The one I have used is called crunch and this is how I have used it:

./crunch 19 19 -f ./charset.txt hukl \
-o ./wordlist_18.txt -d 1@ -d 1, -d 1% -d 1^ \
-t FooBarBazBo@@@@@@\}\;

This is telling crunch to create password permutations which are 19 characters long, using my custom list of characters, with no duplications of lowercase letters (@), uppercase letters (,), numbers (%) and symbols (^). I also provided a pattern with all the parts of the password which I felt quite confident about and used the @ symbol at the positions I wasn’t sure. The @ symbol in the pattern is replaced with characters from the provided list of characters. Even though I knew 13 of 19 characters and I limited the character set, the resulting wordlist was still 8GB large. It grows quickly the less certain you are.

Step 4: Acquiring the Hash

Passwords are usually not stored in plain text. Otherwise the recovery process would be quite simple. In most cases they will be stored using a cryptographic hash algorithm which only allows encryption but not decryption. When trying to bruteforce a password you need to know which algorithm was used and then use the same algorithm for your variations of the password. The process is successful if the same hash is found. Now in my case I wanted to unlock the OS X login keychain to get access to my passwords, serial numbers etc. To do that I needed to extract the password hash from the file. Luckily, JohnTheRipper, the tool which I used for the bruteforcing, came with a keychain2john converter tool.

Many other files can be handled directly. Google what you need to do for your use case. I’d guess that many other people had the same problem as you.

Step 5: Brute Force Time

The most popular tools are JohnTheRipper and hashcat. Look them up and read what they can and can’t do. Read the documentation to find which options suit your scenario. Mine looked looked like this:

./john --wordlist=wordlist_19.txt ~/keychain/converted_keychain

There are many more options available but my work was mostly in crafting the wordlist so JohnTheRipper was just needed to efficiently go through it. There is an option to use multiple process forks but my wordlist was to large for that option so I let it run on just one core at about 10k passwords per second. If that would have failed I would have used hashcat on a machine with plenty of GPU power but luckily I didn’t have to go that far.

After 6 hours, half way through the wordlist, my password was found. My muscle memory forgot about 2 characters!

List Of Tools

Final Advice

  • Have a recovery strategy for your login password / your password manager master password
  • The shorter you can make the wordlist the better. If you know something about the order try using this to compile a shorter list. A colleague of mine wrote a quick and dirty program in Go to do that. Basically for each position in the password you can provide a list of characters that are likely to appear there. This way you can define an order and a character set and keep the wordlist short.
  • Print out those recovery keys
  • Have a local unencrypted backup if your individual paranoia level allows it
  • https://xkcd.com/936/

FreeBSD Slow ZFS Bootloader

This is just a quick little post for people running into the same problem and spending a lot of time to figure this out.

I was setting up a new server which was built with a Supermicro X11SSL-F motherboard. I booted via IPMI and installed FreeBSD 10.3 with a ZFS Root Mirror setup. After the reboot, the boot loader took forever to make progress. The whole boot process took about 10 minutes where most of the time was spent in the BTX Loader and the boot menu screen with the little spinning progress indicator making a move only every second.

Once the kernel was loaded, the rest of the boot process was fast again.

I suspected some weird BIOS / SATA interaction and first tried all kinds of BIOS settings to fix the issue but with no success. I tried different SSDs and HDDs and I tried using a non mirror setup. Lastly I checked whether UFS had the same problem because the installer from the CD image bootet just fine and indeed installing FreeBSD 10.3 on UFS worked normally.

Then I tried installing FreeBSD 11-ALPHA-3 and with that my ZFS Root Mirror setup booted lightning fast as well. That meant it was really the ZFS boot loader (gptzfsboot/zfsloader) that was problematic.

Lastly I tried installing 10.3 on a ZFS Mirror again, then booting the FreeBSD 11 image and copying the boot loader from 11 into my 10.3 /boot directory because a newer boot loader should be able to load older versions of FreeBSD.

So once I booted from the FreeBSD 11 image again, I imported and mounted my installed pool to /mnt and copied all boot / zfs related files from the image to /mnt/boot/ (watch out for /boot/loader.conf and /boot/zfs/zpool.cache!).

Then I rebooted and now the boot process was lightning fast like in the beginning when I installed 11 only that it was now booting 10.3.

The files I’ve copied are: /boot/pmbr, /boot/gptzfsboot and /boot/zfsloader

Also see »How FreeBSD Boots on ZFS«

Truth Tables (… you can handle the truth)

About eight years ago I worked on my first major commercial software project. Before I worked as a freelancer and had mostly rather small and dull projects. This project however had a team of eight people plus management and it was surprisingly complex at times.

One day we had to implement a feature which depended on four independent conditions. Independent in this case means that there were no exclusive branches of logic. To implement it multiple branches with similar but not identical logic were necessary. I was a bit scared because I knew that this would be one big ugly hairball of a nested if / case statement which is impossible to get right on the first attempt and will probably be hard to test.

Then our head of engineering came along, an experienced old school hacker, who told me: »Lets implement it with a truth table!«

At first I didn’t know what he was talking about and then I felt like this would be some kind of arcane low level hack solution to this complex problem but it turned out to make everything clean and easy to reason about.

Since then I had maybe three or four other occasions where I used truth tables to solve similar complex problems and I was always grateful for this advice. Every time I proposed to use a truth table in these situations there were always some colleagues who either didn’t know about them, just like me seven years ago, or knew them theoretically but never used them in practice.

The last occurrence was just recently in my current project at Wooga which is why I thought it might be a good idea to share the concept with everyone who does not already know what I’m talking about.

Show Me The Tables

If you have studied computer science and/or philosophy you should have seen truth tables on paper. They are often used to describe the behavior of logic gates for example. This is how a truth table for an AND gate would look like:

 A | B | Output
---|---|-------
 0 | 0 |   0
 1 | 0 |   0
 0 | 1 |   0
 1 | 1 |   1

This table tells you that the output of an AND gate will be 1 only if input A AND B are 1.

In philosophy the same tables are used for reasoning about logic statements in arguments
(philosophers, please correct me here if thats not accurate enough, I know you care :).

Practical Application of Truth Tables

For practical applications truth tables have a slightly different form. Imagine we have a website where users generate a lot of content and they have the option to download their entire history which is so massive that we have to process and compress it. Now the app can be in a couple of different states and in only one of the states the user can actually download the archive.

Lets start simple and say we have two boolean variables which are important to determine the state of our application.

requested_archive:   true | false
processing_finished: true | false

For each of the variables we would now assign a unique binary flag:

requested_archive:   01 (decimal: 1)
processing_finished: 10 (decimal: 2)

Now the truth table would look like this:

requested | processing |
archive   | finished   | result | meaning
----------|------------|--------|----------------------------- 
    00    |    00      |   00   | not requested, not processed
    01    |    00      |   01   | requested but not processed
    00    |    10      |   10   | impossible
    01    |    10      |   11   | requested and processed

The basic idea here is to evaluate the conditions separately and use a binary OR to compute the resulting bit flag.

Like I said, it is a simple example which does not strictly require implementing a truth table but even in this situation a truth table in form of a comment above the nested conditional statement can have a benefit for the next person reading your code.

Also it provides an overview of all the possible combinations and what cases are actually relevant for you. You know what unit tests to write and you can be certain that you’ve covered everything.

Implementation

Ok now lets implement this truth table. We already know what cases we have and what they mean so the first thing is to assign them symbolic names in your code. In this example I will use ruby but it should work in a similar way in your favorite language.

For this simple example it might not seem obvious why truth tables are great but it illustrates some aspects already. The evaluation of conditions is isolated into separate functions which you can unit test very easily.

When you unit test the archive_state function and you get an unexpected state, the bit flags will tell you exactly which of the conditions was failing your expectations.

You give each state a symbolic name and you avoid a nested conditional.

I know, I know this looks crazy complex but its just an example and as soon as you add one or more conditions it will be much more clear why I prefer a truth table over nested conditions any time.

The next code example is from a real world erlang application and its again about determining the state of an application. Based on the day of the season, the state of the user data and the state of the processing we have to do different things and there is no exclusive branch for one of the conditions which means the implementation would be painful. (Note: there is a hidden fourth condition in the first one, checking that the season is really over as in all matches have been simulated on the last day of the season)

With truth tables we know we have 2^3 possible combinations, of which two are impossible, we know exactly which cases mean what and what we want to do for each of them, we also can implement each conditions in neat and tidy functions which are easily unit testable, we have symbolic names and the final case statement is actually readable.

Without the comment which shows the actual truth table and without all the other comments, this whole code snippet would be a lot less intuitive of course but with them it is self sufficient documentation.

The best part is, even if you have four or five conditions the complexity of the code does not increase, it is just more work to write down the initial truth table of 16 or 32 possibilities but the implementation would follow the same pattern and the evaluating case statement would still be easy to read.

Personally I went as far as four conditions and when I would encounter a situation where five conditions are evaluated I would first spent a few hours trying to reduce the amount of conditions maybe even refactoring code but if you really have this kind of complex system, truth tables are your best chance to implement it without losing your sanity.

Let me know in the comments if that was helpful.

Tape vs Digital

We have recently recorded one of our bands rehearsals with a few dedicated microphones to get a better sound than with the Zoom H1 which we are normally using. We had one large condenser mic for the drums, one shure sm57 for both guitar amps and a DI box from Atelier der Tonkunst for the bass, recorded through a Focusrite Saffire Pro 40 into Reaper.

We had a few nice ideas for new songs which were recorded and later mixed by me. Then I thought I will do something special and record one of those jams onto a compact cassette via a ONKYO Integra TA-6711 tapedeck just to see how it would sound.

I connected the tapedeck to my Focusrite interface, recorded the song straight from Reaper onto a Sony chrome cassette and then recorded it back into reaper from the cassette.

I tried to match the two resulting files in average loudness (RMS) and would like to know if people can figure out which is which by just listening to the two versions, without looking at the waveforms because that would give it away I guess. Just try to listen to it back and forth. Personally I was surprised how close both versions are and have doubts whether I could tell them apart on cheap headphones.

So remember, its just a rehearsal room jam – not a finished song, right? Here are the two 24bit/48kHz aiff files for you. Let me know in the comments which one you think is which and I will resolve it in a few days:

Calypso_A.aif
Calypso_B.aif

Bonus Question: Which one do you like better?

Click Here for the Solution