Online Backup Services Revisited

In my last post I compared several online backup services and decided that Crashplan was my personal winner on features and price. In the following weeks a couple of people started to complain about the slow transfer rates to its backup servers. I also tested the upload speeds from several networks in Germany, on internet connections with up to 100 MBit/s in both directions, and I could not get any transfer rate above 3 MBit/s.

I also contacted support, who confirmed that Crashplan is not limiting the bandwidth at all. The Crashplan application settings contain a few options which potentially limit the transfer rate, but changing those did not improve the situation for me. I’d be interested to know if US customers get better transfer rates.

Anyway, it is quite unusable like this, so I finally gave up on Crashplan for now. Instead I am now evaluating Arq. Since it uses Amazon S3 storage, you can choose the region in which the data center is located and that seems to make a huge difference. I was able to upload my backups at up to 4 MByte/s (32 MBit/s) on a 100 MBit/s network, which is still not wire speed but much better than the transfer rates of Crashplan. The graphical user interface of Arq is also surprisingly simple and pleasant to use. But this is just a small update. More after I have used Arq for some time.
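To put the two transfer rates into perspective, here is a rough back-of-the-envelope sketch. The 100 GB backup size is purely an assumption for illustration and not a number I measured, and the helper name upload_hours is made up.

```shell
#!/bin/sh
# Estimate how many hours an upload takes at a sustained rate.
#   $1 = backup size in GB (decimal, so 1 GB = 8000 Mbit)
#   $2 = sustained upload rate in Mbit/s
upload_hours() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbit / 3600 }'
}

# 100 GB is a hypothetical backup size, not a figure from this post.
echo "Crashplan at 3 Mbit/s: $(upload_hours 100 3) hours"
echo "Arq/S3 at 32 Mbit/s:   $(upload_hours 100 32) hours"
```

Under these assumptions the initial backup takes roughly three days at 3 MBit/s versus about seven hours at 32 MBit/s, ignoring protocol overhead, compression and de-duplication.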

UPDATE

Arq 2 was just released: http://www.haystacksoftware.com/blog/2011/08/arq-2-is-out/

Evaluating Online Backup Services

Backups, the everlasting topic. Years after several hard drive crashes, after manual backups and semi-automatic backups, I’m still thinking about the right solution.

Current Setup

Pile of hard drives

Currently I’m using a USB/SATA adapter to connect various hard drives without enclosures to my computer to run semi-automatic Time Machine backups. With OS X 10.7 (Lion) Time Machine will add local snapshots while I’m not connected to my backup drive and write those snapshots to the backup drive once I connect it again. That is all quite nice but not an optimal solution. Over the years I’ve collected quite a few drives and with every new drive I get more annoyed by the pile they form on my desk. This is still better than no backup but it requires me to connect my drives regularly to my computer. Sometimes I do – sometimes I don’t.

NAS Options

My setup could be vastly improved by using a NAS or Apple’s Time Capsule. This would allow me to back up over the air, with no wires and drives lying around on my desk.

I don’t like Time Capsule because it is just one drive in a plastic box with no direct access to the hard disk or the files. You can attach a recovery disk but that only makes it a little better. Only one drive also means no data redundancy. If the disk crashes, the backup is gone. As far as I know there are also no checks for data integrity, so if some bits flip on the disk you will never know. If that’s not correct please leave a comment.

When I buy something like a network attached storage I would also want to use it as a fileserver and other things and Time Capsule just doesn’t offer this kind of flexibility.

The other option I considered was buying a NAS. I really don’t like the consumer plastic boxes like the Drobo, Qnap and Synology products. On the one hand they offer lots of nice features like Time Machine compatibility, web interfaces, fileserver and file sharing features. On the other hand they are quite expensive, most of them are ugly, they use filesystems like Ext3/4 or HFS+ and I have also heard real horror stories of complete data loss, especially with Drobo. To be fair, these stories are one or two years old.

Then I thought about buying an »Acer Aspire easyStore H341« or an »HP ProLiant – MicroServer« and building my own custom NAS. These are basically small Atom powered PCs with four HDD slots and no display connector. They usually come with Microsoft Windows Server and require some time investment to get going. I thought about using FreeNAS as I like FreeBSD and it uses ZFS as its filesystem. It comes with a nice interface and offers everything I would need: a modern filesystem with data integrity checks, fileserver and file sharing capabilities and Time Machine compatibility.

But again, with hard drives included it wouldn’t be cheap. It would still be no real offsite backup, it could still suffer from hardware failure or theft. Besides the costs for the hardware and the time of setting everything up, there are also some costs for power as this machine would have to run 24/7/365.

One final disadvantage of all those »local« backup solutions is that I have to be home to run my backups, but I also want to back up when I’m at work or somewhere entirely different.

This is why I think that those NAS solutions are not right for me.

Online Backup Services

So what are the alternatives? After coming to the conclusion that I actually don’t want a NAS device I thought about online backup services.

Of course, the first question that comes to mind is privacy and security in general. I don’t want to hand over my precious data to some company without strong and secure encryption and by that I mean that nobody but me should ever be able to get my data.

The second thing to consider is storage. How much do I have to pay for how many gigabytes? How redundantly is my data stored and is it checked for data integrity?

Luckily a quick Google search revealed a couple of interesting options, among them Crashplan, Arq, Jungledisk, Spideroak and Backblaze.

All of those offer encrypted backups. The data is encrypted locally before it is sent to the storage servers on the internet. All of those services publish detailed information about their architecture and features and all of them seem to have happy customers.

Personally, after researching for two hours, I think I will try Crashplan and here is why:

Pricing

Crashplan is cheap. It’s not the cheapest but it’s cheap enough. You get to back up one computer with unlimited online storage for $49/year and they have offerings for multiple computers too.

Arq and Jungledisk store the data on Amazon S3 or Rackspace Cloud, which are a little more expensive than the other services with their own data centers. The client software of Arq costs another $29.

Spideroak is charging $10 per month per 100 GB.

Security

All the named services offer good encryption and they seem to take similar approaches as well. The important thing is that they offer the option to use a self-generated private key, which is crucial for having completely private backups. Even if the police took away all the machines they wouldn’t be able to get to the actual data. Spideroak and Crashplan explain the encryption process in great detail on their websites.

Consistency / Data Integrity

Arq and Jungledisk can use S3, which is considered to be quite safe from data corruption, but there are also stories of missing data floating around. Nobody is giving you a full guarantee. Spideroak is claiming a 0.0000% error margin. Crashplan claims daily data verification and automatic repair should the data ever get corrupted.

The named services seem to have a good reputation of not losing data.

On the Arq website there is also a section about metadata and how the different services manage to keep track of it. The systems are tested with a software called Backup Bouncer. JungleDisk and Arq seem to be the only ones passing all tests, Crashplan fails in one test, Dropbox and Backblaze fail in 19 of 20! The section might be outdated though and since Backup Bouncer is a free tool you can verify it yourself.

Software / Integration

Each of these services comes with some kind of software. Arq and Backblaze have native OS X clients while the others have multi-platform tools that do not feel like native apps. This is the only real drawback I found with Crashplan.

Extras

Interestingly enough you also get de-duplicated, compressed and encrypted backups on all these services. With Crashplan you can even choose not to use de-duplication, to reduce the potential CPU load on your computer while checking for duplicate data. The backups are of course differential, which means that only changed data is transmitted, not entire snapshots (except the first). Crashplan allows unlimited file sizes while other services have file size limits of 4GB! It can back up locked files and if you decide to back up your OS X unix directories Crashplan will happily do so.

Overall Crashplan seems to offer fine-grained control over various aspects of backups – which I like. Their support seems to be alright too. I asked how de-duplication actually works and I got a reply within four hours on a Sunday, without having an account or anything.

As I said, I will try Crashplan and in addition I will keep backing up irregularly to my external Time Machine disk – just to be sure.

I know that there are a lot of other tools out there and I’m still interested in other suggestions, although I’ve probably checked them out already.

You might also want to check out Wikipedia’s »Comparison of online backup services«.

It’s worth checking out the FAQs and detailed features of all those services as they usually answer most of the questions you come up with.

Lastly you can google for “Service A vs Service B” and you will find many more articles like this one on the web to make up your own mind.

UPDATE 1

Somebody on Twitter just pointed me to this post in the Crashplan Support forum where a native Mac menu bar app in beta status is available.

UPDATE 2

Thomas posted a link in the comments to a comparison matrix that he made.

UPDATE 3

Another interesting hint from the comments: Dolly Drive
Apparently they offer Time Machine backups in the “cloud”. Unfortunately their FAQ is a little short on details, especially on security and data integrity, so I guess I will write them a mail and put the info into another post.

UPDATE 4

Several (European) readers pointed out that the upload to the Crashplan data center is really slow, maxing out at 1.3Mbps. This is definitely a major drawback for European customers and something where Arq or European providers could shine.

Camel Case in MySQL Table Names is a Bad Idea

Today at work I encountered all kinds of “naming schemes” for MySQL tables and columns. Camel case table names in particular can cause serious pain because:

  1. Table names directly correspond to filenames on your hard drive
  2. There are tons of different filesystems and some of them are case-insensitive. So if you develop on OS X (case-insensitive by default) but deploy on Linux (case-sensitive) things can get funny quickly
  3. Different SQL servers handle camel case / case sensitivity differently. When you switch to PostgreSQL or Oracle you are likely to encounter problems
  4. Read this document to learn about possible implications in MySQL itself

If you use lowercase table names, separated by underscores, you can skip all those potential problems. Luckily renaming tables is not as expensive as altering them.
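If you have a whole bunch of tables to fix, the rename statements can be generated instead of typed by hand. The following is only a sketch: the table names are made up, the helper names to_snake_case and rename_statements are hypothetical, and the conversion only splits at lowercase-to-uppercase boundaries, so review the output before feeding it to MySQL.

```shell
#!/bin/sh
# Convert a camelCase / PascalCase identifier to lowercase_with_underscores.
# Only inserts "_" between a lowercase letter or digit and an uppercase letter.
to_snake_case() {
    printf '%s\n' "$1" \
        | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' \
        | tr '[:upper:]' '[:lower:]'
}

# Read table names (one per line) on stdin and print a RENAME TABLE
# statement for every name that would actually change.
rename_statements() {
    while IFS= read -r table; do
        [ -n "$table" ] || continue
        new_name=$(to_snake_case "$table")
        if [ "$table" != "$new_name" ]; then
            printf 'RENAME TABLE `%s` TO `%s`;\n' "$table" "$new_name"
        fi
    done
}

# Example with hypothetical table names; "invoices" is already fine
# and therefore produces no statement.
printf '%s\n' UserAccounts orderItems invoices | rename_statements
```

Of course the application code referencing the old names has to be changed along with the tables.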

Cannot delete File / unmount disk because it is in use …

On OS X there are these moments when Finder tells you that the trash cannot be emptied or that a disk cannot be unmounted because some files in/on them are still being used. When emptying the trash, Finder even tells you about the files in question but not about the app that is accessing them.

There are two ways to find out:

1. opensnoop

With opensnoop you can display what files are currently being accessed (as in live) including the process id and the name of the application. Either you can display all the files or just the one you are interested in.

For example I have an image on my desktop. I can attach to that file and when I open it via double click in Finder I get the following output:

sudo opensnoop -f /Users/hukl/Desktop/IMG_0434.JPG 
Password:
  UID    PID COMM          FD PATH                 
  501  10244 Finder         9 /Users/hukl/Desktop/IMG_0434.JPG 
  501     32 mds           15 /Users/hukl/Desktop/IMG_0434.JPG 
  501  10278 Preview        6 /Users/hukl/Desktop/IMG_0434.JPG 
  501  10278 Preview        6 /Users/hukl/Desktop/IMG_0434.JPG 
  501  10278 Preview        7 /Users/hukl/Desktop/IMG_0434.JPG 
  501  10278 Preview        8 /Users/hukl/Desktop/IMG_0434.JPG 
  501     32 mds           15 /Users/hukl/Desktop/IMG_0434.JPG 
  501  10278 Preview        6 /Users/hukl/Desktop/IMG_0434.JPG

This only helps if the file is being actively accessed, though. More often an application only holds a reference to the file, preventing Finder from deleting it. In this case opensnoop is no good but luckily there is another way:

2. lsof

lsof basically lists information about all files opened by applications. Therefore if I want to know why I can’t delete this image I just opened I can run:

lsof | grep /Users/hukl/Desktop/IMG_0434.JPG 
Preview   10278 hukl    8r     REG               14,5    1584476 483868 /Users/hukl/Desktop/IMG_0434.JPG

Now that I know that Preview.app is still accessing the file I can kill the process and delete the file.
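If you need this more often, the lookup can be wrapped in a small helper. This is just a sketch: the function name holders is made up, and it assumes the usual lsof column layout with the command name in the first and the process id in the second column. The sample line is the lsof output shown above.

```shell
#!/bin/sh
# Read lsof output on stdin and print one "COMMAND PID" pair per process,
# skipping the header line that lsof prints when it is not piped through grep.
holders() {
    awk '$1 != "COMMAND" { print $1, $2 }' | sort -u
}

# Demo with the sample line from this post:
printf '%s\n' 'Preview   10278 hukl    8r     REG               14,5    1584476 483868 /Users/hukl/Desktop/IMG_0434.JPG' | holders

# On a real system lsof can be given the file directly, which avoids
# listing every open file and grepping through the result:
#   lsof /Users/hukl/Desktop/IMG_0434.JPG | holders
```

The second column is the process id you would pass to kill.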

Many times it’s Finder itself still holding references to the files even if all the applications are closed and there is no apparent reason for not deleting the file. In this case option-click on the Finder icon in the dock and relaunch Finder (you can also kill it in Terminal of course). After that the files should be deletable and the disks should be unmountable.

Using the Intel 510 Series SSD in a 2011 MacBook Pro at full speed and with TRIM

I just got a new MacBook Pro from my current employer and since I got it without an SSD I bought the Intel 510 250GB and installed it. Everything worked smoothly after the first boot. However, as @denis2342 pointed out, there are a few extra steps to make it run at full speed and performance.

First of all, although this MacBook Pro has a SATA-III interface with up to 6 Gigabit, the System Profiler only showed a »Negotiated Link Speed« of 3 Gigabit. In order to make it negotiate 6 Gigabit an SMC reset has to be performed. Basically you have to press the (left side) Shift-Control-Option keys and the power button at the same time and after that you have to boot normally.

After that System Profiler showed a »Negotiated Link Speed« of 6 Gigabit.

Then, although OS X enables TRIM support for Apple’s own SSDs on the latest MacBook Pros, it doesn’t enable it for 3rd party SSDs. There were workarounds which involved patching a core framework library, but they were kind of messy and not something you’d recommend to any beginner. Luckily there is now a tool called »TRIM Enabler« which lets you back up and restore the framework library and also patch it with the click of a button. This also worked as expected and after another reboot the System Profiler showed that TRIM was enabled for my 3rd party SSD.

After I ran a system update the TRIM support was disabled again and I had to run TRIM Enabler once more.
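Since an update can silently undo the patch, it is handy to check both values from Terminal instead of clicking through System Profiler. The following is a sketch under assumptions: the helper name sata_summary is made up, and it assumes that `system_profiler SPSerialATADataType` prints lines labelled »Negotiated Link Speed« and »TRIM Support«, which may differ between OS X versions.

```shell
#!/bin/sh
# Read System Profiler text output on stdin and keep only the link speed
# and TRIM lines, with the leading indentation removed.
# The two field labels are assumptions about the report format.
sata_summary() {
    grep -E 'Negotiated Link Speed|TRIM Support' | sed -E 's/^[[:space:]]+//'
}

# On the Mac itself (not executed here):
#   system_profiler SPSerialATADataType | sata_summary
```

If the TRIM line reports anything other than what you expect after an update, it is time to run TRIM Enabler again.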

I really hope that Apple will enable TRIM for all SSDs with Lion to make this step unnecessary.

That is about it. This SSD is really blazing fast. If you’re interested, there is a nice in-depth review at anandtech.com

While some SSDs from other vendors are still faster, the Intel SSDs have a reputation for higher reliability.