Why Erlang?

Posted on April 22, 2012 by hukl

The chance that you are reading this blog post on a device with a multicore CPU is increasing daily, which is why everybody is talking about concurrency now. Concurrency for our web applications and API backends means that we’d like our htop to look like this:

htop screenshot

I’ve recently been to a really awesome Ruby conference, and three or four talks out of 21 were about concurrency. The Ruby community is quite open, so many possibilities were discussed: using threads, using different Ruby runtimes to circumvent the GIL, using more processes, using the actor model via libraries like Celluloid, or even using Akka through JRuby.

While the actor model seems to be a good fit for building concurrent network applications, it often suffers from problems if the runtime it is implemented in has no “native” support for it. There are implementations for Ruby, Python and Java, but they all have to jump through several hoops to get the job done and do not necessarily yield the best performance. This is one of many reasons why Erlang would be a much better choice, but first let’s talk about the actor model for a bit to understand why it is such a good fit.

The Actor Model

There is this nice quote from Wikipedia which offers a first glimpse:

»The Actor model adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-oriented programming languages, but differs in that object-oriented software is typically executed sequentially, while the Actor model is inherently concurrent.«

While there are some resemblances between actors and objects, like modularity, encapsulation and message passing, the main feature of actors is that they are being run at the same time.

Strictly using message passing for sharing state with other actors which run in parallel enables asynchronous communication, meaning that the sender does not have to wait for a response from the receiver.

Another big difference to the OOP world is that in the actor model there is no global state and therefore also no shared memory between actors. In languages like Java, Ruby and Python there is always global state and threads have access to shared memory. This is often a cause for trouble in the form of deadlocks or race conditions and is maybe the biggest pain of using threads.

In the actor model each actor has its own internal state and is only sharing it via messages. Thereby it is acting as a serializer for access to its state and effectively preventing deadlocks and race conditions.
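To make the "serializer" idea concrete, here is a minimal sketch in Python. It only emulates an actor on top of a thread and a queue, so it is exactly the kind of hoop-jumping described above and not how Erlang works internally. The names `CounterActor` and `demo` are made up for illustration.

```python
import queue
import threading


class CounterActor:
    """A tiny emulated actor: private state, a mailbox, one thread draining it."""

    def __init__(self):
        self._count = 0                  # private state, only touched by _run
        self._mailbox = queue.Queue()    # the only way to reach the actor
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Messages are processed strictly one at a time, so no lock is
        # needed around self._count.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

    def join(self):
        self._thread.join()


def demo(senders=4, per_sender=1000):
    """Let several threads send increments concurrently, then ask for the total."""
    actor = CounterActor()

    def hammer():
        for _ in range(per_sender):
            actor.send("increment")

    threads = [threading.Thread(target=hammer) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply = queue.Queue()
    actor.send("get", reply_to=reply)
    total = reply.get()
    actor.send("stop")
    actor.join()
    return total
```

The senders never touch the counter themselves. They only drop messages into the mailbox, and the actor applies them in order, which is why the total comes out right without a mutex around the counter.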

It might also be worth noting that the actor model especially makes sense for functional languages as they embrace the concept of immutable data.

There is a lot more to read about actors but I would say these are the most important bits to know. In general the actor model makes designing and implementing concurrent applications a lot easier. Compared to threads there is no need to manage access to information with mutexes, locks, semaphores or other complex abstractions.

Ok, so what about Erlang?

First let me tell you that for years I have been a passionate Ruby developer. I really like the language and community a lot. From time to time though I felt I was hitting some invisible walls when it came to network applications like web apps, web servers, proxies etc. Basically everything that had to handle a lot of requests and/or did non trivial tasks.

I had Erlang on my radar for quite some time, but coming from my ivory tower with a Ruby rooftop it took several attempts to convince me that it was worth a try. Conceptually it already made a lot of sense to me and I’m sure that most people who read about Erlang will agree. I have to admit that I was so appalled by the weird syntax that it stopped me from trying. This was a big mistake though, and a large part of my motivation for writing this blog post is to tell you that you should try out Erlang as soon as possible.

Anyway, first let’s describe Erlang in one line:

»Erlang is a functional language, implementing the actor model for concurrency.«

It’s a language which was developed by Ericsson for their carrier-grade telecom switches, and the design goals were to create a language for building fault-tolerant, highly available and concurrent systems.

You can read all about it on Wikipedia or this awesome website: http://learnyousomeerlang.com/ – They do a much better job describing the language.

Case study for Erlang at Wooga

This post is about getting you to try it and I will do that by telling a story about Erlang at Wooga.

Wooga makes social games with millions of daily active users. The games constantly talk to the servers to transform and persist the user’s game state. Some of our game backends are developed in Ruby and that has worked really well so far. Ruby, like I said, is a really nice programming language and although it is certainly not the fastest, you can squeeze a lot of performance out of it when you know what you are doing.

Our biggest game in terms of users, revenue and backend complexity runs on about 80 to 200 application servers though. It handles about 5000-7000 requests per second and almost all of them change the game state of the user. I’d say the number of application servers is still reasonable for the amount of load but it’s certainly not the most impressive number.

Then some day a new backend had to be built for a game with similar complexity and my colleague Paolo suggested using Erlang this time as he thought it would be a really great fit for us. We hired an experienced Erlang developer (Knut) and together they implemented the backend. By now this game has approximately 50% of the users of the other game and the number of application servers they need is: 1!

They run the backend on two or three servers for redundancy purposes but it could perfectly well run on one. Even if it actually needed four it would still be drastically more efficient and performant than the other backend(s).

Now of course they also knew about all the mistakes we had made in previous games, and it’s not Erlang alone that gave them so much better performance; rather, they could implement the backend in a unique way which is really easy with the actor model and rather hard everywhere else.

Basically they’ve built a stateful web server, which means that each user who is playing the game is represented by an actor inside of the Erlang VM. The user starts playing and an actor with the user’s game state is spawned. All subsequent requests for the time the user is playing go directly to this actor. Since the game state is held in the actor’s own memory, all requests, which would otherwise hit the database, can be processed and answered extremely quickly.

If the actor crashes, the other actors are not harmed since there is no shared / global state. When the user stops playing, the actor saves the game state to a persistent data store and terminates, which makes things easy for the garbage collector. Since the data is immutable it is always possible to revert to the game state before the transformation started in case something goes wrong.

It is really awesome and there is a lot more to tell about it. Fortunately Knut and Paolo have spoken at a couple of conferences about it and shared their slides so you can get some more insights:
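The pattern described above can be sketched in Python. This is only an illustration of the idea, not Wooga’s implementation: `GameState`, `UserSession`, `apply_request` and the dict used as a "persistent store" are all hypothetical, and a real Erlang backend would run each session as its own process.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GameState:
    """Immutable game state; every change produces a new object."""
    coins: int = 0
    level: int = 1


def apply_request(state, request):
    """Pure transformation: returns a new state or raises ValueError."""
    kind, amount = request
    if kind == "earn":
        return replace(state, coins=state.coins + amount)
    if kind == "spend":
        if amount > state.coins:
            raise ValueError("not enough coins")
        return replace(state, coins=state.coins - amount)
    raise ValueError("unknown request")


class UserSession:
    """Stands in for the per-user actor that keeps the game state in memory."""

    def __init__(self, user_id, store):
        self.user_id = user_id
        self.store = store                            # hypothetical persistent store (a dict)
        self.state = store.get(user_id, GameState())  # loaded once when the session starts

    def handle(self, request):
        """Answer a request from memory; on failure the old state simply stays."""
        try:
            self.state = apply_request(self.state, request)
        except ValueError:
            return "error"   # self.state was never modified, so nothing to roll back
        return "ok"

    def stop(self):
        """Persist the state once, when the user stops playing."""
        self.store[self.user_id] = self.state
```

Because `apply_request` never mutates its input, "reverting" after a failed transformation costs nothing: the session just keeps pointing at the previous state. The store is only written in `stop`, which mirrors the save-on-exit behaviour described above.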

* http://www.slideshare.net/wooga/erlang-factory-sanfran
* http://www.slideshare.net/hungryblank/getting-real-with-erlang

More Erlang at Wooga

After Paolo’s and Knut’s success the Erlang virus spread inside of the company. We have started new game backends in Erlang and built smaller additional services with it. Personally I can confirm that the more you learn about Erlang the more it makes sense and feels right. It made me even feel a little bit sorry for those at the Ruby conference who were struggling with different runtimes and libraries to introduce the level of concurrency and ease of development that Erlang delivers in one package. A package that has been in production use for more than 20 years.

The hard part of learning new languages is to find a reasonably sized project to start with. Learning just by reading books is always slow as you forget most of what you read when you don’t play around with it. Apart from the weird syntax which I don’t find that weird anymore, not having an actual project to try Erlang was the biggest show stopper for me. So I encourage you to pick a small little project and play around with Erlang. I think you will not regret it.

I hope I will find the time for a follow up blog post about how I learned Erlang and about getting started in it soon. In the meantime go to learnyousomeerlang.com and get started on your own. Trust me – this site is better than any book about Erlang which you can buy right now.

PS: Thanks to Elise Huard for proof reading! If you have feedback, drawings of an ivory tower with a ruby rooftop to make this blog post more colorful or any other contributions send it right away!

Text Editors for Programmers on the Mac

Posted on October 2, 2011 by hukl

Preface

In the last 10 years many new text editors have become available for Mac OS X. Since I have tried most of them I wanted to give an overview and a brief description of each one of them. This is for developers who are looking for decent editors on the Mac to get their job done.

Let’s get straight to it then.

The Editors

Emacs / Vim

Emacs and Vim are really the dinosaurs of text editors for programmers. Both have an immensely rich feature set and can be extended to a degree where you can forget about the OS they are running on. Although there is and always was a strong debate about which one of the two is superior, it is mostly a matter of personal taste in the end. Both have a steep learning curve and both are probably not something you want to give your web development intern on the first day at the office. However they are really powerful and they have one big advantage in common, which is that they are available on a lot of operating systems and are installed by default on most of them. Whenever you log into a Linux, BSD, OS X or other Unix machine you can be sure that vi or Emacs or both are already there. On some shared web hosts you can’t even install your own packages so you are stuck with one of them. The cool thing is that you have to learn only one editor and all its specialities and you can work with it on almost any platform and on any remote machine. Since they are both really old they are known to work. They can handle big text files without crashing and for every problem you encounter there is already an extension to solve it. This is why many experienced programmers who started with more modern editors come back to them.

Some of the features that very few other editors offer are: horizontal and vertical split views, whole-project tab completion (most other editors only complete words from the same file), and fundamental customization of almost every behavior of the editor.

Besides the features related to programming they are also powerful text editors.

Both editors have OS X versions with a nice GUI wrapper to make them better integrated citizens, AquaEmacs and MacVim.

I really recommend reading the blog post by Yehuda Katz about switching to MacVim.

Both of them are open source and free of charge.

BBEdit / TextWrangler

Speaking of dinosaurs, of course BBEdit and the lightweight version called TextWrangler come to mind. They are not as old as Emacs and Vim but BBEdit has been around for 20 years! It claims to be the editor »that doesn’t suck« and it is a strong contender. However it is really focussed on text rather than programming. It has a lot of features dealing with plain text manipulation, search and replace etc. but lacks some of the features I need in my daily programming job. It has syntax highlighting, auto completion, syntax checking for a few languages and much more but in the end it always feels a little dusty compared to other editors. For example, its most recent version supports SVN, CVS and Perforce but not Git.

I’d recommend BBEdit more to markup authors than to programmers but it’s worth checking out the website and the trial version.

SubEthaEdit

I’m not sure how long SubEthaEdit has existed but it’s at least 6 or 7 years I’d say. The Wikipedia page does not say anything about it so if one of the readers has a more accurate number let me know.

SubEthaEdit shines in collaborative editing. Most of the time that applies more to plain text than to programming, but still, it’s a cool feature which allows you to edit text documents via internet or a local Bonjour connection. Other than that it is also a decent editor for programming with support for a lot of different programming languages. It is simple and does not take a whole lot of time to learn. In the end though it is also limited compared to the other more powerful editors. Nonetheless I think it belongs on this list.

Smultron

When it comes to simple and basic text editors for programming activities you have to take a look at Smultron. It used to be free but now it’s around 5$ in the Mac App Store. Still quite cheap and it features an icon you can’t miss in your dock! But seriously – it’s probably one of the most straightforward and simple text editors for the Mac. Possibly a good entry point when you are just starting with programming and only need syntax highlighting and other basic editing features.

Textmate

Textmate was and probably still is the default text editor for many Ruby developers. When the web framework Ruby on Rails first came out, the screencasts that came with it demonstrated the editing powers of Textmate so that everybody else started to use it as well. Compared to the other editors that were available at the time it was superior in almost every discipline. It was a lot better integrated into the Mac environment than Emacs or Vim, offering standard shortcuts, preferences and native text controls. It supported a lot of languages and so-called bundles, which are a mixture of snippets, macros and other useful language-specific functions like syntax validation or build system integration. It offered more comfort and flexibility than SubEthaEdit or BBEdit. It felt more productive and faster and it was easy to learn.

Why am I using the past tense here? Most of this still applies but Textmate did not receive a major update in years. Textmate2 became what Duke Nukem Forever was already famous for: Vaporware. Just recently some Textmate developer claimed that there would be an alpha release by the end of the year but I won’t believe it until a download link shows up on their website.

The problem with Textmate is mostly its lack of performance with big files, the instability of some really useful extensions and the lack of some features I have learned to love in other text editors like split views for example.

I quit using Textmate because I used some extensions which made it crash too often and because it crashed and locked up when working with big files. Other than that it is still a great editor and its principles were copied by a lot of other editors on other platforms.

Even today I use it for HTML, XML and other markup languages because it’s just the fastest editor to work with in that discipline.

I highly recommend trying it. There is a 30-day trial version.

Kod

When people realized that TextMate2 was vaporware and no new version to fix the issues described above was in sight, alternative editors slowly began to appear. One of them is Kod. Although it is still at a really early stage, it is a decent and simple to use editor for programming. It does not offer as many features as Textmate or the other more powerful editors, but it is still quite usable. If you don’t want to spend a few dollars on Smultron, this might be just the right editor for you. I mentioned Smultron first because it is more mature while Kod is at version 0.0.3. For additional hipness it is worth knowing that it’s built upon Google’s V8 JS engine.

Readers pointed out that the development is stagnating and nothing really happens anymore but as a basic editor it still works I think.

Sublime Text 2

Like Kod, Sublime Text appeared as a well-received alternative to Textmate, but compared to Kod it has the same kind of feature set as Textmate and more. It even supports Textmate’s color themes and language definitions, which makes a migration to it quite easy.

It even has split views and it’s lightning fast. It has vertical / column selection (like Vim/Emacs/Textmate), multiple line / word select and different ways to expand the current scope of selection. Really, when I tried it the first time I was amazed how snappy it is. Like Textmate it features snippets, macros and build system integration. On the other hand it’s highly configurable, though you have to use simple config files instead of the standard OS X preference panes. I’d say it’s a perfect mixture of Vim’s configurability and Textmate’s editing comfort and speed. I highly recommend giving it a try if you are looking for a powerful, feature-rich text editor for programming.

Also worth noting: Sublime Text 2 is a cross-platform editor which runs on Windows, Linux and Mac OS X. That fact scares many Mac users away without even trying it but I can assure you once more that it runs stable and snappy on OS X and feels as much like a Cocoa app as I’d expect it to.

Its price tag is a bit higher than others, currently 59$, but I did not have to hesitate long to support the development of this editor after using it for a couple of days in trial mode.

Vico

If you like the concepts of Vim but MacVim is not wrapping enough of Vim’s “awkwardness” for you then Vico might be the right editor for you. Basically it uses the Vim key bindings, so you hardly ever need a mouse to use it, and it shows all the shortcuts in a native OS X menu bar. Other than split view support it is a very basic text editor for programming and features a custom scripting language to customize it.

Chocolat

This is yet another Textmate contender but it is currently in private beta and it’s too early to really judge it as many features are still missing. I have to say it has the most intuitive split view implementation of all the editors mentioned here but elementary things like vertical / column selection or parenthesis / scope highlighting are still missing.

If you want to give it a try nonetheless go to http://chocolatapp.com/ and sign up for the beta or go to their IRC channel on the Freenode network and ask for an invite. Took me 5 minutes to get one.

Coda/Espresso

These tools, although from different vendors, focus on the same group of people: web developers. They try to bring the entire development tool set together by bundling the functionality together that is otherwise only found in separate tools like file transfers, version control etc.

I tried them both but although they bundle together lots of features they are very limited at the same time to a certain flavor of web development.

Still on the list as I think they will appeal to some people.

Coda • Espresso

skEdit

skEdit is the only editor I haven’t used personally yet so I can’t say much about it other than that a colleague of mine has been using it as his primary editor for quite some time now, so I wanted to mention it for completeness’ sake. Like Coda and Espresso it is focused on web development but it’s not as limited to it.

Final words

My first editor was Emacs although I only used it in a very basic way. After that I used Textmate for a couple of years and switched briefly to MacVim for about half a year. Currently I’m using Sublime Text 2 on a daily basis for my programming work.

Personally, I don’t like IDEs. There are programming languages or environments where an IDE is necessary and really superior to text editors for programmers but I just don’t like these languages or environments either. So please don’t start a “but $IDE does all of that and more” discussion here. This is about text editors only and yes, I know … some text editors are almost like an IDE.

If I forgot to mention your favorite editor please let me know!

For a more elaborate list check out Wikipedia’s comparison of text editors.

Online Backup Services Revisited

Posted on August 21, 2011 by hukl

In my last post I compared several online backup services and decided that Crashplan was my personal winner when looking at the features and the price. In the following weeks a couple of people started to complain about the slow transfer rates to their backup servers. I also tested the upload speeds from several networks in Germany on internet connections with up to 100 MBit in both directions and I could not get any transfer rate above 3 MBit/s.

I also contacted the support which confirmed that Crashplan is not limiting the bandwidth at all. Inside the Crashplan application settings there are a few options which potentially limit the transfer rate but changing those did not improve the situation for me. I’d be interested to know if US customers get better transfer rates.

Anyway, it is quite unusable like this and so I finally gave up on Crashplan for now. Instead I am now evaluating Arq. Since it uses Amazon S3 storage, you can choose the region in which the datacenter is in and that seems to make a huge difference. I was able to upload my backups with up to 4 MByte/s ( 32 Mbit/s ) on a 100 MBit network which is still not wire speed but much better than the transfer rates of Crashplan. The graphical user interface of Arq is also surprisingly simple and pleasant to use. But this is just a small update. More after I have used Arq for some time.
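To put those rates into perspective, here is a small back-of-the-envelope calculation in Python. The 100 GB backup size is purely an assumed example, not a number from my setup, and the helper names are made up. Units are decimal (1 GB = 1000 MB).

```python
def mbyte_to_mbit(mbyte_per_s):
    """Convert MByte/s to Mbit/s (8 bits per byte)."""
    return mbyte_per_s * 8


def upload_hours(size_gb, rate_mbit_per_s):
    """Hours needed to upload size_gb gigabytes at the given rate in Mbit/s."""
    size_mbit = size_gb * 1000 * 8      # GB -> MB -> Mbit
    return size_mbit / rate_mbit_per_s / 3600


arq_rate = mbyte_to_mbit(4)             # the 4 MByte/s measured with Arq
crashplan_rate = 3                      # Mbit/s, the best rate seen with Crashplan

print(arq_rate)                                     # 32
print(round(upload_hours(100, crashplan_rate), 1))  # 74.1
print(round(upload_hours(100, arq_rate), 1))        # 6.9
```

So for a hypothetical 100 GB initial backup the difference is roughly three days versus an afternoon.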

UPDATE

Arq 2 was just released: http://www.haystacksoftware.com/blog/2011/08/arq-2-is-out/

Evaluating Online Backup Services

Posted on July 10, 2011 by hukl

Backups, the everlasting topic. Years after several hard drive crashes, after manual backups and semi automatic backups I’m still thinking about the right solution.

Current Setup

Pile of hard drives

Currently I’m using a USB / SATA adaptor to connect various hard drives without enclosure to my computer to run Time Machine semi automatic backups. With OS X 10.7 (Lion) Time Machine will add local snapshots for the time I’m not connected to my backup drive and it will write those snapshots to the backup drive once I connect it again. That is all quite nice but not an optimal solution. Over the years I’ve collected quite a few drives and with every new drive I get more annoyed by the pile they form on my desk. This is still better than no backup but it requires me to connect my drives regularly to my computer. Sometimes I do – sometimes I don’t.

NAS Options

My setup could be vastly improved by using a NAS or Apple’s Time Capsule. This would allow me to back up over the air with no wires and drives lying around on my desk.

I don’t like Time Capsule because it is just one drive in a plastic box with no direct access to the hard disk or the files. You can attach a recovery disk but that just makes it a little better. Only one drive also means no data redundancy. If the disk crashes the backup is gone. As far as I know there are also no checks for data integrity so if some bits flip on the disk you never know. If that’s not correct please leave a comment.

When I buy something like a network attached storage I would also want to use it as a fileserver and other things and Time Capsule just doesn’t offer this kind of flexibility.

The other option I considered was buying a NAS. I really don’t like the consumer plastic boxes like the Drobo, Qnap and Synology products. On the one hand they offer lots of nice features like Time Machine compatibility, web interfaces, fileserver and file sharing features. On the other hand they are quite expensive, most of them are ugly, they use filesystems like Ext3/4 or HFS+ and I also heard real horror stories of complete data losses, especially with Drobo. To be fair, these stories are one or two years old.

Then I thought about buying an »Acer Aspire easyStore H341« or an »HP ProLiant – MicroServer« and building my own custom NAS. These are basically small Atom powered PCs with four HDD slots and no display connector. They usually come with Microsoft Windows Server and require some time investment to get going. I thought about using FreeNAS as I like FreeBSD and it uses ZFS as its filesystem. It comes with a nice interface and offers everything I would need: a modern filesystem with data integrity checks, fileserver and file sharing capabilities and Time Machine compatibility.

But again, with hard drives included it wouldn’t be cheap. It would still be no real offsite backup, it could still suffer from hardware failure or theft. Besides the costs for the hardware and the time of setting everything up, there are also some costs for power as this machine would have to run 24/7/365.

One final disadvantage of all those »local« backup solutions is that I have to be home to run my backups but I also want to back up when I’m at work or somewhere entirely different.

This is why I think that those NAS solutions are not right for me.

Online Backup Services

So what are the alternatives? After coming to the conclusion that I actually don’t want a NAS device I thought about online backup services.

Of course, the first question that comes to mind is privacy and security in general. I don’t want to hand over my precious data to some company without strong and secure encryption and by that I mean that nobody but me should ever be able to get my data.

The second thing to consider is storage. How much do I have to pay for how many gigabytes? How redundant is my data stored and is it checked for data integrity?

Luckily a quick google search revealed that there are a couple of interesting options available:

All of those offer encrypted backups. The data is encrypted locally before it is sent to the storage servers on the internet. All of those services have detailed information about their architecture and features and all of them seem to have happy customers.

Personally after researching for two hours I think I will try Crashplan and here is why:

Pricing

Crashplan is cheap. It’s not the cheapest but it’s cheap enough. You get to back up one computer with unlimited online storage for 49$/year and they have offerings for multiple computers too.

Arq and Jungledisk store the data on Amazon S3 or Rackspace Cloud which are a little more expensive than the other services with their own data centers. The client software of Arq costs another 29$.

Spideroak is charging 10$ / Month / 100 GB.

Security

All the named services offer good encryption and they seem to take similar approaches as well. The important thing is that they offer the option to use a self-generated private key, which is crucial for having completely private backups. Even if the police took away all the machines they wouldn’t be able to get to the actual data. Spideroak and Crashplan explain the encryption process in great detail on their websites.

Consistency / Data Integrity

Arq and Jungledisk can use S3, which is considered to be quite safe from data corruption, but there are also stories of missing data floating around. Nobody is giving you a full guarantee, though. Spideroak is claiming a 0.0000% error margin. Crashplan claims daily data verification and automatic repair should the data ever get corrupted.

The named services seem to have a good reputation of not losing data.

On the Arq website there is also a section about metadata and how the different services manage to keep track of it. The systems are tested with a software called Backup Bouncer. JungleDisk and Arq seem to be the only ones passing all tests, Crashplan fails in one test, Dropbox and Backblaze fail in 19 of 20! The section might be outdated though and since Backup Bouncer is a free tool you can verify it yourself.

Software / Integration

Each of these services comes with some kind of software. Arq and Backblaze have native OS X clients while the others have multi-platform tools that do not feel like native apps. This is the only real drawback I found with Crashplan.

Extras

Interestingly enough you also get de-duplicated, compressed and encrypted backups on all these services. With Crashplan you can even choose to not use de-duplication to reduce potential CPU load on your computer while checking for duplicate data. The backups are of course differential, which means that only changed data is transmitted, not entire snapshots (except the first). Crashplan allows unlimited file sizes while other services have file size limits of 4GB! It can back up locked files, and if you decide to back up your OS X Unix directories Crashplan will happily do so.
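A rough idea of how de-duplication and differential uploads can work, sketched in Python. This is not how Crashplan or any of the other services actually implement it. It is a generic content-hash scheme with fixed-size blocks, and `backup`, `restore` and the `remote` dict are invented for the example.

```python
import hashlib


def chunk(data, size):
    """Split bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, remote, chunk_size=4):
    """Upload only blocks the remote does not have yet.

    `remote` is a dict mapping SHA-256 digest -> block and stands in for the
    storage server. Returns the manifest (list of digests needed to rebuild
    the data) and the number of blocks that actually had to be uploaded.
    """
    manifest = []
    uploaded = 0
    for block in chunk(data, chunk_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in remote:        # unseen content: transmit it
            remote[digest] = block
            uploaded += 1
        manifest.append(digest)         # known content: only reference it
    return manifest, uploaded


def restore(manifest, remote):
    """Rebuild the original bytes from a manifest."""
    return b"".join(remote[digest] for digest in manifest)


remote = {}
first, sent_first = backup(b"aaaabbbbaaaacccc", remote)
second, sent_second = backup(b"aaaabbbbaaaadddd", remote)
print(sent_first, sent_second)   # 3 1
```

The first run uploads three blocks instead of four because `aaaa` occurs twice, and the second run uploads only the one block that changed. Computing all those hashes is the CPU cost that switching off de-duplication would avoid.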

Overall Crashplan seems to offer fine-grained control over various aspects of backups – which I like. Their support seems to be alright too. I’ve asked how de-duplication actually works and I got a reply within four hours on a Sunday without having an account or anything.

As I said, I will try Crashplan and in addition I will keep backing up irregularly to my external Time Machine disk – just to be sure.

I know that there are a lot of other tools out there and I’m still interested in other suggestions although I’ve probably checked them out already.

You might also want to check out Wikipedia’s »Comparison of online backup services«

It’s worth checking out the FAQs and detailed features of all those services as they usually answer most of the questions you come up with.

Lastly you can google for “Service A vs Service B” and you will get a lot more articles like these on the web to make up your own mind.

UPDATE 1

Somebody on Twitter just pointed me to this post in the Crashplan Support forum where a native Mac menu bar app in beta status is available.

UPDATE 2

Thomas posted a link in the comments to a comparison matrix that he made.

UPDATE 3

Another interesting hint from the comments: Dolly Drive
Apparently they offer TimeMachine backups in the “cloud”. Unfortunately their FAQ is a little short on details, especially on security and data integrity, so I guess I will write them a mail and put the info into another post.

UPDATE 4

Several (European) readers pointed out that the upload to the Crashplan datacenter is really slow, maxing out at 1.3Mbps. This is definitely one major drawback for European customers and something where Arq or other European providers could shine.

Camel Case in MySQL Table Names is a Bad Idea

Posted on June 21, 2011 by hukl

Today at work I encountered all kinds of “naming schemes” for MySQL tables and columns. Camel case table names in particular can cause serious pain because:

* Table names directly correspond to filenames on your hard drive.
* There are tons of different filesystems and some of them are case insensitive. So if you develop on OS X (case insensitive) but deploy on Linux (case sensitive), things can get funny quickly.
* There are several different SQL servers which handle camel case / case sensitivity differently. When you switch to PostgreSQL or Oracle you are likely to encounter problems.
* Read this document to learn about possible implications in MySQL itself.

If you use lowercase table names, separated by underscores, you can skip all those potential problems. Luckily renaming tables is not as expensive as altering them.
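If you have a pile of camel case tables to clean up, a small script can propose the new names. The following Python sketch is one possible approach, not a finished migration tool: `to_snake_case` and `rename_statements` are names I made up, and the table names in the example are hypothetical. It only prints MySQL `RENAME TABLE` statements, so you should review them before running anything against a real database.

```python
import re


def to_snake_case(name):
    """Convert CamelCase / camelCase identifiers to lowercase_with_underscores."""
    # Split before a capitalized word: "HTTPRequest" -> "HTTP_Request"
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital: "userId" -> "user_Id"
    partial = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial)
    return partial.lower()


def rename_statements(table_names):
    """Build RENAME TABLE statements for every table whose name would change."""
    statements = []
    for old in table_names:
        new = to_snake_case(old)
        if new != old:
            statements.append(f"RENAME TABLE `{old}` TO `{new}`;")
    return statements


for line in rename_statements(["UserAccounts", "HTTPRequestLog", "orders"]):
    print(line)
# RENAME TABLE `UserAccounts` TO `user_accounts`;
# RENAME TABLE `HTTPRequestLog` TO `http_request_log`;
```

Tables that are already lowercase, like `orders` here, are left alone.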

SMYCK

a blog by John-Paul Bader