doep: Search for webapp speed part 2: Designing my own

Some thoughts on how I would build a webapp engine if the goal was real speed (just for fun):

Language:

There are several good languages out there for writing webapps. PHP, Java, C#, Perl, Python, Ruby, node.js (javascript), etc, etc.

It all comes down to what language you prefer, and I would prefer C++.

C++ has the speed and the flexibility to do everything, with the downside of being overly complex for just writing webapps.

But C++ is my preferred language.

Model for speed:

So the big question is how a web engine should be modeled to avoid all the big bottlenecks. Let's look at the different parts of a webapp server.

Webserver:

First of all we need a webserver.

We could use an existing webserver like nginx, and it would probably be the best choice, but (just for the fun of it) let's see how a webserver would work if I made it myself.

Multiprocessing/Multithreaded/async webserver?

This is a good question. A multiprocessing webserver has the advantage of being safe. If a process dies, leaks or hangs, the kernel handles the cleanup. But a multiprocessing webserver will take a lot of memory since each process has its own address space and overhead. Another downside is that it's harder to communicate between processes.

A multithreaded webserver would probably be a good idea since threads don't take as much memory as processes, and threads share memory with each other. The downside is that threads are not isolated from each other: if one thread crashes or corrupts memory, it takes the whole process down with it.

We also have the option to do an async webserver (like node.js), meaning that everything done in the webserver is done in a "non blocking" way using a library like boost::asio. The advantage of a non blocking webserver is that a single-threaded event loop mostly avoids locks, and with them deadlocks, but it can be hard to build plugins that are non blocking. I find the async way very interesting and I've tried to find out how node.js does plugins like gzip nonblocking, without luck. I will investigate this further :)

For speed and memory usage, I would go with either a multithreaded or an async webserver.
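To make the non blocking idea a bit more concrete, here is a minimal sketch of a single-threaded readiness loop. It uses plain POSIX poll() instead of boost::asio, so it only shows the principle. The names event_loop, watch, unwatch, run_once and read_available are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <poll.h>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical single-threaded event loop: one poll() call watches every
// descriptor, and a handler only runs when its descriptor is readable.
class event_loop {
public:
    typedef std::function<void(int)> handler;

    void watch(int fd, handler h) {
        assert(fd >= 0);
        handlers_[fd] = h;
    }
    void unwatch(int fd) { handlers_.erase(fd); }

    // Waits up to timeout_ms for readable descriptors and runs each ready
    // handler once. Returns the number of handlers that ran.
    int run_once(int timeout_ms) {
        std::vector<pollfd> fds;
        for (std::map<int, handler>::const_iterator it = handlers_.begin();
             it != handlers_.end(); ++it) {
            pollfd p;
            p.fd = it->first;
            p.events = POLLIN;
            p.revents = 0;
            fds.push_back(p);
        }
        if (fds.empty()) return 0;
        if (poll(&fds[0], fds.size(), timeout_ms) <= 0) return 0;  // timeout or error

        int ran = 0;
        for (std::size_t i = 0; i < fds.size(); ++i) {
            if ((fds[i].revents & (POLLIN | POLLHUP)) == 0) continue;
            std::map<int, handler>::iterator it = handlers_.find(fds[i].fd);
            if (it == handlers_.end()) continue;  // removed by an earlier handler
            handler h = it->second;  // copy, so the handler may call unwatch()
            h(fds[i].fd);
            ++ran;
        }
        return ran;
    }

private:
    std::map<int, handler> handlers_;
};

// Reads whatever is currently available on fd (at most 4096 bytes).
std::string read_available(int fd) {
    char buf[4096];
    const ssize_t n = read(fd, buf, sizeof buf);
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}
```

Every handler runs on the same thread, so a handler that does slow work (compression, for instance) stalls all other connections until it returns. That is the reason non blocking plugins are harder to write: the work has to be split into small steps or handed off to another thread.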

Communication between app-engine and webserver:

We probably want a little bit of fail safety between our webserver and app-engine, so running the app and the webserver in different processes is probably a good idea. We could then write the webserver so it would start the app-process if it's not already running.

But running the app and the webserver in different processes will also cause our first performance problem. How do we communicate between the app-process and the webserver?

IPC (Inter-process communication) is a big topic on the internet, and I have to say that I do not know a lot about this. But as I see it, we have the following options:

Sockets: TCP/IP connections between the processes, just like the connection between the webserver and the client. This does not seem like an ideal option since it takes some time to open a new socket. But it's supported by all operating systems.
Pipe: (POSIX systems) You know, like the pipes that you use on Linux: "cat README.txt | grep stuff".
Named pipes: (POSIX systems) More advanced pipes, but they do not seem like a good solution.
Semaphore: A kind of locking mechanism between processes.
Shared memory: A technique to share memory between processes. Does not seem to have a way to send notifications between processes though.

Sockets seem like the best solution, but to minimize latency, we should keep connections open between the app and the webserver, and reuse the connections.
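Reusing one open connection means both sides need to know where one message ends and the next begins. A simple way is to prefix every message with its length. The sketch below assumes POSIX sockets and a 4-byte big-endian length prefix. The function names and the framing format are my own invention for illustration, not an existing protocol.

```cpp
#include <arpa/inet.h>   // htonl, ntohl
#include <cassert>
#include <stdint.h>
#include <string>
#include <sys/socket.h>  // send, recv
#include <sys/types.h>   // ssize_t

// Sends exactly len bytes, or returns false on error.
// (A real server would also retry on EINTR and guard against SIGPIPE.)
static bool write_all(int fd, const char* data, size_t len) {
    while (len > 0) {
        const ssize_t n = send(fd, data, len, 0);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Receives exactly len bytes, or returns false on error or closed peer.
static bool read_all(int fd, char* data, size_t len) {
    while (len > 0) {
        const ssize_t n = recv(fd, data, len, 0);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Hypothetical framing: 4-byte big-endian length, then the message body.
bool send_message(int fd, const std::string& body) {
    assert(body.size() <= 0xFFFFFFFFu);
    const uint32_t size = htonl(static_cast<uint32_t>(body.size()));
    return write_all(fd, reinterpret_cast<const char*>(&size), sizeof size)
        && write_all(fd, body.data(), body.size());
}

// Reads one framed message into body. No upper limit on the size is
// enforced here; a real server should reject absurdly large lengths.
bool recv_message(int fd, std::string& body) {
    uint32_t size = 0;
    if (!read_all(fd, reinterpret_cast<char*>(&size), sizeof size)) return false;
    size = ntohl(size);
    body.assign(size, '\0');
    return size == 0 || read_all(fd, &body[0], size);
}
```

With this in place the webserver can open the connection to the app once and then call send_message() and recv_message() for every request, so the cost of opening a socket is only paid at startup.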

The goal of the webserver should be to handle all static files and then act as a kind of proxy for the webapp, forwarding the connection to the app-process. It is probably also a good idea to let the app send some special headers back to the webserver, for example "Webserver: use_compression=1" to tell the webserver to compress the response before sending it back to the client. This way we put some of the load on the webserver, and let the app do its own thing.
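On the webserver side, reading such a directive header could look roughly like this. The header name "Webserver" and the key use_compression come from the example above. The function names and the idea of separating several directives with ';' are assumptions made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits the value of a "Webserver:" header such as
// "use_compression=1; cache=0" into key/value pairs.
// The ';' separator is an assumption, not part of any standard.
std::map<std::string, std::string> parse_directives(const std::string& value) {
    std::map<std::string, std::string> out;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ';')) {
        // Trim surrounding spaces and tabs.
        const std::string::size_type first = item.find_first_not_of(" \t");
        if (first == std::string::npos) continue;  // empty item
        const std::string::size_type last = item.find_last_not_of(" \t");
        assert(last != std::string::npos);         // holds because first was found
        item = item.substr(first, last - first + 1);

        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            out[item] = "";                        // flag without a value
        } else {
            out[item.substr(0, eq)] = item.substr(eq + 1);
        }
    }
    return out;
}

// True when the app asked the webserver to compress the response.
bool wants_compression(const std::string& header_value) {
    const std::map<std::string, std::string> d = parse_directives(header_value);
    const std::map<std::string, std::string>::const_iterator it =
        d.find("use_compression");
    return it != d.end() && it->second == "1";
}
```

The webserver would strip the "Webserver" header before passing the response on to the client, since it is only meant for the webserver itself.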

The apps:

So how do we design the app itself? The app should be an executable that, when started, listens for socket connections. The webserver will be responsible for starting the app-executable if it's not already running.

So we need to write several classes and C++ templates to help build a good webapp (I will use the namespace "webplus" in this example for the classes/templates):

webplus::app should be the main class for the whole app, responsible for listening on the correct socket (which probably comes from a config file, or is sent as a parameter from the webserver when the app-executable is started).
webplus::request and webplus::response: when a connection is made, webplus::app should create a webplus::request and a webplus::response object that help with parsing headers, paths, and all other request-related information. The objects should then be sent to the app writer's own code, where the app decides for itself whether it should be multithreaded or nonblocking.
webplus::cachemap: Helper map templates that could be used for frequently used objects. For instance, if you have a User class and you would like to keep the 100 most frequently used User objects in memory:
typedef webplus::cachemap<int, User> UserCacheMap; // key and value types assumed from the usage below
UserCacheMap userCache(100); // maximum 100 objects in the map
User doep("doep", "Daniel"); // create a user object
userCache[1] = doep; // copy the object to the cache
It would probably be better to store a boost::shared_ptr in the map once the objects get bigger, and I haven't fully thought this through. But you get the point ;)
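Just to show that such a template is not much code, here is one possible sketch of webplus::cachemap. Note that it evicts the least recently used entry (LRU), which is simpler than really counting the "most frequently used" objects, so it is a stand-in and not the final policy. The contains() and size() members and the User class are hypothetical additions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

namespace webplus {

// Sketch of a bounded map. Eviction is least-recently-used (LRU), used here
// as a stand-in for the "most frequently used" policy described in the text.
template <typename Key, typename Value>
class cachemap {
public:
    explicit cachemap(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns the cached value for key, default-constructing it if missing.
    // Touching an entry marks it as most recently used.
    Value& operator[](const Key& key) {
        typename Index::iterator it = index_.find(key);
        if (it != index_.end()) {
            // Move the existing entry to the front of the list.
            items_.splice(items_.begin(), items_, it->second);
            return it->second->second;
        }
        if (items_.size() == capacity_) {
            // Cache is full: drop the least recently used entry.
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.push_front(std::make_pair(key, Value()));
        index_[key] = items_.begin();
        return items_.front().second;
    }

    bool contains(const Key& key) const { return index_.count(key) != 0; }
    std::size_t size() const { return items_.size(); }

private:
    typedef std::list<std::pair<Key, Value> > Items;
    typedef std::map<Key, typename Items::iterator> Index;

    std::size_t capacity_;
    Items items_;  // front = most recently used
    Index index_;  // key -> position in items_
};

}  // namespace webplus

// Hypothetical User class, only here to make the example complete.
struct User {
    std::string login;
    std::string name;
    User() {}
    User(const std::string& l, const std::string& n) : login(l), name(n) {}
};
```

Because operator[] creates missing entries, the value type needs a default constructor, just like with std::map. Storing a shared pointer instead of the object itself would avoid copying large objects in and out of the cache.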

Summary:

I'm not saying I'm about to write a new webapp engine. I just started thinking about how I would probably do it if I were about to write one, and I find it interesting to know how other people do things like this.

Although I have to admit that I am a little bit tempted to do this now, and an app engine like this would be ideal to use instead of the current musikServer.

doep

2010-06-28
