12 Dec/09

Handling Multipart Uploads with MochiWeb

I was getting along fairly well with my own custom-built, lightweight Erlang web server, until I needed to handle file uploading. At this point, I decided it was time to stop trying to re-invent the wheel (or take the lazy approach—whichever way you choose to look at it), and take another look at MochiWeb, which implements this feature.

However, MochiWeb’s documentation is non-existent (unless I’m missing something; please let me know if I am). I came across James Gardner’s post on handling file uploads, which helped me out. But after digging through the MochiWeb source code, I discovered a slightly easier way of handling file uploads. It may be that this method has been added since James’ post in 2007, or maybe it’s just less appropriate to James’ requirements.

I’m going to quickly run over how to set up MochiWeb. I know this has been covered quite a lot by other people, but it’s something I found a little daunting when I started out, so I want to emphasise that it’s a lot easier to set up than it may first appear. Then I’ll post and explain a short chunk of code for handling uploads and saving them to /tmp. Finally, I’ll provide an example of using this technique to provide a very simple photo gallery system.

Setting up MochiWeb

There’s a pretty good tutorial on the BeeBole blog, but I’ll cover the basics here.

The first step is to check out the code from the Google Code repository. Make sure you have Subversion installed, and then:

svn checkout http://mochiweb.googlecode.com/svn/trunk/ mochiweb

Now, make sure that you have Erlang installed, and:

cd mochiweb
make
chmod +x scripts/new_mochiweb.erl
./scripts/new_mochiweb.erl mochiweb_uploads ../.

This will build the MochiWeb system and then create a new MochiWeb project. You can repeat this process whenever you’re starting a new project. Note that mochiweb_uploads is the project’s name. MochiWeb is a bit picky about project names: they must be valid module names, so (it seems) you can’t use hyphens.

Next, we should make sure that the project builds and runs before we start coding:

cd ../mochiweb_uploads
make
./start-dev.sh

By default, your MochiWeb project is set up as a lightweight web server that serves files from the priv/www directory on port 8000. All being well, you should now be able to point your browser at http://localhost:8000/ and see the ‘MochiWeb is running.’ message.

If you have any problems, you’ll need to wade through the progress reports on the console to try and figure out where things are going wrong. The clues will be in any ‘crash reports’.

You can stop the server with Ctrl+C, then entering ‘a’, and hitting return. Or you can just hit Ctrl+C twice.

Handling File Uploads

First of all, we need to put together a page that we can test our uploading with. We’ll just replace the index.html file in priv/www with something straightforward:

<html>
<head>
<title>MochiWeb Upload Test</title>
</head>
<body>
<form action="upload_photo" method="post" enctype="multipart/form-data">
    <input type="file" name="photo" />
    <input type="submit" value="Upload" />
</form>
</body>
</html>

Routing the Request

The code for starting the web server is in src/mochiweb_uploads_web.erl. Open up this file and take a look at the loop/2 function. The function is split up into two parts by the outer case construct: the first part is for HTTP GET (and HEAD) requests, the second is for POST requests. We’re going to be handling the file uploads in the POST clause. Like so:

'POST' ->
    case Path of
        "upload_photo" ->
            upload_photo(Req);
        _ ->
            Req:not_found()
    end;

All we are doing here is delegating responsibility for handling the ‘upload_photo’ request to a function called ‘upload_photo’, which we can add to the bottom of mochiweb_uploads_web.erl:

upload_photo(Req) ->
    Req:ok({"text/html", [], "<p>Hello, world!</p>"}).

That’s going to just return a ‘Hello, world!’ message so we can check the request is getting routed correctly. If we try out the system (make, ./start-dev.sh, go to localhost:8000), we should initially see the form, then after submitting the form our message should appear.

Handling the POSTed multipart data

Now onto actually handling the POSTed multipart data. We’ll replace our upload_photo/1 function with this:

upload_photo(Req) ->
    FileHandler = fun(Filename, ContentType) -> handle_file(Filename, ContentType) end,
    Files = mochiweb_multipart:parse_form(Req, FileHandler),
    % The leading underscore avoids an unused-variable warning for now.
    _Photo = proplists:get_value("photo", Files),
    % TODO: handle the photo here
    Req:ok({"text/html", [], "<p>Thank you. <a href=\"index.html\">Upload another?</a></p>"}).

Let’s examine this line-by-line.

First of all, we’re specifying a function that we pass to the mochiweb_multipart:parse_form/2 function. The function that we’re passing will get called once for every file that is present in the POST data (i.e., that is in the form). Don’t forget that a single request may carry several files.

The file will be split into ‘chunks’. So it’s the job of this ‘FileHandler’ function to return another function that will be used to consume each chunk of the file (and then, finally, the ‘eof’ atom). We’ll come back to this in a moment.

The parse_form/2 function will then return a list (in fact, a property list) of all the files. There will be a mapping from the name of the input to the value finally returned from our file handler.

Let’s take a look at the handle_file/2 function:

handle_file(Filename, ContentType) ->
    TempFilename = "/tmp/" ++ atom_to_list(?MODULE) ++ integer_to_list(erlang:phash2(make_ref())),
    {ok, File} = file:open(TempFilename, [raw, write]),
    chunk_handler(Filename, ContentType, TempFilename, File).

We use erlang:phash2/1 and make_ref/0 together with a (hopefully) application-specific prefix to construct a filename that is unlikely to clash with an existing one, and which will reside in ‘/tmp’. A couple of points to make here: collisions aren’t impossible, and the hard-coded ‘/tmp’ path is platform-dependent. We could get the temporary directory from the operating system, and we could generate better filenames.
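To make those two caveats a little more concrete, here is a rough sketch of how the filename could be built instead. It is not part of the code in this post: the module name temp_name and both function names are made up for illustration, and it assumes an Erlang/OTP release where crypto:strong_rand_bytes/1 is available and a system that exposes its temporary directory through the TMPDIR or TEMP environment variable.

```erlang
-module(temp_name).
-export([temp_dir/0, temp_filename/1]).

%% Look up the temporary directory from the environment, falling back
%% to "/tmp" when neither TMPDIR nor TEMP is set (an assumption about
%% the host system, not something MochiWeb provides).
temp_dir() ->
    case os:getenv("TMPDIR") of
        false ->
            case os:getenv("TEMP") of
                false -> "/tmp";
                Temp -> Temp
            end;
        Dir ->
            Dir
    end.

%% Build a path of the form <tempdir>/<Prefix>-<32 hex digits>, where
%% the hex digits come from 16 random bytes.
temp_filename(Prefix) ->
    Random = crypto:strong_rand_bytes(16),
    Hex = lists:flatten([io_lib:format("~2.16.0b", [B]) || <<B>> <= Random]),
    filename:join(temp_dir(), Prefix ++ "-" ++ Hex).
```

Collisions are still possible in principle, just far less likely. Adding the exclusive mode to the file:open/2 options would make the open fail, rather than overwrite, if the file already existed.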

We open a file for writing to and pass it to our ‘chunk handler’ ready for consuming the first chunk. Note that chunk_handler/4 is a function that returns a function. We return this generated function back to the MochiWeb code, which will use it to handle the first chunk.

Before I go on to explain the ‘chunk_handler’, I should point out that in theory we could use this opportunity to filter the input. If (for some reason) we wanted to handle different types of file in different ways, we could do this based on the ContentType. Annoyingly, there’s no way to inspect the name of the field (i.e., the name of the HTML input) that we’re handling at this stage.
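As a hedged illustration of that kind of filtering, the sketch below accepts anything whose content type starts with "image/" and quietly discards everything else. The module name upload_filter and the accepted/rejected return values are my own inventions, and the accepted data is kept in memory purely to keep the example short. I am also assuming that ContentType may arrive either as a plain string or as a {Type, Params} tuple, so the code copes with both. It leans on the chunk-handling convention explained in the next paragraph.

```erlang
-module(upload_filter).
-export([handle_file/2]).

%% Assumption: ContentType may be a plain string or a {Type, Params}
%% tuple, so both shapes are normalised to a string here.
media_type({Type, _Params}) -> Type;
media_type(Type) when is_list(Type) -> Type;
media_type(_) -> "".

%% Choose a chunk handler based on the declared content type.
handle_file(Filename, ContentType) ->
    case lists:prefix("image/", media_type(ContentType)) of
        true -> collect_handler(Filename, []);
        false -> discard_handler(Filename)
    end.

%% Keeps every chunk in memory and returns the whole body at eof.
collect_handler(Filename, Acc) ->
    fun(eof) -> {accepted, Filename, iolist_to_binary(lists:reverse(Acc))};
       (Data) -> collect_handler(Filename, [Data | Acc])
    end.

%% Swallows every chunk and reports the file as rejected at eof.
discard_handler(Filename) ->
    fun(eof) -> {rejected, Filename};
       (_Data) -> discard_handler(Filename)
    end.
```

Bear in mind that the content type is supplied by the client, so a check like this is a convenience rather than a security measure.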

As I mentioned, the chunk_handler/4 function simply returns another function (which will accept one parameter) that MochiWeb will call once it has fetched a chunk. I’m using the term ‘chunk’ casually: it can be either a raw chunk of data from the socket, or the eof atom. If the chunk being passed is not the eof atom, the function must return another function which will be used to handle the next chunk. However, if the function is passed the eof atom, we will return a value that we want to associate with this part of the multipart data. The value returned here will be the value that is stored in the property list we mentioned earlier (in upload_photo/1).

Here’s our chunk_handler/4:

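chunk_handler(Filename, ContentType, TempFilename, File) ->
    fun(Next) ->
        case Next of
            eof ->
                % End of this part: close the file and return the value
                % that parse_form/2 will store against the input's name.
                ok = file:close(File),
                {Filename, ContentType, TempFilename};
            Data ->
                % Crash early if the write fails rather than silently
                % dropping part of the upload.
                ok = file:write(File, Data),
                chunk_handler(Filename, ContentType, TempFilename, File)
        end
    end.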

So, don’t forget we’re returning a function here, which will get executed later on. There are two cases, as we’ve discussed:

  • The eof case involves closing the file, and returning a tuple containing the original filename, the content-type and the temporary filename. We could return this in record form, or we may choose just to return the temporary filename if we didn’t care about the other parameters.
  • The non-eof case involves writing the chunk of data to the file, and then returning a function to handle the next chunk of data.
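If you want to see the two cases in action without starting the web server, something like the following should do. The feed/2 function is only my stand-in for the loop inside MochiWeb (it is not MochiWeb’s actual code), and the module name chunk_demo and the fixed path /tmp/chunk_demo_example are invented for the example. The chunk_handler/4 here has the same shape as the one above, with the results of file:write/2 and file:close/1 checked.

```erlang
-module(chunk_demo).
-export([feed/2, run/0]).

%% Hypothetical stand-in for MochiWeb's parsing loop: pass each chunk to
%% the current handler, then pass eof and return the final value.
feed(Handler, []) ->
    Handler(eof);
feed(Handler, [Chunk | Rest]) ->
    feed(Handler(Chunk), Rest).

%% Writes each chunk to File; at eof closes it and returns the tuple.
chunk_handler(Filename, ContentType, TempFilename, File) ->
    fun(eof) ->
            ok = file:close(File),
            {Filename, ContentType, TempFilename};
       (Data) ->
            ok = file:write(File, Data),
            chunk_handler(Filename, ContentType, TempFilename, File)
    end.

%% Feed three chunks through the handler, then read the file back.
run() ->
    TempFilename = "/tmp/chunk_demo_example",
    {ok, File} = file:open(TempFilename, [raw, write]),
    Handler = chunk_handler("cat.jpg", "image/jpeg", TempFilename, File),
    Result = feed(Handler, [<<"first ">>, <<"second ">>, <<"third">>]),
    {ok, Contents} = file:read_file(TempFilename),
    ok = file:delete(TempFilename),
    {Result, Contents}.
```

Calling chunk_demo:run() from the Erlang shell returns the tuple produced at eof together with the bytes that ended up in the temporary file.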

This concludes the data-handling code. At this point you may wish to refer back to upload_photo/1: this is where you can look up each of the uploaded files in the property list by the name of its HTML input. The value in the property list will be the {Filename, ContentType, TempFilename} tuple returned by chunk_handler/4.
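For a form with more than one input, a small helper along these lines could pick the uploads out of the list. This is just a sketch: the module name upload_fields and the function uploaded_files/1 are hypothetical, and I am assuming that ordinary (non-file) fields appear in the same list with a plain string as their value.

```erlang
-module(upload_fields).
-export([uploaded_files/1]).

%% Keep only the entries whose value is the {Filename, ContentType,
%% TempFilename} tuple produced by the chunk handler at eof; entries
%% with any other kind of value are skipped by the pattern match.
uploaded_files(Fields) ->
    [{Input, Filename, TempFilename}
     || {Input, {Filename, _ContentType, TempFilename}} <- Fields].
```

Given a list containing a "photo" upload and a "caption" text field, this would return only the entry for "photo".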

A Semi-practical Example

To turn this into something ‘useful’, I’ve put together a rudimentary photo gallery system.

We will adapt our upload_photo/1 function to take another two parameters (the directory for the photos and a list of valid file extensions; the function will hence become upload_photo/3).

The total of our security features will be to check that the extension of the file corresponds to a recognised image type. Anyone will be able to upload files.

Here’s our function:

upload_photo(Req, PhotoDir, ValidExtensions) ->
    FileHandler = fun(Filename, ContentType) -> handle_file(Filename, ContentType) end,
    Files = mochiweb_multipart:parse_form(Req, FileHandler),
    {OriginalFilename, _, TempFilename} = proplists:get_value("photo", Files),
    case lists:member(filename:extension(OriginalFilename), ValidExtensions) of
        true ->
            Destination = PhotoDir ++ OriginalFilename,
            case file:rename(TempFilename, Destination) of
                ok ->
                    Url = "/",
                    Req:respond({302, [{"Location", Url}], "Redirecting to " ++ Url});
                {error, Reason} ->
                    file:delete(TempFilename),
                    html_response(Req, "An error occurred whilst trying to move your file: " ++ atom_to_list(Reason) ++ ". Does the destination directory exist?")
            end;
        false ->
            file:delete(TempFilename),
            html_response(Req, "Invalid file type. File extension must be one of: " ++ string:join(ValidExtensions, ", ") ++ ". <a href=\"/\">Try again?</a>")
    end.
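Two things in the function above are worth tightening if you take this any further: the extension check is case-sensitive, and the destination is built by gluing the client-supplied filename straight onto PhotoDir (which therefore has to end in a slash). Below is a minimal sketch of how both could be handled. The module photo_paths and its two functions are hypothetical and are not part of the downloadable code, and this is nowhere near a complete defence against hostile filenames.

```erlang
-module(photo_paths).
-export([valid_extension/2, destination/2]).

%% Compare extensions case-insensitively, so ".JPG" matches ".jpg".
%% ValidExtensions is expected to hold lowercase entries such as ".jpg".
valid_extension(Filename, ValidExtensions) ->
    Ext = string:to_lower(filename:extension(Filename)),
    lists:member(Ext, ValidExtensions).

%% Drop any directory components supplied by the client and join the
%% remaining name onto the photo directory.
destination(PhotoDir, OriginalFilename) ->
    filename:join(PhotoDir, filename:basename(OriginalFilename)).
```

With these in place, the lists:member/2 test and the PhotoDir ++ OriginalFilename expression in upload_photo/3 could be swapped for calls to valid_extension/2 and destination/2 respectively.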

You can download the rest of the code. It’s not very elegant, but I was keen to put together something for you to take home.

Obviously this isn’t the sort of thing you should consider making available to the world wide web, but hopefully it’ll be useful in explaining how to handle file uploads.
