
August 08, 2006

Darwin Calendar Server

The project I've been working on at Apple is now announced, and it's the Darwin Calendar Server. This server is a CalDAV implementation written in Python using the Twisted Framework.

Since we announced this last night, the poor little G4 we had running the Subversion and Trac instances has gotten rather intimidated by all the traffic, and at the moment it's taking a nap. So if you are willing to wait until next week to take a look, you'll probably have better luck then. (Yes, new hardware has been ordered.)

November 04, 2005

TwistedDAV is moving in with Twisted

The integration of TwistedDAV with Twisted is getting under way, starting with a move of the TwistedDAV code from my red-bean repository over to a branch in the Twisted repository.

Its module namespace will be changed from twisteddav.* to twisted.web2.dav.* and some functionality is moving from the dav submodule into web2 proper. A bunch of other work is forthcoming.
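While the rename is in flight, code that uses TwistedDAV may need to run against either namespace. A minimal sketch, assuming you want to try the new twisted.web2.dav name first and fall back to the old twisteddav one; the helper name import_first is hypothetical and not part of either package.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module in module_names that imports successfully.

    Hypothetical helper for straddling a package rename, e.g.
    import_first("twisted.web2.dav", "twisteddav").
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as e:
            # Remember why each candidate failed, for a useful final error.
            errors.append("%s: %s" % (name, e))
    raise ImportError("no candidate module could be imported: " + "; ".join(errors))
```

Note that catching ImportError also hides a module that exists but fails while importing its own dependencies, so a shim like this is best kept short-lived.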

September 27, 2005

TwistedDAV

Yesterday, I got an OK from Apple to release TwistedDAV, a Python WebDAV server add-on to twisted.web2 that I've been working on at Apple, as an open source project under an M.I.T. license.

The intention is for this code to get assimilated into twisted.web2, but that will take some time, as we'll need to do some reviewing and refactoring in both code bases in order to get it just right. TwistedDAV will live in my red-bean Subversion repository while that work is underway and, assuming we get all of the functionality into twisted.web2, the code on red-bean will go away when we're done.

Right now it's a functional (though possibly still buggy) DAV level 1 server. DAV level 2 support will mean adding LOCK/UNLOCK and all of the associated behavior in the other methods. There are also the beginnings of support for the REPORT method, which is part of WebDAV versioning.
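The level 1 versus level 2 distinction is what a WebDAV server advertises in the DAV header of its OPTIONS response (RFC 2518 compliance classes). A minimal sketch, assuming the server knows which methods it implements; the function name and the exact method set treated as the class 1 minimum are illustrative assumptions, not TwistedDAV's code.

```python
# Methods this sketch treats as the class 1 minimum. RFC 2518 defines PROPFIND,
# PROPPATCH, MKCOL, COPY and MOVE, and specifies WebDAV behavior for the others.
CLASS_1_METHODS = frozenset([
    "OPTIONS", "GET", "PUT", "DELETE",
    "PROPFIND", "PROPPATCH", "MKCOL", "COPY", "MOVE",
])
# Class 2 compliance adds locking on top of class 1.
CLASS_2_METHODS = frozenset(["LOCK", "UNLOCK"])


def dav_header(implemented_methods):
    """Return the DAV header value for an OPTIONS response, or None."""
    methods = {m.upper() for m in implemented_methods}
    if not CLASS_1_METHODS <= methods:
        return None  # not enough of WebDAV to claim any compliance class
    if CLASS_2_METHODS <= methods:
        return "1, 2"
    return "1"
```

A server with LOCK but no UNLOCK still only claims class 1, which matches the idea that level 2 means the whole locking story, not one method.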

Rather than storing properties in a database off to the side like mod_dav, TwistedDAV uses Bob Ippolito's xattr library, which uses extended attributes on files to store the data. This keeps the data in the filesystem and associated with the file. The downside is that, for the time being, this solution is Mac-specific. The code for that is all in one class, though, so it can be replaced rather easily with a mod_dav type solution or perhaps something more clever.
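Keeping property storage behind one small class is what makes it swappable. A minimal sketch of that shape with two hypothetical backends: one stores each dead property as an extended attribute via Python's os.getxattr()/os.setxattr() (Linux-only in the standard library, used here instead of Bob Ippolito's xattr package), and one in the mod_dav spirit keeps a JSON sidecar file next to the resource. The class names, the "user.dav." prefix and the sidecar file naming are all assumptions.

```python
import errno
import json
import os


class XattrPropertyStore:
    """Dead properties stored as extended attributes on the file itself."""

    # Unprivileged processes on Linux may only write the "user." namespace.
    PREFIX = "user.dav."

    def __init__(self, path):
        self.path = path

    def get(self, name):
        try:
            return os.getxattr(self.path, self.PREFIX + name).decode("utf-8")
        except OSError as e:
            if e.errno == errno.ENODATA:  # attribute not set
                return None
            raise

    def set(self, name, value):
        os.setxattr(self.path, self.PREFIX + name, value.encode("utf-8"))

    def names(self):
        return [a[len(self.PREFIX):] for a in os.listxattr(self.path)
                if a.startswith(self.PREFIX)]


class SidecarPropertyStore:
    """Dead properties stored in a JSON file beside the resource."""

    def __init__(self, path):
        directory, base = os.path.split(path)
        self.sidecar = os.path.join(directory, "." + base + ".props.json")

    def _load(self):
        try:
            with open(self.sidecar, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def get(self, name):
        return self._load().get(name)

    def set(self, name, value):
        props = self._load()
        props[name] = value
        with open(self.sidecar, "w", encoding="utf-8") as f:
            json.dump(props, f)

    def names(self):
        return list(self._load())
```

The trade-off described above shows up directly: renaming a file carries its extended attributes along, while the sidecar has to be moved or copied by hand.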

August 02, 2005

FileMerge and Subversion

Bill wrote up a blurb on using FileMerge with Subversion which reminded me that I had already written a couple of tools in that vein, so I've posted them on Red-Bean.

svn-viewdiff wraps the command line interface of GNU diff around FileMerge and is usable as a diff-cmd with Subversion (pass it with the --diff-cmd argument to svn diff, or edit ~/.subversion/config to make it the default). This sets the merge target to the working copy, so if you deselect diffs in FileMerge and save, it will save to your working copy. I find this handy for quickly undoing debug code before committing.
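Subversion runs a diff-cmd program with GNU diff-style arguments: diff options (-u by default), two -L labels, then the two files, and for a working-copy diff the second file is the working copy. A minimal sketch of the argument handling such a wrapper needs, assuming FileMerge is launched through the opendiff command-line tool with its -merge option pointed at the right-hand file; this is an illustration, not the actual svn-viewdiff script.

```python
def filemerge_command(diff_args):
    """Translate GNU diff-style arguments from `svn diff --diff-cmd` into an
    opendiff command that saves merges back into the right-hand file."""
    files = []
    i = 0
    while i < len(diff_args):
        arg = diff_args[i]
        if arg == "-L" and i + 1 < len(diff_args):
            i += 2  # skip the label; FileMerge shows its own file names
            continue
        if not arg.startswith("-"):
            files.append(arg)  # anything that isn't an option is a file
        i += 1
    if len(files) != 2:
        raise ValueError("expected exactly two files, got %r" % files)
    left, right = files
    return ["opendiff", left, right, "-merge", right]
```

A real wrapper would hand the resulting list to subprocess.call(), and making it the default is a matter of setting diff-cmd in the [helpers] section of ~/.subversion/config.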

svn-resolve is a tool to facilitate using FileMerge to resolve merge conflicts after an svn update or svn merge. It uses FileMerge as a three-way merge tool, allowing you to select which diff to take from which file.
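When svn update hits a conflict in foo.c, it leaves foo.c.mine (your edits), plus foo.c.rOLD and foo.c.rNEW, where the older revision is the common ancestor. A minimal sketch of turning those files into a three-way FileMerge session, assuming opendiff's -ancestor and -merge options and the update-style file naming (svn merge conflicts name their files differently); the function names are hypothetical and this is not the svn-resolve script itself.

```python
import os
import re


def conflict_files(path):
    """Return (mine, ancestor, theirs) for a conflicted update of path."""
    mine = path + ".mine"
    if not os.path.exists(mine):
        raise FileNotFoundError(mine)
    directory, base = os.path.split(path)
    pattern = re.compile(re.escape(base) + r"\.r(\d+)")
    revisions = []
    for name in os.listdir(directory or "."):
        match = pattern.fullmatch(name)
        if match:
            revisions.append((int(match.group(1)), os.path.join(directory, name)))
    if len(revisions) != 2:
        raise ValueError("expected two .rN files for %s, found %d" % (path, len(revisions)))
    revisions.sort()  # numeric sort, so r5 comes before r12
    (_, ancestor), (_, theirs) = revisions
    return mine, ancestor, theirs


def filemerge_resolve_command(path):
    """Build an opendiff command for a three-way merge saved back to path."""
    mine, ancestor, theirs = conflict_files(path)
    return ["opendiff", mine, theirs, "-ancestor", ancestor, "-merge", path]
```

Once the merged result is saved, svn resolved foo.c clears the conflict and removes the extra files.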

July 23, 2005

ApacheCon Europe 2005

ApacheCon is always a blast. And Germany is awesome. We've even gotten into a routine. Every evening, Fitz and I and whoever is around go strolling about Stuttgart looking for food, we eat up, then we head over to Lavazza, where we get lattes and ice cream. The waitress there, I'm pretty sure, doesn't really like us, but we like lattes and ice cream.

I'm working on a WebDAV server in Python (based on Twisted) and having a week of sitting next to Greg and Roy every afternoon and bombarding them with questions has been insanely useful. I was not expecting to get much work done this week, but I was actually very productive.

Fitz and I ran the ApacheCon Lightning Lottery Talks, which was quite fun. The format of the lightning talks is that prospective speakers submit a topic at the beginning of the session (or beforehand, such as when I run into anyone I know and ask them what topic they are going to speak about…) and we draw names at random during the session. Once your name is drawn, you get 30 seconds to give a talk that can last up to 5 minutes. After 5 minutes, we find ways to remove you from the stage. We needed a timer tool for this, so I wrote one in Cocoa. Roy, being all clever, found some JavaScript and had one running in a web browser in far less time, but web browsers are lame application platforms, so I kept hacking on my little app. It turned out rather nicely, with a nice big timer on screen. It perhaps worked too well, because we never got an opportunity to remove a speaker.

We did have some great talks, my favorite being on implementing a Subversion class loader for Java, such that it would find the class in a Subversion repository (using HTTP), compile it, and load it. It devolved into offline support, where it could read your email for the commit logs and piece together the needed Java code from the patches in the logs. I also enjoyed Rich Bowen's talk about why he hates Apache HTTPd.

July 17, 2005

Charmin hasn't made it to Germany

Holy cow, it takes a long time to get to Stuttgart from Santa Clara. So I get up in the morning and Kristen drives me to San Francisco airport, where we discover that my Delta flight to Atlanta is delayed by an hour, which means I won't make my connection to Frankfurt. OK, well that sucks. The next flight would be the following morning, which really sucks, because I hate getting up early and it just wasted a day of my week in Germany. Better still, we're having ID problems because I didn't book my flight; a travel agency did it on behalf of ApacheCon, which is the why of this whole trip.

See, my last name is Sánchez Vega, but in the States having a last name with two parts really confuses people, so we Puerto Ricans often play games with our last names. Some folks hyphenate them (Sánchez-Vega), and others drop the latter half (Sánchez). Using only the first part of your last name is actually pretty normal in Puerto Rico; people understand that you are shortening your name, much as when I say I'm Wilfredo, most people, even stateside, recognize that there is more to my name, but one doesn't always say all of it. Anyway, I've never changed my last name officially; it's still Sánchez Vega, but when I got my driver's license and my passport (which is a whole story as well), I used the name Wilfredo Sánchez.

The plane tickets, however, were booked under Wilfredo Sánchez Vega, because the kind folks in Germany booked them for me. So the gate agent who is trying to get me on the next day's flight is unsure of this whole situation, like I'm Sánchez, not Sánchez Vega, so that's not really my ticket. I have to say that the whole business of strict authentication on using the ticket despite absolutely no authentication on buying the ticket is complete crap, and it's just a game the airlines play to price discriminate; any claim that this is a security thing is bunk, even if the government has been sucked into playing along.

Which isn't to say that the gate agent wasn't very nice; she was simply understandably confused by the bogus process she has to follow. It turns out that her friend at the next counter was Latin American and was familiar with the weird two-part last name thing and vouched for the OK-ness of that being my ticket, and explained that the folks in Europe aren't quite so ignorant; I'd be fine on the other side of the pond.

The good news is that this other agent also happens to have mad gate agent skillz, and we were getting along with her, so she decides to get us better hook-ups than this next day nonsense. After much wrangling with the computer she scores me a series of flights: San Francisco to Atlanta to Madrid on Delta, then Madrid to Frankfurt on Iberia. Longer travel time, but it'll happen that day, plus the flight from Atlanta to Madrid (the longest leg) was in Business Class. Now we're talking.

So I get on the next flight to Atlanta and Kristen goes on to San Francisco for the day. I scoot on over to my connection to Madrid, at which point I was sure I had lost my passport, possibly I left it at the counter in San Francisco. Nyeargh!!! Turns out it was in my shirt pocket. OK. Needless panic, it's over, get on the plane.

Last time ApacheCon was in Europe, I went there on Apple's dime via British Airways Business Class. It was swank. There's a nice lounge to wait in at SFO, the seats recline flat into little beds… oh, boy, that was nice. Delta wasn't quite so swank, but on this flight the service was excellent, and, more importantly, I took advantage of it. I got the appetizer this and the salad that, and oh, some Shiraz, and the main course (more Shiraz), and so on. It was like a long take-your-time dinner at a pretty OK restaurant. The main course was blah, but the rest was just fine, and four glasses of Shiraz, four glasses of port and a hot fudge sundae later, I was feeling OK with life as I took a nap in the roomy seat. And then there was breakfast. It was all good.

Then I'm in Madrid, and pretty well lost. I meander through the airport hoping to find Iberia, which is the national carrier of Spain, and on which I'm connecting to Frankfurt. Being a native (but very rusty) Spanish speaker, I'm thinking I can talk the talk. So when I find someone looking like they could help me, the conversation would start with me asking a question in Spanish, them responding in much faster Spanish, me responding in confused Spanish and a little bit of English, and then they would apologize for assuming I knew Spanish and continue in English. Oh, well.

The flight to Frankfurt was OK. I bought a chicken curry sandwich on the flight and I was sure when I got it that it was egg salad instead, but it was, actually, chicken curry.

Then in Frankfurt, I found my way to the train station. Fortunately, most signs have English in small text. Unlike in Spain, in Germany I can't even try to fake it. I know no German at all. After fumbling with the automatic ticket machines to get them to speak English, I was all set. I had missed the reserved train I had tickets for, but with a little help from the info desk, I was able to get on a train in Frankfurt, hop off in Mannheim, and connect on to Stuttgart. German trains are nice. Really nice. It makes me wonder why my very rich country has no cool rail system going.

In Stuttgart I discover that Germans use bark, right off the tree, as toilet paper. Eek.

A bad ride later, I'm at the Maritim Hotel, starving and exhausted after about 26 hours of continuous travel. I find the Hackathon room, where I find several ASF folks, and decide I really need a taco. Unfortunately, that'll have to wait until I get back home.

June 16, 2005

Quote of the Day

…the gist being that BSD guys are a lot like Linux guys, except they have kissed girls.

Forbes.com

June 03, 2005

Quote of the Day

well it's all cases of "almost works", but if you put enough "almost works" things together in a particular way then you end up with something that approaches "works" as effort goes towards infinity

Bob Ippolito, on developing for web browsers

May 19, 2005

ApacheCon Europe 2005

ApacheCon Europe 2005 will be in Stuttgart, Germany on July 18th through the 22nd. I'll be going, and am seriously looking forward to running into the usual riffraff and scoundrels that write code because they like to.

The real question is: Do they rent stretch Hummer limousines in Stuttgart?

April 28, 2005

PHP 5.0.4 pooched my web server

I run a web server (Apache 2.1.x) at work which is the collaboration system for the developers. It provides a Subversion repository and a WebDAV volume we stick shared files onto, etc.

We figure a Wiki will be a cool addition (and, in fact, it really is) so I install PHP 5.0.4 and dokuwiki. While I'm at it, I upgrade from Apache 2.1.1 to 2.1.4 (upgraded APR also), and Subversion from 1.1.4 to 1.2-rc2.

It's all great, the Wiki is awesome (mad props to dokuwiki, it's simple to use and looks nice right out of the box), people are adding to the wiki, yay.

Sometime later, people start complaining that Subversion commits are failing due to some strange permissions error from the server. I look into it and the transactions directory in the repository DB (I'm using FSFS) has directories in it with mode 666 (-rw-rw-rw-). After creating these directories without the execute bit set, it fails to access the files within them and produces this error.

So now I'm wondering what kind of bug Subversion 1.2-rc2 has that wasn't in 1.1.4. I briefly had 1.2-rc1 installed and it didn't have this problem, so maybe it's a new bug. The Subversion folks weren't biting, since no other such bugs had been reported. Then, after backing out to Subversion 1.1.4, I still had the problem. I'm befuddled. It seems like maybe a umask problem, but it's intermittent, so it's happening in the httpd process. Argh!

We later discover that this problem isn't limited to Subversion. Users of the WebDAV volume are starting to see this issue with newly created files and directories (both cases mode 666, where the usual modes are 755 and 644). It's really not looking like a Subversion bug.
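Those usual modes fall out of the process umask: mkdir() typically requests 0777 and file creation 0666, and the bits set in the umask are cleared, so a umask of 022 yields 755 and 644. A directory at 666 has no execute (search) bit at all, which is why nothing inside it can be reached. A minimal sketch of that arithmetic, plus a hypothetical helper for finding affected directories under a repository or WebDAV root; both function names are made up for illustration.

```python
import os
import stat


def apply_umask(requested_mode, umask):
    """Mode a new file or directory actually gets: requested bits minus the umask."""
    return requested_mode & ~umask


def unsearchable_directories(root):
    """Yield directories under root whose owner execute (search) bit is clear."""
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
            if stat.S_ISDIR(mode) and not mode & stat.S_IXUSR:
                yield path
```

Running the helper periodically against the FSFS transactions directory would show when the bad modes reappear, which helps correlate them with a particular change to the server.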

So now I'm building and installing all manner of versions of APR, httpd, and Subversion, trying to isolate this problem to some version of something, and I'm still stumped. I even went as far as to go back to APR 0.9.5, and httpd 2.0.x, older versions than what I had originally, just to have the canonical supported Subversion setup. The only version that I can get to work is the binaries I had from before (which I had cleverly backed up), but that didn't include PHP, which I needed for the Wiki.

And then it dawned on me that the other thing I changed was adding PHP to the mix. So I disabled the PHP module and waited. Sure enough, a day later, there have been no permissions problems reported. (Though plenty of "Hey why's the Wiki busted?")

So it appears that PHP 5.0.4 does some strange thing that alters the file mode of files created by httpd to the number of the beast. I won't read too far into that, but I'm going to try PHP 4.3.11, which should be fine for dokuwiki, and see how that fares.

Moral of the story: don't change everything at once. Yes, I've already learned this moral many times.

April 26, 2005

Google never ceases to amaze

My friend Brian told me about this cool new Google "Stuff Finder" feature:

Where are my socks?

Now I know.

April 04, 2005

Subversion 1.1.4 packages available

Subversion 1.1.4 packages are now on my iDisk.

March 01, 2005

Subversion, other packages updated

I've rolled all of the packages on my iDisk again. There are a few updates (GPG, libxml2, HTTPd, mod_python, Subversion, stunnel).

Most notably, thanks to Matthew Willis, the Neon package (same version) now installs a shared library. If you install anything that depends on Neon (e.g. Subversion), you'll need to re-install Neon as well. Additionally, this Neon build supports HTTPS such that Subversion notices, and the new Subversion build therefore supports HTTPS.

In related news, .Mac has been fixed so that the web front-end to my iDisk (linked to above) now actually shows correct file names for files with long names. Accessing it via Finder (Go -> Other User's Public Folder…) is still preferable, though.

February 15, 2005

Subversion for CodeWarrior

Stephen Davis has written a Subversion plugin for Metrowerks CodeWarrior. If you are a CodeWarrior user, you may want to check it out.

February 11, 2005

Spiffy HTML Templating

Yes, it's been done a thousand times, and I've gone and reinvented the wheel by creating yet another templating system for HTML. You might wonder what might possess me to do that, and the answer is that, basically, they all suck. It's entirely possible that mine sucks as well, but it works for my needs, and that's what I wrote it for. Actually, they don't all suck. WebObjects, for example, is pretty good, despite being a Java platform, which is a major reason why I can't use it. It's also not free (but then it's a lot more than a templating engine and includes EOF, which is well worth paying for).

My requirements:

  • No code in HTML.
    Instead, you insert tags, which are replaced with dynamic content. HTML is for composing content, not programming.
  • Components.
    This is something that WebObjects does very well. I want to be able to define nestable containers that implement widgets and whatnot.
    Components are not necessarily embedded in code. They can live in independent files that can be managed entirely apart from code. They may be composed entirely in HTML, or implemented entirely in code, or a combination thereof.
    WebObjects does this by using three files: HTML, bindings, and code. I put the bindings in the HTML, so I use two files: HTML and code.
  • Stream-based.
    Some toolkits (WebObjects included) render a page in memory, then spit it out to the server all at once. That model means that a page doesn't start to render in the user's browser until after all of the page has been rendered to HTML on the server, which may take a while. It is sometimes possible to send a reasonable amount of data to the user before blocking on an expensive operation (e.g. a database query), and it should be possible to send data as soon as it becomes available.
  • Python.
    An awful lot of web tools are built for Java. Unfortunately, Java sucks for many reasons, not the least of which is that Java is an island, and doesn't play well with any other language/environment.
    I like Python, so I wrote it in Python, which is easily bridged to Objective-C, which means I can get to the entire Cocoa stack in Mac OS X. (Yeah, that's bridged to Java also, but not as well, and Java still sucks.) Not that AppKit is useful to a typical HTML application, but Core Data (in Tiger) sure might be.
  • Only one special tag.
    The <component> tag is the only tag that is special to the templating system.

Anyway, a first draft is checked into my Subversion tree at http://svn.red-bean.com/wsanchez/trunk/Spiffy/. Docs are sparse, but there is an example set of templates there, which I use to test the code. Spiffy doesn't do form handling (use the cgi module), nor is it an app server. It's just a templating engine, at least for now.
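To make the stream-based and single-special-tag requirements concrete, here is a minimal sketch of the idea, not Spiffy's actual API: a generator that passes static HTML through immediately and hands each <component name="..."/> tag to a registered Python callable whose output is streamed as it is produced. The self-closing tag syntax, the render() name and the component registry are all assumptions.

```python
import re
from typing import Callable, Dict, Iterable, Iterator

# Matches a self-closing tag such as <component name="clock"/>.
COMPONENT_TAG = re.compile(r'<component\s+name="(?P<name>[^"]+)"\s*/>')

Component = Callable[[], Iterable[str]]


def render(template: str, components: Dict[str, Component]) -> Iterator[str]:
    """Yield HTML chunks, sending static text before each component runs."""
    position = 0
    for match in COMPONENT_TAG.finditer(template):
        if match.start() > position:
            # Static HTML ahead of the tag can go to the client right away.
            yield template[position:match.start()]
        # The component's output is streamed chunk by chunk as it is produced.
        yield from components[match.group("name")]()
        position = match.end()
    if position < len(template):
        yield template[position:]
```

Nesting falls out naturally: a component can itself return render(other_template, components), so containers are just components whose output is another template.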

February 07, 2005

Enforcer

I've written a script called enforcer that can be used in a Subversion repository as a pre-commit hook in order to validate code before it is committed into the repository. This allows the repository administrator to enforce all sorts of evil rules. Doing this from scratch is pretty painful, so the script allows you to write hooks into specific commit events such as "this line was added/removed," "this file was modified/added/removed," etc.

From the doc string:

Enforcer is a utility which can be used in a Subversion pre-commit hook script to enforce various requirements which a repository administrator would like to impose on data coming into the repository.

A couple of example scenarios:

  • In a Java project I work on, we use log4j extensively. Use of System.out.println() bypasses the control that we get from log4j, so we would like to discourage the addition of println calls in our code.

    We want to deny any commits that add a println into the code. The world being full of exceptions, we do need a way to allow some uses of println, so we will allow it if the line of code that calls println ends in a comment that says it is ok:

    System.out.println("No log4j here"); // (authorized)

    We also do not (presently) want to refuse a commit to a file which already has a println in it. There are too many already in the code and a given developer may not have time to fix them up before committing an unrelated change to a file.

  • The above project uses WebObjects, and you can enable debugging in a WebObjects component by turning on the WODebug flag in the component WOD file. That is great for debugging, but massively bloats the log files when the application is deployed.

    We want to deny any commit of a file enabling WODebug, regardless of whether the committer made the change or not; these have to be cleaned up before any successful commit.

What this script does is use svnlook to peek into the transaction in progress. As it sifts through the transaction, it calls out to a set of hooks which allow the repository administrator to examine what is going on and decide whether it is acceptable. Hooks may be written (in Python) into a configuration file. If the hook raises an exception, enforcer will exit with an error status (and presumably the commit will be denied by the pre-commit hook). The following hooks are available:

  • verify_file_added(filename)
    Called when a file is added.
  • verify_file_removed(filename)
    Called when a file is removed.
  • verify_file_copied(destination_filename, source_filename)
    Called when a file is copied.
  • verify_file_modified(filename)
    Called when a file is modified.
  • verify_line_added(filename, line)
    Called for each line that is added to a file. (verify_file_modified() will have been called on the file beforehand.)
  • verify_line_removed(filename, line)
    Called for each line that is removed from a file. (verify_file_modified() will have been called on the file beforehand.)
  • verify_property_line_added(filename, property, line)
    Called for each line that is added to a property on a file.
  • verify_property_line_removed(filename, property, line)
    Called for each line that is removed from a property on a file.

In addition, these functions are available to be called from within a hook routine:

  • open_file(filename)
    Returns an open file-like object from which the data of the given file (as available in the transaction being processed) can be read.

In our example scenarios, we can deny the addition of println calls by hooking into verify_line_added(): if the file is a Java file, and the added line calls println, raise an exception.

Similarly, we can deny the commit of any WOD file enabling WODebug by hooking into verify_file_modified(): open the file using open_file(), then raise if WODebug is enabled anywhere in the file.

Note that verify_file_modified() is called once per modified file, whereas verify_line_added() and verify_line_removed() may each be called zero or many times for each modified file, depending on the change. This makes verify_file_modified() appropriate for checking the entire file and the other two appropriate for checking specific changes to files.

These example scenarios are implemented in the provided example configuration file enforcer.conf.
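For a feel of what such hooks look like, here is a minimal sketch of the two scenarios written against the hook signatures listed above; it is not the shipped enforcer.conf. The regular expressions for a println call and an enabled WODebug flag are assumptions, and the open_file() stand-in exists only so the hooks can be exercised outside enforcer, which supplies its own.

```python
import re


class CommitRejected(Exception):
    """Raised by a hook so that enforcer exits non-zero and the commit is denied."""


# Enforcer provides open_file() to hook code; this local-filesystem stand-in
# is only defined when running the hooks on their own.
try:
    open_file
except NameError:
    def open_file(filename):
        return open(filename)

PRINTLN = re.compile(r"System\.out\.println\s*\(")
AUTHORIZED = re.compile(r"//\s*\(authorized\)\s*$")
WODEBUG = re.compile(r'\bWODebug\s*=\s*"?(YES|true)\b', re.IGNORECASE)


def verify_line_added(filename, line):
    """Deny newly added println calls in Java files unless marked (authorized)."""
    if (filename.endswith(".java")
            and PRINTLN.search(line)
            and not AUTHORIZED.search(line)):
        raise CommitRejected("%s: unauthorized println: %s" % (filename, line.strip()))


def verify_file_modified(filename):
    """Deny any modified .wod file that enables WODebug anywhere in it."""
    if not filename.endswith(".wod"):
        return
    f = open_file(filename)
    try:
        for line in f:
            if WODEBUG.search(line):
                raise CommitRejected("%s: WODebug is enabled" % filename)
    finally:
        f.close()
```

Because the println check only looks at added lines, files that already contain println calls still commit cleanly, as the first scenario requires. Newly added .wod files would go through verify_file_added(), so the same check would likely be wanted there too.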

When writing hooks, it is usually easier to test the hooks on committed transactions already in the repository, rather than installing the hook and making commits to test them. Enforcer allows you to specify either a transaction ID (for use in a hook script) or a revision number (for testing). You can then, for example, find a revision that you would like to have blocked (or not) and test your hooks against that revision.
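The transaction-or-revision choice maps directly onto svnlook, whose subcommands accept either -t TXN or -r REV. A minimal sketch of how a tool like enforcer might build its svnlook calls and read the output of svnlook changed, where each line has a content status column (A, D, U or _), a property column (U when properties changed), and the path starting at column 4; the helper names are hypothetical and this is not enforcer's code.

```python
def svnlook_command(subcommand, repository, txn=None, revision=None):
    """Build an svnlook command line against exactly one of a txn or a revision."""
    if (txn is None) == (revision is None):
        raise ValueError("specify exactly one of txn or revision")
    selector = ["-t", txn] if txn is not None else ["-r", str(revision)]
    return ["svnlook", subcommand] + selector + [repository]


def parse_changed(output):
    """Parse `svnlook changed` output into (content_status, props_changed, path)."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        content, props, path = line[0], line[1], line[4:]
        changes.append((content, props == "U", path))
    return changes
```

Hooks could then be driven by dispatching on the status column: A to verify_file_added(), D to verify_file_removed(), U to verify_file_modified(). Swapping txn= for revision= is all it takes to replay an old commit against the hooks.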

January 07, 2005

Subversion 1.1.2 Packages Available

Subversion 1.1.2 packages are now available on my public iDisk. This build excludes the Perl bindings, which don't build properly in 1.1.2. Expect a return of the Perl bindings in 1.1.3.

December 08, 2004

CVS to Subversion

I am now in the process of converting the CVS repository we use at work into a Subversion repository. I'm using, naturally, the Subversion packages I've built and published on my iDisk.

The setup I am starting with is a dedicated CVS server which is accessed via SSH. The CVS repository is stored on a mounted NFS volume, which is served by a fancy server box that has backups and so on.

The Subversion repository cannot live on an NFS volume, so the target setup is a Subversion repository that is exposed via HTTP/WebDAV using the Apache HTTPd 2.1 web server. (I used the in-development HTTPd 2.1 instead of the "stable" 2.0 because I wanted to build using the 1.0 version of APR…)

The Subversion repository is kept on the local disk (at /var/subversion/repository) which is backed up to the NFS volume. On the NFS volume, I have a directory set up like so:

drwxr-xr-x  4 www  svn  backups/
drwxr-xr-x  3 www  svn  backups/full/
drwxr-xr-x  2 www  svn  backups/revisions/
drwxr-xr-x  2 svn  svn  bin/
-rwxr-xr-x  1 svn  svn  bin/hot-backup
-rwxr-xr-x  1 svn  svn  bin/mailer
drwxr-xr-x  2 svn  svn  conf/
-rw-r--r--  1 svn  svn  conf/mailer.conf
-rw-r--r--  1 svn  svn  conf/svn-access.conf
lrwxrwxrwx  1 svn  svn  repository/ -> /var/subversion/repository/

The repository at /var/subversion/repository and its contents are owned by user www and group www. The backups directory is also owned by www. Everything else is owned by user svn. The www user must own the repository because it is managed by the mod_dav_svn module in the Apache httpd process, which runs as www. The backups directory is also owned by www so that a cron job for the www account can both read the repository and write to the backups directory:

# Back up the Subversion repository twice weekly (midnight on Sundays and Thursdays)
0 0 * * 0,4     /jingle/svn/bin/hot-backup /jingle/svn/repository /jingle/svn/backups/full

In the bin directory, I've added the hot-backup.py and mailer.py scripts. These scripts are in the Subversion source tree, but are not installed with Subversion.

In the conf directory, I have the configuration file for mailer.py and the AuthzSVNAccessFile access control file for the Subversion repository, which allows finer-grained access control. The httpd.conf file includes this configuration:

<Location /svn>
    AuthType Digest
    AuthName "Subversion"
    AuthDigestDomain http://svn-server/
    AuthDigestProvider file
    AuthUserFile  /etc/httpd/auth/users
    AuthzSVNAccessFile /nfs/svn/conf/svn-access.conf

    Require valid-user

    DAV svn
    SVNPath /var/subversion/repository
    RemoveHandler .cgi
    RemoveOutputFilter .html
</Location>

The remaining piece is populating the repository itself.

The last time I ran a repository conversion using cvs2svn, it took all day. One likely factor is that the CVS repository is on an NFS volume; a local volume would be considerably faster. It probably doesn't help that the CVS server (733 MHz G4, 512 MB RAM) isn't a top-of-the-line machine by today's measures.

So I'm going to want to do this on a faster machine with plenty of RAM using the local disk. Fortunately, I have that on my desk (2 GHz G5x2, 4 GB RAM). With that much RAM, I can also use a RAM disk to help things along. The steps I need to take, therefore, are:

  1. Cache the CVS repository to my local disk (in case I need to do this multiple times)
  2. Create a mondo RAM disk to store the working data
  3. Copy the CVS repository to the RAM disk
  4. Run cvs2svn, telling it to use the RAM disk for scratch space
  5. Load the dumpfile from cvs2svn into a new repository on the RAM disk

It turns out that the largest RAM disk I can make on my computer without having malloc() barf is 4632520 blocks (2.2 GB) even though I have about 3 GB of RAM free. Obviously, if you have less RAM, you may need to make a smaller RAM disk.
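The ram:// size handed to hdid counts 512-byte sectors, so 4632520 sectors is about 2.37 billion bytes, or 2.21 GiB (the "2.2 GB" figure), and the 2000000-sector disk the script below creates for loading the repository is just under 1 GiB. A small sketch of the conversion, with hypothetical helper names, for picking a size that fits your machine:

```python
SECTOR_BYTES = 512  # ram:// device sizes are counted in 512-byte sectors


def ram_disk_sectors(gib):
    """Sector count to pass to `hdid -nomount ram://N` for a disk of gib GiB."""
    return int(gib * 2**30) // SECTOR_BYTES


def sectors_to_gib(sectors):
    """Size in GiB of a ram:// device with the given sector count."""
    return sectors * SECTOR_BYTES / 2**30
```

For example, ram_disk_sectors(1) gives 2097152, the sector count for a 1 GiB RAM disk.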

The temporary files created by cvs2svn while it is working can be quite large, and I had trouble getting both the CVS repository and the temp files into smaller RAM disks, so I used the largest size I could get away with. If you need a smaller RAM disk, Fitz tells me that it's probably best to put CVS on the RAM disk and the temp files on your local disk. Your CVS repository will be of a known size, whereas the size of the temp files is not. Also, cvs2svn will hit the CVS files multiple times, so the RAM disk is probably best spent on those files. I haven't done any metrics, since I managed to get it all into the RAM disk.

I have cvs2svn generate a dump file rather than load straight into a repository. This has a couple of advantages. First, if something goes wrong with loading, I don't need to re-run cvs2svn. Second, it gives me an opportunity to unmount the RAM disk I used for cvs2svn and start with a new one for the repository work.

Here is my script:

#!/bin/sh

##
# Configuration
##

wd="$(pwd -L)";

cvs_repo_remote="svn-server:/nfs/cvs/root/jingle/Jingle";
cvs_repo_local="${wd}/repository-cvs";
svn_dumpfile="${wd}/repository.svndump";
svn_repo_remote="svn-server:/var/subversion/repository";
svn_repo_local="${wd}/repository-svn";

# 4632520 blocks (2.2GB) appears to be as big as I can get away with
# without malloc errors
ram_disk_size="4632520";
ram_disk_name="cvs2svn-scratch";

##
# Do The Right Thing
##

echo "Starting up at $(date)";

if [ ! -f "${svn_dumpfile}" ] && [ ! -d "${svn_repo_local}" ]; then
    if [ -d cvs2svn ]; then
        echo "Updating cvs2svn...";
        svn update cvs2svn;
    else
        echo "Checking out cvs2svn...";
        svn checkout http://svn.collab.net/repos/cvs2svn/trunk cvs2svn;
    fi;

    if [ -n "${cvs_repo_remote}" ]; then
        echo "Copying CVS repository locally...";
        if ! rsync                      \
          --recursive                   \
          --delete                      \
          --verbose --progress --stats  \
          "${cvs_repo_remote}/"         \
          "${cvs_repo_local}/"; then
            echo "FATAL: copy failed.";
            exit 1;
        fi;
    fi;

    # Use the configured size rather than repeating the sector count here.
    raw_device="$(echo $(hdid -nomount "ram://${ram_disk_size}"))";

    if [ -z "${raw_device}" ]; then
        echo "Unable to create RAM disk raw device.";
        exit 1;
    fi;

    echo "Created RAM disk raw device: ${raw_device}";

    echo "Formatting as case-sensitive HFS+...";
    newfs_hfs -s -v "${ram_disk_name}" "${raw_device}";

    echo -n "Mounting filesystem... ";
    fs_device="$(hdiutil mountvol "${raw_device}" | tail -1 | awk '{print $1}')";

    if [ -z "${fs_device}" ]; then
        echo "";
        echo "FATAL: Unable to create RAM disk.";
        hdiutil detach "${raw_device}";
        exit 1;
    fi;

    mount="$(df -lk | grep "^${fs_device}" | awk '{print $6}')";
    if [ -z "${mount}" ]; then
        echo "";
        echo "FATAL: Unable to locate RAM disk mount point.";
        hdiutil detach "${fs_device}";
        hdiutil detach "${raw_device}";
        exit 1;
    fi;
    echo "${mount}";

    df -lk | grep "^${fs_device}";

    echo "Copying CVS repository to RAM disk...";
    # Copy from a subshell so the script's working directory is left unchanged.
    (cd "$(dirname "${cvs_repo_local}")" && pax -rvw "$(basename "${cvs_repo_local}")" "${mount}");

    mkdir "${mount}/tmp";

    echo "Starting conversion at $(date)...";

    ./cvs2svn/cvs2svn                                   \
      --tmpdir="${mount}/tmp"                           \
      --mime-types=/usr/local/apache/conf/mime.types    \
      --no-default-eol --keywords-off                   \
      --encoding=UTF-8                                  \
      --dumpfile="${svn_dumpfile}"                      \
      --dump-only                                       \
      "${mount}/$(basename "${cvs_repo_local}")";

    echo "Conversion completed at $(date).";

    echo "Detaching RAM volume device ${fs_device}...";
    hdiutil detach "${fs_device}";

    echo "Detaching RAM raw device ${raw_device}...";
    hdiutil detach "${raw_device}";
fi;

if [ ! -f "${svn_dumpfile}" ]; then
    echo "No dumpfile?";
    exit 1;
fi;

if [ ! -d "${svn_repo_local}" ]; then
    raw_device="$(echo $(hdid -nomount ram://2000000))";

    if [ -z "${raw_device}" ]; then
        echo "Unable to create RAM disk raw device.";
        exit 1;
    fi;

    echo "Created RAM disk raw device: ${raw_device}";

    echo "Formatting as case-sensitive HFS+...";
    newfs_hfs -s -v "${ram_disk_name}" "${raw_device}";

    echo -n "Mounting filesystem... ";
    fs_device="$(hdiutil mountvol "${raw_device}" | tail -1 | awk '{print $1}')";

    if [ -z "${fs_device}" ]; then
        echo "";
        echo "FATAL: Unable to create RAM disk.";
        hdiutil detach "${raw_device}";
        exit 1;
    fi;

    mount="$(df -lk | grep "^${fs_device}" | awk '{print $6}')";
    if [ -z "${mount}" ]; then
        echo "";
        echo "FATAL: Unable to locate RAM disk mount point.";
        hdiutil detach "${fs_device}";
        hdiutil detach "${raw_device}";
        exit 1;
    fi;
    echo "${mount}";

    df -lk | grep "^${fs_device}";

    echo "Creating subversion repository...";
    svnadmin create --fs-type "fsfs" "${mount}/$(basename "${svn_repo_local}")";

    echo "Loading data into subversion repository...";
    svnadmin load "${mount}/$(basename "${svn_repo_local}")" < "${svn_dumpfile}";

    echo "Copying repository...";
    cd "${mount}" && pax -rvw "$(basename "${svn_repo_local}")" "$(dirname "${svn_repo_local}")";

    echo "Load completed at $(date).";

    echo "Detaching RAM volume device ${fs_device}...";
    hdiutil detach "${fs_device}";

    echo "Detaching RAM raw device ${raw_device}...";
    hdiutil detach "${raw_device}";
fi;

echo "Finished at $(date)";
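
Two details of the RAM disk steps above are easy to get wrong. First, the size in a `ram://` URL is a count of 512-byte sectors, so `ram://2000000` is roughly 1 GB (2,000,000 × 512 = 1,024,000,000 bytes). Second, the `df -lk | grep "^${fs_device}"` lookup is a prefix match, so a device such as /dev/disk1 would also match /dev/disk10s1. Below is a minimal Python sketch of both calculations. The function names are hypothetical. The sketch assumes, as the script's `awk '{print $6}'` does, that the mount point is the sixth whitespace-separated `df` column and contains no spaces. Newer Mac OS X releases print more columns, so the column index is a parameter.

```python
SECTOR_BYTES = 512  # ram:// sizes are counted in 512-byte sectors


def ram_disk_sectors(size_bytes):
    """Smallest whole number of sectors that holds size_bytes."""
    return -(-size_bytes // SECTOR_BYTES)  # ceiling division


def ram_disk_url(size_bytes):
    """The ram:// URL to pass to `hdid -nomount` for a disk of size_bytes."""
    return "ram://%d" % ram_disk_sectors(size_bytes)


def mount_point_for(df_output, device, column=5):
    """Return the mount point of `device` in `df -lk` output, or None.

    The first field is compared exactly rather than as a prefix, so
    /dev/disk1 does not also match /dev/disk10s1. `column` is 0-based;
    5 corresponds to awk's $6. Mount points containing spaces are not handled.
    """
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) > column and fields[0] == device:
            return fields[column]
    return None


if __name__ == "__main__":
    print(ram_disk_url(2000000 * SECTOR_BYTES))  # ram://2000000
    sample = (
        "Filesystem 1024-blocks Used Avail Capacity Mounted on\n"
        "/dev/disk10s1 1000 10 990 1% /Volumes/Other\n"
        "/dev/disk1 1000000 1234 998766 1% /Volumes/RAMDisk\n"
    )
    print(mount_point_for(sample, "/dev/disk1"))  # /Volumes/RAMDisk
```

Given that sample, the script's grep pipeline would return both mount points joined by a newline. The exact comparison returns only the RAM disk.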

Many thanks to Fitz and the Subversion developers and fans on #svn for lots of help.

December 07, 2004

Subversion 1.1.1 packages available

I've updated my Subversion build to version 1.1.1. Subversion 1.1 offers the "FSFS" (flat file) back-end as an alternative to Berkeley DB.

The 1.1.1 package has been on my iDisk for a while, but it's gone through a few iterations in order to make the Python and Perl language bindings available. You'll need to install the SWIG package to use these bindings. The Java bindings are not included due to a build error and lack of interest in Java on my part.

September 24, 2004

Library insanity

Bob has an interesting post in his web log which includes some graphs that show the library dependencies in ls, python, and Mail.app.

ls links against two libraries (libncurses and libSystem) and picks up one extra from libSystem (libmathCommon... I'm not sure why that's a separate library, but there it is).

python links against libSystem as well as the CoreServices and Foundation frameworks, and picks up a fairly sizeable (perhaps a surprisingly large) number of extra libraries.

Mail links against some 14 libraries and frameworks and picks up more dependencies than one can look at without causing severe eye strain. I would guess that Mail uses a fraction of those libraries, but it nevertheless loads all of them.

I've never been a fan of "umbrella frameworks", and this shows why. I'm even more distressed by the hiding of "subframeworks" so that I can't use them without loading up a crapload of libraries I don't care about.

Say, for example, that I want to write a program that uses the public API in CoreGraphics and nothing else. This should be straightforward, but I can't link to CoreGraphics; that framework is nested inside of the ApplicationServices framework and I have to link against that instead. So instead of loading one library, I am forced to load a different library that loads the one I want plus twelve others, each of which links against more libraries.

This is severe library bloat, and it's why Apple has put a lot of energy into linker optimizations such as prebinding and various loading tricks. These are good optimizations, mind you, but it would be nice if they weren't quite so necessary.
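
The gap between the handful of libraries a program links against and the pile it actually loads is the transitive closure of the dependency graph: everything reachable from the program, not just its direct links. Below is a minimal Python sketch of that computation. The graph is a hypothetical, heavily trimmed stand-in for illustration only. Real data could be gathered by running `otool -L` on each binary. It is not the actual dependency list of any Mac OS X framework.

```python
from collections import deque


def loaded_libraries(graph, root):
    """Return every library `root` ends up loading, directly or transitively.

    `graph` maps a library name to the names it links against directly.
    Breadth-first search; the `seen` set guards against dependency cycles.
    """
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue
        seen.add(lib)
        queue.extend(graph.get(lib, ()))
    return seen


# Hypothetical, trimmed graph for illustration only.
GRAPH = {
    "myprogram": ["ApplicationServices"],
    "ApplicationServices": ["CoreGraphics", "ATS", "HIServices", "libSystem"],
    "CoreGraphics": ["libSystem"],
    "ATS": ["CoreServices", "libSystem"],
    "HIServices": ["CoreServices", "libSystem"],
    "CoreServices": ["libSystem"],
    "libSystem": ["libmathCommon"],
}

if __name__ == "__main__":
    direct = GRAPH["myprogram"]
    everything = loaded_libraries(GRAPH, "myprogram")
    print(len(direct), "direct,", len(everything), "loaded:", sorted(everything))
```

Even in this toy graph, wanting only CoreGraphics means one direct link but seven loaded libraries.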

August 05, 2004

Subversion 1.0.6 packages ready

A little slow here, but Subversion 1.0.6 is available on my iDisk.

June 09, 2004

Stack Traces in Python Threads

I mentioned before that I'm using XMLRPC to debug a multi-threaded Python application.

One of the things I'd like to get from this new debugging tool is a stack trace for each of the threads my application is running. Sometimes it appears to get stuck, and I'd like to know what it's doing. A deadlock of some sort is a likely cause, but I would really like to know where that deadlock is happening. The log output isn't as detailed as I'd like, and adding more logging just to tell me where I am requires frequent restarts, which is a major drag.

It seems, however, that there is no way in Python to get a list of all running threads. Nor can you, from one thread, get a stack trace for another thread. However, in my program, I'm creating a set of worker threads, and I'm keeping track of them already, so I don't have to ask for a list of all of the threads. The tricky part is getting stack traces from these threads. Lacking the ability to inspect another thread directly, I found that Python's tracing hooks, which the Python debugging and profiling tools are built on, can be used to get that information. Each worker thread calls sys.settrace(self._trace) in its run() method, and implements this tracing method:

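def _trace(self, frame, event, arg):
    # Remember the most recent frame this thread executed.
    self._frame = frame
    # Returning the trace function keeps line-level tracing on inside
    # this frame, so _frame stays current within long-running functions.
    return self._trace

In the main thread, a status() method returns a trace for each thread:

def status(self):
    import inspect

    self._lock.acquire()
    try:
        status = "Available threads:\n"

        for worker in self._available_threads:
            status = status + "  " + worker.getName() + ":\n"

            # A worker that hasn't hit a trace event yet has no frame.
            frame = getattr(worker, "_frame", None)
            if frame:
                status = status + "    stack:\n"
                # getouterframes() lists the innermost frame first.
                for outer, filename, line, function_name, context, index in inspect.getouterframes(frame):
                    status = status + "      " + function_name + "@" + filename + "#" + str(line) + "\n"

            status = status + "\n"

        return status
    finally:
        self._lock.release()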

I'm building up a string here because it's a return value that can be sent over XMLRPC and printed. Structured data may be more useful, but hopefully you get the idea. Now I can not only inspect my program's state remotely, but see exactly what it's doing:

[bluntman:~] wsanchez% python
Python 2.3 (#1, Sep 13 2003, 00:49:11) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xmlrpclib
>>> p = xmlrpclib.ServerProxy("http://localhost:8001", allow_none = 1)
>>> p.status()
Available threads:
  thread01:
    stack:
      _release_save@/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/threading.py#181
      wait@/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/threading.py#223
      wait@/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/threading.py#350
      run@/Users/wsanchez/Python/test/Manager.py#20
      __bootstrap@/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/threading.py#436
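
Here is a self-contained sketch of the same technique that runs as a script, assuming a modern Python 3 `threading` module. The `Worker` class, `format_stack`, and `wait_for_first_frame` are illustrative names, not code from the application above. Each worker installs a trace function with `sys.settrace()` at the start of `run()`, which affects only that thread. The trace function returns itself, so it also receives per-line events. If it returned None, it would only be called when a new function is entered. The main thread then walks the recorded frame's callers.

```python
import inspect
import sys
import threading
import time


class Worker(threading.Thread):
    def __init__(self, name, stop_event):
        super().__init__(name=name, daemon=True)
        self._frame = None            # most recent frame seen by the tracer
        self.stop_event = stop_event

    def _trace(self, frame, event, arg):
        self._frame = frame
        return self._trace            # keep tracing lines inside this frame

    def run(self):
        sys.settrace(self._trace)     # installs tracing for this thread only
        self.stop_event.wait()        # stand-in for real work that may block


def format_stack(worker):
    """Describe where `worker` was at its last trace event, innermost first."""
    lines = ["  %s:" % worker.name]
    frame = worker._frame
    if frame is not None:
        lines.append("    stack:")
        for info in inspect.getouterframes(frame):
            lines.append("      %s@%s#%d" % (info.function, info.filename, info.lineno))
    return "\n".join(lines)


def wait_for_first_frame(worker, timeout=5.0):
    """Poll until the worker's tracer has recorded at least one frame."""
    deadline = time.monotonic() + timeout
    while worker._frame is None and time.monotonic() < deadline:
        time.sleep(0.01)
    return worker._frame is not None


if __name__ == "__main__":
    stop = threading.Event()
    worker = Worker("thread01", stop)
    worker.start()
    wait_for_first_frame(worker)
    print(format_stack(worker))
    stop.set()
    worker.join()
```

Reading another thread's frame this way is racy in general. It is most informative when the thread is blocked, which is exactly the "it appears to be stuck" case.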

One downside to this approach is that we're now incurring a fair bit of overhead because of all of the tracing; every statement the runtime executes now includes invoking the trace method and storing the frame. If performance is a concern, this can be problematic. However, it's easy enough to comment out the sys.settrace() call and uncomment it only when this level of debugging is desired.
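
For what it's worth, Python 2.5 and later offer `sys._current_frames()`, which returns a dictionary mapping each thread's identifier to that thread's current frame. Combined with `threading.enumerate()` for thread names, it avoids the per-statement tracing cost entirely. The trade-off is relying on an underscore-prefixed function that the Python documentation describes as intended for internal and specialized purposes. This is a different technique from the one in this post. Here is a minimal sketch (`all_thread_stacks` is a hypothetical name):

```python
import sys
import threading
import traceback


def all_thread_stacks():
    """Return {thread name: formatted stack} for every live Python thread.

    Uses sys._current_frames(), so no trace function has to run on every
    statement. Threads sharing a name would overwrite each other here.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, "thread-%d" % ident)
        stacks[name] = "".join(traceback.format_stack(frame))
    return stacks


if __name__ == "__main__":
    for name, stack in all_thread_stacks().items():
        print(name + ":")
        print(stack)
```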

May 27, 2004

XMLRPC as a debugging aid

I have a server application which is responsible for processing data as it becomes available. It scans all of the available data at startup and starts work on any source data which isn't already taken care of, and, once done, it waits. It gets notified when new data is added to the system (by another application which is responsible for getting new data), at which point it processes the new data, then waits again. As new source data is processed, a third application is notified that new processed data is available.

The applications here are built using Python, and the notifications are done using XMLRPC, using the xmlrpclib library included with Python. I chose XMLRPC because it's a workable IPC mechanism across processes that may or may not be on the same machine, and because it's ridiculously simple to use with Python.

Recently, I've had a need to debug this application while it's running. The time between startup and the state I want to inspect is pretty long, and I don't want debug output spewing across the screen in the meantime, making it harder to find the one message I'm actually interested in. (What I'm looking for is the state of the work queue once most of the available data has been processed, but errors are occurring on a few items in the queue, which are getting retried.)

So I'm thinking back to my C programming days, when I could run gdb, attach to an already running program, see what's going on, then detach and let it continue. I found that there is a Python debugger, but you need to run the program in the debugger, plus I don't really want a gdb-like UI.

XMLRPC to the rescue! I'm already running an XMLRPC server, so I can poke at my program that way. All I have to do is fire up the Python interpreter and have at it:

[bluntman:~] wsanchez% python
Python 2.3 (#1, Sep 13 2003, 00:49:11) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xmlrpclib
>>> p = xmlrpclib.ServerProxy("http://localhost:8001", allow_none = 1)
>>> p.queue()
{'thread01': {'group_data': {'Group1': True, 'Group2': True, 'Group3': True}, 'data': {}},
 'thread02': {'group_data': {}, 'data': {}}}

The dictionary returned by p.queue() is the internal representation of the work queue I keep in my application. I can see here that thread01 has some grouped data to work on, and no individual items to work on. thread02 has no work to do. Additionally, I can set the logging level as I need, and add items to the work queue:

>>> p.logger.set_level(0)
>>> p.add_to_queue("group_data", "Group5")
>>> p.logger.set_level(1)

Simply being able to turn debug-level logging on and off while the program is running is a huge win, but being able to modify its state is limitless in its utility.

One downside here is that running an XMLRPC server brings you into the world of threaded applications, and if you aren't already there, it can present a new class of problems for you to worry about. However, if all you need to do is look at state or modify settings that aren't subject to major concurrency issues (eg. set the logging level), you don't have to worry too much about threads, and that much utility alone can be quite worthwhile.
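
The post doesn't show the server side. Here is a minimal sketch of what it could look like using Python 3's `xmlrpc.server`, the modern name for Python 2's `SimpleXMLRPCServer` module. The `WorkQueue` class, its assignment policy, and `set_log_level` are hypothetical, modeled on the session above rather than taken from the actual application. The meaning of the numeric levels 0 and 1 in that session isn't stated, so this sketch passes the level straight to the standard `logging` module. Registering a function under the dotted name `logger.set_level` lets the client call `p.logger.set_level(...)` without enabling `allow_dotted_names`, which the Python documentation warns against.

```python
import logging
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class WorkQueue:
    """Hypothetical stand-in for the application's per-thread work queue."""

    def __init__(self, thread_names=("thread01", "thread02")):
        # The XML-RPC server answers from its own thread, so guard shared state.
        self._lock = threading.Lock()
        self._queue = {name: {"group_data": {}, "data": {}} for name in thread_names}

    def queue(self):
        """Return a copy of the queue; XML-RPC sends dicts as structs."""
        with self._lock:
            return {name: {kind: dict(items) for kind, items in work.items()}
                    for name, work in self._queue.items()}

    def add_to_queue(self, kind, item):
        """Hypothetical policy: give the item to the least-loaded thread."""
        with self._lock:
            name = min(self._queue,
                       key=lambda n: sum(len(v) for v in self._queue[n].values()))
            self._queue[name][kind][item] = True
        return True


def set_log_level(level):
    """Change the root logger's level while the program keeps running."""
    logging.getLogger().setLevel(level)
    return True  # a plain value the client can check


def make_server(work, host="localhost", port=8001):
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(work.queue, "queue")
    server.register_function(work.add_to_queue, "add_to_queue")
    # The dotted name is only a dispatch key; allow_dotted_names stays off.
    server.register_function(set_log_level, "logger.set_level")
    return server


if __name__ == "__main__":
    work = WorkQueue()
    server = make_server(work, port=0)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    p = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % server.server_address[1],
                                  allow_none=True)
    p.add_to_queue("group_data", "Group5")
    p.logger.set_level(logging.DEBUG)
    print(p.queue())
    server.shutdown()
```

The lock in `WorkQueue` is the concurrency concern mentioned above. Changing the log level needs no lock here because it is a single call into the thread-safe `logging` module.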

May 26, 2004

Subversion 1.0.4 package available

I've updated the Subversion package on my iDisk to 1.0.4.
Dependencies are unchanged, so if upgrading you only need to install Subversion.

April 20, 2004

Gratuitous bad-assness

So, I thought this was pretty clever: goto for Python. comefrom, in particular, is a nice hack.

For some background, see Come From (thanks, Bill). Note that in parallel C-INTERCAL, multiple COME FROMs pointing to the same location actually works. Eep.

Then Ben points me at this: Lingua::Romana::Perligata. Nyargh!! My…brain…is…expl…

Subversion 1.0.2 package available

I've rolled a new OS X Subversion package for 1.0.2. The other required packages remain the same. Mount this URL in Finder: http://idisk.mac.com/wsanchez/Public/ or go to http://www.wsanchez.net/iDisk.

April 12, 2004

LaunchBar Search Templates

I just added my own large pile of Search Templates to my copy of LaunchBar 4. Thought I'd share my config.

Included items:

  • AirNav: Airport by ID
  • Amazon Books
  • Amazon Music
  • Burning Man
  • CDW | Mac Warehouse
  • Commonwealth Club of California
  • Consumer Search
  • Dictionary.com
  • Drugstore.com
  • eBay
  • eLyrics4U
  • Epicurious
  • Lyrics Cafe
  • Lyrics Time
  • Lyrics XP
  • Motley Fool
  • National Public Radio
  • Netflix
  • Pollstar Concerts
  • Python Documentation
  • Rotten Tomatoes
  • United States Government
  • Weather Channel
  • WebTender

March 31, 2004

LaunchBar 4

Over the past month or so, I've been hearing a lot about this app called QuickSilver, and how it's much better than LaunchBar.

For those unfamiliar with the two, LaunchBar is possibly the best $20 I've ever spent on computer stuff. Most of the software I use these days is either built-in to Mac OS X or free, so except for a couple of games and a few must-haves (eg. Quicken, PhotoShop), I don't buy a lot of software. I do, however, buy small utilities that add a lot of value for a good price. NetNewsWire, for example, is a cool app, but it's not $40 cool. At $20, it'd be a no-brainer. At $40, never mind. LaunchBar is beyond no-brainer; I have to have it. At $20, it would be simply criminal for me not to have it.

I should tell you why. It's actually hard to describe, and, in fact, my friend Max tried to sell me on it three times before I realized how insanely cool it is. The basic idea is that you press a hot key (the default is command-space) to activate it, then type a few characters, and a list of files, applications, contacts, bookmarks, etc. on your system whose names contain those letters shows up. You select the one you want and it launches that app, creates an email to that contact, opens that document, etc. No hunting around in Finder. No having to remember where things are. No leaving the keyboard for the mouse. I even use it to switch between open documents, so there's no figuring out where that window went.

Payam offers his own explanation.

Anyway, there's a new kid on the block, QuickSilver, which does a lot of what LaunchBar does, plus a bunch of extras. And so I've been hearing about how everyone is jumping ship from LaunchBar to QuickSilver. I, however, kept the faith, because I'm too hooked on LaunchBar to toss it just like that. And now I've been vindicated.

LaunchBar 4 Beta 1 is out, and it's pretty damned sweet. I can search web sites from within the bar. I can run Unix commands and specify arguments from within the bar. The configuration panel is simpler, though you lose the LaunchBar 3 config. QuickSilver has no configuration, which is in some ways a win for simplicity, but it sucks when you want control over what gets indexed. And I want control. LaunchBar rocks.

March 24, 2004

Subversion 1.0.1 packages

I've repackaged everything on my iDisk. Instead of providing binary tarballs, I'm now providing native Mac OS X Installer packages wrapped in Internet-enabled disk images (which means Disk Utility will try to replace them with their contents, which would be the installer package, rather than mount them). Thanks to Bill for getting the ball rolling on that.

I now have a script which downloads the sources from the project web sites, builds them, and packages them. Eventually, it'll be a smaller script with a config file for each package, as I clean things up. This is turning into yet another ports system, the difference being that this one uses the native (though less-than-cool) installation system and doesn't require you to buy into a new thing.

I'm now only building released versions of software, except for Apache APR, for which I'm building the version included in the last released Apache HTTPD, because Subversion and HTTPD both require development versions of APR until the APR folks finally release 1.0.

Subversion is updated from 1.0.0 to 1.0.1. As before, you'll need to install several packages to get a working Subversion, as Subversion depends on several libraries and so on.

I'm considering how to add multi-package Installer bundles which include everything, but I'd like to avoid having to copy the individual packages into one big archive, because I'd like to be able to download and install them individually as well, plus my iDisk is filling up…

My iDisk public folder is mountable in Finder as http://idisk.mac.com/wsanchez/Public/. If you want to browse to it with Safari, go to http://www.wsanchez.net/iDisk.

February 24, 2004

Subversion 1.0.0 packages

I've built Subversion and all of the packages it depends on, and updated the packages in my public iDisk (wsanchez). They are all built for Mac OS X 10.3.

Note that you can't get to my iDisk using a web browser unless you know the paths to each of the files, which is unlikely. That's a drag, but not something I can fix.

Mount my public iDisk in Finder. In Panther, that's: Go -> iDisk -> Other User's Public Folder…, then enter wsanchez in the dialog. On older Mac OS X versions (or Windows), use iDisk Utility.

February 05, 2004

Occasional unexpected tragedy

It is occasionally somewhat unfortunate that the W key is right next to the E key.

January 23, 2004

iTunes RSS feed parser

I cobbled together a parser for the iTunes RSS feed which renders the feed as HTML suitable for inclusion into a web page.

I made an effort to generate HTML so that one can apply a style sheet to it and change its layout and appearance flexibly, rather than using some sort of template mechanism. CSS is, unfortunately, both hard and generally broken in most, if not all, browsers, so it's not a perfect system, but I got it working well enough to render the page how I like it, so that's good enough for now. :-)

You can try it out and download it from http://www.wsanchez.net/itms/.
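
The approach of emitting structural HTML with class names, and leaving layout to a style sheet, can be sketched with the standard library's `xml.etree.ElementTree`. This is not the parser linked above. It assumes a plain RSS 2.0 feed with `<channel>`, `<item>`, `<title>`, and `<link>` elements, does not attempt to handle any iTunes-specific fields, and the class names are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET


def render_feed(rss_text):
    """Render RSS <item>s as an HTML list whose classes a style sheet can target."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    parts = ['<div class="feed">']
    title = channel.findtext("title")
    if title:
        parts.append('  <h2 class="feed-title">%s</h2>' % html.escape(title))
    parts.append('  <ol class="feed-items">')
    for item in channel.findall("item"):
        # Escape text and attribute values; the markup carries no styling.
        item_title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        parts.append('    <li class="feed-item"><a class="item-link" href="%s">%s</a></li>'
                     % (link, item_title))
    parts.append("  </ol>")
    parts.append("</div>")
    return "\n".join(parts)
```

With markup like this, a style sheet rule such as `.feed-items { list-style: none; }` or a float on `.feed-item` changes the whole layout without touching the generator.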

January 18, 2004

Apache HTTPd <Directory > and HFS+: A Couple of Work-arounds

I wrote yesterday that Apache HTTPd's <Directory > directive doesn't quite work on case-insensitive filesystems such as HFS+. Here I offer a couple of work-arounds.

This is only a problem for files and sub-directories that are exposed to the URI space by the server navigating through the filesystem. You can protect your entire document root through a RedirectMatch directive rule, for example. This works because / on your server's URI space is explicitly mapped to a location in the filesystem (ok, also because there are no case variants of /). However, it wouldn't work for a <Location /foo> rule if foo is a file or directory in your document root, because other URIs (eg. /Foo) can be used to get to that file or directory, and you only specified one.

(This almost makes me suggest that we need a <CaseInsensitiveLocation > tag, but fundamentally, this issue is about the filesystem, and we don't want to add a new tag for every possible semantic of all possible resource stores, so no, I'm not going to suggest that.)

The good news is that until this gets fixed, we can come up with a couple of work-arounds.

The first is pretty straightforward. If you want to change the configuration for a directory in your doc root, stick a .htaccess file in there. No matter what URI was used to get to the directory, the server will always find the .htaccess file and read it. There are a few drawbacks here, but they may be insignificant for many users:

  • You have to enable the use of .htaccess files on your server and allow overrides for the configuration directives you want to use in them. This may be a problem if people you don't trust to use those directives have access to those files, which isn't an uncommon arrangement.
  • If you edit your web site using WebDAV, you may not be able to edit .htaccess files via DAV, as access to those files is disabled in the default config. If you use a different virtual host for DAV access (you pretty much have to if you are going to have any kind of processing done on served files, such as server-side includes, php, etc.), you may choose to disable access to .htaccess files on the main server and allow it via DAV. That rules out using .htaccess for configuring the DAV server, however, so if you want per-directory config on DAV also, this won't work.
  • There is a small performance penalty, as the server has to open and handle the extra file. This is probably insignificant unless you have a very high-volume site.

Another option, the one I am using, is to move the directory in question out of your server's URI space (i.e. out of document root) and use an alias. For example, we'd move /path/to/docroot/foo to /path/not/in/docroot/foo and add some config options:

RedirectMatch Permanent "^/foo$" http://myservername/foo/
Alias /foo/ /path/not/in/docroot/foo/
<Location /foo>
   ...options...
</Location>

This involves a bit more configuration, but solves the problem nicely. Because foo is no longer in your document root, the only URI that will get you to it is /foo/. You can use <Location /foo> as well as <Directory /path/not/in/docroot/foo> here, because now that /foo/ is the only URI that will get you there, either one should be sufficient.

The RedirectMatch directive isn't strictly necessary, but it emulates what the server normally does when you access a directory without the trailing slash: it redirects you to the URI with the trailing slash. You can let the server do this for you by omitting the trailing slashes in the arguments to the Alias directive, but then you have a potential conflict if you have another URI /foobar on your server; this avoids that.

January 17, 2004

Apache HTTPd <Directory > and HFS+

The <Directory > directive in Apache HTTPd doesn't work if you have a case-insensitive file system.

For example, if you have:

<Directory /path/to/docroot/foo>
  Deny from all
</Directory>

And you try to open http://localhost/foo/bar.html, you will fail due to the above deny rule. But an attempt to open http://localhost/Foo/bar.html will succeed (assuming foo/bar.html exists, etc.), despite the deny rule.

If the above were written as a <Location > rule (<Location /foo>), the above behavior would be correct because URIs are case-sensitive and the two URLs are different, but in the <Directory > case, we're placing a rule on an object in the filesystem, and on HFS+ (and others), Foo and foo refer to the same object.

While beginning to investigate what's going on in the code, I noticed that server/request.c has code switched by a CASE_BLIND_FILESYSTEM macro, which is enabled on Windows. A compile-time macro assumes that every file system on the system you are building for has the same semantics, either case-sensitive or case-insensitive. On Mac OS, and I assume other systems as well, we have a variety of available filesystems which may use either set of semantics.

The correct solution, I think, would be to add an apr_canonical_filename() function to APR and have HTTPd use that in server/request.c instead of the macro. We match up canonical file names with the <Directory > entries, and all is good. That takes care of request processing. Presumably, we should also canonicalize path names when parsing the <Directory > directives; though that can at least be worked around by making sure you type the canonical name into the configuration. Even so, we should deal with that, since the canonical name may not be obvious to the user.

Apple has a solution to this in HTTPd-1.3, which they implemented in a module (mod_hfs_apple). Much better would have been a patch to the HTTPd server which does what I describe above, and the problem remains unsolved for Apache 2.
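
Here is a rough illustration of the canonicalization step proposed above, written in Python rather than as an APR function. It walks the path one component at a time and replaces each component with the spelling actually stored in its parent directory, preferring an exact match. `canonical_path` and `directory_rule_applies` are hypothetical names. This is not APR's or HTTPd's code. The sketch is POSIX-only, is meant for case-insensitive volumes, and ignores symbolic links and Unicode normalization, both of which a real implementation would have to handle.

```python
import os
import tempfile


def canonical_path(path):
    """Return an absolute path with each component spelled as stored on disk.

    An exact name match wins; otherwise a single case-insensitive match is
    used. Components that can't be resolved (missing, unreadable, or
    ambiguous) are left as given.
    """
    result = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue
        try:
            entries = os.listdir(result)
        except OSError:
            entries = []
        if part not in entries:
            matches = [e for e in entries if e.lower() == part.lower()]
            if len(matches) == 1:
                part = matches[0]
        result = os.path.join(result, part)
    return result


def directory_rule_applies(rule_dir, request_path):
    """True if request_path is rule_dir or lies beneath it, compared canonically."""
    rule = canonical_path(rule_dir)
    target = canonical_path(request_path)
    return target == rule or target.startswith(rule.rstrip(os.sep) + os.sep)


if __name__ == "__main__":
    docroot = tempfile.mkdtemp()
    os.mkdir(os.path.join(docroot, "foo"))
    open(os.path.join(docroot, "foo", "bar.html"), "w").close()
    rule = os.path.join(docroot, "foo")  # like <Directory /path/to/docroot/foo>
    for name in ("foo", "Foo", "FOO"):
        request = os.path.join(docroot, name, "bar.html")
        print(name, directory_rule_applies(rule, request))
```

The boundary check in `directory_rule_applies` also keeps a rule for foo from matching a sibling such as foobar.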

January 22, 2002

IMAP at home

I recently set up my own IMAP server at home so I don't end up with all of my mail on my laptop, the loss of which would make me rather sad.

In the process, I started using junkfilter. I enjoy using junkfilter. It's spiffy. Thank you, Mr. Sutter.

[Copied from advogato.org]