Jim Ursetto
Ursetto Consulting, Inc.

Download

Article source code
Article tarball

Introduction

Recently, I noticed that iTunes had lost track of some of my music files. I keep nearly all my music on an NFS server, visible to both iTunes and a SLiMP3 device. At some point, I had reorganized the music directories a bit without considering that the pathnames in iTunes' database would be invalidated.

The simplest way to reconnect lost files is to reimport them. Unfortunately, when doing so you lose your hard-won statistics, such as Play Count and Last Played. I figured I might retain these statistics by locating the new files and then updating iTunes Music Library.xml with the new pathnames. And if I could automate the process, I'd be able to quickly fix this problem in the future whether it occurred deliberately or otherwise.

This article documents my attempt to do just that by building an application in Chicken Scheme, using the bindings to Objective C and Cocoa provided by the objc egg. It's targeted at the intermediate Scheme programmer, who may have some experience with Cocoa. It may also be useful to a beginner looking for examples of an interactive development process, and to a non-Scheme user for the same reason.

In this article, I present code as I write it and design decisions as I make them; and I offer a lot of commentary. I've tried to document the coding process from start to finish, including the false starts, because it is my opinion that the reader can benefit from an author's mistakes as well as his successes.

Are you ready? Then let's begin.

Beginnings

Here's our initial strategy. We'll look up every track in the database and where it should reside on disk. If the file doesn't actually exist, we flag it as bad, and use some heuristic to determine where it has moved to. We'll update the database with this new location and write it out. For flexibility, we'd probably like to grab the bad tracks in one phase, and update the database in a second phase, so that we can examine our proposed changes before committing them.

To begin, we need my objc egg (download with chicken-setup objc) and my plist library source code (placed in the project directory). We'll type the code below directly into a new file. I'm using an interactive development environment—Emacs with Neil van Dyke's Quack—so I can send my definitions to the running Scheme interpreter as they are entered. We'll do quite a bit of testing at the REPL, as you'll see.

(use objc plist)

The iTunes XML file is a property list or "plist" file, which can be read in using the NSDictionary class method dictionaryWithContentsOfFile:. We actually use an NSMutableDictionary because we're going to be updating the dictionary in memory; using a plain NSDictionary seems to work as well here, but let's be safe.

(define-objc-classes NSMutableDictionary)

A quick primer: the define-objc-classes form looks up the NSMutableDictionary class in the Objective C runtime, and creates a new variable called NSMutableDictionary which we can use to access the class. (@ obj keyword1: arg1 keyword2: arg2) is like the Objective C message expression [obj keyword1: arg1 keyword2: arg2] — it sends obj the keyword1:keyword2: message with arguments arg1 and arg2. Finally, if a hyphen occurs in a method name, we remove it and uppercase the next letter; this allows us to type dictionary-with-contents-of-file: instead of dictionaryWithContentsOfFile:.

itunes:open-library opens the music library file name and returns an NSDictionary representing the iTunes library.

(define (itunes:open-library name)
  (or (@ NSMutableDictionary
         dictionary-with-contents-of-file: (expand-tilde name))
      (error 'itunes:open-library "failed to open library" name)))

You'll notice the call to expand-tilde. Cocoa won't automatically expand a tilde into a filename, but it's very likely one will be present in the iTunes library filename. So we write a little wrapper function around the NSString method stringByExpandingTildeInPath:. If the argument is a Scheme string, we convert it into an NSString first. expand-tilde returns an NSString. (The ns:string? and ns:make-string procedures are from the plist library, and will probably be integrated into the objc egg in the future.)

(define (expand-tilde name)
  (@ (if (ns:string? name)
         name
         (ns:make-string name))
     string-by-expanding-tilde-in-path))

We can use the NSDictionary method writeToFile:atomically: to write out the updated iTunes plist as ASCII XML. I pass #t to atomically: for safety; this way the plist will be written to a temporary file and moved into place.

(define (itunes:write-library plist dest-name)
  (@ plist write-to-file: (expand-tilde dest-name)
           atomically: #t))

Now we can do a little exploration. Enter the following definitions:

(define name "~/Music/iTunes/iTunes Music Library.xml")
(define lib  (itunes:open-library name))

If we now type "lib" at the REPL, we should theoretically see an ASCII representation of the entire iTunes XML database (courtesy of lib's description method). In actuality, Chicken truncates this output to prevent mistakes like, oh, say accidentally dumping 8 megabytes of data into your interpreter session. If we'd actually like to see all the output, we should probably write it to a file:

#;94> (with-output-to-file "/tmp/itunes-ascii.txt"
        (lambda () (display lib)))

This will take a few seconds. Below, I show a heavily edited version of the output, enough so you can see the structure of the plist.

#;96> lib
#<objc-instance {
    "Application Version" = "6.0.2"; 
    "Music Folder" = "file://localhost/Users/jim/Music/iTunes/iTunes%20Music/"; 
    Playlists = (
        {
            Name = Library; 
            "Playlist ID" = 5867; 
            "Playlist Items" = (
                {"Track ID" = 5353; },
            ); ...
        }, ...
    );
    Tracks = {
        100 = {
            Album = scenemusic; 
            Artist = radix; 
            Name = plopalou; 
            Size = 2946233; 
            ...
        }; ...
    } ...
}>

We could have simply looked at the XML file itself to see the structure, but this way is more instructive (and easier to read, to boot).

To obtain track data, we look at the "Tracks" key; its value is a dictionary containing one key and value pair per track. We can retrieve this value using the NSDictionary method objectForKey:.

(define (itunes:tracks plist)
  (let ((tracks (@ plist object-for-key: "Tracks")))
    (unless (@ tracks is-kind-of-class: NSDictionary)
      (error 'itunes:tracks "tracks key is not a dictionary."))
    tracks))
(define tracks (itunes:tracks lib))

We already know what the tracks dictionary looks like from our exploration of the structure of lib, so I won't show the output of typing "tracks" at the REPL here. Instead, let's take a closer look at the specific track that appeared earlier, track ID 100.

#;105> (@ tracks object-for-key: "100")
#<objc-instance {
    Album = scenemusic; 
    Artist = radix; 
    "Bit Rate" = 128; 
    Comments = "radix@...."; 
    "Date Added" = 2002-11-11 11:33:34 -0600; 
    "Date Modified" = 2005-02-06 13:33:20 -0600; 
    "File Folder Count" = 4; 
    Genre = Electronic; 
    Kind = "MPEG audio file"; 
    "Library Folder Count" = 1; 
    Location = "file://localhost/Users/jim/Music/iTunes/iTunes%20Music/radix/scenemusic/plopalou.mp3"; 
    Name = plopalou; 
    "Persistent ID" = 993694A5E4D4BA2F; 
    "Play Count" = 11; 
    "Play Date" = -1086610989; 
    "Play Date UTC" = 2005-08-31 19:05:07 -0500; 
    "Sample Rate" = 44100; 
    Size = 2946233; 
    "Total Time" = 184111; 
    "Track ID" = 100; 
    "Track Type" = File; 
    Year = 2000; 
}>

Examining the track data structure, it appears the Location key will give us the music file's pathname as a file:// URL.

#;109> (@ #105 object-for-key: "Location")
@"file://localhost/Users/jim/Music/iTunes/iTunes%20Music/radix/scenemusic/plopalou.mp3"

The #105 above refers to interpreter result number 105—in other words, the result of (@ tracks object-for-key: "100"). # by itself refers to the most recent result. This is an extremely useful abbreviation when working in the REPL.

The @"..." is a shorthand way to describe an NSString. You can also create NSStrings using the @"..." syntax; but you don't normally need to, since the Objective C bridge changes Scheme strings passed as method arguments into NSStrings.

plist keys and values

plist keys are -always- strings, even when they appear to be integers. objectForKey: expects an actual Objective C object, and the bridge will throw an error if you pass it a number. Even converting your key to an NSNumber object doesn't work:

#;107> (@ tracks object-for-key: 100)
Error: bad argument type - not a structure of the required type: objc:instance
#;107> (ns:make-number 100)
#<objc-instance 100>        ; An NSNumber integer object
#;108> (@ tracks object-for-key: (ns:make-number 100))
#f

This restriction is only true for plists, not for NSDictionary types in general, which accept any object as a key. Although you may -add- a non-string key to a plist structure, writing the plist out to disk later will fail.

plist values are a different matter: they may be instances of NSString, NSNumber, NSDate, NSData, NSArray or NSDictionary. The plist library provides functions to convert most of these to and from Scheme objects, including the "catch-all" schemify and objcify. The plist API isn't finalized, so consult the code for the latest. Here are a couple of examples:

#;117> (define bit-rate (@ #105 object-for-key: "Bit Rate"))
#;118> bit-rate
#<objc-instance 128>        ; An NSNumber (integer) object
#;118> (ns:number-value bit-rate)
128
#;119> (schemify bit-rate)
128
#;120> (objcify 128)
#<objc-instance 128>        ; An NSNumber object, again

plist-path

By the way, the plist library provides a convenience function called plist-path that lets us easily trace a given path through a nested plist structure. The idea for this came from the article Using Perl to Manage Plist Files, crossed with SXPath. In our case, we need to iterate over many tracks and we'll wind up using a different approach. Nevertheless, here's a straightforward way to obtain the Location of track ID 100:

#;101> (plist-path lib '(Tracks 100 Location))
@"file://localhost/Users/jim/Music/iTunes/iTunes%20Music/radix/scenemusic/plopalou.mp3"

file:// URL parsing

We've got a file location in the form of a file:// URL, but we need a real filename so we can check if the file exists. Originally, I wrote the following naive function to strip off the file://localhost prefix:

(define (URL->filename str)
  (let ((url-prefix "file://localhost"))
    (and (substring=? str url-prefix)
         (substring str (string-length url-prefix)))))

This doesn't work, of course, because URLs can contain escaped characters. But why write a URL parser when Cocoa already has one, in the form of the NSURL class? Consulting the documentation, we find the NSURL methods +URL-with-string, -isFileURL, and -path. Before trying them out, we need to import the NSURL class.

(define-objc-classes NSURL)
#;124> (define loc (@ (@ tracks object-for-key: "100")
                         object-for-key: "Location"))    ; the file:// URL, an NSString
#;125> (@ NSURL URL-with-string: loc)                    ; create a new NSURL
#<objc-instance file://localhost/Users/jim/Music/iTunes/iTunes%20Music/radix/scenemusic/plopalou.mp3>
#;126> (@ #125 is-file-URL)
#\x1        ; the objective C bridge true value
#;127> (@ #125 path)
@"/Users/jim/Music/iTunes/iTunes Music/radix/scenemusic/plopalou.mp3"

For convenience, we define a small wrapper which converts a string to an NSURL.

(define (ns:string->URL str)
  (@ NSURL URL-with-string: str))

Operating on all tracks

To iterate through the tracks, we will use ns:enumerator-fold, which is defined in the plist library. Here is the definition:

(define (ns:enumerator-fold kons knil e)
  (let loop ((seed knil))
    (let ((obj (objc:send e next-object)))
      (if (not obj)
          seed
          (loop (kons obj seed))))))

If you're not familiar with fold, here's a little background. fold is known as the fundamental list iterator. It starts with a procedure KONS, a "seed" KNIL and a list L. For each element X of L, fold obtains a new seed by applying KONS to X and the current seed. For example, if KONS is cons and KNIL is '(), fold will return a copy of L, reversed. With fold and its sister fold-right, you can implement many standard operations on lists, such as for-each, map and filter.

ns:enumerator-fold is similar to fold, but iterates over an Objective C collection (such as an NSArray or NSDictionary) using an NSEnumerator.

As we've seen, plist files such as the iTunes Music Library are nested collections of arrays and dictionaries, so ns:enumerator-fold can be used to operate on entire sections of a plist file—in our case, the list of tracks.

We typically obtain an enumerator via the instance's objectEnumerator (for NSArray) or keyEnumerator (for NSDictionary) method. NSDictionary accepts objectEnumerator messages as well; this will iterate over its values instead of its keys.

During our exploration of the plist data, we saw that the Tracks key contains a dictionary which maps integer track IDs to track data dictionaries. We don't expect to need these track IDs, just the track data. Therefore, we can use object-enumerator to iterate over the values only.

(define track-enumerator (@ tracks object-enumerator))

The plist library code defines some convenience functions based on ns:enumerator-fold, such as ns:array->list and alist->ns:dictionary. We could, for example, use (ns:array->list (@ tracks all-values)) to obtain a Scheme list of all the track data dictionaries, then use regular list operations (such as fold) on the list. However, this list may contain tens of thousands of temporary Scheme objects—one for each track—since each track dictionary is wrapped in a Scheme object. It's a lot more efficient to operate on each object as we need it in ns:enumerator-fold; this way each object is garbage-collected immediately after it has been used.

Here's some code which leverages what we've learned so far, to build a data structure containing all the "bad" (unconnected) files in the iTunes database. As described in the section on folding, our lambda is called once for each element (track). When a good file or a non-file URL (such as http://) is found, it returns the current seed unchanged. When a bad file is found, we cons the list (bad-file <name> <location>) onto the seed and return that. Creating a plain list, rather than using fancy object or record types, makes it easier to examine and take apart in the REPL.

(define bad-files
  (cons 'bad-file-list
        (ns:enumerator-fold (lambda (x xs)
                              (let* ((location (@ x object-for-key: "Location"))
                                     (URL (ns:string->URL location)))
                                (if URL
                                    (if (@ URL is-file-URL)
                                        (let ((name (ns:string-value (@ URL path))))
                                          (if (file-exists? name)
                                              xs
                                              (cons `(bad-file ,name
                                                               ,(ns:string-value location))
                                                    xs)))
                                        xs) ;; Not a file; do nothing
                                    (begin
                                      (warning "invalid URL: " location)
                                      xs))))
                            '()
                            track-enumerator)))

The structure of bad-files should look like this:

(bad-file-list (bad-file <name> <URL>) (bad-file <name> <URL>) ...)

The astute reader will notice that fold creates its list in reverse order from the source, but this doesn't matter as dictionary values are unordered anyway.

Let's write the list out so we can examine it in an editor. Here's a convenience function (for use in the REPL) which pretty-prints a list to a file.

(define (write-list x file)
  (with-output-to-file file
    (lambda () (pp x))))
#;130> (write-list bad-files "/tmp/not-found.scm")

I used the .scm extension so I get Scheme syntax highlighting when I open the file. Below we can also see a bit of the resulting output.

(bad-file-list                 ; contents of file
  (bad-file
     "/mnt/mp3/scratch/cd/mp3/ariana/18/Faith No More/Angel Dust/09 Be Aggressive.mp3"
     "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/18/Faith%20No%20More/Angel%20Dust/09%20Be%20Aggressive.mp3")
  (bad-file ...) ...)

Finding files

We've successfully built a list of bad files; now we need to apply the "heuristic" I briefly mentioned in order to guess where each file has moved. The strategy I've chosen for this part is pretty simple: find all music files under a destination directory and index them based on the last few components of their pathnames. We can't generally match the filename alone, since unless your filenames are extraordinarily descriptive, we're likely to encounter duplicates. Most people create a directory per artist or album or both, e.g. Artist/Album/Filename.mp3. In all probability, files won't be moved individually; instead an artist or album will be moved as a chunk. This means we can usually make a unique file fingerprint by including a few path components from the end—3 should be adequate, but we'll make it configurable.

We'll move on now to finding all files under a given base directory, keyed by a portion of the pathname. Chicken provides a nice utility function, find-files, in the posix unit. Chicken also provides pathname operations, but none suit our purpose of grabbing the last N components of a pathname. We need to find the Nth occurrence from the right of the system directory separator (/) in the pathname, and grab everything to the right of it. To this end, we can repeatedly use SRFI-13's string-index-right to find the rightmost occurrence of a character, on successively smaller substrings.

(use srfi-13)  ;; for string-index-right
; Find index of Nth occurrence of character C in STR, starting from the right.
(define (string-index-right-n str c n)
  (let ((start 0))
    (let loop ((end (string-length str))
               (n n))
      (if (zero? n)
          end
          (and end
               (loop (string-index-right str c start end)
                     (fx- n 1)))))))
; Obtain the rightmost N pathname components from PATH.  Returns PATH
; if N is greater than the number of components.
; (pathname-components-right "/tmp/foo/bar/baz/qux" 2)  => "/baz/qux"
(define (pathname-components-right path n)
  (let ((index (string-index-right-n
                path pathname-directory-separator n)))
    (if index (substring path index) path)))

find-music-files will find all files under base-dir, returning a hash table which maps the last sig pathname components of each file to the full filename. As the second argument of find-files, we pass a regular expression which should match any music filenames in our collection.

(use posix)    ;; for find-files
(define itunes:music-regexp ".*\.([Mm][Pp]3|[Aa][Aa][Cc])")  ; find-files anchors this with ^ and $
(define (find-music-files base-dir sig)
  (alist->hash-table
   (map (lambda (fn)
          (cons (pathname-components-right fn sig)
                fn))
        (find-files base-dir itunes:music-regexp))))
;; using Chicken's ,d option to show contents of the hash table
#;135> ,d (find-music-files "/mnt/mp3/scratch/cd/mp3/ariana" 3)
...
"/Garbage/beautifulgarbage/05 Cup of Coffee.mp3" -> "/mnt/mp3/scratch/cd/mp3/ariana/16/Garbage/beautifulgarbage/05 Cup of Coffee.mp3"
"/Joni Mitchell/Blue/05 Blue.mp3" -> "/mnt/mp3/scratch/cd/mp3/ariana/15/Joni Mitchell/Blue/05 Blue.mp3"
"/Primus/Pork Soda/Primus - Pork Soda - 08 Wounded Knee.mp3" -> "/mnt/mp3/scratch/cd/mp3/ariana/14/Primus/Pork Soda/Primus - Pork Soda - 08 Wounded Knee.mp3"
"/Radiohead/Kid A/Radiohead - Kid A - 08 Idioteque.mp3" -> "/mnt/mp3/scratch/cd/mp3/ariana/12/Radiohead/Kid A/Radiohead - Kid A - 08 Idioteque.mp3"
...

This implementation looks nice, but harbors a flaw: when a key collision occurs, only the latest value is preserved. Key collisions are still a possibility, and we'd like to save the duplicate filenames so the user can see what happened. Therefore, we'll create the hash table by hand, and store each value as a list (even single values, for simplicity).

Also, some experimentation revealed that find-files recurses into symbolic links to directories, which was resulting in duplicate files. We can prevent this by using the limit procedure argument, which is called for every directory. Recursion is done only when this procedure returns true.

(define (find-music-files base-dir sig)
  (let ((table (make-hash-table string=?)))
   (for-each (lambda (fn)
               (let ((short-name (pathname-components-right fn sig)))
                 (hash-table-set! table
                                  short-name
                                  (let ((old-key (hash-table-ref/default
                                                  table short-name #f)))
                                    (if old-key
                                        (cons fn old-key) ; duplicate key; cons value onto value list
                                        (list fn))))))    ; new key, single element value list
             (find-files base-dir itunes:music-regexp
                         cons '()
                         (lambda (fn)
                           (not (symbolic-link? fn)))))  ; don't recurse into symlinks
   table))

To test the new code, I created 4 test files and several similar directories in /tmp. Two files are not distinguishable from the other two when the pathname component value is too low.

#;137> ,d (find-music-files "/tmp/mp3" 3)
hash-table with 2 elements and comparison procedure <unknown>:
 "/b/c/1.mp3"	-> ("/tmp/mp3/a/b/c/1.mp3" "/tmp/mp3/z/b/c/1.mp3")
 "/b/c/2.mp3"	-> ("/tmp/mp3/a/b/c/2.mp3" "/tmp/mp3/z/b/c/2.mp3")
#;138> ,d (find-music-files "/tmp/mp3" 4)
hash-table with 4 elements and comparison procedure <unknown>:
 "/a/b/c/1.mp3"	-> ("/tmp/mp3/a/b/c/1.mp3")
 "/z/b/c/1.mp3"	-> ("/tmp/mp3/z/b/c/1.mp3")
 "/a/b/c/2.mp3"	-> ("/tmp/mp3/a/b/c/2.mp3")
 "/z/b/c/2.mp3"	-> ("/tmp/mp3/z/b/c/2.mp3")

Correlating bad files with good ones

Now we expand bad-files to look up each bad file in the music table by its key (using significant-path-components path components). Our lookup function, music-table-ref, will return a string if there is only one file match, a list (duplicate file1 file2 ...) if there are multiple matches, or #f if no matches were found at all.

We always want to start the enumerator from the beginning of the track list, so we'll do that directly in the fold code rather than manually reevaluating the track-enumerator global variable here.

(define (music-table-ref table key)
  (let ((path (hash-table-ref/default table key #f)))
    (cond  ((not (pair? path)) path)
           ((null? (cdr path)) (car path))
           (else (make-duplicate path)))))
(define (make-duplicate filenames)
  (cons 'duplicate path))
(define (duplicate? file)
  (and (pair? file)
       (eq? 'duplicate (car file))))

And here's the new code to grab the bad files along with their "good" counterparts. This code is almost a straight cut-and-paste, except for the new call to music-table-ref, and a slightly different list structure. Instead of containing bad-file records, the new result list will contain moved records, each of which contains a source (a bad-file record) and a destination (the output of music-table-ref for that particular file).

(define significant-path-components 3)
(define music-table (find-music-files "/mnt/mp3/scratch/cd/mp3" significant-path-components))
(define bad-files
  (cons 'bad-file-list
        (ns:enumerator-fold (lambda (x xs)
                              (let* ((location (@ x object-for-key: "Location"))
                                     (URL (ns:string->URL location)))
                                (if URL
                                    (if (@ URL is-file-URL)
                                        (let ((name (ns:string-value (@ URL path))))
                                          (if (file-exists? name)
                                              xs
                                              (cons `(moved (bad-file ,name
                                                                      ,(ns:string-value location))
                                                            ,(music-table-ref
                                                              music-table
                                                              (pathname-components-right
                                                               name significant-path-components)))
                                                    xs)))
                                        xs) ;; Not a file; do nothing
                                    (begin
                                      (warning "invalid URL: " location)
                                      xs))))
                            '()
                            (@ tracks object-enumerator))))

Our list structure should now look like this:

(bad-file-list (moved (bad-file <name> <URL>) <destination>) ...)

Running this on my NFS drive /mnt/mp3/scratch/cd/mp3 yields 352 moved files, of which 0 are duplicates and 21 not found (completely deleted tracks). Once again, I wrote out the result to a file using (write-list bad-files "/tmp/moved.scm"). Here is an excerpt:

(bad-file-list
 (moved (bad-file
         "/mnt/mp3/scratch/cd/mp3/ariana/18/Faith No More/Angel Dust/09 Be Aggressive.mp3"
         "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/18/Faith%20No%20More/Angel%20Dust/09%20Be%20Aggressive.mp3")
        "/mnt/mp3/scratch/cd/mp3/burned/22/Faith No More/Angel Dust/09 Be Aggressive.mp3")
 ...)

Data structure access

We need to define some accessors and creators for our data structures, rather than just using car and cons. This will insulate us from future changes to the structure layout; we need only update the accessor functions.

(define (bad-file? x) (eq? (car x) 'bad-file))
(define (bad-file-name x) (cadr x))
(define (bad-file-URL x) (caddr x))
; I'll refer to records tagged with 'moved as "moved-item" records from now on.
(define (make-moved-item bad-file dest) (list 'moved bad-file dest))
(define (moved-item? x) (eq? (car x) 'moved))
(define (moved-item-source x) (cadr x))         ;; will be a bad-file record
(define (moved-item-destination x) (caddr x))
(define (make-bad-file-list x) (cons 'bad-file-list x))
(define (bad-file-list? x) (eq? (car x) 'bad-file-list))
(define (bad-file-list-items x) (cdr x))        ;; list of moved-item records

Now we can, for example, use a filter to return bad files which were also not found in the moved music-table.

(filter (lambda (m)
          (not (moved-item-destination m)))
        (bad-file-list-items bad-files))

This is a bit clunky, not to mention the fact that it returns a list of moved-item records. It should instead return a bad-file-list so that we can filter the results again with the same code. So we'll define a filter function that handles unpackaging and repackaging the results.

; Filter a bad file list and return a second bad file list with the results.
(use srfi-1) ;; filter
(define (bad-file-list-filter pred x)
  (make-bad-file-list
   (filter pred (bad-file-list-items x))))
; Add a couple predicates useful for the bad-file-list-filter.
(define (moved-item-duplicate? x) (duplicate? (moved-item-destination x)))
(define (moved-item-notfound? x) (not (moved-item-destination x)))
(define (moved-item-unique? x) (string? (moved-item-destination x)))

Now it's much easier to get a list of all the files we failed to find:

#;149> (bad-file-list-filter moved-item-notfound? bad-files)
(bad-file-list
  (moved (bad-file
           "/mnt/mp3/scratch/cd/mp3/burned/Veronica/3/Mase/Harlem World/13 What You Want.mp3"
           "file://localhost/mnt/mp3/scratch/cd/mp3/burned/Veronica/3/Mase/Harlem%20World/13%20What%20You%20Want.mp3")
         #f)
  ...)

It turns out that I deleted this album off the disk, but not from iTunes.

If you like, you can now store the bad file list to disk and read it back at any time, showing only duplicates or notfound or unique items.

significant-path-components example

It's important that we specify the significant-path-components value appropriately. For example, I'll use a value of 1 (filename only) rerun the query on my library. At the REPL, we redefine significant-path-components and then resend the definition of bad-files to the interpreter. Afterward, make sure to reset these to their old values.

#;150> (define significant-path-components 1)
#;151> <... re-evaluate bad-file-list>
#;152> (write-list (bad-file-list-filter moved-item-duplicate? bad-files)
                   "/tmp/dups.scm")

There were 7 total duplicates. 5 were actual duplicate files; for example,

(moved (bad-file
        "/mnt/mp3/scratch/cd/mp3/ariana/mp3dump1/misc/David Lee Roth - Just A Gigolo.mp3"
        "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/mp3dump1/misc/David%20Lee%20Roth%20-%20Just%20A%20Gigolo.mp3")
       (duplicate
        "/mnt/mp3/scratch/cd/mp3/burned/Veronica/4/miscellany/David Lee Roth - Just A Gigolo.mp3"
        "/mnt/mp3/scratch/cd/mp3/burned/mp3dump1/misc/David Lee Roth - Just A Gigolo.mp3"))

2 were valid name collisions—different songs with the same base filename.

(moved (bad-file
        "/mnt/mp3/scratch/cd/mp3/ariana/18/Aretha Franklin/Amazing Grace (Disc 2)/05 Wholly Holy.mp3"
        "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/18/Aretha%20Franklin/Amazing%20Grace%20(Disc%202)/05%20Wholly%20Holy.mp3")
        (duplicate
         "/mnt/mp3/scratch/cd/mp3/burned/22/Aretha Franklin/Amazing Grace (Disc 1)/05 Wholly Holy.mp3"
         "/mnt/mp3/scratch/cd/mp3/burned/22/Aretha Franklin/Amazing Grace (Disc 2)/05 Wholly Holy.mp3"))

I became curious to see how many total duplicates there were in the searched directory—above, only duplicates that match bad files are shown. We can get this data from music-table by looking for table values containing more than one element. It might pay off to encapsulate this stuff within a function, but we won't right now.

#;153> (filter (lambda (x) (pair? (cdr x))) (hash-table-values music-table))

The length of this list is 23 (i.e., 23 duplicates). Interestingly, only 4 are different files with the same filename, even though the filenames contain only the track number and title. The remaining 19 are real duplicate songs — 16 of which are from the same album! (It turns out I ripped one of my CDs twice.)

This is clearly useful functionality and should probably be made available to the user in a nicer way.

Fixing bad file locations in the database

Now we'd like to make another pass through the iTunes track database, fixing any bad file locations that we can. For each URL in the database, we need to find the matching moved-item record, if it exists. This is why we saved the full bad-file location URL in the bad-file record. For quickest lookup, we'll create another hash table mapping the bad file URL back to its corresponding moved-item record.

Looking back on it, we could have stored the track ID number in the bad-file record, and used it to index this hash table instead. The only real drawback is that iTunes can change these IDs, which would invalidate any old bad-file data you had saved. However, I'll leave the design as it is.

(define (make-URL->moved-item-table bad-file-list)
  (let ((h (make-hash-table string=?)))
    (for-each (lambda (m)
                (hash-table-set! h
                                 (bad-file-URL (moved-item-source m))
                                 m))
              (bad-file-list-items bad-file-list))
    h))
; This is not very well named.
(define URL-to-moved-item-table (make-URL->moved-item-table bad-files))

Here is an excerpt from the resulting hash table:

#;155> ,d URL-to-moved-item-table
"file://localhost/mnt/mp3/scratch/cd/mp3/ariana/16/DJ%20Krush/The%20Message%20at%20the%20Depth/05%20The%20Blackhole.mp3" ->
(moved (bad-file
        "/mnt/mp3/scratch/cd/mp3/ariana/16/DJ Krush/The Message at the Depth/05 The Blackhole.mp3"
        "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/16/DJ%20Krush/The%20Message%20at%20the%20Depth/05%20The%20Blackhole.mp3")
       "/mnt/mp3/scratch/cd/mp3/burned/4/DJ Krush/The Message at the Depth/05 The Blackhole.mp3")

We really need item accessors for the URL table; I address this issue at the end of the article.

Now we can pass back through the tracklist, updating any bad URLs with our new location. The location must be a string, not an NSURL. If you try to write a plist containing an NSURL, it will fail:

#;160> (define d (@ NSMutableDictionary dictionary))
#;161> (@ d set-object: (@ NSURL file-URL-with-path: "/tmp/abc") for-key: "Location")
#<objc-instance {Location = file://localhost/tmp/abc; }>
#;162> (@ d write-to-file: "/tmp/a.plist" atomically: #f)
#f

Instead, we need to use -absoluteString to obtain the URL as an NSString.

(define (path->URL x)
  (@ (@ NSURL file-URL-with-path: x)
     absolute-string))

The code to obtain move-results is, once again, very similar to the code for bad-file-list. Instead of music-table-ref, we use hash-table-ref to look up the current track's moved-item record. As a quick check, we verify the track's pathname matches that of the looked-up record. Finally, if the destination is valid—meaning it's a unique file, not a duplicate or not-found file—we update the dictionary using setObject:forKey:.

The move-results list structure looks exactly like the bad-file-list structure, except that only successful moves will be included. This means that any duplicate and not-found items will be stripped out without ceremony or warning. That's fine, because we can filter bad-file-list itself if we need this data.

(define move-results
  (cons 'move-results
        (ns:enumerator-fold
         (lambda (x xs)
           (let* ((location (@ x object-for-key: "Location"))
                  (URL (ns:string->URL location)))
             (if URL
                 (if (@ URL is-file-URL)
                     (let ((name (ns:string-value (@ URL path))))
                       (let ((moved-item
                              (hash-table-ref/default URL-to-moved-item-table
                                                      (ns:string-value location)
                                                      #f)))
                         (if moved-item
                             ;; Verify pathname gibes with bad-file record.
                             (if (string=? name (bad-file-name
                                                 (moved-item-source moved-item)))
                                 (if (moved-item-unique? moved-item)
                                     (begin
                                       (@ x set-object:  (path->URL
                                                          (moved-item-destination
                                                           moved-item))
                                            for-key: "Location")
                                       (cons moved-item xs))
                                     xs) ;; this xs nest is here in case we want
                                 xs)     ;; to warn or cons different messages
                             xs)))       ;; ...
                     xs)                 ;; Not a file; do nothing
                 (begin
                   (warning "invalid URL: " location)
                   xs))))
         '()
         (@ tracks object-enumerator))))

After evaluating move-results and possibly examining the output, we can run the bad files search again on the newly updated dictionary, by reevaluating the bad-files expression. Examining this output, I see that the only output consists of the 21 files that were not found in the new location (which had been deleted on purpose).

#;165> (length (bad-file-list-items bad-files))   => 21

Now we can write our library to a new XML file, shut down iTunes, and copy the file into place.

#;166> (itunes:write-library lib "~/scheme/xcode/plist/iTunes Music Library.xml.new")

Tragedy

Unfortunately, as many people (now including myself) have belatedly discovered after hacking up the iTunes XML file, iTunes will blithely ignore any changes you've made. The real data is kept in the binary iTunes Library file; the XML file is "read-only" and is generated from this data, overwriting your changes.

Fortunately, a solution of sorts is detailed at Schmolleworld. In short, you first make a backup of your iTunes Library and iTunes Music Library.xml files, then truncate the iTunes Library file (i.e. " ~/Music/iTunes Library")> and overwrite the old .xml file with your new copy. iTunes will think the iTunes Library file is corrupted, and rebuild it from the XML file. Although this isn't exactly what you might call "safe", it worked for me and my 5,628 song library.

However, any file in the XML database that iTunes cannot actually find will be eradicated from the iTunes database. Files that you've deliberately deleted, such as the album I described above, will disappear, which is pretty useful. Files that have been moved and not updated in your XML file, or which are on a removed volume, will also disappear, which is bad.

That means all files you want to keep must be resolved before recreating the library. You should check the output of (bad-file-list-filter moved-item-notfound? bad-files) to find all moved files that can't be resolved. You can update the XML file as many times as you want, but should not recreate the library until you are confident all files you want to keep are resolved.

iTunes will also "reorganize" your iTunes Music folder (usually "~/Music/iTunes/iTunes Music") when importing the XML file, I assume only if you have "Keep iTunes Music folder organized" selected. In my case, a few question marks in filenames were translated into underscores, and that's it. iTunes won't lose track of these files even if you back out your XML file and restore the old names.

A better solution than the above might be to generate an AppleScript, or to make AppleScript calls directly from Objective C, which sends commands to iTunes to update its database. For best results and safety, we would have to be able to look up songs via the Location URL. Although I can look up by alias name, I can't convert this to a URL. Furthermore, even if we indexed by track ID instead, AppleScript doesn't let you set Location anyway!

Double tragedy

And yet after all that, a fatal error has occurred: all non-smart playlists, including Podcasts, are destroyed when importing a new XML file. Most playlists can be restored using ListSaver, a $6 shareware product from Dead End Software. However, the Podcast playlist is restored as a regular playlist, with all your podcast data intact but your subscription info gone. You can resubscribe, but your files on disk will not show up in the subscription. I don't know of any workaround.

Groovy Movin' reconnects files by sending the Command-I keystroke to bring up the "Can't find file" dialog box, then navigates (again via keystrokes!) the file selector and finds the correct file. This process is very dodgy; it's subject to timing issues and has a huge amount of error correction to compensate.

Another option would be for you to delete and reimport bad files manually, then output an AppleScript which would restore ratings and playcounts scraped from a backup copy of the XML file. The Applescript would need to reference files by Artist, Album, and Size. For best results, you'd want to specify the restored files manually, to avoid updating every song. One can imagine an Applescript which retrieves the current iTunes selection and spits it to a program which grabs the old data, generating a new script to update the selection. Probably such a thing has already been written.

All this heartache just because Apple didn't provide a way to modify a music file's location via Applescript! We just can't reasonably change the hardcoded paths in the music library. It looks like our only real option to is to fall back on that tried-and-true tool of system administrators everywhere—the symbolic link.

(Update June 2007: originally, the symbolic links we create had to be permanently available. I have now devised a way to remove the symlinks after you have reconnected your lost files; see Moving unmanaged iTunes files.)

Symlinks

We'll perform the same track dictionary traversal we did in obtaining move-results, but instead of updating the dictionary we'll create symbolic links from each correct file to the corresponding bad file. First, since the bad file path may not even exist, we'll need a convenient way to create a string of directories before making the symlink, a la mkdir -p.

In fact, we could simply shell out to mkdir -p, but we might like more control than it provides—for example, we might not want to follow symbolic links. Also, it's a nice exercise!

Directory creation

Our strategy: first, split the pathname into its components. All slashes are stripped out of the component list, so we can't tell if a given list represents an absolute pathname. Therefore, we use absolute-pathname? on the pathname string to select a pathname-building function for later. At the start, no components are completed (a null list) and all are remaining. At each step of the loop, we append the first remaining component to the completed list, and use the pathname-building function from above to make it into a new pathname. Existing directories can simply be recursed into. Existing symlinks, if they point to directories, are optionally recursed into. If the path doesn't exist, we create a new directory. Any failure condition will throw an error. This process repeats until no components remain.

While developing the directory creation code, I encountered some interesting behaviour testing for file type and existence. I recorded my findings in the following table, to ensure I handled all cases correctly.

Note: if /tmp/1 is a symlink to a directory,
      (symbolic-link? "/tmp/1")   => #t
      (directory? "/tmp/1")       => #f
      (symbolic-link? "/tmp/1/")  => #f
      (directory? "/tmp/1/")      => #t
However, if /tmp/1 is a dead symlink,
      (symbolic-link? "/tmp/1")   => #t
      (symbolic-link? "/tmp/1/")  => error: no such file
      (file-exists? "/tmp/1")     => #f   ; follows the symlink!
      (file-exists? "/tmp/1/")    => #f
Unlike symbolic-link?, directory? does an implicit file-exists?.
      (directory? "/no/such/file") => #f

As you can see, appending a trailing slash "resolves" a symbolic link, which prevents us from testing if it is really a link. But we also need to resolve symlinks to recurse into them. We note that the pathname builder, make-pathname, takes two arguments, directory and file, and just appends a slash to the pathname if file is #f. Therefore, we call make-pathname twice in the code below, once for no slash and once when we need the trailing slash. Appending a slash to resolve a symlink is, in fact, simpler than using read-symbolic-link to do so.

; Creates all directories in pathname in the manner of mkdir -p.
; Throws an error upon failure; returns the pathname on success.  If traverse-symlinks
; is #f [default is #t], it is an error to recurse into symlinks to directories.
(define (create-directories pathname . traverse-symlinks?)
  (let ((make-pathname (if (absolute-pathname? pathname)
                           make-absolute-pathname
                           make-pathname)))
    (let loop ((path-done '())
               (path-left (string-split pathname
                                        (string pathname-directory-separator))))
      (if (null? path-left)
          pathname
          (let ((next-path (append path-done (list (car path-left))))
                (pathname-so-far (make-pathname path-done (car path-left))))
            (if (directory? pathname-so-far)
                (loop next-path (cdr path-left))
                (if (file-exists? pathname-so-far)
                    (if (and (symbolic-link? pathname-so-far)
                             (:optional traverse-symlinks? #t)                ; Resolve symlink by
                             (directory? (make-pathname pathname-so-far #f))) ; appending a slash.
                        (loop next-path (cdr path-left))
                        (error 'create-directories "file exists" pathname-so-far))
                    (begin (create-directory pathname-so-far)                 ; may throw an error
                           (loop next-path (cdr path-left))))))))))

Interlude: Accessing our result structures

Our result structures all look similar—they contain a type tag followed by a list of results—and the bad-file-list filter is pretty useful. So we'll generify the result processing by defining accessors and operations on "result" structure types.

(define (result-type x) (car x))
(define (result-items x) (cdr x))
(define (make-result type items) (cons type items))
(define (result? x) (symbol? (result-type x)))   ;; Should check for a valid result tag, but oh well.
(define (filter-result pred x)
  (if (result? x)
      (make-result (result-type x)
                   (filter pred (result-items x)))
      (error 'filter-result "not a result list")))

Additionally, symbolic link creation can fail for one reason or another, and we might like to report this within the returned result structure. We'll create an error record which we expect to be placed in the destination slot of a moved-item record.

(define (moved-item-error? x)
  (error? (moved-item-destination x)))
(define (error? x) (and (pair? x)
                        (eq? (car x) 'error)))
(define (make-link-error msg dest) (list 'error msg dest))

Interlude: Exceptions

The procedure create-symbolic-link throws an error rather than returning a boolean value. But we'd prefer that errors be caught and signaled in the results list; we don't want a single error to stop our processing.

So, we'll make a helper procedure which invokes a thunk (zero-argument procedure) and returns #t upon success and #f upon error, and define a new procedure create-symbolic-link* on top of this. We can catch errors using the methods described in SRFI-12: Exception Handling.

(define (error->boolean thunk)
  (handle-exceptions _ #f (begin (thunk) #t)))   ;; See SRFI-12 for details.
(define (create-symbolic-link* old new)
  (error->boolean
   (lambda ()
     (create-symbolic-link old new))))

Then again, we'd like to get the actual exception message so we can inform the user exactly how linking failed, since there are myriad ways to actually fail. The meat of our linking procedure will consist of calls to create-directories and create-symbolic-link, which may each throw errors.

Rather than cluttering up our link result code with the somewhat unsightly exception handling constructs, let's define a simple macro that looks like an if block, with a fourth slot for error handling.

Our first attempt: an if statement with 4 blocks. The fourth is a catch procedure called if an error is thrown. The catch procedure receives 3 arguments, corresponding to the three usual pieces of information given with each exception: message, location, and argument list. You might reasonably expect the fourth slot to be a code block like the others. Using a lambda, though, lets the user choose his argument names, rather than having bindings magically appear in his environment.

(define-macro (try-if pred conseq alt catch-proc)
  (let ((exn (gensym)))
    `(handle-exceptions ,exn
                        (,catch-proc ((condition-property-accessor 'exn 'message #f) ,exn)
                                     ((condition-property-accessor 'exn 'location #f) ,exn)
                                     ((condition-property-accessor 'exn 'arguments #f) ,exn))
                        (if ,pred
                            ,conseq
                            ,alt))))

This function takes an error message and argument list (as obtained from an exception) and flattens them into a single string. (conc is the do-what-I-want string appender.)

(define (error->string msg args)
  (cond ((null? args) msg)
        ((and (pair? args)
              (null? (cdr args)))
         (conc msg ": " (car args)))
        (else
         (conc msg ": " args))))

I used try-if for a little while, but found it unwieldy. Feel free to skip this excerpt from my first version of the link results code:

(try-if (and (create-directories (pathname-directory src))
             (create-symbolic-link dst src))
        (cons moved-item xs)
        (error 'link-results "bad #f value")       ;; this should never happen
        (lambda (msg loc args)
          (cons (make-moved-item
                 (moved-item-source moved-item)
                 (make-link-error (error->string msg args) dst))
                xs))))

First I noted that we could pass the exception to the catch procedure instead of its components, then unpack the exception data in error->string.

(define-macro (try-if pred conseq alt catch-proc)
  (let ((exn (gensym)))
    `(handle-exceptions ,exn
                        (,catch-proc ,exn)
                        (if ,pred
                            ,conseq
                            ,alt))))
(define (error->string exn)
  (let ((msg  ((condition-property-accessor 'exn 'message #f)   exn))
        (loc  ((condition-property-accessor 'exn 'location #f)  exn)) ;; ignored
        (args ((condition-property-accessor 'exn 'arguments #f) exn)))
   (cond ((null? args) msg)
         ((and (pair? args)
               (null? (cdr args)))
          (conc msg ": " (car args)))
         (else
          (conc msg ": " args)))))

In fact, we don't even need the pred/conseq/alt arguments to try-if — we could change it into a generic try macro which executes a single try block. On the other hand, perhaps only the predicate of the if should be evaluated within the exception handler. Since handle-exceptions always returns, though, we could not distinguish between a return value from the catch proc and a return value from the predicate. (In other words, how do we get the value of the predicate to the remainder of the if, while skipping the if when an error occurs?) Undoubtedly there is a way to do this, but our macro doesn't need to be a catch-all solution, as long as it helps us do our work.

Therefore, our try macro will accept a single try block, returning its value. However, if an error occurs during the try block, the catch procedure will be called with the exception and its return value will become the return value of try. try consequently becomes just a very thin wrapper around handle-exceptions, and serves only to provide a different syntax.

(define-macro (try try-block catch-proc)
  (let ((exn (gensym)))
    `(handle-exceptions ,exn
                        (,catch-proc ,exn)
                        ,try-block)))

Creating the symlinks

We can now cut-and-paste (yet again) the move-results code and tweak it to perform symlinks instead of updating the plist dictionary. We really should have abstracted out the common code earlier, but we were trying to get a proof-of-concept going and nice coding took a back seat. Never fear: we will address this problem in a bit.

This code passes through the tracks list, looking up the corresponding moved-item record for each track. If a unique destination file is found, then a symlink pointing to it is created in place of the missing source file. Directories are created as needed for the symlink. Any symlink that fails will have its moved-item-destination field set to (error exception-message "/path/to/dest"). Duplicate and not-found files will be skipped and are omitted from the results, just like in move-results.

At this point I went back and re-opened the iTunes database file, then re-evaluated bad-files to get a clean copy of the bad-file-list, because the plist dictionary had been modified when I was playing around with move-results.

(define link-results
  (make-result
   'link-results
   (ns:enumerator-fold
    (lambda (x xs)
      (let* ((location (@ x object-for-key: "Location"))
             (URL (ns:string->URL location)))
        (if URL
            (if (@ URL is-file-URL)
                (let ((name (ns:string-value (@ URL path)))) ;; REMOVE
                  (let ((moved-item
                         (hash-table-ref/default URL-to-moved-item-table
                                                 (ns:string-value location)
                                                 #f)))
                    (if moved-item
                        ;; Verify pathname gibes with bad-file record.
                        (if (string=? name (bad-file-name
                                            (moved-item-source moved-item)))
                            (if (moved-item-unique? moved-item)
                                ;; The meat: symlink the destination to the bad file.
                                (let ((src (bad-file-name (moved-item-source moved-item)))
                                      (dst (moved-item-destination moved-item)))
                                  (try (begin (create-directories (pathname-directory src))
                                              (create-symbolic-link dst src)
                                              (cons moved-item xs))
                                       (lambda (exn)
                                         (cons (make-moved-item
                                                (moved-item-source moved-item)
                                                (make-link-error (error->string exn) dst))
                                               xs))))
                                xs) ;; this xs nest is here in case we want
                            xs)     ;; to warn or cons different messages
                        xs)))       ;; ...
                xs)                 ;; Not a file; do nothing
            (begin
              (warning "invalid URL: " location)
              xs))))
    '()
    (@ tracks object-enumerator))))

Finally, the payoff. Having evaluated link-results, it turns out to be a good thing I saved all the error output. Every single link returned an error:

(filter-result moved-item-unique? link-results)                  => '(link-results)
(eqv? (length link-results)
      (length (filter-result moved-item-error? link-results)))   => #t
(write-list link-results "/tmp/link-errors.scm")

Here's an example failing record:

(moved (bad-file
        "/mnt/mp3/scratch/cd/mp3/ariana/22/Depeche Mode/Violator/02 Sweetest Perfection.mp3"
        "file://localhost/mnt/mp3/scratch/cd/mp3/ariana/22/Depeche%20Mode/Violator/02%20Sweetest%20Perfection.mp3")
       (error "can not create directory - Permission denied: /mnt/mp3/scratch/cd/mp3/ariana/22/Depeche Mode"
              "/mnt/mp3/scratch/cd/mp3/burned/15/Depeche Mode/Violator/02 Sweetest Perfection.mp3"))

I forgot I didn't have proper permissions to write to my NFS drive! So I created a new UNIX group, chgrped the music directories, and added my user to the new group. The interpreter, unfortunately, had to be restarted to see this change. Up to this point, I had been using one continuous interpreter session, a common paradigm in the Lisp world. I was saddened at its passing, but resolved the next session would be even better.

With permissions fixed, I ran the query again and wound up with zero errors:

(filter-result moved-item-error? link-results)   => (link-results)
(eqv? (length (result-items (filter-result moved-item-unique? link-results)))
      (length (result-items link-results)))      => #t

I subsequently confirmed that all the links were created correctly, and that iTunes now had access to all my "lost" files! I then reran the bad files query, and it came up only with the 21 files that were intentionally deleted. (Actually, it came up with one extra album, due to my ill-fated attempts to import an updated XML file. One of my albums was "reorganized"—i.e. renamed—by iTunes. As soon as I accessed these songs (via Get Info), iTunes magically found them again and updated the XML file.)

My article Moving unmanaged iTunes files explains how to make iTunes reference the real, underlying files in its database, so you can then dispose of the symlinks.

Cleanup!

As the late Harry Caray would say: "Holy cow!" Our code may work, but it sure is ugly. We've accumulated a bunch of global variables, many of which are used to share state between procedures. We've also done a lot of cutting-and-pasting, which has left us with similar but separate code for three different passes through the track list. It probably wasn't the best idea to put this cleanup off, because we've got a lot of work to do. On the other hand, now that we've solved the problem, it is much clearer where exactly code can be refactored.

Let's tackle global variables first. Not all globals are bad—for example, constants that must be visible everywhere are fine. We don't have any of these, though. (Chicken provides a define-constant form for true constants.) Other globals such as name (iTunes XML filename) and lib (the plist representing the iTunes library) are just artifacts of our programming in the REPL. These are simply passed as arguments to toplevel procedures; they're not referenced by name from within any procedures. These are nice to have for REPL interaction, so we can safely remove them into a separate driver file, treating the main code as a library.

itunes:music-regexp

Our itunes:music-regexp might be a good candidate for a SRFI-39 parameter. It's "mostly" constant and has a good default value, but we might want to override it. Parameters, a form of dynamically scoped variable, are a safer substitute for global variables, as they are thread-local and you can also temporarily override them with the parameterize form. The only place this variable is used is from within find-music-files; we could certainly pass it as an (optional) argument instead. But let's make it a parameter and see how it turns out.

(define itunes:music-regexp (make-parameter ".*\.([Mm][Pp]3|[Aa][Aa][Cc])"))

Only the tiniest tweak to find-music-files is required.

(define (find-music-files base-dir sig)
  (let ((table (make-hash-table string=?)))
   (for-each (lambda (fn)
               (let ((short-name (pathname-components-right fn sig)))
                (hash-table-set! table
                                 short-name
                                 (let ((old-key (hash-table-ref/default
                                                 table short-name #f)))
                                   (if old-key
                                       (cons fn old-key)
                                       (list fn))))))
             (find-files base-dir (itunes:music-regexp)    ; the only change
                         cons '()
                         (lambda (fn)
                           (not (symbolic-link? fn)))))
   table))

music-table-ref

significant-path-components is used in both find-music-files and music-table-ref. We could parameterize it so it's accessible from both, or pass it as an argument. However, once a music table is created, it should always be called with the same value of significant-path-components. The approach in an "object-oriented" language might be to create an object containing significant-path-components and a hash table as instance variables, with a pathname lookup operation defined on the class. In Scheme, we can use a closure to achieve the same effect.

We'll change our fold function to take a music-lookup procedure, rather than a music table. This procedure will accept one argument (a path), returning the matching pathname(s) in its database. We can then implement music-lookup internally in any manner consistent with our goal. In other words:

(music-table-ref music-table
                 (pathname-components-right
                  name significant-path-components))
 ; becomes
(music-lookup name)

Our implementation is a function-creating-function which will call find-music-files and return a music-lookup function. music-lookup will be closed over both the table created by find-music-files and the significant-path-components passed to it. This piggybacks off the toplevel music-table-ref, which can still be tested separately. The drawback is the table is now read-only. To avoid this, we could implement the music-table as a dispatch function (an "object") instead, capable of doing both a lookup and returning the raw table, depending on which message you send the object. Or we could implement it as the list (music-table <sig-path-components> <hash-table>) and have music-table-ref pull out the appropriate data.

The key realization here is that we can implement the music table any way we want, as long as we can create a 1-argument music-lookup procedure that returns the correct data. Whether music-lookup looks at global variables or is closed over the music-table data doesn't matter. Whether it sends a message to an object, or uses define-record accessors or list operations doesn't matter. We've now achieved better encapsulation and separation of interface from implementation.

We could of course define find-music-files and music-table-ref within the closure created by make-music-lookup. Neither needs to be visible in the global environment, and we would no longer have to pass significant-path-components as an argument to either. However, functions defined within closures are often a pain to debug at the REPL; I feel it's often easier to test them when defined at toplevel.

I'll keep the music table wrapped opaquely in the lookup function, but I may revisit this in the future if we need access to the hash table. (If you'll recall, earlier we did pull out all values in the music table, to obtain a list of every duplicate file. So more transparency may indeed be warranted.)

(define (make-music-lookup base-dir . significant-path-components)
  (let* ((sig-components (:optional significant-path-components 3))
         (table (find-music-files base-dir sig-components)))
    (lambda (path)
      (music-table-ref table
                       (pathname-components-right
                        path sig-components)))))

The URL table

I like the approach of creating a dedicated lookup function enough to do the same thing for the URL-to-moved-item-table. This will be passed to the moved- and link-results generators.

(define (make-moved-lookup bad-files)
  (let ((table (make-URL->moved-item-table bad-files)))
    (lambda (location)
      (hash-table-ref/default table location #f))))

location-fold and functified bad-files, move-files, and link-files

Now that we've addressed the global variable issue, we can turn our attention to the unconscionable reliance on cut-and-paste which festers yet and lurks malignly in the dark dank recesses of that grotesque silken-masked blasphemy... our code.

We need to wrap the calls to ns:enumerator-fold inside functions, which we should have done in the first place. Each of the result generators has only two free variables: tracks (the track dictionary) and a lookup function. We can pass these as parameters to the new functions. Additionally, much code is shared between all three; we can abstract this out into a new function location-fold, which will accept a procedure argument meat that performs the unique code of the fold. In all three cases the unique code refers to x, xs, location and name, so we'll pass these as arguments to meat.

I will present the newly-abstracted code here without much comment. Here's an example of its use:

(define tracks (itunes:tracks
                (itunes:open-library "~/Music/iTunes/iTunes Music Library.xml")))
(define music-lookup (make-music-lookup "/mnt/mp3/scratch/cd/mp3"))
(define bad-files (get-bad-files tracks music-lookup))
(define moved-lookup (make-moved-lookup bad-files))
(link-files tracks moved-lookup)

Or, without intermediate variables other than tracks:

(define tracks ...)
(link-files tracks (make-moved-lookup
                    (get-bad-files tracks
                                   (make-music-lookup "/mnt/mp3/scratch/cd/mp3"))))

And without further ado, the code:

(define (location-fold tracks meat)
  (ns:enumerator-fold (lambda (x xs)
                        (let* ((location (@ x object-for-key: "Location"))
                               (URL (ns:string->URL location)))
                          (if URL
                              (if (@ URL is-file-URL)
                                  (meat x xs location (ns:string-value (@ URL path)))
                                  xs) ;; Not a file; do nothing
                              (begin
                                (warning "invalid URL: " location)
                                xs))))
                      '()
                      (@ tracks object-enumerator)))
(define (get-bad-files tracks music-lookup)
  (make-result 'bad-file-list
   (location-fold tracks
    (lambda (x xs location name)
      (if (file-exists? name)
          xs
          (cons `(moved (bad-file ,name ,(ns:string-value location))
                        ,(music-lookup name))
                xs))))))
(define (link-files tracks moved-lookup)
  (make-result 'link-results
   (location-fold tracks
    (lambda (x xs location name)
      (let ((moved-item (moved-lookup (ns:string-value location))))
        (if moved-item
            ;; Verify pathname gibes with bad-file record.
            (if (string=? name (bad-file-name
                                (moved-item-source moved-item)))
                (if (moved-item-unique? moved-item)
                    ;; The meat: symlink the destination to the bad file.
                    (let ((src (bad-file-name (moved-item-source moved-item)))
                          (dst (moved-item-destination moved-item)))
                      (try (begin (create-directories (pathname-directory src))
                                  (create-symbolic-link dst src)
                                  (cons moved-item xs))
                           (lambda (exn)
                             (cons (make-moved-item
                                    (moved-item-source moved-item)
                                    (make-link-error (error->string exn) dst))
                                   xs))))
                    xs) ;; this xs nest is here in case we want
                xs)     ;; to warn or cons different messages
            xs))))))

move-files does share some code with link-files, but we've abstracted enough for now.

(define (move-files tracks moved-lookup)
  (make-result 'move-results
   (location-fold tracks
    (lambda (x xs location name)
      (let ((moved-item (moved-lookup (ns:string-value location))))
        (if moved-item
            ;; Verify pathname gibes with bad-file record.
            (if (string=? name (bad-file-name
                                (moved-item-source moved-item)))
                (if (moved-item-unique? moved-item)
                    (begin
                      (@ x set-object:  (path->URL
                                         (moved-item-destination
                                          moved-item))
                           for-key: "Location")
                      (cons moved-item xs))
                    xs) ;; this xs nest is here in case we want
                xs)     ;; to warn or cons different messages
            xs))))))

Congratulations!! The end

If you've managed to read this far, your eyes almost certainly hurt. Nevertheless, I hope this article has interested you in Cocoa programming in Scheme.

If you haven't already, download the source code for this article now. It contains the same commentary and code—the article is generated directly from it—but the superseded code is commented out. So you can load the entire file to use the finished code, or follow along with the article by sending code blocks from your IDE to your interpreter.

Or download the full article package, which contains the source, the plist library, the article text, the article generator, sample output and the extracted code stripped of commentary.

Comments on, questions about or quibbles with this article? Send a message to zb at this domain.

Endnotes

For advanced uses of fold, see the paper A tutorial on the Universality and Expressiveness of Fold.