Image databasing: an integrated, customized approach

By Daniel L. Geiger geiger at vetigastropoda.com

Summary: Off-the-shelf photo databases are often not suitable for particular collections and data-tracking needs. This article describes approaches to designing your own database with general-purpose database software, using FileMaker as an example.

Introduction
Terminology

Platform and programs
Implementation
Navigation and Scripting
Alternative set-ups
Label printing
Archiving
You can do it, too
RTFM
FileMaker Geek Notes

Introduction

I read Kerry Thalmann’s article on image storage and record keeping in the March/April issue of View Camera with great interest. I wholeheartedly agree with him on the need for an archiving system, and that a database-driven approach is the most flexible. Unlike him, however, I do not find pre-existing databases a workable solution. If you have been looking at off-the-shelf photo-databases and thinking, "this is nice, but it lacks xyz and has all this stuff I don't need," then it may be time to design your own system.

For over a decade I have been developing, from the ground up, a system that now represents about 25,000 images in more than 9,000 series, both 35 mm and 4x5, and I want to share some of my experiences. It is strongly shaped by my work as a natural history museum professional, which has heightened my sensitivity to keeping things in order. I started my first photo-database around 1989 in FileMaker (1), as a simple flat-file design. By the time the canned photo-databases started to appear, my version was already well designed for my particular purposes. The current version, made in FileMaker Pro 6 Developer, is a multilevel relational design with scripts, buttons, and on-the-fly customizable functions. Development comes in fits and starts: there are years when I don't change a thing, and then I add a certain functionality. The current design and overhaul with buttons was done a couple of years ago, between Christmas and New Year's.

My photographic focus is on organisms; I call it nature journalism, as opposed to beauty in nature. The main ordering principle is the Linnean classification: kingdom, phylum, class, order, family, genus, species, with a number of sub-categories. I have looked at various pre-packaged photo-databases, but they all have too few categories. Hence, I had to bite the bullet and design my own.

Terminology

Databases use their own set of terms, which can make it harder than necessary to get into the database world. Below is a short list of terms that are used and what they mean in language understandable to the rest of us.

Platform and programs

As I am primarily a Mac person, the choice was pretty simple: FileMaker Pro. Additionally, it is the only such program that works on both the Mac and the PC, and the Developer version (approx. $500) makes it possible to generate stand-alone applications that can be sent out to potential clients. Such an application works like a PDF document with an integrated Acrobat Reader, except that it is a fully functional read-only database; the client does not have to own a copy of FileMaker. I thought this would be very handy, but it turns out that I have not used it that much. If you are considering a web site with a searchable database, then FileMaker is not ideal, as hosting of FileMaker databases is quite uncommon and pricey. A Microsoft Access-based system would be better in that case, but Access is not available on the Mac.

If you already have data in electronic form, such as an Excel table or columns in a Word document, you can import that data into a proper database. Simply convert the document into a tab-delimited file, or read it into the database directly as one (usually possible with Excel and most databases). If you migrate data from a Word file, or across platforms, you may have to generate a text file with one line per image (or set of images), in which the content of each column is separated by a tab character:

Lens tab exposure tab film tab development tab ....
Lens tab exposure tab film tab development tab ....

If you have a Word file with a narrative such as "Took picture with SA90, developed in D76, ....", that cannot be converted directly into a proper database.
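A minimal Python sketch of the import step, assuming a file laid out like the two example lines above (no header row, one tab between columns). The function name `read_tab_delimited` and the column names are illustrative only, not part of FileMaker or Excel. Lines with the wrong number of columns are set aside rather than imported, since a stray or missing tab is the usual cause of misaligned records.

```python
import csv
import io


def read_tab_delimited(text, columns):
    """Parse tab-delimited text into one dict per line.

    `columns` lists the field names in order (the file has no header row).
    Returns (records, bad_lines); bad_lines holds the 1-based line numbers
    whose column count does not match `columns`.
    """
    records, bad_lines = [], []
    # QUOTE_NONE: treat quote characters as ordinary text, as plain tab-delimited exports do
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            bad_lines.append(line_no)  # misaligned: stray or missing tab
            continue
        records.append(dict(zip(columns, row)))
    return records, bad_lines


if __name__ == "__main__":
    sample = "SA90\t1/8 s f/22\tTri-X\tD76\nSA90\t1/4 s f/22\tTri-X\n"
    records, bad = read_tab_delimited(sample, ["lens", "exposure", "film", "development"])
    print(records)  # one complete record
    print(bad)      # [2]: the second line has only three columns
```

The sample values are made up; the point is that every line must carry the same number of tab-separated columns before it can become a record.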

Implementation

The next step is the setup and design. First some general remarks.

The field names will be different for you and me. If you shoot cars, then you need fields for make, model, year, color, ... If you do fashion, you may need maker of clothes, model name, agency, ... In the end, you will also need to find your physical images. How you organize them is entirely up to you, but do it in a logical and consistent fashion. The simplest scheme is a running number from 1 to 1,000,000. Images are then not arranged by topic, but they will always be easy to find. If you want to arrange them by topic, think VERY CAREFULLY about how you want to do it. My 35 mm slides are arranged systematically (by classification), with scenics stored chronologically; my LF images (relatively few thus far) are stored numerically. I use one series of numbers for all images, 35 mm and LF. You could use a prefix for different series, but I advise against it, because prefixes are prone to errors. You may also add a field for the physical storage of a particular image: Storage: Binder 10, or Box 397, or CD 38, or Customer xyz file.

Whether you keep a record for each image or for each series is up to you. I use series of similar shots (different exposures, flash and available light, ...) and then scan only one (or occasionally a few). The average number of images per record is three, but it ranges from 1 to 97 [I got carried away].

Also note that you can change your database! You don't need to think of all possibilities and future developments today. For instance, I added latitude and longitude only once I got a GPS unit, and I modify the drop-down menu for lenses as my arsenal changes.

My fields include (Figure 1):

Some fields have drop-down menus (e.g., lens) that also help keep field contents consistent. Here an important database principle emerges: better to have too many fields than too few, because merging the content of different fields is very easy, but separating content is quite tricky. For instance, if I need all the locality information in one field, I generate a calculation field that joins country, state, town, and local name into a single field, which is easier to work with, for instance, when putting together an invoice.
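To illustrate the "merging is easy, splitting is hard" principle, here is a hypothetical Python equivalent of such a locality calculation field. The field order follows the text; skipping empty fields is an assumption about how you would want blanks handled, and the function name is made up.

```python
def locality(country, state, town, local_name, sep=", "):
    """Join separate locality fields into one string, skipping empty ones.

    A stand-in for a calculation field; blank fields are assumed to be
    unwanted in the combined text.
    """
    parts = (country, state, town, local_name)
    return sep.join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(locality("USA", "California", "", "Point Lobos"))
    # USA, California, Point Lobos
```

Going the other way is where the trouble starts: split `"USA, California, Point Lobos"` on commas and nothing tells you whether the third piece was a town or a local name, because the empty town field left no trace.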

Figure 1. Main screen for the database. The red, blue, white, grey, and black buttons all have associated scripts that execute certain functions, which gives the database a web-site feel. The green bookmark section is a relational database that allows me to mark a set of images for particular projects/submissions.

Another principle in databasing images is consistency. One way to enforce consistency when assigning an image to a particular category (a family in the overall classification) is to implement a relational database system. A relational database consists of two (or more) tables that are linked. One table has the image records, one record per series; the other has the classification records, one record per family. As there are multiple images of the same family, I link each image record in one table to its appropriate family record in the other table. It also means that if the classification changes (a family is moved to a different superfamily), I change only one record in the classification table, and all records in the image table reflect the change automatically (Figure 2). Other labor-saving set-ups include:
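The same linking idea can be sketched with Python's built-in `sqlite3` module. This is not how FileMaker stores relationships internally; it only demonstrates the principle. Table names, column names, and the placeholder taxa (`Familyidae`, `Superfamily A/B`) are hypothetical, and the image numbers are borrowed from the figure captions.

```python
import sqlite3


def build_db():
    """Create the two linked tables in memory (names are hypothetical)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE classification (      -- one row per family
            family      TEXT PRIMARY KEY,
            superfamily TEXT,
            phylum      TEXT
        );
        CREATE TABLE images (              -- one row per image series
            image_id INTEGER PRIMARY KEY,
            family   TEXT REFERENCES classification(family)
        );
        """
    )
    return con


def superfamily_of(con, image_id):
    """Follow the image -> family link to read the current superfamily."""
    row = con.execute(
        "SELECT c.superfamily FROM images AS i "
        "JOIN classification AS c ON c.family = i.family "
        "WHERE i.image_id = ?",
        (image_id,),
    ).fetchone()
    return row[0] if row else None


def demo():
    con = build_db()
    con.execute("INSERT INTO classification VALUES ('Familyidae', 'Superfamily A', 'Mollusca')")
    con.executemany("INSERT INTO images VALUES (?, ?)", [(7414, "Familyidae"), (7423, "Familyidae")])
    # Reclassification: edit ONE row in the classification table ...
    con.execute("UPDATE classification SET superfamily = 'Superfamily B' WHERE family = 'Familyidae'")
    # ... and every linked image record now reports the new superfamily.
    return [superfamily_of(con, i) for i in (7414, 7423)]


if __name__ == "__main__":
    print(demo())  # ['Superfamily B', 'Superfamily B']
```

Because the superfamily is stored once, in the family record, there is no second copy that could fall out of step.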

A similar approach is implemented for scans: the same image may have multiple scans archived on several CDs, and all of them can be looked up instantly (Figure 3). An invoicing module has also been added.
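A sketch of the one-to-many lookup behind the scan module, in Python. It assumes each scan record carries the image number, the disc name, and the file name (a hypothetical layout); a FileMaker portal shows essentially the list that the hypothetical `scans_for` function returns.

```python
from collections import defaultdict


def index_scans(scan_rows):
    """Group scan records by image number: one image, many scans.

    Each row is assumed to be (image_id, disc, file_name).
    """
    by_image = defaultdict(list)
    for image_id, disc, file_name in scan_rows:
        by_image[image_id].append((disc, file_name))
    return dict(by_image)


def scans_for(index, image_id):
    """All known scans of one image, or an empty list if none were catalogued."""
    return index.get(image_id, [])


if __name__ == "__main__":
    rows = [
        (7624, "CD000012", "7624a.tif"),
        (7624, "DVD00003", "7624a.tif"),
        (7452, "CD000012", "7452.tif"),
    ]
    index = index_scans(rows)
    print(scans_for(index, 7624))  # two copies on two different discs
```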

Figure 2. Two relational databases used for the classification. One holds the classification (phylum, class, order ... family), one record per family. The other holds the association between scientific name and common name. The Classification table could also be split into multiple tables, one for each pair of adjacent ranks. This is too convoluted for me, but it is implemented in larger databases we use at the museum.

Figure 3. The CD module can return multiple entries for each record in the main database (a many-to-one relation), displayed through a portal. The CD module is populated using an AppleScript that scans a CD/DVD for images and creates a new record for each file with the file name, file type, and file size.

Navigation and Scripting

Navigation should be easy, and web-like. Click the labeled button, and something happens. I do that by associating small scripts with the buttons. An example script is provided below. In the main window, when I click on "images", the following script is executed:

Enter Browse Mode []
Go to Layout ["image browse"]

This very simple script is built by clicking through options; I don't have to type any code. Scripts can get a bit more complicated as you add functions, but it is not that difficult at all. Most of these buttons also have a layout change associated with them. I use 13 layouts in the main data table.

I also use the database to generate web pages. I select a particular set of images and export the “web” field, which contains the HTML code for a page. To do that, I had to write a calculation within FileMaker with If statements that check parameters like the record number, something that takes a bit of fiddling. The resulting pages are static, not on-line database search results, but it works for me. Below is the sample code that I use to generate simple web pages:

If(Status(CurrentRecordNumber)=1, ("<html><head>
<title>" & Classification_ID::Phylum & ", " & Classification_ID::Class & "</title>
<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>
</head>
<body bgcolor=#000000><center><font color=#FFFFFF>SNAP image gallery | all images &copy; Daniel L. Geiger/SNAP<br>" &
Classification_ID::Classsummary &" </font></center>") & "<br><br><center><table width=100% border=0>",""
)
&
If( Int(Status(CurrentRecordNumber)/2)-(Status(CurrentRecordNumber)/2)<0, "<tr><td valign=middle><center><font size=-1 color=#FFFFFF>", "<td valign=middle><center><font size=-1 color=#FFFFFF>")
&
"<img src=../images/" & Int(Number/100)*100 &"/"& ImageIDplussubID# &".jpg><br>"&
ImageIDplussubID# & "<i> "& Main subject & "</i><br>" &
State_Province & ", " & Country &
If( Int(Status(CurrentRecordNumber)/2)-(Status(CurrentRecordNumber)/2)<0, "</font></center></td>", "</font></center></td></tr>")
&
If(Status(CurrentFoundCount) = Status(CurrentRecordNumber),
If( Int(Status(CurrentRecordNumber)/2)-(Status(CurrentRecordNumber)/2)<0, "</tr>", "") & "</table><br>
<font color=#FFFFFF>All images &copy; Daniel L. Geiger/SNAP</font></center></body></html>","")

This looks quite complicated, but most of the tricky placement of commas and brackets is done by FileMaker. The difficult part is making a two-column layout without knowing how many images there are in the group. For that I used the integer function Int(). If the record number n is even, Int(n/2) is identical to n/2; if n is odd, Int(n/2) is smaller than n/2 by one half, so Int(n/2) - n/2 is negative. A negative result therefore marks an odd record, which opens a new table row, while an even record closes it. The big problem is to devise a mathematical/logical expression that distinguishes between the items; executing it in the program is a snap. This is just one example of how the scripting language can be used.
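For comparison, here is the same two-column logic as a short Python sketch. It keeps the `Int(n/2) - n/2 < 0` test to show that it is an odd-number check (`n % 2 == 1` is the more common way to write it), and it closes the last row when the number of images is odd. The markup and the `../images/<folder>/<number>.jpg` layout follow the calculation above; the function name is hypothetical.

```python
def gallery_table(numbers):
    """Lay out image numbers in a two-column HTML table.

    Record numbers are 1-based, as in FileMaker. An odd record opens a row
    and an even record closes it; if the count is odd, the last row is
    closed after the loop.
    """
    lines = ['<table width="100%" border="0">']
    for rec_no, number in enumerate(numbers, start=1):
        odd = (rec_no // 2) - (rec_no / 2) < 0   # Int(n/2) - n/2 < 0, i.e. rec_no % 2 == 1
        folder = (number // 100) * 100           # Int(Number/100)*100
        cell = f'<td valign="middle"><img src="../images/{folder}/{number}.jpg"><br>{number}</td>'
        lines.append(("<tr>" + cell) if odd else (cell + "</tr>"))
    if len(numbers) % 2 == 1:
        lines.append("</tr>")                    # close the half-filled last row
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gallery_table([7414, 7423, 7452]))
```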

Alternative set-ups

Some nature photographers combine a classification code with a serial number for each image. They take, for instance, Walker’s Mammals of the World, number the families 1 through x, then the genera within each family 1 through x, the species within each genus 1 through x, and then add a serial image number: 01-02-03-15. What happens if a species is moved to a different genus? Or if the photographer misidentified an animal? What if a photographed species has not been described yet? (I am one of those people who describe new species, so I am aware of that possibility.) In each case the archive number changes, and the image is no longer traceable. Hence, the classification code and the image identifier must be treated as separate entities. This is also how the vast majority of natural history museum collections are organized, from the American Museum of Natural History to the Zoologische Staatssammlung. From the perspective of record keeping, a photograph is no different from an actual specimen.
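The problem can be made concrete with a small Python sketch. It assumes a hypothetical image whose rank numbers are stored alongside a stable image number; when the species is moved to another genus, the composite code changes, while the stable number still works as a lookup key.

```python
def composite_code(family_no, genus_no, species_no, serial):
    """Classification-based archive code such as 01-02-03-15."""
    return f"{family_no:02d}-{genus_no:02d}-{species_no:02d}-{serial:02d}"


def demo():
    # Stable image number -> current classification (hypothetical values).
    catalog = {7624: {"family_no": 1, "genus_no": 2, "species_no": 3, "serial": 15}}
    before = composite_code(**catalog[7624])
    # The species is moved to another genus: only the attached data changes.
    catalog[7624].update(genus_no=5, species_no=1)
    after = composite_code(**catalog[7624])
    # The composite code is now different, but 7624 still finds the record.
    return before, after, 7624 in catalog


if __name__ == "__main__":
    print(demo())  # ('01-02-03-15', '01-05-01-15', True)
```

Anything filed or published under `01-02-03-15` now points nowhere, which is exactly the traceability problem described above.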

Label printing

Many photo-databases tout their label-printing capabilities. I strongly advise against using self-adhesive labels anywhere near a photograph. The adhesives deteriorate over time. In a benign case, labels simply fall off as the glue dries out, but they may also turn into a sticky mess and smudge the original. It is therefore better to write serial numbers on the image holder of your choice, either in pencil (graphite is archival) or with a permanent marker (Sharpie or similar). At the museum we receive collections in which self-adhesive labels have been used. They come in all kinds of states, from yellowed to spotted to crumbling, and within as little as 20 years they are always in worse shape than ordinary paper labels.

Labels are important as a clean presentation tool. However, there is a strong shift toward an all-digital workflow. I submit all my images to my stock agency on CD, and have not sent out an actual photograph in this century. My stock agency does not even want a self-adhesive label on the CD; they prefer handwritten data, because the not-so-sticky labels have damaged some of their CD drives. My volume is admittedly low (the IRS calls it “hobby income”), but the trend is clear. General-purpose database programs also come with pre-formatted label templates, so label printing is not much of a selling point for dedicated photo-databases.

Below is an example of a print page (or PDF page) of some images that might be sent to a client. In general, such output is called a "report", i.e., a formatted presentation of the result of a search (query). It looks a bit nicer and has a header and footer, but it is really just database output. I could also link such a page to an address table and automatically insert the client's address, or automatically include all my conditions at the bottom of the page. Again, the options are limitless.

Figure 5. Sample print-page of a submission. All images from LF chromes, ArcaSwiss F-line 4x5.
7414: Nikkor T360, 81A. 7423: SA 90 f/8. Remainder: Sinaron 150 S. 7452, 7445, 7454 with TLA 360.

Archiving

Archiving digital files is yet another challenge. I will sidestep the issue of CD permanence here. My approach is to store CDs and DVDs (over 300 so far) off-site, and also to keep all files on a large external hard drive. I use a network-attached storage (NAS) drive containing four 250 GB drives set up as a RAID5 array, which gives me 750 GB of effective storage. RAID5 means that if one of the four drives fails, all the data is still available. My NAS connects by gigabit Ethernet and is cross-platform compatible: I have a Buffalo Technologies system (1 TB raw capacity) connected to a PC as well as two Macs; administration and set-up only work from the PC.
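The 750 GB figure follows from how RAID 5 works: one drive's worth of capacity holds parity, so usable space is (number of drives - 1) x drive size, here 3 x 250 GB. The simplified Python sketch below also shows why a single failed drive loses nothing: parity is the byte-wise XOR of the data blocks, so any one missing block can be recomputed from the others. Real RAID 5 rotates parity across the drives stripe by stripe; this sketch ignores that detail, and the function names are illustrative.

```python
from functools import reduce


def raid5_usable(n_drives, drive_gb):
    """Usable RAID 5 capacity: one drive's worth of space goes to parity."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (n_drives - 1) * drive_gb


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(raid5_usable(4, 250))           # 750
    data = [b"abcd", b"efgh", b"ijkl"]    # blocks on three drives
    parity = xor_blocks(data)             # block on the fourth drive
    # Drive 2 fails: rebuild its block from the survivors plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])             # True
```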

Once I burn a CD or DVD, I add records to the digital file module of my FileMaker database system. For each image there is a field with a list of all digital files I have of it and where each is located. The file information for the files on a CD/DVD (disc number, image number, and file size) is added automatically by a home-cooked AppleScript. I drag the CD icon onto the droplet, and the small program reads the content of the CD and puts the right information into the appropriate FileMaker fields (Figure 4). Thumbnail images are added to the database with another script.

CD-ROM script:

on open (list_of_aliases)
    -- The dropped item is the CD/DVD itself. Its name is assumed to be
    -- exactly 8 characters long (e.g. "CD000123"), so every file path
    -- looks like "CD000123:filename".
    tell application "Finder"
        set counter to 0
        set FilePath to first item of list_of_aliases as string
        set dname to characters 1 thru 8 of FilePath as string
        
        -- every file at the top level of the dropped disc
        set my_list to every file of disk dname
        
        repeat with n in my_list
            set counter to counter + 1
            (*file size of the n-th file, converted from bytes to megabytes*)
            set filesize to size of item counter of my_list
            set MB to (filesize / 1024 / 1024) as string
            (*file name: strip the 9-character "CD000123:" prefix*)
            set filename to item counter of my_list as string
            set filename to characters 10 thru end of filename as string
            (*image number for the FM record = first four characters of the file name*)
            set imageid to characters 1 thru 4 of filename as string
            
            tell application "FileMaker Developer"
                activate
                -- Scans.fp5 must already be open, e.g.:
                -- open file "Mercury:Files:SLIDES FM6:Scans.fp5"
                create record
                set cell "image ID" of last record to imageid
                set cell "Disk" of last record to dname
                set cell "file size_MB" of last record to MB
                set cell "alt. file name" of last record to filename
            end tell
        end repeat
        
        tell application "CD index XX droplet"
            activate
            display dialog "The CD " & dname & " has been catalogued" & return & ¬
                counter & " images have been added"
        end tell
        eject disk dname
    end tell
end open
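For readers who want the same cataloguing logic outside AppleScript, here is a rough Python equivalent. It assumes the disc is mounted at some directory path and, like the script above, that the first four characters of each file name are the image number. The dictionary keys reuse the FileMaker field names from the script; writing the records into FileMaker itself is left out, and `catalog_disc` is a hypothetical name.

```python
import os


def catalog_disc(mount_point, disc_name):
    """One record per top-level file on a mounted disc.

    Assumes, like the AppleScript, that the first four characters of each
    file name are the image number. Subfolders are skipped.
    """
    records = []
    with os.scandir(mount_point) as it:
        for entry in sorted(it, key=lambda e: e.name):
            if not entry.is_file():
                continue  # skip folders
            size_mb = entry.stat().st_size / 1024 / 1024  # bytes -> megabytes
            records.append({
                "image ID": entry.name[:4],
                "Disk": disc_name,
                "file size_MB": round(size_mb, 2),
                "alt. file name": entry.name,
            })
    return records
```

A call would look like `catalog_disc("/Volumes/CD000012", "CD000012")`, where the mount path and disc name are placeholders for whatever your system uses.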

************************************

Thumbnail-adding script.

on open (list_of_aliases)
    -- The dropped file is expected at a fixed location with a 4-digit name.
    -- Its path has a 20-character prefix, e.g. "Zeidora3:thumbnails:7624.jpg",
    -- so the file name starts at character 21.
    set FilePath to first item of list_of_aliases as string
    set FN to text 21 thru -1 of FilePath
    set imagenumber to text 21 thru 24 of FilePath
    -- folders hold blocks of 100 numbers: 7624 goes into folder 7600
    set FoldName to (imagenumber as integer) div 100 * 100
    -- path to the nn00 folder
    set NewFilePath to "Zeidora3:Files:SLIDES FM6:images:" & FoldName
    -- path to the file in its new location
    set NewLocation to NewFilePath & ":" & FN
    
    tell application "Finder"
        move file FilePath to folder NewFilePath
    end tell
    tell application "FileMaker Developer"
        activate
        if not (exists document "SNAPcatalog.fp5") then
            open file "Zeidora3:Files:SLIDES FM6:SNAPcatalog.fp5"
        end if
        -- find the record whose "number" field matches the image number
        delete every request
        create request
        set cell "number" of request 1 to imagenumber
        find
        -- link the moved thumbnail into the container field
        set cell "image" to file NewLocation
    end tell
end open

Above are the AppleScripts that add CD records and thumbnails to the FileMaker database. Programmers, don't laugh too hard. The scripts have many flaws (they work only from one particular location, file names must have a particular length, and only one image can be dropped at a time), but they work for me. The thumbnail script first moves the thumbnail into a particular directory (one directory for every 100 numbers), and then links the FileMaker record to that image.
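The one-folder-per-100-numbers rule is simple integer arithmetic, the same `Int(Number/100)*100` used in the web-page calculation. The Python sketch below computes the destination for a thumbnail; unlike the AppleScript, it reads the leading digits of the file name instead of assuming a fixed-length name. The function name and the root directory `/archive/images` are hypothetical placeholders.

```python
import re
from pathlib import PurePosixPath


def thumbnail_destination(file_name, images_root):
    """Folder and full path for a thumbnail, one folder per 100 numbers.

    The image number is taken from the leading digits of the file name,
    so names need not have a fixed length.
    """
    match = re.match(r"\d+", file_name)
    if match is None:
        raise ValueError(f"no image number at start of {file_name!r}")
    folder = (int(match.group()) // 100) * 100   # 7624 -> 7600
    folder_path = PurePosixPath(images_root) / str(folder)
    return folder_path, folder_path / file_name


if __name__ == "__main__":
    folder, path = thumbnail_destination("7624.jpg", "/archive/images")
    print(folder)  # /archive/images/7600
    print(path)    # /archive/images/7600/7624.jpg
```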

Physical images are archived in binders using PrintFile plastic sheets, in two different ways. 35 mm slides are archived by classification (plants by family, animals by higher classification). 4x5 sheets are kept separate, because I think that if I mixed 35 mm slides and 4x5 sheets, the 35 mm frames would damage the 4x5 sheets. As I don't have that many 4x5 sheets, they are organized numerically, which roughly equals chronologically.

Figure 6. PrintFile binder sleeves for image storage. Left: 4x5 transparency with associated data sheet. The data sheet for series number 7624 is usually folded and put into one of the film pockets. Right: corresponding system for 35 mm slides. All slides are mounted, and a label with the classification for each family is printed on heavy stock and inserted into the first pocket. Note that only a single number is written on each frame, usually along with the genus and species name. Some frames have been recycled and therefore have some crossed-out parts. Photographed on a Portatrace 48" light box, which is screwed onto two plastic rolling drawer units from Office Depot (much cheaper and more functional than the OEM stand).

You can do it, too.

If all of that sounds terribly technical, don’t worry too much. The important thing is to get started. With a home-grown system, you can start small and add capabilities as your needs expand. I started with a single flat-file database without thumbnails, before digital imaging was practical. I learned the rudiments of AppleScript programming because there was a need. I am not a computer nerd, but as a researcher I am predisposed to learn new things as the need arises. Another advantage of general-purpose database programs is that once you know how to use them, you can create other databases as well, without having to purchase yet another dedicated system. I have databases for my 3,000 literature reprints and for my two ongoing research projects on a couple of snail families (one with over 5,000 records). You may also find it useful to compare techniques, for instance which images were developed in a particular way. The personalized options are endless.

This article is not intended as a guide to creating databases. There are a number of finer points to consider, from the number of nesting levels in relational set-ups to data verification options. These are issues that you will deal with as your system grows. If you have looked at pre-packaged photo-databases and thought they don’t quite cut it, quit fretting, take a deep breath, and create your own. You have already done the most difficult part: figuring out what you want. Now you only have to implement it. It’s not that difficult after all, and it will serve you better, because you customized it for your needs.

RTFM

Read the ******* manual. If you have never dealt with a database, do read the manual, or a third-party book on it. I just read the FileMaker manuals, which are OK: not as good as the older PS manuals, but quite a bit better than average. If you are not prepared to read, you will have a lot of frustration in your future, and a canned version, warts and all, might be better for you.

FileMaker Geek Notes

Bookmarks. The green bookmarks are set up as follows: the "Change Names of Bookmarks" button opens a new file (Projectnames), where I can add new records. The text entered is the name of the new project (cards, NOSCAN, ...), so I have multiple records, each with a unique text value. In the main database, the green box with checkboxes is created by choosing "Field Format", then "Check boxes" "using value list" "Projects". The value list "Projects" is specified "from field", using the field "Project name" in the file "Projectnames". That means the value list is not static but dynamic, reflecting the current field contents of the records in the file Projectnames.
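Conceptually, a value list defined "from field" is recomputed from whatever records currently exist. A minimal Python sketch of that idea, with hypothetical names: it drops empty and duplicate entries and sorts case-insensitively for display. FileMaker's exact ordering rules may differ, so treat the sorting as an assumption.

```python
def value_list_from_field(records, field):
    """Current, de-duplicated values of one field across all records.

    Recomputed on every call, so adding a record adds a choice. Empty values
    are dropped; case-insensitive sorting is an assumption for display.
    """
    values = {r.get(field, "").strip() for r in records}
    values.discard("")
    return sorted(values, key=str.casefold)


if __name__ == "__main__":
    projects = [{"Project name": "cards"}, {"Project name": "NOSCAN"}]
    print(value_list_from_field(projects, "Project name"))  # ['cards', 'NOSCAN']
    projects.append({"Project name": "portfolio"})
    print(value_list_from_field(projects, "Project name"))  # ['cards', 'NOSCAN', 'portfolio']
```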

Searching for thumbnails. You cannot directly search the thumbnail container. A workaround is a two-checkbox field (Thumbnails? yes/no) whose value is computed with an IsEmpty function on the thumbnail container: if the thumbnail is empty, then Thumbnails? = no, else Thumbnails? = yes [in FileMaker calculation syntax: If(IsEmpty(image), "no", "yes")]. Now you can search for whether Thumbnails? is yes or no. Voilà.
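The same workaround in Python terms, as a hypothetical sketch: a derived yes/no flag stands in for the container field, and searches run against the flag. An empty container is modeled as `None` or empty bytes; the record layout and function names are made up for illustration.

```python
def thumbnail_flag(image):
    """Same idea as If(IsEmpty(image), "no", "yes") on a container field."""
    return "no" if not image else "yes"


def records_without_thumbnails(records):
    """Search on the derived flag instead of the container itself."""
    return [r["number"] for r in records if thumbnail_flag(r.get("image")) == "no"]


if __name__ == "__main__":
    records = [
        {"number": 7414, "image": b"\xff\xd8..."},  # stand-in for JPEG data
        {"number": 7423, "image": None},           # no thumbnail yet
    ]
    print(records_without_thumbnails(records))  # [7423]
```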