Biological Taxonomy in Drupal

As I have mentioned in previous posts, I have been working with colleagues at the NHM on a template installation of the Drupal CMS (branded "Scratchpads") that helps biological taxonomists get the products of their work onto the web. The most often requested feature is the ability to handle hierarchies of taxon names. Biologists (just like information scientists) use complex name hierarchies to classify information about taxonomic groups (taxa). They need to be able to pin data (maps, images, specimen lists, etc.) to these names, so that taxon pages are created dynamically and updated as new information is submitted to the site (see my previous mySpecies posts). We were thinking we would have to write a new module to do this, but fortunately Drupal's built-in taxonomy module does everything we need.

A Drupal instance can support multiple classifications, and each classification allows single or multiple tree-like hierarchies. In fact you don't even have to have a hierarchy to use the taxonomy feature, which might be useful for microbiologists, for whom tree-like hierarchies often don't make much biological sense. Each taxonomy (termed "category" in Drupal speak) comprises multiple names ("terms" sensu Drupal) that optionally include a description and a list of synonyms. For biological taxonomists the description is akin to the authority of a taxon name, or might be an identifier (URL or DOI) of a paper describing that taxon, whilst synonyms are alternative names of the taxon. Within Drupal we can tie a classification to particular content types (nodes) and, if necessary, require users to classify pages. Thus when a user submits a new image, bibliographic reference or any other content type, it will automatically be found by browsing or searching the classification on the site, or even just by appending the name to the words "/taxonomy/term/" as part of the site's URL. You even get clickable breadcrumbs at the top of any classified page, automatically listing the parent-child relationships.
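To make the pieces described above concrete, here is a minimal sketch in Python of a term with a description, synonyms and a parent, and of how a breadcrumb trail falls out of the parent-child links. This is not Drupal's code or schema: the class name Term, its field names and the breadcrumb function are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One taxon name. Field names are illustrative, not Drupal's schema."""
    name: str
    description: str = ""  # e.g. the authority, or a URL/DOI of the describing paper
    synonyms: list = field(default_factory=list)  # alternative names of the taxon
    parent: Optional["Term"] = None  # None for the top of the hierarchy


def breadcrumb(term):
    """Return the names from the root down to (and including) the given term."""
    trail = []
    while term is not None:
        trail.append(term.name)
        term = term.parent
    return list(reversed(trail))


# Placeholder names in the style of the example file used in this post.
family = Term("Family1")
genus = Term("Genus1", parent=family)
species = Term("Species1", synonyms=["OldName1"], parent=genus)

print(" > ".join(breadcrumb(species)))
```

Each term only needs to know its own parent; the full trail is recovered by walking upwards, which is why a simple two-column child-parent list is enough to describe a whole classification.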

It is possible to enter a classification into Drupal manually (indeed Drupal's features for editing and managing classifications are very good). However, most biological taxonomists already have extensive databases with all this information, so we have written a little script that takes Child-Parent classifications from a tab-delimited text file and produces the XML import file Drupal needs to understand the classification. Instructions for doing this are given below. Note that for the moment the script only handles single-hierarchy classifications and you cannot yet import descriptions or synonyms (I'll update this page when we get this fixed). We have tested this with classifications of up to 10,000 names, and I will push this further by testing a 20,000-name file with multiple hierarchies once our import script can cope with this. Just for the record, Drupal allows you to export the data again in XML format, so it would be possible to manage complex classifications within Drupal and then extract them again if you wanted.


Step 1. Prepare the data file

This is just a tab-delimited text file of Child and Parent relationships in two separate columns, with the child name in the first column and its parent in the second. For example:

Species1 Genus1
Species2 Genus1
Species3 Genus2
Genus1 Family1
Genus2 Family2
Family1 Order1
Family2 Order1
Order1 Class1
Class1 Phylum1
Phylum1 Kingdom1
Kingdom1

If you are already using a database to manage biological taxonomic names, you should be able to generate this easily. Note that the highest-ranking name has no parent - just leave this entry blank in the parent column. Also, the columns should have no header labels, the order of names in the list does not matter, and the root taxon can appear anywhere in the list.
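If you want to check a file against these rules before uploading it, something along the following lines should do. This is a rough sketch of my own, not part of the import script; the function names read_pairs and check_single_hierarchy are made up, and the rules it checks (one root, every parent has its own row, no name listed twice, no loops) are my reading of what a single-hierarchy file needs.

```python
def read_pairs(text):
    """Parse tab-delimited child/parent rows into a {child: parent} dict.

    A missing or empty parent column marks the root taxon (parent is None).
    """
    parents = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("\t")]
        child = fields[0]
        if not child:
            raise ValueError(f"row has no child name: {line!r}")
        parent = fields[1] if len(fields) > 1 and fields[1] else None
        if child in parents:
            raise ValueError(f"{child!r} is listed as a child more than once")
        parents[child] = parent
    return parents


def check_single_hierarchy(parents):
    """Return the root name, or raise ValueError if the rows are not one tree."""
    roots = [name for name, parent in parents.items() if parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    for child, parent in parents.items():
        if parent is not None and parent not in parents:
            raise ValueError(f"parent {parent!r} of {child!r} has no row of its own")
    # Walk up from every name; meeting a name twice means the rows form a loop.
    for name in parents:
        seen = set()
        while name is not None:
            if name in seen:
                raise ValueError(f"cycle detected at {name!r}")
            seen.add(name)
            name = parents[name]
    return roots[0]


# Rows may come in any order, and the root may appear anywhere.
sample = "Species1\tGenus1\nFamily1\t\nGenus1\tFamily1\n"
print(check_single_hierarchy(read_pairs(sample)))
```

To run it on a real file, read the file into a string first, e.g. open("classification.txt", encoding="utf-8").read(), where the filename is whatever you called your own export.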

Step 2. Generate the XML file with your classification

Upload the text file with your classification to http://vsmith.info/taxonomy_xml.php. Give the classification any name you like in the text box. Then hit the "Choose File" button and "Choose" the text file on your computer containing the classification. Finally press "Submit" on the web page and wait for the XML file to be produced. It can take about 5 minutes to handle a 10,000-taxon name file.
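For the curious, the conversion amounts to turning the flat Child-Parent list into a nested tree and writing that tree out as XML. The Python sketch below shows the general idea only. It is not the script behind the upload page, and the element and attribute names it writes ("vocabulary", "term", "name") are placeholders I have made up, not the format Drupal's importer expects, so don't try to import its output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def pairs_to_xml(pairs, vocabulary_name):
    """Nest (child, parent) pairs into an XML tree and return it as a string.

    The element and attribute names are placeholders for illustration,
    not the schema Drupal's importer expects.
    """
    children = defaultdict(list)
    roots = []
    for child, parent in pairs:
        if parent:
            children[parent].append(child)
        else:
            roots.append(child)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")

    top = ET.Element("vocabulary", name=vocabulary_name)
    seen = set()
    # Iterative depth-first walk, so deep classifications cannot hit
    # Python's recursion limit.
    stack = [(top, roots[0])]
    while stack:
        parent_element, name = stack.pop()
        if name in seen:
            raise ValueError(f"{name!r} appears more than once in the tree")
        seen.add(name)
        element = ET.SubElement(parent_element, "term", name=name)
        # Pushed in reverse so that siblings come out in their input order.
        for child in reversed(children[name]):
            stack.append((element, child))
    return ET.tostring(top, encoding="unicode")


pairs = [
    ("Species1", "Genus1"),
    ("Species2", "Genus1"),
    ("Genus1", "Family1"),
    ("Family1", ""),
]
print(pairs_to_xml(pairs, "Demo classification"))
```

Names that are not connected to the root are silently left out by this sketch, so it is worth checking the input file first.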

Step 3. Upload the XML file into your site

Save the XML file from the web browser to your computer, go to the "Categories" section of your site (in the administration section) and hit the "Import" tab. Alternatively you can navigate directly to "http://yourSITEaddress/admin/content/taxonomy/import" to get to the page. Under "Files to Import" hit the "Choose File" button and "Choose" the XML file you have just created. Then hit "Import". Drupal can take about 5 minutes to handle a 10,000-taxon name file.

Step 4. Start using your classification

When the import is done, your new classification will be present under the Categories list - just hit "List" under the Categories section or go to "http://yourSITEaddress/admin/content/taxonomy". From here you can edit the classification by adding or deleting names, rearrange the hierarchy, or export the lot in XML format. To start using it, click "Edit Vocabulary" for the classification you have just imported and select all the types of web pages (node types) you want to classify with these names. Then create some new content and select the taxon name you want to classify your content with from the pull-down list in the "Categories" section. Any search on a name will find the content, or you can browse content in the Categories list or append the name to the words "/taxonomy/term/" as part of your site's URL.