Last week I ran a session at SciFoo Camp entitled “Biodiversity on the web”. I had originally planned to talk about some web-based environments (Scratchpads) we have created at the NHM to help biologists get their taxonomic research online, and the implications of this for science publishing. However, I bottled on my original talk (Science publishing for the MySpace generation: MySpecies and the Encyclopedia of Life), which was too long and obscure for most of the people that showed up. Instead the audience discussion focused on the Encyclopedia of Life (EOL) project, and a related initiative called the Biodiversity Heritage Library (BHL). I am loosely associated with both of these.
I could not have asked for a more august audience. They included the theoretical physicist and mathematician Freeman Dyson, an editor of Nature (Henry Gee), the conservation biologist extraordinaire Stuart Pimm, Saul Griffith of Howtoons fame, the Director of Citizen Science at Cornell Lab of Ornithology (Janis Dickinson), and of course – Martha Stewart. I gave a truncated version of my original talk, and then opened the session up for discussion. This focused on what I consider the fundamental problem for EOL (who will write the encyclopedia, and perhaps more importantly – why), what EOL will contain, and some copyright issues associated with BHL. Here are a few highlights from the discussion:
- Henry Gee stressed that the nomenclatural codes governing the naming of species must urgently be reformed to permit online publication. Without this the ICZN, ICBN and the other codes will become irrelevant (if they are not already). After much discussion, Stuart Pimm agreed. I talked a little on how the Scratchpads and initiatives like Zoobank could help in this regard.
- EOL must distinguish itself from Wikipedia. In the eyes of many people present, EOL was just window dressing for Wikipedia-like content and given the difficulties associated with “writing” EOL, Saul made the point that the money might be better spent on Wikipedia. I gave the usual spin on why EOL would (should) be different (mashups, scalability, meta-analysis of data etc) but I am not sure I found my arguments that convincing.
- The audience for EOL is still insufficiently defined. Consequently the motivations of those who might contribute to and use this resource are unclear. EOL (and to a limited extent BHL) cannot be all things to all people – at least not yet, and the projects need a much clearer vision about what they expect to achieve, why they are needed, and for whom. Again, the vision behind the Scratchpads within the context of changes afoot in science publishing can help here, but this is not (to my knowledge) part of the official EOL script.
- People were generally shocked about the copyright issues associated with BHL, though there was no clear consensus on what to do about the problem. Some advocated blatant disregard for copyright, others that a more cautious approach would be wise. Regardless, all of those who expressed an opinion thought that the official cutoff points (1923 in the USA and 1890!!! for most of Europe) were absurd. While most of the Heritage literature is “interesting” and often full of quaint pictures that make great coffee table books, it is of marginal scientific value. Citation analysis of what biological taxonomists actually use bears this out. Saul suggested that we try to determine the monetary value of post-1923 literature to the publishers, and use these data in any arguments on why BHL should scan it. I’m not sure how we might actually do this, but it’s something I will look into.
- Stuart Pimm needs EOL to answer what species occur where. This information is fundamental to any conservation studies. In a very taxon-specific way, biologists can supply these data through a Scratchpad, and of course this is GBIF’s raison d’être. EOL’s unique value is that it can bring all these data together such that ultimately biologists can start to ask “why” questions based on aggregated information, but again, this vision is not being officially articulated (at least I have not heard it).
- Martha Stewart used her publishing experience to stress the need for multimedia content and interactivity on the site. Janis Dickinson made similar comments based on her citizen science experience. I explained that there is a world of difference between providing a highly enriched and engaging environment for a few high profile taxa like mammals and birds (circa 15,000 spp), versus the other 1.8 million less charismatic species that will make up the bulk of EOL. I’m not sure all those present understood this point.
Saul Griffith made the point that is arguably the most damaging for EOL and the broader field of biodiversity research. Taxonomy, experts, authority, controls - these are all demonstrably what the web is not about. Trying to implement these in an online environment like EOL and related initiatives would be futile. As I struggled to explain some of the problems associated with studying and documenting biodiversity (scale, long tail issues, history, money etc), and why EOL could (should) be different, Martha Stewart commented that I sounded “pained by these problems”, and that “as a scientist this hurts you [me].” Well frankly Martha is right – I am pained by these problems and it does hurt me. Obviously it is early days for EOL and while it is still vaporware, it is hard to provide definitive answers to some of these problems. But the same cannot be said for the field of biological taxonomy. Unless taxonomists embrace change in the way we work, I think the field of biological taxonomy will be extinct long before most of the biodiversity that taxonomists study. EOL can and should be part of this change, and EOL needs to engage with this community much in the way that I have been doing with the Scratchpads. Otherwise, I won’t be able to defend this field much longer.
PS. I have three apologies to make!
- Firstly I originally booked the room for just half an hour, and subsequently learnt that Jonathan Eisen booked the other half at the last minute for a session on Terraforming. Needless to say, my session overran and went the full hour. Fortunately the displaced group terraformed the lobby, and based on Jonathan Eisen’s blog, their discussion set the world (or rather other worlds) to rights.
- Freeman Dyson attempted to say something at one point during my session and I managed to talk over him. Alas I will never know what words of wisdom he had to offer.
- I think I brought one of the participants to my session under false pretences. After a slightly frustrating earlier session on science publishing, I announced that I was going to demo an example of how the web might be used for science publishing in biodiversity studies. Although I briefly did this (and did a similar show and tell of the Scratchpads in Aaron Swartz’s session), the discussion focused more on EOL, because that was what the audience was most interested in.
Comments
BHL Utility
Hi Tom,
Thanks for the information. I appreciate that it is early days for BHL (and EOL), and that as time passes, society publishers will rapidly see the value of releasing their back catalog. With some notable exceptions the commercial publishers are much more problematic. With the recent formation of lobbying groups like PRISM there is evidence that commercial science publishers are getting their act together in the fight against Open Access (though see the OpenAccess reply here). Commercial publishers will be reluctant to give away their back catalog if they feel it has any monetary value. To my mind the best we can do (in addition to efforts like BHL) is stop the rot by ensuring that our new content is open access. This is something that EOL can help us with, though I have yet to develop these arguments publicly – watch this space.
Concerning the utility of the pre-1923 literature, there will be major taxon-specific discrepancies, but the fact that even modern taxonomists rarely cite the older literature suggests to me it will be of marginal use, with the possible exception of many botanists. In the case of parasitic lice (the group I work on), just 30% were described before 1923. Furthermore, most of these older descriptions are so poor they do not diagnose the taxon in question. Only by reference to type material do these older descriptions start to have some value. This seems to be common for entomological descriptions (i.e. more than 50% of the described diversity of life). I find it ironic that many (most?) older descriptions fail to adequately diagnose taxa, effectively making their names nomina nuda. So much for the code!
The bottom line is that I am really excited about BHL - you have to start somewhere and the pre-1923 literature is the obvious place to start. Just take care not to overhype the project, especially given the copyright constraints. BHL should be engaging at a policy level with those organizations that are lobbying for IPR reform on science literature. Furthermore, to demonstrate the value of the BHL ASAP you need to make the content findable. This means article-level information and an API to the portal such that folk like me can provide taxonomists with direct access to content. Otherwise most of the BHL will go unread - at least for now, and this will make it hard to raise more cash.
I need to clone myself
I just posted another piece about my SciFoo Highlights, and in doing this while checking the schedule I realized there were so many other great sessions I missed. To be honest I would have happily attended every one of them. Thanks for keeping me up to date via Nodalpoint.
Cheers, Vince