Martha Stewart feels my pain!

Last week I ran a session at SciFoo Camp entitled “Biodiversity on the web”. I had originally planned to talk about some web-based environments (Scratchpads) we have created at the NHM to help biologists get their taxonomic research online, and the implications of this for science publishing. However, I bottled on my original talk (Science publishing for the MySpace generation: MySpecies and the Encyclopedia of Life), which was too long and obscure for most of the people that showed up. Instead the audience discussion focused on the Encyclopedia of Life (EOL) project, and a related initiative called the Biodiversity Heritage Library (BHL). I am loosely associated with both of these.

I could not have asked for a more august audience. They included the theoretical physicist and mathematician Freeman Dyson, an editor of Nature (Henry Gee), the conservation biologist extraordinaire Stuart Pimm, Saul Griffith of Howtoons fame, the Director of Citizen Science at Cornell Lab of Ornithology (Janis Dickinson), and of course – Martha Stewart. I gave a truncated version of my original talk, and then opened the session up for discussion. This focused on what I consider the fundamental problem for EOL (who will write the encyclopedia, and perhaps more importantly – why), what will EOL contain, and some copyright issues associated with BHL. Here are a few highlights from the discussion:

  • Henry Gee stressed that the nomenclatural codes governing the naming of species must urgently be reformed to permit online publication. Without this the ICZN, ICBN and the other codes will become irrelevant (if they are not already). After much discussion, Stuart Pimm agreed. I talked a little on how the Scratchpads and initiatives like Zoobank could help in this regard.
  • EOL must distinguish itself from Wikipedia. In the eyes of many people present, EOL was just window dressing for Wikipedia-like content and given the difficulties associated with “writing” EOL, Saul made the point that the money might be better spent on Wikipedia. I gave the usual spin on why EOL would (should) be different (mashups, scalability, meta-analysis of data etc) but I am not sure I found my arguments that convincing.
  • The audience for EOL is still insufficiently defined. Consequently the motivations of those who might contribute to and use this resource are unclear. EOL (and to a limited extent BHL) cannot be all things to all people – at least not yet, and the projects need a much clearer vision about what they expect to achieve, why they are needed, and for whom. Again, the vision behind the Scratchpads within the context of changes afoot in science publishing can help here, but this is not (to my knowledge) part of the official EOL script.
  • People were generally shocked about the copyright issues associated with BHL, though there was no clear consensus on what to do about the problem. Some advocated blatant disregard for copyright, others that a more cautious approach would be wise. Regardless, of those that expressed an opinion, all thought that the official cutoff points (1923 in the USA and 1890!!! for most of Europe) were absurd. While most of the Heritage literature is “interesting” and often full of quaint pictures that make great coffee table books, it is of marginal scientific value. Citation analysis of what biological taxonomists actually use bears this out. Saul suggested that we try to determine the monetary value of post-1923 literature to the publishers, and use these data in any arguments on why BHL should scan it. I’m not sure how we might actually do this, but it’s something I will look into.
  • Stuart Pimm needs EOL to answer what species occur where. This information is fundamental to any conservation studies. In a very taxon-specific way, biologists can supply these data through a Scratchpad, and of course this is GBIF’s raison d’être. EOL’s unique value is that it can bring all these data together such that ultimately biologists can start to ask “why” questions based on aggregated information, but again, this vision is not being officially articulated (at least I have not heard it).
  • Martha Stewart used her publishing experience to stress the need for multimedia content and interactivity on the site. Janis Dickinson made similar comments based on her citizen science experience. I explained that there is a world of difference between providing a highly enriched and engaging environment for a few high-profile taxa like mammals and birds (circa 15,000 spp), versus the other 1.8 million less charismatic species that will make up the bulk of EOL. I’m not sure all those present understood this point.

Saul Griffith made the point that is arguably the most damaging for EOL and the broader field of biodiversity research. Taxonomy, experts, authority, controls - these are all demonstrably what the web is not about. Trying to implement these in an online environment like EOL and related initiatives would be futile. As I struggled to explain some of the problems associated with studying and documenting biodiversity (scale, long tail issues, history, money etc), and why EOL could (should) be different, Martha Stewart commented that I sounded “pained by these problems”, and that “as a scientist this hurts you [me].” Well frankly Martha is right – I am pained by these problems and it does hurt me. Obviously it is early days for EOL and while it is still vaporware, it is hard to provide definitive answers to some of these problems. But the same cannot be said for the field of biological taxonomy. Unless taxonomists embrace change in the way we work, I think the field of biological taxonomy will be extinct long before most of the biodiversity that taxonomists study. EOL can and should be part of this change, and EOL needs to engage with this community much in the way that I have been doing with the Scratchpads. Otherwise, I won’t be able to defend this field much longer.

PS. I have three apologies to make!

  1. Firstly I originally booked the room for just half an hour, and subsequently learnt that Jonathan Eisen booked the other half at the last minute for a session on Terraforming. Needless to say, my session overran and went the full hour. Fortunately the displaced group terraformed the lobby, and based on Jonathan Eisen’s blog, their discussion set the world (or rather other worlds) to rights.
  2. Freeman Dyson attempted to say something at one point during my session and I managed to talk over him. Alas I will never know what words of wisdom he had to offer.
  3. I think I brought one of the participants to my session under false pretences. After a slightly frustrating earlier session on science publishing, I announced that I was going to demo an example of how the web might be used for science publishing in biodiversity studies. Although I briefly did this (and did similar show and tell of the Scratchpads in Aaron Swartz’s session), the focus of the discussion was more on EOL. This was because the audience was more interested in this.

 

Comments

Biodiversity Heritage Library selection

I'm the BHL Director. While the BHL will scan a significant part of the pre-1923 literature, we are actively seeking permissions for digitizing and making openly available journal literature from "learned society" non-profit journals post-1923. A significant portion of the specialized taxonomic literature is covered by this category. We are just beginning but we have two signed agreements in hand and 6 more under consideration. These permissions allow us to mount the digitized backfiles up to or very close to the present. We will ramp up these efforts this fall. The "usefulness" of the pre-1923 literature is also a contentious issue. For detailed, hard-core identification of rare taxa in certain disciplines, it is still hugely important.

BHL Utility

Hi Tom,
Thanks for the information. I appreciate that it is early days for BHL (and EOL), and that as time passes, society publishers will rapidly see the value of releasing their back catalog. With some notable exceptions the commercial publishers are much more problematic. With the recent formation of lobbying groups like PRISM there is evidence that commercial science publishers are getting their act together in the fight against Open Access (though see the OpenAccess reply here). Commercial publishers will be reluctant to give away their back catalog if they feel it has any monetary value. To my mind the best we can do (in addition to efforts like BHL) is stop the rot by ensuring that our new content is open access. This is something that EOL can help us with, though I have yet to publicly develop these arguments – watch this space.

Concerning the utility of the pre-1923 literature, there will be major taxon-specific discrepancies, but the fact that even modern taxonomists poorly cite the older literature suggests to me it will be of marginal use, with the possible exception of many botanists. In the case of parasitic lice (the group I work on), just 30% were described before 1923. Furthermore, most of these older descriptions are so poor they do not diagnose the taxon in question. Only by reference to type material do these older descriptions start to have some value. This seems to be common for entomological descriptions (i.e. more than 50% of the described diversity of life). I find it ironic that many (most?) older descriptions fail to adequately diagnose taxa, effectively making their names nomina nuda. So much for the code!

The bottom line is that I am really excited about BHL - you have to start somewhere and the pre-1923 literature is the obvious place to start. Just take care not to overhype the project, especially given the copyright constraints. BHL should be engaging at a policy level with those organizations that are lobbying for IPR reform on science literature. Furthermore, to demonstrate the value of the BHL ASAP you need to make the content findable. This means article-level information and an API to the portal such that folk like me can provide taxonomists with direct access to content. Otherwise most of the BHL will go unread - at least for now, and this will make it hard to raise more cash.

Monetary value of older literature

It is very easy to identify a monetary value, but you have to get some insider information. For example, how many copies of a particular paper were purchased and downloaded from the publisher's site during a certain period of time. Then it is normalized for each type of paper. I bet that the average price of a taxonomic paper will be nil. The publisher would not have gotten a penny from selling individual papers anyway. Even the New York Times stopped selling their articles because it is not profitable. Vlad

Buy it up then give it away?

I think you are right to say that most publishers' back catalog has little monetary value, though in recent years some publishers have begun extending their electronic back catalog with the express claim of adding value for subscribers. For most taxonomic literature, because the market is so small, and because most people want articles (not whole journal issues), the value of this content must be close to nil. The problem is getting insider information to determine the value, and then negotiating with publishers for the permissions. This is very labour intensive. I wonder whether there would be a case for wholesale negotiation and purchase of blocks of publishers' back catalog. We could advertise for donations to purchase these from publishers (it would be very easy for people to sponsor as there is a clearly defined product, and there would be tax breaks for people donating money from the US) and make it available to everybody through BHL. Essentially the publishers would be getting money for doing nothing. - Just a thought! Vince

"..Essentially the

"..Essentially the publishers would get getting money for doing nothing.." Sounds like "protection money" :)

Not Quite...

The publishers own the rights to this stuff. If they are to give these rights in perpetuity, they will need some incentive. Commercial publishers don't care about scientific progress; they only care about their financial propriety. Thus we have to make it in their interests to care - this costs money.

Terraforming

Well, given your topic, I really should have been in your session (I am working for example on a microbial genomic encyclopedia as well as trying to convince people to do a global microbial survey). But given that we could not get into your session, it was good to practice our Terraforming by making our own conference room in the hall ...

EOL Microbial Integration?

Hi Jonathan, Maybe there is scope for integrating content from your microbial genomic encyclopaedia into EOL? After all, your group makes up the vast bulk of the diversity of life, and EOL is not just for outreach. The vision is for EOL to act as a technical resource, with tools for data aggregation, visualization and perhaps analysis. Your community are also a lot more progressive than mine, which makes it easier to organize content. EOL is at a very preliminary phase (they are currently hiring technical folk to design and build infrastructure), but there may be scope for integration in the near future - I'll keep you posted. Good luck with your microbial survey. I guess Craig Venter led the way with this. I only wish my taxa were as easy to collect. Best, Vince

Missed a good session.

Hi Vince, this was another good session I missed. With seven parallel tracks, it's unfortunately only possible to attend around 15% of all of SciFoo... so thanks for blogging it.

I need to clone myself

Hi Duncan,
I just posted another piece about my SciFoo Highlights, and in doing this while checking the schedule I realized there were so many other great sessions I missed. To be honest I would have happily attended every one of them. Thanks for keeping me up to date via Nodalpoint.
Cheers, Vince