Breaking Barriers

In a recent blog post, Rod Page highlighted his concerns about the Scratchpads creating barriers to integrating biodiversity data. In Rod's words: "My worry is that in the long term this is going to create lots of silos that some poor fool will have to aggregate together to do anything synthetic with. This makes inference difficult, and also raises issues of duplication (for example, in bibliographies)." Rod is of course right - this is a real risk. However, the initial focus of the Scratchpads was not to solve the data aggregation problem (aka make Rod's life easier). Rather, they are intended to solve a much bigger problem, one that is endangering the entire discipline. Having solved this in a way that is sympathetic to Rod's concerns, we can then worry about data integration.

The problem I set out to solve with the Scratchpads is that at present most taxonomic information is locked in the minds (or desktop computers) of taxonomists, and there is no ready outlet for this information except perhaps the web. The marginal costs of traditional publication (both direct and indirect) are so high that just a trickle of the actual data we generate gets published (i.e. shared), and with so many taxonomists at or nearing retirement, there is a risk that most of this information will be lost if we don't act fast. For taxonomy to survive, it is essential that we (the taxonomic community) find an outlet for the vast quantities of data we generate. Furthermore, we should actively encourage the digital publication of these data. Rod would probably agree with this, but he would legitimately add that none of this precludes a Wiki, or specifically a Semantic Wiki, as the best technical solution for publishing these data. Rod's post makes the point that a single Wiki prevents the creation of multiple data silos and is a better framework to collate, annotate and synthesize biodiversity data (i.e. it makes it easier to fix inaccurate data, assign and synonymise multiple identifiers, etc.). On all counts I would agree - so what is it I have got against Wikis?

It's the sociology
Christine Hine is a sociologist who has spent much of her career looking at how taxonomists build and use IT systems. Her work reminds us that a technologically superior project may not always succeed. Wikis and, to some extent, similar data repositories like EDIT's Common Data Model (CDM) may be technically the best solution to a problem like aggregating biodiversity data. But without taking into account the sociological factors that influence why people contribute to a project, concepts like Rod's Wiki (or EDIT's CDM) are destined for obscurity. Scratchpads work (and by work I mean they are used by a reasonable number of their target audience, i.e. taxonomists) because they build on models of publication and trust that already exist and are widely understood. They don't disenfranchise contributors whose views might depart from the norm (such users can simply set up their own site). Scratchpads also allow contributors to brand themselves and their work, such that their contribution can be readily identified and accessed by others. This is done in a way that is sympathetic to the rights of the original authors, without (non-commercially) restricting how others reuse their work or over-branding reuse such that it demeans the subsequent user of these data. Scratchpads also provide users with a strong incentive to develop their content through this branding. None of this can be done with a Wiki. One of the biggest advantages of a Semantic Wiki is that it is sufficiently flexible to accommodate different types of data that were not preconceived by those designing the system. However, the same can be said of the Scratchpads, which allow users to create custom content types to suit their bespoke data needs. Scratchpads work because they are an electronic equivalent of a multi-authored paper, but have the advantage that they can be constantly revised and evolve as new information comes to light.
This goes several steps beyond what is possible with traditional publication, but is not as radical a change to the publication process as the whole community adopting a common Wiki.

Technical limitations of Wikis
Sociology aside, there are a few technical limits on what you can do with Wikis that make them problematic right now. In the Scratchpads we can create sophisticated data editors for building and editing commonly used data sets. This is hard to do in a Semantic Wiki, though it is not technically impossible. Likewise, Wikis can make it hard to normalize and repurpose data across a site. This is essential for a lot of biological data, which would otherwise have to be repeated in many places and would rapidly get out of sync. Again, Wiki technology has ways of dealing with this, but it is fair to say that these more sophisticated Wiki features are in their relative infancy.

Scratchpads are not data silos
Rod's principal argument against the Scratchpads is that we are building data silos. This is a legitimate concern. However, for the majority of data types there is sufficient structure in the Scratchpads to build services on top of these data such that they might be readily shared with specialized data repositories. There are plenty of such repositories for biodiversity data (GenBank for DNA sequences, BOLD for DNA barcodes, GBIF for specimen records, Flickr and Morphbank for images, TreeBASE for phylogenies, and far, far too many to mention for taxonomic names). We have a BBSRC grant currently in review that (if funded) will provide the funds to add these web services to Scratchpad data. In fact, some of these services already exist in the Scratchpads. For example, Rod lists bibliographies in his blog post. It's ironic that this is one of the few data types in the Scratchpads that already has a web service, and can be accessed, de-duplicated and flagged for errors and inconsistencies (see here for a first pass at this). We have similar services in place for specimens via GBIF, and plans for plenty more.
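
To make the bibliography point concrete, here is a purely illustrative sketch of what de-duplicating records harvested from several sites, and flagging inconsistencies between them, might look like. It is not the Scratchpads web service: the record fields, the matching key (normalized title plus year) and all function names are assumptions made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def duplicate_key(record):
    """Hypothetical matching key: normalized title plus publication year."""
    return (normalize_title(record["title"]), record["year"])


def group_duplicates(records):
    """Group records sharing a key; keep only groups with more than one member."""
    groups = defaultdict(list)
    for record in records:
        groups[duplicate_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


def flag_inconsistencies(group, fields=("journal", "volume")):
    """List the fields on which apparently duplicate records disagree."""
    return [f for f in fields if len({rec.get(f) for rec in group}) > 1]


# Invented example records, as if harvested from two different sites.
records = [
    {"site": "site-a", "title": "A Revision of the Genus Example",
     "year": 1998, "journal": "J. Ex. Tax.", "volume": "12"},
    {"site": "site-b", "title": "A revision of the genus Example.",
     "year": 1998, "journal": "Journal of Example Taxonomy", "volume": "12"},
    {"site": "site-b", "title": "Notes on another taxon",
     "year": 2001, "journal": "J. Ex. Tax.", "volume": "15"},
]

duplicates = group_duplicates(records)
for group in duplicates:
    # Show which sites hold the same reference and where their records differ.
    print([rec["site"] for rec in group], flag_inconsistencies(group))
```

A title-plus-year key is deliberately crude; a real service would more likely need fuzzy matching on authors and journal abbreviations, but the principle of aggregating structured records across sites is the same.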

The bottom line...
Would a Wiki be better for science? Yes, in the long term Rod is probably right. Can I convince a majority of taxonomists to use a Wiki right now? No, at least not in the near to medium term, and not without the sociological (and a handful of technical) concerns being addressed. Since taxonomy has struggled in recent years, I figure it's better to do something now, and build on this system to address its deficiencies, than to focus on a solution that won't get used by the community of researchers who need it most, and need it now.

Comments

CMS Convergent Evolution?

Hi! I believe that in recent years many web-based projects and frameworks have developed so many features that they have become very similar - both Drupal and (Semantic) MediaWiki grew huge communities, building extensions/plugins/modules, which usually provide 80-90% of any web app you could think of, including digital biodiversity apps. Drupal has borrowed a lot of winning features from Wikis, while Semantic MediaWiki can almost be set up as a regular CMS (for example, I have pointed Roderic to Semantic Forms - a cool extension which lets end users fill in very complex data via forms, into live semantic pages). I guess in both systems you will need to write almost the same 5-10% custom code to make it behave nicer in a taxonomic context (whether it's templates, views, content types, or God forbid, actual PHP...). Personally I think the big question is whether the underlying technology, Drupal or MediaWiki, is truly good enough for advanced taxonomic apps - for example, I dislike mixing my data with a lot of non-scientific information in a single SQL DB, and SQL might not be the right solution for complex semantic relationships in the first place - i.e., I see Semantic MediaWiki and Drupal CCK/Views as clever hacks, but for complex queries you need to do some loops in the air, hope the server won't choke, and never never look at the SQL itself... I am really looking forward to a truly object-oriented/semantic database to use for the underlying technology, with a smart framework on top to allow smart content publishing and manipulation. [Cross posting @ iPhylo] Udi

Reply

Hi Udi, I think you are right - there will be convergence taking the best of Wikis and the best of CMSs. However, we need a product now and in the absence of this we have to make the best of the available technology. My second (and more important) point is that the products we build need to take into account the sociology of their users (in our case the scientific community) - it is not just about the technology. For too long we have been building things that no one uses because the developers did not take into account the behavior and motivations of their users. We need to take lessons from the more successful ventures on the web if we are to succeed. Ultimately success means building a product that users have to have, because in our field this is the only path to sustainability.
