Sustainability matters in informatics

Parts of the taxonomic community just don't get sustainability. I have always known this was a problem, but two events this week demonstrate just how much work there is to do in explaining why sustainability matters. Early this week I received a series of e-mails on the TDWG mailing list saying that the websites for the two LSID projects on SourceForge are broken (see here and here). For the uninitiated, LSID stands for 'Life Science Identifier'. LSIDs are supposed to be the Globally Unique IDentifier (GUID) of choice for the taxonomic community. In essence this is the system of numbering (a barcode if you like) that we give biodiversity data, such that we can electronically find it again. In theory, LSIDs were our community’s way of guaranteeing the sustainability (i.e. citability) of biodiversity data, and it is thus deeply ironic that the LSID project has itself proven unsustainable.

I have never been a big fan of LSIDs, as few within the community seem to be able to technically implement them, and even fewer people understand the social conventions (i.e. persistence in perpetuity) we must adopt to make them work. I have always understood this social challenge to be far greater than the technical challenge. However, my identifier of choice (URIs - specifically URLs) came in for a bit of a bashing this week, when a technical administrator from the American Museum of Natural History (AMNH) wrote to me asking to change a link on my website because the ENTIRE DOMAIN of the American Museum was about to change!!! Everything at "http://amnh.org" is going to be moved to a new host at "http://www.american-mnh.org" and the old domain is going to be "released for charitable purposes". In other words, as of June 1st 2009, all links to anything (data, papers, webpages etc) that point to amnh.org will break! [INSERTED 31 March, 09: AMNH have confirmed this was a hoax. See my comment below]

To be fair, I have not taken the trouble to check this out. Indeed, when I mentioned this to a colleague, they thought it must be a joke. Unfortunately I don’t think it is. Copied below is the original message I received. If someone can confirm the veracity of this message (and what "released for charitable purposes" actually means), do let me know:

MESSAGE SENT ON MARCH 19, 2009

Dear Mr. Smith,

my name is Andy Braxton, I am Technical Director at the "American Museum of Natural History" in NY.

I am writing to you today because as of June 1st, 2009 our website at http://www.amnh.org will no longer be accessible at this domain-name. Instead, we have moved our website to a new host, namely: http://www.american-mnh.org.

I therefore ask you to change your links to our website on the following subpage of yours:

Your Subpage: http://www.vsmith.info
Currently links to: http://www.amnh.org
Please change to: http://www.american-mnh.org

Please note that as of June 1st, 2009 all links pointing to amnh.org will be invalid. Our old domain-name has been released for charitable purposes and it will thus cease to show our content.

Thank you for your cooperation!

Sincerely Yours,
Andy Braxton (Technical Director)

American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024-5192
USA

Mail: abraxton@american-mnh.org
Web: http://www.american-mnh.org

Comments

AMNH Domain Address NOT changing

Steve Mau, Director of Digital Technology at AMNH, has contacted me to confirm that the message from "Andy Braxton" is some kind of hoax and that the AMNH domain will not be changing. Presumably the owner of american-mnh.org was trying to gain traffic (or worse) by pretending to be someone they are not. Apologies to AMNH for being caught out by this.

Re: Sustainability matters in informatics

Hi Vince. You don't come out and say this out loud, but surely you are thinking now: gosh, perhaps those LSIDs aren't such a bad idea after all!

Also, on your statement regarding LSIDs "few within the community seem to be able to technically implement them": if a given project/community wants the various features offered by LSIDs, then it is always going to be a non-trivial thing to implement + maintain such a system, LSID-based or not. If you use URLs (as the W3C wants you to) and some hodge-podge mix of approaches to handle metadata etc., I predict that would be no easier to do than implementing an LSID-based system, except for the fact that the LSID-protocol specifies the various desired features explicitly (it's what it was designed for!).
On the other hand, if you *just* want a permanent link to your stuff and are not interested in the rest of the feature set, then LSIDs are probably overkill and using PURLs would be a simpler solution, sure. Use the right tool for the job, as the saying goes :)
Lastly, one main reason for LSIDs not being endorsed by the W3C is that they don't 'do something' when you put them in a browser URL-box, and (more importantly) can't be resolved by e.g. Semantic Web reasoners, which can only follow http-links. But this shortcoming could be largely worked around by setting up one or more resolver services for LSIDs, using HTTP URLs (http://lsid.resolver-thingy.org/urn:lsid:...), and then embedding these URLs in, say, RDF-XML data.
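To make the resolver work-around concrete, here is a minimal Python sketch. It assumes the usual LSID layout (urn:lsid:authority:namespace:object, with an optional revision) and uses the placeholder host from the paragraph above; that host is not a real service, and the function names and the example LSID are made up for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # An empty trailing revision is treated the same as no revision.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def lsid_to_http_url(text: str,
                     resolver_base: str = "http://lsid.resolver-thingy.org/") -> str:
    """Prefix a valid LSID with a resolver base URL (placeholder host, an assumption)."""
    parse_lsid(text)  # raises ValueError if the LSID is malformed
    return resolver_base.rstrip("/") + "/" + text


# Hypothetical LSID, used purely as an illustration.
example = "urn:lsid:example.org:names:12345"
print(lsid_to_http_url(example))
```

The resulting HTTP URL is what would be embedded in the RDF data, so that a browser or a reasoner that only follows http-links can still reach the record, while the LSID itself stays intact inside the URL.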

Simon: whatever the reason for it, the fact remains that the Sourceforge site is down and has been for quite some time, and it's just not encouraging when one is promoting the use of LSIDs to e.g. PIs and the main site for the project (top on the list of Google results!) is dead.

Re: Re: Sustainability matters in informatics

As Simon points out, "persistence in perpetuity is a problem for LSIDs, and ALL other GUIDs". My point is that this is primarily a social problem, not a technical problem. We already have a technical solution to the technical part of the problem (i.e. URLs). Furthermore the existing technical solution (URLs) plays nicely with existing technologies (resolve in web browsers, work with the semantic web etc). So why do we need to reinvent another technical solution (LSIDs), especially one that is deficient in certain ways (doesn't resolve without another technical hack, doesn't play well with the semantic web)? It is the social problem (i.e. link rot) that we need to address. The argument that we need another technical solution because people’s mindset with URLs is that they change is not sufficient justification for the effort involved in LSIDs.

LSID infrastructure was never down

Dear Vince,

I wanted to clarify the matter.

The pieces of the LSID infrastructure that really MATTER were NEVER OFF-LINE, not even for a minute, in the last year or so. Check a few examples:

The actual LSID infrastructure, including the TDWG LSID resolver and the LSID authorities listed there have always been operational, as you can see from the links above.

The open source community website that supports software developers involved in implementing LSID clients and resolvers was never off-line either. See the following links:

What has been down is the LSID project website that presents the technology, with a few pages of information and links to various resources. That site has been down because SourceForge changed their website provider configuration (with previous notice), which broke our setup, and we were not able to restore it. But we are working on it.

So, besides losing a bit of the documentation and information about LSIDs, the underlying glue that keeps these links up and running kept working without much human intervention.

It is indeed unfortunate that we were not able to restore the LSID website, but the LSID infrastructure never stopped working, not even for a moment. So I suppose that your argument is moot.

I hope this clarifies the matter a bit.

Regards,

Ricardo Pereira
Volunteer Systems Administrator
Biodiversity Information Standards - TDWG

Re LSID infrastructure was never down

Dear Ricardo - I realize this. As I said to Simon, I was only commenting on the irony that the website of the project intended to give our community sustainability has itself proved unsustainable. The LSID infrastructure is clearly doing fine, and there is a community supporting it. This isn't my problem with LSIDs. My issue is that LSIDs solve a problem that already has a solution (URLs).

Re Re LSID infrastructure was never down

Vince - I beg to differ here. LSIDs were designed to deal with several problems somewhat specific to the life sciences. It is true that you *can* do almost everything in the LSID spec with URLs + various techniques, but as Mark Wilkinson says on his blog (http://semanticmusings.blogspot.com/2007/06/argument-for-lsids.html):
What worries me about NOT adopting a new identifier system as we move into the Semantic Web is that we start to hack and kludge our way to full functionality by adding novel behaviours on top of URLs, or start putting the "intelligence" of where to find data/metadata into redirects, purl URLs, or other nasty, centralized, and IMO unsustainable architectures.

See also more on this debate (aka 'URLs vs LSIDs wars') going back 2-3 years on the W3C SemWeb mailing list:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/

Re Re LSID infrastructure was never down

Ultimately this comes down to a cost vs. benefits discussion, and [in my opinion] the bottom line is that the benefits are not enough to justify the costs when it comes to LSIDs. In part this is because our solution has to transcend the life sciences, and there is little sign of any buy-in from other communities.

Thanks for the link to Mark Wilkinson's blog post on this subject. Mark's statement "The Browser is going the way of the Dodo!" is emblematic of the gulf between developers and users. For developers, browsers just get in the way of machine-to-machine interactions. In contrast, users are now more reliant on web browsers than ever before. The fact that LSIDs cannot be resolved in browsers without additional software IS a major problem for most users, whose knowledge of IT does not transcend a Web browser. It’s much less of an issue for a developer.

Vince. I would contend the

Vince. I would contend the point about the browser resolution by saying that prefixing the LSID with a URL to a resolver service basically solves that particular problem. After all, the exact same thing applies to DOIs (don't resolve in a browser, but commonly passed around prefixed w/ http://dx.doi.org), and yet DOIs have been a major success :) Also, an important point was made on the TDWG mailing list recently: "DOIs have a business model. LSIDs do not".
However, I should say that I am myself slowly changing opinion on the GUID front, away from favoring LSIDs. Lately I'm thinking that DOIs are perhaps the GUID technology/infrastructure to turn to, in particular if certain changes are made to the DOI-registration pricing scheme (far cheaper bulk price per DOI, e.g. for mass-tagging say 0.5M elements). For certain things in my domain of interest (cataloging results from genome-wide scans for disease-associated variants), assigning DOIs to whole datasets is almost a no-brainer, and possibly relatively easy to implement by extending initiatives such as this one:
"Publication and Citation of Scientific Primary Data" (STD-DOI) is a project funded by the German Science Foundation. Its aim is to make primary scientific data citeable as publications...

You're joking...

A project's website going down for a period of time doesn't mean that the project itself has died http://lists.tdwg.org/pipermail/tdwg-tag/2009-March/000386.html.

"Persistence in perpetuity" is a problem for LSIDs, and ALL other GUIDs.

As for saying you prefer URIs over LSIDs, that's just silly, LSIDs are URIs.

Re. You're joking...

I never meant to suggest that LSIDs were dead. I was only commenting on the irony that the website of the project intended to give our community sustainability has itself proved unsustainable. As for my statement suggesting I prefer URIs over LSIDs, you are, of course, quite right: LSIDs are URIs. I have struck through the two words of offending text.

Oh, you must be kidding

Oh, you must be kidding me! Would they change the address of the digital library http://digitallibrary.amnh.org/ to http://digitallibrary.american-mnh.org/ ? The latter address does redirect to the former (current) one.

Reply

I know - I cannot quite believe it! I guess I should write to Andy and check this out.
