
Update Namespace and Identifier Handling

Abstract

The purpose of this BEP is to improve references in BEL terms (<namespace>:<name>) by adopting the OBO syntax of references (<namespace>:<identifier>!<name>).

For example, p(HGNC:MAPT) could optionally be written in the OBO syntax as p(HGNC:6893!MAPT), which includes both the identifier (6893) and the label (MAPT) of the human Tau protein.

This BEP also defines more precisely which characters are allowed in namespaces.

Preamble

BEP-Id: BEP-0008

Status: Published

Version: 1

BEL-Version: 2.0.0+

Authors: Charles Tapley Hoyt

Created-Date: 2019-07-25

Type: Standards Track

Rationale and Goals

Grounding the entities referenced in BEL documents is non-trivial because BEL namespaces more commonly list the labels of named entities than their identifiers. This improves the readability of BEL documents, but it makes grounding difficult: labels change frequently, and there is no standard way of maintaining these namespace resources. Including OBO-style identifiers would let users write BEL documents that are easier to validate while preserving the readability and understandability of the text. The lexical correctness of entities referenced with OBO-style identifiers could be checked using services like Identifiers.org. Other standardized BEL resources, yet to be defined, could then optionally be used to check whether the label used is correct and up to date. Alternatively, BEL systems could discard the labels specified by users entirely and look them up themselves. In some cases, this would relieve some of the burden of grounding entities and maintaining BEL namespaces, while leaving the legacy system in place as we consider how to upgrade old documents. Tools like the Ontology Lookup Service and Gilda will continue to support users in finding both the identifier and the name of each entity, so this change will likely not increase the effort needed for curation.

Use Cases

  1. Specifying an abundance, gene, RNA, miRNA, protein, or named complex:
  2. Specifying a physical location:

Note that GO identifiers carry a redundant GO: prefix, so a full reference looks like GO:GO:0005737!cytoplasm. This is acceptable.

  3. Specifying a protein modification
  4. Specifying the locations of a translocation

Named references can also appear in other places, such as inside other types of modifications and inside fusions. These references would be handled consistently in all cases.
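
Because the new syntax is allowed wherever a reference appears, upgrading a legacy statement amounts to rewriting each <namespace>:<name> it contains. The Python sketch below illustrates this under stated assumptions. The NAME_TO_ID table and the upgrade_statement function are hypothetical, and the table stands in for a lookup against each namespace's source database. The sketch also handles only unquoted names made of letters, digits and underscores.

```python
import re

# Hypothetical lookup table: (namespace, legacy name) -> identifier.
# A real tool would query each namespace's source database instead.
NAME_TO_ID = {
    ("HGNC", "MAPT"): "6893",
    ("GO", "cytoplasm"): "GO:0005737",  # GO identifiers keep their own "GO:" prefix
    ("GO", "nucleus"): "GO:0005634",
}

# An unquoted legacy reference <namespace>:<name>. The lookarounds stop the
# pattern from matching pieces of references that already use <identifier>!<name>.
LEGACY_REF = re.compile(r"(?<![\w.\-:!])([\w.\-]+):(\w+)(?![\w.\-:!])")


def upgrade_statement(statement: str) -> str:
    """Rewrite known <namespace>:<name> references as <namespace>:<identifier>!<name>."""

    def replace(match):
        namespace, name = match.group(1), match.group(2)
        identifier = NAME_TO_ID.get((namespace, name))
        if identifier is None:
            return match.group(0)  # unknown name: keep the legacy reference
        return f"{namespace}:{identifier}!{name}"

    return LEGACY_REF.sub(replace, statement)


print(upgrade_statement("tloc(p(HGNC:MAPT), fromLoc(GO:cytoplasm), toLoc(GO:nucleus))"))
# tloc(p(HGNC:6893!MAPT), fromLoc(GO:GO:0005737!cytoplasm), toLoc(GO:GO:0005634!nucleus))
```

Names missing from the table are left untouched, so a partially upgraded document ends up mixing both styles. Running the function a second time on its own output changes nothing.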

Discussion

Specification

Anywhere a reference was possible using the <namespace>:<name> syntax, it should also be possible to use the <namespace>:<identifier>!<name> syntax. In the legacy syntax, names containing characters other than alphanumerics and underscores had to be quoted. Identifiers are generally better formed and will rarely need to be quoted, while names can still be quoted. As in OBO, whitespace is optionally allowed between the <identifier> and the !, as well as between the ! and the <name>.

An entity ID is composed of a namespace (prefix), a database/terminology identifier (accession), and an optional label, in the form <namespace>:<accession>!<label>, where the !<label> part may be omitted.
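
These rules can be read as a small grammar for a single reference. The following Python sketch parses one reference into its parts. The Reference type and the parse_reference function are hypothetical. The sketch assumes quoted values never contain an escaped double quote, since escaping is not defined here, and it borrows the unquoted-character rules from the Accession IDs and Optional Label subsections.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    namespace: str
    identifier: str  # for a legacy <namespace>:<name> reference this holds the name
    name: Optional[str]


_QUOTED = r'"[^"]*"'  # assumption: no escaped quotes inside quoted values
_NAMESPACE = r"[\w.\-]+"
_IDENTIFIER = rf'{_QUOTED}|[^\s,)!"]+'  # must be quoted if it has whitespace , ) or !
_NAME = rf'{_QUOTED}|[^,)"]+?'  # must be quoted if it has , or )

REFERENCE_RE = re.compile(
    rf"(?P<namespace>{_NAMESPACE})"
    rf":(?P<identifier>{_IDENTIFIER})"
    rf"(?:\s*!\s*(?P<name>{_NAME}))?"  # optional whitespace around "!"
)


def _unquote(value: str) -> str:
    """Strip one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_reference(text: str) -> Reference:
    """Split '<namespace>:<identifier>!<name>' (name optional) into its parts."""
    match = REFERENCE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid reference: {text!r}")
    name = match.group("name")
    return Reference(
        namespace=match.group("namespace"),
        identifier=_unquote(match.group("identifier")),
        name=None if name is None else _unquote(name),
    )


print(parse_reference("HGNC:6893 ! MAPT"))
# Reference(namespace='HGNC', identifier='6893', name='MAPT')
```

A legacy reference such as HGNC:MAPT also parses, but with no name. The parser cannot tell on its own whether the text after the colon is an identifier or a legacy name.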

Namespace specification

Namespaces may contain upper- or lowercase alphanumeric characters, periods (.), dashes (-), and underscores (_), matching the regex [\w\.\-]+. This is a superset of what was allowed in BEL 1 through 2.1 (uppercase alphanumerics only).

This rule was updated for compatibility with identifiers.org. Its prefix registration form (https://registry.identifiers.org/prefixregistrationrequest) gives the following instruction for namespaces: "Character string meant to precede the colon in resolved identifiers. No spaces or punctuation, only lowercase alphanumerical characters, underscores and dots. Example: ensembl.plant or ec-code or chebi". Note that, contrary to this instruction, several identifiers.org prefixes use dashes.
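
A namespace check that follows this rule takes only a few lines. The Python sketch below compares it with the older uppercase-only rule, using the example prefixes quoted from identifiers.org. The function names are hypothetical. Restricting \w to ASCII with re.ASCII is an assumption, because the rule does not say whether non-ASCII letters count as alphanumeric.

```python
import re

# The namespace rule above. re.ASCII limits \w to [A-Za-z0-9_]; treating
# "alphanumeric" as ASCII-only is an assumption of this sketch.
NAMESPACE_RE = re.compile(r"[\w.\-]+", re.ASCII)
# The older rule: uppercase alphanumerics only.
LEGACY_NAMESPACE_RE = re.compile(r"[A-Z0-9]+")


def is_valid_namespace(prefix: str) -> bool:
    return NAMESPACE_RE.fullmatch(prefix) is not None


def is_valid_legacy_namespace(prefix: str) -> bool:
    return LEGACY_NAMESPACE_RE.fullmatch(prefix) is not None


for prefix in ["HGNC", "chebi", "ensembl.plant", "ec-code", "ec code"]:
    print(prefix, is_valid_legacy_namespace(prefix), is_valid_namespace(prefix))
# HGNC True True
# chebi False True
# ensembl.plant False True
# ec-code False True
# ec code False False
```

Every prefix accepted by the legacy pattern is also accepted by the new one, which is what makes the new rule a superset.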

Accession IDs

Unique identifiers from the source database or terminology service are recommended. The accession ID must be surrounded by double quotes if it contains whitespace, a comma, an exclamation mark, or a closing parenthesis, i.e. if it matches the regex [\s\,\)\!].
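
The quoting rule translates directly into a check on the accession string. In the Python sketch below, format_accession is a hypothetical helper. Rejecting values that contain a double quote is an assumption, because this section does not say how such a character would be escaped.

```python
import re

# Characters that force an accession ID to be quoted: [\s\,\)\!]
ACCESSION_NEEDS_QUOTES = re.compile(r"[\s,)!]")


def format_accession(accession: str) -> str:
    """Return the accession as written after '<namespace>:', quoting it only if required."""
    if '"' in accession:
        # Assumption: no escape for embedded double quotes is defined, so this
        # sketch rejects such values rather than inventing an escaping scheme.
        raise ValueError(f"cannot represent accession {accession!r}")
    if ACCESSION_NEEDS_QUOTES.search(accession):
        return f'"{accession}"'
    return accession


print(format_accession("6893"))        # 6893
print(format_accession("GO:0005737"))  # GO:0005737 (a colon does not require quotes)
print(format_accession("A 1"))         # "A 1" (hypothetical accession containing a space)
```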

Optional Label

The label must be surrounded by double quotes if it contains a comma or a closing parenthesis, i.e. if it matches the regex [\,\)].
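
Combining the accession and label rules gives a serializer for a complete reference. The Python sketch below is a hypothetical helper, not part of any existing BEL library. It quotes each part only when required, and it rejects values containing a double quote, since no escape is defined here.

```python
import re
from typing import Optional

ACCESSION_NEEDS_QUOTES = re.compile(r"[\s,)!]")  # rule from "Accession IDs"
LABEL_NEEDS_QUOTES = re.compile(r"[,)]")  # rule from "Optional Label"


def _quote(value: str, needs_quotes: re.Pattern) -> str:
    """Wrap value in double quotes if it contains a character the pattern flags."""
    if '"' in value:
        # Assumption: no escape for embedded double quotes is defined.
        raise ValueError(f"cannot represent {value!r}")
    return f'"{value}"' if needs_quotes.search(value) else value


def format_reference(namespace: str, accession: str, label: Optional[str] = None) -> str:
    """Build <namespace>:<accession>!<label>, omitting !<label> when no label is given."""
    reference = f"{namespace}:{_quote(accession, ACCESSION_NEEDS_QUOTES)}"
    if label is not None:
        reference += "!" + _quote(label, LABEL_NEEDS_QUOTES)
    return reference


print(format_reference("HGNC", "6893", "MAPT"))           # HGNC:6893!MAPT
print(format_reference("GO", "GO:0005737", "cytoplasm"))  # GO:GO:0005737!cytoplasm
print(format_reference("CHEBI", "29108", "calcium(2+)"))  # CHEBI:29108!"calcium(2+)"
print(format_reference("HGNC", "6893"))                   # HGNC:6893
```

The two rules differ. A label containing a space or an exclamation mark can be written unquoted, whereas an accession containing either must be quoted.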

Backwards Compatibility

This BEP does not break backwards compatibility. However, it raises the concern that mixing BEL-style and OBO-style references could make BEL documents trickier to handle and less consistent.

Reference Implementation