The purpose of this BEP is to improve references in BEL terms (<namespace>:<name>) by adopting the OBO syntax of references (<namespace>:<identifier>!<name>).
For example, p(HGNC:MAPT) could optionally be written using the OBO syntax as p(HGNC:6893!MAPT), which includes both the identifier (6893) and the label (MAPT) of the human Tau protein.
This BEP also defines more precisely which characters are allowed in namespaces.
BEP-Id: BEP-0008
Status: Published
Version: 1
BEL-Version: 2.0.0+
Authors: Charles Tapley Hoyt
Created-Date: 2019-07-25
Type: Standards Track
Grounding entities referenced in BEL documents is non-trivial because it is more common to use BEL namespaces that list the labels for named entities rather than their identifiers. While this improves the readability of BEL documents, it makes entities difficult to ground because labels change often and there is no standard way of maintaining these resources. Including OBO-style identifiers would allow users to write BEL documents that are easier to validate while preserving the readability and understandability of the text. Entities referenced with OBO-style identifiers could be checked for lexical correctness using services like Identifiers.org; other (yet to be defined) standardized BEL resources could then optionally be used to check that the label is correct and up to date. Alternatively, BEL systems could discard the labels specified by users entirely and look them up themselves. In some cases, this would alleviate some of the burden of grounding entities and maintaining BEL namespaces, while leaving the legacy system in place while we consider how to upgrade old documents. Tools like the Ontology Lookup Service and Glida will continue to support users in finding both the identifier and name for each entity, so this will likely not increase the effort needed for curation.
p(HGNC:MAPT)
p(HGNC:6893!MAPT)
p(HGNC:6893 ! MAPT)
p(HGNC:MAPT, loc(GO:cytosol))
p(HGNC:MAPT, loc(GO:GO:0005829!cytosol))
p(HGNC:MAPT, loc(GO:GO:0005829 ! cytosol))
Note that GO identifiers have a redundant GO: prefix. This is okay.
p(HGNC:MAPT, pmod(GO:"protein phosphorylation"))
p(HGNC:MAPT, pmod(GO:GO:0006468!"protein phosphorylation"))
p(HGNC:MAPT, pmod(GO:GO:0006468 ! "protein phosphorylation"))
p(FPLX:RAS, pmod(F)) directlyIncreases tloc(p(FPLX:RAS), fromLoc(MESH:"Intracellular Space"), toLoc(MESH:"Cell Membrane"))
p(FPLX:RAS, pmod(F)) directlyIncreases tloc(p(FPLX:RAS), fromLoc(MESH:D042541!"Intracellular Space"), toLoc(MESH:D002462!"Cell Membrane"))
p(FPLX:RAS, pmod(F)) directlyIncreases tloc(p(FPLX:RAS), fromLoc(MESH:D042541 ! "Intracellular Space"), toLoc(MESH:D002462 ! "Cell Membrane"))
There are other places where named identifiers can show up, such as inside other types of modifications and inside fusions. These references would be handled consistently across all instances.
Should the DEFINE statement at the top of the BEL document be updated to allow users to specify which type of references they want to use for each namespace?
Anywhere that an identifier was possible using the <namespace>:<name> syntax, it should be possible to use the <namespace>:<identifier>!<name> syntax.
Names containing characters that were neither alphanumeric nor an underscore were quoted.
In general, identifiers are well-formed enough that they will not need to be quoted, while names can still be quoted.
As in OBO, whitespace may optionally be left between the <identifier> and ! as well as between the ! and <name>.
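As an illustration of the syntax described above, the following Python sketch parses both the BEL-style and the OBO-style form of a reference, including the optional whitespace around the exclamation mark. The names Reference and parse_reference and the pattern itself are hypothetical and written only for this example; it is not the normative grammar and covers a single reference, not a full BEL term.

```python
import re
from typing import NamedTuple, Optional

# Illustrative pattern only (an assumption, not the normative grammar).
# A part is either a double-quoted string with backslash escapes or a run of
# characters that excludes whitespace, commas, ")", "!" and double quotes.
_PART = r'"(?:[^"\\]|\\.)*"|[^\s,)!"]+'
REFERENCE = re.compile(
    rf'(?P<namespace>[\w.\-]+):(?P<first>{_PART})(?:\s*!\s*(?P<second>{_PART}))?'
)


class Reference(NamedTuple):
    namespace: str
    identifier: Optional[str]  # None for BEL-style <namespace>:<name>
    name: Optional[str]


def _unquote(text: str) -> str:
    """Strip surrounding double quotes and unescape embedded ones."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1].replace('\\"', '"')
    return text


def parse_reference(text: str) -> Reference:
    """Parse <namespace>:<name> or <namespace>:<identifier>!<name>."""
    match = REFERENCE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid reference: {text!r}")
    first = _unquote(match["first"])
    second = match["second"]
    if second is None:
        # No "!" present, so the single part is treated as a name.
        return Reference(match["namespace"], None, first)
    return Reference(match["namespace"], first, _unquote(second))
```

With this sketch, HGNC:6893!MAPT and HGNC:6893 ! MAPT parse to the same Reference, while HGNC:MAPT yields a Reference whose identifier is None.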
Entity IDs are composed of namespaces (prefixes), database/terminology identifiers and optional labels, e.g. <namespace>:<accession>!<label[Optional]>
[\w\.\-]+:[^\s\,\)\!]+\s*!?\s*[\,\)]
[\w\.\-]+:".+.*?"\s*!?\s*".*?"
(any double quotes in the quoted strings for the accession or label must be escaped)
Namespaces may contain upper- or lower-case alphanumeric characters, periods, dashes and underscores, e.g. regex: [\w\.\-]+
This is a superset of what was allowed in BEL 1-2.1 (uppercase alphanumerics only).
This has been updated to be compatible with identifiers.org. The prefix registration request page (https://registry.identifiers.org/prefixregistrationrequest) gives the following instruction on namespaces: "Character string meant to precede the colon in resolved identifiers. No spaces or punctuation, only lowercase alphanumerical characters, underscores and dots. Example: ensembl.plant or ec-code or chebi". NOTE: contrary to this instruction, several identifiers.org prefixes use dashes.
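The namespace rule can be checked with the regular expression given above. The following is a minimal sketch in Python; the function name is_valid_namespace is hypothetical, and the use of re.ASCII is an assumption made here so that \w is limited to ASCII letters, digits and underscores, which the text does not state explicitly.

```python
import re

# Namespace pattern from the text. re.ASCII is an assumption of this sketch:
# without it, Python's \w would also accept non-ASCII word characters.
NAMESPACE = re.compile(r'[\w.\-]+', re.ASCII)


def is_valid_namespace(prefix: str) -> bool:
    """Return True if the whole prefix consists of allowed characters."""
    return NAMESPACE.fullmatch(prefix) is not None
```

Under this sketch, HGNC, ensembl.plant and ec-code are accepted, while an empty string or a prefix containing a space or a colon is rejected.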
Unique identifiers from the source database/terminology service are recommended. The accession ID must be surrounded by double quotes if it contains spaces, a comma, an exclamation mark or a closing parenthesis, regex: [\s\,\)\!]
The label must be surrounded by double-quotes if it contains a comma or a closing parenthesis, regex: [\,\)]
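The two quoting rules above can be sketched as a small serializer. This Python example is illustrative only: format_reference and its helper are hypothetical names, and the sketch applies just the minimum quoting stated in the rules, so a label containing only spaces is left unquoted even though the examples in this document quote such labels.

```python
import re
from typing import Optional

# Character classes taken from the quoting rules in the text.
ACCESSION_NEEDS_QUOTES = re.compile(r'[\s,)!]')
LABEL_NEEDS_QUOTES = re.compile(r'[,)]')


def _quote(text: str) -> str:
    """Wrap text in double quotes, escaping any embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def format_reference(namespace: str, accession: str,
                     label: Optional[str] = None) -> str:
    """Serialize <namespace>:<accession>!<label>, quoting only when required."""
    if ACCESSION_NEEDS_QUOTES.search(accession):
        accession = _quote(accession)
    if label is None:
        return f"{namespace}:{accession}"
    if LABEL_NEEDS_QUOTES.search(label):
        label = _quote(label)
    return f"{namespace}:{accession}!{label}"
```

An implementation that prefers the more conservative style used in the examples could quote every label that contains anything other than alphanumerics and underscores; that choice is left open here.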
The addition of this BEP does not break backwards compatibility, but it does raise the concern that using a combination of BEL-style and OBO-style references might make handling BEL documents trickier and less consistent.