This manual is designed to enable an annotator (curator) to follow the sequence of steps involved in checking and amending entries in ChEBI. All operations are carried out using the ChEBI annotator tool.
Inputting of User Name and Password into the login page takes the user to the 'Welcome' page.
Located on the left-hand side of the page.
Enables the annotator to search the complete content of the database. Annotators may enter names or partial names, ChEBI IDs, other IDs (e.g. KEGG, Beilstein), synonyms, InChI, SMILES, etc. Searches are case insensitive unless the 'Case Sensitive?' box is checked. The wildcard character is %.
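The wildcard behaviour described above resembles the SQL LIKE operator. The following is a minimal sketch of that matching logic, not the annotator tool's actual implementation. It assumes that a pattern must match the whole term and that % stands for any run of characters (including none); the function names are hypothetical.

```python
import re


def like_to_regex(pattern: str, case_sensitive: bool = False) -> re.Pattern:
    """Translate a pattern using % as wildcard into a compiled regex."""
    # Escape every literal chunk, then join the chunks with "match anything".
    parts = [re.escape(chunk) for chunk in pattern.split("%")]
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.compile(".*".join(parts), flags)


def search(terms, pattern, case_sensitive=False):
    """Return the terms that match the whole pattern."""
    rx = like_to_regex(pattern, case_sensitive)
    return [term for term in terms if rx.fullmatch(term)]
```

For example, `search(["adenosine", "Adenine", "water"], "aden%")` keeps the first two terms, whereas the same call with `case_sensitive=True` keeps only "adenosine".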
Allows the annotator to perform searches of unchecked entries only or to base the search criteria on the classification within the ontology. For the unchecked-entry search, a drop-down menu allows the source of the entry (Chemical Ontology, IntEnz, or KEGG COMPOUND) to be specified; the search is case-insensitive unless the 'Case Sensitive' box is checked. For the Ontology Classification search, the annotator is able to base the search on compound status (CHECKED or OK), classification (CLASSIFIED or UNCLASSIFIED) and relation status (CHECKED or OK).
Allows two entities to be merged by inputting the IDs. The Merge Compounds screen requires the annotator to select which of the original ChEBI names, definition, default structures and ontology trees are to be retained. Failure to check any of the options will result in an error notice. Full details of one or both entries may be displayed by clicking on the relevant 'show compound' links. The merge procedure can be cancelled at any time by use of the Cancel Changes tab. Care must be taken when merging entries as there may be far-reaching consequences, especially if one or more of the entries is already publicly visible. The annotator must make absolutely certain by checking all data sources that the two entries being merged relate to one and the same entity. If in doubt, or if for example there are unproven stereochemical differences or ambiguities between two entries, the annotator should not perform a merge but relate the entries to one another through the ontology.
Takes the annotator to the Demerge Compounds screen, allowing selection of which children are to be demerged from the parent. As for 'Merge', the demerge procedure can be cancelled by use of the Cancel Changes tab. For the same reasons as given above for merging, care must also be taken when demerging entries. Examples of when demerging is justified are (1) when differences exist between names and synonyms for entries that have been subject to a previous automatic merging and (2) when it is desirable to distinguish between acids and their conjugate bases. Caution: all demerged entries will inherit the ontology structure of the parent entry. Therefore the annotator will need to modify or delete relationships which are no longer valid.
Allows addition of a new entry to the database. Specifying the ChEBI name (see below) and submitting will allow the system to generate a new ChEBI ID on a Compound Result screen, with the new entry having an initial status OK. Before adding a new compound the annotator should always conduct a thorough search in the database for names, registry numbers and database links. If a compound is then found already to exist, the annotator should use this rather than creating a new one.
Allows the annotator to view logs relating to automated procedures (e.g. KEGG updates, incorporation of new sources, automated merges and demerges).
Leads to the 'Special Characters List', a list of xml tags used to enhance the chemical mark-up.
Speaks for itself.
Allows the side menu to be hidden, maximising the view of the main screen, a particularly useful feature when using a browser such as Mozilla Firefox which does not allow line wraps. The side menu may be reinstated via the <<show link.
This is the main screen on which the results for a ChEBI entity are displayed. It offers the annotator six tabs: View, View SC, Edit Compound, Edit Ontology, Edit Structure, Edit Comment.
Displays the following features:
ChEBI Name – The name recommended by the annotator for use within the biological community.
ChEBI ID – The stable ID assigned by the system. IDs are assigned in sequence and their absolute values have no inherent meaning.
ChEBI ASCII Name – The ChEBI name with any special characters rendered in ASCII format.
Definition – a definition of the entry, especially relevant for classes of entities, less so for instances.
The view of an entity selected by the annotator as being of prime importance and which will be the main structure displayed on the public web interface. If there is more than one graphical structure, these may be viewed via a 'more>>' link. The status of the structures (OK, CHECKED, DELETED, OBSOLETE) is indicated by a colour-coded frame. Also displayed are the SMILES string and the InChI, both derived from the MDL molfile corresponding to the default structure.
The status of the entry is indicated as OK, CHECKED, DELETED or OBSOLETE. Also supplied are the type of merger (automatic or annotator), who created the entry and when, and who last modified it and when.
Assigned (manually by the annotator) whenever possible; generally the molecular formula. The use of subscripts is avoided. The source is stated, together with the status and an indication of which child from a merged entry was the source or whether this was from a parent.
Mass and charge, calculated automatically from the default structure, will appear here and should be checked.
Shows the relationships relevant to the entity being viewed. Clicking on 'Tree View' opens up a visual depiction of the tree with the different types of relationship being indicated by different symbols. Entries and relationships with status OK (i.e. unchecked) are shown in light grey, while those with status CHECKED are in dark grey. The line for the entity currently being viewed is shown in bold type. Clicking on 'Parents and Children View only' returns the annotator to the default (textual) display of relationships.
Shows one or more names for an entity based on current IUPAC recommendations.
Shows all synonyms and their sources. Only those with status CHECKED will be viewable to the public.
Provides accession numbers and (if available) links to source databases. Only those with status CHECKED will be viewable to the public.
Lists CAS, Beilstein and Gmelin Registry Nos., where these are available.
Shows comments added by an annotator, relating either to a single data entry or to a complete entity.
Displays the standard view but with the addition of xml tags around special characters.
The main screen used by the annotators for editing the main text of an entry.
The screen used for editing the ontology.
The screen used for editing the various structural representations of an entity.
Allows the annotator to add and edit a comment, either to a single data entry or to an entity as a whole.
This is the screen upon which an annotator can edit all details of a ChEBI entry except for its structural data, comments and relationships within the ontology.
The recommended name may be changed by the annotator to bring it into line with current usage within the biological community. Although there is a limit on the number of characters in a ChEBI Name, this is enormous (around 4000) and it is highly desirable that such names are kept short – abbreviations (e.g. ATP, NAD) are acceptable. A good maximum number of characters to work to is 50. Special characters are encoded using the xml tags listed in the Help file. Care must be taken to use the correct tags with characters that can be used in more than one context, e.g. to distinguish between <stereo>alpha</stereo> and <locant>alpha</locant>. To aid in selection of the correct tags a Special Character tool has been incorporated, accessible via a link next to the ChEBI Name field; similar links to this tool are found next to all those fields on the Edit Compound screen into which free text can be input.
Unless it is an abbreviation (e.g. ATP, NADPH), a ChEBI Name should start with a lower case letter, not a capital (unless this is a special character relating to stereochemistry or denoting an element).
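The naming conventions above lend themselves to a quick pre-check before a name is submitted. The sketch below is only an annotator-side aid under stated assumptions, not the curator tool's own validation: the abbreviation list is a hypothetical placeholder, the name is assumed to be passed in without its xml tags, and a leading capital is merely flagged for manual review because stereodescriptors and element symbols are legitimate exceptions.

```python
RECOMMENDED_MAX_LENGTH = 50  # working limit suggested in the text, not a hard limit

# Hypothetical allow-list; a real one would be maintained by the annotators.
KNOWN_ABBREVIATIONS = frozenset({"ATP", "NAD", "NADPH"})


def name_warnings(name: str) -> list[str]:
    """Return human-readable warnings for a candidate ChEBI Name."""
    warnings = []
    if len(name) > RECOMMENDED_MAX_LENGTH:
        warnings.append(
            f"name has {len(name)} characters; aim for "
            f"{RECOMMENDED_MAX_LENGTH} or fewer"
        )
    if name[:1].isupper() and name not in KNOWN_ABBREVIATIONS:
        warnings.append(
            "name starts with a capital letter; confirm it is an "
            "abbreviation, a stereodescriptor or an element symbol"
        )
    return warnings
```

A name such as "porphyrin" yields no warnings, while "Porphyrin" is flagged for review.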
Changing the ChEBI Name has consequences for other databases and resources which use the ChEBI Name as a reference and hence great care must be taken when making changes to ChEBI Names.
NB. In the case of IntEnz, the ChEBI Name may be used within the Reactions field if no IntEnz Name exists. However, if an IntEnz Name exists, then changing the ChEBI Name will have no effect on IntEnz.
A singular name should always be used unless the entity is a class and a singular entity already exists within the database; in this case a plural can be used. For example: porphyrin (CHEBI:8337) is an entity, porphyrins (CHEBI:26214) is a class.
The curator tool will produce validation errors if:
A definition may be added. This is especially relevant to classes of compounds which appear at the higher levels of the ontology. Good sources of definitions for the Molecular Structure Ontology are the IUPAC Gold Book (http://goldbook.iupac.org/) and the various IUPAC documents on nomenclature and terminology (see http://www.chem.qmul.ac.uk/iupac/), while for the Biological Function and Application Ontologies, (modified) definitions of MeSH terms can be adopted. No sources of definitions need to be cited.
The curator tool will produce validation errors if:
The annotator should change the status of an entry to CHECKED only when all details, including those relating to structure and the ontology, have been edited to the annotator's satisfaction. An entry which has status CHECKED will be viewable on the public web interface and included in the downloadable files at the next release.
Any formula derived from a primary source should be checked and, if correct, its status changed to CHECKED. If a different formula is to be added, the status of the incorrect formula should be changed to DELETED and the new formula added with status CHECKED. Subscripts and hyphens should not be used. The order of atomic elements within molecular formulae should follow the Hill system (http://en.wikipedia.org/wiki/Hill_system). The source must be specified using the dropdown menu; if the formula is the annotator's own, the source should be given as 'ChEBI'. If an entry cannot be assigned a formula (typically in the case of a class of compounds), then a dot '.' should be entered into the formula field and its status kept as 'OK'.
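The Hill system mentioned above can be illustrated with a short sketch. In the Hill system, if carbon is present it is listed first, followed by hydrogen, then all other elements alphabetically by symbol; if carbon is absent, all elements (including hydrogen) are listed alphabetically. The function name and the dictionary input are assumptions made for illustration, and charges are ignored.

```python
def hill_formula(counts: dict[str, int]) -> str:
    """Build a plain-text formula in Hill order from element counts."""
    symbols = sorted(counts)  # alphabetical by element symbol
    if "C" in counts:
        # Carbon first, then hydrogen (if any), then the rest alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        ordered = symbols
    # Counts are written as plain digits (no subscripts); a count of 1 is omitted.
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in ordered)
```

For instance, `hill_formula({"O": 6, "H": 12, "C": 6})` gives "C6H12O6", and sulfuric acid, which contains no carbon, comes out as "H2O4S".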
The following conventions regarding ChEBI formulae should be followed:
The curator tool will produce validation errors if:
Synonyms derived from primary sources will be displayed along with details of the source and status. The annotator should check the status of each synonym and amend if necessary. NB. Annotators must take extra care when contemplating deletion of any synonym derived from IntEnz that has type NAME, as this will also in effect cause a similar deletion within IntEnz.
Any new synonym which the annotator considers relevant should be added along with its source. Cross-reference to the source should be via links in the Database Accession or Registry Numbers sections (see below).
A synonym taken from an external source should not normally be altered when being entered into ChEBI. However, if there is a real need to make alterations (e.g. in order to rearrange an index style of presentation, or to correct errors in the nesting of brackets), then the 'Adapted' checkbox next to the synonym should be ticked.
IUPAC names should also be added here. An 'IUPAC name' is a name based on current recommendations of IUPAC. It need not be fully systematic as it can make use of 'retained' and 'preselected' names. Some relevant sources are:
A Guide to IUPAC Nomenclature of Organic Compounds, Recommendations 1993
Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, 1979 Edition ('The Blue Book') – largely superseded but still useful for class names and older trivial names.
Compendium of Biochemical Nomenclature, 1993 Edition ('The White Book') – however many sections have been superseded.
Nomenclature of Inorganic Chemistry (recommendations 1990) ('The Red Book')
Nomenclature of Inorganic Chemistry II. Recommendations 2000
Nomenclature of Inorganic Chemistry - IUPAC Recommendations 2005 ('The Revised Red Book'; largely supersedes the 1990 and 2000 editions)
IUPAC Compendium of Chemical Terminology ('The Gold Book'), 1987. A revised version in electronic form is available at http://goldbook.iupac.org/index.html.
Compendium of Macromolecular Nomenclature, 1991 ('The Purple Book')
Further details of these and other IUPAC nomenclature documents are available at http://www.iupac.org/publications/books/seriestitles/nomenclature.html and http://www.chem.qmul.ac.uk/iupac/.
Annotators should bear in mind the following points when entering IUPAC Names:
The curator tool will produce validation errors if:
All database accessions listed should be checked and amended if necessary. Status must be 'CHECKED' for lines to be viewable on the public web interface. New entries are added using the 'Add Database Accessions' facility (see below).
The curator tool will produce validation errors if:
All numbers listed should be checked and amended if necessary. Status must be 'CHECKED' for lines to be viewable on the public web interface. New entries are added using the 'Add Database Accessions' facility (see below). Beilstein and Gmelin Registry Numbers can be added if known (but note that these numbers constitute the only data that ChEBI can include from these two sources, owing to the databases not being freely accessible).
The curator tool will produce validation errors if:
Used by the annotator for the entering of new database accessions or registry numbers. The type and source must be selected from the dropdown menus.
Changes may be incorporated by clicking on 'Submit Changes'. Erroneous changes may be cancelled at any time up to submission by clicking on 'Cancel Changes'.
Using this screen, an annotator can both edit existing relationships between entities and create new ones.
In the three sub-ontologies "Biological Role", "Application" and "Subatomic Particle" a singular ChEBI Name should always be used. A plural ChEBI Name is allowed within the "Molecular Structure" sub-ontology if the entity is a class and the singular ChEBI Name already exists.
This view lists all the parent and child relationships directly pertaining to an entity and their status [CHECKED, OK, DELETED or OBSOLETE (the OBSOLETE status can be created only by the system)]. Only relationships with status CHECKED and OK will be included in the tree structure and be visible on the public web interface. The annotator must check each existing relationship and amend if necessary.
When editing an existing entry, the annotator needs to check all its non-OBSOLETE relationships and leave these with status CHECKED or DELETED. No relationships may be deleted which would cause an entity to be separated from the tree: it is necessary to create a new relationship prior to deleting the last unwanted one.
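The rule that an entity must not be cut off from the tree can be sketched as a simple guard. This is an illustrative check only, not the tool's implementation. It assumes that "separated from the tree" means no relationship with status CHECKED or OK would remain for the entity; the class and function names are hypothetical.

```python
from dataclasses import dataclass

ENABLED_STATUSES = {"CHECKED", "OK"}  # statuses that keep a relationship in the tree


@dataclass
class Relationship:
    relation: str   # e.g. "is a"
    parent_id: str  # identifier of the related parent entity
    status: str     # CHECKED, OK, DELETED or OBSOLETE


def can_delete(relationships: list[Relationship], target: Relationship) -> bool:
    """True if deleting `target` still leaves at least one enabled relationship."""
    remaining = [
        r for r in relationships
        if r is not target and r.status in ENABLED_STATUSES
    ]
    return len(remaining) > 0
```

If `can_delete` returns False, the annotator would first create the replacement relationship and only then delete the unwanted one.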
Hint: When creating and editing relationships, it is useful to open the annotator tool in two or more tabs or separate windows to facilitate rapid copying and pasting of ChEBI IDs.
Displays in graphical form the tree structure. All direct lines upwards are shown together with downward lines only as far as immediate children. Checked entries are shown in a darker grey with the line for the entity currently being viewed being in bold type. Annotators may navigate around the ontology in the tree view by clicking on any displayed line. A table of relationships and their shorthand symbols is displayed at the right-hand side of the tree view. Brief descriptions of the sub-ontologies and relationships in the ChEBI Ontology are provided in Sections 5.4 and 5.5 respectively, with fuller descriptions and examples being included in the ChEBI User Manual, accessible via the public web interface.
Allows the annotator to add a new relationship. Dropdown menus are provided for selecting the type of relationship, while the ID of the entity to which the new relationship refers is entered into the relevant box.
Changes may be incorporated by clicking on 'Submit Changes'. Erroneous changes may be cancelled at any time up to submission by clicking on 'Cancel Changes'.
The tool has general validations which apply to most relationship types. In general when the term "enabled" is used to describe a relationship it means that its relationship status is either "CHECKED" or "OK".
The curator tool will produce validation errors if:
Validations when creating new ontology relationships:
ChEBI Ontology is subdivided into four separate sub-ontologies:
Classifies molecular entities according to structure.
Classifies entities on the basis of role, e.g. as antibiotics, antiviral agents, coenzymes, enzyme inhibitors.
Classifies entities, where appropriate, on the basis of their applications, e.g. as pesticides, detergents, healthcare products, fuel.
Classifies particles which are smaller than atoms.
Relationships can be created between an entity and either a parent or a child. To create a new relationship between two entries, open the Edit Ontology feature for one of them and enter the ChEBI ID for the other in the appropriate box, selecting the type of relationship from the dropdown menu.
The relationships used in ChEBI are:
Is a: Used to imply that 'Entity A' is an instance of 'Entity B' or that 'Class A' is an instance of 'Class B'. This is the chief hierarchical non-cyclic relationship used throughout the ontologies.
Is part of: This relationship is used to denote the relationship between a part and the whole, especially between components of a salt or an addition compound; a substituent and the compound into which it is substituted; and in the higher levels of the ontology to describe its subdivision.
Is conjugate base of and Is conjugate acid of: Cyclic relationships which are used mainly between acids and their conjugate bases. When creating a new relationship, only one of these needs to be entered, as the system will create the reverse relationship. Note that although the IUPAC definition of conjugate acid/base refers to a difference in charge of 1 unit only, for ChEBI this is relaxed to include multiple charge differences. This is especially relevant to di- and poly-carboxylic acids [e.g. ChEBI uses the relationship "succinic acid is_conjugate_acid_of succinate(2−)"].
Is tautomer of: A cyclic relationship used to show the interrelationship between two tautomers, where the differences between the structures are significant enough to warrant their separate inclusion in ChEBI.
Is enantiomer of: A cyclic relationship used when two entities are enantiomers of each other. An entity may have this relationship with only one other entity.
Has functional parent: Used to denote the relationship between two molecular entities (or classes of entities), one of which possesses one or more characteristic groups from which the other can be derived by functional modification. This relationship is especially useful to demonstrate the relationships between a number of functionalised entities and a common less-functionalised parent.
Has parent hydride: Used to denote the relationship between an entity and its parent hydride (defined by IUPAC as "an unbranched acyclic or cyclic structure or an acyclic/cyclic structure having a semisystematic or trivial name to which only hydrogen atoms are attached").
Is substituent group from: Indicates the relationship between a substituent group (or atom) and its parent molecular entity, from which it is formed by loss of one or more protons or simple groups.
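The automatic creation of the reverse conjugate acid/base relationship, described in the list above, can be pictured with the following sketch. It is a hypothetical illustration of the idea rather than the system's code. Only the conjugate acid/base pair is treated as reciprocal here, because that is the only case for which the text states that the reverse relationship is generated.

```python
# Reciprocal relationship types; the identifiers follow the
# "is_conjugate_acid_of" style used in the example in the text.
INVERSE = {
    "is_conjugate_acid_of": "is_conjugate_base_of",
    "is_conjugate_base_of": "is_conjugate_acid_of",
}


def with_reverse(subject: str, relation: str, obj: str) -> list[tuple[str, str, str]]:
    """Return the entered relationship plus its reverse, if one is defined."""
    triples = [(subject, relation, obj)]
    if relation in INVERSE:
        triples.append((obj, INVERSE[relation], subject))
    return triples
```

Entering "succinic acid is_conjugate_acid_of succinate(2−)" would therefore also yield "succinate(2−) is_conjugate_base_of succinic acid", while a relationship such as "is a" is returned unchanged.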
Structures are input and edited using the MarvinSketch Applet. To open this, first open the Edit Structure screen and click in the box at the right-hand side. Structures for inputting may be drawn manually or copied and pasted from other applications, e.g. ACD/Name.
The applet allows 2D structures to be drawn, with stereochemistry at chiral centres being indicated by bold and dashed wedges, with the points of the wedges directed towards the stereocentre. In cases where stereochemistry at a centre is possible but not specified, a plain bond linking the stereocentre and the substituent is generally used (although in certain cases a wavy bond may be used to provide emphasis). Where stereochemistry across double bonds is not defined, this is indicated by use of a wavy bond to H or, if fully substituted, to one of the substituents.
Attention is drawn to the document 'Graphical Representation of Stereochemical Configuration (IUPAC Recommendations 2006)', published in Pure Appl. Chem. Vol. 78, No. 10, pp 1897-1970, 2006, which gives recommendations on preferred and acceptable ways of displaying 3D stereochemical information in 2D diagrams, along with examples for all types of stereochemical configuration.
A manipulatable 3D view (e.g. ball-and-stick or wireframe) may be generated from the 2D structure by use of the 3D viewer (go to View, Open 3D Viewer). Such structures may be added to the compound information via an extra MarvinSketch applet on the Edit Structure screen, but should not be used as the default structure.
If 3D coordinates are available as a 3D molfile, e.g. from a crystal-structure determination, these may also be added directly to the molfile box on the Edit Structure screen to create an extra graphical structure.
It is possible to generate simple whole-integer atom labels on a structural diagram using the MarvinSketch applet. Right-clicking on an atom and then selecting 'Map' will allow an atom label between 1 and 99 to be selected and added to that atom. However, such labelled structures must never be used as a default structure.
The MDL molfile for a structure is displayed in a window on the left-hand side of the screen. Information between this window and the graphic display is transferred by use of left and right radio buttons. Molfiles may be entered directly by copy-and-paste from other external databases, e.g. KEGG COMPOUND.
Every compound entry which has a structure should be assigned a default structure. The InChI and the SMILES will automatically be generated from this default structure if possible. Tautomer generation for the InChI has been switched on, so that the InChIs generated for different tautomers are distinguishable.
The annotator may find it useful to add one or more comments, either for public viewing or for internal use only. Such comments may be associated with a specific item of data or with the entry as a whole. The text is keyed into the Add Comment box and its association selected by checking the appropriate 'Select item' or 'General comment on compound' radio button. The comment is then incorporated by clicking on 'Submit Changes'. Erroneous comments may be cancelled at any time up to submission by clicking on 'Cancel Changes'.
Changes to existing comments may also be made via the Edit Comment screen.
The following are the minimal requirements for an entity or class to be checked.
Example: sulfanediyl group (CHEBI:29830).
It has:
Usually, the ChEBI name is formed as 'name + group'. This name is not necessarily 'IUPAC name + group'.
A structure for a group must contain at least one pseudoatom (attachment point) which is indicated with an asterisk, *. It is important that the annotator tick the 'Validation Off' box since otherwise ChEBI will not accept the structure.
The group should be attached to its parent molecule via the relationship that we use only for groups:
sulfanediyl group (CHEBI:29830) is substituent group from hydrogen sulfide (CHEBI:16136)
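The requirement that a group structure contain at least one attachment point can be checked directly on the molfile. The sketch below assumes a V2000 molfile in which each attachment point is stored as an atom whose symbol is an asterisk, as described above; other tools may encode pseudoatoms differently, and the function name is hypothetical.

```python
def count_attachment_points(molfile: str) -> int:
    """Count atoms whose symbol is '*' in a V2000 molfile."""
    lines = molfile.splitlines()
    # Line 4 is the counts line; its first three columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    atom_lines = lines[4:4 + n_atoms]
    # In the atom block the symbol occupies columns 32-34 (0-based slice 31:34).
    return sum(1 for line in atom_lines if line[31:34].strip() == "*")
```

A structure for a group such as the sulfanediyl group, drawn as a sulfur atom between two attachment points, would give a count of 2; a count of 0 indicates that the structure is not yet a valid group.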
ChEBI follows the MDL mol format specification for its molfiles and what follows is a summary of this file.
Below is a list of the types of file formats available from MDL.
In ChEBI we use the molfile format but as we will see later on it allows various properties from the other files.
In the table below is a list of properties allowed in the properties block of a connection table. The molfile format allows all properties except the [Reaction] properties.
Please refer to page 15 of the format specification for the exact list of properties. All the properties listed in the table under molfiles are allowed in molfiles, but restrictions apply to when they can be used. For example, the RGroup attachment point (APO) requires that an RGroup be present in the connection table.
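To see which properties a given molfile actually uses, the property block can be scanned for its three-letter codes. The following is a minimal sketch, assuming a V2000 molfile in which each such property line starts with "M" followed by two spaces and the code (for example CHG or APO), and the block ends with "M  END". The function name is hypothetical, and property lines of other kinds are ignored.

```python
def property_codes(molfile: str) -> list[str]:
    """List the codes of the 'M  XXX' property lines, in file order."""
    codes = []
    for line in molfile.splitlines():
        if line.startswith("M  "):
            code = line[3:6]
            if code == "END":  # end of the connection table
                break
            codes.append(code)
    return codes
```

An annotator could use the resulting list to spot codes, such as APO, that are subject to the restrictions noted above.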