Inputs and outputs: Third-party database building mode
Note
All input files are required to use tab characters as field delimiters.
Structure
Example of directory structure (but files and directories can be placed anywhere):
example:
MetaCyc_db
├── compounds.dat (MetaCyc)
MetaNetX_db
├── chem_xref.tsv (MetaNetX)
├── chem_prop.tsv (MetaNetX)
Complementary
├── complementary_datatable.tsv (for MetaCyc/MetaNetX)
logs/
Input data
Note
Not all data listed below are mandatory.
The easiest way to build a conversion datatable is to use only MetaNetX data (the chem_xref.tsv and chem_prop.tsv files).
You can provide these files yourself or let the tool download them for you with the command metanetmap build_db --db metanetx.
| File/Directory | Description |
|---|---|
| metacyc_compounds | Text file provided by the MetaCyc database |
| chem_xref | Tabular file from MetaNetX with cross-references to other databases |
| chem_prop | Tabular file from MetaNetX with metabolite properties |
| complementary_datatable | Tabular file provided by the user (see details below) |
| output | Output directory for downloaded database files, the conversion datatable, and logs |
Details on input files
metacyc_compounds.dat (MetaCyc):
compounds.dat has to be provided by the user. Access to this file requires a licence for MetaCyc.
The following is an example entry for the compound WATER from a MetaCyc flat file (.dat extension). The file is structured as key-value pairs, where each line represents a specific property or annotation of the compound.
Some keys, such as CHEMICAL-FORMULA, SYNONYMS, or DBLINKS, may occur multiple times. Values can contain nested content, quotes, or formatting (e.g. HTML tags in names).
Some key characteristics (non-exhaustive)
| Field | Description |
|---|---|
| UNIQUE-ID | Primary identifier of the compound in the MetaCyc database. |
| TYPES | Declares the type of entity (Compound in the example entry below). |
| COMMON-NAME | Human-readable compound name. May contain HTML formatting. |
| CHEMICAL-FORMULA | Chemical composition split across multiple lines, each specifying an element and its count. |
| DBLINKS | Cross-references to external databases such as BiGG, ChEBI, HMDB, KEGG, PubChem, etc. Multiple lines. |
| INCHI | Standard InChI string describing the molecular structure. |
| INCHI-KEY | Hashed InChI identifier (short, fixed-length string) used for quick comparison of chemical structures. |
| INSTANCE-NAME-TEMPLATE | A template indicating how this compound ID is generated or structured (CPD-* in the example entry below). |
| LOGP | Octanol–water partition coefficient (logP), representing hydrophobicity. |
| MOLECULAR-WEIGHT | Average molecular weight based on atomic composition. |
| MONOISOTOPIC-MW | Exact mass using the most abundant isotope for each element. |
| NON-STANDARD-INCHI | Alternative or non-standard InChI representation. |
| POLAR-SURFACE-AREA | Topological polar surface area (TPSA) of the molecule. |
| SMILES | Simplified Molecular Input Line Entry System (SMILES) string representing the structure. |
| SYNONYMS | Alternate or common names for the compound. Can appear on multiple lines. |
Example compound entry in the MetaCyc file
UNIQUE-ID - Primary identifier within the MetaCyc database (WATER).
TYPES - Declares the entity as a Compound.
COMMON-NAME - H<sub>2</sub>O.
CHEMICAL-FORMULA - Stored in multiple lines for atomic composition.
CHEMICAL-FORMULA - Stored in multiple lines for atomic composition.
DBLINKS - Cross-references to external databases such as BIGG, HMDB, ChEBI, etc.
DBLINKS - (CHEBI "15377" NIL |taltman| 3452438148 NIL NIL)
DBLINKS - (LIGAND-CPD "C00001" NIL |kr| 3346617699 NIL NIL)
INCHI - InChI=1S/H2O/h1H2 Chemical structure descriptors.
INCHI-KEY - InChIKey=XLYOFNOQVPJJNP-UHFFFAOYSA-N
INSTANCE-NAME-TEMPLATE - CPD-*
LOGP - -0.5
MOLECULAR-WEIGHT - 18.015
MONOISOTOPIC-MW - 18.0105646863
NON-STANDARD-INCHI - InChI=1S/H2O/h1H2
POLAR-SURFACE-AREA - 1.
SMILES - O
SYNONYMS - Alternate names for the compound.
SYNONYMS - H2O
SYNONYMS - hydrogen oxide
SYNONYMS - water
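The key-value layout described above can be read with a few lines of Python. The following is a minimal sketch, not MetaNetMap's own parser: the function names `parse_dat` and `dblinks` are hypothetical, and the sketch assumes that records end with a `//` line and that lines starting with `#` are comments. The sample is a reduced version of the WATER entry. Repeated keys such as DBLINKS or SYNONYMS are collected into lists.

```python
import re
from collections import defaultdict

# Matches the database name and quoted identifier at the start of a DBLINKS value,
# e.g. (CHEBI "15377" NIL ...)
DBLINK_RE = re.compile(r'\(([^\s"]+)\s+"([^"]+)"')


def parse_dat(text):
    """Parse 'KEY - value' lines into a list of {key: [values]} records."""
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment (assumed convention)
        if line.strip() == "//":
            # Assumed record separator: close the current record.
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        key, sep, value = line.partition(" - ")
        if not sep:
            continue  # not a 'KEY - value' line; ignored in this sketch
        current[key.strip()].append(value.strip())
    if current:
        records.append(dict(current))
    return records


def dblinks(record):
    """Return {database: identifier} extracted from the DBLINKS values."""
    links = {}
    for value in record.get("DBLINKS", []):
        match = DBLINK_RE.match(value)
        if match:
            links[match.group(1)] = match.group(2)
    return links


sample = """\
UNIQUE-ID - WATER
TYPES - Compound
DBLINKS - (CHEBI "15377" NIL |taltman| 3452438148 NIL NIL)
DBLINKS - (LIGAND-CPD "C00001" NIL |kr| 3346617699 NIL NIL)
LOGP - -0.5
SYNONYMS - H2O
SYNONYMS - hydrogen oxide
SYNONYMS - water
//
"""

water = parse_dat(sample)[0]
print(water["UNIQUE-ID"], dblinks(water), water["SYNONYMS"])
```

Splitting on the first occurrence of " - " (with surrounding spaces) keeps hyphenated keys such as UNIQUE-ID and negative values such as -0.5 intact.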
chem_xref.tsv (MetaNetX): Tabular file obtained by the user from the MetaNetX website. It can also be downloaded directly by MetaNetMap using the command:
metanetmap build_db --db metanetx
Each line represents an entry linking different identifiers or names for the same metabolite. This kind of table is commonly used as a mapping table between databases such as MetaNetX, SEED, BiGG, or ChEBI.
| Column | Name | Description |
|---|---|---|
| 1 | source | Source database and identifier (e.g. mnx:BIOMASS, seedM:cpd11416, ChEBI:16234…) |
| 2 | ID | Corresponding MetaNetX or normalized identifier (e.g. MNXM01, MNXM02, BIOMASS) |
| 3 | description | Descriptive information, including names, synonyms, or notes, separated by a double pipe |
Example entries
Source ID Description
BIOMASS BIOMASS BIOMASS
mnx:BIOMASS BIOMASS BIOMASS
seed.compound:cpd11416 BIOMASS Biomass
seedM:M_cpd11416 BIOMASS secondary/obsolete/fantasy identifier
seedM:cpd11416 BIOMASS Biomass
MNXM01 MNXM01 PMF||Translocated proton that acccounts for the Proton Motive Force||Not to be confused with H(+) (MNXM1)
mnx:PMF MNXM01 PMF||Translocated proton that acccounts for the Proton Motive Force||Not to be confused with H(+) (MNXM1)
CHEBI:16234 MNXM02 hydroxide||HO-||HYDROXIDE ION||Hydroxide ion||OH(-)||OH-||hydridooxygenate(1-)||oxidanide
CHEBI:29356 MNXM02 oxide(2-)||O(2-)||oxide
MNXM02 MNXM02 OH(-)||hydroxyde
bigg.metabolite:oh1 MNXM02 Hydroxide ion
biggM:M_oh1 MNXM02 secondary/obsolete/fantasy identifier
biggM:oh1 MNXM02 Hydroxide ion
chebi:13365 MNXM02 secondary/obsolete/fantasy identifier
chebi:13419 MNXM02 secondary/obsolete/fantasy identifier
chebi:16234 MNXM02 hydroxide||HO-||HYDROXIDE ION||Hydroxide ion||OH(-)||OH-||hydridooxygenate(1-)||oxidanide
chebi:29356 MNXM02 oxide(2-)||O(2-)||oxide
chebi:44641 MNXM02 secondary/obsolete/fantasy identifier
chebi:5594 MNXM02 secondary/obsolete/fantasy identifier
metacyc.compound:OH MNXM02 OH-||OH||hydroxide||hydroxide ion||hydroxyl||hydroxyl ion
metacycM:OH MNXM02 OH-||OH||hydroxide||hydroxide ion||hydroxyl||hydroxyl ion
mnx:HYDROXYDE MNXM02 OH(-)||hydroxyde
seed.compound:cpd15275 MNXM02 hydroxide ion||oh1
seedM:M_cpd15275 MNXM02 secondary/obsolete/fantasy identifier
seedM:cpd15275 MNXM02 hydroxide ion||oh1
vmhM:M_oh1 MNXM02 secondary/obsolete/fantasy identifier
vmhM:oh1 MNXM02 hydroxide ion||hydroxide
vmhmetabolite:oh1 MNXM02 hydroxide ion||hydroxide
Note
The || separator indicates multiple synonyms or alternative names.
Identifiers such as MNXM## correspond to MetaNetX universal metabolite IDs.
Lines describing BIOMASS or PMF represent pseudo-metabolites used in metabolic network models.
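To illustrate how the three columns and the || separator fit together, here is a small sketch that groups cross-references by MetaNetX identifier. It is an illustration only, not the code MetaNetMap uses: `parse_xref_line` and `index_by_mnx_id` are hypothetical names, and skipping lines that start with `#` is an assumption about header comments in the downloaded file. The sample rows are taken from the example entries above.

```python
from collections import defaultdict


def parse_xref_line(line):
    """Split one tab-delimited row into source prefix, local ID, MetaNetX ID and names."""
    # Raises ValueError if the row has fewer than three tab-separated fields.
    source, mnx_id, description = line.rstrip("\n").split("\t", 2)
    prefix, sep, local_id = source.partition(":")
    if not sep:
        # No 'prefix:' part, e.g. a bare MetaNetX identifier such as MNXM02.
        prefix, local_id = "", source
    names = [name for name in description.split("||") if name]
    return {"prefix": prefix, "local_id": local_id, "mnx_id": mnx_id, "names": names}


def index_by_mnx_id(lines):
    """Group parsed rows by the identifier found in the second column."""
    index = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment (assumed convention)
        entry = parse_xref_line(line)
        index[entry["mnx_id"]].append(entry)
    return index


sample = [
    "mnx:BIOMASS\tBIOMASS\tBIOMASS",
    "CHEBI:29356\tMNXM02\toxide(2-)||O(2-)||oxide",
    "MNXM02\tMNXM02\tOH(-)||hydroxyde",
    "bigg.metabolite:oh1\tMNXM02\tHydroxide ion",
]

index = index_by_mnx_id(sample)
for entry in index["MNXM02"]:
    print(entry["prefix"] or "(none)", entry["local_id"], entry["names"])
```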
chem_prop.tsv (MetaNetX):
This table lists basic information for metabolites or pseudo-metabolites, including chemical formulas, charges, molecular masses, and structure encodings. It links each metabolite to a reference identifier from a source database.
This file does not have to be provided by the user if MetaNetMap is used to download the necessary data, with the command:
metanetmap build_db --db metanetx
Table structure
| Column | Name | Description |
|---|---|---|
| 1 | ID | Unique internal or MetaNetX identifier (e.g. MNXM01) |
| 2 | name | Common metabolite name (e.g. PMF, OH(-), H3O(+)) |
| 3 | reference | Source or cross-reference identifier (e.g. mnx:PMF) |
| 4 | formula | Molecular formula (e.g. H, HO, H3O) |
| 5 | charge | Net electrical charge (integer, may be 0, -1, +1, etc.) |
| 6 | mass | Molecular mass in Daltons (Da) |
| 7 | InChI | IUPAC International Chemical Identifier string |
| 8 | InChIKey | Hashed representation of the InChI |
| 9 | SMILES | Simplified molecular structure in SMILES format |
Example entries
BIOMASS BIOMASS mnx:BIOMASS
MNXM01 PMF mnx:PMF H 1 1.00794 InChI=1S/p+1 GPRLSGONYQIRFK-UHFFFAOYSA-N [H+]
MNXM02 OH(-) mnx:HYDROXYDE HO -1 17.00700 InChI=1S/H2O/h1H2/p-1 XLYOFNOQVPJJNP-UHFFFAOYSA-M [H][O-]
MNXM03 H3O(+) mnx:OXONIUM H3O 1 19.02300 InChI=1S/H2O/h1H2/p+1 XLYOFNOQVPJJNP-UHFFFAOYSA-O [H][O+]([H])[H]
Note
Some entries (like BIOMASS or PMF) represent pseudo-metabolites used in constraint-based metabolic models.
InChI and SMILES are standard line notations for representing chemical structures computationally.
Charges and masses are provided for use in biochemical simulations and model balancing.
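The nine-column layout can be loaded into typed records as sketched below. This is a hedged example rather than MetaNetMap's implementation: the `Metabolite` class and `read_chem_prop` function are hypothetical, and the sketch assumes that comment lines start with `#` and that missing trailing fields (as in the BIOMASS row) should be treated as empty.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

COLUMNS = ["ID", "name", "reference", "formula", "charge",
           "mass", "InChI", "InChIKey", "SMILES"]


@dataclass
class Metabolite:
    id: str
    name: str
    reference: str
    formula: Optional[str]
    charge: Optional[int]
    mass: Optional[float]
    inchi: Optional[str]
    inchikey: Optional[str]
    smiles: Optional[str]


def read_chem_prop(handle):
    """Yield one Metabolite per data row of a tab-delimited chem_prop file."""
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    # QUOTE_NONE: quote characters in chemical names are kept as ordinary text.
    for fields in csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        fields = fields + [""] * (len(COLUMNS) - len(fields))  # pad short rows
        rec = dict(zip(COLUMNS, fields))
        yield Metabolite(
            id=rec["ID"],
            name=rec["name"],
            reference=rec["reference"],
            formula=rec["formula"] or None,
            charge=int(rec["charge"]) if rec["charge"] else None,
            mass=float(rec["mass"]) if rec["mass"] else None,
            inchi=rec["InChI"] or None,
            inchikey=rec["InChIKey"] or None,
            smiles=rec["SMILES"] or None,
        )


sample = (
    "BIOMASS\tBIOMASS\tmnx:BIOMASS\n"
    "MNXM02\tOH(-)\tmnx:HYDROXYDE\tHO\t-1\t17.00700\t"
    "InChI=1S/H2O/h1H2/p-1\tXLYOFNOQVPJJNP-UHFFFAOYSA-M\t[H][O-]\n"
)

metabolites = list(read_chem_prop(io.StringIO(sample)))
for m in metabolites:
    print(m.id, m.formula, m.charge, m.mass)
```

Representing absent values as None rather than empty strings makes it easier to tell pseudo-metabolites without structure data apart from real compounds.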
complementary_datatable.tsv:
Tabular file provided by the user
(MetaCyc)
| UNIQUE-ID | ADD-COMPLEMENT | BIGG | SEED |
|---|---|---|---|
|  | (2S)-2-isopropyl-3-oxosuccinic acid |  |  |
|  | (S)-dihydroorotic acid |  |  |
|  | 3-phosphoshikimic acid |  |  |
|  | dann |  |  |
(MetaNetX)
| UNIQUE-ID | ADD-COMPLEMENT | BIGG | SEED |
|---|---|---|---|
|  | (2S)-2-isopropyl-3-oxosuccinic acid |  |  |
|  | (S)-dihydroorotic acid |  |  |
|  | 3-phosphoshikimic acid |  |  |
|  | 7,8-diaminononanoate | dann |  |
The complementary_datatable is a tabular file provided by the user.
It allows users to add their own custom identifiers in order to improve matching with their metabolomic data.
Requirements and structure:
The first column must be a UNIQUE-ID that links to the MetaCyc/MetaNetX database.
All following columns are free and may contain any identifiers or names. Their column names will be automatically included in the main conversion datatable.
The file must be in tabular format (e.g., TSV), with headers.
Important
If you have a metabolite without a matching UNIQUE-ID in MetaCyc/MetaNetX, you may assign it a custom or fictional ID in the first column.
This fictional UNIQUE-ID will still be included in the conversion table, and will be used if a match is found based on the name or identifier you provided.
Be sure to keep track of any custom or fictional IDs you create, so you can filter or manage them later if needed.
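Before building the database, it can help to check a complementary datatable against the requirements above. The sketch below is one possible pre-flight check, not a feature of MetaNetMap: `read_complementary` is a hypothetical helper, requiring the first header to be spelled exactly UNIQUE-ID is an assumption, and CUSTOM-0001 is a made-up identifier standing in for a custom or fictional ID.

```python
import csv
import io


def read_complementary(handle):
    """Read a tab-delimited table with headers and check its first column."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if not header or header[0] != "UNIQUE-ID":
        # Assumption: the first column header is literally 'UNIQUE-ID'.
        raise ValueError("first column header must be UNIQUE-ID")
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not any(field.strip() for field in fields):
            continue  # skip empty lines
        if len(fields) > len(header):
            raise ValueError(f"line {line_no}: more fields than header columns")
        fields = fields + [""] * (len(header) - len(fields))  # pad short rows
        if not fields[0].strip():
            raise ValueError(f"line {line_no}: empty UNIQUE-ID")
        rows.append(dict(zip(header, fields)))
    return header, rows


# CUSTOM-0001 is a hypothetical, user-defined UNIQUE-ID.
sample = (
    "UNIQUE-ID\tADD-COMPLEMENT\tBIGG\tSEED\n"
    "CUSTOM-0001\t7,8-diaminononanoate\tdann\t\n"
)

header, rows = read_complementary(io.StringIO(sample))
print(header)
print(rows[0])
```

A file that uses another delimiter, or whose first column lacks an identifier, fails early with a line number instead of producing a silently incomplete conversion datatable.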
Output data
| File/Directory | Description |
|---|---|
| conversion_datatable | Tab-delimited file whose first column is the UNIQUE-ID in MetaCyc/MetaNetX |
| logs | Directory containing more detailed information |
Note
The conversion_datatable file acts as a bridge between the metabolomic data and the metabolic networks.
It combines all structured information extracted from the MetaCyc compounds.dat file or from the MetaNetX chem_xref.tsv and chem_prop.tsv files, along with any additional identifiers or metadata provided by the user through the complementary_datatable file.
This unified table serves as a comprehensive knowledge base that allows the tool to search across all known identifiers for a given metabolite, and match them between the input metabolomic data and the metabolic networks.
By leveraging both the MetaCyc/MetaNetX database and user-provided knowledge, the conversion_datatable enables robust and flexible mapping across diverse data sources.
The logs directory contains detailed information about the processing steps.
It is useful for debugging, auditing, and understanding how the tool performed the mapping and handled the input data.
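The idea of searching across all known identifiers can be pictured as a reverse index from every cell value to the UNIQUE-ID of its row. The sketch below is only a conceptual illustration: the column names other than UNIQUE-ID are assumed for the example, `build_lookup` and `match` are hypothetical, and the case-insensitive exact matching shown here is a simplification, not a description of how MetaNetMap matches identifiers.

```python
from collections import defaultdict


def build_lookup(rows, id_column="UNIQUE-ID"):
    """Map every non-empty cell value (lower-cased) to the UNIQUE-IDs of its rows."""
    lookup = defaultdict(set)
    for row in rows:
        unique_id = row[id_column]
        for value in row.values():
            key = value.strip().lower()
            if key:
                lookup[key].add(unique_id)
    return lookup


def match(query, lookup):
    """Return the sorted UNIQUE-IDs whose row contains the queried identifier."""
    return sorted(lookup.get(query.strip().lower(), ()))


# Hypothetical rows; identifiers are taken from the chem_xref examples above.
rows = [
    {"UNIQUE-ID": "MNXM02", "BIGG": "oh1", "SEED": "cpd15275", "CHEBI": "16234"},
    {"UNIQUE-ID": "MNXM01", "BIGG": "", "SEED": "", "CHEBI": ""},
]

lookup = build_lookup(rows)
print(match("OH1", lookup))
print(match("cpd15275", lookup))
print(match("unknown", lookup))
```

Storing a set of UNIQUE-IDs per key keeps ambiguous identifiers visible: a query that returns more than one ID signals a case that needs manual review.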
Output data details for database building mode are below in Inputs and outputs: Mapping mode: Datatable_conversion_metacyc and Datatable_conversion_metanetx
For more details on how to customize your own conversion datatable and on advanced methods (partial match, ambiguities, …), see Advanced usage