Advanced usage

UNIQUE-ID in conversion datatable

The UNIQUE-ID is defined as the primary identifier for a specific metabolite. It represents the unique reference assigned to each metabolite in the database used to generate the conversion datatable (e.g., MetaCyc:Glucopyranose, MetaNetX:MNXM1364061, etc.). It also serves as the central reference point to which all other identifiers related to this metabolite, such as InChI, COMMON-NAME, ChEBI, … are linked.

In both the conversion datatable and the complementary datatable, the ``UNIQUE-ID `` must appear as the first column. This ensures consistency, as the identifier uniquely facilitates data validation and matching across different sources, with all complementary information related to a metabolite linked to it.

Therefore, the UNIQUE-ID serves as the central reference point for detecting potential ambiguities between datasets and for eliminating redundancies.

MNM_ID: a unique-id for maf files

The MNM_ID is a unique-id generate at the begining of the MetaNetMap run. MAF files provided by user are rewritten with a unique identifier assigned to each row corresponding to a metabolite. They will be rewritten in a This directory is generated in a output folder generate in the main outpu_folder. This ensures fast and unambiguous tracking of all IDs associated with a given row, and prevents confusion in cases where metabolites from different rows map to the same metabolite in the metabolic network, an ambiguity that is then reported in the Partial match column.

If multiple MAF files are provided, the IDs are assigned progressively, taking into account the IDs already generated from previously processed MAF files. For example, if two MAFs contain 10 and 15 metabolites respectively, the first file will receive IDs from 1 to 10, and the second from 11 to 26.

Note

The IDs are assigned according to the order in which the tool reads the files. The first file processed will receive the first IDs, and so on. This order may not correspond to the user’s intended ordering unless the files have been checked and sorted beforehand.

Creating your own conversion datatable

Requirements and structure:

  • The first column must be a UNIQUE-ID that links to the MetaCyc/MetaNetX database or your own unique identifiers.

  • All following columns normally follow the column names listed below, but you can add others with different names if needed.

  • It is recommended to keep the columns ChEBI, PUBCHEM, and InChIKey with the same names, as Metanetmap performs a preprocessing step to check that they contain the correct prefixes (ChEBI: , PUBCHEM: , or InChIKey= ) and adds them if necessary. If the columns do not have the correct name, this preprocessing will not be performed.

  • For SYNONYMS, the synonyms column also undergoes preprocessing, since in our data tables, the expected syntax is a list: [‘synonym1’, ‘synonym2’]. If you want to include synonyms, please use this syntax.

  • The file must be in tabular format (e.g., TSV), with headers.

    Note

    The following column names are recognised:

    UNIQUE-ID, ChEBI, COMMON-NAME, ABBREV-NAME, SYNONYMS, ADD-COMPLEMENT, MOLECULAR-WEIGHT, MONOISOTOPIC-MW, SEED, BIGG, HMDB, METANETX, METACYC, LIGAND-CPD, REFMET, PUBCHEM, CAS, INCHI-KEY, SMILES

Pre-process mapping

For metabolomic data, whether provided as single or multiple files, metabolite information will be uniquely considered by a unique identifier such as unique-id, common-name, etc that is expected to be described in the first column of the input. Those identifiers will also be the unique values to which the output file relates to, indicating which metabolites were matched. Other columns, such as ChEBI, InChIKey, etc, available in the metabolite annotation tables will be considered as metadata to improve matching chances.

Metabolic network data is processed by extracting the identifiers and names of metabolites, together with all available metadata, such as ChEBI or InChIKey for instance.

MetaNetMap performs a preprocessing step to ensure that certain columns contain the correct prefixes (ChEBI: , PUBCHEM: , or InChIKey= ). This helps avoid conflicts between numerical values in different columns and reduces the risk of mismatches by harmonizing identifiers as much as possible between metabolomic data, metabolic network, and the conversion datatable.

Partial match

Partial match is an option to the mapping mode that aims at rescuing unmatched entries. Note that it can be time-consuming depending on the number of unmatched metabolite signals. It is a post-processing step applied to metabolites or IDs that were not successfully mapped during the initial run. These unmatched entries are re-evaluated using specific strategies, which increase the chances of finding a match (e.g., via ChEBI, InChIKey, or enantiomer simplification).

After this processing step, the entire mapping pipeline is re-executed, taking the modifications into account.

The following treatments are applied:

  • ChEBI (only if a ChEBI column exists in the metabolomics data): For each row containing a ChEBI ID, the API from EBI is used to retrieve the full ChEBI ontology of the metabolite. These related terms are then remapped against the target databases.

  • InChIKey (only if a InChIKey column exists in the metabolomics data): An InChIKey is structured as XXXXXXXXXXXXXX-YYYYYYYAB-Z. The first block (X) represents the core molecular structure. We extract only this primary structure to increase the chances of a match during the second mapping phase.

  • Enantiomers: Stereochemistry indicators (L, D, R, S) are removed from both the metabolomics data and the databases. This improves matching rates, since stereochemical information is often missing in metabolomics datasets.

To facilitate the analysis of the results, every match found during this process will be automatically added to the “Partial match” column in the output. The user should be cautious when using these matches, as they require manual validation before any interpretation.

Handling Ambiguities

Using a large amount of cross-referenced data increases the probability that inconsistent mappings will occur and, consequently, the risk of ambiguity. The same metabolite may match multiple times in the conversion datatable, in the metabolomic data, or in the metabolic networks.

Two main types of ambiguities can be mentioned:

  • One metabolite of the metabolomic annotation profile maps to two distinct metabolites of the GSMN.

  • One metabolites of the metabolomic profiles map on two identifiers in the conversion datatable.

  • Two distinct metabolites of the metabolomic profiles map to the same metabolite of the GSMN.

When multiple input metabolites correspond to the same unique identifier or vice versa this situation is flagged as an ambiguity and is automatically added to the “Partial match” column in the output.

The tool does not attempt to resolve this conflict automatically. Instead, these entries are explicitly marked, so the user can manually review and resolve the ambiguity. This ensures data integrity and allows the user to decide whether:

  • The match is correct and can be accepted;

  • The mapping should be adjusted or ignored;

  • Further curation is needed (e.g., manual verification against synonyms, names, or external identifiers).

This behaviour helps avoid/reduce false positives during automatic matching.

It should be noted that the effectiveness of ambiguity handling may vary depending on the structure of the data, whether they are metabolic datasets, metabolic networks, or conversion tables.

For example, although the InChI is theoretically unique for a single metabolite, in practice, some databases or metabolic networks may associate one InChI with multiple metabolites, which introduces ambiguity.

Similarly, in metabolomic data, certain columns may combine or concatenate several types of identifiers. This can reduce the likelihood of accurate matching, as the identifiers are not clearly separated. Specific preprocessing steps are implemented, for example, adding prefixes to identifiers such as PUBCHEM or ChEBI to standardize IDs and prevent conflicts between numeric values, ultimately improving comparison accuracy.

Note

A new version with the addition of UNIQUE-ID in metabolomics data will be implemented soon to facilitate parsing.