{ "cells": [ { "cell_type": "markdown", "id": "19917597", "metadata": {}, "source": [ "# Quick Start\n", "\n", "This Python script describes how to use the `LiRTMaTS` Python package. The\n", "input data and retention time reference files used here are in\n", "https://github.com/wanchanglin/lirtmats/tree/master/examples/data.\n", "\n", "## Setup\n", "\n", "Users need to load the Python package `LAMP` before using `LiRTMaTS`. Its\n", "functions are used here to load the data set and summarise the matching\n", "results. For details, see https://github.com/wanchanglin/lamp." ] }, { "cell_type": "code", "execution_count": 1, "id": "b045fb5c", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:41.718675Z", "iopub.status.busy": "2025-12-04T12:29:41.718675Z", "iopub.status.idle": "2025-12-04T12:29:42.611230Z", "shell.execute_reply": "2025-12-04T12:29:42.610197Z" } }, "outputs": [], "source": [ "import sqlite3\n", "import pandas as pd\n", "from lamp import anno\n", "import lirtmats.lirtmats as rtm" ] }, { "cell_type": "markdown", "id": "f4c99a20", "metadata": {}, "source": [ "## Data Loading\n", "\n", "`LiRTMaTS` supports text files separated by comma (`,`) or tab (`\\t`).\n", "Microsoft XLSX files are also supported; use the argument `sheet_name` to\n", "indicate which sheet holds the input data. The default is 0, the first\n", "sheet.\n", "\n", "Here we use a small example data set in `tsv` format. This data set\n", "includes a peak list and an intensity data matrix. `LiRTMaTS` requires the\n", "name, m/z value and retention time of each peak. Users need to indicate the\n", "column positions of the feature name, m/z value, retention time and the\n", "start of the data matrix.
Here they are 1, 2, 3 and 4, respectively.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "aae9685b", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:42.611230Z", "iopub.status.busy": "2025-12-04T12:29:42.611230Z", "iopub.status.idle": "2025-12-04T12:29:42.662591Z", "shell.execute_reply": "2025-12-04T12:29:42.662591Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>name</th><th>mz</th><th>rt</th><th>D121</th><th>A122</th><th>A125</th><th>A126</th><th>A127</th><th>A128</th><th>B131</th><th>...</th><th>E214</th><th>E215</th><th>E216</th><th>H234</th><th>H235</th><th>H236</th><th>H237</th><th>H238</th><th>H239</th><th>H240</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>M102T899</td><td>102.034153</td><td>898.850160</td><td>1.404584e+07</td><td>3.689953e+06</td><td>3.598363e+06</td><td>1.138875e+07</td><td>4.887524e+06</td><td>2.104782e+06</td><td>7.288258e+06</td><td>...</td><td>3.125203e+06</td><td>3.608369e+06</td><td>NaN</td><td>4.763811e+06</td><td>2.281365e+06</td><td>NaN</td><td>3.404450e+06</td><td>3.720441e+06</td><td>4.539032e+05</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>M102T849</td><td>102.034154</td><td>849.085350</td><td>1.473961e+07</td><td>NaN</td><td>5.934387e+06</td><td>NaN</td><td>4.607624e+06</td><td>5.969186e+06</td><td>3.367949e+06</td><td>...</td><td>1.276006e+07</td><td>1.490770e+07</td><td>2.880142e+06</td><td>4.263577e+06</td><td>NaN</td><td>NaN</td><td>NaN</td><td>4.437697e+06</td><td>6.777076e+06</td><td>6.341930e+06</td></tr>\n",
"    <tr><th>2</th><td>M105T45</td><td>105.042677</td><td>45.353942</td><td>5.520865e+05</td><td>1.813279e+05</td><td>2.734923e+05</td><td>2.342655e+05</td><td>6.241395e+04</td><td>1.068277e+05</td><td>1.192451e+05</td><td>...</td><td>3.092946e+04</td><td>1.788324e+05</td><td>1.810794e+05</td><td>3.225256e+05</td><td>NaN</td><td>3.734778e+05</td><td>1.935349e+05</td><td>NaN</td><td>1.094705e+05</td><td>1.946732e+05</td></tr>\n",
"    <tr><th>3</th><td>M105T54</td><td>105.054961</td><td>54.350049</td><td>6.669635e+05</td><td>4.833251e+06</td><td>2.137479e+06</td><td>1.552473e+06</td><td>1.753294e+06</td><td>2.301363e+06</td><td>NaN</td><td>...</td><td>1.186390e+06</td><td>3.001167e+06</td><td>2.558921e+06</td><td>NaN</td><td>NaN</td><td>1.695460e+06</td><td>NaN</td><td>1.834140e+06</td><td>1.029692e+06</td><td>4.382618e+05</td></tr>\n",
"    <tr><th>4</th><td>M105T48_1</td><td>105.074216</td><td>47.538626</td><td>6.310113e+05</td><td>NaN</td><td>5.199302e+05</td><td>4.302566e+05</td><td>5.650141e+05</td><td>3.635406e+05</td><td>1.096530e+06</td><td>...</td><td>7.882748e+05</td><td>NaN</td><td>9.822090e+05</td><td>4.974403e+05</td><td>3.604541e+05</td><td>1.340656e+06</td><td>NaN</td><td>NaN</td><td>6.020203e+05</td><td>3.597655e+05</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>1995</th><td>M299T296</td><td>299.233645</td><td>295.569540</td><td>8.125150e+04</td><td>1.020165e+05</td><td>2.209362e+05</td><td>3.557402e+05</td><td>6.039153e+05</td><td>NaN</td><td>2.330915e+05</td><td>...</td><td>3.872671e+05</td><td>1.632064e+05</td><td>7.224218e+04</td><td>3.678394e+04</td><td>9.526812e+04</td><td>5.785549e+04</td><td>6.183749e+05</td><td>NaN</td><td>2.915690e+04</td><td>NaN</td></tr>\n",
"    <tr><th>1996</th><td>M300T43_1</td><td>299.919504</td><td>42.832066</td><td>5.042924e+04</td><td>NaN</td><td>NaN</td><td>2.222376e+05</td><td>3.763288e+05</td><td>2.094474e+05</td><td>1.163715e+05</td><td>...</td><td>4.035525e+05</td><td>2.032260e+05</td><td>2.700920e+05</td><td>NaN</td><td>2.675647e+05</td><td>2.695188e+05</td><td>2.750383e+05</td><td>2.882957e+05</td><td>6.720465e+04</td><td>3.352428e+05</td></tr>\n",
"    <tr><th>1997</th><td>M300T62</td><td>300.119720</td><td>62.428854</td><td>NaN</td><td>3.914945e+05</td><td>5.182468e+05</td><td>7.492101e+05</td><td>1.546338e+06</td><td>5.741346e+05</td><td>9.712791e+05</td><td>...</td><td>8.554399e+05</td><td>7.431820e+05</td><td>8.878200e+05</td><td>NaN</td><td>3.625514e+05</td><td>4.987110e+05</td><td>1.393237e+06</td><td>5.217566e+05</td><td>NaN</td><td>1.257126e+05</td></tr>\n",
"    <tr><th>1998</th><td>M300T285_2</td><td>300.124255</td><td>285.061758</td><td>NaN</td><td>4.602130e+05</td><td>4.559729e+05</td><td>9.718658e+05</td><td>3.864969e+05</td><td>3.877729e+05</td><td>1.315307e+06</td><td>...</td><td>2.418197e+06</td><td>2.917536e+06</td><td>9.108396e+05</td><td>4.583314e+05</td><td>4.022556e+05</td><td>2.673259e+05</td><td>NaN</td><td>NaN</td><td>8.926295e+04</td><td>2.126753e+04</td></tr>\n",
"    <tr><th>1999</th><td>M300T288</td><td>300.181271</td><td>287.944377</td><td>7.880306e+05</td><td>1.738638e+06</td><td>1.113482e+06</td><td>4.063701e+06</td><td>3.788191e+06</td><td>1.201084e+06</td><td>2.988076e+06</td><td>...</td><td>2.907005e+06</td><td>3.365814e+06</td><td>2.761628e+06</td><td>1.865813e+06</td><td>1.956308e+06</td><td>NaN</td><td>2.918514e+06</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2000 rows × 40 columns</p>\n",
"</div>
" ], "text/plain": [ " name mz rt D121 A122 \\\n", "0 M102T899 102.034153 898.850160 1.404584e+07 3.689953e+06 \n", "1 M102T849 102.034154 849.085350 1.473961e+07 NaN \n", "2 M105T45 105.042677 45.353942 5.520865e+05 1.813279e+05 \n", "3 M105T54 105.054961 54.350049 6.669635e+05 4.833251e+06 \n", "4 M105T48_1 105.074216 47.538626 6.310113e+05 NaN \n", "... ... ... ... ... ... \n", "1995 M299T296 299.233645 295.569540 8.125150e+04 1.020165e+05 \n", "1996 M300T43_1 299.919504 42.832066 5.042924e+04 NaN \n", "1997 M300T62 300.119720 62.428854 NaN 3.914945e+05 \n", "1998 M300T285_2 300.124255 285.061758 NaN 4.602130e+05 \n", "1999 M300T288 300.181271 287.944377 7.880306e+05 1.738638e+06 \n", "\n", " A125 A126 A127 A128 B131 \\\n", "0 3.598363e+06 1.138875e+07 4.887524e+06 2.104782e+06 7.288258e+06 \n", "1 5.934387e+06 NaN 4.607624e+06 5.969186e+06 3.367949e+06 \n", "2 2.734923e+05 2.342655e+05 6.241395e+04 1.068277e+05 1.192451e+05 \n", "3 2.137479e+06 1.552473e+06 1.753294e+06 2.301363e+06 NaN \n", "4 5.199302e+05 4.302566e+05 5.650141e+05 3.635406e+05 1.096530e+06 \n", "... ... ... ... ... ... \n", "1995 2.209362e+05 3.557402e+05 6.039153e+05 NaN 2.330915e+05 \n", "1996 NaN 2.222376e+05 3.763288e+05 2.094474e+05 1.163715e+05 \n", "1997 5.182468e+05 7.492101e+05 1.546338e+06 5.741346e+05 9.712791e+05 \n", "1998 4.559729e+05 9.718658e+05 3.864969e+05 3.877729e+05 1.315307e+06 \n", "1999 1.113482e+06 4.063701e+06 3.788191e+06 1.201084e+06 2.988076e+06 \n", "\n", " ... E214 E215 E216 H234 \\\n", "0 ... 3.125203e+06 3.608369e+06 NaN 4.763811e+06 \n", "1 ... 1.276006e+07 1.490770e+07 2.880142e+06 4.263577e+06 \n", "2 ... 3.092946e+04 1.788324e+05 1.810794e+05 3.225256e+05 \n", "3 ... 1.186390e+06 3.001167e+06 2.558921e+06 NaN \n", "4 ... 7.882748e+05 NaN 9.822090e+05 4.974403e+05 \n", "... ... ... ... ... ... \n", "1995 ... 3.872671e+05 1.632064e+05 7.224218e+04 3.678394e+04 \n", "1996 ... 4.035525e+05 2.032260e+05 2.700920e+05 NaN \n", "1997 ... 
8.554399e+05 7.431820e+05 8.878200e+05 NaN \n", "1998 ... 2.418197e+06 2.917536e+06 9.108396e+05 4.583314e+05 \n", "1999 ... 2.907005e+06 3.365814e+06 2.761628e+06 1.865813e+06 \n", "\n", " H235 H236 H237 H238 H239 \\\n", "0 2.281365e+06 NaN 3.404450e+06 3.720441e+06 4.539032e+05 \n", "1 NaN NaN NaN 4.437697e+06 6.777076e+06 \n", "2 NaN 3.734778e+05 1.935349e+05 NaN 1.094705e+05 \n", "3 NaN 1.695460e+06 NaN 1.834140e+06 1.029692e+06 \n", "4 3.604541e+05 1.340656e+06 NaN NaN 6.020203e+05 \n", "... ... ... ... ... ... \n", "1995 9.526812e+04 5.785549e+04 6.183749e+05 NaN 2.915690e+04 \n", "1996 2.675647e+05 2.695188e+05 2.750383e+05 2.882957e+05 6.720465e+04 \n", "1997 3.625514e+05 4.987110e+05 1.393237e+06 5.217566e+05 NaN \n", "1998 4.022556e+05 2.673259e+05 NaN NaN 8.926295e+04 \n", "1999 1.956308e+06 NaN 2.918514e+06 NaN NaN \n", "\n", " H240 \n", "0 NaN \n", "1 6.341930e+06 \n", "2 1.946732e+05 \n", "3 4.382618e+05 \n", "4 3.597655e+05 \n", "... ... \n", "1995 NaN \n", "1996 3.352428e+05 \n", "1997 1.257126e+05 \n", "1998 2.126753e+04 \n", "1999 NaN \n", "\n", "[2000 rows x 40 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols = [1, 2, 3, 4]\n", "data_fn = \"./data/df_pos_3.tsv\" # use tsv file\n", "df = anno.read_peak(data_fn, cols, sep='\\t')\n", "df" ] }, { "cell_type": "markdown", "id": "719fc2c2", "metadata": {}, "source": [ "Data frame `df` now includes only `name`, `mz`, `rt` and the intensity data\n", "matrix.\n", "\n", "## Retention Time Matching\n", "\n", "To perform retention time matching, users can use either the default\n", "retention time library or their own reference file. The reference file must\n", "have a column `rt_lib`, which is matched against the peak retention times\n", "within a tolerance in seconds. The column `ion_mode` is also needed to\n", "select positive or negative mode matching.
If `ion_mode` is not\n", "included in the reference file, all rows will be used for matching.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "f11caa11", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:42.665135Z", "iopub.status.busy": "2025-12-04T12:29:42.665135Z", "iopub.status.idle": "2025-12-04T12:29:42.686202Z", "shell.execute_reply": "2025-12-04T12:29:42.686202Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>identifier</th><th>metabolite_name</th><th>rt_lib</th><th>inchikey</th><th>ion_mod</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>ACMG_aqC18_POS_0001</td><td>MS5029_Isovaleraldehyde</td><td>24.6</td><td>QPUYECUOLPXSFR-UHFFFAOYSA-N</td><td>positive</td></tr>\n",
"    <tr><th>1</th><td>ACMG_aqC18_POS_0002</td><td>LO57_Dihydroxyfumaric acid hydrate</td><td>27.0</td><td>SEKGMJVHSBBHRD-WZHZPDAFSA-M</td><td>positive</td></tr>\n",
"    <tr><th>2</th><td>ACMG_aqC18_POS_0003</td><td>LO61_Benzoic acid</td><td>27.0</td><td>DMBUODUULYCPAK-UHFFFAOYSA-N</td><td>positive</td></tr>\n",
"    <tr><th>3</th><td>ACMG_aqC18_POS_0004</td><td>LO52_Spermine</td><td>28.2</td><td>XDSPGKDYYRNYJI-IUPFWZBJSA-N</td><td>positive</td></tr>\n",
"    <tr><th>4</th><td>ACMG_aqC18_POS_0005</td><td>LO21_Spermidine</td><td>30.0</td><td>HELXLJCILKEWJH-NCGAPWICSA-N</td><td>positive</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>2827</th><td>ACMG_aqC18_POS_1412</td><td>LIM3312_Cholesterol</td><td>659.4</td><td>ASOSVCXGWPDUGN-UHFFFAOYSA-N</td><td>negative</td></tr>\n",
"    <tr><th>2828</th><td>ACMG_aqC18_POS_1413</td><td>LO13_5alpha-Cholestan-3-one</td><td>672.6</td><td>XQCZBXHVTFVIFE-UHFFFAOYSA-N</td><td>negative</td></tr>\n",
"    <tr><th>2829</th><td>ACMG_aqC18_POS_1414</td><td>LIM3310_5alpha-Cholest-7-en-3beta-ol</td><td>675.0</td><td>WLFXSECCHULRRO-UHFFFAOYSA-N</td><td>negative</td></tr>\n",
"    <tr><th>2830</th><td>ACMG_aqC18_POS_1415</td><td>LO302_5alpha-Cholestanol</td><td>681.6</td><td>YCIMNLLNPGFGHC-UHFFFAOYSA-N</td><td>negative</td></tr>\n",
"    <tr><th>2831</th><td>ACMG_aqC18_POS_1416</td><td>LO45_10Z-Nonadecenoic acid</td><td>723.6</td><td>QIGBRXMKCJKVMJ-UHFFFAOYSA-N</td><td>negative</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2832 rows × 5 columns</p>\n",
"</div>
" ], "text/plain": [ " identifier metabolite_name rt_lib \\\n", "0 ACMG_aqC18_POS_0001 MS5029_Isovaleraldehyde 24.6 \n", "1 ACMG_aqC18_POS_0002 LO57_Dihydroxyfumaric acid hydrate 27.0 \n", "2 ACMG_aqC18_POS_0003 LO61_Benzoic acid 27.0 \n", "3 ACMG_aqC18_POS_0004 LO52_Spermine 28.2 \n", "4 ACMG_aqC18_POS_0005 LO21_Spermidine 30.0 \n", "... ... ... ... \n", "2827 ACMG_aqC18_POS_1412 LIM3312_Cholesterol 659.4 \n", "2828 ACMG_aqC18_POS_1413 LO13_5alpha-Cholestan-3-one 672.6 \n", "2829 ACMG_aqC18_POS_1414 LIM3310_5alpha-Cholest-7-en-3beta-ol 675.0 \n", "2830 ACMG_aqC18_POS_1415 LO302_5alpha-Cholestanol 681.6 \n", "2831 ACMG_aqC18_POS_1416 LO45_10Z-Nonadecenoic acid 723.6 \n", "\n", " inchikey ion_mod \n", "0 QPUYECUOLPXSFR-UHFFFAOYSA-N positive \n", "1 SEKGMJVHSBBHRD-WZHZPDAFSA-M positive \n", "2 DMBUODUULYCPAK-UHFFFAOYSA-N positive \n", "3 XDSPGKDYYRNYJI-IUPFWZBJSA-N positive \n", "4 HELXLJCILKEWJH-NCGAPWICSA-N positive \n", "... ... ... \n", "2827 ASOSVCXGWPDUGN-UHFFFAOYSA-N negative \n", "2828 XQCZBXHVTFVIFE-UHFFFAOYSA-N negative \n", "2829 WLFXSECCHULRRO-UHFFFAOYSA-N negative \n", "2830 YCIMNLLNPGFGHC-UHFFFAOYSA-N negative \n", "2831 QIGBRXMKCJKVMJ-UHFFFAOYSA-N negative \n", "\n", "[2832 rows x 5 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ion_mode = \"pos\"\n", "# ref_path = \"\" # if empty, use default reference file for matching\n", "ref_path = \"./data/rt_lib_202509.tsv\"\n", "ref = rtm.read_rt(ref_path, ion_mode=ion_mode)\n", "ref" ] }, { "cell_type": "markdown", "id": "d2559f87", "metadata": {}, "source": [ "`rt_tol` is a threshold for the retention time matching window. The unit\n", " is seconds and the default value is 5." 
] }, { "cell_type": "code", "execution_count": 4, "id": "1e2067e8", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:42.686202Z", "iopub.status.busy": "2025-12-04T12:29:42.686202Z", "iopub.status.idle": "2025-12-04T12:29:44.329388Z", "shell.execute_reply": "2025-12-04T12:29:44.329388Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>id</th><th>rt</th><th>identifier</th><th>metabolite_name</th><th>rt_lib</th><th>inchikey</th><th>ion_mod</th><th>rt_range</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>M105T45</td><td>45.353942</td><td>ACMG_aqC18_POS_0280</td><td>LO309_Asymmetric dimethylarginine</td><td>40.5</td><td>ZDLDXNCMJBOYJV-YFKPBYRVSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>1</th><td>M105T45</td><td>45.353942</td><td>ACMG_aqC18_POS_0281</td><td>MS5037_Ribonic acid gamma-lactone</td><td>40.5</td><td>DAUAQNGYDSHRET-UHFFFAOYSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>2</th><td>M105T45</td><td>45.353942</td><td>ACMG_aqC18_POS_0282</td><td>LO18_L-Dihydroorotic acid</td><td>40.5</td><td>KCDXJAYRVLXPFO-UHFFFAOYSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>3</th><td>M105T45</td><td>45.353942</td><td>ACMG_aqC18_POS_0283</td><td>LO30_Stachydrine</td><td>40.5</td><td>ITECRQOOEQWFPE-UHFFFAOYSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>4</th><td>M105T45</td><td>45.353942</td><td>ACMG_aqC18_POS_0284</td><td>LO72_Aminoadipic acid</td><td>40.5</td><td>JYPHNHPXFNEZBR-UHFFFAOYSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>150065</th><td>M300T288</td><td>287.944377</td><td>ACMG_aqC18_POS_0942</td><td>MS5008_Ethyl crotonate</td><td>291.0</td><td>OZWKMVRBQXNZKK-UHFFFAOYSA-N</td><td>negative</td><td>5</td></tr>\n",
"    <tr><th>150066</th><td>M300T288</td><td>287.944377</td><td>ACMG_aqC18_POS_0943</td><td>MS5032_2-Phenyl-1-propanol</td><td>291.0</td><td>DKYWVDODHFEZIM-UHFFFAOYSA-N</td><td>negative</td><td>5</td></tr>\n",
"    <tr><th>150067</th><td>M300T288</td><td>287.944377</td><td>ACMG_aqC18_POS_0944</td><td>LO15_Methyl indole-3-acetate</td><td>291.0</td><td>RTIXKCRFFJGDFG-UHFFFAOYSA-N</td><td>negative</td><td>5</td></tr>\n",
"    <tr><th>150068</th><td>M300T288</td><td>287.944377</td><td>ACMG_aqC18_POS_0945</td><td>LO03_Cinnamic aldehyde</td><td>291.6</td><td>FNYLWPVRPXGIIP-UHFFFAOYSA-N</td><td>positive</td><td>5</td></tr>\n",
"    <tr><th>150069</th><td>M300T288</td><td>287.944377</td><td>ACMG_aqC18_POS_0945</td><td>LO03_Cinnamic aldehyde</td><td>291.6</td><td>FNYLWPVRPXGIIP-UHFFFAOYSA-N</td><td>negative</td><td>5</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>150070 rows × 8 columns</p>\n",
"</div>
" ], "text/plain": [ " id rt identifier \\\n", "0 M105T45 45.353942 ACMG_aqC18_POS_0280 \n", "1 M105T45 45.353942 ACMG_aqC18_POS_0281 \n", "2 M105T45 45.353942 ACMG_aqC18_POS_0282 \n", "3 M105T45 45.353942 ACMG_aqC18_POS_0283 \n", "4 M105T45 45.353942 ACMG_aqC18_POS_0284 \n", "... ... ... ... \n", "150065 M300T288 287.944377 ACMG_aqC18_POS_0942 \n", "150066 M300T288 287.944377 ACMG_aqC18_POS_0943 \n", "150067 M300T288 287.944377 ACMG_aqC18_POS_0944 \n", "150068 M300T288 287.944377 ACMG_aqC18_POS_0945 \n", "150069 M300T288 287.944377 ACMG_aqC18_POS_0945 \n", "\n", " metabolite_name rt_lib inchikey \\\n", "0 LO309_Asymmetric dimethylarginine 40.5 ZDLDXNCMJBOYJV-YFKPBYRVSA-N \n", "1 MS5037_Ribonic acid gamma-lactone 40.5 DAUAQNGYDSHRET-UHFFFAOYSA-N \n", "2 LO18_L-Dihydroorotic acid 40.5 KCDXJAYRVLXPFO-UHFFFAOYSA-N \n", "3 LO30_Stachydrine 40.5 ITECRQOOEQWFPE-UHFFFAOYSA-N \n", "4 LO72_Aminoadipic acid 40.5 JYPHNHPXFNEZBR-UHFFFAOYSA-N \n", "... ... ... ... \n", "150065 MS5008_Ethyl crotonate 291.0 OZWKMVRBQXNZKK-UHFFFAOYSA-N \n", "150066 MS5032_2-Phenyl-1-propanol 291.0 DKYWVDODHFEZIM-UHFFFAOYSA-N \n", "150067 LO15_Methyl indole-3-acetate 291.0 RTIXKCRFFJGDFG-UHFFFAOYSA-N \n", "150068 LO03_Cinnamic aldehyde 291.6 FNYLWPVRPXGIIP-UHFFFAOYSA-N \n", "150069 LO03_Cinnamic aldehyde 291.6 FNYLWPVRPXGIIP-UHFFFAOYSA-N \n", "\n", " ion_mod rt_range \n", "0 positive 5 \n", "1 positive 5 \n", "2 positive 5 \n", "3 positive 5 \n", "4 positive 5 \n", "... ... ... \n", "150065 negative 5 \n", "150066 negative 5 \n", "150067 negative 5 \n", "150068 positive 5 \n", "150069 negative 5 \n", "\n", "[150070 rows x 8 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rt_tol = 5\n", "res = rtm.comp_match_rt(df, ref, rt_tol)\n", "res" ] }, { "cell_type": "markdown", "id": "9fb0512a", "metadata": {}, "source": [ "## Summarize Results\n", "\n", "The function `comp_summ` in package `LAMP` summarises the retention time\n", "matching." 
] }, { "cell_type": "code", "execution_count": 5, "id": "e4d97561", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:44.330903Z", "iopub.status.busy": "2025-12-04T12:29:44.330903Z", "iopub.status.idle": "2025-12-04T12:29:44.879843Z", "shell.execute_reply": "2025-12-04T12:29:44.879843Z" } }, "outputs": [], "source": [ "sr, mr = anno.comp_summ(df, res)" ] }, { "cell_type": "markdown", "id": "499abec3", "metadata": {}, "source": [ "This function combines the peak table with the retention time matching\n", "results and returns them in two formats. `sr` holds a single row of results\n", "for each peak id in the peak table `df`:" ] }, { "cell_type": "code", "execution_count": 6, "id": "99c35f0a", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:44.881389Z", "iopub.status.busy": "2025-12-04T12:29:44.881389Z", "iopub.status.idle": "2025-12-04T12:29:44.895776Z", "shell.execute_reply": "2025-12-04T12:29:44.895776Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>name</th><th>mz</th><th>rt</th><th>rt_range</th><th>identifier</th><th>metabolite_name</th><th>rt_lib</th><th>inchikey</th><th>ion_mod</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>5.0</td><td>ACMG_aqC18_POS_0389::ACMG_aqC18_POS_0389::ACMG...</td><td>LO488_Maleic acid::LO488_Maleic acid::LO321_L-...</td><td>48.9::48.9::49.2::49.2::50.4::50.4::50.4::50.4...</td><td>JJVNINGBHGBWJH-UHFFFAOYSA-N::JJVNINGBHGBWJH-UH...</td><td>positive::negative::positive::negative::positi...</td></tr>\n",
"    <tr><th>1</th><td>M1015T254</td><td>1014.985384</td><td>253.626177</td><td>5.0</td><td>ACMG_aqC18_POS_0782::ACMG_aqC18_POS_0783::ACMG...</td><td>LO481_3-Hydroxydecanedioic acid::LIM3308_Suber...</td><td>249.00000000000003::249.00000000000003::249.00...</td><td>FVWJYYTZTCVBKE-ROUWMTJPSA-N::TVZGACDUOSZQKY-UH...</td><td>positive::positive::negative::negative::positi...</td></tr>\n",
"    <tr><th>2</th><td>M101T228</td><td>101.060060</td><td>228.125403</td><td>5.0</td><td>ACMG_aqC18_POS_0654::ACMG_aqC18_POS_0654::ACMG...</td><td>MS5018_Dimethyl maleate::MS5018_Dimethyl malea...</td><td>223.2::223.2::223.8::223.8::223.8::223.8::223....</td><td>KIWQWJKWBHZMDT-UHFFFAOYSA-N::KIWQWJKWBHZMDT-UH...</td><td>positive::negative::positive::positive::positi...</td></tr>\n",
"    <tr><th>3</th><td>M102T849</td><td>102.034154</td><td>849.085350</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>4</th><td>M102T899</td><td>102.034153</td><td>898.850160</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>1995</th><td>M865T700</td><td>865.244172</td><td>700.365420</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1996</th><td>M919T647</td><td>918.701782</td><td>646.988220</td><td>5.0</td><td>ACMG_aqC18_POS_1407::ACMG_aqC18_POS_1407::ACMG...</td><td>LO05_Vitamin K1::LO05_Vitamin K1::LIM3314_Phyl...</td><td>642.6::642.6::643.2::643.2::647.1::647.1</td><td>ZFDIRQKJPRINOQ-HYXAFXHYSA-N::ZFDIRQKJPRINOQ-HY...</td><td>positive::negative::positive::negative::positi...</td></tr>\n",
"    <tr><th>1997</th><td>M925T237_1</td><td>924.898294</td><td>236.964462</td><td>5.0</td><td>ACMG_aqC18_POS_0690::ACMG_aqC18_POS_0691::ACMG...</td><td>LO306_Syringic acid::LO315_ortho-Hydroxyphenyl...</td><td>232.2::232.2::232.2::232.2::232.2::232.2::232....</td><td>AFBPFSWMIHJQDM-UHFFFAOYSA-N::OISVCGZHLKNMSJ-UH...</td><td>positive::positive::positive::positive::positi...</td></tr>\n",
"    <tr><th>1998</th><td>M933T267</td><td>933.410460</td><td>266.976471</td><td>5.0</td><td>ACMG_aqC18_POS_0839::ACMG_aqC18_POS_0840::ACMG...</td><td>MS5012_Diethyl malonate::MS5019_Trimethylaceti...</td><td>262.2::262.2::262.2::262.2::262.2::262.2::262....</td><td>KEVYVLWNCKMXJX-UHFFFAOYSA-N::WTTJVINHCBCLGX-ZD...</td><td>positive::positive::positive::negative::negati...</td></tr>\n",
"    <tr><th>1999</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>5.0</td><td>ACMG_aqC18_POS_0720::ACMG_aqC18_POS_0721::ACMG...</td><td>LIM3312_Aspartame::MS5023_Ethyl levulinate::MS...</td><td>237.6::237.6::237.6::237.6::237.6::237.6::237....</td><td>MBDOYVRWFFCFHM-SNAWJCMRSA-N::XPFVYQJUAUNWIW-UH...</td><td>positive::positive::positive::positive::negati...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2000 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " name mz rt rt_range \\\n", "0 M100T54 100.075925 53.810924 5.0 \n", "1 M1015T254 1014.985384 253.626177 5.0 \n", "2 M101T228 101.060060 228.125403 5.0 \n", "3 M102T849 102.034154 849.085350 NaN \n", "4 M102T899 102.034153 898.850160 NaN \n", "... ... ... ... ... \n", "1995 M865T700 865.244172 700.365420 NaN \n", "1996 M919T647 918.701782 646.988220 5.0 \n", "1997 M925T237_1 924.898294 236.964462 5.0 \n", "1998 M933T267 933.410460 266.976471 5.0 \n", "1999 M934T242 933.932365 242.395371 5.0 \n", "\n", " identifier \\\n", "0 ACMG_aqC18_POS_0389::ACMG_aqC18_POS_0389::ACMG... \n", "1 ACMG_aqC18_POS_0782::ACMG_aqC18_POS_0783::ACMG... \n", "2 ACMG_aqC18_POS_0654::ACMG_aqC18_POS_0654::ACMG... \n", "3 NaN \n", "4 NaN \n", "... ... \n", "1995 NaN \n", "1996 ACMG_aqC18_POS_1407::ACMG_aqC18_POS_1407::ACMG... \n", "1997 ACMG_aqC18_POS_0690::ACMG_aqC18_POS_0691::ACMG... \n", "1998 ACMG_aqC18_POS_0839::ACMG_aqC18_POS_0840::ACMG... \n", "1999 ACMG_aqC18_POS_0720::ACMG_aqC18_POS_0721::ACMG... \n", "\n", " metabolite_name \\\n", "0 LO488_Maleic acid::LO488_Maleic acid::LO321_L-... \n", "1 LO481_3-Hydroxydecanedioic acid::LIM3308_Suber... \n", "2 MS5018_Dimethyl maleate::MS5018_Dimethyl malea... \n", "3 NaN \n", "4 NaN \n", "... ... \n", "1995 NaN \n", "1996 LO05_Vitamin K1::LO05_Vitamin K1::LIM3314_Phyl... \n", "1997 LO306_Syringic acid::LO315_ortho-Hydroxyphenyl... \n", "1998 MS5012_Diethyl malonate::MS5019_Trimethylaceti... \n", "1999 LIM3312_Aspartame::MS5023_Ethyl levulinate::MS... \n", "\n", " rt_lib \\\n", "0 48.9::48.9::49.2::49.2::50.4::50.4::50.4::50.4... \n", "1 249.00000000000003::249.00000000000003::249.00... \n", "2 223.2::223.2::223.8::223.8::223.8::223.8::223.... \n", "3 NaN \n", "4 NaN \n", "... ... \n", "1995 NaN \n", "1996 642.6::642.6::643.2::643.2::647.1::647.1 \n", "1997 232.2::232.2::232.2::232.2::232.2::232.2::232.... \n", "1998 262.2::262.2::262.2::262.2::262.2::262.2::262.... 
\n", "1999 237.6::237.6::237.6::237.6::237.6::237.6::237.... \n", "\n", " inchikey \\\n", "0 JJVNINGBHGBWJH-UHFFFAOYSA-N::JJVNINGBHGBWJH-UH... \n", "1 FVWJYYTZTCVBKE-ROUWMTJPSA-N::TVZGACDUOSZQKY-UH... \n", "2 KIWQWJKWBHZMDT-UHFFFAOYSA-N::KIWQWJKWBHZMDT-UH... \n", "3 NaN \n", "4 NaN \n", "... ... \n", "1995 NaN \n", "1996 ZFDIRQKJPRINOQ-HYXAFXHYSA-N::ZFDIRQKJPRINOQ-HY... \n", "1997 AFBPFSWMIHJQDM-UHFFFAOYSA-N::OISVCGZHLKNMSJ-UH... \n", "1998 KEVYVLWNCKMXJX-UHFFFAOYSA-N::WTTJVINHCBCLGX-ZD... \n", "1999 MBDOYVRWFFCFHM-SNAWJCMRSA-N::XPFVYQJUAUNWIW-UH... \n", "\n", " ion_mod \n", "0 positive::negative::positive::negative::positi... \n", "1 positive::positive::negative::negative::positi... \n", "2 positive::negative::positive::positive::positi... \n", "3 NaN \n", "4 NaN \n", "... ... \n", "1995 NaN \n", "1996 positive::negative::positive::negative::positi... \n", "1997 positive::positive::positive::positive::positi... \n", "1998 positive::positive::positive::negative::negati... \n", "1999 positive::positive::positive::positive::negati... \n", "\n", "[2000 rows x 9 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sr" ] }, { "cell_type": "markdown", "id": "05503557", "metadata": {}, "source": [ "`mr` is the multiple-row format, in which a peak that matches more than one\n", "entry in the reference file has one row per match:" ] }, { "cell_type": "code", "execution_count": 7, "id": "95c9b44f", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:44.895776Z", "iopub.status.busy": "2025-12-04T12:29:44.895776Z", "iopub.status.idle": "2025-12-04T12:29:44.919033Z", "shell.execute_reply": "2025-12-04T12:29:44.918004Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>name</th><th>mz</th><th>rt</th><th>identifier</th><th>metabolite_name</th><th>rt_lib</th><th>inchikey</th><th>ion_mod</th><th>rt_range</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>ACMG_aqC18_POS_0389</td><td>LO488_Maleic acid</td><td>48.9</td><td>JJVNINGBHGBWJH-UHFFFAOYSA-N</td><td>positive</td><td>5.0</td></tr>\n",
"    <tr><th>1</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>ACMG_aqC18_POS_0389</td><td>LO488_Maleic acid</td><td>48.9</td><td>JJVNINGBHGBWJH-UHFFFAOYSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"    <tr><th>2</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>ACMG_aqC18_POS_0390</td><td>LO321_L-Theanine</td><td>49.2</td><td>SULYEHHGGXARJS-UHFFFAOYSA-N</td><td>positive</td><td>5.0</td></tr>\n",
"    <tr><th>3</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>ACMG_aqC18_POS_0390</td><td>LO321_L-Theanine</td><td>49.2</td><td>SULYEHHGGXARJS-UHFFFAOYSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"    <tr><th>4</th><td>M100T54</td><td>100.075925</td><td>53.810924</td><td>ACMG_aqC18_POS_0391</td><td>LO310_Dihydrothymine</td><td>50.4</td><td>YPTJKHVBDCRKNF-UHFFFAOYSA-N</td><td>positive</td><td>5.0</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>150218</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>ACMG_aqC18_POS_0775</td><td>LO35_2-Methoxybenzoic acid</td><td>247.2</td><td>RFKITWRHKUYMRJ-UHFFFAOYSA-N</td><td>positive</td><td>5.0</td></tr>\n",
"    <tr><th>150219</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>ACMG_aqC18_POS_0772</td><td>MS5015_Phenylglyoxal</td><td>247.2</td><td>QWIZNVHXZXRPDR-WSCXOGSTSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"    <tr><th>150220</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>ACMG_aqC18_POS_0773</td><td>MS5021_Ethyl 2-methylacetoacetate</td><td>247.2</td><td>BHTRKEVKTKCXOH-LBSADWJPSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"    <tr><th>150221</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>ACMG_aqC18_POS_0774</td><td>LO12_Homoveratrumic acid</td><td>247.2</td><td>SEBFKMXJBCUCAI-UHFFFAOYSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"    <tr><th>150222</th><td>M934T242</td><td>933.932365</td><td>242.395371</td><td>ACMG_aqC18_POS_0775</td><td>LO35_2-Methoxybenzoic acid</td><td>247.2</td><td>RFKITWRHKUYMRJ-UHFFFAOYSA-N</td><td>negative</td><td>5.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>150223 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " name mz rt identifier \\\n", "0 M100T54 100.075925 53.810924 ACMG_aqC18_POS_0389 \n", "1 M100T54 100.075925 53.810924 ACMG_aqC18_POS_0389 \n", "2 M100T54 100.075925 53.810924 ACMG_aqC18_POS_0390 \n", "3 M100T54 100.075925 53.810924 ACMG_aqC18_POS_0390 \n", "4 M100T54 100.075925 53.810924 ACMG_aqC18_POS_0391 \n", "... ... ... ... ... \n", "150218 M934T242 933.932365 242.395371 ACMG_aqC18_POS_0775 \n", "150219 M934T242 933.932365 242.395371 ACMG_aqC18_POS_0772 \n", "150220 M934T242 933.932365 242.395371 ACMG_aqC18_POS_0773 \n", "150221 M934T242 933.932365 242.395371 ACMG_aqC18_POS_0774 \n", "150222 M934T242 933.932365 242.395371 ACMG_aqC18_POS_0775 \n", "\n", " metabolite_name rt_lib \\\n", "0 LO488_Maleic acid 48.9 \n", "1 LO488_Maleic acid 48.9 \n", "2 LO321_L-Theanine 49.2 \n", "3 LO321_L-Theanine 49.2 \n", "4 LO310_Dihydrothymine 50.4 \n", "... ... ... \n", "150218 LO35_2-Methoxybenzoic acid 247.2 \n", "150219 MS5015_Phenylglyoxal 247.2 \n", "150220 MS5021_Ethyl 2-methylacetoacetate  247.2 \n", "150221 LO12_Homoveratrumic acid 247.2 \n", "150222 LO35_2-Methoxybenzoic acid 247.2 \n", "\n", " inchikey ion_mod rt_range \n", "0 JJVNINGBHGBWJH-UHFFFAOYSA-N positive 5.0 \n", "1 JJVNINGBHGBWJH-UHFFFAOYSA-N negative 5.0 \n", "2 SULYEHHGGXARJS-UHFFFAOYSA-N positive 5.0 \n", "3 SULYEHHGGXARJS-UHFFFAOYSA-N negative 5.0 \n", "4 YPTJKHVBDCRKNF-UHFFFAOYSA-N positive 5.0 \n", "... ... ... ... 
\n", "150218 RFKITWRHKUYMRJ-UHFFFAOYSA-N positive 5.0 \n", "150219 QWIZNVHXZXRPDR-WSCXOGSTSA-N negative 5.0 \n", "150220 BHTRKEVKTKCXOH-LBSADWJPSA-N negative 5.0 \n", "150221 SEBFKMXJBCUCAI-UHFFFAOYSA-N negative 5.0 \n", "150222 RFKITWRHKUYMRJ-UHFFFAOYSA-N negative 5.0 \n", "\n", "[150223 rows x 9 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mr" ] }, { "cell_type": "markdown", "id": "8837c041", "metadata": {}, "source": [ "All results can be saved into a `sqlite3` database and viewed with\n", "[DB Browser for SQLite](https://sqlitebrowser.org/). They can also be saved\n", "separately in other formats, such as TSV, CSV or XLSX." ] }, { "cell_type": "code", "execution_count": 8, "id": "bdc0a967", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:44.921039Z", "iopub.status.busy": "2025-12-04T12:29:44.921039Z", "iopub.status.idle": "2025-12-04T12:29:44.925914Z", "shell.execute_reply": "2025-12-04T12:29:44.925914Z" } }, "outputs": [], "source": [ "f_save = False # here we do NOT save results\n", "db_out = \"test.db\"\n", "sr_out = \"test_s.tsv\"\n", "mr_out = \"test_m.tsv\"\n", "xlsx_out = \"test.xlsx\"" ] }, { "cell_type": "code", "execution_count": 9, "id": "919b88f6", "metadata": { "execution": { "iopub.execute_input": "2025-12-04T12:29:44.925914Z", "iopub.status.busy": "2025-12-04T12:29:44.925914Z", "iopub.status.idle": "2025-12-04T12:29:44.934684Z", "shell.execute_reply": "2025-12-04T12:29:44.934684Z" } }, "outputs": [], "source": [ "if f_save:\n", "    # save all results into a sqlite3 database\n", "    conn = sqlite3.connect(db_out)\n", "    df[[\"name\", \"mz\", \"rt\"]].to_sql(\n", "        \"peaklist\", conn, if_exists=\"replace\", index=False\n", "    )\n", "    mr.to_sql(\"anno_mr\", conn, if_exists=\"replace\", index=False)\n", "    sr.to_sql(\"anno_sr\", conn, if_exists=\"replace\", index=False)\n", "\n", "    conn.commit()\n", "    conn.close()\n", "\n", "    # save results into text files\n", "    sr.to_csv(sr_out, sep=\"\\t\", index=False)\n", "    mr.to_csv(mr_out, sep=\"\\t\", index=False)\n", "\n", "    # save results into Excel format\n", "    with pd.ExcelWriter(xlsx_out, mode=\"w\", engine=\"openpyxl\") as writer:\n", "        sr.to_excel(writer, sheet_name=\"single-row\", index=False)\n", "        mr.to_excel(writer, sheet_name=\"multiple-row\", index=False)" ] }, { "cell_type": "markdown", "id": "a4256549", "metadata": {}, "source": [ "Note that saving an Excel file takes much longer than saving text\n", "files.\n", "\n", "## End User Usages\n", "\n", "`LiRTMaTS` provides two ways to run the matching: a command line interface\n", "(CLI) and a graphical user interface (GUI).\n", "\n", "To use the GUI, open a terminal and type:\n", "\n", "```bash\n", "$ lirtmats gui\n", "```\n", "\n", "To use the CLI, open a terminal and type the command with the required\n", "arguments, something like:\n", "\n", "```bash\n", "lirtmats cli \\\n", "  --input-data \"./data/df_pos_3.tsv\" \\\n", "  --input-sep \"tab\" \\\n", "  --col-idx \"1, 2, 3, 4\" \\\n", "  --rt-path \"\" \\\n", "  --rt-sep \"tab\" \\\n", "  --rt-tol \"5.0\" \\\n", "  --ion-mode \"pos\" \\\n", "  --save-db \\\n", "  --summ-type \"xlsx\"\n", "```\n", "\n", "Running this command produces `df_pos_3_rtm.db` and\n", "`df_pos_3_rtm.xlsx` in the directory `./data/`. If `summ-type` is `tsv`\n", "or `csv`, the files `df_pos_3_rtm_s.tsv` or `df_pos_3_rtm_s.csv` and\n", "`df_pos_3_rtm_m.tsv` or `df_pos_3_rtm_m.csv` are saved into `./data/`.\n", "\n", "As good practice, you can create a bash script `.sh` (Linux\n", "and macOS) or a Windows script `.bat` to hold these CLI\n", "arguments.
Change the parameters in these files each time you process a new\n", "data set.\n", "\n", "For example, there are `lirtmats_cli.sh` and `lirtmats_cli.bat` in\n", "https://github.com/wanchanglin/lirtmats/tree/master/examples.\n", "\n", "- For Linux and macOS terminal:\n", "\n", "  ```bash\n", "  $ chmod +x lirtmats_cli.sh\n", "  $ ./lirtmats_cli.sh\n", "  ```\n", "\n", "- For Windows terminal:\n", "\n", "  ```bash\n", "  $ lirtmats_cli.bat\n", "  ```\n", "\n", "Note that if `xlsx` files are used for the input data and reference file in\n", "the GUI or CLI, all data must be in the first sheet. If you call\n", "`LiRTMaTS` functions in your own Python scripts, there is no such\n", "requirement." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "main_language": "python", "notebook_metadata_filter": "-all" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 5 }