Blood-based tumor biomarker discovery

Dr Teck Yew Low joined the UKM Medical Molecular Biology Institute (UMBI; Kuala Lumpur, Malaysia), located at the National University of Malaysia (UKM; Malaysia), as an associate professor and senior research fellow. Before that, he obtained his PhD in Biochemistry from the National University of Singapore (NUS; Singapore) and worked on mass spectrometry-based proteomics at the Genome Institute of Singapore (GIS; Singapore), the Center for Experimental Bioinformatics (CEBI) at the University of Southern Denmark (SDU; Denmark) and the Netherlands Proteomics Center (NPC) at Utrecht University in the Netherlands. His research interests include bridging (phospho)proteomics with the clinic and cancer proteogenomics in the Malaysian population.

Keywords: Biomarker, liquid biopsy, proteomics, genomics, cell-free DNA (cf-DNA), circulating tumor DNA (ct-DNA)

In ancient Greece, body fluids were used as surrogates to predict health and disease status. It is documented that Hippocrates and Theophilus incorporated uroscopy (visual examination of urine) in their diagnostic protocols. Two thousand years later, body fluids, particularly plasma and serum, are still useful as proxies for diagnosis, albeit in the form of ‘biomarkers’. Although there are multiple ways to define what constitutes a biomarker, it is generally accepted that biomarkers are circulatory biomolecules that can be used as objective indicators of normal biological processes, pathogenic processes or pharmacological responses [1]. These biomolecules can help to answer a number of questions that assist clinical decision-making, including: what disease(s) is the patient suffering from?; at what stage is the disease?; how can the disease risks be best stratified and managed?; and does the health of the patient improve upon treatment? Ideal biomarkers should be sensitive, specific, cost-effective, easily obtainable and non-invasive [2]. Importantly, they should also be quantifiable, correlate well with disease severity and enable early detection.

Proteins are common analytes used in routine clinical tests and over 100 FDA-approved protein biomarkers have been used in the form of blood tests. Some examples of protein-based tumor biomarkers include alphafetoprotein, carcinoembryonic antigen and prostate-specific antigen. Because it can quantitatively differentiate between proteins and their modified isoforms, and between healthy and diseased specimens, proteomics has been widely applied to biomarker discovery. However, most biomarker candidates reported in proteomic studies come from small cohorts, owing to limited throughput. They are seldom extensively verified and rarely meet the stringent FDA criteria, resulting in little translation into clinical practice. Plasma/serum proteomics is also riddled with technical challenges. Notably, the wide dynamic range of protein concentrations in plasma often places lower-abundance proteins beyond the reach of even present-day mass spectrometers (MS) [3]. Though many formats of serum depletion have been introduced to remove the top N most abundant proteins, this is rarely sufficient and often distorts quantitative ratios [4]. Fortunately, continuous improvements in sample preparation, chromatographic separation and MS have slowly narrowed these gaps [5–7]. Recently, Geyer et al. reported a robust ‘plasma proteome profiling’ pipeline, i.e., an automated, single-run shotgun workflow without protein depletion. This pipeline enabled quantitative analysis of hundreds of plasma proteomes from single finger pricks with 20 min LC gradients [8].

Strategy-wise, a conventional proteomics biomarker discovery pipeline adopts a hierarchical procedure described as the ‘triangular approach’, which comprises three tiers [9,10]. Tier one typically involves shotgun or bottom-up experiments that are performed on selected controls and cases, and it contributes to the hypothesis-free discovery of a comprehensive list of statistically significant, differentially expressed proteins. In tier two, the validation phase, a number of differentially expressed proteins from tier one are selected for follow-up by targeted proteomics with selected reaction monitoring (SRM) in a larger and independent cohort. SRM assays are developed such that, for each protein of interest, a set of three to five proteotypic peptides is selected, followed by assessment of the retention times and fragmentation profiles of each peptide. Subsequently, a triple quadrupole MS is set up to continuously isolate and fragment only these peptides. Since MS monitoring is restricted to selected precursor-fragment pairs (transitions), sensitive and specific quantification can be achieved in a reproducible and high-throughput manner [11]. An alternative to SRM assays is the data-independent acquisition (DIA) technique, which can provide comprehensive and permanent records of all detectable molecular species. In DIA, fragment ion data are sequentially acquired across a defined set of m/z isolation windows covering the mass range of interest, followed by targeted data extraction based on spectral libraries for protein identification and quantification [12]. Finally, in tier three, a few validated biomarker candidates are further tested with well-established methods such as immunoassays in another new and larger cohort.
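As a rough illustration of the two acquisition schemes described above, the following Python sketch represents SRM transitions as precursor-fragment pairs and tiles a mass range with contiguous DIA isolation windows. The class and function names, the m/z range of 400 to 1200, the 25 m/z window width and all peptide names and m/z values are hypothetical placeholders chosen for illustration; they are not taken from the cited studies or from any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One SRM transition: a precursor m/z paired with a fragment m/z."""
    protein: str
    peptide: str
    precursor_mz: float
    fragment_mz: float


def dia_windows(mz_start, mz_end, width):
    """Tile [mz_start, mz_end) with contiguous, fixed-width isolation windows."""
    if width <= 0:
        raise ValueError("width must be positive")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


def window_index(precursor_mz, windows):
    """Return the index of the window containing precursor_mz, or None."""
    for index, (lower, upper) in enumerate(windows):
        if lower <= precursor_mz < upper:
            return index
    return None


# Hypothetical assay: placeholder peptides and m/z values, not measured data.
transitions = [
    Transition("PROTEIN_A", "PEPTIDE_1", 575.3, 937.5),
    Transition("PROTEIN_A", "PEPTIDE_1", 575.3, 694.4),
    Transition("PROTEIN_A", "PEPTIDE_2", 820.5, 1017.6),
]

windows = dia_windows(400.0, 1200.0, 25.0)

# In SRM only the listed pairs are monitored. In DIA every precursor inside a
# window is fragmented, and the same pairs are extracted afterwards.
for t in transitions:
    i = window_index(t.precursor_mz, windows)
    print(t.peptide, t.precursor_mz, "->", t.fragment_mz, "window", windows[i])
```

The sketch only captures the bookkeeping: an SRM method is a short list of transitions fixed before acquisition, whereas a DIA method is a window scheme, with the choice of which pairs to quantify deferred to data analysis.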

Recently, an alternative ‘rectangular strategy’ has been proposed [9]. With this approach, both discovery and validation cohorts consist of large groups of participants, and samples from both cohorts are measured in parallel using shotgun proteomics. An advantage of this method is that protein patterns characteristic of particular health or disease states can be obtained, in addition to single biomarker candidates. Subsequent accumulation of plasma proteome data potentially leads to a knowledge base that links proteome profiles to different ‘perturbations’, such as diseases, risks, treatments and lifestyles. Consequently, an individual’s plasma proteome measurement can be compared against this global knowledge base so as to deconvolute co-morbidities, guide treatment and monitor effectiveness. This is akin to cancer genomics, whereby the genomic profiles of individual patients are deciphered so as to reveal the genomic signatures as well as the molecular subtypes of cancers for guiding precision therapy [13]. Nevertheless, irrespective of whether the triangular or the rectangular strategy is used, biomarker discovery efforts typically necessitate well-funded, up-to-date infrastructure, well-curated cohorts, biobanking and multi-disciplinary expertise. This presents a significant barrier to entry for many independent scientists. That said, a number of dedicated centers that focus on industrial-scale proteomics for biomarker discovery have recently been established.
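To make the idea of comparing one individual's profile against a knowledge base more concrete, here is a minimal Python sketch that assigns a profile to the reference state with the highest Pearson correlation. This matching rule, the function names, the state labels and all numbers are assumptions made purely for illustration; the rectangular strategy as described in [9] does not prescribe this or any particular comparison method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def closest_state(profile, knowledge_base):
    """Return the label of the reference profile most correlated with profile."""
    return max(knowledge_base, key=lambda s: pearson(profile, knowledge_base[s]))


# Hypothetical knowledge base: each state maps to intensities of the same four
# proteins, in the same order. Labels and values are invented placeholders.
knowledge_base = {
    "reference_state_a": [10.0, 8.0, 2.0, 1.0],
    "reference_state_b": [2.0, 3.0, 9.0, 10.0],
}

individual = [9.0, 7.5, 3.0, 1.5]
print(closest_state(individual, knowledge_base))
```

A real knowledge base would hold hundreds of proteins per profile and many states, and would need to account for technical and biological variation before any such comparison is meaningful.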

On a separate note, cell-free DNA (cfDNA) has been found to be consistently released into the circulation by different populations of cells; cfDNA released from cancerous cells is termed circulating tumor DNA (ctDNA). Since oncogenesis often involves various genomic alterations, including single nucleotide variants, copy number variations and chromosomal structural rearrangements, such genetic variants may be detected in ctDNA, thanks to advances in next-generation sequencing (NGS) technologies [14]. A blood test that allows ctDNA or circulating tumor cells to be detected is aptly named a liquid biopsy, in contrast to a surgical biopsy, which is excised from tissue. Besides ctDNA, circulating tumor cells and exosomes that may be tumor-specific, aberrant miRNA and lncRNA can also be found in a liquid biopsy. It has been documented that ctDNA can be detected in many types of cancers [15].

Despite the robustness of NGS/TGS and other molecular biology-based techniques in comparison to MS-based proteomics, it has been reported that many early-stage tumors do not release amounts of ctDNA that are readily detectable by even the most sensitive techniques [16]. In 2018, Cohen et al. reported a newly developed blood test named CancerSEEK, a multi-analyte panel that evaluates the presence of cancer mutations and eight cancer-associated protein biomarkers in blood [17]. In this test, multiplex PCR is used to detect ctDNA mutations in 16 genes, spanning 2001 genomic positions, while immunoassays are simultaneously used to evaluate eight protein biomarkers. When CancerSEEK was applied to 1005 patients with non-metastatic cancers of the ovary, liver, stomach, pancreas, esophagus, colorectum, lung or breast, it yielded positive results in a median of 70% of the eight cancer types, with sensitivities ranging from 69 to 98% for the detection of five cancer types (ovary, liver, stomach, pancreas and esophagus) for which there are no screening tests available for average-risk individuals.
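For readers less familiar with how such figures are derived, the Python sketch below computes sensitivity and specificity for a toy multi-analyte test. The combination rule used here (a sample is called positive if a mutation is detected or any protein marker exceeds its threshold) is a deliberate simplification assumed for illustration and is not the classifier used in the cited study; the marker names, thresholds and samples are likewise invented.

```python
def sensitivity(tp, fn):
    """Fraction of true cancer cases that the test calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of cancer-free individuals that the test calls negative."""
    return tn / (tn + fp)


def multi_analyte_call(mutation_detected, protein_levels, thresholds):
    """Hypothetical OR rule: positive if either analyte class is positive."""
    protein_positive = any(
        protein_levels[name] > cutoff for name, cutoff in thresholds.items()
    )
    return mutation_detected or protein_positive


def confusion_counts(samples, thresholds):
    """Tally true/false positives and negatives over labelled samples."""
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for has_cancer, mutation_detected, protein_levels in samples:
        called = multi_analyte_call(mutation_detected, protein_levels, thresholds)
        if has_cancer:
            counts["tp" if called else "fn"] += 1
        else:
            counts["fp" if called else "tn"] += 1
    return counts


# Invented thresholds and samples: (has_cancer, mutation_detected, protein levels).
thresholds = {"marker_a": 10.0, "marker_b": 5.0}
samples = [
    (True, True, {"marker_a": 4.0, "marker_b": 1.0}),
    (True, False, {"marker_a": 12.0, "marker_b": 1.0}),
    (True, False, {"marker_a": 3.0, "marker_b": 2.0}),
    (False, False, {"marker_a": 2.0, "marker_b": 1.0}),
    (False, False, {"marker_a": 1.0, "marker_b": 6.0}),
    (False, False, {"marker_a": 5.0, "marker_b": 4.0}),
]

counts = confusion_counts(samples, thresholds)
print(counts)
print(sensitivity(counts["tp"], counts["fn"]), specificity(counts["tn"], counts["fp"]))
```

In this toy data the second cancer sample is caught only by the protein marker, which shows how adding a second analyte class can raise sensitivity, while the false positive among the controls shows the cost in specificity.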

In summary, proteomics-based tumor biomarker discovery has come a long way since the days of two-dimensional gels. The expression of proteins and the associated post-translational modifications are phenomena that manifest rapid adaptability, in contrast to genomic variants that remain relatively static over time. Thus, a multi-biomarker panel may need to incorporate different classes of biomolecules so as to narrate the disease history in a holistic manner.


  1. Strimbu K, Tavel JA. What are biomarkers? Curr. Opin. HIV AIDS. 5(6), 463–466 (2010).
  2. Institute of Medicine (US) Forum on Drug Discovery D and T. Qualifying Biomarkers. (2008).
  3. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1(11), 845–867 (2002).
  4. Lee PY, Osman J, Low TY, Jamal R. Plasma/serum proteomics: depletion strategies for reducing high-abundance proteins for biomarker discovery. Bioanalysis 11(19), 1799–1812 (2019).
  5. Krieger JR, Wybenga-Groot LE, Tong J, Bache N, Tsao MS, Moran MF. Evosep one enables robust deep proteome coverage using tandem mass tags while significantly reducing instrument time. J. Proteome Res. 18(5), 2346–2353 (2019).
  6. Bekker-Jensen DB, Martinez-Val A, Steigerwald S et al. A compact quadrupole-orbitrap mass spectrometer with FAIMS interface improves proteome coverage in short LC gradients. Mol. Cell. Proteomics mcp.TIR119.001906 (2020).
  7. Meier F, Brunner AD, Koch S et al. Online parallel accumulation–serial fragmentation (PASEF) with a novel trapped ion mobility mass spectrometer. Mol. Cell. Proteomics 17(12), 2534–2545 (2018).
  8. Geyer PE, Kulak NA, Pichler G, Holdt LM, Teupser D, Mann M. Plasma proteome profiling to assess human health and disease. Cell Syst. 2(3), 185–195 (2016).
  9. Geyer PE, Holdt LM, Teupser D, Mann M. Revisiting biomarker discovery by plasma proteomics. Mol. Syst. Biol. 13(9), 942 (2017).
  10. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24(8), 971–983 (2006).
  11. Borràs E, Sabidó E. What is targeted proteomics? A concise revision of targeted acquisition and targeted data analysis in mass spectrometry. Proteomics 17(17–18), 1700180 (2017).
  12. Aebersold R, Bensimon A, Collins BC, Ludwig C, Sabido E. Applications and developments in targeted proteomics: from SRM to DIA/SWATH. Proteomics 16(15–16), 2065–2067 (2016).
  13. Berger MF, Mardis ER. The emerging clinical relevance of genomics in cancer medicine. Nat. Rev. Clin. Oncol. 15(6), 353–365 (2018).
  14. Domínguez-Vigil IG, Moreno-Martínez AK, Wang JY, Michael H, Roehrl A, Barrera-Saldaña HA. The dawn of the liquid biopsy in the fight against cancer. 9(2), 2912–2922 (2018).
  15. Heitzer E, Ulz P, Geigl JB. Circulating tumor DNA as a liquid biopsy for cancer. Clin. Chem. 61(1), 112–123 (2015).
  16. Bettegowda C, Sausen M, Leary RJ et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6(224) (2014).
  17. Cohen JD, Li L, Wang Y et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359(6378), 926–930 (2018).


The author would like to thank SCIEX Malaysia for technical and research support; and Dr Tan Shing Cheng for critical proofreading of the manuscript.

The opinions expressed in this feature are those of the author and do not necessarily reflect the views of Bioanalysis Zone or Future Science Group.