A Comprehensive Survey of Recent Approaches on Microarray Image Data

  • Survey Article
  • Published: 06 December 2023
  • Volume 5, article number 64 (2024)

  • C. K. Roopa (ORCID: orcid.org/0000-0002-8332-0901) 1,
  • M. P. Priya (ORCID: orcid.org/0000-0002-4465-3930) 1,
  • B. S. Harish (ORCID: orcid.org/0000-0001-5495-0640) 1 &
  • M. S. Maheshan (ORCID: orcid.org/0000-0002-3330-8795) 1

Microarray image processing techniques are used to study gene expression captured in the form of images. They allow genomic studies to extract useful information about gene expression without sequencing, which gives microarray images broad scope in bioinformatics. Processing the information embedded in these images lays a foundation for biological interpretation. Microarray image analysis is needed for applications such as gene discovery, disease diagnosis and treatment, taxonomic research, and drug discovery. This review discusses the various types of microarray data and the stages involved in processing microarray images. A substantial body of literature compares microarray image processing techniques with the aim of extracting gene expression effectively. This paper describes the latest microarray image processing approaches, tools, datasets, experiments, and results, together with a comparative study of the computational methods applied in the literature. It also highlights the challenges faced in the pre-processing, gridding, spot segmentation, and recognition phases, which can help researchers identify effective methods for classifying gene expression. Further, the paper gives an overview of the standard datasets used by researchers and details how microarray data can be created independently, which can be helpful for experimentation. Gene expression data extracted by microarray image processing techniques represent biological data and help integrate computational techniques with bioinformatics, leaving an open area of research for further development.

Harikiran J, Ramakrishna D, Avinash B, Lakshmi PV, KiranKumar R. A new method of gridding for spot detection in microarray images. Computer Engineering and Intelligent Systems. 2014;5(3):25–33.

Google Scholar  

Bajcsy, P. (2005, September). An overview of DNA microarray image requirements for automated processing. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)-Workshops (pp. 147–147). IEEE.

Alonso-Betanzos, A., Bolón-Canedo, V., Morán-Fernández, L., & Sánchez-Maroño, N. (2019). A review of microarray datasets: where to find them and specific characteristics. Microarray Bioinformatics, 65–85.

Belean, B., Borda, M., & Fazakas, A. (2008, September). Adaptive microarray image acquisition system and microarray image processing using FPGA technology. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 327–334). Springer, Berlin, Heidelberg.

Bajcsy, P., Liu, L., & Band, M. (2007). DNA microarray image processing. DNA Array Image Anal. Nuts Bolts (Nuts Bolts Ser, 1–77.

Lukac R, Plataniotis KN, Smolka B, Venetsanopoulos AN. cDNA microarray image processing using fuzzy vector filtering framework. Fuzzy Sets Syst. 2005;152(1):17–35.

Article   MathSciNet   MATH   Google Scholar  

Joseph SM, Sathidevi PS. A fully automated gridding technique for real composite cdna microarray images. IEEE Access. 2020;8:39605–22.

Article   Google Scholar  

Bajcsy P. Gridline: automatic grid alignment DNA microarray scans. IEEE Trans Image Process. 2004;13(1):15–25.

Article   MathSciNet   Google Scholar  

Biju, V. G., & Mythili, P. (2015). Microarray Image Gridding using Grid line Refinement Technique. ICTACT Journal on Image & Video Processing, 5(4).

Hirata Jr, R., Barrera, J., Hashimoto, R. F., & Dantas, D. O. (2001, October). Microarray Gridding by Mathematical Morphology. In sibgrapi (pp. 112–119).

Rueda, L. (2007, December). Sub-grid detection in DNA microarray images. In Pacific-Rim Symposium on Image and Video Technology (pp. 248–259). Springer, Berlin, Heidelberg.

Lukac R, Plataniotis KN. Vector edge operators for cDNA microarray spot localization. Comput Med Imaging Graph. 2007;31(7):510–22.

Angulo J, Serra J. Automatic analysis of DNA microarray images using mathematical morphology. Bioinformatics. 2003;19(5):553–62.

Shirani, S. (2018). 5 Non-Statistical Segmentation Methods for DNA Microarray Images. Microarray Image and Data Analysis: Theory and Practice, 129.

Bajcsy P. An overview of DNA microarray grid alignment and foreground separation approaches. EURASIP Journal on Advances in Signal Processing. 2006;2006:1–13.

Karimi N, Samavi S, Shirani S, Banaei A, Nasr-Esfahani E. Real-time lossless compression of microarray images by separate compaction of foreground and background. Computer Standards & Interfaces. 2015;39:34–43.

Farouk RM, SayedElahl MA. Microarray spot segmentation algorithm based on integro-differential operator. Egyptian Informatics Journal. 2019;20(3):173–8.

Eisen, M. B., & Brown, P. O. (1999). [12] DNA arrays for analysis of gene expression. In Methods in enzymology (Vol. 303, pp. 179–205). Academic Press.

El-Gawady AS, Eltoukhy MM, El-Tawel G, Wahed ME. Segmentation of complementary DNA microarray images using marker-controlled watershed technique. International Journal of Computer Applications. 2015;975:8887.

Deepa, J., & Thomas, T. (2009, December). Automatic segmentation of DNA microarray images using an improved seeded region growing method. In 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC) (pp. 1469–1474). IEEE.

Abdulrahman, A., & Varol, S. (2020, June). A Review of Image Segmentation Using MATLAB Environment. In 2020 8th International Symposium on Digital Forensics and Security (ISDFS) (pp. 1–5). IEEE.

Wu, S., & Yan, H. (2003, February). Microarray image processing based on clustering and morphological analysis. In APBC (pp. 111–118).

Shao G, Li D, Zhang J, Yang J, Shangguan Y. Automatic microarray image segmentation with clustering-based algorithms. PLoS ONE. 2019;14(1): e0210075.

Ho J, Hwang WL. Automatic microarray spot segmentation using a snake-fisher model. IEEE Trans Med Imaging. 2008;27(6):847–57.

Athanasiadis E, Cavouras D, Kostopoulos S, Glotsos D, Kalatzis I, Nikiforidis G. A wavelet-based Markov random field segmentation model in segmenting microarray experiments. Comput Methods Programs Biomed. 2011;104(3):307–15.

Katzer M, Kummert F, Sagerer G. Methods for automatic microarray image segmentation. IEEE Trans Nanobiosci. 2003;2(4):202–14.

Said KAM, Jambek AB, Sulaiman N. A study of image processing using morphological opening and closing processes. International Journal of Control Theory and Applications. 2016;9(31):15–21.

In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)-Workshops (pp. 147–147). IEEE.

Farouk, R. M., Badr, S., & Elahl, M. S. (2014). Recognition of cDNA microarray image using feedforward artificial Neural Network. arXiv preprint arXiv:1410.2381 .

Manjunath SS, Shreenidhi BS, Nagaraja J, Pradeep BS. Morphological spot detection and analysis for microarray images. International Journal of Innovative Technology and Exploring Engineering. 2013;2(5):189–93.

Belean B, Borda M, Ackermann J, Koch I, Balacescu O. Unsupervised image segmentation for microarray spots with irregular contours and inner holes. BMC Bioinformatics. 2015;16(1):1–12.

Jain AN, Tokuyasu TA, Snijders AM, Segraves R, Albertson DG, Pinkel D. Fully automatic quantification of microarray image data. Genome Res. 2002;12(2):325–32.

Linder, N., Konsti, J., Turkki, R., Rahtu, E., Lundin, M., Nordling, S., ... & Lundin, J. (2012). Identification of tumor epithelium and stroma in tissue microarrays using texture analysis. Diagnostic pathology, 7(1), 1–11.

Groch, K., Kuklin, A., Petrov, A., & Shams, S. (2001). Image segmentation and quality control measures in microarray image analysis. JALA: Journal of the Association for Laboratory Automation, 6(3), 73–76.

A Alkhaldi, N., Abdulaziz Abdullah Alsedais, R., Halawani, H. T., & Abdelkhalek Aboutaleb, S. M. (2022). Manta Ray Foraging Optimization with Vector Quantization Based Microarray Image Compression Technique. Computational Intelligence and Neuroscience, 2022.

Quackenbush J. Microarray data normalization and transformation. Nat Genet. 2002;32(4):496–501.

Quackenbush J. Computational analysis of microarray data. Nat Rev Genet. 2001;2(6):418–27.

Hern, M., Munoz-Gómez, J., Blanes, I., Marcellin, M. W., & Serra-Sagrista, J. (2012, April). DNA microarray image coding. In 2012 Data Compression Conference (pp. 32–41). IEEE.

Liu, Y. Z. N. Y., Ning, Z., Chen, Y., Guo, M., Liu, Y., Gali, N. K., ... & Lan, K. (2020). Aerodynamic characteristics and RNA concentration of SARS-CoV-2 aerosol in Wuhan hospitals during COVID-19 outbreak. bioRxiv. 2020. DOI, 10(2020.03), 08–982637.

Wang, D., Hu, B., Hu, C., Zhu, F., Liu, X., Zhang, J., ... & Peng, Z. (2020). Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. Jama, 323(11), 1061–1069.

Nagarajan R. Intensity-based segmentation of microarray images. IEEE Trans Med Imaging. 2003;22(7):882–9.

Kashyap RAMGOPAL, Gautam PRATIMA. Microarray image segmentation using improved GOGAC method. International Journal of Computer Science and Engineering (IJCSE). 2013;2(4):67–74.

Zhang, Y., Szustakowski, J., & Schinke, M. (2009). Bioinformatics analysis of microarray data. Cardiovascular Genomics, 259–284.

Belean B, Gutt R, Costea C, Balacescu O. Microarray image analysis: from image processing methods to gene expression levels estimation. IEEE Access. 2020;8:159196–205.

Liang P, Pardee AB. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science. 1992;257(5072):967–71.

Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7(1):1–13.

Zeebaree, D. Q., Haron, H., & Abdulazeez, A. M. (2018, October). Gene selection and classification of microarray data using convolutional neural network. In 2018 International Conference on Advanced Science and Engineering (ICOASE) (pp. 145–150). IEEE.

Lukac R, Plataniotis KN, Smolka B, Venetsanopoulos AN. A multichannel order-statistic technique for cDNA microarray image processing. IEEE Trans Nanobiosci. 2004;3(4):272–85.

Fard, P. J. M., & Moradi, M. H. (2009). Micro Array Images Segmentation Using a Novel Approach. In World Congress on Medical Physics and Biomedical Engineering, September 7–12. Munich, Germany. Berlin, Heidelberg: Springer; 2009. p. 1520–3.

Uslan V, Bucak IÖ. Microarray image segmentation using clustering methods. Mathematical and Computational Applications. 2010;15(2):240–7.

Article   MATH   Google Scholar  

Liu, Y., & Sha, X. Z. (2010, June). Automatic Recognition of Microarray Images Using Projection Algorithm. In 2010 4th International Conference on Bioinformatics and Biomedical Engineering (pp. 1–4). IEEE.

Ayyad SM, Saleh AI, Labib LM. A new distributed feature selection technique for classifying gene expression data. Int J Biomath. 2019;12(04):1950039.

Chen Y, Meyer CA, Liu T, Li W, Liu JS, Liu XS. MM-ChIP enables integrative analysis of cross-platform and between-laboratory ChIP-chip or ChIP-seq data. Genome Biol. 2011;12(2):1–10.

Sealfon, S. C., & Chu, T. T. (2011). RNA and DNA microarrays. In Biological microarrays (pp. 3–34). Humana Press, Totowa, NJ.

Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS ONE. 2014;9(1): e78644.

Smyth GK, Speed T. Normalization of cDNA microarray data. Methods. 2003;31(4):265–73.

Liu, Y., Ji, Y., Li, M., Wang, M., Yi, X., Yin, C., ... & Xiao, Y. (2018). Integrated analysis of long noncoding RNA and mRNA expression profile in children with obesity by microarray analysis. Scientific reports, 8(1), 1–13.

Shao, G., Wang, T., Hong, W., & Chen, Z. (2013, April). An improved SVM method for cDNA microarray image segmentation. In 2013 8th International Conference on Computer Science & Education (pp. 391–395). IEEE.

Hedde, P. N., Abram, T. J., Jain, A., Nakajima, R., de Assis, R. R., Pearce, T., ... & Zhao, W. (2020). A modular microarray imaging system for highly specific COVID-19 antibody testing. Lab on a Chip, 20(18), 3302–3309.

Joseph, S. M., & Sathidevi, P. S. (2019, October). CDNA microarray image enhancement for effective gridding of spots. In TENCON 2019–2019 IEEE Region 10 Conference (TENCON) (pp. 326–331). IEEE.

Wang XH, Istepanian RS, Song YH. Application of wavelet modulus maxima in microarray spots recognition. IEEE Trans Nanobiosci. 2003;2(4):190–2.

Gan, Z., Zou, F., Zeng, N., Xiong, B., Liao, L., Li, H., ... & Du, M. (2019). Wavelet denoising algorithm based on NDOA compressed sensing for fluorescence image of microarray. IEEE access, 7, 13338–13346.

Baans, O. S., & Jambek, A. B. (2017, September). Software profiling analysis for DNA microarray image processing algorithm. In 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) (pp. 129–132). IEEE.

Bariamis D, Maroulis D, Iakovidis DK. Unsupervised SVM-based gridding for DNA microarray images. Comput Med Imaging Graph. 2010;34(6):418–25.

Ahmad, M. M., Jambek, A. B., & bin Mashor, M. Y. (2014, August). A study on microarray image gridding techniques for DNA analysis. In 2014 2nd International Conference on Electronic Design (ICED) (pp. 171–175). IEEE.

Zacharia E, Maroulis D. An original genetic approach to the fully automatic gridding of microarray images. IEEE Trans Med Imaging. 2008;27(6):805–13.

Karthik SA, Manjunath SS. Automatic gridding of noisy microarray images based on coefficient of variation. Informatics in Medicine Unlocked. 2019;17: 100264.

Harikiran, J., Avinash, B., LAKSHMI, P., & Kirankumar, R. (2014). AUTOMATIC GRIDDING METHOD FOR MICROARRAY IMAGES. Journal of Theoretical & Applied Information Technology, 65(1).

Katzer, M., Kummert, F., & Sagerer, G. (2003, March). A Markov random field model of microarray gridding. In Proceedings of the 2003 ACM symposium on Applied Computing (pp. 72–77).

Bengtsson A, Bengtsson H. Microarray image analysis: background estimation using quantile and morphological filters. BMC Bioinformatics. 2006;7(1):1–15.

Neekabadi, A., Samavi, S., Razavi, S. A., Karimi, N., & Shirani, S. (2007, September). Lossless microarray image compression using region based predictors. In 2007 IEEE International Conference on Image Processing (Vol. 2, pp. II-349). IEEE.

Kondisetty DP, Hussain MA. A novel approach for cDNA image segmentation using SLIC based SOM methodology. International Journal of Engineering and Technology (UAE). 2018;7(2):52–5.

Saberkari H, Bahrami S, Shamsi M, Amoshahy MJ, Ghavifekr HB, Sedaaghi MH. Fully automated complementary DNA microarray segmentation using a novel fuzzy-based algorithm. Journal of Medical Signals and Sensors. 2015;5(3):182.

Giannakeas N, Karvelis PS, Exarchos TP, Kalatzis FG, Fotiadis DI. Segmentation of microarray images using pixel classification—Comparison with clustering-based methods. Comput Biol Med. 2013;43(6):705–16.

Siddiqui, K. I., Hero, A. O., & Siddiqui, M. M. (2002, November). Mathematical morphology applied to spot segmentation and quantification of gene microarray images. In Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002. (Vol. 1, pp. 926–930). IEEE.

Hira, Z. M., & Gillies, D. F. (2015). A review of feature selection and feature extraction methods applied on microarray data. Advances in bioinformatics, 2015.

Farouk, R. M., & SayedElahl, M. A. (2016). Robust cDNA microarray image segmentation and analysis technique based on Hough circle transform. arXiv preprint arXiv:1603.07123 .

Wang, F., Huang, S., Gao, R., Zhou, Y., Lai, C., Li, Z., ... & Liu, L. (2020). Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility. Cell discovery, 6(1), 1–16.

Tarca AL, Romero R, Draghici S. Analysis of microarray experiments of gene expression profiling. Am J Obstet Gynecol. 2006;195(2):373–88.

Aharoni A, Vorst O. DNA microarrays for functional plant genomics. Plant Mol Biol. 2002;48(1):99–118.

Blohm DH, Guiseppi-Elie A. New developments in microarray technology. Curr Opin Biotechnol. 2001;12(1):41–7.

Lu H, Chen J, Yan K, Jin Q, Xue Y, Gao Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing. 2017;256:56–62.

Dashtban M, Balafar M. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics. 2017;109(2):91–107.

Tsai, T. H., Yang, C. P., Tsai, W. C., & Chen, P. H. (2007, October). Error Reduction on Automatic Segmentation in Microarray Image. In 2007 IEEE Workshop on Signal Processing Systems (pp. 76–81). IEEE.

Genome research issue, https://genome.cshlp.org . January 2023.

Asaph Aharoni OV. DNA microarrays for functional plant genomics. Plant Mol Biol, Springer. 2002;48:99–118.

GenePix 4000A User's Guide. In GenePix 4000A User's Guide. Union City, CA, USA: s.l. : Axon Instruments. UserGuide, 1999.

Temme, J. S., & Gildersleeve, J. C. (2022). General strategies for glycan microarray data processing and analysis. Glycan Microarrays: Methods and Protocols, 67–87.

Ahmed, S. T., & Kadhem, S. M. (2022). Early Alzheimer's Disease Detection Using Different Techniques Based on Microarray Data: A Review. International Journal of Online & Biomedical Engineering, 16(4).

Akune Y, Arpinar S, Silva LM, Palma AS, Tajadura-Ortega V, Aoki-Kinoshita KF, Feizi T. CarbArrayART: a new software tool for carbohydrate microarray data storage, processing, presentation, and reporting. Glycobiology. 2022;32(7):552–5.

Alrefai N, Ibrahim O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput Appl. 2022;34(16):13513–28.

Zaffino, P., & Spadea, M. F. (2022). Algorithms to Preprocess Microarray Image Data. Microarray Data Analysis, 69–78.

Baans, O. S., & Jambek, A. B. (2019). Background correction method for DNA microarray image processing. Asia-Pacific Journal of Molecular Biology and Biotechnology, 27(3).

Osama, S., Shaban, H., & Ali, A. A. (2022). Gene reduction and machine learning algorithms for cancer classification based on microarray gene expression data: A comprehensive review. Expert Systems with Applications, 118946.

Download references

Author information

Authors and affiliations.

Department of Information Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka State, India

C. K. Roopa, M. P. Priya, B. S. Harish & M. S. Maheshan

Corresponding author

Correspondence to B. S. Harish .

Ethics declarations

Conflict of interest.

The authors whose names are listed certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Progresses in Image Processing” guest edited by P. Nagabhushan, Peter Peer, Partha Pratim Roy and Satish Kumar Singh.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Roopa, C.K., Priya, M.P., Harish, B.S. et al. A Comprehensive Survey of Recent Approaches on Microarray Image Data. SN COMPUT. SCI. 5 , 64 (2024). https://doi.org/10.1007/s42979-023-02352-5

Received : 13 February 2023

Accepted : 20 September 2023

Published : 06 December 2023

DOI : https://doi.org/10.1007/s42979-023-02352-5

  • Microarray image processing
  • Bioinformatics
  • Computational methods

  • Research Article
  • Open access
  • Published: 01 September 2007

DNA Microarrays: a Powerful Genomic Tool for Biomedical and Clinical Research

  • Victor Trevino 1,2,
  • Francesco Falciani 2 &
  • Hugo A. Barrera-Saldaña 3

Molecular Medicine, volume 13, pages 527–541 (2007)

Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridization devices referred to as microarrays. Initially designed to measure gene transcriptional levels, microarray technologies are now used for comparing other genome features among individuals and their tissues and cells. Results provide valuable information on disease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regulatory mechanisms, and subtle variations, and move us closer to the era of personalized medicine. To understand this powerful tool, its versatility, and how dramatically it is changing the molecular approach to biomedical and clinical research, this review describes the technology and its applications, gives a didactic step-by-step review of a typical microarray protocol, and walks through a real experiment. Finally, it calls the attention of the medical community to the importance of integrating multidisciplinary teams to take advantage of this technology and its expanding applications that, on a single slide, reveal our genetic inheritance and destiny.

Introduction

Genomics approaches have changed the way we do research in biology and medicine. We can now measure the majority of mRNAs, proteins, metabolites, protein-protein interactions, genomic mutations, polymorphisms, epigenetic alterations, and microRNAs in a single experiment. The data generated by these methods, together with the knowledge derived from their analysis, were unimaginable just a few years ago. These techniques, however, produce such large amounts of data that making sense of them is a difficult task. So far, DNA microarray technologies are perhaps the most successful and mature methodologies for high-throughput and large-scale genomic analyses.

DNA microarray technologies were initially designed to measure the transcriptional levels of RNA transcripts derived from thousands of genes within a genome in a single experiment. This technology has made it possible to relate physiological cell states to gene expression patterns for studying tumors, disease progression, cellular response to stimuli, and drug target identification. For example, subsets of genes with increased and decreased activities (referred to as transcriptional profiles or gene expression "signatures") have been identified for acute lymphoblastic leukemia (1), breast cancer (2), prostate cancer (3), lung cancer (4), colon cancer (5), multiple tumor types (6), apoptosis induction (7), tumorigenesis (8), and drug response (9). Moreover, because the volume of published data is increasing every day, integrated analyses of several studies, or "meta-analyses," have been proposed in the literature (10). These approaches detect generalities and particularities of gene expression in diseases.

More recent uses of DNA microarrays in biomedical research are not limited to gene expression. DNA microarrays are being used to detect single nucleotide polymorphisms (SNPs) in our genome (HapMap project) (11), aberrations in methylation patterns (12), alterations in gene copy number (13), and alternative RNA splicing (14), and for pathogen detection (15, 16).

In the last 10 to 15 years, high-quality arrays, standardized hybridization protocols, accurate scanning technologies, and robust computational methods have established DNA microarrays for gene expression as a powerful, mature, and easy-to-use essential genomic tool. Although identifying the most relevant information from microarray experiments is still under active research, very well established methods are available for a broad spectrum of experimental setups. In this publication, we present the most common uses of DNA microarray technologies, provide an overview of their frequent biomedical applications, describe the steps of a typical laboratory procedure, guide the reader through the processing of a real experiment to detect differentially expressed genes, and list valuable web-based microarray data and software repositories.

Technology Description

It is well known that complementary single-stranded sequences of nucleic acids form double-stranded hybrids. This property is the basis of very powerful molecular biology tools such as Southern and Northern blots, in situ hybridization, and the polymerase chain reaction (PCR). In these, specific single-stranded DNA sequences are used to probe for their complementary sequences (DNA or RNA), forming hybrids. The same idea is used in DNA microarray technologies. The aim, however, is not only to detect but also to measure the expression levels of not a few but rather thousands of genes in the same experiment. For this purpose, thousands of single-stranded sequences that are complementary to target sequences are bound, synthesized, or spotted onto a glass support whose size is similar to that of a typical microscope slide. There are mainly two types of DNA arrays, depending on the type of spotted probes. One uses small single-stranded oligonucleotides (~22 nt) synthesized in situ, whose leading provider is Affymetrix (Santa Clara, CA, USA, http://www.affymetrix.com). The other type uses complementary DNA (cDNA) obtained by reverse transcription of the genes' messenger RNAs (mRNA), completion of the second strand, cloning of the double-stranded DNAs, and typically PCR amplification of their open reading frames (ORFs), which become the bound probes. One limitation of using large ORF or cDNA sequences is that their optimal melting temperatures are uneven because of differences in their sizes and GC content. A second problem is cross-hybridization of closely related sequences, overlapping genes, and splice variants. In oligo-based DNA arrays, the targeted nucleic acid species is detected redundantly by designing several complementary oligonucleotides that span each target sequence in segments. The oligonucleotides are designed to avoid the drawbacks of cDNA probes and to maximize specificity for the target gene.

Initially, DNA arrays were based on nylon membranes, which are still in use. However, glass provides an excellent support for attaching the nucleotide sequences, is less sensitive to light than membranes, and is non-porous, allowing the use of very small amounts of sample. A more recent and different technology uses designed oligonucleotide probes attached to beads that are deposited randomly on a support. The position of each bead, and hence the sequence it carries, is determined by a complex pseudo-sequencing process. These types of arrays, provided by Illumina (San Diego, CA, USA, http://www.illumina.com), are mainly used for genotyping, copy-number measurements, sequencing, and detecting loss of heterozygosity (LOH), allele-specific expression, and methylation. A recent review of this technology has been published elsewhere (17). For clinical research, however, the preferred technology so far is oligo-based microarrays, whose leading provider is Affymetrix.

The general process in microarray experiments is depicted in Figure 1. Fluorescent dyes are used to label the extracted mRNAs or amplified cDNAs from the tissue or cell samples to be analyzed. The DNA array is then hybridized with the labeled sample(s) by incubating, usually overnight, and then washed to remove non-specific hybrids. A laser excites the attached fluorescent dyes to produce light, which is detected by a (confocal) scanner. The scanner generates a digital image of the excited microarray. The digital image is further processed by specialized software to transform the image of each spot into a numerical reading. This is done by first finding the specific location and shape of each spot, then integrating (summing) the intensities inside the defined spot, and finally estimating the surrounding background noise. Background noise is generally subtracted from the integrated signal. The final reading is an integer value assumed to be proportional to the concentration of the target sequence in the sample to which the probe in the spot is directed. In competitive two-dye assays, the reading is transformed into a ratio equal to the relative abundance of the target sequence (labeled with one type of fluorochrome) in a sample with respect to a reference sample (labeled with another type of fluorochrome). In the one-dye Affymetrix technologies, the fluorescence is commonly yellow, whereas in two-dye technologies the colors used are green for the reference and red for the sample (although a replicate using dye-swap is common). The choice of the most appropriate technology depends on experimental design, availability, costs, and the expected number of expression changes. In general, when only a minority of the genes is expected to change, a two-dye or reference design is more suitable; otherwise, a one-dye technology may be more appropriate.
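The spot quantification steps described above (integrate the intensities inside the spot, estimate the local background, subtract it, and form a sample-to-reference ratio for two-dye assays) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spot is treated as a circle with a known center and radius, the background is taken as the median of a surrounding ring, and the function names `quantify_spot` and `two_dye_ratio` are hypothetical. Real image analysis packages use more elaborate gridding and segmentation.

```python
from statistics import median


def quantify_spot(image, center, radius, ring_width=2):
    """Quantify one circular spot in a 2D intensity grid (list of rows).

    Returns (integrated_signal, background_per_pixel, corrected_signal).
    Assumption: background is the median intensity of a ring of pixels
    just outside the spot; other estimators are equally plausible.
    """
    cy, cx = center
    spot, ring = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (y - cy) ** 2 + (x - cx) ** 2  # squared distance to center
            if d2 <= radius ** 2:
                spot.append(value)              # pixel inside the spot
            elif d2 <= (radius + ring_width) ** 2:
                ring.append(value)              # pixel in the background ring
    background = median(ring) if ring else 0
    signal = sum(spot)                          # integration (summation)
    corrected = signal - background * len(spot) # background subtraction
    return signal, background, corrected


def two_dye_ratio(sample_corrected, reference_corrected):
    """Relative abundance of the target in the sample versus the reference."""
    if reference_corrected <= 0:
        raise ValueError("reference signal must be positive to form a ratio")
    return sample_corrected / reference_corrected
```

In a two-dye assay, `quantify_spot` would be run once per channel on the same spot, and the two corrected signals passed to `two_dye_ratio`.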

Figure 1. Schematic Representation of a Gene Expression Microarray Assay. Arrows represent processes (left column), and pictures or text represent the products. Differences in the protocol between one- and two-dye technologies are specific to the technology rather than to the samples or the question. For CGH, the process is similar, with DNA replacing mRNA.

Finally, at the end of the experiment, an important issue that arises from statistical tests on microarray data is the real significance of the results and the concomitant need to correct for multiple testing. For example, a t-test yields the probability (the p-value) of observing a difference at least as large as the one measured if there were no real difference between the groups, that is, by chance alone. Commonly, we call a result significant when this probability is smaller than five percent. For large-scale data, a t-test is performed thousands of times (once per gene). With 10,000 t-tests at a five percent significance level, we would expect to call up to about 500 genes differentially expressed merely by chance, a number that is very close to, or even higher than, the number of genes actually selected in many experiments. Therefore, a correction that attempts to control false positives should be performed. The most common correction method is the False Discovery Rate, proposed originally by Benjamini and Hochberg (18) and extended by Storey and Tibshirani (19).
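To make the multiple-testing argument concrete, the following minimal Python sketch computes the expected number of chance findings and applies the Benjamini and Hochberg step-up procedure to a list of p-values. The function names and the example p-values are illustrative assumptions, not taken from the experiment discussed in this review, and the Storey and Tibshirani extension is not implemented.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance findings if no gene truly differs."""
    return n_tests * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, that is True where
    the test is declared significant at the requested false discovery rate.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
raw_calls = sum(1 for x in p if x < 0.05)   # uncorrected threshold
fdr_calls = sum(benjamini_hochberg(p))      # after FDR correction
```

With these made-up values, five genes pass the uncorrected five percent threshold, but only the two smallest p-values survive the correction.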

Applications in Biomedical Research

The ultimate output of any microarray assay, independent of the technology, is a measure, for each gene or probe, of the relative abundance of the complementary target in the examined sample. In this section, we review the most common applications of the data derived from clinical studies using microarrays, irrespective of the technology employed.

Relating Gene Expression to Physiology: Differentially Expressed Genes

The most common and basic question in DNA microarray experiments is whether genes appear to be downregulated (expression has decreased) or upregulated (expression has increased) between two or more groups of samples. This type of analysis is essential because it provides the simplest characterization of the specific molecular differences associated with a specific biological effect. These signatures can be used to generate new hypotheses and guide the design of further experiments. A statistical test is used to assess, for each gene, whether its expression is statistically different between two or more groups of samples (Figure 2). When comparing populations of individuals, a large number of samples per class is needed to avoid interference from variation due to individuals rather than to the experimental group. For laboratory-controlled samples, such as cell lines or strains, at least three biological replicates are recommended to compute a good estimate of the variance and hence of the statistical confidence (more replicates mean more confidence and fewer false positives). Using a statistical technique called power analysis, it is possible to estimate the number of samples required to identify a high percentage of truly differentially regulated genes. Although this approach is common practice in the design of biological experiments, its use is not widespread in the microarray community.

figure 2

Detection of Differentially Expressed Genes. Large differences in gene expression are likely to be genuine differences between two groups of samples (A and B), whereas small differences are unlikely to be true differences. Samples can be biological replicates or unreplicated population samples.

To detect differentially expressed genes, both intuitive and formal statistical approaches have been proposed. The best-known intuitive approach, proposed in early microarray studies, is the fold change in fluorescence intensity (20, 21), expressed as the base-2 logarithm (log2) of the ratio of the sample to the reference. On this scale, a log2 fold change of one means that the expression level has increased two-fold (upregulation), a value of −1 means that it has decreased two-fold (downregulation), and zero means that it has not changed. Larger absolute values correspond to larger fold changes. Genes whose fold change is larger than a certain (arbitrary) value are selected for further analyses. Although fold change is a very useful measure, its weaknesses are the overestimation for genes with low expression in the reference (denominators close to zero tend to inflate the ratio), the subjective nature of the threshold that defines a “significant” change, and the tendency to omit small but significant changes in gene expression levels. For these reasons, the most sensible option currently is to follow formal statistical approaches to select differentially expressed genes. For two groups of samples, the common t-test is the easiest option, though not the best, for analyzing two-dye microarrays whose log2 ratios generate normal-like distributions after normalization (see next section); the ANOVA (analysis of variance) test is used for more than two groups of samples. These options apply to both one- and two-dye microarrays. If the data are not normally distributed, the Wilcoxon or Mann-Whitney tests may be applied. A comparison of statistical tests for differential expression, including the t-test, has been published elsewhere (22).
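The two selection criteria can be illustrated with a short Python sketch. It computes the log2 fold change for a single spot and a t statistic for one gene measured in two groups. Welch's unequal-variance form of the t-test is used here as an assumption, since the text does not specify a variant; the function names and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(sample, reference):
    """Base-2 logarithm of the sample-to-reference intensity ratio."""
    return math.log2(sample / reference)


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for one gene."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


up = log2_fold_change(2000, 1000)         # two-fold increase
down = log2_fold_change(500, 1000)        # two-fold decrease
unchanged = log2_fold_change(1000, 1000)  # no change

# Hypothetical log2 ratios of one gene in three replicates per group.
t_stat, dof = welch_t([1.0, 1.2, 0.8], [0.0, 0.1, -0.1])
```

Converting the t statistic into a p-value requires the t distribution, which is not part of the Python standard library and is therefore left out of this sketch.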

The approaches described so far are univariate; that is, one gene is tested at a time, independently of any other gene. There are, however, multivariate procedures in which genes are tested in combination rather than in isolation. Although more powerful (23–26), these approaches require a more complex analysis.

Biomarker Detection: Supervised Classification

Disease type and severity are often determined by expert physicians or pathologists on the basis of patient symptoms or by analyzing features of the diseased tissue obtained by biopsy. This categorization may allow the choice of an appropriate pharmacological or surgical therapy. In this context, the availability of molecular markers associated with clinical outcome has been useful in allowing disease monitoring to begin at a very early stage and in complementing the clinical and histopathological analysis. The more recent application of DNA microarrays in clinical research has been a very important step toward the development of more complex markers based on multigene signatures. The identification of gene expression “signatures” associated with disease categories is called biomarker detection or supervised classification (Figure 3).

figure 3

Biomarker Detection. Larger differences in gene expression are more likely to be genuine differences between two groups of samples (A and B) than small differences. In this case, a large number of samples are more informative than individual replications.

The fundamental difference between identifying differentially expressed genes and identifying a set of genes of real diagnostic or prognostic value is that a biomarker needs to be predictive of disease class or clinical outcome. For this reason, it must be possible to associate, to a given set of marker genes, a rule that allows identification of an unknown sample. The classification accuracy of the biomarker also needs to be determined with robust statistical procedures. Therefore, during the biomarker selection procedure, a substantial fraction of the samples are set aside in order to evaluate independently the accuracy of the selected biomarkers (in terms of sensitivity and specificity). Thus, such studies require a relatively large number of samples.

As explained above, unlike differential expression analysis, biomarker selection for diagnostics requires a rule to make predictions. This rule is generated by a classifier, a statistical model that assigns a sample to a certain category based on gene expression values. For example, a sensible classifier for diabetes is whether sugar levels in serum reach a certain value. In statistics, this classifier is referred to as univariate; that is, only one variable (sugar level) is needed in the rule. For DNA microarray studies, however, it is common to obtain a large list of genes useful for disease discrimination. Multiple genes provide robustness in the estimation and take potential synergy between genes into account. Therefore, multivariate classifiers are commonly used. For example, it is well known that obesity and parental predisposition to diabetes, in addition to sugar levels in serum, provide more precise diagnostic criteria for diabetes. Multivariate classifiers can be designed using genes selected either by a univariate method such as the t-test, ANOVA, Wilcoxon, PAM (27), or Golub’s centroid (1), or by a multivariate method (23–26).
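To make the notion of a multivariate rule concrete, the sketch below trains a plain nearest-centroid classifier on a training set and evaluates it on samples that were set aside. It is a generic illustration and not a reproduction of PAM or of Golub's method; the two-gene expression values, the class labels, and all function names are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Compute the mean expression profile (centroid) of each class."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign a sample to the class whose centroid is closest."""
    def squared_distance(centroid):
        return sum((x - y) ** 2 for x, y in zip(sample, centroid))
    return min(centroids, key=lambda cls: squared_distance(centroids[cls]))


def sensitivity_specificity(truth, predicted, positive):
    """Sensitivity and specificity with respect to the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Training set: hypothetical expression of two marker genes per sample.
train = [[2.0, 0.1], [1.8, -0.1], [0.1, 0.0], [-0.1, 0.2]]
train_labels = ["disease", "disease", "healthy", "healthy"]
centroids = fit_centroids(train, train_labels)

# Samples set aside before training, used only to estimate accuracy.
test = [[1.5, 0.2], [0.3, -0.2], [1.1, 0.0]]
test_labels = ["disease", "healthy", "healthy"]
predicted = [predict(centroids, s) for s in test]
sens, spec = sensitivity_specificity(test_labels, predicted, "disease")
```

In this toy example the third held-out sample is misclassified, so the estimated specificity is lower than the sensitivity, which is the kind of information an independent test set is meant to reveal.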

Thus, the possibility to characterize the molecular state of diseased tissues has led to an improvement in prognosis and diagnosis as well as providing evidence of the existence of distinct disease subclasses in previously considered homogeneous diseases.

Describing the Relationship Between the Molecular State of Biological Samples: Unsupervised Classification

One key issue in the analysis of microarray data is finding genes with a similar expression profile across a number of samples. Co-expressed genes have the potential to be regulated by the same transcription factors or to have similar functions (for example, belonging to the same metabolic or signaling pathways). The detection of co-expressed genes may therefore reveal potential clinical targets, genes with similar biological functions, or novel biological connections between genes. On the other hand, we may want to describe the degree of similarity between biological samples at the transcriptional level (28). We may expect such analysis to confirm that samples with similar biological properties (for example, samples derived from patients affected by the same disease) tend to have a similar molecular profile. Although this is true, it has also been demonstrated that the molecular profile of samples reflects disease heterogeneity and is therefore useful in discovering novel disease subclasses (5). From the methodological perspective, these questions can be addressed using unsupervised clustering methods.

In this context, hierarchical clustering is, among several options (29), one of the most widely used unsupervised classification methods (Figure 4). Other methods are available in several software packages such as R (The R Foundation for Statistical Computing, http://www.r-project.org), GEPAS (30), TIGR TM4 (31), (32), GeneSpring (33), and Genesis (34). The core concept behind hierarchical clustering is the progressive construction of gene or sample clusters by adding one element (gene, sample, or a smaller cluster) at a time. In this way, more similar elements are added early to small clusters, whereas less similar elements are added later, forming larger clusters. To decide which element is more similar to another, it is necessary to rely on a similarity or dissimilarity measure. Commonly used measures include Euclidean distance (defined as the geometrical distance between two elements in an n-dimensional space) and correlation distance. The result of hierarchical clustering is therefore a hierarchical organization of patterns, similar to a phylogenetic tree. For example, in Figure 4b the most similar genes, five and six, are first merged to form a cluster; genes one and two then form a different cluster, which is later extended by adding the next most similar gene, gene three; and the process continues until all genes have been included in a cluster and all clusters have been merged. For large-scale microarray data, it is common to use simultaneous hierarchical clustering of samples and genes (32). Typically, genes are represented on the y-axis, whereas samples are drawn on the x-axis. A color-coded matrix (heatmap), in which samples and genes are sorted according to the results of the clustering, is used to represent the expression values for each gene in each sample. This two-dimensional clustering procedure is particularly suitable for exploring the results of a large microarray experiment (see Figure 4).

figure 4

Unsupervised Classification and Detection of Co-expressed Genes. (A) Double-Hierarchical clustering of gene expression values (heatmap), in rows by genes, and in columns by samples. Similar samples (columns) generate clusters easily identified. For example, the gene expression of samples A and C is similar across genes. However A and C are different from the rest. Co-expressed genes (rows) form tight and small clusters. A selected cluster framed by dotted lines is shown in B. (B) Hierarchical generation of clusters from a selected group of genes in A.
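The progressive merging described above can be sketched as a small agglomerative procedure in Python. Euclidean distance is taken from the text, whereas the use of average linkage to compare clusters is an assumption made for this sketch; the four expression profiles and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Geometrical distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    total = sum(euclidean(profiles[i], profiles[j]) for i, j in pairs)
    return total / len(pairs)


def hierarchical_clustering(profiles):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        dist, a, b = min(
            (
                (average_linkage(profiles, a, b), a, b)
                for idx, a in enumerate(clusters)
                for b in clusters[idx + 1:]
            ),
            key=lambda item: item[0],
        )
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, dist))
    return merges


# Hypothetical expression profiles of four genes across two samples.
profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 5.5]]
merges = hierarchical_clustering(profiles)
```

The order of the merges is what a dendrogram displays: here genes 2 and 3 are joined first, then genes 0 and 1, and finally the two resulting clusters.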

Identification of Prognostic Genes Associated with Risk and Survival

In medicine, the association of prognostic factors with survival times is invaluable. The link between gene expression levels and survival times may provide a useful tool for early diagnosis, prompt therapeutic intervention, and the design of patient-specific treatments. Consequently, the selection of biomarkers that correlate with survival times is a very important objective in the analysis of microarray data. To date, a number of approaches have been developed. The most commonly used procedures incorporate genes into exponential, Poisson, or Cox regression models using a univariate variable selection procedure (35). The gene selection procedure is summarized in Figure 5. The selected genes, combined in clinical classes, can then be used to detect variations in survival times using both the Kaplan-Meier method and statistical tests. Often, researchers are interested in finding subgroups of samples, independently of the recorded clinical data, whose survival times are significantly different. This information can then be used to prescribe specific treatments. In previous sections, we have shown how unsupervised data exploration methods such as cluster analysis can be used to identify subgroups of samples within what was previously considered a homogeneous disease. Once these subgroups have been identified, survival analysis can be used to test whether they are characterized by different clinical outcomes (35).

figure 5

Selection Procedure for Genes Associated with Survival Times as Risk Factors. A positive gene (left plot) is one whose expression, when included as a risk factor in a survival model (Cox, exponential, Poisson, etc.), yields a predicted survival curve (dotted line) that fits the observed survival times (steep solid line) reasonably well. The predicted survival curve from a negative gene (dotted line in right plot) is not close to the observed survival curve (steep solid line).
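The Kaplan-Meier method mentioned above estimates the survival curve of a group of patients from follow-up times, some of which may be censored. The following Python function is a minimal sketch of the estimator; the two patient groups (for example, defined by high or low expression of a candidate gene), their follow-up times, and the function name are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each observed event time.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up (in months) for two groups of four patients.
high_expression = kaplan_meier([2, 3, 5, 8], [1, 1, 0, 1])
low_expression = kaplan_meier([6, 9, 12, 15], [1, 0, 1, 0])
```

Whether two such curves differ significantly is assessed with a separate statistical test, which is not included in this sketch.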

Association of Genes with Disease Surrogate Markers: Regression Analysis

An interesting question in the analysis of microarray data derived from clinical studies is whether there is an association between gene expression and an ordinal variable that represents a response or, more generally, a measure of disease progression (a surrogate marker). Examples of these variables are the concentration of metabolites or proteins in serum, response to treatment or dosage, growth, or any other clinical measure whose numerical values follow a meaningful progression. The approach, depicted in Figure 6, is conceptually similar to that introduced in the survival analysis section of this review. The mathematical model that relates the independent variable (such as time, level of a metabolite or protein, or treatment) to the dependent variables (genes) is commonly a linear regression model. Nevertheless, such a model can be modified to include other available information.

figure 6

Selection Procedure for Genes Associated with Outcome. The expression of a positive gene (horizontal axis in left plot) is highly correlated with the associated outcome (vertical axis). For a non-associated gene (right plot), the gene expression (horizontal axis) is not correlated to outcome (vertical axis).
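For a single gene, the linear model and the strength of the association can be computed directly by ordinary least squares. The sketch below follows the convention of the text, with the surrogate marker as the independent variable and gene expression as the dependent variable; the data values and the function name `linear_fit` are hypothetical.

```python
from statistics import mean


def linear_fit(x, y):
    """Least-squares slope, intercept, and Pearson correlation of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    correlation = sxy / (sxx * syy) ** 0.5
    return slope, intercept, correlation


# Hypothetical surrogate marker (e.g. dose) and log2 expression of one gene.
marker = [1.0, 2.0, 3.0, 4.0]
expression = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r = linear_fit(marker, expression)
```

A gene whose correlation is close to 1 or −1 would correspond to the left plot of Figure 6, and a gene with a correlation near zero to the right plot.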

Genetic Disorders: Gene Copy Number and Comparative Genomic Hybridization

It is well known that several inherited diseases are a consequence of genetic rearrangements such as gene duplications, translocations, and deletions. Moreover, these alterations are observed in cancer cells as well. A specific microarray technique used to detect these abnormalities in a single hybridization experiment is called Comparative Genomic Hybridization (CGH) (Pollack, 1999) (13). The core concept in CGH is the use of genomic DNA (gDNA) in the hybridization to compare the gDNA from a disease sample with that of a healthy individual. Hence, a typical microarray design can be used in this approach (see Figure 1).

The signal intensity of all probes in the microarray should, therefore, be very similar for healthy samples, so differences in gene copy number are easily detected as changes in signal intensity. Using this technology, Zhao et al. (2005) (36) have characterized the variations in gene copy number in several cell lines derived from prostate cancer, and Braude et al. (36) confirmed an alteration in chronic myeloid leukemia.

Genetic Disorders: Epigenetics and Methylation

Around 80 percent of CpG dinucleotides are naturally methylated at the fifth position of the cytosine pyrimidine ring (37). The patterns of cytosine methylation, along with histone acetylation and phosphorylation, control the activation and deactivation of genes without changing the nucleotide sequence (38). These regulatory mechanisms are known as epigenetic phenomena. In particular, genes methylated in their promoters become inactive irrespective of the presence of transcriptional activators. Aberrations in any of these epigenetic patterns cause several syndromes and may predispose carriers to cancer (39). To detect patterns of methylation using microarrays, two main methods have been proposed (40). One is based on the enrichment of the unmethylated fraction of CpG islands and the other focuses on the hypermethylated fraction. Both methods make use of methylation-sensitive restriction enzymes to generate fragments enriched in either unmethylated or methylated CpG sites (Figure 7). In the first method, sample and control gDNA are cleaved with methylation-sensitive enzymes that cut unmethylated CpG sites, generating shorter fragments with protruding ends and leaving methylated CpG sites unaltered. Specific adaptors are then linked to these protruding ends. Methylated fragments are subsequently cut by a CpG-specific enzyme. The remaining fragments that contain the adaptor, those that were originally unmethylated, are amplified using PCR and primers complementary to the adaptors’ sequence. The result is that genes belonging to the unmethylated fraction are associated with higher fluorescent intensities on the microarray. In the second method, the gDNA from the sample and the control is cleaved with a restriction enzyme to generate small fragments with protruding ends. The fragments are then linked to adaptors and cut by methylation-sensitive restriction enzymes, leaving the methylated, adaptor-flanked fragments unaltered; these are amplified using PCR. The result is that the methylated fraction is amplified and detected in the microarray. The microarrays used in these experiments are, therefore, specially designed to include such fragments. Using the methods described, methylation patterns have been screened for several types of cancers (41–46).

figure 7

Detection of Altered Methylation Patterns and DNA Polymorphisms in Genomic DNA. Left panel: enrichment of unmethylated DNA fragments (see text). Right panel: enrichment of hypermethylated fragments (see text). Scheme adapted from Schumacher et al. (2006) (41).

Genetic Disorders and Variability: Gene Polymorphism and Single Nucleotide Polymorphism

The human genome carries at least ten million nucleotide positions that vary in at least one in 100 individuals in a population (47). The identification of these single nucleotide polymorphisms (SNPs) is an important tool for identifying genetic loci linked to complex disorders (47). Although there are commercially available microarrays to detect SNPs, these technologies are still in their infancy and their widespread adoption is still held back by the relatively high cost per sample. So far, the number of SNPs stored in public databases is more than two million, whereas the available microarrays for SNP detection cover only 10,000 SNPs. The three major strategies for SNP genotyping using microarrays are all based on the primer extension techniques depicted in Figure 8. The primer included in the microarray probe hybridizes to the target sequence precisely adjacent to its SNP. The first strategy (see Figure 8A) consists of mini-sequencing with the primer specific for each polymorphism immobilized on the microarray support. PCR products, DNA polymerase, and nucleotides labeled with different fluorescent colors are added in a combined hybridization and one-base extension reaction to detect the SNPs in parallel. The genotype is detected by color combinations. The second strategy (see Figure 8B) uses the same concept of primer-specific hybridization, though combined with only one dye and more than one base extension. The genotype is revealed by signal strength. The third strategy (see Figure 8C) performs the one-base extension in solution, combined with nucleotides labeled with different fluorescent colors. The primers are then captured by hybridization in the microarray. The genotype is detected by color combinations. Recent studies have produced genome-wide SNP characterization for a number of tumor types (48–50).

figure 8

Major Techniques for Detection of SNPs Using Microarrays. Colors and patterns are used for illustrative purposes. Scheme adapted from Syvanen (2005) ( 48 ).

Chromatin Immunoprecipitation: Genetic Control and Transcriptional Regulation

Transcription factors (TFs) are regulatory proteins that can bind specific DNA sequences (usually promoters) to control the level of gene expression. Mutations or alterations in the expression or activation of TFs are known in several diseases (51). For example, abnormal over-expression of the TF c-Myc is found in 90 percent of gynecological cancers, 80 percent of breast cancers, 70 percent of colon cancers, and 50 percent of hepatocarcinomas (52). Therefore, establishing the link between TFs and their targets is essential to characterize and design better cancer therapies. To identify these targets, DNA fragments are incubated with a selected TF that has been tagged (Figure 9). The DNA-TF complex is precipitated using a specific antibody against the tag peptide. The precipitated DNA is then labeled and hybridized to DNA microarrays to reveal genome-wide targets for the selected TF (see Figure 9). An experimental overview and computational methods for the analysis of these data have been reviewed elsewhere (53, 54).

figure 9

Chromatin Immuno-Precipitation (ChIP-on-chip) Assay. Fusing the gene for a transcription factor (TF) to a tag coding sequence produces a chimeric TF. Once the chimeric TF binds to its DNA target, the complex can be pulled down via the tag to recover the bound DNA sequences.

Pathogen Detection

Classically, pathogen detection is achieved through a series of clinical tests that generally detect single pathogens. A battery of clinical assays is therefore performed to characterize a sample. A radically different recent approach uses DNA microarrays to test for the presence of hundreds of pathogens in a single experiment (15, 16). For this, known sequences from each pathogen are collected and those that are pathogen-specific are selected (Figure 10). The collection of specific sequences is used to build a purpose-specific microarray. Genomic DNA from a patient biopsy, or from a food sample suspected to be infected, is then extracted and hybridized to the microarray. The presence of a pathogen is revealed simply by spot intensity.

figure 10

Multi-Pathogen Detection Using DNA Microarrays. Specific DNA sequences from disease-causing micro-organisms can be spotted on a microarray for pathogen detection.

An Overview of a Typical Microarray Experiment

In this section we provide a brief description of the typical workflow of a microarray experiment and its data analysis (see Figure 1 ).

RNA Extraction

RNA can be extracted from tissue or cultured cells using molecular biology laboratory procedures (although several commercial kits are available). The amount of mRNA required is about 0.5 µg, which is equivalent to 20 µg of total RNA, though there is some variation depending on the microarray technology. When the amount of mRNA (or DNA) is scarce, an amplification step, for example PCR amplification of reverse-transcribed cDNA, is needed before labeling.

mRNA is reverse-transcribed using reverse transcriptase to generate cDNA. Labeling is achieved by including in the reaction (or in a separate reaction) modified nucleotides that fluoresce when excited at appropriate wavelengths. The most common fluorescent dyes are Cy3 (green) and Cy5 (red). Unincorporated dyes are usually removed by column chromatography or ethanol precipitation.

Hybridization

Hybridization is carried out according to conventional protocols. The hybridization solution contains saline sodium citrate (SSC), sodium dodecyl sulphate (SDS) as detergent, non-specific DNA such as yeast DNA, salmon sperm DNA, or repetitive sequences, blocking reagents like bovine serum albumin (BSA) or Denhardt’s reagent, and labeled cDNA from the samples. Hybridization temperatures range from 42 °C to 45 °C for cDNA-based microarrays and from 42 °C to 50 °C for oligo-based microarrays. Hybridization volumes vary between 20 µL and 1 mL depending on the microarray technology. A hybridization chamber is usually needed to keep temperature and humidity constant.

After hybridization, the microarray is washed in salt buffers of decreasing concentration and dried by slide centrifugation or by blowing air after immersion in alcohol. The slide is then read by a scanner, a device similar to a fluorescence microscope coupled with a laser, robotics, and a digital camera to record the fluorescence. The robotics positions the slide, lens, camera, and laser row by row, in a manner similar to a common desktop scanner. The amount of signal (color) detected is presumed to be proportional to the amount of dye at each spot in the microarray and hence proportional to the RNA concentration of the complementary sequence in the sample. The output is, for each fluorescent dye, a monochromatic (non-colored) digital image file, typically in TIFF format. False-color images (red, green, and yellow) are reconstructed by specialized software for visualization purposes only.

Image Analysis

The goal of this step is to identify the spots in the microarray image, quantify the signal, and record the quality of each spot. Depending on the software used, this step may need some degree of human intervention. The digital images are loaded into specialized software together with a pre-loaded design of the microarray (grid layout), which tells the software the number, position, shape, and dimension of the spots. The grid is then fitted to the actual image automatically or manually. Fine-tuning of spot positions and shapes is usually performed to compensate for any bias in the robotic construction of the microarray. Human involvement is needed to mark spots that could be artifacts, such as bubbles or scratches, which are common. Finally, an automated integration function in the software converts the actual spot readings to a numerical value. The integration function considers the signal and the background noise for each spot. The output of the image analysis is commonly a tab-delimited text file or a software-specific file format. Common image analysis software includes ScanArray (PerkinElmer, Waltham, MA, USA), GenePix (Axon; Molecular Devices Corporation, Union City, CA, USA), TIGR-SpotFinder/TM4 (www.tigr.org; The Institute for Genomic Research, Rockville, MD, USA), and GeneChip (Affymetrix, Santa Clara, CA, USA). The process ranges from fully automatic through semi-automatic to manual, depending on the microarray technology, scanner, and software used.
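The integration step can be illustrated with a deliberately simple Python sketch that sums the pixels inside a circular spot and subtracts a local background estimated from a surrounding ring. It assumes that the spot center and radius have already been found by gridding and that the background is summarized by the median of the ring, which is only one of several possible choices; the function name, its parameters, and the synthetic image are hypothetical and do not correspond to any of the packages listed above.

```python
from statistics import median


def quantify_spot(image, center, radius, background_width=2):
    """Integrate a circular spot and subtract the local background.

    image  -- 2D list of pixel intensities (one monochrome channel)
    center -- (row, col) of the spot center, assumed known from gridding
    radius -- spot radius in pixels
    """
    center_row, center_col = center
    spot, ring = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            d2 = (r - center_row) ** 2 + (c - center_col) ** 2
            if d2 <= radius ** 2:
                spot.append(value)   # pixel inside the spot
            elif d2 <= (radius + background_width) ** 2:
                ring.append(value)   # pixel in the background ring
    background = median(ring)
    # Integrated signal minus the background expected over the same area.
    return sum(spot) - background * len(spot)


# Synthetic 7 x 7 image: uniform background of 10 and a small bright spot.
image = [[10] * 7 for _ in range(7)]
for r, c in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[r][c] = 100
signal = quantify_spot(image, center=(3, 3), radius=1)
```

Real images require irregular spot shapes, saturation checks, and quality flags, none of which are handled here.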

Normalization

Systematic errors are introduced in the labeling, hybridization, and scanning procedures. The main aim of normalization is to correct for these errors while preserving the biological information, and to generate values that can be compared between experiments, especially when these were performed at different times or places, or with different reagents, microarrays, or technicians. There are two types of normalization: “within” and “between” array normalization. “Within” array normalization refers to normalization applied within the same slide and is generally applicable to two-dye technologies. For this, let us define M = log2(R/G) and A = log2(R × G)/2, where R and G are the red and green readings, respectively. Under the assumption that the majority of genes are not differentially expressed, the majority of the M values should oscillate around zero. “Within” normalization is then performed by shifting the imaginary line produced by the values of M (on the vertical axis) to zero along the values of A (on the horizontal axis). This kind of normalization, sometimes called loess, is usually performed by spatial blocks to avoid any bias from the microarray printing process (in which case it is called print-tip loess). “Between” normalization is necessary when at least two slides are analyzed, to guarantee that both slides are measured on the same scale and that their values are independent of the parameters used to generate the measurements. The goal is to transform the data in such a way that all microarrays have the same distribution of values. For two-dye technologies this is optional and is commonly done by scaling or standardizing the values once within normalization has been performed. For one-dye microarrays, between normalization is usually performed using methods that equalize distributions, such as quantile normalization (55), after log2 transformation. There are, however, a number of normalization methods, and the right choice is usually data-dependent. A comparison of the results of different normalization methods is recommended.
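Both kinds of normalization can be sketched in Python. The example computes M and A as defined above, centers M on zero with a global median shift, and equalizes the distributions of two arrays by quantile normalization. The median shift is a simplification used here in place of a loess fit, ties are not averaged in the quantile step, and all function names and intensity values are hypothetical.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return the M and A values of paired red and green spot readings."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_center(m):
    """Shift M so that its median is zero (global stand-in for loess)."""
    shift = median(m)
    return [value - shift for value in m]


def quantile_normalize(arrays):
    """Give every array (list of log2 values) the same distribution."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Within-array example: hypothetical red and green readings of four spots.
m, a = ma_values([200, 400, 800, 1600], [100, 200, 400, 400])
m_centered = median_center(m)

# Between-array example: hypothetical log2 values of three genes, two arrays.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After quantile normalization each array contains the same set of values, assigned to its genes according to their original rank within that array.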

Missing Values

The image analysis process (generally in spotted microarrays) does not always generate a value for a gene, because the spot was defective or was manually marked as faulty. This is not a major issue when genes are replicated in several spots in the microarray, because the reading for the gene can still be estimated from the remaining spots. If the value in a spot is systematically missing in several arrays, the spot should be removed from the analysis. If the number of missing values is low, the corresponding spots can simply be excluded from all arrays. However, when the number of arrays is large, this could lead to the removal of many spots. To avoid these problems, one must either use only methods that can deal with missing values or use algorithms to infer those values (30). Results should then be interpreted bearing in mind that some values were inferred.
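As a simple illustration of inferring missing values, the sketch below first removes spots that are missing in too many arrays and then replaces each remaining gap with the mean of the observed values of the same spot. Row-mean imputation is only the simplest possible stand-in and is not the algorithm of reference (30); the function names, the threshold, and the data are hypothetical.

```python
def drop_sparse_rows(matrix, max_missing_fraction):
    """Remove spots (rows) whose fraction of missing values is too high."""
    kept = []
    for row in matrix:
        missing = sum(1 for v in row if v is None)
        if missing / len(row) <= max_missing_fraction:
            kept.append(row)
    return kept


def impute_row_mean(matrix):
    """Replace None by the mean of the observed values in the same row."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            imputed.append(list(row))  # nothing to infer from
            continue
        fill = sum(observed) / len(observed)
        imputed.append([fill if v is None else v for v in row])
    return imputed


# Hypothetical log2 values: rows are spots, columns are arrays.
data = [
    [1.0, None, 3.0, 2.0],
    [None, None, None, 0.5],
    [4.0, 4.0, 4.0, 4.0],
]
filtered = drop_sparse_rows(data, max_missing_fraction=0.5)
completed = impute_row_mean(filtered)
```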

Current microarrays contain more than 10,000 genes, spots, or probes. Dealing with large amounts of data may require expensive computational resources and long processing times. A common practice is to remove genes that have not shown significant changes across samples, genes with many missing values, or genes whose average expression is very low (because genes with low expression are more susceptible to noise). The most common filtering criteria are statistical tests (genes with lower p-values are kept), signal-to-noise estimates (higher values are kept), variability (higher), and average expression (higher).
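A filter based on two of these criteria, average expression and variability, might look as follows in Python. The thresholds are arbitrary values chosen for illustration, and the function name and expression data are hypothetical.

```python
from statistics import mean, variance


def filter_genes(expression, min_mean, min_variance):
    """Keep genes with sufficient average expression and variability."""
    return {
        gene: values
        for gene, values in expression.items()
        if mean(values) >= min_mean and variance(values) >= min_variance
    }


# Hypothetical log2 expression of three genes across three samples.
expression = {
    "geneA": [8.0, 8.1, 7.9],   # well expressed but nearly constant
    "geneB": [9.0, 11.0, 7.0],  # well expressed and variable
    "geneC": [2.0, 4.5, 0.5],   # variable but weakly expressed
}
kept = filter_genes(expression, min_mean=6.0, min_variance=0.5)
```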

Transformation

The numerical values from image analysis are commonly integers between one and 32,000 for both signal and background. The background is normally subtracted from the signal. The distribution of these values is, however, concentrated in a narrow range; the values are therefore transformed using logarithms (generally base 2), which generate normal-like distributions. Negative values resulting from background subtraction may cause problems in the transformation; these are resolved by restricting the values or by using more robust transformations such as the generalized logarithm.
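The behavior of a generalized logarithm can be seen in a few lines of Python. Several parameterizations exist; the form used below, with a tuning constant `c`, is one common choice and is an assumption of this sketch rather than a formula taken from the text.

```python
import math


def glog2(x, c=1.0):
    """Generalized log2: defined for x <= 0, close to log2(x) when x >> c."""
    return math.log2((x + math.sqrt(x * x + c * c)) / 2)


# Background-subtracted readings, including zero and a negative value,
# for which the ordinary logarithm is undefined.
readings = [-5.0, 0.0, 50.0, 1000.0]
transformed = [glog2(x) for x in readings]

# For large values the result approaches the ordinary log2.
difference = abs(glog2(1000.0) - math.log2(1000.0))
```

The constant `c` controls where the transformation switches from approximately linear behavior near zero to logarithmic behavior, and in practice it would be estimated from the noise level of the data.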

Statistical Analysis

The procedure after image analysis and data processing depends mainly on the particular biological issue and data available. These procedures have been described in the Applications Section of this review.

Illustrating The Detection of Differentially Expressed Genes: The Case of Term Placenta

In previous sections, we have introduced the experimental and data analysis methods used in common microarray experiments. To illustrate these procedures, we use a case study designed to identify genes that are preferentially expressed in placenta. This study, currently ongoing in our laboratory, is part of a larger project whose results are expected to assist further research into the molecular mechanisms involved in fetal development, placental function, and pathologies related to pregnancy. To identify genes specific to human placenta, we used a two-color microarray. In this experiment, mRNA extracted from two normal human placentas was compared with a pool of mRNA extracted from several normal tissues not including placenta. To gain information on the variability expected from experimental errors, we also compared two aliquots of the reference mRNA in the same array. An overview of the process is depicted in Figure 11. A brief description of the detailed procedure follows.

figure 11

Experimental Design of the Placenta Microarray Experiment. RNAs from two term human placentas were compared with RNAs from a collection of human tissues, except placenta, in search of placental specific transcripts.

Step 1: mRNA Extraction and Microarray Hybridization

Human total term placenta RNA, isolated using a proteinase K-phenol based protocol (described in ( 56 )), and a pool of commercially available total RNAs from several human tissues not including placenta were part of the set of reagents used in the EMBO-INER Advanced Practical Course 2005 held in Mexico City (EMBO Courses and Workshops Programme, Heidelberg, Germany, http://www.embo.org/courses_workshops/mexico.html ). Their quality was checked by running them in an RNA 6000 Nano Assay from Agilent (Agilent Technologies Inc., Santa Clara, CA, USA). First-strand cDNA was synthesized from each RNA sample (5 µg) by reverse transcription using an oligo-dT primer with a T7-promoter sequence attached to its 5′ end, while the second strand resulted from treating the first strands with RNase H plus DNA polymerase I (Message Amp aRNA kit from Ambion, Austin, TX, USA). Column-purified double-stranded cDNAs were transcribed in vitro with T7 RNA polymerase, and the amplified RNAs (aRNAs) were also purified by column binding and subsequent elution. Fluorescent labels were attached indirectly to the hybridization probes by a two-step procedure. The first step consisted of a reverse transcription of the aRNA, this time using a mixture of all four deoxyribonucleotides plus aminoallyl-dUTP. In the second step, N -hydroxysuccinimide-activated fluorescent dyes (Cy3 and Cy5) were coupled to the cDNAs by reaction with the amino functional groups. Probes were preincubated with blocking reagents (human Cot DNA at 1 pg/mL and poly-dA DNA also at pg/mL) and then hybridized to prehybridized (6X SSC, 0.5 percent SDS and one percent BSA) slides in hybridization buffer (50 percent formamide, 6X SSC, 0.5 percent SDS and 5X Denhardt’s solution). Slides were washed once in 2X SSC/0.1 percent SDS at 65 °C for five minutes, then twice in 0.1X SSC/0.1 percent SDS (first at 65 °C for ten minutes, then at room temperature for two minutes), and finally in isopropanol, also at room temperature, with slide centrifugation between washing steps, and were stored in the dark until scanning. Fluorescent probes were hybridized to cDNA microarrays (laboratory made oligo-based microarray containing half of the probes in each of two slides).

Step 2: Microarray Scanning, Spot Finding and Image Processing

Microarrays were scanned using ScanArray Express (PerkinElmer, Waltham, MA, USA). Images obtained were analyzed using ChipSkipper (EMBLEM Technology Transfer GmbH, Heidelberg, Germany, http://www.embl-em.de ) to obtain a single value for each spot representing the ratio (in log 2 scale) of the mRNA expression level from placenta to the reference mRNA from the pool of non-placenta tissues. A value of zero represents similar expression level in both mRNA samples. A value of one represents two-fold over-expression in placenta whereas a value of −1 represents two-fold downregulation in placenta. One placental sample was hybridized in duplicate into the two microarrays using a dye-swap design. In this approach the labeling scheme is reversed in two separate microarrays. To gain information on the variability associated with experimental error, two aliquots of the reference pool mRNA were compared on the same microarray. Likewise the comparison between experimental and control samples and the comparison between the two control samples were performed in duplicate using the dye-swap design. To summarize, the experiment was performed using six microarrays (two placenta samples compared with a reference in duplicate and two reference mRNA as controls, see Figure 11 ).
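The log2 ratio and the dye-swap design described above can be made concrete with a small sketch. It assumes that the placental sample is labeled red on the first array and green on the swapped array, and that the two arrays are combined by taking half the difference of their log ratios; the function names and intensity values are hypothetical and this is not the ChipSkipper implementation.

```python
import math

def log_ratio(red, green):
    """M value: log2 of the red-to-green intensity ratio for one spot."""
    return math.log2(red / green)

def dye_swap_ratio(forward, swapped):
    """Combine a dye-swap pair into one placenta-versus-reference log2 ratio.

    forward: (red, green) with placenta in the red channel (assumption).
    swapped: (red, green) with placenta in the green channel (assumption).
    The sign of the swapped array is reversed before averaging, so a
    multiplicative dye bias common to both arrays cancels out.
    """
    return (log_ratio(*forward) - log_ratio(*swapped)) / 2.0

# Made-up spot: true two-fold over-expression in placenta, with the red
# dye reading 1.5 times brighter than the green dye on both arrays.
forward = (2000.0 * 1.5, 1000.0)   # placenta red, reference green
swapped = (1000.0 * 1.5, 2000.0)   # reference red, placenta green
m = dye_swap_ratio(forward, swapped)
print(round(m, 6), round(2 ** m, 6))
```

In this example each single array is distorted by the dye bias, while the combined value comes out close to one, that is, a two-fold over-expression in placenta.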

Step 3: Quality Assessment, Processing and Normalization

To ensure that all microarrays were comparable in scale, we performed print-tip loess normalization, shifting the imaginary M line to zero (Figure 12 ). We processed the dataset, removing from the analysis all control and empty spots. Representative plots before and after “within” normalization and processing for both placenta and control experiments are shown in Figure 12 . Note that, as expected, there are important differences in ratio values (see M value in Figure 12C - D ) for highly expressed genes (A value) in placenta compared with the reference (see Figure 12C ), whereas ratios in the control experiment are very close to zero (see Figure 12D ) indicating a very high reproducibility of the technology.

figure 12

Quality Assessment and Normalization. (A) Ratio values (M = Log2(R/G), R = Red channel, G = Green channel) versus average values (A = Log2(R×G)/2) for one placenta sample. Dots represent spots in the microarray. Crosses correspond to control spots. Lines represent the tendency for each block (print-tip) in the microarray. (B) Control assay, two reference mRNA aliquots were hybridized changing the dye color only. Symbols as in (A). (C) Normalized data from (A). (D) Normalized data from (B). Control spots removed in (C) and (D).
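The M and A values defined in the caption can be computed directly from the two channel intensities. The sketch below also centers M within each print-tip block by subtracting the block median. Note that this is a deliberately simplified stand-in, not the print-tip loess used in the study: loess fits an intensity-dependent curve per block, whereas median centering only removes a constant offset. Function names and data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = log2(R*G)/2."""
    m = math.log2(red / green)
    a = math.log2(red * green) / 2.0
    return m, a

def center_by_block(spots):
    """Shift M so that its median is zero within each print-tip block.

    spots: list of (block_id, red, green) tuples.
    Returns a list of (block_id, centered_M, A) in the input order.
    Simplification: a constant per-block shift, not a loess fit.
    """
    rows = []
    by_block = defaultdict(list)
    for block, red, green in spots:
        m, a = ma_values(red, green)
        rows.append((block, m, a))
        by_block[block].append(m)
    offsets = {block: median(ms) for block, ms in by_block.items()}
    return [(block, m - offsets[block], a) for block, m, a in rows]

# Made-up intensities for two print-tip blocks.
spots = [
    (1, 2000.0, 1000.0), (1, 4000.0, 1000.0), (1, 8000.0, 1000.0),
    (2, 1000.0, 1000.0), (2, 1000.0, 2000.0),
]
for block, m, a in center_by_block(spots):
    print(block, round(m, 3), round(a, 3))
```

After centering, the median M of every block is zero, which mirrors the idea of shifting the M line to zero, although without the intensity dependence that loess captures.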

Step 4: Detection of Differentially Expressed Genes

Duplicated spots were averaged to generate a single measure per gene per array. To detect differentially expressed genes, we used a one-sample t-test under the null hypothesis of no differential expression (mean log ratio equal to zero). The resulting P -values were adjusted for multiple testing using the False Discovery Rate (FDR) approach ( 18 , 57 ). Because of the small number of samples, we treated the replicated biological samples as independent, for preliminary purposes only, to increase the level of confidence in the statistical tests. The effect of this exercise is a slight underestimation of the variance in favor of more sensitive results. In addition, we limited the selection of differentially expressed genes to those that fulfill two conditions: first, an FDR value less than 0.10 (ten percent, corresponding to raw P -values less than 0.0000118), and second, an absolute fold change of at least two. Using these criteria, 350 genes (out of 21,456) were selected. A subset of 205 genes is depicted in Figure 13 (see step 5).
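The two selection conditions can be sketched as follows. The example assumes that raw P-values from the t-test are already available and that the FDR adjustment is the Benjamini-Hochberg step-up procedure; the text only states that "the FDR approach" was used, so this choice, the function names, and the example numbers are assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(genes, fdr=0.10, min_abs_log2=1.0):
    """Select genes with adjusted P < fdr and at least two-fold change.

    genes: list of (name, mean_log2_ratio, raw_p) tuples.
    An absolute log2 ratio of 1.0 corresponds to a two-fold change.
    """
    adjusted = bh_adjust([p for _, _, p in genes])
    return [
        name
        for (name, m, _), q in zip(genes, adjusted)
        if q < fdr and abs(m) >= min_abs_log2
    ]

# Made-up genes: (name, mean log2 ratio, raw P-value).
genes = [
    ("g1", 2.3, 0.001),   # significant and more than two-fold up
    ("g2", 0.4, 0.04),    # significant but below two-fold
    ("g3", -1.5, 0.03),   # significant and more than two-fold down
    ("g4", 3.0, 0.5),     # large change but not significant
]
print(select_genes(genes))
```

Requiring both conditions guards against genes that are statistically significant but change very little, as well as genes with large but unreliable ratios.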

figure 13

Genes differentially expressed in placenta compared with other tissues. (A) Heatmap showing the relative gene expression in placenta. Darker color means higher expression in placenta. Genes are ordered using a hierarchical clustering algorithm. (B) Heatmap showing the score in T1dbase corresponding to genes in (A). Darker colors represent more specific expression.

Step 5: Validation

To verify the selection process, we made two comparisons. First, as a negative control, we applied the same selection criteria to the control microarrays, which carried the reference sample in both channels. No genes matched the criteria. Second, we performed a comparison using the Tissue Expression tool ( http://www.t1dbase.org/page/Tissue Home) from T1dbase ( 59 ). This tool makes use of Gene Expression Atlas ( 59 ), SAGEmap ( 60 ), and TissueInfo ( 58 ), integrating all measurements in a single score ( 58 ). This score, estimated for several tissues, indicates whether the expression of a gene is tissue-specific. Scores closer to one indicate tissue-specific expression, whereas scores closer to zero indicate no tissue specificity. Of the 350 genes resulting from Step 4, we selected only those included in this database. The result was 201 genes. Several genes that appear to be over-expressed in the placentas processed here (darker colors in Figure 13A ) show consistently higher placenta-specific scores in T1dbase (darker colors in Figure 13B ). These results suggest that the experiment is coherent and valid.

Step 6: Analysis

Once genes have been selected, further computational, literature, and laboratory analyses are needed to confirm, expand, or restrict the results. Here, the analysis dealt only with comparing the results with the T1dbase Tissue Specific Expression Tool. However, queries to Gene Ontology, KEGG pathways, PubMed, BLAST, or any other pertinent database resource should be considered a compulsory step.

Conclusions and Trends

DNA microarrays are a powerful, mature, versatile, and easy-to-use genomic tool that can be applied to biomedical and clinical research. The research community is expanding the use of this approach to novel applications. The main advantage is the genome-wide information provided at reasonable cost. Biological interpretation, however, requires the integration of several sources of information. In this context, a new discipline referred to as Systems Biology is emerging that integrates biological knowledge, clinical information, mathematical models, computer simulations, biological databases, imaging, and high-throughput “omic” technologies, such as microarray experiments. Therefore, multidisciplinary groups involving clinicians, biologists, statisticians, and, recently, bioinformaticians are being formed and expanded in all important research institutions. Consequently, virtually all biology-related research areas are moving from merely describing cellular and molecular components in a qualitative manner toward a more quantitative approach. These new teams are generating huge amounts of data and more convincing models to ultimately reveal hidden pieces in the biological puzzle. This new knowledge is having a crucial impact on the treatment of diseases because, among other things, it individualizes subtypes of pathologies, disease risks, survival, treatment, prognosis, and outcome, quickly moving biomedical research to the era of personalized medicine.

All supplementary materials are available online at molmed.org .

Golub TR et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science . 286:531–7.

van’t Veer LJ et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature . 415:530–6.

Singh D et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 1:203–9.

Wang T et al. (2000) Identification of genes differentially over-expressed in lung squamous cell carcinoma using combination of cDNA subtraction and microarray analysis. Oncogene . 19: 1519–28.

Alon U et al. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. U. S. A. 96: 6745–50.

Ramaswamy S et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. U. S. A. 98:15149–54.

Brachat A, Pierrat B, Brungger A, Heim J. (2000) Comparative microarray analysis of gene expression during apoptosis-induction by growth factor deprivation or protein kinase C inhibition. Oncogene . 19:5073–82.

Bonner AE, Lemon WJ, You M. (2003) Gene expression signatures identify novel regulatory pathways during murine lung development: implications for lung tumorigenesis. J. Med. Gen. 40:408–17.

Brachat A et al. (2002) A microarray-based, integrated approach to identify novel regulators of cancer drug response and apoptosis. Oncogene . 21:8361–71.

Rhodes DR et al. (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl Acad. Sci. U. S. A. 101:9309–14.

Cutler DJ et al. (2001) High-throughput variation detection and genotyping using microarrays. Genome Res. 11:1913–1925.

Yan PS et al. (2001) Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays. Cancer Res. 61: 8375–80.

Pollack JR, Perou CM, Alizadeh AA, et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23:41–6.

Relogio A et al. (2005) Alternative splicing microarrays reveal functional expression of neuron-specific regulators in Hodgkin lymphoma cells. J. Biol. Chem. 280: 4779–84.

Wang D et al. (2002) Microarray-based detection and genotyping of viral pathogens. Proc. Natl Acad. Sci. U. S. A. 99:15687–92.

Conejero-Goldberg C et al. (2005) Infectious pathogen detection arrays: viral detection in cell lines and postmortem brain tissue. Biotechniques . 39:741–51.

Fan JB, Chee MS, Gunderson KL. (2006) Highly parallel genomic assays. Nat. Rev. Genet. 7: 632–44.

Benjamini Y, Hochberg Y. (1995) Controlling the False Discovery Rate — a Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B. 57:289–300.

Storey JD, Tibshirani R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. U. S. A. 100:9440–5.

Yue H et al. (2001) An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucleic Acids Res. 29: E41–41.

Mutch DM, Berger A, Mansourian R, Rytz A, Roberts MA. (2001) Microarray data analysis: a practical approach for selecting differentially expressed genes. Genome Biol. 2: PREPRINT0009.

Kim SY, Lee JW, Sohn IS. (2006) Comparison of various statistical methods for identifying differential gene expression in replicated microarray data. Stat. Methods Med. Res. 15:3–20.

Li LP, Weinberg CR, Darden TA, Pedersen LG. (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics . 17:1131–42.

Ooi CH, Tan P. (2003) Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics . 19:37–44.

Sha NJ et al. (2004) Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics 60:812–9.

Trevino V, Falciani F. (2006) GALGO: an R package for multivariate variable selection using genetic algorithms. Bioinformatics . 22:1154–6.

Tibshirani R, Hastie T, Narasimhan B, Chu G. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl Acad. Sci. U. S. A. 99:6567–72.

Getz G, Levine E, Domany E. (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Natl Acad. Sci. U. S. A. 97:12079–84.

Sheng Q, Moreau Y, Smet FD, Marchal K, Moor BD. (2005) Advances in Cluster Analysis of Microarray Data. In: Azuaje F, Dopazo J (eds.) Data analysis and visualization in genomics and proteomics. John Wiley, Hoboken, NJ, pp. 153–171.

Vaquerizas JM et al. (2005) GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data. Nucleic Acids Res. 33: W616–20.

Saeed AI, Bhagabati NK, Braisted JC, et al. (2006) TM4 microarray software suite. DNA Microarrays, Part B: Databases and Statistics 411:134–193.

Grewal A, Conway A. (2000) Tools for Analyzing Microarray Expression Data. Journal of Lab Automation 5:62–4.

Sturn A, Quackenbush J, Trajanoski Z. (2002) Genesis: cluster analysis of microarray data. Bioinformatics . 18:207–8.

Eisen MB, Spellman PT, Brown PO, Botstein D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. U. S. A. 95:14863–8.

Rosenwald A, Wright G, Chan WC, et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346:1937–47.

Zhao HJ, Kim Y, Wang P, et al. (2005) Genome-wide characterization of gene expression variations and DNA copy number changes in prostate cancer cell lines. Prostate 63:187–197.

Braude I et al. (2006) Large scale copy number variation (CNV) at 14q12 is associated with the presence of genomic abnormalities in neoplasia. BMC Genomics . 7:138.

Bird AP. (1986) Cpg-Rich Islands and the Function of DNA-Methylation. Nature . 321:209–13.

Henikoff S, Matzke MA. (1997) Exploring and explaining epigenetic effects. Trends Genet. 13:293–5.

Laird PW. (2003) The power and the promise of DNA methylation markers. Nat. Rev. Cancer. 3: 253–66.

Schumacher A, Kapranov P, Kaminsky Z, et al. (2006) Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 34:528–42.

Lodygin D, Epanchintsev A, Menssen A, Diebold J, Hermeking H. (2005) Functional epigenomics identifies genes frequently silenced in prostate cancer. Cancer Res. 65:4218–27.

Gebhard C et al. (2006) Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res. 66:6118–28.

Shi H et al. (2006) Discovery of novel epigenetic markers in non-Hodgkin’s lymphoma. Carcinogenesis . 28:60–70.

Zhang D et al. (2006) Microarray-based molecular margin methylation pattern analysis in colorectal carcinoma. Anal. Biochem. 355:117–24.

Wei SH et al. (2006) Prognostic DNA methylation biomarkers in ovarian cancer. Clin. Cancer Res. 12:2788–94.

Piotrowski A et al. (2006) Microarray-based survey of CpG islands identifies concurrent hyper- and hypomethylation patterns in tissues derived from patients with breast cancer. Genes Chromosomes Cancer . 45:656–67.

Syvanen AC. (2005) Toward genome-wide SNP genotyping. Nat. Genet. 37:S5–10.

Teh MT et al. (2005) Genomewide single nucleotide polymorphism microarray mapping in basal cell carcinomas unveils uniparental disomy as a key somatic event. Cancer Res. 65: 8597–603.

Hoque MO, Lee CC, Cairns P, Schoenberg M, Sidransky D. (2003) Genome-wide genetic characterization of bladder cancer: a comparison of high-density single-nucleotide polymorphism arrays and PCR-based microsatellite analysis. Cancer Res. 63:2216–22.

Dumur CI et al. (2003) Genome-wide detection of LOH in prostate cancer using human SNP microarray technology. Genomics . 81:260–9.

Moreno-Rocha JC, Revol de Mendoza A, Barrera-Saldana HA. (1999) Genetic transcription in eukaryotes: from transcriptional factors to disease. Rev. Invest. Clin. 51:375–84.

Gardner L, Lee LA, Dang CV. (2002) c-myc Protooncogene. In: Bertino JR (ed.) Encyclopedia of Cancer . Academic Press, San Diego, Calif., pp. 555–561.

Wu J, Smith LT, Plass C, Huang TH. (2006) ChIP-chip comes of age for genome-wide functional analysis. Cancer Res. 66:6899–902.

Beyer A et al. (2006) Integrated assessment and prediction of transcription factor binding. PLoS Comput. Biol. 2:e70.

Bolstad BM, Irizarry RA, Astrand M, Speed TP. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics . 19:185–93.

Barrera-Saldana HA, Robberson DL, Saunders GF. (1982) Transcriptional products of the human placental lactogen gene. J. Biol. Chem. 257: 12399–404.

Storey JD. (2002) A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B. 64:479–98.

Hulbert EM, Smink LJ, Adlem EC, et al. (2007) T1DBase: integration and presentation of complex data for type 1 diabetes research. Nucleic Acids Research 35:D742–D746.

Su AI et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. U. S. A. 99:4465–70.

Lash AE et al. (2000) SAGEmap: a public gene expression resource. Genome Res. 10:1051–60.

Huminiecki L, Lloyd AT, Wolfe KH. (2003) Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases. BMC Genomics . 4:31.

Brazma A et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 29: 365–71.

Spellman PT et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3: RESEARCH0046.

Acknowledgments

HABS thanks the Staff of the Microarray Technology EMBO-INER Advanced Practical Course for enjoyable course lessons, materials and results; Peter Davies, Nancy and Greg Shipley of UT Medical School for additional laboratory training; Albert Sasson for critical reading of the manuscript and the offices of the Dean of his school and of the President of his University for support. Victor Trevino thanks Darwin Trust of Edinburgh and CONACyT for his PhD scholarship, and ITESM for support.

Author information

Authors and affiliations.

Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Nuevo León, México

Victor Trevino

School of Biosciences, University of Birmingham, Birmingham, UK

Victor Trevino & Francesco Falciani

Departamento de Bioquímica, Facultad de Medicina de la Universidad Autónoma de Nuevo León, Laboratorio de Genómica y Bioinformática del Unidad de Laboratorios de Ingeniería y Expresión Genética, Avenida. Madero y Eduardo Aguirre Pequeño, Colonia, Mitras Centro, 64460, Monterrey, Nuevo León, México

Hugo A. Barrera-Saldaña

Corresponding author

Correspondence to Hugo A. Barrera-Saldaña .

Additional information

Communicated by: Adolofo Martinez-Palomo

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article.

Trevino, V., Falciani, F. & Barrera-Saldaña, H.A. DNA Microarrays: a Powerful Genomic Tool for Biomedical and Clinical Research. Mol Med 13 , 527–541 (2007). https://doi.org/10.2119/2006-00107.Trevino

Received : 06 December 2006

Accepted : 02 July 2007

Published : 01 September 2007

Issue Date : September 2007

DOI : https://doi.org/10.2119/2006-00107.Trevino

  • Powerful Genomic Tools
  • Unmethylated Fraction
  • Formal Statistical Approach
  • Oligo-based Microarray
  • Slide Centrifugation

Molecular Medicine

ISSN: 1528-3658

  • Review Article
  • Published: 01 October 2004

DNA-microarray analysis of brain cancer: molecular classification for therapy

  • Paul S. Mischel 1 ,
  • Timothy F. Cloughesy 2 &
  • Stanley F. Nelson 3  

Nature Reviews Neuroscience volume  5 ,  pages 782–792 ( 2004 ) Cite this article

The genomic revolution is transforming clinical medicine — instead of the present model of population risk assessment and empirical treatment, we will move to one of predictive individualized care based on molecular classification and targeted therapy. This review highlights the role of DNA microarrays in developing predictive molecular diagnostics for patients with brain tumours.

The molecular events that are crucial for normal development and function are similar between individuals. However, in cancer, genetic and epigenetic alterations result in cascades of deregulated molecular events, which lead to genetically complex, highly individual tumours. Finding consistencies that can be therapeutically exploited is vital for the development of new treatments.

Brain cancer is now the leading cause of death from cancer in children under the age of 15 and the second leading cause of death from cancer from age 15 to 34. In adults, brain cancer is proportionately less common than other cancers, yet it accounts for a disproportionate percentage of deaths from cancer.

Primary brain tumours arise from the constituent cells of the CNS or their meningeal covering, whereas secondary brain tumours metastasize from a distant site. In 1928, Bailey and Cushing suggested that brain tumours could be classified by their microscopic resemblance to a presumed CNS cell of origin or its developmental precursor. Although recent work shows a more complex pattern, this model has remained a guiding principle for brain tumour classification.

New approaches are being developed to specifically target proteins or pathways that are altered in cancer cells. Morphologically identical tumours can be distinct in their mutational patterns, signalling-pathway alterations and gene-expression profiles, and, most importantly, in their response to a range of therapies.

Medulloblastomas have distinctive global gene-expression profiles that readily distinguish them from morphological mimics, and DNA microarrays can detect molecular subsets of medulloblastoma cases that differ in survival. Low-grade astrocytomas, oligodendrogliomas and glioblastomas also have distinctive global gene-expression profiles.

The fact that DNA microarrays can be used to detect molecular subsets that differ in survival indicates that it will soon be possible to develop gene-based predictors of therapeutic response. DNA microarrays might also facilitate the functional analysis of new anti-cancer compounds and the identification of novel biomarkers and molecular-imaging probes.

Cancer cells do not 'invent' new pathways; they use pre-existing pathways in different ways or they combine components of these pathways in a new fashion. By mapping, expanding and refining pathway maps in brain cancer, DNA-microarray studies might provide insight into the connectivity of these pathways in the developing and normally functioning brain.

It is possible to imagine a day in the not-too-distant future when serum biomarkers and molecular-imaging probes that are identified by DNA microarrays will be used for screening or early detection. Tumours will undergo microarray analysis to identify pathway alterations that point to the most beneficial therapy, and response to therapy will be monitored using molecular imaging probes and/or serum biomarkers.

Primary brain tumours are among the most lethal of all cancers, largely as a result of their lack of responsiveness to current therapy. Numerous new therapies hold great promise for the treatment of patients with brain cancer, but the main challenge is to determine which treatment is most likely to benefit an individual patient. DNA-microarray-based technologies, which allow simultaneous analysis of expression of thousands of genes, have already begun to uncover previously unrecognized patient subsets that differ in their survival. Here, we review the progress made so far in using DNA microarrays to optimize brain cancer therapy.

Slonim, D. K. From patterns to pathways: gene expression data analysis comes of age. Nature Genet. 32 (Suppl.), 502–508 (2002).

Barabasi, A. L. & Oltvai, Z. N. Network biology: understanding the cell's functional organization. Nature Rev. Genet. 5 , 101–113 (2004). This paper is an excellent review of the concept of network biology, and describes the quantitative tools that can be used to analyse networks.

Ideker, T. Systems biology 101 — what you need to know. Nature Biotechnol. 22 , 473–475 (2004).

Legler, J. M. et al. RESPONSE: re: brain and other central nervous system cancers: recent trends in incidence and mortality. J. Natl Cancer Inst. 92 , 77–78 (2000).

Mischel, P. S. & Cloughesy, T. F. Targeted molecular therapy of GBM. Brain Pathol. 13 , 52–61 (2003).

Bailey, P. & Cushing, H. A Classification of the Tumors of the Glioma Group on a Histogenic Basis with a Correlated Study of Prognosis (Lippincott, Philadelphia, 1928).

Lu, Q. R. et al. Common developmental requirement for Olig function indicates a motor neuron/oligodendrocyte connection. Cell 109 , 75–86 (2002).

Doetsch, F. The glial identity of neural stem cells. Nature Neurosci. 6 , 1127–1134 (2003).

Doetsch, F., Caille, I., Lim, D. A., Garcia-Verdugo, J. M. & Alvarez-Buylla, A. Subventricular zone astrocytes are neural stem cells in the adult mammalian brain. Cell 97 , 703–716 (1999).

Zhou, Q. & Anderson, D. J. The bHLH transcription factors OLIG2 and OLIG1 couple neuronal and glial subtype specification. Cell 109 , 61–73 (2002).

Hemmati, H. D. et al. Cancerous stem cells can arise from pediatric brain tumors. Proc. Natl Acad. Sci. USA 100 , 15178–15183 (2003).

Bachoo, R. M. et al. Epidermal growth factor receptor and Ink4a/Arf: convergent mechanisms governing terminal differentiation and transformation along the neural stem cell to astrocyte axis. Cancer Cell 1 , 269–277 (2002).

Kleihues, P. et al. The WHO classification of tumors of the nervous system. J. Neuropathol. Exp. Neurol. 61 , 215–225 (2002).

Holland, E. C. et al. Combined activation of Ras and Akt in neural progenitors induces glioblastoma formation in mice. Nature Genet. 25 , 55–57 (2000).

Lee, Y. & McKinnon, P. J. DNA ligase IV suppresses medulloblastoma formation. Cancer Res. 62 , 6395–6399 (2002). In this paper, the authors compare the global gene-expression profiles of medulloblastomas that are derived from a set of genetically defined mouse crosses, thereby identifying the contribution of a number of genes, including PTC1 , LIG4 and p53 , in the development of medulloblastoma.

Lee, Y. et al. A molecular fingerprint for medulloblastoma. Cancer Res. 63 , 5428–5437 (2003).

Weiner, H. L. et al. Induction of medulloblastomas in mice by sonic hedgehog, independent of Gli1. Cancer Res. 62 , 6385–6389 (2002).

Rao, G., Pedone, C. A., Coffin, C. M., Holland, E. C. & Fults, D. W. c-Myc enhances sonic hedgehog-induced medulloblastoma formation from nestin-expressing neural progenitors in mice. Neoplasia 5 , 198–204 (2003).

Rao, G. et al. Sonic hedgehog and insulin-like growth factor signaling synergize to induce medulloblastoma formation from nestin-expressing neural progenitors in mice. Oncogene 23 , 6156–6162 (2004).

Choe, G. et al. Analysis of the phosphatidylinositol 3'-kinase signaling pathway in glioblastoma patients in vivo . Cancer Res. 63 , 2742–2746 (2003).

Ermoian, R. P. et al. Dysregulation of PTEN and protein kinase B is associated with glioma histology and patient survival. Clin. Cancer Res. 8 , 1100–1106 (2002).

Vivanco, I. & Sawyers, C. L. The phosphatidylinositol 3-kinase AKT pathway in human cancer. Nature Rev. Cancer 2 , 489–501 (2002).

Berman, D. M. et al. Medulloblastoma growth inhibition by hedgehog pathway blockade. Science 297 , 1559–1561 (2002). This paper was crucial in defining the role of the hedgehog pathway in the genesis of medulloblastoma, and in identifying inhibition of the hedgehog pathway as a potential therapy.

Wechsler-Reya, R. & Scott, M. P. The developmental biology of brain tumors. Annu. Rev. Neurosci. 24 , 385–428 (2001). This outstanding review highlights the genetic mechanisms that are known to be involved in medulloblastoma, as well as other paediatric brain tumours.

Ramaswamy, S. Translating cancer genomics into clinical oncology. N. Engl. J. Med. 350 , 1814–1816 (2004).

Liotta, L. A. et al. Protein microarrays: meeting analytical challenges for clinical applications. Cancer Cell 3 , 317–325 (2003).

Lindblad-Toh, K. et al. Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. Nature Biotechnol. 18 , 1001–1005 (2000).

Hoque, M. O., Lee, C. C., Cairns, P., Schoenberg, M. & Sidransky, D. Genome-wide genetic characterization of bladder cancer: a comparison of high-density single-nucleotide polymorphism arrays and PCR-based microsatellite analysis. Cancer Res. 63 , 2216–2222 (2003).

Albertson, D. G. & Pinkel, D. Genomic microarrays in human genetic disease and cancer. Hum. Mol. Genet. 12 , R145–R152 (2003).

Shi, H. et al. Triple analysis of the cancer epigenome: an integrated microarray system for assessing gene expression, DNA methylation, and histone acetylation. Cancer Res. 63 , 2164–2171 (2003).

Yeakley, J. M. et al. Profiling alternative splicing on fiber-optic arrays. Nature Biotechnol. 20 , 353–358 (2002).

Mischel, P. S., Nelson, S. F. & Cloughesy, T. F. Molecular analysis of glioblastoma: pathway profiling and its implications for patient therapy. Cancer Biol. Ther. 2 , 242–247 (2003). This review provides an overview of the gene-expression and signal-transduction alterations in glioblastoma and suggests potential therapeutic strategies.

Druker, B. J. Perspectives on the development of a molecularly targeted agent. Cancer Cell 1 , 31–36 (2002).

Sawyers, C. L. Disabling Abl-perspectives on Abl kinase regulation and cancer therapeutics. Cancer Cell 1 , 13–15 (2002).

Betensky, R. A., Louis, D. N. & Cairncross, J. G. Influence of unrecognized molecular heterogeneity on randomized clinical trials. J. Clin. Oncol. 20 , 2495–2499 (2002).

Bianco, R. et al. Loss of PTEN/MMAC1/TEP in EGF receptor-expressing tumor cells counteracts the antitumor action of EGFR tyrosine kinase inhibitors. Oncogene 22 , 2812–2822 (2003).

Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286 , 531–537 (1999).

Armstrong, S. A. et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genet. 30 , 41–47 (2002).

Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406 , 747–752 (2000).

Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl Acad. Sci. USA 98 , 10869–10874 (2001).

Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl Acad. Sci. USA 100 , 8418–8423 (2003).

Ramaswamy, S. et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. USA 98 , 15149–15154 (2001).

Shia, J. et al. Value of histopathology in predicting microsatellite instability in hereditary nonpolyposis colorectal cancer and sporadic colorectal cancer. Am. J. Surg. Pathol. 27 , 1407–1417 (2003).

Cardiff, R. D. et al. Validation: the new challenge for pathology. Toxicol. Pathol. 32 , 31–39 (2004).

Boorman, G. A. et al. Toxicogenomics, drug discovery, and the pathologist. Toxicol. Pathol. 30 , 15–27 (2002).

Dhanasekaran, S. M. et al. Delineation of prognostic biomarkers in prostate cancer. Nature 412 , 822–826 (2001).

Shappell, S. B. et al. Prostate pathology of genetically engineered mice: definitions and classification. The consensus report from the Bar Harbor meeting of the Mouse Models of Human Cancer Consortium Prostate Pathology Committee. Cancer Res. 64 , 2270–2305 (2004).

Lossos, I. S. et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N. Engl. J. Med. 350 , 1828–1837 (2004).

Bullinger, L. et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N. Engl. J. Med. 350 , 1605–1616 (2004).

Valk, P. J. et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med. 350 , 1617–1628 (2004).

Rosenwald, A. et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346 , 1937–1947 (2002).

Glinsky, G. V., Higashiyama, T. & Glinskii, A. B. Classification of human breast cancer using gene expression profiling as a component of the survival predictor algorithm. Clin. Cancer Res. 10 , 2272–2283 (2004).

Glinsky, G. V., Glinskii, A. B., Stephenson, A. J., Hoffman, R. M. & Gerald, W. L. Gene expression profiling predicts clinical outcome of prostate cancer. J. Clin. Invest. 113 , 913–923 (2004).

Hedenfalk, I. et al. Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proc. Natl Acad. Sci. USA 100 , 2532–2537 (2003).

van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347 , 1999–2009 (2002).

Lapointe, J. et al. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc. Natl Acad. Sci. USA 101 , 811–816 (2004).

Lynch, T. J. et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N. Engl. J. Med. 350 , 2129–2139 (2004).

Paez, J. G. et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304 , 1497–1500 (2004).

Pomeroy, S. L. et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415 , 436–442 (2002). This important paper showed that medulloblastomas can be readily distinguished from other brain tumours, including morphological mimics, on the basis of gene-expression profiling. It also showed that a relatively small number of genes could accurately predict patient survival and response to therapy.

Fernandez-Teijeiro, A. et al. Combining gene expression profiles and clinical parameters for risk stratification in medulloblastomas. J. Clin. Oncol. 22 , 994–998 (2004). In a logical continuation of the work described in reference 59, the authors showed that gene-expression data can predict the outcome for patients with medulloblastoma, independent of clinical variables.

MacDonald, T. J. et al. Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nature Genet. 29 , 143–152 (2001). In this paper, the authors used cDNA-microarray analysis to identify a potentially crucial and therapeutically targetable pathway that might promote metastasis in medulloblastoma.

Oliver, T. G. et al. Transcriptional profiling of the sonic hedgehog response: a critical role for N-myc in proliferation of neuronal precursors. Proc. Natl Acad. Sci. USA 100 , 7331–7336 (2003).

Shai, R. et al. Gene expression profiling identifies molecular subtypes of gliomas. Oncogene 22 , 4918–2493 (2003).

Rickman, D. S. et al. Distinctive molecular profiles of high-grade and low-grade gliomas based on oligonucleotide microarray analysis. Cancer Res. 61 , 6885–6891 (2001).

Sallinen, S. L. et al. Identification of differentially expressed genes in human gliomas by DNA microarray and tissue chip techniques. Cancer Res. 60 , 6617–6622 (2000).

Huang, H. et al. Gene expression profiling of low-grade diffuse astrocytomas by cDNA arrays. Cancer Res. 60 , 6868–6874 (2000).

Fuller, G. N. et al. Reactivation of insulin-like growth factor binding protein 2 expression in glioblastoma multiforme: a revelation by parallel gene expression profiling. Cancer Res. 59 , 4228–4232 (1999).

Godard, S. et al. Classification of human astrocytic gliomas on the basis of gene expression: a correlated group of genes with angiogenic activity emerges as a strong predictor of subtypes. Cancer Res. 63 , 6613–6625 (2003).

Khatua, S. et al. Overexpression of the EGFR/FKBP12/HIF-2α pathway identified in childhood astrocytomas by angiogenesis gene profiling. Cancer Res. 63 , 1865–1870 (2003).

Nutt, C. L. et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63 , 1602–1607 (2003). This paper showed that cDNA-microarray technology could potentially be used to address diagnostically confusing gliomas, and that transcriptional information contains more data about outcome than does pathological examination.

Mischel, P. S. et al. Identification of molecular subtypes of glioblastoma by gene expression profiling. Oncogene 22 , 2361–2373 (2003). This paper highlights the potential of cDNA microarrays to detect molecular subsets of morphologically identical glioblastomas.

Freije, W. A. et al. Gene expression profiling of gliomas strongly predicts survival. Cancer Res. [In the press]. In this paper, the authors demonstrate that gene-expression-based grouping of malignant gliomas is a more powerful predictor of survival than pathological type, grade or age.

Scherf, U. et al. A gene expression database for the molecular pharmacology of cancer. Nature Genet. 24 , 236–244 (2000).

Staunton, J. E. et al. Chemosensitivity prediction by transcriptional profiling. Proc. Natl Acad. Sci. USA 98 , 10787–10792 (2001).

Wallqvist, A. et al. Mining the NCI screening database: explorations of agents involved in cell cycle regulation. Prog. Cell Cycle Res. 5 , 173–179 (2003).

PubMed   Google Scholar  

Butte, A. J., Tamayo, P., Slonim, D., Golub, T. R. & Kohane, I. S. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl Acad. Sci. USA 97 , 12182–12186 (2000).

Bohen, S. P. et al. Variation in gene expression patterns in follicular lymphoma and the response to rituximab. Proc. Natl Acad. Sci. USA 100 , 1926–1930 (2003).

Sotiriou, C. et al. Gene expression profiles derived from fine needle aspiration correlate with response to systemic chemotherapy in breast cancer. Breast Cancer Res. 4 , R3 (2002).

Article   PubMed   PubMed Central   Google Scholar  

Cheok, M. H. et al. Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells. Nature Genet. 34 , 85–90 (2003).

Stegmaier, K. et al. Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation. Nature Genet. 36 , 257–263 (2004).

Reifenberger, G. & Louis, D. N. Oligodendroglioma: toward molecular definitions in diagnostic neuro-oncology. J. Neuropathol. Exp. Neurol. 62 , 111–126 (2003).

Gajjar, A. et al. Clinical, histopathologic, and molecular markers of prognosis: toward a new disease risk stratification system for medulloblastoma. J. Clin. Oncol. 22 , 984–993 (2004).

Liu, E. T. & Karuturi, K. R. Microarrays and clinical investigations. N. Engl. J. Med. 350 , 1595–1597 (2004).

Diehn, M., Eisen, M. B., Botstein, D. & Brown, P. O. Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nature Genet. 25 , 58–62 (2000).

Tanwar, M. K., Gilbert, M. R. & Holland, E. C. Gene expression microarray analysis reveals YKL-40 to be a potential serum marker for malignant character in human glioma. Cancer Res. 62 , 4364–4368 (2002).

Herschman, H. R. Molecular imaging: looking at problems, seeing solutions. Science 302 , 605–608 (2003).

Bergmann, S., Ihmels, J. & Barkai, N. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2 , E9 (2004).

Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302 , 249–255 (2003).

Chang, H. Y. et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2 , E7 (2004).

Article   PubMed   PubMed Central   CAS   Google Scholar  

Whitfield, M. L. et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13 , 1977–2000 (2002).

Ficenec, D. et al. Computational knowledge integration in biopharmaceutical research. Brief. Bioinform. 4 , 260–278 (2003).

Buckingham, S. Bioinformatics: programmed for success. Nature 425 , 209–215 (2003).

Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 , 2498–2504 (2003).

Angelastro, J. M. et al. Identification of diverse nerve growth factor-regulated genes by serial analysis of gene expression (SAGE) profiling. Proc. Natl Acad. Sci. USA 97 , 10424–10429 (2000).

Liu, D. X. & Greene, L. A. Regulation of neuronal survival and death by E2F-dependent gene repression and derepression. Neuron 32 , 425–438 (2001).

Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature 402 (Suppl.), C47–C52 (1999).

Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. The large-scale organization of metabolic networks. Nature 407 , 651–654 (2000).

Alon, U. Biological networks: the tinkerer as an engineer. Science 301 , 1866–1867 (2003).

Agrawal, H. Extreme self-organization in networks constructed from gene expression data. Phys. Rev. Lett. 89 , 268702 (2002).

Article   PubMed   CAS   Google Scholar  

Chen, J. et al. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface. Nucleic Acids Res. 32 , D578–D581 (2004).

Ramaswamy, S. & Golub, T. R. DNA microarrays in clinical oncology. J. Clin. Oncol. 20 , 1932–1941 (2002).

Download references

Acknowledgements

We wish to thank S. Horvath, M. Carlson, W. Freije and Z. Fang for their contribution to this work, and we thank D. Geschwind and H. Kornblum for helpful discussions about this review. The authors are supported by the NINDS and NCI and by Accelerate Brain Cancer Cure, the Packard Foundation, the Harry Allgauer Foundation through The Doris R. Ullmann Fund for Brain Tumor Research Technologies, the Henry E. Singleton Brain Tumor Endowment, Art of the Brain and the Ziering Family Foundation in memory of Sigi Ziering.

Author information

Authors and affiliations.

Departments of Pathology and Laboratory Medicine, the Henry E. Singleton Brain Cancer Research Program at the David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90095, California, USA

Paul S. Mischel

Department of Neurology, the Henry E. Singleton Brain Cancer Research Program at the David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90095, California, USA

Timothy F. Cloughesy

Department of Human Genetics, the Henry E. Singleton Brain Cancer Research Program at the David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90095, California, USA

Stanley F. Nelson

Corresponding author

Correspondence to Paul S. Mischel .

Ethics declarations

Competing interests.

The authors declare no competing financial interests.

Glossary

Glioma: Any brain tumour that originates from the glial cell lineage.

Metastasis: The spread of cancer cells from one organ or tissue to another, usually through the blood stream or the lymphatic system.

Single-nucleotide polymorphisms: Bi-allelic (typically) base pair substitutions, which are the most common forms of genetic polymorphism.

Alternative splicing: During splicing, introns are excised from RNA after transcription and the cut ends are rejoined to form a continuous message. Alternative splicing allows the production of different messages from the same DNA molecule.

Desmoplasia: A term that refers to the growth of dense fibrous tissue around a tumour.

cDNA: Complementary DNA that is produced from an RNA template by an RNA-dependent DNA polymerase.

Chemotaxis: The movement of cells in response to a chemical gradient that is provided by chemotactic agents.

RT-PCR: Reverse transcriptase–polymerase chain reaction (PCR), a reaction in which messenger RNA is converted into DNA (reverse transcription), which is then amplified by PCR.

About this article

Cite this article.

Mischel, P., Cloughesy, T. & Nelson, S. DNA-microarray analysis of brain cancer: molecular classification for therapy. Nat Rev Neurosci 5 , 782–792 (2004). https://doi.org/10.1038/nrn1518

Issue Date : 01 October 2004

DOI : https://doi.org/10.1038/nrn1518


SYSTEMATIC REVIEW article

Deep learning techniques for cancer classification using microarray gene expression data.

Surbhi Gupta,

  • 1 Department of Computer Science and Engineering, SMVDU, Jammu, India
  • 2 Model Institute of Engineering and Technology, Jammu, India
  • 3 School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India

Cancer is one of the leading causes of death globally. Recently, microarray gene expression data has been used to aid the effective and early detection of cancer. DNA microarray technology, which measures the expression levels of thousands of genes simultaneously in a single experiment, holds enormous promise for uncovering such information. The analysis of gene expression is critical in many disciplines of biological study. This study analyses the research studies focused on optimizing gene selection for cancer detection using artificial intelligence. One of the most challenging issues is extracting meaningful information from massive databases. Deep learning architectures have performed efficiently in numerous sectors and are used to diagnose many chronic diseases and to assist physicians in making medical decisions. In this study, we have evaluated the results of different optimizers on an RNA sequence dataset. The deep learning algorithm proposed in the study classifies five different forms of cancer: kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD) and colon adenocarcinoma (COAD). The performance of different optimizers, namely Stochastic Gradient Descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient Optimizer (AdaGrad), and Adaptive Moment Estimation (Adam), is compared. The experimental results gathered on the dataset indicate that AdaGrad and Adam are more precise than the other optimizers. The performance analysis has also been done using different learning rates and decay rates. This study discusses current advancements in deep learning-based gene expression data analysis using optimized feature selection methods.

1 Introduction

Cancer is one of the deadliest diseases, and with its increasing prevalence, early identification and treatment are critical (Sung et al., 2021; Schiff et al., 2007; Reid et al., 2011). Female breast cancer has surpassed lung cancer as one of the most frequently diagnosed forms of cancer. Figure 1 shows the cancer cases and deaths in 2020.

FIGURE 1 . Cancer cases and deaths in 2020.

About two-thirds of cases are detected at initial stages (Fotouhi et al., 2019; Id et al., 2021; Kashyap et al., 2022). The classification and identification of gene expression using DNA microarray data is an effective tool for the diagnosis and prognosis of specific cancer subtypes. AI-based learning algorithms are widely used to extract significant features from gene expression data and play an essential part in gene categorization. This article reviews some of those strategies from the literature, the datasets to which they are applied, and their associated benefits and drawbacks. The classic variants of deep learning, such as Convolutional Neural Networks, Artificial Neural Networks, and Autoencoders, have been established as essential tools for clinical oncology research and can be used to drive decision-making regarding disease diagnosis and therapy. Over time, disease in general, and cancer in particular, has grown increasingly complex and challenging to identify, analyze, and treat. Cancer research is a prominent topic of study in the medical world.

1.1 Distribution of articles

The articles selected for analysis were published in the last 5 years. Most of the research articles explored in this study were published in 2018 and 2019. Articles that have explored gene expression data for cancer diagnosis, survival, or stage prediction have been included in this study. Figure 2 presents the year-wise distribution of articles.

FIGURE 2 . Year-wise Distribution of articles.

1.2 Contributions of study

The study contributes in a number of ways. Following are the significant contributions made by the study:

• This article reviews recent developments in deep learning-based feature selection techniques for gene expression data interpretation and offers an extensive review of Deep Learning architectures that have demonstrated success across a wide range of industries and are now used to help doctors identify various chronic conditions.

• In this work, we have compared the outcomes of several optimizers on a dataset of RNA sequences. The study’s deep learning system categorizes five types of cancer: colon adenocarcinoma (COAD), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), breast invasive carcinoma (BRCA), and kidney renal clear cell carcinoma (KIRC).

• The efficiency of several optimizers is compared, including adaptive gradient optimization (AdaGrad), stochastic gradient descent (SGD), root mean square propagation (RMSProp), and adaptive moment estimation (Adam). According to the experimental findings on the dataset, AdaGrad and Adam are more precise. The study also explores performance across a variety of learning rates and decay rates.
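To make the comparison of optimizers more concrete, the following is a minimal sketch of the four update rules on a toy one-parameter loss. It is not the study's experimental setup: the loss `f(w) = (w - 3)^2`, the function names, and all learning rates, decay rates, and step counts are assumptions chosen only for illustration.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2 (an illustrative assumption)."""
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=500):
    # Plain gradient descent: fixed step along the negative gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adagrad(w, lr=0.5, steps=500, eps=1e-8):
    g2_sum = 0.0  # running sum of squared gradients
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w

def run_rmsprop(w, lr=0.05, steps=500, decay=0.9, eps=1e-8):
    g2_avg = 0.0  # exponentially decayed average of squared gradients
    for _ in range(steps):
        g = grad(w)
        g2_avg = decay * g2_avg + (1.0 - decay) * g * g
        w -= lr * g / (math.sqrt(g2_avg) + eps)
    return w

def run_adam(w, lr=0.1, steps=500, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0  # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)  # bias correction
        v_hat = v / (1.0 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, run in [("SGD", run_sgd), ("AdaGrad", run_adagrad),
                  ("RMSProp", run_rmsprop), ("Adam", run_adam)]:
    print(name, run(0.0))  # each result should end close to the minimum at w = 3
```

On this toy problem all four rules approach the minimum; with a fixed learning rate, RMSProp typically keeps oscillating in a small band around it, which is one reason learning-rate and decay-rate settings are studied alongside the choice of optimizer.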

1.3 Organization of paper

This paper is organized to make the article easy to follow. The introduction describes the significance of gene-expression analysis in cancer research. Section 2 describes the search strategy used to select the articles for this study. Section 3 presents an overview of deep learning approaches, including the conventional ones. Section 4 illustrates the importance of deep learning techniques in cancer prediction. Section 5 covers the literature of recent studies that have explored deep learning strategies for gene selection or survival prediction from microarray gene expression datasets. The article is discussed and concluded in Section 6 and Section 7, respectively. This study reviews and presents a comparative analysis of the previous studies. This article aims to analyze the concepts underlying deep learning-based classification algorithms used in healthcare.

2 Search strategy

The search strategy used in this paper follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) strategy. All the research studies selected for this systematic review were extracted from databases such as PubMed, Web of Science, EBSCO, and EMBASE. Research articles published before 2016 are excluded from the analysis. The keywords used to extract articles include “Deep Learning”, “Artificial Intelligence”, “Cancer”, “Micro-array analysis”, “gene-expression”, and combinations of these keywords. Research articles that focus on the optimization of gene selection using deep learning techniques have been included in the study. Figure 3 shows the PRISMA strategy flowchart.

FIGURE 3 . PRISMA search strategy.

3 Deep learning

Artificial intelligence is the idea of making innovative and intelligent machines. Machine learning is a subset of artificial intelligence that aids in developing AI-driven applications. Deep learning is a subtype of machine learning that trains a model using large amounts of data and advanced methods. Figure 4 shows the hierarchy of AI, machine learning, and deep learning.

FIGURE 4 . Artificial intelligence and sub-parts.

FIGURE 5 . Artificial intelligence.

FIGURE 6 . Convolutional neural network.

• Long Short-Term Memory (LSTM) Network: Hochreiter and Schmidhuber created the LSTM (Lecun et al., 2015), which is utilized in various applications. IBM chose LSTMs primarily for voice recognition. The LSTM employs a memory unit, known as a cell, that can retain its value for an extended period and helps the network remember the most recently computed value. The cell comprises three gates that regulate the movement of data inside the unit. Figure 7 shows the logical structure of an LSTM model.

FIGURE 7 . Long short-term memory.

• The input gate controls the flow of new data into the memory.

• The forget gate discards irrelevant or unnecessary information.

• The output gate regulates which stored information is passed on as output.

The significant differences between deep learning approaches and traditional learning are summarized in Table 1 .

TABLE 1 . Distinction between deep and traditional learning.

• Artificial Neural Networks: Neural networks are among the most often used data modeling algorithms in medicine. They were first developed in the 20th century (Daoud and Mayo, 2019). The primary goal of employing neural networks is to recognize patterns and carry out classification tasks. The neural network is modeled on the human brain, which is made up of billions of neurons that are all linked together. Figure 5 shows the representation of an artificial neural network.

Similarly, a neural network consists of multiple neurons, with a weight assigned to each link. These neurons act in parallel. During the learning stage, the network updates the weights so that each input is mapped to the correct output (Gupta and Gupta, 2021b). Neural networks use different activation functions to introduce non-linearity. The sigmoid activation is given in Equation 1.

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

The hyperbolic tangent (Tanh) activation is given in Equation 2.

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$

The rectified linear unit (ReLU) activation is expressed in Eq. 3.

$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$

Because the network is adaptive, adjusting the weights helps minimize the error. In contrast to basic modeling methods, neural networks have the advantage of capturing non-linear relationships. Neural networks play a significant role in the study of medical data, for example in medication development and in predicting cardiac disease.
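The three functions named above (sigmoid, Tanh, and ReLU) can be sketched in a few lines. This is a minimal illustration assuming scalar inputs; the helper `neuron` and the example weights are hypothetical and are not taken from the study.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, squashing x into (0, 1); written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x: float) -> float:
    """Hyperbolic tangent, squashing x into (-1, 1)."""
    return math.tanh(x)

def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)

def neuron(inputs, weights, bias, activation):
    """Hypothetical single neuron: weighted sum of inputs, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Example: the same weighted sum passed through each activation.
for act in (sigmoid, tanh, relu):
    print(act.__name__, neuron([1.0, 2.0], [0.5, 0.25], 0.0, act))
```

The weighted sum is identical in each call; only the non-linearity applied to it changes, which is what lets a network model non-linear relationships.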

• Convolutional Neural Network (CNN): CNN is a multi-layer neural network inspired by the visual cortex of animals. LeCun et al. constructed the first CNN. CNN’s major application areas include image processing and character recognition (Akkus et al., 2017; Zahras, 2018). In terms of construction, the initial layers detect features, while the intermediate layers recombine them into higher-level features, followed by classification. The extracted features are pooled, which reduces their dimensionality. After the convolution and pooling steps, the result is fed into a fully connected multi-layer perceptron. The last layer, known as the output layer, produces the classification, and the network is trained using back-propagation (Gupta and Gupta, 2021a). Because of properties such as local connectivity and shared weights, CNN improves the system’s accuracy and performance. It frequently outperforms other deep learning techniques on image data and is among the most widely used architectures. Figure 6 shows a convolutional neural network.
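The convolution and pooling steps described above can be sketched on a one-dimensional signal. This is only an illustration: the function names, the toy signal, and the two-element kernel are assumptions, and a real CNN learns its kernels during training rather than using a fixed one.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation (the operation deep learning calls convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# Toy input and a fixed kernel that responds to rising values.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 4.0, 1.0]
edge_kernel = [-1.0, 1.0]

features = max_pool1d(relu(conv1d(signal, edge_kernel)))
print(features)  # -> [2.0, 0.0, 3.0]
```

The same small kernel is reused at every position (shared weights), and each output depends only on neighbouring inputs (local connectivity); pooling then shortens the feature map from seven values to three.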

In an LSTM, the cell’s weights act as the regulating factor for the gates. These weights are learned with a training approach known as backpropagation through time (BPTT), which uses the network’s output error to optimize them.
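A single time step of an LSTM cell, with the input, forget, and output gates described earlier, can be sketched as follows. This is a simplified scalar version under stated assumptions: the function `lstm_step`, the parameter dictionary `p`, and its key names are hypothetical, and practical implementations use weight matrices and train them with BPTT rather than setting them by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each of "input", "forget", "output", "candidate" to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new data enters
    f = sigmoid(pre("forget"))       # forget gate: how much old memory is kept
    o = sigmoid(pre("output"))       # output gate: how much memory is exposed
    g = math.tanh(pre("candidate"))  # candidate value to write into the cell

    c = f * c_prev + i * g           # updated cell state (the memory)
    h = o * math.tanh(c)             # hidden state passed to the next step
    return h, c

# A cell biased to remember: forget gate near 1, input gate near 0.
remember = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=2.0, p=remember)
print(c)  # stays very close to the previous cell state of 2.0
```

With the forget gate saturated near 1 and the input gate near 0, the cell state is carried forward almost unchanged, which is how the cell retains a value over long sequences.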

4 Deep learning in cancer prediction

Deep learning has been widely utilized to improve prognosis (Huang et al., 2020). Gene expression profiles, which describe the molecular state of a sample, offer enormous promise as a medical diagnostic tool. However, current training data sets have a very small sample size relative to the number of genes involved, and this constraint challenges many classification techniques. One of the most important new clinical applications of microarray data is abnormality detection. Because of the high dimensionality, gene selection is a crucial step in enhancing the classification performance of expression data, so better approaches for selecting functional genes for cancer prediction and detection are required. A microarray study yields a massive quantity of gene-expression information (several thousand features) from a single sample, whereas the number of samples/patients is far lower (up to a few hundred). The ratio of features (gene expressions) to cases (samples) is therefore highly skewed, resulting in the well-known curse-of-dimensionality issue. The limited number of training samples is insufficient to build an efficient model from the given data. This is referred to as data scarcity.

Processing microarray gene expression data is a diverse field of computer science that includes graph analysis, machine learning, clustering, and classification. Microarray technology allows for the measurement of thousands of gene expressions in a single experiment. Gene expression levels aid in identifying linked genes and disease development, which aids in the early detection and prognosis of many forms of cancer.

5 Literature work

Dwivedi (2016) developed a framework of supervised machine learning approaches for discriminating acute lymphoblastic leukemia from acute myeloid leukemia using microarray gene expression patterns. This classification was accomplished using an artificial neural network (ANN). Prostate cancer was predicted using multi-layer perceptrons, and multiple data balancing techniques were explored (Surbhi Gupta, 2021). Another recent study in 2021 (Gupta and Gupta, 2021b) predicted mesothelioma with 96% accuracy using an ANN. Tumuluru and Ravi (2017) presented an approach for cancer categorization based on gene-expression data. A logarithmic transformation pre-processed the gene expression data to reduce the classification’s complexity, while the Bhattacharyya distance identified the most informative genes. The weight update in the Deep Belief Neural Network used the average error estimated with the grasshopper optimization algorithm (GOA) and gradient descent.

Experiments with colon and leukemia data demonstrate the efficacy of the proposed cancer classification. The approach achieved an accuracy of 0.9534 and a detection rate of 0.9666 on gene expression data.
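
The pre-processing and gene-ranking steps described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Tumuluru and Ravi (2017): it assumes a log2(x + 1) transform and models each gene in each class as a univariate Gaussian, for which the Bhattacharyya distance has a closed form. All function names and the toy values are hypothetical.

```python
import math
from statistics import mean, pvariance


def log_transform(values):
    """log2(x + 1); the +1 offset is an assumption that keeps zeros finite."""
    return [math.log2(v + 1.0) for v in values]


def bhattacharyya_gaussian(a, b, eps=1e-12):
    """Bhattacharyya distance between univariate Gaussians fitted to a and b."""
    mu_a, mu_b = mean(a), mean(b)
    var_a, var_b = pvariance(a) + eps, pvariance(b) + eps
    term_mean = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    term_var = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return term_mean + term_var


def rank_genes(class_a, class_b):
    """Rank gene indices from most to least class-separating."""
    n_genes = len(class_a[0])
    scores = []
    for g in range(n_genes):
        a = log_transform([row[g] for row in class_a])
        b = log_transform([row[g] for row in class_b])
        scores.append(bhattacharyya_gaussian(a, b))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Toy expression values (hypothetical): rows are samples, columns are genes.
class_a = [[1, 50, 7], [2, 52, 9], [3, 49, 8]]
class_b = [[100, 51, 15], [120, 50, 12], [110, 52, 14]]
ranking = rank_genes(class_a, class_b)
print(ranking)
```

Genes whose class-conditional distributions overlap heavily receive a distance near zero and fall to the bottom of the ranking.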

Despite decades of research, clinical diagnosis of cancer and the identification of tumor-specific markers remain uncertain. Danaee et al. (2017) offered a deep learning technique for cancer detection and for identifying critical genes for breast cancer diagnosis using autoencoders. The error rates are computed using the log loss function given in Equation 4.

In the above equation, J(k) and L(m) represent the prediction and target values.

Cho et al. (2018) applied automated learning to search for survival-specific gene mutations in patients with lung adenocarcinoma (LUAD) using data from TCGA. Distinct feature selection methods were utilized to find survival-specific mutations in response to particular clinical variables. Kaplan-Meier survival analysis was performed on the extracted LUAD survival-specific mutations individually or in groups. Patient death was strongly associated with mutations in MMRN2 and GMPPA, whereas patient survival was associated with mutations in ZNF560 and SETX. In addition, DNAJC2 and MMRN2 mutations showed a substantial negative correlation with overall survival, whereas ZNF560 mutations showed a significant positive correlation with overall survival. Lin et al. (2018) tested the proposed SSAE model on three public RNA-seq data sets of three types of cancers.

A retrospective study (Lin et al., 2018) investigated the use of deep learning (DL) to predict acute myeloid leukemia (AML) prognosis. This study used 94 AML cases from the TCGA database. Age, ten common cytogenetic mutations, and the 23 most common mutations were used as input data. As proof-of-concept research, the results suggested that DL is feasible for prognostic prediction using next-generation sequencing (NGS) data.

Parvathavardhini and Manju (2020) proposed a Neuro-Fuzzy approach for interpreting gene-expression data from microarray experiments. The analysis enabled the detection and classification of cancer, hence facilitating treatment selection and development. The proposed strategy was evaluated against three publicly available datasets of cancer gene expression. Sevakula et al. (2018) proposed a transfer learning process for cancer verification in combination with autoencoders. The cross entropy function is used for optimizing the neural models. The cross entropy (CE) is calculated using Equation 5.

The term X_i denotes the predicted probability for the i-th instance and Y_i represents the corresponding truth value, over k instances. The algorithm’s performance was evaluated on the GEMLeR repository dataset, and the results have implications for precision medicine.
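
As an illustration of the cross entropy described above, the following sketch computes the standard mean binary cross entropy over k instances. It assumes the usual binary form, which may differ in detail from Equation 5 of the paper; the clipping constant `eps` and the function name are assumptions added to keep the example runnable.

```python
import math


def cross_entropy(probs, truths, eps=1e-12):
    """Mean binary cross entropy over k instances.

    probs  : predicted probabilities X_i in [0, 1]
    truths : ground-truth labels Y_i (0 or 1)
    eps    : clipping constant (an assumption) that avoids log(0)
    """
    if len(probs) != len(truths):
        raise ValueError("probs and truths must have the same length")
    total = 0.0
    for x, y in zip(probs, truths):
        x = min(max(x, eps), 1.0 - eps)  # keep the logarithm finite
        total += y * math.log(x) + (1 - y) * math.log(1.0 - x)
    return -total / len(probs)


# An uninformative model (all 0.5) versus a confident, correct model.
ce_uninformative = cross_entropy([0.5, 0.5], [1, 0])
ce_confident = cross_entropy([0.99, 0.01], [1, 0])
print(ce_uninformative, ce_confident)
```

Lower values indicate predicted probabilities that sit closer to the true labels, which is why the quantity is minimized during training.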

Numerous computational methods for classifying cancer subtypes have been presented (Xu et al., 2019b). However, the majority of them build the model using only gene expression data. Huynh et al. (2019) proposed a new support vector machine (SVM) classification model for gene expression based on features extracted by a deep convolutional neural network (DCNN). Equation 6 illustrates the working of the CNN.

Here a and b denote the input data and kernel, respectively. Also, [x, y] denote the row and column indexes of the resultant matrix.
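
The operation can be sketched in a few lines. The example below assumes the sliding-window form commonly used in CNN layers (cross-correlation without kernel flipping, no padding, stride 1), which may differ from the exact form of Equation 6; the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(a, b):
    """Slide kernel b over input a and return the resultant matrix.

    a: input data as a list of rows
    b: kernel as a list of rows (no larger than a)
    Entry [x][y] of the result sums a[x + i][y + j] * b[i][j] over the kernel.
    """
    k_rows, k_cols = len(b), len(b[0])
    out_rows = len(a) - k_rows + 1
    out_cols = len(a[0]) - k_cols + 1
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for x in range(out_rows):          # row index of the resultant matrix
        for y in range(out_cols):      # column index of the resultant matrix
            out[x][y] = sum(
                a[x + i][y + j] * b[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
    return out


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, -1]]
feature_map = conv2d_valid(a, b)
print(feature_map)
```

Flipping the kernel along both axes before calling the function gives the strict mathematical convolution; learned CNN kernels make the distinction immaterial in practice.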

Nonetheless, gene expression data is very high-dimensional, which results in an over-fitting problem for the classification model. Lin et al. (2018) proposed a novel way of incorporating deep learning into an ensemble approach that included numerous machine learning models. First, the study used differential gene expression analysis to supply informative gene data to five distinct classification models. The outputs of the five classifiers were then combined using a deep learning algorithm.

Significant bioinformatics research has been undertaken in cancer research, and bioinformatics methodologies may aid in developing methods and models for early prediction of stomach cancer (Shon et al., 2021). This study aimed to build a CNN algorithm to analyze TCGA data. It merged RNA-seq and clinical data, then searched for and assessed candidate genes using the CNN model. In addition, the study performed learning and evaluated the status of cancer patients. The proposed model achieved an accuracy of 95.96 percent and a critical status accuracy of 50.51 percent. Despite overfitting due to the small sample size, reasonably accurate results for the sample type were achieved. This method can be used to forecast the diagnosis of stomach cancer, which comes in various forms and has a variety of underlying causes.

Gupta and Manoj (2021) found that group algorithms for chronic disease diagnosis could be more effective than baseline algorithms. The study also outlines several impediments to furthering the use of machine learning classification to detect illness. The proposed strategy achieved 98.5, 99, and 100% accuracy. The disease datasets used in the study include Diabetes, Cardiovascular Disease, and Breast Cancer. The algorithms used for disease prediction are Group Algorithms, Stacked, and Neural Network.

Abdollahi et al. (2021) proposed a novel strategy for reducing the number of features by utilizing an autoencoder. Each gene’s weight is determined by the autoencoder model, and the weights indicate the magnitude of each gene’s effect on survival probability. The approach enhances survival analysis by speeding up the procedure, increasing prediction accuracy, and decreasing the error rate of the calculated survival probability. The error rates are computed using the root mean squared error (RMSE). The mathematical formula of RMSE is given in Equation 7, where A and O represent actual and observed values, respectively.
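
For reference, RMSE in its standard form is the square root of the mean squared difference between paired values. The sketch below follows the A and O notation used above; the function name and sample values are hypothetical.

```python
import math


def rmse(actual, observed):
    """Root mean squared error between paired actual (A) and observed (O) values."""
    if len(actual) != len(observed):
        raise ValueError("actual and observed must have the same length")
    squared = [(a - o) ** 2 for a, o in zip(actual, observed)]
    return math.sqrt(sum(squared) / len(squared))


perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off = rmse([0.0, 0.0], [3.0, 4.0])
print(perfect, off)
```

Because the errors are squared before averaging, a few large deviations dominate the score more than many small ones.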

5.1 Comparative analysis

Multiple studies aimed to investigate cancer prediction models. Table 2 summarizes the research analysis.

TABLE 2 . Research analysis.

6 Experimental results

This section presents the simulation results achieved using an ANN model with multiple optimizers: Stochastic Gradient Descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient (AdaGrad), and Adaptive Moment Estimation (Adam). The performance has also been analyzed using different learning rates and decay rates.
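
The four optimizers differ only in how they turn a gradient into a weight update. The sketch below shows the textbook update rules for a single scalar weight, applied to a toy quadratic loss. It is not the experimental code behind the results in this section; the hyperparameter defaults (`rho`, `b1`, `b2`, `eps`) are common conventions assumed here, and all function names are hypothetical.

```python
import math


def sgd(w, g, state, lr, t):
    """Plain gradient step."""
    return w - lr * g


def adagrad(w, g, state, lr, t, eps=1e-8):
    """Scale the step by the root of the accumulated squared gradients."""
    state["G"] = state.get("G", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["G"]) + eps)


def rmsprop(w, g, state, lr, t, rho=0.9, eps=1e-8):
    """Like AdaGrad, but with an exponential moving average of squared gradients."""
    state["v"] = rho * state.get("v", 0.0) + (1.0 - rho) * g * g
    return w - lr * g / (math.sqrt(state["v"]) + eps)


def adam(w, g, state, lr, t, b1=0.9, b2=0.999, eps=1e-8):
    """Moving averages of the gradient and its square, with bias correction."""
    state["m"] = b1 * state.get("m", 0.0) + (1.0 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1.0 - b2) * g * g
    m_hat = state["m"] / (1.0 - b1 ** t)
    v_hat = state["v"] / (1.0 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(optimizer, steps=200, lr=0.1):
    """Minimize the toy loss (w - 3)^2 starting from w = 0."""
    w, state = 0.0, {}
    for t in range(1, steps + 1):
        g = 2.0 * (w - 3.0)  # gradient of the toy loss
        w = optimizer(w, g, state, lr, t)
    return w


for opt in (sgd, adagrad, rmsprop, adam):
    print(opt.__name__, minimize(opt))
```

The adaptive methods normalize each step by a running estimate of gradient magnitude, which makes them less sensitive to the raw scale of the gradient than plain SGD.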

6.1 Dataset analysis

The TCGA dataset is available at https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq. This dataset comprises data on five different forms of cancer: Kidney Renal Clear Cell Carcinoma (KIRC), Breast Invasive Carcinoma (BRCA), Lung Adenocarcinoma (LUAD), Prostate Adenocarcinoma (PRAD), and Colon Adenocarcinoma (COAD). The dataset consists of 20,531 attributes of 801 patients.

6.2 Optimization with multiple optimizers

The performance of multiple optimizers is analyzed and shown in Figure 8 . From Figure 8 , it is clear that both “Adam” and “Adagrad” performed the best on training and testing data.

FIGURE 8 . Accuracy of multiple optimizers.

The ANN model using the SGD and RMSProp optimizers attained 35.3% on training data and 43.8% on test data. Both Adam and AdaGrad performed well. Hence, we went on to analyze the effect of parameters such as learning rate and decay rate.

6.3 Optimization with learning rates

The performance of ADAM optimizer using different learning rates is analyzed and shown in Figure 9 .

FIGURE 9 . Performance of multiple learning rates.

From the figure, it is clear that the learning rates 0.01, 0.001, 0.0001, and 1e-05 performed the best on training and testing data. The ANN models performed worst (35% on the training set and 43.8% on the test set) with the largest learning rates (1.0 and 0.1).

6.4 Optimization with decay rates

Learning rate decay (lrDecay) is a technique commonly used to train modern neural networks. It begins with a high learning rate and then decays it several times. It has been demonstrated empirically to aid both optimization and generalization. The performance of the Adam optimizer with different decay rates is shown in Figure 10.
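
Two common decay schedules are sketched below for illustration. The text does not state which schedule was used in the experiments, so both the time-based and the step-wise forms are assumptions, as are the function and parameter names.

```python
def time_based_decay(initial_lr, decay, step):
    """Learning rate shrinks smoothly: lr = lr0 / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


def step_decay(initial_lr, drop, every, step):
    """Learning rate is multiplied by `drop` once every `every` steps."""
    return initial_lr * (drop ** (step // every))


# Compare both schedules at a few training steps.
for step in (0, 10, 25, 100):
    print(
        step,
        time_based_decay(0.1, decay=0.01, step=step),
        step_decay(0.1, drop=0.5, every=10, step=step),
    )
```

With the time-based form, a larger decay rate lowers the learning rate faster, which is one plausible reason different decay rates lead to very different accuracies.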

FIGURE 10 . Performance of multiple decay rates.

From the figure, it is clear that the decay rates 0.1 and 0.001 performed the best on training and testing data. The ANN models performed worst with decay rates 0.01 (35.3% on the training set and 43.8% on the test set) and 0.0001 (63.5% on the training set and 68.7% on the test set).

7 Discussion

Several strategies for gene selection in cancer categorization have been proposed in prior studies. The advent of deep learning has profoundly affected a wide variety of machine learning applications and research. A few such studies (Gupta and Gupta, 2021a; Gupta and Gupta, 2021b; Gupta and Gupta, 2021c) are described in this section. The workflow used for classification of cancer data is shown in Figure 11.

FIGURE 11 . Deep learning for Cancer Classification.

Initially, the data is explored, a step termed “exploratory data analysis”. Next, data preprocessing steps are applied, such as cleaning the data, reducing its dimension (feature reduction), and normalizing it. The following stage splits the preprocessed data into training and test sets. The deep learning classification algorithm is trained on the training set, and the trained model is then evaluated on the test set. This evaluation expresses the accuracy of the model.

The number of cancer cases is rapidly increasing. Cancer is difficult to diagnose because the illness is frequently asymptomatic in its early stages, and it is prone to recurrence after treatment. Early detection can increase the odds of a patient’s recovery and cure. Cancer classification is therefore a crucial topic. One of the most effective methods for cancer classification is gene selection (Gupta and Gupta, 2021d). The task of choosing a set of genes that enhances classification accuracy is NP-Hard. Furthermore, making accurate and specific cancer diagnostic forecasts is difficult. Because of nonspecific symptoms and imprecise scans, certain tumors are more challenging to diagnose in their early stages. As a result, improving the prediction model in diagnostic cancer research is vital. The number of cancer research articles has also increased dramatically, particularly those that use deep learning methodologies (Shimizu and Nakayama, 2020). The present research shows that these techniques (Akkus et al., 2017; Ronoud and Asadi, 2019; Chaunzwa et al., 2021) aid in improving prediction accuracy and are frequently applied in the healthcare sector. Their success stems from enabling the discovery of highly complicated non-linear correlations between features and the extraction of information from unlabeled data unrelated to the situation at hand. Statistical studies demonstrate that deep learning models outperform numerous widely used cancer categorization algorithms.
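
The splitting, normalization, and evaluation stages of the workflow in Figure 11 can be sketched as follows. This is a minimal illustration rather than the pipeline used in the experiments: it assumes a class-stratified split and z-score normalization fitted on the training set only, and it omits the classifier itself. All names are hypothetical.

```python
import random
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same test share."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def apply_scaler(rows, stats):
    """Z-score each feature using statistics from fit_scaler."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["BRCA"] * 10 + ["LUAD"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

train_rows = [[1.0, 10.0], [3.0, 10.0]]
stats = fit_scaler(train_rows)
scaled = apply_scaler(train_rows, stats)
print(len(train_idx), len(test_idx), scaled)
```

Fitting the scaler on the training rows alone avoids leaking test-set statistics into the model, which matters when only a few hundred samples are available.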

Several academics have investigated automated learning methodologies; however, these approaches still have several flaws that make cancer classification difficult. Specific machine learning algorithms have been found incapable of exploiting unstructured data in cancer classification. CNNs are particularly appropriate for analyzing a wide range of unstructured data. This capability enabled deep learning algorithms to take an active role in the early diagnosis of cancer through data classification. Deep learning approaches have achieved high accuracy and other statistical characteristics. Deep Learning has succeeded in various domains, including image, video, audio, and text processing. Deep Learning faces a unique problem in gene expression analysis for various cancer detection and prediction tasks to define appropriate biomarkers for different cancer subtypes. Despite several research studies on multimodal treatment approaches, survival times remain short. The gathering of significant genes that can increase accuracy can provide adequate guidance in early cancer detection. Cancer can be classified into several subgroups. However, it is a complex task because of the vast number of genes and the comparatively few experiments in gene expression data ( Kumar et al, 2021 ). Cancer identification from microarray gene expression data presents a significant difficulty due to the small sample size, high dimensionality, and complexity of the data ( Dargan et al, 2020 ). There is a need for rapid and computationally efficient methods to address such issues. This study briefly explores the research studies that employed deep learning architectures that selected the most relevant genes for cancer prediction using gene expression data. Although Deep Learning has had success in various domains, it has yet to be thoroughly explored in genomics, notably in genomic cancer.

8 Conclusion

Cancer has become one of the top causes of death worldwide in recent years. As a result, increasing research is being done to determine the most effective ways of diagnosing and treating cancer. However, cancer treatment faces numerous obstacles, as possible causes of cancer include genetic problems or epigenetic modifications in the cells. RNA sequencing is a substantial approach for assessing gene expression in model organisms and can provide information for bio-molecular cancer diagnosis. Microarray gene expression profiles can be used to classify tumors efficiently and effectively. Predicting various tumors is a significant problem, and offering accurate predictions would be highly beneficial in delivering better therapy to patients. The advent of deep learning approaches is critical for improving patient monitoring, as it can aid clinicians in making decisions regarding deadly diseases. Furthermore, gene expression data are utilized to develop classification models that support cancer treatment. Classification of cancer subtypes is critical for effective diagnosis and individualized cancer treatment. The article concludes that recent advances in high-throughput sequencing technology have resulted in the quick generation of multi-omics data from the same cancer sample. Thus, deep learning-based molecular illness classification holds considerable promise in the realm of genomics, particularly concerning gene microarray data.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abdollahi J., Nouri-Moghaddam B., Ghazanfari M. (2021). Deep Neural Network Based Ensemble learning Algorithms for the healthcare system diagnosis of chronic diseases. arXiv preprint. Available at: https://arxiv.org/abs/2103.08182.

Ahn T., Lee C. (2018). “Deep learning-based identification of cancer or normal tissue using gene expression data,” in Proceeding IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 03-06 December 2018 (IEEE), 1748–1752. doi:10.1109/BIBM.2018.8621108

Akkus Z., Galimzianova A., Hoogi A., Rubin D. L., Erickson B. J. (2017). Deep learning for brain MRI segmentation : State of the art and future directions. J. Digit. Imaging. 30 (4), 449–459.

Alomari O. A., Khader A. T., Al-Betar M. A., Awadallah M. A. (2018). A novel gene selection method using modified MRMR and hybrid bat-inspired algorithm with β-hill climbing. Appl. Intell. (Dordr). 48 (11), 4429–4447. doi:10.1007/s10489-018-1207-1

Aziz R., Verma C. K., Srivastava N. (2017). A novel approach for dimension reduction of microarray. Comput. Biol. Chem. 71, 161–169. doi:10.1016/j.compbiolchem.2017.10.009

Basavegowda H. S., Dagnew G. (2020). Deep learning approach for microarray cancer data classification. CAAI Trans. Intell. Technol. 5, 22–33. doi:10.1049/trit.2019.0028

Chaunzwa T. L., Hosny A., Xu Y., Shafer A., Diao N., Lanuti M., et al. (2021). Deep learning classification of lung cancer histology using CT images. Sci. Rep. 1, 5471. doi:10.1038/s41598-021-84630-x

Chen X., Xie J., Yuan Q. (2018). A method to facilitate cancer detection and type classification from gene expression data using a deep autoencoder and neural network. arXiv preprint. Available at: https://arxiv.org/abs/1812.08674.

Ching T., Zhu X., Garmire L. X. (2018). Cox-nnet : An artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 14, e1006076–18. doi:10.1371/journal.pcbi.1006076

Cho H., Lee S., Ji Y. G., Hyeon D., Id L. (2018). Association of specific gene mutations derived from machine learning with survival in lung adenocarcinoma. PLoS One 13, e0207204. doi:10.1371/journal.pone.0207204

Danaee P., Ghaeini R., Hendrix D. A. (2017). A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput. 22, 219–229. doi:10.1142/9789813207813_0022

Daoud M., Mayo M. (2019). A survey of neural network-based cancer prediction models from microarray data. Artif. Intell. Med. 97, 204–214. doi:10.1016/j.artmed.2019.01.006

Dargan S., Kumar M., Rohit M., Gulshan A. (2020). A survey of deep learning and its applications : A new paradigm to machine learning. Arch. Comput. Methods Eng. 27 (4), 1071–1092. doi:10.1007/s11831-019-09344-w

Dwivedi A. K. (2016). Artificial neural network model for effective cancer classification using microarray gene expression data. Neural comput. Appl. 29, 1545–1554. doi:10.1007/s00521-016-2701-1

Extraction S. F. (2017). “Prognosis prediction of human breast cancer by integrating deep neural network and support vector machine supervised feature extraction and classification for breast cancer prognosis prediction,” in Proceeding International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) , Shanghai China , 14-16 October 2017 ( IEEE ). doi:10.1109/CISP-BMEI.2017.8301908

Fotouhi S., Asadi S., Kattan M. W. (2019). A comprehensive data level analysis for cancer diagnosis on imbalanced data. J. Biomed. Inf. 90, 103089. doi:10.1016/j.jbi.2018.12.003

Gao F., Wang W., Tan M., Zhu L., Zhang Y., Fessler E., et al. (2019). DeepCC : A novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis 8, 44. doi:10.1038/s41389-019-0157-8

García-díaz P., Sánchez-berriel I., Martínez- J. A., Diez-pascual A. M. (2019). Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data. Genomics 112, 1196. doi:10.1016/j.ygeno.2019.11.004

Guia J. M. De. (2019). “DeepGx : Deep learning using gene expression for cancer classification,” in Proceeding IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) , Vancouver BC Canada , 27-30 August 2019 ( IEEE ), 913–920. doi:10.1145/3341161.3343516

Guo Y., Liu S., Li Z., Shang X. (2018). BCDForest : A boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data. BMC Bioinforma. 19 (5), 118–213. doi:10.1186/s12859-018-2095-4

Gupta G., Manoj G. (2021). “Deep learning for brain tumor segmentation using magnetic resonance images,” in Proceeding IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) , Melbourne, Australia , 13-15 October 2021 ( IEEE ), 1–6. doi:10.1109/CIBCB49929.2021.9562890

Gupta S. (2021). Computational prediction of cervical cancer diagnosis using ensemble-based classification algorithm . doi:10.1093/comjnl/bxaa198

Gupta S., Gupta M. (2021c). “Deep learning for brain tumor segmentation using magnetic resonance images,” in Proceeding IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, Melbourne, Australia, 13-15 October 2021 (IEEE). doi:10.1109/CIBCB49929.2021.9562890

Gupta S., Gupta M. K. (2021d). A comparative analysis of deep learning approaches for predicting breast cancer survivability. Archives Comput. Methods Eng. , 1

Gupta S., Gupta M. K. (2021a). A comprehensive data‐level investigation of cancer diagnosis on imbalanced data. Comput. Intell. 38, 156–186. doi:10.1111/coin.12452

Gupta S., Gupta M. K. (2021b). Computational model for prediction of malignant mesothelioma diagnosis. Comput. J. doi:10.1093/comjnl/bxab146

He B., Luo H., Zhou Z., Wang B., Liang Y., Lang J., et al. (2020). A neural network framework for predicting the tissue-of-origin of 15 common cancer types based on RNA-seq data. Front. Bioeng. Biotechnol. 8 (8), 737–811. doi:10.3389/fbioe.2020.00737

Huang Z., Johnson T. S., Han Z., Helm B., Cao S., Zhang C., et al. (2020). Deep learning-based cancer survival prognosis from RNA-seq data : Approaches and evaluations. BMC Med. Genomics 13 (5), 41–12. doi:10.1186/s12920-020-0686-1

Huynh P., Nguyen V., Do T. (2019). Novel hybrid DCNN–SVM model for classifying RNA-sequencing gene expression data. J. Inf. Telecommun. 3, 533–547. doi:10.1080/24751839.2019.1660845

Id J. L., Zhou Z., Dong J., Fu Y., Li Y., Luan Z., et al. (2021). Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS One 16, e0250370–23. doi:10.1371/journal.pone.0250370

Jerez M., Franco L., Veredas F. J., Lo G. (2020). Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data. Plos One 15, e0230536–24. doi:10.1371/journal.pone.0230536

Joshi P., Park T. (2019). “Cancer subtype classification based on superlayered neural network,” in Proceeding IEEE International Conference on Bioinformatics and Biomedicine , San Diego, CA, USA , 18-21 November 2019 ( IEEE ), 1988–1992. doi:10.1109/BIBM47256.2019.8983343

Kashyap D., Pal D., Sharma R., Garg V. K., Goel N., Koundal D., et al. (2022). Global increase in breast cancer incidence: Risk Factors and preventive Measures. Biomed. Res. Int. 2022, 9605439. doi:10.1155/2022/9605439

Kim B., Yu K., Lee P. C. W. (2020). Cancer classification of single-cell gene expression data by neural network. Bioinformatics 36, 1360–1366. doi:10.1093/bioinformatics/btz772

Kong Y., Yu T. (2018). A deep neural network model using random forest to extract feature representation for gene expression data classification. Sci. Rep. 8 (1), 16477. doi:10.1038/s41598-018-34833-6

Kumar Y., Gupta S., Singla R., Chen Y. (2021). A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch. Comput. Methods Eng. 29 (4), 2043–2070. doi:10.1007/s11831-021-09648-w

Lecun Y., Bengio Y., Hinton G. (2015). Deep learning. Nature 521 (7553), 436–444. doi:10.1038/nature14539

Lin M., Jaitly V., Wang I., Hu Z., Chen L., Wahed M., et al. (2018). Application of deep learning on predicting prognosis of acute myeloid leukemia with cytogenetics age and mutations. arXiv preprint. Available at: https://arxiv.org/abs/1810.13247.

Motieghader H., Najafi A., Sadeghi B., Masoudi-nejad A. (2017). A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. Inf. Med. Unlocked 9, 246–254. doi:10.1016/j.imu.2017.10.004

Panda M. (2017). Elephant search optimization combined with deep neural network for microarray data analysis. J. King Saud Univ. - Comput. Inf. Sci. 32, 940–948. doi:10.1016/j.jksuci.2017.12.002

Parvathavardhini S., Manju S. (2020). Cancer gene detection using Neuro fuzzy classification algorithm. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 3 (3), 2456

Reid A., Klerk N. De, Musk A. W. B. (2011). Does exposure to asbestos cause ovarian cancer ? A systematic literature review and meta-analysis. Cancer Epidemiol. Biomarkers Prev. 20, 1287–1295. doi:10.1158/1055-9965.EPI-10-1302

Ronoud S., Asadi S. (2019). An evolutionary deep belief network extreme learning-based for breast cancer diagnosis. Soft Comput. 23. 13139–13159. doi:10.1007/s00500-019-03856-0

Salman I., Ucan O., Bayat O., Shaker K. (2018). Impact of metaheuristic iteration on artificial neural network structure in medical data. Process. (Basel). 6, 57. doi:10.3390/pr6050057

Schiff M., Castle P. E., Jeronimo J., Rodriguez A. C., Wacholder S. (2007). Human papillomavirus and cervical cancer. Clin. Microbiol. Rev. 16, 1–17. doi:10.1128/CMR.16.1.1-17.2003

Sevakula R. K., Singh V., Member S., Kumar C., Cui Y. (2018). Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 5963, 2089–2100. doi:10.1109/TCBB.2018.2822803

Shimizu H., Nakayama K. I. (2020). Artificial intelligence in oncology. Cancer Sci. 111 (5), 1452–1460. doi:10.1111/cas.14377

Shon H. S., Yi Y., Kim K. O., Cha E., Kim K. (2021). Classification of stomach cancer gene expression data using CNN algorithm of deep learning. J. Biomed. Transl. Res. 20 (1), 15–20. doi:10.12729/jbtr.2019.20.1.015

Sung H., Ferlay J., Siegel R. L., Laversanne M., Soerjomataram I., Jemal A., et al. (2021). Global cancer statistics 2020 : GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Ca. Cancer J. Clin. 71 (3), 209–249. doi:10.3322/caac.21660

Surbhi Gupta M. G. (2021). “Prostate cancer prognosis using multi-layer perceptron and class balancing techniques,” in Proceeding 2021 Thirteenth International Conference on Contemporary Computing (IC3-2021) (IC3 ’21) , August 05-07, 2021 (New York NY USA: ACM ), 1.

Torkey H., Atlam M., El-fishawy N., Salem H. (2021). A novel deep autoencoder based survival analysis approach for microarray dataset. Peer Comput. Sci. 1, e492. doi:10.7717/peerj-cs.492

Tumuluru P., Ravi B. (2017). “Goa-Based DBN : Grasshopper optimization algorithm-based deep belief neural networks for cancer classification Goa-based DBN : Grasshopper optimization algorithm-based deep belief neural networks for cancer classification,” in Proceeding International Journal of Applied Engineering Research , 14218–14231.

Urda D., Moreno F. (2017). “Deep learning to analyze RNA-seq gene expression data,” in International work-conference on artificial neural networks , 50–59. Springer: Cham . doi:10.1007/978-3-319-59147-6

Wessels F., Schmitt M., Krieghoff-henning E., Jutzi T., Worst T. S., Waldbillig F., et al. (2021). Deep learning approach to predict lymph node metastasis directly from primary tumor histology in prostate cancer. BJU Int. 128, 352. doi:10.1111/bju.15386

Xiao Y., Wu J., Lin Z., Zhao X. (2018a). A deep learning-based multi-model ensemble method for cancer prediction. Comput. Methods Programs Biomed. 153, 1–9. doi:10.1016/j.cmpb.2017.09.005

Xiao Y., Wu J., Lin Z., Zhao X. (2018b). A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data. Comput. Methods Programs Biomed. 166, 99–105. doi:10.1016/j.cmpb.2018.10.004

Xu J., Wu P., Chen Y., Meng Q., Dawood H., Dawood H. (2019a). A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data. BMC Bioinforma. 20, 527. doi:10.1186/s12859-019-3116-7

Xu J., Wu P., Chen Y., Meng Q., Dawood H., Khan M. M. (2019b). A novel deep flexible neural forest model for classification of cancer subtypes based on gene expression data. IEEE Access 7, 22086–22095. doi:10.1109/ACCESS.2019.2898723

Yuan Y., Shi Y., Li C., Kim J., Cai W., Han Z., et al. (2016). DeepGene : An advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinforma. 17 (17), 476. doi:10.1186/s12859-016-1334-9

Zahras D. (2018). “Cervical cancer risk classification based on deep convolutional neural network,” in Proceeding International Conference on Applied Information Technology and Innovation, Padang, Indonesia, 03-05 September 2018 (IEEE), 149–153. doi:10.1109/ICAITI.2018.8686767

Keywords: artificial intelligence, cancer, deep learning, gene expression, RNA-sequences

Citation: Gupta S, Gupta MK, Shabaz M and Sharma A (2022) Deep learning techniques for cancer classification using microarray gene expression data. Front. Physiol. 13:952709. doi: 10.3389/fphys.2022.952709

Received: 25 May 2022; Accepted: 01 September 2022; Published: 30 September 2022.

Copyright © 2022 Gupta, Gupta, Shabaz and Sharma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mohammad Shabaz, [email protected]

U.S. Food and Drug Administration

Advanced Technology for Reducing the Risk of Transmission by Transfusion

Robert Duncan, PhD

Office of Blood Research and Review, Division of Emerging and Transfusion Transmitted Diseases, Laboratory of Emerging Pathogens

Robert C. Duncan received his PhD from the University of Maryland and performed post-doctoral research at the National Cancer Institute. He is currently a senior investigator and reviewer at the FDA Center for Biologics Evaluation and Research (CBER). His broad experience includes over 25 years of research encompassing virology, bacteriology, parasitology, cell biology, and new technology for pathogen detection. He has published over 50 peer-reviewed papers in these fields. He has been assisted by a postdoctoral research fellow and a technician, who have contributed to the success of his research program.

Dr. Duncan is internationally recognized for his expertise in blood donor screening for protozoan parasites, having chaired FDA committees for product license review, written FDA policy documents, and served on World Health Organization standards committees.

His laboratory has completed a variety of significant projects, including 1) developing a Leishmania donovani genomic microarray; 2) using microarray technology to detect bioterror pathogens; 3) collaborating with Life Technologies, Corp. to develop a blood-borne pathogen detection device using the multiplex real-time PCR OpenArray platform; 4) designing and testing a blood-borne pathogen detection resequencing microarray; 5) directing a study entitled “Methods to Generate Spiked Clinical Samples for Use in Studies to Support FDA Clearance of Diagnostics for Low Prevalence Pathogens,” at the request of NIAID; 6) using a custom resequencing microarray to monitor the genomic drift of Ebola virus; and 7) collaborating with Creative LIBS Solutions, LLC to evaluate a device that detects pathogens in blood by analyzing the spectrum of light released after a laser pulse to the sample surface.

General Overview

The Office of Blood Research and Review (OBRR) performs regulatory evaluation of in vitro diagnostic devices used to screen donated blood for infectious microorganisms. Currently, this screening is done for only a few specific, pathogenic (disease-causing) microorganisms that public health officials consider the biggest threats to blood safety.

However, there are no donor screening tests for several other pathogens that are uncommon, are transmitted only at certain times of the year, or occur only in certain areas of the country. Nonetheless, these pathogens can be transmitted by transfusion and sometimes cause fatal disease in vulnerable populations (newborns, the elderly, and those whose immune systems are weak).

In addition, certain pathogens may acquire sufficient genetic changes that render them undetectable to existing tests licensed for screening donated blood. This would require modification of existing tests or addition of new ones to detect the genetically altered pathogens.

The solution to this growing threat to the blood supply lies in the combination of 1) recent advances in technologies for identifying the genetic makeup of cells and microorganisms and 2) "multiplex platforms" that incorporate such methods into devices that enable very rapid detection of several different pathogens simultaneously. These technologies might simplify routine donor screening for many pathogens, including those that have acquired variations.

To stay abreast of this developing technology and facilitate its advancement, the Division of Emerging and Transfusion Transmitted Diseases (DETTD) established a program of research and evaluation in the Laboratory of Emerging Pathogens (LEP).

The aim of the program is to enhance blood safety and availability by using new technologies and combinatorial approaches to detect known blood-borne pathogens. The program therefore addresses the following FDA Focus Areas of Regulatory Science: a) Public Health Emergency Preparedness and Response; b) Medical Countermeasures and Preparedness for Emerging Infectious Diseases; and c) Technologies to Reduce Pathogen Contamination.

Scientific Overview

“Just a quick glance at what Advanced Technology for Blood Safety looks like. Clockwise from the upper left: the fluidic processing station for the Affymetrix GeneChip, the laser-based spectrophotometric apparatus with an inset of blood soaked filter after laser collection of data, the OpenArray wafer showing nanowells for spatially multiplexed PCR, an Affymetrix GeneChip and finally in the bottom left is a liquid-handling robot”

We use our expertise in, and access to, blood-borne pathogens to evaluate new pathogen detection platforms using pathogen-spiked human blood and plasma specimens. Moreover, we leverage the equipment and expertise of outside companies through contracts and collaborations as the most cost-effective way to evaluate these new technologies. Platforms will undergo final testing using repository blood donor specimens.

The principal investigator has studied new pathogen detection technology since 2000 and has published studies on the use of advanced technology to detect blood-borne pathogens (a Leishmania donovani genomic microarray, a microarray for pathogen detection, PCR-based species identification, a multiplex real-time PCR platform, a resequencing microarray, and Laser Induced Breakdown Spectrum Analysis).

Our laboratory conceived the OpenArray multiplex real-time PCR platform project, optimized it with spiked specimens, and validated it with repository specimens of infected blood donors obtained from Creative Testing Solutions.

The laboratory also evaluated the TessArae Resequencing Pathogen Microarray for its potential in multiplex blood donor screening. The laboratory is independently evaluating a custom-designed Blood Borne Pathogen Resequencing Microarray (BBP-RMA, version 2) for improved identification of pathogens in blood. We also use the resequencing microarray technology in a novel approach to tracking genomic drift in Ebola virus, with funding from the Medical Countermeasures Initiative. Our work has demonstrated genomic drift in cultured viruses that express the Ebola surface glycoprotein.

Next generation sequencing (NGS) with a nanopore device is being applied to blood-borne pathogen detection in comparison to other multiplex platforms. Remaining engaged with NGS technology is important as the diagnostic field looks to the NGS platform as the way of the future.

The laboratory has established and validated standardized methods for spiking rare pathogens in blood to create specimens for testing new diagnostic devices. The standardized methods have been published and posted on the National Institute of Allergy and Infectious Diseases (NIAID) website for the use of device developers.

In addition, our laboratory participated in a collaboration with physicists at a research and development company who were testing a laser-based spectrophotometric device for detecting pathogens in blood. This multiplex technology tests a specimen without the need for sample preparation or traditional amplification methods. Successful results led to a publication, though redirection of the company’s goals toward SARS-CoV-2 detection resulted in the termination of the collaboration.

The laboratory is collaborating with the Infectious Disease Next Generation Sequencing Diagnostic Sequencing Project by providing DNA samples extracted from well-characterized lab cultured pathogens. This CDRH project is building a database, called ARGOS, composed of sequences that are characterized well enough to be a resource for clinical molecular diagnostics.

Important Links

  • ORCID ID: 0000-0001-8409-2501

Publications

  • J Appl Microbiol 2022 Mar;132(3):2431-40 The use of laser-based diagnostics for the rapid identification of blood borne viruses in human plasma samples. Multari RA, Cremers DA, Nelson A, Fisher C, Karimi Z, Young S, Green V, Williamson P, Duncan R
  • PLoS One 2022 Feb 10;17(2):e0263732 Tracking ebolavirus genomic drift with a resequencing microarray. Tiper I, Kourout M, Fisher C, Konduru K, Purkayastha A, Kaplan G, Duncan R
  • PLoS Negl Trop Dis 2020 Feb 28;14(2):e0008050 CpG ODN D35 improves the response to abbreviated low-dose pentavalent antimonial treatment in non-human primate model of cutaneous leishmaniasis. Thacker SG, McWilliams IL, Bonnet B, Halie L, Beaucage S, Rachuri S, Dey R, Duncan R, Modabber F, Robinson S, Bilbe G, Arana B, Verthelyi D
  • Clin Infect Dis 2019 May 30;68(12):2036-44 Asymptomatic visceral leishmania infantum infection in U.S. soldiers deployed to Iraq. Mody RM, Lakhal-Naouar I, Sherwood JE, Koles NL, Shaw D, Bigley D, Co EA, Copeland NK, Jagodzinski LL, Mukbel RM, Smiley R, Duncan RC, Kamhawi S, Jeronimo SMB, DeFraites RF, Aronson NE
  • J Appl Microbiol 2019 May;126(5):1606-17 The use of laser-based diagnostics for the rapid identification of infectious agents in human blood. Multari RA, Cremers DA, Nelson A, Karimi Z, Young S, Fisher C, Duncan R
  • Expert Rev Mol Diagn 2019 Jan;19(1):15-25 Advances in multiplex nucleic acid diagnostics for blood-borne pathogens: promises and pitfalls--an update. Duncan R, Grigorenko E, Fisher C, Hockman D, Lanning B
  • Biochim Biophys Acta 2018 Aug;1865(8):1148-59 A novel signal sequence negative multimeric glycosomal protein required for cell cycle progression of Leishmania donovani parasites. Ahuja K, Beg MA, Sharma R, Saxena A, Naqvi N, Puri N, Rai PK, Chaudhury A, Duncan R, Salotra P, Nakhasi H, Selvapandiyan A
  • Cell Host Microbe 2018 Jan 10;23(1):134-43 Gut microbes egested during bites of infected sand flies augment severity of leishmaniasis via inflammasome-derived IL-1beta. Dey R, Joshi AB, Oliveira F, Pereira L, Guimarães-Costa AB, Serafim TD, de Castro W, Coutinho-Abreu IV, Bhattacharya P, Townsend S, Aslan H, Perkins A, Karmakar S, Ismail N, Karetnick M, Meneses C, Duncan R, Nakhasi HL, Valenzuela JG, Kamhawi S
  • J Mol Diagn 2017 Jul;19(4):549-60 Highly multiplex real-time PCR-based screening for blood-borne pathogens on an OpenArray platform. Grigorenko E, Fisher C, Patel S, Winkelman V, Williamson P, Chancey C, Anez G, Rios M, Majam V, Kumar S, Duncan R
  • J Microbiol Methods 2017 Jan;132:76-82 Comparison of multiplex PCR hybridization-based and singleplex real-time PCR-based assays for detection of low prevalence pathogens in spiked samples. Hockman D, Dong M, Zheng H, Kumar S, Huff MD, Grigorenko E, Beanan M, Duncan R
  • Transfusion 2016 Jun;56(6 Pt 2):1537-47 Multiplex detection and identification of viral, bacterial, and protozoan pathogens in human blood and plasma using a high-density resequencing pathogen microarray platform. Kourout M, Fisher C, Purkayastha A, Tibbetts C, Winkelman V, Williamson P, Nakhasi HL, Duncan R
  • J Appl Microbiol 2016 Apr;120(4):1119-29 Standardized methods to generate mock (spiked) clinical specimens by spiking blood or plasma with cultured pathogens. Dong M, Fisher C, Anez G, Rios M, Nakhasi HL, Hobson JP, Beanan M, Hockman D, Grigorenko E, Duncan R
  • Expert Rev Mol Diagn 2016 Jan;16(1):83-95 Advances in multiplex nucleic acid diagnostics for blood-borne pathogens: promises and pitfalls. Duncan R, Kourout M, Grigorenko E, Fisher C, Dong M

COMMENTS

  1. DNA microarrays: Types, Applications and their future

    Figure 2. Three basic types of microarrays: (A) Spotted arrays on glass, (B) self assembled arrays and (C) in-situ synthesized arrays. A. With spotted arrays, a "pen" (or multiple pens) are dipped into solutions containing the DNA of interest and physically deposited on a 1"x 3" glass microscope slide.

  2. (PDF) Microarray Technology: Methods and Applications

    Microarray Technology, Methods and Applications. pp.7-23. The microarray technology has been a tremendous advance in molecular-based testing methods for biochemical and biomedical applications. As ...

  3. Microarrays

    A microarray is a set of samples, for example DNA, RNA or proteins, arranged on a solid substrate or chip, such as a glass slide or silicon film, that is used in high throughput experiments ...

  4. Overview of DNA Microarrays: Types, Applications, and Their Future

    This is followed by discussion of the methods of manufacture of microarrays and the most common biological applications. The unit ends with a brief description of the limitations of microarrays and discusses how microarrays are being rapidly replaced by DNA sequencing technologies. Curr. Protoc. Mol.

  5. An introduction to DNA microarrays for gene expression analysis

    The primary technological platform treated in this paper is the spotted DNA microarray, with a secondary focus on Affymetrix® arrays (see Section 3 for a description of the microarray types). This is largely due to the fact that the authors have more extensive experience working with data only from the former, and that much of the research ...

  6. Comparing whole genomes using DNA microarrays

    DNA microarrays present an alternative way to study differences between closely related genomes. ... the research and commercial community is accelerating towards ... This paper is the first to ...

  7. Recent Advances in DNA Microarray Technology: an Overview on ...

    The microarray experimental process involves six different steps (Fig. 1): (a) probe design, where the knowledge of gene sequence is crucial for the design of probes that are specific for a particular RNA or DNA sequence, (b) production of microarray, (c) sample preparation, where RNA or DNA extraction process is strictly dependent to the specimen and to the downstream application (PCR ...

  8. DNA Microarray: Basic Principle and It's Applications

    DNA microarray (also commonly known as gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface. In DNA chip technology, single stranded ...

  9. Full article: Recent Application of DNA Microarray Techniques to

    Even if this paper is concerned only on DNA microarray, there are different microarray techniques such as protein, peptide, glycan, antibody, and aptamer microarrays which can be used for screening vaccine candidate, and study posttranslational modifications. ... Related research . People also read lists articles that other readers of this ...

  10. A Comprehensive Survey of Recent Approaches on Microarray ...

    This review paper discusses various types of microarray data and stages pertaining to processing of microarray images. Substantial literature describes comparative study on various microarray image processing techniques to formulate effective results in extracting gene expressions. ... DNA microarray datasets taken for research purposes from ...

  11. Microarray cancer feature selection: Review, challenges and research

    Microarray technology first appeared in the research arena in the late 1980s (Rafii, Hassani & Kbir, 2017). Augenlicht et al. (1987) were the first researchers to describe DNA Microarrays, where about 4000 of complementary DNA (cDNA) sequences were spotted on nitrocellulose (Augenlicht, Taylor, Anderson & Lipkin, 1991).

  12. DNA microarray analysis: Principles and clinical impact

    In this introductory paper, we present the principles of DNA microarray experiments, selected clustering methods for gene expression analysis and the impact to clinical research. Keywords: DNA microarray, gene expression, hierarchical clustering, self organizing maps, support vector machine, B-cell lymphoma.

  13. DNA Microarrays: a Powerful Genomic Tool for Biomedical and Clinical

    More recent uses of DNA microarrays in biomedical research are not limited to gene expression. DNA microarrays are being used to detect single nucleotide polymorphisms (SNPs) of our genome (Hap Map project) ( 11 ), aberrations in methylation patterns ( 12 ), alterations in gene copy-number ( 13 ), alternative RNA splicing ( 14 ), and pathogen ...

  14. DNA-microarray analysis of brain cancer: molecular ...

    DNA-microarray-based technologies, which allow simultaneous analysis of expression of thousands of genes, have already begun to uncover previously unrecognized patient subsets that differ in their ...

  15. Deep learning techniques for cancer classification using microarray

    The use of DNA microarray technology to uncover information from the expression levels of thousands of genes has enormous promise. The DNA microarray technique can determine the levels of thousands of genes simultaneously in a single experiment. ... This study analyses all the research studies focused on optimizing gene selection for cancer ...

  16. Paper review: An overview on microarray technologies

    This paper aims to give an overview about microarray technology on the two platforms and the advantage of using them on bioinformatics research. DNA structure (Fajriyah, 2014) Cell, Chromosome ...

  17. Dna Microarray for Cancer Classification Using Deep Learning

    Examining the level of expression of genes using DNA microarray. The area of genetic studies is currently experiencing a surge in interest in technology for a specific organism. Applications for microarray studies in the medical profession include illness prediction and diagnosis, cancer research, and many more.

  18. Advanced Technology for Reducing the Risk of Transmission by

    He has published over 50 peer-reviewed papers in these fields. He has been assisted by a postdoctoral research fellow and a technician, who have contributed to the success of his research program.

  19. Microbiology Research

    In Saccharomyces cerevisiae, the Rpd3L complex includes the histone deacetylase Rpd3 and the DNA binding proteins Ume6 and Ash1 and serves as a transcriptional silencer or enhancer. In S. cerevisiae, the transcription of PDR5, which encodes a major drug efflux pump, and pleiotropic drug resistance (PDR) are hyperactivated by the transcription factor Pdr3 in ρ0/− cells, which lack ...

  20. Machine Learning Methods for Cancer Classification Using Gene

    The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification. ... methods. The DNA microarray method employs a two-dimensional array with microscopic spots to which short sequences or genes bind to known DNA molecules through a hybridization process. NGS ...