Glossary of Terms





Rothamsted Research is one of eight institutes sponsored by the Biotechnology and Biological Sciences Research Council.













Chemometrics

The application of multivariate statistical, pattern recognition and informatics methods to chemical data.


Functional Genomics

Now that the human genome has been catalogued, it is necessary to determine the function of each gene and to understand the control mechanisms. The roles that genotype and environment play in determining the phenotype must also be elucidated. To understand gene function, researchers need to apply high-throughput technologies to study functional networks and pathways. With enough data and appropriate chemometrics tools, this should be possible, which would allow optimisation of drug target selection and the development of safer, more effective therapeutics. Metabonomics promises to be a lead technology in this process.



Genomics

The study of gene sequences and of the variation in those sequences between species and individuals. Knowing the gene sequence per se does not necessarily give insight into deep biological function, but as the understanding of the functional variability in gene sequences increases, this will lead to the discovery of many new drug targets.



Metabolome

The quantitative complement of all the low molecular weight molecules present in cells in a particular physiological or developmental state.


Metabonomics and Metabolomics

These very similar terms have arisen at about the same time in different areas of bioscience research, mainly animal biochemistry and microbial/plant biochemistry respectively. Although both involve the multiparametric measurement of metabolites, they are not identical: metabonomics deals with integrated, multicellular biological systems, including communicating extracellular environments, whereas metabolomics deals with simple cell systems and, at least in terms of published data, mainly intracellular metabolite concentrations.


NMR Spectroscopy

Some atomic nuclei possess a non-zero magnetic moment. This property is quantised and leads to discrete energy states in a magnetic field. Nuclei such as 1H, 13C, 15N, 19F and 31P can undergo transitions between these states when radiofrequency pulses of appropriate energy are applied. The exact frequency of a transition depends on the type of nucleus and on its electronic environment in a molecule. For example, 1H nuclei in a molecule give NMR peaks at frequencies (chemical shifts) characteristic of their chemical environment. NMR spectroscopy is extensively used as a structural tool, and information on isomers and molecular conformations can be obtained by interpreting the chemical shifts as well as the splitting patterns due to indirect nuclear interactions (J couplings). In metabonomics, it is the patterns that occur when many different biochemical entities are detected simultaneously in a mixture using 1H NMR that are interpreted.
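The dependence of transition frequency on nucleus type and field strength, and the way a peak position is reported as a chemical shift, can be illustrated with a short Python sketch. The function names are hypothetical, the 1H constant is an approximate literature value (about 42.577 MHz per tesla), and the peak and reference frequencies in the example are made-up numbers chosen only to show the arithmetic.

```python
# Approximate 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla.
# (Literature value quoted to three decimals; an assumption of this sketch.)
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(field_tesla, gamma_bar_mhz_per_t=GAMMA_BAR_1H_MHZ_PER_T):
    """Resonance frequency (MHz) of a nucleus in a static field of the given strength."""
    return gamma_bar_mhz_per_t * field_tesla


def chemical_shift_ppm(peak_hz, reference_hz, spectrometer_mhz):
    """Chemical shift in parts per million.

    The offset of the peak from the reference (in Hz) divided by the
    spectrometer frequency (in MHz) gives ppm directly.
    """
    return (peak_hz - reference_hz) / spectrometer_mhz


# Hypothetical example: a 14.1 T magnet and a peak 1200 Hz from the reference.
proton_mhz = larmor_frequency_mhz(14.1)
shift = chemical_shift_ppm(peak_hz=1200.0, reference_hz=0.0, spectrometer_mhz=600.0)
print(f"1H frequency at 14.1 T: {proton_mhz:.1f} MHz")
print(f"Chemical shift: {shift:.2f} ppm")
```

Because the shift is expressed as a ratio, the same chemical environment gives the same ppm value on spectrometers of different field strengths, which is what makes shifts comparable between instruments.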


Pattern Recognition

Pattern recognition (PR) and related multivariate statistical approaches can be used to discern significant patterns in complex data sets and are particularly appropriate in situations where there are more variables than samples in the data set. The general aim of PR is to classify objects (in this case 1H NMR spectra) or to predict the origin of objects based on the identification of inherent patterns in a set of indirect measurements. PR methods can reduce the dimensionality of complex data sets via two- or three-dimensional mapping procedures, thereby facilitating the visualisation of inherent patterns in the data.
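As a rough illustration of classifying objects from inherent patterns, the Python sketch below applies a nearest-centroid rule to synthetic "spectra" with far more variables (200) than samples (10). Nearest-centroid classification is only one simple PR method, chosen here for brevity; it is not claimed to be the method used in any particular metabonomics study. The data, the class names "control" and "dosed", and the function names are all hypothetical.

```python
import numpy as np


def fit_centroids(X, labels):
    """Return the mean spectrum (centroid) of each class."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    return {c: X[labels == c].mean(axis=0) for c in classes}


def predict(X, centroids):
    """Assign each row of X to the class with the nearest centroid (Euclidean distance)."""
    names = list(centroids)
    C = np.stack([centroids[n] for n in names])
    distances = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return [names[i] for i in distances.argmin(axis=1)]


# Synthetic data: 10 samples, 200 variables (more variables than samples).
rng = np.random.default_rng(0)
n_vars = 200
base = rng.normal(size=n_vars)           # shared background profile
effect = np.zeros(n_vars)
effect[50:60] = 3.0                      # hypothetical region raised in the "dosed" class

control = base + 0.1 * rng.normal(size=(5, n_vars))
dosed = base + effect + 0.1 * rng.normal(size=(5, n_vars))
X = np.vstack([control, dosed])
y = ["control"] * 5 + ["dosed"] * 5

centroids = fit_centroids(X, y)

# Predict the origin of a new, unseen spectrum drawn from the "dosed" pattern.
new_spectrum = base + effect + 0.1 * rng.normal(size=(1, n_vars))
print(predict(new_spectrum, centroids))
```

In practice the class separation is rarely this clean, and a classifier should be validated on samples that were not used to build it.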



Phenotype

The observable traits or characteristics of an organism, for example hair colour, or the presence or absence of a disease. Phenotypic traits are not necessarily genetic.


Principal Components Analysis

Principal components analysis (PCA) is a data dimension reduction method. It is termed an unsupervised technique in that no a priori knowledge of the class of the samples is required, and the analysis is based on the calculation of latent variables. Principal components are linear combinations of the original data variables such that the first component explains as much as possible of the variance in the data set, and subsequent components are orthogonal to each other and explain decreasing levels of data variance. Use of PCA enables the "best" representation of the data set, in terms of biochemical variation, to be displayed in two or three dimensions.
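The properties described above (linear combinations, orthogonal components, decreasing explained variance) can be seen in a minimal Python sketch that computes PCA from the singular value decomposition of the mean-centred data. The function name pca and the random test matrix are assumptions made for illustration; real spectra would normally also be scaled or normalised first, which is omitted here.

```python
import numpy as np


def pca(X, n_components=2):
    """Minimal PCA via SVD of the mean-centred data matrix.

    Returns (scores, loadings, explained_variance_ratio) for the first
    n_components principal components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components]                 # weights of the original variables
    explained = (s ** 2) / np.sum(s ** 2)        # fraction of total variance per component
    return scores, loadings, explained[:n_components]


# Hypothetical data: 8 samples described by 50 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 50))

scores, loadings, explained = pca(X, n_components=2)
print("scores shape:", scores.shape)             # one 2D point per sample
print("explained variance ratios:", explained)
```

Plotting the first column of scores against the second gives the two-dimensional map of the samples; the loadings show which original variables contribute most to each component.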



Proteomics

The measurement of cellular protein production and levels, the structural characterisation of those proteins and the understanding of their functions. This science is also heavily dependent on advanced analytical methodologies, including, for example, 2D gel electrophoresis combined with nanospray mass spectrometry for the separation and identification of proteins. Interestingly, humans may have only about 30,000 genes, but there are thought to be many more cellular proteins than genes once all the possible post-translational modifications are included. This poses an immediate theoretical problem when correlations between gene expression and the proteome are sought, because there is a level of cellular control above the genome that resides in the protein complement itself. In addition, changes in gene expression, which may or may not result in changes in cellular protein synthesis, necessarily occur at different times in the cell, and different gene regulation events occurring at the same time may take different times to affect the proteome. From an analytical viewpoint, it has so far been possible to separate and identify only a small fraction of the possible cellular proteins.



Transcriptomics

This is the quantitative measurement of gene expression in a cell or tissue. Generally this involves the measurement of mRNA levels by various methods, the most popular currently being proprietary gene chips. The problems here include the fact that chips are very expensive, that many genes or sequences have no known function, and that the relationships between quantitative variation or patterns in expression and the influences of cell or pathway function are, at best, poorly understood. Moreover, it is widely appreciated that mRNAs are not chemically stable, and steps must be taken to ensure the quantitative reliability of the chip measurements. A less well considered problem stems from the fact that quite large samples of tissue are generally required to make an extensive set of gene expression measurements on one sample (up to 1 g in the case of human tissues). In such a sample, even in a relatively homogeneous tissue such as liver, there may be dozens of cell types in different topographical locations performing different functions, which by definition have different levels of genetic activity. The gene chip measures an average of these activities, the meaning of which is unclear.