Wellcome Trust Centre for Cell Biology, School of Biological Sciences, The University of Edinburgh
Put a group of bioinformaticians in a room and the chances are that their skill bases will differ considerably, even among those working on genetics-based projects. Under the nebulous definition of bioinformatics, they could be involved in IT infrastructure, including data management and storage, or in data processing and analysis using any combination of the myriad software tools available in each field of genetics. It is essential, therefore, to restrict the scope of any book that deals with bioinformatics. “Bioinformatics for Geneticists” does this by focusing on human disease genetics and limiting the bioinformatics aspect to the associated software tools and techniques. Given this (relatively) specialised viewpoint, a layperson may question the need for a new edition of this book, as the last was published less than three years ago. However, fundamental changes to our understanding of some areas, such as non-coding RNAs, and the rapid expansion of key resources, such as the single nucleotide polymorphism data produced by the HapMap project, have left a gap in the knowledge covered by textbooks. To this book's credit, these topics, together with more mature areas, are covered in a way that is both understandable and interesting.
The book's 19 chapters are grouped into five main sections: “An introduction to bioinformatics for the geneticist”, “Mastering genes, genomes and genetic variation data”, “Bioinformatics for genetic study design and analysis”, “Moving from associated genes to disease alleles” and “Analysis at the genetic and genomic data interface”. Each chapter is written by different authors and describes a distinct, often research-focused, area. As the book's subtitle, “a bioinformatics primer for the analysis of genetic data”, suggests, the chapters are often useful for directing the reader to further information resources. However, the level of detail differs considerably between chapters. The HapMap project, for example, is covered at great length, and as it is arguably one of the newer and faster-growing datasets, this attention is easily justified. Microarray analysis, on the other hand, is described at a more fundamental level, and alternative applications such as ChIP-chip are mentioned without much detail on their analysis.
A benefit of research-focused chapters is that the reader gains a grounding in a variety of bioinformatics tools and techniques applicable to each subject area. One of my favourite chapters in this respect is Chapter 9, “Integrating genetics, genomics and epigenomics to identify disease genes”. Here a range of software tools and techniques are taught together around a particular case study, yet the opportunity to summarise the relevant resources and repositories is not missed. Perhaps the main drawback of this integrated approach is that knowledge of the data repositories and software packages common to multiple fields (such as those at Ensembl, SwissProt and NCBI) is acquired piecemeal, to the detriment of understanding the ethos and connectivity behind these key data sources.
Other aspects of a bioinformatician's job are data management and programming. Chapter 2, “Managing and manipulating genetic data”, suggests that it covers these topics, but its treatment is extremely basic. Although it is essential reading for complete novices, some of its Perl code examples are quite confusing for the target audience. Moreover, it does not emphasise the pre-existing libraries available in BioPerl. In my opinion, the premise of teaching a few limited code examples (especially in just one programming language) to an audience of non-programmers is flawed. A more useful approach might have been to suggest the best languages to learn, based on their ease of use and the availability of existing code repositories for specific tasks, and to highlight the best books and resources for self-learning. On the whole, the absence of any review of reusable code repositories is rather disappointing: although R, Perl and BioPerl are briefly mentioned, the Ensembl Perl API, Bioconductor [R], Biopython [Python], BioRuby [Ruby] and BioPostgres [PostgreSQL database] are not. Lastly, as both academic and corporate labs are now generating a mountain of data, a review of specific database storage solutions would have been valuable. Only the microarray chapter (15.5) addresses this, providing an excellent (albeit already slightly outdated) review of microarray data storage.
Although I have highlighted some drawbacks in the style and scope of the book, I wish to emphasise that the content of the chapters is mostly of the highest quality. “Bioinformatics for Geneticists” is an excellent resource not just for geneticists wishing to learn the bioinformatics tools available in their field, but also for bioinformaticians wanting to learn the background genetics of new or parallel areas of research. Moreover, this book should help researchers keep their skill base up to date and ensure that they exploit the growing and maturing bioinformatics resources freely available to the scientific community.