Statistics, by Freedman, Pisani, and Purves
Statistics depends crucially on mathematics, but it is not subordinate in this dependence. Much of the power of statistics is in common sense, amplified by appropriate mathematical tools, and refined through careful analysis. An introductory course in statistics is then, first and foremost, a course in proper reasoning – with a quantitative bent.
Some years ago I found myself teaching such a course at a large state university. The audience was primarily students without much mathematical training (e.g. no calculus). I’m a mathematician by background, and I confess to never having taken an introductory statistics course myself, but I found the material to be interesting and the students to be, for the most part, sharp and motivated.
However, a blight upon the class was the book I was assigned to teach from. It was a patchwork quilt of good intentions – cut, stretched, and stitched together by committee into a whole that was less than its parts. There were full-color photos, boxes and text insets, many anecdotes and relevant examples about teenagers and iPhone usage. Upon a glance at the table of contents, it appeared quite sophisticated for a book of its level by including some “advanced” material not usually covered in such an introductory class. The textbook was sold in a bundle with access to an online homework system, and of course, it was all expensive as hell.
There are basically four main topics covered in an intro stats course. They are: i. the Design of Experiments, or the Art of getting data in the first place; ii. Descriptive Statistics, or the Art of summarizing tables of numbers; iii. Probability Theory (basic discrete probability, and some limit laws); and iv. Inferential Statistics (parameter estimation and significance testing).
The richest of these topics is the design of experiments, as this is where the real world creeps in. All the data that statisticians work with come from an experiment or study of some sort. The tools of statistics are designed to develop a quantitative understanding of real phenomena, by providing a microscope of sorts (or telescope, if you’d prefer) to bring this experiential data into focus. As Hamming said, the purpose of computing is insight, not numbers.
Unfortunately, it is easier to grade numbers than insight. Which brings me back to my assigned textbook. There was, in my opinion, an undue emphasis on technique as opposed to critical statistical thinking. The vision of the book is that in September you can start with a room of starry-eyed 18-20 year olds, ignorant of even the most basic descriptive statistics, and by December they’ll be dutifully filling out ANOVA tables.
The problem with this is that without a thorough understanding of the limitations of technique, as well as the real challenges of getting useful data and meaningfully interpreting the results of a statistical analysis, such a book is like a loaded gun. Emphasis on shallow technical topics such as computing F-statistics and p-values develops a false sense of mastery; and even more dangerous than someone ignorant of statistics is someone half-ignorant of statistics.
The book by Freedman, Pisani, and Purves is the one I would have liked to teach from, and it was the book I drew upon the most in prepping my own lectures, as an antidote to the overwrought and confused style of my assigned text. The authors maintain the underlying attitude that statistics is a useful tool for understanding certain questions about the world, but in this way it augments human judgement, rather than supplanting it. To quote from the preface:
Why does the book include so many exercises that cannot be solved by plugging into a formula? The reason is that few real-life statistical problems can be solved that way. Blindly plugging into statistical formulas has caused a lot of confusion. So this book takes a different approach: thinking.
Although it is an elementary book, the signal-to-noise ratio is quite good. The writing is clear and simple, and the layout itself is mostly devoid of unnecessary distractions. There are occasional New Yorker cartoons interspersed throughout the text, but these are both charming and topical. Traditionally counterintuitive topics, such as the law of large numbers, or the interpretation of things such as confidence intervals and p-values, are explained both in a relatively straightforward fashion in the text and by way of Socratic dialogs. The really important points are boxed in.
What I admire the most about the book are the many excellent examples drawn from the social and natural sciences. The authors constantly emphasize, from various directions, the underlying meaning of the statistical statements being made, bringing into focus the role of mathematical models, and the linkage of these with reality. This is even reflected in the sequencing of topics. The book begins with a discussion of experiment design, and here the tone is immediately established. After a several-page discussion of the polio vaccine trials of the 1950s, and the confounding in some of them due to poor design, one is led to the zeroth law of statistics: garbage in, garbage out. Or in other words, a poorly designed experiment will lead to wrong conclusions about the world. And it matters, because lives are at stake.
A recurrent theme is the book’s emphasis on the limitations of the statistical techniques it develops. For example, in a chapter on sample surveys, the basic problem of parameter estimation is introduced in the context of political polling. Simple random sampling is presented as something of a holy grail in survey design, and the dangers of deviating from this mechanism are discussed through an analysis of several historical polls. Once the case has been made for simple random sampling, it would be all too easy to stop the discussion there. Instead, the authors dedicate a whole chapter to the problem of estimating the unemployment rate of the United States labor force. There are two aspects of this which are emphasized:
- Simple random sampling in this context would be impractical for several reasons, and so the Current Population Survey (a nationwide survey conducted by the Census Bureau) uses fairly sophisticated probability models in its design.
- Estimating standard errors here is likewise more complicated than in the context of a simple random sample, but nonetheless the estimation can still be done in a coherent and principled manner.
It is refreshing that the authors take the time to develop these ideas through an in-depth analysis of statistical practice in the wild. One may be concerned that too much emphasis on practice can distract from the underlying ideas; in fact, just the opposite happens, as the authors are careful to emphasize the key points of the case studies they undertake.
The book is not without its weak moments, although they are few. One in particular which I recall is the treatment of A/B testing. Essential to any hypothesis testing is the matter of how to reduce the sampling mechanism to a simple probabilistic model, so that a quantitative test may be derived. The book emphasizes one such model: simple random sampling from a population, which then involves the standard probabilistic ideas of binomial and multinomial distributions, along with the normal approximation to these. Thus, one obtains the z-test.
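To make the simple random sampling model concrete, here is a minimal sketch of the resulting z-test for a proportion. It is my own illustration rather than anything taken from the book: the function names and the numbers (60 successes in a sample of 100, tested against a hypothesized 50%) are hypothetical, and the sketch assumes the sample is small relative to the population, so that the binomial model and its normal approximation are reasonable.

```python
import math


def z_statistic(successes: int, n: int, p0: float) -> float:
    """z-statistic for a sample proportion against a hypothesized value p0.

    Assumes a simple random sample, so the count of successes is
    approximately binomial(n, p0) under the null hypothesis.
    """
    p_hat = successes / n
    # Standard error of the sample proportion, computed under the null.
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided tail area under the standard normal curve beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    z = z_statistic(successes=60, n=100, p0=0.5)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p_value(z):.4f}")
```

The point of the sketch is how little machinery is involved once the sampling mechanism has been reduced to a box model: an expected value, a standard error, and the normal curve.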
In the context of randomized controlled experiments, where a set of subjects is randomly assigned to either a control or treatment group, the simple random sampling model is inapplicable. Nonetheless, when asking whether the treatment has an effect there is a suitable (two-sample) z-test. The mathematical ideas behind it are necessarily different from those of the previously mentioned z-test, because the sampling mechanism here is different, but the end result looks the same. Why this works out as it does is explained rather opaquely in the book, since the authors never develop the probabilistic tools necessary to make sense of it (one would hope to find at least a mention of hypergeometric distributions here). Given the emphasis placed at the beginning of the book on the importance of randomized, controlled experiments in statistics, it feels like this topic is getting short shrift.
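One way to see the two mechanisms side by side is to compute the two-sample z-statistic and, separately, a randomization test that mimics the actual experiment by reshuffling the treatment and control labels. The sketch below is my own illustration, not the book's treatment: the function names, the outcome data, the number of reshuffles, and the choice of the sample variance in the standard error are all assumptions made for the example.

```python
import math
import random
import statistics


def two_sample_z(treatment, control):
    """Two-sample z-statistic for the difference in group averages."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    # Standard error of the difference, combining the two groups' SEs
    # as if they were independent samples (an assumption of this sketch).
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff / se


def randomization_p_value(treatment, control, reshuffles=2000, seed=0):
    """Two-sided p-value from re-randomizing the group labels.

    Under the null hypothesis of no treatment effect, every subject's
    outcome is fixed and only the random assignment varies.
    """
    rng = random.Random(seed)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    observed = statistics.mean(treatment) - statistics.mean(control)
    hits = 0
    for _ in range(reshuffles):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_t]) - statistics.mean(pooled[n_t:])
        # Count reshuffles at least as extreme as what was observed.
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    return hits / reshuffles


if __name__ == "__main__":
    # Hypothetical outcomes, invented purely for illustration.
    treatment = [12, 15, 11, 18, 14, 16, 13, 17, 15, 14]
    control = [11, 12, 10, 14, 13, 12, 11, 15, 12, 13]
    z = two_sample_z(treatment, control)
    p_normal = math.erfc(abs(z) / math.sqrt(2))
    p_random = randomization_p_value(treatment, control)
    print(f"z = {z:.2f}, normal-curve p = {p_normal:.4f}")
    print(f"randomization p = {p_random:.4f}")
```

The randomization test uses only the assignment mechanism, with no appeal to sampling from a population, which is the distinction the paragraph above is drawing. Comparing its p-value with the normal-curve one on data of your own is a quick way to check how closely the two agree in a given case.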
But such a criticism is quite minor. I would strongly recommend this book to anyone interested in learning basic statistics, teaching basic statistics, or simply reviewing the fundamentals. I’d also mention that an online book by Philip Stark (with video lectures), much in the spirit of Freedman, Pisani, and Purves’ text, is available here.
References
- David Freedman, Robert Pisani, and Roger Purves, Statistics, 4th ed.