For as long as I can remember, I have been interested in how well we know what we say we know, how we acquire data about things, and how we react to and use this data. I was surprised when I first realized that many people are not only not interested in these things, but are actually averse to learning about them. It wasn’t until fairly recently that I concluded that we seem to be genetically programmed, on one hand, to intelligently learn how to acquire data and use it to our advantage but also, on the other hand, to stubbornly refuse to believe what some simple calculations and/or observations tell us.

My first conclusion is supported by our march through history, learning about agriculture and all the various forms of engineering and using this knowledge to make life better and easier. My latter conclusion comes from seeing all the people sitting on stools in front of slot machines at gambling casinos, many of whom are there that day because the astrology page in the newspaper told them that this was “their day.”

This is a book about probability and statistics. It's mostly about probability, with just one chapter dedicated to an introduction to the vast field of statistical inference.

There are many excellent books on this topic available today. I find that these books fall into two general categories. One category is textbooks. Textbooks are heavily mathematical with derivations, proofs and problem sets, and an agenda to get you through a term's course work. This is just what you need if you are taking a course.

The other category is books that are meant for a more casual audience — an audience that's interested in the topic but isn't interested enough to take a course. We're told today that people have “mathephobia,” and the books that appeal to these people try very hard to talk around the mathematics without actually presenting any of it. Probability and statistics are mathematical topics. A book on these subjects without math is sort of like a book on French grammar without any French words in it. It's not impossible, but it sure is doing things the hard way.

This book tries to split the difference. It's not a textbook. There is, however, some math involved. How much? Some vague reminiscences about introductory high school algebra along with a little patience in learning some new notation should comfortably get you through it. You should know what a fraction is and recognize what I'm doing when I add or multiply fractions or calculate decimal equivalents. Even if you don't remember how to do it yourself, just realizing what I'm doing and accepting that I'm probably doing it right should be enough. You should recognize a square root sign and sort of remember what it means. You don't need to know how to calculate a square root — these days everybody does it on a pocket calculator or a computer spreadsheet anyway. You should be able to read a graph. I review this just in case, but a little prior experience helps a lot. In a few cases some elementary calculus was needed to get from point A to point B. In these cases I try to get us all to point A slowly and clearly and then just say that I needed a magic wand to jump from A to B and that you'll have to trust me.

If you thumb through the book, you'll see a few “fancy” formulas. These are either simply shorthand notations for things like repeated additions, which I discuss in great detail to get you comfortable with them, or in a few cases some formulas that I'm quoting just for completeness but that you don't need to understand if you don't want to.

As I discuss in the first chapter, probability is all about patterns of things such as what happens when I roll a pair of dice a thousand times, or what the life expectancies of the population of the United States look like, or how a string of traffic lights slows you down in traffic. Just as a course in music with some discussions of rhythm and harmony helps you to “feel” the beauty of the music, a little insight into the mathematics of the patterns of things in our life can help you to feel the beauty of these patterns, as well as to plan as well as possible for things that are specifically unpredictable (when will the next bus come along and how long will I have to stand in the rain to meet it?).

Most popular science and math books include a lot of biographical information about the people who developed these particular fields. This can often be interesting reading, though quite honestly I'm not sure that knowing how Einstein treated his first wife helps me to understand special relativity.

I have decided not to include biographical information. I often quote a name associated with a particular topic (Gaussian curves, Simpson's Paradox, Poisson distribution) because that's how it's known.

Probabilistic considerations show up in several areas of our lives. Some we get explicitly from nature, such as daily rainfall or distances to the stars. Some we get from human activities, including everything from gambling games to manufacturing tolerances. Some come from nature, but we don't see them until we “look behind the green curtain.” This includes properties of gases (e.g., the air around us) and the basic atomic and subatomic nature of matter.

Mathematical analyses wave the banner of truth. In a sense, this is deserved. If you do the arithmetic correctly, and the algorithms, or formulas, used are correct, your result is the correct answer and that's the end of the story. Consequently, when we are presented with a conclusion to a study that includes a mathematical analysis, we tend to treat the conclusion as if it were the result of summing a column of numbers. We believe it.

Let me present a situation, however, where the mathematics is absolutely correct and the conclusion is absolutely incorrect. The mathematics is simple arithmetic, no more than addition and division. There’s no fancy stuff such as probability or statistics to obfuscate the thought process or the calculations. I've changed the numbers around a bit for the sake of my example, but the situation is based upon an actual University of California at Berkeley lawsuit.

We have a large organization that is adding two groups, each having 100 people, such as two new programs at a school. I'll call these new programs A and B.

Program A is an attractive program and for some reason is more appealing to women than it is to men. 600 women apply; only 400 men apply. If all the applicants were equally qualified, we would expect to see about 60 women and 40 men accepted to the 100 openings for the program. The women applicants to this program tend to be better qualified than the men applicants, so we end up seeing 75 women and 25 men accepted into program A. If you didn't examine the applications yourself, you might believe that the admissions director was (unfairly) favoring women over men.

Program B is not as attractive a program and only 100 people apply. It is much more attractive to men than it is to women: 75 men and 25 women apply. Since there are 100 openings, they all get accepted.

Some time later, there is an audit of the school's admission policies to see if there is any evidence of unfair practices, be they sexual, racial, ethnic, whatever. Since the new programs were handled together by one admissions director, the auditor looks at the books for the two new programs as a group and sees that:

600 + 25 = 625 women applied to the new programs. 75 + 25 = 100 women were accepted. In other words, 100/625 = 16% of the women applicants were accepted to the new programs.

400 + 75 = 475 men applied to the new programs. 25 + 75 = 100 men were accepted. In other words, 100/475 = 21% of the men applicants were accepted to the new programs.

The auditor then reviews the qualifications of the applicants and sees that the women applicants were in no way inferior to the men applicants; in fact it's the opposite. The only plausible conclusion is that the programs' admissions director favors men over women.

The arithmetic above is straightforward and cannot be questioned. The flaw lies in how details get lost in summarization: in this case, looking only at the totals for the two programs rather than keeping the data separate. I'll show (in Chapter 13) how a probabilistic interpretation of these data can help to calculate a summary correctly.
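For readers who like to check such things on a computer, here is a minimal Python sketch of the admissions arithmetic. The applicant and acceptance counts are the ones from the example; the names `programs`, `rate`, `per_program_rates`, and `combined_rates` are my own choices for illustration. It computes the acceptance rates program by program and then again for the pooled totals that the auditor looked at.

```python
# (applied, accepted) counts for each group in each program, from the example.
programs = {
    "A": {"women": (600, 75), "men": (400, 25)},
    "B": {"women": (25, 25), "men": (75, 75)},
}


def rate(applied, accepted):
    """Fraction of applicants who were accepted."""
    return accepted / applied


def per_program_rates(data):
    """Acceptance rate for each group, keeping the programs separate."""
    return {
        name: {group: rate(*counts) for group, counts in groups.items()}
        for name, groups in data.items()
    }


def combined_rates(data):
    """Acceptance rate for each group after pooling all programs together."""
    totals = {}
    for groups in data.values():
        for group, (applied, accepted) in groups.items():
            prev_applied, prev_accepted = totals.get(group, (0, 0))
            totals[group] = (prev_applied + applied, prev_accepted + accepted)
    return {group: rate(*counts) for group, counts in totals.items()}


separate = per_program_rates(programs)
combined = combined_rates(programs)

for name, rates in separate.items():
    print("Program", name, {g: f"{r:.2%}" for g, r in rates.items()})
print("Combined ", {g: f"{r:.2%}" for g, r in combined.items()})
```

Kept separate, women are accepted at a higher rate than men in program A and at the same rate in program B. Pooled, the women's rate falls below the men's, which is exactly the reversal that misled the auditor.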

My point here, having taken the unusual step of actually putting subject matter into a book's preface, is that mathematics is a tool and only a tool. For the conclusion to be correct, the mathematics along the way must be correct, but the converse is not necessarily true.

Probability and statistics deal a lot with examining sets of data and drawing a conclusion: for example, “the average daily temperature in Coyoteville is 75 degrees Fahrenheit.” This sounds like a great place to live until you learn that the temperature during the day peaks at 115 degrees while at night it drops to 35 degrees. In some cases we will be adding insight by summarizing a data set, but in some cases we will be losing insight.
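A small Python sketch makes the Coyoteville point concrete. The 115 and 35 degree readings are from the example above; the second town, which I call Steadyville, and its readings are hypothetical, made up only for comparison, as is the helper name `summarize`.

```python
from statistics import mean

# Daily high and low in degrees Fahrenheit.
coyoteville = [115, 35]  # extremes quoted in the text
steadyville = [77, 73]   # hypothetical town, invented for comparison


def summarize(temps):
    """Return the average alongside the extremes that the average hides."""
    return {
        "mean": mean(temps),
        "low": min(temps),
        "high": max(temps),
        "range": max(temps) - min(temps),
    }


print("Coyoteville:", summarize(coyoteville))
print("Steadyville:", summarize(steadyville))
```

Both towns report the same average, so the average alone cannot tell them apart. Only the low, the high, and the range show which one you would actually want to live in.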

My brother-in-law Jonathan sent me the following quote, attributing it to his father. He said that I could use it if I acknowledge my source: Thanks, Jonathan.

“The average of an elephant and a mouse is a cow, but you won't learn much about either elephants or mice by studying cows.” I'm not sure exactly what the arithmetic in this calculation would look like, but I think it's a memorable way of making a very good point.

I could write a long treatise on how bad conclusions have been reached because the people who had to draw the conclusions just weren't looking at all the data. Two examples that come to mind are (1) the Dow Corning silicone breast implant lawsuit, where a company was driven into bankruptcy because the plaintiffs “demonstrated” that the data showed a link between the implants and certain serious diseases, and (2) the crash of the space shuttle Challenger, where existing data showing that the rubber O-rings sealing the joints of the solid rocket boosters get brittle below a certain temperature somehow never made it to the table.

The field of probability and statistics has a very bad reputation (“Lies, Damned Lies, and Statistics*”). It is so easy to manipulate conclusions by simply omitting some of the data, or to perform the wrong calculations correctly, or to misstate the results — any and all of these possibly innocently — because some problems are very complicated and subtle. I hope the materials to follow show what information is needed to draw a conclusion and what conclusion(s) can and can't be drawn from certain information. Also I'll show how to reasonably expect that sometimes, sometimes even inevitably, as the bumper stickers say, stuff happens.

I spend a lot of time on simple gambling games because, even if you're not a gambler, there's a lot to be learned from the simplest of random events — for example, the result of coin flips.

I've also tried to choose many examples that you don't usually see in probability books. I look at traffic lights, waiting for a bus, life insurance, scheduling appointments, and so on. What I hope to convey is that we live in a world where so many of our daily activities involve random processes and the statistics involved with them.

Finally, I introduce some topics that show how much of our physical world is based on the consequences of random processes. These topics include gas pressure, heat engines, and radioactive decay. These topics are pretty far from things you might actually do such as meeting a friend for lunch or counting birds in the woods. I hope you'll find that reading about them will be interesting.

One last comment: There are dozens and dozens of clever probability problems that make their way around. I've included several of these (the shared birthday, the prize behind one of three doors, etc.) where appropriate and I discuss how to solve them. When first confronted with one of these problems, I inevitably get it wrong. In my own defense, when I get a chance to sit down and work things out carefully, I (usually) get it right. This is a tricky subject. Maybe that's why I find it to be so much fun.

*This quote is usually attributed to Benjamin Disraeli, but there seems to be some uncertainty here. I guess that, considering the book you’re now holding, I should say that “There is a high probability that this quote should be attributed to Benjamin Disraeli.”

Probably Not:  Preface
