Evidence vs. Nonsense: A Guide to the Scientific Method

Tim Larkin

Hemp plants grow on the planet Jupiter. Nutmeg is good for brain diseases. Flies, insects and other forms of life are created spontaneously by decaying meat.

Each of these totally incorrect statements was once held as truth by learned people, including scientists. And each was based, not on superstition and ignorance, but on reason and observation.

When Galileo discovered in the 17th century that Jupiter possessed four moons, the great mathematician and physicist Christian Huygens applied his reasoning power to the question of "Why four instead of one?" He asked himself, "What is the purpose of a moon?" Well, the "purpose" of earth's moon was to help sailors navigate. If a planet has four moons it must therefore have a lot of sailors. Sailors mean boats. Boats mean sails. And sails mean ropes. Ropes are made of hemp. Therefore, it is obvious that Jupiter must have many hemp-producing plants.

As for nutmeg, its convoluted surface resembles the surface of the brain. According to a once respected principle of drug therapy called "the doctrine of signatures," a thoughtful Creator had provided clues to help treat disease. For example, since the leaves of the cyclamen plant roughly resemble the human ear, they should be used for ear afflictions. By the same token, the nutmeg, given its cerebral shape, was the obvious drug of choice for brain disorders.

By the time the Voyager 1 and 2 spacecraft raced past Jupiter in 1979, the number of moons detected around that planet had risen to 14, but no one talked about hemp. And today we know that nutmeg has no therapeutic value (although if consumed in large amounts it can severely damage the liver).

The conclusions once accepted about nutmeg and the brain and hemp on Jupiter have something in common. They were derived by a reasoning process that involved "seeing through the intellect" to discern what something is "for" or what "should be," rather than "seeing through the eyes" to determine what something actually is. This process of finding out what something actually is we call the scientific method. Indeed, science itself is, simply put, a way to discover evidence, a way that includes an objective, logical and systematic method of analysis.

But even the best methods can be improperly used. One of the best examples of both the proper and improper use of the scientific method involved an attempt to settle, once and for all, the question of whether living things could spring up from nonliving matter—spontaneous generation. Belief in spontaneous generation was founded on observation—inaccurate observation. People saw that certain lower forms of life appeared in water, soil and decaying organic substances of many kinds. Soon it was accepted as truth that worms and caterpillars came from dew on cabbage leaves, houseflies from wet wood, moths from woolen garments, anchovies from sea foam, and mice from river mud. Certain substances seemed to be potent producers of life, such as rotting wood, animal hair, stagnant water, paper, and the carcasses of animals.

By the 1800s, skepticism regarding spontaneous generation was growing among a few scientists, particularly since the invention of the microscope had revealed the existence of bacteria. By the middle of the 19th century, some biologists had concluded that spontaneous generation was nonsense and were ready to prove it.

Foremost among the skeptics was Louis Pasteur. He was challenged by another eminent scientist, F. A. Pouchet. Both Pasteur and Pouchet conducted experiments. Both used scientific methods and scientific apparatus. Pouchet's tests with various substances "proved" that life sprang up spontaneously; Pasteur's proved the opposite. How could this be possible? Well, Pasteur applied the method correctly; Pouchet did not. Pouchet's experimental apparatus could not prevent microorganisms and dust particles in the air from reaching the experimental substance, while Pasteur succeeded in devising an apparatus that kept them out. The point is, both were using the scientific method of experimentation, observation and logic—but one used it correctly and the other did not. Pasteur produced evidence; Pouchet produced nonsense disguised as evidence.

Using the scientific method correctly, so that it yields evidence and not nonsense, always poses a challenge to researchers conducting an investigation, particularly when higher forms of life are being studied. Sometimes, even modern scientists fail the challenge. Today, as in Pasteur's day, constant vigilance and extraordinary effort are required in order to produce a scientifically valid result from an experiment. For example, when FDA's National Center for Toxicological Research conducts an experiment to see if a certain chemical causes cancer in mice, extraordinary efforts are made to make sure that whatever ill effects the mice exhibit are the result of just one thing—the specific chemical being tested—and nothing else, such as genetic predisposition of the test animals, or pollutants in the air, or contact with the animal handlers. To eliminate these variables, both the mice that receive the chemical (the experimental group) and the mice that do not (the control group) are genetically similar; the environment of both groups is carefully controlled—incoming air is filtered, temperature and humidity are constantly monitored, lights for both groups go on and off at the same time each day. Uncontaminated equipment, bedding and water are the same for both groups. Even the "interior environment" of the animals is scrutinized, with tests for 33 parasites, 20 types of bacteria, 13 fungi and 13 viruses. If any of these intruders are found, the affected mice are eliminated from the study. Scientists working with the mice undergo a complete physical examination, wear sterilized clothing, and engage in the equivalent of a surgical scrub before contact with the animals. The effort to find just one thing, by excluding any other variables, also includes the chemical being tested. Extreme care is taken so that the animal feed and the test chemical mixed with it are pure and uniform. Finally, the animals are sacrificed at specified intervals, at which point a total of 48 organ and tissue samples are examined.

It is obvious that finding evidence rather than nonsense is extraordinarily difficult, and expensive, even when the study only involves small laboratory animals. When the study involves human beings, the difficulty is compounded a thousandfold.

First, while humans are involved in various experiments, such as in the final phases of testing new medicines, there are—and must be—strict limits on the kinds of experiments and the conditions under which they can be carried out. These conditions are expressed in international codes, specifically the Declaration of Helsinki, adopted by the 18th World Medical Assembly in 1964 and revised and expanded by the 29th World Medical Assembly in Tokyo in 1975. To meet these standards, medical institutions that receive federal research funds must set up special committees of scientists and laymen to approve any human experiment. And, in the United States, proposed experiments that involve a new drug or a medical device with a significant risk must first gain approval of the Food and Drug Administration. Such safeguards protect the subjects in the experiments but often make the research more difficult to carry out.

Second, it is impossible to design human experiments with the same restrictions employed for animal experiments. For example, we cannot obtain the kind of detailed knowledge about human genetic history that is routinely available for various laboratory animals, so genetic variation is largely an unknown. Further, we cannot isolate humans in a controlled laboratory setting for lengthy periods to eliminate environmental variables. This does not mean that the search for evidence is impossible with humans, only that it is extremely difficult.

To overcome these obstacles while adhering to the scientific method, several techniques have evolved, some of which provide more compelling evidence than others that the particular variable—the one thing being sought or tested—actually caused the observed effect. The highest quality of evidence comes from a properly conducted clinical trial. Just as with the mice in the cancer study, this form of experiment involves two groups: the test or experimental group, which receives the new drug or whatever it is that is being tested, and the control group, which does not. To ensure maximum similarity, members of both groups are randomly chosen from the same pool of candidates. Since the behavior of human subjects, unlike that of laboratory animals, can be affected by knowledge about the nature of the experiment, it is necessary to give the control group a placebo, a substance not known to be effective in dealing with the condition being observed yet superficially similar in every way to that being given the experimental group. (There is one exception: testing a new drug on patients already seriously ill. In such cases it is ethically impermissible to use a placebo that has no therapeutic action, so a "control" drug—one that is not experimental and whose effects are known and measurable—is used instead.) In addition to using a placebo, it is also necessary that neither the participants nor those measuring the effect of the test know who is in the experimental group and who is in the control group; both are "blind" to the assignments. This is the so-called "double blind" clinical trial.
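The random assignment and blinding described above can be sketched in a few lines of Python. This is an illustration only, not the procedure of any particular trial: the function name `assign_blinded`, the participant IDs and the "PKT-" packet codes are all hypothetical.

```python
import random

def assign_blinded(participant_ids, seed=None):
    """Randomly split participants into a drug group and a placebo group.

    Returns (blinded, key):
      blinded: participant -> opaque packet code, which is all that the
               participants and the people measuring outcomes would see
      key:     packet code -> "drug" or "placebo", assumed to be held by a
               third party until the trial ends
    Every name here is hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)                    # chance, not judgment, decides the groups
    half = len(ids) // 2                # groups are equal when the count is even
    arms = {pid: "drug" for pid in ids[:half]}
    arms.update({pid: "placebo" for pid in ids[half:]})

    # Unique codes that reveal nothing about which group a packet belongs to.
    codes = rng.sample(range(100000, 1000000), len(ids))
    blinded, key = {}, {}
    for pid, code in zip(ids, codes):
        label = f"PKT-{code}"
        blinded[pid] = label
        key[label] = arms[pid]
    return blinded, key

blinded, key = assign_blinded([f"P{n:03d}" for n in range(1, 11)], seed=1985)
```

In this sketch the `blinded` table is the only thing the clinic staff would handle; the `key` table is what makes the trial "double blind," because nobody in contact with the participants can look up who received the drug.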

An example of proper use of a clinical trial to establish the effect of a single variable was a study to see if lowering cholesterol levels reduced the risk of coronary heart disease in men between 35 and 59 who already had high cholesterol levels. Some 480,000 men volunteered for the program, of whom 3,810 were chosen. To ensure only one variable was involved, all volunteers who had other health problems, such as a history of angina or abnormal electrocardiograms, were not allowed to participate. Individuals were randomly assigned to the experimental and control groups. For an average of 7.4 years, the experimental group received a cholesterol-reducing drug (cholestyramine), while the control group received a placebo dispensed in an identical packet. Both groups followed a moderately low-cholesterol diet. Participants visited clinics every two months to receive new supplies of the medication (or placebo) and to take various tests. At the end of the study, those receiving the medication had significantly lower cholesterol levels and suffered 24 percent fewer deaths from coronary heart disease and 19 percent fewer non-fatal heart attacks compared to the control group.
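A figure such as "24 percent fewer" compares the rate of an event in the experimental group with the rate in the control group. The short Python sketch below shows the arithmetic. The counts in it are hypothetical, chosen only so that the result lands on 24 percent; they are not the study's data, which the article does not give. The function name `percent_fewer` is likewise an assumption.

```python
def percent_fewer(control_events, control_size, treated_events, treated_size):
    """Percent reduction in the treated group's event rate, relative to
    the control group's event rate."""
    control_rate = control_events / control_size
    treated_rate = treated_events / treated_size
    return 100 * (control_rate - treated_rate) / control_rate

# Hypothetical counts, NOT the study's data: 50 events among 1,000 controls
# versus 38 events among 1,000 treated participants.
reduction = percent_fewer(50, 1000, 38, 1000)
print(round(reduction))  # 24
```

Note that the percentage is relative to the control group's rate, so it says nothing by itself about how common the event was in either group.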

What is the one thing found in this careful and extensive clinical trial? That reducing cholesterol lowers the risk of coronary heart disease? No. The test included only middle-aged men. That a low-cholesterol diet lowers the risk of coronary heart disease in middle-aged men? No. Only that the drug and the diet together lowered the rate of coronary heart disease. To find out the effect of diet alone on middle-aged males, another study would have to be conducted in which the experimental group followed a low-cholesterol diet and the control group did not. (This would not only be very difficult to do, it would raise serious ethical questions as well.)

Where clinical trials are not practical, there are other ways to employ the scientific method, such as the cohort study, the cross-sectional survey, and the case-control study. In cohort studies—which look forward in time—a group of people is observed over a long period, perhaps many years, to see what habits or characteristics affect their health. Cohort studies have been used, for example, to study the relationship between cigarette smoking and lung cancer.

The cross-sectional survey—which, like a snapshot, "freezes" a specific moment in time—aims at finding the same kind of relationships that might be shown by the "moving picture" of the cohort study, but at far less cost. In a cross-sectional survey, a specific group is looked at to see if a substance or activity, say smoking, is related to the health effect being investigated—for example, lung cancer. If a significantly greater proportion of smokers than of nonsmokers already have lung cancer, this would support the hypothesis that lung cancer is caused by smoking.

While the cohort study looks forward, and the cross-sectional survey looks at the present, the case-control study looks backward, comparing the characteristics of one group (such as those with lung cancer) with another group (those who do not have lung cancer) to see if there are differences (such as smoking habits).

Since it is impossible to say with total certainty that just one thing has been uncovered, the evidence found in these kinds of studies is generally weaker than that from better-controlled experiments such as clinical trials. It is not possible to say the evidence is absolute proof that a substance or activity causes a certain effect. It is only possible to say that certain health effects are associated with the substance or activity under study. Sometimes this association is so strong that the hypothesis is considered proven.

As with Pasteur's and Pouchet's experiments concerning spontaneous generation, all of these types of studies can be done well (and thus produce some form of scientific evidence) or done poorly, in which case we have nonsense rather than evidence. Too often what is hailed in press or TV reports as the latest "scientific fact" turns out not to be fact at all, but rather the result of an error in applying the scientific method.

A recent example concerns a study done at Oregon Health Sciences University by Dr. David McCarron and others and published last year in Science magazine. The study concluded that low calcium intake was associated with high blood pressure and that, contrary to what was commonly believed, high sodium intake was associated not with high blood pressure but rather with lower blood pressure.

Many of the greatest scientific advances have resulted from overturning what had been accepted as fact—for example, the discoveries that life does not spring up spontaneously from dung heaps and that the earth is not the center of the universe. So the fact that this scientific test overturned a commonly accepted view could have been in the great tradition of science in which there are no final facts, only tentative ones always open to challenge and revision as new knowledge arises. An important question, however, was: How was the test conducted? Did it in fact find, or come close to finding, just one thing—that those with high blood pressure consumed less sodium than those who had lower blood pressure?

The evidence supporting the high sodium/lower blood pressure conclusion was derived from a type of cohort study. It involved measuring the levels of 17 nutrients (including calcium and sodium) in the diets of some 10,000 people with no history of high blood pressure. But subsequent examination of how the study was conducted raised serious questions about the quality of the evidence, or even whether it was evidence at all. FDA, the National Center for Health Statistics (the source of data used in the study), and the National Heart, Lung, and Blood Institute evaluated the use of the statistics and found "major conceptual and statistical problems in the author's approach." These problems, the agencies found, produced "inappropriate conclusions." Specifically, the study was found to contain three errors that made it difficult to say what, if anything, had been found regarding the relationship between blood pressure and sodium.

The first error involved the HOW of measurement—the blood pressure readings themselves. A blood pressure reading is usually a measure of systolic pressure (when the heart is actively pumping blood into the arteries) and diastolic pressure (when the relaxed heart is receiving blood from the veins). However, this study considered only the systolic pressure, even though doctors consider both numbers to be clinically important.

The second error involved the WHAT of measurement—the amount of sodium in the foods consumed. The study confined itself to the amount of sodium in the foods before they were prepared, not the often considerable amounts that might be added during cooking or at the table. The study also ignored the varying levels of sodium in drinking water and even in some medicines that the participants may have been taking.

The third error involved the WHO of measurement—failure to consistently control or make allowances for such variables among the participants as age, race, sex and weight. For example, when age was considered, the findings showed that sodium is in fact directly associated with high blood pressure, contrary to the conclusion reached by the scientists who conducted the study.
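How can allowing for age reverse a finding? The Python sketch below gives one way it can happen. The counts are hypothetical, invented purely for illustration, and have nothing to do with the actual study data; the names `counts`, `pooled_rate` and `stratum_rate` are assumptions as well. In this made-up population, older people consume less sodium but have far more high blood pressure for reasons of age alone.

```python
# Hypothetical counts, invented purely to illustrate the effect of ignoring age.
# (age_group, sodium_intake) -> (people, people_with_high_blood_pressure)
counts = {
    ("younger", "low"):  (100, 5),
    ("younger", "high"): (400, 40),
    ("older", "low"):    (400, 160),
    ("older", "high"):   (100, 50),
}

def rate(cells):
    """Fraction with high blood pressure across the given (people, cases) cells."""
    people = sum(n for n, _ in cells)
    cases = sum(c for _, c in cells)
    return cases / people

def pooled_rate(sodium):
    """Rate for one sodium level with all ages lumped together."""
    return rate([v for (_, s), v in counts.items() if s == sodium])

def stratum_rate(age_group, sodium):
    """Rate for one sodium level within a single age group."""
    return rate([counts[(age_group, sodium)]])

print("all ages:", pooled_rate("low"), pooled_rate("high"))
for age in ("younger", "older"):
    print(age, stratum_rate(age, "low"), stratum_rate(age, "high"))
```

With these made-up counts, lumping all ages together gives a rate of 33 percent for low sodium and 18 percent for high sodium, which makes sodium look protective. Within each age group, however, the high-sodium rate is the higher one: 10 percent against 5 percent among the younger, 50 percent against 40 percent among the older.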

One of the reasons a controlled experiment is viewed by scientists as an effective way to arrive at fact is that it can be repeated by other scientists to see if the results can be duplicated. If such a challenge is met and the same experiment conducted by different and perhaps very skeptical scientists yields the same findings, then the results are accepted.

On the other hand, results from nonexperimental methods, such as case-control or cohort studies, are usually validated by closely checking such factors as the way the data were collected and analyzed and whether the data were sufficient to support the conclusion. In the study that concluded that lowering cholesterol levels would reduce the risk of heart disease, the data, the method and, hence, the conclusion passed this test. But the sodium/high blood pressure study failed to pass the scrutiny of other scientists; so the conclusion from other studies, that high blood pressure is directly related to sodium intake, still stands.

The advance of scientific knowledge is based on employing the scientific method and logically analyzing the results. Not all such studies will stand the scrutiny of other scientists; some startling new findings will go the way of spontaneous generation and nutmeg therapy for brain disease. But without such studies, and the scrutiny of their findings in the scientific community, there would be precious little growth in knowledge at all, and we might still be trying to estimate the size of this year's hemp crop on Jupiter.

This article is reprinted from the June 1985 issue of FDA Consumer. At the time it was written Mr. Larkin was a freelance reporter and former FDA employee who worked in Easton, Maryland.

This article was posted on February 11, 2005.