Supreme Court of Texas.
MERRELL DOW PHARMACEUTICALS, INC., Petitioner,
v.
Ernest HAVNER and Marilyn Havner on Behalf of their minor child Kelly HAVNER, Respondents.
Argued March 19, 1996.
Decided July 9, 1997.
Order Overruling Rehearing Nov. 13, 1997.
Attorneys & Firms
*708 Steven Goode, Austin, for Petitioner.
John T. Flood, Corpus Christi, for Respondents.
OWEN, Justice, delivered the opinion of the Court in which PHILLIPS, Chief Justice, and ABBOTT, Justices, join.
The issue in this case is whether there is any evidence that the drug Bendectin caused Kelly Havner to be born with a birth defect. We hold that the evidence offered is legally insufficient to establish causation. Accordingly, we reverse the judgment of the court of appeals. 907 S.W.2d 535.
Kelly Havner was born with a limb reduction pyridoxine hydrochloride, which is vitamin B–6. Prior to 1977, Bendectin had contained a third component, dicyclomine hydrochloride, which is an anticholinergic. Approximately thirty million women took Bendectin in either the two- or three-ingredient form.
More than twenty years ago, questions were raised about Bendectin and its possible association with birth defects and that Bendectin is a safe drug. Although FDA approval of Bendectin has never been revoked, Merrell Dow withdrew the drug from the market in 1983, a little over a year after Kelly Havner was born.
The Havners’ suit is based on theories of negligence, defective design, and defective marketing. It is one of thousands brought against Merrell Dow and its predecessors for the manufacture and distribution of Bendectin. In virtually all the Bendectin litigation, the central issue has been the scientific reliability of the expert testimony offered to establish causation. Merrell Dow challenged the Havners’ causation evidence at several junctures in these proceedings. It filed a motion for summary judgment, contending that there is no scientifically reliable evidence that Bendectin causes limb reduction birth defects or that it caused Kelly Havner’s birth defect. Before denying the motion, the trial court held a hearing at which the scientific *709 reliability of the Havners’ summary judgment evidence was extensively aired.
Just before trial, the scientific reliability of the Havners’ evidence was again raised by Merrell Dow in motions in limine that sought to exclude the testimony of certain of the Havners’ experts and other causation evidence. One of these motions requested that testimony about causation be excluded until a prima facie case had been established that there was a statistically significant elevated risk that a child would be born with limb reduction birth defects if the child’s mother ingested Bendectin. Another motion sought to preclude the Havners’ witnesses from relying on in vitro and in vivo animal studies. Other motions sought to exclude entirely the testimony of three of the Havners’ causation witnesses. The issues were fully briefed, and after a lengthy hearing, the trial court denied each of the motions.
A bifurcated jury trial ensued. In the liability phase, the Havners called five experts on the causation question. Merrell Dow objected to the admission of some, but not all, of this evidence. Merrell Dow also unsuccessfully moved for a directed verdict on the issue of causation at the close of the Havners’ evidence. As can be seen from the record, the question of scientific reliability was raised repeatedly.
At the conclusion of the liability phase, the jury found in favor of the Havners and awarded $3.75 million. In the punitive damages stage, the jury awarded $30 million, but that amount was reduced by the trial court to $15 million pursuant to former TEX. CIV. PRAC. & REM.CODE § 41.007. Merrell Dow appealed.
The panel of the court of appeals that originally heard the case reversed and rendered judgment that the Havners take nothing, holding that the evidence of causation was legally insufficient. Id. at 564. We granted Merrell Dow’s application for writ of error.
Merrell Dow challenges the legal sufficiency of the Havners’ causation evidence and the admissibility of some of that evidence and further contends that its due process rights under the United States Constitution and its due course rights under the Texas Constitution were denied. Because of our disposition of this case, we reach only the no evidence point of error.
All the expert witnesses on causation have appeared in other cases in which Bendectin was claimed to have caused limb reduction birth defects. The Sixth Circuit commented that the Bendectin suits are “variations on a theme, somewhat like an orchestra which travels to different music halls, substituting musicians from time to time but playing essentially the same repertoire.” Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1351 (6th Cir.1992).
The federal courts have dealt extensively with Bendectin litigation. To date, no plaintiff has ultimately prevailed in federal court. The evidence in those cases has been similar to that offered by the Havners. The federal decisions have discussed the substance of the evidence in detail, and often the testimony under scrutiny included that of Drs. Palmer, Newman, Glasser, Gross, and Swan, the Havners’ witnesses. These decisions are not binding on our Court, but they do provide extensive consideration of the scientific reliability of the causation evidence.
Some federal courts have concluded that the expert evidence of causation is legally insufficient. See Monahan v. Merrell–National Labs., No. 83–3108–WD, 1987 WL 90269 (D.Mass. Dec.18, 1987).
*710 Other federal courts have found the expert evidence to be inadmissible. See Will v. Richardson–Merrell, Inc., 647 F.Supp. 544 (S.D.Ga.1986).
One federal circuit court initially found the expert testimony admissible and reversed a summary judgment for Merrell Dow. 6 F.3d 778 (3d Cir.1993).
A few federal district courts have denied summary judgment for Merrell Dow on the basis that the evidence raised a fact question. Lanzilotti v. Merrell Dow Pharms., Inc., No. 82–0183, 1986 WL 7832 (E.D.Pa. July 10, 1986) (denying motion for directed verdict).
Decisions in which Merrell Dow obtained a jury verdict in its favor include In re Bendectin Litigation, 857 F.2d 290 (6th Cir.1988).
However, a state trial court recently entered judgment on a jury verdict against Merrell Dow that included a finding of fraud. In a written opinion, the court was highly critical of the evidence offered by Merrell Dow, concluding that there was ample evidence Merrell Dow had made misrepresentations to the FDA, including misrepresentations about its animal studies on Bendectin. Blum v. Merrell Dow Pharm., Inc., No. 1027 (Pa.Ct.C.P. Dec. 13, 1996) (appeal pending).
At least one state court has granted summary disposition for Merrell Dow on the basis that the expert testimony of Drs. Newman, Palmer, and Swan was inadmissible. DePyper v. Navarro, No. 83–303467–NM, 1995 WL 788828 (Mich.Cir.Ct. Nov.27, 1995) (holding plaintiffs’ experts’ testimony inadmissible under the Davis/Frye rule and rendering judgment for Merrell Dow).
The only appellate decision we have found, state or federal, that has upheld a verdict in favor of a plaintiff in a Bendectin case is from the court of appeals for the District of Columbia in Oxendine v. Merrell Dow Pharms., Inc., No. 82–1245, 1996 WL 680992 (D.C.Super.Ct. Oct. 24, 1996) (appeal pending).
Thus, we are not the first court to wrestle with the issues presented by the Bendectin litigation.
As in most of the Bendectin cases, the central issue before us is not whether the plaintiffs’ witnesses possessed adequate credentials, skills, or experience to testify about causation. The only witness whose qualifications have been challenged is Dr. Palmer, whose experience in identifying the cause of birth defects is questioned by Merrell Dow. Cf. Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 583 & n. 2, 113 S.Ct. 2786, 2792 & n. 2, 125 L.Ed.2d 469 (1993). The issue before us, as in most of the previously cited Bendectin cases, is whether the Havners’ evidence is scientifically reliable and thus some evidence to support the judgment in their favor.
In determining whether there is no evidence of probative force to support a jury’s finding, all the record evidence must be considered in the light most favorable to the party in whose favor the verdict has been rendered, and every reasonable inference deducible from the evidence is to be indulged in that party’s favor. Transportation Ins. Co. v. Moriel, 879 S.W.2d 10, 25 (Tex.1994).
Several of the Havners’ experts testified that Bendectin can cause limb reduction Schaefer, 612 S.W.2d at 202.
In Schaefer, a workers’ compensation case, the plaintiff suffered from atypical Id. at 204–05.
Other courts have likewise recognized that it is not so simply because “an expert says it is so.” Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1360 (6th Cir.1992) (holding evidence legally insufficient in Bendectin case when no understandable scientific basis was stated).
It could be argued that looking beyond the testimony to determine the reliability of scientific evidence is incompatible with our no evidence standard of review. If a reviewing court is to consider the evidence in the light most favorable to the verdict, the argument runs, a court should not look beyond the expert’s testimony to determine if it is reliable. But such an argument is too simplistic. It reduces the no evidence standard of review to a meaningless exercise of looking to see only what words appear in the transcript of the testimony, not whether there is in fact some evidence. We have rejected such an approach. See Burroughs Wellcome, 907 S.W.2d at 499–500.
Justice Gonzalez, in writing for the Court, gave rather colorful examples of unreliable scientific evidence in Rule 702 deals with the admissibility of evidence, it offers substantive guidelines in determining if the expert testimony is some evidence of probative value.
Similarly, to say that the expert’s testimony is some evidence under our standard of review simply because the expert testified that the underlying technique or methodology supporting his or her opinion is generally accepted by the scientific community is putting the cart before the horse. As we said in Robinson, an expert’s bald assurance of validity is not enough. 516 U.S. 869, 116 S.Ct. 189, 133 L.Ed.2d 126 (1995).
*713 The view that courts should not look beyond an averment by the expert that the data underlying his or her opinion are the type of data on which experts reasonably rely has likewise been rejected by other courts. The underlying data should be independently evaluated in determining if the opinion itself is reliable. See, e.g., In re Agent Orange, 611 F.Supp. at 1245. If the expert’s scientific testimony is not reliable, it is not evidence. The threshold determination of reliability does not run afoul of our no evidence standard of review.
Indeed, the United States Supreme Court would agree that a determination of scientific reliability is appropriate in reviewing the legal sufficiency of evidence. While admissibility rather than sufficiency was the focus of the Supreme Court’s decision in Daubert, that Court explained that when “wholesale exclusion” is inappropriate and the evidence is admitted, a review of its sufficiency is not foreclosed:
[I]n the event the trial court concludes that the scintilla of evidence presented supporting a position is insufficient to allow a reasonable juror to conclude that the position more likely than not is true, the court remains free to direct a judgment ... and likewise to grant summary judgment.
509 U.S. at 595, 113 S.Ct. at 2798.
The Court cited two Bendectin decisions in support of this statement, Raynor v. Merrell Pharms. Inc., 104 F.3d 1371, 1376 (D.C.Cir.1997) (affirming judgment notwithstanding the verdict and noting that even if expert testimony were admissible under Daubert, it was “unlikely” that a jury could reasonably find it sufficient to show causation).
As already discussed, a number of other decisions in the Bendectin litigation have held that the causation evidence was legally insufficient, sometimes setting aside a jury verdict and in other cases granting summary judgment or a directed verdict. See supra at 709. The decision in Richardson–Merrell said in no uncertain terms that the trial court did not err in granting judgment notwithstanding the verdict because “[w]hether an expert’s opinion has an adequate basis” is an issue “falling within the province of the court.” 857 F.2d at 833.
There are many decisions outside the Bendectin litigation that have examined the reliability of scientific evidence in a review of the legal sufficiency of the evidence. See, e.g., 520 U.S. 1114, 117 S.Ct. 1243, 137 L.Ed.2d 325 (1997).
In Robinson, we set forth some of the factors that courts should consider in looking beyond the bare opinion of the expert. Those factors include:
(1) the extent to which the theory has been or can be tested;
(2) the extent to which the technique relies upon the subjective interpretation of the expert;
(3) whether the theory has been subjected to peer review and publication;
(4) the technique’s potential rate of error;
(5) whether the underlying theory or technique has been generally accepted as valid by the relevant scientific community; and
(6) the non-judicial uses that have been made of the theory or technique.
See Robinson, 923 S.W.2d at 557. The issue in Robinson was admissibility of evidence, but as we have explained the same factors may be applied in a no evidence review of scientific evidence.
If the foundational data underlying opinion testimony are unreliable, an expert will not be permitted to base an opinion on that data because any opinion drawn from that data is likewise unreliable. Further, an expert’s testimony is unreliable even when the underlying data are sound if the expert draws conclusions from that data based on flawed methodology. A flaw in the expert’s reasoning from the data may render reliance on a study unreasonable and render the inferences drawn therefrom dubious. Under that circumstance, the expert’s scientific testimony is unreliable and, legally, no evidence.
We next consider some of the difficult issues surrounding proof of causation in a toxic tort case such as this.
The Havners do not contend that all limb reduction birth defects is unknown. Given these undisputed facts, what must a plaintiff establish to raise a fact issue on whether Bendectin caused an individual’s birth defect? The question of causation in cases like this one has engendered considerable debate. Courts that have addressed the issue have not always agreed, and commentators have expressed widely divergent views on the quantum and quality of evidence necessary to sustain a recovery.
Sometimes, causation in toxic tort cases is discussed in terms of general and specific causation. See, e.g., From Science to Evidence: The Testimony on Causation in the Bendectin Cases, 46 STAN. L.REV. 1, 14 (1993). General causation is whether a substance is capable of causing a particular injury or condition in the general population, while specific causation is whether a substance caused a particular individual’s injury. In some cases, controlled scientific experiments *715 can be carried out to determine if a substance is capable of causing a particular injury or condition, and there will be objective criteria by which it can be determined with reasonable certainty that a particular individual’s injury was caused by exposure to a given substance. However, in many toxic tort cases, direct experimentation cannot be done, and there will be no reliable evidence of specific causation.
In the absence of direct, scientifically reliable proof of causation, claimants may attempt to demonstrate that exposure to the substance at issue increases the risk of their particular injury. The finder of fact is asked to infer that because the risk is demonstrably greater in the general population due to exposure to the substance, the claimant’s injury was more likely than not caused by that substance. Such a theory concedes that science cannot tell us what caused a particular plaintiff’s injury. It is based on a policy determination that when the incidence of a disease or injury is sufficiently elevated due to exposure to a substance, someone who was exposed to that substance and exhibits the disease or injury can raise a fact question on causation. See generally 516 U.S. 869, 116 S.Ct. 189, 133 L.Ed.2d 126 (1995). The Havners rely to a considerable extent on epidemiological studies for proof of general causation. Accordingly, we consider the use of epidemiological studies and the “more likely than not” burden of proof.
Epidemiological studies examine existing populations to attempt to determine if there is an association between a disease or condition and a factor suspected of causing that disease or condition. See, e.g., Bert Black & David E. Lilienfeld, Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion, and Statistical Evidence, 96 YALE L.J. 376, 380 (1986). Dr. Glasser, a witness for the Havners, gave as an example a study designed to see if a given drug causes rashes. Even though a study may show that ten people who took the drug exhibited a rash, while rashes appeared on only three people who did not take the drug, Dr. Glasser explained that the study cannot tell us which of the exposed ten got the rash because of the drug. We know that things other than the drug cause rashes.
Recognizing that epidemiological studies cannot establish the actual cause of an individual’s injury or condition, a difficult question for the courts is how a plaintiff faced with this conundrum can raise a fact issue on causation and meet the “more likely than not” burden of proof. Generally, more recent decisions have been willing to recognize that epidemiological studies showing an increased risk may support a recovery. Judge Weinstein, whose decision in the Agent Orange litigation has been widely discussed and followed, has observed that courts have been divided between the “strong” and “weak” versions of the preponderance rule. Id. at 1263.
*716 Other courts have likewise found that the requirement of a more than 50% probability means that epidemiological evidence must show that the risk of an injury or condition in the exposed population was more than double the risk in the unexposed or control population. See, e.g., Cook v. United States, 545 F.Supp. 306, 308 (N.D.Cal.1982) (stating that in vaccine case, when relative risk is greater than 2.0, there is a greater than 50% chance that the injury was caused by the vaccine).
Some courts have reached a contrary conclusion, holding that epidemiological evidence showing something less than a doubling of the risk may support a jury’s finding of causation. See Grassis v. Johns–Manville Corp., 248 N.J.Super. 446, 591 A.2d 671, 674–76 (App.Div.1991) (holding that trial court erred in precluding opinion testimony based on epidemiological studies showing relative risks of less than 2.0).
The “doubling of the risk” issue in toxic tort cases has provided fertile ground for the scholarly plow. Those who advocate that something short of a doubling of the risk is adequate to support liability or who advocate that some type of proportionate liability should be imposed include Daniel A. Farber, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 CORNELL L.REV. 54, 71–73 (1987).
On the other end of the spectrum is Michael Dore, who asserts that epidemiological studies cannot, standing alone, establish causation. See Dore, A Commentary on the Use of Epidemiological Evidence, supra, 7 HARV. ENVTL. L. REV. at 434; see also Michael D. Green, Causal Inference in Epidemiology: Implications for *717 Toxic Tort Litigation, 71 N.C. L.REV. 247, 253, 289 (1992) (arguing that a strong association requires a risk ratio greater than or equal to 8.0, although moderate association of 3.0 to 8.0 could suffice if coupled with other factors).
Some commentators have been particularly critical of attempts by the courts to meld the more than 50% probability requirement with the relative risks found in epidemiological studies in determining if the studies were admissible or were some evidence that would support an award for the claimant. But there is disagreement on how epidemiological studies should be used. Some commentators contend that the more than 50% probability requirement is too stringent, while others argue that epidemiological studies have no relation to the legal requirement of “more likely than not.” Compare Gold, supra, 73 CORNELL L.REV. at 69 (arguing that it is fallacious to reason that “if the data are more probable under one hypothesis than another, then the former hypothesis is more likely to be true than the latter”); James Robins & Sander Greenland, The Probability of Causation Under a Stochastic Model for Individual Risk, 45 BIOMETRICS 1125, 1131 (1989) (concluding that proportional liability schemes cannot be based on epidemiological data alone).
Although we recognize that there is not a precise fit between science and legal burdens of proof, we are persuaded that properly designed and executed epidemiological studies may be part of the evidence supporting causation in a toxic tort case and that there is a rational basis for relating the requirement that there be more than a “doubling of the risk” to our no evidence standard of review and to the more likely than not burden of proof. See generally Cook, 545 F.Supp. at 308.
Assume that a condition naturally occurs in six out of 1,000 people even when they are not exposed to a certain drug. If studies of people who did take the drug show that nine out of 1,000 contracted the disease, it is still more likely than not that causes other than the drug were responsible for any given occurrence of the disease since it occurs in six out of 1,000 individuals anyway. Six of the nine incidences would be statistically attributable to causes other than the drug, and therefore, it is not more probable that the drug caused any one incidence of disease. This would only amount to evidence that the drug could have caused the disease. However, if more than twelve out of 1,000 who take the drug contract the disease, then it may be statistically more likely than not that a given individual’s disease was caused by the drug.
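The arithmetic in the foregoing illustration can be sketched as follows. This is only an illustrative sketch: the function name `share_attributable` is hypothetical, and the figures are the assumed rates from the illustration (six cases per 1,000 without the drug, compared with nine, twelve, or thirteen per 1,000 with it).

```python
from fractions import Fraction


def share_attributable(background_per_1000, exposed_per_1000):
    """Share of cases among the exposed that exceeds the background rate.

    Hypothetical helper for illustration; both arguments are whole-number
    case counts per 1,000 people.
    """
    excess = exposed_per_1000 - background_per_1000
    if excess <= 0:
        return Fraction(0)
    return Fraction(excess, exposed_per_1000)


# Background rate of 6 per 1,000, as assumed in the illustration.
for exposed in (9, 12, 13):
    share = share_attributable(6, exposed)
    more_likely_than_not = share > Fraction(1, 2)
    print(exposed, share, more_likely_than_not)
```

At nine per 1,000, only one third of the cases are statistically attributable to the drug. At exactly twelve the share is one half, and only above twelve does it exceed one half, which is why the illustration speaks of "more than" twelve.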
This is an oversimplification of statistical evidence relating to general causation, as we discuss below, but it illustrates the thinking behind the doubling of the risk requirement. For another viewpoint in this same vein, see ROBERT P. CHARROW & DAVID E. BERNSTEIN, WASHINGTON LEGAL FOUNDATION, SCIENTIFIC EVIDENCE IN THE COURTROOM: ADMISSIBILITY AND STATISTICAL SIGNIFICANCE AFTER DAUBERT 28–34 (1994), who advocate that there is a mathematically demonstrable relationship between relative risk and the more likely than not standard. They contend that a relative risk of slightly more than 2.0 will rarely, if ever, satisfy the legal causation *718 standard. From a mathematical perspective, the probability of general causation changes as the level of statistical significance changes. Id. at 29–31. A relative risk of 2.2 may be sufficient to show more than a 50% probability at the 0.05 level (5 chances out of 100 that result occurred by chance), but not at the 0.10 level (10 chances out of 100). With calculations that we do not attempt to set out here, these commentators offer an example in which a relative risk ratio of 2.75 results in a probability of general causation of about 52% with a statistical significance of 0.05, but only about a 43% probability of general causation with a statistical significance of 0.10. Id. at 31–32.
We recognize, as does the federal Reference Manual on Scientific Evidence, that a disease or condition either is or is not caused by exposure to a suspected agent and that frequency data, such as the incidence of adverse effects in the general population when exposed, cannot indicate the actual cause of a given individual’s disease or condition. See Linda A. Bailey et al., Reference Guide on Epidemiology, in FEDERAL JUDICIAL CENTER, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 169 (1994). But the law must balance the need to compensate those who have been injured by the wrongful actions of another with the concept deeply imbedded in our jurisprudence that a defendant cannot be found liable for an injury unless the preponderance of the evidence supports cause in fact. The use of scientifically reliable epidemiological studies and the requirement of more than a doubling of the risk strikes a balance between the needs of our legal system and the limits of science.
We do not hold, however, that a relative risk of more than 2.0 is a litmus test or that a single epidemiological test is legally sufficient evidence of causation. Other factors must be considered. As already noted, epidemiological studies only show an association. There may in fact be no causal relationship even if the relative risk is high. For example, studies have found that there is an association between silicone Breast Cancer?, 326 NEW ENG. J. MED. 1649 (1992)). Likewise, even if a particular study reports a low relative risk, there may in fact be a causal relationship. The strong consensus among epidemiologists is that conclusions about causation should not be drawn, if at all, until a number of criteria have been considered. One set of criteria widely used by epidemiologists was published by Sir Austin Bradford Hill in 1965.2 Another set of criteria *719 used by epidemiologists in studying disease is the Henle–Koch–Evans Postulates.3 Although epidemiologists do not consider it necessary that all these criteria be met before drawing inferences about causation, they are part of sound methodology generally accepted by the current scientific community.
Sound methodology also requires that the design and execution of epidemiological studies be examined. For example, bias can dramatically affect the scientific reliability of an epidemiological study. See, e.g., Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 138–43; Thompson, supra, 71 N.C. L.REV. at 260. We will not undertake an extended discussion of the many ways in which bias may cause results of a study to be misleading. We note only that epidemiological studies “are subject to many biases and therefore present formidable problems in design and execution and even greater problems in interpretation.” Marcia Angell, The Interpretation of Epidemiologic Studies, 323 NEW ENG. J. MED. 823, 824 (1990).
We also note that some of the literature indicates that epidemiologists consider a relative risk of less than three to indicate a weak association. See Thompson, supra, 71 N.C. L.REV. at 252 (citing Ernest L. Wynder, Guidelines to the Epidemiology of Weak Associations, 16 PREVENTIVE MED. 139, 139 (1987)). The executive editor of the New England Journal of Medicine, Marcia Angell, has stated that “[a]s a general rule of thumb, we are looking for a relative risk of three or more [before accepting a paper for publication], particularly if it is biologically implausible or if it’s a brand-new finding.” Gary Taubes, Epidemiology Faces Its Limits, SCIENCE, July 14, 1995, at 168. Similarly, Robert Temple, the director of drug evaluation at the FDA, has said that “[m]y basic rule is if the relative risk isn’t at least three or four, forget it.” Id. We hasten to point out that these statements are contained in what is more akin to the popular press, not peer-reviewed scientific journals, and the context of those statements is not altogether clear. We draw no conclusions from any of the foregoing articles other than to point out that there are a number of reasons why reliance on a relative risk of 2.0 as a bright-line boundary would not be in accordance with sound scientific methodology in some cases. Careful exploration and explication of what is reliable scientific methodology in a given context is necessary.
A few courts that have embraced the more-than-double-the-risk standard have indicated in dicta that in some instances, epidemiological studies with relative risks of less than 2.0 might suffice if there were other evidence of causation. See, e.g., Hall, 947 F.Supp. at 1398, 1404. We need not decide in this case whether epidemiological evidence with a relative risk less than 2.0, coupled with other credible and reliable evidence, may be legally sufficient to support causation. We emphasize, however, that evidence of causation from whatever source must be scientifically reliable. Post hoc, speculative testimony will not suffice.
A physician, even a treating physician, or other expert who has seen a skewed data sample, such as one of a few infants who has a supra, 15 CARDOZO L.REV. at 2148–49. Further, as we discuss in Part VI(A), an expert cannot dissect a study, picking and choosing data, or “reanalyze” the data to derive a higher relative risk if this process does not comport with sound scientific methodology.
The FDA has promulgated regulations that detail the requirements for clinical investigations of the safety and effectiveness of drugs. supra, 97 HARV. L.REV. at 870 (arguing that anecdotal or particularized evidence accomplishes no more than a false appearance of direct and actual knowledge of a causal relationship). Expert testimony that is not scientifically reliable cannot be used to shore up epidemiological studies that fail to indicate more than a doubling of the risk.
To raise a fact issue on causation and thus to survive legal sufficiency review, a claimant must do more than simply introduce into evidence epidemiological studies that show a substantially elevated risk. A claimant must show that he or she is similar to those in the studies. This would include proof that the injured person was exposed to the same substance, that the exposure or dose levels were comparable to or greater than those in the studies, that the exposure occurred before the onset of injury, and that the timing of the onset of injury was consistent with that experienced by those in the study. See generally Thompson, Parker v. Employers Mut. Liab. Ins. Co., 440 S.W.2d 43, 47 (Tex.1969) (holding that a cause becomes “probable” only when “in the absence of other reasonable causal explanations it becomes more likely than not that the injury was a result”).
In sum, we emphasize that courts must make a determination of reliability from all the evidence. Courts should allow a party, plaintiff or defendant, to present the best available evidence, assuming it passes muster under Robinson, and only then should a court determine from a totality of the evidence, considering all factors affecting the reliability of particular studies, whether there is legally sufficient evidence to support a judgment.
Finally, we are cognizant that science is constantly reevaluating conclusions and theories and that over time, not only scientific knowledge but scientific methodology in a particular field may evolve. We have strived to make our observations and holdings in light of current, generally accepted scientific *721 methodology. However, courts should not foreclose the possibility that advances in science may require reevaluation of what is “good science” in future cases.
Certain conventions are used in conducting scientific studies, and statistics are used to evaluate the reliability of scientific endeavors and to determine what the results tell us. In this opinion, we consider some of the basic concepts currently used in scientific studies and statistical analyses and how those concepts mesh with our legal sufficiency standard of review. For an extended discussion of statistical methodology and its use in epidemiological studies, see Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1353 n. 1 (6th Cir.1992); Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 138–43, 171–78. We do not attempt to discuss all the multifaceted aspects of the scientific method and statistics, but focus on the principles that shed light on the particular facts and issues in this case.
One way to study populations is by a retrospective case-control or case-comparison epidemiological study. For example, this type of study identifies individuals with a disease and a suitable control group of people without the disease and then looks back to examine postulated causes of the disease. See Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 136–38, 172. Another type of epidemiological study is a cohort study, or incidence study, which is a prospective study that identifies groups and observes them over time to see if one group is more likely to develop disease. Id. at 134–36, 173.
An “odds ratio” can be calculated for a case-control study. Id. at 175. For example, an odds ratio could be used to show the odds that ingestion of a drug is associated with a particular disease. The odds ratio compares the odds of having the disease when exposed to the drug versus when not exposed. If the ratio is 2.67, the odds are that a person exposed to the drug is 2.67 times more likely to develop the disease under study.
Similarly, the “relative risk” that a person who took a drug will develop a particular disease can be determined in a cohort study. Id. at 173, 176. The relative risk is calculated by comparing the incidence of disease in the exposed population with the incidence of the disease in the control population. If the relative risk is 1.0, the risk in exposed individuals is the same as unexposed individuals. If the relative risk is greater than 1.0, the risk in exposed individuals is greater than in those not exposed. If the relative risk is less than 1.0, the risk in exposed individuals is less than in those not exposed. For the result to indicate a doubling of the risk, the relative risk must be greater than 2.0. See id. at 147–48.
Perhaps the most useful measure is the attributable proportion of risk, which is the statistical measure of a factor’s relationship to a disease in the population. It represents the “proportion of the disease among exposed individuals that is associated with the exposure.” Id. at 149. In other words, it reflects the percentage of the disease or injury that could be prevented by eliminating exposure to the substance. For a more detailed discussion of the calculation and use of the attributable proportion of risk, see id. at 149–50; Black & Lilienfeld, supra, 71 N.C. L.REV. at 252–56.
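The attributable proportion can be illustrated with a short calculation. The sketch below is only an illustration: it assumes the standard epidemiological formula (RR - 1) / RR, which the opinion describes in words but does not set out, and the function name `attributable_proportion` is hypothetical.

```python
from fractions import Fraction

def attributable_proportion(relative_risk):
    """Share of disease among exposed persons that is associated with exposure.

    Assumes the standard formula (RR - 1) / RR; the opinion does not state it.
    """
    rr = Fraction(relative_risk)
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return (rr - 1) / rr

# Illustrative relative risks, not figures from any Bendectin study.
for rr in (1, 2, 3):
    share = float(attributable_proportion(rr))
    print(f"relative risk {rr}: attributable proportion {share:.1%}")
```

Under this formula a relative risk of exactly 2.0 corresponds to an attributable proportion of one half, and only a relative risk above 2.0 yields a proportion above one half.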
The numeric value of an odds ratio is at least equal to the relative risk, but the odds ratio often overstates the relative risk, especially if the occurrence of the event is not rare. For an example of the difference between the mathematical calculation of the odds ratio and the relative risk, see BARBARA HAZARD MUNRO & ELLIS BATTEN PAGE, STATISTICAL METHODS FOR HEALTH CARE RESEARCH 233–35 (2d ed. 1993). In the example given by Munro and Page, the odds ratio was 3.91, while the relative risk was only 3.0 based on the same set of data. See also Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 149; Thompson, supra, 71 N.C. L.REV. at 250 n. 22.
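The gap between the two measures can be seen by computing both from the same counts. The sketch below uses hypothetical counts (not the Munro and Page data, and not data from any Bendectin study), and the function names are illustrative. It computes an odds ratio from cohort-style counts purely for comparison.

```python
from fractions import Fraction

def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Incidence among the exposed divided by incidence among the unexposed."""
    return Fraction(cases_exp, n_exp) / Fraction(cases_unexp, n_unexp)

def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    odds_exp = Fraction(cases_exp, n_exp - cases_exp)
    odds_unexp = Fraction(cases_unexp, n_unexp - cases_unexp)
    return odds_exp / odds_unexp

# Hypothetical counts: (exposed cases, exposed total, unexposed cases, unexposed total)
common_disease = (30, 100, 10, 100)
rare_disease = (3, 1000, 1, 1000)

for label, counts in (("common", common_disease), ("rare", rare_disease)):
    rr = float(relative_risk(*counts))
    odds = float(odds_ratio(*counts))
    print(f"{label} disease: relative risk {rr:.2f}, odds ratio {odds:.2f}")
```

In both hypotheticals the relative risk is 3.0. The odds ratio exceeds it noticeably when the disease is common and only slightly when the disease is rare.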
*722 The relative risk may be expressed algebraically as:
RR = Ie ÷ Ic
where RR is the relative risk, Ie is the incidence of the disease in the exposed population, and Ic is the incidence of disease in the control population. A sample calculation is as follows:
· the incidence of the disease in exposed individuals (Ie) is 30 cases per 100 persons, or 0.3
· the incidence of the disease in the unexposed individuals (Ic) is 10 cases per 100 persons, or 0.1
· the relative risk is the incidence in the exposed group (0.3) divided by the incidence in the unexposed group (0.1), which equals 3.0
Using this hypothetical, can we conclude that people who are exposed are three times more likely to contract disease than those who are not? Not necessarily. The result in any given study or comparison may not be representative of the entire population. The result may have occurred by chance. The discipline of statistics has determined means of telling us how significant the results of a study may be.
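The role of chance can be illustrated by simulation. The sketch below is a hypothetical exercise, not an analysis of any study in the record: it assumes that exposed and control groups share the same true incidence (so the true relative risk is 1.0) and shows how widely the observed relative risk can still vary from study to study. The function name, group size, and seed are arbitrary choices made for illustration.

```python
import random

def simulated_relative_risks(true_incidence, group_size, studies, seed):
    """Observed relative risks when exposure truly has no effect."""
    rng = random.Random(seed)
    results = []
    for _ in range(studies):
        exposed_cases = sum(rng.random() < true_incidence for _ in range(group_size))
        control_cases = sum(rng.random() < true_incidence for _ in range(group_size))
        if control_cases == 0:
            continue  # relative risk is undefined with no control cases
        # Groups are the same size, so the ratio of case counts equals the
        # ratio of incidences.
        results.append(exposed_cases / control_cases)
    return results

ratios = simulated_relative_risks(true_incidence=0.1, group_size=100, studies=1000, seed=1)
at_least_double = sum(r >= 2.0 for r in ratios)
print(f"studies simulated: {len(ratios)}")
print(f"smallest and largest observed relative risk: {min(ratios):.2f}, {max(ratios):.2f}")
print(f"studies showing a relative risk of 2.0 or more by chance: {at_least_double}")
```

Although no individual result is informative about the true effect, the spread of results across repetitions is what significance testing attempts to quantify.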
The first step in understanding significance testing is to understand how research is often conducted. A researcher tests hypotheses and does so by testing whether the data support a particular hypothesis. The starting point is the null hypothesis, which assumes that there is no difference or no effect. If you were studying the effects of Bendectin, for example, the null hypothesis would be that it has no effect. The researcher tries to find evidence against the hypothesis. See DAVID S. MOORE & GEORGE P. MCCABE, INTRODUCTION TO THE PRACTICE OF STATISTICS 449 (2d ed. 1993); MUNRO & PAGE, supra, at 54. The statement that the researcher suspects may be true is stated as the alternative hypothesis. If a significant difference is found, the null hypothesis is rejected. If a significant difference is not found, the null hypothesis is accepted. MUNRO & PAGE, supra, at 54. This concept is important because it is the basis of the statistical test. Id.
A study may contain error in deciding to reject or accept a hypothesis, and this error can be one of two types. Id.; MOORE & MCCABE, supra, at 482–87. A Type I error occurs when the null hypothesis is true but has been rejected, and a Type II error occurs when the null hypothesis is false but has been accepted. MUNRO & PAGE, supra, at 55. An example of the two types of error given by Munro and Page is a comparison of two groups of people who have been taught statistics by different methods. Id. Group A scored significantly higher than Group B on a test of their knowledge of statistics. The null hypothesis is that there is no difference between the teaching methods, but because the study indicated there was a difference, the null hypothesis was rejected. Suppose, however, that Group A was composed of people with higher math ability and that in actuality the teaching method did not matter at all. The rejection of the null hypothesis is a Type I error. Id.
The probability of making a Type I error can be decreased by changing the level of significance, that is, the probability that the results occurred by chance. Id. If the level of significance is five in one hundred (0.05), there is only a five in one hundred chance that the result occurred by chance alone. If the level of significance is one in one hundred (0.01), there is only a one in one hundred chance that the result occurred by chance alone. However, as the significance level is made more stringent (e.g., from 0.05 to 0.01), it will be more difficult to find a significant result. Id. Altering the significance level in this manner also increases the risk of a Type II error, which is accepting a false null hypothesis. Id. To avoid Type II errors, the level of significance can be made less stringent, for example, ten in one hundred (0.1). Id.
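The effect of the chosen significance level can be shown with a simple exact calculation. The sketch below uses a hypothetical coin-flip experiment (15 heads in 20 flips, with a null hypothesis that the coin is fair), which is not drawn from the opinion or from Munro and Page. The function name `one_sided_p_value` is illustrative.

```python
from math import comb

def one_sided_p_value(successes, trials):
    """Chance of at least `successes` heads in `trials` flips of a fair coin."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials

# Hypothetical result: 15 heads in 20 flips.
p_value = one_sided_p_value(15, 20)

for alpha in (0.10, 0.05, 0.01):
    decision = "reject" if p_value <= alpha else "do not reject"
    print(f"significance level {alpha:.2f}: p = {p_value:.4f}, {decision} the null hypothesis")
```

The same data lead to rejection of the null hypothesis at the 0.10 and 0.05 levels but not at the more stringent 0.01 level, which is the trade-off between Type I and Type II error described above.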
Different levels of significance may be appropriate for different types of studies depending on how much risk one is willing to accept that the conclusion reached is wrong. Again, to take examples offered by Munro and Page, assume that a test for a particular genetic defect exists and that if the defect is *723 diagnosed at an early stage, a child with the defect can be successfully treated. If the genetic defect is not diagnosed in time, the child’s development will be severely impaired. If a child is mistakenly diagnosed as having the defect and treated, there are no harmful effects. Most would agree that it would be preferable to make a Type I error rather than a Type II error under these circumstances. Id. A Type II error would be failing to diagnose a child that had the genetic defect.
Contrast that hypothetical with one in which a federal study is conducted to determine whether a particular method of teaching underprivileged children increases their success in school. Id. The cost of implementing this teaching method in a nationwide program would be very great. A Type I error would be to conclude that the program had an effect when it did not. Id. The significance level for this project would probably be more stringent than the one used to screen for genetic defects in the other hypothetical. In the genetic defects example, it is preferable to treat children even if they may not have the disease, but in the teaching method example, it is not preferable to teach children at considerable cost if it has no effect.
A confidence level can be used in epidemiological studies to establish the boundaries of the relative risk. These boundaries are known as the confidence interval. See id. at 59–63; see also David H. Kaye & David A. Freedman, Reference Guide on Statistics, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 376–77, 396; MOORE & MCCABE, supra, at 432–37. The confidence interval tells us if the results of a given study are statistically significant at a particular confidence level. See MOORE & MCCABE, supra, at 432–33. A confidence interval shows a “range of values within which the results of a study sample would be likely to fall if the study were repeated numerous times.” Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 173. If, based on a confidence level of 95%, a study showed a relative risk of 2.3 and had a confidence interval of 1.3 to 3.8, we would say that, if the study were repeated, it would produce a relative risk between 1.3 and 3.8 in 95% of the repetitions. However, if the interval includes the number 1.0, the study is not statistically significant or, said another way, is inconclusive. This is because the confidence interval includes relative risk values that are both less than and greater than the null hypothesis (1.0), leaving the researcher with results that suggest both that the null hypothesis should be accepted and that it should be rejected. See, e.g., 884 F.2d 166 (5th Cir.1989); Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 173. This concept was explained to the jury in this case by Dr. Glasser, one of the Havners’ witnesses. Thus, a study may produce a relative risk of 2.3, meaning the risk is 2.3 times greater based on the data, but at a confidence level of 95%, the confidence interval has boundaries of 0.8 and 3.2. The results are therefore insignificant at the 95% level. 
If the researcher is willing to accept a greater risk of error and lowers the confidence level to 90%, the results may be statistically significant at that lower level because the range does not include the number 1.0. See generally Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 151–55. “[T]he narrower the confidence interval, the greater the confidence in the relative risk estimate found in the study.” Id. at 173.
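How the confidence level changes the interval can be sketched numerically. The example below rests on several assumptions not found in the opinion: it uses hypothetical counts (20 cases among 100 exposed, 10 among 100 unexposed) and a common large-sample approximation that builds the interval on the logarithm of the relative risk. The function name `relative_risk_interval` is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_interval(cases_exp, n_exp, cases_unexp, n_unexp, confidence):
    """Relative risk with an approximate confidence interval (log method)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Approximate standard error of log(RR); an assumption of this sketch.
    se = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# Hypothetical counts, not taken from any study in the record.
rr95, low95, high95 = relative_risk_interval(20, 100, 10, 100, confidence=0.95)
rr90, low90, high90 = relative_risk_interval(20, 100, 10, 100, confidence=0.90)

print(f"95% level: relative risk {rr95:.2f}, interval {low95:.2f} to {high95:.2f}")
print(f"90% level: relative risk {rr90:.2f}, interval {low90:.2f} to {high90:.2f}")
```

With these hypothetical counts the 95% interval includes 1.0 while the narrower 90% interval does not, so the same data would be called statistically significant only at the lower confidence level.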
The generally accepted significance level or confidence level in epidemiological studies is 95%, meaning that if the study were repeated numerous times, the confidence interval would indicate the range of relative risk values that would result 95% of the time. See supra, 71 N.C. L.REV. at 256. Virtually all the published, peer-reviewed studies on Bendectin have *724 a confidence level of at least 95%. Although one of the Havners’ witnesses, Dr. Swan, advocated the use of a 90% confidence level (10 in 100 chance of error), she and other of the Havners’ witnesses conceded that 95% is the generally accepted level.
Another of the Havners’ witnesses, Dr. Glasser, explained that in any scientific application, the confidence level is kept very high. He testified that you “don’t ever see [confidence levels of 50% or 60%] in a scientific study because that means we’re going to miss it a lot of times and [scientists] are not willing to take that risk.” One commentator advocates that the confidence level for admissibility of epidemiological studies should be higher than the generally accepted 95% and should be 99%. See Dore, A Proposed Standard, supra note 3, 28 HOW. L.J. at 693–95. But cf. Longmore v. Merrell Dow Pharms., Inc., 737 F.Supp. 1117, 1119–20 (D.Idaho 1990) (concluding that the scientific standard for determining causation is much stricter than the standard employed by the court and that confidence levels of 95%, 90%, or even 80% should not be required).
We think it unwise to depart from the methodology that is at present generally accepted among epidemiologists. See generally Bert Black, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) (No. 92–102))). Accordingly, we should not widen the boundaries at which courts will acknowledge a statistically significant association beyond the 95% level to 90% or lower values.
It must be reiterated that even if a statistically significant association is found, that association does not equate to causation. Although there may appear to be an increased risk associated with an activity or condition, this does not mean the relationship is causal. As the original panel of the court of appeals observed in this case, there is a demonstrable association between summertime and death by drowning, but summertime does not cause drowning. 907 S.W.2d at 544 n. 8.
There are many other factors to consider in evaluating the reliability of a scientific study including, but certainly not limited to, the sample size of the study, the power of the study, confounding variables, and whether there was selection bias. These factors are not central to a resolution of this appeal, and we do no more than acknowledge that determining scientific reliability can have many facets.
Armed with some of the basic principles employed by the scientific community in conducting studies, we turn to an examination of the evidence in this case measured against the Robinson factors. See doxylamine succinate, the antihistamine component of Bendectin. We consider each in turn.
Dr. J. Howard Glasser, an associate professor at the University of Texas School of Public Health at the Texas Medical Center in Houston, is an epidemiologist with a Ph.D. in experimental statistics and a Master of Science of Bio–Statistics. He gave the jury an overview of statistics. As noted earlier, he explained that statistics are used to determine if there is a significant association between two events or occurrences, but cautioned that a statistical association is not the same thing as causation.
Glasser identified a number of epidemiological studies from which he concluded that it was more likely than not that there is an *725 association between Bendectin and birth defects, even though the authors of those studies did not find such an association. One study was done by Cordero and had a relative risk of 1.18 and a confidence interval of 0.65 to 2.13. However, the relative risk would need to exceed 2.0, and the confidence interval could not include 1.0, for the results to indicate more than a doubling of the risk and a statistically significant association between Bendectin and limb reduction birth defects. See supra Part V; see also birth defects resulted in a relative risk of 4.18, but the confidence interval was 0.48 to 36.3, a very large interval that included 1.0. Dr. Glasser agreed that results with a confidence interval that included 1.0 or a lower number would be inconclusive and statistically insignificant.
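The two numeric checks described in this paragraph can be applied mechanically to the reported figures. The sketch below is only an illustration of those two checks, not a statement of the legal test, which turns on the totality of the evidence. The function name `screen` and the dictionary labels are hypothetical. The third entry uses the hypothetical figures from the earlier discussion of confidence intervals.

```python
def screen(relative_risk, ci_low, ci_high):
    """Return (more than a doubling of risk?, interval excludes 1.0?)."""
    more_than_doubling = relative_risk > 2.0
    statistically_significant = not (ci_low <= 1.0 <= ci_high)
    return more_than_doubling, statistically_significant

# (relative risk, lower bound, upper bound) as reported in the text.
studies = {
    "Cordero study": (1.18, 0.65, 2.13),
    "study reporting a relative risk of 4.18": (4.18, 0.48, 36.3),
    "hypothetical from the confidence interval discussion": (2.3, 1.3, 3.8),
}

for name, figures in studies.items():
    doubling, significant = screen(*figures)
    print(f"{name}: more than doubling = {doubling}, significant = {significant}")
```

Neither of the two studies discussed here passes both checks: the first shows less than a doubling, and both have intervals that include 1.0.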
Dr. Glasser did, however, reanalyze some data, called the Jick data, that had been included in a report to the FDA. Glasser isolated information on women who had filled two or more prescriptions of Bendectin and who were not exposed to spermicide, which resulted in a relative risk of 13.0 of limb reduction birth defects. However, the confidence level he used was 90%. Further, there is no testimony or other evidence regarding the confidence interval. The confidence interval may or may not have contained 1.0.
The Havners also point to a memorandum prepared within the FDA that was identified by Dr. Glasser. The document indicates that the relative risk of limb defects when Bendectin is given within the first three lunar months of pregnancy is 2.13. The only conclusion drawn by Dr. Glasser from this memorandum is that, taken in conjunction with the other articles he had discussed, there is an “importance of time” and an “importance of exposure with the highest relative risk coming when the exposure period one to three lunar months is counted.” The memo itself was not introduced into evidence, and there is no evidence of the confidence level at which the relative risk of 2.13 was found or of the confidence interval. The confidence interval may or may not have contained 1.0.
Finally, Glasser testified about published studies on Bendectin that did show statistically significant results, but they dealt with birth defects did not bear out an association with Bendectin.
The other expert witness for the Havners who testified about epidemiological studies was Dr. Shanna Swan. She has a doctorate in statistics and is the Chief of the Reproductive Epidemiological Program for the state of California. She also teaches epidemiology at the University of California at Berkeley.
Dr. Swan conceded that none of the published epidemiological studies found an association between Bendectin and limb reduction defects. She identified a number of these studies and confirmed that the confidence intervals in each of them included 1.0. However, Dr. Swan testified about these studies at some length and criticized the methodology. Then, relying on these same studies, she opined that Bendectin more probably than not is associated with limb reduction birth defects. Swan considered the findings of these studies in the aggregate and testified that the results fall along a curve in which the “weight of the curve” was in the direction of an increased risk. Yet, she also said that these studies were consistent with a relative risk that was between 0.7 and 1.8. That is not a doubling of the risk. It may support her opinion that it is more probable than not that there is an association between Bendectin and limb reduction defects, but the magnitude of the association she gleaned from these studies is not more than 2.0, based on her own testimony.
Dr. Swan also performed a reanalysis of data from at least two studies. One reanalysis was of raw unpublished data underlying *726 the Jick study of limb reduction birth defects, the same data about which Dr. Glasser testified. Dr. Swan derived a relative risk estimate of 2.2 for women exposed to Bendectin during the first trimester. She also testified that the relative risk for women who were exposed to Bendectin but not exposed to spermicide was 8.8 and finally, that if women who were exposed to two or more Bendectin prescriptions were considered, without regard to exposure to spermicide, the relative risk was 13 with a confidence interval from 3 to 53. She did not reveal the confidence level used in obtaining these results, and there is no evidence of the confidence level in the record.
The other reanalysis by Dr. Swan was of data in the Cordero study, which was based on information collected by the Center for Disease Control in Atlanta. An abstract she prepared regarding this data was published in the Journal for the Society of Epidemiological Research in 1983 or 1984 and states that the original Cordero study found the odds ratio for limb reduction birth defects to be 1.2. Swan concluded, however, that when a different control group is selected, the relative risk estimates are affected. Swan’s abstract stated that, “under certain assumptions,” which are not identified, “the odds ratio for limb reduction defects” is “a highly significant” 2.8. There is no explanation in the abstract or in Dr. Swan’s testimony of the significance level used to obtain the 2.8 result. The result may well be statistically inconclusive at a 95% confidence level. We simply do not know from this record. Without knowing the significance level or the confidence interval, there is no scientifically reliable basis for saying that the 2.8 result is an indication of anything. Further, her choice of the control group could have skewed the results. Although her abstract does not identify what control group she used, Swan testified at trial that she chose births of 857 F.2d 823 (D.C.Cir.1988).
In addition to the statistical shortcomings of the Havners’ epidemiological evidence, another strike against its reliability is that it has never been published or otherwise subjected to peer review, with the exception of Dr. Swan’s abstract, which she acknowledges is not the equivalent of a published paper. Dr. Swan has published a number of papers in scientific journals, including a study that concluded Bendectin is not associated with cardiac birth defect cases for many years, Dr. Swan has never attempted to publish her opinions or conclusions about Bendectin and limb reduction defects. Similarly, studies by Dr. Glasser have been published in refereed journals, but none of his 32 to 33 publications mentions Bendectin or limb reduction birth defects.
As already discussed, there are over thirty published, peer-reviewed epidemiological studies on the relationship between Bendectin and birth defects. None of the findings offered by the Havners’ five experts in this case have been published, studied, or replicated by the relevant scientific community. As Judge Kozinski has said, “the only review the plaintiffs’ experts’ work has received has been by judges and juries, and the only place their theories and studies have been published is in the pages of federal and state reporters.” Daubert, 43 F.3d at 1318 (commenting on the same five witnesses called by the Havners). A related factor that should be considered is whether the study was prepared only for litigation. Has the study been used or relied upon outside the courtroom? Is the methodology recognized in the scientific community? Has the litigation spawned its own “community” that is not part of the purely scientific community? The opinions to which the Havners’ witnesses testified have never been offered outside the confines of a courthouse.
Publication and other peer review are significant indicia of the reliability of scientific evidence when the expert’s testimony is in an area in which peer review or publication would not be uncommon. Publication in *727 reputable, established scientific journals and other forms of peer review “increases the likelihood that substantive flaws in methodology will be detected.” Perry v. United States, 755 F.2d 888, 892 (11th Cir.1985)).
We do not hold that publication is a prerequisite for scientific reliability in every case, but courts must be “especially skeptical” of scientific evidence that has not been published or subjected to peer review. Science and the Law in the Wake of Daubert: A New Search for Scientific Knowledge, 72 TEX. L.REV. 715, 778 (1994). Publication and peer review allow an opportunity for the relevant scientific community to comment on findings and conclusions and to attempt to replicate the reported results using different populations and different study designs.
The need for the replication of results was acknowledged by the Havners’ witnesses. Moreover, it must be borne in mind that the discipline of epidemiology studies associations, not “causation” per se. Particularly where, as here, direct experimentation has not been conducted, it is important that any conclusions about causation be reached only after an association is observed in studies among different groups and that the association continues to hold when the effects of other variables are taken into account. See, e.g., MOORE & MCCABE, supra, at 202.
As we have already observed, an isolated study finding a statistically significant association between Bendectin and limb reduction defects would not be legally sufficient evidence of causation. The Havners’ witnesses conceded that when a number of studies have been done, it would not be good practice to pick out one to support a conclusion. As the federal Reference Manual on Scientific Evidence points out, “[m]ost researchers are conservative when it comes to assessing causal relationships, often calling for stronger evidence and more research before a conclusion of causation is drawn.” Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 157. For example, Dr. Swan explained that initially, some studies showed a statistically significant association between Bendectin and the birth defect.
Accordingly, if scientific methodology is followed, a single study would not be viewed as indicating that it is “more probable than not” that an association exists. See, e.g., Lynch, 830 F.2d at 1194 (asserting that a new study coming to a different conclusion and challenging the consensus would be admissible).
The argument is sometimes made that waiting until an association found in one study is confirmed by others will mean that early claimants will be denied a recovery. See, e.g., Green, *728 supra, 72 TEX. L. REV. at 779–82 (discussing Galileo, Pasteur, DNA, and continental drift).
Others have argued that liability should not be allocated only on the basis of reliable proof of fault because legal rules should have the goals of “risk spreading, deterrence, allocating costs to the cheapest cost-avoider, and encouraging socially favored activities,” and because “ ‘consumers of American justice want people compensated.’ ” Rochelle Cooper Dreyfuss, 519 U.S. 819, 117 S.Ct. 73, 136 L.Ed.2d 33 (1996).
The Havners relied on in vivo animal studies to support the conclusion that Bendectin causes limb reduction birth defects in humans. This evidence was presented by Dr. Adrian Gross, a veterinarian and a veterinary pathologist who had worked at the FDA from 1964 to 1979, served as the Chief of the Toxicology Branch at the Environmental Protection Agency from 1979 to 1980, and thereafter was a Senior Science Advisor at the EPA. Dr. Gross confirmed that the FDA and EPA consider animal studies in assessing the potential human response to drugs or pesticides. He testified that what will affect an animal is likely to affect humans in the same way and that the only reason animal studies are done is to predict if the drug at issue will have an adverse effect on humans.
Dr. Gross reviewed a number of animal studies that had been conducted on Bendectin. He described studies on rabbits exposed to Bendectin in which he saw “a lot of malformed kits.” Gross testified about another study of rabbits that he found statistically significant. He opined that the probability that the malformations in this study occurred by chance was six in 10,000. With respect to another animal study on rabbits, he stated that the probability that the drug was harmless was less than one per 1,000,000. He listed studies on monkeys, rats, and mice showing “highly significant deleterious harmful effects as far as birth defects in rats was at 100 milligrams per kilogram per day, which would be the equivalent of a daily dosage of 1200 tablets for a woman weighing 132 pounds.
The Havners assert in their briefing before this Court that the accepted technique for determining if a substance is a teratogen in humans is to look at all information, including epidemiological data, animal data, biological plausibility, and in vitro studies. Dr. Swan confirmed that these are the relevant sources of information in determining Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 197 (5th Cir.1996) (quoting and following Brock in toxic tort case).
We further note that with respect to the in vivo studies about which Dr. Gross testified, their reliability as predictors of the effect of Bendectin in humans is questionable because of the dosage levels. Dr. Gross offered no explanation of how the very high dosages could be extrapolated to humans. Other courts have rejected animal studies that relied on high dosage levels as evidence of causation in humans. See, e.g., Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349 (6th Cir.1992) (reasoning that to eliminate drugs toxic to embryos at high dosage levels would eliminate most drugs and many useful chemicals on which modern society depends heavily) (citing James Wilson, Current Status of Teratology, in HANDBOOK OF TERATOLOGY 60 (1977)). Gross also failed to explain why the published studies from which he extracted his data had concluded Bendectin was not harmful.
The in vivo studies identified in this case cannot support the jury’s verdict.
Dr. Stuart Allen Newman also relied on animal studies to support his opinion that Bendectin is a teratogen in humans. Dr. Newman holds a doctorate in chemical physics and is a professor at New York Medical College. He has published over fifty articles, although none contain the opinions or conclusions to which he testified in this case.
The studies Newman reviewed were in vitro studies, which are based on tests conducted on cells in a test tube or petri dish. doxylamine succinate was potentially capable of inducing genetic damage and that it should be tested on other systems. But Newman testified that if you find an effect that prevails across a number of different species, “you can be awfully sure that the same thing will prevail in humans.”
Newman opined that Kelly Havner’s defect was due to loss of portions of the skeleton that could with scientific certainty have been caused by a teratogen that affected the embryo. Similarly, he testified that the findings of one study, the Hassell/Horigan Study, indicated to him that doxylamine succinate is a teratogen in humans. He also testified that he had reviewed the records surrounding Marilyn Havner’s pregnancy and that to a reasonable certainty, she was not exposed to any teratogen other than Bendectin.
The in vitro studies are similar to the cell biology data at issue in Richardson, 857 F.2d at 830 (“Positive results from in vitro studies may provide a clue signaling the need for further research, but alone do not provide a satisfactory basis for opining about causation in the human context.”); Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 130–31 (noting that the problem with in vitro studies is extrapolating the findings “from tissues in laboratories to whole human beings”).
Logical support for Dr. Newman’s opinions was also lacking. A number of substances, such as birth defects in humans. Dr. Newman’s testimony is not evidence of causation.
Of the five witnesses who testified on the question of causation, the only witness who opined that Bendectin was the cause of Kelly Havner’s birth defect, as opposed to birth defects in general, was Dr. John Davis Palmer. Dr. Palmer is a licensed medical doctor and holds a doctorate in pharmacology. He is a professor at the University of Arizona College of Medicine and the acting head of its Pharmacology Department. His opinion was based in part on the testimony of the Havners’ other witnesses.
Dr. Palmer testified that there is a critical period during gestation when the limbs of a fetus are forming. Marilyn Havner took Bendectin somewhere between the 32nd and 42nd day of gestation, depending on how the date of conception is calculated, which was within the period for the development of Kelly Havner’s hand and arm. Palmer explained that the molecular structure of doxylamine succinate is a teratogen in humans. Relying on this same information and on information concerning Kelly Havner, including the date her mother ingested Bendectin, Dr. Palmer concluded that to a reasonable degree of medical certainty, Bendectin caused the birth defect seen in Kelly Havner’s hand.
However, Dr. Palmer’s testimony is based on epidemiological studies that conclude just the opposite. To the extent that he relied on the opinions of Drs. Swan, Glasser, Newman, or Gross, there is no scientifically reliable evidence to support their opinions, as we have seen. Palmer identified no other study or body of knowledge that would support his opinion, other than the chemical structure of Turpin, 959 F.2d at 1360. That court further observed that Dr. Palmer’s conclusions so overstated their predicate that it could not legitimately form the basis for a jury verdict. Id. We agree with that observation based on the record in this case.
* * * * * *
There is no scientifically reliable evidence to support the verdict in this case. Accordingly, we reverse the judgment of the court of appeals and render judgment for Merrell Dow.
BAKER, J., not sitting.
*731 GONZALEZ, Justice, concurring.
I join the Court’s opinion and judgment. I write separately to reiterate that the guidelines we established in E.I. du Pont de Nemours & Co. v. Robinson, 923 S.W.2d 549 (Tex.1995), are not limited to expert testimony based on a novel scientific theory.
In Robinson, we held that Burroughs Wellcome Co. v. Crye, 907 S.W.2d 497, 500 (Tex.1995) (Gonzalez, J., concurring).
Recently, the Court of Criminal Appeals addressed a similar attack on Kelly, that court’s equivalent of Robinson. In rejecting this argument, the court stated:
Nowhere in Kelly did we limit the two-pronged standard to novel scientific evidence. The [United States] Supreme Court in Daubert directly addressed the issue in a footnote, stating “[a]lthough the Frye decision itself focused exclusively on ‘novel’ scientific techniques, we do not read the requirements of Jordan v. State, 928 S.W.2d 550, 554 (Tex.Crim.App.1996)....
Robinson’s grasp what might be considered routine science.
The Havners attempted to prove causation primarily through expert testimony based on epidemiological and animal studies. These foundations are by no means novel. By applying the Robinson factors to Merrell Dow’s no-evidence challenge, the Court implicitly holds that Robinson applies to scientific expert testimony across the board. The trial *732 court must only determine whether the evidence is relevant and reliable. See Robinson, 923 S.W.2d at 556. It need not decide whether the evidence is also novel.
SPECTOR, Justice, concurring.
The Court today fails to heed its own warning that “the examination of a scientific study by a cadre of lawyers is not the same as its examination by others trained in the field of science or medicine.” 953 S.W.2d at 727 (internal citations omitted). I agree that the Havners’ expert witness testimony is not legally sufficient evidence of causation. However, as a judge, and not a scientist, I am uncomfortable with the majority’s ambitious scientific analysis and its unnecessarily expansive application of the Daubert standard. The majority’s opinion, replete with dicta, gives courts no practical guidance outside the context of Bendectin litigation. Accordingly, I concur only in the judgment of the Court.
ON MOTION FOR REHEARING
The motion for rehearing filed on behalf of the Havners is overruled. However, the tenor of that motion requires that we address the conduct of Respondents’ counsel.
This is not the first time in this case that the Havners’ counsel have engaged in less than exemplary conduct. Following the decision of the original panel of the court of appeals, which had reversed the judgment of the trial court and rendered judgment that the Havners take nothing, Robert C. Hilliard filed two briefs with the court of appeals which that court, sitting en banc, found to be “insulting, disrespectful, and unprofessional.” Id.
In assessing the appropriate response to the motion for rehearing that has now been filed by Hilliard and his co-counsel in this Court, we agree with another of our courts of appeals that recently found it necessary to address attacks on the integrity of that court:
A distinction must be drawn between respectful advocacy and judicial denigration. Although the former is entitled to a protected voice, the latter can only be condoned at the expense of the public’s confidence in the judicial process. Even were this court willing to tolerate the personal insult levied by [counsel], we are obligated to maintain the respect due this Court and the legal system we took an oath to serve.
Johnson v. Johnson, 948 S.W.2d 835, 840–41 (Tex.App.—San Antonio 1997, writ requested)1 (sanctioning counsel for disparaging remarks about the trial court and forwarding the court of appeals’ opinion to the Office of General Counsel, concluding that a substantial question had been raised about counsel’s honesty, trustworthiness, or fitness as a lawyer).
Courts possess inherent power to discipline an attorney’s behavior. “ ‘Courts of justice are universally acknowledged to be vested, by their very creation, with power to impose silence, respect, and decorum, in their presence.’ ” Johnson, 948 S.W.2d at 840–41.
The Disciplinary Rules governing the conduct of a lawyer provide:
*733 A lawyer should demonstrate respect for the legal system and for those who serve it, including judges, other lawyers and public officials. While it is a lawyer’s duty, when necessary, to challenge the rectitude of official action, it is also a lawyer’s duty to uphold legal process.
TEX. DISCIPLINARY R. PROF’L CONDUCT preamble ¶ 4, reprinted in TEX. GOV’T CODE, tit. 2, subtit. G app. A (Vernon Supp.1997) (TEX. STATE BAR R. art. X, § 9).
Rule 8.02(a) of the Disciplinary Rules specifically states:
A lawyer shall not make a statement that the lawyer knows to be false or with reckless disregard as to its truth or falsity concerning the qualifications or integrity of a judge, adjudicatory official or public legal officer, or of a candidate for election or appointment to judicial or legal office.
Id. Rule 8.02(a).
The Legislature has also provided a mechanism for courts to sanction counsel who file pleadings presented for an improper purpose or to harass. 10.005. In addition, one of the lawyers for the Havners, Barry Nace, is a non-resident attorney. His appearance in Texas courts is subject to the Rules Governing Admission to the Bar, including Rule XIX.
The specific portions of the “Respondents’ Motion for Rehearing” filed in this Court that raise particular concerns are the “Statement of the Case for Rehearing” (pages 1–5), the “Brief of the Argument” (pages 8, 14, and 16), and the “Prayer for Relief” (pages 19–20). Counsel for Respondents Robert C. Hilliard of the firm of Hilliard & Muñoz, Barry J. Nace of the firm of Paulson, Nace, Norwind & Sellinger, and Rebecca E. Hamilton of the firm of White, White & Hamilton, P.C., are hereby afforded the opportunity to respond as to why the Court should not
1) refer each of them to the appropriate disciplinary authorities;
2) prohibit attorney Nace from practicing in Texas courts; and
3) impose monetary penalties as sanctions.
Any response must be filed in this Court by 5:00 p.m., Monday, November 24, 1997.
Done at the City of Austin, this 13th day of November, 1997.
BAKER, J., not sitting.
Rule 702 provides:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify in the form of an opinion or otherwise.
TEX.R. CIV. EVID. 702.
The Bradford Hill criteria are summarized as follows:
1. Strength of association. “First upon my list I would put the strength of association. To take a very old example, by comparing the occupations of patients with scrotal cancer with the occupations of patients presenting with other diseases, Percival Pott could reach the correct conclusion because of the enormous increase of scrotal cancer in the chimney sweeps.”
2. Consistency. “Next on my list of features to be specifically considered I would place the consistency of association. Has it been repeatedly observed by different persons, in different places, circumstances and times?”
3. Specificity. “If ... the association is limited to specific workers and to particular sites and types of disease and there is no association between the work and other modes of dying, then clearly that is a strong argument in favor of causation.”
4. Temporality. “Which is the cart and which the horse?”
5. Biological gradient. “Fifthly, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence.... The clear dose-response curve admits of a simple explanation and obviously puts the case in a clearer light.”
6. Plausibility. “It would be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends on the biological knowledge of the day.”
7. Coherence. “The cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease.”
8. Experiment. “Occasionally it is possible to appeal to experimental ... evidence.... Here the strongest support for the causation hypothesis may be revealed.”
9. Analogy. “In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.”
Bernstein, supra, 71 N.C. L.REV. at 268–74.
See, e.g., Black & Lilienfeld, A Proposed Standard For Evaluating the Use of Epidemiological Evidence in Toxic Tort and other Personal Injury Cases, 28 HOW. L.J. 677, 691 (1985); see also Bailey et al., Reference Guide on Epidemiology, in REFERENCE MANUAL ON SCIENTIFIC EVIDENCE, supra, at 160–64.
These factors are:
(1) the extent to which the theory has been or can be tested;
(2) the extent to which the technique relies upon the subjective interpretation of the expert;
(3) whether the theory has been subjected to peer review and/or publication;
(4) the technique’s potential rate of error;
(5) whether the underlying theory or technique has been generally accepted as valid by the relevant scientific community; and
(6) the non-judicial uses which have been made of that theory or technique.
E.I. du Pont de Nemours & Co. v. Robinson, 923 S.W.2d 549, 557 (Tex.1995) (citation and footnote omitted).
An application for writ of error is pending in this Court, and we express no opinion on the merits of that appeal.