HUMAN HEALTH CASE STUDIES
Quality Assurance Methods for Monte Carlo Risk Analysis
Funding: National Institutes of Health (NIH)
Published: Ferson, S. 1996. What Monte Carlo methods cannot do. Human and Ecological Risk Assessment.
Ferson, S. 1996. Reliable calculation in probabilistic logic: accounting for small sample size and model uncertainty. Proceedings of Intelligent Systems: A Semiotic Perspective. National Institute for Standards and Technology, Gaithersburg, Maryland.
Authors: Scott Ferson
Although probabilistic risk assessments based on Monte Carlo simulation methods are now routinely used to forecast the public health consequences of various management and regulatory decisions regarding potential environmental toxicants, the reliability of these probabilistic assessments is rarely estimated, largely because the sensitivity studies such estimation would require are extremely cumbersome. We propose to test the feasibility of a direct approach to estimating reliability that is based on probability bounds (i.e., interval bounds on the cumulative distribution functions that model the risk of adverse consequences). These bounds can be constructed to contain model uncertainty comprehensively and representation error rigorously.
The probability bounds approach can be used to redress some of the most serious criticisms commonly leveled against Monte Carlo assessments, including (1) input distributions are unknown, (2) correlations and dependencies among variables are ignored, and (3) mathematical structure of the model is questionable. To establish feasibility, we will conduct case studies that illustrate its use, establish its data requirements, conservatism and workability, derive optimal formulas for use with some common mathematical operations, and explore how empirical information can be used in practice to tighten the bounds. The probability bounds approach is expected to be vastly easier to use than current second-order Monte Carlo methods.
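The dependency problem (criticism 2) can be illustrated with the classical Fréchet bounds, which give best-possible limits on the probability of a conjunction or disjunction of two events when nothing whatever is assumed about their dependence. A minimal sketch (function names are ours, for illustration only):

```python
def frechet_and(p_a, p_b):
    """Best-possible bounds on P(A and B) when the dependence
    between events A and B is completely unknown."""
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)

def frechet_or(p_a, p_b):
    """Best-possible bounds on P(A or B) under unknown dependence."""
    return max(p_a, p_b), min(1.0, p_a + p_b)
```

For example, with P(A) = 0.7 and P(B) = 0.6, P(A and B) is guaranteed to lie in [0.3, 0.6] no matter how the events are correlated; probability bounds analysis applies the same idea to entire distribution functions rather than single probabilities.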
Detecting Disease Clusters in Structured Environments
Funding: National Institutes of Health (NIH) and by New York State Science and Technology Foundation
Published: Ferson, S. 1996. Reliable calculation in probabilistic logic: accounting for small sample size and model uncertainty. Proceedings of Intelligent Systems: A Semiotic Perspective. National Institute for Standards and Technology, Gaithersburg, Maryland.
Authors: Scott Ferson
One of the difficulties faced by health professionals in detecting disease clusters is that the data sets are often small, so inferences must be based on a relative handful of observations. It is crucial for the health professional to know which statistical tests are best in these small-environment problems, and to have these methods available in a user-friendly computer package. A variety of new, rapid, and exact combinatorial expressions for cluster analysis of patterns of disease have been proposed. Investigations into the statistical power of both these and other previously published methods for cluster detection in structured small environments will be used to recommend different tests for different kinds of problems and different amounts of data.
An interactive program called EPIC (Exact Probabilities for Incidence Clustering) that includes an intuitive interface and a thorough set of tutorials and guidelines will help the professional choose the best statistical test for a particular problem. EPIC will allow the health professional to investigate allegations of disease clustering within small, structured environments, such as families, sibships, wards, classrooms, cell blocks, job types, age classes or locations within a building. When sample sizes are small (as is almost always the case in real circumstances of public health concern), exact statistical methods are necessary, since the approximate methods usually used only yield accurate estimates when data sets are large.
Exact methods guarantee that Type I error can be controlled to any desired level. Although a few exact methods based on matrix occupancy models have previously been described for data sets with perfectly regular structure, no exact methods were applicable when, for instance, families were of different sizes. EPIC will provide, for the first time, general exact statistical methods for use with small data sets in structured environments. It will allow public health professionals in the research and regulatory communities access to these new methods in a flexible and powerful microcomputer implementation.
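To make the exact approach concrete, the following sketch computes an exact p-value for within-family clustering by complete enumeration of case assignments. Unlike the matrix occupancy results mentioned above, it handles families of unequal sizes, though brute-force enumeration is only practical for very small data sets (the combinatorial expressions EPIC would use avoid this). Names are illustrative, not EPIC's:

```python
from itertools import combinations

def exact_cluster_pvalue(family_sizes, n_cases, observed_max):
    """Exact p-value for within-family disease clustering.

    Assigns n_cases cases at random (without replacement) to the
    individuals in families of the given sizes, and returns the exact
    probability that some family contains at least observed_max cases.
    Complete enumeration, so only practical for very small data sets."""
    # label each individual with the index of its family
    individuals = [fam for fam, size in enumerate(family_sizes)
                   for _ in range(size)]
    total = extreme = 0
    for assignment in combinations(range(len(individuals)), n_cases):
        counts = {}
        for person in assignment:
            fam = individuals[person]
            counts[fam] = counts.get(fam, 0) + 1
        total += 1
        if max(counts.values()) >= observed_max:
            extreme += 1
    return extreme / total
```

For instance, with two families of two individuals each and two cases, the exact probability that both cases fall in the same family is 1/3.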
Spatial clustering of childhood cancers in the Denver Metroplex
Funding: Radian Corporation, with funding from Electric Power Research Institute (EPRI)
Authors: Scott Ferson
Location of study: Denver
Although we could detect no spatial clustering of childhood cancers at the finest resolution of individual cases and controls, analysis with aggregated data using census information detected statistically significant spatial clustering. The intensity (and significance) of the spatial clustering was even stronger at the level of entire cities in the Denver metroplex. The principal finding is that, when the locations of childhood cancers are aggregated into area/frequency data, statistical tests reveal significant spatial clustering.
This conclusion is robust in the sense that it is independent of many details of the analysis and the data and seems to persist over spatial resolutions ranging from the scale of a city to that of a census tract. This confirms the finding that there is strong spatial inhomogeneity in childhood cancer incidence across the region. The inhomogeneity cannot be removed simply by accounting for changes in the population density of children. This observation of clustering of childhood cancers has several important ramifications for the study of this cancer's phenomenology. The findings call into question many of the other statistical procedures already conducted on this data set that have assumed spatial homogeneity of disease etiology or incidence. Although often unstated, such an assumption is very common and probably affects many if not the majority of the most prominent conclusions that have been made about the Denver cancer data.
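One standard statistic for aggregated area/frequency data of this kind is Moran's I, which measures whether neighbouring areas tend to have similar incidence; positive values indicate spatial clustering. A minimal sketch (not the exact procedure used in this study):

```python
def morans_i(values, weights):
    """Moran's I spatial autocorrelation for areal data.
    values: incidence count or rate for each area.
    weights: symmetric matrix with weights[i][j] > 0 when areas
    i and j are neighbours (diagonal zero)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)
```

For counts [1, 1, 5, 5] on a chain of four adjacent areas, the statistic is positive (about 0.33), reflecting that similar values sit next to one another.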
Stochastic cell proliferation models of carcinogenesis: Microcomputer software for research and risk analysis
Funding: National Institutes of Health (NIH) and the Chemical Industry Institute of Toxicology (CIIT)
Authors: Scott Ferson
Quantitative models of carcinogenesis that are based on stochastic proliferation of cell types have been suggested by several researchers. For the most part, these models exist only as equations in articles, or, in a few cases, as specialized and non-portable software implementations. The consequence of this is that the models are not accessible to a broad audience of biologists who are impeded by the rigorous mathematics of the theoretical treatment or the awkwardness or unavailability of software.
We propose to develop easy-to-use microcomputer software in which a variety of important models of carcinogenesis are implemented in a generalized package. From this platform researchers will be able to increase their intuition by exploring the consequences of the various assumptions made by the disparate models, as well as construct new models from the building blocks of basic assumptions about model structure, cell kinetics and mutation rates. Empirical toxicologists who possess data on dosage-dependent cancer incidence will also be able to use the software to estimate the risk or probability of tumorigenesis under any of the competing models. This software will permit users to develop their intuition about the competing models of carcinogenesis, and make projections about the immediate hazards of carcinogenesis from a particular substance in a particular system.
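As a sketch of the kind of model such a package would implement, the following Monte Carlo routine simulates a simple two-stage clonal-expansion model (normal → initiated → malignant cells) and estimates the probability that at least one malignant cell arises by a given time. All parameter values and names here are illustrative assumptions, not a calibrated model from the literature:

```python
import random

def tumour_probability(n_normal, mu1, birth, death, mu2,
                       steps, trials, seed=0):
    """Monte Carlo estimate of the probability that at least one
    malignant cell arises within `steps` time steps under a simple
    two-stage clonal-expansion model. Each step: every normal cell
    mutates to the initiated state with probability mu1; every
    initiated cell becomes malignant (mu2), divides (birth), dies
    (death), or simply persists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        initiated = 0
        malignant = False
        for _ in range(steps):
            # mutations out of a (large, constant) normal-cell pool
            initiated += sum(1 for _ in range(n_normal)
                             if rng.random() < mu1)
            survivors = 0
            for _ in range(initiated):
                u = rng.random()
                if u < mu2:
                    malignant = True          # second mutation: tumour
                elif u < mu2 + birth:
                    survivors += 2            # division into two daughters
                elif u < mu2 + birth + death:
                    pass                      # cell death
                else:
                    survivors += 1            # cell persists
            initiated = survivors
            if malignant:
                break
        if malignant:
            hits += 1
    return hits / trials
```

Varying the kinetic parameters shows how the competing assumptions (clonal expansion rate, first- and second-stage mutation rates) shape the predicted dose-response behaviour.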
Detecting sites at risk of becoming foci for Lyme disease
Funding: National Institutes of Health (NIH)
Authors: Jeffrey Millstein
This research project aims to develop a computer model for the detection of areas at high risk of becoming foci for Lyme disease. This type of problem is typically attacked by applying statistical analysis to empirical data.
My approach is to implement fuzzy logic inference procedures as computer software so that these techniques can be applied to a public health problem for which the relationships between vectors and their habitats are not yet clearly known. The computer model uses fuzzy logic to make inferences from a rule base. Rules are constructed using relationships between variables which are described by adjectives. Adjectives can be modified by adverbs, and complex rules can be formed through conjunction. Adjectives are described graphically. The proposed computer model will require a minimal set of information about a habitat's characteristics, such as the type and extent of covering vegetation, the local climate, and the pool of hosts and reservoirs of Ixodes dammini, the tick vector of Borrelia burgdorferi, the biological agent of Lyme borreliosis. From these data the fuzzy logic inference algorithm will assess the likelihood that the specified area can support the development of Lyme disease foci.
The objective of the Phase I research was to "(1) develop a computer-based platform for users to construct set-based qualitative models, and (2) customize this program to analyze data for the detection of areas at high risk for becoming foci for Lyme disease." The premise for constructing this program was to develop a novel kind of approach for rating parcels of land for the risk of becoming foci for Lyme disease. This type of problem is typically attacked by applying statistical analysis to empirical data. My approach was to implement recently developed qualitative procedures as computer software so that these techniques could be applied to a problem for which the relationships between vectors and their habitats are not yet clearly known. As stated by the CDC (1989), "Data concerning risk factors for acquiring Lyme disease are limited." Thus, the overall goal was to develop a tool for analyzing the data which are available in order to develop a more precise picture of the factors which permit foci of Lyme disease to develop.
The mathematical techniques that I proposed to utilize are commonly referred to as fuzzy logic inference. At the time, the use of fuzzy logic to solve biological problems was extremely limited and no computing platform existed. Almost exclusively, fuzzy logic techniques have been limited to use in engineering control systems. I saw an opportunity to apply these techniques to a wide class of problems of public health importance such as how to efficiently assess the risk that a particular habitat can support arthropod vectors of human diseases, such as the Ixodes dammini - Borrelia burgdorferi system.
The three-month Phase I research period allowed me to develop and test a compact fuzzy logic inference engine. This module uses fuzzy associative memory architecture and works by having users specify variables, adjectives, and a rule base. Adjectives are described graphically. The rule base has its own syntax and supports the modification of adjectives using adverbs or complementation. In addition, two types of fuzzy inference are supported; these are correlation-minimum and correlation-product inference. The system can generate syntactically correct C code for the variable definitions, the rule base and for the adjective set. This code will be used in the final stand-alone system. The inference module has been connected with crude data input and output routines for testing the general idea of rating habitat to determine the likelihood that Lyme disease foci could develop. Although a significant amount of progress was made, more work is necessary before the system is ready for commercial production.
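The correlation-minimum inference the module supports can be sketched as follows: each rule's output set is clipped at the rule's firing strength, the clipped sets are combined by maximum, and the aggregate is defuzzified by centroid. Membership shapes and names here are illustrative, not the module's actual code:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def infer(x, rules, out_grid):
    """Correlation-minimum inference for a one-input rule base.
    Each rule is a pair (input_membership, output_membership); the
    output set is clipped at the rule's firing strength (the min),
    clipped sets are combined by max, and the aggregate is
    defuzzified by centroid over out_grid."""
    agg = [max(min(mf_in(x), mf_out(y)) for mf_in, mf_out in rules)
           for y in out_grid]
    total = sum(agg)
    if total == 0:
        return None  # no rule fired
    return sum(y * m for y, m in zip(out_grid, agg)) / total
```

With two hypothetical rules, say "if habitat suitability is low then risk is low" and "if suitability is high then risk is high", an input of low suitability defuzzifies to a low risk rating, and vice versa.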
Software Environment for Fuzzy Arithmetic
Funding: Deutscher Akademischer Austauschdienst (DAAD)
Published: Ferson, S. 1990. Ecological and environmental risk analysis: using computers to estimate impacts and uncertainties. CPSR Newsletter 8:25-28.
Ferson, S. and R. Kuhn. 1992. Propagating uncertainty in ecological risk analysis using interval and fuzzy arithmetic. Computer Techniques in Environmental Studies IV, P. Zannetti (ed.), pp. 387-401, Elsevier Applied Science, London.
Ferson, S. 1993. Using fuzzy arithmetic in Monte Carlo simulation of fishery populations. Proceedings of the International Symposium on Management Strategies for Exploited Fish Populations. Alaska Sea Grant College Program Report No. 93-02, University of Alaska, Fairbanks.
Ferson, S. and R. Kuhn. 1994. Interactive microcomputer software for fuzzy arithmetic. Proceedings of the High Consequence Operations Safety Symposium. J.A. Cooper (ed.), Sandia National Laboratories, SAND94-2364, pp. 493-506. Albuquerque, New Mexico.
Authors: Ruediger Kuhn, Scott Ferson
There are uncertainty propagation problems that are poorly suited to traditional methods such as Monte Carlo simulation because of extremely sparse data sets, ignorance about correlations among variables, or graded definitions of important quantities (how many children have high body burdens of an environmental contaminant depends on what you consider "high"). Fuzzy arithmetic was developed for such situations as a way to make rigorous calculations without requiring subjective decisions. The Risk Calc software was developed as part of this project.
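The core idea of fuzzy arithmetic is level-wise interval arithmetic: a fuzzy number is represented by its alpha-cuts (nested intervals), and an operation on fuzzy numbers is carried out as interval arithmetic at each membership level. A minimal sketch for adding triangular fuzzy numbers (illustrative only, not Risk Calc's implementation):

```python
def alpha_cuts(a, b, c, levels):
    """Alpha-cuts of the triangular fuzzy number (a, b, c): at level
    alpha the cut is the interval of values whose membership is at
    least alpha."""
    return [(a + alpha * (b - a), c - alpha * (c - b))
            for alpha in levels]

def fuzzy_add(cuts_x, cuts_y):
    """Level-wise addition: the alpha-cut of x + y is the interval
    sum of the alpha-cuts of x and y at the same level."""
    return [(lx + ly, ux + uy)
            for (lx, ux), (ly, uy) in zip(cuts_x, cuts_y)]
```

Adding the triangular numbers (1, 2, 3) and (2, 3, 4) at levels 0, 0.5 and 1 gives the cuts (3, 7), (4, 6) and (5, 5), i.e. the triangular result (3, 5, 7); the same scheme extends to the other arithmetic operations.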
Cluster Analysis in Space and Time
Funding: National Institutes of Health (NIH) and the Electric Power Research Institute (EPRI)
Published: Jacquez, G.M. and L.I. Kheifets. 1993. Synthetic cancer variables and the construction and testing of synthetic risk maps. Statistics in Medicine 12: 1931-1942.
Applied Biomathematics. 1993.
Oden, N. 1995. Adjusting Moran's I for population density. Statistics in Medicine 14: 17-26.
Oden, N. and G. Jacquez. 1996. Realistic power simulations compare point- and area-based disease cluster tests. Statistics in Medicine 15: 783-806.
Authors: N. Oden, G. Jacquez, L.I. Kheifets
Location of study: southern California
Diseases such as cancer are often clumped together in terms of their incidence in time or their distribution across space in ways that suggest a common environmental cause or a particular etiology or contagion process. However, humans are likely to perceive clusters even in purely randomly distributed data. The statistical problem is to determine whether there exists an excess of disease incidence--a cluster--above what might be expected by chance alone.
Some diseases may be clustered in time, so that most cases occur at a particular time of year. Other diseases, like those caused by a point release of toxic chemicals, may be clustered in space, so that most cases occur in the same place. There may also be space-time interaction, like an epidemic wave, so that pairs of cases are close both in time and space.
Statistical analysis can detect whether cases are clustered in space, in time, or whether there is space-time interaction. Useful statistical analyses include location/date methods such as Mantel's test, Knox test, Cuzick and Edwards case-control spatial clustering test, nearest neighbor statistics, and others. Complementary analyses include area/time-interval methods such as Dat's 0-1 matrix test, the Scan Test, Moran's I and Moran's I adjusted for population size, the empty cells test for rare events, Grimson's proximity test for binary events, Larsen's unimodal clustering test and the Ederer-Myers-Mantel test. Software is needed that brings all of these tests to the health professional in a convenient environment that also supports displaying of disease incidence information in multiple dimensions.
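As an example of the location/date methods above, the Knox test counts pairs of cases that are close in both space and time, and assesses significance by permuting the case dates, which breaks any space-time interaction while preserving the marginal spatial and temporal patterns. Cut-off values and names below are illustrative:

```python
import math
import random
from itertools import combinations

def knox_test(cases, space_cut, time_cut, permutations=999, seed=0):
    """Knox test for space-time interaction. `cases` is a list of
    (x, y, t) tuples; a pair of cases counts toward the statistic if
    the two are within `space_cut` in distance and `time_cut` in time.
    The p-value comes from randomly permuting the case times."""
    rng = random.Random(seed)
    xy = [(x, y) for x, y, _ in cases]
    times = [t for _, _, t in cases]
    pairs = list(combinations(range(len(cases)), 2))

    def stat(ts):
        return sum(1 for i, j in pairs
                   if math.dist(xy[i], xy[j]) <= space_cut
                   and abs(ts[i] - ts[j]) <= time_cut)

    observed = stat(times)
    exceed = sum(1 for _ in range(permutations)
                 if stat(rng.sample(times, len(times))) >= observed)
    return observed, (exceed + 1) / (permutations + 1)
```

Two clusters of cases that are each tight in both space and time yield a large observed pair count and a small permutation p-value, signalling space-time interaction.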