Biometrika Advance Access originally published online on November 19, 2007
Biometrika 2007 94(4):769-786; doi:10.1093/biomet/asm061
Bayesian Nonparametric Estimation of the Probability of Discovering New Species
Dipartimento di Economia Politica e Metodi Quantitativi, Università degli Studi di Pavia, 27100 Pavia, Italy lijoi@unipv.it
Departamento de Probabilidad y Estadística, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México, 04510 México D.F., Mexico ramses@sigma.iimas.unam.mx
Dipartimento di Statistica e Matematica Applicata, Università degli Studi di Torino, 10122 Torino, Italy igor@econ.unito.it
Received for publication 1 June 2006. Revision received 1 February 2007.
We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We adopt a Bayesian nonparametric approach: the species proportions are assumed to be random and the observations from the population exchangeable. We derive a Bayesian estimator, under quadratic loss, of the probability of discovering new species, which can be compared with well-known frequentist estimators. The results are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments.
Key words: Bayesian nonparametrics; Gibbs-type random partition; Posterior probability of discovering a new species; Sample coverage; Species sampling
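To make the comparison between a Bayesian and a frequentist estimate concrete, here is a minimal sketch. It is not the estimator derived in the article, which covers general Gibbs-type priors. Instead it assumes a Pitman-Yor (two-parameter Poisson-Dirichlet) prior with discount sigma and strength theta. This prior is one member of the Gibbs-type family, and its predictive probability that the next observation is a new species has the closed form (theta + k*sigma)/(theta + n), given k distinct species among n observations. Under quadratic loss the Bayes estimate is the posterior mean, and for a single further draw this mean coincides with that predictive probability. For comparison, the sketch also computes the frequentist Good-Turing estimate, which is the proportion of singletons in the sample. The function names, the toy sample, and the values theta = 1, sigma = 0.5 are hypothetical and were chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def pitman_yor_new_species_prob(n: int, k: int, theta: float, sigma: float) -> float:
    """P(observation n+1 is a new species | k distinct species in n draws).

    Assumes a Pitman-Yor prior with 0 <= sigma < 1 and theta > -sigma;
    sigma = 0 recovers the Dirichlet process.
    """
    if not (0.0 <= sigma < 1.0 and theta > -sigma):
        raise ValueError("require 0 <= sigma < 1 and theta > -sigma")
    if n < 1 or not (1 <= k <= n):
        raise ValueError("require n >= 1 and 1 <= k <= n")
    return (theta + k * sigma) / (theta + n)


def pitman_yor_expected_new(n: int, k: int, m: int, theta: float, sigma: float) -> float:
    """Expected number of new species among m further draws (Pitman-Yor prior).

    The new-species probability is linear in the current number of species,
    so E[K_{t+1}] = E[K_t] + (theta + sigma * E[K_t]) / (theta + t) is exact.
    """
    pitman_yor_new_species_prob(n, k, theta, sigma)  # reuse the argument checks
    if m < 0:
        raise ValueError("require m >= 0")
    expected_k = float(k)
    for t in range(n, n + m):
        expected_k += (theta + sigma * expected_k) / (theta + t)
    return expected_k - k


def good_turing_new_species_prob(sample: Sequence[Hashable]) -> float:
    """Good-Turing estimate: number of species seen exactly once, divided by n."""
    if len(sample) == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


if __name__ == "__main__":
    sample = ["a", "a", "b", "c", "c", "c", "d"]  # hypothetical toy data
    n, k = len(sample), len(set(sample))
    print("Good-Turing:", good_turing_new_species_prob(sample))
    print("Pitman-Yor (theta=1, sigma=0.5):", pitman_yor_new_species_prob(n, k, 1.0, 0.5))
    print("Expected new species in 10 more draws:", pitman_yor_expected_new(n, k, 10, 1.0, 0.5))
```

With sigma = 0 the prior reduces to the Dirichlet process, and the new-species probability theta/(theta + n) no longer depends on the number k of observed species. A positive sigma lets k inform the estimate.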