Biometrika Advance Access originally published online on November 19, 2007
Biometrika 2007 94(4):769-786; doi:10.1093/biomet/asm061
© 2007 Biometrika Trust

Articles

Bayesian Nonparametric Estimation of the Probability of Discovering New Species

Antonio Lijoi

Dipartimento di Economia Politica e Metodi Quantitativi, Università degli Studi di Pavia, 27100 Pavia, Italy lijoi{at}unipv.it

Ramsés H. Mena

Departamento de Probabilidad y Estadística, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México, 04510 México D.F., Mexico ramses{at}sigma.iimas.unam.mx

Igor Prünster

Dipartimento di Statistica e Matematica Applicata, Università degli Studi di Torino, 10122 Torino, Italy igor{at}econ.unito.it

Received for publication 1 June 2006. Revision received 1 February 2007.

We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach: the species proportions are assumed to be random and the observations from the population are assumed to be exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species, which can be compared with well-known frequentist estimators. The results are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments.
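The abstract does not state the estimator itself, so the following is only an illustrative sketch of the simplest case of the comparison it describes: the probability that the very next observation belongs to a new species. It is not the authors' estimator, which concerns a whole additional sample. The frequentist side uses Turing's estimator (the proportion of singletons in the sample); the Bayesian side uses the standard one-step predictive rule of the two-parameter Poisson-Dirichlet process, one well-known Gibbs-type prior. The function names, the toy sample, and the parameter values theta and sigma are assumptions made for illustration.

```python
from collections import Counter


def turing_new_species_prob(labels):
    """Turing's estimator: number of species seen exactly once, divided by n."""
    n = len(labels)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def pd_new_species_prob(n, k, theta, sigma):
    """One-step predictive probability of a new species under a two-parameter
    Poisson-Dirichlet prior, given k distinct species in a sample of size n:
    (theta + sigma * k) / (theta + n).
    Setting sigma = 0 gives the Dirichlet process case theta / (theta + n).
    """
    if not 0 <= sigma < 1:
        raise ValueError("sigma must lie in [0, 1)")
    if theta <= -sigma:
        raise ValueError("theta must exceed -sigma")
    return (theta + sigma * k) / (theta + n)


# Hypothetical basic sample: 7 units, 4 distinct species, 2 singletons.
sample = ["a", "a", "b", "c", "c", "c", "d"]
n, k = len(sample), len(set(sample))

print(turing_new_species_prob(sample))                 # frequentist estimate
print(pd_new_species_prob(n, k, theta=1.0, sigma=0.5)) # Bayesian, sigma > 0
print(pd_new_species_prob(n, k, theta=1.0, sigma=0.0)) # Dirichlet process
```

Note that the Turing estimate depends on the sample only through the number of singletons, whereas the predictive rule above depends on the sample size n and the number of distinct species k; with sigma = 0 it ignores k entirely.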

Key Words: Bayesian nonparametrics • Gibbs-type random partition • Posterior probability of discovering a new species • Sample coverage • Species sampling



