Applied Asymptotics: Case Studies in Small-Sample Statistics (PDF)

The first term goes to zero a.s.

It should be pointed out here that the conclusion of this example can be derived by more direct means and without using the uniform strong law. Example 3. However, the asymptotic distribution of r is not trivial to derive. The tool required is known as the delta theorem, one of the most useful theorems in asymptotic theory.

The delta theorem specifies the asymptotic distribution of a smooth transformation g(Tn) when the asymptotic distribution of Tn is known. We present it in a later section of this chapter. A reference for this entire section is Kesten. The condition on F is as we have stated it in Theorem 3. Suppose the ensemble average is some number c.

Now suppose that you start with one individual x0 and iteratively change the location from x0 to x1 to x2, etc. Consider now the time average of these function values. For well-behaved functions and certain types of transformations T, the ensemble average equals the time average of one individual.

Repeated applications of T mix things up so well that, roughly speaking, the time average for one individual equals the ensemble average after infinite time has passed. The ergodic theorem is immensely useful in probability and statistics. We need some definitions to give a statement of it.

Definition 3. T is called measure-preserving if P(T^{-1}A) = P(A) for every measurable set A; a measurable set A is called invariant if T^{-1}A = A. Theorem 3. See Breiman for a rigorous definition. Roughly speaking, a stationary process is ergodic if the long-run behavior of its paths is not affected by the initial conditions.
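To make the "time average equals ensemble average" idea concrete, here is a small simulation sketch (illustrative only; the stationary AR(1) model, the function f(x) = x^2, and all parameter values are our own choices, not from the text):

import numpy as np

rng = np.random.default_rng(0)

def ar1_path(n, phi=0.6, sigma=1.0):
    """Simulate a stationary, ergodic AR(1) path X_t = phi * X_{t-1} + eps_t."""
    x = np.empty(n)
    # start from the stationary N(0, sigma^2 / (1 - phi^2)) distribution
    x[0] = rng.normal(scale=sigma / np.sqrt(1.0 - phi ** 2))
    eps = rng.normal(scale=sigma, size=n - 1)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t - 1]
    return x

f = np.square  # the function whose time and ensemble averages we compare

# time average of f along one long path of a single "individual"
time_avg = f(ar1_path(200_000)).mean()

# ensemble average of f at a fixed time across many independent individuals
ensemble_avg = np.mean([f(ar1_path(200)[-1]) for _ in range(5_000)])

# both should be close to Var(X_t) = sigma^2 / (1 - phi^2) = 1.5625
print(time_avg, ensemble_avg)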

One useful corollary of the ergodic theorem for statisticians is the following result. The delta theorem says how to approximate the distribution of a transformation of a statistic in large samples if we can approximate the distribution of the statistic itself. Serfling, Lehmann, and Bickel and Doksum can be consulted for more examples and implications of the delta theorem in addition to the material that we present below.
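In its standard univariate form (stated here for reference; the multivariate version is analogous), the delta theorem reads:

$$\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N\bigl(0, \sigma^2(\theta)\bigr)
\quad \Longrightarrow \quad
\sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d} N\bigl(0, [g'(\theta)]^2\, \sigma^2(\theta)\bigr),$$

provided g is differentiable at theta with g'(theta) not equal to zero.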

In fact, a tedious calculation also shows that the joint limiting distribution of any finite number of central sample moments is multivariate normal. See Serfling for explicit formulas for the covariance matrix of that limiting distribution. It is not possible to write a clean formula for v in general. The fixed-sample density of rn in the bivariate normal case can be written as an infinite series or in terms of hypergeometric functions (see Tong). The same method also produces approximations, with error bounds, for the moments of g(Tn).

The order of the error can be made smaller the more moments Tn has. To keep the notation simple, we give approximations to the mean and variance of a function g(Tn) below when Tn is a sample mean. See Bickel and Doksum for proofs and more information. Let g be a scalar function with four uniformly bounded derivatives. The variance approximation below is simply what the delta theorem says. With more derivatives of g that are uniformly bounded, higher-order approximations can be given.
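In their standard form, with E(X_1) = mu and Var(X_1) = sigma^2, these two-term Taylor approximations read:

$$E\,g(\bar{X}_n) \approx g(\mu) + \frac{g''(\mu)\,\sigma^2}{2n},
\qquad
\mathrm{Var}\,g(\bar{X}_n) \approx \frac{[g'(\mu)]^2\,\sigma^2}{n},$$

with errors of order n^{-2} under the stated conditions on g.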

We apply Theorem 3. The delta theorem produces it more easily than the direct calculation. The direct calculation is enlightening, however. Exercise 3. Someone having the entire set of n pictures can cash them in for money. Let Wn be the minimum number of cereal boxes one would need to purchase to own a complete set of the pictures (a simulation sketch follows).
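A quick empirical check (a simulation sketch; it leans on the classical coupon-collector fact that E(Wn) = n(1 + 1/2 + ... + 1/n), which is what the hint below asks you to approximate):

import numpy as np

rng = np.random.default_rng(1)

def boxes_needed(n):
    """Buy boxes until all n distinct pictures have been collected."""
    seen, count = set(), 0
    while len(seen) < n:
        seen.add(int(rng.integers(n)))  # each box holds one uniformly random picture
        count += 1
    return count

n = 100
draws = [boxes_needed(n) for _ in range(2_000)]
harmonic = sum(1.0 / k for k in range(1, n + 1))
# simulated mean versus n * H_n (about 518.7 for n = 100)
print(np.mean(draws), n * harmonic)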

Hint: Approximate the mean of Wn. Is the convergence almost sure as well? Show that Exercise 3.

References

Bickel, P.
Birkhoff, G., Proof of the ergodic theorem, Proc. Natl. Acad. Sci. USA, 17, —.
Chung, K., A Course in Probability Theory, 3rd ed.

Feller, W.
Kesten, H., Sums of independent random variables—without moment conditions, the Rietz Lecture, Ann.
Lehmann, E.
Maller, R., A note on domains of partial attraction, Ann.
Revesz, P.
Schneider, I.
Sen, P.
Strassen, V., An invariance principle for the law of the iterated logarithm, Z.; A converse to the law of the iterated logarithm, Z.
Tong, Y.

Chapter 4 Transformations

A principal use of parametric asymptotic theory is to construct asymptotically correct confidence intervals.

A number of approximations have been made in using this interval. The plug-in standard deviation estimate is quite often an underestimate of the true standard deviation. Transformations of the first type are known as variance-stabilizing transformations (VSTs), those of the second type are known as symmetrizing transformations (STs), and those of the third type are known as bias-corrected transformations (BCTs). Ideally, we would like to find one transformation that achieves all three goals.

However, usually no transformation can achieve even two of the three goals simultaneously. There is an inherent tension between the three goals. One can often achieve more than one goal through iterations; e.g., first obtain a VST and then find a bias-corrected adjustment of it, or first obtain an ST and then find a bias-corrected adjustment of that. There is an enormous body of literature on transformations, going back to the first half of the twentieth century.

We will limit ourselves to a selection of this literature. The leading early literature includes Curtiss, Bartlett, Anscombe, Hotelling, and Fisher. More recent key references include Bickel and Doksum, Efron, and Hall. The material in this chapter is also taken from Brown, Cai, and DasGupta. Unfortunately, the concept does not generalize to multiparameter cases, i.e., to problems with more than one unknown parameter. It is, however, a useful tool in one-parameter problems.
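In symbols (a standard sketch, consistent with the surrounding description): if sqrt(n)(Tn - theta) converges in distribution to N(0, sigma^2(theta)) with sigma(.) positive and continuous, one takes

$$g(\theta) = \int \frac{d\theta}{\sigma(\theta)},$$

so that, by the delta theorem, sqrt(n)(g(Tn) - g(theta)) converges in distribution to N(0, 1), an asymptotic variance free of theta.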

In the above, the integral is to be interpreted as a primitive. Such a statistic or transformation of Tn is called a variance-stabilizing transformation.

Note that the transformation is monotone. As long as there is an analytical formula for the asymptotic variance function in the limiting normal distribution for Tn, and as long as the reciprocal of its square root can be integrated in closed form, a VST can be written down. For one-parameter problems, it is a fairly general tool.

It is important to remember, however, that the VST g(Tn) may have large biases and its variance may not actually be nearly constant unless n is adequately large. We come to these issues later. First, we work out some examples of VSTs and show how they are used to construct asymptotically correct confidence intervals for an original parameter of interest. The arctanh transformation of rn attains normality much more quickly than rn itself.

Example 4. This example has a feature that was not shared by the preceding examples: there is an invariance structure in this example. In problems with such an invariance structure, there is no clear need for using VSTs to obtain confidence intervals, even though they are available. Suppose we have iid observations X1, X2, .... Intuitively, correcting the VST for its bias, or correcting it to make its variance more nearly constant, should lead to better inference, an idea that goes back to Anscombe. We know that the transformation arcsin(sqrt(X̄n)) is a variance-stabilizing transformation in this case.
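Here is a minimal sketch of how a VST-based confidence interval is computed in the binomial case (illustrative code; the function name and the specific numbers are ours, and the interval simply inverts the monotone map p -> arcsin(sqrt(p)) using the asymptotic variance 1/(4n)):

import numpy as np
from scipy import stats

def arcsine_ci(x, n, level=0.95):
    """VST-based confidence interval for a binomial proportion p."""
    z = stats.norm.ppf(0.5 + level / 2.0)
    center = np.arcsin(np.sqrt(x / n))   # g(p-hat) with g(p) = arcsin(sqrt(p))
    half = z / (2.0 * np.sqrt(n))        # asymptotic sd of g(p-hat) is 1/(2 sqrt(n))
    inv = lambda t: np.sin(np.clip(t, 0.0, np.pi / 2.0)) ** 2  # invert g
    return inv(center - half), inv(center + half)

print(arcsine_ci(x=37, n=100))  # roughly (0.28, 0.47)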

The traditional VST does not have the second-order variance stabilization property, but the new one suggested by Anscombe does. The traditional VST does not, once again, have the second-order variance stabilization property, but the one suggested by Anscombe does. However, there are no sets of constants that can simultaneously achieve second-order bias and variance correction. The method that Anscombe used, namely perturbing by a constant, is usually not productive.
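For reference, the corrections Anscombe proposed are, as usually stated in the literature: for X ~ Bin(n, p) and for a Poisson count X,

$$\arcsin\sqrt{\frac{X + 3/8}{n + 3/4}}
\qquad \text{and} \qquad
\sqrt{X + 3/8},$$

in place of the traditional arcsin(sqrt(X/n)) and sqrt(X), respectively.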

A more natural method of bias correction is to use the following method, which we simply outline. They are derived in Brown, Cai, and DasGupta. These are used to find second-order bias-corrected and variance-stabilized transforms. We mention only the bias-correction result. Theorem 4. As Example 4. However, it has a second-order bias, i.e., a bias of order 1/n. To use Theorem 4.

From here, a straightforward application of Theorem 4. We can conduct the test by using z(r). Therefore, one can find the actual type I error probability of each test and compare it with the nominal level as a measure of the level accuracy of the tests.
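A minimal simulation sketch of such a level-accuracy computation for the test based on z(r) = arctanh(r), using the classical approximation that arctanh(r) is roughly normal with variance 1/(n - 3) under bivariate normality (the sample size, rho0, and replication count are illustrative choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def type1_error(rho0=0.5, n=20, alpha=0.05, reps=20_000):
    """Monte Carlo type I error of the two-sided arctanh test of H0: rho = rho0."""
    cov = np.array([[1.0, rho0], [rho0, 1.0]])
    crit = stats.norm.ppf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
        z = np.sqrt(n - 3.0) * (np.arctanh(r) - np.arctanh(rho0))
        rejections += abs(z) > crit
    return rejections / reps

print(type1_error())  # compare with the nominal level 0.05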

The bias-corrected VST offers some improvement in level accuracy. On using Theorem 4. The type I error rates can be found by using noncentral chi-square probabilities (Brown, Cai, and DasGupta). The test based on the VST has a serious level inaccuracy, and it is striking how much improvement the bias-corrected version provides. Bias correction is a good general principle. The seriousness of the omission depends on the amount of skewness truly present in the finite-sample distribution.

We make it precise below. One feature of symmetrizing transforms is that they are not variance-stabilizing. The two goals are intrinsically contradictory. Also, symmetrizing transforms generally have a second-order bias.

So, a bias correction of a symmetrizing transform may lead to further improvement in the quality of inference, but at the expense of increased formal complexity. We need some notation to define symmetrizing transformations. Let g be a smooth scalar transformation. The derivation is available in Brown, Cai, and DasGupta. Here, then, is the definition. Definition 4.

The general solution of this differential equation is available in Brown, Cai, and DasGupta, but we will not present it here. We specialize to the case of the one-parameter exponential family, where, it turns out, there is essentially a unique symmetrizing transform, and it is relatively simple to describe.

Here is the result of Brown, Cai, and DasGupta. Now that we have a general representation of the symmetrizing transform in the entire one-parameter exponential family, we can work out some examples and observe that the symmetrizing transforms are different from the VSTs. This is interesting. This is the well-known Wilson-Hilferty transform for the Gamma distribution (displayed below). Since the goals of stabilizing the variance and reducing the skewness are intrinsically contradictory and the corresponding transforms are different, the question naturally arises whether one is better than the other.
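For reference, in its familiar chi-square form the Wilson-Hilferty transform states that if X ~ chi-square with nu degrees of freedom, then, to a remarkable degree of accuracy,

$$\left(\frac{X}{\nu}\right)^{1/3} \;\approx\; N\!\left(1 - \frac{2}{9\nu},\; \frac{2}{9\nu}\right),$$

with the obvious analog for general Gamma distributions.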

As to which is better, no general statement can be made. The VST often has quite a bit of bias, and a bias correction of the VST seems to be the least that one should do. In the Poisson case, a bias-corrected VST seems to produce more accurate inference than the symmetrizing transform. But in the Gamma case, the symmetrizing transform is slightly better than even the bias-corrected VST.

No studies seem to have been made that compare bias-corrected VSTs with bias-corrected symmetrizing transforms. Exercise 4. Is arcsin(sqrt(X̄n)) unbiased for arcsin(sqrt(p))? Was your guess right? Compare it with Exercise 4. How would you go about constructing a variance-stabilizing transformation?

In the text, a specific variance-stabilizing transformation was described. How would you go about constructing other variance-stabilizing transformations? Think of a few such statistics and investigate the corresponding variance-stabilizing transformations. Consider the equal-tailed exact confidence interval and the asymptotically correct confidence interval of Example 4. Compare them with Exercises 4. Using Theorem 4.

References

Anscombe, F.
Bar-Lev, S., On the construction of classes of variance stabilizing transformations, Stat.
Bartlett, M., The use of transformations, Biometrics, 3, 39—.
Bickel, P. and Doksum, K., An analysis of transformations revisited, J.
Brown, L., Cai, T., and DasGupta, A., Interval estimation for a binomial proportion, Statist.
Brown, L., Cai, T., and DasGupta, A., On selecting an optimal transformation, preprint.
Curtiss, J., On transformations used in the analysis of variance, Ann.

DiCiccio, T., Constructing approximately standard normal pivots from signed roots of adjusted likelihood ratio statistics, Scand.
Efron, B., Transformation theory: How normal is a family of distributions?
Fisher, R., The analysis of variance with various binomial transformations, Biometrics, 10, —.
Hall, P., On the removal of skewness by transformation, J. R. Stat. Soc. Ser. B, 54(1), —.
Hotelling, H., New light on the correlation coefficient and its transforms, J. R. Stat. Soc. Ser. B, 15, —.
Sprott, D., Likelihood and maximum likelihood estimation, C.

Chapter 5 More General Central Limit Theorems

Theoretically, as well as for many important applications, it is useful to have CLTs for partial sums of random variables that are independent but not iid.

We present a few key theorems in this chapter. A nearly encyclopedic reference is Petrov. Other useful references for this chapter are Feller, Billingsley, Lehmann, Ferguson, Sen and Singer, and Port. Other specific references are given later.
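The landmark result discussed below is, in its standard form, the following: with independent X_i, mu_i = E(X_i), S_n = X_1 + ... + X_n, and s_n^2 = sum of Var(X_i), if the Lindeberg condition

$$\frac{1}{s_n^2} \sum_{i=1}^{n} E\left[(X_i - \mu_i)^2 \, 1\{|X_i - \mu_i| > \epsilon s_n\}\right] \to 0
\quad \text{for every } \epsilon > 0$$

holds, then (S_n - E S_n)/s_n converges in distribution to N(0, 1). Liapounov's stronger but easier-to-verify condition requires, for some delta > 0,

$$\frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E\,|X_i - \mu_i|^{2+\delta} \to 0,$$

which implies the Lindeberg condition.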

A proof can be seen in Billingsley. The Lindeberg-Feller theorem is a landmark theorem in probability and statistics. Generally, the Lindeberg-Feller condition is hard to verify, and a simpler theorem, based on the Liapounov condition displayed above, is the following. Theorem 5. A proof is given in Sen and Singer. Here is an important example.

Example 5. Thus, the asymptotic normality of the least squares estimate (LSE) is established under some conditions on the design variables, an important result. A CLT without a finite variance can sometimes be useful.

We present the general result below and then give an illustrative example. Feller contains detailed information on the availability of CLTs without the existence of a variance, along with proofs.

First, we need a definition. Definition 5. We present an example below where asymptotic normality of the partial sums still holds, although the summands do not have a finite variance.

It is also of basic interest to probabilists. A proof can be seen in Hoeffding; Port is another useful reference for combinatorial CLTs. See the exercises. Consider, for example, the familiar hypergeometric distribution, wherein an urn has n balls, D of which are black, and m are sampled at random without replacement. What one needs is a clever embedding in the random permutation setup. Thus, consider random variables X1, .... In statistics, for example, the natural interpretation of a sequence of sample observations for a Bayesian would be that they are exchangeable, as opposed to the frequentist interpretation that they are iid.

Central limit theorems for exchangeable sequences bear some similarities to, and many differences from, the iid situation. Some key references on the central limit results are Blum et al. For expositions on exchangeability, we recommend Aldous and Diaconis. Interesting examples can be seen in Diaconis and Freedman. We first define the notion of exchangeability. However, the converse is not true. A famous theorem of de Finetti says that an infinite sequence of exchangeable Bernoulli random variables must be a mixture of an iid sequence of Bernoulli variables.
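In symbols, de Finetti's theorem says that there is a probability measure mu on [0, 1] such that, for every n and all x_i in {0, 1},

$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 p^{\sum_i x_i} (1 - p)^{\,n - \sum_i x_i} \, d\mu(p).$$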

The display above makes this precise. We only treat the case of a finite variance. Here are two main CLTs for exchangeable sequences with a finite variance. Consult Klass and Teicher and Chernoff and Teicher for their proofs.

We see from Theorem 5. There are other such differences from the iid case in the central limit theorem for exchangeable sequences. In some important applications, the correct structure is that of a triangular array of finitely exchangeable sequences rather than just one infinite sequence.

The next result is on such an exchangeable array; we first need a definition. There are some practical problems that arise in applications, for example in sequential statistical analysis, where the number of terms present in a partial sum is a random variable.

The question is whether a CLT still holds under appropriate conditions. If we have problems more general than the iid case with a finite variance, then the theorem does need another condition on the underlying random variables X1, X2, .... The area has since continued to flourish, and a huge body of deep and elegant results now exists in the literature. We provide a short account of infinitely divisible distributions on the real line.

Infinitely divisible and stable distributions are extensively used in applications, but they are also fundamentally related to the question of convergence of distributions of partial sums of independent random variables.

Three review papers on the material of this section are Fisz, Steutel, and Bose, DasGupta, and Rubin. Feller is a classic reference on infinite divisibility and stable laws. The following important property of the class of infinitely divisible distributions describes the connection of infinite divisibility to possible weak limits of partial sums of independent random variables.

Suppose that for each n the variables Xn1, ..., Xnn are iid with some common distribution Hn, and that the distribution of Xn1 + ... + Xnn converges weakly to F. Then F is infinitely divisible. The result above allows triangular arrays of independent random variables with possibly different common distributions Hn for the different rows.

This class is the so-called stable family. We first give a more direct definition of a stable distribution that better explains the reason for the name stable. Complete characterizations for a distribution H to be in the domain of attraction of a specific stable law are known.
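The direct definition in question is the standard one: a nondegenerate random variable X is called stable if, for independent copies X1, X2 of X and arbitrary constants a, b > 0, there exist c > 0 and real d with

$$a X_1 + b X_2 \overset{d}{=} c X + d.$$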

In particular, one can characterize all distributions H for which sample means have a normal limit after appropriate centering and norming; we address this later in this section. Nondegenerate random variables with bounded support cannot be infinitely divisible. This is a well-known fact and is not hard to prove. Interestingly, however, most common distributions with unbounded support are infinitely divisible. A few well-known ones among them are also stable.

Thus, all normal distributions are infinitely divisible. For a given n, take X1, ..., Xn iid Poisson(lambda/n); their sum is Poisson(lambda). Thus, all Poisson distributions are infinitely divisible. Next, suppose X is a nondegenerate bounded random variable. Then X is not infinitely divisible: if it were, then, for any n, there would exist iid random variables X1, ..., Xn whose sum has the distribution of X. The most common means of characterizing infinitely divisible (id) laws is by their characteristic functions. Several forms are available; we give two of these forms, namely Form A for the finite-variance case and Form B for the general case.
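The two forms referred to are presumably the classical Kolmogorov and Levy-Khintchine canonical representations, one common statement of which is the following. For a finite-variance id law with characteristic function phi (Form A),

$$\log \varphi(t) = ict + \int_{-\infty}^{\infty} \frac{e^{itx} - 1 - itx}{x^2} \, dK(x),$$

where K is a bounded nondecreasing function and the integrand is interpreted as -t^2/2 at x = 0; in general (Form B),

$$\log \varphi(t) = ict - \frac{\sigma^2 t^2}{2} + \int_{x \neq 0} \left(e^{itx} - 1 - \frac{itx}{1 + x^2}\right) d\nu(x),$$

where the Levy measure nu integrates min(1, x^2).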

Here c is arbitrary. For certain applications and special cases, Form A is more useful; below, Form A will be used for cases of a finite variance. Log Beta distribution. Log Gamma distribution. Suppose Y has the Gamma(1, p) distribution. Characteristic functions of id laws satisfy some interesting properties.

Such properties are useful to exclude particular distributions from being id and to establish further properties of id laws as well. They generally do not provide much probabilistic insight but are quite valuable as analytical tools in studying id laws. A collection of properties is listed below.

Large classes of positive continuous random variables can be shown to be infinitely divisible by using the following famous result: if a positive random variable X has a completely monotone density, then X is infinitely divisible. It is well known that a positive random variable X has a completely monotone density iff X has the same distribution as Y Z, where Z is exponential with mean 1 and Y is nonnegative and independent of Z.

That is, all scale mixtures of exponentials are infinitely divisible. Stable laws occupy a special position in the class of infinitely divisible distributions. They have found numerous applications in statistics. Starting from Form B of the characteristic function of infinitely divisible distributions, it is possible to derive the following characterization for characteristic functions of stable laws.
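One standard version of the characterization, in the usual parameterization with index alpha in (0, 2], scale c > 0, skewness |beta| <= 1, and location b, is

$$\log \varphi(t) = itb - |ct|^{\alpha}\left(1 - i\beta\, \mathrm{sgn}(t) \tan\frac{\pi \alpha}{2}\right), \qquad \alpha \neq 1,$$

with a corresponding logarithmic form when alpha = 1. A companion moment fact: a stable law with index alpha < 2 has E|X|^p finite for 0 < p < alpha and infinite for p >= alpha; only the normal case alpha = 2 has all moments.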

The possible values of b are the entire real line. Normal distributions are the only stable laws with a finite variance; indeed, the heavy tails of the nonnormal stable laws are one of the principal reasons that stable laws are used to model continuous random variables with densities having heavy tails.

Here is an important result on moments of stable laws (cf. the moment property displayed above). If H has a finite second moment, then this limit can be shown to be necessarily zero. However, the limit can be zero without H having a finite second moment. We saw this in Example 5. The corresponding characterization result for a general stable law is the following. Exercise 5. Identify bn. Suppose 0 Exercise 5.

Hint: The answer involves the Riemann zeta function. If so, establish what the centering and norming are. Let Tn be the number of records up to time n. This is a famous result and is not very easy to prove. Can the correlation between X1 and X2 be negative?

Can the correlation between X1 and X2 be zero? Show that Z1 Z2 is infinitely divisible. Show that Z1 Z2 · · · Zn is infinitely divisible. Hint: Look at convolutions of a normal with a Poisson distribution.

References

Aldous, D.
Anscombe, F., Large sample theory of sequential estimation, Proc. Cambridge Philos. Soc.
Billingsley, P., Probability and Measure, 3rd ed.
Blum, J., Central limit theorems for interchangeable processes, Can. J. Math.
Bose, A., DasGupta, A., and Rubin, H., A contemporary review of infinitely divisible distributions and processes, Sankhya Ser. A, 64(3), Part 2, —.
Chernoff, H. and Teicher, H., A central limit theorem for sums of interchangeable random variables, Ann.
de Finetti, B., Funzione caratteristica di un fenomeno aleatorio, Atti R. Accad. Naz. Lincei Ser.

Diaconis, P. and Freedman, D., A dozen de Finetti-style results in search of a theory, Ann.
Fisz, M., Infinitely divisible distributions: recent results and applications, Ann.
Hewitt, E.



The treatment is oriented towards practice and comes with code in the R language, available from the web, which enables the methods to be applied in a range of situations of interest to practitioners.

The analysis includes some comparisons of higher order likelihood inference with bootstrap or Bayesian methods.
