“Milankovitch cycles and microfossils: principles and practice of palaeoecological analysis illustrated by Cenomanian chalk-marl rhythms” by C.R. Paul - a comment

Sir - In his paper on Milankovitch cycles and microfossils, Paul (1992) has launched a comprehensive attack on the use of standard counts and percentages in palaeoecology, with particular reference to the methods used by micropalaeontologists studying Upper Cretaceous chalk and marl assemblages. We commend him for the diligent and painstaking way in which he has constructed his argument. He is, however, wrong. In presenting our counter-attack, we wish to take issue with several of his statements. In the following discussion, direct quotes from Paul (1992) are given thus "in italics and quotes".
"Percentages are, in effect, standard counts of 100". This is not so. A standard count of 100 is unlikely to give a true representation of the % composition of an assemblage. That is why a standard count of at least 300 specimens is recommended. The practice of making standard counts has developed not only "to ensure that key taxa are not overlooked" but equally importantly to ensure that the proportion of each taxon in the assemblage is determined with a high degree of confidence. The recommended figure of 300 originates from a study by Dryden (1931; fide Phleger, 1960) on accuracy in percentage representation of heavy mineral frequencies. Phleger (1960; Ch. 1) discusses the topic in detail; he concludes (p. 35): "...that little if anything is to be gained by counting samples much larger than approximately 300 specimens and that the illusion of accuracy tends to be misleading".
We agree that "...standard counts or percentages for all taxa are interdependent". However, Paul's distinction between signal ("a genuine change in the abundance of a taxon") and echo ("a passive response to a change in the abundance of another taxon") is inappropriate. He illustrates his argument with Fig. 2, which "was constructed on the assumption that the taxon concerned was present at an absolutely invariant abundance in terms of specimens per square metre of seafloor or per gramme of sediment". The first part of this assumption is unwarranted since the standing crop or population density of the taxon is likely to have varied considerably (and at a very much higher frequency than the sampling interval) through time; a 6 cm thick sample of chalk or marl does not represent the sea floor at a given instant in time, but the cumulative result of at least hundreds of years' worth of superimposed "sea floor", mixed by currents and bioturbation. The specimens of a taxon in a fossil assemblage do not constitute a population, nor do fossil assemblages truly represent communities; they are (to quote Griffiths & Evans (1992) in the same issue) "time-averaged taxocenes which have undergone a variety of processes of sortage and attrition". The second part of the assumption is unwarranted since it also assumes a constant rate of sediment deposition. Absolute abundance of specimens, expressed as numbers per weight or volume of sediment, is subject to variation due to changes in the rate of sedimentation. Paul's Fig. 2 shows fluctuations in the relative abundance of a taxon which are independent of sedimentation rate; they may be held to reflect real changes in assemblage composition through time, which may be interpreted as responses by taxa to changes in environmental parameters of ecological significance. In other words, the signal and the echo both contain useful information; that provided by the echo is arguably the more relevant to palaeoecology.
"...standard counts and percentages may give misleading impressions and suggest inappropriate conclusions". Paul illustrates this point by showing that in terms of percentage (i.e. relative abundance) Gavelinella and Hedbergella were more abundant in chalks than in marls, while in terms of absolute abundance (numbers of specimens per 500 g sample) the reverse was the case (his Tables 1 & 2). However, in comparing absolute abundances from two different lithologies he makes the implicit assumption that sedimentation rates were the same during chalk deposition as during marl deposition and, as he himself argues in a later part of his paper, this was almost certainly not the case. The different results given by percentages and absolute abundances can be explained if the sedimentation rates in the chalks were higher than those in the marls. Consider the three chalk/marl rhythms on which his Tables 1 & 2 are based. Taking the durations estimated by Paul (his Table 6) and the thicknesses given in Fig. 2 of Leary et al. (1989), we find that the sedimentation rates of the chalk beds were more than twice those of the marls. The calculations for Paul's Table 6 may be slightly suspect since they are not actually based on three complete rhythms, but on "two complete marl chalk-marl couplets and most of a third one" (Leary et al., 1989). However, if one accepts that these are at least reasonable approximations then the foraminiferal assemblages of the chalks have been diluted by higher sedimentation rates, and should be multiplied by a factor of at least two to allow direct comparison with those of the marls. If this is done, absolute abundance shows the same relationship between chalks and marls as percentages. Once again, it is clear that percentage (i.e., relative abundance) is a more reliable measure for palaeoecology than absolute abundance. Of course, it could be argued that the sedimentation rate was itself an important ecological parameter. However, the sedimentation rates in question appear to be less than 0.5 cm/100 yr; this rate of influx of sediment is unlikely to cause problems for even the most lethargic of benthonic foraminifera, notwithstanding Paul's comment about them being "unable to leave the area if sedimentation rates became uncomfortably high". Absolute abundance data are useful, of course, but should be considered in conjunction with (not instead of) relative abundance data. It is essential, furthermore, to have a clear idea of what "absolute" abundance actually means. In a clay or silt lithology, for example, absolute abundance of microfossils may be measured against a baseline of minerogenic sediment. In a chalk, the baseline is a biogenic sediment composed of microfossils; in such a case, "absolute" abundance of foraminifera might be set against a coccolith baseline, for example, but in terms of complete assemblages (including all the microfossils and macrofossils preserved, and still representing a biased and incomplete record of the original living community) it would be, in fact, relative abundance.
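The dilution argument above can be sketched numerically. What follows is a minimal illustration, not a reanalysis: the specimen counts and the relative sedimentation rates are hypothetical (they are not the values in Paul's Tables 1, 2 or 6), and the function names are invented for the example. It assumes that the number of specimens per unit weight of sediment scales inversely with sedimentation rate, which is the assumption behind the factor-of-two correction.

```python
def percentages(counts):
    """Relative abundance (%) of each taxon in one sample."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def rate_corrected(counts, sed_rate, reference_rate):
    """Rescale per-sample counts to a reference sedimentation rate.

    Assumption: faster sedimentation dilutes every taxon equally, so
    counts are multiplied by sed_rate / reference_rate.
    """
    factor = sed_rate / reference_rate
    return {taxon: n * factor for taxon, n in counts.items()}


# Hypothetical specimens per 500 g sample (NOT data from Paul, 1992).
marl = {"Gavelinella": 120, "Hedbergella": 200, "other": 680}
chalk = {"Gavelinella": 90, "Hedbergella": 150, "other": 260}

# Hypothetical relative sedimentation rates: chalk twice as fast as marl.
MARL_RATE, CHALK_RATE = 1.0, 2.0

chalk_corrected = rate_corrected(chalk, CHALK_RATE, MARL_RATE)

for taxon in ("Gavelinella", "Hedbergella"):
    print(
        taxon,
        "raw:", marl[taxon], "vs", chalk[taxon],
        "| %:", percentages(marl)[taxon], "vs", percentages(chalk)[taxon],
        "| corrected:", marl[taxon], "vs", chalk_corrected[taxon],
    )
```

With these made-up numbers the raw counts are lower in the chalk than in the marl, whereas both the percentages and the rate-corrected counts are higher in the chalk. The percentages are unchanged by the correction, because a uniform dilution factor cancels out of a ratio.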
Finally, whether dealing with absolute or relative abundances, the data obtained are only as good as the sampling method. Micropalaeontologists usually sample fossil assemblages three times:
1. When collecting a sample in the field (sample size and interval relative to bed thicknesses are significant variables).
2. When processing that sample (this effectively subsamples the original sample in some way); biases introduced at this stage are likely to be exaggerated when dealing with marls and chalks since the former will break down more completely in one freeze-thaw cycle, thus yielding a "higher" faunal density.
3. When picking microfossils from the processed sample residue.
The bias or errors that may be introduced by this third stage of sampling can be avoided by picking the entire sample; in practice this would often be far too time-consuming, so the sample residue must be sub-sampled. This is often done by sieving the residue into size-fractions, which certainly makes picking much easier. Unfortunately it appears to be common practice (and one which Paul endorses) to then use only one of the fractions, usually the <500 µm, >250 µm fraction. In Cretaceous chalk and marl samples, very small planktonic foraminifera (e.g. heterohelicids) are often abundant, but since they occur almost entirely in the <250 µm fraction they are habitually left out of calculations of Planktonic/Benthonic ratios (e.g. by Paul, 1992 and by Leary et al., 1989; see also comments by Curry, 1982). In some Cenomanian-Turonian boundary (Oceanic Anoxic Event) samples, the finest residues (>63 µm) examined by one of us (DJH) were dominated by calcispheres and heterohelicids, yet the latter were not included in P/B ratio calculations by Jarvis et al. (1988) (Leary, pers. comm.); a pity, since heterohelicids may be useful indicators of strong oxygen minimum zones (Sliter & Premoli Silva, 1990; Boersma & Premoli Silva, 1989). At the other end of the scale, larger specimens (>500 µm) are also excluded; what can be the justification for ignoring larger benthonic foraminifera (e.g. orbitolinids) when they are just as much a part of an assemblage as Gavelinella and must have played a role in chalk sea-floor communities? Paul even argues for the exclusion of large specimens of genera (e.g., Lenticulina) which are also represented in the <500 µm, >250 µm fraction. Similar problems arise with chalk ostracod assemblages; coarse fractions are likely to be dominated by bairdiaceans and platycopids, while fine fractions may yield common and diverse cytherurids (Weaver, 1981). The fractions chosen for sieving are entirely artificial. P/B ratios calculated from such an arbitrary selection of specimens may be useful in biostratigraphy and contain at least some of the original signal, but they are a poor excuse for palaeoecological data.

First, I would like to thank Horne and Slipper for their comments. They make some cogent points and enable me to clarify an implicit assumption behind my arguments that I omitted to state explicitly in the original paper, although I have made it elsewhere (Paul, 1992, p. 130). However, I do find some of Horne & Slipper's arguments paradoxical. In the first paragraph I am taken to task for launching a "comprehensive attack on the use of standard counts and percentages in palaeoecology". Horne & Slipper assert flatly that I am wrong, by which I presume they mean that one should make standard counts. Later in the article, they state "Absolute abundance data are useful, of course, but should be considered in conjunction with (not instead of) relative abundance data." I could not agree more. That sentence succinctly summarizes the first aim of my paper. The implicit assumption that I omitted to spell out is that the two types of data are not, and cannot be, alternatives. This is a one-sided test. If complete data on absolute abundance are available, anyone can calculate percentages (i.e. relative abundance). If standard counts are made, no-one can estimate absolute abundance, not even the person who made the original counts and not even for the whole fauna, let alone for each constituent taxon. My "attack" was a plea to all palaeoecologists who record quantitative data (not just those working on the chalk) to do so in a way that makes both types of data available.
One way to do so would be to count every microfossil present in a sample, but that would be extremely time-consuming and very inefficient. I suggested a technique which is only slightly more time-consuming than making standard counts, but which yields estimates of both absolute and relative abundance. Even if I am wrong, as Horne and Slipper assert, this can only be confirmed by recording data on both absolute and relative abundance and demonstrating repeatedly that the former are consistently irrelevant or misleading. I am fairly confident that this will not prove to be the case, but I am absolutely certain that I will never be proved wrong so long as everyone continues to make standard counts. (This should not be taken as a coded plea to continue making standard counts. I am quite content to be proved wrong. That is how science advances.)
Horne & Slipper make four specific comments; the first three start with a direct quote from my paper, the fourth concerns sampling methods. I would like to consider each in turn and will number them 1-4.
1. "Percentages are, in effect, standard counts of 100". This quotation is taken out of context. All I meant there was that all three disadvantages of standard counts apply equally to percentages, no matter how large the counts on which they are based. However, Horne & Slipper go on to make some additional points with which I would like to take issue. First, no count will give a "true" representation of the composition of an assemblage. This can only ever be estimated. They are correct to point out that a count of 300 specimens will give a more accurate estimate than a count of 100. Their quotation from Phleger (1960) "that little if anything is to be gained by counting samples much larger than approximately 300 specimens and that the illusion of accuracy tends to be misleading" may be empirically acceptable for samples with low numbers of taxa (as one assumes is true of most heavy mineral assemblages). However, it will certainly not hold for a diverse fauna of more than 100 taxa, since a count of 300 only gives a 95% probability of detecting species present at 1% of the fauna. It does not hold, for example, if one wished to detect the nodosarian genera present in my Cenomanian samples (which have diversities well below 100) because the nodosarians are so rare. The fundamental relationship here is given by the equation:
Q = (1 - p)^n
where Q is the probability of overlooking a rare taxon, p is the proportion of the total fauna which the taxon constitutes, and n is the number of trials, which in this context is the number of identified specimens (i.e. the count).
Selecting values of Q and p determines the size of the count. With a typical population structure where a few species dominate and most are relatively rare, and with a diversity of over 100 taxa, p would have to be less than 0.01 (1%) and a suitable count would be considerably in excess of 300 to be even 90% certain of not overlooking the rarer forms. Shaw (1964, chapter 18) outlined the theory behind these calculations in detail, while Dennison & Hay (1967) and Hay (1972) have published extremely wide-ranging graphs of values for Q, p and n.
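The two effects of count size discussed in point 1 can be put into numbers with a short sketch. It assumes that specimens are picked at random and independently, takes the relationship between Q, p and n to be Q = (1 - p)^n (the form consistent with the statement that a count of 300 gives about a 95% chance of detecting a taxon at 1%), and uses a normal approximation to the binomial for the precision of a percentage. The function names are illustrative only.

```python
import math


def prob_overlooking(p, n):
    """Q: chance that a taxon at proportion p is absent from a count of n."""
    return (1.0 - p) ** n


def count_needed(p, q):
    """Smallest count n for which the chance of overlooking the taxon is <= q."""
    return math.ceil(math.log(q) / math.log(1.0 - p))


def percent_halfwidth(p, n, z=1.96):
    """Approximate 95% half-width, in percentage points, of a percentage
    estimated from a count of n (normal approximation; an assumption)."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


# Chance of missing a taxon that makes up 1% of the fauna in a count of 300.
print(round(prob_overlooking(0.01, 300), 3))

# Counts needed for 95% certainty at p = 1% and 90% certainty at p = 0.5%.
print(count_needed(0.01, 0.05))
print(count_needed(0.005, 0.10))

# Precision of a 10% taxon for counts of 100, 300 and 1000.
for n in (100, 300, 1000):
    print(n, round(percent_halfwidth(0.10, n), 1))
```

Under these assumptions a count of 300 leaves a chance of roughly 0.049 of missing a taxon at 1%, the smallest count giving a 95% chance of detecting it is 299, and about 460 specimens are needed for a 90% chance of detecting a taxon at 0.5%. The half-width for a taxon at 10% narrows from about 5.9 percentage points at a count of 100 to about 3.4 at a count of 300.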
2. Horne & Slipper's second criticism concerning the interdependence of counts and percentages initially misses the point. My Fig. 2 was simply constructed to demonstrate that patterns can be generated by echoes (i.e. by a passive response to changes in abundance of other taxa) when no genuine signal (i.e. a real change in abundance) occurs. Of course the example is totally artificial; it has to be, because real samples are subject to all the vagaries which Horne & Slipper rightly document. To illustrate my point the reader has to know what the truth is. I chose to state that the taxon did not vary in abundance whatsoever because this makes the resulting diagram simpler. Any other predetermined pattern could be substituted, but most would be swamped out by the echoes from the two taxa that do vary in abundance in this example. This artificial example makes no assumptions whatsoever about rate of sedimentation. Finally, I wholeheartedly concur with Horne & Slipper's statement that signal and echo both contain valuable information. However, I cannot for the life of me see how anyone can test their assertion that the information "provided by the echo is arguably more relevant to palaeoecology" unless data are recorded in a way which allows one to distinguish between signal and echo. Standard counts and percentages do not allow one to do this. Again, assume Horne & Slipper are right and I am wrong. How can this be proved unless data are recorded in the way that I advocated?
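The kind of construction described in point 2 can be reproduced in a few lines. This is a sketch under stated assumptions, not the data behind Fig. 2: the three taxa (X, Y, Z) and all abundances are hypothetical. Taxon X is held at a fixed absolute abundance while Y and Z vary, and the percentage of X is then computed for each sample.

```python
def percentages(counts):
    """Relative abundance (%) of each taxon in one sample."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances (specimens per gramme) in four
# successive samples. X never changes; only Y and Z vary.
samples = [
    {"X": 100, "Y": 100, "Z": 200},
    {"X": 100, "Y": 300, "Z": 200},
    {"X": 100, "Y": 100, "Z": 600},
    {"X": 100, "Y": 100, "Z": 200},
]

x_absolute = [s["X"] for s in samples]
x_percent = [percentages(s)["X"] for s in samples]

print("absolute abundance of X:", x_absolute)
print("percentage of X:", [round(v, 1) for v in x_percent])
```

The percentage of X falls and recovers although its absolute abundance is constant throughout: that variation is pure echo. A reader given only the percentage column could not tell it apart from a genuine signal.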
3. As regards the third point concerning trends in relative versus absolute abundance, Horne & Slipper have again taken my example too literally. I did not seek to explain the differences reported by Leary and Ditchfield (1989) in the abundances of Gavelinella and Hedbergella in chalks compared with marls. I merely wished to point out that the trend in relative abundance is the reverse of that in absolute abundance, and that these reversed trends might lead to different interpretations if considered alone. I suspect Horne & Slipper are perfectly correct in their explanation of these opposite trends, but they could not possibly have arrived at their explanation without knowing what the absolute abundances of these genera are. Had Paul Leary not recorded total numbers, but just identified the first 300 specimens he saw in each sample, none of us would be any the wiser.
4. Horne & Slipper make important points concerning biases that can creep in during sampling, processing and picking, with which I wholeheartedly agree. This serves to emphasize that standardization of techniques is essential. For example, I regret very much that I lost count of the number of freeze-thaw cycles my first batch of samples went through. Hence I cannot state exactly how many cycles they were subjected to, only that both batches were processed approximately equally thoroughly. More importantly, no-one can reproduce my experiments exactly, not even me.
Horne & Slipper take issue with the practice of restricting counts to a single size fraction. They have an important point to which I cannot see a simple solution and they make no suggestions. Of course larger benthic foraminifera are important in palaeoecology; of course heterohelicids are too; but how can quantitative data from different size fractions be combined in a way that is both reproducible and meaningful? The P/B ratio (which is never recorded as a ratio but as a percentage) is widely used in foraminiferal studies, but what does it mean if different researchers record it in different ways? And how can we tell if they do, since some researchers do not record their method? I have shown how variable the so-called P/B ratio can be if one combines data from two size fractions, let alone from three or four to include the heterohelicids. The only suggestion I can make is to record in the way that I advocated from each size fraction, but this would involve at least four times as much effort. Would the results be worth it? In quantitative studies on molluscs, with which I am more familiar, it is standard practice to make a cutoff at 0.5 mm and count everything above that size, combining data from all fractions. However, this rarely results in counts over 1000 individuals. My richest Cenomanian sample had an estimate of over 7000 individuals in the >250 micron fraction alone. I cannot imagine what the total of individuals larger than 63 microns would be. I chose a compromise which I believe combined adequate data with a reasonably small amount of time and effort. I recorded explicitly what I did so others could test my results by repeating my experiments as nearly as possible under the same conditions. I may not have chosen the best method, but my results are testable. That is the fundamental point. Unless details of sampling, processing and picking procedures are recorded, experiments are not reproducible and results cannot be tested. I would welcome Horne & Slipper's views on this. Simply stating that different size fractions contribute valuable information does not solve the problem of how best to gather and record these data.
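The sensitivity of the so-called P/B ratio to the choice of size fraction can be illustrated with a small sketch. The counts below are hypothetical, the fraction labels are only examples, and the ratio is assumed to be expressed as the percentage of planktonic specimens among all foraminifera counted, 100 * P / (P + B). The counts are taken to be estimated totals for the same weight of sample, which is what allows the fractions to be added together.

```python
def percent_planktonic(planktonic, benthonic):
    """P/B 'ratio' expressed as % planktonic specimens (assumed convention)."""
    return 100.0 * planktonic / (planktonic + benthonic)


# Hypothetical estimated totals per sample for two size fractions.
fractions = {
    "250-500 um": {"P": 300, "B": 700},
    "63-250 um": {"P": 4500, "B": 500},
}

# The percentage recorded from each fraction on its own.
per_fraction = {
    name: percent_planktonic(c["P"], c["B"]) for name, c in fractions.items()
}

# The percentage recorded when the fractions are combined.
total_p = sum(c["P"] for c in fractions.values())
total_b = sum(c["B"] for c in fractions.values())
combined = percent_planktonic(total_p, total_b)

print(per_fraction)
print("combined:", combined)
```

With these made-up totals the same sample gives 30% planktonics from the coarser fraction, 90% from the finer one and 80% when the two are combined, so a reported value means little unless the fractions used are stated. The combined figure can only be calculated because absolute totals are available for each fraction; standard counts of a fixed number of specimens from each fraction could not be added in this way.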
I have spent a good deal of the last twelve years trying to convince the scientific world in general, and palaeontologists in particular, that the fossil record is by no means as incomplete as we are often led to believe. In doing so I have also been trying to convince palaeontologists to extract and record as much data as possible from their samples. In this case I would argue that with a little more effort twice the amount of data can be obtained (i.e. absolute and relative abundance). Interestingly, Horne & Slipper do not apparently dispute my interpretations of Milankovitch control on microfossil assemblages. Yet most of my conclusions could not have been formulated, let alone tested in the future, without data on absolute abundance.