A Review of Some Variance Formulas for Sampling a Lot of Coal

 

C. H. Proctor, Raleigh, NC, 2003

 

1.     Background

An earlier review of variance formulas covered those dealing with sub-sampling from a composite into a test portion.  This review will focus on formulas covering the sampling of increments drawn to form the composite.  For this case the variance reflects how the level of some analyte (ash percent in coal will be our primary example) varies from one composite to another as the increment sampling operation is replicated on a given lot or on a collection of lots of some type. 

 

Sampling of bulk materials is done in stages and the variances from the stages are added to get the overall variance of the final estimate.  Conceptually, bulk sampling can be viewed as multi-stage sampling or stratified sampling or some other simple design described by Cochran (1977), but its true complexity goes far beyond the standard statistical sampling of collections of discrete identifiable objects.  Bicking (1967) explains these peculiarities about as well as do most general discussions of bulk sampling. 

 

Consider the particles in the one-gram test portion that is analyzed for ash percent.  They are all of diameter less than 0.025 cm.  Thus there are over 500,000 of them.  They come from some 500,000 locations in the lot of around 1000 tons of coal.  If that lot could be frozen in space along a very long conveyor belt, much in the same order as it passed the sampling system, and the 500,000 locations could light up and shine through the coal, you would see the effects of the sampling stages.  One would notice the places where the cross-stream cutter swept off the increments and dark voids between.  Within the region of an increment one might see some more even spacing than normal with clumping at the smallest distances. 

 

If only we had the ash percents at each of the 500,000 points then we could use the elegant formulas in Cochran based on increment means and so forth or, better, the extended formulas from stochastic processes theory.  Of course we cannot do such detailed work but it is helpful to keep this model in mind.

 

2.     Coal sampling

The fundamentals of coal sampling were enumerated some time ago by Bertholf (1952b) and the fundamental variance formula given in his equation (2a) continues in use,

 

                          

 

Bertholf (1952b) cites eight estimates of trend variance found by applying the “analysis of variance technique” of Bertholf (1952a) that ranged from 5.43 to 0.02.  But he notes that these differences are fairly well explained by percent of free impurities and ash percent and other characteristics of a coal and that experience allows one to guess what will be the level of trend variance to be anticipated.  He defines the coefficient of trend variation as the percentage ratio of $\sigma_o$ to ash percent and recommends using 10% for raw coal from one source to 20% for raw coal from many sources and half of these values for washed coal.  One recognizes here the basic strategy of samplers everywhere to guess at the coefficient of variation as the first step to setting sample size.  He goes on to provide a judgment estimate for increment variance from particle sampling formulas based on particle average weight and to use it in setting required gross sample mass.  These were in the days before hand-held calculators and his results appear in nomographs.
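The guessing strategy just described can be sketched in a few lines.  This is only an illustration: the function name and the dictionary keys are hypothetical, and reading the "half of these values" rule as 5% and 10% for washed coal is our interpretation of the text.

```python
# Coefficients of trend variation (percent of ash) as recommended in the text.
# The washed-coal entries apply the "half of these values" rule (our reading).
CV_GUESS_PERCENT = {
    ("raw", "one source"): 10.0,
    ("raw", "many sources"): 20.0,
    ("washed", "one source"): 5.0,
    ("washed", "many sources"): 10.0,
}

def guessed_trend_variance(ash_percent, cv_percent):
    """Trend variance implied by a guessed coefficient of trend variation."""
    sigma_o = ash_percent * cv_percent / 100.0  # trend standard deviation, in ash percent
    return sigma_o ** 2

# Example: a raw coal of about 10 percent ash drawn from many sources.
ash = 10.0
cv = CV_GUESS_PERCENT[("raw", "many sources")]
print(guessed_trend_variance(ash, cv))
```

For a 10 percent ash coal the guessed trend variances run from 0.25 (washed, one source) to 4.0 (raw, many sources), which sits inside the range of 0.02 to 5.43 that Bertholf reported.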

 

There was a lengthy discussion of Bertholf’s (1952b) paper and one statistical expert made a statement about coal sampling that “…the same principles of sampling that the Census Bureau uses can be applied,” while another disagreed saying, “I do not see why coal sampling should necessarily be handled with the same words as some of the old standard problems…”.  The second statistician (John Tukey) went on to show with formulas how the variation in increment weights would impinge on increment variance.  We will point to this issue as a major shortcoming of much of coal sampling theory, but just note here that the coal samplers were warned.  The other statistician’s comment represents a mistaken view that I tended to hold myself. 

 

In the vein of wise hindsight it may be interesting to call attention to an operational issue that I am sure the people who design and operate sampling systems recognize but that Bertholf may not have fully understood.  This is in connection with his discussion of the newly emerging mechanical sampling systems, which were commonly taking much larger increments.  After one determines the required weight of gross sample and finds that the mechanical sampling system was collecting far too much, Bertholf suggested that, “Increments … may be resampled to eliminate the excess weight without the necessity for preliminary crushing - or, the resampling may be done after the entire gross sample is collected.”  If the resampling can be done before combining then it would seem that a more even representation of the increments into the test portion would be achieved than if the increments are mixed and then resampled.  The only way to resolve such a question would be to do an experiment to compare the two alternative treatments.  Such an experiment would also show the relative costs associated with the alternatives.  In my own experience it is unlikely that present day coal sampling practitioners would be willing to collect such data - - there seems to be less reliance on experiments today.  

 

Now that the basic variance formula has been outlined we will review its development and again we turn to an article by Bertholf (1954) in which he moves from the variance formulas of sub-sampling to those of increment sampling in coal.  He first reports on Bushell’s (1937) experiments showing how variance decreased as increment weight increased, rapidly at first and then leveling off.  Then he cites Landry (1944) who noticed “the relation between increment weight and variance (to be) a straight line of slope $\alpha - 1$ on a logarithmic plot.”  If the slope had been -1, then variance would have dropped as the reciprocal of numbers of particles, but as $\alpha$ inches up the drop is less pronounced and reflects some adjacency correlation in ash content among the particles.  This happens as the particles are crushed if the offspring fragments tend to keep together and to be similar. 

 

Bertholf lists his own formula and next cites a formula from Visman’s thesis of 1947 as (B+A/w)/N+C, where letters replace the sigma’s and the sample preparation and analysis variance is denoted as C.  We will later on see that different sources of data (weighted ANOVA versus the “two-sizes” formula) underlie these two similar looking variance expressions.  The remaining papers that Bertholf cites contain data and discussions but contribute few formulas, except for one by Jowett (1952) on variograms that we will deal with later on.  One by Liddle (1951) mentions “duplicate sampling” and may also be worthy of special mention. 

 

3.     Compositing with Random Weights

A feature missing from Bertholf’s scanning of the literature, and indeed missing in much of bulk sampling, is the statistical theory of the compositing (or combining or pooling) of somewhat unequal amounts of physical material to get a physical average.  This theory was introduced by Brown and Fisher (1972) and extended by Elder, Thompson and Myers (1980) based on a matrix formulation by Rohde (1976).  The operational feature measured by a coefficient of variation among “sampling ratios” is recognized as important in coal sampling. The Brown and Fisher formula incorporates such a coefficient of variation

 

 

 

        

 

 

The sampling design considered by these authors was of bales of wool to determine percent impurities.  The customs inspector faces N bales in a lot and randomly selects J of them.  From each selected bale, K core samples are drawn and then all JK cores are pooled.  This pool or composite is well mixed and then partitioned into R equal parts of which r are randomly selected for analysis.  The average of the r determinations is the estimate,

 

                     

Notice that Brown and Fisher pay attention to the finiteness of N and of R but several terms cancel as these become large.  The squared coefficient of variation for the weights is,

                                                                                                               

and can be inserted into the variance formula.  Gradually their formula moves toward just the product of the last two terms times 1 plus a CV-squared,

 

                                                                             

 

 

 

This form is common for variances of compositing with random weights, but the approximation is sometimes poor, as when the sampling design involves systematic spacing of the increments from an autocorrelated process.  It is exact when K=1, or one core per bale, however. 
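The role of the 1 plus CV-squared factor can be seen in a small sketch.  It assumes, beyond anything stated in the source, that the increments share a common variance `sigma2`, are uncorrelated with one another, and that we condition on the realized weights.  The weights below are hypothetical.  Under those assumptions the variance of the weighted composite is `sigma2` times the sum of squared weight shares, and that sum equals (1 + CV-squared)/n when the CV is computed with divisor n.

```python
def sum_squared_shares(weights):
    """Sum of squared shares a_i = w_i / sum(w); multiplies sigma2 in the composite variance."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)

def cv_squared(weights):
    """Squared coefficient of variation of the weights (population form, divisor n)."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    return var / mean ** 2

weights = [478.0, 405.0, 513.0, 407.0, 589.0]  # hypothetical increment masses, grams
sigma2 = 1.0                                    # assumed common increment variance
n = len(weights)

var_composite = sigma2 * sum_squared_shares(weights)
var_with_cv = (sigma2 / n) * (1.0 + cv_squared(weights))
print(var_composite, var_with_cv)  # the two expressions agree
```

With equal weights the factor reduces to 1 and the familiar `sigma2 / n` is recovered; unequal weights can only inflate the variance.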

Such formulas are much easier to derive with Rohde’s (1976) matrix version in which the estimate is viewed as the product of a vector of weights by a vector of increment means.  That is,

 

The second term in the variance will be zero and the first term is simple if the covariances are not too complicated.  The last term can, however, be messy and that is where the approximate form breaks down. 

 

4.     Check Sampling for Precision

We apologize for taking up so much space with formulas for compositing that are so little used, but they will certainly find more applications in the future.  By way of transition we will look at some data from a carefully conducted experiment reported on by Ishikawa (1965) on the manual sampling of coal in Japan.  Twenty sublots (rail cars) were each sampled at double the routine rate and two interleaved gross samples each of 20 increments were extracted.  Increment size was about 500 grams and average sizes were given so we are able to calculate the coefficient of variation and apply the previous formulas.  The gross samples were split into two sub-samples and each analyzed in duplicate.  Ishikawa (1965) calls this a “Check sampling for precision” and the data are in Table 1.

 

 

  

 

 

Table 1.  Percentage ash in 20 lots of coal.

 Analysis values x1 to x4 are from gross sample A and x5 to x8 are from gross sample B.  Duplicate analyses are nested in lab samples, which are nested in gross samples.  WA and WB are average sizes, in grams, of the 20 increments (scoops) in gross samples A and B.

 

  SubLot  x1    x2    x3    x4    x5   x6     x7    x8    WA    WB  

1  9.14  9.28  8.92  8.88  9.32  9.41  8.50  8.48   478   496

2  9.66  9.70  9.82  9.82  9.46  9.46  9.82  9.64   405   403

3  7.50  7.36  7.42  7.50  7.46  7.52  7.48  7.40   513   506

4  8.72  8.86  8.91  8.94  9.10  9.24  9.90  9.90   407   466

5  9.26  9.28  8.82  8.78  8.82  8.66  9.42  9.38   589   574

6  8.98  8.98  8.96  8.76  8.90  8.94  8.86  8.70   652   611

7  8.86  8.78  8.86  8.84  9.06  8.82  8.96  8.84   664   611

8  8.72  8.78  8.52  8.90  8.68  8.92  8.52  8.84   563   583

9  8.70  8.84  7.20  7.32  8.78  9.11  8.98  8.90   560   569

10  7.30  7.06  7.41  7.50  8.50  8.80  7.46  7.34   641   584

11  8.54  8.36  8.02  7.80  8.54  8.14  8.26  8.34   542   547

12  8.34  8.10  8.48  8.22  8.32  8.16  8.68  8.62   599   516

13  7.34  7.30  6.94  7.12  7.10  7.20  7.20  7.28   540   547

14  8.94  9.10  9.06  9.00  9.14  9.06  9.10  9.28   507   511

15  8.54  8.60  9.00  8.88  8.90  8.96  8.76  8.86   521   566

16  8.98  9.04  8.98  8.90  8.84  8.92  8.74  8.84   477   567

17  8.80  8.86  8.84  8.84  8.88  8.86  8.90  9.08   644   691

18  8.66  8.81  9.14  9.22  8.74  8.92  8.42  8.52   715   720

19  7.26  7.28  7.02  7.14  7.20  7.02  6.74  6.84   509   491  

20 10.22 10.12 10.06 10.04 10.62 10.68 10.20 10.20   479   502

Source:  Ishikawa, K. (1965)

 

 

A perfectly legitimate model for such data is one involving components of variance as,

 

       

Estimates of the variances can be obtained from a nested ANOVA, and the estimated variance of an estimate for any sublot, based on just one gross sample, is their sum.  The reader should verify that this sum is 0.108 (=0.0081525+0.08940+0.01062) and thus the standard error would be 0.33.  Since the coal was about 9 percent ash the relative standard deviation is 4% and thus precision is marginally acceptable, in that 5% is the usual requirement on the standard deviation (and 10% on two standard deviations, which is what is usually called “precision” in coal sampling). 
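The verification can be done with a short script.  This is a minimal sketch of a balanced nested ANOVA on the Table 1 ash values (duplicate analyses within lab samples within gross samples within sublots), using the usual method-of-moments estimates.  The function and variable names are ours.

```python
import math

# Table 1 ash percents: x1..x4 from gross sample A, x5..x8 from gross sample B.
# Within a gross sample, (x1, x2) and (x3, x4) are duplicates on the two lab samples.
TABLE1 = [
    [9.14, 9.28, 8.92, 8.88, 9.32, 9.41, 8.50, 8.48],
    [9.66, 9.70, 9.82, 9.82, 9.46, 9.46, 9.82, 9.64],
    [7.50, 7.36, 7.42, 7.50, 7.46, 7.52, 7.48, 7.40],
    [8.72, 8.86, 8.91, 8.94, 9.10, 9.24, 9.90, 9.90],
    [9.26, 9.28, 8.82, 8.78, 8.82, 8.66, 9.42, 9.38],
    [8.98, 8.98, 8.96, 8.76, 8.90, 8.94, 8.86, 8.70],
    [8.86, 8.78, 8.86, 8.84, 9.06, 8.82, 8.96, 8.84],
    [8.72, 8.78, 8.52, 8.90, 8.68, 8.92, 8.52, 8.84],
    [8.70, 8.84, 7.20, 7.32, 8.78, 9.11, 8.98, 8.90],
    [7.30, 7.06, 7.41, 7.50, 8.50, 8.80, 7.46, 7.34],
    [8.54, 8.36, 8.02, 7.80, 8.54, 8.14, 8.26, 8.34],
    [8.34, 8.10, 8.48, 8.22, 8.32, 8.16, 8.68, 8.62],
    [7.34, 7.30, 6.94, 7.12, 7.10, 7.20, 7.20, 7.28],
    [8.94, 9.10, 9.06, 9.00, 9.14, 9.06, 9.10, 9.28],
    [8.54, 8.60, 9.00, 8.88, 8.90, 8.96, 8.76, 8.86],
    [8.98, 9.04, 8.98, 8.90, 8.84, 8.92, 8.74, 8.84],
    [8.80, 8.86, 8.84, 8.84, 8.88, 8.86, 8.90, 9.08],
    [8.66, 8.81, 9.14, 9.22, 8.74, 8.92, 8.42, 8.52],
    [7.26, 7.28, 7.02, 7.14, 7.20, 7.02, 6.74, 6.84],
    [10.22, 10.12, 10.06, 10.04, 10.62, 10.68, 10.20, 10.20],
]

def nested_components(rows):
    """Return (var_gross, var_lab, var_analysis) from a balanced 2 x 2 x 2 nesting per sublot."""
    ss_gross = ss_lab = ss_anal = 0.0
    for row in rows:
        lot_mean = sum(row) / 8
        for gross in (row[0:4], row[4:8]):
            gross_mean = sum(gross) / 4
            ss_gross += 4 * (gross_mean - lot_mean) ** 2
            for pair in (gross[0:2], gross[2:4]):
                lab_mean = sum(pair) / 2
                ss_lab += 2 * (lab_mean - gross_mean) ** 2
                ss_anal += sum((x - lab_mean) ** 2 for x in pair)
    n = len(rows)
    ms_gross = ss_gross / n        # df = n * (2 - 1)
    ms_lab = ss_lab / (2 * n)      # df = n * 2 * (2 - 1)
    ms_anal = ss_anal / (4 * n)    # df = n * 2 * 2 * (2 - 1)
    var_anal = ms_anal
    var_lab = (ms_lab - ms_anal) / 2
    var_gross = (ms_gross - ms_lab) / 4
    return var_gross, var_lab, var_anal

var_gross, var_lab, var_anal = nested_components(TABLE1)
total = var_gross + var_lab + var_anal
std_error = math.sqrt(total)
print(var_gross, var_lab, var_anal, total, std_error)
```

The three components come out close to 0.0082, 0.0894 and 0.0106 in the order gross sample, lab sample and duplicate analysis, so the large middle term belongs to the splitting of the gross sample.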

 

Alternatively to the additive components model, one can recognize that all three steps involved composites and thus three coefficients of variation enter into the variance.  For this case we decompose the random weights $a$ into the product of three independent components, $a_a$, $a_b$ and $a_c$ say.  The first has average equal to $1/N$, where $N$ is the number of increments going into each gross sample, and the other two have expected values of 1 and represent random imbalances due to sub-sampling.  They are taken to be exchangeable and independent and would be expected actually to follow such a pattern approximately. 

 

Now we apply the matrix machinery to the simple case of separate variances for each increment and make use of the fact that CV-squared of a product of independent random variables equals the sum of the separate CV-squared’s plus all possible cross products.  The result is,

                                                                         

We have written $\sigma_{IS}^2$ for the underlying gross sample variance.  The component $\sigma_a^2$ from above is equal to $\sigma_{IS}^2(1+CV_1^2)$ so that once we find $CV_1$ we can estimate $\sigma_{IS}^2$ from the estimate for $\sigma_a^2$.  In Table 1 are average weights of the twenty increments in a gross sample.  Their variance will be 1/20 of the variance of an individual increment weight and when we make this adjustment we find that $CV_1$ is about 64% for the weight of an individual increment.  Patient substitution reveals that estimates for $CV_2^2$ and $CV_3^2$ are found as $\sigma_b^2/\sigma_a^2$ and $\sigma_c^2/(\sigma_a^2+\sigma_b^2)$.  The solutions are thus $CV_2$=331% and $CV_3$=33%.  These reveal a serious problem at the sub-sampling stage and we suspect that the gross sample was not crushed before splitting.  The 64% is not alarming, nor is the 33%, but the 331% is. 
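The substitutions can be retraced numerically.  The sketch below rests on two assumptions of ours: that the 40 average weights WA and WB of Table 1 may be pooled into one sample when computing the CV of an average, and that the total variance has the product form $\sigma_{IS}^2(1+CV_1^2)(1+CV_2^2)(1+CV_3^2)$, which is what the ratios quoted above imply.

```python
import math
import statistics

# Average increment weights (grams) for gross samples A and B, from Table 1.
WA = [478, 405, 513, 407, 589, 652, 664, 563, 560, 641,
      542, 599, 540, 507, 521, 477, 644, 715, 509, 479]
WB = [496, 403, 506, 466, 574, 611, 611, 583, 569, 584,
      547, 516, 547, 511, 566, 567, 691, 720, 491, 502]
N_INCREMENTS = 20  # increments per gross sample

# Variance components quoted in the text (gross sample, lab sample, analysis).
var_a, var_b, var_c = 0.0081525, 0.08940, 0.01062

# CV of an average of 20 weights, scaled up to the CV of a single increment weight.
averages = WA + WB
cv_of_average = statistics.stdev(averages) / statistics.mean(averages)
cv1 = cv_of_average * math.sqrt(N_INCREMENTS)

cv2 = math.sqrt(var_b / var_a)
cv3 = math.sqrt(var_c / (var_a + var_b))

# Underlying gross sample variance and the reassembled total (assumed product form).
var_is = var_a / (1.0 + cv1 ** 2)
total = var_is * (1 + cv1 ** 2) * (1 + cv2 ** 2) * (1 + cv3 ** 2)
print(cv1, cv2, cv3, var_is, total)
```

The product form returns the same total as the additive model, 0.108, so the two descriptions differ only in how the total is attributed to the stages.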

 

 

5.     Resume General Review

One can find no better early (or even later) introduction to bulk sampling than E. S. Pearson’s 1934 paper on “Sampling Problems in Industry.”  He organized his discussion around three dichotomies, A. Method of sampling (1) Random versus (2) Representative, B. Type of material (1) Separate units versus (2) Bulks, C. Place of sampling (1) Production versus (2) Delivery.  Our interest is in B(2) and after giving numerous examples he offers five conclusions, (a) “the general custom is to obtain and mix together into a single gross sample, portions of material collected from a number of parts of the whole bulk,” (b) testing duplicates from the gross sample only tells the reliability of analysis, not of sampling, (c) elaborate rules are only justified if evidence from experience “provides a truly representative gross sample,” (d) “the only satisfactory procedure (for estimating variance) is for the same complete process to be carried out independently several times on the same consignment or batch and the standard deviation of these independent results obtained,” (e) sampling a stationary pile differs from sampling a stream. 

Pearson goes on to describe an experiment with a mixture of rice and peas that illustrated biases due to the different bouncing of the peas from the rice.  A second example was of hourly nitrogen contents of a chemical product that showed a trend and he illustrated the effect of getting a “representative” sample by contrasting the variability among 4 interleaved systematic samples versus random samples of the same size.  This term “representative” referred to the use of stratification or of the hidden stratification of periodic sampling, but its definition has never been operationalized carefully enough and it is now not a useful technical term. 

 

In Appendices he illustrates some calculations but one must be wary of his statistical intuition.  He finds that the “variance” of a systematic sample from an autocorrelated process increases with the degree of correlation.  That seems contrary to his endorsement of the “representative” approach above.  The problem is that the variance of concern should not be with that around the process mean but with that around the mean of just the stretch of the process serving as the population for the sampling.  He makes much of the rice and peas example but fails sufficiently to caution users about trusting such artificial demonstrations to guide work with real materials.

 

6.     Examples of Data

Among studies that do provide data we begin with those found in the two ASTM Special Technical Publications, Nos. 114 and 126.  An article by Bertholf (1952) listed 150 ash percents determined from 6 groups of 5 sets of increments for 5 sizes of increment.  The five sizes of increments were around 14, 40, 62, 82 and 104 - - all in pounds.  The 150 individual weights of increment were also listed, although there was no mention of a coefficient of variation.  The CV’s were, in fact, small at 9%, 8%, 9%, 7% and 6% for the size classes lightest to heaviest.  The data were used to do weighted analyses of variance and this is a notable achievement.  The data are in Table 2 and the reader might wish to check that most statistical packages will do the weighted analyses.

 


 

 

                     Table 2.   Data from Bertholf’s Enos sampling constants study

                    The variables y1 to y5 are ash percent for five sizes of increments

                          Obs     y1      y2      y3      y4      y5

                            1    10.8    11.3    11.7    11.5    11.6

                            2    13.6    14.2    13.5    13.6    13.4

                            3    13.7    12.1    12.8    12.4    12.4

                            4    12.1    11.8    12.7    12.4    12.0

                            5     9.9    10.3    10.1    10.2    10.8

                            6    12.7    12.2    12.7    12.5    12.8

                            7    12.3    12.1    11.8    12.1    12.1

                            8    12.1    11.2    11.1    11.2    11.6

                            9    12.7    11.7    12.2    12.0    11.9

                           10    11.8    12.5    12.1    12.5    12.1

                           11    13.4    13.4    13.5    13.5    13.5

                           12    10.8    11.2    11.2    11.6    11.0

                           13    12.7    12.4    12.4    12.3    12.7

                           14    13.2    12.8    13.1    12.9    12.7

                           15    13.7    13.7    13.2    12.7    12.9

                           16    12.2    11.9    12.4    12.7    11.8

                           17    12.5    12.5    12.9    12.9    12.7

                           18    12.6    12.7    13.0    13.1    12.9

                           19    13.4    13.3    13.2    12.8    13.3

                           20    14.0    13.9    13.2    13.5    13.0

                           21    13.2    13.1    12.8    13.8    13.2

                           22    14.4    14.3    14.6    14.9    15.0

                           23    14.3    13.5    13.8    13.8    14.5

                           24    13.5    12.8    13.0    12.1    12.8

                           25    13.2    14.0    13.6    14.5    14.0

                           26    12.1    13.0    12.6    12.8    12.8

                           27    10.5    10.3    10.9    10.9    10.8

                           28    12.3    13.3    13.6    13.6    13.3

                           29    12.6    12.4    12.6    12.5    12.4

                           30    12.5    11.8    12.2    12.2    12.0

 

For such an analysis one treats ash percents from increments of mass 14 pounds as if they were means of 14 observations, those from 40-pound increments as if they were on 40 observations and so on.  The ANOVA calculations are straightforward and produce an estimate (5.02) of the variance of an increment mean based on a one-pound (or “unit”) size.  The estimate of the variance from set to set is not so straightforward in that there are unequal numbers of “observations,” but we agree that Bertholf’s estimate of  0.87 is correct enough.  Having these two estimates of variance components, one can express the variance of increment sampling as (0.87+5.02/w)/N where N is the number of increments going into the gross sample and w is the (target) size of each increment. 
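A weighted analysis of this kind can be sketched as follows.  Several things here are assumptions of ours: the nominal sizes of 14, 40, 62, 82 and 104 pounds stand in for the individually listed increment weights, any systematic difference among sizes is ignored, and the function names are hypothetical.  The result should therefore be of the same order as the quoted 5.02 but need not reproduce it.

```python
SIZES_LB = [14, 40, 62, 82, 104]  # nominal increment sizes used as weights (assumption)

# Table 2: ash percent y1..y5 for the five increment sizes, one row per set.
TABLE2 = [
    [10.8, 11.3, 11.7, 11.5, 11.6], [13.6, 14.2, 13.5, 13.6, 13.4],
    [13.7, 12.1, 12.8, 12.4, 12.4], [12.1, 11.8, 12.7, 12.4, 12.0],
    [9.9, 10.3, 10.1, 10.2, 10.8], [12.7, 12.2, 12.7, 12.5, 12.8],
    [12.3, 12.1, 11.8, 12.1, 12.1], [12.1, 11.2, 11.1, 11.2, 11.6],
    [12.7, 11.7, 12.2, 12.0, 11.9], [11.8, 12.5, 12.1, 12.5, 12.1],
    [13.4, 13.4, 13.5, 13.5, 13.5], [10.8, 11.2, 11.2, 11.6, 11.0],
    [12.7, 12.4, 12.4, 12.3, 12.7], [13.2, 12.8, 13.1, 12.9, 12.7],
    [13.7, 13.7, 13.2, 12.7, 12.9], [12.2, 11.9, 12.4, 12.7, 11.8],
    [12.5, 12.5, 12.9, 12.9, 12.7], [12.6, 12.7, 13.0, 13.1, 12.9],
    [13.4, 13.3, 13.2, 12.8, 13.3], [14.0, 13.9, 13.2, 13.5, 13.0],
    [13.2, 13.1, 12.8, 13.8, 13.2], [14.4, 14.3, 14.6, 14.9, 15.0],
    [14.3, 13.5, 13.8, 13.8, 14.5], [13.5, 12.8, 13.0, 12.1, 12.8],
    [13.2, 14.0, 13.6, 14.5, 14.0], [12.1, 13.0, 12.6, 12.8, 12.8],
    [10.5, 10.3, 10.9, 10.9, 10.8], [12.3, 13.3, 13.6, 13.6, 13.3],
    [12.6, 12.4, 12.6, 12.5, 12.4], [12.5, 11.8, 12.2, 12.2, 12.0],
]

def unit_size_variance(rows, weights):
    """Weighted within-set mean square: variance of a unit-size (one-pound) increment."""
    total_w = sum(weights)
    ss = 0.0
    for row in rows:
        wmean = sum(w * y for w, y in zip(weights, row)) / total_w
        ss += sum(w * (y - wmean) ** 2 for w, y in zip(weights, row))
    df = len(rows) * (len(weights) - 1)
    return ss / df

def increment_sampling_variance(b, a, w, n):
    """(B + A/w)/N with set-to-set variance B, unit-size variance A, size w, N increments."""
    return (b + a / w) / n

print(unit_size_variance(TABLE2, SIZES_LB))
# Using the estimates quoted in the text: 30 increments of 14 lb each.
print(increment_sampling_variance(0.87, 5.02, 14, 30))
```

With the quoted estimates, 30 increments of 14 pounds give a variance of about 0.041.  Going to 104-pound increments lowers this only to about 0.031, since the set-to-set term 0.87 dominates.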

 

Such a formula would be valuable in helping to decide what size increment to take and how many would be needed, but it has limitations.  It is for one coal and more data would be needed to show how consistent the formula is.  However, doing separate analyses of increments is expensive and even to this day such data are scarce.  In an Appendix to his paper Bertholf shows how data should be collected from a split of the increment to estimate sample preparation variance and thereby avoid his unrealistic assumption that the data were without measurement errors.  However, this doubles the analytic expenses and may further discourage collection of additional data.  Finally, there is implicit in this method the supposition that increment variance goes down as the reciprocal of increment size, while the data themselves suggest that the exponent is closer to zero than is -1 - - perhaps even -0.5.  Such a modification does not change the results all that much but it is a conceptual hurdle in gaining acceptance for the formula.  One remembers that proving that the variance of a sample mean is proportional to the reciprocal of sample size requires that the observations be independent, and a clump of one-pound amounts of coal going into one and the same increment will likely be more alike than independence would produce.

 

There is another notable set of data in STP 114 presented by Bertholf and Webb (1954) that uses three sizes of hand-removed increments, machine increments in the same set and at alternate sets, splits, different laboratories using alternative splitting régimes, a float-sink determination - - a really monumental experiment - - but again for just that one coal.  The data from the hand-removed increments are given in Table 3 and the reader is invited to check that the components estimated from the data are 0.7474 and 6.8950. The equation that Bertholf and Webb chose to work with turned out to be (0.2+4/w)/N.  The shift from 0.7474 to 0.2 is due to definition of the lot as only one day within one coal, while the data in Table 3 cover 10 days and a short stretch of another coal.  The change from 6.8950 to 4 is due to greater reliance on a float-sink determined variance estimate. 

 

Table 3. Cabin Creek Experiment.  Sets appear as COL numbers and rows are the three sizes of hand-removed increments: 1. 14-lb, 2. 40-lb and 3. 100-lb.  y1, y2 and y3 are ash percents.  Sets 56-59 came from another (strip rather than gas) coal.

 

 Obs _NAME_ COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 COL13 COL14

  1    y1=   10.8  9.6 11.1  9.7  9.6  9.8  9.5  9.4 10.7  8.2   9.4   9.1   9.3   8.9

  2    y2=   10.0  9.6  9.5 10.0  9.3  9.7  8.6  9.8 10.5  9.8   9.5   9.2   9.7   9.5

  3    y3=   10.0  9.5  9.6  9.6  9.7 10.0  9.1  9.7  9.8  9.5   9.6   9.2   9.0   9.0

 

 Obs COL15  COL16  COL17  COL18  COL19  COL20  COL21  COL22  COL23  COL24  COL25  COL26  COL27

  1   9.6    9.1     9.8    9.6    9.2   11.6    9.5   11.1    9.8   10.5    8.7   10.3   10.5

  2   9.9    9.2    10.3    9.3   11.2   10.2    9.7    9.4   10.3    9.8   10.9    9.7   10.0

  3   9.9    9.4     9.4   10.0    9.8   10.5   10.2    9.5   11.0   10.8   10.5    9.3   10.4

 

 Obs COL28  COL29  COL30  COL31  COL32  COL33  COL34  COL35  COL36  COL37  COL38  COL39  COL40

  1   12.0    9.9   10.0   10.9    9.5    9.6   10.5   10.0   10.0    9.9   10.4    9.9    9.5

  2   11.6   10.1   10.5   10.4   10.1   10.0    9.9   10.6   10.1   10.6   11.6   10.2   10.2

  3   10.8   10.6   10.2    9.8   10.0   10.0    9.9   10.5   10.1   10.8   10.5    9.9   10.1

 

 Obs COL41  COL42  COL43  COL44  COL45  COL46  COL47  COL48  COL49  COL50  COL51  COL52  COL53

  1   10.1   10.5   11.9   11.7   10.7   10.6    9.6   10.2   10.6   10.5   9.8     9.7    9.5

  2   10.0   10.4   11.4   11.0   10.5   10.7   11.2    9.6   10.1   10.4   9.8    10.3    9.9

  3    9.9   10.6   11.2   11.1   10.9   10.4   10.4    9.1    9.9    9.8   9.5    10.5   10.2

 

 Obs COL54  COL55  (COL56 COL57  COL58  COL59) COL60  COL61  COL62  COL63  COL64  COL65  COL66

  1   12.2   11.0   7.3    7.0    6.7    7.3    11.5    9.9   11.9   11.3   9.4    11.2   11.8

  2   11.5   11.2   7.2    6.9    6.2    9.7    11.3    9.9   10.7    9.3   9.4    11.7   11.7

  3   11.9   12.1   8.2    6.7    6.1    9.2    10.7   11.1   10.3   10.4   9.4    12.0   11.1

 

 Obs COL67  COL68  COL69  COL70  COL71  COL72  COL73  COL74  COL75  COL76  COL77  COL78  COL79

  1   10.5   10.1   10.2   11.0   9.2     9.4   11.9    9.9   10.6   10.9   10.2   10.7    9.7

  2   11.2   10.4    9.8   10.6   9.6    10.0   10.2   10.8   10.9   10.3   10.5   10.6   10.0

  3   11.0   10.7   10.5   10.5   9.6     9.9   10.8   10.4   10.3   10.3   10.7   10.5    9.8

7.     Splitting Stages Design

Another experimental design was carried out by Aresco and Orning (1965) that analyzed splits at a number of stages.  As they describe it, “The increments were taken from a moving stream at equal intervals of time by means of a shovel moved across the entire width of the stream.  … successive increments by rotation were placed into four different sample containers, A, B, C, D, to give four replicate samples.  Each replicate sample consisted of 25 increments of 10 lb each or a total of 250 lb.”  They continue, “Each 250-lb gross sample was crushed to minus 8 mesh and split … into two 125 sub-samples, S and R, each of which (was further reduced and split to 4-lb sub-samples)”.  The 4-lb laboratory samples were first split into 2-lb halves and each was “riffled to 50 g.”  Two 1-gram test portions were analyzed from each 50-gram test sample.  Notice that large units were first split into halves and then riffled down to the next size unit.  The exception is the 50-gram test sample that is routinely sampled by spatula scoop. 

 

The design produced 32 determinations from each lot of coal (four replicate samples by two 125-lb sub-samples, two 2-lb laboratory samples, two 50-gram test samples and two test portions of 1 gram each).  It differs only slightly from the check sampling for precision of Ishikawa (1965).  The whole experiment, which was sponsored by the U. S. Bureau of Mines, covered a variety of coals, 73 sets from raw coals and 27 sets from washed coals.  The authors grouped the sets by particle size and used ANOVA to estimate four variance components. 

 

Our interest centers on $V_S$, sampling variance from the four replicate samples, and $V_f$, the sum of the remaining three components to represent sample preparation variance.  The Cabin Creek coal was a raw coal, around 2-inch, with ash percent of 10.  We can bracket that coal with two entries from the Aresco-Orning results for ash levels of 8.21 and 11.96 that average to $V_S$=0.097 and $V_f$=0.016.  The Cabin Creek formula, after Bertholf (1954) agonizes over the estimates, becomes (0.2+4/10)/25=0.024 for 25 increments of 10 lb each, and that is quite a bit smaller than 0.097.  The difference can be explained in part by the greater heterogeneity of the Aresco-Orning lots as compared with the relative homogeneity of the hourly increments within a day at Cabin Creek.  There is another Aresco-Orning result for 2-inch coal that is $V_S$=0.0627 and if the Cabin Creek data are allowed to show day-to-day effects as well, that formula gives 0.056, which is in good agreement - - but one can push these estimates around too easily. 

 

We should have mentioned that there were experiments to estimate sample preparation variances as well at Cabin Creek (Bertholf and Webb, 1954).  In fact four different laboratories were assigned to splits in a Latin Square pattern and four variances, one from each lab, were estimated.  The differences were striking.  One lab achieved a variance of 0.007 and another slipped to 0.103.  The Aresco-Orning value of 0.016 seems quite acceptable, while some of the Cabin Creek labs did poorly.  This variation among laboratories is a fact of life that is sometimes lost to view.    

 

8.     Two-Sizes Design

One modification to the weighted ANOVA approach, championed by Visman, is to use only two sizes of increment, choose them extreme enough to bracket the usable range of sizes, and then fit just to the two variances.  This reduction in size of experiment apparently allowed Visman to collect considerable data, but where one can find these data is somewhat of a mystery.  

 

Visman (1954) decided to work with a 2-inch untreated stove coal with impurities and coming from three different seams as he reasoned it would allow binomial variance to shine through.  As he states, “Three series of 35 samples were collected simultaneously from a shaking conveyor, at regular intervals, over a period of 4 hr (approximately 20 tons).”  These data, originally on three different sizes of increments, have been around for some time as an example of how to estimate the A and B sampling constants in his formula from just two sizes of increments.  Recall that the experimental design calls for pairs of increments to be extracted from nearby locations on the belt.  This corresponds to an agricultural experiment done in complete blocks with two treatments and uses the same philosophy of blocking to remove the trend (in coal sampling) or soil fertility (in agricultural field experiments). 

          


       

  Table 4.  Visman’s Table 1 Ash Percent for three sizes of increments at 35 locations along the belt

                                   Size 1 (aa)  Size 2 (ab)  Size 3 (ac)

                                     53.3         19.2       22.3

                                     34.8         25.1       18.1

                                     19.9         47.7       14.6

                                     16.6         10.9       20.5

                                      4.3         12.4       16.2

                                     13.0          6.4       15.4

                                     16.9         12.8       11.8

                                     11.8         38.6       28.0

                                     17.3         13.1       18.4

                                     41.8         16.0       19.3

                                     24.0         18.6       23.1

                                      8.0         21.7       22.7

                                     14.5         17.2       19.0

                                     14.9         11.4       24.0

                                     27.7         12.4       25.0

                                      8.0         24.3       18.0

                                     37.3         21.0       26.3

                                     24.0         14.2       17.3

                                     50.0         21.6       12.2

                                     17.0         12.0       24.6

                                     18.8         22.7       24.3

                                      7.6          6.5       14.2

                                     38.7         37.2       27.6

                                     12.5         15.6       13.4

                                     38.0         24.2       20.9

                                     43.2         16.2       22.0

                                     29.8         26.4       24.9

                                     36.2         26.6       23.4

                                     52.3         32.0       23.7

                                     19.8         17.4       17.8

                                      6.1          6.8       16.3

                                     34.3         26.8       16.0

                                      8.3         22.7       20.7

                                     11.5         10.9       19.5

                                     57.0         46.5       10.6

 

     

    Table 5. The MEANS Procedure for Variables of Visman’s Table 1.

 

                                                                    Coeff of
             Variable    Label                          Mean       Variation     N
             ---------------------------------------------------------------------
             wa          1. increment mass g     185.0857143      32.6728776    35
             aa          1. ash percent           24.8342857      61.0727890    35
             wb          2. increment mass g     602.1428571      22.6885650    35
             ab          2. ash percent           20.4314286      50.5265533    35
             wc          3. increment mass g         6538.89      22.8778533    35
             ac          3. ash percent           19.7742857      23.4237576    35
             ---------------------------------------------------------------------


 

 

A later paper by Visman and Aresco (1979) illustrates how the two sample variances are fitted to the model $s_i^2 = B + A/w_i$, where $i = 1$ and $2$ index the two increment sizes.  With two equations and two unknowns, the estimates are obtained by setting $A = (s_1^2 - s_2^2)/(1/w_1 - 1/w_2)$ and, once a value for $A$ is in hand, $B = s_1^2 - A/w_1$.  For this example, with $s_1^2 = 230$ and $s_2^2 = 21.5$, one finds $A = 87$ and $B = 15.5$.  Notice the relatively large size of the trend variance, which reflects the scrappy nature of the 20 tons of coal Visman (1954) sampled. 
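The two-size calculation can be sketched in a few lines.  This is only an illustration under stated assumptions: the variances are recovered from the means and coefficients of variation of rows aa and ac in Table 5 (the smallest and largest increments), and the increment masses are converted from grams to pounds.  The unit conversion is our inference, not something stated in the source; it is the choice under which the quoted constants are approximately recovered.  The function names are hypothetical.

```python
# Sketch of the two-size fit of s_i^2 = B + A / w_i.
# Assumption: masses are expressed in pounds (inferred, not stated in the text).
GRAMS_PER_POUND = 453.59237


def variance_from_cv(mean, cv_percent):
    """Recover a sample variance from a mean and a CV given in percent."""
    sd = mean * cv_percent / 100.0
    return sd * sd


def fit_two_sizes(s1_sq, w1, s2_sq, w2):
    """Solve the two equations s_i^2 = B + A / w_i for A and B."""
    a = (s1_sq - s2_sq) / (1.0 / w1 - 1.0 / w2)
    b = s1_sq - a / w1
    return a, b


# Table 5, rows aa/wa (smallest increments) and ac/wc (largest increments).
s1_sq = variance_from_cv(24.8342857, 61.0727890)
s2_sq = variance_from_cv(19.7742857, 23.4237576)
w1_lb = 185.0857143 / GRAMS_PER_POUND
w2_lb = 6538.89 / GRAMS_PER_POUND

A, B = fit_two_sizes(s1_sq, w1_lb, s2_sq, w2_lb)
print(f"s1^2 = {s1_sq:.1f}, s2^2 = {s2_sq:.1f}, A = {A:.1f}, B = {B:.1f}")
```

Under these assumptions the result lands close to the quoted A=87 and B=15.5; small differences come from rounding the variances to 230 and 21.5.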

 

There are extensions to this use of just two sample variances, in that the sample covariance might also be fitted and, of course, fitting to more sizes allows even more variances and covariances to be included.  In fact the original data with all three sizes of increments, as shown in Table 4 with relevant statistics in Table 5, can be entered into PROC CALIS of the SAS package, which fits to the six covariance statistics to yield estimates of the three components of within-block variance plus the trend variance.  The estimates are found to be 154, 68 and 14.2 as within-block variances and 7.6 for trend variance.  The question arises whether the three variances follow a reciprocal law or whether some exponent less than one in magnitude is more appropriate.  Upon fitting to logs the exponent is estimated as -0.66 (fairly close to -1) and the constant as 83.8.  This means that the variance formula will appear as $7.6 + 83.8/w^{0.66}$, as compared to the $15.5 + 87/w$ found from the two-sizes method above.  Their agreement seems to verify both approaches.
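The fit to logs can be illustrated by an ordinary least-squares line through the three points (log w, log variance).  This is a sketch only.  We assume the weights are the Table 5 mean increment masses converted to pounds (our inference), and we use a plain regression rather than whatever fitting routine produced the published figures.  The function name is hypothetical.

```python
import math

GRAMS_PER_POUND = 453.59237  # assumption: masses are expressed in pounds


def fit_power_law(weights, variances):
    """Least-squares fit of log(variance) = log(c) + slope * log(w)."""
    xs = [math.log(w) for w in weights]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    constant = math.exp(ybar - slope * xbar)
    return constant, slope


# Mean increment masses from Table 5 (grams) and the three within-block variances.
masses_g = [185.0857143, 602.1428571, 6538.89]
weights_lb = [m / GRAMS_PER_POUND for m in masses_g]
within_block = [154.0, 68.0, 14.2]

constant, slope = fit_power_law(weights_lb, within_block)
print(f"constant = {constant:.1f}, exponent = {slope:.2f}")
```

With these inputs the slope comes out near -0.67 and the constant near 84, in line with the values quoted above.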

 

9.     Time Series Method

Returning now to a time series model for data with just one size of increment, we find a more complex view of the trend in Jowett’s (1952) work.  In addition to variance the trend is viewed to have covariances or serial correlations as well.  Jowett compares two analysis of variance methods to a time series method for calculating sampling variances.  His data came from analyses of ash percent on individual increments taken at one-minute intervals while loading four trucks.  There were 20 increments taken from each truck and the data appear in Table 6. 

 

 


 

                              Table 6.     Jowett coal truck ash data                                

 

                                  y1      y2      y3      y4

 

                                 4.80    5.11    5.42    5.82

                                 4.96    4.58    5.32    5.91

                                 4.98    4.98    5.26    6.20

                                 4.58    4.43    5.17    6.12

                                 4.54    4.75    5.19    5.70

                                 4.51    5.20    5.38    5.94

                                 4.29    4.95    5.27    6.03

                                 4.55    5.11    5.54    5.90

                                 4.45    5.26    5.52    6.06

                                 4.28    5.22    6.26    5.73

                                 4.10    5.07    5.92    5.77

                                 4.30    5.16    5.53    5.68

                                 3.85    5.22    5.56    5.93

                                 3.83    6.08    5.61    5.81

                                 3.94    5.53    6.07    5.60

                                 4.01    5.56    5.61    5.84

                                 3.87    5.40    5.52    5.47

                                 3.88    5.07    5.81    5.86

                                 4.07    5.29    5.63    5.35

                                 4.13    5.22    5.48    6.11

 

 

Under the first ANOVA method Jowett created zones of four consecutive increments and estimated the within-zone variance.  He treats the zones as strata and calculates what would be the variance of a sample of 5 increments, one coming from each zone (a one-per-stratum design).  It was 0.0085.  The second ANOVA method created interleaved samples, each made up of every 4th increment, and estimated their variance.  It was 0.0053.  The time series method involved drawing a variogram, which is a plot of the average squared difference between two increments as a function of their separation distance.  Jowett then picked off the needed covariances that enter into the expression for the variance of a systematic sample.  It was 0.0059.
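The two ANOVA calculations can be sketched on the Table 6 data as follows.  This is one plausible reading of the description above, not a reconstruction of Jowett's own arithmetic, and we do not claim it reproduces his 0.0085 and 0.0053.  The assumptions are that zones are non-overlapping runs of four increments within a truck, that the within-zone variances are pooled by simple averaging, and that the interleaved samples are the four systematic samples of five increments each.  The function names are hypothetical.

```python
from statistics import mean, variance

# Table 6: one row per increment (one-minute spacing); columns are trucks y1..y4.
ROWS = [
    (4.80, 5.11, 5.42, 5.82), (4.96, 4.58, 5.32, 5.91),
    (4.98, 4.98, 5.26, 6.20), (4.58, 4.43, 5.17, 6.12),
    (4.54, 4.75, 5.19, 5.70), (4.51, 5.20, 5.38, 5.94),
    (4.29, 4.95, 5.27, 6.03), (4.55, 5.11, 5.54, 5.90),
    (4.45, 5.26, 5.52, 6.06), (4.28, 5.22, 6.26, 5.73),
    (4.10, 5.07, 5.92, 5.77), (4.30, 5.16, 5.53, 5.68),
    (3.85, 5.22, 5.56, 5.93), (3.83, 6.08, 5.61, 5.81),
    (3.94, 5.53, 6.07, 5.60), (4.01, 5.56, 5.61, 5.84),
    (3.87, 5.40, 5.52, 5.47), (3.88, 5.07, 5.81, 5.86),
    (4.07, 5.29, 5.63, 5.35), (4.13, 5.22, 5.48, 6.11),
]
TRUCKS = [list(column) for column in zip(*ROWS)]


def stratified_variance(series, zone_size=4):
    """Pooled within-zone variance divided by the number of zones,
    i.e. the variance of the mean of a one-per-zone sample."""
    zones = [series[i:i + zone_size] for i in range(0, len(series), zone_size)]
    pooled_within = mean([variance(zone) for zone in zones])  # equal zone sizes
    return pooled_within / len(zones)


def interleaved_variance(series, step=4):
    """Sample variance among the means of the interleaved systematic samples."""
    sample_means = [mean(series[start::step]) for start in range(step)]
    return variance(sample_means)


for name, series in zip(("y1", "y2", "y3", "y4"), TRUCKS):
    print(name,
          f"stratified = {stratified_variance(series):.4f}",
          f"interleaved = {interleaved_variance(series):.4f}")
```

Averaging the per-truck figures is one way to arrive at a single number for each method; whether Jowett pooled across trucks in this way is not stated here.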

 

The interleaved samples approach is closest to Pearson’s (1934) call for simply replicating a sample design and can serve as a reference method.  The stratification approach leads to an overestimate that likely results from serial correlation within zones not reflected in the 0.0085 but still operating to improve the variance of the systematic sample.  Jowett advocates the variogram approach because of its flexibility in furnishing variances for many spacings of systematic samples.  However, his manipulations of graphs and formulas raise questions of how accessible the method is to unskilled users such as myself. 
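For readers who, like us, find the graphical manipulations daunting, the raw ingredient of the variogram is easy to compute.  The sketch below uses only the first truck (column y1 of Table 6) and takes the variogram at lag h to be the average squared difference between increments h minutes apart.  Many authors halve this quantity and call it the semi-variogram; which convention Jowett followed is not stated here, so treat the scaling as an assumption.  The step from variogram to the variance of a systematic sample is not attempted.

```python
# Column y1 of Table 6: ash percent of 20 increments at one-minute intervals.
Y1 = [4.80, 4.96, 4.98, 4.58, 4.54, 4.51, 4.29, 4.55, 4.45, 4.28,
      4.10, 4.30, 3.85, 3.83, 3.94, 4.01, 3.87, 3.88, 4.07, 4.13]


def empirical_variogram(series, max_lag):
    """Average squared difference between values `lag` positions apart."""
    result = {}
    for lag in range(1, max_lag + 1):
        squared = [(series[i + lag] - series[i]) ** 2
                   for i in range(len(series) - lag)]
        result[lag] = sum(squared) / len(squared)
    return result


vgram = empirical_variogram(Y1, max_lag=8)
for lag, value in vgram.items():
    print(f"lag {lag}: {value:.4f}")
```

A variogram that keeps rising with lag, as one would expect for a series with a downward drift like y1, signals the kind of trend that makes closely spaced systematic samples more precise than random ones.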

 

Time series methods are much used in connection with data from continuous emissions monitors and nuclear activation analyzers and other non-invasive methods of getting data on analyte concentrations from bulk material and from coal in particular.  We are not sure that these techniques belong as part of “bulk sampling”.  That is, we cannot point to any definite amount of material as having been analyzed.  At any rate they will require a quite different statistical model and so we decided not to include them in this review. 

 

10. General Treatments of Bulk Sampling

A great many more papers and even books have been written on bulk sampling and its variances, but we find that they offer very little additional data and a great deal of speculation.  A few notable events deserve mention.  We mentioned Deming and the sampling of bales of wool, and should have recounted that Deming presided over the conference accompanying the first ASTM STP 114.  Judging from the discussions of those papers, there was something of a clash between survey sampling, based on listings of identifiable items, and the dynamic nature of bulk sampling on the fly.  I suspect that Deming, and even Tukey who also attended, did not completely appreciate the operational aspects of coal sampling.  Coming from a background of bales of wool that could be listed may have caused survey samplers to miss the need for treating the particle as a sampling unit (see our earlier review). 

 

In general discussions of bulk sampling (Bicking, 1967) much had been made of the distinction between segmented lots and unsegmented lots but even so we notice that enumerative sampling types (like myself) persist in listing locations where increments can be extracted and in believing there is some box or bag containing the increment when there definitely is not.  Perhaps some such confusion accompanied the discussions in papers, too numerous and confusing to cite, between Acheson Duncan, who did pioneering work (Duncan, 1962) on bulk sampling, namely on fertilizer (in bags), and Jan Visman who offered a “general theory” based largely on his coal experiences (Visman, 1969).  Duncan was concerned with the appropriateness of the basic formula because he suspected intracluster correlation among the particles.  Visman countered with artificial experiments with lead pellets and, most unfortunately in my view, Duncan did his own experiments with the pellets to show correlations.  They were not able to resolve their differences as would be expected. 

 

Arguments over variance laws such as these two had have been very common in the field of sampling.  This is because sampling units are generally located in a stochastic process with complicated correlations that can follow any number of special patterns.  The intracluster correlation is widely used in survey sampling, while Smith’s b is common in agricultural surveys, as are other “laws,” many listed by Cochran (1977, Section 9.5).  In bulk sampling the “Smith’s b” approach is upheld by Rose’s (1983) gauge invariance, by the fractional Brownian motion of Mandelbrot and Van Ness (1968) and by Vanmarcke’s (1983) “exponential power law”.  The variogram is touted by Gy (1982) and may be entering into the standards literature as well.  The proponents tend to argue hotly for their cause but generally by showing how well their own version fits what little data they can find.  They seldom show how badly the other ones do because generally the other ones do almost as well.

 

11. Recent Developments

There is a class of formulas for anticipating the variances of a bulk sampling design that shows high promise and that we have worked on ourselves, and so we cannot resist mentioning it.  These formulas have the horrible acronym BSPVF, for bulk sampling planning variance formula.  They build on the formulas for particle sampling reviewed previously and accept the basic multi-stage nature of bulk sampling in that they are a sum of variances all having the same basic form.  They owe much to Gy (1982) (see also Patricia Smith, 2001), but they eschew the variogram and instead employ Smith’s b for both systematic sampling and clustered sampling adjustments.  They make no distinction between increment sampling and sample preparation.  They require a detailed specification of the sampling setting, and perhaps that is their main virtue: the designer of the plan must tell all the details.  An example of the use of this formula is found in the Appendix, from which we list the specifications,

 

1.      The analyte percent in the lot is P. (P=10% ash content is not uncommon.)

2.      The specific gravity of the material is SpG.  (SpG=1.4 is usual.)

3.      The stages are indexed by $i = 1, 2, \ldots, F$, where $F$ stands for the “final” or analysis stage. ($F=1$ means each increment is analyzed and this is done, for example, by nuclear activation analysis on a flowing coal stream, but is exceptional.  More commonly $F=3$, 4 or 5 with the last two stages as sample preparation and the first ones as increment extraction stages.)

4.      Lot size is $em_1$, which is the entering amount at the first stage in grams, while entering amounts are $em_i = n_{i-1} m_{i-1}$ for the other stages.  This supposes that all stages must be viewed as extracting a number of increments.

5.      The number of evenly spaced increments (which can be riffler compartments) is denoted $n_i$.

6.      The target increment (or compartment) size is written $m_i$ in grams.

7.      The coefficient of variation of increment sampling rates is $CV_i$.  ($CV_i=0.3$ more or less.)

8.      The nominal top size diameter of particles entering the $i$th stage is $d_i$ in centimeters.

9.      The liberation diameter in centimeters of free analyte particles is $d_L$.  (It is taken to be 0.2 cm for raw coal and 0.01 cm for washed coal.)

10.  The level of handling effectiveness is CA, which ranges from 1 to 4 and governs how far above 1 the $b_{IS,i}$ values go and how far below 1 the $b_{SP,i}$ values go. 

 

Here is the formula,

            


 


This formula is based on data but only remotely.  It was checked against the Aresco-Orning sets, for example. 

 

12. References

 

Aresco, S. J. and Orning, A. A. (1965) “A Study of the Precision of Coal Sampling, Sample Preparation and Analysis,”  Transactions of the Society of Mining Engineers, vol. 253, pp. 258-264.

 

ASTM. Designation: D 2013, (1985) “Standard Method of Preparing Coal Samples for Analysis,” Annual Book of ASTM Standards, vol. 05.05, Gaseous Fuels; Coal and Coke. 

 

Bertholf, W. M., (1952a), “The Analysis of Variance in a Sampling Experiment,” ASTM, STP 114, Symposium on Bulk Sampling, Pa.  Presented at the fifty-fourth annual meeting, American Society for Testing Materials, Atlantic City, N.J., June 18, 1951.  Sponsored by Committee E-11 on Quality Control of Materials.

 

Bertholf, W. M., (1952b) “The Design of Coal Sampling Procedures,” (with discussion) ASTM, STP 114, Symposium on Bulk Sampling, Pa., pp. 46-65.

 

Bertholf, W. M., (1954) “The Development of the Theoretical Basis of Coal Sampling,” pp. 5-29, ASTM STP 162, Symposium on Coal Sampling, Pa. 

 

Bertholf, W. M. and Webb, W. L. (1954) “Tests of the Geary-Jennings Sampler at Cabin Creek,” ASTM, STP 162, Symposium on Coal Sampling, Pa.

 

Bicking, C. A. (1967), “The Sampling of Bulk Materials,” Materials Research and Standards, vol. 7, pp. 95-116.

 

Cochran, W. G., (1977), Sampling Techniques, 3rd edition, Wiley, NY.

 

Gy, P. M., (1982), Sampling of Particulate Materials: Theory and Practice, Elsevier, NY

 

Ishikawa, K. (1965), “Some Experimental Methods for Bulk Material Sampling,” Report on Seminar on Sampling of Bulk Materials, U. S. – Japan Cooperative Science Program, National Science Foundation and Japan Society for Promotion of Science.  Tokyo, 187-223. 

 

 

Jowett, G. H., (1952) “The Accuracy of Systematic Sampling from Conveyor Belts,” Applied Statistics, vol. 1, pp. 50-

 

Kassel, L. S. and Guy, T. W., (1935), “Determining the Correct Weight of Sample in Coal Sampling," Journal of Industrial and Engineering Chemistry, vol. 7, pp 112-115.

 

Mandelbrot, B. B. and Van Ness, J. W. (1968), “Fractional Brownian Motions, Fractional Noises and Applications,”  SIAM Review, vol. 10, pp. 422-437.

 

Merks, J. W., (1985), Sampling and Weighing of Bulk Solids, Gulf Pub. Co., Houston. 

 

Rose, C. D., (1983), “Variances in Sampling Streams of Coal,” Journal of Testing and Evaluation,  vol. 11, pp. 320-   .

 

Smith, H. F., (1937), “An Empirical Law Describing Heterogeneity in the Yields of Agricultural Crops,” Journal of Agricultural Science, vol. 28, pp 1-23.

 

Smith, P. L., (2001), A Primer for Sampling Solids, Liquids, and Gases Based on the Seven Sampling Errors of Pierre Gy, SIAM-ASA, Philadelphia.

 

Tanner, L. and Deming, W. E., (1950), “Some Problems in the Sampling of Bulk Materials,” ASTM Proceedings, vol.49, pp. 1181-1188

 

Vanmarcke, E., (1983), Random Fields, Analysis and Synthesis, MIT Press, Mass.

 

Visman, J. and Aresco, S. J., (1979), Chapter 2, “Sampling of Coal,” in Leonard, J. W. (Ed.), Coal Preparation, 4th edition, American Institute of Mining, Metallurgical and Petroleum Engineers. 

 

Visman, J. (1954) “Tests on the Binomial Sampling Theory for Heterogeneous Coals,” ASTM STP 162, Symposium on Coal Sampling, Pa.

 

 

 

Appendix.  Calculating Planning Variance and Estimating Attained Variance from Bulk Sampling with an Example Showing Costs

 

 

 

1.    Introduction

 

Statistical theory for the sampling of bulk materials is somewhat different from survey sampling theory as found in textbooks such as Cochran’s (1977).  Both theories, however, have two kinds of variance formula, one based on guesses or judgments of population parameters and the other based on data.  For the mean of a random sample these are written $\sigma^2/n$ in the one case and $s^2/n$ in the other.  In the case of bulk sampling theory we have a BSPVF, a bulk sampling planning variance formula, in the one case, versus sample variances among interleaved sub-composites in the other.

 

We propose here to describe what we see as the statistical theory of bulk sampling, particularly in the ways it is distinguished from survey sampling theory, and to derive the two kinds of variance formula.  We suggest that getting more bulk sampling data will be necessary for improving both types of formula, and we wonder why this has not happened.  Our answer is that perhaps the complexities of any bulk sampling plan have discouraged experimentation just as they have made theory construction so difficult. 

 

2.    Statistical Theory of Bulk Sampling is Not Enumerative Sampling Theory

 

 

Bulk material is stuff in big containers or in a big pile.  It can even be soil on a farm field or river water, but our main interest here is with particulate materials exemplified by coal flowing over a conveyor belt.  By definition material is in bulk whenever it can be partitioned only at great cost.  Sampling a bulk means extracting amounts of material.  A bulk sampling plan is a protocol or series of instructions for getting and processing material up to and including a measurement or analysis step.

 

The main difference between the enumerative sampling theory found in Cochran (1977) and the statistical model for bulk sampling is in the clerical and physical nature of the sampling frame.  Much of bulk sampling uses no frame.  However, a cross-stream cutter, for example, can be programmed to sweep at a given moment and such moments can be listed.  Even so, there is no pre-existing, naturally defined object, item or element corresponding to such a moment.  The material extracted is defined “on the fly.”

 

One might claim that the hypothetically infinite collection of amounts that could be extracted at a given moment form the basis of a measurement error distribution and this addition of measurement errors is all that would be needed to bring bulk sampling into line with enumerative sampling.  This would only be true if each extracted amount were analyzed (and such schemes do exist as, for example, nuclear activation analyzers or continuous emissions monitors) but in most cases the amounts extracted are further sub-sampled “on the fly” as well.  These further sources of randomness bring the theory of bulk sampling away from enumerative sampling and back into general statistical theory.   

 

Of course, the principles of enumerative sampling theory, or of sampling theory generally, are still applicable.  For example, we recommend listing the possible moments for extracting and using random numbers as starts for drawing multiple systematic samples as a basis for variance calculations.  Such recommendations are not presently being followed but we might hope that better understanding of their statistical basis will advance their acceptance.

 

3.    The Planning Variance Formula

 

 

In the face of the great variety of equipment and settings it is remarkable that a more or less standardized sampling plan has evolved.  One begins by extracting increments as amounts of material taken from all sectors of the lot.  This material is possibly crushed and sub-sampled, then sub-sampled and crushed again, down to a test sample, which is pulverized and an amount as small as one gram is taken for analysis.  Sub-sampling is done either by extracting increments (during the first few “increment selection” stages) or by splitting (for the later “sample preparation” stages).  Extracting is done by falling stream collectors or by cross stream cutters.   Splitting is done by rifflers – either the 2-pan Jones type or the rotary splitters with several compartments.  The procedures have been described by Merks (1985) and Smith (2001).

 

The statistical specification of a bulk sampling plan (with some coal sampling values in parentheses) is the following.

 

1.  The analyte percent in the lot is $P$. ($P=10\%$ ash content is not uncommon.)

2.  The specific gravity of the material is SpG.  (SpG=1.4 is usual.)

3.  The stages are indexed by $i = 1, 2, \ldots, F$, where $F$ stands for the “final” or analysis stage. ($F=1$ means each increment is analyzed and this is done, for example, by nuclear activation analysis on a flowing coal stream, but is exceptional.  More commonly $F=3$, 4 or 5 with the last two stages as sample preparation and the first ones as increment extraction stages.)

4.  Lot size is $em_1$, which is the entering amount at the first stage in grams, while entering amounts are $em_i = n_{i-1} m_{i-1}$ for the other stages.

5.  The number of evenly spaced increments (riffler compartments) is denoted $n_i$.

6.  The target increment (or compartment) size is written $m_i$ in grams.

7.  The coefficient of variation of increment sampling rates is $CV_i$.  ($CV_i=0.3$ more or less.)

8.  The nominal top size diameter of particles entering the $i$th stage is $d_i$ in centimeters.

9.  The liberation diameter of free analyte particles is $d_L$.  (It is taken to be 0.2 cm for raw coal and 0.01 cm for washed coal.)

10.  The level of handling effectiveness is CA, which ranges from 1 to 4 and governs how far above 1 the $b_{IS,i}$ values go and how far below 1 the $b_{SP,i}$ values go. 

 

The variance formula based on these specifications is,


 

 


Development of this formula began in the 1930s and 1940s (Kassel and Guy, 1935) and was greatly advanced by Gy (1982).  The current version now includes all stages in one expression, the $CV_i$ are explicitly recognized and b-values are chosen to represent the effects of process correlations.  These recent developments are described in a series of studies appearing on the web page, www4.stat.ncsu.edu/~cproctor/.  A word on the use of b-values and the parameter CA may be in order.

 

From the standpoint of statistical theory the BSPVF arose initially from applied probability with particles behaving as independently and randomly sampled units.  However, particles obey natural correlations and sampling is systematic, not random.  Several approaches compete for dealing with these departures from random sampling.  Some theories use intra-cluster correlation coefficients, others rely on variance components and still others on correlograms and variograms.  Our preference is for one called variously gauge invariance (Rose, 1983), power law (Vanmarcke, 1983), fractional Brownian motion (Mandelbrot and Van Ness, 1968) or Smith’s Empirical Law (Smith, 1937).  Smith used b as the notation for the exponent and we follow his lead.

 

The use of a summary score CA to replace the series of judgment calls needed to set all of the $b_{IS,i}$ and $b_{SP,i}$ values represents a realistic admission of ignorance of such detailed information.  These b-values reflect both the choice of equipment and the adjacency correlations arising from the material.  That is, the efficiencies of systematic sampling depend both on how steep or flat the process correlogram is and on how equally and widely spaced the increments are.  Similarly, the losses in variance to intra-cluster correlation depend on what riffling equipment is used as well as on how strongly the particles adhere.  There seemed to be consistency over stages in the degree of care taken to separate increments and to maintain their order.  For example, a CA=1 level would be grab sampling, a CA=2 level may be gridded sampling, a CA=3 may represent a cross-stream cutter and CA=4 a rotary riffler.  These would normally be associated respectively with a CA=1 level for a dumped compositing, a CA=2 level for some attempt to preserve ordering in feeding to the next stage and a CA=3 level for separate bagging of increments to be fed in order to the next stage. 

 

4.    Illustrating the Formula

 

The BSPVF was developed by successive trial and error fittings to the relatively few estimates of variance components by stages that appear in the literature and these efforts have been described in the Studies on the web page.  Although it appears that the variance formula will work for other materials (We have applied it to fertilizer and peanuts in studies on the web page.), it is best that we consider only percent ash in coal from here on as we deal with costs.  We found very little data on costs but we have applied our common sense judgment to derive a bulk sampling planning cost formula, BSPCF.  This exercise verified that the specifications were sufficiently complete.  It also forced us to recognize that naming specific pieces of equipment would furnish too much detail.  The compromise was to exploit the level-of-control quantity, CA, to summarize how well arrangement of particles is being preserved as well as how expensive this might be.

 

To illustrate both formulas let’s apply them to answer a rhetorical question posed on his web site by Charlie Rose, a coal-sampling consultant.  The question is, “Should the sample crusher product be 8-mesh or 4-mesh?”  A case was cited of an agreement by coal vendors to send to the buyer, or more precisely to the utility company’s laboratory, a one-kilogram 8-mesh sample.  Particle diameter for 8-mesh is 0.236 cm and for 4-mesh it is 0.475 cm.  Charlie pointed out that the ASTM D 2013 (1985) standard for percent ash also allows the sample to be of four kilograms and 4-mesh, and he suggested this may be less costly overall.  Tables 1 and 2 give variances and costs, as calculated by BSPVF and by BSPCF, for the two alternative plans.   

 

Table 1 covers the case of the smaller, more finely crushed, sample while Table 2 gives details of the alternative plan.  Notice that the sample sizes passed from stage two to stage three follow the requirements.  In the first column of Table 1 we see diameters for the stages: the lot top size is 5.4 cm (about 2 1/8’’) and the primary composite is crushed to 0.236 cm, or 8-mesh.  In Table 2 the primary composite is crushed only to 0.475 cm, or 4-mesh.  From column (12) we find the amounts entering the third stage are 999.9 grams, or one kilogram, and 3990.0 grams, or about 4 kilograms, as was required. 
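The entering amounts in column (12) follow directly from the specification $em_i = n_{i-1} m_{i-1}$, and the check is easy to script.  The sketch below takes the numbers of samples and amounts sampled from columns (2) and (3) of Tables 1 and 2 and the first-stage entering amount from column (12); the function name is hypothetical.

```python
import math


def entering_amounts(first_stage_grams, n_samples, amounts_grams):
    """Entering amount per stage: em_1 is given, em_i = n_(i-1) * m_(i-1)."""
    entering = [first_stage_grams]
    for i in range(1, len(n_samples)):
        entering.append(n_samples[i - 1] * amounts_grams[i - 1])
    return entering


N_SAMPLES = [30, 30, 16, 1]                 # column (3), both tables
AMOUNTS_8_MESH = [3000.0, 33.33, 3.9, 1.0]  # column (2), Table 1
AMOUNTS_4_MESH = [3000.0, 133.0, 3.9, 1.0]  # column (2), Table 2

em_8_mesh = entering_amounts(1200000.0, N_SAMPLES, AMOUNTS_8_MESH)
em_4_mesh = entering_amounts(1200000.0, N_SAMPLES, AMOUNTS_4_MESH)

print("8-mesh plan:", [round(x, 1) for x in em_8_mesh])
print("4-mesh plan:", [round(x, 1) for x in em_4_mesh])
```

The third entries are the amounts delivered to the laboratory, about one kilogram and about four kilograms respectively, and the fourth entry is the 62.4-gram test sample in both plans.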

 

The first two stages would be carried out by the vendor with, perhaps, a cross-stream cutter, then a crusher, followed by riffling or perhaps by a secondary cutter.  The last two stages would be done in the utility’s laboratory: crushing to 8-mesh if they get a 4-mesh sample, and riffling in either case to a 62.4-gram test sample, which is then pulverized to 60-mesh and one gram analyzed.     

 

Notice in column (6) that the extra crushing does reduce variance a bit, from 0.82012 to 0.81540, but there are cost differences as can be seen from column (18).  Costs at the third (laboratory) stage are higher (280.775 pesos) for 4-mesh material than for 8-mesh (263.530 pesos).  Costs at stage 2 (the vendor’s sampling system) are lower for 4-mesh (528.375 pesos) than for 8-mesh (615.977 pesos).  We call the cost unit a peso because it is so fraught with judgments equating wage rates, times required, installation and maintenance, and so on.  Charlie judges $2 to $4 extra per sample where the BSPCF finds a difference of 17.245 pesos, so the exchange rate seems to be about 6 to 1.  The difference to the vendor is thus about $15 per sample by BSPCF.  Charlie enumerates some expensive, though not too common, costs but does not take an expectation.  We give in the Appendix the SAS code for BSPCF (and for BSPVF), which shows what cost coefficients we applied to what cost factors, and we would welcome help from others with better judgment in improving the formula.     
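The cost arithmetic above can be laid out as a short script using the stage costs in column (18) of Tables 1 and 2.  Taking $3, the midpoint of the $2 to $4 range, as the dollar value of the laboratory difference is our assumption for illustration, as is the rounding of the resulting rate to 6 pesos per dollar.

```python
# Anticipated costs by stage, column (18), in "pesos".
COSTS_8_MESH = [541.115, 615.977, 263.530, 55.229]  # Table 1
COSTS_4_MESH = [541.115, 528.375, 280.775, 55.229]  # Table 2

# Stage 3 is the utility's laboratory; stage 2 is the vendor's sampling system.
lab_extra_pesos = COSTS_4_MESH[2] - COSTS_8_MESH[2]
vendor_saving_pesos = COSTS_8_MESH[1] - COSTS_4_MESH[1]

ASSUMED_LAB_EXTRA_DOLLARS = 3.0   # assumption: midpoint of the $2 to $4 judgment
ASSUMED_PESOS_PER_DOLLAR = 6.0    # assumption: the rounded rate used in the text

implied_rate = lab_extra_pesos / ASSUMED_LAB_EXTRA_DOLLARS
vendor_saving_dollars = vendor_saving_pesos / ASSUMED_PESOS_PER_DOLLAR

print(f"laboratory pays {lab_extra_pesos:.3f} pesos more for 4-mesh")
print(f"implied exchange rate: {implied_rate:.2f} pesos per dollar")
print(f"vendor saves {vendor_saving_pesos:.3f} pesos, "
      f"about ${vendor_saving_dollars:.2f} per sample")
print(f"total cost: 8-mesh {sum(COSTS_8_MESH):.2f}, 4-mesh {sum(COSTS_4_MESH):.2f}")
```

The totals agree with the last entries of the cumulative cost column, which is a useful check that the stage costs were read correctly.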

 

Supposing that all parties agree to work with 4-mesh, we might check that the slight variance increase does not affect precision too adversely.  Since the estimate for the whole lot will be the average of the four sublot determinations the variance of the plan is found as 0.82012/4=0.20503.  Taking a square root gives the standard error as SE=0.45280 so that 2SE=0.9056.  It is 2SE that is required to be less than 1/10 of the estimate.  In this case we are using P=10 and 1/10 of 10 is 1 so the anticipated 2SE satisfies the precision requirement.  We will describe next how to check this.   
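The precision check just described is simple enough to script.  The sketch below assumes, as the text does, that the lot estimate is the plain average of independent sublot determinations and that the requirement is 2SE below one tenth of the analyte percent; the function name is hypothetical.

```python
import math


def plan_precision(sublot_variance, n_sublots, analyte_percent):
    """Return lot variance, SE, 2SE and whether 2SE is under P/10."""
    lot_variance = sublot_variance / n_sublots
    standard_error = math.sqrt(lot_variance)
    two_se = 2.0 * standard_error
    requirement = analyte_percent / 10.0
    return lot_variance, standard_error, two_se, two_se < requirement


# 4-mesh plan (Table 2): cumulative variance 0.82012, four sublots, P = 10.
lot_var, se, two_se, meets = plan_precision(0.82012, 4, 10.0)
print(f"variance = {lot_var:.5f}, SE = {se:.5f}, 2SE = {two_se:.4f}, "
      f"meets requirement: {meets}")
```

Substituting the 8-mesh variance of 0.81540 shows how little the choice of crusher product matters for precision in this example.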

 

5.    Estimating Sampling Variances

 

Bulk sampling plans with $n_F = 1$ are very common and yield just the one determination with no possibility for an estimate of sampling variance.  Even if $n_F$ is upped to, say, 5 the variability among the five determinations is only that of the analysis stage.  It is possible, I have been told, to run two or more independent sets of sampling equipment over the same lot, but I have seen no such data.  The only practical way to estimate sampling variance is to use interleaved sub-composites over a series of sublots.  We already described sublots in connection with the example above so it remains to define interleaving. 

 

Interleaved sub-composites are formed by numbering the $n_1$ increments in order as they are extracted and separating the material of the odd-numbered ones from that of the even-numbered ones.  It is easy to say “separating”, but it may well tax the sampling system to do so, although the better systems have this capability.  Further steps of sub-sampling are done in a “routine manner.”  One should perhaps double $n_1$ so that the two sub-composites have the same amounts of material as the initial sampling plan called for, but again this can over-tax the system.  To build up degrees of freedom it is necessary to first identify a quite large lot and create equal-sized sublots (10 or 20 of them) so as to obtain interleaved sub-composites in each sublot.  Interleaved sub-composites can also be created at later stages, and thereby furnish estimates of variance components that correspond directly to the variance contributions from BSPVF at each stage. 
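Once the odd and even sub-composites have been analyzed in each sublot, the variance estimate is a matter of squared differences.  The sketch below is a minimal illustration with made-up ash determinations, not data from any study.  It treats the two sub-composites in a sublot as if they were independent replicates, which is the usual working approximation for interleaved samples, and the function name is hypothetical.

```python
def interleaved_variance_estimates(pairs):
    """Estimate sampling variance from (odd, even) sub-composite pairs.

    Returns the variance of a single sub-composite determination, the
    variance of a sublot mean (average of its two sub-composites), and
    the degrees of freedom (one per sublot).
    """
    k = len(pairs)
    if k == 0:
        raise ValueError("at least one sublot pair is required")
    sum_sq_diff = sum((odd - even) ** 2 for odd, even in pairs)
    var_single = sum_sq_diff / (2.0 * k)
    var_sublot_mean = sum_sq_diff / (4.0 * k)
    return var_single, var_sublot_mean, k


# Hypothetical ash percents for 10 sublots: (odd sub-composite, even sub-composite).
HYPOTHETICAL_PAIRS = [
    (10.4, 9.7), (9.9, 10.6), (10.1, 10.3), (9.5, 10.2), (10.8, 10.0),
    (10.2, 9.6), (9.8, 10.9), (10.0, 10.1), (10.5, 9.4), (9.7, 10.3),
]

var_single, var_mean, df = interleaved_variance_estimates(HYPOTHETICAL_PAIRS)
print(f"single sub-composite variance = {var_single:.4f} on {df} df")
print(f"sublot-mean variance          = {var_mean:.4f}")
```

If $n_1$ was not doubled, each sub-composite holds half the increments of the planned sample, so the sublot-mean figure is the one to set beside the planning variance.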


 

Table 1. BSPVF-Specified for raw, 5.4 cm top size, 10 000 ton lot, 4 sublots, 1kg to lab

 

      (1)         (2)        (3)          (4)          (5)          (6)           (7)

                Amount     Number      Control       Control

   Top size    sampled       of          over         over      Cumulative     Cumulative

     (cm)      (grams)    samples    arrangement    equality         dv           costs

 

     5.400     3000.00       30           2             3         0.14446         541.11

     0.236       33.33       30           2             3         0.17942        1157.09

     0.236        3.90       16           2             3         0.52310        1420.62

     0.025        1.00        1           2             3         0.81540        1475.85

 

     (8)        (9)         (10)         (11)         (12)         (13)         (14) 

                                                   Entering                  Increment

   Analyte              Liberation    Specific       amount     Constant     selection

   percent    Stages     diameter      gravity      (grams)    adjustment        b

 

      10         1          0.2          1.4      1200000.0      0.19245      1.36751

      10         2          0.2          1.4        90000.0      0.92057      1.24319

      10         3          0.2          1.4          999.9      0.92057      1.11887

      10         4          0.2          1.4           62.4      2.82843      0.99455

 

        (15)          (16)         (17)          (18)        (19)    

      Sample        CV of

   preparation    sampling    Anticipated    Anticipated

        b           rates       var-comp        costs       product

 

     0.63158         0.3        0.14446        541.115      78.1685

     0.66667         0.3        0.03496        615.977      21.5342

     0.70588         0.3        0.34369        263.530      90.5715

     0.75000         0.0        0.29230         55.229      16.1433

 

 

Table 2. BSPVF-Specified for raw, 5.4 cm top size, 10 000 ton lot, 4 sublots, 4 kg to lab

                Amount     Number      Control       Control
   Top size    sampled       of          over         over       Cumulative    Cumulative
     (cm)      (grams)    samples    arrangement    equality         dv           costs

     5.400      3000.0       30           2             3         0.14446         541.11
     0.475       133.0       30           2             3         0.18414        1069.49
     0.236         3.9       16           2             3         0.52782        1350.26
     0.025         1.0        1           2             3         0.82012        1405.49

                                                   Entering                  Increment
   Analyte              Liberation    Specific       amount     Constant     selection
   percent    Stages     diameter      gravity      (grams)    adjustment        b

      10         1          0.2          1.4      1200000.0      0.19245      1.36751
      10         2          0.2          1.4        90000.0      0.64889      1.24319
      10         3          0.2          1.4         3990.0      0.92057      1.11887
      10         4          0.2          1.4           62.4      2.82843      0.99455

      Sample        CV of
   preparation    sampling    Anticipated    Anticipated
        b           rates       var-comp        costs       product

     0.63158         0.3        0.14446        541.115      78.1685
     0.66667         0.3        0.03968        528.375      20.9654
     0.70588         0.3        0.34369        280.775      96.4982
     0.75000         0.0        0.29230         55.229      16.1433

 

 

   

 

 

There is another type of sub-composite that needs to be distinguished so that it can be avoided when the aim is to estimate total sampling variance.  This is the sub-increment sub-composite.  Each increment is divided into a number of parts, and the material from the parts numbered 1 becomes one sub-composite, that from the parts numbered 2 another, and so on.  These sub-composites can be further sub-sampled in a “routine manner” and yield determinations whose average is a fine estimate of the analyte level in the lot.  However, their variance tells relatively little about the variance of the sampling plan: it tells only about within-increment variation.
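A small numerical illustration may make the distinction concrete.  The ash percents below are invented, and the case is deliberately extreme: the increments differ from one another but the two parts of any one increment agree exactly, and all parts are assumed to have equal mass so that simple averages apply.

```python
from statistics import mean

# Hypothetical ash percents: 6 increments, each divided into 2 parts.
# Increment levels differ, but the two parts of an increment agree exactly
# (no within-increment variation), an extreme case chosen for illustration.
increment_levels = [9.0, 11.5, 10.2, 8.7, 12.1, 9.9]
parts = [(level, level) for level in increment_levels]

# Interleaved sub-composites: odd-numbered versus even-numbered increments.
odd = mean(p[0] for p in parts[0::2])
even = mean(p[0] for p in parts[1::2])

# Sub-increment sub-composites: part 1 of every increment versus part 2.
part1 = mean(p[0] for p in parts)
part2 = mean(p[1] for p in parts)

print(abs(odd - even))     # roughly 0.4: between-increment variation shows up
print(abs(part1 - part2))  # 0.0: between-increment variation cancels
```

The sub-increment pair agrees perfectly even though the increments vary, so its difference carries no information about the variation that dominates the sampling variance.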

 

Numerous examples of analyses of variance from data on interleaved sub-composites can be found in the studies cited on our web page.  One difficulty we found when analyzing such data is the lack of a full specification of the sampling plan.  This is hardly surprising, considering that the BSPVF requires knowing P, SpG, F, em1, the ni, the mi, the di, dL, the CVi, as well as all the b-values.  Look again at Tables 1 and 2.  Some of these values can be set at default quantities if one is familiar with routine methods.  However, a statistician is not usually so well informed, and we also notice that engineers are uncomfortable “guessing” at what might have gone on. 

 

3.    Illustration of Variance Estimation

 

With the least disturbance to the existing sampling plan used by the vendor in the previous example, one can create two interleaved sub-composites from the 30 increments taken in each sublot.  The resulting plan is specified, and its costs and variances anticipated, in Table 3.  Now two samples, each of 2 kg of 4-mesh material, are passed to the utility’s laboratory, where they are crushed to 8-mesh and a 62.4-gram test sample is obtained, which is then pulverized and analyzed.  

 

Eight determinations result from this plan, and their average becomes the estimate for the lot.  Differences within the four pairs form the basis for variance estimation.  Each difference-squared-over-two estimates a within-sublot variance.  Averaging these and dividing by 8 gives the estimate of sampling variance from the survey data.  I hope we will see such data some day.  Although the anticipated precision from the eight determinations (2SE=0.74) will be better than that from the original four, there is also anticipated to be a 30.7% increase in total cost (vendor plus laboratory).  These results are based on judgments, not data, and it will be of interest to see if they agree with others’ judgments.  Variance estimation does not come free.
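The estimator just described, and the two anticipated figures quoted, can be sketched as follows.  The function name and the four pairs of ash percents are hypothetical.  The cost comparison assumes that total cost is the cumulative cost per determination (918.326 in Table 3, 1405.49 in Table 2) times the number of determinations; that reading is an assumption, although it does reproduce the 30.7% quoted above.

```python
from math import sqrt


def lot_variance_from_pairs(pairs):
    """Estimate the variance of the lot mean from paired determinations."""
    # Each difference-squared-over-two estimates a within-sublot variance.
    within = [(a - b) ** 2 / 2 for a, b in pairs]
    n_determinations = 2 * len(pairs)
    return (sum(within) / len(within)) / n_determinations


# Hypothetical ash percents for the two sub-composites in each of 4 sublots.
pairs = [(10.4, 9.6), (9.9, 10.9), (10.2, 10.0), (11.0, 9.8)]
estimate = lot_variance_from_pairs(pairs)
print(estimate, 2 * sqrt(estimate))

# Anticipated values, using the cumulative dv and costs of Tables 2 and 3.
two_se = 2 * sqrt(1.10265 / 8)                     # about 0.74
cost_increase = (8 * 918.326) / (4 * 1405.49) - 1  # about 0.307
print(round(two_se, 2), round(100 * cost_increase, 1))
```

With only four pairs the estimate rests on four degrees of freedom, which is the reason given earlier for wanting 10 or 20 sublots.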

 

 

 

 

 

Table 3. BSPVF-Specified for raw 2-inch as 10 000 ton lot in 4 sublots 4 kg to lab
interleaved subcomposites from same size design

                Amount     Number      Control       Control
   Top size    sampled       of          over         over       Cumulative    Cumulative
     (cm)      (grams)    samples    arrangement    equality         dv           costs

     5.400      3000.0       15           2             3         0.37274        301.115
     0.475       133.0       15           2             3         0.46667        590.957
     0.236         3.9       16           2             3         0.81035        863.097
     0.025         1.0        1           2             3         1.10265        918.326

                                                   Entering                  Increment
   Analyte              Liberation    Specific       amount     Constant     selection
   percent    Stages     diameter      gravity      (grams)    adjustment        b

      10         1          0.2          1.4      1200000.0      0.19245      1.36751
      10         2          0.2          1.4        45000.0      0.64889      1.24319
      10         3          0.2          1.4         1995.0      0.92057      1.11887
      10         4          0.2          1.4           62.4      2.82843      0.99455

      Sample        CV of
   preparation    sampling    Anticipated    Anticipated
        b           rates       var-comp        costs       product

     0.63158         0.3        0.37274        301.115      112.237
     0.66667         0.3        0.09393        289.842       27.224
     0.70588         0.3        0.34369        272.140       93.531
     0.75000         0.0        0.29230         55.229       16.143


 

Figure 1.  SAS code for computing BSPVF and BSPCF

 

data dorig;
  input d m n ca ce;  * specifications by levels of control;
  retain oem cumdv cumdc oca oce od;

  * a missing control level is carried down from the previous stage;
  if ca=. then ca=oca;
  if ce=. then ce=oce;
  lta=ca; sta=ca;

  p=10;        * put in level of analyte;
  anastg=4;    * put in number of analysis (last) stage;
  anavar=(.01*p)*(.01*p);  * add var from analysis std dev at 1% of P;
  stages=_n_;  * number of sub-sampling stages operationally recognized;
  dl=d;        * put in diameter at liberation if known or if washed put d/10 at washed stages;
  dl=.2;       * active setting for raw coal (put an asterisk in front if coal is washed);
  *dl=.01;     * remove asterisk only if coal is washed;
  spg=1.4;     * if material is other than coal put in its specific gravity here;

  * initializing lot size as em;
  if _n_=1 then do;
    oca=ca; od=d; cumdv=0; cumdc=0; em=1200000;
  end;

  c=sqrt(dl/d)*(spg/1.4);  * correction for liberation diameter and for density;
  bis=(1-stages/12)*exp(lta/5);
  bsp=(sta/5+.2)/(1-stages/20);
  cv=1.2-.3*ce;
  if n=1 then cv=0;  * no CV if only one increment;

  * cost coefficients;
  c0=10**(-3*ca);
  c1=10*(ca-1)+ce;
  c2=.001;
  c3=.0004;
  c4=10;

  if _n_>1 then em=oem;  * entering amount is calculated from collected amounts;

  * variance component (BSPVF) for this stage and its running total;
  dv=((1+cv*cv)*(c*100*p*(((2*d/3)**3)/m)**bsp))*n**(-bis);
  if stages=anastg then dv=dv+anavar;
  cumdv=cumdv+dv;

  * cost (BSPCF) for this stage and its running total;
  dc=c0*em+c1*n+c2*n*m+c3*em*(1/d-1/od)+c4*log(em/m);
  cumdc=cumdc+dc;
  product=dc*dv;

  * values carried to the next stage;
  oca=ca; oce=ce; od=d;
  oem=m*n;  * collected amounts;

  drop anastg anavar od c0 c1 c2 c3 c4 oca oce lta sta oem;
  label em='Entering amount (grams)' d='Top size (cm)' m='Amount sampled (grams)'
        n='Number of samples' c='Constant adjustment' p='Analyte percent'
        cumdc='Cumulative costs'
        dv='Anticipated var-comp' cumdv='Cumulative dv' stages='Stages'
        cv='CV of sampling rates' ca='Control over arrangement' ce='Control over equality'
        dl='Liberation diameter' spg='Specific gravity' dc='Anticipated costs'
        bis='Increment selection b' bsp='Sample preparation b';
  title 'BSPVF-Specified for raw 2-inch as 10 000 ton lot in 4 sublots 4kg to lab';
  title2 'interleaved subcomposites from same size design';
cards;
5.4  3000 15 2 3
.475 133  15 . .
.236 3.9  16 . .
.025 1    1  . .
;

proc print label noobs;
run;
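For readers without SAS, the same calculation can be transcribed into Python.  What follows is a sketch of Figure 1 as read here, not a separately validated program: the function name bspvf_table, its argument names and the dictionary keys are all made up for this illustration, and None stands in for the SAS missing value.  The stage specifications are those of the cards lines above, so the results should agree with Table 3.

```python
from math import exp, log, sqrt


def bspvf_table(specs, p=10.0, dl=0.2, spg=1.4, lot_grams=1_200_000.0,
                analysis_stage=4):
    """Return one dict per stage; specs holds (d, m, n, ca, ce) tuples.

    ca or ce given as None is carried down from the previous stage,
    as the SAS code does for missing values.
    """
    rows = []
    cum_dv = cum_dc = 0.0
    em = lot_grams        # entering amount at stage 1 (em in the SAS code)
    prev_d = specs[0][0]  # makes the size-reduction cost term zero at stage 1
    prev_ca = prev_ce = None
    for stage, (d, m, n, ca, ce) in enumerate(specs, start=1):
        ca = prev_ca if ca is None else ca
        ce = prev_ce if ce is None else ce
        c = sqrt(dl / d) * (spg / 1.4)            # constant adjustment
        b_is = (1 - stage / 12) * exp(ca / 5)     # increment selection b
        b_sp = (ca / 5 + 0.2) / (1 - stage / 20)  # sample preparation b
        cv = 0.0 if n == 1 else 1.2 - 0.3 * ce    # CV of sampling rates
        dv = ((1 + cv * cv) * c * 100 * p
              * ((2 * d / 3) ** 3 / m) ** b_sp * n ** (-b_is))
        if stage == analysis_stage:
            dv += (0.01 * p) ** 2                 # analysis variance
        dc = (10 ** (-3 * ca) * em + (10 * (ca - 1) + ce) * n + 0.001 * n * m
              + 0.0004 * em * (1 / d - 1 / prev_d) + 10 * log(em / m))
        cum_dv += dv
        cum_dc += dc
        rows.append({"stage": stage, "em": em, "dv": dv, "dc": dc,
                     "cum_dv": cum_dv, "cum_dc": cum_dc})
        prev_ca, prev_ce, prev_d = ca, ce, d
        em = m * n                                # collected amount moves on
    return rows


# Stage specifications from the cards lines of Figure 1 (Table 3 design).
table3_specs = [(5.4, 3000, 15, 2, 3), (0.475, 133, 15, None, None),
                (0.236, 3.9, 16, None, None), (0.025, 1, 1, None, None)]
rows = bspvf_table(table3_specs)
for row in rows:
    print(row["stage"], round(row["dv"], 5), round(row["dc"], 3))
```

Changing the first two specifications to 30 samples each should give the Table 2 design, which allows the two plans to be compared without rerunning SAS.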