Technical publications concerning ILC

Lab proficiency testing: Proposals for limits that balance risks of triggering false alerts and lack of true alerts

Proficiency testing (PT) is based on scores that must not exceed limits (typically 2 and 3 for bias), which check whether participants belong to the main population. These limits are always conventional, and the associated theoretical α risks are not always the same (limits 2 and 3 correspond to 2.275% and 0.135%, while ISO 5725-2 considers 1% and 5% risks). The usual practice of using 2 warning levels makes it possible to distinguish doubtful performances from bad ones. However, the probability of failing to declare participants' results as outliers (β risk) is not considered with these limits. In this study, rather than ignoring the β risk, we defined a “doubtful” zone as the zone where both α and β risks are low and balanced. This avoids the usual situations where β is very large, i.e. PT with very low power. We determined the corresponding limits for assessing bias and repeatability with α=β=1% at a confidence level of 90%. For bias, these “bands of doubt” are close to the usual ones when n=110, wider for lower values of n and narrower for higher values. We also determined limits using non-parametric methods, which are then expressed as ranks rather than scores. Unsurprisingly, this approach is less efficient and less powerful, and should be used only when parametric methods cannot be used.
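The theoretical α risks quoted for limits 2 and 3 can be checked from the standard normal distribution. The following is a minimal sketch using only the Python standard library; the function name `one_sided_alpha` is chosen here for illustration, and the calculation assumes Gaussian scores and a one-sided tail.

```python
from statistics import NormalDist


def one_sided_alpha(limit: float) -> float:
    """Probability that a standard normal score exceeds `limit` (one tail)."""
    return 1.0 - NormalDist().cdf(limit)


for limit in (2.0, 3.0):
    print(f"limit {limit}: alpha = {100 * one_sided_alpha(limit):.3f}%")
# limit 2.0: alpha = 2.275%
# limit 3.0: alpha = 0.135%
```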

Download the full text: Proposals for balanced limits of alerts

Proficiency testing of repeatability

Repeatability is a key feature of the performance of a laboratory. However, ISO 13528 does not provide much information about this issue, and the tools of ISO 5725-2 (i.e. the Cochran algorithm and Mandel k-scores) are intended to identify outliers rather than to assess the performance of participants. To address this issue, we developed a new “zr-score” based on the well-known equation for the distribution of standard deviation estimates. The conditions of validity of any assessment of repeatability are discussed (e.g. adequate statistical parameters, homoscedasticity, methods to determine an assigned value of repeatability, heterogeneity of test specimens and its relationship with the experimental scheme, outliers, …). The PT scheme needs to be well designed to bring to light the true values of the repeatability standard deviations of participants. A Monte Carlo study was used to check the effectiveness and power of the assessment of repeatability using the zr-score, Mandel k-scores and the Cochran algorithm (the methods of ISO 13528 appeared very poor and were not included). The zr-score appeared to be the simplest and the most effective of the 3. Moreover, all 3 methods appeared quite powerful, detecting all or almost all outliers even with only 10 participants.
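The distribution of standard deviation estimates mentioned above is commonly written as (n-1)s²/σ² following a chi-square law with n-1 degrees of freedom, for Gaussian data. The sketch below does not reproduce the zr-score itself, whose formula is not given in this abstract; it only checks that relationship by simulation. The function name and the parameter values are assumptions made for illustration.

```python
import random
import statistics


def scaled_variance_samples(n, sigma, trials, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for Gaussian samples of size n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        values = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(values) / n
        s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
        samples.append((n - 1) * s2 / sigma**2)
    return samples


# Hypothetical case: 5 test results per participant, true repeatability SD of 2.
samples = scaled_variance_samples(n=5, sigma=2.0, trials=20000)
# A chi-square law with n - 1 = 4 degrees of freedom has mean 4 and variance 8.
print(statistics.fmean(samples), statistics.variance(samples))
```

The simulated mean and variance should land close to 4 and 8, which is what makes a score built on this law comparable between participants.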

Download the full text: Proficiency testing of repeatability

Intervals of confidence on nested standard deviations (or on nested variances)

In many situations, standard deviations (SD) need to be computed from a nested design. This typically happens in quality control procedures and in laboratory intercomparisons. In such situations, the basics of computing SD are well known, but existing methods to compute the interval of confidence (IC) on them are rather unsatisfactory. In particular, they completely fail to account for the negative values that are often encountered for the corresponding estimated variances. This article provides equations that describe well the distributions of variances of nested levels and their scatter, provided that their true values are known. Both the case of 2 nested levels and the case of more than 2 nested levels are considered. Inverting these equations to find the IC on the true values of the variances as a function of their estimates is unfortunately impossible when the variances of the lower levels are unknown. However, this article proposes approximate equations that can be used when the impact of the variances of the lower levels can be expected to be low. Methods to check whether this condition is fulfilled are also proposed. When it is not, the numbers of repetitions at the lower levels need to be increased to get an acceptable determination of the IC.
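To illustrate how negative variance estimates arise, here is a minimal sketch of the classical analysis-of-variance estimates for a balanced design with 2 nested levels. It is not the method of the article: the function name, the design (5 groups of 2 replicates) and the true SD values are assumptions chosen for illustration.

```python
import random


def nested_variance_estimates(groups):
    """Return (lower-level variance, upper-level variance) estimates.

    `groups` is a list of equally sized lists of replicate results.
    """
    p = len(groups)
    n = len(groups[0])
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / p
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (p * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (p - 1)
    # The upper-level estimate is a difference, so it can be negative.
    return ms_within, (ms_between - ms_within) / n


# Hypothetical design: 5 groups, 2 replicates, upper SD 0.5, lower SD 1.0.
rng = random.Random(1)
trials = 2000
negatives = 0
for _ in range(trials):
    groups = []
    for _ in range(5):
        effect = rng.gauss(0.0, 0.5)
        groups.append([effect + rng.gauss(0.0, 1.0) for _ in range(2)])
    if nested_variance_estimates(groups)[1] < 0:
        negatives += 1
print(f"negative upper-level variance estimates: {negatives / trials:.0%}")
```

With few replicates and a lower-level variance that dominates, a noticeable share of the simulated estimates is negative, which is the situation the article addresses.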

Download the full text: Intervals of confidence on nested standard deviations

Intervals of confidence on quantiles of Gaussian distributions

In some situations, typically when declarations of conformity are to be issued, quantile values of a Gaussian distribution need to be estimated. While calculating an estimate of a quantile is easy, calculating the related interval of confidence (IC) is not. Knowledge of such an IC is needed particularly when levels of confidence about decisions of conformity are required, whether the specifications are min or max limits or characteristic values. This document provides the technical background of these calculations, a table of IC limits computed by the Monte Carlo method according to the desired quantile and level of confidence, an Excel file for computing them, the minimum numbers of values needed to get a given interval of confidence, and empirical formulas to estimate them for the usual values of desired quantile and level of confidence.
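As an illustration of a Monte Carlo approach to such an IC, the sketch below relies on the fact that, for Gaussian data, the quantity (true quantile - sample mean) / sample SD has a distribution that does not depend on the unknown mean and SD. This is one possible way to proceed, not necessarily the one used in the document; the function name, the two-sided equal-tail interval and the data values are assumptions.

```python
import random
from statistics import NormalDist, fmean, stdev


def quantile_interval_factors(n, p, confidence, trials=20000, seed=0):
    """Monte Carlo factors (k_low, k_high) such that the p-quantile lies in
    [mean + k_low * s, mean + k_high * s] with the requested confidence."""
    rng = random.Random(seed)
    z_p = NormalDist().inv_cdf(p)
    pivots = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pivots.append((z_p - fmean(sample)) / stdev(sample))
    pivots.sort()
    tail = (1.0 - confidence) / 2.0
    k_low = pivots[int(tail * trials)]
    k_high = pivots[int((1.0 - tail) * trials) - 1]
    return k_low, k_high


# Hypothetical data set of 10 test results (illustrative values only).
data = [10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9, 10.3, 10.2]
k_low, k_high = quantile_interval_factors(n=len(data), p=0.95, confidence=0.90)
m, s = fmean(data), stdev(data)
print("estimate of the 95% quantile:", m + NormalDist().inv_cdf(0.95) * s)
print("90% interval:", (m + k_low * s, m + k_high * s))
```

The factors depend only on n, the quantile and the confidence level, which is why they lend themselves to tabulation.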

Download the full text: Intervals of confidence on estimates quantile of a Gaussian distribution

How to assure reliability in the determination of uncertainties of test results

Results of CompaLab ILC (interlaboratory comparisons) show that uncertainties are significantly underestimated by participants. A gradation of test methods can be established, from mainly metrological methods to methods whose sources of uncertainty are mainly qualitative. Uncertainties are globally well determined for the former, while they are globally underestimated by a factor of 10 or more for the latter. This probably comes from a massive preference for GUM method B to determine them, whatever the test method. However, method B is effective in metrology but not when significant qualitative sources of uncertainty are present. The GUM also lacks guidance on some issues specific to testing. Furthermore, ILC and laboratory quality surveillance results can be re-used for GUM method A, which provides much better estimates of uncertainties and requires significantly less time and money than method B. When accurate determination of uncertainties is important, collaborative method A experiments (i.e. specifically designed ILC) should be organised, whose results can afterwards be used in very effective internal quality surveillance programs. Determining uncertainties should always begin with a clarification of their intended use and a collection of the available information concerning the precision of testing. The most appropriate method to determine uncertainties depends heavily on this and, in most cases, the answer is not method B.

Download the full text: Reliability of uncertainties of test results

Interlaboratory comparisons for hardness tests: interpolation of assigned values according to loading charges

This study investigates whether, in a lab proficiency test, hardness test results for a given Brinell or Vickers scale can be assessed when a sufficient number of test results is available for adjacent scales. 5 different methods are found to determine the assigned value, and 2 different methods are found to determine the proficiency standard deviation, the repeatability standard deviation and the uncertainty on the assigned value. The best option depends on the interlaboratory testing conditions. A procedure is described to deal with the different possible options and to propose parameters for checking the adequacy of each of them, to help choose the most suitable one. The results obtained with this procedure were assessed on CompaLab ILC results from the years 2017-2023, showing very small differences in the scoring of participants for available scales. When the size of the input data is large, the output scoring is even likely to be more efficient than the usual one.

Download the full text: ILC about hardness: Interpolation of VA according to load

Appropriate rankits to use for normal probability plots and Standard deviation probability plots

Normal probability plots are usually used to check whether a distribution can be regarded as Gaussian, to visualise whether some values are likely to be outliers and, using a linear regression, to estimate its mean value and its standard deviation. In the same way, “SD probability plots”, based on the distribution of standard deviation estimates, could be quite useful to reach similar goals: checking whether a hypothesis of homoscedasticity can be accepted, visualising estimates that are likely to be outliers, and estimating the true underlying standard deviation. In practice, a change of variable is necessary to convert the rank of each value into a corresponding cumulative probability, followed by an inverse Gaussian transformation to get a “rankit” to be used as the ordinate for these plots. Equations of the form (i-a)/(N+1-2a) with 0 ≤ a ≤ 1 are usually used to determine the cumulative probabilities. In fact, at least for small values of N, the choice of the “a” value has an important impact on the conclusions that are drawn afterwards. This document:

Discusses the grounds of these equations;
Evaluates their adequacy for a series of situations and types of distribution laws;
Proposes equations to determine “a” values as a function of N, which provide better rankits than those usually used and make it possible to estimate mean values and/or standard deviations without bias for a series of situations;
Proposes an accurate way to determine confidence envelope curves for normal probability plots and for probability plots of any distribution whose cumulative function is known.
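The rankit calculation described above can be sketched as follows, using only the Python standard library. The default a=0.375 is one commonly used convention, not the value proposed in the document, and the function names `rankits` and `fit_mean_sd` are chosen here for illustration.

```python
from statistics import NormalDist


def rankits(n, a=0.375):
    """Cumulative probabilities (i - a) / (n + 1 - 2a) for i = 1..n,
    mapped through the inverse Gaussian cumulative function."""
    nd = NormalDist()
    return [nd.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]


def fit_mean_sd(values, a=0.375):
    """Least-squares line of the sorted values against their rankits:
    the intercept estimates the mean, the slope estimates the SD."""
    xs = rankits(len(values), a)
    ys = sorted(values)
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


# The choice of "a" visibly changes the extreme rankits for small N.
for a in (0.0, 0.375, 0.5):
    print(a, [round(r, 3) for r in rankits(5, a)])
```

Comparing the first and last rankits across the three values of “a” shows why the choice matters most for small N: the extreme points carry the most leverage in the regression.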

Download the full text: Appropriate rankits for probability plots

Beta risk in proficiency testing in relation with the number of participants

Abstract:

The Monte Carlo method was applied to PT schemes to investigate their efficiency. The probabilities that the computed z value is over 3 while the true value is less than 2, and that the computed z value is less than 2 while the true value is over 3, are computed for a series of situations: number of participants from 5 to 30, various ratios of repeatability to reproducibility and numbers of test results per participant, with or without the introduction of outliers with z from 3.5 to 10. For each situation, the probabilities of failing to detect true outliers and of triggering false alerts are discussed. Guidance and keys are proposed to check and improve the efficiency of real PT programs.
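A Monte Carlo estimate of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation used in the study: the outlier's true bias is taken as `outlier_z` times the between-laboratory SD, the assigned value is the median of the reported means, and the reference SD is the scaled median absolute deviation (factor 1.483). The function name and all parameter values are hypothetical.

```python
import random
import statistics


def beta_risk(n_participants, n_results, sigma_l, sigma_r, outlier_z,
              trials=2000, seed=0):
    """Estimate the probability that a truly outlying participant
    obtains a computed |z| below 2 (a missed alert)."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        reported = []
        for lab in range(n_participants):
            # Assumption: lab 0 is the outlier, with a fixed true bias.
            if lab == 0:
                true_bias = outlier_z * sigma_l
            else:
                true_bias = rng.gauss(0.0, sigma_l)
            results = [true_bias + rng.gauss(0.0, sigma_r)
                       for _ in range(n_results)]
            reported.append(statistics.fmean(results))
        # Robust centre and spread (assumed choice for this sketch).
        center = statistics.median(reported)
        mad = statistics.median([abs(x - center) for x in reported])
        z = (reported[0] - center) / (1.483 * mad)
        if abs(z) < 2.0:
            missed += 1
    return missed / trials


for n in (6, 30):
    risk = beta_risk(n_participants=n, n_results=2, sigma_l=1.0,
                     sigma_r=0.5, outlier_z=5.0)
    print(f"{n} participants: estimated beta risk = {risk:.3f}")
```

Varying `n_results` and `sigma_r` in this sketch gives a feel for how the number of test results per participant and the repeatability affect the chance of missing an outlier.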

Abstract of conclusions:

This study demonstrates that:

The ratio λ=σr/(σL×Nr) is of prime importance in controlling the efficiency of a PT scheme, even more than the number of participants. PT providers should therefore pay attention to Nr, the number of test results per participant that they request;
Even in adverse conditions, the α-risk is always very low (less than 0.7%);
Robust algorithms improve the efficiency of the PT program (i.e. the β-risk) at a slight expense to the α-risk (which always remains very low). This comes from the significantly better estimation of the reference standard deviation that these algorithms provide when an outlier is present among the participants;
6 participants are enough to detect a strongly outlying participant provided that good PT conditions (i.e. a low value of λ) are present;
PT with a low number of participants is (almost) always better than no PT at all.

ISO 5725-1 and ISO 13528 recommend not organising an ILC with fewer than 12 participants. This makes sense for ISO 5725-1, whose goal is to determine the performance of a test method. It makes less sense for ISO 13528, whose goal is to check the performance of a lab. Obviously, when no PT is organised, the β-risk is 100%: a lab having a problem can never become aware of it! Consequently, for test methods that are performed by a small number of labs, it is clearly better to organise a PT with 6 participants than nothing. In those cases, the PT provider should pay special attention to the Nr it requests, to ensure a proper λ value and consequently the best possible efficiency.

Download the full text: Beta risks in proficiency testing EN

Download corresponding scientific publication