Relative standard deviation. Standard deviation

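The variance is calculated using the formula:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,$$

where $X_i$ are the random (current) values and $n$ is the number of values;

$\bar{X}$, the average value of the random variable for the sample, is calculated using the formula:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$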

So, variance is the average square of the deviations. That is, the mean is calculated first; then the difference between each original value and the mean is taken, squared, summed, and divided by the number of values in the population.

The difference between an individual value and the mean reflects the measure of deviation. It is squared so that all deviations become non-negative and so that positive and negative deviations do not cancel each other out when summed. Then, given the squared deviations, we simply calculate their arithmetic mean.

The answer to the magic word “variance” lies in just these three words: average - square - deviations.
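The three steps (average, square, deviations) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `variance` and the sample values are assumptions made for the example, and the divisor is the number of values, as described above.

```python
def variance(values):
    """Population variance: the average of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                         # step 1: the average
    squared = [(x - mean) ** 2 for x in values]    # step 2: squared deviations
    return sum(squared) / n                        # step 3: their average

data = [2, 3, 4, 7, 9]   # illustrative values
print(variance(data))
```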

Standard deviation (RMS deviation)

Taking the square root of the variance, we obtain the so-called “standard deviation”. It is also called the “root-mean-square deviation” or “sigma” (after the Greek letter σ). The formula for the standard deviation is:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

So the variance is sigma squared, that is, the standard deviation squared.

The standard deviation, obviously, also characterizes the spread of the data, but now (unlike the variance) it can be compared directly with the original data, since both have the same units of measurement (this is clear from the calculation formula). The range of variation is the difference between the extreme values. Standard deviation, as a measure of uncertainty, is also involved in many statistical calculations. It is used to determine the accuracy of various estimates and forecasts. If the variation is very large, then the standard deviation will also be large, and therefore the forecast will be inaccurate, which shows up, for example, as very wide confidence intervals.

Therefore, in methods of statistical data processing in real estate assessments, depending on the required accuracy of the task, the two or three sigma rule is used.

To compare the two-sigma rule and the three-sigma rule, we use Laplace’s formula:

$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - a}{s}\right) - \Phi\left(\frac{\alpha - a}{s}\right),$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$ is the Laplace function;

α = minimum value

β = maximum value

s = sigma value (standard deviation)

a = average

Here a special case of Laplace's formula is used, in which the bounds α and β of the random variable X are equally spaced from the center of the distribution a = M(X) by a certain amount d: α = a − d, β = a + d. Then

$$P(|X - a| < d) = 2\Phi\left(\frac{d}{s}\right) \qquad (1)$$

Formula (1) determines the probability of a given deviation d of a normally distributed random variable X from its mathematical expectation M(X) = a. Taking d = 2s and d = 3s in formula (1) in turn, we obtain:

$$P(|X - a| < 2s) = 2\Phi(2) \approx 0.954 \qquad (2)$$

$$P(|X - a| < 3s) = 2\Phi(3) \approx 0.9973 \qquad (3)$$
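The two probabilities can be checked numerically. The sketch below assumes X is exactly normal and uses the identity P(|X − a| < k·s) = erf(k/√2), which corresponds to twice the Laplace function taken as the integral from 0 to k; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - a| < k*s) for a normal X with mean a and standard deviation s."""
    return math.erf(k / math.sqrt(2))

p2 = prob_within(2)   # two-sigma rule
p3 = prob_within(3)   # three-sigma rule
print(round(p2, 4), round(p3, 4))
```

The results agree with the values 0.954 and 0.9973 quoted for the two rules.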

Two sigma rule

It can be stated almost with certainty (with a confidence probability of 0.954) that the values of a normally distributed random variable X deviate from its mathematical expectation M(X) = a by no more than 2s (two standard deviations). The confidence probability (Pd) is the probability of events that are conventionally accepted as certain (their probability is close to 1).

Let's illustrate the two-sigma rule geometrically. Fig. 6 shows a Gaussian curve with the distribution center a. The area bounded by the entire curve and the Ox axis is equal to 1 (100%), and the area of the curvilinear trapezoid between the abscissas a − 2s and a + 2s, according to the two-sigma rule, is equal to 0.954 (95.4% of the total area). The area of the shaded regions is 1 − 0.954 = 0.046 (≈5% of the total area). These regions are called the critical region of the random variable. Values of the random variable that fall into the critical region are unlikely and in practice are conventionally treated as impossible.

The probability of conditionally impossible values is called the significance level of the random variable. The significance level is related to the confidence probability by the formula:

$$q = (1 - P_d)\cdot 100\%,$$

where q is the significance level expressed as a percentage.

Three sigma rule

When solving problems that require greater reliability, where the confidence probability (Pd) is taken equal to 0.997 (more precisely, 0.9973), the three-sigma rule, which follows from formula (3), is used instead of the two-sigma rule.

According to the three-sigma rule, with a confidence probability of 0.9973 the critical region is the region of attribute values outside the interval (a − 3s, a + 3s). The significance level is 0.27%.

In other words, the probability that the absolute value of the deviation exceeds three times the standard deviation is very small, namely 0.0027 = 1 − 0.9973. This means that it happens in only 0.27% of cases. Such events, based on the principle that unlikely events are impossible, can be considered practically impossible. That is, the sample is highly accurate.

This is the essence of the three sigma rule:

If a random variable is distributed normally, then the absolute value of its deviation from the mathematical expectation does not exceed three times the standard deviation (MSD).

In practice, the three-sigma rule is applied as follows: if the distribution of the random variable being studied is unknown, but the condition specified in the above rule is met, then there is reason to assume that the variable being studied is normally distributed; otherwise it is not normally distributed.
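As a rough illustration of this check, the sketch below lists the values of a sample that fall outside the interval of three standard deviations around the mean. It is only a sketch under stated assumptions: the function name `outside_three_sigma` and the data are hypothetical, and the sample standard deviation (division by n − 1) is used because the true parameters are unknown. An empty result does not prove normality; it only means the rule's condition is not violated.

```python
import statistics

def outside_three_sigma(values):
    """Return the values lying farther than three sample standard deviations from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    return [x for x in values if abs(x - mean) > 3 * s]

regular = [2, 3, 4, 7, 9]                 # hypothetical data without extreme values
with_outlier = [10] * 11 + [100]          # hypothetical data with one extreme value

print(outside_three_sigma(regular))
print(outside_three_sigma(with_outlier))
```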

The level of significance is taken depending on the permitted degree of risk and the task at hand. For real estate valuation, a less precise sample is usually adopted, following the two-sigma rule.

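To calculate the simple geometric mean of n values $x_1, \dots, x_n$, the formula is used:

$$\bar{x}_{geom} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

Geometric weighted

To determine the weighted geometric mean with weights (frequencies) $f_i$, the formula is used:

$$\bar{x}_{geom} = \sqrt[\sum_i f_i]{\prod_i x_i^{f_i}}$$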

The average diameters of wheels, pipes, and the average sides of squares are determined using the mean square.

Root-mean-square values ​​are used to calculate some indicators, for example, the coefficient of variation, which characterizes the rhythm of production. Here the standard deviation from the planned production output for a certain period is determined using the following formula:

These values ​​accurately characterize the change in economic indicators compared to their base value, taken in its average value.

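Quadratic simple

The root mean square of n values $x_1, \dots, x_n$ is calculated using the formula:

$$\bar{x}_{q} = \sqrt{\frac{\sum_i x_i^2}{n}}$$

Quadratic weighted

The weighted mean square with weights (frequencies) $f_i$ is equal to:

$$\bar{x}_{q} = \sqrt{\frac{\sum_i x_i^2 f_i}{\sum_i f_i}}$$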
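The geometric and quadratic means, simple and weighted, can be sketched as follows. The sketch assumes the usual definitions (the geometric mean as the root of the product of the values, the quadratic mean as the square root of the mean of the squares) and positive values for the geometric mean; the function names and the `weights` parameter are hypothetical. Omitting the weights gives the simple versions.

```python
import math

def geometric_mean(values, weights=None):
    """Weighted geometric mean; with no weights, every value counts once."""
    weights = weights or [1] * len(values)
    # Work with logarithms so that long products do not overflow.
    log_sum = sum(w * math.log(x) for x, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))

def quadratic_mean(values, weights=None):
    """Weighted quadratic mean (root mean square)."""
    weights = weights or [1] * len(values)
    total = sum(w * x * x for x, w in zip(values, weights))
    return math.sqrt(total / sum(weights))

print(geometric_mean([2, 8]))            # simple geometric mean
print(quadratic_mean([1, 7]))            # simple quadratic mean
print(quadratic_mean([1, 7], [3, 1]))    # weighted quadratic mean
```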

22. Absolute indicators of variation include:

range of variation

average linear deviation

variance

standard deviation

Range of variation (R)

The range of variation is the difference between the maximum and minimum values of the attribute:

$$R = x_{\max} - x_{\min}$$

It shows the limits within which the value of a characteristic changes in the population being studied.

Example. The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years. Solution: range of variation = 9 − 2 = 7 years.
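The same calculation as a short Python sketch; the variable names are chosen for the example only.

```python
experience = [2, 3, 4, 7, 9]   # years of work experience from the example

# Range of variation: largest value minus smallest value.
variation_range = max(experience) - min(experience)
print(variation_range)  # 7
```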

For a generalized description of differences in attribute values, average indicators of variation are calculated from the deviations from the arithmetic mean. A deviation is the difference between an individual value and the mean.

To prevent the sum of the deviations from the mean from vanishing (the zero property of the mean), one must either ignore the signs of the deviations, that is, take their absolute values, or square the deviations.

Average linear and square deviation

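Average linear deviation is the arithmetic mean of the absolute deviations of individual values of a characteristic from the mean.

The simple average linear deviation:

$$\bar{d} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

Example. The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years.

In our example: $\bar{x} = (2 + 3 + 4 + 7 + 9)/5 = 5$ years; $\bar{d} = (|2-5| + |3-5| + |4-5| + |7-5| + |9-5|)/5 = 12/5 = 2.4$ years;

Answer: 2.4 years.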
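The worked example can be reproduced with a small Python sketch; the function name `mean_linear_deviation` is hypothetical.

```python
def mean_linear_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

experience = [2, 3, 4, 7, 9]   # years of work experience from the example
print(mean_linear_deviation(experience))  # 2.4
```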

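The weighted average linear deviation applies to grouped data, where $f_i$ is the frequency of the value $x_i$:

$$\bar{d} = \frac{\sum_i |x_i - \bar{x}|\, f_i}{\sum_i f_i}$$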

Because it is a somewhat arbitrary convention, the average linear deviation is used relatively rarely in practice (in particular, to characterize the fulfillment of contractual obligations regarding uniformity of delivery, and in the analysis of product quality, taking into account the technological features of production).

Standard deviation

The most refined characteristic of variation is the root-mean-square deviation, which is also called the standard deviation. The standard deviation (σ) equals the square root of the mean square of the deviations of the individual values of the characteristic from the arithmetic mean.

The simple standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

The weighted standard deviation is applied to grouped data, where $f_i$ is the frequency of the value $x_i$:

$$\sigma = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}}$$

Under a normal distribution, the following relation holds between the standard deviation and the average linear deviation $\bar{d}$: $\sigma / \bar{d} \approx 1.25$.
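The value 1.25 can be traced to a standard property of the normal distribution. The short derivation below is a sketch assuming an exactly normal variable X with mean a and standard deviation σ; the symbol $\bar{d}$ stands for the average linear deviation.

```latex
\bar{d} = \mathbb{E}\,|X - a|
        = \int_{-\infty}^{\infty} |x - a|\,
          \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}\, dx
        = \sigma\sqrt{\frac{2}{\pi}},
\qquad
\frac{\sigma}{\bar{d}} = \sqrt{\frac{\pi}{2}} \approx 1.2533 .
```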

The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

Carrying out any statistical analysis is unthinkable without calculations. In this article we will look at how to calculate the variance, standard deviation, coefficient of variation and other statistical indicators in Excel.

Maximum and minimum value

Average linear deviation

The average linear deviation is the average of the absolute deviations from the mean in the analyzed data set. The mathematical formula is:

$$a = \frac{\sum |X - \bar{X}|}{n}$$

a – average linear deviation,

X – analyzed indicator,

$\bar{X}$ – average value of the indicator,

n – the number of values in the analyzed data set.

In Excel this function is called AVEDEV (СРОТКЛ in the Russian-language version).

After selecting the AVEDEV function, we indicate the data range over which the calculation should be performed. Click "OK".

Variance

Perhaps not everyone knows what variance is, so I’ll explain: it is a measure that characterizes the spread of data around the mathematical expectation. However, usually only a sample is available, so the following variance formula is used:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n}$$

s² – sample variance calculated from observational data,

X – individual values,

$\bar{X}$ – arithmetic mean for the sample,

n – the number of values in the analyzed data set.

The corresponding Excel function is VAR.P (ДИСП.Г in the Russian-language version). When analyzing relatively small samples (up to about 30 observations), you should use the unbiased sample variance, which is calculated using the following formula:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

The difference, as you can see, is only in the denominator. Excel has a function for calculating the unbiased sample variance: VAR.S (ДИСП.В).

Select the desired option (population or sample), indicate the range, and click the “OK” button. The resulting value may be very large because the deviations are squared first. Variance is a very important indicator in statistics, but it is usually not used on its own; rather, it serves as an input to further calculations.

Standard deviation

The standard deviation (root-mean-square deviation) is the square root of the variance. It is calculated using the formula:

for the general population

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

for a sample

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

You can simply take the root of the variance, but Excel has ready-made functions for the standard deviation: STDEV.P and STDEV.S (СТАНДОТКЛОН.Г and СТАНДОТКЛОН.В in the Russian-language version), for the general population and for a sample, respectively.
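Outside Excel, the same population and sample variants are available in Python's standard `statistics` module. The sketch below uses illustrative data; the comments name the English-locale Excel functions that use the same divisors.

```python
import statistics

data = [2, 3, 4, 7, 9]                 # illustrative values

var_p = statistics.pvariance(data)     # divides by n      (Excel: VAR.P)
var_s = statistics.variance(data)      # divides by n - 1  (Excel: VAR.S)
std_p = statistics.pstdev(data)        # square root of var_p (Excel: STDEV.P)
std_s = statistics.stdev(data)         # square root of var_s (Excel: STDEV.S)

print(var_p, var_s)
print(round(std_p, 4), round(std_s, 4))
```

The sample versions are always slightly larger than the population versions, and the gap shrinks as the number of observations grows.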

Root-mean-square deviation and standard deviation, I repeat, are synonyms.

Next, as usual, indicate the desired range and click “OK”. The standard deviation has the same units of measurement as the analyzed indicator, and is therefore comparable to the original data. More on this below.

The coefficient of variation

All the indicators discussed above are tied to the scale of the source data and do not give an intuitive picture of the variation in the analyzed population. To obtain a relative measure of data scatter, use the coefficient of variation, which is calculated by dividing the standard deviation by the mean. The formula for the coefficient of variation is simple:

$$V = \frac{\sigma}{\bar{X}}$$

There is no ready-made function for calculating the coefficient of variation in Excel, which is not a big problem. The calculation can be made by simply dividing the standard deviation by the mean. To do this, write in the formula bar:

=STDEV.P()/AVERAGE()

The data range is indicated in the parentheses. If necessary, use the sample standard deviation (STDEV.S).

The coefficient of variation is usually expressed as a percentage, so you can apply the percentage format to the cell containing the formula. The required button is located on the ribbon on the “Home” tab:

You can also change the format by selecting from the context menu after highlighting the desired cell and right-clicking.

The coefficient of variation, unlike other indicators of the scatter of values, is used as an independent and very informative indicator of data variation. In statistics, it is generally accepted that if the coefficient of variation is less than 33%, then the data set is homogeneous, if more than 33%, then it is heterogeneous. This information can be useful for preliminary characterization of the data and for identifying opportunities for further analysis. In addition, the coefficient of variation, measured as a percentage, allows you to compare the degree of scatter of different data, regardless of their scale and units of measurement. Useful property.
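For readers working outside Excel, the same ratio can be sketched in Python. The function name `coefficient_of_variation`, its `sample` flag and the data are assumptions for illustration; the 33% threshold is the rule of thumb quoted above, not a strict criterion.

```python
import statistics

def coefficient_of_variation(values, sample=False):
    """Standard deviation divided by the mean, as a fraction (0.25 means 25%)."""
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)

data = [2, 3, 4, 7, 9]                 # illustrative values
cv = coefficient_of_variation(data)
is_homogeneous = cv < 0.33             # 33% rule of thumb
print(f"{cv:.1%}", is_homogeneous)
```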

Oscillation coefficient

Another measure of data scatter covered here is the oscillation coefficient. This is the ratio of the range of variation (the difference between the maximum and minimum values) to the mean. There is no ready-made Excel function, so you will have to combine three functions: MAX, MIN and AVERAGE.

The coefficient of oscillation shows the extent of the variation relative to the average, which can also be used to compare different data sets.
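A minimal Python sketch of the same ratio, mirroring the MAX, MIN and AVERAGE combination; the function name and the data are illustrative assumptions.

```python
def oscillation_coefficient(values):
    """Range of variation divided by the arithmetic mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

print(oscillation_coefficient([2, 3, 4, 7, 9]))
```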

In general, many statistical indicators are very simple to calculate in Excel. If something is not clear, you can always use the search box in the Insert Function dialog. And Google is always there to help.

Root-mean-square deviation (synonyms: mean square deviation, quadratic deviation; related terms: standard deviation, standard spread): in probability theory and statistics, the most common measure of the dispersion of the values of a random variable about its mathematical expectation. For limited samples of values, the arithmetic mean of the sample is used in place of the mathematical expectation.


    The standard deviation is measured in units of measurement of the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when statistically testing hypotheses, when measuring the linear relationship between random variables. Defined as the square root of the variance of a random variable.

    Root-mean-square deviation:

    $$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}.$$

    • Note: The names RMSD (root-mean-square deviation) and STD (standard deviation) are very often applied inconsistently to these formulas. For example, in the NumPy module for the Python programming language, the std() function is described as "standard deviation", while its default formula is that of the root-mean-square deviation (the sum of squares is divided by n before the root is taken). In Excel, the STDEV() function behaves differently (division by n − 1).

    Standard deviation (an estimate of the root-mean-square deviation of a random variable x relative to its mathematical expectation, based on the unbiased estimate of its variance) $s$:

    $$s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2};$$

    where $\sigma^2$ is the variance; $x_i$ is the i-th element of the sample; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + \ldots + x_n).$$

    It should be noted that both estimates are biased. In the general case it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.

    In accordance with GOST R 8.736-2011, the standard deviation is calculated using the second formula of this section. Please check the results.

    Three sigma rule

    Three sigma rule ($3\sigma$): almost all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma;\ \bar{x} + 3\sigma)$. More strictly, with probability approximately 0.9973 the value of a normally distributed random variable lies in this interval (provided that the value $\bar{x}$ is the true one, and not obtained as a result of sample processing).

    If the true value of $\bar{x}$ is unknown, then one should use $s$ rather than $\sigma$. Thus, the rule of three sigma becomes the rule of three $s$.

    Interpretation of the standard deviation value

    A larger standard deviation indicates a greater spread of the values in the set about its mean; a smaller value, accordingly, indicates that the values in the set are clustered around the mean.

    For example, we have three sets of numbers: {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8}. All three sets have a mean of 7 and standard deviations of 7, 5 and 1, respectively. The last set has a small standard deviation, since its values are clustered around the mean; the first set has the largest standard deviation, since its values diverge greatly from the mean.
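These figures can be verified with Python's standard `statistics` module. The sketch uses the population standard deviation (division by n), which is the variant that yields 7, 5 and 1 for these sets.

```python
import statistics

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

for values in sets:
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population standard deviation (divides by n)
    print(values, mean, sigma)
```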

    In a general sense, standard deviation can be considered a measure of uncertainty. For example, in physics, standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study against the value predicted by theory: if the mean of the measurements differs greatly from the value predicted by the theory (large standard deviation), then the obtained values or the method of obtaining them should be rechecked. In finance, the standard deviation of a portfolio's return is identified with portfolio risk.

    Climate

    Suppose there are two cities with the same average maximum daily temperature, but one is located on the coast and the other on an inland plain. It is known that coastal cities have a narrower range of maximum daily temperatures than inland cities. Therefore, the standard deviation of the maximum daily temperatures for the coastal city will be smaller than for the second city, even though the mean is the same. In practice this means that the probability that the maximum air temperature on any given day of the year differs strongly from the mean is higher for the inland city.

    Sport

    Let's assume that there are several football teams that are rated on some set of parameters, for example, the number of goals scored and conceded, scoring chances, etc. It is most likely that the best team in this group will have better values on more parameters. The smaller the team’s standard deviation for each of the presented parameters, the more predictable the team’s result is; such teams are balanced. On the other hand, for a team with a large standard deviation it is difficult to predict the result, which in turn is explained by an imbalance, e.g. a strong defense but a weak attack.

    Using the standard deviation of team parameters makes it possible, to some degree, to predict the result of a match between two teams by assessing the teams' strengths and weaknesses, and therefore the tactics they are likely to choose.

    An approximate method for assessing the variability of a variation series is to determine the limit and amplitude, but the values ​​of the variant within the series are not taken into account. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is standard deviation (σ - sigma). The larger the standard deviation, the higher the degree of fluctuation of this series.

    The method for calculating the standard deviation includes the following steps:

    1. Find the arithmetic mean (M).

    2. Determine the deviations of individual options from the arithmetic mean (d=V-M). In medical statistics, deviations from the average are designated as d (deviate). The sum of all deviations is zero.

    3. Square each deviation: d².

    4. Multiply the squared deviations by the corresponding frequencies: d²·p.

    5. Find the sum of the products: Σ(d²·p).

    6. Calculate the standard deviation using the formula:

    $$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$ when n is greater than 30, or $$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$ when n is less than or equal to 30, where n is the number of all variants.
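The six steps can be sketched in Python for a variation series given as variants with their frequencies. This is a sketch under stated assumptions: the function name `grouped_std` and the data are hypothetical, n is taken as the sum of the frequencies, and the divisor follows the common convention of n for n > 30 and n − 1 otherwise.

```python
import math

def grouped_std(variants, frequencies):
    """Standard deviation of a variation series given as variants V with frequencies p."""
    n = sum(frequencies)                                         # number of observations
    m = sum(v * p for v, p in zip(variants, frequencies)) / n    # step 1: arithmetic mean M
    # Steps 2-5: deviations d = V - M, squared, weighted by p, and summed.
    sum_d2p = sum((v - m) ** 2 * p for v, p in zip(variants, frequencies))
    divisor = n if n > 30 else n - 1                             # step 6: choice of divisor
    return math.sqrt(sum_d2p / divisor)

print(grouped_std([1, 2, 3], [1, 2, 1]))    # small series: divides by n - 1
print(grouped_std([1, 3], [20, 20]))        # large series: divides by n
```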

    Standard deviation value:

    1. The standard deviation characterizes the spread of the variants about the mean (i.e. the variability of the variation series). The larger the sigma, the higher the degree of diversity of the series.

    2. The standard deviation is used for a comparative assessment of the degree of correspondence of the arithmetic mean to the variation series for which it was calculated.

    Variations in mass phenomena obey the law of normal distribution. The curve representing this distribution is a smooth, bell-shaped, symmetrical curve (Gaussian curve). According to probability theory, in phenomena that obey the law of normal distribution there is a strict mathematical relationship between the values of the arithmetic mean and the standard deviation. The theoretical distribution of the variants in a homogeneous variation series obeys the three-sigma rule.

    If, in a rectangular coordinate system, we plot the values of the quantitative characteristic (the variants) on the abscissa axis and the frequency of occurrence of each variant in the variation series on the ordinate axis, then the variants with larger and smaller values are located symmetrically on either side of the arithmetic mean.



    It has been established that with a normal distribution of the trait:

    68.3% of the variant values ​​are within M±1s

    95.5% of the variant values ​​are within M±2s

    99.7% of the variant values ​​are within M±3s

    3. The standard deviation allows you to establish normal values ​​for clinical and biological parameters. In medicine, the interval M±1s is usually taken as the normal range for the phenomenon being studied. The deviation of the estimated value from the arithmetic mean by more than 1s indicates a deviation of the studied parameter from the norm.

    4. In medicine, the three-sigma rule is used in pediatrics for the individual assessment of children's physical development (the sigma deviation method) and for developing standards for children's clothing.

    5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.

    The value of the standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average duration of hospital treatment and hospital mortality, etc.), then a direct comparison of the sigmas is impossible, because the standard deviation is a named quantity expressed in absolute numbers. In these cases, the coefficient of variation (Cv) is used; it is a relative quantity: the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.

    The coefficient of variation is calculated using the formula:

    $$C_v = \frac{\sigma}{M}\cdot 100\%$$

    The higher the coefficient of variation, the greater the variability of the series. It is believed that a coefficient of variation of more than 30% indicates qualitative heterogeneity of the population.