Calculate square deviation online. Standard deviation

It is worth noting that this way of calculating the variance has a drawback: the estimate is biased, i.e. its mathematical expectation is not equal to the true value of the variance. All is not lost, however. As the sample size increases, the estimate still approaches its theoretical counterpart, i.e. it is asymptotically unbiased. Therefore, when working with large samples, you can use the formula above.

It is useful to translate the language of symbols into the language of words. The variance turns out to be the average of the squared deviations. That is, the mean is calculated first, then the difference between each original value and the mean is taken, squared, summed, and divided by the number of values in the population. The difference between an individual value and the mean reflects the size of the deviation. It is squared so that all deviations become non-negative and positive and negative deviations do not cancel each other out when summed. Then, given the squared deviations, we simply calculate their arithmetic mean. Average, square, deviations: the deviations are squared and the average is taken. The whole recipe fits in just three words.
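The verbal recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name `population_variance` and the data are hypothetical, and the divisor is n (the biased variant discussed earlier).

```python
def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n                                   # the average
    squared_deviations = [(x - mean) ** 2 for x in values]  # deviate, then square
    return sum(squared_deviations) / n                       # average the squares


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean = 5
print(population_variance(data))   # 4.0
```

The result is in squared units of the data, which is exactly why the text goes on to take the square root.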

However, unlike the arithmetic mean or an index, variance is not used in its pure form. It is rather an auxiliary, intermediate indicator needed for other types of statistical analysis. It doesn't even have a sensible unit of measurement: judging by the formula, it is the square of the unit of the original data. As the saying goes, you can't figure that out without a drink.


To bring the variance back to reality, that is, to use it for more down-to-earth purposes, the square root is taken. The result is the so-called root-mean-square deviation, also known as the "standard deviation" or "sigma" (after the Greek letter). The standard deviation formula for a population of N values is:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$

To obtain this indicator for a sample of n values, use the formula:

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

As with variance, there is a slightly different calculation option, which divides by n − 1 instead of n. But as the sample grows, the difference disappears.

The standard deviation obviously also characterizes the spread of the data, but now (unlike the variance) it can be compared with the original data, since both have the same units of measurement (this is clear from the calculation formula). On its own, though, this indicator is not very intuitive, since it involves a confusing chain of intermediate steps (deviation, square, sum, average, root). Nevertheless, it is already possible to work with the standard deviation directly, because its properties are well studied. For example, there is the three-sigma rule, which states that for normally distributed data about 997 values out of 1000 lie within ±3 sigma of the arithmetic mean. The standard deviation, as a measure of uncertainty, is also involved in many statistical calculations. It is used to determine the accuracy of various estimates and forecasts. If the variation is very large, the standard deviation will also be large, and therefore the forecast will be inaccurate, which shows up, for example, in very wide confidence intervals.

The coefficient of variation

The standard deviation gives an absolute measure of dispersion. Therefore, to understand how large the spread is relative to the values themselves (i.e., regardless of their scale), a relative indicator is required. This indicator is called the coefficient of variation and is calculated as the ratio of the standard deviation to the arithmetic mean:

$$V = \frac{\sigma}{\bar{x}}$$

The coefficient of variation is measured as a percentage (if multiplied by 100%). Using this indicator, you can compare a variety of phenomena, regardless of their scale and units of measurement. This fact is what makes the coefficient of variation so popular.

In statistics, it is accepted that if the value of the coefficient of variation is less than 33%, then the population is considered homogeneous; if it is more than 33%, then it is heterogeneous. It's difficult for me to comment on anything here. I don’t know who defined this and why, but it is considered an axiom.

I feel that I am getting carried away by dry theory and need to offer something visual. On the other hand, all variation indicators describe roughly the same thing, only calculated differently, so it is hard to show off a variety of examples. Only the values of the indicators can differ, not their essence. So let's compare how the values of different variation indicators differ for the same data set. Let's take the example used for calculating the average linear deviation. Here are the source data:

And a chart as a reminder.

Using these data, we calculate various indicators of variation.

The average value is the usual arithmetic average.

The range of variation is the difference between the maximum and minimum:

The average linear deviation is calculated using the formula:

Standard deviation:

Let's summarize the calculation in a table.

As can be seen, the average linear deviation and the standard deviation give similar values for the degree of data variation. Variance is sigma squared, so it will always be a relatively large number that, in itself, does not mean much. The range of variation is the difference between the extreme values and can tell you a lot.
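Since the source data for this comparison did not survive, here is a minimal Python sketch that computes the same set of indicators on a small hypothetical data set. The function name `variation_indicators` and the numbers are illustrative assumptions; the variance and standard deviation use the divisor n.

```python
import math


def variation_indicators(values):
    """Range, average linear deviation, variance, standard deviation and CV."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "mean_abs_dev": sum(abs(x - mean) for x in values) / n,
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
for name, value in variation_indicators(data).items():
    print(f"{name}: {value}")
```

On these numbers the average linear deviation (1.5) and the standard deviation (2.0) are of the same order, while the variance (4.0) is in squared units, which matches the observation in the text.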

Let's summarize some results.

Variation of an indicator reflects the variability of a process or phenomenon. Its degree can be measured using several indicators.

1. Range of variation: the difference between the maximum and the minimum. It reflects the range of possible values.
2. Average linear deviation: the average of the absolute deviations of all values of the analyzed population from their mean.
3. Variance: the average of the squared deviations.
4. Standard deviation: the square root of the variance (of the mean squared deviation).
5. Coefficient of variation: the most universal indicator, reflecting the degree of scatter of the values regardless of their scale and units of measurement. It is measured as a percentage and can be used to compare the variation of different processes and phenomena.

Thus, statistical analysis has a system of indicators reflecting the homogeneity of phenomena and the stability of processes. Variation indicators often have no independent meaning and are used for further data analysis (such as calculating confidence intervals).

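The standard deviation is defined as a generalizing characteristic of the size of the variation of a trait in a population. It is equal to the square root of the average squared deviation of the individual values of the attribute from the arithmetic mean, i.e. the square root of the variance, and can be found as follows:

1. For the primary (ungrouped) series:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

2. For the variation series (with frequencies f):

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

Transforming the standard deviation formula brings it to a form more convenient for practical calculations:

$$\sigma = \sqrt{\overline{x^2} - (\bar{x})^2}$$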

The standard deviation shows how much, on average, specific variants deviate from their mean. It is also an absolute measure of the variability of a characteristic, expressed in the same units as the variants, and is therefore easy to interpret.
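As a rough check that the frequency-weighted formula and the "convenient" transformed formula agree, here is a small Python sketch. The function names and the toy variation series are hypothetical; both functions divide by the sum of frequencies.

```python
import math


def weighted_std(values, freqs):
    """sqrt( sum((x - mean)^2 * f) / sum(f) ) for a variation series."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / n)


def weighted_std_shortcut(values, freqs):
    """sqrt( mean of squares - square of mean ), the transformed formula."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(x * x * f for x, f in zip(values, freqs)) / n
    return math.sqrt(mean_of_squares - mean ** 2)


values = [1, 2, 3]   # hypothetical variants
freqs = [1, 2, 1]    # hypothetical frequencies
print(weighted_std(values, freqs), weighted_std_shortcut(values, freqs))
```

The shortcut form is handy for manual work, but with floating-point data of large magnitude the subtraction can lose precision, so the first form is usually the safer one in code.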


For alternative (yes/no) characteristics, the standard deviation formula looks like this:

$$\sigma = \sqrt{p\,q}$$

where p is the proportion of units in the population that have a certain characteristic;

q is the proportion of units that do not have this characteristic (q = 1 − p).
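A quick way to convince yourself of the formula for an alternative characteristic is to code the attribute as 1 (present) and 0 (absent) and compare with the ordinary population standard deviation. The sketch below assumes Python's standard `statistics` module; the function name and the data are hypothetical.

```python
import math
import statistics


def alternative_std(flags):
    """Standard deviation of a 0/1 attribute via sqrt(p * q)."""
    p = sum(flags) / len(flags)   # share of units that have the attribute
    q = 1 - p                     # share of units that do not
    return math.sqrt(p * q)


flags = [1, 1, 1, 0]  # hypothetical: 3 of 4 units have the attribute
print(alternative_std(flags), statistics.pstdev(flags))
```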

The concept of average linear deviation

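The average linear deviation is defined as the arithmetic mean of the absolute values of the deviations of the individual variants from their arithmetic mean.

1. For the primary (ungrouped) series:

$$\bar{d} = \frac{\sum |x - \bar{x}|}{n}$$

2. For the variation series:

$$\bar{d} = \frac{\sum |x - \bar{x}|\, f}{\sum f}$$

where $\sum f$ is the sum of the frequencies of the variation series.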

An example of finding the average linear deviation:
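The original worked example is not available here, so the following is a minimal Python sketch with hypothetical data. It covers both cases: an ungrouped series (all frequencies equal to 1) and a variation series with frequencies. The function name `mean_abs_deviation` is an assumption for illustration.

```python
def mean_abs_deviation(values, freqs=None):
    """Average of |x - mean|, optionally weighted by frequencies."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped series
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n


print(mean_abs_deviation([2, 4, 4, 4, 5, 5, 7, 9]))   # 1.5
print(mean_abs_deviation([1, 2, 3], freqs=[1, 2, 1]))  # 0.5
```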

The advantage of the mean absolute deviation over the range of variation as a measure of dispersion is obvious, since it takes all deviations into account. But this indicator has significant shortcomings. Arbitrarily discarding the algebraic signs of the deviations makes its mathematical properties far from elementary. This makes the mean absolute deviation very difficult to use in problems involving probabilistic calculations.

Therefore, the average linear deviation is rarely used in statistical practice as a measure of variation, namely only when summing indicators without regard to sign makes economic sense. It is used, for example, to analyze foreign trade turnover, the composition of the workforce, the rhythm of production, etc.

Quadratic mean

The quadratic mean is applied, for example, to calculate the average side length of n square plots, the average diameter of trunks, pipes, etc. It comes in two types.

Simple quadratic mean. If, when replacing the individual values of a characteristic with an average, the sum of the squares of the original values must remain unchanged, then that average is the quadratic mean.

It is the square root of the sum of squares of the individual attribute values divided by their number:

$$\bar{x}_{q} = \sqrt{\frac{\sum x^2}{n}}$$

The weighted quadratic mean is calculated using the formula:

$$\bar{x}_{q} = \sqrt{\frac{\sum x^2 f}{\sum f}}$$

where f is the weight.

Cubic mean

The cubic mean is applied, for example, when determining the average side length of cubes. It also comes in two types, simple and weighted.
Simple cubic mean:

$$\bar{x}_{c} = \sqrt[3]{\frac{\sum x^3}{n}}$$
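Both the quadratic and the cubic mean are power means, so one small helper covers them. The sketch below is a hedged illustration: `power_mean` is a hypothetical name, `k = 2` gives the quadratic mean and `k = 3` the cubic mean, and passing `weights` gives the weighted variants.

```python
def power_mean(values, k, weights=None):
    """Power mean of order k: (sum(w * x**k) / sum(w)) ** (1 / k)."""
    if weights is None:
        weights = [1] * len(values)        # simple (unweighted) mean
    total = sum(w * x ** k for x, w in zip(values, weights))
    return (total / sum(weights)) ** (1 / k)


sides = [1, 7]                             # hypothetical side lengths
quadratic = power_mean(sides, 2)
cubic = power_mean(sides, 3)
print(quadratic, cubic)

# Defining property of the quadratic mean: replacing every value with it
# keeps the sum of squares unchanged.
print(len(sides) * quadratic ** 2, sum(x ** 2 for x in sides))
```

For the same positive data the cubic mean is never smaller than the quadratic mean, which in turn is never smaller than the arithmetic mean.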

When average values and the variance are calculated from interval distribution series, the true values of the characteristic are replaced by the midpoints of the intervals, which differ from the arithmetic mean of the values falling within each interval. This introduces a systematic error into the variance. W. F. Sheppard determined that the error in the variance caused by using grouped data amounts to 1/12 of the square of the interval width, and that it can act in either direction, increasing or decreasing the variance.

Sheppard's correction should be used if the distribution is close to normal, relates to a characteristic with a continuous nature of variation, and is based on a significant amount of initial data (n > 500). However, since in some cases the two errors act in opposite directions and compensate for each other, the correction can sometimes be omitted.

The smaller the variance and standard deviation, the more homogeneous the population and the more typical the average will be.
In statistical practice there is often a need to compare the variation of different characteristics. For example, it is of great interest to compare the variation in workers' age and qualifications, length of service and wages, cost and profit, length of service and labor productivity, etc. Indicators of absolute variability are unsuitable for such comparisons: it is impossible to compare the variability of work experience, expressed in years, with the variation of wages, expressed in rubles.

To carry out such comparisons, as well as comparisons of the variability of the same characteristic in several populations with different arithmetic averages, a relative indicator of variation is used - the coefficient of variation.

Structural averages

To characterize the central tendency in statistical distributions, it is often useful to use, together with the arithmetic mean, a certain value of the characteristic X which, owing to its particular position in the distribution series, can characterize its level.

This is especially important when the extreme values of a characteristic in a distribution series have unclear boundaries. In that case an accurate determination of the arithmetic mean is usually impossible or very difficult. The average level can then be determined by taking, for example, the value of the characteristic that lies in the middle of the frequency series or that occurs most often in it.

Such values depend only on the nature of the frequencies, i.e., on the structure of the distribution. They are typical in terms of their position in the frequency series, so they are regarded as characteristics of the center of the distribution and are therefore called structural averages. They are used to study the internal structure of the distribution series of attribute values. Such indicators include the mode and the median.

An approximate way of assessing the variability of a variation series is to determine its limits and amplitude, but this does not take the values of the variants inside the series into account. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the variability of the series.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of individual options from the arithmetic mean (d=V-M). In medical statistics, deviations from the average are designated as d (deviate). The sum of all deviations is zero.

3. Square each deviation: d².

4. Multiply the squared deviations by the corresponding frequencies: d²·p.

5. Find the sum of the products: Σ(d²·p).

6. Calculate the standard deviation using the formula:

$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$ when n is greater than 30, or $$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$ when n is less than or equal to 30, where n is the number of all variants.
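The six steps above can be sketched in Python as follows. The names V (variants), p (frequencies), M and d follow the notation of the steps; the function name, the data, and the treatment of the n = 30 boundary (divisor n − 1 for n ≤ 30) are taken from the text, everything else is an illustrative assumption.

```python
import math


def sigma_from_series(variants, frequencies):
    """Standard deviation of a variation series, following the steps above."""
    n = sum(frequencies)                                          # number of all variants
    M = sum(V * p for V, p in zip(variants, frequencies)) / n     # step 1: mean
    d = [V - M for V in variants]                                 # step 2: deviations
    d2p = [di ** 2 * p for di, p in zip(d, frequencies)]          # steps 3-4
    total = sum(d2p)                                              # step 5
    divisor = n if n > 30 else n - 1                              # step 6: choose divisor
    return math.sqrt(total / divisor)


print(sigma_from_series([1, 2, 3], [1, 2, 1]))      # small series, divides by n - 1
print(sigma_from_series([1, 2, 3], [10, 20, 10]))   # n = 40, divides by n
```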

Standard deviation value:

1. The standard deviation characterizes the spread of the variant relative to the average value (i.e., the variability of the variation series). The larger the sigma, the higher the degree of diversity of this series.

2. The standard deviation is used for a comparative assessment of the degree of correspondence of the arithmetic mean to the variation series for which it was calculated.

The variation of many mass phenomena follows the normal distribution law. The curve representing this distribution is a smooth, bell-shaped, symmetrical curve (the Gaussian curve). According to probability theory, in phenomena that follow the normal distribution law there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of variants in a homogeneous variation series obeys the three-sigma rule.

If, in a rectangular coordinate system, we plot the values of the quantitative characteristic (the variants) on the abscissa axis and the frequency of occurrence of each variant on the ordinate axis, then the variants with larger and smaller values are located symmetrically on either side of the arithmetic mean.



It has been established that with a normal distribution of the trait:

68.3% of the variant values are within M ± 1σ

95.5% of the variant values are within M ± 2σ

99.7% of the variant values are within M ± 3σ
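These shares can be reproduced from the normal distribution itself. A minimal sketch, assuming an exactly normal characteristic: the probability of falling within k standard deviations of the mean equals erf(k / √2), which Python exposes as `math.erf`. The function name `share_within` is hypothetical.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"M ± {k}σ: {share_within(k):.2%}")
```

The exact values are roughly 68.27%, 95.45% and 99.73%; the figures in the text are these numbers rounded to one decimal place.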

3. The standard deviation makes it possible to establish normal values for clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon being studied. A deviation of the estimated value from the arithmetic mean by more than 1σ indicates that the studied parameter deviates from the norm.

4. In medicine, the three-sigma rule is used in pediatrics for individual assessment of children's physical development (the sigma deviation method) and for developing standards for children's clothing.

5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.

The standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average length of hospital treatment and hospital mortality, etc.), a direct comparison of the sigmas is impossible, because the standard deviation is a named value expressed in absolute numbers. In these cases the coefficient of variation (Cv) is used, which is a relative value: the percentage ratio of the standard deviation to the arithmetic mean.

The coefficient of variation is calculated using the formula:

$$C_v = \frac{\sigma}{M} \cdot 100\%$$

The higher the coefficient of variation, the greater the variability of the series. It is believed that a coefficient of variation of more than 30% indicates qualitative heterogeneity of the population.
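To illustrate why the coefficient of variation allows comparisons across different units, here is a short Python sketch using the standard `statistics` module. The height and weight figures are hypothetical, and the population standard deviation (`pstdev`, divisor n) is an assumption made for simplicity.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


heights_cm = [160, 170, 180]   # hypothetical heights
weights_kg = [50, 70, 90]      # hypothetical weights

print(f"height Cv: {coefficient_of_variation(heights_cm):.1f}%")
print(f"weight Cv: {coefficient_of_variation(weights_kg):.1f}%")

# Changing the unit (cm -> mm) changes sigma but not the coefficient.
heights_mm = [h * 10 for h in heights_cm]
print(f"height Cv in mm: {coefficient_of_variation(heights_mm):.1f}%")
```

Note that the coefficient is only meaningful for characteristics measured on a scale with a true zero and a mean well away from zero.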

Excel is highly valued by both professionals and amateurs, because users of any skill level can work with it. For example, anyone with minimal Excel skills can draw a simple chart, put together a decent table, and so on.

At the same time, the program also lets you perform various types of calculations, but this requires a slightly different level of training. However, if you have just started getting to know the program closely and are interested in anything that will help you become a more advanced user, this article is for you. Today I will tell you what the standard deviation formula in Excel is, why it is needed at all and, strictly speaking, when it is used. Let's go!

What it is

Let's start with the theory. The standard deviation is the square root of the arithmetic mean of the squared differences between the available values and their arithmetic mean.

By the way, this value is usually denoted by the Greek letter "sigma". In Excel the standard deviation is calculated with the STDEV function (СТАНДОТКЛОН in the Russian-language version), so the program does the work for the user.

The essence of this concept is to identify the degree of variability of an instrument; in other words, it is an indicator derived from descriptive statistics. It shows how the volatility of an instrument changes over a certain time period. The STDEV functions can be used to estimate the standard deviation from a sample, ignoring Boolean and text values.

Formula

The function built into Excel calculates the standard deviation for you. To find it, open the formulas section in Excel and select the function called STDEV; it's that simple.

After this, a window will appear in which you need to enter the data for the calculation. In particular, the numbers (or cell ranges) should be entered in the dedicated fields, after which the program will calculate the standard deviation for the sample itself.
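For readers who want to cross-check a spreadsheet result outside Excel, here is a minimal Python sketch using the standard `statistics` module. As far as I know, Excel's STDEV (and the newer STDEV.S) divides by n − 1, which corresponds to `statistics.stdev`, while STDEV.P divides by n, which corresponds to `statistics.pstdev`. The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical cell values

sample_sd = statistics.stdev(data)       # divisor n - 1, like STDEV / STDEV.S
population_sd = statistics.pstdev(data)  # divisor n, like STDEV.P

print(f"sample: {sample_sd:.4f}, population: {population_sd:.4f}")
```

The sample estimate is always slightly larger than the population one for the same data, and the gap shrinks as the number of values grows.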

Undoubtedly, mathematical formulas and calculations are a rather complex issue, and not all users can cope with it straight away. However, if you dig a little deeper and look at the issue in a little more detail, it turns out that not everything is so sad. I hope you are convinced of this using the example of calculating the standard deviation.


Standard deviation. The most refined characteristic of variation is the root-mean-square deviation, which is called the standard deviation. The standard deviation

(σ) is equal to the square root of the average squared deviation of the individual values of the attribute from the arithmetic mean.

The simple standard deviation:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

The weighted standard deviation is applied to grouped data:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

18. Variance, its types, standard deviation.

The variance of a random variable is a measure of the spread of that random variable, i.e. of its deviation from the mathematical expectation. In statistics, the notation $\sigma_X^2$ or $\sigma^2$ is often used. The square root of the variance is usually called the standard deviation, root-mean-square deviation, or standard spread.

The total variance (σ²) measures the variation of a trait as a whole, under the influence of all the factors that caused this variation. At the same time, thanks to the grouping method, it is possible to isolate and measure the variation due to the grouping characteristic and the variation arising under the influence of unaccounted factors.

The intergroup (between-group) variance (σ² between groups) characterizes systematic variation, i.e. differences in the value of the studied trait that arise under the influence of the factor trait on which the grouping is based.
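The split of the total variance into a between-group part and a part due to unaccounted factors (the average within-group variance) can be checked numerically. The sketch below is an illustration under stated assumptions: the two groups are hypothetical, all variances use the divisor n, and the within-group average is weighted by group size.

```python
import statistics

groups = {"A": [1, 2, 3], "B": [5, 6, 7]}   # hypothetical grouped data

all_values = [x for g in groups.values() for x in g]
n = len(all_values)
grand_mean = statistics.mean(all_values)

total = statistics.pvariance(all_values)

# Between-group variance: spread of the group means around the grand mean.
between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
              for g in groups.values()) / n

# Average within-group variance, weighted by group size.
within = sum(len(g) * statistics.pvariance(g) for g in groups.values()) / n

print(total, between + within)
```

Here most of the total variance comes from the difference between the groups, which is what one would expect when the grouping characteristic matters.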

Standard deviation (synonyms: root-mean-square deviation, mean square deviation; related term: standard spread) is, in probability theory and statistics, the most common indicator of the dispersion of the values of a random variable relative to its mathematical expectation. With limited samples of values, the arithmetic mean of the sample is used instead of the mathematical expectation.

The standard deviation is measured in units of measurement of the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when statistically testing hypotheses, when measuring the linear relationship between random variables. Defined as the square root of the variance of a random variable.

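Standard deviation:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Standard deviation (an estimate of the standard deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance):

$$s = \sqrt{\frac{n}{n-1}\,\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $\sigma^2$ is the variance; $x_i$ is the i-th element of the sample; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.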

19. Essence, scope and procedure for determining mode and median.

In addition to power means, statistics uses structural averages to characterize the relative value of a varying characteristic and the internal structure of distribution series. These are represented mainly by the mode and the median.

The mode is the most frequently occurring variant of the series. The mode is used, for example, to determine the sizes of clothes and shoes that are in greatest demand among customers. The mode of a discrete series is the variant with the highest frequency. To calculate the mode of an interval variation series, first determine the modal interval (the one with the maximum frequency), and then the modal value of the attribute using the formula:

$$M_o = x_0 + h\,\frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

§ $M_o$ - value of the mode

§ $x_0$ - lower limit of the modal interval

§ $h$ - interval width

§ $f_m$ - frequency of the modal interval

§ $f_{m-1}$ - frequency of the interval preceding the modal one

§ $f_{m+1}$ - frequency of the interval following the modal one

The median is the value of the attribute that lies in the middle of the ranked series and divides it into two parts equal in number.

To determine the median of a discrete series with frequencies, first calculate half the sum of the frequencies, and then determine which variant it falls on. (If the sorted series contains an odd number of values, the position of the median is calculated using the formula:

$N_{Me} = (n + 1)/2$, where n is the total number of values;

if the number of values is even, the median is equal to the average of the two values in the middle of the series.)

To calculate the median of an interval variation series, first determine the median interval, within which the median lies, and then the value of the median using the formula:

$$M_e = x_0 + h\,\frac{\frac{1}{2}\sum f - S_{m-1}}{f_m}$$

§ $M_e$ - the required median

§ $x_0$ - lower limit of the interval that contains the median

§ $h$ - interval width

§ $\sum f$ - sum of the frequencies, or the number of terms in the series

§ $S_{m-1}$ - sum of the accumulated frequencies of the intervals preceding the median interval

§ $f_m$ - frequency of the median interval

Example. Find the mode and median.

Solution: In this example the modal interval is the 25-30 age group, since this interval has the highest frequency (1054).

Let's calculate the magnitude of the mode:

This means that the modal age of students is 27 years.

Let's calculate the median. The median interval is the 25-30 age group, since this interval contains the variant that divides the population into two equal parts (Σf_i/2 = 3462/2 = 1731). Next, we substitute the necessary numerical data into the formula and get the median value:

This means that one half of the students are under 27.4 years old, and the other half are over 27.4 years old.

In addition to mode and median, indicators such as quartiles are used, dividing the ranked series into 4 equal parts, deciles - 10 parts and percentiles - into 100 parts.
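The interval formulas for the mode and the median can be sketched in Python as follows. The age table from the example did not survive, so the series below is entirely hypothetical; the function names are also assumptions. Equal interval widths are assumed, and a missing neighbouring interval is treated as having frequency 0.

```python
def interval_mode(lower_bounds, width, freqs):
    """Mode of an interval series with equal interval widths."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal interval
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lower_bounds[m] + width * (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))


def interval_median(lower_bounds, width, freqs):
    """Median of an interval series with equal interval widths."""
    half = sum(freqs) / 2
    cumulative = 0                       # accumulated frequency before the interval
    for x0, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:       # this is the median interval
            return x0 + width * (half - cumulative) / f
        cumulative += f


lower_bounds = [20, 25, 30, 35]   # hypothetical age groups 20-25, 25-30, ...
freqs = [10, 30, 20, 10]          # hypothetical frequencies
print(interval_mode(lower_bounds, 5, freqs))
print(interval_median(lower_bounds, 5, freqs))
```

With these numbers both the mode (about 28.3) and the median (about 29.2) fall in the 25-30 interval, mirroring the structure of the worked example.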

20. The concept of sample observation and its scope.

Sample observation is used when complete (continuous) observation is physically impossible because of the large amount of data, or is not economically feasible. Physical impossibility occurs, for example, when studying passenger flows, market prices, or family budgets. Economic infeasibility occurs when assessing the quality of goods involves destroying them, for example tasting, or testing bricks for strength.

The statistical units selected for observation form the sample population, or sample, and their entire array is the general population (GP). The number of units in the sample is denoted n, and in the entire GP it is denoted N. The ratio n/N is usually called the relative size or share of the sample.

The quality of the results of sample observation depends on the representativeness of the sample, that is, on how well it represents the GP. To ensure representativeness, it is essential to follow the principle of random selection of units, which means that the inclusion of a GP unit in the sample cannot be influenced by any factor other than chance.

There are 4 ways of selecting units at random for a sample:

  1. Properly random selection, or the "lotto method", in which the statistical values are assigned serial numbers written on certain objects (for example, lotto barrels), which are then mixed in a container (for example, a bag) and drawn at random. In practice this method is carried out using a random number generator or mathematical tables of random numbers.
  2. Mechanical selection, in which every (N/n)-th value of the general population is selected. For example, if the population contains 100,000 values and you need to select 1,000, then every 100,000 / 1,000 = 100th value will be included in the sample. If the values are not ranked, the first one is selected at random from the first hundred, and the numbers of the others are each one hundred higher. For example, if the first unit was No. 19, then the next should be No. 119, then No. 219, then No. 319, etc. If the population units are ranked, then No. 50 is selected first, then No. 150, then No. 250, and so on.
  3. Selection of values from a heterogeneous data array is carried out by the stratified method, in which the population is first divided into homogeneous groups, to which random or mechanical selection is then applied.
  4. A special sampling method is serial selection, in which not individual values but whole series of them (runs from one number to another in a row) are selected randomly or mechanically, and continuous observation is carried out within each series.
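The four selection methods above can be sketched with Python's standard `random` module. This is a simplified illustration, not a survey-sampling library: the function names, the seeded generator, and the toy population are all assumptions, and the mechanical variant assumes N is divisible by n.

```python
import random


def simple_random_sample(population, n, rng):
    """'Lotto method': draw n units at random, without replacement."""
    return rng.sample(population, n)


def mechanical_sample(population, n, rng):
    """Every (N/n)-th unit, starting from a random unit in the first step."""
    step = len(population) // n
    start = rng.randrange(step)
    return population[start::step][:n]


def stratified_sample(strata, share, rng):
    """Random selection of the same share from every homogeneous group."""
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, max(1, round(len(units) * share))))
    return sample


def serial_sample(series, r, rng):
    """Pick r whole series at random and observe every unit inside them."""
    return [unit for s in rng.sample(series, r) for unit in s]


rng = random.Random(42)                      # seeded only for reproducibility
population = list(range(1, 1001))            # hypothetical units No. 1..1000

lotto = simple_random_sample(population, 10, rng)
mechanical = mechanical_sample(population, 10, rng)
strata = {"a": population[:600], "b": population[600:]}
stratified = stratified_sample(strata, 0.01, rng)
series = [population[i:i + 50] for i in range(0, 1000, 50)]
serial = serial_sample(series, 2, rng)

print(mechanical)
```

In the mechanical sample every selected number is exactly 100 higher than the previous one, just as in the No. 19, No. 119, No. 219 example.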

The quality of sample observation also depends on the type of sample: repeated (with replacement) or non-repetitive (without replacement). With repeated selection, the statistical values or series included in the sample are returned to the general population after use and have a chance of being included in a new sample. In this case all values in the general population have the same probability of being included in the sample. Non-repetitive selection means that the statistical values or series included in the sample are not returned to the general population after use, and therefore the probability of being included in the next sample increases for the remaining values.

Non-repetitive sampling gives more accurate results and is therefore used more often. But there are situations when it cannot be applied (studying passenger flows, consumer demand, etc.), and then repeated selection is carried out.

21. Maximum observation sampling error, average sampling error, procedure for their calculation.

Let us consider in detail the methods listed above for forming a sample and the representativeness errors that arise. Properly random sampling is based on selecting units from the population at random, without any systematic element. Technically, properly random selection is carried out by drawing lots (for example, a lottery) or using a table of random numbers.

Properly random selection "in its pure form" is rarely used in the practice of sample observation, but it is the starting point for the other types of selection and implements the basic principles of sample observation. Let's consider some theoretical issues of the sampling method and the error formulas for simple random sampling.

Sampling error is the difference between the value of a parameter in the general population and its value calculated from the results of sample observation. For the mean of a quantitative characteristic, the sampling error is determined by

$$\Delta_{\bar{x}} = |\bar{x} - \tilde{x}|$$

where $\bar{x}$ is the general mean and $\tilde{x}$ is the sample mean.

The indicator $\Delta$ is usually called the maximum sampling error. The sample mean is a random variable that can take on different values depending on which units are included in the sample. Therefore, sampling errors are also random variables and can take on different values. For this reason, the average of the possible errors is determined: the average sampling error, which depends on:

· sample size: the larger the number, the smaller the average error;

· the degree of change in the characteristic being studied: the smaller the variation of the characteristic, and, consequently, the dispersion, the smaller the average sampling error.

With random sampling with replacement, the average error is calculated as $\mu = \sqrt{\sigma^2 / n}$, where $\sigma^2$ is the general variance. In practice, the general variance is not known exactly, but probability theory proves that $\sigma^2 = S^2 \cdot \frac{n}{n-1}$, where $S^2$ is the sample variance. Since $\frac{n}{n-1}$ is close to 1 for sufficiently large n, we can assume that $\sigma^2 \approx S^2$. The average sampling error is then calculated as $\mu = \sqrt{S^2 / n}$. But for a small sample (n < 30) the coefficient $\frac{n}{n-1}$ must be taken into account, and the average error of a small sample is calculated by the formula $\mu = \sqrt{S^2 / (n-1)}$.

With random sampling without replacement, the formulas above are adjusted by the factor $1 - \frac{n}{N}$. The average error of non-repetitive sampling is then $\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$ for a mean and $\mu = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$ for a proportion w. Because n is always less than N, the multiplier $\left(1 - \frac{n}{N}\right)$ is always less than 1. This means that the average error with non-repetitive selection is always smaller than with repeated selection.

Mechanical sampling is used when the general population is ordered in some way (for example, lists of voters in alphabetical order, telephone numbers, house and apartment numbers). Units are selected at a fixed interval equal to the reciprocal of the sampling fraction. So, with a 2% sample every 50th unit (1/0.02) is selected, and with a 5% sample every 20th unit (1/0.05) of the general population.

The reference point is selected in different ways: randomly, from the middle of the interval, with a change in the reference point. The main thing is to avoid systematic error. For example, with a 5% sample, if the first unit is the 13th, then the next ones are 33, 53, 73, etc.

In terms of accuracy, mechanical selection is close to actual random sampling. For this reason, to determine the average error of mechanical sampling, proper random selection formulas are used.

At typical selection the population being surveyed is preliminarily divided into homogeneous, similar groups. For example, when surveying enterprises, these are industries, sub-sectors; when studying the population, these are regions, social or age groups. Next, an independent selection from each group is made mechanically or purely randomly.

Typical sampling produces more accurate results than other methods. Dividing the general population into typical groups ensures that each group is represented in the sample, which eliminates the influence of intergroup variance on the average sampling error. Therefore, when finding the error of a typical sample according to the rule of adding variances ($\sigma^2 = \overline{\sigma_i^2} + \delta^2$), only the average of the group variances needs to be taken into account. The average sampling error is then $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}}$ with repeated sampling and $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}\left(1 - \frac{n}{N}\right)}$ with non-repetitive sampling, where $\overline{\sigma_i^2}$ is the average of the within-group variances in the sample.
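A hedged Python sketch of the typical-sample error. It assumes the within-group variances are averaged with the group sample sizes as weights, and that the error takes the forms sqrt(mean within-group variance / n) for repeated sampling and the same value multiplied by (1 - n/N) under the root for non-repetitive sampling. All numbers are hypothetical.

```python
import math

def typical_sample_error(group_variances, group_sizes, population_size=None):
    """Average error of a typical (stratified) sample.

    Assumption: within-group variances are averaged using the group
    sample sizes as weights; if population_size is given, the
    non-repetitive correction (1 - n/N) is applied.
    """
    n = sum(group_sizes)
    mean_within = sum(v * m for v, m in zip(group_variances, group_sizes)) / n
    squared_error = mean_within / n
    if population_size is not None:
        squared_error *= 1 - n / population_size
    return math.sqrt(squared_error)

# Hypothetical data: three groups, 100 sampled units out of 1,000.
variances = [16.0, 25.0, 36.0]
sizes = [50, 30, 20]
print(typical_sample_error(variances, sizes))        # repeated sampling
print(typical_sample_error(variances, sizes, 1000))  # non-repetitive sampling
```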

Serial (or nest) selection is used when the population is divided into series or groups before the sample survey begins. Such series include packages of finished products, student groups, and work brigades. The series to be examined are selected mechanically or purely at random, and within each selected series every unit is examined. For this reason, the average sampling error depends only on the intergroup (between-series) variance, which is calculated using the formula $\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2}{r}$, where $r$ is the number of selected series, $\bar{x}_i$ is the average of the i-th series, and $\bar{x}$ is the overall average. The average error of serial sampling is $\mu = \sqrt{\frac{\delta^2}{r}}$ with repeated sampling and $\mu = \sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)}$ with non-repetitive sampling, where $R$ is the total number of series.

Combined selection is a combination of the selection methods considered above.
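The serial-sample error can be sketched as follows. The sketch assumes equal-sized series, a between-series variance equal to the mean squared deviation of the series means from their overall mean, and the correction (1 - r/R) for non-repetitive selection. The series means are hypothetical.

```python
import math

def serial_sample_error(series_means, total_series=None):
    """Average error of a serial (cluster) sample.

    Assumption: the between-series variance is the mean squared deviation
    of the series means from their overall mean (equal-sized series).
    """
    r = len(series_means)
    overall = sum(series_means) / r
    between_var = sum((m - overall) ** 2 for m in series_means) / r
    squared_error = between_var / r
    if total_series is not None:
        squared_error *= 1 - r / total_series  # non-repetitive correction
    return math.sqrt(squared_error)

means = [10.0, 12.0, 14.0, 16.0]  # hypothetical means of 4 selected series
print(serial_sample_error(means))      # repeated selection of series
print(serial_sample_error(means, 20))  # 4 series out of 20, non-repetitive
```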

The average sampling error for any sampling method depends mainly on the absolute size of the sample and, to a lesser extent, on the sampling percentage. Suppose that 225 observations are made in the first case from a population of 4,500 units and in the second from a population of 225,000 units. The variances in both cases are equal to 25. Then in the first case, with a 5% selection, the sampling error will be $\mu = \sqrt{\frac{25}{225}\left(1 - \frac{225}{4500}\right)} \approx 0.325$. In the second case, with a 0.1% selection, it will be $\mu = \sqrt{\frac{25}{225}\left(1 - \frac{225}{225000}\right)} \approx 0.333$.

Thus, when the sampling percentage was reduced 50-fold, the sampling error increased only slightly, since the sample size did not change. Now suppose the sample size is increased to 625 observations. The same calculation then gives a sampling error of roughly 0.19 to 0.20, depending on which of the two populations is sampled. Increasing the sample by a factor of 2.8 with the same population size thus reduces the sampling error by a factor of more than 1.6.
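The arithmetic of this example can be checked with a short Python sketch. It assumes the non-repetitive formula sqrt(variance / n * (1 - n / N)). The function name is illustrative. The inputs (variance 25, samples of 225 and 625, populations of 4,500 and 225,000) are those of the example.

```python
import math

def nonrepetitive_error(variance: float, n: int, population: int) -> float:
    # Assumed formula: sqrt(variance / n * (1 - n / N)).
    return math.sqrt(variance / n * (1 - n / population))

for n, population in [(225, 4_500), (225, 225_000), (625, 4_500), (625, 225_000)]:
    share = 100 * n / population
    error = nonrepetitive_error(25, n, population)
    print(f"n={n}, N={population}, share={share:.2f}%, error={error:.3f}")
```

The sampling share changes 50-fold between the first two rows while the error barely moves. Raising n from 225 to 625 is what lowers it noticeably.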

22. Ways and methods of forming a sample population.

In statistics, various methods of forming sample populations are used; the choice is determined by the objectives of the study and depends on the specifics of the object being studied.

The main condition for conducting a sample survey is to prevent the occurrence of systematic errors arising from violation of the principle of equal opportunity for each unit of the general population to be included in the sample. Prevention of systematic errors is achieved through the use of scientifically based methods for forming a sample population.

There are the following methods for selecting units from the general population: 1) individual selection - individual units are selected for the sample; 2) group selection - the sample includes qualitatively homogeneous groups or series of units being studied; 3) combined selection is a combination of individual and group selection. Selection methods are determined by the rules for forming a sample population.

By the way units are selected, a sample may be:

  • purely random (simple random): the sample population is formed by random (unintentional) selection of individual units from the general population. The number of units selected is usually determined from the accepted sample proportion. The sample proportion is the ratio of the number of units in the sample population n to the number of units in the general population N, i.e. n/N;
  • mechanical: the units are selected from a general population divided into equal intervals (groups). The size of the interval is equal to the reciprocal of the sample proportion. So, with a 2% sample every 50th unit is selected (1 : 0.02), with a 5% sample every 20th unit (1 : 0.05), etc. Thus, in accordance with the accepted sample proportion, the general population is, as it were, mechanically divided into equal groups, and only one unit from each group is selected for the sample;
  • typical: the general population is first divided into homogeneous typical groups. Then units are selected individually from each typical group by purely random or mechanical sampling. An important feature of a typical sample is that it gives more accurate results than other methods of selecting units;
  • serial: the general population is divided into groups of equal size, called series. Whole series are selected into the sample population, and every unit within a selected series is observed;
  • combined: sampling may be two-stage. The population is first divided into groups; then groups are selected, and within them individual units are selected.

In statistics, the following methods are distinguished for selecting units in a sample population:

  • single-stage sampling - each selected unit is immediately studied according to the given criterion (purely random and serial sampling);
  • multi-stage sampling - individual groups are selected from the general population, and individual units are then selected from those groups (typical sampling with mechanical selection of units into the sample population).

In addition, there are:

  • repeated selection - according to the returned-ball scheme. Each unit or series included in the sample is returned to the general population and therefore has a chance of being included in the sample again;
  • non-repeated selection - according to the unreturned-ball scheme. It gives more accurate results with the same sample size.
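The two ball schemes map directly onto two functions in Python's standard `random` module: `choices` draws with return, `sample` draws without. The population of 20 numbered units and the seed are assumptions made for illustration.

```python
import random

rng = random.Random(0)           # fixed seed only to make the sketch repeatable
population = list(range(1, 21))  # hypothetical population of 20 numbered units

with_return = rng.choices(population, k=5)    # returned ball: a unit may recur
without_return = rng.sample(population, k=5)  # unreturned ball: units are distinct

print(with_return)
print(without_return)
```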

23. Determination of the required sample size (using the Student's t-table).

One of the scientific principles of sampling theory is to ensure that a sufficient number of units is selected. Theoretically, the need to observe this principle follows from the proofs of the limit theorems of probability theory, which make it possible to establish how many units must be selected from the population for the sample to be sufficient and representative.

A decrease in the standard sampling error, and therefore an increase in the accuracy of the estimate, is always associated with an increase in the sample size. Therefore, already at the stage of organizing a sample observation, it is necessary to decide how large the sample population should be to ensure the required accuracy of the results. The required sample size is calculated using formulas derived from the formulas for the maximum sampling error (Δ) corresponding to a particular type and method of selection. So, for random repeated sampling, the sample size (n) is: $n = \frac{t^2 \sigma^2}{\Delta^2}$ (5.2)

The essence of this formula is that, with random repeated sampling, the required sample size is directly proportional to the square of the confidence coefficient ($t^2$) and to the variance of the characteristic being studied ($\sigma^2$), and inversely proportional to the square of the maximum sampling error ($\Delta^2$). In particular, if the maximum error is doubled, the required sample size can be reduced by a factor of four. Of the three parameters, two ($t$ and $\Delta$) are set by the researcher. Based on the goal and tasks of the sample survey, the researcher must decide in what quantitative combination these parameters are best included to obtain the optimal design. In one case the reliability of the results ($t$) may matter more than the measure of accuracy ($\Delta$); in another, the reverse. It is more difficult to settle on the value of the maximum sampling error, since the researcher does not have this indicator at the stage of designing the sample observation; in practice it is therefore customary to set the maximum sampling error within 10% of the expected average level of the attribute. The expected average can be established in different ways: from data of similar previous surveys, or by using the sampling frame and conducting a small pilot sample.
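A minimal Python sketch of the sample-size calculation, assuming the formula has the usual form n = t² · variance / (maximum error)², which matches the proportionalities described above. The inputs are hypothetical; t = 2 corresponds to a confidence probability of about 0.954.

```python
import math

def required_sample_size(t: float, variance: float, max_error: float) -> int:
    """n = t^2 * variance / max_error^2, rounded up to a whole unit."""
    return math.ceil(t ** 2 * variance / max_error ** 2)

# Hypothetical inputs: t = 2, variance 25.
print(required_sample_size(2, 25, 0.5))  # 400
print(required_sample_size(2, 25, 1.0))  # 100: doubling the error quarters n
```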

The most difficult thing to establish when designing a sample observation is the third parameter in formula (5.2): the variance of the sample population. Here it is necessary to use all the information available to the researcher from previous similar surveys and pilot surveys.

Determining the required sample size becomes more complicated if the sample survey studies several characteristics of the sampling units. In this case the average levels of the characteristics and their variation are, as a rule, different, so the decision about which characteristic's variance to rely on can be made only in light of the purpose and objectives of the survey.

When designing a sample observation, a predetermined value of the permissible sampling error is assumed in accordance with the objectives of a particular study and the probability of conclusions based on the observation results.

In general, the formula for the maximum error of the sample average allows us to determine:

  • the magnitude of possible deviations of the indicators of the general population from the indicators of the sample population;
  • the required sample size to ensure the required accuracy, at which the limits of possible error do not exceed a certain specified value;
  • the probability that the error in the sample will have a specified limit.

The Student's t-distribution in probability theory is a one-parameter family of absolutely continuous distributions.

24. Dynamics series (interval, moment); splicing (closing) of dynamics series.

A dynamics (time) series is a set of values of statistical indicators presented in chronological sequence.

Each time series contains two components:

1) indicators of time periods (years, quarters, months, days) or dates;

2) indicators characterizing the object under study for those periods or on the corresponding dates, which are called series levels.

Series levels are expressed in absolute, average or relative values. Depending on the nature of the indicators, dynamics series of absolute, relative and average values are built. Series of relative and average values are constructed on the basis of derived series of absolute values. There are interval and moment dynamics series.

An interval dynamics series contains the values of indicators for certain periods of time. In an interval series, levels can be summed to obtain the volume of the phenomenon over a longer period, the so-called accumulated totals.

A moment dynamics series reflects the values of indicators at a certain point in time (on a given date). In a moment series, only the difference between levels is meaningful, since it reflects the change in the level of the series between two dates; the sum of the levels has no real content. Accumulated totals are not calculated here.
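The difference between the two kinds of series can be shown with a small Python sketch. Both series are hypothetical and serve only as an illustration. Accumulated totals are formed for the interval series, while only level-to-level changes are computed for the moment series.

```python
from itertools import accumulate

# Hypothetical interval series: output per quarter, thousand tons.
quarterly_output = [50, 55, 60, 53]
cumulative = list(accumulate(quarterly_output))  # accumulated totals
print(cumulative)  # [50, 105, 165, 218]

# Hypothetical moment series: stock on hand at the start of each quarter.
stock_on_date = [120, 131, 127, 140]
changes = [b - a for a, b in zip(stock_on_date, stock_on_date[1:])]
print(changes)  # [11, -4, 13]
```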

The most important condition for the correct construction of time series is comparability of series levels belonging to different periods. The levels must be presented in homogeneous quantities, and there must be equal completeness of coverage of different parts of the phenomenon.

To avoid distorting the real dynamics, preliminary calculations (splicing, or closing, of the dynamics series) are carried out before the statistical analysis of the time series. Splicing (closing) of dynamics series is generally understood as combining into one series two or more series whose levels were calculated using different methodologies, refer to different territorial boundaries, etc. It may also mean bringing the absolute levels of the series to a common base, which removes the incomparability of the levels.

25. The concept of comparability of dynamics series; growth coefficients, growth rates and rates of increase.

Dynamics series are series of statistical indicators characterizing the development of natural and social phenomena over time. Statistical collections published by the State Statistics Committee of Russia contain a large number of dynamics series in tabular form. Dynamics series make it possible to identify patterns in the development of the phenomena being studied.

Dynamics series contain two types of indicators: time indicators (years, quarters, months, etc.) or points in time (the beginning of the year, the beginning of each month, etc.), and series level indicators. The levels of a dynamics series can be expressed in absolute values (production in tons or rubles), relative values (share of the urban population in %) and average values (average salary of industry workers by year, etc.). In tabular form, a time series contains two columns or two rows.
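One common way to splice two series is to rescale the earlier levels by the ratio observed in a period for which both versions of the level are known. The Python sketch below assumes such an overlapping period exists. The function name and all figures are hypothetical.

```python
def splice_series(old_levels, new_levels):
    """Link two series that overlap in one period.

    Assumption: old_levels[-1] and new_levels[0] describe the same period
    under the old and new methodology (or boundaries); earlier levels are
    rescaled by their ratio.
    """
    factor = new_levels[0] / old_levels[-1]
    return [level * factor for level in old_levels[:-1]] + list(new_levels)

old = [100.0, 110.0, 120.0]  # hypothetical levels, old boundaries
new = [150.0, 160.0, 170.0]  # hypothetical levels, new boundaries; 120 and 150 overlap
print(splice_series(old, new))  # [125.0, 137.5, 150.0, 160.0, 170.0]
```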

Correct construction of time series requires the fulfillment of a number of requirements:

  1. all indicators of a dynamics series must be scientifically substantiated and reliable;
  2. the indicators of a dynamics series must be comparable over time, i.e. calculated for the same periods of time or on the same dates;
  3. the indicators of a dynamics series must be comparable across territory;
  4. the indicators of a dynamics series must be comparable in content, i.e. calculated according to a single methodology;
  5. the indicators of a dynamics series must be comparable in the range of economic units covered.

All indicators of a dynamics series must be given in the same units of measurement.

Statistical indicators can characterize either the results of the process being studied over a period of time or the state of the phenomenon at a certain point in time, i.e. indicators can be interval (periodic) or moment. Accordingly, the original dynamics series are either interval or moment series. Moment series, in turn, can have equal or unequal time intervals between dates.

The original dynamics series can be transformed into a series of average values and a series of relative values (chain and basic). Such time series are called derived time series.

The methodology for calculating the average level in the dynamics series is different, depending on the type of the dynamics series. Using examples, we will consider the types of dynamics series and formulas for calculating the average level.

Absolute increases (Δy) show by how many units the level of the series has changed compared to the previous level (column 3: chain absolute increases) or compared to the initial level (column 4: basic absolute increases). The calculation formulas can be written as follows: $\Delta y_{chain} = y_i - y_{i-1}$, $\Delta y_{base} = y_i - y_0$.

When the absolute values of the series fall, the result is, correspondingly, a "decrease" or "reduction".

Absolute increases indicate that, for example, in 1998 production of product "A" increased by 4 thousand tons compared to 1997 and by 34 thousand tons compared to 1994; for other years, see Table 11.5, columns 3 and 4.
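A Python sketch of chain and basic absolute increases. Table 11.5 itself is not reproduced here, so the levels below are reconstructed from the figures quoted in this section (increases of 4 and 34 thousand tons for 1998, 8 over a level of 210 for 1996, a 1% value of 2.0 for 1995 and 2.3 for 1998, and an average of 218.4). Treat them as an assumption.

```python
years = [1994, 1995, 1996, 1997, 1998]
levels = [200, 210, 218, 230, 234]  # thousand tons; reconstructed, see the lead-in

# Chain increases compare each level with the previous one,
# basic increases compare each level with the initial (1994) level.
chain = [cur - prev for prev, cur in zip(levels, levels[1:])]
base = [cur - levels[0] for cur in levels[1:]]

for year, c, b in zip(years[1:], chain, base):
    print(year, c, b)
```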

Growth coefficients show how many times the level of the series has changed compared to the previous level (column 5: chain coefficients of growth or decline) or compared to the initial level (column 6: basic coefficients of growth or decline). The calculation formulas can be written as follows: $K_{chain} = \frac{y_i}{y_{i-1}}$, $K_{base} = \frac{y_i}{y_0}$.

Growth rates show what percentage the given level of the series makes up of the previous level (column 7: chain growth rates) or of the initial level (column 8: basic growth rates). The calculation formulas can be written as follows: $T_{r} = K \cdot 100\%$, i.e. $T_{r,chain} = \frac{y_i}{y_{i-1}} \cdot 100\%$ and $T_{r,base} = \frac{y_i}{y_0} \cdot 100\%$.

So, for example, in 1997 the production volume of product "A" amounted to 105.5% of the 1996 level.
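Growth coefficients and growth rates can be sketched in Python as below. The levels are reconstructed from the figures quoted in this section rather than copied from the table, so they are an assumption.

```python
levels = [200, 210, 218, 230, 234]  # 1994-1998, thousand tons; reconstructed (assumption)

# Coefficients are ratios; growth rates are the same ratios in percent.
chain_coef = [cur / prev for prev, cur in zip(levels, levels[1:])]
base_coef = [cur / levels[0] for cur in levels[1:]]

chain_rate = [round(100 * k, 1) for k in chain_coef]
base_rate = [round(100 * k, 1) for k in base_coef]

print(chain_rate)  # [105.0, 103.8, 105.5, 101.7]
print(base_rate)   # [105.0, 109.0, 115.0, 117.0]
```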

Rates of increase show by what percentage the level of the reporting period increased compared to the previous level (column 9: chain rates of increase) or compared to the initial level (column 10: basic rates of increase). The calculation formulas can be written as follows:

$T_{pr} = T_{r} - 100\%$, or $T_{pr} = \frac{\text{absolute increase}}{\text{level of the previous period}} \cdot 100\%$

So, for example, in 1996 production of product "A" was 3.8% higher than in 1995 (103.8% - 100%, or (8 : 210) × 100%), and 9% higher than in 1994 (109% - 100%).

If the absolute levels in the series decrease, the growth rate will be less than 100% and, accordingly, there will be a rate of decrease (a rate of increase with a minus sign).
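The two ways of obtaining the chain rate of increase (growth rate minus 100%, or absolute increase divided by the previous level) can be compared in a short Python sketch. The levels are reconstructed from the figures quoted in this section and are an assumption.

```python
levels = [200, 210, 218, 230, 234]  # 1994-1998, thousand tons; reconstructed (assumption)
pairs = list(zip(levels, levels[1:]))

via_growth_rate = [cur / prev * 100 - 100 for prev, cur in pairs]
via_increase = [(cur - prev) / prev * 100 for prev, cur in pairs]

print([round(v, 1) for v in via_increase])  # [5.0, 3.8, 5.5, 1.7]
```

A falling level gives a negative value, which is the rate of decrease mentioned above.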

The absolute value of a 1% increase (column 11) shows how many units must be produced in a given period for the level of the previous period to increase by 1%. In our example, in 1995 this required 2.0 thousand tons, and in 1998 2.3 thousand tons, i.e. noticeably more.

The absolute value of 1% growth can be determined in two ways:

  • the level of the previous period is divided by 100;
  • the chain absolute increase is divided by the corresponding chain rate of increase (in percent).

Absolute value of a 1% increase: $A = \frac{y_{i-1}}{100} = \frac{\Delta y_{chain}}{T_{pr,chain}}$
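Both ways of computing the absolute value of a 1% increase can be checked against each other in Python. The second way is taken here as the chain absolute increase divided by the chain rate of increase in percent. The levels are reconstructed from the figures quoted in this section and are an assumption.

```python
levels = [200, 210, 218, 230, 234]  # 1994-1998, thousand tons; reconstructed (assumption)
pairs = list(zip(levels, levels[1:]))

# Way 1: level of the previous period divided by 100.
way_1 = [prev / 100 for prev, _ in pairs]

# Way 2: chain absolute increase divided by the chain rate of increase (%).
way_2 = [(cur - prev) / ((cur - prev) / prev * 100) for prev, cur in pairs]

print(way_1)  # [2.0, 2.1, 2.18, 2.3]
```

The second way is undefined when a level does not change, because the rate of increase is then zero.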

In dynamics, especially over a long period, a joint analysis of the growth rate with the content of each percentage increase or decrease is important.

Note that this methodology for analyzing time series applies both to series whose levels are expressed in absolute values (tons, thousand rubles, number of employees, etc.) and to series whose levels are expressed in relative indicators (% of defects, % ash content of coal, etc.) or average values (average yield in c/ha, average salary, etc.).

Along with the analytical indicators considered above, which are calculated for each year in comparison with the previous or initial level, the analysis of dynamics series also requires average analytical indicators for the period: the average level of the series, the average annual absolute increase (decrease), and the average annual growth rate and rate of increase.

Methods for calculating the average level of a dynamics series were discussed above. In the interval series we are considering, the average level is calculated using the simple arithmetic mean formula: $\bar{y} = \frac{\sum y_i}{n}$, where $n$ is the number of levels in the series.

Average annual production volume of the product for 1994-1998. amounted to 218.4 thousand tons.

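The average annual absolute growth is also calculated using the arithmetic mean formula, applied to the chain absolute increases: $\overline{\Delta y} = \frac{\sum \Delta y_{chain}}{n - 1}$, where $n$ is the number of levels in the series.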
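A Python sketch of the two averages. It assumes the average absolute growth is the arithmetic mean of the chain absolute increases. The levels are reconstructed from the figures quoted in this section rather than copied from the table.

```python
levels = [200, 210, 218, 230, 234]  # 1994-1998, thousand tons; reconstructed (assumption)

average_level = sum(levels) / len(levels)

chain = [cur - prev for prev, cur in zip(levels, levels[1:])]
average_growth = sum(chain) / len(chain)  # same as (last - first) / (n - 1)

print(average_level)   # 218.4
print(average_growth)  # 8.5
```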
