Nächste: , Vorige:   [Inhalt][Index]

71 stats


71.1 Introduction to stats

Package stats contains a set of classical statistical inference and hypothesis testing procedures.

All these functions return an inference_result Maxima object which contains the necessary results for population inferences and decision making.

Global variable stats_numer controls whether results are given in floating point or symbolic and rational format; its default value is true and results are returned in floating point format.

Package descriptive contains some utilities to manipulate data structures (lists and matrices); for example, to extract subsamples. It also contains some examples on how to use package numericalio to read data from plain text files. See descriptive and numericalio for more details.

Package stats loads packages descriptive, distrib and inference_result.

For comments, bugs or suggestions, please contact the author at

’mario AT edu DOT xunta DOT es’.


71.2 Functions and Variables for inference_result

Function: inference_result (title, values, numbers)

Constructs an inference_result object of the type returned by the stats functions. Argument title is a string with the name of the procedure; values is a list with elements of the form symbol = value and numbers is a list with positive integer numbers ranging from one to length(values), indicating which values will be shown by default.

Example:

This is a simple example showing results concerning a rectangle. The title of this object is the string "Rectangle", it stores five results, named 'base, 'height, 'diagonal, 'area, and 'perimeter, but only the first, second, fifth, and fourth will be displayed. The 'diagonal is stored in this object, but it is not displayed; to access its value, make use of function take_inference.

(%i1) load("inference_result")$
(%i2) b: 3$ h: 2$
(%i3) inference_result("Rectangle",
                        ['base=b,
                         'height=h,
                         'diagonal=sqrt(b^2+h^2),
                         'area=b*h,
                         'perimeter=2*(b+h)],
                        [1,2,5,4] );
                        |   Rectangle
                        |
                        |    base = 3
                        |
(%o3)                   |   height = 2
                        |
                        | perimeter = 10
                        |
                        |    area = 6
(%i4) take_inference('diagonal,%);
(%o4)                        sqrt(13)

See also take_inference.

Function: inferencep (obj)

Returns true or false, depending on whether obj is an inference_result object or not.

Function: items_inference (obj)

Returns a list with the names of the items stored in obj, which must be an inference_result object.

Example:

The inference_result object stores two values, named 'pi and 'e, but only the second is displayed. The items_inference function returns the names of all items, no matter they are displayed or not.

(%i1) load("inference_result")$
(%i2) inference_result("Hi", ['pi=%pi,'e=%e],[2]);
                            |   Hi
(%o2)                       |
                            | e = %e
(%i3) items_inference(%);
(%o3)                        [pi, e]
Function: take_inference (n, obj)
Function: take_inference (name, obj)
Function: take_inference (list, obj)

Returns the n-th value stored in obj if n is a positive integer, or the item named name if this is the name of an item. If the first argument is a list of numbers and/or symbols, function take_inference returns a list with the corresponding results.

Example:

Given an inference_result object, function take_inference is called in order to extract some information stored in it.

(%i1) load("inference_result")$
(%i2) b: 3$ h: 2$
(%i3) sol: inference_result("Rectangle",
                            ['base=b,
                             'height=h,
                             'diagonal=sqrt(b^2+h^2),
                             'area=b*h,
                             'perimeter=2*(b+h)],
                            [1,2,5,4] );
                        |   Rectangle
                        |
                        |    base = 3
                        |
(%o3)                   |   height = 2
                        |
                        | perimeter = 10
                        |
                        |    area = 6
(%i4) take_inference('base,sol);
(%o4)                           3
(%i5) take_inference(5,sol);
(%o5)                          10
(%i6) take_inference([1,'diagonal],sol);
(%o6)                     [3, sqrt(13)]
(%i7) take_inference(items_inference(sol),sol);
(%o7)                [3, 2, sqrt(13), 6, 10]

See also inference_result and take_inference.


71.3 Functions and Variables for stats

Option variable: stats_numer

Default value: true

If stats_numer is true, inference statistical functions return their results in floating point numbers. If it is false, results are given in symbolic and rational format.

Function: test_mean (x)
Function: test_mean (x, options ...)

This is the mean t-test. Argument x is a list or a column matrix containing an one dimensional sample. It also performs an asymptotic test based on the Central Limit Theorem if option 'asymptotic is true.

Options:

  • 'mean, default 0, is the mean value to be checked.
  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'dev, default 'unknown, this is the value of the standard deviation when it is known; valid values are: 'unknown or a positive expression.
  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).
  • 'asymptotic, default false, indicates whether it performs an exact t-test or an asymptotic one based on the Central Limit Theorem; valid values are true and false.

The output of function test_mean is an inference_result Maxima object showing the following results:

  1. 'mean_estimate: the sample mean.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: confidence interval for the population mean.
  4. 'method: inference procedure.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameter(s).
  8. 'p_value: p-value of the test.

Examples:

Performs an exact t-test with unknown variance. The null hypothesis is H_0: mean=50 against the one sided alternative H_1: mean<50; according to the results, the p-value is too great, there are no evidence for rejecting H_0.

(%i1) load("stats")$
(%i2) data: [78,64,35,45,45,75,43,74,42,42]$
(%i3) test_mean(data,'conflevel=0.9,'alternative='less,'mean=50);
          |                 MEAN TEST
          |
          |            mean_estimate = 54.3
          |
          |              conf_level = 0.9
          |
          | conf_interval = [minf, 61.51314273502712]
          |
(%o3)     |  method = Exact t-test. Unknown variance.
          |
          | hypotheses = H0: mean = 50 , H1: mean < 50
          |
          |       statistic = .8244705235071678
          |
          |       distribution = [student_t, 9]
          |
          |        p_value = .7845100411786889

This time Maxima performs an asymptotic test, based on the Central Limit Theorem. The null hypothesis is H_0: equal(mean, 50) against the two sided alternative H_1: not equal(mean, 50); according to the results, the p-value is very small, H_0 should be rejected in favor of the alternative H_1. Note that, as indicated by the Method component, this procedure should be applied to large samples.

(%i1) load("stats")$
(%i2) test_mean([36,118,52,87,35,256,56,178,57,57,89,34,25,98,35,
              98,41,45,198,54,79,63,35,45,44,75,42,75,45,45,
              45,51,123,54,151],
              'asymptotic=true,'mean=50);
          |                       MEAN TEST
          |
          |           mean_estimate = 74.88571428571429
          |
          |                   conf_level = 0.95
          |
          | conf_interval = [57.72848600856194, 92.04294256286663]
          |
(%o2)     |    method = Large sample z-test. Unknown variance.
          |
          |       hypotheses = H0: mean = 50 , H1: mean # 50
          |
          |             statistic = 2.842831192874313
          |
          |             distribution = [normal, 0, 1]
          |
          |             p_value = .004471474652002261
Function: test_means_difference (x1, x2)
Function: test_means_difference (x1, x2, options ...)

This is the difference of means t-test for two samples. Arguments x1 and x2 are lists or column matrices containing two independent samples. In case of different unknown variances (see options 'dev1, 'dev2 and 'varequal bellow), the degrees of freedom are computed by means of the Welch approximation. It also performs an asymptotic test based on the Central Limit Theorem if option 'asymptotic is set to true.

Options:

  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'dev1, default 'unknown, this is the value of the standard deviation of the x1 sample when it is known; valid values are: 'unknown or a positive expression.
  • 'dev2, default 'unknown, this is the value of the standard deviation of the x2 sample when it is known; valid values are: 'unknown or a positive expression.
  • 'varequal, default false, whether variances should be considered to be equal or not; this option takes effect only when 'dev1 and/or 'dev2 are 'unknown.
  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).
  • 'asymptotic, default false, indicates whether it performs an exact t-test or an asymptotic one based on the Central Limit Theorem; valid values are true and false.

The output of function test_means_difference is an inference_result Maxima object showing the following results:

  1. 'diff_estimate: the difference of means estimate.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: confidence interval for the difference of means.
  4. 'method: inference procedure.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameter(s).
  8. 'p_value: p-value of the test.

Examples:

The equality of means is tested with two small samples x and y, against the alternative H_1: m_1>m_2, being m_1 and m_2 the populations means; variances are unknown and supposed to be different.

(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: [1.2,6.9,38.7,20.4,17.2]$
(%i4) test_means_difference(x,y,'alternative='greater);
            |              DIFFERENCE OF MEANS TEST
            |
            |         diff_estimate = 20.31999999999999
            |
            |                 conf_level = 0.95
            |
            |    conf_interval = [- .04597417812882298, inf]
            |
(%o4)       |        method = Exact t-test. Welch approx.
            |
            | hypotheses = H0: mean1 = mean2 , H1: mean1 > mean2
            |
            |           statistic = 1.838004300728477
            |
            |    distribution = [student_t, 8.62758740184604]
            |
            |            p_value = .05032746527991905

The same test as before, but now variances are supposed to be equal.

(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: matrix([1.2],[6.9],[38.7],[20.4],[17.2])$
(%i4) test_means_difference(x,y,'alternative='greater,
                                                 'varequal=true);
            |              DIFFERENCE OF MEANS TEST
            |
            |         diff_estimate = 20.31999999999999
            |
            |                 conf_level = 0.95
            |
            |     conf_interval = [- .7722627696897568, inf]
            |
(%o4)       |   method = Exact t-test. Unknown equal variances
            |
            | hypotheses = H0: mean1 = mean2 , H1: mean1 > mean2
            |
            |           statistic = 1.765996124515009
            |
            |           distribution = [student_t, 9]
            |
            |            p_value = .05560320992529344
Function: test_variance (x)
Function: test_variance (x, options, ...)

This is the variance chi^2-test. Argument x is a list or a column matrix containing an one dimensional sample taken from a normal population.

Options:

  • 'mean, default 'unknown, is the population’s mean, when it is known.
  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'variance, default 1, this is the variance value (positive) to be checked.
  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).

The output of function test_variance is an inference_result Maxima object showing the following results:

  1. 'var_estimate: the sample variance.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: confidence interval for the population variance.
  4. 'method: inference procedure.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameter.
  8. 'p_value: p-value of the test.

Examples:

It is tested whether the variance of a population with unknown mean is equal to or greater than 200.

(%i1) load("stats")$
(%i2) x: [203,229,215,220,223,233,208,228,209]$
(%i3) test_variance(x,'alternative='greater,'variance=200);
             |                  VARIANCE TEST
             |
             |              var_estimate = 110.75
             |
             |                conf_level = 0.95
             |
             |     conf_interval = [57.13433376937479, inf]
             |
(%o3)        | method = Variance Chi-square test. Unknown mean.
             |
             |    hypotheses = H0: var = 200 , H1: var > 200
             |
             |                 statistic = 4.43
             |
             |             distribution = [chi2, 8]
             |
             |           p_value = .8163948512777689
Function: test_variance_ratio (x1, x2)
Function: test_variance_ratio (x1, x2, options ...)

This is the variance ratio F-test for two normal populations. Arguments x1 and x2 are lists or column matrices containing two independent samples.

Options:

  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'mean1, default 'unknown, when it is known, this is the mean of the population from which x1 was taken.
  • 'mean2, default 'unknown, when it is known, this is the mean of the population from which x2 was taken.
  • 'conflevel, default 95/100, confidence level for the confidence interval of the ratio; it must be an expression which takes a value in (0,1).

The output of function test_variance_ratio is an inference_result Maxima object showing the following results:

  1. 'ratio_estimate: the sample variance ratio.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: confidence interval for the variance ratio.
  4. 'method: inference procedure.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameters.
  8. 'p_value: p-value of the test.

Examples:

The equality of the variances of two normal populations is checked against the alternative that the first is greater than the second.

(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: [1.2,6.9,38.7,20.4,17.2]$
(%i4) test_variance_ratio(x,y,'alternative='greater);
              |              VARIANCE RATIO TEST
              |
              |       ratio_estimate = 2.316933391522034
              |
              |               conf_level = 0.95
              |
              |    conf_interval = [.3703504689507268, inf]
              |
(%o4)         | method = Variance ratio F-test. Unknown means.
              |
              | hypotheses = H0: var1 = var2 , H1: var1 > var2
              |
              |         statistic = 2.316933391522034
              |
              |            distribution = [f, 5, 4]
              |
              |          p_value = .2179269692254457
Function: test_proportion (x, n)
Function: test_proportion (x, n, options ...)

Inferences on a proportion. Argument x is the number of successes in n trials in a Bernoulli experiment with unknown probability.

Options:

  • 'proportion, default 1/2, is the value of the proportion to be checked.
  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).
  • 'asymptotic, default false, indicates whether it performs an exact test based on the binomial distribution, or an asymptotic one based on the Central Limit Theorem; valid values are true and false.
  • 'correct, default true, indicates whether Yates correction is applied or not.

The output of function test_proportion is an inference_result Maxima object showing the following results:

  1. 'sample_proportion: the sample proportion.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: Wilson confidence interval for the proportion.
  4. 'method: inference procedure.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameters.
  8. 'p_value: p-value of the test.

Examples:

Performs an exact test. The null hypothesis is H_0: p=1/2 against the one sided alternative H_1: p<1/2.

(%i1) load("stats")$
(%i2) test_proportion(45, 103, alternative = less);
         |            PROPORTION TEST              
         |                                         
         | sample_proportion = .4368932038834951   
         |                                         
         |           conf_level = 0.95             
         |                                         
         | conf_interval = [0, 0.522714149150231]  
         |                                         
(%o2)    |     method = Exact binomial test.       
         |                                         
         | hypotheses = H0: p = 0.5 , H1: p < 0.5  
         |                                         
         |             statistic = 45              
         |                                         
         |  distribution = [binomial, 103, 0.5]    
         |                                         
         |      p_value = .1184509388901454 

A two sided asymptotic test. Confidence level is 99/100.

(%i1) load("stats")$
(%i2) fpprintprec:7$
(%i3) test_proportion(45, 103, 
                  conflevel = 99/100, asymptotic=true);
      |                 PROPORTION TEST                  
      |                                                  
      |           sample_proportion = .43689             
      |                                                  
      |                conf_level = 0.99                 
      |                                                  
      |        conf_interval = [.31422, .56749]          
      |                                                  
(%o3) | method = Asympthotic test with Yates correction.
      |                                                  
      |     hypotheses = H0: p = 0.5 , H1: p # 0.5       
      |                                                  
      |               statistic = .43689                 
      |                                                  
      |      distribution = [normal, 0.5, .048872]       
      |                                                  
      |                p_value = .19662
Function: test_proportions_difference (x1, n1, x2, n2)
Function: test_proportions_difference (x1, n1, x2, n2, options ...)

Inferences on the difference of two proportions. Argument x1 is the number of successes in n1 trials in a Bernoulli experiment in the first population, and x2 and n2 are the corresponding values in the second population. Samples are independent and the test is asymptotic.

Options:

  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided (p1 # p2), 'greater (p1 > p2) and 'less (p1 < p2).
  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).
  • 'correct, default true, indicates whether Yates correction is applied or not.

The output of function test_proportions_difference is an inference_result Maxima object showing the following results:

  1. 'proportions: list with the two sample proportions.
  2. 'conf_level: confidence level selected by the user.
  3. 'conf_interval: Confidence interval for the difference of proportions p1 - p2.
  4. 'method: inference procedure and warning message in case of any of the samples sizes is less than 10.
  5. 'hypotheses: null and alternative hypotheses to be tested.
  6. 'statistic: value of the sample statistic used for testing the null hypothesis.
  7. 'distribution: distribution of the sample statistic, together with its parameters.
  8. 'p_value: p-value of the test.

Examples:

A machine produced 10 defective articles in a batch of 250. After some maintenance work, it produces 4 defective in a batch of 150. In order to know if the machine has improved, we test the null hypothesis H0:p1=p2, against the alternative H0:p1>p2, where p1 and p2 are the probabilities for one produced article to be defective before and after maintenance. According to the p value, there is not enough evidence to accept the alternative.

(%i1) load("stats")$
(%i2) fpprintprec:7$
(%i3) test_proportions_difference(10, 250, 4, 150,
                                alternative = greater);
      |       DIFFERENCE OF PROPORTIONS TEST         
      |                                              
      |       proportions = [0.04, .02666667]        
      |                                              
      |              conf_level = 0.95               
      |                                              
      |      conf_interval = [- .02172761, 1]        
      |                                              
(%o3) | method = Asymptotic test. Yates correction.  
      |                                              
      |   hypotheses = H0: p1 = p2 , H1: p1 > p2     
      |                                              
      |            statistic = .01333333             
      |                                              
      |    distribution = [normal, 0, .01898069]     
      |                                              
      |             p_value = .2411936 

Exact standard deviation of the asymptotic normal distribution when the data are unknown.

(%i1) load("stats")$
(%i2) stats_numer: false$
(%i3) sol: test_proportions_difference(x1,n1,x2,n2)$
(%i4) last(take_inference('distribution,sol));
               1    1                  x2 + x1
              (-- + --) (x2 + x1) (1 - -------)
               n2   n1                 n2 + n1
(%o4)    sqrt(---------------------------------)
                           n2 + n1
Function: test_sign (x)
Function: test_sign (x, options ...)

This is the non parametric sign test for the median of a continuous population. Argument x is a list or a column matrix containing an one dimensional sample.

Options:

  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.
  • 'median, default 0, is the median value to be checked.

The output of function test_sign is an inference_result Maxima object showing the following results:

  1. 'med_estimate: the sample median.
  2. 'method: inference procedure.
  3. 'hypotheses: null and alternative hypotheses to be tested.
  4. 'statistic: value of the sample statistic used for testing the null hypothesis.
  5. 'distribution: distribution of the sample statistic, together with its parameter(s).
  6. 'p_value: p-value of the test.

Examples:

Checks whether the population from which the sample was taken has median 6, against the alternative H_1: median > 6.

(%i1) load("stats")$
(%i2) x: [2,0.1,7,1.8,4,2.3,5.6,7.4,5.1,6.1,6]$
(%i3) test_sign(x,'median=6,'alternative='greater);
               |                  SIGN TEST
               |
               |              med_estimate = 5.1
               |
               |      method = Non parametric sign test.
               |
(%o3)          | hypotheses = H0: median = 6 , H1: median > 6
               |
               |                statistic = 7
               |
               |      distribution = [binomial, 10, 0.5]
               |
               |         p_value = .05468749999999989
Function: test_signed_rank (x)
Function: test_signed_rank (x, options ...)

This is the Wilcoxon signed rank test to make inferences about the median of a continuous population. Argument x is a list or a column matrix containing an one dimensional sample. Performs normal approximation if the sample size is greater than 20, or if there are zeroes or ties.

See also pdf_rank_test and cdf_rank_test.

Options:

  • 'median, default 0, is the median value to be checked.
  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.

The output of function test_signed_rank is an inference_result Maxima object with the following results:

  1. 'med_estimate: the sample median.
  2. 'method: inference procedure.
  3. 'hypotheses: null and alternative hypotheses to be tested.
  4. 'statistic: value of the sample statistic used for testing the null hypothesis.
  5. 'distribution: distribution of the sample statistic, together with its parameter(s).
  6. 'p_value: p-value of the test.

Examples:

Checks the null hypothesis H_0: median = 15 against the alternative H_1: median > 15. This is an exact test, since there are no ties.

(%i1) load("stats")$
(%i2) x: [17.1,15.9,13.7,13.4,15.5,17.6]$
(%i3) test_signed_rank(x,median=15,alternative=greater);
                 |             SIGNED RANK TEST
                 |
                 |           med_estimate = 15.7
                 |
                 |           method = Exact test
                 |
(%o3)            | hypotheses = H0: med = 15 , H1: med > 15
                 |
                 |              statistic = 14
                 |
                 |     distribution = [signed_rank, 6]
                 |
                 |            p_value = 0.28125

Checks the null hypothesis H_0: equal(median, 2.5) against the alternative H_1: not equal(median, 2.5). This is an approximated test, since there are ties.

(%i1) load("stats")$
(%i2) y:[1.9,2.3,2.6,1.9,1.6,3.3,4.2,4,2.4,2.9,1.5,3,2.9,4.2,3.1]$
(%i3) test_signed_rank(y,median=2.5);
             |                 SIGNED RANK TEST
             |
             |                med_estimate = 2.9
             |
             |          method = Asymptotic test. Ties
             |
(%o3)        |    hypotheses = H0: med = 2.5 , H1: med # 2.5
             |
             |                 statistic = 76.5
             |
             | distribution = [normal, 60.5, 17.58195097251724]
             |
             |           p_value = .3628097734643669
Function: test_rank_sum (x1, x2)
Function: test_rank_sum (x1, x2, option)

This is the Wilcoxon-Mann-Whitney test for comparing the medians of two continuous populations. The first two arguments x1 and x2 are lists or column matrices with the data of two independent samples. Performs normal approximation if any of the sample sizes is greater than 10, or if there are ties.

Option:

  • 'alternative, default 'twosided, is the alternative hypothesis; valid values are: 'twosided, 'greater and 'less.

The output of function test_rank_sum is an inference_result Maxima object with the following results:

  1. 'method: inference procedure.
  2. 'hypotheses: null and alternative hypotheses to be tested.
  3. 'statistic: value of the sample statistic used for testing the null hypothesis.
  4. 'distribution: distribution of the sample statistic, together with its parameters.
  5. 'p_value: p-value of the test.

Examples:

Checks whether populations have similar medians. Samples sizes are small and an exact test is made.

(%i1) load("stats")$
(%i2) x:[12,15,17,38,42,10,23,35,28]$
(%i3) y:[21,18,25,14,52,65,40,43]$
(%i4) test_rank_sum(x,y);
              |                 RANK SUM TEST
              |
              |              method = Exact test
              |
              | hypotheses = H0: med1 = med2 , H1: med1 # med2
(%o4)         |
              |                 statistic = 22
              |
              |        distribution = [rank_sum, 9, 8]
              |
              |          p_value = .1995886466474702

Now, with greater samples and ties, the procedure makes normal approximation. The alternative hypothesis is H_1: median1 < median2.

(%i1) load("stats")$
(%i2) x: [39,42,35,13,10,23,15,20,17,27]$
(%i3) y: [20,52,66,19,41,32,44,25,14,39,43,35,19,56,27,15]$
(%i4) test_rank_sum(x,y,'alternative='less);
             |                  RANK SUM TEST
             |
             |          method = Asymptotic test. Ties
             |
             |  hypotheses = H0: med1 = med2 , H1: med1 < med2
(%o4)        |
             |                 statistic = 48.5
             |
             | distribution = [normal, 79.5, 18.95419580097078]
             |
             |           p_value = .05096985666598441
Function: test_normality (x)

Shapiro-Wilk test for normality. Argument x is a list of numbers, and sample size must be greater than 2 and less or equal than 5000, otherwise, function test_normality signals an error message.

Reference:

[1] Algorithm AS R94, Applied Statistics (1995), vol.44, no.4, 547-551

The output of function test_normality is an inference_result Maxima object with the following results:

  1. 'statistic: value of the W statistic.
  2. 'p_value: p-value under normal assumption.

Examples:

Checks for the normality of a population, based on a sample of size 9.

(%i1) load("stats")$
(%i2) x:[12,15,17,38,42,10,23,35,28]$
(%i3) test_normality(x);
                       |      SHAPIRO - WILK TEST
                       |
(%o3)                  | statistic = .9251055695162436
                       |
                       |  p_value = .4361763918860381
Function: simple_linear_regression (x)
Function: simple_linear_regression (x option)

Simple linear regression, y_i=a+b x_i+e_i, where e_i are N(0,sigma) independent random variables. Argument x must be a two column matrix or a list of pairs.

Options:

  • 'conflevel, default 95/100, confidence level for the confidence interval; it must be an expression which takes a value in (0,1).
  • 'regressor, default 'x, name of the independent variable.

The output of function simple_linear_regression is an inference_result Maxima object with the following results:

  1. 'model: the fitted equation. Useful to make new predictions. See examples bellow.
  2. 'means: bivariate mean.
  3. 'variances: variances of both variables.
  4. 'correlation: correlation coefficient.
  5. 'adc: adjusted determination coefficient.
  6. 'a_estimation: estimation of parameter a.
  7. 'a_conf_int: confidence interval of parameter a.
  8. 'b_estimation: estimation of parameter b.
  9. 'b_conf_int: confidence interval of parameter b.
  10. 'hypotheses: null and alternative hypotheses about parameter b.
  11. 'statistic: value of the sample statistic used for testing the null hypothesis.
  12. 'distribution: distribution of the sample statistic, together with its parameter.
  13. 'p_value: p-value of the test about b.
  14. 'v_estimation: unbiased variance estimation, or residual variance.
  15. 'v_conf_int: variance confidence interval.
  16. 'cond_mean_conf_int: confidence interval for the conditioned mean. See examples bellow.
  17. 'new_pred_conf_int: confidence interval for a new prediction. See examples bellow.
  18. 'residuals: list of pairs (prediction, residual), ordered with respect to predictions. This is useful for goodness of fit analysis. See examples bellow.

Only items 1, 4, 14, 9, 10, 11, 12, and 13 above, in this order, are shown by default. The rest remain hidden until the user makes use of functions items_inference and take_inference.

Example:

Fitting a linear model to a bivariate sample. Input %i4 plots the sample together with the regression line; input %i5 computes y given x=113; the means and the confidence interval for a new prediction when x=113 are also calculated.

(%i1) load("stats")$
(%i2) s:[[125,140.7], [130,155.1], [135,160.3], [140,167.2],
                                                [145,169.8]]$
(%i3) z:simple_linear_regression(s,conflevel=0.99);
           |               SIMPLE LINEAR REGRESSION
           |
           |   model = 1.405999999999985 x - 31.18999999999804
           |
           |           correlation = .9611685255255155
           |
           |           v_estimation = 13.57966666666665
           |
(%o3)      | b_conf_int = [.04469633662525263, 2.767303663374718]
           |
           |          hypotheses = H0: b = 0 ,H1: b # 0
           |
           |            statistic = 6.032686683658114
           |
           |            distribution = [student_t, 3]
           |
           |             p_value = 0.0038059549413203
(%i4) plot2d([[discrete, s], take_inference(model,z)],
        [x,120,150],
        [gnuplot_curve_styles, ["with points","with lines"]] )$
(%i5) take_inference(model,z), x=133;
(%o5)                         155.808
(%i6) take_inference(means,z);
(%o6)                     [135.0, 158.62]
(%i7) take_inference(new_pred_conf_int,z), x=133;
(%o7)              [132.0728595995113, 179.5431404004887]

71.4 Functions and Variables for special distributions

Function: pdf_signed_rank (x, n)

Probability density function of the exact distribution of the signed rank statistic. Argument x is a real number and n a positive integer.

See also test_signed_rank.

Function: cdf_signed_rank (x, n)

Cumulative density function of the exact distribution of the signed rank statistic. Argument x is a real number and n a positive integer.

See also test_signed_rank.

Function: pdf_rank_sum (x, n, m)

Probability density function of the exact distribution of the rank sum statistic. Argument x is a real number and n and m are both positive integers.

See also test_rank_sum.

Function: cdf_rank_sum (x, n, m)

Cumulative density function of the exact distribution of the rank sum statistic. Argument x is a real number and n and m are both positive integers.

See also test_rank_sum.


Nächste: , Vorige:   [Inhalt][Index]