Package descriptive (Maxima 5.47post Manual)

52 Package descriptive

Introduction to descriptive
Functions and Variables for data manipulation
Functions and Variables for descriptive statistics
Functions and Variables for statistical graphs

52.1 Introduction to descriptive

Package descriptive contains a set of functions for making descriptive statistical computations and graphing. Together with the source code there are three data sets in your Maxima tree: pidigits.data, wind.data and biomed.data.

Any statistics manual can be used as a reference to the functions in package descriptive.

For comments, bugs or suggestions, please contact me at ’riotorto AT yahoo DOT com’.

Here is a simple example of how the functions in descriptive work, depending on whether their arguments are lists or matrices,

(%i1) load ("descriptive")$

(%i2) /* univariate sample */   mean ([a, b, c]);
                            c + b + a
(%o2)                       ---------
                                3

(%i3) matrix ([a, b], [c, d], [e, f]);
                            [ a  b ]
                            [      ]
(%o3)                       [ c  d ]
                            [      ]
                            [ e  f ]

(%i4) /* multivariate sample */ mean (%);
                      e + c + a  f + d + b
(%o4)                [---------, ---------]
                          3          3

Note that in multivariate samples the mean is calculated for each column.

When there are several samples of possibly different sizes, the Maxima function map can be used to get the desired results for each sample,

(%i1) load ("descriptive")$

(%i2) map (mean, [[a, b, c], [d, e]]);
                        c + b + a  e + d
(%o2)                  [---------, -----]
                            3        2

In this case, two samples of sizes 3 and 2 were stored into a list.

Univariate samples must be stored in lists like

(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
(%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

and multivariate samples in matrices as in

(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
                        [ 13.17  9.29  ]
                        [              ]
                        [ 14.71  16.88 ]
                        [              ]
                        [ 18.5   16.88 ]
(%o1)                   [              ]
                        [ 10.58  6.63  ]
                        [              ]
                        [ 13.33  13.25 ]
                        [              ]
                        [ 13.21  8.12  ]

In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.

Data can be entered by hand, but large samples are usually stored in plain text files. For example, file pidigits.data contains the first 100 digits of the number %pi.

In order to load these digits in Maxima,

(%i1) s1 : read_list (file_search ("pidigits.data"))$

(%i2) length (s1);
(%o2)                          100

On the other hand, file wind.data contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland. This is part of a data set taken at 12 meteorological stations; the original file is freely downloadable from the StatLib Data Repository, and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) Space-time Modelling with Long-memory Dependence: Assessing Ireland’s Wind Power Resource, with Discussion. Applied Statistics 38, 1-50. This loads the data:

(%i1) s2 : read_matrix (file_search ("wind.data"))$

(%i2) length (s2);
(%o2)                          100

(%i3) s2 [%]; /* last record */
(%o3)            [3.58, 6.0, 4.58, 7.62, 11.25]

Some samples contain non-numeric data. As an example, file biomed.data (which is part of a larger data set downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, A and B, of different ages,

(%i1) s3 : read_matrix (file_search ("biomed.data"))$

(%i2) length (s3);
(%o2)                          100

(%i3) s3 [1]; /* first record */
(%o3)            [A, 30, 167.0, 89.0, 25.6, 364]

The first individual belongs to group A, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.

One must take care when working with categorical data. In the next example, symbol a was assigned a value earlier in the session, and then a sample containing the categorical value a is entered. Because a evaluates to 1, the matrix stores 1 instead of the category a:

(%i1) a : 1$

(%i2) matrix ([a, 3], [b, 5]);
                            [ 1  3 ]
(%o2)                       [      ]
                            [ b  5 ]
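
One way to avoid this, offered here as a suggestion rather than as part of package descriptive, is to quote the symbol with the single-quote operator, or to remove its value with kill before building the sample. The names a, b and m are only illustrative.

```maxima
a : 1$

/* Quoting prevents evaluation, so the category a is stored as a symbol. */
m : matrix (['a, 3], [b, 5]);

/* Alternatively, unbind a first; it then evaluates to itself. */
kill (a)$
matrix ([a, 3], [b, 5]);
```

Both matrices keep the symbol a in the first row.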

Categories: Descriptive statistics · Share packages · Package descriptive ·

52.2 Functions and Variables for data manipulation

Function: build_sample
    build_sample (list)
    build_sample (matrix)

Builds a sample from a table of absolute frequencies. The input table can be a matrix or a list of lists, all of them of equal size. The number of columns or the length of the lists must be greater than 1. The last element of each row or list is interpreted as the absolute frequency. The output is always a sample in matrix form.

Examples:

Univariate frequency table.

(%i1) load ("descriptive")$

(%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
                              [ 6 ]
                              [   ]
                              [ j ]
(%o2)                         [   ]
                              [ j ]
                              [   ]
                              [ 2 ]

(%i3) mean(sam1);
                              j + 4
(%o3)                        [-----]
                                2

(%i4) barsplot(sam1) $

Multivariate frequency table.

(%i1) load ("descriptive")$

(%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
                            [ 6  3 ]
                            [      ]
                            [ 5  6 ]
                            [      ]
                            [ 5  6 ]
(%o2)                       [      ]
                            [ u  2 ]
                            [      ]
                            [ 6  8 ]
                            [      ]
                            [ 6  8 ]

(%i3) cov(sam2);
      [   2                 2                            ]
      [  u  + 158   (u + 28)     2 u + 174   11 (u + 28) ]
      [  -------- - ---------    --------- - ----------- ]
(%o3) [     6          36            6           12      ]
      [                                                  ]
      [ 2 u + 174   11 (u + 28)            21            ]
      [ --------- - -----------            --            ]
      [     6           12                 4             ]

(%i4) barsplot(sam2, grouping=stacked) $

Categories: Package descriptive ·

Function: continuous_freq
    continuous_freq (data)
    continuous_freq (data, m)

Divides the range of data into intervals, and counts how many values fall into each one.

A value x falls into an interval with left and right endpoints a and b if and only if x > a and x <= b, except for the first (least or leftmost) interval, for which x >= a and x <= b. That is, an interval excludes its left endpoint and includes its right endpoint, except for the first interval, which includes both the left and right endpoints.

data must be a list of numbers, or 1-dimensional array (as created by make_array).

m is optional, and equals either the number of classes (10 by default), or a list of two elements (the least and greatest values to be counted), or a list of three elements (the least and greatest values to be counted, and the number of classes), or a set containing the endpoints of the class intervals.

It is assumed that class intervals are contiguous. That is, the right endpoint of one interval is equal to the left endpoint of the next.

continuous_freq returns a list of two lists. The first list comprises all the endpoints of the class intervals, concatenated into a single list. The second list contains the class counts for the intervals corresponding to elements of the first list.

If sample values are all equal, this function returns exactly one class of width 2.
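
As a small illustration of the endpoint rule, consider a hypothetical five-value sample split into two classes between 1 and 5, so that the endpoints are 1, 3 and 5. By the rule stated above, the value 3 belongs to the first class [1, 3] and not to (3, 5], so the counts would be expected to be 3 and 2. The same classes can also be given explicitly as a set of endpoints.

```maxima
load ("descriptive")$

x : [1, 2, 3, 4, 5]$

/* Least value 1, greatest value 5, two classes. */
cf1 : continuous_freq (x, [1, 5, 2]);

/* The same class intervals, given as a set of endpoints. */
cf2 : continuous_freq (x, {1, 3, 5});
```

In both calls the first element of the result holds the endpoints and the second holds the class counts.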

Examples:

Optional argument indicates the number of classes we want. The first list in the output contains the interval limits, and the second the corresponding counts: there are 16 digits inside the interval [0, 1.8], 24 digits in (1.8, 3.6], and so on.

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) continuous_freq (s1, 5);
               9  18  27  36
(%o3)     [[0, -, --, --, --, 9], [16, 24, 18, 17, 25]]
               5  5   5   5

Optional argument indicates we want 7 classes with limits -2 and 12:

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) continuous_freq (s1, [-2,12,7]);
(%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]

Optional argument indicates we want the default number of classes with limits -2 and 12:

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) continuous_freq (s1, [-2,12]);
               3  4  11  18     32  39  46  53
(%o3) [[- 2, - -, -, --, --, 5, --, --, --, --, 12], 
               5  5  5   5      5   5   5   5
                              [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]

The first argument may be an array.

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) a1 : make_array (fixnum, length (s1)) $

(%i4) fillarray (a1, s1);
(%o4) {Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
 6 4 3 3 8 3 2 7 9 5
               0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
 4 9 4 4 5 9 2
               3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
 2 5 3 4 2 1 1
               7 0 6 7)}

(%i5) continuous_freq (a1);
           9   9  27  18  9  27  63  36  81
(%o5) [[0, --, -, --, --, -, --, --, --, --, 9], 
           10  5  10  5   2  5   10  5   10
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]

Categories: Package descriptive ·

Function: discrete_freq (data)

Counts absolute frequencies in discrete samples, both numeric and categorical. Its sole argument is a list, or 1-dimensional array (as created by make_array).

Examples:

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) discrete_freq (s1);
(%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]

In the return value, the first list gives the sample values, and the second, their absolute frequencies.
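
The two lists returned by discrete_freq can be turned back into a sample with build_sample, which expects one row per value with the frequency in the last position. The following is a sketch under that assumption; the names x, f, tab and sam are illustrative only.

```maxima
load ("descriptive")$

x : [3, 1, 3, 2, 3]$
f : discrete_freq (x);

/* One row [value, count] per distinct value. */
tab : transpose (apply (matrix, f));

/* Expand the frequency table into a one-column sample matrix. */
sam : build_sample (tab);

/* The rebuilt sample has the same mean as the original list. */
mean (sam);
mean (x);
```

Since sam is a matrix, mean (sam) returns a one-element list, while mean (x) returns a plain number.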

The argument may be an array.

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) a1 : make_array (fixnum, length (s1)) $

(%i4) fillarray (a1, s1);
(%o4) {Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
 6 4 3 3 8 3 2 7 9 5
               0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
 4 9 4 4 5 9 2
               3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
 2 5 3 4 2 1 1
               7 0 6 7)}

(%i5) discrete_freq (a1);
(%o5) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]

Categories: Package descriptive ·

Function: standardize
    standardize (list)
    standardize (matrix)

Subtracts the sample mean from each element of the list and divides the result by the standard deviation. When the input is a matrix, standardize subtracts the multivariate mean from each row, and then divides each component by the corresponding standard deviation.
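
A minimal sketch of standardize on a small, made-up list follows. Whichever standard deviation estimator the function uses internally, the standardized values should have mean zero, which the last line checks; the names x and z are illustrative.

```maxima
load ("descriptive")$

x : [2, 4, 4, 4, 5, 5, 7, 9]$

/* Center by the sample mean and scale by the standard deviation. */
z : standardize (x);

/* The mean of the standardized sample is expected to simplify to 0. */
ratsimp (mean (z));
```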

Categories: Package descriptive ·

Function: subsample
    subsample (data_matrix, predicate_function)
    subsample (data_matrix, predicate_function, col_num1, col_num2, ...)

This is a sort of variant of the Maxima submatrix function. The first argument is the data matrix, the second is a predicate function and optional additional arguments are the numbers of the columns to be taken.

Examples:

These are the multivariate records in which the wind speed at the first meteorological station was greater than 18. Note that in the lambda expression the i-th component is referred to as v[i].

(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$

(%i3) subsample (s2, lambda([v], v[1] > 18));
              [ 19.38  15.37  15.12  23.09  25.25 ]
              [                                   ]
              [ 18.29  18.66  19.08  26.08  27.63 ]
(%o3)         [                                   ]
              [ 20.25  21.46  19.95  27.71  23.38 ]
              [                                   ]
              [ 18.79  18.96  14.46  26.38  21.84 ]

In the following example, we request only the first, second and fifth components of those records with wind speeds greater than or equal to 16 at station number 1 and less than 25 knots at station number 4. The sample contains only data from stations 1, 2 and 5. In this case, the predicate function is defined as an ordinary Maxima function.

(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) g(x):= x[1] >= 16 and x[4] < 25$

(%i4) subsample (s2, g, 1, 2, 5);
                     [ 19.38  15.37  25.25 ]
                     [                     ]
                     [ 17.33  14.67  19.58 ]
(%o4)                [                     ]
                     [ 16.92  13.21  21.21 ]
                     [                     ]
                     [ 17.25  18.46  23.87 ]

Here is an example with the categorical variables of biomed.data. We want the records corresponding to those patients in group B who are older than 38 years.

(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) h(u):= u[1] = B and u[2] > 38 $

(%i4) subsample (s3, h);
                [ B  39  28.0  102.3  17.1  146 ]
                [                               ]
                [ B  39  21.0  92.4   10.3  197 ]
                [                               ]
                [ B  39  23.0  111.5  10.0  133 ]
                [                               ]
                [ B  39  26.0  92.6   12.3  196 ]
(%o4)           [                               ]
                [ B  39  25.0  98.7   10.0  174 ]
                [                               ]
                [ B  39  21.0  93.2   5.9   181 ]
                [                               ]
                [ B  39  18.0  95.0   11.3  66  ]
                [                               ]
                [ B  39  39.0  88.5   7.6   168 ]

Probably, the statistical analysis will involve only the blood measures,

(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$

(%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38),
           3, 4, 5, 6);
                   [ 28.0  102.3  17.1  146 ]
                   [                        ]
                   [ 21.0  92.4   10.3  197 ]
                   [                        ]
                   [ 23.0  111.5  10.0  133 ]
                   [                        ]
                   [ 26.0  92.6   12.3  196 ]
(%o3)              [                        ]
                   [ 25.0  98.7   10.0  174 ]
                   [                        ]
                   [ 21.0  93.2   5.9   181 ]
                   [                        ]
                   [ 18.0  95.0   11.3  66  ]
                   [                        ]
                   [ 39.0  88.5   7.6   168 ]

This is the multivariate mean of s3,

(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$

(%i3) mean (s3);
       13 B + 7 A  317
(%o3) [----------, ---, 87.178, 0.06 NA + 81.44999999999999, 
           20      10
                                                    3 NA + 19587
                                18.122999999999998, ------------]
                                                        100

Here, the first component is meaningless, since A and B are categorical; the second component is the mean age of the individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because symbol NA is used here to indicate unavailable data, so these two means are nonsense. A possible solution would be to remove from the matrix those rows with NA symbols, although this entails some loss of information.

(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) g(v):= v[4] # NA and v[6] # NA $

(%i4) mean (subsample (s3, g, 3, 4, 5, 6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813, 
                                                            2514
                                                            ----]
                                                             13

Categories: Package descriptive ·

Function: transform_sample (matrix, varlist, exprlist)

Transforms the sample matrix, where each column is called according to varlist, following expressions in exprlist.

Examples:

The second argument assigns names to the three columns. With these names, a list of expressions defines the transformation of the sample.

(%i1) load ("descriptive")$
(%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $

(%i3) transform_sample(data, [a,b,c], [c, a*b, log(a)]);
                               [ 7  6   log(3) ]
                               [               ]
                               [ 2  21  log(3) ]
(%o3)                          [               ]
                               [ 4  16  log(8) ]
                               [               ]
                               [ 4  10  log(5) ]

Add a constant column and remove the third variable.

(%i1) load ("descriptive")$
(%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
(%i3) transform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);

                                  [ 1  3  2 ]
                                  [         ]
                                  [ 1  3  7 ]
(%o3)                             [         ]
                                  [ 1  8  2 ]
                                  [         ]
                                  [ 1  5  2 ]

Categories: Package descriptive ·

52.3 Functions and Variables for descriptive statistics

Function: mean
    mean (x)
    mean (x, w)

Returns the sample mean. x must be a list or matrix.

When x is a list, mean returns the sample mean of x.

When x is a matrix, mean returns a list comprising the sample mean of each column.

w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.

The unweighted sample mean is defined as

                     n
                    ====
             _   1  \
             x = -   >    x
                 n  /      i
                    ====
                    i = 1

The weighted sample mean is defined as

                     n
                    ====
             _   1  \
             x = -   >    w  x
                 Z  /      i  i
                    ====
                    i = 1

where Z is the sum of the weights,

                   n
                  ====
                  \
             Z =   >    w
                  /      i
                  ====
                  i = 1
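
Because the weighted mean divides by Z, only the relative sizes of the weights matter. The short sketch below, using a made-up list x, shows two proportional weight lists giving the same result, and that integer weights behave like repetition counts.

```maxima
load ("descriptive")$

x : [1, 2, 3]$

/* (1*1 + 1*2 + 2*3) / (1 + 1 + 2) = 9/4 */
m1 : mean (x, [1, 1, 2]);

/* Scaling every weight by 10 leaves the mean unchanged. */
m2 : mean (x, [10, 10, 20]);

/* Weight 2 on the value 3 is equivalent to listing 3 twice. */
m3 : mean ([1, 2, 3, 3]);
```

All three results equal 9/4.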

Examples:

Sample mean of a list.

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) mean (s1);
                               471
(%o3)                          ---
                               100

Sample mean of each column of a matrix.

(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$

(%i3) mean (s2);
(%o3) [9.9485, 10.160700000000004, 10.868499999999997, 
                          15.716600000000001, 14.844100000000001]

Weighted sample mean of a list.

(%i1) load ("descriptive")$

(%i2) mean ([a, b, c, d], [1, 2, 3, 4]);
                       4 d + 3 c + 2 b + a
(%o2)                  -------------------
                               10

Weighted sample mean of each column of a matrix.

(%i1) load ("descriptive")$

(%i2) mm: matrix ([p, q, r], [s, t, u]);
                           [ p  q  r ]
(%o2)                      [         ]
                           [ s  t  u ]

(%i3) mean (mm, [vv, ww]);
              s ww + p vv  t ww + q vv  u ww + r vv
(%o3)        [-----------, -----------, -----------]
                ww + vv      ww + vv      ww + vv

Categories: Package descriptive ·

Function: var
    var (x)
    var (x, w)

Returns the sample variance. x must be a list or matrix.

When x is a list, var returns the sample variance of x.

When x is a matrix, var returns a list comprising the sample variance of each column.

The unweighted sample variance is defined as

                    n
                  ====
           2   1  \          _ 2
          s  = -   >    (x - x)
               n  /       i
                  ====
                  i = 1

The weighted sample variance is defined as

                    n
                  ====
           2   1  \             _ 2
          s  = -   >    w  (x - x)
               Z  /      i   i
                  ====
                  i = 1

where Z is the sum of the weights,

                   n
                  ====
                  \
             Z =   >    w
                  /      i
                  ====
                  i = 1
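
Expanding the square in either definition shows that the variance equals the mean of the squares minus the square of the mean. The sketch below checks this identity on a made-up list x, with and without weights; it relies on Maxima applying ^ elementwise to lists, which is the default behaviour.

```maxima
load ("descriptive")$

x : [1, 2, 3, 4]$
w : [1, 2, 3, 4]$

/* Unweighted: both expressions give 5/4. */
v1 : var (x);
v2 : mean (x^2) - mean (x)^2;

/* Weighted: the weighted mean is 3 and both expressions give 1. */
v3 : var (x, w);
v4 : mean (x^2, w) - mean (x, w)^2;
```

Note that both definitions divide by the full weight (n or Z), not by n - 1.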

Examples:

Sample variance of a list.

(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$

(%i3) var (s1), numer;
(%o3)                   8.425899999999999

Sample variance of each column of a matrix.

(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$

(%i3) var (s2);
(%o3) [17.22190675000001, 14.987736510000005, 
       15.475728749999998, 32.17651044000001, 24.423076190000007]

Weighted sample variance of a list.

(%i1) load ("descriptive")$

(%i2) var ([a - b, a, a + b], [3, 5, 7]);
                                  2
                             134 b
(%o2)                        ------
                              225

Weighted sample variance of each column of a matrix.

(%i1) load ("descriptive")$

(%i2) mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]);
                        [ a - b  c - d ]
                        [              ]
(%o2)                   [   a      c   ]
                        [              ]
                        [ b + a  d + c ]

(%i3) var (mm, [3, 5, 7]);
                              2       2
                         134 b   134 d
(%o3)                   [------, ------]
                          225     225

52.4 Functions and Variables for statistical graphs

Function: barsplot (data1, data2, …, option_1, option_2, …)

Plots bar charts for discrete statistical variables, for either one or multiple samples.

data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.

Available options are:

box_width (default, 3/4): relative width of rectangles. This value must be in the range [0,1].
grouping (default, clustered): indicates how multiple samples are shown. Valid values are: clustered and stacked.
groups_gap (default, 1): a positive integer number representing the gap between two consecutive groups of bars.
bars_colors (default, []): a list of colors for multiple samples. When there are more samples than specified colors, the extra necessary colors are chosen at random. See color to learn more about them.
frequency (default, absolute): indicates the scale of the ordinates. Possible values are: absolute, relative, and percent.
ordering (default, orderlessp): possible values are orderlessp or ordergreatp, indicating how statistical outcomes should be ordered on the x-axis.
sample_keys (default, []): a list with the strings to be used in the legend. When the list length is other than 0 or the number of samples, an error message is returned.
start_at (default, 0): indicates where the plot begins to be plotted on the x axis.
All global draw options, except xtics, which is internally assigned by barsplot. If you want to set your own values for this option or want to build complex scenes, make use of barsplot_description. See example below.
The following local draw options: key, color, fill_color, fill_density and line_width. See also the bars graphic object of package draw.

There is also a function wxbarsplot for creating embedded bar charts in interfaces wxMaxima and iMaxima.

Examples:

Univariate sample in matrix form. Absolute frequencies.

(%i1) load ("descriptive")$
(%i2) m : read_matrix (file_search ("biomed.data"))$

(%i3) barsplot(
  col(m,2),
  title        = "Ages",
  xlabel       = "years",
  box_width    = 1/2,
  fill_density = 3/4)$

Two samples of different sizes, with relative frequencies and user declared colors.

(%i1) load ("descriptive")$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$

(%i4) barsplot(
   l1,l2,
   box_width = 1,
   fill_density = 1,
   bars_colors = [black, grey],
   frequency = relative,
   sample_keys = ["A", "B"])$

Four non numeric samples of equal size.

(%i1) load ("descriptive")$

(%i2) barsplot(
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  title      = "Asking for something to four groups",
  ylabel     = "# of individuals",
  groups_gap = 3,
  fill_density = 0.5,
  ordering = ordergreatp)$

Stacked bars.

(%i1) load ("descriptive")$

(%i2) barsplot(
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  makelist([Yes, No, Maybe][random(3)+1],k,1,50),
  title      = "Asking for something to four groups",
  ylabel     = "# of individuals",
  grouping   = stacked,
  fill_density = 0.5,
  ordering = ordergreatp)$

For options related to bar charts, see the bars graphic object of package draw. See also functions histogram and piechart.

Categories: Package descriptive · Plotting ·

Function: barsplot_description (…)

Function barsplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects.

Example: barsplot in a multiplot context.

(%i1) load ("descriptive")$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$
(%i4) bp1 : 
        barsplot_description(
         l1,
         box_width = 1,
         fill_density = 0.5,
         bars_colors = [blue],
         frequency = relative)$
(%i5) bp2 : 
        barsplot_description(
         l2,
         box_width = 1,
         fill_density = 0.5,
         bars_colors = [red],
         frequency = relative)$
(%i6) draw(gr2d(bp1), gr2d(bp2))$

Categories: Package descriptive · Plotting ·

Function: boxplot
    boxplot (data)
    boxplot (data, option_1, option_2, …)

This function plots box-and-whisker diagrams. Argument data can be a list, a matrix, or a list of samples. A single list is of limited interest, since these diagrams are mainly used for comparing different samples. A matrix makes it possible to compare two or more components of a multivariate statistical variable. data may also be a list of samples with possibly different sample sizes; in fact, this is the only function in package descriptive that admits this type of data structure.

The box is plotted from the first quartile to the third, with a horizontal segment situated at the second quartile or median. By default, lower and upper whiskers are plotted at the minimum and maximum values, respectively. Option range can be used to indicate that values greater than quantile(x,3/4)+range*(quantile(x,3/4)-quantile(x,1/4)) or less than quantile(x,1/4)-range*(quantile(x,3/4)-quantile(x,1/4)) must be considered outliers, in which case they are plotted as isolated points, and the whiskers are located at the extremes of the rest of the sample.
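
The outlier boundaries described above can be computed by hand with quantile. The following is a sketch of that calculation for a made-up sample and range = 3/2; it is not the code boxplot itself runs, and the names x, r, q1, q3, lower, upper and outliers are illustrative.

```maxima
load ("descriptive")$

x : [10, 8, 12, 8, 11, 9, 20]$
r : 3/2$

/* First and third quartiles of the sample. */
q1 : quantile (x, 1/4)$
q3 : quantile (x, 3/4)$

/* Boundaries: range times the interquartile range beyond each quartile. */
lower : q1 - r * (q3 - q1);
upper : q3 + r * (q3 - q1);

/* Values outside [lower, upper] would be drawn as isolated points. */
outliers : sublist (x, lambda ([v], v < lower or v > upper));
```

For this sample the value 20 lies above the upper boundary, so it is reported as an outlier.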

Available options are:

box_width (default, 3/4): relative width of boxes. This value must be in the range [0,1].
box_orientation (default, vertical): possible values: vertical and horizontal.
range (default, inf): positive coefficient of the interquartile range used to set the outlier boundaries.
outliers_size (default, 1): circle size for isolated outliers.
All draw options, except points_joined, point_size, point_type, xtics, ytics, xrange, and yrange, which are internally assigned by boxplot. If you want to set your own values for these options or want to build complex scenes, make use of boxplot_description.
The following local draw options: key, color, and line_width.

There is also a function wxboxplot for creating embedded box-and-whisker diagrams in interfaces wxMaxima and iMaxima.

Examples:

Box-and-whisker diagram from a multivariate sample.

(%i1) load ("descriptive")$
(%i2) s2 : read_matrix(file_search("wind.data"))$

(%i3) boxplot(s2,
  box_width  = 0.2,
  title      = "Windspeed in knots",
  xlabel     = "Stations",
  color      = red,
  line_width = 2)$

Box-and-whisker diagram from three samples of different sizes.

(%i1) load ("descriptive")$

(%i2) A :
 [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
  [8, 10, 7, 9, 12, 8, 10],
  [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$

(%i3) boxplot (A, box_orientation = horizontal)$

Option range can be used to handle outliers.

(%i1) load ("descriptive")$

(%i2) B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
          [10, 8, 12, 8, 11, 9, 20],
          [23, 17, 19, 7, 22, 19]] $

(%i3) boxplot (B, range=1)$

(%i4) boxplot (B, range=1.5, box_orientation = horizontal)$

(%i5) draw2d(
    boxplot_description(
       B,
       range            = 1.5,
       line_width       = 3,
       outliers_size    = 2,
       color            = red,
       background_color = light_gray),
    xtics = {["Low",1],["Medium",2],["High",3]}) $

Categories: Package descriptive · Plotting ·

Function: boxplot_description (…)

Function boxplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects.

Categories: Package descriptive · Plotting ·

Function: histogram
    histogram (list)
    histogram (list, option_1, option_2, …)
    histogram (one_column_matrix)
    histogram (one_column_matrix, option_1, option_2, …)
    histogram (one_row_matrix)
    histogram (one_row_matrix, option_1, option_2, …)

Constructs and displays a histogram from a data sample. Data must be stored as a list of numbers, or a matrix of one row or one column.

Optional arguments:

nclasses (default, 10): the number of classes (also called bins) in the histogram, or a list of two numbers (the least and greatest values included in the histogram), or a list of three numbers (the least and greatest values included in the histogram, and the number of classes), or a set containing the endpoints of the class intervals, or a symbol specifying the name of one of three algorithms to automatically determine the number of classes: fd (Ref. [1]), scott (Ref. [2]), or sturges (Ref. [3]).
A class interval excludes its left endpoint and includes its right endpoint, except for the first interval, which includes both the left and right endpoints. It is assumed that class intervals are contiguous. That is, the right endpoint of one interval is equal to the left endpoint of the next.
frequency (default, absolute): indicates the scale of the vertical axis. Possible values are: absolute (heights of bars add up to number of data), relative (heights of bars add up to 1), percent (heights of bars add up to 100), and density (total area of histogram is 1).
htics (default, auto): format of tic marks on the horizontal axis. Possible values are: auto (tics are placed automatically), endpoints (tics are placed at the divisions between classes), intervals (classes are labeled with the corresponding intervals), or a list of labels, one for each class.
All global draw options, except xrange, yrange, and xtics, which are internally assigned by histogram. If you want to set your own values for these options, make use of histogram_description.
The following local draw options: key, fill_color, fill_density, and line_width. Note that the outlines of bars, as well as the interior of bars when fill_density is nonzero, are drawn with fill_color, not color.

histogram honors the global option histogram_skyline. When histogram_skyline is true, histogram and histogram_description construct "skyline" plots, which show the outline of the histogram bars instead of drawing all the vertical segments. Otherwise (the default), histograms are displayed with bars showing vertical segments.

There is also a function wxhistogram for creating embedded histograms in interfaces wxMaxima and iMaxima.

See also continuous_freq, which, like histogram, counts data in intervals, but returns the counts instead of displaying a graphic representation.
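
Since nclasses accepts the same kinds of class specification as the second argument of continuous_freq, the two functions can be used side by side: one to inspect the counts and the other to plot them. The sketch below assumes a working plotting backend for the histogram call; the name cf is illustrative.

```maxima
load ("descriptive")$

s1 : read_list (file_search ("pidigits.data"))$

/* Endpoints and counts for 5 classes between 0 and 10. */
cf : continuous_freq (s1, [0, 10, 5]);

/* The corresponding plot, using the same class specification. */
histogram (s1, nclasses = [0, 10, 5], frequency = absolute)$
```

The counts in the second element of cf add up to the sample size, 100.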