Next: Package diag, Previous: Package contrib_ode [Contents][Index]
Next: Functions and Variables for data manipulation, Previous: Package descriptive, Up: Package descriptive [Contents][Index]
Package descriptive
contains a set of functions for
making descriptive statistical computations and graphing.
Together with the source code there are three data sets in
your Maxima tree: pidigits.data
, wind.data
and biomed.data
.
Any statistics manual can be used as a reference to the functions in package descriptive
.
For comments, bugs or suggestions, please contact me at ’riotorto AT yahoo DOT com’.
Here is a simple example on how the descriptive functions in descriptive
do they work, depending on the nature of their arguments, lists or matrices,
(%i1) load ("descriptive")$
(%i2) /* univariate sample */ mean ([a, b, c]); c + b + a (%o2) --------- 3
(%i3) matrix ([a, b], [c, d], [e, f]); [ a b ] [ ] (%o3) [ c d ] [ ] [ e f ]
(%i4) /* multivariate sample */ mean (%); e + c + a f + d + b (%o4) [---------, ---------] 3 3
Note that in multivariate samples the mean is calculated for each column.
In case of several samples with possible different sizes, the Maxima function map
can be used to get the desired results for each sample,
(%i1) load ("descriptive")$
(%i2) map (mean, [[a, b, c], [d, e]]); c + b + a e + d (%o2) [---------, -----] 3 2
In this case, two samples of sizes 3 and 2 were stored into a list.
Univariate samples must be stored in lists like
(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]; (%o1) [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
and multivariate samples in matrices as in
(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88], [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]); [ 13.17 9.29 ] [ ] [ 14.71 16.88 ] [ ] [ 18.5 16.88 ] (%o1) [ ] [ 10.58 6.63 ] [ ] [ 13.33 13.25 ] [ ] [ 13.21 8.12 ]
In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.
Data can be introduced by hand, but big samples are usually stored in plain text files. For example, file pidigits.data
contains the first 100 digits of number %pi
:
3 1 4 1 5 9 2 6 5 3 ...
In order to load these digits in Maxima,
(%i1) s1 : read_list (file_search ("pidigits.data"))$
(%i2) length (s1); (%o2) 100
On the other hand, file wind.data
contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (This is part of a data set taken at 12 meteorological stations. The original file is freely downloadable from the StatLib Data Repository and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) Space-time Modelling with Long-memory Dependence: Assessing Ireland’s Wind Power Resource, with Discussion. Applied Statistics 38, 1-50). This loads the data:
(%i1) s2 : read_matrix (file_search ("wind.data"))$
(%i2) length (s2); (%o2) 100
(%i3) s2 [%]; /* last record */ (%o3) [3.58, 6.0, 4.58, 7.62, 11.25]
Some samples contain non numeric data. As an example, file biomed.data
(which is part of another bigger one downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, A
and B
, of different ages,
(%i1) s3 : read_matrix (file_search ("biomed.data"))$
(%i2) length (s3); (%o2) 100
(%i3) s3 [1]; /* first record */ (%o3) [A, 30, 167.0, 89.0, 25.6, 364]
The first individual belongs to group A
, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.
One must take care when working with categorical data. In the next example, symbol a
is assigned a value in some previous moment and then a sample with categorical value a
is taken,
(%i1) a : 1$
(%i2) matrix ([a, 3], [b, 5]); [ 1 3 ] (%o2) [ ] [ b 5 ]
Next: Functions and Variables for descriptive statistics, Previous: Introduction to descriptive, Up: Package descriptive [Contents][Index]
Builds a sample from a table of absolute frequencies. The input table can be a matrix or a list of lists, all of them of equal size. The number of columns or the length of the lists must be greater than 1. The last element of each row or list is interpreted as the absolute frequency. The output is always a sample in matrix form.
Examples:
Univariate frequency table.
(%i1) load ("descriptive")$
(%i2) sam1: build_sample([[6,1], [j,2], [2,1]]); [ 6 ] [ ] [ j ] (%o2) [ ] [ j ] [ ] [ 2 ]
(%i3) mean(sam1); j + 4 (%o3) [-----] 2
(%i4) barsplot(sam1) $
Multivariate frequency table.
(%i1) load ("descriptive")$
(%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ; [ 6 3 ] [ ] [ 5 6 ] [ ] [ 5 6 ] (%o2) [ ] [ u 2 ] [ ] [ 6 8 ] [ ] [ 6 8 ]
(%i3) cov(sam2); [ 2 2 ] [ u + 158 (u + 28) 2 u + 174 11 (u + 28) ] [ -------- - --------- --------- - ----------- ] (%o3) [ 6 36 6 12 ] [ ] [ 2 u + 174 11 (u + 28) 21 ] [ --------- - ----------- -- ] [ 6 12 4 ]
(%i4) barsplot(sam2, grouping=stacked) $
Divides the range of data into intervals, and counts how many values fall into each one.
A value x falls into an interval with left and right endpoints a and b
if and only if x > a
and x <= b
,
except for the first (least or leftmost) interval,
for which x >= a
and x <= b
.
That is, an interval excludes its left endpoint and includes its right endpoint,
except for the first interval, which includes both the left and right endpoints.
data must be a list of numbers,
or 1-dimensional array (as created by make_array
).
m is optional, and equals either the number of classes (10 by default), or a list of two elements (the least and greatest values to be counted), or a list of three elements (the least and greatest values to be counted, and the number of classes), or a set containing the endpoints of the class intervals.
It is assumed that class intervals are contiguous. That is, the right endpoint of one interval is equal to the left endpoint of the next.
continuous_freq
returns a list of two lists.
The first list comprises all the endpoints of the class intervals,
concatenated into a single list.
The second list contains the class counts for the intervals corresponding to elements of the first list.
If sample values are all equal, this function returns exactly one class of width 2.
Examples:
Optional argument indicates the number of classes we want.
The first list in the output contains the interval limits, and
the second the corresponding counts: there are 16 digits inside
the interval [0, 1.8]
, 24 digits in (1.8, 3.6]
, and so on.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, 5); 9 18 27 36 (%o3) [[0, -, --, --, --, 9], [16, 24, 18, 17, 25]] 5 5 5 5
Optional argument indicates we want 7 classes with limits -2 and 12:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12,7]); (%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
Optional argument indicates we want the default number of classes with limits -2 and 12:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12]); 3 4 11 18 32 39 46 53 (%o3) [[- 2, - -, -, --, --, 5, --, --, --, --, 12], 5 5 5 5 5 5 5 5 [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
The first argument may be an array.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) a1 : make_array (fixnum, length (s1)) $
(%i4) fillarray (a1, s1); (%o4) {Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\ 6 4 3 3 8 3 2 7 9 5 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\ 4 9 4 4 5 9 2 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\ 2 5 3 4 2 1 1 7 0 6 7)}
(%i5) continuous_freq (a1); 9 9 27 18 9 27 63 36 81 (%o5) [[0, --, -, --, --, -, --, --, --, --, 9], 10 5 10 5 2 5 10 5 10 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
Counts absolute frequencies in discrete samples, both numeric and categorical. Its sole argument is a list,
or 1-dimensional array (as created by make_array
).
Examples:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) discrete_freq (s1); (%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
In the return value, the first list gives the sample values, and the second, their absolute frequencies.
The argument may be an array.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) a1 : make_array (fixnum, length (s1)) $
(%i4) fillarray (a1, s1); (%o4) {Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\ 6 4 3 3 8 3 2 7 9 5 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\ 4 9 4 4 5 9 2 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\ 2 5 3 4 2 1 1 7 0 6 7)}
(%i5) discrete_freq (a1); (%o5) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
Subtracts to each element of the list the sample mean and divides
the result by the standard deviation. When the input is a matrix,
standardize
subtracts to each row the multivariate mean, and then
divides each component by the corresponding standard deviation.
This is a sort of variant of the Maxima submatrix
function.
The first argument is the data matrix, the second is a predicate function
and optional additional arguments are the numbers of the columns to be taken.
Examples:
These are multivariate records in which the wind speed
in the first meteorological station were greater than 18.
See that in the lambda expression the i-th component is
referred to as v[i]
.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) subsample (s2, lambda([v], v[1] > 18)); [ 19.38 15.37 15.12 23.09 25.25 ] [ ] [ 18.29 18.66 19.08 26.08 27.63 ] (%o3) [ ] [ 20.25 21.46 19.95 27.71 23.38 ] [ ] [ 18.79 18.96 14.46 26.38 21.84 ]
In the following example, we request only the first, second and fifth components of those records with wind speeds greater or equal than 16 in station number 1 and less than 25 knots in station number 4. The sample contains only data from stations 1, 2 and 5. In this case, the predicate function is defined as an ordinary Maxima function.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) g(x):= x[1] >= 16 and x[4] < 25$
(%i4) subsample (s2, g, 1, 2, 5); [ 19.38 15.37 25.25 ] [ ] [ 17.33 14.67 19.58 ] (%o4) [ ] [ 16.92 13.21 21.21 ] [ ] [ 17.25 18.46 23.87 ]
Here is an example with the categorical variables of biomed.data
.
We want the records corresponding to those patients in group B
who are older than 38 years.
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) h(u):= u[1] = B and u[2] > 38 $
(%i4) subsample (s3, h); [ B 39 28.0 102.3 17.1 146 ] [ ] [ B 39 21.0 92.4 10.3 197 ] [ ] [ B 39 23.0 111.5 10.0 133 ] [ ] [ B 39 26.0 92.6 12.3 196 ] (%o4) [ ] [ B 39 25.0 98.7 10.0 174 ] [ ] [ B 39 21.0 93.2 5.9 181 ] [ ] [ B 39 18.0 95.0 11.3 66 ] [ ] [ B 39 39.0 88.5 7.6 168 ]
Probably, the statistical analysis will involve only the blood measures,
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38), 3, 4, 5, 6); [ 28.0 102.3 17.1 146 ] [ ] [ 21.0 92.4 10.3 197 ] [ ] [ 23.0 111.5 10.0 133 ] [ ] [ 26.0 92.6 12.3 196 ] (%o3) [ ] [ 25.0 98.7 10.0 174 ] [ ] [ 21.0 93.2 5.9 181 ] [ ] [ 18.0 95.0 11.3 66 ] [ ] [ 39.0 88.5 7.6 168 ]
This is the multivariate mean of s3
,
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) mean (s3); 13 B + 7 A 317 (%o3) [----------, ---, 87.178, 0.06 NA + 81.44999999999999, 20 10 3 NA + 19587 18.122999999999998, ------------] 100
Here, the first component is meaningless, since A
and B
are categorical, the second component is the mean age of individuals in rational form, and the fourth and last values exhibit some strange behaviour. This is because symbol NA
is used here to indicate non available data, and the two means are nonsense. A possible solution would be to take out from the matrix those rows with NA
symbols, although this deserves some loss of information.
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) g(v):= v[4] # NA and v[6] # NA $
(%i4) mean (subsample (s3, g, 3, 4, 5, 6)); (%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813, 2514 ----] 13
Transforms the sample matrix, where each column is called according to varlist, following expressions in exprlist.
Examples:
The second argument assigns names to the three columns. With these names, a list of expressions define the transformation of the sample.
(%i1) load ("descriptive")$ (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
(%i3) transform_sample(data, [a,b,c], [c, a*b, log(a)]); [ 7 6 log(3) ] [ ] [ 2 21 log(3) ] (%o3) [ ] [ 4 16 log(8) ] [ ] [ 4 10 log(5) ]
Add a constant column and remove the third variable.
(%i1) load ("descriptive")$ (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $ (%i3) transform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);
[ 1 3 2 ] [ ] [ 1 3 7 ] (%o3) [ ] [ 1 8 2 ] [ ] [ 1 5 2 ]
Next: Functions and Variables for statistical graphs, Previous: Functions and Variables for data manipulation, Up: Package descriptive [Contents][Index]
Returns the sample mean. x must be a list or matrix.
When x is a list,
mean
returns the sample mean of x.
When x is a matrix,
mean
returns a list comprising the sample mean of each column.
w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
The unweighted sample mean is defined as
n ==== _ 1 \ x = - > x n / i ==== i = 1
The weighted sample mean is defined as
n ==== _ 1 \ x = - > w x Z / i i ==== i = 1
where Z is the sum of the weights,
n ==== \ Z = > w / i ==== i = 1
Examples:
Sample mean of a list.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean (s1); 471 (%o3) --- 100
Sample mean of each column of a matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) mean (s2); (%o3) [9.9485, 10.160700000000004, 10.868499999999997, 15.716600000000001, 14.844100000000001]
Weighted sample mean of a list.
(%i1) load ("descriptive")$
(%i2) mean ([a, b, c, d], [1, 2, 3, 4]); 4 d + 3 c + 2 b + a (%o2) ------------------- 10
Weighted sample mean of each column of a matrix.
(%i1) load ("descriptive")$
(%i2) mm: matrix ([p, q, r], [s, t, u]); [ p q r ] (%o2) [ ] [ s t u ]
(%i3) mean (mm, [vv, ww]); s ww + p vv t ww + q vv u ww + r vv (%o3) [-----------, -----------, -----------] ww + vv ww + vv ww + vv
Returns the sample variance. x must be a list or matrix.
When x is a list,
var
returns the sample variance of x.
When x is a matrix,
var
returns a list comprising the sample variance of each column.
w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
The unweighted sample variance is defined as
n ==== 2 1 \ _ 2 s = - > (x - x) n / i ==== i = 1
The weighted sample variance is defined as
n ==== 2 1 \ _ 2 s = - > w (x - x) Z / i i ==== i = 1
where Z is the sum of the weights,
n ==== \ Z = > w / i ==== i = 1
Example:
Sample variance of a list.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var (s1), numer; (%o3) 8.425899999999999
Sample variance of each column of a matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) var (s2); (%o3) [17.22190675000001, 14.987736510000005, 15.475728749999998, 32.17651044000001, 24.423076190000007]
Weighted sample variance of a list.
(%i1) load ("descriptive")$
(%i2) var ([a - b, a, a + b], [3, 5, 7]); 2 134 b (%o2) ------ 225
Weighted sample variance of each column of a matrix.
(%i1) load ("descriptive")$
(%i2) mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]); [ a - b c - d ] [ ] (%o2) [ a c ] [ ] [ b + a d + c ]
(%i3) var (mm, [3, 5, 7]); 2 2 134 b 134 d (%o3) [------, ------] 225 225
See also function var1
.
This is the sample variance, defined as
n ==== 1 \ _ 2 --- > (x - x) n-1 / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var1 (s1), numer; (%o3) 8.5110101010101
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) var1 (s2); (%o5) [17.395865404040414, 15.139127787878794, 15.632049242424243, 32.50152569696971, 24.669773929292937]
See also function var
.
Returns the sample standard deviation. x must be a list or matrix.
When x is a list,
std
returns the sample standard deviation of x,
which is defined as the square root of the sample variance,
as computed by var
.
When x is a matrix,
std
returns a list comprising the sample standard deviation of each column.
w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
Example:
Sample standard deviation of a list.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std (s1), numer; (%o3) 2.9027400848164135
Sample standard deviation of each column of a matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) std (s2); (%o3) [4.149928523480858, 3.8713998127292415, 3.9339202775348663, 5.672434260526957, 4.941970881136392]
See also functions var
and std1
.
This is the square root of the function var1
, the variance with denominator n-1.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std1 (s1), numer; (%o3) 2.917363553109228
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std1 (s2); (%o5) [4.170835096721089, 3.8909032097803196, 3.9537386411375555, 5.701010936401517, 4.966867617451963]
See also functions var1
and std
.
Returns the noncentral moment of order k. x must be a list or matrix.
When x is a list,
noncentral_moment
returns the noncentral moment of order k of x.
When x is a matrix,
noncentral_moment
returns a list comprising the noncentral moment of order k of each column.
w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
The unweighted noncentral moment of order k is defined as
n ==== 1 \ k - > x n / i ==== i = 1
The weighted noncentral moment of order k is defined as
n ==== 1 \ k - > w x Z / i i ==== i = 1
where Z is the sum of the weights,
n ==== \ Z = > w / i ==== i = 1
Examples:
First noncentral moment of a list. The first noncentral moment is equal to the sample mean.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) noncentral_moment (s1, 1), numer; (%o3) 4.71
(%i4) mean (s1), numer; (%o4) 4.71
Fifth noncentral moment of each column of a matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) noncentral_moment (s2, 5); (%o3) [319793.87247615046, 320532.19238924625, 391249.56213815557, 2502278.205988911, 1691881.7977422548]
See also function central_moment
.
Returns the central moment of order k. x must be a list or matrix.
When x is a list,
central_moment
returns the central moment of order k of x.
When x is a matrix,
central_moment
returns a list comprising the central moment of order k of each column.
w is an optional per-datum weight. w must either be 1, in which case every datum x[i] is given equal weight, or a list of the same length as x, in which case the weight for x[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
The unweighted central moment of order k is defined as
n ==== 1 \ _ k - > (x - x) n / i ==== i = 1
The weighted central moment of order k is defined as
n ==== 1 \ _ k - > w (x - x) Z / i i ==== i = 1
where Z is the sum of the weights,
n ==== \ Z = > w / i ==== i = 1
Examples:
Second central moment of a list. The second central moment is equal to the sample variance.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) central_moment (s1, 2), numer; (%o3) 8.425899999999999
(%i4) var (s1), numer; (%o4) 8.425899999999999
Third central moment of each column of a matrix.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) central_moment (s1, 2), numer; /* the variance */ (%o3) 8.425899999999999
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) central_moment (s2, 3); (%o6) [11.29584771375004, 16.97988248298583, 5.626661952750102, 37.5986572057918, 25.85981904394192]
See also functions central_moment
and mean
.
Returns the variation coefficient,
defined as the sample standard deviation std
divided by the mean
.
Examples:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) cv (s1), numer; (%o3) 0.6162930116383044
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cv (s2); (%o5) [0.4171411291632767, 0.38101703748061055, 0.3619561372346568, 0.3609199356430116, 0.3329249251309538]
See also functions std
and mean
.
This is the minimum value of the sample list.
When the argument is a matrix, smin
returns
a list containing the minimum values of the columns,
which are associated to statistical variables.
Examples:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smin (s1); (%o3) 0
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smin (s2); (%o5) [0.58, 0.5, 2.67, 5.25, 5.17]
See also function smax
.
This is the maximum value of the sample list.
When the argument is a matrix, smax
returns
a list containing the maximum values of the columns,
which are associated to statistical variables.
Examples:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smax (s1); (%o3) 9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smax (s2); (%o5) [20.25, 21.46, 20.04, 29.63, 27.63]
See also function smin
.
The range is the difference between the extreme values.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) range (s1); (%o3) 9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) range (s2); (%o5) [19.67, 20.96, 17.369999999999997, 24.38, 22.46]
This is the p-quantile, with p a number in [0, 1], of the sample list. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package Package descriptive
Examples:
Input is a list. First and third quartiles are computed.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) [quantile (s1, 1/4), quantile (s1, 3/4)], numer; (%o3) [2.0, 7.25]
Input is a matrix. First quartile is computed for each column.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) quantile (s2, 1/4); (%o3) [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median (s1); 9 (%o3) - 2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median (s2); (%o5) [10.059999999999999, 9.855, 10.73, 15.48, 14.105]
The median is the 1/2-quantile.
See also function quantile
.
Returns the interquartile range,
defined as the difference between the third and first quartiles:
quantile(x, 3/4) - quantile(x, 1/4)
x must be a list or matrix.
When x is a matrix,
qrange
returns the interquartile range for each column.
Examples:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) qrange (s1); 21 (%o3) -- 4
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) qrange (s2); (%o5) [5.385, 5.572499999999998, 6.022500000000001, 8.729999999999999, 6.649999999999999]
See also function quantile
.
The mean deviation, defined as
n ==== 1 \ _ - > |x - x| n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean_deviation (s1); 51 (%o3) -- 20
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) mean_deviation (s2); (%o5) [3.2879599999999987, 3.075342, 3.2390700000000003, 4.715664000000001, 4.028546000000002]
See also function mean
.
The median deviation, defined as
n ==== 1 \ - > |x - med| n / i ==== i = 1
where med
is the median of list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median_deviation (s1); 5 (%o3) - 2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median_deviation (s2); (%o5) [2.75, 2.7550000000000003, 3.08, 4.315, 3.3099999999999996]
See also function mean
.
The harmonic mean, defined as
n -------- n ==== \ 1 > -- / x ==== i i = 1
Example:
(%i1) load ("descriptive")$ (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) harmonic_mean (y), numer; (%o3) 3.9018580276322052
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) harmonic_mean (s2); (%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745, 13.441990281936924, 13.01439145898509]
See also functions mean
and geometric_mean
.
The geometric mean, defined as
/ n \ 1/n | /===\ | | ! ! | | ! ! x | | ! ! i| | i = 1 | \ /
Example:
(%i1) load ("descriptive")$ (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) geometric_mean (y), numer; (%o3) 4.454845412337012
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) geometric_mean (s2); (%o5) [8.82476274347979, 9.22652604739361, 10.044267571488904, 14.612741263490207, 13.96184163444275]
See also functions mean
and harmonic_mean
.
The kurtosis coefficient, defined as
n ==== 1 \ _ 4 ---- > (x - x) - 3 4 / i n s ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) kurtosis (s1), numer; (%o3) - 1.273247946514421
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) kurtosis (s2); (%o5) [- 0.2715445622195385, 0.119998784429451, - 0.42752334904828615, - 0.6405361979019522, - 0.4952382132352935]
See also functions mean
, var
and skewness
.
The skewness coefficient, defined as
n ==== 1 \ _ 3 ---- > (x - x) 3 / i n s ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) skewness (s1), numer; (%o3) 0.009196180476450424
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) skewness (s2); (%o5) [0.1580509020000978, 0.2926379232061854, 0.09242174416107717, 0.20599843481486865, 0.21425202488908313]
See also functions mean
,, var
and kurtosis
.
Pearson’s skewness coefficient, defined as
_ 3 (x - med) ----------- s
where med is the median of list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) pearson_skewness (s1), numer; (%o3) 0.21594840290938955
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) pearson_skewness (s2); (%o5) [- 0.08019976629211892, 0.2357036272952649, 0.10509040624912039, 0.12450423405923679, 0.44641817958045193]
See also functions mean
, var
and median
.
The quartile skewness coefficient, defined as
c - 2 c + c 3/4 1/2 1/4 -------------------- c - c 3/4 1/4
where c_p is the p-quantile of sample list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) quartile_skewness (s1), numer; (%o3) 0.047619047619047616
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quartile_skewness (s2); (%o5) [- 0.040854224698235304, 0.14670255720053824, 0.033623910336239196, 0.03780068728522298, 0.2105263157894735]
See also function quantile
.
Kaplan Meier estimator of the survival, or reliability, function S(x)=1-F(x).
Data can be introduced as a list of pairs, or as a two column matrix. The first component is the observed time, and the second component a censoring index (1 = non censored, 0 = right censored).
The optional argument is the name of the variable in the returned expression, which is x by default.
Examples:
Sample as a list of pairs.
(%i1) load ("descriptive")$
(%i2) S: km([[2,1], [3,1], [5,0], [8,1]]); charfun((3 <= x) and (x < 8)) (%o2) charfun(x < 0) + ----------------------------- 2 3 charfun((2 <= x) and (x < 3)) + ------------------------------- 4 + charfun((0 <= x) and (x < 2))
(%i3) load ("draw")$
(%i4) draw2d( line_width = 3, grid = true, explicit(S, x, -0.1, 10))$
Estimate survival probabilities.
(%i1) load ("descriptive")$ (%i2) S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
(%i3) S(6); 1 (%o3) - 2
Empirical distribution function F(x).
Data can be introduced as a list of numbers, or as an one column matrix.
The optional argument is the name of the variable in the returned expression, which is x by default.
Example:
Empirical distribution function.
(%i1) load ("descriptive")$
(%i2) F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9])); (%o2) F(x) := (charfun(x >= 9) + charfun(x >= 8) + 3 charfun(x >= 7) + charfun(x >= 5) + 2 charfun(x >= 3) + charfun(x >= 1))/9
(%i3) F(6); 4 (%o3) - 9
(%i4) load("draw")$
(%i5) draw2d( line_width = 3, grid = true, explicit(F(z), z, -2, 12)) $
Returns the sample covariance matrix. X must be a matrix.
The sample covariance matrix has the same number of rows and columns, both equal to the number of columns of X; each diagonal element X[i, i] is equal to the sample variance of the i’th column, and each off-diagonal element X[i, j] is equal to the sample covariance of the i’th and j’th columns.
w is an optional per-datum weight. w must either be 1, in which case every datum X[i] is given equal weight, or a list of the same length as X, in which case the weight for X[i] is given by w[i]. The elements of w must be nonnegative and not all zero; it is not required that they sum to 1.
The unweighted sample covariance is defined as
n ==== 1 \ _ _ S = - > (X - X) (X - X)' n / j j ==== j = 1
where X[j] is the j’th row of the sample matrix.
The weighted sample covariance is defined as
n ==== 1 \ _ _ S = - > w (X - X) (X - X)' Z / j j j ==== j = 1
where Z is the sum of the weights,
n ==== \ Z = > w / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec : 7$
(%i4) cov (s2); [ 17.22191 13.61811 14.37217 19.39624 15.42162 ] [ ] [ 13.61811 14.98774 13.30448 15.15834 14.9711 ] [ ] (%o4) [ 14.37217 13.30448 15.47573 17.32544 16.18171 ] [ ] [ 19.39624 15.15834 17.32544 32.17651 20.44685 ] [ ] [ 15.42162 14.9711 16.18171 20.44685 24.42308 ]
See also function cov1
.
The covariance matrix of the multivariate sample, defined as
n ==== 1 \ _ _ S = --- > (X - X) (X - X)' 1 n-1 / j j ==== j = 1
where X_j is the j-th row of the sample matrix.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec : 7$
(%i4) cov1 (s2); [ 17.39587 13.75567 14.51734 19.59216 15.5774 ] [ ] [ 13.75567 15.13913 13.43887 15.31145 15.12232 ] [ ] (%o4) [ 14.51734 13.43887 15.63205 17.50044 16.34516 ] [ ] [ 19.59216 15.31145 17.50044 32.50153 20.65338 ] [ ] [ 15.5774 15.12232 16.34516 20.65338 24.66977 ]
See also function cov
.
Function global_variances
returns a list of global variance measures:
trace(S_1)
,
trace(S_1)/p
,
determinant(S_1)
,
sqrt(determinant(S_1))
,
determinant(S_1)^(1/p)
, (defined in: Peña, D. (2002) Análisis de datos multivariantes; McGraw-Hill, Madrid.)
determinant(S_1)^(1/(2*p))
.
where p is the dimension of the multivariate random variable and S_1 the covariance matrix returned by cov1
.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Examples:
Calculate the global_variances
from sample data.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) global_variances (s2); (%o3) [105.33834206060595, 21.06766841212119, 12874.34690469686, 113.46517926085015, 6.636590811800794, 2.5761581496097623]
Calculate the global_variances
from the covariance matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) s : cov1 (s2)$
(%i4) global_variances (s, data=false); (%o4) [105.33834206060595, 21.06766841212119, 12874.34690469686, 113.46517926085015, 6.636590811800794, 2.5761581496097623]
The correlation matrix of the multivariate sample.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Examples:
(%i1) load ("descriptive")$ (%i2) fpprintprec : 7 $ (%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) cor (s2); [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ] [ ] [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ] [ ] (%o4) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ] [ ] [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ] [ ] [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
Calculate the correlation matrix from the covariance matrix.
(%i1) load ("descriptive")$ (%i2) fpprintprec : 7 $ (%i3) s2 : read_matrix (file_search ("wind.data"))$ (%i4) s : cov1 (s2)$
(%i5) cor (s, data=false); /* this is faster */ [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ] [ ] [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ] [ ] (%o5) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ] [ ] [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ] [ ] [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
Function list_correlations
returns a list of correlation measures:
-1 ij S = (s ) 1 i,j = 1,2,...,p
2 1 R = 1 - ------- i ii s s ii
being an indicator of the goodness of fit of the linear multivariate regression model on X_i when the rest of variables are used as regressors.
ij s r = - ------------ ij.rest / ii jj\ 1/2 |s s | \ /
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) z : list_correlations (s2)$ (%i4) fpprintprec : 5$
(%i5) precision_matrix: z[1]; (%o5) [ 0.38486 - 0.13856 - 0.15626 - 0.10239 0.031179 ] [ ] [ - 0.13856 0.34107 - 0.15233 0.038447 - 0.052842 ] [ ] [ - 0.15626 - 0.15233 0.47296 - 0.024816 - 0.10054 ] [ ] [ - 0.10239 0.038447 - 0.024816 0.10937 - 0.034033 ] [ ] [ 0.031179 - 0.052842 - 0.10054 - 0.034033 0.14834 ]
(%i6) multiple_correlation_vector: z[2]; (%o6) [0.85063, 0.80634, 0.86474, 0.71867, 0.72675]
(%i7) partial_correlation_matrix: z[3]; [ - 1.0 0.38244 0.36627 0.49908 - 0.13049 ] [ ] [ 0.38244 - 1.0 0.37927 - 0.19907 0.23492 ] [ ] (%o7) [ 0.36627 0.37927 - 1.0 0.10911 0.37956 ] [ ] [ 0.49908 - 0.19907 0.10911 - 1.0 0.26719 ] [ ] [ - 0.13049 0.23492 0.37956 0.26719 - 1.0 ]
Calculates the principal components of a multivariate sample. Principal components are used in multivariate statistical analysis to reduce the dimensionality of the sample.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
The output of function principal_components
is a list with the following results:
Examples:
In this sample, the first component explains 83.13 per cent of total variance.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec:4 $ (%i4) res: principal_components(s2); 0 errors, 0 warnings (%o4) [[87.57, 8.753, 5.515, 1.889, 1.613], [83.13, 8.31, 5.235, 1.793, 1.531],
[ .4149 .03379 - .4757 - 0.581 - .5126 ] [ ] [ 0.369 - .3657 - .4298 .7237 - .1469 ] [ ] [ .3959 - .2178 - .2181 - .2749 .8201 ]] [ ] [ .5548 .7744 .1857 .2319 .06498 ] [ ] [ .4765 - .4669 0.712 - .09605 - .1969 ]
(%i5) /* accumulated percentages */ block([ap: copy(res[2])], for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1], ap); (%o5) [83.13, 91.44, 96.68, 98.47, 100.0] (%i6) /* sample dimension */ p: length(first(res)); (%o6) 5 (%i7) /* plot percentages to select number of principal components for further work */ draw2d( fill_density = 0.2, apply(bars, makelist([k, res[2][k], 1/2], k, p)), points_joined = true, point_type = filled_circle, point_size = 3, points(makelist([k, res[2][k]], k, p)), xlabel = "Variances", ylabel = "Percentages", xtics = setify(makelist([concat("PC",k),k], k, p))) $
In case the covariance matrix is known, it can be passed to the function,
but option data=false
must be used.
(%i1) load ("descriptive")$ (%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]); [ 1 - 2 0 ] [ ] (%o2) [ - 2 5 0 ] [ ] [ 0 0 2 ] (%i3) fpprintprec:4 $ (%i4) /* the argument is a covariance matrix */ res: principal_components(S, data=false); 0 errors, 0 warnings [ - .3827 0.0 .9239 ] [ ] (%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [ .9239 0.0 .3827 ]] [ ] [ 0.0 1.0 0.0 ] (%i5) /* transformation to get the principal components from original records */ matrix([a1,b2,c3],[a2,b2,c2]).last(res); [ .9239 b2 - .3827 a1 1.0 c3 .3827 b2 + .9239 a1 ] (%o5) [ ] [ .9239 b2 - .3827 a2 1.0 c2 .3827 b2 + .9239 a2 ]
Previous: Functions and Variables for descriptive statistics, Up: Package descriptive [Contents][Index]
Plots bars diagrams for discrete statistical variables, both for one or multiple samples.
data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.
Available options are:
3/4
): relative width of rectangles. This
value must be in the range [0,1]
.
clustered
): indicates how multiple samples are
shown. Valid values are: clustered
and stacked
.
1
): a positive integer number representing
the gap between two consecutive groups of bars.
[]
): a list of colors for multiple samples.
When there are more samples than specified colors, the extra necessary colors
are chosen at random. See color
to learn more about them.
absolute
): indicates the scale of the
ordinates. Possible values are: absolute
, relative
,
and percent
.
orderlessp
): possible values are orderlessp
or ordergreatp
,
indicating how statistical outcomes should be ordered on the x-axis.
[]
): a list with the strings to be used in the legend.
When the list length is other than 0 or the number of samples, an error message is returned.
0
): indicates where the plot begins to be plotted on the
x axis.
draw
options, except xtics
, which is
internally assigned by barsplot
.
If you want to set your own values for this option or want to build
complex scenes, make use of barsplot_description
. See example below.
key
, color_draw
,
fill_color
, fill_density
and line_width
.
See also
barsplot
.
There is also a function wxbarsplot
for creating embedded
histograms in interfaces wxMaxima and iMaxima. barsplot
in a
multiplot context.
Examples:
Univariate sample in matrix form. Absolute frequencies.
(%i1) load ("descriptive")$ (%i2) m : read_matrix (file_search ("biomed.data"))$
(%i3) barsplot( col(m,2), title = "Ages", xlabel = "years", box_width = 1/2, fill_density = 3/4)$
Two samples of different sizes, with relative frequencies and user declared colors.
(%i1) load ("descriptive")$ (%i2) l1:makelist(random(10),k,1,50)$ (%i3) l2:makelist(random(10),k,1,100)$
(%i4) barsplot( l1,l2, box_width = 1, fill_density = 1, bars_colors = [black, grey], frequency = relative, sample_keys = ["A", "B"])$
Four non numeric samples of equal size.
(%i1) load ("descriptive")$
(%i2) barsplot( makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), title = "Asking for something to four groups", ylabel = "# of individuals", groups_gap = 3, fill_density = 0.5, ordering = ordergreatp)$
Stacked bars.
(%i1) load ("descriptive")$
(%i2) barsplot( makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), makelist([Yes, No, Maybe][random(3)+1],k,1,50), title = "Asking for something to four groups", ylabel = "# of individuals", grouping = stacked, fill_density = 0.5, ordering = ordergreatp)$
For bars diagrams related options, see barsplot
of package Package draw
See also functions histogram
and piechart
.
Function barsplot_description
creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
Example: barsplot
in a multiplot context.
(%i1) load ("descriptive")$ (%i2) l1:makelist(random(10),k,1,50)$ (%i3) l2:makelist(random(10),k,1,100)$ (%i4) bp1 : barsplot_description( l1, box_width = 1, fill_density = 0.5, bars_colors = [blue], frequency = relative)$ (%i5) bp2 : barsplot_description( l2, box_width = 1, fill_density = 0.5, bars_colors = [red], frequency = relative)$ (%i6) draw(gr2d(bp1), gr2d(bp2))$
This function plots box-and-whisker diagrams. Argument data can be a list,
which is not of great interest, since these diagrams are mainly used for
comparing different samples, or a matrix, so it is possible to compare
two or more components of a multivariate statistical variable.
But it is also allowed data to be a list of samples with
possible different sample sizes, in fact this is the only function
in package descriptive
that admits this type of data structure.
The box is plotted from the first quartile to the third, with an horizontal
segment situated at the second quartile or median. By default, lower and
upper whiskers are plotted at the minimum and maximum values,
respectively. Option range can be used to indicate that values greater
than quantile(x,3/4)+range*(quantile(x,3/4)-quantile(x,1/4))
or
less than quantile(x,1/4)-range*(quantile(x,3/4)-quantile(x,1/4))
must be considered as outliers, in which case they are plotted as
isolated points, and the whiskers are located at the extremes of the rest of
the sample.
Available options are:
3/4
): relative width of boxes.
This value must be in the range [0,1]
.
vertical
): possible values: vertical
and horizontal
.
inf
): positive coefficient of the interquartilic range
to set outliers boundaries.
1
): circle size for isolated outliers.
draw
options, except points_joined
, point_size
, point_type
,
xtics
, ytics
, xrange
, and yrange
, which are
internally assigned by boxplot
.
If you want to set your own values for this options or want to build
complex scenes, make use of boxplot_description
.
draw
options: key
, color
,
and line_width
.
There is also a function wxboxplot
for creating embedded
histograms in interfaces wxMaxima and iMaxima.
Examples:
Box-and-whisker diagram from a multivariate sample.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix(file_search("wind.data"))$
(%i3) boxplot(s2, box_width = 0.2, title = "Windspeed in knots", xlabel = "Stations", color = red, line_width = 2)$
Box-and-whisker diagram from three samples of different sizes.
(%i1) load ("descriptive")$
(%i2) A : [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2], [8, 10, 7, 9, 12, 8, 10], [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
(%i3) boxplot (A, box_orientation = horizontal)$
Option range can be used to handle outliers.
(%i1) load ("descriptive")$ B: [[7, 15, 5, 8, 6, 5, 7, 3, 1], [10, 8, 12, 8, 11, 9, 20], [23, 17, 19, 7, 22, 19]] $ boxplot (B, range=1)$ boxplot (B, range=1.5, box_orientation = horizontal)$ draw2d( boxplot_description( B, range = 1.5, line_width = 3, outliers_size = 2, color = red, background_color = light_gray), xtics = {["Low",1],["Medium",2],["High",3]}) $
Function boxplot_description
creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
Constructs and displays a histogram from a data sample. Data must be stored as a list of numbers, or a matrix of one row or one column.
Optional arguments:
nclasses
(default, 10):
the number of classes (also called bins) in the histogram,
or a list of two numbers (the least and greatest values included in the histogram),
or a list of three numbers (the least and greatest values included in the histogram, and the number of classes),
or a set containing the endpoints of the class intervals,
or a symbol specifying the name of one of three algorithms to automatically determine the number of classes:
fd
(Ref. [1]), scott
(Ref. [2]), or sturges
(Ref. [3]).
A class interval excludes its left endpoint and includes its right endpoint, except for the first interval, which includes both the left and right endpoints. It is assumed that class intervals are contiguous. That is, the right endpoint of one interval is equal to the left endpoint of the next.
frequency
(default, absolute
): indicates the scale of the vertical axis.
Possible values are: absolute
(heights of bars add up to number of data),
relative
(heights of bars add up to 1),
percent
(heights of bars add up to 100),
and density
(total area of histogram is 1).
htics
(default, auto
): format of tic marks on the horizontal axis.
Possible values are: auto
(tics are placed automatically),
endpoints
(tics are placed at the divisions between classes),
intervals
(classes are labeled with the corresponding intervals),
or a list of labels, one for each class.
draw
options, except xrange
, yrange
,
and xtics
, which are internally assigned by histogram
.
If you want to set your own values for these options, make use of
histogram_description
.
key
,
fill_color
, fill_density
, and line_width
.
Note that the outlines of bars,
as well as the interior of bars when fill_density
is nonzero,
are drawn with fill_color
, not color
.
histogram
honors the global option histogram_skyline
.
When histogram_skyline
is true
,
histogram
and histogram_description
construct "skyline" plots,
which shows the outline of the histogram bars,
instead of drawing all the vertical segments.
Otherwise (the default), histograms are displayed with bars showing vertical segments.
There is also a function wxhistogram
for creating embedded
histograms in interfaces wxMaxima and iMaxima.
See also continuous_freq
,
which, like histogram
,
counts data in intervals,
but returns the counts instead of displaying a graphic representation.
See also barsplot
.
Examples:
A simple histogram with eight classes:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram ( s1, nclasses = 8, title = "pi digits", xlabel = "digits", ylabel = "Absolute frequency", fill_color = grey, fill_density = 0.6)$
Setting the limits of the histogram to -2 and 12, with 3 classes. Also, we introduce predefined tics:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram ( s1, nclasses = [-2,12,3], htics = ["A", "B", "C"], terminal = png, fill_color = "#23afa0", fill_density = 0.6)$
Bounds for varying class widths.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) histogram (s1, nclasses = {0,3,6,7,11})$
Freedman-Diaconis formula for the number of classes.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) histogram(s1, nclasses=fd) $
References:
[1] Freedman, D., and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453-476.
[2] Scott, D. W. (1979) On optimal and data-based histograms. Biometrika 66, 605-610.
[3] Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association 21, 65-66.
Creates a graphic object which represents a histogram.
Such an object is suitable for creating complex scenes together with other graphic objects,
to be displayed by draw2d
.
histogram_description
takes the same arguments
as the stand-alone function histogram
.
See histogram
for more information.
Example:
We make use of histogram_description
for setting
xrange
and adding an explicit curve into the scene:
(%i1) load ("descriptive")$ (%i2) ( load("distrib"), m: 14, s: 2, s2: random_normal(m, s, 1000) ) $ (%i3) draw2d( grid = true, xrange = [5, 25], histogram_description( s2, nclasses = 9, frequency = density, fill_density = 0.5), explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$
Default value: false
When histogram_skyline
is true
,
histogram
and histogram_description
construct "skyline" plots,
which shows the outline of the histogram bars,
instead of drawing all the vertical segments.
The outline is drawn with the current fill_color
(not the current color
).
The interior of the histogram is filled with fill_color
,
but only if fill_density
is nonzero.
Otherwise, histograms are displayed with bars showing vertical segments.
Examples:
Construct a skyline histogram, and an ordinary histogram for comparison, on the same plot.
(%i1) load ("descriptive") $ (%i2) L: read_list (file_search ("pidigits.data")) $ (%i3) histogram_skyline: true $ (%i4) skyline_hist: histogram_description (L) $ (%i5) histogram_skyline: false $ (%i6) ordinary_hist: histogram_description (L) $ (%i7) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
Continuing the preceding example.
Set display options for fill_color
and fill_density
.
(%i8) histogram_skyline: true $ (%i9) skyline_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $ (%i10) histogram_skyline: false $ (%i11) ordinary_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $ (%i12) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
Similar to barsplot
, but plots sectors instead of rectangles.
Available options are:
[]
): a list of colors for sectors.
When there are more sectors than specified colors, the extra necessary colors
are chosen at random. See color
to learn more about them.
[0,0]
): diagram’s center.
1
): diagram’s radius.
draw
options, except key
, which is
internally assigned by piechart
.
If you want to set your own values for this option or want to build
complex scenes, make use of piechart_description
.
draw
options: key
, color
,
fill_density
and line_width
. See also
ellipse
There is also a function wxpiechart
for
creating embedded histograms in interfaces wxMaxima and iMaxima.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) piechart( s1, xrange = [-1.1, 1.3], yrange = [-1.1, 1.1], title = "Digit frequencies in pi")$
See also function barsplot
.
Function piechart_description
creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
Plots scatter diagrams both for univariate (list) and multivariate (matrix) samples.
Available options are the same admitted by histogram
.
There is also a function wxscatterplot
for
creating embedded histograms in interfaces wxMaxima and iMaxima.
Examples:
Univariate scatter diagram from a simulated Gaussian sample.
(%i1) load ("descriptive")$ (%i2) load ("distrib")$
(%i3) scatterplot( random_normal(0,1,200), xaxis = true, point_size = 2, dimensions = [600,150])$
Two dimensional scatter plot.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot( submatrix(s2, 1,2,3), title = "Data from stations #4 and #5", point_type = diamant, point_size = 2, color = blue)$
Three dimensional scatter plot.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$
Five dimensional scatter plot, with five classes histograms.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot( s2, nclasses = 5, frequency = relative, fill_color = blue, fill_density = 0.3, xtics = 5)$
For plotting isolated or line-joined points in two and three dimensions,
see points
. See also histogram
.
Function scatterplot_description
creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
Plots star diagrams for discrete statistical variables, both for one or multiple samples.
data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.
Available options are:
[]
): a list of colors for multiple samples.
When there are more samples than specified colors, the extra necessary colors
are chosen at random. See color
to learn more about them.
absolute
): indicates the scale of the
radii. Possible values are: absolute
and relative
.
orderlessp
): possible values are orderlessp
or ordergreatp
,
indicating how statistical outcomes should be ordered.
[]
): a list with the strings to be used in the legend.
When the list length is other than 0 or the number of samples, an error message is returned.
[0,0]
): diagram’s center.
1
): diagram’s radius.
draw
options, except points_joined
, point_type
,
and key
, which are internally assigned by starplot
.
If you want to set your own values for this options or want to build
complex scenes, make use of starplot_description
.
draw
option: line_width
.
There is also a function wxstarplot
for
creating embedded histograms in interfaces wxMaxima and iMaxima.
Example:
Plot based on absolute frequencies. Location and radius defined by the user.
(%i1) load ("descriptive")$ (%i2) l1: makelist(random(10),k,1,50)$ (%i3) l2: makelist(random(10),k,1,200)$
(%i4) starplot( l1, l2, stars_colors = [blue,red], sample_keys = ["1st sample", "2nd sample"], star_center = [1,2], star_radius = 4, proportional_axes = xy, line_width = 2 ) $
Function starplot_description
creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
Plots stem and leaf diagrams.
The only available option is:
1
): indicates the unit of the leaves; must be a
power of 10.
Example:
(%i1) load ("descriptive")$ (%i2) load("distrib")$
(%i3) stemplot( random_normal(15, 6, 100), leaf_unit = 0.1); -5|4 0|37 1|7 3|6 4|4 5|4 6|57 7|0149 8|3 9|1334588 10|07888 11|01144467789 12|12566889 13|24778 14|047 15|223458 16|4 17|11557 18|000247 19|4467799 20|00 21|1 22|2335 23|01457 24|12356 25|455 27|79 key: 6|3 = 6.3 (%o3) done
Next: Package diag, Previous: Package contrib_ode [Contents][Index]