Package descriptive contains a set of functions for descriptive statistical computations and graphing. Together with the source code, there are three data sets in your Maxima tree: pidigits.data, wind.data and biomed.data.
Any statistics manual can be used as a reference for the functions in package descriptive.
For comments, bugs or suggestions, please contact me at ’mario AT edu DOT xunta DOT es’.
Here is a simple example of how the functions in descriptive work, depending on whether their arguments are lists or matrices:
(%i1) load ("descriptive")$
(%i2) /* univariate sample */ mean ([a, b, c]);
                           c + b + a
(%o2)                      ---------
                               3
(%i3) matrix ([a, b], [c, d], [e, f]);
                            [ a  b ]
                            [      ]
(%o3)                       [ c  d ]
                            [      ]
                            [ e  f ]
(%i4) /* multivariate sample */ mean (%);
                      e + c + a  f + d + b
(%o4)                [---------, ---------]
                          3          3
Note that in multivariate samples the mean is calculated for each column.
When there are several samples, possibly of different sizes, the Maxima function map can be used to get the desired result for each sample:
(%i1) load ("descriptive")$
(%i2) map (mean, [[a, b, c], [d, e]]);
                        c + b + a  e + d
(%o2)                  [---------, -----]
                            3        2
In this case, two samples of sizes 3 and 2 were stored into a list.
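The list-versus-matrix behaviour described above can be mirrored outside Maxima. The following Python sketch is only an illustration, not the package's implementation; the helper name mean and the use of nested lists for matrices are assumptions made here.

```python
def mean(sample):
    """Mean of a flat list, or column-wise means of a list of rows."""
    if sample and isinstance(sample[0], (list, tuple)):
        # Multivariate sample: rows are records, so average each column.
        return [mean(list(column)) for column in zip(*sample)]
    return sum(sample) / len(sample)


# Two univariate samples of different sizes, handled one by one as with map.
samples = [[1, 2, 3], [4, 6]]
per_sample = [mean(s) for s in samples]

# A 3 x 2 multivariate sample gives one mean per column.
per_column = mean([[1, 2], [3, 4], [5, 6]])
```

Treating each inner list as a separate sample is what map does in the Maxima session, whereas a matrix is read as one sample with several variables.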
Univariate samples must be stored in lists like
(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
(%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
and multivariate samples in matrices as in
(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
                      [ 13.17   9.29  ]
                      [               ]
                      [ 14.71  16.88  ]
                      [               ]
                      [ 18.5   16.88  ]
(%o1)                 [               ]
                      [ 10.58   6.63  ]
                      [               ]
                      [ 13.33  13.25  ]
                      [               ]
                      [ 13.21   8.12  ]
In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.
Data can be entered by hand, but large samples are usually stored in plain text files. For example, file pidigits.data contains the first 100 digits of the number %pi:
3 1 4 1 5 9 2 6 5 3 ...
In order to load these digits in Maxima,
(%i1) s1 : read_list (file_search ("pidigits.data"))$
(%i2) length (s1);
(%o2)                          100
On the other hand, file wind.data contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland. This is part of a data set taken at 12 meteorological stations; the original file is freely downloadable from the StatLib Data Repository and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) Space-time Modelling with Long-memory Dependence: Assessing Ireland’s Wind Power Resource, with Discussion. Applied Statistics 38, 1-50. This loads the data:
(%i1) s2 : read_matrix (file_search ("wind.data"))$
(%i2) length (s2);
(%o2)                          100
(%i3) s2 [%]; /* last record */
(%o3)            [3.58, 6.0, 4.58, 7.62, 11.25]
Some samples contain non-numeric data. As an example, file biomed.data (which is part of a larger data set downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, A and B, of different ages:
(%i1) s3 : read_matrix (file_search ("biomed.data"))$
(%i2) length (s3);
(%o2)                          100
(%i3) s3 [1]; /* first record */
(%o3)          [A, 30, 167.0, 89.0, 25.6, 364]
The first individual belongs to group A, is 30 years old, and his/her blood measures were 167.0, 89.0, 25.6 and 364.
One must take care when working with categorical data. In the next example, symbol a was assigned a value at some earlier point, and then a sample with categorical value a is taken; Maxima replaces a by its value when the matrix is built:
(%i1) a : 1$
(%i2) matrix ([a, 3], [b, 5]);
                           [ 1  3 ]
(%o2)                      [      ]
                           [ b  5 ]
Function build_sample builds a sample from a table of absolute frequencies. The input table can be a matrix or a list of lists, all of them of equal size. The number of columns, or the length of the lists, must be greater than 1. The last element of each row or list is interpreted as the absolute frequency. The output is always a sample in matrix form.
Examples:
Univariate frequency table.
(%i1) load ("descriptive")$
(%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
                             [ 6 ]
                             [   ]
                             [ j ]
(%o2)                        [   ]
                             [ j ]
                             [   ]
                             [ 2 ]
(%i3) mean(sam1);
                            2 j + 8
(%o3)                      [-------]
                               4
(%i4) barsplot(sam1) $
Multivariate frequency table.
(%i1) load ("descriptive")$
(%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
                           [ 6  3 ]
                           [      ]
                           [ 5  6 ]
                           [      ]
                           [ 5  6 ]
(%o2)                      [      ]
                           [ u  2 ]
                           [      ]
                           [ 6  8 ]
                           [      ]
                           [ 6  8 ]
(%i3) cov(sam2);
       [   2                 2                             ]
       [  u  + 158   (u + 28)      2 u + 174   11 (u + 28) ]
       [  -------- - ---------     --------- - ----------- ]
(%o3)  [     6          36             6           12      ]
       [                                                   ]
       [ 2 u + 174   11 (u + 28)             21            ]
       [ --------- - -----------             --            ]
       [     6           12                   4            ]
(%i4) barsplot(sam2, grouping=stacked) $
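As a cross-check of the rule stated for build_sample (the last element of each row is an absolute frequency), here is a minimal Python sketch. It is not the Maxima implementation; nested lists stand in for matrices and strings stand in for symbols such as j and u.

```python
def build_sample(table):
    """Expand rows of the form [value..., frequency] into repeated records."""
    sample = []
    for row in table:
        if len(row) < 2:
            raise ValueError("each row needs at least one value and a frequency")
        *values, frequency = row
        # Repeat the record as many times as its absolute frequency says.
        sample.extend(list(values) for _ in range(frequency))
    return sample


univariate = build_sample([[6, 1], ["j", 2], [2, 1]])
multivariate = build_sample([[6, 3, 1], [5, 6, 2], ["u", 2, 1], [6, 8, 2]])
```

Both results have the same row layout as the matrices sam1 and sam2 shown above.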
Function continuous_freq divides the range of a sample into intervals and counts how many values fall inside each of them. Its first argument must be a list of numbers. The second argument is optional and is either the number of classes we want, 10 by default, or a list containing the class limits and the number of classes we want, or a list containing only the limits. When the second argument is a list, it must contain 2 or 3 real numbers. If sample values are all equal, this function returns only one class of amplitude 2.
Examples:
Optional argument indicates the number of classes we want. The first list in the output contains the interval limits, and the second the corresponding counts: there are 16 digits inside the interval [0, 1.8], 24 digits in (1.8, 3.6], and so on.
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, 5);
(%o3) [[0, 1.8, 3.6, 5.4, 7.2, 9.0], [16, 24, 18, 17, 25]]
Optional argument indicates we want 7 classes with limits -2 and 12:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12,7]);
(%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
Optional argument indicates we want the default number of classes with limits -2 and 12:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) continuous_freq (s1, [-2,12]);
                3  4  11  18     32  39  46  53
(%o3)  [[- 2, - -, -, --, --, 5, --, --, --, --, 12],
                5  5  5   5      5   5   5   5
               [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
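The counts in these three examples are consistent with equal-width classes in which the first interval is closed and the remaining ones are open on the left, as the interval notation above suggests. The Python sketch below reproduces them under that assumption; it is not the package's code, it does not handle constant samples, and the digit sample is rebuilt from the frequencies that discrete_freq reports for pidigits.data.

```python
from fractions import Fraction
from math import ceil


def continuous_freq(data, lo=None, hi=None, nclasses=10):
    """Return (limits, counts) for nclasses equal-width classes on [lo, hi]."""
    lo = min(data) if lo is None else lo
    hi = max(data) if hi is None else hi
    if hi <= lo:
        raise ValueError("this sketch needs hi > lo (no constant samples)")
    width = Fraction(hi - lo) / nclasses
    limits = [lo + k * width for k in range(nclasses + 1)]
    counts = [0] * nclasses
    for x in data:
        if lo <= x <= hi:
            # First class is closed, [lo, lo + width]; the rest are (a, b].
            k = 0 if x == lo else ceil(Fraction(x - lo) / width) - 1
            counts[k] += 1
    return limits, counts


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
limits5, counts5 = continuous_freq(s1, nclasses=5)
```

Exact rational arithmetic is used for the class limits so that values lying on a boundary are assigned without rounding surprises.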
Function discrete_freq counts absolute frequencies in discrete samples, both numeric and categorical. Its only argument is a list:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) discrete_freq (s1);
(%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
The first list gives the sample values and the second their absolute frequencies.
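The same values-and-counts layout can be obtained outside Maxima with a counter. This Python sketch is an illustration only; it assumes the outcomes can be sorted against each other, and the sample is the short list of digits used earlier in this section.

```python
from collections import Counter


def discrete_freq(sample):
    """Return [values, counts] with the distinct values in increasing order."""
    counts = Counter(sample)
    values = sorted(counts)  # assumes the outcomes are mutually comparable
    return [values, [counts[v] for v in values]]


digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
table = discrete_freq(digits)
```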
Function subsample is a sort of variant of the Maxima submatrix function. The first argument is the data matrix, the second is a predicate function, and optional additional arguments are the numbers of the columns to be taken. Its behaviour is better understood with examples.
These are the multivariate records in which the wind speed at the first meteorological station was greater than 18. Note that in the lambda expression the i-th component is referred to as v[i].
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) subsample (s2, lambda([v], v[1] > 18));
              [ 19.38  15.37  15.12  23.09  25.25 ]
              [                                   ]
              [ 18.29  18.66  19.08  26.08  27.63 ]
(%o3)         [                                   ]
              [ 20.25  21.46  19.95  27.71  23.38 ]
              [                                   ]
              [ 18.79  18.96  14.46  26.38  21.84 ]
In the following example, we request only the first, second and fifth components of those records with wind speeds greater than or equal to 16 knots at station number 1 and less than 25 knots at station number 4. The sample contains only data from stations 1, 2 and 5. In this case, the predicate function is defined as an ordinary Maxima function.
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) g(x):= x[1] >= 16 and x[4] < 25$
(%i4) subsample (s2, g, 1, 2, 5);
                     [ 19.38  15.37  25.25 ]
                     [                     ]
                     [ 17.33  14.67  19.58 ]
(%o4)                [                     ]
                     [ 16.92  13.21  21.21 ]
                     [                     ]
                     [ 17.25  18.46  23.87 ]
Here is an example with the categorical variables of biomed.data. We want the records corresponding to those patients in group B who are older than 38 years.
(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) h(u):= u[1] = B and u[2] > 38 $
(%i4) subsample (s3, h);
                [ B  39  28.0  102.3  17.1  146 ]
                [                               ]
                [ B  39  21.0  92.4   10.3  197 ]
                [                               ]
                [ B  39  23.0  111.5  10.0  133 ]
                [                               ]
                [ B  39  26.0  92.6   12.3  196 ]
(%o4)           [                               ]
                [ B  39  25.0  98.7   10.0  174 ]
                [                               ]
                [ B  39  21.0  93.2   5.9   181 ]
                [                               ]
                [ B  39  18.0  95.0   11.3  66  ]
                [                               ]
                [ B  39  39.0  88.5   7.6   168 ]
Probably, the statistical analysis will involve only the blood measures,
(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38),
                 3, 4, 5, 6);
                   [ 28.0  102.3  17.1  146 ]
                   [                        ]
                   [ 21.0  92.4   10.3  197 ]
                   [                        ]
                   [ 23.0  111.5  10.0  133 ]
                   [                        ]
                   [ 26.0  92.6   12.3  196 ]
(%o3)              [                        ]
                   [ 25.0  98.7   10.0  174 ]
                   [                        ]
                   [ 21.0  93.2   5.9   181 ]
                   [                        ]
                   [ 18.0  95.0   11.3  66  ]
                   [                        ]
                   [ 39.0  88.5   7.6   168 ]
This is the multivariate mean of s3:
(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) mean (s3);
       65 B + 35 A  317          6 NA + 8144.999999999999
(%o3) [-----------, ---, 87.178, ------------------------,
           100      10                     100
                                                    3 NA + 19587
                                            18.123, ------------]
                                                        100
Here, the first component is meaningless, since A and B are categorical; the second component is the mean age of individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because symbol NA is used here to indicate non-available data, so the two means are nonsense. A possible solution would be to remove from the matrix those rows with NA symbols, although this entails some loss of information.
(%i1) load ("descriptive")$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) g(v):= v[4] # NA and v[6] # NA $
(%i4) mean (subsample (s3, g, 3, 4, 5, 6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813,
                                                            2514
                                                            ----]
                                                             13
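The row filtering and column selection performed by subsample can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's code: the records below are hypothetical, the string "NA" stands in for the NA symbol, column numbers are 1-based as in the Maxima calls, and the predicate uses Python's 0-based indexing.

```python
def subsample(rows, predicate, *columns):
    """Keep the rows satisfying predicate; optionally keep only some columns.

    Column numbers are 1-based, as in the Maxima calls above.
    """
    kept = [row for row in rows if predicate(row)]
    if columns:
        kept = [[row[c - 1] for c in columns] for row in kept]
    return kept


# Hypothetical records: group, age, and two measures; "NA" marks a gap.
records = [
    ["A", 30, 167.0, 89.0],
    ["B", 39, 28.0, "NA"],
    ["B", 41, 21.0, 92.4],
    ["B", 25, 23.0, 111.5],
]
older_b = subsample(records, lambda v: v[0] == "B" and v[1] > 38)
complete = subsample(records, lambda v: "NA" not in v, 3, 4)
```

Dropping the incomplete rows before averaging is the same trade-off described above: the means become numeric, at the cost of discarding the other measurements in those rows.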
Function mean returns the sample mean, defined as
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean (s1);
                               471
(%o3)                          ---
                               100
(%i4) %, numer;
(%o4)                         4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mean (s2);
(%o6)     [9.9485, 10.1607, 10.8685, 15.7166, 14.8441]
Function var returns the sample variance with denominator n, defined as
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var (s1), numer;
(%o3)                   8.425899999999999
See also function var1.
Function var1 returns the sample variance with denominator n-1, defined as
$$\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var1 (s1), numer;
(%o3)                    8.5110101010101
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) var1 (s2);
(%o5) [17.39586540404041, 15.13912778787879, 15.63204924242424,
                            32.50152569696971, 24.66977392929294]
See also function var.
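The two variance definitions differ only in the denominator, which a short Python check makes concrete. This is a sketch rather than the package's code; the sample is rebuilt from the digit frequencies that discrete_freq reports for pidigits.data, and exact fractions are used so the results can be compared with the values shown above.

```python
from fractions import Fraction


def var(sample):
    """Variance with denominator n."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / n


def var1(sample):
    """Variance with denominator n - 1."""
    n = len(sample)
    return var(sample) * n / (n - 1)


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
v, v1 = var(s1), var1(s1)
```

For this sample var gives exactly 84259/10000, and var1 is that value multiplied by 100/99.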
Function std returns the square root of function var, the variance with denominator n.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std (s1), numer;
(%o3)                   2.902740084816414
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std (s2);
(%o5) [4.149928523480858, 3.871399812729241, 3.933920277534866,
                            5.672434260526957, 4.941970881136392]
See also functions var and std1.
Function std1 returns the square root of function var1, the variance with denominator n-1.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std1 (s1), numer;
(%o3)                   2.917363553109228
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std1 (s2);
(%o5) [4.170835096721089, 3.89090320978032, 3.953738641137555,
                            5.701010936401517, 4.966867617451963]
See also functions var1 and std.
Function noncentral_moment returns the noncentral moment of order k, defined as
$$\frac{1}{n} \sum_{i=1}^{n} x_i^k$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) noncentral_moment (s1, 1), numer; /* the mean */
(%o3)                         4.71
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) noncentral_moment (s2, 5);
(%o5) [319793.8724761505, 320532.1923892463, 391249.5621381556,
                            2502278.205988911, 1691881.797742255]
See also function central_moment.
Function central_moment returns the central moment of order k, defined as
$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) central_moment (s1, 2), numer; /* the variance */
(%o3)                   8.425899999999999
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) central_moment (s2, 3);
(%o5) [11.29584771375004, 16.97988248298583, 5.626661952750102,
                            37.5986572057918, 25.85981904394192]
See also functions noncentral_moment and mean.
Function cv returns the variation coefficient, which is the quotient between the sample standard deviation (std) and the mean:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) cv (s1), numer;
(%o3)                   .6193977819764815
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cv (s2);
(%o5) [.4192426091090204, .3829365309260502, 0.363779605385983,
                            .3627381836021478, .3346021393989506]
See also functions std and mean.
Function mini returns the minimum value of the sample list:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mini (s1);
(%o3)                           0
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) mini (s2);
(%o5)             [0.58, 0.5, 2.67, 5.25, 5.17]
See also function maxi.
Function maxi returns the maximum value of the sample list:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) maxi (s1);
(%o3)                           9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) maxi (s2);
(%o5)          [20.25, 21.46, 20.04, 29.63, 27.63]
See also function mini.
Function range returns the range, which is the difference between the extreme values.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) range (s1);
(%o3)                           9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) range (s2);
(%o5)          [19.67, 20.96, 17.37, 24.38, 22.46]
Function quantile returns the p-quantile, with p a number in [0, 1], of the sample list. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package descriptive.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) /* 1st and 3rd quartiles */
         [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
(%o3)                      [2.0, 7.25]
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quantile (s2, 1/4);
(%o5)    [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
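The text does not spell out the interpolation rule. The Python sketch below assumes the p-quantile sits at position (n-1)p of the sorted sample, counting from zero, and interpolates linearly between the two neighbouring order statistics. That assumption reproduces the quartiles 2.0 and 7.25 shown above for the pi digits, but it is not taken from the package's source. The sample is rebuilt from the digit frequencies reported by discrete_freq.

```python
from fractions import Fraction
from math import floor


def quantile(sample, p):
    """p-quantile by linear interpolation between order statistics."""
    xs = sorted(sample)
    h = (len(xs) - 1) * Fraction(p)  # fractional zero-based position
    k = floor(h)
    if k + 1 >= len(xs):  # p = 1: the maximum
        return xs[-1]
    return xs[k] + (h - k) * (xs[k + 1] - xs[k])


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
q1 = quantile(s1, Fraction(1, 4))
q3 = quantile(s1, Fraction(3, 4))
```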
Function median returns the median. Once the sample is ordered, if the sample size is odd the median is the central value; otherwise it is the mean of the two central values.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median (s1);
                                9
(%o3)                           -
                                2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median (s2);
(%o5)         [10.06, 9.855, 10.73, 15.48, 14.105]
The median is the 1/2-quantile.
See also function quantile.
Function qrange returns the interquartile range, which is the difference between the third and first quartiles, quantile(list,3/4) - quantile(list,1/4):
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) qrange (s1);
                               21
(%o3)                          --
                               4
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) qrange (s2);
(%o5) [5.385, 5.572499999999998, 6.022500000000001,
                            8.729999999999999, 6.649999999999999]
See also function quantile.
Function mean_deviation returns the mean deviation, defined as
$$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean_deviation (s1);
                               51
(%o3)                          --
                               20
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) mean_deviation (s2);
(%o5) [3.287959999999999, 3.075342, 3.23907, 4.715664000000001,
                                               4.028546000000002]
See also function mean.
Function median_deviation returns the median deviation, defined as
$$\frac{1}{n} \sum_{i=1}^{n} |x_i - med|$$
where med is the median of list.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median_deviation (s1);
                                5
(%o3)                           -
                                2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median_deviation (s2);
(%o5)           [2.75, 2.755, 3.08, 4.315, 3.31]
See also function median.
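The two deviation measures can be checked numerically on the pi digits, rebuilt here from the frequencies reported by discrete_freq. The Python sketch below is an independent check, not the package's code. On this sample the averaged absolute deviation about the median evaluates to 51/20, while the value 5/2 shown above coincides with the median of the absolute deviations from the median; the second helper is therefore included as an assumption about what may be computed, which the source does not state.

```python
from fractions import Fraction


def median(sample):
    """Central value, or mean of the two central values for even sizes."""
    xs = sorted(sample)
    n = len(xs)
    mid = n // 2
    if n % 2:
        return Fraction(xs[mid])
    return Fraction(xs[mid - 1] + xs[mid]) / 2


def mean_abs_deviation(sample, center):
    """Average of |x - center| over the sample."""
    return sum(abs(x - center) for x in sample) / len(sample)


def median_abs_deviation(sample):
    """Median of |x - med|, an assumed alternative reading."""
    med = median(sample)
    return median([abs(x - med) for x in sample])


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
mean_dev = mean_abs_deviation(s1, Fraction(sum(s1), len(s1)))
averaged_about_median = mean_abs_deviation(s1, median(s1))
mad = median_abs_deviation(s1)
```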
Function harmonic_mean returns the harmonic mean, defined as
$$\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Example:
(%i1) load ("descriptive")$
(%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) harmonic_mean (y), numer;
(%o3)                   3.901858027632205
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) harmonic_mean (s2);
(%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745,
                            13.44199028193692, 13.01439145898509]
See also functions mean and geometric_mean.
Function geometric_mean returns the geometric mean, defined as
$$\left( \prod_{i=1}^{n} x_i \right)^{1/n}$$
Example:
(%i1) load ("descriptive")$
(%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) geometric_mean (y), numer;
(%o3)                   4.454845412337012
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) geometric_mean (s2);
(%o5) [8.82476274347979, 9.22652604739361, 10.0442675714889,
                            14.61274126349021, 13.96184163444275]
See also functions mean and harmonic_mean.
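Both means are easy to verify on the sample y used in the examples. The following Python sketch is an illustration and assumes strictly positive data; the geometric mean is computed through logarithms, a choice made here to avoid very large intermediate products.

```python
from math import exp, log


def harmonic_mean(sample):
    """n divided by the sum of reciprocals."""
    return len(sample) / sum(1 / x for x in sample)


def geometric_mean(sample):
    """n-th root of the product, computed via logarithms."""
    return exp(sum(log(x) for x in sample) / len(sample))


y = [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]
hm = harmonic_mean(y)
gm = geometric_mean(y)
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean; the sample y illustrates this ordering.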
Function kurtosis returns the kurtosis coefficient, defined as
$$\frac{1}{n s^4} \sum_{i=1}^{n} (x_i - \bar{x})^4 - 3$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) kurtosis (s1), numer;
(%o3)                  - 1.273247946514421
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) kurtosis (s2);
(%o5) [- .2715445622195385, 0.119998784429451,
     - .4275233490482861, - .6405361979019522, - .4952382132352935]
See also functions mean, var and skewness.
Function skewness returns the skewness coefficient, defined as
$$\frac{1}{n s^3} \sum_{i=1}^{n} (x_i - \bar{x})^3$$
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) skewness (s1), numer;
(%o3)                  .009196180476450424
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) skewness (s2);
(%o5) [.1580509020000978, .2926379232061854, .09242174416107717,
                            .2059984348148687, .2142520248890831]
See also functions mean, var and kurtosis.
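Both coefficients are ratios of central moments, with s taken as the standard deviation with denominator n. The Python sketch below checks the values printed for the pi digits under that reading; it is not the package's code, and the sample is rebuilt from the digit frequencies reported by discrete_freq.

```python
def central_moment(sample, k):
    """Average of (x - mean)^k."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** k for x in sample) / n


def skewness(sample):
    """Third central moment divided by s^3, with s^2 the variance (denominator n)."""
    s = central_moment(sample, 2) ** 0.5
    return central_moment(sample, 3) / s ** 3


def kurtosis(sample):
    """Fourth central moment divided by s^4, minus 3."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2 - 3


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
sk = skewness(s1)
ku = kurtosis(s1)
```

The near-zero skewness and the negative kurtosis are what one would expect from digits that are spread almost uniformly over 0 to 9.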
Function pearson_skewness returns Pearson’s skewness coefficient, defined as
$$\frac{3 (\bar{x} - med)}{s}$$
where med is the median of list.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) pearson_skewness (s1), numer;
(%o3)                   .2159484029093895
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) pearson_skewness (s2);
(%o5) [- .08019976629211892, .2357036272952649,
         .1050904062491204, .1245042340592368, .4464181795804519]
See also functions mean, var and median.
Function quartile_skewness returns the quartile skewness coefficient, defined as
$$\frac{c_{3/4} - 2 c_{1/2} + c_{1/4}}{c_{3/4} - c_{1/4}}$$
where c_p is the p-quantile of sample list.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) quartile_skewness (s1), numer;
(%o3)                   .04761904761904762
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quartile_skewness (s2);
(%o5) [- 0.0408542246982353, .1467025572005382,
        0.0336239103362392, .03780068728522298, .2105263157894735]
See also function quantile.
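The two quantile-based skewness coefficients can be verified on the pi digits with a short Python sketch. Two assumptions are made here and are not stated in the text: quantiles are interpolated linearly at position (n-1)p of the sorted sample, and the s in Pearson's coefficient is the standard deviation with denominator n-1 (std1), since that choice reproduces the printed value. The sample is rebuilt from the digit frequencies reported by discrete_freq.

```python
from fractions import Fraction
from math import floor, sqrt


def quantile(sample, p):
    """p-quantile by linear interpolation between order statistics."""
    xs = sorted(sample)
    h = (len(xs) - 1) * Fraction(p)
    k = floor(h)
    if k + 1 >= len(xs):
        return xs[-1]
    return xs[k] + (h - k) * (xs[k + 1] - xs[k])


def quartile_skewness(sample):
    c1, c2, c3 = (quantile(sample, Fraction(k, 4)) for k in (1, 2, 3))
    return (c3 - 2 * c2 + c1) / (c3 - c1)


def pearson_skewness(sample):
    n = len(sample)
    m = Fraction(sum(sample), n)
    # Assumption: s is the standard deviation with denominator n - 1.
    std1 = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return float(3 * (m - quantile(sample, Fraction(1, 2)))) / std1


# Rebuild the 100 digits from their absolute frequencies (digits 0 to 9).
digit_counts = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
s1 = [d for d, f in enumerate(digit_counts) for _ in range(f)]
qs = quartile_skewness(s1)
ps = pearson_skewness(s1)
```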
Function cov returns the covariance matrix of the multivariate sample, defined as
$$S = \frac{1}{n} \sum_{j=1}^{n} (X_j - \bar{X}) (X_j - \bar{X})'$$
where X_j is the j-th row of the sample matrix.
Example:
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) fpprintprec : 7$ /* change precision for pretty output */
(%i4) cov (s2);
      [ 17.22191  13.61811  14.37217  19.39624  15.42162 ]
      [                                                  ]
      [ 13.61811  14.98774  13.30448  15.15834  14.9711  ]
      [                                                  ]
(%o4) [ 14.37217  13.30448  15.47573  17.32544  16.18171 ]
      [                                                  ]
      [ 19.39624  15.15834  17.32544  32.17651  20.44685 ]
      [                                                  ]
      [ 15.42162  14.9711   16.18171  20.44685  24.42308 ]
See also function cov1.
Function cov1 returns the covariance matrix of the multivariate sample with denominator n-1, defined as
$$S_1 = \frac{1}{n-1} \sum_{j=1}^{n} (X_j - \bar{X}) (X_j - \bar{X})'$$
where X_j is the j-th row of the sample matrix.
Example:
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) fpprintprec : 7$ /* change precision for pretty output */
(%i4) cov1 (s2);
      [ 17.39587  13.75567  14.51734  19.59216  15.5774  ]
      [                                                  ]
      [ 13.75567  15.13913  13.43887  15.31145  15.12232 ]
      [                                                  ]
(%o4) [ 14.51734  13.43887  15.63205  17.50044  16.34516 ]
      [                                                  ]
      [ 19.59216  15.31145  17.50044  32.50153  20.65338 ]
      [                                                  ]
      [ 15.5774   15.12232  16.34516  20.65338  24.66977 ]
See also function cov.
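The two covariance definitions can be written directly from the formulas. The Python sketch below is an illustration with a small made-up sample, not the package's code; rows are records, and the offset parameter name is an assumption introduced here to switch between the denominators n and n-1.

```python
from fractions import Fraction


def cov(rows, offset=0):
    """Covariance matrix with denominator n - offset."""
    n = len(rows)
    p = len(rows[0])
    means = [Fraction(sum(column)) / n for column in zip(*rows)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - offset)
            for j in range(p)
        ]
        for i in range(p)
    ]


def cov1(rows):
    """Covariance matrix with denominator n - 1."""
    return cov(rows, offset=1)


# Made-up sample: three records of a two-dimensional variable.
sample = [[1, 2], [3, 6], [5, 10]]
s = cov(sample)
s_1 = cov1(sample)
```

Each entry of cov1 equals the corresponding entry of cov multiplied by n/(n-1), which is the same relation as between var1 and var.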
Function global_variances returns a list of global variance measures:
- total variance: trace(S_1),
- mean variance: trace(S_1)/p,
- generalized variance: determinant(S_1),
- generalized standard deviation: sqrt(determinant(S_1)),
- effective variance: determinant(S_1)^(1/p) (defined in: Peña, D. (2002) Análisis de datos multivariantes; McGraw-Hill, Madrid),
- effective standard deviation: determinant(S_1)^(1/(2*p)),
where p is the dimension of the multivariate random variable and S_1 the covariance matrix returned by cov1.
Example:
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) global_variances (s2);
(%o3) [105.338342060606, 21.06766841212119, 12874.34690469686,
         113.4651792608501, 6.636590811800795, 2.576158149609762]
Function global_variances has an optional logical argument: global_variances (x, true) tells Maxima that x is the data matrix, giving the same result as global_variances(x). On the other hand, global_variances(x, false) means that x is not the data matrix, but the covariance matrix, which avoids recalculating it:
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) s : cov1 (s2)$
(%i4) global_variances (s, false);
(%o4) [105.338342060606, 21.06766841212119, 12874.34690469686,
         113.4651792608501, 6.636590811800795, 2.576158149609762]
See also cov and cov1.
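All six measures depend only on the trace and the determinant of S_1. The Python sketch below computes them from a covariance matrix that is already available, in the spirit of the call with false as second argument. It is an illustration, not the package's code; the 2 x 2 matrix is made up, and the cofactor expansion used for the determinant is only suitable for small matrices.

```python
def determinant(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * determinant([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def global_variances(s_1):
    """The six measures, in the order of the list given above."""
    p = len(s_1)
    trace = sum(s_1[i][i] for i in range(p))
    det = determinant(s_1)
    return [trace, trace / p, det, det ** 0.5, det ** (1 / p), det ** (1 / (2 * p))]


# Made-up covariance matrix of two uncorrelated variables.
measures = global_variances([[4, 0], [0, 9]])
```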
Function cor returns the correlation matrix of the multivariate sample.
Example:
(%i1) load ("descriptive")$
(%i2) fpprintprec : 7 $
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) cor (s2);
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o4) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]
Function cor has an optional logical argument: cor(x,true) tells Maxima that x is the data matrix, giving the same result as cor(x). On the other hand, cor(x,false) means that x is not the data matrix, but the covariance matrix, which avoids recalculating it:
(%i1) load ("descriptive")$
(%i2) fpprintprec : 7 $
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) s : cov1 (s2)$
(%i5) cor (s, false); /* this is faster */
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o5) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]
See also cov and cov1.
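Going from a covariance matrix to a correlation matrix only requires dividing each entry by the two corresponding standard deviations. The Python sketch below applies this to the cov1 matrix of the wind data as printed earlier with seven digits, so its results agree with the cor output only up to that rounding. The function name cor_from_cov is an assumption made for this illustration.

```python
from math import sqrt


def cor_from_cov(s):
    """Correlation matrix r_ij = s_ij / sqrt(s_ii * s_jj)."""
    p = len(s)
    return [[s[i][j] / sqrt(s[i][i] * s[j][j]) for j in range(p)] for i in range(p)]


# cov1 matrix of the wind data, copied from the seven-digit output above.
s_1 = [
    [17.39587, 13.75567, 14.51734, 19.59216, 15.5774],
    [13.75567, 15.13913, 13.43887, 15.31145, 15.12232],
    [14.51734, 13.43887, 15.63205, 17.50044, 16.34516],
    [19.59216, 15.31145, 17.50044, 32.50153, 20.65338],
    [15.5774, 15.12232, 16.34516, 20.65338, 24.66977],
]
r = cor_from_cov(s_1)
```

The scale factor n/(n-1) cancels in the ratio, so starting from cov or from cov1 leads to the same correlations.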
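Function list_correlations returns a list of correlation measures:
- precision matrix: the inverse of the covariance matrix S_1,
$$S_1^{-1} = \left( s^{ij} \right)_{i,j = 1,2,\ldots,p}$$
- multiple correlation vector: (R_1^2, R_2^2, ..., R_p^2), with
$$R_i^2 = 1 - \frac{1}{s^{ii} s_{ii}}$$
being an indicator of the goodness of fit of the linear multivariate regression model on X_i when the rest of variables are used as regressors.
- partial correlation matrix: with element (i, j) being
$$r_{ij.rest} = - \frac{s^{ij}}{\left( s^{ii} s^{jj} \right)^{1/2}}$$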
Example:
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) z : list_correlations (s2)$
(%i4) fpprintprec : 5$ /* for pretty output */
(%i5) z[1]; /* precision matrix */
      [  .38486   - .13856   - .15626   - .10239    .031179  ]
      [                                                      ]
      [ - .13856   .34107    - .15233    .038447   - .052842 ]
      [                                                      ]
(%o5) [ - .15626  - .15233    .47296    - .024816  - .10054  ]
      [                                                      ]
      [ - .10239   .038447   - .024816   .10937    - .034033 ]
      [                                                      ]
      [ .031179   - .052842  - .10054   - .034033   .14834   ]
(%i6) z[2]; /* multiple correlation vector */
(%o6)      [.85063, .80634, .86474, .71867, .72675]
(%i7) z[3]; /* partial correlation matrix */
      [  - 1.0     .38244    .36627    .49908   - .13049 ]
      [                                                  ]
      [  .38244    - 1.0     .37927   - .19907   .23492  ]
      [                                                  ]
(%o7) [  .36627    .37927    - 1.0     .10911    .37956  ]
      [                                                  ]
      [  .49908   - .19907   .10911    - 1.0     .26719  ]
      [                                                  ]
      [ - .13049   .23492    .37956    .26719    - 1.0   ]
Function list_correlations also has an optional logical argument: list_correlations(x,true) tells Maxima that x is the data matrix, giving the same result as list_correlations(x). On the other hand, list_correlations(x,false) means that x is not the data matrix, but the covariance matrix, which avoids recalculating it.
See also cov and cov1.
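The multiple and partial correlation formulas can be checked against the example output without inverting any matrix, by feeding them the printed precision matrix. The Python sketch below does this; the function names are assumptions made here, the variances are the diagonal of the cov1 output shown earlier for the wind data, and agreement is limited by the five-digit rounding of the printed values.

```python
from math import sqrt


def multiple_correlations(precision, variances):
    """R_i^2 = 1 - 1 / (s^ii * s_ii) for each variable."""
    return [1 - 1 / (precision[i][i] * variances[i]) for i in range(len(variances))]


def partial_correlations(precision):
    """r_ij.rest = -s^ij / sqrt(s^ii * s^jj)."""
    p = len(precision)
    return [
        [-precision[i][j] / sqrt(precision[i][i] * precision[j][j]) for j in range(p)]
        for i in range(p)
    ]


# Precision matrix z[1] as printed above (five significant digits).
precision = [
    [0.38486, -0.13856, -0.15626, -0.10239, 0.031179],
    [-0.13856, 0.34107, -0.15233, 0.038447, -0.052842],
    [-0.15626, -0.15233, 0.47296, -0.024816, -0.10054],
    [-0.10239, 0.038447, -0.024816, 0.10937, -0.034033],
    [0.031179, -0.052842, -0.10054, -0.034033, 0.14834],
]
# Diagonal of the cov1 matrix of the wind data (seven-digit output).
variances = [17.39587, 15.13913, 15.63205, 32.50153, 24.66977]
r2 = multiple_correlations(precision, variances)
partial = partial_correlations(precision)
```

With these formulas the diagonal of the partial correlation matrix is -1 by construction, which matches the diagonal of z[3].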
Function barsplot plots bar diagrams for discrete statistical variables, for one or multiple samples.
data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.
Available options are:
- box_width (default, 3/4): relative width of rectangles. This value must be in the range [0,1].
- grouping (default, clustered): indicates how multiple samples are shown. Valid values are: clustered and stacked.
- groups_gap (default, 1): a positive integer number representing the gap between two consecutive groups of bars.
- bars_colors (default, []): a list of colors for multiple samples. When there are more samples than specified colors, the extra necessary colors are chosen at random. See color to learn more about them.
- frequency (default, absolute): indicates the scale of the ordinates. Possible values are: absolute, relative, and percent.
- ordering (default, orderlessp): possible values are orderlessp or ordergreatp, indicating how statistical outcomes should be ordered on the x-axis.
- sample_keys (default, []): a list with the strings to be used in the legend. When the list length is other than 0 or the number of samples, an error message is returned.
- start_at (default, 0): indicates where the plot begins to be plotted on the x-axis.
- All global draw options, except xtics, which is internally assigned by barsplot. If you want to set your own values for this option or want to build complex scenes, make use of barsplot_description. See example below.
- The following local draw options: key, color, fill_color, fill_density and line_width. See also bars.
Function barsplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxbarsplot for creating embedded plots in interfaces wxMaxima and iMaxima.
Examples:
Univariate sample in matrix form. Absolute frequencies.
(%i1) load ("descriptive")$
(%i2) m : read_matrix (file_search ("biomed.data"))$
(%i3) barsplot(
        col(m,2),
        title        = "Ages",
        xlabel       = "years",
        box_width    = 1/2,
        fill_density = 3/4)$
Two samples of different sizes, with relative frequencies and user-declared colors.
(%i1) load ("descriptive")$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$
(%i4) barsplot(
        l1,l2,
        box_width    = 1,
        fill_density = 1,
        bars_colors  = [black, grey],
        frequency    = relative,
        sample_keys  = ["A", "B"])$
Four non-numeric samples of equal size.
(%i1) load ("descriptive")$
(%i2) barsplot(
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        title        = "Asking for something to four groups",
        ylabel       = "# of individuals",
        groups_gap   = 3,
        fill_density = 0.5,
        ordering     = ordergreatp)$
Stacked bars.
(%i1) load ("descriptive")$
(%i2) barsplot(
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        makelist([Yes, No, Maybe][random(3)+1],k,1,50),
        title        = "Asking for something to four groups",
        ylabel       = "# of individuals",
        grouping     = stacked,
        fill_density = 0.5,
        ordering     = ordergreatp)$
barsplot in a multiplot context.
(%i1) load ("descriptive")$
(%i2) l1:makelist(random(10),k,1,50)$
(%i3) l2:makelist(random(10),k,1,100)$
(%i4) bp1 :
        barsplot_description(
          l1,
          box_width    = 1,
          fill_density = 0.5,
          bars_colors  = [blue],
          frequency    = relative)$
(%i5) bp2 :
        barsplot_description(
          l2,
          box_width    = 1,
          fill_density = 0.5,
          bars_colors  = [red],
          frequency    = relative)$
(%i6) draw(gr2d(bp1), gr2d(bp2))$
For options related to bar diagrams, see bars of package draw.
See also functions histogram and piechart.
Function boxplot plots box-and-whisker diagrams. Argument data can be a list, which is not of great interest, since these diagrams are mainly used for comparing different samples, or a matrix, so it is possible to compare two or more components of a multivariate statistical variable. data may also be a list of samples with possibly different sample sizes; in fact, this is the only function in package descriptive that admits this type of data structure.
Available options are:
- box_width (default, 3/4): relative width of boxes. This value must be in the range [0,1].
- box_orientation (default, vertical): possible values: vertical and horizontal.
- All draw options, except points_joined, point_size, point_type, xtics, ytics, xrange, and yrange, which are internally assigned by boxplot. If you want to set your own values for these options or want to build complex scenes, make use of boxplot_description.
- The following local draw options: key, color, and line_width.
Function boxplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxboxplot for creating embedded plots in interfaces wxMaxima and iMaxima.
Examples:
Box-and-whisker diagram from a multivariate sample.
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix(file_search("wind.data"))$
(%i3) boxplot(s2,
        box_width  = 0.2,
        title      = "Windspeed in knots",
        xlabel     = "Stations",
        color      = red,
        line_width = 2)$
Box-and-whisker diagram from three samples of different sizes.
(%i1) load ("descriptive")$
(%i2) A :
       [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
        [8, 10, 7, 9, 12, 8, 10],
        [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
(%i3) boxplot (A, box_orientation = horizontal)$
Function histogram plots a histogram from a continuous sample. Sample data must be stored in a list of numbers or a one-dimensional matrix.
Available options are:
- nclasses (default, 10): number of classes of the histogram, or a list indicating the limits of the classes and the number of them, or only the limits.
- frequency (default, absolute): indicates the scale of the ordinates. Possible values are: absolute, relative, and percent.
- htics (default, auto): format of the histogram tics. Possible values are: auto, endpoints, intervals, or a list of labels.
- All global draw options, except xrange, yrange, and xtics, which are internally assigned by histogram. If you want to set your own values for these options, make use of histogram_description. See examples below.
- The following local draw options: key, color, fill_color, fill_density and line_width. See also bars.
Function histogram_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxhistogram for creating embedded histograms in interfaces wxMaxima and iMaxima.
Examples:
A simple histogram with eight classes:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram (
        s1,
        nclasses     = 8,
        title        = "pi digits",
        xlabel       = "digits",
        ylabel       = "Absolute frequency",
        fill_color   = grey,
        fill_density = 0.6)$
Setting the limits of the histogram to -2 and 12, with 3 classes. Also, we introduce predefined tics:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) histogram (
        s1,
        nclasses     = [-2,12,3],
        htics        = ["A", "B", "C"],
        terminal     = png,
        fill_color   = "#23afa0",
        fill_density = 0.6)$
We make use of histogram_description for setting the xrange and adding an explicit curve into the scene:
(%i1) load ("descriptive")$
(%i2) ( load("distrib"),
        m: 14, s: 2,
        s2: random_normal(m, s, 1000) ) $
(%i3) draw2d(
        grid   = true,
        xrange = [5, 25],
        histogram_description(
          s2,
          nclasses     = 9,
          frequency    = relative,
          fill_density = 0.5),
        explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$
Function piechart is similar to barsplot, but plots sectors instead of rectangles.
Available options are:
* sector_colors (default, []): a list of colors for sectors. When there are more sectors than specified colors, the extra necessary colors are chosen at random. See color to learn more about them.
* pie_center (default, [0,0]): diagram's center.
* pie_radius (default, 1): diagram's radius.
* All global draw options, except key, which is internally assigned by piechart. If you want to set your own values for this option or want to build complex scenes, make use of piechart_description.
* The following local draw options: key, color, fill_density and line_width. See also ellipse.
Function piechart_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxpiechart for creating embedded pie charts in interfaces wxMaxima and iMaxima.
Example:
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) piechart(
        s1,
        xrange = [-1.1, 1.3],
        yrange = [-1.1, 1.1],
        title  = "Digit frequencies in pi")$
See also function barsplot.
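The sector colors can also be set explicitly. The sketch below assumes the option is named sector_colors, as in the package's option list, and gives only three colors for the ten digit sectors, so the remaining colors should be chosen at random as described above. The particular colors are an arbitrary choice.

```maxima
load ("descriptive")$
s1 : read_list (file_search ("pidigits.data"))$

/* Only three colors are listed; the other sectors get random colors */
piechart (
  s1,
  sector_colors = [red, green, blue],
  xrange        = [-1.1, 1.3],
  yrange        = [-1.1, 1.1],
  title         = "Digit frequencies in pi")$
```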
Function scatterplot plots scatter diagrams both for univariate (list) and multivariate (matrix) samples.
The available options are the same as those accepted by histogram.
Function scatterplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxscatterplot for creating embedded scatter plots in interfaces wxMaxima and iMaxima.
Examples:
Univariate scatter diagram from a simulated Gaussian sample.
(%i1) load ("descriptive")$
(%i2) load ("distrib")$
(%i3) scatterplot(
        random_normal(0,1,200),
        xaxis      = true,
        point_size = 2,
        dimensions = [600,150])$
Two-dimensional scatter plot.
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(
        submatrix(s2, 1,2,3),
        title      = "Data from stations #4 and #5",
        point_type = diamant,
        point_size = 2,
        color      = blue)$
Three-dimensional scatter plot.
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$
Five-dimensional scatter plot, with five-class histograms.
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) scatterplot(
        s2,
        nclasses     = 5,
        frequency    = relative,
        fill_color   = blue,
        fill_density = 0.3,
        xtics        = 5)$
For plotting isolated or line-joined points in two and three dimensions, see points. See also histogram.
Function starplot plots star diagrams for discrete statistical variables, for either one or multiple samples.
The argument data can be a list of outcomes representing one sample, or a matrix of m rows and n columns, representing n samples of size m each.
Available options are:
* stars_colors (default, []): a list of colors for multiple samples. When there are more samples than specified colors, the extra necessary colors are chosen at random. See color to learn more about them.
* frequency (default, absolute): indicates the scale of the radii. Possible values are: absolute and relative.
* ordering (default, orderlessp): possible values are orderlessp or ordergreatp, indicating how statistical outcomes should be ordered.
* sample_keys (default, []): a list with the strings to be used in the legend. When the list length is other than 0 or the number of samples, an error message is returned.
* star_center (default, [0,0]): diagram's center.
* star_radius (default, 1): diagram's radius.
* All global draw options, except points_joined, point_type, and key, which are internally assigned by starplot. If you want to set your own values for these options or want to build complex scenes, make use of starplot_description.
* The following local draw option: line_width.
Function starplot_description creates a graphic object suitable for creating complex scenes, together with other graphic objects. There is also a function wxstarplot for creating embedded star diagrams in interfaces wxMaxima and iMaxima.
Example:
Plot based on absolute frequencies. Location and radius defined by the user.
(%i1) load ("descriptive")$
(%i2) l1: makelist(random(10),k,1,50)$
(%i3) l2: makelist(random(10),k,1,200)$
(%i4) starplot(
        l1, l2,
        stars_colors      = [blue,red],
        sample_keys       = ["1st sample", "2nd sample"],
        star_center       = [1,2],
        star_radius       = 4,
        proportional_axes = xy,
        line_width        = 2 ) $
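Several samples of equal size can also be passed as the columns of one matrix. The following sketch builds a hypothetical 60-by-3 matrix of simulated outcomes with genmatrix and plots relative frequencies. The data, the sample labels and the use of frequency = relative are illustrative assumptions, not part of the original example.

```maxima
load ("descriptive")$

/* Hypothetical data: 60 rows and 3 columns, i.e. 3 samples of size 60, */
/* each outcome being a random integer from 1 to 6                      */
m : genmatrix (lambda ([i, j], random (6) + 1), 60, 3)$

/* One star per column; radii scaled by relative frequencies */
starplot (
  m,
  frequency         = relative,
  sample_keys       = ["Sample A", "Sample B", "Sample C"],
  proportional_axes = xy)$
```

Relative frequencies make samples of different sizes comparable. Here all three samples have the same size, so the choice mainly affects the scale of the radii.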
Function stemplot plots stem and leaf diagrams. The only available option is:
* leaf_unit (default, 1): indicates the unit of the leaves; must be a power of 10.
Example:
(%i1) load ("descriptive")$
(%i2) load("distrib")$
(%i3) stemplot(
        random_normal(15, 6, 100),
        leaf_unit = 0.1);
-5|4
 0|37
 1|7
 3|6
 4|4
 5|4
 6|57
 7|0149
 8|3
 9|1334588
10|07888
11|01144467789
12|12566889
13|24778
14|047
15|223458
16|4
17|11557
18|000247
19|4467799
20|00
21|1
22|2335
23|01457
24|12356
25|455
27|79
key: 6|3 =  6.3
(%o3)                         done
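For integer data the leaf unit can be left at 1, so that each leaf should stand for a units digit and each stem for the tens. The sample below is made up for illustration and does not come from the package's data sets.

```maxima
load ("descriptive")$

/* Hypothetical sample of integer measurements */
ages : [23, 25, 31, 34, 34, 38, 42, 47, 47, 51, 56, 63]$

/* leaf_unit = 1 is the default; it is written out here only for clarity */
stemplot (ages, leaf_unit = 1);
```

The key line printed at the end of the diagram shows how to read a stem and leaf pair back into a data value.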