Introduction to String Processing
The package stringproc contains functions for processing strings and characters, including formatting, encoding and data streams.
The package is complemented by some tools for cryptography, e.g. base64 and hash functions.
It can be loaded directly via load("stringproc") or automatically by using one of its functions.
For questions and bug reports please contact the author. The following command prints his e-mail-address.
printf(true, "~{~a~}@gmail.com", split(sdowncase("Volker van Nek")))$
A string is constructed by typing e.g. "Text". When the option variable stringdisp is set to false, which is the default, the double quotes won’t be printed. stringp tests whether an object is a string.
(%i1) str: "Text";
(%o1) Text
(%i2) stringp(str);
(%o2) true
Characters are represented by a string of length 1. charp is the corresponding test.
(%i1) char: "e";
(%o1) e
(%i2) charp(char);
(%o2) true
In Maxima, position indices in strings are 1-indexed, as in lists, which yields the following consistency.
(%i1) is(charat("Lisp",1) = charlist("Lisp")[1]);
(%o1) true
A string may contain Maxima expressions. These can be parsed with parse_string.
(%i1) map(parse_string, ["42" ,"sqrt(2)", "%pi"]);
(%o1) [42, sqrt(2), %pi]
(%i2) map('float, %);
(%o2) [42.0, 1.414213562373095, 3.141592653589793]
Strings can be processed as characters or in binary form as octets. Functions for conversions are string_to_octets and octets_to_string. Usable encodings depend on the platform, the application and the underlying Lisp. (The following shows Maxima in GNU/Linux, compiled with SBCL.)
(%i1) obase: 16.$
(%i2) string_to_octets("$£€", "cp1252");
(%o2) [24, 0A3, 80]
(%i3) string_to_octets("$£€", "utf-8");
(%o3) [24, 0C2, 0A3, 0E2, 82, 0AC]
Strings may be written to character streams or as octets to binary streams. The following example demonstrates file input and output of characters.
openw returns an output stream to a file, printf writes formatted output to that file, and closing the stream, e.g. with close, writes all characters contained in the stream to the file.
(%i1) s: openw("file.txt");
(%o1) #<output stream file.txt>
(%i2) printf(s, "~%~d ~f ~a ~a ~f ~e ~a~%", 42, 1.234, sqrt(2), %pi, 1.0e-2, 1.0e-2, 1.0b-2)$
(%i3) close(s)$
openr then returns an input stream from the previously used file and readline returns the line read as a string. The string may be tokenized by e.g. split or tokens and finally parsed by parse_string.
(%i4) s: openr("file.txt");
(%o4) #<input stream file.txt>
(%i5) readline(s);
(%o5) 42 1.234 sqrt(2) %pi 0.01 1.0E-2 1.0b-2
(%i6) map(parse_string, split(%));
(%o6) [42, 1.234, sqrt(2), %pi, 0.01, 0.01, 1.0b-2]
(%i7) close(s)$
String Input and Output
Example: Formatted printing to a file.
(%i1) s: openw("file.txt");
(%o1) #<output stream file.txt>
(%i2) control:
"~2tAn atom: ~20t~a~%~2tand a list: ~20t~{~r ~}~%~2t\
and an integer: ~20t~d~%"$
(%i3) printf( s,control, 'true,[1,2,3],42 )$
(%o3) false
(%i4) close(s);
(%o4) true
(%i5) s: openr("file.txt");
(%o5) #<input stream file.txt>
(%i6) while stringp( tmp:readline(s) ) do print(tmp)$
  An atom:          true
  and a list:       one two three
  and an integer:   42
(%i7) close(s)$
Closes stream and returns true if stream had been open.
stream has to be an open stream from or to a file. flength then returns the number of bytes which are currently present in this file.
Example: See writebyte.
Flushes stream where stream has to be an output stream to a file.
Example: See writebyte.
Returns the current position in stream, if pos is not used. If pos is used, fposition sets the position in stream. stream has to be a stream from or to a file and pos has to be a positive number.
Positions in data streams are 1-indexed, as in strings or lists, i.e. the first element in stream is in position 1.
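As a minimal sketch of the 1-indexed positions described above, the following session writes six characters to a file and then repositions an input stream on it. The file name fposition_demo.txt is only an assumption for illustration, and the outputs shown assume that the file content is encoded with one byte per character (US-ASCII).

```maxima
(%i1) s: openw("fposition_demo.txt")$
(%i2) printf(s, "abcdef")$
(%i3) close(s)$
(%i4) s: openr("fposition_demo.txt")$
(%i5) fposition(s);
(%o5) 1
(%i6) fposition(s, 4)$
(%i7) readchar(s);
(%o7) d
(%i8) close(s)$
```

Position 4 addresses the fourth character, so the next readchar returns "d".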
Writes a new line to the standard output stream if the position is not at the beginning of a line and returns true. Using the optional argument stream the new line is written to that stream. There are some cases where freshline() does not work as expected.
See also newline.
Returns a string containing all the characters currently present in stream which must be an open string-output stream. The returned characters are removed from stream.
Example: See make_string_output_stream.
Returns an input stream which contains parts of string and an end of file. Without optional arguments the stream contains the entire string and is positioned in front of the first character. start and end define the substring contained in the stream. The first character is available at position 1.
(%i1) istream : make_string_input_stream("text", 1, 4);
(%o1) #<string-input stream from "text">
(%i2) (while (c : readchar(istream)) # false do sprint(c), newline())$
t e x
(%i3) close(istream)$
Returns an output stream that accepts characters. Characters currently present in this stream can be retrieved by get_output_stream_string.
(%i1) ostream : make_string_output_stream();
(%o1) #<string-output stream 09622ea0>
(%i2) printf(ostream, "foo")$
(%i3) printf(ostream, "bar")$
(%i4) string : get_output_stream_string(ostream);
(%o4) foobar
(%i5) printf(ostream, "baz")$
(%i6) string : get_output_stream_string(ostream);
(%o6) baz
(%i7) close(ostream)$
Writes a new line to the standard output stream. Using the optional argument stream the new line is written to that stream. There are some cases where newline() does not work as expected.
See sprint for an example of using newline().
Returns a character output stream to file. If an existing file is opened, opena appends elements at the end of file.
For binary output see opena_binary.
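A minimal sketch of appending with opena, assuming a writable working directory; the file name opena_demo.txt is hypothetical. The file is first created with openw, extended with opena and then read back line by line with readline, which returns false at the end of the file.

```maxima
(%i1) s: openw("opena_demo.txt")$
(%i2) printf(s, "first line~%")$
(%i3) close(s)$
(%i4) s: opena("opena_demo.txt")$
(%i5) printf(s, "second line~%")$
(%i6) close(s)$
(%i7) s: openr("opena_demo.txt")$
(%i8) readline(s);
(%o8) first line
(%i9) readline(s);
(%o9) second line
(%i10) readline(s);
(%o10) false
(%i11) close(s)$
```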
Returns a character input stream to file. openr assumes that file already exists. If reading the file results in a Lisp error about its encoding, passing the correct string as the argument encoding might help. The available encodings and their names depend on the Lisp being used. For SBCL a list of suitable strings can be found at http://www.sbcl.org/manual/#External-Formats.
For binary input see openr_binary. See also close and openw.
(%i1) istream : openr("data.txt","EUC-JP");
(%o1) #<FD-STREAM for "file /home/gunter/data.txt" {10099A3AE3}>
(%i2) close(istream);
(%o2) true
Returns a character output stream to file. If file does not exist, it will be created. If an existing file is opened, openw destructively modifies file.
For binary output see openw_binary.
Produces formatted output by outputting the characters of control-string string and observing that a tilde introduces a directive. The character after the tilde, possibly preceded by prefix parameters and modifiers, specifies what kind of formatting is desired. Most directives use one or more elements of the arguments expr_1, ..., expr_n to create their output.
If dest is a stream or true, then printf returns false. Otherwise, printf returns a string containing the output. By default the streams stdin, stdout and stderr are defined. If Maxima is running as a network client (which is the normal case if Maxima is communicating with a graphical user interface, which must be the server) setup-client will define old_stdout and old_stderr, too.
printf provides the Common Lisp function format in Maxima. The following example illustrates the general relation between these two functions.
(%i1) printf(true, "R~dD~d~%", 2, 2);
R2D2
(%o1) false
(%i2) :lisp (format t "R~dD~d~%" 2 2)
R2D2
NIL
The following description is limited to a rough sketch of the possibilities of printf. The Lisp function format is described in detail in many reference books. A helpful resource is e.g. the freely available online manual "Common Lisp the Language" by Guy L. Steele. See chapter 22.3.3 there.
In addition, printf recognizes two format directives which are not known to Lisp format. The format directive ~m indicates Maxima pretty printer output. The format directive ~h indicates a bigfloat number.
   ~%       new line
   ~&       fresh line
   ~t       tab
   ~$       monetary
   ~d       decimal integer
   ~b       binary integer
   ~o       octal integer
   ~x       hexadecimal integer
   ~br      base-b integer
   ~r       spell an integer
   ~p       plural
   ~f       floating point
   ~e       scientific notation
   ~g       ~f or ~e, depending upon magnitude
   ~h       bigfloat
   ~a       uses Maxima function string
   ~m       Maxima pretty printer output
   ~s       like ~a, but output enclosed in "double quotes"
   ~~       ~
   ~<       justification, ~> terminates
   ~(       case conversion, ~) terminates
   ~[       selection, ~] terminates
   ~{       iteration, ~} terminates
Note that the directive ~* is not supported.
(%i1) printf( false, "~a ~a ~4f ~a ~@r", "String",sym,bound,sqrt(12),144), bound = 1.234;
(%o1) String sym 1.23 2*sqrt(3) CXLIV
(%i2) printf( false,"~{~a ~}",["one",2,"THREE"] );
(%o2) one 2 THREE
(%i3) printf(true,"~{~{~9,1f ~}~%~}",mat ), mat = args(matrix([1.1,2,3.33],[4,5,6],[7,8.88,9]))$
      1.1       2.0       3.3
      4.0       5.0       6.0
      7.0       8.9       9.0
(%i4) control: "~:(~r~) bird~p ~[is~;are~] singing."$
(%i5) printf( false,control, n,n,if n=1 then 1 else 2 ), n=2;
(%o5) Two birds are singing.
The directive ~h has been introduced to handle bigfloats.
~w,d,e,x,o,p@H
 w : width
 d : decimal digits behind floating point
 e : minimal exponent digits
 x : preferred exponent
 o : overflow character
 p : padding character
 @ : display sign for positive numbers
(%i1) fpprec : 1000$
(%i2) printf(true, "|~h|~%", 2.b0^-64)$
|0.0000000000000000000542101086242752217003726400434970855712890625|
(%i3) fpprec : 26$
(%i4) printf(true, "|~h|~%", sqrt(2))$
|1.4142135623730950488016887|
(%i5) fpprec : 24$
(%i6) printf(true, "|~h|~%", sqrt(2))$
|1.41421356237309504880169|
(%i7) printf(true, "|~28h|~%", sqrt(2))$
|   1.41421356237309504880169|
(%i8) printf(true, "|~28,,,,,'*h|~%", sqrt(2))$
|***1.41421356237309504880169|
(%i9) printf(true, "|~,18h|~%", sqrt(2))$
|1.414213562373095049|
(%i10) printf(true, "|~,,,-3h|~%", sqrt(2))$
|1414.21356237309504880169b-3|
(%i11) printf(true, "|~,,2,-3h|~%", sqrt(2))$
|1414.21356237309504880169b-03|
(%i12) printf(true, "|~20h|~%", sqrt(2))$
|1.41421356237309504880169|
(%i13) printf(true, "|~20,,,,'+h|~%", sqrt(2))$
|++++++++++++++++++++|
For conversion of objects to strings also see concat, sconcat, string and simplode.
Removes and returns the first byte in stream which must be a binary input stream. If the end of file is encountered readbyte returns false.
Example: Read the first 16 bytes from a file encrypted with AES in OpenSSL.
(%i1) ibase: obase: 16.$
(%i2) in: openr_binary("msg.bin");
(%o2) #<input stream msg.bin>
(%i3) (L:[], thru 16. do push(readbyte(in), L), L:reverse(L));
(%o3) [53, 61, 6C, 74, 65, 64, 5F, 5F, 88, 56, 0DE, 8A, 74, 0FD, 0AD, 0F0]
(%i4) close(in);
(%o4) true
(%i5) map(ascii, rest(L,-8));
(%o5) [S, a, l, t, e, d, _, _]
(%i6) salt: octets_to_number(rest(L,8));
(%o6) 8856de8a74fdadf0
Removes and returns the first character in stream. If the end of file is encountered readchar returns false.
Example: See make_string_input_stream.
Returns a string containing all characters starting at the current position
in stream up to the end of the line or false
if the end of the file is encountered.
Evaluates and displays its arguments one after the other ‘on a line’ starting at the leftmost position. Each expression is printed followed by a space character, and line length is disregarded. newline() might be used for line breaking.
Example: Sequential printing with sprint. Creating a new line with newline().
(%i1) for n:0 thru 19 do sprint(fib(n))$
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
(%i2) for n:0 thru 22 do ( sprint(fib(n)), if mod(n,10) = 9 then newline() )$
0 1 1 2 3 5 8 13 21 34
55 89 144 233 377 610 987 1597 2584 4181
6765 10946 17711
Writes byte to stream which must be a binary output stream. writebyte returns byte.
Example: Write some bytes to a binary file output stream. In this example all bytes correspond to printable characters and are printed by printfile. The bytes remain in the stream until flush_output or close have been called.
(%i1) ibase: obase: 16.$
(%i2) bytes: map(cint, charlist("GNU/Linux"));
(%o2) [47, 4E, 55, 2F, 4C, 69, 6E, 75, 78]
(%i3) out: openw_binary("test.bin");
(%o3) #<output stream test.bin>
(%i4) for i thru 3 do writebyte(bytes[i], out);
(%o4) done
(%i5) printfile("test.bin")$
(%i6) flength(out);
(%o6) 0
(%i7) flush_output(out);
(%o7) true
(%i8) flength(out);
(%o8) 3
(%i9) printfile("test.bin")$
GNU
(%i0A) for b in rest(bytes,3) do writebyte(b, out);
(%o0A) done
(%i0B) close(out);
(%o0B) true
(%i0C) printfile("test.bin")$
GNU/Linux
Characters
Characters are strings of length 1.
Prints information about the current external format of the Lisp reader and, in case the external format encoding differs from the encoding of the application which runs Maxima, adjust_external_format tries to adjust the encoding or prints some help or instruction. adjust_external_format returns true when the external format has been changed and false otherwise.
Functions like cint, unicode, octets_to_string and string_to_octets need UTF-8 as the external format of the Lisp reader to work properly over the full range of Unicode characters.
Examples (Maxima on Windows, March 2016):
Using adjust_external_format when the default external format is not equal to the encoding provided by the application.
1. Command line Maxima
In case a terminal session is preferred it is recommended to use Maxima compiled with SBCL. Here Unicode support is provided by default and calls to adjust_external_format are unnecessary.
If Maxima is compiled with CLISP or GCL it is recommended to change the terminal encoding from CP850 to CP1252. adjust_external_format prints some help.
CCL reads UTF-8 while the terminal input is CP850 by default. CP1252 is not supported by CCL. adjust_external_format prints instructions for changing the terminal encoding and external format both to iso-8859-1.
2. wxMaxima
In wxMaxima SBCL reads CP1252 by default but the input from the application is UTF-8 encoded. Adjustment is needed.
Calling adjust_external_format and restarting Maxima permanently changes the default external format to UTF-8.
(%i1) adjust_external_format();
The line
(setf sb-impl::*default-external-format* :utf-8)
has been appended to the init file
C:/Users/Username/.sbclrc
Please restart Maxima to set the external format to UTF-8.
(%o1) false
Restarting Maxima.
(%i1) adjust_external_format();
The external format is currently UTF-8 and has not been changed.
(%o1) false
Returns true if char is an alphabetic character.
To identify a non-US-ASCII character as an alphabetic character the underlying Lisp must provide full Unicode support. E.g. a German umlaut is detected as an alphabetic character with SBCL in GNU/Linux but not with GCL. (On Windows, Maxima compiled with SBCL must have its external format set to UTF-8. See adjust_external_format for more.)
Example: Examination of non-US-ASCII characters.
The underlying Lisp (SBCL, GNU/Linux) is able to convert the typed character into a Lisp character and to examine it.
(%i1) alphacharp("ü");
(%o1) true
In GCL this is not possible. An error break occurs.
(%i1) alphacharp("u");
(%o1) true
(%i2) alphacharp("ü");
package stringproc: ü cannot be converted into a Lisp character.
 -- an error.
Returns true if char is an alphabetic character or a digit (only corresponding US-ASCII characters are regarded as digits).
Note: See remarks on alphacharp.
Returns the US-ASCII character corresponding to the integer int which has to be less than 128.
See unicode for converting code points larger than 127.
Examples:
(%i1) for n from 0 thru 127 do ( ch: ascii(n), if alphacharp(ch) then sprint(ch), if n = 96 then newline() )$
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Returns true if char_1 and char_2 are the same character.
Like cequal but ignores case, which is only possible for non-US-ASCII characters when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
Returns true if the code point of char_1 is greater than the code point of char_2.
Like cgreaterp but ignores case, which is only possible for non-US-ASCII characters when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
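The following short sketch contrasts the case-sensitive and the case-ignoring comparisons described above. It uses only US-ASCII characters, so it should not depend on the Unicode support of the underlying Lisp; the chosen characters are arbitrary.

```maxima
(%i1) cequal("a", "A");
(%o1) false
(%i2) cequalignore("a", "A");
(%o2) true
(%i3) cgreaterp("B", "a");
(%o3) false
(%i4) cgreaterpignore("B", "a");
(%o4) true
```

"B" has code point 66 and "a" has code point 97, so cgreaterp returns false, whereas the comparison that ignores case places "B" after "a".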
Returns true if obj is a Maxima-character. See introduction for example.
Returns the Unicode code point of char which must be a Maxima character, i.e. a string of length 1.
Examples: The hexadecimal code point of some characters (Maxima with SBCL on GNU/Linux).
(%i1) obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2) [24, 0A3, 20AC]
Warning: It is not possible to enter characters corresponding to code points larger than 16 bit in wxMaxima with SBCL on Windows when the external format has not been set to UTF-8. See adjust_external_format.
CMUCL doesn’t process these characters as one character. cint then returns false.
Converting a character to a code point via UTF-8-octets may serve as a workaround:
utf8_to_unicode(string_to_octets(character));
See utf8_to_unicode, string_to_octets.
Returns true if the code point of char_1 is less than the code point of char_2.
Like clessp but ignores case, which is only possible for non-US-ASCII characters when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
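A minimal sketch of the difference between clessp and clesspignore, restricted to US-ASCII characters chosen only for illustration.

```maxima
(%i1) clessp("A", "a");
(%o1) true
(%i2) clessp("a", "B");
(%o2) false
(%i3) clesspignore("a", "B");
(%o3) true
```

The uppercase letters have smaller code points than the lowercase ones (65 for "A", 66 for "B", 97 for "a"), which explains the first two results.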
Returns true if char is a graphic character but not a space character. A graphic character is a character one can see, plus the space character. (constituent is defined by Paul Graham. See Paul Graham, ANSI Common Lisp, 1996, page 67.)
(%i1) for n from 0 thru 127 do ( tmp: ascii(n), if constituent(tmp) then sprint(tmp) )$
! " # % ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
Returns true if char is a digit where only the corresponding US-ASCII-character is regarded as a digit.
Returns true if char is a lowercase character.
Note: See remarks on alphacharp.
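As a brief sketch, the character tests can be mapped over lists of characters. Only US-ASCII characters are used here, so the results should be the same with every underlying Lisp.

```maxima
(%i1) map(digitcharp, ["7", "x"]);
(%o1) [true, false]
(%i2) map(lowercasep, ["a", "A"]);
(%o2) [true, false]
(%i3) map(alphanumericp, ["a", "7", "-"]);
(%o3) [true, true, false]
```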
The newline character (ASCII-character 10).
The space character.
The tab character.
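A short sketch showing the code points of the three character constants above, obtained with cint and assuming the default obase of 10.

```maxima
(%i1) map(cint, [newline, space, tab]);
(%o1) [10, 32, 9]
```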
Returns the character defined by arg which might be a Unicode code point or a name string if the underlying Lisp provides full Unicode support.
Example: Characters defined by hexadecimal code points (Maxima with SBCL on GNU/Linux).
(%i1) ibase: 16.$
(%i2) map(unicode, [24, 0A3, 20AC]);
(%o2) [$, £, €]
Warning: In wxMaxima with SBCL on Windows it is not possible to convert code points larger than 16 bit to characters when the external format has not been set to UTF-8. See adjust_external_format for more information.
CMUCL doesn’t process code points larger than 16 bit. In these cases unicode returns false.
Converting a code point to a character via UTF-8 octets may serve as a workaround:
octets_to_string(unicode_to_utf8(code_point));
See octets_to_string, unicode_to_utf8.
In case the underlying Lisp provides full Unicode support the character might be specified by its name. The following is possible in ECL, CLISP and SBCL, where in SBCL on Windows the external format has to be set to UTF-8. unicode(name) is supported by CMUCL too but again limited to 16 bit characters.
The string argument to unicode is basically the same string returned by printf using the "~@c" specifier. But as shown below the prefix "#\" must be omitted. Underlines might be replaced by spaces and uppercase letters by lowercase ones.
Example (continued): Characters defined by names (Maxima with SBCL on GNU/Linux).
(%i3) printf(false, "~@c", unicode(0DF));
(%o3) #\LATIN_SMALL_LETTER_SHARP_S
(%i4) unicode("LATIN_SMALL_LETTER_SHARP_S");
(%o4) ß
(%i5) unicode("Latin small letter sharp s");
(%o5) ß
Returns a list containing the UTF-8 code corresponding to the Unicode code_point.
Examples: Converting Unicode code points to UTF-8 and vice versa.
(%i1) ibase: obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2) [24, 0A3, 20AC]
(%i3) map(unicode_to_utf8, %);
(%o3) [[24], [0C2, 0A3], [0E2, 82, 0AC]]
(%i4) map(utf8_to_unicode, %);
(%o4) [24, 0A3, 20AC]
Returns true if char is an uppercase character.
Note: See remarks on alphacharp.
This option variable affects Maxima when the character encoding provided by the application which runs Maxima is UTF-8 but the external format of the Lisp reader is not equal to UTF-8.
On GNU/Linux this is true when Maxima is built with GCL and on Windows in wxMaxima with GCL- and SBCL-builds. With SBCL it is recommended to change the external format to UTF-8. Setting us_ascii_only is unnecessary then. See adjust_external_format for details.
us_ascii_only is false by default. Maxima itself then (i.e. in the situation described above) parses the UTF-8 encoding.
When us_ascii_only is set to true it is assumed that all strings used as arguments to string processing functions do not contain non-US-ASCII characters. Given that promise, Maxima avoids parsing UTF-8 and strings can be processed more efficiently.
Returns a Unicode code point corresponding to the list which must contain the UTF-8 encoding of a single character.
Examples: See unicode_to_utf8.
String Processing
Position indices in strings are 1-indexed like in Maxima lists. See example in charat.
Returns the n-th character of string. The first character in string is returned with n = 1.
(%i1) charat("Lisp",1);
(%o1) L
(%i2) charlist("Lisp")[1];
(%o2) L
Returns the list of all characters in string.
(%i1) charlist("Lisp");
(%o1) [L, i, s, p]
Parse the string str as a Maxima expression and evaluate it. The string str may or may not have a terminator (dollar sign $ or semicolon ;). Only the first expression is parsed and evaluated, if there is more than one.
Complain if str is not a string.
Examples:
(%i1) eval_string ("foo: 42; bar: foo^2 + baz");
(%o1) 42
(%i2) eval_string ("(foo: 42, bar: foo^2 + baz)");
(%o2) baz + 1764
See also parse_string and eval_string_lisp.
Parse the string str as a Maxima expression (do not evaluate it). The string str may or may not have a terminator (dollar sign $ or semicolon ;). Only the first expression is parsed, if there is more than one.
Complain if str is not a string.
Examples:
(%i1) parse_string ("foo: 42; bar: foo^2 + baz");
(%o1) foo : 42
(%i2) parse_string ("(foo: 42, bar: foo^2 + baz)");
                          2
(%o2) (foo : 42, bar : foo  + baz)
See also eval_string.
Returns a copy of string as a new string.
Like supcase but uppercase characters are converted to lowercase.
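A minimal sketch of sdowncase, assuming it accepts the same optional start and end arguments as supcase. The first input is taken from the e-mail example in the introduction; the second string is chosen only for illustration.

```maxima
(%i1) sdowncase("Volker van Nek");
(%o1) volker van nek
(%i2) sdowncase("ENGLISH", 2);
(%o2) English
```

In the second call the conversion starts at position 2, so the first character keeps its case.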
Returns true if string_1 and string_2 contain the same sequence of characters.
Like sequal but ignores case, which is only possible for non-US-ASCII characters when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
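A short sketch of both string comparisons, using US-ASCII strings chosen only for illustration.

```maxima
(%i1) sequal("Lisp", "lisp");
(%o1) false
(%i2) sequalignore("Lisp", "lisp");
(%o2) true
```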
sexplode is an alias for function charlist.
simplode takes a list of expressions and concatenates them into a string. If no delimiter delim is specified, simplode uses no delimiter. delim can be any string.
See also concat, sconcat, string and printf.
Examples:
(%i1) simplode(["xx[",3,"]:",expand((x+y)^3)]);
(%o1) xx[3]:y^3+3*x*y^2+3*x^2*y+x^3
(%i2) simplode( sexplode("stars")," * " );
(%o2) s * t * a * r * s
(%i3) simplode( ["One","more","coffee."]," " );
(%o3) One more coffee.
Returns a string that is a concatenation of substring(string, 1, pos-1), the string seq and substring (string, pos). Note that the first character in string is in position 1.
Examples:
(%i1) s: "A submarine."$
(%i2) concat( substring(s,1,3),"yellow ",substring(s,3) );
(%o2) A yellow submarine.
(%i3) sinsert("hollow ",s,3);
(%o3) A hollow submarine.
Returns string except that the case of each character from position start to end is inverted. If end is not given, all characters from start to the end of string are inverted.
Examples:
(%i1) sinvertcase("sInvertCase");
(%o1) SiNVERTcASE
Returns the number of characters in string.
Returns a new string consisting of num copies of the character char.
Example:
(%i1) smake(3,"w");
(%o1) www
Returns the position of the first character of string_1 at which string_1 and string_2 differ or false. Default test function for matching is sequal. If smismatch should ignore case, use sequalignore as test.
Example:
(%i1) smismatch("seven","seventh");
(%o1) 6
Returns the list of all tokens in string. Each token is an unparsed string. split uses delim as delimiter. If delim is not given, the space character is the default delimiter. multiple is a boolean argument which is true by default. Multiple delimiters are then read as one. This is useful if tabs are saved as multiple space characters. If multiple is set to false, each delimiter is noted.
Examples:
(%i1) split("1.2 2.3 3.4 4.5");
(%o1) [1.2, 2.3, 3.4, 4.5]
(%i2) split("first;;third;fourth",";",false);
(%o2) [first, , third, fourth]
Returns the position of the first character in string which matches char. The first character in string is in position 1. For matching characters ignoring case see ssearch.
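A minimal sketch of sposition next to a case-ignoring search with ssearch, as suggested above. The character comes first and the string second; the inputs are chosen only for illustration.

```maxima
(%i1) sposition("s", "Lisp");
(%o1) 3
(%i2) ssearch("S", "Lisp", 'sequalignore);
(%o2) 3
```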
Returns a string like string but without all substrings matching seq. Default test function for matching is sequal. If sremove should ignore case while searching for seq, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.
Examples:
(%i1) sremove("n't","I don't like coffee.");
(%o1) I do like coffee.
(%i2) sremove ("DO ",%,'sequalignore);
(%o2) I like coffee.
Like sremove except that only the first substring that matches seq is removed.
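The difference between removing all matches and removing only the first one can be sketched as follows; the input string is arbitrary.

```maxima
(%i1) sremove("o", "foo boo");
(%o1) f b
(%i2) sremovefirst("o", "foo boo");
(%o2) fo boo
```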
Returns a string with all the characters of string in reverse order. See also reverse.
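A one-line sketch of reversing a string, with an input chosen only for illustration.

```maxima
(%i1) sreverse("Lisp");
(%o1) psiL
```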
Returns the position of the first substring of string that matches the string seq. Default test function for matching is sequal. If ssearch should ignore case, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.
Example:
(%i1) ssearch("~s","~{~S ~}~%",'sequalignore);
(%o1) 4
Returns a string that contains all characters from string in an order such that there are no two successive characters c and d such that test (c, d) is false and test (d, c) is true.
Default test function for sorting is clessp. The set of test functions is {clessp, clesspignore, cgreaterp, cgreaterpignore, cequal, cequalignore}.
Examples:
(%i1) ssort("I don't like Mondays.");
(%o1) '.IMaddeiklnnoosty
(%i2) ssort("I don't like Mondays.",'cgreaterpignore);
(%o2) ytsoonnMlkIiedda.'
Returns a string like string except that all substrings matching old are replaced by new. old and new need not be of the same length. Default test function for matching is sequal. If ssubst should ignore case while searching for old, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.
Examples:
(%i1) ssubst("like","hate","I hate Thai food. I hate green tea.");
(%o1) I like Thai food. I like green tea.
(%i2) ssubst("Indian","thai",%,'sequalignore,8,12);
(%o2) I like Indian food. I like green tea.
Like ssubst except that only the first substring that matches old is replaced.
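A minimal sketch of ssubstfirst, reusing the input string of the ssubst example and assuming the same argument order (new, old, string). Only the first occurrence of "hate" is replaced.

```maxima
(%i1) ssubstfirst("like", "hate", "I hate Thai food. I hate green tea.");
(%o1) I like Thai food. I hate green tea.
```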
Returns a string like string, but with all characters that appear in seq removed from both ends.
Examples:
(%i1) "/* comment */"$
(%i2) strim(" /*",%);
(%o2) comment
(%i3) slength(%);
(%o3) 7
Like strim except that only the left end of string is trimmed.
Like strim except that only the right end of string is trimmed.
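A short sketch of the one-sided variants, applied to the same string as in the strim example and assuming the same argument order (seq, string).

```maxima
(%i1) striml(" /*", "/* comment */");
(%o1) comment */
(%i2) strimr(" /*", "/* comment */");
(%o2) /* comment
```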
Returns true if obj is a string. See introduction for example.
Returns the substring of string beginning at position start and ending at position end. The character at position end is not included. If end is not given, the substring contains the rest of the string. Note that the first character in string is in position 1.
Examples:
(%i1) substring("substring",4);
(%o1) string
(%i2) substring(%,4,6);
(%o2) in
Returns string except that lowercase characters from position start to end are replaced by the corresponding uppercase ones. If end is not given, all lowercase characters from start to the end of string are replaced.
Example:
(%i1) supcase("english",1,2);
(%o1) English
Returns a list of tokens, which have been extracted from string. The tokens are substrings whose characters satisfy a certain test function. If test is not given, constituent is used as the default test. {constituent, alphacharp, digitcharp, lowercasep, uppercasep, charp, characterp, alphanumericp} is the set of test functions. (The Lisp-version of tokens is written by Paul Graham. ANSI Common Lisp, 1996, page 67.)
Examples:
(%i1) tokens("24 October 2005");
(%o1) [24, October, 2005]
(%i2) tokens("05-10-24",'digitcharp);
(%o2) [05, 10, 24]
(%i3) map(parse_string,%);
(%o3) [5, 10, 24]
Octets and Utilities for Cryptography
Returns the base64-representation of arg as a string. The argument arg may be a string, a non-negative integer or a list of octets.
Examples:
(%i1) base64: base64("foo bar baz");
(%o1) Zm9vIGJhciBiYXo=
(%i2) string: base64_decode(base64);
(%o2) foo bar baz
(%i3) obase: 16.$
(%i4) integer: base64_decode(base64, 'number);
(%o4) 666f6f206261722062617a
(%i5) octets: base64_decode(base64, 'list);
(%o5) [66, 6F, 6F, 20, 62, 61, 72, 20, 62, 61, 7A]
(%i6) ibase: 16.$
(%i7) base64(octets);
(%o7) Zm9vIGJhciBiYXo=
Note that if arg contains umlauts (resp. octets larger than 127) the resulting base64-string is platform dependent. However the decoded string will be equal to the original.
By default base64_decode decodes the base64-string back to the original string.
The optional argument return-type allows base64_decode to alternatively return the corresponding number or list of octets. return-type may be string, number or list.
Example: See base64.
By default crc24sum returns the CRC24 checksum of an octet-list as a string.
The optional argument return-type allows crc24sum to alternatively return the corresponding number or list of octets. return-type may be string, number or list.
Example:
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBAgAGBQJVdCTzAAoJEG/1Mgf2DWAqCSYH/AhVFwhu1D89C3/QFcgVvZTM
wnOYzBUURJAL/cT+IngkLEpp3hEbREcugWp+Tm6aw3R4CdJ7G3FLxExBH/5KnDHi
rBQu+I7+3ySK2hpryQ6Wx5J9uZSa4YmfsNteR8up0zGkaulJeWkS4pjiRM+auWVe
vajlKZCIK52P080DG7Q2dpshh4fgTeNwqCuCiBhQ73t8g1IaLdhDN6EzJVjGIzam
/spqT/sTo6sw8yDOJjvU+Qvn6/mSMjC/YxjhRMaQt9EMrR1AZ4ukBF5uG1S7mXOH
WdiwkSPZ3gnIBhM9SuC076gLWZUNs6NqTeE3UzMjDAFhH3jYk1T7mysCvdtIkms=
=WmeC
-----END PGP SIGNATURE-----
(%i1) ibase : obase : 16.$
(%i2) sig64 : sconcat(
 "iQEcBAEBAgAGBQJVdCTzAAoJEG/1Mgf2DWAqCSYH/AhVFwhu1D89C3/QFcgVvZTM",
 "wnOYzBUURJAL/cT+IngkLEpp3hEbREcugWp+Tm6aw3R4CdJ7G3FLxExBH/5KnDHi",
 "rBQu+I7+3ySK2hpryQ6Wx5J9uZSa4YmfsNteR8up0zGkaulJeWkS4pjiRM+auWVe",
 "vajlKZCIK52P080DG7Q2dpshh4fgTeNwqCuCiBhQ73t8g1IaLdhDN6EzJVjGIzam",
 "/spqT/sTo6sw8yDOJjvU+Qvn6/mSMjC/YxjhRMaQt9EMrR1AZ4ukBF5uG1S7mXOH",
 "WdiwkSPZ3gnIBhM9SuC076gLWZUNs6NqTeE3UzMjDAFhH3jYk1T7mysCvdtIkms=" )$
(%i3) octets: base64_decode(sig64, 'list)$
(%i4) crc24: crc24sum(octets, 'list);
(%o4) [5A, 67, 82]
(%i5) base64(crc24);
(%o5) WmeC
Returns the MD5
checksum of a string, non-negative integer,
list of octets, or binary (not character) input stream.
A file for which an input stream is opened may be an ordinary text file;
it is the stream which needs to be binary, not the file itself.
When the argument is an input stream,
md5sum
reads the entire content of the stream,
but does not close the stream.
The default return value is a string containing 32 hex characters.
The optional argument return-type allows md5sum
to alternatively
return the corresponding number or list of octets.
return-type may be string
, number
or list
.
Note that in case arg contains German umlauts or other non-ASCII
characters (resp. octets larger than 127) the MD5
checksum is platform dependent.
Examples:
(%i1) ibase: obase: 16.$
(%i2) msg: "foo bar baz"$
(%i3) string: md5sum(msg); (%o3) ab07acbb1e496801937adfa772424bf7
(%i4) integer: md5sum(msg, 'number); (%o4) 0ab07acbb1e496801937adfa772424bf7
(%i5) octets: md5sum(msg, 'list); (%o5) [0AB,7,0AC,0BB,1E,49,68,1,93,7A,0DF,0A7,72,42,4B,0F7]
(%i6) sdowncase( printf(false, "~{~2,'0x~^:~}", octets) ); (%o6) ab:07:ac:bb:1e:49:68:01:93:7a:df:a7:72:42:4b:f7
The argument may be a binary input stream.
(%i1) S: openr_binary (file_search ("md5.lisp")); (%o1) #<INPUT BUFFERED FILE-STREAM (UNSIGNED-BYTE 8) /home/robert/maxima/maxima-code/share/stringproc/md5.lisp>
(%i2) md5sum (S); (%o2) 31a512ed53daf5b99495c9d05559355f
(%i3) close (S); (%o3) true
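The three return types of md5sum describe the same 16 octets. The following is a minimal sketch of how they relate, using only functions documented in this chapter; the variable names are illustrative, and the input is restricted to ASCII so that the result does not depend on the platform.

```maxima
load("stringproc")$
msg: "foo bar baz"$
digest_hex: md5sum(msg)$               /* default: string of 32 hex characters */
digest_num: md5sum(msg, 'number)$      /* the same digest as one integer */
digest_octets: md5sum(msg, 'list)$     /* the same digest as a list of 16 octets */

is(slength(digest_hex) = 32);
is(length(digest_octets) = 16);
/* concatenating the octets gives the number form */
is(digest_num = octets_to_number(digest_octets));
/* for pure ASCII input, hashing the encoded octets equals hashing the string */
is(md5sum(string_to_octets(msg)) = digest_hex);
```

Each of the four tests should evaluate to true.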
Returns a pseudorandom number of variable length. By default the returned value is a number with a length of len octets.
The optional argument return-type allows mgf1_sha1 to alternatively return the corresponding list of len octets. return-type may be number or list.
The computation of the returned value is described in RFC 3447, appendix B.2.1 MGF1. SHA1 is used as the hash function, i.e. the randomness of the computed number relies on the randomness of SHA1 hashes.
Example:
(%i1) ibase: obase: 16.$
(%i2) number: mgf1_sha1(4711., 8); (%o2) 0e0252e5a2a42fea1
(%i3) octets: mgf1_sha1(4711., 8, 'list); (%o3) [0E0,25,2E,5A,2A,42,0FE,0A1]
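For orientation, the MGF1 construction from RFC 3447, appendix B.2.1 can be sketched with sha1sum alone: the seed is hashed together with a four-octet big-endian counter, the 20-octet hashes are concatenated, and the first len octets are kept. The helper mgf1_sketch below is hypothetical and takes the seed as a list of octets; it is an illustration of the construction, not the package's implementation.

```maxima
load("stringproc")$

/* MGF1 with SHA1 (hash length 20 octets), seed given as a list of octets */
mgf1_sketch(seed_octets, len) := block([acc: [], counter: 0, c4],
  while length(acc) < len do (
    /* the counter as exactly four big-endian octets (I2OSP(counter, 4)) */
    c4: [quotient(counter, 16777216),
         mod(quotient(counter, 65536), 256),
         mod(quotient(counter, 256), 256),
         mod(counter, 256)],
    acc: append(acc, sha1sum(append(seed_octets, c4), 'list)),
    counter: counter + 1),
  /* drop the surplus octets at the end */
  rest(acc, len - length(acc)))$

mgf1_sketch(number_to_octets(4711), 8);
```

Whether mgf1_sha1 converts an integer seed with number_to_octets before hashing is an assumption not stated above; if it does, the last line should agree with mgf1_sha1(4711, 8, 'list).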
Returns an octet-representation of number as a list of octets. The number must be a non-negative integer.
Example:
(%i1) ibase : obase : 16.$
(%i2) octets: [0ca,0fe,0ba,0be]$
(%i3) number: octets_to_number(octets); (%o3) 0cafebabe
(%i4) number_to_octets(number); (%o4) [0CA, 0FE, 0BA, 0BE]
Returns a number by concatenating the octets in the list of octets.
Example: See number_to_octets.
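Concatenating octets means reading the list as the digits of a base-256 number, most significant octet first. A small sketch of that reading, using the built-in lreduce for a Horner-style evaluation (the variable names are illustrative):

```maxima
load("stringproc")$
octets: number_to_octets(51966);   /* 51966 = 202*256 + 254, i.e. CAFE in hex */
/* Horner evaluation in base 256: ((0*256 + o1)*256 + o2) ... */
horner: lreduce(lambda([acc, o], 256*acc + o), octets);
is(horner = octets_to_number(octets));
```

The final test should evaluate to true, and octets should be the list [202, 254] when obase is 10.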
Computes an object identifier (OID) from the list of octets.
Example: RSA encryption OID
(%i1) ibase : obase : 16.$
(%i2) oid: octets_to_oid([2A,86,48,86,0F7,0D,1,1,1]); (%o2) 1.2.840.113549.1.1.1
(%i3) oid_to_octets(oid); (%o3) [2A, 86, 48, 86, 0F7, 0D, 1, 1, 1]
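The octets in this example follow the standard ASN.1 encoding of object identifiers: the first octet packs the first two arcs as 40*a + b, and every further arc is written in base 128 with the high bit set on all but its last octet. The helper oid_arcs_sketch below is a hypothetical illustration of that decoding, written with decimal literals; it assumes the first arc is 0 or 1 and is not the package's own code.

```maxima
load("stringproc")$

oid_arcs_sketch(octets) := block([o1: first(octets), arcs, acc: 0],
  /* first octet: 40*a + b (this split is only valid for a = 0 or a = 1) */
  arcs: [quotient(o1, 40), mod(o1, 40)],
  for o in rest(octets) do (
    acc: 128*acc + mod(o, 128),        /* the low 7 bits carry the value */
    if o < 128 then (                  /* high bit clear: last octet of this arc */
      arcs: endcons(acc, arcs),
      acc: 0)),
  arcs)$

/* 2A 86 48 86 F7 0D 01 01 01 written in decimal */
arcs: oid_arcs_sketch([42, 134, 72, 134, 247, 13, 1, 1, 1]);
/* arcs is [1, 2, 840, 113549, 1, 1, 1] */
simplode(arcs, ".");
```

Joining the arcs with simplode gives a dotted string that can be compared with the result of octets_to_oid.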
Decodes the list of octets into a string according to current system defaults. When decoding octets corresponding to non-US-ASCII characters, the result depends on the platform, application and underlying Lisp.
Example: Using system defaults (Maxima compiled with GCL, which uses no format definition and simply passes through the UTF-8-octets encoded by the GNU/Linux terminal).
(%i1) octets: string_to_octets("abc"); (%o1) [61, 62, 63]
(%i2) octets_to_string(octets); (%o2) abc
(%i3) ibase: obase: 16.$
(%i4) unicode(20AC); (%o4) €
(%i5) octets: string_to_octets(%); (%o5) [0E2, 82, 0AC]
(%i6) octets_to_string(octets); (%o6) €
(%i7) utf8_to_unicode(octets); (%o7) 20AC
If the external format of the Lisp reader is UTF-8, the optional argument encoding sets the encoding used for the octet-to-string conversion. If necessary, see adjust_external_format for changing the external format.
Some names of supported encodings (see corresponding Lisp manual for more):
CCL, CLISP, SBCL: utf-8, ucs-2be, ucs-4be, iso-8859-1, cp1252, cp850
CMUCL: utf-8, utf-16-be, utf-32-be, iso8859-1, cp1252
ECL: utf-8, ucs-2be, ucs-4be, iso-8859-1, windows-cp1252, dos-cp850
Example (continued): Using the optional encoding argument (Maxima compiled with SBCL, GNU/Linux terminal).
(%i8) string_to_octets("€", "ucs-2be"); (%o8) [20, 0AC]
Converts an object identifier (OID) to a list of octets.
Example: See octets_to_oid.
Returns the SHA1 fingerprint of a string, a non-negative integer or a list of octets. The default return value is a string containing 40 hex characters.
The optional argument return-type allows sha1sum to alternatively return the corresponding number or list of octets. return-type may be string, number or list.
Example:
(%i1) ibase: obase: 16.$
(%i2) msg: "foo bar baz"$
(%i3) string: sha1sum(msg); (%o3) c7567e8b39e2428e38bf9c9226ac68de4c67dc39
(%i4) integer: sha1sum(msg, 'number); (%o4) 0c7567e8b39e2428e38bf9c9226ac68de4c67dc39
(%i5) octets: sha1sum(msg, 'list); (%o5) [0C7,56,7E,8B,39,0E2,42,8E,38,0BF,9C,92,26,0AC,68,0DE,4C,67,0DC,39]
(%i6) sdowncase( printf(false, "~{~2,'0x~^:~}", octets) ); (%o6) c7:56:7e:8b:39:e2:42:8e:38:bf:9c:92:26:ac:68:de:4c:67:dc:39
Note that if arg contains German umlauts or other non-ASCII characters (that is, octets larger than 127), the SHA1 fingerprint is platform dependent.
Returns the SHA256 fingerprint of a string, a non-negative integer or a list of octets. The default return value is a string containing 64 hex characters.
The optional argument return-type allows sha256sum to alternatively return the corresponding number or list of octets (see sha1sum).
Example:
(%i1) string: sha256sum("foo bar baz"); (%o1) dbd318c1c462aee872f41109a4dfd3048871a03dedd0fe0e757ced57dad6f2d7
Note that if arg contains German umlauts or other non-ASCII characters (that is, octets larger than 127), the SHA256 fingerprint is platform dependent.
Encodes a string into a list of octets according to current system defaults. When encoding strings containing non-US-ASCII characters, the result depends on the platform, application and underlying Lisp.
If the external format of the Lisp reader is UTF-8, the optional argument encoding sets the encoding used for the string-to-octet conversion. If necessary, see adjust_external_format for changing the external format.
See octets_to_string for examples and some more information.
sregex is an interface to the portable regex engine by Dorai Sitaram. The syntax of the regular expressions is described in detail in the pregexp manual by Dorai Sitaram.
While sregex supports Unicode, the support for Unicode characters in strings depends on the support for Unicode characters in the Lisp used to run Maxima.
Compiles the regex string pattern into an internal form that is easier for the regex engine to process. Compiling is optional: all the regex functions accept either a compiled regex or a string. If the pattern is used many times, compiling it speeds up matching.
(%i1) regex_compile("c.r"); (%o1) Structure [COMPILED-REGEX for "c.r"]
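As a small illustration of reusing a compiled pattern, the sketch below compiles a pattern once and applies it to several strings with map. It assumes the sregex functions are available in the session; the pattern and the strings are made up for the example.

```maxima
/* compile once, then reuse the compiled pattern for every string */
re: regex_compile("[0-9]+")$
results: map(lambda([s], regex_match(re, s)), ["room 101", "no digits", "7 days"]);
/* expected: [[101], false, [7]] */
```

Passing the string "[0-9]+" instead of re gives the same result; the compiled form only saves repeated processing of the pattern.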
Returns a list containing a list of the start and end positions in str where the first match of regex occurred. If no match is found, returns false.
If a third argument, start, is supplied, matching begins at that index of str. A fourth argument, end, gives the index of str at which matching stops.
(%i1) str : "his hay needle stack -- my hay needle stack -- her hay needle stack"$
(%i2) regex : regex_compile("ne{2}dle")$
(%i3) regex_match_pos(regex, str); (%o3) [[9, 15]]
(%i4) regex_match_pos("ne{2}dle", str); (%o4) [[9, 15]]
(%i5) regex_match_pos("ne{2}dle", str, 25, 44); (%o5) [[32, 38]]
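The start argument can be used to step through a string and collect every match, not just the first. The helper all_match_pos below is hypothetical; it is a sketch that assumes the pattern never matches the empty string, since otherwise the loop would not advance.

```maxima
/* positions of all non-overlapping matches of re in str */
all_match_pos(re, str) := block([found: [], m],
  m: regex_match_pos(re, str),
  while m # false do (
    found: endcons(first(m), found),
    /* continue searching at the end position of the match just found */
    m: regex_match_pos(re, str, second(first(m)))),
  found)$

str: "his hay needle stack -- my hay needle stack -- her hay needle stack"$
positions: all_match_pos(regex_compile("ne{2}dle"), str);
/* expected from the positions shown above: [[9, 15], [32, 38], [56, 62]] */
```

Only the first element of each result is kept here, i.e. the position of the full match; positions of clusters are ignored.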
Here is an example where regex_match_pos returns a list of more than one element:
(%i1) str : "jan 1, 1970"; (%o1) jan 1, 1970
(%i2) match: regex_match_pos("([a-z]+) ([0-9]+), ([0-9]+)", "jan 1, 1970"); (%o2) [[1, 12], [1, 4], [5, 6], [8, 12]]
(%i3) map(lambda([posn], substring(str, posn[1], posn[2])), match); (%o3) [jan 1, 1970, jan, 1, 1970]
The first element gives the positions of the full match. Each subsequent element gives the positions of the substring that matches the corresponding cluster enclosed in parentheses in the given regular expression.
regex_match is very similar to regex_match_pos except that it returns the matching substrings instead of the indices of the match. If no match is found, returns false.
(%i1) regex_match("ne{2}dle", "hay needle stack"); (%o1) [needle]
(%i2) regex_match("ne{2}dle", "hay needle stack", 10); (%o2) false
Here are examples using POSIX character classes. [:alpha:] matches any letter. The pattern "[[:alpha:]_]" matches any letter or underscore:
(%i1) regex_match("[[:alpha:]_]", "--x--"); (%o1) [x]
(%i2) regex_match("[[:alpha:]_]", "--_--"); (%o2) [_]
(%i3) regex_match("[[:alpha:]_]", "--:--"); (%o3) false
sregex supports clusters (see pregexp clusters), which are subpatterns enclosed within parentheses. These cause the matcher to return the submatch along with the overall match.
Here we are looking for any number of letters followed by a space, any number of digits, a comma and space, then any number of digits.
(%i1) regex_match("([a-z]+) ([0-9]+), ([0-9]+)", "jan 1, 1970"); (%o1) [jan 1, 1970, jan, 1, 1970]
The result is a list of strings. The first element is the full match. The second element matches "([a-z]+)", which is a cluster of any number of letters. Hence, "jan" matches this cluster. Likewise for the other clusters.
A more complicated example illustrates how a subpattern fails to match, but the overall pattern matches. In this case, false represents the failed match.
The regex pattern matches “month year” or “month day, year”. The subpattern matches the day, if present.
(%i1) date_re : regex_compile("([a-z]+) +([0-9]+,)? *([0-9]+)"); (%o1) Structure [COMPILED-REGEX for "([a-z]+) +([0-9]+,)? *([0-9]+)"]
(%i2) regex_match(date_re, "jan 1, 1970"); (%o2) [jan 1, 1970, jan, 1,, 1970]
(%i3) regex_match(date_re, "jan 1970"); (%o3) [jan 1970, jan, false, 1970]
You can also do case-insensitive matches by using a cloister (see pregexp cloisters) with the i modifier:
(%i1) regex_match("hearth", "HeartH"); (%o1) false
(%i2) regex_match("(?i:hearth)", "HeartH"); (%o2) [HeartH]
Alternate subpatterns can be separated by |.
(%i1) regex_match("f(ee|i|o|um)", "a small, final fee"); (%o1) [fi, i]
The first element is the full match "fi"; the second shows that we matched "i" for the cluster.
Returns the list of substrings obtained by splitting str at every match of regex, which identifies the delimiters separating the substrings.
(%i1) regex_split("[,;]+", "split,pea;;;soup"); (%o1) [split, pea, soup]
Returns a string where the first occurrence of pattern in str has been replaced with replacement.
(%i1) regex_subst_first("ty", "t.", "liberte egalite fraternite"); (%o1) liberty egalite fraternite
This example shows how to use back references. The replacement specifies that the first submatch is used as the replacement text.
(%i1) regex_match("_(.+?)_", "the _nina_, the _pinta_, and the _santa maria_"); (%o1) [_nina_, nina]
(%i2) regex_subst_first("*\\1*", "_(.+?)_", "the _nina_, the _pinta_, and the _santa maria_"); (%o2) the *nina*, the _pinta_, and the _santa maria_
Returns a string where every occurrence of pattern has been replaced by replacement in the string str.
(%i1) regex_subst("ty", "t.\\b", "liberte egalite fraternite"); (%o1) liberty egality fraternity
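Back references, shown above for regex_subst_first, can be combined with regex_subst to rewrite every match. The following sketch reuses the pattern and string from the regex_subst_first example and assumes the sregex functions are available in the session:

```maxima
ships: "the _nina_, the _pinta_, and the _santa maria_"$
/* \\1 in the replacement stands for the text matched by the first cluster */
starred: regex_subst("*\\1*", "_(.+?)_", ships);
/* expected by analogy with regex_subst_first:
   the *nina*, the *pinta*, and the *santa maria* */
```

The non-greedy quantifier +? keeps each match within one pair of underscores, so the three names are replaced separately.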
Returns a regex string where any special regex characters in str are quoted so that they lose their special meaning.
(%i1) re : string_to_regex(". :"); (%o1) \. :
(%i2) regex_match(re, "z :"); (%o2) false
(%i3) regex_match(re, ". :"); (%o3) [. :]
(%i4) regex_match(". :", "z :"); (%o4) [z :]
In this example, the regex will only match a substring consisting of a period, followed by a space and a colon. Without the quoting, the "." would match any single character.
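Quoting is useful when literal text, for instance a delimiter supplied by a user, has to be passed to a function that expects a regex. The helper split_literal below is hypothetical; it is a sketch that combines string_to_regex with regex_split.

```maxima
/* split str at every literal occurrence of delim, whatever characters delim contains */
split_literal(delim, str) := regex_split(string_to_regex(delim), str)$

parts: split_literal(".", "www.example.org");
/* splits at each literal period: [www, example, org] */
```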