93 Package stringproc


93.1 Introduction to String Processing

The package stringproc contains functions for processing strings and characters, including formatting, encoding and data streams. The package is complemented by some tools for cryptography, e.g. base64 and hash functions.

It can be loaded directly via load("stringproc") or automatically by using one of its functions.

For questions and bug reports please contact the author. The following command prints his e-mail address.

printf(true, "~{~a~}@gmail.com", split(sdowncase("Volker van Nek")))$

A string is constructed by typing e.g. "Text". When the option variable stringdisp is set to false, which is the default, the double quotes won’t be printed. stringp tests whether an object is a string.

(%i1) str: "Text";
(%o1)                         Text
(%i2) stringp(str);
(%o2)                         true

Characters are represented by a string of length 1. charp is the corresponding test.

(%i1) char: "e";
(%o1)                           e
(%i2) charp(char);
(%o2)                         true

In Maxima, position indices in strings are 1-indexed, as in lists. This results in the following consistency.

(%i1) is(charat("Lisp",1) = charlist("Lisp")[1]);
(%o1)                         true

A string may contain Maxima expressions. These can be parsed with parse_string.

(%i1) map(parse_string, ["42" ,"sqrt(2)", "%pi"]);
(%o1)                   [42, sqrt(2), %pi]
(%i2) map('float, %);
(%o2)        [42.0, 1.414213562373095, 3.141592653589793]

Strings can be processed as characters or in binary form as octets. Functions for conversions are string_to_octets and octets_to_string. Usable encodings depend on the platform, the application and the underlying Lisp. (The following shows Maxima in GNU/Linux, compiled with SBCL.)

(%i1) obase: 16.$
(%i2) string_to_octets("$£€", "cp1252");
(%o2)                     [24, 0A3, 80]
(%i3) string_to_octets("$£€", "utf-8");
(%o3)               [24, 0C2, 0A3, 0E2, 82, 0AC]

Strings may be written to character streams or as octets to binary streams. The following example demonstrates file input and output of characters.

openw returns an output stream to a file and printf writes formatted output to that stream. Closing the stream, e.g. with close, ensures that all characters contained in the stream are written to the file.

(%i1) s: openw("file.txt");
(%o1)                #<output stream file.txt>
(%i2) printf(s, "~%~d ~f ~a ~a ~f ~e ~a~%", 
42, 1.234, sqrt(2), %pi, 1.0e-2, 1.0e-2, 1.0b-2)$
(%i3) close(s)$

openr then returns an input stream from the previously used file and readline returns the line read as a string. The string may be tokenized by e.g. split or tokens and finally parsed by parse_string.

(%i4) s: openr("file.txt");
(%o4)                 #<input stream file.txt>
(%i5) readline(s);
(%o5)          42 1.234 sqrt(2) %pi 0.01 1.0E-2 1.0b-2
(%i6) map(parse_string, split(%));
(%o6)       [42, 1.234, sqrt(2), %pi, 0.01, 0.01, 1.0b-2]
(%i7) close(s)$

93.2 String Input and Output

Example: Formatted printing to a file.

(%i1) s: openw("file.txt");
(%o1)                      #<output stream file.txt>
(%i2) control: 
"~2tAn atom: ~20t~a~%~2tand a list: ~20t~{~r ~}~%~2t\
and an integer: ~20t~d~%"$
(%i3) printf( s,control, 'true,[1,2,3],42 )$
(%o3)                                false
(%i4) close(s);
(%o4)                                true
(%i5) s: openr("file.txt");
(%o5)                      #<input stream file.txt>
(%i6) while stringp( tmp:readline(s) ) do print(tmp)$
  An atom:          true 
  and a list:       one two three  
  and an integer:   42 
(%i7) close(s)$
Function: close (stream)

Closes stream and returns true if stream had been open.

Function: flength (stream)

stream has to be an open stream from or to a file. flength then returns the number of bytes which are currently present in this file.

Example: See writebyte.

Function: flush_output (stream)

Flushes stream where stream has to be an output stream to a file.

Example: See writebyte.

Categories: File output · Package stringproc ·
Function: fposition
    fposition (stream)
    fposition (stream, pos)

If pos is not given, fposition returns the current position in stream. If pos is given, fposition sets the position in stream. stream has to be a stream from or to a file and pos has to be a positive number.

Positions in data streams are 1-indexed, as in strings or lists, i.e. the first element in stream is in position 1.
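
The following is a minimal sketch of querying and setting a stream position. It only assumes the behaviour described above; the file name fposition_demo.txt and the variable names are arbitrary placeholders.

```maxima
/* Sketch only: "fposition_demo.txt" is a hypothetical scratch file. */
ostream: openw("fposition_demo.txt")$
printf(ostream, "abcdef")$
close(ostream)$

istream: openr("fposition_demo.txt")$
start_pos: fposition(istream);   /* position before anything has been read */
fposition(istream, 4)$           /* move to the fourth character */
ch: readchar(istream);           /* with 1-indexed positions this should be "d" */
close(istream)$
```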

Function: freshline
    freshline ()
    freshline (stream)

Writes a new line to the standard output stream if the position is not at the beginning of a line and returns true. If the optional argument stream is given, the new line is written to that stream. There are some cases where freshline() does not work as expected.

See also newline.

Categories: File output · Package stringproc ·
Function: get_output_stream_string (stream)

Returns a string containing all the characters currently present in stream which must be an open string-output stream. The returned characters are removed from stream.

Example: See make_string_output_stream.

Categories: Package stringproc ·
Function: make_string_input_stream
    make_string_input_stream (string)
    make_string_input_stream (string, start)
    make_string_input_stream (string, start, end)

Returns an input stream which contains parts of string and an end of file. Without optional arguments the stream contains the entire string and is positioned in front of the first character. start and end define the substring contained in the stream. The first character is available at position 1.

(%i1) istream : make_string_input_stream("text", 1, 4);
(%o1)              #<string-input stream from "text">
(%i2) (while (c : readchar(istream)) # false do sprint(c), newline())$
t e x 
(%i3) close(istream)$
Categories: Package stringproc ·
Function: make_string_output_stream ()

Returns an output stream that accepts characters. Characters currently present in this stream can be retrieved by get_output_stream_string.

(%i1) ostream : make_string_output_stream();
(%o1)               #<string-output stream 09622ea0>
(%i2) printf(ostream, "foo")$

(%i3) printf(ostream, "bar")$

(%i4) string : get_output_stream_string(ostream);
(%o4)                            foobar
(%i5) printf(ostream, "baz")$

(%i6) string : get_output_stream_string(ostream);
(%o6)                              baz
(%i7) close(ostream)$
Categories: Package stringproc ·
Function: newline
    newline ()
    newline (stream)

Writes a new line to the standard output stream. If the optional argument stream is given, the new line is written to that stream. There are some cases where newline() does not work as expected.

See sprint for an example of using newline().

Categories: File output · Package stringproc ·
Function: opena (file)

Returns a character output stream to file. If an existing file is opened, opena appends elements at the end of file.

For binary output see opena_binary.
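
As an illustration of appending, the sketch below first creates a file with openw and then adds a second line with opena. The file name opena_demo.txt and the variable names are placeholders chosen for this example.

```maxima
/* Sketch only: "opena_demo.txt" is a hypothetical scratch file. */
ostream: openw("opena_demo.txt")$      /* create or overwrite the file */
printf(ostream, "first line~%")$
close(ostream)$

ostream: opena("opena_demo.txt")$      /* reopen it for appending */
printf(ostream, "second line~%")$
close(ostream)$

istream: openr("opena_demo.txt")$
line_1: readline(istream);             /* "first line" */
line_2: readline(istream);             /* "second line" */
close(istream)$
```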

Categories: File output · Package stringproc ·
Function: openr
    openr (file)
    openr (file, encoding)

Returns a character input stream to file. openr assumes that file already exists. If reading the file results in a Lisp error about its encoding, passing the correct string as the argument encoding might help. The available encodings and their names depend on the Lisp being used. For SBCL a list of suitable strings can be found at http://www.sbcl.org/manual/#External-Formats.

For binary input see openr_binary. See also close and openw.

(%i1) istream : openr("data.txt","EUC-JP");
(%o1)     #<FD-STREAM for "file /home/gunter/data.txt" {10099A3AE3}>
(%i2) close(istream);
(%o2)                                true
Categories: File input · Package stringproc ·
Function: openw (file)

Returns a character output stream to file. If file does not exist, it will be created. If an existing file is opened, openw destructively modifies file.

For binary output see openw_binary.

See also close and openr.

Categories: File output · Package stringproc ·
Function: printf
    printf (dest, string)
    printf (dest, string, expr_1, ..., expr_n)

Produces formatted output by outputting the characters of control-string string and observing that a tilde introduces a directive. The character after the tilde, possibly preceded by prefix parameters and modifiers, specifies what kind of formatting is desired. Most directives use one or more elements of the arguments expr_1, ..., expr_n to create their output.

If dest is a stream or true, then printf returns false. Otherwise, printf returns a string containing the output. By default the streams stdin, stdout and stderr are defined. If Maxima is running as a network client (which is the normal case if Maxima is communicating with a graphical user interface, which must be the server) setup-client will define old_stdout and old_stderr, too.

printf provides the Common Lisp function format in Maxima. The following example illustrates the general relation between these two functions.

(%i1) printf(true, "R~dD~d~%", 2, 2);
R2D2
(%o1)                                false
(%i2) :lisp (format t "R~dD~d~%" 2 2)
R2D2
NIL

The following description is limited to a rough sketch of the possibilities of printf. The Lisp function format is described in detail in many reference books. A helpful source is the freely available online manual "Common Lisp the Language" by Guy L. Steele. See chapter 22.3.3 there.

In addition, printf recognizes two format directives which are not known to Lisp format. The format directive ~m indicates Maxima pretty printer output. The format directive ~h indicates a bigfloat number.

   ~%       new line
   ~&       fresh line
   ~t       tab
   ~$       monetary
   ~d       decimal integer
   ~b       binary integer
   ~o       octal integer
   ~x       hexadecimal integer
   ~br      base-b integer
   ~r       spell an integer
   ~p       plural
   ~f       floating point
   ~e       scientific notation
   ~g       ~f or ~e, depending upon magnitude
   ~h       bigfloat
   ~a       uses Maxima function string
   ~m       Maxima pretty printer output
   ~s       like ~a, but output enclosed in "double quotes"
   ~~       ~
   ~<       justification, ~> terminates
   ~(       case conversion, ~) terminates 
   ~[       selection, ~] terminates 
   ~{       iteration, ~} terminates

Note that the directive ~* is not supported.

(%i1) printf( false, "~a ~a ~4f ~a ~@r", 
              "String",sym,bound,sqrt(12),144), bound = 1.234;
(%o1)                 String sym 1.23 2*sqrt(3) CXLIV
(%i2) printf( false,"~{~a ~}",["one",2,"THREE"] );
(%o2)                          one 2 THREE 
(%i3) printf(true,"~{~{~9,1f ~}~%~}",mat ),
          mat = args(matrix([1.1,2,3.33],[4,5,6],[7,8.88,9]))$
      1.1       2.0       3.3 
      4.0       5.0       6.0 
      7.0       8.9       9.0 
(%i4) control: "~:(~r~) bird~p ~[is~;are~] singing."$
(%i5) printf( false,control, n,n,if n=1 then 1 else 2 ), n=2;
(%o5)                    Two birds are singing.
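
As a further small sketch, the integer directives from the table above can be compared by formatting the same number several times. Passing false as dest makes printf return the result as a string; the variable names are chosen for this illustration only.

```maxima
/* One integer rendered as decimal, binary, octal and spelled out. */
s1: printf(false, "~d ~b ~o ~r", 10, 10, 10, 10);   /* "10 1010 12 ten" */

/* A prefix parameter sets the minimal field width of ~d. */
s2: printf(false, "~5d|", 42);                      /* "   42|" */
```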

The directive ~h has been introduced to handle bigfloats.

~w,d,e,x,o,p@H
 w : width
 d : decimal digits behind floating point
 e : minimal exponent digits
 x : preferred exponent
 o : overflow character
 p : padding character
 @ : display sign for positive numbers
(%i1) fpprec : 1000$
(%i2) printf(true, "|~h|~%", 2.b0^-64)$
|0.0000000000000000000542101086242752217003726400434970855712890625|
(%i3) fpprec : 26$
(%i4) printf(true, "|~h|~%", sqrt(2))$
|1.4142135623730950488016887|
(%i5) fpprec : 24$
(%i6) printf(true, "|~h|~%", sqrt(2))$
|1.41421356237309504880169|
(%i7) printf(true, "|~28h|~%", sqrt(2))$
|   1.41421356237309504880169|
(%i8) printf(true, "|~28,,,,,'*h|~%", sqrt(2))$
|***1.41421356237309504880169|
(%i9) printf(true, "|~,18h|~%", sqrt(2))$
|1.414213562373095049|
(%i10) printf(true, "|~,,,-3h|~%", sqrt(2))$
|1414.21356237309504880169b-3|
(%i11) printf(true, "|~,,2,-3h|~%", sqrt(2))$
|1414.21356237309504880169b-03|
(%i12) printf(true, "|~20h|~%", sqrt(2))$
|1.41421356237309504880169|
(%i13) printf(true, "|~20,,,,'+h|~%", sqrt(2))$
|++++++++++++++++++++|

For conversion of objects to strings also see concat, sconcat, string and simplode.

Categories: File output · Package stringproc ·
Function: readbyte (stream)

Removes and returns the first byte in stream which must be a binary input stream. If the end of file is encountered readbyte returns false.

Example: Read the first 16 bytes from a file encrypted with AES in OpenSSL.

(%i1) ibase: obase: 16.$

(%i2) in: openr_binary("msg.bin");
(%o2)                       #<input stream msg.bin>
(%i3) (L:[],  thru 16. do push(readbyte(in), L),  L:reverse(L));
(%o3) [53, 61, 6C, 74, 65, 64, 5F, 5F, 88, 56, 0DE, 8A, 74, 0FD,
       0AD, 0F0]
(%i4) close(in);
(%o4)                                true
(%i5) map(ascii, rest(L,-8));
(%o5)                      [S, a, l, t, e, d, _, _]
(%i6) salt: octets_to_number(rest(L,8));
(%o6)                          8856de8a74fdadf0
Categories: File input · Package stringproc ·
Function: readchar (stream)

Removes and returns the first character in stream. If the end of file is encountered readchar returns false.

Example: See make_string_input_stream.

Categories: File input · Package stringproc ·
Function: readline (stream)

Returns a string containing all characters starting at the current position in stream up to the end of the line or false if the end of the file is encountered.
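
A common pattern is to read line by line until readline returns false. The sketch below does this on a string input stream so that no file is needed. It assumes that readline accepts such a stream, and the variable names are placeholders.

```maxima
/* Build a two-line string; ~% inserts a newline. */
text: printf(false, "alpha~%beta~%")$
istream: make_string_input_stream(text)$

collected: []$
/* readline returns a string per line and false at the end of the stream. */
while stringp(current: readline(istream)) do
    collected: endcons(current, collected)$
close(istream)$

collected;   /* ["alpha", "beta"] */
```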

Categories: File input · Package stringproc ·
Function: sprint (expr_1, …, expr_n)

Evaluates and displays its arguments one after the other ‘on a line’ starting at the leftmost position. Each expression is printed with a space character to its right, and line length is disregarded. newline() might be used for line breaking.

Example: Sequential printing with sprint. Creating a new line with newline().

(%i1) for n:0 thru 19 do sprint(fib(n))$
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
(%i2) for n:0 thru 22 do ( 
         sprint(fib(n)), 
         if mod(n,10) = 9 then newline() )$
0 1 1 2 3 5 8 13 21 34 
55 89 144 233 377 610 987 1597 2584 4181 
6765 10946 17711 
Categories: Package stringproc ·
Function: writebyte (byte, stream)

Writes byte to stream which must be a binary output stream. writebyte returns byte.

Example: Write some bytes to a binary file output stream. In this example all bytes correspond to printable characters and are printed by printfile. The bytes remain in the stream until flush_output or close has been called.

(%i1) ibase: obase: 16.$

(%i2) bytes: map(cint, charlist("GNU/Linux"));
(%o2)                [47, 4E, 55, 2F, 4C, 69, 6E, 75, 78]
(%i3) out: openw_binary("test.bin");
(%o3)                      #<output stream test.bin>
(%i4) for i thru 3 do writebyte(bytes[i], out);
(%o4)                                done
(%i5) printfile("test.bin")$

(%i6) flength(out);
(%o6)                                  0
(%i7) flush_output(out);
(%o7)                                true
(%i8) flength(out);
(%o8)                                  3
(%i9) printfile("test.bin")$
GNU
(%i0A) for b in rest(bytes,3) do writebyte(b, out);
(%o0A)                               done
(%i0B) close(out);
(%o0B)                               true
(%i0C) printfile("test.bin")$
GNU/Linux
Categories: File output · Package stringproc ·

93.3 Characters

Characters are strings of length 1.

Function: adjust_external_format ()

Prints information about the current external format of the Lisp reader. If the external format differs from the encoding of the application which runs Maxima, adjust_external_format tries to adjust the encoding or prints some help or instructions. adjust_external_format returns true when the external format has been changed and false otherwise.

Functions like cint, unicode, octets_to_string and string_to_octets need UTF-8 as the external format of the Lisp reader to work properly over the full range of Unicode characters.

Examples (Maxima on Windows, March 2016): Using adjust_external_format when the default external format is not equal to the encoding provided by the application.

1. Command line Maxima

In case a terminal session is preferred it is recommended to use Maxima compiled with SBCL. Here Unicode support is provided by default and calls to adjust_external_format are unnecessary.

If Maxima is compiled with CLISP or GCL it is recommended to change the terminal encoding from CP850 to CP1252. adjust_external_format prints some help.

CCL reads UTF-8 while the terminal input is CP850 by default. CP1252 is not supported by CCL. adjust_external_format prints instructions for changing the terminal encoding and external format both to iso-8859-1.

2. wxMaxima

In wxMaxima SBCL reads CP1252 by default but the input from the application is UTF-8 encoded. Adjustment is needed.

Calling adjust_external_format and restarting Maxima permanently changes the default external format to UTF-8.

(%i1) adjust_external_format();
The line
(setf sb-impl::*default-external-format* :utf-8)
has been appended to the init file
C:/Users/Username/.sbclrc
Please restart Maxima to set the external format to UTF-8.
(%o1) false

Restarting Maxima.

(%i1) adjust_external_format();
The external format is currently UTF-8
and has not been changed.
(%o1) false
Categories: Package stringproc ·
Function: alphacharp (char)

Returns true if char is an alphabetic character.

To identify a non-US-ASCII character as an alphabetic character the underlying Lisp must provide full Unicode support. E.g. a German umlaut is detected as an alphabetic character with SBCL in GNU/Linux but not with GCL. (On Windows, Maxima compiled with SBCL must have its external format set to UTF-8. See adjust_external_format for more.)

Example: Examination of non-US-ASCII characters.

The underlying Lisp (SBCL, GNU/Linux) is able to convert the typed character into a Lisp character and to examine it.

(%i1) alphacharp("ü");
(%o1)                          true

In GCL this is not possible. An error break occurs.

(%i1) alphacharp("u");
(%o1)                          true
(%i2) alphacharp("ü");

package stringproc: ü cannot be converted into a Lisp character.
 -- an error.
Function: alphanumericp (char)

Returns true if char is an alphabetic character or a digit (only corresponding US-ASCII characters are regarded as digits).

Note: See remarks on alphacharp.

Function: ascii (int)

Returns the US-ASCII character corresponding to the integer int which has to be less than 128.

See unicode for converting code points larger than 127.

Examples:

(%i1) for n from 0 thru 127 do ( 
        ch: ascii(n), 
        if alphacharp(ch) then sprint(ch),
        if n = 96 then newline() )$
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
a b c d e f g h i j k l m n o p q r s t u v w x y z
Categories: Package stringproc ·
Function: cequal (char_1, char_2)

Returns true if char_1 and char_2 are the same character.

Function: cequalignore (char_1, char_2)

Like cequal but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.

Function: cgreaterp (char_1, char_2)

Returns true if the code point of char_1 is greater than the code point of char_2.

Function: cgreaterpignore (char_1, char_2)

Like cgreaterp but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.

Function: charp (obj)

Returns true if obj is a Maxima character. See the introduction for an example.

Function: cint (char)

Returns the Unicode code point of char which must be a Maxima character, i.e. a string of length 1.

Examples: The hexadecimal code point of some characters (Maxima with SBCL on GNU/Linux).

(%i1) obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2)                           [24, 0A3, 20AC]

Warning: It is not possible to enter characters corresponding to code points larger than 16 bit in wxMaxima with SBCL on Windows when the external format has not been set to UTF-8. See adjust_external_format.

CMUCL doesn’t process these characters as one character. cint then returns false. Converting a character to a code point via UTF-8-octets may serve as a workaround:

utf8_to_unicode(string_to_octets(character));

See utf8_to_unicode, string_to_octets.

Categories: Package stringproc ·
Function: clessp (char_1, char_2)

Returns true if the code point of char_1 is less than the code point of char_2.

Function: clesspignore (char_1, char_2)

Like clessp but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
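
The character comparison functions can be tried on US-ASCII letters, where the code points of "A" and "a" are 65 and 97. The following is a brief sketch; the variable name is a placeholder.

```maxima
results: [
    cequal("a", "a"),          /* true */
    cequal("a", "A"),          /* false */
    cequalignore("a", "A"),    /* true */
    clessp("A", "a"),          /* true, since 65 < 97 */
    cgreaterp("A", "a"),       /* false */
    clesspignore("A", "a")     /* false, equal when case is ignored */
];
```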

Function: constituent (char)

Returns true if char is a graphic character but not a space character. A graphic character is a character one can see, plus the space character. (constituent is defined by Paul Graham. See Paul Graham, ANSI Common Lisp, 1996, page 67.)

(%i1) for n from 0 thru 127 do ( 
tmp: ascii(n), if constituent(tmp) then sprint(tmp) )$
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B
C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~
Function: digitcharp (char)

Returns true if char is a digit where only the corresponding US-ASCII-character is regarded as a digit.

Function: lowercasep (char)

Returns true if char is a lowercase character.

Note: See remarks on alphacharp.

Variable: newline

The newline character (ASCII-character 10).

Variable: space

The space character.

Variable: tab

The tab character.

Function: unicode (arg)

Returns the character defined by arg which might be a Unicode code point or a name string if the underlying Lisp provides full Unicode support.

Example: Characters defined by hexadecimal code points (Maxima with SBCL on GNU/Linux).

(%i1) ibase: 16.$
(%i2) map(unicode, [24, 0A3, 20AC]);
(%o2)                            [$, £, €]

Warning: In wxMaxima with SBCL on Windows it is not possible to convert code points larger than 16 bit to characters when the external format has not been set to UTF-8. See adjust_external_format for more information.

CMUCL doesn’t process code points larger than 16 bit. In these cases unicode returns false. Converting a code point to a character via UTF-8 octets may serve as a workaround:

octets_to_string(unicode_to_utf8(code_point));

See octets_to_string, unicode_to_utf8.

In case the underlying Lisp provides full Unicode support the character might be specified by its name. The following is possible in ECL, CLISP and SBCL, where in SBCL on Windows the external format has to be set to UTF-8. unicode(name) is supported by CMUCL too but again limited to 16 bit characters.

The string argument to unicode is basically the same string returned by printf using the "~@c" specifier. But as shown below the prefix "#\" must be omitted. Underscores may be replaced by spaces and uppercase letters by lowercase ones.

Example (continued): Characters defined by names (Maxima with SBCL on GNU/Linux).

(%i3) printf(false, "~@c", unicode(0DF));
(%o3)                    #\LATIN_SMALL_LETTER_SHARP_S
(%i4) unicode("LATIN_SMALL_LETTER_SHARP_S");
(%o4)                                  ß
(%i5) unicode("Latin small letter sharp s");
(%o5)                                  ß
Categories: Package stringproc ·
Function: unicode_to_utf8 (code_point)

Returns a list containing the UTF-8 code corresponding to the Unicode code_point.

Examples: Converting Unicode code points to UTF-8 and vice versa.

(%i1) ibase: obase: 16.$
(%i2) map(cint, ["$","£","€"]);
(%o2)                           [24, 0A3, 20AC]
(%i3) map(unicode_to_utf8, %);
(%o3)                 [[24], [0C2, 0A3], [0E2, 82, 0AC]]
(%i4) map(utf8_to_unicode, %);
(%o4)                           [24, 0A3, 20AC]
Categories: Package stringproc ·
Function: uppercasep (char)

Returns true if char is an uppercase character.

Note: See remarks on alphacharp.
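
The classification predicates of this section can be compared side by side by mapping them over the characters of a short US-ASCII string. This is a minimal sketch; the sample string and variable names are arbitrary.

```maxima
chars: charlist("aB3 ")$              /* a letter, a capital, a digit, a space */

alpha: map(alphacharp, chars);        /* [true, true, false, false] */
digit: map(digitcharp, chars);        /* [false, false, true, false] */
lower: map(lowercasep, chars);        /* [true, false, false, false] */
upper: map(uppercasep, chars);        /* [false, true, false, false] */
```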

Variable: us_ascii_only

This option variable affects Maxima when the character encoding provided by the application which runs Maxima is UTF-8 but the external format of the Lisp reader is not equal to UTF-8.

On GNU/Linux this is true when Maxima is built with GCL and on Windows in wxMaxima with GCL- and SBCL-builds. With SBCL it is recommended to change the external format to UTF-8. Setting us_ascii_only is unnecessary then. See adjust_external_format for details.

us_ascii_only is false by default. Maxima itself then (i.e. in the above described situation) parses the UTF-8 encoding.

When us_ascii_only is set to true it is assumed that all strings used as arguments to string processing functions do not contain non-US-ASCII characters. Given that promise, Maxima avoids parsing UTF-8 and strings can be processed more efficiently.

Function: utf8_to_unicode (list)

Returns a Unicode code point corresponding to the list which must contain the UTF-8 encoding of a single character.

Examples: See unicode_to_utf8.

Categories: Package stringproc ·

93.4 String Processing

Position indices in strings are 1-indexed, as in Maxima lists. See the example in charat.

Function: charat (string, n)

Returns the n-th character of string. The first character in string is returned with n = 1.

(%i1) charat("Lisp",1);
(%o1)                           L
(%i2) charlist("Lisp")[1];
(%o2)                           L
Categories: Package stringproc ·
Function: charlist (string)

Returns the list of all characters in string.

(%i1) charlist("Lisp");
(%o1)                     [L, i, s, p]
Categories: Package stringproc ·
Function: eval_string (str)

Parse the string str as a Maxima expression and evaluate it. The string str may or may not have a terminator (dollar sign $ or semicolon ;). Only the first expression is parsed and evaluated, if there is more than one.

Complain if str is not a string.

Examples:

(%i1) eval_string ("foo: 42; bar: foo^2 + baz");
(%o1)                       42
(%i2) eval_string ("(foo: 42, bar: foo^2 + baz)");
(%o2)                   baz + 1764

See also parse_string and eval_string_lisp.

Categories: Package stringproc ·
Function: parse_string (str)

Parse the string str as a Maxima expression (do not evaluate it). The string str may or may not have a terminator (dollar sign $ or semicolon ;). Only the first expression is parsed, if there is more than one.

Complain if str is not a string.

Examples:

(%i1) parse_string ("foo: 42; bar: foo^2 + baz");
(%o1)                    foo : 42
(%i2) parse_string ("(foo: 42, bar: foo^2 + baz)");
                                   2
(%o2)          (foo : 42, bar : foo  + baz)

See also eval_string.

Categories: Package stringproc ·
Function: scopy (string)

Returns a copy of string as a new string.

Categories: Package stringproc ·
Function: sdowncase
    sdowncase (string)
    sdowncase (string, start)
    sdowncase (string, start, end)

Like supcase but uppercase characters are converted to lowercase.
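
A short sketch of sdowncase with and without the optional start argument follows. Positions are 1-indexed, so a start of 2 leaves the first character untouched. The variable names are placeholders.

```maxima
lowered: sdowncase("Volker van Nek");   /* "volker van nek" */
capitalized: sdowncase("MAXIMA", 2);    /* "Maxima" */
```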

Categories: Package stringproc ·
Function: sequal (string_1, string_2)

Returns true if string_1 and string_2 contain the same sequence of characters.

Function: sequalignore (string_1, string_2)

Like sequal but ignores case. For non-US-ASCII characters this is only possible when the underlying Lisp is able to recognize a character as an alphabetic character. See remarks on alphacharp.
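
The difference between the two string tests can be sketched with a pair of strings that differ only in case. The variable name is a placeholder.

```maxima
comparisons: [
    sequal("Maxima", "Maxima"),         /* true */
    sequal("Maxima", "maxima"),         /* false */
    sequalignore("Maxima", "maxima")    /* true */
];
```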

Function: sexplode (string)

sexplode is an alias for function charlist.

Categories: Package stringproc ·
Function: simplode
    simplode (list)
    simplode (list, delim)

simplode takes a list of expressions and concatenates them into a string. If no delimiter delim is specified, simplode uses no delimiter. delim can be any string.

See also concat, sconcat, string and printf.

Examples:

(%i1) simplode(["xx[",3,"]:",expand((x+y)^3)]);
(%o1)             xx[3]:y^3+3*x*y^2+3*x^2*y+x^3
(%i2) simplode( sexplode("stars")," * " );
(%o2)                   s * t * a * r * s
(%i3) simplode( ["One","more","coffee."]," " );
(%o3)                   One more coffee.
Categories: Package stringproc ·
Function: sinsert (seq, string, pos)

Returns a string that is a concatenation of substring(string, 1, pos-1), the string seq and substring (string, pos). Note that the first character in string is in position 1.

Examples:

(%i1) s: "A submarine."$
(%i2) concat( substring(s,1,3),"yellow ",substring(s,3) );
(%o2)                  A yellow submarine.
(%i3) sinsert("hollow ",s,3);
(%o3)                  A hollow submarine.
Categories: Package stringproc ·
Function: sinvertcase
    sinvertcase (string)
    sinvertcase (string, start)
    sinvertcase (string, start, end)

Returns string except that the case of each character from position start to end is inverted. If end is not given, all characters from start to the end of string are inverted.

Examples:

(%i1) sinvertcase("sInvertCase");
(%o1)                      SiNVERTcASE
Categories: Package stringproc ·
Function: slength (string)

Returns the number of characters in string.

Categories: Package stringproc ·
Function: smake (num, char)

Returns a new string consisting of num copies of the character char.

Example:

(%i1) smake(3,"w");
(%o1)                          www
Categories: Package stringproc ·
Function: smismatch
    smismatch (string_1, string_2)
    smismatch (string_1, string_2, test)

Returns the position of the first character of string_1 at which string_1 and string_2 differ or false. Default test function for matching is sequal. If smismatch should ignore case, use sequalignore as test.

Example:

(%i1) smismatch("seven","seventh");
(%o1)                           6
Categories: Package stringproc ·
Function: split
    split (string)
    split (string, delim)
    split (string, delim, multiple)

Returns the list of all tokens in string. Each token is an unparsed string. split uses delim as delimiter. If delim is not given, the space character is the default delimiter. multiple is a boolean argument which is true by default; in that case consecutive delimiters are read as one. This is useful if tabs are saved as multiple space characters. If multiple is set to false, each delimiter is noted.

Examples:

(%i1) split("1.2   2.3   3.4   4.5");
(%o1)                 [1.2, 2.3, 3.4, 4.5]
(%i2) split("first;;third;fourth",";",false);
(%o2)               [first, , third, fourth]
Categories: Package stringproc ·
Function: sposition (char, string)

Returns the position of the first character in string which matches char. The first character in string is in position 1. For matching characters ignoring case see ssearch.
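
A brief sketch: sposition matches case-sensitively, while ssearch with sequalignore as test finds the same position regardless of case. The variable names are placeholders.

```maxima
pos_exact: sposition("s", "Lisp");                  /* 3 */
pos_nocase: ssearch("S", "Lisp", 'sequalignore);    /* 3 */
```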

Categories: Package stringproc ·
Function: sremove
    sremove (seq, string)
    sremove (seq, string, test)
    sremove (seq, string, test, start)
    sremove (seq, string, test, start, end)

Returns a string like string but without all substrings matching seq. Default test function for matching is sequal. If sremove should ignore case while searching for seq, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.

Examples:

(%i1) sremove("n't","I don't like coffee.");
(%o1)                   I do like coffee.
(%i2) sremove ("DO ",%,'sequalignore);
(%o2)                    I like coffee.
Categories: Package stringproc ·
Function: sremovefirst
    sremovefirst (seq, string)
    sremovefirst (seq, string, test)
    sremovefirst (seq, string, test, start)
    sremovefirst (seq, string, test, start, end)

Like sremove except that only the first substring that matches seq is removed.
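
The following sketch contrasts sremove and sremovefirst on a string that contains the substring twice. The variable names are placeholders.

```maxima
all_removed: sremove("an", "banana");          /* "ba" */
first_removed: sremovefirst("an", "banana");   /* "bana" */
```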

Categories: Package stringproc ·
Function: sreverse (string)

Returns a string with all the characters of string in reverse order.

See also reverse.

Categories: Package stringproc ·
Function: ssearch
    ssearch (seq, string)
    ssearch (seq, string, test)
    ssearch (seq, string, test, start)
    ssearch (seq, string, test, start, end)

Returns the position of the first substring of string that matches the string seq. Default test function for matching is sequal. If ssearch should ignore case, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.

Example:

(%i1) ssearch("~s","~{~S ~}~%",'sequalignore);
(%o1)                                  4
Categories: Package stringproc ·
Function: ssort
    ssort (string)
    ssort (string, test)

Returns a string that contains all characters from string in an order such that there are no two successive characters c and d for which test (c, d) is false and test (d, c) is true. The default test function for sorting is clessp. The set of test functions is {clessp, clesspignore, cgreaterp, cgreaterpignore, cequal, cequalignore}.

Examples:

(%i1) ssort("I don't like Mondays.");
(%o1)                    '.IMaddeiklnnoosty
(%i2) ssort("I don't like Mondays.",'cgreaterpignore);
(%o2)                 ytsoonnMlkIiedda.'   
Categories: Package stringproc ·
Function: ssubst
    ssubst (new, old, string)
    ssubst (new, old, string, test)
    ssubst (new, old, string, test, start)
    ssubst (new, old, string, test, start, end)

Returns a string like string except that all substrings matching old are replaced by new. old and new need not be of the same length. The default test function for matching is sequal. If ssubst should ignore case while searching for old, use sequalignore as test. Use start and end to limit searching. Note that the first character in string is in position 1.

Examples:

(%i1) ssubst("like","hate","I hate Thai food. I hate green tea.");
(%o1)          I like Thai food. I like green tea.
(%i2) ssubst("Indian","thai",%,'sequalignore,8,12);
(%o2)         I like Indian food. I like green tea.
Categories: Package stringproc ·
Function: ssubstfirst
    ssubstfirst (new, old, string)
    ssubstfirst (new, old, string, test)
    ssubstfirst (new, old, string, test, start)
    ssubstfirst (new, old, string, test, start, end)

Like ssubst except that only the first substring that matches old is replaced.
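
Example (an illustrative sketch reusing the input string of the ssubst example). Only the first match of old is replaced:

```maxima
(%i1) ssubstfirst("like", "hate", "I hate Thai food. I hate green tea.");
(%o1)          I like Thai food. I hate green tea.
```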

Categories: Package stringproc ·
Function: strim (seq,string)

Returns a string like string, but with all characters that appear in seq removed from both ends.

Examples:

(%i1) "/* comment */"$
(%i2) strim(" /*",%);
(%o2)                        comment
(%i3) slength(%);
(%o3)                           7
Categories: Package stringproc ·
Function: striml (seq, string)

Like strim except that only the left end of string is trimmed.

Categories: Package stringproc ·
Function: strimr (seq, string)

Like strim except that only the right end of string is trimmed.
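
Example (an illustrative sketch reusing the input of the strim example, showing striml and strimr side by side). Each function trims the characters of seq from one end only:

```maxima
(%i1) striml(" /*", "/* comment */");
(%o1)                      comment */
(%i2) strimr(" /*", "/* comment */");
(%o2)                      /* comment
```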

Categories: Package stringproc ·
Function: stringp (obj)

Returns true if obj is a string. See the introduction for an example.

Function: substring
    substring (string, start)
    substring (string, start, end)

Returns the substring of string beginning at position start and ending at position end. The character at position end is not included. If end is not given, the substring contains the rest of the string. Note that the first character in string is in position 1.

Examples:

(%i1) substring("substring",4);
(%o1)                        string
(%i2) substring(%,4,6);
(%o2)                          in
Categories: Package stringproc ·
Function: supcase
    supcase (string)
    supcase (string, start)
    supcase (string, start, end)

Returns string except that lowercase characters from position start to end are replaced by the corresponding uppercase ones. The character at position end is not included. If end is not given, all lowercase characters from start to the end of string are replaced.

Example:

(%i1) supcase("english",1,2);
(%o1)                        English
Categories: Package stringproc ·
Function: tokens
    tokens (string)
    tokens (string, test)

Returns a list of tokens extracted from string. The tokens are substrings whose characters satisfy a certain test function. If test is not given, constituent is used as the default test. The set of test functions is {constituent, alphacharp, digitcharp, lowercasep, uppercasep, charp, characterp, alphanumericp}. (The Lisp version of tokens was written by Paul Graham; see ANSI Common Lisp, 1996, page 67.)

Examples:

(%i1) tokens("24 October 2005");
(%o1)                  [24, October, 2005]
(%i2) tokens("05-10-24",'digitcharp);
(%o2)                     [05, 10, 24]
(%i3) map(parse_string,%);
(%o3)                      [5, 10, 24]
Categories: Package stringproc ·

93.5 Octets and Utilities for Cryptography

Function: base64 (arg)

Returns the base64-representation of arg as a string. The argument arg may be a string, a non-negative integer or a list of octets.

Examples:

(%i1) base64: base64("foo bar baz");
(%o1)                          Zm9vIGJhciBiYXo=
(%i2) string: base64_decode(base64);
(%o2)                            foo bar baz
(%i3) obase: 16.$
(%i4) integer: base64_decode(base64, 'number);
(%o4)                       666f6f206261722062617a
(%i5) octets: base64_decode(base64, 'list);
(%o5)            [66, 6F, 6F, 20, 62, 61, 72, 20, 62, 61, 7A]
(%i6) ibase: 16.$
(%i7) base64(octets);
(%o7)                          Zm9vIGJhciBiYXo=

Note that if arg contains umlauts or other non-ASCII characters (that is, octets larger than 127), the resulting base64 string is platform dependent. However, the decoded string will be equal to the original.

Categories: Package stringproc ·
Function: base64_decode
    base64_decode (base64-string)
    base64_decode (base64-string, return-type)

By default base64_decode decodes the base64-string back to the original string.

The optional argument return-type allows base64_decode to alternatively return the corresponding number or list of octets. return-type may be string, number or list.

Example: See base64.

Categories: Package stringproc ·
Function: crc24sum
    crc24sum (octets)
    crc24sum (octets, return-type)

By default crc24sum returns the CRC24 checksum of an octet-list as a string.

The optional argument return-type allows crc24sum to alternatively return the corresponding number or list of octets. return-type may be string, number or list.

Example:

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBAgAGBQJVdCTzAAoJEG/1Mgf2DWAqCSYH/AhVFwhu1D89C3/QFcgVvZTM
wnOYzBUURJAL/cT+IngkLEpp3hEbREcugWp+Tm6aw3R4CdJ7G3FLxExBH/5KnDHi
rBQu+I7+3ySK2hpryQ6Wx5J9uZSa4YmfsNteR8up0zGkaulJeWkS4pjiRM+auWVe
vajlKZCIK52P080DG7Q2dpshh4fgTeNwqCuCiBhQ73t8g1IaLdhDN6EzJVjGIzam
/spqT/sTo6sw8yDOJjvU+Qvn6/mSMjC/YxjhRMaQt9EMrR1AZ4ukBF5uG1S7mXOH
WdiwkSPZ3gnIBhM9SuC076gLWZUNs6NqTeE3UzMjDAFhH3jYk1T7mysCvdtIkms=
=WmeC
-----END PGP SIGNATURE-----
(%i1) ibase : obase : 16.$
(%i2) sig64 : sconcat(
 "iQEcBAEBAgAGBQJVdCTzAAoJEG/1Mgf2DWAqCSYH/AhVFwhu1D89C3/QFcgVvZTM",
 "wnOYzBUURJAL/cT+IngkLEpp3hEbREcugWp+Tm6aw3R4CdJ7G3FLxExBH/5KnDHi",
 "rBQu+I7+3ySK2hpryQ6Wx5J9uZSa4YmfsNteR8up0zGkaulJeWkS4pjiRM+auWVe",
 "vajlKZCIK52P080DG7Q2dpshh4fgTeNwqCuCiBhQ73t8g1IaLdhDN6EzJVjGIzam",
 "/spqT/sTo6sw8yDOJjvU+Qvn6/mSMjC/YxjhRMaQt9EMrR1AZ4ukBF5uG1S7mXOH",
 "WdiwkSPZ3gnIBhM9SuC076gLWZUNs6NqTeE3UzMjDAFhH3jYk1T7mysCvdtIkms=" )$
(%i3) octets: base64_decode(sig64, 'list)$
(%i4) crc24: crc24sum(octets, 'list);
(%o4)                          [5A, 67, 82]
(%i5) base64(crc24);
(%o5)                              WmeC
Categories: Package stringproc ·
Function: md5sum
    md5sum (arg)
    md5sum (arg, return-type)

Returns the MD5 checksum of a string, non-negative integer, list of octets, or binary (not character) input stream. A file for which an input stream is opened may be an ordinary text file; it is the stream which needs to be binary, not the file itself.

When the argument is an input stream, md5sum reads the entire content of the stream, but does not close the stream.

The default return value is a string containing 32 hex characters. The optional argument return-type allows md5sum to alternatively return the corresponding number or list of octets. return-type may be string, number or list.

Note that if arg contains German umlauts or other non-ASCII characters (that is, octets larger than 127), the MD5 checksum is platform dependent.

Examples:

(%i1) ibase: obase: 16.$
(%i2) msg: "foo bar baz"$
(%i3) string: md5sum(msg);
(%o3)                  ab07acbb1e496801937adfa772424bf7
(%i4) integer: md5sum(msg, 'number);
(%o4)                 0ab07acbb1e496801937adfa772424bf7
(%i5) octets: md5sum(msg, 'list);
(%o5)        [0AB,7,0AC,0BB,1E,49,68,1,93,7A,0DF,0A7,72,42,4B,0F7]
(%i6) sdowncase( printf(false, "~{~2,'0x~^:~}", octets) );
(%o6)           ab:07:ac:bb:1e:49:68:01:93:7a:df:a7:72:42:4b:f7

The argument may be a binary input stream.

(%i1) S: openr_binary (file_search ("md5.lisp"));
(%o1) #<INPUT BUFFERED FILE-STREAM (UNSIGNED-BYTE 8)
  /home/robert/maxima/maxima-code/share/stringproc/md5.lisp>
(%i2) md5sum (S);
(%o2)           31a512ed53daf5b99495c9d05559355f
(%i3) close (S);
(%o3)                         true
Categories: Package stringproc ·
Function: mgf1_sha1
    mgf1_sha1 (seed, len)
    mgf1_sha1 (seed, len, return-type)

Returns a pseudo-random number of variable length. By default the returned value is a number with a length of len octets.

The optional argument return-type allows mgf1_sha1 to alternatively return the corresponding list of len octets. return-type may be number or list.

The computation of the returned value is described in RFC 3447, appendix B.2.1 MGF1. SHA1 is used as hash function, i.e. the randomness of the computed number relies on the randomness of SHA1 hashes.

Example:

(%i1) ibase: obase: 16.$
(%i2) number: mgf1_sha1(4711., 8);
(%o2)                        0e0252e5a2a42fea1
(%i3) octets: mgf1_sha1(4711., 8, 'list);
(%o3)                  [0E0,25,2E,5A,2A,42,0FE,0A1]
Categories: Package stringproc ·
Function: number_to_octets (number)

Returns an octet-representation of number as a list of octets. The number must be a non-negative integer.

Example:

(%i1) ibase : obase : 16.$
(%i2) octets: [0ca,0fe,0ba,0be]$
(%i3) number: octets_to_number(octets);
(%o3)                            0cafebabe
(%i4) number_to_octets(number);
(%o4)                      [0CA, 0FE, 0BA, 0BE]
Categories: Package stringproc ·
Function: octets_to_number (octets)

Returns a number by concatenating the octets in the list of octets.

Example: See number_to_octets.
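
As a further small illustration (a sketch assuming the default ibase and obase of 10; the values are chosen for illustration only), the first octet in the list becomes the most significant one, so the list [1, 0] stands for 1*256 + 0:

```maxima
(%i1) octets_to_number([1, 0]);
(%o1)                          256
(%i2) number_to_octets(256);
(%o2)                        [1, 0]
```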

Categories: Package stringproc ·
Function: octets_to_oid (octets)

Computes an object identifier (OID) from the list of octets.

Example: RSA encryption OID

(%i1) ibase : obase : 16.$
(%i2) oid: octets_to_oid([2A,86,48,86,0F7,0D,1,1,1]);
(%o2)                      1.2.840.113549.1.1.1
(%i3) oid_to_octets(oid);
(%o3)               [2A, 86, 48, 86, 0F7, 0D, 1, 1, 1]
Categories: Package stringproc ·
Function: octets_to_string
    octets_to_string (octets)
    octets_to_string (octets, encoding)

Decodes the list of octets into a string according to current system defaults. When decoding octets corresponding to Non-US-ASCII characters the result depends on the platform, application and underlying Lisp.

Example: Using system defaults (Maxima compiled with GCL, which uses no format definition and simply passes through the UTF-8-octets encoded by the GNU/Linux terminal).

(%i1) octets: string_to_octets("abc");
(%o1)                            [97, 98, 99]
(%i2) octets_to_string(octets);
(%o2)                                 abc
(%i3) ibase: obase: 16.$
(%i4) unicode(20AC);
(%o4)                                  €
(%i5) octets: string_to_octets(%);
(%o5)                           [0E2, 82, 0AC]
(%i6) octets_to_string(octets);
(%o6)                                  €
(%i7) utf8_to_unicode(octets);
(%o7)                                20AC

If the external format of the Lisp reader is UTF-8, the optional argument encoding sets the encoding for the octet-to-string conversion. If necessary, see adjust_external_format for changing the external format.

Some names of supported encodings (see corresponding Lisp manual for more):
CCL, CLISP, SBCL: utf-8, ucs-2be, ucs-4be, iso-8859-1, cp1252, cp850
CMUCL: utf-8, utf-16-be, utf-32-be, iso8859-1, cp1252
ECL: utf-8, ucs-2be, ucs-4be, iso-8859-1, windows-cp1252, dos-cp850

Example (continued): Using the optional encoding argument (Maxima compiled with SBCL, GNU/Linux terminal).

(%i8) string_to_octets("€", "ucs-2be");
(%o8)                              [20, 0AC]
Categories: Package stringproc ·
Function: oid_to_octets (oid-string)

Converts an object identifier (OID) to a list of octets.

Example: See octets_to_oid.

Categories: Package stringproc ·
Function: sha1sum
    sha1sum (arg)
    sha1sum (arg, return-type)

Returns the SHA1 fingerprint of a string, a non-negative integer or a list of octets. The default return value is a string containing 40 hex characters.

The optional argument return-type allows sha1sum to alternatively return the corresponding number or list of octets. return-type may be string, number or list.

Example:

(%i1) ibase: obase: 16.$
(%i2) msg: "foo bar baz"$
(%i3) string: sha1sum(msg);
(%o3)              c7567e8b39e2428e38bf9c9226ac68de4c67dc39
(%i4) integer: sha1sum(msg, 'number);
(%o4)             0c7567e8b39e2428e38bf9c9226ac68de4c67dc39
(%i5) octets: sha1sum(msg, 'list);
(%o5)  [0C7,56,7E,8B,39,0E2,42,8E,38,0BF,9C,92,26,0AC,68,0DE,4C,67,0DC,39]
(%i6) sdowncase( printf(false, "~{~2,'0x~^:~}", octets) );
(%o6)     c7:56:7e:8b:39:e2:42:8e:38:bf:9c:92:26:ac:68:de:4c:67:dc:39

Note that if arg contains German umlauts or other non-ASCII characters (that is, octets larger than 127), the SHA1 fingerprint is platform dependent.

Categories: Package stringproc ·
Function: sha256sum
    sha256sum (arg)
    sha256sum (arg, return-type)

Returns the SHA256 fingerprint of a string, a non-negative integer or a list of octets. The default return value is a string containing 64 hex characters.

The optional argument return-type allows sha256sum to alternatively return the corresponding number or list of octets (see sha1sum).

Example:

(%i1) string: sha256sum("foo bar baz");
(%o1)  dbd318c1c462aee872f41109a4dfd3048871a03dedd0fe0e757ced57dad6f2d7
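
The sizes of the two return forms can be checked as follows (an illustrative sketch assuming the default obase of 10; the variable name digest is hypothetical). The string form has 64 hex characters and the list form has 32 octets:

```maxima
(%i1) digest: sha256sum("foo bar baz")$
(%i2) slength(digest);
(%o2)                          64
(%i3) length(sha256sum("foo bar baz", 'list));
(%o3)                          32
```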

Note that if arg contains German umlauts or other non-ASCII characters (that is, octets larger than 127), the SHA256 fingerprint is platform dependent.

Categories: Package stringproc ·
Function: string_to_octets
    string_to_octets (string)
    string_to_octets (string, encoding)

Encodes a string into a list of octets according to current system defaults. When encoding strings containing Non-US-ASCII characters the result depends on the platform, application and underlying Lisp.

If the external format of the Lisp reader is UTF-8, the optional argument encoding sets the encoding for the string-to-octet conversion. If necessary, see adjust_external_format for changing the external format.

See octets_to_string for examples and some more information.

Categories: Package stringproc ·

93.6 Regular Expressions


93.6.1 Introduction to Regular Expressions

sregex is an interface to pregexp, the portable regex engine by Dorai Sitaram. The syntax of the regular expressions is described in detail in the pregexp manual.

While sregex supports Unicode, the support for Unicode characters in strings is dependent on the support for Unicode characters in the Lisp used to run Maxima.


93.6.2 Functions and Variables

Function: regex_compile (pattern)

Compiles the regex string pattern into an internal form that is easier for the regex engine to process. Compiling is not required: all the regex functions accept either a compiled regex or a string. If the pattern is used many times, compiling it will speed up matching.

(%i1) regex_compile("c.r");
(%o1)         Structure [COMPILED-REGEX for "c.r"]
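
A compiled regex can then be passed wherever a pattern string is expected. The following is an illustrative sketch that reuses the pattern and input of the regex_split example; the variable name re is hypothetical:

```maxima
(%i1) re : regex_compile("[,;]+")$
(%i2) regex_split(re, "split,pea;;;soup");
(%o2)                  [split, pea, soup]
```
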
Function: regex_match_pos (regex, str)
Function: regex_match_pos (regex, str, start)
Function: regex_match_pos (regex, str, start, end)

Returns a list of position pairs for the first match of regex in str. Each pair is a list of a start and an end position, where the end position is the one just past the last matched character. If no match is found, false is returned.

If a third argument, start, is supplied, the search begins at that index of str. The fourth argument, end, is the index of str at which the search stops.

(%i1) str : "his hay needle stack -- my hay needle stack -- her hay needle stack"$
(%i2) regex : regex_compile("ne{2}dle")$
(%i3) regex_match_pos(regex, str);
(%o3)                       [[9, 15]]
(%i4) regex_match_pos("ne{2}dle", str);
(%o4)                       [[9, 15]]
(%i5) regex_match_pos("ne{2}dle", str, 25, 44);
(%o5)                      [[32, 38]]

Here is an example where regex_match_pos returns a list of more than one element:

(%i1) str : "jan 1, 1970";
(%o1)                      jan 1, 1970
(%i2) match: regex_match_pos("([a-z]+) ([0-9]+), ([0-9]+)", "jan 1, 1970");
(%o2)          [[1, 12], [1, 4], [5, 6], [8, 12]]
(%i3) map(lambda([posn], substring(str, posn[1], posn[2])), match);
(%o3)              [jan 1, 1970, jan, 1, 1970]

The first element describes the full match. Each subsequent element describes the substring that matches the corresponding cluster, i.e. a subpattern enclosed in parentheses in the given regular expression.

Categories: Package stringproc ·
Function: regex_match (regex, str)
Function: regex_match (regex, str, start)
Function: regex_match (regex, str, start, end)

regex_match is very similar to regex_match_pos except that it returns the matching substrings instead of the indices of the match. If no match is found, returns false.

(%i1) regex_match("ne{2}dle", "hay needle stack");
(%o1)                       [needle]
(%i2) regex_match("ne{2}dle", "hay needle stack", 10);
(%o2)                         false

Here are examples using POSIX character classes. [:alpha:] matches any letter. The pattern matches any letter or underscore:

(%i1) regex_match("[[:alpha:]_]", "--x--");
(%o1)                          [x]
(%i2) regex_match("[[:alpha:]_]", "--_--");
(%o2)                          [_]
(%i3) regex_match("[[:alpha:]_]", "--:--");
(%o3)                         false

sregex supports clusters (see pregexp clusters) which are subpatterns denoted by being enclosed within parentheses. These cause the matcher to return the submatch along with the overall match.

Here we are looking for one or more letters followed by a space, one or more digits, a comma and a space, then one or more digits.

(%i1) regex_match("([a-z]+) ([0-9]+), ([0-9]+)", "jan 1, 1970");
(%o1)              [jan 1, 1970, jan, 1, 1970]

The result is a list of strings. The first element is the full match. The second matches "([a-z]+)", which is a cluster of one or more letters. Hence, "jan" matches this cluster. Likewise for the other clusters.

A more complicated example illustrates how a subpattern can fail to match while the overall pattern still matches. In this case, false represents the failed submatch.

The regex pattern matches “month year” or “month day, year”. The second subpattern matches the day together with its trailing comma, if present.

(%i1) date_re : regex_compile("([a-z]+) +([0-9]+,)? *([0-9]+)");
(%o1) 
  Structure [COMPILED-REGEX for "([a-z]+) +([0-9]+,)? *([0-9]+)"]
(%i2) regex_match(date_re, "jan 1, 1970");
(%o2)             [jan 1, 1970, jan, 1,, 1970]
(%i3) regex_match(date_re, "jan 1970");
(%o3)             [jan 1970, jan, false, 1970]

You can also do case-insensitive matches by using a cloister (see pregexp cloisters) with the i modifier:

(%i1) regex_match("hearth", "HeartH");
(%o1)                         false
(%i2) regex_match("(?i:hearth)", "HeartH");
(%o2)                       [HeartH]

Alternate subpatterns can be separated by |.

(%i1) regex_match("f(ee|i|o|um)", "a small, final fee");
(%o1)                        [fi, i]

The first element is the full match "fi"; the second shows that we matched "i" for the cluster.

Categories: Package stringproc ·
Function: regex_split (regex, str)

Splits str into a list of substrings, using the matches of regex as the delimiters that separate the substrings.

(%i1) regex_split("[,;]+", "split,pea;;;soup");
(%o1)                  [split, pea, soup]
Categories: Package stringproc ·
Function: regex_subst_first (replacement, pattern, str)

Returns a string where the first occurrence of pattern in str has been replaced by replacement.

(%i1) regex_subst_first("ty", "t.", "liberte egalite fraternite");
(%o1)              liberty egalite fraternite

This example shows how to use back references. The replacement specifies that the first submatch is used as the replacement text.

(%i1) regex_match("_(.+?)_", "the _nina_, the _pinta_, and the _santa maria_");
(%o1)                    [_nina_, nina]
(%i2) regex_subst_first("*\\1*", "_(.+?)_", "the _nina_, the _pinta_, and the _santa maria_");
(%o2)    the *nina*, the _pinta_, and the _santa maria_
Categories: Package stringproc ·
Function: regex_subst (replacement, pattern, str)

Returns a string where every occurrence of pattern has been replaced by replacement in the string str.

(%i1) regex_subst("ty", "t.\\b", "liberte egalite fraternite");
(%o1)              liberty egality fraternity
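
Back references should work here in the same way as for regex_subst_first. The following is an illustrative sketch that reuses the input of that example, on the assumption that the \\1 syntax is accepted by regex_subst as well; every match is replaced, not only the first:

```maxima
(%i1) regex_subst("*\\1*", "_(.+?)_", "the _nina_, the _pinta_, and the _santa maria_");
(%o1)    the *nina*, the *pinta*, and the *santa maria*
```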
Categories: Package stringproc ·
Function: string_to_regex (str)

Returns a regex string where any special regex characters in str are quoted so that they lose their special meaning.

(%i1) re : string_to_regex(". :");
(%o1)                         \. :
(%i2) regex_match(re, "z :");
(%o2)                         false
(%i3) regex_match(re, ". :");
(%o3)                         [. :]
(%i4) regex_match(". :", "z :");
(%o4)                         [z :]

In this example, the regex will only match a substring consisting of a period, followed by a space and a colon. Without the quoting, the "." would match any single character.

Categories: Package stringproc ·
