! char.ax.txt - character and symbol utility predicates ! *** Includes higher-order operations involving decimal numbers! *** ! uses form.ax ho.ax natnum.ax bit.ax atom.ax nonident.ax !---+----|----+----|----+----|----+----|----+----|----+----|----+----|----+----8 !/ index: ===== character & char subset definitions ===== (char` _char`) - distinguishing atom for a syntax-extension character (char _char) - set of (8-bit) characters (charseq (_chars_) - sequence of characters (chexpr _chexpr) - char expr: char or seq of char exprs (pchar _pchar) - set of printable characters (x20..x7E) (ultr _ultr) - uppercase-letter (lltr _lltr) - lowercase-letter (ltr _ltr) - upper- or lower-case letter (~ultr _~ultr) - a non-uppercase-letter char (or char expr seq) (~lltr _~lltr) - a non-lowercase-letter char (or char expr seq) ~ltr - char that is not a letter (or char expr seq) (chxcat _chex _chstr) - catenate a char expr into a flat char string (lc>uc _lclex _uclex) - map lowercase letters to uppercase (in nested ch expr) (uc>lc _uclex _lclex) - map uppercase " lowercase " (ultrs _ultrs) - uppercase letter seq "ABC..Z" (lltrs _lltrs) - lowercase letter seq "abc..z" (uc_lc _uc _lc) - uppercase_lowercase letter pairs ltr> - generalize lower-/upper-case letter map to both cases ===== symbol utilities ===== (sym` _sym`) - distinguishing atom for a syntax-extension symbol (sym _sym) - set of syntax-extension symbols (catsym _sym0 _sym1 _sym01) - catenate two symbols (i.e., their char seqs) / ===== decimal number representation & predicates ===== (dig _digit) - decimal digit char ('0'..'9') (~dig _~dig) - a non-digit char expr (dec>nat _digseq _natnum) - map digit sequence to its natural number value (nat>dec _nat _digseq) - map natural num to digit seq ("0" for zero) (dsym>nat _dsym _nat) - map decimal digit symbol to natural number (nat>dsym _nat _dsym) - map natural number to digit symbol (sel_dstr _seq _el) - select sequence element given decimal dig string index (el>ix1 _grps _eldig) - map seq of elem groups to elem-to-1-digit-index map (digits _dig) - decimal digit seq "01..9" (el>1d _seq _el>1d) - map seq elements to consecutive single digits '0'..<='9' (xseqs> _xseqs> _x>) - get map of seq elems from map of seqs (skip _dsym _f) - skip _dsym iargs, then apply _f to remaining args (repl _dsym _uf) - replace iarg 0 seq with unary fn applied to elem [_dsym] (ins _dsym _expr _seq _seq') - insert expr into seq before elem [_dsym] (lend _seq _d) - get seq length as a (proper) dec nat num (i.e. no leading 0s) ===== character/symbol higher-order operations ===== (_fn^ ...) - ^ suffix to move arg0 into the func param position, then apply (hopre _ho) - make selected h-o operators work as prefixes of fn/pred symbols: : _ * / \ . & (: (_d _pred) _fc_) _argp _args_) - ! skip prefix of d args in a fn compos ((: dup_d _fc_) ...) - dup arg d in stack pushing on front, resume f comp ((: swap_d _fc_) ...) - swap arg d w arg 0 in stack, resume f comp ((@d _f_) (_argp_ (_iargd.._) _args_) (_argp_ _res _args)) - apply f in seq !\ !---+----|----+----|----+----|----+----|----+----|----+----|----+----|----+----8 ! ======== character & char subset definitions ======== ! (char` _char`) - distinguishing atom for a syntax-extension character ! - alt: char_hdr`, ch` (char` `char). ! we hide the actual atom here! ! alt atoms: `ch `c (atom `char "char"). ! "declare" the atom ! -- atom declarations should be done when a new atom is introduced ! Ideally, we should never need to use the actual char header atom, ! since we can now include the char` predicate as an axiom condition, ! but since this is a little cumbersome, we will not do for these ! utility axioms. ! *** It might be a good idea to have a syntax extension that adds ! "hidden conditions", such as this char` predicate, to all axioms in ! all source files as a way to provide symbolic constant held in ! globally-reserved variable names, such as %char` ! ! We can provide a check to see that the parser is generating the atom ! we are expecting: `OK_char_atom < ! `OK_char_atom is valid if ... (char` %char`), ! ... expected atom and ... (== 'A' (%char` %8bits)). ! ... actual char atom are same! ! If not valie, we have an error!! ! -- This type of check could be part of other axiom tests. !/ Alternative definition: ! Since an implementation parser supports syntax-extension characters, ! we can get the character header atom from the parser as follows: (char` %char_hdr)< (== (%char_hdr %bits) 'A'). ! -- But we would still need to know the atom symbol text to "declare" it, ! so this is probably not a good idea. !\ ! (char _char) - set of syntax-extension characters (which have 8 bits) (char (%char` %8bits))< (char` %char`), (bitseq %8bits), (=l %8bits (7 6 5 4 3 2 1 0)). ! Note that <, <= ordering between chars derives from bit ordering. ! Do we need to define ordering between char and char sequence? ! -- No, for now. ! -- (Actually, the bit ordering axioms will make a null char sequence () ! less than the sequence for a char (_char_atom _8bits). ! -- Could this be a problem??? ! -- Probably not since we are not likely to find ourselves comparing ! a character with a sequence that is not also a character! ! (charseq (_chars_) - set of character sequences (def charseq (* char)). ! - uses higher-order forms def and * ! or use *char for charseq, using * as prefix (defined below) ! (chexpr _chexpr) - char expr: char or seq of char exprs (chexpr %char)< (char %char). ! a char is a char expr (chexpr ()). ! a seq of char exprs is a char expr (chexpr (%chexpr $chexprs))< (chexpr %chexpr), (chexpr ($chexprs)). ! (pchar _pchar) - set of printable (ASCII) characters (x20..x7E) (pchar (%char` %8bits))< ! printable char% (char (%char` %8bits)), ! get char atom and its 8 bits (bits %0 %1), ! get 0,1 bit atoms (<= (%0 %0 %1 %0 %0 %0 %0 %0) %8bits), ! x20.. (<= %8bits (%0 %1 %1 %1 %1 %1 %1 %0)). ! ..x7E !z (1val (== %8bits (%0 %0 %1 $x)) ! x20..3F !z (== %8bits (%0 %1 $y))). ! x40..7F !z ! -- this covers ASCII 20..7F (but we should probably exclude 7F) ! (ultr _ultr) - uppercase letter ('A' - 'Z') (alt: ucltr) (ultr %ul)< (char %ul), (<= 'A' %ul), (<= %ul 'Z'). ! -- These axioms assume that the implementation parser uses ASCII ! for the syntax-extension characters! (May need to be confirmed!) ! (lltr _lltr) - lowercase letter ('a' - 'z') (alt: lcltr) (lltr %ll)< (char %ll), (<= 'a' %ll), (<= %ll 'z'). ! (ltr _ltr) - upper- or lower-case letter (ltr %l)< (1val (ultr %l) (lltr %l)). ! (~ultr _~ultr) - char expr that is not an uppercase letter (?: /ultr ~ucltr) ! (A character expression is a char or seq of >=0 char exprs.) (~ultr %~ul)< ! char that is not uppercase letter (char %~ul), (1val (< %~ul 'A') (< 'Z' %~ul)). (~ultr %chxseq)< ((* chexpr) %chxseq). ! char expr seq is not uppercase ltr ! (~lltr _~lltr) - char expr that is not lowercase letter (alt?: /lltr ~lcltr) (~lltr %~ll)< ! char that is not lowercase letter (char %~ll), (1val (< %~ll 'a') (< 'z' %~ll)). (~lltr %chxseq)< ((* chexpr) %chxseq). ! char expr seq is not lowercase ltr ! (~ltr _~ltr) - char expr that is not a letter (~ltr %~l)< (val (~ultr %~l) (~lltr %~l)). ! (chxcat _chex _chstr) - catenate char expr into a flat char string ! (alt: chstr chxcat* cxcat* write) (chxcat %ch (%ch)). ! map single char to a 1-char str (chxcat () ()). (chxcat (% $))< ! map char expr seq to char string (chxcat % %chstr0), (chxcat ($) %chstr1), (cat %chstr0 %chstr1 %chstr). ! (lc>uc _lclex _uclex) - map lowercase letters to uppercase (in char expr) ! (alt?: lltr>ultr) ! map single char: (lc>uc (%char` (%b7 %b6 %1 $b)) (%char` (%b7 %b6 %0 $b))< ! lltr -> ultr (lltr (%char` (%b7 %b6 %1 $b))), (bits %0 %1). (lc>uc %c %c)< ! non-lowercase-letter char is unchanged (char %c), (~lltr %c). ! map sequence of char exprs: (lc>uc () ()). ! map nested char seqs (lc>uc (%cx $cx) (%cx' $cx'))< (lc>uc %cx %cx'), (lc>uc ($cx) ($cx')). ! (uc>lc _uclex _lclex) - map uppercase letters to lowercase (in char expr) (uc>lc (%char` (%b7 %b6 %0 $b)) (%char` (%b7 %b6 %1 $b))< ! ultr -> lltr (ultr (%char` (%b7 %b6 %0 $b))), (bits %0 %1). (uc>lc %c %c)< ! non-uppercase-letter char is unchanged (char %c), (/ultr %c). (uc>lc () ()). ! map nested char seqs (uc>lc (%cx $cx) (%cx' $cx'))< (uc>lc %cx %cx'), (uc>lc ($cx) ($cx')). ! ultrs | lltrs - sequence of uppercase|lowercase letters (ultrs "ABCDEFGHIJKLMNOPQRSTUVWXYZ"). (lltrs "abcdefghijklmnopqrstuvwxyz"). ! *** Probably need better way to define this, using bit string increments! ! ul_ll - uppercase_lowercase letter pairs (alt: ultr-lltr) (ul_ll %ul %ll)< (ultrs ($upre %ul $usuf)), (lltrs ($lpre %ll $lsuf)), (=l ($upre) ($lpre)). ! ltr> - generalize lower-/upper-case letter map to both cases !/ Given a finite map where the domain may have just one letter case present even though the intent is that both letter cases would be mapped to the same value, this function "generalizes" the map by adding letters of the other case, which map to the same values. Actually, the initial map may have a mix of upper and lower case letters and the other case is added with the same value, when absent. Both cases of a letter may be initially present so long as they map to the same value. (This function is not defined for the case where upper and lower case letters are both initially present with different values.) This letter-case-generalization function is defined below by working backward from fully generalized map to a possible domain map where one of the letter cases may be present. We also add non-letters to the map. !\ (ltr> %ltrmap %ltrmap)< ! map a generalized letter map to itself (ultrs %ultrs), ! seq of uppercase letters (lltrs %lltrs), ! seq of lowercase letters (=l %ultrs %vals), ! seq of arbitrary expr values, 1 per letter ((* join) %ultrs %vals %ulmap), ! uppercase letters map to values ((* join) %lltrs %vals %llmap), ! lowercase " (cat %ulmap %llmap %ullmap), ! combine upper-/lower-case letter maps (perm %ullmap %ltrmap). ! let map letters be in any order (ltr> %ltrmapx' %ltrmap)< ! remove a domain letter if other case present (ltr> %ltrmapx %ltrmap), ! generalization of ltrmapx is ltrmap (ul_ll %ul %ll), ! get corresponding uppercase-lowercase letters (in (%ul %val) %ltrmapx), ! if both upper/lower ltrs map to same value,... (in (%ll %val) %ltrmapx), (1val (ins (%ul %val) %ltrmapx' %ltrmapx) (ins (%ll %val) %ltrmapx' (%ll %val) %ltrmapx)). ! ... remove either upper or lower letter mapping from x to get x' ! (Note that the "insert" predicate also works as a "remove" pred.) ! (Note also that only 1 of these ins exprs can be valid.) ! Thus, letter generalization of ltrmapx' is also ltrmap! (ltr> %ltrmapx' %ltrmap')< ! we also allow adding non-letters to the map (ltr> %ltrmapx %ltrmap), ! ltrmapx generalizes to ltrmap (~ltr %c), ! c is a non-letter char (ins (%c %v) %ltrmapx %ltrmapx'), ! insert non-letter mapping (ins (%c %v) %ltrmap %ltrmap' ), ! into both maps ! - doesn't matter for generalization operation if c is already in map ! - doesn't matter if duplicate c maps to different value !---+----|----+----|----+----|----+----|----+----|----+----|----+----|----+----8 ! ======== symbol utilities ======== ! (sym` _sym`) - distinguishing atom for a syntax-extension symbol ! alt: sym_hdr` (sym` `). ! alternative symbol atom: `sym (atom ` ""). ! "declare" symbol atom ! -- See char` above for getting symbol atom from implementation parser! ! And to check that declared symbol atom matches implementation. ! (sym _sym) - set of syntax-extension symbols (sym (%sym` (%c1)))< ! symbol char string must have at least 1 char! (sym` %sym`), (pchar %c1), ! 1st symbol char is printable char (~in %c1 " `%$'""!()") ! other than blank ` % $ ' " ! ( ) (sym (%sym` ($cstr %c)))< (sym (%sym` ($cstr))), (pchar %c), ! remaining symbol chars are printable chars (/in %c " ()"). ! other than blank ( ) ! (catsym _sym0 _sym1 _sym01) - catenate two symbols (i.e., their char seqs) (catsym (%sym` %chstr0) (%sym` %chstr1) (%sym` %chstr01))< (sym` %sym`), ! hide symbol atom (cat %chstr0 %chstr1 %chstr01). !---+----|----+----|----+----|----+----|----+----|----+----|----+----|----+----8 ! ======== (ASCII) decimal number representation ======== ! (digits "0..9"). ! define set of decimal digits (digits "0123456789"); ! (ord dig _dig _num) - number associated with each digit ! - could also define this from a finite_set definition (ord dig %dig %num)< (digits ($dpre %dig $dsuf)), (len ($dpre) %num). ! (dig _digit) - decimal digit char !/ to be defined - see 1ho.ax.txt and 2natnum.ax.txt ! (dig _digit) - decimal digit char (finite_set dig "0123456789"). ! uses higher-order form ! -> (ord dig _dig _natk) - get digit num - defined from 'finite_set' !\ (dig %digit)< ! an (ASCII) char from '0' to '9' (char %digit), (<= '0' %digit), (<= %digit '9'). ! (~dig _~dig) - a non-digit char expr (~dig %~dig)< (char %~dig), (1val (< %~dig '0') (< '9' %~dig)). (~dig %chexseq)< ((* chexpr) %chexseq). ! char expr seq is not a digit ! (dec>nat _digseq _natnum) - map decimal digit seq to its natural num value ! - null seq maps to zero; other seqs can have leading zeros (dec>nat () %0)< (zero %0). (dec>nat ($d %d) %n')< (dec>nat ($d) %n), (len (* * * * * * * * * *) %10), ! need nat num constant 10 ! - we can't use decimal 10 since that is what we're trying to define! (times %n %10 %10n), (ord dig %d %k), ! get "ordinal" value of a 'finite_set' element (plus %10n %k %n'). ! (nat>dec _nat _digseq) - map natural number to decimal digit seq ! - zero maps to "0", everything else has non-zero leading digit ! - Thus each natural number has a unique representation, which we will call ! the "proper" decimal representation. (nat>dec %z "0")< ! we don't use null string for decimal number repr (zero %z). (nat>dec %n (%d $d))< ! other decimal numbers have non-zero leading digit (dec>nat (%d $d) %n), (< '0' %d). ! (dsym>nat _dsym _nat) - map decimal digit symbol to natural number (dsym>nat (%sym` %dstr) %n)< (sym` %sym`), (dec>nat %dstr %n). ! Note that null-string symbol maps to zero, but that shouldn't cause prob. ! (nat>dsym _nat _dsym) - map natural number to decimal digit symbol ! - zero maps to 0, everything else has non-zero leading digit ("proper") (nat>dsym %n (%sym` %dstr))< (sym` %sym`), (nat>dec %n %dstr). !/ --- old --- - use 'sel_dstr' instead! ! (_dsym _seq _elem) - select element from seq given decimal symbol index 0.. ! - the decimal symbol itself is the selection function (%dsym %seq %elem)< ! decimal symbol is the selection function name! (dsym>nat %dsym %nat), ! get natural number value of decimal symbol (sel# %nat %seq %elem). ! select sequence element given nat num index !\ ! (sel_dstr _seq _el) - select sequence element given decimal dig string index ! ex: (sel2 (a b c d) c) - note 0,1,.. indexing ! alt: sel#_dstr, #_dstr, _dstr (%sym` ('sel' $dstr)) %seq %elem)< (sym` %sym`), ! get symbol atom (dec>nat ($dstr) %i), ! get decimal index as natural number i (sel# %i %seq %el). ! select ith element ! *** tbd: Do all the ...# index fns with dec dig str index in the fn symbol! !/ *** old: (replaced by fns below) ! (el>ix1 _grps _eldig) - map seq of elem groups to elem-to-1-digit-index map ! example: ! (el>ix1 ("ab" "c" "def") ! (('a' '0') ('b' '0') ('c' '1') ('d' '2') ('e' '2') ('f' '2'))) ! -- works for <=10 groups ! -- apply to phonecode example! (def el>ix1 ! fn applied to above example ... ! ("ab" "c" "def") (dup ! duplicate arg ! ("ab" "c" "def") ("ab" "c" "def") len ! get length of 1st arg ! _n3 ("ab" "c" "def") iota0n- ! get indexing numbers 0 up to 3 ! (_n0 _n1 _n2) ("ab" "c" "def") *nat>dec ! convert natural numbers to decimal digit strings ! ("0" "1" "2") ("ab" "c" "def") *sel0 ! get leading digit of decimal digit strings (works for <=10 nums) ! ('0' '1' '2') ("ab" "c" "def") (* (.- join)) ! join each digit with each element of its group sequence ! ((('0' 'a') ('0' 'b')) (('1' 'c')) (('2' 'd') ('2' 'e') ('2' 'f'))) ! - note that digit-letter mappings are still in groups cat* ! catenate groups into a flat map ! (('0' 'a') ('0' 'b') ('1' 'c') ('2' 'd') ('2' 'e') ('2' 'f')) *rev ! reverse to get letter->digit map ! (('a' '0') ('b' '0') ('c' '1') ('d' '2') ('e' '2') ('f' '2')) ! - function result ) ). !\ ! (digits _dig) - decimal digit seq "01..9" (digits "0123456789"). ? (el>1d _seq _el>1d) - map seq elems to consecutive single digits '0'..<='9' ? (xseqs> _xseqs> _x>) - get map of seq elems from map of seqs ! (el>1d _seq _el>1d) - map seq elems to consecutive single digits '0'..<='9' ix>1d (el>1d %seq %el>1d)< ! seq elems map to consecutive single digits 0..<=9 (finite_set dig %digs), (pre %1dpre %digs), ! get digit seq prefix '0'..<='9' (=l %seq %1dpre), ! (... of same length as sequence) (*join %seq %1dpre %el>1d). ! join seq elems w single digits to make map ! (xseqs> _xseqs> _x>)< ! given map of seqs, make map of seq elems (def xseqs> ! a seq of seqs where each elem seq maps to a value (*.\join ! for each seq>val pair, get seq_elem>val, group by seq cat*)). ! flatten the elem>val groups into a single elem>val map ! ***** Move this to ho.ax !!! ***** ! -- A nice utility that depends on some higher-order forms!! ! (x>d _xseqs _x>d) - map seqs of exprs to seq index single digit values (el>1d %seqs %el>d)< (ix>1d %seqs %seqs>d), ! get single digit indexes for seq elems (xseqs> %seqs>d %el>d). ! ***** Naming of above 3 functions !!!! ! (skip _dsym _f) - skip _dsym iargs, then apply _f to remaining args ! - This skip operation is used within a function composition. (((skip %dsym %f) $f) $iaskip $args)< (dsym>nat %dsym %n), (len ($iaskip) %n), ! skip n input args ((%f $f) $args). ! then apply fn to remaining args in composition ! (repl _dsym _uf) - replace iarg0 seq with unary fn applied to elem [_dsym] (((repl %dsym %uf) $f) ($pre %ex $suf) $args)< (dsym>nat %dsym %n), (len ($pre) %n), (%uf %ex %ex'), ! apply unary fn to nth elem of 1st arg (($f) ($pre %ex' $suf) $args). ! replace that elem with result ! (ins_dsym _expr _seq _seq') - insert expr into seq before elem [_dsym] ! - we append the decimal insert location to the 'ins' fn symbol ! ex v.e.: (ins2 x (a b c) (a b x c)) (((%insd %expr ($pre $suf) ($pre %expr $suf))< ! insert expr into seq (catsym ins %dsym %insd), (dsym>nat %dsym %n), (len ($pre) %n). (param %insd)< (catsym ins %dsym %insd), (dsym>nat %dsym %n). ! - All ins_dsym instances can have insert expr as a parameter! ! (lend _seq _d) - get seq length as a (proper) decimal natural number (lend %seq %d)< (len %seq %n), (nat>dsym %n %d). !---+----|----+----|----+----|----+----|----+----|----+----|----+----|----+----8 ======== character/symbol higher-order operations ======== ! (_fn^ ...) - ^ suffix to move arg0 into func param position, then apply ! - Note this is only meaningful for functions with params, like 'eat'. (%fn^ %arg0 $args)< (catsym %fn ^ %fn^), ((%fn %arg0) $args). ! (We won't use this when a function already has parameters.) ! (hopre _ho) - make selected h-o ops work as prefixes of fn/rel pred syms: ! : _ * / \ . & (hopre %) < (in % (: _ * / \ . &)). ! -- see 1ho.ax.txt! (axs ! generate h-o prefixed predicates (typically functions) ((%hopsym $args) ((%ho %psym) $args)) ! no parameters (((%hopsym $params) $args) ((%ho (%psym $params)) args)) ! w params )< (hopre %ho), (catsym %ho %psym %hopsym). ! shared conditions! ! ((: (_d _fc0_) _fc_) (_argp_ _args_) _argseq') - do fcomp0 on stk suffix ! - (_d _fc0_) only applies within a function composition ((: (%d $fc0) $fc) ($argp $args) %argseq')< (lend ($argp) %d), ! dec num d is length of arg prefix ((: $fc0) ($args) ($argx), ! apply fcomp0 to arg suffix to get argx result ((: $fc) ($argp $argx) %argseq'). ! restore prefix and apply rest of fcomp ! If there is just one function in the function composition, we can make ! the decimal digit symbol a prefix of the function symbol for conciseness!: (hopre %d)< (nat>dec %n %d). ! valid for all (proper) decimal symbols ! del_d - delete argd from stack ! - generalization of 'del' operation in ho.ax ((: %deld $fc) ($argp $argd $args) %argseq')< (catsym del %d %deld), ! get "proper" dec nat num appended to 'del' (lend ($argp) %d), ! skip arg prefix to get to argd ((: $fc) ($argp $args) %argseq'). ! - Note that del0 is the same as del. ! dup_d - dup arg_d in stack pushing copy on front ! - generalization of 'dup' operation in ho.ax ((: %dupd $fc) ($argp %argd $args) %argseq')< (catsym dup %d %dupd), ! get "proper" dec nat num appended to 'dup' (lend ($argp) %d), ! skip arg prefix to get to argd ((: $fc) (%argd $argp %argd %args) %argseq'). ! - Note that dup0 is the same as dup. ! swap_d - swap arg d with arg 0 in stack ! - generalization of 'swap' operation in ho.ax ((: %swapd $fc) (%arg0 $argp %argd $args) %argseq')< (catsym swap %d %swapd), (lend (%arg0 $argp) %d), ! get argd (>0!) ((: $fc) (%argd $argp %arg0 $args) %argseq'). ! - Note that swap1 is the same as swap. ! - Note also that there is no swap0 (which would be a null operation if it were defined).