1. Introduction
This webpage gives an axiomatic language specification
for the "phonecode" problem of
Lutz Prechelt,
[Prechelt 2000, 2000TR, 2003].
This problem is to encode phone numbers as sequences of words from
a dictionary, given a mapping of letters to digits.
This webpage will be gradually refined in preparation for the
July 8, 2024 short paper deadline for
ICLP 2024.
A definition of axiomatic language can be found
here.
The software engineering assumption for this project is that
programming time is roughly the same per line of (non-comment) code,
regardless of language level. So, if a C language program for
a problem is typically 1/3 the size of an assembly language program for
the same problem (ignoring comment lines), we would expect programmer
productivity in C to be roughly 3 times greater.
Another assumption for this project is that the automatic
transformation of axiomatic language specifications to efficient
programs for problems like this can eventually be achieved.
Thus, this currently-unexecutable specification can be considered
an actual solution to the phonecode problem.
Section 2 talks about claims and evidence for the idea
that a higher-level programming language that enables shorter source
programs leads to shorter programming time and thus greater
programmer productivity.
Section 3 defines the phonecode problem and
section 4 gives an axiomatic language specification
for that problem.
Section 5 gives an expanded, tutorial version of the phonecode
specification with additional comments inserted to help explain the details.
Finally, section 6 gives some concluding comments and dicusses future work.
2. Roughly Constant LOC/hr,
Regardless of Language?!
The idea that programming time per line of non-comment code (loc)
is roughly constant regardless of language level was
an early observation in software engineering.
Fred Brooks writes "Productivity seems constant
in terms of elementary statements, a conclusion that is reasonable
in terms of the thought a statement requires and the errors
it may include." [Brooks 1975, p.94]
Prechelt [2000, p.25] refers to "Common software engineering wisdom,
which says that 'the number of lines written per hour is independent
of the language'".
Prechelt's phonecode results support this claim, where lines-of-code
are "anything that contributes to the semantics of the program
in each of the program source files, e.g. a statement, a declaration,
or at least a delimiter such as a closing brace (end-of-block marker)."
[Prechelt 2003, p.31]
Based on median program lengths and programming times,
"non-scripts [C, C++, Java] are typically two to three times
as long as scripts [Perl, Python, Rexx, Tcl]" [ibid., p.31]
and "scripts (total median 3.1 hours) take only about one third of
the time required for the non-scripts (total median 10.0 hours)"
[ibid., p.37], with some caveats, such as the script times being
self-reported.
The median loc/hr values for the seven languages are shown in
Fig. 24 of [ibid., p.41] and are all between 20-40. We therefore accept
the assumption of roughly constant loc/hr regardless of language.
Paul Graham [2002] argues this and makes the logical inference that
"the main point of high-level languages is to make source code smaller".
For very-high-level, declarative, functional, and logic languages,
there may be unrealized potential to make our programs smaller and
thus achieve greater programming productivity (without sacrificing
readability). Exploring that possibility is the motivation
for this project.
3. The Phonecode Problem
The phonecode program reads a text file of
zero-or-more phone numbers, one per line, and a text dictionary file of
zero-or-more words, one per line, and "encodes" the phone numbers as
word strings, interspersed as needed with phone number digits.
A phone number is a string of zero or more digits, dashes(-), and slashes (/).
A dictionary word is a string of (upper- and lowercase) letters, dashes (-),
and double quotes ("), starting with a letter.
The following sequence of letter groups defines a mapping
of letters to the digits 0-9:
(e jnq rwx dsy ft am civ bku lop ghz)
-> 0 1 2 3 4 5 6 7 8 9
(E.g., letter 'd' maps to digit '3'; letter 'H' maps to '9'.)
From this mapping of letters to digits, we get a mapping of
dictionary words to digit strings. ?? Special characters in phone numbers
and dictionary words are ignored in the encoding.
The encoding of a phone number to words is done
left-to-right in incremental steps.
At any point a prefix of a phone number will have been encoded to a
word string with possible included digits. An encoding step attempts
to match the starting digits of the not-yet-encoded phone number suffix
with digit strings for dictionary words. For each match the
dictionary word is appended to the encoding and the matching digits
are removed from the phone number suffix. If there are no matches,
a single phone number digit is moved to the encoding, but only
if the previous encoding step did not move a digit. (An encoding thus
cannot have consecutive digits.) When a sequence of encoding steps
has "consumed" the phone number, we have found an encoding
for that number. A phone number may have zero or more encodings.
The phone number - encoding pairs found are listed in an output
file, one pair per line, in the following format:
_phonenum: {[_dig] _word} [_dig]
Special characters are included in _phonenum and _word.
See the Prechelt papers [2000, 2000TR, 2003] for his natural
language decription of the phonecode requirements along with examples
(and let me know if I have misinterpreted them in my paraphrase!).
4. Axiomatic Language Phonecode Solution
The axiomatic language specification below for the phonecode
problem has just 8 non-comment lines of code.
Since axiomatic language is an extremely minimal language, it must
be built-up with assorted utility functions, such as numbers,
higher-order forms, Boolean functions and relations, etc.
These utilities would be provided in a standard library, which would
be shared by all applications, and thus this reusable code would not
be counted as part of the source code for any particular application.
Axiomatic language encourages this type of generalization and abstraction
of "general purpose functionality" from a specific application,
thus making the application shorter.
Another characteristic of this phonecode solution is a preference
for short names and
a "dense" style where composed functions are often written
on the same line for conciseness (and to keep the line count down).
Some abbreviations used in the source code and its comments
are as follows:
# - number of something (#apples)
> - denotes a function/map from an expression domain to a range
domain>range - (usage)
>=0 - zero-or-more
>=n - greater-than-or-equal-to n
~ - prefix for boolean function symbol that complements the function
arg - argument for a function or predicate
iarg - input argument
oarg - output argument (result of a function)
asn - assign expr to a position in a seq (replacing old expr at that pos)
ax(s) - axiom(s), set of axioms
bin, b - binary (function or relation) - 2 input arguments
bool, b - Boolean function/predicate (that uses/yields true/false values)
cat - [con]catenate char strings
cat* - catenate a sequence of char strings into a single string
char, ch - character
conclu, conc - conclusion (of an axiom)
cond - condition (of an axiom)
consec - consecutive (elements in a sequence)
cur - "currying" - a prefix of input args is moved into a function expr
def - define (a name for a function or predicate expression)
dic - dictionary (of phonecode code words)
dig, d - digit
dsym - decimal digit symbol: 3, 25
dup - duplicate (an input arg)
elem, el - "element" (expression) in a sequence
(sometimes string vars in a seq may be called elems of that seq)
elim - eliminate
expr, ex, e, x - expression
f, `f - false (Boolean value)
filt - filter out (remove) seq elems which fail a Boolean predicate
fmt - format
func, fn, f - function
id - identifier; identity function
ins - insert (an expression into a sequence)
ins_btw - insert an expression between elements of a sequence
ix, i - index
lc, lcltr - lowercase (letter)
len, l - length of a sequence
ltr, l - letter
num, nat, natnum - natural number
outf - output file
ph#, phonenum - phone number
phcode, phc - phonecode (problem)
pos - position of an expr in a seq (indexed from zero)
pred - predicate (name)
rel, r - relation
repl - replace (seq expr with result of a unary fn applied to that expr)
rev - reverse (a sequence)
rmv - remove (delete) an expr from a sequence
seq - sequence
str - string of some "type" (often chars) within a seq
sym - symbol
t, `t - true (Boolean value)
uc, ucltr - uppercase (letter)
un - unary (function)
bun - binary func w its 2 iargs in one 2-elem seq (-> unary fn)
w - with
A "type" or expression syntactic category may be denoted by a name
beginning with an underscore: _num, _ltr. A string of >=0 of the same
type within a sequence may be denoted by an underscore before and
after the type name: _dig_.
A program that reads multiple input text files
and writes a single output text file can be specified by a set of "Program"
valid expressions of the following form:
(Program _infiles_ _outfile) - >=0 input files, 1 output file
A text file is a sequence of lines, each a character sequence.
A functional style is used for many predicates:
(_fn _iargs_ _oarg) - func on >=0 input args yields a single output arg
- input arguments are numbered from 0: _iarg0 _iarg1 ....
Note that in addition to single-line comments,
...code... ! comment...
we use multi-line comment blocks framed by '!/' and '!\' markers
that start in column 1:
!/ comment...
comment...
comment...
!\ comment...
Comment blocks can be nested.
Recall that syntax-extension symbols can start with any
printable character other than blank ` $ % ! ( ) ' " and
symbol/atom/variable characters after the first can be any
printable characters other than blank ( ).
We often use special characters for, say, higher-order forms.
Our axiomatic language phonecode solution is given below.
Indices in column 1 count the non-comment source lines.
!---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
! --- Phonecode top-level program axiom ---
1 (Program %dic %ph#s %outf)< ! Program phonecode
!/ - encode phone numbers with dictionary words using letter>digit map
input arg text files:
%dic - >=0 words of dictionary, one-per-line
_word is uc/lc letter + >=0 letters and special chars - "
%ph#s - phone numbers to encode, one-per-line
phone number is >=0 digits and special chars - /
output arg text file: %outf - phonenum/encoding pairs, one per line:
_ph#: _wordd .. _wordd - encoding "wordd"s separated by blanks
- _wordd is dic word or single ph# dig if no possible word at that point
- consecutive digit-wordd's are not allowed
- Phone number encoding is incrementally computed left-to-right.
!\ - A ph# might not have an encoding and thus wouldn't be in the output.
! get map of ph# digits to dictionary words
!/ Given a sequence of lc-letter-group symbols, we assign digit index '0'..'9'
to each letter based on its group's sequence index. This letter-to-digit
mapping, applied to %dic, gives us a word-to-digit-string mapping,
!\ from which we get a digit-string-to-dictionary-word map.
2 ((ltr>dig digs>word) (e jnq rwx dsy ft am civ bku lop ghz) %dic %digs>word),
! 0 1 2 3 4 5 6 7 8 9
! Note that we hard-code the letter-to-digit map within the Program axiom.
! ltr->dig map + dictionary --> digits->word map
! compute all stream solutions of digs>word map applied to phone number digits
3 (((* (filt dig?)) (* (eat (bite1 %digs>word)))
4 (* (filt (~consec? dig?)))) %ph#s %codes),
!/ We remove special chars from phone numbers ph#s to get digit-only ph#ds.
We then incrementally "eat" prefixes from each ph#d while outputting
dictionary words whose digit string matches each prefix. If no word can
be matched at some point, a single digit of ph#d is copied to the output.
The resulting "wordd" sequence (wordd = dic word or phonenum digit)
is a candidate encoding except that we then eliminate encodings where
!\ consecutive wordds are both digits.
! format phone numbers and their codes for output file
5 (((* (.- join)) cat* *fmt) %ph#s %codes %outf).
! - %outf - output is text file of ph#-code pairs, 1 per line
!/ Given original phone numbers and their encodings, we get
!\ phonenum/code pairs and then format each of these for output.
! --- Phonecode app supporting functions ---:
! ltr>dig - define letter-to-digit mapping from seq of lc letter-group symbols
6 (def ltr>dig (*sel1 el>ix1 dup lc>uc cat)).
! -- Could this be a std lib fn?!
! digs>word - get map of digit strings to dictionary words
7 (def digs>word ((skip 1 (dup (* (filt ltr?)))) (.-- map) *join)).
!/ Given the ltr>dig map and a dictionary of words, this function applies
!\ the map to the dictionary words to get a word-digits_to_word map.
! fmt - format phonenum-code pair for output line
8 (def fmt ((cur ins 1 ": ") (repl 2 (cur ins_btw ' ')) chxstr)).
!/ Given a ph#-code pair, this function first inserts ": " between the
phonenum and its code, then inserts a blank between each "wordd" of
the code, and then flattens the nested character expr into a single
!\ character string, which will be a line of the output file.
Links to standard library source files and some of the
predicates they define are as follows:
form.ax
- utilities involving the form of expressions
(join _a _b (_a _b))
- join two argument exprs into 2-elem seq
(cat (_str1_) (_str2_) (_str1_ _str2_))
- catenate two seqs into one
(cat* ((_strA_) .. (_strZ_)) (_strA_ .. _strZ_))
- cat seq of seqs into one
(rev _seq _rev)
- reverse a sequence
(ins_btw _e _seq _seq')
- insert expr _ between seq elements
ho.ax
- higher-order predicates and functions
(_f_)
- function composition on input arg stack, applied left-to-right
dup
- duplicate the first input argument in a function composition
swap
- swap the first two input arguments for a function composition
(cur _f _iargs0_)
- "currying" - input arg prefix is part of function
(* _pred)
- apply predicate to sequences of arguments
(.- _binfn/rel)
- apply binary fn/rel to arg0 and all elems of arg1 seq
(.-- _binfn/rel)
- apply binary fn/rel to arg0 and all elems of arg1 seq seqs
(bun _binfn/rel)
- apply binary fn/rel to 2-elem seq (makes it unary)
(map _map _ex _ex')
- apply _map, consisting of (_(_ex _ex')_) pairs, to an expr
natnum.ax
- natural number predicates and functions
-- no utilities directly used
bit.ax
- predicates and functions that use bit representation
-- no utilities directly used
char.ax
- predicates and functions involving characters
*pred
- apply predicate to sequences of arguments
lc>uc
- map lowercase letters to uppercase in nested ch expr
sel_dsym
- select elem [_dsym] from a sequence [0..]
(el>ix1 _elgroups _el>dig)
- given seq of elem group seqs, get elem-to-1-dig-index map
(skip _dsym _f)
- skip _dsym iargs, then apply fn
(repl _dsym _uf)
- replace elem [_dsym] of iarg0 seq w unary fn _uf applied to it
(ins _dsym _e _seq _seq')
- insert expr before elem [_dsym] of seq
(chxstr _chexpr _chstr)
- flatten nested char expr into a char string
bool.ax
- Boolean predicates and functions (- ending ? is a convention)
_type?
- returns true/false if expr is of given type
_dig?
, _ltr?
- returns t/f if char expr is digit, letter
~_bfn?
- returns complement of Boolean function _bfn?
(consec? _type?)
- t/f if seq has consec elems of given type
stream.ax
- incremental stream processing functions
(bite1 _map)
- if possible, bite a stream prefix that matches map, outputting expr;
- else, copy 1 stream elem to output
(eat _bite)
- "eat" a stream using all possible bite sequences, outputting expr seqs
5. Expanded, Tutorial Axiomatic Language Solution Presentation
To help the user understand the above specification, we have
re-written it over multiple lines with extra comments inserted.
Some comments document the actions of individual functions;
others show the input argument stack before and after the function invocation.
(Our assumption is that a future axiomatic language programmer would have
comprehensive familiarity with the standard utility library,
so an expanded presentation like this would not be necessary.)
!---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
! *** expanded program axioms with inserted comments ***
! --- Phonecode top-level program axiom ---
1 (Program %dic %ph#s %outfile)< ! Program phonecode
!/ copied from above ...
- encode phone numbers with dictionary words using letter>digit map
input arg text files:
%dic - >=0 words of dictionary, one-per-line
_word is uc/lc letter + >=0 letters and special chars - "
%ph#s - phone numbers to encode, one-per-line
phone number is >=0 digits and special chars - /
output arg text file: %outf - phonenum/encoding pairs, one per line:
_ph#: _wordd .. _wordd - encoding "wordd"s separated by blanks
- _wordd is dic word or single ph# dig if no possible word at that point
- consecutive digit-wordd's are not allowed
- Phone number encoding is incrementally computed left-to-right.
!\ - A ph# might not have an encoding and thus wouldn't be in the output.
!/ copied ...
! get map of ph# digits to dictionary words
!/ Given a sequence of lc-letter-group symbols, we assign digit index '0'..'9'
to each letter based on its group's sequence index. This letter-to-digit
mapping, applied to %dic, gives us a word-to-digit-string mapping,
!\ from which we get a digit-string-to-dictionary-word map.
2 ((ltr>dig digs>word) (e jnq rwx dsy ft am civ bku lop ghz) %dic %digs>word),
! 0 1 2 3 4 5 6 7 8 9
! Note that we hard-code the letter-to-digit map within the Program axiom.
! ltr->dig map + dictionary --> digits->word map
! expanded ...
! get map of ph# digits to dictionary words
(( ! (e jnq rwx ... lop ghz) _dic - input args for this ax cond fn
ltr>dig ! get map of upper/lower-case letters to digits
! (('E' '0') .. ('Z' '9') ('e' '0') .. ('z' '9')) _dic
digs>word ! apply ltr>dig map to dictionary
! _digs>word - map of ph# digit prefixes to dictionary words
! (.. (_digits _dic-word) ..)
) (e jnq rwx dsy ft am civ bku lop ghz) %dic %digs>word),
!/ copied ...
! compute all stream solutions of digs>word map applied to phone number digits
3 (((* (filt dig?)) (* (eat (bite1 %digs>word)))
4 (* (filt (~consec? dig?)))) %ph#s %codes),
!/ We remove special chars from phone numbers ph#s to get digit-only ph#ds.
We then incrementally "eat" prefixes from each ph#d while outputting
dictionary words whose digit string matches each prefix. If no word can
be matched at some point, a single digit of ph#d is copied to the output.
The resulting "wordd" sequence (wordd = dic word or phonenum digit)
is a candidate encoding except that we then eliminate encodings where
!\ consecutive wordds are both digits.
!\
! expanded ...
! compute all stream solutions of digs>word map applied to phone number digits
(( ! _ph#s
(* (filt dig?)) ! filter out non-digits from ph#s - get digit-only ph#ds:
! _ph#ds
(* (eat (bite1 %digs>word))) !incrementally "eat" each ph#d to get "codeX"s
! - each "bite" matches digit prefix to digs>word map to get output word
! If no match, the bite1 fn copies 1 ph#d digit to the output.
! - Thus, an output "codeX" encoding is a sequence of wordds, each a
! dictionary word or a single digit. All possible codeXs are
! generated by this process, where a codeX encoding may have an
! arbitrary number of ph#d digits:
! (.. _codeXs-for-ph#i ..) - >=1 codes for each ph#
! - codeX "wordd" is digit where dictionary word not possible
! - Note that each ph# will have at least one codeX since ph#d itself
is a possible codeX (for example, if the dictionary is empty).
! - But consecutive digits are not allowed for ph# encodings!!
(* (filt (~consec? dig?))) ! elim codes w consecutive digits
! (.._(valid-)codes-for-ph#i..) - no consec digits in codes
! - Note that it's possible a ph# may not have any valid encodings.
) ! end of fn composition returns valid codes for each ph#
%ph#s %codes),
!/ copied ...
! format phone numbers and their codes for output file
5 (((* (.- join)) cat* *fmt) %ph#s %codes %outf).
! - %outf - output is text file of ph#-code pairs, 1 per line
!/ Given original phone numbers and their encodings, we get
!\ phonenum/code pairs and then format each of these for output.
!\
! expanded ...
! format phone numbers and their codes for output file
(( ! _ph#s _codes_for_ph#s
(* (.- join)) ! join each ph# w each of its codes
! (.. (.. (_ph#i _code-for-ph#i) ..) ..)
! - note that ph#-code pairs are grouped by ph#
cat* ! make a flat list of ph#-code pairs
! (.. (_ph# _code) ..)
! - if a phone num has no codes, it will not appear in any pair
*fmt ! format each ph#-code pair into output line
! (.. "ph#: wordd .. wordd" ..) - this is the output text file!
! - returned in %outf
) %ph#s %codes %outf).
! --- Phonecode app supporting axioms ---:
!/ copied ...
! ltr>dig - define letter-to-digit mapping from seq of lc letter-group symbols
6 (def ltr>dig (*sel1 el>ix1 dup lc>uc cat)).
! -- Could this be a std lib fn?!
!\
! expanded ...
! ltr>dig - define letter-to-digit mapping from seq of lc letter-group symbols
(def ltr>dig ! define ltr>dig function
( ! (e jnq rwx ... lop ghz) - lc letter groups - alphabet partition
*sel1 ! sel elem [1] of each letter group symbol (- gets sym char string)
! ("e" "jnq" ... "lop" "ghz")
el>ix1 ! get map of lc letter group elements to single digits
! (we index seq elems from ['0'])
! (('e' '0') .. ('g' '9') ('h' '9') ('z' '9')) - lcltr-to-dig map
dup ! duplicate the first arg
! (('e' '0') .. ('z' '9')) (('e' '0') .. ('z' '9'))
lc>uc ! map lc letters to uppercase
! (('E' '0') .. ('Z' '9')) (('e' '0') .. ('z' '9'))
cat ! concatenate lc- and uc-maps into single letter>digit map
! (('E' '0') .. ('Z' '9') ('e' '0') .. ('z' '9'))
)). ! end of fn composition and fn definition
! returns ltr>dig map
!/ copied ...
! digs>word - get map of digit strings to dictionary words
7 (def digs>word ((skip 1 (dup (* (filt ltr?)))) (.-- map) *join)).
!/ Given the ltr>dig map and a dictionary of words, this function applies
!\ the map to the dictionary words to get a word-digits_to_word map.
!\
! expanded ...
! digs>word - get map of digit strings to dictionary words
(def digs>word
( ! _ltr>dig_map _dic - the two input args
(skip 1 ! skip past 1 arg, then apply function
((dup ! duplicate the dictionary arg
! " _dic _dic
(* (filt ltr?)) ! remove non-letters from dictionary words
! " _dicl _dic
! - _dicl - letter-only words
) )
! _ltr>dig_map _dicl _dic
(.-- map) ! apply ltr>dig map to letter-only dictionary words
! - result is a unique string of digits for each word
! _(word-)digits _dic
*join ! pair each digit string with original dictionary word
! (.. (_digits _word) ..) - map of digit strs to original words
)). ! end of fn composition and fn definition
! digs>word map is returned!
!/ copied ...
! fmt - format phonenum-code pair for output line
8 (def fmt ((cur ins 1 ": ") (repl 2 (cur ins_btw ' ')) chxstr)).
!/ Given a ph#-code pair, this function first inserts ": " between the
phonenum and its code, then inserts a blank between each "wordd" of
the code, and then "flattens" the nested character expr into a single
!\ character string, which will be a line of the output file.
!\
! expanded ...
! fmt - format phonenum/code pair for output line
(def fmt
( ! (_ph# _code) - a ph#-code pair
(cur ins 1 ": ") ! insert punctuation between ph# and code
! (_ph# ": " _code)
(repl 2 (cur ins_btw ' ')) ! insert blanks between code words/digits
! (_ph# ": " (_wordd ' ' .. _wordd))
! - We replace arg 2 w result of unary fn that inserts blank between
! its sequence elements.
chxstr ! flatten nested char expr into single string
! "_ph#: _wordd .. _wordd" - syntax elems in char string are
! blank-separated and concatenated
! - this is an output text file line
)).
6. Final Comments
See section "5 Conclusions and Future Work" of
paper.
References
[Brooks 1975]
Frederick Brooks,
The Mythical Man-Month: Essays on Software Engineering,
Addison-Wesley.
[Graham 2002]
Paul Graham,
"Succinctness is Power".
[Prechelt 2000]
Lutz Prechelt,
"An empirical comparison of seven programming languages",
IEEE Computer, vol. 33, no. 10, pp. 23-29, Oct. 2000,
doi: 10.1109/2.876288,
link.
[Prechelt 2000TR]
Lutz Prechelt,
"An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl
for a search/string-processing program",
Dept. of Informatics, Univ. of Karlsruhe,
Technical Report 2000-5, Mar. 2000.
[Prechelt 2003]
Lutz Prechelt,
"Are Scripting Languages Any Good?
A Validation of Perl, Python, Rexx, and Tcl against C, C++, and Java",
in Advances in Computers 57:205-270, Dec. 2003,
doi: 10.1016/S0065-2458(03)57005-X,
link.
back
to Axiomatic Language Home Page