salad of math, recipes of lexers, parsers, interpreters, compilers

I've never understood the craze for people who collect books that are supposed to teach how to write a compiler, and then simply invoke lord Bison and mistress Lex during their laboratory activities.

  • Bison: YACC-compatible Parser Generator
  • Lex: Lexical Analyzer Generator
  • Yacc: Yet Another Compiler-Compiler
  • Flex: Fast scanner generator
  • Coco/R Compiler Generator

Sure, there's enough software and hardware floating around now that you can mess about and possibly crank out something good with them on Linux (which has the above tools), but ... I wonder how many knowledgeable people have really eaten a salad of math.

These yet-another-compiler tools are good, but they tend to hide all the magic behind them. While the math is rigorous and well defined, in computer science you can mix cold drinks of raw and cooked fruit in endless ways. That's the real fun.

Consider the following expression:

# eval 1+(2.2*3.0E9+ ( 0xdeadbeaf logicalAnd 0xffffffff ) )

eval says it's a "good expression" ... oh my goodness, certainly a bug in my source code, but what a delicious salad. Let me offer a shake of different kinds: floating point with hex, logical with arithmetic, and evil code.

yards analysis: PASSED
stack analysis: PASSED

rpn_v[]={ 0 3 5 4 8 10 9 6 1 }

[1] 1 token_DecValue, type12
[2.2] 1 token_FloatinPointENGValue, type15
[3.0E9] 1 token_FloatinPointSCIValue, type16
[*] -1 token_Asterisk, type55
[0xdeadbeaf] 1 token_HexValue, type13
[0xffffffff] 1 token_HexValue, type13
[logicalAnd] -1 token_LogicalAnd, type41
[+] -1 token_Plus, type53
[+] -1 token_Plus, type53

#########################
#    good expression    #
#########################

(it's a lie, it's not a good expression, it's simply a bug)

Reading books like the "dragon book" and "computer science algorithms for dummies", I was tempted to do my best to produce the worst code, so I have been playing with lexers, parsers, and RPN for weeks. The result is a black mass. The lexer is working well, but it was extremely difficult to make it understand floating point notation. I was tempted to reinvent the wheel, but then I grabbed the idea from the SCI and ENG notations used in CASIO pocket calculators. It was even more difficult to play with the stack to make the engine understand operators, their precedence, and their associativity.

/*
 *  precedence   operators       associativity
 * ---------------------------------------------
 *  1            !               right to left
 *  2            * / %           left to right
 *  3            + -             left to right
 *  4            =               right to left
 */

Now it seems to be working:

# eval 1.1 + 2.2 * 3.0E9

yards analysis: PASSED
stack analysis: PASSED

rpn_v[]={ 0 2 4 3 1 }

[1.1] 1 token_FloatinPointENGValue, type15
[2.2] 1 token_FloatinPointENGValue, type15
[3.0E9] 1 token_FloatinPointSCIValue, type16
[*] -1 token_Asterisk, type55
[+] -1 token_Plus, type53

#########################
#    good expression    #
#########################
# eval 1+2+

yards analysis: PASSED
stack analysis: FAILED

#########################
#    bad  expression    #
#########################
# eval 1+2+(2+3

Error: parentheses mismatched/case1

yards analysis: FAILED
stack analysis: PASSED

#########################
#    bad  expression    #
#########################

But as you can see in the first eval above, the calculator has also learnt how to tell "lies".

This happens because in computer science you have to care about

  • data size (8bit, 16bit, 32bit, ...)
  • data type (integer, floating point, ...)
  • data sign (unsigned, signed, ...)

and certainly other things, including "strong type checking"

Today I have implemented a few mechanisms to handle functions. Probably I am too addicted to strong data typing because I used Pascal and Ada for too long.

# eval Fa(Fb(1,2,3,4,5),Fc(6,7,8,9),0)

yards analysis: PASSED
stack analysis: PASSED

function_pool has 3 functions
 + function1 [Fa] { typeRet typeRet type12 } 3 args
 + function2 [Fb] { type12 type12 type12 type12 type12 } 5 args 
 + function3 [Fc] { type12 type12 type12 type12 } 4 args 

rpn_v[]={ 4 6 8 10 12 2 17 19 21 23 15 26 0 }

[1] 1 token_DecValue, type12
[2] 1 token_DecValue, type12
[3] 1 token_DecValue, type12
[4] 1 token_DecValue, type12
[5] 1 token_DecValue, type12
[Fb] -4 token_StrictAlphaNum, type19
[6] 1 token_DecValue, type12
[7] 1 token_DecValue, type12
[8] 1 token_DecValue, type12
[9] 1 token_DecValue, type12
[Fc] -3 token_StrictAlphaNum, type19
[0] 1 token_DecValue, type12
[Fa] -2 token_StrictAlphaNum, type19

#########################
#    good expression    #
#########################

I am considering the RPN method as a hypothetical way to transform a math expression into machine opcodes, a crazy idea for AS. I do not like methods like Recursive Descent.

Recipes of lexers, parsers, interpreters, compilers ... is anybody cooking their own salad of math?

let me know :D

Step away from the bong... (what poetic incomprehensibility!)

1+(2.2*3.0E9+ ( 0xdeadbeaf logicalAnd 0xffffffff ) )

That looks like a perfectly valid expression to me, at the parsing level. If you want to introduce rules like "you can't add the result of a logical operation to a floating point value", that would usually happen at a level above simple parsing. (Oh, I suppose you could have logical expressions be completely separate from mathematical expressions, but that doesn't seem particularly worthwhile.) The equivalent expression in C (substituting && for logicalAnd) is accepted as correct (it generates warnings for other reasons...).

is anybody cooking their own salad of math?

I once wrote subroutines for interactive Fortran, something like "parsefunction(nargs)" and "execfunction(x,y,...)". The first would let you type in a function like "cos(sqrt(x^2+y^2))/(1+sqrt(x^2+y^2))" and build up some simple stack-oriented (RPN-like) compiled code in memory: push x, sqr, push y, sqr, add, sqrt, cos, ... Then when you called the second, it would evaluate the expression given the values provided. We used it to feed the neat hidden-line surface-drawing program on the Tek4014 (sure beat having to recompile and relink the Fortran program each time you wanted to try a different function!). IIRC, it was based on the "my dear aunt sally" algorithm from "Best of Byte" (you can download that these days: https://archive.org/details/best_of_byte_volume_1_1977_06 ), and it was relatively straightforward to write (even in assembler, though it was a nice assembler!). It used the existing Fortran libraries for all the complex math functions.

Also, consider looking at Forth.

Lord Bison and his friends allow you to quickly design a lexer and a parser, hiding a lot of features and troubles from you, while Coco/R is quite able to build something that obeys the rules and grammar you have designed, but it hides details and makes you struggle if you want to customize too deeply.

Designing things from scratch is good for learning and for obtaining better code (if you have the time and the resources, of course). Bison & co.'s output doesn't look so good; it smells like ugly code.

westfw:
That looks like a perfectly valid expression to me, at the parsing level. If you want to introduce rules like “you can’t add the result of a logical operation to a floating point value”, that would usually happy at a level above simple parsing.

Ada performs “strong type checking”. I think I will introduce this feature at the same parsing level, so it will report the above expression as a bad expression because the type check does not pass:

yards analysis: PASSED
stack analysis: PASSED
type checking: FAILED <------------ new planned feature

westfw:
it was based on the “my dear aunt sally” algorithm from “Best of Byte”.)

Who knows My Dear Aunt Sally (the Math Game) ?

You know when you see students walking into the classroom full of energy and excitement, more engaged than ever during math lessons, raising their hands with the right answer more quickly when the teacher asks them to solve a problem? Why does this happen? The teacher decided to combine his or her teaching expertise with the innovative, skill-building math game “My Dear Aunt Sally”. Long before any paper or article described it, it had been used for decades to help students remember the mathematical order of operations.

My code is based on a modified version of Dijkstra’s shunting-yard algorithm, generalized into an operator-precedence parser.

westfw:
… and build up some simple stack-oriented (rpn-like) compiled code in memory

It sounds like raising your power with Reverse Polish Notation :stuck_out_tongue:
Ironically, I am planning something similar because I do not really like the Recursive Descent method. I have already implemented it in a (simplified) Pascal-like interpreter and … I struggled too much.

westfw:
Also, consider looking at Forth.

Forth, a stack-oriented programming language. It’s good for RPN, but …
… I have a silver medal; it says “in strong type checking I trust” :smiley:

I still have to implement type checking, but I have coded a simple codegen. It tries to generate opcodes from an RPN expression.

Using an ideal RISC machine model, here are a few simple examples.

# codegen 1+2+3

yards analysis: PASSED
stack analysis: PASSED

function_pool_show_all ... 0 functions

rpn_v[]={ 0 2 1 4 3 }

[1]  1 K3 cookie     token_DecValue, type12
[2]  1 K3 cookie     token_DecValue, type12
[+] -1 K5 operator   token_Plus, type53
[3]  1 K3 cookie     token_DecValue, type12
[+] -1 K5 operator   token_Plus, type53

#########################
#    good expression    #
#########################

##############[ code gen ]##############
load r1,1
load r2,2
uadd r1,r1,r2
load r2,3
uadd r1,r1,r2
# codegen 1+Fa(2)+Fb(3+4)

yards analysis: PASSED
stack analysis: PASSED

function_pool_show_all ... 2 functions
 + function1 [Fa] { type12 } 1 args ticket_rpn=2 
 + function2 [Fb] { type12 } 1 args ticket_rpn=7 

rpn_v[]={ 0 4 2 1 9 11 10 7 6 }

[1]  1 K3 cookie     token_DecValue, type12
[2]  1 K4 f_cookie   token_DecValue, type12
[Fa]  0 K2 function   token_StrictAlphaNum, type19
[+] -1 K5 operator   token_Plus, type53
[3]  1 K4 f_cookie   token_DecValue, type12
[4]  1 K4 f_cookie   token_DecValue, type12
[+] -1 K6 f_operator token_Plus, type53
[Fb]  0 K2 function   token_StrictAlphaNum, type19
[+] -1 K5 operator   token_Plus, type53

#########################
#    good expression    #
#########################

##############[ code gen ]##############
load r1,1
load_and_push r2,2
uadd SP,SP,2 /* arg_n, return address */
call Fa
load r2,(SP) /* load function result */
uadd r1,r1,r2
load_and_push r2,3
load_and_push r3,4
uadd r2,r2,r3
usub SP,SP,2
push r2
uadd SP,SP,2 /* arg_n, return address */
call Fb
load r2,(SP) /* load function result */
uadd r1,r1,r2
# codegen 1+Fa(2)+Fb(3+4)
   :
load r1,1
load_and_push r2,2
uadd SP,SP,2 /* arg_n, return address */
call Fa

Why are you adding 2 to the stack pointer? Presumably "load_and_push" and "call" both do their own manipulation of SP, which should be sufficient. Is that just a spot for the return value? Separate call and data stacks, perhaps?

I don't see any pops? For my own ancient effort, I sacrificed efficiency for simplicity, and implemented a strict dual-stack Stack Oriented Machine, which would have generated something like:

push datastack, 1
push datastack, 2
call do_Fa     ; pops (2), puts result on datastack
call add   ; pops and adds 1 and Fa result, puts sum on datastack
push datastack, 3
push datastack, 4
call add
call do_Fb
call add    ; Fb result + previous sum.
;; final result now on stack.

I may have been somewhat influenced by Forth at the time :-)

PS: do you have "The Dragon Book"?

The above code is just a proof of concept. It shows that the RPN method is good for code-gen, and that it may be used in hybrid solutions; in my case it is not applied to a pure stack machine.

westfw: Is that just a spot for the return value?

it's a spot, exactly.

westfw: Stack Oriented Machine

I have 64 CPU registers, so I'd better use them.

westfw: PS: do you have "The Dragon Book"?

sure, theoretically fine, practically useless.

I have to take a few decisions about how to implement some of the soft core’s details, so these lines are a proof of concept:

load_and_push r2,2
uadd SP,SP,2 /* arg_n, return address */
call Fa
load r2,(SP) /* load function result */

ans=Fa(arg1)

Once called, the function “Fa” sees this stack:

0x1000: arg1, #2 /* maybe I will use byte_size of the full arg vector instead of how many 32-bit words */
0x1004: arg_n; in this case size(arg1)…size(arg_n) is always 4 bytes, and the function has 1 argument, so arg_n = 1
0x1008: return address
0x100C: local stack, function’s local variables <------------- SP points here

Once the function returns, the caller sees this:

0x1000: ans <------------- SP points here, this location contains the function return value

With load r2,(SP), the CPU register r2 contains the function result, which is good for my demonstration purposes.

this line is just a proof of concept
uadd SP,SP,2 /* arg_n, return address */

It should push arg_n instead:

push arg_n /* how many arguments */
uadd SP,SP,1 /* reserve space for the return address */

Actually, “call” is not able to push the return address directly.

in my soft core, when you call a function the RA is stored in r31
so inside a function you have to manually copy r31 into the stack
something like memwr -1(SP),r31

These are implementation details.

I am satisfied: the RPN method is simpler than Recursive Descent and, even though it is stack oriented, it permits writing hybrid codegen methods for RISC machines.

A pure stack codegen looks this way:

codegen 1+Fa(2)+Fb(3+4)
…
push 1
push 2
push 1 /* arg_n */
push ret_addr
call Fa
pop r3
pop r2
uadd r1,r2,r3
push r1
push 3
push 4
pop r3
pop r2
uadd r1,r2,r3
push r1
push 1 /* arg_n */
push ret_addr
call Fb
pop r3
pop r2
uadd r1,r2,r3
push r1