Repository URL to install this package:
Version:
1.4.0 ▾
|
.. |
ast |
cl |
encoding |
matcher |
parser |
scanner |
token |
types |
variant |
README.md |
tpl.go |
Text processing is a common task in programming, and regular expressions have long been the go-to solution. However, regular expressions are notorious for their cryptic syntax and poor readability. Enter Go+ TPL (Text Processing Language), an enhanced alternative that offers both power and intuitive syntax.
Go+ TPL is a grammar-based language similar to EBNF (Extended Backus-Naur Form) that seamlessly integrates with Go+. It provides a more readable and maintainable approach to text processing while offering capabilities beyond what regular expressions can achieve.
To understand Go+ TPL, you need to grasp three key concepts:
The foundation of TPL is its naming rules, expressed as name = rule
. A TPL grammar consists of a series of named rules, with the first one being the root rule. The rule
can be a combination of:
INT
, FLOAT
, CHAR
, STRING
, IDENT
, "+"
, "++"
, "+="
, "<<="
, etc.IDENT
enclosed in quotes, such as "if"
, "else"
, "for"
.R1 R2 ... Rn
- matches a sequence of rules.R1 | R2 | ... | Rn
- matches any one of the rules.*R
- matches the rule zero or more times+R
- matches the rule one or more times?R
- matches the rule zero or one time (optional)R1 % R2
- shorthand for R1 *(R2 R1)
, representing a sequence of R1 separated by R2. For example, INT % ","
represents a comma-separated list of integers.R1 ++ R2
- indicates that R1 and R2 must be adjacent with no whitespace or comments between them.The default operator precedence is: unary operators (*R
, +R
, ?R
) > ++
> %
> sequence (space) > |
. Parentheses can be used to change the precedence.
STRING
(string literals) can take two forms:
"Hello\nWorld\n" // QSTRING (quoted string) `Hello World ` // RAWSTRING (raw string)
STRING
can be defined as:
STRING = QSTRING | RAWSTRING
Since TPL rules automatically filter whitespace and comments, the sequence R1 R2
doesn't express that R1 and R2 are adjacent. This is where the adjacency operator ++
comes in.
For example, Go+ domain text literal is defined as IDENT ++ RAWSTRING
, making these valid:
tpl`expr = INT % ","` json`{"name": "Ken", age: 15}`
While these would match IDENT STRING
but are not valid domain text literals:
tpl"expr = *INT" // IDENT must be followed by RAWSTRING, not QSTRING tpl/* comment */`expr = *INT` // No whitespace or comments allowed between IDENT and RAWSTRING
Each rule has its built-in matching result:
Tokens and Keywords: Result is *tpl.Token
.
Sequence (R1 R2 ... Rn
): Result is a list ([]any
) with n elements.
Repetition (*R
, +R
): Result is a list ([]any
) with elements depending on how many times R matches.
Alternatives (R1 | R2 | ... | Rn
): Result depends on which rule matches.
Optional (?R
): Result is either the result of R or nil
if no match.
List Operator (R1 % R2
): Result is a complex tree-like structure with three levels.
Let's explain why it has three levels:
R1 % R2
(i.e.R1 *(R2 R1)
), which is a list with two elements.R1
.(R2 R1)
matches.
R2
and the result of R1
.For example, when parsing "1, 2, 3"
with INT % ","
, the result structure would be:
[
<INT:1>, // First R1
[
[<COMMA>, <INT:2>], // First (R2 R1)
[<COMMA>, <INT:3>] // Second (R2 R1)
]
]
This tree-like structure preserves all the information about the matched elements and their relationships, but can be complex to work with directly. That's why TPL provides helper functions like ListOp
and BinaryOp
to transform this structure into more usable forms.
Adjacency Operator (R1 ++ R2
): Result is a list ([]any
) with 2 elements, similar to a R1 R2
sequence.
The default matching result is called "self" in TPL. You can rewrite this result using a Go+ closure => { ... }
.
This feature is crucial as it allows seamless integration between TPL and Go+. In Go+, you reference TPL through domain text literal, and within TPL, you can call Go+ code through result rewriting.
import "gop/tpl" cl := tpl` expr = INT % "," => { return tpl.ListOp[int](self, v => { return v.(*tpl.Token).Lit.int! }) } `! echo cl.parseExpr("1, 2, 3", nil)! // Outputs: [1 2 3]
This example parses a comma-separated list of integers and converts it to a flat list of integers using TPL's ListOp
function.
Creating a calculator with Go+ TPL is remarkably concise:
import "gop/tpl" cl := tpl` expr = operand % ("*" | "/") % ("+" | "-") => { return tpl.BinaryOp(true, self, (op, x, y) => { switch op.Tok { case '+': return x.(float64) + y.(float64) case '-': return x.(float64) - y.(float64) case '*': return x.(float64) * y.(float64) case '/': return x.(float64) / y.(float64) } panic("unexpected") }) } operand = basicLit | unaryExpr unaryExpr = "-" operand => { return -(self[1].(float64)) } basicLit = INT | FLOAT => { return self.(*tpl.Token).Lit.float! } `! echo cl.parseExpr("1 + 2 * -3", nil)! // Outputs: -5
This calculator handles basic arithmetic operations with proper operator precedence in less than 30 lines of code.
Go+ TPL offers a powerful yet intuitive alternative to regular expressions for text processing. By combining grammar-based parsing with seamless Go+ integration, it enables developers to create clear, maintainable text processing solutions.
For more examples of TPL in action, check out the Go+ demos starting with tpl-
at https://github.com/goplus/gop/tree/main/demo. These examples showcase how to implement calculators, parse text to generate ASTs, and even implement entire languages in just a few hundred lines of code.
Whether you're parsing structured text, building domain-specific languages, or implementing complex text transformations, Go+ TPL provides a robust and readable approach that surpasses traditional regular expressions.