Common State Functions
Swift
Each state has three key functions you can use to easily build your tokeniser. Each state also implements Printable, returning its own OK Script representation as its description. This is useful if you want to confirm the syntax before using it in an OK Script.
branch(states:TokenizationState...)->TokenizationState
Accepts a series of states which will be added as branches of the target state. It returns the target state to enable chaining.
sequence(states:TokenizationState...)->TokenizationState
Chains the supplied states together. Unlike branch, each state is attached as a branch of the state before it in the list. As with branch, it returns the target state to enable chaining.
token(tokenName:String)
token(token:Token)
token(with:TokenCreationBlock)
These three methods specify the token that should be created if the state is satisfied. In general they should be used on terminal states, but each method returns the target state for chaining should that not be the case.
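To illustrate how branch and sequence differ, here is a minimal sketch. SketchState is a hypothetical stand-in, not the library's TokenizationState: it only records branches and a token name, and its description loosely imitates the OK Script a real state prints (placing ->name before the branches of a state that has both is an assumption). It conforms to CustomStringConvertible, the current Swift name for Printable.

```swift
// Hypothetical toy model, NOT the library's TokenizationState class.
final class SketchState: CustomStringConvertible {
    let label: String               // OK Script fragment for this state, e.g. "a" in quotes
    var branches: [SketchState] = []
    var tokenName: String?

    init(_ label: String) { self.label = label }

    // branch: every supplied state becomes a direct branch of self.
    @discardableResult
    func branch(_ states: SketchState...) -> SketchState {
        branches.append(contentsOf: states)
        return self                 // returned for chaining
    }

    // sequence: each state becomes a branch of the one before it (self -> s0 -> s1 ...).
    @discardableResult
    func sequence(_ states: SketchState...) -> SketchState {
        var current: SketchState = self
        for state in states {
            current.branches.append(state)
            current = state
        }
        return self                 // still the target state, not the last in the chain
    }

    @discardableResult
    func token(_ name: String) -> SketchState {
        tokenName = name
        return self
    }

    // Builds an OK Script-like description, in the spirit of the Printable conformance.
    var description: String {
        var text = label
        if let name = tokenName { text += "->" + name }
        switch branches.count {
        case 0: break
        case 1: text += "." + branches[0].description
        default: text += ".{" + branches.map { $0.description }.joined(separator: ",") + "}"
        }
        return text
    }
}

// branch: both states hang directly off "a".
let ab = SketchState("\"a\"").branch(SketchState("\"b\"").token("ab"),
                                     SketchState("\"c\"").token("ac"))
print(ab)   // "a".{"b"->ab,"c"->ac}

// sequence: each state hangs off the previous one.
let abc = SketchState("\"a\"").sequence(SketchState("\"b\""), SketchState("\"c\""))
print(abc)  // "a"."b"."c"
```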
OK Script
The . Operator
In OK Script a branch is added using the . operator.
"a"."b"
The example above says a should be followed by b. If you would like a to be followed by b OR c, simply enclose them in a branch (see Branch below for more details):
"a".{"b","c"}
Creating tokens
Tokens are created using the -> operator followed by the name of the token to be emitted. For example, to extend the last example to emit either an "ab" or an "ac" token, we simply do the following:
"a".{"b"->ab,"c"->ac}
Token names must start with an English letter, but after that may also contain decimal digits, - (dash) or _ (underscore).
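The naming rule can be expressed as a small validation function. This is a sketch only: isValidTokenName is a hypothetical helper, not part of the library, and it assumes letters remain allowed after the first character (as in names such as hexDigit and double-quote used later in this document).

```swift
// Hypothetical helper (not part of the library) encoding the token naming rule:
// first an English letter, then English letters, decimal digits, '-' or '_'.
func isValidTokenName(_ name: String) -> Bool {
    let lower: ClosedRange<Unicode.Scalar> = "a"..."z"
    let upper: ClosedRange<Unicode.Scalar> = "A"..."Z"
    let digits: ClosedRange<Unicode.Scalar> = "0"..."9"
    let scalars = name.unicodeScalars
    guard let first = scalars.first, lower.contains(first) || upper.contains(first) else {
        return false  // empty, or the first character is not an English letter
    }
    return scalars.dropFirst().allSatisfy { s in
        lower.contains(s) || upper.contains(s) || digits.contains(s) || s == "-" || s == "_"
    }
}
```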
Branch
Branch is the basic state: it maintains its branches in an ordered array. When a Branch state is evaluated, it searches its child states for one that can accept the available character. If subsequently asked to consume that character, it passes it to that child state.
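The lookup a Branch performs can be sketched with a few toy types. Acceptor, CharAcceptor and BranchAcceptor are hypothetical names for illustration only. The sketch also assumes that the first child in array order that can accept the character is chosen; the text above only says the branches are kept in an ordered array.

```swift
// Hypothetical toy model of Branch evaluation; these types are illustrative only
// and are not the library's TokenizationState classes.
protocol Acceptor {
    func canAccept(_ c: Character) -> Bool
}

// A leaf that accepts a fixed set of characters.
struct CharAcceptor: Acceptor {
    let allowed: Set<Character>
    func canAccept(_ c: Character) -> Bool { allowed.contains(c) }
}

// Keeps its children in an ordered array, like Branch.
struct BranchAcceptor: Acceptor {
    let children: [Acceptor]

    // Assumption: the first child (in array order) that can accept c is chosen.
    func childAccepting(_ c: Character) -> Int? {
        children.firstIndex { $0.canAccept(c) }
    }

    // The branch can accept c if any child can; consuming c would then be
    // delegated to that child.
    func canAccept(_ c: Character) -> Bool { childAccepting(c) != nil }
}

let branch = BranchAcceptor(children: [CharAcceptor(allowed: ["a"]),
                                       CharAcceptor(allowed: ["a", "b"])])
// branch.childAccepting("a") is 0 (both children accept "a"; the first wins)
// branch.childAccepting("b") is 1
// branch.childAccepting("z") is nil
```

Because BranchAcceptor is itself an Acceptor, branches nest the same way Branch states can contain other Branch states.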
Swift
Branch()
Constructs an empty Branch state.
Branch(states:TokenizationState...)
Constructs a Branch with the specified branches.
OK Script
Branches have very simple syntax: you simply list the states, separated by commas, between an opening { and a closing }.
{State, State, ..., State}
You will also see this Branch syntax used wherever a set of states must be supplied. For example, a Branch with two Char states is simply:
{"a","b"}
Char
Swift
The Char state accepts or rejects characters supplied to its initialiser. These are supplied in a Swift String, and the constructor you use determines whether only the characters in the String are accepted, or any character except those in the String.
Char(from:String)
Only characters from the supplied String are accepted.
Char(except:String)
Any characters except those in the supplied String are accepted.
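The two constructors differ only in whether membership in the String is inverted. CharSketch below is a hypothetical stand-in written for illustration, not the library's Char type; it models only the accept/reject decision.

```swift
// Hypothetical stand-in for the Char state's two constructors (not the real type).
struct CharSketch {
    private let characters: Set<Character>
    private let inverted: Bool   // true when built with except:

    // Accept only the characters in `from`.
    init(from: String) {
        characters = Set(from)
        inverted = false
    }

    // Accept any character except those in `except`.
    init(except: String) {
        characters = Set(except)
        inverted = true
    }

    func accepts(_ c: Character) -> Bool {
        // Membership decides acceptance; the except: form flips the result.
        return characters.contains(c) != inverted
    }
}

let vowel = CharSketch(from: "aeiou")
let notQuote = CharSketch(except: "\"")
// vowel.accepts("e") is true, vowel.accepts("x") is false
// notQuote.accepts("x") is true, notQuote.accepts("\"") is false
```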
OK Script
The Char state is represented by double quotes ("") with the characters between the quotes forming the accept string. If you wish to invert it (make any character except the ones supplied acceptable), simply prefix the opening " with a !:
"a"   //Only a accepted
!"a"  //Everything except "a" accepted
The following escape codes may be used: \" for double quote, \\ for backslash, \n for newline, \t for tab and \r for carriage return.
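Decoding those escape codes is a simple left-to-right scan. unescapeCharString is a hypothetical helper, not the library's parser; treating any other escape sequence, or a trailing lone backslash, as an error is an assumption, since the text does not say how the library handles them.

```swift
// Hypothetical helper (not the library's parser): decodes the Char escape codes
// \" \\ \n \t and \r in the body of an OK Script Char string.
enum EscapeError: Error {
    case unknownEscape(Character)   // assumption: other escapes are rejected
    case danglingBackslash          // a backslash at the very end of the body
}

func unescapeCharString(_ body: String) throws -> String {
    var result = ""
    var iterator = body.makeIterator()
    while let c = iterator.next() {
        guard c == "\\" else { result.append(c); continue }
        guard let next = iterator.next() else { throw EscapeError.danglingBackslash }
        switch next {
        case "\"": result.append("\"")
        case "\\": result.append("\\")
        case "n": result.append("\n")
        case "t": result.append("\t")
        case "r": result.append("\r")
        default: throw EscapeError.unknownEscape(next)
        }
    }
    return result
}
```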
Repeat
The Repeat state counts the number of tokens issued by a child set of states (which become the root of the tokenisation process until the repeat fails or is satisfied). You may specify both a minimum and a maximum number of times the state should be entered.
Swift
Repeat(repeatingState:State, min:Int = 1, max:Int?)
The repeating state can be any state (including, for example, a Branch() or another Repeat state). The Repeat state is exited when repeatingState can no longer be entered. At this point any token specified on the state is emitted and its branches are evaluated. A token will not be emitted if repeatingState has not issued the minimum number of tokens. As soon as the maximum number (if specified) is reached, the state will exit through its branches, or directly to the parent state.
OK Script
The OK Script very closely mirrors the Swift.
(repeated-state[,min[,max]])
For example, to match exactly two hexadecimal digits:
("0123456789abcdefABCDEF"->hexDigit,2,2)->byte
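The min/max rules above can be modelled as a loop that keeps entering the child state until it fails or the maximum is hit, then checks the minimum. repeatMatch is a hypothetical function, not the library's implementation, and a closure stands in for repeatingState; it reports how many repetitions matched rather than emitting real tokens.

```swift
// Toy model of Repeat's min/max rules (hypothetical; not the library's implementation).
// `matchOnce` stands in for repeatingState: it returns how many characters a single
// repetition consumes at `index`, or nil if the state cannot be entered there.
func repeatMatch(_ input: [Character], from start: Int, min: Int = 1, max: Int? = nil,
                 matchOnce: ([Character], Int) -> Int?) -> (count: Int, end: Int)? {
    var count = 0
    var index = start
    while max.map({ count < $0 }) ?? true,               // exit as soon as max is reached
          let used = matchOnce(input, index), used > 0 {  // exit when the child can't be entered
        count += 1
        index += used
    }
    guard count >= min else { return nil }               // below min: no token is emitted
    return (count: count, end: index)
}

// The hex-digit example: ("0123456789abcdefABCDEF"->hexDigit,2,2)->byte
let hexDigits = Set("0123456789abcdefABCDEF")
let hexDigit: ([Character], Int) -> Int? = { chars, i in
    i < chars.count && hexDigits.contains(chars[i]) ? 1 : nil
}
let byte = repeatMatch(Array("7fz"), from: 0, min: 2, max: 2, matchOnce: hexDigit)
// byte?.count is 2 and byte?.end is 2: two digits matched, "z" is left for the next state
```

The `used > 0` check is a design choice of the sketch: it stops a child that matches without consuming anything from looping forever.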
Delimited
Delimited states allow you to enter a completely different tokenisation strategy when a delimiter is encountered. You may specify a single delimiter (e.g. ') or separate opening and closing delimiters (e.g. [ and ]). Unlike Repeat states, the tokens emitted by states inside the delimiters are published in the normal fashion. The delimiter itself issues the specified token when it is entered and when it is exited.
Swift
Delimited(delimiter:String,states:TokenizationState...)
Creates a delimited state using a single string for both the start and end of the delimited region. Any number of states can be supplied; they act as the root of a tokenizer until the delimiter character is encountered again.
Delimited(open:String,close:String,states:TokenizationState...)
As above but a separate opening and closing delimiter can be specified.
OK Script
Delimited states are specified between < and > characters. They take up to three parameters (just like the Swift constructors).
<'opening-delimiter'[,'closing-delimiter'], delimited-states>
If a closing delimiter is not specified, the opening delimiter is used for both. If you wish to use ' as a delimiter it must be escaped (\'), and a backslash can be used by escaping it too (\\).
Only one character can be used for a delimiter.
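When generating OK Script from Swift, the quoting rules above can be applied mechanically. okScriptDelimited and quoteDelimiter are hypothetical helpers written for illustration; they only build the text of a Delimited definition and take the delimited states as an already-written OK Script string. Omitting the closing delimiter when it equals the opening one follows the defaulting rule described above.

```swift
// Hypothetical helpers: render a Delimited state in the OK Script form
// <'opening'[,'closing'],delimited-states>, escaping ' and \ as described above.
func quoteDelimiter(_ c: Character) -> String {
    switch c {
    case "'": return "'\\''"      // ' must be written as \'
    case "\\": return "'\\\\'"    // a backslash must be written as \\
    default: return "'\(c)'"
    }
}

func okScriptDelimited(open: Character, close: Character? = nil, states: String) -> String {
    var parts = [quoteDelimiter(open)]
    // The closing delimiter defaults to the opening one, so only emit it when it differs.
    if let close = close, close != open { parts.append(quoteDelimiter(close)) }
    parts.append(states)
    return "<" + parts.joined(separator: ",") + ">"
}

let brackets = okScriptDelimited(open: "[", close: "]", states: "x")
// brackets is <'[',']',x>
```

Taking Character rather than String for the delimiters mirrors the one-character limit.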
As an example, here any character other than a double quote is accepted between double quotation marks:
<'"',{!"\""->char}>->double-quote
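The token stream produced by the double-quote example can be approximated with a small scanner. tokenizeDelimited is a hypothetical function, not the library's Delimited implementation: it emits the delimiter token on entry and exit, and hands each character in between to a closure that stands in for the delimited states. It fails (returns nil) if no inner state accepts a character or the closing delimiter never arrives.

```swift
// Toy model of a Delimited state (hypothetical; not the library's implementation).
// Emits `delimiterToken` on entry and exit; in between, each character is passed to
// `inner`, which plays the role of the delimited states and names the token to emit.
func tokenizeDelimited(_ input: String, open: Character, close: Character,
                       delimiterToken: String,
                       inner: (Character) -> String?) -> [String]? {
    var chars = input[...]                     // a Substring, so we can pop from the front
    guard chars.popFirst() == open else { return nil }
    var tokens = [delimiterToken]              // entering the delimiter
    while let c = chars.popFirst() {
        if c == close {
            tokens.append(delimiterToken)      // exiting the delimiter
            return tokens
        }
        guard let name = inner(c) else { return nil }  // no inner state accepts c
        tokens.append(name)
    }
    return nil                                 // input ended before the closing delimiter
}

// Mirrors the double-quote example: every character except " becomes a char token.
let tokens = tokenizeDelimited("\"hi\"", open: "\"", close: "\"",
                               delimiterToken: "double-quote") { c in
    c == "\"" ? nil : "char"
}
// tokens is ["double-quote", "char", "char", "double-quote"]
```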