Nitisa. Open-source C++ GUI framework with Form Builder

Scripting

In this article you will learn how to set up and use the Scripting module in your applications. The article explains the module architecture and the principles behind it in detail. Although the module is powerful and customizable, it is easy to use and understand.

Scripting module features
Basic explanation
Parsing into tokens
Building token list into expression
Calculating expression
Basic usage example

Scripting module features

The main goal of the Scripting module is to provide the ability to parse and run custom scripts. A script may be written in any language: for example, JavaScript or even a new scripting language you develop. The module is not limited to running scripts. You can also use it to create parsers for different formats such as JSON, XML, and HTML, or in compiler development to simplify parsing the source code of your programming language before compiling it.

Basic explanation

Parsing and running scripts in the module is separated into three main parts, each of which is explained in detail later.

Parsing into tokens is the process of reading source code and converting it into small parts, called tokens, for further processing.
Building an expression transforms the plain list of tokens into structured data called an expression.
Calculating an expression is the process of evaluating the expression, performing mathematical operations and calling functions, to produce the final value encoded in the expression. This is the stage that actually runs the script.

These three parts of scripting have three helper entities: tokens, expressions, and values. The entire process can be described as follows.

The Parser takes source code and produces tokens, the Builder takes tokens and produces an expression, and finally the Calculator processes the expression to get the final result in the form of a variable. The TOKEN structure represents a token, the CExpression class represents an expression, and the CVariable class represents a variable.

Parsing into tokens

Any source code can be represented as a sequence of small parts called tokens. Let's look at the script source code below.

sin(2 * pi + x)

What do you think are the best candidates for tokens in this expression? If you think the same way we do, you will answer that they are sin, (, 2, "space", *, "space", pi, "space", +, "space", x, and ). We use the word "space" to denote a token that is a space character. As you can see, there are different types of tokens. In this example the tokens sin, pi, and x are identifiers, and the token 2 is a number. The rest can be called operators. By operators we mean not only mathematical operators but also any other symbols or symbol sequences whose purpose is to separate other tokens from each other or to modify them. For example, tokens that represent space, tab, and new line characters are separation tokens, while mathematical tokens represent mathematical operations such as +, -, *, and /. The token type in the module is defined by the TOKEN_TYPE enumeration. There is also a set of predefined operator types, described by the OPERATOR_TYPE enumeration.
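The token split described above can be illustrated with a small standalone sketch. It does not use CParser or the TOKEN structure: the Kind enumeration, the Tok structure, and the Tokenize function are hypothetical names invented for this illustration, and the rules (letters start an identifier, digits form a number, every other character is a one-symbol operator) are a simplifying assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; the module's real types are TOKEN and TOKEN_TYPE.
enum class Kind { Identifier, Number, Operator };

struct Tok
{
    Kind kind;
    std::string text;
};

// Split source into identifiers, integer numbers, and single-character operators.
// Space characters are kept as operator tokens, as in the walkthrough above.
std::vector<Tok> Tokenize(const std::string &source)
{
    std::vector<Tok> result;
    std::size_t i{ 0 };
    while (i < source.size())
    {
        const std::size_t start{ i };
        const unsigned char c{ static_cast<unsigned char>(source[i]) };
        if (std::isalpha(c))
        {
            // Identifier: a letter followed by letters or digits
            while (i < source.size() && std::isalnum(static_cast<unsigned char>(source[i])))
                i++;
            result.push_back({ Kind::Identifier, source.substr(start, i - start) });
        }
        else if (std::isdigit(c))
        {
            // Number: a run of decimal digits
            while (i < source.size() && std::isdigit(static_cast<unsigned char>(source[i])))
                i++;
            result.push_back({ Kind::Number, source.substr(start, i - start) });
        }
        else
        {
            // Anything else is treated as a one-symbol operator
            i++;
            result.push_back({ Kind::Operator, source.substr(start, 1) });
        }
    }
    return result;
}
```

Under these assumptions, calling Tokenize on sin(2 * pi + x) yields twelve tokens, the same ones listed above, with each space kept as its own operator token.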

Parsing source code into tokens is performed by the CParser class. Before parsing you have to define the source code language and the set of allowed operators. The language is defined by the following options, each of which has a corresponding getter and setter method in CParser (for example, getSymSpace() is the getter and setSymSpace() is the setter).

CaseSensitive property defines whether the source code language uses case-sensitive names and identifiers.
SymSpace property defines the allowed space symbols, such as space and tab characters.
SymNewLine property defines the list of allowed line separators, such as the new line character.
SymIdentifier property defines the list of all symbols allowed in identifiers. For example, for a C++-like language the list of these symbols is abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.
SymIdentifierNotStart property defines additional symbols allowed in identifiers, but an identifier must not start with any of them. For example, for a C++-like language they are 0123456789.
SymNotAll property defines additional symbols allowed in identifiers, but an identifier cannot consist only of these symbols. For example, for a C++-like language it is _.
Escape function, if set, should check whether there is an escape character at the specified position in the source code. For example, strings in a C++-like language are enclosed in "" (for example, "This is string"). A string can contain escape sequences that represent special characters, for example "This \t sequence represents tab character". The purpose of this function is to detect such sequences inside strings so that they are not processed incorrectly.
ParseBin property may contain a pointer to a function used to parse binary constants. For example, 0b10101010 is a binary constant in C++.
ParseOct property may contain a pointer to a function used to parse octal constants. For example, 07070 is an octal constant in C++.
ParseHex property may contain a pointer to a function used to parse hexadecimal constants. For example, 0xFFFF is a hexadecimal constant in C++.
ParseNumber property may contain a pointer to a function used to parse ordinary base-10 integer and floating point numbers. For example, 25 or 3.8.

The SymIdentifier, SymIdentifierNotStart, and SymNotAll properties together define the rules for identifiers. Taking the example settings above, the identifier rules of a C++-like language can be stated in plain words as follows: an identifier may contain Latin letters, digits, and the underscore symbol; it must not start with a digit and cannot consist only of underscores.
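The way the three symbol sets combine can be sketched as a standalone check. This is not how CParser is implemented internally; the IdentifierRules structure and the CppLikeRules, Contains, and IsIdentifier functions are hypothetical names, and the sketch only encodes the rules as they are worded above.

```cpp
#include <string>

// Hypothetical container for the three symbol sets; each comment names
// the CParser property whose role the field plays.
struct IdentifierRules
{
    std::string Regular;  // role of SymIdentifier
    std::string NotStart; // role of SymIdentifierNotStart
    std::string NotAll;   // role of SymNotAll
};

// Settings matching the C++-like examples given in the property list
IdentifierRules CppLikeRules()
{
    return IdentifierRules{
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "0123456789",
        "_"
    };
}

bool Contains(const std::string &set, char c)
{
    return set.find(c) != std::string::npos;
}

// Returns true if text satisfies all three rules
bool IsIdentifier(const std::string &text, const IdentifierRules &rules)
{
    // An identifier is not empty and must not start with a NotStart symbol
    if (text.empty() || Contains(rules.NotStart, text.front()))
        return false;
    bool only_not_all{ true };
    for (const char c : text)
    {
        if (Contains(rules.NotAll, c))
            continue; // allowed, but does not make the identifier valid on its own
        if (!Contains(rules.Regular, c) && !Contains(rules.NotStart, c))
            return false; // symbol is not allowed in identifiers at all
        only_not_all = false;
    }
    return !only_not_all; // reject names built only from NotAll symbols
}
```

With CppLikeRules(), the names value1 and _tmp are accepted, while 1value, a-b, and a lone _ are rejected.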

Binary, octal, and hexadecimal constants, if parsed successfully by the corresponding function, are represented by integer type tokens.
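The signatures of the functions stored in ParseBin, ParseOct, and ParseHex are not shown in this article, so the following is only a standalone sketch of the idea: recognise the C++-style prefix, then accumulate the digits into one integer value. DigitValue and ParseIntegerConstant are hypothetical names, and overflow is deliberately not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Value of a digit character in the given base, or -1 if it is not a digit of that base
int DigitValue(char c, int base)
{
    int value{ -1 };
    if (c >= '0' && c <= '9')
        value = c - '0';
    else if (c >= 'a' && c <= 'f')
        value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        value = c - 'A' + 10;
    return value < base ? value : -1;
}

// Hypothetical helper: parses 0b..., 0x..., 0... (octal), or plain decimal text.
// Returns false if the text is not a valid constant. Overflow is not checked.
bool ParseIntegerConstant(const std::string &text, std::int64_t &value)
{
    int base{ 10 };
    std::size_t pos{ 0 };
    if (text.size() > 2 && text[0] == '0' && (text[1] == 'b' || text[1] == 'B'))
    {
        base = 2;
        pos = 2;
    }
    else if (text.size() > 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
    {
        base = 16;
        pos = 2;
    }
    else if (text.size() > 1 && text[0] == '0')
    {
        base = 8;
        pos = 1;
    }
    if (pos >= text.size())
        return false; // nothing to parse
    std::int64_t result{ 0 };
    for (; pos < text.size(); pos++)
    {
        const int digit{ DigitValue(text[pos], base) };
        if (digit < 0)
            return false; // symbol is not a digit of this base
        result = result * base + digit;
    }
    value = result;
    return true;
}
```

For the constants mentioned in the property list, this sketch gives 170 for 0b10101010, 3640 for 07070, and 65535 for 0xFFFF, all as ordinary integers.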

The final setting the parser requires is a list of allowed operators for your source code language. An allowed operator is defined by the TOKEN_OPERATOR structure, which lets you define the following.

Type of the operator. You may use the predefined operator types declared in the OPERATOR_TYPE enumeration.
Close operator type. It is used by so-called block operators, which define blocks. For example, a string is a block that starts with the " operator and ends with the same " operator; another example is a block that starts with { and ends with }.
Mathematical precedence of the operator. It is not used by the parser itself but is used later to arrange the parts of a mathematical expression according to operator precedence.
String representation of the operator and of its close operator, if the latter exists.
Symbol sequences that should be ignored or replaced in block operator values. This is used for block operators that define strings.
A set of flags that define the kind of the operator and its behaviour during expression building and the subsequent calculation.

Many predefined settings for token operators can be found on the Consts page.
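To make the ideas of a block operator, an escape check, and replaced symbol sequences more concrete, here is a standalone sketch that reads one string block. It does not use TOKEN_OPERATOR or the Escape function of CParser; ReadStringBlock is a hypothetical helper, and the handled escape sequences (\t, \n, and any other escaped symbol taken literally) are an assumption made for the illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reads a string block whose opening quote is at source[open].
// On success stores the block value, with escape sequences replaced, in value
// and returns the index of the closing quote. Returns npos if the block is not closed.
std::size_t ReadStringBlock(const std::string &source, std::size_t open, std::string &value)
{
    std::string result;
    for (std::size_t i{ open + 1 }; i < source.size(); i++)
    {
        const char c{ source[i] };
        if (c == '\\' && i + 1 < source.size())
        {
            // Escape sequence: consume the next symbol so an escaped quote does not close the block
            const char next{ source[++i] };
            if (next == 't')
                result += '\t';
            else if (next == 'n')
                result += '\n';
            else
                result += next; // \" and \\ keep the escaped symbol itself
        }
        else if (c == '"')
        {
            value = result; // closing operator found
            return i;
        }
        else
            result += c;
    }
    return std::string::npos;
}
```

Without the escape check, the quote inside "a \" b" would be taken as the end of the block, which is exactly the incorrect processing the Escape function is meant to prevent.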

There is another way to set up the parser. Methods such as SetPascal() and SetCpp() configure it with language settings and operator lists similar to the ones used in the corresponding language.

After setting up the parser, the only thing left is to call the Parse method repeatedly for as long as it returns a token. When it returns nullptr, the end of the source has been reached. Store the tokens found in a variable of the Tokens type.

Building token list into expression

After getting a list of tokens from the parser you can pass them to a builder to produce a structured object, which can later be used for calculating the final value or for any other purpose. The Builder is responsible for preparing this object, called an expression.

Like the parser, the builder has to be configured before use. You need to add the operators that define the starting points of different structures (like { and [ for opening blocks) and the separator operators that define the bounds of objects (like , for separating function arguments). If you want the builder to process procedures and subscripts, you have to add procedure and/or subscript operators (like ( and [). If you want the builder to process arrays, you have to add array operators (like [). You can also add the corresponding separators if your arrays, functions/procedures, and subscripts can have more than one argument. You also have to add complex separator operators to process complex values, which are similar to structures in programming languages (for example, . and -> are complex value separators in C++). All block starting operators have to be added as well, even if the same ones were already added as procedure, subscript, or array starting operators. You can turn the building of mathematical expressions on and off with the BuildMath property. This property controls whether mathematical expressions in the final expression are correctly arranged.
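What "correctly arranged" means for a mathematical expression can be illustrated with the classic shunting-yard algorithm, which reorders infix tokens according to operator precedence. This is a swapped-in textbook technique, not a description of how CBuilder works internally; ToPostfix is a hypothetical function, the precedence values are illustrative, all operators are treated as left-associative, and unbalanced parentheses are not reported.

```cpp
#include <map>
#include <string>
#include <vector>

// Reorder infix tokens into postfix (reverse Polish) order using shunting-yard
std::vector<std::string> ToPostfix(const std::vector<std::string> &tokens)
{
    // Higher value binds tighter; the numbers are illustrative only
    static const std::map<std::string, int> precedence{
        { "+", 1 }, { "-", 1 }, { "*", 2 }, { "/", 2 }
    };
    std::vector<std::string> output;  // operands and operators in final order
    std::vector<std::string> pending; // operators and "(" waiting for their operands
    for (const std::string &token : tokens)
    {
        const auto op = precedence.find(token);
        if (token == "(")
            pending.push_back(token);
        else if (token == ")")
        {
            // Flush operators back to the matching "("
            while (!pending.empty() && pending.back() != "(")
            {
                output.push_back(pending.back());
                pending.pop_back();
            }
            if (!pending.empty())
                pending.pop_back(); // drop the "(" itself
        }
        else if (op != precedence.end())
        {
            // Operators of equal or higher precedence must be applied first
            while (!pending.empty() && pending.back() != "(" &&
                   precedence.at(pending.back()) >= op->second)
            {
                output.push_back(pending.back());
                pending.pop_back();
            }
            pending.push_back(token);
        }
        else
            output.push_back(token); // operand: number or identifier
    }
    while (!pending.empty())
    {
        output.push_back(pending.back());
        pending.pop_back();
    }
    return output;
}
```

For the tokens 2, *, pi, +, x the result is 2 pi * x +, meaning the multiplication is applied before the addition, which is the arrangement a calculator needs.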

The builder also has a set of SetXXX methods that let you quickly configure it for particular languages.

After building an expression with the Build method, you also have to check whether an error occurred during the building process. You can access the last error with the getLastError() method. Always check for an error, because the builder returns an expression even if it was not built correctly.

Calculating expression

A correctly built expression can then be calculated using a calculator, whose standard implementation is the CCalculator class. The calculator has fewer configuration options than the parser and the builder. You can specify the set of functions/procedures allowed in your scripting language, if any are needed, and you can set the values of the variables the script uses. Let's look at the previous script example again.

sin(2 * pi + x)

To calculate this script we need values for the pi and x variables, and we also need the sin() function. Variables are very easy to define; the calculator has many methods for it. For example, assuming calculator is an instance of the CCalculator class, we can set the values of these variables with the following two lines of code in our program.

calculator.SetVariable(L"pi", 3.14);
calculator.SetVariable(L"x", 0.1);

As for functions/procedures, we need a class derived from IProcedure that implements the required function. We can then add it to the calculator with the AddProcedure method. There are many classes that implement different mathematical functions. All of these classes are named like CProcedureXXX and can be found in the Classes section.

If your script doesn't use variables or functions, you don't have to specify them at all.
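The role of variables and functions during calculation can be shown with a standalone sketch that evaluates a small expression tree. It does not use CCalculator, CExpression, or IProcedure; the Node and Context structures and the Num, Var, Bin, Call, and Evaluate functions are hypothetical names, and only one-argument functions and four binary operators are supported.

```cpp
#include <cmath>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Hypothetical expression node; it only mirrors the idea of a structured expression
struct Node
{
    enum class Kind { Number, Variable, Binary, Call };
    Kind kind;
    double number;                 // used when kind is Number
    std::string name;              // variable name, operator symbol, or function name
    std::vector<NodePtr> children; // operands or the call argument
};

NodePtr Num(double v) { return std::make_shared<Node>(Node{ Node::Kind::Number, v, "", {} }); }
NodePtr Var(const std::string &n) { return std::make_shared<Node>(Node{ Node::Kind::Variable, 0.0, n, {} }); }
NodePtr Bin(const std::string &op, NodePtr l, NodePtr r) { return std::make_shared<Node>(Node{ Node::Kind::Binary, 0.0, op, { l, r } }); }
NodePtr Call(const std::string &fn, NodePtr arg) { return std::make_shared<Node>(Node{ Node::Kind::Call, 0.0, fn, { arg } }); }

// Everything the script may refer to by name
struct Context
{
    std::map<std::string, double> variables;
    std::map<std::string, std::function<double(double)>> functions;
};

// Recursively evaluate the tree; unknown names throw std::out_of_range
double Evaluate(const Node &node, const Context &context)
{
    switch (node.kind)
    {
    case Node::Kind::Number:
        return node.number;
    case Node::Kind::Variable:
        return context.variables.at(node.name);
    case Node::Kind::Call:
        return context.functions.at(node.name)(Evaluate(*node.children.at(0), context));
    case Node::Kind::Binary:
    {
        const double l{ Evaluate(*node.children.at(0), context) };
        const double r{ Evaluate(*node.children.at(1), context) };
        if (node.name == "+") return l + r;
        if (node.name == "-") return l - r;
        if (node.name == "*") return l * r;
        if (node.name == "/") return l / r;
        throw std::invalid_argument("unknown operator " + node.name);
    }
    }
    throw std::logic_error("unknown node kind");
}
```

The script sin(2 * pi + x) corresponds to the tree Call("sin", Bin("+", Bin("*", Num(2), Var("pi")), Var("x"))). Evaluating it requires pi and x in the variables map and sin in the functions map; a script that uses neither variables nor functions works with an empty Context.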

Basic usage example

Here is an example that uses all three stages to calculate the final value of the script example we used before.

String script{ L"sin(2 * pi + x)" }; // Variable containing our script 

// Parsing 
CParser parser; // Create parser 
parser.SetCpp(); // Set C++ like language settings 
TOKEN *token; // Variable to store found token 
Tokens tokens; // Storage for all found tokens 
int index{ 0 }; // Start parsing tokens from the beginning of the script
while ((token = parser.Parse(script, 0, script.length() - 1, index)) != nullptr) // Parse the entire script while there are tokens
{
    if (token->Type == ttOperator && in(ofNotOperator, token->Operator->Flags)) // Operator tokens with the ofNotOperator flag are usually useless because only space symbols and new lines have this flag in their settings
        delete token; // Don't add useless token in list and delete it instead 
    else
        tokens.push_back(token); // Add useful token to list 
}

// Building 
CBuilder builder; // Create builder 
builder.SetCpp(); // Set C++ like language settings 
CExpression *expression{ builder.Build(tokens) }; // Build expression 
if (builder.getLastError().Type == errOk) // Check for absence of error 
{
    // Calculating 
    CCalculator calculator; // Create calculator 
    calculator.SetVariable(L"pi", 3.1415926); // Set value for pi variable 
    calculator.SetVariable(L"x", 0.1); // Set value for x variable 
    calculator.AddProcedure(new CProcedureSin()); // Add sin function 
    CVariable *result{ calculator.Calculate(expression, nullptr) }; // Calculate 
    if (result) // Check if we got the result 
    {
        std::wcout << String(*result) << std::endl; // Output result to console 
        delete result; // Delete result variable 
    }
    else // In case of errors we can, for example, print them to console to see what is wrong 
    {
        for (int i = 0; i < calculator.getErrorCount(); i++)
            PrintError(calculator.getError(i));
    }
}
delete expression; // Delete expression 
FreeTokens(tokens); // Delete tokens