Script


In this article you will learn how to use and configure the Script module in your applications. The article explains the module architecture and the principles of its work in detail. Although the module is very powerful and customizable, it is easy to use and understand.



Fast Start

Sometimes you need to let a user enter simple mathematical expressions and have your application calculate them. The module provides this capability out of the box. The ready-to-use class CExpressionRunner decodes and executes a mathematical expression in only three lines of code. Here they are.

nitisa::script::generic::CExpressionRunner runner;
if (runner.Prepare(L"1 + 2 * 3") && runner.Run()) // Both methods return false on failure
    std::wcout << (nitisa::String)runner.Result << std::endl; // Print the execution result

The first line just declares the runner variable of type CExpressionRunner. The second line calls two methods of the class. The Prepare() method prepares the expression and the Run() method executes the prepared expression. Either method can return false if something went wrong and the expression cannot be executed properly (for instance, the expression might be invalid). That is why we put the calls into an if statement. Finally, we print the execution result to standard output. The execution result is stored in the Result member of the class. It may hold a value of different types (boolean, integer, string, etc.).

That is all you need to start using the expression calculation capabilities of the module. In the next sections we explain the architecture of the module in detail and show all the functions and operators available in expressions by default.

Module Concepts

There are a few basic concepts the module is built on.

The first one is a Reader. It reads a source (usually a string) character by character and may transform a character before returning it. A reader is described by the IReader interface. Please pay attention to its two Read() methods. The first one reads a character, optionally transforms it, and returns it. The second one reads a sequence of characters from the source and returns it as an untransformed string. If the requested character index is out of bounds, the first method returns a character with code 0. The second method returns only the part of the source which lies in the allowed range (specified in the Min and Max properties), even if the requested part extends beyond that range. There are three readers available in the module: CReaderLowerCase returns characters transformed to lower case, CReaderUpperCase returns characters transformed to upper case, and CReaderSensitive returns untransformed characters.
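
The behaviour of the two Read() methods can be illustrated with a small standalone sketch. It does not use the module: the class name LowerCaseReaderSketch is hypothetical, the signatures are assumptions, and the allowed range is simplified to the whole source string instead of the Min and Max properties.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <utility>

// Hypothetical stand-in for a lower-case reader. It is NOT the module's IReader.
class LowerCaseReaderSketch
{
public:
    explicit LowerCaseReaderSketch(std::wstring source) : m_source(std::move(source)) {}

    // Reads one character and transforms it; returns 0 if index is out of bounds
    wchar_t Read(std::size_t index) const
    {
        if (index >= m_source.size())
            return L'\0';
        return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(m_source[index])));
    }

    // Reads a sequence without transformation; only the part inside the source is returned
    std::wstring Read(std::size_t index, std::size_t count) const
    {
        if (index >= m_source.size())
            return std::wstring();
        return m_source.substr(index, count); // substr() clamps count to the end of the source
    }

private:
    std::wstring m_source;
};
```

With the source L"AbC", the single-character method returns a lower-case character for a valid index and 0 for index 3, while the sequence method asked for 10 characters starting at index 1 returns only the two characters that exist.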

The second important concept is Lexic. A lexic describes the character groups of a specific language and their relation to the identifiers of that language. It is described by the ILexic interface. The interface contains a few character classification methods whose names start with Is, and the Escape() method, which is used to process escape sequences in strings if the language has them. The module contains lexic implementations for the following languages: C++, JSON and XML. Lexics are used together with the objects described below.

Each source needs to be transformed from a plain sequence of characters into intermediate objects called tokens. A token is a single indivisible unit of the source. It may be an integer number, a string, an identifier, an operator, etc. A Parser is responsible for extracting a single token from a source. It is described by the IParser interface. There are several parser implementations in the module: CParserHexadecimal parses hexadecimal numbers, CParserIdentifier parses identifiers, CParserNumeric parses decimal integer and floating point numbers, CParserOperator extracts operators from the source, and CParserString parses strings. There are also C++ specific versions of some parsers: CParserBinary (parses binary numbers), CParserHexadecimal, CParserNumeric, CParserOctal (parses octal numbers), and CParserString. The C++ versions support the number and string literal features of the latest C++ standard (the ' separator between digits, number postfixes, string prefixes, hexadecimal floating point numbers).

A parser parses only one specific kind of token. A Tokenizer is a collection of parsers responsible for parsing a source using a language-dependent set of parsers. It is described by the ITokenizer interface. It has the overloaded method Next(), which should be called until it returns false, meaning the end of the source is reached. The module has implementations of several tokenizers: the C++ CTokenizer parses source in the C++ language format, the Generic CTokenizer parses source in a simplified C++ language format and is used in the generic expression runner shown at the beginning of the article, the JSON CTokenizer parses source in the JSON format, and the XML CTokenizer parses source in the XML format. A tokenizer usually also contains the list of operators available in the language it is designed to work with. An operator is described by the Operator structure. It contains the operator name, whether the operator should be separated by spaces at both ends, its mathematical precedence and so on. That information is used to produce correct tokens and expressions.
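
To make the "call Next() until it returns false" pattern concrete, here is a minimal standalone sketch. It is not the module's ITokenizer: the names TokenizerSketch, Token and TokenType are hypothetical, and the parsing rules (decimal integers, alphanumeric identifiers, one-character operators) are assumptions chosen to keep the example short.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <utility>
#include <vector>

enum class TokenType { Integer, Identifier, Operator };

struct Token
{
    TokenType Type;
    std::wstring Value;
};

// Hypothetical tokenizer with three built-in "parsers"
class TokenizerSketch
{
public:
    explicit TokenizerSketch(std::wstring source) : m_source(std::move(source)), m_index(0) {}

    // Extracts the next token; returns false when the end of the source is reached
    bool Next(Token &token)
    {
        while (m_index < m_source.size() && std::iswspace(m_source[m_index]))
            m_index++; // spaces carry no meaning and are skipped
        if (m_index >= m_source.size())
            return false;
        const std::size_t start = m_index;
        if (std::iswdigit(m_source[m_index]))
        {
            while (m_index < m_source.size() && std::iswdigit(m_source[m_index]))
                m_index++;
            token.Type = TokenType::Integer;
        }
        else if (std::iswalpha(m_source[m_index]))
        {
            while (m_index < m_source.size() && std::iswalnum(m_source[m_index]))
                m_index++;
            token.Type = TokenType::Identifier;
        }
        else
        {
            m_index++; // every other character is a one-character operator in this sketch
            token.Type = TokenType::Operator;
        }
        token.Value = m_source.substr(start, m_index - start);
        return true;
    }

private:
    std::wstring m_source;
    std::size_t m_index;
};

// Collects all tokens of a source by calling Next() until it returns false
std::vector<Token> Tokenize(const std::wstring &source)
{
    TokenizerSketch tokenizer(source);
    std::vector<Token> tokens;
    Token token;
    while (tokenizer.Next(token))
        tokens.push_back(token);
    return tokens;
}
```

Calling Tokenize(L"1 + 2 * 3") yields five tokens: three integers and two operators.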

While a token is a basic meaningful lexical unit, the Expression object, or simply Expression, is a hierarchy of such units. Consider the mathematical expression 1 + 2 * 3 from the beginning of the article. It has 5 meaningful lexical units: 1 is an integer number, + is the addition operator, 2 is an integer number, * is the multiplication operator, and 3 is an integer number. You may argue that there are also space symbols, but they only improve readability and have no meaning during expression calculation. So, this mathematical expression may be represented by 5 tokens. But to calculate the expression properly we need to know that multiplication should be done before addition. Such information is added by the Expression. In this example the Expression works like adding brackets: 1 + (2 * 3), but it is represented as a set of interrelated interfaces. The Expression is described by the IExpression interface. There are several implementations of the expression: CExpressionBool represents a boolean value, CExpressionBrace represents an array value with a possible type cast, CExpressionCall represents a function call or just a grouped expression, CExpressionCast represents a data type cast, CExpressionFloat represents a floating point number, CExpressionIdentifier represents an identifier, CExpressionInteger represents an integer number, CExpressionNull represents a null/nullptr value, CExpressionOperator represents an operator, CExpressionSequence represents a sequence of expressions, CExpressionString represents a string value, and CExpressionSubscript represents access to an array element.

The Expression Builder is responsible for building the expression hierarchy from a list of tokens. It takes a list of tokens and converts it into an expression hierarchy. It is described by the IExpressionBuilder interface. It may also use a tokenizer instead of a ready list of tokens. In this case the tokenizer is used first to get the list of tokens. There is a generic implementation of the expression builder in the CExpressionBuilder class. The class is highly configurable. You can find the description of all its configuration properties on the class reference page.
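
One common way to turn a flat token list into such a hierarchy is precedence climbing. The sketch below uses that technique purely for illustration; the source does not say which algorithm CExpressionBuilder uses. All names (NodeSketch, Precedence, Build, Bracket) and the precedence table are hypothetical, and the input is assumed to be a well-formed sequence "operand (operator operand)*".

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: an operand, or an operator with two children
struct NodeSketch
{
    std::string Value;
    std::unique_ptr<NodeSketch> Left;
    std::unique_ptr<NodeSketch> Right;
};

// Assumed precedence table for this sketch only; 0 means "not a binary operator"
int Precedence(const std::string &op)
{
    if (op == "*" || op == "/")
        return 2;
    if (op == "+" || op == "-")
        return 1;
    return 0;
}

// Precedence climbing; min_precedence must be at least 1
std::unique_ptr<NodeSketch> Build(const std::vector<std::string> &tokens, std::size_t &index, int min_precedence)
{
    std::unique_ptr<NodeSketch> left(new NodeSketch{ tokens[index++], nullptr, nullptr });
    while (index < tokens.size() && Precedence(tokens[index]) >= min_precedence)
    {
        const std::string op = tokens[index++];
        // Operators binding tighter than op become part of the right operand
        std::unique_ptr<NodeSketch> right = Build(tokens, index, Precedence(op) + 1);
        std::unique_ptr<NodeSketch> parent(new NodeSketch{ op, std::move(left), std::move(right) });
        left = std::move(parent);
    }
    return left;
}

// Prints the hierarchy with explicit brackets
std::string ToString(const NodeSketch &node)
{
    if (!node.Left)
        return node.Value;
    return "(" + ToString(*node.Left) + " " + node.Value + " " + ToString(*node.Right) + ")";
}

std::string Bracket(const std::vector<std::string> &tokens)
{
    std::size_t index = 0;
    return ToString(*Build(tokens, index, 1));
}
```

For the tokens 1, +, 2, *, 3 the Bracket() function returns "(1 + (2 * 3))", which is the bracketed form discussed above.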

At this point it might seem that we have enough information to execute an expression. That is true, but the module goes further. Executing an expression hierarchy directly might be quite slow, so the next layer is optimization. The Function object, or simply Function, is the tool of this optimization. A function is responsible for a single action like multiplying two numbers or generating a random number. In terms of functions the mathematical expression can be represented as add(1, multiply(2, 3)). Here the function add has two operands. When we execute it, the function triggers execution of the first operand, which just returns 1, then it triggers execution of the second operand, which is the function multiply. The function multiply executes its first and second operands, which just return 2 and 3, then multiplies them and returns the result. After multiply finishes, add adds the results of its first and second operands. The function representation provides a powerful optimization by eliminating all additional checks and deductions (for example, when executing the expression hierarchy directly, we would have to go through a long if statement each time we encounter an operator to find the proper action for it). The function is described by the IFunction interface. Many standard functions and operators are implemented in the form of function objects. They are all available as CFunction* classes in the module.
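
The add(1, multiply(2, 3)) representation can be sketched with a few tiny classes. This is only an illustration of the idea: IFunctionSketch and the classes derived from it are hypothetical and are not the module's IFunction or CFunction* classes, and values are simplified to double.

```cpp
#include <memory>
#include <utility>

// Hypothetical function object interface: one action, no operator lookup at run time
struct IFunctionSketch
{
    virtual ~IFunctionSketch() = default;
    virtual double Run() = 0;
};

// Operand which just returns its value
class ConstantSketch : public IFunctionSketch
{
public:
    explicit ConstantSketch(double value) : m_value(value) {}
    double Run() override { return m_value; }

private:
    double m_value;
};

// Executes both operands and adds the results
class AddSketch : public IFunctionSketch
{
public:
    AddSketch(std::unique_ptr<IFunctionSketch> a, std::unique_ptr<IFunctionSketch> b) : m_a(std::move(a)), m_b(std::move(b)) {}
    double Run() override { return m_a->Run() + m_b->Run(); }

private:
    std::unique_ptr<IFunctionSketch> m_a, m_b;
};

// Executes both operands and multiplies the results
class MultiplySketch : public IFunctionSketch
{
public:
    MultiplySketch(std::unique_ptr<IFunctionSketch> a, std::unique_ptr<IFunctionSketch> b) : m_a(std::move(a)), m_b(std::move(b)) {}
    double Run() override { return m_a->Run() * m_b->Run(); }

private:
    std::unique_ptr<IFunctionSketch> m_a, m_b;
};

// Builds add(1, multiply(2, 3))
std::unique_ptr<IFunctionSketch> BuildExample()
{
    return std::make_unique<AddSketch>(
        std::make_unique<ConstantSketch>(1),
        std::make_unique<MultiplySketch>(
            std::make_unique<ConstantSketch>(2),
            std::make_unique<ConstantSketch>(3)));
}
```

The hierarchy is built once and BuildExample()->Run() then evaluates 1 + 2 * 3 to 7 through plain virtual calls, without inspecting operator names again.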

The Function Factory is a tool for creating a function object by its name. It is described by the IFunctionFactory interface. The CFunctionFactory class provides an implementation of the function factory which supports the following operators and functions: &, |, ^, /, -, %, *, +, <<, >>, =, &=, |=, ^=, /=, -=, %=, *=, +=, <<=, >>=, &&, ==, >, >=, <, <=, !=, ||, <=>, a ? b : c (ternary operator), ->, {}, [], ~, !, --, ++, pi(), rnd(), rnd(min, max), rand(min, max), abs(x), acos(x), acosh(x), asin(x), asinh(x), atan(x), atan2(x, y), atanh(x), cbrt(x), ceil(x), cos(x), cosh(x), dim(x, y), erf(x), erfc(x), exp(x), exp2(x), expm1(x), floor(x), fma(x, y, z), gamma(x), hypot(x, y), ldexp(x, y), lgamma(x), log(x), log1p(x), log2(x), log10(x), max(arg1,...), min(arg1,...), mod(x, y), pow(x, y), remainder(x, y), round(x), roundf(x), sin(x), sinh(x), sqrt(x), tan(x), tanh(x), trunc(x). The following data type conversion functions are also available: (bool)x; (float)x and (double)x, which are equivalent; (int)x, (int64)x and (Integer)x, which are equivalent; and (String)x. A conversion may also be written in function style: bool(x).
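
The idea of creating a function by its name can be sketched as a simple lookup table. The class below is hypothetical and much simpler than CFunctionFactory: it is an assumption of this sketch that a function is a callable taking a list of double arguments, and only a few names from the list above are registered.

```cpp
#include <cmath>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Assumed callable shape for this sketch only
using CallableSketch = std::function<double(const std::vector<double> &)>;

// Hypothetical factory: maps an operator or function name to a callable
class FunctionFactorySketch
{
public:
    FunctionFactorySketch()
    {
        m_functions["+"] = [](const std::vector<double> &args) { return args[0] + args[1]; };
        m_functions["*"] = [](const std::vector<double> &args) { return args[0] * args[1]; };
        m_functions["abs"] = [](const std::vector<double> &args) { return std::fabs(args[0]); };
        m_functions["pow"] = [](const std::vector<double> &args) { return std::pow(args[0], args[1]); };
    }

    // Returns an empty callable when the name is unknown
    CallableSketch Create(const std::string &name) const
    {
        auto pos = m_functions.find(name);
        if (pos == m_functions.end())
            return CallableSketch();
        return pos->second;
    }

private:
    std::map<std::string, CallableSketch> m_functions;
};
```

A caller asks the factory once per operator or function found in the expression, for example Create("+"), and keeps the returned callable for later execution.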

The Expression Runner is used to convert an expression hierarchy into an optimized hierarchy of function objects. It is described by the IExpressionRunner interface. The CExpressionRunner implementation, which you already know from the beginning of the article, uses generic::CTokenizer to parse the source into tokens, generic::CExpressionBuilder to build the expression hierarchy, and generic::CFunctionFactory to find function objects. All this is true if you use the bool Prepare(const String &source); method of the runner and provide no constructor argument (or provide true).

Customization

The generic implementation of the expression runner, CExpressionRunner, is really powerful. First of all, you may specify which functions and operators are allowed in an expression. If you create an expression runner instance with no constructor argument, it uses the generic function factory described earlier with all the functions and operators it has. Additionally, you may specify your own function factory implementation via the setFunctionFactory() method. In this case the search for an operator or function continues in that function factory, which allows you to add more functions and operators. If you don't need the functions and operators from the generic function factory, just pass false as the constructor argument of the generic expression runner and the default generic function factory won't be used. In this case the search happens only in the function factory provided via the setFunctionFactory() method.
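
The lookup order described above can be sketched as follows. This is a standalone model, not the runner's real code: RunnerLookupSketch, FactorySketch and the "twice" function are hypothetical, factories are simplified to name-to-callable maps, and the default factory of the sketch contains only abs.

```cpp
#include <cmath>
#include <functional>
#include <map>
#include <string>

using FunctionSketch = std::function<double(double)>;
using FactorySketch = std::map<std::string, FunctionSketch>;

// Stand-in for the default generic factory; it knows only abs() here
FactorySketch MakeDefaultFactorySketch()
{
    FactorySketch factory;
    factory["abs"] = [](double x) { return std::fabs(x); };
    return factory;
}

// Hypothetical model of how a runner searches for a function by name
class RunnerLookupSketch
{
public:
    explicit RunnerLookupSketch(bool use_default = true) :
        m_default(use_default ? MakeDefaultFactorySketch() : FactorySketch()),
        m_custom(nullptr)
    {
    }

    void setFunctionFactory(const FactorySketch *factory) { m_custom = factory; }

    // Searches the default factory first, then continues in the custom one
    bool Find(const std::string &name, FunctionSketch &result) const
    {
        if (FindIn(m_default, name, result))
            return true;
        return m_custom && FindIn(*m_custom, name, result);
    }

private:
    static bool FindIn(const FactorySketch &factory, const std::string &name, FunctionSketch &result)
    {
        auto pos = factory.find(name);
        if (pos == factory.end())
            return false;
        result = pos->second;
        return true;
    }

    FactorySketch m_default; // stays empty when the runner is created with false
    const FactorySketch *m_custom;
};
```

A runner created with the default argument finds both abs and a custom function, while a runner created with false finds only what the custom factory provides.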

Instead of the bool Prepare(const String &source); method, which uses the generic tokenizer and expression builder as mentioned above, you may use the bool Prepare(IExpression *expression); method and supply it with an expression hierarchy created by any other means.

In many places you can specify an IErrorListener. If you provide an instance of that interface, you will get detailed information about errors happening during execution. In the same way, in some places you can specify an IProgressListener instance to track the progress of an operation and to be able to abort it.

JSON and XML

Besides executing mathematical expressions the module provides means to decode and encode JSON and XML formats.

The json::Decoder::Decode() methods are used for JSON decoding. They take a JSON-encoded string and convert it into a Variable or a Variant. The first method is preferable because Variable is more efficient than Variant.

nitisa::script::Variable target;
if (nitisa::script::json::Decoder::Decode(L"{ key: 2 }", target))
    std::cout << "OK" << std::endl;

The json::Encoder::Encode() methods perform the reverse operation. The compact argument selects between compact output (true) and pretty, human-readable output (false). Here is an example.

// Build data to be converted to JSON string 
nitisa::script::Variable source, obj, arr;
source.push_back(nitisa::script::Variable{ });
source.push_back(true);
source.push_back(false);
source.push_back(123ll);
source.push_back(-22ll);
source.push_back(4.5);
source.push_back(-4.5);
source.push_back(L"hello");
obj[L"key1"] = 33ll;
obj[L"key2"] = false;
source.push_back(obj);
arr.push_back(L"red");
arr.push_back(L"green");
arr.push_back(L"blue");
source.push_back(arr);
std::wcout << nitisa::script::json::Encoder::Encode(source, true) << std::endl; // This will output [null,true,false,123,-22,4.5,-4.5,"hello",{key1:33,key2:false},["red","green","blue"]]
std::wcout << nitisa::script::json::Encoder::Encode(source, false) << std::endl; // This will output:
//[
//    null,
//    true,
//    false,
//    123,
//    -22,
//    4.5,
//    -4.5,
//    "hello",
//    {
//        key1: 33,
//        key2: false
//    },
//    [
//        "red",
//        "green",
//        "blue"
//    ]
//]

In a similar way you may use the xml::Decoder::Decode() and xml::Encoder::Encode() methods for XML decoding and encoding. The only difference is that Entity is used in this case. It is similar to the objects used in JSON decoding and encoding, except that it is more specific to the XML format.

Advanced Expressions

You might have already noticed that the list of operators and functions supported by CExpressionRunner, which we provided earlier, contains some operators that are not quite ordinary mathematical operators. Some of them also change one of their operands. For example, the operator ++ changes the operand value by adding 1 to it either before or after the value is used. Another example is the operator *=. Its purpose is to change the first operand as well.

Support for such operators allows you to build more complex expressions and to have more than one result. IExpressionRunner allows you to have variables, which can be managed via its getVariable(), AddVariable() and DeleteVariable() methods. Variables are not removed during expression preparation, so you may manage them at any time. During preparation, if a new variable name is detected in the expression being prepared, it is automatically added to the variable list. Variables can store simple scalar values, like numbers and strings, as well as arrays (including arrays of arrays) and objects. Consider the following example.

nitisa::script::generic::CExpressionRunner runner;
if (runner.Prepare(L"a += b++") && runner.Run())
{
    std::wcout << L"Result = " << int64(runner.Result) << std::endl;
    std::wcout << L"a = " << int64(*runner.getVariable(L"a")) << std::endl;
    std::wcout << L"b = " << int64(*runner.getVariable(L"b")) << std::endl;
}

Here we didn't add any variable to the runner before running the expression. In this case the Prepare() method detects the variables and adds them to the list automatically, so the variables a and b are available after preparation. All automatically added variables store an empty value. When this expression runs, the part b++ is calculated first. Its result is 0 because b stores an empty value and the postfix increment operator is used. After that, the value of the b variable is increased by 1 and becomes 1. The next part is the calculation of the operator +=. Again, a is initially empty and so it is converted to integer 0. Then the 0 we got as the result of b++ is added to it. So, the final result is 0 + 0 and this result is stored in a. Thus, the example above will print the following.

Result = 0
a = 0
b = 1

Now let's add another copy of the if statement.

nitisa::script::generic::CExpressionRunner runner;
if (runner.Prepare(L"a += b++") && runner.Run())
{
    std::wcout << L"Result = " << int64(runner.Result) << std::endl;
    std::wcout << L"a = " << int64(*runner.getVariable(L"a")) << std::endl;
    std::wcout << L"b = " << int64(*runner.getVariable(L"b")) << std::endl;
}
if (runner.Run())
{
    std::wcout << L"-----------------" << std::endl;
    std::wcout << L"Result = " << int64(runner.Result) << std::endl;
    std::wcout << L"a = " << int64(*runner.getVariable(L"a")) << std::endl;
    std::wcout << L"b = " << int64(*runner.getVariable(L"b")) << std::endl;
}

The work of the first if statement is already known. When the second if statement executes the prepared expression, the value of the b variable is not empty anymore. It is 1. So, after the separator line, the second if statement will print the following.

Result = 1
a = 1
b = 2
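
The two runs can be traced with a small standalone model of the expression a += b++. It does not use the module: VariablesSketch and RunSketch are hypothetical names, and empty values are modelled as integer 0, as described above.

```cpp
// Hypothetical model of the runner state for the expression "a += b++"
struct VariablesSketch
{
    long long a = 0;      // automatically added variables start empty, treated as 0
    long long b = 0;
    long long result = 0; // plays the role of the Result member
};

// One execution of "a += b++"
void RunSketch(VariablesSketch &v)
{
    const long long operand = v.b; // b++ yields the value b had before the increment
    v.b += 1;                      // then b is increased by 1
    v.a += operand;                // += adds the old value of b to a
    v.result = v.a;                // and returns the new value of a
}
```

After the first call the state is result = 0, a = 0, b = 1, and after the second call it is result = 1, a = 1, b = 2, which matches both outputs shown above.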

When working with arrays, you first need to ensure that the array has the correct size. Array elements are not added automatically. So the following example will not run (although it will be prepared successfully).

nitisa::script::generic::CExpressionRunner runner;
if (runner.Prepare(L"a[1]++") && runner.Run())
{
    // Never gets here because Run() returns false 
}

To make this work, the array a needs to have at least 2 elements (because we try to access the second element). The modified code may look like the following (for simplicity we assume that using namespace nitisa::script; is added somewhere before this code).

generic::CExpressionRunner runner;
runner.AddVariable(L"a", Variable::Array{ 0ll, 2.5 });
if (runner.Prepare(L"a[1]++") && runner.Run())
{
    std::wcout << L"Result = " << runner.Result.toString() << std::endl;
    std::wcout << L"a = " << runner.getVariable(L"a")->toString() << std::endl;
}

Please note that we used the toString() method of the Variable. It works slightly differently from converting to a string using operator String() of the Variable. The operator returns an empty string if the variable stores an array or an object. The toString() method returns a string representation of the array or object. So, the code above will output the following.

Result = 2.5
a = [0, 3.5]

If working with objects, the rule is the same for object properties, so the following example will not work.

generic::CExpressionRunner runner;
if (runner.Prepare(L"a.X++") && runner.Run())
{
    std::wcout << L"Result = " << int64(runner.Result) << std::endl;
    std::wcout << L"a = " << runner.getVariable(L"a")->toString() << std::endl;
}

But if we define X property in the object, it will.

generic::CExpressionRunner runner;
runner.AddVariable(L"a", Variable::Object{ { L"X", 0ll } });
if (runner.Prepare(L"a.X++") && runner.Run())
{
    std::wcout << L"Result = " << int64(runner.Result) << std::endl;
    std::wcout << L"a = " << runner.getVariable(L"a")->toString() << std::endl;
}

It will output the following.

Result = 0
a = { X: 1 }

Members of an object can be accessed with the following operators: ., -> and ::, if the generic runner is used (with the generic function factory, of course).

Arrays and objects can also be mixed. So you can have array of objects or objects with members which are arrays and so on.

Sometimes mathematical expressions may cause exceptions. This may happen when, for example, division by zero is attempted. To handle such errors you need to wrap the call of the Run() method in a try...catch statement.
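
A guarded call may be sketched like this. The example is standalone and makes several assumptions: RunGuarded and DivideChecked are hypothetical helpers, the run argument stands for a call such as runner.Run(), and the exception types the module actually throws are not stated in this article, which is why a catch-all handler is included.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical wrapper: "run" stands for a call such as runner.Run()
bool RunGuarded(const std::function<bool()> &run)
{
    try
    {
        return run();
    }
    catch (const std::exception &)
    {
        return false; // assumption: the error is derived from std::exception
    }
    catch (...)
    {
        return false; // catch-all, because the real exception type is unknown here
    }
}

// Stand-in for an expression which divides by zero and reports it with an exception
long long DivideChecked(long long x, long long y)
{
    if (y == 0)
        throw std::domain_error("division by zero");
    return x / y;
}
```

With this wrapper, a failing execution is reported to the caller in the same way as a false result instead of terminating the application.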

The final note is about expressions like a++ *= 2. In C++ they are forbidden, but here they are allowed. However, they may have an unexpected result. Let's say the variable a equals 1 before running this expression. You might expect it to equal 4 afterwards (adding 1 with the ++ operator and then multiplying by 2). But it will actually equal 2. Let us see why. The operator *= has two operands. The first operand is a++, whose result is 1 because the ++ operator is applied after the result of a++ is taken. So *= multiplies 1 and 2. It gives 2, which is then written into the variable a and returned as the result of the whole expression.
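
The order of steps described above can be modelled in plain C++. This is a standalone sketch with a hypothetical function name; it only reproduces the sequence explained in the previous paragraph and does not call the module.

```cpp
// Hypothetical model of how the expression "a++ *= factor" is evaluated
long long PostIncrementThenMultiplyAssign(long long &a, long long factor)
{
    const long long operand = a; // a++ yields the old value of a
    a = a + 1;                   // side effect of ++
    a = operand * factor;        // *= writes the product into a, overwriting the increment
    return a;                    // the value written is the result of the whole expression
}
```

Starting with a equal to 1 and factor equal to 2, the function returns 2 and leaves a equal to 2, not 4.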

You should also be careful with expressions like a++++ and (a++)++. At first glance they might seem equivalent, but they are not. In the first expression the ++ operator is applied twice to the variable a. In the second expression the outer ++ operator is applied to the (a++) expression, not to the variable a. So the second expression gives a value that is less by 1 than the first one.