Artificial Intelligence


This article describes the Artificial Intelligence module added in version 4.0.0 of the Nitisa framework. After reading this article you will know how to build your own multi layer perceptron and use it in classification tasks.



About the module

The AI module provides template classes you can use to build and train a multi layer perceptron, a multi layered neural network with fully connected layers. The module can be included with #include "Nitisa/Modules/AI.h" and is implemented entirely with templates. You can find documentation about this module on the AI module reference page.

The AI module does not implement every neural network architecture currently known and used, and it does not provide any optimization via GPU or parallel execution. Its purpose is rather to give you a fast start, a way to test strategies, and a tool for other not very complex goals. We are also planning to extend this module in the future and implement the most powerful and widely used networks and features, such as convolutional and recurrent networks, deep learning, and so on.

Despite this, the AI module is quite powerful. It not only implements a simple neural network but also lets you use advanced techniques such as regularization and momentum, different activation, error, and regularization functions, and smart weight and bias initialization.

Before talking about the neural network itself, let us briefly describe the helper parts used with neural networks and available in the AI module. Here they are.

Activators

Each neuron in a neural network requires an activation function. In the framework we call them activators. Such a function takes the weighted neuron input (the sum of the inputs multiplied by the corresponding weights) and produces an output value. There are lots of functions used for this purpose, and we have collected many of them in the nitisa::ai::activators namespace. The most commonly used are the sigmoid activation function, implemented by the TLogistic template, and the hyperbolic tangent (tanh) function, implemented in THyperTan. There are many other activation functions in the module which you can find on the AI module reference pages. Moreover, you can add your own activation functions. All you need is to derive them from the IActivator interface, implement its abstract methods, and use them in your networks.
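
To make the idea concrete, here is a minimal, framework-independent sketch of what an activator such as TLogistic computes: the weighted input of a neuron and the sigmoid of that value. It only illustrates the math; in the module you would use the TLogistic template itself.

#include <cmath>
#include <vector>

// Weighted input of a neuron: the sum of the inputs multiplied by the corresponding weights, plus the bias 
double WeightedInput(const std::vector<double> &inputs, const std::vector<double> &weights, double bias)
{
    double sum{ bias };
    for (size_t i = 0; i < inputs.size(); i++)
        sum += inputs[i] * weights[i];
    return sum;
}

// Sigmoid (logistic) activation: maps any real value into the range (0, 1) 
double Sigmoid(double x)
{
    return 1 / (1 + std::exp(-x));
}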

Error functions

The second important part of a neural network is the error function. Its goal is to measure how well the network performs. All available error functions are located in the nitisa::ai::errors namespace, and the most often used ones are already implemented: the quadratic error function in the TQuadratic template and the so-called cross-entropy error function in the TCrossEntropy template. You can use your own error functions as well. Just derive them from the IErrorFunction interface and pass them to your network.
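
For reference, below is a minimal, framework-independent sketch of these two error measures for a single example with desired outputs t and actual outputs y. It only illustrates the formulas; in the module you would pass TQuadratic or TCrossEntropy to the network.

#include <cmath>
#include <vector>

// Quadratic error: 0.5 * sum over outputs of (t - y)^2 
double Quadratic(const std::vector<double> &t, const std::vector<double> &y)
{
    double e{ 0 };
    for (size_t i = 0; i < t.size(); i++)
        e += 0.5 * (t[i] - y[i]) * (t[i] - y[i]);
    return e;
}

// Cross-entropy error: -sum over outputs of t * ln(y) + (1 - t) * ln(1 - y), assuming each y is in (0, 1) 
double CrossEntropy(const std::vector<double> &t, const std::vector<double> &y)
{
    double e{ 0 };
    for (size_t i = 0; i < t.size(); i++)
        e -= t[i] * std::log(y[i]) + (1 - t[i]) * std::log(1 - y[i]);
    return e;
}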

Regularizers

To improve a neural network, so-called regularization is often applied. It is an optional feature. To apply it you have to tell your network which regularization function to use. Two commonly used regularization functions, L1 and L2, are implemented in the AI module; all of them live in the nitisa::ai::regularizers namespace. You can, as always, implement your own regularization function by deriving it from the IRegularizeFunction interface. Note that to apply regularization you not only have to provide the network with a regularization function but also specify a non-zero regularization rate.
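
The idea behind both penalties fits in a few lines. The sketch below is framework independent and only illustrates what is added to the error: L1 sums the absolute values of the weights, L2 sums their squares, and the regularization rate scales the result (exact scaling conventions may differ in the module).

#include <cmath>
#include <vector>

// L1 penalty: rate * sum of |w| over all weights 
double L1Penalty(const std::vector<double> &weights, double rate)
{
    double p{ 0 };
    for (double w : weights)
        p += std::abs(w);
    return rate * p;
}

// L2 penalty: rate * sum of w^2 over all weights 
double L2Penalty(const std::vector<double> &weights, double rate)
{
    double p{ 0 };
    for (double w : weights)
        p += w * w;
    return rate * p;
}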

Randomizers

When you create a neural network you have to initialize the neuron weights and biases somehow. This is usually done by setting them to random values. For this we introduce randomizers, objects which generate such random numbers. They are located in the nitisa::ai::randomizers namespace and, as usual, there is an interface, IRandomizer, which helps you build your own randomizers. If you don't specify a randomizer for your network, the default one will be used. Usually the default randomizer is more than enough.
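
Randomizers can also be used on their own. As in the example later in this article, randomizers::TDefaultFloat<double> is constructed with the range boundaries and a seed and produces values via Generate():

randomizers::TDefaultFloat<double> r{ -1, 1, 12345 }; // Generates values in the range [-1..+1], seeded with 12345 
double value{ r.Generate() }; // Get the next random value 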

Multi layer perceptron (MLP)

The core object of the AI module is the multi layer perceptron (MLP). It is implemented in the TPerceptron template. To create a neural network all you need is to create a TPerceptron object, specifying the network structure and other options in the constructor. After that you have to teach your network. TPerceptron is a network which learns with a teacher using the stochastic gradient descent method (back propagation), so the learning process is just supplying the network with examples and the required results many times. We will show an example a little bit later. TPerceptron has some parameters you have to specify. They are called hyperparameters: the learning rate, the regularization rate, the momentum rate, and the network structure itself. There is no general algorithm for selecting these hyperparameters, so you have to experiment to find the best ones. A typical architecture looks like this: the number of inputs equals the number of features in your data set, the number of outputs equals the number of classes your data belongs to, and there can be several hidden layers. Do not use too many hidden layers; 1-4 is usually enough. A network with more hidden layers learns very slowly and may give worse results than a net with fewer hidden layers. If your classes are linearly separable you can have no hidden layers at all.
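
For instance, following the constructor argument order of the full example below, a network for a hypothetical data set with 4 features, 3 classes, and two hidden layers of 8 neurons each could be declared like this (the batch size and the rates are only starting points to experiment with):

// Assuming activator, activator_output, error_function, and regularize are declared as in the example below 
TPerceptron<double> net{
    &activator, &activator_output, &error_function,
    4, // 4 inputs: one per feature of the data set 
    3, // 3 outputs: one per class 
    { 8, 8 }, // Two hidden layers with 8 neurons each 
    nullptr, // Use default randomizer 
    &regularize, // Regularization function 
    10, 0.1, 0.01, 0.01 // Batch size, learning rate, regularization rate, momentum rate 
};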

If you use activation functions with a limited output range (like the sigmoid, whose output range is from 0 to 1), you have to understand that the output will never fall outside that range, so specifying desired output values outside of it is a mistake. Most input data should be preprocessed before being used to teach and run the neural network. Often there are two (or even three) data sets: one is used to train the network and another one to estimate the error. Teaching is usually done in epochs, and both the training and test data sets are shuffled at each epoch.
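
A common preprocessing step is to scale each feature into a fixed range so it matches the activation function. Below is a minimal min-max scaling sketch (not part of the AI module) that maps the values of one feature into [0, 1].

#include <algorithm>
#include <vector>

// Min-max scaling: map each value into [0, 1] based on the minimum and maximum of the feature 
void Normalize(std::vector<double> &values)
{
    const auto minmax = std::minmax_element(values.begin(), values.end());
    const double min{ *minmax.first }, range{ *minmax.second - *minmax.first };
    if (range > 0)
        for (auto &v : values)
            v = (v - min) / range;
}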

That is all about TPerceptron and tips on how to train it. Let's now look at a simple neural network code example.

Example

This code trains the network to classify data into 2 classes. The data is a set of randomly generated points in 2D space.


// We assume this code is either inside the nitisa::ai namespace or that "using namespace nitisa::ai;" appears somewhere before, so we omit the AI module namespace everywhere 
// The standard headers <algorithm>, <iostream>, <random>, and <vector> as well as "Nitisa/Modules/AI.h" are assumed to be included 
// Generate data 
struct DATA // Data structure 
{
    double X; // X-coordinate 
    double Y; // Y-coordinate 
    int Class; // Class to which the point belongs 
};
std::vector<DATA> train_data, test_data; // Arrays of train and test data 
std::mt19937 rng{ 8347245 }; // Random engine used to shuffle the data sets 
{
    const int NUM_SAMPLES{ 250 }; // Total number of points in each class 
    std::vector<DATA> data; // Temporal storage for generated points 
    randomizers::TDefaultFloat<double> r1{ -3, -1, 6432976 }, r2{ 1, 3, 6945738 }; // 2 randomizers. One will generate points in range [-3..-1], and the other in range [+1..+3] 
    for (int i = 0; i < NUM_SAMPLES; i++) // Generate points 
    {
        data.push_back({ r1.Generate(), r1.Generate(), -1 }); // Generate point which belongs to first class 
        data.push_back({ r2.Generate(), r2.Generate(), +1 }); // Generate point which belongs to second class 
    }
    std::shuffle(data.begin(), data.end(), rng); // Randomly shuffle generated points 
    for (int i = 0; i < NUM_SAMPLES; i++) // Put half of the data into training array 
        train_data.push_back(data[i]);
    for (int i = NUM_SAMPLES; i < (int)data.size(); i++) // Put another half of the data into testing array 
        test_data.push_back(data[i]);
}

activators::TLogistic<double> activator{ 1 }; // Activation function for hidden layers 
activators::TLogistic<double> activator_output{ 1 }; // Activation function for output layer 
errors::TCrossEntropy<double> error_function; // Cross-entropy error function 
regularizers::TL2<double> regularize; // L2 regularization function 
TPerceptron<double> net{ // Neural network(Multi layer perceptron) 
    &activator, // Hidden layers activation function 
    &activator_output, // Output layer activation function 
    &error_function, // Error function 
    2, // Our data has two features(X and Y coordinates) so we need two inputs 
    2, // Our data belongs to 2 classes, so we use 2 outputs 
    { 2 }, // 1 hidden layer with 2 neurons 
    nullptr, // Use default randomizer 
    &regularize, // Regularization function 
    10, // Batch size 
    0.1, // Learning rate 
    0.01, // Regularization rate 
    0.01 // Momentum rate 
};

// Helper function to calculate the error on a data set. It just calculates the average error over the specified data 
auto getLoss = [](TPerceptron<double> &net, const std::vector<DATA> &data)
{
    double result{ 0 };
    std::vector<double> inputs{ 0, 0 }, output{ 0, 0 };
    for (auto pos : data)
    {
        inputs[0] = pos.X;
        inputs[1] = pos.Y;
        output[0] = pos.Class == -1 ? 1 : 0;
        output[1] = pos.Class == +1 ? 1 : 0;
        result += net.Loss(inputs, output);
    }
    result /= data.size();
    return result;
};

// Arrays we will use as inputs and output for our network 
std::vector<double> inputs{ 0, 0 }, output{ 0, 0 };

// Calculate errors on train and test data before learning (we want to compare them with the final errors to see how the network improves classification) 
double loss_train{ getLoss(net, train_data) }, loss_test{ getLoss(net, test_data) };
// Teach the network for 50 epochs 
for (int step = 0; step < 50; step++)
{
    std::shuffle(train_data.begin(), train_data.end(), rng); // Randomly shuffle the training data 
    std::shuffle(test_data.begin(), test_data.end(), rng); // Randomly shuffle the test data 
    for (int i = 0; i < (int)train_data.size(); i++) // Show each training point to the network 
    {
        // Prepare input for the network 
        inputs[0] = train_data[i].X;
        inputs[1] = train_data[i].Y;
        // Prepare desired output 
        output[0] = train_data[i].Class == -1 ? 1 : 0;
        output[1] = train_data[i].Class == +1 ? 1 : 0;
        // Process data by the network 
        net.Simulate(inputs, output); // Feed the example and the desired output to the network 
    }
}
// Show final errors and compare them with initial ones 
std::cout << "OK(train loss " << loss_train << " -> " << getLoss(net, train_data) << ", test loss " << loss_test << " -> " << getLoss(net, test_data) << ")" << std::endl;

As a result you can get the following output (the result is slightly different each time you run this code because the default randomization of weights and biases produces different values each time).

OK(train loss 1.09143 -> 0.0624769, test loss 1.06818 -> 0.0627296)

As you can see, the network has significantly decreased the error after training and can now be used to classify new (unknown) points drawn from similar random distributions. To classify a point all you need is to call the Forward method and find out which output value is the greatest.
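
To make this concrete, a classification step might look like the following sketch. We assume here, by analogy with Loss and Simulate above, that Forward takes an input vector and fills an output vector; check the AI module reference for the exact signature.

inputs[0] = 1.5; // X-coordinate of a new point 
inputs[1] = 2.2; // Y-coordinate of a new point 
net.Forward(inputs, output); // Assumed by analogy with Simulate: compute the network outputs for the given inputs 
std::cout << "Class: " << (output[0] > output[1] ? -1 : +1) << std::endl; // The greatest output value determines the class 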