*This article describes the Artificial Intelligence module added in version 4.0.0 of the Nitisa framework. After reading it you will know how to build your own multilayer perceptron and use it in classification tasks.*

#### About module

#### Activators

#### Error functions

#### Regularizers

#### Randomizers

#### Multi layer perceptron(MLP)

#### Example

The AI module provides template classes you can use to build and train a multilayer perceptron, that is, a multilayered neural network with fully connected layers. The module is implemented entirely with templates and can be included with `#include "Nitisa/Modules/AI.h"`. You can find the documentation for this module on the AI module reference page.

The AI module does not implement every kind of neural network currently known and used, and it does not provide any optimization using the GPU or parallel execution. Its purpose is rather to give you a fast start, a way to test strategies, and a tool for other goals that are not very complex. We are also planning to extend this module in the future and implement the most powerful and widely used networks and features, such as convolutional and recurrent networks, deep learning, and so on.

Despite this, the AI module is very powerful. It not only implements a simple neural network but also lets you use advanced techniques such as regularization and momentum, different activation, error, and regularization functions, and smart initialization of weights and biases.

Before talking about the neural network itself, let us briefly explain some helper parts that are used with neural networks and are available in the AI module.

Each neuron in a neural network requires an activation function. In the framework we call them **activators**. Such a function takes the weighted neuron input (the sum of the inputs multiplied by their corresponding weights) and produces some value. Many functions are used for this purpose, and we have collected a lot of them in the `nitisa::ai::activators` namespace. The most commonly used are the sigmoid activation function, implemented by the TLogistic template, and the hyperbolic tangent (tanh) function, implemented by THyperTan. There are lots of other activation functions in the module, which you can find on the AI module reference pages. Moreover, you can add your own activation functions: all you need is to derive them from the IActivator interface, implement its abstract methods, and use them in your networks.
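To make the idea of an activator concrete, here is a minimal standalone sketch of the two functions named above and their derivatives, which back propagation needs. It deliberately does not use the AI module: the function names `weightedInput`, `logistic`, and `hyperTan` are illustrative assumptions and not the TLogistic or THyperTan implementations, and adding a bias to the weighted sum is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted input of one neuron: sum of inputs times weights, plus a bias.
// (Hypothetical helper, not part of the AI module.)
double weightedInput(const std::vector<double> &inputs, const std::vector<double> &weights, double bias)
{
    assert(inputs.size() == weights.size());
    double sum{ bias };
    for (std::size_t i = 0; i < inputs.size(); i++)
        sum += inputs[i] * weights[i];
    return sum;
}

// Logistic (sigmoid) activation: maps any real input into the range (0, 1).
double logistic(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Derivative of the logistic function, expressed through its own output.
double logisticDerivative(double x)
{
    const double y{ logistic(x) };
    return y * (1.0 - y);
}

// Hyperbolic tangent activation: maps any real input into the range (-1, 1).
double hyperTan(double x)
{
    return std::tanh(x);
}

// Derivative of the hyperbolic tangent, expressed through its own output.
double hyperTanDerivative(double x)
{
    const double y{ std::tanh(x) };
    return 1.0 - y * y;
}
```

A neuron's output is then, for example, `logistic(weightedInput(inputs, weights, bias))`.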

The second important part of a neural network is the error function. Its goal is to measure how well the neural network works. All available error functions are located in the `nitisa::ai::errors` namespace. The most frequently used ones are already implemented: the quadratic error function in the TQuadratic template and the so-called cross-entropy error function in the TCrossEntropy template. You can also use your own error functions. Just derive them from the IErrorFunction interface and pass them to your network.
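As an illustration of what these two error functions compute, the sketch below implements their textbook forms as free functions. The names `quadraticError` and `crossEntropyError` are hypothetical, and the exact scaling used by TQuadratic and TCrossEntropy may differ (for example, the factor 1/2 or the handling of outputs close to 0 and 1 are assumptions here).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quadratic error: half the sum of squared differences between
// the network output and the desired (target) output.
double quadraticError(const std::vector<double> &output, const std::vector<double> &target)
{
    assert(output.size() == target.size());
    double sum{ 0 };
    for (std::size_t i = 0; i < output.size(); i++)
    {
        const double d{ output[i] - target[i] };
        sum += d * d;
    }
    return 0.5 * sum;
}

// Cross-entropy error for outputs in the range (0, 1), e.g. sigmoid outputs:
// -sum(t * ln(o) + (1 - t) * ln(1 - o)).
double crossEntropyError(const std::vector<double> &output, const std::vector<double> &target)
{
    assert(output.size() == target.size());
    const double eps{ 1e-12 }; // Keeps the logarithm arguments away from zero
    double sum{ 0 };
    for (std::size_t i = 0; i < output.size(); i++)
    {
        const double o{ std::min(std::max(output[i], eps), 1.0 - eps) };
        sum -= target[i] * std::log(o) + (1.0 - target[i]) * std::log(1.0 - o);
    }
    return sum;
}
```

Both functions return zero (or nearly zero) for a perfect prediction and grow as the output moves away from the target.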

To improve a neural network, so-called regularization is often applied. It is an optional feature. To apply regularization you have to tell your network which regularization function it should use and also specify a non-zero regularization rate. Two commonly used regularization functions, L1 and L2, are implemented in the AI module. You can, as always, implement your own regularization function by deriving it from the IRegularizeFunction interface. All these functions are in the `nitisa::ai::regularizers` namespace.
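The following sketch shows the usual textbook definitions of the L1 and L2 penalties and the terms they add to a weight's gradient. It is an assumption-laden illustration rather than the module's code: the function names are hypothetical, and the scaling conventions (such as the factor 1/2 in L2) may differ from those of the classes in `nitisa::ai::regularizers`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// L1 penalty: rate * sum(|w|). Pushes small weights towards exactly zero.
double l1Penalty(const std::vector<double> &weights, double rate)
{
    assert(rate >= 0);
    double sum{ 0 };
    for (double w : weights)
        sum += std::fabs(w);
    return rate * sum;
}

// L2 penalty: rate / 2 * sum(w * w). Discourages large weights.
double l2Penalty(const std::vector<double> &weights, double rate)
{
    assert(rate >= 0);
    double sum{ 0 };
    for (double w : weights)
        sum += w * w;
    return 0.5 * rate * sum;
}

// Term added to the gradient of a single weight by the L1 penalty.
double l1Gradient(double w, double rate)
{
    if (w > 0)
        return rate;
    if (w < 0)
        return -rate;
    return 0.0;
}

// Term added to the gradient of a single weight by the L2 penalty.
double l2Gradient(double w, double rate)
{
    return rate * w;
}
```

With a regularization rate of zero both penalties and both gradient terms vanish, which is why a non-zero rate is required for regularization to have any effect.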

When you create a neural network you have to initialize the neuron weights and biases somehow. This is usually done by setting them to random numbers. For this purpose we introduce randomizers, objects which generate such random numbers. They are located in the `nitisa::ai::randomizers` namespace. As usual, there is an interface, IRandomizer, which helps you build your own randomizers. If you don't pass any randomizer to your network, the default one is used. Using the default randomizer is usually more than enough.
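To show what a randomizer does conceptually, here is a small standalone sketch built on the standard `<random>` header. The class `UniformRandomizer` and the helper `makeWeights` are hypothetical names; the class only mirrors the (minimum, maximum, seed) constructor arguments and the `Generate()` call visible in the example later in this article, and it does not implement the IRandomizer interface, whose exact methods are described on the reference pages.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical randomizer producing uniformly distributed numbers
// in the range [min, max) from a fixed seed.
class UniformRandomizer
{
public:
    UniformRandomizer(double min, double max, unsigned int seed) :
        m_engine(seed),
        m_distribution(min, max)
    {
        assert(min < max);
    }

    // Return the next random number of the sequence
    double Generate()
    {
        return m_distribution(m_engine);
    }
private:
    std::mt19937 m_engine;
    std::uniform_real_distribution<double> m_distribution;
};

// Create an array of initial weights (or biases) using the given randomizer.
std::vector<double> makeWeights(std::size_t count, UniformRandomizer &randomizer)
{
    std::vector<double> weights(count);
    for (double &w : weights)
        w = randomizer.Generate();
    return weights;
}
```

Using a fixed seed makes the initial weights, and therefore the whole training run, reproducible, which is convenient when comparing hyperparameters.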

The core object of the AI module is the Multi Layer Perceptron (MLP). It is implemented in the TPerceptron template. To create a neural network, all you need is to create a TPerceptron object, specifying the network structure and other options in the constructor. After that you have to teach your network. TPerceptron is a network which learns with a teacher (supervised learning) using stochastic gradient descent with back propagation, so learning simply means supplying the network with examples and their required results many times. We will show an example a little later.

TPerceptron has some parameters you have to specify. They are called hyperparameters: the learning rate, the regularization rate, the momentum rate, and the network structure itself. There is no general algorithm for selecting hyperparameters, so you have to experiment to find the best ones.

The network architecture is usually as follows. The number of inputs equals the number of features in your data set, and the number of outputs equals the number of classes your data belongs to. In between there can be several hidden layers. Do not use too many of them: 1-4 is usually okay. Networks with more hidden layers learn very slowly and may give worse results than a network with fewer hidden layers. If your classes are linearly separable, you can have no hidden layers at all.
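To give a feeling for how the learning rate and the momentum rate interact, the sketch below applies the classical momentum update to a single parameter. This is an illustration under assumptions: the article does not state the exact update rule TPerceptron uses, and `Parameter` and `updateWithMomentum` are hypothetical names.

```cpp
#include <cassert>

// A single trainable value (weight or bias) with its accumulated velocity.
struct Parameter
{
    double Value;    // Current value of the weight or bias
    double Velocity; // Running combination of previous updates
};

// One step of gradient descent with classical momentum:
// the new velocity keeps a fraction of the old one (momentum rate)
// and moves against the gradient (scaled by the learning rate).
void updateWithMomentum(Parameter &p, double gradient, double learningRate, double momentumRate)
{
    assert(learningRate > 0);
    assert(momentumRate >= 0 && momentumRate < 1);
    p.Velocity = momentumRate * p.Velocity - learningRate * gradient;
    p.Value += p.Velocity;
}
```

With a momentum rate of zero this reduces to plain gradient descent. Larger momentum rates let consecutive steps in the same direction build up speed, which is one reason these hyperparameters have to be tuned together.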

If you use activation functions with a limited range (like sigmoid, whose output range is from 0 to 1), you have to understand that the output will never fall outside that range, so specifying desired output values outside it is a mistake. Most input data should be preprocessed before it is used to teach the network or is processed by it. Often there are two (or even three) sets of data: one is used to train the network and another to calculate the error. Teaching is often done in epochs, and both the training and test data sets are shuffled each epoch.
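The two preparation steps mentioned above can be sketched as follows: standardizing a feature so it has zero mean and unit variance, and encoding a class index as a desired output that stays inside the 0 to 1 range of a sigmoid. Standardization is only one common preprocessing choice, picked here as an assumption, and `standardize` and `oneHot` are hypothetical helper names that are not part of the AI module.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescale the values of one feature in place to zero mean and unit variance.
void standardize(std::vector<double> &values)
{
    if (values.empty())
        return;
    double mean{ 0 };
    for (double v : values)
        mean += v;
    mean /= (double)values.size();
    double variance{ 0 };
    for (double v : values)
        variance += (v - mean) * (v - mean);
    const double deviation{ std::sqrt(variance / (double)values.size()) };
    for (double &v : values)
        v = deviation > 0 ? (v - mean) / deviation : 0.0;
}

// Desired output for a classifier with sigmoid outputs: 1 for the
// correct class and 0 for all the others, so targets stay within [0, 1].
std::vector<double> oneHot(std::size_t classIndex, std::size_t classCount)
{
    assert(classIndex < classCount);
    std::vector<double> result(classCount, 0.0);
    result[classIndex] = 1.0;
    return result;
}
```

When a separate test set is used, the mean and deviation are normally computed on the training set only and then reused for the test data; this sketch omits that for brevity.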

That is all about TPerceptron and tips on how to train it. Let's now look at the code of a simple neural network.

This code trains the network to classify data into two classes. The data is a set of randomly generated points in 2D space.

```
// We assume this code is either in the nitisa::ai namespace or "using namespace nitisa::ai;" is written somewhere before, so we omit the AI module namespace everywhere
// Standard headers needed: <algorithm>, <iostream>, <vector>. Note that std::random_shuffle was removed in C++17; use std::shuffle there
// Generate data
struct DATA // Data structure
{
    double X; // X-coordinate
    double Y; // Y-coordinate
    int Class; // Class to which the point belongs
};
std::vector<DATA> train_data, test_data; // Arrays of train and test data
{
    const int NUM_SAMPLES{ 250 }; // Total number of points in each class
    std::vector<DATA> data; // Temporary storage for generated points
    randomizers::TDefaultFloat<double> r1{ -3, -1, 6432976 }, r2{ 1, 3, 6945738 }; // 2 randomizers. One will generate points in range [-3..-1], and the other in range [+1..+3]
    for (int i = 0; i < NUM_SAMPLES; i++) // Generate points
    {
        data.push_back({ r1.Generate(), r1.Generate(), -1 }); // Generate point which belongs to first class
        data.push_back({ r2.Generate(), r2.Generate(), +1 }); // Generate point which belongs to second class
    }
    std::random_shuffle(data.begin(), data.end()); // Randomly shuffle generated points
    for (int i = 0; i < NUM_SAMPLES; i++) // Put half of the data into training array
        train_data.push_back(data[i]);
    for (int i = NUM_SAMPLES; i < (int)data.size(); i++) // Put another half of the data into testing array
        test_data.push_back(data[i]);
}
activators::TLogistic<double> activator{ 1 }; // Activation function for hidden layers
activators::TLogistic<double> activator_output{ 1 }; // Activation function for output layer
errors::TCrossEntropy<double> error_function; // Cross-entropy error function
regularizers::TL2<double> regularize; // L2 regularization function
TPerceptron<double> net{ // Neural network(Multi layer perceptron)
    &activator, // Hidden layers activation function
    &activator_output, // Output layer activation function
    &error_function, // Error function
    2, // Our data has two features(X and Y coordinates) so we need two inputs
    2, // Our data belongs to 2 classes, so we use 2 outputs
    { 2 }, // 1 hidden layer with 2 neurons
    nullptr, // Use default randomizer
    &regularize, // Regularization function
    10, // Batch size
    0.1, // Learning rate
    0.01, // Regularization rate
    0.01 // Momentum rate
};
// Helper function to calculate error on data set. It just calculates the average error on the specified data
auto getLoss = [](TPerceptron<double> &net, const std::vector<DATA> &data)
{
    double result{ 0 };
    std::vector<double> inputs{ 0, 0 }, output{ 0, 0 };
    for (const auto &pos : data)
    {
        inputs[0] = pos.X;
        inputs[1] = pos.Y;
        output[0] = pos.Class == -1 ? 1 : 0;
        output[1] = pos.Class == +1 ? 1 : 0;
        result += net.Loss(inputs, output);
    }
    result /= data.size();
    return result;
};
// Arrays we will use as inputs and output for our network
std::vector<double> inputs{ 0, 0 }, output{ 0, 0 };
// Calculate errors on train and test data before learning(we want to compare them with the final errors to see how our network improves classification)
double loss_train{ getLoss(net, train_data) }, loss_test{ getLoss(net, test_data) };
// Teach the network for 50 epochs
for (int step = 0; step < 50; step++)
{
    std::random_shuffle(train_data.begin(), train_data.end()); // Randomly shuffle training data
    std::random_shuffle(test_data.begin(), test_data.end()); // Randomly shuffle test data
    for (int i = 0; i < (int)train_data.size(); i++) // Show each of the training points to the network
    {
        // Prepare input for the network
        inputs[0] = train_data[i].X;
        inputs[1] = train_data[i].Y;
        // Prepare desired output
        output[0] = train_data[i].Class == -1 ? 1 : 0;
        output[1] = train_data[i].Class == +1 ? 1 : 0;
        // Process data by the network
        net.Simulate(inputs, output);
    }
}
// Show final errors and compare them with initial ones
std::cout << "OK(train loss " << loss_train << " -> " << getLoss(net, train_data) << ", test loss " << loss_test << " -> " << getLoss(net, test_data) << ")" << std::endl;
```

As a result you can get the following output (the result is slightly different each time you run this code because we use the default randomization of weights and biases, which produces different random values each time).

OK(train loss 1.09143 -> 0.0624769, test loss 1.06818 -> 0.0627296)

As you can see, the network has significantly decreased its error after training and can now be used to classify new (unknown) points from similar random distributions. To classify a point, all you need is to call the `Forward` method and find out which output value is the greatest.
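Finding the greatest output value can be sketched as below. Because the exact signature of `Forward` is not shown in this article, the hypothetical helper `predictedClass` works on a vector of output values that is assumed to have already been obtained from the network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Return the index of the greatest output value, which is the index
// of the class the network considers the most likely one.
std::size_t predictedClass(const std::vector<double> &outputs)
{
    assert(!outputs.empty());
    return static_cast<std::size_t>(std::distance(
        outputs.begin(),
        std::max_element(outputs.begin(), outputs.end())));
}
```

In the example above, index 0 corresponds to the class marked -1 and index 1 to the class marked +1, following the way the desired outputs were prepared during training.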