First, let’s take a look at the complete AutoEncoder model. By analogy with PCA, the middle (bottleneck) layer of this model holds the feature variables we want to extract.
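To make the analogy concrete, here is a minimal PyTorch sketch of such an autoencoder. The 784-dimensional input (a flattened 28×28 digit image) and the 2-dimensional bottleneck are illustrative assumptions chosen to mirror the two PCs from our lab, not details fixed by the text.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    # Illustrative sizes: 784 inputs (flattened 28x28 image),
    # 2 bottleneck units (mirroring the two PCs from the lab).
    def __init__(self, in_dim=784, code_dim=2):
        super().__init__()
        # Encoder: compress the input down to the bottleneck code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: reconstruct the input from the bottleneck code.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128),
            nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x):
        code = self.encoder(x)    # the middle part: extracted features
        return self.decoder(code) # reconstruction of the input
```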

If you still remember the second exercise from our last lab: there, we trained a model to distinguish between the digits 5 and 6, using the first two principal components (PCs) as its input feature variables.

Then, why not remove the decoder part and directly use the bottleneck variables, the output of the encoder’s last layer, as the input features for prediction?
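Continuing the sketch above, the idea looks like this: keep the trained encoder, drop the decoder, and attach a small prediction head to the bottleneck features. The 2-unit input matches the illustrative bottleneck size chosen earlier.

```python
autoencoder = AutoEncoder()
# ... train `autoencoder` on the reconstruction loss here ...

# Drop the decoder; the encoder becomes a feature extractor.
classifier = nn.Sequential(
    autoencoder.encoder,  # bottleneck features replace the two PCs
    nn.Linear(2, 1),      # score for the binary digit task (5 vs 6)
)
```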

Depending on the problem, we choose a different activation function for the final layer to transform its output. For example, if it is a regression problem, the activation function is just the identity function.
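In the PyTorch sketch, the identity activation for regression simply means adding no activation layer after the final linear layer:

```python
# Output activation depends on the task. For regression the activation
# is the identity, which here just means no activation layer at all:
regression_head = nn.Linear(2, 1)  # prediction = raw score
# (For binary classification, the logistic function follows instead,
# as shown next.)
```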

For a binary classification problem, this is exactly logistic regression: we use the variables from the last layer to compute a score, pass this score through the logistic function, and obtain the final prediction, the posterior probability. Then, we use this probability for classification.
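Continuing the sketch, the whole binary pipeline reads as logistic regression on the bottleneck features; `x_batch` here is just a hypothetical random batch standing in for real images.

```python
# score -> logistic function -> posterior probability -> class decision
x_batch = torch.randn(4, 784)  # hypothetical batch of flattened images
scores = classifier(x_batch)   # raw scores, shape (4, 1)
probs = torch.sigmoid(scores)  # posterior probability of the positive class
preds = (probs > 0.5).long()   # classify by thresholding at 0.5
```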

Congratulations! You have just unlocked the most basic deep learning architecture, the artificial neural network.