The essence of a machine learning modeling problem is to find an appropriate model \(f\) and suitable parameters \(w\) to transform feature information \(x\) into target information \(y\).
Traditional modeling methods are usually carried out in two steps, focusing first on the feature variables and then on the model itself.
In the first step, we usually strive to ensure that the input feature variables are informative while keeping the number of variables manageable.
In this process, we can leverage our domain knowledge to create a feature extraction function \(g(\textbf{x}; \textbf{w}_0)\), a practice known as manual feature engineering.
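For instance, if the raw features were physical measurements, a hand-crafted \(g\) might look like the sketch below. The column layout and the derived features are purely illustrative assumptions, not tied to any particular dataset:

```python
import numpy as np

def g(x):
    """Hand-crafted feature extraction. The raw columns are assumed to be
    height (m), weight (kg), and age (years); both the layout and the
    derived features are illustrative only."""
    height, weight, age = x[:, 0], x[:, 1], x[:, 2]
    bmi = weight / height ** 2    # domain knowledge: body mass index
    log_age = np.log1p(age)       # tame a right-skewed variable
    return np.column_stack([bmi, log_age])
```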
In most cases, however, we need to use data-driven methods. In doing so, we typically set the target information aside and derive the feature extraction function from the feature variables alone. I want to emphasize this point: in most cases, we do NOT use \(y\) in the first step.
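As a toy illustration of this target-free step, \(g\) could be learned by principal component analysis, which is just one of many possible choices. The snippet below uses scikit-learn and synthetic data purely for demonstration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))   # toy raw features; note that y never appears

# Fit g on X alone: PCA learns its projection (the w_0) without ever seeing y.
g = PCA(n_components=5).fit(X)
Z = g.transform(X)               # extracted features Z, shape (500, 5)
```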
Once all extracted feature variables have been computed, we proceed to the second step, where we select and apply various algorithms to obtain the most powerful model. That is, once the model \(f\) is determined, we use algorithms to learn the model parameters \(\textbf{W}\).
At this stage, the data (\(y\) and the extracted features \(\textbf{Z}\)) serve as the fuel, and the algorithm uses this fuel to determine the model parameters \(\textbf{W}\). Finally, we apply model selection techniques to identify the optimal model.
During the model deployment phase, when we encounter a new case, we use \(g\) to extract \(\textbf{z}\) from its feature values, then feed \(\textbf{z}\) into \(f\) to obtain the prediction result.
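Putting the two steps together, a minimal sketch of the whole workflow might read as follows, again with scikit-learn and synthetic data as illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))       # toy raw features
y = rng.integers(0, 2, size=500)     # toy binary target

# Step 1 (recap): g is fit on X alone, without y.
g = PCA(n_components=5).fit(X)
Z = g.transform(X)

# Step 2: the fuel is (Z, y); the algorithm learns the parameters W of f.
f = LogisticRegression().fit(Z, y)

# Model selection, e.g. scoring the candidate model by cross-validation.
print(cross_val_score(LogisticRegression(), Z, y, cv=5).mean())

# Deployment: a new case passes through g to get z, then through f.
x_new = rng.normal(size=(1, 20))
z_new = g.transform(x_new)
print(f.predict(z_new))
```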
The main issue with this two-step approach is the difficulty of designing an effective feature extraction process that aligns well with the target information. As a result, the model’s performance may be suboptimal.
Of course, this approach has clear advantages as well: it does not require large amounts of data or powerful computational resources. For that reason, it was the dominant approach in earlier eras.
But as we enter a new era of growing data volumes and computational power, researchers have begun to merge these two steps. The key idea is to use the target variable to guide the model, not only in learning the model parameters but also in learning a more informative feature extraction.
Unlike in the two-step approach, the fuel for training the model is now both \(X\) and \(y\). By skipping the intermediate step of extracting \(\textbf{z}\), we arrive at an end-to-end learning approach.
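A minimal sketch of this idea, assuming a small scikit-learn neural network on synthetic data: the hidden layers take over the role of \(g\), the output layer plays the role of \(f\), and everything is trained jointly from \(X\) and \(y\).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # toy raw features
y = rng.integers(0, 2, size=500)    # toy binary target

# End-to-end: feature extraction and prediction are learned together,
# so the target y guides every layer, not just the final one.
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                      random_state=0).fit(X, y)

x_new = rng.normal(size=(1, 20))
print(model.predict(x_new))         # raw x goes straight in; no explicit z step
```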
Keep in mind: in the new era, our slogan is Learn Everything From DATA.