Department of Statistics, Umeå University
Shallow ANN:
Easier to train, more efficient
Simpler decision structure
Better-understood theory
Deep ANN:
‘Arbitrarily’ powerful
More ‘meaningful’ feature extraction
More training challenges
Any solutions to avoid the overfitting problem?
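Common remedies (a sketch of standard techniques, not an exhaustive answer from the slide) include more training data, L2 regularization, early stopping, and dropout. A minimal NumPy sketch of inverted dropout, the variant used by most frameworks:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5, training=True):
    """Inverted dropout: randomly zero units during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.ones((4, 8))                        # toy hidden-layer activations
h_train = dropout(h, 0.5)                  # about half the units zeroed, survivors scaled to 2.0
h_eval = dropout(h, 0.5, training=False)   # identity at evaluation time
```

Because each surviving unit is divided by `keep_prob`, the network needs no rescaling at test time.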
Pre-training vs. Pre-trained Model vs. Transfer Learning
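To separate the three terms: pre-training is the initial training phase on a large source task; a pre-trained model is the resulting set of learned weights; transfer learning reuses those weights on a new task, often by freezing the early layers and fine-tuning only a task-specific head. A toy NumPy sketch of that last step (the weight names and shapes here are illustrative, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these weights come from a model pre-trained on a large source task.
pretrained = {
    "layer1": rng.normal(size=(4, 4)),   # early feature extractor: frozen
    "head":   rng.normal(size=(4, 2)),   # task-specific head: fine-tuned
}
frozen = {"layer1"}

def sgd_step(weights, grads, lr=0.1):
    """Apply one gradient step, but only to the unfrozen parameters."""
    return {name: (w if name in frozen else w - lr * grads[name])
            for name, w in weights.items()}

grads = {name: np.ones_like(w) for name, w in pretrained.items()}
updated = sgd_step(pretrained, grads)    # "layer1" unchanged, "head" updated
```

Freezing keeps the general-purpose features learned during pre-training intact while the small head adapts to the target task.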
ReLU function vs. sigmoid function
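The key practical difference is gradient saturation: the sigmoid's derivative vanishes for inputs far from zero, while ReLU passes a gradient of 1 for any positive input. A small sketch comparing the two gradients:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # peaks at 0.25, decays toward 0 for large |x|

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(x))  # tiny at the extremes: vanishing gradients in deep stacks
print(relu_grad(x))     # [0., 0., 1.]
```

This is one reason ReLU is the default hidden-layer activation in deep networks, while the sigmoid survives mainly as an output activation for probabilities.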
What is batch learning?
What are the ‘epochs’ and ‘batch_size’ parameters?
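In (mini-)batch learning, the training set is split into batches of `batch_size` examples, and one epoch is one full pass over the data, so an epoch takes `ceil(N / batch_size)` gradient steps. A minimal sketch of one epoch of mini-batch iteration (toy data, illustrative sizes):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))   # toy dataset: N = 1000 examples, 5 features

def minibatches(data, batch_size):
    """Shuffle once, then yield consecutive mini-batches (one epoch)."""
    idx = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[idx[start:start + batch_size]]

batch_size = 32
steps = sum(1 for _ in minibatches(X, batch_size))
print(steps)                               # 32 gradient steps per epoch
print(math.ceil(len(X) / batch_size))      # same number, by the formula
```

Note the last batch is smaller (1000 = 31 × 32 + 8), which frameworks handle the same way by default.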
The smaller/larger the learning rate, the better the training. Is that correct?
A learning rate of 0.01 is too small. Is that correct?
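Neither direction is correct in general: too small a learning rate makes progress very slow, while too large a rate makes the iterates oscillate or diverge, and whether 0.01 is "too small" depends on the problem, the scaling of the data, and the optimizer. The trade-off can be seen on the simplest possible objective, gradient descent on f(w) = w² (a toy sketch, not a recipe for choosing a rate):

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2w) from w0 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return abs(w)

print(gradient_descent(lr=0.001))  # too small: still close to the start
print(gradient_descent(lr=0.1))    # moderate: converges near the minimum at 0
print(gradient_descent(lr=1.1))    # too large: the iterates blow up
```

Each step multiplies w by (1 − 2·lr), so any lr above 1 makes that factor exceed 1 in magnitude and the sequence diverges; in practice the rate is tuned, scheduled, or adapted rather than fixed by a rule of thumb.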