ISBN : 9781394205608
Author : Douglas J. Santry
Publisher : Wiley
Year : 2023
Language : English
Type : Book
Description : Table of Contents

About the Author ix
Acronyms x

1 Introduction 1
1.1 AI/ML – Deep Learning? 5
1.2 A Brief History 6
1.3 The Genesis of Models 9
1.3.1 Rise of the Empirical Functions 9
1.3.2 The Biological Phenomenon and the Analogue 13
1.4 Numerical Computation – Computer Numbers Are Not Real 14
1.4.1 The IEEE 754 Floating Point System 15
1.4.2 Numerical Coding Tip: Think in Floating Point 18
1.5 Summary 20
1.6 Projects 21

2 Deep Learning and Neural Networks 23
2.1 Feed-Forward and Fully-Connected Artificial Neural Networks 24
2.2 Computing Neuron State 29
2.2.1 Activation Functions 29
2.3 The Feed-Forward ANN Expressed with Matrices 31
2.3.1 Neural Matrices: A Convenient Notation 32
2.4 Classification 33
2.4.1 Binary Classification 34
2.4.2 One-Hot Encoding 36
2.4.3 The Softmax Layer 38
2.5 Summary 39
2.6 Projects 40

3 Training Neural Networks 41
3.1 Preparing the Training Set: Data Preprocessing 42
3.2 Weight Initialization 45
3.3 Training Outline 47
3.4 Least Squares: A Trivial Example 49
3.5 Backpropagation of Error for Regression 51
3.5.1 The Terminal Layer (Output) 54
3.5.2 Backpropagation: The Shallower Layers 57
3.5.3 The Complete Backpropagation Algorithm 61
3.5.4 A Word on the Rectified Linear Unit (ReLU) 62
3.6 Stochastic Sine 64
3.7 Verification of a Software Implementation 66
3.8 Summary 70
3.9 Projects 71

4 Training Classifiers 73
4.1 Backpropagation for Classifiers 73
4.1.1 Likelihood 74
4.1.2 Categorical Loss Functions 75
4.2 Computing the Derivative of the Loss 77
4.2.1 Initiate Backpropagation 80
4.3 Multilabel Classification 81
4.3.1 Binary Classification 82
4.3.2 Training a Multilabel Classifier ANN 82
4.4 Summary 84
4.5 Projects 85

5 Weight Update Strategies 87
5.1 Stochastic Gradient Descent 87
5.2 Weight Updates as Iteration and Convex Optimization 92
5.2.1 Newton's Method for Optimization 93
5.3 RPROP+ 96
5.4 Momentum Methods 99
5.4.1 AdaGrad and RMSProp 100
5.4.2 ADAM 101
5.5 Levenberg–Marquardt Optimization for Neural Networks 103
5.6 Summary 108
5.7 Projects 109

6 Convolutional Neural Networks 111
6.1 Motivation 112
6.2 Convolutions and Features 113
6.3 Filters 117
6.4 Pooling 119
6.5 Feature Layers 120
6.6 Training a CNN 123
6.6.1 Flatten and the Gradient 123
6.6.2 Pooling and the Gradient 124
6.6.3 Filters and the Gradient 125
6.7 Applications 129
6.8 Summary 130
6.9 Projects 130

7 Fixing the Fit 133
7.1 Quality of the Solution 133
7.2 Generalization Error 134
7.2.1 Bias 134
7.2.2 Variance 135
7.2.3 The Bias-Variance Trade-off 136
7.2.4 The Bias-Variance Trade-off in Context 138
7.2.5 The Test Set 138
7.3 Classification Performance 140
7.4 Regularization 143
7.4.1 Forward Pass During Training 143
7.4.2 Forward Pass During Normal Inference 145
7.4.3 Backpropagation of Error 146
7.5 Advanced Normalization 148
7.5.1 Batch Normalization 149
7.5.2 Layer Normalization 154
7.6 Summary 156
7.7 Projects 157

8 Design Principles for a Deep Learning Training Library 159
8.1 Computer Languages 160
8.2 The Matrix: Crux of a Library Implementation 164
8.2.1 Memory Access and Modern CPU Architectures 165
8.2.2 Designing Matrix Computations 168
8.2.2.1 Convolutions as Matrices 170
8.3 The Framework 171
8.4 Summary 173
8.5 Projects 173

9 Vistas 175
9.1 The Limits of ANN Learning Capacity 175
9.2 Generative Adversarial Networks 177
9.2.1 GAN Architecture 178
9.2.2 The GAN Loss Function 180
9.3 Reinforcement Learning 183
9.3.1 The Elements of Reinforcement Learning 185
9.3.2 A Trivial RL Training Algorithm 187
9.4 Natural Language Processing Transformed 193
9.4.1 The Challenges of Natural Language 195
9.4.2 Word Embeddings 195
9.4.3 Attention 198
9.4.4 Transformer Blocks 200
9.4.5 Multi-Head Attention 204
9.4.6 Transformer Applications 205
9.5 Neural Turing Machines 207
9.6 Summary 210
9.7 Projects 210

Appendix A Mathematical Review 211
A.1 Linear Algebra 211
A.1.1 Vectors 211
A.1.2 Matrices 212
A.1.3 Matrix Properties 214
A.1.4 Linear Independence 215
A.1.5 The QR Decomposition 215
A.1.6 Least Squares 215
A.1.7 Eigenvalues and Eigenvectors 216
A.1.8 Hadamard Operations 216
A.2 Basic Calculus 217
A.2.1 The Product Rule 217
A.2.2 The Chain Rule 218
A.2.3 Multivariable Functions 218
A.2.4 Taylor Series 218
A.3 Advanced Matrices 219
A.4 Probability 219

Glossary 221
References 229
Index 243