
AI Hyperparameters (“cheatsheet”)

Note that some of the hyperparameters below are specific to certain models or algorithms.

Hyperparameters are crucial configurations set before training machine learning models, influencing their performance. Unlike model parameters learned during training, hyperparameters, such as learning rate, batch size, number of epochs, model architecture, activation functions, regularization techniques, and optimizer choice, must be manually set. Proper hyperparameter tuning can significantly enhance model performance, helping to avoid underfitting or overfitting. Techniques for tuning include manual adjustment, grid search, random search, Bayesian optimization, and Automated Hyperparameter Tuning (AutoML). The learning rate controls step size in optimization, batch size affects training speed and memory usage, and the architecture determines the model’s capacity. Activation functions introduce non-linearity, while regularization methods prevent overfitting. The choice of optimizer impacts how model weights are updated. Efficient hyperparameter tuning is essential for optimizing model accuracy and generalization to new data.
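
As a quick illustration of tuning, the sketch below runs a grid search with scikit-learn (an assumed library; the same idea applies to any framework). The dataset, model, and parameter grid are arbitrary placeholders, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Illustrative grid: every combination is trained and cross-validated.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],       # inverse regularization strength
    "solver": ["lbfgs", "liblinear"],  # optimization algorithm
}

search = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```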

General Hyperparameters:

  1. Learning Rate: Step size for weight updates.
  2. Batch Size: Number of samples per gradient update.
  3. Number of Epochs: Full cycles over the entire dataset.
  4. Initialization Method: Strategy for initial weights (e.g., Xavier, He initialization).
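
As a minimal sketch (assuming Keras/TensorFlow, which this post does not prescribe), the snippet below shows where these general hyperparameters are usually set; all values are placeholders.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_initializer="he_normal"),  # initialization method
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # learning rate
    loss="binary_crossentropy",
)

# Batch size and number of epochs are passed at training time
# (x_train / y_train are placeholders for your data):
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```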

Model Architecture (Deep Learning):

  1. Number of Layers: Depth of the network.
  2. Neurons per Layer: Number of units in each layer (the network’s width).
  3. Activation Functions: ReLU, Sigmoid, Tanh, Leaky ReLU, etc.
  4. Dropout Rate: Fraction of neurons to drop out for regularization.
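
A minimal sketch of these architecture hyperparameters, again assuming Keras; the depth, width, and dropout rate below are arbitrary values.

```python
import tensorflow as tf

n_layers = 3        # number of hidden layers (depth)
n_units = 128       # neurons per layer (width)
dropout_rate = 0.3  # fraction of units dropped during training

layers = [tf.keras.Input(shape=(20,))]
for _ in range(n_layers):
    layers.append(tf.keras.layers.Dense(n_units, activation="relu"))
    layers.append(tf.keras.layers.Dropout(dropout_rate))
layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))

model = tf.keras.Sequential(layers)
```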

Regularization:

  1. L1 Regularization Coefficient: Regularization strength for L1.
  2. L2 Regularization Coefficient: Regularization strength for L2.
  3. Early Stopping: Stop training when a monitored metric stops improving.
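
In Keras terms (an assumed framework), L1/L2 coefficients attach to layers and early stopping is a callback; the coefficients and patience below are illustrative.

```python
import tensorflow as tf

l1_coeff, l2_coeff = 1e-5, 1e-4  # illustrative regularization strengths

dense = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.L1L2(l1=l1_coeff, l2=l2_coeff),
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # metric to watch
    patience=5,                 # epochs without improvement before stopping
    restore_best_weights=True,
)
# Pass callbacks=[early_stop] to model.fit(...).
```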

Optimization:

  1. Optimizer: SGD, Adam, RMSprop, etc.
  2. Momentum: Accelerates gradient descent by accumulating a fraction of past updates.
  3. Learning Rate Decay: Reduction method for the learning rate over time.
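
A sketch of these optimization hyperparameters in Keras (assumed framework): choosing an optimizer, setting momentum for SGD, and decaying the learning rate over time.

```python
import tensorflow as tf

# Learning rate decay: shrink the learning rate exponentially over steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=1000,
    decay_rate=0.96,
)

sgd = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)  # alternative optimizer
```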

Convolutional Neural Networks (CNN):

  1. Filter Size: Size of the convolutional kernels.
  2. Stride: Step size with which the kernel moves across the input.
  3. Padding: Zeros added around the input boundary before convolution.
  4. Pooling Size: Window size used by pooling operations to downsample feature maps.
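
The Keras sketch below (assumed framework, placeholder shapes) marks where filter size, stride, padding, and pooling size appear in a small CNN.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(
        filters=32,
        kernel_size=(3, 3),   # filter size
        strides=(1, 1),       # stride
        padding="same",       # zero-pad so the output keeps the input size
        activation="relu",
    ),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),  # pooling size
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```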

Recurrent Neural Networks (RNN):

  1. Sequence Length: Length of input sequences.
  2. Hidden State Size: Dimensionality of RNN hidden states.
  3. Cell Type: LSTM, GRU, or vanilla RNN.
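
A minimal Keras sketch (assumed framework): the sequence length and feature count define the input shape, the hidden state size is the layer width, and the cell type is the layer class.

```python
import tensorflow as tf

seq_len = 50       # sequence length (timesteps)
n_features = 8     # features per timestep (placeholder)
hidden_size = 128  # dimensionality of the hidden state

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, n_features)),
    tf.keras.layers.LSTM(hidden_size),  # swap for GRU(...) or SimpleRNN(...)
    tf.keras.layers.Dense(1),
])
```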

Ensemble Methods (e.g., Random Forest, Gradient Boosting):

  1. Number of Estimators: Number of trees in the forest or boosting rounds.
  2. Max Depth: Maximum depth of each tree.
  3. Min Samples Split: Minimum number of samples required to split a node.
  4. Learning Rate (for boosting): Shrinks the contribution of each tree.
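
In scikit-learn terms (an assumed library), these map directly onto constructor arguments; the values below are illustrative, not tuned.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees
    max_depth=10,         # maximum depth of each tree
    min_samples_split=4,  # minimum samples required to split a node
)

boosting = GradientBoostingClassifier(
    n_estimators=300,     # boosting rounds
    max_depth=3,
    learning_rate=0.05,   # shrinks the contribution of each tree
)
```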

Support Vector Machines (SVM):

  1. C (Regularization parameter): Trade-off between smooth decision boundary and classifying training points correctly.
  2. Kernel Type: Linear, Polynomial, RBF, Sigmoid, etc.
  3. Gamma: Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’.
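
A short sketch with scikit-learn’s SVC (assumed library, placeholder values):

```python
from sklearn.svm import SVC

clf = SVC(
    C=1.0,          # regularization trade-off
    kernel="rbf",   # linear, poly, rbf, sigmoid, ...
    gamma="scale",  # kernel coefficient for rbf/poly/sigmoid
)
```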

K-Nearest Neighbors (KNN):

  1. Number of Neighbors: How many nearest neighbors are consulted for each prediction.
  2. Distance Metric: How distance is measured (e.g., Euclidean, Manhattan).
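
Again in scikit-learn (assumed library), with placeholder values:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(
    n_neighbors=5,       # number of nearest neighbors
    metric="euclidean",  # or "manhattan", "minkowski", ...
)
```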

Clustering (e.g., K-Means):

  1. Number of Clusters: The ‘K’ in K-Means.
  2. Initialization Method: Method for selecting initial cluster centers.
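
And a final scikit-learn sketch (assumed library) for K-Means; K and the initialization method below are placeholders.

```python
from sklearn.cluster import KMeans

kmeans = KMeans(
    n_clusters=3,      # the "K" in K-Means
    init="k-means++",  # initialization method ("random" is the alternative)
    n_init=10,         # number of centroid re-initializations to try
)
```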
