Gradient Descent
Definition
Gradient descent is a mathematical optimization algorithm used to train AI models. It works by iteratively adjusting the model's weights (its internal settings) so the output becomes more accurate over time. This is done by computing the gradient of the error, which points in the direction of steepest increase, and moving the weights a small step in the opposite direction to reduce it.
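In code, the core update is a single line: new weight = old weight minus learning rate times gradient. Here's a minimal Python sketch on a toy one-parameter loss, which stands in for a real model's error:

```python
# Minimal gradient descent on a toy loss L(w) = (w - 3)^2.
# The gradient dL/dw = 2 * (w - 3) points toward steeper error,
# so each step moves w in the opposite direction.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # step size

for step in range(50):
    w = w - learning_rate * grad(w)  # the core update rule

print(w)  # converges toward the minimum at w = 3.0
```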
Example
“During training, a neural network uses gradient descent to get better at predicting the next word in a sentence.”
How It’s Used in AI
Gradient descent is the foundation of how neural networks and other machine learning models learn. It powers everything from fine-tuning to the large-scale pretraining of LLMs, letting the model improve a little with each batch of data.
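In practice, frameworks compute the gradients for you. Here's a minimal sketch of that batch-by-batch loop in PyTorch, with a toy linear model standing in for a real network:

```python
import torch
import torch.nn as nn

# Toy setup: a single linear layer learning y = 2x from random batches.
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(32, 1)       # one batch of inputs
    y = 2.0 * x                  # targets the model should learn
    optimizer.zero_grad()        # clear gradients from the previous batch
    loss = loss_fn(model(x), y)  # measure current error
    loss.backward()              # compute gradients of loss w.r.t. weights
    optimizer.step()             # move weights opposite the gradient
```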
Brief History
First described by Augustin-Louis Cauchy in 1847 and long used in statistics and numerical optimization, gradient descent became a key part of training deep neural networks in the 2010s. Variants like Stochastic Gradient Descent (SGD) and Adam are now standard in AI research.
Key Tools or Models
Algorithms: SGD, Adam, RMSprop (interchangeable in practice; see the sketch after this list)
Libraries: TensorFlow, PyTorch
Used in most modern AI training workflows
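In PyTorch, these optimizers are drop-in replacements for one another. A sketch, again with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)  # placeholder model, as in the loop above

# Each optimizer implements a variant of the same gradient-descent update;
# swapping one for another leaves the rest of the training loop unchanged.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```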
Pro Tip
Learning rate matters. If it's too high, the updates overshoot the minimum and the loss can diverge. Too low, and training takes forever to converge.
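To see both failure modes, here's a small sketch reusing the toy quadratic loss from the Definition above (the specific rates are arbitrary illustrations):

```python
# Revisit the toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# For this loss, any learning rate above 1.0 overshoots and diverges.

def grad(w):
    return 2.0 * (w - 3.0)

for lr in (1.1, 0.0001, 0.1):  # too high, too low, reasonable
    w = 0.0
    for _ in range(50):
        w = w - lr * grad(w)
    print(f"lr={lr}: w after 50 steps = {w:.4f}")

# lr=1.1 blows up, lr=0.0001 barely moves, lr=0.1 lands near the minimum at 3.0
```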