Picks a random instance in the training set at every step and computes the gradients based only on that single instance.
Or, shuffle the training set, then go through it instance by instance, then shuffle it again, and so on; however, this generally converges more slowly.
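A minimal NumPy sketch of both variants for linear regression; the toy dataset, constant learning rate, and epoch count are illustrative assumptions, not values from the source:

```python
import numpy as np

rng = np.random.default_rng(42)
X_b = np.c_[np.ones((100, 1)), 2 * rng.random((100, 1))]  # bias column + feature
y = 4 + 3 * X_b[:, 1:] + rng.standard_normal((100, 1))

theta = rng.standard_normal((2, 1))  # random initialization
eta, n_epochs, m = 0.01, 50, len(X_b)

for epoch in range(n_epochs):
    # Variant 1: pick a random instance at every step (some instances may be
    # sampled several times per epoch, others not at all).
    for _ in range(m):
        i = rng.integers(m)
        xi, yi = X_b[i:i + 1], y[i:i + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)  # MSE gradient on one instance
        theta -= eta * gradients

    # Variant 2 (the slower-converging alternative): shuffle each epoch, then
    # sweep instance by instance -- replace the inner loop above with:
    #     for i in rng.permutation(m): ...
```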
Add a regularization term equal to $\alpha \sum_{i=1}^{n}\theta_{i}^{2}$
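In Scikit-Learn this is Ridge Regression; a short sketch with an illustrative toy dataset and `alpha` value (neither from the source):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

ridge_reg = Ridge(alpha=1.0)  # alpha is the regularization strength
ridge_reg.fit(X, y)
print(ridge_reg.predict([[1.5]]))
```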
Add the regularization term $\alpha \sum_{i=1}^{n} |\theta_{i}|$
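In Scikit-Learn this is Lasso Regression; the data and `alpha` below are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

lasso_reg = Lasso(alpha=0.1)  # L1 penalty tends to zero out the weights
lasso_reg.fit(X, y)           # of the least important features
print(lasso_reg.predict([[1.5]]))
```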
Add the regularization term $r\alpha\sum_{i=1}^{n}|\theta_{i}|+\frac{1-r}{2}\alpha\sum_{i=1}^{n}\theta_{i}^{2}$
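In Scikit-Learn this is Elastic Net, where `l1_ratio` plays the role of the mix ratio $r$ above; the data and hyperparameter values are illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio = r
elastic_net.fit(X, y)
print(elastic_net.predict([[1.5]]))
```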
Stop training as soon as the validation error reaches a minimum. A simple and efficient regularization technique that Geoffrey Hinton called a "beautiful free lunch."
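One way to sketch early stopping with Scikit-Learn's `SGDRegressor`, assuming a held-out validation set; the dataset, epoch budget, and hyperparameters are illustrative:

```python
import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = 2 * rng.random((200, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)

# warm_start=True makes each fit() call continue from where it left off,
# so every call runs one more epoch.
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                       learning_rate="constant", eta0=0.0005)

best_val_error, best_model = float("inf"), None
for epoch in range(500):
    sgd_reg.fit(X_train, y_train)  # one more epoch of training
    val_error = mean_squared_error(y_val, sgd_reg.predict(X_val))
    if val_error < best_val_error:
        best_val_error = val_error
        best_model = deepcopy(sgd_reg)  # keep the best model seen so far
```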
Hands-On Machine Learning with Scikit-Learn & TensorFlow