Code Explanation
This is a detailed analysis of the complete code snippet at the end of Chapter 6, which implements the Random Local Search method.
Objective of the Code
This code snippet is a complete program to illustrate an optimization method that is better than a completely random search. Instead of "parachuting" to random locations, the model will start from one point, take a small random "trial step," and only accept that step if it leads to a better result (lower loss).
It is a concrete implementation of the "blindfolded mountaineer" analogy mentioned earlier.
Detailed Analysis of Each Block
We will go through each part of the code and explain its meaning.
Block 1: Preparing the Environment (Page 11)
# Imports: NumPy and the nnfs helper package that provides the dataset
import numpy as np
import nnfs
from nnfs.datasets import vertical_data

nnfs.init()

# Create dataset
X, y = vertical_data(samples=100, classes=3)

# Create model (the classes below were built in previous chapters)
dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)
activation2 = Activation_Softmax()
# Create loss function
loss_function = Loss_CategoricalCrossentropy()
- Simple Explanation: This section sets up everything necessary before starting the training.
  - vertical_data: Creates an "easy" dataset. This data consists of 3 clusters of points that are clearly separated vertically, so a straight line can easily divide them; this is why the method can succeed on it (a quick shape check follows this list).
  - Layer_Dense(2, 3): The first hidden layer takes 2 inputs (the x, y coordinates of each data point) and has 3 neurons.
  - Layer_Dense(3, 3): The output layer takes 3 inputs (from the previous layer) and has 3 neurons (corresponding to the 3 classes/clusters of data to be classified).
  - ReLU, Softmax, CategoricalCrossentropy: These are the standard components of a neural network for a multi-class classification problem, which we built in previous chapters.
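As a quick sanity check of the dataset (assuming the nnfs package's vertical_data, where samples is the number of points per class), a short snippet like this can be run:

# Hypothetical shape check, not part of the book's code
from nnfs.datasets import vertical_data

X, y = vertical_data(samples=100, classes=3)
print(X.shape)  # (300, 2): 100 points per class * 3 classes, each with (x, y) coordinates
print(y.shape)  # (300,):   one integer class label (0, 1 or 2) per point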
Block 2: Initializing the Mountaineer's "Memory" (Page 11)
# Helper variables
lowest_loss = 9999999 # some initial value
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()
- Simple Explanation: This is an extremely important step; it sets up the "memory" for the algorithm.
  - lowest_loss = 9999999: We initialize a variable to store the lowest loss value seen so far. We assign it a very large number to ensure that the loss calculated in the first iteration will certainly be smaller and be recorded.
  - best_..._weights = dense1.weights.copy(): This is the core part of the "memory." We create a complete copy of the initial set of weights and biases. These variables will always store the best set of parameters we have found so far.
- General Knowledge Comparison - The Importance of .copy(): In Python, when you assign an object (like a NumPy array) with a = b, you don't create a new object; a and b both point to the same memory location, so if you change b, a changes as well. The .copy() method creates a completely new, independent object in a different memory location. This allows us to freely change dense1.weights without affecting best_dense1_weights, which is holding our "record." Without .copy(), the algorithm would not work.
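A minimal, standalone demonstration of this difference (the array values here are arbitrary):

import numpy as np

weights = np.array([[0.1, 0.2], [0.3, 0.4]])

alias = weights            # plain assignment: both names refer to the same array
snapshot = weights.copy()  # .copy(): an independent array in new memory

weights += 1.0             # modify the original in place

print(alias)      # has changed too, because it is the same object as weights
print(snapshot)   # still holds the original values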
Block 3: The Optimization Loop - The Mountaineering Journey (Pages 11-12)
This is the heart of the algorithm.
for iteration in range(10000):

    # --- Step 1: Take a random trial step ---
    # Update weights with some small random values
    dense1.weights += 0.05 * np.random.randn(2, 3)
    dense1.biases += 0.05 * np.random.randn(1, 3)
    # ... similar for dense2 ...

    # --- Step 2: Check the new position ---
    # Perform a forward pass ...
    dense1.forward(X)
    activation1.forward(dense1.output)
    # ...
    loss = loss_function.calculate(activation2.output, y)

    # --- Step 3: Evaluate and Decide ---
    # Calculate accuracy ...
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    # If loss is smaller ...
    if loss < lowest_loss:
        # Keep the new position
        print(...)
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        # ...
        lowest_loss = loss
    # Revert weights and biases
    else:
        # Return to the old position
        dense1.weights = best_dense1_weights.copy()
        dense1.biases = best_dense1_biases.copy()
        # ...
- Simple Explanation:
  - Take a trial step (+=): Instead of creating new weights, we take the current weights and add (+=) a very small random perturbation (0.05 * np.random.randn(...)). The number 0.05 is the "step size," which determines whether our trial step is large or small.
  - Check the new position: After taking a trial step, we perform the entire forward pass again to calculate the loss with the newly adjusted weights.
  - Evaluate and Decide (if/else):
    - if loss < lowest_loss (going downhill): If the new loss is smaller than the best loss ever recorded, this is a successful step. We will:
      - Print a progress message.
      - Update lowest_loss with the new value.
      - Most importantly: update the best_... variables by copying the current weights and biases. This new position becomes the new "best position."
    - else (going uphill or sideways): If the new loss is not better, this trial step was a mistake. We will:
      - Revert the change by assigning the current weights and biases back to the values stored in the best_... variables. This brings the "mountaineer" back to the previously known best position, ready for another random trial step in the next iteration.
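To see this accept/revert pattern in isolation, here is a minimal sketch of the same random local search idea applied to a one-dimensional toy function (the function f, the step size, and the iteration count are arbitrary illustrative choices, not part of the book's code):

import numpy as np

def f(x):
    # A toy "loss landscape": a parabola whose minimum is at x = 3
    return (x - 3.0) ** 2

x = 0.0              # current position (plays the role of the weights)
best_x = x           # memory of the best position found so far
lowest_loss = f(x)   # memory of the lowest loss found so far
step_size = 0.05

for iteration in range(1000):
    x += step_size * np.random.randn()  # random trial step
    loss = f(x)                         # evaluate the new position
    if loss < lowest_loss:
        # Downhill: keep the new position and update the memory
        lowest_loss = loss
        best_x = x
    else:
        # Uphill or sideways: revert to the best known position
        x = best_x

print(best_x, lowest_loss)  # best_x should end up close to 3

The network version does exactly the same thing; the "position" just happens to be every weight and bias in both dense layers.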
Diagram Illustrating Matrix Operations
The most important new operation in this loop is dense1.weights += 0.05 * np.random.randn(2, 3).

- dense1.weights: a matrix of size (2, 3).
- np.random.randn(2, 3): creates a random matrix, also of size (2, 3).
- 0.05 * ...: multiplies each element in the random matrix by 0.05.
- +=: adds the two matrices element-wise.
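A quick NumPy check of these shapes (the values are random and will vary between runs):

import numpy as np

weights = np.zeros((2, 3))                 # stand-in for dense1.weights
adjustment = 0.05 * np.random.randn(2, 3)  # small random matrix of the same shape

weights += adjustment                      # element-wise addition, shapes match exactly
print(weights.shape)                       # (2, 3)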
Diagram illustrating the update of dense1.weights:

Current weights (dense1.weights) Random adjustment (0.05 * randn)          New weights
          (2x3)                             (2x3)                             (2x3)
+----------------------+         +----------------------+         +---------------------------+
| w_11   w_12   w_13   |    +    | r_11   r_12   r_13   |    =    | w_11+r_11  w_12+r_12  ... |
| w_21   w_22   w_23   |         | r_21   r_22   r_23   |         | w_21+r_21  w_22+r_22  ... |
+----------------------+         +----------------------+         +---------------------------+
Diagram illustrating the update of dense1.biases (which is a row vector):

Current biases (dense1.biases)   Random adjustment (0.05 * randn)   New biases
       (1x3)                        (1x3)                    (1x3)
+-----------------+     +--------------------+     +----------------+
| b_1   b_2   b_3 |  +  | rb_1   rb_2   rb_3 |  =  | b_1+rb_1  ...  |
+-----------------+     +--------------------+     +----------------+
Conclusion and Evaluation
- Advantages: This method is much more directed than a completely random search. It builds on previous success, step by step, to feel its way towards a better solution. This explains why it works effectively on the simple vertical_data dataset, achieving high accuracy.
- Critical Disadvantage: As seen at the end of the chapter, the method fails when applied to the complex spiral_data. The reason is that it gets stuck in a local minimum: the "blindfolded mountaineer" has descended into a small pit and cannot get out, because every small random step leads uphill (see the sketch after this list).
- Lesson Learned: We need a method "smarter" than random groping; we need a way to know which direction is the steepest descent. This is the role of the derivative and the gradient, which will be introduced in the following chapters to build the Gradient Descent algorithm.
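To reproduce the failure described in the second point above, one could keep the rest of the script unchanged and only swap the dataset (a sketch, assuming the nnfs package's spiral_data):

# Swap the dataset line from Block 1 for the harder spiral dataset
from nnfs.datasets import spiral_data

X, y = spiral_data(samples=100, classes=3)
# Re-running the same 10000-iteration loop now makes very little progress:
# the random local steps cannot escape the local minimum they settle into.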