Code Explanation
This is a detailed analysis of the complete code snippet at the end of Chapter 6, which implements the Random Local Search method.
Objective of the Code
This code snippet is a complete program to illustrate an optimization method that is better than a completely random search. Instead of "parachuting" to random locations, the model will start from one point, take a small random "trial step," and only accept that step if it leads to a better result (lower loss).
It is a concrete implementation of the "blindfolded mountaineer" analogy mentioned earlier.
Detailed Analysis of Each Block
We will go through each part of the code and explain its meaning.
Block 1: Preparing the Environment (Page 11)
# Imports: NumPy and the nnfs helper package that provides the dataset
import numpy as np
import nnfs
from nnfs.datasets import vertical_data

nnfs.init()

# Create dataset
X, y = vertical_data(samples=100, classes=3)

# Create model (the classes below were built in previous chapters)
dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)
activation2 = Activation_Softmax()
# Create loss function
loss_function = Loss_CategoricalCrossentropy()
- Simple Explanation: This section sets up everything necessary before starting the training.
  - vertical_data: Creates an "easy" dataset. This data consists of 3 clusters of points that are clearly separated vertically, so a straight line can easily divide them; this is why the method can succeed on it (a quick shape check follows this list).
  - Layer_Dense(2, 3): The first hidden layer takes 2 inputs (the x, y coordinates of each data point) and has 3 neurons.
  - Layer_Dense(3, 3): The output layer takes 3 inputs (from the previous layer) and has 3 neurons (corresponding to the 3 classes/clusters of data to be classified).
  - ReLU, Softmax, CategoricalCrossentropy: These are the standard components of a neural network for a multi-class classification problem, which we built in previous chapters.
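As a quick sanity check of the dataset (assuming the nnfs package's vertical_data, where samples is the number of points per class), a short snippet like this can be run:

# Hypothetical shape check, not part of the book's code
from nnfs.datasets import vertical_data

X, y = vertical_data(samples=100, classes=3)
print(X.shape)  # (300, 2): 100 points per class * 3 classes, each with (x, y) coordinates
print(y.shape)  # (300,):   one integer class label (0, 1 or 2) per point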
Block 2: Initializing the Mountaineer's "Memory" (Page 11)
# Helper variables
lowest_loss = 9999999 # some initial value
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()
- Simple Explanation: This is an extremely important step; it sets up the "memory" for the algorithm.
  - lowest_loss = 9999999: We initialize a variable to store the lowest loss value seen so far. We assign it a very large number to ensure that the loss calculated in the first iteration will certainly be smaller and be recorded.
  - best_..._weights = dense1.weights.copy(): This is the core part of the "memory." We create a complete copy of the initial set of weights and biases. These variables will always store the best set of parameters we have found so far.
- General Knowledge Comparison - The Importance of .copy(): In Python, when you assign an object (like a NumPy array) with a = b, you don't create a new object; a and b both point to the same memory location, so if you change b, a changes as well. The .copy() method creates a completely new, independent object in a different memory location. This allows us to freely change dense1.weights without affecting best_dense1_weights, which is holding our "record." Without .copy(), the algorithm would not work.
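A minimal, standalone demonstration of this difference (the array values here are arbitrary):

import numpy as np

weights = np.array([[0.1, 0.2], [0.3, 0.4]])

alias = weights            # plain assignment: both names refer to the same array
snapshot = weights.copy()  # .copy(): an independent array in new memory

weights += 1.0             # modify the original in place

print(alias)      # has changed too, because it is the same object as weights
print(snapshot)   # still holds the original values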
Block 3: The Optimization Loop - The Mountaineering Journey (Pages 11-12)
This is the heart of the algorithm.
for iteration in range(10000):

    # --- Step 1: Take a random trial step ---
    # Update weights with some small random values
    dense1.weights += 0.05 * np.random.randn(2, 3)
    dense1.biases += 0.05 * np.random.randn(1, 3)
    # ... similar for dense2 ...

    # --- Step 2: Check the new position ---
    # Perform a forward pass ...
    dense1.forward(X)
    activation1.forward(dense1.output)
    # ...
    loss = loss_function.calculate(activation2.output, y)

    # --- Step 3: Evaluate and Decide ---
    # Calculate accuracy ...
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    # If loss is smaller ...
    if loss < lowest_loss:
        # Keep the new position
        print(...)
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        # ...
        lowest_loss = loss
    # Revert weights and biases
    else:
        # Return to the old position
        dense1.weights = best_dense1_weights.copy()
        dense1.biases = best_dense1_biases.copy()
        # ...
- Simple Explanation:
  - Take a trial step (+=): Instead of creating new weights, we take the current weights and add (+=) a very small random perturbation (0.05 * np.random.randn(...)). The number 0.05 is the "step size," which determines whether our trial step is large or small.
  - Check the new position: After taking a trial step, we perform the entire forward pass again to calculate the loss with the newly adjusted weights.
  - Evaluate and Decide (if/else):
    - if loss < lowest_loss (going downhill): If the new loss is smaller than the best loss ever recorded, this is a successful step. We will:
      - Print a progress message.
      - Update lowest_loss with the new value.
      - Most importantly: update the best_... variables by copying the current weights and biases. This new position becomes the new "best position."
    - else (going uphill or sideways): If the new loss is not better, this trial step was a mistake. We will:
      - Revert the change by assigning the current weights and biases back to the values stored in the best_... variables. This brings the "mountaineer" back to the previously known best position, ready for another random trial step in the next iteration.
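To see this accept/revert pattern in isolation, here is a minimal sketch of the same random local search idea applied to a one-dimensional toy function (the function f, the step size, and the iteration count are arbitrary illustrative choices, not part of the book's code):

import numpy as np

def f(x):
    # A toy "loss landscape": a parabola whose minimum is at x = 3
    return (x - 3.0) ** 2

x = 0.0              # current position (plays the role of the weights)
best_x = x           # memory of the best position found so far
lowest_loss = f(x)   # memory of the lowest loss found so far
step_size = 0.05

for iteration in range(1000):
    x += step_size * np.random.randn()  # random trial step
    loss = f(x)                         # evaluate the new position
    if loss < lowest_loss:
        # Downhill: keep the new position and update the memory
        lowest_loss = loss
        best_x = x
    else:
        # Uphill or sideways: revert to the best known position
        x = best_x

print(best_x, lowest_loss)  # best_x should end up close to 3

The network version does exactly the same thing; the "position" just happens to be every weight and bias in both dense layers.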
Diagram Illustrating Matrix Operations
The most important new operation in this loop is dense1.weights += 0.05 * np.random.randn(2, 3).

- dense1.weights: a matrix of size (2, 3).
- np.random.randn(2, 3): creates a random matrix, also of size (2, 3).
- 0.05 * ...: multiplies each element in the random matrix by 0.05.
- +=: adds the two matrices element-wise.
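A quick NumPy check of these shapes (the values are random and will vary between runs):

import numpy as np

weights = np.zeros((2, 3))                 # stand-in for dense1.weights
adjustment = 0.05 * np.random.randn(2, 3)  # small random matrix of the same shape

weights += adjustment                      # element-wise addition, shapes match exactly
print(weights.shape)                       # (2, 3)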
Diagram illustrating the update of dense1.weights:

Current weights (dense1.weights) Random adjustment (0.05 * randn)          New weights
          (2x3)                             (2x3)                             (2x3)
+----------------------+         +----------------------+         +---------------------------+
| w_11   w_12   w_13   |    +    | r_11   r_12   r_13   |    =    | w_11+r_11  w_12+r_12  ... |
| w_21   w_22   w_23   |         | r_21   r_22   r_23   |         | w_21+r_21  w_22+r_22  ... |
+----------------------+         +----------------------+         +---------------------------+
Diagram illustrating the update of dense1.biases (which is a row vector):

Current biases (dense1.biases)   Random adjustment (0.05 * randn)   New biases
       (1x3)                        (1x3)                    (1x3)
+-----------------+     +--------------------+     +----------------+
| b_1   b_2   b_3 |  +  | rb_1   rb_2   rb_3 |  =  | b_1+rb_1  ...  |
+-----------------+     +--------------------+     +----------------+
Conclusion and Evaluation
- Advantages: This method is much more directed than a completely random search. It builds on previous success, step by step, to feel its way towards a better solution. This explains why it works effectively on the simple vertical_data dataset, achieving high accuracy.
- Critical Disadvantage: As seen at the end of the chapter, the method fails when applied to the complex spiral_data. The reason is that it gets stuck in a local minimum: the "blindfolded mountaineer" has descended into a small pit and cannot get out, because every small random step leads uphill (see the sketch after this list).
- Lesson Learned: We need a method "smarter" than random groping; we need a way to know which direction is the steepest descent. This is the role of the derivative and the gradient, which will be introduced in the following chapters to build the Gradient Descent algorithm.
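To reproduce the failure described in the second point above, one could keep the rest of the script unchanged and only swap the dataset (a sketch, assuming the nnfs package's spiral_data):

# Swap the dataset line from Block 1 for the harder spiral dataset
from nnfs.datasets import spiral_data

X, y = spiral_data(samples=100, classes=3)
# Re-running the same 10000-iteration loop now makes very little progress:
# the random local steps cannot escape the local minimum they settle into.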