First Layer

Detailed Code Explanation

The chapter01.py program

Overview: What is the goal of this program?

Imagine we want to build a simple artificial "brain". This program performs the first and most fundamental step:

  1. Build a "neuron": Create a basic information processing unit.
  2. Prepare the data: Generate a sample dataset for the "brain" to process.
  3. Perform a calculation: Pass the data through the "brain" and see what the output is.

This is the foundation of every neural network. Understanding each line of code here will help you grasp more complex concepts later on.


Part 1: Preparing Tools and Materials (Imports & Data)

This is the step where we gather the necessary libraries and data before we start "building".

Python
import numpy as np
import nnfs
import matplotlib.pyplot as plt

from nnfs.datasets import spiral_data
nnfs.init()
Detailed Line-by-Line Explanation:
  • import numpy as np:

    • What is it?: NumPy (Numerical Python) is the most fundamental and important library for data science in Python. It provides an extremely efficient data structure called an array and tools to perform operations on these arrays, especially matrix math.
    • Why do we need it?: A neural network is, in essence, a series of matrix operations. NumPy performs these matrix multiplications and additions far more quickly and efficiently than Python's standard lists. The as np part is simply a common convention for aliasing the library.
  • import nnfs:

    • What is it?: nnfs (Neural Networks from Scratch) is a helper library written specifically for the book of the same name. Its purpose is to help learners focus on the concepts of neural networks rather than getting bogged down in minor details.
    • Why do we need it?: It provides utility functions, such as creating sample data (spiral_data) and initializing the environment (init), to ensure that everyone's results are the same, making it easier to learn and debug.
  • import matplotlib.pyplot as plt:

    • What is it?: Matplotlib is the most popular data visualization (plotting) library in Python. pyplot is a module within Matplotlib that provides an interface similar to MATLAB.
    • Why do we need it?: "A picture is worth a thousand words." This library allows us to plot the data on a graph to see what it looks like. Seeing the spiral data visually helps us better understand the problem the neural network is trying to solve.
  • from nnfs.datasets import spiral_data:

    • What is it?: This is a specific import statement. Instead of importing the entire nnfs.datasets library, we only import the spiral_data function from it.
    • Why do we need it?: spiral_data is a function that helps create the famous spiral dataset, a classic problem for testing classification models.
  • nnfs.init():

    • What is it?: This command calls the init function from the nnfs library.
    • Why do we need it?: This function performs some background setup, most importantly setting the seed for NumPy's random number generation and establishing a default data type. This ensures that every time you run the code, the "random weights" and "data" generated will be exactly the same, making learning and reproducing results consistent.
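To make the reproducibility idea concrete, here is a minimal sketch of the seeding behavior. The exact internals of nnfs.init() may differ; fixing NumPy's random seed is the key effect:

```python
import numpy as np

# Fixing the seed makes every "random" draw reproducible, which is the
# core effect nnfs.init() relies on (exact internals may differ).
np.random.seed(0)
a = np.random.randn(2, 3)

np.random.seed(0)          # reset to the same seed...
b = np.random.randn(2, 3)  # ...and the "random" numbers repeat exactly

print(np.array_equal(a, b))  # True
```

This is why everyone following the book sees the same "random" initial weights.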

Part 2: Building the "Blueprint for a Judge" (class Layer_Dense)

This is the heart of the program. We aren't building a single neuron, but a "blueprint" (class) so that we can easily create an entire layer/panel of judges.

Python
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
Detailed Section-by-Section Explanation:
  • class Layer_Dense:: Declares a "blueprint" named Layer_Dense. Everything inside it will define the properties and behaviors of a dense neural network layer.

  • def __init__(self, n_inputs, n_neurons):: The initializer (constructor).

    • What does it do?: This function is automatically called every time a new object is created from this blueprint (e.g., dense1 = Layer_Dense(...)). It is used to set up initial properties.
    • self: Represents the object that will be created. When you call dense1.weights, self is dense1.
    • n_inputs: The number of input features this layer will receive (e.g., 2 features like "Redness" and "Roundness" of fruits).
    • n_neurons: The number of neurons in this layer (e.g., 3 judges, one for each type of fruit).
    • self.weights = 0.01 * np.random.randn(n_inputs, n_neurons): This is an extremely important line.
      • np.random.randn(n_inputs, n_neurons): Creates a matrix of size (number_of_inputs, number_of_neurons) filled with random numbers from a standard normal distribution (Gaussian distribution, with a mean of 0 and variance of 1). This represents the initial, completely random "preferences" of the judges.
      • * 0.01: Multiplies all the random weights by a very small number. This is a common technique to prevent the initial output values from being too large, which helps stabilize the training process later on.
    • self.biases = np.zeros((1, n_neurons)):
      • np.zeros((1, n_neurons)): Creates a row vector of shape (1, number_of_neurons) filled with zeros. This represents the initial "bias" or "mood" of the judges. Initializing with zeros means that, initially, they have no predisposition.
  • def forward(self, inputs):: The action method.

    • What does it do?: Defines the main behavior of the layer: receiving input data and calculating an output. This process is called the forward pass.
    • inputs: The input data that will be fed into the layer (e.g., a list of features for all fruits).
    • self.output = np.dot(inputs, self.weights) + self.biases: The core mathematical formula.
      • np.dot(inputs, self.weights): Matrix multiplication. This is where each judge "looks" at the features of the fruits and multiplies them by their "preferences" (weights) to come up with a preliminary score.
      • + self.biases: Adds the "bias" (prejudice) of each judge to their score.
      • self.output = ...: The final result is stored in the output attribute of the layer.
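The shape bookkeeping behind that formula can be checked directly with NumPy. The toy inputs below are random stand-ins, not the actual spiral data:

```python
import numpy as np

inputs  = np.random.randn(300, 2)       # 300 samples, 2 features each
weights = 0.01 * np.random.randn(2, 3)  # (n_inputs, n_neurons)
biases  = np.zeros((1, 3))              # one bias per neuron

# (300, 2) @ (2, 3) -> (300, 3); the (1, 3) biases broadcast over all rows
output = np.dot(inputs, weights) + biases
print(output.shape)  # (300, 3)
```

Every sample in, one score per neuron out: that is the whole forward pass of a dense layer.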

Part 3: The Competition Begins! (Using the Class and Data)

Now we will use the "blueprint" and "materials" prepared above to conduct a real competition.

Python
# Create dataset
X, y = spiral_data(samples=100, classes=3)
# Visualize dataset
plt.scatter(X[:,0], X[:,1], c=y, cmap='brg')
plt.show()
  • X, y = spiral_data(samples=100, classes=3): Calls the imported function to create data.
    • X: Will be a NumPy array of size (300, 2). It's 300 because there are 3 classes, each with 100 samples. It's 2 because each sample has 2 features (x, y coordinates). This is the "list of fruit contestants and their characteristics."
    • y: Will be a NumPy array of size (300,) containing the labels 0, 1, 2. This is the "correct answer" for each contestant (Apple, Orange, or Banana).
  • plt.scatter(X[:,0], X[:,1], c=y, cmap='brg'): Prepares to plot the graph.
    • X[:,0]: Gets all rows, first column (all x-coordinates).
    • X[:,1]: Gets all rows, second column (all y-coordinates).
    • c=y: c is short for color. This command tells Matplotlib to color each point (x, y) based on the corresponding value in the y array. Points with y=0 will have one color, y=1 another, and so on.
    • cmap='brg': Color map. Selects the Blue-Red-Green color palette.
  • plt.show(): Displays the prepared plot on the screen.
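The X[:,0] and X[:,1] slicing used above is easiest to see on a tiny hand-made array (three made-up points, not real spiral data):

```python
import numpy as np

X = np.array([[0.1, 0.5],
              [0.2, 0.6],
              [0.3, 0.7]])

print(X[:, 0])  # all rows, column 0 -> [0.1 0.2 0.3]
print(X[:, 1])  # all rows, column 1 -> [0.5 0.6 0.7]
```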
Python
# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)
  • This is where we create an object from the Layer_Dense blueprint. We are "hiring a panel of judges."
  • dense1 = ...: Creates a specific panel of judges named dense1.
  • Layer_Dense(2, 3): Calls the __init__ function.
    • n_inputs=2: Because each "fruit contestant" (X) has 2 features (x, y coordinates).
    • n_neurons=3: Because we need to classify into 3 types of fruit (3 classes in y). We need 3 judges, each specializing in one type.
Python
# Let's see initial weights and biases
print(">>> Initial weights and biases of the first layer:")
print(dense1.weights)
print(dense1.biases)
  • Prints the weights and biases attributes of the newly created dense1 object. This shows us the initial, completely random "preferences" and "biases" of the judges before they've scored any contestants.
Python
# Perform a forward pass of our training data through this layer
dense1.forward(X)
  • This is the moment of action. We call the forward method of dense1 and pass the entire "list of contestants" (X) into it. The calculation np.dot(X, dense1.weights) + dense1.biases is executed. The judges begin scoring.
Python
# Let's see output of the first few samples:
print(">>> Output of the first few samples:")
print(dense1.output[:5])
  • After forward() finishes running, the result is stored in dense1.output.
  • dense1.output[:5]: We print the scoring results for the first 5 "fruit contestants" to see a sample. Each row is a contestant, each column is a score from a judge. These values are called logits.
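The whole forward pass can also be reproduced without the nnfs package, using random stand-in data in place of the spiral set; the shapes and behavior are the same:

```python
import numpy as np

np.random.seed(0)  # for reproducible "random" weights

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

X = np.random.randn(300, 2)  # stand-in for the spiral data
dense1 = Layer_Dense(2, 3)
dense1.forward(X)

print(dense1.output.shape)  # (300, 3): one row per sample, one logit per neuron
print(dense1.output[:5])    # the first 5 samples' logits
```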

Part 4: Abstract Interpretation - The Fruit Classification Competition

Let's retell the whole story seamlessly:

  1. The Scene: We are organizing a competition to classify 3 types of fruit: Apples, Oranges, and Bananas.

  2. The Contestants (X, y): 300 fruits are participating. For each fruit, we use a machine to measure 2 characteristics: "Redness" and "Roundness" (these are the 2 columns of X). We also know the correct answer for each fruit (this is y).

  3. Hiring the Judges (dense1 = Layer_Dense(2, 3)): We hire a panel of 3 judges:

    • Judge 1: An expert on Apples.
    • Judge 2: An expert on Oranges.
    • Judge 3: An expert on Bananas. They are new to the job, so their "knowledge" is initially random.
  4. The Judges' Knowledge (weights and biases):

    • Preferences (weights): Each judge has their own set of "preferences" for the 2 characteristics "Redness" and "Roundness". For example, an ideal Apple expert would have a high preference for "Redness" and "Roundness". A Banana expert would have a negative preference for "Roundness" (since bananas are long). But because they are new, these preferences are assigned randomly (e.g., the Apple expert might prefer non-red fruits, the Banana expert might prefer round ones).
    • Mood (biases): Initially, all 3 judges have a neutral mood (equal to 0).
  5. The Scoring Process (dense1.forward(X)):

    • One by one, each fruit is presented to the panel.
    • Each judge calculates their score using the formula: Score = (Redness * Preference for Redness) + (Roundness * Preference for Roundness) + Mood
    • This process happens for all 300 fruits.
  6. The Scoreboard (dense1.output):

    • The final result is a large scoreboard. Each row is a fruit, each column is a score from a judge.
    • For example, the first row might be [0.0012, -0.0045, 0.0031]. This means that with their current random knowledge, the Apple Judge gives this fruit 0.0012 points, the Orange Judge gives it -0.0045 points, and the Banana Judge gives it 0.0031 points.
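The per-judge scoring formula from step 5 can be checked by hand. All the numbers below are hypothetical, chosen purely for illustration:

```python
# Hypothetical features of one fruit and hypothetical "knowledge" of one judge
redness, roundness = 0.8, 0.9      # the fruit's two measured features
w_red, w_round     = 0.02, -0.01   # the judge's (random) preferences
bias               = 0.0           # neutral initial "mood"

score = (redness * w_red) + (roundness * w_round) + bias
print(score)  # 0.016 - 0.009 + 0.0, i.e. about 0.007
```

Repeat this for 3 judges and 300 fruits and you have exactly the matrix multiplication in dense1.forward(X).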

The Key Takeaway: Because the judges' "knowledge" (weights) is random, this "scoreboard" (output) is completely meaningless. The process of "training", which is not in this code, is the act of showing the judges the correct answers (y), pointing out their mistakes, and helping them adjust their "preferences" (weights) and "moods" (biases) over thousands of iterations, so that eventually their scoreboard accurately reflects the type of fruit.


Part 5: ASCII Illustration

Diagram for a single fruit passing through the panel of judges:

Text Only
           INPUT (1 fruit)
            (2 features)
       +----------------------+
       | Redness, Roundness   |
       +----------------------+
               |
               |                           JUDGING PANEL (dense1)
               |                            (3 Judges/Neurons)
               |
               |  Preferences (w11, w21)  +----------------------+
               +------------------------->|  APPLE Judge    + b1 |----> output_1  (Apple Judge's score)
               |                          +----------------------+
               |
               |  Preferences (w12, w22)  +----------------------+
               +------------------------->|  ORANGE Judge   + b2 |----> output_2  (Orange Judge's score)
               |                          +----------------------+
               |
               |  Preferences (w13, w23)  +----------------------+
               +------------------------->|  BANANA Judge   + b3 |----> output_3  (Banana Judge's score)
                                          +----------------------+


Scoring formula for the APPLE Judge:
output_1 = (Redness * w11) + (Roundness * w21) + b1

The final result for 1 fruit is a set of 3 scores: [output_1, output_2, output_3]


Appendix - Explaining Spiral Data

This is an explanation of "Spiral Data" - it might sound abstract, but it's one of the most classic and important sample datasets when you start learning about neural networks.

Let's break it down.

1. Simple Definition

Spiral Data is a synthetically generated dataset where data points belonging to different classes are arranged in interlocking spirals.

Take another look at the very plot you generated:

  • You have 3 classes, corresponding to 3 colors: Red, Green, and Blue.
  • Each point has a position (x, y coordinates).
  • Points of the same color form a spiral "arm".
  • These arms are intertwined, wrapping around each other.

2. Why is it so important and famous?

The reason this dataset is used so widely is because it's a perfect challenge to demonstrate the power of neural networks.

A. It "defeats" simple (Linear) models

Imagine you only have a ruler. Your task is to draw one or more straight lines to separate these 3 color groups, such that each region contains only one color.

You will quickly find that this is impossible.

Text Only
       /
      /    <-- You cannot draw any single straight line
     /         to separate Red (R) from Green (G) and Blue (B)
    /
   RRRRR
  G B R G
 B G R B G
B B G G B B
 R R B R R
  R G B R
   BBBBB

A model that can only draw straight lines for classification is called a linear model. Spiral data is a classic example of non-linear data, where the boundary between classes is not a straight line but a complex curve.

In other words, spiral data is intentionally designed to be difficult for simple classification algorithms.

B. It demonstrates the necessity of Neural Networks

Neural networks, especially those with hidden layers and non-linear activation functions (which we will learn about later), are capable of learning extremely complex and winding decision boundaries.

A well-trained neural network can learn a decision boundary that curves along each spiral arm.

It doesn't use a "ruler"; it learns to "draw" smooth curves that enclose each group of data.

Conclusion: Spiral data is a "graduation" test for a classification model. If your model can solve this problem, it proves that it is capable of handling complex, non-linear relationships in data, something that simple models cannot do.

3. What does the spiral_data function create?

When you call X, y = spiral_data(samples=100, classes=3), the function calculates and returns two things:

  1. X (The Features):

    • This is a NumPy array containing the [x, y] coordinates of all the points.
    • With samples=100 and classes=3, it will create 100 * 3 = 300 points.
    • Therefore, X will have the shape (300, 2).
    • In our "fruit judges" story, X is equivalent to a list of 300 fruits, each with 2 features: "Redness" and "Roundness".
  2. y (The Labels):

    • This is a NumPy array containing the class label for each corresponding point in X.
    • It will contain 300 numbers, consisting of 100 0s, 100 1s, and 100 2s.
    • y[i] is the label (the correct answer) for the point X[i].
    • In our story, y is the list of correct answers: which fruit is an "Apple" (class 0), which is an "Orange" (class 1), and which is a "Banana" (class 2).
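For the curious, a generator like spiral_data can be sketched in plain NumPy. This is a simplified reconstruction based on the shapes and behavior described above, not the library's exact source; the function name make_spiral and the noise level are this sketch's own choices:

```python
import numpy as np

def make_spiral(samples, classes):
    """Generate `samples` points per class, arranged in interlocking spiral arms."""
    X = np.zeros((samples * classes, 2))
    y = np.zeros(samples * classes, dtype=np.uint8)
    for c in range(classes):
        ix = range(samples * c, samples * (c + 1))
        r = np.linspace(0.0, 1.0, samples)            # radius grows outward
        t = np.linspace(c * 4, (c + 1) * 4, samples)  # each class sweeps its own angle range
        t += np.random.randn(samples) * 0.2           # a little noise so arms aren't perfect
        X[ix] = np.c_[r * np.sin(t * 2.5), r * np.cos(t * 2.5)]
        y[ix] = c                                     # label every point in this arm
    return X, y

X, y = make_spiral(samples=100, classes=3)
print(X.shape)         # (300, 2)
print(np.bincount(y))  # [100 100 100]: 100 samples per class
```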

So, spiral_data isn't just a dataset; it's a classic non-linear classification problem packaged and ready for you to quickly test your models.