First Layer
Detailed Code Explanation¶
The chapter01.py program
Overview: What is the goal of this program?
Imagine we want to build a simple artificial "brain". This program performs the first and most fundamental step:
- Build a "neuron": Create a basic information processing unit.
- Prepare the data: Generate a sample dataset for the "brain" to process.
- Perform a calculation: Pass the data through the "brain" and see what the output is.
This is the foundation of every neural network. Understanding each line of code here will help you grasp more complex concepts later on.
Part 1: Preparing Tools and Materials (Imports & Data)¶
This is the step where we gather the necessary libraries and data before we start "building".
import numpy as np
import nnfs
import matplotlib.pyplot as plt
from nnfs.datasets import spiral_data
nnfs.init()
Detailed Line-by-Line Explanation:¶
- import numpy as np:
    - What is it?: NumPy (Numerical Python) is the foundational library for numerical computing and data science in Python. It provides an extremely efficient data structure called an array, along with tools to perform operations on these arrays, especially matrix math.
    - Why do we need it?: A neural network is, in essence, a series of matrix operations. NumPy performs these matrix multiplications and additions much more quickly and efficiently than Python's standard lists. as np is a common convention for aliasing the library.
- import nnfs:
    - What is it?: nnfs (Neural Networks from Scratch) is a helper library written specifically for the book of the same name. Its purpose is to help learners focus on the concepts of neural networks rather than getting bogged down in minor details.
    - Why do we need it?: It provides utility functions, such as creating sample data (spiral_data) and initializing the environment (init), to ensure that everyone's results are the same, making it easier to learn and debug.
- import matplotlib.pyplot as plt:
    - What is it?: Matplotlib is the most popular data visualization (plotting) library in Python. pyplot is a module within Matplotlib that provides an interface similar to MATLAB.
    - Why do we need it?: "A picture is worth a thousand words." This library allows us to plot the data on a graph to see what it looks like. Seeing the spiral data visually helps us better understand the problem the neural network is trying to solve.
- from nnfs.datasets import spiral_data:
    - What is it?: This is a specific import statement. Instead of importing the entire nnfs.datasets module, we import only the spiral_data function from it.
    - Why do we need it?: spiral_data is a function that creates the famous spiral dataset, a classic problem for testing classification models.
- nnfs.init():
    - What is it?: This command calls the init function from the nnfs library.
    - Why do we need it?: This function performs some background setup, most importantly setting the seed for NumPy's random number generation and establishing a default data type. This ensures that every time you run the code, the "random" weights and data generated are exactly the same, so results are consistent and reproducible.
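To make the idea of seeding concrete, here is a minimal sketch using only NumPy. It does not reproduce what nnfs.init() does internally; the seed value 0 and the float32 cast are assumptions chosen for illustration.

```python
import numpy as np

# Seed NumPy's global random number generator, then draw some numbers.
np.random.seed(0)  # the value 0 is an arbitrary choice for this sketch
first_draw = np.random.randn(2, 3)

# Reseed with the same value and draw again.
np.random.seed(0)
second_draw = np.random.randn(2, 3)

# The same seed yields the same "random" numbers.
print(np.array_equal(first_draw, second_draw))  # True

# Casting to one fixed dtype illustrates the idea of a "default data type".
weights = first_draw.astype(np.float32)
print(weights.dtype)  # float32
```

Without the second call to np.random.seed, the two draws would differ, which is why a fixed seed matters when comparing your output with someone else's.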
Part 2: Building the "Blueprint for a Judge" (class Layer_Dense)¶
This is the heart of the program. We aren't building a single neuron, but a "blueprint" (class) so that we can easily create an entire layer/panel of judges.
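class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # Small random weights, one column per neuron
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        # One bias per neuron, all starting at zero
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        # Weighted sum of the inputs plus the bias, for every sample at once
        self.output = np.dot(inputs, self.weights) + self.biases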
Detailed Section-by-Section Explanation:¶
- class Layer_Dense: Declares a "blueprint" named Layer_Dense. Everything inside it will define the properties and behaviors of a dense neural network layer.
- def __init__(self, n_inputs, n_neurons): The initializer (constructor).
    - What does it do?: This function is automatically called every time a new object is created from this blueprint (e.g., dense1 = Layer_Dense(...)). It is used to set up initial properties.
    - self: Represents the object that will be created. When you access dense1.weights, self is dense1.
    - n_inputs: The number of input features this layer will receive (e.g., 2 features like "Redness" and "Roundness" of fruits).
    - n_neurons: The number of neurons in this layer (e.g., 3 judges, one for each type of fruit).
    - self.weights = 0.01 * np.random.randn(n_inputs, n_neurons): This is an extremely important line.
        - np.random.randn(n_inputs, n_neurons): Creates a matrix of size (number_of_inputs, number_of_neurons) filled with random numbers from a standard normal distribution (Gaussian distribution, with a mean of 0 and variance of 1). This represents the initial, completely random "preferences" of the judges.
        - * 0.01: Multiplies all the random weights by a very small number. This is a common technique to prevent the initial output values from being too large, which helps stabilize the training process later on.
    - self.biases = np.zeros((1, n_neurons)):
        - np.zeros((1, n_neurons)): Creates a row matrix (vector) of size (1, number_of_neurons) filled with all zeros. This represents the initial "bias" or "mood" of the judges. Initializing with zeros means that, initially, they have no predisposition.
- def forward(self, inputs): The action method.
    - What does it do?: Defines the main behavior of the layer: receiving input data and calculating an output. This process is called the forward pass.
    - inputs: The input data that will be fed into the layer (e.g., a list of features for all fruits).
    - self.output = np.dot(inputs, self.weights) + self.biases: The core mathematical formula.
        - np.dot(inputs, self.weights): Matrix multiplication. This is where each judge "looks" at the features of the fruits and multiplies them by their "preferences" (weights) to come up with a preliminary score.
        - + self.biases: Adds the "bias" (prejudice) of each judge to their score.
        - self.output = ...: The final result is stored in the output attribute of the layer.
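The shapes involved in the forward pass are easier to follow with a tiny batch. The sketch below repeats the same initialization and formula outside the class; the four input rows are made-up values used only for illustration.

```python
import numpy as np

n_inputs, n_neurons = 2, 3

# Same initialization scheme as Layer_Dense.__init__
weights = 0.01 * np.random.randn(n_inputs, n_neurons)  # shape (2, 3)
biases = np.zeros((1, n_neurons))                       # shape (1, 3)

# A made-up batch of 4 samples with 2 features each
inputs = np.array([[0.9, 0.8],
                   [0.2, 0.1],
                   [0.5, 0.5],
                   [0.0, 1.0]])                         # shape (4, 2)

# (4, 2) dot (2, 3) -> (4, 3); the (1, 3) bias row is broadcast to every sample
output = np.dot(inputs, weights) + biases

print(weights.shape, biases.shape, output.shape)  # (2, 3) (1, 3) (4, 3)
```

Storing the weights as (n_inputs, n_neurons) means the inputs can be multiplied directly, with no transpose, and the (1, n_neurons) bias shape lets NumPy broadcasting add the same bias row to every sample.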
Part 3: The Competition Begins! (Using the Class and Data)¶
Now we will use the "blueprint" and "materials" prepared above to conduct a real competition.
# Create dataset
X, y = spiral_data(samples=100, classes=3)
# Visualize dataset
plt.scatter(X[:,0], X[:,1], c=y, cmap='brg')
plt.show()
- X, y = spiral_data(samples=100, classes=3): Calls the imported function to create data.
    - X: Will be a NumPy array of size (300, 2). It's 300 because there are 3 classes, each with 100 samples. It's 2 because each sample has 2 features (x, y coordinates). This is the "list of fruit contestants and their characteristics."
    - y: Will be a NumPy array of size (300,) containing the labels 0, 1, 2. This is the "correct answer" for each contestant (Apple, Orange, or Banana).
- plt.scatter(X[:,0], X[:,1], c=y, cmap='brg'): Prepares to plot the graph.
    - X[:,0]: Gets all rows, first column (all x-coordinates).
    - X[:,1]: Gets all rows, second column (all y-coordinates).
    - c=y: c is short for color. This argument tells Matplotlib to color each point (x, y) based on the corresponding value in the y array. Points with y=0 will have one color, y=1 another, and so on.
    - cmap='brg': Color map. Selects the Blue-Red-Green color palette.
- plt.show(): Displays the prepared plot on the screen.
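The column slicing used in the plt.scatter call can be tried on a tiny array. The values below are made up and stand in for the real X and y.

```python
import numpy as np

# A tiny stand-in for X and y (values are made up for illustration)
X = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])
y = np.array([0, 1, 2])

xs = X[:, 0]  # all rows, first column
ys = X[:, 1]  # all rows, second column

print(xs)        # [0.1 0.3 0.5]
print(ys)        # [0.2 0.4 0.6]
print(xs.shape)  # (3,)
```

Each slice has one entry per sample, the same length as y, which is what lets c=y assign one color value to every plotted point.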
dense1 = Layer_Dense(2, 3)
- This is where we create an object from the Layer_Dense blueprint. We are "hiring a panel of judges."
- dense1 = ...: Creates a specific panel of judges named dense1.
- Layer_Dense(2, 3): Calls the __init__ function.
    - n_inputs=2: Because each "fruit contestant" (X) has 2 features (x, y coordinates).
    - n_neurons=3: Because we need to classify into 3 types of fruit (3 classes in y). We need 3 judges, each specializing in one type.
# Let's see initial weights and biases
print(">>> Initial weights and biases of the first layer:")
print(dense1.weights)
print(dense1.biases)
- Prints the weights and biases attributes of the newly created dense1 object. This shows us the initial, completely random "preferences" and "biases" of the judges before they've scored any contestants.
dense1.forward(X)
- This is the moment of action. We call the forward method of dense1 and pass the entire "list of contestants" (X) into it. The calculation np.dot(X, dense1.weights) + dense1.biases is executed. The judges begin scoring.
# Let's see output of the first few samples:
print(">>> Output of the first few samples:")
print(dense1.output[:5])
- After forward() finishes running, the result is stored in dense1.output.
- dense1.output[:5]: We print the scoring results for the first 5 "fruit contestants" to see a sample. Each row is a contestant, each column is a score from a judge. These values are called logits.
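To check the layout of the scoreboard without installing nnfs, the sketch below runs the same class on random stand-in data. The uniform random X is an assumption for illustration and is not the spiral dataset.

```python
import numpy as np

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

# Stand-in for the spiral data: 300 samples with 2 features each
X = np.random.uniform(-1.0, 1.0, size=(300, 2))

dense1 = Layer_Dense(2, 3)
dense1.forward(X)

print(dense1.output.shape)      # (300, 3): one row per sample, one column per neuron
print(dense1.output[:5].shape)  # (5, 3): the first 5 rows only
```

The slice [:5] trims rows, not columns, so every printed row still holds one score from each of the 3 neurons.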
Part 4: Abstract Interpretation - The Fruit Classification Competition¶
Let's retell the whole story seamlessly:
- The Scene: We are organizing a competition to classify 3 types of fruit: Apples, Oranges, and Bananas.
- The Contestants (X, y): 300 fruits are participating. For each fruit, we use a machine to measure 2 characteristics: "Redness" and "Roundness" (these are the 2 columns of X). We also know the correct answer for each fruit (this is y).
- Hiring the Judges (dense1 = Layer_Dense(2, 3)): We hire a panel of 3 judges:
    - Judge 1: An expert on Apples.
    - Judge 2: An expert on Oranges.
    - Judge 3: An expert on Bananas.
  They are new to the job, so their "knowledge" is initially random.
- The Judges' Knowledge (weights and biases):
    - Preferences (weights): Each judge has their own set of "preferences" for the 2 characteristics "Redness" and "Roundness". For example, an ideal Apple expert would have a high preference for "Redness" and "Roundness". A Banana expert would have a negative preference for "Roundness" (since bananas are long). But because they are new, these preferences are assigned randomly (e.g., the Apple expert might prefer non-red fruits, the Banana expert might prefer round ones).
    - Mood (biases): Initially, all 3 judges have a neutral mood (equal to 0).
- The Scoring Process (dense1.forward(X)):
    - One by one, each fruit is presented to the panel.
    - Each judge calculates their score using the formula: Score = (Redness * Preference for Redness) + (Roundness * Preference for Roundness) + Mood
    - This process happens for all 300 fruits.
- The Scoreboard (dense1.output):
    - The final result is a large scoreboard. Each row is a fruit, each column is a score from a judge.
    - For example, the first row might be [0.0012, -0.0045, 0.0031]. This means that with their current random knowledge, the Apple Judge gives this fruit 0.0012 points, the Orange Judge gives it -0.0045 points, and the Banana Judge gives it 0.0031 points.
The Key Takeaway: Because the judges' "knowledge" (weights) is random, this "scoreboard" (output) is completely meaningless. The process of "training", which is not in this code, is the act of showing the judges the correct answers (y), pointing out their mistakes, and helping them adjust their "preferences" (weights) and "moods" (biases) over thousands of iterations, so that eventually their scoreboard accurately reflects the type of fruit.
Part 5: ASCII Illustration¶
Diagram for a single fruit passing through the panel of judges:
INPUT (1 fruit)
(2 features)
+----------------------+
| Redness, Roundness |
+----------------------+
|
| JUDGING PANEL (dense1)
| (3 Judges/Neurons)
|
| Preferences (w11, w21) +--------------------+ (Score from Apple Judge)
+----------------------------->| APPLE Judge + b1|-----> output_1
| +--------------------+
|
| Preferences (w12, w22) +--------------------+ (Score from Orange Judge)
+----------------------------->| ORANGE Judge + b2|-----> output_2
| +--------------------+
|
| Preferences (w13, w23) +--------------------+ (Score from Banana Judge)
+----------------------------->| BANANA Judge + b3|-----> output_3
+--------------------+
Scoring formula for the APPLE Judge:
output_1 = (Redness * w11) + (Roundness * w21) + b1
The final result for 1 fruit is a set of 3 scores: [output_1, output_2, output_3]
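The scoring formula can be checked by hand against the matrix form the layer uses. All numbers in this sketch are made up for illustration; real initial weights would be small random values.

```python
import numpy as np

# Made-up feature values for one fruit
redness, roundness = 0.9, 0.8
inputs = np.array([[redness, roundness]])  # shape (1, 2)

# Made-up weights: row i belongs to feature i, column j to judge j
weights = np.array([[0.2, -0.5,  0.1],    # w11, w12, w13
                    [0.4,  0.3, -0.6]])   # w21, w22, w23
biases = np.array([[0.0, 0.1, -0.1]])     # b1, b2, b3

# Score from the APPLE Judge, written out term by term
output_1 = redness * weights[0, 0] + roundness * weights[1, 0] + biases[0, 0]

# All three scores at once, exactly as Layer_Dense.forward computes them
layer_output = np.dot(inputs, weights) + biases

print(np.isclose(output_1, layer_output[0, 0]))  # True
```

Column 0 of the weight matrix holds (w11, w21), the Apple Judge's two preferences, so one matrix multiplication evaluates the same formula for all three judges together.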
Appendix - Explaining Spiral Data¶
This is an explanation of "Spiral Data" - it might sound abstract, but it's one of the most classic and important sample datasets when you start learning about neural networks.
Let's break it down.
1. Simple Definition¶
Spiral Data is a synthetically generated dataset where data points belonging to different classes are arranged in interlocking spirals.
Take another look at the very plot you generated:
- You have 3 classes, corresponding to 3 colors: Red, Green, and Blue.
- Each point has a position (x, y coordinates).
- Points of the same color form a spiral "arm".
- These arms are intertwined, wrapping around each other.
2. Why is it so important and famous?¶
The reason this dataset is used so widely is because it's a perfect challenge to demonstrate the power of neural networks.
A. It "defeats" simple (Linear) models¶
Imagine you only have a ruler. Your task is to draw one or more straight lines to separate these 3 color groups, such that each region contains only one color.
You will quickly find that this is impossible.
/
/ <-- You cannot draw any single straight line
/ to separate Red (R) from Green (G) and Blue (B)
/
RRRRR
G B R G
B G R B G
B B G G B B
R R B R R
R G B R
BBBBB
A model that can only draw straight lines for classification is called a linear model. Spiral data is a classic example of non-linear data, where the boundary between classes is not a straight line but a complex curve.
In other words, spiral data is intentionally designed to be difficult for simple classification algorithms.
B. It demonstrates the necessity of Neural Networks¶
Neural networks, especially those with hidden layers and non-linear activation functions (which we will learn about later), are capable of learning extremely complex and winding decision boundaries.
A well-trained neural network can create a boundary that curves along each spiral arm.
It doesn't use a "ruler"; it learns to "draw" smooth curves to enclose each group of data.
Conclusion: Spiral data is a "graduation" test for a classification model. If your model can solve this problem, it proves that it is capable of handling complex, non-linear relationships in data, something that simple models cannot do.
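One way to see why non-linear activation functions matter is to stack two dense layers with no activation between them. The sketch below, with arbitrary layer sizes and random values chosen for illustration, shows that such a stack collapses into a single linear layer, so it still behaves like a "ruler".

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # 5 made-up samples with 2 features

# Two dense layers with no activation function in between
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=(1, 4))
W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 3))
two_layers = np.dot(np.dot(X, W1) + b1, W2) + b2

# The same mapping written as one linear layer
W_combined = np.dot(W1, W2)
b_combined = np.dot(b1, W2) + b2
one_layer = np.dot(X, W_combined) + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

Adding more layers of this kind never bends the boundary. Only a non-linear function between the layers gives the network the ability to curve.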
3. What does the spiral_data function create?¶
When you call X, y = spiral_data(samples=100, classes=3), the function calculates and returns two things:
- X (The Features):
    - This is a NumPy array containing the [x, y] coordinates of all the points.
    - With samples=100 and classes=3, it will create 100 * 3 = 300 points.
    - Therefore, X will have the shape (300, 2).
    - In our "fruit judges" story, X is equivalent to a list of 300 fruits, each with 2 features: "Redness" and "Roundness".
- y (The Labels):
    - This is a NumPy array containing the class label for each corresponding point in X.
    - It will contain 300 numbers, consisting of 100 0s, 100 1s, and 100 2s.
    - y[i] is the label (the correct answer) for the point X[i].
    - In our story, y is the list of correct answers: which fruit is an "Apple" (class 0), which is an "Orange" (class 1), and which is a "Banana" (class 2).
So, spiral_data isn't just a dataset; it's a classic non-linear classification problem packaged and ready for you to quickly test your models.
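For readers curious how such a dataset can be produced, here is a minimal sketch of a spiral generator written with plain NumPy. The function name make_spiral, its noise and seed parameters, and the constants inside it are assumptions for illustration; this is not presented as the nnfs implementation and will not reproduce its exact values.

```python
import numpy as np

def make_spiral(samples, classes, noise=0.2, seed=0):
    """Hypothetical helper: returns (X, y) with one spiral arm per class."""
    rng = np.random.default_rng(seed)
    X = np.zeros((samples * classes, 2))
    y = np.zeros(samples * classes, dtype=np.uint8)
    for c in range(classes):
        idx = slice(samples * c, samples * (c + 1))
        # The radius grows from the center outward
        r = np.linspace(0.0, 1.0, samples)
        # The angle advances along the arm, offset per class, with some noise
        t = np.linspace(c * 4, (c + 1) * 4, samples)
        t = t + rng.normal(scale=noise, size=samples)
        X[idx] = np.column_stack((r * np.sin(t * 2.5), r * np.cos(t * 2.5)))
        y[idx] = c
    return X, y

X, y = make_spiral(samples=100, classes=3)
print(X.shape, y.shape)  # (300, 2) (300,)
print(np.bincount(y))    # [100 100 100]
```

Each class gets its own range of angles while sharing the same radius range, which is what makes the arms wrap around one another instead of sitting in separate regions.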