Introduction to Neural Networks: Perceptrons, Activation Functions, Layers

A neural network is a computational model inspired by the biological neural networks of the human brain. It is the fundamental architecture powering modern deep learning and artificial intelligence.

At its core, a neural network consists of interconnected layers of artificial neurons (nodes): an input layer, one or more hidden layers, and an output layer. Data flows forward through these connections. Each connection has an adjustable weight, and each neuron applies an activation function (like ReLU or Sigmoid) to its weighted sum of inputs to introduce non-linearity.

Through a process called training, the network learns by example using algorithms like backpropagation to iteratively adjust its weights, minimizing the difference between its predictions and the true outcomes. This allows it to recognize complex, non-linear patterns in data, making it extraordinarily powerful for tasks like image recognition, natural language processing, and predictive analytics, where explicit programming is infeasible.
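
As a rough, minimal sketch of this idea (not full backpropagation), the Python snippet below nudges a single weight by gradient descent to shrink the squared error on one made-up training pair; the input, target, and learning rate are arbitrary illustrative values.

    # One weight, one training pair, squared-error loss: repeatedly move the
    # weight against the gradient of the error. Real networks apply the same
    # principle to many weights at once via backpropagation.
    x, y_true = 2.0, 8.0                   # illustrative input and target
    w = 0.5                                # initial weight
    lr = 0.1                               # learning rate (arbitrary choice)
    for step in range(50):
        y_pred = w * x                     # forward pass: prediction
        grad = 2 * (y_pred - y_true) * x   # d/dw of (y_pred - y_true)^2
        w -= lr * grad                     # gradient descent update
    print(round(w, 3))                     # approaches 4.0, since 4.0 * 2.0 == 8.0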

Perceptrons

The Perceptron, developed by Frank Rosenblatt in 1957, is the simplest type of artificial neural network and the foundational building block for deep learning. It is a single-layer binary classifier that mimics a biological neuron: it multiplies each binary or numeric input by a learnable weight, sums the results together with a bias, and passes the sum through a step function (such as the Heaviside step) to produce an output of either 0 or 1. Although limited to solving only linearly separable problems, its introduction of adaptive weight learning via error correction paved the way for modern multilayer networks and backpropagation.
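
As an illustration, here is a minimal Python sketch of a perceptron trained with the error-correction rule described above. The AND dataset, learning rate, and epoch count are illustrative choices, not part of Rosenblatt's original specification.

    import numpy as np

    # Single perceptron: weighted sum plus bias, passed through a step function.
    # Trained on logical AND, which is linearly separable.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])

    w = np.zeros(2)               # learnable weights
    b = 0.0                       # bias
    lr = 0.1                      # learning rate

    for epoch in range(10):
        for xi, target in zip(X, y):
            out = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            update = lr * (target - out)              # error-correction term
            w += update * xi
            b += update

    print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # [0, 0, 0, 1]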

Activation Functions of Neural Networks:

1. Sigmoid (Logistic)

The Sigmoid function maps any input value into a smooth, S-shaped curve between 0 and 1. Its formula is f(x) = 1 / (1 + e^(-x)). This makes it ideal for models where the output is a probability (e.g., binary classification). However, it suffers from the vanishing gradient problem: gradients become extremely small for very high or low inputs, slowing or halting learning in deep networks. It is also computationally expensive due to the exponential operation.
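
A small Python sketch of the sigmoid and its gradient (with arbitrary sample inputs) shows both the squashing into (0, 1) and the shrinking gradients for large-magnitude inputs:

    import numpy as np

    def sigmoid(x):
        # f(x) = 1 / (1 + e^(-x))
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-5.0, 0.0, 5.0])
    print(sigmoid(x))                      # ~[0.007, 0.5, 0.993], all within (0, 1)
    # The gradient sigmoid(x) * (1 - sigmoid(x)) shrinks toward 0 for large |x|,
    # which is the vanishing-gradient behaviour described above.
    print(sigmoid(x) * (1 - sigmoid(x)))   # ~[0.0066, 0.25, 0.0066]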

2. Hyperbolic Tangent (tanh)

The tanh function is a rescaled sigmoid that outputs values in the range -1 to 1, with the formula f(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Its output is zero-centered, which often helps accelerate convergence during training compared to sigmoid. However, like sigmoid, it also suffers from vanishing gradients for extreme inputs. It is commonly used in hidden layers of networks, especially in recurrent architectures, to model features that can have both positive and negative influences.
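
The same formula can be written directly in Python; NumPy's built-in np.tanh gives identical results. The sample inputs are arbitrary:

    import numpy as np

    def tanh(x):
        # f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

    x = np.array([-2.0, 0.0, 2.0])
    print(tanh(x))                            # ~[-0.964, 0.0, 0.964], zero-centered
    print(np.allclose(tanh(x), np.tanh(x)))   # True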

3. Rectified Linear Unit (ReLU)

ReLU is the most widely used activation function in modern deep learning. Defined as f(x)=max(0,x), it outputs the input directly if positive, otherwise zero. Its key advantages are computational simplicity and mitigating the vanishing gradient problem for positive inputs, leading to faster training. Its main drawback is the “Dying ReLU” problem, where neurons stuck in the negative region output zero and stop learning permanently. Variants like Leaky ReLU address this by allowing a small negative slope.
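
A brief Python sketch (with arbitrary sample inputs) shows ReLU alongside its gradient, which is zero for negative inputs; that zero region is what the dying-ReLU problem refers to:

    import numpy as np

    def relu(x):
        # f(x) = max(0, x)
        return np.maximum(0.0, x)

    x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
    print(relu(x))                    # [0.  0.  0.  0.5 3. ]
    print((x > 0).astype(float))      # gradient: 1 for positive inputs, 0 otherwise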

4. Leaky ReLU & Parametric ReLU (PReLU)

Leaky ReLU is a variant that addresses the dying ReLU problem. Its formula is f(x)=max(αx,x), where α is a small, fixed constant (e.g., 0.01) for negative inputs. This ensures a small, non-zero gradient even when the input is negative, keeping neurons active. Parametric ReLU (PReLU) generalizes this by making α a learnable parameter during training, allowing the network to optimize the slope for negative values, which can further improve performance in deep models.
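
The following Python sketch implements Leaky ReLU with a fixed α of 0.01 (an illustrative default); in PReLU the same α would be a trainable parameter rather than a constant:

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # f(x) = x if x > 0, otherwise alpha * x
        return np.where(x > 0, x, alpha * x)

    x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(leaky_relu(x))    # [-0.1  -0.01  0.  1.  10. ] -- negatives keep a small output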

5. Softmax

The Softmax function is used almost exclusively in the final output layer for multi-class classification problems. It converts a vector of raw scores (logits) into a probability distribution across multiple classes. The formula for class i is f(x_i) = e^(x_i) / Σ_j e^(x_j). The outputs sum to 1, with each value representing the predicted probability for a class. This makes it ideal for interpreting results and calculating the cross-entropy loss during training.
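
A short Python sketch of softmax over three arbitrary example logits; subtracting the maximum logit before exponentiating is a common numerical-stability trick and does not change the result:

    import numpy as np

    def softmax(logits):
        shifted = logits - np.max(logits)   # stability trick; result is unchanged
        exps = np.exp(shifted)
        return exps / np.sum(exps)          # f(x_i) = e^(x_i) / sum_j e^(x_j)

    logits = np.array([2.0, 1.0, 0.1])      # raw scores for three classes
    probs = softmax(logits)
    print(probs)                            # ~[0.659, 0.242, 0.099]
    print(probs.sum())                      # 1.0 -- a valid probability distribution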

6. Swish & GELU

Swish, defined as f(x) = x · sigmoid(βx), is a smooth, non-monotonic function that often outperforms ReLU in very deep networks (for example, in Google's neural machine translation models). GELU (Gaussian Error Linear Unit), used in models like BERT and GPT, is f(x) = x · Φ(x), where Φ(x) is the cumulative distribution function (CDF) of the standard Gaussian distribution. Both functions provide smoother gradients than ReLU, improving training dynamics and model performance, especially in state-of-the-art transformer architectures.
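
Both functions can be written directly from their formulas. The Python sketch below uses β = 1 for Swish and the exact Gaussian CDF (via the error function) for GELU; the sample inputs are arbitrary:

    import math
    import numpy as np

    def swish(x, beta=1.0):
        # f(x) = x * sigmoid(beta * x)
        return x / (1.0 + np.exp(-beta * x))

    def gelu(x):
        # f(x) = x * Phi(x), with Phi the standard Gaussian CDF
        phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
        return x * phi

    x = np.array([-2.0, 0.0, 2.0])
    print(swish(x))    # ~[-0.238, 0.0, 1.762]
    print(gelu(x))     # ~[-0.045, 0.0, 1.955]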

Layers of Neural Networks:

  • Input Layer

The input layer is the first layer of a neural network. It receives raw data from the dataset and passes it to the next layer. Each neuron in the input layer represents a feature or attribute of the data. For example, in a sales prediction model, input neurons could represent product price, customer age, and purchase history. The input layer does not perform any computation; it only serves as the entry point for data. Properly designed input layers ensure that all relevant features are included, forming the foundation for accurate learning in subsequent layers.

  • Hidden Layer

Hidden layers are the intermediate layers between the input and output layers. They perform computations and extract patterns from the data using weights, biases, and activation functions. Each hidden layer transforms the input data into a more abstract representation, helping the network learn complex relationships. A neural network can have one or more hidden layers, and the number of neurons in each layer affects learning capacity. Hidden layers are critical for deep learning, enabling the network to capture intricate patterns and improve accuracy in tasks like image recognition, prediction, and classification.

  • Output Layer

The output layer is the final layer of a neural network; it produces the predicted result or classification. The number of neurons it contains depends on the task: a binary classification problem typically uses a single neuron, while multi-class classification uses one neuron per class. Activation functions like sigmoid or softmax are used in the output layer to convert computed values into probabilities or predicted labels. The output layer communicates the network's final decision to the user or system, and its accuracy depends on how well the input and hidden layers are designed and trained. A minimal forward pass through all three layers is sketched below.
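
As a rough illustration of how the three layers fit together, the Python sketch below runs a single forward pass through a tiny network with 3 input features, a 4-neuron ReLU hidden layer, and a 1-neuron sigmoid output. The layer sizes and random weights are arbitrary; a real network would learn W1, b1, W2, and b2 through training.

    import numpy as np

    rng = np.random.default_rng(0)

    x = np.array([0.5, -1.2, 3.0])                      # input layer: 3 features, no computation

    W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)      # hidden-layer weights and biases
    W2 = rng.normal(size=(1, 4)); b2 = np.zeros(1)      # output-layer weights and biases

    hidden = np.maximum(0.0, W1 @ x + b1)               # hidden layer: ReLU activation
    output = 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))  # output layer: sigmoid probability

    print(hidden.shape, output.shape)                   # (4,) (1,)
    print(float(output[0]))                             # predicted probability for the positive class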
