Computer Vision
Python
TensorFlow
Transfer Learning
MobileNetV2

AgriVision

An automated fruit classification system leveraging Deep Transfer Learning (MobileNetV2) to streamline quality control and reduce waste in food supply chains.

1. The Challenge

  • Context: In the agricultural supply chain, quality control is often a bottleneck. Manual sorting is labor-intensive, subjective, and prone to human error, leading to food waste and inconsistent packaging. The goal was to build a system capable of identifying 10 distinct fruit varieties (e.g., Mango vs. Orange, Apple vs. Cherry) to automate sorting lines.
  • The Obstacle: The primary engineering challenge was inter-class similarity combined with scale ambiguity. A zoomed-in Red Apple looks nearly identical to a Red Cherry in a $224 \times 224$ pixel frame. Furthermore, the model needed to be lightweight enough for potential deployment on edge devices (such as Raspberry Pi units mounted on conveyor belts), ruling out heavyweight architectures like VGG16 or ResNet50.

2. The Solution Architecture

The solution implements a Transfer Learning pipeline using the MobileNetV2 architecture:

  1. Input Pipeline: Raw images are ingested from the Roboflow dataset, resized to $224 \times 224$, and normalized to the $[0,1]$ range.
  2. Feature Extraction: I use MobileNetV2 (pre-trained on ImageNet) as a frozen backbone to extract general visual features (edges, textures, shapes).
  3. Classification: A custom "Head" is attached to the backbone to map these features to our specific 10 fruit classes.
  4. Key Decisions:
    • MobileNetV2 over ResNet: I chose MobileNetV2 because it uses Depthwise Separable Convolutions. This drastically reduces the parameter count and computation cost, making the model scalable for real-time industrial hardware without sacrificing significant accuracy.
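
The input pipeline described above can be sketched as follows. A random tensor stands in for a raw Roboflow image here, so the shapes and value ranges, not the file-loading code, are the point:

```python
import tensorflow as tf

# A random tensor stands in for one raw RGB image of arbitrary size.
raw_image = tf.random.uniform((480, 640, 3), minval=0.0, maxval=255.0)

resized = tf.image.resize(raw_image, (224, 224))  # match MobileNetV2's expected input
normalized = resized / 255.0                      # scale pixel values to [0, 1]

batch = tf.expand_dims(normalized, axis=0)        # add batch dimension: (1, 224, 224, 3)
```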

3. Implementation Highlights

A. The Custom Classification Head

To adapt the pre-trained model to our specific dataset, I froze the base layers and added a custom dense layer structure. Note the use of Dropout to prevent the model from memorizing the training data.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # Freeze base layers initially

model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(128, activation='relu'),
    Dropout(0.3),  # Randomly drop 30% of activations during training to reduce overfitting
    Dense(10, activation='softmax')  # One output per fruit class
])

B. Fine-Tuning Strategy

After the initial training pass, the model struggled to differentiate between "Mango" and "Orange" due to color overlap. I therefore added a Fine-Tuning Phase: unfreezing the top 20 layers of the backbone and retraining with a very low learning rate (1e-5). This let the model learn fine texture cues (orange peel pores vs. smooth mango skin) without destroying the pre-learned ImageNet weights.

from tensorflow.keras.optimizers import Adam

# Unfreeze only the top 20 layers for fine-tuning
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Re-compile with a very low learning rate
model.compile(
    optimizer=Adam(learning_rate=1e-5),  # a small step size avoids catastrophic forgetting
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

4. Challenges & Overcoming Roadblocks

  • The Trap: Scale Ambiguity (The Cherry/Apple Problem). In the error analysis, I found the model frequently confused Red Apples with Cherries. Since the images were resized to the same dimension, the model lost the sense of "real-world size."
  • The Logic Fix: A complete fix requires a reference object in the frame (a hardware solution), but I improved discrimination on the software side by heavily augmenting the training data with random Zoom and Rotation. Forcing the model to see the fruits at varied "simulated distances" pushed it to rely on stem shape and skin specularity rather than just color and roundness, raising the F1-score for apples to 0.86.
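
A minimal sketch of that zoom-and-rotation augmentation using Keras preprocessing layers; the exact factors used in training are not recorded here, so the values below are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative augmentation block: the factors are assumptions, not the trained values.
augment = tf.keras.Sequential([
    layers.RandomZoom(height_factor=(-0.3, 0.3)),  # zoom in or out by up to 30%
    layers.RandomRotation(factor=0.15),            # rotate by up to roughly ±54 degrees
])

images = tf.random.uniform((4, 224, 224, 3))       # stand-in batch of normalized images
augmented = augment(images, training=True)         # training=True enables the randomness
```

Applying the block only at training time (the `training=True` flag) keeps inference deterministic on the sorting line.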

5. Results & Impact

  • Accuracy: The system achieved an overall Test Accuracy of 86.27%.
  • Reliability: Achieved 100% Recall for visually distinct classes like Avocado and Strawberry, meaning zero false negatives for these high-value items.
  • Impact: The project serves as a valid proof-of-concept for low-cost, automated optical sorting, capable of running on non-GPU hardware.
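
For reference, per-class recall (the metric behind the 100% figure above) is the fraction of true instances of a class the model actually catches, TP / (TP + FN). A toy computation with made-up labels:

```python
import numpy as np

# Toy ground truth and predictions; the class indices are illustrative stand-ins.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 0, 2])

recalls = {}
for c in np.unique(y_true):
    mask = y_true == c
    recalls[int(c)] = float(np.mean(y_pred[mask] == c))  # TP / (TP + FN) for class c
```

A recall of 1.0 for a class means every true instance of it was identified, i.e., zero false negatives.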