Seeing the World Through Code
Computer vision (CV) enables machines to interpret visual information from the world. From facial recognition to autonomous vehicles, it is one of the most visible areas of AI.
Core Computer Vision Tasks
Image Classification
The most fundamental CV task: assigning a label to an entire image.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
# Load pre-trained model
model = ResNet50(weights='imagenet')
# Load and preprocess an image
img = image.load_img('cat.jpg', target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array = preprocess_input(img_array)
# Predict
predictions = model.predict(img_array)
print(decode_predictions(predictions, top=3)[0])
# Example output (exact scores will vary):
# [('n02123045', 'tabby', 0.82), ('n02123159', 'tiger_cat', 0.12), ...]
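To make the `top=3` step concrete, here is a minimal sketch of how a top-k list can be pulled out of a vector of class probabilities. It is plain Python for illustration only: the `top_k` helper and the toy labels and scores are hypothetical, and `decode_predictions` does more than this, since it also maps class indices to ImageNet IDs and names.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    # Sort by probability, highest first, then keep the first k entries.
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy scores for illustration only; this is not real model output.
toy_labels = ["lynx", "tabby", "Egyptian_cat", "tiger_cat"]
toy_probs = [0.02, 0.82, 0.04, 0.12]

top3 = top_k(toy_probs, toy_labels, k=3)
print(top3)
```

The classifier itself only produces one score per class; ranking those scores is what turns them into a short, readable answer.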
Object Detection
Unlike classification, object detection locates multiple objects within an image, predicting a bounding box, a class label, and a confidence score for each one. Popular architectures include YOLO, SSD, and Faster R-CNN.
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
results = model.predict('street_scene.jpg', conf=0.25)
for r in results:
    for box in r.boxes:
        class_id = int(box.cls[0])
        confidence = float(box.conf[0])
        coords = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Class: {model.names[class_id]}, Confidence: {confidence:.2f}, BBox: {coords}")
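Once you have boxes in `[x1, y1, x2, y2]` form, a common next step is measuring how much two boxes overlap, for example to compare a prediction against a hand-labeled box. The sketch below computes intersection over union (IoU) in plain Python. The `iou_xyxy` helper and the two toy boxes are hypothetical and are not part of the Ultralytics API.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Toy boxes for illustration only.
predicted = [10.0, 10.0, 50.0, 50.0]
reference = [30.0, 30.0, 70.0, 70.0]

overlap = iou_xyxy(predicted, reference)
print(f"IoU: {overlap:.3f}")
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap at all; what counts as a "good enough" match depends on the task.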
Image Segmentation
Segmentation goes a step further by classifying each pixel. Semantic segmentation labels every pixel by category, while instance segmentation distinguishes between individual objects of the same class.
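The difference between the two is easiest to see on a tiny mask. The sketch below starts from a toy semantic mask in which two separate objects share the same class ID, then assigns each connected region its own instance ID. This is only an illustration under simplifying assumptions: the `label_instances` helper and the mask are hypothetical, and real instance segmentation models predict per-object masks directly, so they can also separate objects that touch.

```python
from collections import deque

# Toy semantic mask: 0 = background, 1 = "cat".
# Two separate cats both carry class 1, so the mask alone cannot tell them apart.
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]


def label_instances(mask, target_class):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    rows, cols = len(mask), len(mask[0])
    instances = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target_class or instances[r][c] != 0:
                continue
            # Found an unlabeled pixel: flood-fill its whole region with a new id.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == target_class
                            and instances[ny][nx] == 0):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


instance_mask, count = label_instances(semantic_mask, target_class=1)
for row in instance_mask:
    print(row)
print("instances found:", count)
```

In the semantic mask every cat pixel is a 1; in the instance mask the left cat's pixels are 1 and the right cat's pixels are 2.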
Real-World Applications
Healthcare
- Detecting tumors in X-rays and MRIs
- Analyzing retinal scans for diabetic retinopathy
- Automating pathology slide analysis
Autonomous Vehicles
Computer vision enables self-driving cars to detect lane markings, traffic signs, pedestrians, and other vehicles in real time.
Retail and E-Commerce
- Visual search: find products by image
- Automated checkout: recognize items without barcodes
- Inventory management: count stock from camera feeds
Agriculture
- Crop health monitoring using drone imagery
- Weed detection for precision spraying
- Yield estimation from satellite data
Getting Started
The open-source ecosystem makes it easy to experiment. Install the libraries used in this post, then try a few basic image operations:
# Install
# pip install opencv-python tensorflow ultralytics
# Basic image operations with OpenCV
import cv2
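img = cv2.imread('photo.jpg')
if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError('photo.jpg could not be read')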
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=50, threshold2=150)
cv2.imwrite('edges.jpg', edges)
Conclusion
Computer vision has matured from academic research to everyday utility. Pre-trained models and open-source tools mean you don't need a PhD to build something useful. Start with a dataset that excites you, experiment with a pre-trained model, and iterate. The world is full of visual problems waiting for a computer vision solution.