AYN - AI-Powered Visual Assistant

Description
AYN is a visual assistance application designed to help visually impaired users navigate and understand their surroundings. This full-stack application combines machine-learning-based object detection with natural language processing to provide real-time audio descriptions of the environment captured by the device's camera. The system pairs a Flask backend, which integrates YOLOv5 for object detection and MiDaS for depth estimation, with a sleek Next.js frontend built around a responsive design and an accessible UI.
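As a rough sketch of how the backend's request flow might be structured, the minimal Flask endpoint below accepts a camera frame and returns JSON; the route name, payload shape, and port are illustrative assumptions rather than AYN's actual API:

```python
import base64
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])
def analyze():
    # The frontend posts a base64-encoded JPEG frame captured via WebRTC/canvas.
    payload = request.get_json()
    frame = Image.open(io.BytesIO(base64.b64decode(payload["image"]))).convert("RGB")

    # In the full pipeline this frame would flow through YOLOv5 and MiDaS,
    # then GPT-4o for the description and OpenAI TTS for the audio (see the
    # sketches under "Technical Achievements" below).
    return jsonify({"width": frame.width, "height": frame.height})

if __name__ == "__main__":
    app.run(port=5000)
```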
Challenges & Solutions
Key challenges included implementing robust real-time object detection with spatial awareness, creating an accessible, intuitive UI for visually impaired users, and building an efficient pipeline that turns raw video frames into meaningful audio descriptions. The project required integrating multiple machine learning models and ensuring they worked together seamlessly while keeping latency low enough for real-time feedback.
Technical Achievements
- Multi-Model AI System: Integrated YOLOv5 for object detection and MiDaS for depth estimation to create comprehensive spatial awareness (see the model-loading sketch after this list)
- Real-time Processing: Implemented efficient video frame processing with WebRTC for camera access and canvas manipulation
- Depth-Based Object Prioritization: Created an algorithm to identify and prioritize nearby objects based on depth estimation (a ranking sketch follows after this list)
- Natural Language Generation: Used GPT-4o to translate raw object data into natural, helpful descriptions for users (see the GPT-4o sketch below)
- Text-to-Speech Integration: Implemented OpenAI's TTS API to convert descriptions into clear audio feedback (see the TTS sketch below)
- Accessibility-First Design: Built a UI specifically optimized for users with visual impairments, featuring large buttons and voice feedback
- Cross-Platform Compatibility: Ensured the application works across different devices and browsers with responsive design
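The sketches below illustrate how components like these are commonly wired together; none of them is AYN's verbatim implementation. First, loading both vision models through torch.hub and running them on a single frame (the yolov5s and MiDaS_small variants are assumptions chosen for speed):

```python
import cv2
import torch

yolo = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

detections = yolo(img).xyxy[0]      # one row per object: x1, y1, x2, y2, conf, class

with torch.no_grad():
    depth = midas(transform(img))   # low-resolution inverse relative depth
    depth = torch.nn.functional.interpolate(
        depth.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()                     # upsampled to match the frame, for per-box lookups
```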
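For the depth-based prioritization, one plausible heuristic is to score each detection by the median MiDaS value inside its bounding box: MiDaS predicts inverse relative depth, so larger values mean nearer objects. The function below sketches that idea and is not necessarily AYN's exact algorithm:

```python
import numpy as np

def prioritize_by_depth(detections, depth_map, top_k=3):
    """Rank detections nearest-first.

    detections: iterable of (x1, y1, x2, y2, conf, label)
    depth_map:  HxW array of inverse relative depth (larger = closer)
    """
    scored = []
    for x1, y1, x2, y2, conf, label in detections:
        box = depth_map[int(y1):int(y2), int(x1):int(x2)]
        if box.size == 0:
            continue
        # The median is robust to background pixels caught inside the box.
        scored.append((float(np.median(box)), label))
    return [label for _, label in sorted(scored, reverse=True)[:top_k]]

depth = np.random.rand(480, 640)  # stand-in for a real MiDaS output
dets = [(10, 20, 120, 200, 0.9, "person"), (300, 100, 420, 260, 0.8, "chair")]
print(prioritize_by_depth(dets, depth))
```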
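For the natural-language step, a hedged sketch of handing detection data to GPT-4o through the OpenAI Python SDK; the prompt wording and the shape of the object data are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_scene(objects):
    """objects: (label, position, proximity) tuples from the vision stage."""
    facts = "; ".join(f"{label} {pos}, {prox}" for label, pos, prox in objects)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You assist a visually impaired user. Turn object "
                           "detections into one short, spoken-style sentence, "
                           "mentioning the nearest obstacles first.",
            },
            {"role": "user", "content": facts},
        ],
    )
    return response.choices[0].message.content

print(describe_scene([("person", "ahead", "very close"),
                      ("chair", "to the left", "farther away")]))
```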
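Finally, a minimal sketch of the audio step using OpenAI's speech endpoint; the model and voice names ("tts-1", "alloy") are common defaults assumed here, not confirmed project settings:

```python
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="A person is directly ahead, very close. A chair is to your left.",
)
speech.stream_to_file("description.mp3")  # the frontend would play this back
```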