
Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR715

Object Detection Using CNN


Dr. P. Bhaskar Naidu¹; Pulakanam Anusha²; Gothula Naveena³; Thota Anusha⁴; Chimakurthi Balaji⁵
¹Professor, Department of Computer Science and Engineering,
²,³,⁴,⁵Department of Information Technology,
QIS College of Engineering and Technology (A),
Vengamukkapalem, Ongole-523272, Andhra Pradesh, India.

Abstract:- This paper presents an object detection system based on a Convolutional Neural Network (CNN) that can accurately identify and classify objects in videos. The purpose of CNN-based object detection is to enhance technologies such as security cameras and smart devices by enabling them to identify and understand objects in videos. Object detection is a fascinating field in computer vision. Detection can be difficult because variations in orientation, lighting, and background can make the very same object look completely different from one video to another. With the advance of deep learning and neural networks, such problems can now be tackled in real time without devising various hand-crafted heuristics. The project “Object detection using CNN for video streaming” detects objects efficiently with a CNN-based algorithm applied to image or video data. In this project, we develop a technique to identify objects using the pre-trained deep learning model MobileNet together with the Single Shot Multi-Box Detector (SSD). This algorithm is used for real-time detection, including webcam input, and detects the objects in a video stream. We therefore use an object detection module that can detect what is in the video stream. To implement the module, we combine MobileNet and the SSD framework into a fast and efficient deep learning-based method of object detection. The main purpose of our research is to evaluate the accuracy of the SSD object detection method and to show the importance of the pre-trained deep learning model MobileNet. The experimental results show that the Average Precision (AP) of the algorithm for the classes car, person, and chair is 99.76%, 97.76%, and 71.07%, respectively. The main objective of our project is to clarify object detection accuracy. The existing methods are the Region-Based Convolutional Neural Network (R-CNN) and You Only Look Once (YOLO). R-CNN could not reach real-time speed, even though the system has been updated and new versions of it have been deployed, and YOLO, although popular, struggles to detect objects grouped close together, especially smaller ones. To avoid the drawbacks of these methods, we propose a model that combines the Single Shot Multi-Box Detector (SSD), used for real-time detection, with the MobileNet architecture.

Keywords:- Computer Vision, MobileNet, SSD (Single Shot Multi-Box Detector), Object Detection, Accuracy, Efficiency.

I. INTRODUCTION

CNN stands for Convolutional Neural Network. It is a type of deep learning algorithm commonly used for object detection and other computer vision tasks. CNNs use convolutional layers to extract features from input images or video frames and make predictions based on those features. CNNs have been very successful in applications such as object detection, image classification, and even natural language processing tasks. This research paper is about computer vision, which helps machines understand visual data the way humans do. Computer vision is used in fields such as object detection and video analysis[1]. Using Python and OpenCV, we aim to analyze objects in videos comprehensively. We combine traditional methods with advanced deep learning models to solve real-world problems efficiently[2]. Our main focus is on detecting objects, recognizing them, and working with images. We start by preparing images for analysis through steps such as resizing and noise reduction.

Our goal is to advance computer vision technology through this project. Computer vision is like giving eyes to computers so they can understand and analyze images just as humans do[3]. It is important for applications such as augmented reality, object detection, image classification, and video analysis[1]. With Python and OpenCV, creating computer vision systems has become easier. The main idea here is to build a system that can look at pictures and figure out what is going on, using Python and OpenCV[4].

We prepare images or videos for analysis by adjusting their size and reducing unwanted elements such as noise[4]. We then extract important details from the images to understand which objects and patterns are present. Detecting objects is a key part, and we use different methods, such as traditional Haar cascades and modern YOLO, for accurate object localization[5]. We also recognize these objects by assigning them to specific categories using techniques such as feature matching or deep learning, which helps in applications such as self-driving cars and security systems[6]. Lastly, we enhance image quality, separate the objects of interest, and remove distractions to make the system work better[6]. The project will start by preparing the images in videos for analysis through resizing, noise reduction, and organization. It will then use feature extraction techniques to represent objects and patterns effectively. Techniques such as YOLO (a deep learning detector) and Haar cascades (a classical detector based on Haar-like features) will be used to detect objects accurately, a

IJISRT24APR715 www.ijisrt.com 632



crucial task in computer vision[7]. The system will be trained to spot and identify objects of interest precisely.

II. LITERATURE REVIEW

[Reagan L. Galvez, et al.] Object detection is a key component of many computer vision applications, including robot navigation, medical imaging, and video surveillance. Traditional methods such as background subtraction, temporal differencing, and contour matching have long been used, but convolutional neural networks (CNNs) gained popularity with AlexNet's historic win in the 2012 ImageNet competition. Recent advances include adaptive low-shot transfer detectors, region selection networks, and gating networks, all focused on improving object detection accuracy. CNNs have also been used for tracking visual targets, using ImageNet datasets that comprise positive and negative examples. New approaches to active learning have been developed as well; they help find useful examples for training datasets and are especially useful for image classification tasks[1]. Furthermore, among the variety of approaches used to improve object detection, artificial neural networks have proven useful, particularly for shape and color pattern recognition.

[Cong Tang, et al.] Object detection is important to computer vision research because it is used in robotics, autonomous vehicles, video surveillance, and pedestrian detection, among other applications. The advent of deep learning has fundamentally changed how object detection and identification are carried out compared with conventional methods. After AlexNet's historic win in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), deep convolutional networks emerged as the competition's undisputed leaders, proving their superiority in image recognition. The introduction of object detection tasks in 2013 further accelerated deep learning's rapid progress in this area. Because deep neural networks are so good at representing features, they have become essential parts of feature extraction systems for object detection.

[Byungik Ahn] This study presents a field-programmable gate array (FPGA)-based hardware architecture for Convolutional Neural Network (CNN) systems, with a specific emphasis on real-time object detection in video inputs. Because CNNs integrate both feature extraction and classification, they are well suited to FPGA implementation, as they reduce the need for additional software processing. The architecture makes maximal use of the available resources to achieve high performance with comparatively little hardware, leading to solutions that are both economical and power-efficient. The system uses cutting-edge techniques to construct hardware-based CNN systems, such as a multi-category recognition technique that switches weight sets to classify objects in the same video stream into different categories. The paper describes the base CNN while highlighting the effectiveness of the feed-forward portion of the CNN, which is usually sufficient for most image recognition tasks.

[Ming Liang, et al.] Convolutional Neural Networks (CNNs) have performed exceptionally well in computer vision tasks in recent years, but they differ from biological visual systems in that they do not include recurrent connections. To close this gap, a novel recurrent CNN (RCNN) design is presented. This architecture integrates recurrent connections into each convolutional layer to model temporal dynamics, which enhances the contextual information that is essential for object recognition. Through temporal unfolding, RCNN achieves depth with a constant parameter count and creates multiple learning paths. Tests on standard datasets such as CIFAR-10, CIFAR-100, MNIST, and SVHN highlight the advantage of recurrent structures in object recognition and show that RCNN outperforms other state-of-the-art models with fewer parameters. Additionally, increasing the number of parameters further improves performance, confirming the effectiveness of RCNN's recurrent architecture.

[Upulie H.D.I] Advancements in computer vision and object detection are crucial for improving the efficiency and precision of AI systems, bridging the gap between machine and human capabilities. This progress not only advances intelligent systems but also enables assistive technologies that streamline tasks and enhance human welfare. Real-time object detection has become essential in automation efforts that aim to supplement or even replace human tasks. However, the unpredictable nature of image data presents significant challenges for conventional programmed algorithms. To tackle these obstacles, a range of techniques have been proposed, with Convolutional Neural Networks (CNNs) playing a key role in addressing object detection challenges[8]. Despite this progress, obstacles such as output accuracy, resource consumption, and processing speed persist. The evolution of algorithms from R-CNN to YOLO illustrates ongoing efforts to overcome these hurdles and achieve real-time object detection. This paper undertakes a comprehensive review of the prominent real-time object detection algorithm You Only Look Once (YOLO), scrutinizing its architecture, strengths, weaknesses, and implications for future research and development in the field.

[Shijian Tang] Convolutional neural networks (CNNs) have been extensively deployed in the field of visual recognition. This paper focuses on object recognition within images, aiming to provide class confidence scores and predict bounding boxes for multiple items present in a single image. While CNNs have become the standard for image classification, our objective extends to other visual recognition tasks, including object detection, segmentation, localization, and even generating phrases from images. Whereas traditional methods such as region CNNs (RCNN) combine bounding box regression, CNNs, selective search, and support vector machines (SVMs) for object detection, we propose a streamlined approach in our paper. By replacing selective


search with the edge box technique for region proposal generation, we achieve significantly faster runtimes without sacrificing mean average precision (mAP). Moreover, we simplify the system by eliminating class-specific SVMs, relying instead on the softmax output of the CNN's final layer as our confidence score. Through careful curation of the training data, we ensure precise calibration of the CNN, mitigating any potential performance degradation resulting from the absence of SVMs.

[Aishwarya Sarkale] The field of artificial intelligence is booming, and breakthroughs are happening quickly in many different areas. In particular, image identification and detection are important sub-domains with many applications; AI-powered cars and facial recognition software are just two examples. Because neural networks are used so widely, many sectors benefit from breakthroughs in other areas in addition to advances in their own applications. Object detection is a subfield of computer vision and image processing that looks for instances of semantic objects, such as people, buildings, and cars, in digital images. Object detection applications, including face and pedestrian detection, have been extensively researched, and they have implications for computer vision domains such as image retrieval and video surveillance.

III. METHODOLOGY

Fig 1 : Process of Proposed System

• Camera/Webcam:
First, we collect videos that contain objects. After a video is uploaded, we detect the objects in it.

• Extract Frames from Video:
To extract frames from a video, we use the OpenCV video processing library. It breaks a video down into individual frames (images), which can then be saved as separate files for further analysis or for use in other applications. By extracting frames, one can analyze the content of each frame and perform image processing tasks on it. It is a useful technique in fields such as computer vision.

• Object Recognition:
Recognition goes a step further than detection: it identifies the detected objects and classifies them into specific categories or labels. In other words, recognition tells us what the object in the video or image is. Simply put, detection is about finding objects, while recognition is about understanding and labeling what those objects are.

• Object Detection:
Detection is the technique of finding and locating items in a video or image; Faster R-CNN and SSD are examples of object detection techniques. The output of an object detection method consists of bounding boxes surrounding the detected objects together with their associated class labels. This output has many possible uses, such as augmented reality, driverless vehicles, and video surveillance.

• Algorithm:
• Install TensorFlow.
• Download the pretrained MobileNet model to your machine.
• Run model prediction by passing the configuration path to the model.
• Preprocess the image.
• Assign a target label to the object in the image.
• Predict the probability of the target label for each frame.
• Perform real-time object detection on the live video stream or on the uploaded video file by looping through the frames captured from the stream.

IV. EXISTING METHODS

The existing methods for object detection using CNNs include deep learning algorithms such as R-CNN and YOLO.

• R-CNN:
RCNN stands for “Region-based Convolutional Neural Network”. It is a deep learning model for identifying objects in pictures. It first generates region proposals using a selective search algorithm, which analyzes the image and identifies potential object regions based on similarities in color, texture, and other visual features. These proposed regions are then passed through the convolutional neural network for further analysis and classification. It is a popular approach in computer vision.

• YOLO:
YOLO stands for “You Only Look Once”. It is a real-time object detection system that uses a single neural network to process the entire image, dividing it into a grid of regions and predicting bounding boxes and class probabilities for each region. This means that YOLO can recognize several objects in a video frame in a single pass.
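The grid-based prediction in the YOLO description can be illustrated with a small sketch. Assuming an S×S grid (S = 7 here, purely as an example) and a ground-truth box given in pixel coordinates inside the image, the cell that contains the box centre is the one responsible for predicting that object, and the centre is encoded as an offset within that cell. The function name `assign_to_grid_cell` is hypothetical and not taken from the authors' code; the sketch shows only the cell-assignment rule, not a full YOLO network.

```python
def assign_to_grid_cell(box, img_w, img_h, grid_size=7):
    """Return (row, col, x_offset, y_offset) for a YOLO-style S x S grid.

    box is (x_min, y_min, x_max, y_max) in pixels, assumed to lie inside
    the image. The responsible cell is the one containing the box centre;
    the offsets locate that centre inside the cell as fractions in [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Express the centre in grid units (0 .. grid_size).
    gx = cx * grid_size / img_w
    gy = cy * grid_size / img_h
    # Clamp so a centre lying exactly on the right or bottom image edge
    # still maps to the last cell instead of an out-of-range index.
    col = min(int(gx), grid_size - 1)
    row = min(int(gy), grid_size - 1)
    return row, col, gx - col, gy - row


if __name__ == "__main__":
    # A 200x200 box centred at (200, 150) in a 448x448 image.
    print(assign_to_grid_cell((100, 50, 300, 250), 448, 448))
    # (2, 3, 0.125, 0.34375)
```

In the original YOLO formulation each cell predicts only a small, fixed number of boxes, so several small objects whose centres fall into the same cell compete for the same predictions; this is one way to understand the difficulty with closely grouped small objects listed among the drawbacks of existing systems.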


• Drawbacks of Existing Systems

R-CNN could not reach real-time speed, even though the system has been updated and new versions of it have been deployed.

One of the main drawbacks of YOLO is that it struggles to detect objects grouped close together, especially smaller ones.

V. PROPOSED METHODOLOGY

To avoid the drawbacks of other systems, we propose a model based on the Single Shot MultiBox Detector (SSD) architecture. We use MobileNet, TensorFlow, and OpenCV to detect objects accurately and robustly. Given a continuous live camera stream, our aim is to recognize moving objects successfully within a short amount of time.

A. SSD (Single Shot Multi-Box Detector)
SSD stands for Single Shot MultiBox Detector. It is an object detection technique that can identify several objects in a video or picture. It is based on a feed-forward convolutional neural network that generates a fixed-size collection of bounding boxes, together with scores for the object classes present in those boxes.

SSD achieves object detection by using a single neural network that processes an input image and generates a set of bounding box predictions and class probabilities. It does this by dividing the input image into a grid of cells and making each cell responsible for detecting objects. For each cell, the network predicts offsets that adjust a set of default bounding boxes (priors), together with the corresponding class probabilities. Because these predictions are made on several feature maps of different resolutions, with default boxes of several aspect ratios, SSD can detect objects of various sizes and aspect ratios at different locations in the image. The predicted bounding boxes are then filtered by their confidence scores, and overlapping boxes are removed with non-maximum suppression, to obtain the final detection results.

B. MobileNet
MobileNet serves as the lightweight base network of the proposed detector, which combines SSD with MobileNet, TensorFlow, and OpenCV to detect objects accurately and robustly.

MobileNet is an efficient, lightweight CNN architecture designed for mobile and embedded vision applications. It is built from depthwise separable convolutions, each of which factorizes a standard convolution into two steps: a depthwise convolution that filters each input channel separately, and a pointwise (1×1) convolution that combines the filtered channels. Using these depthwise separable convolutions, we build lightweight deep neural networks.

• It Performs Operations Like Reshaping and Resizing of Images.
First, MobileNet's convolutional layers scan the image in small local regions to find important features such as edges, textures, and shapes.

MobileNet then combines all these features to understand the overall picture, and the model decides whether the image or video contains a cat, a dog, or something else.

• OpenCV (Open-Source Computer Vision)
OpenCV is like a toolbox full of tools that help computers understand and work with images and videos in real time. It is free to use and helps with tasks such as analyzing security camera footage, studying videos, and processing images. With more than 2,500 optimized algorithms, it is well suited to tasks such as recognizing objects in pictures or videos. When we create applications that involve computer vision, we do not have to start from scratch; instead, we can use OpenCV to jump right into solving real-world problems. One useful function is cv2.VideoCapture(), which reads videos: passing 0 opens the webcam, and passing an RTSP URL lets us analyze CCTV footage, which is particularly useful for video analysis tasks.

VI. RESULT

Based on the comparison of deep learning algorithms for object detection, the real-time object detection system was put through thorough testing and achieved an object detection accuracy of 92%. It consistently processed frames at 25 frames per second (FPS) on desktop computers and at 15 FPS on embedded systems and smartphones. When compared with R-CNN, YOLO, and SSD, this system performed better in terms of both accuracy and speed.

The latest evaluation results demonstrate the system's robustness and efficiency in various scenarios. Its adaptability to tasks such as traffic surveillance and pedestrian detection further supports its potential for applications in autonomous vehicles, surveillance systems, and augmented reality. This performance and versatility make the system a valuable tool for real-world applications that require fast and accurate object detection.

Fig 2 : To Upload Videos from the System, First Click the "Browse System Videos" Button.
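The processing loop that the application applies to an uploaded video or a webcam stream (read frames with cv2.VideoCapture(), pass each frame through a pretrained MobileNet-SSD model, keep detections above a confidence threshold, and draw labelled bounding boxes) can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes a TensorFlow SSD-MobileNet model exported as a frozen graph plus a text graph config and loaded through OpenCV's DNN module, and it assumes the usual OpenCV SSD output layout of shape (1, 1, N, 7), where each row is [batch_id, class_id, confidence, x_min, y_min, x_max, y_max] with coordinates normalised to [0, 1]. The function names `parse_ssd_output` and `detect_in_video`, the 300×300 input size, and the 0.5 threshold are illustrative assumptions.

```python
def parse_ssd_output(detections, frame_w, frame_h, labels, conf_threshold=0.5):
    """Turn an SSD output of shape (1, 1, N, 7) into (label, confidence, box) tuples.

    Each row is [batch_id, class_id, confidence, x_min, y_min, x_max, y_max]
    with coordinates normalised to [0, 1]; boxes are returned in pixels.
    """
    results = []
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence < conf_threshold:
            continue  # drop low-confidence predictions
        class_id = int(row[1])
        label = labels.get(class_id, str(class_id))  # fall back to the raw id
        # Scale normalised coordinates to pixels and clip to the frame.
        x1 = max(0, int(row[3] * frame_w))
        y1 = max(0, int(row[4] * frame_h))
        x2 = min(frame_w - 1, int(row[5] * frame_w))
        y2 = min(frame_h - 1, int(row[6] * frame_h))
        results.append((label, confidence, (x1, y1, x2, y2)))
    return results


def detect_in_video(source, weights_path, config_path, labels, conf_threshold=0.5):
    """Run an SSD-MobileNet model on every frame of a webcam (source=0), file, or RTSP stream."""
    import cv2  # imported here so parse_ssd_output has no OpenCV dependency

    net = cv2.dnn.readNetFromTensorflow(weights_path, config_path)
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of the video file or camera disconnected
            h, w = frame.shape[:2]
            # Resize to a 300x300 network input (assumed here); swapRB converts
            # OpenCV's BGR channel order to the RGB order TensorFlow models expect.
            blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True, crop=False)
            net.setInput(blob)
            for label, conf, (x1, y1, x2, y2) in parse_ssd_output(
                net.forward(), w, h, labels, conf_threshold
            ):
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(frame, f"{label}: {conf:.2f}", (x1, max(y1 - 5, 10)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow("Object detection", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

As an example, `detect_in_video(0, "frozen_inference_graph.pb", "ssd_mobilenet.pbtxt", {1: "person", 3: "car"})` would run on the default webcam; the two file names are placeholders for whatever exported model is used, and the label dictionary assumes the COCO label map of the TensorFlow Object Detection API, where id 1 is person and id 3 is car. Keeping the output parsing in a separate pure-Python function makes it testable without a model file or a camera.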

Fig 3 : In the Above Screen, One Video Is Being Uploaded; Once It Is Uploaded, the Video Appears in the Screen Below.

Fig 4 : The Application Tracking Items in the Video and Marking Them with Bounding Boxes.

Fig 5 : Detected Object (a Bottle) in a Video.

VII. DISCUSSION

The successful implementation of this model signifies a notable advancement in computer vision technology. Compared with what others have found, the results show that deep learning models such as SSD and MobileNet are a good choice for object detection. This is in line with past studies reporting that combining traditional methods with deep learning works well for detecting objects accurately. However, this study has some limitations. The model may struggle with tiny objects, occlusion (objects blocking the view), or changes in lighting. To improve, future research could focus on making SSD and MobileNet more robust in difficult conditions, tuning the model for better results, and adding more object classes to the dataset.

In the future, researchers could work on making the real-time object detection system even better. They could try adding attention mechanisms, exploring ways to detect objects at different scales, or using context to localize objects more accurately. These efforts would help strengthen real-time object detection with SSD MobileNet and advance computer vision technology.

VIII. CONCLUSION

In conclusion, this work reflects the progress made in computer vision systems, particularly in object detection and recognition, by combining traditional methods with deep learning models. The reliability of these systems in accurately identifying and categorizing objects, even in complex situations, shows their potential for practical use. Real-time processing further increases their effectiveness in time-critical tasks such as surveillance and autonomous vehicles.

While these achievements are significant, recognizing the limitations of these systems opens up opportunities for future research and improvement. Overcoming obstacles such as challenging lighting conditions, heavy occlusion, and similar object appearances offers avenues for refining these systems. Exploring advanced feature extraction techniques, integrating contextual details, and using multi-modal data fusion methods show promise for improving performance.

Looking ahead, continuous improvements in detection network models, with an emphasis on reducing memory usage and increasing speed, will be essential. Broadening the range of recognizable object classes will expand the applicability of these systems across different fields. Ultimately, these advancements contribute to the progression of computer vision technology, unlocking new possibilities for a variety of applications, including video surveillance. As a future enhancement, the model will be custom-trained on additional objects to increase its detection capability: with transfer learning, the network will be trained on other objects to widen the range of objects MobileNet can detect.

REFERENCES

[1]. Reagan L. Galvez, “Object Detection Using Convolutional Neural Networks,” Proceedings of TENCON 2018 - 2018 IEEE Region 10 Conference (Jeju, Korea, 28-31 October 2018).
[2]. Cong Tang, “The Object Detection Based on Deep Learning,” 2017 4th International Conference on Information Science and Control Engineering.
[3]. Byungik Ahn, “Real-Time Video Object Recognition Using Convolutional Neural Network” (2015).


[4]. Ming Liang and Xiaolin Hu, “Recurrent Convolutional Neural Network for Object Recognition.”
[5]. Peize Sun, “Sparse R-CNN: An End-to-End Framework for Object Detection,” IEEE, 2023.
[6]. Yundong Zhang, Haomin Peng, and Pan Hu, “Towards Real-time Detection and Camera Triggering,” CS341.
[7]. Yu-Chen Chiu, Chi-Yi Tsai, Mind-Da Ruan, Guan-Yu Shen, and Tsu-Tian Lee, “Mobilenet-SSDv2: An Improved Object Detection Model for Embedded Systems,” ©2020 IEEE.
[8]. Andres Heredia and Gabriel Barros-Gavilanes, “Video processing inside embedded devices using SSD-Mobilenet to count mobility actors,” 978-1-7281-1614-3/19, ©2019 IEEE.
[9]. Animesh Srivastava, Anuj Dalvi, Cyrus Britto, Harshit Rai, and Kavita Shelke, “Explicit Content Detection using Faster R-CNN and SSD MobileNet v2,” IRJET, e-ISSN: 2395-0056, ©2020.
[10]. R. Huang, J. Pedoeem, and C. Chen, “YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers,” in Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018.
[11]. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg, “SSD: Single Shot MultiBox Detector,” in Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[12]. Alex Bewley, “Simple online and realtime tracking,” 2016 IEEE.
[13]. Upulie H.D.I and Lakshini Kuganandamurthy, “Real-Time Object Detection using YOLO” (May 2021).
[14]. Shijian Tang and Ye Yuan, “Object detection based on convolutional neural network.”
[15]. Aishwarya Sarkale, “An Innovative Machine Learning Approach for Object Detection and Recognition,” Proceedings of the 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT 2018), IEEE Xplore Compliant - Part Number: CFP18BAC-ART; ISBN: 978-1-5386-1974-2.
[16]. Sudharshan Duth P, “Object Recognition in Images using Convolutional Neural Network,” Proceedings of the Second International Conference on Inventive Systems and Control (ICISC 2018).
[17]. Sanskruti Patel and Atul Patel, “Object Detection with Convolutional Neural Networks” (October 2020).
[18]. Darshan Yadav, “Real-Time Object Detection Using SSD MobileNet Model of Machine Learning,” International Journal of Engineering and Computer Science, Volume 12, Issue 05, May 2023, pp. 25729-25734.
[19]. Andrew G. Howard and Hartwig Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” Google Inc., 17 Apr 2017.
[20]. Akshay Mangawati, Mohana, Mohammed Leesan, and H. V. Ravish Aradhya, “Object Tracking Algorithms for video surveillance applications,” International Conference on Communication and Signal Processing (ICCSP), India, 2018, pp. 0676-0680.
