Comprehensive Guide: Top Computer Vision Resources All in One Blog

Save this blog for comprehensive resources for computer vision

Chinmay Bhalerao
Jan 27, 2023
Source: Appen

Working in computer vision and deep learning is fantastic because, after every few months, someone comes up with something crazy that completely changes your perspective on what is feasible.

After spending 2+ years in this field, I have found many interesting and helpful resources that will help you in your computer vision work, and they also show how huge this domain is. I have arranged the topics in the order you would meet them while working on a computer vision project. So, without further ado, let's start with:

Dataset generation

To train and evaluate computer vision models, we need data. A dataset is a collection of samples (in this case, images or videos), typically drawn from a specific topic or domain. Open datasets are those that anyone can access, download, and use, subject to their licenses. The dataset should contain images of every target class we want the model to learn. You can use the resources below to build your data.

Kaggle image datasets: Link

Kaggle is a web-based data science environment where users can discover and share datasets, explore and build models, and collaborate with other data scientists and computer vision experts.

Datagen: Link

Other sites let you download general-purpose images that you can then augment or process. They are free to use, and the sites below host millions of images.

Unsplash

Pexels

Pixabay

I found these websites especially helpful because their images are easy to download and process. I have also described two techniques for dataset creation in my blog, which you can refer to for further resources.
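Once the images are collected, a common next step is to split them into training and validation sets while keeping the class balance similar in both. Below is a minimal plain-Python sketch, assuming each sample is a (file path, class label) pair. The helper name split_dataset, the 20% validation fraction, the seed, and the file names are illustrative assumptions, not part of any site listed above.

```python
import random
from collections import defaultdict

def split_dataset(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train/val lists, stratified by label.

    Hypothetical helper: the shuffle is seeded so the split is reproducible.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one validation sample per class when the class has 2+ images.
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Hypothetical file names: 10 cat images and 5 dog images.
samples = ([(f"cat_{i}.jpg", "cat") for i in range(10)]
           + [(f"dog_{i}.jpg", "dog") for i in range(5)])
train, val = split_dataset(samples)
print(len(train), len(val))  # 12 3
```

Stratifying per class matters most for small or imbalanced datasets, where a purely random split can leave a rare class out of the validation set entirely.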

Annotation:

I purposefully placed annotation before augmentation because many annotation tools now also provide built-in augmentation.

Annotation is the process of marking a bounding box or mask around each target in an image, together with its class label, so that your model can learn the features of each category.

Image by author

Roboflow: built-in annotation and augmentation

makesense.ai: very helpful for annotation, and it also supports collaboration

VGG Image Annotator: useful for fast masking

LabelImg

V7

Labelbox

Scale AI

SuperAnnotate
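Several of these tools (for example LabelImg and Roboflow) can export bounding boxes in the YOLO text format, where each line is `class x_center y_center width height` with all four numbers normalized by the image size. The sketch below shows that conversion from a pixel box (x_min, y_min, x_max, y_max); the function name to_yolo_line and the example numbers are hypothetical.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    YOLO labels store the box center and size, each divided by the image size.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit inside a {img_w}x{img_h} image")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x200 px box on a 640x480 image, class 0.
print(to_yolo_line(0, (100, 50, 300, 250), 640, 480))
# 0 0.312500 0.312500 0.312500 0.416667
```

Normalized coordinates keep the labels valid when the training pipeline resizes images, which is one reason YOLO-family models use them.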

Augmentation of images:

Image data augmentation is the process of generating new transformed versions of images from the given image dataset to increase its diversity.

Roboflow (again!): one of the best augmentation tools for large datasets

image_augmentor: helps you apply a variety of augmentations quickly

My blog: I wrote augmentation code with the help of a library; you can check it if you want to customize your augmentations.
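One detail that is easy to miss: geometric augmentations must transform the labels together with the image, while photometric ones (such as brightness) leave the labels unchanged. The sketch below assumes a toy grayscale image stored as a nested Python list and a pixel box (x_min, y_min, x_max, y_max); the function names are hypothetical, and in practice a library such as Albumentations handles boxes for you.

```python
def hflip_image(image):
    """Mirror an image stored as a list of rows (each row a list of pixel values)."""
    return [row[::-1] for row in image]

def hflip_box(box, img_w):
    """Mirror a pixel box (x_min, y_min, x_max, y_max) across the image's vertical axis."""
    x_min, y_min, x_max, y_max = box
    return (img_w - x_max, y_min, img_w - x_min, y_max)

def adjust_brightness(image, delta, max_value=255):
    """Add delta to every pixel and clip to [0, max_value]; boxes stay the same."""
    return [[min(max_value, max(0, p + delta)) for p in row] for row in image]

image = [[0, 10, 20, 30],
         [40, 50, 60, 70]]  # 2 rows x 4 columns, so img_w = 4
print(hflip_image(image))             # [[30, 20, 10, 0], [70, 60, 50, 40]]
print(hflip_box((0, 0, 1, 2), 4))     # (3, 0, 4, 2): column 0 moved to column 3
print(adjust_brightness(image, 240))  # [[240, 250, 255, 255], [255, 255, 255, 255]]
```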

Training the model:

Training is the process of teaching a model to learn patterns from labeled data so that it can make accurate predictions and carry out a task.

There are many models, but I am providing links to the paper and repository for a few famous ones. Reading each paper is the best way to understand a model's structure and layer-wise arrangement.
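To make "training" concrete, here is a minimal sketch of gradient descent on a one-feature logistic-regression classifier. The models below are trained on the same principle (compute a loss, then step the parameters against its gradient), but with frameworks such as PyTorch or TensorFlow and millions of parameters. The toy data, learning rate, and epoch count are made-up illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w*x + b) with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def log_loss(w, b, xs, ys):
    """Average binary cross-entropy; p is clipped to avoid log(0)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

# Toy, made-up data: one feature per sample, class 1 when the feature is positive.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(log_loss(w, b, xs, ys) < log_loss(0.0, 0.0, xs, ys))  # True
print([1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs])    # [0, 0, 0, 1, 1, 1]
```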

* Object detection

Object detection is a computer vision technique for locating instances of objects in images or videos.

An excellent explanation of how YOLO works, by Dhaval Patel

YOLO v1: Paper and repo

YOLO v2: Paper and repo

YOLO v3: Paper and repo

YOLO v4: Paper and repo

YOLO v5: Paper and repo

YOLO v6: Paper and code

YOLO v7: Paper and repo

YOLO v8: repo (the paper had not been released at the time of writing)

SSD: Paper and repo

Faster-RCNN: Paper and repo

Fast-RCNN: Paper and repo

Spatial Pyramid Pooling (SPP-net): Paper and repo
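Despite their different architectures, most of the detectors above rely on two shared building blocks: Intersection over Union (IoU), which measures how much two boxes overlap, and non-maximum suppression (NMS), which removes duplicate detections of the same object. Below is a plain-Python sketch of both, assuming boxes as (x_min, y_min, x_max, y_max); it shows the greedy variant of NMS, and libraries such as torchvision ship vectorized versions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object plus a separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.75]
print(nms(boxes, scores))  # [0, 2]
```

The threshold is a trade-off: lower values suppress more duplicates but can also remove genuinely distinct objects that overlap, such as people in a crowd.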

* Segmentation

Segmentation is the process of dividing an image into regions based on the characteristics of its pixels, in order to identify objects or boundaries and make the image simpler and more efficient to analyze.

YOLOv8 Instance Segmentation

YOLOv7 Instance Segmentation

OneFormer

Mask RCNN

YOLOv5 Instance Segmentation

SegFormer
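The models above learn segmentation end to end, but the idea of grouping pixels by their characteristics can be illustrated with a classical baseline: threshold the pixel intensities, then label each connected foreground region. This is a plain-Python sketch on a hypothetical grayscale image stored as nested lists, using 4-connectivity; OpenCV provides optimized equivalents in cv2.threshold and cv2.connectedComponents.

```python
from collections import deque

def threshold(image, t):
    """Binary mask: 1 where the pixel value is >= t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in image]

def label_regions(mask):
    """Label 4-connected foreground regions with breadth-first search.

    Returns (labels, count): labels has 0 for background and 1..count for regions.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1  # found an unlabeled foreground pixel: start a new region
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

image = [[200, 210, 0, 0, 0],
         [190, 220, 0, 0, 180],
         [0, 0, 0, 170, 175]]
labels, count = label_regions(threshold(image, 128))
print(count)  # 2
for row in labels:
    print(row)
# [1, 1, 0, 0, 0]
# [1, 1, 0, 0, 2]
# [0, 0, 0, 2, 2]
```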

Libraries to know

A few libraries are helpful across all the sub-tasks of a computer vision project:

OpenCV, SimpleCV, TensorFlow, Keras, MATLAB, PCL, DeepFace, NVIDIA CUDA-X, NVIDIA Performance Primitives, BoofCV, OpenVINO, PyTorch, Albumentations, Caffe, Detectron2, CUDA, YOLO

Interesting and simple projects I found that will sharpen your computer vision thinking:

image-net — computer vision challenge

1. How to read an image in Python using OpenCV — 2023

2. Sketchy — Sketch making Flask App — Interesting Project — 2023

3. How to detect shapes using cv2- with source code — easy project — 2023

4. Rotating and Scaling Images using cv2 — a fun Python application — 2023

5. How to use mouse clicks to draw circles in Python using OpenCV — easy project — 2023

6. How to perform Morphological Operations like Erosion, Dilation, and Gradient in Python using OpenCV — easiest explanation –2023

7. Object Detection using SSD — with source code — easiest way — fun project –2023

8. Face Recognition Based Attendance System with source code — Flask App — With GUI — 2023

9. Face Recognition — GitHub Link 1, GitHub Link 2, Video Tutorial

10. Easiest way to Train yolov7 on the custom dataset — 2023

11. Template Matching — Video Tutorial, Written Tutorial

12. Semantic and Instance Segmentation on Videos using PixelLib in Python — Video Tutorial, Code

13. Object Detection using Deep Learning — Video Tutorial, Written Tutorial

14. Drowsiness Detection using cv2 in Python — interesting project — 2023

15. Realtime Number Plate Detection using Yolov7 — Easiest Explanation — 2023

Simultaneous localization and mapping (SLAM) systems:

Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it.

A video by Daniel DeTone explaining SLAM systems

DROID-SLAM: DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer.

DynaSLAM: DynaSLAM is a visual SLAM system that is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. It detects dynamic objects, and having a static map of the scene allows it to inpaint the parts of the frame background occluded by those objects.

* RGB (Monocular):

ORB-SLAM: ORB-SLAM is a versatile and accurate SLAM solution for monocular, stereo and RGB-D cameras.

Kimera: An open-source library for real-time metric-semantic localization and mapping.

PTAM: PTAM (Parallel Tracking and Mapping) is a camera tracking system for augmented reality.

LSD-SLAM: LSD-SLAM is a novel, direct monocular SLAM technique. Instead of using keypoints, it operates directly on image intensities for both tracking and mapping.

SVO-SLAM: SVO uses a semi-direct paradigm to estimate the 6-DOF motion of a camera system from both pixel intensities and image features.
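The systems above estimate poses from images and jointly optimize poses and map (for example with bundle adjustment in ORB-SLAM, or the dense bundle adjustment layer in DROID-SLAM). The bookkeeping at the core of "localization and mapping" can still be sketched in 2D: chain relative motions to track the agent's pose, then use the current pose to place an observed landmark into the map. Planar poses (x, y, θ) and the helper names compose and landmark_to_world are simplifying assumptions; real visual SLAM works with full 6-DOF camera poses.

```python
import math

def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the agent's frame, to a world pose."""
    x, y, th = pose
    dx, dy, dth = delta
    new_th = (th + dth + math.pi) % (2 * math.pi) - math.pi  # wrap angle to [-pi, pi)
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            new_th)

def landmark_to_world(pose, obs):
    """Convert a landmark observed at (lx, ly) in the agent's frame into world coordinates."""
    x, y, th = pose
    lx, ly = obs
    return (x + lx * math.cos(th) - ly * math.sin(th),
            y + lx * math.sin(th) + ly * math.cos(th))

# Localization: move 1 m forward, turn left 90 degrees, move 1 m forward.
pose = (0.0, 0.0, 0.0)
for step in [(1.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]:
    pose = compose(pose, step)
print([round(v, 3) for v in pose])  # [1.0, 1.0, 1.571]

# Mapping: a landmark seen 2 m straight ahead is placed in the world map.
landmark = landmark_to_world(pose, (2.0, 0.0))
print([round(v, 3) for v in landmark])  # [1.0, 3.0]
```

Because each pose is built on the previous one, small errors in the relative motions accumulate as drift; correcting that drift against the map is exactly what the optimization in real SLAM systems is for.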

Neural Radiance Field (NeRF)

A neural radiance field (NeRF) is a fully-connected neural network that can generate novel views of complex 3D scenes, based on a partial set of 2D images.

Bmild

nerf-pytorch

nerf_pl

MobileNeRF
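To see how a NeRF produces an image, it helps to look at the rendering step. As described in the original NeRF paper, each pixel's color is computed by sampling points along a camera ray, querying the network for a density σ_i and color c_i at each point, and compositing them as C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where δ_i is the spacing between samples and T_i = exp(−Σ_{j<i} σ_j δ_j). Input coordinates are first lifted with a sinusoidal positional encoding so the network can represent fine detail. The sketch below implements those two pieces in plain Python, with hand-picked values standing in for the network; it is illustrative and not code from the repositories listed above.

```python
import math

def positional_encoding(p, num_freqs):
    """Map a scalar coordinate p to [sin(2^k*pi*p), cos(2^k*pi*p)] for k = 0..num_freqs-1."""
    features = []
    for k in range(num_freqs):
        features.append(math.sin((2 ** k) * math.pi * p))
        features.append(math.cos((2 ** k) * math.pi * p))
    return features

def render_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray into an (r, g, b) color.

    sigmas: density at each sample, colors: (r, g, b) at each sample,
    deltas: distance between consecutive samples. In a real NeRF the
    sigmas and colors come from the MLP; here they are hand-picked.
    """
    transmittance = 1.0  # T_i: fraction of light that survives to reach sample i
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha  # equals exp(-sum of sigma_j * delta_j so far)
    return rgb

# Empty space followed by an opaque blue surface: the ray returns blue.
print(render_ray([0.0, 1e3], [(0, 1, 0), (0, 0, 1)], [1.0, 1.0]))  # [0.0, 0.0, 1.0]
```

Because every step is differentiable, the same compositing can be written in PyTorch or JAX so that the rendering error on training photos trains the network.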

Interesting blogs related to computer vision:

Introduction to object detection by Analytics Vidhya: Part 1, Part 2, and Part 3

Instance segmentation: everything you need to know about instance segmentation

Semantic segmentation: everything you need to know about semantic segmentation

Ultimate Guide to Object Detection Using Deep Learning: a step-by-step approach to understanding deep learning

Image processing: the basics of image processing

All CNN architectures: an overview of the basic CNN architectures

Informative videos related to computer vision:

MIT 6.S094: Computer Vision by Lex Fridman

CNN Architectures by Michigan Online

TensorFlow Object Detection by Nicholas Renotte

Detection and Segmentation by Stanford

CNN by Andrej Karpathy (2016)

CNN by Stanford University School of Engineering (2017)

Introduction to Deep Learning and Self-Driving Cars by Lex Fridman [MIT 6.S094]

Deep Learning State of the Art by Lex Fridman

Stanford Machine Learning Course — Andrew Ng

Research Papers

These are a few research paper sources where you can easily find papers for any model or method you need.

arXiv.org

ICLR

Awesome — Most Cited Deep Learning Papers

Other Resources

GitHub — A famous host of open-source software projects.

Quora — Seek help and ask any questions here if you have any difficulties!

nducthang/deep_learning_object_detection

nducthang/Active-learning-for-object-detection

I will be updating this blog frequently because many topics are not covered here yet. You can follow me for new updates. You can also suggest topics to add, to make it more useful for newcomers as well as for experienced computer vision engineers. Let's embrace AI!

If you have found this article insightful

Give article claps if you liked this article

If you found this article insightful, follow me on LinkedIn and Medium. You can also subscribe to get notified when I publish articles. Let's create a community! Thanks for your support!

If you want to support me:

Following and clapping are the most important ways to support me, but you can also buy me a coffee. COFFEE.

You can read my other blogs related to :

Signing off,

Chinmay


Chinmay Bhalerao

AI-ML Researcher & Developer | 3 X Top writer in Artificial intelligence, Computer vision & Object detection | Mathematical Modelling & Simulations