Image Fusion in Computer Vision

Nico Klingler

In many computer vision applications (e.g. robot motion and medical imaging), relevant information from multiple images must be integrated into a single image. The resulting fused image offers higher reliability, accuracy, and data quality than any single input.

Multiview fusion combines several views of a scene to obtain a higher-resolution image and can also recover the 3D representation of the scene. Multimodal fusion combines images from different sensors and is also referred to as multi-sensor fusion. Its main applications include medical imagery, surveillance, and security.

Levels of Image Fusion

Engineers perform Image Fusion (IF) at three levels, depending on the stage at which the fusion takes place.

Pixel Level IF. This is the lowest-level method and is simple to perform. It combines the pixel values of two input images directly, for example by averaging them, and generates a single resultant image.
Feature Level IF. It extracts image features (size, color) from multiple sources and fuses them, generating an enhanced image after feature extraction.
Block (Region) Based IF. This is a high-level technique. It utilizes a multistage representation and calculates measurements region by region.
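
To make the pixel-level case concrete, here is a minimal sketch of averaging two images with NumPy. It assumes the inputs are already registered, grayscale, and of equal size; the function name `average_fusion` and the toy pixel values are hypothetical and chosen only for illustration.

```python
import numpy as np

def average_fusion(img_a, img_b):
    """Fuse two registered, same-sized grayscale images by pixel-wise averaging."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    # Each output pixel is the mean of the two corresponding input pixels.
    return (a + b) / 2.0

# Toy 2x2 "images" with made-up intensities.
img_a = np.array([[0, 100], [200, 50]])
img_b = np.array([[100, 100], [0, 150]])
fused_avg = average_fusion(img_a, img_b)
```

Averaging is cheap, but it also halves the contrast of any detail that appears in only one input, which is one reason the higher-level methods exist.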

Figure: Levels of Image Fusion

Types of Image Fusion

Single-sensor IF

Single-sensor image fusion captures the real world as a sequence of images. The algorithm combines a set of images and generates a new image with optimal information content. E.g. under varying lighting conditions, a human operator may not be able to detect objects in the individual images, but the resultant fused image highlights them.

The drawbacks of this method stem from the limitations of the imaging sensor used. The capabilities of the sensor (dynamic range, resolution, etc.) restrict the conditions in which the system can function. For example, some sensors are good for illuminated environments (daylight) but are not suitable for night and fog conditions.

Multi-sensor IF

A multi-sensor image fusion method merges the images from several sensors to form a composite image. E.g. an infrared camera and a separate digital camera each produce their own image, and merging the two produces the final fused image. This approach overcomes the single-sensor problems.

The fused image carries the merged information from several images. The digital camera is suitable for daylight conditions; the infrared camera is good in weakly illuminated environments. The method therefore has applications in the military and also in object detection, robotics, and medical imaging.

Multiview IF

In this method, the images show multiple or different views of a scene captured at the same time. The method utilizes images acquired under different conditions, such as visible, infrared, multispectral, and remote sensing imagery. Common methods of image fusion include object-level fusion, weighted pixel fusion, and fusion in the transform domain.
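
Of the methods listed above, weighted pixel fusion is the easiest to sketch. The example below assumes two registered grayscale images and derives a per-pixel weight from a very simple saliency measure (absolute deviation from the image mean). That saliency measure, the function name `weighted_pixel_fusion`, and the sample values are assumptions made for illustration, not a prescribed method.

```python
import numpy as np

def weighted_pixel_fusion(img_a, img_b, eps=1e-8):
    """Blend two registered images with per-pixel weights from a simple saliency map."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    # Assumed saliency: how far each pixel deviates from its image mean.
    sal_a = np.abs(a - a.mean())
    sal_b = np.abs(b - b.mean())
    # Weight of image A in (0, 1); eps keeps the division defined in flat areas.
    w_a = (sal_a + eps) / (sal_a + sal_b + 2 * eps)
    return w_a * a + (1.0 - w_a) * b

# Made-up visible and infrared intensities for a 2x2 scene.
visible = np.array([[10.0, 10.0], [10.0, 200.0]])
infrared = np.array([[50.0, 250.0], [50.0, 50.0]])
fused_weighted = weighted_pixel_fusion(visible, infrared)
```

Because the weights lie between 0 and 1, every fused pixel stays between the two source values, while the more salient source dominates at each location.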

Multi-focus IF

This method processes images of a 3D scene captured with different focal lengths. It splits the original image into regions so that every region is in focus in at least one channel of the image.
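
As a rough illustration of region-wise multi-focus fusion, the sketch below splits two registered images into square blocks and, for each block, keeps the version with the higher variance. Using variance as the focus measure, the block size, and the name `multifocus_block_fusion` are all assumptions; practical systems often use other sharpness measures and smoother region boundaries.

```python
import numpy as np

def multifocus_block_fusion(img_a, img_b, block=8):
    """For each block, keep the source whose pixels vary more (assumed sharper)."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    fused = np.empty_like(a)
    height, width = a.shape
    for y in range(0, height, block):
        for x in range(0, width, block):
            patch_a = a[y:y + block, x:x + block]
            patch_b = b[y:y + block, x:x + block]
            # Variance as a crude focus measure: blurred regions are flatter.
            chosen = patch_a if patch_a.var() >= patch_b.var() else patch_b
            fused[y:y + block, x:x + block] = chosen
    return fused

# Synthetic test scene: a checkerboard that is "in focus" on one half per image.
sharp = np.indices((16, 16)).sum(axis=0) % 2 * 255.0
flat = np.full_like(sharp, sharp.mean())
left_focused = sharp.copy()
left_focused[:, 8:] = flat[:, 8:]    # right half blurred to a constant
right_focused = sharp.copy()
right_focused[:, :8] = flat[:, :8]   # left half blurred to a constant
all_in_focus = multifocus_block_fusion(left_focused, right_focused, block=8)
```

In this synthetic case each block is sharp in exactly one input, so the fused result recovers the full checkerboard.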

How to Implement Image Fusion?

Researchers implement image fusion in multiple ways and here we present the most common methods.

Convolutional Neural Network

Zhang et al. (2021) created a CNN-based fusion framework that extracts features and reconstructs images by using a carefully designed loss function. They utilized the CNN as part of the overall fusion framework to perform activity-level measurement and feature integration.

For medical IF, they combined the loss function with a classification CNN. In addition, they embedded the fusion layer in the training process. As a result, the CNN reduces the constraints caused by manually designed fusion rules (maximum, minimum, or average).

Figure: IF Implementation by CNN

Also, the researchers introduced other approaches:

A CNN-based end-to-end fusion framework that avoids the drawbacks of manual rules.
A CNN that defines the objective function for IF, with better precision and preservation of texture structure.
A model of IF based on gradient preservation, which yields a general loss function for multiple fusion tasks.
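
To give a feel for what a gradient-preserving loss can look like, here is a small NumPy sketch. It is one plausible form and not the formulation used by Zhang et al.: an intensity term pulls the fused image toward the mean of the sources, and a gradient term pulls its gradients toward whichever source gradient is stronger at each pixel. The function names, the weights `w_int` and `w_grad`, and the forward-difference gradient are all assumptions.

```python
import numpy as np

def gradients(img):
    """Forward differences along x and y (last column/row padded to zero)."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def fusion_loss(fused, src_a, src_b, w_int=1.0, w_grad=1.0):
    """Illustrative loss: intensity fidelity plus gradient preservation."""
    f, a, b = (np.asarray(x, dtype=np.float64) for x in (fused, src_a, src_b))
    # Intensity term: stay close to the average of the two sources.
    intensity = np.mean((f - (a + b) / 2.0) ** 2)
    # Gradient term: match the stronger source gradient at every pixel.
    grad_term = 0.0
    for g_f, g_a, g_b in zip(gradients(f), gradients(a), gradients(b)):
        target = np.where(np.abs(g_a) >= np.abs(g_b), g_a, g_b)
        grad_term += np.mean((g_f - target) ** 2)
    return w_int * intensity + w_grad * grad_term

ramp = np.arange(16.0).reshape(4, 4)
loss_identical = fusion_loss(ramp, ramp, ramp)      # sources and output agree
loss_shifted = fusion_loss(ramp + 1.0, ramp, ramp)  # brightness offset only
```

In a CNN-based framework a loss of this kind would be written in a differentiable library and minimized during training; the NumPy version only shows how the two terms are assembled.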

Multiscale Transformation

Ma et al. (2023) performed the fusion process by using multiscale transformation:

They decomposed each source image separately to obtain different frequency levels, i.e. high-frequency and low-frequency sub-bands.
The team designed the fusion strategy around an optimal fusion calculation for each level, exploiting the different characteristics of the high-frequency and low-frequency sub-bands.
To generate the fused image, they applied the inverse transform to the final fusion coefficients.
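
The three steps above can be sketched with a single-level 2D Haar wavelet transform written directly in NumPy. This is a simplified stand-in: it uses a subsampled Haar transform, not the transforms of Ma et al., and the fusion rules (average the low-frequency band, keep the larger-magnitude high-frequency coefficient) are common textbook choices assumed here for illustration. All function names are hypothetical, and the images must have even height and width.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar2d(img):
    """Single-level 2D Haar transform: returns (low band, three detail bands)."""
    x = np.asarray(img, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2   # low-pass along each row
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2   # high-pass along each row
    ll = (lo[0::2] + lo[1::2]) / SQRT2
    lh = (lo[0::2] - lo[1::2]) / SQRT2
    hl = (hi[0::2] + hi[1::2]) / SQRT2
    hh = (hi[0::2] - hi[1::2]) / SQRT2
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    """Inverse of haar2d."""
    lh, hl, hh = details
    h, w = ll.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x

def wavelet_fusion(img_a, img_b):
    """Decompose both images, fuse the sub-bands, then invert the transform."""
    ll_a, det_a = haar2d(img_a)
    ll_b, det_b = haar2d(img_b)
    ll_f = (ll_a + ll_b) / 2.0                       # low frequency: average
    det_f = tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                  for da, db in zip(det_a, det_b))   # high frequency: max-abs
    return ihaar2d(ll_f, det_f)

rng = np.random.default_rng(0)
src_a = rng.random((8, 8))
src_b = rng.random((8, 8))
fused_wavelet = wavelet_fusion(src_a, src_b)
```

Production code would typically rely on a wavelet library and several decomposition levels; the hand-written transform keeps the example self-contained.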

Figure: IF Implementation by Multiscale Transformation

The researchers applied the wavelet transform and a non-subsampled geometric transform at multiple scales and in multiple directions.
Their multiscale transform-based fusion method introduces a fusion strategy according to the characteristics of different sub-bands. Thus, the fused image is rich in detailed information and low in redundancy.
The choice of decomposition method and fusion rules is an important part of the fusion process. Together they determine whether the fused image can contain more information than the original images.

Sparse Representation Model for IF

Compared to the traditional multiscale transform, sparse representation has two main differences. First, the multiscale fusion method uses a preset basis function, which ignores some important features of the source image. Sparse representation instead learns an overcomplete dictionary, which can better express and extract image features.

Second, the multiscale transform-based fusion method decomposes images into multiple layers, so its requirements for noise and registration are quite strict. Sparse representation uses a sliding window technique to segment the image into multiple overlapping patches, which improves robustness.
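
The sliding window step can be illustrated on its own. The sketch below cuts an image into overlapping patches and rebuilds the image by averaging the overlaps; the sparse coding and coefficient fusion that would happen between these two steps are deliberately left out. The patch size, stride, and function names are assumptions, and the stride is assumed to tile the image exactly.

```python
import numpy as np

def extract_patches(img, size=8, stride=4):
    """Slide a size x size window over the image; return flattened patches and positions."""
    x = np.asarray(img, dtype=np.float64)
    height, width = x.shape
    coords = [(r, c)
              for r in range(0, height - size + 1, stride)
              for c in range(0, width - size + 1, stride)]
    patches = np.stack([x[r:r + size, c:c + size].ravel() for r, c in coords])
    return patches, coords

def reconstruct(patches, coords, shape, size=8):
    """Put patches back in place and average wherever they overlap."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for patch, (r, c) in zip(patches, coords):
        acc[r:r + size, c:c + size] += patch.reshape(size, size)
        count[r:r + size, c:c + size] += 1
    return acc / np.maximum(count, 1)  # guard against uncovered pixels

image = np.random.default_rng(1).random((16, 16))
patches, coords = extract_patches(image, size=8, stride=4)
# In a sparse-representation pipeline, each row of `patches` would be coded
# against a learned dictionary and the codes of the source images fused here.
rebuilt = reconstruct(patches, coords, image.shape, size=8)
```

With a 16x16 image, an 8x8 window, and a stride of 4, the window visits nine positions, so most pixels are processed several times; this redundancy is what buys robustness at the cost of extra computation.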

Figure: Sparse Representation Model for IF

The sparse representation method alleviates the problems of insufficient feature information and high registration requirements in the multiscale transformation. However, it still has some drawbacks, mainly in the following two aspects.

The signal representation capability of the overcomplete dictionary is limited, which leads to the loss of image texture details.
The sliding window produces many small overlapping blocks, which lowers the operational efficiency of the algorithm.

Applications of Image Fusion

The four main IF use cases are:

Robotic Vision

Robotic motion utilizes the fusion of infrared and visible images. Robots use infrared images to distinguish the target from the background based on the difference in thermal radiation, so illumination and weather conditions have little effect on them. However, infrared images don’t provide texture detail.

For their computer vision tasks, robots also utilize visible light images. Depending on the data collection environment, however, visible images may not show important targets. Infrared and visible light fusion methods overcome the drawbacks of either single image and extract the information from both.

Figure: Robotic vision, Amazon humanoid robot

The fused images are usually clearer than the infrared images alone. In addition, the fusion of visible and infrared images is used in applications such as autonomous driving and face recognition.

Medical Imagery

Today, medical imaging produces various types of images to help doctors diagnose diseases or injuries. Each type of image has its specific intensity characteristics. Therefore, IF has high clinical value across medical imaging modalities.

Medical imagery researchers combine redundant information and related information from different medical images to create fused medical images. These fused images give doctors higher-quality information for diagnosis during medical examinations.

Figure: Image Fusion in Medical Imagery

The figure shows an example of image fusion for medical diagnostics that combines Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). The data comes from a brain image dataset of computed tomography and magnetic resonance imaging (MedPix dataset).

Doctors use CT to analyze bone structures with high spatial resolution, and MRI to detect soft tissues, such as the heart, eyes, and brain. Image fusion technology combines MRI and CT to increase accuracy and medical applicability.

Defect Detection in Industry

Because of the constraints of industrial production conditions, workpiece defects are difficult to avoid. Typical defects include debris, porosity, and cracks inside the workpiece.

These defects grow during the use of the workpiece and affect its performance. Eventually they cause the workpiece to fail, shortening its service life and threatening the safety of the machine.

Figure: Image Fusion for defect identification in industry

Current defect detection algorithms are generally divided into two groups:

Defect area segmentation, where all potential defect areas are segmented from a single image.
Detection with manually designed features, which manufacturers apply to detect different types of defects. Such features are only applicable to specific defects and cope poorly with varying defect sizes, diverse shapes, and complex background areas.

Agricultural Remote Sensing

Image fusion technology is also widely used in the field of agricultural remote sensing. Using agricultural remote sensing technology, farmers assess how well an environment suits their plants and detect plant diseases.

Existing sensing technologies, including light detection and ranging (LiDAR), synthetic aperture radar (SAR), and medium-resolution imaging spectrometers, all have applications in image fusion.

Figure: Image Fusion in Agricultural Remote Sensing

Researchers utilize a region-based fusion scheme for combining panchromatic, multispectral, and synthetic aperture radar images. In addition, some farmers combine spectral information, radar range data, and optical detection.

Advantages and Drawbacks of IF

Advantages of IF

Benefits of image fusion include:

Image fusion reduces data storage and data transmission requirements.
IF is relatively inexpensive and requires only simple steps to perform.
Teams use image fusion for image identification and registration.
It can produce a high-resolution output from foggy multiscale images.
The resulting fused images are easy to interpret and can be in color.
It increases situational and conditional awareness.
Image fusion makes it possible to read small signs across different images.
Enhancing an image from different perspectives leads to better contrast.

Drawbacks of IF

Image fusion has certain limitations, such as:

Data processing is quite slow when images are fuzzy.
Fusion is sometimes complex and expensive because of the feature extraction and integration steps.
It takes time and effort to define and select the proper features for each use case.
The image fusion process carries a considerable risk of data loss.
In single-sensor fusion, images can be blurry in poor weather conditions.
It is difficult to perform image fusion on photos taken at night.
Good visualization of images requires multi-sensor or multi-view fusion.

Summary

Image fusion is an important technique for the integration and evaluation of data from multiple sources (sensors). It has many applications in computer vision, medical imaging, and remote sensing.

Fusion methods that can handle images with complex nonlinear distortions contribute to the robustness of even the most complex computer vision methods.

Here are some additional resources to read more about computer vision tasks and learn more about the tasks performed in IF.

Object Localization and Image Localization
Grounded-SAM Explained: A New Image Segmentation Paradigm?
Image Registration and Its Applications
Image Data Augmentation for Computer Vision (2024 Guide)
Image Annotation: Best Software Tools and Solutions in 2024
Machine Vision – What You Need to Know (Overview)
