Google’s new deepfake hunter sees what you cannot

Investigating deepfake technology. Image by Tim Sandle

AI-generated videos are becoming dangerously convincing. To counter this misinformation, researchers at the University of California, Riverside have teamed up with Google to fight back.

The resulting system, called the Universal Network for Identifying Tampered and synthEtic videos (UNITE), can detect deepfakes even when no face is visible. The technology goes beyond traditional methods for detecting video tampering by scanning backgrounds, motion, and other subtle cues.

As fake content becomes easier to generate and harder to detect, the universal tool could become essential for newsrooms and social media platforms trying to safeguard the truth.

Deepfakes (“deep learning” plus “fake”) are videos, pictures or audio clips made with artificial intelligence to look real. While they can be used for fun, they are increasingly used to impersonate people and deliberately mislead audiences.

Limitations of current technology

Current deepfake detectors share a key limitation: if there is no face in the frame, many simply will not work. That is a concern because disinformation can take many forms; altering a scene’s background can distort the truth just as easily as faking someone’s voice.

How Google’s new technology works

The technology detects forgeries by examining not only faces but entire video frames, including backgrounds and motion patterns. This analysis makes it, the researchers say, the first tool capable of identifying synthetic or doctored videos without relying on facial content alone.

UNITE uses a transformer-based deep learning model to analyse video clips. It detects subtle spatial and temporal inconsistencies — cues often missed by previous systems. The model draws on a foundational AI framework known as Sigmoid Loss for Language Image Pre-Training (SigLIP), which extracts features not bound to a specific person or object. A novel training method, dubbed “attention-diversity loss,” prompts the system to monitor multiple visual regions in each frame, preventing it from focusing solely on faces.
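The attention-diversity idea can be illustrated with a toy sketch. The paper’s actual loss formulation is not reproduced here; the function below is a hypothetical stand-in that penalizes an attention map for collapsing onto a single region (such as a face), using the normalized entropy of the attention weights:

```python
import math

def attention_diversity_loss(attn_weights, eps=1e-9):
    """Toy diversity penalty for one attention map (hypothetical sketch,
    not the UNITE paper's exact loss).

    attn_weights: non-negative weights over spatial regions, assumed to
    sum to 1 (a softmax output). The penalty is 1 minus the normalized
    entropy of that distribution: near 0 when attention is spread evenly
    across regions, approaching 1 when it collapses onto a single region.
    """
    entropy = -sum(w * math.log(w + eps) for w in attn_weights)
    max_entropy = math.log(len(attn_weights))  # entropy of a uniform map
    return 1.0 - entropy / max_entropy

# A map fixated on one region (e.g. a face) is penalized far more
# heavily than one that also attends to the background.
focused = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
```

Added to the main training objective, a term like this nudges the model to keep “looking at” backgrounds and motion regions rather than letting the face dominate every attention head.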

The collaboration with Google gave the researchers access to the expansive datasets and computing resources needed to train the model on a broad range of synthetic content, including videos generated from text prompts or still images, formats that often confound existing detectors.

The result is a universal detector capable of flagging a range of forgeries — from simple facial swaps to complex, fully synthetic videos generated without any real footage.

Why it matters

UNITE’s development comes as text-to-video and image-to-video generation tools have become widely available online. These AI platforms enable virtually anyone to fabricate highly convincing videos, posing serious risks to individuals, institutions, and, arguably, democracy itself.

The researchers presented their findings at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville, U.S. Titled “Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content,” their paper outlines UNITE’s architecture and training methodology. 
