In early 2024, a finance worker at the engineering firm Arup joined what looked like a routine internal video call. The CFO was on screen. So were several colleagues. Every face was familiar. Every voice sounded right. By the end of the call, the worker had wired roughly $25 million to attacker-controlled accounts. Every participant on that call except the worker was a deepfake, generated in real time, as reported by CNN and the South China Morning Post.
That story used to be a one-off. In 2026, it is a category. A Singapore finance director wired about $499,000 to a fake CFO on a live deepfake video call this spring, and Skift Meetings warned the events industry that the same playbook is now turning up in conferences and board meetings. Real-time face-swap is no longer a research demo. It is a tool that fraud crews are running at scale.
If your team uses Zoom, Google Meet, Microsoft Teams, or any other video tool to approve money, contracts, or access, you need a working answer to one question: how do you know the person on the other side of the call is actually who they claim to be? This guide walks through five visual and behavioral tells, a four-step verification protocol, and the platform-level controls that turn deepfake resistance into a default rather than tribal knowledge.
Why deepfake video calls became a 2026 problem
The shift from "interesting research" to "production fraud tool" happened in three steps, and all three are now visible in the data.
First, the models got fast. In 2022, generating a convincing deepfake of a specific person required hours of compute and a pre-recorded clip. By late 2025, open-source projects could face-swap in real time at 30 frames per second on a single consumer GPU. Anyone with a laptop and a public LinkedIn photo could now stand in for an executive on a deepfake video call.
Second, the criminal economy industrialized. Malwarebytes reported in March 2026 that scam compounds in Southeast Asia are now hiring "AI models" — humans paid to operate face-swap software during live calls — the same way they hire phone bankers. The work has been deskilled. You no longer need a deepfake expert; you need a script.
Third, the workplace became deepfake-shaped. According to Pew Research, 39% of US workers now use generative AI on the job, up from 22% a year earlier, but only 28% of employers have a written AI policy. Most teams approve high-stakes actions over a deepfake video call without a single identity check beyond "I recognize their face." That gap is exactly what fraud crews target.
The economic pressure is also unusually direct. The FBI's Internet Crime Complaint Center has tracked Business Email Compromise losses past $50 billion over the last decade, and BEC is now mutating into Business Video Compromise. The wire transfer that used to require a forged email thread now requires a 12-minute deepfake video call.
The 5 tells of a live deepfake video call
Real-time face-swap is good, but it is not perfect. Each shortcut a model takes to hit 30 fps shows up as a small, specific artifact. If you train your team to look for these five tells during any deepfake video call, you will catch most live attacks before money or access changes hands.
1. Face geometry that breaks at the edges
Real-time face-swap models replace the central face but rarely re-render the head, ears, or hairline correctly. Look at the boundary between the forehead and hairline, the join between jaw and neck, and the inside edge of the ears. Real video shows continuous lighting and shadow. A deepfake video call often shows a faint seam, color mismatch, or flicker when the person turns their head.
A useful test: ask the person to turn their head 90 degrees. Most consumer-grade face-swap pipelines fail above 45 degrees of yaw because the training data is dominated by frontal faces. The face will smear, double, or briefly snap back to a neutral pose.
2. Eye behavior that is too smooth or too still
Human eyes are messy. They saccade between points roughly three times a second, blink every four to six seconds, and their pupils respond visibly to changing light. Real-time deepfakes still struggle with all three. Eye motion in a deepfake video call often looks slightly metronomic, blink rate runs lower than normal, and pupils are slow to respond, or do not respond at all, when the lighting changes.
A useful test: change the light in your own room mid-call (turn on a lamp, raise the blinds). A real person's pupils contract within a second. A deepfake's pupils usually do not respond at all.
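If you want to quantify the blink tell rather than eyeball it, the sketch below counts blinks in a saved clip of the call using the eye-aspect-ratio trick. It assumes MediaPipe Face Mesh and OpenCV are installed; the landmark indices are the commonly used six-point eye set, and the 0.21 threshold and the file name `call_clip.mp4` are illustrative, not calibrated.

```python
import cv2
import mediapipe as mp
import numpy as np

EYE = [33, 160, 158, 133, 153, 144]   # commonly used six-point set for one eye

def eye_aspect_ratio(p: np.ndarray) -> float:
    # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops sharply when the eye closes
    return (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) / (2.0 * np.linalg.norm(p[0] - p[3]))

cap = cv2.VideoCapture("call_clip.mp4")            # a saved clip of the call
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

blinks, closed, frames = 0, False, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames += 1
    result = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        continue
    lm = result.multi_face_landmarks[0].landmark
    ear = eye_aspect_ratio(np.array([(lm[i].x, lm[i].y) for i in EYE]))
    if ear < 0.21:                                 # illustrative closed-eye threshold
        if not closed:
            blinks += 1
        closed = True
    else:
        closed = False

minutes = max(frames, 1) / fps / 60
print(f"{blinks / minutes:.1f} blinks per minute")  # humans land around 10-15
```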
3. Mouth shapes that do not match consonants
Audio-driven lip sync in real-time deepfakes maps phonemes to mouth poses, but it under-models the bilabial consonants p, b, and m, which require the lips to close fully. Watch the mouth, not the eyes, for two or three sentences. If the lips never fully close on those sounds, you are likely watching a deepfake video call.
A useful test: ask the person to repeat a phrase like "buy a big bag of bagels." Real lips snap shut on every "b." Deepfake lips often hover.
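The same landmark approach can score the plosive tell. This sketch tracks the minimum gap between the inner lips across a few seconds of the bagel phrase; landmarks 13 and 14 are the commonly cited inner-lip points, and everything else (file name, frame count, normalization) is illustrative.

```python
import cv2
import mediapipe as mp
import numpy as np

UPPER_LIP, LOWER_LIP = 13, 14   # commonly cited inner upper/lower lip landmarks
FOREHEAD, CHIN = 10, 152        # used only to normalize for face size and distance

def lip_gap(lm) -> float:
    lips = np.hypot(lm[UPPER_LIP].x - lm[LOWER_LIP].x, lm[UPPER_LIP].y - lm[LOWER_LIP].y)
    face = np.hypot(lm[FOREHEAD].x - lm[CHIN].x, lm[FOREHEAD].y - lm[CHIN].y)
    return lips / face          # near zero when the lips are fully closed

mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
cap = cv2.VideoCapture("call_clip.mp4")

min_gap = 1.0
for _ in range(150):            # roughly 5 seconds at 30 fps while the phrase is spoken
    ok, frame = cap.read()
    if not ok:
        break
    result = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_face_landmarks:
        min_gap = min(min_gap, lip_gap(result.multi_face_landmarks[0].landmark))

# Real speech closes the lips on every "b"; a minimum gap that never approaches
# zero during "buy a big bag of bagels" is consistent with synthetic lip sync.
print(f"minimum normalized lip gap: {min_gap:.4f}")
```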
4. Audio with the wrong room
Most deepfake operators clone the voice with one tool and the face with another. The voice often comes out studio-clean — no room reverb, no HVAC hum, no subtle breath noise. If your CEO normally calls in from a noisy open office and is suddenly broadcasting from an anechoic chamber, that is a signal worth pausing on.
A useful test: ask them to clap once, near the camera. Real audio captures the clap with the room's natural decay. Cloned audio often clips it or strips the reverb entirely.
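If you record the clap, you can put a number on the room. A rough sketch, assuming a mono WAV capture and SciPy: find the clap peak, then measure how long the smoothed envelope takes to fall 20 dB. Real rooms typically show tens to hundreds of milliseconds of decay, while cloned audio often truncates it almost instantly. The thresholds and file name are illustrative.

```python
import numpy as np
from scipy.io import wavfile

rate, audio = wavfile.read("call_clip.wav")     # capture of the seconds around the clap
if audio.ndim > 1:
    audio = audio[:, 0]                         # keep one channel if the capture is stereo
envelope = np.abs(audio.astype(np.float64))

peak = int(np.argmax(envelope))                 # the clap itself
floor = envelope[peak] / 10.0                   # -20 dB relative to the peak

window = max(1, rate // 100)                    # 10 ms smoothing window
smooth = np.convolve(envelope[peak:], np.ones(window) / window, mode="valid")
below = np.nonzero(smooth < floor)[0]
decay_ms = below[0] / rate * 1000 if below.size else float("inf")

print(f"clap decayed 20 dB in {decay_ms:.0f} ms")
```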
5. Conversational latency that does not match content
Real-time deepfakes add 200-600 milliseconds of latency on top of the network. That delay shows up as a slightly delayed reaction to interruptions, late laughter at jokes, and a tendency to keep talking past a clear stop signal. If the person on a deepfake video call seems to be listening to you on a one-second tape delay even on a fast connection, treat that as a tell.
A useful test: cough or snap your fingers near the camera. Real attendees flinch in under 200 ms. Deepfake operators often miss the trigger entirely or react half a second late.
The 4-step verification protocol every team should adopt
The five tells above will catch sloppy attackers. They will not catch a well-funded crew with a $20,000 GPU rig and an actor who can stay in character. For high-stakes actions — wire transfers, contract signatures, credential resets, access grants — you need a verification protocol that does not rely on visual judgment alone. Here is the four-step protocol used by mature fraud-resistant finance teams in 2026.
Step 1: Out-of-band callback
After any video call where money or access is approved, call the requester back on a number you already had on file, not a number provided in the meeting or chat. The minute of friction this adds is the single highest-ROI control in any anti-fraud playbook. The Arup case would have been stopped here.
Step 2: Shared-secret challenge
For executives and finance roles, agree on a rolling code phrase that rotates weekly and is never written into any system an attacker could read (so: not Slack, not email, not your CRM). On any unusual or high-stakes call, ask for the phrase. A real CFO will sigh and answer. A deepfake operator will improvise, deflect, or pretend the audio dropped.
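One way to run a rolling phrase without ever writing the phrase itself down: derive it each week from a secret that was exchanged once, in person. A minimal sketch using Python's standard hmac module; the eight-word list is a placeholder for a real wordlist (the EFF short list, for example), and the secret would live in a password manager, not in source code.

```python
import hmac
import hashlib
import time

# Placeholder wordlist; a real deployment would use a proper diceware-style list.
WORDLIST = ["anchor", "bagel", "canyon", "drift", "ember", "falcon", "glacier", "harbor"]

def weekly_phrase(shared_secret: bytes, words: int = 4) -> str:
    week = int(time.time() // (7 * 24 * 3600))   # rotates automatically every week
    digest = hmac.new(shared_secret, week.to_bytes(8, "big"), hashlib.sha256).digest()
    picks = [digest[i] % len(WORDLIST) for i in range(words)]
    return " ".join(WORDLIST[p] for p in picks)

# Both parties run the same code with the same secret and get the same phrase.
print(weekly_phrase(b"exchanged-in-person-never-pasted-anywhere"))
```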
Step 3: Live action that breaks face-swap
Ask the person to do something a real-time face-swap model handles badly. Common requests: pass an open hand slowly in front of the face, turn the head 90 degrees and read aloud what is on the wall behind them, hold up a specific finger count, or write a number on paper and show it to the camera. Most consumer face-swap pipelines visibly fail on at least one of these.
Step 4: Logged confirmation in a separate channel
Before any irreversible action, get written confirmation in a system that requires authenticated login (your finance tool, your ITSM ticket queue, your DocuSign). The point is not to "double-check"; it is to force an attacker who beat the deepfake video call to also beat your single sign-on, which is a different and much harder problem.
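In code, the two-channel rule is just a hard conjunction. The sketch below is hypothetical, with illustrative type and field names; the point is that the release function refuses to run on the strength of the video call alone.

```python
from dataclasses import dataclass

@dataclass
class VideoApproval:
    claimed_identity: str      # who the person on the call said they were
    amount_usd: int

@dataclass
class WrittenConfirmation:
    sso_identity: str          # who actually logged in to confirm
    amount_usd: int

def release_wire(call: VideoApproval, record: WrittenConfirmation | None) -> None:
    if record is None:
        raise PermissionError("no confirmation in an authenticated channel")
    if record.sso_identity != call.claimed_identity or record.amount_usd != call.amount_usd:
        raise PermissionError("confirmation does not match the call")
    # Reaching this line means an attacker beat both the video call and your SSO.
    print(f"wire of ${call.amount_usd:,} released")
```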
For deeper coverage of related identity threats in remote work, see our guide on how to detect AI interview fraud, which extends this protocol to hiring.
How to design a meeting platform for deepfake resistance
Individual vigilance does not scale. The point of a meeting platform in 2026 is to make deepfake resistance the default rather than a heroic act by every employee. Four design principles that should now be table stakes:
Verified attendee badges. Every participant joins through an SSO-authenticated link tied to their corporate identity, and the platform displays a verified badge next to their name on screen. A face is not an identity. A signed identity claim is.
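Concretely, a verified badge is the product of signature verification, not face recognition. A minimal sketch using PyJWT, where the audience, issuer, and key are placeholders for whatever your identity provider actually issues:

```python
import jwt  # pip install PyJWT

def verified_badge(token: str, idp_public_key: str) -> str:
    claims = jwt.decode(
        token,
        idp_public_key,
        algorithms=["RS256"],               # reject unsigned or alg-confused tokens
        audience="meetings.example.com",    # placeholder audience
        issuer="https://idp.example.com",   # placeholder issuer
    )
    # jwt.decode raises on a bad signature, wrong issuer, or expiry, so reaching
    # this line means the claim is cryptographically tied to a corporate identity.
    return f"✓ {claims['name']} ({claims['email']})"
```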
Side-channel liveness signals. During high-stakes calls, the platform can issue a discreet liveness challenge (a brief on-screen pattern the attendee must follow with their eyes or a hand) that is hard for real-time face-swap to replicate. Done well, this costs a real attendee two seconds and is ruinous to deepfake operators.
Recording with provenance. Every recording should carry a cryptographic signature tied to the attendee's verified identity. If the recording later becomes evidence — in court or in HR — provenance prevents an attacker from claiming the deepfake was the real call. Our checklist on AI notetaker security evaluation covers the same provenance question for AI-generated artifacts.
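A minimal sketch of that signature, using the Python cryptography package with an Ed25519 key. In production the key would be held per identity in an HSM rather than generated inline, and the file name is illustrative.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()   # in practice: per-identity, HSM-held

with open("recording.mp4", "rb") as f:
    digest = hashlib.sha256(f.read()).digest()

signature = signing_key.sign(digest)         # stored alongside the recording

# Verification raises cryptography.exceptions.InvalidSignature if even one byte
# of the recording has changed since it was signed.
signing_key.public_key().verify(signature, digest)
print("recording provenance verified")
```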
Contextual AI that knows who is in the room. This is where canvas-grounded meeting AI matters: an assistant that ties decisions, action items, and dollar amounts to which authenticated person said what creates an audit trail no deepfake video call can fake. Coommit was built around this principle — video, interactive canvas, and contextual AI in one platform — so the artifacts of a meeting are always linked to verified participants, not to faces.
For a broader view of how the meeting recording category is shifting under trust pressure, see our AI meeting recording trust crisis analysis. The same forces that are stressing recordings are stressing live calls — only faster.
What to do if you suspect a deepfake mid-call
The hardest moment is the one most teams have not rehearsed. You are five minutes into a call. Something is wrong. The CFO sounds slightly off. The lighting is too clean. They are pushing you to approve a transfer in the next ten minutes. What now?
Do not accuse. Buy time. Three moves that work:
- Trigger Step 3 of the verification protocol. Ask for a hand-over-face or an over-the-shoulder reveal. Phrase it as a connection check ("Your camera looks weird, can you wave your hand in front of it?"). A real person complies in three seconds. A deepfake feed stalls, freezes, or degrades visibly.
- Pause the action. Tell them you need to verify with one other person before approving. If they push back hard against a 30-minute delay, that is itself a tell. Real executives almost always prefer a small delay over a wrong wire.
- Hang up and call back on a known number. If they were real, you have lost a few minutes. If they were not, you have stopped a six- or seven-figure loss.
After the call, log everything: the meeting ID, the phone number, the wallet or account details, any chat artifacts. Send the package to your security team and, if money was involved, file with the FBI's IC3 within 24 hours. BEC complaints filed within 72 hours have meaningfully higher recovery rates.
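A simple structured record keeps the package consistent across incidents; the field names below are illustrative, not a standard schema.

```python
import json
import datetime

incident = {
    "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "meeting_id": "",            # from the calendar invite or platform URL
    "caller_phone": "",          # the number they used, not the one on file
    "payment_details": "",       # wallet address or account/routing numbers
    "chat_artifacts": [],        # screenshots, file links, transcripts
    "reported_to_ic3": False,    # flip after filing at ic3.gov
}
print(json.dumps(incident, indent=2))
```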
If your platform supports it, also check the meeting recording with provenance signatures. Increasingly, what proves the call was a deepfake video call is the absence of a verified-identity signature on the attacker's stream — not anything you saw with your eyes.
For teams formalizing these controls, we cover the broader governance question in AI governance for teams, and the policy framing in shadow AI policy templates.
A practical takeaway
The honest summary of where we are in April 2026: real-time face-swap is good enough to fool a stressed employee on a five-minute deepfake video call, and it will only get better. The defense is not "look harder." The defense is structural — out-of-band verification, shared-secret challenges, identity-bound meeting platforms, and a culture in which a 30-minute delay on a wire transfer is a feature rather than a friction point.
Train the five tells. Adopt the four-step protocol. Push your meeting platform to treat identity as a first-class object. The teams that do this in 2026 will be hard targets that fraud crews skip. The teams that do not will donate to a fraud compound.