What are structured, unstructured, and semi-structured documents?

Structured documents follow a consistent, expected format. Information always appears in the same place on the document, which makes it easy to extract and analyze. Examples include standardized government forms, such as Form I-9, Form W-2, Form W-9, and Form CP-575.

Unstructured documents, on the other hand, do not follow a consistent format. They can vary significantly depending on who created them, which makes it harder to extract and analyze the information they contain. Examples include contracts, emails, letters of recommendation, and shareholder agreements.

Finally, semi-structured documents fall somewhere in between. They often contain the same information from example to example, but vary in layout or composition. Examples include invoices, receipts, utility bills, pay stubs, bank statements, educational transcripts, and medical records.

Why is verifying unstructured and semi-structured documents difficult?

Unstructured and semi-structured documents, due to their variability, can pose a challenge for data extraction, validation, and verification. Some document fraud solutions can only handle a very limited number of document formats, or only work with clearly and consistently structured documents.

How to Detect the New Wave of Document Fraud

Supplemental document checks are often required for businesses that conduct Know Your Customer (KYC) or Know Your Business (KYB) checks. Even when compliance isn’t required, organizations often collect supplemental documents for their own business purposes, such as risk assessments.

In business contexts, a supplemental document is a non-government-issued document that you collect to support a risk assessment. You might ask for these from a potential customer, current user, employee, business partner, or other party. They’re valuable because they provide you with additional information about the individual or business.

Failing to spot a fraudulent document can expose your business to both compliance and fraud risk. And the threat is growing, thanks in large part to the widespread availability of AI tools that allow non-technical fraudsters to generate realistic-looking documents.

Below, we review the different types of document fraud businesses should watch out for and highlight the key fraud capabilities you should look for when evaluating document verification solutions.

Types of document fraud

While document fraud is often portrayed as a monolith, the truth is that it can take many forms. These include:

Document alteration: A fraudster tampers with a legitimate document by changing a name, address, or other information. Alterations can include physical tampering and digital alterations with editing software or AI.
Document forgery: A fraudster creates a fake document. Traditionally, this required a deep understanding of document templates, formats, security features, and artistic techniques. In recent years, fraudsters have increasingly leveraged AI to generate realistic artifacts, like documents and selfies.
Document theft: Fraudsters use stolen documents to engage in identity theft. Fraudsters may also pair stolen documents with document alteration and tampering to create a synthetic identity that consists of partially legitimate information.

It’s important to note that while these different forms of document fraud aren’t new, they’ve become a larger problem in recent years due to a number of factors:

Visually inspecting documents is difficult to do well at scale. Thorough manual review can decrease conversions, increase friction during onboarding, and create a heavier workload for reviewers.
Data breaches have exposed massive troves of sensitive data and PII, which fraudsters use to steal identities and generate synthetic IDs.
Fraudsters have widely adopted AI tools to quickly generate realistic forgeries or alter legitimate documents.
Businesses have increasingly moved online, enabling digital onboarding and other processes that make it easier for fraudsters to scale fraud.
Fraudsters are increasingly coordinating to create and share tools, tactics, and cash-out partners.

Supplemental documents are prime targets for fraudsters because they often lack security features and rarely follow a standardized layout. Even the same type of document might contain different information depending on the issuer and region. As a result, organizations often struggle to distinguish between authentic and fraudulent documents.

Characteristics of an ideal document fraud solution

One way to ease the burden on your team is to deploy an automated document fraud solution. But if you want to address the varying use cases and types of document fraud, you’ll want a solution that offers:

Document-level detection
Broader checks
Customized experiences

Here’s a closer look at each one.

Document-level detection

Because document fraud is diverse and the techniques regularly change, there’s no silver bullet or set of indicators that detect every type of document fraud. In fact, overrelying on a single risk signal is a surefire way to inadvertently allow fraudulent documents through your vetting process.

With this in mind, it’s important to choose a document fraud solution capable of analyzing multiple signals associated with the submitted document. Some important document-level fraud capabilities include:

Pixel-level analysis

While it can be difficult to distinguish fraudulent and authentic documents simply by looking at them, visual signals can help you detect certain instances of fraud. For one, there may be obvious annotations or other indicators of tampering that can lead you to reject a submission right off the bat. The most effective document fraud detection systems will also analyze individual pixels to spot visual artifacts, font inconsistencies, misalignment, and other tells that might be invisible to the human eye.

Metadata analysis

The document's metadata can indicate how it was created, saved, or modified. For example, document metadata checks can detect that a document was modified by a PDF editor, which may raise a red flag. Finding one instance of risky metadata may not be enough to reject a submission, but the more data you have, the more robust your understanding and the more likely you are to be confident in your determination.
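As a minimal sketch of a metadata check, assume the PDF's information fields have already been extracted into a plain dictionary by whatever parser you use. Producer, Creator, CreationDate, and ModDate are standard PDF information keys. The function name metadata_flags and the list of editing-tool hints are illustrative assumptions, not a vetted blocklist, and a flag here is one signal to weigh rather than a verdict.

```python
from typing import Dict, List

# Illustrative substrings only (assumption); a real list would need curation.
EDITING_TOOL_HINTS = ("pdf editor", "photoshop", "gimp")


def metadata_flags(info: Dict[str, str]) -> List[str]:
    """Return human-readable risk flags for a PDF information dictionary."""
    flags: List[str] = []

    # Look for editing software named in the Producer or Creator fields.
    tools = f"{info.get('Producer', '')} {info.get('Creator', '')}".lower()
    for hint in EDITING_TOOL_HINTS:
        if hint in tools:
            flags.append(f"produced or edited with: {hint}")

    # A modification date that differs from the creation date means the
    # file was saved again after it was first created.
    created = info.get("CreationDate")
    modified = info.get("ModDate")
    if created and modified and created != modified:
        flags.append("modified after creation")
    if not created:
        flags.append("missing creation date")
    return flags


example = {
    "Producer": "Adobe Photoshop",
    "CreationDate": "D:20240101",
    "ModDate": "D:20240305",
}
print(metadata_flags(example))
```

Returning a list of flags, rather than a single pass or fail value, keeps each finding available to combine with the other signals.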

Contextual analysis

Context is important because it can tell you what types of information the document usually contains and which fields are most likely to be altered. Many solutions look for generic signs of tampering or forgery, but a solution that considers the document type will be better at detecting mismatched names, incorrect account numbers, and math errors that indicate fraud.
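To illustrate a type-aware check, here is a rough sketch that looks for math errors on an invoice. It assumes the fields have already been extracted into a dictionary; the field names (line_items, quantity, unit_price, subtotal, tax, total) and the function name invoice_math_errors are hypothetical, and real invoices will need more forgiving rules for discounts and rounding.

```python
from decimal import Decimal
from typing import Dict, List


def invoice_math_errors(invoice: Dict) -> List[str]:
    """Check that line items add up to the subtotal and the total."""
    errors: List[str] = []

    # Decimal avoids the rounding surprises of binary floats for currency.
    line_sum = sum(
        (Decimal(item["quantity"]) * Decimal(item["unit_price"])
         for item in invoice["line_items"]),
        Decimal("0"),
    )
    subtotal = Decimal(invoice["subtotal"])
    if line_sum != subtotal:
        errors.append(f"line items sum to {line_sum}, subtotal says {subtotal}")

    expected_total = subtotal + Decimal(invoice["tax"])
    total = Decimal(invoice["total"])
    if expected_total != total:
        errors.append(f"subtotal plus tax is {expected_total}, total says {total}")
    return errors


invoice = {
    "line_items": [
        {"quantity": "2", "unit_price": "19.99"},
        {"quantity": "1", "unit_price": "5.00"},
    ],
    "subtotal": "44.98",
    "tax": "3.60",
    "total": "58.58",  # altered: subtotal plus tax is 48.58
}
print(invoice_math_errors(invoice))
```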

Streamlined review process

Even the best solution can’t automate every decision, but it should make manual reviews easier for your team. For example, the tool can help by highlighting potentially suspicious parts of the documents, sharing insights from non-visual signals, and showing you legitimate documents as a comparison point. Ideally, it can also show you relevant information about how the document was submitted.

Broader checks

In addition to considering the document itself, a best-in-class solution will consider signals related to the document submission and your environment. Examples of these broader checks include:

Consistency checks

Inconsistencies in the data a user submits are a common sign of fraud. These inconsistencies can have a number of causes, including a fraudster:

Submitting different details because the data is fabricated
Making mistakes when creating the document
Using documents that were stolen from multiple individuals
Relying on an AI tool that hallucinates text when generating documents

For this reason, it’s important to choose a document verification solution that includes built-in cross-submission checks for inconsistencies. Similarly, a solution that can verify the data contained within a document by cross-checking it against one or more authoritative or issuing data sources can decrease the risk of fraudulent documents making it through your defenses.
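A cross-submission check can be sketched in a few lines. This example assumes each source (an application form, a pay stub, a bank statement) has already been reduced to a dictionary of field names and values. The source names, field names, and the find_inconsistencies function are all hypothetical, and the normalization is deliberately simple.

```python
from typing import Dict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(value.lower().split())


def find_inconsistencies(
    submissions: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Map each conflicting field to the value seen in each source."""
    seen: Dict[str, Dict[str, str]] = {}
    for source, fields in submissions.items():
        for field, value in fields.items():
            seen.setdefault(field, {})[source] = normalize(value)

    # Keep only fields where the sources disagree.
    return {
        field: by_source
        for field, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


submissions = {
    "application": {"name": "Jane Doe", "address": "12 Elm St"},
    "pay_stub": {"name": "jane  doe", "employer": "Acme"},
    "bank_statement": {"name": "Jane Roe", "address": "12 Elm St"},
}
print(find_inconsistencies(submissions))
```

Only the name field is reported here, because the address matches wherever it appears and the employer appears in just one source.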

Environmental signals

Solutions that can collect and analyze passive signals during the document submission process offer an additional layer of context to help you make a final determination. For example, you might be able to better assess risk based on the user’s:

IP address
VPN, proxy, or Tor usage
Location data
Browser or device fingerprint
Flow completion time
Number of distraction events

A user who is hiding their IP address and is located far from the address on their document might be slightly more suspicious. So might a user who’s copying and pasting their form inputs instead of typing them in.
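One way to picture how these passive signals might combine is a simple additive score. The sketch below is an assumption-heavy illustration: the SubmissionSignals fields, the weights, and the 500 km threshold are all made up for the example, and the coordinates are assumed to come from IP geolocation and from geocoding the address on the document.

```python
import math
from dataclasses import dataclass


@dataclass
class SubmissionSignals:
    uses_anonymizer: bool  # VPN, proxy, or Tor detected
    ip_lat: float          # approximate location from the IP address
    ip_lon: float
    doc_lat: float         # geocoded address printed on the document
    doc_lon: float
    pasted_fields: int     # form inputs that were pasted instead of typed


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def environment_score(s: SubmissionSignals) -> int:
    """Add up illustrative weights; a higher score means more suspicion."""
    score = 0
    if s.uses_anonymizer:
        score += 2
    if haversine_km(s.ip_lat, s.ip_lon, s.doc_lat, s.doc_lon) > 500:
        score += 1
    if s.pasted_fields > 0:
        score += 1
    return score


# IP geolocated near New York, document address in Los Angeles.
risky = SubmissionSignals(True, 40.7128, -74.0060, 34.0522, -118.2437, 3)
print(environment_score(risky))
```

No single signal decides the outcome in this sketch. Each one only nudges the score, which matches the idea that these signals add context rather than proof.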

Population-level analysis

When fraudsters manage to break through a company’s defenses, they rarely stop at just one fraudulent account or transaction. Instead, they tend to reuse assets or parts of assets to open additional accounts, and they often share or sell information to other fraudsters. Population-level analysis can help you identify patterns among users and documents.

For example, a user on a device previously associated with fraudulent activity should definitely raise flags. With real-time link analysis, it’s possible to uncover users who share names, dates of birth, contact information, and document numbers. Some tools can even analyze the images to spot templates or similarities that could indicate a large fraud ring.
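Link analysis can be approximated by grouping accounts that share any attribute value. The sketch below uses a small union-find structure to do that. The attribute names (device_id, phone) and the linked_clusters function are hypothetical, and a production system would work against a graph or database rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def linked_clusters(
    accounts: Dict[str, Dict[str, str]], keys: List[str]
) -> List[Set[str]]:
    """Group accounts that share a value for any of the given keys."""
    parent = {account_id: account_id for account_id in accounts}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen: Dict[Tuple[str, str], str] = {}
    for account_id, attrs in accounts.items():
        for key in keys:
            value = attrs.get(key)
            if not value:
                continue
            # Merge this account with the first account that used the value.
            owner = first_seen.setdefault((key, value), account_id)
            parent[find(account_id)] = find(owner)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for account_id in accounts:
        groups[find(account_id)].add(account_id)
    return [members for members in groups.values() if len(members) > 1]


accounts = {
    "a1": {"device_id": "d1", "phone": "p1"},
    "a2": {"device_id": "d1", "phone": "p2"},
    "a3": {"device_id": "d3", "phone": "p2"},
    "a4": {"device_id": "d4", "phone": "p4"},
}
print(linked_clusters(accounts, ["device_id", "phone"]))
```

Here a1 and a3 share nothing directly, but both are linked to a2, so all three land in one cluster. That transitive linking is what helps surface rings rather than just pairs.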

Customized experiences

Finally, your solution should make it easy to design and deploy a document verification process that’s tailored to your business and works across your user life cycle. Some capabilities that can help include:

Progressive risk segmentation

Progressive risk segmentation is a risk mitigation technique that automatically subjects customers to different verifications, screenings, and checks based on real-time risk signals. For example, you can automatically require step-up verifications, such as asking for additional documents or screening against an authoritative database, when you suspect document fraud. But you can skip these extra checks to save time and money when the user and documents pose a low risk of fraud.
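In its simplest form, progressive risk segmentation is a mapping from a risk score to the checks a user must complete. The thresholds, check names, and the required_checks function below are hypothetical placeholders that you would tune to your own risk appetite.

```python
from typing import List


def required_checks(risk_score: int) -> List[str]:
    """Return the checks to run, stepping up as the risk score rises."""
    checks = ["document_verification"]  # everyone gets the baseline check
    if risk_score >= 2:
        checks.append("additional_document")
    if risk_score >= 4:
        checks.append("authoritative_database_check")
    return checks


for score in (0, 2, 5):
    print(score, required_checks(score))
```

Low-risk users complete only the baseline check, while higher scores add friction one step at a time.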

Automated workflows

Look for a solution that automates follow-up actions based on real-time fraud insights. Examples might include automatically rejecting or blocking a customer if certain fraud signals are detected; flagging a case for manual review and final decisioning; or kicking off actions in external systems (like flagging a customer profile in your CRM).
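A workflow like this can be thought of as an ordered list of rules, where the first rule that matches decides the action. The sketch below is only an illustration: the signal names, the actions, and the decide function are assumptions, and a real workflow engine would also trigger the external side effects.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, bool]
Rule = Tuple[Callable[[Signals], bool], str]

# Rules are evaluated top to bottom, so the strictest ones come first.
RULES: List[Rule] = [
    (lambda s: s.get("known_fraud_device", False), "reject"),
    (
        lambda s: s.get("metadata_edited", False) or s.get("data_mismatch", False),
        "manual_review",
    ),
]


def decide(signals: Signals) -> str:
    """Return the action of the first matching rule, or approve by default."""
    for matches, action in RULES:
        if matches(signals):
            return action
    return "approve"


print(decide({"known_fraud_device": True, "metadata_edited": True}))
print(decide({"data_mismatch": True}))
print(decide({}))
```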

How Persona detects sophisticated fraud and automates document verification

Identifying fake and tampered documents often requires a layered approach that considers the document, submission process, and broader environment. Combining these signals can help you detect sophisticated instances of fraud, but only if you have the right systems in place.

Regardless of the industry you operate within, Persona’s document verification solution can support you with:

Document-level capabilities: Multi-signal analysis, contextual analysis, and AI detection to help you determine if a document is legitimate and contains the information you expect.
Broader signals: The ability to cross-check data between documents and against issuing and authoritative databases for verification purposes. Environmental signals to paint a broader picture of a customer’s risk. Population-level analysis to uncover suspicious links between accounts on your platform.
Customized experiences: Automation workflows that free up your team to perform higher-value tasks like resolving edge cases, and progressive risk segmentation to scale friction up or down in real time based on the risk signals detected.

Persona’s Document AI verifications also work alongside a broader suite of identity tools, including ID verification, selfie verification, ancillary checks (sanctions list, watchlist, PEP, etc.), and link analysis. The result is a robust identity verification platform that offers a holistic approach to reducing fraud while verifying documents and identities for compliance and risk mitigation.

Ready to learn more about how Persona can help you prevent document fraud and other identity challenges? Reach out to a member of our team today with any questions, or sign up for a free demo.
