Every case below is drawn from public reporting, court filings, or regulatory enforcement actions. Each one shows a real organization that deployed AI without adequate independent evaluation — and the consequences that followed.
In 2019, a viral social media post revealed that a tech entrepreneur received an Apple Card credit limit 20 times higher than his wife's, despite shared assets and her higher credit score. Apple co-founder Steve Wozniak reported a similar experience. The New York Department of Financial Services (DFS) opened an investigation into Goldman Sachs, the card's issuer. In 2021, DFS cleared Goldman, finding no evidence of intentional discrimination or disparate impact in its analysis of nearly 400,000 New York applicants. It did, however, criticize Goldman for customer service shortcomings and for the opacity of the underwriting algorithm.
Even without a finding of bias, the incident triggered a multi-year regulatory investigation and lasting reputational damage, and it contributed to Goldman's eventual exit from the consumer lending business. The algorithm didn't use gender directly, but it used features correlated with gender (e.g., supplemental card history). The investigation highlighted that regulators' traditional methods may not catch this kind of proxy discrimination.
Shapley attribution would have identified proxy variables before launch — showing that supplemental card history disproportionately affects women even though gender isn't an explicit input. An evaluation report documenting this analysis and any remediation would have been available to DFS immediately, potentially avoiding the investigation entirely.
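As a minimal sketch of that pre-launch analysis, the block below trains a toy model on synthetic applicants and compares mean Shapley attributions across a gender label that the model never sees. The feature names, the correlation structure, and the use of the open-source shap package are illustrative assumptions, not Goldman's actual pipeline.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic applicants. Gender is never an input to the model, but
# supplemental_card_history is correlated with it (the proxy).
gender = pd.Series(rng.choice(["female", "male"], size=n), name="gender")
X = pd.DataFrame({
    "income": rng.normal(80_000, 25_000, n),
    "credit_score": rng.normal(720, 50, n),
    "supplemental_card_history": np.where(gender == "female",
                                          rng.normal(0.7, 0.2, n),
                                          rng.normal(0.2, 0.2, n)),
})
limit = 0.1 * X["income"] + 30 * X["credit_score"] \
        - 8_000 * X["supplemental_card_history"]
model = GradientBoostingRegressor().fit(X, limit)

# Shapley attribution: how much each feature moves each applicant's limit.
attr = pd.DataFrame(shap.Explainer(model, X)(X).values, columns=X.columns)

# Group attributions by the held-out label. A feature whose mean
# contribution differs sharply across groups is doing gendered work.
by_group = attr.groupby(gender).mean().T
by_group["spread"] = (by_group["female"] - by_group["male"]).abs()
print(by_group.sort_values("spread", ascending=False))
```

Grouping attributions by a label the model never sees is the key move: a proxy variable looks innocuous in isolation and only surfaces in the cross-group spread.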
SafeRent paid over $2 million to settle claims that its AI-powered tenant screening scores had a disparate impact on non-white rental applicants. Separately, PERQ's "conversational AI leasing agent" was alleged to issue blanket rejections to applicants using housing choice vouchers, a practice that disproportionately affected African-American renters. The PERQ case settled with an agreement to allow outside review of the application systems and ongoing anti-bias monitoring.
The screening systems relied on data inputs that correlated with race (credit history patterns, prior address characteristics), and no one evaluated the resulting disparate impact. Blanket rules, such as rejecting all voucher holders, produced discriminatory outcomes even without discriminatory intent.
Cohort analysis by race would immediately surface disparate approval rates. Shapley attribution would identify which input variables — credit score components, address history, income verification methods — drive the racial disparities and by how much. This enables targeted remediation rather than removing the tool entirely.
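A sketch of that cohort step, assuming the evaluator holds a decision log with race recorded for audit purposes. The 80% threshold reflects the EEOC's "four-fifths" rule of thumb; the column names and data are illustrative.

```python
import pandas as pd

# Hypothetical decision log from the screening tool: one row per applicant,
# protected attribute captured for auditing, never shown to the model.
decisions = pd.DataFrame({
    "race":     ["white", "white", "black", "black", "hispanic", "white"],
    "approved": [True,    True,    False,   True,    False,      True],
})

approval = decisions.groupby("race")["approved"].mean()

# Adverse-impact ratio: any cohort approved at under 80% of the
# best-treated cohort's rate is flagged for causal follow-up.
impact = approval / approval.max()
print(impact[impact < 0.8])
```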
In 2023, the families of two deceased Medicare Advantage members sued UnitedHealth, alleging that its nH Predict algorithm, developed by subsidiary naviHealth (now part of Optum), overrode physician determinations and denied medically necessary post-acute care. Plaintiffs allege the tool carries a 90% error rate: nine of ten appealed denials were ultimately reversed. A 2024 Senate investigation found that UnitedHealth's denial rate for post-acute care more than doubled after it deployed nH Predict. In March 2026, a federal judge ordered UnitedHealth to produce broad discovery on the algorithm's implementation and use.
An AI tool was allegedly deployed to make coverage determinations that, per the insurance contract, should have been made by clinical staff or physicians. No independent evaluation of whether the tool's predictions aligned with actual clinical outcomes. The tool allegedly prioritized cost containment over medical necessity without transparency to patients or providers.
Delta analysis between the algorithm's recommendations and physician determinations would surface systematic disagreement patterns. Shapley attribution would identify which patient characteristics (diagnosis, age, facility type, cost) drive denial decisions — revealing whether the model optimizes for clinical outcomes or cost reduction. Deployment integrity verification would confirm whether physician review actually occurs after the algorithm's recommendation.
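The delta analysis itself is straightforward once the paired records exist. A sketch with hypothetical columns for the algorithm's recommended days of post-acute care next to the physician's determination for the same case:

```python
import pandas as pd

# Hypothetical paired records: model recommendation vs. physician
# determination for the same post-acute-care admission.
cases = pd.DataFrame({
    "diagnosis": ["stroke", "hip_fracture", "stroke", "sepsis"],
    "algo_days": [10, 7, 9, 12],
    "md_days":   [21, 14, 20, 12],
})
cases["delta"] = cases["algo_days"] - cases["md_days"]

# Random disagreement is expected; systematic disagreement is the flag.
# A model consistently below the physician, concentrated in particular
# diagnoses, looks like cost optimization rather than clinical prediction.
print(cases.groupby("diagnosis")["delta"].agg(["mean", "count"]))
print("share of cases recommending less care than the physician:",
      (cases["delta"] < 0).mean())
```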
Cigna was sued over an algorithm known as PXDX that allegedly enabled its in-house physicians to deny insurance claims in batches of hundreds or thousands at a time. The system matched treatment codes against preset criteria and auto-flagged mismatches for denial, without individual review of each patient's circumstances. Cigna disputed that the process involved AI, but the core allegation is that automated batch processing replaced individualized clinical review.
Automated matching replaced case-by-case physician review. No independent evaluation of whether batch-denial criteria produced equitable outcomes across patient demographics. No transparency to patients about how their claims were processed.
Cohort analysis would reveal whether certain demographics, diagnoses, or treatment types are disproportionately denied. Deployment integrity verification would confirm whether the system as running matches what was described to regulators and plan members.
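That verification can be as simple as hashing the production artifacts against a manifest sealed at evaluation time. A self-contained sketch, where the file names, contents, and directory are all stand-ins:

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def seal(artifact_dir: Path, names: list[str]) -> dict[str, str]:
    """At evaluation time: record a digest for every evaluated artifact."""
    return {name: sha256(artifact_dir / name) for name in names}

def verify(artifact_dir: Path, manifest: dict[str, str]) -> list[str]:
    """In production: list artifacts whose bytes differ from what was evaluated."""
    return [name for name, digest in manifest.items()
            if sha256(artifact_dir / name) != digest]

# Demo with throwaway files standing in for the model and decision rules.
deploy = Path("deploy_demo")
deploy.mkdir(exist_ok=True)
(deploy / "model.bin").write_bytes(b"weights-v1")
(deploy / "rules.json").write_text('{"max_days": 14}')

manifest = seal(deploy, ["model.bin", "rules.json"])

(deploy / "rules.json").write_text('{"max_days": 10}')  # a silent rule change
print("changed since sealing:", verify(deploy, manifest))  # ['rules.json']
```

Any non-empty result means the system now running is not the system that was evaluated, and re-evaluation is required before the change takes effect.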
A plaintiff over age 40 alleges he was rejected from over 100 positions on Workday's AI-powered applicant screening platform due to algorithmic bias based on age, race, and disability. In May 2025, the court certified a collective action under the ADEA — potentially covering millions of applicants rejected since September 2020. The court held that Workday's AI participates in employment decision-making and cannot escape liability simply because discrimination happens through software rather than a human reviewer.
No independent evaluation of whether the AI screening tool produced disparate outcomes across protected categories. The tool made recommendations that employers relied on without understanding the model's internal decision patterns. No causal decomposition of which input variables drove rejection decisions.
Shapley attribution would decompose rejection rates by protected category and identify which specific input variables correlate with age, race, or disability status. If the model uses features that act as proxies for protected characteristics, the evaluation surfaces them before the tool processes millions of applications.
The EEOC's first AI hiring discrimination lawsuit. iTutorGroup's AI recruitment software was programmed to automatically reject applications from women age 55+ and men age 60+. Over 200 qualified applicants were rejected solely because of their age, in violation of the Age Discrimination in Employment Act. The company settled for $365,000 in August 2023.
Age was used directly as a filtering criterion in the AI system. No independent audit verified the system's decision criteria before deployment. The discriminatory filter operated for an extended period before detection.
Behavioral fingerprinting with demographically diverse probes would immediately detect a hard rejection boundary at specific ages. Shapley attribution would show age as the dominant factor in rejection decisions — a finding impossible to miss in any competent evaluation.
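A sketch of that probe sweep, with a stand-in function playing the deployed screener as a black box; the hard-coded cutoffs mirror the reported iTutorGroup rules for illustration only.

```python
import pandas as pd

def screen(applicant: dict) -> bool:
    """Stand-in for the deployed screener, treated as a black box."""
    cutoff = 55 if applicant["gender"] == "female" else 60
    return applicant["age"] < cutoff  # the hidden discriminatory rule

# Probes identical in every respect except age and gender.
probes = pd.DataFrame(
    {"gender": g, "age": a, "accepted": screen({"gender": g, "age": a})}
    for g in ("female", "male") for a in range(40, 71)
)

# A hard boundary shows up as acceptance flipping at a single age,
# and at a different age for each gender.
for g in ("female", "male"):
    cohort = probes[probes["gender"] == g]
    print(g, "first rejected age:", cohort.loc[~cohort["accepted"], "age"].min())
```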
The ACLU challenged Aon's AI hiring assessments and HireVue's video interview platform for discriminating against people with disabilities and certain racial groups. In a separate Colorado complaint, a deaf applicant was denied a promotion after the AI concluded she was not "practicing active listening" during a video interview. CVS separately settled a proposed class action over HireVue's use of facial expression tracking to generate "employability scores."
AI video analysis tools measured behaviors — facial expressions, speech patterns, eye contact — that are inherently different across disability status, cultural background, and neurotype. These behavioral signals were treated as proxies for job performance without evaluating whether the correlation held equally across protected groups.
Cohort analysis across disability status and racial groups would reveal disparate pass rates. Shapley attribution would identify which behavioral features (e.g., "active listening" score, facial expression metrics) drive the disparities — enabling remediation before deployment.
In each of these cases, the AI system was deployed without evaluation by a party independent of the developer and the deployer. Internal testing, where it existed, didn't catch the problem because the organization evaluating the system was the same organization that built it.
When problems surfaced, no one could explain why the system produced the outcomes it did. Aggregate fairness metrics looked acceptable, but proxy variables and cohort-level disparities went undetected because no causal decomposition was performed.
The system in production was never verified to match what was described to regulators, plan members, or job applicants. Model updates, configuration changes, and rule modifications occurred without re-evaluation.
These are the three gaps AI Asset Assurance is designed to close: structural independence (five separate validators), causal decomposition (Shapley attribution), and deployment integrity (sealed verification). One evaluation pipeline addresses all three.
Proactive evaluation costs a fraction of what these organizations paid in settlements, legal fees, regulatory investigations, and reputational damage.
Request evaluation