Every case below is drawn from public reporting, court filings, or regulatory enforcement actions. Each one shows a real organization that deployed AI without adequate independent evaluation — and the consequences that followed.
In 2019, a viral social media post revealed that a tech entrepreneur received an Apple Card credit limit 20 times higher than his wife's, despite shared assets and her higher credit score. Apple co-founder Steve Wozniak reported a similar experience. The New York Department of Financial Services (DFS) opened an investigation into Goldman Sachs, the card's issuer. In 2021, DFS cleared Goldman, finding no evidence of intentional discrimination or disparate impact in its analysis of nearly 400,000 New York applicants. It did, however, criticize Goldman for customer service shortcomings and for the opacity of the underwriting algorithm.
Even without a finding of bias, the incident triggered a multi-year regulatory investigation and lasting reputational damage, and it contributed to Goldman's eventual exit from the consumer lending business. The algorithm didn't use gender directly, but it used features correlated with gender (e.g., supplemental card history). The investigation highlighted that regulators' traditional methods may not catch this kind of proxy discrimination.
Shapley attribution would have identified proxy variables before launch — showing that supplemental card history disproportionately affects women even though gender isn't an explicit input. An evaluation report documenting this analysis and any remediation would have been available to DFS immediately, potentially avoiding the investigation entirely.
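As a minimal sketch of that pre-launch analysis, the block below trains a toy model on synthetic applicants and compares mean Shapley attributions across a gender label that the model never sees. The feature names, the correlation structure, and the use of the open-source shap package are illustrative assumptions, not Goldman's actual pipeline.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic applicants. Gender is never an input to the model, but
# supplemental_card_history is correlated with it (the proxy).
gender = pd.Series(rng.choice(["female", "male"], size=n), name="gender")
X = pd.DataFrame({
    "income": rng.normal(80_000, 25_000, n),
    "credit_score": rng.normal(720, 50, n),
    "supplemental_card_history": np.where(gender == "female",
                                          rng.normal(0.7, 0.2, n),
                                          rng.normal(0.2, 0.2, n)),
})
limit = 0.1 * X["income"] + 30 * X["credit_score"] \
        - 8_000 * X["supplemental_card_history"]
model = GradientBoostingRegressor().fit(X, limit)

# Shapley attribution: how much each feature moves each applicant's limit.
attr = pd.DataFrame(shap.Explainer(model, X)(X).values, columns=X.columns)

# Group attributions by the held-out label. A feature whose mean
# contribution differs sharply across groups is doing gendered work.
by_group = attr.groupby(gender).mean().T
by_group["spread"] = (by_group["female"] - by_group["male"]).abs()
print(by_group.sort_values("spread", ascending=False))
```

Grouping attributions by a label the model never sees is the key move: a proxy variable looks innocuous in isolation and only surfaces in the cross-group spread.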
SafeRent paid over $2 million to settle claims that its AI-powered tenant screening scores had a disparate impact on non-white rental applicants. Separately, PERQ's "conversational AI leasing agent" was alleged to issue blanket rejections to applicants using housing choice vouchers, a practice that disproportionately affected African-American renters. The PERQ case settled with an agreement to allow outside review of the application systems and ongoing anti-bias monitoring.
The screening systems relied on data inputs that correlated with race (credit history patterns, prior address characteristics), and no one evaluated the resulting disparate impact. Blanket rules, such as rejecting all voucher holders, produced discriminatory outcomes even without discriminatory intent.
Cohort analysis by race would immediately surface disparate approval rates. Shapley attribution would identify which input variables — credit score components, address history, income verification methods — drive the racial disparities and by how much. This enables targeted remediation rather than removing the tool entirely.
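A sketch of that cohort step, assuming the evaluator holds a decision log with race recorded for audit purposes. The 80% threshold reflects the EEOC's "four-fifths" rule of thumb; the column names and data are illustrative.

```python
import pandas as pd

# Hypothetical decision log from the screening tool: one row per applicant,
# protected attribute captured for auditing, never shown to the model.
decisions = pd.DataFrame({
    "race":     ["white", "white", "black", "black", "hispanic", "white"],
    "approved": [True,    True,    False,   True,    False,      True],
})

approval = decisions.groupby("race")["approved"].mean()

# Adverse-impact ratio: any cohort approved at under 80% of the
# best-treated cohort's rate is flagged for causal follow-up.
impact = approval / approval.max()
print(impact[impact < 0.8])
```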
In 2023, the families of two deceased Medicare Advantage members sued UnitedHealth, alleging that its nH Predict algorithm, developed by subsidiary naviHealth (now part of Optum), overrode physician determinations and denied medically necessary post-acute care. Plaintiffs allege the tool carries a 90% error rate: nine of ten appealed denials were ultimately reversed. A 2024 Senate investigation found that UnitedHealth's denial rate for post-acute care more than doubled after it deployed nH Predict. In March 2026, a federal judge ordered UnitedHealth to produce broad discovery on the algorithm's implementation and use.
An AI tool was allegedly deployed to make coverage determinations that, per the insurance contract, should have been made by clinical staff or physicians. No independent evaluation of whether the tool's predictions aligned with actual clinical outcomes. The tool allegedly prioritized cost containment over medical necessity without transparency to patients or providers.
Delta analysis between the algorithm's recommendations and physician determinations would surface systematic disagreement patterns. Shapley attribution would identify which patient characteristics (diagnosis, age, facility type, cost) drive denial decisions — revealing whether the model optimizes for clinical outcomes or cost reduction. Deployment integrity verification would confirm whether physician review actually occurs after the algorithm's recommendation.
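The delta analysis itself is straightforward once the paired records exist. A sketch with hypothetical columns for the algorithm's recommended days of post-acute care next to the physician's determination for the same case:

```python
import pandas as pd

# Hypothetical paired records: model recommendation vs. physician
# determination for the same post-acute-care admission.
cases = pd.DataFrame({
    "diagnosis": ["stroke", "hip_fracture", "stroke", "sepsis"],
    "algo_days": [10, 7, 9, 12],
    "md_days":   [21, 14, 20, 12],
})
cases["delta"] = cases["algo_days"] - cases["md_days"]

# Random disagreement is expected; systematic disagreement is the flag.
# A model consistently below the physician, concentrated in particular
# diagnoses, looks like cost optimization rather than clinical prediction.
print(cases.groupby("diagnosis")["delta"].agg(["mean", "count"]))
print("share of cases recommending less care than the physician:",
      (cases["delta"] < 0).mean())
```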
Cigna was sued over an algorithm known as PXDX that allegedly enabled its in-house physicians to deny insurance claims in batches of hundreds or thousands at a time. The system matched treatment codes against preset criteria and auto-flagged mismatches for denial, without individual review of each patient's circumstances. Cigna disputed that the process involved AI, but the core allegation is that automated batch processing replaced individualized clinical review.
Automated matching replaced case-by-case physician review. No independent evaluation of whether batch-denial criteria produced equitable outcomes across patient demographics. No transparency to patients about how their claims were processed.
Cohort analysis would reveal whether certain demographics, diagnoses, or treatment types are disproportionately denied. Deployment integrity verification would confirm whether the system as running matches what was described to regulators and plan members.
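That verification can be as simple as hashing the production artifacts against a manifest sealed at evaluation time. A self-contained sketch, where the file names, contents, and directory are all stand-ins:

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def seal(artifact_dir: Path, names: list[str]) -> dict[str, str]:
    """At evaluation time: record a digest for every evaluated artifact."""
    return {name: sha256(artifact_dir / name) for name in names}

def verify(artifact_dir: Path, manifest: dict[str, str]) -> list[str]:
    """In production: list artifacts whose bytes differ from what was evaluated."""
    return [name for name, digest in manifest.items()
            if sha256(artifact_dir / name) != digest]

# Demo with throwaway files standing in for the model and decision rules.
deploy = Path("deploy_demo")
deploy.mkdir(exist_ok=True)
(deploy / "model.bin").write_bytes(b"weights-v1")
(deploy / "rules.json").write_text('{"max_days": 14}')

manifest = seal(deploy, ["model.bin", "rules.json"])

(deploy / "rules.json").write_text('{"max_days": 10}')  # a silent rule change
print("changed since sealing:", verify(deploy, manifest))  # ['rules.json']
```

Any non-empty result means the system now running is not the system that was evaluated, and re-evaluation is required before the change takes effect.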
A plaintiff over age 40 alleges he was rejected from over 100 positions on Workday's AI-powered applicant screening platform due to algorithmic bias based on age, race, and disability. In May 2025, the court certified a collective action under the ADEA — potentially covering millions of applicants rejected since September 2020. The court held that Workday's AI participates in employment decision-making and cannot escape liability simply because discrimination happens through software rather than a human reviewer.
No independent evaluation of whether the AI screening tool produced disparate outcomes across protected categories. The tool made recommendations that employers relied on without understanding the model's internal decision patterns. No causal decomposition of which input variables drove rejection decisions.
Shapley attribution would decompose rejection rates by protected category and identify which specific input variables correlate with age, race, or disability status. If the model uses features that act as proxies for protected characteristics, the evaluation surfaces them before the tool processes millions of applications.
The EEOC's first AI hiring discrimination lawsuit. iTutorGroup's AI recruitment software was programmed to automatically reject applications from women age 55+ and men age 60+. Over 200 qualified applicants were rejected solely because of their age, in violation of the Age Discrimination in Employment Act. The company settled for $365,000 in August 2023.
Age was used directly as a filtering criterion in the AI system. No independent audit verified the system's decision criteria before deployment. The discriminatory filter operated for an extended period before detection.
Behavioral fingerprinting with demographically diverse probes would immediately detect a hard rejection boundary at specific ages. Shapley attribution would show age as the dominant factor in rejection decisions — a finding impossible to miss in any competent evaluation.
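A sketch of that probe sweep, with a stand-in function playing the deployed screener as a black box; the hard-coded cutoffs mirror the reported iTutorGroup rules for illustration only.

```python
import pandas as pd

def screen(applicant: dict) -> bool:
    """Stand-in for the deployed screener, treated as a black box."""
    cutoff = 55 if applicant["gender"] == "female" else 60
    return applicant["age"] < cutoff  # the hidden discriminatory rule

# Probes identical in every respect except age and gender.
probes = pd.DataFrame(
    {"gender": g, "age": a, "accepted": screen({"gender": g, "age": a})}
    for g in ("female", "male") for a in range(40, 71)
)

# A hard boundary shows up as acceptance flipping at a single age,
# and at a different age for each gender.
for g in ("female", "male"):
    cohort = probes[probes["gender"] == g]
    print(g, "first rejected age:", cohort.loc[~cohort["accepted"], "age"].min())
```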
The ACLU challenged Aon's AI hiring assessments and HireVue's video interview platform for discriminating against people with disabilities and certain racial groups. In a separate Colorado complaint, a deaf applicant was denied a promotion after the AI concluded she was not "practicing active listening" during a video interview. CVS separately settled a proposed class action over HireVue's use of facial expression tracking to generate "employability scores."
AI video analysis tools measured behaviors — facial expressions, speech patterns, eye contact — that are inherently different across disability status, cultural background, and neurotype. These behavioral signals were treated as proxies for job performance without evaluating whether the correlation held equally across protected groups.
Cohort analysis across disability status and racial groups would reveal disparate pass rates. Shapley attribution would identify which behavioral features (e.g., "active listening" score, facial expression metrics) drive the disparities — enabling remediation before deployment.
In each of these cases, the AI system was deployed without evaluation by a party independent of the developer and the deployer. Internal testing, where it existed, didn't catch the problem because the organization evaluating the system was the same organization that built it.
When problems surfaced, no one could explain why the system produced the outcomes it did. Aggregate fairness metrics looked acceptable, but proxy variables and cohort-level disparities went undetected because no causal decomposition was performed.
The system in production was never verified to match what was described to regulators, plan members, or job applicants. Model updates, configuration changes, and rule modifications occurred without re-evaluation.
These are the three gaps AI Asset Assurance is designed to close: structural independence (five separate validators), causal decomposition (Shapley attribution), and deployment integrity (sealed verification). One evaluation pipeline addresses all three.
Proactive evaluation costs a fraction of what these organizations paid in settlements, legal fees, regulatory investigations, and reputational damage.
Request evaluation