
Hugging Face Unsafe Models

Overview

Companies rely on Hugging Face, which employs scanners such as Pickle Scan and Protect AI, for vulnerability information. However, many reported vulnerabilities may be unverified or false positives, leading to uncertainty in decision-making. By validating vulnerabilities independently and exposing their statuses in the Mend AppSec Platform, Mend AI provides a more accurate, security-driven assessment of model risk.

Mend AI’s Analysis of Hugging Face Models

Mend AI verifies Hugging Face’s tagging of unsafe models, providing a clear indication of whether a model deemed unsafe by Hugging Face is indeed unsafe or, alternatively, a false positive.

Model Safety Statuses

There are four Hugging Face model safety statuses in the Mend AppSec Platform UI. The latter two statuses are assigned as part of the Mend AI Research team’s analysis of the model in question, and the first two statuses can evolve into one of them.

  • No Findings – No known vulnerabilities detected.

  • Suspected – Suspected Unsafe, i.e., tagged by Hugging Face as unsafe, but not reviewed by the Mend AI Research team.

  • Validated – Unsafe, i.e., reproduced and verified by the Mend AI Research team.

  • False-Positive – Safe, i.e., reported by Hugging Face as unsafe but refuted by the Mend AI Research team.


The Mend AI Model Selection Criteria

The criteria are based on three key parameters:

  1. The presence of a "Suspected Unsafe" tag.

  2. The model popularity, based on download count.

  3. The "Suspected Unsafe" tag referencing files other than training_args.bin (a file used exclusively for fine-tuning and not loaded during the standard model loading process).
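Taken together, the three criteria above amount to a simple filter over model metadata. The sketch below illustrates that filter; the record fields (`suspected_unsafe`, `downloads`, `suspect_files`) and the download threshold are hypothetical assumptions for illustration, not an actual Mend or Hugging Face API:

```python
# Illustrative sketch of the selection criteria. The record fields and the
# popularity threshold are hypothetical, not a real Mend or Hugging Face API.
def select_for_review(models, min_downloads=10_000):
    selected = []
    for m in models:
        # Criterion 3: ignore flags that reference only training_args.bin,
        # a file used for fine-tuning and not loaded during standard loading.
        flagged_files = [f for f in m.get("suspect_files", [])
                         if f != "training_args.bin"]
        if (m.get("suspected_unsafe")                    # Criterion 1
                and m.get("downloads", 0) >= min_downloads  # Criterion 2
                and flagged_files):                      # Criterion 3
            selected.append(m["id"])
    return selected
```

A model passes selection only when all three criteria hold; a model flagged solely on training_args.bin is skipped even if it is popular and tagged as suspected unsafe.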

The Mend AI Investigation Process

The technical investigation consists of the following:

  1. An automated analysis using a Mend.io proprietary scanning tool.

  2. A manual code review by Mend.io researchers to determine whether genuine malicious code exists within the model and whether it poses an actual risk to users.

The investigation process can yield one of two statuses:

  • If vulnerabilities are found in the model, its status changes to Confirmed Unsafe.

  • If the model is deemed safe, its status changes to Known False-Positive.

Example

When the Mend AI Research team detects potentially suspicious code elements such as builtins.getattr(), they thoroughly examine the implementation context. Based on the findings, they assign either a Confirmed Unsafe status if malicious intent is confirmed, or a Known False-Positive status if the code is determined to be benign.

Unsafe Models in the Mend Platform UI

The model statuses are visible in the Risk Factors column of the AI Models table.


Note: A License Risk column is also available in this table, providing compliance-related risk information about the model in question.
