Shadow Hugging Face Models in Code
Overview
Mend AI uses regex-based detection to identify shadow open-source AI models referenced in code. It extracts direct and nearby references to models in widely used Hugging Face APIs, including pipelines, transformers, and diffusers.
This approach prioritizes accuracy, minimizing false positives and reducing overall noise.
Mend AI also detects references in codebases to the 100 most commonly used licenses and gated models.
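To make the idea of regex-based detection concrete, here is a minimal sketch of how direct references might be matched. The patterns and the function name find_direct_model_references are assumptions made for illustration; they are not Mend AI's actual detection rules.

```python
import re

# Hypothetical patterns for illustration only; not Mend AI's actual rules.
# Matches pipeline(..., model='name') where the model is a quoted literal.
PIPELINE_MODEL = re.compile(r"""pipeline\([^)]*?\bmodel\s*=\s*['"]([^'"]+)['"]""")
# Matches SomeClass.from_pretrained('name') with a quoted literal as first argument.
FROM_PRETRAINED = re.compile(r"""\.from_pretrained\(\s*['"]([^'"]+)['"]""")


def find_direct_model_references(source: str) -> list[str]:
    """Return the unique model names passed as string literals."""
    found = []
    for pattern in (PIPELINE_MODEL, FROM_PRETRAINED):
        for match in pattern.finditer(source):
            name = match.group(1)
            if name not in found:
                found.append(name)
    return found


sample = '''
from transformers import pipeline
from diffusers import StableDiffusionPipeline
model = pipeline('text-classification', model='bert-base-uncased')
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
'''
print(find_direct_model_references(sample))
```

The sketch scans source text only and never imports or runs the scanned code, so it can be applied to a repository without the Hugging Face libraries installed.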
Limitations
Model references that are built through string concatenation or complex variable assignments are not currently detected.
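The snippet below illustrates the kind of code this limitation describes. All names are hypothetical, and treating a dictionary lookup as a "complex variable assignment" is an assumption, since the exact boundary is not specified here.

```python
# Illustrative patterns only; all names below are hypothetical.

# 1. String concatenation: the full model ID never appears as a single literal.
org = "CompVis"
model_id = org + "/" + "stable-diffusion-" + "v1-4"

# 2. Complex assignment (assumed example): the model ID is picked at runtime.
MODELS = {"small": "distilbert-base-uncased", "large": "bert-large-uncased"}
size = "small"
selected = MODELS[size]


def load_pipeline(name: str):
    """Load a diffusers pipeline from a name that is only known at runtime."""
    # Imported lazily so this snippet runs without diffusers installed.
    from diffusers import StableDiffusionPipeline
    return StableDiffusionPipeline.from_pretrained(name)
```

In both cases the value passed to the Hugging Face API is assembled at runtime, so a purely textual scan has no single literal to extract.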
Getting it done
Detected Hugging Face models are automatically added to the AI Inventory, where their origin, engine, and additional details are available.

Code examples
Direct API Call with Model Name:
from transformers import pipeline
model = pipeline('text-classification', model='bert-base-uncased')
'bert-base-uncased' will be indicated as the detected model.
Model Name Stored in a Variable:
from transformers import pipeline
model_name = 'bert-base-uncased'
model = pipeline('text-classification', model=model_name)
'bert-base-uncased' will be extracted from the variable assignment and indicated as the detected model.
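One way to picture the extraction of a "nearby" reference is a two-step lookup: collect simple string assignments, then resolve any variable passed as the model argument. This is a sketch under assumed patterns; the names ASSIGNMENT, MODEL_VARIABLE, and resolve_model_variables are hypothetical and do not describe Mend AI's implementation.

```python
import re

# Hypothetical patterns for illustration only; not Mend AI's actual rules.
# name = 'literal'  (a quoted string assigned to a variable on its own line)
ASSIGNMENT = re.compile(r"""^\s*(\w+)\s*=\s*['"]([^'"]+)['"]\s*$""", re.MULTILINE)
# pipeline(..., model=identifier)  (the model argument is a bare variable name)
MODEL_VARIABLE = re.compile(r"""pipeline\([^)]*?\bmodel\s*=\s*([A-Za-z_]\w*)\s*[,)]""")


def resolve_model_variables(source: str) -> list[str]:
    """Map variables used as the model argument back to their string literals."""
    literals = dict(ASSIGNMENT.findall(source))
    resolved = []
    for match in MODEL_VARIABLE.finditer(source):
        variable = match.group(1)
        if variable in literals:
            resolved.append(literals[variable])
    return resolved


sample = '''
from transformers import pipeline
model_name = 'bert-base-uncased'
model = pipeline('text-classification', model=model_name)
'''
print(resolve_model_variables(sample))
```

A variable with no matching literal assignment is skipped, which keeps the sketch conservative and avoids reporting names it cannot resolve.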
Model Name Passed to a Function:
from transformers import pipeline
def load_model(name):
    return pipeline('text-classification', model=name)
model_instance = load_model('bert-base-uncased')
'bert-base-uncased' will be extracted from the function call and indicated as the detected model.
Diffusers:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
"CompVis/stable-diffusion-v1-4" will be extracted and indicated as the detected model.
Variable Passed to a Function with Intermediate Assignment:
from transformers import AutoModel
model_name = "gpt-neo-1.3B"
def initialize_model(name):
    model = AutoModel.from_pretrained(name)
    return model
ai_model = initialize_model(model_name)
'gpt-neo-1.3B' will be extracted from the intermediate variable and indicated as the detected model.
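The two function-based examples (a literal passed to a function, and a variable passed through a function) could be resolved along the following lines. This is a rough sketch that assumes top-level, single-parameter functions; every pattern and name in it is hypothetical and is not taken from Mend AI.

```python
import re

# Hypothetical patterns for illustration only; not Mend AI's actual rules.
# Top-level assignment of a quoted string: name = "literal"
ASSIGNMENT = re.compile(r"""^(\w+)\s*=\s*['"]([^'"]+)['"]\s*$""", re.MULTILINE)
# Top-level single-parameter function followed by its indented body.
FUNCTION = re.compile(r"^def\s+(\w+)\((\w+)\):\n((?:[ \t]+.*\n?)+)", re.MULTILINE)


def resolve_through_functions(source: str) -> list[str]:
    """Resolve model names that reach a loading call via a function parameter."""
    literals = dict(ASSIGNMENT.findall(source))
    models = []
    for func, param, body in FUNCTION.findall(source):
        # Keep only functions that pass their parameter to a loading call.
        uses_param = re.search(
            r"(?:from_pretrained\(\s*|\bmodel\s*=\s*)" + re.escape(param) + r"\b",
            body,
        )
        if not uses_param:
            continue
        # Inspect each call site: func('literal') or func(variable).
        call = re.compile(
            r"(?<!def )\b" + re.escape(func)
            + r"""\(\s*(?:['"]([^'"]+)['"]|(\w+))\s*\)"""
        )
        for literal, variable in call.findall(source):
            value = literal or literals.get(variable)
            if value:
                models.append(value)
    return models


sample = '''
from transformers import AutoModel
model_name = "gpt-neo-1.3B"
def initialize_model(name):
    model = AutoModel.from_pretrained(name)
    return model
ai_model = initialize_model(model_name)
'''
print(resolve_through_functions(sample))
```

The sketch first links the function parameter to the loading call, then follows each call site back to a literal, either directly or through one intermediate variable.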