AI Model Artifact Detection
Overview
Mend AI detects various types of AI model artifacts.
To minimize noise and increase accuracy, Mend AI automatically filters out irrelevant file types (e.g., .txt, .tar, .zip) and applies content validation to files with ambiguous extensions such as .pkl and .npy.
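This page does not describe how the content validation works. As a purely illustrative sketch, one common way to validate an ambiguous extension is to check the file's leading magic bytes. The example below tests for the NumPy .npy signature (byte 0x93 followed by the ASCII text NUMPY). The helper name looks_like_npy is hypothetical, and this is not Mend AI's implementation.

```shell
# Illustrative only: not Mend AI's actual validation logic.
# looks_like_npy is a hypothetical helper name.
NPY_MAGIC_HEX="934e554d5059"   # 0x93 followed by ASCII "NUMPY"

looks_like_npy() {
  # Dump the first 6 bytes of the file as hex and strip whitespace.
  magic=$(head -c 6 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "$NPY_MAGIC_HEX" ]
}
```

A file that merely carries the .npy extension but holds other content would fail this check, which is the kind of ambiguity content validation is meant to resolve.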
Mend AI detects only files that are 10MB or larger.
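To check ahead of a scan whether a given file meets the size threshold, a small helper along these lines may be useful. It assumes that 10MB means 10 * 1024 * 1024 bytes; the exact byte value Mend AI uses is not stated here. The name meets_size_threshold is hypothetical.

```shell
# Assumption: 10MB = 10 * 1024 * 1024 bytes (the exact threshold is not documented here).
THRESHOLD_BYTES=$((10 * 1024 * 1024))

# meets_size_threshold is a hypothetical helper name.
meets_size_threshold() {
  # wc -c reports the file size in bytes; tr removes any padding spaces.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -ge "$THRESHOLD_BYTES" ]
}
```

For example, `meets_size_threshold model.safetensors && echo "large enough"` prints the message only when the file is at least the assumed threshold.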
Testing Artifact Detection with Git LFS
To properly test the AI Model Artifact Detection capability, it is highly recommended to use Git LFS (Large File Storage), since model artifacts are typically larger than 10MB.
Clone a Hugging Face repository.
Run git lfs pull.
Verify the files are on your file system.
Prerequisites
Before testing, ensure you have Git LFS installed. If not, install it following the Hugging Face requirements guide.
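A quick way to confirm the prerequisite is to ask Git for the LFS version. The sketch below wraps that check in a function; have_git_lfs is a hypothetical helper name, while git lfs version is the standard Git LFS subcommand.

```shell
# have_git_lfs is a hypothetical helper name.
have_git_lfs() {
  # Succeeds only if git exists and the lfs subcommand responds.
  if command -v git >/dev/null 2>&1 && git lfs version >/dev/null 2>&1; then
    return 0
  fi
  return 1
}

if have_git_lfs; then
  echo "Git LFS is installed"
else
  echo "Git LFS not found; install it before cloning"
fi
```

After installing Git LFS, running git lfs install once sets up the Git hooks and filters that LFS needs.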
Testing Steps
Clone a repository with model artifacts from Hugging Face:
git clone https://huggingface.co/[model-name]
Example repositories you can use for testing:
https://huggingface.co/bert-base-uncased
https://huggingface.co/gpt2
Pull the LFS files:
cd [repository-name]
git lfs pull
Verify the files are present on your filesystem:
ls -lh
Look for files that are 10MB or larger (typically .bin, .safetensors, or .ckpt files). These should now show their actual size rather than being LFS pointer files.
Run the Mend artifact scanner to test the artifact detection capabilities against these actual model files.
Common Issues
Issue: Scanner doesn't detect artifacts
Cause: Git LFS files weren't pulled, so only pointer files exist
Solution: Run git lfs pull to download the actual artifacts
Issue: Files appear to be only a few KB in size
Cause: These are LFS pointer files, not the actual artifacts
Solution: Ensure Git LFS is installed and run git lfs pull
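To tell pointer files apart from real artifacts without relying on file size alone, you can inspect the file's first bytes: a Git LFS pointer is a small text file whose first line is the LFS spec version string. The following is a minimal sketch; is_lfs_pointer is a hypothetical helper name.

```shell
# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_POINTER_PREFIX="version https://git-lfs.github.com/spec/v1"

# is_lfs_pointer is a hypothetical helper name.
is_lfs_pointer() {
  # Read the first 42 bytes (the length of the prefix) and drop any NUL bytes
  # so binary files do not trigger shell warnings.
  head_bytes=$(head -c 42 "$1" 2>/dev/null | tr -d '\0')
  [ "$head_bytes" = "$LFS_POINTER_PREFIX" ]
}

# List regular files in the current directory that are still pointers.
for f in ./*; do
  [ -f "$f" ] || continue
  if is_lfs_pointer "$f"; then
    echo "still a pointer: $f"
  fi
done
```

Git LFS also provides git lfs ls-files, which marks each tracked file with an asterisk when the full object is present and a minus sign when only the pointer exists.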
AI Model Artifact File Types
"bin", "safetensors", "safetensorsc", "ggmlv3", "trt", "tdict", "safetens", "argosmodel", "pt",
"v2", "pth", "onnx", "ckpt", "gguf", "guff", "wv", "model", "weight", "weights",
"caffe", "caffemodel", "nemo", "pdmodel", "neuron", "skops", "pkl", "index", "npz", "npy",
"pb", "pickle", "qweight", "tfrecord", "engine", "pbmm", "scorer", "tflite", "data", "binary",
"llamafile", "dat", "mlmodel", "keras", "ggml", "safetensor", "tfbson", "tensors", "gguff", "mil",
"torch", "safetensorsa", "savetensors", "savetensor", "msgpack", "h5", "h5ad", "pyth", "wandb",
"onnx_data", "pdparams", "cleanrl_model", "qzeros", "safetesors",
"mar", "joblib", "cbm", "mlpackage", "plan", "dlc", "ubj", "pmml"
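As a rough pre-check of whether a file in your test repository carries one of the listed extensions, a shell case statement is enough. The sketch below covers only a small subset of the list for brevity, matches case-sensitively (whether Mend AI ignores case is not stated here), and uses the hypothetical helper name has_model_extension. A matching extension alone does not guarantee detection, since the size threshold and content validation also apply.

```shell
# has_model_extension is a hypothetical helper name.
# Only a subset of the documented extensions is included here.
has_model_extension() {
  name=$(basename "$1")
  case "$name" in
    *.*) ext=${name##*.} ;;   # text after the last dot
    *)   return 1 ;;          # no extension at all
  esac
  case "$ext" in
    bin|safetensors|pt|pth|onnx|ckpt|gguf|h5|pkl|npy|tflite|keras) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_model_extension weights/model.safetensors` succeeds, while `has_model_extension notes.txt` fails.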