This model is now available for MLX, in various sizes.
I ran https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct... with https://github.com/Blaizzy/mlx-vlm, using uv so there was no need to install any libraries first, like this:
uv run --with 'numpy<2' --with mlx-vlm \
python -m mlx_vlm.generate \
--model mlx-community/Qwen2.5-VL-32B-Instruct-4bit \
--max-tokens 1000 \
--temperature 0.0 \
--prompt "Describe this image." \
--image Mpaboundrycdfw-1.png
That downloaded a ~18GB model and gave me a VERY impressive result, shown at the bottom here: https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/

Does quantised MLX support vision though?
Is UV the best way to run it?
uv is just a Python package manager. No idea why they thought it was relevant to mention that.
Because that one-liner will result in the model instantly running on your machine, which is much more useful than trying to figure out all the dependencies, invariably failing, and deciding that technology is horrible and that all you ever wanted was to be a carpenter.
Right: I could give you a recipe that tells you to first create a Python virtual environment, then install mlx-vlm, then make sure to stay on numpy 1.x because some of the underlying libraries don't work with numpy 2.0 yet...
... or I can give you a one-liner that does all of that with uv.
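The `--with 'numpy<2'` part is what handles that pin: uv treats it as a standard version specifier, so any 1.x release satisfies it while 2.0 and later are excluded. A minimal sketch of that comparison in plain Python (a hypothetical helper for illustration, not uv's actual resolver, and not a full PEP 440 implementation):

```python
def satisfies_upper_bound(version: str, bound: str = "2") -> bool:
    """Rough check of a '<bound' pin like 'numpy<2'.

    Compares dotted release segments numerically via tuple
    comparison; ignores pre-releases, epochs, and other
    PEP 440 details.
    """
    release = tuple(int(part) for part in version.split("."))
    limit = tuple(int(part) for part in bound.split("."))
    return release < limit

print(satisfies_upper_bound("1.26.4"))  # True: a 1.x release passes the pin
print(satisfies_upper_bound("2.0.0"))   # False: excluded by 'numpy<2'
```

So the one-liner quietly resolves to the newest numpy 1.x release instead of 2.x, without you ever touching a virtual environment.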
Python-specific side question -- is there some indication in the Python ecosystem that NumPy 2.x is not getting adoption? numpy 1.26 looks like the 'stable' choice from here.
I think it's just that it's a breaking change to a fundamental library, so it will take many months for the ecosystem to upgrade.
A similar thing happened when Pydantic upgraded from 1 to 2.
I have a project on torch 2.6 and numpy 2.2, and I've never had any issues with that combination.