I don't think LLMs can achieve "understanding" in that sense.
These aren't LLMs. Most of the neat things in science involving AI aren't LLMs. Next-word prediction has extremely limited use with non-text data.
People seem to have started to use "LLM" to refer to any suite of software that includes an LLM somewhere within it; you can see them talking about LLM-generated art, for example.
Was it ASCII art? ;)
https://hamatti.org/posts/art-forgery-llms-and-why-it-feels-...
People will just believe whatever they hear.
Computer vision models are not large language models. LLM does not mean generative AI, or even AI in general; it's an initialism that stands for something specific.