**Alex Vranas** @Alex@vran.as · Dec 10, 2022, 21:53

I asked #ChatGPT to draw a picture of a cat and a dog in an HTML canvas, and uhh...

**Erik Moeller** @eloquence@social.coop · Dec 10, 2022, 21:56

@Alex

Its lack of any visual or spatial anchoring is pretty apparent -- in my code generation experiments, it even messed up basic shapes (made a square instead of a circle).

**Alex Vranas** @Alex@vran.as · Dec 10, 2022, 22:42

@eloquence Had the same experience when I asked it to animate a spinning pentagram. I got a spinning square, but all things considered, it's still neat.

**Erik Moeller** @eloquence@social.coop · Dec 10, 2022, 23:00

@Alex

Yep, definitely. It's hard to say how much further along Google/DeepMind are since most of their stuff is not accessible, but their Flamingo project looks pretty interesting in terms of integrating the visual modality - and this is from back in April:

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
