Its lack of any visual or spatial anchoring is pretty apparent -- in my code generation experiments, it even messed up basic shapes (it drew a square instead of a circle).
@Alex
Yep, definitely. It's hard to say how much further along Google/DeepMind are since most of their stuff is not accessible, but their Flamingo project looks pretty interesting in terms of integrating the visual modality - and this is from back in April:
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model