What happened in 2023

Bugra Akyildiz

The authors studied 4,096 features in detail. They present Vision-Language Learning with Attributes (ViLLA), which leverages self-supervised learning to capture fine-grained region-attribute relationships from complex datasets. Model properties: the features are extracted from a one-layer transformer with a 512-neuron MLP layer.