A Cross-Modal Variational Framework For Food Image Analysis