Has anyone explored "semantic tokenization" in #machinelearning? For example instead of cutting up an image into little squares to use as tokens, separating its objects "by meaning" like a segmentation model would. I wonder if there's a generic way to do this for any modality.
Comments