How to choose a sufficient yet limited set of tags is a frequently asked question in Personal Information Management. Such a set of tags is called a controlled vocabulary (CV). I wrote about some aspects of CVs in this article about classification and in this article about folder hierarchies. Please do read my general recommendations on using tags in an efficient way.
This article follows one approach that classifies general natural objects along 49 or 17 general dimensions. Those dimensions can be used for tagging arbitrary natural objects. The neat thing about this approach is that the way the dimensions were derived should ensure a maximum of consensus.
How I Learned About This Approach
While listening to a German podcast episode, I learned about the scientific research paper "Revealing the multidimensional mental representations of natural objects underlying human similarity judgements". Here is the abstract:
Objects can be characterized according to a vast number of possible criteria (such as animacy, shape, colour and function), but some dimensions are more useful than others for making sense of the objects around us. To identify these core dimensions of object representations, we developed a data-driven computational model of similarity judgements for real-world images of 1,854 objects. The model captured most explainable variance in similarity judgements and produced 49 highly reproducible and meaningful object dimensions that reflect various conceptual and perceptual properties of those objects. These dimensions predicted external categorization behaviour and reflected typicality judgements of those categories. Furthermore, humans can accurately rate objects along these dimensions, highlighting their interpretability and opening up a way to generate similarity estimates from object dimensions alone. Collectively, these results demonstrate that human similarity judgements can be captured by a fairly low-dimensional, interpretable embedding that generalizes to external behaviour.
The original paper is behind the publisher's paywall here. Luckily, there is a preprint version of the paper here.
I also found this Twitter thread by one of the authors.
Here, I don't want to discuss the paper in detail. Please do read it yourself. This article uses the results of that paper and applies them to tagging.
Method
The authors of the paper derived conceptual and perceptual properties from real-world images of 1,854 objects. Thousands of people from the USA were asked to choose the one non-matching image within sets of three images (odd-one-out task).
The results were then interpreted by the researchers, who came up with 49 dimensions that should be sufficient to classify general objects. Using reasonable simplification, this was reduced even further to 17 dimensions.
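To make the odd-one-out task more concrete, here is a minimal Python sketch of how such a choice could be simulated once objects are described by scores on a few dimensions. It is not the model from the paper: the three dimensions, the score values, the dot product as similarity measure, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical scores on three dimensions:
# (animal-related, food-related, tool-related).
# The numbers are invented for illustration and are not taken from the paper.
OBJECTS = {
    "dog": (0.9, 0.1, 0.0),
    "cat": (0.8, 0.1, 0.0),
    "hammer": (0.0, 0.0, 0.9),
}


def similarity(a, b):
    """Dot product of two score vectors (an assumed similarity measure)."""
    return sum(x * y for x, y in zip(a, b))


def odd_one_out(names, objects=OBJECTS):
    """Return the object that is not part of the most similar pair."""
    best_pair = max(
        combinations(names, 2),
        key=lambda pair: similarity(objects[pair[0]], objects[pair[1]]),
    )
    # Exactly one of the three names is left over after removing the pair.
    (odd,) = set(names) - set(best_pair)
    return odd


print(odd_one_out(["dog", "cat", "hammer"]))  # prints "hammer"
```

In the study, the direction was the opposite one: people made the odd-one-out choices, and the dimensions were derived from those choices.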
I extracted those dimensions for you:
49 Dimensions
From: "Extended Data Figure 2"
- made of metal/artificial/hard
- food-related/eating-related/kitchen-related
- animal-related/organic
- clothing-related/fabric/covering
- furniture-related/household-related/artifact
- plant-related/green
- outdoors-related
- transportation/motorized/dynamic
- wood-related/brown
- body part-related
- colorful
- valuable/special occasion-related
- electronic/technology
- sport-related/recreation-related
- disc-shaped/round
- tool-related
- many small things/coarse pattern
- paper-related/thin/flat/text-related
- fluid-related/drink-related
- long/thin
- water-related/blue
- powdery/fine-scale pattern
- red
- feminine (stereotypically)/decorative
- bathroom-related/sanitary
- black/noble
- weapon/danger-related/violence
- music instrument-related/noise-related
- sky-related/flying-related/floating-related
- spherical/ellipsoid/rounded/voluminous
- repetitive
- flat/patterned
- white
- thin/flat
- disgusting/bugs
- string-related
- arms/legs/skin-related
- shiny/transparent
- construction-related/physical work-related
- fire-related/heat-related
- head-related/face-related
- beams-related
- eating-related/put things on top
- container-related/hollow
- child-related/toy-related
- medicine-related
- has grating
- handicraft-related
- cylindrical/conical
17 Dimensions
From: "Extended Data Table 1"
- weapon/danger-related: weapon
- transportation/dynamic: vehicle
- furniture-related: furniture
- electronic/technology: electronic device
- animal-related: animal
- sport-related: sports equipment
- clothing-related: clothing
- fluid-related/drink-related: drink
- food-related: food
- child/toy-related: toy
- instrument-related: musical instrument
- body part-related: body part
- medicine-related: medical equipment
- tool-related: tool
- container-related/hollow: container
- insects/disgusting: insect
- plant-related/green: plant
How to Apply to Tagging?
The dimensions do not translate well to the typical form of tags. Tags are usually single words (no spaces) and are used in their plural form by convention. Therefore, you do have to generalize the dimension names.
If I were to use those dimensions for tagging with my workflows and tools, I would:
- Derive a general term per dimension (in plural)
- Write those terms in a .filetags text file
- Optionally add the whole dimension description to the file in the form of comment lines.
- Put the .filetags file within the folder hierarchy that holds the files to tag.
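The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the single-word tags are hypothetical suggestions that are not part of the paper, the helper names render_filetags and write_filetags are made up, and the sketch assumes that a .filetags file holds one tag per line and treats lines starting with "#" as comments. Please check the documentation of your tagging tool before relying on that format.

```python
from pathlib import Path

# Dimension description from the 17-dimension list -> hypothetical tag.
# The tag names are suggestions only. Some terms (furniture, clothing, food)
# have no natural plural form and are kept as they are.
DIMENSION_TAGS = {
    "weapon/danger-related": "weapons",
    "transportation/dynamic": "vehicles",
    "furniture-related": "furniture",
    "electronic/technology": "electronics",
    "animal-related": "animals",
    "sport-related": "sports",
    "clothing-related": "clothing",
    "fluid-related/drink-related": "drinks",
    "food-related": "food",
    "child/toy-related": "toys",
    "instrument-related": "instruments",
    "body part-related": "bodyparts",
    "medicine-related": "medicine",
    "tool-related": "tools",
    "container-related/hollow": "containers",
    "insects/disgusting": "insects",
    "plant-related/green": "plants",
}


def render_filetags(dimension_tags):
    """Build the file content: one comment line and one tag line per dimension."""
    lines = []
    for description, tag in dimension_tags.items():
        if not tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"tag must be a single word: {tag!r}")
        lines.append(f"# {description}")  # assumed comment syntax
        lines.append(tag)
    return "\n".join(lines) + "\n"


def write_filetags(folder, dimension_tags=DIMENSION_TAGS):
    """Write the rendered content to a .filetags file inside the given folder."""
    path = Path(folder) / ".filetags"
    path.write_text(render_filetags(dimension_tags), encoding="utf-8")
    return path


# Preview the generated content without touching the file system.
print(render_filetags(DIMENSION_TAGS))
```

Calling write_filetags with the top folder of the files to tag would then place the file in that folder hierarchy.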
Limitations
Since the research data was collected from US participants only, there is a chance of cultural bias. Therefore, the dimensions might look different for different cultures.
The data only covers general natural objects. For that reason, the dimensions are not useful when the things to tag come from a narrow scope, such as images of plants only.
You should also read:
- How to tag (including my personal tagging rules)