Tagging Natural Objects

How to choose a sufficient yet limited set of tags is a frequently asked question in Personal Information Management. Such a set of tags is called a controlled vocabulary (CV). I wrote about some aspects of CVs in this article about classification and in this article about folder hierarchies.

This article follows one approach that classifies general natural objects along 49 or 17 general dimensions. Those dimensions can be used for tagging arbitrary natural objects. The neat thing about this approach is that the dimensions were derived from the similarity judgements of many people, which should ensure a maximum of consensus.

How I Learned About This Approach

While listening to a German podcast episode, I learned about the scientific research paper "Revealing the multidimensional mental representations of natural objects underlying human similarity judgements". Here is the abstract:

Objects can be characterized according to a vast number of possible criteria (such as animacy, shape, colour and function), but some dimensions are more useful than others for making sense of the objects around us. To identify these core dimensions of object representations, we developed a data-driven computational model of similarity judgements for real-world images of 1,854 objects. The model captured most explainable variance in similarity judgements and produced 49 highly reproducible and meaningful object dimensions that reflect various conceptual and perceptual properties of those objects. These dimensions predicted external categorization behaviour and reflected typicality judgements of those categories. Furthermore, humans can accurately rate objects along these dimensions, highlighting their interpretability and opening up a way to generate similarity estimates from object dimensions alone. Collectively, these results demonstrate that human similarity judgements can be captured by a fairly low-dimensional, interpretable embedding that generalizes to external behaviour.

The published version of the paper is behind a paywall here. Luckily, there is a preprint version of the paper here.

I also found this Twitter thread by one of the authors.

I don't want to discuss the paper in detail here. Please do read it yourself. This article takes the results of that paper and applies them to tagging.

Method

The authors of the paper derived conceptual and perceptual properties from real-world images of 1,854 objects. Thousands of people from the USA were asked to choose the one non-matching image within sets of three images (odd-one-out task).

The results were then interpreted by the researchers, who arrived at 49 dimensions that should be sufficient to classify general objects. With some reasonable simplification, this was further reduced to 17 dimensions.
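
To make the odd-one-out task a bit more tangible, here is a minimal Python sketch. It is not the model from the paper: the three toy dimensions, the weights, and the use of a plain dot product as the similarity measure are assumptions made up for illustration only. The idea is that the two objects sharing the most dimension weight form the matching pair, and the remaining object is the odd one out.

```python
from itertools import combinations


def dot(a, b):
    """Similarity of two objects as the dot product of their dimension weights."""
    return sum(x * y for x, y in zip(a, b))


def odd_one_out(triplet):
    """Return the name of the object that is NOT part of the most similar pair.

    `triplet` maps three object names to vectors of dimension weights.
    """
    names = list(triplet)
    most_similar_pair = max(
        combinations(names, 2),
        key=lambda pair: dot(triplet[pair[0]], triplet[pair[1]]),
    )
    (odd,) = set(names) - set(most_similar_pair)
    return odd


# Hypothetical weights over three dimensions:
# (animal-related, food-related, tool-related)
objects = {
    "dog": (0.9, 0.1, 0.0),
    "cat": (0.8, 0.0, 0.0),
    "hammer": (0.0, 0.0, 0.9),
}

print(odd_one_out(objects))  # prints "hammer"
```

In this toy example, "dog" and "cat" overlap strongly on the animal-related dimension, so "hammer" is left over.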

I extracted those dimensions for you:

49 Dimensions

From: "Extended Data Figure 2"

  1. made of metal/artificial/hard
  2. food-related/eating-related/kitchen-related
  3. animal-related/organic
  4. clothing-related/fabric/covering
  5. furniture-related/household-related/artifact
  6. plant-related/green
  7. outdoors-related
  8. transportation/motorized/dynamic
  9. wood-related/brown
  10. body part-related
  11. colorful
  12. valuable/special occasion-related
  13. electronic/technology
  14. sport-related/recreation-related
  15. disc-shaped/round
  16. tool-related
  17. many small things/coarse pattern
  18. paper-related/thin/flat/text-related
  19. fluid-related/drink-related
  20. long/thin
  21. water-related/blue
  22. powdery/fine-scale pattern
  23. red
  24. feminine (stereotypically)/decorative
  25. bathroom-related/sanitary
  26. black/noble
  27. weapon/danger-related/violence
  28. music instrument-related/noise-related
  29. sky-related/flying-related/floating-related
  30. spherical/ellipsoid/rounded/voluminous
  31. repetitive
  32. flat/patterned
  33. white
  34. thin/flat
  35. disgusting/bugs
  36. string-related
  37. arms/legs/skin-related
  38. shiny/transparent
  39. construction-related/physical work-related
  40. fire-related/heat-related
  41. head-related/face-related
  42. beams-related
  43. eating-related/put things on top
  44. container-related/hollow
  45. child-related/toy-related
  46. medicine-related
  47. has grating
  48. handicraft-related
  49. cylindrical/conical

17 Dimensions

From: "Extended Data Table 1"

  1. weapon/danger-related: weapon
  2. transportation/dynamic: vehicle
  3. furniture-related: furniture
  4. electronic/technology: electronic device
  5. animal-related: animal
  6. sport-related: sports equipment
  7. clothing-related: clothing
  8. fluid-related/drink-related: drink
  9. food-related: food
  10. child/toy-related: toy
  11. instrument-related: musical instrument
  12. body part-related: body part
  13. medicine-related: medical equipment
  14. tool-related: tool
  15. container-related/hollow: container
  16. insects/disgusting: insect
  17. plant-related/green: plant

How to Apply to Tagging?

The dimensions do not translate well to the typical form of tags. Tags are usually single words (no spaces) and, by convention, are used in their plural form. Therefore, you have to generalize the dimension names. For example, "transportation/dynamic: vehicle" could become the tag "vehicles".

If I were to use those dimensions for tagging with my own workflows and tools, I would:

  1. Derive a general term per dimension (in plural).
  2. Write those terms into a .filetags text file.
  3. Optionally add the whole dimension description to the file in the form of comment lines.
  4. Put the .filetags file within the folder hierarchy that holds the files to tag.
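
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plural tag names are my own hypothetical choices for a handful of the 17 dimensions, the file is assumed to hold one tag per line, and `#` is assumed to mark a comment line. Please check the documentation of your tagging tool for the exact file format it expects.

```python
from pathlib import Path

# Step 1: one general plural term per dimension.
# Hypothetical choices, covering only a few of the 17 dimensions.
DIMENSION_TAGS = {
    "weapon/danger-related": "weapons",
    "transportation/dynamic": "vehicles",
    "furniture-related": "furniture",
    "animal-related": "animals",
    "plant-related/green": "plants",
}


def render_filetags(dimension_tags, comment_prefix="#"):
    """Return the file content: a comment line with the dimension, then its tag.

    Assumption: one tag per line, comment lines start with `comment_prefix`.
    """
    lines = []
    for dimension, tag in dimension_tags.items():
        if not tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"tag must be a single word: {tag!r}")
        lines.append(f"{comment_prefix} {dimension}")  # step 3: description as comment
        lines.append(tag)                              # step 2: the tag itself
    return "\n".join(lines) + "\n"


def write_filetags(folder, dimension_tags):
    """Step 4: put the .filetags file into the folder that holds the files to tag."""
    target = Path(folder) / ".filetags"
    target.write_text(render_filetags(dimension_tags), encoding="utf-8")
    return target


# Show the generated content without touching the file system.
print(render_filetags(DIMENSION_TAGS), end="")
```

Calling `write_filetags("/path/to/photos", DIMENSION_TAGS)` with a folder of your choice (the path here is a placeholder) would then create the file at the top of that hierarchy.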

Limitations

Since the research data was collected from US participants only, there is a chance of cultural bias. The dimensions might therefore look different for other cultures.

The data refers to natural objects in general. For that reason, the dimensions are not useful for a narrow scope of things to tag, such as a collection that contains images of plants only.
