CLOSED: [2020-12-27 Sun 15:43] SCHEDULED: <2020-12-27 Sun>
:PROPERTIES:
:CREATED: [2020-12-27 Sun 14:42]
:ID: 2020-12-27-tagging-natural-objects
:END:
:LOGBOOK:
- State "DONE"       from "STARTED"    [2020-12-27 Sun 15:43]
:END:

How to choose a sufficient yet limited set of tags is a frequently asked question in [[id:tags-pim][Personal Information Management]]. Such a set of tags is called a [[https://en.wikipedia.org/wiki/Controlled_vocabulary][controlled vocabulary]] (CV). I wrote about some aspects of CVs [[id:2017-04-18-classification][in this article about classification]] and [[id:2020-01-25-avoid-complex-folder-hierarchies][in this article about folder hierarchies]].

Please do read [[id:2022-01-29-How-to-Use-Tags][my general recommendations on using tags in an efficient way]].

This article follows one approach to classifying general natural objects along 49 or 17 general dimensions. Those dimensions can be used for tagging arbitrary natural objects. The neat thing about this approach is that the way the dimensions were derived should ensure maximum consensus.

*** How I Learned About This Approach

While listening to [[https://minkorrekt.de/mi178-brennbar-oder-royal/][a German podcast episode]], I learned about the scientific research paper "[[https://doi.org/10.1038/s41562-020-00951-3][Revealing the multidimensional mental representations of natural objects underlying human similarity judgements]]".

Here is the abstract:

#+BEGIN_QUOTE
Objects can be characterized according to a vast number of possible criteria (such as animacy, shape, colour and function), but some dimensions are more useful than others for making sense of the objects around us. To identify these core dimensions of object representations, we developed a data-driven computational model of similarity judgements for real-world images of 1,854 objects.
The model captured most explainable variance in similarity judgements and produced 49 highly reproducible and meaningful object dimensions that reflect various conceptual and perceptual properties of those objects. These dimensions predicted external categorization behaviour and reflected typicality judgements of those categories. Furthermore, humans can accurately rate objects along these dimensions, highlighting their interpretability and opening up a way to generate similarity estimates from object dimensions alone. Collectively, these results demonstrate that human similarity judgements can be captured by a fairly low-dimensional, interpretable embedding that generalizes to external behaviour.
#+END_QUOTE

The original paper is [[https://en.wikipedia.org/wiki/The_Cost_of_Knowledge][locked away]] by [[https://en.wikipedia.org/wiki/Springer_Nature][Springer Nature]] [[https://rdcu.be/b8pqd][here]]. Luckily, there is [[https://doi.org/10.31234/osf.io/7wrgh][a preprint version of the paper]] [[https://psyarxiv.com/7wrgh/][here]]. I also found [[https://twitter.com/martin_hebart/status/1315700596419301377][this Twitter thread]] by one of the authors.

I don't want to discuss the paper in detail here. Please do read it yourself. This article uses the results of that paper and applies them to tagging.

*** Method

The researchers derived conceptual and perceptual properties from real-world images of 1,854 objects. Thousands of people from the USA were asked to choose the one non-matching image within sets of three images (odd-one-out task).

The results were then interpreted by the researchers, who came up with 49 dimensions that should be sufficient to classify general objects. With reasonable simplification, this was further reduced to 17 dimensions.

I extracted those dimensions for you:

*** 49 Dimensions

From: "Extended Data Figure 2"

1. made of metal/artificial/hard
2. food-related/eating-related/kitchen-related
3. animal-related/organic
4. clothing-related/fabric/covering
5. furniture-related/household-related/artifact
6. plant-related/green
7. outdoors-related
8. transportation/motorized/dynamic
9. wood-related/brown
10. body part-related
11. colorful
12. valuable/special occasion-related
13. electronic/technology
14. sport-related/recreation-related
15. disc-shaped/round
16. tool-related
17. many small things/coarse pattern
18. paper-related/thin/flat/text-related
19. fluid-related/drink-related
20. long/thin
21. water-related/blue
22. powdery/fine-scale pattern
23. red
24. feminine (stereotypically)/decorative
25. bathroom-related/sanitary
26. black/noble
27. weapon/danger-related/violence
28. music instrument-related/noise-related
29. sky-related/flying-related/floating-related
30. spherical/ellipsoid/rounded/voluminous
31. repetitive
32. flat/patterned
33. white
34. thin/flat
35. disgusting/bugs
36. string-related
37. arms/legs/skin-related
38. shiny/transparent
39. construction-related/physical work-related
40. fire-related/heat-related
41. head-related/face-related
42. beams-related
43. eating-related/put things on top
44. container-related/hollow
45. child-related/toy-related
46. medicine-related
47. has grating
48. handicraft-related
49. cylindrical/conical

*** 17 Dimensions

From: "Extended Data Table 1"

1. weapon/danger-related: weapon
2. transportation/dynamic: vehicle
3. furniture-related: furniture
4. electronic/technology: electronic device
5. animal-related: animal
6. sport-related: sports equipment
7. clothing-related: clothing
8. fluid-related/drink-related: drink
9. food-related: food
10. child/toy-related: toy
11. instrument-related: musical instrument
12. body part-related: body part
13. medicine-related: medical equipment
14. tool-related: tool
15. container-related/hollow: container
16. insects/disgusting: insect
17. plant-related/green: plant

*** How to Apply to Tagging?
:PROPERTIES:
:END:

The dimensions do not translate well to the typical form of tags.
Tags are usually single words (no spaces) and are used in their plural form [[https://startpage.com/do/settings?t=dark&query=tagging%20convention%20plural%20singular&lui=english&cat=web&sc=ZUsk4JtK9KLl20][by convention]]. Therefore, you have to generalize the dimension names.

If I were to use those dimensions for tagging with [[id:2014-05-09-managing-digital-photographs][my workflows and tools]], I would:

1. Derive a general term per dimension (in plural).
2. Write those terms into a =.filetags= text file.
3. Optionally add the whole dimension description to the file in the form of comment lines.
4. Put the =.filetags= file into the folder hierarchy that holds the files to tag.

*** Limitations

Since the research data was collected from US participants only, there is a chance of [[https://en.wikipedia.org/wiki/Cultural_bias][cultural bias]]. The dimensions might therefore look different for other cultures.

The data refers to natural objects in general. For this reason, the dimensions are not useful for a narrow scope of things to tag, such as images of plants only.

You should also read:

- [[id:2022-01-29-How-to-Use-Tags][How to tag]] (including my personal tagging rules)
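
As an appendix to the four steps in "How to Apply to Tagging?", here is a minimal Python sketch of how the 17 dimensions could be turned into a =.filetags= file. Everything in it is an assumption for illustration: the tag names in =DIMENSION_TAGS= are my own guesses at suitable plural terms (they are not part of the paper), the helper names =render_filetags= and =write_filetags= are hypothetical, and the sketch assumes a file format with one tag per line where lines starting with =#= are treated as comments.

```python
from pathlib import Path

# Hypothetical mapping from the 17 dimension descriptions to single-word
# tags (plural where the word allows it). The tag names are illustrative
# choices, not taken from the paper.
DIMENSION_TAGS = {
    "weapon/danger-related": "weapons",
    "transportation/dynamic": "vehicles",
    "furniture-related": "furniture",
    "electronic/technology": "electronics",
    "animal-related": "animals",
    "sport-related": "sports",
    "clothing-related": "clothes",
    "fluid-related/drink-related": "drinks",
    "food-related": "food",
    "child/toy-related": "toys",
    "instrument-related": "instruments",
    "body part-related": "bodyparts",
    "medicine-related": "medicine",
    "tool-related": "tools",
    "container-related/hollow": "containers",
    "insects/disgusting": "insects",
    "plant-related/green": "plants",
}


def render_filetags(mapping, with_comments=True):
    """Return the text of a tag file: one tag per line.

    If with_comments is true, each tag is preceded by a '#' line holding
    the full dimension description (assumed comment syntax).
    """
    lines = []
    for description, tag in mapping.items():
        # Tags must be single words, so reject anything with whitespace.
        if not tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"tag must be a single word: {tag!r}")
        if with_comments:
            lines.append(f"# {description}")
        lines.append(tag)
    return "\n".join(lines) + "\n"


def write_filetags(folder, mapping, with_comments=True):
    """Write the rendered tags to '<folder>/.filetags' and return the path."""
    target = Path(folder) / ".filetags"
    target.write_text(render_filetags(mapping, with_comments), encoding="utf-8")
    return target
```

Calling =write_filetags("/path/to/photos", DIMENSION_TAGS)= (the path is a placeholder) would place the file at the top of the folder hierarchy that holds the files to tag. Keeping the dimension descriptions as comments makes it easier to remember later why a short tag such as =containers= was chosen.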