A fairly rich approximation of the concepts’ content is achieved by observing their behavior in our minds, in the external world and in language. The Network Approach will therefore take into account:
1) how we simulate these concepts in our minds (WORK PACKAGE 1 – WP1);
2) how these concepts are experienced in perceptual contexts (WORK PACKAGE 2 – WP2);
3) how we use them in linguistic texts (WORK PACKAGE 3 – WP3).
Three different databases will be used, each of them containing specific semantic information about the behavior of the concepts in one of the three environments sketched above (mental simulations; experiential contexts; language).
(WP1) A set of speaker-generated properties of the concepts (internal attributive features)
Norms: the semantic feature production norms database collected by McRae and colleagues (2005) is commonly used in semantic memory research. It consists of lists of properties attributed by participants to 541 basic-level concepts. Since the concepts in this database are not necessarily those employed as source or target domains, the database will be expanded with additional data. Prof. McRae, coordinator of the original database, will advise this phase of the project, and he will be invited to the host institution for a brief stay. Since these semantic features derive from mental simulations and property generation of the given concepts by informants, the emerging similarity is attributive (e.g., apple: is-fruit, is-red).
(WP2) A database of annotated images where the domains appear together with other entities (external relational entities)
Fdt (Flickr Distributional Tagspace): an original method, fully implemented by myself (Bolognesi, 2014; Bolognesi, accepted), used to subset, format, and analyze metadata associated with images on Flickr, the photo-hosting service powered by Yahoo! (used with permission). Metadata will be organized in lists of tags that Flickr users have associated with the pictures that are representative of the source and target domains. For each concept to be analyzed I will download from Flickr, on average, one hundred thousand tag-sets associated with as many pictures; these will constitute the corpus in which it will then be possible to identify shared patterns of occurrence between source and target domains. Since each concept in Flickr is expressed through a tag and represented in actual situations captured by the photographer, the context is represented by the co-tags, which express other concepts that are associated in experience with the targeted concept. The similarity that will emerge from this distributional analysis is therefore relational (e.g., apple: tree, orchard).
(WP3) A collection of verbal texts where the domains are used linguistically (linguistic contexts)
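The co-tag idea described above can be illustrated with a minimal sketch. It is not the Fdt implementation: the function name `cotag_vector` and the toy tag-sets are invented for illustration, and it only assumes that each picture contributes one set of tags.

```python
from collections import Counter

def cotag_vector(tag_sets, concept):
    """Count how often every other tag co-occurs with `concept`.

    `tag_sets` is an iterable of tag lists, one list per picture.
    (Hypothetical helper, not part of Fdt.)
    """
    counts = Counter()
    for tags in tag_sets:
        unique = set(tags)  # a tag counts at most once per picture
        if concept in unique:
            counts.update(unique - {concept})
    return counts

# Invented toy data; the real corpus would hold about 100,000 tag-sets per concept.
toy_tag_sets = [
    ["apple", "tree", "orchard"],
    ["apple", "tree", "autumn"],
    ["pear", "tree", "orchard"],
    ["car", "street"],
]

apple = cotag_vector(toy_tag_sets, "apple")
pear = cotag_vector(toy_tag_sets, "pear")

# Co-tags that both concepts share: the candidate relational contexts.
shared = sorted(set(apple) & set(pear))
print(apple)
print(shared)
```

In this toy case the two concepts share the co-tags orchard and tree, which is the kind of relational overlap the analysis looks for.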
DM (Distributional Memory, Baroni, Lenci 2010): a framework for corpus-based semantic analyses, which comprises lists of weighted word-link-word triplets that have been extracted from an annotated text corpus of 2.83 billion tokens. I will select from this database all instances in which each of the source and target domain concepts appears. This will allow me to compare the typical linguistic contexts in which source and target domains occur. Such similarity can be defined, in a broad sense, as linguistic (e.g., apple: buy-apple, pick-apple).
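As a rough sketch of the selection step, the snippet below filters a list of weighted word-link-word triplets for one concept and stores its contexts as a weighted vector. The triplets, link labels, weights, and the helper `context_vector` are all invented for illustration and do not reproduce the actual DM file format or its contents.

```python
from collections import defaultdict

def context_vector(triples, concept):
    """Collect the weighted contexts of `concept` from (w1, link, w2, weight) tuples.

    The slot occupied by the concept is marked with "_" so that the
    direction of the link is preserved. (Hypothetical helper.)
    """
    vec = defaultdict(float)
    for w1, link, w2, weight in triples:
        if w1 == concept:
            vec[("_", link, w2)] += weight
        if w2 == concept:
            vec[(w1, link, "_")] += weight
    return dict(vec)

# Invented toy triplets; link names and weights are assumptions.
toy_triples = [
    ("buy", "obj", "apple", 4.0),
    ("pick", "obj", "apple", 2.5),
    ("buy", "obj", "pear", 3.0),
    ("apple", "of", "tree", 1.5),
    ("drive", "obj", "car", 5.0),
]

apple = context_vector(toy_triples, "apple")
pear = context_vector(toy_triples, "pear")

# Linguistic contexts in which both concepts occur.
shared = set(apple) & set(pear)
print(shared)
```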
A sample of 50 visual metaphors and 50 verbal metaphors will be analyzed with the Network Approach, for a total of 200 source and target domains to be analyzed across 3 different datasets (600 individual distributional analyses). Each analysis implies identifying, retrieving, classifying, and measuring, through a semi-automated procedure, a concept’s behavior in different contexts: the features collected in the first database and the hundreds of thousands of instances gathered in the second and third databases.
The sets of visual and verbal metaphors used as stimuli to elicit semantic features from the participants were randomly selected from the VU Amsterdam Metaphor Corpus (http://metaphorlab.org/metaphor-corpus) and the VisMet Corpus (http://www.vismet.org/VisMet/; for this corpus, the selection includes images for which the authorization to reproduce the image is still pending). These corpora are balanced and representative of the two modalities, and therefore have modality-specific inherent variability.
The source and target domains of each of the 100 metaphors (50 visual and 50 verbal) will be identified with established procedures:
For linguistic metaphors the MIPVU procedure was applied (Steen et al. 2010). This procedure relies on the idea that the majority of metaphors found in language are not direct comparisons expressed through words (such as “my lawyer is a shark”), but are instead words used in a metaphorical way in a given context. In this sense, the majority of linguistic metaphors are expressed indirectly, and they imply the existence of a contrast between the contextual meaning of the word (which is metaphoric) and its basic meaning (which is literal). According to this procedure, given a text with a potentially metaphorical word, the contextual meaning and the basic meaning of that word express the contrast on which the metaphor is built. For example, in the sentence “I see what you mean”, the contextual meaning of see is understand, while the basic meaning refers to the physical ability to see. The two meanings are in contrast, and therefore the word see is to be considered metaphorical in the above-mentioned linguistic context.
For the identification of the metaphor terms involved in visual metaphors the VISMIP procedure was applied (Sorm, Steen, submitted). This procedure relies on the idea that visual metaphors typically present (different types of) perceptually incongruous elements that violate the expected scenario and need to be mentally replaced with other elements, whose function is to restore the visual feasibility of the scenario. Detecting such elements (step 3 of the VISMIP procedure) and replacing them with elements that would help to restore the expected scenario (step 4) is the type of cognitive operation that needs to be performed to unravel the metaphor. In this sense, the perceptual incongruities and their replacements constitute the metaphor terms (or part of them), or they cue, by means of metonymies, the abstract concepts that constitute the actual conceptual domains of the metaphor. For example, if a car advertisement shows a car frame with a rhino in place of the (expected) internal engine, the animal constitutes the perceptually incongruous unit, which has to be mentally replaced with a real engine in order to restore the expected scenario. In such a metaphor, the car engine is therefore compared to a rhino (and this comparison triggers mappings such as power, strength, robustness, etc.).
Hundreds of thousands of instances of each of the selected concepts (source and target domains of visual or verbal metaphors) will be analyzed in their natural contexts, across the three environments: mental simulations, experiential contexts, and language. For the implementation of each WP the shared contexts will be automatically identified and retrieved; then they will be manually classified according to a well-established taxonomy (Wu, Barsalou, 2009). Finally, the degree of relatedness between each pair of source and target domains will be automatically measured, within each specific environment.
The Network Approach will follow the general method suggested in distributional semantics (“you shall know a word by the company it keeps!”, Firth 1957), according to which words that appear in the same sentences share components of their meanings (Distributional Hypothesis). In the classical distributional semantics view, words are defined by their linguistic contexts, and similarity emerges from linguistic use. For example, given the 3 words book, manual, and umbrella, if we observe the linguistic contexts in which these 3 words usually appear, we can state that book and manual share more linguistic contexts than book and umbrella, or manual and umbrella, and this results in a greater similarity between book and manual, as opposed to book and umbrella or manual and umbrella. Although the nature of the similarity emerging from distributional modeling must be considered to be linguistic, rather than conceptual (De Vega et al. 2008), these methods have proven able to compete with human judgments of similarity between words.
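The book, manual, and umbrella example can be made concrete with a small sketch. The context sets below are invented for illustration (they are not drawn from any corpus), and counting shared contexts is only the simplest possible stand-in for a distributional similarity measure.

```python
from itertools import combinations

# Invented toy contexts for each word; real contexts would be extracted from a corpus.
toy_contexts = {
    "book": {"read", "write", "page", "publish", "carry"},
    "manual": {"read", "write", "page", "consult"},
    "umbrella": {"carry", "open", "rain"},
}

def shared_contexts(a, b, contexts=toy_contexts):
    """Return the contexts in which both words occur."""
    return contexts[a] & contexts[b]

# Number of shared contexts for every pair of words.
overlap = {
    (a, b): len(shared_contexts(a, b))
    for a, b in combinations(sorted(toy_contexts), 2)
}

for pair, n in overlap.items():
    print(pair, n)
```

With these toy sets, book and manual share three contexts, book and umbrella share one, and manual and umbrella share none, which mirrors the ordering described in the example.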
The methodological innovation that will be proposed and developed in this project is multifaceted. First, the methods of distributional semantics (traditionally applied to corpora of language) will also be applied to extra-linguistic corpora, bringing to light a variety of components of the meaning of the chosen words (not only linguistic). This will make it possible to observe different types of conceptual relatedness between two concepts. Second, the distributional semantics methods, traditionally used in lexical semantics, will here be applied to metaphor studies, and thus they will allow us to observe and model the meaning of source and target domains, and to shed light on the nature of the similarity (or, in a broader sense, relatedness) that justifies their alignment. Third, two modalities of expression will be compared: the relatedness between source and target domains in visual metaphors will be compared to the relatedness between source and target domains in verbal metaphors.
Identifying and retrieving the shared components of meaning between source and target domains across the 3 different databases is a procedure that will be performed automatically, through the methods outlined above. Classifying the components of meaning that are shared by source and target domains is a procedure that will be performed manually: each intersecting component of meaning will be labeled with a semantic role, chosen from the well-known taxonomy proposed by Wu and Barsalou (2009). Then, through another automatic procedure, the strength of the relatedness between each pair of source/target domains will be measured by computing the cosine between the two vectors that represent the concepts, in accordance with the well-established literature in distributional semantics (e.g. Landauer, Dumais 1997; Sahlgren 2006).
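The cosine measure mentioned above can be sketched as follows. The vectors are represented as sparse dictionaries mapping a context to a weight; this representation, the function name `cosine`, and the toy weights for the rhino and engine example are assumptions made for illustration, not the project's actual data or code.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts.

    Returns 0.0 when either vector is empty or has zero length.
    """
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented toy vectors for a source domain (rhino) and a target domain (engine).
rhino = {"power": 3.0, "strength": 2.0, "horn": 1.0}
engine = {"power": 2.0, "strength": 1.0, "fuel": 2.0}

relatedness = cosine(rhino, engine)
print(round(relatedness, 3))
```

The value ranges from 0 (no shared contexts) to 1 (identical direction) for non-negative weights, so it can be compared across source/target pairs within the same database.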
The project organization is summarized in the following schema.