This stems from this question on Stack Overflow. Generalising it, suppose that each node i has some set A[i] of properties (I am avoiding "attributes", since we use that term elsewhere). We wish to specify a dyadic predictor that, in pseudocode, can be represented as x[i,j] = length(intersect(A[i], A[j])) (the number of properties i and j have in common) or x[i,j] = length(intersect(A[i], A[j])) > 0 (whether i and j have any properties in common).
Some examples:
A[i] is the set of languages i speaks, and we wish to use an indicator of whether i and j speak at least one common language as a predictor of their interaction. (This is from Stack Overflow.)
A[i] is a list of i's hobbies, and we wish to use the number of hobbies i and j have in common to predict acquaintance.
A[i] is a list of places i visited over the course of a day (e.g., from a contact diary), and we wish to use the number of common areas visited by i and j to predict whether they had a contact.
This seems like something that can be useful in a variety of circumstances.
A further generalisation of this concept is to make A[i] a mapping that maps property k to some value (e.g., proficiency in a language) so that, e.g., x[i,j] = max[k](min(A[i][k], A[j][k])) (or some other "interacting" and "combining" functions in place of min() and max[k](), respectively). In the language example, this predictor represents the proficiency of the less-proficient actor in the two actors' best common language (where "best common language" is the language in which the less-proficient actor has the highest proficiency).
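As a rough illustration of the mapping generalisation, here is a minimal C sketch. It assumes (this is not from any existing code) that each node's mapping is stored as a dense vector of K non-negative values, with 0 meaning "does not have the property". The names dyad_value, interact_fn, combine_fn, prop_min, prop_max and prop_sum are hypothetical.

```c
#include <stddef.h>

/* Hypothetical callback types: an "interacting" function applied to the two
   nodes' values for one property, and a "combining" function folded over
   all properties. */
typedef double (*interact_fn)(double a, double b);
typedef double (*combine_fn)(double acc, double v);

double prop_min(double a, double b) { return a < b ? a : b; }
double prop_max(double a, double b) { return a > b ? a : b; }
double prop_sum(double a, double b) { return a + b; }

/* x[i,j] = combine over k of interact(Ai[k], Aj[k]).
   Ai and Aj are dense vectors of length K (assumption: 0 means "property
   absent"); init is the starting value of the fold. */
double dyad_value(const double *Ai, const double *Aj, size_t K,
                  interact_fn interact, combine_fn combine, double init)
{
    double acc = init;
    for (size_t k = 0; k < K; k++)
        acc = combine(acc, interact(Ai[k], Aj[k]));
    return acc;
}
```

With prop_min and prop_max this gives the max[k](min(...)) predictor from the language example; swapping in prop_sum as the combining function would instead add up the per-property minima.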
In all cases, this would be a dyad-independent term, so in principle representable with edgecov().
Questions
How broadly useful would this be? I suspect @CarterButts and @mbojan might have some applications I hadn't thought about.
Would the generalisation to a mapping be useful? What "interacting" and "combining" functions would be useful?
What would be an efficient way to implement these?
What kind of a user interface (required data format and syntax) would we want for this term?
Nice idea. It sounds useful to me, especially if there were some freedom of choice wrt the "similarity function" f() in
f( A[i], A[j] )
where f() could include, next to your examples:
Cosine similarity
Jaccard coefficient (esp useful if A[i] is a set of binary variables)
some of those described in Choi, S. S., Cha, S. H., & Tappert, C. C. (2010). A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1), 43-48.
or any R function of two vectors returning a numeric scalar
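To make the "pluggable f()" idea concrete, here is a hedged C sketch. It assumes binary properties stored as a row-major n-by-K array of 0/1 ints, and it fills an n-by-n matrix of the kind one could hand to edgecov(). The names sim_fn, common_count, jaccard and fill_dyad_matrix are hypothetical, and returning 0 for two empty sets is just one possible convention.

```c
#include <stddef.h>

/* Hypothetical type for a similarity function f(A[i], A[j]) on two
   binary (0/1) property vectors of length K. */
typedef double (*sim_fn)(const int *a, const int *b, size_t K);

/* Number of properties both nodes have. */
double common_count(const int *a, const int *b, size_t K)
{
    size_t both = 0;
    for (size_t k = 0; k < K; k++)
        both += (a[k] && b[k]);
    return (double)both;
}

/* Jaccard coefficient: |A and B| / |A or B|.
   Assumed convention: 0 when neither node has any property. */
double jaccard(const int *a, const int *b, size_t K)
{
    size_t both = 0, either = 0;
    for (size_t k = 0; k < K; k++) {
        both += (a[k] && b[k]);
        either += (a[k] || b[k]);
    }
    return either ? (double)both / (double)either : 0.0;
}

/* Fill the row-major n-by-n matrix x with f(A[i], A[j]) for i != j;
   the diagonal is set to 0. A is a row-major n-by-K array. */
void fill_dyad_matrix(const int *A, size_t n, size_t K, sim_fn f, double *x)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            x[i * n + j] = (i == j) ? 0.0 : f(A + i * K, A + j * K, K);
}
```

This is the brute-force route: it evaluates f() for every ordered pair, so it costs on the order of n^2 * K operations and n^2 storage.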
Implementation-wise, does such a term essentially have to construct a matrix for edgecov at the R level, or are there computational "shortcuts" to exploit at the lower level?
Implementation-wise, does such a term essentially have to construct a matrix for edgecov at the R level, or are there computational "shortcuts" to exploit at the lower level?
That remains to be seen. If the operation is on a set rather than a mapping, there are a number of ways to represent it, with different advantages and disadvantages:
1. Each int can encode set membership for up to 32 properties. Then, intersections and unions can be calculated by bitwise & and |, respectively. If there are more than 32 properties, one can have multiple ints per node, though the storage and computational costs grow linearly in the number of properties.
2. If there are no more than 2^32 distinct properties, then each node can have a sorted array of its property IDs; then an algorithm can iterate through the two nodes' properties in parallel, testing for common members as in the merge step of merge sort. This method is not sensitive to the total number of distinct properties but is sensitive to the average number of properties a node has.
There are probably others.
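A minimal C sketch of the bitmask and sorted-array representations, counting the properties two nodes have in common. The function names and the choice of uint32_t are assumptions for illustration only; the bit counting uses a portable loop rather than a compiler intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits of one 32-bit word by repeatedly clearing the
   lowest set bit. */
static unsigned popcount32(uint32_t v)
{
    unsigned c = 0;
    while (v) {
        v &= v - 1;
        c++;
    }
    return c;
}

/* Bitmask representation: each node has W words, one bit per property.
   The intersection is the bitwise & of corresponding words. */
unsigned common_bits(const uint32_t *a, const uint32_t *b, size_t W)
{
    unsigned c = 0;
    for (size_t w = 0; w < W; w++)
        c += popcount32(a[w] & b[w]);
    return c;
}

/* Sorted-array representation: each node has a sorted array of distinct
   property IDs. Walk both arrays in step, as in the merge step of merge
   sort, counting IDs that appear in both. */
unsigned common_sorted(const uint32_t *a, size_t na,
                       const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0;
    unsigned c = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            c++;
            i++;
            j++;
        }
    }
    return c;
}
```

The first function does work proportional to the number of words per node (that is, to the total number of distinct properties), while the second does work proportional to the number of properties the two nodes actually hold.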
For a mapping, Method 2 can be used, with the array of property IDs serving as keys and a parallel array for values. (One can also just store the values in a vector with one element per property for each node analogously to Method 1, but then one loses the benefits of compactness of the one-bit-per-property representation and the speed of bitwise operations.)
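The keys-plus-parallel-values layout for a mapping could be sketched in C as follows, here computing the max-of-min predictor over shared properties. The function name maxmin_common and its signature are hypothetical; the sketch assumes each node's keys are sorted and distinct.

```c
#include <stddef.h>
#include <stdint.h>

/* Each node has a sorted array of distinct property IDs (keys) and a
   parallel array of values. Walks both key arrays in step and, for every
   shared key, takes the smaller of the two values; the result is the
   largest such minimum.
   Returns 1 and writes the result to *out if the nodes share at least one
   property; returns 0 and leaves *out untouched otherwise. */
int maxmin_common(const uint32_t *ka, const double *va, size_t na,
                  const uint32_t *kb, const double *vb, size_t nb,
                  double *out)
{
    size_t i = 0, j = 0;
    int found = 0;
    double best = 0.0;
    while (i < na && j < nb) {
        if (ka[i] < kb[j]) {
            i++;
        } else if (ka[i] > kb[j]) {
            j++;
        } else {
            double m = va[i] < vb[j] ? va[i] : vb[j];
            if (!found || m > best)
                best = m;
            found = 1;
            i++;
            j++;
        }
    }
    if (found)
        *out = best;
    return found;
}
```

Returning a flag separately from the value leaves it to the caller to decide what x[i,j] should be for a pair with no common property (for example 0, or a missing value).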