I build graph visualization software. Not the graphs you produce with Excel and bore people with in PowerPoint, but the kind that represent relationships between entities. This is graph theory, a branch of mathematics concerned with lots of dots connected by lots of lines. More formally, I refer to them as nodes and edges, and I'm exactly the kind of person who gets annoyed when people stray from that.
Researchers use graph theory to map and model all kinds of relationships, from my domain of bioinformatics, to infrastructure, to economics.
To understand why graph theory is so attractive, we can look at something I hold dear, which is music. Musicians are a promiscuous lot; a single musician may play with many different groups throughout his/her life. Dave Grohl is an excellent contemporary example. He has played in Nirvana, Probot, Foo Fighters, Killing Joke, Them Crooked Vultures, Queens of the Stone Age and has likely whored himself around to countless other side projects and collaborations. As an exercise, I slapped together a quick list of musicians and bands that have collaborated. That list looks something like this:
...
Nick Simper - Johnny Kidd & The Pirates
Geezer Butler - Black Sabbath
Ozzy Osbourne - Black Sabbath
Tony Iommi - Black Sabbath
Tony Iommi - Jethro Tull
Lemmy Kilmister - Motorhead
Lemmy Kilmister - Hawkwind
Mikkey Dee - Motorhead
Mikkey Dee - King Diamond
Lemmy Kilmister - Ozzy Osbourne
Lemmy Kilmister - Probot
Josh Homme - Them Crooked Vultures
Josh Homme - Kyuss
Josh Homme - Queens of the Stone Age
Dave Grohl - Them Crooked Vultures
John Paul Jones - Them Crooked Vultures
John Paul Jones - Led Zeppelin
...
... This goes on for 92 lines. It's visually uninteresting, but it's helpful. If I were listening to an album, I could consult the list and find out what other bands the musicians have played with. If I like the album, which is often the case with this particular list, chances are that their other work will be of similar quality (with some notable exceptions; I'm looking at you, Them Crooked Vultures).
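Looking up "what else have these people played with?" amounts to walking two steps out from a band in that list. A minimal sketch, assuming the list is stored as plain text with one `Name - Group` pair per line; only the excerpted lines are included (the elided entries are not reproduced), and `build_graph` and `related_acts` are illustrative names, not part of any tool used for this post.

```python
from collections import defaultdict

# Only the excerpted lines from the list above; the elided entries are not reproduced.
COLLABORATIONS = """\
Nick Simper - Johnny Kidd & The Pirates
Geezer Butler - Black Sabbath
Ozzy Osbourne - Black Sabbath
Tony Iommi - Black Sabbath
Tony Iommi - Jethro Tull
Lemmy Kilmister - Motorhead
Lemmy Kilmister - Hawkwind
Mikkey Dee - Motorhead
Mikkey Dee - King Diamond
Lemmy Kilmister - Ozzy Osbourne
Lemmy Kilmister - Probot
Josh Homme - Them Crooked Vultures
Josh Homme - Kyuss
Josh Homme - Queens of the Stone Age
Dave Grohl - Them Crooked Vultures
John Paul Jones - Them Crooked Vultures
John Paul Jones - Led Zeppelin
"""


def build_graph(text):
    """Parse 'A - B' lines into an undirected adjacency map: name -> set of names."""
    graph = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        a, b = (part.strip() for part in line.split(" - ", 1))
        graph[a].add(b)  # each collaboration is an edge in both directions
        graph[b].add(a)
    return dict(graph)


def related_acts(graph, act):
    """Nodes two hops from `act`: what its members have also played with."""
    members = graph.get(act, set())
    related = set()
    for member in members:
        related |= graph.get(member, set())
    # Drop the act itself and its direct neighbours (the members).
    return related - members - {act}


graph = build_graph(COLLABORATIONS)
print(sorted(related_acts(graph, "Black Sabbath")))  # ['Jethro Tull', 'Lemmy Kilmister']
```

The list doesn't distinguish musicians from bands, so Lemmy Kilmister shows up for Black Sabbath by way of his direct edge to Ozzy. Separating people from groups would need an extra field per node, which the plain `Name - Group` format doesn't carry.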
Enter the graph.
A labelled portion of the previous list, now in shiny graph format.
All of the text boxes are our nodes, and our edges are instances where two parties have played with each other. Note that this doesn't make the earlier list any less informative; the graph contains the same data, but it's arranged in a way that makes paths from node to node easy to identify visually. I can very quickly trace my way from Metallica to Journey, and eventually from Journey to Nailbomb. Paths like that amuse me to no end.
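Tracing a path from one node to another is what breadth-first search does on an unweighted graph like this one: it explores neighbours one hop at a time, so the first time it reaches the target it has found a path with the fewest edges. A minimal sketch, using a small subset of the excerpted edges (Metallica, Journey and Nailbomb live in the elided part of the list, so they aren't here); `adjacency` and `shortest_path` are illustrative names.

```python
from collections import deque

# A subset of the excerpted collaborations, as undirected edges.
EDGES = [
    ("Geezer Butler", "Black Sabbath"),
    ("Ozzy Osbourne", "Black Sabbath"),
    ("Tony Iommi", "Black Sabbath"),
    ("Tony Iommi", "Jethro Tull"),
    ("Lemmy Kilmister", "Motorhead"),
    ("Lemmy Kilmister", "Hawkwind"),
    ("Lemmy Kilmister", "Ozzy Osbourne"),
    ("Lemmy Kilmister", "Probot"),
    ("Dave Grohl", "Them Crooked Vultures"),
    ("John Paul Jones", "Them Crooked Vultures"),
    ("John Paul Jones", "Led Zeppelin"),
]


def adjacency(edges):
    """Build name -> set of neighbouring names, recording each edge both ways."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for deterministic output
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None  # goal is in a different connected component


graph = adjacency(EDGES)
print(shortest_path(graph, "Geezer Butler", "Hawkwind"))
# ['Geezer Butler', 'Black Sabbath', 'Ozzy Osbourne', 'Lemmy Kilmister', 'Hawkwind']
```

Asking for a path from Geezer Butler to Led Zeppelin returns `None` on this subset, because none of these edges connect those two corners. A path only exists if the data records it.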
Zooming out, the whole thing (76 musicians and bands, 92 collaborations) looks like this:
This also represents half a century of trashed hotel rooms.
Same graph, with node involvement in shortest paths and degree mapped to node width and height.
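The caption doesn't say exactly how "involvement in shortest paths" was measured or which tool computed it. One common measure is betweenness centrality: for each node, the number of shortest paths between other pairs of nodes that pass through it. The sketch below assumes that measure, computes it with Brandes' algorithm on a small subset of the excerpted edges, and ranks nodes by it with degree as a tie-breaker. `standouts` and its tie-breaking rule are illustrative, not how the figures here were produced, and the full 92-edge list would give different numbers.

```python
from collections import deque

# A subset of the excerpted collaborations, as undirected edges.
EDGES = [
    ("Geezer Butler", "Black Sabbath"),
    ("Ozzy Osbourne", "Black Sabbath"),
    ("Tony Iommi", "Black Sabbath"),
    ("Tony Iommi", "Jethro Tull"),
    ("Lemmy Kilmister", "Motorhead"),
    ("Lemmy Kilmister", "Hawkwind"),
    ("Lemmy Kilmister", "Ozzy Osbourne"),
    ("Lemmy Kilmister", "Probot"),
    ("Dave Grohl", "Them Crooked Vultures"),
    ("John Paul Jones", "Them Crooked Vultures"),
    ("John Paul Jones", "Led Zeppelin"),
]


def adjacency(edges):
    """Build name -> set of neighbouring names; degree is len(graph[name])."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths.
        sigma = dict.fromkeys(graph, 0)  # number of shortest source->v paths
        dist = dict.fromkeys(graph, -1)  # -1 means not yet reached
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted once from each end.
    return {v: s / 2 for v, s in score.items()}


def standouts(graph, k=3):
    """Top k nodes by betweenness, breaking ties by degree, then by name."""
    bc = betweenness(graph)
    return sorted(graph, key=lambda v: (-bc[v], -len(graph[v]), v))[:k]


graph = adjacency(EDGES)
print(standouts(graph))  # ['Lemmy Kilmister', 'Black Sabbath', 'Ozzy Osbourne']
```

Betweenness rewards nodes that bridge otherwise separate clusters, while degree only counts direct collaborations, which is why the figure maps them to separate dimensions.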
Now we're beginning to see some very attractive nodes. Let's look at three of them:
Top 3 standouts from our analysis. You could do worse.
This looks pretty solid. Ozzy and Lemmy alone could give you a shelf full of records, and Probot, Dave Grohl's personal wish-fulfillment album featuring the metal stars of his childhood, stands out as a significant thing to own. We're cool with that.
One problem with this is that graph structure doesn't necessarily denote worth. For example, according to this graph, a lesser-known blues cover band called 'Led Zeppelin' occupies the less fashionable south end.
If I could illustrate Led Zeppelin and Ozzy as a giant cherub and a giant bat respectively, I would have.
Speaking of Madonna, why isn't she on this graph? Why aren't the Beatles? It has a lot to do with my research methods. I started this graph with Lemmy and Motorhead and worked outward from there, consulting Wikipedia as I went. My interest in the soulless void that is pop music is limited, so any leads in unfavorable directions were actively ignored. Seems like a pretty biased way to research anything, doesn't it? No way would any self-respecting researcher do such a thing in the world of science. Except when they do (Edwards et al. 2011).
Research within the confines of music and sales isn't enough to give us the whole picture. Geography, the ugly beast, was responsible for a lot of the early relationships on this graph. The fact that the members of Hawkwind, Motörhead, the Yardbirds and Cream were all based around London during the 60s and 70s greatly affects their connectivity and clustering. Or perhaps it works the other way around, as artists moved closer to a city that could support both their musical styles and their drug habits. In fact, your efforts to build a better graph could go as far as tracking down who shared the same drug dealer in the 60s, leading to the discovery of a vitally important run-down flat in south-west London. Deciding where to draw the line on supporting data is maddening.
All of this aside, I love graphs. As with a sledgehammer, if I could, I would apply them to doors, cinder blocks and coconuts. Still, the idea that they map things, as opposed to modelling them, is foremost in my mind, as was cartoonishly illustrated above. It's powerfully intoxicating to imagine that what you're looking at is the actual thing, instead of our limited representation of it.
Great overview of graphs, with very good points about strengths and limitations. Like other nice data representations and analysis methods, there are lots of ways to misuse them. Graphs depicting molecular interactions are a good example: they can identify key molecules, but often they're actually depicting key biases.