The Sentiment Arch graph traces sentiment over the course of the novel. A sentiment classifier assigns each sentence in the novel a positive, negative, or neutral value. To produce the smooth arches, Tambr applies a Fourier transform and a series of signal filters that strip out sentence-to-sentence noise and leave the overall shape of the novel's sentiment. The current implementation uses Syuzhet, an R package written by Matthew Jockers in 2015.
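As a rough illustration of the smoothing step, the sketch below low-pass filters a sequence of per-sentence sentiment scores with NumPy's FFT. It keeps only the few lowest-frequency components and inverts the transform. This is a generic filter written for this example, not Syuzhet's code. The function name `fourier_smooth`, the `low_pass_size` default, the rescaling to [-1, 1], and the +1/0/-1 encoding of sentence labels are all assumptions.

```python
import numpy as np


def fourier_smooth(values, low_pass_size=3, scale_range=True):
    """Low-pass filter a sentiment series with the discrete Fourier transform.

    Keeps the DC (mean) term plus the (low_pass_size - 1) lowest positive
    frequencies and their mirrored negative-frequency partners, zeroes
    everything else, and transforms back to the sentence domain.
    """
    x = np.asarray(values, dtype=float)
    spectrum = np.fft.fft(x)
    filtered = np.zeros_like(spectrum)
    # Index 0 is the mean (DC) term; indices 1..k-1 are the lowest frequencies.
    filtered[:low_pass_size] = spectrum[:low_pass_size]
    if low_pass_size > 1:
        # The matching negative frequencies sit at the end of the FFT output.
        filtered[-(low_pass_size - 1):] = spectrum[-(low_pass_size - 1):]
    smooth = np.fft.ifft(filtered).real
    if scale_range:
        lo, hi = smooth.min(), smooth.max()
        if hi > lo:
            # Rescale to [-1, 1] so arches from different novels are comparable.
            smooth = 2 * (smooth - lo) / (hi - lo) - 1
    return smooth


# Hypothetical per-sentence labels: +1 positive, 0 neutral, -1 negative.
labels = [1, 1, 0, -1, -1, -1, 0, 0, 1, 1, 1, 0, -1, -1, 0, 1]
arch = fourier_smooth(labels, low_pass_size=3)
print(np.round(arch, 2))
```

Smaller `low_pass_size` values give smoother, broader arches; larger values follow the raw sentence labels more closely. Keeping each retained frequency's mirrored partner makes the filtered spectrum conjugate-symmetric, so the inverse transform is real up to rounding error.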
Tambr uses a topic extraction technique called Non-negative Matrix Factorization (NMF) to find groups of words that represent the most significant topics in the novel. Tambr's topic extraction is not perfect yet, but apart from some noise it yields consistent results that agree with what we know the novels to be about.
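A minimal sketch of NMF topic extraction, assuming the novel has already been split into chunks and turned into a chunk-by-word count matrix (Tambr's actual preprocessing, chunk size, and number of topics are not described here). It implements Lee and Seung's multiplicative update rule in plain NumPy; the names `nmf` and `top_words` and the toy chunks are hypothetical.

```python
from collections import Counter

import numpy as np


def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (chunks x words) as V ~= W @ H.

    Uses Lee and Seung's multiplicative updates for the squared Frobenius
    loss. Both factors stay non-negative because each update multiplies
    by a ratio of non-negative quantities.
    """
    rng = np.random.default_rng(seed)
    n_chunks, n_words = V.shape
    W = rng.random((n_chunks, k))  # chunk-to-topic weights
    H = rng.random((k, n_words))   # topic-to-word weights
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_words(H, vocab, n=3):
    """Return the n highest-weighted words for each topic row of H."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in H]


# Hypothetical "novel" split into four chunks of already-cleaned tokens.
chunks = [
    "sea ship storm sea sail",
    "ship sail sea storm wave",
    "love heart kiss love letter",
    "heart letter love kiss tears",
]
counts = [Counter(chunk.split()) for chunk in chunks]
vocab = sorted(set().union(*counts))
V = np.array([[c[w] for w in vocab] for c in counts], dtype=float)

W, H = nmf(V, k=2)
for topic in top_words(H, vocab):
    print(topic)
```

Each row of `H` is one topic, and its largest entries are the words that define it; each row of `W` says how strongly a chunk draws on each topic. On this toy matrix the two topics should separate into the sea words and the love words, but multiplicative updates only reach a local optimum that depends on the random start. In practice a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would normally be used instead.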
The first bubble illustration shows each extracted topic in its own bubble; the synthesizer assigned to that topic appears in the same-colored bubble in the next illustration. Tambr currently draws on a few hundred synthesizers from Apple Logic. It matches synthesizers to topics based on the synthesizer titles (e.g., Power, Mystery, Drowning), using a knowledge base of word embeddings trained on the Google News corpus with the free tool Word2Vec.
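The sketch below shows one plausible way to do this matching: average the embeddings of a topic's words, embed each synthesizer title, and pick the title with the highest cosine similarity. The three-dimensional vectors are made-up stand-ins for the 300-dimensional Google News Word2Vec vectors, the topic words are invented, and the nearest-neighbor rule and the function names (`embed`, `cosine`, `match_synths`) are assumptions rather than Tambr's documented method.

```python
import numpy as np

# Toy 3-d embeddings (hypothetical). Rough axes: water, authority, secrecy.
EMBEDDINGS = {
    "sea":      np.array([0.9, 0.0, 0.1]),
    "storm":    np.array([0.8, 0.2, 0.1]),
    "ship":     np.array([0.9, 0.1, 0.0]),
    "king":     np.array([0.0, 0.9, 0.1]),
    "army":     np.array([0.1, 0.9, 0.0]),
    "throne":   np.array([0.0, 0.8, 0.2]),
    "secret":   np.array([0.0, 0.1, 0.9]),
    "clue":     np.array([0.1, 0.0, 0.9]),
    "drowning": np.array([0.9, 0.0, 0.2]),
    "power":    np.array([0.1, 0.9, 0.1]),
    "mystery":  np.array([0.0, 0.1, 0.9]),
}


def embed(words):
    """Average the vectors of the words that have embeddings."""
    vecs = [EMBEDDINGS[w.lower()] for w in words if w.lower() in EMBEDDINGS]
    if not vecs:
        raise ValueError(f"no embeddings for {words!r}")
    return np.mean(vecs, axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_synths(topics, synth_titles):
    """Pick, for each topic, the synth title most cosine-similar to it."""
    title_vecs = {title: embed([title]) for title in synth_titles}
    matches = []
    for topic in topics:
        topic_vec = embed(topic)
        best = max(synth_titles, key=lambda t: cosine(topic_vec, title_vecs[t]))
        matches.append(best)
    return matches


topics = [["sea", "storm", "ship"], ["king", "army", "throne"], ["secret", "clue"]]
print(match_synths(topics, ["Power", "Mystery", "Drowning"]))
# -> ['Drowning', 'Power', 'Mystery']
```

With real vectors, the `EMBEDDINGS` dictionary could be replaced by a gensim `KeyedVectors` object loaded with `KeyedVectors.load_word2vec_format(path, binary=True)`. This greedy rule lets two topics share a synthesizer; guaranteeing a distinct synthesizer per topic would need an assignment step instead.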