Text Insights

Topic modelers analyze a collection of documents and extract the distinct themes that run through it. Once trained, a topic modeler can determine what new documents are about without anyone needing to read them. In the example below, we trained our topic modeler on thousands of articles and then visualized the results.


[Interactive visualization, for demo purposes only. Left panel: Intertopic Distance Map (via multidimensional scaling), with a Marginal Topic Distribution size legend (2%, 5%, 10%). Right panel: bar chart of the Most Salient Terms(1), showing Overall Term Frequency and Estimated Term Frequency within the Selected Topic. A slider sets the Relevance Metric(2); shown here with λ = 1 and the topic "Movie 1" selected.]

1. Saliency (term w) = frequency(w) * [sum_t p(t | w) * log(p(t | w)/p(t))] for topics t; see Chuang et al. (2012)

2. Relevance (term w | topic t) = λ * p(w | t) + (1 - λ) * p(w | t)/p(w); see Sievert & Shirley (2014)
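
To make footnote 1 concrete, here is a minimal sketch of the saliency calculation in plain Python. The topic names, probabilities, and term counts are made-up toy values, not output from the demo, and the function names are ours for illustration only. It assumes every scored term has a non-zero probability in at least one topic.

```python
import math

# Hypothetical toy model: two topics with equal weight.
P_T = {"Movies": 0.5, "Sports": 0.5}                 # p(t)
P_W_GIVEN_T = {                                      # p(w | t)
    "Movies": {"film": 0.7, "season": 0.3},
    "Sports": {"game": 0.7, "season": 0.3},
}
FREQUENCY = {"film": 100, "game": 100, "season": 200}  # overall term counts


def topic_given_term(p_w_given_t, p_t, term):
    """Return p(t | w) for every topic t using Bayes' rule."""
    joint = {t: p_w_given_t[t].get(term, 0.0) * p_t[t] for t in p_t}
    p_w = sum(joint.values())  # marginal probability of the term
    return {t: joint[t] / p_w for t in joint}


def saliency(p_w_given_t, p_t, frequency, term):
    """frequency(w) * sum_t p(t | w) * log(p(t | w) / p(t))."""
    posterior = topic_given_term(p_w_given_t, p_t, term)
    # Terms with p(t | w) = 0 contribute nothing to the sum.
    distinctiveness = sum(
        p * math.log(p / p_t[t]) for t, p in posterior.items() if p > 0
    )
    return frequency[term] * distinctiveness


for word in FREQUENCY:
    print(word, round(saliency(P_W_GIVEN_T, P_T, FREQUENCY, word), 2))
```

In this toy model "season" is the most frequent word, yet it scores zero because it is equally likely under both topics, while "film" and "game" score highly because each points to a single topic.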

how it works

Extract meaningful patterns and insights from documents at scale. Let humans focus on the business.

Clicking one of the topics in the above diagram will show the most frequent words in that topic, and hovering over a word will show its distribution across the topics. With a little exploration, you can see that there are three distinct categories of topics: movies, sports, and cars.

But each of those categories contains multiple topics, so what makes the topics within a category different? If you decrease λ in the Relevance Metric, terms that are distinctive to the selected topic rise above terms that are merely frequent everywhere, and the differences between topics emerge; for example, “Sports 5” seems to be more of a hockey topic, while “Sports 6” is more of a baseball topic. Gaining insights into text has never been easier. Get in touch to find out how the Convergence topic modeler can benefit your business.
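
As a rough illustration of why lowering λ separates similar topics, the sketch below ranks terms using the relevance formula exactly as printed in footnote 2. The word probabilities are invented for the example and are not taken from the demo, and the function names are hypothetical. Note that Sievert & Shirley (2014) define relevance with a logarithm applied to each of the two terms; the rankings at λ = 1 and λ = 0 are the same in either form.

```python
# Hypothetical p(w | t) for two similar topics with equal weight.
SPORTS_TOPICS = {
    "Sports 5": {"game": 0.5, "team": 0.3, "puck": 0.2},
    "Sports 6": {"game": 0.5, "team": 0.3, "pitcher": 0.2},
}
# Marginal p(w), averaged over the two equally weighted topics.
P_W = {"game": 0.5, "team": 0.3, "puck": 0.1, "pitcher": 0.1}


def relevance(p_w_given_t, p_w, topic, term, lam):
    """Footnote 2 as printed: lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    p = p_w_given_t[topic].get(term, 0.0)
    return lam * p + (1.0 - lam) * p / p_w[term]


def top_terms(p_w_given_t, p_w, topic, lam, n=3):
    """Return the n terms of a topic with the highest relevance."""
    return sorted(
        p_w_given_t[topic],
        key=lambda w: relevance(p_w_given_t, p_w, topic, w, lam),
        reverse=True,
    )[:n]


for lam in (1.0, 0.0):
    for topic in SPORTS_TOPICS:
        print(lam, topic, top_terms(SPORTS_TOPICS, P_W, topic, lam, n=1))
```

With λ = 1 both topics are led by the shared word "game", so they look identical. With λ = 0 the ranking is driven by how much more likely a word is inside the topic than overall, so "puck" leads one topic and "pitcher" the other.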

Extract Patterns