
Text Insights

A topic modeler analyzes a collection of documents and extracts the distinct themes that run through them. Once trained, it can determine what new documents are about without anyone needing to read them. In the example below, we trained our topic modeler on thousands of articles and then visualized the results.

[Interactive visualization, for demo purposes only. Bar chart: Top 10 Most Salient Terms (1). Slider: relevance metric (2), set to λ = 1. Selected topic: all topics.]

1. Saliency (term w) = frequency(w) * [sum_t p(t | w) * log(p(t | w)/p(t))] for topics t;

see Chuang et al. (2012)

2. Relevance (term w | topic t) = λ * log(p(w | t)) + (1 - λ) * log(p(w | t)/p(w));

see Sievert & Shirley (2014)
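As a rough illustration of footnote 1, the saliency formula can be sketched in a few lines of Python. The function name and the numbers below are hypothetical and are not taken from the demo; the point is only that a term concentrated in one topic scores high, while a term spread evenly across topics scores zero.

```python
import math

def saliency(term_frequency, p_topic_given_term, p_topic):
    """frequency(w) * sum_t p(t | w) * log(p(t | w) / p(t))."""
    distinctiveness = sum(
        p_tw * math.log(p_tw / p_t)
        for p_tw, p_t in zip(p_topic_given_term, p_topic)
        if p_tw > 0  # a topic with p(t | w) = 0 contributes nothing
    )
    return term_frequency * distinctiveness

# Hypothetical setup: three equally likely topics, two terms seen 100 times each.
p_topic = [1 / 3, 1 / 3, 1 / 3]
concentrated = saliency(100, [0.0, 1.0, 0.0], p_topic)    # appears in one topic only
spread_out = saliency(100, [1 / 3, 1 / 3, 1 / 3], p_topic)  # appears evenly everywhere

print(concentrated > spread_out)  # True
```

Both terms are equally frequent here, so the difference in saliency comes entirely from how unevenly each term is distributed over the topics.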

[Bar chart legend: Overall Term Frequency; Estimated Term Frequency within the Selected Topic.]

[Second panel: Intertopic Distance Map (via multidimensional scaling). Marginal Topic Distribution legend: 2%, 5%, 10%.]

HOW IT WORKS

Extract meaningful patterns and insights from documents at scale. Let humans focus on the business.

Clicking one of the topics in the above diagram will show the most frequent words in that topic, and hovering over a word will show its distribution across the topics. With a little exploration, you can see that there are three distinct categories of topics: movies, sports, and cars.

But each of those categories contains multiple topics, so what makes these topics different? If you decrease the relevance metric (λ), you'll start seeing differences between the topics; for example, “Sports 5” seems to be more of a hockey topic, while “Sports 6” is more of a baseball topic. Gaining insights into text has never been easier. Get in touch to find out how the Convergence topic modeler can benefit your business.
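To see why lowering the relevance metric separates similar topics, here is a minimal Python sketch of the ranking behind the slider. It assumes the log form of relevance given by Sievert & Shirley (2014), λ * log(p(w | t)) + (1 - λ) * log(p(w | t)/p(w)). The function names and all probabilities are hypothetical, invented for a hockey-leaning sports topic; they are not values from the demo.

```python
import math

def relevance(p_w_given_t, p_w, lam):
    """lam * log p(w | t) + (1 - lam) * log(p(w | t) / p(w))."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

def top_terms(topic_probs, overall_probs, lam, n=2):
    """Return the n terms with the highest relevance for one topic."""
    scores = {w: relevance(p, overall_probs[w], lam) for w, p in topic_probs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical p(w | t) for one sports topic and p(w) over the whole corpus.
p_w_given_t = {"game": 0.10, "team": 0.08, "puck": 0.03, "goalie": 0.02}
p_w = {"game": 0.05, "team": 0.04, "puck": 0.003, "goalie": 0.0025}

print(top_terms(p_w_given_t, p_w, lam=1.0))  # ['game', 'team']
print(top_terms(p_w_given_t, p_w, lam=0.0))  # ['puck', 'goalie']
```

At λ = 1 the ranking follows raw within-topic probability, so generic sports words that every sports topic shares come first. At λ = 0 it follows the ratio p(w | t)/p(w), which favors words that are rare elsewhere and therefore characteristic of this one topic.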
