Techniques for Chat Data Analytics with Python | by Robin von Malottki

Part II: Topic Extraction with BERTopic

Photo by Mikechie Esparagoza, obtained from Pexels.com

In the first part of this series, I introduced you to my artificially created friend John, who was kind enough to provide us with his chats with five of the closest people in his life. We used just the metadata, such as who sent messages at what time, to visualize when John met his girlfriend, when he had fights with one of his best friends, and which family members he should write to more often. If you didn't read the first part of the series, you can find it here.

What we haven't covered yet, but will dive deeper into now, is an analysis of the actual messages. To do so, we will use the chat between John and Maria to identify the topics they discuss. And of course, we will not go through the messages one by one and classify them. Instead, we will use the Python library BERTopic to extract the topics that the chats revolve around.
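To give a rough idea of what this looks like in code, here is a minimal sketch. It assumes the chat is available as a list of (sender, text) pairs and that the messages are in English. The sample messages and the helper names `prepare_documents` and `extract_topics` are made up for illustration and are not taken from the original analysis. The BERTopic calls it uses (`BERTopic(...)` and `fit_transform`) are part of the library's public API.

```python
from typing import Iterable, List, Tuple


def prepare_documents(messages: Iterable[Tuple[str, str]], min_words: int = 3) -> List[str]:
    """Turn (sender, text) pairs into a list of cleaned message strings.

    Very short messages ("ok", "lol") carry little topical signal, so they are dropped.
    The min_words threshold is an assumption, not a BERTopic requirement.
    """
    docs = []
    for _sender, text in messages:
        cleaned = " ".join(text.split())  # collapse repeated whitespace and line breaks
        if len(cleaned.split()) >= min_words:
            docs.append(cleaned)
    return docs


def extract_topics(docs: List[str]):
    """Fit a BERTopic model with default settings; return the model and one topic id per document."""
    # Imported here so that prepare_documents works even without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic(language="english")  # assumes English messages
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics


# Hypothetical sample of the chat between John and Maria.
sample_chat = [
    ("John", "Do you want to   go hiking on Saturday?"),
    ("Maria", "ok"),
    ("Maria", "Yes, let's take the trail\nby the lake"),
]
docs = prepare_documents(sample_chat)
print(docs)
```

The sample above is far too small to fit a topic model on, which is why `extract_topics` is defined but not called. With a few hundred messages or more, `topic_model, topics = extract_topics(docs)` followed by `topic_model.get_topic_info()` would give an overview table of the topics found.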

What’s BERTopic?

BERTopic is a topic modeling technique introduced by Maarten Grootendorst that uses transformer-based embeddings, specifically BERT embeddings, to generate coherent and interpretable topics from large collections of documents. It was designed to overcome the limitations of traditional topic modeling approaches like LDA (Latent Dirichlet Allocation), which often struggle to handle short…
