Machine learning projects need time to produce value. Unless you are using an off-the-shelf, fully pre-trained solution that isn’t easily customised, you will have to spend some time training a machine learning model to do the job it’s meant to do. The value created can be transformational, but there’s always an initial hump you have to overcome.
Minimising and streamlining this model training process is one of the defining missions of Re:infer. The platform combines the latest machine learning and natural language processing models, and our team works tirelessly to make model training as fast and easy as possible.
To this end, we leverage active learning and various forms of unsupervised learning, built around our zero-code user interface, to deliver state-of-the-art model performance and accuracy - with minimal time investment from our users.
We’re not sitting still either. Our research team is actively working to discover new ways to accelerate the training process. One technique they’ve been focused on has been topic modelling, and our resident senior research scientist Harshil Shah recently wrote a great piece on our research innovations in this area.
What is topic modelling?
Topic modelling is a machine learning technique that automatically identifies sets of words which commonly occur together (known as ‘topics’) within large datasets, whether those datasets consist of text documents, spreadsheets or emails from a shared mailbox.
What is topic modelling used for?
Topic modelling automates the task of analysing large collections of text data. It is most commonly used as a text-mining tool for discovering hidden structures and insights within unstructured and semi-structured datasets. Beyond text mining, topic modelling has also been applied successfully in computer vision, population genetics, and social network analysis.
For example, you could use topic modelling if you wanted to find out what your customers are saying about your brand on social media, or if you want to identify the most common requests coming into your service desk.
In a Communications Mining solution like Re:infer, topic modelling enables the platform to automatically identify topics and intents. Common issues, reasons for contact, and causes of failure demand are important topics our customers extract to better understand their business, improve service efficiency and enhance the customer experience.
Why is topic modelling important?
Organisations today are bombarded with massive amounts of data, most of it text-based and an estimated 80% of it unstructured. This information contains important insights which could be used to improve products, business processes and customer experiences. If businesses want to remain performant and competitive within their marketplace, they have little choice but to analyse this data and extract the key insights.
The challenge is that there is too much text data for humans to process alone. Human workers lack the capacity for this repetitive, monotonous work, and few businesses could afford the extra headcount required anyway.
This is where topic modelling comes into its own. It excels at rapidly and automatically making sense of vast collections of text data, letting businesses surface widespread issues, opportunities and insights from their masses of data without investing in a large manual workforce whose only function is data processing.
Deep hierarchical unsupervised intent modelling
In his in-depth, technical blog post, Harshil explains how the Re:infer research team has been experimenting with a new unsupervised approach to topic modelling.
He concludes that topic modelling is an excellent way to gain a high-level understanding of a dataset without having to perform any manual data labelling. However, flat topic models - the most commonly used kind - are somewhat compromised in that their output can be difficult to interpret, and they require you to specify the number of topics in advance, which limits their utility.
Fortunately, Harshil explains how these issues can be overcome by using a tree-structured model which groups related topics together, and automatically learns the topic structure from the data. He also shows how topic modelling results can be further improved with Transformer embeddings.
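The tree-structured idea can be sketched in miniature: embed the documents, then cluster them hierarchically so that cutting the resulting tree at different depths yields coarse parent topics and finer child topics. This is a hedged illustration only; TF-IDF vectors stand in for the Transformer embeddings the post describes, and agglomerative clustering stands in for Re:infer's actual model.

```python
# Hedged sketch: grouping related documents into a topic *tree*.
# TF-IDF vectors are a stand-in for Transformer embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

docs = [
    "invoice overdue payment reminder",
    "payment received invoice closed",
    "refund requested for duplicate payment",
    "password reset link not working",
    "account locked after failed login",
    "cannot login new password rejected",
]

# Embed each document as a vector (stand-in for learned embeddings).
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

# Build a hierarchy over the documents (Ward agglomerative clustering).
Z = linkage(X, method="ward")

# Cut the tree at two depths: coarse parent topics, finer child topics.
coarse = fcluster(Z, t=2, criterion="maxclust")
fine = fcluster(Z, t=4, criterion="maxclust")
print("coarse topics:", coarse)
print("fine topics:  ", fine)
```

Unlike a flat model, the number of topics here falls out of where you cut the tree rather than being a single number fixed up front, which is the interpretability benefit the post argues for.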
These findings will be worked into how Re:infer utilises topic models, enabling customers to begin extracting value from their communications data more quickly and with minimal training.
If you would like to know more about the team’s technical approach and how they achieved these results, read the full blog here.