How One Law Firm Uses Watchful to Leverage Attorney Expertise in Classifying Data at Scale

Jul 25, 2022

Law firms are not the first thing that comes to mind when people talk about cutting-edge applications of artificial intelligence. For one Silicon Valley law firm, however, a partnership with Watchful.io has produced a distinctive process for applying AI to a problem that is simple to state but difficult to solve: efficiently and accurately labeling large volumes of data.

Wilson Sonsini, like other law firms around the world, collects an ever-growing trove of data from its round-the-clock work servicing and representing its clients. The velocity and volume of this data only increase as law firms like Wilson Sonsini continue to digitize and upgrade their infrastructure. Although their internal workflows make law firms “data rich,” a daunting challenge remains: understanding and harnessing the untapped value in the large datasets they generate, in a way that is both scalable and cost-effective. For firms that do extract practical value from their data, the resulting intelligence and insight create competitive advantage, improved quality, and a broader scope of services for their partners and clients.

To help solve a pervasive labeling problem, David Wang, Chief Innovation Officer at Wilson Sonsini, reached out to Shayan Mohanty and John Singleton at Watchful.io about a partnership to test their innovative technology on Wilson Sonsini data and see how it could help. David and the Wilson Sonsini Data Science Team, consisting of Jon Metcalf, Hari Charora and James Gallmeister, began work in earnest to learn what Watchful’s modern, interactive and programmatic data labeling solution could do with literally millions of rows of important but underutilized data.

Challenge and Approach

The main challenge in labeling data in a law firm environment is not deciding which labels apply to the data. Law firms like Wilson Sonsini are built on hundreds of world-class subject-matter experts who know precisely what label applies to the work they produce for clients. Rather, the challenge is how to apply that labeling effectively to large bodies of data. The Wilson Sonsini data science team couldn’t just ask extremely busy subject-matter experts to spend countless hours classifying data entries, yet the team needed their human expertise to train computer models that would stand in for the attorneys and automate this work at scale and with a high degree of accuracy.

The Wilson Sonsini team had three goals for how Watchful technology would need to work: first, they wanted to leverage human subject matter expertise with minimal demands on the attorneys who provide that expertise; second, they had to scale the labeling process across large data sets by leveraging machine learning, structured databases and automation; and third, they had to build a system that was easy to reconfigure if the classifications or the rules for classification needed to change.

The input would be raw, unclassified data. The output, once the classification rules were properly established and applied to that raw data, would be accurately predicted labels and highly useful insights about the data.

Applying Watchful.io

The Wilson Sonsini team decided to target a large body of unclassified data that, if properly labeled, would add critical insights and specificity about the types of work the firm provides to its clients.  For the Watchful labeling exercise, only a smaller subset (about 500K entries) was needed to create the prediction algorithm that would eventually be applied to the larger set of related data entries.

Once extracted and pre-processed for labeling, the data was imported into the Watchful platform, where, with minimal supervision from human subject matter experts, the data science team added heuristics, or “hinters,” to the Watchful model. As more hinters were applied and the platform generated suggestions to assist the subject matter experts, the Watchful algorithm quickly became better at applying the right labels to the data. The time required of the subject matter experts was minimal: about two 3-hour sessions. The work went quickly because the Watchful algorithm needed only a small sample of correct entries to “learn” how the experts were making their labeling decisions and how the data met those conditions.
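
To make the idea of heuristic-driven labeling concrete, here is a minimal sketch in Python. It is not Watchful's API or algorithm, which this article does not describe. The labels ("IP", "Corporate"), the keywords, and every function name are hypothetical, and a simple majority vote stands in for whatever the platform actually does to combine hinters.

```python
import re
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion


# Hypothetical heuristics: each inspects one entry and either votes for a
# label or abstains. The labels and keywords are invented for illustration.
def hint_patent(text):
    return "IP" if re.search(r"\bpatent\b", text, re.IGNORECASE) else ABSTAIN


def hint_trademark(text):
    return "IP" if re.search(r"\btrademark\b", text, re.IGNORECASE) else ABSTAIN


def hint_financing(text):
    pattern = r"\b(financing|term sheet)\b"
    return "Corporate" if re.search(pattern, text, re.IGNORECASE) else ABSTAIN


HINTERS = [hint_patent, hint_trademark, hint_financing]


def weak_label(text):
    """Combine heuristic votes by simple majority; abstain on no votes or a tie."""
    votes = [v for v in (h(text) for h in HINTERS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics: leave the entry for an expert
    return ranked[0][0]
```

In a sketch like this, the entries that come back unlabeled are the natural ones to put in front of a subject matter expert, which is one way a small amount of attorney time can go a long way.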

This process was repeated for all the labels or classes targeted for this project. The result of this phase was a Watchful-labeled data set that was ready for use by the data science team to develop a machine learning model for predicting how this type of data would be classified if a human subject matter expert did the labeling.


Applying the results of the Watchful exercise, the Wilson Sonsini data science team completed the project by developing an automated pipeline and workflow in which the predictive machine learning model is applied every day to newly generated data entries. The newly classified data is then added to a production database and becomes instantly available for reports and dashboards that provide fresh, additional insight about the data to users throughout the firm.
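
A daily classification step of this kind might look roughly like the following sketch. The firm's actual pipeline, database, and schema are not described here, so everything below is an assumption: the `entries` table, its columns, and the `predict` callable (a trivial stand-in for the trained model) are hypothetical, and an in-memory SQLite database stands in for the production database.

```python
import sqlite3


def classify_new_entries(conn, predict):
    """Label every row whose label is still NULL; return how many were updated."""
    rows = conn.execute(
        "SELECT id, description FROM entries WHERE label IS NULL"
    ).fetchall()
    for entry_id, description in rows:
        conn.execute(
            "UPDATE entries SET label = ? WHERE id = ?",
            (predict(description), entry_id),
        )
    conn.commit()
    return len(rows)


def demo():
    # In-memory database with a hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, description TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO entries (description, label) VALUES (?, ?)",
        [("Draft patent application", None), ("Review term sheet", "Corporate")],
    )

    # Stand-in for the trained model: any callable mapping text to a label.
    def predict(text):
        return "IP" if "patent" in text.lower() else "Other"

    updated = classify_new_entries(conn, predict)
    labels = conn.execute("SELECT label FROM entries ORDER BY id").fetchall()
    return updated, labels
```

Because only rows without a label are touched, a job like this can run on a schedule without overwriting earlier classifications.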

Machine learning tools, like Watchful.io, are helping firms like Wilson Sonsini better understand and organize their vast stores of important but underutilized data.  The Watchful platform is particularly effective in saving time and resources through its user interface that efficiently collects human expertise and its powerful machine learning algorithm that applies that expertise to better classify a given data set, even with minimal input by the expert.

For Wilson Sonsini, the Watchful platform is a game changer: experts who have not traditionally had the time or ability to participate in the analysis process can now work directly with data teams to bring their expertise into the pipeline. Watchful, it seems, is very mindful that human expertise is still an important aspect of proper data analysis. So, when the attorneys ask how their data was labeled, the Wilson Sonsini team can tell them they were directly part of the process.

Watchful is a modern, interactive and programmatic data labeling solution for data scientists and machine learning practitioners. With Watchful, holistically explore, classify, annotate and validate any unique dataset.