Watchful x Proper Case Study: Part 2 - The Process of Selecting a Data Labeling Solution

Key Takeaways

Proper faced several data classification challenges, including records with no brand value in almost 33% of its data. It needed to identify this key retail data to provide a better experience to its e-commerce consumers.
By approaching the missing-brand problem as a Named Entity Recognition (NER) use case, Proper was able to use Watchful to resolve its challenges easily.
As subject matter experts, Proper's team was already familiar with the data set. By bringing their industry knowledge into Watchful, they were able to refine their classifications and ultimately get the output they wanted.

John Singleton (Watchful):

Prior to acquiring Watchful, what was your thought process in assessing solutions, and what ultimately brought you to Watchful?

Mike France (Proper):

Yeah, once we researched everything and ran the numbers on the data set, we saw two fundamental challenges in it. The first thing we did was decompose and segment the problem: can we identify a brand? What is it that we're trying to do? We're trying to help consumers shop cannabis brands as effectively as possible, so we need to know the brand, right? One of the problems we solve with Watchful there is that one out of every three records in the data set we source doesn't have a brand value in any dedicated field at all. In the other two out of three, we get some brand value, but those are aliases, and we still need to figure out what the master brand is. With one-third of our data set, though, we first need to figure out what the brand even is. That means dipping into, most commonly, the product name column, and sometimes the description fields, to pluck out the brand name. That's an NER (named entity recognition) use case: actually extracting the value so that we can put it through the rest of our pipeline, just like the two out of every three rows that have some brand value, even though it's not a unified value. We knew we're trafficking in cannabis brands here, ultimately between the consumers and the brands. So we need to know the brand, and that's a challenge we need to address: 33% of our data set has that problem.
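To make the brand-extraction step concrete, here is a minimal sketch of the simplest possible baseline: keep the dedicated brand field when it is filled, otherwise search the product name and then the description for a known brand. This is a dictionary (gazetteer) lookup, not a learned NER model and not how Proper or Watchful actually do it; the brand names and the `extract_brand` helper are hypothetical. Its obvious limit is that it only finds brands already on the list.

```python
import re
from typing import Optional

# Hypothetical brand gazetteer (illustrative names only). In practice it might be
# seeded from the rows that already carry some brand value.
KNOWN_BRANDS = ["Acme Farms", "Green Leaf", "Green Leaf Labs", "Sunny Hills"]

# Longest names first, so "Green Leaf Labs" is preferred over "Green Leaf".
_BRAND_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(b) for b in sorted(KNOWN_BRANDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_brand(
    brand_field: Optional[str], product_name: str, description: str = ""
) -> Optional[str]:
    """Return the dedicated brand value if present; otherwise look for a known
    brand in the product name, then in the description. None if nothing matches."""
    if brand_field and brand_field.strip():
        return brand_field.strip()
    for text in (product_name, description):
        match = _BRAND_PATTERN.search(text)
        if match:
            # Normalize casing to the gazetteer's spelling.
            found = match.group(1).lower()
            return next(b for b in KNOWN_BRANDS if b.lower() == found)
    return None
```

For example, `extract_brand(None, "Green Leaf Labs Blue Dream 1g Pre-Roll")` returns `"Green Leaf Labs"`, while a row whose name and description mention no listed brand returns `None` and would still need the harder extraction the interview describes.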

The second problem is at the product level. If we want to do product e-commerce the way consumers expect in any other consumer space, we need to know what the ultimate product is. So how do you figure out what the product is? Number one, we knew we needed to classify these products, and after looking at enough of the data, we knew we had a classification challenge, which I was already describing, right? Things that are listed as concentrates will be listed as concentrates in different ways, so those values are unreliable if you try to use them directly, depending on what market you're in and who put that item on the menu. So you'll get pre-rolls classified as concentrates, the same as an actual cannabis concentrate that you would smoke in some way. We found classification to be the area with the highest amount of uncertainty and ambiguity, and we needed to nail that down. Once we nailed that down, we were able to determine product attributes very reliably. Right? If something is an edible, then we can reliably determine whether it's a gummy. But if we don't know that it's an edible, we actually fail the majority of the time if we're just looking at keywords or those kinds of things. So solving that classification problem basically unlocked the rest of the data set, so that we could figure out other attributes in rather straightforward ways.
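The point that a trusted category makes attributes easy can be sketched as keyword rules scoped by category: the rules only run once the product class is known, and refuse to guess otherwise. The categories, subtypes, keywords, and the `product_subtype` function are hypothetical illustrations, not Proper's actual rules.

```python
import re
from typing import Dict, List, Optional

# Hypothetical subtype keywords, scoped by product category. The same token can
# mean different things in different categories, so rules run only once the
# category is known.
SUBTYPE_KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "edible": {
        "gummy": ["gummy", "gummies"],
        "chocolate": ["chocolate"],
    },
    "concentrate": {
        "badder": ["badder", "batter"],
        "shatter": ["shatter"],
    },
}


def product_subtype(category: Optional[str], product_name: str) -> Optional[str]:
    """Return a subtype for a product whose category is already known, else None."""
    if category not in SUBTYPE_KEYWORDS:
        # Without a trusted category, refuse to guess from keywords alone.
        return None
    tokens = set(re.findall(r"[a-z]+", product_name.lower()))
    for subtype, keywords in SUBTYPE_KEYWORDS[category].items():
        if tokens.intersection(keywords):
            return subtype
    return None
```

With the category known, `product_subtype("edible", "Watermelon Gummies 10pk")` returns `"gummy"`; with the category missing, the same name returns `None`, which mirrors the keyword-only failure described above.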

All of this is essentially text. So we initially looked at regular expressions. The problem with those is that you can be very precise, but you can't deal with ambiguity very well at all. So Watchful was a natural fit: it aligned well with the problem, as well as with the way we had geared up to solve it. When we looked at this, we said, hey, we have a third of our data set where we have to identify the brand, but there are structural features in the text that we can use to do that reliably. Watchful is an incredible platform for that and makes it really, really easy. And we have a classification problem, but to classify these products uniquely into one class or another, they have a lot of unique attributes we can leverage, if we just have a way of stacking knowledge into a system. That's exactly what Watchful does. We were able to use, as you call them, hinters: regular expressions we had already been using analytically to match things at a lower scale and scope. We were then able to put them in, and that matched the way we were already talking about the problem internally.

We'd say things like, hey, 15% of the time I see a concentrate, it's not a concentrate; it's going to be one of these other two things. But of course, if you also looked for these other attributes, you'd be able to resolve that relatively easily. I think what Watchful did was make it incredibly easy to import that knowledge. We're subject matter experts, we're familiar with the data set, and we can import that knowledge very efficiently through hinters, and then Watchful takes over very quickly. It allowed us to shoot from the hip as subject matter experts and then gave us a really powerful system to refine those classifications and ultimately get output that is so good, with such high recall and precision, that we actually have not yet developed a machine learning model off of the labeling at all. We're just using the labeling in streaming mode, because it has solved those problems for us without any issues.
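The idea of stacking subject-matter rules so that a misleading menu category ("concentrate") is outvoted by other attributes can be sketched with regex rules that each vote for a label or abstain, combined by a weighted vote. The interview does not describe how Watchful aggregates hinters, so the weighted majority vote here is a simple stand-in; the rule functions, patterns, and weights are all hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# A hinter-style rule returns a class label, or None to abstain on this record.
Hinter = Callable[[Record], Optional[str]]


def listed_category(r: Record) -> Optional[str]:
    # Weak signal: the menu's own category, which is sometimes wrong.
    value = r.get("category", "").strip().lower()
    return value or None


def preroll_keywords(r: Record) -> Optional[str]:
    # Stronger signal: the product name says it is a pre-roll.
    if re.search(r"\bpre[- ]?rolls?\b", r.get("name", ""), re.IGNORECASE):
        return "pre-roll"
    return None


def concentrate_keywords(r: Record) -> Optional[str]:
    if re.search(r"\b(shatter|badder|rosin)\b", r.get("name", ""), re.IGNORECASE):
        return "concentrate"
    return None


# Hypothetical weights: trust name keywords more than the listed category.
HINTERS: List[Tuple[Hinter, float]] = [
    (listed_category, 1.0),
    (preroll_keywords, 2.0),
    (concentrate_keywords, 1.5),
]


def label(r: Record) -> Optional[str]:
    """Weighted vote over all non-abstaining rules; None if every rule abstains."""
    votes: Counter = Counter()
    for hinter, weight in HINTERS:
        vote = hinter(r)
        if vote is not None:
            votes[vote] += weight
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

A record named "Live Resin Infused Pre-Roll 1g" but listed as a concentrate gets 1.0 for "concentrate" and 2.0 for "pre-roll", so it is labeled a pre-roll, while "Blue Dream Shatter 1g" stays a concentrate. Weighting is the crudest way to encode "this rule is usually right"; it is shown only to make the stacking idea tangible.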

John Singleton (Watchful):

It's the one that works effectively on your data, in your environment. It's absolutely great that you're able to get those results directly from Watchful.

Mike France (Proper):

What I was saying is, as subject matter experts, we know our data set very well. We just want a very effective machine that we can import knowledge into and that will take over at some point, right? Automating hinters and making the development of that labeling model really efficient and really easy. But Watchful was also a critical tool in helping us understand our data set, for the same reasons, right? When you suggest a hinter, it's because there's some underlying association or correlation in the data set. That tells us something as analysts or subject matter experts as well: it tells us there's something going on there. It actually helped us learn the variance, the differences, and the commonalities in our data set better than we could have otherwise. I mean, yeah, I'd love to do all this stuff in spreadsheets with simple profiling, but with this amount of data that's obviously not very practical. And if you're doing that, you're only looking in straightforward ways, right? Watchful helps unlock those hidden patterns or associations. Either it makes it more efficient to get those related patterns in, or it tells the subject matter expert something they didn't know, which gives them more information about the data set overall.
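The interview does not say how Watchful computes its hinter suggestions. As one simple illustration of surfacing correlations an expert might not have noticed, the sketch below scans labeled product names for tokens that occur often enough and point almost exclusively to one label. The function name, thresholds, and sample rows are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def suggest_tokens(
    rows: List[Tuple[str, str]], min_count: int = 2, min_precision: float = 0.9
) -> List[Tuple[str, str, float, int]]:
    """For (text, label) pairs, return (token, label, precision, count) for tokens
    seen in at least min_count rows whose rows share one label at >= min_precision."""
    token_total: Counter = Counter()
    token_label: Dict[str, Counter] = defaultdict(Counter)
    for text, label in rows:
        # Count each token once per row.
        for token in set(re.findall(r"[a-z]+", text.lower())):
            token_total[token] += 1
            token_label[token][label] += 1
    suggestions = []
    for token, total in token_total.items():
        if total < min_count:
            continue
        best_label, hits = token_label[token].most_common(1)[0]
        precision = hits / total
        if precision >= min_precision:
            suggestions.append((token, best_label, precision, total))
    # Most precise first, then most frequent, then alphabetical for stability.
    return sorted(suggestions, key=lambda s: (-s[2], -s[3], s[0]))


rows = [
    ("Watermelon Gummies 10pk", "edible"),
    ("Sour Gummies 20pk", "edible"),
    ("Blue Dream Pre-Roll 1g", "pre-roll"),
    ("OG Kush Pre-Roll 1g", "pre-roll"),
    ("Blue Dream Shatter 1g", "concentrate"),
]
print([s[0] for s in suggest_tokens(rows)])  # ['gummies', 'pk', 'pre', 'roll']
```

Note that "pk" is suggested alongside "gummies": in this toy sample, pack sizes only happen to appear on edibles. That is exactly the kind of association a subject matter expert should review before it becomes a rule.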
