Opinion Piece: The Power of Sample Data

Created by Johann Furmann, edited by Sviatoslav Voloshin

Disclaimer

This document is my personal opinion and just one way of doing things. I’m looking forward to any feedback.

Introduction

In our projects, we have to make countless small and big decisions. Some are very small, such as the order of fields on a page layout; others are very big, such as the implementation of Salesforce CPQ for Quote generation.

These decisions must always be based on the desired future process, which is most often derived from existing processes.

Discussing the desired outcome can often take a very long time, since there are many unknowns and “black swans.” This is where the concept of realistic sample or test data comes into play. It is not a new concept, but it is often overlooked or not given the necessary attention.

Existing data makes it possible to perform evidentiary analysis based on a small sample of truly random records. This type of analysis will often yield a greater depth of understanding of the real requirements than lengthy discussions. Evidence may also yield unexpected and even contrary results.

My experience has taught me that most discussions about the business are rather theoretical and somewhat detached from reality.

We sit in a room and discuss hypothetical processes without examining reality. If we are lucky, we have the voice of the user in the room. What we almost never have in the room is the reality. Human memory is typically unreliable. We like to remember the outliers, exceptions, or what fits our beliefs. It is almost impossible for us humans to remember something we would prefer to filter out.

That’s where sample data comes into play. Even a small sample size can give us a glimpse into reality.

I’d like to explain it with a simple example:

Business problem:

Users spend a lot of time filling out the “Comment” field on Quotes. 

Management wants to decide if that’s still needed in the future or should be removed.

Preparation:

The team defines the purpose of the Comment field.

The Comment field is meant to help the Invoicing Team create the correct invoice.

Sample Data:

The project team picks 20 random Quotes where the Comment field is filled out. The team does not filter the Quotes in any other way.
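A minimal sketch of how such a pick could be made in Python, so that nobody hand-selects the records. The `quotes` list, its `id` and `comment` fields, and the `pick_sample` helper are all hypothetical stand-ins for whatever export of Quotes the project has available.

```python
import random

# Hypothetical stand-in for exported Quotes; the field names are assumptions.
quotes = [
    {"id": f"Q-{i:04d}", "comment": "" if i % 3 == 0 else f"comment text {i}"}
    for i in range(1, 201)
]


def pick_sample(records, size=20, seed=None):
    """Return `size` randomly chosen records whose comment is filled out."""
    # The only filter applied: the comment must not be empty.
    with_comment = [r for r in records if r["comment"].strip()]
    rng = random.Random(seed)
    # sample() draws without replacement, so no Quote is picked twice.
    return rng.sample(with_comment, min(size, len(with_comment)))


sample = pick_sample(quotes, size=20, seed=42)
for quote in sample:
    print(quote["id"], quote["comment"])
```

Passing a fixed `seed` makes the pick reproducible, which helps if someone later asks how the 20 Quotes were chosen.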

Classification:

The team takes the 20 comments and asks two members of the Invoicing Team to classify each one as “helpful” or “not helpful”.

Decision:

The Invoicing Team members classified 2 of the 20 comments as “helpful”. All others were classified as “not helpful”.

Based on that data, the team concludes that the Comment field should be removed, since the value of the comments does not justify the effort of maintaining the field (migrating data, reporting on it, etc.).
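The tally behind this decision is simple enough to do by hand, but a short Python sketch shows the idea. The `labels` list is an assumption that mirrors the example above (2 helpful, 18 not helpful); how the two reviewers' votes are combined into one label per comment is not specified here, so the sketch takes the final label as given.

```python
from collections import Counter

# One final label per sampled comment, mirroring the example: 2 helpful, 18 not.
labels = ["helpful"] * 2 + ["not helpful"] * 18

counts = Counter(labels)
helpful_share = counts["helpful"] / len(labels)

print(f"helpful: {counts['helpful']} of {len(labels)} ({helpful_share:.0%})")
```

With these labels, the helpful share is 10%, which is the number the team weighs against the effort of keeping the field.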

Sample size:

In my experience, a sample size of 10 – 20 records is more than sufficient for such analysis. Often 5 records are already enough to see a trend.
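For readers who want a rough sense of how much uncertainty a sample this small carries, the sketch below computes a Wilson score interval for the 2-of-20 example. This is a standard statistical technique that I am adding for illustration, not part of the process described here, and the `wilson_interval` helper is a hypothetical name.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95% confidence."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


low, high = wilson_interval(2, 20)
print(f"helpful share plausibly between {low:.0%} and {high:.0%}")
```

For 2 helpful comments out of 20, the interval runs from roughly 3% to 30%, so even the optimistic end leaves most comments unhelpful. A result closer to an even split would be less clear-cut at this sample size.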

Randomness:

As tempting as it may be to filter sample data, for fear that certain records are poor examples or reflect user bias, this practice must be avoided: the objective is to reflect reality, not a curated version of it. Making decisions based on real evidence will produce better results. It’s a lot like good parenting: you will not be the most popular person and it will require more work and rigor, but your customers and your future self will thank you later.

Summary:

The process described above takes about an hour of work. The resulting decision is better than one reached through lengthy discussion, since it is based on real end-user feedback.

Discussions typically yield less accurate results, since they are usually based on assumptions, inaccurate information, and anecdotal evidence.