How to Calculate New, Active & Lapsed Customers with SQL

One of the important fundamentals to growing a healthy business is understanding the customer life cycle. At a high level, the types of questions that usually begin a customer life cycle analysis range from:

  • How many new customers did marketing generate over a specific time period?
  • Which types of product categories or services generate more new customers?
  • Are we increasing the number of active buyers over time with our marketing and other business efforts?
  • Are there different buying cycles based on the type of customers we sell to?

The answers to the above questions will vary based on the industry and specific business. How quickly they can be answered will depend on the size and maturity of your organization. You might have a business intelligence group or team of analysts to answer the question for you. If not, the goal of this post is to show you how to answer these types of questions using SQL.

The data set and environment

The data set we’ll be using is Super Store and the structure of the data looks like the following.

The environment we’ll be using to query the data with SQL is the cloud database Google Big Query. If your organization stores it’s data on-premise in a system like MS SQL Server the fundamentals still apply although the syntax of the SQL will vary.

Transforming data with SQL

  1. Create an order sequence column specific to each customer ID based on the ascending order date for that customer. This is what the ROW_NUMBER analytic function is doing in the below query.

  2. Create a new column that inserts the previous order date for a customer ID to be used in a later code block for calculating the time period between orders. This is what the LAG Analytic function is doing in the below query.

    *Note the “Group By” order_date and customer_id columns at the end of the statement. This is important because a customer can have multiple orders with different order ids on the same day
SELECT 
   order_date,
   customer_id,

   ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY 
   order_date ASC) as customer_order_sequence,

   LAG(order_date) OVER (PARTITION BY customer_id ORDER BY 
   order_date ASC) as previous_order_date,

   FROM personal-264822.superstore.orders
   GROUP BY order_date, customer_id
Output from the above SQL query

3. Run a sub-query against the above table to calculate the days between the order_date and the previous_order_date columns, you’ll see this happening with the DATE_DIFF function that creates the new column days_between_orders.

SELECT 
   order_date,
   customer_id,
   customer_order_sequence,

   CASE WHEN previous_order_date IS NULL THEN order_date
   ELSE previous_order_date END AS previous_order_date,

   DATE_DIFF(order_date, previous_order_date, DAY) AS 
   days_between_orders

   FROM order_sequence
Output from the above SQL query

4. The next query creates an additional customer_life_cycle column with a CASE statement to indicate if the order is from a New, Active or Lapsed customer based on the days_between_orders column. In this example, a customer is considered ACTIVE if the order takes place anytime between 1 to 365 days from the previous order and LAPSED if the prior order was greater than 365 days ago. This highly business specific so yours may vary.

SELECT 
 order_date,
 customer_id,

 CASE 
 WHEN customer_order_sequence = 1 THEN 'New Customer'
 WHEN days_between_orders > 0 AND days_between_orders < 366 THEN 'Active Customer'
 WHEN days_between_orders > 365 THEN 'Lapsed Customer'
 ELSE 'Unknown' 
 END AS customer_life_cycle,

 customer_order_sequence,
 previous_order_date,

 CASE 
 WHEN days_between_orders IS NULL THEN 0
 ELSE days_between_orders 
 END AS days_between_orders

 FROM time_between_orders
Output from the above query

5. The output of the three SQL statements can then be joined back to the original Super Store table with the additional attributes about the customer and orders.

Note: for the final query using a WITH clause to organize the query into distinct virtual table blocks to help organize and keep the statement readable.

WITH 

order_sequence AS ( 

  SELECT 
  order_date,
  customer_id,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY 
  order_date ASC) as customer_order_sequence,
  LAG(order_date) OVER (PARTITION BY customer_id ORDER BY 
  order_date ASC) as previous_order_date,
  FROM `personal-264822.superstore.orders`
  GROUP BY order_date, customer_id),

time_between_orders AS (

   SELECT 
   order_date,
   customer_id,
   customer_order_sequence,
   CASE WHEN previous_order_date IS NULL THEN order_date
   ELSE previous_order_date END AS previous_order_date,
   DATE_DIFF(order_date, previous_order_date, DAY) AS 
   days_between_orders
   FROM order_sequence),
   
customer_life_cycle AS ( 

  SELECT 
   order_date,
   customer_id,

   CASE 
   WHEN customer_order_sequence = 1 THEN 'New Customer'
   WHEN days_between_orders > 0 AND days_between_orders < 366 
   THEN 'Active Customer'
   WHEN days_between_orders > 365 THEN 'Lapsed Customer'
   ELSE 'Unknown' 
   END AS customer_life_cycle,

   customer_order_sequence,
   previous_order_date,

   CASE 
   WHEN days_between_orders IS NULL THEN 0
   ELSE days_between_orders 
   END AS days_between_orders

   FROM time_between_orders)
   
SELECT 
t1.order_date,
t1.customer_id,
t2.customer_life_cycle,
t2.customer_order_sequence,
t2.days_between_orders,
t1.order_id,
t1.sales,
t1.profit,
t1.customer_segment,
t1.product_category,
t1.product_sub_category,
t1.region,
t1.state_or_province,
t1.city

FROM `personal-264822.superstore.orders` AS t1

LEFT JOIN customer_life_cycle AS t2

ON 
(t1.customer_id=t2.customer_id 
AND t1.order_date=t2.order_date) 
Output from the above query

Summary

We now have a data view that shows the customer life cycle for every transaction. The next step would be summarizing the data to generate insights. The next post will cover how to visualize this data in a service like Tableau or Data Studio.

Share On:

2 thoughts on “How to Calculate New, Active & Lapsed Customers with SQL

  1. Wow, this advanced query help me a lot!
    It’s clearly enough to be adapted to any business model based on customers analytics. Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *