WP Transients must be used responsibly

We ran into an interesting issue with WooCommerce at work. First, here is the subject of the support request we got from our hosting provider:

The site is generating ~150MB/sec of transaction logs, filling 500GB of diskspace

Holy. Shit. A WordPress site should not be generating that much data. 150MB per second? Wow.

How? Why?

The simple explanation is that there is a bottleneck in WooCommerce with the filtered layer nav query objects using single transient record.

// We have a query - let's see if cached results of this query already exist.
$query_hash    = md5( $query );
$cached_counts = (array) get_transient( 'wc_layered_nav_counts' );

if ( ! isset( $cached_counts[ $query_hash ] ) ) {
    $results                      = $wpdb->get_results( $query, ARRAY_A );
    $counts                       = array_map( 'absint', wp_list_pluck( $results, 'term_count', 'term_count_id' ) );
    $cached_counts[ $query_hash ] = $counts;
    set_transient( 'wc_layered_nav_counts', $cached_counts, DAY_IN_SECONDS );
}

What is happening here is that a sql query based on the currently selected filters is hashed and shoved into an array that is saved to a single transient record. This means that every single interaction with the filters requires a read and possible write to a single transient record. A site with any sort of traffic and let's say 9 filter widgets (with around 50 total options) will potentially generate a huge amount of unique queries. It is no wonder why we are pushing 150MB/s.

Our quick, temporary, patch was to simply remove the transient.

// We have a query - let's see if cached results of this query already exist.
$query_hash    = md5( $query );
$cached_counts = array();
$results                      = $wpdb->get_results( $query, ARRAY_A );
$counts                       = array_map( 'absint', wp_list_pluck( $results, 'term_count', 'term_count_id' ) );
$cached_counts[ $query_hash ] = $counts;

You can see the massive improvement in performance after removing the transients. We applied the patch around 9:47 am.

Woocommerce post patch

Object caching would probably help. I was surprised at how much of an improvement we saw by simply removing the transient.

I think a good solution here would be to use unique transients for each hashed query, and not a single transient for EVERY hashed query. It would work find on small WP installs and would scale.

I will try it out and see what we get and if the results are good I will submit a PR to the woocommerce devs.

update:

I said we should use transients responsibly. In this case, I would be creating potentially 15k additional (tiny) transient records. Is that more responsible than 1 massive 1mb transient?

WooCommerce devs has asked that I run some performance tests. Going to do so and report back!

update 2:

Not having any transients at all is better at scale in our case since the SQL query that is executed is not that heavy and we have some decent page caching via varnish. Also our MySQL server is well tuned. Every single request to a page with the layered nav will make N requests for the transient data. If data has to be written, that is N updates per request. This single record becomes a bottleneck as the field is locked while it is being written to. Redis or Memcache would be a a better solution. WP Transients are just bad on their own.

update 3:

I've been able to work with WooCommerce developers and I got a PR approved and merged into WooCommerce core!