Using Shopify GraphQL to Get Metafields of All Products

Shopify app developers are likely aware that many merchants store additional product data in Shopify metafields. This can be a challenge for developers because it is difficult to retrieve metafield data in bulk via the Shopify REST API. It can also cause slow performance when synchronizing client store data.

Klevu Search is an app that retrieves product data from merchants and indexes it for search. We had initial difficulties with metafields but discovered that, with careful query creation and resource management, GraphQL can fetch product metafield data faster than REST. There is, however, a point at which GraphQL becomes slower than REST because of a combination of query cost and throttling.

The payoff is still there: in some cases, we were able to reduce the sync time for product metafields from 4 minutes to 10 seconds. We’d love to share our experience and provide useful advice.

Shopify Metafields via REST API

There is no way to bulk retrieve metafield data stored against products using the REST API. There are a few promising leads in the Shopify documentation, such as /metafields.json?metafield[owner_resource]=product. In practice, however, these methods don’t return the data that we want.

We must fetch metafield information one product or one variant at a time. If your client has 100 products and 400 variants, 500 API calls will be required to fetch all the metafield data. This can take up to four minutes, which we found is too long.
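The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the sustained rate of about two requests per second is an assumption for illustration only; it is not a figure from our tests, although it gives an estimate in the same range as the four minutes we observed.

```python
def rest_metafield_calls(products: int, variants: int) -> int:
    """One REST call per product plus one per variant."""
    return products + variants


def estimated_seconds(calls: int, requests_per_second: float = 2.0) -> float:
    """Rough sync-time estimate; the default request rate is an assumption."""
    return calls / requests_per_second


calls = rest_metafield_calls(products=100, variants=400)
print(calls)                     # 500
print(estimated_seconds(calls))  # 250.0
```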

Shopify Metafields via GraphQL

Shopify offers a GraphQL API that allows for more flexibility when retrieving bulk data and can be used to retrieve metafields of products more efficiently.

However, it takes more than simply replacing REST calls with the corresponding GraphQL calls. Careful query creation and resource management are essential. We will dive into this in the sections below.

Query cost

It is important to know the cost of querying GraphQL. You can find more information in the Shopify documentation. However, we will cover what you need to know here.

Each element of the query result has a resource cost. For a query that requests 50 products, the query cost would rise if we requested more than 50 products, and it would fall if we reduced the product count to 10.

“Each field in the schema is assigned an integer cost value. The sum of all the costs for each field is called the query cost. The cost of sub-selections that are based on connection fields has a multiplier effect.”

Our query returned the following API response:

Error Query has a cost 63252 which is higher than the maximum cost of 1000

An app is assigned a bucket of 1,000 cost points. This means that the total cost of your queries cannot exceed 1,000 points at any given time.

Shopify could resolve our query, but its cost is about 63 times higher than the limit allows. It needs to be simplified.

We must also reduce the number of records that we request.

  • 250 products with 50 metafields = 13,252 query cost. Still too high.
  • 50 products with 50 metafields = 2,652 query cost. Moving closer.
  • 30 products with 30 metafields = 692 query cost. Bingo!

If our store has 30 metafields, we can retrieve the values for 30 products in a single request. With the REST API, however, we could only retrieve one product at a time.

Throttling

This might lead you to believe that a data sync using GraphQL will be 30x faster than REST.

It’s not as simple as it seems. If we run the above query two times in rapid succession, such as to get the first 30 products and then the next 30, the API response will look something like this:

Error - Throttled

We’ve used about 700 points from our 1,000-point bucket. Now we need to wait until enough points are available to make the next request. The bucket refills at a rate of 50 points per second.

An app is given a bucket of 1,000 cost points and a leak rate of 50 cost points per second. This means that the total cost of your queries cannot exceed 1,000 points at any given time, and that room is freed up in the bucket at 50 points per second.
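To make the waiting time concrete, here is a minimal Python sketch of the bucket arithmetic, using the 1,000-point bucket and the 50 points per second refill rate described above. The function name is hypothetical, and the model is a simplification: it only computes how long to wait before a query of a given cost fits into the bucket.

```python
def seconds_until_affordable(query_cost: float,
                             available: float,
                             restore_rate: float = 50.0,
                             bucket_size: float = 1000.0) -> float:
    """Seconds to wait before a query of `query_cost` points fits in the bucket."""
    if query_cost > bucket_size:
        raise ValueError("query can never run: cost exceeds the bucket size")
    shortfall = query_cost - available
    return max(0.0, shortfall / restore_rate)


# After one 692-point query against a full bucket, 308 points remain.
remaining = 1000 - 692
print(seconds_until_affordable(692, remaining))  # 7.68
```

In other words, sending the same 692-point query again straight away would mean waiting a little under eight seconds. Shopify reports the points currently available in the `extensions.cost.throttleStatus` part of each GraphQL response, which can be used as the `available` value instead of tracking it by hand.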

In our query, pageInfo.hasNextPage tells us whether we need to fetch more records from the next page, in which case we use the cursor field. legacyResourceId is the ID of the product variant, and product.legacyResourceId is needed to associate the product variant with its parent, i.e., the parent ID.

The metafield entries in the query represent the namespace and key values that we need from this store. This is the dynamic part of the query, which changes depending on the number of metafields required.

This query will fetch 50 products and the relevant metafields.
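The following Python sketch shows one way a query of this shape could be built and paginated. It is not the exact query we run: it assumes the query is rooted at `productVariants` (which the description of legacyResourceId suggests), the `mf_0`, `mf_1` aliases are invented for illustration, and `run_query` is a hypothetical callable that sends a query string to Shopify and returns the decoded `data` object.

```python
import json


def build_variant_metafields_query(metafields, page_size=50, cursor=None):
    """Build a paginated query with one aliased metafield per (namespace, key)."""
    after = f", after: {json.dumps(cursor)}" if cursor else ""
    metafield_lines = "\n".join(
        f"        mf_{i}: metafield(namespace: {json.dumps(ns)}, key: {json.dumps(key)}) {{ value }}"
        for i, (ns, key) in enumerate(metafields)
    )
    return f"""{{
  productVariants(first: {page_size}{after}) {{
    pageInfo {{ hasNextPage }}
    edges {{
      cursor
      node {{
        legacyResourceId
        product {{ legacyResourceId }}
{metafield_lines}
      }}
    }}
  }}
}}"""


def fetch_all_variants(run_query, metafields, page_size=50):
    """Follow cursors until hasNextPage is false and return all variant nodes."""
    cursor, nodes = None, []
    while True:
        data = run_query(build_variant_metafields_query(metafields, page_size, cursor))
        connection = data["productVariants"]
        edges = connection["edges"]
        nodes.extend(edge["node"] for edge in edges)
        if not connection["pageInfo"]["hasNextPage"] or not edges:
            return nodes
        # The cursor of the last edge marks where the next page starts.
        cursor = edges[-1]["cursor"]
```

Because the metafield selections are generated from a list, adding or removing a metafield in the app settings changes the query, and therefore its cost, without touching the rest of the code.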

Large vs. small page sizes

We discovered that looking at query cost or throttling in isolation was not the best way to choose a page size.

A small page size means a lower query cost, which allows you to make more requests without being throttled, but you get less data with every request. A large page size gets you more data per request, but the higher query cost means you will be throttled more often.

We tested the retrieval of one metafield from 100 products and 400 variants.

  • 1 product per page took 216 seconds, throttled zero times
  • 5 products per page took 44 seconds, throttled zero times
  • 10 products per page took 21 seconds, throttled zero times
  • 25 products per page took 10 seconds, throttled zero times
  • 50 products per page took 10 seconds, throttled three times
  • 75 products per page took 12 seconds, throttled six times
  • 100 products per page took 10 seconds, throttled four times
  • 150 products per page took 16 seconds, throttled seven times
  • 200 products per page took 10 seconds, throttled five times
  • 250 products per page took 18 seconds, throttled seven times

You can see that the timings and throttle counts do not always match. The time required to complete the task is determined by a combination of these three factors:

  1. The number of metafields that are being retrieved
  2. The optimal pagination count
  3. The throttle count

This example shows that the fastest times resulted from the fewest API requests. 100 products and 400 variants divide evenly into pages of 25, 50 and 100. Although one, five and 10 also divide evenly into our product and variant counts, they were slower because they were below the threshold of products that could be retrieved with each request.

We can also see that the timing is consistent whether we fetch 25, 50 or 100 products per page. This is because throttling increases as the page size increases, which offsets the benefit of making fewer requests.

We decided to request 25 products per page.
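One way to read that decision from the measurements is sketched below in Python. The ranking rule (shortest time first, then fewest throttles, then smallest page) is an assumption made for illustration; the numbers are the ones from our test above.

```python
# (products per page, seconds, times throttled) from the test above
measurements = [
    (1, 216, 0), (5, 44, 0), (10, 21, 0), (25, 10, 0), (50, 10, 3),
    (75, 12, 6), (100, 10, 4), (150, 16, 7), (200, 10, 5), (250, 18, 7),
]

# Prefer the shortest time, then the fewest throttles, then the smallest page.
best = min(measurements, key=lambda m: (m[1], m[2], m[0]))
print(best)  # (25, 10, 0)
```

Several page sizes reach the 10-second mark, but 25 is the only one that does so without being throttled at all.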

The results

We found a significant improvement in fetching data from stores that only required a few product metafields. GraphQL shines here, with a low query cost and very efficient bulk retrieval.

Our test Shopify store had 100 products and 400 variants. With the REST API, the data sync time was approximately 4 minutes, or 240 seconds. With GraphQL, the sync times were as follows:

  • 1 metafield: 10 seconds
  • 2 metafields: 20 seconds
  • 5 metafields: 50 seconds
  • 10 metafields: 100 seconds
  • 20 metafields: 200 seconds

A clear pattern emerges: the sync time is roughly the metafield count × 10 seconds.

  • 24 metafields: 240 seconds (the same as the REST API)
  • 30 metafields: 300 seconds (slower than the REST API)

GraphQL lost its benefits in stores that needed to retrieve more than 24 metafields. In fact, the REST API was faster. We would therefore choose REST or GraphQL based on the number of metafields the store needs.
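That decision can be expressed as a small Python sketch. The constants are the figures measured for our 100-product test store, so they would need to be re-measured for any other store; the function names are hypothetical, and falling back to REST on a tie is an arbitrary choice.

```python
REST_SYNC_SECONDS = 240             # measured REST sync time for the test store
GRAPHQL_SECONDS_PER_METAFIELD = 10  # observed pattern for the same store


def estimated_graphql_seconds(metafield_count: int) -> int:
    """Estimated GraphQL sync time: metafield count x 10 seconds."""
    return metafield_count * GRAPHQL_SECONDS_PER_METAFIELD


def choose_api(metafield_count: int) -> str:
    """Pick GraphQL only when its estimated sync time beats REST."""
    if estimated_graphql_seconds(metafield_count) < REST_SYNC_SECONDS:
        return "graphql"
    return "rest"


for n in (1, 10, 24, 30):
    print(n, estimated_graphql_seconds(n), choose_api(n))
```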

We took a couple of real customer stores with more than 10,000 products and over 10 metafields. For the Shopify Standard store, we reduced the total data sync time from three hours to only one hour. A similar Shopify Plus store, which previously took one hour and 45 minutes, took 35 minutes to sync using this GraphQL approach.

GraphQL bulk operations to speed up the retrieval of Shopify metafields

We are pleased with the improvements made by GraphQL and we continue to explore the other options Shopify offers for data retrieval such as the GraphQL bulk operation API.

You will still need to use a GraphQL query, but this requires a slightly different approach. Instead of receiving the response immediately, you will be provided with a reference ID that you can use to check periodically whether Shopify has finished preparing your data. Once the task is complete, you will receive a URL from which you can download the results in JSONL format.
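The polling step could look something like the Python sketch below. It reflects our understanding of Shopify's bulk operations API and should be checked against the current documentation: it polls the `currentBulkOperation` field rather than looking the operation up by its reference ID, and `run_query` is a hypothetical callable that sends a GraphQL document and returns the decoded `data` object. Starting the operation and downloading the JSONL file are left out.

```python
import time

BULK_STATUS_QUERY = """
{
  currentBulkOperation {
    id
    status
    errorCode
    url
  }
}
"""


def wait_for_bulk_result(run_query, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the bulk operation finishes and return the JSONL download URL."""
    for _ in range(max_polls):
        operation = run_query(BULK_STATUS_QUERY)["currentBulkOperation"]
        status = operation["status"]
        if status == "COMPLETED":
            # The URL may be empty if the operation produced no results.
            return operation["url"]
        if status in ("FAILED", "CANCELED", "EXPIRED"):
            raise RuntimeError(f"bulk operation ended with status {status}")
        sleep(poll_seconds)
    raise TimeoutError("bulk operation did not finish in time")
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real delays.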

GraphQL for Metafields Sync

We are happy to announce a new beta feature within our Shopify App that allows you to retrieve Metafield data from your shop using Shopify’s GraphQL API instead of their REST API. This can make it faster to sync your store data with Klevu.

This feature is currently disabled by our backend team. We will first review your store to ensure suitability. There are some criteria that will improve sync time, but also criteria that could negatively impact performance.

This is easiest to show in a table that shows the results from some tests we performed. It shows the sync time for a Shopify Standard shop with approximately 500 products and 2,000 variants.

Note that the Metafield count in these tests is not the total number of Metafields that you have saved for a product. It is the number of Metafields that you have set up for retrieval in the Klevu app settings. This is crucial to know, as changing the Klevu App settings to include more or fewer Metafields can impact the sync speed when using GraphQL.

If you have hundreds of Metafields in your products, but only one Metafield is configured for Klevu, you can expect faster Metafield index times with GraphQL than with REST. This new approach may be beneficial if you have to index fewer than 20 Metafields using Klevu.

We saw the sync time of a Shopify Standard store with more than 10,000 products drop from 3 hours to 1 hour. A similar store that uses Shopify Plus saw its sync time drop from 1 hour 45 minutes to only 35 minutes.