Google DataStudio

Track BQ Usage of your DataStudio Reports

If you connect a BigQuery data source to your DataStudio dashboard, DataStudio will request the data in the background as needed. This comes in very handy, especially since BQ is so fast and offers practically unlimited storage. But depending on the report you build and how you use it, it can generate unexpected costs.

Querying data from BigQuery costs $5 per terabyte of data processed. While 1 TB is a lot and hard to reach with a single query, many executed queries can quickly add up.
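To give a purely illustrative example: a dashboard with ten charts that each scan 2 GB processes about 20 GB per refresh, so roughly 50 refreshes already add up to 1 TB, or $5.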

There are several ways to keep the costs under control. In this article, I want to share how to keep track of the costs of a single DataStudio dashboard or connector.

DataStudio BigQuery Connector

When you connect a BQ data source to DataStudio, you can select whether the access credentials of the dashboard owner or of the viewer should be used.

If you select Owner (probably you), things stay simple since you don’t need to give everyone access to the underlying dataset. But in this case, all data usage from DataStudio is attributed to your Google Cloud account. It looks like you caused this data volume, but in fact it’s the reports you built. This is the problem I want to solve with this article. In the end, you will be able to see the data usage of each individual BQ connector in each dashboard.

If you use the viewer’s credentials, you would need to give every user access to the underlying dataset in BigQuery. That is impractical in most cases.

Set a marker in your queries

In order to track the usage, you need to use the custom query option of the BigQuery connector and add a comment to the query. If you have a simple query like this:

SELECT * FROM myDataSet.myTable;

you will add a comment like this:

--datastudio_<dashboard>_<connector>
SELECT * FROM myDataSet.myTable;

Here, <dashboard> and <connector> describe the DataStudio dashboard you are working on and the specific BQ connector you are using. This will later help to further separate the usage and costs. The values for <dashboard> and <connector> should only contain the characters [a-zA-Z0-9.+-]; this keeps the regex in the SQL query below simple.
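For example, for the performance connector of a Facebook dashboard (the names you will see again in the sample output below), the marked query looks like this:

--datastudio_facebook_performance
SELECT * FROM myDataSet.myTable;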

The added comment does not change the behavior of the query or the data itself, but it allows us to track its usage.

Analyzing the usage

BigQuery provides metadata views with information about all executed jobs, including the full query text. We can search this meta-information with a LIKE lookup for our custom marker as follows:

SELECT * FROM `<project>`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE query LIKE '%--datastudio_%'
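Note: querying JOBS_BY_PROJECT requires the bigquery.jobs.listAll permission on the project (included, for example, in roles/bigquery.resourceViewer), so check your IAM setup if the view comes back empty or access is denied.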

More specifically, we are looking for the data volume and billing costs.

SELECT
  FORMAT_TIMESTAMP("%F", creation_time) AS date,
  -- first marker segment: the dashboard name
  REGEXP_EXTRACT(query, r'--datastudio_([a-zA-Z0-9.+-]+)_[a-zA-Z0-9.+-]+') AS dashboard,
  -- second marker segment: the connector name
  REGEXP_EXTRACT(query, r'--datastudio_[a-zA-Z0-9.+-]+_([a-zA-Z0-9.+-]+)') AS connector,
  -- bytes -> terabytes (1024^4 bytes)
  SUM(total_bytes_processed/1024/1024/1024/1024) AS TB,
  -- on-demand pricing: $5 per TB
  SUM(total_bytes_processed/1024/1024/1024/1024)*5 AS USD
FROM `<project>`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE query LIKE '%--datastudio_%'
GROUP BY 1, 2, 3
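One caveat: the query sums total_bytes_processed, while on-demand billing is actually based on total_bytes_billed (which rounds every query up to a minimum of 10 MB). If you want figures closer to your invoice, swap in that column.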

Change the region-eu value to wherever your dataset is located; see the BigQuery documentation for a list of all locations.
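The JOBS views are partitioned by creation_time, so you can keep the monitoring query itself cheap by restricting the date range. A minimal sketch of the WHERE clause, assuming you only care about the last 30 days:

WHERE query LIKE '%--datastudio_%'
  -- scan only recent partitions instead of the whole job history
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)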

The result contains the dashboard, connector, terabytes (TB), and costs (USD) per day, like this:

{  "date": "2022-05-16",
   "dashboard": "facebook",
   "connector": "performance",
   "TB": "0.0023289003129320918",
   "USD": "0.011644501564660459"}
{  "date": "2022-05-17",
   "dashboard": null,
   "connector": null,
   "TB": "0.00016425196099589812",
   "USD": "0.00082125980497949058"}

Rows where dashboard and connector are null come from queries that contain the marker string but not a complete marker, most notably the monitoring queries from this article themselves (their LIKE pattern contains --datastudio_). They can be considered a kind of overhead for this report and should stay quite low.
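If you want to exclude that overhead from the report, one simple option (my addition, not strictly needed) is to require the complete marker to be present instead of the plain LIKE lookup:

-- only count queries where the full marker actually parses
WHERE REGEXP_CONTAINS(query, r'--datastudio_[a-zA-Z0-9.+-]+_[a-zA-Z0-9.+-]+')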

Show this data in a dashboard

And now the visualization. We can simply connect the query from above as a custom query in the BigQuery connector and start drawing charts or tables, e.g. a table showing the overall costs for a certain date range across all dashboards.

Or a chart showing the overall costs per dashboard per day.

You can get creative here and draw whatever chart or table you need.

The idea is now to identify DataStudio dashboards that consume an excessive amount of data with no clear reason why they would need it. You can then deep-dive into those reports and optimize the queries. That lowers your Google Cloud costs and can also improve the loading time of the dashboards.
