Image by Troy Dugger

Synchronizing Firestore JSON Data Into BigQuery Table Columns

In our last blog post, we documented the synchronization of mobile app data from Firestore to BigQuery with the raw JSON placed in a data column of a BigQuery table. While that is a perfectly acceptable way of doing this, maybe the data you want to synchronize has a defined schema that will never, or at least rarely, change. In this case, it would be better to deconstruct the JSON data into various BigQuery table columns. So, let’s cover that now.

We will build on the services enabled during the last blog post.

Set up BigQuery schema

  1. In GCP, go to BigQuery.

  2. Select the messages table from the left navigation tree.

  3. Click the EDIT SCHEMA button.

  4. Click the Add field button in the New fields section.

  5. Type event_name.

  6. Click the Add field button again.

  7. Type user_id.

  8. Click the SAVE button.
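After these steps, the table schema should look roughly like this in BigQuery's JSON schema format (a sketch: the console defaults new fields to STRING and NULLABLE, and the data column is the one carried over from the previous post):

```json
[
  { "name": "data",       "type": "STRING", "mode": "NULLABLE" },
  { "name": "event_name", "type": "STRING", "mode": "NULLABLE" },
  { "name": "user_id",    "type": "STRING", "mode": "NULLABLE" }
]
```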

Create Schema in Pub/Sub to match BigQuery table

  1. In GCP, go to Pub/Sub.

  2. Click on Schemas from the left navigation bar.

  3. Click the CREATE SCHEMA button.

  4. Use the following settings for the Schema:

    • Schema ID: message
    • Schema type: Protocol Buffer
  5. Paste the following as the Schema definition:

    syntax = "proto2";
    
    message ProtocolBuffer {
        required string event_name = 1;
        required string user_id = 2;
    }
    
  6. Click the CREATE button.

Create Pub/Sub Topic with Schema

  1. Click on Topics from the left navigation bar.

  2. Click the CREATE TOPIC button.

  3. Use the following settings for the Topic:

    • Topic ID: firestore-to-BQ
    • Use a schema: check
    • Pub/Sub schema: schemas/message
  4. Click the CREATE TOPIC button.

Set up BigQuery Subscription (check Use topic schema)

  1. Click the EXPORT TO BIGQUERY button.

  2. Click the CONTINUE button.

  3. Use the following settings for the BigQuery Subscription:

    • Subscription ID: firestore-to-BQ-bqsub
    • Dataset: firestore_data_copy
    • Table: messages
    • Use topic schema: check
  4. Click the CREATE button.
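If you'd rather script the Pub/Sub pieces than click through the console, a rough gcloud equivalent of the schema, topic, and subscription steps looks like this (a sketch: the project ID placeholder and the message.proto file name are assumptions, and JSON message encoding is assumed so the bytes published by the Cloud Function's JSON.stringify call validate against the schema):

```shell
# Create the Protocol Buffer schema from a local file containing the
# definition shown above.
gcloud pubsub schemas create message \
    --type=protocol-buffer \
    --definition-file=message.proto

# Create the topic bound to that schema; JSON encoding matches how the
# Cloud Function serializes messages.
gcloud pubsub topics create firestore-to-BQ \
    --schema=message \
    --message-encoding=json

# Create the BigQuery subscription that writes schema fields into
# matching table columns.
gcloud pubsub subscriptions create firestore-to-BQ-bqsub \
    --topic=firestore-to-BQ \
    --bigquery-table=YOUR_PROJECT_ID:firestore_data_copy.messages \
    --use-topic-schema
```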

Add code to the Cloud Function to read the schema and write the message using protobufjs

  1. In GCP, go to Cloud Functions.

  2. Click copy-data-to-bigquery in the list.

    Note: We created this in the previous blog, but if you don’t have it, here are the settings to use:

    • Environment: 1st gen
    • Name: copy-data-to-bigquery
    • Region: same region used for Firestore
    • Trigger type: Cloud Firestore
    • Event type: create
    • Document Path: messages/{documentId}
  3. Click the Edit button.

  4. Click the Next button.

  5. Copy the following code into index.js:

    const { Logging } = require('@google-cloud/logging');
    const logging = new Logging();
    const log = logging.log('publishToBigQuery');
    
    const { PubSub } = require('@google-cloud/pubsub');
    const pubsub = new PubSub();
    
    const protobuf = require('protobufjs');
    
    // Flatten a Firestore trigger event into a plain object. Each field in
    // the event payload is wrapped in a type-keyed object (e.g.
    // { stringValue: "..." }), so we unwrap the first (only) key.
    const extractFirestoreDocumentFromEvent = (event) => {
        const data = {};
        for (const field in event.value.fields) {
            const fieldValueObjectName = Object.keys(event.value.fields[field])[0];
            data[field] = event.value.fields[field][fieldValueObjectName];
        }
        return data;
    }
    
    exports.publishToBigQuery = async event => {
        try {
            const firestoreData = extractFirestoreDocumentFromEvent(event);
            const topic = pubsub.topic('firestore-to-BQ');

            // Fetch the Pub/Sub schema and parse its Protocol Buffer
            // definition so the outgoing message can be shaped to match it.
            const schema = pubsub.schema('message');
            const info = await schema.get();

            const type = protobuf.parse(info.definition, { keepCase: true });
            const ProtocolBuffer = type.root.lookupType('ProtocolBuffer');
            const message = ProtocolBuffer.create(firestoreData);

            // Publish the JSON serialization; this assumes the topic was
            // created with JSON message encoding.
            const data = Buffer.from(JSON.stringify(message));
            const value = await topic.publishMessage({data});
            log.info(log.entry(`Document published on pubsub ${value}`));
        } catch (error) {
            log.error(log.entry(`Error when publishing document: ${error}`));
        }
    };
    
  6. Copy the following code into package.json:

    {
        "name": "sample",
        "version": "1.0.0",
        "dependencies": {
            "@google-cloud/logging": "^10.1.9",
            "@google-cloud/pubsub": "^3.1.0",
            "protobufjs": "^7.1.0"
        }
    }
    
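To see what the extractFirestoreDocumentFromEvent helper in index.js produces, here is a small standalone sketch you can run locally with Node (the sample event only mimics the shape of a Firestore create payload; the field values are made up):

```javascript
// Standalone copy of the event-flattening logic from index.js above.
// A Firestore trigger event wraps each field in a type-keyed object,
// e.g. { stringValue: "..." }; the helper unwraps the first (only) key.
const extractFirestoreDocumentFromEvent = (event) => {
    const data = {};
    for (const field in event.value.fields) {
        const fieldValueObjectName = Object.keys(event.value.fields[field])[0];
        data[field] = event.value.fields[field][fieldValueObjectName];
    }
    return data;
};

// Made-up example event resembling a Firestore document-create payload.
const sampleEvent = {
    value: {
        fields: {
            event_name: { stringValue: 'purchase' },
            user_id: { stringValue: 'user-42' },
        },
    },
};

console.log(extractFirestoreDocumentFromEvent(sampleEvent));
// → { event_name: 'purchase', user_id: 'user-42' }
```

The flattened object has exactly the keys the Pub/Sub schema expects, which is why it can be fed straight into ProtocolBuffer.create.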

Create data in Firestore, see data in columns

  1. In GCP, go to Firestore.

  2. Click the three-dot icon for the test document in the messages collection.

  3. Choose Add similar document from the popup menu.

  4. Click the SAVE button.

  5. In GCP, go to Logging > Logs Explorer.

    You should see the "Document published on pubsub" log entry written by the Cloud Function.

  6. In GCP, go to BigQuery.

  7. Select the messages table from the left navigation tree.

  8. Click the Query drop-down button, and choose In new tab to add a new editor tab.

  9. Type * between the words SELECT and FROM.

  10. Click the Run button.

    In the Query results section that appears, you should see the data you entered into Firestore, with event_name and user_id in their own columns.

Summary

That’s pretty much it; as long as your schema doesn’t change, you’re good to go. Currently, though, if you do need to change the schema, you will have to delete and recreate the Pub/Sub schema, topic, and BigQuery subscription.

Also, if the for loop at the beginning of the Cloud Function code gave you the chills, there is an alternative version of the code that fetches the document data directly from Firestore. It’s a little cleaner, and while it has the added cost of an extra Firestore read, it also deletes the document afterward, which keeps your Firestore tidy.
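A minimal sketch of that alternative (my reconstruction, not the original post's code): read the document back from Firestore, publish its data, then delete it. The schema lookup and ProtocolBuffer shaping from the main function are omitted for brevity, and the Firestore and topic clients are passed in as parameters so the core logic is easy to exercise on its own.

```javascript
// Hypothetical alternative: re-read the document instead of flattening the
// raw trigger payload, then delete it after publishing.
const makePublishToBigQuery = (firestore, topic) => async (event) => {
    // A Firestore trigger event names the document like:
    // "projects/<project>/databases/(default)/documents/messages/<id>"
    const docPath = event.value.name.split('/documents/')[1];

    const snapshot = await firestore.doc(docPath).get();
    const data = Buffer.from(JSON.stringify(snapshot.data()));
    await topic.publishMessage({ data });

    // Delete the source document to keep Firestore tidy.
    await firestore.doc(docPath).delete();
};
```

In the real function you would wire it up with the same clients as the code above, e.g. `exports.publishToBigQuery = makePublishToBigQuery(new Firestore(), pubsub.topic('firestore-to-BQ'))`.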

Thanks for reading and I hope this blog has helped you. Keep us in mind for all your mobile solution needs.