Google BigQuery Data Transfer activity¶
Introduction¶
A Google BigQuery Data Transfer activity, using its Google BigQuery connection, transfers data from a data source to a dataset in Google BigQuery and is intended to be used as a target to consume data in an operation.
Create a Google BigQuery Data Transfer activity¶
An instance of a Google BigQuery Data Transfer activity is created from a Google BigQuery connection using its Data Transfer activity type.
To create an instance of an activity, drag the activity type to the design canvas or copy the activity type and paste it on the design canvas. For details, see Create an activity instance in Component reuse.
An existing Google BigQuery Data Transfer activity can be edited from these locations:
- The design canvas (see Component actions menu in Design canvas).
- The project pane's Components tab (see Component actions menu in Project pane Components tab).
Configure a Google BigQuery Data Transfer activity¶
Follow these steps to configure a Google BigQuery Data Transfer activity:
- Step 1: Enter a name and select the data source
  Provide a name for the activity and select the data source.
- Step 2: Select the dataset
  Select the dataset.
- Step 3: Select the table
  Select the table.
- Step 4: Review the data schemas
  Any request or response schemas are displayed.
Step 1: Enter a name and select the data source¶
In this step, provide a name for the activity and select the data source. Each user interface element of this step is described below.
- Name: Enter a name to identify the activity. The name must be unique for each Google BigQuery Data Transfer activity and must not contain forward slashes (/) or colons (:).
- Select the Data Source: This section displays data sources available in the Google BigQuery endpoint, either Amazon S3 or Google Cloud Storage.
  - Selected Data Source: After a data source is selected, it is listed here.
  - Search: Enter any column's value into the search box to filter the list of data sources. The search is not case-sensitive. If data sources are already displayed within the table, the table results are filtered in real time with each keystroke. To reload data sources from the endpoint when searching, enter search criteria and then refresh, as described below.
  - Refresh: Click the refresh icon or the word Refresh to reload data sources from the Google BigQuery endpoint. This may be useful if data sources have been added to Google BigQuery. This action refreshes all metadata used to build the table of data sources displayed in the configuration.
  - Selecting a Data Source: Within the table, click anywhere on a row to select a data source. Only one data source can be selected. The information available for each data source is fetched from the Google BigQuery endpoint:
    - Name: The name of the data source.
    - Description: The description of the data source.
  Tip
  If the table does not populate with available data sources, the Google BigQuery connection may not be successful. Ensure you are connected by reopening the connection and retesting the credentials.
- Save & Exit: If enabled, click to save the configuration for this step and close the activity configuration.
- Next: Click to temporarily store the configuration for this step and continue to the next step. The configuration will not be saved until you click the Finished button on the last step.
- Discard Changes: After making changes, click to close the configuration without saving changes made to any step. A message asks you to confirm that you want to discard changes.
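The naming rule for the Name field in this step can be checked before a name is typed into the configuration. The following is a minimal sketch in Python, not part of the product: the function name and the set of existing names are hypothetical, and uniqueness is assumed to be an exact, case-sensitive comparison.

```python
def is_valid_activity_name(name: str, existing_names: set[str]) -> bool:
    """Check a candidate activity name against the documented rules.

    Documented rules: no forward slashes, no colons, and unique among
    Google BigQuery Data Transfer activities.
    Assumption: uniqueness is an exact, case-sensitive match.
    """
    if "/" in name or ":" in name:
        return False
    return name not in existing_names


# Hypothetical names already used by other Data Transfer activities.
existing = {"Transfer Orders"}

print(is_valid_activity_name("Transfer Invoices", existing))  # True
print(is_valid_activity_name("S3/Invoices", existing))        # False (contains "/")
print(is_valid_activity_name("Transfer Orders", existing))    # False (not unique)
```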
Step 2: Select the dataset¶
In this step, select the dataset. Each user interface element of this step is described below.
- Select the Dataset: This section displays datasets available in the Google BigQuery endpoint.
  - Selected Data Source: The data source selected in the previous step is listed here.
  - Selected Dataset: After a dataset is selected, it is listed here.
  - Search: Enter any column's value into the search box to filter the list of datasets. The search is not case-sensitive. If datasets are already displayed within the table, the table results are filtered in real time with each keystroke. To reload datasets from the endpoint when searching, enter search criteria and then refresh, as described below.
  - Refresh: Click the refresh icon or the word Refresh to reload datasets from the Google BigQuery endpoint. This may be useful if datasets have been added to Google BigQuery. This action refreshes all metadata used to build the table of datasets displayed in the configuration.
  - Selecting a Dataset: Within the table, click anywhere on a row to select a dataset. Only one dataset can be selected. The information available for each dataset is fetched from the Google BigQuery endpoint:
    - Name: The name of the dataset.
    - Description: The description of the dataset.
  Tip
  If the table does not populate with available datasets, the Google BigQuery connection may not be successful. Ensure you are connected by reopening the connection and retesting the credentials.
- Back: Click to temporarily store the configuration for this step and return to the previous step.
- Next: Click to temporarily store the configuration for this step and continue to the next step. The configuration will not be saved until you click the Finished button on the last step.
- Discard Changes: After making changes, click to close the configuration without saving changes made to any step. A message asks you to confirm that you want to discard changes.
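The Search behavior described in this step (case-insensitive filtering on any column's value) can be illustrated with a short Python sketch. This is an illustration only, not the product's implementation: the row data is made up, and substring matching is an assumption, since the text above only states that the search is not case-sensitive.

```python
def filter_rows(rows: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Keep rows where any column value contains the query, ignoring case.

    Assumption: matching is by substring; the documentation only states
    that the search is not case-sensitive.
    """
    needle = query.casefold()
    return [
        row
        for row in rows
        if any(needle in value.casefold() for value in row.values())
    ]


# Hypothetical datasets with the two documented columns.
datasets = [
    {"Name": "sales_eu", "Description": "European sales data"},
    {"Name": "hr_records", "Description": "Employee records"},
]

matches = filter_rows(datasets, "SALES")
print([row["Name"] for row in matches])  # ['sales_eu']
```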
Step 3: Select the table¶
In this step, select the table. Each user interface element of this step is described below.
Tip
Fields with a variable icon support using global variables, project variables, and Jitterbit variables. Begin either by typing an open square bracket ([) into the field or by clicking the variable icon to display a list of the existing variables to choose from.
- Select the Table: This section displays tables available in the Google BigQuery endpoint.
  - Selected Dataset: The dataset selected in the previous step is listed here.
  - Selected Table: After a table is selected, it is listed here.
  - Search: Enter any column's value into the search box to filter the list of tables. The search is not case-sensitive. If tables are already listed, the results are filtered in real time with each keystroke. To reload tables from the endpoint when searching, enter search criteria and then refresh, as described below.
  - Refresh: Click the refresh icon or the word Refresh to reload tables from the Google BigQuery endpoint. This may be useful if tables have been added to Google BigQuery. This action refreshes all metadata used to build the list of tables displayed in the configuration.
  - Selecting a Table: Within the list, click anywhere on a row to select a table. Only one table can be selected. The information available for each table is fetched from the Google BigQuery endpoint:
    - Name: The name of the table.
    - Description: The description of the table.
  Tip
  If the list does not populate with available tables, the Google BigQuery connection may not be successful. Ensure you are connected by reopening the connection and retesting the credentials.
- Display Name: Set a display name for the data transfer.
- Back: Click to temporarily store the configuration for this step and return to the previous step.
- Next: Click to temporarily store the configuration for this step and continue to the next step. The configuration will not be saved until you click the Finished button on the last step.
- Discard Changes: After making changes, click to close the configuration without saving changes made to any step. A message asks you to confirm that you want to discard changes.
Step 4: Review the data schemas¶
Any request or response schemas are displayed. Each user interface element of this step is described below.
- Data Schemas: These data schemas are inherited by adjacent transformations and are displayed again during transformation mapping.
  The Google BigQuery connector uses the Google SDK version 25.4.0. Refer to the SDK documentation for information on the schema nodes and fields.
  Important
  The value used for the custom_schedule field should be a string that follows the format defined in Google's documentation on scheduling jobs with cron.yaml, without prepending schedule: to the value. For example, every 12 hours or every monday 09:00.
  The Data Transfer activity uses JSON in both its request and response schemas.
- Refresh: Click the refresh icon or the word Refresh to regenerate schemas from the Google BigQuery endpoint. This action also regenerates a schema in other locations throughout the project where the same schema is referenced, such as in an adjacent transformation.
- Back: Click to temporarily store the configuration for this step and return to the previous step.
- Finished: Click to save the configuration for all steps and close the activity configuration.
- Discard Changes: After making changes, click to close the configuration without saving changes made to any step. A message asks you to confirm that you want to discard changes.
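The custom_schedule note in the Data Schemas item above can be sketched as a small Python helper that strips a leading schedule: prefix before the value is placed in the JSON request. This is an illustrative sketch only: the helper name is hypothetical, and the one-field JSON fragment does not represent the full request schema, which should be taken from the schemas displayed in this step.

```python
import json


def normalize_custom_schedule(value: str) -> str:
    """Return the schedule string without a leading 'schedule:' prefix.

    The documented requirement is that custom_schedule follows the
    cron.yaml schedule format without 'schedule:' prepended.
    """
    text = value.strip()
    prefix = "schedule:"
    if text.lower().startswith(prefix):
        text = text[len(prefix):].strip()
    return text


# Hypothetical fragment: only the custom_schedule field name comes from the
# documentation; the rest of the request structure is not shown here.
fragment = {"custom_schedule": normalize_custom_schedule("schedule: every 12 hours")}
print(json.dumps(fragment))  # {"custom_schedule": "every 12 hours"}
```

A value that already omits the prefix, such as every monday 09:00, passes through unchanged.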
Next steps¶
After configuring a Google BigQuery Data Transfer activity, complete the configuration of the operation by adding and configuring other activities, transformations, or scripts as operation steps. You can also configure the operation settings, which include the ability to chain together operations in the same or different workflows.
Menu actions for an activity are accessible from the project pane and the design canvas. For details, see Activity actions menu in Connector basics.
Google BigQuery Data Transfer activities can be used as a target with these operation patterns:
- Transformation pattern
- Two-transformation pattern (as the first or second target)
To use the activity with scripting functions, write the data to a temporary location and then use that temporary location in the scripting function.
When ready, deploy and run the operation and validate behavior by checking the operation logs.