Schema processing and data conversion in Jitterbit Integration Studio
Introduction
During processing, data in schemas may be converted. This page describes when and how data is converted depending on the schema type:
All schema types
In most cases (exceptions are noted below), certain data is converted the same way regardless of schema type:
Data types
These sections describe Integration Studio's support of certain data types by type of schema. For all data types supported in Jitterbit Scripts, see Data types in Jitterbit Script.
Unlimited-precision data types
For most schema types, unlimited-precision data types, such as XML `decimal`, are converted to `double` data types. This imposes a precision limit that could truncate data.
The precision limit falls within the range of a signed 32-bit integer, −2,147,483,648 to 2,147,483,647. If your data can fall outside of this range, consider using a `string` data type instead to avoid truncating data.
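To see why this matters, here is a minimal Python sketch (illustrative only, not Jitterbit code) showing a high-precision decimal value losing trailing digits when converted to an IEEE 754 double, and why a string avoids the problem:

```python
from decimal import Decimal

# A high-precision decimal value, e.g. from an XML "decimal" field.
original = Decimal("1234567890123456789.123456")

# A double (Python's float) keeps only ~15-17 significant digits,
# so the conversion silently truncates the value.
as_double = float(original)

print(original)   # 1234567890123456789.123456
print(as_double)  # trailing digits are lost

# Keeping the value as a string preserves it exactly.
as_string = str(original)
assert Decimal(as_string) == original
```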
Non-primitive data types
The primitive data types `boolean`, `date`, `double`, `float`, `integer`, `long`, and `string`, and the non-primitive data type `decimal`, are supported in all schema types. When creating or editing a custom flat or hierarchical schema, or editing any schema uploaded in a transformation, these data types are available to choose in the Type dropdown. For new custom schemas, other non-primitive data types such as `datetime` are not supported.
However, other non-primitive data types are supported in schemas that are automatically generated by a connector or mirrored from such a schema. After generation, these schemas can be manually edited in the custom schema editor. If a schema contains any non-primitive data types, these are also listed in the Type dropdown when editing it.
Names of fields and nodes
Names of fields and nodes are converted to valid namespaces following the XML standard. For reference, see NCNameChar for the allowed characters as defined by the W3C.
When using a connector-provided schema, any special characters in a schema field or node name are replaced by underscores (`_`) as required to create a valid XML schema.
Note
JSON schemas follow additional rules in certain circumstances (described below).
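As an illustration, the name conversion described above can be sketched in Python. This is a simplified, ASCII-only approximation of the W3C NCName rules, not Jitterbit's actual implementation:

```python
import re

def to_ncname(name: str) -> str:
    """Replace characters not valid in an XML NCName with underscores
    (simplified ASCII approximation of the NCNameChar production)."""
    sanitized = re.sub(r"[^A-Za-z0-9_.\-]", "_", name)
    # An NCName may not start with a digit, '.', or '-'.
    if re.match(r"[^A-Za-z_]", sanitized):
        sanitized = "_" + sanitized
    return sanitized

print(to_ncname("location_ids[]"))  # location_ids__
print(to_ncname("order total"))     # order_total
```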
Null values
Fields with null values are included in the resulting data schema despite having no data. As they also have no defined data type, these fields are treated as having a string data type.
Note
This applies to all schema types except JSON schemas processed using the Preserve JSON names processing (described below).
CSV schemas
When using a CSV schema, these rules apply to the following data:
CSV files with headers
When providing a CSV file with a header row as a sample schema file, these rules are applied to generate column names:
- Special characters are replaced with a question mark (`?`).
- Spaces are replaced with an underscore (`_`).
- Blank column names are replaced with `f1`, `f2`, `f3`, and so on.
- Column names starting with a number are prefixed with an underscore (`_`).
- Repeated column names are appended with `2`, `3`, `4`, and so on.
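These rules can be sketched in Python as a hypothetical illustration of the behavior described above (not Jitterbit's actual implementation):

```python
import re

def clean_headers(headers):
    """Apply the CSV header-row naming rules described above."""
    result = []
    blank_count = 0
    seen = {}
    for name in headers:
        name = name.replace(" ", "_")                # spaces -> underscore
        name = re.sub(r"[^A-Za-z0-9_]", "?", name)   # special chars -> ?
        if not name:                                 # blanks -> f1, f2, ...
            blank_count += 1
            name = f"f{blank_count}"
        if name[0].isdigit():                        # leading digit -> prefix _
            name = "_" + name
        if name in seen:                             # repeats -> append 2, 3, ...
            seen[name] += 1
            name = f"{name}{seen[name]}"
        else:
            seen[name] = 1
        result.append(name)
    return result

print(clean_headers(["Order Date", "", "Total$", "id", "id"]))
# ['Order_Date', 'f1', 'Total?', 'id', 'id2']
```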
Special characters
Special characters (`.` `_` `-` `$` `#` `@` `?`) in CSV schemas are converted in these circumstances:
- Special characters present on the source or target side of a transformation are replaced by a question mark (`?`).
- Special characters in a script on a target field in a transformation are replaced by a question mark (`?`). This rule also applies to any custom flat or custom hierarchical schema created in CSV format.
Note
These conversions affect only the schema structures, not the actual data.
JSON schemas
When using a JSON schema, data conversion depends on the Preserve JSON names project setting:
- Preserve JSON names processing: The default method when Preserve JSON names is enabled. This applies to projects created after the 11.48 Harmony release and running on agent version 11.48 or later.
- Legacy JSON processing: The default method when Preserve JSON names is disabled. This applies to projects created before the 11.48 Harmony release or running on agent version 11.47 or earlier. After upgrading to agent version 11.48 or later, you can switch to Preserve JSON names processing by enabling the Preserve JSON names project setting. However, this is not recommended for existing projects.
Not recommended for existing projects
For projects created before the 11.48 Harmony release, enabling Preserve JSON names only affects operations and schemas configured after activation. This creates inconsistencies when mixed with the existing Legacy JSON processing.
Recommendation: Keep using Legacy JSON processing for existing projects, or regenerate all operations and schemas after enabling this feature to avoid inconsistencies.
The following table summarizes the key differences between the two JSON processing methods:
| Method | Preserve JSON names processing | Legacy JSON processing |
|---|---|---|
| When used | Preserve JSON names is enabled | Preserve JSON names is disabled |
| Project compatibility | Projects created after the 11.48 Harmony release and running agent version 11.48 or later | Projects created before the 11.48 Harmony release or running agent version 11.47 or earlier |
| Field names starting with numbers | Prepended with an underscore (for example, `1text` becomes `_1text`) | Leading number replaced with an underscore (for example, `12345` becomes `_2345`) |
| Special characters in field names | Converted at design time only; original names used at runtime | Converted at design time and runtime |
| Null value handling | Null fields excluded from the resulting schema | Null fields included in the resulting schema as `string` type |
Preserve JSON names processing
When using Preserve JSON names processing (when Preserve JSON names is enabled), these rules apply:
- Data types (as described above)
- Names of fields and nodes
- Null values
Names of fields and nodes
When using Preserve JSON names processing, names of fields and nodes are converted in these circumstances:
- The field and node name processing described in All schema types applies only during design time. At runtime, the original field or node name, including special characters, is used. For example, the field name `location_ids[]` appears as `location_ids__` in the transformation during design time, but at runtime the original `location_ids[]` is used. To verify this behavior while troubleshooting, see Special characters in connector-provided JSON schemas in Operation troubleshooting.
- When a JSON schema's field or node name begins with a number, it is prepended with an underscore (`_`). For example, `1text` becomes `_1text`.
Null values
Fields or nodes with null values are not included in the resulting data schema.
Legacy JSON processing
When legacy JSON processing is used (when Preserve JSON names is disabled), these rules apply:
- Data types (as described above)
- Names of fields and nodes (as described above)

  Caution

  In addition to the field and node name processing described in All schema types, when using Legacy JSON processing, the leading number of a JSON schema's field or node name that begins with a number is replaced with an underscore (`_`). For example, `12345` becomes `_2345`. This issue doesn't occur with Preserve JSON names processing.

- Null values (as described above)
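The difference in number handling between the two methods can be sketched in Python (illustrative only, not actual Jitterbit code):

```python
def preserve_json_names(name: str) -> str:
    # Preserve JSON names processing: prepend an underscore
    # when the name begins with a number.
    return "_" + name if name[0].isdigit() else name

def legacy_json(name: str) -> str:
    # Legacy JSON processing: the leading number is replaced
    # with an underscore, losing a character.
    return "_" + name[1:] if name[0].isdigit() else name

print(preserve_json_names("1text"))  # _1text
print(legacy_json("12345"))          # _2345
```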
XML schemas
When using an XML schema, these rules apply:
For information on troubleshooting XML schema and transformation errors, see Operation troubleshooting.
Namespaces
Namespaces in XML schemas are supported. If more than one namespace is used in an XML schema, the Jitterbit agent converts the XML schema to multiple XSD files during processing.
Self-closing tags
Self-closing tags on elements in XML schemas are supported with some manipulation of the sample XML used to generate the schema. To send the XML element in the payload without a mapped value, you can use a Jitterbit function and Jitterbit variable as described below.
Manipulate the sample XML
Manipulation is required when an XML sample file uses shorthand notation for an element's opening and closing tags, such as `<tag/>`.
If such shorthand is used in an XML schema directly, the node cannot be mapped to when the schema is used as the target of a transformation.
To resolve this, edit the sample XML used to generate the schema: expand the element's opening and closing tags and provide a sample value so that a data type is assigned to the element, for example `<tag>example</tag>`.
The element will then show up as a field in the XML schema and, when used as the target schema in a transformation, you can map to that field.
Map to the XML field
If you do not have a source object or variable to map to the target field, you can use the Jitterbit function `Null` as the mapped value in the transformation script:

```
<trans>
Null()
</trans>
```
Upstream of the transformation using the XML schema, set one of these Jitterbit variables (depending on which is appropriate to your use case) to control what is sent:
- Empty XML: To send an empty XML node in the payload, use `jitterbit.target.xml.include_empty_xml`:

  `$jitterbit.target.xml.include_empty_xml = true;`

- Null XML: To send a nil value in the XML payload, use `jitterbit.target.xml.include_null_xml`:

  `$jitterbit.target.xml.include_null_xml = true;`

- Exclude Empty XML: To exclude an empty XML node with a boolean data type, use `jitterbit.target.xml.exclude_empty_data`:

  `$jitterbit.target.xml.exclude_empty_data = true;`
Special characters
Special characters in XML schemas are converted in these circumstances:
- Special characters in a script on a target field in a transformation are replaced by a question mark (`?`).
- These special characters in an XML schema field or node name are not supported, as XML doesn't allow them: `$` `#` `@` `?`
Note
These conversions are limited to the schema structures only and do not affect the actual data.
Troubleshoot schema processing
Known issues
Blank mapped fields with flat source schemas
When using a flat source schema, target fields may not map correctly and appear blank in certain circumstances. This issue doesn't occur with mirrored schemas or JSON schemas.
Workaround: Add a script at the beginning of the operation to disable streaming transformations:

`$jitterbit.transformation.auto_streaming = false;`
Optimize memory usage
For transformations that process large files and cannot use streaming, you can configure chunking to reduce memory usage.
Configure chunking
Chunking divides large data sets into smaller pieces for processing. Configure chunking at the operation level:
1. Open the operation settings from one of these locations:

   - Project pane's Workflows tab
   - Project pane's Components tab
   - Design canvas (double-click the operation)

2. Select the Options tab.

3. Set the chunk size as large as possible while ensuring each chunk fits in available memory.
For very large XML sources and targets, chunking may be your only option if streaming isn't applicable.
Note
Use streaming transformations when possible. Chunking should only be used when streaming isn't available and memory usage is a concern.
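The concept of chunking can be illustrated with a short Python sketch. This models the general idea of splitting a large record set into fixed-size pieces; in Jitterbit itself, the chunk size is set in the operation's Options tab rather than in script:

```python
def chunked(records, chunk_size):
    """Yield successive fixed-size slices of a record list so that
    each slice can be processed within available memory."""
    for i in range(0, len(records), chunk_size):
        yield records[i:i + chunk_size]

rows = list(range(10))
for chunk in chunked(rows, 4):
    print(chunk)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]
```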
Prevent duplicate file transfers
Harmony tracks processed files to prevent duplicate transfers. Before transferring a file, the system checks these three criteria:
- Filename
- Modification date
- Operation ID
If all three criteria match a previously processed file, the system skips the transfer.
This automatic deduplication ensures that rerunning operations doesn't reprocess unchanged files.
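A sketch of this deduplication check, as hypothetical Python modeling the three criteria described above (not actual Harmony code):

```python
# Set of (filename, modification date, operation ID) keys already seen.
processed = set()

def should_transfer(filename, modified, operation_id):
    """Return False (skip) when all three criteria match a
    previously processed file; otherwise record it and transfer."""
    key = (filename, modified, operation_id)
    if key in processed:
        return False
    processed.add(key)
    return True

print(should_transfer("orders.csv", "2024-05-01T10:00", "op-1"))  # True
print(should_transfer("orders.csv", "2024-05-01T10:00", "op-1"))  # False
print(should_transfer("orders.csv", "2024-05-02T09:00", "op-1"))  # True
```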