Development and deployment using TML and REST API

When deploying embedded analytics, each organization has its own defined practices for developing, testing, and deploying content to ThoughtSpot. A ThoughtSpot instance acts as a constantly running service, so deployment only involves publishing ThoughtSpot content, in the form of ThoughtSpot Modeling Language (TML) files, to a given ThoughtSpot instance.

Note
Every example workflow in this document has been implemented and tested using the libraries listed in Additional Resources. We recommend that you start with these libraries and tools.

Overview

The three steps to building an SDLC process with ThoughtSpot are:

  1. Export source: Downloading ThoughtSpot objects as TML files into a source control system (for example, Git)

  2. Build release: Altering copies of the TML files for the next stage / environment

  3. Import release: Importing the TML files into the new environment

Every object on a ThoughtSpot instance has a GUID as a unique reference.

The most essential aspect of steps 2 and 3 is recording any newly created object GUIDs from the destination environment into a mapping file along with the GUID of the source object.

When publishing to a new environment on the same ThoughtSpot instance, you must swap out the GUIDs from the source environment with those of the equivalent objects in the destination environment within the TML files, so that only destination environment content is referenced.

Deployment scenarios

Instance-to-instance deployment

The simplest deployment scenario is moving content from one ThoughtSpot instance to another separate instance.

On all instances running 9.0.0.cl or later, which includes all ThoughtSpot Cloud deployments, the TML Import process uses the guid: property of the imported TML files as the GUID for the new objects on the destination instance.

This means that neither a mapping file nor GUID swapping is required. As long as all Connections have the same unique names on both instances, TML files should import without any modifications.

If your instance is running a ThoughtSpot release version lower than 9.0.0.cl, refer to the Notes for older releases.

Multiple environments on the same instance

Many ThoughtSpot customers have multiple "environments" on the same instance, either using Orgs or well-defined Access Control.

In this scenario, you must track the equivalent GUIDs between source and destination environments, and swap them out within the TML files for your deployment process to work correctly.

The workflow shown here for a very simple "dev" to "prod" flow on the same instance follows the same pattern as any source-to-destination environment flow:

Development and deployment workflow

GUID mapping file

As noted above, keeping a mapping file of GUIDs of source objects and their descendant objects in the destination environment is essential. The exact structure of the file will depend on the complexity of your deployment needs.

The simplest pattern is to assume that releases are built exclusively from the "dev" environment, regardless of the destination environment. This pattern can be represented in a simple JSON structure, keyed first by destination environment and then by dev GUID:

{
  "test": {
    "<dev-env-guid>" :  "<test-env-guid>"
  },
  "uat": {
    "<dev-env-guid>" :  "<uat-env-guid>"
  },
  "prod": {
    "<dev-env-guid>" :  "<prod-env-guid>"
  }
  ...
}

More complex data structures can be used if you have the possibility of multiple source environments. The important aspect is recording the source GUID along with the GUID of the equivalent object in the destination.

Note that this pattern assumes 1:1 equivalence between the source and destination environment.
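A mapping file in this shape can be read and queried with a few lines of Python. The following is a minimal sketch only: the helper names load_guid_map, lookup_guid, and save_guid_map are hypothetical and not part of any ThoughtSpot library, and the file location is whatever your own process chooses.

```python
import json
from pathlib import Path
from typing import Optional


def load_guid_map(path: str) -> dict:
    """Read the mapping file; return an empty map if it does not exist yet."""
    file = Path(path)
    if not file.exists():
        return {}
    return json.loads(file.read_text(encoding="utf-8"))


def lookup_guid(guid_map: dict, env: str, dev_guid: str) -> Optional[str]:
    """Return the destination GUID for a dev GUID, or None if it was never published."""
    return guid_map.get(env, {}).get(dev_guid)


def save_guid_map(guid_map: dict, path: str) -> None:
    """Write the mapping back to disk with sorted keys so that Git diffs stay small."""
    Path(path).write_text(
        json.dumps(guid_map, indent=2, sort_keys=True), encoding="utf-8"
    )
```

A lookup that returns None signals that the object has not yet been published to that environment.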

Export source process

The process for exporting TML files into source control is:

  1. Use Metadata APIs (/metadata/list in v1 or /metadata/search in v2.0) to get a filtered list of objects

  2. Use the /metadata/tml/export endpoint in REST API v1 or v2.0 with the export_fqns=true and formattype=YAML arguments to retrieve the TML of each object

  3. Save the TML response strings to disk in a Git-enabled directory using a consistent name format

You can use the CS Tools Scriptability package as a pre-built tool for programmatic exporting, or build your own equivalent using the thoughtspot_rest_api_v1 Python library.
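If you build your own export step, the save-to-disk part can be sketched as follows. This is an illustration, not the behavior of any ThoughtSpot tool: it assumes the export response has the shape shown in Notes for older releases (an object array whose elements carry edoc and info), and the helper names and the file-name format are hypothetical choices.

```python
import re
from pathlib import Path


def tml_filename(guid: str, name: str) -> str:
    """Build a file name from the GUID (unique) plus a sanitized display name."""
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
    return f"{guid}.{safe_name}.tml"


def save_export_response(response: dict, directory: str) -> list:
    """Write each exported TML document in the response to its own file."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in response.get("object", []):
        info = item["info"]
        path = out_dir / tml_filename(info["id"], info["name"])
        path.write_text(item["edoc"], encoding="utf-8")
        written.append(str(path))
    return written
```

Putting the GUID first keeps file names unique even when two objects share a display name.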

Best practices with TML export API

The formattype argument can be set to YAML or JSON.

Export in YAML for saving to disk for source control or use with the thoughtspot_tml library. Export in JSON when you need details from TML within a web browser or just need to read values programmatically.

You can pass any number of GUIDs in the export_ids argument, although it is simpler to retrieve one at a time, particularly when processing the results obtained from the export_associated=true option.

The export_associated argument retrieves the TML objects for all related objects when used, including the GUID of each object within the headers. This is useful for dependency checking, and was valuable in versions lower than 8.9.0.cl to fill in fqn values. For more information, see Notes for older releases.

Build release process

To change the source environment TML files so that they can be imported into the destination environment, you need a process that correctly manipulates the TML files.

Common adjustments include:

  • Switching connections at the Table level

  • Changing database details within Table objects

  • Adding or removing columns

  • Renaming columns for translations

For information about the specific TML changes to achieve these goals, see Modify TML files. There are also functioning code examples of many of these changes in the thoughtspot_tml repository.

GUIDs in TML files determine create vs update operations

Objects of the same or different types can have the same display name in ThoughtSpot, so the GUID is necessary to identify the particular object.

In the REST APIs, id properties are the GUIDs.

In TML:

  • the guid: property will be at the top of the file

  • fqn: properties are used to reference other connected objects (typically data sources) with a GUID

Rules for create vs. update operations

Object names are never used for determining an object to update, because object names are not unique within ThoughtSpot.

Whether an imported TML will create a new object or update an existing object depends on:

  • the presence/absence of the guid: property in the TML file

  • whether that GUID matches an existing object on that ThoughtSpot instance

  • the force_create=true parameter

Creation vs. update is determined by the following rules:

  • No GUID in the TML file: always creates a new object with a new GUID

  • GUID in TML file, where an object with the same GUID already exists in instance: update object

  • GUID in TML file, where no object with same GUID exists in ThoughtSpot instance: creates a new object with the GUID from the TML file

  • Table objects match on the fully qualified table in the database rather than on GUID, because each Connection can have only one Table object per database table: if a Table object representing the same database table is found, that object keeps its original GUID and the updates from the new TML file are applied to it

  • force_create=true parameter of the TML Import API is used: every uploaded TML file results in new objects being created

Note

In versions prior to 9.0.0.cl, ThoughtSpot did not consistently use the GUID provided in the TML file for a new object when that GUID was not already in use on that ThoughtSpot instance.
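The rules above can be summarized as a small decision function, which is useful for logging what a release is expected to do before you import it. This is a sketch under stated assumptions: predict_import_action and its parameters are hypothetical names, and the order in which the checks are applied (force_create first, then Table matching, then GUID matching) is an assumption rather than documented ThoughtSpot behavior.

```python
from typing import Optional, Set


def predict_import_action(
    tml_guid: Optional[str],
    existing_guids: Set[str],
    force_create: bool = False,
    is_table: bool = False,
    table_exists_in_connection: bool = False,
) -> str:
    """Return 'create' or 'update' for one TML file, following the listed rules."""
    # force_create: every uploaded TML file results in a new object.
    if force_create:
        return "create"
    # Table objects match on the database table, not on the GUID.
    if is_table and table_exists_in_connection:
        return "update"
    # No GUID in the file: always a new object with a new GUID.
    if tml_guid is None:
        return "create"
    # GUID present: update if it already exists, otherwise create with that GUID.
    if tml_guid in existing_guids:
        return "update"
    return "create"
```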

GUID mapping and swapping

Regardless of the other changes you make, building a release for an environment on the same instance will require swapping in the correct GUIDs. Because the presence of the guid property determines whether an individual TML file will cause a create or update action, you need to keep a GUID mapping file to determine how to adjust the TML files for upload to the new environment.

The GUID mapping file is referenced when creating the final TML files for publishing, and should then be updated with any new object GUIDs after publishing:

  1. Check the GUID mapping file

    1. If no key-value pair exists for the dev GUID for the new environment: remove the guid property from the TML file. This will cause a create action

    2. If a key-value pair exists: swap the TML file guid value from the dev GUID to the destination environment GUID. This will cause an update action

  2. When a new object is published for the first time, record the dev GUID as the key, and the new object GUID as the value

  3. Perform the same process for any fqn properties, which specify data object references. Remove the fqn property if the data object is being newly created, or swap it to the mapped GUID for that environment

The thoughtspot_tml library provides a helper function called disambiguate() which implements the logic described above when provided with a Dict representing the GUID map. For information about how to use the library, see the README and examples or look at the source code if building an equivalent process yourself in another language.
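For readers building an equivalent process themselves, the swap logic can be sketched on a TML document that has already been parsed into nested dicts and lists (for example, by a YAML parser). This is not the implementation of disambiguate(); apply_guid_map is a hypothetical helper, and env_map stands for the dev-to-destination GUID pairs of a single environment.

```python
import copy


def apply_guid_map(tml: dict, env_map: dict) -> dict:
    """Return a copy of a parsed TML document prepared for one destination environment."""
    result = copy.deepcopy(tml)

    # Top-level guid: swap when mapped (update), remove when unmapped (create).
    if "guid" in result:
        if result["guid"] in env_map:
            result["guid"] = env_map[result["guid"]]
        else:
            del result["guid"]

    def swap_fqns(node):
        """Walk the document and swap or remove every fqn property."""
        if isinstance(node, dict):
            fqn = node.get("fqn")
            if fqn is not None:
                if fqn in env_map:
                    node["fqn"] = env_map[fqn]
                else:
                    del node["fqn"]
            for value in node.values():
                swap_fqns(value)
        elif isinstance(node, list):
            for item in node:
                swap_fqns(item)

    swap_fqns(result)
    return result
```

The function works on a deep copy, so the TML exported from the dev environment stays unchanged and can be reused to build releases for other environments.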

Import release process

The /metadata/tml/import REST API endpoint is used to upload any number of TML files at one time.

All details of the objects to be created or modified are specified within the uploaded TML file, including the GUID which determines which existing object a given TML file will update.

The Build release process section above describes the process for getting the TML files prepared for the import release process. The following describes the Import TML REST API call and what to do with the responses, which do feed back into the build release process in the form of the GUID mapping file.

TML import options and responses

ThoughtSpot does not use an object's display name to decide which object a TML file updates, but it does use name matching to resolve data object references within a TML file.

All data objects are referenced as "tables" within TML, whether they are a ThoughtSpot table, Worksheet, View, SQL view, or any other data object type.

The following heuristic is used to find matching objects by name within tables or joins sections:

  1. Among the data objects within the same TML Import operation: there must be exactly one object with that name

  2. Across the entire ThoughtSpot instance: there must be exactly one object with that name

The best practice is to create and upload "packages" of related objects together at once:

  • Give data objects within a package unique names, even though not enforced by ThoughtSpot

  • All Table objects that use the same Connection object and all Worksheets connected to those tables should be uploaded together in a single TML Import

  • If a data object already exists, swap out the fqn references to avoid the name matching heuristic
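Because ThoughtSpot does not enforce unique names, a pre-flight check on each package can catch collisions before the import runs. The following is a minimal sketch; find_duplicate_names is a hypothetical helper that takes the names of the data objects you are about to upload.

```python
from collections import Counter


def find_duplicate_names(data_object_names: list) -> list:
    """Return the names that appear more than once in a package, sorted for stable output."""
    counts = Counter(data_object_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means every data object name in the package is unique, so name matching within that import cannot be ambiguous.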

Storing new GUIDs in a mapping

To track relationships between objects in different environments, particularly on the same instance, you must store a mapping of the child object GUID to its source object GUID when you first publish the child object.

The import REST API endpoint returns the GUID in the response after a successful import. The object key of the response to the import call contains an array, where each element has a ["response"]["header"]["id_guid"] key providing the GUID. If you import multiple TML files at once, the response array will be in the same order as the request. This allows you to record a mapping of the originating GUID to the newly created GUIDs.

{
  "object": [
    {
      "response": {
        "status": {
          "status_code": "OK"
        },
        "header": {
          "id_guid": "a09a3787-e546-42cb-888f-c17260dd1229",
          "name": "Basic Answer 1",
          "description": "This is basic answer with table and headline visualizations.",
          "author_guid": "59481331-ee53-42be-a548-bd87be6ddd4a",
          "owner_guid": "a09a3787-e546-42cb-888f-c17260dd1229",
          "metadata_type": "QUESTION_ANSWER_BOOK"
        }
      }
    }
  ]
}

Update the mapping file with the new pair of source object GUID and destination environment object GUID, so that the release build process can do the appropriate swaps the next time the object needs to be updated.
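Recording the new pairs can be sketched as follows, relying on the response array being in the same order as the request. The function name record_new_guids is hypothetical, and skipping every entry whose status_code is not "OK" is an assumption made for this sketch; adjust it to however your process treats warnings and errors.

```python
def record_new_guids(
    guid_map: dict, env: str, source_guids: list, import_response: dict
) -> dict:
    """Pair each source GUID with the GUID at the same position in the import response."""
    results = import_response["object"]
    if len(results) != len(source_guids):
        raise ValueError("response length does not match the number of imported files")
    env_map = guid_map.setdefault(env, {})
    for source_guid, result in zip(source_guids, results):
        response = result["response"]
        # Assumption: only record objects whose import reported "OK".
        if response["status"]["status_code"] != "OK":
            continue
        env_map[source_guid] = response["header"]["id_guid"]
    return guid_map
```

Here source_guids is the list of dev GUIDs in the same order as the TML files were sent in the import request.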

Additional Resources

  • The thoughtspot-tml module is a Python library that provides classes for working with TML files as Python objects. You can install it via pip:

    pip install thoughtspot_tml

  • The thoughtspot-rest-api-v1 module is a Python module implementing the full ThoughtSpot V1 REST API. You can install it via pip:

    pip install thoughtspot_rest_api_v1

  • The ts_rest_api_and_tml_tools project provides examples of workflows that use the REST API and TML modification with the thoughtspot_tml and thoughtspot_rest_api_v1 modules. This library is intended to provide working examples and is not maintained or supported by ThoughtSpot.

  • The examples/tml_and_sdlc/ directory includes many different example scripts for these TML-based workflows.

    Within the examples directory, the tml_download.py script is a simple example of exporting all TML objects to disk for use with Git or another source control system.

  • For command-line administration tools including many pre-built TML-based workflows, the cs_tools project is available.

Notes for older releases (8.9.0.cl or earlier versions)

Add FQNs of associated objects in TML

Prior to ThoughtSpot 8.9.0.cl, TML files did not include the GUIDs of associated objects on export. However, you can use the export_associated=true argument to retrieve the GUIDs of the associated objects, then programmatically add the fqn property to the downloaded TML with the correct GUIDs. Including the GUIDs in the saved files on disk allows you to substitute the GUIDs for the equivalent objects in another environment.

For example, in these earlier versions, the items in the tables: list of a worksheet TML only include a name: property, representing the name of the ThoughtSpot table object (as opposed to the table's name in the data warehouse).

If there are table objects with duplicate names, specify the GUID of the object using the fqn: property. This will distinguish the correct object when importing the TML back.

When you set export_associated=true in the TML export command, the first item in the response will be the object you requested in the export:

guid: 0a0bb654-b0e8-482c-a6c8-9ed396d1cb92
worksheet:
  name: Markspot 2 Worksheet
  tables:
  - name: DIM_CUSTOMERS_2
  table_paths:
  - id: DIM_CUSTOMERS_2_1
    table: DIM_CUSTOMERS_2
    join_path:
    - {}
...

The overall response is a JSON object whose object property holds an array. Each element has an edoc property containing the TML document itself and an info section with basic metadata, most importantly the name and id properties.

{
  "object": [
    {
      "edoc": "<string of the TML doc>",
      "info": {
        "id": "<object guid>",
        "name": "<object name>",
        ...
      }
    },
    ...
  ]
}

Parse through this array and record a simple mapping of name to guid:

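Python example of this process
# objs is the parsed "object" array from the export response shown above
name_guid_map = {}

for obj in objs:
    # Map each object's display name to its GUID
    name_guid_map[obj['info']['name']] = obj['info']['id']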

Because we know that these are the GUIDs that match the name values in this particular TML file, we can now use the map we created to add in the fqn properties, to result in the worksheet TML looking like this:

guid: 0a0bb654-b0e8-482c-a6c8-9ed396d1cb92
worksheet:
  name: Markspot 2 Worksheet
  tables:
  - name: DIM_CUSTOMERS_2
    fqn: 3b87cea1-7767-4fd8-904f-23255d4ba7b3
  table_paths:
  - id: DIM_CUSTOMERS_2_1
    table: DIM_CUSTOMERS_2
    join_path:
    - {}
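Adding the fqn properties can be sketched on a worksheet TML that has been parsed into nested dicts and lists (for example, by a YAML parser). add_table_fqns is a hypothetical helper, and it only handles the worksheet tables: list shown in this example.

```python
import copy


def add_table_fqns(tml: dict, name_guid_map: dict) -> dict:
    """Return a copy of a parsed worksheet TML with fqn added to each tables entry."""
    result = copy.deepcopy(tml)
    for table in result.get("worksheet", {}).get("tables", []):
        guid = name_guid_map.get(table["name"])
        # Leave existing fqn values alone and skip names missing from the map.
        if guid is not None and "fqn" not in table:
            table["fqn"] = guid
    return result
```

Entries whose name is not in the map are left unchanged, so they fall back to name matching on import.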