Dremio Cloner

Dremio Cloner is a Python-based utility for Dremio Enterprise. It supports the following commands: get, put, cascade-acl, report-acl, and report-reflections.

Dremio Cloner can be utilized for:

  • Migrating entire Dremio environments, for example, from community edition to enterprise edition
  • CI/CD processes
  • Disaster Recovery scenarios
  • Partial backup/restore
  • Security Audit reporting
  • Reflection reporting

Dremio Cloner is executed with the following command:

python dremio_cloner.py [config_file.json]

Dremio Migration Tool helps migrate spaces and folders to new paths. It does this by reading a Dremio Cloner export, modifying it, and writing it to a new directory or file. It also rewrites and reformats the SQL queries; note that SQL comments may be lost in the process. The resulting file or directory can then be read by Dremio Cloner and written into a destination system.

The Migration Tool is executed with the following command:

python dremio_migration.py [config_migration_file.json]

Disclaimer

All scripts/code are run at your own risk, and while they have been written with the intention of minimizing the potential for unintended consequences, Dremio will not be responsible for any errors or problems. The scripts/code are provided by Dremio "as is" and any express or implied warranties are disclaimed by Dremio. In no event will Dremio be liable for any direct, indirect, incidental, special, exemplary, or consequential damages, or any loss of use or data, however caused and on any theory of liability, arising in any way out of the use of the scripts/code, even if advised of the possibility of such damage.

Prerequisites

Dremio Cloner requires Python 3 and some additional Python libraries. Please install:

$ pip install mo-sql-parsing requests

If you are using the Dremio Migration Tool, you additionally need to install sqlparse:

$ pip install sqlparse

Important note

Older versions of Dremio Cloner used the Python package moz-sql-parser, which is now deprecated and has been replaced by mo-sql-parsing. If you ran an older version of Dremio Cloner before, uninstall the old packages before installing mo-sql-parsing.

$ pip list
...
mo-dots            4.22.21108
mo-future          3.147.20327
mo-imports         3.149.20327
mo-kwargs          4.22.21108
mo-logs            4.23.21108
moz-sql-parser     4.40.21126

$ pip uninstall -y moz-sql-parser mo-dots mo-future mo-imports mo-kwargs mo-logs 
$ pip install mo-sql-parsing requests

Command "get"

Command "get" selectively saves definitions for objects such as Source, Space, Folder, PDS, VDS, ACLs, Reflections, Queues, Rules, Tags, Wikis, and Votes from a Dremio environment into a JSON file.

The command is configured with a JSON file with the configuration attributes listed below. For a detailed description of the configuration JSON attributes, see the Reference section in Appendix 1 below.

  • "command":"get"
  • "source": defines source Dremio Environment with
    • "endpoint"
    • "username"
    • "password"
    • "verify_ssl"
    • "is_community_edition"
    • "graph_api_support"
    • "is_dremio_cloud"
    • "dremio_cloud_org_id"
    • "dremio_cloud_project_id"
    • "dremio_cloud_source_catalog_name"
  • "target": defines an output filename or a directory with
    • "filename"
    • "directory"
    • "overwrite"
    • "separate_sql_and_metadata_files"
  • "options":
    • logging options
      • "logging.level"
      • "logging.format"
      • "logging.filename"
      • "logging.verbose"
    • miscellaneous options
      • "max_errors"
      • "http_timeout"
    • scope of Space processing
      • "space.process_mode"
      • "folder.process_mode"
      • "space.filter"
      • "space.filter.names"
      • "space.exclude.filter"
      • "space.folder.filter"
      • "space.folder.filter.paths"
      • "space.folder.exclude.filter"
      • "space.folder.exclude.filter.paths"
    • scope of Source processing
      • "source.process_mode"
      • "source.filter"
      • "source.filter.names"
      • "source.filter.types"
      • "source.exclude.filter"
      • "source.folder.filter"
      • "source.folder.filter.paths"
      • "source.folder.exclude.filter"
    • scope of PDS processing
      • "pds.process_mode"
      • "pds.filter"
      • "pds.filter.names"
      • "pds.exclude.filter"
      • "pds.list.useapi"
    • scope of VDS processing
      • "vds.process_mode"
      • "vds.filter"
      • "vds.filter.names"
      • "vds.exclude.filter"
      • "vds.exclude.filter.paths"
      • "vds.dependencies.process_mode"
    • scope of Reflection processing
      • "reflection.process_mode"
      • "reflection.id_include_list"
      • "reflection.only_for_matching_vds"
    • scope of Workload Management processing
      • "wlm.queue.process_mode"
      • "wlm.rule.process_mode"
    • scope of processing other objects
      • "user.process_mode"
      • "group.process_mode"
      • "wiki.process_mode"
      • "tag.process_mode"
      • "home.process_mode"
      • "vote.process_mode"

Please see a sample JSON configuration file in the config folder of this repository.
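
As an illustration only, a minimal "get" configuration built from the attributes above might look like the following. The endpoint, credentials, and output directory are placeholders, and the exact layout of the shipped samples may differ; treat the files in the config folder of this repository as authoritative.

```json
{
  "command": "get",
  "source": {
    "endpoint": "http://localhost:9047/",
    "username": "admin",
    "password": "",
    "verify_ssl": true
  },
  "target": {
    "directory": "dremio_export",
    "overwrite": true
  },
  "options": {
    "logging.level": "INFO",
    "space.filter": "*",
    "source.filter": "*",
    "pds.filter": "*",
    "vds.filter": "*"
  }
}
```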

Command "put"

Command "put" selectively updates an existing Dremio Environment from a JSON file previously generated by the "get" command.

Command "put" can also apply ACL transformations. For example, it can transform ACLs to use LDAP_GROUP_PROD instead of LDAP_GROUP_DEV.

Command "put" can also apply Source transformations. For example, it can transform paths and references in objects to use SOURCE_PROD instead of SOURCE_DEV. PLEASE NOTE: Use of the source transformation feature is against best practices; as a best practice it is recommended that sources be named the same in all environments. In addition, for the source transformation to succeed as expected, you must ensure that no VDS, PDS, Column, or Folder in the system has the same name as, or contains an exact substring match of, the original source data source name.

The command is configured with a JSON file with the configuration attributes listed below. For a detailed description of the configuration JSON attributes, see the Reference section in Appendix 1 below.

  • "command":"put"
  • "source": defines an input filename or directory with
    • "filename"
    • "directory"
  • "target": defines target Dremio Environment with
    • "endpoint"
    • "username"
    • "password"
    • "verify_ssl"
    • "is_community_edition"
    • "is_dremio_cloud"
    • "dremio_cloud_org_id"
    • "dremio_cloud_project_id"
    • "dremio_cloud_target_catalog_name"
    • "dremio_cloud_spaces_to_catalog"
  • "options":
    • logging options
      • "logging.level"
      • "logging.format"
      • "logging.filename"
      • "logging.verbose"
    • miscellaneous options
      • "max_errors"
      • "http_timeout"
      • "source.retry_timedout"
      • "dry_run"
    • processing of User and Group objects missing in the target environment
      • "space.ignore_missing_acl_user"
      • "space.ignore_missing_acl_group"
      • "folder.ignore_missing_acl_user"
      • "folder.ignore_missing_acl_group"
      • "source.ignore_missing_acl_user"
      • "source.ignore_missing_acl_group"
      • "pds.ignore_missing_acl_user"
      • "pds.ignore_missing_acl_group"
      • "vds.ignore_missing_acl_user"
      • "vds.ignore_missing_acl_group"
    • scope of Space processing
      • "space.process_mode"
      • "space.filter"
      • "space.filter.names"
      • "space.exclude.filter"
      • "folder.process_mode"
      • "space.folder.filter"
      • "space.folder.filter.paths"
      • "space.folder.exclude.filter"
    • scope of Source processing
      • "source.process_mode"
      • "source.filter"
      • "source.filter.names"
      • "source.filter.types"
      • "source.exclude.filter"
      • "source.folder.filter"
      • "source.folder.filter.paths"
      • "source.folder.exclude.filter"
    • scope of PDS processing
      • "pds.process_mode"
      • "pds.filter"
      • "pds.filter.names"
      • "pds.exclude.filter"
      • "pds.list.useapi"
    • scope of VDS processing
      • "vds.process_mode"
      • "vds.filter"
      • "vds.filter.names"
      • "vds.exclude.filter"
      • "vds.max_hierarchy_depth"
    • scope of Reflection processing
      • "reflection.process_mode"
      • "pds.reflection_refresh_mode"
      • "reflection.id_include_list"
    • scope of processing other objects
      • "user.process_mode"
      • "group.process_mode"
      • "wiki.process_mode"
      • "tag.process_mode"
      • "home.process_mode"
      • "vote.process_mode"
    • acl transformation processing
      • "transformation"
        • "acl"
          • "file"
    • source transformation processing
      • "transformation"
        • "source"
          • "file"

Please see a sample JSON configuration file in the config folder of this repository.
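
As a hedged illustration, a minimal "put" configuration using the attributes above might look like the following. The endpoint, credentials, and input directory are placeholders (the shipped samples remain authoritative); dry_run is enabled here so a first run makes no changes to the target.

```json
{
  "command": "put",
  "source": {
    "directory": "dremio_export"
  },
  "target": {
    "endpoint": "http://prod-coordinator:9047/",
    "username": "admin",
    "password": "",
    "verify_ssl": true
  },
  "options": {
    "logging.level": "INFO",
    "dry_run": true,
    "space.process_mode": "create_overwrite",
    "vds.process_mode": "create_overwrite"
  }
}
```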

Command "cascade-acl"

Command "cascade-acl" selectively propagates ACLs in an object hierarchy.

The command is configured with a JSON file with the configuration attributes listed below. For a detailed description of the configuration JSON attributes, see the Reference section in Appendix 1 below.

  • "command":"cascade-acl"
  • "target": defines Dremio Environment to be processed with
    • "endpoint"
    • "username"
    • "password"
    • "verify_ssl"
  • "options":
    • logging options
      • "logging.level"
      • "logging.format"
      • "logging.filename"
      • "logging.verbose"
    • miscellaneous options
      • "max_errors"
      • "http_timeout"
      • "source.retry_timedout"
      • "dry_run"
    • scope of Space processing
      • "space.filter"
      • "space.exclude.filter"
      • "space.cascade-acl-origin.override-object"
      • "space.folder.filter"
      • "space.folder.exclude.filter"
      • "space.folder.cascade-acl-origin.filter"
    • scope of Source processing
      • "source.filter"
      • "source.exclude.filter"
      • "source.cascade-acl-origin.override-object"
      • "source.folder.filter"
      • "source.folder.exclude.filter"
    • scope of PDS processing
      • "pds.filter"
      • "pds.exclude.filter"
      • "pds.list.useapi"
    • scope of VDS processing
      • "vds.filter"
      • "vds.exclude.filter"

Note: if none of space.cascade-acl-origin.override-object, space.folder.cascade-acl-origin.filter, and source.cascade-acl-origin.override-object is specified:

  • each Space ACL will be propagated through its hierarchy and applied to Folders and VDSs as per filter configuration
    • To cascade ACLs for all spaces, specify {"space.filter": "*"}
    • To omit cascading any space ACLs, specify {"space.filter": ""}
    • To cascade ACLs for a specific named space, specify {"space.filter": "spacename"} where spacename should be replaced with the actual name of the space
  • each Source ACL will be propagated through its hierarchy and applied to PDSs as per filter configuration
    • To cascade ACLs for all sources, specify {"source.filter": "*"}
    • To omit cascading any source ACLs, specify {"source.filter": ""}
    • To cascade ACLs for a specific named source, specify {"source.filter": "sourcename"} where sourcename should be replaced with the actual name of the source

Please see a sample JSON configuration file in the config folder of this repository.
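
As an illustration, a minimal "cascade-acl" configuration using the override-object behavior described above might look like the following. The endpoint and credentials are placeholders, and dry_run is enabled so the run only logs what it would change; see the shipped sample for the authoritative layout.

```json
{
  "command": "cascade-acl",
  "target": {
    "endpoint": "http://localhost:9047/",
    "username": "admin",
    "password": "",
    "verify_ssl": true
  },
  "options": {
    "logging.level": "WARN",
    "dry_run": true,
    "space.filter": "spacetest",
    "space.cascade-acl-origin.override-object": "spacetest/spacetest_folder"
  }
}
```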

Command "report-acl"

Command "report-acl" produces a selective security report on all objects with ACL in a Dremio environment.

The command is configured with a JSON file with the configuration attributes listed below. For a detailed description of the configuration JSON attributes, see the Reference section in Appendix 1 below.

  • "command":"report-acl"
  • "source": defines Dremio Environment with
    • "endpoint"
    • "username"
    • "password"
    • "verify_ssl"
    • "is_rbac_version"
  • "target": defines an output filename with
    • "filename"
  • "options":
    • logging options
      • "logging.level"
      • "logging.format"
      • "logging.filename"
      • "logging.verbose"
    • miscellaneous options
      • "max_errors"
      • "http_timeout"
      • "source.retry_timedout"
    • report format
      • "report.csv.delimiter"
      • "report.csv.newline"
    • scope of Space processing
      • "space.filter"
      • "space.exclude.filter"
      • "space.folder.filter"
      • "space.folder.exclude.filter"
    • scope of Source processing
      • "source.filter"
      • "source.exclude.filter"
      • "source.folder.filter"
      • "source.folder.exclude.filter"
    • scope of PDS processing
      • "pds.filter"
      • "pds.exclude.filter"
      • "pds.list.useapi"
    • scope of VDS processing
      • "vds.filter"
      • "vds.exclude.filter"

Please see a sample JSON configuration file in the config folder of this repository.
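
As an illustration, a minimal "report-acl" configuration covering the whole environment might look like the following. The endpoint, credentials, and report filename are placeholders; consult the shipped sample for the authoritative layout.

```json
{
  "command": "report-acl",
  "source": {
    "endpoint": "http://localhost:9047/",
    "username": "admin",
    "password": "",
    "verify_ssl": true,
    "is_rbac_version": false
  },
  "target": {
    "filename": "acl_report.csv"
  },
  "options": {
    "logging.level": "INFO",
    "report.csv.delimiter": ",",
    "report.csv.newline": "\n",
    "space.filter": "*",
    "source.filter": "*"
  }
}
```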

Command "report-reflections"

Command "report-reflections" produces a reflection report with reflection usage information and ranking on potentially duplicate reflections.

The command is configured with a JSON file with the configuration attributes listed below. For a detailed description of the configuration JSON attributes, see the Reference section in Appendix 1 below.

  • "command":"report-reflections"
  • "source": defines Dremio Environment with
    • "endpoint"
    • "username"
    • "password"
    • "verify_ssl"
  • "target": defines an output filename with
    • "filename"
  • "options":
    • logging options
      • "logging.level"
      • "logging.format"
      • "logging.filename"
      • "logging.verbose"
    • miscellaneous options
      • "max_errors"
      • "http_timeout"
      • "source.retry_timedout"
    • report format
      • "report.csv.delimiter"
      • "report.csv.newline"

Note that this command does not provide any option for scope definition. Please see a sample JSON configuration file in the config folder of this repository.

Configuration Options

Target or Source section, when a Dremio Environment definition is required.

Configuration Option Description
endpoint Defines Dremio API endpoint. For example, http://localhost:9047/. Mandatory attribute.
username Dremio user name. Must be an Admin. Mandatory attribute. To be left empty ("") for Dremio Cloud.
password Dremio user password. Optional field. If not provided, CLI will request password at runtime.
verify_ssl If set to False, Dremio Cloner will not validate SSL certificate of the Dremio Environment. Default is True.
is_community_edition Set to True if reading Dremio CE. Writing to Dremio CE is not supported.
graph_api_support Enables use of the Dremio Graph API, which is only available in EE starting with version 4.0.0. Default value is False.
is_rbac_version Set to True if the version of Dremio EE supports the RBAC privileges model. Default value is False.
is_dremio_cloud Set to True if reading from or writing to Dremio Cloud. Default value is False.
dremio_cloud_org_id Dremio Cloud Organization ID to connect to.
dremio_cloud_project_id Dremio Cloud Project ID to connect to.
dremio_cloud_source_catalog_name Dremio Cloud root catalog name during "get" operation.
dremio_cloud_target_catalog_name Dremio Cloud root catalog name during "put" operation.
dremio_cloud_spaces_to_catalog Set to True if migrating from spaces to an Arctic catalog.

Target or source section, when defined with a file name

Configuration Option Description
filename Defines a JSON filename to be used either as the source of information for the put command or as the target for saving data for the get command. The JSON file will encapsulate the entire information on a Dremio environment. Either filename or directory must be defined.
directory Similar to filename above. However, a folder structure identical to the Dremio environment will be created, and the information on Dremio objects will be stored in separate files within this folder structure. This option allows for use cases with individual processing of Dremio objects by external tools, such as GitHub.
overwrite Allows overwriting an existing JSON file or directory.
separate_sql_and_metadata_files Per VDS create a JSON metadata file and a SQL file with the VDS definition (only applicable when a directory is set). This option allows more efficient management of changes in code editors and improves readability of the SQL queries.

Logging options

Configuration Option Description
logging.level Defines logging level: DEBUG, INFO, WARN, ERROR
logging.format Logging format. For example: "%(levelname)s:%(asctime)s:%(message)s"
logging.filename Filename for logging. File will be appended if exists. If this option is omitted, standard output will be used for logging.
logging.verbose Default is False. If set to True, produces verbose logging, such as logging entire entity definitions.

Miscellaneous options

Configuration Option Description
max_errors Defines the number of errors at which processing will be terminated.
http_timeout Timeout for each API call. This parameter might become important in certain situations when Sources defined in Dremio are not available.
dry_run Defines a Dremio Cloner execution that will not update the target Dremio environment. In conjunction with logging.level set to WARN, this allows executing Dremio Cloner without any impact on the target environment and checking the log file for all activities that would have been submitted to the target Dremio Environment. Respective log entries include the dry_run keyword.
vds.max_hierarchy_depth Defines the maximum level of VDS hierarchy supported by Dremio Cloner. It is a guard rail with a default value of 100.

Scope of Dremio Space processing

Configuration Option Description
space.filter A filter that defines what Spaces will be included in processing. "*" will include all Spaces. An empty field will exclude all Spaces. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.exclude.filter.
space.filter.names If specified, a list filter that defines what Spaces will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"space.filter.names": []}), then the "get" or "put" command will include all Spaces specified by space.filter, which is the default behavior. Works in logical AND with space.exclude.filter. Example: {"space.filter.names": ["MySpace1", "MySpace2", "MySpace3"]}
space.exclude.filter A filter that defines what Spaces will be excluded from processing. "*" will exclude all Spaces. An empty field will include all Spaces. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.filter.
space.folder.filter A filter that defines what Space Folders will be included in processing. "*" will include all Folders. An empty field will exclude all Folders. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with other active space.folder filters.
space.folder.filter.paths If specified, a list filter that defines what Space Folder paths will be included in processing during "get" or "put" command execution. This filter is ignored if this option is not specified or the list is empty (e.g. {"space.folder.filter.paths": []}). Works in logical AND with other active space.folder filters. Example: {"space.folder.filter.paths": ["folder1/folder2", "Staging"]}
space.folder.exclude.filter A filter that defines what Space Folders will be excluded from processing. "*" will exclude all Folders. An empty field will include all Folders. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with other active space.folder filters.
space.folder.exclude.filter.paths If specified, a list filter that defines what Space Folder paths will be excluded from processing during "get" or "put" command execution. This filter is ignored if the option is not specified or the list is empty (e.g. {"space.folder.exclude.filter.paths": []}). Works in logical AND with other active space.folder filters. Example: {"space.folder.exclude.filter.paths": ["ignorefolder1/folder2", "dontProcessfolder2"]}
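The include/exclude semantics above (an empty filter excludes everything, "*" includes everything, and the two filters combine in logical AND) can be sketched as follows. This is a hypothetical illustration using Python's fnmatch for the star patterns, not Dremio Cloner's actual filter code.

```python
from fnmatch import fnmatchcase


def matches(include_filter: str, exclude_filter: str, name: str) -> bool:
    """Illustrative approximation of the documented filter semantics:
    empty include filter -> nothing matches; "*" -> everything matches;
    include and exclude filters work in logical AND."""
    if include_filter == "":
        return False  # empty include filter excludes everything
    included = fnmatchcase(name, include_filter)
    # An empty exclude filter excludes nothing.
    excluded = exclude_filter != "" and fnmatchcase(name, exclude_filter)
    return included and not excluded
```

For example, a space.filter of "*" combined with a space.exclude.filter of "Staging*" would keep a space named "Analytics" but drop one named "Staging2".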

Scope of Dremio Source processing

Configuration Option Description
source.filter A filter that defines what Sources will be included in processing. "*" will include all Sources. An empty field will exclude all Sources. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.exclude.filter.
source.filter.names If specified, a list filter that defines what Sources will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"source.filter.names": []}), then the "get" or "put" command will include all Sources specified by source.filter, which is the default behavior. Works in logical AND with source.exclude.filter. Example: {"source.filter.names": ["MySource1", "MySource2", "MySource3"]}
source.filter.types If specified, a list filter that defines what Source Types will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty, then the "get" or "put" command will include all source types present based on the other source filters, which is the default behavior. Works in logical AND with the other source filters. Example: {"source.filter.types": ["S3", "POSTGRES", "NAS"]}
source.exclude.filter A filter that defines what Sources will be excluded from processing. "*" will exclude all Sources. An empty field will include all Sources. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.filter.
source.folder.filter A filter that defines what Source Folders will be included in processing. "*" will include all Folders. An empty field will exclude all Folders. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.folder.exclude.filter.
source.folder.filter.paths If specified, a list filter that defines what Source Folder paths will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"source.folder.filter.paths": []}), then the "get" or "put" command will include all Source Folders specified by source.folder.filter, which is the default behavior. Works in logical AND with source.folder.exclude.filter. Example: {"source.folder.filter.paths": ["folder1/folder2", "default"]}
source.folder.exclude.filter A filter that defines what Source Folders will be excluded from processing. "*" will exclude all Folders. An empty field will include all Folders. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.folder.filter.

Scope of Dremio PDS processing

Configuration Option Description
pds.filter A filter that defines what PDSs will be included in processing. "*" will include all PDSs. An empty field will exclude all PDSs. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with pds.exclude.filter.
pds.filter.names If specified, a list filter that defines what PDSs will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"pds.filter.names": []}), then the "get" or "put" command will include all PDSs specified by pds.filter, which is the default behavior. Works in logical AND with pds.exclude.filter. Example: {"pds.filter.names": ["MyPDS1", "MyPDS2", "MyPDS3"]}
pds.exclude.filter A filter that defines what PDSs will be excluded from processing. "*" will exclude all PDSs. An empty field will include all PDSs. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with pds.filter.
pds.list.useapi Forces use of the API for collecting the list of PDSs if set to True. The default value is False, which means that INFORMATION_SCHEMA will be utilized instead of the API. False is the recommended value.

Scope of Dremio VDS processing

Configuration Option Description
vds.filter A filter that defines what VDSs will be included in processing. "*" will include all VDSs. An empty field will exclude all VDSs. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with vds.exclude.filter.
vds.filter.names If specified, a list filter that defines what VDS names will be included in processing during "get" or "put" command execution. This filter is ignored if this option is not specified or the list is empty (e.g. {"vds.filter.names": []}). Works in logical AND with other active vds filters. Example: {"vds.filter.names": ["MyVDS1", "MyVDS2", "MyVDS3"]}
vds.exclude.filter A filter that defines what VDSs will be excluded from processing. "*" will exclude all VDSs. An empty field will include all VDSs. The star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with vds.filter.
vds.exclude.filter.paths If specified, a list filter that defines what VDSs (incl. paths and wildcards) will be excluded from processing during "get" or "put" command execution. This filter is ignored if this option is not specified or the list is empty (e.g. {"vds.exclude.filter.paths": []}). Works in logical AND with other active vds filters. Example: {"vds.exclude.filter.paths": ["folder/ignoreVDSxyz", "*/ignoreVDSwithWildcard"]}

Scope of Dremio Reflection processing

Configuration Option Description
reflection.id_include_list If specified, a list filter that defines what reflection IDs will be included in processing during "get" or "put" command execution. If this option is not specified or the list is empty, then the "get" or "put" command will include all reflections, which is the default behavior. During "get" command execution this list refers to IDs of reflections in the source Dremio environment, which are visible in sys.reflections. During "put" command execution this list refers to IDs of reflections that were previously exported from a source Dremio environment and are present in the source file(s) fed into the "put" command. Example: {"reflection.id_include_list": ["dc86ab2e-8ebf-4d69-9302-911875a79e74", "ad3444df-7da5-4ea5-9624-b7705f351914"]}

Scope of User and Group processing

Configuration Option Description
user.process_mode group.process_mode Determines whether users will be created in the target Dremio Environment if they are referenced in the source JSON file but not present in the target environment. Applicable for the "put" command only. However, user creation is not possible with the current Dremio API, so this parameter can only take a single value: skip.
space.ignore_missing_acl_user space.ignore_missing_acl_group folder.ignore_missing_acl_user folder.ignore_missing_acl_group source.ignore_missing_acl_user source.ignore_missing_acl_group pds.ignore_missing_acl_user pds.ignore_missing_acl_group vds.ignore_missing_acl_user vds.ignore_missing_acl_group These configuration parameters define whether Dremio Cloner ignores a situation where a user or a group is defined in an ACL in the source JSON file but is not present in the target Dremio Environment. This situation is a potential security risk, as an ACL may be created with no limitations in the target environment when all referenced users and groups cannot be found. Default value is False.

Scope of object-level processing

Configuration Option Description
space.process_mode folder.process_mode source.process_mode pds.process_mode vds.process_mode reflection.process_mode pds.reflection_refresh_mode wlm.queue.process_mode wlm.rule.process_mode wiki.process_mode tag.process_mode home.process_mode vote.process_mode Defines whether Dremio Cloner will 1) insert new objects only, 2) update existing objects only, or 3) do an upsert. These parameters can be set to: skip, create_only, update_only, create_overwrite, process. process is only applicable for the "get" command. skip will prevent any changes to the target Dremio Environment for the specified object type. Note that pds.process_mode can only take skip and promote, with promote updating the PDS ACL as required.
vds.dependencies.process_mode Possible values: ignore, get. Default is ignore. If set to get, Dremio Cloner will collect information on all dependencies throughout the object hierarchy (VDS and PDS) required for each VDS that satisfies the VDS filter criteria.

Cascade-acl specific parameters

Configuration Option Description
space.cascade-acl-origin.override-object If specified, overrides the default behavior for the Space hierarchy: the ACL of the object specified in this parameter will be used throughout all Space hierarchies instead of the respective Spaces' ACLs. A valid example: {"space.filter": "spacetest"}, {"space.cascade-acl-origin.override-object": "spacetest/spacetest_folder"}, which is interpreted as: read the ACLs from the object spacetest/spacetest_folder and apply those ACLs to each object under the space spacetest.
source.cascade-acl-origin.override-object If specified, overrides the default behavior for the Source hierarchy: the ACL of the object specified in this parameter will be used throughout all Source hierarchies instead of the respective Sources' ACLs.
space.folder.cascade-acl-origin.filter If specified, overrides the default behavior for the Space hierarchy: the ACLs of the Folders selected by this filter will be used throughout their Folder hierarchies instead of the respective Space's ACL. A valid example: {"space.filter": "spacetest"}, {"space.cascade-acl-origin.override-object": "spacetest/spacetest_folder"}, {"space.folder.cascade-acl-origin.filter": "another_folder"}, which can be interpreted as: all objects under spacetest get the ACLs defined on spacetest/spacetest_folder, EXCEPT for those in spacetest/another_folder. All objects beneath another_folder (whose full path is spacetest/another_folder in this example) have their ACLs set to whatever the ACLs are on another_folder.

Transformation parameters

Configuration Option Description
transformation If specified, allows for transformation during "put" command execution. Supported transformations are ACL and Source transformations. Transformation rules are specified in a separate JSON file, which is referenced in the main configuration file. For example: {"transformation": {"acl": {"file": "acl_transformation.json"}}} for ACL transformations and {"transformation": {"source": {"file": "source_transformation.json"}}} for Source transformations.

Report format parameters

Configuration Option Description
report.csv.delimiter A field delimiter used to generate a report.
report.csv.newline A new line delimiter used to generate a report.

dremio-cloner's People

Contributors

chufe-dremio, deane-dremio, jeff-99, mxmarg, tejkm, tokoko


dremio-cloner's Issues

Dependency resolving causes an infinite loop on a valid VDS definition

In one of our systems we have the following VDS that causes an infinite loop in dependency resolving.

The VDS' name is Staging.TOS.Container.Container
and the query roughly looks like this:

WITH CONTAINER AS ( SELECT ... )
SELECT *
FROM CONTAINER
WHERE X = 1 

This gave a Python recursion depth exception when processing the VDS.
Changing the reference to the following solved the issue:

WITH CONTAINER_BASE AS ( SELECT ... )
SELECT *
FROM CONTAINER_BASE
WHERE X = 1 

The initial query is perfectly valid, so IMO it should not cause an issue when syncing the script to source control.

Dremio Cloner unable to deploy one VDS (LN_dly_calc_fact)

I am opening this on behalf of Pawan Teja at Fannie Mae ("Nyshadham, Pawan Teja x (Contractor)" [email protected]).

Description:

We had an earlier deployment at the end of October, and the team is unable to push this VDS (LN_dly_calc_fact) into the

Prod & Acpt environments.

Attempts: the dev team has tried various scenarios to push the code, including removing comments and reducing the lines of the SQL script, without success.

Below are the new attributes of the VDS that could not be deployed:

Line 201: Loan_Final_Additional_Tier_1_Cost_Of_Capital_Basis_Point_Rate
Line 225: Loan_Final_Tier_2_Cost_Of_Capital_Basis_Point_Rate

Please see the attached files:

fnm_config_write_dir (1).json
import_10_31_2023_13_53_20 (2).log
LN_DLY_CALC_FACT_VW (1).txt

Pawan originally opened this via a support ticket, but Max suggested a GitHub issue.

GET reflections from Dremio CE

Test environment is Dremio AWSE CE 24.0.0 and 24.1.4.

The Dremio "GET" operation did not pick up the reflections even when {"reflection.process_mode": "process"} was set.
I also tried this with {"reflection.only_for_matching_vds":"True"} and {"reflection.filter_mode": "apply_vds_pds_filter"}

The only workaround I could get working is cumbersome: using a list of reflection IDs to migrate:
{"reflection.id_include_list": ["8bf8d3dd-b3c7-47f7-879d-eef43765b061"]}

Unable to read/get a single VDS only

I tried to simply download the definition of a single VDS. However, it looks like dremio-cloner downloads all the folders along with the single VDS.

Here is my folder structure

Here is the output that I get:

In the results output directory, I want to see inherit.json but not the other two folders mk and ck (both of them actually have VDSs within them, but dremio-cloner downloads only the empty folders).

What would be the right config to use here in order to only get the single VDS definition? (get only inherit.json file as output)
Below is the desired output.

Below is the config file used (config_read_dir.json).
Along with generating unnecessary folders, it also ends up taking a lot of time when the number of folders is high (time is spent making API calls for each folder).

{"dremio_cloner": [
  {"command":"get"},
  {"source": [
	{"endpoint": "https://dremio.nonprod.com/"},
	{"username": "dremio-local-admin"},
	{"password": "****"},
	{"verify_ssl": "True"},
	{"is_community_edition": "False"},
	{"graph_api_support": "True"}]
  },
{"target": [
	{"directory":"results"},
	{"overwrite": "True"}]
	},
	{"options": [
	{"logging.level":"logging.DEBUG"},
	{"logging.format":"%(levelname)s:%(asctime)s:%(message)s"},
	{"logging.filename":"read_log"},
	{"logging.verbose": "False"},

	{"max_errors":"9999"},
	{"http_timeout":"10"},

	{"user.process_mode":"skip"},
	{"group.process_mode":"skip"},
	{"space.process_mode":"skip"},
	{"source.process_mode":"skip"},
	{"reflection.process_mode": "skip"},
	{"wlm.queue.process_mode": "skip"},
	{"wlm.rule.process_mode": "skip"},
	{"wiki.process_mode": "skip"},
	{"tag.process_mode": "skip"},
	{"home.process_mode": "skip"},
	{"vote.process_mode": "skip"},
	{"folder.process_mode": "skip"},
	{"vds.process_mode": "process"},
	{"pds.process_mode": "skip"},

	{"space.filter": "*"},
	{"space.filter.names": ["CICD"]},
	{"space.exclude.filter": ""},
	{"space.folder.filter":"*"},
	{"space.folder.filter.paths": []},
	{"space.folder.exclude.filter":""},

	{"source.filter":"*"},
	{"source.filter.names": []},
	{"source.filter.types": []},
	{"source.exclude.filter":""},
	{"source.folder.filter":"*"},
	{"source.folder.filter.paths": []},
	{"source.folder.exclude.filter":""},

	{"pds.filter":"*"},
	{"pds.filter.names": []},
	{"pds.exclude.filter":""},
	{"pds.list.useapi":"False"},

	{"vds.filter":"*"},
	{"vds.filter.names": ["inherit"]},
	{"vds.exclude.filter":""},
	{"vds.dependencies.process_mode":"ignore"},

	{"reflection.only_for_matching_vds":"True"}]
	}]
}

Writing permissions to DCS Project


When using Cloner to write to a target DCS project, it seems like an acl_transformation_rbac.json file is mandatory for migrating permissions, even if no access or permissions are being changed.

In the absence of a transformation file, we see the following error during the PUT (write) operation:
ERROR:2023-08-08 11:24:17,489:_process_acl: Source User de3711ce-5367-4cdf-9b37-f7f4e8d01ecd not found in the target Dremio Environment. ACL Entry cannot be processed as per ignore_missing_acl_user configuration. space:DeepakSpace

Including an ACL file as shown below fixed the issue.
{"acl-transformation": [ { "source": {"user":"[email protected]"}, "target": {"user":"[email protected]"}} ] }

If I am not transforming any permissions, why do we need to include a transformation file?
The workaround is cumbersome, requiring the Cloner/Dremio admin to consolidate a list of all users/roles from the source and either build an acl_transformation file that includes all these roles/users, or generate SQL to grant privileges from sys.organization.users or sys.users on the source Dremio cluster.

Sync cloud instance does not load any catalog content

Hi,

While trying to sync our first cloud instance of Dremio, I am facing an issue: I started from the read_dremio_cloud config file. The sync does read the catalog and source setup, but won't start downloading their actual contents (homes, sources, spaces, etc.). Attached you will find my config file and my logfile (without the two important GUIDs and the username). Thanks for having a look.

Patrick
content.json
sync-get-log.txt

space.folder.filter.paths config is not being respected while performing reads

The filter path in the config below is not being applied when performing a get operation using config_read_dir.json:
{"space.folder.filter.paths": [""]},

For example, let's say I have a space named 'my_space' and within that space a folder called 'trades'. Using the config below, dremio-cloner should only pull the objects located within this folder.

	{"space.filter": "*"},
	{"space.filter.names": ["my_space"]},
	{"space.exclude.filter": ""},
	{"space.folder.filter": "*"},
	{"space.folder.filter.paths": ["trades"]},
	{"space.folder.exclude.filter":""},

However right now it is pulling all the objects located in the space.

Issue handling reflections

Hi,

I'm trying this project for the first time, and I am seeing an error:

python dremio_cloner.py ..\test_read.json
Traceback (most recent call last):
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 159, in <module>
    main()
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 47, in main
    get_dremio_environment(config)
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 78, in get_dremio_environment
    dremio_data = reader.read_dremio_environment()
  File "C:\PythonProjects\dremio-cloner\src\DremioReader.py", line 57, in read_dremio_environment
    self._read_reflections()
  File "C:\PythonProjects\dremio-cloner\src\DremioReader.py", line 281, in _read_reflections
    reflections = self._dremio_env.list_reflections()['data']
TypeError: 'NoneType' object is not subscriptable

If I change the reflections setting in the JSON config from "process" to "skip", this doesn't have an issue.

I'm running against Dremio 18.1.0 Community Edition

Dremio Cloner with Cloud: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)

I am trying to test Dremio Cloner with Cloud. I have followed the readme at https://github.com/deane-dremio/dremio-cloner/blob/master/README.md.

However, when attempting to perform a PUT to Cloud I encounter the following error:

  File "C:\Python\Lib\site-packages\requests\adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Here is an excerpt from the config_write_dir.json file:

{"dremio_cloner": [
{"command":"put"},
{"target": [
{"endpoint": "http://api.eu.dremio.cloud/"},
{"username": ""},
{"password": ""},
{"verify_ssl": "True"},
{"is_community_edition": "False"},
{"is_dremio_cloud": "True"},
{"dremio_cloud_org_id": "#########################"},
{"dremio_cloud_project_id": "###############"}
]
},

RecursionError: maximum recursion depth exceeded for a put operation

Hello Folks,

I have used Dremio Cloner successfully in the past for source migration. I tried to use it again to migrate an S3 source, but this time ran into a couple of errors. I am using Dremio version 22.1.7 integrated with Active Directory.
Log:
INFO:2023-05-16 14:48:25,032:Executing command 'put'.
WARNING:2023-05-16 14:48:25,651:_process_acl: Source User 30489f5d-678a-4129-a7ad-6becbbc425ca not found in the target Dremio Environment. User is removed from ACL definition as per ignore_missing_acl_user configuration. space:Samson
Error from console
Traceback (most recent call last):
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 159, in <module>
    main()
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 49, in main
    put_dremio_environment(config)
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 96, in put_dremio_environment
    writer.write_dremio_environment()
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 90, in write_dremio_environment
    self._write_space(space, self._config.space_process_mode, self._config.space_ignore_missing_acl_user, self._config.space_ignore_missing_acl_group)
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 126, in _write_space
    return self._write_entity(entity, process_mode, ignore_missing_acl_user_flag, ignore_missing_acl_group_flag)
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 312, in _write_entity
    updated_entity = self._dremio_env.update_catalog_entity(entity['id'], entity, self._config.dry_run, report_error)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 250, in update_catalog_entity
    return self._api_put_json(self._catalog_url + entity_id, entity, source="update_catalog_entity", report_error = report_error)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  [Previous line repeated 967 more times]
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 430, in _api_put_json
    response = requests.request("PUT", self._endpoint + url, json=json_data, headers=self._headers, timeout=self._api_timeout, verify=self._verify_ssl)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connection.py", line 454, in getresponse
    httplib_response = super().getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1322, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 327, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 221, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object

Allow non-admin users to run the tool

In version 24.1.0, DX-60480 was fixed, which now prevents non-admin users from using /api/v3/users/{id} (unless they also have the create-user privilege).
This causes Cloner to fail with:

{
    "errorMessage": "User not allowed to get details of other user",
    "moreInfo": ""
}

The workaround is to grant the user GRANT CREATE USER ON SYSTEM TO USER <username>, but some people don't want to allow CI/CD teams to create users.

The request is to change the cloner tool to use APIs that are runnable by non-admin users and users who don't have the CREATE USER privilege.

Unable to deploy to more than one level of folders

In our Dremio space we have the workspace + the root folder + additional folders.
Example: BI_PROJECTS.XXX1.XXX2.XXX3
We can't deploy to the level of the XXX3 folder, only at the XXX1 level.
This is an issue, as multiple developments are happening in XXX1-level folders and this is causing conflicts, because there are still non-existing dependencies.
Is there currently a solution for that?
Thanks

Deploying VDS that have Wiki details

Hello,

I am using the dremio-cloner script to deploy my Dremio environment, and some of my virtual datasets have Wiki details written. I encounter the following errors during deployment:

DEBUG:2024-02-06 14:13:41,939:_write_wiki: processing wiki: {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}
DEBUG:2024-02-06 14:13:41,959:https://xxx.com:443 "GET /api/v3/catalog/by-path/SELF_SERVICE_PROJECTS/DCOG_PROJECT/SIMILARWEB/SEGMENT_TRAFFIC_AND_ENGAGEMENT HTTP/1.1" 404 148
INFO:2024-02-06 14:13:41,959:get_catalog_entity_by_path: received HTTP Response Code 404 for : <api/v3/catalog/by-path/SELF_SERVICE_PROJECTS/DCOG_PROJECT/SIMILARWEB/SEGMENT_TRAFFIC_AND_ENGAGEMENT> errorMessage: Could not find entity with path [[SELF_SERVICE_PROJECTS, DCOG_PROJECT, SIMILARWEB, SEGMENT_TRAFFIC_AND_ENGAGEMENT]] moreInfo:
ERROR:2024-02-06 14:13:41,959:_write_wiki: Unable to resolve wiki's dataset for {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}
ERROR:2024-02-06 14:13:41,959:_write_wiki: Unable to resolve wiki's dataset for {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}
