Azure Data Factory Integration Runtime in Windows Container Sample

This repo contains a sample for running the Azure Data Factory Integration Runtime in a Windows container.

Supported SHIR version: 5.0 or later

For more information about Azure Data Factory, see https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime

QuickStart

  1. Prepare Windows for containers.
  2. Build the Windows container image in the project folder:

> docker build . -t <image-name> [--build-arg="INSTALL_JDK=true"]

Arguments list

| Name | Necessity | Default | Description |
|------|-----------|---------|-------------|
| INSTALL_JDK | Optional | false | The flag to install Microsoft's JDK 11 LTS. |
  3. Run the container with specific arguments by passing environment variables:
> docker run -d -e AUTH_KEY=<ir-authentication-key> \
    [-e NODE_NAME=<ir-node-name>] \
    [-e ENABLE_HA={true|false}] \
    [-e HA_PORT=<port>] \
    [-e ENABLE_AE={true|false}] \
    [-e AE_TIME=<expiration-time-in-seconds>] \
    <image-name>

Arguments list

| Name | Necessity | Default | Description |
|------|-----------|---------|-------------|
| AUTH_KEY | Required | | The authentication key for the self-hosted integration runtime. |
| NODE_NAME | Optional | hostname | The specified name of the node. |
| ENABLE_HA | Optional | false | The flag to enable high availability and scalability. Up to 4 nodes can register to the same IR when HA is enabled; otherwise only 1 is allowed. |
| HA_PORT | Optional | 8060 | The port used to set up the high availability cluster. |
| ENABLE_AE | Optional | false | The flag to enable offline-node auto-expiration. If enabled, a node is marked as expired once it has been offline for the timeout defined by AE_TIME. |
| AE_TIME | Optional | 600 | The expiration timeout for offline nodes, in seconds. Must be no less than 600 (10 minutes). |
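The documented constraints on these variables (AE_TIME minimum of 600 seconds, HA_PORT default of 8060) can be enforced in a small pre-flight wrapper before calling `docker run`. This is our illustrative sketch, not part of the repo:

```shell
# Apply the documented defaults and clamp AE_TIME to its 600-second minimum
# before handing the values to 'docker run' via -e flags.
AE_TIME="${AE_TIME:-600}"
if [ "$AE_TIME" -lt 600 ]; then
    echo "AE_TIME below the 600-second minimum; clamping to 600" >&2
    AE_TIME=600
fi

HA_PORT="${HA_PORT:-8060}"
echo "AE_TIME=$AE_TIME HA_PORT=$HA_PORT"
```

The resulting values would then be passed to the container with `-e AE_TIME=$AE_TIME -e HA_PORT=$HA_PORT`.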

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


Issues

Clean up previously connected node

Hi,

When I restart the Docker container, I get the following error:

Registration of new node is forbidden when Remote Access is disabled on another node. To enable it, you can login the machine where the other node is installed and run 'dmgcmd.exe -EnableRemoteAccess "<port>" ["<thumbprint>"]'.

The other node it's referring to is my previous registration of the same node. After deleting the old node (which has the same name, etc.), registration succeeds again.

Is there a workaround for this? It seems that a cleanup before registering again should do the trick.

Context:
I'm running this Windows container in an AKS Edge Essentials environment. Although this is a new and experimental setup, the issue doesn't seem to be related to it.

Update:
I'm not able to run the -EnableRemoteAccess command because the old node is already gone.

New security control to allow or disallow local SHIR file system access through the File system connector

What is the best way to opt in to or out of this new SHIR feature for SHIR Windows containers?

https://go.microsoft.com/fwlink/?linkid=853077
CURRENT VERSION (5.23.8355.1)
  • What's new:
    • Azure Data Factory: introduces a new security control for the Self-hosted Integration Runtime (SHIR) admin that lets them allow or disallow local SHIR file system access through the File system connector. SHIR admins can use the local command line (dmgcmd.exe -DisableLocalFolderPathValidation / -EnableLocalFolderPathValidation) to allow or disallow it. Refer here to learn more.

Note: We have changed the default setting to disallow local SHIR file system access as of SHIR versions >= 5.22.8297.1. Using the command line above, you should explicitly opt out of the security control and allow local SHIR file system access if needed.

Multiple IRs on one VM: Invalid AUTH_KEY Value

Hello,

I built and ran the self-hosted IR in a Windows container on a Windows VM successfully.

However, when I try to connect the second self-hosted IR by running a second container, it does not work.

docker ps -a
poc-shir2 "powershell C:/SHIR/…" About a minute ago Exited (1) 43 seconds ago shir-container-2
poc-shir "powershell C:/SHIR/…" 34 minutes ago Up 34 minutes (healthy) compassionate_matsumoto

The Docker logs show the error message: Invalid AUTH_KEY Value

So why does the second container on the same VM, with the second auth key, not work?

Thanks

SHIR unable to reconnect to Synapse after forced restart

Problem: when the container terminates non-gracefully, it is unable to connect to Synapse on restart.

Background: I have set up the container as suggested, with the alterations below:

  • FROM: servercore:ltsc2022
  • Additionally installed
  1. Denodo ODBC
  2. Oracle Instant Client Basic Lite
  3. VS 2017 redistributable (required for Oracle Instant Client / ODBC)
  4. Oracle ODBC
  5. Oracle Instant Client sqlplus tool
  6. Generated a few DSNs from above
  7. added server certificate

It is running fine on an AKS cluster, and the deployment is controlled using ArgoCD.
When I (or the AKS backend) terminate/delete the pod, it is restarted by ArgoCD. This ends in the error "Registration of new node is forbidden when Remote Access is disabled on another node. To enable it, you can login the machine where the other node is installed and run 'dmgcmd.exe -EnableRemoteAccess "" [""]'."

As a workaround, I can delete the node in Synapse and then restart the pod; with these steps it is able to reconnect.
How can I get around this problem?

Passing in HA_PORT doesn't work, as $PORT isn't being passed in correctly in code

The current logic that implements passing in $HA_PORT from the docker run isn't properly making it to dmgcmd.exe due to this line of code:

$PORT = $HA_PORT -or "8060"
Start-Process $DmgcmdPath -Wait -ArgumentList "-EnableRemoteAccess"

I think -or is being used incorrectly here. With this logic, $PORT always evaluates to True rather than falling back to 8060 when no $HA_PORT is passed in; I believe the intention was a null coalesce.

So you get this error every time:

The value of port is invalid. Please set an integer bigger than or equal to 0 and less than or equal to 65535.

Since the expression evaluates to .\dmgcmd.exe -EnableRemoteAccess TRUE.

Possible solution: (screenshot omitted)
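The intended fallback ("use HA_PORT if set, otherwise 8060") is an ordinary default-value coalesce. The repo's script is PowerShell, where an explicit conditional such as `$PORT = if ($HA_PORT) { $HA_PORT } else { "8060" }` would express it; the same semantics in shell, as an illustration:

```shell
# "${VAR:-default}" expands to VAR when it is set and non-empty,
# and to the default otherwise -- the coalesce the setup script intends.
unset HA_PORT
PORT="${HA_PORT:-8060}"
echo "$PORT"        # prints 8060: no HA_PORT supplied, default applies

HA_PORT=9070
PORT="${HA_PORT:-8060}"
echo "$PORT"        # prints 9070: the supplied value wins
```

Either form yields a real port number instead of the boolean that `-or` produces.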

Performance seems inadequate

Hi
When this container is installed and used in a Synapse environment, pulling data from a DB2 database, transfer times are really slow: 700k records, which is around 3 GB of data, take 9.5 hours to transfer.
If we take Docker out of the equation and install the SHIR directly on the Windows server, the performance issues are gone.

I am told Docker is set up to use the default process isolation mode, which should allow Docker to consume as much of the host's resources as it needs?

I also note that the stats are blank/static in the Integration Runtime monitor when using Docker; when the SHIR runs directly on the server, they work. Memory is always 0 MB, CPU is stuck at 50%, and Network is always 0 (screenshot omitted).

As far as I am aware, there is no other way to install multiple SHIRs on a VM for Synapse, so I really need to get this working!

health-check.ps1 logic is faulty: when setup.ps1 is accessing status-check.txt, the container shuts down

Because health-check.ps1 and setup.ps1 both use status-check.txt to capture the output of the dmgcmd.exe -cgc command, I noticed that when their cycles (120 seconds vs. 60 seconds) clash, the container shuts down, because the health check tries to delete the same text file that setup.ps1 may still be accessing (screenshot omitted).

One option could be to have health-check.ps1 use another filename. I disabled the health check in the Dockerfile as a workaround.
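The unique-filename option can be sketched as follows: each script captures command output into its own temporary file rather than sharing status-check.txt, so neither can delete a file the other is still reading. Illustrated in shell; in the repo's PowerShell scripts, New-TemporaryFile would play the same role:

```shell
# Each caller gets its own uniquely named status file from mktemp, so a
# concurrent consumer can never delete a file still in use elsewhere.
STATUS_FILE="$(mktemp)"
echo "connected" > "$STATUS_FILE"   # stand-in for 'dmgcmd.exe -cgc' output
cat "$STATUS_FILE"                  # prints: connected
rm -f "$STATUS_FILE"                # each script cleans up only its own file
```

With per-script files, the 120-second and 60-second cycles can overlap freely without racing on a shared path.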

Adding second container doesn't work for HA due to order of operations and incorrect remote access command

When adding a second container with ENABLE_HA and HA_PORT specified, the container doesn't get registered correctly and also fails the health check (screenshot omitted).

I found the following fixes the issue:

  1. Use -EnableRemoteAccessInContainer instead of -EnableRemoteAccess
  2. Order of operations matters on the second node: -EnableRemoteAccessInContainer needs to run first, before -RegisterNewNode is executed

With this setup, the second node registers successfully and HA is enabled (screenshots omitted).

Upgrade SHIR to the current version

Hi,

What is the policy for upgrading the SHIR version in this container? I would like to have version 5.37.8767.4, but the setup script explicitly downloads 5.34.8675.1.

Could anyone bring some insight or workaround, or should I produce a pull request?

Creating a file system linked service does not work

Hello,

I have installed and created a self-hosted IR in a Windows container.
Then I tried to add a file system linked service to ADF, but it didn't work.

First, I got the error 'Access to folder is not allowed'.
Then I exec'd into the running container and ran the command '.\dmgcmd.exe -DisableLocalFolderPathValidation'.

Access is now allowed, but when I try to test the connection, this is the error message:
"Cannot connect to 'C:\data'. Detail Message: The operation completed successfully
The operation completed successfully"

So how should I set up the linked service, please?

Thanks

How are local files supposed to work?

I've got SHIR in a Windows container, and I mounted the files it needs as a local volume. How do I make Azure Data Factory see them?

It seems Azure Data Factory requires a username/password even for files local to the SHIR, so I modified the Dockerfile to create a user, but it can't read anything. Am I missing something?

Dockerfile changes:

FROM mcr.microsoft.com/windows/servercore:ltsc2022
ARG INSTALL_JDK=false

# Download the latest self-hosted integration runtime installer into the SHIR folder
COPY SHIR C:/SHIR/

RUN ["powershell", "C:/SHIR/build.ps1"]

# Allow local folder Navigation
RUN ["powershell", "'C:/Program Files/Microsoft Integration Runtime/6.0/Shared/dmgcmd.exe -DisableLocalFolderPathValidation'"]

RUN mkdir "C:/data"

RUN net user dataUser '(PASSWORD_HERE)' /ADD

RUN ["icacls", "C:/data", "/grant", "dataUser:(OI)(CI)F"]

ENTRYPOINT ["powershell", "C:/SHIR/setup.ps1"]

ENV SHIR_WINDOWS_CONTAINER_ENV True

HEALTHCHECK --start-period=120s CMD ["powershell", "C:/SHIR/health-check.ps1"]

I then run the container with the following docker compose:

services:
  shir:
    image: shir
    build: "B:/docker/MSSHIR/Azure-Data-Factory-Integration-Runtime-in-Windows-Container-main"
    volumes:
      - type: bind
        source: "B:/data/"
        target: "C:/data"
    environment:
      AUTH_KEY: "(Auth_Key)"
      NODE_NAME: "shir"
      ENABLE_AE: "true"
      ENABLE_HA: "true"

Is anyone else using local files this way? I'm pulling my hair out; this feels much easier with Linux containers.
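One likely culprit in the Dockerfile above: in the -DisableLocalFolderPathValidation step, the whole path-plus-flag is wrapped in single quotes, so PowerShell evaluates it as a string literal and prints it rather than executing dmgcmd.exe. A corrected RUN line (our sketch, reusing the 6.0 path from the Dockerfile above) would use the PowerShell call operator `&`:

```dockerfile
# Invoke dmgcmd.exe via the call operator instead of passing the whole
# command to PowerShell as one quoted string literal.
RUN ["powershell", "& 'C:/Program Files/Microsoft Integration Runtime/6.0/Shared/dmgcmd.exe' -DisableLocalFolderPathValidation"]
```

With the original quoting, local folder path validation is never actually disabled, which would explain the access failures regardless of the user and ACL setup.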

Multiple SHIR containers with cascading failures - 0x80010002 (RPC_E_CALL_CANCELED))

We are running several Windows SHIR containers on the same physical machines; all containers use the same network and the default NAT Docker switch. Once one container becomes unhealthy, the problem slowly cascades to the rest of the SHIR containers. We do not use a proxy, and network issues are not occurring between on-prem and the Azure ADF/Synapse instances.

Are there any issues with running multiple SHIR containers on the same host that all connect to different Azure ADF/Synapse instances? We need to scale this out to hundreds of SHIR containers.

Server 2019 Standard 1809 build 17763.3406

The Dockerfile is the latest, with this addition:

RUN MD C:\Download
ADD https://github.com/adoptium/temurin8-binaries/releases/download/jdk8u345-b01/OpenJDK8U-jdk_x64_windows_hotspot_8u345b01.zip C:/Download
RUN MD "C:\Program Files\Eclipse Adoptium\jdk8u345-b01"
RUN tar -xf C:/Download/OpenJDK8U-jdk_x64_windows_hotspot_8u345b01.zip -C "C:\Program Files\Eclipse Adoptium"
RUN SETX PATH "%PATH%;C:\Program Files\Eclipse Adoptium\jdk8u345-b01\bin;C:\Program Files\Eclipse Adoptium\jdk8u345-b01\jre\bin\server" /m
RUN SETX JAVA_HOME "C:\Program Files\Eclipse Adoptium\jdk8u345-b01\" /m


The only docker warning that is logged on the host server:
Health check for container 39fbbf4f690da051145d18f9d4df16b6666108c76dd39cf73d177179bf961f60 error: context deadline exceeded

This shows up on all the containers that are unhealthy:
[09/22/2022 12:23:08] Registering SHIR node with the node key: redacted@ServiceEndpoint=usgovva.frontend.datamovement.azure.us@Vredacted
[09/22/2022 12:23:09] Registering SHIR node with the node name: redacted
[09/22/2022 12:23:09] Registering SHIR node with the enable high availability flag: true
[09/22/2022 12:23:09] Registering SHIR node with the tcp port: 8060
[09/22/2022 12:25:54] Start registering a new SHIR node
[09/22/2022 12:25:54] Enable High Availability
[09/22/2022 12:25:54] Remote Access Port: 8060
[09/22/2022 12:31:59] Waiting 60 seconds for connecting

Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMException
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

[09/22/2022 12:34:02] diahost.exe is not running

(The same Get-WmiObject error and "diahost.exe is not running" message then repeat roughly every two minutes: 12:36:06, 12:38:09, 12:40:11, 12:42:12, 12:44:12.)

Guidelines for dynamic scaling?

I am running the container on an AKS cluster, on a node in a VM scale set with 1-4 pods, using a Horizontal Pod Autoscaler and an automated cluster autoscaler. The SHIR is running as a backend for Synapse.

With this setup I scale up a node and a pod if the CPU goes above 60%, and scale down if it remains below 30% for at least 10 minutes.
As far as I've been able to test, after a scale-up event it takes around 10 minutes before the new node is available to pick up load.
The scale-down event, by contrast, is much faster.

However, I've encountered multiple issues with this setup, so I'm looking for best-practice guidelines on how to do this. Issues I perceive:

  • At the scale-up event (the actual event, before the node is available), some (all?) running copy activities fail on a lost connection.
  • At scale-up, when the new node becomes available, there can be multiple activities in the queue, yet none is redirected to the new node. Instead, once an activity on a previous node completes, any new activity may be scheduled on the new node, while the queued ones are only scheduled on the previous node.
  • Sometimes the node appears healthy as a pod, i.e. the health-check script returns a good status, yet the Synapse monitor reports a connectivity issue; sometimes there is also an error in the pod log. Does the health check only verify that things are running inside the pod? Would it not make more sense to verify not only that it is running, but also that it has good contact with Synapse?
  • A scale-down event can potentially kill and fail any in-flight activity; it does not necessarily terminate the node gracefully. Is there any way to have the SHIR cluster shift the load on a terminate request?

ERROR [internal] load metadata for mcr.microsoft.com/windows/servercore:ltsc2019

Hello,

When I download the repo and run the docker command on my Windows PC, here is the error:
ERROR: failed to solve: mcr.microsoft.com/windows/servercore:ltsc2019: no match for platform in manifest sha256:035894dc32a667c5f6f40054a4e92e208a60691438958c68696084fcce709ed5: not found

My Docker version on Windows: Docker version 24.0.5, build ced0996

Do you have any idea?

Thanks
