device's Introduction

naisdevice

naisdevice is a mechanism that enables NAV's developers to connect to internal resources in a secure and friendly manner.

Each resource is protected by a gateway, and the developer is only granted access to the gateway if all of the following requirements are met:

  • Has a valid account
  • Has accepted naisdevice terms and conditions
  • Device is healthy
  • Is a member of the AAD access group for the gateway (e.g. to connect to team A's DB via its gateway, you must be a member of team A's AAD group)
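
The four requirements above amount to a logical AND. The following is a minimal sketch of that decision in Go, not the actual naisdevice implementation: the `Device` and `Gateway` types, their field names, and the `MayAccess` function are all hypothetical and exist only to illustrate the rule.

```go
package main

import "fmt"

// Device is a hypothetical summary of the facts an access decision needs.
type Device struct {
	HasValidAccount bool
	AcceptedTerms   bool
	Healthy         bool
	Groups          []string // AAD groups the user belongs to
}

// Gateway is a hypothetical gateway description with a single access group.
type Gateway struct {
	Name        string
	AccessGroup string
}

// isMember reports whether group appears in groups.
func isMember(groups []string, group string) bool {
	for _, g := range groups {
		if g == group {
			return true
		}
	}
	return false
}

// MayAccess returns true only when every requirement is met.
func MayAccess(d Device, g Gateway) bool {
	return d.HasValidAccount &&
		d.AcceptedTerms &&
		d.Healthy &&
		isMember(d.Groups, g.AccessGroup)
}

func main() {
	gw := Gateway{Name: "team-a-db", AccessGroup: "team-a"}
	dev := Device{
		HasValidAccount: true,
		AcceptedTerms:   true,
		Healthy:         true,
		Groups:          []string{"team-a"},
	}
	fmt.Println("healthy device:", MayAccess(dev, gw))

	// A single failed requirement is enough to deny access.
	dev.Healthy = false
	fmt.Println("unhealthy device:", MayAccess(dev, gw))
}
```

Because every condition must hold, losing any one of them (for example, the device becoming unhealthy) removes access to the gateway.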

Deploying client changes

Run make release-frontend to release a new naisdevice client and make it available for download, install and update.

key attributes

  • minimal attack surface
  • instantly reacting to relevant security events
  • improved auditlogs: who connected when and to what
  • moving away from traditional device management enables building a strong security culture through educating our users on client security instead of automatically configuring their computers

components

apiserver

The apiserver component is the gRPC API server. It handles configuration and manages communication with the other agents.

Run API server locally

# Create a sqlite database file with a mock device
go run ./hack/local-device.go
# Start apiserver
go run ./cmd/apiserver

# Run device agent with access to your local apiserver
go run ./cmd/naisdevice-agent --local-apiserver

gateway-agent

The gateway-agent runs on virtual machines (VMs) and interacts with the apiserver to receive and apply configurations. Key features of the gateway-agent include:

  • Streaming configurations from the apiserver.
  • Dynamic setup of:
    • WireGuard for communication from devices.
    • iptables for forwarding traffic.

auth-server

The auth-server runs on Cloud Run and handles user authentication. Its functionalities include:

  • Authenticating users.
  • Issuing tokens to devices for secure communication.

enroller

The enroller is deployed on Cloud Run and is responsible for managing the enrollment process for both gateways and devices.

  • Handling the enrollment of gateways and devices securely.

device-helper

The device-helper serves as the gRPC API for the device-agent and performs essential setup tasks for devices. Key functionalities include:

  • Providing a gRPC API for the device-agent.
  • Reading device serial information.
  • Configuring network interfaces, routes, and WireGuard for secure communication.

device-agent

The device-agent is a crucial component responsible for managing device configurations and facilitating communication with the apiserver. Its main features include:

  • Streaming configurations from the apiserver.
  • Delegating configuration tasks to the device-helper via its gRPC API.
  • Serving status updates through its gRPC API to the CLI/systray.
  • Executing the authentication flow to obtain user tokens.

systray

The systray component acts as a graphical user interface (GUI) for the agent, utilizing its gRPC API. It provides a convenient way for users to interact with and monitor the agent's status.

controlplane-cli

The controlplane-cli serves as an administrative command-line interface (CLI) interacting with the apiserver through its gRPC API. This CLI is designed for administrative tasks and configurations.

prometheus-agent

The prometheus-agent component connects to all gateways over WireGuard and configures Prometheus (deployed on the same VM) to scrape relevant metrics.

  • Establishing connections to gateways using WireGuard.
  • Configuring Prometheus to scrape metrics from connected gateways.

FAQ

How to install

See https://doc.nais.io/operate/naisdevice/how-to/install/

Stuff we use

Kolide

WireGuard

device's People

Contributors

ahusby, androa, audunstrand, chinatsu, christeredvartsen, dependabot[bot], erlingjd, frodesundby, henrikhorluck, jhrv, jksolbakken, jrtm, kimtore, mortenlj, muni10, pcmoen, pjwalstrom, rbjornstad, sechmann, starefossen, thokra-nav, toby1knby, tommytroen, toresbe, tronghn, x10an14, x10an14-nav, ybelmekk


device's Issues

Create phar packages for device health update

To simplify installation and use of device health update, phar packages should be generated.

Two packages should be built, one for "update" and one for "get-checks". This can be added to the main workflow for the master branch. Snapshots can also be built for other branches.

Use tags in Kolide instead of a static configuration file

Now that Kolide has a concept of tags, we should use them for check severity.

Currently all checks in Kolide have been tagged with the correct levels, according to the existing severity levels found in the configuration file.

Clean up routes when tunnel is not working

Currently, if the tunnel stops working, the agent ends up in a never-ending loop because:

  • We add routes that tunnel Microsoft traffic through a gateway
  • We use Azure AD auth on our API server

In the scenario where the Microsoft gateway stops working, we're unable to fetch new access tokens, and therefore unable to communicate with the API.

Screenlock compliance check

A screenlock check is "missing" from the Linux tables in Osquery. We need a discussion on how to mitigate/solve this requirement.

as a naisdevice admin i want to have a dashboard (w/alerts) so i know if the system is working properly

  • instrument gateway-agent and apiserver with relevant metrics
  • setup metrics-rig, prometheus on own server? must communicate over tunnel
  • prometheus available as DS for common grafana instance

metrics:
gateways:
- throughput (gateways)
- device count
apiserver:
- device count
- healthy/unhealthy count
- apicalls by code
healthchecker:
- platformtype count

alerts:

  • if gateways or apiserver goes down
  • x time since healthchecker has run
  • ..

improve login-page

Some of our users seem to not appreciate the kekw meme as much as we do.

Gateway agent make restart less disruptive

At the moment the gateway-agent always sets up the wg interface on start (teardown + setup), which disconnects everyone. Instead of tearing it down, we can check whether the wg interface is already set up and skip that step if it is (or allow the ip link add command to fail).
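
A minimal sketch of the "skip setup if the interface already exists" idea, written in Go. This is not the gateway-agent's actual code: `ensureInterface`, the injected `exists` and `setup` callbacks, and the interface name `wg0` are assumptions made for illustration. The only standard library call relied on is `net.InterfaceByName`, which returns an error when no interface with that name is present.

```go
package main

import (
	"fmt"
	"net"
)

// ensureInterface runs setup only when the named interface is missing.
// It reports whether setup was run. Both callbacks are injected so the
// decision logic can be exercised without touching the host network.
func ensureInterface(name string, exists func(string) bool, setup func(string) error) (bool, error) {
	if exists(name) {
		// Interface is already up: leave it alone so connected peers stay connected.
		return false, nil
	}
	if err := setup(name); err != nil {
		return false, fmt.Errorf("setting up %s: %w", name, err)
	}
	return true, nil
}

func main() {
	// Real existence check backed by the standard library.
	exists := func(name string) bool {
		_, err := net.InterfaceByName(name)
		return err == nil
	}
	// Placeholder for the real setup step (e.g. running "ip link add").
	setup := func(name string) error {
		fmt.Println("would create interface", name)
		return nil
	}

	// "wg0" is an assumed interface name.
	created, err := ensureInterface("wg0", exists, setup)
	fmt.Println("created:", created, "err:", err)
}
```

With this shape a restart of the agent becomes a no-op for the interface when it is already present, so existing tunnels are not interrupted.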

Improve start-up time

Currently it takes ~20 seconds before all timers have run and the gateways can be reached:

  1. device-agent-helper takes 10 seconds to run sync on the bootstrap-config
  2. this allows device-agent, about 5 seconds later, to get gateways from apiserver
  3. 10 seconds later, these are synced and made available for user

AAD token is not refreshed properly

ERRO[2020-06-04T08:17:23+02:00] Unable to get gateway config: getting device config: Get http://10.255.240.1/devices/C02X1CGMJG5J/gateways: oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_grant","error_description":"AADSTS70043: The refresh token has expired or is invalid due to sign-in frequency checks by conditional access. The token was issued on 2020-06-03T07:09:26.9675714Z and the maximum allowed lifetime for this request is 82800.\r\nTrace ID: 39de2a28-2749-4a6e-8aeb-f05d97ad0701\r\nCorrelation ID: 672adb9a-e472-40ac-a220-c48f11499d90\r\nTimestamp: 2020-06-04 06:17:23Z","error_codes":[70043],"timestamp":"2020-06-04 06:17:23Z","trace_id":"39de2a28-2749-4a6e-8aeb-f05d97ad0701","correlation_id":"672adb9a-e472-40ac-a220-c48f11499d90","suberror":"token_expired"}

Example of a working on-prem gateway

We need to model which CIDRs the gateway should act as a proxy for.
Today we have two types of gateways: one that is only a NAT gateway (e.g. azure-gw) and one that proxies traffic (apiservers in GCP). The model must distinguish between these so that gateway-agent can configure iptables correctly depending on the gateway type.

This can possibly be inferred from the gateway having no routes defined.

if len(gateway.routes) == 0 {
  // proxy type
} else {
  // nat type
}

Agent seems unable to refresh tokens

After running agent for a while, this occurs:

INFO[2020-05-27T17:53:56+02:00] Starting device-agent with config:
{APIServer:http://10.255.240.1 Interface:utun69 ConfigDir:/Users/hrv/Library/Application Support/naisdevice BinaryDir:/usr/local/bin BootstrapToken: WireGuardBinary: WireGuardGoBinary: PrivateKeyPath: WireGuardConfigPath: BootstrapConfigPath: LogLevel:info OAuth2Config:{ClientID:8086d321-c6d3-4398-87da-0d54e3d93967 ClientSecret: Endpoint:{AuthURL:https://login.microsoftonline.com/62366534-1ec3-4962-8869-9b5535279d0b/oauth2/v2.0/authorize TokenURL:https://login.microsoftonline.com/62366534-1ec3-4962-8869-9b5535279d0b/oauth2/v2.0/token AuthStyle:0} RedirectURL:http://localhost:51800 Scopes:[openid 6e45010d-2637-4a40-b91d-d4cbb451fb57/.default offline_access]} Platform: BootstrapAPI:https://bootstrap.device.nais.io}
INFO[2020-05-27T17:53:56+02:00] If the browser didn't open, visit this url to sign in: https://login.microsoftonline.com/62366534-1ec3-4962-8869-9b5535279d0b/oauth2/v2.0/authorize?access_type=offline&client_id=8086d321-c6d3-4398-87da-0d54e3d93967&code_challenge=xxx-sg6nXyE4SnuZ0&code_challenge_method=S256&redirect_uri=http%3A%2F%2Flocalhost%3A51800&response_type=code&scope=openid+6e45010d-2637-4a40-b91d-d4cbb451fb57%2F.default+offline_access&state=HrNmLh1i5iNSf9YB
Starting device-agent-helper, you might be prompted for password
Password:
INFO[2020-05-27T17:54:02+02:00] Starting device-agent-helper with config:
{Interface:utun69 BinaryDir: WireGuardBinary:/usr/local/bin/naisdevice-wg WireGuardGoBinary:/usr/local/bin/naisdevice-wireguard-go WireGuardConfigPath:/Users/hrv/Library/Application Support/naisdevice/wg0.conf LogLevel:info DeviceIP:10.255.240.9}
ERRO[2020-05-27T19:04:23+02:00] Unable to get gateway config: getting device config: Get http://10.255.240.1/devices/SERIAL/gateways: oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_grant","error_description":"AADSTS70043: The refresh token has expired or is invalid due to sign-in frequency checks by conditional access. The token was issued on 2020-05-22T18:38:20.0528180Z and the maximum allowed lifetime for this request is 82800.\r\nTrace ID: 5d0a143c-3d3e-4872-8ac6-be1628421e00\r\nCorrelation ID: 39a0d8d8-368d-4237-ae96-9a8ddf1574f8\r\nTimestamp: 2020-05-27 17:04:23Z","error_codes":[70043],"timestamp":"2020-05-27 17:04:23Z","trace_id":"5d0a143c-3d3e-4872-8ac6-be1628421e00","correlation_id":"39a0d8d8-368d-4237-ae96-9a8ddf1574f8","suberror":"token_expired"}

Brutal cleanup of existing `wireguard-go` processes

device-agent-helper sometimes fails because of an existing wireguard-go process that, for unknown reasons, was not killed when a previous instance of device-agent-helper exited.

We could check for and kill existing wireguard-go processes when we start device-agent-helper.

Improve column names in the API server database

last_check and last_seen should be renamed.

last_check is only updated by the API server, and this occurs when a device is updated. Rename to last_updated?

last_seen is when Kolide last saw the device. Rename to kolide_last_seen?

Device health checker does not always check for failures

The device health checker only fetches failures for the Kolide devices if the failure_count attribute of the device is larger than the resolved_failure_count attribute. The issue is that resolved_failure_count is an always increasing number, while failure_count reflects the current amount of failing checks for the device, and will reset to 0 when a failure has been resolved.
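
The mismatch can be shown with a small Go sketch. The struct and both function names are hypothetical; only the two counters and the comparison come from the description above. The "proposed" condition is one possible fix and an assumption, not a decision recorded in this issue.

```go
package main

import "fmt"

// KolideDevice holds only the two counters discussed above.
type KolideDevice struct {
	FailureCount         int // checks failing right now; drops when a failure is resolved
	ResolvedFailureCount int // ever-increasing count of resolved failures
}

// shouldFetchCurrent mirrors the comparison described in the issue.
func shouldFetchCurrent(d KolideDevice) bool {
	return d.FailureCount > d.ResolvedFailureCount
}

// shouldFetchProposed is one possible alternative (assumption):
// fetch failures whenever any check is currently failing.
func shouldFetchProposed(d KolideDevice) bool {
	return d.FailureCount > 0
}

func main() {
	// A device that has resolved five failures in the past and has one new failure.
	d := KolideDevice{FailureCount: 1, ResolvedFailureCount: 5}
	fmt.Println("current condition fetches:", shouldFetchCurrent(d))
	fmt.Println("proposed condition fetches:", shouldFetchProposed(d))
}
```

For this device the current condition evaluates to false (1 is not larger than 5), so the new failure is never fetched, while the proposed condition evaluates to true.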

Automate gateway setup

Today we use Terraform to set up the VM, networking, secrets and service users.
The rest of the setup (packages, systemd and kernel settings) is done manually and should be automated.
