@jupiterone/sdk

A collection of packages supporting integrations with JupiterOne.

Development Resources

Introduction

Integrating with JupiterOne may take one of these paths:

  1. A structured integration leveraging this SDK to dramatically simplify the synchronization process, essential for any significant, ongoing integration effort
  2. A command line script (sh, bash, zsh, etc.) using the JupiterOne CLI tool to easily query/create/update/delete entities and relationships in bulk
  3. Any programming/scripting language making HTTP GraphQL requests to query/create/update/delete entities and relationships
  4. A JavaScript program using the JupiterOne Node.js client library to query/create/update/delete entities and relationships

The integration SDK structures an integration as a collection of simple, atomic steps, executed in a particular order. It submits generated entities and relationships, along with the raw data from the provider used to build the entities, to the JupiterOne synchronization system, which offloads complex graph update operations, provides integration progress information, and isolates failures to allow for as much ingestion as possible.
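
The step model described above can be sketched in TypeScript. This is an illustrative model only, not the SDK's actual types; the property names (`id`, `name`, `dependsOn`, `executionHandler`) mirror the step examples that appear in the issues later in this document:

```typescript
// A minimal model of an integration broken into atomic, ordered steps.
// This is a sketch of the idea, not the SDK's real interfaces.
type StepContext = { jobState: { addEntities(entities: object[]): void } };

interface Step {
  id: string;
  name: string;
  dependsOn?: string[];
  executionHandler: (ctx: StepContext) => Promise<void>;
}

const steps: Step[] = [
  {
    id: 'fetch-accounts',
    name: 'Fetch Accounts',
    async executionHandler({ jobState }) {
      jobState.addEntities([{ _type: 'acme_account', _key: 'account:1' }]);
    },
  },
  {
    id: 'fetch-users',
    name: 'Fetch Users',
    dependsOn: ['fetch-accounts'], // runs only after fetch-accounts completes
    async executionHandler({ jobState }) {
      jobState.addEntities([{ _type: 'acme_user', _key: 'user:1' }]);
    },
  },
];

// Execute steps respecting dependsOn ordering (naive topological pass).
async function executeSteps(all: Step[], ctx: StepContext): Promise<string[]> {
  const done: string[] = [];
  const pending = [...all];
  while (pending.length > 0) {
    const next = pending.findIndex((s) =>
      (s.dependsOn ?? []).every((d) => done.includes(d)),
    );
    if (next === -1) throw new Error('dependency cycle among steps');
    const step = pending.splice(next, 1)[0];
    await step.executionHandler(ctx);
    done.push(step.id);
  }
  return done;
}
```

Ordering by `dependsOn` is what lets the runtime isolate failures: a failed step only blocks its dependents, not unrelated branches of the graph.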

An integration built this way runs not only on your local machine; it can be deployed to JupiterOne's managed infrastructure. You can easily build the integration you need today and run it wherever you'd like. When you're ready, we can help you get that integration running within the JupiterOne infrastructure, lowering your operational costs and simplifying adoption of your integration within the security community!

Please reference the integration development documentation for details about how to develop integrations with this SDK.

Development

To get started with development:

  1. Install dependencies using npm install
  2. Run npm run build

This project utilizes TypeScript project references for incremental builds. To prepare all of the packages, run npm run build. If you are making changes across multiple packages, it is recommended that you run npm run build -- --watch to automatically compile changes as you work.

Linking packages

If you are making changes to the SDK and you want to test the changes in another project then it is recommended to automatically rebuild and link this project when changes are made.

Steps to automatically build and link:

  • Run npm run build or npm run build -- --watch in this project from a terminal and wait for the initial build to complete.

  • Run npm link in the package that you want to link.

  • In a separate terminal, run npm link @jupiterone/<package to link> in the integration project. You can now use the integration SDK CLI in the other project and it will use the latest code on your filesystem.

Versioning this project

To version all packages in this project and tag the repo with a new version number, run the following (where major.minor.patch is the version you expect to move to). Don't forget to update the CHANGELOG.md file!

git checkout -b release-<major>.<minor>.<patch>
git push -u origin release-<major>.<minor>.<patch>
npm exec lerna version <major>.<minor>.<patch>

Note the git checkout/git push is required because Lerna will expect that you've already created a remote branch before bumping, tagging, and pushing the local changes to remote.

โ•Make sure to have committed all your changes before running npm exec lerna version since it will commit the version update and tag that commit. Rebasing or amending lerna's commit will cause the tag to point to a different commit.

Issues

Allow an integration to provide a step context initialization function

I'd like the steps to receive an integration-specific sub-type of the IntegrationStepExecutionContext, which has integration-specific properties on it used across the steps. This initialization should happen once, before the steps are called, and the executionHandler should be typed to receive the integration-specific context.

A concrete use case: initializing a single provider API client object. Without this mechanism, each integration has to invent a way to share a single instance of an API client across the steps.
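
A sketch of that use case, where a one-time initializer extends the base context with a shared API client. All names here (`initializeContext`, `AcmeContext`, `createApiClient`) are hypothetical illustrations, not SDK API:

```typescript
// Hypothetical one-time context initializer: runs before any step, so every
// step shares a single API client instead of constructing its own.
interface BaseContext {
  instance: { config: Record<string, string> };
}

interface AcmeContext extends BaseContext {
  apiClient: { baseUrl: string; get(path: string): string };
}

function createApiClient(config: Record<string, string>) {
  return {
    baseUrl: config.baseUrl,
    // Stubbed request method for illustration; a real client would do HTTP.
    get: (path: string) => `GET ${config.baseUrl}${path}`,
  };
}

// The proposed hook: invoked once; each step would then receive AcmeContext.
function initializeContext(base: BaseContext): AcmeContext {
  return { ...base, apiClient: createApiClient(base.instance.config) };
}
```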

Move test fixtures and utils into someplace that's a little easier to access

It would be nice to be able to access the utilities without having to dig into other __tests__ directories.

There's a simple trick that can be done to have node resolve the directory without having to muck with require (see this test and this directory)

Once we move to a monorepo, we could just put the utils and fixtures in a separate package that won't be published.

Provide method for sharing state between steps

In the Qualys integration, it was necessary to collect QID values (Qualys ID values) from multiple collectors/steps and then, at the end, batch fetch all of the Qualys vulnerabilities using those accumulated QID values. That is, we need some way to collect QID values across multiple steps and share that state with another step that will actually fetch the Qualys vulnerabilities.

I'm open to different ideas for how to solve this problem as well. In this specific Qualys example, it seems necessary to accumulate a set of values (to avoid duplicates) and then wait until the end to actually fetch the entities from that set.
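
One possible shape for this, sketched with a module-level map standing in for whatever shared-state mechanism the SDK would provide (the storage mechanism and function names are assumptions for illustration):

```typescript
// Sketch of cross-step state: collector steps accumulate Qualys QIDs into a
// Set (deduplicating as they go); a final step drains the set to batch-fetch
// vulnerabilities. The shared map stands in for a real jobState facility.
const sharedState = new Map<string, unknown>();

function collectQids(qids: number[]): void {
  const set =
    (sharedState.get('qids') as Set<number> | undefined) ?? new Set<number>();
  for (const qid of qids) set.add(qid);
  sharedState.set('qids', set);
}

// Called by the final step once all collectors have run.
function accumulatedQids(): number[] {
  const set = sharedState.get('qids') as Set<number> | undefined;
  return set ? [...set].sort((a, b) => a - b) : [];
}
```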

Convert project into a monorepo

Switching to a monorepo will help keep testing and cli code out of the final package/bundle that we hope to use in deployed environments while still keeping packages nicely in sync.

The code is already pretty well segregated so creating separate packages should be pretty easy.

src/cli -> @jupiterone/integration-sdk-cli
src/framework -> @jupiterone/integration-sdk (or maybe @jupiterone/integration-sdk-core?)
src/testing -> @jupiterone/integration-sdk-testing

Disable schema validation when deployed to production

Since we might not always get complete data back from an integration, we should avoid enforcing schema validation when deployed to a production environment. Schema validation should only occur when developing and testing an integration.
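
A minimal sketch of the gating logic; the environment variable name is an assumption for illustration:

```typescript
// Sketch: gate schema validation on the execution environment so incomplete
// provider data does not fail production runs. The variable name
// RUNNING_ENVIRONMENT is hypothetical.
function shouldValidateSchema(
  env: Record<string, string | undefined>,
): boolean {
  // Validate everywhere except deployed production environments.
  return env.RUNNING_ENVIRONMENT !== 'production';
}
```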

Allow optional & default fields for instanceConfig.

For example, see the field role below:

{
  "username": {
    "type": "string"
  },
  "account": {
    "type": "string"
  },
  "password": {
    "type": "string",
    "mask": true
  },
  "role": {
     "type": "string",
     "optional": true,
     "default": "jupiterone"
  }
}
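
The field metadata above could be applied to a raw instance config roughly like this (a sketch, using the field shapes from the example; `applyConfigFields` is a hypothetical helper):

```typescript
// Sketch: required fields must be present, optional fields fall back to their
// declared default, and optional fields with no value or default are omitted.
interface ConfigField {
  type: string;
  mask?: boolean;
  optional?: boolean;
  default?: string;
}

function applyConfigFields(
  fields: Record<string, ConfigField>,
  raw: Record<string, string | undefined>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, field] of Object.entries(fields)) {
    const value = raw[name] ?? field.default;
    if (value === undefined) {
      if (!field.optional) {
        throw new Error(`Missing required config field "${name}"`);
      }
      continue;
    }
    result[name] = value;
  }
  return result;
}
```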

`validateInvocation` doesn't provide enough details of what is invalid when returning `false`

I've noticed that integrations are inconsistent in what they return/do in the validation function. I think we need this to be more consistent. In jupiter-managed-integration-sdk, the function did not return anything, but expected one error or another to be thrown, such as https://bitbucket.org/lifeomic/jupiter-managed-integration-sdk/src/858886f3f9d71eecb407cae153991bf711491e0f/src/integration/types/errors.ts#lines-55, because a simple true or false isn't very useful for communicating why the invocation isn't valid. Of course, we could allow for returning more than just true or false, to include some kind of reason, such as [false, "Authentication failed (path=/thing/I/checked, code=401)"] or [false, "Authorization forbidden (path=/resource/I/tried, code=403)"].
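
A discriminated-union return type is one way to sketch the richer result suggested above (names are illustrative, not a committed API):

```typescript
// Sketch: validation returns either success or a failure carrying a reason,
// instead of a bare boolean that hides why the invocation is invalid.
type ValidationResult =
  | { valid: true }
  | { valid: false; reason: string };

function describeFailure(path: string, code: number): ValidationResult {
  const kind =
    code === 401
      ? 'Authentication failed'
      : code === 403
        ? 'Authorization forbidden'
        : 'Request failed';
  return { valid: false, reason: `${kind} (path=${path}, code=${code})` };
}
```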

Support `--step` option in `sync` and `run` commands

Most of the logic already exists in the collect command; we just need to pull that out. For the sync command, we will probably need to adapt the code that determines partial datasets from a collection run to work with it.

Change "integrationConfigFields.json" to be JSON schema compatible

Currently, integrations perform a lot of basic validation on IntegrationInstance config that could be shifted to automatic validation in the SDK. We could do this by changing our existing integrationConfigFields.json to be JSON schema compatible.

For example, many integrations have validation like this:

const apiKey = instance.config?.apiKey;
if (!apiKey) {
  throw new Error(
    'Configuration option "apiKey" is missing on the integration instance config',
  );
}

// More simple validation...

Proposed:

integrationConfigFields.json:

{
  // We assume that every `integrationConfigFields.json` is type "object". If it is specified, we
  // will just ignore it.
  // type: "object",
  "properties": {
    "apiKey": {
      "type": "string",
      "format": "masked"
    },
    "myNumberField": {
      "type": "number",
      "minimum": 1,
      "maximum": 5
    }
  },
  "required": ["apiKey"]
}

The apiKey property above has a "format" property. We will write a custom ajv formatter for masked properties.
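
To make the idea concrete without pulling in ajv, here is a hand-rolled sketch of the same checks (required properties, primitive types, numeric bounds). In the actual proposal ajv would perform this validation, with a custom format registered for "masked":

```typescript
// Minimal sketch of schema-driven config validation. Returns a list of
// human-readable errors; an empty list means the config is valid.
interface Schema {
  properties: Record<
    string,
    { type: string; format?: string; minimum?: number; maximum?: number }
  >;
  required?: string[];
}

function validateConfig(
  schema: Schema,
  config: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const name of schema.required ?? []) {
    if (config[name] === undefined) errors.push(`"${name}" is required`);
  }
  for (const [name, prop] of Object.entries(schema.properties)) {
    const value = config[name];
    if (value === undefined) continue;
    if (typeof value !== prop.type) errors.push(`"${name}" must be a ${prop.type}`);
    if (typeof value === 'number') {
      if (prop.minimum !== undefined && value < prop.minimum)
        errors.push(`"${name}" must be >= ${prop.minimum}`);
      if (prop.maximum !== undefined && value > prop.maximum)
        errors.push(`"${name}" must be <= ${prop.maximum}`);
    }
  }
  return errors;
}
```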

Ensure source timestamps are collected

In order to support mechanisms such as dealing with an update operation that is reflecting an older state of source data, the integration SDK should work to ensure time stamps from the source are included in the entities/relationships.

The data model documents createdOn and updatedOn for tracking the source timestamps. This should be required data, even if it is generated by the integration when it is not known/provided by the source, indicating the time that the integration collected the data, in case something else collects the data at a later time but delivers it to the synchronizer earlier.

Once this data is ensured to be included in the data delivered to the synchronization job, the persister should be able to rely on it. It seems there is already some work it does to avoid overwriting newer data when an operation is stale.
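
The fallback rule could be sketched like this (`ensureTimestamps` is a hypothetical helper name):

```typescript
// Sketch: guarantee createdOn/updatedOn on every entity, falling back to the
// collection time when the source does not provide them, so the persister can
// compare timestamps and skip stale updates.
interface SourceData {
  createdOn?: number;
  updatedOn?: number;
}

function ensureTimestamps<T extends SourceData>(
  entity: T,
  collectedAt: number = Date.now(),
): T & { createdOn: number; updatedOn: number } {
  return {
    ...entity,
    createdOn: entity.createdOn ?? collectedAt,
    // Prefer the source's updatedOn, then createdOn, then collection time.
    updatedOn: entity.updatedOn ?? entity.createdOn ?? collectedAt,
  };
}
```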

Improve `createIntegrationEntity` and document behavior

This function has lofty goals, but its current behavior isn't documented well enough; on a number of occasions, developers haven't known what to do with source, or they don't like what it does or find its behavior unclear.

Consider moving dependsOn property to metadata in index file

It would be nice to see all of the dependsOn declarations in a single index.ts file instead of having to open each step file to get that information. I think the metadata of the steps should be in index.ts and the step function code should be in separate files.

Provide guidance and error type for integrations raising errors

The gitlab integration stores an API response code in IntegrationError.code, and there may be another integration doing something similar. However, I now think that was not the best choice.

In https://github.com/JupiterOne/integration-sdk/blob/master/src/errors.ts#L7, and everywhere in the integration-sdk currently, the values for code are constant-like strings, and there are error classes that assign code so that we don't see the strings repeated where the errors are thrown (great!).

We should avoid having each integration develop its own set of codes for common things, so that we can make some things consistent for runtime monitoring and debugging. We'll need a clear way for integrations to raise certain types of errors, and limit the introduction of new error codes as much as reasonable.

Consider this proposal:

/**
 * A type representing errors generated or handled by the integration runtime.
 * Integrations should not throw this error type directly. See below for
 * options.
 *
 * All errors occurring during execution of an integration should ultimately end
 * up as an `IntegrationError`, or a subclass thereof, each expressing a `code`
 * reflecting the class of the error. In case an unhandled `Error` occurs, which
 * has no `code`, then `"UNEXPECTED_ERROR"` will be used in logging the error.
 *
 * Integration code should throw these exceptions:
 *
 * ```
 *   // Thrown during `validateInvocation`, anything the user should see that
 *   // can help them fix the problem on their own.
 *   throw new IntegrationValidationError("Something that will be displayed to user so they can fix configuration")
 *
 *   // Typically thrown during `validateInvocation`. User will be informed of
 *   // what we're calling to test authentication, and the response code.
 *   throw new IntegrationProviderAuthenticationError({
 *     cause: err,
 *     endpoint: "https://something.com/api/authcheckwhatever",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 *
 *   // Thrown during resource fetching steps. The runtime will show these along
 *   // with information about what could not be obtained thanks to insufficient
 *   // access.
 *   throw new IntegrationProviderAuthorizationError({
 *     type: [], // the `_type`s that could not be obtained
 *     cause: err,
 *     endpoint: "https://something.com/api/resource",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 *
 *   // Thrown for unexpected API errors. This will be shown to users to help
 *   // them understand why some data sets may be incomplete.
 *   throw new IntegrationProviderAPIError({
 *     cause: err,
 *     endpoint: "https://something.com/api/something",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 * ```
 */
export class IntegrationError extends Error { ... }
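
The base of that hierarchy could be sketched as follows; the specific code string is illustrative, but the shape (a constant-like `code` on every error, with `"UNEXPECTED_ERROR"` as the fallback for unhandled errors) follows the comment above:

```typescript
// Sketch of the proposed error hierarchy: every runtime error carries a
// constant-like `code`; unhandled plain Errors are logged as UNEXPECTED_ERROR.
class IntegrationError extends Error {
  constructor(
    readonly code: string,
    message: string,
  ) {
    super(message);
  }
}

class IntegrationValidationError extends IntegrationError {
  constructor(message: string) {
    super('CONFIG_VALIDATION_ERROR', message); // code value is illustrative
  }
}

// What the runtime would record for any error it catches.
function codeForLogging(err: Error): string {
  return err instanceof IntegrationError ? err.code : 'UNEXPECTED_ERROR';
}
```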

Report key metrics to a metrics service, including important dimensions

In order to support visibility into execution patterns, report key events to a metrics service. This should be an interface with at least two implementations: one for local execution and, for the managed execution environment, one backed by a cloud-native metrics service.

Each metric should include these dimensions:

  • accountId
  • integrationDefinitionId
  • integrationInstanceId
  • integrationJobId

Let's consider starting with these metrics:

  • Time to execute each step within the integration
  • Total time to complete an execution
  • Number of executions

Report code and id to users in job event error log

When an error occurs, it is very helpful to include the error code and a unique ID in the job event log for the error event. The jupiter-managed-integration-sdk would set the event description to something like, Step "Mochi" failed to complete due to error (code=Forbidden errorId=abc123-def-456-ghi-789).

Perhaps we could drop the due to error bit, since this new format will clearly indicate it was an error.
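
The suggested format can be sketched as a small formatting helper (dropping the "due to error" phrase, per the note above):

```typescript
// Sketch of the proposed job event description for error events.
function errorEventDescription(
  stepName: string,
  code: string,
  errorId: string,
): string {
  return `Step "${stepName}" failed to complete (code=${code} errorId=${errorId})`;
}
```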

Add `addEntity`, `addRelationship` (singular forms of existing functions) to storage interface

I think we should consider adding addEntity and addRelationship to the storage interface. This could help to avoid a pattern where folks end up creating collector variables, which just end up keeping unnecessary references around longer than needed (memory bloat in the frame). That is, I can see folks collecting all the things and calling addEntities/addRelationships just once at the end of a step, thereby negating the benefits of the flushing that the store does. Of course, we should also consider adding guidance that encourages working on data in small batches.
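
A sketch of the singular helpers, delegating to the existing plural method so the store's own batching/flushing still decides when to write (interface shape is simplified; the real jobState methods are asynchronous):

```typescript
// Simplified, synchronous sketch: addEntity hands single entities to the
// existing addEntities, so callers never need collector variables.
interface JobState {
  addEntities(entities: object[]): void;
}

function withSingularHelpers(state: JobState) {
  return {
    ...state,
    addEntity: (entity: object) => state.addEntities([entity]),
  };
}
```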

Poll for sync job completion when running "sync" command from CLI

Currently, we log the following when running sync command from CLI:

Synchronization job status: FINALIZE_PENDING
Entities uploaded: 120
Relationships uploaded: 148

We should add an option such as --wait that will cause the command to poll for the job to change to a terminal state.
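
The polling loop behind a --wait option could be sketched like this; the terminal status names are assumptions for illustration (FINALIZE_PENDING comes from the log output above):

```typescript
// Sketch: poll the sync job status until it reaches a terminal state.
// The terminal state names here are illustrative.
const TERMINAL_STATES = new Set(['FINISHED', 'ABORTED', 'ERROR']);

async function waitForSyncJob(
  fetchStatus: () => Promise<string>,
  delayMs = 1000,
): Promise<string> {
  for (;;) {
    const status = await fetchStatus();
    if (TERMINAL_STATES.has(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```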

Couple `getIntegrationStepStartState` with the individual `IntegrationStep` it is associated with

Currently, there is a getStepStartStates function that is used to determine which steps are disabled. This is particularly useful when an integration has a set of permissions passed into the integration configuration, which are then used to conditionally disable steps. It would be great if we could couple the start state condition (IntegrationStepStartState) within each IntegrationStep instead of the getStepStartStates.ts file at the root of the src.

Old:

getStepStartStates.ts

export default function getStepStartStates(
  executionContext: IntegrationExecutionContext,
): IntegrationStepStartStates {
  return {
    'my-state': { disabled: true },
  };
}

Proposed:

const step: IntegrationStep = {
  id: '...',
  name: '...',
  types: ['...'],
  getIntegrationStepStartState(
    stepExecutionContext: IntegrationStepExecutionContext
  ): IntegrationStepStartState {
    return { disabled: true };
  },
  async executionHandler({
    instance,
    jobState,
  }: IntegrationStepExecutionContext) {
    //
  },
};

Recorded responses are stored as gzip, cannot do PR reviews for sensitive content

It isn't possible during pull request reviews to see if there is sensitive information in the responses. There are a couple of projects already doing decompression.

Generally speaking, it would be necessary to look at the response encoding and, when possible, decode and store UTF-8 text. It will be important that the recordings still work when replayed.
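
The decoding side could be sketched with Node's built-in zlib (the function name and recording shape are assumptions for illustration):

```typescript
// Sketch: when a recorded response body is gzip-encoded, decompress it and
// return UTF-8 text so the recording is reviewable in a pull request.
import { gunzipSync } from 'node:zlib';

function decodeRecordedBody(body: Buffer, contentEncoding?: string): string {
  const raw = contentEncoding === 'gzip' ? gunzipSync(body) : body;
  return raw.toString('utf-8');
}
```

On replay, the recording would need its content-encoding header adjusted (or the body re-compressed) so it still matches what the HTTP client expects.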

Report resource utilization data at end of job execution

This will allow for:

  1. Capacity planning in production environments
  2. Visibility into the amount of data an integration collects
  3. Encourage developers to minimize utilization

Keep in mind that some metrics are auto-collected by nature of the environment, such as AWS Lambda max memory usage. Also, we may end up moving away from the simple file storage mechanism currently in use. Therefore, consider design by interface and allow for collecting the resource usage info from the storage implementation.

`getStepStartStates` that are missing steps causes collect to crash

12:45 $ NODE_OPTIONS=--enable-source-maps yarn j1-integration collect
yarn run v1.22.4
$ /Users/aiwilliams/Workspaces/jupiterone-io/graph-azure/node_modules/.bin/j1-integration collect
TypeScript files detected. Registering ts-node.

Configuration loaded! Running integration...

TypeError: Cannot read property 'disabled' of undefined
    at /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:247:46
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:339:38
    at Array.map (<anonymous>)
    at buildStepResultsMap (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:236:10)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:323:6
    at Object.executeStepDependencyGraph (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:59:28)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:83:26
    at Object.executeSteps (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/step.js:11:30)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/step.ts:26:10
    at executeWithContext (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/executeIntegration.js:48:49)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/executeIntegration.ts:96:40
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async executeIntegrationInstance (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/executeIntegration.js:26:20)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/executeIntegration.ts:61:18
    at async Command.<anonymous> (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/cli/commands/collect.js:68:25)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/cli/commands/collect.ts:91:23
    at async Promise.all (index 0)
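
A defensive fix could validate the map returned by getStepStartStates against the known step ids before the dependency graph is built, failing with a descriptive error instead of crashing on `undefined.disabled` (a sketch; `resolveStartStates` is a hypothetical helper):

```typescript
// Sketch: verify every step has a declared start state up front, so a missing
// entry produces an actionable error rather than a TypeError deep in the
// dependency graph.
interface StepStartState {
  disabled: boolean;
}

function resolveStartStates(
  stepIds: string[],
  declared: Record<string, StepStartState>,
): Record<string, StepStartState> {
  const resolved: Record<string, StepStartState> = {};
  for (const id of stepIds) {
    const state = declared[id];
    if (state === undefined) {
      throw new Error(`getStepStartStates did not return a state for step "${id}"`);
    }
    resolved[id] = state;
  }
  return resolved;
}
```

An alternative design choice is to default missing steps to `{ disabled: false }`; the explicit error is stricter but makes typos in step ids immediately visible.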
