@jupiterone/sdk

A collection of packages supporting integrations with JupiterOne.

Development Resources

Introduction

Integrating with JupiterOne may take one of these paths:

  1. A structured integration leveraging this SDK to dramatically simplify the synchronization process, essential for any significant, ongoing integration effort
  2. A command line script (sh, bash, zsh, etc.) using the JupiterOne CLI tool to easily query/create/update/delete entities and relationships in bulk
  3. Any programming/scripting language making HTTP GraphQL requests to query/create/update/delete entities and relationships
  4. A JavaScript program using the JupiterOne Node.js client library to query/create/update/delete entities and relationships

The integration SDK structures an integration as a collection of simple, atomic steps, executed in a particular order. It submits generated entities and relationships, along with the raw data from the provider used to build the entities, to the JupiterOne synchronization system, which offloads complex graph update operations, provides integration progress information, and isolates failures to allow for as much ingestion as possible.
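
The step model described above can be sketched in TypeScript. This is an illustrative model only, not the SDK's actual types; the property names (`id`, `name`, `dependsOn`, `executionHandler`) mirror the step examples that appear in the issues later in this document:

```typescript
// A minimal model of an integration broken into atomic, ordered steps.
// This is a sketch of the idea, not the SDK's real interfaces.
type StepContext = { jobState: { addEntities(entities: object[]): void } };

interface Step {
  id: string;
  name: string;
  dependsOn?: string[];
  executionHandler: (ctx: StepContext) => Promise<void>;
}

const steps: Step[] = [
  {
    id: 'fetch-accounts',
    name: 'Fetch Accounts',
    async executionHandler({ jobState }) {
      jobState.addEntities([{ _type: 'acme_account', _key: 'account:1' }]);
    },
  },
  {
    id: 'fetch-users',
    name: 'Fetch Users',
    dependsOn: ['fetch-accounts'], // runs only after fetch-accounts completes
    async executionHandler({ jobState }) {
      jobState.addEntities([{ _type: 'acme_user', _key: 'user:1' }]);
    },
  },
];

// Execute steps respecting dependsOn ordering (naive topological pass).
async function executeSteps(all: Step[], ctx: StepContext): Promise<string[]> {
  const done: string[] = [];
  const pending = [...all];
  while (pending.length > 0) {
    const next = pending.findIndex((s) =>
      (s.dependsOn ?? []).every((d) => done.includes(d)),
    );
    if (next === -1) throw new Error('dependency cycle among steps');
    const step = pending.splice(next, 1)[0];
    await step.executionHandler(ctx);
    done.push(step.id);
  }
  return done;
}
```

Ordering by `dependsOn` is what lets the runtime isolate failures: a failed step only blocks its dependents, not unrelated branches of the graph.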

An integration built this way runs not only on your local machine; it can be deployed to JupiterOne's managed infrastructure. You can easily build the integration you need today and run it wherever you'd like. When you're ready, we can help you get that integration running within the JupiterOne infrastructure, lowering your operational costs and simplifying adoption of your integration within the security community!

Please reference the integration development documentation for details about how to develop integrations with this SDK.

Development

To get started with development:

  1. Install dependencies using npm install
  2. Run npm run build

This project utilizes TypeScript project references for incremental builds. To prepare all of the packages, run npm run build. If you are making changes across multiple packages, it is recommended that you run npm run build -- --watch to automatically compile changes as you work.

Linking packages

If you are making changes to the SDK and you want to test the changes in another project then it is recommended to automatically rebuild and link this project when changes are made.

Steps to automatically build and link:

  • Run npm run build or npm run build -- --watch in this project from a terminal and wait for the initial build to complete.

  • Run npm link in the package that you want to link.

  • In a separate terminal, run npm link @jupiterone/<package to link> in the integration project. You can now use the integration SDK CLI in the other project and it will use the latest code on your filesystem.

Versioning this project

To version all packages in this project and tag the repo with a new version number, run the following (where major.minor.patch is the version you expect to move to). Don't forget to update the CHANGELOG.md file!

git checkout -b release-<major>.<minor>.<patch>
git push -u origin release-<major>.<minor>.<patch>
npm exec lerna version <major>.<minor>.<patch>

Note the git checkout/git push is required because Lerna will expect that you've already created a remote branch before bumping, tagging, and pushing the local changes to remote.

โ•Make sure to have committed all your changes before running npm exec lerna version since it will commit the version update and tag that commit. Rebasing or amending lerna's commit will cause the tag to point to a different commit.

Issues

Allow an integration to provide a step context initialization function

I'd like the steps to receive an integration-specific sub-type of the IntegrationStepExecutionContext, which has integration-specific properties on it used across the steps. This initialization should happen once, before the steps are called, and the executionHandler should be typed to receive the integration-specific context.

A concrete use case: initializing a single provider API client object. Without this mechanism, each integration has to invent a way to share a single instance of an API client across the steps.
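
A sketch of that use case, where a one-time initializer extends the base context with a shared API client. All names here (`initializeContext`, `AcmeContext`, `createApiClient`) are hypothetical illustrations, not SDK API:

```typescript
// Hypothetical one-time context initializer: runs before any step, so every
// step shares a single API client instead of constructing its own.
interface BaseContext {
  instance: { config: Record<string, string> };
}

interface AcmeContext extends BaseContext {
  apiClient: { baseUrl: string; get(path: string): string };
}

function createApiClient(config: Record<string, string>) {
  return {
    baseUrl: config.baseUrl,
    // Stubbed request method for illustration; a real client would do HTTP.
    get: (path: string) => `GET ${config.baseUrl}${path}`,
  };
}

// The proposed hook: invoked once; each step would then receive AcmeContext.
function initializeContext(base: BaseContext): AcmeContext {
  return { ...base, apiClient: createApiClient(base.instance.config) };
}
```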

Move test fixtures and utils into someplace that's a little easier to access

It would be nice to be able to access the utilities without having to dig into other __tests__ directories.

There's a simple trick that can be done to have node resolve the directory without having to muck with require (see this test and this directory)

Once we move to a monorepo, we could just put the utils and fixtures in a separate package that won't be published.

Provide method for sharing state between steps

In the Qualys integration, it was necessary to collect QID values (Qualys ID values) from multiple collectors/steps and then, at the end, batch fetch all of the Qualys vulnerabilities using those accumulated QID values. That is, we need some way to collect QID values across multiple steps and share that state with another step that will actually fetch the Qualys vulnerabilities.

I'm open to different ideas for how to solve this problem as well. In this specific Qualys example, it seems necessary to accumulate a set of values (to avoid duplicates) and then wait until the end to actually fetch the entities from that set.
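
One possible shape for this, sketched with a module-level map standing in for whatever shared-state mechanism the SDK would provide (the storage mechanism and function names are assumptions for illustration):

```typescript
// Sketch of cross-step state: collector steps accumulate Qualys QIDs into a
// Set (deduplicating as they go); a final step drains the set to batch-fetch
// vulnerabilities. The shared map stands in for a real jobState facility.
const sharedState = new Map<string, unknown>();

function collectQids(qids: number[]): void {
  const set =
    (sharedState.get('qids') as Set<number> | undefined) ?? new Set<number>();
  for (const qid of qids) set.add(qid);
  sharedState.set('qids', set);
}

// Called by the final step once all collectors have run.
function accumulatedQids(): number[] {
  const set = sharedState.get('qids') as Set<number> | undefined;
  return set ? [...set].sort((a, b) => a - b) : [];
}
```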

Convert project into a monorepo

Switching to a monorepo will help keep testing and cli code out of the final package/bundle that we hope to use in deployed environments while still keeping packages nicely in sync.

The code is already pretty well segregated so creating separate packages should be pretty easy.

src/cli -> @jupiterone/integration-sdk-cli
src/framework -> @jupiterone/integration-sdk (or maybe @jupiterone/integration-sdk-core?)
src/testing -> @jupiterone/integration-sdk-testing

Disable schema validation when deployed to production

Since we might not always get complete data back from an integration, we should avoid enforcing schema validation when deployed to a production environment. Schema validation should only occur when developing and testing an integration.
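
A minimal sketch of the gating logic; the environment variable name is an assumption for illustration:

```typescript
// Sketch: gate schema validation on the execution environment so incomplete
// provider data does not fail production runs. The variable name
// RUNNING_ENVIRONMENT is hypothetical.
function shouldValidateSchema(
  env: Record<string, string | undefined>,
): boolean {
  // Validate everywhere except deployed production environments.
  return env.RUNNING_ENVIRONMENT !== 'production';
}
```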

Allow optional & default fields for instanceConfig.

For example, see the field role below:

{
  "username": {
    "type": "string"
  },
  "account": {
    "type": "string"
  },
  "password": {
    "type": "string",
    "mask": true
  },
  "role": {
     "type": "string",
     "optional": true,
     "default": "jupiterone"
  }
}
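
The field metadata above could be applied to a raw instance config roughly like this (a sketch, using the field shapes from the example; `applyConfigFields` is a hypothetical helper):

```typescript
// Sketch: required fields must be present, optional fields fall back to their
// declared default, and optional fields with no value or default are omitted.
interface ConfigField {
  type: string;
  mask?: boolean;
  optional?: boolean;
  default?: string;
}

function applyConfigFields(
  fields: Record<string, ConfigField>,
  raw: Record<string, string | undefined>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, field] of Object.entries(fields)) {
    const value = raw[name] ?? field.default;
    if (value === undefined) {
      if (!field.optional) {
        throw new Error(`Missing required config field "${name}"`);
      }
      continue;
    }
    result[name] = value;
  }
  return result;
}
```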

`validateInvocation` doesn't provide enough details of what is invalid when returning `false`

I've noticed that integrations are inconsistent in what they return/do in the validation function. I think we need this to be more consistent. In jupiter-managed-integration-sdk, the function did not return anything, but expected one error or another to be thrown, such as https://bitbucket.org/lifeomic/jupiter-managed-integration-sdk/src/858886f3f9d71eecb407cae153991bf711491e0f/src/integration/types/errors.ts#lines-55, because a simple true or false isn't very useful for communicating why the invocation isn't valid. Of course, we could allow for returning more than just true or false, to include some kind of reason, such as [false, "Authentication failed (path=/thing/I/checked, code=401)"] or [false, "Authorization forbidden (path=/resource/I/tried, code=403)"].
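
A discriminated-union return type is one way to sketch the richer result suggested above (names are illustrative, not a committed API):

```typescript
// Sketch: validation returns either success or a failure carrying a reason,
// instead of a bare boolean that hides why the invocation is invalid.
type ValidationResult =
  | { valid: true }
  | { valid: false; reason: string };

function describeFailure(path: string, code: number): ValidationResult {
  const kind =
    code === 401
      ? 'Authentication failed'
      : code === 403
        ? 'Authorization forbidden'
        : 'Request failed';
  return { valid: false, reason: `${kind} (path=${path}, code=${code})` };
}
```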

Support `--step` option in `sync` and `run` commands

Most of the logic already exists in the collect command; we just need to pull that out. For the sync command, we will probably need to adapt the code that determines partial datasets from a collection run to work with it.

Change "integrationConfigFields.json" to be JSON schema compatible

Currently, integrations perform a lot of basic validation on IntegrationInstance config that could be shifted to automatic validation in the SDK. We could do this by changing our existing integrationConfigFields.json to be JSON schema compatible.

For example, many integrations have validation like this:

const apiKey = instance.config?.apiKey;
if (!apiKey) {
  throw new Error(
    'Configuration option "apiKey" is missing on the integration instance config',
  );
}

// More simple validation...

Proposed:

integrationConfigFields.json:

{
  // We assume that every `integrationConfigFields.json` is type "object". If it is specified, we
  // will just ignore it.
  // type: "object",
  "properties": {
    "apiKey": {
      "type": "string",
      "format": "masked"
    },
    "myNumberField": {
      "type": "number",
      "minimum": 1,
      "maximum": 5
    }
  },
  "required": ["apiKey"]
}

The apiKey property above has a "format" property. We will write a custom ajv formatter for masked properties.
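
To make the idea concrete without pulling in ajv, here is a hand-rolled sketch of the same checks (required properties, primitive types, numeric bounds). In the actual proposal ajv would perform this validation, with a custom format registered for "masked":

```typescript
// Minimal sketch of schema-driven config validation. Returns a list of
// human-readable errors; an empty list means the config is valid.
interface Schema {
  properties: Record<
    string,
    { type: string; format?: string; minimum?: number; maximum?: number }
  >;
  required?: string[];
}

function validateConfig(
  schema: Schema,
  config: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const name of schema.required ?? []) {
    if (config[name] === undefined) errors.push(`"${name}" is required`);
  }
  for (const [name, prop] of Object.entries(schema.properties)) {
    const value = config[name];
    if (value === undefined) continue;
    if (typeof value !== prop.type) errors.push(`"${name}" must be a ${prop.type}`);
    if (typeof value === 'number') {
      if (prop.minimum !== undefined && value < prop.minimum)
        errors.push(`"${name}" must be >= ${prop.minimum}`);
      if (prop.maximum !== undefined && value > prop.maximum)
        errors.push(`"${name}" must be <= ${prop.maximum}`);
    }
  }
  return errors;
}
```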

Ensure source timestamps are collected

In order to support mechanisms such as dealing with an update operation that is reflecting an older state of source data, the integration SDK should work to ensure time stamps from the source are included in the entities/relationships.

The data model documents createdOn and updatedOn for tracking the source timestamps. This should be required data, even if it is generated by the integration when it is not known/provided by the source, indicating the time that the integration collected the data, in case something else collects the data at a later time but delivers it to the synchronizer earlier.

Once this data is ensured to be included in the data delivered to the synchronization job, the persister should be able to rely on it. It seems there is already some work it does to avoid overwriting newer data when an operation is stale.
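
The fallback rule could be sketched like this (`ensureTimestamps` is a hypothetical helper name):

```typescript
// Sketch: guarantee createdOn/updatedOn on every entity, falling back to the
// collection time when the source does not provide them, so the persister can
// compare timestamps and skip stale updates.
interface SourceData {
  createdOn?: number;
  updatedOn?: number;
}

function ensureTimestamps<T extends SourceData>(
  entity: T,
  collectedAt: number = Date.now(),
): T & { createdOn: number; updatedOn: number } {
  return {
    ...entity,
    createdOn: entity.createdOn ?? collectedAt,
    // Prefer the source's updatedOn, then createdOn, then collection time.
    updatedOn: entity.updatedOn ?? entity.createdOn ?? collectedAt,
  };
}
```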

Improve `createIntegrationEntity` and document behavior

This function has lofty goals, but its current behavior isn't documented well enough; on a number of occasions, developers haven't known what to do with source, or they don't like what it does or find its behavior unclear.

Consider moving dependsOn property to metadata in index file

It would be nice to see all of the dependsOn declarations in a single index.ts file instead of having to open each step file to get that information. I think the metadata of the steps should be in index.ts and the step function code should be in separate files.

Provide guidance and error type for integrations raising errors

The gitlab integration stores an API response code in IntegrationError.code, and there may be another integration doing something similar. However, I now think that was not the best choice.

In https://github.com/JupiterOne/integration-sdk/blob/master/src/errors.ts#L7, and everywhere in the integration-sdk currently, the values for code are constant-like strings, and there are error classes that assign code so that we don't see the strings repeated where the errors are thrown (great!).

We should avoid having each integration develop its own set of codes for common things, so that we can make some things consistent for runtime monitoring and debugging. We'll need a clear way for integrations to raise certain types of errors, and limit the introduction of new error codes as much as reasonable.

Consider this proposal:

/**
 * A type representing errors generated or handled by the integration runtime.
 * Integrations should not throw this error type directly. See below for
 * options.
 *
 * All errors occurring during execution of an integration should ultimately end
 * up as an `IntegrationError`, or a subclass thereof, each expressing a `code`
 * reflecting the class of the error. In case an unhandled `Error` occurs, which
 * has no `code`, then `"UNEXPECTED_ERROR"` will be used in logging the error.
 *
 * Integration code should throw these exceptions:
 *
 * ```
 *   // Thrown during `validateInvocation`, anything the user should see that
 *   // can help them fix the problem on their own.
 *   throw new IntegrationValidationError("Something that will be displayed to user so they can fix configuration")
 *
 *   // Typically thrown during `validateInvocation`. User will be informed of
 *   // what we're calling to test authentication, and the response code.
 *   throw new IntegrationProviderAuthenticationError({
 *     cause: err,
 *     endpoint: "https://something.com/api/authcheckwhatever",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 *
 *   // Thrown during resource fetching steps. The runtime will show these along
 *   // with information about what could not be obtained thanks to insufficient
 *   // access.
 *   throw new IntegrationProviderAuthorizationError({
 *     type: [], // the `_type`s that could not be obtained
 *     cause: err,
 *     endpoint: "https://something.com/api/resource",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 *
 *   // Thrown for unexpected API errors. This will be shown to users to help
 *   // them understand why some data sets may be incomplete.
 *   throw new IntegrationProviderAPIError({
 *     cause: err,
 *     endpoint: "https://something.com/api/something",
 *     status?: err.status,
 *     message?: err.statusText
 *   });
 * ```
 */
export class IntegrationError extends Error { ... }
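
The base of that hierarchy could be sketched as follows; the specific code string is illustrative, but the shape (a constant-like `code` on every error, with `"UNEXPECTED_ERROR"` as the fallback for unhandled errors) follows the comment above:

```typescript
// Sketch of the proposed error hierarchy: every runtime error carries a
// constant-like `code`; unhandled plain Errors are logged as UNEXPECTED_ERROR.
class IntegrationError extends Error {
  constructor(
    readonly code: string,
    message: string,
  ) {
    super(message);
  }
}

class IntegrationValidationError extends IntegrationError {
  constructor(message: string) {
    super('CONFIG_VALIDATION_ERROR', message); // code value is illustrative
  }
}

// What the runtime would record for any error it catches.
function codeForLogging(err: Error): string {
  return err instanceof IntegrationError ? err.code : 'UNEXPECTED_ERROR';
}
```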

Report key metrics to a metrics service, including important dimensions

In order to support visibility into execution patterns, report key events to a metrics service. This should be an interface with at least two implementations: one for local execution and, for the managed execution environment, one backed by a cloud-native metrics service.

Each metric should include these dimensions:

  • accountId
  • integrationDefinitionId
  • integrationInstanceId
  • integrationJobId

Let's consider starting with these metrics:

  • Time to execute each step within the integration
  • Total time to complete an execution
  • Number of executions

Report code and id to users in job event error log

When an error occurs, it is very helpful to include the error code and a unique ID in the job event log for the error event. The jupiter-managed-integration-sdk would set the event description to something like, Step "Mochi" failed to complete due to error (code=Forbidden errorId=abc123-def-456-ghi-789).

Perhaps we could drop the due to error bit, since this new format will clearly indicate it was an error.
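
The suggested format can be sketched as a small formatting helper (dropping the "due to error" phrase, per the note above):

```typescript
// Sketch of the proposed job event description for error events.
function errorEventDescription(
  stepName: string,
  code: string,
  errorId: string,
): string {
  return `Step "${stepName}" failed to complete (code=${code} errorId=${errorId})`;
}
```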

Add `addEntity`, `addRelationship` (singular forms of existing functions) to storage interface

I think we should consider adding addEntity and addRelationship to the storage interface. This could help to avoid a pattern where folks end up creating collector variables, which just end up keeping unnecessary references around longer than needed (memory bloat in the frame). That is, I can see folks collecting all the things and calling addEntities/addRelationships just once at the end of a step, thereby negating the benefits of the flushing that the store does. Of course, we should also consider adding guidance that encourages working on data in small batches.
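
A sketch of the singular helpers, delegating to the existing plural method so the store's own batching/flushing still decides when to write (interface shape is simplified; the real jobState methods are asynchronous):

```typescript
// Simplified, synchronous sketch: addEntity hands single entities to the
// existing addEntities, so callers never need collector variables.
interface JobState {
  addEntities(entities: object[]): void;
}

function withSingularHelpers(state: JobState) {
  return {
    ...state,
    addEntity: (entity: object) => state.addEntities([entity]),
  };
}
```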

Poll for sync job completion when running "sync" command from CLI

Currently, we log the following when running sync command from CLI:

Synchronization job status: FINALIZE_PENDING
Entities uploaded: 120
Relationships uploaded: 148

We should add an option such as --wait that will cause the command to poll for the job to change to a terminal state.
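
The polling loop behind a --wait option could be sketched like this; the terminal status names are assumptions for illustration (FINALIZE_PENDING comes from the log output above):

```typescript
// Sketch: poll the sync job status until it reaches a terminal state.
// The terminal state names here are illustrative.
const TERMINAL_STATES = new Set(['FINISHED', 'ABORTED', 'ERROR']);

async function waitForSyncJob(
  fetchStatus: () => Promise<string>,
  delayMs = 1000,
): Promise<string> {
  for (;;) {
    const status = await fetchStatus();
    if (TERMINAL_STATES.has(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```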

Couple `getIntegrationStepStartState` with the individual `IntegrationStep` it is associated with

Currently, there is a getStepStartStates function that is used to determine which steps are disabled. This is particularly useful when an integration has a set of permissions passed into the integration configuration, which are then used to conditionally disable steps. It would be great if we could couple the start state condition (IntegrationStepStartState) within each IntegrationStep instead of the getStepStartStates.ts file at the root of the src.

Old:

getStepStartStates.ts

export default function getStepStartStates(
  executionContext: IntegrationExecutionContext,
): IntegrationStepStartStates {
  return {
    'my-state': { disabled: true },
  };
}

Proposed:

const step: IntegrationStep = {
  id: '...',
  name: '...',
  types: ['...'],
  getIntegrationStepStartState(
    stepExecutionContext: IntegrationStepExecutionContext
  ): IntegrationStepStartState {
    return { disabled: true };
  },
  async executionHandler({
    instance,
    jobState,
  }: IntegrationStepExecutionContext) {
    //
  },
};

Recorded responses are stored as gzip, cannot do PR reviews for sensitive content

It isn't possible during pull request reviews to see if there is sensitive information in the responses. There are a couple of projects already doing decompression.

Generally speaking, it would be necessary to look at the response encoding and, when possible, decode and store UTF-8 text. It will be important that the recordings still work when replayed.
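
The decoding side could be sketched with Node's built-in zlib (the function name and recording shape are assumptions for illustration):

```typescript
// Sketch: when a recorded response body is gzip-encoded, decompress it and
// return UTF-8 text so the recording is reviewable in a pull request.
import { gunzipSync } from 'node:zlib';

function decodeRecordedBody(body: Buffer, contentEncoding?: string): string {
  const raw = contentEncoding === 'gzip' ? gunzipSync(body) : body;
  return raw.toString('utf-8');
}
```

On replay, the recording would need its content-encoding header adjusted (or the body re-compressed) so it still matches what the HTTP client expects.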

Report resource utilization data at end of job execution

This will allow for:

  1. Capacity planning in production environments
  2. Visibility into the amount of data an integration collects
  3. Encourage developers to minimize utilization

Keep in mind that some metrics are auto-collected by nature of the environment, such as AWS Lambda max memory usage. Also, we may end up moving away from the simple file storage mechanism currently in use. Therefore, consider design by interface and allow for collecting the resource usage info from the storage implementation.

`getStepStartStates` that are missing steps causes collect to crash

12:45 $ NODE_OPTIONS=--enable-source-maps yarn j1-integration collect
yarn run v1.22.4
$ /Users/aiwilliams/Workspaces/jupiterone-io/graph-azure/node_modules/.bin/j1-integration collect
TypeScript files detected. Registering ts-node.

Configuration loaded! Running integration...

TypeError: Cannot read property 'disabled' of undefined
    at /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:247:46
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:339:38
    at Array.map (<anonymous>)
    at buildStepResultsMap (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:236:10)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:323:6
    at Object.executeStepDependencyGraph (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/dependencyGraph.js:59:28)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/dependencyGraph.ts:83:26
    at Object.executeSteps (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/step.js:11:30)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/step.ts:26:10
    at executeWithContext (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/executeIntegration.js:48:49)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/executeIntegration.ts:96:40
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async executeIntegrationInstance (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/framework/execution/executeIntegration.js:26:20)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/framework/execution/executeIntegration.ts:61:18
    at async Command.<anonymous> (/Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/dist/src/cli/commands/collect.js:68:25)
        -> /Users/aiwilliams/Workspaces/jupiterone-io/integration-sdk/src/cli/commands/collect.ts:91:23
    at async Promise.all (index 0)
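
A defensive fix could validate the map returned by getStepStartStates against the known step ids before the dependency graph is built, failing with a descriptive error instead of crashing on `undefined.disabled` (a sketch; `resolveStartStates` is a hypothetical helper):

```typescript
// Sketch: verify every step has a declared start state up front, so a missing
// entry produces an actionable error rather than a TypeError deep in the
// dependency graph.
interface StepStartState {
  disabled: boolean;
}

function resolveStartStates(
  stepIds: string[],
  declared: Record<string, StepStartState>,
): Record<string, StepStartState> {
  const resolved: Record<string, StepStartState> = {};
  for (const id of stepIds) {
    const state = declared[id];
    if (state === undefined) {
      throw new Error(`getStepStartStates did not return a state for step "${id}"`);
    }
    resolved[id] = state;
  }
  return resolved;
}
```

An alternative design choice is to default missing steps to `{ disabled: false }`; the explicit error is stricter but makes typos in step ids immediately visible.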
