openfn / kit
The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)
Date and Promise need to be added to the list of globals in the compiler to prevent them being auto-imported.
We should probably also be more explicit in our reporting here so that Mtuchi can more easily debug this stuff.
Issues found in the cli from pairing with Mtuchi:
- ~ in adaptor paths isn't working
- Unpkg doesn't cache indefinitely, leading to long waits for cold files (like an hour?)
It appears that jsdelivr caches forever and claims to be faster.
Switch to jsdelivr for dts assets on adaptor docs component
It would be nice to know if a list of possible versions is available as well, this could speed up lightning as well when requesting available adaptors
Raised out of #56
openfn repo pwd
isn't very clear. Instead, let's just do:
openfn repo
- print out diagnostic information (including the repo working dir)
Also, this needs to report defaultRepoPath if the env var isn't set, so it never returns undefined.
Here's a weird thing. I was just looking at the docs for primero, and describe-package isn't reporting the exports from common.
In Adaptor.js we do this:
export {
alterState,
...
} from '@openfn/language-common';
Looking at the output of getSymbols(), typescript seems to recognise the names of all these exports as unknown. It doesn't seem to pull out any useful information at all.
I wonder if that's because there's no definition of common
available in the TS project. In fact that's probably it: even if it was able to scrape the names of these exports, it wouldn't be able to report any type information.
So, describe-package may have to somehow pull the definitions of any dependencies into the Project in order to properly report types. Because this doesn't just apply to language adaptors - http exports axios, for example...
To make it easier to find docs for a given function, we need to present a little filter/search widget at the top of adaptor docs.
This is not high priority.
It would be really nice to have a sort of pre-flight validation check at the start.
This should try to resolve all the input paths - the adaptor, the state, the job etc - and report an error if any path can't be resolved.
This would provide nicer, cleaner error reporting and make it easier for users to understand if they've mis-typed a path. At the moment it'll just report fairly arcane errors saying it can't find certain paths, and it may not be clear what's wrong with those paths.
The validation should log passes on info (ie, tell us what you found), and raise errors by default.
An alternative to this, I guess, would just be to have better error messages which give a little more feedback.
Each operation should run within a fixed timeout or else throw an exception.
The timeout duration should be easy to set via options.
The CLI should use a default timeout of maybe a couple of seconds (and accept an option).
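As a sketch of the idea (the `withTimeout` name and the 2000ms default are assumptions, not an existing API), a per-operation timeout can be enforced with `Promise.race`:

```typescript
// Hypothetical sketch: race an operation against a fixed timeout.
// `withTimeout` and the 2000ms default are illustrative, not a real API.
function withTimeout<T>(op: () => Promise<T>, ms: number = 2000): Promise<T> {
  let timer: any;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([op(), timeout]).finally(() => clearTimeout(timer));
}
```

A runtime manager could then wrap each operation call, passing the timeout through from CLI options.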
The function list needs to be collapsible, and collapsed by default. So I guess it needs to be expandable.
Anyway, we need this urgently so that we can show a clear, non-overwhelming list of available operations.
When loading modules for use by the new runtime, the linker has to load modules from text and inject (or maybe reflect?) the API into what's known as a Synthetic Module.
To do this, we have to find the exports of the loaded module. And this gets a bit hairy because of CJS and ESM and I'm not even sure what else.
Here's the code to find the exported values right now:
let target = exports;
if (exports.__esModule && target.default.default) {
// CJS
target = target.default.default; // ?!
} else {
// ESM
// If we import @openfn/[email protected], its named exports are found on the default object
// Which doesn't seem quite right?
// This elaborate workaround may help
if (
Object.keys(exports).length === 1 &&
exports.default &&
Object.keys(exports.default).length > 0
) {
target = target.default;
}
}
I mean it's horrible, full of guesswork, and just needs bringing under control. And to be unit tested effectively.
For what it's worth, given the list of exports, here's how we expose them in the Synthetic Module for use in the runtime:
const exportNames = Object.keys(target);
// Wrap up the real module into a Synthetic Module
const m = new vm.SyntheticModule(
exportNames,
function (this: SyntheticModule) {
for (const e of exportNames) {
this.setExport(e, target[e]);
}
},
{ context }
);
The compiler should have the ability to validate job code and report any errors it finds. This could be implemented as eslint rules, or just our own validation which runs against the parsed AST.
This could be run as a pre-flight check by a runtime manager, or be used in lightning to give feedback to job authors at dev time.
Validation issues could include:
A discussion thread.
In Lightning, we have a need to pass type definitions into Monaco to support code assist and make job writing easier.
At the moment, job code assumes that adaptor function exports are available globally (I assume it's the runtime's job to inject them?)
fn(/*do something*/)
Monaco won't understand that fn is an @openfn/language-common export unless we take action.
I think these are our options:
Make jobs explicitly declare imports from the adaptor
import { fn } from '@openfn/adaptors';
fn(/*do something*/)
Advantages:
Disadvantages:
Mitigations:
Export adaptor typings as globals, rather than (or as well as) modules
declare global {
export function fn(operation: () => void): void;
}
By importing the correct typings for an adaptor, Monaco will understand the top-level symbols.
Advantages:
Disadvantages:
This issue is an enhancement and a nicety BUT it will also solve a big problem for us by enabling server-side doc gen.
So this is a design post to track the work I plan to do in the next couple of weeks:
- Add a docgen command to the CLI, which takes an adaptor name and outputs the result of describe-package (ie, an easy-to-consume JSON object with rich documentation info, suitable for use by the adaptor-docs component)
- While describe-package is in flight, write a placeholder like { loading: true, startTime: Date.now() }. If a second CLI call comes in while describe-package is in flight, it can use this to poll for the result rather than duplicating the work. It can time out too and start the request again.
There are two clients to this CLI command:
- openfn api common fn as per #31. This should output basic doc information plus a link to docs.openfn.org
This work shouldn't do anything to enhance the actual describe-package functionality (apart from maybe enabling caching). Enhancements made to describe-package later will naturally manifest in the CLI.
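A minimal sketch of the placeholder/poll idea described above (the file layout, `checkDocs` name, and 60s timeout are all assumptions):

```typescript
// Hypothetical sketch: while docgen runs, the cache file holds
// { loading: true, startTime }; callers poll it, or retry if it's stale.
import * as fs from 'node:fs';

const TIMEOUT_MS = 60_000;

// Returns the docs JSON, 'pending' if another call is generating them,
// or null if there's nothing usable (missing file or stale placeholder).
export function checkDocs(path: string): object | 'pending' | null {
  if (!fs.existsSync(path)) return null;
  const json = JSON.parse(fs.readFileSync(path, 'utf8'));
  if (json.loading) {
    // a stale placeholder means the previous call likely died; start over
    return Date.now() - json.startTime > TIMEOUT_MS ? null : 'pending';
  }
  return json;
}
```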
Also just getting this down for the record. Cannot start with pnpm start in the examples folder.
pnpm i seems to work:
~/kit/examples allow_cron ❯ pnpm i                                Node 16.15.0 16:54:09
Scope: all 4 workspace projects
Lockfile is up-to-date, resolution step is skipped
Already up-to-date
pnpm start fails:
~/kit/examples allow_cron ❯ pnpm start                            Node 16.15.0 16:54:12
> [email protected] start /Users/taylor/kit
> node server.js
node:internal/modules/cjs/loader:936
throw err;
^
Error: Cannot find module '/Users/taylor/kit/server.js'
at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
at Function.Module._load (node:internal/modules/cjs/loader:778:27)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47 {
code: 'MODULE_NOT_FOUND',
requireStack: []
}
ELIFECYCLE Command failed with exit code 1.
A design ticket to help me straighten out some spaghetti in my head.
We have a requirement to dynamically load the dependencies of a job (ie, the adaptors it relies on, but maybe also more generally any other npm packages it may import).
For example, a job will often include the line import { fn } from '@openfn/language-common'. But where do we actually load @openfn/language-common from?
Because the same job may run in various places, and because different tools need to analyse this code, the resolution of this path is quite complicated.
So who needs to resolve the module @openfn/language-common?
Also note that Lightning's code editing tools need to know what modules are imported to provide code assist and intelligence. This in practice builds on the compiler, but Lightning itself may want to pre-load module definitions or documentation for its own tooling.
So at a high level, a runtime manager (devtools or lightning) needs to be able to specify the path to a module, and the runtime and compiler both need to respect whatever path is provided.
Oh, yeah, and we need to worry about versioning as well.
We can potentially provide a number of ways to provide a mapping to a module, most of them oriented around the devtools CLI.
Here are the options to resolve @openfn/x (without versioning):
- Load @openfn/x from ${MODULES_HOME}/@openfn/x. The idea here is that a developer creates a local folder with their own adaptors in, sets this as an env var, and so automatically loads local language adaptors. Difficulty: if the module is @openfn/x, then in the node_modules folder you need to have a @openfn dir and then an x dir inside it. Which is fine but quite an awkward setup.
- Fall back to import('@openfn/x')
If monorepo and modules_home are both set, we probably use modules_home as it's more specific.
Since there's so much complexity in module resolution, the tooling needs to be really clear about where it's loading modules from.
MODULES_HOME may have another use. If a job has arbitrary node dependencies (ie axios or lodash), those need to be installed somewhere. They're not dependencies of the CLI and shouldn't be served by the CLI's node_modules. Nor the runtime's. So arbitrary
There's a lot of mapping rules here and I don't want to have to write and test them in several places (the CLI, runtime and compiler may each need to do this!).
So we need to provide a single function, which accepts options from a runtime manager. Something like:
function resolveModulePath(specifier, { modulesHome, moduleMap, monorepoHome }) {
  const path = /* resolve the specifier into a path */;
  return path || specifier;
}
Where does this live? It's a dependency of the compiler and runtime, but not really part of either. The CLI needs it to preload module exports. It could go in a generic utils package, but it's a bit lost there. So maybe a module-resolver package?
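To make the precedence rules concrete, here's a sketch of what that single function might look like. The resolution order (explicit map, then modulesHome, then monorepo) and the monorepo's packages/ layout are assumptions, not settled design:

```typescript
// Hypothetical sketch of resolveModulePath. Precedence and the monorepo
// folder layout are assumptions for illustration only.
type ResolveOptions = {
  modulesHome?: string;
  moduleMap?: Record<string, string>;
  monorepoHome?: string;
};

export function resolveModulePath(
  specifier: string,
  { modulesHome, moduleMap, monorepoHome }: ResolveOptions = {}
): string {
  // An explicit mapping always wins
  if (moduleMap?.[specifier]) return moduleMap[specifier];
  // modules_home is more specific than the monorepo, so it goes next
  if (modulesHome) return `${modulesHome}/${specifier}`;
  // Assume openfn packages live under packages/ in the monorepo
  if (monorepoHome && specifier.startsWith('@openfn/')) {
    return `${monorepoHome}/packages/${specifier.replace('@openfn/', '')}`;
  }
  // Fall back to the bare specifier and node's own resolution
  return specifier;
}
```

Keeping this pure (paths in, path out) means the CLI, runtime and compiler can all share it and test it in one place.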
I think it's literally just one function, I'm not sure what else it needs to do. I suppose we could push more logic into it?
- loadDTs()
- loadModule()
- loadSyntheticModules()
Once you start doing that, there's actually quite a close relationship to describe-package. Maybe that needs to evolve into a module-helper or module-manager or something. This, I think, is part of the answer.
We want to be able to report on memory usage when running jobs.
node.process gives us a memoryUsage API which makes it look simple.
But in practice, I'm not sure of the best way to handle this. It's probably not helpful to just print what the current usage is?
My feeling is:
- call gc() before each operation
A bookmark for later: node-memwatch
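A minimal sketch of sampling heap usage via process.memoryUsage() (the `reportMemory` helper is illustrative; forcing gc() requires node's --expose-gc flag):

```typescript
// Sketch: sample heap usage around an operation via process.memoryUsage().
// `reportMemory` is a hypothetical helper, not an existing API.
const mb = (bytes: number) => (bytes / 1024 / 1024).toFixed(2);

export function reportMemory(label: string): number {
  // gc is only defined when node runs with --expose-gc; collecting first
  // makes the heapUsed number far more meaningful
  const gc = (globalThis as any).gc;
  if (typeof gc === 'function') gc();
  const { heapUsed, rss } = process.memoryUsage();
  console.log(`${label}: heap ${mb(heapUsed)}mb, rss ${mb(rss)}mb`);
  return heapUsed;
}
```

The runtime could call this before and after each operation and report the delta, rather than absolute usage.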
A major design issue.
In the distant future, I want uncompiled jobs to look like this:
import { fn } from '@openfn/language-common@^2.0.1';
import { each } from 'lodash@2';
fn(() => { /* ... */ });
This enables us to a) version lock a job's dependencies and b) import arbitrary npm modules.
Couple of quick caveats here:
The code above should compile into something like this, with a manifest declared as a set of directives at the top of the file:
'use @openfn/language-common 2.0.1';
'use lodash 2';
import { fn } from '@openfn/language-common';
import { each } from 'lodash';
export default [fn(() => { /* ... */ })];
The directives tell the runtime (or runtime manager maybe) what versions of dependencies to use.
Note that the compiler can auto-generate directives (at least for adaptors and maybe more) if certain options are passed (like how we generate imports for adaptors now).
Somewhere in the new runtime stack, we'll need the ability to auto-install a language adaptor (and eventually any arbitrary npm module).
This is either the job of the runtime manager or the runtime itself. At the time of writing I actually think it's for the runtime.
The most obvious approach in this architecture is that the runtime manager ensures dependencies are installed and available to the runtime before anything runs. The manager should pass paths to each import to the runtime, which the linker will resolve at runtime. There is already a mechanism for this in place.
A major disadvantage of this approach is that the work needs to be duplicated across different managers. Specifically the CLI and the web service would both need a solution to the same problem - downloading and installing modules. This may mean implementing the installer as an export of the runtime, or as its own package in the monorepo (neither of which really appeals in this architecture!).
Alternatively we can push this logic directly down into the runtime.
This makes sense because it's the runtime's linker which has to resolve modules anyway - why shouldn't the linker be able to install modules on the fly, manage its own cache, etc?
The runtime would have to do the following:
The runtime would need some kind of working directory to save all its node modules in, and would presumably need an API to purge this directory somehow to prevent disk space exploding. This bit feels like a runtime manager problem - I guess the runtime manager tells the runtime its working dir and when and how to purge it.
Regardless of WHO does the loading, down at the metal, we can install node modules to a local folder using aliases.
We should bear in mind the alias name rules, although there's nothing surprising here.
I think we probably want to install the runtime modules into a prefixed folder, with a name like package_x.y.z. We will need to maintain a package.json (but probably not a package lock), otherwise each new install will result in "unused" packages being removed.
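npm's alias syntax (`alias@npm:name@version`) supports exactly this. A sketch of building such a specifier - note the `_`-suffixed alias scheme here is an assumption matching the prefixed-folder idea above:

```typescript
// Sketch: build an npm alias specifier (alias@npm:name@version) so several
// versions of one adaptor can sit side by side in the repo's node_modules.
// The `name_version` alias scheme is an assumption, not a settled convention.
export const aliasSpecifier = (name: string, version: string) =>
  `${name}_${version}@npm:${name}@${version}`;

// e.g. `npm install <specifier>` inside the repo folder installs to
// node_modules/@openfn/language-common_2.0.1
```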
A related issue here is: should adaptors be bundled into a single .js file, with all their dependencies included?
A benefit of bundling is that an adaptor can be "installed" simply by extracting the .js and package.json files somewhere. No npm install is needed. Managing multiple versions is easy - not least because nested dependencies are included in the file.
On the other hand, the adaptor becomes very bloated, taking up disk space and (potentially) being slower to load.
Note that we can't bundle non-openfn dependencies, so if someone wanted to import lodash, we would have to support an npm install.
Without bundling, we need to decide how strictly to lock dependency versions in the package manifest of the adaptors. Do we allow some patch or even minor level changes to accept fixes? Or do we strictly fix versions to ensure nothing changes?
note to self: take a look at replacing rollup with esbuild. What breaks? What do we lose? Aim for parity with the adaptor monorepo
I assume I'm doing something wrong, but opening this up for posterity.
~/kit main !2 ❯ pnpm run -C examples/flow start Node 16.15.0 16:45:57
> [email protected] start /Users/taylor/kit/examples/flow
> pnpm node esbuild.js
Serving "." at http://127.0.0.1:8080
✘ [ERROR] Could not resolve "@openfn/workflow-diagram"
src/index.jsx:1:28:
1 │ import WorkflowDiagram from "@openfn/workflow-diagram";
╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~
The module "./dist/index.js" was not found on the file system:
node_modules/@openfn/workflow-diagram/package.json:10:19:
10 │ "default": "./dist/index.js"
╵ ~~~~~~~~~~~~~~~~~
You can mark the path "@openfn/workflow-diagram" as external to exclude it from the bundle, which
will remove this error.
/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1611
let error = new Error(`${text}${summary}`);
^
Error: Build failed with 1 error:
src/index.jsx:1:28: ERROR: Could not resolve "@openfn/workflow-diagram"
at failureErrorWithLog (/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1611:15)
at /Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1257:28
at runOnEndCallbacks (/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1042:63)
at buildResponseToResult (/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1255:7)
at /Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:1364:14
at /Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:674:9
at handleIncomingPacket (/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:771:9)
at Socket.readFromStdout (/Users/taylor/kit/node_modules/.pnpm/[email protected]/node_modules/esbuild/lib/main.js:640:7)
at Socket.emit (node:events:527:28)
at addChunk (node:internal/streams/readable:315:12) {
errors: [
{
detail: undefined,
location: {
column: 28,
file: 'src/index.jsx',
length: 26,
line: 1,
lineText: 'import WorkflowDiagram from "@openfn/workflow-diagram";',
namespace: '',
suggestion: ''
},
notes: [
{
location: {
column: 19,
file: 'node_modules/@openfn/workflow-diagram/package.json',
length: 17,
line: 10,
lineText: ' "default": "./dist/index.js"',
namespace: '',
suggestion: ''
},
text: 'The module "./dist/index.js" was not found on the file system:'
},
{
location: null,
text: 'You can mark the path "@openfn/workflow-diagram" as external to exclude it from the bundle, which will remove this error.'
}
],
pluginName: '',
text: 'Could not resolve "@openfn/workflow-diagram"'
}
],
warnings: []
}
ELIFECYCLE Command failed with exit code 1.
As a user looking at the workflow-diagram, I want to see the text in the workflow nodes have a good word wrap so that I can read it.
See @taylordowns2000's comment here for more details.
Some adaptors, like postgres, allow the execute function to be extended.
This will break in v2 at the moment!
What's really going on here is the adaptor wants to add setup and teardown hooks which run at the start and end of the pipeline.
I think the solution is to publish a new adaptor which is only compatible with the new runtime. The adaptor would look like this:
import { addPrePipelineHook, addPostPipelineHook } from '@openfn/runtime';
function setup(state) { ... }
function teardown(state) { ... }
addPrePipelineHook(setup)
addPostPipelineHook(teardown)
export { setup, teardown, connect, ... }
Whenever the adaptor is imported, it'll automatically run the setup and teardown operations at the start and end of the pipeline. It's similar to the overridden execute really, but a bit less magical.
What I like about this approach is that there's no compiler or runtime magic. v1 jobs will continue to work in the new runtime.
The only downside I can see is that the next-gen adaptor will fail in the old engine. Given that the platform can control which adaptors are loaded, I think this is probably acceptable?
It also gets a bit sticky in the new CLI because you'll need to target the CLI at the correct adaptor version. Again this is probably OK, but may rely on a good versioning strategy, eg openfn . -a @open/[email protected]
As a user that is looking at a workflow for the first time, I would like to easily identify the name of a workflow and understand when it gets triggered.
remove the “workflow” wrapper node
change the “webhook” node to have the workflow name and describe how the workflow gets triggered. (make it bigger so that it is obviously different to the rest of the nodes in a workflow)
OpenFn/kit
README.md
- run pnpm run -C examples/flow start
- in /packages/workflow-diagram, start a build/watch with pnpm run watch
- see examples/flow/src/index.jsx and packages/workflow-diagram/src/layout/index.ts for how the example gets built.
Seems to me there's no reason we shouldn't support jobs written in typescript.
Given a flag or directive at the top of the job source, or a file extension, we should be able to first compile the job into javascript before running the regular transformation pipeline.
This is mostly not a very big deal - I guess the difficulty comes from setting up the typescript rules. Usually when we write typescript we provide a tsconfig. Would we have to provide a default? Should we accept one? What about type definitions - how do we import those?
As a user looking at the workflow diagrams, I want to see the title Untitled for untitled workflows, and a description for webhook and/or cron workflows.
For webhook, the description should be like "when data is received at https://demo.openfn.org/i/34f843bd-eb87-4833-b32a-905139534d5a", and for cron the description should be like "at 7:34pm every Tuesday".
Triggered when openfn is called with no arguments
$ openfn
> Welcome to the openfn devtools CLI!
> Enter the path to the job you'd like to run
$ job.js
> Would you like to use an adaptor?
[ ] language-common
[ ] language-http
> Any special requests?
[ ] compile only
[ ] set output path
[ ] set logging levels
$ openfn --repeat (or --last, or something).
Shows the last command and waits for confirmation before running.
$ openfn job.js --name my_favourite_job
$ openfn my_favourite_job
Run a command with a name and we'll save it. Call openfn
with that name and we'll repeat it. Maybe prompt before running.
Saving a job with an existing name will prompt and ask you to override or cancel
Prints out any configuration you may have. Ie, OPENFN_MODULES_HOME location, any saved jobs, the previous job, any credential information, any adaptors we know about.
$ openfn bench job.js
Run a named job a bunch of times (5 by default, but allow a value to be set). Saves the average, min and max times.
When the benchmark is next run, it'll also print out a performance diff. Ie average: +5%, min: -10%, max: +6%
A great idea presented by Mtuchi, which I really like, is to be able to use @openfn/cli
to get help about adaptor functions.
Something like this:
openfn -a language-http -h get
This is basically modelled on iex -h
which reports help information about elixir functions.
An ideal solution would be to:
We need to work out the format of the help question. You have to input the adaptor name and a function - so is that @openfn/language-http.get, or language-http/get, or even -a language-http -h get?
That last one is the most consistent, because it uses a standard command to input an adaptor (which can accept an inline path to a local package), and a simple help command which takes a function name and nothing more.
When we move all the adaptors into the monorepo, we'll be streamlining all the help docs and making sure they're in one place with predictable URLs. So generating and printing a URL to the help docs is pretty easy.
What about the help itself? The CLI needs the intelligence to look in the adaptor's package, read the type definition for the required function (either straight from the dts or from the ast.json), and massage it into a nice, standard format for the CLI.
Because we want to optimise the formatting for the CLI, we want to read fairly low-level data and generate output, rather than just dumping an existing string (or even HTML string).
When running in a multi-threaded environment, it's possible for multiple autoinstall requests to come at the same time.
Ie, three different jobs run at about the same time and each needs the same package to be autoinstalled.
A blocking function might be a solution here.
If the autoinstall process is long-running, it may be able to track which promises are in flight and piggyback on them. That speaks to one thread managing the repo while others call into it.
This issue is a bit of a whinge, a place for discussion, and a log of what I've been up to.
I've lost a lot of time today (and, in hindsight, previously) because my ts builds are randomly failing. I'll run a build and get an error out of rollup as if it's trying to parse .ts files as .js files. Builds will just fail, after trivial or even no code changes, and then randomly start working again.
What's happening is this:
- typescript writes build info to kit/tsconfig.tsbuildinfo to save time
- with a stale tsconfig.tsbuildinfo, ts compilation fails (presumably because it's reading cached information for a different project, or something)
I've FINALLY worked out that if I remove tsconfig.tsbuildinfo on each build, the build is stable.
What is odd is that two builds of the same package - which should build from the same cache - seem to be what's causing the problem. For example, I can build cli once and it's fine; if I run build again it'll fail so long as the buildinfo file is present. Ok, so we can remove the build cache on each build, but doesn't that defeat the purpose of incremental builds?
Note that in tsconfig.compilerOptions we can set a path to the tsBuildInfoFile, and so ensure that each package has its own ts cache (and further that the cache can be cleared at the appropriate time).
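For reference, a per-package cache might be configured like this in each package's tsconfig.json (the path here is illustrative):

```json
{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": "./.build/tsconfig.tsbuildinfo"
  }
}
```

With the cache scoped per package, a clean script can delete just that package's .build directory instead of nuking a shared root-level file.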
This fails today:
openfn -i -a common=path/to/common
Basically the autoinstall needs to recognise that a path has been provided for common, so it doesn't need to actually autoinstall.
It should always be safe to pass -i
.
Just thinking ahead a little.
The runtime should provide lifecycle hooks which can be imported by adaptors (and jobs, why not).
Ie, language-postgres will want to do this:
import { onStart, onEnd } from '@openfn/runtime';
onStart(() => connect(state.data.configuration) );
onEnd(() => tearDown());
This is a sketch of the interface:
// Run a function when the pipeline starts (a pre or init step)
export function onStart(fn: (state: any) => any): void;
// Run a function when the pipeline ends (a post or final step)
export function onEnd(fn: (state: any) => any): void;
// Run a function before each job
// Maybe not useful for adaptors but probably useful for runtime managers!
// Maybe this should take the operation, or some metadata about it, as an argument
export function beforeEach(fn: (state: any) => any): void;
export function afterEach(fn: (state: any) => any): void;
Do we want the API to be registerCallback, or onEvent, or hooks.onStart?
Do we want pre and post semantics instead of start and end?
An alternative API would be like:
type HookStage = 'start' | 'end' | 'beforeEach' | 'afterEach'
addHook(when: HookStage, fn: Operation)
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
.tool-versions
node 18.12.1
.circleci/config.yml
cimg/node 18.12
package.json
@changesets/cli ^2.25.0
gunzip-maybe ^1.4.2
rimraf ^3.0.2
tar-stream ^2.2.0
packages/adaptor-docs/package.json
react ^18.2.0
react-dom ^18.2.0
@rollup/plugin-typescript ^9.0.2
@types/node ^18.11.9
@types/react ^18.0.25
autoprefixer ^10.4.13
esbuild ^0.15.14
esbuild-css-modules-plugin ^2.6.2
esbuild-postcss ^0.0.4
esbuild-style-plugin ^1.6.0
tailwindcss ^3.2.4
postcss ^8.4.19
rimraf ^3.0.2
rollup ^2.79.0
rollup-plugin-dts ^4.2.2
rollup-plugin-postcss ^4.0.2
ts-lib ^0.0.5
ts-node ^10.9.1
tsup ^6.2.3
typescript ^4.8.4
react 16 || 17 || 18
react-dom 16 || 17 || 18
node >=16
pnpm >=7
packages/cli/package.json
rimraf ^3.0.2
treeify ^1.1.0
yargs ^17.5.1
@openfn/language-common 2.0.0-rc3
@types/mock-fs ^4.13.1
@types/node ^17.0.45
@types/yargs ^17.0.12
ava ^4.2.0
mock-fs ^5.1.4
ts-node ^10.8.1
tslib ^2.4.0
tsup ^6.2.3
typescript ^4.7.4
node >=16
pnpm >=7
packages/compiler/package.json
acorn ^8.8.0
ast-types ^0.14.2
recast ^0.21.2
yargs ^17.5.1
@types/node ^17.0.45
@types/yargs ^17.0.12
ava ^4.2.0
ts-node ^10.8.1
tslib ^2.4.0
tsup ^6.2.3
typescript ^4.7.4
node >=16
pnpm >=7
packages/describe-package/package.json
@typescript/vfs ^1.3.5
ava ^5.0.1
cross-fetch ^3.1.5
node-localstorage ^2.2.1
threads ^1.7.0
tsup ^6.2.3
typescript ^4.7.4
url-join ^5.0.0
@rollup/plugin-typescript ^8.5.0
@types/chai ^4.3.1
@types/node ^17.0.45
@types/node-localstorage ^1.3.0
esbuild ^0.15.7
rimraf ^3.0.2
rollup ^2.79.0
rollup-plugin-dts ^4.2.2
ts-node ^10.8.1
tslib ^2.4.0
tsm ^2.2.1
node >=16
pnpm >=7
packages/logger/package.json
chalk 4
figures ^5.0.0
prompt-confirm ^2.0.4
@types/node ^18.7.18
ava ^4.3.3
ts-node 10.8.1
tslib ^2.4.0
tsup ^6.2.3
typescript ^4.8.3
node >=16
packages/runtime-manager/package.json
@openfn/language-common 2.0.0-rc3
@types/koa ^2.13.5
@types/workerpool ^6.1.0
koa ^2.13.4
workerpool ^6.2.1
@types/koa ^2.13.5
@types/node ^17.0.31
@types/workerpool ^6.1.0
ava ^4.2.0
nodemon ^2.0.19
ts-node ^10.8.1
tslib ^2.4.0
tsm ^2.2.2
tsup ^6.2.3
typescript ^4.6.4
packages/runtime/package.json
semver ^7.3.8
@openfn/language-common 2.0.0-rc3
@types/node ^17.0.31
ava ^4.2.0
mock-fs ^5.1.4
ts-node ^10.7.0
tslib ^2.4.0
tsup ^6.2.3
typescript ^4.6.4
packages/workflow-diagram/package.json
classcat ^5.0.4
cronstrue ^2.14.0
elkjs ^0.8.2
react-flow-renderer ^10.3.16
zustand ^4.1.1
@rollup/plugin-typescript ^8.4.0
@types/chai ^4.3.1
@types/node ^18.7.14
@types/react ^18.0.18
@types/react-dom ^18.0.6
ava ^4.3.3
esbuild ^0.15.6
postcss ^8.4.16
react ^18.2.0
react-dom ^18.2.0
rimraf ^3.0.2
rollup ^2.79.0
rollup-plugin-dts ^4.2.2
rollup-plugin-postcss ^4.0.2
tailwindcss ^3.1.8
ts-node ^10.9.1
tslib ^2.4.0
tsm ^2.2.2
typescript ^4.8.2
react 16 || 17 || 18
react-dom 16 || 17 || 18
All jobs are written with the assumption the functions will be injected at some point.
Jobs essentially have a const { a, b, c } = require(<adaptor>); prepended to them; it's technically a little different but the effect is the same.
In order for the typescript compiler to provide diagnostics (and without changing how Jobs are written) we should 'flatten' the export members of our dts files.
The compiler will then see export declare function foo()... as declare function foo()...
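A naive sketch of that flattening step (`flattenDts` is a hypothetical helper; a real implementation would work on the AST rather than with a regex):

```typescript
// Naive sketch: rewrite exported dts declarations as ambient ones so the
// typescript compiler treats adaptor exports as top-level globals.
// A production version should transform the AST, not regex the source.
export const flattenDts = (dts: string): string =>
  dts.replace(/^export declare /gm, 'declare ');
```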
Now that we have a monorepo, we should make devtools aware of it.
If you're developing a language adaptor, you'll be writing it in the monorepo. You want to be able to use the new CLI to test that adaptor against jobs.
If you give the CLI a path to the monorepo, any adaptor imports should load straight from the monorepo.
This can be used a couple of ways:
- a monorepo=path/to/monorepo flag (or -m path/to/monorepo)
- a -M flag to the CLI (makes monorepo use easy but still won't load from the monorepo by default)
Some thoughts:
We have a success, which is shown by default and has a reassuring tick
We have an info, which is not shown by default and has no particular icon.
We need another level, default I guess, or maybe log (because who doesn't want to see code like log.log('did the thing')?), of equal priority to success, error and warning. It's to print important messages which shouldn't have a tick next to them.
For example, it's nice when running a command to get a high level overview of what your command will do, ie, "Running job at job.js" or "Installing module @openfn/language-http".
I think we should also have either/both of:
- butterfly - print to default level with a butterfly icon (generally used as a final sign off)
- print({ icon, level, color, namespace }, ...message) - a general logging function allowing custom output to be printed.
Adaptor docs doesn't currently show any type information on parameters (or return types, which may be less useful).
We need to a) extend describe-package
to include good, human-readable descriptions of types, and b) render those types in adaptor-docs.
This gets particularly interesting with complex types, like objects.
I've started to add type analysis to describe-package
but this issue is a reminder to take it further.
The new runtime should also emit events.
When we call run
at the moment, it returns a promise which resolves to a final value.
That promise should also emit events, which a runtime manager can use to respond to and report on things happening inside the runtime.
Possible events include:
- status_changed: returns a status string saying what the job is doing (ie, running step 4 or initialising)
- operation_start/end: triggered when an operation starts or ends. Includes some statistical information (duration, memory usage).
- error: an operation threw an error
This could also include automated events from adaptors; ie, http could publish an event for every http request sent out.
The components we create in `kit` use embedded CSS to style themselves.
This causes problems in Lightning because tailwind utilities are defined several times in conflicting CSS, resulting in weird stuff.
I have suggested a quick workaround that keeps us moving forward, but this needs more thought.
It seems that if I have a tailwind component with a bunch of util classes, it should "inherit" those classes from the parent app. So if I use `text-s` as 56px in my parent app, the component should use the same. Because of how Tailwind works, this approach (if it's even correct) requires the parent app build to process the utility classes used by the components in node_modules.
I suspect the answer may be to create a Tailwind plugin for the component. I think this allows us to declare and contribute styles used by the library component to the parenting app, so it can generate the correct CSS. But this needs more research.
This problem has actually been on my mind all week, although until now it had felt a bit academic. I haven't been able to Google a good solution yet, but I will keep looking. The Tailwind UI library probably holds the answers (which is what clued me into plugins).
Since 0.14 the runtime uses a repo - basically a folder with a package.json - to load dependencies from. It's up to the runtime manager to ensure these dependencies are installed to the repo so that the runtime can find them (maybe later we'll let the runtime autoinstall, but that's a problem for later).
But if the linker can't find a dependency a) at an explicit path passed in or b) in the repo, it'll do `import(specifier)` as a fallback, resorting to node's standard module resolution. This means that packages loaded into the runtime or installed globally will happen to work. But there's very little transparency on this, so if you're accidentally using a globally installed adaptor, there's basically no way to know about it (there is a little debug warning but it's subtle and easy to miss, let alone understand).
So I think we should remove this global fallback entirely. Maybe for now I'll wrap it in an option which defaults to false.
note to self: the tsconfig isn't quite right, the package configs should just extend common and reset the basedir. The rest should be inherited.
In the v1 runtime, state is provided as a global object. So you can do stuff like this:
create({
  name: state.data.name
});
Which is nice and good and simple.
But I'm going to suggest we remove the state global from the v2 runtime.
This gives us a problem with legacy code which references global state. But it's easy to migrate!
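For example, migration could mean swapping a global state reference for a function of state, which the operation resolves when it runs. The `resolveValue` helper below is purely illustrative, not the actual runtime code:

```javascript
// Before (v1): `state` is a global, read when the operation runs
//   create({ name: state.data.name });
//
// After (v2, sketch): pass a function of state instead, so no global
// is needed -- the operation resolves it against the state it receives:
//   create({ name: (state) => state.data.name });

// A minimal illustration of how an operation might resolve such values:
function resolveValue(value, state) {
  return typeof value === 'function' ? value(state) : value;
}
```

So legacy code only needs `state.` expressions wrapped in `(state) => ...`, which is mechanical enough to automate in the compiler.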
`describe-package` (formerly kit/compiler) currently uses esbuild to build its exported package.
For reasons I don't understand, this is giving me grief when importing `describe-package` into other packages in the workspace.
For example, in the compiler, if I do `import { Project } from '@openfn/describe-package'` in the tests, the test freezes and times out as soon as I reference Project (I don't even have to call it, a static reference is enough to blow the tests).
Using a relative path, `import { Project } from '.../../describe-package/index'`, works (it bypasses the esbuild distribution), but then builds of the compiler fail because it's importing stuff outside of `src` (a `tsc` thing).
As a workaround I've added a rollup build to describe-package (without removing the existing esbuild).
Every package in kit should be using a consistent build tool. I don't really care which so long as the build is simple, clean and works. I favour rollup as it's more familiar to me.
We should have a discussion around this at some stage.
Reminder issue.
I just tried to refactor some types in the transformer function and kind of failed.
It's a bit confusing in there, mostly because the term "visitor" is overloaded. We have to create a bunch of `ast-types` visitors and some OpenFn visitors. And when you look over the types it's kind of impossible to work out.
This is pretty much a branding exercise tbh.
These steps may help:
- `buildTransformerMap` (or even something better, like `indexTransformers`)

I'd like to set up a suite of integration tests for the CLI.
There's a lot of stuff that's hard to mock (and I'm not a fan of intricate mocks) or isn't really suitable for unit tests (I'm not a fan of calling out to the web in unit tests).
What we need to do is:
- repo list
- install <various versions of the same module>
- execute a simple job against each adaptor version, with different arguments provided
These are designed to be run sequentially.
This should include tests with a global module too.
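A rough sketch of how the sequential plan above could be generated for a test harness. The subcommand names follow the issue text, but the `--adaptor` flag and the helper itself are assumptions:

```javascript
// Sketch: build the ordered list of CLI commands an integration test
// would shell out to (repo list, install each version, run a job
// against each version). Flags here are hypothetical.
function buildIntegrationPlan(module, versions) {
  const commands = ['openfn repo list'];
  for (const v of versions) {
    commands.push(`openfn repo install ${module}@${v}`);
  }
  for (const v of versions) {
    commands.push(`openfn job.js --adaptor ${module}@${v}`);
  }
  return commands;
}
```

Each command would then be run sequentially (e.g. via `child_process.execSync`) with assertions on its output.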
The runtime and compiler should provide excellent visibility of what they are doing internally. They should basically log every little thing they do.
The runtime manager (devtools and the lightning runner) should filter and nicely display these logs so that they are readable and helpful.
This issue outlines some requirements and aspirations for what the logger service should look like. It's notes, questions and ideas more than a formal design.
The compiler and runtime accept a logger function as arguments (provided by the runtime manager). This logger is fed through to the job itself (probably wrapped in console.log and maybe with additional APIs).
The compiler and runtime should indiscriminately log basically everything that happens.
The logger needs some facility for namespacing (ie: `compile`, `runtime`, `job`, `platform`). So there's an api like:
const logger = new Logger('compiler')
compiler.compile(source, logger)
When the compiler calls `logger.log` or `logger.warn`, it should use the namespace. In devtools CLI that would look something like:
[compiler] compilation complete in 26ms!
On Lightning, we may want to send JSON objects as http objects, so its logger would output something like:
{
  "level": "info",
  "source": "compiler",
  "time": <UTC timestamp>,
  "message": "compilation complete!"
}
The logger should be able to take global options - like show no output, or only show output matching a regex, or of a particular level. Or it should accept the same options per namespace. So eg you can show trace level compiler output and ignore runtime output.
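A minimal sketch of the per-namespace filtering idea. All names here are assumptions (`createLogger`, the priority table, the options shape), not the real API:

```javascript
// Hypothetical level priorities: 'none' suppresses everything,
// 'trace' shows everything.
const PRIORITY = { none: 0, error: 1, warn: 2, info: 3, trace: 4 };

// A per-namespace option wins over the global `level` option, so you
// can show trace-level compiler output while ignoring runtime output.
function createLogger(namespace, options = {}) {
  const level = options[namespace] ?? options.level ?? 'info';
  return {
    log(levelName, message) {
      if (PRIORITY[levelName] <= PRIORITY[level]) {
        return `[${namespace}] ${message}`;
      }
      return null; // filtered out
    },
  };
}
```

The same options object is shared by every namespace, so one config drives all the loggers.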
The CLI output could eventually look something like this:
setup
  loading job from...
  loading state from ...
  resolving adaptors
    resolved @openfn/language-common to path/to/module
compiling
  added 4 imports
  added default exports
  moved n operations to exports
running
  loaded @openfn/language-common from path/to/module
  starting operation 1: anon
    info: fetching data from wwww....
    received: {}
  operation 1 complete in 10ms
processing
  writing output to ...
done!
This would use console.group or something to handle indentation. Maybe colours too to highlight key values.
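A tiny illustration of the group-based indentation idea. This mimics `console.group`/`console.groupEnd` with a string buffer so the output is inspectable; it's a sketch, not the proposed implementation:

```javascript
// Sketch: a group-aware printer. group() opens an indented section,
// groupEnd() closes it, output() returns everything printed so far.
function createPrinter() {
  const lines = [];
  let depth = 0;
  return {
    group(title) {
      lines.push('  '.repeat(depth) + title);
      depth += 1;
    },
    log(message) {
      lines.push('  '.repeat(depth) + message);
    },
    groupEnd() {
      depth = Math.max(0, depth - 1);
    },
    output() {
      return lines.join('\n');
    },
  };
}
```

In a real terminal, `console.group`/`console.groupEnd` give this behaviour for free; colour would need something like chalk on top.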
As a user, I would like to name my workflow
So that I can easily refer to it and understand which workflow a run is related to
We want to make the required database changes to have workflows as an entity. For now, they can just have an ID and a name (or label, see Stu's comments below).
Label above or near the trigger that has the workflow name. This means we wouldn't have to rearrange the structure.
For another issue:
On the workflows page, there is a button that says 'new workflow'. When I click it, there is a modal on the page asking for a workflow name. When I hit 'create', I can see the workflow name on the canvas.
Below the workflow name is a block with the job name set to 'Choose a trigger'.
The Runtime Manager Service is a long-lived, multi-threaded out-of-process worker to support Lightning. It will initially be hosted inside a runtime manager server, which provides the HTTP interface between lightning and the runtime manager.
The service itself is responsible for running jobs and workflows in worker threads (or whatever!), re-directing console.log messages to an event bus, and caching (probably just in memory) compiled job expressions. It also needs to provide a reporting API to communicate memory usage, thread counts, status of jobs, etc.
It will need to auto-install adaptors and cache compiled jobs.
The service should ideally have zero persistence.
Worker threads should not allow any file i/o within a job, and we need good unit tests to enforce and prove this.
We've started to build this out in packages/runtime-manager
.
This issue builds on the new runtime (#11), but does not include the webserver that will talk to Lightning. That will be covered in another issue (#201).
Some packages export a `beta` namespace.
At the moment, `describe-package` presents the beta-namespaced functions as a flat list alongside the main functions - which is a) confusing and b) can lead to duplication (for example, `common` exports two `each` functions).
We basically need to de-flatten the exported list (or flag the namespace).
Probably we need to utilise the new `package` structure which is returned and add `namespaces` to the `functions` properties, or something.
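A sketch of the de-flattening step. The input shape here (a flat list of `{ name, namespace }` records) is an assumption about what `describe-package` returns:

```javascript
// Sketch: group a flat export list by namespace so `beta.each` and
// `each` no longer collide in the rendered docs.
function groupByNamespace(functions) {
  const grouped = { main: [] };
  for (const fn of functions) {
    const ns = fn.namespace || 'main';
    (grouped[ns] ??= []).push(fn.name);
  }
  return grouped;
}
```

adaptor-docs could then render the `main` group as today and each other namespace under its own heading.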