
crossfeed's Issues

Integrate with login.gov for user accounts and authentication

Right now, authentication is based on AWS Cognito. We will need to manage user accounts and enforce authentication on Crossfeed. For users, we should store the following:

  • User name
  • User email address
  • User role - user or global admin
  • List of organizations the user has access to
  • Other relevant authentication info

We should delegate authentication to an established authentication provider. Either AWS Cognito (currently implemented) or Login.gov are viable options.

login.gov initial setup not working

Running the code from #56:

When setting up code from scratch, I can't do anything because there are no organizations!

[screenshot]

This is probably because we need to set the user's first name, so that the user doesn't have to go through this prompt; see this code:

firstName: '', // seeding a non-empty first name here (at least in offline mode) would skip the prompt
lastName: '',
userType: process.env.IS_OFFLINE ? 'globalAdmin' : 'standard'

Add pshtt scanner

Add the pshtt scanner (which utilizes sslyze under the hood) in order to scan HTTPS configurations of domains.

As pshtt is a python module, it will be easiest to write this as a python Lambda function, which is then synchronously invoked by a wrapper JS function to manage DB lookups and updates.

At a high level:

  1. Create a wrapper JS function to fetch all live websites (port 80 or 443), see getLiveWebsites in helpers.
  2. Store the list of websites in S3 (we could pass them as input, but that's limited by the 6 MB input limit which we might exceed). For local development use serverless-s3-local.
  3. Synchronously invoke the Python Lambda function, passing the S3 object name as input - docs
  4. Run pshtt against all domains from the Python Lambda. Example of using pshtt programmatically.
  5. Write output to another S3 object, return the object name.
  6. In the wrapper Lambda, store all results in the DB
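The steps above can be sketched in TypeScript. This is an illustrative outline only: the S3 put/get and Lambda invoke calls are omitted, and the names (`liveWebsites`, `invokePayload`, the `Domain` shape) are assumptions, not the real Crossfeed helpers.

```typescript
// Sketch of the wrapper Lambda's data flow (AWS calls elided).

interface Domain {
  name: string;
  services: { port: number }[];
}

// Step 1: select live websites (anything listening on port 80 or 443),
// mirroring what getLiveWebsites does.
export function liveWebsites(domains: Domain[]): string[] {
  return domains
    .filter((d) => d.services.some((s) => s.port === 80 || s.port === 443))
    .map((d) => d.name);
}

// Step 3: the payload handed to the Python Lambda is just the S3 location
// of the domain list, which avoids the 6 MB synchronous-invoke input limit.
export function invokePayload(bucket: string, key: string): string {
  return JSON.stringify({ bucket, key });
}
```

The wrapper would upload `liveWebsites(...)` to S3, invoke the Python function with `invokePayload(...)`, then read the output object it returns and write results to the DB.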

Build cron lambda service

Build a lambda service that will continuously fetch jobs from the database, see which ones need to be spawned, and spawn the relevant Lambda functions.
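The "which ones need to be spawned" check could look something like this. Field names are illustrative assumptions, and the actual Lambda-spawning call is omitted:

```typescript
// Sketch: decide which scans are due to run.
export interface Scan {
  name: string;
  frequencySeconds: number;
  lastRun: Date | null;
}

// A scan is due if it has never run, or if its frequency has elapsed.
export function isDue(scan: Scan, now: Date): boolean {
  if (!scan.lastRun) return true;
  return now.getTime() - scan.lastRun.getTime() >= scan.frequencySeconds * 1000;
}

export function dueScans(scans: Scan[], now: Date): Scan[] {
  return scans.filter((s) => isDue(s, now));
}
```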

Add Censys module

Module to fetch all relevant domains from Censys

API docs

We should start with the Censys Certificates API in order to fetch all known certificates that match a root domain.

For each root domain of each organization, we should:

  • Fetch certificates using the /api/v1/search/certificates endpoint (example web query). Example POST body:

{
  "query": "parsed.names: cisa.gov",
  "fields": ["parsed"]
}

See here for a complete list of data that can be pulled.

Note that Censys has rate limiting of 1 request per second. Each page returns 100 rows, and we should request all pages.

  • For each returned record, insert the domain into the database if it does not exist (see how the bitdiscovery scanner does this)
  • For each piece of useful metadata, store it in the database as an SSLInfo record. We should try to match the fields as closely as possible, though we can create new fields if necessary.
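A sketch of the paginated fetch described above. The query body follows the example POST body; the auth header, HTTP call, and DB writes are elided (the `post` callback stands in for them), and the 1 request/second limit is respected with a fixed delay:

```typescript
// Build the search/certificates POST body for one page of results.
export function certificateQuery(rootDomain: string, page: number) {
  return { query: `parsed.names: ${rootDomain}`, fields: ['parsed'], page };
}

// Each page returns 100 rows; compute how many pages to request in total.
export function pageCount(totalResults: number, pageSize = 100): number {
  return Math.max(1, Math.ceil(totalResults / pageSize));
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Fetch every page, pausing 1s between requests (Censys rate limit).
export async function fetchAllPages(
  rootDomain: string,
  totalResults: number,
  post: (body: object) => Promise<object[]>
): Promise<object[]> {
  const results: object[] = [];
  for (let page = 1; page <= pageCount(totalResults); page++) {
    results.push(...(await post(certificateQuery(rootDomain, page))));
    await sleep(1000);
  }
  return results;
}
```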

findomain doesn't work with domains with large results

🐛 Bug Report

The findomain lambda function doesn't work with domains that have more than a handful of subdomains.

To Reproduce

Steps to reproduce the behavior:

  1. Add an organization with domain google.com.
  2. Run docker-compose run backend npx serverless invoke local -f findomain.
  3. The function just hangs and never finishes.

Expected behavior

Google has ~36,000 subdomains, which take only about 15 seconds to enumerate if I run `findomain -t google.com` directly. These subdomains should show up in the dashboard.

Any helpful log output

My computer just hangs after the last line:

% docker-compose run backend npx serverless invoke local -f findomain  
Starting crossfeed_db_1 ... done
 
 Serverless Warning --------------------------------------
 
  A valid environment variable to satisfy the declaration 'env:CENSYS_API_ID' could not be found.
 
 
 Serverless Warning --------------------------------------
 
  A valid environment variable to satisfy the declaration 'env:CENSYS_API_SECRET' could not be found.
 
Serverless: Compiling with Typescript...
Serverless: Using local tsconfig.json
Serverless: Typescript compiled.
=> DB Connected

Integrate ARIN Whois API

The ARIN Whois API offers the ability to query for IP ranges owned by specific organizations. We can make use of this in order to gather IP ranges that belong to organizations in Crossfeed.

For each organization, we should:

  1. Find associated organizations in ARIN. This is possible by either searching for the organization by name (e.g. http://whois.arin.net/rest/orgs;name=DEPARTMENT%20OF%20HOMELAND%20SECURITY) or tracing known associated IP addresses.
  2. Get ARIN Organization details: https://whois.arin.net/rest/org/DHS-37.html
  3. Collect networks (https://whois.arin.net/rest/org/DHS-37/nets), AS numbers (https://whois.arin.net/rest/org/DHS-37/asns), and contacts (https://whois.arin.net/rest/org/DHS-37/pocs).

This is all possible via the API as well as web interface.

Note that as entries in ARIN may be outdated, we should start scans for these CIDRs in passive mode until confirmed by the owner.
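The three lookups above reduce to a small set of URL builders; a minimal sketch (response parsing and the passive-mode flag are omitted, and the helper names are illustrative):

```typescript
// Search for an ARIN organization by name, e.g. "DEPARTMENT OF HOMELAND SECURITY".
export function orgSearchUrl(name: string): string {
  return `https://whois.arin.net/rest/orgs;name=${encodeURIComponent(name)}`;
}

// Organization detail record, e.g. handle "DHS-37".
export function orgUrl(handle: string): string {
  return `https://whois.arin.net/rest/org/${handle}`;
}

// Networks, AS numbers, or contacts for an organization handle.
export function orgResourceUrl(handle: string, resource: 'nets' | 'asns' | 'pocs'): string {
  return `${orgUrl(handle)}/${resource}`;
}
```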

Add loading animations

This will make network requests (especially ones regarding auth / login) feel smoother.

User onboarding flow updates

Pre-pilot

  • Organization admins should be able to invite users directly to their organization
  • Implement organization-specific views
  • Allow global admins to create an organization and invite its first admin

Future post-pilot tasks:

  • Users should be able to request creating an organization if theirs does not already exist
  • When creating an organization, users should be able to specify root domains and/or IP blocks that they own
  • Implement approval process for new organizations

Add authentication to tests

Backend tests are broken after #56 as authentication is now required. We should add authentication to tests in order to fix.

TypeError: Cannot read property 'filter' of undefined

After login, I get stuck on this page:

[screenshot]

Logs:

backend_1 | => DB Connected
backend_1 | TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (node:176) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (Use node --trace-warnings ... to show where the warning was created)
backend_1 | (node:176) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1 | (node:176) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
backend_1 | => DB Connected
backend_1 | (node:187) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
backend_1 | at process.target._send (internal/child_process.js:721:13)
backend_1 | at process.target.send (internal/child_process.js:706:19)
backend_1 | at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (Use node --trace-warnings ... to show where the warning was created)
backend_1 | (node:187) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1 | (node:187) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Add web crawler

One option: https://github.com/hakluke/hakrawler

We should extract and make accessible JavaScript files, both for finding internal endpoints and external dependencies.

Steps:

  • Use hakrawler to get a list of all pages
  • Create a Webpage model
  • Display this data in a table, visualize it like a directory listing
  • Tags (login page, admin page, directory listings, signup pages, form pages, php code)

Data to store in a Webpage:

  • status code
  • response length
  • response html (maybe look into elasticsearch)
  • GET requests
  • screenshot (S3 url)
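A hedged sketch of what the Webpage model's shape might be, mapped from the fields listed above (field names are assumptions; the actual entity definition, e.g. via an ORM, is omitted):

```typescript
// Illustrative Webpage shape, one record per crawled page.
export interface Webpage {
  url: string;
  statusCode: number;
  responseLength: number;
  responseHtml: string;    // may move to Elasticsearch later
  screenshotUrl?: string;  // S3 URL
  tags: string[];          // e.g. 'login page', 'admin page'
}

// Build a Webpage record from a crawler response.
export function fromResponse(url: string, status: number, html: string): Webpage {
  return {
    url,
    statusCode: status,
    responseLength: html.length,
    responseHtml: html,
    tags: []
  };
}
```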

Later steps:

  • Visualize hyperlinks / connections using a map

Frontend frequently runs out of memory

The frontend container frequently runs out of memory when running locally.

frontend_1  | The build failed because the process exited too early. This probably means the system ran out of memory or someone called `kill -9` on the process.
frontend_1  | npm ERR! code ELIFECYCLE
frontend_1  | npm ERR! errno 1
frontend_1  | npm ERR! [email protected] start: `react-scripts start`
frontend_1  | npm ERR! Exit status 1
frontend_1  | npm ERR! 
frontend_1  | npm ERR! Failed at the [email protected] start script.
frontend_1  | npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
frontend_1  | 
frontend_1  | npm ERR! A complete log of this run can be found in:
frontend_1  | npm ERR!     /root/.npm/_logs/2020-07-11T21_44_44_085Z-debug.log

Periodic backend errors when running locally

When running locally, I get periodic backend errors with the following message:

backend_1   | (node:493) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
backend_1   |     at process.target._send (internal/child_process.js:721:13)
backend_1   |     at process.target.send (internal/child_process.js:706:19)
backend_1   |     at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
backend_1   |     at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1   | (Use `node --trace-warnings ...` to show where the warning was created)
backend_1   | (node:493) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1   | (node:493) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

JWT improvements

Add ability to automatically refresh JWT tokens after a set period of time (e.g. 15 mins)
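One way to sketch the refresh decision (the signing/re-issuing itself and the 15-minute window are policy choices, not settled implementation details):

```typescript
// Refresh a JWT when it is within `windowMinutes` of its `exp` claim.
// `expEpochSeconds` is the standard JWT exp claim (seconds since epoch).
export function shouldRefresh(
  expEpochSeconds: number,
  nowMs: number,
  windowMinutes = 15
): boolean {
  return expEpochSeconds * 1000 - nowMs <= windowMinutes * 60 * 1000;
}
```

A middleware could call this on each authenticated request and, when it returns true, re-sign a fresh token and return it in a response header.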

Examine Fargate quotas / request an increase

From this page, here are some limits on Fargate tasks:

Service quota (default value):

  • Clusters per account: 10,000 (per Region, per account)
  • Container instances per cluster: 2,000
  • Services per cluster: 2,000
  • Tasks using the EC2 launch type, per service (the desired count): 2,000 (applies to both standalone tasks and tasks launched as part of a service)
  • Tasks using the Fargate launch type or the FARGATE capacity provider, per Region, per account: 100 (applies to both standalone tasks and tasks launched as part of a service)
  • Fargate Spot tasks (FARGATE_SPOT capacity provider), per Region, per account: 250
  • Public IP addresses for tasks using the Fargate launch type, per Region: 100

These quotas are adjustable, though. When we deploy this to production, we should estimate the quota of Fargate tasks that we need, then request an increase from AWS support.

"ERR_MISSING_ARGS" error produced by scheduler

🐛 Bug Report

This can happen when I try to paginate to go to the next page.

It makes a request to /domain/search, but then the request just hangs and never completes.

In the Docker logs for the backend, I get:

(node:326) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
    at process.target._send (internal/child_process.js:721:13)
    at process.target.send (internal/child_process.js:706:19)
    at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
(Use `node --trace-warnings ...` to show where the warning was created)

Parse Censys IP snapshot raw data - https://censys.io/data/ipv4

This will allow us to extract full port and banner information. See https://censys.io/data/ipv4.

We will want to implement an ETL pipeline that can ingest this data using Lambda functions, where each function processes one chunk of data. Censys divides each daily scan into ~2500 separate ~100 MB chunks.

We should use an SQS queue to trigger these Lambda functions. Reference: https://www.serverless.com/blog/aws-lambda-sqs-serverless-integration/
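Each SQS message would name one chunk for a worker Lambda to process; a minimal sketch of the fan-out (message shape is an assumption; the SQS send and the ETL itself are omitted):

```typescript
// Build one SQS message body per Censys snapshot chunk (~2500 chunks/day).
export function chunkMessages(snapshotDate: string, chunkCount: number): string[] {
  return Array.from({ length: chunkCount }, (_, i) =>
    JSON.stringify({ snapshot: snapshotDate, chunk: i })
  );
}
```

A producer Lambda would enqueue these, and the SQS event-source mapping would invoke one worker Lambda per message, each downloading and parsing its ~100 MB chunk.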

Use serverless-webpack for lambda deployment

This will allow us to have smaller lambda deployment packages (as we won't have to include node_modules in the deployment packages), thus speeding up both packaging time and cold start times.
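A minimal serverless.yml sketch of what this might look like (option names should be checked against the serverless-webpack docs; treat this as a starting point, not the final config):

```yaml
plugins:
  - serverless-webpack

custom:
  webpack:
    webpackConfig: webpack.config.js
    includeModules: false   # don't copy node_modules; webpack bundles only what's imported

package:
  individually: true        # one small artifact per function
```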

Add amass module

Utilize amass in order to passively collect subdomains given an organization's root domain.

  • Run amass on all root domains
  • Store new domains/IP address combinations in database
  • (optional) add task scheduling logic to schedule passive port scans directly after amass runs

This should be hosted initially as a Lambda function, though we may have to switch to ECS Fargate if execution time gets to be too long.
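The invocation and output handling could be sketched as follows (the exact amass flags and output format should be verified against the amass docs; DB writes are omitted):

```typescript
// Assumed command shape for a passive enumeration run.
export function amassArgs(rootDomain: string): string[] {
  return ['enum', '-passive', '-d', rootDomain];
}

// amass prints one discovered subdomain per line on stdout; parse that.
export function parseAmassOutput(stdout: string): string[] {
  return stdout
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```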

Refactor scanning backend

Switch from Lambda to Fargate for running scanning tasks.

DB

Keep the same schema as the existing Scan table:

  id  | name           | arguments | frequency | lastRun
  id1 | findomain scan | cisa.gov  | 1 hour    | null

Deployment

  • We should push a single Docker image with all our tools (amass, findomain, etc.) to ECR, then create a task definition with this image -- this definition will be used for all Fargate tasks.
  • I think it's easier to manage a single docker image with all our tools on it, rather than having to create docker images for each tool.

Fargate

  • When launching the Fargate task, we can use overrides to specify a different start command (which tool to use) and parameters (domain name).
  • Fargate tasks have a minimum of 20 GB ephemeral storage -- I think that is enough for us, so I don't think we need EFS for storage right now. We could add it later if needed, though. thoughts @cablej ?
  • The Fargate task will be in our private subnet so that it can access the database. Potential problem: I'm not sure how networking with Fargate works. If each Fargate instance requires a separate ENI (and thus a separate private IP), we'll be limited by our subnet size -- we would need to make sure the subnet size is big enough. AWS lambda behaved the same way and had the same issue with private subnets -- until a recent update in which lambda functions were able to share ENIs and thus only require a handful of IP addresses available in a subnet. We should ask AWS Support about this.

Lambda

  • Run a "scheduler" lambda function that reads every entry in the Scan table and calls RunTask, using the task definition that we created earlier and passing in the right arguments to it.
  • Limitations: we can only launch at most 10 Fargate tasks every 10 seconds, so we might have to implement some kind of backoff strategy. This could also run into the lambda time limit if we have too many tasks to launch -- we might want to test this by launching ~100 Fargate tasks from a single lambda function to see if we have any issues. -- thoughts @cablej ?
    • Two possible ways around this issue:
      1. Have lambda send tasks to SQS -- each item in SQS will then call a Fargate task.
      2. Use Fargate only for long-running tasks, and use lambda for the other tasks.
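The two pieces of the scheduler above -- per-tool command overrides and batching around the RunTask throttle -- can be sketched like this. The container name, command flags, and batch handling are illustrative assumptions; the actual ECS RunTask call is omitted:

```typescript
// Build the RunTask `overrides` block that swaps the container command
// per tool (e.g. findomain vs. amass) on the shared task definition.
// 'crossfeed-worker' and the '-t' flag are hypothetical.
export function containerOverrides(tool: string, domain: string) {
  return {
    containerOverrides: [{ name: 'crossfeed-worker', command: [tool, '-t', domain] }]
  };
}

// Split work into batches of 10 to respect the 10-RunTask-per-10s limit;
// the scheduler would sleep (or re-enqueue) between batches.
export function batches<T>(items: T[], size = 10): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```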

Also, an alternative to lambda is EventBridge, which appears to be AWS's recommended way of scheduling ECS tasks.
