cisagov / crossfeed
External monitoring for organization assets
Home Page: https://docs.crossfeed.cyber.dhs.gov
License: Creative Commons Zero v1.0 Universal
This will allow us to extract full port and banner information. See https://censys.io/data/ipv4.
We will want to implement an ETL pipeline that can ingest this data using Lambda functions, where each function processes one chunk of data. Censys divides each daily scan into ~2500 separate ~100 MB chunks.
We should use an SQS queue to trigger these Lambda functions. Reference: https://www.serverless.com/blog/aws-lambda-sqs-serverless-integration/
Some tasks will take more than Lambda's allotted 15 minutes or will need to download large files. For these, we should use ECS Fargate.
Reference: https://www.serverless.com/blog/serverless-application-for-long-running-process-fargate-lambda
We will need to support downloading large files, which is possible via EFS - https://aws.amazon.com/blogs/containers/developers-guide-to-using-amazon-efs-with-amazon-ecs-and-aws-fargate-part-1/
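The chunk-ingestion step described above could be sketched as follows. This is a minimal sketch, assuming each ~100 MB chunk is newline-delimited JSON with one host record per line (verify against the actual Censys dump format); the record shape, `chunkUrl` message field, and the commented-out `download`/`saveToDb` helpers are all hypothetical.

```typescript
// Hypothetical shape of one host record in a chunk.
interface HostRecord {
  ip: string;
  ports?: number[];
}

// Parse one downloaded chunk into host records, skipping blank or corrupt lines
// rather than failing the whole chunk.
function parseChunk(chunkText: string): HostRecord[] {
  const records: HostRecord[] = [];
  for (const line of chunkText.split("\n")) {
    if (!line.trim()) continue;
    try {
      records.push(JSON.parse(line) as HostRecord);
    } catch {
      // Malformed line: skip it.
    }
  }
  return records;
}

// SQS-triggered handler shape: each SQS message names one chunk to ingest.
async function handler(event: { Records: { body: string }[] }) {
  for (const record of event.Records) {
    const { chunkUrl } = JSON.parse(record.body);
    // Placeholders for the real S3/HTTP download and database upsert:
    // const records = parseChunk(await download(chunkUrl));
    // await saveToDb(records);
    console.log(`would ingest chunk ${chunkUrl}`);
  }
}
```

With ~2500 chunks per daily scan, each SQS message maps to one Lambda invocation, so the whole scan fans out naturally.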
When running locally, I get periodic backend errors with the following message:
backend_1 | (node:493) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
backend_1 | at process.target._send (internal/child_process.js:721:13)
backend_1 | at process.target.send (internal/child_process.js:706:19)
backend_1 | at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (Use `node --trace-warnings ...` to show where the warning was created)
backend_1 | (node:493) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1 | (node:493) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Nuclei allows targeted scanning of hosts given YAML templates. This will let us configure scans for CVEs or misconfigurations from the UI.
https://github.com/projectdiscovery/nuclei
Public templates: https://github.com/projectdiscovery/nuclei-templates
Running the code from #56:
When setting up the code from scratch, I can't do anything because there are no organizations!
This is probably because we need to set the user's first name, so that the user doesn't have to go through this prompt, at this code:
crossfeed/backend/src/api/auth.ts
Lines 54 to 56 in 22e02c4
Pre-pilot
Future post-pilot tasks:
Right now, authentication is based on AWS Cognito. We will need to manage user accounts and enforce authentication on Crossfeed. For users, we should store the following:
We should delegate authentication to an established authentication provider. Either AWS Cognito (currently implemented) or Login.gov are viable options.
Add the pshtt scanner (which utilizes sslyze under the hood) in order to scan HTTPS configurations of domains.
As pshtt is a Python module, it will be easiest to write this as a Python Lambda function, which is then synchronously invoked by a JS wrapper function that manages DB lookups and updates.
At a high level:
getLiveWebsites in helpers.

Add ability to automatically refresh JWT tokens after a set period of time (e.g. 15 mins).
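A minimal sketch of the periodic JWT refresh idea: `refreshFn` is a hypothetical callback that exchanges the current token for a fresh one, and the 15-minute default simply mirrors the interval suggested above.

```typescript
// Pure check: has the token been held long enough that it needs a refresh?
function needsRefresh(
  issuedAtMs: number,
  nowMs: number,
  maxAgeMs: number = 15 * 60 * 1000 // e.g. 15 minutes
): boolean {
  return nowMs - issuedAtMs >= maxAgeMs;
}

// Run refreshFn every intervalMs; returns a cancel function so the timer can
// be cleared on logout.
function scheduleTokenRefresh(
  refreshFn: () => Promise<void>,
  intervalMs: number = 15 * 60 * 1000
): () => void {
  const timer = setInterval(() => {
    refreshFn().catch((err) => console.error("token refresh failed", err));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The frontend would call `scheduleTokenRefresh` once after login and invoke the returned cancel function on logout.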
Module to fetch all relevant domains from Censys
We should start with the Censys Certificates API in order to fetch all known certificates that match a root domain.
For each root domain of each organization, we should:
query the /api/v1/search/certificates endpoint (example web query). Example POST body:

{
  "query": "parsed.names: cisa.gov",
  "fields": ["parsed"]
}
See here for a complete list of data that can be pulled.
Note that Censys rate-limits requests to 1 per second. Each page returns 100 rows, and we should request all pages.
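The paginate-all-pages-under-rate-limit loop could be sketched like this. The `search` callback stands in for the actual HTTP POST to /api/v1/search/certificates; the `metadata.pages` field follows the v1 API's response shape, but treat the exact field names here as assumptions to verify against the Censys docs.

```typescript
// Minimal shape of one search response page (assumed, not authoritative).
interface SearchPage {
  results: { [key: string]: unknown }[];
  metadata: { pages: number };
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Fetch every page of results, waiting delayMs between requests to respect
// the 1 request/second rate limit.
async function fetchAllCertificates(
  search: (page: number) => Promise<SearchPage>,
  delayMs: number = 1000
): Promise<{ [key: string]: unknown }[]> {
  const all: { [key: string]: unknown }[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    const res = await search(page);
    all.push(...res.results);
    totalPages = res.metadata.pages;
    page += 1;
    if (page <= totalPages) await sleep(delayMs); // rate limit between pages
  } while (page <= totalPages);
  return all;
}
```

Injecting `search` as a callback keeps the pagination/rate-limit logic testable without network access.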
This site says:
We strongly recommend using IAL for the identity proofing process. The concept of Level of Assurance (LOA) is retired by the NIST 800-63-3 digital identity guidelines, and support by login.gov for LOA requests is deprecated.
We currently use http://idmanagement.gov/ns/assurance/loa/1.
Also: parse directory listings if available
After login, I get stuck on this page:
Logs:
backend_1 | => DB Connected
backend_1 | TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (node:176) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'filter' of undefined
backend_1 | at /app/.build/src/api/auth.js:128:26
backend_1 | at step (/app/.build/src/api/auth.js:33:23)
backend_1 | at Object.next (/app/.build/src/api/auth.js:14:53)
backend_1 | at fulfilled (/app/.build/src/api/auth.js:5:58)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (Use node --trace-warnings ... to show where the warning was created)
backend_1 | (node:176) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1 | (node:176) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
backend_1 | => DB Connected
backend_1 | (node:187) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
backend_1 | at process.target._send (internal/child_process.js:721:13)
backend_1 | at process.target.send (internal/child_process.js:706:19)
backend_1 | at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:93:5)
backend_1 | (Use node --trace-warnings ... to show where the warning was created)
backend_1 | (node:187) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
backend_1 | (node:187) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Build a lambda service that will continuously fetch jobs from the database, see which ones need to be spawned, and spawn the relevant Lambda functions.
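The core "which scans are due?" check that this service would run on each tick could be sketched as follows; the `Scan` shape loosely mirrors the existing Scan table, but `frequencySeconds` is a hypothetical representation of the frequency column.

```typescript
// Hypothetical in-memory shape of a row from the Scan table.
interface Scan {
  id: string;
  name: string;
  frequencySeconds: number;
  lastRun: Date | null;
}

// A scan is due if it has never run, or if its frequency interval has elapsed.
function isScanDue(scan: Scan, now: Date): boolean {
  if (!scan.lastRun) return true;
  const elapsedMs = now.getTime() - scan.lastRun.getTime();
  return elapsedMs >= scan.frequencySeconds * 1000;
}

// The service would fetch all scans from the DB, filter with isScanDue, and
// spawn the relevant Lambda function for each due scan.
function dueScans(scans: Scan[], now: Date): Scan[] {
  return scans.filter((s) => isScanDue(s, now));
}
```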
A NAT Gateway is too expensive; we can have the same level of security with security groups.
From this page, here are some limits on Fargate tasks:
| Service quota | Description | Default quota value |
| --- | --- | --- |
| Clusters per account | The maximum number of clusters per Region, per account. | 10,000 |
| Container instances per cluster | The maximum number of container instances per cluster. | 2,000 |
| Services per cluster | The maximum number of services per cluster. | 2,000 |
| Tasks using the EC2 launch type, per service (the desired count) | The maximum number of tasks using the EC2 launch type per service. This limit applies to both standalone tasks and tasks launched as part of a service. | 2,000 |
| Tasks using the Fargate launch type or the FARGATE capacity provider, per Region, per account | The maximum number of tasks using the Fargate launch type or the FARGATE capacity provider, per Region. This limit applies to both standalone tasks and tasks launched as part of a service. | 100 |
| Fargate Spot tasks, per Region, per account | The maximum number of tasks using the FARGATE_SPOT capacity provider, per Region. | 250 |
| Public IP addresses for tasks using the Fargate launch type | The maximum number of public IP addresses used by tasks using the Fargate launch type, per Region. | 100 |
These quotas are adjustable, though. When we deploy this to production, we should estimate the quota of Fargate tasks that we need, then request an increase from AWS support.
Build a scanner to parse both RDNS and FDNS datasets from Project Sonar.
We will need to figure out the most efficient way to parse these datasets and filter for the domains we care about, given that each dataset is 10-20 GB in size.
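One plausible approach is to stream the dump line by line and keep only hostnames that fall under one of our organizations' root domains, so the 10-20 GB file never has to fit in memory. The matching helper below is a sketch (the streaming/decompression plumbing around it is omitted):

```typescript
// Return true if hostname equals one of our root domains or is a subdomain
// of one (e.g. www.cisa.gov matches root cisa.gov).
function matchesRootDomain(hostname: string, roots: Set<string>): boolean {
  if (roots.has(hostname)) return true;
  // Walk up the labels: a.b.cisa.gov -> b.cisa.gov -> cisa.gov
  const labels = hostname.split(".");
  for (let i = 1; i < labels.length - 1; i++) {
    if (roots.has(labels.slice(i).join("."))) return true;
  }
  return false;
}
```

Suffix matching on whole labels (rather than a plain `endsWith`) avoids false positives like `notcisa.gov` matching `cisa.gov`.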
One option: https://github.com/hakluke/hakrawler
We should extract and make accessible JavaScript files, both for finding internal endpoints and external dependencies.
Steps:
Data to store in a Webpage:
Later steps:
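As a sketch of the extraction itself: a real crawler would use a proper HTML parser (or hakrawler's own output), but a regex over fetched pages conveys the idea. The helper name and approach here are illustrative, not the implementation.

```typescript
// Pull script URLs out of a page's HTML, resolving relative paths against
// the page's URL so the JavaScript files can be fetched and stored.
function extractScriptUrls(html: string, baseUrl: string): string[] {
  const urls: string[] = [];
  const re = /<script[^>]+src=["']([^"']+)["']/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    urls.push(new URL(m[1], baseUrl).toString());
  }
  return urls;
}
```

External dependencies (CDN-hosted scripts) fall out of the same pass, since absolute `src` values resolve to themselves.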
Remove APP_SECRET from dev.env.example and all mappings for it in env.yml
Remove mapping of DOMAIN to dev/prod in env.yml
This will make network requests (especially ones regarding auth / login) feel smoother.
The frontend container frequently runs out of memory when running locally.
frontend_1 | The build failed because the process exited too early. This probably means the system ran out of memory or someone called `kill -9` on the process.
frontend_1 | npm ERR! code ELIFECYCLE
frontend_1 | npm ERR! errno 1
frontend_1 | npm ERR! [email protected] start: `react-scripts start`
frontend_1 | npm ERR! Exit status 1
frontend_1 | npm ERR!
frontend_1 | npm ERR! Failed at the [email protected] start script.
frontend_1 | npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
frontend_1 |
frontend_1 | npm ERR! A complete log of this run can be found in:
frontend_1 | npm ERR! /root/.npm/_logs/2020-07-11T21_44_44_085Z-debug.log
Pull data from https://analytics.usa.gov/, which federal agencies are required to use to collect analytics data.
https://analytics.usa.gov/data/live/sites-extended.csv is a nightly updated csv with all live sites that had any activity.
Also -- EOT -- https://raw.githubusercontent.com/GSA/https/master/compliance/m-15-13/data/eot-2016.csv
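Ingesting the nightly CSV could be sketched as below. This assumes a header row with a column named "domain"; verify the actual column names in sites-extended.csv before relying on this.

```typescript
// Extract the domain column from a CSV dump (naive split-based parsing;
// adequate only if fields contain no embedded commas).
function parseSitesCsv(csv: string): string[] {
  const lines = csv.trim().split("\n");
  const header = lines[0].split(",");
  const domainIdx = header.indexOf("domain"); // assumed column name
  if (domainIdx === -1) throw new Error("no domain column found");
  return lines
    .slice(1)
    .map((line) => line.split(",")[domainIdx])
    .filter(Boolean);
}
```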
As an organization member, I should only see domains that are relevant to my organization.
The ARIN Whois API offers the ability to query for IP ranges owned by specific organizations. We can make use of this in order to gather IP ranges that belong to organizations in Crossfeed.
For each organization, we should:
This is all possible via the API as well as the web interface.
Note that as entries in ARIN may be outdated, we should start scans for these CIDRs in passive mode until confirmed by the owner.
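The lookup URL could be built as below. The /rest/org/{handle}/nets path follows ARIN's Whois-RS REST conventions, but both the exact endpoint and the org handle used here are assumptions to confirm against ARIN's documentation.

```typescript
// Build the ARIN Whois-RS URL listing the networks registered to an org
// handle (handle shown in the test is hypothetical).
function arinOrgNetsUrl(orgHandle: string): string {
  return `https://whois.arin.net/rest/org/${encodeURIComponent(orgHandle)}/nets`;
}
```

The CIDR blocks returned from such a query would then be saved as passive-mode scan targets until the organization confirms ownership.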
Not sure if it's too late to do this, but given that TypeORM has had maintenance issues for a while (see typeorm/typeorm#3267), it might be better in the long term to just use something like Sequelize with types.
This will allow us to have smaller lambda deployment packages (as we won't have to include node_modules in the deployment packages), thus speeding up both packaging time and cold start times.
Backend tests are broken after #56 as authentication is now required. We should add authentication to tests in order to fix.
This can happen when I try to paginate to go to the next page.
It makes a request to /domain/search, but then the request just hangs and never completes.
In the Docker logs for the backend, I get:
(node:326) UnhandledPromiseRejectionWarning: TypeError [ERR_MISSING_ARGS]: The "message" argument must be specified
at process.target._send (internal/child_process.js:721:13)
at process.target.send (internal/child_process.js:706:19)
at process.<anonymous> (/app/node_modules/serverless-offline/dist/lambda/handler-runner/child-process-runner/childProcessHelper.js:47:11)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(Use `node --trace-warnings ...` to show where the warning was created)
Switch from using Lambda to Fargate for tasks.
Keep the same schema as the existing Scan table:

| id | name | arguments | frequency | lastRun |
| --- | --- | --- | --- | --- |
| id1 | findomain scan | cisa.gov | 1 hour | null |
Also, an alternative to Lambda is EventBridge, which appears to be AWS's recommended way of scheduling ECS tasks:
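As an illustration (not a working deployment), the rule and target objects passed to EventBridge's putRule / putTargets calls for a scheduled Fargate task might be shaped like this; every ARN below is a placeholder, and the schedule expression mirrors the 1-hour frequency in the Scan table.

```typescript
// EventBridge rule: fire once an hour.
const scanRule = {
  Name: "findomain-scan",
  ScheduleExpression: "rate(1 hour)",
};

// Target: run a Fargate task on our ECS cluster when the rule fires.
// All ARNs are placeholders for illustration only.
const scanTarget = {
  Rule: scanRule.Name,
  Arn: "arn:aws:ecs:us-east-1:123456789012:cluster/crossfeed",
  RoleArn: "arn:aws:iam::123456789012:role/ecsEventsRole",
  EcsParameters: {
    TaskDefinitionArn:
      "arn:aws:ecs:us-east-1:123456789012:task-definition/findomain",
    LaunchType: "FARGATE",
  },
};
```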
Utilize amass in order to passively collect subdomains given an organization's root domain.
This should be hosted initially as a Lambda function, though we may have to switch to ECS Fargate if execution time gets to be too long.
https://github.com/sensepost/gowitness
We could also integrate this with the webscraper and use https://github.com/scrapy-plugins/scrapy-splash.
The findomain lambda function doesn't work with domains that have more than a handful of subdomains.
Steps to reproduce the behavior:
1. Use the root domain google.com.
2. Run docker-compose run backend npx serverless invoke local -f findomain.

Google has ~36,000 subdomains, which takes only 15 seconds if I run findomain -t google.com. These subdomains should show up in the dashboard.
My computer just hangs after the last line:
% docker-compose run backend npx serverless invoke local -f findomain
Starting crossfeed_db_1 ... done
Serverless Warning --------------------------------------
A valid environment variable to satisfy the declaration 'env:CENSYS_API_ID' could not be found.
Serverless Warning --------------------------------------
A valid environment variable to satisfy the declaration 'env:CENSYS_API_SECRET' could not be found.
Serverless: Compiling with Typescript...
Serverless: Using local tsconfig.json
Serverless: Typescript compiled.
=> DB Connected
https://github.com/Pascal-0x90/findcdn
Specifically, organizations may want to know which domains are not using Cloudflare / WAF.