
discovery-engine's People

Contributors

abdulkhader21, achrefbensaad, aloksharma20, amankumar2696, ankurk99, asifalix, daemon1024, dku-boanlab, gowtham-dharsan, humancalico, kprateep, mohankumarmani, nagarajan0396, nareshpandianpc, nyrahul, paveenv, prakashrajsaravanan, prateeknandle, pugal-k1, rajasahil, rksharma95, seswarrajan, seungsoo-lee, stefin9898, sujithkasireddy, vishnusomank, vyom-yadav, wazir-ahmed, weirdwiz, yasin-cs-ko-ak

discovery-engine's Issues

Enable connecting to the Cilium Hubble relay

In addition to reading the Cilium traffic information from the database,

we should have an option to connect to the Cilium Hubble relay and fetch the traffic information at each time interval.
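A minimal sketch of such a connection, assuming the Hubble Relay gRPC Observer API; the endpoint address and fetch interval here are illustrative:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    observerpb "github.com/cilium/cilium/api/v1/observer"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    // Dial the Hubble Relay endpoint (4245 is the default relay port).
    conn, err := grpc.Dial("hubble-relay.kube-system:4245",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("failed to dial hubble relay: %v", err)
    }
    defer conn.Close()
    client := observerpb.NewObserverClient(conn)

    // Fetch the most recent flows at each time interval instead of
    // streaming continuously.
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        stream, err := client.GetFlows(context.Background(),
            &observerpb.GetFlowsRequest{Number: 100})
        if err != nil {
            log.Printf("GetFlows failed: %v", err)
            continue
        }
        for {
            resp, err := stream.Recv()
            if err != nil {
                break // io.EOF once the requested flows are delivered
            }
            fmt.Println(resp.GetFlow())
        }
    }
}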

Dropping egress rules if the destination IP does not map to a pod after DNS resolution

Right now, if we apply the Cilium network policies generated by our libraries, we see that some network flows are getting dropped.
Precisely, the dropped network flows correspond to egress communication from pods to external IPs, as we can see in the screenshot below.
We could add a function that checks whether a packet is being sent to an IP that does not map to any pod after DNS resolution; if so, we drop the egress rules from the Cilium network policy altogether and keep only ingress rules for that particular pod. Such traffic can mean that the pod has external entities communicating with it, and we cannot possibly whitelist every possible CIDR.

(Screenshot: dropped egress network flows, 2020-11-04)
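A rough sketch of the pruning step described above; KnoxNetworkPolicy and its fields are simplified stand-ins for the real schema, and the pod-IP set is assumed to come from the Kubernetes API:

package policy

import "net"

// Egress and KnoxNetworkPolicy are simplified stand-ins for the real types.
type Egress struct {
    ToCIDRs []string
}

type KnoxNetworkPolicy struct {
    Egress  []Egress
    Ingress []string // simplified
}

// PruneExternalEgress drops all egress rules from the policy if any egress
// destination does not map to a known pod IP, keeping only the ingress side.
func PruneExternalEgress(p *KnoxNetworkPolicy, podIPs map[string]bool) {
    for _, eg := range p.Egress {
        for _, cidr := range eg.ToCIDRs {
            ip, _, err := net.ParseCIDR(cidr)
            if err != nil || !podIPs[ip.String()] {
                // External destination after DNS resolution: we cannot
                // whitelist every CIDR, so keep only the ingress rules.
                p.Egress = nil
                return
            }
        }
    }
}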

Go unit tests

  • Update the Go unit tests for each package
  • Update the Go unit-test script
  • Clean up unused functions

Configuring the policy aggregation level

  • Policy aggregation level (a configuration sketch follows the list)

    • Selector side: label aggregation level

      • low: no aggregation, discover all individual label-based policies
      • medium: aggregate all pods under the selector labels that communicated with the target
      • high: discover the selector label based on the superset labels
    • Target side: label aggregation level

      • low: no aggregation, discover all individual label-based policies
      • medium: aggregate all pods under the target labels that communicated with the selector
      • high: discover the target label based on the superset labels
    • Target side: port aggregation

      • on: aggregate from the min. to the max. port number per protocol
      • off: discover an individual policy per port number
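A hypothetical sketch of how these knobs could be expressed as a configuration structure; the names are illustrative, not the actual discovery-engine configuration:

package config

// AggregationLevel controls how much label aggregation is applied.
type AggregationLevel int

const (
    Low    AggregationLevel = iota // no aggregation: one policy per label set
    Medium                         // aggregate pods that communicated with the peer
    High                           // aggregate based on superset labels
)

// AggregationConfig mirrors the options listed above.
type AggregationConfig struct {
    SelectorLabelLevel AggregationLevel
    TargetLabelLevel   AggregationLevel
    PortAggregation    bool // on: collapse min..max port range per protocol
}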

Ignoring flows/generated-policy types

  • Ignore flows (see the filter sketch after this list)

    • Selector side:
      • namespace
      • matched labels
    • Target side:
      • namespace
      • matched labels
      • port number
      • protocol
  • Ignore policies

    • Policy type: ingress/egress
    • Rule type: matchLabels/FQDN/Entity/HTTP
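An illustrative shape for these ignore options; the real discovery-engine configuration may differ:

package config

// SideFilter describes what to ignore on one side of a flow.
type SideFilter struct {
    Namespace     string
    MatchedLabels map[string]string
    PortNumber    int    // target side only
    Protocol      string // target side only
}

// IgnoreFlowFilter drops flows matching the selector/target filters before
// policy discovery runs.
type IgnoreFlowFilter struct {
    Selector SideFilter
    Target   SideFilter
}

// IgnorePolicyFilter suppresses generated policies by type.
type IgnorePolicyFilter struct {
    PolicyTypes []string // "ingress", "egress"
    RuleTypes   []string // "matchLabels", "FQDN", "Entity", "HTTP"
}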

Add test cases for system policy discovery

Add 8 test cases for system policy discovery (a table-driven sketch follows the list):

matchPaths (file operation with fromSource)
matchDirectories (file operation with fromSource)
matchPaths (file operation w/o fromSource)
matchDirectories (file operation w/o fromSource)

matchPaths (process operation with fromSource)
matchDirectories (process operation with fromSource)
matchPaths (process operation w/o fromSource)
matchDirectories (process operation w/o fromSource)
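A table-driven skeleton for the 8 cases; the case wiring is left as a placeholder since the concrete discovery functions and fixtures are not shown here:

package systempolicy_test

import "testing"

func TestSystemPolicyDiscovery(t *testing.T) {
    cases := []struct {
        name       string
        operation  string // "File" or "Process"
        ruleType   string // "matchPaths" or "matchDirectories"
        fromSource bool
    }{
        {"file-paths-with-source", "File", "matchPaths", true},
        {"file-dirs-with-source", "File", "matchDirectories", true},
        {"file-paths-no-source", "File", "matchPaths", false},
        {"file-dirs-no-source", "File", "matchDirectories", false},
        {"proc-paths-with-source", "Process", "matchPaths", true},
        {"proc-dirs-with-source", "Process", "matchDirectories", true},
        {"proc-paths-no-source", "Process", "matchPaths", false},
        {"proc-dirs-no-source", "Process", "matchDirectories", false},
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            // Push the recorded logs for this case and compare the
            // discovered policy against the expected one.
            _ = tc // placeholder: wire up inputs/outputs here
        })
    }
}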

Handling Overlapping Policies

What happens if we have policies with overlapping rules?
Let's say a discovered policy { name: policy_1, label: xyz, rule1, rule2 } gets added to the policy group. At a later point in time another policy is discovered: { name: policy_2, label: xyz, rule1, rule2, rule3 }.
policy_2 renders policy_1 redundant, i.e., policy_2 has the same labels and all the rule sets of policy_1, plus more. With respect to enforcement, however, having both policies applicable at the same time in the backend is possible and would not cause any problems.

It is possible to find redundant policies across all the groups by running a policy trace simulation engine. However, this would be an optimization, not a basic requirement. This issue tracks that point; a subsumption-check sketch follows below.

In the future, we can show an icon alongside a policy that signals it is redundant, and clicking that icon would show the list of policies it matches against.
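A minimal sketch of the subsumption check, using a simplified stand-in for the discovered-policy schema:

package policy

import "reflect"

// Policy is a simplified stand-in for the discovered-policy schema.
type Policy struct {
    Name   string
    Labels map[string]string
    Rules  []string
}

// Subsumes reports whether p makes q redundant: identical selector labels
// and a rule set that contains every rule of q.
func (p Policy) Subsumes(q Policy) bool {
    if !reflect.DeepEqual(p.Labels, q.Labels) {
        return false
    }
    rules := make(map[string]bool, len(p.Rules))
    for _, r := range p.Rules {
        rules[r] = true
    }
    for _, r := range q.Rules {
        if !rules[r] {
            return false
        }
    }
    return true
}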

Enable discovering system policy (operation: Process)

  • Add a function to discover system policy (operation: "Process")
  • Add a function to aggregate the multiple process paths for the process operation policies
  • Add a function to remove duplicated system policies
  • Add functions to get/insert system policies from/into the MySQL database

Support system policy discovery from the system alert events

As of now, once at least one KubeArmorPolicy is applied, KubeArmor no longer generates the system logs.

Rather, it generates system alert events.

Thus, we need to discover system policies from the system alert events as well, so that we do not miss any system policy.

Deployment for Policy discovery module

Tasks involved:

  • High-level sketch of deployment dependency
  • Updates to helm chart for knoxAutoPolicy
  • Validate/test integrated setup

knoxAutoPolicy depends on other modules such as Network flow, MongoDB, and knoxServicePolicy. Configure the helm charts so that all the services are deployed appropriately on a new cluster and the information is shared correctly.

Requirements for Auto-discovery

Requirements

  • Discovery of label-based policy. Use-case: General L3/L4 policies
  • Discovery of IP/CIDR based policies.
  • Aggregating policies using common labels
  • Discovery of DNS based policy. Use-case: Pods talking to external services
  • Discover policies involving external services connecting to internal pods
  • Discovery of L7 policies
  • Verifying if the discovered policy will result in allowing only the specified traffic (or specified behavior)

Rule sets

Rule-sets could be based on protocol, port, HTTP attributes, FQDNs etc. For detailed rule-sets, please check:
https://docs.google.com/spreadsheets/d/1ty2ZPWCalCGoDsEqB6-2w2H9f08RVG7k9xqSWG1N37E/edit#gid=1740895243

Design requirements

  • Minimal policy set covering maximum flows
  • Decide on a common schema to be used (ideally use the schema used for flow monitoring)

Duplicate detection of Discovered Policy

The module should check whether a newly detected policy is already present in the database/table. The matching has to be done on the selector labels and the rule sets. It is a strict match, i.e., all the labels and all the rules have to match.
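A minimal sketch of the strict-match check, using a simplified stand-in for the stored policy schema:

package dedup

import "reflect"

// DiscoveredPolicy is a simplified stand-in for the stored policy schema.
type DiscoveredPolicy struct {
    SelectorLabels map[string]string
    Rules          []string
}

// IsDuplicate reports whether the new policy strictly matches an existing
// one: identical selector labels and identical rule sets (order-insensitive).
func IsDuplicate(newPolicy DiscoveredPolicy, existing []DiscoveredPolicy) bool {
    for _, p := range existing {
        if reflect.DeepEqual(p.SelectorLabels, newPolicy.SelectorLabels) &&
            sameRuleSet(p.Rules, newPolicy.Rules) {
            return true
        }
    }
    return false
}

// sameRuleSet compares two rule lists as sets.
func sameRuleSet(a, b []string) bool {
    if len(a) != len(b) {
        return false
    }
    set := make(map[string]bool, len(a))
    for _, r := range a {
        set[r] = true
    }
    for _, r := range b {
        if !set[r] {
            return false
        }
    }
    return true
}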

Automation testing framework

  • Provide an automation testing framework via a shell script.

Then, we can run a test case by pushing inputs (flows) and comparing against the expected outputs (policies).

L7 policy discovery annotation

We need to annotate a pod with io.cilium.proxy-visibility=<{Traffic Direction}/{L4 Port}/{L4 Protocol}/{L7 Protocol}> to see the packet payload from Cilium Hubble.

For example: kubectl annotate pod foo -n bar io.cilium.proxy-visibility="<Egress/53/UDP/DNS>,<Egress/80/TCP/HTTP>"

Then, the Cilium monitor forwards the packets that match the annotation to the Envoy proxy to obtain their payloads.

So, finally, we can generate L7 network policies based on that information.

Here, what is our strategy to monitor the port number/protocol for discovering network policies?

@nyrahul

DNS/Service based L3 policy discovery

  • Discovery of DNS based policies
  • Discovery of internal k8s service-based policies
  • Discovery of external service-based policies

Use-case:

  • Pods talking to external services,
  • External services talking to pods

Flexibility of policy discovery mode

We should be able to discover network policies in three different modes, for flexibility:

  • ingress-centric mode
  • egress-centric mode
  • egress-ingress mode

gRPC service implementation

  • gRPC server implementation: to dynamically configure the discovery options
(for now, these are defined by environment variables)
  • Save customized policy aggregation settings by configuration name

Reducing impact of external service access on Policy Discovery - toCIDR and use of rev DNS lookup

Currently, if an internal pod accesses an external service, each flow may hit a different IP address. In the flow information we see that as external IP access, and thus a toCIDR egress policy is discovered for every such flow. The external service may be hosted on hundreds of different IP addresses, which is a problem since it results in a different toCIDR policy every time.

We need to use reverse DNS lookup to convert the IP address to a domain name, and then aggregate the policies over a period of time.
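A small sketch of the reverse lookup step; note that reverse DNS is best-effort, so the results should be aggregated and validated over time:

package dns

import (
    "net"
    "strings"
)

// ToFQDN resolves an external IP to a domain name via reverse DNS, so that
// many per-IP toCIDR rules can later be collapsed into a single toFQDNs rule.
// CDNs often return generic PTR records, hence the best-effort caveat.
func ToFQDN(ip string) (string, bool) {
    names, err := net.LookupAddr(ip)
    if err != nil || len(names) == 0 {
        return "", false
    }
    // PTR records carry a trailing dot, e.g. "example.com."
    return strings.TrimSuffix(names[0], "."), true
}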

Handling multiple HTTP rules

For now, the HTTP rule (method/path) is handled by exact matching. So, for example, if the product page is identified by a product ID, there can be many paths, as follows.

apiVersion: v1
kind: KnoxNetworkPolicy
metadata:
  name: autopol-egress-thbttgjfepzbadb
  namespace: hipster
  rule: matchLabels+toHTTPs+toPorts
  status: latest
  type: egress
spec:
  selector:
    matchLabels:
      app: loadgenerator
  egress:
  - matchLabels:
      app: frontend
      k8s:io.kubernetes.pod.namespace: hipster
    toPorts:
    - port: "8080"
      protocol: tcp
    toHTTPs:
    - method: GET
      path: /product/6E92ZMYYFZ
    - method: GET
      path: /product/9SIQT8TOJO
    - method: GET
      path: /cart
    - method: GET
      path: /product/0PUK6V6EV0
    - method: GET
      path: /product/OLJCESPC7Z
    - method: GET
      path: /product/1YMWWN1N4O
    - method: GET
      path: /product/2ZYFJ3GM2N
    - method: GET
      path: /product/66VCHSJNUP
  action: allow
generatedTime: 1608101559

So we need to handle those multiple HTTP paths by aggregating them. The challenges here are how to merge, and why.
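One possible merging heuristic, sketched below: collapse sibling paths under a common parent into a single regex pattern, assuming the enforcement backend supports regex path matching (as Cilium's HTTP rules do). The threshold is an illustrative tuning knob:

package http

import (
    "sort"
    "strings"
)

// MergePaths collapses sibling paths that share a parent (e.g.
// /product/6E92ZMYYFZ and /product/9SIQT8TOJO become /product/.+) once the
// number of siblings reaches the threshold; other paths are kept as-is.
func MergePaths(paths []string, threshold int) []string {
    byParent := make(map[string][]string)
    for _, p := range paths {
        idx := strings.LastIndex(p, "/")
        if idx < 0 {
            idx = 0 // HTTP paths are expected to start with "/"
        }
        byParent[p[:idx]] = append(byParent[p[:idx]], p)
    }
    var merged []string
    for parent, children := range byParent {
        if parent != "" && len(children) >= threshold {
            merged = append(merged, parent+"/.+")
        } else {
            merged = append(merged, children...)
        }
    }
    sort.Strings(merged)
    return merged
}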

Simple command line interface

Provide a CLI

  • to configure the filtering for the network/system policy discovery
  • to conduct network/system policy discovery

Test items from the discussion with Brian

  • Check whether the discovered policy could be saturated

After a specific point, the set of discovered policies should saturate.
If so, check how much time (or how many network flows) we need with the Google hipster app.

  • Run knoxAutoPolicy across multiple namespaces, and check that it works correctly.

Library/Daemon to auto-discover policy

A library which takes a set of flows as input and discovers rules from them. The library is generic, so it can be used equally for flows from Cilium or Sysdig.

Enable discovering system policy (operation: File)

  • Add KubeArmorSystemPolicy structure type
  • Add a function to discover system policy (operation: "File")
  • Add a function to aggregate the multiple file paths for the file operation policies
  • Add a function to build discovered kubearmor system policies

System Policy: merging the old policy into the new policy

In the cron-job operation,

if we discover a new policy that has the same selector as a previous one,
we should merge the old policy into the new policy.

And when merging those policies, we should take the file/process path aggregation into account.
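A sketch of such a merge, with an illustrative directory-aggregation threshold; the real policy types are simplified to plain path lists here:

package systempolicy

import (
    "path/filepath"
    "sort"
)

// MergePolicies merges the matchPaths of an old policy into a new one with
// the same selector, collapsing a directory's paths into a single
// matchDirectories entry once it accumulates dirThreshold entries.
func MergePolicies(oldPaths, newPaths []string, dirThreshold int) (matchPaths, matchDirs []string) {
    byDir := make(map[string][]string)
    seen := make(map[string]bool)
    for _, p := range append(oldPaths, newPaths...) {
        if seen[p] {
            continue // drop duplicates while merging old into new
        }
        seen[p] = true
        dir := filepath.Dir(p)
        byDir[dir] = append(byDir[dir], p)
    }
    for dir, paths := range byDir {
        if len(paths) >= dirThreshold {
            matchDirs = append(matchDirs, dir+"/")
        } else {
            matchPaths = append(matchPaths, paths...)
        }
    }
    sort.Strings(matchPaths)
    sort.Strings(matchDirs)
    return matchPaths, matchDirs
}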

Auto-discovered policies storage

  • Maintain a database separate from the git-repo to keep the auto-discovered policies.
    (We are keeping the auto-discovered policies in MongoDB itself.)
  • When the user groups an auto-discovered policy, remove it from the auto-discovered list.
    (Handled by the microservice team.)
  • Whenever a new policy is auto-discovered, the knoxSystemPolicy daemon has to verify that the policy is not the same as a previously discovered policy.

(Diagram: PolicyAutoDiscoveryIntegration)

Why a separate database to keep discovered policies?

Another option was to keep the auto-discovered policies in a separate folder in the same git-server repo. But the knoxAutoPolicy daemon might have to periodically check whether a newly discovered policy was already discovered previously. Secondly, version control cannot be applied to the discovered policies. Hence keeping them in a separate DB makes sense.

Discovered Policy Verification

Verifying whether the discovered policy will result in allowing only the specified traffic (or specified behavior).

If we discover a policy that allows only role=frontend to communicate with role=backend, knoxAutoPolicy should verify that only those flows are impacted by going through all the flows in the database (using cilium policy trace, for example).
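A simplified sketch of the replay check; Flow and Policy are stand-ins for the real schemas, and a real implementation could invoke cilium policy trace per flow instead:

package verify

// Flow and Policy are simplified stand-ins for the flow and policy schemas.
type Flow struct {
    SrcLabels map[string]string
    DstLabels map[string]string
}

type Policy struct {
    Selector map[string]string // e.g. role=backend
    Allowed  map[string]string // e.g. role=frontend
}

// VerifyPolicy replays all recorded flows against a discovered policy and
// returns the observed flows toward the selected pods that the policy would
// deny; a non-empty result means the policy is stricter than the observed
// behavior and would break existing traffic.
func VerifyPolicy(p Policy, flows []Flow) (denied []Flow) {
    for _, f := range flows {
        if matches(f.DstLabels, p.Selector) && !matches(f.SrcLabels, p.Allowed) {
            denied = append(denied, f)
        }
    }
    return denied
}

// matches reports whether labels contain every key/value pair in want.
func matches(labels, want map[string]string) bool {
    for k, v := range want {
        if labels[k] != v {
            return false
        }
    }
    return true
}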
