gnh1201 / caterpillar


Caterpillar Proxy - The simple web debugging proxy (formerly, php-httpproxy)

Home Page: https://catswords.social/@catswords_oss

License: MIT License

Python 48.41% PHP 18.18% Dockerfile 0.19% Java 9.14% Perl 5.88% Ruby 4.92% JavaScript 5.11% Shell 0.17% HTML 7.99%
ssl network-filtering hijacking http-proxy https-proxy web-debugging-proxy parasitic-computing tls mitm k-anonymity

caterpillar's Introduction

Use cases

How it works

Basic structure

You <-> Proxy client (Python) <-> Parasitized proxy server (Optional, PHP) <-> On the Web

For example, you can build a simple web debugging proxy on shared servers.

Stateful mode

This project supports two connection modes: stateless (the default) and stateful. Use the stateful mode to avoid being constrained by transfer capacity limits. See the Stateful mode page in the github.com/gnh1201/caterpillar wiki.

(Optional) Before use

If you have a server to parasitize and want to proxy through it, upload the index.php file to that shared server. The index.php file is located in the assets/php directory within this repository.

How to use

  1. Write a .env file (Linux) or a settings.ini file (Windows), like this:
[settings]
PORT=5555
SERVER_URL=http://example.org
SERVER_CONNECTION_TYPE=stateless
CA_KEY=ca.key
CA_CERT=ca.crt
CERT_KEY=cert.key
CERT_DIR=certs/
OPENSSL_BINPATH=openssl
CLIENT_ENCODING=utf-8
  2. (Optional) Create a certificate for SSL decryption:
chmod +x configure_certs.sh
./configure_certs.sh
sudo apt-get install -y ca-certificates
sudo cp ca.crt /usr/local/share/ca-certificates/caterpillar-ca.crt
sudo update-ca-certificates
  3. Run python3 server.py and set the HTTP(S) proxy in your web browser (e.g. Firefox, Chromium).

  4. Test a 100MB/SSL, 1GB/SSL, or 10GB/SSL download and check the speed.

  5. Enjoy it.

  6. (Optional) With Cloudflare, we can expect about a 4x speedup and fewer network stalls.
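
The settings shown in step 1 use INI syntax, so they can be read with Python's standard configparser module. The following is a minimal sketch and not the project's actual loader; the helper name load_settings is an assumption, and it presumes the file begins with the [settings] header as shown.

```python
import configparser


def load_settings(text: str) -> dict:
    """Parse INI-formatted text and return the [settings] section as a dict."""
    parser = configparser.ConfigParser()
    # Keep the original key case (PORT, SERVER_URL, ...) instead of lowercasing.
    parser.optionxform = str
    parser.read_string(text)
    return dict(parser["settings"])


SAMPLE = """\
[settings]
PORT=5555
SERVER_URL=http://example.org
SERVER_CONNECTION_TYPE=stateless
"""

settings = load_settings(SAMPLE)
port = int(settings["PORT"])  # values are strings until converted
```

All values come back as strings, so numeric settings such as PORT need an explicit conversion.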

Extensions

Thanks to

caterpillar's Issues

Decoupling the fediverse features

The Caterpillar project aims to be a universal web debugging proxy. Currently, Fediverse-related features (such as spam filters) are embedded within the code without being modularized.

There is a need to separate this.

Cache services

This improvement retrieves the requested URL from cache services (such as Wayback Machine, Google Search Cache, Bing Search Cache, etc.) and delivers the response through a proxy.
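
As one possible starting point, the Wayback Machine exposes an availability endpoint that reports the closest archived snapshot of a URL. The sketch below only builds the query and interprets a response; the function names are assumptions, the actual HTTP fetch is left out, and nothing here reflects how Caterpillar implements this feature.

```python
import json
from typing import Optional
from urllib.parse import urlencode

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability lookup URL for the requested page."""
    return WAYBACK_AVAILABILITY + "?" + urlencode({"url": url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived snapshot URL from an availability response, if any."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

A proxy could fetch the returned snapshot URL and deliver that response in place of the live page.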

HTTP Basic Authentication

Some web hosting providers require HTTP Basic Authentication when accessing their subdomains rather than standalone domains.

Even where it is not strictly required, HTTP Basic Authentication can still be advantageous.
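
For reference, an HTTP Basic Authentication header is the Base64 encoding of "username:password" prefixed with "Basic ". The helper below is a minimal sketch of that encoding; the function name is an assumption, and how the credentials would be configured in Caterpillar is not specified here.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return the Authorization header for HTTP Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The client would add this header to each request it sends to the relay server on the protected host.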

Stateful relay

If we implement a stateful relay, we can improve speed and reliability and bypass the web server's capacity limit settings (e.g., max_upload_size) to allow large transfers.

These proposals are messages that the client (e.g., server.py) will send to the server (e.g., index.php).

Proposal: Stateless relay

This covers the case where the client cannot expose a port to the outside. It is the same as what is already implemented.

{
    "jsonrpc": "2.0",
    "method": "relay_request",
    "params": {
        "data": <base64 encoded data>,
        "compressed": <e.g. deflate, none>,    // proposal
        "client": <address of the client>,
        "server": <address of the remote server>,
        "port": <port number of the remote server>,
        "scheme": <scheme (e.g. http, https, ssl, tls)>,
        "url": <URL>,
        "length": <length of data>,
        "chunksize": <size of buffer (e.g. 8192)>,
        "datetime": <datetime (e.g. %Y-%m-%d %H:%M:%S.%f)>
    },
    "id": 3
}
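
A client could assemble this message roughly as follows. This is a sketch under stated assumptions rather than the code in server.py: the function name is hypothetical, "length" is taken to mean the size of the uncompressed payload, and "deflate" is realized with zlib.compress.

```python
import base64
import json
import zlib
from datetime import datetime


def build_relay_request(payload: bytes, client: str, server: str, port: int,
                        scheme: str, url: str, chunksize: int = 8192,
                        compress: bool = False, request_id: int = 3) -> str:
    """Serialize a relay_request message for the stateless relay."""
    # Assumption: "deflate" means zlib-compressed data, applied before Base64.
    body = zlib.compress(payload) if compress else payload
    message = {
        "jsonrpc": "2.0",
        "method": "relay_request",
        "params": {
            "data": base64.b64encode(body).decode("ascii"),
            "compressed": "deflate" if compress else "none",
            "client": client,
            "server": server,
            "port": port,
            "scheme": scheme,
            "url": url,
            "length": len(payload),  # assumption: length of the raw payload
            "chunksize": chunksize,
            "datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
        },
        "id": request_id,
    }
    return json.dumps(message)
```

The resulting string would be sent as the body of an HTTP POST to the relay server (e.g., index.php).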

Proposal: Stateful relay

This is a case where the client can expose the port to the outside, which works similarly to tunneling.

{
    "jsonrpc": "2.0",
    "method": "relay_connect",
    "params": {
        "client": <address of the client>,
        "port": <port number of the client>,
        "chunksize": <size of buffer (e.g. 8192)>,
        "datetime": <datetime (e.g. %Y-%m-%d %H:%M:%S.%f)>
    },
    "id": 3
}
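
To illustrate the stateful case, the client first opens a listening socket and then announces its address and port in the relay_connect message so that the server can connect back. The sketch below assumes hypothetical helper names and binds to the loopback interface for safety; a real deployment would need an externally reachable address.

```python
import json
import socket
from datetime import datetime


def open_listener(host: str = "127.0.0.1") -> socket.socket:
    """Open a TCP listener; port 0 lets the operating system pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen()
    return listener


def build_relay_connect(listener: socket.socket, public_address: str,
                        chunksize: int = 8192, request_id: int = 3) -> str:
    """Serialize a relay_connect message announcing where the client listens."""
    message = {
        "jsonrpc": "2.0",
        "method": "relay_connect",
        "params": {
            "client": public_address,  # address the server should connect back to
            "port": listener.getsockname()[1],
            "chunksize": chunksize,
            "datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
        },
        "id": request_id,
    }
    return json.dumps(message)
```

After sending the message, the client would accept the incoming connection on the listener and exchange data over that socket instead of over individual HTTP requests.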

Case Study in Real World: Ambiguities in TCP and HTTP

Summary

(Here's a story from Canada.)

There is an issue in infrastructure management tools (e.g., OpenShift or similar software) where TCP and HTTP are not clearly distinguished. This has led to minor conflicts between infrastructure engineers and developers.

As a result, there is a need for a solution that allows maintaining the business logic while flexibly switching the communication method between TCP (socket) and HTTP.

My Opinion

TCP and HTTP are distinctly different concepts, but since HTTP is the most well-known application protocol running on top of TCP, the mistake of not distinguishing between them often occurs. To better understand this, you can refer to the OSI 7-layer model, which explains this distinction well.

After hearing this story, the Caterpillar Proxy project decided to work on letting the endpoint communication method be switched flexibly between TCP (socket) and HTTP.

Currently, we support existing plugins through the web.py file.

Add WebAssembly support

Summary

In 2022, I conducted a PoC on executing WASM (WebAssembly) binaries using PHP and WAMR (WebAssembly Micro Runtime) on top of the LAMP stack. Now that the Caterpillar project is underway, it seems feasible to integrate that earlier work into it.

The Caterpillar project will provide a method for injecting WASM runtimes into shared hosting servers in the future.

Serial (or Bluetooth) gateway support

Gateways that do not rely on well-known protocols such as WiFi or 3G/LTE provide data transmission only through serial or Bluetooth. For instance, gateways supporting LoRaWAN technology or satellite communication rarely, if ever, directly support Ethernet or WiFi, and even if they do, they are usually expensive.

We need to assess whether we can support serial or Bluetooth in this project.

ThreadPool

This project did not prioritize a ThreadPool implementation because it was focused on a Proof of Concept (PoC). A ThreadPool implementation is necessary to resolve thread hell when a large number of connections arrive.
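
One way to bound the number of threads is the standard library's concurrent.futures.ThreadPoolExecutor. The sketch below is an illustration only: handle_connection is a placeholder echo handler rather than the proxy's real logic, and max_connections exists just so the loop can terminate in a test.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def handle_connection(conn: socket.socket) -> None:
    """Placeholder handler: echo one chunk back, then close the connection."""
    with conn:
        data = conn.recv(8192)
        if data:
            conn.sendall(data)


def serve(listener: socket.socket, max_workers: int = 32,
          max_connections: Optional[int] = None) -> None:
    """Accept connections and hand each one to a fixed-size worker pool."""
    served = 0
    # Leaving the with-block waits for handlers that are still running.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while max_connections is None or served < max_connections:
            conn, _addr = listener.accept()
            pool.submit(handle_connection, conn)
            served += 1
```

With a pool, excess connections wait in the executor's queue instead of each spawning a new thread, so the thread count never exceeds max_workers.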

SMTP relay

I need to implement an SMTP relay. Caterpillar will receive SMTP requests first and then forward them to a relay server written in PHP. The relay server written in PHP will then send the email to the actual recipient.

I propose the following flow:

SMTP Client (Sender) <-> Caterpillar (Python) <-> Relay Server (PHP) <-> Actual recipient

Bypassing HSTS policy

HSTS is enforced only by software that fully implements web browser behavior. Therefore, in communications where no web browser is involved, typical SSL MITM poses no issue.

However, if you intend to use a web browser, HSTS policies can cause inconvenience. Disabling the HSTS feature in the browser settings can resolve the issue more easily than expected, but the alternatives below assume that we won't alter the web browser's settings:

  1. Removing HSTS-related headers.
  2. Proxying with an actual web browser.
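
The first alternative can be sketched as a filter over the raw response that drops the Strict-Transport-Security header before the response reaches the browser. The function name is an assumption, and the sketch does not clear HSTS entries the browser has already cached or that are preloaded.

```python
def strip_hsts(response: bytes) -> bytes:
    """Remove Strict-Transport-Security headers from a raw HTTP/1.x response."""
    head, separator, body = response.partition(b"\r\n\r\n")
    if not separator:
        # No complete header block yet; leave the data untouched.
        return response
    lines = head.split(b"\r\n")
    kept = [lines[0]]  # status line
    for line in lines[1:]:
        # Header names are case-insensitive.
        if not line.lower().startswith(b"strict-transport-security:"):
            kept.append(line)
    return b"\r\n".join(kept) + separator + body
```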

I'll add more ideas if they come up in the future.

Alternative methods for bypassing the request body limit

In the Caterpillar Proxy, we support a stateful method of tunneling using network sockets to bypass capacity limit settings in HTTP requests.

However, there is an idea for resolving capacity limits even in stateless mode.

It involves storing the data to be requested in a separate object storage, so that the request itself stays within the limit.
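
The idea can be sketched as follows: payloads within the limit travel inline, while larger ones are stored and only a reference is sent. Everything here is hypothetical, including the InMemoryObjectStore stand-in, the prepare_params helper, and the "data_ref" field, which is not part of the existing message format.

```python
import base64
import uuid


class InMemoryObjectStore:
    """Stand-in for a real object storage service (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def prepare_params(payload: bytes, store: InMemoryObjectStore, limit: int) -> dict:
    """Send small payloads inline; store large ones and send a reference."""
    if len(payload) <= limit:
        return {"data": base64.b64encode(payload).decode("ascii"),
                "length": len(payload)}
    # "data_ref" is a hypothetical field naming the stored object.
    return {"data_ref": store.put(payload), "length": len(payload)}
```

The relay server would then fetch the object by its key instead of reading the payload from the request body.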
