netpull

Fast local network file transfers.

What is it?

netpull is a tool that's designed for file transfers over your local network. In particular:

  • No compression is performed.
  • No encryption is performed.

Neither of these is necessarily set in stone, but together they mean that, at the moment, netpull can perform zero-copy, multithreaded file transfers over a local network.

Highlights

  • Designed for one-time pulls.
  • Runs integrity checks on the downloaded files.
  • Pretty fast.
  • Supports using multiple threads.
  • Supports resumable transfers. If your connection drops or the server system crashes, then you can resume the job.

Building

You need Bazel. Just run:

$ bazel build //...

to build everything. This should leave two binaries in bazel-bin: netpull_server and netpull_client.

Note that the build may take a bit because this builds vendored copies of Abseil, protobuf, and BoringSSL. (Well, also a custom one-file wcwidth from Termux, but if that's the compilation bottleneck then you really need a better system.)

Running the server

    -allow_ip_ranges (Allow IP addresses in this range, @private represents any
      private IP); default: ;
    -deny_ip_ranges (Deny IP addresses in this range, @private represents any
      private IP); default: ;
    -port (The default port to serve on); default: 7420;
    -root (The root directory to serve); default: ".";
    -verbose (Be verbose); default: false;
    -workers (The default number of file streaming workers to use); default: 4;

In short, these are the flags you will most likely want to set:

-workers specifies the number of workers to use for file streaming or integrity checks. In general, you can give a pretty high value for this (e.g. 128, 256) as long as your system can take it.

-root is the root directory that will be served. Pretty self-explanatory...

-allow_ip_ranges specifies the IP address ranges that will be allowed to connect to the server. You can pass @private to mean any private IP address (i.e. anything on your local network), or you can pass a specific IP (run ip -c addr on the client to find its local IP address). You can also pass IP ranges (e.g. 127.0.0.0-127.0.10.10, meaning any IP address falling within that range), and you can pass multiple IPs or ranges separated by commas.
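To make the range notation concrete, here is a minimal shell sketch of a range check. It is not netpull's implementation: the helper names ip_to_int and ip_in_range are made up for illustration, and treating both endpoints as part of the range is an assumption.

```shell
# Hypothetical helpers, not part of netpull.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies within range $2 (START-END, assumed inclusive).
ip_in_range() {
  addr=$(ip_to_int "$1")
  start=$(ip_to_int "${2%-*}")
  end=$(ip_to_int "${2#*-}")
  [ "$addr" -ge "$start" ] && [ "$addr" -le "$end" ]
}

# Usage: check a client address against a range before handing it to the server.
if ip_in_range 127.0.5.1 127.0.0.0-127.0.10.10; then
  echo "inside range"
fi
```

This can be handy for sanity-checking a range on the command line before passing it to -allow_ip_ranges or -deny_ip_ranges.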

In general, this is a good bet to start:

$ bazel-bin/netpull_server -workers 128 -allow_ip_ranges @private

Running the client

    -server (The server to connect to); default: 127.0.0.1:7420;
    -verbose (Be verbose); default: false;
    -workers (The default number of file forwarding workers to use); default: 4;

-workers carries a similar significance as before. However, some Linux Wi-Fi drivers seem to have bugs where too many simultaneously open connections cause the firmware to crash. If you try a large number of workers (e.g. 128) and your transfer suddenly stops and any pings fail (e.g. ping 8.8.8.8), try a lower worker count. (To recover once the issue occurs, restart your device via ip link set MYDEVICE down and ip link set MYDEVICE up, or just restart your networking system via systemctl restart NetworkManager.)

Of course, you often won't be connecting to localhost, so you can pass your server IP via -server. (If you didn't change the port on the server end from the default of 7420, then you can omit it here.)

So your command line might look something like:

$ bazel-bin/netpull_client -workers 32 -server 192.168.1.74 / Music

The server path (/) is relative to the root directory, so here we're pulling the entire server root's contents and placing it inside the Music directory on the client system.

You can of course do subdirectories:

$ bazel-bin/netpull_client ... /remember/nzk005 nzk005

which would pull the remember/nzk005 subdirectory's contents and place them in the nzk005 directory on the client system.
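The mapping from a server path to a directory on the server can be pictured with the small sketch below. It only illustrates the rule described above and is not netpull's own code; the helper name resolve_server_path and the example root /srv/share are hypothetical.

```shell
# Hypothetical helper, not part of netpull.
# Map a server path ($2) onto the directory passed as -root ($1).
resolve_server_path() {
  root=${1%/}   # drop a trailing slash from the root, if any
  rel=${2#/}    # the leading slash means "relative to the root"
  if [ -z "$rel" ]; then
    # The path was just "/": the whole root is being pulled.
    printf '%s\n' "${root:-/}"
  else
    printf '%s\n' "$root/$rel"
  fi
}

# Usage (with a server started as: netpull_server -root /srv/share):
resolve_server_path /srv/share /remember/nzk005
```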

Resuming jobs

netpull maintains log files on the server (stored in ~/.cache/netpull) for each job that runs (each log file is named using the unique job ID), and these log files can be used to resume jobs that failed or got stuck.

When the client connects to the server, it will print the job ID. To resume this job, just replace the server path with @jobid. For instance, if you want to resume job ID b9ccfad9e66a18e7, run:

$ bazel-bin/netpull_client ... @b9ccfad9e66a18e7 nzk005

You still have to specify the output directory.

If a job visibly fails or is interrupted, netpull will print this job ID to the screen at the very end, so it's easy to see. If for some reason you lose your job ID, you can run:

$ ls -lt ~/.cache/netpull | less

to show all the recent job IDs, starting at the most recent. Then you can figure out which run you want to resume from there.
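If you only want the newest job, the lookup can be scripted. The sketch below is meant to run on the server, since that is where the logs live, and it assumes each log file's name is exactly the job ID; the helper name latest_job_id is made up for illustration.

```shell
# Hypothetical helper, not part of netpull.
# Print the name of the most recently modified entry in the log directory.
# $1 = log directory (defaults to ~/.cache/netpull).
latest_job_id() {
  dir=${1:-"$HOME/.cache/netpull"}
  ls -t "$dir" 2>/dev/null | head -n 1
}

# Usage:
#   job=$(latest_job_id)
#   echo "resume by passing @$job as the server path"
```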

What to do if something fails

If the client says that the connection was interrupted, received 0 bytes, broken pipe, etc., then check the server to see if it's printed error logs.

If the client or server crashes, try to get a coredump and stack trace (coredumpctl debug netpull_server or coredumpctl debug netpull_client), then file a bug. You may need to perform a debug build for that to work (bazel build //... --compilation_mode=dbg).

Known issues

Open file descriptor count

Currently, the server dups quite a few fds while crawling the filesystem. You should set a reasonably high fd ulimit to work around this until it's fixed in netpull.

You can query the current fd ulimit with ulimit -n and the max open file descriptor count with cat /proc/sys/fs/file-max. To change the ulimit for your session, use ulimit -n new_count, and to change the max count, use echo new_count | sudo tee /proc/sys/fs/file-max.

On my system, file-max reports 1,606,110, but the ulimit -n result is only 1024. Therefore, in order to be able to fit all the fds while having some room, I ran ulimit -n 200000.
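The same check can be wrapped in a small helper that raises the limit only when needed. This is a sketch, and ensure_fd_limit is a hypothetical name. It has to run in the same shell session that later starts netpull_server, because ulimit only affects the current shell and its children.

```shell
# Hypothetical helper, not part of netpull.
# Raise the soft fd limit of the current shell to at least $1.
ensure_fd_limit() {
  want=$1
  current=$(ulimit -n)
  # "unlimited" is already high enough.
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  if [ "$current" -lt "$want" ]; then
    # This fails if $want exceeds the hard limit (check it with: ulimit -Hn).
    ulimit -n "$want" 2>/dev/null || {
      echo "cannot raise fd limit from $current to $want" >&2
      return 1
    }
  fi
  return 0
}

# Usage:
#   ensure_fd_limit 200000 && bazel-bin/netpull_server -workers 128 -allow_ip_ranges @private
```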

Local port range

If you are pulling a very, very large directory with many small files, you may get Address not available [error 99]. In this case, you should try increasing your TCP port range. I actually have no idea why this happens at the moment...
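One plausible but unconfirmed explanation: error 99 is EADDRNOTAVAIL, which Linux can return when no local (ephemeral) port is free for a new outgoing connection, and many short-lived connections can use up the range because closed ones linger for a while. Whether that is what happens inside netpull is an assumption. The sketch below shows how to see how large the range currently is; port_range_size is a hypothetical helper name.

```shell
# Hypothetical helper, not part of netpull.
# Print how many ports the local (ephemeral) range contains.
# $1 = file holding "LOW HIGH" (defaults to the Linux procfs entry).
port_range_size() {
  read -r low high < "${1:-/proc/sys/net/ipv4/ip_local_port_range}"
  echo $(( $high - $low + 1 ))
}

# Usage:
#   port_range_size
#
# To widen the range until the next reboot (example bounds, adjust to taste):
#   sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```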

Directory permissions

Directory permissions are set before files are stored within. Therefore, if a directory doesn't have write permission, saving any files inside will fail. netpull will likely automatically ensure directories have rw permissions in the future; meanwhile, if you see a failure with opening a file, make sure the source directory is rw.
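Until that is handled automatically, you can look for problem directories before starting a pull. The sketch below assumes that "source directory" refers to the directories being served; find_unwritable_dirs is a hypothetical helper name.

```shell
# Hypothetical helper, not part of netpull.
# List directories under $1 whose owner lacks write permission.
find_unwritable_dirs() {
  find "$1" -type d ! -perm -200
}

# Usage:
#   find_unwritable_dirs /path/to/root
#
# To grant the owner write access to every directory found:
#   find /path/to/root -type d ! -perm -200 -exec chmod u+w {} +
```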
