Comments (23)

magneticflux- commented on May 29, 2024

@llbit I have used JPPF for similar distributed Java computation before and it has worked well. Its main goal is to be extremely quick to set up and integrate while also being very powerful. The data caching feature (to avoid sending lots of render data over and over) and the node balancing features seem like they would be useful, and it even comes with an OpenCL demo, so you could submit jobs that require a GPU and jobs that require a CPU separately.

In short, it's in active development, it's native Java, and, in my experience, it has handled massive computational loads (read: several thousand games of Super Mario Bros. being played by NEAT-style neural networks). While it will take some effort to restructure Chunky to be more modular so it can be used, that's a lot better than having to write your own networking and distribution code at the same time.

from chunky.

thibaultmol commented on May 29, 2024

You could use BOINC, which is an infrastructure for doing this kind of thing, just saying.

AGSPhoenix commented on May 29, 2024

BOINC would be a bit overkill for this kind of thing, seeing as it's meant to dole out mounds of work to at least dozens of computers and not hear from them again for hours. And the server requires a complicated install.

abonander commented on May 29, 2024

I really want to tackle this. I think it would help bridge the gap in processing power between rendering on a single CPU and harnessing the power of a GPU through OpenCL.

So what I'm thinking is a networked implementation of RenderManager that sends copies of the scene to all slaves connected over TCP. ZeroMQ would be awesome for this.

Each slave gets a tile of the canvas, like each RenderWorker does now. But each slave is its own RenderManager, and it breaks its tile up to feed to its own RenderWorkers.

I'll be working on this, as my mental model of how Chunky currently executes is incomplete.

AGSPhoenix commented on May 29, 2024

I'd advise running a quick benchmark on each slave before splitting the work up; you wouldn't want half of the scene being handled by your overclocked i7 and the other half going to your old Core 2 Duo you hooked up for a minor speed boost. Depending on how the tiles work, that might not be a problem though. Just keep in mind the large power disparity among machines for the people who only have two because they didn't throw their old one away.

abonander commented on May 29, 2024

@AGSPhoenix Duly noted. The benchmark that's already been implemented should do nicely.

Benchmark each machine (including the master), determine their total SPS as a group, and give each machine a percentage of the work equal to its percentage of the total SPS?
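
The split described here might look something like this minimal sketch (the class and method names are hypothetical, not Chunky API):

```java
// Hypothetical sketch: divide a canvas of totalPixels among nodes in
// proportion to their benchmarked samples-per-second (SPS) scores.
public class WorkSplitter {
    /** Returns the number of pixels assigned to each node. */
    public static long[] assignWork(long totalPixels, double[] spsScores) {
        double totalSps = 0;
        for (double sps : spsScores) totalSps += sps;
        long[] shares = new long[spsScores.length];
        long assigned = 0;
        for (int i = 0; i < spsScores.length; i++) {
            shares[i] = (long) (totalPixels * (spsScores[i] / totalSps));
            assigned += shares[i];
        }
        // Give any rounding remainder to the fastest node.
        int fastest = 0;
        for (int i = 1; i < spsScores.length; i++)
            if (spsScores[i] > spsScores[fastest]) fastest = i;
        shares[fastest] += totalPixels - assigned;
        return shares;
    }
}
```

For example, a node that benchmarks at 300 SPS next to one at 100 SPS would receive three quarters of the pixels.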

AGSPhoenix commented on May 29, 2024

Seems reasonable. However, I'm concerned about what would happen if one machine was bogged down during or after the benchmark, causing an uneven workload. Perhaps once a machine finished its work, the master could give it a copy of the slow machine's task with a different seed (or however you would go about ensuring unique samples) and then blend them together on the master? Of course, that would require occasional communication from workers reporting progress for the master to determine which machines are falling behind (Although, come to think of it, that should probably be implemented anyway).

Or perhaps it would be easier to just use many small tiles, rather than implement something to have multiple machines work on a tile concurrently, and just accept less than 100% utilization.

TOGoS commented on May 29, 2024

Using ZeroMQ or splitting the image up into 'tiles' both complicate things more than necessary. The simplest way would be to dole out jobs based on SPP. Like, "hey you, render this scene 100 times for me" (that it should be done with a random seed is implied), keeping a pipeline of 2 or 3 jobs active for every slave. Keep going until you have the desired SPP. It doesn't even matter if some jobs never finish, since all job results are functionally identical. You just want to make sure jobs are sized so that the overhead of sending render dumps back after each job is small relative to the amount of work being done (though since CPU and network usage are mostly independent, this just means you want rendering to take somewhat longer than transferring).

To cut down on network usage, you could easily get away with sending 16 or 32-bit floats back rather than doubles.
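
The double-to-float packing could be sketched like this (class and method names are illustrative, and nothing here assumes Chunky's actual dump format):

```java
import java.nio.ByteBuffer;

// Sketch of the bandwidth saving: pack a double[] sample buffer into
// 32-bit floats before sending, halving the dump size on the wire.
public class DumpCompressor {
    public static byte[] toFloatBytes(double[] samples) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * 4);
        for (double s : samples) buf.putFloat((float) s);
        return buf.array();
    }

    public static double[] fromFloatBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] samples = new double[bytes.length / 4];
        for (int i = 0; i < samples.length; i++) samples[i] = buf.getFloat();
        return samples;
    }
}
```

The precision loss is negligible for per-job sample averages, since the master accumulates the merged result in doubles anyway.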

Otherwise, this would really just amount to automating and integrating into the client UI the whole copy scene file, render, copy dump back, merge render dump process that can already be done manually.

abonander commented on May 29, 2024

I thought of that, but it seemed too simple a solution. Then again, I still don't completely understand the rendering end of it. I thought simply merging separate dumps wouldn't give the right results due to the different seeds, though the same seed wouldn't be much use either.

So yeah, that simplifies it quite a bit.

I don't think network usage is going to be a big issue. Most people now have unlimited broadband, and this feature will be most useful over LAN anyway. I don't think many people will bother doing it over the internet, and the few who do will likely have more than enough network resources to handle it. If there are issues, throttling shouldn't be that hard to implement.

TOGoS commented on May 29, 2024

Actually, now that I think of it, you don't need queues at all. Just get each slave started and have them keep sending back dumps at some interval. When you have enough (the target SPP is reached or the user hits 'pause'), drop their connections. By their nature, fast nodes will accomplish more than slow ones, so you don't need to worry one bit about distributing tasks evenly or throttling.

This simplicity is what makes path tracing an 'embarrassingly parallel' undertaking.
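
The merge that makes this work might be sketched as a sample-count-weighted average per pixel (a hypothetical illustration, not Chunky's actual dump-merging code):

```java
// Sketch: merge two render dumps with different sample counts. Each
// pixel's merged value is the sample-count-weighted mean, so dumps from
// fast and slow nodes combine correctly regardless of how much each
// node contributed.
public class DumpMerger {
    public static double[] merge(double[] a, int sppA, double[] b, int sppB) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (a[i] * sppA + b[i] * sppB) / (double) (sppA + sppB);
        }
        return out;
    }
}
```

Because the operation is associative, the master can fold in dumps one at a time as they arrive, in any order.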

abonander commented on May 29, 2024

@TOGoS

Even more simplification. Thank you very much.

I figure that the master can start rendering, then bind a listener to a configurable IP address and port; any slave that wants to contribute simply connects. After a short handshake, covering the Chunky version and possibly memory requirements (for large chunk selections, a slave without enough memory in the VM to hold the entire octree would crash with an OutOfMemoryError, so it might be a good idea to predict that and simply reject the slave if it doesn't have enough available memory), the master sends the scene graph, configuration, and texture pack to the slave, and the slave begins rendering.
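
The memory check suggested here might look roughly like this (the 2x headroom factor and the class and method names are assumptions for illustration, not Chunky's real memory model):

```java
// Hedged sketch of the proposed admission check: before sending scene
// data, reject a slave whose JVM cannot hold the octree.
public class SlaveAdmission {
    /** Rough upper bound on free heap available to the slave's JVM. */
    public static long availableHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    /** Accept the slave only if the estimated scene size fits with headroom. */
    public static boolean accept(long estimatedSceneBytes, long availableBytes) {
        return estimatedSceneBytes * 2 <= availableBytes; // 2x headroom (assumption)
    }
}
```

The slave would report its `availableHeap()` during the handshake, and the master would compare it against its own estimate of the octree size before transferring anything.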

AGSPhoenix commented on May 29, 2024

I have to agree, @TOGoS's solution is probably the best one. It might also be beneficial to send an occasional broadcast when rendering, and have the master show up on a list under the slave's "Network" tab, to save typing addresses.

I have several machines here, if you need help testing builds, I'm up for it.

abonander commented on May 29, 2024

An idea to keep in mind for the near future is some sort of job queue: a list of scenes to be rendered and their target SPPs.

With a queue, this network rendering model could be extensible to a render pool, where a master server takes in job submissions through a web interface and renders them on the pool, and people contribute their CPU cycles to benefit everyone using the pool.

This would make high SPP or otherwise extremely intensive renders possible for those who otherwise might not have the processing power to accomplish it, either at all or in a practical amount of time.

Of course, the viability of a render pool is directly dependent on the Chunky userbase. There's not much of a point to a public render pool if only three people ever contribute to it or use it, though private render pools/farms would still be useful for large creative clans, or other groups with a high render demand. A good implementation will leave this possibility open.

@AGSPhoenix I have a few machines here as well that I can test with over LAN. I'll stick to that for now. But rendering over the internet should be a well-tested function. Your assistance will be much appreciated when the time comes.

MarcelloNicoletti commented on May 29, 2024

I'm just checking in here. Has there been any progress in the last 8 months?

abonander commented on May 29, 2024

@CupricWolf I was just writing this when you commented.

I've found some time to contribute to this, so I'm finally gonna tackle this project.

Two modules: the master and the slave. The master would be configurable from the Advanced tab in the Render window. Enabling networked rendering would add a new console window below the SPP progress bar that displays connection and dump events from slaves.

The slave would be a separate program module accessible from the Launcher, launched with the same parameters as Chunky itself. It would be minimalist, starting with automatic LAN master discovery (or manual hostname input for WAN masters), then moving to a page with a progress bar and stop/pause controls. We could optionally include a render output (maybe with an option on the master to disable it, for privacy?).

The master communicates with slaves over a TCP port, while simultaneously facilitating automatic LAN discovery by broadcasting on that same port via UDP (which can be disabled by an option).
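
The UDP beacon could be sketched like this (the message format, interval, and class name are all assumptions, not an established Chunky protocol):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical LAN-discovery sketch: the master periodically broadcasts
// its TCP port over UDP so slaves can find it without typing addresses.
public class MasterBeacon {
    /** Illustrative discovery message, not an established format. */
    static String beaconMessage(int tcpPort) {
        return "CHUNKY_MASTER:" + tcpPort;
    }

    /** Broadcast the master's TCP port every few seconds until interrupted. */
    static void broadcastLoop(int tcpPort) throws Exception {
        byte[] msg = beaconMessage(tcpPort).getBytes(StandardCharsets.UTF_8);
        InetAddress broadcast = InetAddress.getByName("255.255.255.255");
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            while (!Thread.currentThread().isInterrupted()) {
                socket.send(new DatagramPacket(msg, msg.length, broadcast, tcpPort));
                Thread.sleep(5000); // beacon interval
            }
        }
    }
}
```

A slave listening on the same UDP port would parse the message to populate the "Network" list suggested above.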

All information will be sent as JSON, with the binary data (chunk data files, textures, and render dumps) encoded in Base64. Endianness shouldn't be an issue, as we can safely assume most processing will be done on x86-based (including 64-bit) processors, which are all little-endian. If this becomes an issue in the future, handling both possibilities isn't difficult to implement.
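
A minimal sketch of such a message, assuming hypothetical field names and using string concatenation in place of a real JSON library:

```java
import java.util.Base64;

// Sketch of the proposed wire format: binary payloads (chunk data,
// textures, dumps) Base64-encoded inside a JSON message.
public class WireFormat {
    public static String encodeDumpMessage(String sceneName, byte[] dump) {
        String payload = Base64.getEncoder().encodeToString(dump);
        return "{\"type\":\"dump\",\"scene\":\"" + sceneName
             + "\",\"data\":\"" + payload + "\"}";
    }

    public static byte[] decodePayload(String base64) {
        return Base64.getDecoder().decode(base64);
    }
}
```

Base64 adds roughly 33% overhead to the binary payloads, which is the usual trade-off for keeping everything inside one text-based protocol.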

To simplify halting a render, the master drops the connection and the slaves halt. Slaves should regularly ping the TCP port to make sure the master is still running; we should not simply wait for a dump send to fail, as that could waste many CPU cycles on the slave.

llbit commented on May 29, 2024

Java uses network byte order, i.e. big endian, so as long as you are only working with Java you won't have to think about endianness.

Looking forward to seeing your progress on this, @cybergeek94!

HellOnBlocks commented on May 29, 2024

I am excited to hear about the distributed rendering work as well. I have several ESX farms here at work that may actually be able to help me render decent scenes quickly if progress is made here!

Thanks @cybergeek94 and @llbit.

abonander commented on May 29, 2024

I've started work on this feature in my fork.

@llbit I may need help with the GUI stuff. But I'll get the guts working first and let you know.

abonander commented on May 29, 2024

Ignore the protocol for transferring textures as it will probably change.

I'll be implementing caching of texture packs on the slave to reduce the amount of data transferred, especially when reconnecting to a host rendering the same scene. The whole texture pack .zip will be transferred to new hosts, as that's simpler than transferring the raw texture data.

abonander commented on May 29, 2024

Still working on this, just been really busy.

abonander commented on May 29, 2024

@llbit Getting back on this, do you think you could review what I've got so far and provide comments? I'm concerned that the networking implementation might be a bit bulky.

Also, I welcome suggestions on transferring texture packs. It's a bit difficult to retrieve and send the currently selected one; I'm not sure if I actually managed to work around it or not.

My fork is here. The changes are in the network_slave folder as well as chunky/src/java/se/llbit/chunky/renderer/network

I think there's excess duplication of constants in the Headers class. Those headers could probably be converted to an enum and sent as an int ordinal, but I wanted the communicated data to be simple to debug when reading it.

llbit commented on May 29, 2024

@cybergeek94 Sorry that I gave encouraging remarks earlier, but my opinion on this has changed: I will not be merging anyone else's distributed-rendering code into Chunky. It is one of those things on my bucket list for Chunky; I'll do it eventually, and then I'll do it on my own.

This is really based on several reasons, but in the end, letting others add so much code just does not sit right with me as long as I am still actively coding on Chunky myself.

What you do with your code is up to you, I am just wary of giving false expectations.

abonander commented on May 29, 2024

@llbit

I will drop that branch of my fork then. If there's anything you need help with, let me know. I think I had committed myself to one or two other tasks here. I'll take a look at those.
