Comments (6)

cadop commented on May 13, 2024

@mariuszhermansdorfer thanks for taking the time to provide the links.

If you could let me know which configuration you think gives the most beneficial speedup for your application, I can make a specific function/interface for it (assuming you don't want to add one in the C++ source yourself). If it does give a meaningful performance difference, I'll try to roll out the interface you suggested as its own function (explanation below).

There are a few aspects to adding these to the core API:

Speed:

  1. The streaming rays don't guarantee order, so the user would need to do their own sorting and ID checks (inside Python or C#), which, especially for the Python interface, would remove any performance benefit.

The implementation of the stream ray query functions may re-order rays arbitrarily and re-pack rays into ray packets of different size. For this reason, callback functions may be invoked with an arbitrary packet size (of size 1, 4, 8, or 16) and different ordering as specified initially. For this reason, one may have to use the rayID component of the ray to identify the original ray, e.g. to access a per-ray payload.

  2. We tested the speed differences a while ago, and it was difficult to see a performance difference. In particular, users should be using:
        std::vector<char> EmbreeRayTracer::Occlusions(
            const std::vector<std::array<float, 3>>& origins,
            const std::vector<std::array<float, 3>>& directions,
            float max_distance, bool use_parallel)
     Because this parallelizes the rays into chunks and scales to the number of cores on the user's computer, we found it to be faster than trying to use the streaming rays.
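
A minimal sketch of the chunking idea, not DHART's actual implementation. It assumes one contiguous chunk of rays per worker thread, with each thread writing only to its own index range so the results stay in input order. `HitsGroundPlane` is a hypothetical stand-in for the real BVH occlusion query, and `OcclusionsChunked` and its `num_threads` parameter are illustrative names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

using Vec3 = std::array<float, 3>;

// Hypothetical stand-in for a BVH occlusion query: a ray counts as occluded
// if it starts above an infinite ground plane at z = 0 and points downward.
inline bool HitsGroundPlane(const Vec3& origin, const Vec3& direction) {
    return origin[2] > 0.0f && direction[2] < 0.0f;
}

// Cast one ray per (origin, direction) pair. The index range is split into
// contiguous chunks, one per worker thread. Each thread writes only to its
// own slice of the output, so no sorting or ID bookkeeping is needed.
std::vector<char> OcclusionsChunked(const std::vector<Vec3>& origins,
                                    const std::vector<Vec3>& directions,
                                    unsigned num_threads = 0) {
    const std::size_t n = std::min(origins.size(), directions.size());
    std::vector<char> results(n, 0);
    if (n == 0) return results;

    // Default to the number of hardware threads (at least one).
    if (num_threads == 0)
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                results[i] = HitsGroundPlane(origins[i], directions[i]) ? 1 : 0;
        });
    }
    for (auto& worker : workers) worker.join();
    return results;
}
```

Because `results[i]` always corresponds to ray `i`, the caller gets input-ordered output regardless of how the threads are scheduled.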

Explaining the above two items is difficult for most users and can end up in confusion or worse performance. From your other issue I see the application to sun study, so if there is a meaningful speedup for that type of thing we could also make a specific extension/function for it.
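
To illustrate the ordering issue from the first speed item, here is a minimal sketch of what "ID checks" could look like on the caller's side. It assumes results arrive in packets of arbitrary size and order, each tagged with the index of the original ray. `RayResult` and `RestoreInputOrder` are illustrative names, not Embree or DHART types.

```cpp
#include <cstddef>
#include <vector>

// Illustrative per-ray record carrying the caller's original ray index.
// This is not Embree's ray struct, only a simplified stand-in.
struct RayResult {
    unsigned ray_id;  // index of the ray in the caller's input array
    bool occluded;    // outcome of the occlusion query for that ray
};

// Scatter results that arrive in arbitrary order and packet sizes back into
// input order. Only ray_id ties a result to the ray that produced it.
std::vector<char> RestoreInputOrder(
    const std::vector<std::vector<RayResult>>& packets,
    std::size_t ray_count) {
    std::vector<char> ordered(ray_count, 0);
    for (const auto& packet : packets) {
        for (const RayResult& r : packet) {
            if (r.ray_id < ray_count)  // skip IDs outside the original batch
                ordered[r.ray_id] = r.occluded ? 1 : 0;
        }
    }
    return ordered;
}
```

Doing this scatter step in a managed or interpreted layer for every batch is the kind of overhead that can cancel out any gain from streaming.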

Clarity/Flexibility:

  1. (perhaps not your concern) Although the example code keeps using the term EmbreeRaytracer, we use other raytracers as well. For example, if you use the double-precision flag, we use NanoRT, since Embree doesn't support double precision. Naturally, we don't want to confuse users by exposing all these flags in a complex function parameter when not all of them work, depending on previous decisions and the context of the BVH.

from dhart.

mariuszhermansdorfer commented on May 13, 2024

Thanks for your detailed answer @cadop.

As you might imagine, the reason I asked for these additional modes is speed. If it turns out that using ray streams doesn't come with any speed benefit at all, I would be the first one to remove it.

Currently, I'm working on a sunlight analysis. It takes the following inputs:

  • analysis plane as mesh with around 500,000 cells
  • context buildings/trees/shading structures etc. joined into a mesh
  • date & time range

From the context a bvh is created.
For each date & time value (in 1 hour steps) I calculate a sun vector.
Then, for each cell, I shoot a ray with the origin in the cell center and direction opposite to the sun vector. If the ray hits the context mesh, the cell is marked as occluded, otherwise it gets direct sunlight.
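
The procedure above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the DHART or Embree code path: the context is a plain list of triangles tested one by one (no BVH), the sun vector is assumed to point from the sun toward the ground, and `Triangle`, `RayHitsTriangle`, and `SunlightHours` are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

inline Vec3 Sub(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}
inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}
inline float Dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

struct Triangle { Vec3 a, b, c; };

// Moller-Trumbore ray/triangle test; true for any hit in front of the origin.
bool RayHitsTriangle(const Vec3& origin, const Vec3& dir, const Triangle& tri) {
    const float eps = 1e-7f;
    const Vec3 e1 = Sub(tri.b, tri.a);
    const Vec3 e2 = Sub(tri.c, tri.a);
    const Vec3 p = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < eps) return false;  // ray is parallel to the triangle
    const float inv = 1.0f / det;
    const Vec3 s = Sub(origin, tri.a);
    const float u = Dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = Cross(s, e1);
    const float v = Dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    return Dot(e2, q) * inv > eps;  // hit distance must be positive
}

// For each cell, count the sun vectors for which the cell is NOT occluded.
std::vector<int> SunlightHours(const std::vector<Vec3>& cell_centers,
                               const std::vector<Triangle>& context,
                               const std::vector<Vec3>& sun_vectors) {
    std::vector<int> hours(cell_centers.size(), 0);
    for (const Vec3& sun : sun_vectors) {
        // Rays travel from the cell toward the sun: opposite to the sun vector.
        const Vec3 to_sun = {-sun[0], -sun[1], -sun[2]};
        for (std::size_t i = 0; i < cell_centers.size(); ++i) {
            bool occluded = false;
            for (const Triangle& tri : context) {
                if (RayHitsTriangle(cell_centers[i], to_sun, tri)) {
                    occluded = true;
                    break;  // any hit is enough for an occlusion query
                }
            }
            if (!occluded) ++hours[i];
        }
    }
    return hours;
}
```

All rays for one time step share a single direction, which is why this workload is a natural candidate for coherent-ray handling.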

From my understanding, this scenario could benefit from shooting rays with the RTC_INTERSECT_CONTEXT_FLAG_COHERENT flag. Also, I'm hoping that grouping rays into packets could speed this up as well.

Again, it'd need to be benchmarked to know for sure.

cadop commented on May 13, 2024

Just to make sure, you are using this function? https://cadop.github.io/dhart/C%23%20Public%20Docs/html/class_d_h_a_r_t_a_p_i_1_1_ray_tracing_1_1_embree_raytracer.html#a1a96d5b61f43fe87e2649ef612dd63ff

and only passing one direction (inverse sun vector) based on:

One direction, multiple origins: Cast a ray in the given direction from each origin point in origins.

You could also try to duplicate the origins into one large vector to use this version:

Equal amount of directions/origins: Cast a ray for every pair of origin/direction in order, i.e. (origin[0], direction[0]), (origin[1], direction[1]), etc.

This would take more time in generating data in C#, but would mean there is only one call to dhart and all rays would be parallelized. I'm not sure if it would be faster.
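
A minimal sketch of that expansion step, shown here in C++ rather than C# for consistency with the library source. `RayBatch` and `ExpandToPairs` are hypothetical names; the layout (all origins repeated once per direction) is one possible choice, not something DHART prescribes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Two parallel arrays: ray i is (origins[i], directions[i]).
struct RayBatch {
    std::vector<Vec3> origins;
    std::vector<Vec3> directions;
};

// Expand N origins and M directions into N * M (origin, direction) pairs.
// Layout is direction-major: the pair for direction d and origin o sits at
// index d * N + o, so results can be sliced per direction afterwards.
RayBatch ExpandToPairs(const std::vector<Vec3>& origins,
                       const std::vector<Vec3>& directions) {
    RayBatch batch;
    const std::size_t total = origins.size() * directions.size();
    batch.origins.reserve(total);
    batch.directions.reserve(total);
    for (const Vec3& d : directions) {
        // Append a full copy of the origins, paired with this one direction.
        batch.origins.insert(batch.origins.end(), origins.begin(), origins.end());
        batch.directions.insert(batch.directions.end(), origins.size(), d);
    }
    return batch;
}
```

For the numbers mentioned in this thread (500K origins and 11 directions), this produces 5.5 million pairs, or roughly 132 MB of float data across the two arrays, so the copy cost is not negligible.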

Could you also make sure that the system monitor shows multiple cores being used and that it's not single-threaded? Also, how much time is the raycasting of 500k cells taking?

The example you give does seem to be the ideal case for coherent rays. I will make an example within the next week or so to test this out. So the plan is:

  1. Change the current occlusion method to use rtcOccluded1M, which seems to decide on its own how to repacket rays, and use RTC_INTERSECT_CONTEXT_FLAG_COHERENT. (I will probably make a temporary hack where setting the max ray distance to 99999 switches the occlusion method.)
  2. If this new method is meaningfully faster, change the case where there are multiple rays and a single direction to always use the stream.

mariuszhermansdorfer commented on May 13, 2024

Yes, I'm using this function and pass an array of origins (500k points) and one reverse sun vector at a time. This code runs in parallel, as all the CPU cores are at 100% load.

    // Read the Grasshopper inputs: analysis points, context mesh, sun directions.
    _analysisPoints = new List<Point3d>();
    _sunDirections = new List<Vector3d>();
    DA.GetDataList(0, _analysisPoints);
    DA.GetData(1, ref _context);
    DA.GetDataList(2, _sunDirections);

    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();

    MeshInfo contextMesh = new MeshInfo(_context.Faces.ToIntArray(true), _context.Vertices.ToFloatArray());

    LogTime(sw, "Initial setup");

    EmbreeBVH bvh = new EmbreeBVH(contextMesh);
    LogTime(sw, "Setup bvh");

    Vector3D[] sunDirections = ConvertListOfVectors(_sunDirections);
    Vector3D[] analysisPoints = ConvertListOfVectors(_analysisPoints);

    LogTime(sw, "Data conversion");

    // One call per sun direction; every origin is cast in that single direction.
    for (int i = 0; i < sunDirections.Length; i++)
        EmbreeRaytracer.IntersectOccluded(bvh, analysisPoints, new Vector3D[] { sunDirections[i] });

    LogTime(sw, "Embree Ray casting");

The code is quite fast already - here is a test with 100k points and only one sun vector. Embree is 5x faster than native Rhino ray cast:

Do you have Rhino on your machine? I will put this test file into the dedicated branch so that we have a common base for benchmarks.

mariuszhermansdorfer commented on May 13, 2024

Update, here are the results for 500K points and one ray:
image

Casting 11 rays, corresponding to 11 hours of sunlight for a chosen day, gives me this:
image

EDIT:
Benchmark of the release version:
500K origin points, one ray
image

500K origin points, 11 rays
image

Performance scales nearly linearly and compute is definitely multi-threaded. Ideally, I'd like this to run at 30FPS.
Let's see how far we can push it :)

cadop commented on May 13, 2024

Hey @mariuszhermansdorfer, yes, I have Rhino, and thanks for showing these results. It seems the first one is 50x faster than Rhino.

Would you mind making a Discussion on the performance and just mentioning this issue? I'd like to keep the conversation going in a more long-term format instead of within the issue of ray streaming.

I am not sure about getting to 30 FPS, which would be ~33 ms per frame. I'll follow up more on the discussion post with some ways to check for bottlenecks, since there is some data transfer between C#, the C interface, and C++.
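
For reference, the frame-budget arithmetic can be written out as a small sketch. The helper names are hypothetical, and the calculation ignores data transfer and BVH setup, so it is only a lower bound on the required raycasting throughput.

```cpp
// Time available per frame, in milliseconds, for a target frame rate.
constexpr double FrameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Rays per second needed to cast rays_per_frame rays in every frame.
constexpr double RequiredRaysPerSecond(double rays_per_frame, double fps) {
    return rays_per_frame * fps;
}
```

With the numbers from this thread, 30 FPS leaves about 33.3 ms per frame. Casting 500K rays per frame would need 15 million rays per second, and 500K origins times 11 directions would need 165 million rays per second.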
