dawoodoz / dfpsr

Fast realtime software rendering library for C++14 using SSE/AVX/NEON. 2D, 3D and isometric rendering with minimal system dependencies.

Home Page: https://dawoodoz.com/dfpsr.html

C++ 92.64% C 6.42% Shell 0.74% Batchfile 0.20%

2d-graphics 3d-graphics c-plus-plus cross-platform gui isometric-graphics linux realtime-rendering rendering rendering-pipeline

dfpsr's Introduction

DFPSR

A modern software rendering library for C++14 using SSE/NEON created by David Forsgren Piuva. If you're looking for the latest mainstream fad, look elsewhere. This is a library for quality software meant to be developed over multiple decades and survive your grandchildren with minimal maintenance. Just like carving your legacy into stone, it takes more effort to master the skill but gives a more robust result by not relying on a far away library. Maximum user experience and minimum system dependency.

Creator's background

DXOMARK world record in digital video stabilization from the mobile industry. Worked with safety critical robotic vision for civilian airport traffic control. Held lectures in optimization at different companies in the mobile, medical and gaming industries. Worked with optimizations on GPU, CPU, DSP, ISP, FPGA and ASIC.

Optimization needs good tools to save your time

The most important part of optimizing code is to grasp both high-level algorithms and low-level hardware limitations, because you cannot let a scientist design the algorithm and a programmer optimize it in handwritten assembler with no room for changes (the most common mistake). The algorithm design is not done until you have a good tradeoff between quality and performance with all optimizations in place. Time savings at the cost of quality in one place can be compensated by increasing quality at a lower cost somewhere else to increase both speed and quality. The faster you can create a near optimal vectorization of an algorithm, the faster you can iterate the design process. Think about what you are truly approximating. Is your goal to draw as many perfectly straight polygons as possible, or is the goal to approximate a complex real world shape using any technique?

The official website: dawoodoz.com

What your games might look like using isometric CPU rendering

Real-time dynamic light with depth-based cast shadows and normal mapping at 453 frames per second in 800x600 pixels running on the CPU. Higher resolutions would break the retro style and actually look worse, but there's lots of time left for game logic and additional effects. By pre-rendering 3D models to diffuse, normal and height images, reading the data is much more cache efficient on modern CPUs than using a free perspective. This also allows having more triangles than pixels on the screen and doing passive updates of static geometry. Low-detailed 3D models are used to cast dynamic shadows.

Traditional 3D rendering with polygons is also supported

3D rendering is not as fast as 2D or isometric rendering on the CPU, but often reaches 60 Hz in 1920x1080 pixels for low-detailed graphics. For higher detail level and more features, it is recommended to copy and modify the rendering pipeline to make it hardcoded for only the features you want and then simplify the math for your specific rendering engine, as done for the Sandbox example where only vertex colors are needed for tiny triangles without perspective, so that colors can be calculated by incrementing color values instead of interpolating from depth divided coordinates.
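The incremental color calculation mentioned above can be sketched as follows. This is only an illustration and not the Sandbox example's actual code; the names `ColorSketch` and `fillSpanSketch` and the single-row layout are assumptions made for the sketch. Without perspective, vertex colors vary linearly across a span, so the per-pixel step can be computed once and then added for every pixel instead of interpolating from depth divided coordinates.

```cpp
#include <vector>

// Hypothetical color type for this sketch; not a DFPSR type.
struct ColorSketch { float red, green, blue; };

// Fills the pixels startX..endX-1 of one row with a linear gradient from 'left' to 'right'.
// The caller is assumed to have clipped the span to the row's bounds.
inline void fillSpanSketch(std::vector<ColorSketch> &row, int startX, int endX, ColorSketch left, ColorSketch right) {
	int width = endX - startX;
	if (width <= 0) return;
	// The step per pixel is calculated once, so only additions remain in the loop.
	float steps = width > 1 ? float(width - 1) : 1.0f;
	ColorSketch step = {
		(right.red - left.red) / steps,
		(right.green - left.green) / steps,
		(right.blue - left.blue) / steps
	};
	ColorSketch current = left;
	for (int x = startX; x < endX; x++) {
		row[x] = current;
		current.red += step.red;
		current.green += step.green;
		current.blue += step.blue;
	}
}
```

Accumulating floating-point steps drifts slightly over long spans, which is usually acceptable for the tiny triangles this trick is meant for.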

Why use an open-source software renderer when GPUs are so fast?

Robustness: Using a software renderer will probably not ruin your system when making a mistake, unlike graphics APIs for the GPU that are prone to blue-screens.
Determinism: If it worked on one computer, it will probably work the same on another computer with minor differences between operating systems. OpenGL, however, has zero features remaining if you remove everything that has a bug in any driver implementation.
Low overhead: When you want a low resolution for the visual style or robotic vision, you might as well keep it minimal with a statically linked software renderer when the GPU would be slower.
Easy debugging: When all the data is processed in a software renderer, it is much easier to see what went wrong during debugging.
Easy to modify: There are no hardware limits other than CPU cycles and memory, so you can modify the whole rendering pipeline by copying and pasting code.
Pixel exact 2D: Instead of making strange workarounds over polygons, a software renderer lets you work with whole pixels to begin with.

Why use this software renderer?

Minimal system dependencies: Everything related to a specific system API is placed in a separate wrapper module integrating the bare minimum of essential features that should be easy to integrate on future operating systems. Unlike other software renderers, this one does not require any graphics drivers, because you can get the same performance using multi-threading by uploading the canvas on a background thread when most cores are idle anyway.
No binaries: The whole library is automatically compiled from source code. Even the build system is compiling itself before building your project. This makes sure that nobody in the future has to reverse engineer century old binaries when trying to build your program, and it also makes it a lot safer against malware when everything can be inspected in readable code.
Static linking: The whole library is linked statically with your program, just as if you had written the code yourself. Only core system APIs that have survived for decades are relied on as dependencies, no GPU drivers, no external media layers. Everything from how to encode Unicode characters and render fonts to how a polygon is rasterized against a depth buffer will exist within your compiled C++ program for maximum reliability and determinism. The build system allows statically linking the C++ standard libraries when possible.
Create your legacy: Make software that future generations might be able to port, compile and run natively without the need for emulators or reverse engineering of proprietary graphics drivers.

Features in this library

Fully automatic C++ build system: No more long lists of source files in your project. The included build system will automatically find included headers and the source files with corresponding names. Just tell it to crawl from main and let it figure out the rest. Different backends for libraries are handled by including the library's project header, telling which backend to use for each platform. Checksums are used to only build what has changed, so there is no need to create a static library for parts of your code.
2D drawing: Pixel exact standard draw calls for lines, rectangles, solid image copy, alpha filtered image drawing, depth buffered drawing, and stencil drawing.
3D rendering: Roughly equivalent to Direct3D 7 with bi-linear texture sampling, mipmapping, lightmaps and alpha filtering when used out of the box, but can be modified to be more like Direct3D 9 if you apply shading to textures (can use SIMD with multi-threading and be scheduled based on viewing distance).
Occlusion system: The collection of rendering tasks for multi-threading also contains an occlusion grid where occlusion shapes can be drawn to skip drawing of triangles, objects or whole groups if your engine implements a broad-phase for culling and occlusion tests. This fully dynamic occlusion can then be combined with static optimizations for specific games using information about which regions can be seen from each camera location.
Optional far clipping: Because this graphics API only uses floating-point depth buffers for perspective, there is no need to normalize the depth values for any integer based representation. This allows selecting an infinite far clip distance when creating your camera, if you can afford rendering the entire scene at once.
Media layer: Cross-platform media layer designed for robustness. Alsa and WinMM sound backends for full control over sound mixing, without having to call anything system specific yourself. Window management uses multi-threading for uploading the canvas, so that you don't need GPU graphics drivers and heavy dependencies just to upload the result. Uses a borderless window for full-screen, so that you can easily access other programs if you get an important e-mail or instant message in the background. Upscaling is done on the CPU to work with any screen resolution without relying on graphics drivers that might give pixels the wrong interpolation or not even exist. Older media layers designed for CRT displays may cause frequency out of range errors when no graphics drivers are installed and the display does not accept the arbitrary selection of resolution. Uses an invisible cursor icon to hide the mouse, so that a crashing program will not take away the cursor when trying to kill the process.
Graphical user interface framework: Load a visual interface to your window using a single line of code reading a layout file or string. Get generic handles to components using names or a combination of name and index. Add events by attaching lambda functions to component and window callbacks.
Timers: Get the double precision seconds passed since the first call to the timer, so that you won't have to worry about midnight bugs when the time of day resets.
SIMD abstraction layer: Use simd.h to automatically generate highly efficient SSE, AVX and NEON intrinsics from fully readable math syntax. Your vectorized code will look like a reference implementation. Compiling for an unknown target architecture will generate scalar operations, which can still give a performance boost because the algorithm is written with basic operations that are most often supported directly in CPU hardware, accesses memory aligned with cache lines, keeps the instruction window packed with tasks, and makes auto-vectorization very easy for a compiler if something similar with a different name exists in the future.
Safe pointers: Use SafePointer.h to catch more errors by telling your pointer which part of an allocation it may work on. Leaves no overhead in the release version, so that you can always replace your raw pointer with SafePointer and know that you will get an informative error message with the pointer's name and detailed information when something bad happens.
Strings: Use UTF-32 to store characters in memory to make sure that all algorithms work with non-latin characters (compatible with U"" string literals). Saving to files defaults to UTF-8 (compact storage) with BOM (explicitly saying which format is used) and CR LF line endings (so that text files encoded anywhere can be read everywhere). Uses shared memory buffers automatically to allow splitting into a list of strings without flooding the heap with small allocations.
Buffers: All files are saved and loaded through Buffer objects. This makes sure that all file formats you design only have to worry about how to encode the bytes, regression tests will be easy by not involving external side-effects from the file system, and any file can be bundled into your own by using the Buffer equivalent of a save function.
File management: Roughly equivalent to std::filesystem from C++17, but works with C++14, uses the same String and ReadableString types on all platforms, and can automatically correct folder separators between / (Posix) and \ (MS-Windows).
Process management: Can start other applications and keep track of their status, so that you can call an application like a function writing the result to files.
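The timer feature in the list above can be illustrated with a small sketch. This is not the library's timer API; `secondsSinceFirstCallSketch` is a hypothetical name, and the sketch assumes that a monotonic clock from the C++ standard library is acceptable. Because the result is measured from the first call using a clock that never jumps with the time of day, there is no midnight wrap-around to handle.

```cpp
#include <chrono>

// Hypothetical helper for this sketch; not a DFPSR function.
// Returns the seconds passed since the first call, as a double.
inline double secondsSinceFirstCallSketch() {
	using Clock = std::chrono::steady_clock;
	// The starting point is stored on the first call and reused afterwards.
	static const Clock::time_point start = Clock::now();
	std::chrono::duration<double> elapsed = Clock::now() - start;
	return elapsed.count();
}
```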

Summary of licenses

This library mainly uses the Zlib Open Source License, but also includes the STB Image library for saving and loading images, which has a permissive dual license (MIT / Unlicense). Because the STB Image library can be used as public domain, it does not have any legal effect on using the library as a whole under the Zlib Open Source License. All included source code with all their licenses allow both commercial and non-commercial use, including undisclosed modification of the source code. If you are not redistributing the source code, then you do not have to tell anyone that you use this library, because an insincere endorsement has no value.

Still a public beta

Theme, GUI, font and sound APIs are still under active development and may have significant changes before a stable version 1.0 is ready, because some code is just a primitive placeholder until the advanced implementation can replace it, and one must try to actually use the library before usability problems become obvious. Buffer, file, image, draw, filter, string and time APIs are however already quite version stable. You can choose to stick with a specific version for each new project, keep updated with the latest changes, or wait for stable version 1.0.

How you can help

Port to Macintosh or Wayland using the same principles of minimal dependency.
Test this beta version and give feedback on the design before version 1.0 is released.
Create different types of game engines with open-source tools.

Supported CPU hardware:

Intel/AMD using SSE2 intrinsics and optional extensions.
ARM using NEON intrinsics.
Unknown CPU architectures, without SIMD vectorization as a fallback solution.

Platforms:

Linux, tested on Mint, Mate, Manjaro, Ubuntu, RaspberryPi OS, Raspbian (Buster or later). Linux Mint needs the compiler and X11 headers, so run "sudo apt install g++" and "sudo apt install libx11-dev" before compiling. X11 is currently supported and Wayland is planned for future versions.
Microsoft Windows, but slower than on Linux because Windows has lots of background processes and slower threading and memory management.

Might also work on:

BSD and Solaris have code targeting the platforms in fileAPI.cpp for getting the application folder, but there are likely some applications missing for running the build script. Future Posix compliant systems should only have a few quirks to sort out if they have an X11 server.
Big-Endian is supported in theory if enabling the DSR_BIG_ENDIAN macro globally, but this has never actually been tested due to difficulties with targeting such an old system with modern compilers.

Not yet ported to:

Macintosh no longer uses X11, so it will require some porting effort. Macintosh does not have a symbolic link to the binary of the running process, so it would fall back on the current directory when asking for the application folder.

Will not target:

Mobile phones, because the constant changes breaking backward compatibility on mobile platforms would defeat the purpose of using a long-lifetime framework. Mobile platforms require custom C++ compilers, access to signal processors, screen rotation, battery saving, knowing when to display the virtual keyboard, security permissions, forced full-screen... Trying to do both at the same time would end up with design compromises in both ends like Microsoft Windows 8 or Ubuntu's Unity lock screen, so it would be better to just take bits and pieces into a new library built on different design principles.
Web frontends. Such a wrapper over this library would not be able to get the power of SIMD intrinsics for defining your own image filters, so you would be better off targeting a GPU shading language from the browser which is more suited for dynamic scripting.


dfpsr's Issues

Move simdExtra.h into simd.h?

The simdExtra.h header was meant for a collection of SIMD functions that are not efficiently emulated on scalar operations, but so far it only contains the zip operation. Vector extraction is already emulated with scalars in simd.h, so this distinction does not make sense anymore. Can just write the scalar version of zip and unzip to have them moved into simd.h.
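A scalar version of zip and unzip could look roughly like the sketch below. The `LanesSketch` type and the function names are hypothetical and do not come from simd.h, and the lane order is an assumption modeled on how interleaving instructions commonly behave on four 32-bit lanes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical four-lane vector for this sketch; not the vector type used in simd.h.
using LanesSketch = std::array<uint32_t, 4>;

// Interleaves the lower halves of a and b: a0, b0, a1, b1.
inline LanesSketch zipLowSketch(const LanesSketch &a, const LanesSketch &b) {
	return LanesSketch{{a[0], b[0], a[1], b[1]}};
}

// Interleaves the upper halves of a and b: a2, b2, a3, b3.
inline LanesSketch zipHighSketch(const LanesSketch &a, const LanesSketch &b) {
	return LanesSketch{{a[2], b[2], a[3], b[3]}};
}

// Collects the even lanes from a followed by b: a0, a2, b0, b2.
inline LanesSketch unzipEvenSketch(const LanesSketch &a, const LanesSketch &b) {
	return LanesSketch{{a[0], a[2], b[0], b[2]}};
}

// Collects the odd lanes from a followed by b: a1, a3, b1, b3.
inline LanesSketch unzipOddSketch(const LanesSketch &a, const LanesSketch &b) {
	return LanesSketch{{a[1], a[3], b[1], b[3]}};
}
```

With this lane order, unzipping the two zipped halves gives back the original vectors, which makes for a simple regression test.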

Version 1.0 should be as clean as possible to make following minor releases more version stable for projects that waited for the first stable release. Adapting to this change would only require automatically replacing "simdExtra.h" with "simd.h" to work as before.

Incorrect forward declaration of VirtualMachine

types.h forward declares it as class: https://github.com/Dawoodoz/DFPSR/blob/master/Source/DFPSR/api/types.h#L96
but it is actually a struct: https://github.com/Dawoodoz/DFPSR/blob/master/Source/DFPSR/machine/VirtualMachine.h#L261

This is not allowed, and for some compilers it will miscompile things like MediaMachine(const std::shared_ptr<VirtualMachine>& machine) - which sees only forward declaration of VirtualMachine: https://github.com/Dawoodoz/DFPSR/blob/master/Source/DFPSR/api/types.cpp#L49
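Keeping the forward declaration and the definition on the same class-key sidesteps the problem. The sketch below uses hypothetical stand-in types (`VirtualMachineSketch` and `MachineHandleSketch`) rather than the real declarations in types.h and VirtualMachine.h, and only shows the pattern of a handle that sees nothing but the forward declaration.

```cpp
#include <memory>

// Forward declaration using the same class-key (struct) as the definition further down.
struct VirtualMachineSketch;

// A handle that only sees the forward declaration, similar in spirit to MediaMachine.
struct MachineHandleSketch {
	std::shared_ptr<VirtualMachineSketch> machine;
	explicit MachineHandleSketch(const std::shared_ptr<VirtualMachineSketch> &machine)
	: machine(machine) {}
};

// The definition is a struct, matching the forward declaration above.
struct VirtualMachineSketch {
	int memorySize = 0;
};
```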

How to handle loading of empty files

Buffers currently don't allow empty allocations in order to mimic closely how a raw memory allocation works. This can however seem strange on the higher level, if loading an empty file as text raises an exception. One can instead return a null handle when creating a buffer of length zero, and then treat null buffers just like buffers of length zero where it makes sense. This would however prevent someone from porting algorithms back and forth between a C allocation and the Buffer object, and one would not be able to distinguish between not having anything and having something empty.

Another option would be to allocate the Buffer head but let it contain a null data pointer when the requested size is zero. This can break algorithms that rely on catching length zero as an exception with try-catch, but also reduce the number of special cases one needs to handle, by working almost exactly like an unused padded allocation.

Returning a headless null handle would avoid causing a confusing distinction between empty and non-existing buffers, because currently one can see the buffer handle as the buffer itself when not shared. On the other hand, an empty Buffer could be treated just like a padded allocation where no data is used, by pretending that the head has an allocation.
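To make the second option easier to discuss, here is a minimal sketch of a buffer head that exists but holds a null data pointer when the size is zero. All names are hypothetical and the real Buffer implementation differs (the sketch ignores alignment and padding). It shows how an empty buffer stays distinguishable from a null handle while both report a size of zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical buffer head for this sketch; not the library's Buffer implementation.
struct BufferHeadSketch {
	int64_t size = 0;
	uint8_t *data = nullptr; // Stays null when the requested size is zero.
	explicit BufferHeadSketch(int64_t newSize) : size(newSize) {
		if (newSize > 0) {
			this->data = static_cast<uint8_t*>(std::calloc(static_cast<std::size_t>(newSize), 1));
		}
	}
	~BufferHeadSketch() { std::free(this->data); }
	BufferHeadSketch(const BufferHeadSketch&) = delete;
	BufferHeadSketch &operator=(const BufferHeadSketch&) = delete;
};

using BufferSketch = std::shared_ptr<BufferHeadSketch>;

// Size zero gives a head without data, while a negative size gives a null handle.
inline BufferSketch createBufferSketch(int64_t size) {
	if (size < 0) return BufferSketch();
	return std::make_shared<BufferHeadSketch>(size);
}

// True for both empty and filled buffers, false for null handles.
inline bool bufferExistsSketch(const BufferSketch &buffer) {
	return buffer.get() != nullptr;
}

// Null handles and empty buffers both report zero bytes.
inline int64_t bufferSizeSketch(const BufferSketch &buffer) {
	return buffer ? buffer->size : 0;
}
```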

Porting to Win32

What to do
A stable Windows port of the window wrapper in source/windowManagers would allow running on Microsoft Windows natively at full speed. This is done by defining the createBackendWindow function, which the DFPSR library will be calling to create a window. Create a class inheriting from BackendWindow and implement the virtual methods closely to how the X11 version works. This means simply uploading the image being sent, trying to handle resize of the canvas without crashing, and taking input from mouse and keyboard. Full-screen and multi-threaded upload can wait for another pull request if it becomes difficult.

How to do it
Stability comes first, so don't try to force the screen's resolution into a dimension that the screen might not be able to handle. Just maximize a border-less window the safe way without exclusive access to the screen. A good system will recognize this as full-screen and get the same level of optimization but without the dangers of incompatible forced settings on unknown display devices. Even if Windows usually comes with pre-installed GPU drivers that do up-scaling, this often results in bi-linear interpolation removing the game's retro look, and the games should have the same look on different platforms. The library already has image upscaling built in and will send the up-scaled image. All GUI stuff is also handled by the library, so the window backend just feeds input to the message queue while mapping to portable key codes.

Only system dependencies
Because this is a zero-dependency library which should be possible to just compile and run together with a program, dynamic linking to third-party media layers is not allowed. The compiled applications should be possible to run on a clean install of the operating system without installing any other software. No other libraries, no 3D accelerated graphics drivers. Users of this library should be able to use this for creating driver installers running before anything else in the system.

Compiling on Windows
Last time I compiled on Windows, I made a CodeBlocks project, included the whole content of the DFPSR library, included the program's project folder, created a module in windowManagers, selected G++14 with all warnings and included the Windows libraries. Just having the window module and a list of linked libraries would be enough to see this as completed. Improving the cross-platform build process can be another task.

Compilers
Trying to compile with Microsoft's C++ compiler will fail because it's not standard C++14. The library has compiled with Clang before, but it will likely have its own opinions about style being contrary to GCC's suggestions, so sticking with the latest version of GCC is the easiest way to avoid a mess of ifdefs for each compiler version.

Implement real-time volumetric light on the CPU

By extruding convex shapes generated from the shadow casting models, layers of light bounds can be drawn efficiently.

"Deep particles" would be a first step to drawing volumes using the depth buffer, by just fading based on collision with the background. Saturated additive effects would also remove the need for depth sorting.

Look for bugs and usability problems

If anything seems out of the ordinary, investigate and file an issue. Even if you are technically wrong about something, your first assumption is important for improving the library's usability. It's the documentation's fault if you read it and still made a mistake.

I'm making this into an explicit task because it's easy to forget the importance of things that cannot be measured when checking things off on a long list. Quality is what matters most.

Remove flicker on window resize

When a window resizes, it usually draws some random color that the application did not request. It won't affect full-screen applications, but is annoying when resizing in windowed mode. The problem in itself is actually quite trivial, but the documentation available about native window management is almost non-existent, especially for X11.

Mouse look

To be able to rotate a camera freely with a mouse, the common trick is to hide the mouse and move it to the center of the screen very often while recording relative cursor motion. The problems with this old trick are that the cursor might stay hidden when the program crashes, and that the camera will spin very fast from rejecting the new cursor position when entering absolute cursor positions using stylus, touch-screen or eye-tracking.

While the framework should try to avoid contemporary hacks as much as possible in order to work with many different input techniques that computers may have in the future, there are ways to limit the damage. One can use an empty image as a cursor icon and associate it with the window, so that moving the mouse outside during a crash will make the cursor visible even if the program is not responding. Games can have multiple input modes for camera rotation and reset the cursor position in randomized bursts to detect when the input device uses absolute cursor positions. If the platform does not support hiding and moving the cursor, one should safely fall back on the absolute position mode that does not require moving the cursor.

For stylus without screen
When using absolute positioning to control a camera, one can limit latitude to the screen's height, let the crosshair move sideways within a dead zone, and start spinning the camera faster the further out the cursor goes to the left or right side.
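The sideways dead zone could be mapped to a rotation speed roughly like this. The function name, the parameters and the linear ramp are assumptions made for the sketch and not an existing part of the library.

```cpp
#include <algorithm>

// Hypothetical helper for this sketch; not a DFPSR function.
// cursorX and screenWidth are given in pixels.
// deadZoneRatio is the part of the half-width around the center where the camera
// does not spin, and must be below 1.
// maxSpeed is the rotation speed when the cursor reaches the left or right edge.
inline float yawSpeedFromCursorSketch(float cursorX, float screenWidth, float deadZoneRatio, float maxSpeed) {
	float halfWidth = screenWidth * 0.5f;
	// Offset from the screen's center, normalized and clamped to -1..1.
	float offset = std::max(-1.0f, std::min(1.0f, (cursorX - halfWidth) / halfWidth));
	float magnitude = offset < 0.0f ? -offset : offset;
	if (magnitude <= deadZoneRatio) return 0.0f;
	// Ramp linearly from zero at the dead zone's edge to maxSpeed at the screen's edge.
	float ramp = (magnitude - deadZoneRatio) / (1.0f - deadZoneRatio);
	return offset < 0.0f ? -ramp * maxSpeed : ramp * maxSpeed;
}
```

Multiplying the returned speed with the frame's time step gives the rotation to apply, so the spin does not depend on the frame rate.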

For touch-screen, stylus with screen or eye tracking
If the input is mapped directly to the screen, it may make sense to not move the camera's latitude and instead move the crosshair freely over the dead zone, while still rotating when hovering over left and right edges.

Another problem is how to synchronize between mouse move events arriving before and after setting the pointer location when reading mouse movements, because there will be delays from both X11 messages and buffering of events. For small motions, one can select new origins at distinct locations perpendicular to the direction of motion, making it physically difficult to create a move jumping into another region in a single frame.

Vector fonts and images

Scalable fonts would allow games that use higher resolutions without having too small or too large raster fonts. Vector fonts should consist of vector images holding glyphs. A typeface should be able to generate fonts with different settings for bold, italic, et cetera. Each vector image should generate raster images with different scales.

Custom error messages

Sometimes when the command line is flooded with profiling information, it's easy to miss warnings. An API telling which types of errors to report in which way would allow customizing it for the application's needs in debug and release modes.

Throw a standard C++ error to catch. Unsafe for anything else than terminating because jumping out can create a broken state inside of the application.
Custom callback for warnings. Useful when having a log file or GUI component displaying warnings.
Abort the action silently. Only useful in release mode for things that don't matter.
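The three reporting modes listed above could be selected with something like the sketch below. Every name in it is hypothetical, because the API does not exist yet, and a real version would probably select a mode per type of error instead of using one policy for everything.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical names for this sketch; not an existing DFPSR API.
enum class ReportModeSketch { Throw, Callback, Silent };

struct ErrorPolicySketch {
	ReportModeSketch mode = ReportModeSketch::Throw;
	// Called in Callback mode, for example to write to a log file or a GUI component.
	std::function<void(const std::string&)> callback;
};

// Reports a problem according to the selected policy.
inline void reportProblemSketch(const ErrorPolicySketch &policy, const std::string &message) {
	switch (policy.mode) {
	case ReportModeSketch::Throw:
		// Only safe to catch for terminating, as described in the list above.
		throw std::runtime_error(message);
	case ReportModeSketch::Callback:
		if (policy.callback) policy.callback(message);
		break;
	case ReportModeSketch::Silent:
		// Abort the action without telling anyone.
		break;
	}
}
```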

Buffer.cpp compilation fails

$ git clone https://github.com/Dawoodoz/DFPSR
Cloning into 'DFPSR'...
remote: Enumerating objects: 543, done.
remote: Counting objects: 100% (543/543), done.
remote: Compressing objects: 100% (334/334), done.
remote: Total 543 (delta 256), reused 480 (delta 204), pack-reused 0
Receiving objects: 100% (543/543), 3.61 MiB | 6.44 MiB/s, done.
Resolving deltas: 100% (256/256), done.
$ cd DFPSR/
$ git rev-parse HEAD
fff2001a157752fb8ed797afd04dafcfbdbe2b79
$ cd Source/SDK/sandbox/
$ ./build.sh 
Building version DNDEBUG_stdcpp14_O2
Compiling renderer framework.
Lazy building dfpsr
tar: Removing leading `../../' from member names
  Old md5 checksum: Compilation not completed...
  New md5 checksum: fec4ac756912177f4c163e382678e9d5
  Checksums didn't match. Rebuilding whole library to be safe.
Compiling cpp files into object files in ../../../../temporary/DNDEBUG_stdcpp14_O2 using -DNDEBUG mode.
  C++ ../../DFPSR/api/configAPI.cpp
  C++ ../../DFPSR/api/drawAPI.cpp
  C++ ../../DFPSR/api/guiAPI.cpp
  C++ ../../DFPSR/api/imageAPI.cpp
  C++ ../../DFPSR/api/mediaMachineAPI.cpp
  C++ ../../DFPSR/api/modelAPI.cpp
  C++ ../../DFPSR/api/timeAPI.cpp
  C++ ../../DFPSR/api/types.cpp
  C++ ../../DFPSR/base/Buffer.cpp
../../DFPSR/base/Buffer.cpp: In function ‘uint8_t* buffer_allocate(int32_t, std::function<void(unsigned char*)>&)’:
../../DFPSR/base/Buffer.cpp:37:35: error: ‘aligned_alloc’ was not declared in this scope
   37 |   uint8_t* allocation = (uint8_t*)aligned_alloc(buffer_alignment, newSize);
      |                                   ^~~~~~~~~~~~~
../../DFPSR/base/Buffer.cpp: In lambda function:
../../DFPSR/base/Buffer.cpp:38:42: error: ‘free’ was not declared in this scope
   38 |   targetDestructor = [](uint8_t *data) { free(data); };
      |                                          ^~~~
../../DFPSR/base/Buffer.cpp:26:1: note: ‘free’ is defined in header ‘<cstdlib>’; did you forget to ‘#include <cstdlib>’?
   25 | #include "../math/scalar.h"
  +++ |+#include <cstdlib>
   26 | 
../../DFPSR/base/Buffer.cpp: In lambda function:
../../DFPSR/base/Buffer.cpp:61:85: error: ‘free’ was not declared in this scope
   61 | : size(newSize), bufferSize(newSize), data(newData), destructor([](uint8_t *data) { free(data); }) {}
      |                                                                                     ^~~~
../../DFPSR/base/Buffer.cpp:61:85: note: ‘free’ is defined in header ‘<cstdlib>’; did you forget to ‘#include <cstdlib>’?
Failed to compile ../../DFPSR/base/Buffer.cpp!
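The compiler's note points at a missing `#include <cstdlib>`. A minimal sketch of the allocation with that include in place is shown below, assuming a C library that exposes aligned_alloc (glibc does, other platforms may need a different call). `allocateAlignedSketch` is a hypothetical stand-in and not the actual buffer_allocate function, and the rounding of the size is an assumption about what the allocation needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib> // Declares free, and aligned_alloc where the C library provides it.

// Hypothetical stand-in for the failing allocation; not the code in Buffer.cpp.
inline uint8_t *allocateAlignedSketch(std::size_t alignment, std::size_t size) {
	// aligned_alloc expects the size to be a multiple of the alignment, so round upwards.
	std::size_t paddedSize = ((size + alignment - 1) / alignment) * alignment;
	return static_cast<uint8_t*>(aligned_alloc(alignment, paddedSize));
}
```

The returned memory is released with free, just like the destructors in the log above do.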

Fix naming of character encodings

CharacterEncoding was first introduced for saving text files, so the formats that included a byte order mark had BOM_ prefixes.

Later came a separate argument telling if the BOM should be allowed or suppressed, making a contradiction with the prefix. Then other methods started using the same enumeration.

It would be a relatively easy fix to automatically remove all the BOM_ prefixes, so that version 1.0 can be easier for new users to learn.

Recycle small heap allocations

Should allow creating and destroying in different threads, because allocation is a global state and must therefore act as a server. The simplest way would be a global mutex.
Should be easy to turn off using a macro, in case that memory recycling hides a leak. It can also have its own tools for reporting allocation statistics.
Should at least work with string content and image heads.
Fixed-size heads can try to find their allocation bin in compile-time using a constexpr function of the size. Dynamic allocations have to find their bin at run-time.
Plain non-virtual structs can safely share bins with similar size types, so simplify as many virtual classes as possible to keep more active memory within cache.
A custom implementation of reference counted pointers for handles might make it easier to control allocation and recycling. Custom reference counted pointers could also allow making a C interface for core parts of the library, but it would still rely on compiling C++.
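As a starting point for discussion, the bins and the global mutex could be sketched like this. The names, the number of bins and the power-of-two bin sizes are all assumptions, and the macro for turning recycling off and the statistics are left out to keep it short.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical sketch; not an existing part of the library.
constexpr int binCountSketch = 8;

// Bin n holds blocks of 16 << n bytes. Returns -1 for sizes too large to recycle.
// Being constexpr, fixed-size heads can get their bin at compile-time.
constexpr int binIndexSketch(std::size_t size) {
	for (int bin = 0; bin < binCountSketch; bin++) {
		if (size <= (std::size_t(16) << bin)) return bin;
	}
	return -1;
}

class RecyclerSketch {
public:
	void *allocate(std::size_t size) {
		int bin = binIndexSketch(size);
		if (bin < 0) return std::malloc(size);
		{
			// One mutex for all bins, so that any thread may allocate and release.
			std::lock_guard<std::mutex> lock(this->binMutex);
			std::vector<void*> &freeList = this->bins[bin];
			if (!freeList.empty()) {
				void *recycled = freeList.back();
				freeList.pop_back();
				return recycled;
			}
		}
		// Nothing to recycle, so allocate the bin's full block size.
		return std::malloc(std::size_t(16) << bin);
	}
	// The size must be the same as the one given to allocate.
	void release(void *data, std::size_t size) {
		int bin = binIndexSketch(size);
		if (bin < 0) { std::free(data); return; }
		std::lock_guard<std::mutex> lock(this->binMutex);
		this->bins[bin].push_back(data);
	}
	~RecyclerSketch() {
		for (std::vector<void*> &freeList : this->bins) {
			for (void *data : freeList) std::free(data);
		}
	}
private:
	std::mutex binMutex;
	std::vector<void*> bins[binCountSketch];
};
```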

Show the focus and hover states in components

There are now state flags for direct and indirect hovering over components, which have both getters and state update notifications. There is however no integration with how components are rendered using the information, which needs to be done so that the interface gives instant feedback on which component would respond if clicked.

One can display hover using:

More permutations of pre-generated images
- Wasting memory exponentially
- Limited to a single layer
Scale a silhouette image to draw on top when needed
+ Memory efficient
- Only glowing hover styles allowed
- Slow to render
- Complicated to generate non-standard formats from the MediaMachine
Draw an alpha filtered image on top
+ Somewhat flexible in which styles to choose
- Can get RGBA image from the MediaMachine
- Even slower to render
A single image that is regenerated when state changes a property used as input for the method, just like when it is resized
+ Uses less memory
+ Full control over the background
+ Simpler code by getting a reusable state bit mask from input arguments in the theme and assigning the state directly by asking the component when detecting the arguments
+ Can be scaled up to handle 288 permutations at no extra cost (enabled/disabled, locked/writable, pressed/lifted, selected/not, checked/not, direct/indirect/no focus, direct/indirect/no hover)
+ Flexibility to add any state for any component in the theme without changing code in the component
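The count of 288 comes from five two-way states and two three-way states (2 x 2 x 2 x 2 x 2 x 3 x 3). The sketch below packs them into a dense index instead of a bit mask, which is a deliberate substitution to show where the number comes from; a bit mask would use nine bits with some combinations unused. All names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical types for this sketch; not the library's component state.
enum class LevelSketch : uint32_t { Absent = 0, Indirect = 1, Direct = 2 };

struct ComponentStateSketch {
	bool enabled, locked, pressed, selected, checked;
	LevelSketch focus, hover;
};

// Five two-way states and two three-way states.
constexpr uint32_t statePermutationsSketch = 2u * 2u * 2u * 2u * 2u * 3u * 3u;

// Mixed-radix index in the range 0..287, unique for each combination of states.
inline uint32_t stateIndexSketch(const ComponentStateSketch &state) {
	uint32_t index = uint32_t(state.hover);
	index = index * 3u + uint32_t(state.focus);
	index = index * 2u + (state.checked ? 1u : 0u);
	index = index * 2u + (state.selected ? 1u : 0u);
	index = index * 2u + (state.pressed ? 1u : 0u);
	index = index * 2u + (state.locked ? 1u : 0u);
	index = index * 2u + (state.enabled ? 1u : 0u);
	return index;
}
```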

The menu expansion might also want to display that a menu is opened using something else than a hardcoded color change. An "enabled" property might be okay to implement using only color changes, because it is rarely used.

Inconsistency between global and member methods for String class

Some dsr::String methods are in the class for accessing private data (important for safe immutability) while other methods have to be outside of the class for symmetry when merging strings. Don't know if using macros for making members public internally for string methods would be overkill, or if one can simply live with having global wrapper methods when it already exists as a public member method.

Implement borderless full-screen for Win32

The MS-Windows portability wrapper "source/windowManagers/Win32Window.cpp" currently only supports windowed mode. Full-screen should be implemented in about the same way as in X11, by safely maximizing a border-less Window.

No exclusive mode or other strange legacy, because emulating a CRT monitor makes no sense today. Changing resolution "natively" (usually emulated on the GPU) could also make retro games look blurry or have unwanted letter-boxing.

Implement a procedural API for buffers

This would abstract away technical details of the implementation and allow easily saving, loading, compressing and packing binary files for when the assets start to grow. Having a procedural API also makes it easier to find and less likely to break backward compatibility after the first release. As a fundamental feature, this has to be well tested.

Without this abstraction of Buffer handles, adding new functionality like sub-buffers (reference counting internally while still having to solve issues with aliasing) will probably break a lot of backward compatibility. The goal is to have a more clear separation between version stable APIs and internal functionality that may change while the library gets more features.

With a clear separation between file access and format parsing, regression tests will be much more useful by focusing on many smaller problems. Storing a file's raw content in a buffer is used anyway as an optimization to prevent the worst case of spending minutes to save a small file when the system frees and replaces its buffer for each new character being added.

Make an application for fixing source encoding

The source code was written on Linux and the text editors default to only using line-feed (10). However, this project should be easy to read on Microsoft Windows as well and the source code should have formatting that works on most systems. This is a good time to start making a code formatting tool converting into explicit UTF-8 with Cr-Lf breaks for *.cpp and Lf breaks for *.sh. Detecting accidental use of soft-tab would also be good, but the coder should be free to break most other rules when the need arises.

Loading text files using byte order mark

string_load in Source/DFPSR/base/text.cpp is currently just basic skeleton code that works in the current examples, but it should read the BOM bytes at the beginning of a document, map the different formats correctly to the internal UTF-32 string format, and assume ASCII if no BOM is detected.

While it could probably just link to some existing solution dynamically, the point of this project is that only trivial and well defined things may be left to interpretation by the compiler. If someone has to port this to another language a few hundred years from now, it's good to have a reference detailing the text interpretation byte after byte with different formats and how line endings are handled.
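
As a rough sketch of the detection step only, the standard byte order marks can be compared byte by byte. TextEncoding, BomResult and detectBom are hypothetical names, not the library's string_load, and decoding the text after the mark is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names. This is not the library's string_load implementation.
enum class TextEncoding { Ascii, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

struct BomResult {
	TextEncoding encoding;
	size_t bomSize; // Number of bytes to skip before the text begins.
};

inline BomResult detectBom(const uint8_t *data, size_t size) {
	// UTF-32 is tested before UTF-16, because FF FE 00 00 begins with the UTF-16 LE mark FF FE.
	if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF) return {TextEncoding::Utf32BE, 4};
	if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00) return {TextEncoding::Utf32LE, 4};
	if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return {TextEncoding::Utf8, 3};
	if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) return {TextEncoding::Utf16BE, 2};
	if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) return {TextEncoding::Utf16LE, 2};
	// No mark found, so the caller falls back on ASCII as described above.
	return {TextEncoding::Ascii, 0};
}
```

One ambiguity remains in any such scheme: a UTF-16 LE document whose first character is U+0000 has the same first four bytes as a UTF-32 LE mark, so the order of the tests is a design decision that should be documented.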

Create tool for designing isometric sprites with deferred light

The current script system to create models from height maps in the Sandbox example has only scratched the surface of what can be done with the rendering technique. No need for textures or separate materials, just increase the density of vertices to match textures.

Potential features:

Sculpt geometry with automatically generated normals.
Spray paint vertex colors.
Optional material channels. Specular, gloss, anisotropy, ambient occlusion, infra-red, ultra-violet, self illumination...
Apply textures as decals.
Generate materials based on calculated wear on corners and edges.
Automatic optimization of internal shadow model. Try to cover as much surface as possible with the least number of triangles, without getting too close to the surface. All details casting a shadow must be thicker than the depth bias of light sources.
Creating animations.
Increase detail level of voxel sets by diffusing the surface and comparing the intensity with a procedurally generated 3D material. This can generate a detailed brick wall from a single voxel with a material and random seed applied.
Save/load application specific format. Containing the whole workspace.
Export to a binary model format where the compression method is baked into the file to be executed by a vectorized virtual machine. The file will then contain a set of input streams and the assembler code for a machine interpreting the data. More tools can then be created without breaking version compatibility with the files. Fixed-point 16.16 values will be used for calculations, so that it has 100% determinism.
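
To illustrate why 16.16 fixed-point values give the same result everywhere, here is a minimal sketch using only integer operations. The names Fixed, fixedFromInt, fixedMul and fixedDiv are made up for this example and the rounding rule (truncation toward zero) is an assumption; the planned virtual machine may define it differently.

```cpp
#include <cstdint>

// Hypothetical helper names. A sketch of 16.16 fixed-point arithmetic using only integers.
using Fixed = int32_t; // 16 integer bits and 16 fraction bits.
constexpr Fixed FIXED_ONE = 65536;

constexpr Fixed fixedFromInt(int32_t whole) {
	return whole * FIXED_ONE;
}

// The 64-bit intermediate holds the full product before removing the extra 16 fraction bits.
// Integer division truncates toward zero in C++, so the result does not depend on the hardware.
constexpr Fixed fixedMul(Fixed a, Fixed b) {
	return static_cast<Fixed>((static_cast<int64_t>(a) * static_cast<int64_t>(b)) / FIXED_ONE);
}

// Pre-scaling the numerator keeps 16 fraction bits in the quotient. The divisor must not be zero.
constexpr Fixed fixedDiv(Fixed a, Fixed b) {
	return static_cast<Fixed>((static_cast<int64_t>(a) * FIXED_ONE) / static_cast<int64_t>(b));
}
```

For example, multiplying 3.0 by 0.5 (stored as 32768) gives 98304, which is 1.5 in this format. No floating-point rounding mode or compiler optimization can change that result.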

Implement distance adaptive shadows on the CPU

This can currently be done by using multiple light sources, but this is quite expensive. A combination of multiple light sources and temporal smoothing of shaking lights can improve the performance. If temporal filtering is applied as motion blur using a cyclic buffer on the final result (diffuse x light), it will look natural even if something moves. If the number of light positions in the loop equals the number of images in the cyclic buffer, the perceived image will be stable by subtracting the oldest image before adding the new image from the same relative translation offset.
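
The subtract-oldest, add-newest step can be sketched as a running sum over a cyclic buffer. TemporalSum is a hypothetical class working on plain integer pixels; it assumes that every pushed frame has the same number of pixels and ignores color channels and SIMD.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class, not part of the library. Keeps the sum of the last frameCount frames.
class TemporalSum {
public:
	TemporalSum(size_t frameCount, size_t pixelCount)
	: history(frameCount, std::vector<int32_t>(pixelCount, 0)), sum(pixelCount, 0) {}
	// Replaces the oldest frame with the new one and updates the sum in a single pass.
	// The frame must contain pixelCount elements.
	void push(const std::vector<int32_t> &frame) {
		std::vector<int32_t> &oldest = history[next];
		for (size_t i = 0; i < sum.size(); i++) {
			sum[i] += frame[i] - oldest[i];
			oldest[i] = frame[i];
		}
		next = (next + 1) % history.size();
	}
	// Divide by the number of frames to get the average intensity.
	const std::vector<int32_t> &total() const { return sum; }
private:
	std::vector<std::vector<int32_t>> history;
	std::vector<int32_t> sum;
	size_t next = 0; // Index of the oldest frame in the cyclic buffer.
};
```

The cost per new frame is one pass over the pixels regardless of how many frames are blended, which is what makes a long light loop affordable.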

If large blocks of illumination are known to have no shadows by not casting from the floor, sampling the depth based shadow maps can then be skipped and save around 80% of the computation time. Regions entirely in shadows can even be skipped completely.

The goal is to look better than Nvidia RTX enabled games (ugly noise filter) and run faster on a budget CPU.

Optimize Win32 using multi-threading

Basically just copying the optimization used for X11 where the image is uploaded on a background thread while the program moves on to the next frame. The hard thing is making sure that it does not break the call convention for the Win32 API. As usual, this optimization should be possible to disable using a macro, so that multi-threading can quickly be excluded as the cause of defects while debugging programs.

To upload one image while another is being rendered to, a swap-chain with double-buffering is needed, just like in "X11Window.cpp".

Prevent switching into fullscreen directly after creating windowed.

If one creates a window using window_create and directly calls window_setFullScreen, the canvas may get incorrect dimensions. The correct way is to construct the window using window_create_fullscreen, but one should at least be warned when something is non-deterministic. The hard thing is to define "directly" when the delay of creating a window may vary depending on the system.

Because operating systems may add new flags that have side-effects from other flags, switching back and forth between windowed and fullscreen would only be a temporary hack needing constant updates with new flags being erased. Therefore the library always creates a new window backend and erases the old one when switching between windowed and fullscreen modes.

Create a fast and reusable sound engine

Fast
SIMD vectorized playback of sounds without interpolation. Changing volume should be optional to save performance. Having the same volume on all channels should only multiply once.

Efficient
The backends should stop requesting samples when there are no sounds to play, so that it can be used for basic click sounds in an otherwise silent application. Future operating-systems might have a significant delay for initializing playback, but closing and restarting when needed would be the simplest solution to re-implement, making sure that all implementations support this feature just by cleaning up resources. One can also make a setting to run the sound engine without interrupting on silence, to get lower latency for applications that run sounds almost constantly.

Standardizing
The sound engine should try to standardize basic types and methods for storing and converting sounds, which can be in a separate sound core API, not listed in the framework subset, without adding dependencies to sound backends when only used to process sound for conversion tools. This allows sound encoders and decoders for different file formats to be reused across different sound engines. First define how to load into a lossless integer sound buffer, then use standard conversion functions to convert between different sound formats.

Non-imposing
Custom sound engines should be free not to use the SDK's example sound engine, add their own sound formats for import or in-memory compression, have their own pipeline design for distorting effects, and do their own sound mixing for different use cases and optimizations.

Make linking more robust on GNU's G++

Linking to the X11 window manager without creating a window causes linker problems with GNU's G++. Probably related to the order of linked libraries, because X11Window both calls the DFPSR library to process text and images and has its constructor called from DFPSR.

Until it can be solved properly, a hello-world template for command line applications using the library only for text processing could make a smoother experience when making a new project (by already having WINDOW_MANAGER=NONE in the compilation script).

Make the string API ready

It's hard to write a tutorial about the string API before it's located in the API folder and free from internal implementation details that are easier to hide in the cpp file than to document.

Sound portability layer

While game engines should be allowed to fully define how sounds are played and mixed in software, it would be good to at least have a sound portability layer that runs as a background thread and requests sound samples to be generated for speakers once in a while using a lambda callback.

The alternative would be to let anyone wanting sound include external sound engines just for the portability, in which features are changed when the sound engine is replaced. Sound output is simple, you just send samples to each speaker.

Due to the added work of selecting backend implementations and libraries for different platforms, adding this should have minimal impact on the ease of using the framework without sound. Maybe just an extra argument to the build scripts that can be assigned to NONE.
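
A minimal sketch of what such a callback interface could look like follows. SoundOutput and fakeBackend_run are invented for this example; the stand-in backend runs on the calling thread and only collects the samples, where a real backend would run in the background and hand each buffer to the operating system.

```cpp
#include <functional>
#include <vector>

// Hypothetical callback type: fill the buffer with samples and return true to keep playing.
using SoundOutput = std::function<bool(float *samples, int sampleCount)>;

// Stand-in for a platform backend. A real one would run on a background thread and
// send each filled buffer to the speakers. This one only collects the samples.
inline std::vector<float> fakeBackend_run(const SoundOutput &output, int bufferSize, int maxBuffers) {
	std::vector<float> played;
	std::vector<float> buffer(bufferSize);
	for (int b = 0; b < maxBuffers; b++) {
		// The engine decides what the samples contain. The backend only asks for more.
		if (!output(buffer.data(), bufferSize)) break;
		played.insert(played.end(), buffer.begin(), buffer.end());
	}
	return played;
}
```

With this split, replacing the backend for a new platform does not change how sounds are mixed, because all mixing happens inside the lambda supplied by the engine.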

Create an editor for scalable parametric images and component themes

The media machine is a virtual machine with fixed-point precision for full determinism. Registers hold scalars and whole images. Most of the instructions are planar image operations. A user-defined function in the virtual machine can process or generate images. It's currently used to generate scalable graphics for visual components, but could also be used for scripting vector animations using a time variable.

Currently, there's only a default theme coded by hand in the virtual media machine's assembly language and a method for changing theme (window_applyTheme in guiAPI.h), but no theme API nor tool for creating new themes.

There should be ways to compile parametric scalable images into virtual assembly code and link them together to create different themes. A graph based programming tool might be the most powerful, so that normalizing multiplications can take additional parameters. Infix syntax would not be capable of utilizing cache-efficient compound-operations and automatic optimization doesn't work on saturated intermediate images because removing a saturation step would not be equivalent.

Replace as much std library functionality as possible

Because the std library is deprecating new features before you have time to try them, it will certainly not last for 200 years. Replacing std dependencies with compact and readable code will make projects using this library easier to port. A single complex feature in a standard library can be enough to prevent a project from being worth porting in the future. Multi-threading and timers in the C++ standard library can however not be replaced, because they interface with operating systems that have not been created yet. They might as well deprecate everything that's not a hardware abstraction and let custom libraries handle all high level functionality without the bad design decisions.

dsr::List currently depends on std::vector because it was a convenient way to start, but more direct control over the allocation would allow making features that std::vector doesn't have.
std::shared_ptr is used in most type handles for the API, but just making an alias could make a future replacement easier in case of standard library features being deprecated.
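
The alias idea can be sketched in a few lines. The names Handle and handle_create and the example namespace are hypothetical, chosen only to show that callers would no longer mention std::shared_ptr directly.

```cpp
#include <memory>
#include <utility>

namespace example {
	// Hypothetical alias. Replacing this definition would swap the implementation for all users.
	template <typename T>
	using Handle = std::shared_ptr<T>;

	// Hypothetical factory, so that callers do not name std::make_shared either.
	template <typename T, typename... ARGS>
	Handle<T> handle_create(ARGS&&... args) {
		return std::make_shared<T>(std::forward<ARGS>(args)...);
	}
}
```

Code written as example::Handle<Foo> and example::handle_create<Foo>(...) would then only need recompiling if the alias is later pointed at a custom reference counted type with the same basic operations.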

Port to MacOS High Sierra

The X11 window wrapper should in theory work on older Macintosh systems, but High Sierra is said to break this compatibility. Using XQuartz seems like the simplest first step, but a native port might give better performance. Using a third party media layer with heavy dependencies would defeat the purpose of making this library.

Create an integration test application

To simplify porting the media layer in the future, one can have an application that goes through different tests and says what is good enough in a summary.

Essential integration:

Create, load and destroy files in the application's folder. Try to access the same file with absolute and canonical paths. A virtual filesystem can be created in memory if not available.
Show colors in random order and ask the developer to name them. Making sure that channels are packed in the correct order and instructing which RGBA pack order to select for the system based on test results.
Showing thin lines along the canvas edge and asking the user to say where pixels are flashing, making sure that all pixels on the canvas are visible. A magnifier may be used for high resolution screens.
Asking the user to press many objects appearing randomly on the screen, with as much accuracy as possible, detecting if there is a bias or wrong scaling in mouse coordinates.
Asking the user to press certain keys. This could be used to automatically suggest remapping between mixed up keys. If physical keys are not available, they can be remapped to alternatives or integrate a virtual keyboard.
Allocate and destroy lots of resources and see if there is a memory leak. X11 may cause false detection due to taking ownership of allocated memory, which might be freed after the application terminates.
Play sounds from different speakers and ask if they were mapped correctly for mono, stereo, 4.1, 5.1, 7.1, 8.1... This might be more for the end user of a specific sound engine.

Features that might not be possible to integrate in the future:

While the left mouse button is essential, additional features should take advantage of the right mouse button and scroll wheel.
Toggling full screen and asking if it worked. If not possible to change, a system is allowed to always use windowed or full screen, by simply ignoring the request.
Asking the user to follow a moving object with the mouse, to check that mouse move events are sent without too much input lag. If hover is not supported, a GUI theme without hover effects needs to be selected for programs.
Hiding and showing the cursor.
Moving the cursor. Will not work with a stylus pen, because they set the cursor location instead of adjusting it, so relative input (like a mouse) is required to pass this optional feature.
Copy text to the clipboard, paste it into a text editor, modify as described, copy and paste back the modified text. When the clipboard is not integrated, a fake clipboard allows copying and pasting text within the same application as a fallback solution, so it is easy to get fooled if not testing together with external applications.

Implement clipboard access

Copying and pasting in the textbox is currently using a global variable instead of accessing the system's clipboard. This should remain as a fallback implementation, so that implementing clipboard access for new platforms is optional.

For security reasons, access to the clipboard is tied to window management, so that anti-virus software can see a connection between pressing Ctrl+V and accessing the clipboard through the same window. There are libraries that isolate clipboard access into stand-alone applications, but using those when you already have a window displayed, may get the application incorrectly flagged as malware. Because this library already has window management connected to the components, the best security practice would be to use this connection and let clipboard access be a part of the portable window API and called automatically by attached textboxes.

Linux has features for accessing multiple clipboards, but just having default copy/paste is enough for a portable media layer, so that there are not too many things to test on different operating systems.

Handle shift and control of uncertain left/right

Tried with another keyboard, which did not distinguish between left and right shift and control keys.

A quick fix is to assign unknown to the left side. This may seem wrong, but will retain backwards compatibility and fix the bug in existing applications.

Breaking compatibility by merging keycodes would always be caught by the compiler and an easy fix with automatic text replacement, assuming that nobody needs the left/right distinction.

Introducing a third value like on MS-Windows would make it difficult to test applications.

Fix window resize flicker on MS-Windows

Getting black flickering on MS-Windows after implementing double buffering, so the flickering on X11 might have multiple causes. It could be that the first image after resize needs to duplicate the image and run on a single thread, just like when the window was recently created, so that one does not start by showing a previous image that does not exist.

Create beginner friendly HTML tutorials

Should be useful for both beginners and advanced users who plan to use the library for learning graphics from the ground up without having to worry about blue-screen-crashes and API-deprecation. Should use the storage space efficiently so that it can be saved together with the library's source code. Still GIF images for large drawings and PNG when more quality is needed. Someone using the library after a few decades will probably not be able to visit a website with a specific URL, but plain HTML documents in a folder can be read as plain text in the worst case.

Planned topics:

String system
- Basic text processing
- Parsing
2D images
- Sub-images
- 2D drawing
- Tile sets
Isometric rendering
- Passive updates
  - Dirty boxes
  - Passive background and multiple layers
- Effects
  - Depth buffering
  - Normal mapping
  - Pre-rendering
  - Light sources
    - Phong reflection model
  - Static light techniques
    - Soft light using many sources and blur filters
3D rendering
- Left handed coordinate system
  X is right, Y is up and Z is forward
  Having Z going forward makes depth buffers easier to think about.
  Telling Blender to export with Y up and Z forward may turn X to the left (which is probably a bug in the PLY export script).
- Depth buffers
- Cameras
- Models
- Textures
  - Pyramid generation
Image processing and computer vision
- Implementing basic image analysis filters
  - Map and image generation using lambdas (Done)
  - Safe image sampling (Done)
  - Thresholding, morphology, distance transforms, integral images
  - Merging multiple filters into one to save time on memory access
- From reference implementation to extreme performance
  - Safe-pointer abstraction (Done)
  - SIMD instructions and the abstraction layer (Done)
  - Multi-threading and the abstraction layer
GUI
- Creating a window
  - Compilation of Win32 or X11 wrappers
  - Full-screen
- The visual components and how to use them.
- Creating custom components.
- Creating a visual interface layout
  - Relative coordinates
  - Properties
  - *.lof format's syntax
- Active versus passive interfaces
  - Power saving
- Mapping commands using lambda expressions
  - Global variables versus capture
  - DSR key codes
Storing images
- Embedding ASCII images
- Saving and loading image files
Making a program
- Compiling and linking
- Debug versus release modes
- Different operating systems and processor types
- Internationalization and multiple interface layouts
Making a game
- Storing your media in a folder, embedding or generating
- Direct mouse and keyboard input

VLA fallback solution

Triangle rasterization uses small but dynamic arrays for storing pixel intervals for each row without having to fetch memory far away on the heap.

In case that the VLA C extension can suddenly no longer be used in the distant future (new CPU architecture with new conflicting feature, et cetera), it would be good to have a fallback implementation for simulating or replacing VLA when not available (just like the SIMD abstraction runs with zero overhead when not having the extensions).

A global stack on the heap would not work when called from multiple threads breaking the call order.

Carrying thread contexts would be a horribly entangled spaghetti design.

Allocating on the heap per triangle would be compact, but also horribly slow if ending up with cache misses from another thread stealing the address space. Pre-allocating the height of the target's section with even padding would have enough room for the worst case triangle height and have no allocation overhead per triangle, but this would not be easily reusable for other problems needing VLA.

Implement a window wrapper for the Wayland protocol

Some Linux systems do not support X11 but have Wayland instead, which is a leaner and more modern alternative to X11.

Due to the library's principle of being easy to compile even if some libraries are missing, one should be able to compile a program exclusively for X11 or Wayland, in case that certain developer libraries no longer exist when compiling in a distant future. One can however use header implementations for X11 and Wayland, so that different cpp wrappers can select combinations of window managers, for those who have access to both when building their program and want the application to select the best at runtime like most media layers do.

Image upload
Having multi-threaded image upload with double buffering like the X11 window wrapper would be a plus, but stability and correct visuals are the first priority. Performance can be optimized later, because window wrappers are entirely separate from application code.

Fullscreen
Full screen should not force excessive control over the display with "native fullscreen", because this is not even a real thing after people stopped using CRT displays and will always crash on Raspberry Pi with fixed resolution screens. Trying to force an LCD to a different resolution will cause emulation of the resolution in either graphics drivers or a scaling processor in the display itself. This library already has upscaling functions built in to give full control over what the pixels look like, so just make it maximized, borderless and in front of everything else.

Input
Unicode character codes should be supported.

The point of keeping window backends minimal is to allow users to write their own 100 years from now after automatically porting the C++ code to a backwards compatible language, so having someone else perform this task would be a good test to see how long it takes and whether the integration with BackendWindow is self explanatory or if a proper interface with detailed documentation is needed.

Create light-map SDK example

The use of light-maps in the Cube example is quite ugly, because the secondary texture layer is generic rather than a baked light-map for the model. Importing levels with light-maps from an existing tool or creating light-maps dynamically would allow creating a much better example together with documentation of the workflow.

Improve occlusion shapes for 3D rendering

There is currently no viewframe clipping of occlusion shapes when drawing to the occlusion grid, which makes large occlusion shapes less effective when entering a clip plane makes them vanish. With clipping, more surface would be occluded. One could also calculate the worst case depth per cell instead of per shape for long walls.

Occlusion shapes currently only support boxes, but arbitrary convex shapes from point clouds could also be added.

Make it easy to find the application's path

When releasing an application, it's good if it can be launched using a link from the desktop and still find files from the local folder. C++14 doesn't have any portable solution for this and any reusable system will be a hack that won't always work, but a real application still needs shortcuts from the desktop to work.

Store application path in a registry
– No support for multiple instances (not acceptable)
– Only works on a few operating systems (not acceptable)
– Requires installer just to run (not acceptable)
Use system specific calls
– Does nothing for future compatibility
– Linking to additional libraries for this would defeat the purpose of making the library
Get path from parsing argv[0]
– Doesn't work for links from the desktop because they will only point to the shortcut
– Doesn't work for global system aliases that have no path
– Might not work when the application runs from memory and doesn't have a folder
Search for a unique filename from the current directory
– Very slow (not acceptable)
– Ambiguity if the program has clones for backups
– Potential drive-by download attack vector (you're screwed anyway if you allowed JavaScript)
std::filesystem::path::parent_path (same as in Boost)
– Requires you to know the application's path before returning the folder path, therefore solving nothing. No idea why that's even in the standard library if it's not hardware nor system abstraction.
– Requires C++17 or later. Should be less dependencies on std, not more, but the problem is essential.
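
For comparison, the argv[0] alternative can be sketched in C++14 without std::filesystem. parentFolder is a hypothetical helper that only cuts the string at the last separator, so it inherits every limitation listed above for argv[0].

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. Returns the folder part of a path by cutting at the last separator.
// Both '/' and '\' are treated as separators, which is an assumption made for portability.
inline std::string parentFolder(const std::string &path) {
	size_t last = path.find_last_of("/\\");
	// No separator at all, such as a global system alias that has no path.
	if (last == std::string::npos) return std::string();
	// Keep the root itself when the only separator is the first character.
	if (last == 0) return path.substr(0, 1);
	return path.substr(0, last);
}
```

Calling parentFolder(argv[0]) gives a usable folder when the program was started with a full path, and an empty string when it was started through an alias, which is exactly the case that makes this approach unreliable on its own.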

Implement real-time SSAO on the CPU

Screen Space Ambient Occlusion implemented on a CPU can take advantage of many techniques that are not available in GPU pixel shaders to make it fast.

Dirty rectangles can allow only calculating a filter for the regions that have changed.

Box filtering on a CPU can be done in less runtime complexity than a separable pixel shader on the GPU. Repeated box filtering approximates Gaussian blur and can probably be added to the same pass using more registers to save cache.

Other effects such as bloom can also reduce run-time complexity by allowing each following pixel to re-use the result of the previous neighbor. If the GPU is ten times as fast, the CPU will just have to do 10% of the calculations to reach the same results in the same amount of time.
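
The reuse of the previous neighbor's result can be sketched in one dimension as a sliding window sum. boxSum is a hypothetical function; clamping reads at the edges is an assumption, and a real filter would run this over rows and columns of an image and divide by the window size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums each window of 2 * radius + 1 elements, clamping reads at the edges.
// The work per element is one addition and one subtraction, independent of the radius.
inline std::vector<int32_t> boxSum(const std::vector<int32_t> &input, int radius) {
	const int n = static_cast<int>(input.size());
	std::vector<int32_t> output(input.size());
	if (n == 0) return output;
	// Reads outside of the array return the nearest valid element.
	auto at = [&](int i) -> int32_t {
		int clamped = i < 0 ? 0 : (i >= n ? n - 1 : i);
		return input[static_cast<size_t>(clamped)];
	};
	// Only the first window is summed from scratch.
	int32_t sum = 0;
	for (int i = -radius; i <= radius; i++) sum += at(i);
	for (int x = 0; x < n; x++) {
		output[static_cast<size_t>(x)] = sum;
		// Slide the window one step: add the element entering and remove the one leaving.
		sum += at(x + radius + 1) - at(x - radius);
	}
	return output;
}
```

A pixel shader computing each output independently reads 2 * radius + 1 samples per pixel, while this loop reads two, which is where the lower run-time complexity comes from.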

Create GUI layout designer

An application that can load, edit and save interface layouts in the custom *.lof format. This will make designing a window much easier, like in Visual Studio or NetBeans, but still maintain a clear line between code and design.

The difficult part is to find the best usability for working with relative and absolute coordinates at the same time. One way to visualize this may be lines that go from the component's edge to the relative anchor point. The length of the line shows the absolute offset while the anchor can be moved along 0% to 100% in whole integers. Having an anchor on the opposite side of the component can be visualized by making a sharp turn and going around the component. Selecting a component should then show both the classic 8 squares on the sides and 4 additional anchor points.

Compiler warnings

Many warnings come from an old version of stb_image. Not sure if upgrading the pasted dependency would make the warnings go away.

Should AVX-3 (a.k.a. AVX-512) be used?

Someone thought that the library might as well push for maximum performance all the way, by sacrificing some determinism. One can create another SIMD header containing longer vectors that beginners wanting determinism across hardware don't have to use.

Solutions:

Using longer vectors than 128 bits as a fixed length type across all platforms would risk running out of registers on ARMv7 where 128 bit quad registers are the largest available.
Using variable length SIMD vectors would make it very difficult for people without access to all platforms to participate in the development, when the same code behaves differently on different platforms. One could however create emulator modes where one simulates different fixed SIMD lengths without caring about performance. One could set the length of the default vector to 128, 256, 512, or 1024 bits and have buffers and images aligned accordingly. Another problem then is that very large vectorization is often divided into small and large vectorization, where you have less padding for small images and more padding for large images running with 1024 bit SIMD.

Another problem is that 512-bit SIMD is only supported on the more expensive processor models, so compilers don't enable this feature by default. One would have to manually compile different versions, just like when enabling AVX2 for faster texture sampling.

Optimize real-time rendering of dense isometric models

The Sandbox example is currently rendering a spinning barrel when pressing the Y key. This feature allows mixing deep sprites with things that need bone animation or just free rotation. The rendering technique (using vertex colors instead of textures meant for offline rendering) is however not yet optimized.

Remaining work:

The algorithm used to merge dirty rectangles could be used with 128-bit alignment to split groups of overlapping dynamic objects into tasks for multi-threading.
Some parts should be able to use SIMD vectorization using smaller data types.

Integers for positions in the optimized model format can also make the format more deterministic so that models can later be saved in a compressed format and rendered in the selected resolution at program start. Only having models to load but still getting the same look on each computer would make the engine a lot easier to use for beginners unfamiliar with the coordinate systems.

SIMD vectorize sound backends

Aligning the sound buffers and vectorizing the conversions would allow creating SIMD vectorized sound engines on top of the framework. A simple sound engine where sounds must be pre-scaled when changing playback speed to avoid runtime interpolation, might be fast enough even on a weak ARM processor as long as it has NEON.

Generate optimized code for shaders using a new language

To both allow previewing models with third party shaders in the model editor and get high performance in the final release of an application, a portable shader language is needed for vertex and texture shaders. The generated code should be so readable that one can modify it by hand in C++, but the option of generating more optimized intrinsic functions for C/C++ as well would make the language reusable outside of the framework.

Texel shaders would process regions of an image, similar to the Halide language, but with built-in functions for sampling light. This would be like a subset of the media machine, but optimized for performance by allowing floating-point operations in virtualization for compound instructions and generated code for full speed. Shading to texture is the technique used on GPUs for subsurface scattering to make skin look more realistic by blurring the shadows after calculating colored glow from the light's direction.

Pixel shaders could be referred to using names in the materials and then fetch a pre-compiled pixel shader from the graphics engine. A collection of generic built-in pixel shaders can also be added to the core renderer, to extend what can be done with interpreted vertex shaders.

Vertex shaders should work pretty much like on a GPU, but with certain limitations in control flow to make it faster on a CPU.

Interpreted mode for quick prototyping in the 3D editor:

Interpreted with large planar buffers for intermediate values.
Intrinsic compound functions for common combinations of virtual instructions can be applied because no clamping is performed implicitly when floating-point operations are used.
Work with planar data formats to get zero overhead when mixing channels from different attributes.

Compiled mode:

Calling SIMD.h from automatically generated code.
Work with packed data formats and process one SIMD vector at a time.
The generated C++ code can be saved with the other code, so that one does not depend on getting the shader compiler to work, just the SIMD.h hardware abstraction layer. This makes sure that basic maintenance does not require knowledge about compilers or assembler.

Calling the shader generation should be easy with both external build systems and the library's own build system.

The language syntax should abstract away both vector length and how the data is stored, so that it does not matter if one uses a planar or packed vertex structure.

Conditional if statements should not be allowed, because that would not be data parallel for vectorization. Masking operations should be used instead.
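As a minimal sketch of what replacing a branch with a mask can look like, the following scalar C++ emulates one lane of a vector select. The helper names maskGreater, selectByMask and shadeLanes are hypothetical and not part of SIMD.h; a real shader would apply the same pattern to whole SIMD vectors through the hardware abstraction layer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw bits and back, without breaking aliasing rules.
static inline uint32_t floatBits(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
static inline float bitsToFloat(uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Hypothetical helper: all bits set when a > b, otherwise all bits cleared.
static inline uint32_t maskGreater(float a, float b) {
    return uint32_t(0) - uint32_t(a > b);
}

// Hypothetical helper: picks whenSet where the mask is set and whenClear elsewhere.
static inline float selectByMask(uint32_t mask, float whenSet, float whenClear) {
    return bitsToFloat((floatBits(whenSet) & mask) | (floatBits(whenClear) & ~mask));
}

// Equivalent to "if (input > threshold) halve it, else double it", but both
// results are always computed and the mask decides which one is stored.
void shadeLanes(const float *input, float *output, size_t count, float threshold) {
    for (size_t i = 0; i < count; i++) {
        float whenAbove = input[i] * 0.5f;
        float whenBelow = input[i] * 2.0f;
        output[i] = selectByMask(maskGreater(input[i], threshold), whenAbove, whenBelow);
    }
}
```

Because every lane executes the same instructions, the loop body has no data-dependent control flow and can be mapped directly to vector instructions.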

Models
For best integration with the Model API, the old format will become the default vertex structure. A new type of pointer similar to SafePointer will contain a padded power-of-two element stride, so that it can access packed, planar and semi-planar data by changing the element stride in the pointer. The [] operator then automatically gets the correct element, because the element size is pre-multiplied into the stride and the result is stored as its base two logarithm for bit shifting. If requesting a part's vertex color from a model, you get a pointer to the first FVector4D with a stride to the next element, which will be the same for all vertices in the same packing. One packing is for the final render, so that you don't need to read the other attributes used for generating light. Rarely used data is packed together further back in the allocation. Separate vertex buffers could potentially be used for different types of animation. In the 3D model editor, one can use the same type of pointer for accessing planar buffers from separate allocations, which are currently used to save memory by only cloning the attributes that changed for planar immutable undo history.
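The strided pointer idea could be sketched roughly as follows. StridedPointer, strideToShift, Float4 and PackedVertex are hypothetical names invented for this illustration; the planned pointer type would presumably also carry the bound checks of SafePointer, which are left out here.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a four channel float vector such as FVector4D.
struct Float4 { float x, y, z, w; };

// A packed vertex with two attributes, 32 bytes in total (a power of two).
struct PackedVertex { Float4 position; Float4 color; };

// Converts a power-of-two byte stride into the number of bits to shift by.
inline uint32_t strideToShift(size_t byteStride) {
    uint32_t shift = 0;
    while ((size_t(1) << shift) < byteStride) { shift++; }
    return shift;
}

// Hypothetical pointer that steps between elements by shifting the index
// instead of multiplying with sizeof(T).
template <typename T>
class StridedPointer {
private:
    uint8_t *first;
    uint32_t strideShift; // Base two logarithm of the byte stride.
public:
    StridedPointer(T *firstElement, uint32_t strideShift)
    : first(reinterpret_cast<uint8_t*>(firstElement)), strideShift(strideShift) {}
    T &operator[](size_t index) const {
        return *reinterpret_cast<T*>(first + (index << strideShift));
    }
};
```

With this layout, the same access code reads colors from a packed buffer using strideToShift(sizeof(PackedVertex)) and from a planar buffer using strideToShift(sizeof(Float4)), so the caller does not need to know how the vertices are stored.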

Fast occlusion and depth sorting for 3D rendering

3D rendering currently has occlusion per pixel and camera culling per model, but no higher engine abstractions that can be made faster without changing the code. Encapsulating the raw command queue with a higher abstraction that receives draw calls can let non-animated models be sorted by depth while animated models send their triangles at their current state. A quad-tree of maximum depth generated from the depth buffer allows rejecting models based on their bounding boxes without iterating over each polygon.
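A rough sketch of the maximum depth quad-tree is given below, assuming a depth buffer that stores non-negative linear depth where larger values are further away. That convention, together with the names DepthLevel, reduceMaxDepth and isOccluded, is an assumption made for the example and may not match the renderer's actual depth format.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One level of the quad-tree, storing the furthest depth found in each cell.
struct DepthLevel {
    int width;
    int height;
    std::vector<float> maxDepth;
};

// Builds the next coarser level, where each cell covers up to 2 x 2 finer cells.
DepthLevel reduceMaxDepth(const DepthLevel &fine) {
    DepthLevel coarse;
    coarse.width = (fine.width + 1) / 2;
    coarse.height = (fine.height + 1) / 2;
    coarse.maxDepth.assign(size_t(coarse.width) * size_t(coarse.height), 0.0f);
    for (int y = 0; y < fine.height; y++) {
        for (int x = 0; x < fine.width; x++) {
            float &cell = coarse.maxDepth[size_t(y / 2) * size_t(coarse.width) + size_t(x / 2)];
            cell = std::max(cell, fine.maxDepth[size_t(y) * size_t(fine.width) + size_t(x)]);
        }
    }
    return coarse;
}

// A bounding box covering cells [left, right) x [top, bottom) is hidden when
// its nearest depth is behind the furthest drawn depth in every covered cell.
bool isOccluded(const DepthLevel &level, int left, int top, int right, int bottom, float nearestDepth) {
    for (int y = top; y < bottom; y++) {
        for (int x = left; x < right; x++) {
            if (nearestDepth <= level.maxDepth[size_t(y) * size_t(level.width) + size_t(x)]) {
                return false; // Something in the box may still be visible here.
            }
        }
    }
    return true;
}
```

Testing a large bounding box against a coarse level touches only a few cells, which is what makes the rejection cheaper than iterating over the model's polygons.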

Triangles in the final command queue can be sorted based on perceived depth within each region of the target image to avoid drawing triangles that will become occluded. Depth sorting per triangle can also make alpha filtering look better (multiple alpha-filtered layers are deterministic but undefined behaviour).
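One possible ordering, shown only as a sketch, is to draw opaque triangles from near to far so that later ones are rejected by the depth test, followed by alpha-filtered triangles from far to near so that layers blend in a consistent order. TriangleCommand and sortCommands are hypothetical and do not reflect the layout of the real command queue.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw command with a single depth key per triangle.
struct TriangleCommand {
    float depth;        // Larger values are assumed to be further away.
    bool alphaFiltered; // True when the triangle blends with what is behind it.
    int id;             // Only used to identify the command in this example.
};

void sortCommands(std::vector<TriangleCommand> &queue) {
    // Move opaque triangles to the front while keeping their relative order.
    auto firstFiltered = std::stable_partition(queue.begin(), queue.end(),
        [](const TriangleCommand &command) { return !command.alphaFiltered; });
    // Opaque triangles: nearest first, so that hidden pixels are skipped early.
    std::sort(queue.begin(), firstFiltered,
        [](const TriangleCommand &a, const TriangleCommand &b) { return a.depth < b.depth; });
    // Alpha-filtered triangles: furthest first, so that nearer layers blend on top.
    std::sort(firstFiltered, queue.end(),
        [](const TriangleCommand &a, const TriangleCommand &b) { return a.depth > b.depth; });
}
```

Running this once per region of the target image, on the triangles overlapping that region, would correspond to the per-region sorting described above.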

Systems for game-specific occlusion can prevent sending too many model instances to the renderer. This can be implemented in SDK examples using different broad-phases.

IRect.h uses std::min/max without <algorithm> include

https://github.com/Dawoodoz/DFPSR/blob/master/Source/DFPSR/math/IRect.h#L56
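The usual fix for this kind of report is to include the header that declares the functions instead of relying on another header pulling it in. The sketch below uses a stand-in Rect type and intersect function, which are hypothetical and not copied from IRect.h, to show the include next to the calls that need it.

```cpp
#include <algorithm> // Declares std::min and std::max.
#include <cstdint>

// Stand-in rectangle type for illustration, not the library's IRect.
struct Rect {
    int32_t left, top, right, bottom;
};

// Returns the overlapping region of two rectangles.
// The result is empty when right <= left or bottom <= top.
inline Rect intersect(const Rect &a, const Rect &b) {
    Rect result;
    result.left = std::max(a.left, b.left);
    result.top = std::max(a.top, b.top);
    result.right = std::min(a.right, b.right);
    result.bottom = std::min(a.bottom, b.bottom);
    return result;
}
```

Without the explicit include, the code only compiles on standard library implementations where another included header happens to bring in the declarations.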

Separate camera from triangle rendering (while still allowing infinite far clip plane)

The 3D camera is a bit strange and nested into 3D rendering instead of just having a standard 4x4 matrix. This is because being allowed to have an infinite far clip plane means not being able to normalize the camera space. Everything would become zero after dividing by infinity if applying the traditional GPU math. If there is another clean mathematical representation capable of expressing the camera's projection with the clip planes, it could be a simple pre-transforming step. This would then allow regression testing the pre-transformed triangle rasterization without the camera and make the core methods easier to understand.
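One candidate for such a pre-transforming step, offered as an assumption and not as a description of the current camera code, is to keep the reciprocal of the depth instead of normalizing depth between two clip planes. The reciprocal approaches zero at infinity, so no far clip distance is needed. The names CameraPoint, ImagePoint and projectPoint are hypothetical.

```cpp
// A point in camera space, where z is the distance along the view direction.
struct CameraPoint { float x, y, z; };

// A projected point in image space with reciprocal depth for comparisons.
struct ImagePoint {
    float x, y;
    float inverseDepth; // 1 / z, which approaches zero as z goes to infinity.
    bool visible;       // False when the point is closer than the near clip plane.
};

// Projects one point using only a near clip distance, which is assumed to be
// larger than zero. centerX and centerY give the center of perspective in
// pixels and scale is the number of pixels per unit of x / z.
ImagePoint projectPoint(const CameraPoint &point, float nearClip, float centerX, float centerY, float scale) {
    ImagePoint result = {0.0f, 0.0f, 0.0f, false};
    if (point.z < nearClip) {
        return result; // A triangle rasterizer would clip against the plane instead.
    }
    float inverseDepth = 1.0f / point.z;
    result.x = centerX + point.x * inverseDepth * scale;
    result.y = centerY - point.y * inverseDepth * scale; // Image Y grows downwards.
    result.inverseDepth = inverseDepth;
    result.visible = true;
    return result;
}
```

If triangles are transformed like this before rasterization, the rasterizer can be regression tested with hand-written image space coordinates and no camera at all.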

Powerful transform
One possible use of a more powerful camera transformation is stereoscopic views where the center of perspective is shifted in 2D, without having to crop one side of the image. Global functions for generating camera transformation packets can then return the perspective center point within some kind of flexible yet fast transformation. Maybe normalize X and Y to -1..+1 while the Z depth remains in the original length system.

Optional near and far clip planes
For orthogonal systems, the near clip plane can also be disabled.