Open-Source Photorealism Using GPU: Cycles Render
GPGPU technology has brought several GPU renderers to the market, including iRay, V-Ray RT, Octane, and Arion. The open-source community has also produced at least two free GPU renderers: SmallLuxGPU and Cycles Render. I want to share my impressions of the second one.
Cycles Render is an unbiased renderer that can render on the GPU (CUDA, and OpenCL for ATI). It ships with Blender, which runs on Windows, Linux, and OS X.
Cycles Render: a car with a procedural texture, rendered in Full HD in 2 minutes on a GTX 580.
I had never been particularly interested in Blender, even though I knew some of its advantages, such as openness, a simple installer, and speed. However, having got hooked on unbiased renderers, especially GPU ones, I decided to try Cycles and brush up on Blender (version 2.63 at the time of writing).
Here is a short video showing the interactivity and how it all works:
Rendering with Cycles can run directly in the active viewport (this is not new, just a convenience) or from the camera view, so you can monitor changes in the scene in real time.
CPU vs GPU
The x86-64 CPU architecture has a very cumbersome instruction set that takes up a lot of chip area. Because of this, it is difficult to fit many cores on a CPU, but in single-threaded applications x86 shows its best side.
Rendering, however, is multi-threaded to the extreme. What matters here is fast floating-point arithmetic, and working with large amounts of data requires good memory bandwidth. The GPU is much better suited to this.
But the GPU as a platform was originally designed for hardware rasterization (OpenGL, DirectX) and is rather difficult to adapt to GPGPU tasks. Many problems that are easily solved on the CPU take considerable effort on the GPU through frameworks such as CUDA and OpenCL. Developers often give up on GPU programming because of the algorithmic complexity and the poorly optimized frameworks (e.g. OpenCL).
We need a new processor architecture with a small instruction set, a large number of cores, and hardware for fast floating-point addition and multiplication (for rendering and physics calculations). Otherwise, we have to wait until GPU hardware and software are better adapted to non-graphics computation.
But no such architecture is available, and nobody wants to wait until things get better, so developers around the world are already making do with the GPU. And, of course, rendering on the GPU is several times faster.
There is a small benchmark you can use to test your own hardware.
My render times (Core i5 2500 vs GTX 580):
Windows 7 64-bit: CPU 5:39:64, CUDA 0:42:54. The GPU is 8.07 times faster.
Ubuntu 12.04 64-bit: CPU 3:48:77, CUDA 0:39:03. The GPU is 5.84 times faster.
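As a quick check of these ratios, here is a minimal sketch in Python. It assumes the times are written as minutes:seconds:hundredths, which the benchmark output does not state explicitly; the helper names `to_seconds` and `speedup` are made up for this illustration.

```python
def to_seconds(stamp):
    """Convert 'M:SS:HH' to seconds (assumed format: minutes, seconds, hundredths)."""
    minutes, seconds, hundredths = (int(part) for part in stamp.split(":"))
    return minutes * 60 + seconds + hundredths / 100.0

def speedup(cpu_stamp, gpu_stamp):
    """How many times faster the GPU run is than the CPU run."""
    return to_seconds(cpu_stamp) / to_seconds(gpu_stamp)

windows = speedup("5:39:64", "0:42:54")
ubuntu = speedup("3:48:77", "0:39:03")
print("Windows 7: %.2fx, Ubuntu: %.2fx" % (windows, ubuntu))
```

Under this reading of the time format, the ratios come out at roughly 8 and just under 6, in line with the figures above.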
It would be interesting to see the rendering speed on the latest Radeon.
Interestingly, CPU rendering is faster on Linux than on Windows.
The GPU's lead depends on the hardware and on the complexity of the procedural textures: with complex procedural textures, the GPU's advantage shrinks slightly.
To create the material you want, you need to get familiar with building shaders from a node graph. I will try to explain how it works. Here is an example:
I thought it would be much clearer to go through it backwards, starting from the output:
1. The material output is needed to apply the result to the surface.
2. A mix shader blends the color component (4) and the gloss (5) according to a factor (3).
3. The reflection coefficient of the glossy surface. It depends on the angle of incidence: a ray that hits the surface perpendicularly is reflected less than one that hits it at a grazing angle.
4. A mix shader blends shaders 6 and 7 in equal proportions (Fac = 0.5).
5. Mirror reflection (glossy surface).
6 and 7. The diffuse and glossy (roughness 0.35) components of the color.
8. A color converter. Its input is the Hue parameter, taken from the Fac output of the texture (9), from 0 to 1. Its output is a shade of red.
9. A generator of randomly colored cells (r, g, b), where Fac is the depth (from 0 to 1).
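The wiring above can be sketched in plain Python as nested function calls. This is only a toy: real shaders describe how light scatters, whereas here each shader is stood in for by a fixed RGB color. The angle-dependent factor (3) is approximated with Schlick's formula, and the index of refraction 1.45 and all function names are assumptions made for the illustration, not Cycles internals.

```python
def mix(a, b, fac):
    """Mix node: fac = 0 returns a, fac = 1 returns b."""
    return tuple((1.0 - fac) * x + fac * y for x, y in zip(a, b))

def fresnel_factor(cos_theta, ior=1.45):
    """Node 3, approximated with Schlick's formula (an assumption).

    cos_theta is the cosine of the angle between the view ray and the
    surface normal: 1.0 looks straight at the surface, 0.0 grazes it.
    """
    r0 = ((ior - 1.0) / (ior + 1.0)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5

def material_output(cos_theta, diffuse_rgb, glossy_rgb, mirror_rgb):
    """Nodes 1-7: the colored layer (4) blended with the mirror layer (5)."""
    colored = mix(diffuse_rgb, glossy_rgb, 0.5)   # node 4, Fac = 0.5
    # Node 2: the angle-dependent factor (3) decides how much mirror shows.
    return mix(colored, mirror_rgb, fresnel_factor(cos_theta))

red = (0.8, 0.1, 0.1)      # stand-in for the color produced by nodes 8 and 9
white = (1.0, 1.0, 1.0)    # stand-in for what the mirror layer reflects
head_on = material_output(1.0, red, red, white)
grazing = material_output(0.0, red, red, white)
print(head_on, grazing)
```

Looking straight at the surface, the result is almost the pure color layer; at a grazing angle the mirror layer takes over.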
Here is the basic idea of how it works:
You can combine all kinds of textures and surfaces. A Full HD version is available.
You can even give a light source negative luminosity.
Light and anti-light.
Not only surfaces can be procedural, but also the environment: the sky, clouds, etc. Using nodes, you can also set up image post-processing.
At first this was unclear to me, but then I realized what was going on. As I understand it, this is a trade-off between performance and convenience, and it applies to all unbiased GPU renderers (the exceptions are Arion Render and all the unbiased CPU renderers).
They have a glossy material for mirror and glossy reflections, and a diffuse one for scattered reflections.
Here's the thing. If there is no scattering, the random deviation of the reflected ray at the point of incidence is 0 and the ray is reflected as in a mirror. If it is 1 (the maximum), the ray can be reflected in any direction within the hemisphere. That is, if we take a mirror and give it the maximum surface roughness, we get white paper. At least, that is what I am used to from Maxwell.
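The picture described here can be written down as a toy model in Python: the reflected ray is the mirror direction blended with a random direction in the hemisphere, weighted by roughness. This is a sketch of the mental model only; neither Maxwell nor Cycles is claimed to work this way internally, and all names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def mirror(incoming, normal):
    """Perfect mirror reflection of a ray travelling along `incoming`."""
    k = 2.0 * dot(incoming, normal)
    return tuple(i - k * n for i, n in zip(incoming, normal))

def random_hemisphere(normal, rng):
    """Uniformly random unit vector on the side the normal points to."""
    v = normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))
    return v if dot(v, normal) >= 0.0 else tuple(-x for x in v)

def reflect(incoming, normal, roughness, rng):
    """roughness 0 -> mirror direction, roughness 1 -> any hemisphere direction."""
    m = mirror(incoming, normal)
    r = random_hemisphere(normal, rng)
    return normalize(tuple((1.0 - roughness) * a + roughness * b
                           for a, b in zip(m, r)))

rng = random.Random(1)
normal = (0.0, 0.0, 1.0)
incoming = normalize((1.0, 0.0, -1.0))       # ray coming down at 45 degrees
print(reflect(incoming, normal, 0.0, rng))   # the mirror direction
print(reflect(incoming, normal, 1.0, rng))   # a random direction above the surface
```

In this model, a single roughness slider takes the surface all the way from a mirror to a fully scattering one.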
Here, though, a rough glossy surface does not turn into a diffuse one: for that there is the separate diffuse shader.
The same applies to the translucent shader. In rendering, translucent means diffuse refraction; in other words, translucent is frosted glass.
That said, translucent looks fine.
What is clear is that when the roughness of Glossy or Glass is close to 1 (visually, above about 0.7), it is better to use Diffuse and Translucent instead.
Detailed information on the shaders is here.
These issues do not prevent you from getting a realistic picture, but I would still like to see a more plausible reflection model added for those who are used to one.
In addition, Cycles Render has another disadvantage. If there are several light sources in the scene (say, two), a ray traced from the camera is more likely to bounce into the larger light source than into the smaller one. When a scene combines soft and hard light, it can look like this (left).
You can see this in the picture.
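The geometric part of this claim is easy to check with a small Monte Carlo sketch in Python. It assumes a simplified setup that is not taken from Cycles: a surface point bounces rays uniformly over the hemisphere, and each light is seen as a cone of a given half-angle around the surface normal. The function names and the angles of 30 and 5 degrees are made up for the example.

```python
import math
import random

def hit_probability(half_angle_deg, samples=100000, seed=0):
    """Estimate the chance that a uniformly random hemisphere direction
    falls inside a cone of the given half-angle around the normal."""
    rng = random.Random(seed)
    cos_limit = math.cos(math.radians(half_angle_deg))
    hits = 0
    for _ in range(samples):
        # For uniform hemisphere sampling, the cosine of the polar angle
        # is itself uniformly distributed on [0, 1].
        cos_theta = rng.random()
        if cos_theta > cos_limit:
            hits += 1
    return hits / samples

def exact_probability(half_angle_deg):
    """Cone solid angle 2*pi*(1 - cos a) divided by the hemisphere's 2*pi."""
    return 1.0 - math.cos(math.radians(half_angle_deg))

large = hit_probability(30.0)
small = hit_probability(5.0)
print("large light: %.4f, small light: %.4f" % (large, small))
```

In this setup the exact ratio is about 35 to 1 in favor of the large light, so the small light's contribution is built from far fewer samples and stays noisy for longer.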
The first thing that may come to mind is to combine the two renders in post-processing.
However, Cycles has an option, “Sample as lamp”, which is enabled by default. If you untick it, some rays will be reflected from objects in a random direction rather than towards the light source (pure path tracing). In this case the small light source gains and the big one loses a little. I think this is a temporary workaround, and sooner or later the program will be improved to solve this problem.
Oranges vs tomatoes
Let’s compare Cycles with Maxwell. The resolution is 400×300 and the render time is 10 seconds:
Anyway, Maxwell looks much more alive.
In Maxwell, the algorithm takes care of distributing the load itself.
The caustics in Cycles are very noisy (caustics can be turned off), because it lacks Metropolis sampling (an algorithm for optimizing ray paths, which Maxwell Render has).
It should be noted that with environment lighting, or with one large light source, the image in Cycles is considerably cleaner than in Maxwell.
This one was rendered in 5 seconds.
Here is something a little more serious (Core i5, 1 min).
BVH (Bounding Volume Hierarchy)
To be honest, different renderers call this pre-render preparation by different names: mesh compilation in Octane, voxelisation in Maxwell. I call it voxelisation, like many other people who work with Maxwell. It was invented so that each ray does not have to be tested for intersection against every triangle in the scene. What if there are millions of them? Every one would have to be checked for intersection. In that case we would hardly see more than a couple of samples per second, and with each new triangle the task would get harder.
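To make the idea concrete, here is a minimal bounding volume hierarchy in Python. It is a sketch under stated simplifications, not how Cycles builds its BVH: the primitives are small axis-aligned boxes standing in for triangles, the tree is built with a plain median split, and all names and scene sizes are made up. It counts how many box tests one ray needs with and without the tree.

```python
import random

def union(a, b):
    """Smallest box enclosing boxes a and b; a box is (min_xyz, max_xyz)."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def ray_hits_box(origin, direction, box):
    """Slab test: does the ray origin + t*direction (t >= 0) cross the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box[0], box[1]):
        if d == 0.0:
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def build(boxes, leaf_size=4):
    """Build the tree with a median split along the longest axis."""
    bounds = boxes[0]
    for b in boxes[1:]:
        bounds = union(bounds, b)
    if len(boxes) <= leaf_size:
        return {"bounds": bounds, "leaf": boxes}
    axis = max(range(3), key=lambda i: bounds[1][i] - bounds[0][i])
    ordered = sorted(boxes, key=lambda b: b[0][axis] + b[1][axis])
    mid = len(ordered) // 2
    return {"bounds": bounds,
            "left": build(ordered[:mid], leaf_size),
            "right": build(ordered[mid:], leaf_size)}

def traverse(node, origin, direction, stats):
    """Return the primitive boxes hit by the ray, skipping whole subtrees."""
    stats["tests"] += 1
    if not ray_hits_box(origin, direction, node["bounds"]):
        return []
    if "leaf" in node:
        stats["tests"] += len(node["leaf"])
        return [b for b in node["leaf"] if ray_hits_box(origin, direction, b)]
    return (traverse(node["left"], origin, direction, stats)
            + traverse(node["right"], origin, direction, stats))

rng = random.Random(0)
boxes = []
for _ in range(2000):
    p = tuple(rng.uniform(0.0, 100.0) for _ in range(3))
    boxes.append((p, tuple(c + 5.0 for c in p)))

origin, direction = (-10.0, 37.0, 61.0), (1.0, 0.0, 0.0)
stats = {"tests": 0}
bvh_hits = traverse(build(boxes), origin, direction, stats)
brute_hits = [b for b in boxes if ray_hits_box(origin, direction, b)]
print("hits:", len(bvh_hits), "tests with BVH:", stats["tests"],
      "tests without:", len(boxes))
```

Both approaches find the same hits, but the tree lets the ray discard most of the scene after testing only a handful of enclosing boxes. The price is the build step, which has to be paid before the first ray is traced.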
The disadvantage is that Cycles always builds the BVH on the CPU. Perhaps voxelisation on the GPU will appear someday, but until then there are some limitations. For example, suppose you have a scene of 10 million triangles and 8 top-end graphics cards. They will render the image in a matter of seconds, whereas voxelisation may take more than a minute, even on a Core i7. If you use only the Core i7, voxelisation will take about a minute and rendering about 20-30 minutes. In that case, the voxelisation time does not matter.
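The effect of a fixed BVH build time can be put into numbers with a few lines of Python. The 60 seconds and the 25 minutes follow the rough figures above; the 10 seconds standing in for "a matter of seconds" is an assumed placeholder, and the function name is made up.

```python
def bvh_share(bvh_seconds, render_seconds):
    """Fraction of the total job time spent building the BVH."""
    return bvh_seconds / (bvh_seconds + render_seconds)

multi_gpu = bvh_share(60.0, 10.0)        # 8 GPUs: render in ~10 s (assumed)
cpu_only = bvh_share(60.0, 25 * 60.0)    # Core i7 only: render in ~25 min
print("multi-GPU: %.0f%%, CPU only: %.0f%%" % (multi_gpu * 100, cpu_only * 100))
```

With these numbers the build dominates the multi-GPU job, while in the CPU-only job it is only a few percent of the total.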
Voxelisation of the car above (400k triangles) takes 14 seconds.
In the interactive preview, voxelisation is done only before rendering starts and when the geometry of an object changes. It is also done when you press Ctrl+Z, which I think is just a teething problem.
When rendering (F12), voxelisation is always done. For animation, you can avoid constantly rebuilding the BVH of static objects by ticking the Cache BVH checkbox.
Let’s hope this will soon be resolved in a way that speeds up voxelisation.
OpenCL disappointed me on my Nvidia card: CUDA is twice as fast. On Ubuntu, Blender simply crashes with OpenCL. On Windows 7, OpenCL does not render properly either: if a material is composed of several layers, only one of them is shown, for example the glossy or the matte component. There are also a lot of bugs in the viewport.
Radeon cards do not have such bugs so far.
While rendering on the CPU, browsing the web is no problem, whereas with the GPU at full load the computer is comfortable only for reading.
Maybe there are ways to change the priority of tasks on the GPU, but I do not know of any.
You can start right now: download Blender and run Cycles at home. To select the GPU, go to File -> User Preferences, select the System tab at the top, and choose the render platform at the bottom left (CPU is the default).
Today, Cycles is good enough for visualization.
I think it would be nice to use it for product visualization: on the basis of Cycles, you could build your own Bunkspeed Shot, HyperShot, KeyShot, or Autodesk Showcase.
The enthusiasm of developers and the active open-source community make me happy.
I am looking forward to further development of this project.