Thursday, April 17, 2014

Octahedron normal vector encoding

Many rendering techniques benefit from encoding normal (unit) vectors. For example, in deferred shading G-buffer space is a limited resource. Additionally, it's nice to be able to encode world space normals with uniform precision. Some encoding techniques work only for view space normals, because they use variable precision depending on normal direction.

World space normals have some nice properties - they don't depend on the camera. This means that on static objects specular and reflections won't wobble as the camera moves (imagine an FPS game with slight camera movement at idle). Additionally, their precision doesn't depend on the camera. This is important, because sometimes we need to deal with normals pointing away from the camera - for example because of normal maps and perspective correction, or because we calculate lighting for the back side (subsurface scattering).

Octahedron-normal vectors [MSS*10] are a simple and clever extension of octahedron environment maps [ED08]. The idea is to encode normals by projecting them onto an octahedron, folding it and placing it on one square. This gives some nice properties like a quite uniform value distribution and low encoding and decoding cost.

I compared octahedron to storing 3 components (XYZ) and spherical coordinates. Not a very scientific approach - I just rendered some shiny reflective spheres. Normals were stored in world space in a R8G8B8A8 render target. This post also contains complete source code (which unfortunately isn't provided in the original paper), so you can paste it into your engine and see for yourself how this compression looks in practice.


XYZ
float3 Encode( float3 n )
{
    return n * 0.5 + 0.5;
}

float3 Decode( float3 encN )
{
    return encN * 2.0 - 1.0;
}



Spherical coordinates
float2 Encode( float3 n )
{
    float2 encN;
    encN.x = atan2( n.y, n.x ) * MATH_INV_PI; // MATH_INV_PI = 1 / MATH_PI
    encN.y = n.z;

    encN = encN * 0.5 + 0.5;
    return encN;
}

float3 Decode( float2 encN )
{
    float2 ang = encN * 2.0 - 1.0;

    float2 scth;
    sincos( ang.x * MATH_PI, scth.x, scth.y );
    float2 scphi = float2( sqrt( 1.0 - ang.y * ang.y ), ang.y );

    float3 n;
    n.x = scth.y * scphi.x;
    n.y = scth.x * scphi.x;
    n.z = scphi.y;
    return n;
}
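For quick verification outside the shader, here's a CPU round-trip check of the spherical mapping - a C++ port of the HLSL above (the port itself is mine, not from the original post):

```cpp
#include <cassert>
#include <cmath>

const float MATH_PI = 3.14159265f;

struct Float3 { float x, y, z; };

// C++ port of the HLSL spherical Encode above: angle around Z plus raw z.
void SphericalEncode( const Float3& n, float& ex, float& ey )
{
    ex = std::atan2( n.y, n.x ) / MATH_PI * 0.5f + 0.5f;
    ey = n.z * 0.5f + 0.5f;
}

// C++ port of the HLSL spherical Decode above.
Float3 SphericalDecode( float ex, float ey )
{
    float angX = ( ex * 2.0f - 1.0f ) * MATH_PI;
    float z    = ey * 2.0f - 1.0f;
    float r    = std::sqrt( 1.0f - z * z ); // radius of the horizontal circle at height z
    return { std::cos( angX ) * r, std::sin( angX ) * r, z };
}
```

For a unit input the round trip recovers the normal up to float precision, so the dot product between input and output stays essentially 1.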


Octahedron-normal vectors
float2 Encode( float3 n )
{
    n /= ( abs( n.x ) + abs( n.y ) + abs( n.z ) + eps ); // eps ~= 0.0012
    n.xy = n.z >= 0.0 ? n.xy : ( 1.0 - abs( n.yx ) ) * ( n.yx >= 0.0 ? 1.0 : -1.0 );
    n.xy = n.xy * 0.5 + 0.5;
    return n.xy;
}

float3 Decode( float2 encN )
{
    encN = encN * 2.0 - 1.0;

    float3 n;
    n.z  = 1.0 - abs( encN.x ) - abs( encN.y );
    n.xy = n.z >= 0.0 ? encN.xy : ( ( encN.yx >= 0.0 ? 1.0 : -1.0 ) - encN.yx );
    n = normalize( n );
    return n;
}
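The octahedron mapping round-trips the same way. Below is a C++ port of the two HLSL functions above (my port, following the post's folding convention exactly), plus an 8-bit quantizer to simulate storage in a R8G8B8A8 render target:

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

static float SignNZ( float v ) { return v >= 0.0f ? 1.0f : -1.0f; }

// C++ port of the HLSL octahedron Encode above.
void OctEncode( const Float3& n, float& ex, float& ey )
{
    float invL1 = 1.0f / ( std::fabs( n.x ) + std::fabs( n.y ) + std::fabs( n.z ) );
    float nx = n.x * invL1;
    float ny = n.y * invL1;
    float nz = n.z * invL1;
    if ( nz < 0.0f ) // fold the lower pyramid onto the outer triangles
    {
        float fx = ( 1.0f - std::fabs( ny ) ) * SignNZ( ny );
        float fy = ( 1.0f - std::fabs( nx ) ) * SignNZ( nx );
        nx = fx;
        ny = fy;
    }
    ex = nx * 0.5f + 0.5f;
    ey = ny * 0.5f + 0.5f;
}

// C++ port of the HLSL octahedron Decode above.
Float3 OctDecode( float ex, float ey )
{
    float px = ex * 2.0f - 1.0f;
    float py = ey * 2.0f - 1.0f;
    float nz = 1.0f - std::fabs( px ) - std::fabs( py );
    float nx = px;
    float ny = py;
    if ( nz < 0.0f ) // unfold
    {
        nx = SignNZ( py ) - py;
        ny = SignNZ( px ) - px;
    }
    float len = std::sqrt( nx * nx + ny * ny + nz * nz );
    return { nx / len, ny / len, nz / len };
}

// Simulate storing a channel in an 8-bit render target.
float Quantize8( float v ) { return std::round( v * 255.0f ) / 255.0f; }
```

Even with 8-bit quantization the decoded normal stays within a fraction of a degree of the input, including directions with negative z (the folded half).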


Conclusion
Spherical coordinates have bad value distribution and bad performance. The distribution can be fixed by using some kind of spiral [SPS12]. Unfortunately it still requires costly trigonometry, and quality is only marginally better than octahedron encoding.

One other method worth mentioning is Crytek's best fit normals [K10]. It provides extreme precision, but on the other hand it won't save any space in the G-Buffer, as it still requires 3 components. Also, encoding uses a 512x512 lookup texture, so it's quite expensive.

Octahedron encoding uses a low number of instructions, and there are only two non-full rate instructions (calculated on the "transcendental unit"): one rcp during encoding and one rcp during decoding. In addition the quality is quite good. In conclusion, octahedron-normal vectors have a great quality to performance ratio and blow old methods like spherical coordinates out of the water.

UPDATE: As pointed out by Alex in the comments, an interesting and detailed normal encoding technique survey was just released [CDE*14].


References
[MSS*10] Q. Meyer, J. Süßmuth, G. Sußner, M. Stamminger, G. Greiner - "On Floating-Point Normal Vectors", Computer Graphics Forum 2010
[ED08] T. Engelhardt, C. Dachsbacher - "Octahedron Environment Maps", VMV 2008
[K10] A. Kaplanyan - "CryENGINE 3: Reaching the speed of light", Siggraph 2010
[SPS12] J. Smith, G. Petrova, S. Schaefer - "Encoding Normal Vectors using Optimized Spherical Coordinates", Computers & Graphics 2012
[CDE*14] Z. H. Cigolle, S. Donow, D. Evangelakos, M. Mara, M. McGuire, Q. Meyer - "A Survey of Efficient Representations for Independent Unit Vectors", JCGT 2014

Saturday, May 25, 2013

Simple GPUView custom event markers

GPUView is a powerful tool for GPU/CPU interaction profiling on Windows. Its interface isn't very user friendly, but it gets the job done. I used it for optimizing an in-house GPU lightmapper and spent some time trying to find a way to add custom event markers. Custom markers are quite essential when you try to profile and understand complex interactions.

Most solutions on the web are quite complicated - involving writing strange DLLs, manifests, using ECManGen.exe... Thankfully there is a much simpler solution.

First register an event handler using a custom GUID:
REGHANDLE gEventHandle;
GUID guid;
UuidFromString( (RPC_CSTR) "a9744ea3-e5ac-4f2f-be6a-42aad08a9c6f", &guid );
EventRegister( &guid, nullptr, nullptr, &gEventHandle );
Then just call EventWriteString with custom text:
EventWriteString( gEventHandle, 0, 0, L"Render" );
The final step is to modify log.cmd in order to add this custom GUID for tracing (the same one which was passed to EventRegister). Just pass it as a new Xperf parameter (see the TRACE_DSHOW or TRACE_DX variables for reference).

During the next GPUView profiling session open the "Event Listing" dialog and locate the custom event by GUID:


I guess it should also work for Xperf and other tools which use Windows event tracing. For better integration look up "Writing an Instrumentation Manifest" and ECManGen.exe on MSDN.

Monday, March 25, 2013

LA Noire

LA Noire has some amazing tech for face animations. Basically, actors are filmed by multiple cameras and the resulting footage is converted to a keyframed animation and animated textures. All the textures are captured in neutral lighting conditions, so the lighting usually doesn't fit the in-game environment. It looks like those textures are animated at around 3 frames per second. Eyes are animated separately and at a higher rate. This approach also has some interesting "artifacts", as it's impossible to capture everything during one day. For example you can see hair shift and change when blending between two performances captured on different days:


More info:
LA Noire face tech animation trailer
LA Noire tech description by IQGamer
MotionScan website

Saturday, October 6, 2012

Unreal Engine 4 gaussian specular normalization

Recently Epic did a nice presentation about their new tech: "The technology behind Unreal 4 Elemental demo". Among a lot of impressive stuff they showed their gaussian specular approximation. Here is a BRDF with the U4 specular for Disney's BRDF explorer:
analytic

::begin parameters
float n 1 512 100
bool normalized 1
::end parameters

::begin shader

vec3 BRDF( vec3 L, vec3 V, vec3 N, vec3 X, vec3 Y )
{
    vec3 H = normalize( L + V );
    float Dot = clamp( dot( N, H ), 0.0, 1.0 );
    float Threshold = 0.04;
    float CosAngle = pow( Threshold, 1.0 / n );
    float NormAngle = ( Dot - 1.0 ) / ( CosAngle - 1.0 );
    float D = exp( -NormAngle * NormAngle );

    if ( normalized )
    {
        D *= 0.17287429 + 0.01388682 * n;
    }
 
    return vec3( D );
}

::end shader
This approximation was tweaked to have less aliasing than the standard Blinn-Phong specular (it has a smoother falloff):

 
 

The mentioned presentation doesn't include a normalization factor for it. It was a nice excuse for spending some time with Mathematica and trying to derive it myself.

The basic idea of a normalization factor is that lighting needs to be energy conserving (outgoing energy can't be greater than incoming energy). This means that the integral of the BRDF times cos(theta) over the upper hemisphere can't exceed 1, or more specifically in our case we want it to be equal to 1:


The highest values occur when the light direction equals the normal (L=N). This means that we can replace dot(N,H) with cos(theta/2), as now the angle between H (the halfway vector) and N equals half of the angle between L and N. This greatly simplifies the integral. Now we can replace f(l,v) with the U4 gaussian approximation:


Unfortunately neither I nor Mathematica could solve it analytically, so I had to calculate values numerically and try to fit various simple functions over the range [1;512]. The best approximation I could find was: 0.17287429 + 0.01388682 * n, where n is the Blinn-Phong specular power.
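As a sanity check (this program is mine, not from the original derivation), the hemisphere integral for the L=N case can be evaluated with Simpson's rule and compared against the fit 0.17287429 + 0.01388682 * n:

```cpp
#include <cassert>
#include <cmath>

// Numerically evaluate 2*pi * Int_0^{pi/2} D(theta) * cos(theta) * sin(theta) dtheta
// for the U4 gaussian lobe with L = N, where dot(N,H) = cos(theta/2).
double GaussLobeIntegral( double n )
{
    double const pi = 3.14159265358979323846;
    double const cosAngle = std::pow( 0.04, 1.0 / n );
    int const steps = 2048; // must be even for Simpson's rule
    double const h = ( pi / 2.0 ) / steps;
    double sum = 0.0;
    for ( int i = 0; i <= steps; ++i )
    {
        double theta = i * h;
        double normAngle = ( std::cos( theta * 0.5 ) - 1.0 ) / ( cosAngle - 1.0 );
        double d = std::exp( -normAngle * normAngle );
        double f = d * std::cos( theta ) * std::sin( theta );
        double w = ( i == 0 || i == steps ) ? 1.0 : ( i % 2 ? 4.0 : 2.0 );
        sum += w * f;
    }
    return 2.0 * pi * sum * h / 3.0;
}

// The fitted normalization factor from the post.
double FittedNormalization( double n )
{
    return 0.17287429 + 0.01388682 * n;
}
```

For moderate-to-high specular powers the product of the fitted factor and the numerical integral stays close to 1, which is exactly the energy conservation condition above.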



As you can see it isn't accurate for small specular power values, but on the other hand it's very fast, and specular powers below 16 aren't used often.

Monday, June 4, 2012

Visual C++ linker timestamp

A nice trick to get the build's timestamp at runtime (Visual C++ only):
EXTERN_C IMAGE_DOS_HEADER __ImageBase;

(...)

IMAGE_NT_HEADERS const* ntHeader 
    = (IMAGE_NT_HEADERS*) ( (char*) &__ImageBase + __ImageBase.e_lfanew );
DWORD const timeStamp = ntHeader->FileHeader.TimeDateStamp;
It's not very portable, but it doesn't require any additional recompilation (as the __DATE__ and __TIME__ macros do).
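The TimeDateStamp field holds the number of seconds since the Unix epoch (1970-01-01 UTC), so once you have the DWORD it's easy to print in a readable form. A small portable sketch (the formatting helper is mine, for illustration):

```cpp
#include <cstdint>
#include <cstring>
#include <ctime>

// Format a PE TimeDateStamp (seconds since the Unix epoch, UTC) as YYYY-MM-DD.
void FormatBuildDate( std::uint32_t timeStamp, char* buf, std::size_t bufSize )
{
    std::time_t t = static_cast<std::time_t>( timeStamp );
    std::tm* utc = std::gmtime( &t ); // interpret as UTC, matching the PE spec
    std::strftime( buf, bufSize, "%Y-%m-%d", utc );
}
```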

Sunday, May 6, 2012

PixelJunk Eden data extractor

Recently Q-Games ported one of their games - PixelJunk Eden - to Windows. It's actually their first Windows release. From a tech perspective it's not as impressive as the PixelJunk Shooter series, but it was still quite interesting to poke around this game's files. Unfortunately game data is stored in a custom format and encrypted. I wonder why people waste time encrypting game data.

I had to reverse engineer the data files and write a simple extractor program. You can download the sources here. The rest of the post contains information about the encryption and file formats.

Files are encrypted using a homemade xor encryption scheme:
int seed = fileSize + 0x006FD37D;
for ( unsigned i = 0; i < fileSize; ++i )
{
    int const xorKey = seed * ( seed * seed * 0x73 - 0x1B ) + 0x0D;
    fileByteArr[ i ] ^= xorKey;
    ++seed;
}
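Since it's a plain xor stream, the exact same loop both decrypts and encrypts - applying it twice restores the original bytes. A self-contained round-trip check (my rewrite uses unsigned arithmetic, which yields the same low bits as the int version but without signed-overflow undefined behavior; the buffer contents are made up for illustration):

```cpp
#include <cstring>

// The game's xor scheme from above: the key stream depends only on file size and position.
void XorCrypt( unsigned char* data, unsigned fileSize )
{
    unsigned seed = fileSize + 0x006FD37Du;
    for ( unsigned i = 0; i < fileSize; ++i )
    {
        unsigned const xorKey = seed * ( seed * seed * 0x73u - 0x1Bu ) + 0x0Du;
        data[ i ] ^= (unsigned char) xorKey; // only the low byte affects the data
        ++seed;
    }
}
```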

Game data is stored in lump_x_x.pak files, with descriptions in the lump.idx file. Lump.idx consists of one header, multiple lump_x_x.pak descriptors and multiple packed file descriptors. All files (*.idx and *.pak) are encrypted using the xor scheme mentioned above.

lump.idx header:
struct IndexHeader
{
    char     m_magic[ 4 ];      // "PACK"
    unsigned m_unknown0;
    unsigned m_unknown1;
    unsigned m_packedFileMaxID; // packed file num - 1
    unsigned m_lumpFileMaxSize;
    unsigned m_lumpFileNum;
    char     m_align[ 230 ];
};

lump.idx file descriptors of lump_x_x.pak files:
struct IndexLumpDesc
{
    unsigned      m_unknown;
    unsigned      m_lumpSize;
    unsigned char m_lumpPartID;
    unsigned char m_lumpID;
    char          m_align[ 2 ];
};

lump.idx file descriptors of packed files:
struct IndexFileDesc
{
    char     m_filename[ 120 ];
    unsigned m_offset;
    unsigned m_size;
};

lump_x_x.pak files contain the packed files at the specified offsets. Every packed file is stored at an offset aligned to 128 bytes.

Sunday, February 13, 2011

Virtual memory on PC

There is an excellent post about virtual memory. It's written mainly from the perspective of a console developer. On consoles most memory issues are TLB misses and the physical memory limit. I'll try to write more about how (bad) it looks on PC (Windows) with 32-bit programs, especially nowadays when games require more and more data.

Firstly, half of the program's virtual address space is taken by the kernel. This means that the pointer's first bit is unused, so it can be used for some evil trickery :). Moreover the first and last 64kb are reserved by the kernel.

The program's code and heap have to be loaded somewhere. When compiling using VC++ the default image base is 0x00400000. Then a bunch of DLLs are loaded at various virtual memory addresses. You can check which DLLs are loaded, at what addresses, and their sizes using Dependency Walker. Use the start profiling feature to see the real virtual memory address of a given DLL. DLLs and the program usually aren't loaded into one contiguous address range. At this point we haven't called new/malloc even once, and virtual memory is already fragmented.

Next comes the video driver. It will use precious virtual memory for managed resources, the command buffer, and temporarily for locking non-managed resources. Creating/locking non-managed resources is quite misleading here, as DirectX returns "out of video memory" instead of "out of virtual memory". It's very tempting to put all static level geometry into one 100mb non-managed vertex buffer, but when creating/filling this VB the video driver will try to allocate a contiguous 100mb chunk of virtual memory. This will likely result in a program crash after some time.

Windows uses 4kb pages, so doing smaller allocations leads to internal fragmentation. I guess everyone is already using some kind of custom memory allocator, so it isn't a problem.

There is the /LARGEADDRESSAWARE linker flag, which allows using an additional 1gb of virtual memory. It requires the user to change boot parameters and usually doesn't work well in practice (system stability issues etc.). It's also possible to compile as a 64-bit program, but according to the Steam HW survey half of gamers use a 32-bit OS. It's really annoying that MS is still making 32-bit systems, because current min spec PC game CPUs are Core 2 or similar, with 64-bit support.

Summarizing: in theory memory shouldn't be a problem on PC, but in practice it's a precious and fragile resource.