Currently, Open MW's basic file access speed is limited by a peculiar layer of virtualisation in BSAFile's interface. This PR removes such virtualisation by properly separating BSAFile from CompressedBSAFile in low level contexts.
We now use PositionAttitudeTransform for unmerged nodes because I have been informed PositionAttitudeTransform's scene graph performance is measurably faster than MatrixTransform's. We still need to use MatrixTransform for merged nodes because the Optimizer has limited support for non-MatrixTransform nodes. These MatrixTransforms will be removed in the merging process anyway.
We skip this during node path iterations. this is not a node we are interested in.
We avoid allocating a new mGeomToSkelMatrix per frame and avoid a ref_ptr associated with its update.
We speed up a search for the Skeleton node by adding a continue; condition prior to an expensive dynamic_cast.
This is the value used by vanilla Morrowind engine. At this angle player
character starts sliding down the slope.
This fixes navigation and movement issue in Mournhold, Godsreach.
For some reason we have commented std::move keywords in keywordsearch.hpp while they are quite beneficial. openmw_test_suite for keywordsearch.hpp takes 30% less time with these changes.
We can expect marginally improved loading times with this PR. Drawable, Transform and Node counts in stats panels are expected to remain unchanged - this PR does not add new scene graph optimisations, it just increases the speed with which we apply existing ones.
1. We add explicit `NodeVisitor::apply` overrides for commonly encountered node types to avoid additional virtual function calls per node associated with the default `apply` implementation.
2. We skip pushing `StateSet`s when `_mergeAlphaBlending` is enabled or the `StateSet` contains no relevant state.
3. We add a specialised variant of `CollectLowestTransformsVisitor::addTransform` accepting `MatrixTransform` to avoid matrix copies and multiplications.
With this PR we optimise a function that is called quite often when loading new cells.
We remove avoidable dynamic_casts.
We remove an unused pair.second element.
We convert a map to an unordered_map because its ordering is irrelevant in this case.
We avoid adding the root Skeleton node to the bones' node path.
Currently, we create a new ComputeLightSpaceBounds visitor per frame. Within this visitor, we require excessive memory allocations, mainly a new osg::RefMatrix per encountered Transform node.
With this PR we reuse a single ComputeLightSpaceBounds visitor across frames and enable the createOrReuseMatrix functionality to avoid allocating new matrices every frame. osgUtil::CullVisitor internally uses the same approach.
Currently, we run culling tests against all lights in the scene during LightListCallback::pushLightState. We can avoid most of these tests by removing off-screen lights at an earlier stage. We should benchmark the cumulative time spent within LightListCallback::pushLightState before and after this PR.
As we discovered in #3148, `Transform` nodes and their low level equivalence `pushModelViewMatrix` are somewhat costly involving a `Matrix::invert` operation per frame.
With this PR we avoid one `Transform` node for sun flashes and avoid another `pushModelViewMatrix` call in case the sun is fully visible.
With this PR we test out osg's shader define system for a somewhat harmless feature. As we can see, our code becomes more concise and efficient in this case. Most importantly, we no longer create unneeded vertex shader objects.