We moved from using only a vertex buffer to using both a vertex buffer and an index buffer. Take a square made of two triangles as an example. With only a vertex buffer, we need to write a position (2 floats = 8 bytes) and a color (4 uint8_t = 4 bytes) for 6 vertices, which takes 12 * 6 = 72 bytes. With a vertex buffer and an index buffer together, we write position and color for only 4 vertices (12 * 4 = 48 bytes) plus 6 indices (6 uint32_t = 24 bytes), which also adds up to 72 bytes. There is no difference here because we're only drawing two triangles. The savings show up when the number of primitives grows. Say we draw 10 triangles that share vertices so that only 10 unique vertices are needed: the vertex-only approach takes 12 * 30 = 360 bytes, while using both buffers takes 4 * 30 + 12 * 10 = 240 bytes. The gap only gets wider in real industry situations where there are hundreds or thousands of primitives. Besides the memory savings, it's also more intuitive and logical to separate the data from the indices into that data.
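The arithmetic above can be checked with a small sketch. The names `Vertex`, `NonIndexedBytes` and `IndexedBytes` are made up for illustration and are not taken from the actual solution; the layout assumes the 12-byte vertex and 32-bit indices described in the text.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative vertex layout: 2 floats for position, 4 bytes for color.
struct Vertex {
    float x, y;
    std::uint8_t r, g, b, a;
};
static_assert(sizeof(Vertex) == 12, "expected 8 bytes of position + 4 bytes of color");

// Vertex buffer only: every triangle stores its own three vertices.
constexpr std::size_t NonIndexedBytes(std::size_t triangleCount) {
    return triangleCount * 3 * sizeof(Vertex);
}

// Vertex + index buffer: each unique vertex is stored once,
// plus three 32-bit indices per triangle.
constexpr std::size_t IndexedBytes(std::size_t triangleCount,
                                   std::size_t uniqueVertexCount) {
    return uniqueVertexCount * sizeof(Vertex)
         + triangleCount * 3 * sizeof(std::uint32_t);
}

// The square: 2 triangles, 4 unique vertices.
static_assert(NonIndexedBytes(2) == 72, "6 vertices * 12 bytes");
static_assert(IndexedBytes(2, 4) == 72, "48 bytes of vertices + 24 bytes of indices");

// 10 triangles, assuming they share vertices so only 10 are unique.
static_assert(NonIndexedBytes(10) == 360, "30 vertices * 12 bytes");
static_assert(IndexedBytes(10, 10) == 240, "120 bytes of vertices + 120 bytes of indices");
```

How much the indexed version saves depends entirely on how many vertices are shared, which is why the unique vertex count is a separate parameter here.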
My approach to representing a mesh is to treat each mesh as a triangle, which takes 3 fixed indices and 3 user-defined vertices. Then everything becomes clear: from the user's point of view, we want to do things like "CreateTriangle(vertex0, vertex1, vertex2)", and I provide this interface to the user in my code. In our situation, it's not necessary to keep the vertices in the mesh struct/class, because they are copied into the vertex buffer and nothing changes after that. I still keep a vertex member in my mesh class in case of future use. On top of that, I keep a static vector containing all mesh instances so that they can be rendered every frame. OpenGL simply uses the vertex array id to find the vertex buffer that the relevant data was pushed into, but I still keep the vector as the common structure in the two implementations.
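A minimal sketch of the data one such triangle mesh carries, assuming the 12-byte vertex from earlier. `TriangleData` and `MakeTriangleData` are hypothetical names used only for illustration; the real class also owns the platform buffers, which are left out here.

```cpp
#include <array>
#include <cstdint>

// Illustrative vertex layout: position plus one byte per color channel.
struct Vertex {
    float x, y;
    std::uint8_t r, g, b, a;
};

// CPU-side data for one triangle mesh.
struct TriangleData {
    std::array<Vertex, 3> vertices;        // supplied by the user
    std::array<std::uint32_t, 3> indices;  // fixed: always 0, 1, 2
};

// Only the vertices come from the caller; the indices never vary.
inline TriangleData MakeTriangleData(const Vertex& v0,
                                     const Vertex& v1,
                                     const Vertex& v2) {
    TriangleData data;
    data.vertices = {{v0, v1, v2}};
    data.indices = {{0u, 1u, 2u}};
    return data;
}
```

Because the indices are fixed, the user-facing call only needs the three vertices, which is what keeps the interface so small.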
I was careless enough to miss the fact that Direct3D uses BGRA instead of RGBA for the color bytes. It took me more than half an hour to find that out. I then added preprocessor macros so that the vertex struct works as it should for both implementations. For memory management, I made the ctor and dtor private. The ctor is called when the user calls AddTriangleMesh(v0, v1, v2); the buffers are then allocated by calling the create-buffer functions which used to be in the Graphics implementations. Correspondingly, I have a shutdown function (cleanup might be a better name though...) which calls clear on the vector, which in turn calls every mesh dtor to release its buffers. For the Direct3D implementation, the direct3dDevice is needed for rendering. I pass it in through an initialize interface which takes a void pointer argument, and then reinterpret_cast it back to a direct3dDevice pointer in the implementation. I hope to learn better ways to do that.
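One way the byte-order switch could look is sketched below. The macro name `EXAMPLE_PLATFORM_D3D` is hypothetical and stands in for whatever the build actually defines for the Direct3D configuration; `MakeVertex` is likewise an illustrative helper, not part of the original code.

```cpp
#include <cstdint>

// EXAMPLE_PLATFORM_D3D is a hypothetical macro; it is assumed to be
// defined by the build system for the Direct3D configuration only.
struct Vertex {
    float x, y;
#if defined(EXAMPLE_PLATFORM_D3D)
    std::uint8_t b, g, r, a;  // Direct3D path: blue byte first in memory
#else
    std::uint8_t r, g, b, a;  // OpenGL path: red byte first in memory
#endif
};
static_assert(sizeof(Vertex) == 12, "layout must stay 12 bytes on both platforms");

// Callers always pass colors as R, G, B, A. Assigning by member name
// lets the struct definition decide the order in memory.
inline Vertex MakeVertex(float x, float y,
                         std::uint8_t r, std::uint8_t g,
                         std::uint8_t b, std::uint8_t a) {
    Vertex v;
    v.x = x;
    v.y = y;
    v.r = r;
    v.g = g;
    v.b = b;
    v.a = a;
    return v;
}
```

Since only the member order changes, code that fills in colors by name stays the same on both platforms.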
My draw function, DrawAll (poor name again...), is a static method in the mesh class which takes no arguments and returns a boolean to indicate the result.
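The lifecycle described above (Initialize, AddTriangleMesh, DrawAll, shutdown) can be sketched roughly as follows. This is not the actual class: the buffer creation, release and draw calls are replaced by placeholder comments, `Count` is a helper added only for illustration, and the sketch assumes the static vector stores pointers, since a private dtor cannot be called by the vector itself.

```cpp
#include <cstddef>
#include <vector>

// Illustrative vertex layout; the color order is platform specific.
struct Vertex {
    float x, y;
    unsigned char r, g, b, a;
};

class Mesh {
public:
    // Stores an opaque platform handle (a device pointer on Direct3D,
    // unused on OpenGL). A platform implementation can recover the typed
    // pointer with static_cast, which is enough for a void pointer.
    static bool Initialize(void* platformDevice) {
        s_device = platformDevice;
        return true;
    }

    // The only way to create a mesh, because the ctor is private.
    static Mesh* AddTriangleMesh(const Vertex& v0, const Vertex& v1,
                                 const Vertex& v2) {
        Mesh* mesh = new Mesh(v0, v1, v2);
        s_meshes.push_back(mesh);
        return mesh;
    }

    // Draws every registered mesh; returns false if any draw failed.
    static bool DrawAll() {
        bool allSucceeded = true;
        for (Mesh* mesh : s_meshes) {
            allSucceeded = mesh->Draw() && allSucceeded;
        }
        return allSucceeded;
    }

    // Destroys every mesh, which is where the buffers would be released.
    static void Shutdown() {
        for (Mesh* mesh : s_meshes) {
            delete mesh;
        }
        s_meshes.clear();
        s_device = nullptr;
    }

    static std::size_t Count() { return s_meshes.size(); }

private:
    Mesh(const Vertex& v0, const Vertex& v1, const Vertex& v2)
        : m_vertices{v0, v1, v2} {
        // Placeholder: platform-specific buffer creation would go here.
    }
    ~Mesh() {
        // Placeholder: platform-specific buffer release would go here.
    }
    Mesh(const Mesh&) = delete;
    Mesh& operator=(const Mesh&) = delete;

    // Placeholder: the platform-specific draw call would go here.
    bool Draw() const { return true; }

    Vertex m_vertices[3];  // kept for possible future use
    static std::vector<Mesh*> s_meshes;
    static void* s_device;
};

std::vector<Mesh*> Mesh::s_meshes;
void* Mesh::s_device = nullptr;
```

A typical frame loop under these assumptions would call `Mesh::Initialize(device)` once, `Mesh::AddTriangleMesh(v0, v1, v2)` for each triangle, `Mesh::DrawAll()` every frame, and `Mesh::Shutdown()` on exit.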
Here's a copy of the executable part of the solution: