The GBA had a pretty slow processor. The ARM7 (an ARM7TDMI clocked at about 16.8 MHz) is very nice; they just ran it slow and gave it next to no resources.
There is a reason why a lot of Nintendo games at that point and before were side-scrollers. HARDWARE. It is all done in hardware. You had multiple layers of tiles plus one or more sprites and the hardware did all the work to extract pixels from those tables and drive the display.
You build the tile set up front and then have a smallish memory that is the tile map. Want the lower left tile to be tile 7? You put a 7 in that memory location. Want the next tile over to be tile 19 in the tile set? You put a 19 there, and so on for each layer that you have enabled. For a sprite, you simply set the x/y position. You can also do scaling and rotation by setting some registers and the hardware takes care of the rest.
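To make the tile set/tile map split concrete, here is a minimal software sketch of the lookup the hardware performs for every pixel. The sizes (8x8 tiles, a 4x4 map, 32 tiles) and all the names (`tile_set`, `tile_map`, `fill_tile`, `set_tile`, `pixel_at`) are made up for illustration; they are not GBA registers or any real API, and the real hardware adds palettes, scrolling, and multiple layers on top of this.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_SIZE 8   /* pixels per tile edge (assumed for this sketch) */
#define MAP_W     4   /* tile map width in tiles (assumed) */
#define MAP_H     4   /* tile map height in tiles (assumed) */
#define NUM_TILES 32  /* number of tiles in the tile set (assumed) */

/* Tile set: the pixel data for every tile, built once up front. */
static uint8_t tile_set[NUM_TILES][TILE_SIZE][TILE_SIZE];

/* Tile map: one small entry per screen cell saying which tile goes there. */
static uint8_t tile_map[MAP_H][MAP_W];

/* Fill one tile of the tile set with a single color index. */
void fill_tile(unsigned tile, uint8_t color)
{
    assert(tile < NUM_TILES);
    for (unsigned y = 0; y < TILE_SIZE; y++)
        for (unsigned x = 0; x < TILE_SIZE; x++)
            tile_set[tile][y][x] = color;
}

/* "Want this cell to be tile 7? Put a 7 in that memory location." */
void set_tile(unsigned col, unsigned row, uint8_t tile)
{
    assert(col < MAP_W && row < MAP_H && tile < NUM_TILES);
    tile_map[row][col] = tile;
}

/* What the display hardware does per pixel: find the cell, look up the
 * tile number in the map, then fetch the pixel from that tile. */
uint8_t pixel_at(unsigned x, unsigned y)
{
    assert(x < MAP_W * TILE_SIZE && y < MAP_H * TILE_SIZE);
    uint8_t tile = tile_map[y / TILE_SIZE][x / TILE_SIZE];
    return tile_set[tile][y % TILE_SIZE][x % TILE_SIZE];
}
```

The point is that the game only ever touches `tile_map` (a handful of bytes) to change what is on screen; the per-pixel work in `pixel_at` is what the hardware does for free.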
The GBA also had pixel (bitmap) modes: modes 3, 4, and 5. ("Mode 7" is the SNES's rotating/scaling background mode, not a GBA pixel mode.) These work like a traditional video card: you write values that set the color of each pixel and the hardware takes care of the video refresh. Modes 4 and 5 give you two frames to ping-pong between, so when you have a new frame ready you flip to it. Again, the processor was fairly underclocked for that day and age and didn't have too many fast resources. So while some games used the bitmap modes, a lot were tile-based side-scrollers...
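The ping-pong idea can be sketched in a few lines. This is a simulation with plain arrays, not GBA code: the 240x160 size matches the GBA screen, but the names (`pages`, `front`, `put_pixel`, `displayed_pixel`, `flip_pages`) are hypothetical, and on real hardware the flip would be a register write, normally timed to vertical blank, rather than a variable update.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 240  /* GBA screen width in pixels */
#define FB_H 160  /* GBA screen height in pixels */

/* Two full frames: one is being shown while the other is being drawn. */
static uint8_t pages[2][FB_W * FB_H];

/* Index of the page the (simulated) display is currently scanning out. */
static int front = 0;

/* Draw into the hidden (back) page only, so the viewer never sees a
 * half-finished frame. */
void put_pixel(unsigned x, unsigned y, uint8_t color)
{
    assert(x < FB_W && y < FB_H);
    pages[front ^ 1][y * FB_W + x] = color;
}

/* What the display is showing right now at (x, y). */
uint8_t displayed_pixel(unsigned x, unsigned y)
{
    assert(x < FB_W && y < FB_H);
    return pages[front][y * FB_W + x];
}

/* Swap roles: the finished back page becomes visible, and the old front
 * page becomes the new drawing surface. No pixels are copied. */
void flip_pages(void)
{
    front ^= 1;
}
```

Nothing drawn with `put_pixel` is visible until `flip_pages` is called, and the flip itself costs almost nothing, which matters a lot on a slow processor.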
If you want a high frame rate, you need to design the solution for it. You can't just take any old display you find and talk to it via SPI or I²C or something like that. Put at least one framebuffer in front of it, ideally two, and have row and column control over that display if possible.
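A quick back-of-the-envelope calculation shows why the link matters. The sketch below computes the raw bit rate a full-frame refresh needs and the best-case frame rate a given serial link can deliver. The function names are made up for this example, it ignores all command and protocol overhead (so real numbers are worse), and the link speeds in the usage note are illustrative assumptions, not measurements of any particular display.

```c
#include <stdint.h>

/* Raw bits per second needed to push every pixel of every frame. */
uint64_t required_bits_per_second(uint32_t width, uint32_t height,
                                  uint32_t bits_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bits_per_pixel * fps;
}

/* Best-case whole frames per second over a link of link_bps bits/second,
 * assuming zero overhead and a full-screen redraw every frame. */
uint32_t max_fps(uint32_t width, uint32_t height,
                 uint32_t bits_per_pixel, uint64_t link_bps)
{
    uint64_t bits_per_frame = (uint64_t)width * height * bits_per_pixel;
    return (uint32_t)(link_bps / bits_per_frame);
}
```

For a GBA-sized 240x160 screen at 16 bits per pixel, `required_bits_per_second(240, 160, 16, 60)` is 36,864,000, so roughly 37 Mbit/s for 60 frames per second. With an assumed 10 MHz SPI clock, `max_fps(240, 160, 16, 10000000)` gives 16, and with an assumed 400 kHz I²C bus it gives 0. That is before any overhead, which is why the interface has to be part of the design.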
A number of the displays I suspect you are buying have a controller on them that you are actually talking to. If you want GBA/console type performance you create/implement the controller. Or you buy/build with a GPU/video chip/logic blob, and use HDMI or other common interface into a stock monitor.
Just because a bicycle has tires and a chain and gears doesn't mean it can go as fast as a motorcycle. You need to design the system to meet your performance needs, end to end. You can put that bicycle wheel on that motorcycle, but it won't perform as desired; all of the components have to be part of the overall design.
Asteroids worked this way too; it only needed one 6502. The vector graphics were done with separate logic; the 6502 sent a tiny string of data to the vector graphics controller, which used a ROM and that data to do the xy plotting of the beam and z, on/off... Some standups had separate processors to handle audio and video separate from the processor computing the game. Of course today the video is handled by some hundreds, if not thousands, of processors that are separate from the main processor...