Bubble Universe performance (short Basic graphical demo)
Over on stardot I see there are varying reports of the performance of Brandy Basic.
It's an impressive demo. I think its exact output depends on the speed and accuracy of trig on largish arguments, for example SIN(200). The various efforts in that thread have produced similar kinds of output, but they differ in detail.
Comments
-
That is a nice demo. Using a slightly tweaked version of mike12f's code on page 2 of that thread, I am getting about 34fps on my virtual machine hosted by a pair of 8-year-old Xeons.
For completeness, here is that code, with my tweaks:

    n=200
    r=PI*2/235
    MODE 640,512,24
    x=0:y=0:v=0:t=0.22
    s=240
    VDU 23,1,0;0;0;0;
    VDU 29,640;512;
    *REFRESH ONERROR
    TIME=0
    frame_counter%=0
    REPEAT
      CLS
      FOR i=0 TO n STEP 2
        FOR j=0 TO n STEP 2
          u=SIN(i+v)+SIN(r*i+x)
          v=COS(i+v)+COS(r*i+x)
          x=u+t
          r%=(i/200+0.5) MOD 2
          g%=(j/200+0.5) MOD 2
          c%=r%+g%*2
          IF c%=0 c%=4
          REM GCOL 0,c%
          GCOL i+50,j+50,99+50
          REM PLOT 69,u*s,v*s
          CIRCLE FILL u*s,v*s,2
        NEXT j
      NEXT i
      t+=0.025
      frame_counter%+=1
      PRINT"fps:";(frame_counter%/TIME*100)
      *REFRESH
    UNTIL 0
The MODE change to 640,512,24 selects a 24-bit equivalent of MODE 21 (MODE 21 itself actually runs slower, probably because pixel colours have to be adjusted to fit the 256-colour palette). The speed-up certainly comes from *REFRESH having to update a smaller screen area: MODE 73 is 1024x576.
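To put a rough number on the smaller screen area: assuming the cost of a *REFRESH is roughly proportional to the number of pixels copied (a simplification), the two modes compare like this:

```latex
\begin{aligned}
640 \times 512 &= 327\,680 \text{ pixels} \\
1024 \times 576 &= 589\,824 \text{ pixels} \\
\frac{589\,824}{327\,680} &= 1.8
\end{aligned}
```

So each full-screen refresh in the 640x512 mode moves about 44% fewer pixels than in MODE 73.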
The *REFRESH ONERROR change simply sets a flag so that on any error condition (including hitting Escape) the *REFRESH mode changes to ON.
-
I've just checked in a change (git commit 563c3c2) that uses an optimised path for full-screen blitting in 1:1 pixel modes. With this, I now get 50fps on my old Xeon and 5.67fps on the RasPi3B.
-
A nice increase! What's with the CIRCLE command - is that just plotting a point?
-
I have no idea! It's almost a straight copy & paste of mike12f's program from the stardot thread, with my minor tweaks.
-
Oh... ah, yes, I see it now. "To boost the brightness" - but even a small circle sounds to me like an expensive thing to paint.
-
Regarding the increase in speed: it was your post about the PiTubeDirect doing block copies that made me think. It occurred to me that blitting the full screen is a frequently used operation, and I have a routine that copies pixel by pixel, scaling if necessary, from the working screen to the display buffer. So I put in a shortcut: if there is no scaling and it's the full screen, I just memcpy() from one to the other, and that produced this increase!
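The shortcut described above can be sketched in C. This is a minimal illustration, not Brandy's actual code: the function name `blit_to_display`, the tightly packed 32-bit pixel format and the nearest-neighbour scaling are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical blit from a working screen (sw x sh pixels) to a display
   buffer (dw x dh pixels). Both are assumed to be tightly packed 32-bit
   pixels with no padding at the end of each row. */
static void blit_to_display(uint32_t *dst, int dw, int dh,
                            const uint32_t *src, int sw, int sh)
{
    if (dw == sw && dh == sh) {
        /* Fast path: 1:1 pixels and the full screen, so a single block
           copy replaces the per-pixel loop. */
        memcpy(dst, src, (size_t)sw * (size_t)sh * sizeof *src);
        return;
    }
    /* General path: nearest-neighbour scaling, one pixel at a time. */
    for (int y = 0; y < dh; y++) {
        const uint32_t *row = src + (size_t)(y * sh / dh) * (size_t)sw;
        for (int x = 0; x < dw; x++)
            dst[(size_t)y * (size_t)dw + (size_t)x] = row[x * sw / dw];
    }
}
```

The single memcpy() is only valid because both buffers are assumed to have identical, unpadded row layouts; if the display buffer had a different row pitch, the fast path would have to fall back to one memcpy() per row.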
-
Oh, that's nice! I think it was Dave (hoglet) who thought the block copy was worth a try. I was all set for the approach of storing all the graphics commands in a buffer and then replaying them once the calculations were done.
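For comparison, the record-and-replay approach mentioned above might look like the minimal C sketch below. Everything in it (`gfx_cmd`, `cmd_buffer`, `record_circle`, `replay`, the `framebuffer` type and the `plot_centre` callback) is hypothetical; the idea is only that plotting is deferred into a buffer during the calculation loop and executed in one batch afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* One deferred graphics command: only "filled circle", as used in the demo. */
typedef struct {
    int x, y, radius;
    unsigned char r, g, b;
} gfx_cmd;

/* The demo plots 101 x 101 = 10201 points per frame, so this is ample. */
#define MAX_CMDS 16384

typedef struct {
    gfx_cmd cmds[MAX_CMDS];
    size_t count;
} cmd_buffer;

/* During the calculation pass: record instead of drawing. Returns 0 if full. */
static int record_circle(cmd_buffer *buf, int x, int y, int radius,
                         unsigned char r, unsigned char g, unsigned char b)
{
    if (buf->count >= MAX_CMDS)
        return 0;
    buf->cmds[buf->count++] = (gfx_cmd){ x, y, radius, r, g, b };
    return 1;
}

/* After the calculations: replay every command through a drawing callback,
   then empty the buffer ready for the next frame. */
static void replay(cmd_buffer *buf,
                   void (*draw)(const gfx_cmd *, void *), void *ctx)
{
    for (size_t i = 0; i < buf->count; i++)
        draw(&buf->cmds[i], ctx);
    buf->count = 0;
}

/* Example callback: plot only the centre pixel into a 32-bit framebuffer. */
typedef struct { uint32_t *pixels; int width, height; } framebuffer;

static void plot_centre(const gfx_cmd *c, void *ctx)
{
    framebuffer *fb = ctx;
    if (c->x >= 0 && c->x < fb->width && c->y >= 0 && c->y < fb->height)
        fb->pixels[c->y * fb->width + c->x] =
            ((uint32_t)c->r << 16) | ((uint32_t)c->g << 8) | c->b;
}
```

A real renderer would draw the whole circle in the callback; the point of the design is that the trig-heavy loop and the drawing no longer interleave.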
BTW, I've been wondering about sincos... the machines we have can probably compute sin and cos together nearly as fast as computing just one of them. But sincos returns two values. Is there a way to make it usable, I wonder, or to make use of it?
-
Possibly via a SYS call that takes a buffer... something like:
    DIM sincos% 24
    |sincos%=<float64 value>
    SYS "Brandy_SINCOS", sincos%: REM don't try this now, it hasn't been implemented.
    REM SIN in |(sincos%+8), COS in |(sincos%+16)
Though, with all that memory pushing and the SYS interface, I'm not sure whether jumping through those hoops would negate the advantage of calculating both at the same time.
-
Ah, that's not too bad - after all, sincos is probably only useful in special situations. I tend to think of trig as very expensive, so a little fiddling shouldn't compare too badly; then again, trig probably gets a bit cheaper with each generation, so that could be a mistake. Notably, in this demo we're also doing lots of range reduction, which is an extra cost, so computing sin and cos together (reducing the argument only once) would be an extra saving.
-
I've now implemented this, exactly as described in my example. It appears RISC OS and MINIX don't have sincos(), or at least the cross-compiler I have doesn't provide it, so on those platforms sin and cos have to be calculated separately and there may be little to gain there, but it makes the feature complete across platforms.
-
I modified the Bubble Universe demo to use this, and the overheads are catastrophic: from 57fps down to 10fps on my Xeon-hosted VM.
Since there is absolutely nothing to gain from this function, I will back this change out.
-
Oh no! But thanks for trying.
-
I have managed a further slight improvement: the Xeon VM can now get 68fps, and (though a reboot helped!) the RasPi3B+ can get 12.8fps.
-
I've pushed an update that moves ESCAPE polling into its own thread, which speeds up the SDL build by about 15-25%. On my 10-year-old i5 laptop (running AlmaLinux 8) it increased the frame rate from about 80 to 95fps.
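Moving Escape polling off the interpreter's hot path might look something like this minimal C sketch using POSIX threads and C11 atomics. It is not the actual Brandy change: `escape_key_down` is a hypothetical stand-in for whatever platform-specific check detects the key, and the 10 ms poll interval is an arbitrary choice. The point is that the per-statement check becomes a single atomic operation instead of a trip into the event system.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the platform's "is Escape pressed?" state. */
static atomic_bool escape_key_down = false;

static atomic_bool escape_pending = false;  /* set by poller, cleared by interpreter */
static atomic_bool poller_running = false;

/* Background thread: checks for Escape every 10 ms. */
static void *escape_poller(void *arg)
{
    (void)arg;
    const struct timespec delay = { 0, 10 * 1000 * 1000 };
    while (atomic_load(&poller_running)) {
        if (atomic_load(&escape_key_down))
            atomic_store(&escape_pending, true);
        nanosleep(&delay, NULL);
    }
    return NULL;
}

static int start_escape_poller(pthread_t *tid)
{
    atomic_store(&poller_running, true);
    return pthread_create(tid, NULL, escape_poller, NULL);
}

static void stop_escape_poller(pthread_t tid)
{
    atomic_store(&poller_running, false);
    pthread_join(tid, NULL);
}

/* Called by the interpreter between statements: one atomic exchange
   that reports and clears a pending Escape. */
static bool escape_requested(void)
{
    return atomic_exchange(&escape_pending, false);
}
```

Some libraries, SDL included, expect events to be pumped from the thread that initialised the video subsystem, so a real implementation has to work within that constraint when deciding what the polling thread is allowed to touch.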
-
The *REFRESH ONERROR change simply sets a flag so on any error condition (including hitting Escape) the *REFRESH mode changes to ON.
That seems, at first sight, a somewhat superfluous extension. BBC BASIC for SDL 2.0 does the equivalent of *REFRESH ON when any untrapped error occurs, on the basis that in the absence of an ON ERROR handler you will always want to see the message.
If there is an ON ERROR handler it doesn't do that, but then you can put the *REFRESH ON command in the handler. Sometimes having control over its placement is essential: for example, if the program has switched to another OpenGL context, enabling output before switching back to the original context may cause a crash.
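The two behaviours being compared can be modelled in a small C sketch. Everything here is hypothetical, including the names and the assumption that *REFRESH ONERROR also suspends automatic refresh (which the demo's per-frame *REFRESH suggests, but the thread doesn't state). One rule is the flag described earlier in the thread, which restores refresh on any error; the other is the BBCSDL behaviour described above, which forces refresh on only for untrapped errors.

```c
#include <stdbool.h>

typedef enum { REFRESH_ON, REFRESH_OFF } refresh_mode;

typedef struct {
    refresh_mode mode;
    bool on_error_flag;     /* set by *REFRESH ONERROR */
} display_state;

/* *REFRESH ONERROR (assumed): suspend automatic refresh, remember to restore. */
static void refresh_onerror(display_state *d)
{
    d->mode = REFRESH_OFF;
    d->on_error_flag = true;
}

/* Flag-based rule: any error (including Escape) turns refresh back on
   if the flag was set, whether or not an ON ERROR handler exists. */
static void on_error_flag_rule(display_state *d)
{
    if (d->on_error_flag)
        d->mode = REFRESH_ON;
}

/* BBCSDL-style rule: only an untrapped error forces refresh on; a program
   with an ON ERROR handler decides for itself where to issue *REFRESH ON. */
static void on_error_untrapped_rule(display_state *d, bool has_on_error_handler)
{
    if (!has_on_error_handler)
        d->mode = REFRESH_ON;
}
```

The practical difference is placement: under the second rule, a handler can issue *REFRESH ON only after it has, for instance, switched back to the original OpenGL context.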
As far as the Bubble Universe program itself is concerned (I understood it to have been written by Paul Dunn, who doesn't seem to have been acknowledged in this thread - I can't read Stardot so I don't know about there) BBCSDL is rather too slow to do it justice. So I've made some more drastic changes (such as reducing the window size to 512 x 512) to speed it up:
https://wasm.bbcbasic.co.uk/bbcsdl.html?app=bbcbasic.co.uk/webapps/bubbles.bbb