Complex and RTL scripts

I have the feeling I've already asked this question, but I can't find it (maybe it was at StarDot), so apologies if it's a repetition.

Presumably RISC OS, in its 'modern' incarnation at least, supports rendering complex and right-to-left-written scripts (Arabic being an example of both). It could hardly not, and still claim to be a serious OS.

But how is that capability exposed in BASIC? Is there a SWI which can be used to render an Arabic string, for example? What would be the minimum BASIC code to achieve it?

This has arisen because I've written a library (script.bbc) for BBCSDL, and I'm in the process of trying to write a compatible library for BB4W. It would be interesting to see if a similar library for Matrix Brandy could be written.

Comments

  • I don't recall you asking the question before, and if it was on *. (StarDot) then I completely missed it. It's not something I've ever tried to do, as Arabic (or Hebrew, which I also believe is RTL) is something I have absolutely zero knowledge about. The only text rendering in Matrix Brandy is the classic "system font"; the RISC OS FontManager SWI calls are entirely unimplemented. I did play with them once, back in the early 1990s on RISC OS 2, which had no support for complex scripts or Unicode (I believe the latter is at least implemented on newer versions?).
  • Soruk wrote: »
    Arabic (or Hebrew, which I also believe is RTL) is something I have absolutely zero knowledge about.
    I'm not necessarily expecting you to be knowledgeable about them, but in the 21st century it's surely not acceptable for any programming language to be limited only to 'Latin-based' languages. Are you happy to exclude such a large proportion of the world from the potential user base of Matrix Brandy? I'm not even sure that should be legal! ;-)

    What I'm still not clear about is whether BBC BASIC (even Sophie's ARM BASIC) when running on RISC OS doesn't expose the capability of rendering complex scripts, or whether it's just Matrix Brandy that doesn't. If it's the former I would find that very surprising, given that (I assume) BBC BASIC is still bundled.
  • In the case of Sophie's ARM BASICs such capability would be by using the RISC OS Font manager, rather than any built-in capability within BASIC itself.

    For alphabetic scripts like Hebrew, it would be relatively easy by defining a custom text font (or switching to the Hebrew SAA505X set in Mode 7) then using VDU23,16,1| to reverse the X direction.

    If libraries exist that can take a font file and render the Unicode to a bitmap, then I suppose more complex scripts like Arabic and Chinese could be supported; it is not something I have personally tried to do.
  • Soruk wrote: »
    In the case of Sophie's ARM BASICs such capability would be by using the RISC OS Font manager
    Which you would access from BASIC how? Using SYS? Is that something you could emulate?
    or switching to the Hebrew SAA505X set in Mode 7
    Like most languages, there's a lot more to Hebrew than the simple alphabet (diacritics etc.). There are 134 code points in the Hebrew sections of Unicode (88 of them in the main block, whose assigned range runs from 0x0591 to 0x05F4).
    it is not something I have personally tried to do.
    We must have very different attitudes to 'internationalisation'; American programmers have a regrettable tendency to neglect those issues, but not Europeans because of their geographical proximity to countries using complex scripts and RTL languages.

    It's always been a high priority for me, not least because I have users who rely on that support (and it may well have been contributory to their choice of BBC BASIC).
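The Hebrew code point figures quoted above can be checked with a short script. This is only a sketch: it uses Python's standard `unicodedata` module, so the counts depend on the Unicode version bundled with the interpreter, and the helper name `count_assigned` is made up for illustration. Note that the range 0x0591 to 0x05F4 spans 100 code points, not all of which are assigned.

```python
import unicodedata

def count_assigned(first, last):
    """Count code points in [first, last] that have a Unicode character name."""
    return sum(1 for cp in range(first, last + 1)
               if unicodedata.name(chr(cp), ""))

# Main Hebrew block (U+0590..U+05FF).
main_block = count_assigned(0x0590, 0x05FF)

# Hebrew part of Alphabetic Presentation Forms (U+FB1D..U+FB4F).
presentation_forms = count_assigned(0xFB1D, 0xFB4F)

# Combining marks (points and accents) in the main block: these are the
# diacritics that a simple one-glyph-per-cell text renderer cannot place.
combining = sum(1 for cp in range(0x0590, 0x0600)
                if unicodedata.combining(chr(cp)))

print("main block:", main_block)
print("presentation forms:", presentation_forms)
print("combining marks in main block:", combining)
```

The combining-mark count is the relevant part for rendering: each of those code points has to be drawn over or under the preceding letter rather than in a cell of its own.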
  • Sorry to raise this topic again, but it's pertinent because I've just been editing a monospaced Hebrew alphabet into the 'DejaVuSansMono' font, which strangely omits it (DejaVuSans does include Hebrew).

    How fundamental, in the architecture of Matrix Brandy, is the character code being only 8 bits? How much work would it be to change to a 16-bit character code internally?

    If you could switch to using a 16-bit code you'd be able, in principle, to accommodate the entire Unicode Basic Multilingual Plane (BMP). I realise that it would still have to be a bitmap font, but a multilingual bitmap font would be a lot better than what you have now.

    And having recently discovered GNU Unifont, a bitmap font (8 x 16 or 16 x 16) with excellent coverage of the BMP, it would mean you could incorporate all the common glyphs with little effort (licensing conditions permitting).
  • It's all 8x8 - except MODE 7 which is 16x20.

    A long while back I had prepared a slide-show thing written in Matrix Brandy that plotted scaled and proportional-spaced characters, by actually plotting individual pixels, but it was fast enough (especially with *REFRESH OFF) to not be an issue. (I did rework it for a non-scaled version to use VDU5 and MOVE to position characters to make a proportional version of MODE1 just about usable on the BBC Micro! It's in my NoteQuiz program.)

    Probably, at least initially, if I figure out the file format I could make something that parses it and plots the characters in a similar-ish fashion; certainly that would be a good way to start an implementation of a font painter. (I would probably try to make the implementation work similarly to the RISC OS FontManager.)

    I haven't been so active with Matrix Brandy development lately - I changed my job in September last year and have been rushed off my feet ever since!
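On the file format question above: assuming the font in question is GNU Unifont distributed in its plain-text .hex form, each line is a hexadecimal code point, a colon, and the glyph bitmap as hex digits (32 digits for an 8 x 16 glyph, 64 for 16 x 16). The following is a minimal sketch of a parser, not anything taken from Matrix Brandy; the function names and the sample glyph data are illustrative only.

```python
def parse_hex_line(line):
    """Parse one 'CODEPOINT:HEXDATA' line into (codepoint, width, rows)."""
    code_text, data = line.strip().split(":")
    codepoint = int(code_text, 16)
    # Every glyph is 16 rows high, so the number of hex digits gives the width:
    # 32 digits -> 8 pixels wide, 64 digits -> 16 pixels wide.
    width = len(data) * 4 // 16
    digits_per_row = width // 4
    rows = [int(data[i:i + digits_per_row], 16)
            for i in range(0, len(data), digits_per_row)]
    return codepoint, width, rows

def render(width, rows):
    """Turn row bitmasks into strings, most significant bit on the left."""
    return ["".join("#" if row & (1 << (width - 1 - x)) else "."
                    for x in range(width))
            for row in rows]

# Illustrative sample line in the .hex layout (an 8 x 16 glyph for U+0041).
SAMPLE = "0041:0000000018242442427E424242420000"

codepoint, width, rows = parse_hex_line(SAMPLE)
print(f"U+{codepoint:04X}, {width} x {len(rows)}")
for text_row in render(width, rows):
    print(text_row)
```

A painter built on this would plot one pixel per `#`, much like the pixel-by-pixel proportional text described above, with the wrinkle that glyphs can be either 8 or 16 pixels wide.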
  • Soruk wrote: »
    It's all 8x8 - except MODE 7 which is 16x20.
    My question was about the character code, not the graphical representation. An 8-bit code gives you up to 256 different characters, a 16-bit code up to 65536. Sorry if I wasn't clear.

    In BB4W and BBCSDL the character code is stored internally as a 16-bit number, which is why it's limited to the Basic Multilingual Plane (UCS-2 encoding, in other words).
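To make the 8-bit versus 16-bit distinction concrete, here is a small sketch (in Python, purely for illustration; the helper names are invented) showing which characters fit in a single 16-bit UCS-2 code and which fall outside the Basic Multilingual Plane and would need two UTF-16 code units.

```python
def fits_in_ucs2(ch):
    """True if the character can be stored in a single 16-bit code (the BMP)."""
    return ord(ch) <= 0xFFFF

def utf16_units(ch):
    """Number of 16-bit code units needed to encode the character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

print("8-bit codes:", 2 ** 8, "16-bit codes:", 2 ** 16)

# Latin A, Hebrew alef, Arabic sheen, and an emoji from outside the BMP.
for ch in ("A", "\u05D0", "\u0634", "\U0001F600"):
    print(f"U+{ord(ch):04X} fits in UCS-2: {fits_in_ucs2(ch)}, "
          f"UTF-16 units: {utf16_units(ch)}")
```

Hebrew and Arabic letters sit comfortably inside the BMP, so a 16-bit internal code covers them; only characters above U+FFFF are excluded.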
  • In that case, it's 8 bit; multiplied by 8, it indexes into the place where a writable copy of the system font is stored (which is modifiable with VDU23). With an external font mechanism (be it via SDL, or a handwritten font driver either in the source code or as a BASIC program), it could be anything.
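The lookup described above can be sketched as follows. This is a hypothetical model in Python, not Matrix Brandy's actual source: it assumes a flat 256-entry table of 8 x 8 glyphs, one byte per pixel row, with the character code multiplied by 8 to find the glyph, and a `redefine` helper standing in for what VDU 23 does to the writable copy.

```python
GLYPH_BYTES = 8  # 8 x 8 glyph: one byte per pixel row

# Writable copy of the font: 256 characters x 8 bytes (all blank here).
font = bytearray(256 * GLYPH_BYTES)

def glyph(code):
    """Return the 8 row bytes for an 8-bit character code."""
    offset = (code & 0xFF) * GLYPH_BYTES  # code * 8 indexes the table
    return bytes(font[offset:offset + GLYPH_BYTES])

def redefine(code, rows):
    """Overwrite one glyph, analogous to VDU 23,code,r0,...,r7."""
    if len(rows) != GLYPH_BYTES:
        raise ValueError("an 8 x 8 glyph needs exactly 8 row bytes")
    offset = (code & 0xFF) * GLYPH_BYTES
    font[offset:offset + GLYPH_BYTES] = bytes(rows)

# Define character 65 and print it as a bitmap.
redefine(65, [0x18, 0x24, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00])
for row in glyph(65):
    print("".join("#" if row & (0x80 >> x) else "." for x in range(8)))
```

In this model the table size is fixed by the 8-bit code, which is why widening the character code and replacing the font mechanism are separate pieces of work.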
  • Soruk wrote: »
    In that case, it's 8 bit
    Oh well, another idea bites the dust! I know I can't complain: my own motivation to develop BB4W and BBCSDL further is also at rock bottom.

    Does your new job give you an opportunity to use Matrix Brandy? When I worked for the BBC I used BBC BASIC for software tasks when I could. It wasn't always possible, sometimes it was too slow or the host platform unsuitable, but for things like tools and utilities that I needed for my work it was the go-to language of course.
  • for things like tools and utilities that I needed for my work it was the go-to language of course.
    One of the tools I developed at work, the filter synthesis program FIRBBC, still gets some use I think - probably not at the BBC but I know of one user in the States.

    Incidentally that program is the only case I know of when the full 80-bit extended precision arithmetic of the x86 is actually vital. It couldn't run properly on an ARM CPU!