Planète Casio's textout() BBcode markup language translator
Warning
If you are accessing this repository from https://git.planet-casio.com, keep in mind that it is only a mirror and that the real repository is located at https://forge.touhey.fr/pc/textout.git for now.
BBcode was invented in the 90s/2000s for bulletin board systems. Planète Casio adopted it during its first years (although how that choice was made still needs some research…).
On Planète Casio, which is coded in PHP at the time I'm writing this, we have our own custom version of BBcode, which we pass through an internal utility named textout().
I, Thomas “Cakeisalie5” Touhey, rewrote it recently, and it works pretty well while being secure. However, as the next version of Planète Casio (the “v5”) will be written from scratch, I figured I could rewrite the textout() utility in Python, make the language parsing more practical, and add features that exist in the original BBcode markup language.
As this is a rewrite, this project and the online version of the transcoder will not share the same vulnerabilities and bugs.
Usage
To use this module, simply call the to<language>() functions once it is imported:
#!/usr/bin/env python3
import textoutpc
text = "Hello, [i]beautiful [b]world[/i]!"
print(textoutpc.tohtml(text))
print("---")
print(textoutpc.tolightscript(text))
The supported output types are:
- html: HTML compatible output, requiring some additional style and script;
- lightscript: Lightscript Markdown-like language. See the Lightscript topic on Planète Casio for more information.
Tweaks
The tohtml() and tolightscript() functions can take additional keyword arguments, called tweaks, that tags can read in order to adapt their behaviour. Tweak names are case-insensitive and non-alphanumeric characters are ignored: for example, label_prefix, LABELPREFIX and __LaBeL___PRE_FIX__ are all equivalent.
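The name matching described above can be sketched as follows. This is only an illustration of the stated rule (ignore case, ignore non-alphanumeric characters); the helper names normalize_tweak_name and normalize_tweaks are hypothetical and are not taken from the textoutpc source.

```python
def normalize_tweak_name(name: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return "".join(ch for ch in name if ch.isalnum()).lower()


def normalize_tweaks(tweaks: dict) -> dict:
    """Return a copy of the keyword arguments with normalized names."""
    return {normalize_tweak_name(key): value for key, value in tweaks.items()}


# The three spellings from the text collapse to the same key.
for spelling in ("label_prefix", "LABELPREFIX", "__LaBeL___PRE_FIX__"):
    print(normalize_tweak_name(spelling))  # prints "labelprefix" each time
```

With such a scheme, a tag would look up the normalized key (here "labelprefix") regardless of how the caller spelled the keyword.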
The following tweaks are read by the translator and built-in tags:
- label_prefix (HTML): prefix to be used by the [label] and [target] tags, e.g. msg45529-. Defaults to "" for PCv42 compatibility;
- obsolete_tags (HTML): use obsolete HTML tags for compatibility with old browsers (e.g. lynx), e.g. <b>, <i>, <center>, and others. Defaults to True.
An example call would be:
#!/usr/bin/env python3
import textoutpc
print(textoutpc.tohtml("Hello, [i]beautiful[/i]!", obsolete__TAGS=False))
What is left to do
- Correct the translator until all the tests pass;
- Manage blocks superseding each other;
- Implement BBcode lists using [*], [**], …;
- Manage lightscript (or even markdown?) as output languages;
- Check where the errors are to display them to the user:
  - Count character offset, line number and column number in the lexer;
  - Produce readable exceptions;
  - Make a clean interface to transmit them;
- Check why exceptions on raw tags effectively escape the content, as it shouldn't…?
- Implement the inline tweak in order not to read blocks in the translator;
- Look for security flaws (we really don't want stored XSS flaws!).
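As an illustration of the error-reporting item above, here is a minimal sketch of how a position could be attached to a readable exception. It derives the line and column from a character offset on demand instead of counting them inside the lexer, and the names MarkupError and locate are hypothetical: nothing here reflects the actual textoutpc code.

```python
class MarkupError(Exception):
    """Hypothetical exception carrying a position in the source text."""

    def __init__(self, message, offset, line, column):
        super().__init__(f"line {line}, column {column}: {message}")
        self.offset = offset
        self.line = line
        self.column = column


def locate(text: str, offset: int):
    """Return the 1-based (line, column) of a 0-based character offset."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)
    # rfind() returns -1 on the first line, which yields column = offset + 1.
    column = offset - last_newline
    return line, column


source = "Hello,\n[i]beautiful [b]world[/i]!"
offset = source.index("[b]")
line, column = locate(source, offset)
error = MarkupError("[b] is never closed", offset, line, column)
print(error)  # line 2, column 14: [b] is never closed
```

Keeping the offset, line and column as attributes lets a caller either print the message as is or highlight the faulty span in the original message.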