Python port of Planète Casio's BBcode translator.
Thomas Touhey cfed4cc7de
Going forward!
2018-06-22 01:35:41 +02:00
scripts Having *other* bugs now. 2018-06-21 00:43:03 +02:00
test Going forward! 2018-06-22 01:35:41 +02:00
textoutpc Going forward! 2018-06-22 01:35:41 +02:00
.editorconfig Continued. 2018-01-16 13:34:11 +01:00
.gitignore Made imports cleaner. 2018-02-19 20:13:10 +01:00
GUIDE.rst Okay, so maybe these empty lines were required after all. 2018-04-15 02:04:07 +02:00
LICENSE.md Modified meta-information about the module. 2018-01-19 22:56:51 +01:00
Makefile Prepared packaging & stuff 2018-02-11 21:31:39 +01:00
README.rst Having *other* bugs now. 2018-06-21 00:43:03 +02:00
TAGS.rst Going forward! 2018-06-22 01:35:41 +02:00
requirements.txt Stream ready, started adding block/inline logic to tags. Transcoder to HTML is TODO. 2018-01-05 03:31:33 +01:00
setup.cfg Initial commit. 2018-01-02 18:57:04 +01:00
setup.py Added some sort of CSS injection imitation on the text-related tags. 2018-04-15 01:26:30 +02:00

README.rst

Planète Casio's textout() BBcode markup language translator

Warning

If you are accessing this repository from https://git.planet-casio.com, keep in mind that it is only a mirror and that the real repository is located at https://forge.touhey.fr/pc/textout.git for now.

BBcode was invented in the 1990s/2000s for bulletin board systems. Planète Casio adopted it during its first years (although how that choice was made still needs some research…).

Planète Casio, which is written in PHP at the time of writing, has its own custom version of BBcode, which is processed by an internal utility named textout().

I, Thomas “Cakeisalie5” Touhey, rewrote it recently; it works pretty well and is secure. However, as the next version of Planète Casio (the “v5”) will be written from scratch, I figured I could rewrite the textout() utility in Python, improve the language parsing to make it more practical, and add features found in the original BBcode markup language.

As this is a rewrite, vulnerabilities and bugs will not be shared between this project and the online version of the transcoder.

Usage

To use this module, import it and call one of the to<language>() functions:

#!/usr/bin/env python3
import textoutpc

text = "Hello, [i]beautiful [b]world[/i]!"
print(textoutpc.tohtml(text))
print("---")
print(textoutpc.tolightscript(text))

The supported output types are HTML, through tohtml(), and lightscript, through tolightscript().

Tweaks

The tohtml() and tolightscript() functions accept additional keyword arguments, called tweaks, that tags can read to adapt their behaviour. Tweak names are case-insensitive and non-alphanumeric characters in them are ignored: for example, label_prefix, LABELPREFIX and __LaBeL___PRE_FIX__ are all equivalent.

The following tweaks are read by the translator and built-in tags:

  • label_prefix (HTML): prefix to be used by the [label] and [target] tags, e.g. msg45529-. Defaults to "" for PCv42 compatibility;
  • obsolete_tags (HTML): use obsolete HTML tags such as <b>, <i>, <center> and others, for compatibility with old browsers (e.g. lynx). Defaults to True.

An example call would be:

#!/usr/bin/env python3
import textoutpc

print(textoutpc.tohtml("Hello, [i]beautiful[/i]!", obsolete__TAGS=False))
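The name matching described above (case-insensitive, non-alphanumeric characters ignored) can be sketched as follows. This is only an illustration of the rule, not the module's actual code: the helper names normalize_tweak_name() and normalize_tweaks() are hypothetical, and the sketch assumes that only ASCII letters and digits count as alphanumeric.

```python
import re


def normalize_tweak_name(name: str) -> str:
    """Lowercase the name, then drop everything except a-z and 0-9.

    Assumption: only ASCII letters and digits are kept.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


def normalize_tweaks(tweaks: dict) -> dict:
    """Return a copy of the keyword arguments with normalized names."""
    return {normalize_tweak_name(key): value for key, value in tweaks.items()}


if __name__ == "__main__":
    # All three spellings collapse to the same key.
    for spelling in ("label_prefix", "LABELPREFIX", "__LaBeL___PRE_FIX__"):
        print(normalize_tweak_name(spelling))
    print(normalize_tweaks({"obsolete__TAGS": False}))
```

With such a scheme, a tag would look up a tweak by its normalized name (here "labelprefix" or "obsoletetags"), whatever spelling the caller used.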

What is left to do

  • Correct the translator until all the tests pass;
  • Manage blocks superseding each other;
  • Implement BBcode lists using [*], [**], …;
  • Manage lightscript (or even markdown?) as output languages;
  • Check where the errors are to display them to the user:
    • Count character offset, line number and column number in the lexer;
    • Produce readable exceptions;
    • Make a clean interface to transmit them;
  • Check why exceptions on raw tags effectively escape the content, which they shouldn't…?
  • Implement the inline tweak in order not to read blocks in the translator;
  • Look for security flaws (we really don't want stored XSS flaws!).
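For the error-reporting items above, one possible approach is to have the lexer track the character offset, line number and column number as it reads, and to attach that position to a readable exception. The sketch below is a minimal illustration under that assumption; Position, track_positions() and MarkupError are hypothetical names that do not exist in the module.

```python
from typing import Iterator, NamedTuple, Tuple


class Position(NamedTuple):
    offset: int  # 0-based character offset in the whole text
    line: int    # 1-based line number
    column: int  # 1-based column number


class MarkupError(Exception):
    """Hypothetical exception carrying the position of the problem."""

    def __init__(self, message: str, position: Position):
        super().__init__(
            f"{message} (line {position.line}, column {position.column})"
        )
        self.position = position


def track_positions(text: str) -> Iterator[Tuple[str, Position]]:
    """Yield each character along with the position it was read at."""
    line, column = 1, 1
    for offset, char in enumerate(text):
        yield char, Position(offset, line, column)
        if char == "\n":
            # A newline ends the current line; restart the column count.
            line += 1
            column = 1
        else:
            column += 1


if __name__ == "__main__":
    for char, position in track_positions("[b]a\n[i]"):
        print(repr(char), position)
```

A caller could then catch MarkupError and display its message to the user, or read the position attribute to highlight the offending character.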