Planète Casio's textout() BBcode markup language translator
===========================================================

.. warning::

    If you are accessing this repository from `<https://git.planet-casio.com>`_,
    keep in mind that it is only a mirror and that the real repository
    is located at `<https://forge.touhey.fr/pc/textout.git>`_ for now.

BBcode was invented in the 90s/2000s for bulletin board systems.
It was implemented on `Planète Casio`_ during the site's first years (although
some research is still needed on how that choice was made…).

On `Planète Casio`_, which is coded in PHP at the time I'm writing this,
we have our own custom version of BBcode, which we pass through an internal
utility named ``textout()``.

I, Thomas “Cakeisalie5” Touhey, rewrote it recently, and it works pretty well
while being secure, but as the next version of `Planète Casio`_ (the “v5”)
will be written from scratch, I figured I could rewrite the ``textout()``
utility in Python, improve the language parsing to be more practical, and
add features that are in the original BBcode markup language.

As this is a rewrite, vulnerabilities and bugs will not be shared between this
project and the online version of the transcoder.

-----
Usage
-----
To use this module, simply use the ``to<language>()`` functions once imported:

.. code-block:: python

    #!/usr/bin/env python3
    import textoutpc

    text = "Hello, [i]beautiful [b]world[/i]!"
    print(textoutpc.tohtml(text))
    print("---")
    print(textoutpc.tolightscript(text))

The supported output types are:

- ``html``: `HTML`_ compatible output, requiring some additional style and
  script;
- ``lightscript``: `Lightscript`_ Markdown-like language. See
  `the Lightscript topic on Planète Casio <Lightscript topic_>`_ for
  more information.

------
Tweaks
------
The ``tohtml()`` and ``tolightscript()`` functions can take additional keyword
arguments ("tweaks") that tags can read to adapt their behaviour. Tweak names
are case-insensitive and non-alphanumeric characters are ignored: for example,
``label_prefix``, ``LABELPREFIX`` and ``__LaBeL___PRE_FIX__`` are all
equivalent.

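
The name matching described above can be pictured with the following minimal
sketch. It assumes that normalization amounts to lowercasing the name and
dropping everything that is not an ASCII letter or digit; the helpers
``normalize_tweak_name()`` and ``normalize_tweaks()`` are hypothetical names
used for illustration and are not taken from the ``textoutpc`` module:

.. code-block:: python

    #!/usr/bin/env python3
    import re

    def normalize_tweak_name(name):
        """Lowercase the name and drop every non-alphanumeric character.

        Assumption: only ASCII letters and digits are significant.
        """
        return re.sub(r"[^a-z0-9]", "", name.lower())

    def normalize_tweaks(**tweaks):
        """Return the keyword arguments keyed by their normalized names."""
        return {normalize_tweak_name(key): value
                for key, value in tweaks.items()}

    names = ["label_prefix", "LABELPREFIX", "__LaBeL___PRE_FIX__"]
    # All three spellings collapse to the same key.
    print({normalize_tweak_name(name) for name in names})
    print(normalize_tweaks(obsolete__TAGS=False))

With such a scheme, a tag only ever has to look up one canonical key
(``labelprefix``, ``obsoletetags``) whatever spelling the caller used.
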
The following tweaks are read by the translator and built-in tags:

- ``label_prefix`` (HTML): prefix to be used by the ``[label]`` and
  ``[target]`` tags, e.g. ``msg45529-``. Defaults to ``""`` for PCv42
  compatibility;
- ``obsolete_tags`` (HTML): use obsolete HTML tags, e.g. ``<b>``, ``<i>``,
  ``<center>``, and others, for compatibility with old browsers (e.g. lynx).
  Defaults to ``True``.

An example call would be:

.. code-block:: python

    #!/usr/bin/env python3
    import textoutpc

    print(textoutpc.tohtml("Hello, [i]beautiful[/i]!", obsolete__TAGS=False))

------------------
What is left to do
------------------
- Correct the translator until all the tests pass;
- Manage blocks superseding each other;
- Implement BBcode lists using ``[*]``, ``[**]``, …;
- Manage lightscript (or even markdown?) as output languages;
- Check where the errors are to display them to the user:

  * Count character offset, line number and column number in the lexer;
  * Produce readable exceptions;
  * Make a clean interface to transmit them;

- Check why exceptions on raw tags effectively escape the content, as it
  shouldn't…?
- Look for security flaws (we really don't want stored XSS flaws!).

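
For the error-location item in the list above, here is a minimal sketch of
how a character offset could be turned into a line and column number. The
``locate()`` helper is hypothetical and is not part of the current lexer; it
assumes 0-based offsets, 1-based lines and columns, and ``\n`` as the only
line separator:

.. code-block:: python

    #!/usr/bin/env python3

    def locate(text, offset):
        """Return the 1-based (line, column) of a 0-based character offset."""
        if not 0 <= offset <= len(text):
            raise ValueError("offset out of range")

        # One line per newline found before the offset, plus the first line.
        line = text.count("\n", 0, offset) + 1
        # rfind() returns -1 on the first line, which yields offset + 1.
        column = offset - text.rfind("\n", 0, offset)
        return line, column

    sample = "Hello,\n[i]beautiful [b]world[/i]!"
    print(locate(sample, sample.index("[b]")))  # prints (2, 14)

A lexer would more likely keep running counters than rescan the text, but the
values it reports to the user would be the same.
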
.. _Planète Casio: https://www.planet-casio.com/Fr/
.. _HTML: https://www.w3.org/html/
.. _Lightscript: https://git.planet-casio.com/lephe/lightscript
.. _Lightscript topic: https://planet-casio.com/Fr/forums/lecture_sujet.php?id=15022