Planète Casio's textout() BBcode markup language translator
===========================================================

.. warning::
	If you are accessing this repository from `<https://git.planet-casio.com>`_,
	keep in mind that it is only a mirror and that the real repository
	is located at `<https://forge.touhey.fr/pc/textout.git>`_ for now.

BBcode was invented in the 1990s/2000s for bulletin board systems.
It was adopted by `Planète Casio`_ during its first years (although
how that choice was made still needs some research…).

On `Planète Casio`_, which is coded in PHP at the time I'm writing this,
we have our own custom version of BBcode, which we pass through an internal
utility named ``textout()``.

I, Thomas “Cakeisalie5” Touhey, rewrote it recently, and it works pretty well
while being secure. However, as the next version of `Planète Casio`_ (the “v5”)
will be written from scratch, I figured I could rewrite the ``textout()``
utility in Python, improve the language parsing to make it more practical, and
add features from the original BBcode markup language.

As this is a rewrite, vulnerabilities and bugs will not be shared between this
project and the online version of the transcoder.

-----
Usage
-----

To use this module, import it and call one of the ``to<language>()`` functions:

.. code-block:: python

	#!/usr/bin/env python3
	import textoutpc

	text = "Hello, [i]beautiful [b]world[/i]!"
	print(textoutpc.tohtml(text))
	print("---")
	print(textoutpc.tolightscript(text))

The supported output types are:

- ``html``: `HTML`_ compatible output, requiring some additional style and
  script;
- ``lightscript``: `Lightscript`_ Markdown-like language. See
  `the Lightscript topic on Planète Casio <Lightscript topic_>`_ for
  more information.

------
Tweaks
------

The ``tohtml()`` and ``tolightscript()`` functions accept additional keyword
arguments, called tweaks, that tags can read to adapt their behaviour. Tweak
names are case-insensitive and non-alphanumeric characters are ignored: for
example, ``label_prefix``, ``LABELPREFIX`` and ``__LaBeL___PRE_FIX__`` are all
equivalent.
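
As a rough illustration of the name matching described above, the following
sketch normalizes tweak names by lowercasing them and dropping every
non-alphanumeric character. The ``normalize_tweak_name()`` helper is
hypothetical and is not part of ``textoutpc``; the module's actual
implementation may differ.

.. code-block:: python

	#!/usr/bin/env python3
	import re

	def normalize_tweak_name(name):
	    """Hypothetical helper: lowercase, then drop non-alphanumerics."""
	    return re.sub(r"[^a-z0-9]", "", name.lower())

	names = ["label_prefix", "LABELPREFIX", "__LaBeL___PRE_FIX__"]
	# All three spellings collapse to the same key.
	print({normalize_tweak_name(name) for name in names})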

The following tweaks are read by the translator and built-in tags:

- ``label_prefix`` (HTML): prefix to be used by the ``[label]`` and
  ``[target]`` tags, e.g. ``msg45529-``. Defaults to ``""`` for PCv42
  compatibility;
- ``obsolete_tags`` (HTML): use obsolete HTML tags such as ``<b>``, ``<i>``
  and ``<center>`` for compatibility with old browsers (e.g. lynx).
  Defaults to ``True``.

An example call would be:

.. code-block:: python

	#!/usr/bin/env python3
	import textoutpc

	print(textoutpc.tohtml("Hello, [i]beautiful[/i]!", obsolete__TAGS=False))

------------------
What is left to do
------------------

- Correct the translator until all the tests pass;
- Manage blocks superseding each other;
- Implement BBcode lists using ``[*]``, ``[**]``, …;
- Manage lightscript (or even markdown?) as output languages;
- Check where the errors are to display them to the user:

  * Count character offset, line number and column number in the lexer;
  * Produce readable exceptions;
  * Make a clean interface to transmit them;

- Check why exceptions on raw tags actually escape the content, as they
  shouldn't…?
- Look for security flaws (we really don't want stored XSS flaws!).
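
For the error-reporting item above, one possible way to turn a character
offset into a line and column number is sketched below. The ``locate()``
function is purely illustrative and does not exist in ``textoutpc``; a real
lexer would more likely keep these counters up to date while reading.

.. code-block:: python

	#!/usr/bin/env python3

	def locate(text, offset):
	    """Return the 1-based (line, column) of a 0-based character offset."""
	    line = text.count("\n", 0, offset) + 1
	    # rfind() returns -1 on the first line, which yields a 1-based column.
	    last_newline = text.rfind("\n", 0, offset)
	    return line, offset - last_newline

	source = "Hello,\n[i]world[/b]"
	print(locate(source, source.index("[/b]")))  # prints (2, 9)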

.. _Planète Casio: https://www.planet-casio.com/Fr/
.. _HTML: https://www.w3.org/html/
.. _Lightscript: https://git.planet-casio.com/lephe/lightscript
.. _Lightscript topic: https://planet-casio.com/Fr/forums/lecture_sujet.php?id=15022