hissp.munger module
Lissp's symbol munger.
Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.
E.g. *FOO-BAR* becomes QzSTAR_FOOQz_BARQzSTAR_.
Quotez are written in upper case, prefixed with Qz and suffixed with _. This format was chosen because it contains an underscore and both upper-case and lower-case letters, which makes it distinct from the standard Python naming conventions: lower_case_with_underscores, UPPER_CASE_WITH_UNDERSCORES, and CapWords. It also contains an extremely rare bigram, "Qz", which makes the Quotez (but not the normalization) reversible in the usual cases. Finally, the format cannot introduce a leading underscore, which can have special meaning in Python.
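The properties claimed above can be checked mechanically. This is a small sketch and not part of Hissp. It assumes only the documented format string Qz{}_ (the module's QUOTEZ) and three sample names. The regexes in the hypothetical CONVENTIONS table are rough approximations of the three naming conventions, written for illustration.

```python
import re

# Documented Quotez format string (hissp.munger.QUOTEZ).
QUOTEZ = "Qz{}_"

# Sample encodings: a short name, a Unicode name with an "x", and an ordinal.
samples = [QUOTEZ.format(name) for name in ("STAR", "FULLxSTOP", "10")]

# Hypothetical, approximate patterns for the standard Python naming conventions.
CONVENTIONS = {
    "lower_case_with_underscores": re.compile(r"[a-z0-9_]+"),
    "UPPER_CASE_WITH_UNDERSCORES": re.compile(r"[A-Z0-9_]+"),
    "CapWords": re.compile(r"(?:[A-Z][a-z0-9]*)+"),
}

for q in samples:
    assert q.isidentifier()  # usable as a Python identifier
    assert not q.startswith("_")  # never introduces a leading underscore
    assert "_" in q and "Qz" in q  # an underscore plus the rare "Qz" bigram
    # Mixed case plus an underscore rules out all three conventions above.
    assert not any(p.fullmatch(q) for p in CONVENTIONS.values())
    print(q)
```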
Characters can be encoded in one of three ways: short names, Unicode names, and ordinals. The demunge function will accept any of these encodings, while the munge function will prioritize short names, then fall back to Unicode names, then fall back to ordinals.
Short names are given in the TO_NAME table in this module. Any spaces in the Unicode names are replaced with an x and any hyphens are replaced with an h. (Unicode names are in all caps and these substitutions are lower-case.) Ordinals are given in base 10.
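As a concrete case, the sketch below builds all three encodings of the hyphen-minus by hand, using only the standard unicodedata module. The short name Qz_ is copied from the TO_NAME table documented below. The variable names are illustrative, and this is not how Hissp itself is written.

```python
import unicodedata

QUOTEZ = "Qz{}_"  # documented Quotez format string
c = "-"

short = "Qz_"  # TO_NAME["-"] in the documented table
# Unicode name "HYPHEN-MINUS": spaces become "x", hyphens become "h".
by_name = QUOTEZ.format(unicodedata.name(c).replace(" ", "x").replace("-", "h"))
by_ordinal = QUOTEZ.format(ord(c))  # base-10 code point

print(short, by_name, by_ordinal)  # Qz_ QzHYPHENhMINUS_ Qz45_

# Unicode names are all caps, so the lower-case x/h substitutions undo cleanly.
restored = unicodedata.lookup(by_name[2:-1].replace("h", "-").replace("x", " "))
assert restored == c and chr(int(by_ordinal[2:-1])) == c
```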
- hissp.munger.munge(s: str) -> str
Lissp's symbol munger.
Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.
Inputs that begin with : are assumed to be control words and returned unmodified. Full stops are handled separately, as those are meaningful to Hissp.
- hissp.munger.force_munge(s: str) -> str
As munge, but skips the control word check. Used for reader tags.
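Since munge is force_munge plus the control-word check, both can be sketched together. This is a simplified re-implementation, not Hissp's source. It assumes a three-entry excerpt of TO_NAME and an "x"-prefix test for characters that are valid inside identifiers. It has no ordinal fallback, so a character without a Unicode name raises ValueError. It also assumes that full stops survive because each dot-separated part is munged on its own. The helpers encode and munge_part are hypothetical.

```python
import unicodedata

QUOTEZ = "Qz{}_"
# Excerpt of hissp.munger.TO_NAME, for illustration only.
TO_NAME = {"*": "QzSTAR_", "-": "Qz_", "?": "QzQUERY_"}


def encode(c, force=False):
    """Hypothetical encoder: short name, else Unicode name (no ordinal fallback)."""
    if not force and ("x" + c).isidentifier():
        return c  # already valid after the first character of an identifier
    if c in TO_NAME:
        return TO_NAME[c]
    return QUOTEZ.format(unicodedata.name(c).replace(" ", "x").replace("-", "h"))


def munge_part(part):
    """Hypothetical helper: encode one dot-free segment."""
    part = "".join(map(encode, part))
    if part and not part.isidentifier():
        # Only the first character (e.g. a digit) can still be invalid here.
        part = encode(part[0], force=True) + part[1:]
    return part


def force_munge(s):
    s = unicodedata.normalize("NFKC", s)
    # Assumption: dots are kept, and each segment is munged on its own.
    return ".".join(munge_part(p) for p in s.split("."))


def munge(s):
    # Control words are returned unmodified; everything else is force-munged.
    return s if s.startswith(":") else force_munge(s)


print(munge("*FOO-BAR*"))  # QzSTAR_FOOQz_BARQzSTAR_
print(munge(":foo-bar"))  # :foo-bar
print(force_munge(":foo-bar"))  # QzCOLON_fooQz_bar
```

In this sketch, the colon of a control word is only encoded when the check is skipped. Because : has no short name in the documented table, it falls back to its Unicode name, COLON.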
- hissp.munger.QUOTEZ = 'Qz{}_'
Format string for creating Quotez.
- hissp.munger.FIND_QUOTEZ = re.compile('Qz([0-9A-Z][0-9A-Zhx]*?)?_')
Regex pattern to find Quotez. Used by demunge.
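To see what the pattern accepts, it can be compiled directly with the standard re module. This sketch copies the documented pattern string. The sample inputs are illustrative.

```python
import re

# Pattern string documented for hissp.munger.FIND_QUOTEZ.
FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)?_")

text = "QzSTAR_FOOQz_BARQzSTAR_"
whole = [m.group() for m in FIND_QUOTEZ.finditer(text)]
names = [m.group(1) for m in FIND_QUOTEZ.finditer(text)]
print(whole)  # ['QzSTAR_', 'Qz_', 'QzSTAR_']
print(names)  # ['STAR', None, 'STAR'] (the bare Qz_ captures nothing)

# Each Quotez ends at its first "_", since "_" is not in the name character class.
print(FIND_QUOTEZ.findall("QzHYPHENhMINUS_Qz62_"))  # ['HYPHENhMINUS', '62']

# A Quotez body must start with a digit or a capital letter.
print(FIND_QUOTEZ.search("Qzfoo_"))  # None
```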
- hissp.munger.TO_NAME = {'!': 'QzBANG_', '"': 'QzQUOT_', '#': 'QzHASH_', '$': 'QzDOLR_', '%': 'QzPCENT_', '&': 'QzET_', "'": 'QzAPOS_', '(': 'QzLPAR_', ')': 'QzRPAR_', '*': 'QzSTAR_', '+': 'QzPLUS_', '-': 'Qz_', '/': 'QzSOL_', ';': 'QzSEMI_', '<': 'QzLT_', '=': 'QzEQ_', '>': 'QzGT_', '?': 'QzQUERY_', '@': 'QzAT_', '[': 'QzLSQB_', '\\': 'QzBSOL_', ']': 'QzRSQB_', '^': 'QzHAT_', '`': 'QzGRAVE_', '{': 'QzLCUB_', '|': 'QzVERT_', '}': 'QzRCUB_'}
Shorter names for Quotez.
- hissp.munger.qz_encode(c: str) -> str
Converts a character to its Quotez encoding, unless it's already valid in a Python identifier.
- hissp.munger.force_qz_encode(c: str) -> str
Converts a character to its Quotez encoding, even if it's valid in a Python identifier.
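The two encoders differ only in the identifier check. This sketch applies the documented priority: short name, then Unicode name, then base-10 ordinal. It assumes a three-entry excerpt of TO_NAME and an "x"-prefix test for identifier characters. Both are assumptions, not taken from Hissp's source.

```python
import unicodedata

QUOTEZ = "Qz{}_"
# Three-entry excerpt of hissp.munger.TO_NAME, for illustration.
TO_NAME = {"*": "QzSTAR_", "-": "Qz_", "?": "QzQUERY_"}


def force_qz_encode(c: str) -> str:
    """Short name first, then Unicode name, then base-10 ordinal."""
    if c in TO_NAME:
        return TO_NAME[c]
    try:
        name = unicodedata.name(c)
    except ValueError:  # e.g. control characters have no Unicode name
        return QUOTEZ.format(ord(c))
    return QUOTEZ.format(name.replace(" ", "x").replace("-", "h"))


def qz_encode(c: str) -> str:
    """Leave c alone if it can appear inside a Python identifier."""
    return c if ("x" + c).isidentifier() else force_qz_encode(c)


for c in ["a", "7", "*", "\u00a0", "\n"]:
    print(repr(c), qz_encode(c), force_qz_encode(c))
```

Prefixing "x" lets digits through qz_encode, because digits are valid anywhere in an identifier except the first position. force_qz_encode encodes them regardless.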
- hissp.munger.LOOKUP_NAME = {'QzAPOS_': "'", 'QzAT_': '@', 'QzBANG_': '!', 'QzBSOL_': '\\', 'QzDOLR_': '$', 'QzEQ_': '=', 'QzET_': '&', 'QzGRAVE_': '`', 'QzGT_': '>', 'QzHASH_': '#', 'QzHAT_': '^', 'QzLCUB_': '{', 'QzLPAR_': '(', 'QzLSQB_': '[', 'QzLT_': '<', 'QzPCENT_': '%', 'QzPLUS_': '+', 'QzQUERY_': '?', 'QzQUOT_': '"', 'QzRCUB_': '}', 'QzRPAR_': ')', 'QzRSQB_': ']', 'QzSEMI_': ';', 'QzSOL_': '/', 'QzSTAR_': '*', 'QzVERT_': '|', 'Qz_': '-'}
The inverse of TO_NAME.
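Because TO_NAME is one-to-one, its inverse can be built with a dict comprehension. The sketch uses a four-entry excerpt of the table. Whether Hissp builds LOOKUP_NAME exactly this way is an assumption.

```python
# Four-entry excerpt of the documented TO_NAME table.
TO_NAME = {"*": "QzSTAR_", "-": "Qz_", ">": "QzGT_", "?": "QzQUERY_"}

# Inverting a one-to-one dict swaps its keys and values.
LOOKUP_NAME = {v: k for k, v in TO_NAME.items()}

assert len(LOOKUP_NAME) == len(TO_NAME)  # no two characters share a short name
print(LOOKUP_NAME["QzGT_"])  # >
```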
- hissp.munger.demunge(s: str) -> str
The inverse of munge. Decodes any Quotez into characters.
Characters can be encoded in one of three ways: short names, Unicode names, and ordinals. demunge will decode any of these, even though munge will consistently pick only one of these for any given character. demunge will also leave the remaining text as-is, along with any invalid Quotez.
>>> demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz62_bar")
'QzFOO_>->bar'
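A decoder with the documented behavior can be sketched in a few lines on top of FIND_QUOTEZ and re.sub. LOOKUP_NAME here is only an excerpt, and the helper _qz_decode is hypothetical. The order of attempts (short name, then Unicode name, then ordinal) is an assumption chosen to reproduce the doctest above.

```python
import re
import unicodedata

# Pattern documented for hissp.munger.FIND_QUOTEZ.
FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)?_")
# Small excerpt of LOOKUP_NAME (the inverse of TO_NAME).
LOOKUP_NAME = {"QzGT_": ">", "Qz_": "-", "QzSTAR_": "*"}


def _qz_decode(match):
    """Decode one Quotez match, or return it unchanged if it is invalid."""
    quotez, name = match.group(), match.group(1)
    if quotez in LOOKUP_NAME:  # 1. short name
        return LOOKUP_NAME[quotez]
    if name is None:  # bare "Qz_" missing from the excerpt
        return quotez
    try:  # 2. Unicode name, undoing the x/h substitutions
        return unicodedata.lookup(name.replace("h", "-").replace("x", " "))
    except KeyError:
        pass
    if name.isdigit():  # 3. base-10 ordinal
        try:
            return chr(int(name))
        except (ValueError, OverflowError):  # outside the Unicode range
            pass
    return quotez


def demunge(s):
    """Replace every Quotez in s; all other text is left as-is."""
    return FIND_QUOTEZ.sub(_qz_decode, s)


print(demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz62_bar"))  # QzFOO_>->bar
```

Anything that fails all three lookups, such as QzFOO_, is returned unchanged. That is why it survives in the doctest output.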