diff --git a/peps/pep-0822.rst b/peps/pep-0822.rst index 29eeabe44ab..dec2a7ef402 100644 --- a/peps/pep-0822.rst +++ b/peps/pep-0822.rst @@ -18,23 +18,25 @@ multiline string literals. Dedented multiline strings use a new prefix "d" (shorthand for "dedent") before the opening quote of a multiline string literal. -Example (spaces are visualized as ``_``): +Example (spaces are visualized as ``.``): .. code-block:: python - def hello_paragraph() -> str: - ____return d""" - ________<p> - __________Hello, World! - ________</p> - ____""" + def hello_paragraph() -> str: + ....return d""" + ........<p> + ..........Hello, World! + ........</p> + ....""" -The closing triple quotes control how much indentation would be removed. -In the above example, the returned string will contain three lines: +Unlike ``textwrap.dedent()``, the indentation before the closing quotes is also +considered when determining the amount of indentation to be removed. +Therefore, the string returned in the example above consists of the following +three lines. -* ``"____<p>\n"`` (four leading spaces) -* ``"______Hello, World!\n"`` (six leading spaces) -* ``"____</p>\n"`` (four leading spaces) +* ``"....<p>\n"`` +* ``"......Hello, World!\n"`` +* ``"....</p>\n"`` Motivation @@ -43,7 +45,7 @@ Motivation When writing multiline string literals within deeply indented Python code, users are faced with the following choices: -* Accept that the content of the string literal will be left-aligned. +* Write the contents of the string without indentation. * Use multiple single-line string literals concatenated together instead of a multiline string literal. * Use ``textwrap.dedent()`` to remove indentation. @@ -51,14 +53,15 @@ users are faced with the following choices: All of these options have drawbacks in terms of code readability and maintainability. -* Left-aligned multiline strings look awkward and tend to be avoided. +* Writing multiline strings without indentation in deeply indented code + looks awkward and tends to be avoided. In practice, many places including Python's own test code choose other methods. * Concatenated single-line string literals are more verbose and harder to - maintain. + maintain. Writing ``"\n"`` at the end of each line is tedious. + It's easy to miss the semicolons between many string concatenations. * ``textwrap.dedent()`` is implemented in Python so it requires some runtime - overhead. - It cannot be used in hot paths where performance is critical. + overhead. Moreover, it cannot be used to dedent t-strings. This PEP aims to provide a built-in syntax for dedented multiline strings that is both easy to read and write, while also being efficient at runtime. @@ -99,30 +102,41 @@ Specification ============= Add a new string literal prefix "d" for dedented multiline strings. -This prefix can be combined with "f", "t", and "r" prefixes. +This prefix can be combined with "f", "t", "r", and "b" prefixes. This prefix is only for multiline string literals. So it can only be used with triple quotes (``"""`` or ``'''``). -Using it with single or double quotes (``"`` or ``'``) is a syntax error. Opening triple quotes needs to be followed by a newline character.
This newline is not included in the resulting string. +The content of the d-string starts from the next line. -The amount of indentation to be removed is determined by the whitespace -(``' '`` or ``'\t'``) preceding the closing triple quotes. -Mixing spaces and tabs in indentation raises a ``TabError``, similar to -Python's own indentation rules. +Indentation is the run of leading whitespace characters (spaces and tabs) on each line. -The dedentation process removes the determined amount of leading whitespace -from every line in the string. -Lines that are shorter than the determined indentation become just an empty -line (e.g. ``"\n"``). -Otherwise, if the line does not start with the determined indentation, -Python raises an ``IndentationError``. +The amount of indentation to be removed is determined by the longest common +indentation of lines in the string. +Lines consisting entirely of whitespace characters are ignored when +determining the common indentation, except for the line containing the closing +triple quotes. + +Spaces and tabs are treated as different characters. +For example, ``" hello"`` and ``"\thello"`` have no common indentation. + +The dedentation process removes the determined indentation from every line in +the string. + +* Lines that are longer than or equal in length to the determined indentation + must start with the determined indentation. + Otherwise, Python raises an ``IndentationError``. + The determined indentation is removed from these lines. +* Lines that are shorter than the determined indentation (including + empty lines) must be a prefix of the determined indentation. + Otherwise, Python raises an ``IndentationError``. + These lines become empty lines. Unless combined with the "r" prefix, backslash escapes are processed after -removing indentation. -So you cannot use ``\\t`` to create indentation. +the dedentation process. +So you cannot use ``\t`` in indentation.
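The dedentation rules above are easier to check against a small executable model. The sketch below is a hypothetical pure-Python helper, ``d_dedent()``, which is not part of CPython, of this PEP, or of its reference implementation. It assumes it receives the raw text between the newline after the opening quotes and the closing quotes, applies the rules exactly as written in this section, and leaves escape processing and the "must start with a newline" check out of scope. It also contrasts the result with ``textwrap.dedent()``.

```python
import os.path
import textwrap


def d_dedent(body: str) -> str:
    """Hypothetical model of the d-string dedent rules (not a CPython API).

    ``body`` is the raw text between the newline after the opening triple
    quotes and the closing triple quotes.  Escape sequences are left
    untouched, because the spec processes them after dedenting.
    """
    lines = body.split("\n")

    def indent_of(line: str) -> str:
        # The leading run of spaces and tabs.
        return line[: len(line) - len(line.lstrip(" \t"))]

    # Whitespace-only lines are ignored, except the last one, which is the
    # line holding the closing quotes and always counts.
    indents = [indent_of(line) for line in lines[:-1] if line.strip(" \t")]
    indents.append(indent_of(lines[-1]))

    # Character-wise common prefix, so " " and "\t" never match each other.
    common = os.path.commonprefix(indents)

    result = []
    for lineno, line in enumerate(lines, start=1):
        if len(line) >= len(common):
            # Long enough: the line must start with the common indentation.
            if not line.startswith(common):
                raise IndentationError(f"line {lineno}: does not start with {common!r}")
            result.append(line[len(common):])
        else:
            # Shorter lines (including empty ones) must be a prefix of it.
            if not common.startswith(line):
                raise IndentationError(f"line {lineno}: not a prefix of {common!r}")
            result.append("")
    return "\n".join(result)


# Content indented by four spaces, closing quotes by two.
body = "    Hello\n    World!\n  "
print(repr(d_dedent(body)))         # '  Hello\n  World!\n'
print(repr(textwrap.dedent(body)))  # 'Hello\nWorld!\n'
```

In this model an ``IndentationError`` can only come from a line that the common-indentation step ignored (for example, a whitespace-only line of tabs inside a space-indented string), since every other line starts with the common prefix by construction.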
And you can use line continuation (backslash at the end of line) and remove indentation from the continued line. @@ -130,102 +144,177 @@ Examples: .. code-block:: python - # Whitespace is shown as _ and tab is shown as ---> for clarity. - # Error messages are just for explanation. Actual messages may differ. - - s = d"" # SyntaxError: d-string must be a multiline string - s = d"""Hello""" # SyntaxError: d-string must be a multiline string - s = d"""Hello - __World! - """ # SyntaxError: d-string must start with a newline - - s = d""" - __Hello - __World!""" # SyntaxError: d-string must end with an indent-only line - - s = d""" - __Hello - __World! - """ # Zero indentation is removed because closing quotes are not indented. - print(repr(s)) # '__Hello\n__World!\n' - - s = d""" - __Hello - __World! - _""" # One space indentation is removed. - print(repr(s)) # '_Hello\n_World!\n' - - s = d""" - __Hello - __World! - __""" # Two spaces indentation are removed. - print(repr(s)) # 'Hello\nWorld!\n' - - s = d""" - __Hello - __World! - ___""" # IndentationError: missing valid indentation - - s = d""" - --->Hello - __World! - __""" # IndentationError: missing valid indentation - - s = d""" - --->--->__Hello - --->--->__World! - --->--->""" # Tab is allowed as indentation. - # Spaces are just in the string, not indentation to be removed. - print(repr(s)) # '__Hello\n__World!\n' - - s = d""" - --->____Hello - --->____World! - --->__""" # TabError: mixing spaces and tabs in indentation - - s = d""" - __Hello \ - __World!\ - __""" # line continuation works as ususal - print(repr(s)) # 'Hello_World!' - - s = d"""\ - __Hello - __World - __""" # SyntaxError: d-string must starts with a newline. - - s = dr""" - __Hello\ - __World!\ - __""" # d-string can be combined with r-string. - print(repr(s)) # 'Hello\\\nWorld!\\\n' - - s = df""" - ____Hello, {"world".title()}! - ____""" # d-string can be combined with f-string and t-string too. 
- print(repr(s)) # 'Hello, World!\n' - - s = dt""" - ____Hello, {"world".title()}! - ____""" - print(type(s)) # <class 'string.templatelib.Template'> - print(s.strings) # ('Hello, ', '!\n') - print(s.values) # ('World',) - print(s.interpolations) - # (Interpolation('World', '"world".title()', None, ''),) + # d-string must start with a newline. + s = d"" # SyntaxError: d-string must be triple-quoted + s = d"""""" # SyntaxError: d-string must start with a newline + s = d"""Hello""" # SyntaxError: d-string must start with a newline + s = d"""Hello + ..World! + """ # SyntaxError: d-string must start with a newline + + # d-string removes the longest common indentation from each line. + # Whitespace-only lines are ignored, but the closing quotes line is always considered. + s = d""" + ..Hello + ..World! + ..""" + print(repr(s)) # 'Hello\nWorld!\n' + + s = d""" + ..Hello + ..World! + .""" + print(repr(s)) # '.Hello\n.World!\n' + + s = d""" + ..Hello + ..World! + """ + print(repr(s)) # '..Hello\n..World!\n' + + s = d""" + ..Hello + . + + ..World! + ...""" # Longest common indentation is '..'. + print(repr(s)) # 'Hello\n\n\nWorld!\n.' + + # Closing quotes can be on the same line as the last content line. + # In this case, the string does not end with a newline. + s = d""" + ..Hello + ..World!""" + print(repr(s)) # 'Hello\nWorld!' + + # Tabs are allowed as indentation. + # But tabs and spaces are treated as different characters. + s = d""" + --->..Hello + --->..World! + --->""" + print(repr(s)) # '..Hello\n..World!\n' + + s = d""" + --->Hello + ..World! + ..""" # There is no common indentation. + print(repr(s)) # '\tHello\n..World!\n..' + + # Line continuation with backslash works as usual. + # But you cannot put a backslash right after the opening quotes. + s = d""" + ..Hello \ + ..World!\ + ..""" + print(repr(s)) # 'Hello World!' + + s = d"""\ + ..Hello + ..World + ..""" # SyntaxError: d-string must start with a newline. + + # d-string can be combined with r-string, b-string, f-string, and t-string.
+ s = dr""" + ..Hello\ + ..World!\ + ..""" + print(repr(s)) # 'Hello\\\nWorld!\\\n' + + s = db""" + ..Hello + ..World! + ..""" + print(repr(s)) # b'Hello\nWorld!\n' + + s = df""" + ....Hello, {"world".title()}! + ....""" + print(repr(s)) # 'Hello, World!\n' + + s = dt""" + ....Hello, {"world".title()}! + ....""" + print(type(s)) # <class 'string.templatelib.Template'> + print(s.strings) # ('Hello, ', '!\n') + print(s.values) # ('World',) How to Teach This ================= -In the tutorial, we can introduce d-string with triple quote string literals. -Additionally, we can add a note in the ``textwrap.dedent()`` documentation, -providing a link to the d-string section in the language reference or -the relevant part of the tutorial. +The main differences between ``textwrap.dedent("""...""")`` and d-string are +as follows: + +* ``textwrap.dedent()`` is a regular function, but d-string is part of the + language syntax. A d-string has no runtime overhead, and it can remove + indentation from t-strings. + +* When using ``textwrap.dedent()``, you need to start with ``"""\`` to avoid + including the first newline character, but with d-string, the string content + starts from the line after ``d"""``, so no backslash is needed. + + .. code-block:: python + + import textwrap + + s1 = textwrap.dedent("""\ + Hello + World! + """) + s2 = d""" + Hello + World! + """ + assert s1 == s2 + +* ``textwrap.dedent()`` ignores all blank lines when determining the common + indentation, but d-string also considers the indentation of the closing + quotes. + This allows d-string to preserve some indentation in the result when needed. + + .. code-block:: python + + import textwrap + + s1 = textwrap.dedent("""\ + Hello + World! + """) + s2 = d""" + Hello + World!
+ """ + assert s1 != s2 + assert s1 == 'Hello\nWorld!\n' + assert s2 == ' Hello\n World!\n' + +* Since d-string removes indentation before processing escape sequences, + when using line continuation (backslash at the end of a line), the next line + can also be dedented. + + .. code-block:: python + + import textwrap + + s1 = textwrap.dedent("""\ + Lorem ipsum dolor sit amet, consectetur adipiscing elit, \ + sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. + Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris \ + nisi ut aliquip ex ea commodo consequat. + """) + s2 = d""" + Lorem ipsum dolor sit amet, consectetur adipiscing elit, \ + sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. + Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris \ + nisi ut aliquip ex ea commodo consequat. + """ + assert s1 == s2 + Other Languages having Similar Features -======================================== +======================================= Java 15 introduced a feature called `text blocks `__. Since Java had not used triple qutes before, they introduced triple quotes for @@ -242,14 +331,14 @@ PHP 7.3 introduced `Flexible Heredoc and Nowdoc Syntaxes `__ +that removes indentation from lines in a heredoc. + +Java, Julia, and Ruby use the least-indented line to determine the amount of indentation to be removed. Swift, C#, and PHP uses the indentation of the closing triple quotes or closing marker. -This PEP chose the Swift and C# approach because it is simpler and easier to -explain. - Reference Implementation ======================== @@ -312,6 +401,74 @@ Therefore, `many people preferred the new string prefix