
7 Commits

Author SHA1 Message Date
Enno Hermann 39321d02be
fix: correctly strip/restore initial punctuation (#3336)
* refactor(punctuation): remove orphan code for handling lone punctuation

The case of lone punctuation is already handled at the top of restore(). The
removed if statement would never be called and would in fact raise an
AttributeError because the `_punc_index` named tuple doesn't have the attribute
`mark`.

* refactor(punctuation): remove unused argument

* fix(punctuation): correctly handle initial punctuation

Stripping and restoring initial punctuation didn't work correctly because the
string-splitting caused an additional empty string to be inserted in the text
list (because `".A".split(".")` => `["", "A"]`). Now, an initial empty string is
skipped and relevant test cases are added.

Fixes #3333
2023-11-30 13:03:16 +01:00
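The leading-punctuation bug described in commit 39321d02be comes from how splitting behaves when a separator matches at the very start of a string. The sketch below is a minimal, hypothetical stand-in for a strip/restore pair. It is not the project's actual `Punctuation` class, and the names `strip`, `restore`, and `PUNCS` are assumptions made for illustration. It splits on punctuation with `re.split` and a capturing group, skips the empty first chunk when the text starts with punctuation, and then rebuilds the original string.

```python
import re
from typing import List, Tuple

# Hypothetical punctuation set for this sketch only.
PUNCS = ".,!?;:"
# The capturing group makes re.split keep each punctuation run in its output.
_PUNC_RE = re.compile(f"([{re.escape(PUNCS)}]+)")


def strip(text: str) -> Tuple[List[str], List[str], bool]:
    """Split text into word chunks and punctuation runs (hypothetical helper).

    Returns (chunks, puncs, leading), where `leading` is True when the
    text starts with punctuation.
    """
    # Output alternates text, punc, text, ..., text.
    parts = _PUNC_RE.split(text)
    chunks, puncs = parts[0::2], parts[1::2]
    leading = bool(puncs) and chunks[0] == ""
    if leading:
        # Like ".A".split("."), a leading separator yields an empty first
        # chunk; skip it so it is not treated as a word.
        chunks = chunks[1:]
    return chunks, puncs, leading


def restore(chunks: List[str], puncs: List[str], leading: bool) -> str:
    """Reinsert punctuation runs between chunks (hypothetical helper)."""
    out: List[str] = []
    if leading:
        # The first punctuation run belongs before the first chunk.
        out.append(puncs[0])
        puncs = puncs[1:]
    for i, chunk in enumerate(chunks):
        out.append(chunk)
        if i < len(puncs):
            out.append(puncs[i])
    return "".join(out)


chunks, puncs, leading = strip(".A")
print(chunks, puncs, leading)  # ['A'] ['.'] True
print(restore(chunks, puncs, leading))  # .A
```

Python's `re.split` documents this behavior: when a capturing separator matches at the start of the string, the result begins with an empty string, the same effect as `".A".split(".")`. Recording a `leading` flag lets `restore()` put the initial punctuation back in front of the first chunk instead of treating the empty string as a word.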
WeberJulian 5cef6facb0 Fix tokenizer for punc only (#1717) 2022-07-06 22:59:41 +02:00
Eren Gölge 424d04e4f6 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge c9972e6f14 Make lint 2022-02-25 11:07:34 +01:00
Eren Gölge 79a84410f2 Test punctuations 2022-02-25 10:48:02 +01:00
Eren Gölge d8bdeb8b8f Fix Punctuation 2022-02-25 10:48:02 +01:00
Eren Gölge 8d85af84cd Implement Punctuation class 2022-02-25 09:32:54 +01:00