Open
Labels
python (Python binding-related)
Description
Creating a pre-tokenizer with the surface projection specified does not override the projection of the dictionary.
In test:
sudachi.rs/python/tests/test_pretokenizers.py
Lines 105 to 119 in 232d9ee
    def test_projection_surface_override(self):
        dictobj = sudachipy.Dictionary(config=sudachipy.config.Config(projection="reading"))
        pretok = dictobj.pre_tokenizer(sudachipy.SplitMode.A, projection="surface")
        vocab = {
            "[UNK]": 0,
            "サケ": 1,
            "ヒト": 2,
            "ノム": 3,
            "ヲ": 5,
            "外国人参政権": 4
        }
        tok = tokenizers.Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
        tok.pre_tokenizer = pretok
        res = tok.encode("酒を飲む人")
        self.assertEqual(res.ids, [1, 5, 3, 2])
Is this intentional, or is it a bug?
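
To make the question concrete, here is a rough sketch of what the asserted ids imply. It does not call sudachipy or tokenizers: the morpheme split and readings are hand-written assumptions inferred from the test vocabulary, and `effective_projection` and `encode` are hypothetical helpers that model one possible precedence rule (an explicit per-pre-tokenizer value wins), not the library's actual behavior.

```python
# Hand-written stand-in for the analysis of "酒を飲む人" as (surface, reading) pairs.
# The split and readings are assumptions inferred from the test vocabulary.
MORPHEMES = [("酒", "サケ"), ("を", "ヲ"), ("飲む", "ノム"), ("人", "ヒト")]

# Same vocabulary as the test above.
VOCAB = {"[UNK]": 0, "サケ": 1, "ヒト": 2, "ノム": 3, "ヲ": 5, "外国人参政権": 4}


def effective_projection(dictionary_projection, override=None):
    """Hypothetical precedence rule: an explicit override wins over the dictionary setting."""
    return override if override is not None else dictionary_projection


def encode(projection):
    """Project each morpheme, then look it up WordLevel-style with an [UNK] fallback."""
    index = 0 if projection == "surface" else 1
    return [VOCAB.get(morpheme[index], VOCAB["[UNK]"]) for morpheme in MORPHEMES]


# Dictionary configured with projection="reading", no override.
reading_ids = encode(effective_projection("reading"))
# Same dictionary, pre-tokenizer created with projection="surface".
surface_ids = encode(effective_projection("reading", override="surface"))

print(reading_ids)
print(surface_ids)
```

Under this assumed rule, the surface override would yield only `[UNK]` ids, because the vocabulary contains readings rather than surface forms. The test instead asserts `[1, 5, 3, 2]`, which is what the reading projection produces.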