Description
The Morpheme.split method seems to return an empty morpheme list when the morpheme has no split information.
In that case it should return a list containing the morpheme itself.
reproduce:
import sudachipy
tok = sudachipy.Dictionary().create()
ms = tok.tokenize("国会議事堂前駅で降りる")
[m.split(mode="a") for m in ms]
outputs:
[<MorphemeList[
<Morpheme(国会, 0:2, (0, 364210))>,
<Morpheme(議事, 2:4, (0, 686966))>,
<Morpheme(堂, 4:5, (0, 368464))>,
<Morpheme(前, 5:6, (0, 318425))>,
<Morpheme(駅, 6:7, (0, 755333))>,
]>, <MorphemeList[
]>, <MorphemeList[
]>]
should be:
[<MorphemeList[
<Morpheme(国会, 0:2, (0, 364210))>,
<Morpheme(議事, 2:4, (0, 686966))>,
<Morpheme(堂, 4:5, (0, 368464))>,
<Morpheme(前, 5:6, (0, 318425))>,
<Morpheme(駅, 6:7, (0, 755333))>,
]>, <MorphemeList[
<Morpheme(で, ...)>,
]>, <MorphemeList[
<Morpheme(降りる, ...)>,
]>]
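Until this is fixed, one possible workaround is a small wrapper that falls back to the morpheme itself when split returns nothing. This is only a sketch: `split_or_self` is a hypothetical helper name, not part of sudachipy, and it assumes the result of `split` can be iterated like a normal sequence.

```python
def split_or_self(morpheme, mode):
    """Return the sub-morphemes of `morpheme` for `mode`.

    Hypothetical helper, not part of sudachipy. If split() yields
    nothing (the behaviour reported above), fall back to a
    one-element list holding the morpheme itself.
    """
    # Assumption: the object returned by split() is iterable.
    parts = list(morpheme.split(mode=mode))
    return parts if parts else [morpheme]
```

With the reproduction above, it would be called as `[split_or_self(m, "a") for m in ms]`. Note that it returns plain Python lists rather than MorphemeList objects.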