Introducing spaCy v2.1
Explosion
MARCH 17, 2019
    doc = nlp("I live in NewYork")
    with doc.retokenize() as retokenizer:
        heads = [(doc[3], 1), doc[2]]
        attrs = {"POS": ["PROPN", "PROPN"], "DEP": ["pobj", "compound"]}
        retokenizer.split(doc[3], ["New", "York"], heads=heads, attrs=attrs)

With better splitting and merging, we’re also well set up for better support for statistical tokenization.