
Introducing spaCy v2.1

Explosion

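The retokenizer can now split tokens as well as merge them. In this example, "NewYork" is split into "New" and "York". "New" is attached to "York", and "York" is attached to "in":

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any English pipeline with a parser
doc = nlp("I live in NewYork")
with doc.retokenize() as retokenizer:
    # "New" attaches to subtoken 1 ("York"); "York" attaches to doc[2] ("in")
    heads = [(doc[3], 1), doc[2]]
    attrs = {"POS": ["PROPN", "PROPN"], "DEP": ["compound", "pobj"]}
    retokenizer.split(doc[3], ["New", "York"], heads=heads, attrs=attrs)
```

With better splitting and merging in place, we're also well set up to add better support for statistical tokenization.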
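The subtle part of `split` is the `heads` argument. Each new subtoken names its head either as `(token, i)`, meaning the i-th of the new subtokens, or as an existing token in the document. The following is a hypothetical, dependency-free illustration in plain Python, not spaCy's implementation. It stores a parse as parallel lists of words and absolute head indices and encodes "head is subtoken k" as `("sub", k)`. It then shows how every other head index has to shift when one token becomes several. The function name `split_token` and its rule that former children of the split token re-attach to the subtoken whose head lies outside it are assumptions made for this sketch.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a head is either an index into the original
# sentence (int) or ("sub", k), meaning the k-th of the new subtokens.
Head = Union[int, Tuple[str, int]]


def split_token(words: List[str], heads: List[int], i: int,
                new_words: List[str], new_heads: List[Head]):
    """Split words[i] into new_words and remap every head index."""
    if "".join(new_words) != words[i]:
        raise ValueError("subtokens must concatenate to the original text")
    offset = len(new_words) - 1

    # The first subtoken attached outside the split token acts as the
    # span's head; former children of words[i] re-attach to it (assumption).
    outside = [k for k, h in enumerate(new_heads) if isinstance(h, int)]
    if not outside:
        raise ValueError("one subtoken must attach outside the split token")
    span_head = i + outside[0]

    def remap(j: int) -> int:
        if j < i:
            return j           # tokens before the split keep their index
        if j > i:
            return j + offset  # tokens after the split shift right
        return span_head       # the split token itself maps to the span head

    out_words = words[:i] + new_words + words[i + 1:]
    out_heads = [remap(h) for h in heads[:i]]
    for h in new_heads:
        # ("sub", k) points inside the new span; an int points elsewhere
        out_heads.append(i + h[1] if isinstance(h, tuple) else remap(h))
    out_heads += [remap(h) for h in heads[i + 1:]]
    return out_words, out_heads
```

For "I live in NewYork", `split_token(["I", "live", "in", "NewYork"], [1, 1, 1, 2], 3, ["New", "York"], [("sub", 1), 2])` returns the words `["I", "live", "in", "New", "York"]` with heads `[1, 1, 1, 4, 2]`, so "New" points to "York" and "York" points to "in".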
