eternally stressed semanticist (cqs) wrote,

How not to remove suffixes

Sadly, I don't have much of interest to report about my work. Most of the interesting things I've discovered have been on particular not-yet-discussable projects, and mostly I've just been frustrated with the state of natural language processing. I'm sure it has made advances since I last looked in on it in college, but I'm having a little trouble seeing them. (I'm sure tf*idf is great and all, but I'd like it better if it were working.)

So the best I can do, in the tradition of this video courtesy of Heidi Harley, is to observe that a coworker has found an example of bad suffix removal that probably beats anything I've found so far. I was merely amused to learn that a particularly common term in one set of data was the singular city of "Lo Angele", the parser having helpfully stripped away the plural suffixes; but he discovered that the parser did the same for the less-than-superlative city of Budap.
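For anyone wondering how a parser gets there, here is a minimal sketch of the kind of blind suffix stripping that would produce both results. I don't know what the actual parser does, so the suffix list and the function names (naive_strip, naive_stem) are made up purely for illustration.

```python
# Hypothetical naive suffix stripper; NOT the parser described in the post.
# The suffix list is an assumption chosen to reproduce the two examples.
SUFFIXES = ("est", "s")  # superlative "-est", plural "-s"


def naive_strip(token: str) -> str:
    """Remove the first matching suffix, with no dictionary or name check."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def naive_stem(text: str) -> str:
    """Apply naive_strip to every whitespace-separated token."""
    return " ".join(naive_strip(tok) for tok in text.split())


print(naive_stem("Los Angeles"))  # Lo Angele
print(naive_stem("Budapest"))     # Budap
```

The point is that nothing here checks whether the token is a proper noun or whether the remainder is a real word, which is presumably the sort of check that would have spared both cities.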