Statistics question

The majority of today’s grammar checkers are still rule-based. But as data science and statistical methods gain relevance in NLP, grammar checking can also be performed on the basis of Big Data, as opposed to grammar rules.


The task of this semester project is to develop a statistical grammar checker.


Your prototype (for English, German or Arabic) should include (at least) the following features:


  • a GUI via which input can be typed,

  • a match of the input (or of n-grams thereof) against “big data”, i.e. large corpora available online,

  • a calculation based on statistical methods (e.g. n-gram counts, Markov probabilities, …),

  • the detection and handling of errors, and

  • a suggestion of a correction (i.e. a more likely string).
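The statistical core of such a prototype (everything except the GUI) could look roughly like the sketch below. It is only one possible approach, built on several assumptions. A tiny hypothetical CORPUS stands in for the large online corpora. The model is a bigram Markov model with add-one (Laplace) smoothing. Candidate corrections come from hypothetical CONFUSION_SETS of commonly confused words. The names BigramModel and suggest are illustrative, not part of any library. An error counts as "detected" when swapping a word for a confusion-set alternative yields a more likely sentence, and that alternative becomes the suggested correction.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a stand-in for a large online corpus / n-gram resource.
CORPUS = [
    "they went to their house",
    "their house is bigger than ours",
    "there is a house over there",
    "she is taller than her brother",
    "i went there yesterday",
]

# Hypothetical confusion sets: words that writers often mix up.
CONFUSION_SETS = [{"their", "there"}, {"than", "then"}]


class BigramModel:
    """Bigram (first-order Markov) model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]  # sentence boundary markers
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, tokens):
        # Sum of log bigram probabilities over the padded sentence.
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(padded, padded[1:]))


def suggest(model, sentence):
    """Return the most likely variant of `sentence`, trying one substitution at a time."""
    tokens = sentence.lower().split()
    best, best_score = tokens, model.log_prob(tokens)
    for i, tok in enumerate(tokens):
        for cset in CONFUSION_SETS:
            if tok in cset:
                for alt in cset - {tok}:
                    candidate = tokens[:i] + [alt] + tokens[i + 1:]
                    score = model.log_prob(candidate)
                    if score > best_score:  # a more likely string was found
                        best, best_score = candidate, score
    return " ".join(best)


if __name__ == "__main__":
    model = BigramModel(CORPUS)
    for text in ["their is a house", "she is taller then her brother"]:
        print(text, "->", suggest(model, text))
```

With this toy corpus, "their is a house" becomes "there is a house" and "she is taller then her brother" becomes "she is taller than her brother", while an already correct sentence such as "they went to their house" is returned unchanged. A real prototype would replace CORPUS with counts from a large corpus, would likely use higher-order n-grams and better smoothing, and would connect suggest to the input field of the GUI.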
