Xinyu Hua

Things I hope to work on when I have time

Here are a few ideas that came to me while I was walking to my office.

Realistic Ones

  1. Use a Transformer language model for data compression. For example, the pre-trained GPT-2-small checkpoint is around 523M. If it is overfit on its training corpus, the roughly 40G of web text, it could potentially serve as a compressor for that data.
  2. Study what causes the decoding-time repetition problem in auto-regressive sequence models. Why is it so rarely observed during training, yet shows up so often at decoding time, in RNNs and even Transformers? Existing work adds a coverage mechanism or bans repeated n-grams to avoid it, but can something be done during training instead?
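A toy sketch of why idea 1 could work: a language model's per-symbol probabilities set the theoretical compressed size, since an ideal arithmetic coder driven by the model spends about -log2 p(next symbol | context) bits per symbol. Everything below (a character bigram model with add-one smoothing) is a stand-in for the real GPT-2 setup, just to make the accounting concrete:

```python
import math
from collections import Counter, defaultdict

def train_bigram(text):
    # Count character bigrams to form a toy "language model".
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def compressed_bits(model, text, vocab_size):
    # Shannon: an ideal arithmetic coder driven by the model spends
    # -log2 p(next char | previous char) bits per character.
    bits = 0.0
    for a, b in zip(text, text[1:]):
        total = sum(model[a].values())
        # Add-one smoothing so unseen bigrams still get probability mass.
        p = (model[a][b] + 1) / (total + vocab_size)
        bits += -math.log2(p)
    return bits

text = "the quick brown fox jumps over the lazy dog " * 50
model = train_bigram(text)          # "overfit" on the corpus itself
raw_bits = 8 * len(text)
model_bits = compressed_bits(model, text, vocab_size=len(set(text)))
print(f"raw: {raw_bits} bits, model-coded: {model_bits:.0f} bits "
      f"({model_bits / raw_bits:.1%} of original)")
```

The better (more overfit) the model, the lower the cross-entropy and the smaller the coded size; the catch, as the idea notes, is that the model itself must be stored or shared.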
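For idea 2, the n-gram-banning workaround mentioned above can be sketched in a few lines. The next-token table and the greedy decoder here are hypothetical stand-ins for a trained model, just to show how a repetition loop forms and how blocking previously seen n-grams breaks it:

```python
def greedy_decode(next_probs, start, steps, no_repeat_ngram=None):
    # Greedy decoding; optionally ban any token that would recreate
    # an n-gram already present in the output so far.
    out = [start]
    for _ in range(steps):
        banned = set()
        if no_repeat_ngram:
            n = no_repeat_ngram
            seen = {tuple(out[i:i + n]) for i in range(len(out) - n + 1)}
            prefix = tuple(out[-(n - 1):]) if n > 1 else ()
            banned = {g[-1] for g in seen if g[:-1] == prefix}
        ranked = sorted(next_probs(out[-1]).items(), key=lambda kv: -kv[1])
        allowed = [t for t, _ in ranked if t not in banned]
        if not allowed:
            break  # every continuation is blocked; stop early
        out.append(allowed[0])
    return out

# A tiny hand-made next-token distribution whose greedy path loops.
table = {"the": {"cat": 0.6, "dog": 0.4}, "cat": {"the": 0.9, "sat": 0.1},
         "dog": {"the": 0.9, "ran": 0.1}, "sat": {"the": 1.0}, "ran": {"the": 1.0}}
probs = lambda prev: table[prev]

print(greedy_decode(probs, "the", 6))                      # loops on "the cat"
print(greedy_decode(probs, "the", 6, no_repeat_ngram=2))   # blocking breaks the loop
```

This is exactly the kind of decoding-time patch the idea questions: it hides the symptom rather than explaining why the model's own probabilities prefer the loop.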

Probably Unrealistic Ones

  1. Record my keystrokes, possibly limited to certain environments, and train a neural language model on them. Try to predict what my next key will be.
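A minimal sketch of the prediction half, with a count-based context model standing in for the neural one, and a made-up keystroke log standing in for a real recording:

```python
from collections import Counter, defaultdict

def train(keystrokes, order=2):
    # Count which key follows each length-`order` context.
    model = defaultdict(Counter)
    for i in range(order, len(keystrokes)):
        ctx = tuple(keystrokes[i - order:i])
        model[ctx][keystrokes[i]] += 1
    return model

def predict(model, context):
    # Return the most frequent key seen after this context, if any.
    ctx = tuple(context)
    if not model[ctx]:
        return None
    return model[ctx].most_common(1)[0][0]

# Hypothetical shell-session log: repeated "ls" and "cd src" commands.
log = list("ls\nls\ncd src\nls\ncd src\nls\n")
model = train(log)
print(repr(predict(model, "ls")))  # prints '\n': after "ls", the next key is Enter
```

Swapping the counts for a small RNN or Transformer over the same (context, next key) pairs would be the neural version of the same setup.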