Things I hope to work on when I have time
Here are a few ideas that came to me while I was walking to my office.
- Use a Transformer model for data compression. For example, the pre-trained GPT-2-small checkpoint is around 523 MB. If it has overfitted on its training corpus, the 40 GB WebText dataset, it could potentially be used as a compressor.
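To make the compression idea concrete: a language model compresses by assigning probabilities, and the ideal (Shannon) code length of a text under the model is the sum of -log2 p(next symbol | context). A minimal sketch with a toy character bigram model standing in for GPT-2 (the model and corpus here are made up for illustration):

```python
import math
from collections import Counter, defaultdict

def bigram_model(corpus, alphabet):
    """Laplace-smoothed character bigram model: p(c | prev)."""
    counts = defaultdict(Counter)
    for prev, c in zip(corpus, corpus[1:]):
        counts[prev][c] += 1

    def prob(prev, c):
        total = sum(counts[prev].values()) + len(alphabet)
        return (counts[prev][c] + 1) / total

    return prob

def code_length_bits(text, prob):
    """Ideal code length in bits: sum of -log2 p(c | prev)."""
    return sum(-math.log2(prob(prev, c)) for prev, c in zip(text, text[1:]))

# Toy "training corpus"; a model fit on its own data compresses it well,
# which is the overfitting-as-compression intuition.
corpus = "the cat sat on the mat. the cat ate the rat. " * 50
alphabet = sorted(set(corpus))
prob = bigram_model(corpus, alphabet)

model_bits = code_length_bits(corpus, prob)
uniform_bits = (len(corpus) - 1) * math.log2(len(alphabet))
print(f"uniform coding: {uniform_bits:.0f} bits")
print(f"bigram model:   {model_bits:.0f} bits")
```

An actual compressor would pair such a model with an arithmetic coder to get within a fraction of a bit of this ideal length; the better the model fits the corpus, the fewer bits per character.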
- Study what causes the decoding-time repetition problem in auto-regressive sequence models. Why is it so rarely observed during training but quite common at decoding time, in RNNs and even Transformers? Existing work adds coverage mechanisms or blocks repeated n-grams to avoid it, but can something be done during training?
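The decoding-time fix mentioned above, blocking repeated n-grams, can be sketched in a few lines: before emitting the next token, ban any token that would complete an n-gram already present in the prefix. This is a simplified version of the trick exposed in libraries such as fairseq and Hugging Face Transformers as `no_repeat_ngram_size` (the function name here is my own):

```python
def blocked_tokens(prefix, n):
    """Return tokens that, if emitted next, would repeat an n-gram
    already present in `prefix`. A decoder masks these out before
    picking the next token."""
    if len(prefix) < n:
        return set()
    # The last n-1 tokens form the context the next token would extend.
    context = tuple(prefix[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(prefix) - n + 1):
        gram = tuple(prefix[i:i + n])
        if gram[:-1] == context:
            banned.add(gram[-1])
    return banned

# "I am I am" + "I" would repeat the bigram ("am", "I"), so "I" is banned.
print(blocked_tokens(["I", "am", "I", "am"], n=2))
```

This only suppresses the symptom at decoding time, which is exactly why a training-time remedy would be more satisfying.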
Probably Unrealistic Ones
- Record my keystrokes, possibly limited to certain environments, and train a neural language model on them to predict what my next key will be.
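Even before training a neural model, the keystroke idea can be prototyped with a count-based baseline: observe keys as they arrive and predict the most frequent follower of the current key. A toy sketch (the class and its methods are hypothetical, a stand-in for the eventual neural LM):

```python
from collections import Counter, defaultdict

class KeystrokePredictor:
    """Toy next-key predictor: bigram counts as a baseline
    for the neural language model described above."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # counts[prev][next] = frequency
        self.prev = None

    def observe(self, key):
        """Record one keystroke as it happens."""
        if self.prev is not None:
            self.counts[self.prev][key] += 1
        self.prev = key

    def predict(self):
        """Most frequent key seen after the current one, or None."""
        if self.prev is None or not self.counts[self.prev]:
            return None
        return self.counts[self.prev].most_common(1)[0][0]

p = KeystrokePredictor()
for k in "ls -la\nls -la\nls -":
    p.observe(k)
print(p.predict())  # after "-", the model has only ever seen "l"
```

A neural model would replace the count table with a learned context of many previous keys, which is where it could plausibly beat this baseline.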