T.R. Davidson, V. Surkov, V. Veselovsky, G. Russo, R. West, C. Gulcehre
   🎓 arXiv paper  🤖 GitHub repository  📰 IEEE Spectrum article
A rapidly growing number of applications are being built on just a few frontier language models (LMs). This dependency might introduce novel security risks if LMs develop self-recognition capabilities. Inspired by human identity-verification methods, we assess self-recognition in LMs using model-generated "security questions".
TL;DR: Novel insights on self-recognition and position bias in LMs