The study examines how well language models use long input contexts, evaluating them on multi-document question answering and synthetic key-value retrieval. The authors show that performance is highest when the relevant information appears at the beginning or end of the input, and degrades significantly when models must access information in the middle of a long context. Even models explicitly designed for long contexts show this degradation as input length grows. https://arxiv.org/pdf/2307.03172.pdf
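As a rough illustration of the key-value retrieval setup, the sketch below builds a synthetic prompt of random UUID key-value pairs and places the target pair at a chosen position; by sweeping the position across the list, one can probe whether accuracy drops in the middle. The exact prompt wording and JSON layout here are assumptions for illustration, not the paper's verbatim format.

```python
import random
import uuid

def make_kv_retrieval_prompt(num_pairs: int, target_position: int, seed: int = 0):
    """Build a synthetic key-value retrieval prompt (illustrative sketch).

    The target pair sits at `target_position` (0-indexed) within the list;
    the model would be asked to return the value for the target key.
    """
    rng = random.Random(seed)
    # Random UUID keys and values, as in synthetic retrieval probes.
    pairs = [
        (str(uuid.UUID(int=rng.getrandbits(128))),
         str(uuid.UUID(int=rng.getrandbits(128))))
        for _ in range(num_pairs)
    ]
    target_key, target_value = pairs[target_position]
    body = "\n".join(f'"{k}": "{v}",' for k, v in pairs)
    prompt = (
        "Extract the value corresponding to the specified key "
        "from the JSON object below.\n\n"
        "{\n" + body + "\n}\n\n"
        f'Key: "{target_key}"\nValue:'
    )
    return prompt, target_value

# Example: 75 pairs with the target buried near the middle.
prompt, answer = make_kv_retrieval_prompt(num_pairs=75, target_position=37)
```

Comparing model accuracy on `answer` across target positions (first, middle, last) is what surfaces the U-shaped performance curve the study reports.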