Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. Nevertheless, achieving strong performance often requires careful tuning. One crucial factor is training data: LLMs are trained on massive datasets, and both the quantity and quality of this data directly influence model performance. Furthermore,