
Studies that actually look inside the black box are marked in green text.
Additional papers
Why explainability research for LLMs is needed
Limitations of traditional interpretation methods
- Because LLMs have grown so large (increased complexity), traditional interpretation methods such as gradient-based approaches and SHAP values demand too much computational power (see the sketch below).
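As a minimal illustration of what such a gradient-based method computes, the sketch below runs an input-times-gradient attribution on a small fine-tuned classifier. This is not the survey's own method; the checkpoint name and example sentence are assumptions for illustration only.

```python
# A minimal sketch of a gradient-based local attribution:
# input-times-gradient saliency for a small fine-tuned sentiment classifier.
# The checkpoint name and example sentence are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed small checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
embeddings = model.get_input_embeddings()(inputs["input_ids"])
embeddings.retain_grad()  # keep the gradient on this non-leaf tensor

# One forward pass on the embeddings, then one backward pass from the predicted logit.
logits = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"]).logits
predicted_class = logits.argmax(dim=-1).item()
logits[0, predicted_class].backward()

# Per-token saliency: |embedding * gradient| summed over the hidden dimension.
saliency = (embeddings * embeddings.grad).abs().sum(dim=-1).squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{token:>15s}  {score.item():.4f}")
```

Even this single explanation costs a full forward plus backward pass through the whole network, and perturbation-based scores such as SHAP multiply that by the number of samples or coalitions evaluated, which is why these methods become impractical at LLM scale.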
Contributions of the paper
Section 2. Training Paradigms of LLMs
Section 3. Explanation for Traditional Fine-Tuning Paradigm
- Here, local explanation aims to provide an understanding of how a language model makes a prediction for a specific input instance, while global explanation aims to provide a broad understanding of how the LLM works overall. The survey then discusses how explanations can be used to debug and improve models (Section 3.3).
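To contrast the two notions: a local explanation scores a single prediction (as in the saliency sketch above), whereas a global explanation asks what the model has learned overall, for example via a probing classifier trained on hidden representations. The sketch below is a toy probing example; the model name, layer index, and the toy probing task are assumptions for illustration.

```python
# A toy sketch of a probing-style global explanation: train a linear probe on one
# layer's [CLS] representations and check whether it can recover a simple property.
# Model name, probed layer, and the toy task (sentence length) are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "bert-base-uncased"   # assumed base model
layer_to_probe = 6                 # assumed intermediate layer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True).eval()

# Toy probing task: does the representation encode whether the sentence is "long"?
sentences = ["Short one.",
             "This sentence is quite a bit longer than the other one.",
             "Okay.",
             "Another fairly long example sentence for the probe to classify."]
labels = [0, 1, 0, 1]

features = []
with torch.no_grad():
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        hidden = model(**enc).hidden_states[layer_to_probe]  # (1, seq_len, dim)
        features.append(hidden[0, 0].numpy())                # [CLS] vector

# A linear probe: if it separates the classes, the layer encodes the property.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy on the toy set:", probe.score(features, labels))
```

Unlike the per-instance saliency score, the probe says something about the representations across a whole dataset, which is the sense in which probing counts as a global explanation.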

3.1 Local Explanation
3.2 Global Explanation
3.3 Making Use of Explanations
Section 4. Explanation for Prompting Paradigm
- Reasons why traditional explanation methods are unsuitable for LLMs:
- the aggressive surge in model scale
- Additionally, computationally demanding explanation techniques quickly become infeasible at the scale of hundreds of billions of parameters or more.
- Further, the intricate inner workings and reasoning processes of prompting-based models are too complex to be effectively captured by simplified surrogate models.
4.1 Base Model Explanation