Vanessa Klotzman
Objective: This study aims to predict ICD-10-CM codes for medical diagnoses from short diagnosis descriptions and to compare two distinct approaches: similarity search and a generative model with few-shot learning.
Materials and Methods: The text-embedding-ada-002 model was used to embed textual descriptions of 2023 ICD-10-CM diagnosis codes, provided by the Centers for Medicare & Medicaid Services. GPT-4 was used with few-shot learning. Both models were evaluated on 666 data points from the eICU Collaborative Research Database.
Results: The text-embedding-ada-002 model successfully identified the relevant code from a set of similar codes 80% of the time, while GPT-4 achieved 50% accuracy in predicting the correct code.
Discussion: The results suggest that text-embedding-ada-002 could automate medical coding better than GPT-4, highlighting potential limitations of generative language models for complicated tasks like this.
Conclusion: The research shows that text-embedding-ada-002 outperforms GPT-4 in medical coding, highlighting the usefulness of embedding models in this domain.
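
The similarity-search approach summarized in Materials and Methods can be illustrated with a minimal sketch. This is not the study's implementation. The function toy_embed is a hypothetical bag-of-words stand-in for text-embedding-ada-002 so that the example runs offline, the three code descriptions are a tiny sample of ICD-10-CM, and the query strings and the value of k are illustrative assumptions. The sketch embeds each code description, ranks codes by cosine similarity to a short diagnosis description, and reports how often the correct code appears among the top k candidates.

```python
import math
import re
from collections import Counter

# Tiny sample of ICD-10-CM codes and descriptions (the study embedded the full 2023 set).
CODES = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J18.9": "Pneumonia, unspecified organism",
}

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(codes):
    """Embed every code description once, up front."""
    return [(code, toy_embed(desc)) for code, desc in codes.items()]

def top_k_codes(query, index, k=3):
    """Return the k codes whose descriptions are most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [code for code, _ in ranked[:k]]

def hit_rate(examples, index, k=3):
    """Fraction of examples whose gold code appears among the top-k candidates."""
    hits = sum(1 for text, gold in examples if gold in top_k_codes(text, index, k))
    return hits / len(examples)

INDEX = build_index(CODES)

# Illustrative (made-up) short diagnosis descriptions paired with gold codes.
EXAMPLES = [
    ("pneumonia", "J18.9"),
    ("type 2 diabetes", "E11.9"),
    ("hypertension", "I10"),
]

if __name__ == "__main__":
    print(top_k_codes("pneumonia", INDEX, k=1))
    print(hit_rate(EXAMPLES, INDEX, k=1))
```

In practice, toy_embed would be replaced by calls to a real embedding model, and the hit rate would be computed over the evaluation set rather than a handful of hand-written queries.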