Creating Value with Real World Data and Natural Language Processing

The era of big data in health care is upon us, powered by the digitization of all health care information. Fundamental to this trend has been the widespread adoption of the electronic health record (EHR). (1) However, enthusiasm for using EHR data in research has been tempered by concerns over data quality: inconsistent formatting, missingness, variability in recording, and the fact that many important variables are not routinely coded but are contained in free text. (2) Indeed, most data in EHRs are recorded as free text, including physician notes, radiology reports, and pathology reports. Mining such free text presents a challenge. Natural language processing (NLP) provides promising tools for making sense of these unstructured data (3); however, NLP is not systematically applied to code all concepts of interest, and the state of the art does not yet achieve human-level accuracy. Clinicians instinctively know this, and it underlies their ongoing motivation to enter much of the data into EHRs as free text. (3) Unlike coded information, narrative text captures a more nuanced patient story, can be told from different perspectives, and allows the expression of feelings. In addition...
