Most Healthcare and Life Science Data Pipelines Are About to Become Obsolete

Recently, I ran a small experiment: I fed a PDF of my routine blood work to an LLM and asked it to return the data in FHIR format. I told it not to make anything up, but to do its best to comply with the FHIR DiagnosticReport and Observation specs. I had an actual FHIR version of the same results from my iPhone to compare against.

In a few seconds, I had an accurate representation of the results in FHIR format using a $20/month consumer LLM plan (not HIPAA compliant!). The LLM even took the liberty of mapping test names to LOINC codes for me.
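For readers who haven't worked with FHIR, here is a minimal sketch of roughly what one lab result looks like as an Observation with a LOINC code attached. The helper name `make_observation` and the glucose value are made up for illustration; this is not the LLM's actual output, and it has not been run through a FHIR validator.

```python
import json


def make_observation(loinc_code, display, value, unit):
    """Build a minimal FHIR-style Observation for one numeric lab result.

    Hypothetical helper: a real resource would also carry a subject,
    an effective date, reference ranges, and so on.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "laboratory",
            }]
        }],
        # The test name, mapped to a LOINC code.
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": loinc_code,
                "display": display,
            }],
            "text": display,
        },
        # The measured value, with units expressed in UCUM.
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": unit,
        },
    }


# Illustrative value only, not taken from the author's report.
glucose = make_observation(
    "2345-7", "Glucose [Mass/volume] in Serum or Plasma", 92, "mg/dL"
)
print(json.dumps(glucose, indent=2))
```

A DiagnosticReport would then point at a list of Observations like this one, which is where the nesting starts to add up.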

Previously, building a proof-of-concept data pipeline for this task would have taken several people and weeks at a minimum. It would have required independently tuned and validated OCR and NLP tooling, plus code to transform the results into FHIR’s nested JSON representation.

The LLM included every result, but it made some structural choices I didn’t like. Still, for a total investment of 30 seconds and a fraction of a penny in out-of-pocket costs, the quality was better than anything I’ve seen before.
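Checking a conversion like this against a trusted FHIR export can be automated. The sketch below assumes both sides are lists of Observation dictionaries with LOINC codings and numeric values; the function names and the sample values are hypothetical, and it ignores structural differences entirely.

```python
def index_by_loinc(observations):
    """Map LOINC code -> (value, unit) for Observations with a numeric value."""
    indexed = {}
    for obs in observations:
        quantity = obs.get("valueQuantity")
        if quantity is None:
            continue  # skip non-numeric results in this simple sketch
        for coding in obs.get("code", {}).get("coding", []):
            if coding.get("system") == "http://loinc.org":
                indexed[coding["code"]] = (quantity.get("value"), quantity.get("unit"))
    return indexed


def compare_observations(reference, candidate):
    """Report results that are missing, unexpected, or different in the candidate."""
    ref = index_by_loinc(reference)
    cand = index_by_loinc(candidate)
    return {
        "missing": sorted(set(ref) - set(cand)),
        "extra": sorted(set(cand) - set(ref)),
        "mismatched": sorted(c for c in set(ref) & set(cand) if ref[c] != cand[c]),
    }


def obs(code, value, unit):
    """Tiny helper to build test Observations (illustrative values only)."""
    return {
        "resourceType": "Observation",
        "code": {"coding": [{"system": "http://loinc.org", "code": code}]},
        "valueQuantity": {"value": value, "unit": unit},
    }


reference = [obs("2345-7", 92, "mg/dL"), obs("718-7", 14.1, "g/dL")]
candidate = [obs("2345-7", 92, "mg/dL"), obs("718-7", 41.1, "g/dL")]  # digits transposed

report = compare_observations(reference, candidate)
print(report)
```

A check like this catches dropped or altered values; it says nothing about whether the LOINC mapping itself was the right one, which still needs a human or a terminology service.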

Throughout my career, I’ve advocated starting with clean data before pursuing fancy outcomes. There’s risk with this approach: if you spend too much time preparing data, you never deliver value before running out of time, institutional will, or both. Unfortunately, half-cleaned data doesn’t have half the value; it’s worthless.

LLMs don’t care if one doctor writes “hypertension” and another writes “high blood pressure.” They understand messy, human data the way humans do: through context and intelligence, not rigid rules. I think this shifts the whole data processing strategy away from fragile, rule-based approaches and toward ones that can handle edge cases and nuance.
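To make "fragile, rule-based" concrete, here is a deliberately small sketch of the lookup-table style of normalization. The table and the `normalize` function are hypothetical; real pipelines use much larger vocabularies, but they fail in the same way when a note uses a phrasing nobody anticipated.

```python
# Hand-maintained synonym table: every variant must be listed explicitly.
SYNONYMS = {
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
}


def normalize(term):
    """Return the canonical concept for a term, or None if no rule matches."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return SYNONYMS.get(key)


for note in ["Hypertension", "high  blood pressure", "HTN", "elevated BP"]:
    print(repr(note), "->", normalize(note))
```

The first two inputs resolve to the same concept; "HTN" and "elevated BP" fall through until someone adds another rule, which is exactly the maintenance burden described above.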

If you’ve dismissed LLMs due to accuracy concerns in healthcare and life sciences, now is the time to reconsider. Economic forces favor rapid improvement of these models. Data pipeline teams need to figure out where they can go faster using LLMs or risk falling behind.