Discussion about this post

Josh:

This was a super interesting article! But I suspect the differences you're seeing are mostly from specific product decisions in the chat app, rather than any differences between individual SOTA models.

Which especially makes sense in the context of healthcare! Sure, each individual doctor is smart and well trained, but the "intelligence" of the system lies in the org structure, context management, and processes:

1/ The triage nurse: the initial prompt engineering and intent routing.

2/ The chart and patient history: the context window, plus maybe a RAG pipeline.

3/ The differential diagnosis: the explicit agentic loop mapping out multi-step reasoning, plus the system prompt.

4/ Lab techs and specialists: external tool use and continuous verification.
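The four-stage analogy above can be sketched as a toy pipeline. This is a minimal illustration, not any real product's architecture: every name here (`triage`, `retrieve_chart`, `agent_loop`, `run_tools`) is hypothetical, and the "routing", "retrieval", and "reasoning" are keyword stubs standing in for actual model calls.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    query: str
    route: str = ""                                # set by triage
    context: list = field(default_factory=list)    # set by retrieval
    steps: list = field(default_factory=list)      # set by agent loop / tools

def triage(case: Case) -> Case:
    # 1/ Triage nurse: intent routing (keyword rule standing in for a classifier).
    case.route = "clinical" if "pain" in case.query.lower() else "general"
    return case

def retrieve_chart(case: Case, records: list) -> Case:
    # 2/ Chart and history: context window / RAG stand-in (naive keyword match).
    words = case.query.lower().split()
    case.context = [r for r in records if any(w in r for w in words)]
    return case

def agent_loop(case: Case, max_steps: int = 3) -> Case:
    # 3/ Differential diagnosis: explicit multi-step reasoning loop.
    for i in range(max_steps):
        case.steps.append(
            f"step {i + 1}: route={case.route}, context notes={len(case.context)}"
        )
    return case

def run_tools(case: Case) -> Case:
    # 4/ Lab techs and specialists: external tool use plus verification.
    case.steps.append("tool: order labs; verify result against chart context")
    return case

def handle(query: str, records: list) -> Case:
    # The "system intelligence" is the wiring, not any single stage.
    return run_tools(agent_loop(retrieve_chart(triage(Case(query)), records)))

case = handle("RUQ pain and fever", ["ruq tenderness noted", "allergy: penicillin"])
```

The point of the sketch is the comment's thesis: each stage is individually simple, and the observed behavior comes from how the stages are composed.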

Aliaks Ramaniuk:

tldr: meh.

Overall: interesting. Big fan of the stack, and the effort with Dr Farag makes a few valid points. However, I found the article difficult to get through. The main issue is that it falls into a few medical-student-level fallacies. The "right answers" from the clinical case studies ("emphysematous chole, because RUQ pain and fever; PAD, because AKI and cold fingers") are fine as board-prep pearls for medical students, but that is not how pulmonary and critical care medicine is actually practiced, and it is not a fair LLM test. Trash in, trash out. It reminds me of poorly written USMLE questions that reflexively say "if Bubba and his dog Boss have a cold, the right answer can only be blastomycosis." Not real life, and not a useful model test. The ACEi entrapment on timing also falls short. Keep up the effort, but validation would be better served by modeling complex differential development than by single reflexive answers on a poorly written multiple-choice shelf exam.

