
Doctor Penguin Weekly


Welcome to the second week of the Doctor Penguin newsletter!

Over the past week, these 4 AI papers caught our attention. Ardila et al. propose a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Dascalu et al. validate the accuracy of skin cancer diagnosis on dermoscopy images acquired with a low-resolution dermoscope by converting the image data to sound before feeding it into a 1D convolutional network. Seymour et al. derive sepsis phenotypes from clinical data using consensus k-means clustering, assess the reproducibility of these phenotypes and their correlation with biomarkers and clinical outcomes, and simulate the potential influence of the phenotypes on the results of previous randomized clinical trials (RCTs). Finally, de Chaumont et al. use random forests to detect and track mice filmed with a depth-sensing infrared camera, studying the behavioural traits of both individual mice and groups, and providing a phenotypic profile for each animal.

As a bonus this week, we have an expert commentary on AI For Radiation Oncology by Jean-Emmanuel Bibault, MD, PhD.

-- Eric Topol & Pranav Rajpurkar  

Quick Links:

  1. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography
  2. Skin cancer detection by deep learning and sound analysis algorithms: A prospective clinical study of an elementary dermoscope
  3. Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis
  4. Real-time analysis of the behaviour of groups of mice via a depth-sensing camera and machine learning

End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography.


In Nature Medicine

With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States. Lung cancer screening using low-dose computed tomography has been shown to reduce mortality by 20-43% and is now included in US screening guidelines. Existing challenges include inter-grader variability and high false-positive and false-negative rates. We propose a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Our model achieves a state-of-the-art performance (94.4% area under the curve) on 6,716 National Lung Cancer Screening Trial cases, and performs similarly on an independent clinical validation set of 1,139 cases. We conducted two reader studies. When prior computed tomography imaging was not available, our model outperformed all six radiologists with absolute reductions of 11% in false positives and 5% in false negatives. Where prior computed tomography imaging was available, the model performance was on-par with the same radiologists. This creates an opportunity to optimize the screening process via computer assistance and automation. While the vast majority of patients remain unscreened, we show the potential for deep learning models to increase the accuracy, consistency and adoption of lung cancer screening worldwide.

Ardila Diego, Kiraly Atilla P, Bharadwaj Sujeeth, Choi Bokyung, Reicher Joshua J, Peng Lily, Tse Daniel, Etemadi Mozziyar, Ye Wenxing, Corrado Greg, Naidich David P, Shetty Shravya


Skin cancer detection by deep learning and sound analysis algorithms: A prospective clinical study of an elementary dermoscope.


In EBioMedicine

BACKGROUND : Skin cancer (SC), especially melanoma, is a growing public health burden. Experimental studies have indicated a potential diagnostic role for deep learning (DL) algorithms in identifying SC at varying sensitivities. Previously, it was demonstrated that diagnostics by dermoscopy are improved by applying an additional sonification (data to sound waves conversion) layer on DL algorithms. The aim of the study was to determine the impact of image quality on accuracy of diagnosis by sonification employing a rudimentary skin magnifier with polarized light (SMP).

METHODS : Dermoscopy images acquired by SMP were processed by a first deep learning algorithm and sonified. The audio output was further analyzed by a second DL algorithm. Study outcome criteria for SMP were specificity and sensitivity, which were further combined into an F2-score, i.e. applying twice the weight to sensitivity over positive predictive value.

FINDINGS : Patients (n = 73) fulfilling the inclusion criteria were referred to biopsy. SMP analysis resulted in a receiver operating characteristic area under the curve (AUC) of 0.814 (95% CI, 0.798-0.831). SMP achieved an F2-score sensitivity of 91.7%, specificity of 41.8% and positive predictive value of 57.3%. Diagnosing the same set of patients' lesions with an advanced dermoscope resulted in an F2-score sensitivity of 89.5%, specificity of 57.8% and a positive predictive value of 59.9% (P = NS).

INTERPRETATION : DL processing of dermoscopic images followed by sonification yields an accurate diagnostic output for SMP, implying that the quality of the dermoscope is not the major factor influencing DL diagnosis of skin cancer. The present system might assist all healthcare providers as a feasible computer-assisted detection system. FUND: Bostel Technologies. Trial Registration Identifier: NCT03362138.
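The F2-score used in this study weights sensitivity (recall) twice as heavily as positive predictive value (precision). A minimal sketch of the general F-beta formula (our illustration, not the authors' code):

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """General F-beta score; beta = 2 doubles the weight on recall (sensitivity)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported SMP operating point: sensitivity 91.7%, positive predictive value 57.3%.
f2 = f_beta(precision=0.573, recall=0.917)  # ~0.82
```

With the reported SMP operating point this works out to roughly 0.82; setting beta back to 1 recovers the familiar F1-score.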

Dascalu A, David E O


Keywords: Artificial intelligence, Deep learning, Dermoscopy, Melanoma, Skin cancer, Sonification, Telemedicine

Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis.



Importance : Sepsis is a heterogeneous syndrome. Identification of distinct clinical phenotypes may allow more precise therapy and improve care.

Objective : To derive sepsis phenotypes from clinical data, determine their reproducibility and correlation with host-response biomarkers and clinical outcomes, and assess the potential causal relationship with results from randomized clinical trials (RCTs).

Design, Settings, and Participants : Retrospective analysis of data sets using statistical, machine learning, and simulation tools. Phenotypes were derived among 20 189 total patients (16 552 unique patients) who met Sepsis-3 criteria within 6 hours of hospital presentation at 12 Pennsylvania hospitals (2010-2012) using consensus k-means clustering applied to 29 variables. Reproducibility and correlation with biological parameters and clinical outcomes were assessed in a second database (2013-2014; n = 43 086 total patients and n = 31 160 unique patients), in a prospective cohort study of sepsis due to pneumonia (n = 583), and in 3 sepsis RCTs (n = 4737).

Exposures : All clinical and laboratory variables in the electronic health record.

Main Outcomes and Measures : Derived phenotype (α, β, γ, and δ) frequency, host-response biomarkers, 28-day and 365-day mortality, and RCT simulation outputs.

Results : The derivation cohort included 20 189 patients with sepsis (mean age, 64 [SD, 17] years; 10 022 [50%] male; mean maximum 24-hour Sequential Organ Failure Assessment [SOFA] score, 3.9 [SD, 2.4]). The validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). Of the 4 derived phenotypes, the α phenotype was the most common (n = 6625; 33%) and included patients with the lowest administration of a vasopressor; in the β phenotype (n = 5512; 27%), patients were older and had more chronic illness and renal dysfunction; in the γ phenotype (n = 5385; 27%), patients had more inflammation and pulmonary dysfunction; and in the δ phenotype (n = 2667; 13%), patients had more liver dysfunction and septic shock. Phenotype distributions were similar in the validation cohort. There were consistent differences in biomarker patterns by phenotype. In the derivation cohort, cumulative 28-day mortality was 287 deaths of 5691 unique patients (5%) for the α phenotype; 561 of 4420 (13%) for the β phenotype; 1031 of 4318 (24%) for the γ phenotype; and 897 of 2223 (40%) for the δ phenotype. Across all cohorts and trials, 28-day and 365-day mortality were highest among the δ phenotype vs the other 3 phenotypes (P < .001). In simulation models, the proportion of RCTs reporting benefit, harm, or no effect changed considerably (eg, varying the phenotype frequencies within an RCT of early goal-directed therapy changed the results from >33% chance of benefit to >60% chance of harm).

Conclusions and Relevance : In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.
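For readers unfamiliar with the clustering approach above, here is a self-contained sketch of consensus k-means on synthetic data. This illustrates the general idea (repeatedly cluster subsamples, then cluster the co-assignment frequencies); it is not the authors' pipeline, and all names and parameters are ours:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns a cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def consensus_kmeans(X, k, n_runs=20, subsample=0.8, seed=0):
    """Cluster many subsamples; co-assignment frequencies form the consensus matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # runs in which i and j share a cluster
    sampled = np.zeros((n, n))   # runs in which i and j are both subsampled
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    consensus = np.divide(together, sampled,
                          out=np.zeros_like(together), where=sampled > 0)
    # Final phenotype labels: k-means on the rows of the consensus matrix.
    return consensus, kmeans(consensus, k, rng)

# Three well-separated synthetic "phenotypes" in 5 standardized variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(40, 5)) for m in (-4, 0, 4)])
consensus, phenotype = consensus_kmeans(X, k=3)
```

Stable cluster structure shows up as consensus values near 0 or 1; values near the middle indicate patients whose assignment flips between runs.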

Seymour Christopher W, Kennedy Jason N, Wang Shu, Chang Chung-Chou H, Elliott Corrine F, Xu Zhongying, Berry Scott, Clermont Gilles, Cooper Gregory, Gomez Hernando, Huang David T, Kellum John A, Mi Qi, Opal Steven M, Talisa Victor, van der Poll Tom, Visweswaran Shyam, Vodovotz Yoram, Weiss Jeremy C, Yealy Donald M, Yende Sachin, Angus Derek C


Real-time analysis of the behaviour of groups of mice via a depth-sensing camera and machine learning.

In Nature Biomedical Engineering

Preclinical studies of psychiatric disorders use animal models to investigate the impact of environmental factors or genetic mutations on complex traits such as decision-making and social interactions. Here, we introduce a method for the real-time analysis of the behaviour of mice housed in groups of up to four over several days and in enriched environments. The method combines computer vision through a depth-sensing infrared camera, machine learning for animal and posture identification, and radio-frequency identification to monitor the quality of mouse tracking. It tracks multiple mice accurately, extracts a list of behavioural traits of both individuals and the groups of mice, and provides a phenotypic profile for each animal. We used the method to study the impact of Shank2 and Shank3 gene mutations, which are associated with autism, on mouse behaviour. Characterization and integration of data from the behavioural profiles of Shank2 and Shank3 mutant female mice revealed their distinctive activity levels and involvement in complex social interactions.

Fabrice de Chaumont, Elodie Ey, Nicolas Torquet, Thibault Lagache, Stéphane Dallongeville, Albane Imbert, Thierry Legou, Anne-Marie Le Sourd, Philippe Faure, Thomas Bourgeron & Jean-Christophe Olivo-Marin

Expert Commentary: AI For Radiation Oncology

About the author: Jean-Emmanuel Bibault, MD, PhD is a postdoctoral research fellow in the Laboratory of Artificial Intelligence in Medicine and Biomedical Physics at Stanford University and a radiation oncologist with a PhD in medical informatics.
Radiation oncology is one of the most structured fields of medicine, because every treatment must be planned and monitored on a computer. AI has been making significant progress in the field, including on image segmentation and treatment response prediction tasks.

1. Image segmentation for treatment planning
Before a patient can be treated with radiotherapy, they go through several steps, including a CT scan to capture their anatomy. On this scan, a radiation oncologist delineates the target volumes and all the surrounding organs in 3D: these volumes are critical for prescribing a radiation dose that achieves a therapeutic effect without unacceptable toxicity. This delineation process is very time consuming, because each CT slice must be manually contoured. Traditional atlas-based methods have been developed but still require significant human intervention. Mak et al. have recently investigated the use of crowd innovation to rapidly produce artificial intelligence (AI) solutions that replicate the accuracy of an expert radiation oncologist in segmenting lung tumors for radiotherapy (RT) targeting [1].
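Auto-segmentation results like those in [1] are typically scored against expert contours with an overlap metric such as the Dice coefficient. A minimal sketch on toy binary masks (our illustrative choice of metric; the paper reports its own evaluation):

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy 2D "contours": a square target volume and a slightly shifted prediction.
truth = np.zeros((10, 10), dtype=bool); truth[2:8, 2:8] = True  # 36 voxels
pred = np.zeros((10, 10), dtype=bool);  pred[3:9, 3:9] = True   # 36 voxels
score = dice(truth, pred)  # 25 overlapping voxels -> 50/72, about 0.69
```

A score of 1 means perfect overlap with the expert contour; in practice the same formula is applied slice-by-slice or to the full 3D volume.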

Figure 1. An example of three-dimensional manual segmentation for a pelvic malignancy
2. Treatment outcome prediction
Beyond tumor and organ segmentation, machine learning is also being used to predict treatment outcomes. This approach could guide treatment strategies to better personalize care. Our team worked on predicting pathological complete response from quantitative features extracted from treatment planning CT scans. We know that up to a quarter of patients with locally advanced rectal cancer can be considered cured by chemoradiation alone, yet today's standard of care requires a total mesorectal excision for all patients, with significant morbidity. Our goal was to identify the patients who could potentially avoid surgery. Using a deep neural network, we were able to identify these patients by correlating their radiomics profile with pathological complete response in a cohort of 96 patients (accuracy = 0.80; AUC = 0.72; 95% CI, 0.65 to 0.87) [2]. Going forward, the performance of our model will need to be validated on a much larger cohort, in a prospective manner.
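The AUC reported above can be read as the probability that a randomly chosen responder is ranked above a randomly chosen non-responder by the model. A minimal rank-based computation on made-up predicted risks (not the study's data):

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Count pairwise wins; a tie is worth half a win.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of pathological complete response.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.7, 0.6, 0.4, 0.2, 0.8, 0.65]
result = auc(y_score, y_true)  # 15 of 16 positive/negative pairs correctly ordered
```

An AUC of 0.5 is chance-level ranking, which is why the lower bound of a model's confidence interval is usually compared against 0.5.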
1. Mak RH, Endres MG, Paik JH, et al. Use of Crowd Innovation to Develop an Artificial Intelligence-Based Solution for Radiation Therapy Targeting. JAMA Oncol. April 2019. doi:10.1001/jamaoncol.2019.0159
2. Bibault J-E, Giraud P, Durdux C, et al. Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep. 2018;8(1):12611. doi:10.1038/s41598-018-30657-6

Stanford · 353 Serra Mall · Stanford, CA 94305-5008 · USA
