
Doctor Penguin Weekly

Welcome to the third week of the Doctor Penguin newsletter!

Over the past week, these AI papers caught our attention: 

Cristiano et al. developed a machine learning model that detects cancer from genome-wide cell-free DNA (cfDNA) fragmentation features. Blood was collected from healthy individuals and patients with cancer; cfDNA was extracted from plasma, processed into sequencing libraries, examined by whole-genome sequencing (WGS), and mapped to the genome to determine fragmentation profiles across the genome. Machine learning was then used to classify whether individuals had cancer and to identify the tumor tissue of origin. Sundaram et al. trained a deep convolutional neural network purely on tactile information to identify or weigh objects and to explore the tactile signatures of the human grasp. Mishima et al. evaluated Face2Gene, a previously developed deep learning-based diagnosis assistance system that uses patients’ facial images to suggest candidate congenital dysmorphic syndromes, on a patient population in Japan to investigate the effect of ethnicity and age on the system.
-- Eric Topol & Pranav Rajpurkar  

Quick Links:

  1. Genome-wide cell-free DNA fragmentation in patients with cancer
  2. Learning the signatures of the human grasp using a scalable tactile glove
  3. Evaluation of Face2Gene using facial images of patients with congenital dysmorphic syndromes recruited in Japan
  4. Deep Learning Convolutional Neural Networks for the Automatic Quantification of Muscle Fat Infiltration Following Whiplash Injury
  5. A study of deep learning approaches for medication and adverse drug event extraction from clinical text
  6. Capturing the differences between humoral immunity in the normal and tumor environments from repertoire-seq of B-cell receptors using supervised machine learning

In Nature

Cell-free DNA in the blood provides a non-invasive diagnostic avenue for patients with cancer [1]. However, characteristics of the origins and molecular features of cell-free DNA are poorly understood. Here we developed an approach to evaluate fragmentation patterns of cell-free DNA across the genome, and found that profiles of healthy individuals reflected nucleosomal patterns of white blood cells, whereas patients with cancer had altered fragmentation profiles. We used this method to analyse the fragmentation profiles of 236 patients with breast, colorectal, lung, ovarian, pancreatic, gastric or bile duct cancer and 245 healthy individuals. A machine learning model that incorporated genome-wide fragmentation features had sensitivities of detection ranging from 57% to more than 99% among the seven cancer types at 98% specificity, with an overall area under the curve value of 0.94. Fragmentation profiles could be used to identify the tissue of origin of the cancers to a limited number of sites in 75% of cases. Combining our approach with mutation-based cell-free DNA analyses detected 91% of patients with cancer. The results of these analyses highlight important properties of cell-free DNA and provide a proof-of-principle approach for the screening, early detection and monitoring of human cancer.

Stephen Cristiano, Alessandro Leal, Jillian Phallen, Jacob Fiksel, Vilmos Adleff, Daniel C. Bruhm, Sarah Østrup Jensen, Jamie E. Medina, Carolyn Hruban, James R. White, Doreen N. Palsgrove, Noushin Niknafs, Valsamo Anagnostou, Patrick Forde, Jarushka Naidoo, Kristen Marrone, Julie Brahmer, Brian D. Woodward, Hatim Husain, Karlijn L. van Rooijen, Mai-Britt Worm Ørntoft, Anders Husted Madsen, Cornelis J. H. van de Velde, Marcel Verheij, Annemieke Cats, Cornelis J. A. Punt, Geraldine R. Vink, Nicole C. T. van Grieken, Miriam Koopman, Remond J. A. Fijneman, Julia S. Johansen, Hans Jørgen Nielsen, Gerrit A. Meijer, Claus Lindbjerg Andersen, Robert B. Scharpf & Victor E. Velculescu
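
The abstract above reports sensitivity at a fixed 98% specificity. As a minimal sketch of how such a figure can be computed, assume each sample has a classifier score where higher means more cancer-like: the cutoff is chosen from the healthy scores alone, and sensitivity is then measured on the cancer scores. The scores and function names below are hypothetical illustrations, not data or code from the paper.

```python
import math

def threshold_at_specificity(healthy_scores, specificity=0.98):
    """Smallest cutoff with at least `specificity` of healthy scores at or below it."""
    ranked = sorted(healthy_scores)
    # Index of the last healthy score that must not be called positive.
    k = math.ceil(specificity * len(ranked)) - 1
    return ranked[k]

def sensitivity(cancer_scores, cutoff):
    """Fraction of cancer samples scoring strictly above the cutoff."""
    return sum(s > cutoff for s in cancer_scores) / len(cancer_scores)

# Hypothetical scores (assumption): 50 healthy samples, 8 cancer samples.
healthy = [i / 100 for i in range(50)]
cancer = [0.30, 0.45, 0.52, 0.60, 0.75, 0.80, 0.90, 0.95]

cutoff = threshold_at_specificity(healthy, 0.98)
print("cutoff:", cutoff)
print("sensitivity at 98% specificity:", sensitivity(cancer, cutoff))
```

Fixing the cutoff on healthy samples first is what makes sensitivities comparable across cancer types: every type is judged against the same false-positive budget.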



In Nature

Humans can feel, weigh and grasp diverse objects, and simultaneously infer their material properties while applying the right amount of force, a challenging set of tasks for a modern robot [1]. Mechanoreceptor networks that provide sensory feedback and enable the dexterity of the human grasp [2] remain difficult to replicate in robots. Whereas computer-vision-based robot grasping strategies [3-5] have progressed substantially with the abundance of visual data and emerging machine-learning tools, there are as yet no equivalent sensing platforms and large-scale datasets with which to probe the use of the tactile information that humans rely on when grasping objects. Studying the mechanics of how humans grasp objects will complement vision-based robotic object handling. Importantly, the inability to record and analyse tactile signals currently limits our understanding of the role of tactile information in the human grasp itself; for example, how tactile maps are used to identify objects and infer their properties is unknown [6]. Here we use a scalable tactile glove and deep convolutional neural networks to show that sensors uniformly distributed over the hand can be used to identify individual objects, estimate their weight and explore the typical tactile patterns that emerge while grasping objects. The sensor array (548 sensors) is assembled on a knitted glove, and consists of a piezoresistive film connected by a network of conductive thread electrodes that are passively probed. Using a low-cost (about US$10) scalable tactile glove sensor array, we record a large-scale tactile dataset with 135,000 frames, each covering the full hand, while interacting with 26 different objects. This set of interactions with different objects reveals the key correspondences between different regions of a human hand while it is manipulating objects. Insights from the tactile signatures of the human grasp, seen through the lens of an artificial analogue of the natural mechanoreceptor network, can thus aid the future design of prosthetics [7], robot grasping tools and human-robot interactions [1,8-10].

Subramanian Sundaram, Petr Kellnhofer, Yunzhu Li, Jun-Yan Zhu, Antonio Torralba & Wojciech Matusik 



In Journal of Human Genetics

An increasing number of genetic syndromes present a challenge to clinical geneticists. A deep learning-based diagnosis assistance system, Face2Gene, utilizes the aggregation of "gestalt," comprising data summarizing features of patients' facial images, to suggest candidate syndromes. Because Face2Gene's results may be affected by ethnicity and age at which training facial images were taken, the system performance for patients in Japan is still unclear. Here, we present an evaluation of Face2Gene using the following two patient groups recruited in Japan: Group 1 consisting of 74 patients with 47 congenital dysmorphic syndromes, and Group 2 consisting of 34 patients with Down syndrome. In Group 1, facial recognition failed for 4 of 74 patients, while 13-21 of 70 patients had a diagnosis for which Face2Gene had not been trained. Omitting these 21 patients, for 85.7% (42/49) of the remainder, the correct syndrome was identified within the top 10 suggested list. In Group 2, for the youngest facial images taken for each of the 34 patients, Down syndrome was successfully identified as the highest-ranking condition using images taken from newborns to those aged 25 years. For the oldest facial images taken at ≥20 years in each of 17 applicable patients, Down syndrome was successfully identified as the highest- and second-highest-ranking condition in 82.2% (14/17) and 100% (17/17) of the patients using images taken from 20 to 40 years. These results suggest that Face2Gene in its current format is already useful in suggesting candidate syndromes to clinical geneticists, using patients with congenital dysmorphic syndromes in Japan.

Mishima Hiroyuki, Suzuki Hisato, Doi Michiko, Miyazaki Mutsuko, Watanabe Satoshi, Matsumoto Tadashi, Morifuji Kanako, Moriuchi Hiroyuki, Yoshiura Koh-Ichiro, Kondoh Tatsuro, Kosaki Kenjiro
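
The "within the top 10 suggested list" figure in the abstract above is a top-k accuracy. A minimal sketch of that metric follows, assuming each patient has a ranked list of suggested syndromes and one confirmed diagnosis. The function name and the placeholder syndrome labels are hypothetical; only the 42/49 count comes from the abstract.

```python
def top_k_accuracy(ranked_suggestions, true_labels, k=10):
    """Fraction of cases whose true label appears among the first k suggestions."""
    hits = sum(
        truth in suggestions[:k]
        for suggestions, truth in zip(ranked_suggestions, true_labels)
    )
    return hits / len(true_labels)

# Hypothetical ranked outputs for four patients (placeholder syndrome names).
ranked = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["B", "C", "A"],
]
truth = ["A", "A", "A", "A"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)

# Arithmetic behind the abstract's figure: 42 of 49 evaluable patients.
reported = round(100 * 42 / 49, 1)
print(top1, top2, reported)
```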



In Scientific Reports

Muscle fat infiltration (MFI) of the deep cervical spine extensors has been observed in cervical spine conditions using time-consuming and rater-dependent manual techniques. Deep learning convolutional neural network (CNN) models have demonstrated state-of-the-art performance in segmentation tasks. Here, we train and test a CNN for muscle segmentation and automatic MFI calculation using high-resolution fat-water images from 39 participants (26 female, average = 31.7 ± 9.3 years) 3 months post whiplash injury. First, we demonstrate high test reliability and accuracy of the CNN compared to manual segmentation. Then we explore the relationships between CNN muscle volume, CNN MFI, and clinical measures of pain and neck-related disability. Across all participants, we demonstrate that CNN muscle volume was negatively correlated with pain (R = -0.415, p = 0.006) and disability (R = -0.286, p = 0.045), while CNN MFI tended to be positively correlated with disability (R = 0.214, p = 0.105). Additionally, CNN MFI was higher in participants with persisting pain and disability (p = 0.049). Overall, CNNs may improve the efficiency and objectivity of muscle measures, allowing for the quantitative monitoring of muscle properties in disorders of and beyond the cervical spine.

Weber Kenneth A, Smith Andrew C, Wasielewski Marie, Eghtesad Kamran, Upadhyayula Pranav A, Wintermark Max, Hastie Trevor J, Parrish Todd B, Mackey Sean, Elliott James M
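
The R values in the abstract above are correlation coefficients between a CNN-derived muscle measure and a clinical score. The abstract does not say which coefficient was used; the sketch below assumes Pearson's r and computes it from first principles. The muscle volumes and pain scores are made-up illustration values, not study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical values (assumption): larger muscle volume, lower pain score.
muscle_volume = [10.0, 12.0, 14.0, 16.0, 18.0]
pain_score = [7, 6, 6, 4, 3]

r = pearson_r(muscle_volume, pain_score)
print(f"R = {r:.3f}")
```

A negative r, as in this toy example, is the same direction of association the abstract reports between muscle volume and pain.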



In Journal of the American Medical Informatics Association (JAMIA)

OBJECTIVE: This article presents our approaches to the extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.

MATERIALS AND METHODS: The clinical corpus used in this study was from the MIMIC-III database, and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) component and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (e.g., BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely conditional random fields for NER and support vector machines for RC. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve performance, we also investigated ensemble approaches that combine the outputs of multiple approaches.

RESULTS: Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches outperformed traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.

CONCLUSION: In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their potential use in broader NER and RC tasks in the medical domain.

Wei Qiang, Ji Zongcheng, Li Zhiheng, Du Jingcheng, Wang Jingqi, Xu Jun, Xiang Yang, Tiryaki Firat, Wu Stephen, Zhang Yaoyun, Tao Cui, Xu Hua


Keywords: adverse drug events, deep learning, electronic health records, named entity recognition, relation extraction
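
The F1 scores reported in the abstract above combine precision and recall over extracted entities. As a rough sketch, assuming strict matching where a predicted entity counts only if its span and type both equal a gold annotation, the metric can be computed as below. The spans, type labels, and function name are hypothetical, and the official n2c2 scorer may differ in its matching rules.

```python
def precision_recall_f1(gold, predicted):
    """Strict-match precision, recall and F1 over sets of (start, end, type) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical annotations for one note (assumption, not from the corpus).
gold = {(0, 9, "Drug"), (15, 19, "Strength"), (30, 34, "ADE"), (40, 52, "Drug")}
pred = {(0, 9, "Drug"), (15, 19, "Strength"), (30, 34, "ADE")}

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.4f}")
```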


In BMC Bioinformatics

BACKGROUND: The recent success of immunotherapy in treating tumors has attracted increasing interest in research related to the adaptive immune system in the tumor microenvironment. Recent advances in next-generation sequencing technology enabled the sequencing of whole T-cell receptors (TCRs) and B-cell receptors (BCRs)/immunoglobulins (Igs) in the tumor microenvironment. Since BCRs/Igs in tumor tissues have high affinities for tumor-specific antigens, the patterns of their amino acid sequences and other sequence-independent features such as the number of somatic hypermutations (SHMs) may differ between the normal and tumor microenvironments. However, given the high diversity of BCRs/Igs and the rarity of recurrent sequences among individuals, it is far more difficult to capture such differences in BCR/Ig sequences than in TCR sequences. The aim of this study was to explore the possibility of discriminating BCRs/Igs in tumor and in normal tissues, by capturing these differences using supervised machine learning methods applied to RNA sequences of BCRs/Igs.

RESULTS: RNA sequences of BCRs/Igs were obtained from matched normal and tumor specimens from 90 gastric cancer patients. BCR/Ig features obtained in Rep-Seq were used to classify individual BCR/Ig sequences into normal or tumor classes. Different machine learning models using various features were constructed, as well as a gradient boosting machine (GBM) classifier combining these models. The results demonstrated that BCR/Ig sequences differ between normal and tumor microenvironments. Next, using a GBM trained to classify individual BCR/Ig sequences, we tried to classify sets of BCR/Ig sequences into normal or tumor classes. This achieved an area under the curve (AUC) value of 0.826, suggesting that BCR/Ig repertoires have distinct sequence-level features in normal and tumor tissues.

CONCLUSIONS: To the best of our knowledge, this is the first study to show that BCR/Ig sequences derived from tumor and normal tissues have globally distinct patterns, and that these tissues can be effectively differentiated using BCR/Ig repertoires.

Konishi Hiroki, Komura Daisuke, Katoh Hiroto, Atsumi Shinichiro, Koda Hirotomo, Yamamoto Asami, Seto Yasuyuki, Fukayama Masashi, Yamaguchi Rui, Imoto Seiya, Ishikawa Shumpei


Keywords: B-cell receptor/immunoglobulin, Cancer, Machine learning
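
The abstract above describes scoring individual sequences and then classifying whole sets of sequences, evaluated by AUC. It does not say how per-sequence outputs were combined. The sketch below assumes the simplest option, averaging per-sequence tumor probabilities within each sample, and computes AUC as the fraction of tumor/normal sample pairs that are ranked correctly. All probabilities and function names are hypothetical.

```python
from statistics import mean

def sample_score(sequence_probs):
    """Aggregate per-sequence tumor probabilities into one score (mean is an assumption)."""
    return mean(sequence_probs)

def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-sequence probabilities for three tumor and three normal samples.
tumor_samples = [[0.9, 0.7, 0.8], [0.6, 0.8], [0.4, 0.5, 0.3]]
normal_samples = [[0.2, 0.4], [0.5, 0.6, 0.7], [0.1, 0.3, 0.2]]

tumor_scores = [sample_score(s) for s in tumor_samples]
normal_scores = [sample_score(s) for s in normal_samples]
result = auc(tumor_scores, normal_scores)
print(f"AUC = {result:.3f}")
```

The pairwise definition used here is equivalent to the area under the ROC curve and avoids having to pick any classification threshold.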

Stanford · 353 Serra Mall · Stanford, CA 94305-5008 · USA
