
Doctor Penguin Weekly

Welcome to the fifth week of the Doctor Penguin newsletter! 

Over the past week, these AI papers caught our attention: 
Zhou et al. use machine learning (Gaussian mixture models) to track movement during habituation and social interaction, revealing differences in social behaviours between SHANK3-mutant macaques and controls. Similarly, Capozzi et al. use machine learning methods to establish dependencies between leadership and gaze behaviors in group interactions. Bogard et al. use deep learning to predict alternative polyadenylation (APA), a major driver of transcriptome diversity in human cells, from DNA sequence alone. Lee et al. apply convolutional neural networks to raw protein sequences to predict drug-target interactions.
-- Eric Topol & Pranav Rajpurkar  

Quick Links:

  1. Atypical behaviour and connectivity in SHANK3-mutant macaques
  2. Tracking the Leader: Gaze Behavior in Group Interactions
  3. A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation
  4. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
  5. A machine learning approach to predicting psychosis using semantic density and latent content analysis
  6. Performance of a Deep-Learning Algorithm vs Manual Grading for Detecting Diabetic Retinopathy in India
  7. Assessment of Machine Learning Detection of Environmental Enteropathy and Celiac Disease in Children

Atypical behaviour and connectivity in SHANK3-mutant macaques.


In Nature

Mutation or disruption of the SH3 and ankyrin repeat domains 3 (SHANK3) gene represents a highly penetrant, monogenic risk factor for autism spectrum disorder, and is a cause of Phelan-McDermid syndrome. Recent advances in gene editing have enabled the creation of genetically engineered non-human-primate models, which might better approximate the behavioural and neural phenotypes of autism spectrum disorder than do rodent models, and may lead to more effective treatments. Here we report CRISPR-Cas9-mediated generation of germline-transmissible mutations of SHANK3 in cynomolgus macaques (Macaca fascicularis) and their F1 offspring. Genotyping of somatic cells as well as brain biopsies confirmed mutations in the SHANK3 gene and reduced levels of SHANK3 protein in these macaques. Analysis of data from functional magnetic resonance imaging revealed altered local and global connectivity patterns that were indicative of circuit abnormalities. The founder mutants exhibited sleep disturbances, motor deficits and increased repetitive behaviours, as well as social and learning impairments. Together, these results parallel some aspects of the dysfunctions in the SHANK3 gene and circuits, as well as the behavioural phenotypes, that characterize autism spectrum disorder and Phelan-McDermid syndrome.

Zhou Yang, Sharma Jitendra, Ke Qiong, Landman Rogier, Yuan Jingli, Chen Hong, Hayden David S, Fisher John W, Jiang Minqing, Menegas William, Aida Tomomi, Yan Ting, Zou Ying, Xu Dongdong, Parmar Shivangi, Hyman Julia B, Fanucci-Kiss Adrian, Meisner Olivia, Wang Dongqing, Huang Yan, Li Yaqing, Bai Yanyang, Ji Wenjing, Lai Xinqiang, Li Weiqiang, Huang Lihua, Lu Zhonghua, Wang Liping, Anteraper Sheeba A, Sur Mriganka, Zhou Huihui, Xiang Andy Peng, Desimone Robert, Feng Guoping, Yang Shihua

Tracking the Leader: Gaze Behavior in Group Interactions.

In iScience

Can social gaze behavior reveal the leader during real-world group interactions? To answer this question, we developed a novel tripartite approach combining (1) computer vision methods for remote gaze estimation, (2) a detailed taxonomy to encode the implicit semantics of multi-party gaze features, and (3) machine learning methods to establish dependencies between leadership and visual behaviors. We found that social gaze behavior distinctively identified group leaders. Crucially, the relationship between leadership and gaze behavior generalized across democratic and autocratic leadership styles under conditions of low and high time-pressure, suggesting that gaze can serve as a general marker of leadership. These findings provide the first direct evidence that group visual patterns can reveal leadership across different social behaviors and validate a new promising method for monitoring natural group interactions.

Capozzi Francesca, Beyan Cigdem, Pierro Antonio, Koul Atesh, Murino Vittorio, Livi Stefano, Bayliss Andrew P, Ristic Jelena, Becchio Cristina


Behavioral Neuroscience, Neuroscience, Social Interaction

A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation.


In Cell

Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.

Bogard Nicholas, Linder Johannes, Rosenberg Alexander B, Seelig Georg


MPRA, SNV, alternative polyadenylation, cis-regulation, deep learning, generative model, mRNA processing, machine learning, massively parallel reporter assay, single nucleotide variant, synthetic biology

DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences.


In PLoS computational biology

Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to not be sufficiently informative to predict accurate DTIs. Thus, in this study, we propose a deep learning based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acids subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at
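To make "convolution on various lengths of amino acid subsequences" concrete, here is a minimal NumPy sketch, not the authors' code: a protein is one-hot encoded, convolved with filters of several window widths, passed through a ReLU, and globally max-pooled so proteins of any length map to a fixed-size vector. The window sizes (10, 15, 20), the filter count, and the random untrained weights are all our assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

# 20 standard amino acids; the index order is an arbitrary illustrative choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein as a (length, 20) one-hot matrix; unknown residues stay all-zero."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq.upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            x[pos, idx] = 1.0
    return x


def conv1d_valid(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """'Valid' 1D convolution (cross-correlation) of x (L, C) with kernels (F, W, C) -> (L - W + 1, F)."""
    n_filters, width, _ = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, n_filters))
    for i in range(out_len):
        # Each filter scores one window of `width` consecutive residues.
        out[i] = np.tensordot(kernels, x[i:i + width], axes=([1, 2], [0, 1]))
    return out


def protein_features(seq, window_sizes=(10, 15, 20), n_filters=8, seed=0):
    """Hypothetical multi-window CNN feature extractor with untrained random filters."""
    if len(seq) < max(window_sizes):
        raise ValueError("sequence is shorter than the largest window")
    rng = np.random.default_rng(seed)
    x = one_hot(seq)
    pooled = []
    for w in window_sizes:
        kernels = rng.normal(size=(n_filters, w, len(AMINO_ACIDS)))
        acts = np.maximum(conv1d_valid(x, kernels), 0.0)  # ReLU
        pooled.append(acts.max(axis=0))  # global max pool over sequence positions
    return np.concatenate(pooled)
```

The output always has len(window_sizes) * n_filters entries (24 with the defaults), whatever the protein length. Max pooling also records where each filter fired most strongly, which is the kind of signal the authors inspect when they report that pooled convolution results point to binding sites.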

Lee Ingoo, Keum Jongsoo, Nam Hojung


A machine learning approach to predicting psychosis using semantic density and latent content analysis.



In NPJ schizophrenia

Subtle features in people's everyday language may harbor the signs of future mental illness. Machine learning offers an approach for the rapid and accurate extraction of these signs. Here we investigate two potential linguistic indicators of psychosis in 40 participants of the North American Prodrome Longitudinal Study. We demonstrate how the linguistic marker of semantic density can be obtained using the mathematical method of vector unpacking, a technique that decomposes the meaning of a sentence into its core ideas. We also demonstrate how the latent semantic content of an individual's speech can be extracted by contrasting it with the contents of conversations generated on social media, here 30,000 contributors to Reddit. The results revealed that conversion to psychosis is signaled by low semantic density and talk about voices and sounds. When combined, these two variables were able to predict the conversion with 93% accuracy in the training and 90% accuracy in the holdout datasets. The results point to a larger project in which automated analyses of language are used to forecast a broad range of mental disorders well in advance of their emergence.

Rezaii Neguine, Walker Elaine, Wolff Phillip


Performance of a Deep-Learning Algorithm vs Manual Grading for Detecting Diabetic Retinopathy in India.


In JAMA ophthalmology

Importance : More than 60 million people in India have diabetes and are at risk for diabetic retinopathy (DR), a vision-threatening disease. Automated interpretation of retinal fundus photographs can help support and scale a robust screening program to detect DR.

Objective : To prospectively validate the performance of an automated DR system across 2 sites in India.

Design, Setting, and Participants : This prospective observational study was conducted at 2 eye care centers in India (Aravind Eye Hospital and Sankara Nethralaya) and included 3049 patients with diabetes. Data collection and patient enrollment took place between April 2016 and July 2016 at Aravind and May 2016 and April 2017 at Sankara Nethralaya. The model was trained and fixed in March 2016.

Interventions : Automated DR grading system compared with manual grading by 1 trained grader and 1 retina specialist from each site. Adjudication by a panel of 3 retinal specialists served as the reference standard in the cases of disagreement.

Main Outcomes and Measures : Sensitivity and specificity for moderate or worse DR or referable diabetic macular edema.

Results : Of 3049 patients, 1091 (35.8%) were women and the mean (SD) age for patients at Aravind and Sankara Nethralaya was 56.6 (9.0) years and 56.0 (10.0) years, respectively. For moderate or worse DR, the sensitivity and specificity for manual grading by individual nonadjudicator graders ranged from 73.4% to 89.8% and from 83.5% to 98.7%, respectively. The automated DR system's performance was equal to or exceeded manual grading, with an 88.9% sensitivity (95% CI, 85.8-91.5), 92.2% specificity (95% CI, 90.3-93.8), and an area under the curve of 0.963 on the data set from Aravind Eye Hospital and 92.1% sensitivity (95% CI, 90.1-93.8), 95.2% specificity (95% CI, 94.2-96.1), and an area under the curve of 0.980 on the data set from Sankara Nethralaya.
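For readers less familiar with these metrics, here is a minimal Python sketch of how sensitivity, specificity, and a 95% confidence interval follow from confusion-matrix counts. The abstract does not state which interval method the authors used; the Wilson score interval is our assumption, and the counts in the usage line are hypothetical, not data from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); each with a Wilson CI."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, wilson_interval(tp, tp + fn)),
        "specificity": (spec, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts for illustration only.
result = sensitivity_specificity(tp=80, fn=10, tn=900, fp=60)
```

Sensitivity depends only on patients who truly have referable disease and specificity only on those who do not, which is why each gets its own interval and why a site with fewer positive cases will show a wider sensitivity CI.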

Conclusions and Relevance : This study shows that the automated DR system generalizes to this population of Indian patients in a prospective setting and demonstrates the feasibility of using an automated DR grading system to expand screening programs.

Gulshan Varun, Rajan Renu P, Widner Kasumi, Wu Derek, Wubbels Peter, Rhodes Tyler, Whitehouse Kira, Coram Marc, Corrado Greg, Ramasamy Kim, Raman Rajiv, Peng Lily, Webster Dale R


Assessment of Machine Learning Detection of Environmental Enteropathy and Celiac Disease in Children.


In JAMA network open

Importance : Duodenal biopsies from children with enteropathies associated with undernutrition, such as environmental enteropathy (EE) and celiac disease (CD), display significant histopathological overlap.

Objective : To develop a convolutional neural network (CNN) to enhance the detection of pathologic morphological features in diseased vs healthy duodenal tissue.

Design, Setting, and Participants : In this prospective diagnostic study, a CNN consisting of 4 convolutions, 1 fully connected layer, and 1 softmax layer was trained on duodenal biopsy images. Data were provided by 3 sites: Aga Khan University Hospital, Karachi, Pakistan; University Teaching Hospital, Lusaka, Zambia; and University of Virginia, Charlottesville. Duodenal biopsy slides from 102 children (10 with EE from Aga Khan University Hospital, 16 with EE from University Teaching Hospital, 34 with CD from University of Virginia, and 42 with no disease from University of Virginia) were converted into 3118 images. The CNN was designed and analyzed at the University of Virginia. The data were collected, prepared, and analyzed between November 2017 and February 2018.

Main Outcomes and Measures : Classification accuracy of the CNN per image and per case and incorrect classification rate identified by aggregated 10-fold cross-validation confusion/error matrices of CNN models.
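A minimal NumPy sketch of what "aggregated 10-fold cross-validation confusion/error matrices" can mean: each fold's confusion matrix is summed into one matrix, from which overall accuracy and per-class error rates are read off. Labels are assumed to be integer-coded 0..n-1 (for example CD, EE, and normal), and the majority-vote rule for turning image-level predictions into a per-case label is our hypothetical choice; the abstract does not describe how the authors aggregated images per case.

```python
from collections import Counter

import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m


def aggregate_folds(fold_results, n_classes):
    """Sum the per-fold confusion matrices; fold_results holds (y_true, y_pred) pairs."""
    total = np.zeros((n_classes, n_classes), dtype=int)
    for y_true, y_pred in fold_results:
        total += confusion_matrix(y_true, y_pred, n_classes)
    return total


def summarize(total):
    """Overall accuracy and per-class error rate (share of each true class misassigned)."""
    accuracy = np.trace(total) / total.sum()
    per_class_error = 1 - np.diag(total) / total.sum(axis=1)  # assumes every class occurs
    return accuracy, per_class_error


def case_label(image_preds):
    """Hypothetical per-case rule: majority vote over one case's image-level predictions."""
    return Counter(image_preds).most_common(1)[0][0]
```

Summing counts before computing rates weights every held-out image equally, rather than averaging fold-level accuracies, and the off-diagonal cells show directly which pairs of classes (here, CD versus healthy in the study) are most often confused.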

Results : Overall, 102 children participated in this study, with a median (interquartile range) age of 31.0 (20.3-75.5) months and a roughly equal sex distribution, with 53 boys (51.9%). The model demonstrated 93.4% case-detection accuracy and had a false-negative rate of 2.4%. Confusion metrics indicated most incorrect classifications were between patients with CD and healthy patients. Visualized feature map activations showed that the model learned distinctive patterns, including microlevel features in duodenal tissues, such as alterations in secretory cell populations.

Conclusions and Relevance : A machine learning-based histopathological analysis model demonstrating 93.4% classification accuracy was developed for identifying and differentiating between duodenal biopsies from children with EE and CD. The combination of the CNN with a deconvolutional network enabled feature recognition and highlighted secretory cells' role in the model's ability to differentiate between these histologically similar diseases.

Syed Sana, Al-Boni Mohammad, Khan Marium N, Sadiq Kamran, Iqbal Najeeha T, Moskaluk Christopher A, Kelly Paul, Amadi Beatrice, Ali S Asad, Moore Sean R, Brown Donald E


Stanford · 353 Serra Mall · Stanford, CA 94305-5008 · USA
