Co-curators Sarah Catanzaro and Naren Krishna ● October 3, 2019


Regression Planning Networks

by Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent machine learning approaches are often contrasted with symbolic programming approaches, in which rules and states are manually encoded. However, both have limitations when applied to planning problems (e.g. those that require an agent to evaluate long-term strategies). Xu et al. combine the best of both approaches, presenting a learning-to-plan method, Regression Planning Networks, that generates a long-term plan towards a symbolic task goal based on environmental observations. Their approach implements backward planning in a symbolic space, searching for a path that connects the final goal to the current observation. Read the paper >>
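To make "backward planning in a symbolic space" concrete, here is a minimal sketch of classical goal regression in Python. It is not the authors' method: Regression Planning Networks learn to predict subgoals from observations, whereas this toy version uses a hand-written action table. The domain, the action names, and the helpers `regress` and `backward_plan` are all hypothetical, and actions are assumed to have no delete effects.

```python
from collections import deque

# Hypothetical toy domain: every action lists the symbols it requires ("pre")
# and the symbols it makes true ("add"). There are no delete effects.
ACTIONS = {
    "open_fridge": {"pre": frozenset(), "add": frozenset({"fridge_open"})},
    "grab_food": {"pre": frozenset({"fridge_open"}), "add": frozenset({"holding_food"})},
    "cook_food": {"pre": frozenset({"holding_food"}), "add": frozenset({"food_cooked"})},
}


def regress(goal, action):
    """Return the subgoal that must hold before `action` so that `goal`
    holds afterwards, or None if the action achieves no part of the goal."""
    if not (action["add"] & goal):
        return None
    return (goal - action["add"]) | action["pre"]


def backward_plan(state, goal):
    """Breadth-first search from the goal back towards the current state.
    Returns a list of action names in execution order, or None."""
    state, goal = frozenset(state), frozenset(goal)
    queue = deque([(goal, [])])
    seen = {goal}
    while queue:
        subgoal, plan = queue.popleft()
        if subgoal <= state:  # the current state already satisfies this subgoal
            return plan
        for name, action in ACTIONS.items():
            earlier = regress(subgoal, action)
            if earlier is not None and earlier not in seen:
                seen.add(earlier)
                # The action runs before everything already in the plan.
                queue.append((earlier, [name] + plan))
    return None


plan_from_scratch = backward_plan(set(), {"food_cooked"})
plan_fridge_open = backward_plan({"fridge_open"}, {"food_cooked"})
print(plan_from_scratch)
print(plan_fridge_open)
```

The search starts at the final goal and stops as soon as it reaches a subgoal the current state already satisfies, so a state that is closer to the goal yields a shorter plan.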

Interpretations are useful: penalizing explanations to align neural networks with prior knowledge

by Laura Rieger, Chandan Singh, W. James Murdoch, Bin Yu

While most approaches to deep learning explanations offer some insight into the model, fewer suggest actions for model improvement. Rieger et al. present contextual decomposition explanation penalization (CDEP), which enables the user to directly penalize certain feature importances and interactions once a model has been shown to assign importance incorrectly, thereby pushing the network to produce both the correct prediction and the correct explanation. They show how CDEP can be applied to make models ignore spurious confounding variables and/or bias in classification tasks. Read the paper >>

Learning the Difference that Makes a Difference with Counterfactually-Augmented Data

by Divyansh Kaushik, Eduard Hovy, Zachary Lipton

Kaushik, Hovy, and Lipton define spurious associations as those that arise due to a confounding cause (and not direct or indirect effects). They introduce methods and resources for training models to avoid spurious associations, including a human-in-the-loop system to counterfactually manipulate documents (e.g. where annotators edit documents to make the targeted/counterfactual class applicable with minimal changes). They demonstrate that sentiment classifiers (for IMDB reviews) become more accurate when trained on the revised dataset. They also release two datasets of counterfactual revisions (to the IMDB and SNLI datasets). Read the paper >>



When building applications, developers often search for code (some say this is why they drink so much whiskey). Machine learning can improve this process; however, benchmarks to evaluate code search engines did not exist until now. GitHub and Weights & Biases have released the CodeSearchNet Challenge evaluation environment and leaderboard, along with a dataset and baseline models, to facilitate the development of higher quality code search tools. View the repo >> and Read the blog >>



After nearly seven years of development to ensure stability, performance, and cost efficiency, Tencent has open-sourced TubeMQ, its distributed messaging queue system, which enables high-performance storage and transmission of very large datasets. View the repo >>

TensorFlow 2.0

The much-anticipated final release of TensorFlow 2.0 has arrived! TensorFlow 2.0 comes with default eager execution (operations return concrete values instead of constructing computational graphs to run later) and interfaces for low-level operations to facilitate development and debugging. Multi-GPU support and distributed training with minimal code changes offer increased training and inference performance. Additionally, ‘TensorFlow.js’ and ‘Swift for TensorFlow’ enable development not only in Python, but also in JavaScript and Swift, which, in conjunction with a standardized SavedModel format, enables easy deployment across many domains -- cloud/web/mobile/embedded systems. View the blog post >>
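As a small illustration of what default eager execution means in practice, the sketch below multiplies a matrix by itself and reads the result back immediately, with no session or explicit graph. It assumes TensorFlow 2.x is installed; the variable names are arbitrary.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is on by default, so each operation
# runs as soon as it is called and returns a concrete tensor.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

# The value is available right away as a NumPy array.
result = y.numpy()
print(tf.executing_eagerly())
print(result)
```

In TensorFlow 1.x the same computation would typically have built a graph first and evaluated it later inside a session.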

Models for integrating data science teams within organizations

Pardis Noorzad, Medium

Pardis evaluates popular organizational models for data science teams based on coordination efficiency, management success, employee happiness, and product success. She considers the advantages and disadvantages of the following models: Centers of Excellence (most centralized), Accounting (DS team focuses on BI), Consultant (DS respond to tickets issued by other teams), Embedded (DS hired by and engaged with product teams), Democratic (self-serve analytics), and Product Data Science (DS are part of both a product team and the DS function). Read the post >>

5 Problems to Solve to Unlock Peak Performance in Machine Learning Models

Megan DeLaunay and Erin Babinsky, Capital One

Megan DeLaunay and Erin Babinsky provide a few tips for machine learning teams on collecting labeled data, including advice on defining labeling schema, allocating annotators to tasks, and determining how much labeled data is necessary. Read the post >>

Streamlit launches open-source machine learning application development framework

Ron Miller, TechCrunch

To explore and analyze machine learning model inputs and outputs, model developers must often create custom interfaces (or solicit help from their tools team to build more elegant front-ends). To make this task easier, Streamlit has released an app framework that provides model developers with a set of “building blocks” to create ML apps for understanding and interacting with data. Read the article >>

Thanks for reading Projects To Know! If you have friends and colleagues who may be interested in subscribing, they can sign up here.

This is a weekly edition of the Projects To Know Newsletter, which features papers, OSS projects and select news stories that are playing a meaningful role in the future of software development.

If you have any suggestions on papers, projects or content to include, we would love to hear from you at