7 Sessions at ODSC East 2023 to Help You Perform NLP Better

ODSC - Open Data Science
5 min read · Apr 11, 2023

A lot goes into NLP. Languages, dialects, unstructured data, and unique business needs all demand constant innovation from the field. Beyond NLP platforms and skills alone, expertise in novel processes and staying abreast of the latest research are becoming pivotal for effective NLP implementation. We looked at a number of NLP sessions coming to ODSC East this May 9th-11th that highlight changes in this growing field and can help you perform NLP better.

From Big Data to NLP insights: Getting started with PySpark and Spark NLP

The amount of data being generated today is staggering and growing. Over the last few years, Apache Spark has emerged as the de facto tool for analyzing big data and is now a critical part of the data science toolbox. Text data is also becoming more common as new techniques for working with it gain popularity.

This workshop will introduce you to the fundamentals of PySpark (Spark’s Python API), the Spark NLP library, and other best practices in Spark programming when working with textual or natural language data.

Speaker: Akash Tandon | Co-Founder and Co-author of Advanced Analytics with PySpark | Looppanel and O’Reilly Media

Self-Supervised and Unsupervised Learning for Conversational AI and NLP

Self-supervised and unsupervised learning techniques such as few-shot and zero-shot learning are changing the shape of the AI research and product communities. These techniques have advanced multiple fields in AI, such as NLP, computer vision, and robotics.

In this talk, Chandra will give some background on conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. Chandra will walk the audience through hands-on examples showing how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization, and question answering.

Speaker: Chandra Khatri | Chief Scientist, Head of AI, and Co-Founder | Got It AI
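
To make the few-shot and zero-shot idea from this abstract a bit more concrete, here is a minimal sketch of how a prompt for a large language model might be assembled. It is not taken from the talk: the function name `build_few_shot_prompt`, the prompt template, and the example reviews are all assumptions made up for illustration, and no model is actually called.

```python
# Illustrative sketch only: the template, labels, and reviews are hypothetical.

def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment of each review."):
    """Format labeled examples plus one unlabeled query into a single prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item has no label; a language model would be asked to complete it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("The battery lasts all day.", "positive"),
    ("It broke after a week.", "negative"),
]

# Few-shot: a handful of labeled examples precede the query.
few_shot_prompt = build_few_shot_prompt(examples, "Setup was quick and painless.")

# Zero-shot: the same template with no labeled examples at all.
zero_shot_prompt = build_few_shot_prompt([], "Setup was quick and painless.")

print(few_shot_prompt)
```

Passing an empty list of examples gives the zero-shot version of the same prompt, which is the only difference between the two settings in this sketch.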

Hyper-productive NLP with Hugging Face Transformers

In this workshop, you’ll walk through a complete end-to-end example of using Hugging Face Transformers, involving both our open-source libraries and some of our commercial products. Starting from a dataset containing real-life product reviews from Amazon.com, you’ll train and deploy a text classification model predicting the star rating for similar reviews.

Speaker: Julien Simon | Chief Evangelist | Hugging Face

Interpreting Features in Deep Networks

Despite significant advances in interpretable machine learning in recent years, many ML models, especially deep networks, remain difficult to understand and control. One promising new direction in interpretable deep learning aims to understand models by understanding their learned features and internal representations.

This tutorial will survey state-of-the-art techniques for feature-level interpretability, with a focus on vision and language processing applications. We’ll learn how to automatically discover and describe the function of individual neurons within deep networks, and use these descriptions to identify model failures and improve their robustness. This tutorial is targeted at learners who have experience with neural network models and are interested in gaining a deeper understanding of how they work.

Speaker: Jacob Andreas, PhD | Assistant Professor | MIT

Creating a Custom Vocabulary for NLP tasks using exBERT and spaCy

For NLP tasks, the first step is to pre-process text for training. Say you start with an English language model: it includes over 1 million items of vocabulary, many classes of entity recognition, and a lot of compound noun recognition. But what happens when we need to add new terms and customize the vocabulary?

In this tutorial, we will show an approach to creating a custom vocabulary that can then be used for any NLP task.

Speaker: Swagata Ashwani | Senior Data Scientist | Boomi
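
As a rough illustration of the question this abstract raises, the sketch below finds domain terms that a base vocabulary does not cover and adds them. It does not use exBERT or spaCy and is not the tutorial's method. The helper `find_new_terms`, the tiny base vocabulary, and the toy corpus are all hypothetical, chosen only to show the general idea of spotting out-of-vocabulary terms.

```python
import re
from collections import Counter

# Illustrative sketch only: the vocabulary and corpus below are made up.

def find_new_terms(corpus, base_vocab, min_count=2):
    """Return tokens seen at least min_count times that base_vocab lacks."""
    counts = Counter(
        token
        for doc in corpus
        for token in re.findall(r"[a-z0-9]+", doc.lower())
    )
    return sorted(
        term for term, count in counts.items()
        if count >= min_count and term not in base_vocab
    )


base_vocab = {"the", "patient", "was", "given", "for", "and", "a", "dose", "of"}
corpus = [
    "The patient was given remdesivir for covid19.",
    "A dose of remdesivir and dexamethasone was given.",
    "Covid19 patient given dexamethasone.",
]

new_terms = find_new_terms(corpus, base_vocab)
custom_vocab = base_vocab | set(new_terms)  # base vocabulary plus domain terms
print(new_terms)
```

In a real pipeline the new terms would then be registered with the tokenizer or model in whatever way that library supports, and the `min_count` threshold is one simple way to avoid adding one-off typos.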

Mastering Adversarial Evaluation for NLP: A Practical Workshop

The development of advanced deep neural language models has revolutionized the performance of various natural language processing (NLP) tasks. However, these models are increasingly intricate and less comprehensible, making them particularly vulnerable to failure when exposed to input data that is different from the data used for training. This brittleness of neural language models presents a significant challenge, as their complexity continues to increase. Unless this issue is addressed, progress in NLP could be hindered and the potential benefits of these models may not be fully realized.

This workshop will equip participants with the skills and knowledge to conduct an adversarial evaluation of NLP systems.

Speaker: Panos Alexopoulos, PhD | Head of Ontology | Textkernel BV
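
The brittleness described in this abstract can be shown with a very small, self-contained sketch: evaluate a model on clean inputs, then on slightly perturbed copies, and compare. Everything here is an assumption for illustration and not the workshop's material: `keyword_classifier` is a toy stand-in for a real model, `swap_typo` is one simple hand-written perturbation, and the four labeled sentences are made up.

```python
# Illustrative sketch only: a toy model and a toy character-swap perturbation.

def keyword_classifier(text):
    """Toy stand-in for a real model: 'positive' if a known positive word appears."""
    positive_words = {"great", "good", "excellent"}
    tokens = [token.strip(".,!") for token in text.lower().split()]
    return "positive" if any(t in positive_words for t in tokens) else "negative"


def swap_typo(text):
    """Swap the 2nd and 3rd characters of every word with at least 4 characters."""
    words = []
    for word in text.split():
        if len(word) >= 4:
            word = word[0] + word[2] + word[1] + word[3:]
        words.append(word)
    return " ".join(words)


def accuracy(model, dataset, perturb=None):
    """Fraction of examples the model labels correctly, optionally after perturbation."""
    correct = 0
    for text, label in dataset:
        if perturb is not None:
            text = perturb(text)
        correct += model(text) == label
    return correct / len(dataset)


dataset = [
    ("The service was great", "positive"),
    ("An excellent value overall", "positive"),
    ("The screen cracked quickly", "negative"),
    ("Shipping took forever", "negative"),
]

clean_acc = accuracy(keyword_classifier, dataset)
adv_acc = accuracy(keyword_classifier, dataset, perturb=swap_typo)
print(f"clean accuracy: {clean_acc:.2f}, perturbed accuracy: {adv_acc:.2f}")
```

The gap between the two numbers is the quantity of interest: a human still reads "gerat" as "great", but the toy model does not, so its accuracy drops on inputs that differ only superficially from what it expects.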

Bagging to BERT — A Tour of Applied NLP

Most data we encounter is “unstructured,” which means it needs additional processing before it can be used in decision-making. Often these data are text, coming in the form of comment fields, notes, and descriptions. Working with this text is being enabled by a wide array of open-source NLP libraries such as spaCy and Hugging Face’s Transformers.

In this workshop, we will explore some popular NLP techniques that have broad applicability. From the basics of bagging and word vectors to the creation of contextualized representations of words and sentences, the workshop will equip participants with the tools they need to turn raw text data into useful insights.

Speaker: Benjamin Batorsky, PhD | Senior Data Scientist | Institute for Experiential AI at Northeastern University
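
For readers new to the starting point of this tour, here is a minimal sketch of a bag-of-words representation, assuming that "bagging" in the abstract refers to bag-of-words text features. The function `bag_of_words` and the two sample comments are hypothetical and not taken from the workshop.

```python
from collections import Counter

# Illustrative sketch only: whitespace tokenization and made-up comments.

def bag_of_words(docs):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["late delivery late refund", "fast delivery"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each vector ignores word order entirely, which is the main limitation that word vectors and contextualized representations, the later stops on the tour, are meant to address.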

Perform NLP Better with Training at ODSC East 2023

We just listed quite a few skills, platforms, topics, and frameworks. You’re not expected to know every single thing mentioned above, but knowing a good chunk of them, and how to apply them in business settings, will help you get a job or become better at your current one. At ODSC East 2023 this May, we have an entire track devoted to NLP. Learn NLP skills and platforms like the ones listed above!

Originally posted on OpenDataScience.com
