Artificial Intelligence Zone

What Is RUM data and why does it matter?

IBM Journey to AI blog

DECEMBER 18, 2023

What is RUM data? Contrary to what you might think, RUM data isn’t a performance indicator for Captain Morgan, Cuban tourism or a Disney film franchise. Real User Monitoring (RUM) data is information about how people interact with online applications and services. Are there alternatives to RUM data? Actually, yes!

Algorithm

Algorithm Automation AI AI

Will LLM and Generative AI Solve a 20-Year-Old Problem in Application Security?

Unite.AI

JUNE 14, 2023

The Magic of LLM in Security: Generative AI is an advancement over older models used in machine learning algorithms that were great at classifying or clustering data based on trained learning of synthetic samples. GitHub) that are partially tagged for security issues.

LLM

LLM Generative AI Automation Machine Learning

Do Language Models Know When They Are Hallucinating? This AI Research from Microsoft and Columbia University Explores Detecting Hallucinations with the Creation of Probes

Marktechpost

DECEMBER 31, 2023

Trained on large amounts of textual data, these models perform various tasks, including generating meaningful responses to questions, text summarization, translations, text-to-text transformation, and code completion. Probes are basically the instruments or systems trained on the language model’s internal operations.

AI Researcher

AI Researcher AI Research Large Language Models Natural Language Processing

Meet the Omnivore: Industrial Designer Blends Art and OpenUSD to Create 3D Assets for AI Training

NVIDIA

SEPTEMBER 19, 2023

The team uses NVIDIA Omniverse, a platform for developing and connecting 3D tools and applications, and Universal Scene Description (aka OpenUSD) to enhance its synthetic data generation pipelines. Boehmer creates realistic 3D assets that can be used with SORDI.ai, short for Synthetic Object Recognition Dataset for Industries.

AI

AI AI Software Development Automation

LLMs cannot find any more data, what are they going to do now?

Bitext

SEPTEMBER 22, 2023

If data is the oil of the AI industry, we are running out of data faster than out of oil. This lack of differentiation leads to AI applications that offer undifferentiated experiences since they are based on similar models with similar data and similar architectures. Definitely, we have a problem. What Solutions are Available?

LLM

LLM NLP Chatbots AI

Instana 2023: Recapping our latest innovation

IBM Journey to AI blog

JANUARY 26, 2024

Our team announced different product capabilities designed to simplify your teams’ ability to observe, debug, remediate and enhance your entire stack—integrating observability practices and telemetry data seamlessly into your entire software development lifecycle. Learn more by reading our documentation.

Automation

Automation DevOps Software Development Artificial Intelligence

Gretel AI Releases Largest Open Source Text-to-SQL Dataset to Accelerate Artificial Intelligence AI Model Training

Marktechpost

APRIL 4, 2024

In today’s age, the accuracy of data plays a crucial role in determining the efficiency of artificial intelligence (AI) systems. This move will significantly accelerate the training of AI models and will enhance the quality of data-driven insights across various industries.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence AI Modeling Data Scarcity

Introducing Synthetic Task Generation with ChatGPT in NLP Lab 5.2

John Snow Labs

AUGUST 7, 2023

NLP Lab, a pioneering No Code Platform designed for document annotation and model training, has just rolled out a powerful new feature – Synthetic Task Generation, integrated with ChatGPT. Let’s delve into this exciting update and how it promises to change the way we handle and analyze data in NLP Lab. And that’s not all!

NLP

NLP ChatGPT Data Scientist Natural Language Processing

How we achieved 89% accuracy on contract question answering

Snorkel AI

APRIL 2, 2024

We began with an out-of-the-box solution using the GPT-4 large language model (LLM) and OpenAI’s text data embeddings. I recently talked with Matt Casey, Snorkel AI’s data science content lead, about how we leveraged the power of Snorkel Flow to boost our system’s performance to production levels in just a few weeks.

Metadata

Metadata Large Language Models Machine Learning Data Science

This Paper Unveils ‘Mach’ (Make-A-Character): Revolutionizing 3D Character Creation with Machine Learning for the AI and Metaverse Era

Marktechpost

DECEMBER 31, 2023

To collect ground truth data, they captured the faces of 193 individuals under uniform illumination and artificially created textures under varying illumination conditions. To improve data diversity and avoid overfitting, they augmented the skin colors of the ground truth diffuse albedos based on the Individual Typology Angle (ITA).

Machine Learning

Machine Learning LLM AI AI

You cannot use GPT for CX purposes? Wrong, yes you can

Bitext

OCTOBER 11, 2023

This tagging documents the linguistic reasons behind every variant; for example, “colloquial” is the tag for sentences like “do u wanna send my order asap”. With this dataset we then fine-tuned an instance of GPT 3.5 and, finally, we evaluated the answers from the two instances: Playground Instance of GPT 3.5

ChatGPT

ChatGPT Chatbots AI AI

How to change your order in 10,000 different ways

Bitext

FEBRUARY 15, 2022

To illustrate this variety in practice, with this post we release a tagged dataset that contains 10,000 ways of asking for an order modification, in English this time. If you don’t have historical data to leverage, or if you just want to avoid privacy issues, the typical answer is generating and tagging this data by hand.

Chatbots

Chatbots Automation AI AI

How Wayfair built better, faster catalog tagging with Snorkel Flow

Snorkel AI

AUGUST 22, 2023

What are product tags? We use product tags to organize and store descriptive information about our products. These tags capture specific attributes of each product, such as its color, design, and pattern, in a structured manner. Examples of theme-related tags include “Outer Space Rug,” “Bird Decorative Object,” etc.

Computer Vision

Computer Vision Machine Learning Data Scientist Generative AI

Introducing a new breed of data to finetune LLMs: hybrid datasets

Bitext

SEPTEMBER 27, 2023

Download the dataset on our GitHub or Hugging Face profile. The best of synthetic data and expert curation. This collection of data serves as a valuable resource for companies, research teams, universities, and AI enthusiasts seeking to expand the potential of their LLMs. Some ideas and a sample.

Large Language Models

Large Language Models Conversational AI LLM Chatbots

Top Synthetic Data Tools/Startups For Machine Learning Models in 2023

Marktechpost

JULY 17, 2023

Information created intentionally rather than as a result of actual events is known as synthetic data. Synthetic data is generated algorithmically and used to train machine learning models, validate mathematical models, and serve as a stand-in for test datasets of production or operational data.

Machine Learning

Machine Learning Data Scientist Computer Vision Deep Learning

The Evolution of Tabular Data: From Analysis to AI

Towards AI

AUGUST 11, 2023

Discover how tabular data space is being transformed by Kaggle competitions, the open-source community, and Generative AI. Tabular data refers to data organized into rows and columns. Traditionally, tabular data has been used for simply organizing and reporting information.

Machine Learning

Machine Learning Data Analysis Deep Learning AI

Researchers From Binghamton University Introduce A Privacy-Enhancing Anonymization System (My Face, My Choice) For Everyone To Have Control Over Their Faces In Social Photo Sharing Networks

Marktechpost

JULY 2, 2023

The ability to recognize and identify individuals through their facial features raises questions about consent, control over personal data, and potential misuse. The current tagging systems in social networks need to adequately address the problem of unwanted or unapproved faces appearing in photos.

Algorithm

Algorithm AI Tools AI Researcher AI Research

Empowering Model Sharing, Enhanced Annotation, and Azure Blob Backups in NLP Lab

John Snow Labs

OCTOBER 12, 2023

Improved User Experience for Synthetic Text Generation: In version 5.4, we have overhauled the Synthetic Text Generation page to provide a more user-friendly and efficient experience while preserving all the familiar features and capabilities. With version 5.4, we have now resolved this issue.

NLP

NLP Auto-complete Auto-classification Prompt Engineer

Predicting Retrosynthesis in a Single Step by Incorporating chemists’ Insights with AI Models

Marktechpost

SEPTEMBER 10, 2023

In organic synthesis, molecules are built through organic processes, making it an important branch of synthetic chemistry. Model performance is significantly enhanced on the data subset from which substructures were successfully recovered. Without any help from humans, they can extract the underlying structures.

AI Modeling

AI Modeling Neural Network Natural Language Processing AI

Improve LLM performance with human and AI feedback on Amazon SageMaker for Amazon Engineering

AWS Machine Learning Blog

APRIL 24, 2024

In this post, we share how we analyzed the feedback data and identified limitations of accuracy and hallucinations RAG provided, and used the human evaluation score to train the model through reinforcement learning. When the training data is ready, further tune the model using reinforcement learning from human feedback (RLHF).

LLM

LLM AI AI Generative AI

Advance RAG- Improve RAG performance

Mlearning.ai

FEBRUARY 26, 2024

Post-Retrieval: Next, the RAG model augments the user input (or prompts) by adding the relevant retrieved data in context (query + context). Pre-Retrieval Optimisation: Pre-retrieval techniques include improving the quality of indexed data and chunk optimisation. This process creates a knowledge library that the LLM can understand.

Metadata

Metadata Large Language Models LLM Neural Network

Getting ready for artificial general intelligence with examples

IBM Journey to AI blog

APRIL 18, 2024

While AGI remains theoretical, organizations can take proactive steps to prepare for its arrival by building a robust data infrastructure and fostering a collaborative environment where humans and AI work together seamlessly. NLP techniques help them parse the nuances of human language, including grammar, syntax and context.

Neural Network

Neural Network LLM AI AI

The most valuable AI use cases for business

IBM Journey to AI blog

FEBRUARY 14, 2024

Promote cross- and up-selling: Recommendation engines use consumer behavior data and AI algorithms to help discover data trends to be used in the development of more effective up-selling and cross-selling strategies, resulting in more useful add-on recommendations for customers during checkout for online retailers.

Computer Vision

Computer Vision Automation Robotics AI

Watch all Future of Data-Centric AI 2023 videos now!

Snorkel AI

OCTOBER 12, 2023

Snorkel AI hosted the 2023 installment of its Future of Data-Centric AI virtual conference in June. The two-day event brought together researchers, practitioners, and industry leaders to discuss the latest trends and advances in data-centric AI, and we recorded each session as a video.

Data Scientist

Data Scientist AI AI ML

Multimodal Language Models Explained: Visual Instruction Tuning

Towards AI

AUGUST 9, 2023

Similarly, MM-ReAct [2] incorporates visual information in the forms of image captioning, dense captioning, image tagging, etc., inside the prompt to feed to the LLM. On the other hand, finetuning a language model is data-hungry, which makes the approach less applicable when domain-specific data is limited.

Explainability

Explainability LLM ChatGPT Large Language Models

What AI Music Generators Can Do (And How They Do It)

AssemblyAI

SEPTEMBER 22, 2023

And how are text-based or image-based techniques used to generate synthetic AI music? Data scarcity: Paired natural language descriptions of music and corresponding music recordings are extremely scarce, in contrast to the abundance of image/description pairs available online, e.g. in online art galleries or social media.

Convolutional Neural Networks

Convolutional Neural Networks AI AI Data Scarcity

CMU Researchers Introduce BUTD-DETR: An Artificial Intelligence (AI) Model That Conditions Directly On A Language Utterance And Detects All Objects That The Utterance Mentions

Marktechpost

JULY 23, 2023

It is trained on image-language pairings tagged with the bounding boxes for all items alluded to in the speech, as well as fixed-vocab object detection datasets. In this approach, a single model can ground language and recognize objects while sharing the same training data for both tasks.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence AI Modeling Computer Vision

Guest Post: Adala – The First Open Source Data-Labeling Agent*

TheSequence

OCTOBER 30, 2023

In this guest post, Jimmy Whitaker, Data Scientist in Residence at Human Signal, introduces Adala, an Autonomous Data Labeling Agent framework that harmonizes AI's computational power with human judgment. Enter Adala, a new Autonomous Data Labeling Agent framework. It’s open-source and you can contribute to it!

LLM

LLM Data Scientist Computer Vision Large Language Models

NLP Summit Insights: How LLMs Are Shaping the Future of Modern Business

John Snow Labs

SEPTEMBER 26, 2023

According to the data collected by Forbes, over half (53.3% to be precise) of data scientists and engineers plan to deploy Large Language Model (LLM) applications into production in the next 12 months or “as soon as possible.” Additionally, the data indicates that 8.3%

NLP

NLP Large Language Models LLM Data Scientist

AI-Fueled Productivity: Generative AI Opens New Era of Efficiency Across Industries

NVIDIA

JULY 13, 2023

Whether predicting protein structures or securely training algorithms on large real-world and synthetic datasets, generative AI and accelerated computing are opening new areas of research that can help mitigate the spread of disease, enable personalized medical treatments and boost patient survival rates.

Generative AI

Generative AI AI AI Natural Language Processing

Meeting minutes generation with ChatGPT 4 API, Google Meet, Google Drive & Docs APIs

Becoming Human

JUNE 14, 2023

It will stay synthetic whatever the duration of the meeting. One of the improvements of this code could be the execution of data masking before calling the ChatGPT API. At least the attendee names and additional tagged fields containing sensitive information should be masked. When known, a due date is also inserted.

ChatGPT

ChatGPT Python OpenAI Artificial Intelligence

LLM distillation demystified: a complete guide

Snorkel AI

FEBRUARY 13, 2024

While rarely an endpoint, large language model (LLM) distillation lets data science teams kickstart the data development process and get to a production-ready model faster than they could with traditional approaches. LLM distillation is when data scientists use LLMs to train smaller models. That’s where distillation comes in.

LLM

LLM Data Scientist Neural Network Data Science

Use RAG for drug discovery with Knowledge Bases for Amazon Bedrock

AWS Machine Learning Blog

FEBRUARY 29, 2024

Knowledge Bases for Amazon Bedrock automates synchronization of your data with your vector store, including diffing the data when it’s updated, document loading, and chunking, as well as semantic embedding. RAG is a popular technique that combines the use of private data with large language models (LLMs). Choose Next.

Machine Learning

Machine Learning Computer Vision Generative AI Explainability

Comcast’s data-centric approach to speech interfaces

Snorkel AI

FEBRUARY 13, 2023

Jan Neumann, Vice President, Machine Learning, Comcast Applied AI and Discovery gave a presentation entitled “Data-Centric AI in Comcast’s Voice and Conversational Interfaces” at Snorkel’s Future of Data-Centric AI conference in 2022. But let’s focus on the use case of data-centric AI for Voice.

Metadata

Metadata Machine Learning Deep Learning BERT

Sierra Division Studios Presents Three Epic Projects Built With NVIDIA Omniverse

NVIDIA

JULY 11, 2023

Combined with synthetic data, these assets can help solve real-world problems in simulation, including for AI-powered 3D artists. Join by showing us a photo/video of how one of your art projects started and then one of the final result + tag #StartToFinish. Welcome to our new #StartToFinish challenge.

AI

AI AI

Multi-domain Multilingual Question Answering

Sebastian Ruder

DECEMBER 6, 2021

Multi-Domain QA Models: In most cases when learning in a multi-domain setting, there may be limited or no labelled data available in the target domain. Unsupervised Domain Adaptation for QA: Unsupervised domain adaptation for QA assumes access to labelled data in a source domain and unlabelled target domain data.

BERT

BERT NLP Natural Language Processing Computational Linguistics

74 Summaries of Machine Learning and NLP Research

Marek Rei

NOVEMBER 12, 2019

The model is optimized on unlabeled data by 1) predicting masked words in the input sequence, and 2) predicting whether the input sequences occur together. InferLite: Simple Universal Sentence Representations from Natural Language Inference Data Jamie Kiros, William Chan. The model is optimised using NLI data. NAACL 2019.

Machine Learning

Machine Learning NLP Neural Network BERT

ACL 2022 Highlights

Sebastian Ruder

JUNE 6, 2022

They also made practical suggestions to encourage more work on such languages: creating data resources; establishing a conference track for work on low-resource and endangered languages; and encouraging researchers to apply their systems to low-resource language data. The KinyaBERT model architecture. Both CANINE ( Clark et al. )

NLP

NLP Natural Language Processing Computational Linguistics Neural Network
