Databricks LLM in Data Engineering: Testing DBRX vs GPT-3.5
Introduction:
In recent years, the field of data engineering has witnessed a significant
transformation, largely driven by advancements in artificial intelligence and
machine learning. One of the key contributors to this evolution is the
emergence of powerful language models, such LLM’s. These models, built upon
deep learning architectures, have demonstrated remarkable capabilities in
natural language understanding and generation. In this Blog, we'll explore how
data engineers can leverage LLM models to enhance various aspects of their
workflows in generating insights.
DBRX is a
transformer-based decoder-only LLM that was trained using next-token
prediction. DBRX is having 132B
parameters, 36B parameters out of which are active on any input, with my
intial testing experince I am pretty impressed with DBRX as I utilized it to
parse unstructured data efficiently. With its natural language processing
capabilities, this model can extract relevant information from text-heavy
datasets, such as QnA,social media feeds, customer reviews, or news articles.
Refer
Databricks page for more details on DBRX
But as per Databricks
it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. I try to give
it a quick comparison with GPT-3.5, DBRX is clear winner as you can see in
below screenshots DBRX has parsed the unstructured data more efficiently when
same set of prompts given to both LLM.
Response from
GPT-3.5 : Having no characteristics like taste, color, But I am pretty sure
this can be improved with good prompting.
Response from
DBRX : Having characteristics like taste, color and winning the race
Recently, I've conducted experiments with leveraging the DBRX model to perform a natural language data processing task within a Databricks notebook. This was achieved by creating a Python-based UDF function. Detailed steps are mentioned below :
Step 1: Create python function that can interact with DBRX Instruct, here is the code of making function that will do NLP and provide humanly answers.
NOTE: Exact same function can also be made using Open AI , so it will be more what user want to use, but the comparison above can help you choosing wisely.
Step 2: Registering the function as UDF, so that
we can use it in our DataFrame
Below is the sample
DataFrame that have one column having certain questions
Step 3: Using UDF, to create new column “answers” which will hold responses from DBRX Model in DataFrame
But if you
need to use AI directly in SQL fashion and do not want UDF, Databricks has
also provided ai_query
SQL function that can be used in Databricks SQL doing similar kind of work for
which we are creating the UDF. Unfortunately, this function cannot be used in
Notebooks as of now and is only limited to in Databricks SQL. Also you can request AI
Functions Public Preview Enrollment if you want to experiment it.
Result:
BINGO!! DBRX has done great job in answering
all the questions, this opens new possibilities for innovation. By
incorporating these models into their toolkit, data engineers can unlock deeper
insights, streamline processes, and drive greater value from their data assets.
However, it's
essential to recognize the ethical considerations and potential biases
associated with AI-driven solutions, ensuring responsible and equitable
deployment in data engineering tasks. Gen AI aims to replicate the breadth and
depth of human intelligence across a wide range of domains. While the
advancements in artificial intelligence have been extraordinary, there are
certain tasks that still require the human touch but combination of human and
AI represents a powerful synergy that leverages the
unique strengths of both entities to achieve unprecedented outcomes.
Comments
Post a Comment