
Building Data Fabric for Winning Gen AI Products

by Saurabh Kaushik, Becoming Human: Artificial Intelligence Magazine

The landscape of Generative AI is evolving rapidly, presenting Product Managers with the formidable challenge of devising winning strategies for their products in this fiercely competitive market. While much attention is drawn to the emergence of larger and more advanced Foundational Models (FM) such as LLaMa, Mistral, Claude, Cohere, Gemini, Gemma, GPT-4, and Granite, it’s essential to recognize that the FM serves merely as the foundational element of the solution. Ultimately, success hinges on the strategic utilization of data for business context.

Drawing from my experience leading various Data & AI products, I aim to provide fellow Product Managers with a concise guide to crafting a Data Fabric Strategy for their Generative AI Product Initiative, offering insights to enhance their chances of achieving remarkable success in their endeavors.

Opportunity — Custom Data as Strategic Edge in the Gen AI Era

Market Trend — Democratization of Generative AI with Open Source FM

The open-source movement undoubtedly fosters innovation while simultaneously establishing a fair competitive landscape for startups vis-à-vis monopolistic corporations. Similar dynamics are unfolding within Generative AI (Gen AI), which traces its roots back to Google’s seminal paper “Attention Is All You Need.” From this origin, we witness a daily proliferation of open-source Foundational Models (FMs), vigorously competing with their proprietary counterparts. While offering comparable quality and performance, open-source models gain a competitive edge over commercial ones through their cost-effectiveness. Presented here is a benchmarking analysis comparing open-source and closed-source FMs to illuminate this phenomenon.

Credit — Artificial Analytics AI

In the recent past, the advent of open source, SaaS models, and cloud platforms has undeniably democratized the technological landscape, and Gen AI is following suit. Consequently, it is plausible to assert that Gen AI Foundational Models (FMs) are poised to become a commodity, a foundational capability universally accessible to all and devoid of strategic advantage for any one particular entity.

Opportunity — Leveraging Custom Data to Succeed in the Gen AI Competition

The winners in the Gen AI race distinguish themselves by their skill in leveraging their organization’s extensive business and proprietary data to customize models to their specific domain, needs, and context, thus creating unique value propositions. This presents a significant opportunity for Product Managers not only to integrate Gen AI capabilities into their offerings but also to gain a competitive advantage that is difficult for rivals to replicate. To seize this opportunity, Product Managers must devise a Data Fabric Strategy with the help of senior technical staff and architects.

Strategy — Data Fabric Strategy for Gen AI Product:

Data Requirement Analysis for Gen AI

Before outlining a data fabric strategy, the Product Manager needs to examine the primary data needs of a Gen AI Product. The diagram below illustrates the interaction between users and the application with LLM models, along with four distinct types of data inputs that impact the outcome of the Product.

Behavioral Context (Prompt Templates) — Detailed prompts or instructions that guide the generative AI model on what kind of output is desired. These prompts should be diverse and cover a wide range of potential outputs the AI product may need to generate. Examples include specific tasks the AI needs to perform, such as generating text for customer service responses, composing emails, creating product descriptions, or generating creative content like stories or poems.

Situational Context (Knowledge Base & Business Data) — A comprehensive knowledge base comprising information relevant to the business domain and scenarios the AI product will operate within. This data should cover various aspects of the business, including products, services, policies, procedures, industry standards, and customer preferences. It also includes business data such as historical records, transactional data, customer feedback, market research reports, and any other information that provides insight into the context in which the AI will be generating content.

Semantic Context (Semantic Data & Relationship Data) — Semantic data that helps the AI understand the meaning and context of the information it processes, including ontologies, taxonomies, and semantic relationships between entities or concepts within the knowledge base. Relationship data maps connections between entities, such as customer-product relationships, employee-job role relationships, or hierarchical structures within the organization, helping the model generate content that aligns with the specific relationships and contexts it encounters.

Knowledge Base (Conversation & Enterprise Knowledge Data) — Conversational data collected from various sources, including chat logs, customer support interactions, forums, and social media platforms, providing the AI with real-world examples of language usage and conversation patterns. Enterprise knowledge data encompasses internal documents, manuals, training materials, and other resources specific to the organization, helping the AI understand the internal workings of the business and generate content aligned with organizational goals and standards.
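As a concrete illustration of how these four data inputs might come together at inference time, here is a minimal sketch that assembles them into a single model prompt. The function name, section headings, and sample values are hypothetical, not part of any specific product.

```python
# Hypothetical sketch: combining the four context types described
# above into one LLM prompt. All names and sample values are made up.

def build_prompt(behavioral: str, situational: str,
                 semantic: str, knowledge: str, question: str) -> str:
    """Assemble the four data inputs into a single prompt string."""
    sections = [
        ("Instructions", behavioral),        # Behavioral Context
        ("Business context", situational),   # Situational Context
        ("Entity relationships", semantic),  # Semantic Context
        ("Reference knowledge", knowledge),  # Knowledge Base
    ]
    body = "\n\n".join(f"## {title}\n{text}"
                       for title, text in sections if text)
    return f"{body}\n\n## Question\n{question}"

prompt = build_prompt(
    behavioral="Answer as a polite customer support agent.",
    situational="Return policy: 30 days with receipt.",
    semantic="Customer -> owns -> Order #123.",
    knowledge="Order #123 contains one pair of running shoes.",
    question="Can I return my shoes?",
)
print(prompt)
```

In a real product the four inputs would be fetched from prompt-template stores, business systems, a graph, and a knowledge base rather than hard-coded.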

Open or closed LLMs excel at generalized tasks but often fall short in business applications due to their lack of understanding of specialized context, specialized business tasks, and nuances within specific business domains. To overcome these limitations, the Product Manager must refine the Data Requirements further to ensure that the models are equipped to address these scenarios effectively.

Data Fabric Strategy for Gen AI Products:

Essentially, the Data Fabric Strategy is a comprehensive plan to seamlessly integrate diverse data sources, processing capabilities, and AI algorithms to enable the creation, training, and deployment of generative AI models. It provides a unified platform approach for collecting data, organizing it, and governing it well, facilitating the development of winning AI Products.

The Product Manager establishes the North Star Metrics (NSM) for the product according to the business context, with the most prevalent and crucial NSM being User Experience, contingent upon three pivotal factors.

Latency — The time taken for the generative AI system to process input data and produce output.

Accuracy — Ensures that the generative AI models produce high-quality outputs that closely resemble the desired content.

Ethics — Concerns whether generated content is safe, fair, transparent, and explainable.
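One way to operationalize these three factors is to fold them into a single user-experience score. The weights, latency budget, and normalization below are illustrative assumptions, not prescribed by any standard.

```python
# Illustrative only: combining the three UX factors (latency,
# accuracy, ethics) into one score. Equal weights and a 2-second
# latency budget are assumptions for the sketch.

def ux_score(latency_ms: float, accuracy: float, ethics: float,
             latency_budget_ms: float = 2000.0) -> float:
    """Each factor normalized to [0, 1]; equal weights assumed."""
    latency_score = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    return round((latency_score + accuracy + ethics) / 3, 3)

print(ux_score(latency_ms=500, accuracy=0.9, ethics=0.95))  # -> 0.867
```

In practice each input would itself be an aggregate (e.g. p95 latency, eval-suite accuracy, safety-audit score) rather than a single number.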

With these User Experience criteria for Gen AI Product, the Product Manager can now craft a Data Fabric Strategy (Collect, Organize, and Govern) with their respective North Star Metrics.

Data Collection:

Gathering data from diverse sources with seamless integration and efficient preprocessing.

Data Fabric Strategy:

Facilitate swift retrieval with seamless integration from diverse data sources using Zero ETL Integrations, ensuring minimal latency in accessing data for generative AI models.

Aggregate and pre-process extensive volumes of structured and unstructured data in the Data Lake / Warehouse, optimizing for accuracy and efficiency in data processing to support robust generative AI model training.

Data Fabric Metric:

Data Retrieval Index (DRI): Percentage of time that the generative AI models have access to required data without delays. It measures the efficiency of data retrieval and integration processes without ETL, indicating the readiness of data for model training and inference.
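A minimal sketch of how the DRI could be computed, assuming it is measured as the share of retrieval attempts completed within an agreed latency budget; the budget value here is a placeholder.

```python
# Sketch of the Data Retrieval Index (DRI) as defined above: the
# percentage of retrievals where data was available without delay.
# The 200 ms budget is an assumed service-level threshold.

def data_retrieval_index(latencies_ms, budget_ms: float = 200.0) -> float:
    """Percentage of retrieval attempts completed within budget."""
    on_time = sum(1 for t in latencies_ms if t <= budget_ms)
    return 100.0 * on_time / len(latencies_ms)

print(data_retrieval_index([120, 90, 250, 180, 500]))  # -> 60.0
```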

Data Organization:

Establishing a structured data catalog and refining data for better comprehension, contextualization and analysis.

Data Fabric Strategy:

Establish a comprehensive data catalog and graph to build situational context for FMs, enabling better understanding and utilization of data for generative AI tasks.

Refine and prepare the data into native and vectorized embeddings to enhance semantic comprehension for FMs, improving the accuracy and interpretability of generative outputs.

Data Fabric Metrics:

Data Contextualisation Index (DCI): A composite index measuring the improvement in generative model performance attributed to enhanced semantic comprehension of input data. It reflects the impact of data refinement and contextual understanding on the quality of generated outputs.

Data Governance:

Ensuring data security, compliance, and superior quality aligned with ethical AI principles.

Data Fabric Strategy:

Elevate data security framework and compliance adherence across data, models, and prompts to prevent FM hacking attempts, prioritizing data integrity and confidentiality in generative AI processes.

Ensure superior data quality coverage in alignment with ethical AI principles (Explainability, Fairness, Robustness, Privacy, Transparency) for the FM’s generative nature, fostering trustworthiness and responsible use of AI-generated content.

Data Fabric Metrics:

Ethical AI Compliance Index (ECI): An aggregated score assessing the adherence to ethical AI principles (Explainability, Fairness, Robustness, Privacy, Transparency) across data governance and model development practices. It assures ethical compliance and trustworthiness of the generative AI product.
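A sketch of one possible aggregation for the ECI: an equal-weight average over per-principle scores. The scoring scale and weights are assumptions; a real implementation would source each score from audits and evaluations.

```python
# Illustrative Ethical AI Compliance Index (ECI): equal-weight
# average over the five principles named above. Per-principle
# scores (0-100) here are placeholders.

PRINCIPLES = ("explainability", "fairness", "robustness",
              "privacy", "transparency")

def eci(scores: dict) -> float:
    """Aggregate per-principle scores (0-100) into one index."""
    missing = [p for p in PRINCIPLES if p not in scores]
    if missing:
        raise ValueError(f"missing principle scores: {missing}")
    return round(sum(scores[p] for p in PRINCIPLES) / len(PRINCIPLES), 1)

print(eci({"explainability": 80, "fairness": 90, "robustness": 85,
           "privacy": 95, "transparency": 70}))  # -> 84.0
```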

Design — Data Fabric Architecture for Gen AI Product

Data Fabric Capabilities for Gen AI Product:

At this stage, the Product Manager possesses a thorough understanding of the essential capabilities needed to fulfill the data requirements for Generative AI (such as situational and semantic context, and access to a vast knowledge base) sourced from various data repositories (including vectorized data, graph data, data lakes, etc.). With a defined Data Fabric strategy, the PM is well-equipped to delineate the Data Fabric Capability Framework and articulate the necessary functionalities.

Data Collection:

Zero ETL Integration: This enables swift and seamless retrieval of data from diverse sources without the need for ETL processes, ensuring minimal latency in accessing data for generative AI models. By adopting a unified data integration approach, it simplifies data ingestion and enhances real-time data availability, supporting timely responses to generative AI tasks.

Data Lake/Warehouse Aggregation and Pre-processing: Through scalable data aggregation, this component centralizes extensive volumes of structured and unstructured data in a Data Lake or Warehouse, providing a reliable storage solution for generative AI model training. Efficient pre-processing pipelines optimize data quality and readiness, streamlining the preparation process and facilitating smoother and more effective generative AI model training.
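To make the pre-processing step concrete, here is a toy sketch of two typical operations: normalizing raw records and chunking long text for downstream training or embedding. Field names and the chunk size are assumptions.

```python
# Toy pre-processing sketch: normalize a raw record and split long
# text into fixed-size chunks. Chunk size and field names are
# illustrative assumptions, not a real pipeline.

def preprocess(record: dict) -> dict:
    """Normalize a raw record: trim whitespace, lowercase keys, drop empties."""
    return {k.strip().lower(): v.strip()
            for k, v in record.items() if v and v.strip()}

def chunk(text: str, size: int = 40) -> list:
    """Split text into fixed-size chunks for embedding or training."""
    return [text[i:i + size] for i in range(0, len(text), size)]

raw = {" Title ": "  Return Policy ",
       "Body": "Refunds within 30 days.",
       "Note": " "}
print(preprocess(raw))
# -> {'title': 'Return Policy', 'body': 'Refunds within 30 days.'}
```

Real pipelines add deduplication, schema validation, and PII scrubbing on top of steps like these.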

Data Organization:

Data Catalog and Graph DB: This component centralizes metadata and lineage information in a comprehensive data catalog, enabling FMs to better understand and utilize data for generative AI tasks, while a graph-based situational context enhances FMs’ awareness of data relationships and dependencies, facilitating more informed decision-making during generation.

Data Refinement and Vector Embedding: Through a refinement pipeline, raw data is processed into standardized formats, ensuring quality and consistency for generative AI tasks, while native and vectorized embeddings capture semantic meaning and context, enhancing FMs’ comprehension and ability to produce accurate and interpretable generative outputs.
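A toy sketch of the graph-based situational context: a small adjacency map stands in for a Graph DB, and a bounded traversal gathers related entities to include in the model's context. Entity names and the two-hop depth are illustrative.

```python
# Toy graph-based situational context: an adjacency map stands in
# for a Graph DB; a bounded breadth-first walk collects related
# entities for the model's context. All entity names are made up.

GRAPH = {
    "customer:42": ["order:123", "segment:premium"],
    "order:123": ["product:shoes", "status:shipped"],
}

def situational_context(entity: str, depth: int = 2) -> list:
    """Collect entities reachable from `entity` within `depth` hops."""
    seen, frontier = [], [entity]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for neigh in GRAPH.get(node, []):
                if neigh not in seen:
                    seen.append(neigh)
                    nxt.append(neigh)
        frontier = nxt
    return seen

print(situational_context("customer:42"))
# -> ['order:123', 'segment:premium', 'product:shoes', 'status:shipped']
```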

Data Governance:

Data Security Management: This ensures data security by implementing robust encryption and access controls, safeguarding sensitive data from unauthorized access or tampering. Additionally, it maintains compliance with regulatory standards through continuous monitoring and auditing processes, minimizing the risk of hacking attempts and ensuring data integrity and confidentiality.

Ethical AI Compliance Management: This promotes trust in AI-generated content by integrating explainability and transparency tools, allowing stakeholders to understand how content is generated and make informed decisions about its use. Furthermore, it mitigates biases and ensures fairness in content generation through the adoption of fairness-aware machine learning techniques, fostering responsible and ethical AI practices.

Solution Architecture for Gen AI Product:

Before getting into the Data Fabric Architecture layout based on the above Capability Framework, PM needs to look at one more aspect of Gen AI Application development.

As outlined in the Data Requirement section, addressing challenges such as the lack of understanding of specialized context, specialized business tasks, and nuances within specific business domains requires leveraging three key Design Patterns to elicit customized behavior from LLMs: Retrieval-Augmented Generation (RAG), Fine-Tuning, and Custom Pre-Training. Prompt Engineering, which primarily concerns user interaction, is not covered in this discussion. The Product Manager is tasked with selecting the most suitable solution approach, considering factors such as the desired level of customization for LLMs, the depth of the available knowledge base, and the readiness of the Data Fabric infrastructure.

RAG Solution: RAG combines generative abilities with retrieval-based methods. It uses a pre-trained language model to generate responses, augmenting them with relevant information retrieved from external sources. This integration enhances context understanding and diversifies responses through techniques like data augmentation and interpolation, but there is no training involved in this approach.

Advice to PM: Choose RAG for dynamic and informative responses. It enhances your AI product’s output by integrating external knowledge sources, enriching context understanding, and increasing response diversity. With a retrieval corpus, RAG enables versatile applications without significant training overhead.
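The RAG pattern described above can be sketched in a few lines under heavy simplifications: term-overlap scoring stands in for a real embedding model and vector store, and a two-document list stands in for the retrieval corpus.

```python
# Minimal RAG sketch (assumptions: term overlap in place of vector
# similarity; a toy two-document corpus in place of a vector store).

def score(query: str, doc: str) -> int:
    """Count lowercase terms shared between query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k documents with the highest term overlap."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def rag_prompt(query: str, corpus: list) -> str:
    """Augment the query with retrieved context before generation."""
    context = "\n".join(retrieve(query, corpus, k=1))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
]
print(rag_prompt("How long do refunds take?", corpus))
```

The resulting prompt would then be sent to the pre-trained language model, which generates the answer grounded in the retrieved context.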

Fine Tuning Solution: Fine-tuning adapts a pre-trained language model to specific tasks or domains by adjusting its parameters using task-specific labeled data. This shallow training initializes the model with learned weights from a large corpus and then updates them through backpropagation using task-specific data. This process enables the model to learn task-specific features and optimize performance.

Advice to PM: Opt for Fine-Tuning to tailor responses for specific tasks or domains. It refines pre-trained models with task-specific data, offering precise control over parameters for superior performance. While it requires moderate training time and resources, Fine-Tuning ensures accuracy tailored to your product’s requirements.
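The mechanics of fine-tuning can be shown in miniature: a toy one-parameter "model" with a pre-trained weight is adjusted by gradient descent on task-specific labeled pairs. This is a stand-in for the backpropagation updates described above, not a real LLM fine-tune.

```python
# Fine-tuning in miniature: start from a "pre-trained" weight and
# update it by gradient descent on task-specific labeled data.
# A one-parameter linear model stands in for an LLM.

def fine_tune(w: float, data, lr: float = 0.1, epochs: int = 50) -> float:
    """Minimize mean squared error of y = w * x on labeled pairs."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # the backpropagation-style parameter update
    return w

pretrained_w = 1.0                      # weight "learned from a large corpus"
task_data = [(1.0, 2.0), (2.0, 4.0)]   # task-specific labels imply w = 2
print(round(fine_tune(pretrained_w, task_data), 2))  # -> 2.0
```

The same idea scales up to LLMs: initialize from pre-trained weights, then run gradient updates on task-specific examples (in practice via frameworks such as Hugging Face Transformers, often with parameter-efficient methods like LoRA).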

Custom Pre-Training Solution: Custom Pre-Training involves training a language model from scratch or retraining an existing model using domain-specific text corpora. This deep training requires collecting a large dataset relevant to the domain, preprocessing the data, and training the model using techniques like masked language modeling or causal language modeling. Custom Pre-Training allows the model to learn domain-specific patterns and semantics, leading to superior performance in specialized domains.

Advice to PM: Consider Custom Pre-Training for exceptional performance in specialized domains. Training from scratch using domain-specific text corpora yields unmatched flexibility and accuracy. Despite requiring significant resources and time, Custom Pre-Training empowers your AI product with unparalleled adaptability.

Data Fabric Architecture for Gen AI Product:

Embarking on the journey of implementing a Data Fabric Strategy, the pinnacle stage lies in sculpting the Solution Architecture tailored for the Gen AI product. While accountability rests with the Product Manager, the creation of this vital blueprint falls under the purview of the Architect.

In dissecting the intricacies of Data Fabric solutions, we encounter two fundamental components: the user-facing interactions and the robust Data Processing Pipeline.

Transactional User Interactions: In this aspect, the Gen AI App orchestrates interactions with users, employing Prompt Templates for processing conversations effectively.

Batch or Streaming Processes: On the other front, the Data Fabric operates by managing Batch or Streaming data, undertaking tasks such as processing, organizing, storing, and feeding data into the LLM model to enable customized behavior.

Hyperscalers Data Fabric Offerings for Gen AI:

One of the pivotal decisions confronting the Product Manager is selecting a cloud platform vendor capable of providing all the necessary functionalities to construct a Gen AI solution in a cost-effective and future-proof manner. While most hyperscalers offer a comprehensive suite of Data Fabric Capabilities tailored for Generative AI, the Product Manager must make a judicious choice based on the organization’s present setup and future requirements. Here’s a brief overview of AWS, IBM, and Google Cloud’s Data Fabric capabilities for meeting the data requirements of a Generative AI Product. However, the ultimate decision rests with the PM, who must consider factors such as ease of use, intuitiveness, and accessibility aligned with their organization’s preferences.

Launch — Data Fabric Performance Management for Gen AI Product

Product Building is essential, but true success lies in effectively launching it and adjusting/adapting product strategies based on market adoption and performance. The Product Manager plays a pivotal role in this process, continuously monitoring both Leading and Lagging Indicators to scale the product to new heights.

For effective monitoring of product performance and informed decision-making, the Product Manager incorporates ample PLG (product-led growth) instrumentation into the product. Additionally, their PM Dashboard should establish a direct correlation between Product-level North Star Metrics and the underlying technical metrics that significantly impact the overall user experience.

In my own research, I’ve examined over five Gen AI solutions where Product Managers adjusted their Data Fabric Strategies to optimize the utilization of Custom Data in their products. The results were overwhelmingly positive.

Conclusion:

In conclusion, as the landscape of Generative AI continues to evolve at a rapid pace, it becomes increasingly evident that success in this fiercely competitive market hinges not solely on the sophistication of Foundational Models, but rather on the strategic utilization of data within a well-crafted Data Fabric Strategy. Through the exploration of data requirements, the delineation of a comprehensive Data Fabric Strategy, and the selection of appropriate solution architectures, Product Managers are empowered to harness the potential of Generative AI products, leveraging custom data to gain a strategic edge. By embracing this approach, organizations can not only integrate Generative AI capabilities into their portfolios but also establish a competitive advantage that is difficult to replicate. As we navigate the era of Gen AI, the importance of a robust Data Fabric Strategy cannot be overstated, offering a pathway to remarkable success in the development and deployment of Generative AI products.

Disclaimer: The postings on this article are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

Saurabh Kaushik: Data and AI Product Management Leader for 24 years. From web 1.0 to cutting-edge AI solutions, he’s pioneered tech products across industries, from startups to enterprises. Saurabh is a renowned thought leader and speaker at global tech forums, and his tech blogs span over a decade. His relentless innovation continues to shape Data and AI solutions worldwide.

Connect with him on LinkedIn

Building Data Fabric for Winning Gen AI Products was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

 The landscape of Generative AI is evolving rapidly, presenting Product Managers with the formidable challenge of devising winning strategies for their products in this fiercely competitive market. While much attention is drawn to the emergence of larger and more advanced Foundational Models (FM) such as LLaMa, Mistral, Claude, Cohere, Gemini, Gemma, GPT-4, and Granite, it’s essential to recognize that the FM serves merely as the foundational element of the solution. Ultimately, success hinges on the strategic utilization of data for business context.Drawing from my experience leading various Data & AI products, I aim to provide fellow Product Managers with a concise guide to crafting a Data Fabric Strategy for their Generative AI Product Initiative, offering insights to enhance their chances of achieving remarkable success in their endeavors.Opportunity — Custom Data as Strategic Edge in the Gen AI EraMarket Trend — Democratization of Generative AI with Open Source FMThe open-source movement undoubtedly fosters innovation while simultaneously establishing a fair competitive landscape for startups vis-à-vis monopolistic corporations. Similar dynamics are unfolding within Generative AI (Gen AI), which traces its roots back to Google’s seminal paper “Attention is all I need.” From this origin, we witness a proliferation of open-source Foundational Models (FM) daily, vigorously competing with their proprietary counterparts. Despite offering comparable quality and performance, open-source models gain a competitive edge over commercial ones due to their cost-effectiveness. Presented here is a benchmarking analysis comparing open-source and closed-source FMs to illuminate this phenomenon.Credit — Artificial Analytics AIIn the recent past, the advent of open sources, SaaS models and cloud platforms have undeniably democratized the technological landscape, and Gen AI is also following suit. 
Consequently, it’s plausible to assert that Gen AI Foundational Models (FMs) are poised to become a commodity or foundational capability universally accessible to all, devoid of any strategic advantage for any one particular entity.Opportunity — Leveraging Custom Data to Succeed in the Gen AI CompetitionThe winners in the Generation AI race distinguish themselves by their skill in leveraging their organization’s extensive business and proprietary data to customize models to their specific domain, needs, and context, thus creating unique value propositions. This presents a significant opportunity for Product Managers to not only integrate Generation AI capabilities into their offerings but also to gain a competitive advantage that is difficult for rivals to replicate. To seize this opportunity, Product Managers must devise a Data Fabric Strategy with the help of senior technical staff/architects.Strategy — Data Fabric Strategy for Gen AI Product:Data Requirement Analysis for Gen AIBefore outlining a data fabric strategy, Product Manager needs to examine the primary data needs of a Gen AI Product. The diagram below illustrates the interaction between users and the application with LLM Models, along with four distinct types of data inputs that impact the outcome of the Product.Behavioral Context (Prompt Templates) — Detailed prompts or instructions that guide the generative AI model on what kind of output is desired. These prompts should be diverse and cover a wide range of potential outputs the AI product may need to generate. Examples of prompts could include specific tasks the AI needs to perform, such as generating text for customer service responses, composing emails, creating product descriptions, or generating creative content like stories or poems.Situational Context (Knowledge Base & Business Data) — A comprehensive knowledge base comprising information relevant to the business domain and scenarios the AI product will operate within. 
This data should cover various aspects of the business, including products, services, policies, procedures, industry standards, and customer preferences. Business data such as historical records, transactional data, customer feedback, market research reports, and any other information that can provide insights into the context in which the AI will be generating content.Semantic Context (Semantic Data & Relationship Data) — Semantic data that helps the AI understand the meaning and context of the information it processes. This includes ontologies, taxonomies, and semantic relationships between different entities or concepts within the knowledge base. Relationship data that maps connections between entities, such as customer-product relationships, employee-job role relationships, or hierarchical structures within the organization. This data helps the AI model generate content that aligns with the specific relationships and contexts it encounters.Knowledge Base (Conversation, Enterprise Knowledge Data) — Conversational data was collected from various sources, including chat logs, customer support interactions, forums, and social media platforms. This data provides the AI with real-world examples of language usage and conversation patterns. Enterprise knowledge data encompassing internal documents, manuals, training materials, and other resources specific to the organization. This data helps the AI understand the internal workings of the business and generate content that is aligned with organizational goals and standards.Open or closed LLMs excel at generalized tasks but often fall short in business applications due to their lack of understanding of specialized context, specialized business tasks, and nuances within specific business domains. 
To overcome these limitations, Product Manager must refine Data Requirements further to ensure that the models are equipped to address these scenarios effectively.Data Fabric Strategy for Gen AI Products:Essentially, the Data Fabric Strategy involves a comprehensive plan to seamlessly integrate diverse data sources, processing capabilities, and AI algorithms to enable the creation, training, and deployment of generative AI models. It provides a unified platform approach for the Collection of Data, Organizing the data, and Allowing good Governance over data, facilitating the development of winning AI Products.The Product Manager establishes the North Star Metrics (NSM) for the product according to the business context, with the most prevalent and crucial NSM being User Experience, contingent upon three pivotal factors.Latency — Latency refers to the time taken for the generative AI system to process input data and produce output.Accuracy — This dimension ensures that the generative AI models produce high-quality outputs that closely resemble the desired content.Ethics — This dimension is about whether generated content is safe, fair, transparent and explainable.With these User Experience criteria for Gen AI Product, the Product Manager can now craft a Data Fabric Strategy (Collect, Organize, and Govern) with their respective North Star Metrics.Data Collection:Gathering data from diverse sources with seamless integration and efficient preprocessing.Data Fabric Strategy:Facilitate swift retrieval with seamless integration from diverse data sources using Zero ETL Integrations, ensuring minimal latency in accessing data for generative AI models.Aggregate and pre-process extensive volumes of structured and unstructured data in the Data Lake / Warehouse, optimizing for accuracy and efficiency in data processing to support robust generative AI model training.Data Fabric Matric:Data Retrieval Index (DRI): Percentage of time that the generative AI models have access to 
required data without delays. It measures the efficiency of data retrieval and integration processes without ETL, indicating the readiness of data for model training and inference.Data Organization:Establishing a structured data catalog and refining data for better comprehension, contextualization and analysis.Data Fabric Strategy:Establish a comprehensive data catalog and graph to build situational context for FMs, enabling better understanding and utilization of data for generative AI tasks.Refine and prepare the data into native and vectorized embeddings to enhance semantic comprehension for FMs, improving the accuracy and interpretability of generative outputs.Data Fabric Metrics:Data Contextualisation Index (DCI): A composite index measuring the improvement in generative model performance attributed to enhanced semantic comprehension of input data. It reflects the impact of data refinement and contextual understanding on the quality of generated outputs.Data Governance:Ensuring data security, compliance, and superior quality aligned with ethical AI principles.Data Fabric Strategy:Elevate data security framework and compliance adherence across data, models, and prompts to prevent FM hacking attempts, prioritizing data integrity and confidentiality in generative AI processes.Ensure superior data quality coverage in alignment with ethical AI principles (Explainability, Fairness, Robustness, Privacy, Transparency) for the FM’s generative nature, fostering trustworthiness and responsible use of AI-generated content.Data Fabric Metrics:Ethical AI Compliance Index (ECI): An aggregated score assessing the adherence to ethical AI principles (Explainability, Fairness, Robustness, Privacy, Transparency) across data governance and model development practices. 
It assures ethical compliance and trustworthiness of the generative AI product.Design — Data Fabric Architecture for Gen AI ProductData Fabric Capabilities for Gen AI Product:At this stage, the Product Manager possesses a thorough understanding of the essential capabilities needed to fulfill the data requirements for Generative AI (such as situational and semantic context, and access to a vast knowledge base) sourced from various data repositories (including vectorized data, graph data, data lakes, etc.). With a defined Data Fabric strategy, the PM is well-equipped to delineate the Data Fabric Capability Framework and articulate the necessary functionalities.Data Collection:Zero ETL Integration: This enables swift and seamless retrieval of data from diverse sources without the need for ETL processes, ensuring minimal latency in accessing data for generative AI models. By adopting a unified data integration approach, it simplifies data ingestion and enhances real-time data availability, supporting timely responses to generative AI tasks.Data Lake/Warehouse Aggregation and Pre-processing: Through scalable data aggregation, this component centralizes extensive volumes of structured and unstructured data in a Data Lake or Warehouse, providing a reliable storage solution for generative AI model training. 
Efficient pre-processing pipelines optimize data quality and readiness, streamlining the preparation process and facilitating smoother and more effective generative AI model training.Data Organization:Data Catalog and Graph DB: This component centralizes metadata and lineage information in a comprehensive data catalog, enabling FMs to better understand and utilize data for generative AI tasks, while a graph-based situational context enhances FMs’ awareness of data relationships and dependencies, facilitating more informed decision-making during generation.Data Refinement and Vector Embedding: Through a refinement pipeline, raw data is processed into standardized formats, ensuring quality and consistency for generative AI tasks, while native and vectorized embeddings capture semantic meaning and context, enhancing FMs’ comprehension and ability to produce accurate and interpretable generative outputs.Data Governance:Data Security Management: This ensures data security by implementing robust encryption and access controls, safeguarding sensitive data from unauthorized access or tampering. Additionally, it maintains compliance with regulatory standards through continuous monitoring and auditing processes, minimizing the risk of hacking attempts and ensuring data integrity and confidentiality.Ethical AI Compliance Management: This promotes trust in AI-generated content by integrating explainability and transparency tools, allowing stakeholders to understand how content is generated and make informed decisions about its use. 
Furthermore, it mitigates biases and ensures fairness in content generation through the adoption of fairness-aware machine learning techniques, fostering responsible and ethical AI practices.

Solution Architecture for Gen AI Product:

Before getting into the Data Fabric Architecture layout based on the above Capability Framework, the PM needs to consider one more aspect of Gen AI application development.

As outlined in the Data Requirement section, addressing challenges such as a lack of specialized context, specialized business tasks, and the nuances of specific business domains requires leveraging three key design patterns to elicit customized behavior from LLMs: Retrieval-Augmented Generation (RAG), Fine-Tuning, and Custom Pre-Training. Prompt Engineering, which is primarily focused on user interaction, is not covered here. The Product Manager is tasked with selecting the most suitable approach, considering factors such as the desired level of customization of the LLM, the depth of the available knowledge base, and the readiness of the Data Fabric infrastructure.

RAG Solution: RAG combines generative abilities with retrieval-based methods. It uses a pre-trained language model to generate responses, augmenting them with relevant information retrieved from external sources. This integration enhances context understanding and diversifies responses through techniques like data augmentation and interpolation, and it involves no model training.

Advice to PM: Choose RAG for dynamic and informative responses. It enhances your AI product's output by integrating external knowledge sources, enriching context understanding, and increasing response diversity. With a retrieval corpus, RAG enables versatile applications without significant training overhead.

Fine-Tuning Solution: Fine-tuning adapts a pre-trained language model to specific tasks or domains by adjusting its parameters using task-specific labeled data.
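Returning to the RAG approach described above, a minimal sketch of the retrieve-then-assemble step might look like the following. The word-overlap scoring, corpus, and prompt shape are illustrative assumptions; a production system would retrieve by vector similarity against the Data Fabric's embedding store before calling the FM.

```python
# Minimal RAG sketch: score passages against the query, keep the top-k,
# and fold them into the prompt sent to the foundation model.
def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank passages by naive word overlap with the query (illustrative)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list) -> str:
    """Assemble an augmented prompt from the retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Refunds are accepted within 30 days of purchase.",
    "Our headquarters moved to Austin in 2021.",
    "All refunds require the original invoice.",
]
prompt = build_prompt("How do refunds work?", corpus)
# Only the two refund passages reach the prompt; the unrelated one is left out.
```

Because the model itself is untouched, swapping or updating the corpus immediately changes product behavior, which is the practical reason RAG carries no training overhead.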
This shallow training initializes the model with weights learned from a large corpus and then updates them through backpropagation on task-specific data, enabling the model to learn task-specific features and optimize performance.

Advice to PM: Opt for Fine-Tuning to tailor responses for specific tasks or domains. It refines pre-trained models with task-specific data, offering precise control over parameters for superior performance. While it requires moderate training time and resources, Fine-Tuning delivers accuracy tailored to your product's requirements.

Custom Pre-Training Solution: Custom Pre-Training involves training a language model from scratch, or retraining an existing model, on domain-specific text corpora. This deep training requires collecting a large dataset relevant to the domain, preprocessing the data, and training the model using techniques such as masked language modeling or causal language modeling. Custom Pre-Training allows the model to learn domain-specific patterns and semantics, leading to superior performance in specialized domains.

Advice to PM: Consider Custom Pre-Training for exceptional performance in specialized domains. Training from scratch on domain-specific text corpora yields unmatched flexibility and accuracy. Despite requiring significant resources and time, Custom Pre-Training gives your AI product unparalleled adaptability.

Data Fabric Architecture for Gen AI Product:

In implementing a Data Fabric Strategy, the pinnacle stage is sculpting the Solution Architecture tailored to the Gen AI product.
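The masked-language-modeling objective mentioned under Custom Pre-Training can be illustrated with a toy data-preparation step. The masking rate, whitespace tokenization, and mask symbol below are simplified assumptions; real pre-training pipelines use subword tokenizers and additional corruption strategies.

```python
# Toy MLM data preparation: hide a fraction of tokens; the model is later
# trained to reconstruct them. Only the data step is shown here.
import random

MASK = "[MASK]"

def mask_tokens(tokens: list, rate: float = 0.15, seed: int = 0):
    """Replace roughly `rate` of tokens with a mask; return inputs and labels."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(MASK)
            labels.append(tok)    # the model must reconstruct this token
        else:
            inputs.append(tok)
            labels.append(None)   # unmasked positions carry no loss
    return inputs, labels

tokens = "claims above a threshold require manual review by an adjuster".split()
inputs, labels = mask_tokens(tokens, rate=0.3)
```

Feeding such (inputs, labels) pairs from a domain corpus is how the model absorbs domain-specific vocabulary and patterns during pre-training.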
While the accountability rests with the Product Manager, the creation of this vital blueprint falls under the purview of the Architect.

Data Fabric solutions comprise two fundamental components: user-facing interactions and a robust data processing pipeline.

Transactional User Interactions: The Gen AI app orchestrates interactions with users, employing Prompt Templates to process conversations effectively.

Batch or Streaming Processes: The Data Fabric manages batch or streaming data, processing, organizing, and storing it, and feeding it into the LLM to enable customized behavior.

Hyperscalers' Data Fabric Offerings for Gen AI:

One of the pivotal decisions confronting the Product Manager is selecting a cloud platform vendor capable of providing all the functionality needed to build a Gen AI solution in a cost-effective and future-proof manner. While most hyperscalers offer a comprehensive suite of Data Fabric capabilities tailored for Generative AI, the Product Manager must make a judicious choice based on the organization's present setup and future requirements. AWS, IBM, and Google Cloud each offer Data Fabric capabilities that can meet the data requirements of a Generative AI product; the ultimate decision rests with the PM, who must weigh factors such as ease of use, intuitiveness, and accessibility against the organization's preferences.

Launch — Data Fabric Performance Management for Gen AI Product

Building the product is essential, but true success lies in launching it effectively and adapting product strategy based on market adoption and performance.
The Product Manager plays a pivotal role in this process, continuously monitoring both leading and lagging indicators to scale the product.

For effective monitoring of product performance and informed decision-making, the Product Manager builds ample PLG (product-led growth) instrumentation into the product. Their PM dashboard should establish a direct correlation between product-level North Star Metrics and the underlying technical metrics that most affect the user experience.

In my own research, I've examined over five Gen AI solutions whose Product Managers adjusted their Data Fabric Strategies to optimize the use of custom data in their products. The results were overwhelmingly positive.

Conclusion:

In conclusion, as the landscape of Generative AI continues to evolve at a rapid pace, it becomes increasingly evident that success in this fiercely competitive market hinges not solely on the sophistication of Foundational Models, but on the strategic utilization of data within a well-crafted Data Fabric Strategy. Through the exploration of data requirements, the delineation of a comprehensive Data Fabric Strategy, and the selection of appropriate solution architectures, Product Managers can harness the potential of Generative AI products, leveraging custom data to gain a strategic edge. By embracing this approach, organizations can not only integrate Generative AI capabilities into their portfolios but also establish a competitive advantage that is difficult to replicate. As we navigate the era of Gen AI, the importance of a robust Data Fabric Strategy cannot be overstated; it offers a pathway to remarkable success in the development and deployment of Generative AI products.

Disclaimer: The views in this article are my own and don't necessarily represent IBM's positions, strategies, or opinions.

Saurabh Kaushik: Data and AI Product Management Leader for 24 years.
From Web 1.0 to cutting-edge AI solutions, he has pioneered tech products across industries, from startups to enterprises. Saurabh is a renowned thought leader and speaker at global tech forums, and his tech blogs span over a decade. His relentless innovation continues to shape Data and AI solutions worldwide. Connect with him on LinkedIn.

Building Data Fabric for Winning Gen AI Products was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
