AI Deep Dive

The AI Hierarchy — What Fits Where

Every AI buzzword maps to a specific layer in a hierarchy. Understanding this hierarchy is the single most important thing before you walk into any AI conference.

ARTIFICIAL INTELLIGENCE — Machines that mimic human cognition
  MACHINE LEARNING — Systems that learn from data, not rules
    DEEP LEARNING — Neural networks with many layers
      LLMs — GPT, Claude, Gemini, Llama
      Vision Models — Image generation, recognition
      Speech Models — Whisper, TTS, ASR
AI AGENTS — LLMs + Tools + Memory + Autonomy (the Salesforce focus)

Key Distinction: AI vs ML vs Deep Learning

| Concept | What It Is | Example |
|---|---|---|
| AI | Any system that performs tasks typically requiring human intelligence. Includes rule-based systems. | A chess engine with hardcoded rules. An if-else fraud filter. |
| Machine Learning | A subset of AI. The system learns patterns from data instead of being explicitly programmed. | Spam filter that learns from labeled emails. Recommendation engines. |
| Deep Learning | A subset of ML using neural networks with many layers. Excels at unstructured data (text, images, audio). | ChatGPT, image recognition, voice assistants. |
| Generative AI | A subset of DL that creates new content — text, images, code, audio. | Claude writing an email. DALL-E generating an image. |

CTO mental model: All generative AI is deep learning. All deep learning is ML. All ML is AI. But not all AI is ML — rule-based expert systems are AI but not ML.

Types of Machine Learning

📊
Supervised Learning

Learns from labeled examples. Input → known output. Used for: classification (spam/not spam), regression (price prediction), forecasting.

🔍
Unsupervised Learning

Finds patterns in unlabeled data. No "right answer" given. Used for: customer segmentation, anomaly detection, clustering.

🎮
Reinforcement Learning

Agent learns by trial-and-error, maximizing a reward signal. Used for: game AI, robotics, ad bidding, RLHF for LLMs.

🔄
Self-Supervised Learning

The model creates its own labels from the data. "Predict the next word." This is how LLMs like Claude are pre-trained.

How AI Actually Learns — From Data to Intelligence

Neural Networks: The Core Mechanism

A neural network is a function that takes input numbers and produces output numbers, with adjustable parameters (called weights) in between. "Learning" means adjusting those weights to minimize errors.

The Training Loop (every AI model follows this)

  1. Forward pass: Feed input data through the network. It produces a prediction.
  2. Loss calculation: Compare the prediction to the correct answer. Measure how wrong it was (the "loss").
  3. Backpropagation: Calculate how each weight contributed to the error.
  4. Weight update: Adjust weights slightly to reduce the error (using gradient descent).
  5. Repeat: Do this billions of times across the entire dataset. Each full pass = one "epoch."
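
The loop above can be sketched in a few lines of plain Python. This is a toy with a single weight fitting y = 2x, not a real neural network, but every numbered step appears:

```python
# Minimal training-loop sketch: one adjustable weight, gradient descent on
# squared error. A toy illustration, not a real neural network.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct answer)
w = 0.0    # the single adjustable weight
lr = 0.05  # learning rate: how big each weight update is

for epoch in range(200):                  # 5. repeat across the dataset
    for x, y_true in data:
        y_pred = w * x                    # 1. forward pass
        loss = (y_pred - y_true) ** 2     # 2. loss calculation
        grad = 2 * (y_pred - y_true) * x  # 3. backpropagation (dloss/dw)
        w -= lr * grad                    # 4. weight update (gradient descent)

print(round(w, 3))  # converges toward 2.0, the true relationship
```

An LLM runs exactly this loop, just with billions of weights instead of one and a loss measuring next-token prediction error.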

What Makes Deep Learning "Deep"

A shallow network has 1-2 hidden layers. A deep network has dozens to hundreds. Each layer learns increasingly abstract features:

Layer 1: Raw patterns (edges, individual characters)
Layers 2-5: Combinations (shapes, words, phrases)
Layers 10-20: Concepts (objects, syntax, meaning)
Layers 50+: Abstract reasoning (intent, context, nuance)

The Transformer Architecture (2017 — the breakthrough)

Before transformers, AI processed text word-by-word sequentially (slow, forgetful). The transformer introduced self-attention: the model can look at all words in a sentence simultaneously and learn which words relate to which.

Why it matters: Every major LLM today — GPT-4, Claude, Gemini, Llama — is a transformer. The 2017 Google paper "Attention Is All You Need" is arguably the most influential AI paper of the decade.

Key Numbers (to have in your back pocket)

| Metric | What It Means | Typical Values |
|---|---|---|
| Parameters | The adjustable weights in the model. More ≈ more capacity to learn. | GPT-4: ~1.8T, Llama 3: 8B-405B, Claude: undisclosed |
| Context Window | How much text the model can "see" at once (input + output). | Claude: 200K tokens. GPT-4: 128K. Gemini: 1M+ |
| Tokens | Chunks of text (~0.75 words per token). The unit of measurement for LLMs. | This entire page ≈ 4,000 tokens |
| Training Data | Total text the model was trained on. | Typically trillions of tokens from books, web, code |
| Inference | Running the trained model to generate a response. What you pay for via API. | ~$3-15 per million input tokens (varies by model) |

Large Language Models — The Engine Behind Everything

What an LLM Actually Does

An LLM is a next-token predictor. Given a sequence of tokens, it predicts the probability distribution over all possible next tokens, then samples from that distribution. That's it. All the apparent "intelligence" emerges from doing this prediction extremely well over extremely large amounts of data.
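
A minimal sketch of that prediction step, with made-up logits. A real LLM produces one score per token in a vocabulary of roughly 100K, but the mechanics are the same:

```python
import math
import random

# Next-token prediction in miniature: raw scores -> probabilities -> sample.
# The vocabulary and logits here are invented for illustration.

vocab = ["mat", "roof", "moon", "banana"]
logits = [3.2, 2.1, 0.5, -1.0]  # model's raw scores for "The cat sat on the ___"

# Softmax: turn raw scores into probabilities that sum to 1.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Sample one token according to the distribution.
random.seed(0)  # fixed seed so the example is repeatable
next_token = random.choices(vocab, weights=probs, k=1)[0]
print(next_token, [round(p, 3) for p in probs])
```

Generation is just this step repeated: append the sampled token to the sequence and predict again.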

The Three Phases of Building an LLM

Phase 1: Pre-training
Phase 2: Fine-tuning
Phase 3: RLHF / Alignment

Phase 1 — Pre-training (costs $10M-$100M+): Feed the model trillions of tokens of text from the internet, books, code. The model learns language, facts, reasoning patterns. This produces a "base model" — it can complete text but won't follow instructions.

Phase 2 — Fine-tuning (SFT): Train on curated instruction/response pairs. "When a human asks X, a good response looks like Y." This teaches the model to be a helpful assistant instead of just an autocomplete engine.

Phase 3 — RLHF (Reinforcement Learning from Human Feedback): Human raters rank multiple model responses. A reward model learns what humans prefer. The LLM is then trained to maximize that reward. This is what makes Claude polite, safe, and genuinely useful vs. just technically capable.

Key LLM Capabilities

In-Context Learning

Give the LLM examples in the prompt, it adapts behavior without retraining. "Few-shot" prompting.
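
Few-shot prompting is plain string assembly. A sketch with an invented sentiment-labeling task:

```python
# Few-shot ("in-context") prompting: worked examples go directly in the
# prompt text. Task and examples here are invented for illustration.

examples = [
    ("The product arrived broken.", "negative"),
    ("Setup took two minutes, love it.", "positive"),
]
query = "Support never answered my email."

parts = ["Label each review as positive or negative.\n"]
for text, label in examples:              # the "shots"
    parts.append(f"Review: {text}\nLabel: {label}\n")
parts.append(f"Review: {query}\nLabel:")  # the model completes this line
prompt = "\n".join(parts)
print(prompt)
```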

Chain of Thought

Ask it to "think step by step" and accuracy on reasoning tasks jumps dramatically.

Tool Use / Function Calling

The LLM outputs structured JSON to call external APIs, databases, or tools. This is the foundation of agents.
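
A sketch of the mechanics, with an invented tool name and schema. The model's JSON output is simulated as a literal string here; in production it comes back from the LLM API:

```python
import json

# Function calling in miniature: the LLM emits structured JSON naming a
# tool and its arguments; your code executes the call and returns the result.

def get_order_status(order_id: str) -> str:
    # Stand-in for a real API or database call.
    return f"Order {order_id}: shipped"

TOOLS = {"get_order_status": get_order_status}

# What a model's tool-call output might look like (simulated):
llm_output = '{"tool": "get_order_status", "arguments": {"order_id": "A123"}}'

call = json.loads(llm_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)
```

The result is then fed back to the LLM as context so it can compose its final answer, which is exactly the agent loop described later.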

RAG (Retrieval-Augmented Generation)

Before answering, retrieve relevant documents from a database and inject them into the context. Reduces hallucination. Keeps answers grounded in your data.

What LLMs Cannot Do (important for a CTO)

No true memory: Each conversation starts fresh unless you engineer persistence.

Hallucinations: They confidently state false things. This is inherent to probabilistic generation. Mitigation = RAG, grounding, verification layers.

No real-time data: Knowledge is frozen at training cutoff unless you add search/retrieval tools.

Math and precise logic: Unreliable for complex calculations without tool use. They approximate; they don't compute.

Determinism: Same input can produce different outputs. Temperature controls randomness but never eliminates it.
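
Temperature is a divisor applied to the logits before the softmax. A sketch showing how it sharpens or flattens the distribution (logits are made up):

```python
import math

# Temperature rescales logits before softmax: low temperature sharpens the
# distribution toward the top token, high temperature flattens it.

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    exps = [math.exp(z) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Even at temperature near 0 the rest of the serving stack (batching, hardware nondeterminism) can still vary outputs, which is why "same input, same output" is never fully guaranteed.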

The AI Provider Landscape

Foundation Model Providers

| Provider | Models | Strengths | Access |
|---|---|---|---|
| Anthropic | Claude (Opus, Sonnet, Haiku) | Safety, long context (200K), instruction following, coding, analysis | API, claude.ai, AWS Bedrock, GCP Vertex |
| OpenAI | GPT-4o, o1, o3 | Broad capabilities, vision, ecosystem, first-mover brand | API, ChatGPT, Azure OpenAI |
| Google | Gemini (Ultra, Pro, Flash) | Multimodal, huge context (1M+), integrated with Google Cloud | API, Gemini app, GCP Vertex |
| Meta | Llama 3/4 | Open-source, self-hostable, fine-tunable, no vendor lock-in | Download weights, run anywhere |
| Mistral | Mixtral, Mistral Large | European, efficient, open-weight options | API, self-host |

How Claude is Different

Anthropic's approach centers on "Constitutional AI" — the model is trained with a set of principles (a "constitution") rather than relying purely on human labeling. This produces more consistent, principled behavior.

Practical differences: Claude tends to follow complex instructions more precisely, handles very long documents well (200K context), and is more cautious about harmful outputs. For enterprise use: Claude is available on AWS Bedrock and GCP Vertex, so you can keep data within your cloud perimeter.

Open Source vs Closed Source — CTO Decision Framework

| Factor | Closed (Claude, GPT-4) | Open (Llama, Mistral) |
|---|---|---|
| Performance | Generally best-in-class | Closing the gap rapidly |
| Cost | Pay per token (API) | Infra cost (GPUs) — can be cheaper at scale |
| Data privacy | Data sent to provider's API (though enterprise tiers offer zero retention) | Runs on your infra — full control |
| Customization | Prompt engineering, some fine-tuning | Full fine-tuning, modify architecture |
| Maintenance | Provider handles everything | You own ops, updates, security |
| Best for | Fast deployment, best quality, small-medium scale | High volume, strict compliance, niche domains |

AI Agents — The Salesforce Conference Focus

What is an AI Agent?

An AI agent is an LLM that can plan, use tools, observe results, and iterate — autonomously. Instead of just answering a question, it takes action to accomplish a goal.

Diagram: an LLM "Brain" (reasons, plans, decides) sits at the center, triggered by a 👤 user or event, and connects out to 🔧 Tools / APIs, 🗄️ Databases, 🌐 Web / Search, 📧 Email / CRM, a 🧠 Memory Store, and 📋 Planning / Goals.

Agent vs Automation vs Chatbot — The Critical Distinction

| Feature | Traditional Automation (RPA) | Chatbot (rule-based) | AI Chatbot (LLM) | AI Agent |
|---|---|---|---|---|
| Decision making | None. Follows fixed rules. | Decision tree only | Flexible, but single-turn | Plans multi-step, adapts |
| Handles ambiguity | No — breaks on edge cases | No | Yes | Yes |
| Uses tools | Hardcoded integrations | No | If programmed to | Autonomously decides which tools |
| Memory | None | Session only | Session only | Short + long-term memory |
| Autonomy | Zero | Zero | Low | High — can loop, retry, escalate |
| Example | Copy data between systems on schedule | "Press 1 for billing" | "Summarize this ticket" | "Resolve this ticket: read it, check the database, update the CRM, email the customer" |

Salesforce context: When Salesforce says "Agentforce," they mean AI agents embedded into Salesforce workflows — agents that can read your CRM data, take actions (create cases, send emails, update records), and operate within the guardrails you define. These sit on top of their Einstein AI platform + Data Cloud.

The Agent Loop (ReAct Pattern)

Most agent frameworks follow this loop:

  1. Observe: Receive user request or trigger event. Retrieve relevant context from memory.
  2. Think: The LLM reasons about what to do next. It creates a plan or picks the next action.
  3. Act: Call a tool — query a database, call an API, send a message, update a record.
  4. Observe: Check the result of the action. Did it work? Was the data correct?
  5. Loop or stop: If the goal is met, respond to the user. If not, go back to step 2 with updated context.
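
The loop above as a runnable skeleton. The decide() function stands in for the LLM and is hardcoded here; the ticket tool, its data, and the ticket ID are all invented:

```python
# ReAct-style agent loop skeleton with one stubbed tool and a stubbed "LLM".

def lookup_ticket(ticket_id):
    # A tool the agent can call; a real one would query your ticketing system.
    return {"id": ticket_id, "status": "open", "topic": "billing"}

TOOLS = {"lookup_ticket": lookup_ticket}

def decide(goal, observations):
    # Stand-in for the LLM "Think" step. A real agent sends the goal plus
    # observations to an LLM and parses its chosen action out of the reply.
    if not observations:
        return ("act", "lookup_ticket", "T-42")
    last = observations[-1]
    return ("stop", f"Ticket {last['id']} is {last['status']} ({last['topic']})")

def run_agent(goal, max_steps=5):
    observations = []                                # 1. Observe: context so far
    for _ in range(max_steps):
        decision = decide(goal, observations)        # 2. Think
        if decision[0] == "stop":
            return decision[1]                       # 5. Goal met: respond
        _, tool_name, arg = decision                 # 3. Act: call the tool
        observations.append(TOOLS[tool_name](arg))   # 4. Observe the result
    return "Step limit reached; escalate to a human"

print(run_agent("Summarize ticket T-42"))
```

Note the max_steps cap: a hard iteration limit is the simplest defense against runaway agent loops (and runaway token bills).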

Agent Architecture Components

🧠
LLM Core

The reasoning engine. Chooses actions, interprets results, generates responses. Claude, GPT-4, etc.

🔧
Tools

Functions the agent can call: APIs, database queries, web search, calculators, file operations. Defined as schemas the LLM can invoke.

💾
Memory

Short-term: Conversation context. Long-term: Vector database storing past interactions, user preferences, knowledge base.

🛡️
Guardrails

Rules that constrain what the agent can do. "Never delete records." "Always get approval for refunds > $500." "Escalate to human if confidence is low."
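
Guardrails are often just deterministic policy checks that run before any agent action executes. A sketch mirroring the rules above; the action shape, field names, and threshold are illustrative:

```python
# Guardrail layer sketch: policy checks applied to a proposed agent action
# before it runs. All rule details are illustrative assumptions.

APPROVAL_THRESHOLD = 500  # refunds above this require a human

def check_action(action: dict) -> str:
    if action["type"] == "delete_record":
        return "blocked: agents may never delete records"
    if action["type"] == "refund" and action["amount"] > APPROVAL_THRESHOLD:
        return "needs_approval: refund exceeds $500"
    if action.get("confidence", 1.0) < 0.6:
        return "escalate: low confidence, route to human"
    return "allowed"

print(check_action({"type": "refund", "amount": 800}))
print(check_action({"type": "refund", "amount": 50}))
```

The key design point: guardrails live outside the LLM, in ordinary code, so a clever prompt cannot talk the system out of them.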

📊
Orchestrator

Manages the agent loop. Handles retries, timeouts, error handling, and routing between multiple agents.

👁️
Observability

Logging every step: what the agent thought, what tools it called, what it returned. Critical for debugging and compliance.

Building AI Applications — Practical Architecture

The AI Application Stack

User Interface — Chat, voice, embedded UI, API endpoints
Application Layer — Business logic, auth, rate limiting, caching
Orchestration — Agent framework, prompt management, tool routing
RAG Pipeline — Document ingestion, embeddings, vector search, reranking
Model Layer — LLM API calls (Claude, GPT-4, etc.) or self-hosted models
Infrastructure — GPUs, vector DB, object storage, monitoring

RAG: The Most Common Enterprise AI Pattern

RAG (Retrieval-Augmented Generation) is how you make an LLM answer questions about your data without retraining the model.

How RAG Works — Step by Step

  1. Ingest documents: Take your internal docs (PDFs, Confluence pages, Slack messages, CRM data). Split them into chunks (typically 200-500 tokens each).
  2. Create embeddings: Run each chunk through an embedding model (e.g., OpenAI text-embedding-3, Cohere Embed). This converts text to a numerical vector (a list of ~1500 numbers) that captures semantic meaning.
  3. Store in vector database: Store those vectors in a vector DB (Pinecone, Weaviate, pgvector, Qdrant, Chroma). This enables fast similarity search.
  4. At query time: Convert the user's question to a vector using the same embedding model.
  5. Search: Find the top 5-20 most similar document chunks in your vector DB.
  6. Augment: Insert those chunks into the LLM prompt as context: "Given this information: [chunks], answer the user's question: [question]."
  7. Generate: The LLM answers based on the retrieved context, dramatically reducing hallucination.
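
An end-to-end toy version of those steps in pure Python. The "embedding model" here is just word counts and the "vector DB" a list; real systems use a learned embedding model and a proper store, but the flow is identical. The documents and question are invented:

```python
import math
from collections import Counter

# Toy RAG: embed (word counts), index, search by cosine similarity,
# then assemble the augmented prompt. All data here is invented.

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Premium support is available 24/7 by phone.",
]

def embed(text):
    # Step 2 stand-in: text -> vector. Real systems call an embedding model.
    return Counter(text.lower().replace(".", " ").replace("?", " ").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

index = [(doc, embed(doc)) for doc in docs]        # step 3: the "vector DB"

question = "How long do refunds take?"
q_vec = embed(question)                            # step 4: embed the query
top = max(index, key=lambda item: cosine(q_vec, item[1]))  # step 5: search

# Step 6: augment the prompt with the retrieved chunk; step 7 sends it to the LLM.
prompt = f"Given this information: {top[0]}\nAnswer the question: {question}"
print(prompt)
```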

RAG pitfall: "Garbage in, garbage out" applies fully. If your documents are messy, poorly structured, or out of date, RAG will confidently retrieve bad information. Expect data quality work to be the majority of a RAG project.

How IT Vendors / SaaS Companies Build AI Into Their Products

When Salesforce, ServiceNow, SAP, or any SaaS vendor says "we have AI," here's what they typically do:

Approach 1: Embed an LLM via API

The vendor calls Claude/GPT-4 via API and wraps it with their product context. Your CRM data is injected into prompts. The vendor manages prompt engineering, guardrails, and tool definitions. Example: Salesforce Einstein GPT uses a combination of proprietary models and partner LLMs.

Approach 2: Fine-tune on Domain Data

Take a base model, fine-tune it on industry-specific data (healthcare records, legal contracts, financial reports). This creates a specialized model that understands domain jargon and patterns. Example: Bloomberg trained BloombergGPT on financial data.

Approach 3: Build a RAG Layer Over Customer Data

The vendor builds a RAG pipeline that indexes each customer's data (behind proper isolation/permissions). When the AI answers, it's grounded in that customer's specific data. Example: Salesforce Data Cloud feeds into Einstein AI as a retrieval layer.

Approach 4: Train Custom Models

Major vendors train their own models from scratch for specific tasks — fraud detection, recommendation engines, demand forecasting. These are typically smaller, specialized models, not general-purpose LLMs.

Practical: Building an AI Feature (Simplified)

Say you want to build: "An AI that answers customer questions using your knowledge base."

| Step | What You Do | Tools/Services |
|---|---|---|
| 1. Data prep | Export knowledge base articles, clean HTML, split into chunks | Python, LangChain text splitters, Unstructured.io |
| 2. Embeddings | Generate vector embeddings for each chunk | OpenAI Embeddings API, Cohere, Voyage AI |
| 3. Vector store | Store embeddings with metadata (article ID, title, date) | Pinecone, Weaviate, pgvector (Postgres), Qdrant |
| 4. Retrieval | Build search endpoint: query → vector → top-k similar chunks | Vector DB SDK + reranking (Cohere Rerank) |
| 5. Prompt | System prompt with role, rules, tone + injected context chunks | Prompt templating (LangChain, custom) |
| 6. LLM call | Send assembled prompt to Claude/GPT-4 API | Anthropic API, OpenAI API |
| 7. UI | Chat interface with streaming responses | React, Vercel AI SDK, Streamlit |
| 8. Guardrails | Input validation, output filtering, content policy enforcement | Custom rules, Guardrails AI, Anthropic's built-in safety |
| 9. Observability | Log every request, response, retrieval, latency, cost | LangSmith, Langfuse, Helicone, Datadog |
| 10. Evaluation | Measure answer quality, hallucination rate, user satisfaction | Human review, LLM-as-judge, RAGAS framework |

Key Frameworks & Tools You'll Hear About

LangChain

Python/JS framework for building LLM applications. Chains together prompts, tools, memory, retrievers. Widely used but can be over-abstracted for simple use cases.

LlamaIndex

Focused specifically on RAG. Better than LangChain for document indexing and retrieval pipelines. Good for "connect your data to LLMs."

CrewAI / AutoGen

Multi-agent frameworks. Define multiple agents with different roles that collaborate on complex tasks. Still experimental for production.

Vercel AI SDK

Lightweight SDK for building AI chat UIs in Next.js/React. Handles streaming, tool calls, multi-model support. Good for frontend devs.

Voice AI — Calls, IVR, and Conversational Agents

How Voice AI Works End-to-End

🎤 User speaks → ASR/STT → LLM processes text → TTS → 🔊 User hears response
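
The pipeline as code, with every stage stubbed out. Real systems stream audio through hosted ASR, LLM, and TTS services; these functions are placeholders that only show the data flow:

```python
# Voice AI pipeline skeleton: audio in -> text -> text -> audio out.
# All three stages are stubs; the transcript and reply are invented.

def speech_to_text(audio: bytes) -> str:
    return "what time do you close today"   # pretend ASR output

def llm_reply(text: str) -> str:
    return "We close at 6 pm today."        # pretend LLM output

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")             # pretend synthesized audio

def handle_call(audio: bytes) -> bytes:
    transcript = speech_to_text(audio)      # ASR/STT
    reply = llm_reply(transcript)           # LLM processes text
    return text_to_speech(reply)            # TTS back to the caller

print(handle_call(b"...raw audio..."))
```

In production each stage streams partial results to the next rather than waiting for completion, which is how sub-second latency targets are met.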

ASR (Automatic Speech Recognition) / STT (Speech-to-Text): Converts spoken audio to text. Leading models: OpenAI Whisper (open source, very good), Google Cloud Speech, Deepgram (low latency, real-time), AssemblyAI.

LLM Processing: Same as any text-based AI. The transcribed text is the input. The LLM generates a text response.

TTS (Text-to-Speech): Converts the LLM's text response back to natural-sounding speech. Leading options: ElevenLabs (most natural), OpenAI TTS, Google Cloud TTS, Play.ht, Cartesia (ultra-low latency).

Voice AI Use Cases

Welcome / Outbound Calls

AI calls new customers to welcome them, walk through onboarding, collect preferences. Runs 24/7. Companies like Bland.ai and Retell AI provide turn-key platforms.

Customer Support IVR

Replace "Press 1 for billing" with natural conversation. AI understands intent, pulls up account data, resolves issues or routes to human. Huge cost savings.

Appointment Scheduling

AI calls to confirm/reschedule appointments. Handles back-and-forth negotiation on times. Used in healthcare, salons, auto services.

Sales Qualification

AI calls inbound leads, asks qualifying questions, logs data to CRM, books meetings for human reps. Example: Air AI, Vapi.

Building a Voice AI System — Key Decisions

| Decision | Options | Trade-off |
|---|---|---|
| Latency target | <500ms feels natural, >1s feels robotic | Lower latency = more expensive, requires streaming ASR+TTS |
| Build vs buy | Platforms: Vapi, Retell, Bland.ai, Vocode. Build: Twilio + ASR + LLM + TTS | Platforms are faster to ship. Custom gives control over every component. |
| Interruption handling | Must detect when user starts speaking mid-response and stop gracefully | Hard to get right. Platforms handle this. DIY requires VAD (Voice Activity Detection). |
| Phone integration | Twilio, Vonage, Plivo for SIP/PSTN | Twilio is most mature. Costs per minute apply. |

Enterprise AI — Practical Considerations

Data Privacy & Security

Zero Data Retention (ZDR): Enterprise API tiers from Anthropic and OpenAI guarantee your data is not used for training and is not retained after the request. Verify this in your contract.

Data residency: Run Claude via AWS Bedrock in your preferred AWS region. Data never leaves your VPC. Same with Google Vertex AI.

PII handling: Either redact PII before sending to the LLM, or use enterprise tiers with appropriate data processing agreements.

SOC2 / HIPAA / GDPR: Check provider compliance certifications. Anthropic and OpenAI have SOC2 Type II. HIPAA requires a signed BAA (Business Associate Agreement).

Cost Management

LLM API Pricing Model

You pay per token — both input (prompt) and output (response). A 2000-word document as input + a 500-word response ≈ 3500 tokens ≈ $0.01-0.05 depending on model.
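
A back-of-envelope version of that arithmetic. The per-token prices are assumptions, roughly in line with the $3-15 per million input tokens cited earlier; check current rates for the model you actually use:

```python
# Rough cost model for one LLM request. Prices are illustrative placeholders.

words_in, words_out = 2000, 500
tokens_in = words_in / 0.75           # ~0.75 words per token
tokens_out = words_out / 0.75

price_in = 3.00 / 1_000_000           # $ per input token (assumed $3/M)
price_out = 15.00 / 1_000_000         # $ per output token (assumed $15/M)

cost = tokens_in * price_in + tokens_out * price_out
print(f"~{tokens_in + tokens_out:.0f} tokens, ${cost:.4f} per request")
```

Multiply by expected daily request volume before launch; per-request costs look trivial until they meet production traffic.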

Cost Optimization Strategies

Model routing: Use a cheap/fast model (Claude Haiku, GPT-4o mini) for simple queries. Route complex queries to the expensive model (Claude Opus, GPT-4). This alone can cut costs 60-80%.
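
A sketch of a routing layer. The heuristic, keywords, and model names are illustrative; production routers often use a small classifier model rather than keyword rules:

```python
# Model routing sketch: pick a model tier per query with a cheap heuristic.
# Tier names and the routing rule are illustrative assumptions.

CHEAP, EXPENSIVE = "claude-haiku", "claude-opus"

def route(query: str) -> str:
    hard_signals = ["analyze", "compare", "multi-step", "write code"]
    if len(query) > 500 or any(s in query.lower() for s in hard_signals):
        return EXPENSIVE
    return CHEAP

print(route("What are your opening hours?"))                  # routes to the cheap tier
print(route("Analyze these contracts and compare clauses"))   # routes to the expensive tier
```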

Caching: Cache frequent queries. Anthropic offers prompt caching — if the same system prompt is reused, you pay a fraction of the cost.

Shorter prompts: Every token in your system prompt is charged on every request. Optimize for brevity.

Batch processing: For non-real-time tasks, use batch APIs at 50% discount.

Evaluation — How to Know If Your AI Is Good

Accuracy / Correctness

Are the answers factually correct? Measure with human review + automated checks against known answers.

Hallucination Rate

How often does it make things up? Use "faithfulness" metrics — does the answer only contain claims supported by the retrieved context?

Latency

Time-to-first-token and total response time. Users expect <2s for first token in chat UIs.

Cost per Query

Track input tokens, output tokens, embedding calls, vector DB queries per request. Set budgets.

Team Structure for AI

| Role | What They Do | Typical Background |
|---|---|---|
| AI/ML Engineer | Builds pipelines, integrates LLMs, manages RAG, fine-tuning | Software engineer + ML experience |
| Prompt Engineer | Designs and optimizes system prompts, evaluates output quality | Domain expert + writing skill (often non-technical) |
| Data Engineer | Prepares, cleans, and pipelines data for RAG / training | Data engineering, ETL, data quality |
| Platform/Infra | Manages GPUs, vector DBs, model serving, observability | DevOps / SRE with ML infra experience |

Salesforce AI Ecosystem — Conference Context

Key terms you'll hear at the conference: Agentforce, Einstein GPT, Data Cloud, Prompt Builder, Model Builder, Trust Layer, Einstein Copilot.

Salesforce AI Architecture

Agentforce — Pre-built + custom AI agents for sales, service, marketing
Einstein Copilot — Conversational AI assistant inside Salesforce UI
Einstein GPT + Prompt Builder — Customizable AI generation in flows
Trust Layer — Prompt defense, toxicity filtering, PII masking, audit logging
Data Cloud — Unified customer data, real-time, feeds into AI as context
Foundation Models — Salesforce's own + OpenAI, Anthropic, Google, Cohere via gateway

What Salesforce "Agentforce" Actually Is

Agentforce = a platform for building and deploying AI agents inside Salesforce. Think of it as:

LLM (multi-model — they route to different providers)
+ Salesforce data (CRM, Data Cloud, Knowledge Base)
+ Tools (Salesforce actions: create case, update opportunity, send email)
+ Guardrails (Trust Layer + your business rules)
+ Deployment (embed in Service Cloud, Sales Cloud, web, Slack, etc.)

You can build agents using low-code tools (Agent Builder) or code (Apex, LWC). Salesforce provides pre-built agent templates for: Service Agent, Sales Coach, Merchant Agent, Buyer Agent, Campaign Agent.

What Questions to Ask at the Conference

• "How does Agentforce handle multi-step tool failures and retries?"

• "What's the latency overhead of the Trust Layer on each LLM call?"

• "Can I bring my own model (BYOM) and still use the orchestration layer?"

• "How does Data Cloud grounding work — is it RAG under the hood? What embedding model?"

• "What's the pricing model — per agent, per conversation, per action?"

• "How do I evaluate agent quality? Is there built-in testing/evaluation tooling?"

• "What observability do I get — can I see every reasoning step, tool call, and retrieval?"

Risks, Limitations & What Can Go Wrong

Hallucination

LLMs generate plausible-sounding false information. Mitigation: RAG, citations, confidence scoring, human-in-the-loop for high-stakes decisions.

Prompt Injection

Malicious users craft inputs that override system instructions. "Ignore previous instructions and..." Mitigation: input sanitization, separate system/user contexts, output validation.

Data Leakage

Model accidentally reveals training data or other users' data. Mitigation: data isolation, output filtering, PII redaction, tenant separation.

Bias

Models inherit biases from training data. Can discriminate in hiring, lending, customer service. Mitigation: bias testing, diverse training data, human oversight.

Cost Overruns

Token costs can spike unexpectedly. One rogue loop in an agent can burn through budget. Mitigation: per-request budgets, circuit breakers, cost monitoring.

Vendor Lock-in

Building on one provider's API creates dependency. Mitigation: abstract the model layer, use frameworks that support model switching.

CTO Strategy — Where to Start

The Pragmatic Adoption Ladder

  1. Internal productivity (low risk, high value): Deploy an LLM chatbot connected to your internal knowledge base. Let employees ask questions about policies, docs, procedures. This is the quickest win with the least risk.
  2. Customer-facing copilot (medium risk): AI that helps customers with common questions, grounded in your help center. Always with a "talk to human" escape hatch.
  3. Process automation agents (higher risk): Agents that take actions — update records, send emails, process refunds. Requires guardrails, approval workflows, thorough testing.
  4. Autonomous agents (highest complexity): Multi-step agents that handle end-to-end workflows with minimal human oversight. Only after you have robust evaluation, monitoring, and rollback capabilities.

The #1 mistake CTOs make: Starting with a model choice instead of starting with the problem. Pick a specific, measurable business problem first. Then figure out which AI approach solves it. Often you don't need the most expensive model — or any model at all.

Quick Reference: The AI Glossary

| Term | Plain English |
|---|---|
| Token | A chunk of text (~¾ of a word). The unit LLMs process and bill by. |
| Embedding | Converting text to a list of numbers that capture meaning. Similar texts → similar numbers. |
| Vector Database | A database optimized for storing and searching embeddings by similarity. |
| RAG | Retrieval-Augmented Generation. Look up relevant docs, then feed them to the LLM. |
| Fine-tuning | Additional training on specific data to specialize a model. Expensive, usually unnecessary. |
| Prompt Engineering | Crafting the instructions (system prompt) to get the best output from an LLM. |
| Temperature | Controls randomness. 0 = deterministic, 1 = creative. Use low for facts, higher for brainstorming. |
| Context Window | Max text the model can process at once. Bigger = more info per request, but more expensive. |
| RLHF | Reinforcement Learning from Human Feedback. How models learn to be helpful vs. just technically correct. |
| Inference | Running a trained model to get a prediction/response. The thing you pay for in production. |
| Hallucination | When the model confidently outputs false information. |
| Guardrails | Rules that constrain AI behavior — what it can/can't do, say, or access. |
| MCP | Model Context Protocol. An open standard (by Anthropic) for connecting LLMs to external tools and data sources. |
| Function Calling | The LLM outputs structured data to invoke external tools/APIs. The mechanism behind agents. |
| Agentic | AI that can plan, act, observe, and loop — not just respond to a single prompt. |