code_your_own_AI
  • 705 videos
  • 2,094,638 views
Q* explained: Complex Multi-Step AI Reasoning
NEW Q* explained: Complex Multi-Step AI Reasoning, for experts only (integrating graph theory and Q-learning from reinforcement learning into LLMs and VLMs).
My video provides an in-depth analysis of Q-Star, a novel approach that amalgamates Q-Learning and A-Star algorithms to address the challenges faced by large language models (LLMs) in multi-step reasoning tasks. This approach is predicated on conceptualizing the reasoning process as a Markov Decision Process (MDP), where states represent sequential reasoning steps and actions correspond to subsequent logical conclusions. Q-Star employs a sophisticated Q-value model to guide decision-making, estimating future rewards and optimizing polic...
3,844 views
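A minimal sketch of the core loop described above, assuming three hypothetical callables: `propose` (an LLM that suggests next reasoning steps), `q_value` (a trained Q-value model), and `step_reward` (a process-reward signal). Partial reasoning chains are the MDP states, and the frontier is expanded best-first by f = g + h, exactly as A* combines path cost with a heuristic:

import heapq
from itertools import count

def q_star_search(question, propose, q_value, step_reward, max_depth=8):
    """Best-first search over reasoning chains in the spirit of Q*.
    f(state) = g (accumulated reward of the steps taken so far)
             + h (learned Q-value estimate of the best future reward).
    All three callables are hypothetical stand-ins, not a published API."""
    tie = count()  # tiebreaker so the heap never compares step tuples
    frontier = [(-q_value(question, ()), next(tie), 0.0, ())]
    while frontier:
        neg_f, _, g, steps = heapq.heappop(frontier)
        if steps and steps[-1].startswith("ANSWER:"):
            return list(steps)                    # terminal reasoning chain
        if len(steps) >= max_depth:
            continue                              # depth budget exhausted
        for step in propose(question, steps):     # candidate next steps
            new_steps = steps + (step,)
            new_g = g + step_reward(question, new_steps)
            h = q_value(question, new_steps)      # estimated future reward
            heapq.heappush(frontier, (-(new_g + h), next(tie), new_g, new_steps))
    return None                                   # no answer within budget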

Videos

5 Easy Ways to help LLMs to Reason
2.8K views · 1 day ago
5 Effective Strategies to Enhance LLM Reasoning: if your LLM (either an open-source Llama 3 or a proprietary GPT-4 omni) fails at reasoning on your task, I introduce 5 easy methods to help LLMs improve their reasoning capability significantly. Boost LLM reasoning: 5 methods w/o fine-tuning LLMs. From Chain-of-Thoughts to Tree-of-Thoughts, Graph-of-Thoughts, Abstraction-of-Thoughts, to my own ...
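
A minimal sketch of the cheapest of these boosts, zero-shot Chain-of-Thought combined with self-consistency voting over several sampled chains; `llm(prompt, temperature)` is a hypothetical completion function:

from collections import Counter

def self_consistent_cot(question, llm, n_samples=5):
    # Zero-shot CoT: ask for step-by-step reasoning plus a marked answer.
    prompt = (f"{question}\n\nLet's think step by step, "
              "then give the final answer after 'ANSWER:'.")
    answers = []
    for _ in range(n_samples):                    # sample diverse chains
        completion = llm(prompt, temperature=0.7)
        if "ANSWER:" in completion:
            answers.append(completion.split("ANSWER:")[-1].strip())
    # Self-consistency: majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0] if answers else None
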
Claude 3.5 SONNET hallucinates w/ a Logic Bug?
1.4K views · 1 day ago
CLAUDE 3.5 SONNET, tested for causal reasoning and logic, reveals a massive problem. CLAUDE 3.5 SONNET simply generates incorrect facts (hallucinates) and operates with a logic bug called "affirming the consequent", where logical chains are incorrectly inverted and proclaimed true. This CLAUDE 3.5 SONNET logic bug has consequences for my tasks in writing code, mathematics, logic, finance, medical, g...
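
For reference, affirming the consequent is the invalid inference "P implies Q; Q is true; therefore P" (if it rains, the street is wet; the street is wet; therefore it rained). A brute-force truth table makes the invalidity concrete:

from itertools import product

# Search for a row where both premises hold but the conclusion fails.
for p, q in product([True, False], repeat=2):
    premise_1 = (not p) or q          # P -> Q (material implication)
    premise_2 = q                     # Q is true
    if premise_1 and premise_2 and not p:
        print(f"Counterexample: P={p}, Q={q} (premises true, conclusion false)")
# Prints: Counterexample: P=False, Q=True (premises true, conclusion false)
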
Decoding AI's Blind Spots: Solving Causal Reasoning
1.9K views · 1 day ago
How to fix AI's causal reasoning failures, as evident in my last video on "Financial AI Brilliance: 7 Children at Stanford?" ruclips.net/video/YBdTd09OuYk/видео.html Great response from the AI community on my prompt testing causal reasoning, where all LLMs failed. Here now is my response. #airesearch #aieducation #failure
NEW Multi-Modal AI by APPLE
1.8K views · 1 day ago
Apple published new Machine Learning (ML) models on its GitHub repo: 4M-21. Massively Multimodal Masked Modelling. All rights w/ authors: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities arxiv.org/pdf/2406.09406 Video from Apple and Lausanne: storage.googleapis.com/four_m_site/videos/4M-21_Website_Video.mp4 #appleai #apple #multimodalai
Financial AI Brilliance: 7 Children at Stanford? 😆
1.1K views · 1 day ago
A novel benchmark of financial reasoning excellence for the best Large Language Models (LLMs) on this planet. Surprising results. My simple test: "Stanford provides financial aid to families with low income. They pay 90% of their official fees. If a poor family with 6 children will send all children to Stanford, at what time will they have enough money, received from Stanford, to send the...
Text-to-GRAPH w/ LGGM: Generative Graph Models
3.9K views · 2 days ago
New research explores the possibilities of large generative graph models (LGGM). Two universities, Adobe, and Intel explore diffusion-based tech and commercial applications of the new text2Graph functionality (from biomed to cybersecurity). All rights with authors: Large Graph Generative Models arxiv.org/pdf/2406.05109 #airesearch #graph #newtechnology
NEW TextGrad by Stanford: Better than DSPy
8K views · 14 days ago
In this TEXTGRAD framework, each AI system is transformed into a computation graph, where variables are inputs and outputs of complex (not necessarily differentiable) function calls. The feedback to the variables (dubbed ‘textual gradients’) are provided in the form of informative and interpretable natural language criticism to the variables; describing how a variable should be changed to impro...
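
A minimal sketch of that feedback loop (the idea only, not the TextGrad library's actual API; `llm` is a hypothetical completion function): a critic call produces the natural-language 'textual gradient', and an optimizer call applies it to the variable.

def textual_gradient_step(variable, objective, llm):
    # Backward pass: criticize the current value of the variable.
    gradient = llm(
        f"Evaluate this text against the objective.\n"
        f"Objective: {objective}\nText: {variable}\n"
        "Describe concretely how the text should change to improve."
    )
    # Optimizer step: rewrite the variable according to the feedback.
    return llm(
        f"Rewrite the text below, applying this feedback.\n"
        f"Feedback: {gradient}\nText: {variable}\n"
        "Return only the improved text."
    )

# Usage sketch: a few 'epochs' over a system prompt.
# for _ in range(3):
#     system_prompt = textual_gradient_step(system_prompt, objective, llm)
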
Adversarial Questions Test Multimodal MED AI sys
1.3K views · 14 days ago
My cranial MRI data were recorded, and the question is: which medical Large Multimodal Model (LMM) to use for AI medical analysis of X-ray, CT, or MRI? Medical Visual Question Answering (VQA): AI models analyse medical images and scans. What is the state of the technology for medical AI? Next-Gen Healthcare: ProbMed adversarial pairs. All rights with authors of: Worse than Random? An Embarrassingly Si...
BEST RAG you can buy: LAW AI (Stanford)
4.5K views · 14 days ago
The best commercially available legal research / Law AI RAG systems, evaluated by Stanford in new research. All rights with the authors only: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools arxiv.org/pdf/2405.20362 hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive #airesearch #law #ai
RAG explained step-by-step up to GROKKED RAG sys
5K views · 14 days ago
Today I try to answer all questions from my subscribers about my last three videos, with a focus on the new Grokked LLM integration into traditional RAG systems. I'll cover a wide array of questions, including the ARM graph-based re-ranker for optimal RAG systems and the new "Buffer of Thoughts" (BoT) reasoning method for LLMs (so we have Chain-of-Thoughts, Tree-of-Thoughts, Graph-of-Thoughts and now: Buffer-of-Thoug...
GROKKED LLM beats RAG Reasoning (Part 3)
7K views · 21 days ago
We open the black box of GROKKED LLMs and analyze each layer of the transformer architecture for its performance in causal reasoning after the grokking phase transition of our LLM. Current research in AI clearly indicates that established LLMs, like Gemini Pro 1.5 or GPT-4 Turbo, fail at deep reasoning, even when integrated into complex RAG systems. A grokking phase transition is essential for LL...
LLM - Reasoning SOLVED (new research)
15K views · 21 days ago
Grokking transformers: a technique for infusing transformers with near-perfect causal reasoning abilities. (Note: grokking has nothing to do with Musk's AI Grok or Groq Inc. for fast inference.) Grokking achieves this by enabling transformers to identify hierarchical structures within human sentences. Through extended training, the internal structure of the transformer undergoes a fundamen...
New Discovery: LLMs have a Performance Phase
14K views · 21 days ago
Grokking is a new phase in the performance of LLMs. Starting with arithmetic operations, we analyze the patterns in the embedding space of transformers. Grokking refers to a phenomenon where, after extensive training beyond typical saturation points, transformers can generalize effectively to unseen data, achieving high performance long after initial overfitting occurs. This discovery challenges...
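
A toy reproduction of the effect in the style of the grokking experiments on modular addition (the architecture and hyperparameters here are illustrative, not the paper's): train far past the point where training accuracy saturates and watch validation accuracy jump much later.

import torch
import torch.nn as nn

p = 97  # learn (a + b) mod p from a random 50% split of all pairs
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train, val = perm[: len(pairs) // 2], perm[len(pairs) // 2:]

model = nn.Sequential(
    nn.Embedding(p, 128),   # shared embedding for both operands
    nn.Flatten(),           # concatenate the two operand embeddings
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),      # predict the sum modulo p
)
# Strong weight decay is a key ingredient in grokking experiments.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # keep training long after overfitting
    opt.zero_grad()
    loss = loss_fn(model(pairs[train]), labels[train])
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            val_acc = (model(pairs[val]).argmax(-1) == labels[val]).float().mean()
        # Expected pattern: train accuracy saturates early, validation
        # accuracy stays near chance, then jumps much later (grokking).
        print(f"step {step}: train loss {loss.item():.4f}, val acc {float(val_acc):.3f}")
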
One Thought on the Future of AI Agents World Model
2.1K views · 28 days ago
A new Vision-Language-Action-Video-Mathematics model? One simple thought on the future of AI agents, regarding their future ability to include financial or physical simulations in code. Function calling, tool use, reasoning in LLMs, planning AI, science AI. A Vision-Language-Action-Video-Mathematics model as the new WORLD MODEL for AI? #airesearch #ainews
NEW ChatGPT EDU for AI Universities: Unique SALE
2.2K views · 1 month ago
Agentic AI: The Future is here?
8K views · 1 month ago
LongRoPE & Theta Scaling to 1 Mio Token (2/2)
1K views · 1 month ago
RoPE Rotary Position Embedding to 100K context length
2.2K views · 1 month ago
Many-Shot VISUAL ICL is amazing! (Stanford)
2K views · 1 month ago
In-Context Learning: EXTREME vs Fine-Tuning, RAG
3.6K views · 1 month ago
Warning GPT-4o: DON'T translate to Chinese (MIT)
1.7K views · 1 month ago
CODE Fine-Tune Vision Language VLM eg PaliGemma-3B
2K views · 1 month ago
GPT-4 Turbo vs GPT-4o in Reasoning TEST #gpt4o
1.8K views · 1 month ago
New Trick for Fine-Tuning LLMs #airesearch
2.6K views · 1 month ago
From Dating Apps to AI: Gen Z Edition 😆
661 views · 1 month ago
Do not use Llama-3 70B for these tasks ...
3K views · 1 month ago
New xLSTM explained: Better than Transformer LLMs?
5K views · 1 month ago
GPT-4o in stealth as im-a-good-gpt2-chatbot
2.6K views · 1 month ago
Understand DSPy: Programming AI Pipelines
3.5K views · 1 month ago

Comments

  • @josea8187
    @josea8187 · 1 day ago

    Hello, thanks a lot for the awesome explanation! SBERTs don't receive the attention they need. One question though: where do cross-encoders come into play here then? Is it a sentence embedding model with an output layer that produces a score between 0 and 1?
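
    For context, a minimal sketch of where cross-encoders fit (the checkpoint name is one public example): unlike a bi-encoder (SBERT), which embeds each sentence separately, a cross-encoder feeds the concatenated pair through one transformer with a small output head that yields a single relevance score, and it is typically used to re-rank the top-k candidates a bi-encoder retrieved. The ms-marco checkpoints emit raw logits, so apply a sigmoid if you want a 0..1 score.

    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    query = "how do cross-encoders work?"
    pairs = [(query, "A cross-encoder scores a sentence pair jointly."),
             (query, "The weather is nice today.")]
    scores = model.predict(pairs)  # raw logits; sigmoid -> 0..1 if needed
    print(scores)                  # the relevant passage scores higher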

  • @thesimplicitylifestyle
    @thesimplicitylifestyle · 1 day ago

    Yay! 😎🤖

  • @btscheung
    @btscheung · 1 day ago

    Your presentation in this video is definitely A+ in terms of clarity and depth of understanding! Well done. Also, I am happy to see a real paper and study on the speculative Q* heuristic search algorithm. Although their results seem not to justify the effort and added complexity, we are only looking at well-known math problems on which those LLMs might have been heavily pre-trained and focused. If we change the angle so that the algorithm is applied in a general solution search space with greater complexity, Q* is the way to go!

  • @tablen2896
    @tablen2896 · 1 day ago

    Small tip: a black border on white font makes text easier to read and less tiring to watch.

  • @drdca8263
    @drdca8263 · 1 day ago

    27:58: you say "estimated utility of reaching the correct answer". Does this mean "an estimate of what the utility would be if the correct answer is obtained" (which sounds to me like the plainest interpretation, but also the least likely, as I would think the utility for that would be arbitrary), or "the expected value of the random variable which gives utility based just on whether the final answer is correct", or "the expected value of the random variable, utility, which is determined both by whether the final answer is correct and by other things, such as length of answer", or something else?

  • @gregsLyrics
    @gregsLyrics · 1 day ago

    A firehose to my brain. Amazing! This indicates a fairly long path of steps I need to learn so I can properly digest this beautiful wisdom. Really amazing channel, filled with advanced knowledge of the gods.

  • @drdca8263
    @drdca8263 · 1 day ago

    I thought Q* was supposed to be a project by Google or OpenAI (I forget which, but I thought it was supposed to be one of them). The authors listed in the paper are indicated as being affiliated with either "Skywork AI" or "Nanyang Technological University". Is this a model inspired by the rumors of there being a model with the name "Q*", or is this the model the rumors were about? Were some of these people previously at OpenAI or Google, but not anymore? Or..?

  • @fontende
    @fontende · 1 day ago

    Great theory, but only theory. In science, if a method cannot be reproduced independently... it's not a discovery. Similar to patents: this is kind of published, but there's no practical programming code to prove it, and most importantly, I haven't seen this method implemented in any model I've seen; even the new Gemma from Google uses many new tricks but doesn't even mention this.

  • @theoptimisticnihilistyt
    @theoptimisticnihilistyt · 1 day ago

    wow

  • @scitechtalktv9742
    @scitechtalktv9742 · 2 days ago

    Interesting explanation! You mentioned there is code to try it yourself, but I cannot find that. Can you point me to it?

  • @smicha15
    @smicha15 · 2 days ago

    246th view. Nailed it!

  • @GodbornNoven
    @GodbornNoven · 2 days ago

    Amazing video as always

  • @syedibrahimkhalil786
    @syedibrahimkhalil786 · 2 days ago

    Fourth then 😂

  • @SirajFlorida
    @SirajFlorida · 2 days ago

    LoL. Third, I guess. Well, Yacinezahidi was 0th, @user-uz1ol2gs6y is 1st, and I'm 2nd.

  • @user-uz1ol2gs6y
    @user-uz1ol2gs6y · 2 days ago

    Second

  • @yacinezahidi7206
    @yacinezahidi7206 · 2 days ago

    First viewer here 🗡️

  • @jmirodg7094
    @jmirodg7094 · 2 days ago

    Excellent! We need more like this pls.

  • @godgivespizza238
    @godgivespizza238 · 3 days ago

    Correct me if I am wrong: are we focusing too much on perfecting linear-algebra-based models rather than abstract and universal algebra concepts? In graph theory related to abstract algebra, the data representations were triangles building higher-dimensional structures from lower-dimensional nodes, from nodes to complex graphs and their evolution. There are better concepts in mathematics for solving mathematical-cognition-based reasoning that deep learning experts may not be exploring. So what is happening? I mean, I know it took a huge amount of resources, but so did transformers in the beginning.

    • @code4AI
      @code4AI · 3 days ago

      Imagine the psychological shock if big tech announced that their beloved AI systems, in which they invested billions of dollars, fail at simple logic tests before they can actually materialize a return on their investment. Because until now, only the short-term financial valuation of these companies went up, based on what they promised for future AI. Imagine what Microsoft would be worth without AI, without Copilot Plus, ..... based only on the value of Win11? Smile.

  • @attashemk8985
    @attashemk8985 · 3 days ago

    Hermes-Pro-Llama-3 with my custom sampler (kind of a beam search with restarts) solves your prompt: Generated completion: Let us assume that the official fees for sending one child to Stanford is $X. Now, since Stanford pays 90% of these fees for families with low income, it would mean that the parents need to cover the remaining 10% of the fees. So, they would have to pay $0.1 * X for each child. < .... a useless calculation ... > Dividing both sides by 6, we find T = 0.1 / 6 ≈ 0.0167 years or about 5 days. However, since the time it takes to accumulate the required funds is measured in years, they won't have enough money for the 7th child with no additional income. They would need some savings or outside financial help. ---- After all, the model made the right conclusion.

    • @code4AI
      @code4AI · 3 days ago

      Now we enter the arena of adding extra prompt instructions to my innocent little prompt. An A* approach would be more elegant than beam search.

  • @ragnarherron7742
    @ragnarherron7742 3 дня назад

    You asked the wrong expert. Lean4 is a programming language for math theorem proving. Your data-cleaning gateway must pass through this transform to assure that the data and its use are unambiguous.

  • @TiagoTiagoT
    @TiagoTiagoT · 3 days ago

    I know just enough to be dangerous, as the saying goes, but here's an idea I had a while ago: how about something that, instead of building on sequential data like LLMs, or operating all at once on a predefined grid like diffusion-based image generators, is based on the Wave Function Collapse algorithm (the game map generation one, not quantum mechanics), but applied to a free-form graph structure, with both nodes and edges being tokens? The graph is initialized with the prompt (and context) with whatever dangling edges it may imply, and as the AI thinks, the probabilities of adding, removing, and replacing tokens are adjusted (keeping the prompt and context frozen, obviously), until a cluster with a good enough score (including self-consistency and compatibility with the prompt cluster's edges) is found, and that gets interpreted as the output in whatever modality the graph it forms represents. Something like this could allow the AI to intuit (is that a word?) an end goal that starts disconnected (the void could be considered its own node token with edges that are compatible with all tokens, and always has dangling edges, but it would be special in that it would not be counted as belonging to any cluster), and then gradually deduce a way to connect it to the context+prompt cluster, and be able to think in a more flexible way, with parallel hypotheses, inherent self-consistency checking mechanisms, etc., right?

  • @anonymousaustralianhistory2081
    @anonymousaustralianhistory2081 · 4 days ago

    Hybrid-Graph-Abstraction-of-Thought, that's cool. Do you find it works better?

  • @MBR7833
    @MBR7833 · 4 days ago

    Thank you so much for your content! So for the 7th child thing, I think I understand why LeChat worked but not Claude / ChatGPT: LeChat likely did not have the "reasoning training" (or the meta prompt with all the examples) that the more recent models have, and therefore was not "tricked". If you have not come across this article / team, I would love to understand it more: "Transformers meet Neural Algorithmic Reasoners" by the team led by Petar Veličković at DeepMind, which is likely one of the most interesting teams, as they do research in topology (group theory etc.)

  • @Khari99
    @Khari99 · 4 days ago

    I believe hallucinations are not going away because of how LLMs are designed. Since they are only predicting the next token, their probabilistic outputs will be incorrect when the degree of confidence they have in their reasoning is not high. They'd have to get reasoning to 100%. Even if this is possible with grokking for some tasks, it will be a while until general reasoning is solved across all tasks. Even with 99.5% accuracy, that remaining 0.5% will be the hallucination. LLMs lack the capability to say "I don't know" because they don't have an understanding of their own internal state representation. They can only derive the most probable output from their internal states.
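
    A minimal sketch of that last point (model and prompt are illustrative): a causal LM only emits a probability distribution over next tokens, and the only built-in notion of 'confidence' is how peaked that distribution is.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        probs = lm(ids).logits[0, -1].softmax(-1)  # next-token distribution
    top_p, top_id = probs.max(dim=-1)
    # A low max-probability means the model is effectively guessing,
    # yet it will still emit a token rather than say "I don't know".
    print(tok.decode([int(top_id)]), float(top_p))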

  • @Dan.OutSee
    @Dan.OutSee · 4 days ago

    How's the brain damage working out for you? Hope you feel better soon.

  • @Dan.OutSee
    @Dan.OutSee · 4 days ago

    Why did I get notified about this just now? I wanted to be first :D

  • @alextitov6425
    @alextitov6425 · 5 days ago

    I was able to get a relevant answer only with the following instructions with Sonnet 3.5:
    PROMPT question: "Stanford provides financial aid to families with low income. They pay 90% of their official fees. If a poor family with 6 children will send all children to Stanford, at what time will they have enough money, received from Stanford, to send the 7th child to Stanford, if they have no money at all??"
    Please provide the most logical and straightforward answer on the question based on the information given, even if the prompt contains inconsistencies or unrealistic assumptions.
    1. Focusing on giving a direct answer rather than just analysing the prompt.
    2. Using the information provided, even if it seems unrealistic.
    3. Reaching a logical conclusion based on the given scenario.
    Iterate over your own results to give the most relevant answer.
    ANSWER from Sonnet 3.5 (temp 0.3): Based on the information provided in the question, I'll provide the most logical and straightforward answer possible: The question contains an inherent contradiction that makes it impossible to provide a definitive answer. Here's why:
    1. Stanford provides 90% financial aid to low-income families.
    2. The family in question has no money at all.
    3. The family needs to pay 10% of the fees for each child (since Stanford covers 90%).
    4. The question asks when they'll have enough money from Stanford to send a 7th child.
    Given these conditions, the family will never have enough money to send the 7th child to Stanford. Here's the reasoning:
    1. With no initial money, they can't afford even the 10% for the first child.
    2. Stanford doesn't provide money to families; it reduces the amount they need to pay.
    3. Even if all 6 children somehow attended Stanford, the family wouldn't receive any money from Stanford to save for a 7th child.
    Therefore, the most logical answer is: The family will never have enough money received from Stanford to send the 7th child to Stanford. The premise of the question is flawed, as Stanford's financial aid model doesn't provide money to families but rather reduces the amount they need to pay.

    • @code4AI
      @code4AI · 5 days ago

      So in your tests, CLAUDE 3.5 SONNET was also not able to solve my prompt. Thank you for confirming this fact. And as I mentioned, there are multiple ways to additionally guide or help CLAUDE 3.5 SONNET to solve this prompt (without altering my prompt - thank you for confirming this, since a lot of people claim that this prompt is nonsense), just by giving it additional instructions on how to find a solution (additional prompt engineering with solution pathways). But if we as humans always have to give our LLMs additional solution paths, since AI alone is not able to solve it, ... well. That was the point of my video.

    • @alextitov6425
      @alextitov6425 · 5 days ago

      @@code4AI At least we can see that the LLM is smart enough to answer the question without modifying the initial question structure or context. We can consider abstract instructions as a unified way to show the right direction to the LLM and reuse the same instructions for any "illogical" question. It is actually annoying that LLMs don't see human intentions.

  • @LazarusStirs
    @LazarusStirs · 5 days ago

    Honestly, I have not had this problem at all. In fact, of the 15 logical tests I gave to Claude, it passed 12 of them, while GPT only got 5.

    • @code4AI
      @code4AI · 5 days ago

      So you were not able to replicate the logical bug on CLAUDE 3.5 SONNET that I presented in my video? Or do you want to say you discovered 3 more on SONNET?

  • @thebluriam
    @thebluriam · 5 days ago

    I think if you wanted to experiment with this technique properly, you would either need to use the API, or maybe use the API playground. Using the ChatGPT UI is going to come pre-poisoned by the embedded system prompt that OpenAI forces on it.

  • @JohnLewis-old
    @JohnLewis-old · 5 days ago

    How long does the logical train need to be to confuse Sonnet?

    • @code4AI
      @code4AI · 5 days ago

      I always ask: how complex can my research become before the AI starts to hallucinate? Or: how complex can my financial analysis become before the AI makes pure reasoning mistakes on some simple terms and fails to conclude reasonable investment strategies?

    • @JohnLewis-old
      @JohnLewis-old · 5 days ago

      @@code4AI So how long did it take? One logical association, 10?

    • @code4AI
      @code4AI · 5 days ago

      If I could have access to the proprietary CLAUDE 3.5 SONNET and open the hood of the black box, I could give you a theoretical answer about a probability. A hallucination could occur in the first step, in the middle, or at the end.

    • @JohnLewis-old
      @JohnLewis-old · 5 days ago

      @@code4AI Couldn't you just shorten the prompt until it got it right?

    • @code4AI
      @code4AI · 4 days ago

      To be statistically relevant, I assume I would have to pay for 1000 tests on its API.

  • @MrAloha
    @MrAloha · 6 days ago

    Aloha kakahiaka, from Las Vegas, NV. I love you, bro! Thank you!

  • @eferra8343
    @eferra8343 · 6 days ago

    How to RAG with Jamba?

  • @christiand6312
    @christiand6312 · 6 days ago

    Can we have a discord please?

    • @christiand6312
      @christiand6312 · 6 days ago

      Also, can I become good at JAX when aiming for parallelism improvements to reduce compute costs in training an LLM? Can you please explain how we can use an 8xH100 for 6-12 hours and get a 70B model trained on some legal corpus data for a niche case? Is this actually possible? Would love to know.

  • @tomw4688
    @tomw4688 · 6 days ago

    Great catch! Thanks for reviewing this.