code_your_own_AI
  • 706 videos
  • 2,102,672 views

Videos

Masterclass on AI by Microsoft
1.1K views • 2 days ago
What a brilliant insight: a masterclass by Microsoft in how to use a (security-risk) communication to your customers to cross-sell and up-sell new products. My video delves into communication from Microsoft concerning a new AI vulnerability known as the "Skeleton Key," which poses a significant security risk across various AI platforms, not limited to Microsoft's own. It is noted that this vulnerabilit...
Q* explained: Complex Multi-Step AI Reasoning
5K views • 4 days ago
NEW Q* explained: Complex Multi-Step AI Reasoning for Experts only (integrating graph theory and Q-learning from reinforcement learning of LLMs and VLMs). My video provides an in-depth analysis of Q-Star, a novel approach that amalgamates Q-Learning and A-Star algorithms to address the challenges faced by large language models (LLMs) in multi-step reasoning tasks. This approach is predicated on...
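For intuition, here is a minimal, self-contained sketch of the Q*-style pattern: A*-style best-first search over multi-step reasoning states, with the heuristic h standing in for the learned Q-value that scores how promising a partial chain is. The toy state space (reach 24 from 1 via +1/*2), the operations, and the distance heuristic are all invented for illustration and are not the paper's actual setup.

    import heapq

    TARGET = 24
    OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}

    def q_estimate(value):
        # Stand-in for a learned Q-value model scoring a partial reasoning state.
        return abs(TARGET - value)

    def q_star_search(start=1, max_depth=10):
        # Frontier entries: (f = g + h, g = steps taken, state, chain of ops).
        frontier = [(q_estimate(start), 0, start, [])]
        seen = set()
        while frontier:
            f, g, value, chain = heapq.heappop(frontier)
            if value == TARGET:
                return chain
            if value in seen or g >= max_depth:
                continue
            seen.add(value)
            for name, op in OPS.items():
                nxt = op(value)
                heapq.heappush(frontier, (g + 1 + q_estimate(nxt), g + 1, nxt, chain + [name]))
        return None

    print(q_star_search())  # e.g. ['*2', '+1', '*2', '*2', '*2']: 1->2->3->6->12->24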
5 Easy Ways to help LLMs to Reason
3.2K views • 6 days ago
5 Effective Strategies to Enhance LLM Reasoning: If your LLM (either an open-source Llama 3 or a proprietary GPT-4omni) fails at reasoning on your task, I introduce 5 easy methods to help LLMs improve their reasoning capability significantly. Boost LLM reasoning: 5 methods w/o fine-tuning LLMs. From Chain-of-Thoughts to Tree-of-Thoughts, Graph-of-Thoughts, Abstraction-of-Thoughts to my own ...
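One of the simplest of these prompt-level methods can be sketched in a few lines: chain-of-thought prompting combined with self-consistency voting over several sampled reasoning chains. `call_llm` is a hypothetical stand-in for whatever client you use (OpenAI, llama.cpp, Ollama, ...); the stub below only exists to make the sketch runnable.

    import re
    from collections import Counter

    def call_llm(prompt: str, temperature: float = 0.8) -> str:
        # Hypothetical stand-in: replace with a real LLM call.
        return "6 times 7 is 42. Final answer: 42"

    COT_SUFFIX = "\nLet's think step by step. End with 'Final answer: <answer>'."

    def self_consistent_answer(question: str, samples: int = 5) -> str:
        # Sample several chains of thought, then majority-vote the final answers.
        answers = []
        for _ in range(samples):
            completion = call_llm(question + COT_SUFFIX)
            match = re.search(r"Final answer:\s*(.+)", completion)
            if match:
                answers.append(match.group(1).strip())
        return Counter(answers).most_common(1)[0][0] if answers else ""

    print(self_consistent_answer("What is 6 * 7?"))  # -> "42"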
Claude 3.5 SONNET hallucinates w/ a Logic Bug?
1.3K views • 8 days ago
CLAUDE 3.5 SONNET, tested for causal reasoning and logic, reveals a massive problem. CLAUDE 3.5 SONNET simply generates incorrect facts (hallucinates) and operates w/ a logic bug called "affirming the consequent", where logical chains are incorrectly inverted and proclaimed true. This CLAUDE 3.5 SONNET logic bug has consequences for my tasks in writing code, mathematics, logic, finance, medical, g...
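For reference, the fallacy in question, stated formally (the rain counterexample is added here for illustration, it is not from the video):

    % Modus ponens (valid):               P \to Q,\; P \;\vdash\; Q
    % Affirming the consequent (invalid): P \to Q,\; Q \;\nvdash\; P
    % Counterexample: "if it rains, the street is wet" plus "the street is wet"
    % does not entail "it rained" -- a street cleaner also wets the street.
    \[
      \frac{P \to Q \qquad Q}{P} \quad \text{(invalid inversion of the conditional)}
    \]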
Decoding AI's Blind Spots: Solving Causal Reasoning
1.9K views • 10 days ago
How to fix AI's causal reasoning failures, as evident in my last video on "Financial AI Brilliance: 7 Children at Stanford?" ruclips.net/video/YBdTd09OuYk/видео.html Great response from the AI community on my prompt testing causal reasoning, where all LLMs failed. Here now is my response. #airesearch #aieducation #failure
NEW Multi-Modal AI by APPLE
1.9K views • 11 days ago
Apple published new Machine Learning (ML) models on its GitHub repo: 4M-21, Massively Multimodal Masked Modelling. All rights w/ authors: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities arxiv.org/pdf/2406.09406 Video from Apple and Lausanne: storage.googleapis.com/four_m_site/videos/4M-21_Website_Video.mp4 #appleai #apple #multimodalai
Financial AI Brilliance: 7 Children at Stanford? 😆
1.2K views • 13 days ago
A novel benchmark of financial reasoning excellence for the best Large Language Models (LLMs) on this planet. Surprising results. My simple test: "Stanford provides a financial aid to families with low income. They pay 90% of their official fees. If a poor family with 6 children will send all children to Stanford, at what time will they have enough money, received from Stanford, to send the...
Text-to-GRAPH w/ LGGM: Generative Graph Models
3.9K views • 15 days ago
New research explores the possibilities of large generative graph models (LGGM). Two universities, Adobe, and Intel explore diffusion-based tech and commercial applications of the new text-to-graph functionality (from biomed to cybersecurity). All rights w/ authors: Large Graph Generative Models arxiv.org/pdf/2406.05109 #airesearch #graph #newtechnology
NEW TextGrad by Stanford: Better than DSPy
8K views • 17 days ago
In this TEXTGRAD framework, each AI system is transformed into a computation graph, where variables are inputs and outputs of complex (not necessarily differentiable) function calls. The feedback to the variables (dubbed 'textual gradients') is provided in the form of informative and interpretable natural-language criticism of the variables, describing how a variable should be changed to impro...
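The loop can be sketched in a few lines. Note this is a paraphrase of the idea, not the actual textgrad library API; `call_llm` is a hypothetical stand-in for any LLM client, and the stub only makes the sketch runnable.

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in: replace with a real LLM call.
        return "stub completion"

    def textual_gradient(variable: str, output: str, objective: str) -> str:
        # The "backward pass": natural-language criticism of the variable.
        return call_llm(
            f"Objective: {objective}\nVariable:\n{variable}\nObserved output:\n{output}\n"
            "Criticize the variable: how should it change to improve the output?"
        )

    def apply_gradient(variable: str, feedback: str) -> str:
        # The "optimizer step": rewrite the variable following the criticism.
        return call_llm(f"Rewrite this text following the feedback.\nText:\n{variable}\nFeedback:\n{feedback}")

    prompt = "Summarize the input in one sentence."
    for _ in range(3):  # a few optimization steps
        output = call_llm(prompt + "\nInput: <task input here>")
        feedback = textual_gradient(prompt, output, "the summary must name all entities")
        prompt = apply_gradient(prompt, feedback)
    print(prompt)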
Adversarial Questions Test Multimodal MED AI sys
1.3K views • 19 days ago
My cranial MRI data were recorded, and the question is: what medical Large Multimodal Model (LMM) should be used for AI medical analysis of X-ray, CT or MRI scans? Medical Visual Question Answering (VQA): AI models analyse medical images and scans. What is the state of technology for medical AI? Next-Gen Healthcare: ProbMed adversarial pairs. All rights with authors of: Worse than Random? An Embarrassingly Si...
BEST RAG you can buy: LAW AI (Stanford)
4.6K views • 21 days ago
The best commercially available legal research / law AI RAG systems, evaluated by Stanford in new research. All rights with authors only: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools arxiv.org/pdf/2405.20362 hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive #airesearch #law #ai
RAG explained step-by-step up to GROKKED RAG sys
5K views • 23 days ago
Today I try to answer all the questions from my subscribers about my last three videos, with a focus on integrating the new grokked LLM into traditional RAG systems. I'll cover a wide array of questions, incl. an ARM graph-based re-ranker for optimal RAG systems and the new "Buffer of Thoughts" (BoT) reasoning method for LLMs (so we have Chain-of-Thoughts, Tree-of-Thoughts, Graph-of-Thoughts and now: Buffer-of-Thoug...
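For readers new to the topic, the basic retrieve, re-rank, generate loop discussed here can be sketched as below. The scoring functions are toy lexical stand-ins, not the graph-based re-ranker or the grokked LLM from the videos.

    def retrieve(query, docs, k=3):
        # Toy lexical retriever: word overlap instead of dense embeddings.
        overlap = lambda a, b: len(set(a.lower().split()) & set(b.lower().split()))
        return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:k]

    def rerank(query, candidates, top_n=2):
        # Stand-in for a cross-encoder / graph re-ranker: overlap density.
        def density(d):
            words = d.lower().split()
            return len(set(query.lower().split()) & set(words)) / len(words)
        return sorted(candidates, key=density, reverse=True)[:top_n]

    def generate(query, context):
        # Stand-in for the LLM call: just assemble the grounded prompt.
        return f"Answer '{query}' using:\n" + "\n".join(context)

    docs = ["Grokking is a delayed generalization phase.",
            "RAG retrieves documents to ground the LLM.",
            "Tokenizers split text into subword units."]
    query = "What is grokking?"
    print(generate(query, rerank(query, retrieve(query, docs))))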
GROKKED LLM beats RAG Reasoning (Part 3)
7K views • 25 days ago
We open the black box of GROKKED LLMs and analyze each layer of the transformer architecture for its performance in causal reasoning, after the grokking phase transition of our LLM. Current research in AI clearly indicates that established LLMs, like Gemini Pro 1.5 or GPT-4 Turbo, fail in deep reasoning, even when integrated into complex RAG systems. A grokking phase transition is essential for LL...
LLM - Reasoning SOLVED (new research)
15K views • 27 days ago
Grokking transformers: a technique for infusing transformers with near-perfect causal reasoning abilities. (Note: Grokking has nothing to do with Musk's AI Grok or Groq Inc. for fast inference.) Grokking achieves this by enabling transformers to identify hierarchical structures within human sentences. Through extended training, the internal structure of the transformer undergoes a fundamen...
New Discovery: LLMs have a Performance Phase
14K views • 29 days ago
Grokking is a new phase in the performance of LLMs. Starting with arithmetic operations, we analyze the patterns in the embedding space of transformers. Grokking refers to a phenomenon where, after extensive training beyond typical saturation points, transformers can generalize effectively to unseen data, achieving high performance long after initial overfitting occurs. This discovery challenges...
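A minimal sketch of the kind of experiment behind this: train a tiny network on modular addition far past the point where training loss saturates, and watch validation accuracy jump much later. All hyperparameters here are illustrative (the original experiments use small transformers and far more steps); only the pattern matters.

    import random
    import torch
    import torch.nn as nn

    P = 97  # task: predict (a + b) mod P
    pairs = [(a, b) for a in range(P) for b in range(P)]
    random.seed(0)
    random.shuffle(pairs)
    split = int(0.4 * len(pairs))  # small train fraction encourages grokking

    def batch(ps):
        x = torch.tensor(ps)
        return x, (x[:, 0] + x[:, 1]) % P

    xt, yt = batch(pairs[:split])
    xv, yv = batch(pairs[split:])

    model = nn.Sequential(nn.Embedding(P, 64), nn.Flatten(),
                          nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
    # Strong weight decay is one of the ingredients reported to induce grokking.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(20001):  # keep going long after the train set is memorized
        opt.zero_grad()
        loss = loss_fn(model(xt), yt)
        loss.backward()
        opt.step()
        if step % 2000 == 0:
            with torch.no_grad():
                val_acc = (model(xv).argmax(-1) == yv).float().mean().item()
            print(f"step {step:5d}  train loss {loss.item():.3f}  val acc {val_acc:.2f}")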

Comments

  • @MagusArtStudios
    @MagusArtStudios 1 day ago

    I think a good method of grokking would be to train on data compressed by synonyms

  • @cholst1
    @cholst1 1 day ago

    "forbes article" -aka someones blogpost on their forbes "site"

  • @archiee1337
    @archiee1337 1 day ago

    great insights, thank you

  • @anshtanwar1813
    @anshtanwar1813 1 day ago

    Great series, really enjoyed it.

  • @gheatza
    @gheatza 1 day ago

    I was about to ask if this video was sponsored by Microsoft 🤦‍♂ thank you for the translation 😉, it did not occur to me, honestly 🤦‍♂

  • @mulderbm
    @mulderbm 2 days ago

    😂 Thank you

    • @mulderbm
      @mulderbm 2 days ago

      GIGO all the way. Google as a reseller of third parties, and AI as the front

  • @nthehai01
    @nthehai01 2 days ago

    Thank you for such a detailed explanation. Really enjoyed it 🚀. But is this Q* somewhat relevant to the one from OpenAI that people have been talking about 🧐?

  • @h-s7218
    @h-s7218 2 days ago

    I had this error when executing it: TypeError: Missing required arguments; Expected either ('model' and 'prompt') or ('model', 'prompt' and 'stream') arguments to be given

  • @hm2OO5
    @hm2OO5 2 days ago

    Thanks for the great content! Where can I find the link to your colab notebook (including your extra comments)?

  • @neodim1639
    @neodim1639 2 days ago

    Gemma 2 27B (Q4_K_M) got it zero shot:

    User: Stanford provides a financial aid to families with low income. They pay 90% of their official fees. If a poor family with 6 children will send all children to Stanford, at what time will they have enough money, received from Stanford, to send the 7th child to Stanford, if they have no money at all??

    Ellama: This is a bit of a trick question! Here's why:
    - Financial aid is need-based: If a family has no income, it's unlikely they would receive 90% coverage from Stanford's financial aid program.
    - The problem with the scenario: The question assumes the family has to rely solely on Stanford's financial aid for their 7th child's education. This is a common misconception - financial aid often doesn't cover 100% of costs, and it's based on financial need. A family with no income would likely find it impossible to attend Stanford without some form of outside funding or support.
    Let me know if you'd like to explore a different scenario with realistic income constraints!

    • @code4AI
      @code4AI 2 days ago

      Nice.

    • @code4AI
      @code4AI 2 days ago

      Here is GEMMA-2-9B's answer: "This is a bit of a trick question! Here's why:
      - Stanford's financial aid is based on need: They don't simply pay 90% of fees for everyone. The amount of aid a family receives depends on their income and assets.
      - The question assumes a constant cost: College tuition and fees increase over time.
      To figure out when the family could afford a 7th child, we'd need more information:
      - The family's income and assets: This determines their eligibility for financial aid.
      - The actual cost of attendance at Stanford: Tuition, fees, room, board, books, etc.
      - How much financial aid Stanford would provide for each child: This varies based on need.
      Let me know if you want to make some assumptions about these factors, and we can try to estimate a timeline!"

  • @hyungsukkim6361
    @hyungsukkim6361 2 days ago

    My Claude 3.5 Sonnet answered like this for your prompt:
    -------------------------
    Let's approach this step-by-step, using the given facts and rules to determine if Alice is wozz.
    Step 1: Fact - Alice is feg
    Step 2: Fact - Alice is tess
    Step 3: Rule - If feg and tess, then rabe. Since Alice is both feg and tess, we can conclude that Alice is rabe
    Step 4: Rule - If rabe, then thoxe. Since we concluded Alice is rabe, we can now conclude Alice is thoxe
    Step 5: Rule - If thoxe, then vepe. Since we concluded Alice is thoxe, we can now conclude Alice is vepe
    Step 6: Fact - Alice is tess (we already knew this, but we're restating it for the next step)
    Step 7: Rule - If tess and vepe, then nup. Since Alice is both tess and vepe, we can conclude that Alice is nup
    Step 8: Fact - Alice is ag
    Step 9: Fact - Alice is vide
    Step 10: Rule - If nup and ag and vide, then duve. Since Alice is nup, ag, and vide, we can conclude that Alice is duve (Note: We already knew Alice is duve from the facts, but this confirms it through the rules)
    Step 11: Rule - If fum, then wozz. However, we don't have any information about Alice being fum
    Conclusion: Based on the given facts and rules, we cannot conclude that Alice is wozz. There's no rule or chain of rules that leads us from the facts we know about Alice to the conclusion that she is wozz.
    Therefore, the answer to the query "Is Alice wozz?" is: We don't know. We cannot conclude that Alice is wozz based on the given information.

    • @code4AI
      @code4AI 2 days ago

      Beautiful, another wrong answer by Claude 3.5 SONNET, since there is a logic path to conclude whether Alice is wozz.
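
      Disputes like this are mechanical to settle with a few lines of forward chaining. The facts and rules below are illustrative stand-ins, not the exact rule set from the video's prompt (in particular, the duve-to-fum rule is invented here so that a path to wozz exists):

      facts = {"feg", "tess", "ag", "vide"}
      rules = [({"feg", "tess"}, "rabe"), ({"rabe"}, "thoxe"),
               ({"thoxe"}, "vepe"), ({"tess", "vepe"}, "nup"),
               ({"nup", "ag", "vide"}, "duve"), ({"duve"}, "fum"),
               ({"fum"}, "wozz")]

      changed = True
      while changed:  # apply rules until no new fact can be derived
          changed = False
          for premises, conclusion in rules:
              if premises <= facts and conclusion not in facts:
                  facts.add(conclusion)
                  changed = True
      print("wozz" in facts)  # True under these illustrative rules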

  • @ruchaapte5124
    @ruchaapte5124 2 days ago

    Superb video :) Very clear and concise explanation. Thank you.

    • @code4AI
      @code4AI 2 days ago

      Thank you for this comment. I appreciate it when people take time for feedback.

  • @josea8187
    @josea8187 3 days ago

    Hello, thanks a lot for the awesome explanation! SBERTs don't receive the attention they need. One question though: where do cross-encoders come into play here then? Is it a sentence-embedding model with an output layer that produces a score between 0 and 1?
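
    A short sketch of the usual answer, assuming the sentence-transformers package: the SBERT bi-encoder embeds query and documents independently for cheap retrieval, while a cross-encoder reads each (query, passage) pair jointly and outputs a single relevance score (a raw logit for the MS MARCO checkpoints; apply a sigmoid if you want a 0-1 value). The checkpoint names below are public Hugging Face models.

    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    docs = ["SBERT encodes sentences independently.",
            "Cross-encoders read query and passage together.",
            "Bananas are rich in potassium."]
    query = "How does a cross-encoder score relevance?"

    # Stage 1: fast bi-encoder retrieval by cosine similarity.
    hits = util.semantic_search(bi_encoder.encode(query, convert_to_tensor=True),
                                bi_encoder.encode(docs, convert_to_tensor=True),
                                top_k=2)[0]
    # Stage 2: the cross-encoder re-scores each pair with a joint forward pass.
    pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
    for pair, score in zip(pairs, cross_encoder.predict(pairs)):
        print(f"{score:+.2f}  {pair[1]}")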

  • @thesimplicitylifestyle
    @thesimplicitylifestyle 3 days ago

    Yay! 😎🤖

  • @btscheung
    @btscheung 3 days ago

    Your presentation in this video is definitely A+ in terms of clarity and depth of understanding! Well done. Also, I am happy to see a real paper and study on the speculative Q* heuristic search algorithm. Although their results seem not to justify the effort and added complexity, we are only looking at well-known math problems that those LLMs might have been pre-trained on and focused on a lot. If we change the angle so that the algorithm is applied to a general solution search space, with greater complexity, Q* is the way to go!

  • @tablen2896
    @tablen2896 3 days ago

    Small tip: black borders on white font make text easier to read and less tiring to watch

  • @drdca8263
    @drdca8263 3 days ago

    27:58: you say "estimated utility of reaching the correct answer". Does this mean "an estimate of what the utility would be if the correct answer is obtained" (which sounds to me like the plainest interpretation, but also the least likely, as I would think the utility for that would be arbitrary), or "the expected value of the random variable which gives utility based just on whether the final answer is correct", or "the expected value of the random variable, utility, which is determined by both whether the final answer is correct and other things, such as length of answer", or something else?

  • @gregsLyrics
    @gregsLyrics 3 days ago

    A firehose to my brain. Amazing! This indicates a fairly long path of steps I need to learn so I can properly digest this beautiful wisdom. Really amazing channel, filled with advanced knowledge of the gods.

  • @drdca8263
    @drdca8263 3 days ago

    I thought Q* was supposed to be a project by Google or OpenAI (I forget which, but I thought it was supposed to be one of them). The authors listed in the paper are indicated as being affiliated with either "Skywork AI" or "Nanyang Technological University"? Is this a model inspired by the rumors of there being a model with the name "Q*", or is this the model the rumors were about? Were some of these people previously at OpenAI or Google, but not anymore? Or..?

    • @jswew12
      @jswew12 2 days ago

      It was OpenAI internal document leaks I believe. I’m wondering the same thing! I feel like it has to be related, otherwise this feels kind of wrong. I understand wanting to get eyes on your research, and this seems like good research so I commend them on that, but still. If anyone has more info, leave a reply.

    • @a_soulspark
      @a_soulspark 2 days ago

      I'm also really confused. Skywork AI seems to be a legit company/research group, they have released models in the past. however, I see no indication that their Q* is related to OpenAI's. the authors of this paper don't seem to have a record on big tech companies. one of the authors, Chaojie Wang, has a github page which gives some more context (you can look it up on Google if you want)

    • @a_soulspark
      @a_soulspark 2 days ago

      I also was quite confused! It doesn't seem like the people behind the paper have any relation with big tech companies (Google, OpenAI, Microsoft, etc.) and it doesn't seem like their paper is directly related to OpenAI's supposed Q*

    • @a_soulspark
      @a_soulspark 2 days ago

      my old comment got deleted, perhaps bc some word triggered the algorithm. I just said you can use search to find out more about the authors, the first one in the cover of the paper immediately answers many questions.

    • @idiomaxiom
      @idiomaxiom 1 day ago

      The trick is whether you have a Q* over a sequence, or whether you figured out how to credit a sequence for good or bad: "the credit assignment problem". Possibly OpenAI has figured out a fine-grained Q*, which would give fast, accurate feedback and learning.

  • @fontende
    @fontende 3 days ago

    Great theory, but only theory; in science, if a method cannot be reproduced independently... it's not a discovery. Similar to patents: this is kind of published, but there's no practical programming code to prove it, and most importantly, I haven't seen this method implemented in any model I've seen; even the new Gemma from Google uses many new tricks but doesn't even mention this.

  • @theoptimisticnihilistyt
    @theoptimisticnihilistyt 3 days ago

    wow

  • @scitechtalktv9742
    @scitechtalktv9742 4 days ago

    Interesting explanation! You mentioned there is code to try it yourself, but I cannot find that. Can you point me to it?

  • @smicha15
    @smicha15 4 days ago

    246th view. Nailed it!

  • @GodbornNoven
    @GodbornNoven 4 days ago

    Amazing video as always

  • @syedibrahimkhalil786
    @syedibrahimkhalil786 4 days ago

    Fourth then 😂

  • @SirajFlorida
    @SirajFlorida 4 days ago

    LoL. Third I guess. Well Yacinezahidi was 0th user, is 1st, and I'm 2nd.

  • @user-uz1ol2gs6y
    @user-uz1ol2gs6y 4 days ago

    Second

  • @yacinezahidi7206
    @yacinezahidi7206 4 days ago

    First viewer here 🗡️

  • @jmirodg7094
    @jmirodg7094 4 days ago

    Excellent! We need more like this, pls.

  • @godgivespizza238
    @godgivespizza238 5 days ago

    Correct me if I am wrong, but are we focusing too much on perfecting linear-algebra-based models rather than abstract and universal algebra concepts? Because in graph theory, when related to abstract algebra, the data representations were in triangles, building higher-dimensional structures from lower-dimensional nodes: from nodes to complex graphs and their evolution. There are better concepts in mathematics for solving mathematical-cognition-based reasoning that deep learning experts may not be exploring. So what is happening? I mean, I know it took a huge amount of resources, but so did transformers in the beginning.

    • @code4AI
      @code4AI 5 days ago

      Imagine the psychological shock if big tech announced that their beloved AI systems, in which they invested billions of dollars, fail at simple logic tests, before they could actually materialize a return on their investment. Because until now, only the short-term financial valuation of these companies went up, based on what they promised for future AI. Imagine what Microsoft would be worth without AI, without Copilot Plus, ..... only based on the value of Win11? Smile.

  • @attashemk8985
    @attashemk8985 5 days ago

    Hermes-Pro-Llama-3 with my custom sampler (kinda beam search with restarts) solves your prompt. Generated completion:
    "Let us assume that the official fees for sending one child to Stanford is $X. Now, since Stanford pays 90% of these fees for families with low income, it would mean that the parents need to cover the remaining 10% of the fees. So, they would have to pay $0.1 * X for each child. < .... a useless calculation ... > Dividing both sides by 6, we find T = 0.1 / 6 ≈ 0.0167 years or about 5 days. However, since the time it takes to accumulate the required funds is measured in years, they won't have enough money for the 7th child with no additional income. They would need some savings or outside financial help."
    After all, the model made the right conclusion.

    • @code4AI
      @code4AI 5 days ago

      Now we enter the arena of additional prompt instructions on top of my innocent little prompt. An A* approach would be more elegant than beam search.

  • @ragnarherron7742
    @ragnarherron7742 5 days ago

    You asked the wrong expert. Lean4 is a programming language for math theorem proving. Your data-cleaning gateway must pass through this transform to assure that data and its use are unambiguous.

  • @TiagoTiagoT
    @TiagoTiagoT 5 days ago

    I know just enough to be dangerous, as the saying goes, but here's an idea I had a while ago: how about something that, instead of building on sequential data like LLMs, or operating all at once on a predefined grid like diffusion-based image generators, is based on the Wave Function Collapse algorithm (the game map-generation one, not quantum mechanics), but applied to a free-form graph structure, with both nodes and edges being tokens? The graph is initialized with the prompt (and context) with whatever dangling edges it may imply, and as the AI thinks, the probabilities of adding, removing, and replacing tokens are adjusted (keeping the prompt and context frozen, obviously), until a cluster with a good enough score (including self-consistency and compatibility with the prompt cluster's edges) is found, and that gets interpreted as the output in whatever modality the graph it forms represents. Something like this could allow the AI to intuit (is that a word?) an end goal that starts disconnected (the void could be considered its own node token, with edges compatible with all tokens and always having dangling edges, but special in that it would not be counted as belonging to any cluster), and then gradually deduce a way to connect it to the context+prompt cluster, and be able to think in a more flexible way, with parallel hypotheses, inherent self-consistency checking mechanisms, etc., right?

  • @anonymousaustralianhistory2081
    @anonymousaustralianhistory2081 6 days ago

    Hybrid-Graph-Abstraction-of-Thought, that's cool. Do you find it works better?