sistemas.ai

sistemas.ai Artificial Intelligence

๐“๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐ž๐ซ ๐„๐ฑ๐ฉ๐ฅ๐š๐ข๐ง๐ž๐ซBeautiful visualization of the inner workings of a Transformer.
16/08/2024

๐“๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐ž๐ซ ๐„๐ฑ๐ฉ๐ฅ๐š๐ข๐ง๐ž๐ซ
Beautiful visualization of the inner workings of a Transformer.


An interactive visualization tool that shows how Transformer models work in large language models (LLMs) like GPT.

15/08/2024

“Algorithms for Decision Making”, a book published by MIT Press, is freely available.
Book: https://algorithmsbook.com/

๐—–๐—ผ๐—ฑ๐—ถ๐—ป๐—ด ๐—ฎ ๐— ๐˜‚๐—น๐˜๐—ถ๐—บ๐—ผ๐—ฑ๐—ฎ๐—น (๐—ฉ๐—ถ๐˜€๐—ถ๐—ผ๐—ป) ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜€๐—ฐ๐—ฟ๐—ฎ๐˜๐—ฐ๐—ต ๐—ถ๐—ป  ๐—ฃ๐˜†๐—ง๐—ผ๐—ฟ๐—ฐ๐—ต.Topics:Transformer model (Embeddings, Positional En...
08/08/2024

๐—–๐—ผ๐—ฑ๐—ถ๐—ป๐—ด ๐—ฎ ๐— ๐˜‚๐—น๐˜๐—ถ๐—บ๐—ผ๐—ฑ๐—ฎ๐—น (๐—ฉ๐—ถ๐˜€๐—ถ๐—ผ๐—ป) ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜€๐—ฐ๐—ฟ๐—ฎ๐˜๐—ฐ๐—ต ๐—ถ๐—ป ๐—ฃ๐˜†๐—ง๐—ผ๐—ฟ๐—ฐ๐—ต.
Topics:
Transformer model (Embeddings, Positional Encoding, Multi-Head Attention, Feed Forward Layer, Logits, Softmax)
Vision Transformer model
Contrastive learning (CLIP, SigLip)
Numerical stability of the Softmax and the Cross Entropy Loss
Rotary Positional Embedding
Multi-Head Attention
Grouped Query Attention
Normalization layers (Batch, Layer and RMS)
KV-Cache (prefilling and token generation)
Attention masks (causal and non-causal)
Weight tying
Top-P Sampling and Temperature
and much more!

https://youtu.be/vAmKB7iPkWw?si=Lp4xV3xJlMgQfdtL

Full coding of a Multimodal (Vision) Language Model from scratch using only Python and PyTorch. We will be coding the PaliGemma Vision Language Model from sc...

07/07/2024

🔥 https://artificialanalysis.ai/ is a great site that benchmarks the quality, speed, and price of different LLM API providers to help developers pick which models to use.

06/07/2024

🔥 An excellent new book to learn the concepts of Deep Learning.
Topics include fundamental building blocks, Transformers, GNNs, RL, diffusion models, and more.
FREE PDF: https://udlbook.github.io/udlbook/

04/07/2024

GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) that significantly improves question-answering over private or previously unseen datasets, is now available on GitHub.
https://www.microsoft.com/en-us/research/blog/graphrag-new-tool-for-complex-data-discovery-now-on-github/


27/06/2024

🔥 TOP CVPR 2024 papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers!
https://github.com/SkalskiP/top-cvpr-2024-papers


26/06/2024

Together AI and Morph Labs collaborated to create an excellent blog post on tuning models for RAG (Retrieval Augmented Generation). RAG fine-tuning combines code retrieval with model training, addressing the limitations of outdated knowledge and hallucinations in LLMs.

Large Language Models (LLMs) have shown promising capabilities on multiple applications such as code generation, task planning, and document understanding. Despite the impressive performance, these models often fall short due to two main reasons: hallucinations and outdated knowledge in the models.....

10/06/2024

🔥 Let’s reproduce GPT-2 (124M) by Andrej Karpathy (former OpenAI scientist and Tesla's former head of AI)
📽️ New 4-hour video lecture on YouTube:
https://youtu.be/l8pRSuU81PU?si=M1kznmR5XSiYW-Qz

We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really...

07/06/2024

🔥 Qwen2 is Alibaba's newest open-source large language model. It slightly surpasses Llama 3 70B on English benchmarks while being a better multilingual model.
Blog:
https://qwenlm.github.io/blog/qwen2/

๐“๐ก๐ž ๐‘๐ข๐ฌ๐ž ๐จ๐Ÿ ๐€๐ ๐ž๐ง๐ญ๐ข๐œ ๐‘๐ž๐ญ๐ซ๐ข๐ž๐ฏ๐š๐ฅ-๐€๐ฎ๐ ๐ฆ๐ž๐ง๐ญ๐ž๐ ๐†๐ž๐ง๐ž๐ซ๐š๐ญ๐ข๐จ๐ง (๐‘๐€๐†) ๐ข๐ง ๐€๐ซ๐ญ๐ข๐Ÿ๐ข๐œ๐ข๐š๐ฅ ๐ˆ๐ง๐ญ๐ž๐ฅ๐ฅ๐ข๐ ๐ž๐ง๐œ๐ž ๐€๐ˆIn the rapidly developing fields o...
05/06/2024

๐“๐ก๐ž ๐‘๐ข๐ฌ๐ž ๐จ๐Ÿ ๐€๐ ๐ž๐ง๐ญ๐ข๐œ ๐‘๐ž๐ญ๐ซ๐ข๐ž๐ฏ๐š๐ฅ-๐€๐ฎ๐ ๐ฆ๐ž๐ง๐ญ๐ž๐ ๐†๐ž๐ง๐ž๐ซ๐š๐ญ๐ข๐จ๐ง (๐‘๐€๐†) ๐ข๐ง ๐€๐ซ๐ญ๐ข๐Ÿ๐ข๐œ๐ข๐š๐ฅ ๐ˆ๐ง๐ญ๐ž๐ฅ๐ฅ๐ข๐ ๐ž๐ง๐œ๐ž ๐€๐ˆ
In the rapidly developing fields of data science and Artificial Intelligence (AI), the search for increasingly effective systems is also increasing significantly. The development of Agentic Retrieval-Augmented Generation (RAG) is among the most revolutionary developments of recent times. This strategy is set to completely transform the way information is used and managed, offering a substantial improvement over current RAG systems.
https://www.marktechpost.com/2024/05/28/the-rise-of-agentic-retrieval-augmented-generation-rag-in-artificial-intelligence-ai/


๐†๐ซ๐š๐ฉ๐ก๐‘๐€๐† (Graph-based Retrieval Augmented Generation) enhances the traditional Retrieval Augmented Generation (RAG) meth...
02/06/2024

๐†๐ซ๐š๐ฉ๐ก๐‘๐€๐† (Graph-based Retrieval Augmented Generation) enhances the traditional Retrieval Augmented Generation (RAG) method by integrating knowledge graphs (KGs) or graph databases with large language models (LLMs). It leverages the structured nature of graph databases to organize data as nodes and relationships, enabling more efficient and accurate retrieval of relevant information to provide better context to LLMs for generating responses.
https://gradientflow.substack.com/p/graphrag-design-patterns-challenges
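A rough idea of what "organize data as nodes and relationships, then retrieve context for the LLM" can look like, as a minimal Python sketch. Everything here is an assumption made for illustration: the toy triples, the function names (`build_graph`, `retrieve`, `build_prompt`), and the naive substring matching of entities. It is not the method from the linked article or from any GraphRAG library, and a real system would use a graph database and an entity extractor instead.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (subject, relation, object) triples.
TRIPLES = [
    ("GraphRAG", "extends", "RAG"),
    ("GraphRAG", "uses", "knowledge graph"),
    ("RAG", "provides context to", "LLM"),
    ("knowledge graph", "stores", "nodes and relationships"),
]


def build_graph(triples):
    """Index each triple under both of its endpoint nodes."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((subj, rel, obj))
        graph[obj].append((subj, rel, obj))
    return graph


def retrieve(graph, question, hops=1):
    """Collect facts within `hops` edges of any node named in the question.

    Entity matching is a naive case-insensitive substring test (assumption).
    """
    q = question.lower()
    frontier = {node for node in graph if node.lower() in q}
    seen_nodes, facts = set(), []
    for _ in range(hops):
        next_frontier = set()
        # sorted() keeps the output order deterministic across runs
        for node in sorted(frontier - seen_nodes):
            seen_nodes.add(node)
            for triple in graph[node]:
                if triple not in facts:
                    facts.append(triple)
                next_frontier.update((triple[0], triple[2]))
        frontier = next_frontier
    return facts


def build_prompt(facts, question):
    """Turn retrieved facts into a context block that precedes the question."""
    context = "\n".join(f"- {s} {r} {o}" for s, r, o in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    question = "What is stored in a knowledge graph?"
    print(build_prompt(retrieve(graph, question), question))
```

Raising `hops` pulls in facts further from the matched node, which trades a longer prompt for more context. The resulting prompt string would then be passed to whichever LLM you use.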

Enhancing RAG with Knowledge Graphs: Blueprints, Hurdles, and Guidelines. By Ben Lorica and Prashanth Rao.

๐†๐๐“ ๐‘๐ž๐ฌ๐ž๐š๐ซ๐œ๐ก๐ž๐ซ-GPT based autonomous agent that does online comprehensive research on any given topic.-GPT Researcher sup...
29/05/2024

๐†๐๐“ ๐‘๐ž๐ฌ๐ž๐š๐ซ๐œ๐ก๐ž๐ซ
-GPT based autonomous agent that does online comprehensive research on any given topic.
-GPT Researcher supports all major LLM providers.
Repo:
https://github.com/assafelovic/gpt-researcher

27/05/2024

Mistral Finetune
Mistral released an official repository to fine-tune its models.
GitHub Repo:
https://github.com/mistralai/mistral-finetune

26/05/2024

Diffusion Models
Notes on the theory behind models like Stable Diffusion and their applications.
https://andrewkchan.dev/posts/diffusion.html

I spent 2022 learning to draw and was blindsided by the rise of AI art models like Stable Diffusion. Suddenly, the computer was a better artist than I could ever hope to be. It's been two years, a...

๐„๐ฅ๐ข๐šA snappy, keyboard-centric terminal user interface for interacting with large language models. Chat with ChatGPT, Cl...
26/05/2024

๐„๐ฅ๐ข๐š
A snappy, keyboard-centric terminal user interface for interacting with large language models. Chat with ChatGPT, Claude, Llama 3, Phi 3, Mistral, Gemma and more.
Repo:
https://github.com/darrenburns/elia

24/05/2024

🔥 llama3 implemented from scratch
This great tutorial shows every step of reconstructing Llama 3 and running the trained weights.
Repo:
https://github.com/naklecha/llama3-from-scratch

19/05/2024

Kolmogorov Arnold Networks (KAN) Paper Explained - An exciting new paradigm for Deep Learning?
https://youtu.be/7zpz_AlFW2w

This is a paper breakdown video of the paper: Kolmogorov Arnold Networks, which brilliantly provides an alternative to standard Multi Layer Perceptrons. The ...

16/05/2024

🔥 Advanced NLP from Carnegie Mellon University! (2024)
Great recent NLP topics like prompting, fine-tuning and instruction-tuning, retrieval and RAG, ensembling and mixture of experts (MoE), and more.

One of the best NLP courses on the web:
https://www.youtube.com/playlist?list=PL8PYTP1V4I8D0UkqW2fEhgLrnlDW9QK7z

16/05/2024

This year, more and more developers are talking about AI agents - autonomous or semi-autonomous systems capable of handling a wider range of tasks and making decisions on their own. Unlike co-pilots, agents have a higher degree of autonomy and can take proactive actions based on their goals and understanding of the environment. They can complete tasks without constant human intervention, learning and adapting based on their interactions and experiences.
https://gradientflow.substack.com/p/agentic-ai-challenges-and-opportunities

Navigating the Complex World of AI Agents. Last year, the buzz in the AI community revolved around the concept of AI co-pilots - systems designed to work alongside humans, assisting them in tasks and decision-making processes. These co-pilots, such as GitHub Copilot for ...

15/05/2024

🔥🔥🔥 Data Science Agent, a Google Labs experiment! An experiment to build an AI-generated Colab notebook that handles data cleaning, data exploration, plotting, Q&A on data, and predictive modeling.
◆ Helps with complex tasks like planning and error correction.
◆ Helps with data science tasks like predictive modeling.
◆ Outputs an AI-generated Colab notebook based on your prompt.

Try it at https://labs.google.com/code/

11/05/2024

🔥 Incredible series of 98 lectures from UT Austin on NLP and LLMs
It gives decent synopses of modern NLP topics, including recent ones like RLHF, instruction-tuning, few-shot prompting, chain-of-thought, and more.
Lectures:
https://www.youtube.com/playlist?list=PLofp2YXfp7TZZ5c7HEChs0_wfEfewLDs7

09/05/2024

🔥 Granite Code Models: A Family of Open Foundation Models for Code Intelligence
- IBM has released four variations of the Granite code model.
- They range in size from 3B to 34B parameters.
- Trained on 3 to 4T tokens sourced from 116 programming languages.
- The models have outperformed other comparable models like Code Llama and Llama 3 in many tasks.
- Repo:
https://github.com/ibm-granite/granite-code-models
- Paper:
https://arxiv.org/abs/2405.04324

09/05/2024

🔥 Kolmogorov-Arnold Network for Reinforcement Learning, initial experiments
This small project tests the novel Kolmogorov-Arnold Network (KAN) architecture in the reinforcement learning paradigm on the CartPole problem.
Repo:
https://github.com/riiswa/kanrl

09/05/2024

🔥 xLSTM: Extended Long Short-Term Memory
Sepp Hochreiter, who invented the LSTM, just dropped a new LLM architecture!
- The xLSTM architecture is shown to be efficient at handling different aspects of long context problems.
- A major component is a new parallelizable LSTM.
- One of the major weaknesses of prior LSTMs was their sequential nature (time steps can't be computed all at once).
Check the paper for more interesting insights and results:
https://arxiv.org/abs/2405.04517

08/05/2024

🔥🔥🔥 Awesome KAN (Kolmogorov-Arnold Network)
A curated list of awesome libraries, tutorials, papers, and other resources related to Kolmogorov-Arnold Network (KAN). This repository aims to be a comprehensive and organized collection that will help researchers and developers in the world of KAN!
Repo: https://github.com/mintisan/awesome-kan

07/05/2024

🔥🔥🔥 KAN-GPT
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
Repo:
https://github.com/AdityaNG/kan-gpt

07/05/2024

ICLR 2024 Outstanding Paper Awards (International Conference on Learning Representations)
1. Generalization in diffusion models arises from geometry-adaptive harmonic representations
2. Learning Interactive Real-World Simulators
3. Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
4. Protein Discovery with Discrete Walk-Jump Sampling
5. Vision Transformers Need Registers
Blog:
https://blog.iclr.cc/2024/05/06/iclr-2024-outstanding-paper-awards/

May 6, 2024. ICLR 2024 Outstanding Paper Awards. Yisong Yue. ICLR 2024 Awards Committee: Eunsol Choi, Katja Hofmann, Ming-Yu Liu, Nan Jiang, Stephan Günnemann, Suvrit Sra, Thomas Kipf, Volkan Cevher (This post is written by the Awards Committee, lightly edited by the Program Chairs.) Selection Process ...
