It's not every day you see AI and machine learning taking the spotlight in the style of rock stars, but here we are:
- WIRED's reportage on Databricks' DBRX
- Demis Hassabis receiving a knighthood
The AI world is buzzing with celebrity-like fervor: https://www.turingpost.com/p/fod47
Meta released the Video Joint Embedding Predictive Architecture (V-JEPA). This model is a leap toward Yann LeCun's vision of advanced machine intelligence.
▪️ V-JEPA mirrors how humans naturally learn about their environment: largely through observational experience.
V-JEPA excels in:
- Identifying complex object interactions
- Interpreting those interactions
▪️ V-JEPA is a non-generative model that enhances training and sample efficiency.
It predicts video content in an abstract representation space, distinct from pixel-level predictions. This method allows for a significant reduction in the need for labeled data.
▪️ The model's learning process involves:
- Masking significant portions of a video to challenge its predictive capabilities
- Fostering a deeper "understanding" of spatial and temporal dynamics
This technique ensures V-JEPA develops a grasp of various interactions within a scene.
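Here is a minimal, hypothetical sketch of that training signal (illustrative only, not Meta's actual code): an encoder embeds the visible patches, a predictor guesses the embeddings of the masked patches, and the loss is computed in representation space rather than on pixels.

```python
import torch
import torch.nn as nn

NUM_PATCHES, DIM = 16, 64

# Stand-ins for V-JEPA's encoder and predictor (tiny MLPs for illustration).
encoder = nn.Sequential(nn.Linear(DIM, DIM), nn.GELU(), nn.Linear(DIM, DIM))
predictor = nn.Sequential(nn.Linear(DIM, DIM), nn.GELU(), nn.Linear(DIM, DIM))

video_patches = torch.randn(1, NUM_PATCHES, DIM)   # patchified video clip
mask = torch.zeros(NUM_PATCHES, dtype=torch.bool)
mask[:12] = True                                   # hide 75% of the clip

# Targets are embeddings of the masked patches (in V-JEPA these come from
# a slowly-updated target encoder; a stop-gradient stands in for that here).
with torch.no_grad():
    targets = encoder(video_patches)[:, mask]

context = encoder(video_patches[:, ~mask])         # encode visible patches only
predicted = predictor(context.mean(dim=1, keepdim=True))  # crude pooled guess

# The loss lives in representation space -- no pixels are reconstructed.
loss = (predicted - targets).abs().mean()
loss.backward()
```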
▪️ V-JEPA's design emphasizes:
- Abstract representation
- Prioritizing conceptual understanding over minute details
This approach is particularly effective in "frozen evaluations," where the pre-trained model adapts to new tasks with minimal adjustments.
▪️ V-JEPA is released under a Creative Commons NonCommercial license.
Paper: https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/
Code: https://github.com/facebookresearch/jepa
Groq’s Language Processing Unit (LPU) promises significant speed advancements for deploying LLMs. Is it a rival to GPUs? How does it work? 👇🏼
▪️ The LPU is a special kind of processor designed to handle language tasks very quickly.
Unlike other computer chips that do many things at once, the LPU works on tasks one after the other, which is perfect for understanding and generating language.
▪️ The LPU is designed to overcome the two LLM bottlenecks: compute density and memory bandwidth.
Groq took a novel approach right from the start, focusing on software and compiler development before even thinking about the hardware. They made sure the software could guide how the chips talk to each other, ensuring they work together seamlessly. This makes the LPU good at processing language efficiently and at high speed.
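To see why memory bandwidth matters so much, here's a back-of-envelope calculation (all numbers are illustrative assumptions, not Groq's or any vendor's actual specs): each autoregressively generated token has to stream roughly every model weight through the chip once, so bandwidth caps tokens per second.

```python
# Back-of-envelope: why memory bandwidth caps autoregressive decoding speed.
# All numbers below are illustrative assumptions, not measured specs.

params = 70e9            # a 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
model_bytes = params * bytes_per_param

hbm_bandwidth = 2e12     # ~2 TB/s, typical of a modern GPU's HBM
sram_bandwidth = 80e12   # on-chip SRAM is dramatically faster (assumed figure)

# Each decoded token streams every weight through the compute units once.
for name, bw in [("HBM-bound GPU", hbm_bandwidth), ("SRAM-bound LPU", sram_bandwidth)]:
    tokens_per_sec = bw / model_bytes
    print(f"{name}: ~{tokens_per_sec:.0f} tokens/sec upper bound")
```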
▪️ This advancement resulted in a highly optimized system that outperformed traditional setups in speed, cost efficiency, and energy consumption.
It's significant for industries such as finance, government, and technology, where rapid and precise data processing is essential.
▪️ Now, don't go tossing out your GPUs just yet!
While the LPU is a beast when it comes to inference, making light work of applying trained models to new data, GPUs still reign supreme in the training arena.
▪️ The LPU and GPU might become the dynamic duo of AI hardware, each excelling in their respective roles.
To better understand the architecture, Groq offers two papers:
1) https://wow.groq.com/groq-isca-paper-2020
2) https://wow.groq.com/isca-2022-paper/
Learn more about Groq at https://www.turingpost.com/p/fod41
Andrew Ng gave a talk on the opportunities in AI at Stanford University.
I summarised the main insights from this talk:
🔸 AI's Potential
Dr. Andrew Ng sees AI as being as versatile as electricity, capable of transforming various sectors.
He highlights supervised learning and generative AI as fundamental tools shaping the AI landscape.
🔸 Rise of Large Language Models
Ng highlights the groundbreaking capabilities of LLMs, illustrating how they facilitate rapid application development. He predicts an influx of custom AI applications driven by advancements in prompt-based AI.
🔸 Financial value and opportunities in AI
Ng sees opportunities for both nascent startups and entrenched corporations to capitalize on AI, despite the challenge of navigating short-lived trends and the imperative for tangible use cases.
🔸 Expanding AI across industries
Ng argues that AI could revolutionize various industries through tailored AI systems and data-centric approaches.
He addresses the hurdles in broadening AI adoption, highlighting the significance of low/no-code tools for customization.
🔸 The role of AI in technology and applications
AI has a significant impact on various layers of technology.
Professor Ng provides insights on how to harness the power of AI in applications, such as relationship coaching, highlighting the immense untapped market potential.
🔸 Framework for building AI startups
Ng shares his methodology, which highlights iterative development, early leadership involvement, and customer engagement as key factors in incubating successful AI startups.
🔸 Ethical considerations
Ng emphasizes the significance of pursuing concrete AI initiatives that adhere to ethical standards. He advocates for responsible innovation and support for those affected by the disruptive effects of AI.
🔸 AI risks
Addressing concerns surrounding AI, Ng discusses the distant reality of AGI and dismisses unfounded extinction risks. He advocates instead for a proactive focus on AI's concrete, near-term risks.
CoDeF is a new open-source video representation that can:
▪️ lift image-to-image translation to video-to-video translation
▪️ lift keypoint detection to keypoint tracking without any training
Code: https://github.com/qiuyu96/CoDeF
Team: Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
HKUST, Ant Group, CAD&CG, ZJU
The authors of Rift say that software will soon be written mostly by AI software engineers.
They present a server for your personal AI software engineer.
With it, you can perform conversational code edits, codebase-wide edits, and more!
Features of Rift:
▪️ Conversational code editing
▪️ Codebase-wide edits
▪️ Contextual codebase generation
The code is open-source 👇🏼
Code: https://github.com/morph-labs/rift
Last week, Google AI introduced OptFormer, one of the first Transformer-based frameworks for hyperparameter tuning, learned from large-scale optimization data using flexible text-based representations.
Google shows that a single Transformer network can:
1. imitate highly complex behaviors from multiple algorithms over long horizons
2. predict objective values very accurately, in many cases surpassing Gaussian Processes
Read more here: https://ai.googleblog.com/2022/08/optformer-towards-universal.html
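A hypothetical sketch of the "flexible text-based representation" idea: past trials are serialized into a text prompt that a Transformer can consume to propose the next configuration. The format below is invented for illustration and is not OptFormer's actual serialization.

```python
# Hypothetical serialization of an optimization history into text,
# in the spirit of OptFormer's text-based trial representation.
trials = [
    {"learning_rate": 1e-3, "batch_size": 32, "accuracy": 0.874},
    {"learning_rate": 3e-4, "batch_size": 64, "accuracy": 0.901},
]

def serialize(trials):
    lines = []
    for i, t in enumerate(trials):
        params = ", ".join(f"{k}={v}" for k, v in t.items() if k != "accuracy")
        lines.append(f"trial {i}: {params} -> accuracy={t['accuracy']}")
    lines.append(f"trial {len(trials)}:")  # the model completes this line
    return "\n".join(lines)

prompt = serialize(trials)
# A trained Transformer would now autoregressively complete the prompt with
# the next suggested configuration and a predicted objective value.
print(prompt)
```

Part of the appeal of a text interface is that metadata, such as which algorithm produced the trials, can simply be included in the prompt, letting one network imitate many optimizers.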
Google AI proposed leveraging large language models to translate natural-language instructions into tasks for robots.
A novel approach, developed in partnership with Everyday Robots, enables a physical agent to follow high-level textual instructions for physically-grounded tasks.
This approach grounds the language model in tasks that are feasible within a specific real-world context.
To evaluate the method, Google placed robots in a real kitchen setting and gave them tasks expressed in natural language.
Results show that grounding the language model in the real world nearly halves errors over non-grounded baselines.
Google AI also releases a robot simulation setup where the research community can test this approach. https://github.com/google-research/google-research/tree/master/saycan
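The core grounding trick (this is the SayCan recipe described in the paper; the scores below are made up for illustration) is to rank each candidate skill by the product of two numbers: the LLM's estimate of how useful the skill is for the instruction, and a learned value function's estimate of how feasible it is in the robot's current state.

```python
# Minimal sketch of SayCan-style grounding: combine the LLM's usefulness
# score for each skill with a feasibility score from the robot's value
# function, then pick the best product. All scores below are made up.
instruction = "I spilled my drink, can you help?"

candidate_skills = {
    # skill:            (llm_score, feasibility_in_current_state)
    "find a sponge":     (0.45, 0.9),
    "go to the table":   (0.30, 0.8),
    "pick up the apple": (0.05, 0.9),
}

best = max(candidate_skills, key=lambda s: candidate_skills[s][0] * candidate_skills[s][1])
print(f"Instruction: {instruction}\nNext skill: {best}")
```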
Google AI introduces iterative co-tokenization, a new approach to video-text learning.
This approach can efficiently fuse spatial, temporal and language information for video question-answering.
It outperforms the previous state-of-the-art by large margins.
Simultaneously, the model reduces the required GFLOPs from 150-360 to only 67, producing a highly efficient video question answering model.
Read more: https://ai.googleblog.com/2022/08/efficient-video-text-learning-with.html
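As a rough intuition for where the savings come from (a generic sketch of cross-modal token fusion, not the paper's exact architecture): a small set of learned tokens attends over the combined video and text tokens, compressing hundreds of inputs into a handful of fused tokens, and the paper iterates this tokenization step.

```python
import torch
import torch.nn as nn

# Generic sketch of cross-modal token fusion (illustrative, not the
# paper's exact architecture): learned query tokens compress many
# video and text tokens into a few fused tokens.
DIM, N_FUSED = 64, 8
video_tokens = torch.randn(1, 512, DIM)   # many space-time patches
text_tokens = torch.randn(1, 16, DIM)     # question tokens

fused = nn.Parameter(torch.randn(1, N_FUSED, DIM))   # learned query tokens
attn = nn.MultiheadAttention(DIM, num_heads=4, batch_first=True)

context = torch.cat([video_tokens, text_tokens], dim=1)
out, _ = attn(query=fused, key=context, value=context)
print(out.shape)  # (1, 8, 64): 528 input tokens compressed to 8
```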
Google AI open-sourced Rax, a JAX-based library for supervised ranking algorithms.
It can be used in, but is not limited to, the following applications:
1. Search
2. Recommendation
3. Question Answering
4. Dialogue System
Rax provides state-of-the-art ranking losses, standard ranking metrics, and a set of function transformations.
All this functionality is provided with a well-documented and easy-to-use API.
Find Rax here: https://github.com/google/rax
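A minimal usage sketch in the spirit of the repo's examples (verify the function names against the current API before relying on them): losses and metrics are plain JAX functions over score and label arrays.

```python
import jax.numpy as jnp
import rax  # pip install rax

# Scores produced by a ranking model for three items, plus relevance labels.
scores = jnp.asarray([2.2, -1.3, 5.4])
labels = jnp.asarray([1.0, 0.0, 0.0])

loss = rax.pairwise_hinge_loss(scores, labels)  # a standard ranking loss
ndcg = rax.ndcg_metric(scores, labels)          # a standard ranking metric
print(loss, ndcg)
```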
DeepMind applied the MuZero algorithm to the challenge of video compression, collaborating with YouTube to optimize the open-source VP9 codec.
The researchers introduce a novel self-competition-based reward mechanism to solve the underlying constrained RL problem.
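A hypothetical sketch of what a self-competition reward can look like (the exact formulation is in DeepMind's paper): the agent earns reward for beating its own recent historical performance, which turns a constrained objective into a simple win/loss signal.

```python
# Hypothetical self-competition reward: reward the agent for beating its
# own running historical performance on the constrained objective
# (e.g. quality at a target bitrate).
class SelfCompetitionReward:
    def __init__(self, momentum=0.99):
        self.baseline = None
        self.momentum = momentum

    def __call__(self, episode_score):
        if self.baseline is None:
            self.baseline = episode_score
        reward = 1.0 if episode_score > self.baseline else -1.0
        # Track the historical baseline with an exponential moving average.
        self.baseline = self.momentum * self.baseline + (1 - self.momentum) * episode_score
        return reward
```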
2 important ML use cases from last week:
1. Uber unveiled some details about DeepETA, the model it uses to predict arrival times
2. The TensorFlow team provided details about improving the popular TF-GAN framework
AutoML-Zero can discover new ML algorithms without major restrictions in the search space.
It divides the ML algorithm into 3 functions:
1. Setup function
2. Learn function
3. Predict function
AutoML-Zero relies on basic mathematical operations as its building blocks, rather than hand-designed ML components.
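Here's an illustrative rendering of that decomposition (invented for this post, not from the paper's codebase); this particular three-function program happens to implement a linear model trained with SGD:

```python
import numpy as np

# Illustrative AutoML-Zero-style program: a candidate algorithm is three
# functions over shared memory, built only from primitive math operations.

def setup(memory):
    memory["w"] = np.zeros(4)          # initialize model state

def learn(memory, x, y, lr=0.1):
    pred = memory["w"] @ x             # primitive ops only: multiply, add
    err = y - pred
    memory["w"] = memory["w"] + lr * err * x

def predict(memory, x):
    return memory["w"] @ x

# An evolutionary search would mutate the instructions inside these three
# functions and keep the variants that generalize best.
memory = {}
setup(memory)
for _ in range(100):
    x = np.random.randn(4)
    learn(memory, x, y=x.sum())        # toy target: sum of the features
print(predict(memory, np.ones(4)))     # should approach 4.0
```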
Maybe soon AutoML-Zero will rediscover algorithms like gradient descent on its own?
Read more about AutoML-Zero here: https://thesequence.substack.com/p/thesequence-edge2-automl-automl-zero
TACTO is a fast, flexible, and open-source simulator for tactile sensors.
It is released by @facebookai as a simulator for DIGIT, a compact tactile sensor designed for robotic in-hand manipulation.
Find the code for the simulator: https://github.com/facebookresearch/tacto