Data Science Central

Co-founded by Vincent Granville and part of the DSC community. Our focus is on data science, ML, AI, ...

18/02/2024

30 Python Libraries that I Often Use https://mltblog.com/3ONhMWi

This list covers well-known as well as specialized libraries that I use frequently. Applications include GenAI, data animations, LLMs, synthetic data generation and evaluation, ML optimization, scientific computing, statistics, web crawling, APIs, SQL, and more. I also mention my own libraries, and issues that I faced with standard ones. In several cases, such as sound generation, I did not use any library. I also included some functions that I call regularly. In many cases, I explain why I had to create my own home-made versions.

30 Python libraries to solve most AI problems, including GenAI, data videos, synthetization, model evaluation, computer vision and more.

15/02/2024

Gemini Ultra Unleashed: Google's Best LLM Now Available https://mltblog.com/3SBZzMz

A lot has changed for the better since the first announcement not long ago.

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Live demo and code-sharing session to see Gemini Ultra in action. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

11/02/2024

Probabilistic ANN: The Swiss Army Knife of GenAI https://mltblog.com/48hQWfY

ANN — Approximate Nearest Neighbors — is at the core of fast vector search, itself central to GenAI, especially GPT and LLM. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]

10/02/2024

Actions in GPTs: Developer Tips, Tricks & Techniques https://mltblog.com/3utzlDZ

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

07/02/2024

How to Automate Data Cleaning, in a Nutshell

Issues and solutions to automate data cleaning. Free your data scientists from the most boring tasks, making them happier and reducing costs.

06/02/2024

Massively Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.

06/02/2024

More Fun Math Problems for Machine Learning Practitioners

This is part of a series featuring the following aspects of machine learning:

  • Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)
  • Opinions, for instance about the value of a PhD in our field, or the use of some techniques
  • Methods, principle...

05/02/2024

Better, Faster, Less Expensive Synthetic Data Without Deep Learning

My talk at the ODSC Conference, San Francisco, October 2023. Includes Notebook demonstration, using our open-source Python libraries. View or download the PowerPoint presentation, here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistent...

05/02/2024

AI-based Object/Image Detection for Inventory Management https://mltblog.com/3SMRJRC

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

This is one of the AI applications where many companies recognize the value and are ready to invest, with guaranteed return thanks to low costs, proven technology, and automation.

Many of the requests we get from potential enterprise clients - even brick and mortar companies - are actually focused on this topic: automated classification and management of inventory or digital content, with an interest in automated image labeling and classification, as well as creating document taxonomies and better search tools (sometimes with automated data analysis) to help internal customers quickly find what they need.

05/02/2024

NoGAN: Ultrafast Data Synthesizer and New Evaluation Metric - My Presentation at ODSC San Francisco

Our presentation/workshop about NoGAN at ODSC San Francisco, October 2023. Runs 1000x faster than GAN, consistently delivering better results according to th...

05/02/2024

The Riemann Hypothesis in One Picture

With a visual, simple, intuitive method for supervised classification.

04/02/2024

Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations

In this article, I illustrate the concept of asymmetric key with a simple example. Rather than discussing algorithms such as RSA, (still widely used, for instance to set up a secure website) I focus on a system easier to understand, based on random permutations. I discuss how to generate these rando...

03/02/2024

GenAI: Fast Vector Search at Scale (Demo on AWS)

Register at https://mltblog.com/3UGF0l5.

ANN stands for Approximate Nearest Neighbors, a faster yet high-quality alternative to exact but slow KNN, for vector search in GenAI contexts (LLM, GPT, multimodal, and so on). My team is developing proprietary technology on this topic, with a paper coming soon. In the meantime, if you want to see real enterprise case studies and an existing, fully scaled algorithm in action, this hands-on workshop is for you.

Intended for developers and AI professionals, featuring state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
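
To make the contrast between exact KNN and ANN concrete, here is a minimal sketch in Python with NumPy. It is not the proprietary technology or the workshop material mentioned above: it swaps in random-hyperplane hashing, a standard textbook ANN technique, and the names `exact_nn` and `HyperplaneLSH` are hypothetical, chosen for illustration only.

```python
import numpy as np

def exact_nn(query, vectors):
    """Exact nearest neighbor: scans every vector (slow on large collections)."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return int(np.argmin(dists))

class HyperplaneLSH:
    """Toy ANN index (illustrative assumption): random-hyperplane hashing."""

    def __init__(self, vectors, n_planes=8, seed=0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors
        # Each random hyperplane contributes one bit to the hash key.
        self.planes = rng.normal(size=(n_planes, vectors.shape[1]))
        self.buckets = {}
        for i, key in enumerate(self._keys(vectors)):
            self.buckets.setdefault(key, []).append(i)

    def _keys(self, x):
        # The sign pattern of the projections identifies the bucket.
        bits = ((np.atleast_2d(x) @ self.planes.T) > 0).astype(np.uint8)
        return [row.tobytes() for row in bits]

    def query(self, q):
        candidates = self.buckets.get(self._keys(q)[0], [])
        if not candidates:
            # Empty bucket: fall back to the exact full scan.
            return exact_nn(q, self.vectors)
        cand = np.array(candidates)
        # Exact search restricted to one bucket: fast, but possibly not optimal.
        return int(cand[exact_nn(q, self.vectors[cand])])

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 16))
index = HyperplaneLSH(vectors)
print(exact_nn(vectors[42], vectors), index.query(vectors[42]))
```

The approximate search only compares the query against vectors that share its hash bucket, so it does far less work than the full scan, at the price of sometimes missing the true nearest neighbor. Using more hyperplanes gives smaller buckets and faster but less accurate answers.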

02/02/2024

Synthetizing the Insurance Dataset Using Copulas - Towards Better Synthetization

This article is an extract from my book “Synthetic Data and Generative AI”, available here. In the context of synthetic data generation, I've been asked a few times to provide a case study focusing on real-life tabular data used in the finance or health industry. Here we go: this article fills t...

02/02/2024

A Simple Regression Problem

This article is part of a new series featuring problems with solutions, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today’s problem also has an intriguing mathematical appeal and solution: this a...

01/02/2024

Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices

The goal of data synthetization is to produce artificial data that mimics the patterns and features present in existing, real data. Many generation methods and evaluation techniques are available, depending on purposes, the type of data, and the application field. Everyone is familiar with synthetic...

01/02/2024

Book: Intuitive Machine Learning and Explainable AI

Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications.

31/01/2024

Machine Learning Cloud Regression: The Swiss Army Knife of Optimization

Entitled “Machine Learning Cloud Regression: The Swiss Army Knife of Optimization”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. Also discussed in detail, with Python code, in chapter 1 of my book “Intuitive Machine Learning and Explainable AI...

31/01/2024

Better LLMs with Shorter Embeddings: Part 3 https://mltblog.com/3HGj6Xi

Variable-length embeddings and fast ANN-like search (approximate nearest neighbors) for better, lighter and less expensive LLMs.

31/01/2024

18 Differences Between Good and Great Data Scientists

machine learning, data science career, business analytics, data science lifecycle, data visualizations

30/01/2024

How to Choose the Best Machine Learning Technique: Comparison Table

30/01/2024

Creating Embeddings on Large, Real-Time Data with OpenAI https://mltblog.com/3SiMGXF

Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

I recently showed how to optimize embeddings and RAG architecture in LLMs and GPT-like applications, with home-made systems. This webinar discusses a real business case, with much larger input data in real time, using efficient tools. Embeddings are the central piece.

30/01/2024

New Python Library to Evaluate AI-generated Data and Compare Models

Called GenAI-Evaluation, it can be used, for instance, to assess the quality of tabular synthetic data. In this case, it measures how faithfully the synthetization mimics the real data it is derived from, by comparing the full joint empirical distributions (ECDF) attached to the two datasets. It works both...
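
The idea of comparing two joint ECDFs can be sketched in a few lines of Python with NumPy. This is not the library's API: `joint_ecdf` and `ecdf_distance` are hypothetical names, and the KS-like maximum gap evaluated at points sampled from the pooled data is an assumption made for illustration.

```python
import numpy as np

def joint_ecdf(data, points):
    """Fraction of rows in `data` that are <= each point in every coordinate."""
    # Broadcast to (n_points, n_rows, n_features), then require all features.
    below = (data[None, :, :] <= points[:, None, :]).all(axis=2)
    return below.mean(axis=1)

def ecdf_distance(real, synth, n_points=1000, seed=0):
    """KS-like distance: largest gap between the two joint ECDFs (assumption)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([real, synth])
    # Evaluate both ECDFs at locations sampled from the pooled observations.
    idx = rng.integers(0, len(pooled), size=min(n_points, len(pooled)))
    points = pooled[idx]
    gaps = np.abs(joint_ecdf(real, points) - joint_ecdf(synth, points))
    return float(gaps.max())

rng = np.random.default_rng(42)
real = rng.normal(size=(200, 2))
good_synth = rng.normal(size=(200, 2))          # same distribution as real
bad_synth = rng.normal(loc=5.0, size=(200, 2))  # shifted distribution
print(ecdf_distance(real, good_synth), ecdf_distance(real, bad_synth))
```

The distance lies between 0 and 1, with values near 0 meaning that the two datasets have similar joint distributions. Because the comparison is done on the joint ECDF rather than feature by feature, it can also detect synthetic data that matches each marginal but gets the dependencies between features wrong.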

29/01/2024

A Synthetic Stock Exchange Played with Real Money. Includes Python code dealing with gigantic numbers using exact arithmetic.

Not only that, but you can predict -- more precisely compute with absolute certainty -- what the value of any stock will be tomorrow. Transaction fees are well below 0.05% and the market, at least in the version presented here, is fair: in other words, a zero-sum game if you play by luck. If instead

29/01/2024

Python Code and Material from the Book "Stochastic Processes and Simulations" - GitHub Repository

This repository contains the material (datasets, code, videos, spreadsheets) related to my book Stochastic Processes and Simulations - A Machine Learning Perspective. - GitHub - VincentGranville/Po...

29/01/2024

An Intriguing Job Interview Question for AI/ML Professionals

Intriguing technical job interview questions for candidates applying to machine learning and AI jobs, with 4 difficulty levels.

28/01/2024

Book: Interpretable Machine Learning

Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications. By Vincent Granville Ph.D, published in September 2022. PDF format, 156 pages. Version 1.0 with Python code. The book is available here. For my upcoming course based on thi...

27/01/2024

Build Document/Image Analytics with GPT-4 Vision https://mltblog.com/48Odh69

Showcasing a conceptual application demo that can analyze insurance claims data, interpret PDF documents and photos of car accidents to infer damage types and estimate payouts.

Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

27/01/2024

New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness

This article covers three different GenAI topics. First, I introduce one of the best random number generators (PRNG) with infinite period. Then I show how to evaluate the synthesized numbers using the full multivariate empirical distribution (same as KS that I used for NoGAN evaluation), but this ti...

26/01/2024

My Book on Poisson-binomial Stochastic Processes and Simulations

The book covers supervised classification, including fractal classification, as well as unsupervised clustering, using an innovative approach. Datasets are first mapped onto an image, then processed using image filtering techniques. I discuss the analogy with neural networks, comparing very deep but...
