
Sigma & ScentNet

30 December 2025


Today, Scentience launches ScentNet, the first open, multimodal dataset integrating chemical olfaction with vision, depth/LiDAR, audio, language, and inertial data. Using the new Sigma mobile sensing platform - available now on iOS, with Android coming - researchers worldwide can collect and share data, accelerating AI models that include olfaction alongside traditional modalities.

An Open Olfaction-Focused Dataset for AI & Robotics

For decades, progress in artificial intelligence and robotics has been driven by shared datasets and common benchmarks. Vision, language, audio, and navigation advanced rapidly once researchers aligned around standardized data and reproducible evaluation through datasets like ImageNet, AudioSet, WordNet, and KITTI.


Olfaction has never had such a foundation in AI. The sense is not fully understood in biology, and it is even less understood artificially.


Despite its importance in organism perception and its clear relevance to embodied AI, research in the machine sense of smell (olfaction) remains fragmented. Data is scarce, protocols vary, and chemical sensing is rarely synchronized with other modalities in a way that supports modern multimodal models. Today marks the kickoff of a new effort to change this: to accelerate chemical sensing for machines and the construction of the world models they need for simulation.


I am launching a community-driven initiative to build a large-scale multimodal dataset focused on olfaction called ScentNet. This dataset will contain chemical sensing captured in precise temporal alignment with vision, depth/LiDAR, audio, language, and inertial data using a unified mobile sensing application called Sigma. In mathematics, capital sigma (Σ) is the Greek letter that denotes the summation of all parts of a sequence - a fitting symbol for an effort intended to unify and standardize multiple data sources for a common cause. Here, Sigma is a mobile application and sensing instrument duo that work synchronously to capture and fuse high-fidelity world data. The app is currently available for iOS on the Apple App Store, with an Android version soon to follow.
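For intuition, a single Sigma capture can be pictured as one record holding a shared timestamp plus each modality's payload. The sketch below is purely illustrative - every field name is a placeholder until the Sigma technical specification is published:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScentNetSample:
    """One time-aligned multimodal capture (illustrative; not the final schema)."""
    timestamp_ns: int                      # shared capture clock, nanoseconds
    gas_channels: list[float]              # raw olfaction sensor readings
    image_path: Optional[str] = None       # RGB frame from the phone camera
    depth_path: Optional[str] = None       # depth/LiDAR frame, if available
    audio_path: Optional[str] = None       # short audio clip around the timestamp
    caption: Optional[str] = None          # free-text language annotation
    imu: Optional[list[float]] = None      # accelerometer + gyroscope readings
```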


This effort is not a single paper or a one-off dataset. It is intended as a foundational benchmark, designed from the start to support reproducibility, extensibility, and broad adoption across AI, robotics, chemistry, and biology. Drawing on peer-reviewed insights from experts in biology, chemistry, robotics, and AI, we have designed the data collection protocols to ensure scientific rigor and broad applicability.


The initiative has already garnered enthusiasm from academic circles, with early collaborators at several universities testing the Scentience app and device in diverse scenarios such as scent-based navigation, security applications, agricultural analysis, and lab-based chemistry work. Participants can easily join by downloading the app, attaching the olfaction instrument to their smartphone or tablet, and uploading anonymized data, contributing to a growing corpus that will be freely available for training next-generation AI models.


The Sigma technical specification will be published here soon, and ScentNet will be published here on HuggingFace.

The Sigma app is a multimodal data collection instrument focused on olfactory AI for robotics.

What’s Different

From the outset, this effort emphasizes:


  • Standardized collection protocols
  • Strict temporal synchronization (sketched below)
  • Rich environmental metadata
  • Clear licensing and governance
  • Community input with centralized stewardship
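To make the synchronization requirement concrete, here is a minimal sketch of nearest-timestamp alignment between a scent stream and a camera stream. The 50 ms tolerance is an assumed figure for illustration, not a value from the Sigma spec:

```python
import bisect

def align_nearest(scent_ts, frame_ts, tol_ns=50_000_000):
    """Pair each scent reading with the nearest camera frame no more than
    tol_ns away (50 ms here - an assumed tolerance, not a spec value).
    Both arguments are sorted lists of nanosecond timestamps."""
    pairs = []
    if not frame_ts:
        return pairs
    for t in scent_ts:
        i = bisect.bisect_left(frame_ts, t)
        # the nearest frame is either just before or just after t
        candidates = [c for c in (i - 1, i) if 0 <= c < len(frame_ts)]
        best = min(candidates, key=lambda c: abs(frame_ts[c] - t))
        if abs(frame_ts[best] - t) <= tol_ns:
            pairs.append((t, frame_ts[best]))
    return pairs
```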


We began this effort by attempting to build a large open olfaction dataset over real-world tasks, similar to CaptainCook4D or the Open X-Embodiment dataset. However, it became clear that solving olfaction for AI and robotics requires time-sequenced multimodal data points that give each chemical reading the context it needs.


Data collection begins now. We have started with a few closed partners to ensure software infrastructure and instrument manufacturing are scalable. Dataset release schedules, benchmarks, and tasks will follow in structured phases as the dataset matures. All data will be open source under the MIT license.

Why This Matters Now

Vision-language models (VLMs) and vision-language-action models (VLAs) are the current baselines for conversations about "superhuman intelligence" or "AGI". We cannot truly entertain such notions while our AI models do not incorporate the full human perception stack. VLMs are also the leading frontier architecture for modern robotics platforms, yet they all lack chemical sensing capabilities. As robotics and embodied AI move into the real world, vision and language alone are not enough. We are already seeing this in simulation-trained robots - real-world interaction is hard, and understanding how the chemical data in the air relates to vision is still a frontier in AI.


Olfaction is not an edge case - it is a missing modality. This initiative is an attempt to give it the shared foundation it needs to scale and enable a critical function for new intelligences.

What's Possible

This dataset is built to be maximally useful for anyone interested in building AI for olfaction. Top ideas proposed so far include:


  • Olfaction-vision-language models and embeddings (see the sketch after this list)
  • Object-aroma localization ("From where is that fruity scent coming in this image?")
  • Scent-based navigation ("Hey, robot. Go find the source of ethanol.")
  • Search and rescue applications
  • Perfumery & cosmetics discovery and quality control
  • Food quality assessments
  • Aroma evolution in long-context scenes
  • Atmospheric research for aerospace
  • Chemical tracking for enhancing navigation on Mars
  • Research into nose prosthetics
  • Breath-based medical diagnostics
  • Research contributing to better olfaction sensor development (my personal favorite)
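As a concrete sketch of the first item, time-matched scent/image pairs could be aligned with a CLIP-style symmetric contrastive objective. The encoders, shapes, and temperature below are assumptions for illustration, not a prescribed ScentNet recipe:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(scent_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired scent/image embeddings,
    in the style of CLIP. Both inputs have shape (batch, dim)."""
    scent = F.normalize(scent_emb, dim=-1)
    image = F.normalize(image_emb, dim=-1)
    logits = scent @ image.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # each scent reading should match its own image, and vice versa
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```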


With Sigma temporally sequencing all six sensing modalities, we expect the community to invent novel applications (and discover entirely new correlations) beyond what is discussed above.

How to Get Involved

Participation will be phased to ensure data quality and scientific rigor. Researchers and labs may:


  • Participate in early data collection rounds
  • Receive an olfaction research instrument
  • Provide feedback on protocols and benchmarks
  • Build and publish models using released data


A limited number of olfaction research instruments will be made available to selected groups under a research instrument license. Startups and industry teams are welcome to engage through non-commercial research use, benchmarking, and collaboration. Details on participation and the first dataset release will be shared shortly.

This is the beginning of a long-term effort. The goal is not speed for its own sake, but a durable foundation in olfaction and artificial intelligence on which the community can build.


All who contribute to this effort will be openly acknowledged and cited.

Privacy

This initiative is an attempt to enable a sense of perception that has been ignored. As such, it is critical that privacy is secured and trust is maintained. Data is collected under full anonymity, and I have put forth my best effort to ensure that the app, olfaction instrument, data, and database all follow HIPAA and GDPR best practices. The privacy agreement is displayed and accepted within the app. You have full power to purge any and all data you collect at any time.


However, I am only one person, and I am doing my best to solicit review from cybersecurity experts. I have an extensive background building secure hardware and software systems, but I am not a lawyer. If you have any qualms about how data is collected, do not participate. There is no money to be made here, and the data will not be sold for financial gain - this is strictly an initiative for science.

Dataset Access & Licensing

All data will be open-source under the MIT license. Research instrument participation requires a research instrument license; detailed technical specifications and dataset governance docs are forthcoming.
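Once released, loading ScentNet should be a one-liner with the Hugging Face datasets library. The repository id below is a placeholder until the official release is announced:

```python
from datasets import load_dataset

# "scentience/scentnet" is a hypothetical id - the official repository
# will be announced with the first dataset release.
ds = load_dataset("scentience/scentnet", split="train")
print(ds[0])  # inspect one time-aligned multimodal sample
```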

Endgame

I have spent my whole PhD and nearly the last decade of my career attempting to understand, standardize, and democratize olfaction for machines - both the hardware and software sides. I have open-sourced multiple AI resources to encourage olfaction in robotics, but it is clear that the discipline will only move forward with better data accumulated en masse.


The horizon of this effort is a large, open multimodal dataset that can power entirely new applications and become a foundational reference for olfaction research, driven by consensus across all disciplines of science. I am self-funding this effort and the manufacturing of the olfaction instruments by choice, because I don't want to be slowed down by fundraising (although there have been several offers). I see clearly what problems artificial olfaction can solve for technology, humanity, and science.


Through Sigma and ScentNet, let's enable the sense of smell for machines together.

- Kordel K. France

PhD Candidate at UT Dallas

Founder of the Scentience Initiative 




(No LLMs were used to construct this - every line is intentional)

Participate in ScentNet

Download Scentience Sigma



See the Sigma Technical Spec
