AI & ML interests

AI4Science

Recent Activity

cgeorgiaw updated a Space 12 days ago
LeMaterial/LeMat-GenBench
cgeorgiaw published a Space 12 days ago
LeMaterial/LeMat-GenBench
thomwolf authored a paper about 2 months ago
Robot Learning: A Tutorial

cgeorgiaw posted an update 18 days ago
cgeorgiaw posted an update 3 months ago
🚀🚀🚀 The largest-ever dataset of co-folded 3D protein-ligand structures just dropped on HF!!

Meet SAIR (Structurally Augmented IC₅₀ Repository): 5M+ AI-generated complexes with experimentally measured drug potency data from SandboxAQ. 🚀🚀🚀

Check it out and explore here: SandboxAQ/SAIR
cgeorgiaw posted an update 4 months ago
cgeorgiaw posted an update 6 months ago
cgeorgiaw posted an update 6 months ago
Snooping on HF is the best because sometimes you just discover that someone (in this case, Earth Species Project) is about to drop terabytes of sick data (high-quality animal sounds)...

EarthSpeciesProject/NatureLM-audio-training
cgeorgiaw posted an update 6 months ago
Just dropped two bigger physics datasets (both on photonics)!

NUMBA 1: SIB-CL
This repository contains the Surrogate- and Invariance-Boosted Contrastive Learning (SIB-CL) datasets for two scientific problems:
- PhC2D: 2D photonic crystal density-of-states (DOS) and bandstructure data.
- TISE: 3D time-independent Schrödinger equation eigenvalue and eigenvector solutions.

NUMBA 2: 2D Photonic Topology
Symmetry-driven analysis of 2D photonic crystals: 10k random unit cells across 11 symmetries, 2 polarizations, 5 contrasts. Includes time-reversal breaking cases for 4 symmetries at high contrast.

Check them out: cgeorgiaw/sib-cl & cgeorgiaw/2d-photonic-topology
clefourrier posted an update 7 months ago
Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

A high-signal eval actually tells you precisely, during training, how well & what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers in depth prompt choice, metrics, and datasets, across languages/capabilities, and my fave section is "which properties should evals have" 👌
(so you know how to select the best evals for your use case)

Blog: HuggingFaceFW/blogpost-fine-tasks