Tristan/olm-october-2022-tokenized-1024-perplexity-filters • Updated Dec 9, 2022 • 12.8M • 216
Tristan/olm-CC-MAIN-2022-40-sampling-ratio-0.15894621295-perplexity-filters • Updated Dec 8, 2022 • 14.6M • 1.91k
Tristan/olm-october-2022-tokenized-1024-no-bigscience-filters • Updated Dec 7, 2022 • 12.9M • 836
Tristan/olm-october-2022-tokenized-1024-exact-dedup-only • Updated Dec 7, 2022 • 13.2M • 1.73k
Tristan/olm-CC-MAIN-2022-40-sampling-ratio-0.15894621295-no-bigscience-filters • Updated Dec 7, 2022 • 16.4M • 1.02k
Tristan/olm-CC-MAIN-2022-40-sampling-ratio-0.15894621295-exact-dedup-only • Updated Dec 6, 2022 • 5.78M • 2.28k
Tristan/olm-CC-MAIN-2022-40-sampling-ratio-0.0001-ne-language • Updated Nov 17, 2022 • 37 • 20