Demo Center for Data Science

From DAAP
Revision as of 13:50, 30 September 2015 by Karima Rafes

For the moment, this page lists all datasets available in https://io.datascience-paris-saclay.fr/.

Dataset | Branch of science | Description | Contact
Air passengers dataset | computer science | The dataset was donated to us by an unnamed company that handles flight ticket reservations. The data is thin: it contains the date of departure, the departure airport, the arrival airport, etc. | Email
Arctic sea ice cover | computer science, climatology | The data is a time series of "images", each consisting of several physical variables on a regular grid over the Earth, indexed by longitude and latitude coordinates. | Email
BioPortal | biology | BioPortal SPARQL is a service for querying biomedical ontologies using the SPARQL standard. The ontologies have been transformed from their original formats (OWL, OBO, UMLS/RRF, ...) into RDF triples and loaded into a triple store. | Email
Biosamples | biology | The BioSample Database (BioSD) is a database at the European Bioinformatics Institute that holds information about the biological samples used in DNA sequencing. | Email
Cell phenotyping | computer science, medicine | The data comes from Samusik et al., where 38 surface markers (features) were measured in cells from the bone marrow of healthy mice. The samples were analyzed and independently hand-gated by experts to identify 24 immune cell populations (classes). | Email
ChEMBL | biology | ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL). | Email
Charged particle tracking in 2D with a possible future LHC Silicon detector | computer science, physics | The data is a list of hit positions from a simple toy detector model that mimics the ATLAS detector design (which is generic enough to represent recent silicon-based tracking detectors). | Email
Climate model simulation | climatology | Monthly anomaly of reference-height temperature from a climate model simulation: model CCSM4 under pre-industrial control conditions (piControl, r2ip1). | Email
Common Frame of Reference for European contract law | private law, contract law, law | This database of relationships between European legal principles and national legal decisions was developed from 2005 to 2008 by 150 researchers grouped in « The Joint Network on European Private Law », for the European project CoPECL (FP6-CITIZENS-3). | Email
DAAP PMM | analytical chemistry | This wiki is a demonstrator for DAAP (Data Acquisition For Analytical Platform), a project under construction. Its aim is to bring together the research community in analytical chemistry. | Email
DBPedia | general knowledge | A knowledge base extracted from Wikipedia. | Email
Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014 | particle physics, machine learning | The dataset was built from official ATLAS simulation, with Higgs-to-tautau events mixed with different backgrounds. It was used in the 2014 HiggsML challenge on Kaggle and is hosted on the CERN Open Data Portal. | Email
Drug Classification | analytical chemistry | The dataset contains Raman spectra of 4 types of chemotherapeutic agents diluted in 9 different solutions at different concentrations. Measurements were made by the Lip(Sys)². | Email
EMBL-EBI resources | biology | The European Bioinformatics Institute (EMBL-EBI) Platform aims to bring together a number of EMBL-EBI resources that provide access to their data using Semantic Web technologies. It provides a unified way to query across these resources using the W3C SPARQL query language. | Email
Ensembl | biology | Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. | Email
Epidemium cancer mortality rate prediction | cancer | This dataset contains mortality rates of different cancer types for several geographic areas. | Email
Epidemium cancer mortality prediction dataset | computer science, medicine | For each geographic area, these datasets give the number of living males and females, the total population, the size of the area and the mortality rate from any type of cancer. | Email
Europeana | culture | (no description) | Email
Expression Atlas | biology | A resource for finding information about gene and protein expression across species and biological conditions. It aims to help answer questions such as ‘where is a certain gene expressed?’ or ‘how does its expression change in a disease?’. | Email
French National Library | library | The data.bnf.fr project aims to make the data produced by the Bibliothèque nationale de France (French National Library) more useful on the Web. | Email
Gregorius | canon law, legal history, legal history of the Catholic Church | Database on canon law. | Email
IODS | linked data | List of open datasets in io.datascience-paris-saclay.fr. | Email
IdRef | authority control | IdRef authority records and bibliographic references from the Sudoc. All types of authority records are included: persons, corporate bodies, common nouns (Rameau and FMeSH), geographic names, families and titles. | Email
LRI Information System | computer science | Information about the scientists of the laboratory. | Email
Libraries of Paris-Saclay University | library | List of libraries in Paris-Saclay University. | Email
Madelon | data science | One of the datasets of the NIPS 2003 feature selection challenge. | Email
Modified HiggsML dataset | particle physics | A version of the HiggsML dataset, which contains a mixture of Higgs bosons decaying into tau pairs and the principal background processes (800K events in total). Half of the events are unchanged; the other half have been artificially distorted or corrupted in some way. | Email
National Center for Atmospheric Research (NCAR) | climatology, atmospheric chemistry | The US National Center for Atmospheric Research studies meteorology, climate science, atmospheric chemistry, solar-terrestrial interactions, and environmental and societal impacts. | Email
Ontology Lookup Service | biology | The Ontology Lookup Service (OLS) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. | Email
OrthoDB | biology | Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. | Email
Planetary SUrface Portal (PSUP) | planetary science | This facility combines a data processing center with a planetary surface data dissemination center (mineralogical maps, geomorphologic maps, DTMs, ...). The Planetary SUrface Portal is an initiative of OSUPS and OSUL. | Email
Pollinating insect classification (SPIPOLL) | entomology, natural science | The SPIPOLL (Suivi Photographique des Insectes POLLinisateurs, "photographic monitoring of pollinating insects") project aims to quantitatively study pollinating insects in France. | Email
Quaero Broadcast News Extended Named Entity corpus | computer science | The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions from 3 different ASR systems). | Email
Quaero French Medical Corpus | computer science | The QUAERO French Medical Corpus is a selection of MEDLINE titles and EMEA documents manually annotated as a resource for named entity recognition and normalization. It was used as a gold standard for French biomedical text in the CLEF eHealth evaluation lab in 2015 and 2016. | Email
Quaero Old Press Extended Named Entity corpus | computer science | Manual annotation of 76 newspaper issues from 1890-1891 (Le Temps, La Croix and Le Figaro) according to the Quaero extended and structured named entity definition. Training: 231 pages, 1,297,742 words, 114,599 types, 136,113 components. Test: 64 pages, 363,455 words, 33,083 types, 40,432 components. | Email
Reactome | biology | Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. | Email
Reddit Public Comments (2007-10 through 2015-05) | sociology | ~1.7 billion JSON comment objects from reddit.com, including the comment text, score, author, subreddit, position in the comment tree and other fields available through Reddit's API. | Email
Scholarly Linked Open Data | computer science | ScholarlyData.org provides facilities and services to publish your scholarly data as Linked Open Data. | Email
Semantic description of Debian packages | computer science, free and open-source software, software engineering | Semantic description of the packages produced by the Debian project. | Email
Sparql Score | computer science | SPARQLScore is an attempt to evaluate the conformance of triplestores to the W3C standards. | Email
Synchrotron soleil | physics, biology | Data about the French national synchrotron facility. | Email
The MNIST database of handwritten digits | computer science, artificial intelligence | The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. | Email
WikiPathways | biology | WikiPathways is a wiki for biological pathways. It is intended to be an open, public space for editing content about biological pathways, facilitating the contribution and maintenance of pathway information by the scientific community. | Email
Wikidata | general knowledge, Semantic Web | Wikidata is a free linked database for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource and others. The content is available under a free license, exported using standard formats, and can be interlinked with other open datasets on the linked data web. | Email
YAGO | general knowledge | A knowledge base extracted from Wikipedia, containing general knowledge about famous people, cities, countries, movies, organizations, etc., together with a taxonomy from WordNet. | Email
efSUP_sem1 | massive open online course | Dataset for testing. | Email
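Several of the datasets above (for example BioPortal and the EMBL-EBI resources) are queried through SPARQL endpoints. The following is a minimal sketch, assuming a generic endpoint that follows the W3C SPARQL 1.1 Protocol: a query is sent as an HTTP GET with a `query` parameter, and results in the standard SPARQL JSON format are flattened into one row per solution. The URL `https://example.org/sparql` is a placeholder, not the address of any service listed here, and the helper names `build_sparql_request` and `parse_bindings` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: replace with the SPARQL endpoint of the dataset you use.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from a binding, so they are skipped.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
    # To actually run the query against a real endpoint:
    # urllib.request.urlopen(req).read().decode("utf-8")
```

The request is only built, not sent, so the sketch runs offline; passing it to `urllib.request.urlopen` would execute the query.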
List generated at 21:23:04 on 02/16/2019.