Demo Center for Data Science

Revision as of 13:46, 30 September 2015 by Karima Rafes

For the moment, this page lists all the data available at https://io.datascience-paris-saclay.fr/.

Branch of science || Dataset || Description || Contact
computer science || Air passengers dataset || The data set was donated to us by an unnamed company handling flight ticket reservations. The data is thin: it contains the date of departure, the departure airport, the arrival airport, etc. || Email
computer science, climatology || Arctic sea ice cover || The data is a time series of "images", consisting of different physical variables on a regular grid on the Earth, indexed by longitude and latitude coordinates. || Email
biology || BioPortal || BioPortal SPARQL is a service to query biomedical ontologies using the SPARQL standard. Ontologies have been transformed into RDF triples from their original formats (OWL, OBO and UMLS/RRF, ...) and asserted into a triple store. || Email
biology || Biosamples || The BioSample Database (BioSD) is a database at the European Bioinformatics Institute holding information about the biological samples used in DNA sequencing. || Email
computer science, medicine || Cell phenotyping || The data is from Samusik et al., where 38 surface markers (features) were measured in cells from the bone marrow of healthy mice. The samples were analyzed and independently hand-gated by experts to identify 24 immune cell populations (classes). || Email
biology || ChEMBL || ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL). || Email
computer science, physics || Charged particle tracking in 2D with a possible future LHC Silicon detector || The data provided is a list of hit positions from a simple toy detector model that mimics the ATLAS detector design (which is generic enough for recent silicon-based tracking detectors). || Email
climatology || Climate model simulation || Reference height temperature data, monthly anomaly. Climate model simulation: model CCSM4 in pre-industrial control conditions (piControl, r2ip1). || Email
stellar physics, Exoplanet || CoRot || CoRoT was a CNES space mission dedicated to exoplanet search and stellar physics. The main products available in this archive are light curves. These light curves, labelled N2, are ready for science use. || Email
private law, contract law, law || Common Frame of Reference for European contract law || This database, containing relationships between European legal principles and national legal decisions, was developed from 2005 to 2008 by 150 researchers grouped in "The Joint Network on European Private Law" for the European project CoPECL (FP6-CITIZENS-3). || Email
analytical chemistry || DAAP PMM || This wiki is a demonstrator for DAAP (Data Acquisition For Analytical Platform), a project under construction. Its aim is to bring together the research community in analytical chemistry. || Email
general knowledge || DBPedia || A knowledge base extracted from Wikipedia. || Email
particle physics, machine learning || Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014 || The dataset has been built from official ATLAS simulation, with Higgs to tautau events mixed with different backgrounds. It was used in the 2014 HiggsML challenge on Kaggle. It is hosted on the CERN Open Data Portal. || Email
analytical chemistry || Drug Classification || The dataset contains Raman spectra of 4 types of chemotherapeutic agents diluted in 9 different solutions at different concentrations. Measurements were made by Lip(Sys)². || Email
biology || EMBL-EBI resources || The European Bioinformatics Institute (EMBL-EBI) Platform aims to bring together the efforts of a number of EMBL-EBI resources that provide access to their data using Semantic Web technologies. It provides a unified way to query across resources using the W3C SPARQL query language. || Email
biology || Ensembl || Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. || Email
Cancer || Epidemium cancer mortality rate prediction || This dataset contains mortality rates of different cancer types for several geographic areas. || Email
computer science, medicine || Epidemium cancer mortality prediction dataset || These datasets cover the number of males/females living, the number of people living in the geographic area, the size of the geographic area and the mortality rate from any type of cancer. || Email
Culture || Europeana || description || Email
biology || Expression Atlas || A powerful way to find information about gene and protein expression across species and biological conditions. It aims to help answer questions such as 'where is a certain gene expressed?' or 'how does its expression change in a disease?'. || Email
library || French National Library || The data.bnf.fr project endeavours to make the data produced by the Bibliothèque nationale de France (French National Library) more useful on the Web. || Email
canon law, legal history, Legal history of the Catholic Church || Gregorius || Database on canon law. || Email
cosmology || HESIOD || The Herschel IdOc Database delivers photometric maps and spectral cubes from the PACS and SPIRE instruments (IR domain), reprocessed at IAS with the latest ESA pipelines and with high-level customized pipelines. Virtual Observatory compatible. || Email
Linked data || IODS || List of open datasets in io.datascience-paris-saclay.fr. || Email
authority control || IdRef || Here you will find the IdRef authority records and the bibliographic references from Sudoc. All types of authority record are present: Persons, Corporate Bodies, Common Nouns (Rameau and FMeSH), Geographic Names, Families and Titles. || Email
computer science || LRI Information System || Scientists of the laboratory. || Email
library || Libraries of Paris-Saclay University || List of libraries at Paris-Saclay University. || Email
data science || Madelon || This is one of the datasets for the NIPS 2003 feature selection challenge. || Email
particle physics || Modified HiggsML dataset || This dataset is a version of the HiggsML dataset, which contains a mixture of Higgs particles decaying into tau pairs and the principal background processes (800K events in total). Half of the events are unchanged, but the other half has been artificially distorted or corrupted in some way. || Email
climatology, Atmospheric chemistry || National Center for Atmospheric Research (NCAR) || The US National Center for Atmospheric Research studies meteorology, climate science, atmospheric chemistry, solar-terrestrial interactions, and environmental and societal impacts. || Email
biology || Ontology Lookup Service || The Ontology Lookup Service (OLS) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. || Email
biology || OrthoDB || Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. || Email
entomology, natural science || Pollinating insect classification (SPIPOLL) || The SPIPOLL (Suivi Photographique des Insectes POLLinisateurs) project proposes to quantitatively study pollinating insects in France. || Email
computer science || Quaero Broadcast News Extended Named Entity corpus || The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions coming from 3 different ASR systems). || Email
computer science || Quaero French Medical Corpus || The QUAERO French Medical Corpus is a selection of MEDLINE titles and EMEA documents manually annotated as a resource for named entity recognition and normalization. It was used as a gold standard for French biomedical text in the CLEF eHealth evaluation lab in 2015 and 2016. || Email
computer science || Quaero Old Press Extended Named Entity corpus || Manual annotation of 76 newspaper issues from 1890-1891 (Le Temps, La Croix and Le Figaro) according to the Quaero extended and structured named entity definition. Training: 231 pages, 1,297,742 words, 114,599 types, 136,113 components. Test: 64 pages, 363,455 words, 33,083 types, 40,432 components. || Email
biology || Reactome || Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. || Email
sociology || Reddit Public Comments (2007-10 through 2015-05) || ~1.7 billion JSON comment objects from reddit.com, complete with the comment, score, author, subreddit, position in comment tree and other fields that are available through Reddit's API. || Email
cosmology || SZ cluster database || This database provides access to catalogues and complementary information on clusters of galaxies observed through the Sunyaev-Zeldovich (SZ) effect. This Planck SZ cluster catalogue is accessible on the Virtual Observatory. || Email
computer science || Scholarly Linked Open Data || ScholarlyData.org provides facilities and services to publish your scholarly data as Linked Open Data. || Email
computer science, free and open-source software, software engineering || Semantic description of Debian packages || Semantic description of packages produced by the Debian project. || Email
computer science || Sparql Score || SPARQLScore is an attempt to evaluate the conformance of triplestores to the W3C standards. || Email
physics, biology || Synchrotron soleil || Data about the French national synchrotron facility. || Email
computer science, artificial intelligence || The MNIST database of handwritten digits || The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. || Email
biology || WikiPathways || WikiPathways is a wiki for biological pathways. It is intended to be an open, public space for content editing dedicated to biological pathways, facilitating the contribution and maintenance of pathway information from the scientific community. || Email
general knowledge, Semantic Web || Wikidata || Wikidata is a free linked database for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource and others. The content is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web. || Email
general knowledge || YAGO || A knowledge base extracted from Wikipedia, containing general knowledge about famous people, cities, countries, movies, organizations, etc., together with a taxonomy from WordNet. || Email
massive open online course || efSUP_sem1 || Test dataset. || Email
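
Several entries above (BioPortal, EMBL-EBI resources) are queried with SPARQL. As a rough sketch only, the snippet below shows how a client might turn a response in the standard SPARQL 1.1 JSON results format into rows shaped like the entries of this list. The variable names `branch`, `dataset` and `description` and the sample payload are assumptions made for illustration; they are not taken from the query behind this page or from any real endpoint.

```python
import json

# Hypothetical variable names; the real query behind this page may use others.
COLUMNS = ("branch", "dataset", "description")


def rows_from_sparql_json(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into plain dict rows."""
    doc = json.loads(payload)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from a binding, so default to "".
        rows.append(
            {col: binding.get(col, {}).get("value", "") for col in COLUMNS}
        )
    return rows


# Hand-written sample in the standard result layout (not real endpoint output).
SAMPLE = json.dumps({
    "head": {"vars": list(COLUMNS)},
    "results": {"bindings": [
        {
            "branch": {"type": "literal", "value": "biology"},
            "dataset": {"type": "literal", "value": "Reactome"},
        },
    ]},
})

if __name__ == "__main__":
    for row in rows_from_sparql_json(SAMPLE):
        print(row)
```

Defaulting missing variables to an empty string keeps every row the same shape, which matters here because some catalogue entries have no real description.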