Demo Center for Data Science

From DAAP
Revision as of 13:46, 30 September 2015 by Karima Rafes (talk | contribs) (Created page with "For the moment list of all available data in https://io.datascience-paris-saclay.fr/. {{#sparql: PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> PREFIX rdfs:<http://www.w3....")

For the moment, this page lists all the data available in https://io.datascience-paris-saclay.fr/.

Branch of science Dataset Description Contact
materials science AFLOW AFLOW is a database of more than 3 000 000 material compounds and 500 000 000 calculated properties, such as elastic and thermal properties, with online applications and documentation. Email
computer science Air passengers dataset The data set was donated to us by an unnamed company handling flight ticket reservations. The data is thin: it contains the date of departure, the departure airport, the arrival airport, etc. Email
materials science American Mineralogist Crystal Structure Database This site is an interface to a crystal structure database that includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals. Email
computer science, climatology Arctic sea ice cover The data is a time series of "images", consisting of different physical variables on a regular grid on the Earth, indexed by longitude and latitude coordinates. Email
information extraction, life sciences BioNLP-ST 2013 Bacteria Biotopes The knowledge tackled by this task is the habitats where bacteria live, and the environment properties of bacteria. This information is particularly interesting in the fields of food processing and safety, health sciences and waste processing. Email
biology, life science BioPortal BioPortal SPARQL is a service to query BioMedical ontologies using the SPARQL standard. Ontologies have been transformed into RDF triples from their original formats (OWL, OBO and UMLS/RRF, ...) and asserted into a triple store. Email
biology, life science Biomodels The BioModels linked data set contains all curated and non-curated SBML models in the BioModels repository, in RDF. Email
biology, life science Biosamples The BioSample Database (BioSD) is a database at the European Bioinformatics Institute for information about the biological samples used in DNA sequencing. Email
materials science CSD-Community List of crystallographic tools ranging from data collection, validation and visualisation to teaching, research and analysis tools. Email
computer science, medicine Cell phenotyping The data is from Samusik et al. where 38 surface markers (features) were measured in cells from the bone marrow of healthy mice. The samples were analyzed and independently hand-gated by experts to identify 24 immune cell populations (classes). Email
biology, life science ChEMBL ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL). Email
computer science, physics Charged particle tracking in 2D with a possible future LHC Silicon detector The data provided is a list of hit positions from a simple toy detector model that mimics the ATLAS detector design (which is generic enough for recent silicon-based tracking detectors). Email
climatology Climate model simulation Reference height temperature data, monthly anomaly. Climate model simulation: model CCSM4 in pre-industrial control conditions (piControl, r2ip1). Email
stellar physics, Exoplanet CoRot CoRoT was a CNES space mission dedicated to exoplanet search and stellar physics. The main products available in this archive are light curves. These light curves, labelled N2 (detailed description), are ready for science use. Email
private law, contract law, law Common Frame of Reference for European contract law This database, containing relationships between European legal principles and national legal decisions, was developed from 2005 to 2008 by 150 researchers grouped in « The Joint Network on European Private Law » for the European project CoPECL (FP6-CITIZENS-3). Email
crystallography Crystal Lattice Structures This page offers a concise index of common crystal lattice structures. A graphical representation as well as useful information about the lattices can be obtained by clicking on the desired structure. Email
materials science, crystallography Crystallography Open Database Open-access collection of crystal structures of organic, inorganic and metal-organic compounds and minerals, excluding biopolymers. Email
analytical chemistry, chemistry DAAP Lip(Sys)² This wiki is a demonstrator for the project under construction DAAP (Data Acquisition For Analytical Platform). The target is to bring together the research community in Analytical Chemistry. The first step is to reference the resources available to researchers, and then to share data. Email
analytical chemistry DAAP PMM This wiki is a demonstrator for the project under construction DAAP (Data Acquisition For Analytical Platform). Its target is to bring together the research community in Analytical Chemistry. Email
document DATABASES, REVIEWS AND BOOKS ONLINE Focus Paris-Sud is the document search engine that lets you reach, in a single search, a large part of the documentation available at Paris-Sud University regardless of medium: books and e-books, book chapters, paper and online theses, journals. Email
general knowledge DBPedia A knowledge base extracted from Wikipedia Email
particle physics, machine learning Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014 The dataset has been built from official ATLAS simulation, with Higgs to tautau events mixed with different backgrounds. It has been used in the 2014 HiggsML challenge on Kaggle. It is hosted on the CERN Open Data Portal. Email
analytical chemistry Drug Classification The dataset contains Raman spectra of 4 types of chemotherapeutic agents diluted in 9 different solutions at different concentrations. Measurements were made by Lip(Sys)². Email
biology, life science EMBL-EBI resources The European Bioinformatics Institute (EMBL-EBI) Platform aims to bring together the efforts of a number of EMBL-EBI resources that provide access to their data using Semantic Web technologies. It provides a unified way to query across resources using the W3C SPARQL query language. Email
biology, life science Ensembl Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Email
Cancer Epidemium cancer mortality rate prediction This dataset contains mortality rates of different cancer types for several geographic areas. Email
computer science, medicine Epidemium cancer mortality prediction dataset These datasets cover the number of males/females living, the number of people living in the geographic area, the size of the geographic area and the mortality rate from any type of cancer. Email
biology, life science Expression Atlas A powerful way to find information about gene and protein expression across species and biological conditions. It aims to help answer questions such as ‘where is a certain gene expressed?’ or ‘how does its expression change in a disease?’. Email
chemistry, physics, thermodynamics FactSage Contains data about substances, pure solutions, pure metals or metallic solutions, liquids, ... Calculations such as phase diagrams or pH potential, equilibrium or chemical reactions are available. Email
virtual reality, fluid mechanics, HCI FluidMeca Fluid mechanics results including tags. Real-time simulation benchmarks. Email
library French National Library The data.bnf.fr project endeavours to make the data produced by Bibliothèque nationale de France (French National Library) more useful on the Web. Email
canon law, legal history, Legal history of the Catholic Church Gregorius Canon law database. Email
computer science Grid Observatory 3.0 The Grid Observatory 3.0 aims to publish the Grid Observatory and Green Computing Observatory data in an open and interoperable format in order to facilitate access to these data and the cross-analysis of these complementary data sources. Email
cosmology HESIOD The Herschel IdOc Database is delivering photometric maps and spectral cubes from the PACS and SPIRE instruments (IR domain), reprocessed at IAS with the latest ESA pipelines and with high level customized pipelines. Virtual Observatory compatible. Email
planetary science, hyperspectral imaging Hyperspectral classification toolkit Hyperspectral classification toolkit containing one hyperspectral cube example (from OMEGA instrument), reference spectral database and reference classification. This toolkit is intended for classification testing. Email
linked data IODS List of open datasets in io.datascience-paris-saclay.fr Email
authority control IdRef Here you will find the IdRef authority records and the bibliographic references from Sudoc. All types of authority records are present: Persons, Corporate bodies, Common names (Rameau and FMeSH), Geographic names, Families and Titles. Email
computer science LRI Information System Scientists of the laboratory Email
culture Libraries of Paris-Saclay University List of libraries in the Paris-Saclay University. Email
analytical chemistry, biology, life science Lipid modifications in J774 macrophages by vibrational spectroscopies Investigation of lipid modifications in J774 macrophages Email
Solar Physics MEDOC MEDOC (Multi Experiment Data & Operation Center) is a National Center for Space Solar Physics Data, approved by CNES, in the frame of an agreement between CNRS/INSU, Université Paris-Sud and CNES. MEDOC is located at Institut d'Astrophysique Spatiale in Orsay. Email
materials science MINCRYST An original combination of the Crystal Structure Database for Minerals, the automatically formed Calculated Powder X-ray Diffraction Standards (CPDS) SubBase and the Applied Program Package, which uses the saved information for Powder X-Ray Diffraction and Crystal Chemical Analysis. Email
data science Madelon This is one of the datasets for the NIPS 2003 feature selection challenge. Email
chemistry, material, laboratory Materials Virtual Lab The Materials Virtual Lab is a materials AI group focused on the cross-disciplinary application of machine learning to large materials data sets to accelerate materials design. It's not a proper database but it can be used to get to other databases. Email
materials science Mineralogy Database The Mineralogy Database was last updated on 9/5/2012 and it contains 4,714 individual mineral species descriptions with links and a comprehensive image library. Email
analytical chemistry, lipidomics MoDALMI The goal of this project is to create an open-access database in the analytical chemistry field for lipids, metabolites and isotopes, in an accessible common format, with metadata specifications. Email
particle physics Modified HiggsML dataset This dataset is a version of the HiggsML dataset, which contains a mixture of Higgs particles decaying into tau pairs and the principal background processes (800K events in total). Half of the events are unchanged, but the other half has been artificially distorted or corrupted in some way. Email
chemistry, thermodynamics, thermochemistry NIST-JANAF Thermochemical Tables NIST-JANAF gathers exhaustive information about chemical elements or compounds. The database can be used in different ways. For example, one can type a chemical formula to access specific information, or an element of the periodic table to access all the possible compounds. Email
materials science NOMAD Laboratory Data on chemistry, chemical elements, crystallography and materials. Email
climatology, atmospheric chemistry National Center for Atmospheric Research (NCAR) The US National Center for Atmospheric Research studies meteorology, climate science, atmospheric chemistry, solar-terrestrial interactions, environmental and societal impacts. Email
chemistry, science National Institute of Standards and Technology (NIST) NIST produces the Nation’s Standard Reference Data (SRD). These data are assessed by experts and are trustworthy such that people can use the data with confidence and base significant decisions on the data. NIST provides 49 free SRD databases and 41 fee-based SRD databases. Email
biology, life science Ontology Lookup Service The Ontology Lookup Service (OLS) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. Email
biology, life science OrthoDB Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Email
planetary science Planetary SUrface Portal (PSUP) This facility involves a data processing center coupled with a planetary surface data dissemination center (mineralogical maps, geomorphologic maps, DTM...). Planetary SUrface Portal is an initiative from OSUPS and OSUL. Email
entomology, natural science Pollinating insect classification (SPIPOLL) The SPIPOLL (Suivi Photographique des Insectes POLLinisateurs) project proposes to quantitatively study pollinating insects in France. Email
computer science Quaero Broadcast News Extended Named Entity corpus The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions coming from 3 different ASR systems). Email
computer science Quaero French Medical Corpus The QUAERO French Medical Corpus is a selection of MEDLINE titles and EMEA documents manually annotated as a resource for named entity recognition and normalization. It was used as a gold standard for French biomedical text in the CLEF eHealth evaluation lab in 2015 and 2016. Email
computer science Quaero Old Press Extended Named Entity corpus Manual annotation of 76 newspaper issues of 1890-1891: Le Temps, La Croix and Le Figaro according to the Quaero extended and structured named entity definition. Training: 231 pages, 1,297,742 words, 114,599 types, 136,113 components. Test: 64 pages, 363,455 words, 33,083 types, 40,432 components. Email
biology, life science Reactome Reactome is a free, open-source, curated and peer reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. Email
sociology Reddit Public Comments (2007-10 through 2015-05) ~1.7 billion JSON comment objects from reddit.com complete with the comment, score, author, subreddit, position in comment tree and other fields that are available through Reddit's API. Email
cosmology SZ cluster database This database provides access to catalogues and complementary information on clusters of galaxies observed through the Sunyaev-Zeldovich (SZ) effect. This Planck SZ cluster catalogue is accessible on the Virtual Observatory. Email
Semantic Web Scholarly Linked Open Data ScholarlyData.org provides facilities and services to publish your scholarly data as Linked Open Data. Email
computer science Semantic description of Debian packages Semantic description of packages produced by the Debian projects Email
computer science Sparql Score SPARQLScore is an attempt to evaluate the conformance of triplestores to the W3C standards. Email
physics, biology, life science Synchrotron soleil Data about the French national synchrotron facility. Email
computer science, artificial intelligence The MNIST database of handwritten digits The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. Email
biology, life science WikiPathways WikiPathways is a Wiki for biological pathways. WikiPathways is intended to be an open, public space for content editing dedicated to biological pathways, facilitating the contribution and maintenance of pathway information from the scientific community. Email
general knowledge, Semantic Web Wikidata Wikidata is a free linked database for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource and others. The content is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web. Email
general knowledge YAGO A knowledge base extracted from Wikipedia, containing general knowledge about famous people, cities, countries, movies, organizations, etc, together with a taxonomy from WordNet. Email
MOOC efSUP_sem1 Dataset for testing. Email
NLP free GRACE French literature texts tagged with their parts of speech (POS). Email
chemistry, crystallography the Bilbao Crystallographic Server Bilbao Crystallographic Server is an open-access website offering an online crystallographic database and programs aimed at analyzing, calculating and visualizing problems of structural and mathematical crystallography, solid state physics and structural chemistry. Email
23:24:45 11/13/2019 -- Duration of query: 0.063s
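
The listing above is generated by a SPARQL query embedded in the page (see the revision note at the top). As a rough illustration of how such a listing could be fetched and flattened into rows, here is a minimal Python sketch. It assumes an endpoint that speaks the standard SPARQL protocol and returns the SPARQL 1.1 JSON results format. The query pattern, the helper names `run_query` and `bindings_to_rows`, and the example.org sample data are all hypothetical: the page's actual query is truncated in the revision note and is not reproduced here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical query: the real query used by this page is truncated in the
# revision note, so this graph pattern is only an illustrative assumption.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?dataset ?label WHERE {
  ?dataset rdfs:label ?label .
}
ORDER BY ?label
LIMIT 100
"""


def run_query(endpoint, query):
    """Send a query over HTTP GET (SPARQL protocol) and return the parsed JSON.

    `endpoint` is whatever SPARQL endpoint URL you have access to; none is
    assumed here.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def bindings_to_rows(result):
    """Flatten a SPARQL 1.1 JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


# Made-up sample in the standard results format, so the sketch runs offline.
SAMPLE = {
    "head": {"vars": ["dataset", "label"]},
    "results": {
        "bindings": [
            {
                "dataset": {"type": "uri", "value": "http://example.org/dataset/1"},
                "label": {"type": "literal", "value": "Example dataset"},
            },
            {"dataset": {"type": "uri", "value": "http://example.org/dataset/2"}},
        ]
    },
}

rows = bindings_to_rows(SAMPLE)
```

With a real endpoint, `bindings_to_rows(run_query(endpoint, QUERY))` would yield one dict per table row, which can then be rendered in whatever layout the page needs.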