
Explore every episode of the Machine-Centric Science podcast

Browse the complete list of Machine-Centric Science episodes. Each episode is catalogued with a detailed description, which makes it easier to search for and explore specific topics. Follow every episode of your favorite podcast and never miss relevant content.


Title | Date | Duration
Sandra Gesing | 17 Feb 2023 | 00:41:21

An interview about FAIR software, workflows, and virtual research environments (VREs) / science gateways with Sandra Gesing, currently a Senior Research Scientist and Scientific Outreach and Diversity, Equity, and Inclusion (DEI) Lead at the Discovery Partners Institute at the University of Illinois, Chicago.

Christophe Blanchi | 18 Jan 2023 | 01:12:10

Patrick Huck | 21 Jul 2022 | 00:53:08

Materials Project (MP) website: https://materialsproject.org/

Novel Materials Discovery (NOMAD) Laboratory: https://nomad-lab.eu/

Contributor Roles Taxonomy: https://credit.niso.org/

Authentication resources (FAIR A1.2):
- https://portier.github.io/using.html
- https://github.com/simov/grant
- https://docs.konghq.com/

U.S. Department of Energy resources:
- Office of Scientific and Technical Information (OSTI) Data ID Service: https://www.osti.gov/data-services
- https://www.energy.gov/science/office-science-pure-data-resources

Connecting with Patrick:
- https://www.linkedin.com/in/tschaume/
- https://twitter.com/tschaume
- https://appliedenergyscience.lbl.gov/people/patrick-huck

FAIR Implementation Profile (FIP) Ontology | 15 Jul 2022 | 00:09:43

The FAIR Implementation Profile (FIP) Ontology: https://w3id.org/fair/fip/terms/FIP-Ontology

R1.3: metadata and data meet domain-relevant community standards | 20 Jun 2022 | 00:08:10

Linked Open Vocabularies (LOV): https://lov.linkeddata.es/dataset/lov/

FAIRSharing: https://fairsharing.org/

PageRank of Linked Open Vocabularies (LOV): https://donnywinston.com/posts/pagerank-of-linked-open-vocabularies-lov/

Principles of Open Scholarly Infrastructure (POSI): https://openscholarlyinfrastructure.org/

R1.2: Metadata and data are associated with detailed provenance | 02 Jun 2022 | 00:07:29

https://www.w3.org/TR/prov-dm/#dfn-provenance

# Component 1: Entities/Activities:
Type: Entity
Type: Activity
Relation: Generation/Invalidation (E-Act)
Relation: Usage (Act-E)
Relation: Communication (Act1-[E]-Act2)

Relation: Trigger/Starter of Start of Act (trigger E, starter Act)
Relation: Trigger/Ender of End of Act (trigger E, ender Act)


# Component 2: Derivations:
Relation: Derivation (E-E, E-Act)

Relation: Revision (E-E)
Relation: Quotation (E-E)
Relation: Primary Source (E-E)

# Component 3: Agents, Responsibility, and Influence
Type: Agent
Relation: Attribution (E-Agt)
Relation: Association (Act-Agt (role), Act-E (plan))
Relation: Delegation (Agt-Agt, optionally scoped to an Act) - acted on behalf of

Relation: Influencer/Influencee ({E,Act,Agt}-[usage,start,end,generation,invalidation,communication,derivation,attribution,association,delegation]-{E,Act,Agt})

3 core types: entities, activities, agents. “instantaneous events” are put in context of activities.
wrt "time instants":
- generation is at instant of completion of production
- usage is at instant of beginning of utilization
- start, when activity is deemed started, is an instant
- end, when activity is deemed ended, is an instant
- invalidation is at instant of start of destruction, cessation, or expiry

10 influencing relations (not including 3 included subtypes of derivation - (1) [was] revision [of], (2) quotation ("was quoted from"), (3) [had] primary source).
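
The three core types and a few of the influencing relations above can be sketched as a tiny in-memory graph. This is only an illustration, not the PROV data model itself: the `Node` and `ProvGraph` classes and the names `raw.csv`, `clean.csv`, `cleaning-run`, and `alice` are hypothetical, and each relation is stored in the "influencee was influenced by influencer" direction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One of the three core PROV types."""
    kind: str  # "Entity", "Activity", or "Agent"
    name: str  # hypothetical local name, for illustration only


@dataclass
class ProvGraph:
    # Each relation is stored as (relation_name, influencee, influencer).
    relations: list = field(default_factory=list)

    def relate(self, relation: str, influencee: Node, influencer: Node) -> None:
        self.relations.append((relation, influencee, influencer))

    def influencers_of(self, node: Node) -> list:
        """Everything that directly influenced `node`, in insertion order."""
        return [src for (_, dst, src) in self.relations if dst == node]


raw = Node("Entity", "raw.csv")
clean = Node("Entity", "clean.csv")
cleaning = Node("Activity", "cleaning-run")
alice = Node("Agent", "alice")

graph = ProvGraph()
graph.relate("wasGeneratedBy", clean, cleaning)     # Generation (E-Act)
graph.relate("used", cleaning, raw)                 # Usage (Act-E)
graph.relate("wasDerivedFrom", clean, raw)          # Derivation (E-E)
graph.relate("wasAttributedTo", clean, alice)       # Attribution (E-Agt)
graph.relate("wasAssociatedWith", cleaning, alice)  # Association (Act-Agt)

print([n.name for n in graph.influencers_of(clean)])
```

Because every relation here is treated as a kind of influence, a single generic query (`influencers_of`) works across all of them, which is roughly the role the Influencer/Influencee relation plays.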

R1.1: (Meta)data are released with a clear and accessible data usage license | 25 May 2022 | 00:11:45

The Creative Commons suite of licenses: CC0, CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA, CC BY-NC-ND.

Code licenses: Server Side Public License, Affero GPL (AGPL), Lesser GPL (LGPL), Mozilla Public License (MPL), Business Source License (used e.g. by Sentry, <https://github.com/getsentry/sentry/blob/master/LICENSE>), Elastic License (for Elasticsearch), Apache 2.0, BSD, MIT. Spectrum of user freedom and redistributor freedom.

"The CRAPL: An academic-strength open source license": <https://matt.might.net/articles/crapl/>

R1: (Meta)data are richly described with a plurality of accurate and relevant attributes | 18 May 2022 | 00:09:12

* https://queryunderstanding.com
* http://contentunderstanding.com
* https://www.w3.org/TR/json-ld11-framing/
* https://www.w3.org/TR/shacl/
* https://jasonformat.com/islands-architecture/
* https://www.hydra-cg.com/spec/latest/core/

I3: (meta)data include qualified references to other (meta)data | 12 May 2022 | 00:05:47

In the W3C Provenance Ontology:
https://www.w3.org/TR/prov-o/#wasDerivedFrom

The HTML Anchor Element:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
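
One way to read "qualified" is that a reference states how two resources relate, not merely that they are linked. The sketch below writes such a reference as a single N-Triples statement using `prov:wasDerivedFrom`; the `example.org` URLs and the `n_triple` helper are hypothetical and only for illustration.

```python
PROV_NS = "http://www.w3.org/ns/prov#"


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """Format three IRIs as one N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical dataset URLs; the predicate carries the qualification.
qualified = n_triple(
    "https://example.org/data/clean",
    PROV_NS + "wasDerivedFrom",
    "https://example.org/data/raw",
)
print(qualified)
```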

I2: (Meta)data use vocabularies that follow the FAIR principles | 04 May 2022 | 00:06:26

Heather Hedden, "Foundation for a Knowledge Graph Taxonomy Design Best Practices", slides at https://zenodo.org/record/6510205

Teodora Petkova, "The Dialogic Potential of the Web of Data", slides at https://zenodo.org/record/6518557

https://en.wikipedia.org/wiki/Bohm_Dialogue

Tim Berners-Lee's bag of chips

https://www.w3.org/TR/vocab-dcat-2/#Class:Dataset

https://schema.org/Dataset

I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | 27 Apr 2022 | 00:08:25

GUPRIs, RDF, RDFS, OWL, SHACL, JSON, JSON-LD, JSON Schema, ActivityPub, "fediverse", XMPP, SMTP.

A2. Metadata are accessible, even when the data are no longer available | 19 Apr 2022 | 00:04:39

Archival Resource Key (ARK) specification (section on policy metadata): https://datatracker.ietf.org/doc/html/draft-kunze-ark-34#section-5.1.1.

Permanence Levels and the Archives for NIH NLM's Permanent Web Documents: https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html.

Vineeth Venugopal | 31 Oct 2022 | 00:59:19

https://en.wikipedia.org/wiki/Interatomic_potential

A1.2: The protocol allows for authentication and authorisation where necessary | 13 Apr 2022 | 00:05:04

A brief dip into the world of HTTP auth. The Authorization request header. The WWW-Authenticate response header. Basic authentication. Bearer-based authentication. Authenticating securely. Shared secrets versus asymmetric encryption (for non-repudiation).
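
As a rough sketch of the headers mentioned above, the functions below build `Authorization` values for the Basic and Bearer schemes and read the scheme named in a `WWW-Authenticate` challenge. The function names are hypothetical, and the credentials are the well-known example pair from the Basic authentication RFC. Basic credentials are only base64-encoded, not encrypted, so "authenticating securely" in practice means sending them over TLS.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(token: str) -> dict:
    """Build an Authorization header for the Bearer scheme."""
    return {"Authorization": f"Bearer {token}"}


def challenge_scheme(www_authenticate: str) -> str:
    """Return the first auth scheme named in a WWW-Authenticate value."""
    return www_authenticate.split(None, 1)[0]


print(basic_auth_header("Aladdin", "open sesame"))
print(bearer_auth_header("hypothetical-token"))
print(challenge_scheme('Basic realm="data"'))
```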

A1.1: The protocol is open, free and universally implementable | 05 Apr 2022 | 00:03:41

Protocol versus implementation. HTTP, SMTP, Zulip.

A1: (Meta)data are retrievable by their identifier using a standardized communication protocol | 29 Mar 2022 | 00:02:48

You want to avoid protocols with limited implementation, poor documentation, and, when possible, components involving human intervention.

It may not be possible to provide secure access through a fully mechanized protocol like HTTP, for example, for highly sensitive data. However, the protocol must be clear and explicit in the metadata, whether it involves a verbal request, email, telephone number, Slack username, et cetera.

The important thing is that the communication protocol for how to access is explicit and clearly defined in the metadata, whether fully mechanized or not.

F4: (Meta)data are registered or indexed in a searchable resource | 22 Mar 2022 | 00:06:42

The goal here is leverage: increasing the ratio of machine action to user action in getting to the data that they want. Otherwise, your data is technically findable, but it's going to require a lot of user action. They might have to do a full data download, scan through a full table, scroll through a long webpage, and it's unlikely that they're going to actually find what they need, because they're just not going to put in that much effort. So you really want indexing. You want this leverage to have your machine help do some of the action that a user might otherwise do.
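
To make the leverage idea concrete, here is a minimal sketch of an inverted index: the machine does the scanning once, up front, so a user query becomes a lookup instead of a scroll through every record. The dataset identifiers and descriptions are made up for illustration, and real search services do far more (stemming, ranking, and so on).

```python
from collections import defaultdict

# Hypothetical catalog: identifier -> free-text description.
records = {
    "ds-001": "band gap measurements for silicon carbide",
    "ds-002": "soil carbon flux observations",
    "ds-003": "silicon wafer defect images",
}


def build_index(docs: dict) -> dict:
    """Map each lowercase word to the set of identifiers that mention it."""
    index = defaultdict(set)
    for identifier, description in docs.items():
        for word in description.lower().split():
            index[word].add(identifier)
    return index


def search(index: dict, query: str) -> set:
    """Return identifiers whose descriptions contain every query word."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


index = build_index(records)
print(sorted(search(index, "silicon")))
print(sorted(search(index, "silicon carbide")))
```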

F3: Metadata clearly and explicitly include the identifier of the data they describe | 15 Mar 2022 | 00:03:01

Literature references with and without DOIs. Tables of data in articles with and without unique identifiers in each row for what that row is about. The magic of including identifiers in the metadata you share.

The Data Catalog (DCAT) Vocabulary: https://www.w3.org/TR/vocab-dcat-2/

F2: Data are described with rich metadata | 08 Mar 2022 | 00:03:13

Kinds of metadata - "intrinsic" (machine-defined or machine-controlled; immutable) and "extrinsic" (user-defined or user-controlled). Other-than-technical interoperability. "Quality" in the eye of the beholder / data consumer. Analogy to web-browser feature detection, and application to search engine "rich results".
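
A small sketch of the intrinsic/extrinsic split, under the assumption that "intrinsic" means values a machine can recompute from the bytes alone (here, size and a SHA-256 checksum) and "extrinsic" means whatever a person asserts about the data. The function names and the example fields are hypothetical.

```python
import hashlib


def intrinsic_metadata(data: bytes) -> dict:
    """Metadata determined entirely by the bytes themselves."""
    return {
        "byte_size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def describe(data: bytes, extrinsic: dict) -> dict:
    """Keep machine-derived and user-supplied metadata side by side."""
    return {
        "intrinsic": intrinsic_metadata(data),
        "extrinsic": dict(extrinsic),  # copy, so later edits stay local
    }


record = describe(
    b"temperature,reading\n300,1.2\n",
    {"title": "Example readings", "keywords": ["hypothetical", "demo"]},
)
print(record["intrinsic"]["byte_size"])
print(record["extrinsic"]["title"])
```

Changing a single byte of the data changes the intrinsic part, whereas the extrinsic part changes only when someone edits it.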

F1: (Meta)data have globally unique, persistent identifiers | 01 Mar 2022 | 00:06:57
  • HTTP URLs
  • orcid.org, doi.org, uniprot.org
  • archival resource keys (ARKs)
  • meta-resolvers: identifiers.org, n2t.net
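
The resolvers listed above share a simple pattern: a stable HTTP prefix followed by the identifier. The sketch below shows only that pattern and does not perform any network requests. The `RESOLVERS` table and `to_url` helper are hypothetical, and the ORCID iD and ARK are illustrative placeholders rather than references to real records.

```python
# Hypothetical mapping from identifier scheme to resolver prefix.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ark": "https://n2t.net/",
}


def to_url(scheme: str, identifier: str) -> str:
    """Turn a bare identifier into an HTTP URL via its resolver prefix."""
    try:
        prefix = RESOLVERS[scheme]
    except KeyError:
        raise ValueError(f"no resolver known for scheme {scheme!r}") from None
    return prefix + identifier


print(to_url("doi", "10.48550/arXiv.1803.05316"))
print(to_url("orcid", "0000-0002-1825-0097"))   # illustrative iD
print(to_url("ark", "ark:/12345/x6np1wh8k"))    # placeholder ARK
```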

What to expect from this podcast | 22 Feb 2022 | 00:01:10

A rundown of what I'm planning: FAIRdowns, inside the Box, and FIP calls, oh my!

walk-and-talk: DIKW pyramid/hierarchy | 27 Sep 2022 | 00:08:57

DIKW pyramid / DIKW hierarchy - https://en.wikipedia.org/wiki/DIKW_pyramid

"Data becomes information when it is stored *in* a given *formation*."
From B. Fong and D. I. Spivak, “Seven Sketches in Compositionality: An Invitation to Applied Category Theory,” Ch. 3 - Databases, arXiv, Oct. 12, 2018. doi: 10.48550/arXiv.1803.05316.

"There are only three things we can do with data. We can accrete data by adding it to an existing collection, reduce data by discarding information from an existing collection, or reshape data by placing it in a different kind of collection."
From Z. Tellman, *Elements of Clojure*, Ch. 4 - Composition. Monee, IL: Lulu.com, 2019.

types of information: situational, methodological, philosophical (epistemological, axiological, ontological)
From Dorian Taylor, "2022-05-11 types of information", (May 11, 2022). Accessed: Sep. 27, 2022. [Online Video]. Available: https://www.youtube.com/watch?v=zNUNgZ6RTmQ

Inductions vs deductions vs abductions
Informed by M. K. Bergman, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce. Cham: Springer International Publishing, 2018. doi: 10.1007/978-3-319-98092-8.

"programs must be written for people to read, and only incidentally for machines to execute."
From preface to first edition (and included in subsequent editions) of H. Abelson, G. J. Sussman, and J. Sussman, *Structure and interpretation of computer programs*, Cambridge, Mass.: MIT Press.

I Fought the Law | 07 Sep 2022 | 00:01:12

`.split()`s on strings and `filter`s on `None`
I fought the Law and the Law won
I fought the Law and the Law won
I needed spec compliance; I got none
I fought the Law and the Law won
I fought the Law and the Law won

I varied my output with the latest fad
Breakin' every downstream run
Needed Postel more than I ever had
I fought the Law and the Law won
I fought the Law and the

Scatterin' parsing like a shotgun
I fought the Law and the Law won
I fought the Law and the Law won
I lost robustness and I lost my fun
I fought the Law and the Law won
I fought the Law and the Law won

I varied my output with the latest fad
Breakin' every downstream run
Needed Postel more than I ever had
I fought the Law and the Law won
I fought the Law and the

Martynas Jusevičius | 29 Aug 2022 | 00:29:59

- Linked Data
- Project Jupyter (Notebook, Lab, etc.)
- UI Blocks: Block Protocol
- Personal Knowledge Graphs: Roam, Logseq, Obsidian
- Solid: decentralized data stores
- Resource Description Framework (RDF)
- Twitter: Martynas, AtomGraph
- LinkedDataHub (Apache-2.0 license)
- AtomGraph: Website, GitHub

FAIR-Enabling Services | 19 Aug 2022 | 00:09:57

I was thinking about FAIR-enabling resources and wanted to distinguish between things that actually have to be running in order for data to be alive and for you to actually find it, access it, interoperate with it, and reuse it, versus "one-time" things that those services will need.

Stuck Data Mining Again (Lodi) | 09 Aug 2022 | 00:02:07

Just about a week ago,
I set out to download.
Seekin' supplementary data,
lookin' for a pot of gold.

Things got bad, and things got worse,
I guess you will know the tune.
Oh lord, stuck data mining again.

Rode in on semantics,
I'll be hand-waving out if I go.
Trying controlled vocabularies,
must've been seven of 'em or more.
No corresponding authors
have replied to my emails yet.
Oh lord, I'm stuck data mining again.

The man from Stack Overflow
said I was on my way.
My code kept raising exceptions.
I was reading tracebacks for days.
I wanted to run a one-off benchmark.
Looks like my plans fell through.
Oh lord, stuck data mining again.

If I only had metadata
that was machine-actionable
every time I've had a dataset
that I's told was interoperable.
You know I'd catch the FAIR train
and breeze through my planned reuse.
Oh lord, I'm stuck data mining again.
Oh lord, I'm stuck data mining again.

Don't Silo Me In | 04 Aug 2022 | 00:01:19

Oh give me mappings, lots of mappings, with resolving URIs. Don’t silo me in.

Let me prance through semantics of namespaces that I love. Don’t silo me in.

Let me use an open protocol to access these bytes, and for metadata promise me you’ll keep on the lights. Authenticate me repeatedly, but give clear usage rights. Don’t silo me in.

Just give me data bare. Let me reuse my old CPUs and mint my URIs.

With my own software, let me wander over yonder with least surprise.

I want to probe the provenance of metadata rich and plural, and represent my knowledge to be machine actionable. And I can’t look at schemas if they’re not interoperable. Don’t silo me in.

Shreyas Cholia | 29 Jul 2022 | 00:29:46

* [Materials Project](https://materialsproject.org/)

* [Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE)](https://ess-dive.lbl.gov/)

* [National Microbiome Data Collaborative (NMDC)](https://microbiomedata.org/)

* [W3C Provenance (PROV) specs](https://www.w3.org/TR/prov-overview/)

* [Research Equals (R=)](https://www.researchequals.com/)

* [JSON-LD](https://json-ld.org/)

* [Ecological Metadata Language (EML)](https://eml.ecoinformatics.org/)

* [DataCite](https://datacite.org/)

* [OSTI](https://www.osti.gov/)

* [DOI](https://www.doi.org/)

* schema.org

* [OAuth](https://oauth.net/2/)

* [OpenID Connect (OIDC)](https://openid.net/connect/)

* [OpenAPI](https://www.openapis.org/)

* [REST](https://en.wikipedia.org/wiki/Representational_state_transfer)

* [IGSN](https://www.igsn.org/)

* [Data Observation Network for Earth (DataONE)](https://www.dataone.org/)

* [Frictionless Data](https://frictionlessdata.io/)
