Add material in the spec about the triple-occurrence distinction #75
Conversation
|
I am against using type-token for this distinction. Consider URIs. Are ex:a and ex:a two different tokens for the same type? No! <:a :b :c> and <:a :b :c> are not either, nor are "1"^^xsd:int and "1"^^xsd:int. The RDF* documents should use wording that makes sense if used for IRIs or RDF literals. |
Yes, if I read correctly https://plato.stanford.edu/entries/types-tokens/#WhaDis .
How many IRIs do you count in this?
I count 3 IRI types, and 5 IRI tokens. I don't consider the terms of RDF (IRIs, literals...) and RDF* (+triples) to be different, in that respect, from the terms of the English language. Of course, we are not talking here about what the terms denote (as in the example in the link above: in "an 8,000 year old bean", does the word "bean" denote a bean type or a bean token?). We are talking about term types and term tokens. |
|
I do not agree.
https://plato.stanford.edu/entries/types-tokens/#WhaItNot
Although the matter is discussed more fully in §8 below, it should be mentioned here at the outset that the type-token distinction is not the same distinction as that between a type and (what logicians call) its occurrences. Unfortunately, tokens are often explained as the “occurrences” of types, but not all occurrences of types are tokens. To see why, consider this time how many words there are in the Gertrude Stein line itself, the line type, not a token copy of it. Again, the correct answer is either three or ten, but this time it cannot be ten word tokens. The line is an abstract type with no unique spatio-temporal location and therefore cannot consist of particulars, of tokens. But as there are only three word types of which it might consist, what then are we counting ten of? The most apt answer is that (following logicians' usage) it is composed of ten occurrences of word types. See §8 below, Occurrences, for more details.
Further, type is used in RDF and ontologies as the relationship
between an entity and classes that it belongs to (e.g., rdf:type)
so it is better to avoid other possible meanings of type.
peter
PS: I count 3 IRIs (actually 3 CURIEs). If I have to
distinguish further, I count 5 occurrences of IRIs (CURIEs). I
count zero IRI (or CURIE) types and zero IRI (or CURIE) tokens.
On 12/17/20 4:22 PM, Pierre-Antoine Champin wrote:
Consider URIs. Are ex:a and ex:a two different tokens for the same type?
Yes, if I read correctly https://plato.stanford.edu/entries/types-tokens/#WhaDis .
Rose is a rose is a rose is a rose.
In one sense of ‘word’ we may count three different words; in another sense we may count ten different words. C. S. Peirce (1931-58, sec. 4.537) called words in the first sense “types” and words in the second sense “tokens”.
How many IRIs do you count in this?
rdfs:Class rdf:type rdfs:Class;
rdfs:subClassOf rdfs:Class.
I count 3 IRI types, and 5 IRI tokens.
I don't consider the terms of RDF (IRIs, literals...) and RDF* (+triples) to be different, in that respect, from the terms of the English language.
Of course, we are not talking here about what the terms denote (as in the example in the link above: in "an 8,000 year old bean", does the word "bean" denote a bean type or a bean token?). We are talking about term types and term tokens.
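Whichever vocabulary is preferred, the two counts in this exchange (3 and 5) can be reproduced mechanically. The following is a minimal sketch, assuming a naive whitespace tokenizer that is only adequate for this two-triple snippet; the function name `curie_occurrences` is made up for the illustration and is not part of any RDF library.

```python
from collections import Counter

# The two-triple Turtle snippet quoted in the discussion.
TURTLE = """
rdfs:Class rdf:type rdfs:Class;
    rdfs:subClassOf rdfs:Class.
"""

def curie_occurrences(text):
    """Return every CURIE occurrence in order (naive: split on whitespace)."""
    return [tok.rstrip(";.") for tok in text.split() if ":" in tok]

occurrences = curie_occurrences(TURTLE)  # one entry per occurrence
distinct = Counter(occurrences)          # one key per distinct CURIE

print(len(occurrences), "occurrences of", len(distinct), "distinct CURIEs")
```

The list holds one entry per occurrence, while the `Counter` keys collapse equal CURIEs into one, which is the distinction both sides of the thread are drawing, under different names.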
|
What about More seriously: I was not advocating to use terms like "IRI type" and "IRI token" in the report -- as I didn't use them in my PR. I was only using Peirce's terminology in this discussion to show how the Type-Token distinction is, in my opinion, appropriate in this situation. And yes, I also read the part about "what it is not" and occurrences. What I take away from it is: if you are counting IRIs in a graph or a triple, you are dealing with IRI occurrences. But if you are counting them on your screen or a printed page, then you are dealing with IRI tokens. |
|
I would call all occurrences of IRIs IRI occurrences, whether they are in a refresh of a screen (at 60 occurrences a second or more), on a screen, in a vocal utterance, in an email, in an email message, in a document, in a triple, or in a graph. The word "token" is used in too many other ways in computer science, several of which could come up when talking about RDF and RDF*, so the "kind" of token would always need to be specified; using the more generic "occurrence" ends up being less confusing. As a case in point, programming languages have tokens (e.g., while) and these tokens have occurrences in code. |
|
The seminal example discussion is misguided. Even if there is only stated-by information, and even if embedded triples are sensitive to syntax, a source states some occurrence of information. The relationship between a source and a syntactically unique embedded triple is something like "stated an occurrence of" or "stated something that is expressed as". |
Fully agreed. And I considered that
There is a link to the example from section 2.1 |
|
This was discussed during today's call: https://w3c.github.io/rdf-star/Minutes/2020-12-18.html#item04 |
I just read up on the topic in SEP (https://plato.stanford.edu/entries/types-tokens/). It seems indeed more appropriate to speak of 'types' vs 'occurrences' in our context, as we are not concerned with renderings of triples on screens and printouts, but replacing 'token' by 'occurrence' everywhere seems extreme: e.g. we might like to differentiate between different serializations of the same occurrence (say in N-Triples* vs Turtle*) by calling them 'tokens'. |
|
Falling within the |
@@ -1034,7 +1058,7 @@ Interpolation lemma |
|||
| <section class="appendix"> | |||
| <h2>Historical remarks</h2> | |||
|
|
|||
| <section class="appendix"> | |||
| <section> | |||
| <h2>SA-mode and PG-mode</h2> | |||
pchampin on Jan 7 (Author, Collaborator)
From https://respec.org/docs/:
> Using h2 everywhere is sort of a tacit convention
plus, Respec changes the Hx tags itself into a consistent hierarchy.
Finally, even with H2 everywhere, this is still valid HTML...
TallTed on Jan 7 (Contributor)
> Using h2 everywhere is sort of a tacit convention
Ugh. Respec is basically saying, "Be lazy and sloppy, authors! We'll fix it in post!"
But I guess your dialect has now been blessed.
> plus, Respec changes the Hx tags itself into a consistent hierarchy.
IFF everything is properly wrapped and nested in <section> tags. An important distinction without which the auto-hierarchy-construction will fail.
> Finally, even with H2 everywhere, this is still valid HTML...
I never said it wasn't valid HTML. Plenty of valid HTML leads to incomprehensible browser rendering. I'm glad that doesn't happen here. I still believe it's best to keep the H# tags in the source numbered as they're intended to be rendered.
pchampin on Jan 8 (Author, Collaborator)
> Ugh. Respec is basically saying, "Be lazy and sloppy, authors! We'll fix it in post!"
I look at it more positively, as "be careful to nest your sections correctly, and I'll take care of the tedious redundant numbering of H's" ;-)
> IFF everything is properly wrapped and nested in <section> tags.
Of course, but my personal opinion is that sections aren't optional nowadays. I think HTML would have been better off from the start with a <section> tag and a single <h> tag ;-)
TallTed on Jan 12 (Contributor)
@pchampin - I'm sure I'm failing to mentally fill in something that should be obvious, but I'm even more sure that the markup is not leading to rendering as intended. Perhaps something was eaten/deleted when you saved it. As it stands, I think there's an extra backtick preceding sections aren't optional and some further imbalance as you go, with impact following start with a.
TallTed on Jan 14 (Contributor)
shrugs I think we must agree to disagree. I prefer to keep the source as close as possible to the desired outcome, partly because it limits confusion when others review that source... which is where I always start, because the changes are highlighted better here than in the Respec DIFF. Back to focusing on the content instead of the markup. :-)
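The behaviour under discussion, where every heading is written as h2 and the rendered level follows from <section> nesting, can be illustrated with a small sketch. This is not ReSpec's code; it is a hypothetical Python re-implementation of the idea, and the class name `HeadingLeveler` and the trimmed-down markup are assumptions made for the illustration.

```python
from html.parser import HTMLParser

class HeadingLeveler(HTMLParser):
    """Record each heading's text with the level implied by <section> depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # how many <section> elements are open
        self.in_heading = False  # True while inside an <h2>
        self.headings = []       # (text, implied tag) pairs, in order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag == "h2":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading and data.strip():
            # A top-level section maps to h2, a nested one to h3, and so on.
            self.headings.append((data.strip(), f"h{self.depth + 1}"))

# Trimmed-down version of the appendix structure shown in the diff.
SOURCE = """
<section class="appendix">
  <h2>Historical remarks</h2>
  <section>
    <h2>SA-mode and PG-mode</h2>
  </section>
  <section>
    <h2>The seminal example</h2>
  </section>
</section>
"""

leveler = HeadingLeveler()
leveler.feed(SOURCE)
for text, level in leveler.headings:
    print(level, text)
```

The sketch also shows why the nesting matters: if a <section> wrapper is missing, the depth counter is wrong and so is every level derived from it.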
@@ -1047,6 +1071,29 @@ SA-mode and PG-mode |
|||
| </ul> | |||
| </section> | |||
|
|
|||
| <section> | |||
| <h2>The seminal example</h2> | |||
|
@pchampin -- Note that there are currently conflicts that must be resolved before this can be merged, and the resolution of those conflicts may change the ongoing discussion as they will change the reading of the PR's results. (I and most reading this cannot see what the conflicts are, only that they exist ... ) |
|
There appear to be three changes in PR #75: a new section on triple occurrences, a new section on the seminal example, and the moving of the original RDF* paper from an informative to a normative reference. I find all three of these changes problematic. Does the RDF* specification depend on anything in the original RDF* paper? I heard not. So the original RDF* paper should not be normative. Is the seminal example problematic? Yes, very! So it should be disavowed. The section on it does not do that; instead, it should read something like "The seminal example was wrong and should be completely disregarded." What is an embedded triple? Is it something like a literal or something like an IRI? The new section is firmly on the side of literal. This has consequences. My view is that an embedded triple does not need to be something like a literal. |
Respec put it there, because of a mistake I made: I thought that appendices were automatically marked as "informative", but this is not the case. I marked the corresponding appendix as "informative", and the reference is back in the correct place. Thanks for spotting this. For the other issues, I think it better to discuss them during the call. |
|
I think section 2.1 is a step in the right direction and the main load of appendix A.2 should be incorporated in section 2.1. Like:
It’s more important that people get it right now and in the future, and that they are warned about a subtle but important change, than that they understand what the historic background is. Appendix A.2 doesn’t explain what motivated the change. The explanations I could come up with are all not very flattering (like "at all cost avoid the semantic muddle that named graphs represent" or "we are so enamored with the Superman problem" or "representing unasserted assertions suddenly seemed so tremendously important"). Of course, if someone could put it a little more positively, it would be good to add such an explanation about the motivation for the change as well. |
There was a discrepancy between the seminal example referring to an occurrence and RDF* not saying how it addresses that occurrence. For a long time I considered this an oversight, sloppy engineering and/or something that people need to be educated about. The change was when a few weeks ago it became clear that RDF* will be specified in a way that makes the seminal example wrong, a regrettable mistake, whatever. Technically this is not a change in RDF* but just in the examples. Practically it is a change, and very much so IMO, which is why I find it important that section 2.1 addresses the problem rather comprehensively. |
|
Here is something that is closer to what is needed for A.2. I would prefer something stronger and shorter but if there is a desperate need for a longer section on the seminal example, this wording at least lays out the situation more clearly. A.2 The seminal example The motivating example in the original RDF* paper [RDF-STAR-FOUNDATION] was on a provenance use-case, and is repeated below. Example 19 :bob foaf:name "Bob". This example is incorrect because there is a need to have multiple creators and related sources for the same embedded triple. There is only one entity for an embedded triple in RDF*, so the source corresponding to a creator cannot be distinguished if provenance is represented in this manner. Because of this, the example has given rise to significant confusion. To rescue the example requires an intermediate entity to represent the stating of a triple, as in <<:bob foaf:age 23>> ex:stating This corrected example shows that embedded triples can require more complex solutions than using RDF reification directly. |
|
@rat10 I agree that the seminal example only works if there are multiple entities for an embedded triple, but the semantics in the document had a single entity for an embedded triple and the newer semantics have also worked this way. So no change to the meaning of RDF*, just an incorrect and misleading example. (Well, using "just" here is really downplaying the seriousness of the situation. Misleading examples can do vast amounts of damage.) |
|
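The point that a single entity per embedded triple cannot keep creator/source pairs apart can be checked with a small sketch. It assumes a toy model in which a graph is a Python set of tuples and an embedded triple is a tuple used as a subject; the crawler names, page IRIs and blank-node labels are hypothetical, while `ex:stating` is the property name used in the proposed A.2 wording above.

```python
# Terms are plain strings; a triple is a (subject, predicate, object) tuple.
# An embedded triple is simply a tuple used as a subject, so two equal tuples
# are one and the same entity (the single-entity reading discussed above).
BOB_AGE = (":bob", "foaf:age", "23")

# Hypothetical crawlers and pages, invented for this illustration.
STATINGS = [
    (":crawlerA", "<http://example.org/pageA>"),
    (":crawlerB", "<http://example.org/pageB>"),
]

def annotate_directly(statings):
    """Attach creator and source straight to the embedded triple."""
    graph = set()
    for creator, source in statings:
        graph.add((BOB_AGE, "dct:creator", creator))
        graph.add((BOB_AGE, "dct:source", source))
    return graph

def annotate_via_stating(statings):
    """Attach creator and source to one intermediate node per stating."""
    graph = set()
    for i, (creator, source) in enumerate(statings):
        node = f"_:stating{i}"
        graph.add((BOB_AGE, "ex:stating", node))
        graph.add((node, "dct:creator", creator))
        graph.add((node, "dct:source", source))
    return graph

def sources_for(graph, creator):
    """Return the sources that share a subject with the given creator."""
    subjects = {s for (s, p, o) in graph if p == "dct:creator" and o == creator}
    return {o for (s, p, o) in graph if p == "dct:source" and s in subjects}

direct = annotate_directly(STATINGS)
staged = annotate_via_stating(STATINGS)

print(sources_for(direct, ":crawlerA"))  # both pages: the pairing is lost
print(sources_for(staged, ":crawlerA"))  # only pageA
```

With direct annotation, all four metadata triples share one subject, so a query for crawlerA's source returns both pages; with one intermediate node per stating, the pairing survives at the cost of two extra triples.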
I remain concerned that the merge conflicts on this PR may be concealing text that ought to be highlighted during our review and consideration of this PR. Please, can those be addressed soon? |
|
This was discussed during today's call https://w3c.github.io/rdf-star/Minutes/2021-01-15.html#item02 |
the documentation of prov:wasDerivedFrom says:
> A derivation is a transformation of an entity into another, an update of
> an entity resulting in a new one, or the construction of a new entity
> based on a pre-existing entity.
Triples are mathematical abstractions; they are immutable, and so cannot be "transformed" or "updated", and they are not constructed (they simply exist) and do not pre-exist each other.
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
... as discussed during last call https://w3c.github.io/rdf-star/Minutes/2020-12-18.html#item04
Co-authored-by: Olaf Hartig <olaf.hartig@liu.se>
|
Thanks for the input @pfps and @rat10, and apologies for not participating in yesterday's call (it was my daughter's birthday and I couldn't sneak out of the activities). Some responses to what you write above and what was mentioned in some emails on the list: I concur with Peter's comment that there is no change from the original paper to our current draft in terms of the meaning of RDF* and, instead, it is only the original example that has been wrong and misleading. I appreciate your continued input to make sure we get the examples in the draft right. I realize now that there are actually two separate mistakes that I made when writing this example. One of them is related to the distinction between the notion of an RDF* triple as a single entity (whose identity is defined entirely by the three RDF* terms it consists of) and the notion of an occurrence of such a triple. I simply didn't make this distinction. I think that the new Section 2.1 in the current version of this PR here does a good job raising the readers' awareness of the need to make this distinction. As a side note related to this mistake, I actually don't think that anywhere in the original paper the examples explicitly refer to occurrences of triples; I mean, nowhere in the text of the paper do I say that the example is about providing metadata for a specific occurrence of the triple about Bob's age. However, I can see now that readers may interpret the data in the example in this way, and that's one of the reasons the example is badly chosen, as I understand now (and another reason is that I should have made the distinction as mentioned above). The other mistake I made in the example is that the modeling of the metadata is insufficient. 
Instead of using the embedded triple about Bob's age directly as the subject of the two metadata triples, I should have introduced a separate entity that captures the creation of this triple (or, more precisely, as I know now, the creation of an occurrence of the triple) and then represented the information about this creation (i.e., creator and source) as triples with this separate entity as their subject. A side note related to this mistake: I realize now that the given metadata triple with predicate Now, to address these mistakes, the RDF* graph of the example can be changed as follows (written in Turtle*, prefix declarations omitted). ## Turtle Start ##
@prefix : <#> .
[
a foaf:Document ;
dct:creator :crawlerC1 ;
dct:source <>
]
sioc:container_of
[
a rdf:Statement;
rdf:subject :bob;
rdf:predicate foaf:age ;
rdf:object "23"^^xsd:integer
] .
sioc:container_of owl:equivalentProperty :creation .
dct:source owl:equivalentProperty schema:url .
## Turtle End ##
Notice that the blank node labeled as As a final remark here, after these fixes, I would say that this example is not suitable anymore as the main first example to present the basic idea of RDF*. On the other hand, I think it is suitable now as an example that demonstrates that RDF* can be used as a building block to capture provenance use cases. |
|
@pfps in your comment above you write:
While I am fine with the corrected example (it is a variation of the fix in my previous comment but without capturing the creation as a separate entity), I don't understand your remark about the "more complex solutions." What you write in your corrected example are essentially three triples, and one of them contains an embedded triple. So, it's four triples overall if we count the embedded triple independently. Let's compare this to RDF reification. While I am not entirely sure what you mean by "using RDF reification directly," I assume you mean to represent the example as follows (correct me if I am wrong).
In this representation, I am counting six triples. Can you explain what you mean by "more complex"? |
|
RDF reification has statements, which are underspecified. If one desires, one can use statements as, essentially, triple occurrences. No extra kinds of entities are required. |
| # the controversial seminal example | ||
| :bob foaf:name "Bob". | ||
|
<<:bob foaf:age 23>> dct:creator |
||
|
dct:source |
As you know, my position is that it is something like a literal. But I agree with you that this section does not need to be "tainted" by this assumption. In the latest commit (d0d948f), I slightly rephrased §2.1 to make it neutral w.r.t. this question. |
The way I see it, the solutions in RDF* are less complex, because the language is more expressive. We trade additional complexity in the language itself (adding an extra kind of term) for less complexity in the graphs using that language (fewer triples) and the serializations encoding these graphs (thanks to
Aims to address issue #64 by
prov:wasDerivedFrom)