Hi Sam,

The well-defined, standard way to transfer metadata is called Turtle (TTL, https://www.w3.org/TR/turtle/). SPARQL is a query language that reuses portions of this spec: with SPARQL INSERT you can more or less put a TTL block inside INSERT { } and it should just work. Our ontology files are also written in TTL, by the way.

JSON is great now that the kids are making javascripts and websites. But it's also being abused as the format to serialize everything into, just like XML was abused back when SOAP was the hype of the day.

Before Carlos's passive-extraction branch, the communication between tracker-extract and tracker-miner-fs was also two blocks of TTL: one for the deletes and one for the inserts.

Of course, if you make an abstraction that doesn't get in the way of normal operation and can also make it spew out JSON for something, then fine, I guess. But we want to avoid marshalling to JSON and then demarshalling from JSON just for the purpose of pleasing deh javascriptz kids. The way it works now, our IPC requires almost no marshalling and demarshalling at all. That, plus Adrien's file-descriptor-passing IPC mechanism, is what makes it fast. Putting a pointless conversion in the middle would only make it slower.

What would be more interesting is a cgi-bin, Angular, or whatever-is-popular-nowadays-among-the-cloud-people thingy that lets you run a SPARQL query and gives back JSON. That would be an implementation of this standard: https://www.w3.org/TR/sparql11-results-json/

Using standards is a good idea in open source.

Kind regards,
Philip

On Sat, 2016-04-09 at 00:39 +0100, Sam Thursfield wrote:
Hi all,

I've always felt like Tracker's extractors should be reusable outside Tracker. The design makes that possible, but right now they output their results as a series of slightly non-standard SPARQL update commands, which I don't think is useful for many folk. Lots of people aren't using SPARQL databases at all, believe it or not :-) The whole point of RDF is to make data interchange easy, so I think we can do better than that.

I've been looking at making the extractors optionally output their results in JSON-LD[1] format instead. The cool thing about JSON-LD is that if you squint, it's just good old JSON that everyone's familiar with. If you look closely, it's also Linked Data, but in a more human-friendly serialization format than any of the more traditional RDF formats.

The catch here is that Tracker's extractor modules are all hardwired to generate SPARQL using TrackerSparqlBuilder. To be honest, I've never liked this approach: it's pretty incomprehensible to newcomers and overly verbose, especially where we explicitly generate DELETE queries to go along with the INSERT queries. So, inspired by something in the Python RDFLib library, I came up with a TrackerResource class that the extractors can use instead.

This is a work in progress, but I have a branch in git.gnome.org that adds TrackerResource and converts some of the extractors to use it. The TrackerResource class can serialize either to SPARQL update commands or to JSON-LD. The branch also adds the `tracker extract` command from <https://bugzilla.gnome.org/show_bug.cgi?id=751991>, so you can try out the extractors easily and specify `-o json` or `-o sparql` as you prefer.
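To make the "one object, two serializations" idea concrete, here is a minimal Python sketch of a resource that can be written out either as SPARQL Update or as JSON-LD. It is not TrackerResource's real API. The class `Resource`, its methods and the helper `sparql_term` are hypothetical. The sketch also assumes that the `nmm:`-style prefixes are already known to the SPARQL endpoint, as they are in the generated SPARQL shown in this message.

```python
import json


class Resource:
    """Hypothetical stand-in for the TrackerResource idea: one RDF subject
    with types and property values that can be serialized two ways."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.types = []
        self.properties = {}  # property CURIE -> list of values

    def add_type(self, rdf_type):
        self.types.append(rdf_type)

    def add_value(self, prop, value):
        # value may be a str, int, bool or another Resource
        self.properties.setdefault(prop, []).append(value)

    def to_jsonld(self):
        # Nested resources are embedded as JSON objects.
        node = {"@id": self.identifier}
        if self.types:
            node["@type"] = self.types[0] if len(self.types) == 1 else list(self.types)
        for prop, values in self.properties.items():
            out = [v.to_jsonld() if isinstance(v, Resource) else v for v in values]
            node[prop] = out[0] if len(out) == 1 else out
        return node

    def to_sparql(self):
        # Describe nested resources first, then this one. Operations in one
        # SPARQL 1.1 Update request are separated by ";". A resource nested
        # twice is emitted twice; a real implementation would deduplicate.
        ops = [v.to_sparql()
               for values in self.properties.values()
               for v in values if isinstance(v, Resource)]
        preds = []
        if self.types:
            preds.append("a " + " , ".join(self.types))
        for prop, values in self.properties.items():
            preds.append(prop + " " + " , ".join(sparql_term(v) for v in values))
        if preds:
            ops.append("INSERT DATA { <%s> %s . }" % (self.identifier, " ; ".join(preds)))
        return " ;\n".join(ops)


def sparql_term(value):
    if isinstance(value, Resource):
        return "<%s>" % value.identifier
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return json.dumps(value)  # crude literal escaping, good enough for a sketch


artist = Resource("urn:artist:Best%20Coast")
artist.add_type("nmm:Artist")
artist.add_value("nmm:artistName", "Best Coast")

track = Resource("file:///tmp/track.mp3")
track.add_type("nmm:MusicPiece")
track.add_value("nmm:trackNumber", 1)
track.add_value("nmm:performer", artist)

print(track.to_sparql())
print(json.dumps(track.to_jsonld(), indent=2))
```

The extractor code only describes the data once. Choosing between SPARQL and JSON-LD becomes a serialization detail, which is what makes an `-o json` / `-o sparql` switch cheap to offer.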
The results for the extractors I've converted so far are promising in terms of reducing code size:

 src/tracker-extract/tracker-extract-abw.c       |  51 ++--
 src/tracker-extract/tracker-extract-bmp.c       |  18 +-
 src/tracker-extract/tracker-extract-dvi.c       |  17 +-
 src/tracker-extract/tracker-extract-epub.c      | 131 +++-----
 src/tracker-extract/tracker-extract-gstreamer.c | 910 ++++++++++++++++++-------------------------------------
 src/tracker-extract/tracker-extract-mp3.c       | 378 ++++++++---------------
 6 files changed, 511 insertions(+), 994 deletions(-)

Here's an example of auto-generated SPARQL for an MP3 extraction:

    DELETE { } WHERE {
      <file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3>
        nie:comment ?nie_comment ;
        nmm:trackNumber ?nmm_trackNumber ;
        nmm:performer ?nmm_performer ;
        nfo:averageBitrate ?nfo_averageBitrate ;
        nmm:musicAlbum ?nmm_musicAlbum ;
        nfo:channels ?nfo_channels ;
        nmm:dlnaProfile ?nmm_dlnaProfile ;
        nmm:musicAlbumDisc ?nmm_musicAlbumDisc ;
        rdf:type ?rdf_type ;
        nfo:duration ?nfo_duration ;
        nfo:codec ?nfo_codec ;
        nmm:dlnaMime ?nmm_dlnaMime ;
        nfo:sampleRate ?nfo_sampleRate ;
        nie:title ?nie_title .
    }
    DELETE { } WHERE {
      <urn:artist:Best%20Coast>
        nmm:artistName ?nmm_artistName ;
        rdf:type ?rdf_type .
    }
    INSERT {
      <urn:artist:Best%20Coast> a nmm:Artist ;
        nmm:artistName "Best Coast" .
    }
    DELETE { } WHERE {
      <urn:album:The%20Only%20Place>
        nmm:albumTitle ?nmm_albumTitle ;
        rdf:type ?rdf_type ;
        nmm:albumArtist ?nmm_albumArtist .
    }
    INSERT {
      <urn:album:The%20Only%20Place> a nmm:MusicAlbum ;
        nmm:albumTitle "The Only Place" ;
        nmm:albumArtist <urn:artist:Best%20Coast> .
    }
    DELETE { } WHERE {
      <urn:album-disc:%D0:%06%02:Disc1>
        nmm:setNumber ?nmm_setNumber ;
        nmm:albumDiscAlbum ?nmm_albumDiscAlbum ;
        rdf:type ?rdf_type .
    }
    INSERT {
      <urn:album-disc:%D0:%06%02:Disc1> a nmm:MusicAlbumDisc ;
        nmm:setNumber 1 ;
        nmm:albumDiscAlbum <urn:album:The%20Only%20Place> .
    }
    INSERT {
      <file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3> a nmm:MusicPiece , nfo:Audio ;
        nie:comment "Free download from http://www.last.fm/music/Best+Coast and http://MP3.com" ;
        nmm:trackNumber 1 ;
        nmm:performer <urn:artist:Best%20Coast> ;
        nfo:averageBitrate 128000 ;
        nmm:musicAlbum <urn:album:The%20Only%20Place> ;
        nfo:channels 2 ;
        nmm:dlnaProfile "MP3" ;
        nmm:musicAlbumDisc <urn:album-disc:%D0:%06%02:Disc1> ;
        nfo:duration 164 ;
        nfo:codec "MPEG" ;
        nmm:dlnaMime "audio/mpeg" ;
        nfo:sampleRate 44100 ;
        nie:title "The Only Place" .
    }

Note that there are a lot more DELETE statements than before. I figured that anywhere we want to replace the existing data we need a DELETE statement, and the reason we don't normally do it is that previously it had to be done manually.

That said, the TrackerResource class does have a way of avoiding this. If you ever call _set_value() for a property, it assumes you want to *overwrite* it and will generate a DELETE. If you only use _add_value(), it assumes you want to *add* to it and won't generate a DELETE. The latter case is needed for stuff like nao:hasTag.

I may be misunderstanding things here, of course; I didn't actually write any of the extractors myself.
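The overwrite-versus-append rule can be sketched in a few lines. This is a hypothetical Python illustration, not TrackerResource's actual code. The class `ResourceChanges` is invented for the example, and so is the choice to emit one `DELETE ... WHERE` per overwritten property. Values are passed in already serialized as SPARQL terms.

```python
class ResourceChanges:
    """Hypothetical sketch of the set_value()/add_value() distinction.
    Values are passed in already serialized as SPARQL terms,
    e.g. '"The Only Place"', '164' or '<urn:example:tag:summer>'."""

    def __init__(self, uri):
        self.uri = uri
        self.values = {}     # property -> list of SPARQL terms
        self.overwrite = []  # properties whose stored values must be replaced

    def set_value(self, prop, term):
        # Overwrite semantics: the old values of prop get a DELETE.
        self.values[prop] = [term]
        if prop not in self.overwrite:
            self.overwrite.append(prop)

    def add_value(self, prop, term):
        # Append semantics (e.g. nao:hasTag): existing values are kept.
        self.values.setdefault(prop, []).append(term)

    def to_sparql(self):
        ops = []
        for prop in self.overwrite:
            # One DELETE per property, so a property with no stored value
            # does not stop the other properties from being deleted.
            ops.append("DELETE { <%s> %s ?old } WHERE { <%s> %s ?old }"
                       % (self.uri, prop, self.uri, prop))
        if self.values:
            body = " ; ".join(p + " " + " , ".join(terms)
                              for p, terms in self.values.items())
            ops.append("INSERT DATA { <%s> %s . }" % (self.uri, body))
        # SPARQL 1.1 Update runs the operations in order: delete, then insert.
        return " ;\n".join(ops)


song = ResourceChanges("urn:example:song")
song.set_value("nie:title", '"The Only Place"')
song.add_value("nao:hasTag", "<urn:example:tag:summer>")
print(song.to_sparql())
```

Only `nie:title` gets a DELETE here. The tag is appended, so any tags the user already attached to the file survive re-extraction.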
Here's an example of JSON-LD output:

    {
      "nie:comment" : "Free download from http://www.last.fm/music/Best+Coast and http://MP3.com",
      "nmm:trackNumber" : 1,
      "nmm:performer" : {
        "@id" : "urn:artist:Best%20Coast",
        "nmm:artistName" : "Best Coast",
        "@type" : "nmm:Artist"
      },
      "nfo:averageBitrate" : 128000,
      "nmm:musicAlbum" : {
        "@id" : "urn:album:The%20Only%20Place",
        "nmm:albumTitle" : "The Only Place",
        "@type" : "nmm:MusicAlbum",
        "nmm:albumArtist" : {
          "@id" : "urn:artist:Best%20Coast",
          "nmm:artistName" : "Best Coast",
          "@type" : "nmm:Artist"
        }
      },
      "nfo:channels" : 2,
      "nmm:dlnaProfile" : "MP3",
      "nmm:musicAlbumDisc" : {
        "@id" : "urn:album-disc:%C0:L%01:Disc1",
        "nmm:setNumber" : 1,
        "nmm:albumDiscAlbum" : {
          "@id" : "urn:album:The%20Only%20Place",
          "nmm:albumTitle" : "The Only Place",
          "@type" : "nmm:MusicAlbum",
          "nmm:albumArtist" : {
            "@id" : "urn:artist:Best%20Coast",
            "nmm:artistName" : "Best Coast",
            "@type" : "nmm:Artist"
          }
        },
        "@type" : "nmm:MusicAlbumDisc"
      },
      "nfo:duration" : 164,
      "nfo:codec" : "MPEG",
      "nmm:dlnaMime" : "audio/mpeg",
      "nfo:sampleRate" : 44100,
      "nie:title" : "The Only Place"
    }

We can actually do much better than this: right now there's no @context, so it kind of misses the point of JSON-LD. I need to finish writing a NamespaceManager class that can track all of the prefixes and generate a suitable JSON-LD context, so that instead of stuff like "nie:title" it can just say "title", and the @context will link that to <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title>.

The code is in branch wip/sam/resource: <https://git.gnome.org/browse/tracker/log/?h=wip/sam/resource>. It's still a work in progress, of course, but I think it has pretty much taken shape, so please have a look and give feedback on whether you think this is a sane approach!

Thanks
Sam

[1]: http://json-ld.org/
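The context-generation step described for the planned NamespaceManager can be sketched as a single pass over the JSON-LD tree. In this hypothetical Python version, `compact_with_context` is an invented name, and the prefix map is supplied by the caller rather than tracked automatically. The `nie` namespace IRI is the one quoted above. A bare term is used only when no other namespace has already claimed it, so clashing names stay prefixed.

```python
import json


def compact_with_context(node, prefixes):
    """Hypothetical sketch of a NamespaceManager-style pass: rewrite keys
    such as "nie:title" to bare terms such as "title" and build an
    @context that maps each bare term back to its full IRI."""
    context = dict(prefixes)  # keep prefixes so CURIEs like "nmm:Artist" expand
    owners = {}               # bare term -> IRI it was first assigned to

    def walk(obj):
        if isinstance(obj, list):
            return [walk(item) for item in obj]
        if not isinstance(obj, dict):
            return obj
        out = {}
        for key, value in obj.items():
            new_key = key
            prefix, sep, local = key.partition(":")
            if sep and prefix in prefixes and local not in prefixes:
                iri = prefixes[prefix] + local
                # Shorten only if the bare term is free or already means the
                # same IRI; two ontologies both defining "title" would clash.
                if owners.setdefault(local, iri) == iri:
                    context[local] = iri
                    new_key = local
            out[new_key] = walk(value)
        return out

    return {"@context": context, **walk(node)}


NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
doc = compact_with_context(
    {"@id": "file:///tmp/track.mp3", "nie:title": "The Only Place"},
    {"nie": NIE},
)
print(json.dumps(doc, indent=2))
```

Keeping the prefix definitions in the context means values such as `"@type": "nmm:Artist"` still expand correctly, as long as the caller passes the corresponding prefix.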
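The reply at the top suggests a small service that runs a SPARQL query and returns JSON according to https://www.w3.org/TR/sparql11-results-json/. As a hedged illustration of the encoding half only, here is a Python sketch that turns already-computed query solutions into that format. Running the query (for example against Tracker) and the HTTP front end are left out. The names `Uri`, `encode_term` and `to_results_json` are hypothetical.

```python
import json
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Uri:
    """Marks a value as an IRI rather than a plain string literal."""
    value: str


def encode_term(value):
    # RDF term encodings from the SPARQL 1.1 Query Results JSON Format.
    if isinstance(value, Uri):
        return {"type": "uri", "value": value.value}
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return {"type": "literal", "value": "true" if value else "false",
                "datatype": XSD + "boolean"}
    if isinstance(value, int):
        return {"type": "literal", "value": str(value),
                "datatype": XSD + "integer"}
    return {"type": "literal", "value": str(value)}


def to_results_json(variables, rows):
    """variables: projected variable names without the leading '?'.
    rows: one dict per solution; a missing key or None means unbound."""
    bindings = []
    for row in rows:
        # Unbound variables are simply omitted from a solution object.
        bindings.append({var: encode_term(row[var])
                         for var in variables if row.get(var) is not None})
    return json.dumps({"head": {"vars": list(variables)},
                       "results": {"bindings": bindings}})


print(to_results_json(
    ["song", "title", "duration"],
    [{"song": Uri("file:///tmp/a.mp3"), "title": "The Only Place", "duration": 164},
     {"song": Uri("file:///tmp/b.mp3"), "duration": 200}],
))
```

Because the JSON is produced only at the edge, for whoever asked for it, the internal IPC can keep passing data without any JSON marshalling.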