How to load *.nq files ?

Post by pramoda84 on Mon Sep 28, 2009 12:27 am

Hello,

I am trying to load a very large file into Virtuoso using the isql prompt.
I did not find any Virtuoso utility for loading .nq files (RDF files in N-Quads format).
Since Virtuoso is a quad store, I was expecting a direct command for loading .nq files, but I am unable to find one.
Is there any utility that can load quad files?

Thanks and Best Regards,
Pramod.

Re: How to load *.nq files ?

Post by hwilliams on Tue Sep 29, 2009 11:59 am

Hi

Assuming you mean N-Triples format, you would typically use one of the following functions with the Virtuoso isql command line program to load multiple or large RDF files into the same graph:

http://docs.openlinksw.com/virtuoso/rdf ... thods.html
http://docs.openlinksw.com/virtuoso/fn_ttlp_mt.html
http://docs.openlinksw.com/virtuoso/fn_ ... ml_mt.html

This is what we do for some of the large datasets hosted in Virtuoso, such as DBpedia and the LOD cloud, as detailed at:

http://docs.openlinksw.com/virtuoso/rdf ... uning.html

Best Regards
Hugh Williams
OpenLink Software

Re: How to load *.nq files ?

Post by pramoda84 on Tue Sep 29, 2009 6:52 pm

Well, I guess yes, it's N-Triples, except that besides the triple (S,P,O) each statement contains a graph identifier (S,P,O,G), making it a quad.

Here is a snippet of the file:

Code:
root@harp:/usr/local/virtuoso-opensource/bin/data# head btc-2009-small.nq
<http://www.w3.org/2002/01/tr-automation/tr.rdf> <http://purl.org/dc/elements/1.1/title> "W3C Standards and Technical Reports" <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/PersonalProfileDocument> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by-nc/3.0/> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://purl.org/dc/elements/1.1/title> "Tim Berners-Lee's FOAF file" <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://xmlns.com/foaf/0.1/maker> <http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://xmlns.com/foaf/0.1/primaryTopic> <http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://www.koalie.net/foaf.rdf> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://xmlns.com/foaf/0.1/mbox> <mailto:coralie@w3.org> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://xmlns.com/foaf/0.1/name> "Coralie Mercier" <http://www.w3.org/People/Berners-Lee/card> .


Here is the error I got:

Code:
SQL> DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/virtuoso-opensource/bin/data/btc-2009-small.nq'), '', 'http://localhost:8890/DAV/test');
Connected to OpenLink Virtuoso
Driver: 05.09.3035 OpenLink Virtuoso ODBC Driver
*** Error 37000: [Virtuoso Driver][Virtuoso Server]SP029: TURTLE RDF loader, line 1: syntax error
at line 1 of Top-Level:
DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/virtuoso-opensource/bin/data/btc-2009-small.nq'), '', 'http://localhost:8890/DAV/test')



Now, if I cut the last column, making it a triple again, it loads fine:

Code:
root@harp:/usr/local/virtuoso-opensource/bin# ./isql
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/virtuoso-opensource/bin/data/btc-2009-tiny.nt'), '', 'http://localhost:8890/DAV/test');
Connected to OpenLink Virtuoso
Driver: 05.09.3035 OpenLink Virtuoso ODBC Driver

Done. -- 10 msec.
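
Cutting the last column can be sketched in Python as below. This is only an illustration of the workaround, not a Virtuoso utility: the function name `split_quad` is made up for this example, and the regular expression is a simplified tokenizer that assumes well-formed statements with one statement per line.

```python
import re

# One N-Quads term: an IRI, a blank node, or a quoted literal with an
# optional language tag or datatype. Simplified on purpose.
TERM = re.compile(
    r'<[^>]*>|_:[A-Za-z0-9_-]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)

def split_quad(line):
    """Return (triple_line, graph_term) for one statement.

    graph_term is None when the line carries only three terms.
    """
    terms = TERM.findall(line)
    if len(terms) not in (3, 4):
        raise ValueError("not an N-Triples/N-Quads statement: %r" % line)
    graph = terms[3] if len(terms) == 4 else None
    return " ".join(terms[:3]) + " .", graph

line = ('<http://www.w3.org/People/Berners-Lee/card#cm> '
        '<http://xmlns.com/foaf/0.1/name> "Coralie Mercier" '
        '<http://www.w3.org/People/Berners-Lee/card> .')
triple, graph = split_quad(line)
print(triple)
print(graph)
```

Note that this throws the graph name away unless the caller keeps the second return value, which is exactly the information the .nq file was meant to carry.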



So... any thoughts?

Re: How to load *.nq files ?

Post by hwilliams on Wed Sep 30, 2009 7:24 am

Hi

An RDF statement is a triple (S, P, O) that says something about a subject. A series of such statements is grouped together to form a graph with a common name, which in our case is stored in a quad store (G, S, P, O). The functions we provide take the RDF triples and assign them to a graph whose name is specified in the function call. Thus, as you have found, removing the common graph name from the statements does allow the data to load.
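
To make the triple versus quad distinction concrete, here is a small Python sketch. It is not Virtuoso code: the function names and the in-memory dictionary are purely illustrative. It models a quad store as a mapping from graph name to a set of triples. Loading plain triples assigns all of them to the one graph named in the call, whereas loading quads takes the graph from each statement.

```python
from collections import defaultdict

def load_triples(store, triples, graph):
    """Assign every (s, p, o) triple to the single graph named by the caller."""
    for s, p, o in triples:
        store[graph].add((s, p, o))

def load_quads(store, quads):
    """Take the graph name from the fourth element of each (s, p, o, g) quad."""
    for s, p, o, g in quads:
        store[g].add((s, p, o))

store = defaultdict(set)
card = "http://www.w3.org/People/Berners-Lee/card"

# Triples: the graph comes from the call.
load_triples(
    store,
    [(card, "http://purl.org/dc/elements/1.1/title", "Tim Berners-Lee's FOAF file")],
    "http://localhost:8890/DAV/test",
)

# Quads: the graph comes from the data.
load_quads(
    store,
    [(card + "#cm", "http://xmlns.com/foaf/0.1/name", "Coralie Mercier", card)],
)

print(sorted(store))
```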

I have not come across any products that load quad statements; all expect triples that are then assigned to a graph or even to multiple named graphs. So is this a format you have devised yourself, or a documented format we are not aware of?

Best Regards
Hugh Williams
OpenLink Software

Re: How to load *.nq files ?

Post by pramoda84 on Wed Sep 30, 2009 2:43 pm

Hi,

The fourth IRI/literal can be used to carry provenance information: (S, P, O, I), where I is the provenance information.
Please have a look at this page, which describes the N-Quads format:
http://sw.deri.org/2008/07/n-quads/

The dataset I am loading into Virtuoso is from the BTC (Billion Triple Challenge) at ISWC.
It would be good to have functionality for loading these .nq files directly.

Or is Virtuoso already capable of loading these .nq files directly?

Re: How to load *.nq files ?

Post by iv_an_ru on Thu Oct 01, 2009 5:23 am

To load an N-Quads file, use the DB.DBA.TTLP(), DB.DBA.TTLP_MT() or DB.DBA.TTLP_MT_FROM_LOCAL_FILE() function with the fourth argument (mode) equal to 512 (or at least with bit 512 set; bits 1 to 64 can also be added as needed).
The feature is not documented, but it's there. So it's probably time to add a line to the User's Guide.

(To keep the answer complete, these are the other possible bits:
1 - Single-quoted and double-quoted strings may contain newlines.
2 - Allows bnode predicates (but the SPARQL processor may ignore them!).
4 - Allows variables, but triples with variables are ignored.
8 - Allows literal subjects, but triples with them are ignored.
16 - Allows '/', '#', '%' and '+' in the local part of a QName ("QName with path").
32 - Allows invalid symbols between '<' and '>', i.e. in relative IRIs.
64 - Relaxes TURTLE syntax to include popular violations.
128 - Tries to recover from lexical errors as much as possible.
256 - Allows TriG syntax, thus loading data into more than one graph.
)
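
Since the mode argument is a bitmask, the values listed above are combined with a bitwise OR. A minimal Python sketch of that arithmetic follows. The table repeats the bit values from this post plus 512 for N-Quads; the helper names `combine` and `explain` are invented for this example and are not part of Virtuoso.

```python
# Bit values as listed in the post; the descriptions are abbreviated.
TTLP_FLAGS = {
    1: "strings may contain newlines",
    2: "allow bnode predicates",
    4: "allow variables (such triples are ignored)",
    8: "allow literal subjects (such triples are ignored)",
    16: "allow '/', '#', '%' and '+' in the local part of a QName",
    32: "allow invalid symbols between '<' and '>'",
    64: "relax TURTLE syntax to include popular violations",
    128: "try to recover from lexical errors",
    256: "allow TriG syntax",
    512: "allow N-Quads syntax",
}

def combine(*bits):
    """OR the given flag bits together into one mode value."""
    mode = 0
    for bit in bits:
        if bit not in TTLP_FLAGS:
            raise ValueError("unknown flag bit: %r" % (bit,))
        mode |= bit
    return mode

def explain(mode):
    """List the descriptions of all flag bits set in mode, lowest bit first."""
    return [text for bit, text in sorted(TTLP_FLAGS.items()) if mode & bit]

mode = combine(512, 64)
print(mode)  # 576
for text in explain(mode):
    print("-", text)
```

So a load that accepts N-Quads and also tolerates common Turtle violations would pass 576 as the fourth argument instead of 512.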

Re: How to load *.nq files ?

Post by hwilliams on Thu Oct 01, 2009 8:49 am

Hi

Note that with Ivan's suggestion I have now been able to load N-Quads datasets into my local Virtuoso 5.x and 6.x instances:

Code:
$ more nquad.nq
<http://www.w3.org/2002/01/tr-automation/tr.rdf> <http://purl.org/dc/elements/1.1/title> "W3C Standards and Technical Reports" <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/PersonalProfileDocument> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by-nc/3.0/> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://purl.org/dc/elements/1.1/title> "Tim Berners-Lee's FOAF file" <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://xmlns.com/foaf/0.1/maker> <http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card> <http://xmlns.com/foaf/0.1/primaryTopic> <http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://www.koalie.net/foaf.rdf> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://xmlns.com/foaf/0.1/mbox> <mailto:coralie@w3.org> <http://www.w3.org/People/Berners-Lee/card> .
<http://www.w3.org/People/Berners-Lee/card#cm> <http://xmlns.com/foaf/0.1/name> "Coralie Mercier" <http://www.w3.org/People/Berners-Lee/card> .

$ /usr/local/virtuoso-opensource/bin/isql
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> DB.DBA.TTLP_MT (file_to_string_output ('./nquad.nq'), '', 'http://localhost:8890/DAV/test', 512);
Connected to OpenLink Virtuoso
Driver: 05.11.3040 OpenLink Virtuoso ODBC Driver

Done. -- 48 msec.
SQL> sparql select distinct ?g where {graph ?g {?s ?p ?o}};
g
VARCHAR
_______________________________________________________________________________

http://www.openlinksw.com/schemas/virtrdf#
http://www.w3.org/People/Berners-Lee/card
gr
http://vpn191.usnet.private:8889/DAV

4 Rows. -- 17 msec.

SQL> sparql select * from <http://www.w3.org/People/Berners-Lee/card> where {?s ?p ?o};
s                                                                                 p                                                                                 o
VARCHAR                                                                           VARCHAR                                                                           VARCHAR
_______________________________________________________________________________

http://www.w3.org/2002/01/tr-automation/tr.rdf                                    http://purl.org/dc/elements/1.1/title                                             W3C Standards and Technical Reports
http://www.w3.org/People/Berners-Lee/card                                         http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://xmlns.com/foaf/0.1/PersonalProfileDocument
http://www.w3.org/People/Berners-Lee/card                                         http://xmlns.com/foaf/0.1/primaryTopic                                            http://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card                                         http://purl.org/dc/elements/1.1/title                                             Tim Berners-Lee's FOAF file
http://www.w3.org/People/Berners-Lee/card                                         http://xmlns.com/foaf/0.1/maker                                                   http://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card                                         http://creativecommons.org/ns#license                                             http://creativecommons.org/licenses/by-nc/3.0/
http://www.w3.org/People/Berners-Lee/card#cm                                      http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://xmlns.com/foaf/0.1/Person
http://www.w3.org/People/Berners-Lee/card#cm                                      http://xmlns.com/foaf/0.1/mbox                                                    mailto:coralie@w3.org
http://www.w3.org/People/Berners-Lee/card#cm                                      http://xmlns.com/foaf/0.1/name                                                    Coralie Mercier
http://www.w3.org/People/Berners-Lee/card#cm                                      http://www.w3.org/2000/01/rdf-schema#seeAlso                                      http://www.koalie.net/foaf.rdf

10 Rows. -- 7 msec.
SQL>
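
When many .nq files need loading, the statement used in the session above can be generated rather than typed. The following Python sketch only builds the text of that call to paste or pipe into isql; it does not talk to Virtuoso. The function names `sql_literal` and `ttlp_mt_statement` are hypothetical, and the helper deliberately rejects values containing quotes or backslashes instead of guessing at escaping rules.

```python
def sql_literal(value):
    """Wrap a plain string in single quotes for use as a SQL literal.

    Values with quotes or backslashes are refused rather than escaped,
    because this sketch makes no assumption about the escaping rules.
    """
    if "'" in value or "\\" in value:
        raise ValueError("unsupported character in %r" % value)
    return "'" + value + "'"

def ttlp_mt_statement(path, default_graph, mode=512):
    """Build the DB.DBA.TTLP_MT call shown in the isql session above.

    mode defaults to 512, the N-Quads bit described earlier in the thread.
    """
    return "DB.DBA.TTLP_MT (file_to_string_output (%s), '', %s, %d);" % (
        sql_literal(path),
        sql_literal(default_graph),
        mode,
    )

print(ttlp_mt_statement("./nquad.nq", "http://localhost:8890/DAV/test"))
```

The graph name passed here is only a default for the call; as the SPARQL output above shows, the statements ended up in the graph named by their fourth column.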


We shall be documenting the missing 512 value for the TTLP_*() load functions, probably using your sample dataset as a usage example :-)

Best Regards
Hugh Williams
OpenLink Software

Re: How to load *.nq files ?

Post by pramoda84 on Thu Oct 01, 2009 9:38 pm

Hello,

Thank you for all your suggestions.
I am using virtuoso-opensource-5.0.9 and just wanted to confirm whether this version has the feature you mentioned.

If it does not, please let me know which version has this support so that I can build it on my system.

Thanks,
Pramod.

Re: How to load *.nq files ?

Post by hwilliams on Fri Oct 02, 2009 5:13 am

Hi

Why, have you tried it with 5.0.9 and found that it doesn't work? I would expect it to, but can't say for sure. A 5.0.12 snapshot archive, which is what I used for my testing and which does work, can be downloaded from:

ftp://download.openlinksw.com/support/v ... 916.tar.gz

Should you have problems with 5.0.9 ...

Best Regards
Hugh Williams
OpenLink Software

Re: How to load *.nq files ?

Post by pramoda84 on Fri Oct 02, 2009 3:57 pm

Hi,

Your suggestions were very helpful.
I deployed the Virtuoso version you pointed out and loaded sample data in N-Quads format.

Thanks a lot,
Pramod.