DB.DBA.TTLP_MT - How to Ingest



Post by bgeary on Wed May 16, 2012 3:00 pm

Hello:

We have decided to switch from AG (AllegroGraph) to Virtuoso because of Virtuoso's user front-end GUI. However, the bottleneck has been
getting a large ingest to work. Our ingest file is in Turtle format and loads successfully into AG, but we have
been unable to load any large file into Virtuoso.

Currently we are using DB.DBA.TTLP_MT.

Code:
SQL> DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/share/virtuoso/vad/Triples'), '', 'http://www.company.com', 1, 2);

*** Error 37000: [Virtuoso Driver][Virtuoso Server]SP029: TURTLE RDF loader, line 29: syntax error
at line 11 of Top-Level:
DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/share/virtuoso/vad/Triples'), '', 'http://www.company.com', 1, 2)


All of our data looks like this:
Code:
<http://www.company.com#md_19360-1a> <http://www.company.com#Username> "User134" .
<http://www.company.com#md_19360-1a> <http://www.company.com#p_app> "Active Directory - Ficticious Context Data" .
<http://www.company.com#md_19360-1o> <http://www.company.com#p_proc> "lsass.exe" .
<http://www.company.com#md_19360-1o> <http://www.company.com#action> "create" .
<http://www.company.com#p_19360-1> <http://www.company.com#Birthdate> "1977-04-02T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://www.company.com#p_19360-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  <http://www.company.com#People> .


Any suggestions as to what the problem is?

Thanks
Brian

Re: DB.DBA.TTLP_MT - How to Ingest

Post by hwilliams on Wed May 16, 2012 8:55 pm

Hi Brian,

What is on line 29 of the RDF dataset being loaded, and does the file parse successfully in an RDF validator?
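One quick way to answer the first question is to pull the reported line, plus a little context, straight out of the file. The following is a minimal Python sketch, assuming the file is UTF-8 text and that the error text has the form shown above (`line 29: syntax error`); `reported_line` and `show_context` are hypothetical helper names, not Virtuoso functions.

```python
import re
from typing import List, Optional

# Matches the line number in Virtuoso loader messages such as
# "SP029: TURTLE RDF loader, line 29: syntax error".
# "at line 11 of Top-Level:" does not match, because a colon must follow the digits.
ERROR_LINE_RE = re.compile(r"line (\d+):")


def reported_line(error_message: str) -> Optional[int]:
    """Return the 1-based line number named in a loader error, or None."""
    match = ERROR_LINE_RE.search(error_message)
    return int(match.group(1)) if match else None


def show_context(text: str, line_no: int, radius: int = 2) -> List[str]:
    """Return the lines within `radius` of line_no, each prefixed by its number."""
    lines = text.splitlines()
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    return [f"{n:>6}: {lines[n - 1]}" for n in range(start, end + 1)]
```

For example, `show_context(Path('/usr/local/share/virtuoso/vad/Triples').read_text(encoding='utf-8'), 29)` (after `from pathlib import Path`) lists lines 27 through 31 of the file from the original post. Keep in mind that the reported line is where the parser gave up, which can be just after the actual mistake.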

For loading large datasets, Virtuoso provides Bulk Loader functions for this purpose, which you should consider using, as they can generically load most RDF dataset formats, e.g., RDF/XML, N3, N-Triples, Turtle, N-Quads, TriG, etc.
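As a sketch of what the bulk-loader route looks like, the Python below only assembles the iSQL statements; it does not talk to the server. It assumes the bulk loader procedures `ld_dir()` and `rdf_loader_run()` and the `DB.DBA.load_list` table as described in the Virtuoso bulk loader documentation (verify them against your Virtuoso version), and that the directory is listed in `DirsAllowed` in `virtuoso.ini`. The directory and graph IRI come from this thread; the `*.ttl` mask presumes the files carry a `.ttl` extension, and `sql_literal` and `bulk_load_script` are hypothetical helpers.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def bulk_load_script(directory: str, file_mask: str, graph_iri: str) -> str:
    """Assemble iSQL statements for a Virtuoso bulk load.

    Assumes the ld_dir()/rdf_loader_run() procedures and the DB.DBA.load_list
    table from the Virtuoso bulk loader documentation.
    """
    statements = [
        # Register every file in `directory` matching `file_mask` for loading into graph_iri.
        f"ld_dir({sql_literal(directory)}, {sql_literal(file_mask)}, {sql_literal(graph_iri)});",
        # Load the registered files.
        "rdf_loader_run();",
        # Persist the loaded data.
        "checkpoint;",
        # List files that failed, with the loader's error message.
        "SELECT * FROM DB.DBA.load_list WHERE ll_error IS NOT NULL;",
    ]
    return "\n".join(statements)


print(bulk_load_script("/usr/local/share/virtuoso/vad", "*.ttl", "http://www.company.com"))
```

Paste the printed statements into an iSQL session (the same `SQL>` prompt used above). Any row returned by the final query names a file that failed, together with its error text.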

Best Regards
Hugh Williams
OpenLink Software

Re: DB.DBA.TTLP_MT - How to Ingest

Post by tthibodeau on Wed May 16, 2012 10:21 pm

Hi, Brian --

Without reviewing your entire load file, the AG content after load, etc., I cannot be certain, but it seems likely that AG is tolerating some TURTLE syntax violations, whether by silently dropping malformed triples, adjusting the triples based on common violations, or otherwise.

I believe this will be handled automatically by the Bulk Loader functions Hugh recommended, though you should rename your source files to end with the filename extension that matches their content (Turtle), i.e.,
Code:
/usr/local/share/virtuoso/vad/Triples.ttl
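A minimal Python sketch of that rename, assuming the file currently has no meaningful extension (as with `.../vad/Triples` in the original command) and that `.ttl` should simply be appended; `turtle_path` and `rename_to_turtle` are hypothetical helper names.

```python
from pathlib import Path


def turtle_path(path: str) -> Path:
    """Return the path with '.ttl' appended, unless it already ends in '.ttl'."""
    src = Path(path)
    if src.suffix == ".ttl":
        return src
    # Append rather than replace, so a name like 'data.v2' becomes 'data.v2.ttl'.
    return src.with_name(src.name + ".ttl")


def rename_to_turtle(path: str) -> Path:
    """Rename the file on disk to its '.ttl' name and return the new path."""
    src = Path(path)
    dst = turtle_path(path)
    if dst != src:
        if dst.exists():
            raise FileExistsError(str(dst))  # refuse to overwrite an existing file
        src.rename(dst)
    return dst
```

Calling `rename_to_turtle('/usr/local/share/virtuoso/vad/Triples')` would produce the path shown above.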


However, if you wish to pursue your manual efforts, the documentation of the DB.DBA.TTLP_MT function is likely to prove helpful -- particularly the flags bitmask argument.

In your command, you've set this to 1, which allows single-quoted and double-quoted strings to contain newlines.

I think you'll find it helpful to set this to 64 (or 65, to preserve your existing bit) -- "Relax TURTLE syntax to include popular violations."


There may be other bits that also prove appropriate for loading this file.
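To make the bitmask arithmetic concrete, here is a small Python sketch. Only the two bit values discussed above (1 and 64) are used, and the constant names are made up for illustration; consult the DB.DBA.TTLP_MT documentation for the other bits.

```python
# Hypothetical names for the two DB.DBA.TTLP_MT flag bits quoted in this thread.
# The documentation defines further bits that are not listed here.
TTLP_NEWLINES_IN_STRINGS = 1  # quoted strings may contain newlines
TTLP_RELAXED_SYNTAX = 64      # relax Turtle syntax to accept popular violations


def combine_flags(*flags: int) -> int:
    """OR individual flag bits together into one bitmask value."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def has_flag(mask: int, flag: int) -> bool:
    """Report whether every bit of `flag` is set in `mask`."""
    return (mask & flag) == flag


current = TTLP_NEWLINES_IN_STRINGS
suggested = combine_flags(current, TTLP_RELAXED_SYNTAX)
print(current, suggested)  # 1 65
```

The resulting 65 would go in the fourth argument of the original call: `DB.DBA.TTLP_MT (file_to_string_output ('/usr/local/share/virtuoso/vad/Triples'), '', 'http://www.company.com', 65, 2);`.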

Ted Thibodeau, Jr. // Senior Support & Evangelism
tthibodeau@openlinksw.com // @TallTed // voice +1-781-273-0900 x32
OpenLink Software, Inc.
10 Burlington Mall Road, Suite 265, Burlington MA 01803

Re: DB.DBA.TTLP_MT - How to Ingest

Post by bgeary on Thu May 17, 2012 8:02 am

Thanks for the help. Let me look at the bulk loader.
I will report back on whether things succeed, along with any commands I ran to make it work.

Brian

Re: DB.DBA.TTLP_MT - How to Ingest

Post by bgeary on Thu May 17, 2012 8:31 am

A) When I loaded the kidehen N3 file, it loaded successfully. Then I added a few triples (lines 28, 29, and 30) from my failing file.
Then I loaded it again, and this time it failed on line 6.

B) This is my line 6:
Code:
<http://www.company.com#md_19360> <http://www.company.com#hasObject>   <http://www.company.com#md_19360-1o>


C) Conclusion: After looking closely at line 6, I realized that some of my lines do not end with the terminating period (".").
That seems to be my problem. It appears AG either works around missing periods or silently drops those lines.
So at least now I know what my problem is, and I verified it: after adding the trailing period, the file loads successfully.
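Checking for this before loading is straightforward when, as here, every statement sits on its own line. The Python sketch below assumes one-statement-per-line, N-Triples-style data like the samples in this thread; it is a line-based heuristic, not a Turtle parser, so it would also flag legitimate multi-line Turtle statements (those continued with `;` or `,`). `unterminated_lines` is a hypothetical helper name.

```python
from typing import List


def unterminated_lines(text: str) -> List[int]:
    """Return 1-based numbers of statement lines that do not end with '.'.

    Heuristic for one-statement-per-line data only: blank lines and '#'
    comment lines are skipped; every other line must end with a period.
    """
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # nothing to check on blank or comment lines
        if not stripped.endswith("."):
            missing.append(number)
    return missing
```

Run against the kidehen2.n3 excerpt posted below (assuming it starts at line 1 of the file), this would report lines 5 through 8, while Virtuoso reported line 6: the statement on line 5 only becomes a syntax error when the next subject begins without a period in between.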


Thanks for the help! Not sure I would have noticed this problem for a few more hours without assistance.
Brian



Code:
/usr/local/share/virtuoso/vad/kidehen2.n3                                         http://www.company.com                                                                           2           2012.5.17 9:23.22 0  2012.5.17 9:23.22 0  0           NULL        37000 SP029: TURTLE RDF loader, line 6: syntax error




Code:
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.or/sioc/ns#User>.
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://www.w3.org/2000/01/rdf-schema#label>  "Kingsley" .
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://rdfs.org/sioc/ns#creator_of> <http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1300> .
<http://www.company.com#md_19360> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  <http://www.company.com#Event>  .
<http://www.company.com#md_19360> <http://www.company.com#hasAction>   <http://www.company.com#md_19360-1a>
<http://www.company.com#md_19360> <http://www.company.com#hasObject>   <http://www.company.com#md_19360-1o>
<http://www.company.com#md_19360> <http://www.company.com#hasSubject>   <http://www.company.com#md_19360-1s>
<http://www.company.com#md_19360> <http://www.company.com#hasTime>   <http://www.company.com#md_19360-1t>

