Tuesday, January 13, 2015

Getting started with Semantic Technologies

Ontotext recently launched the Self-Service Semantic Suite, S4 for short. S4 provides a set of low-cost (currently free), on-demand services for text analytics and metadata management in the cloud. It is a great way to get acquainted with Semantic Technologies.


Available S4 services

S4 currently offers the following services:

  1. Text analytics for News content, Biomedical content or Twitter content
  2. Linked Data server with reliable access to the DBpedia, FactForge, GeoNames, WordNet, MusicBrainz, and New York Times datasets 
  3. Self-managed RDF database (GraphDB) on the cloud

Trying out Text analytics

Text analytics, in this context, is about finding out what is important in natural-language texts and using that information.

To try this out, copy some text that contains persons and places, biomedical terms and/or Twitter content. English text gives the best results, but other languages will produce results as well.
Go to the S4 homepage and click on “Demo S4 today for free”. Paste your text in the Text Analytics box; choose whether your text is more News, Biomedical or Twitter oriented, and hit Execute.

The result shows your text with the different types of terms highlighted in different colours. See the example below.



Figure 1. Example of an annotated text

If you hover over an annotated term, extra information appears. For an organisation, for instance, it shows its DBpedia entry (DBpedia is the semantic version of Wikipedia). In my example this makes clear that the article is not about some IMF but about the IMF, and that more information is available at http://dbpedia.org/page/International_Monetary_Fund.

So what’s in it for you? 

This service can provide all kinds of structure and information on topics that can help you to classify, understand, link and enrich information.

Trying out Semantic queries

S4 also lets you try out semantic queries using SPARQL, the query language for semantically stored information such as DBpedia.
Go to the S4 homepage and click on “Demo S4 today for free”. Go to the LOD Access tab and select a query from the pull-down list. Let’s try “Find airports near London”.
The SPARQL query is:

PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX ff: <http://factforge.net/>
PREFIX om: <http://www.ontotext.com/owlim/>

SELECT DISTINCT ?airport ?label ?RR
WHERE {
    dbpedia:London geo-pos:lat ?latBase ;
                   geo-pos:long ?longBase .
    ?airport omgeo:nearby(?latBase ?longBase "50mi") ;
             a dbp-ont:Airport ;
             ff:preferredLabel ?label ;
             om:hasRDFRank ?RR .
}
ORDER BY DESC(?RR)


Even without a SPARQL crash-course, this is quite easy to read:

  • First, some prefixes are defined as shorthand for the longer identifiers
  • The query returns the airport ID, its name (label) and a rank (RR)
  • The latitude and longitude of London are retrieved from DBpedia
  • Only results that have the DBpedia Ontology type “Airport” are selected
  • They must be within 50 miles of London, according to the OWLIM geospatial function “nearby”
  • The results are sorted by rank, highest first
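
To get a feel for what the 50-mile filter does, here is a minimal Python sketch of such a distance check. It assumes a plain great-circle (haversine) distance; the post does not say how the “nearby” function calculates distance internally, so treat this as an illustration only. The function names and the coordinates are my own approximations and do not come from S4.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def is_nearby(base, candidate, max_miles=50.0):
    """True if candidate (lat, long) lies within max_miles of base (lat, long)."""
    return haversine_miles(*base, *candidate) <= max_miles


# Approximate coordinates, for illustration only (not taken from DBpedia)
LONDON = (51.5074, -0.1278)
HEATHROW = (51.4700, -0.4543)
MANCHESTER_AIRPORT = (53.3650, -2.2725)

if __name__ == "__main__":
    for name, position in [("Heathrow", HEATHROW), ("Manchester", MANCHESTER_AIRPORT)]:
        print(name, is_nearby(LONDON, position))
```

With these coordinates Heathrow passes the filter and Manchester Airport does not, which matches what you would expect from the query.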

Click on “Execute” and have a look at the results. Try out some of the other queries as well.
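
The demo page runs the query for you, but the same kind of query can in principle be sent from a script. Below is a hedged Python sketch that follows the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter) and the standard SPARQL JSON results format. The endpoint URL is a placeholder and the sample response is made up; the post does not give the real S4 endpoint address or say whether it needs authentication, so check both before using this.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the address of a real SPARQL endpoint
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(json_text):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    doc = json.loads(json_text)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed rows (needs network access)."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Made-up sample response with the same columns as the airport query
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["airport", "label", "RR"]},
  "results": {"bindings": [
    {"airport": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Airport"},
     "label": {"type": "literal", "value": "Example Airport"},
     "RR": {"type": "literal", "value": "0.5"}}
  ]}
}
"""

if __name__ == "__main__":
    for row in parse_bindings(SAMPLE_RESPONSE):
        print(row["label"], row["RR"], row["airport"])
```

Parsing is kept separate from the network call so you can first experiment with a saved response before pointing `run_query` at a live endpoint.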

So what’s in it for you? 

An enormous wealth of structured information is available for you to use. Were you aware that you could ask Wikipedia such detailed questions? Be aware, though: writing such concise queries, and really using the results, takes extensive knowledge.

Next step

Now that you know that Text analysis and Semantic queries are available, what is your next step in using Semantic technologies?