>>Our next speaker, Yin Qu is from the University
of Texas.>>QU: Yes, Texas A&M.
>>Okay.>>QU: So good morning, everyone. I’m Yin
Qu and my adviser is Andruid Kerne, and we are here to present our work on Interoperable
Metadata Semantics. So what is metadata semantics? So we define metadata semantics as the conjunction
between a document and its content, its descriptions, its operations and its relations. So let’s
look at an example. This is an article from the New York Times. We see that in addition to
its content, we have descriptions like region, and the relations in the form
of hyperlinks, and operations like share and email, so that you can send it to your friends. And
also, there are other kinds of metadata semantics on the web in different forms like RSS feeds,
like patent repositories and digital libraries and products and so on. So these metadata semantics
are useful for applications, but there are some problems with using them in applications.
One problem is that most of the metadata semantics are represented in HTML files, which
you typically cannot use directly in your programs. And different
sources may use different forms, even when they are using the same schema. So if you
want to use metadata semantics from different sources, you need to deal with them individually,
which is not convenient. So this makes it difficult to obtain, to use and to extend
between systems. People have proposed solutions for those problems.
We have metadata standards like Dublin Core and MARC, and others. And so,
you can use metadata semantics written in these standards without knowing
the sources. However, the standardization process is slow, so it
can hardly scale up with the web. Then we have the semantic web approach, which
is a very, very general way to describe metadata semantics. However, the availability is quite
limited, at least on today’s web. We also have alternative semantic web approaches like
Piggy Bank, which uses JavaScript data scrapers to scrape metadata semantics from documents.
And we also have microdata which allows the publisher to embed the semantics in HTML pages.
So our objective in this work is to implement interoperable metadata semantics, which are
obtained from different sources but translated into a consistent representation that you
can use directly in your program and also without knowing the sources. And so, we have
this software called Meta-Metadata, which is a language for authoring wrappers. Wrappers
are defined as metadata schemas plus extraction and presentation rules. It is also an architecture
for extracting and translating metadata semantics from documents to your application and for
taking actions on them, basically operations. And it is also a repository of
all sorts of wrappers, so you don’t have to start from scratch every time, and it is
open-source software, so everybody can use it. Here is the overview of the architecture
and I will go through the details with the use case of integrating search results from
different search engines. Let’s look at the authoring process first. So this is a typical
search result from Google which has a heading, a link and a snippet from the document it
refers to, and we can use this wrapper to model this metadata schema. It’s pretty straightforward
here. You can see the heading, snippet, and link there. With this definition of search
result, we can define a search as a collection of search results. As you’ll
see here, it’s a collection of search results. And this semantic action basically iterates
over each search result and goes to that URL to see if there are new semantics we can
use. It’s basically a crawling action. And the wrappers actually make up a
cross-language type system with support for inheritance. Here, each wrapper corresponds
to a type. So, with the general definition of search, we can derive new subtypes
from search for each individual source. But in this way, we can still handle each of them
as just a search. So we can handle all the metadata semantics
in a consistent way. Here is a wrapper for Google search. We see that in addition
to the metadata schema, there is this selector here, which specifies the MIME type or URL
pattern of documents for this wrapper, so the system knows which wrapper to use for this
kind of document. And we have the extraction rules. Here, we use the XPath parser, but it’s
not limited to XPath. We also have a PDF parser, an HTML parser, an XML parser. These are wrappers
for Yahoo Buzz search, Bing search, and Slashdot search. They are not very different
from each other, except for different extraction rules. And at compilation time, the
types defined by wrappers are translated and compiled into programming language
types, which we call metadata classes. We support different languages. Here
is a Java class for search result, and this is a C Sharp version of search.
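As a rough illustration of what such metadata classes might look like, here is a minimal Java sketch. It is not the actual Meta-Metadata output: the class names (SearchResult, Search, GoogleSearch, BingSearch, MetadataClassSketch), the field names, and the helper allHeadings are all assumptions, modeled on the heading, link, and snippet fields and the subtype-per-source idea described in the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical metadata class for one search result (heading, link, snippet).
class SearchResult {
    private final String heading;
    private final String link;
    private final String snippet;

    SearchResult(String heading, String link, String snippet) {
        this.heading = heading;
        this.link = link;
        this.snippet = snippet;
    }

    String getHeading() { return heading; }
    String getLink() { return link; }
    String getSnippet() { return snippet; }
}

// General search type: a collection of search results.
class Search {
    private final List<SearchResult> searchResults = new ArrayList<>();

    void addSearchResult(SearchResult result) { searchResults.add(result); }

    List<SearchResult> getSearchResults() {
        return Collections.unmodifiableList(searchResults);
    }
}

// Source-specific subtypes inherit the schema. In the talk they differ only
// in their selectors and extraction rules, which are not modeled here.
class GoogleSearch extends Search { }
class BingSearch extends Search { }

class MetadataClassSketch {
    // Treats every subtype as a plain Search and collects the headings in order.
    static List<String> allHeadings(List<Search> searches) {
        List<String> headings = new ArrayList<>();
        for (Search search : searches) {
            for (SearchResult result : search.getSearchResults()) {
                headings.add(result.getHeading());
            }
        }
        return headings;
    }

    public static void main(String[] args) {
        Search google = new GoogleSearch();
        google.addSearchResult(new SearchResult("Result A", "http://example.com/a", "First snippet"));
        Search bing = new BingSearch();
        bing.addSearchResult(new SearchResult("Result B", "http://example.com/b", "Second snippet"));

        List<Search> searches = new ArrayList<>();
        searches.add(google);
        searches.add(bing);
        System.out.println(allHeadings(searches));
    }
}
```

The point of the sketch is that allHeadings never asks which engine produced a result; each subtype is handled as just a Search.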
And these metadata classes serve as a mapping from the metadata semantics in documents
to your application, like active domain objects. At runtime, given a URL,
the system selects the appropriate wrapper for that document, extracts the metadata semantics,
and takes actions on the extracted metadata objects. Those objects are just ordinary programming
language objects like those defined here, so you can use them in your application. Those
objects can also be serialized into different forms like XML or JSON, transferred through
a network or other medium, and deserialized in another system or language. So the semantics
can transfer across system boundaries. And you can also render them into HTML with
microdata embedded. Here is an example of the rendered HTML, in which search results
from different engines are integrated together. And we think that Meta-Metadata transforms
the web into an ecosystem of interoperable metadata semantics, so that you do not rely
on publishers to publish formal semantics for you to use. You can just extract them
and it supports application development by helping you extract metadata semantics
and transfer them between systems. It also comes with a reusable repository, so
you can reuse the schemas and extraction rules defined by other people. It is
currently used in several applications. We released it as open-source software, and it
is funded by Google Summer of Code this year. And we also demo our creativity
support and sensemaking tool, which is called InfoComposer now, and here is a screenshot
of it. This is an image clipping from our second paper, and here are all the
metadata semantics associated with that paper. In the future, we will work on more applications
using the software, especially trans-surface applications, which transfer semantics across
surfaces, and we will promote interoperability with other techniques like RDF. And we encourage
people to try the software and propose new use cases so the software can evolve. So that’s
it. Questions?>>I have one.
>>QU: Yes?>>I’m wondering about the relationship of
your metadata model with the RDF, for instance. And do you have some equivalent relationship
mechanism, such as a subclass property and subproperties relationship, you know?
>>QU: Between?>>Between different type of metadata.
>>QU: Oh, currently, I think the relationships between objects are mostly expressed
in the field names, like the reference relationship between scholarly
articles, the papers. There is a field called references in a scholarly article, which
is a set of other scholarly articles, so this represents the relationship.
And we are working on connecting this kind of relationship with RDF.
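A minimal Java sketch of that idea, where the relationship is carried by a typed field rather than by a separate relationship mechanism. The names ScholarlyArticle, references, ReferencesSketch, and citedTitles are assumptions based on the answer above, not the project's actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical metadata class: the "references" field itself expresses the
// relationship between one scholarly article and the articles it cites.
class ScholarlyArticle {
    private final String title;
    private final List<ScholarlyArticle> references = new ArrayList<>();

    ScholarlyArticle(String title) { this.title = title; }

    String getTitle() { return title; }

    void addReference(ScholarlyArticle cited) { references.add(cited); }

    List<ScholarlyArticle> getReferences() {
        return Collections.unmodifiableList(references);
    }
}

class ReferencesSketch {
    // Follows the references field one step and returns the cited titles.
    static List<String> citedTitles(ScholarlyArticle article) {
        List<String> titles = new ArrayList<>();
        for (ScholarlyArticle cited : article.getReferences()) {
            titles.add(cited.getTitle());
        }
        return titles;
    }

    public static void main(String[] args) {
        ScholarlyArticle paper = new ScholarlyArticle("Paper X");
        paper.addReference(new ScholarlyArticle("Paper Y"));
        paper.addReference(new ScholarlyArticle("Paper Z"));
        System.out.println(citedTitles(paper));
    }
}
```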
>>It will be pretty easy for us to generate RDF, but our inherent approach is based on
the idea that you should use abstract data types to represent complex data, not kludgy
things like triples that, you know, you then have to translate into abstract data
types ultimately to operate on them in programs. We propose and have developed this quite substantial
mechanism for using abstract data types directly to represent very complex type systems and
graphs of interconnected semantic data.>>Oh, interesting. Thank you.
>>Thank you. Other questions? Oh, let’s thank our speaker. Thank you.
>>QU: Thank you.