Let´s freak out

Hi again!

Some things have happened since the last (and the first) time I wrote here. Probably the main one is that, now, I do have a path to follow, a clearer objective to reach:

To produce an implementation of an algorithm to verify if it is possible to determine the type of every node in a graph according to a ShEx Schemma.
Performance testing over the obtained code.
Establishing comparisons between my results and other people´s work over ShEx through tests.

And that’s all, for the moment. Pretty simple, isn’t it? Helly no ;) The previous list of objectives can suggest a lot of unresolved questions. The main ones, summary of all of them in my opinion, are:

What is ShEx?
What is your plan to do such a thing like that?

I prefer to not give you a big answer for the first one. For sure, we will be talking about it in other post. Let´s just say that it is a set of expressions handy for describing how nodes in a graph should be like in order to determine their type. Example: if you are considering a graph of cities and, for you, a city must necessarily have a name, a town hall, two parks and be connected with at least other two cities by road, there is a way to indicate it using something called Shape Expressions (ShEx). For those of you who are familiar with regular expressions, probably ShEx syntax won´t be something shocking. If your curiosity about this topic has not been satisfied with this brief explanation, you can get more into ShEx in the same way I did: looking for info in papers and google J. Just a piece of advice: look for “shape expressions”, not for “shex”. You will find that “shex” results are not linked with Shape Expressions…

For the second question, I think there is not a simple answer. But there are a bundle of requirements that I have to fullfil to be able to reach my goals. Knowledge that I have to improve or acquire:

- ShEx. Theoretical and syntactical aspects and work that has already been done around it.

- Graph theory.

- Set theory.

- Latex. If we want to produce a paper with this, we will have to use “heavy” math. This kind of formulas that you read thinking “oh god… Where are the numbers? What are these symbols? Is this math o abstract art?” The common text editors do not give you an easy way to manage that kind of symbols.

- Scala language. My director and I have decided that Scala can be a good option to face this work.

On this stage of the journey, I am just trying to look for information, understanding it, and exploring tools and technologies (Scala, Latex). Those are bumps on the road that you necessary have to manage, but it is not productive work.

However, there are some logical and mathematical concepts that I have went through that I have tried to “translate” to simpler words (natural language or easy examples), that can be short cuts to understand or remember some ideas.

I’ll put it here, to share it with you and for me, for my future consults.

Graph theory

Outbound neighborhood of a node “n” in a graph, also outg(n): set ot nodes you can reach from “n” using a single edge to move. It is necessary to use an edge. That means that if “n” does not have an edge pointing itself, “n” is not in outg(n).
Labeled outbound neighborhood of a node “n” in a graph, also out-lab(n): the idea is the same of the previous case, but out-lab(n) is not a set of nodes, but a set of pairs {label, node}. All the nodes present in outg(n) will be present in lab-outg(n), forming a pair with the label of the edge that you have to use to reach them from “n”.

Set Theory

- Bag (over a set). Really, a bag is a function, but we can make a first approach to the idea thinking in a bag as a kind of set that allows repetition of elements. As you probably know, a set (a mathematical set) is a group of unordered elements in which only matters if some element is present or not. For example, if we have a set s = {1,2,3} and we try to add a “3” to it, the set doesn´t change, because it already had a “3”. If we add a “4”, then the set does change. However, a bag over “s” does take care of the number of times that a single element appears in a set. For example, with the bag “b” = {|1,2,2,4,4|}, we are representing a kind of set in which we have a “1”, two “2” and two “4”.

This is the idea in which bags are based, but they are not really rare sets, but functions that can be represented in a way similar to sets. With “b” = {|1,2,2,4,4|}, what we are really doing is defining the next function:

o b(1) = 1

o b(2) = 2

o b(4) = 2

o b(anyOtherNumber) = 0

- Bag Union. The union of bags are quiet similar to the union of sets. Probably the best starting point is an example

o {|1,2,2|} bagUnion {|2,3,3|} = {|1,2,2,3,3|}

We add the elements represented in both bags regarding the number of repetitions of each element. I have written “bagUnion” instead of a mathematical symbol because is not easy to find the symbol used to represent that kind of operation. It looks like a Union (U) and a “+” overlapped.

Regular Bag Expressions

We can use RBEs (Regular Bag Expressions) to define types in a graph. Let´s see an example and work over it in order to talk about possibilities of RBEs:

Let´s consider S0 as a schema definition. The schema contains three expressions that looks like Tn à RegularBagExpression. Each sentence defines a type in the schema S0, and that types are T0, T1 and T2. Thinking in graphs, we can say “type of node” instead of only “type”, and that is going to make things easier.

Now, let´s dig into the first sentence, T0. We could translate it in the next way: “If we want to consider a node ‘n’ of type ‘T0’, then node ‘n’ should be like we are describing with this RBE”. At this point we can focus only in deciphering the meaning of the RBEs ;) We can see the next elements:

- Letters: “a”, “b” and “c”. They refer to labels in edges.

- Tn: “T1” and “T2”. They refer to types, the same as they did before.

- Double colon “::” The link an edge label with a type. They always appear between a letter and a Tn.

- Bar: “|”. It means alternative between the elements that appear in its right and its left. In natural language, it means “or”.

- Double bar: “||” It means unordered concatenation. It means that both elements (right and left) should appear, but without specifying the order.

- Parenthesis: “(…..)”. No difference with the classic meaning os parenthesis in math and logic.

- Asterisk: “*”. It means that the element at its left should appear from 0 to infinite times.

With this, we are in conditions to translate in natural language the RBEs of the three sentences:

1º: To be a T0 type, you have to have an edge labeled that point to a T1 and an edge labeled as “b” that points to a T2. And those connections, of course, don´t have an order, we are talking about graphs ;)

2º: To be a T1 type, you have to have an edge labeled “a” with a T1 OR an edge labeled “b” with a T2. And that has to be truth from 0 to infinite times. In a simpler way, to be a T1 you can have as much connections as you want with the form a::T1 and b::T2, being not having a single connection a valid possibility. But ey! no other connections of other forma are allowed.

3º: To be a T2 type. You must have a connection b::t2 OR c::t1. At least one of any type, but no more than one of each type.

And that´s enough for this time. We will be talking about ShEx soon, and not the math that rounds it ;)

See you in the next post!

Let´s freak out

jueves, 20 de marzo de 2014

Direction and short cuts

martes, 18 de febrero de 2014

Starting up