9^{th }Haifa Workshop on Interdisciplinary Applications of Graph Theory,

Combinatorics and Algorithms

Wednesday-Thursday, May 20-21, 2009

Room 570, Education Building, 5th Floor

**Diameter and Center Computations in Graphs and **

**Networks with Applications to Computational Geometry**

*Michel Habib*

LIAFA, CNRS & Université Paris Diderot, France

This email address is being protected from spambots. You need JavaScript enabled to view it.

Starting form a very practical question: *what is the best algorithm available to compute or to approximate the diameter of a huge network*? At first, to compute the diameter of a given graph, it seems necessary to compute all-pairs shortest paths, which is not known to be computable in linear time, see [1] and [10].

We first present the well-known 2-sweap Breadth-First Search approximation procedure and some experimental results of its randomized version on huge graphs [2], [4], [5], [11]. In order to explain the efficiency of this 2-sweap **linear**** **** time** procedure, we study its properties on various graph classes and its relation with classical graph parameters such as: k-chordality or tree-length [7].

We then emphasize on a notion introduced by M. Gromovin 1987 [8], namely the δ-hyperbolic metric spaces via a simple 4-point condition: for any four points *u, v, w, x* the two larger of the distance sums *d(u,v)+d(w,x), d(u,w)**)+ ** d(v,x), d(u,x)= d(v,w)* differ by at most 2δ. δ-hyperbolic metric spaces play an important role in geometric group theory, geometry of negatively curved spaces, and have recently become of interest in several areas of computer science including computational geometry and networking.

A connected graph *G* = (*V,E*) equipped with its standard graph metric *d _{G}* is δ-hyperbolic if the metric space (

*V, d*) is δ-hyperbolic. This very interesting notion captures the distance from a graph to a tree in a metric way. Moreover δ-hyperbolicity is polynomially computable and easy to approximate.

_{G}We survey some results about δ-hyperbolicity of graphs, and in particular we show that δ-hyperbolicity generalizes tree-length with respect to the2-sweap algorithm. We provide some experimental results on real data (i.e. graphs extracted from the Internet), showing that δ-hyperbolicity can be a very practical measure [9].

Some applications to computational geometry are also presented [3] and we finish with a comparison on the computational complexity of the diameter and the center of a given graph, listing some open questions [12].

**References**

[1] T. M. Chan, All-pairs shortest paths for unweighted undirected graphs in *o*(*mn*) time, SODA, 2006.

[2] V. Chepoi, F. Dragan, *A linear time Algorithm for finding a central vertex of a chordal graph*, ESA (1994) 159-170.

[3] V. Chepoi, F. Dragan, B. Estellon, M. Habib, Y. Vaxès, *Diameters, centers, and approximation trees of δ-hyperbolic spaces and graphs*, ACM Computational Geometry Conf.(2008) 59-68.

[4] D.G. Corneil, F. Dragan, M. Habib, C. Paul, *Diameter determination on** **restricted graph families*, Discrete Applied Mathematics 113 (2001) 143-166.

[5] D.G. Corneil, F. Dragan, E. Khler, *On the power of BFS to determine a** ** graph's diameter*, Networks, vol. 42 (4), (2003) 209-223.

[6] F. Dragan, *Estimating all pairs shortest paths in restricted graph families:** ** a unified approach*, J. of Algorithms 57 (2005) 1-21.

[7] Y. Dourisboure, C. Gavoille, *Tree-decomposition of graphs with small diameter bags*, Discrete Math, 307 (2007) 208-229.

[8] M. Gromov, *Hyperbolic Groups*, in Essays in Group Theory (S.M. Gersten, ed.), MSRI Series 8 (1987) 75-263.

[9] R. Kleinberg, *Geographic routing Using hyperbolic space*, INFOCOM (2007), 1902-1909.

[10] D. Kratsch, J. Spinrad, *Between O(mn) and O(n ^{α}*), SODA (2003) 709-716.

[11] C. Magnien, M. Latapy, M. Habib, *Fast Computation of Empirically Tight** ** Bounds for the Diameter of Massive Graphs*, ACM J. of Experimental Algorithms, 13 (2008).

[12] ... M. Parnas, D. Ron, *Testing the diameter of Graphs*, Random Structures & Algorithms, Vol. 20, (2002), 165-183.** **

** **

**Fault-Tolerant Spanners for General Graphs**

*S. Chechik*, M. Langberg**, D. Peleg*, L. Roditty****

*The Weizmann Institute of Science, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it. ; This email address is being protected from spambots. You need JavaScript enabled to view it.

**Open University, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

***Bar-Ilan University, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

The paper concerns graph spanners that are resistant to vertex or edge failures. Given a weighted undirected *n*-vertex graph *G=(V, E) *and an integer *k*≥1, the subgraph *H=(V, E')*, *E'* Í* E*, is a *spanner* of stretch *k* (or, a *k*-spanner) of *G* if d_{H}*(u, v) *£* k *· d_{G}*(u, v)* for every *u, v *Î* V*, where d_{G'}*(u, v)* denotes the distance between *u* and *v* in *G'*. Graph spanners were extensively studied since their introduction over two decades ago. It is known how to efficiently construct a (2*k*-1)-spanner of size *O(n*^{1+1/k}*)*, and this size-stretch tradeoff is conjectured to be tight.

The notion of fault tolerant spanners was introduced a decade ago in the geometric setting [Levcopoulos et al., STOC'98]. A subgraph *H* is an *f*-vertex fault tolerant *k*-spanner of the graph *G* if for any set *F* Í* V* of size at most *f* and any pair of vertices *u, v* Î* V* \ *F*, the distances in *H* satisfy d_{H\F}*(u,v)* £ *k* · d_{G \ F}*(u,v)*. Levcopoulos et al. presented an efficient algorithm that given a set *S* of *n* points in ú^{d}, constructs an *f*-vertex fault tolerant *geometric* (1+e)-spanner for *S*, that is, a sparse graph *H* such that for every set *F *Í* S* of size *f* and any pair of points *u, v *Î* S \ F ,* d_{H \ F}*(u,v) *£* *(1+e)|*uv*|, where |*uv*| is the Euclidean distance between *u* and *v*. A fault tolerant geometric spanner with optimal maximum degree and total weight was presented in [Czumaj & Zhao, SoCG'03]. This paper also raised as an open problem the question whether it is possible to obtain a fault tolerant spanner for an arbitrary undirected weighted graph.

The current paper answers this question in the affirmative, presenting an *f*-vertex fault tolerant (2*k*-1)-spanner of size *O(f*^{ 2}* k ^{f}*

^{+1}

*·*

*n*

^{1+1/k}

*log*

^{1}

^{-}

^{1/k}

*n)*. Interestingly, the stretch of the spanner remains unchanged while the size of the spanner only increases by a factor that depends on the stretch

*k*, on the number of potential faults

*f*, and on logarithmic terms in

*n*. In addition, we consider the simpler setting of

*f-edge*fault tolerant spanners (defined analogously). We present an

*f-edge*fault tolerant 2

*k*-1 spanner with edge set of size

*O*(

*f*·

*n*

^{1+1/k}) (only

*f*times larger than standard spanners). For both edge and vertex faults, our results are shown to hold when the given graph

*G*is weighted.

Work supported in part by The Open University of Israel's Research Fund (grant no. 46109).

** **

**Linear Recurrences of Combinatorial Functions**

*Tomer Kotek*

Technion, Israel Institute of Technology, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

We say a function f : ù®ù is a combinatorial function if it has a combinatorial interpretation. A typical non-trivial example is given by A. Cayley's Theorem from 1889, which says that *T* (*n*) = *n ^{n}*

^{-}

^{2}can be interpreted as the number of labeled trees on

*n*vertices. Another example are the Catalan number

*C*, which count the number of valid arrangements of

_{n}*n*pairs of parentheses. We use and extend a framework first introduced by C. Blatter and E. Specker [1], to study such functions. Let

*R*={

*R*

_{1}, ¼,

*R*} be a set of relation symbols with

_{s}*R*of arity

_{i}*r*(

*i*) (i.e.,

*R*Í [

_{i}*n*]

^{r}^{(i)}, C a class of finite

*R*-structures closed under isomorphisms, and

* sp_{C}* = |{(

*R*

_{1}, ¼,

*R*) Î [

_{s}*n*]

^{r}^{(i) }´ ¼ ´ [

*n*],

*R*

_{1}, ¼,

*R*) Î C } |

_{s}A function f : ù®ù is called a Specker function if there is a finite set of relation symbols`*R* and a class of`*R*-structures C such that *f*(*n*)= *sp_{C}*(

*n*).

C. Blatter and E. Specker, [1, 2, 3], used a logical method to study Specker functions. Monadic second order logic, *MSOL*, is an extension of First order logic, *FOL*. *MSOL* allows quantification over subsets of elements from the universe [*n*]. We say a class C of`*R*-structures is *MSOL*-definable if there is an *MSOL* formula f such that *M* Î C iff *M* |=f. We say a Specker function *sp_{C}*(

*n*) is

*MSOL*-definable if C is

^{k}*MSOL*-definable and consists of`

*R*-structures with relations of arity at most

*k*.

**Example**. The Bell number *B _{n}*, the number of partitions of [

*n*], is an

*MSOL*

^{2}definable Specker function, given by

*B*=

_{n}*sp*(

_{C}*n*) for

C = {* R | R *is an equivalence relation on [*n*]} ,

as the property of being an equivalence relation is definable in *MSOL* (in fact, also in *FOL*). Blatter and Specker proved the following remarkable but little known theorem.

**Theorem** (Specker-Blatter Theorem). *For any Specker function sp_{C} which MSOL*

^{2}

*-definable, sp*

**satisfies a modular linear recurrence relation with constant coefficients**_{C}*sp_{C}(n) *º

_{j}*sp*

**(n**_{C}*-*

*j) (mod m)*

*for every m *Î ù*, and hence is modularly ultimately periodic modulo m for each m. *

Continuing the line of investigation begun by Blatter and Specker, we use the notion of order invariance to deal with linear and modular linear recurrences. A formula f is said to be order invariant if, in addition to the relations in the`*R*-structures, it uses a relation *R*_{<} which is interpreted as a linear order of [*n*], and it is invariant on the order (i.e., it agrees on all interpretations of *R*_{<} as a linear order). The logic *OIMSOL* consists of order invariant *MSOL* formulas. The notions of *OIMSOL*-definable class of structures C and *OIMSOL*-definable Specker function *sp_{C}(n) *carry from

*MSOL*.

We further define a broader class of Specker functions. Let *ops _{D} *(

*nR*

_{<1}) be defined as

*ops*(

_{D}*nR*

_{<1}) =

|{( *R*_{1}, ¼, *R _{s}* ) Í [

*n*]

^{r}^{(i) }´ ¼ ´ [

*n*]

^{ }

^{r}^{(s)}| [n],

*R*

_{<1}, R

_{1}, ¼,

*R*) Î

_{s}*D*} |

A function f : ù®ù called an order invariant Specker function if there is a finite set of relation symbols`*R* and a class of ordered R-structures *D** *such that for all linear orders <_{1} and <_{2} of [*n*] we have *f*(*n*) = *osp_{C}*(

*n*, <

_{1}) =

*osp*(

_{C}*n*, <

_{2}). We will be interested in order invariant

*MSOL*

^{1}-definable Specker functions. It is worth-while to note every

*OIMSOL*-definable Specker function

^{k}*sp*(

_{C}*n*) is an order invariant

*MSOL*-definable Specker function

^{k}*osp*(

_{D}*n*, <

_{1}).

We give a characterization of Specker functions which satisfy a linear recurrence relation over

**Theorem**. *Let f be a Specker function. Then f satisfies a linear recurrence relation over* *iff* *f* = *f*_{1 }- *f*_{2} *is the difference of two order invariant* *MSOL*^{1}-*Specker functions. *

In the proof of this theorem it is convenient to introduce Specker polynomials, which are a special case of graph polynomials where graphs are replaced by linear orders. Next we show that the Specker-Blatter Theorem cannot be extended to order invariant Specker functions which are definable in *MSOL*^{2}. More precisely:

**Proposition**. *The Specker function f(n)= C _{n}, where C_{n} are the Catalan numbers, is order invariant and MSOL*

^{2}

*-definable, and is not ultimately periodic modulo 2.*

However, if we require that the defining formula *f* of a Specker function is itself order invariant, i.e. *f* Î *OIMSOL*^{2}, the Specker-Blatter Theorem still holds.

**Theorem**. *Let f be an OIMSOL*^{2}*-definable Specker function. Then, for all m* Î *, the function f satisfies a modular linear recurrence relation. *

Based on joint work with J. A. Makowsky.

** **

**References**

[1] C. Blatter and E. Specker. Le nombre de structures finies d'une théorie à caractère fin. *Sciences Mathématiques*, Fonds National de la Recherche Scientifique, Bruxelles, pages 41- 44, 1981.

[2] C. Blatter and E. Specker. Recurrence relations for the number of labeled structures on a finite set. In E. Börger, G. Hasenjaeger, and D. Rödding, editors, In *Logic and Machines: Decision Problems and Complexity*, volume 171 of Lecture Notes in Computer Science, pages 43-61. Springer, 1984.

[3] E. Specker. Application of logic and combinatorics to enumeration problems. In E. Börger, editor, *Trends in Theoretical Computer Science,* pages 141-169. Computer Science Press, 1988. Reprinted in: Ernst Specker, Selecta, Birkhäuser 1990, pp. 324-350.

**Approximating the Number of Network Motifs**

*Mira Gonen* Dana Ron** Yuval Shavitt*** *

Tel-Aviv University, Ramat Aviv, Israel

*This email address is being protected from spambots. You need JavaScript enabled to view it.; **This email address is being protected from spambots. You need JavaScript enabled to view it.; ***This email address is being protected from spambots. You need JavaScript enabled to view it.

World Wide Web, the Internet, coupled biological and chemical systems, neural networks, and social interacting species, are only a few examples of networks that contain characteristic patterns, termed *network* *motifs* [16] or *graphlets* [20], which occur far more often than in randomized networks with the same degree sequence.

There are quite a few works that deal with detecting network motifs and of counting their number in polynomial time. One of the most elegant techniques devised is *color-coding*, introduced in [3], and further applied in [4, 5, 2, 1]. The color coding technique is a building block in some of the algorithms presented in our work. Other related work in this category includes [15, 21, 10, 14]. In spite of all these works, a need for more efficient algorithms arises when the input graph is very large, as is indeed the case in many applications of motif counting.

In addition, a new systematic measure of a network's *local* topology, namely, counting the number of motifs a vertex is part of, was recently suggested [19, 13] as a method to classify nodes in the network. The promise is that the distribution of motifs a vertex participates in is an indication of its function in the network. Therefore, counting the number of network motifs *a vertex is part of* provides a major challenge. However, the only existing result is a polynomial algorithm for approximating the local number of triangles shown in [6].

Our first set of results is related to motif local counting. We present several algorithms with time complexity that, for the first time, approximate the local number of occurrences of *k*-length cycles, *k*-length cycles with a chord, and (*k**-*1)-length paths, where *k* = *O* (log *n*).We also provide algorithms with time complexity that, for the first time, approximate the local number of occurrences of all motifs of size of at most four. Our second set of results relates to general approximate motif counting. We design for the first time, *sublinear* algorithms for approximating the number of copies of certain constant-size subgraphs in a graph *G*. That is, our algorithms do not read the whole graph, but rather query parts of the graph. The main focus of our work is on the basic problem of counting the number of length-2 paths and more generally on counting the number of stars of a certain size. Specially, we design an algorithm that, given an approximation parameter 0 < e < 1 and query access to a graph *G*, outputs an estimate such that with high constant probability, (1-e) (*G*) £ £ (1+e) (*G*), where (*G*) denotes the number of stars of size *s*+1 in the graph. The expected query complexity and running time of the algorithm are We prove that our algorithm is tight up to polylogarithmic factors in *n* and the dependence on Namely, we show that:

- Any (multiplicative) approximation algorithm for the number of
*s*-stars must perform queries. - Any constant-factor approximation algorithm for the number of
*s*-stars must perform queries when the number of*s*-stars is*O*(*n*).^{s} - Any constant-factor approximation algorithm for the number of
*s*-stars must perform queries when the number of*s*-stars is

Our work extends the work of [11] and [12] on approximating the number of edges (or average degree) in a graph. Combined with these results, our result can be used to obtain an estimate on the variance of the degrees in the graph and corresponding higher moments. Another related line of work deals with approximating other graph measures (such as the weight of a minimum spanning tree) in sublinear time and includes [7, 9, 8, 18, 17].

In addition, we give some (negative) results on approximating the number of triangles and on approximating the number of length-3-paths in sublinear time.

This work is based on a joint work with Yuval Shavitt presented in the *Proceedings of the **6*^{th}* International Workshop on Algorithms and Models fort the Web-Graph (WAW 2009),* and on a joint work with Dana Ron and Yuval Shavitt submitted to the *13*^{th}* International Workshop on Randomized Techniques in Computation (RANDOM 2009).*

** **

**References**

[1] N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and S. C. Sahinalp. Biomolecular network motif counting and discovery by color coding. *Bioinformatics, *24(13):241-249, 2008.

[2] N. Alon and S. Gutner. Balanced families of perfect hash functions and their applications. In *Proceedings of the **34*^{th}* International Colloquium on Automata**,** Languages and Programming (ICALP)*, pp.435-446, 2007.

[3] N. Alon, R. Yuster and U. Zwick. Color coding. *Journal of the ACM, *42:844-856, 1995.

[4] N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. *Algorithmica, *17:209-223, 1997.

[5] V. Arvind and V. Raman. Approximation algorithms for some parameterized counting problems. In *Proceedings of the **13*^{th}* International Symposium on Algorithms and Computation (ISAAC)*, pp.453-464, 2002.

[6] L. Becchetti, P. Boldi, C. Castillo and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In *Proceeding of the **14*^{th}* ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), *pp.16-24, 2008.

[7] B. Chazelle, R. Rubinfeld and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. *SIAM Journal on Computing, *34(6): 1370-1379, 2005.

[8] A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld and C. Sohler. Approximating the weight of the euclidean minimum spanning tree in sublinear time. *SIAM Journal on Computing*, 35(1): 91-109, 2005.

[9] A. Czumaj and C. Sohler. Estimating the weight of metric minimum spanning trees in sublinear time. In *Proceedings of the Thirty-Sixth Annual ACM Symposium on the Theory of Computing, (STOC)*, pages 175-183, 2004.

[10] R. Duke, H. Lefmann and V. Rödl. A fast approximation algorithm for computing the frequencies of subgraphs in a given graph. *SIAM Journal on Computing*, 24(3): 598-620, 1995.

[11] U. Feige. On sums of independent random variables with unbounded variance, and estimating the average degree in a graph. *SIAM Journal on Computing, *35(4): 964-984, 2006*.*

[12] O. Goldreich and D. Ron. Approximating average parameters of graphs. *Random Structures and Algorithms*, 32(4): 473-493, 2008.

[13] M. Gordon, E. A. Livneh, R. Y. Pinter and E. Rubin. Elucidating protein function using graphlet degree vectors in protein-protein interactions networks. Under review.

[14] J. Grochow and M. Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In *Proceedings of the **11*^{th}* Annual International Conference Research in Computational Molecular Biology, (RECOMB)*, pages 92-106, 2007.

[15] N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. *Bioinformatics. *20(11): 1746-1758, 2004.

[16] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii and U. Alon. Network motifs: Simple building blocks of complex networks. *Science,* 298: 824-827, 2002.

[17] H.N. Nguyen and K. Onak. Constant-time approximation algorithms via local improvements. In *Proceedings of the Forty-Ninth Annual Symposium on Foundations of Computer Science (FOCS), *pages 327-336, 2008.

[18] M. Parnas and D. Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. *Theoretical Computer Science*, 381(1-3): 183-196, 2007.

[19] N. Przulj. Biological network comparison using graphlet degree distribution. *Bioinformatics*, 23(2):e177-e183, 2007.

[20] N. Przulj, D. G. Corneil and I. Jurisica. Modeling interactome: Scale-free or geometric? *Bioinformatics, *20(18): 3508-3515, 2004.

[21] S. Wernicke. Efficient detection of network motifs. *IEEE/ACM Transactions on Computational Biology and Bioinformatics, *3(4):347-359, October 2006.

**Fishing Traffic Under Attack****: Overcoming DDoS Attacks by Server-Controlled Traffic Filtering**

*Polina Zilberman*

Ben-Gurion University, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

As more and more services are provided by servers via the Internet, Denial-of-Service (DoS) attacks pose an increasing threat to the Internet community. A DoS attack overloads the victim server with a large volume of adverse requests, thereby rendering the server unavailable to well-behaved users. A Distributed DoS (DDoS) attack is an extension of the DoS attack. DDoS occurs when multiple compromised computers, scattered all over the Internet, flood the bandwidth or other resources of a target server. Unfortunately, there is no "silver bullet" solution for protecting servers from DDoS attacks, since DDoS traffic may be indistinguishable from the legitimate user traffic.

In this paper, we address the DDoS problem by shaping the server incoming traffic. We provide the server with means for shaping its incoming traffic in order to maximize its resulting benefits from the legitimate user traffic. The traffic shaping is implemented by distributed algorithms that allow servers to dynamically filter their incoming traffic based on a distributed policy. The distributed filtering policy is computed by the server using, for instance, machine learning algorithms, and is based on the server's actual utility of the arriving traffic. Such a filtering policy consists of a set of traffic classification rules and the corresponding amounts of traffic that match each rule. For each classification rule, the corresponding traffic amount, measured in bandwidth units, is defined by the server. The policy allocates more bandwidth to high-priority traffic. Complying with the derived policy, limits the bandwidth consumption of the potential DDoS traffic. Thus, the proposed algorithms defend the server against DDoS attacks and simultaneously ensure that it continues to receive valuable user traffic.

The distributed filtering algorithm is enforced by the Internet Service Provider's (ISP) / Network Service Provider's (NSP) routers when the server is being overloaded with traffic. The goal is to maximize the amount of filtered traffic, corresponding to the distributed policy, forwarded to the server from the ISP's/NSP's network. We assume that the server leases communication lines from the ISP/NSP, so that the overall capacity of the leased lines, that connect the server to its users, is limited to the server's processing capacity. In addition, the ISP/NSP supports a notification procedure in which the server is provided with a summary on the traffic that has been discarded by the ISP's/NSP's routers.

The first proposed distributed filtering algorithm relies on complete collaboration among the ISP/NSP routers and delivers the best possible traffic mix to the server, in polynomial time. In contrast, the second filtering algorithm assumes no collaboration among the routers, and each router performs its filtering based on its incoming traffic. We study the worst-case performance of the second algorithm for different network topologies. Using competitive analysis, we prove that in the worst-case scenario the second algorithm delivers a significant fraction of the best available traffic. This fraction depends on the number of filtering routers in the network's periphery. In the third distributed filtering algorithm we assume an intermediate level of collaboration among the ISP/NSP routers.

Joint work with Shlomi Dolev, Yuval Elovici and Alex Kesselman.

Partially supported by Deutsche Telekom, FRONTS EU project, and Lynne and William Frankel Center for Computer Science.

*Marina Biberstein, Yuval Harel, Andrei Heilper*

IBM Haifa Research Lab, Haifa University Campus, Israel

{biberstein, harely, heilper}@il ibm com

Cell/B E is a heterogeneous multicore processor that was designed for efficient execution of parallel and vectorizable applications with high computation and memory requirements. The transition to multicores introduces the challenge of providing tools that help programmers tune the code running on these architectures. Tracing tools, in particular, often help locate performance problems related to thread and process communication.

A major impediment to implementing tracing on Cell is the absence of a common clock that can be accessed at low cost from all cores. The OS clock is costly to access from the auxiliary cores and the hardware times cannot be simultaneously set on all the cores. In this paper, we describe an offline trace analysis algorithm that assigns wall-clock time to trace records based on their thread-local time stamps and event order.

In order to reduce the complexity in the number of events, the algorithm collects a partial set of timing constraints. We prove that the full set of constraints can be reconstructed using an all-pairs shortest path algorithm. We demonstrate how to use these constraints to generate the relative offsets of the thread-local clocks so that the resulting global time is consistent with both the thread-local times and the event order. We also show how to overcome real-life problems such as the monitoring imprecision.

Our experiments on several Cell SDK workloads show that the indeterminism in assigning wall-clock time to events is low, on average 20-40 clock ticks (translating into 1.4-2.8 ms on the system used in our experiments).

More formally, we define our system as composed of one or more threads of control, each thread a sequence of events.

**Definition 1**. *A *trace *is a tuple* (*E*, tid, ttime), *where**:*

*E is the set of events*- tid :
*E*® ù*is a function that matches each event with its thread id.* - #
*is a happened-before partial order relation over E: e*_{1}#*e*_{2}*implies that e*_{1}*occurred not later than e*_{2}*.* - ttime :
*E*® ú*is a function that matches each event to its thread-local timestamp. We assume that the clocks advance at the same rate for all the threads. We also assume, w.l.o.g., that the clocks are incrementing:*

* *

tid(*e*_{1})*=* tid(*e*_{2})*, e*_{1 }# *e*_{2} ttime(*e*_{1}) # ttime(*e*_{2}). (1)

Figure 1: A parallel execution example

**Definition 2**. *Given a trace T *= (*E*, tid, #, ttime), *a* global time function *for T is a function* gtime : *E* ® ú *that agrees with the thread-local timestamps and event order, i.e., such that for every two events* *e*_{1}, *e*_{2} Î *E*

tid(*e*_{1}) = tid(*e*_{2})* *gtime(*e*_{1}) - gtime(*e*_{2}) = ttime(*e*_{1}) - ttime(*e*_{2}) (2)

(*e*_{1}) # (*e*_{2})* * gtime(*e*_{1}) # gtime(*e*_{2}) (3)

* *

Consider the example in Figure 1. Figure 1(a) shows a sample trace with thread-local timestamps and event order data; while in Figure 1(b) we show an example of corresponding global time. Note that any pair of events from different threads with a known order between them imposes a constraint on the relative start times of their threads. For example, in Figure 1(a), *e*_{11} precedes *e*_{21}, and thread-local timestamps of both are 0, so Thread 1 must have started before Thread 2.

Collecting the constraints from all event pairs has the complexity of *O*(|*E*|^{2}), which is not feasible for traces with millions of events. Instead, we collect the constraints from all the consecutive event pairs (complexity of *O*(|*E*|), which result in a *constraints* *graph* (Figure 1(c)), where each node corresponds to a thread and each edge weight is the constraint of the connected nodesN relative start times. For example, weight on the (3, 2) edge means that Thread 3 starts at most two time units after Thread 2. The constraints are then improved by constructing the *bounds* *graph,* where each node corresponds to a thread and each edge weight is the weight of the shortest path in the constraints graph between the nodes connected (Figure 1(d)).

The central result of this paper is to show that the constraints obtained by this process are identical to those derived from considering all event pairs. We then show how those constraints can be used to construct all the possible global times consistent with the trace. Another use of the constraints is to find how precisely can we determine threadsN relative timing based on the trace data. We also show how to adjust the constraints to account for the imprecision in the process of trace collection.

Foundations of Distance Based Phylogenetics

*Lior Pachter*

University of California, Berkeley (USA)

This email address is being protected from spambots. You need JavaScript enabled to view it.

Quartet MaxCut: Piecing Together Small Trees

**into the Large Tree of Life**

*Sagi Snir*

University of Haifa, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

The reconstruction of evolutionary trees (also known as phylogenies) is central to any problems in Biology. With the massive amounts of molecular data being produced, attention has increasingly turned to phylogenetic analyses of larger datasets, containing several hundreds to several thousand organisms. Key to this goal is the ability to estimate very accurate trees on different groups of organisms, and then combine these different trees into a tree on the full dataset.

Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet based reconstruction.

The question of approximating quartet reconstruction is open for over a decade. Previous attempts in the area were restricted to inputs of certain size or structure. When such premises do not hold, the best approximation result is 1/3 and is achieved naively by a random tree. In this talk, we show the first approximation algorithm for the problem that improves this result for both consistent and inconsistent inputs under the premise that the quartets are randomly chosen. The algorithm is based on solving MaxCut in a certain weighted graph induced by the input set and yields an approximation ratio of 0.425.

Based on joint works with Raphy Yuster and Satish Rao.

Reactive Search Optimization for Maximum Clique and Quasi-Cliques Problems in Graphs

*Roberto Battiti*

University of Trento, Italy

This email address is being protected from spambots. You need JavaScript enabled to view it.

The talk reviews results of an ongoing investigation about how different algorithmic building blocks contribute to solving the Maximum Clique and Maximum Quasi-Clique problems. We consider greedy constructions, plateau searches, and more complex schemes based on dynamic penalties and/or prohibitions, in particular Reactive Search Optimization. Reactive Search advocates the integration of sub-symbolic machine learning techniques into search heuristics for solving complex optimization problems.

The word reactive hints at a ready response to events during the search through an internal online feedback loop for the self-tuning of critical parameters. Methodologies of interest for Reactive Search include machine learning and statistics, in particular reinforcement learning, active or query learning, neural networks, and meta-heuristics (although the boundary signaled by the "meta" prefix is not always clear).

Affine and Projective Tree Metric Theorems

*Lior Pachter*

University of California, Berkeley (USA)

This email address is being protected from spambots. You need JavaScript enabled to view it.

Maximizing the Minimum Load: The Cost of Selfishness

*Elena Kleiman*

University of Haifa, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

We consider a scheduling problem where each job is controlled by a selfish agent, who is only interested in minimizing its own cost, which is defined as the total load on the machine that its job is assigned to. We consider the objective of maximizing the minimum load (cover) over the machines.

This objective is motivated by issues of Quality of Service and fair resource allocation. It is useful for describing systems where the complete system relies on keeping all the machines productive for as long as possible, as the entire system fails in case even one of the machines ceases to be active. From the networking aspect, this problem has applications to basic problems in network optimization such as fair bandwidth allocation. Consider pairs of terminal nodes that wish to communicate; we would like to allocate bandwidth to the connections in a way that no link unnecessarily suffers from starvation, and all links get a fair amount of resources. Another motivation is efficient routing of traffic. Consider parallel links between pairs of terminal nodes. Requests for shifting flow are assigned to the links. We are interested in having the loads of the links balanced, in the sense that each link should be assigned a reasonable amount of flow, compared to the other links.

Unlike the regular makespan minimization problem, which was extensively studied in a game theoretic context, this problem has not been considered in this setting before.

Probably the best known concept in game theory is that of the Nash equilibrium. This is a state which is stable in the sense that no agent can gain from unilaterally switching strategies.

We study the price of anarchy (POA) and the price of stability (POS). In our scheduling model, the POA is the worst case ratio between the social value (i.e., minimum delay of any machine, or cover) of an optimal schedule and the value of any pure Nash equilibrium. The POS is the worst case ratio between the social value of an optimal solution, and the value of the best pure Nash equilibrium. In addition, we study the mixed poa, where mixed Nash equilibria resulting from mixed strategies of players (and not only pure ones) are taken into account.

We show that on identical machines, the POA is at least 1.691 and at most 1.7, whereas POS =1. To achieve the upper bound of 1.7, we make an unusual use of weighting functions. Note that the POA for minimizing the makespan is larger (namely 2) although that problem appears to be in fact easier.

For uniformly related machines, we identify the cases in which these measures are unbounded, and analyze them for the remaining cases.

Joint work with Leah Epstein, Department of Mathematics, University of Haifa, Rob van Stee, Max-Planck-Institut für Informatik, Saarbrücken, Germany.

Maximum Flow in Directed Planar Graphs

**with Vertex Capacities**

*Haim Kaplan*, Yahav Nussbaum***

The Blavatnik School of Computer Science, Tel Aviv University, Israel

*This email address is being protected from spambots. You need JavaScript enabled to view it. ; **This email address is being protected from spambots. You need JavaScript enabled to view it.

The problem of finding maximum flow in a graph is a well-studied problem with applications in many fields, see the book of Ahuja, Magnanti and Orlin [1] for a survey. The maximum flow problem is also interesting if we restrict it to *planar* graphs, which are graphs that have an embedding in the plane without crossing edges. The case of planar graphs appears in many applications of the problem, for example road traffic or VLSI design. The special structure of planar graphs allows us to get simpler and more efficient algorithms for the problem.

In the maximum flow problem, usually the arcs of the graph have capacities which limit the amount of flow that may go through each arc. We study a version of the problem in which the vertices of the graph also have capacities, which limit the amount of flow that may enter each vertex. This version appears for example when computing vertex disjoint paths in graphs, and in other problems where the vertices model objects which have a capacity.

Ford and Fulkerson [3, Chapter I.11] studied this version of the problem. They suggested the following simple reduction to eliminate vertex capacities. We replace every vertex *v'* with a finite capacity *c* by two vertices *v'* and *v"*. The arcs that were directed into *v* now enter *v'*, and the arcs that were directed out of *v* now leave *v"*. We also add a new arc with capacity *c* from *v*' to *v"*. Unfortunately, this reduction does not preserve the planarity of the graph [6]. Consider for example the complete graph of four vertices, where one of the vertices has finite capacity. This graph is planar. If we apply the construction of Ford and Fulkerson we get a graph whose underlying undirected graph is the complete graph with 5 vertices. This graph is not planar by Kuratowski's Theorem.

The most efficient algorithm for maximum flow in directed planar graphs without vertex capacities, to date, was given by Borradaile and Klein [2]. Their paper also contains a survey of the history of the maximum flow problem on planar graphs. The time bound of the algorithm of [2] is *O*(*n* log *n*) where *n* is the number of vertices in the input graph. Borradaile and Klein ask whether their algorithm can be generalized to the case where the flow is subject to vertex capacities.

A planar graph is a *st-planar* graph if the source and the sink are on the same face. Hassin [4] gave an algorithm for the maximum flow problem in directed *st-*planar graphs without vertex capacities that can be implemented in *O*(*n*) time using the algorithm of [5].

Khuller and Naor [6] were the first to study the problem of maximum flow with vertex capacities in planar graphs. They gave various results, including an *O*(*n* ) time algorithm for finding the value of the maximum flow in *st*-planar graphs (which can be improved to *O*(*n*) time using the algorithm of [5]), an *O*(*n* log *n*) time algorithm for finding the maximum flow in *st*-planar graphs, an *O*(*n* log *n*) time algorithm for finding the value of the maximum flow in undirected planar graph, and an *O*(*n*^{1.5} log *n*)time algorithm for the same problem on directed planar graphs. If all vertices have unit capacities, then we get the vertex-disjoint paths problem. Ripphausen-Lipa et al. [7] solved this problem in *O*(*n*) time for undirected planar graph.

Recently, Zhang, Liang and Chen [8], used a construction similar to the one of [6] to find the maximum flow in undirected planar graph with vertex capacities in *O*(*n* log *n*) time. They also gave an *O*(*n*) time algorithm for undirected *st*-planar graphs. Zhang et al. also ask in their paper if there is an algorithm that solves the problem for directed planar graph.

In this work we answer [2] and [8], and show a linear time reduction of the problem of finding maximum flow in directed planar graphs with arc and vertex capacities, to the problem of finding maximum flow in directed graphs with only arc capacities. This problem is more general than the one for undirected planar graphs, since an undirected planar graph can be viewed as a special case of a directed planar graph, in which there are two opposite arcs between any pair of adjacent vertices.

We show how to apply the constructions of [6] and [8] to directed planar graphs. Given directed planar graph, we construct another directed planar graph without vertex capacities, such that we can transform a maximum flow in the new graph back to a maximum flow in the original graph. Since the new graph does not have vertex capacities we can find a maximum flow in it using the algorithm of [2] (or of [4] if it is an *st*-planar graph). The time bound of our reduction is linear in the size of the graph, therefore we show that vertex capacities do not increase the time complexity of the maximum flow problem also for planar graphs.

In addition, we show that the algorithm of [8] unfortunately has a flaw. We give an undirected graph in which this algorithm does not find a correct maximum flow. Therefore, in fact our algorithm is also the first to solve the problem for undirected graphs.

**References**

[1] R.K. Ahuja, T. L. Magnanti, and J. B. Orlin. *Network Flows: Theory, Algorithms and Applications*. Prentice-Hall, New Jersey, 1993.

[2] G. Borradaile and P. Klein. An *O*(*n* log *n*) algorithm for maximum *st*-flow in a directed planar graph *J. ACM*, to appear.

[3] L. R. Ford and D. R. Fulkerson. *Flows in Networks*. Princeton University Press, New Jersey, 1962.

[4] R. Hassin. Maximum flow in (*s*, *t*) planar networks. *Information Processing Letters* 13: 107, 1981.

[5] M.R. Henzinger, P. Klein, S. Rao, and S. Subramania. Faster shortest-path algorithms for planar graphs. *J. Comput. Syst. Sci.* 55: 3-23, 1997.

[6] S. Khuller and J. Naor. Flow in planar graphs with vertex capacities. *Algorithmica* 11: 200-225, 1994.

[7] H. Ripphausen-Lipa, D. Wagner, and K. Weihe. The vertex-disjoint Menger problem in planar graphs. *SIAM J. Comput.*, 26: 331-349, 1997.

[8] X. Zhang, W. Liang, and G. Chen. Computing maximum flows in undirected planar networks with both edge and vertex capacities. In *14 ^{th} Annual International Conference on Computing and Combinatorics (COCOON)*, Lecture Notes in Computer Science 5092: 577-586, 2008.

**Selection in Social Networks**

*Oren Ben-Zwi*, Danny Hermelin*, *

*Daniel Lokshtanov**, Ilan Newman**

*University of Haifa, Israel

**University of Bergen, Norway

*This email address is being protected from spambots. You need JavaScript enabled to view it.; This email address is being protected from spambots. You need JavaScript enabled to view it.; This email address is being protected from spambots. You need JavaScript enabled to view it.

**This email address is being protected from spambots. You need JavaScript enabled to view it.

The Target Set Selection problem proposed by Kempe, Kleinberg, and Tardos, gives a nice clean combinatorial formulation for many problems arising in economy, sociology, and medicine. Its input is a graph with vertex thresholds, the social network, and the goal is to find a subset of vertices, *the target set*, that "activates" a prespecified number of vertices in the graph. Activation of a vertex is defined via a so-called activation process as follows: Initially, all vertices in the target set become active. Then at each step *i *of the process, each vertex gets activated if the number of its active neighbors at iteration *i* - 1 exceeds its threshold. The activation process is "monotone" in the sense that once a vertex is activated, it remains active for the entire process.

Unsurprisingly perhaps, Target Set Selection is NP-complete. More surprising is the fact that both of its maximization and minimization variants turn out to be extremely hard to approximate, even for very restrictive special cases. The only known case for which the problem is known to have some sort of acceptable worst-case solution is the case where the given social network is a tree and the problem becomes polynomial-time solvable. In this paper, we attempt at extending this sparse landscape of tractable instances by considering the treewidth parameter of graphs. This parameter roughly measures the degree of tree-likeness of a given graph, e.g. the treewidth of a tree is 1, and has previously been used to tackle many classical NP-hard problems in the literature.

Our contribution is twofold: First, we present an algorithm for Target Set Selection running * *in n time, for graphs with *n* vertices and treewidth bounded by *w*. The algorithm utilizes various combinatorial properties of the problem; drifting somewhat from standard dynamic-programming algorithms for small treewidth graphs. Also, it can be adopted to much more general settings, including the case of directed graphs, weighted edges, and weighted vertices. On the other hand, we also show that it is highly unlikely to find an time algorithm for Target Set Selection, as this would imply a sub-exponential algorithm for all problems in SNP. Together with our upper bound result, this shows that the treewidth parameter determines the complexity of Target Set Selection to a large extent, and should be taken into consideration when tackling this problem in any scenario.

*Dietmar Cieslik*

University of Greifswald*, Germany*

This email address is being protected from spambots. You need JavaScript enabled to view it.

Steiner's Problem is the "Problem of shortest connectivity", that means, given a finite set of points in a metric space *X*, search for a network interconnecting these points with minimal length. This shortest network must be a tree and is called a Steiner Minimal Tree (SMT). It may contain vertices different from the points which are to be connected. Such points are called Steiner points. If we do not allow Steiner points, that means, we only connect certain pairs of the given points, we get a tree which is called a Minimum Spanning Tree (MST).

Steiner's Problem is very hard as well in combinatorial as in computational sense, but, on the other hand, the determination of an MST is simple. Consequently, we are interested in the greatest lower bound for the ratio between the lengths of these trees, which is called the Steiner ratio (of the space *X*):

* M*(*X*) := inf .

It is not hard to see, by Moore, that for Steiner ratio of every metric space

1 * m*(*X*, )

holds. Ivanov and Tuzhilin showed that for any real number between 0.5 and 1 there is a metric space with this Steiner ratio. This remains true, when we restrict ourselves to finite spaces.

The exact value for the Steiner ratio is only known for very few metric spaces, described by several authors:

- Ultrametric spaces have Steiner ratio 1.
- The plane with rectilinear norm has Steiner ratio 2/3.
- Phylogenetic spaces have Steiner ratio 1/2.
- The space of all convergent sequences with supremum norm has Steiner ratio 1/2.
- Hyperbolic surfaces have Steiner ratio 1/2.

Consequently, we look for estimates for the Steiner ratio in several metric spaces, given old and new results.

An interesting problem, but which seems as very difficult, is to determine the range of the Steiner ratio for *d*-dimensional Banach spaces, depending on the value *d*. More exactly, determine the best possible reals *c _{d}* and

*C*such that

_{d}*c _{d}* #

*m*(

*X*) #

*C*,

_{d}where *X* is a Banach space of dimension *d*.

Two conjectures: For *d* =2, 3...

*C _{d}* =

*m*(

*d*, 2),

where *m*(*d*, 2) denotes the Steiner ratio of the *d*-dimensional Euclidean space. And

*c _{d}* > 1/2.

For several metric spaces it is shown that they have the same Steiner ratio as the Euclidean spaces:

- (Rubinstein, Weng) The Steiner ratio for spheres.
- (Ivanov, Tuzhilin, Cieslik) The Steiner ratio for a flat two-dimensional torus, a flat Klein bottle, a projective plain having constant positive curvature.

In particular, in the talk it will be shown that the Steiner ratio of a *d*- dimensional Einstein-Riemann space depends only from the dimension *d*, and not from the specific choice of the matrix. More exactly: Let *M*(*d*, ) be a *d*-dimensional Einstein-Riemann space normed by positive definite matrix . Then

*m*(*M*(*d*, )) = *m*(*d*, 2),

where *m*(*d*, 2) denotes the Steiner ratio of the *d*-dimensional Euclidean space.

The key-idea will be a decomposition of the matrix .

This result shows the importance of the knowledge of *m*(*d*, 2). But the exact values of these quantities are not known. This is also true for the Euclidean plane, where we have the well-known conjecture by Gilbert and Pollak:

*m*(2, 2)= = 0:86602 ....

When the dimension go to infinity, the Steiner ratio decreases:

1=*m*(1,2)$*m*(2, 2)$*m*(3,2)$*m*(4,2)$...$ $0,615...,

whereby the last inequality was shown by Du.

Furthermore, Cieslik and Reisner showed that for an infinite-dimensional Banach space *X* the Steiner ratio is bounded by

0.5 # *m*(*X*) # inf {*m*(*d*, 2) : *d* positive integer} =

*Yuri Rabinovich*

University of Haifa, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

I shall outline the historic development of the theory of finite metric spaces in recent decades,

present some of its key problems (past and present), and give some idea about its methods and applications.

*Alain Hertz*

Ecole Polytechnique de Montréal, Canada

This email address is being protected from spambots. You need JavaScript enabled to view it.

A magnet is a pair *u,v* of adjacent vertices such that the proper neighbours of *u* are completely linked to the proper neighbours of *v*. It has been shown that one can reduce the graph by removing the two vertices *u, v* of a magnet and introducing a new vertex linked to all common neighbours of *u* and *v* without changing the stability number. We characterize a class of graphs such that by repeated use of magnets the graph is reduced to a stable set. A description in terms of forbidden subgraphs will be given as well as a polynomial algorithm.

** **

** **

**without Decompression**

** **

*Dana Shapira*

* *

Ashkelon Academic College, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

This talk deals with applications that process delta files directly without decompressing them first. Two problems are introduced: first, the *Compressed Delta Encoding* problem, which is the case of constructing a delta file directly from two given compressed files, without decompressing them. Second, the *Compressed Transitive Delta Encoding (CTDE)* problem, which is the paradigm of merging two delta files. In the Compressed Delta Encoding problem we concentrate on the case where the given files are compressed using LZSS technique, and propose solutions for the special cases involving substitutions only.

Given a source file *S* and two differencing files Δ*(S, T)* and Δ*(T, R)*, where Δ*(X,Y) *is used to denote the delta file of the target file *Y* with respect to the source file *X*, the objective of the *Compressed Transitive Delta Encoding* problem is to be able to construct *R*. This is intended for the scenario of upgrading software where intermediate releases are missing, or for the case of file system backups, where non consecutive versions must be recovered. The traditional way is to decompress Δ*(S, T)* in order to construct *T* and then apply Δ*(T, R) *on *T* and obtain R. The *Compressed Transitive Delta Encoding (CTDE)* paradigm, is to construct a delta file Δ*(S, R)* working directly on the two given delta files, Δ*(S,T)* and Δ*(T, R)*, without any decompression or the use of the base file *S*. A new algorithm for solving CTDE is proposed and its compression performance is compared against the traditional "double delta decompression". Not only does it use constant additional space, as opposed to the traditional method which uses linear additional memory storage, but experiments show that the size of the delta files involved is reduced by 15% on average.

** **

*Avivit Levy*

Shenkar College and CRI, University of Haifa, Israel

This email address is being protected from spambots. You need JavaScript enabled to view it.

An important implicit assumption in all traditional pattern matching problems is that there may indeed be errors in the content of the data, but the order of the data is inviolate. However, some nonconforming problems have been gnawing at the walls of this assumption. Such problems arise in different situations, such as: text editing, computational biology, Bit Torrent and video on demand and computer architecture. Therefore, the traditional sequential view of strings is becoming, at times, less natural. In such cases, it is more natural to view the string as a set of pairs (symbol, location). Given this view of strings, the problem of approximate pattern matching is reconsidered, and a new pattern matching paradigm is studied - pattern matching with address errors. In this paradigm we study pattern matching problems where the content is unaltered, and only the locations of the different entries may change.

In the talk I will discuss broad classes of problems of this type, describing the new operators and cost models that were studied, while introducing some novel techniques. The study of this new paradigm had also an important consequence, as a solution is given to an old open problem introduced by the mathematician Cayley in 1849.

Department of Software Engineering, Shenkar College, 12 Anna Frank, Ramat-Gan, 52526, Israel.

CRI, Haifa University, Mount Carmel, Haifa 31905, Israel.