Branch-and-Bound Applications in Combinatorial Data Analysis

Author: Michael J. Brusco
Publisher: Springer Science & Business Media
ISBN: 9780387250373
Category : Business & Economics
Languages : en
Pages : 248

Book Description
A variety of combinatorial optimization problems are relevant to the examination of statistical data. Combinatorial problems arise in the clustering of a collection of objects, the seriation (sequencing or ordering) of objects, and the selection of variables for subsequent multivariate statistical analysis such as regression. The options for choosing a solution strategy in combinatorial data analysis can be overwhelming. Because some problems are too large or intractable for an optimal solution strategy, many researchers develop an over-reliance on heuristic methods to solve all combinatorial problems. However, with increasingly accessible computer power and ever-improving methodologies, optimal solution strategies have gained popularity for their ability to reduce unnecessary uncertainty. In this monograph, optimality is attained for nontrivially sized problems via the branch-and-bound paradigm. For many combinatorial problems, branch-and-bound approaches have been proposed and/or developed. However, until now, there has not been a single resource in statistical data analysis to summarize and illustrate available methods for applying the branch-and-bound process. This monograph provides clear explanatory text, illustrative mathematics and algorithms, demonstrations of the iterative process, pseudocode, and well-developed examples for applications of the branch-and-bound paradigm to important problems in combinatorial data analysis. Supplementary material, such as computer programs, is provided on the World Wide Web. Dr. Brusco is a Professor of Marketing and Operations Research at Florida State University, an editorial board member for the Journal of Classification, and a member of the Board of Directors for the Classification Society of North America. Stephanie Stahl is an author and researcher with years of experience in writing, editing, and quantitative psychology research.

Assignment Methods in Combinatorial Data Analysis

Author: Lawrence Hubert
Publisher: CRC Press
ISBN: 9780824776176
Category : Mathematics
Languages : en
Pages : 350

Book Description
For the first time in one text, this handy pedagogical reference presents comprehensive inference strategies for organizing disparate nonparametric statistics topics under one scheme, illustrating ways of analyzing data sets based on generic notions of proximity (or "closeness") between objects. Assignment Methods in Combinatorial Data Analysis specifically reviews both linear and quadratic assignment models ... covers extensions to multiple object sets and higher-order assignment indices ... considers methods of applying linear assignment models in common data analysis contexts ... discusses a second notion of assignment (or "matching") based upon pairs of objects ... explores confirmatory methods of augmenting multidimensional scaling, cluster analysis, and related techniques ... labels sections in order of priority for continuity and convenience ... and includes extensive bibliographies of related literature. Assignment Methods in Combinatorial Data Analysis gives authoritative coverage of statistical testing and measures of association in a single source. It is required reading and an invaluable reference for researchers and graduate students in the behavioral and social sciences using quantitative methods of data representation. Book jacket.

Combinatorial Data Analysis

Author: Lawrence Hubert
Publisher: SIAM
ISBN: 0898714788
Category : Science
Languages : en
Pages : 172

Book Description
Combinatorial data analysis refers to methods for the study of data sets where the arrangement of objects is central.

Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering

Author: Israël César Lerman
Publisher: Springer
ISBN: 1447167937
Category : Computers
Languages : en
Pages : 647

Book Description
This book offers an original and broad exploration of the fundamental methods in Clustering and Combinatorial Data Analysis, presenting new formulations and ideas within this very active field. With extensive introductions, formal and mathematical developments and real case studies, this book provides readers with a deeper understanding of the mutual relationships between these methods, which are clearly expressed with respect to three facets: logical, combinatorial and statistical. Using relational mathematical representation, all types of data structures can be handled in precise and unified ways, which the author highlights in three stages: (1) clustering a set of descriptive attributes; (2) clustering a set of objects or a set of object categories; and (3) establishing correspondence between these two dual clusterings. Tools for interpreting the reasons of a given cluster or clustering are also included. Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering will be a valuable resource for students and researchers who are interested in the areas of Data Analysis, Clustering, Data Mining and Knowledge Discovery.

Combinatorial Inference in Geometric Data Analysis

Author: Brigitte Le Roux
Publisher: CRC Press
ISBN: 1498781624
Category : Mathematics
Languages : en
Pages : 256

Book Description
Geometric Data Analysis designates the approach of Multivariate Statistics that conceptualizes the set of observations as a Euclidean cloud of points. Combinatorial Inference in Geometric Data Analysis gives an overview of multidimensional statistical inference methods applicable to clouds of points that make no assumption on the process of generating data or distributions, and that are not based on random modelling but on permutation procedures recast in a combinatorial framework. It focuses particularly on the comparison of a group of observations to a reference population (combinatorial test) or to a reference value of a location parameter (geometric test), and on problems of homogeneity, that is, the comparison of several groups for two basic designs. These methods involve the use of combinatorial procedures to build a reference set in which we place the data. The chosen test statistics lead to original extensions, such as the geometric interpretation of the observed level, and the construction of a compatibility region. Features: defines precisely the object under study in the context of multidimensional procedures, that is, clouds of points; presents combinatorial tests and related computations with R and Coheris SPAD software; includes four original case studies to illustrate application of the tests; includes necessary mathematical background to ensure it is self-contained. This book is suitable for researchers and students of multivariate statistics, as well as applied researchers of various scientific disciplines. It could be used for a specialized course taught at either master or PhD level.

Seriation in Combinatorial and Statistical Data Analysis

Author: Israël César Lerman
Publisher: Springer Nature
ISBN: 303092694X
Category : Computers
Languages : en
Pages : 287

Book Description
This monograph offers an original, broad, and very diverse exploration of the seriation domain in data analysis, together with building a specific relation to clustering. Relative to a data table crossing a set of objects and a set of descriptive attributes, the search for orders which correspond respectively to these two sets is formalized mathematically and statistically. State-of-the-art methods are created and compared with classical methods, and a thorough understanding of the mutual relationships between these methods is clearly expressed. The authors distinguish two families of methods: geometric representation methods, and algorithmic and combinatorial methods. Original and accurate methods are provided in the framework for both families. Their bases are examined and compared on both theoretical and experimental levels. The experimental analysis is very varied and very comprehensive. Seriation in Combinatorial and Statistical Data Analysis has a unique character in the literature falling within the fields of Data Analysis, Data Mining and Knowledge Discovery. It will be a valuable resource for students and researchers in the latter fields.

Fine-grained complexity analysis of some combinatorial data science problems

Author: Froese, Vincent
Publisher: Universitätsverlag der TU Berlin
ISBN: 3798330034
Category : Computers
Languages : en
Pages : 185

Book Description
This thesis is concerned with analyzing the computational complexity of NP-hard problems related to data science. For most of the problems considered in this thesis, the computational complexity has not been intensively studied before. We focus on the complexity of computing exact problem solutions and conduct a detailed analysis identifying tractable special cases. To this end, we adopt a parameterized viewpoint in which we identify several parameters describing properties of a specific problem instance that allow the instance to be solved efficiently. We develop specialized algorithms whose running times are polynomial if the corresponding parameter value is constant. We also investigate in which cases the problems remain intractable even for small parameter values. We thereby chart the border between tractability and intractability for some practically motivated problems, which yields a better understanding of their computational complexity. In particular, we consider the following problems. General Position Subset Selection is the problem of selecting a maximum number of points in general position from a given set of points in the plane. Point sets in general position are well-studied in geometry and play a role in data visualization. We prove several computational hardness results and show how polynomial-time data reduction can be applied to solve the problem if the sought number of points in general position is very small or very large. The Distinct Vectors problem asks for a minimum number of columns in a given matrix such that all rows in the selected submatrix are pairwise distinct. This problem is motivated by combinatorial feature selection. We prove a complexity dichotomy with respect to combinations of the minimum and the maximum pairwise Hamming distance of the rows for binary input matrices, thus separating polynomial-time solvable from NP-hard cases.
Co-Clustering is a well-known matrix clustering problem in data mining where the goal is to partition a matrix into homogeneous submatrices. We conduct an extensive multivariate complexity analysis revealing several NP-hard and some polynomial-time solvable and fixed-parameter tractable cases. The generic F-free Editing problem is a graph modification problem in which a given graph has to be modified by a minimum number of edge modifications such that it does not contain any induced subgraph isomorphic to the graph F. We consider three special cases of this problem: the graph clustering problem Cluster Editing with applications in machine learning, the Triangle Deletion problem, which is motivated by network cluster analysis, and Feedback Arc Set in Tournaments with applications in rank aggregation. We introduce a new parameterization by the number of edge modifications above a lower bound derived from a packing of induced forbidden subgraphs and show fixed-parameter tractability for all three of these problems with respect to this parameter. Moreover, we prove several NP-hardness results for other variants of F-free Editing for a constant parameter value. The problem DTW-Mean is to compute a mean time series of a given sample of time series with respect to the dynamic time warping distance. This is a fundamental problem in time series analysis, the complexity of which is unknown. We give an exact exponential-time algorithm for DTW-Mean and prove polynomial-time solvability for the special case of binary time series.
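
As an illustration of the Distinct Vectors problem defined in the description above, here is a minimal brute-force sketch in Python. It is not the thesis's algorithm: it simply tries column subsets in order of increasing size, so its running time is exponential in the number of columns. The function name `distinct_vectors` and the list-of-lists matrix format are assumptions made for this example.

```python
from itertools import combinations


def distinct_vectors(matrix):
    """Return a smallest tuple of column indices such that the rows,
    restricted to those columns, are pairwise distinct.

    Returns None when no such subset exists (two rows are identical).
    Brute force: illustrative only, exponential in the number of columns.
    """
    n_cols = len(matrix[0]) if matrix else 0
    # Try subset sizes 0, 1, 2, ... so the first hit is a minimum-size one.
    for k in range(n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Project every row onto the chosen columns.
            projected = {tuple(row[c] for c in cols) for row in matrix}
            # All rows are pairwise distinct iff no two projections collide.
            if len(projected) == len(matrix):
                return cols
    return None


if __name__ == "__main__":
    # The third column is constant, so only the first two columns are needed.
    example = [
        [0, 0, 1],
        [0, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
    ]
    print(distinct_vectors(example))
```

Because subsets are enumerated by increasing size, the first subset that separates all rows is guaranteed to be of minimum size; a matrix with two identical rows has no solution and yields None.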

Analysis and Design of Algorithms for Combinatorial Problems

Author: G. Ausiello
Publisher: Elsevier
ISBN: 9780080872209
Category : Mathematics
Languages : en
Pages : 318

Book Description
Combinatorial problems have been part of the history of mathematics from the very beginning. By the Sixties, the main classes of combinatorial problems had been defined. During that decade, a great number of research contributions in graph theory were produced, which laid the foundations for most of the research in graph optimization in the following years. During the Seventies, a large number of special purpose models were developed. The impressive growth of this field since then has been strongly determined by the demand of applications and influenced by the technological increases in computing power and the availability of data and software. The availability of such basic tools has made feasible the exact or closely approximate solution of large-scale realistic combinatorial optimization problems and has created a number of new combinatorial problems.

Topological and Statistical Methods for Complex Data

Author: Janine Bennett
Publisher: Springer
ISBN: 3662449005
Category : Mathematics
Languages : en
Pages : 297

Book Description
This book contains papers presented at the Workshop on the Analysis of Large-scale, High-Dimensional, and Multi-Variate Data Using Topology and Statistics, held in Le Barp, France, June 2013. It features the work of some of the most prominent and recognized leaders in the field who examine challenges as well as detail solutions to the analysis of extreme scale data. The book presents new methods that leverage the mutual strengths of both topological and statistical techniques to support the management, analysis, and visualization of complex data. It covers both theory and application and provides readers with an overview of important key concepts and the latest research trends. Coverage in the book includes multi-variate and/or high-dimensional analysis techniques, feature-based statistical methods, combinatorial algorithms, scalable statistics algorithms, scalar and vector field topology, and multi-scale representations. In addition, the book details algorithms that are broadly applicable and can be used by application scientists to glean insight from a wide range of complex data sets.

Handbook of Discrete and Combinatorial Mathematics

Author: Kenneth H. Rosen
Publisher: CRC Press
ISBN: 1584887818
Category : Mathematics
Languages : en
Pages : 1612

Book Description
Handbook of Discrete and Combinatorial Mathematics provides a comprehensive reference volume for mathematicians, computer scientists, engineers, as well as students and reference librarians. The material is presented so that key information can be located and used quickly and easily. Each chapter includes a glossary. Individual topics are covered in sections and subsections within chapters, each of which is organized into clearly identifiable parts: definitions, facts, and examples. Examples are provided to illustrate some of the key definitions, facts, and algorithms. Some curious and entertaining facts and puzzles are also included. Readers will also find an extensive collection of biographies. This second edition is a major revision. It includes extensive additions and updates. Since the first edition appeared in 1999, many new discoveries have been made and new areas have grown in importance, which are covered in this edition.