Reports and Papers Archive
Database Integration Using Neural Networks: Implementation and Experiences
Applications in a wide variety of industries require access to multiple heterogeneous distributed databases. One step in heterogeneous database integration is semantic integration: identifying corresponding attributes in different databases that represent the same real-world concept. The rules of semantic integration cannot be ‘pre-programmed’ since the information to be accessed is heterogeneous and attribute correspondences could be fuzzy. Manually comparing all possible pairs of attributes is an unreasonably large task. We have applied artificial neural networks (ANNs) to this problem. Metadata describing attributes is automatically extracted from a database to represent their ‘signatures’. The metadata is used to train neural networks to find similar patterns of metadata describing corresponding attributes from other databases. In our system, the rules to determine corresponding attributes are discovered through machine learning. This paper describes how we applied neural network techniques to a database integration problem and how we represent an attribute with its metadata as discriminators. This paper focuses on our experiments on the effectiveness of neural networks and of each discriminator. We also discuss the difficulties of using neural networks for this problem and our wish list for the Machine Learning community.
HyperFile: A Data and Query Model for Documents
Non-quantitative information such as documents and pictures poses interesting new problems in the database world. Traditional data models and query languages do not provide appropriate support for this information. Such data are typically stored in file systems, which do not provide the security, integrity, or query features of database management systems. The hypertext model has emerged as a good interface to this information; however, finding information using hypertext browsing does not scale well. We develop a query interface that serves as an extension of the browsing model of hypertext systems. These queries minimize the repeated user interactions required to locate data in a standard hypertext system. HyperFile is a prototype data server interface. In this article, we describe HyperFile, including a number of issues such as query generation, query processing, and indexing.
Identifying Rare Classes with Sparse Training Data
Building models and learning patterns from a collection of data are essential tasks for decision making and dissemination of knowledge. One of the common tools to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in biological science, as biological data are hard to produce and error-prone. Through empirical results, this paper shows the challenges in building an accurate classifier with a sparse biological training dataset. Our findings indicate the inadequacies of well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for further improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training data samples are very limited.
Private Combinatorial Group Testing
Combinatorial group testing, given a set C of individuals (“customers”), consists of applying group tests on subsets of C for the purpose of identifying which members of C are infected (or, more generally, defective in some way). The outcome of a group test reveals only the presence or absence of infection(s) in that group, but a number of group tests exactly identifies all infected members.
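The basic (non-private) idea of group testing can be illustrated with a minimal sketch. It assumes a simple adaptive binary-splitting strategy, which is a textbook approach and not the private protocol of the paper; the names make_group_test and find_infected are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Set


def make_group_test(infected: Set[str]) -> Callable[[Sequence[str]], bool]:
    """Build a hypothetical test oracle for illustration.

    The oracle reveals only whether a group contains at least one
    infected member, never which member it is.
    """
    def group_test(group: Sequence[str]) -> bool:
        return any(member in infected for member in group)

    return group_test


def find_infected(members: Sequence[str],
                  group_test: Callable[[Sequence[str]], bool]) -> List[str]:
    """Identify all infected members by adaptive binary splitting.

    A group that tests negative is discarded with a single test;
    a positive group is split in half and each half is tested in turn.
    """
    members = list(members)
    if not members or not group_test(members):
        return []
    if len(members) == 1:
        return members
    mid = len(members) // 2
    return (find_infected(members[:mid], group_test)
            + find_infected(members[mid:], group_test))


customers = [f"c{i}" for i in range(8)]
print(find_infected(customers, make_group_test({"c2", "c5"})))
```

Each call to group_test returns a single bit, yet the sequence of outcomes is enough to pin down every infected member, which is the property the abstract describes.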
Information Privacy in Organizations: Empowering Creative and Extra-role Performance
This article examines the relationship of employee perceptions of information privacy in their work organizations and important psychological and behavioral outcomes. A model is presented in which information privacy predicts psychological empowerment, which in turn predicts discretionary behaviors on the job, including creative performance and organizational citizenship behavior. Results from two studies (Study 1 single organization, N = 310; Study 2 multiple organizations, N = 303) confirm that information privacy entails judgments of information gathering control, information handling control, and legitimacy. Moreover, a model linking information privacy to empowerment, and empowerment to creative performance and OCBs was supported. Findings are discussed in light of organizational attempts to control employees through the gathering and handling of their personal information.
Remote Control: Predictors of Electronic Monitoring Intensity and Secrecy
Electronic monitoring research has focused predominantly on the reactions of monitored employees and less attention has been paid to the processes that trigger managers’ decisions to electronically monitor subordinates. Employing a distributed virtual team simulation, this study examined the effects of dependence, future performance expectations, and propensity to trust on team leaders’ decisions to electronically monitor their subordinates. Results indicate that team leaders electronically monitor subordinates more intensely when dependence on subordinates is high or future performance expectations are low. Moreover, team leaders are more likely to monitor in secret when dependence is high or propensity to trust is low. Although team leaders increased their level of electronic monitoring over time, this tendency was stronger when the leader had consistently low performance expectations. Reprinted by permission of the publisher.
When Does the Medium Matter? Knowledge-Building Experiences and Opportunities in Decision Teams.
The purpose of this investigation was to examine whether temporal scope—the extent to which teams have a past or expect to have a future together—affects face-to-face and computer-mediated teams’ ability to communicate effectively and make high quality decisions. Results indicated that media differences existed for teams lacking a history, with face-to-face teams exhibiting higher openness/trust and information sharing than computer-mediated teams. However, computer-mediated teams with a history were able to eliminate these differences. These findings did not extend to team-member exchange (TMX). Although face-to-face teams exhibited higher TMX compared to computer-mediated teams, the interaction of temporal scope and communication media was not significant. In addition, openness/trust and TMX were positively associated with decision-making effectiveness when task interdependence was high, but were unrelated to decision-making effectiveness when task interdependence was low.
Measuring Customer Service Orientation Using a Measure of Interpersonal Skills
Organizations are placing increased emphasis on identifying individuals with customer service orientation. In the present investigation we test whether interpersonal skills, as measured through Holland and Baird’s (1968) Interpersonal Competence Scale, provides a narrow, yet valid, measure of customer service orientation. Data were collected from a sample of bus transit operators. Interpersonal skills was positively related to operator self-reported performance, but was not related to supervisor ratings or objective measures of performance. Implications for the study and use of broad versus narrowly defined personality constructs in organizational settings are discussed.
The Effects of Dependence and Trust on The Decision to Electronically Monitor Subordinates
Electronic monitoring of employees is both controversial and on the rise. Unfortunately, research examining electronic monitoring has focused predominantly on the reactions of monitored employees. Little is known about the processes that trigger managers’ decisions to electronically monitor subordinates. Employing a distributed virtual team simulation, this study examined the effects of dependence and trust on managerial decisions to electronically monitor their subordinates. Results indicate that managers who are in higher dependence relationships with subordinates or have lower cognition-based trust in subordinates are more likely to engage in richer electronic monitoring of those subordinates. Moreover, although managers tend to increase the level of electronic monitoring over time, this tendency is stronger when cognition-based trust is low versus high. The implications of these results on electronic monitoring, trust, and cybernetic models of control in organizations are discussed.
Security and Privacy
Defining Privacy for Data Mining
Privacy preserving data mining – getting valid data mining results without learning the underlying data values – has been receiving attention in the research community and beyond. It is unclear what privacy preserving means. This paper provides a framework and metrics for discussing the meaning of privacy preserving data mining, as a foundation for further research in this field.
Transforming Semi-Honest Protocols to Ensure Accountability
Secure multi-party computation (SMC) balances the use and confidentiality of distributed data. This is especially important for privacy-preserving data mining (PPDM). Most secure multi-party computation protocols are only proven secure under the semi-honest model, providing insufficient security for many PPDM applications. SMC protocols under the malicious adversary model generally have impractically high complexities for PPDM. We propose an accountable computing (AC) framework that enables liability for privacy compromise to be assigned to the responsible party without the complexity and cost of an SMC-protocol under the malicious model. We show how to transform a circuit-based semi-honest two-party protocol into a simple and efficient protocol satisfying the AC-framework.
An Approach to Identifying Beneficial Collaboration Securely in Decentralized Logistics Systems
The problem of sharing manufacturing, inventory or capacity to improve performance is applicable in many decentralized operational contexts. However, solving such problems commonly requires an intermediary or a broker to manage the information security concerns of individual participants. Our goal is to examine the use of cryptographic techniques to attain the same result without the use of a broker. To illustrate this approach, we focus on a problem faced by independent trucking companies that have separate pickup and delivery tasks and wish to identify potential efficiency-enhancing task swaps while limiting the information the companies must reveal to identify these swaps. We present an algorithm that finds opportunities to swap loads without revealing any information except the loads swapped, along with proofs of the security of the protocol. We also show that it is incentive compatible for each company both to follow the protocol correctly and to provide its true data. We apply this algorithm to an empirical dataset from a large transportation company and present results that suggest significant opportunities to improve efficiency through Pareto-improving swaps. This paper uses cryptographic arguments in an operations management problem context to show how an algorithm can be proven incentive compatible, and demonstrates the potential value of its use on an empirical dataset.
Mitigating Attacks against Virtual Coordinate Based Routing in Wireless Sensor Networks
Virtual coordinate system (VCS) based routing provides a practical, efficient and scalable means for point-to-point routing in wireless sensor networks. Several VCS-based routing protocols have been proposed in the last few years, all assuming that nodes behave correctly. However, many applications require deploying sensor networks in adversarial environments, making VCS-based routing protocols vulnerable to numerous attacks.
In this paper, we study the security of VCS-based routing protocols. We first identify novel attacks targeting the underlying virtual coordinate system. The attacks can be mounted with few resources, yet are epidemic in nature and highly destructive to system performance. We then propose lightweight defense mechanisms against each of the identified attacks. Finally, we evaluate experimentally the impact of the attacks and the effectiveness of our defense mechanisms using a well-known VCS-based routing protocol, BVR.

