MetaTOC: stay on top of your field, easily

Journal of Information Science

Impact factor: 1.238 | 5-Year impact factor: 1.384 | Print ISSN: 0165-5515 | Publisher: Sage Publications

Subject: Information Science & Library Science

Most recent papers:

  • An ensemble scheme based on language function analysis and feature engineering for text genre classification.
    Onan, A.
    Journal of Information Science. December 05, 2016

    Text genre classification is the process of identifying functional characteristics of text documents. The immense quantity of text documents available on the web can be properly filtered, organised and retrieved with the use of text genre classification, which may have potential use in several other tasks of natural language processing and information retrieval. Genre may refer to several aspects of text documents, such as function and purpose. Language function analysis (LFA) concentrates on a single aspect of genre and aims to classify text documents into three abstract classes: expressive, appellative and informative. Text genre classification is typically performed by supervised machine learning algorithms. The extraction of an efficient feature set to represent text documents is an essential task for building a robust classification scheme with high predictive performance. In addition, ensemble learning, which combines the outputs of individual classifiers to obtain a robust classification scheme, is a promising field in machine learning research. In this regard, this article presents an extensive comparative analysis of different feature engineering schemes (such as features used in authorship attribution, linguistic features, character n-grams, part-of-speech n-grams and the frequency of the most discriminative words) and five different base learners (Naïve Bayes, support vector machines, logistic regression, k-nearest neighbour and Random Forest) in conjunction with ensemble learning methods (such as Boosting, Bagging and Random Subspace). Based on the empirical analysis, an ensemble classification scheme is presented, which integrates a Random Subspace ensemble of Random Forest with four types of features (features used in authorship attribution, character n-grams, part-of-speech n-grams and the frequency of the most discriminative words). For the LFA corpus, the highest average predictive performance obtained by the proposed scheme is 94.43%.

    December 05, 2016   doi: 10.1177/0165551516677911   open full text
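
    As a rough, hypothetical illustration of the kind of pipeline compared in this article (not the authors' exact configuration), the scikit-learn sketch below combines character n-gram features with a Random Subspace-style ensemble of Random Forests; the toy documents, labels and parameter values are assumptions.

```python
# Hypothetical sketch: Random Subspace-style ensemble of Random Forests over
# character n-gram features, in the spirit of the scheme described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline

docs = ["An informative news report about the election results.",
        "Buy now and save on your next purchase!",
        "I feel absolutely wonderful about today."]
labels = ["informative", "appellative", "expressive"]  # the three LFA classes

# Character n-grams are one of the feature types compared in the article.
features = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))

# BaggingClassifier with feature subsampling and no instance bootstrap behaves
# like a Random Subspace ensemble of the base learner.
random_subspace_rf = BaggingClassifier(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_estimators=10,
    max_features=0.5,        # each ensemble member sees a random half of the features
    bootstrap=False,
    bootstrap_features=False,
    random_state=0,
)

model = make_pipeline(features, random_subspace_rf)
model.fit(docs, labels)
print(model.predict(["Please support our campaign today!"]))
```
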
  • Extraction of protein-protein interactions (PPIs) from the literature by deep convolutional neural networks with various feature embeddings.
    Choi, S.-P.
    Journal of Information Science. November 14, 2016

    The automatic extraction of protein–protein interactions (PPIs) reported in scientific publications is of great significance for biomedical researchers in that they could efficiently grasp the recent research results about biochemical events and molecular processes for conducting their original studies. This article introduces a deep convolutional neural network (DCNN) equipped with various feature embeddings to overcome the limitations of the existing machine learning-based PPI extraction methods. The proposed model learns and optimises word embeddings based on publicly available word vectors and also exploits position embeddings to identify the locations of the target protein names in sentences. Furthermore, it can employ various linguistic feature embeddings to improve the PPI extraction. The intensive experiments using the AIMed data set, known as the most difficult collection, not only show the superiority of the suggested model but also indicate important implications in optimising the network parameters and hyperparameters.

    November 14, 2016   doi: 10.1177/0165551516673485   open full text
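
    A minimal PyTorch sketch of the general idea described above (word embeddings concatenated with two position embeddings feeding a convolutional classifier); the layer sizes, kernel width and class count are illustrative assumptions, not the authors' reported architecture.

```python
# Minimal sketch (not the authors' exact architecture): a convolutional sentence
# classifier that concatenates word embeddings with two position embeddings
# encoding each token's distance to the two candidate protein mentions.
import torch
import torch.nn as nn

class PPICNN(nn.Module):
    def __init__(self, vocab_size, num_positions, word_dim=100, pos_dim=10,
                 num_filters=128, kernel_size=3, num_classes=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)    # could be initialised from pre-trained vectors
        self.pos1_emb = nn.Embedding(num_positions, pos_dim)  # distance to the first protein
        self.pos2_emb = nn.Embedding(num_positions, pos_dim)  # distance to the second protein
        self.conv = nn.Conv1d(word_dim + 2 * pos_dim, num_filters, kernel_size, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, words, pos1, pos2):
        # words, pos1, pos2: (batch, seq_len) index tensors
        x = torch.cat([self.word_emb(words), self.pos1_emb(pos1), self.pos2_emb(pos2)], dim=-1)
        x = x.transpose(1, 2)            # (batch, channels, seq_len) for Conv1d
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values          # max pooling over the sentence
        return self.fc(x)

# Toy forward pass with made-up sizes.
model = PPICNN(vocab_size=5000, num_positions=200)
words = torch.randint(0, 5000, (4, 50))
pos1 = torch.randint(0, 200, (4, 50))
pos2 = torch.randint(0, 200, (4, 50))
print(model(words, pos1, pos2).shape)    # torch.Size([4, 2])
```
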
  • Bayesian Naïve Bayes classifiers to text classification.
    Xu, S.
    Journal of Information Science. November 14, 2016

    Text classification is the task of assigning predefined categories to natural language documents, and it can provide conceptual views of document collections. The Naïve Bayes (NB) classifier is a family of simple probabilistic classifiers based on a common assumption that all features are independent of each other, given the category variable, and it is often used as the baseline in text classification. However, classical NB classifiers with multinomial, Bernoulli and Gaussian event models are not fully Bayesian. This study proposes three Bayesian counterparts, and it turns out that the classical NB classifier with the Bernoulli event model is equivalent to its Bayesian counterpart. Finally, experimental results on the 20 Newsgroups and WebKB data sets show that the performance of the Bayesian NB classifier with the multinomial event model is similar to that of its classical counterpart, but the Bayesian NB classifier with the Gaussian event model is clearly better than its classical counterpart.

    November 14, 2016   doi: 10.1177/0165551516677946   open full text
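
    For reference, the classical multinomial NB baseline mentioned in the abstract can be reproduced with scikit-learn on the 20 Newsgroups data set; the snippet below shows only that baseline (the article's fully Bayesian variants are not implemented here).

```python
# Classical multinomial NB baseline on 20 Newsgroups (scikit-learn sketch).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# alpha is the Laplace/Dirichlet smoothing parameter; a fully Bayesian treatment
# would place an explicit prior here instead of using a point estimate.
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(train.data, train.target)
print("accuracy:", accuracy_score(test.target, clf.predict(test.data)))
```
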
  • Fast prediction of web user browsing behaviours using most interesting patterns.
    Sisodia, D. S., Khandal, V., Singhal, R.
    Journal of Information Science. November 01, 2016

    The prediction of users’ browsing behaviours is essential for putting appropriate information on the web. The browsing behaviours are stored as navigational patterns in web server logs. These weblogs are used to predict the frequently accessed patterns of web users, which can be used to predict user behaviour and to collect business intelligence. However, owing to the exponentially increasing weblog size, existing implementations of frequent-pattern-mining algorithms often take too much time and generate too many redundant patterns. This article introduces the most interesting pattern-based parallel FP-growth (MIP-PFP) algorithm. MIP-PFP is an improved implementation of the parallel FP-growth algorithm and is implemented on the Apache Spark platform for extracting frequent patterns from huge weblogs. Experiments were performed on openly available National Aeronautics and Space Administration (NASA) weblog data to test the effectiveness of the MIP-PFP algorithm. The results were compared with an existing implementation of the PFP algorithm. The results suggest that the MIP-PFP algorithm running on Apache Spark reduced the execution time by a factor of more than 10. The effect of the sequence length used as input to the MIP-PFP algorithm was also evaluated with different interestingness parameters, including support, confidence, lift, leverage, cosine and conviction. It is observed from the experimental results that only sequences of length greater than three produced a very low value of support for these interestingness measures.

    November 01, 2016   doi: 10.1177/0165551516673293   open full text
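
    Parallel FP-growth itself is available in Spark's MLlib; the PySpark sketch below mines frequent page sets and rules with confidence/lift from toy session data. It illustrates the underlying PFP machinery, not the MIP-PFP selection of most interesting patterns.

```python
# Illustrative PySpark sketch of parallel FP-growth on session-like weblog data.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("weblog-fp-growth").getOrCreate()

# Each row is one user session, represented as the set of pages visited.
sessions = spark.createDataFrame(
    [(0, ["/home", "/shuttle", "/images"]),
     (1, ["/home", "/images"]),
     (2, ["/home", "/shuttle", "/history"]),
     (3, ["/shuttle", "/images"])],
    ["id", "items"],
)

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fp.fit(sessions)

model.freqItemsets.show()        # frequent page sets
model.associationRules.show()    # rules with confidence and lift
spark.stop()
```
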
  • News events prediction using Markov logic networks.
    Dami, S., Barforoush, A. A., Shirazi, H.
    Journal of Information Science. November 01, 2016

    Predicting future events from text data has been a controversial and much disputed topic in the field of text analytics. However, far too little attention has been paid to efficient prediction in textual environments. This study aims to develop a novel and efficient method for news event prediction. The proposed method is based on the Markov logic network (MLN) framework, which enables us to concisely represent complex events with the full expressivity of first-order logic (FOL), as well as to reason about uncertain events with probabilities. In our framework, we first extract text news events via an event representation model at a semantic level and then transform them into web ontology language (OWL) as a posteriori knowledge. A set of domain-specific causal rules in FOL associated with weights was also fed into the system as a priori (common-sense) knowledge. Additionally, several large-scale ontologies, including DBpedia, VerbNet and WordNet, were used to model common-sense logic rules as contextual knowledge. Finally, all types of such knowledge were integrated into OWL for performing causal inference. The resulting OWL knowledge base is augmented by the MLN, which uses weighted first-order formulas to represent probabilistic knowledge. Empirical evaluation on real news showed that our method of news event prediction is better than the baselines in terms of precision, coverage and diversity.

    November 01, 2016   doi: 10.1177/0165551516673285   open full text
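
    The MLN semantics the abstract relies on can be illustrated with a toy, brute-force example: the probability of a possible world is proportional to the exponential of the weighted count of satisfied formula groundings. The rule, weight and events below are invented for illustration and are not drawn from the article.

```python
# Toy illustration of Markov logic network semantics:
# P(world) ∝ exp(Σ_i w_i · n_i(world)), with n_i counting satisfied groundings.
import itertools
import math

# Ground atoms for two hypothetical events.
atoms = ["Protest(cityA)", "Unrest(cityA)", "Protest(cityB)", "Unrest(cityB)"]

# One weighted causal rule, grounded for both cities: Protest(x) => Unrest(x).
rules = [
    (1.5, lambda w: (not w["Protest(cityA)"]) or w["Unrest(cityA)"]),
    (1.5, lambda w: (not w["Protest(cityB)"]) or w["Unrest(cityB)"]),
]

def unnormalised(world):
    return math.exp(sum(wt for wt, sat in rules if sat(world)))

worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]

# Probability that Unrest(cityA) holds given Protest(cityA) is observed evidence.
evidence = [w for w in worlds if w["Protest(cityA)"]]
p = (sum(unnormalised(w) for w in evidence if w["Unrest(cityA)"])
     / sum(unnormalised(w) for w in evidence))
print(f"P(Unrest(cityA) | Protest(cityA)) = {p:.3f}")
```
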
  • Enabling smart objects discovery via constructing hypergraphs of heterogeneous IoT interactions.
    Jung, J., Chun, S., Jin, X., Lee, K.-H.
    Journal of Information Science. October 27, 2016

    Recent advances in the Internet of Things (IoT) have led to the rise of a new paradigm: the Social Internet of Things (SIoT). However, the new paradigm, inspired by the idea that smart objects will soon have a certain degree of social consciousness, is still in its infancy for several reasons. Most of the related works are far from embracing the social aspects of smart objects and the dynamicity of inter-object social relations. Furthermore, there is yet to be a coherent structure for organising and managing IoT objects that elicit social-like features. To fully understand how and to what extent these objects mimic the behaviours of humans, we first model the SIoT by scrutinising the distinct characteristics and structural facets of human-centric social networks. To elaborate, we describe the process of profiling the IoT objects that become social and classify various inter-object social relationships. Afterwards, a novel discovery mechanism, which utilises our hypergraph-based overlay network model, is proposed. To test the feasibility of the proposed approach, we performed several experiments on our smart home automation demo box built with various sensors and actuators.

    October 27, 2016   doi: 10.1177/0165551516674164   open full text
  • User identification across online social networks in practice: Pitfalls and solutions.
    Esfandyari, A., Zignani, M., Gaito, S., Rossi, G. P.
    Journal of Information Science. October 21, 2016

    To take advantage of the full range of services that online social networks (OSNs) offer, people commonly open several accounts on diverse OSNs where they leave lots of different types of profile information. The integration of these pieces of information from various sources can be achieved by identifying individuals across social networks. In this article, we address the problem of user identification by treating it as a classification task. Relying on common public attributes available through the official application programming interface (API) of social networks, we propose different methods for building negative instances that go beyond usual random selection so as to investigate the effectiveness of each method in training the classifier. Two test sets with different levels of discrimination are set up to evaluate the robustness of our different classifiers. The effectiveness of the approach is measured in real conditions by matching profiles gathered from Google+, Facebook and Twitter.

    October 21, 2016   doi: 10.1177/0165551516673480   open full text
  • An adaptive plan-based approach to integrating semantic streams with remote RDF data.
    Chun, S., Jung, J., Seo, S., Ro, W., Lee, K.-H.
    Journal of Information Science. October 19, 2016

    To satisfy a user’s complex requirements, Resource Description Framework (RDF) Stream Processing (RSP) systems envision the fusion of remote RDF data with semantic streams, using common data models to query semantic streams continuously. While streaming data change at a high rate and are pushed into RSP systems, the remote RDF data are retrieved from different remote sources. With the growth of SPARQL endpoints that provide access to remote RDF data, RSP systems can easily integrate the remote data with streams. Such integration provides new opportunities for mixing static (or quasi-static) data with streams on a large scale. However, current RSP systems do not offer any optimisation for the integration. In this article, we present an adaptive plan-based approach to efficiently integrate semantic streams with the static data from a remote source. We create a query execution plan based on temporal constraints among constituent services for the timely acquisition of remote data. To predict the change of remote sources in real time, we propose an adaptive process of detecting a source update, forecasting the update in the future, deciding on a new plan to obtain remote data and reacting to the new plan. We extend a SPARQL query with operators for describing the multiple strategies of the proposed adaptive process. Experimental results show that our approach is more efficient than conventional RSP systems in distributed settings.

    October 19, 2016   doi: 10.1177/0165551516670278   open full text
  • Detecting the association of health problems in consumer-level medical text.
    Chen, C., Huang, E., Yan, H.
    Journal of Information Science. October 19, 2016

    Consumers usually do not know the complicated links between related health problems. This fact may cause trouble when they wish to seek complete information regarding such problems. This study detects the associations among health problems by extending the meaning of health terms with methods based on the latent Dirichlet allocation (LDA) probabilistic topic model, the Medical Subject Headings (MeSH) thesaurus structure and Wikipedia concept mapping. The terms representing health problems are selected from, and extended using, consumer-level medical text. The vocabulary differs between consumer-level and professional-level medical text. Thus, the findings can be easily understood by the general public and are suitable for consumer-oriented applications. The methods were evaluated in two ways: (1) correlation analysis with expert ratings to show the overall performance and (2) P@N to reflect the ability to detect strong associations. The LDA topic-model-based method outperforms the other two types. The judgment incongruence between the best method and the expert ratings has been examined, and the evidence shows that the automatic method sometimes detects real associations beyond those identified by human experts.

    October 19, 2016   doi: 10.1177/0165551516671629   open full text
  • There’s no shortcut: Building understanding from information in ultrarunning.
    Gorichanaz, T.
    Journal of Information Science. October 13, 2016

    Now that information proliferates, information science should turn its attention towards higher order epistemic aims, such as understanding. Before systems to support the building of understanding can be designed, the process of building understanding must be explored. This article discusses the findings from an interpretative phenomenological analysis study on the information experience of participants in a 100-mile footrace which reveal how these participants have built understanding in their athletic pursuits. Three ways in which ultrarunners build understanding – by taking time, by undergoing struggle and by incorporating multiple perspectives – are described. The ensuing discussion leads to three questions that can guide the future development of information systems that support understanding: First, how can information science slow people down? Second, how can information science encourage people to willingly struggle? And, third, how can information science stimulate analogical thinking?

    October 13, 2016   doi: 10.1177/0165551516670099   open full text
  • Why do online consumers experience information overload? An extension of communication theory.
    Li, C.-Y.
    Journal of Information Science. October 13, 2016

    People surfing the Internet are faced with an onslaught of messages from multiple sources, which can overwhelm receivers. In contrast to previous studies, which have used ‘choice overload’ to represent the amount of information provided to consumers, this study used ‘information overload’ theory to represent the abundance of information received by consumers in online shopping environments. Borrowing from the concepts of the communication model, this study investigated the antecedents of perceived information overload, including information characteristics (message), the information source, the system interface (channel) and recipients’ motivation (receiver). A total of 15 adults with more than 3 years of online shopping experience participated in a focus group discussion. By integrating focus group results and the results of previous studies into a theoretical framework, this study developed and empirically tested a structural equation model of online information overload among 456 PChome customers. The results indicated that the complexity and ambiguity of information characteristics, number of brand alternatives offered by the information source and system interface all positively affect consumers’ perceived information overload. Furthermore, information recipients’ motivation not only negatively affected consumers’ perceived information overload but also moderated the relationship between the number of brand alternatives and consumers’ perceived information overload.

    October 13, 2016   doi: 10.1177/0165551516670096   open full text
  • Overall quality assessment of SKOS thesauri: An AHP-based approach.
    Quarati, A., Albertoni, R., De Martino, M.
    Journal of Information Science. October 12, 2016

    The article proposes a methodology for thesauri quality assessment that supports decision-makers in selecting thesauri by exploiting an overall quality measure. This measure takes into account the subjective perceptions of the decision-maker according to the reuse of thesauri in a specific application context. The analytic hierarchy process methodology is adopted to capture both the subjective and objective facets involved in thesauri quality assessment, thus providing a ranking of the thesauri assessed. Our methodology is applied to a set of thesauri by using user-driven application contexts. A step-by-step explanation of how the approach supports the decision process in the creation, maintenance and exploitation of a framework of linked thesauri is provided.

    October 12, 2016   doi: 10.1177/0165551516671079   open full text
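
    The core AHP computation (priority weights derived from a pairwise comparison matrix via the principal eigenvector, plus a consistency check) can be sketched in a few lines of numpy; the judgement matrix below is illustrative, not taken from the article.

```python
# Compact numpy sketch of the analytic hierarchy process weighting step.
import numpy as np

# Decision-maker's pairwise judgements for three hypothetical quality criteria.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                 # normalised priority weights

n = A.shape[0]
lambda_max = eigvals.real[k]
ci = (lambda_max - n) / (n - 1)          # consistency index
ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]      # Saaty's random index
print("weights:", np.round(weights, 3), "CR:", round(ci / ri, 3))  # CR < 0.1 is acceptable
```
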
  • Effects of relevance criteria and subjective factors on web image searching behaviour.
    Hamid, R. A., Thom, J. A., Iskandar, D. A.
    Journal of Information Science. September 13, 2016

    Searching for images is an everyday activity. Nevertheless, even a highly skilled searcher often struggles to find what they are looking for. This article studies the factors that affect users’ online web image search behaviour, investigating (1) the use of criteria in making image relevance judgements and (2) the effect of familiarity, difficulty and satisfaction. The study includes 48 users who performed four online image search tasks using Google Images. Simulated work scenarios, questionnaires and screen capture recordings were used to collect data on their image search behaviour. The results show that, in judging image relevance, users may apply similar criteria; however, the importance of these criteria depends on the type of image search. Similarly, users’ ratings of subjective aspects of performing image searches were task dependent. Users’ perceptions of subjective aspects of performing image searches did not always correspond with their actual search behaviour. Correlation analysis shows that subjective factors cannot be definitively measured by using only one component of search behaviour. Future work includes further analysis of the effects of topic familiarity and satisfaction.

    September 13, 2016   doi: 10.1177/0165551516666968   open full text
  • Modelling information diffusion based on non-dominated friends in social networks.
    Mozafari, N., Hamzeh, A., Hashemi, S.
    Journal of Information Science. September 13, 2016

    In recent years, social networks have played a strong role in diffusing information among people all around the globe. Therefore, the ability to analyse the diffusion pattern is essential. A diffusion model can identify the information dissemination pattern in a social network. One of the most important components of a diffusion model is information perception, which determines the source from which each node receives its information. Previous studies have assumed information perception to be based on just a single factor, that is, each individual receives information from the friend with the highest amount of information, whereas in reality there exist other factors, such as trust, that affect people’s decisions when selecting the friend who will supply information. These factors might be in conflict with each other, and modelling the diffusion process with respect to a single factor can give rise to unacceptable results with respect to the other factors. In this article, we propose a novel information diffusion model based on non-dominated friends (IDNDF). The non-dominated friends of a node are the subset of its friends for whom no other friend in the set is better with respect to all considered factors; considering different factors simultaneously significantly enhances the proposed information diffusion model. Moreover, our model gives all non-dominated friends a chance to be selected. In addition, IDNDF allows each node of the social network to have only partial knowledge. Finally, IDNDF is applicable to different types of data, including well-known real social networks such as Epinions, Wikipedia and Advogato. Extensive experiments are performed to assess the performance of the proposed model. The results show the efficiency of IDNDF in diffusing information in a variety of social networks.

    September 13, 2016   doi: 10.1177/0165551516667656   open full text
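
    The notion of non-dominated friends is essentially a Pareto filter over a node's friends; a minimal sketch, assuming two hypothetical factors (information level and trust), follows.

```python
# Sketch of the "non-dominated friends" idea: keep the friends that are not
# dominated on all factors (higher scores are better).
def dominates(a, b):
    """a dominates b if a is at least as good on every factor and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_friends(friend_scores):
    """friend_scores: {friend: (factor1, factor2, ...)}."""
    return {
        f for f, s in friend_scores.items()
        if not any(dominates(other, s) for g, other in friend_scores.items() if g != f)
    }

friends = {"alice": (0.9, 0.4), "bob": (0.6, 0.8), "carol": (0.5, 0.3)}
print(non_dominated_friends(friends))  # {'alice', 'bob'}: carol is dominated by both
```
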
  • Developing information quality assessment framework of presentation slides.
    Kim, S., Lee, J.-G., Yi, M. Y.
    Journal of Information Science. September 07, 2016

    Computerized presentation slides have become essential for many occasions, such as business meetings, classroom discussions, multipurpose talks and public events. Given the tremendous increase in online resources and materials, locating high-quality slides relevant to a given task is often a formidable challenge, particularly when a user looks for superior quality slides. This study proposes a new, comprehensive framework for information quality (IQ) developed specifically for computerized presentation slides and explores the possibility of automatically detecting the IQ of slides. To determine slide-specific IQ criteria as well as their relative importance, we carried out a user study involving 60 participants from two universities and conducted extensive coding analysis. We subsequently conducted a series of experiments to examine the validity of the IQ features developed on the basis of the criteria selected from the user study. The study findings contribute to identifying key dimensions and related features that can improve effective IQ assessments of computerized presentation slides.

    September 07, 2016   doi: 10.1177/0165551516661917   open full text
  • Mining opinionated product features using WordNet lexicographer files.
    Alrababah, S. A. A., Gan, K. H., Tan, T.-P.
    Journal of Information Science. September 07, 2016

    Online customer reviews are an important assessment tool for businesses as they contain feedback that is valuable from the customer perspective. These reviews provide a significant basis on which potential customers can select the product that best meets their preferences. In online reviews, customers describe positive or negative experiences with a product or service or any part of it (i.e. features). Consumers frequently experience difficulty finding the desired product for comparison because of the massive number of online reviews. The automatic extraction of important product features is necessary to support customers in search of relevant product features. These features are the criteria that make it possible for customers to characterise different types of products. This article proposes a domain-independent approach for identifying explicit opinionated features and attributes that are strongly related to a specific domain product, using lexicographer files in WordNet. In our approach, N_gram analysis and the SentiStrength opinion lexicon have been employed to support the extraction of opinionated features. The empirical evaluation of the proposed system, using online reviews from two popular datasets used by supervised and unsupervised systems, showed that our approach achieved competitive results for feature extraction from product reviews.

    September 07, 2016   doi: 10.1177/0165551516667651   open full text
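
    WordNet's lexicographer files are exposed in NLTK via Synset.lexname(); a small sketch of using them to keep only product-relevant candidate nouns follows (the particular set of relevant files is an assumption made for illustration).

```python
# NLTK sketch of the lexicographer-file idea: keep a candidate feature noun only
# if at least one WordNet sense falls into a product-relevant lexicographer file.
import nltk
nltk.download("wordnet", quiet=True)   # fetch the WordNet corpus if missing
from nltk.corpus import wordnet as wn

# Lexicographer files plausibly associated with tangible product features.
RELEVANT_LEXNAMES = {"noun.artifact", "noun.attribute", "noun.cognition"}

def is_candidate_feature(noun):
    return any(s.lexname() in RELEVANT_LEXNAMES for s in wn.synsets(noun, pos=wn.NOUN))

for term in ["battery", "screen", "happiness", "yesterday"]:
    print(term, "->", is_candidate_feature(term))
```
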
  • Explore the research front of a specific research theme based on a novel technique of enhanced co-word analysis.
    Li, M., Chu, Y.
    Journal of Information Science. September 02, 2016

    Discovering the research front of a specific topic remains a significant challenge for researchers in all scientific areas. Over the last decade, burst term detection (BTD) in text streams has become a useful technique for bibliometrics and science mapping. It has been argued that analytical methods based on BTD can indicate certain facets of a research front. To integrate BTD into the framework of traditional co-word analysis, association rule mining between keywords and burst terms (ARM-KB) is introduced to enhance traditional co-word analysis and present a new facet of the research front for a field of science. Based on ARM-KB, possible connections between keywords and burst terms are built, which can facilitate the exploration of a research front from a three-dimensional perspective: co-word analysis, burst term clues and association rules. In the case study, the research front of anticancer based on nanomedicine (ABN) is explored. Based on theoretical and empirical analyses, ARM-KB can be used as a valuable new technique or as a supplement to traditional bibliometrics in the exploration of scientific frontiers.

    September 02, 2016   doi: 10.1177/0165551516661914   open full text
  • A novel algorithm for extracting the user reviews from web pages.
    Ucar, E., Uzun, E., Tüfekci, P.
    Journal of Information Science. September 02, 2016

    Extracting user reviews from websites such as forums, blogs, newspapers, commerce, trips, etc. is crucial for text processing applications (e.g. sentiment analysis, trend detection/monitoring and recommendation systems) which are needed to deal with structured data. Traditional algorithms have three processes, consisting of Document Object Model (DOM) tree creation, extraction of features obtained from this tree and machine learning. However, these algorithms increase the time complexity of the extraction process. This study proposes a novel algorithm that involves two complementary stages. The first stage determines which HTML tags correspond to the review layout for a web domain by using the DOM tree, its features and decision tree learning. The second stage extracts the review layout for web pages in a web domain using the tags found in the first stage. This stage is more time-efficient, being approximately 21 times faster than the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments on review block extraction.

    September 02, 2016   doi: 10.1177/0165551516666446   open full text
  • Similarity-based link prediction in social networks: A path and node combined approach.
    Yu, C., Zhao, X., An, L., Lin, X.
    Journal of Information Science. August 11, 2016

    With the rapid development of the Internet, the computational analysis of social networks has grown to be a salient issue. A wide range of research analyses social network topics, and a considerable amount of attention has been devoted to the issue of link prediction. Link prediction aims to predict the interactions that might occur between two entities in the network. To this end, this study proposed a novel path and node combined approach and constructed a methodology for measuring node similarities. The method was illustrated with five real datasets obtained from different types of social networks. An extensive comparison of the proposed method against existing link prediction algorithms was performed to demonstrate that the path and node combined approach achieved much higher mean average precision (MAP) and area under the curve (AUC) values than those that only consider common nodes (e.g. Common Neighbours and Adamic/Adar) or paths (e.g. Random Walk with Restart and FriendLink). The results imply that two nodes are more likely to establish a link if they have more common neighbours of lower degrees. The weight of the path connecting two nodes is inversely proportional to the product of the degrees of the nodes on the pathway. The combination of node and topological features can substantially improve the performance of similarity-based link prediction, compared with node-dependent and path-dependent approaches. The experiments also demonstrate that the path-dependent approaches outperform the node-dependent approaches. This indicates that topological features of networks may contribute more to improving performance than node features.

    August 11, 2016   doi: 10.1177/0165551516664039   open full text
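
    The node-based baselines named above are available in NetworkX, and the article's degree-related finding can be mimicked with a simple path weight; the combined score below is a hypothetical illustration, not the authors' exact formula.

```python
# NetworkX sketch: Common Neighbours and Adamic/Adar baselines, plus a path
# weight inversely proportional to the product of the inner nodes' degrees.
import math
import networkx as nx

G = nx.karate_club_graph()
u, v = 0, 33  # a non-adjacent pair in Zachary's karate club

cn = len(list(nx.common_neighbors(G, u, v)))
aa = sum(score for _, _, score in nx.adamic_adar_index(G, [(u, v)]))
print(f"common neighbours = {cn}, Adamic/Adar = {aa:.3f}")

def path_weight(G, path):
    """Weight a simple path by 1 / product of the degrees of its inner nodes."""
    inner = path[1:-1]
    return 1.0 / math.prod(G.degree(n) for n in inner) if inner else 1.0

score = sum(path_weight(G, p) for p in nx.all_simple_paths(G, u, v, cutoff=2))
print(f"degree-penalised 2-hop path score = {score:.4f}")
```
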
  • Profile-based recommendation: A case study in a parliamentary context.
    de Campos, L. M., Fernandez-Luna, J. M., Huete, J. F.
    Journal of Information Science. August 02, 2016

    In the context of e-government and more specifically that of parliament, this paper tackles the problem of finding Members of Parliament (MPs) according to their profiles which have been built from their speeches in plenary or committee sessions. The paper presents a common solution for two problems: firstly, a member of the public who is concerned about a certain issue might want to know who the best MP is for dealing with their problem (recommending task); and secondly, each new piece of textual information that reaches the house must be correctly allocated to the appropriate MP according to its content (filtering task). This paper explores both these ways of searching for relevant people conceptually by encapsulating them into a single problem: that of searching for the relevant MP’s profile given an information need. Our research work proposes various profile construction methods (by selecting and weighting appropriate terms) and compares these using different retrieval models to evaluate their quality and suitability for different types of information needs in order to simulate real and common situations.

    August 02, 2016   doi: 10.1177/0165551516659402   open full text
  • Being in a knowledge space: Information behaviour of cult media fan communities.
    Price, L., Robinson, L.
    Journal of Information Science. July 15, 2016

    This article describes the first two parts of a three-stage study investigating the information behaviour of fans and fan communities, focusing on fans of cult media. A literature analysis shows that information practices are an inherent and major part of fan activities, and that fans are practitioners of new forms of information consumption and production, showing sophisticated activities of information organisation and dissemination. A subsequent Delphi study, taking the novel form of a ‘serious leisure’ Delphi, in which the participants are not experts in the usual sense, identifies three aspects of fan information behaviour of particular interest beyond the fan context: information gatekeeping; classifying and tagging; and entrepreneurship and economic activity.

    July 15, 2016   doi: 10.1177/0165551516658821   open full text
  • A social recommender system by combining social network and sentiment similarity: A case study of healthcare.
    Yang, D., Huang, C., Wang, M.
    Journal of Information Science. July 06, 2016

    Social recommender systems aim to support user preferences and help users make better decisions in social media. The social network and the social context are two vital elements in social recommender systems. In this contribution, we propose a new framework for a social recommender system based on both network structure analysis and social context mining. Exponential random graph models (ERGMs) are able to capture and simulate the complex structure of a micro-blog network. We derive the prediction formula from ERGMs for recommending micro-blog users. Then, a primary recommendation list is created by analysing the micro-blog network structure. In the next step, we calculate the sentiment similarities of micro-blog users based on a sentiment feature set that is extracted from users’ tweets. Sentiment similarities are used to filter the primary recommendation list and find users who have similar attitudes on the same topic. The goal of these two steps is to make the social recommender system much more precise and to satisfy users’ psychological preferences. Finally, we use this new framework to deal with large real-world data. The recommendation results for diabetes accounts on Weibo show that our method outperforms other social recommender systems.

    July 06, 2016   doi: 10.1177/0165551516657712   open full text
  • Community detection in dynamic social networks: A local evolutionary approach.
    Samie, M. E., Hamzeh, A.
    Journal of Information Science. July 04, 2016

    Communities in social networks are groups of individuals who are connected around specific goals. Discovering information about the structure, members and types of changes of communities has always been of great interest. Despite extensive global research on these questions, no definitive approach has been established, and researchers continue to devise methods and improve estimation techniques using data mining tools, graph mining tools and artificial intelligence techniques. This paper proposes a novel two-phase approach based on global and local information to detect communities in social networks. It explores the global information in the first phase and then exploits the local information in the second phase to discover communities more accurately. It also proposes a novel algorithm for the second phase that exploits the local information and mines more deeply. Experimental results show that the proposed method performs better and achieves more accurate results than previous methods.

    July 04, 2016   doi: 10.1177/0165551516657717   open full text
  • A comparative study of three teaching methods on student information literacy in stand-alone credit-bearing university courses.
    Dolnicar, D., Podgornik, B. B., Bartol, T.
    Journal of Information Science. June 30, 2016

    Three teaching methods, applied to credit-bearing information literacy (IL) university courses, were evaluated and compared. The effects of lecture-based learning (LBL), project-based learning (PjBL) and problem-based learning (PBL) were investigated using the information literacy test (ILT) as an assessment tool, with regard to the total ILT score, specific IL contents according to the five ACRL standards and students’ mental skills according to Bloom’s cognitive categories. While all three teaching methods showed a significant improvement in the ILT post-test, the active-learning groups of PjBL and PBL scored significantly better than the LBL group. The most notable positive difference was observed in students’ effective access to information related to database searching skills, in intellectual property/ethics issues and in the cognitive category of comprehension. The PjBL and PBL post-test results did not differ significantly, indicating that both active learning methods resulted in similar improvements in students’ IL.

    June 30, 2016   doi: 10.1177/0165551516655084   open full text
  • An empirical test of an Antecedents - Privacy Concerns - Outcomes model.
    Benamati, J. H., Ozdemir, Z. D., Smith, H. J.
    Journal of Information Science. June 22, 2016

    This study extends privacy concerns research by providing a test of a model inspired by the ‘Antecedents – Privacy Concerns – Outcomes’ (APCO) framework. Focusing at the individual level of analysis, the study examines the influences of privacy awareness (PA) and demographic variables (age, gender) on concern for information privacy (CFIP). It also considers CFIP’s relationship to privacy-protecting behaviours and incorporates trust and risk into the model. These relationships are tested in a specific, Facebook-related context. Results strongly support the overall model. PA and gender are important explanators for CFIP, which in turn explains privacy-protecting behaviours. We also find that perceived risk affects trust, which in turn affects behaviours in the studied context. The results yield several recommendations for future research as well as some implications for management.

    June 22, 2016   doi: 10.1177/0165551516653590   open full text
  • Collaborative filtering using non-negative matrix factorisation.
    Aghdam, M. H., Analoui, M., Kabiri, P.
    Journal of Information Science. June 21, 2016

    Collaborative filtering is a popular strategy in the recommender systems area. This approach gathers users’ ratings and then predicts what users will rate based on their similarity to other users. However, most collaborative filtering methods face problems such as sparseness and scalability. This paper presents a non-negative matrix factorisation method to alleviate these problems by decomposing the rating matrix into a user matrix and an item matrix. The method tries to find non-negative user and item matrices whose product can well estimate the rating matrix, and proposes update rules to learn the latent factors for factorising the rating matrix. The proposed method can estimate all the unknown ratings, and its computational complexity is very low. Empirical studies on benchmark datasets show that the proposed method is more tolerant of the sparseness and scalability problems.

    June 21, 2016   doi: 10.1177/0165551516654354   open full text
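
    A compact numpy sketch of the general technique (multiplicative NMF updates restricted to observed ratings) is shown below; it follows the standard weighted-NMF update rules and is not claimed to be the paper's exact algorithm.

```python
# Weighted NMF for rating prediction: multiplicative updates on observed cells.
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
M = (R > 0).astype(float)          # mask of observed ratings
k, eps = 2, 1e-9

rng = np.random.default_rng(0)
U = rng.random((R.shape[0], k))    # user latent factors
V = rng.random((R.shape[1], k))    # item latent factors

for _ in range(500):
    # Lee & Seung style multiplicative updates, weighted by the observation mask.
    U *= ((M * R) @ V) / (((M * (U @ V.T)) @ V) + eps)
    V *= ((M * R).T @ U) / (((M * (U @ V.T)).T @ U) + eps)

print(np.round(U @ V.T, 2))        # predicted ratings, including the unknown cells
```
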
  • Representing and integrating bibliographic information into the Semantic Web: A comparison of four conceptual models.
    Zapounidou, S., Sfakakis, M., Papatheodorou, C.
    Journal of Information Science. June 17, 2016

    Integration of library data into the Semantic Web environment is a key issue for libraries and is approached on the basis of interoperability between conceptual models. Several data models exist for the representation and publication of library data on the Semantic Web, and therefore inter-domain and intra-domain interoperability issues emerge as a growing number of web data are generated. Achieving interoperability for different representations of the same or related entities between the library and other cultural heritage institutions would enhance the reusability of rich bibliographic data and support the development of new data-driven information services. This paper aims to investigate common ground and convergences between four conceptual models, namely Functional Requirements for Bibliographic Records (FRBR), FRBR Object-Oriented (FRBRoo), Bibliographic Framework (BIBFRAME) and the Europeana Data Model (EDM), enabling semantically richer interoperability by studying the representation of monographs, as well as of content relationships (derivative and equivalent bibliographic relationships) and of whole-part relationships between them.

    June 17, 2016   doi: 10.1177/0165551516650410   open full text
  • The embeddedness of collaborative information seeking in information culture.
    Hansen, P., Widen, G.
    Journal of Information Science. June 17, 2016

    Professionally, people often conduct their work in settings containing a range of different collaborative situations and work practices in which people handle information and work activities. Still, work tasks are usually considered and perceived as individual activities although the technology and the characteristics of the tasks require collaborative and cooperative handling processes. This viewpoint still produces technologies that, in general, assume individual information management and decision-making. Based on previous research on information culture (IC) and collaborative information seeking (CIS), this paper proposes an integrated framework where both environmental (cultural) as well as collaborative aspects of organisational information behaviour are present. This kind of framework would be useful in studies looking into how information is retrieved, how information is organised and managed, and how information is used as a resource in collaborative settings. It gives a more holistic perspective to information use and practices in organisations where culture, collaboration and awareness are especially brought to common attention for effective information management in organisations.

    June 17, 2016   doi: 10.1177/0165551516651544   open full text
  • Classification of news-related tweets.
    Demirsoz, O., Ozcan, R.
    Journal of Information Science. June 17, 2016

    It is important to obtain public opinion about a news article. Microblogs such as Twitter are popular and an important medium for people to share ideas. An important portion of tweets is related to news or events. Our aim is to find tweets about newspaper reports and measure the popularity of these reports on Twitter. However, it is a challenging task to match informal and very short tweets with formal news reports. In this study, we formulate this problem as a supervised classification task. We propose to form a training set using tweets containing a link to the news and the content of the same news article. We preprocess tweets by removing unnecessary words and symbols and apply stemming by means of morphological analysers. We apply binary classifiers and anomaly detection to this task. We also propose a textual similarity-based approach. We observed that preprocessing of tweets increases accuracy. The textual similarity method obtains the highest recognition rate. Accuracy increases further in some cases when the report text is used together with the link-containing tweets in the classifiers' training set. We argue that this study, which works directly with tweet texts to measure the popularity of national newspaper reports on social media, is more significant than Twitter analyses based only on hashtags. Given the limited number of scientific studies on Turkish tweets, this study makes a contribution to the literature.

    June 17, 2016   doi: 10.1177/0165551516653082   open full text
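
    The textual-similarity baseline can be sketched with TF-IDF vectors and cosine similarity; the toy English texts below stand in for the Turkish tweets and news reports used in the study.

```python
# Sketch of the textual-similarity baseline: match a tweet to the closest news
# report by cosine similarity over TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

news = [
    "Central bank raises interest rates amid rising inflation",
    "National football team wins qualifier in final minutes",
]
tweet = ["rates going up again, inflation is killing us"]

vectoriser = TfidfVectorizer(lowercase=True, stop_words="english")
news_vecs = vectoriser.fit_transform(news)
tweet_vec = vectoriser.transform(tweet)

scores = cosine_similarity(tweet_vec, news_vecs)[0]
best = scores.argmax()
print(f"best match: news[{best}] (similarity {scores[best]:.2f})")
```
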
  • SKOS concepts and natural language concepts: An analysis of latent relationships in KOSs.
    Mastora, A., Peponakis, M., Kapidakis, S.
    Journal of Information Science. May 17, 2016

    The vehicle for representing Knowledge Organisation Systems (KOSs) in the environment of the Semantic Web and linked data is the Simple Knowledge Organisation System (SKOS). SKOS provides a way to assign a Uniform Resource Identifier (URI) to each concept, and this URI functions as a surrogate for the concept. This makes clarifying the ontological meaning of URIs a central concern. The aim of this study is to investigate the relationship between the ontological substance of KOS concepts and concepts revealed through the grammatical and syntactic formalisms of natural language. For this purpose, we examined the divisibility of concepts in specific KOSs (i.e. a thesaurus, a subject headings system and a classification scheme) by applying Natural Language Processing (NLP) techniques (i.e. morphosyntactic analysis) to the lexical representations (i.e. RDF literals) of SKOS concepts. The results of the comparative analysis reveal that, despite the use of multi-word units, thesauri tend to represent concepts in a way that can hardly be further divided conceptually, while subject headings and classification schemes – to a certain extent – comprise terms that can be decomposed into more conceptual constituents. Consequently, SKOS concepts deriving from thesauri are more likely to represent atomic conceptual units and thus be more appropriate tools for inference and reasoning. Since identifiers represent the meaning of a concept, complex concepts are neither the most appropriate nor the most efficient way of modelling a KOS for the Semantic Web.

    May 17, 2016   doi: 10.1177/0165551516648108   open full text
  • Augmented intuitive dissimilarity metric for clustering of Web user sessions.
    Sisodia, D. S., Verma, S., Vyas, O. P.
    Journal of Information Science. May 17, 2016

    Clustering is a very useful technique for categorising Web users with common browsing activities, access patterns and navigational behaviour. Web user clustering is used to build Web visitor profiles that form the core of a personalised information recommender system. These systems are used to comprehend Web users’ surfing activities by offering tailored content to Web users with similar interests. The principal objective of Web user session clustering is to maximise intra-group similarity while minimising inter-group similarity. Efficient clustering of Web users’ sessions depends not only on the nature of the clustering algorithm but also on how well user concerns are captured and accommodated by the dissimilarity measure that is used. Determining the right dissimilarity measure to capture the access behaviour of the Web user is therefore very significant for meaningful clustering. In this paper, an intuitive dissimilarity measure is presented to estimate a Web user’s concerns from augmented Web user sessions. The proposed usage dissimilarity measure between two Web user sessions is based on the relevance of the accessed pages, the syntactic structure of the page URL and the hierarchical structure of the website. This intuitive dissimilarity measure was used with the K-Medoids clustering algorithm for experimentation, and the results were compared with other independent dissimilarity measures. The worth of the generated clusters was evaluated by two unsupervised cluster validity indexes. The experimental results show that the intuitive augmented session dissimilarity measure is more realistic and superior to the other independent dissimilarity measures with regard to the cluster validity indexes.

    May 17, 2016   doi: 10.1177/0165551516648259   open full text
  • Feature-based opinion mining in financial news: An ontology-driven approach.
    Salas-Zarate, M. d. P., Valencia-Garcia, R., Ruiz-Martinez, A., Colomo-Palacios, R.
    Journal of Information Science. May 11, 2016

    Financial news plays a significant role with regard to predicting the behaviour of financial markets. However, the exponential growth of financial news on the Web has led to a need for new technologies that automatically collect and categorise large volumes of information in a fast and easy manner. Sentiment analysis, or opinion mining, is the field of study that analyses people’s opinions, moods and evaluations using written text on Web platforms. In recent research, a substantial effort has been made to develop sophisticated methods with which to classify sentiments in the financial domain. However, there is a lack of approaches that analyse the positive or negative orientation of each aspect contained in a document. In this respect, we propose a new sentiment analysis method for feature and news polarity classification. The method presented is based on an ontology-driven approach that makes it possible to semantically describe relations between concepts in the financial news domain. The polarity of the features in each document is also calculated by taking into account the words from around the linguistic expression of the feature. These words are obtained by using the ‘N_GRAM After’, ‘N_GRAM Before’, ‘N_GRAM Around’ and ‘All_Phrase’ methods. The effectiveness of our method has been proved by carrying out a set of experiments on a corpus of 1000 financial news items. Our proposal obtained encouraging results with an accuracy of 66.7% and an F-measure of 64.9% for feature polarity classification and an accuracy of 89.8% and an F-measure of 89.7% for news polarity classification. The experimental results additionally show that the N_GRAM Around method provides the best average results.

    May 11, 2016   doi: 10.1177/0165551516645528   open full text
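
    The 'N_GRAM Around' idea (scoring a feature from the words in a window around its mention) can be sketched as follows; the tiny sentiment lexicon and window size are illustrative assumptions, and the article's ontology-driven feature identification is not reproduced.

```python
# Sketch of window-around-feature polarity scoring with a toy sentiment lexicon.
LEXICON = {"strong": 1, "growth": 1, "beat": 1, "weak": -1, "falls": -1, "loss": -1}

def feature_polarity(tokens, feature, window=3):
    scores = []
    for i, tok in enumerate(tokens):
        if tok == feature:
            around = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            scores.append(sum(LEXICON.get(w, 0) for w in around))
    total = sum(scores)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

sentence = ("revenue shows strong growth this quarter but analysts say "
            "profit falls sharply after a heavy loss").split()
print("revenue:", feature_polarity(sentence, "revenue"))
print("profit:", feature_polarity(sentence, "profit"))
```
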
  • Query-specific signature selection for efficient k-nearest neighbour approximation.
    Park, Y., Hwang, H., Lee, S.-g.
    Journal of Information Science. May 10, 2016

    Finding k-nearest neighbours (k-NN) is one of the most important primitives of many applications such as search engines and recommendation systems. However, its computational cost is extremely high when searching for k-NN points in a huge collection of high-dimensional points. Locality-sensitive hashing (LSH) has been introduced for an efficient k-NN approximation, but none of the existing LSH approaches clearly outperforms others. We propose a novel LSH approach, Signature Selection LSH (S2LSH), which finds approximate k-NN points very efficiently in various datasets. It first constructs a large pool of highly diversified signature regions with various sizes. Given a query point, it dynamically generates a query-specific signature region by merging highly effective signature regions selected from the signature pool. We also suggest S2LSH-M, a variant of S2LSH, which processes multiple queries more efficiently by using query-specific features and optimization techniques. Extensive experiments show the performance superiority of our approaches in diverse settings.

    May 10, 2016   doi: 10.1177/0165551516644176   open full text
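
    For background, a minimal random-hyperplane LSH index (hash points by sign patterns, then re-rank exactly within the query's bucket) looks like the sketch below; S2LSH's pool of signature regions and query-specific selection are not implemented here.

```python
# Minimal random-hyperplane LSH sketch for approximate k-NN under cosine similarity.
import numpy as np

rng = np.random.default_rng(42)
dim, n_points, n_bits = 64, 10_000, 8

data = rng.normal(size=(n_points, dim))
planes = rng.normal(size=(n_bits, dim))          # one random hyperplane per signature bit

def signature(x):
    return tuple((planes @ x > 0).astype(int))   # sign pattern = hash bucket

buckets = {}
for i, x in enumerate(data):
    buckets.setdefault(signature(x), []).append(i)

query = rng.normal(size=dim)
candidates = buckets.get(signature(query), [])

# Exact cosine re-ranking only inside the candidate bucket.
def cosine(i):
    return data[i] @ query / (np.linalg.norm(data[i]) * np.linalg.norm(query))

top = sorted(candidates, key=cosine, reverse=True)[:5]
print(f"{len(candidates)} candidates in bucket; top-5 approximate neighbours: {top}")
```
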
  • Asking for more than an answer: What do askers expect in online Q&A services?
    Choi, E., Shah, C.
    Journal of Information Science. May 03, 2016

    Q&A services allow one to express an information need in the form of a natural language question and seek information from users of those services. Despite a recent rise in the research related to various issues of online Q&A, there is still a lack of consideration for how the situational context behind asking a question affects quality judgements. By focusing on users’ expectations when asking a question, the work reported here builds on a framework of understanding how people assess information. Mixed method analysis – employing sequentially the Internet-based survey, diary and interviews – was used in a study to investigate this issue. A total of 226 online Q&A users participated in the study, and it was found that looking for quick responses, looking for additional or alternative information, and looking for accurate or complete information were the primary expectations of the askers. Findings can help identify why and how users engage in information seeking within an online Q&A context, and may help develop more comprehensive personalised approaches to deriving information relevance and satisfaction that include user expectations.

    May 03, 2016   doi: 10.1177/0165551516645530   open full text
  • A user-oriented semantic annotation approach to knowledge acquisition and conversion.
    Hao, T., Zhu, C., Mu, Y., Liu, G.
    Journal of Information Science. April 29, 2016

    Semantic annotation on natural language texts labels the meaning of an annotated element in specific contexts, and thus is an essential procedure for domain knowledge acquisition. An extensible and coherent annotation method is crucial for knowledge engineers to reduce human efforts to keep annotations consistent. This article proposes a comprehensive semantic annotation approach supported by a user-oriented markup language named UOML to enhance annotation efficiency with the aim of building a high quality knowledge base. UOML is operable by human annotators and convertible to formal knowledge representation languages. A pattern-based annotation conversion method named PAC is further proposed for knowledge exchange by utilizing automatic pattern learning. We designed and implemented a semantic annotation platform Annotation Assistant to test the effectiveness of the approach. By applying this platform in a long-term international research project for more than three years aiming at high quality knowledge acquisition from a classical Chinese poetry corpus containing 52,621 Chinese characters, we effectively acquired 150,624 qualified annotations. Our test shows that the approach has improved operational efficiency by 56.8%, on average, compared with text-based manual annotation. By using UOML, PAC achieved a conversion error ratio of 0.2% on average, significantly improving the annotation consistency compared with baseline annotations. The results indicate the approach is feasible for practical use in knowledge acquisition and conversion.

    April 29, 2016   doi: 10.1177/0165551516642688   open full text
  • Identification of multi-spreader users in social networks for viral marketing.
    Sheikhahmadi, A., Nematbakhsh, M. A.
    Journal of Information Science. April 29, 2016

    Identifying nodes with high spreading power is an interesting problem in social networks. Finding super spreader nodes becomes an arduous task when the nodes appear in large numbers and the number of links among them becomes enormous. One of the methods used for identifying such nodes is to rank them based on k-shell decomposition. Nevertheless, one of the disadvantages of this method is that it assigns the same rank to all the nodes of a shell. Another disadvantage is that only a single indicator is used to rank the nodes. k-Shell is an approach for ranking individual spreaders, yet it is not efficient enough when a group of nodes with maximum spreading needs to be selected; therefore, this method alone is not sufficient. Accordingly, in this study a hybrid method is presented to identify super spreaders based on the k-shell measure. Afterwards, a suitable method is presented to select a group of superior nodes in order to maximize the spread of influence. Experimental results on seven complex networks show that our proposed methods outperform other well-known measures and identify the super spreader nodes comparatively more accurately.

    April 29, 2016   doi: 10.1177/0165551516644171   open full text
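
    The k-shell starting point discussed in the abstract corresponds to NetworkX's core_number; a short sketch that extracts the innermost shell as a first candidate set of spreaders (the article's hybrid measure goes beyond this) follows.

```python
# NetworkX sketch: k-shell (core number) decomposition and the innermost shell.
import networkx as nx

G = nx.karate_club_graph()
shells = nx.core_number(G)                       # node -> k-shell index

k_max = max(shells.values())
top_shell = [n for n, k in shells.items() if k == k_max]
print(f"k_max = {k_max}, nodes in the innermost shell: {sorted(top_shell)}")
```
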
  • Investigating the precision of Web image search engines for popular and less popular entities.
    Uyar, A., Karapinar, R.
    Journal of Information Science. April 27, 2016

    Image search is the second most frequently used search service on the Web. However, there are very few studies investigating any aspect of it. In this study, we investigate the precision of the Google and Bing Web image search engines for popular and less popular entities using text-based queries. Furthermore, we investigate four additional aspects of Web image search engines that have not been studied before. We used 60 different queries in total from three different domains for the popular and less popular categories. We examined the relevancy of the top 100 images for each query. Our results indicate that image search is a solved problem for popular entities: the engines deliver 97% precision on average for popular entities. However, precision values are much lower for less popular entities. For the top 100 results, average precision is 48% for Google and 33% for Bing. The most important problem seems to be the worst cases, in which precision can be less than 10%. The results show that significant improvement is needed to better identify relevant images for less popular entities. One of the main issues is the association problem: when a Web page contains the query words and multiple images, both Google and Bing have difficulty determining which images are relevant.

    April 27, 2016   doi: 10.1177/0165551516642929   open full text
  • A language-model-based approach for subjectivity detection.
    Karimi, S., Shakery, A.
    Journal of Information Science. April 26, 2016

    The rapid growth of opinionated text on the Web increases the demand for efficient methods of detecting subjective texts. In this paper, a subjectivity detection method is proposed that utilizes a language-model-based structure to define a subjectivity score for each document, such that the topic relevance of documents does not affect the subjectivity scores. In order to overcome the limited content of short documents, we further propose an expansion method to better estimate the language models. Since the lack of linguistic resources in resource-lean languages like Persian makes subjectivity detection difficult in these languages, the method is proposed in two versions: a semi-supervised version for resource-lean languages and a supervised version. Experimental evaluations on five datasets in two languages, English and Persian, demonstrate that the method performs well in distinguishing subjective documents from objective ones in both languages.

    April 26, 2016   doi: 10.1177/0165551516641818   open full text
  • Modelling multi-topic information propagation in online social networks based on resource competition.
    Sun, L., Zhou, Y., Guan, X.
    Journal of Information Science. April 25, 2016

    Understanding information propagation in online social networks is important in many practical applications and is of great interest to many researchers. The challenge with existing propagation models lies in their requirements for a complete network structure and topic-dependent model parameters, and in the assumption that topics spread in isolation. In this paper, we study the characteristics of multi-topic information propagation based on data collected from Sina Weibo, one of the most popular microblogging services in China. We find that the daily total amount of user resources is finite and that users’ attention transfers from one topic to another. This provides evidence of competition between multiple dynamic topics. Based on these empirical observations, we develop a competition-based multi-topic information propagation model that does not require the social network structure. The model is built on general mechanisms of resource competition, i.e. attracting and distracting users’ attention, and considers the interactions of multiple topics. Simulation results show that the model can effectively produce topics with temporal popularity similar to the real data. The impact of model parameters is also analysed. It is found that the topic arrival rate reflects the strength of competition, and that topic fitness is significant in modelling small-scale topic propagation.

    April 25, 2016   doi: 10.1177/0165551516642928   open full text
  • Proposal reviewer recommendation system based on big data for a national research management institute.
    Shon, H. S., Han, S. H., Kim, K. A., Cha, E. J., Ryu, K. H.
    Journal of Information Science. April 25, 2016

    National research management organizations need to ensure that research proposals are reviewed fairly and efficiently, which requires the selection of suitable reviewers. In particular, reviewing research proposals in a given area necessitates selecting the group best qualified to recommend an expert in that area. In this study, we develop an automatic matching system that matches a research proposal with the reviewer who can evaluate it most effectively, using keywords with fuzzy weights based on databases in the corresponding field of research. All functions that we developed were based on the Hadoop MapReduce framework, which was verified to enhance matching performance and ensure expandability. This enabled us to select suitable researchers from existing research project, paper and reviewer databases. Our system can influence the operation of the national research management system and contribute to academic development.

    April 25, 2016   doi: 10.1177/0165551516644168   open full text
  • A search index-enhanced feature model for news recommendation.
    Chen, K., Ji, X., Wang, H.
    Journal of Information Science. April 19, 2016

    General news recommendations are important but have received limited attention because of the difficulties of measuring public interest. In public search engines, the objects of search terms reflect the issues that interest or concern search engine users. Because of the popularity of search engines, search indexes have become a new measure for describing public interest trends. With the help of a public search index provided by search engines, we construct a news topic search feature and a news object search feature. These features measure the public attention on key elements of the news. In the experiment, we compare various feature models with machine learning algorithms with respect to financial news recommendations. The results demonstrate that the topic search features perform best compared with other feature models. This research contributes to both the feature generation and news recommendation domains.

    April 19, 2016   doi: 10.1177/0165551516639801   open full text
  • How well does Google work with Persian documents?
    Sadeghi, M., Vegas, J.
    Journal of Information Science. March 23, 2016

    The performance evaluation of an information retrieval system is a decisive aspect of measuring improvements in search technology. The Google search engine, as a tool for retrieving information on the Web, is used by almost 92% of Iranian users. The purpose of this paper is to study Google’s performance in retrieving relevant information from Persian documents. The information retrieval effectiveness is based on precision measures of search results against a website that we built from the documents of a standard TREC corpus. We queried Google for 100 topics available in the corpus and compared the retrieved webpages with the relevant documents. The obtained results indicate that the morphological analysis of the Persian language is not fully taken into account by the Google search engine. Incorrect text tokenisation, treating stop words as content keywords of a document and the wrong word ‘variants encountered’ found by Google are the main reasons affecting the relevance of Persian information retrieval on the Web for this search engine.

    March 23, 2016   doi: 10.1177/0165551516640437   open full text
  • An improved ant algorithm with LDA-based representation for text document clustering.
    Onan, A., Bulut, H., Korukoglu, S.
    Journal of Information Science. March 22, 2016

    Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Moreover, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to initial values. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.

    March 22, 2016   doi: 10.1177/0165551516638784   open full text
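
    The representation step described in the entry above is standard enough to sketch: documents are mapped to LDA topic distributions and those compact vectors are clustered. In the sketch below (scikit-learn, toy documents) a conventional K-means clusterer stands in for the paper's improved ant-based algorithm, which is not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

docs = [
    "stocks fell as markets reacted to rate decisions",
    "the team won the championship after extra time",
    "central bank raises interest rates again",
    "star striker scores twice in the final match",
]

# Bag-of-words counts, then a compact LDA topic-distribution representation.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # shape: (n_docs, n_topics)

# A conventional clusterer stands in here for the paper's ant-based algorithm.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_topics)
print(labels)
```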
  • Confluence of social network, social question and answering community, and user reputation model for information seeking and experts generation.
    Alam, A., Khusro, S., Ullah, I., Karim, M. S.
    Journal of Information Science. March 11, 2016

    Social question and answering (Q&A) is one of the most effective approaches to knowledge acquisition through information seeking and collaboration. Most modern social Q&A systems use a static points-based user reputation model, which has the effect of diminishing the value of experts. In order to overcome this issue, we have developed a dynamic points-based user reputation model that takes user ratings and social network analysis as input. The impact weights of each relation and of user ratings are not static but depend on the current levels of the asker and answerer and on the difficulty level of the question. We propose a novel social Q&A platform that is the confluence of different features of social networks, social Q&A and the dynamic points-based user reputation model. The beta version of the system was evaluated by conducting a clinical study for 4 months in different academic environments. The results show that the proposed social Q&A system outperforms the available static points-based social Q&A systems in representing actual user reputation, with increased user satisfaction.

    March 11, 2016   doi: 10.1177/0165551516637322   open full text
  • Impression management through people tagging in the enterprise: Implications for social media sampling and design.
    Raban, D. R., Danan, A., Ronen, I., Guy, I.
    Journal of Information Science. March 03, 2016

    People tagging allows a person to tag themselves or others; it is reciprocal and therefore has social implications. The main uses of corporate people tagging systems are building internal social networks, solving problems and seeking expertise. We explored the statistical and terminological relation between self-presentation and perception by others, as reflected by the use of tags in a people tagging system within a large enterprise.

    Owing to the power-law distribution of the data, two different samples were analyzed. Using content analysis, we found that when there are few self or social tags, users prefer tags from the Environment and Technology categories, providing tags that tend to be objective or factual. When tagging approaches saturation, it becomes more subjective and social, drawing on tags from the Individual category. Self-tags tend to be more factual, describing technology expertise, while social tags augment the individual tags by adding a personal dimension. The more people tag and get tagged, the more terminological overlap develops. We conclude by providing practical advice on how to create a sustainable system by balancing originality and duplication using interactivity and feedback.

    March 03, 2016   doi: 10.1177/0165551516636305   open full text
  • Seeking information in circles: The application of Chatman's life in the round theory to the information small world of Catholic clergy in northern Nigeria.
    Dankasa, J.
    Journal of Information Science. February 22, 2016

    This study explores Chatman’s proposition of the theory of life in the round that members of a small world who live in the round will not cross the boundaries of their world to seek information. The study tests Chatman’s proposition to find out whether it is applicable to the special population of Catholic clergy. The study was conducted with Catholic clergy from northern Nigeria. Findings show that these clergy are not likely to cross boundaries of their small worlds to seek information about their ministry or private lives. They prefer to seek such information within their circle of clergy. The findings align with Chatman’s conclusion that life lived in the round has a negative influence on information seeking. This study advances the understanding of Chatman’s theory of life in the round and positions religious status as a factor that is capable of influencing the information-seeking process.

    February 22, 2016   doi: 10.1177/0165551516632659   open full text
  • Modelling trust networks using resistive circuits for trust-aware recommender systems.
    Hosseinzadeh Aghdam, M., Analoui, M., Kabiri, P.
    Journal of Information Science. February 16, 2016

    Recommender systems have been widely used for predicting unknown ratings. Collaborative filtering, as a recommendation technique, uses known ratings to predict user preferences in item selection. However, current collaborative filtering methods cannot distinguish malicious users from unknown users. They also have serious drawbacks in generating ratings for cold-start users. Trust networks among users of recommender systems have proved beneficial in improving the quality and number of predictions. This paper proposes an improved trust-aware recommender system that uses resistive circuits for trust inference. The method uses trust information to produce personalized recommendations. The results of evaluating the proposed method on the Epinions dataset show that it can significantly improve the accuracy of recommender systems without reducing their coverage.

    February 16, 2016   doi: 10.1177/0165551516628733   open full text
  • Linking and using social media data for enhancing public health analytics.
    Ji, X., Chun, S. A., Cappellari, P., Geller, J.
    Journal of Information Science. February 12, 2016

    There is a large amount of health information available for any patient to address his/her health concerns. The freely available health datasets include community health data at the national, state, and community level, readily accessible and downloadable. These datasets can help to assess and improve healthcare performance, as well as help to modify health-related policies. There are also patient-generated datasets, accessible through social media, on the conditions, treatments, or side effects that individual patients experience. Clinicians and healthcare providers may benefit from being aware of national health trends and individual healthcare experiences that are relevant to their current patients. The available open health datasets vary from structured to highly unstructured. Due to this variability, an information seeker has to spend time visiting many, possibly irrelevant, Websites, and has to select information from each and integrate it into a coherent mental model.

    In this paper, we discuss an approach to integrating these openly available health data sources and presenting them to be easily understandable by physicians, healthcare staff, and patients. Through linked data principles and Semantic Web technologies we construct a generic model that integrates diverse open health data sources. The integration model is then used as the basis for developing a set of analytics as part of a system called ‘Social InfoButtons’, providing awareness of both community and patient health issues as well as healthcare trends that may shed light on a specific patient care situation. The prototype system provides patients, public health officials, and healthcare specialists with a unified view of health-related information from both official scientific sources and social networks, and provides the capability of exploring the current data along multiple dimensions, such as time and geographical location.

    February 12, 2016   doi: 10.1177/0165551515625029   open full text
  • A community-based approach to identify the most influential nodes in social networks.
    Hosseini-Pozveh, M., Zamanifar, K., Naghsh-Nilchi, A. R.
    Journal of Information Science. February 04, 2016

    One of the important issues concerning the spreading process in social networks is influence maximization. This is the problem of identifying the set of the most influential nodes from which to begin the spreading process, based on an information diffusion model of the social network. In this study, two new methods that consider the community structure of social networks and an influence-based closeness centrality measure of the nodes are presented to maximize the spread of influence under the multiplication threshold, minimum threshold and linear threshold information diffusion models. The main objective is to improve efficiency with respect to run time while maintaining the accuracy of the final influence spread. Efficiency improvement is obtained by reducing the number of candidate nodes that must be evaluated in order to find the most influential ones. Experiments consist of two parts: first, the effectiveness of the proposed influence-based closeness centrality measure is established by comparing it with available centrality measures; second, evaluations are conducted to compare the two proposed community-based methods with well-known benchmarks from the literature on real datasets, with the results demonstrating the efficiency and effectiveness of these methods in maximizing the influence spread in social networks.

    February 04, 2016   doi: 10.1177/0165551515621005   open full text
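
    The general shape of the community-based candidate reduction described above can be illustrated with off-the-shelf tools. The sketch below uses standard closeness centrality and a standard community detection routine from networkx; the paper's influence-based closeness measure and its diffusion-model evaluation are not reproduced.

```python
import networkx as nx

# Toy weighted network; the paper evaluates on real social network datasets.
G = nx.les_miserables_graph()

# Standard closeness centrality; the paper defines an influence-based
# variant, which is not reproduced here.
closeness = nx.closeness_centrality(G)

# Community structure is used to restrict the candidate seed nodes.
communities = nx.algorithms.community.greedy_modularity_communities(G)

# Take the best node of each community as a candidate influential seed.
candidates = [max(c, key=closeness.get) for c in communities]
print(candidates)
```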
  • The impact of indexing approaches on Arabic text classification.
    Al-Badarneh, A., Al-Shawakfa, E., Bani-Ismail, B., Al-Rababah, K., Shatnawi, S.
    Journal of Information Science. February 01, 2016

    This paper investigates the impact of using different indexing approaches (full-word, stem and root) when classifying Arabic text. In this study, the naïve Bayes classifier is used to construct multinomial classification models, which are evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). It also uses a corpus that consists of 1000 normalized Arabic documents. The results of one experiment in this study show that significant accuracy improvements occur when the full-word form is used in most k-folds. Further experiments show that the classifier achieves its highest accuracy in the eight-fold setting, with a 7/8–1/8 train–test ratio, regardless of the indexing approach used. The overall results show that the classifier achieves a maximum micro-average accuracy of 99.36% using either the full-word form or the stem form. This indicates that the stem is the better choice for classifying Arabic text, because it makes the corpus smaller, which improves both processing time and storage utilization while achieving the highest level of accuracy.

    February 01, 2016   doi: 10.1177/0165551515625030   open full text
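
    The experimental protocol in the entry above (multinomial naïve Bayes evaluated with stratified k-fold cross-validation for k = 2 to 10) maps directly onto common tooling. A minimal sketch, with short placeholder texts standing in for the 1000 indexed Arabic documents:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder documents; the study uses 1000 normalized Arabic documents
# indexed as full words, stems or roots.
docs = (["الاقتصاد والسوق والأسهم ترتفع اليوم"] * 20 +
        ["فريق كرة القدم يفوز في المباراة"] * 20)
labels = [0] * 20 + [1] * 20

model = make_pipeline(CountVectorizer(), MultinomialNB())
for k in range(2, 11):                     # stratified k-fold, k = 2..10
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, docs, labels, cv=folds)
    print(k, round(scores.mean(), 3))
```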
  • Authentic versus fictitious online reviews: A textual analysis across luxury, budget, and mid-range hotels.
    Banerjee, S., Chua, A. Y. K.
    Journal of Information Science. February 01, 2016

    Extant literature suggests that authentic and fictitious online reviews can be distinguished by leveraging their textual characteristics. However, nuances in the textual differences between authentic and fictitious reviews across different categories of hotels remain largely unknown. Therefore, this paper analyzes textual differences between authentic and fictitious reviews across three hotel categories, namely luxury, budget and mid-range. It leverages four textual characteristics – comprehensibility, specificity, exaggeration and negligence – that could offer clues to ascertain review authenticity. Using a dataset of 1800 reviews (900 authentic + 900 fictitious), the results suggest that differences between authentic and fictitious reviews are largely inconsistent across hotel categories. This generally points to the difficulties in ascertaining review authenticity, which in turn offers implications for both research and practice.

    February 01, 2016   doi: 10.1177/0165551515625027   open full text
  • A flexible aggregation framework on large-scale heterogeneous information networks.
    Yin, D., Gao, H.
    Journal of Information Science. February 01, 2016

    OLAP (On-line Analytical Processing) can provide users with aggregate results from different perspectives and granularities. With the advent of heterogeneous information networks that consist of multi-type, interconnected nodes, such as bibliographic networks and knowledge graphs, it is important to study flexible aggregation in such networks. Existing work limits aggregation results to a single node type and cannot be applied to aggregation over multi-type nodes and relations in large-scale heterogeneous information networks. In this paper, we investigate the flexible aggregation problem on large-scale heterogeneous information networks, defined over multi-type nodes and relations. Moreover, by considering both attributes and structures, we propose a novel function based on graph entropy to measure the similarity of nodes. Further, we prove that the aggregation problem based on this function is NP-hard. Therefore, we develop an efficient heuristic algorithm that performs aggregation in two phases: informational aggregation and structural aggregation. The algorithm has linear time and space complexity. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed algorithm.

    February 01, 2016   doi: 10.1177/0165551516630237   open full text
  • Evaluating open access journals using Semantic Web technologies and scorecards.
    Hallo, M., Lujan-Mora, S., Mate, A.
    Journal of Information Science. January 13, 2016

    This paper describes a process to develop and publish a scorecard from an OAJ (Open Access Journal) on the Semantic Web using Linked Data technologies in such a way that it can be linked to related datasets. Furthermore, methodological guidelines are presented with activities related to each step of the process. The proposed process was applied to a university OAJ, including the definition of the KPIs (Key Performance Indicators) linked to the institutional strategies, the extraction, cleaning and loading of data from the data sources into a data mart, the transformation of data into RDF (Resource Description Framework), and the publication of data by means of a SPARQL endpoint using the Virtuoso software. Additionally, the RDF data cube vocabulary has been used to publish the multidimensional data on the Web. The visualization was made using CubeViz, a faceted browser to present the KPIs in interactive charts.

    January 13, 2016   doi: 10.1177/0165551515624353   open full text
  • Time sensitive blog retrieval using temporal properties of queries.
    Zahedi, M., Aleahmad, A., Rahgozar, M., Oroumchian, F., Bozorgi, A.
    Journal of Information Science. December 23, 2015

    Blogs are one of the main forms of user-generated content on the web and are growing rapidly in number. The characteristics of blogs require the development of specialized search methods tuned for the blogosphere. In this paper, we focus on blog retrieval, which aims at ranking blogs with respect to their recurrent relevance to a user’s topic. Although different blog retrieval algorithms have already been proposed, few of them consider the temporal properties of the input queries. Therefore, we propose an efficient approach to improving blog retrieval using the temporal properties of queries. First, the time sensitivity of each query is automatically computed for different time intervals based on an initially retrieved set of relevant posts. Then a temporal score is calculated for each blog, and finally all blogs are ranked based on their temporal and content relevance with regard to the input query. Experimental analysis and comparison of the proposed method are carried out using a standard dataset with 45 diverse queries. Our experimental results demonstrate that, under different measurement criteria, the proposed method outperforms other blog retrieval methods.

    December 23, 2015   doi: 10.1177/0165551515618589   open full text
  • TTC-3600: A new benchmark dataset for Turkish text categorization.
    Kılınc, D., Özcift, A., Bozyigit, F., Yıldırım, P., Yücalar, F., Borandag, E.
    Journal of Information Science. December 23, 2015

    Owing to the rapid growth of the World Wide Web, the number of documents that can be accessed via the Internet increases explosively with each passing day. Considering news portals in particular, documents related to categories such as technology, sports and politics sometimes appear in the wrong category or are placed in a generic category called ‘others’. At this point, text categorization (TC), which is generally addressed as a supervised learning task, is needed. Although a substantial number of studies have been conducted on TC in other languages, the number of studies conducted in Turkish is very limited owing to the lack of accessible and usable datasets. In this paper, a new dataset named TTC-3600, which can be widely used in studies of TC of Turkish news and articles, is created. TTC-3600 is a well-documented dataset and its file formats are compatible with well-known text mining tools. Five widely used classifiers within the field of TC and two feature selection methods are evaluated on TTC-3600. The experimental results indicate that the best accuracy value of 91.03% is obtained with the combination of the Random Forest classifier and an attribute ranking-based feature selection method, in all comparisons performed after the pre-processing and feature selection steps. The publicly available TTC-3600 dataset and the experimental results of this study can be utilized in comparative experiments by other researchers.

    December 23, 2015   doi: 10.1177/0165551515620551   open full text
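
    The best-performing combination reported above, a Random Forest classifier after attribute ranking-based feature selection, can be outlined with a standard pipeline. In the sketch below, chi-squared ranking is used as a stand-in for the paper's attribute ranking method, and the placeholder snippets stand in for the TTC-3600 documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Placeholder Turkish news snippets; TTC-3600 itself would be loaded here.
docs = ["takım maçı kazandı", "borsa bugün yükseldi",
        "yeni telefon tanıtıldı", "seçim sonuçları açıklandı"] * 5
labels = [0, 1, 2, 3] * 5

# Attribute-ranking feature selection (chi-squared as a stand-in) followed
# by a Random Forest classifier, mirroring the best combination above.
model = make_pipeline(TfidfVectorizer(),
                      SelectKBest(chi2, k=5),
                      RandomForestClassifier(random_state=0))
model.fit(docs, labels)
print(model.predict(["takım maçı kazandı"]))
```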
  • Topic modelling for qualitative studies.
    Nikolenko, S. I., Koltcov, S., Koltsova, O.
    Journal of Information Science. December 11, 2015

    Qualitative studies, such as sociological research, opinion analysis and media studies, can benefit greatly from automated topic mining provided by topic models such as latent Dirichlet allocation (LDA). However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along the way to using topic models in qualitative studies: the lack of a good quality metric that closely matches human judgement in understanding topics, and the need to indicate specific subtopics that a given qualitative study may be most interested in mining. For the first problem, we propose a new quality metric, tf-idf coherence, that reflects human judgement more accurately than regular coherence, and conduct an experiment to verify this claim. For the second problem, we propose an interval semi-supervised approach (ISLDA) in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. Our experiments show that ISLDA is better for topic extraction than LDA in terms of tf-idf coherence, the number of topics matched to predefined keywords and topic stability. We also present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.

    December 11, 2015   doi: 10.1177/0165551515617393   open full text
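
    The exact definition of the tf-idf coherence proposed above is given in the paper; the sketch below only illustrates the general idea of a UMass-style coherence score in which raw document co-occurrence counts are replaced by accumulated tf-idf weights. The pairwise aggregation used here (a minimum over the two words' weights) is an assumption, not the authors' formula.

```python
import math

def tfidf_coherence(top_words, docs_tfidf):
    """UMass-style coherence with document counts replaced by accumulated
    tf-idf weights; the pairwise aggregation is an assumption."""
    score = 0.0
    for i, w_i in enumerate(top_words):
        for w_j in top_words[:i]:
            co = sum(min(d.get(w_i, 0.0), d.get(w_j, 0.0)) for d in docs_tfidf)
            occ = sum(d.get(w_j, 0.0) for d in docs_tfidf)
            score += math.log((co + 1e-12) / (occ + 1e-12))
    return score

# One dict per document mapping word -> tf-idf weight (toy values).
docs_tfidf = [{"ethnic": 0.4, "discourse": 0.3, "media": 0.1},
              {"ethnic": 0.2, "media": 0.5}]
print(tfidf_coherence(["ethnic", "discourse", "media"], docs_tfidf))
```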
  • A qualitative investigation of users' discovery, access, and organization of video games as information objects.
    Lee, J. H., Clarke, R. I., Rossi, S.
    Journal of Information Science. December 11, 2015

    Video games are popular consumer products as well as research subjects, yet little is known about how players and other stakeholders find video games and what information they need to select, acquire and play them. With the aim of better understanding people’s game-related information needs and behaviour, we conducted 56 semi-structured interviews with users who find, play, purchase, collect and recommend video games. Participants included gamers, parents, collectors, industry professionals, librarians, educators and scholars. From this user data, we derive and discuss key design implications for video game information systems: designing for target user populations, enabling recommendations based on appeals, offering multiple automatic organization options and providing relationship-based, user-generated, subject and visual metadata. We anticipate this work will contribute to building future video game information systems with new and improved access to games.

    December 11, 2015   doi: 10.1177/0165551515618594   open full text
  • SMS spam filtering and thread identification using bi-level text classification and clustering techniques.
    Nagwani, N. K., Sharaff, A.
    Journal of Information Science. December 03, 2015

    SMS spam detection is an important task in which spam SMS messages are identified and filtered. As greater numbers of SMS messages are communicated every day, it is very difficult for a user to remember and relate newly received SMS messages to those received previously. SMS threads provide a solution to this problem. In this work, the problem of SMS spam detection and thread identification is discussed and a state-of-the-art clustering-based algorithm is presented. The work is organised in two stages. In the first stage, a binary classification technique is applied to categorize SMS messages into two categories, namely spam and non-spam; in the second stage, SMS clusters are created for the non-spam messages using non-negative matrix factorization and K-means clustering techniques. A threading-based similarity feature, namely the time between consecutive communications, is described for the identification of SMS threads, and the impact of the time threshold on thread identification is also analysed experimentally. Performance parameters such as accuracy, precision, recall and F-measure are also evaluated. The SMS threads identified in this work can be used in applications such as SMS thread summarization, SMS folder classification and other SMS management tasks.

    December 03, 2015   doi: 10.1177/0165551515616310   open full text
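
    The two-stage pipeline in the entry above is easy to outline: a binary spam filter first, then clustering of the surviving messages into threads. The sketch below uses naïve Bayes for stage one and non-negative matrix factorization for stage two on placeholder SMS texts; the K-means variant and the time-threshold threading feature are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.decomposition import NMF

# Stage 1: binary spam / non-spam classification (placeholder SMS data).
sms = ["win a free prize now", "call me when you get home",
       "claim your reward today", "meeting moved to 3pm",
       "free entry in a contest", "see you at lunch"]
labels = [1, 0, 1, 0, 1, 0]             # 1 = spam, 0 = non-spam

vec = TfidfVectorizer()
X = vec.fit_transform(sms)
clf = MultinomialNB().fit(X, labels)

# Stage 2: cluster only the messages predicted as non-spam into threads.
non_spam = [s for s, p in zip(sms, clf.predict(X)) if p == 0]
W = NMF(n_components=2, random_state=0).fit_transform(vec.transform(non_spam))
threads = W.argmax(axis=1)              # cluster assignment per message
print(list(zip(non_spam, threads)))
```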
  • ‘An intensity around information’: The changing face of chemical information literacy.
    Bawden, D., Robinson, L.
    Journal of Information Science. December 03, 2015

    The changing nature of chemical information literacy over 50 years is examined by a comparison of a number of guides to chemical literature and information. It is concluded that: an understanding of the world of information is the sole aspect to have remained important and essentially unchanged over time; that knowledge of sources, ability to access information and ability to organize information have been of importance throughout, but have changed their nature dramatically; and that evaluation of information has gained in importance since the advent of the World Wide Web. The link between chemical structure and corresponding substance information is the most significant threshold concept. Information literacy in chemistry is strongly subject-specific.

    December 03, 2015   doi: 10.1177/0165551515616919   open full text
  • Exploring performance of clustering methods on document sentiment analysis.
    Ma, B., Yuan, H., Wu, Y.
    Journal of Information Science. December 03, 2015

    Clustering is a powerful unsupervised tool for sentiment analysis of text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. In contrast to previous studies, we investigate the combined effects of these factors through a series of comprehensive experiments. The experimental results indicate that, first, K-means-type clustering algorithms show clear advantages on balanced review datasets, while performing rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the more recently designed weighting models outperform the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, an adjective and adverb extraction strategy offers clear improvements in clustering performance, while adopting stemming and stopword removal negatively influences sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.

    December 03, 2015   doi: 10.1177/0165551515617374   open full text
  • A semantic-based approach for querying linked data using natural language.
    Paredes-Valverde, M. A., Valencia-Garcia, R., Rodriguez-Garcia, M. A., Colomo-Palacios, R., Alor-Hernandez, G.
    Journal of Information Science. November 19, 2015

    The Semantic Web aims to provide Web information with well-defined meaning, making it understandable not only by humans but also by computers, thus allowing the automation, integration and reuse of high-quality information across different applications. However, current information retrieval mechanisms for semantic knowledge bases are intended to be used only by expert users. In this work, we propose a natural language interface that allows non-expert users to access this kind of information by formulating queries in natural language. The approach uses a domain-independent ontology model to represent a question’s structure and context. This model also allows the answer type expected by the user to be determined, based on a proposed question classification. To prove the effectiveness of our approach, we conducted an evaluation in the music domain using LinkedBrainz, an effort to provide the MusicBrainz information as structured data on the Web by means of Semantic Web technologies. Our proposal obtained encouraging results based on the F-measure metric, ranging from 0.74 to 0.82 for a corpus of questions generated by a group of real-world end users.

    November 19, 2015   doi: 10.1177/0165551515616311   open full text
  • A stream-based method to detect differences between XML documents.
    Jang, B., Park, S., Ha, Y.-g.
    Journal of Information Science. November 05, 2015

    Detecting differences between XML documents is one of the most important research topics for XML. Since XML documents are generally considered to be organized in a tree structure, most previous research has attempted to detect differences using tree-matching algorithms. However, most tree-matching algorithms have inadequate performance owing to limitations in execution time, optimality and scalability. This study proposes a stream-based difference detection method in which an XML binary encoding algorithm is used to provide improved performance relative to previous tree-matching algorithms. A tree-structured analysis of XML is not essential for detecting differences. We use a D-Path algorithm that yields an optimal result quality for difference detection between two streams and has a lower time complexity than tree-based methods. We then modify the existing XML binary encoding method to tokenize the stream, and extend the algorithm to support more operations than the D-Path algorithm does. The experimental results reveal greater efficiency for the proposed method relative to tree-based methods: the execution time is at least four times faster than that of state-of-the-art tree-based methods, and the scalability is much better.

    November 05, 2015   doi: 10.1177/0165551515602805   open full text
  • A feature selection model based on genetic rank aggregation for text sentiment classification.
    Onan, A., Korukoglu, S.
    Journal of Information Science. November 05, 2015

    Sentiment analysis is an important research direction in natural language processing, text mining and web mining, which aims to extract subjective information from source materials. The main challenge encountered in machine learning-based sentiment classification is the abundance of available data, which makes it difficult to train the learning algorithms in a feasible time and degrades the classification accuracy of the built model. Hence, feature selection becomes an essential task in developing robust and efficient classification models whilst reducing the training time. In text mining applications, individual filter-based feature selection methods have been widely utilized owing to their simplicity and relatively high performance. This paper presents an ensemble approach for feature selection, which aggregates several individual feature lists obtained from different feature selection methods so that a more robust and efficient feature subset can be obtained. A genetic algorithm is utilized to aggregate the individual feature lists. Experimental evaluations indicate that the proposed aggregation model is efficient and outperforms individual filter-based feature selection methods on sentiment classification.

    November 05, 2015   doi: 10.1177/0165551515613226   open full text
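
    The core idea above, aggregating several filter-based feature rankings into a single list, can be illustrated without the genetic algorithm. The sketch below uses a simple Borda-count aggregation as a stand-in baseline; the genetic aggregation itself is not reproduced, and the ranking names and features are illustrative, not taken from the article.

```python
from collections import defaultdict

# Feature rankings produced by different filter methods (best first).
rankings = {
    "chi2":      ["great", "awful", "plot", "boring", "acting"],
    "info_gain": ["awful", "great", "boring", "acting", "plot"],
    "gini":      ["great", "boring", "awful", "plot", "acting"],
}

# Borda count: a feature earns more points the higher it is ranked.
scores = defaultdict(int)
for ranked in rankings.values():
    for position, feature in enumerate(ranked):
        scores[feature] += len(ranked) - position

aggregated = sorted(scores, key=scores.get, reverse=True)
print(aggregated)
```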
  • Information as causality: An approach to a general theory of information.
    Luo, T., Pan, Y.
    Journal of Information Science. October 28, 2015

    Although various approaches have been proposed throughout history, information, as one of the most fundamental elements in the world, does not have a general definition or theory that is acceptable to all disciplines. The biggest challenge is the unification of objective and subjective views, because they represent very different characteristics of information which are difficult to integrate into a single framework. We argue that the key to bridging the gap between objective and subjective views of information is a proper understanding of intelligence, because it gives rise to subjective experiences and assigns meaning to things. The purpose of this research is to explore possibilities and implications of applying neuroscience theory in the discussion of information. By incorporating the memory–prediction framework of intelligence developed by Jeff Hawkins, we propose causality to be the general definition of information, and the combination of ‘Physical Representations of Mental Patterns’ and ‘Physical Representations of Physical Patterns’ to be the restricted definition in social contexts. With both general and restricted definitions clarified, we then discuss a few cases of information use and the implications of our approach.

    October 28, 2015   doi: 10.1177/0165551515612662   open full text
  • A hybrid ontology-based information extraction system.
    Gutierrez, F., Dou, D., Fickas, S., Wimalasuriya, D., Zong, H.
    Journal of Information Science. October 26, 2015

    Information Extraction is the process of automatically obtaining knowledge from plain text. Because of the ambiguity of written natural language, Information Extraction is a difficult task. Ontology-based Information Extraction (OBIE) reduces this complexity by including contextual information in the form of a domain ontology. The ontology provides guidance to the extraction process by providing concepts and relationships about the domain. However, OBIE systems have not been widely adopted because of the difficulties in deployment and maintenance. The Ontology-based Components for Information Extraction (OBCIE) architecture has been proposed as a form to encourage the adoption of OBIE by promoting reusability through modularity. In this paper, we propose two orthogonal extensions to OBCIE that allow the construction of hybrid OBIE systems with higher extraction accuracy and a new functionality. The first extension utilizes OBCIE modularity to integrate different types of implementation into one extraction system, producing a more accurate extraction. For each concept or relationship in the ontology, we can select the best implementation for extraction, or we can combine both implementations under an ensemble learning schema. The second extension is a novel ontology-based error detection mechanism. Following a heuristic approach, we can identify sentences that are logically inconsistent with the domain ontology. Because the implementation strategy for the extraction of a concept is independent of the functionality of the extraction, we can design a hybrid OBIE system with concepts utilizing different implementation strategies for extracting correct or incorrect sentences. Our evaluation shows that, in the implementation extension, our proposed method is more accurate in terms of correctness and completeness of the extraction. Moreover, our error detection method can identify incorrect statements with a high accuracy.

    October 26, 2015   doi: 10.1177/0165551515610989   open full text
  • Profiling users with tag networks in diffusion-based personalized recommendation.
    Mao, J., Lu, K., Li, G., Yi, M.
    Journal of Information Science. October 19, 2015

    This study explores new ways of tag-based personalized recommendation by relaxing the assumption that tags assigned by a user occur independently of each other. The new methods profile users using tag co-occurrence networks, upon which link-based node weighting methods (e.g. PageRank and HITS) are applied to refine the weights of tags. A diffusion process is then performed on the tag-item bipartite graph to transform the weights of tags into recommendation scores for items. Experiments on three datasets showed improvements of the proposed method over tag-based collaborative filtering in terms of precision and recall on the datasets with dense user-tag networks, and in terms of inter-diversity on all datasets. In addition, the user-tag network is found to be a necessary instrument for these improvements. The findings of this study contribute to more accurate user profiling and personalized recommendation using network theory and have practical implications for tag-based recommendation systems.

    October 19, 2015   doi: 10.1177/0165551515603321   open full text
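
    The profiling step in the entry above, building a user's tag co-occurrence network and refining tag weights with a link-based method, can be sketched with networkx. The diffusion over the tag-item bipartite graph is not reproduced, and the tags are illustrative.

```python
import networkx as nx

# Tags assigned by one user, grouped by the items they were applied to.
user_tags_per_item = [
    ["python", "tutorial", "programming"],
    ["python", "data", "pandas"],
    ["tutorial", "data"],
]

# Build the user's tag co-occurrence network: tags co-assigned to the
# same item are linked, with edge weights counting co-occurrences.
G = nx.Graph()
for tags in user_tags_per_item:
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)

# Link-based node weighting (PageRank here) refines the tag weights that
# profile the user; these weights would then feed the diffusion step.
profile = nx.pagerank(G, weight="weight")
print(sorted(profile.items(), key=lambda kv: -kv[1]))
```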
  • Arabic tweets sentiment analysis - a hybrid scheme.
    Aldayel, H. K., Azmi, A. M.
    Journal of Information Science. October 19, 2015

    The fact that people freely express their opinions and ideas in no more than 140 characters makes Twitter one of the most prevalent social networking websites in the world. Because Twitter is popular in Saudi Arabia, we believe that tweets are a good source for capturing public sentiment, especially since the country is in a fractious region. After reviewing the challenges and difficulties that Arabic tweets present – using Saudi Arabia as a basis – we propose our solution. A typical problem is the practice of tweeting in dialectal Arabic. Based on our observations, we recommend a hybrid approach that combines semantic orientation and machine learning techniques. In this approach, the lexicon-based classifier labels the training data, a time-consuming task often performed manually. The output of the lexical classifier is then used as training data for the SVM machine learning classifier. The experiments show that our hybrid approach improved the F-measure of the lexical classifier by 5.76% while the accuracy jumped by 16.41%, achieving an overall F-measure and accuracy of 84% and 84.01%, respectively.

    October 19, 2015   doi: 10.1177/0165551515610513   open full text
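
    The hybrid scheme above, a lexicon-based classifier whose labels are used to train an SVM, can be outlined as follows. The tiny lexicon and tweets are illustrative placeholders, not the resources used in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny illustrative sentiment lexicon (the paper builds one suited to
# dialectal Arabic tweets).
positive, negative = {"رائع", "جميل"}, {"سيء", "مزعج"}

def lexicon_label(tweet):
    """Semantic-orientation labelling: 1 = positive, 0 = negative."""
    words = set(tweet.split())
    return 1 if len(words & positive) - len(words & negative) > 0 else 0

tweets = ["الفيلم رائع جدا", "الخدمة سيئة", "يوم جميل", "طقس مزعج"]

# Stage 1: the lexicon-based classifier labels the unlabelled tweets.
weak_labels = [lexicon_label(t) for t in tweets]

# Stage 2: those labels train the SVM classifier, as in the hybrid scheme.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(tweets, weak_labels)
print(model.predict(["فيلم رائع"]))
```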
  • Exploiting semantics for searching agricultural bibliographic data.
    Beneventano, D., Bergamaschi, S., Martoglia, R.
    Journal of Information Science. October 05, 2015

    Filtering and search mechanisms that permit the identification of key bibliographic references are fundamental for researchers. In this paper we propose a fully automatic, semantic method for filtering and searching bibliographic data, which allows users to look for information by specifying simple keyword queries or document queries, i.e. by simply submitting existing documents to the system. The limitations of standard techniques, based either on syntactic text search or on manually assigned descriptors, are overcome by considering the semantics intrinsically associated with the document and query terms; to this end, we exploit different kinds of external knowledge sources (both general and domain-specific dictionaries and thesauri). The proposed techniques have been developed and successfully tested on agricultural bibliographic data, which play a central role in enabling researchers and policy makers to retrieve related agricultural and scientific information by using the AGROVOC thesaurus.

    October 05, 2015   doi: 10.1177/0165551515606579   open full text
  • Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news.
    Kim, E. H.-J., Jeong, Y. K., Kim, Y., Kang, K. Y., Song, M.
    Journal of Information Science. October 05, 2015

    The present study investigates topic coverage and sentiment dynamics of two different media sources, Twitter and news publications, on the hot health issue of Ebola. We conduct content and sentiment analysis by: (1) applying vocabulary control to collected datasets; (2) employing the n-gram LDA topic modeling technique; (3) adopting entity extraction and entity network; and (4) introducing the concept of topic-based sentiment scores. With the query term ‘Ebola’ or ‘Ebola virus’, we collected 16,189 news articles from 1006 different publications and 7,106,297 tweets with the Twitter stream API. The experiments indicate that topic coverage of Twitter is narrower and more blurry than that of the news media. In terms of sentiment dynamics, the life span and variance of sentiment on Twitter is shorter and smaller than in the news. In addition, we observe that news articles focus more on event-related entities such as person, organization and location, whereas Twitter covers more time-oriented entities. Based on the results, we report on the characteristics of Twitter and news media as two distinct news outlets in terms of content coverage and sentiment dynamics.

    October 05, 2015   doi: 10.1177/0165551515608733   open full text
  • When time meets information retrieval: Past proposals, current plans and future trends.
    Moulahi, B., Tamine, L., Yahia, S. B.
    Journal of Information Science. September 30, 2015

    With the advent of Web search and the large amount of data published on the Web, a tremendous number of documents have become strongly time-dependent. In this respect, the time dimension has been extensively exploited as a highly important relevance criterion to improve the retrieval effectiveness of document ranking models. Thus, there is compelling research interest in temporal information retrieval, which has given rise to several temporal search applications. In this article, we provide a thorough overview of time-aware information retrieval models. We specifically focus on the use of timeliness and its impact on the global value of relevance as well as on retrieval effectiveness. First, we motivate the importance of temporal signals, when combined with other relevance features, in accounting for document relevance. Then, we review the relevant studies standing at the crossroads of information retrieval and time according to three common information retrieval aspects: the query level, the document content level and the document ranking model level. We organize the related temporal approaches around specific information retrieval tasks and, for each task at hand, we emphasize the importance of results presentation, and particularly timelines, for the end user. We also report a set of relevant research trends and avenues that can be explored in the future.

    September 30, 2015   doi: 10.1177/0165551515607277   open full text
  • Implications of augmented reality in the management of television audiovisual information.
    Caldera-Serrano, J., Leon-Moreno, J.-A.
    Journal of Information Science. September 30, 2015

    This article analyses the possibilities offered by augmented reality for exploiting the audiovisual collections of television archives, presenting the idea, which has not been developed by any network, of providing viewers with archive material synchronously with the broadcast. By using external devices other than TV sets (tablets, iPads, smartphones), users can access images from the archive that have been used to generate the news item or programme, along with additional elements that may be of interest. Other contents related to the information provided may be similarly accessed, thus facilitating a real conceptual map of the audiovisual contents of any event. Access may be granted free of charge or by paying a fee. Commercial exploitation is achieved in the form of viewer loyalty, by giving access to additional content and providing greater bi-directionality to the communication between viewers and the media.

    September 30, 2015   doi: 10.1177/0165551515608341   open full text
  • Social informatics as a concept: Widening the discourse.
    Smutny, Z.
    Journal of Information Science. September 30, 2015

    This contribution examines the different concepts known as social informatics that have historically been separate. The paradigm that is preferred worldwide (based on Kling) is well described and often promoted, with a strong base both in the USA and Europe. This article, however, introduces lesser-known paradigms (based on Sokolov and later Ursul) that originated in the era of the USSR and have so far been employed chiefly in post-Soviet countries, including Russia. These paradigms have been neglected in English-written scientific literature, mainly because of the limited number of articles available in English. Other approaches are also introduced and related, which were historically named or classified as social informatics (American, British, Norwegian, Slovenian, German and Japanese). The present article introduces and further discusses the origin, historical development and basic methodological grounding of these approaches. All the approaches are then discussed and their differences as well as their similarities are pointed out. The aim is to create connections across the current generation of researchers, which includes the formation and conceptualization of different approaches and an exploration of possible areas for future cooperation.

    September 30, 2015   doi: 10.1177/0165551515608731   open full text
  • Approximate pattern matching with gap constraints.
    Wu, Y., Tang, Z., Jiang, H., Wu, X.
    Journal of Information Science. September 21, 2015

    Pattern matching is a key issue in sequential pattern mining. Many researchers now focus on pattern matching with gap constraints. However, most of these studies involve exact pattern matching problems, a special case of approximate pattern matching and a more challenging task. In this study, we introduce an approximate pattern matching problem with Hamming distance. Its objective is to compute the number of approximate occurrences of pattern P with gap constraints in sequence S under similarity constraint d. We propose an efficient algorithm named Single-rOot Nettree for approximate pattern matchinG with gap constraints (SONG) based on a new non-linear data structure Single-root Nettree to effectively solve the problem. Theoretical analysis and experiments demonstrate an interesting law that the ratio M(P,S,d)/N(P,S,m) approximately follows a binomial distribution, where M(P,S,d) and N(P,S,m) are the numbers of the approximate occurrences whose distances to pattern P are d (0 ≤ d ≤ m) and no more than m (the length of pattern P), respectively. Experimental results for real biological data validate the efficiency and effectiveness of SONG.

    September 21, 2015   doi: 10.1177/0165551515603286   open full text
  • OLFinder: Finding opinion leaders in online social networks.
    Aleahmad, A., Karisani, P., Rahgozar, M., Oroumchian, F.
    Journal of Information Science. September 21, 2015

    Opinion leaders are the influential people who are able to shape the minds and thoughts of other people in their society. Finding opinion leaders is an important task in various domains ranging from marketing to politics. In this paper, a new effective algorithm for finding opinion leaders in a given domain in online social networks is introduced. The proposed algorithm, named OLFinder, detects the main topics of discussion in a given domain, calculates a competency and a popularity score for each user in the given domain, then calculates a probability for being an opinion leader in that domain by using the competency and the popularity scores and finally ranks the users of the social network based on their probability of being an opinion leader. Our experimental results show that OLFinder outperforms other methods based on precision-recall, average precision and P@N measures.

    September 21, 2015   doi: 10.1177/0165551515605217   open full text
  • Applying the semantic web to represent an individual's academic and professional background.
    Teixeira, F., Araujo, G. D., Baptista, R., Araujo, L. V., Pisa, I. T.
    Journal of Information Science. September 21, 2015

    The Lattes Platform is a web-based system that brings together the academic, professional and scientific histories of students, teachers, researchers and other professionals linked to scientific and technological careers. The data are entered by users themselves and are the subject of much research and forecasting in relation to how educational resources are directed in Brazil. In this paper, we report our experience in applying the Linked Data principles to this system. We have also demonstrated the potential of federated queries using data from DBPedia.

    September 21, 2015   doi: 10.1177/0165551515605742   open full text
  • Exploring collaborative work among graduate students through the C5 model of collaboration: A diary study.
    Shah, C., Leeder, C.
    Journal of Information Science. September 11, 2015

    Collaborative work among students, while an important topic of inquiry, needs further treatment as we still lack the knowledge regarding obstacles that students face, the strategies they apply, and the relations among personal and group aspects. This article presents a diary study of 54 master’s students conducting group projects across four semesters. A total of 332 diary entries were analysed using the C5 model of collaboration that incorporates elements of communication, contribution, coordination, cooperation and collaboration. Quantitative and qualitative analyses show how these elements relate to one another for students working on collaborative projects. It was found that face-to-face communication related positively with satisfaction and group dynamics, whereas online chat correlated positively with feedback and closing the gap. Managing scope was perceived to be the most common challenge. The findings suggest the varying affordances and drawbacks of different methods of communication, collaborative work styles and the strategies of group members.

    September 11, 2015   doi: 10.1177/0165551515603322   open full text
  • Topic segmentation using word-level semantic relatedness functions.
    Ercan, G., Cicekli, I.
    Journal of Information Science. September 04, 2015

    Semantic relatedness deals with the problem of measuring how much two words are related to each other. While there is a large body of research on developing new measures, the use of semantic relatedness (SR) measures in topic segmentation has not been explored. In this research the performance of different SR measures is evaluated on the topic segmentation problem. To this end, two topic segmentation algorithms that use the difference in SR of words are introduced. Our results indicate that using an SR measure trained on general domain corpora achieves better results than topic segmentation algorithms using WordNet or simple word repetition. Furthermore, when compared with computationally more complex algorithms performing global analysis, our local analysis, enhanced with general domain lexical semantic information, achieves comparable results.

    September 04, 2015   doi: 10.1177/0165551515602460   open full text
  • QSem: A novel question representation framework for question matching over accumulated question-answer data.
    Hao, T., Qu, Y.
    Journal of Information Science. August 24, 2015

    This paper proposes a novel question representation framework to assist automated question answering through reusing accumulated question–answer data. The framework, named QSem, defines three types of question words – question-target words, user-oriented words and irrelevant words – along with semantic patterns, for representing a question. The question word types are semantically labelled by a pre-defined ontology to enrich the semantic representation of questions. The semantic patterns, through equivalent pattern linking, enhance normal structure matching, aiming to improve question matching performance. We trained QSem on 400 randomly selected questions with semantic patterns and obtained optimized parameters. After that, 5000 questions from our system were tested, and the precision of question matching ranged between 0.71 and 0.93 with respect to various generators, indicating the stability of the approach. We further compared our approach with Cosine similarity, WordNet-based semantic similarity and the IBM translation model on a standard TREC dataset containing 5536 questions. The results showed that our approach achieved the best performance, with mean reciprocal rank increased by 7.2% and accuracy increased by 7.5% on average, demonstrating the effectiveness of the approach.

    August 24, 2015   doi: 10.1177/0165551515602457   open full text
  • A refined twig-join swift query algorithm for diversification issues of XML.
    Kung, Y.-W., Chang, H.-K., Lee, C.-N.
    Journal of Information Science. August 19, 2015

    Compiling documents in extensible markup language (XML) plays an important role in accessing data services. An efficient query service should be based on a skillful representation that can support query diversification and solve ambiguity in order to improve high-precision search capabilities. However, to the best of our knowledge, research on query diversification, target hierarchical level and the problem of ambiguity is insufficient. In this study we aimed to solve these problems so that the results are able not only to satisfy query diversification, but also to offer better precision compared with the existing twig join algorithms. An extended twig join Swift (TJSwift) associated with adjacent linked lists for the provision of efficient XML query services is also proposed, whereby queries can be versatile in terms of predicates. It can completely preserve hierarchical information; in addition, the new index generated from XML is used to save semantic information.

    August 19, 2015   doi: 10.1177/0165551515601004   open full text
  • Incorporating social media comments in affective video retrieval.
    Nemati, S., Naghsh-Nilchi, A. R.
    Journal of Information Science. August 12, 2015

    Affective video retrieval systems aim at finding video contents matching the desires and needs of users. Existing systems typically use the information contained in the video itself to specify its affect category. These systems either extract low-level features or build up higher-level attributes to train classification algorithms. However, using low-level features ignores global relations in data and constructing high-level features is time consuming and problem dependent. To overcome these drawbacks, an external source of information may be helpful. With the explosive growth and availability of social media, users’ comments could be such a valuable source of information. In this study, a new method for incorporating social media comments with the audio-visual contents of videos is proposed. Furthermore, for the combination stage a decision-level fusion method based on the Dempster–Shafer theory of evidence is presented. Experiments are carried out on the video clips of the DEAP (Database for Emotion Analysis using Physiological signals) dataset and their associated users’ comments on YouTube. Results show that the proposed system significantly outperforms the baseline method of using only the audio-visual contents for affective video retrieval.

    August 12, 2015   doi: 10.1177/0165551515593689   open full text
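    A minimal sketch of the decision-level fusion step described above: Dempster's rule of combination applied to the mass functions of two sources. The affect classes and mass values are illustrative placeholders, not the paper's data.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions over frozensets of labels."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                      # mass assigned to disjoint hypotheses
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Illustrative mass functions from the two sources (values are placeholders).
audio_visual = {frozenset({"positive"}): 0.6, frozenset({"negative"}): 0.1,
                frozenset({"positive", "negative"}): 0.3}   # mass on the full set = uncertainty
comments = {frozenset({"positive"}): 0.5, frozenset({"negative"}): 0.2,
            frozenset({"positive", "negative"}): 0.3}

fused = combine(audio_visual, comments)
decision = max(fused, key=fused.get)
print(decision, round(fused[decision], 3))
```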
  • Discovering aspects of online consumer reviews.
    Suleman, K., Vechtomova, O.
    Journal of Information Science. August 12, 2015

    In this paper we propose a fully unsupervised approach for product aspect discovery in on-line consumer reviews. We apply a two-step hierarchical clustering process in which we first cluster words representing aspects based on the semantic similarity of their contexts and then on the similarity of the hypernyms of the cluster members. Our approach also includes a method for assigning class labels to each of the clusters. We evaluated our methods on large datasets of restaurant and camera reviews and found that the two-step clustering process performed better than a single-step clustering process at identifying aspects and words referring to aspects. Finally, we compare our method to a state-of-the-art topic modelling approach by Titov and McDonald, and demonstrate better results on both datasets.

    August 12, 2015   doi: 10.1177/0165551515595742   open full text
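    A minimal sketch of a two-step grouping of aspect words in the spirit of the abstract above (not the authors' exact procedure): step one clusters words by the similarity of their context vectors, step two merges clusters whose members share a hypernym. `context_vectors` (word → vector) and `hypernyms` (word → set of hypernym strings) are assumed inputs, e.g. built from review co-occurrence counts and WordNet.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_aspects(words, context_vectors, hypernyms, n_clusters=10):
    # Step 1: agglomerative clustering on the cosine distance of context vectors.
    X = np.array([context_vectors[w] for w in words])
    Z = linkage(pdist(X, metric="cosine"), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    clusters = {}
    for word, label in zip(words, labels):
        clusters.setdefault(label, set()).add(word)

    # Step 2: merge clusters whose members share at least one hypernym.
    merged = []
    for members in clusters.values():
        hyps = set().union(*(hypernyms.get(w, set()) for w in members))
        for group in merged:
            if group["hypernyms"] & hyps:
                group["words"] |= members
                group["hypernyms"] |= hyps
                break
        else:
            merged.append({"words": set(members), "hypernyms": hyps})
    return [group["words"] for group in merged]
```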
  • Improving the geospatial consistency of digital libraries metadata.
    Renteria-Agualimpia, W., Lopez-Pellicer, F. J., Lacasta, J., Zarazaga-Soria, F. J., Muro-Medrano, P. R.
    Journal of Information Science. August 12, 2015

    Consistency is an essential aspect of the quality of metadata. Inconsistent metadata records are harmful: given a themed query, the set of retrieved metadata records would contain descriptions of unrelated or irrelevant resources, and may even not contain some resources considered obvious. This is even worse when the description of the location is inconsistent. Inconsistent spatial descriptions may yield invisible or hidden geographical resources that cannot be retrieved by means of spatially themed queries. Therefore, ensuring spatial consistency should be a primary goal when reusing, sharing and developing georeferenced digital collections. We present a methodology able to detect geospatial inconsistencies in metadata collections based on the combination of spatial ranking, reverse geocoding, geographic knowledge organization systems and information-retrieval techniques. This methodology has been applied to a collection of metadata records describing maps and atlases belonging to the Library of Congress. The proposed approach was able to automatically identify inconsistent metadata records (870 out of 10,575) and propose fixes to most of them (91.5%). These results support the ability of the proposed methodology to assess the impact of spatial inconsistency in the retrievability and visibility of metadata records and improve their spatial consistency.

    August 12, 2015   doi: 10.1177/0165551515597364   open full text
  • Information encountering on social media and tacit knowledge sharing.
    Panahi, S., Watson, J., Partridge, H.
    Journal of Information Science. August 12, 2015

    The purpose of this paper is to investigate how social media may support information encountering (i.e. where individuals encounter useful and interesting information while seeking or browsing for some other information) and how this may lead to the facilitation of tacit knowledge creation and sharing. The study employed a qualitative survey design that interviewed 24 physicians who were active users of social media to better understand the phenomenon of information encountering on social media. The data was analysed using the thematic analysis approach. The study found six main ways through which social media supports information encountering. Furthermore, drawing upon knowledge creation theories, the study concluded that information encountering on social media facilitates tacit knowledge creation and sharing among individuals. The study provides new directions for further empirical investigations to examine whether information encountering on social media actually leads to tacit knowledge creation and sharing. The findings of the study may also provide opportunities for users to adopt social media effectively or gain greater value from social media use.

    August 12, 2015   doi: 10.1177/0165551515598883   open full text
  • An exploration of search session patterns in an image-based digital library.
    Han, H., Wolfram, D.
    Journal of Information Science. August 12, 2015

    Three months of server transaction logs containing complete clickstream data for an image collection digital library were analysed for usage patterns to better understand user searching and browsing behaviour in this environment. Eleven types of user actions were identified from the log content. The study is novel in its combined analytical techniques and use of clickstream data from an image-based digital library. Three analytical techniques were used to analyse the data: (a) network analysis to better understand the relationship between sequential actions; (b) sequential pattern mining to identify frequent action sequences; and (c) k-means cluster analysis to identify groups of session patterns. The analysis revealed strong ties between several pairs of actions, relatively short pattern sequences that frequently duplicate previous actions and largely uniform session behaviour with little individual item browsing within sessions, indicating users are primarily engaged in purposeful and directed searching. Developers of image-based digital libraries should consider design features that support rapid browsing.

    August 12, 2015   doi: 10.1177/0165551515598952   open full text
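    A minimal sketch of the session clustering step described above: each session is represented as a normalised vector of action frequencies and grouped with k-means. The action names and sessions below are hypothetical; the study itself identified eleven action types from the logs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical subset of action types; the study used eleven actions from the logs.
ACTIONS = ["query", "browse_results", "view_item", "zoom", "download"]

def session_vector(session):
    """session is a list of action names in the order they occurred."""
    counts = np.array([session.count(a) for a in ACTIONS], dtype=float)
    return counts / max(counts.sum(), 1.0)      # normalise by session length

sessions = [["query", "browse_results", "view_item"],
            ["query", "query", "browse_results"],
            ["view_item", "zoom", "download"]]
X = np.array([session_vector(s) for s in sessions])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```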
  • The information environment and information behaviour of the Offshore Installation Manager (OIM) in the context of safety and emergency response: An exploratory study.
    Marcella, R., Lockerbie, H.
    Journal of Information Science. August 12, 2015

    The offshore installation manager (OIM) is a unique role in the oil and gas industry with the legal responsibility for the health and safety of individuals on an offshore installation, as well as holding commercial responsibilities. Using exploratory, qualitative data based on 10 interviews conducted with OIMs, the information environment and behaviour of the OIM are described and areas for further research are explored. The OIM’s information environment is one that is complex and relies heavily on both formal and informal sources of information. Two modes of OIM information behaviour are identified: everyday information need, in which the OIM seeks, uses and shares information to maintain safe operations; and emergency information need, in which there is both reliance on information that must be known in order to react to an emergency situation and a need for accessible information about the status of a rapidly changing environment. The OIM is both the user of information and a source of information for others and as such must be trusted, reliable and authoritative.

    August 12, 2015   doi: 10.1177/0165551515600118   open full text
  • Incorporating social information to perform diverse replier recommendation in question and answer communities.
    Liu, Y., Lin, Z., Zheng, X., Chen, D.
    Journal of Information Science. July 21, 2015

    Social information is contextual information that has made significant contributions to intelligent information systems. However, social information has not been fully used, especially in question and answer (Q&A) systems. This study describes a contextual recommendation method in which diverse repliers are recommended for new questions using incorporated social information in Q&A communities. We have mined multiple kinds of social information by analysing social behaviours and relations found in a Q&A community and proposed an algorithm to incorporate different social information in various social contexts to perform diverse repliers’ recommendations. This study considers recommendation diversity and social contexts, and emphasizes the proper use of social information. We conducted experiments using a dataset collected from the Stack Overflow website. The results demonstrate that different social information makes different contributions in promoting question answering, and incorporating social information properly could improve recommendation diversity and performance, which in turn promotes satisfactory question solving.

    July 21, 2015   doi: 10.1177/0165551515592093   open full text
  • Generating query suggestions by exploiting latent semantics in query logs.
    Momtazi, S., Lindenberg, F.
    Journal of Information Science. July 21, 2015

    Search engines assist users in expressing their information needs more accurately by reformulating the issued queries automatically and suggesting the generated formulations to the users. Many approaches to query suggestion draw on the information stored in query logs, recommending recorded queries that are textually similar to the current user’s query or that frequently co-occurred with it in the past. In this paper, we propose an approach that concentrates on deducing the actual information need from the user’s query. The challenge therein lies not only in processing keyword queries, which are often short and possibly ambiguous, but especially in handling the complexity of natural language that allows users to express the same or similar information needs in various differing ways. We expect a higher-level semantic representation of a user’s query to more accurately reflect the information need than the explicit query terms alone can. To this aim, we employ latent Dirichlet allocation as a probabilistic topic model to reveal latent semantics in the query log. Our evaluations show that, whereas purely topic-based query suggestion performs the worst, the interpolation of our proposed topic-based model with the baseline word-based model that generates suggestions based on matching query terms achieves significant improvements in suggestion quality over the already well performing purely word-based approach.

    July 21, 2015   doi: 10.1177/0165551515594723   open full text
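    A minimal sketch of topic-based query suggestion interpolated with a word-based score, in the spirit of the abstract above but not its exact model or parameters; `query_log` is an assumed list of past queries.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

query_log = ["cheap flights to rome", "rome hotels city centre",
             "python decode json", "parse json in python", "weather rome april"]

vec = CountVectorizer()
X = vec.fit_transform(query_log)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
log_topics = lda.transform(X)                      # topic mixture of each logged query

def suggest(query, k=3, alpha=0.5):
    q_topics = lda.transform(vec.transform([query]))[0]
    topic_sim = log_topics @ q_topics              # similarity in latent topic space
    words = set(query.split())
    word_sim = np.array([len(words & set(q.split())) / max(len(words), 1)
                         for q in query_log])
    score = alpha * topic_sim + (1 - alpha) * word_sim   # interpolate topic- and word-based models
    return [query_log[i] for i in np.argsort(-score)[:k]]

print(suggest("json parsing python"))
```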
  • Summary generation approaches based on semantic analysis for news documents.
    Kogilavani, S. V., Kanimozhiselvi, C. S., Malliga, S.
    Journal of Information Science. July 21, 2015

    With the exponential growth of the internet, a large number of online news reports are produced on the web every day. The news stream flows so rapidly that no one has the time to look at each and every item of information. In this situation, a person would naturally prefer to read updated information at certain time intervals. Document update techniques are very helpful for individuals to acquire new information or knowledge by eliminating out-of-date or redundant information. Existing summarization systems involve identifying the most relevant sentences from the text and putting them together to create a concise initial summary. In the process of identifying the important sentences, features influencing the relevance of sentences are determined. Based on these features the salience of each sentence is calculated and an initial summary is generated from highly important sentences at different compression rates. These types of initial summaries work on a batch of documents and do not consider the documents that may arrive at a later time, so the corresponding summaries need to be updated. The update summarization system addresses this issue by taking into account the documents read by the user in the past and seeks to present only fresh or different information. The first step is to create an initial summary based on basic and additional features. The next step is to create an update summary based on the basic, additional and update features. In this paper, two approaches are proposed for generating initial and update summaries from multiple documents about a given news topic. The first approach performs semantic analysis by modifying the vector space model with dependency parse relations and applying latent semantic analysis on it to create a summary. The second approach applies sentence annotation based on aspects, prepositions and named entities to generate the summary. Experimental results show that the proposed approaches generate better initial and update summaries compared with the existing systems.

    July 21, 2015   doi: 10.1177/0165551515594726   open full text
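    A minimal sketch of the latent semantic analysis component used by the first approach above: sentences are projected into a low-rank space and the most salient ones are selected. The dependency-parse modification of the vector space model and the update-summary features are not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summary(sentences, n_sentences=2, n_components=2):
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    n_components = max(1, min(n_components, X.shape[1] - 1))
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    S = svd.fit_transform(X)                    # sentence coordinates in latent space
    salience = np.linalg.norm(S, axis=1)        # simple salience: magnitude in latent space
    top = sorted(np.argsort(-salience)[:n_sentences])
    return [sentences[i] for i in top]          # keep original sentence order
```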
  • Improving pseudo relevance feedback based query expansion using genetic fuzzy approach and semantic similarity notion.
    Bhatnagar, P., Pareek, N.
    Journal of Information Science. May 19, 2014

    Pseudo relevance feedback-based query expansion is a popular automatic query expansion technique. However, a survey of work done in the area shows that it has a mixed chance of success. This paper captures the limitations of pseudo relevance feedback (PRF)-based query expansion and proposes a method of enhancing its performance by hybridizing corpus-based information with a genetic fuzzy approach and a semantic similarity notion. First, the paper suggests the use of a genetic fuzzy approach to select an optimal combination of query terms from a pool of terms obtained using PRF-based query expansion. The query terms obtained are further ranked on the basis of semantic similarity with the original query terms. The experiments were performed on the CISI collection, a benchmark dataset for information retrieval. It was found that the results were better in terms of both recall and precision. The main observation is that the hybridization of various techniques of query expansion in an intelligent way allows us to incorporate the good features of all of them. As this is a preliminary attempt in this direction, there is a large scope for enhancing these techniques.

    May 19, 2014   doi: 10.1177/0165551514533771   open full text
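    A minimal sketch of the pseudo-relevance feedback step that produces the pool of candidate expansion terms (the genetic fuzzy selection and semantic re-ranking of the paper are not shown): the top-ranked documents for a query are treated as relevant and their highest-weighted terms become expansion candidates.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prf_candidates(query, documents, k_docs=3, n_terms=10):
    vec = TfidfVectorizer(stop_words="english")
    D = vec.fit_transform(documents)
    q = vec.transform([query])
    top_docs = np.argsort(-cosine_similarity(q, D)[0])[:k_docs]   # pseudo-relevant set
    centroid = np.asarray(D[top_docs].mean(axis=0)).ravel()       # Rocchio-style centroid
    terms = np.array(vec.get_feature_names_out())
    candidates = terms[np.argsort(-centroid)[:n_terms]]
    return [t for t in candidates if t not in query.lower().split()]
```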
  • Modalities, motivations, and materials - investigating traditional and social online Q&A services.
    Shah, C., Kitzie, V., Choi, E.
    Journal of Information Science. May 19, 2014

    With the advent of ubiquitous connectivity and a constant flux of user-generated content, people’s online information-seeking behaviours are rapidly changing, one of which includes seeking information from peers through online questioning. Ways to understand this new behaviour can be broken down into three aspects, also referred to as the three M’s – the modalities (sources and strategies) that people use when asking their questions online, their motivations behind asking these questions and choosing specific services, and the types and quality of the materials (content) generated in such an online Q&A environment. This article will provide a new framework – three M’s – based on the synthesis of relevant literature. It will then identify some of the gaps in our knowledge about online Q&A based on this framework. These gaps will be transformed into six research questions, stemming from the three M’s, and addressed by (a) consolidating and synthesizing findings previously reported in the literature, (b) conducting new analyses of data used in prior work, and (c) administering a new study to answer questions unaddressed by the pre-existing and new analyses of prior work.

    May 19, 2014   doi: 10.1177/0165551514534140   open full text
  • Integrating Spanish lexical resources by meta-classifiers for polarity classification.
    Martinez-Camara, E., Martin-Valdivia, M. T., Molina-Gonzalez, M. D., Perea-Ortega, J. M.
    Journal of Information Science. May 19, 2014

    In this paper we focus on unsupervised sentiment analysis in Spanish. The lack of resources for languages other than English, as for example Spanish, adds more complexity to the task. However, we take advantage of some good already existing lexical resources. We have carried out several experiments using different unsupervised approaches in order to compare the different methodologies for solving the problem of the Spanish polarity classification in a corpus of movie reviews. Among all these approaches, perhaps the newest one integrates SentiWordNet with the Multilingual Central Repository to tackle polarity detection directly over the Spanish corpus. However, the results obtained were not as promising as we expected, and so we carried out another group of experiments combining all the methods using meta-classifiers. The results obtained with stacking outperformed the individual experiments and encourage us to continue in this way.

    May 19, 2014   doi: 10.1177/0165551514535710   open full text
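    A minimal sketch of the meta-classifier (stacking) combination mentioned above, using generic text features; the Spanish snippets and labels are placeholders, and the lexicon-based components of the paper are not reproduced.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy review snippets and polarity labels (1 = positive, 0 = negative).
texts = ["una película excelente", "muy aburrida y lenta",
         "me encantó la actuación", "guion horrible y predecible"]
labels = [1, 0, 1, 0]

base = [("nb", MultinomialNB()), ("svm", LinearSVC())]
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(), cv=2)
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)
print(model.predict(["qué gran película"]))
```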
  • Hyperlinks as inter-university collaboration indicators.
    Kenekayoro, P., Buckley, K., Thelwall, M.
    Journal of Information Science. May 13, 2014

    Collaboration is essential for some types of research, and some agencies include collaboration among the requirements for funding research projects. This makes it important to analyse collaborative research ties. Traditional methods to indicate the extent of collaboration between organizations use co-authorship data in citation databases. Publication data from these databases are not publicly available and can be expensive to access and so hyperlink data has been proposed as an alternative. This paper investigates whether using machine learning methods to filter page types can improve the extent to which hyperlink data can be used to indicate the extent of collaboration between universities. Structured information about research projects extracted from UK and EU funding agency websites, co-authored publications and academic links between universities were analysed to identify if there is any association between the number of hyperlinks connecting two universities, with and without machine learning filtering, and the number of publications they co-authored. An increased correlation was found between the number of inlinks to a university’s website and the extent to which it collaborates with other universities when machine learning techniques were used to filter out apparently irrelevant inlinks.

    May 13, 2014   doi: 10.1177/0165551514534141   open full text
  • A study of the effects of preprocessing strategies on sentiment analysis for Arabic text.
    Duwairi, R., El-Orfali, M.
    Journal of Information Science. May 12, 2014

    Sentiment analysis has drawn considerable interest among researchers owing to the realization of its fascinating commercial and business benefits. This paper deals with sentiment analysis in Arabic text from three perspectives. First, several alternatives of text representation were investigated. In particular, the effects of stemming, feature correlation and n-gram models for Arabic text on sentiment analysis were investigated. Second, the behaviour of three classifiers, namely, SVM, Naive Bayes, and K-nearest neighbour classifiers, with sentiment analysis was investigated. Third, the effects of the characteristics of the dataset on sentiment analysis were analysed. To this end, we applied the techniques proposed in this paper to two datasets; one was prepared in-house by the authors and the second one is freely available online. All the experimentation was done using Rapidminer. The results show that our selection of preprocessing strategies on the reviews increases the performance of the classifiers.

    May 12, 2014   doi: 10.1177/0165551514534143   open full text
  • Accurate keyphrase extraction by discriminating overlapping phrases.
    Haddoud, M., Abdeddaim, S.
    Journal of Information Science. April 15, 2014

    In this paper we define the document phrase maximality index (DPM-index), a new measure to discriminate overlapping keyphrase candidates in a text document. As an application we developed a supervised learning system that uses 18 statistical features, among them the DPM-index and five other new features. We experimentally compared our results with those of 21 keyphrase extraction methods on SemEval-2010/Task-5 scientific articles corpus. When all the systems extract 10 keyphrases per document, our method enhances by 13% the F-score of the best system. In particular, the DPM-index feature increases the F-score of our keyphrase extraction system by a rate of 9%. This makes the DPM-index contribution comparable to that of the well-known TFIDF measure on such a system.

    April 15, 2014   doi: 10.1177/0165551514530210   open full text
  • Automatic identification of light stop words for Persian information retrieval systems.
    Sadeghi, M., Vegas, J.
    Journal of Information Science. April 11, 2014

    Stop word identification is one of the most important tasks for many text processing applications such as information retrieval. Stop words occur too frequently in documents in a collection and do not contribute significantly to determining the context or information about the documents. These words are worthless as index terms and should be removed during indexing as well as before querying by an information retrieval system. In this paper, we propose an automatic aggregated methodology based on term frequency, normalized inverse document frequency and information model to extract the light stop words from Persian text. We define a ‘light stop word’ as a stop word that has few letters and is not a compound word. In the Persian language, a complete stop word list can be derived by combining the light stop words. The evaluation results, using a standard corpus, show a good percentage of coincidence between the Persian and English stop words and a significant improvement in the number of index terms. Specifically, the first 32 Persian light stop words have a great impact on the index size reduction and the set of stop words can reduce the number of index terms by about 27%.

    April 11, 2014   doi: 10.1177/0165551514530655   open full text
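    A minimal sketch of ranking light stop-word candidates by combining term frequency with a normalised inverse document frequency, in the spirit of the abstract above; the authors' exact aggregation is not reproduced, and the 'few letters, not compound' criterion is approximated by a simple length check.

```python
import math
from collections import Counter

def light_stopword_candidates(documents, max_len=4, top_n=32):
    tf, df = Counter(), Counter()
    for doc in documents:
        tokens = doc.split()
        tf.update(tokens)
        df.update(set(tokens))                  # document frequency counts each doc once
    n_docs = len(documents)
    total = sum(tf.values())
    scores = {}
    for term, freq in tf.items():
        if len(term) > max_len:                 # keep only short ("light") terms
            continue
        norm_tf = freq / total
        norm_idf = math.log(n_docs / df[term]) / math.log(n_docs + 1)
        scores[term] = norm_tf * (1.0 - norm_idf)   # frequent everywhere -> high score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```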
  • Evaluating collaborative information seeking - synthesis, suggestions, and structure.
    Shah, C.
    Journal of Information Science. April 10, 2014

    Evaluating the performance of collaborative information seeking (CIS) systems and users can be challenging, often more so than individual information-seeking environments. This can be attributed to the complex and dynamic interactions that take place among various users and systems processes in a CIS environment. While some of the aspects of a CIS system or user could be measured by typical assessment techniques from single-user information retrieval/seeking (IR/IS), one often needs to go beyond them to provide a meaningful evaluation, helping to provide not only a sense of performance, but also insights into design decisions (regarding systems) and behavioural trends (regarding users). This article first provides an overview of existing methods and techniques for evaluating CIS (synthesis). It then extracts valuable directives and advice from the literature that inform evaluation choices (suggestions). Finally, the article presents a framework for CIS evaluation with two major parts: system-based and user-based (structure). The proposed framework incorporates various instruments taken from computer and social sciences literature as applicable to CIS evaluations. The lessons from the literature and the framework could serve as important starting points for designing experiments and systems, as well as evaluating system and user performances in CIS and related research areas.

    April 10, 2014   doi: 10.1177/0165551514530651   open full text
  • Refining Kea++ automatic keyphrase assignment.
    Irfan, R., Khan, S., Qamar, A. M., Bloodsworth, P. C.
    Journal of Information Science. March 31, 2014

    Keyphrases facilitate finding the right information in digital sources. Keyphrase assignment is the alignment of documents or text with keyphrases of any standard taxonomy/classification system. Kea++ is an automatic keyphrase assignment tool using a machine learning-based technique. However, it does not effectively exploit the hierarchical relations that exist in its input taxonomy and returns noise in its results. The refinement methodology was designed as a top layer of Kea++ in order to fine tune its results. It was an initial step and focused on a single Computing domain. It was neither validated on multiple domains nor evaluated to determine whether the improvement in the results is significant or not. The aim of this task was to solidify the refinement methodology. The main contributions of this work are (a) to extend the methodology for multiple domains and (b) to statistically verify that the improvement in the Kea++ results is significant.

    March 31, 2014   doi: 10.1177/0165551514529054   open full text
  • Automatic image annotation using affective vocabularies: Attribute-based learning approach.
    Jeong, J.-W., Lee, D.-H.
    Journal of Information Science. March 24, 2014

    To improve image search results, understanding and exploiting the subjective aspects of an image is critical. However, how to effectively extract these subjective aspects (e.g. feeling, emotion, and so on) from an image is a challenging problem. In this paper, we propose a novel approach for predicting affective aspects, one of the most interesting subjective aspects, of concepts in images by learning the semantic attributes of the concept and mining the association between the attributes and affective aspects. The main idea of the proposed approach comes from the assumption that semantic attributes of a concept will influence the user’s affect towards the concept (e.g. an animal with the semantic attributes ‘small’, ‘furry’, ‘white’ can be associated with the affective term ‘cute’). Based on this assumption, we build a multi-layer affect learning framework that consists of (1) an attribute learning layer that predicts semantic attributes of a concept and (2) an affect learning layer that exploits the outputs from the attribute learning layer for predicting the affective aspects of the concept. Through the experimental results on the Animals with Attributes dataset, we show that the proposed approach outperforms traditional approaches by up to 25% in terms of precision and successfully predicts the affect of concepts in images according to different user preferences.

    March 24, 2014   doi: 10.1177/0165551513501267   open full text
  • Extracting the roots of Arabic Words without removing affixes.
    Yaseen, Q., Hmeidi, I.
    Journal of Information Science. March 10, 2014

    Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to dealing with this issue by introducing a new algorithm for extracting Arabic words’ roots. The proposed algorithm, which is called the Word Substring Stemming Algorithm, does not remove affixes during the extraction process. Rather, it is based on producing the set of all substrings of an Arabic word, and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from substrings. The experiments have shown that the proposed approach is competitive and its accuracy is 83.9%. Furthermore, its accuracy can be further improved, given that, for about 9.9% of the tested words, the WSS algorithm retrieves two candidates (in most cases) for the correct root.

    March 10, 2014   doi: 10.1177/0165551514526348   open full text
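    A minimal sketch of root extraction by substring generation rather than affix removal, following the general idea above: every substring of the word is checked against a set of known roots and the longest match is preferred. The roots file, patterns file and rule set of the WSS algorithm are replaced here by a tiny illustrative root set.

```python
def substrings(word, min_len=3):
    """All contiguous substrings of the word with at least min_len characters."""
    return {word[i:j] for i in range(len(word))
            for j in range(i + min_len, len(word) + 1)}

def extract_root(word, known_roots):
    candidates = substrings(word) & known_roots
    return max(candidates, key=len) if candidates else None

known_roots = {"كتب", "درس", "علم"}          # illustrative roots only
print(extract_root("المكتبة", known_roots))  # -> كتب
```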
  • Extracting term units and fact units from existing databases using the Knowledge Discovery Metamodel.
    Normantas, K., Vasilecas, O.
    Journal of Information Science. March 10, 2014

    The extraction of business vocabulary is one of the main tasks in discovering business knowledge implemented in a software system. In this paper we present a model-driven approach to the extraction of business vocabularies from databases of existing software systems. We describe a transformation framework for obtaining the Knowledge Discovery Metamodel based representation of data structure and define an algorithm for the extraction of candidates for business vocabulary entries (i.e. Term and Fact Units) from the representation. The extracted candidates may be further refined by business analysts and used for the identification of business scenarios and rules in software systems.

    March 10, 2014   doi: 10.1177/0165551514526336   open full text
  • Intellectual structure of the institutional repository field: A co-word analysis.
    Cho, J.
    Journal of Information Science. March 04, 2014

    The institutional repository is a major means of providing open access to academic output and is changing academic communications. As use of the institutional repository is spreading, research advancing its management policy and technology has been conducted in the library and academic communities. This study has undertaken a co-word analysis of author keywords in articles from the SCOPUS database from 1997 to 2012 and found 8 clusters that represent the intellectual structure of institutional repository research, including ‘Metadata’, ‘Open Access’, ‘Institutional Repository’, ‘Digital Library’, ‘DSpace’, ‘Copyright’, ‘Preservation’ and ‘Semantic Web’. To understand these intellectual structures, this study used a co-occurrence matrix based on Pearson’s correlation coefficient to create a clustering of the words using the hierarchical clustering technique. To visualize these intellectual structures, this study carried out a multidimensional scaling analysis, to which a PROXSCAL algorithm was applied.

    March 04, 2014   doi: 10.1177/0165551514524686   open full text
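    A minimal sketch of the co-word pipeline described above: build a keyword co-occurrence matrix from author keywords, convert it to a Pearson correlation matrix and apply hierarchical clustering. The keyword sets are placeholders for the SCOPUS records, and the multidimensional scaling step is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Each set stands in for the author keywords of one record.
papers = [{"institutional repository", "open access", "dspace"},
          {"metadata", "preservation", "dspace"},
          {"open access", "copyright"},
          {"metadata", "semantic web"}]

keywords = sorted(set().union(*papers))
# Co-occurrence matrix: how often two keywords appear in the same record.
C = np.array([[sum((a in p) and (b in p) for p in papers) for b in keywords]
              for a in keywords], dtype=float)
R = np.corrcoef(C)                          # Pearson correlation between keyword profiles
D = 1.0 - R                                 # correlation -> distance
Z = linkage(D[np.triu_indices_from(D, k=1)], method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
for keyword, label in zip(keywords, labels):
    print(label, keyword)
```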
  • Multilingual query expansion in the SveMed+ bibliographic database: A case study.
    Gavel, Y., Andersson, P.-O.
    Journal of Information Science. March 03, 2014

    SveMed+ is a bibliographic database covering Scandinavian medical journals. It is produced by the University Library of Karolinska Institutet in Sweden. The bibliographic references are indexed with terms from the Medical Subject Headings (MeSH) thesaurus. The MeSH has been translated into several languages, including Swedish, making it suitable as the basis for multilingual tools in the medical field. The data structure of SveMed+ closely mimics that of PubMed/MEDLINE. Users of PubMed/MEDLINE and similar databases typically expect retrieval features that are not readily available off-the-shelf. The SveMed+ interface is based on a free text search engine (Solr) and a relational database management system (Microsoft SQL Server) containing the bibliographic database and a multilingual thesaurus database. The thesaurus database contains medical terms in three different languages and information about relationships between the terms. A combined approach involving the Solr free text index, the bibliographic database and the thesaurus database allowed the implementation of functionality such as automatic multilingual query expansion, faceting and hierarchical explode searches. The present paper describes how this was done in practice.

    March 03, 2014   doi: 10.1177/0165551514524685   open full text
  • Aara'- a system for mining the polarity of Saudi public opinion through e-newspaper comments.
    Azmi, A. M., Alzanin, S. M.
    Journal of Information Science. March 03, 2014

    Aara’ is a system for mining opinion polarity through the pool of comments that readers write anonymously at the online edition of Saudi newspapers. We use a naïve Bayes classifier with a revised n-gram approach to extract the public opinion polarity, which is expressed in Arabic, classifying it into four categories. For training we manually marked the comments as belonging to one of the categories. All the words in the documents of the training set were removed except those with explicit connotations. After the training, the words designated as vocabulary were classified into one of the categories. Our system carries out polarity classification over informal colloquial Arabic that is unstructured and has a reasonable proportion of spelling errors. The result of testing our system showed a macro-averaged precision of 86.5%, while the macro-averaged F-score was 84.5%. The accuracy of the system is 82%.

    March 03, 2014   doi: 10.1177/0165551514524675   open full text
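    A minimal sketch of a naïve Bayes classifier over word n-grams for short Arabic comments, in the spirit of the system above; the comments, labels and categories are illustrative placeholders, not the Aara’ training data or its revised n-gram weighting.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy comments with illustrative polarity labels.
comments = ["قرار ممتاز", "قرار سيء جدا", "لا رأي لي",
            "خبر رائع", "خبر محزن", "مقال عادي"]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

# Unigrams and bigrams feed a multinomial naïve Bayes classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(comments, labels)
print(model.predict(["خبر ممتاز"]))
```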
  • Performance of LDA and DCT models.
    Rathore, A. S., Roy, D.
    Journal of Information Science. February 24, 2014

    The Doubly Correlated Topic Model is a generative probabilistic topic model for automatically identifying topics from the corpus of the text documents. It is a mixed membership model, based on the fact that a document exhibits a number of topics. We used word co-occurrence statistical information for identifying an initial set of topics as posterior information for the model. Posterior inference methods utilized by the existing models are intractable and therefore provide an approximate solution. Consideration of co-occurred words as initial topics provides a tighter bound on the topic coherence. The proposed model is motivated by the Latent Dirichlet Allocation Model. The Doubly Correlated Topic Model differs from the Latent Dirichlet Allocation Model in its posterior inference; it uses the highest ranked co-occurred words as initial topics rather than obtaining from Dirichlet priors. The results of the proposed model suggest some improved performance on entropy and topical coherence over different datasets.

    February 24, 2014   doi: 10.1177/0165551514524678   open full text
  • SQL-based semantics for path expressions over hierarchical data in relational databases.
    Vainio, J., Junkkari, M.
    Journal of Information Science. February 18, 2014

    Hierarchical part-of relationships/aggregation structures and related queries are essential parts of information systems. However, relational database query languages do not explicitly support hierarchical relationships and queries. A hierarchical query may require a great number of join operations, which increases the effort in query formulation. Therefore, we propose path expressions in formulating hierarchical views over relational data because path expressions are a conventional and compact way to represent hierarchical relationships. We embed path expressions within SQL queries and compile them to standard SQL. This ensures that the path expressions can straightforwardly be implemented on the top of standard relational database systems. The compilation of a path expression is given by an attribute grammar, a conventional formalism to define the semantics of a language.

    February 18, 2014   doi: 10.1177/0165551514520943   open full text
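    A minimal sketch of the general idea of compiling a path expression into standard SQL joins (not the attribute-grammar semantics defined in the paper). It assumes, purely for illustration, that every table in the path has an `id` column and a `parent_id` column referencing the level above.

```python
def compile_path(path, select="*"):
    """compile_path('company/department/employee') -> an SQL string with one join per level."""
    tables = path.split("/")
    sql = [f"SELECT {select} FROM {tables[0]} AS t0"]
    for i, table in enumerate(tables[1:], start=1):
        # Each level joins to the level above via the assumed parent_id/id columns.
        sql.append(f"JOIN {table} AS t{i} ON t{i}.parent_id = t{i - 1}.id")
    return "\n".join(sql)

print(compile_path("company/department/employee", select="t2.name"))
```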
  • Learning time-sensitive domain ontology from scientific papers with a hybrid learning method.
    Ren, F.
    Journal of Information Science. February 17, 2014

    The large number of available scientific papers makes ontology construction an attractive research and application area. However, there are two shortcomings for most current ontology construction approaches. First, implicit time properties of domain concepts are rarely taken into account in current approaches. Second, current automatic concept relation extraction methods mainly rely on the local context information that surrounds the currently considered concepts. These two problems prevent most current ontology construction methods from being employed to their full potential. To tackle these problems, we propose a hybrid learning method to integrate concepts’ global information and human experts’ knowledge together into ontology construction, among which concepts’ temporal attributes are taken into account. Our method first divides each concept into four time periods according to its attribute distribution on a time axis. Then global time-related attributes are collected for each concept. Finally, concept relations are extracted with a hybrid learning method. We evaluated our method by testing it on Chinese academic papers. It outperformed a baseline system based on only hierarchical concept relations, showing the effectiveness of our approach.

    February 17, 2014   doi: 10.1177/0165551514521927   open full text
  • Systematically retrieving research in the digital age: Case study on the topic of social networking sites and young people's mental health.
    Best, P., Taylor, B., Manktelow, R., McQuilkin, J.
    Journal of Information Science. February 17, 2014

    Online information seeking has become normative practice among both academics and the general population. This study appraised the performance of eight databases to retrieve research pertaining to the influence of social networking sites on the mental health of young people. A total of 43 empirical studies on young people’s use of social networking sites and the mental health implications were retrieved. Scopus and SSCI had the highest sensitivity with PsycINFO having the highest precision. Effective searching requires large generic databases, supplemented by subject-specific catalogues. The methodology developed here may provide inexperienced searchers, such as undergraduate students, with a framework to define a realistic scale of searching to undertake for a particular literature review or similar project.

    February 17, 2014   doi: 10.1177/0165551514521936   open full text
  • Exploiting reviewers' comment histories for sentiment analysis.
    Basiri, M. E., Ghasem-Aghaee, N., Naghsh-Nilchi, A. R.
    Journal of Information Science. February 17, 2014

    Sentiment analysis is used to extract people’s opinion from their online comments in order to help automated systems provide more precise recommendations. Existing sentiment analysis methods often assume that the comments of any single reviewer are independent of each other and so they do not take advantage of significant information that may be extracted from reviewers’ comment histories. Using psychological findings and the theory of negativity bias, we propose a method for exploiting reviewers’ comment histories to improve sentiment analysis. Furthermore, to use more fine-grained information about the content of a review, our method predicts the overall ratings by aggregating sentence-level scores. In the proposed system, the Dempster–Shafer theory of evidence is utilized for score aggregation. The results from four large and diverse social Web datasets establish the superiority of our approach in comparison with the state-of-the-art machine learning techniques. In addition, the results show that the suggested method is robust to the size of training dataset.

    February 17, 2014   doi: 10.1177/0165551514522734   open full text
  • An algorithm to improve the performance of string matching.
    Hlayel, A. A., Hnaif, A.
    Journal of Information Science. January 14, 2014

    Approximate string matching algorithms are techniques used to find a pattern ‘P’ in a text ‘T’ partially or exactly. Such techniques are very important for both the performance and the accuracy of search results. In this paper, we propose a general-purpose algorithm, called the Direct Matching Algorithm (DMA). The function of this algorithm is to perform direct access matching for the exact pattern or its similarities within a text, depending on the location of a character in alphabetical order. We simulated the DMA in order to demonstrate its effectiveness. The simulation results showed significant improvement in both exact and similarity string matching, and therefore strong potential for real applications.

    January 14, 2014   doi: 10.1177/0165551513519039   open full text
  • Performance evaluation of parallel multithreaded A* heuristic search algorithm.
    Mahafzah, B. A.
    Journal of Information Science. January 13, 2014

    Heuristic search is used in many problems and applications, such as the 15 puzzle problem, the travelling salesman problem and web search engines. In this paper, the A* heuristic search algorithm is reconsidered by proposing a parallel generic approach based on multithreading for solving the 15 puzzle problem. Using multithreading, sequential computers are provided with virtual parallelization, yielding faster execution and easy communication. These advantageous features are provided through creating a dynamic number of concurrent threads at the run time of an application. The proposed approach is evaluated analytically and experimentally and compared with its sequential counterpart in terms of various performance metrics. It is revealed by the experimental results that multithreading is a viable approach for parallel A* heuristic search. For instance, it has been found that the parallel multithreaded A* heuristic search algorithm, in particular, outperforms the sequential approach in terms of time complexity and speedup.

    January 13, 2014   doi: 10.1177/0165551513519212   open full text
  • Corrigendum.

    Journal of Information Science. August 15, 2013

    Corrigendum to Alan Gilchrist, Marcia Lei Zeng, Stella Dextre Clarke, Antoine Isaac, Patrick Lambe and Judi Vernau (2013) Logic and the Organization of Knowledge – an appreciation of the book of this title by Martin Frické. A set of short essays. Journal of Information Science, published OnlineFirst on July 1, 2013 as DOI: 10.1177/0165551513480310.

    Please note that the title of the book by Martin Frické is incorrect throughout, and should be Logic and the Organization of Information.

    The authors would like to apologise for this error.

    August 15, 2013   doi: 10.1177/0165551513502917   open full text
  • Survey of social search from the perspectives of the village paradigm and online social networks.
    Trias i Mansilla, A., de la Rosa i Esteva, J. L.
    Journal of Information Science. August 08, 2013

    Two paradigms currently exist for information search. The first is the library paradigm, which has been largely automated and is the prevailing paradigm in today’s web search. The second is the village paradigm, and although it is older than the library paradigm, its automation has not been considered, yet certain elements of its key aspects have been automated, as in the cases of the Q&A communities or novel services such as Quora. The increasing popularity and availability of online social networks and question-answering communities have encouraged revisiting of the automation of the village paradigm owing to new helpful developments, primarily that people are more connected with their acquaintances on the internet and their contact lists are available. In this survey, we study how the village paradigm is today partially automated: we consider the selection of candidates for answering questions, answering questions automatically and helping candidates to decide what questions to answer. Other aspects are also considered, for example, the automation of a reward system. We conclude that a next step towards the automation of the village paradigm involves intelligent agents that can leverage a P2P (peer-to-peer) social network, which will create new and interesting issues deeply entwined with social networks in the form of information processing by agents in parallel and side by side with people.

    August 08, 2013   doi: 10.1177/0165551513495635   open full text
  • Concept map construction from text documents using affinity propagation.
    Qasim, I., Jeong, J.-W., Heu, J.-U., Lee, D.-H.
    Journal of Information Science. July 29, 2013

    Concept maps are playing an increasingly important role in various computing fields. In particular, they have been popularly used for organizing and representing knowledge. However, constructing concept maps manually is a complex and time-consuming task. Therefore, the creation of concept maps automatically or semi-automatically from text documents is a worthwhile research challenge. Recently, various approaches for automatic or semi-automatic construction of concept maps have been proposed. However, these approaches suffer from several limitations. First, only the noun phrases in text documents are included without resolution of the anaphora problems for pronouns. This omission causes important propositions available in the text documents to be missed, resulting in decreased recall. Second, although some approaches label the relationship to form propositions, they do not show the direction of the relationship between the subject and object in the form of Subject–Relationship–Object, leading to ambiguous propositions. In this paper, we present a cluster-based approach to semi-automatically construct concept maps from text documents. First, we extract the candidate terms from documents using typed dependency linguistic rules. Anaphoric resolution for pronouns is introduced to map the pronouns with candidate terms. Second, the similarities are calculated between the pairs of extracted candidate terms of a document and clusters are made through affinity propagation by providing the calculated similarities between the candidate terms. Finally, the extracted relationships are assigned between the candidate terms in each cluster. Our empirical results show that the semi-automatically constructed concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small based on a Likert scale. Furthermore, domain experts verified that the constructed concept maps are in accordance with their knowledge of the information system domain.

    July 29, 2013   doi: 10.1177/0165551513494645   open full text
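    A minimal sketch of the affinity propagation step described above: candidate terms are clustered from a precomputed similarity matrix. The terms and similarity values are placeholders for the similarities the paper computes between candidate terms of a document.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

terms = ["database", "table", "index", "network", "router", "packet"]
# Placeholder symmetric term-to-term similarity matrix.
S = np.array([
    [1.0, 0.8, 0.7, 0.1, 0.1, 0.1],
    [0.8, 1.0, 0.6, 0.1, 0.1, 0.1],
    [0.7, 0.6, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7, 0.6],
    [0.1, 0.1, 0.1, 0.7, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.6, 0.8, 1.0],
])
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
for term, label in zip(terms, ap.labels_):
    print(label, term)
```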
  • A learning approach for email conversation thread reconstruction.
    Dehghani, M., Shakery, A., Asadpour, M., Koushkestani, A.
    Journal of Information Science. July 29, 2013

    An email conversation thread is defined as a topic-centric discussion unit that is composed of exchanged emails among the same group of people by reply or forwarding. Detecting conversation threads contained in email corpora can be beneficial for both humans to digest the content of discussions and automatic methods to extract useful information from the conversations. This research explores two new feature-enriched learning approaches, LExLinC and LExTreC, to reconstruct linear structure and tree structure of conversation threads in email data. In this work, some simplifying assumptions considered in previous methods for extracting conversation threads are relaxed, which makes the proposed methods more powerful in detecting real conversations. Additionally, the supervised nature of the proposed methods makes them adaptable to new environments by automatically adjusting the features and their weights. Experimental results show that the proposed methods are highly effective in detecting conversation threads and outperform the existing methods.

    July 29, 2013   doi: 10.1177/0165551513494638   open full text
  • Cross-language patent matching via an international patent classification-based concept bridge.
    Chen, Y.-L., Chiu, Y.-T.
    Journal of Information Science. July 08, 2013

    Patent documents with sophisticated technical information are valuable for developing new technologies and products. They can be written in almost any language, leading to language barrier problems during retrieval. Traditionally, cross-language information retrieval and cross-language document matching have used text-translation-based or index-set-mapping methods. There are several challenges to the traditional methods, however, such as difficulties with natural language translation, complications owing to bilingual or multi-lingual translations (translating between two or more than two languages), and the unavailability of a parallel dual-language document set. This study offers a new and robust solution to cross-language patent document matching: the International Patent Classification (IPC) based concept bridge approach. The proposed method applies Latent Semantic Indexing to extract concepts from each set of patent documents and utilizes the IPC codes to construct a cross-language mediator that expresses patent documents in different languages. Experiments were carried out to demonstrate the performance of the proposed method. There were 3000 English patents and 3000 Chinese patents gathered as training documents from the United States Patent and Trademark Office and the Taiwan Intellectual Property Office, respectively. Another 30 English patents and another 30 Chinese patents were collected to be query patents. Finally, evaluations using an objective measure and subjective judgement were conducted to prove the feasibility and effectiveness of our method. The results show that our method out-performs the traditional text-translation methods.

    July 08, 2013   doi: 10.1177/0165551513494641   open full text
  • Logic and the Organization of Knowledge - an appreciation of the book of this title by Martin Fricke. A set of short essays.
    Gilchrist, A., Zeng, M. L., Dextre Clarke, S., Isaac, A., Lambe, P., Vernau, J.
    Journal of Information Science. July 01, 2013

    The Journal of Information Science does not normally carry book reviews, but when the Editor received a copy of Martin Frické’s book, Logic and the Organization of Knowledge, he thought it was too interesting to ignore. It succinctly encapsulates years of accumulated research and practice in the field, while also adding ‘logic’ into the mix. He asked me if I could think of a suitable way of acknowledging the work, and I proposed a collection of short pieces by prominent people in the information science community. This is not, then, a review; more a salute to the author for making explicit the fundamental relationship between the ancient disciplines of logic and knowledge organization.

    July 01, 2013   doi: 10.1177/0165551513480310   open full text
  • 'So wide and varied': The origins and character of British information science.
    Robinson, L., Bawden, D.
    Journal of Information Science. June 24, 2013

    This paper examines some characteristics of the ‘British School’ of information science. Three main forces driving the development of the new subject in Britain are identified: the documentation movement; special libraries; and the need for better treatment of scientific and technical information. Five characteristics which, taken together, distinguish the early British approach to information science from those adopted elsewhere are identified: its subject-based nature; its broad approach to information and information science; its status as an academic subject with a strong professional remit; its involvement with, but distinction from, information technology; and its involvement with memory institutions. Lessons are drawn for the future development of the information sciences.

    June 24, 2013   doi: 10.1177/0165551513492257   open full text
  • References-enriched Concept Map: a tool for collecting and comparing disparate definitions appearing in multiple references.
    Rodriguez-Priego, E., Garcia-Izquierdo, F. J., Rubio, A. L.
    Journal of Information Science. May 23, 2013

    Finding and sharing a common vocabulary is a critical task for the development of any area of knowledge. However, it is very common to find heated debate in the literature on the meaning of particular terms. Different authors propose different definitions, some of them even contradictory. This situation, while enriching the scientific process, may hinder the understanding of fundamental concepts regarding a certain subject. To address this problem, we propose a technique called References-enriched Concept Maps (RCM), inspired by concept maps. RCM can be used to compare definitions and therefore improve the understanding of terms, keeping track of the publications in which the different definitions were proposed. We present a method of RCM construction as well as different metrics for analysing them. An analysis carried out using the proposed metrics allows one to find answers while also raising new questions about the discussed concepts.

    May 23, 2013   doi: 10.1177/0165551513487848   open full text
  • Hybrid pseudo-relevance feedback for microblog retrieval.
    Chen, L., Chun, L., Ziyu, L., Quan, Z.
    Journal of Information Science. May 23, 2013

    The microblog has become a new global hot spot. Information retrieval (IR) technologies are necessary for accessing the massive amounts of valuable user-generated content in the microblog sphere. The challenge in searching relevant microblogs is that they are usually very short with sparse vocabulary and may fail to match queries. Pseudo-relevance feedback (PRF) via query expansion has been proven in previous studies to successfully increase the number of matches in IR. However, a critical problem of PRF is that the pseudo-relevant feedback may not be truly relevant, and thus may introduce noise to query expansion. In this paper, we exploit the dynamic nature of microblogs to address this problem. We first present a novel dynamic PRF technique, which is capable of expanding queries with truly relevant keywords by extracting representative terms based on the query’s temporal profile. Next we present query expansion from external knowledge sources based on negative and positive feedback. We further consider that the choice of PRF strategy is query-dependent. A two-level microblog search framework is presented. At the high level, a temporal profile is constructed and categorized for each query; at the low level, hybrid PRF query expansion combining dynamic and external PRF is adopted based on the query category. Experiments on a real data set demonstrate that the proposed method significantly increases the performance of microblog searching, compared with several traditional retrieval models, various query expansion methods and state-of-the-art recency-based models for microblog searching.

    May 23, 2013   doi: 10.1177/0165551513487846   open full text
  • Effectiveness of search result classification based on relevance feedback.
    Baskaya, F., Keskustalo, H., Jarvelin, K.
    Journal of Information Science. May 23, 2013

    Relevance feedback (RF) has been studied under laboratory conditions using test collections and either test persons or simple simulation. These studies have given mixed results. Automatic (or pseudo) RF and intellectual RF, both leading to query reformulation, are the main approaches to explicit RF. In the present study we perform RF with the help of classification of search results. We conduct our experiments in a comprehensive collection, namely various TREC ad-hoc collections with 250 topics. We also studied various term space reduction techniques for the classification process. The research questions are: given RF on top results of pseudo RF (PRF) query results, is it possible to learn effective classifiers for the following results? What is the effectiveness of various classification methods? Our findings indicate that this approach of applying RF is significantly more effective than PRF with short (title) queries and long (title and description) queries.

    May 23, 2013   doi: 10.1177/0165551513488317   open full text
  • A weakly supervised approach to Chinese sentiment classification using partitioned self-training.
    Zhang, P., He, Z.
    Journal of Information Science. April 09, 2013

    With the rapid growth of opinion-expressing documents on the World Wide Web, there is an increasing demand for sentiment analysis techniques that can easily adapt to new domains with minimal supervision. This article introduces a novel weakly supervised approach for Chinese sentiment classification. The approach applies a variant of the self-training algorithm to two partitions split from the test dataset, combines the classification results of the two partitions into a pseudo-labelled training set and an unlabelled test set, then trains an initial classifier on the pseudo-labelled training set and adopts a standard self-learning cycle to obtain the overall classification results. Experiments on four datasets from two domains show that our approach has competitive advantages over baseline approaches; it even outperforms the supervised approach on some of the datasets despite using no labelled documents.
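
    The standard self-learning cycle that the approach builds on can be sketched as follows; this is a minimal generic self-training loop with a toy word-overlap scorer standing in for a real classifier, and it omits the partitioning step that is specific to the paper (all names are assumptions):

        def self_train(labelled, unlabelled, rounds=3, threshold=2):
            """Generic self-training: repeatedly label the unlabelled documents
            the current model is most confident about and fold them into the
            training data. The 'model' here is just a signed word-count score."""
            pos, neg = set(), set()
            for text, label in labelled:
                (pos if label == 1 else neg).update(text.split())
            for _ in range(rounds):
                still_unlabelled = []
                for text in unlabelled:
                    score = sum((w in pos) - (w in neg) for w in text.split())
                    if score >= threshold:
                        pos.update(text.split())          # confident positive
                    elif score <= -threshold:
                        neg.update(text.split())          # confident negative
                    else:
                        still_unlabelled.append(text)     # keep for the next round
                unlabelled = still_unlabelled
            return pos, neg, unlabelled

        pos, neg, rest = self_train([("good great fine", 1), ("bad awful poor", 0)],
                                    ["good great product", "awful poor service", "so so"])
        print(rest)   # documents the toy model never became confident about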

    April 09, 2013   doi: 10.1177/0165551513480330   open full text
  • Web link-based relationships among top European universities.
    Figuerola, C. G., Alonso Berrocal, J. L.
    Journal of Information Science. April 09, 2013

    In this paper, an analysis of interlinking among 100 major European universities is presented. Since university websites contain links to the webpages of other organizations, these links may reveal the strongest relationships established between two organizations. The analysis of web links allowed us to determine the different behaviours of the universities with regard to incoming and outgoing web links; some universities had significantly more incoming than outgoing activity. In general, there was a low level of interaction between the universities studied. We also observed geographic–linguistic patterns in the establishment of links. Five primary nuclei or blocks of universities can be identified: a group composed almost exclusively of universities from the UK; a group composed largely of German universities, along with some from Switzerland and Austria; a cluster of universities from Mediterranean countries, including various French universities; a group of Belgian and Dutch universities, along with some from French-speaking Switzerland; and finally, a group made up of universities from the Nordic countries. Although some universities overlap with several groups or clusters, the overall pattern is rather clear. Moreover, the whole picture seems to agree with the results of other studies based on bibliographic co-authorship production.

    April 09, 2013   doi: 10.1177/0165551513480579   open full text
  • Local social knowledge management: A case study of social learning and knowledge sharing across organizational boundaries.
    Lahtinen, J.
    Journal of Information Science. April 09, 2013

    Knowledge management is normally approached in the context of a single organization’s activities. Recently the focus has been extended to activities which span organizational boundaries, especially to the key role of social learning across organizations. The concept of ‘local social knowledge management’ has been used to stress the process of social learning in regional networking. This study describes local social knowledge management in a regional development project. The knowledge sharing and creation practices in the project’s theme groups are described, and particular attention is paid to the evolution of the social learning process. Three distinct but interdependent forms of knowledge sharing and creation were identified in networking. Operational networking helped people manage their current project responsibilities, while strategic networking opened pathways to the future. The third form of networking boosted the personal development of project participants even when the cooperation did not continue. The results show that knowledge management research can produce more accurate models if the viewpoint is shifted to the broader contexts in which people normally interact.

    April 09, 2013   doi: 10.1177/0165551513481431   open full text
  • Basic-level categories: A review.
    Hajibayova, L.
    Journal of Information Science. April 09, 2013

    This paper analyses selected literature on basic-level categories, explores related theories and discusses theoretical explanations of the phenomenon of basic-level categories. A substantial body of research has proposed that basic-level categories are the first categories formed during perception of the environment, the first learned by children and those most used in language. Experimental studies suggest that high-level (or superordinate) categories lack informativeness because they are represented by only a few attributes and low-level (or subordinate) categories lack cognitive economy because they are represented by too many attributes. Studies in library and information science have demonstrated the prevalence of basic-level categories in knowledge organization and representation systems such as thesauri and in image indexing and retrieval; and it has been suggested that the universality of basic-level categories could be used for building crosswalks between classificatory systems and user-centred indexing. However, while there is evidence of the pervasiveness of basic-level categories, they may actually be unstable across individuals, domains or cultures and thus unable to support broad generalizations. This paper discusses application of Heidegger’s notion of handiness as a framework for understanding the relational nature of basic-level categories.

    April 09, 2013   doi: 10.1177/0165551513481443   open full text
  • A new approach to complex web site organization.
    Pisanski, J., Pisanski, T., Zumer, M.
    Journal of Information Science. April 09, 2013

    The methodology presented in this paper is based on concept mapping, a technique for representing knowledge in graphs. Its applications are broad and cover, in addition to the presentation of knowledge, the organization of complex systems such as web sites. The paper presents a method for reaching consensus from several organizations of a data set or web site independently produced by different people. A class of methods is introduced, with a number of parameters that can be chosen in order to closely match any specific real-life application. Although the methodology can be fully automated as a suitable computer program, it is meant mainly as a useful tool for experts in web site organization.
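
    One simple way to merge several independently produced organizations into a consensus structure is to keep only the parent-child links that a majority of the individual maps agree on; the sketch below illustrates that idea with hypothetical data and is not the parameterized method developed in the paper:

        from collections import Counter

        def consensus_links(maps, min_votes=2):
            """maps: list of site organizations, each a set of (parent, child) links.
            Returns the links supported by at least min_votes of the maps."""
            votes = Counter(link for m in maps for link in m)
            return {link for link, v in votes.items() if v >= min_votes}

        maps = [{("Home", "Products"), ("Home", "About")},
                {("Home", "Products"), ("Products", "Support")},
                {("Home", "Products"), ("Home", "About")}]
        print(consensus_links(maps))
        # -> {('Home', 'Products'), ('Home', 'About')}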

    April 09, 2013   doi: 10.1177/0165551513482270   open full text
  • Multi Small Index (MSI): A spatial indexing structure.
    Al-Badarneh, A. F., Al-Alaj, A. S., Mahafzah, B. A.
    Journal of Information Science. April 09, 2013

    Most existing spatial indices are constructed as a single hierarchical index structure; hence a large number of index pages (nodes) are likely to be inspected during spatial query execution. Since spatial queries usually fetch spatial objects based on their position in space, it is important that spatial objects are clustered in such a way that the objects pertinent to a query can be fetched quickly. This paper presents a method for partitioning the whole space into a set of small subspaces and then building an index structure for each subspace (called the Multi Small Index). This makes it easy to quickly retrieve the spatial objects relevant to the query in question using their corresponding small spatial index structures while ignoring other, irrelevant indices. To evaluate the new approach, we conducted a set of experimental studies using a collection of real-life spatial datasets (TIGER data files) with diverse sizes and different object sizes, densities and distributions, as well as various query sizes. The results show that, for small query sizes, the proposed structure (Multi Small Index) outperforms the original R-tree (Single Big Index) structure, achieving nearly 50% savings in disk accesses.
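
    The core idea, partitioning the space into subspaces and building a small index per subspace so that a query only touches the relevant ones, can be illustrated with a uniform grid of buckets; this toy stand-in for the per-subspace index structures uses assumed names and a grid scheme chosen purely for brevity:

        from collections import defaultdict

        class GridIndex:
            """Toy 'multi small index': the space is cut into square cells and
            each cell keeps its own list of (x, y, obj)."""
            def __init__(self, cell=100.0):
                self.cell = cell
                self.cells = defaultdict(list)

            def _key(self, x, y):
                return (int(x // self.cell), int(y // self.cell))

            def insert(self, x, y, obj):
                self.cells[self._key(x, y)].append((x, y, obj))

            def range_query(self, xmin, ymin, xmax, ymax):
                """Visit only the cells that overlap the query rectangle."""
                results = []
                for cx in range(int(xmin // self.cell), int(xmax // self.cell) + 1):
                    for cy in range(int(ymin // self.cell), int(ymax // self.cell) + 1):
                        for x, y, obj in self.cells.get((cx, cy), ()):
                            if xmin <= x <= xmax and ymin <= y <= ymax:
                                results.append(obj)
                return results

        idx = GridIndex(cell=100)
        idx.insert(10, 20, "school")
        idx.insert(250, 260, "hospital")
        print(idx.range_query(0, 0, 50, 50))   # -> ['school']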

    April 09, 2013   doi: 10.1177/0165551513483253   open full text
  • A probability-based unified framework for semantic search and recommendation.
    Lee, J.-w., Kim, H.-j., Lee, S.-g.
    Journal of Information Science. March 20, 2013

    The objective of search and recommendation is to provide users with documents that are relevant to their needs. Keyword-based search and recommendation approaches suffer from sparsity and semantic ambiguity problems because they correlate users’ needs with documents only via keywords. Thus, for a given query, some documents that are semantically relevant to a user’s needs are not provided if they do not include specific keywords. To address this, some search approaches have used the authority of documents, which is commonly represented using hyperlinks within documents. However, if there are no hyperlinks, it is difficult to exploit the authority for ranking documents. As the links of documents are determined by their owners, the authority derived from links does not consider users’ current needs. In order to resolve these problems, we propose a unified framework for semantic search and recommendation to enrich the semantics of users’ needs and documents with their corresponding concepts and to use personalized authority derived from recommendation approaches. The proposed approach makes it possible to retrieve documents with a high degree of semantic relevance as well as high authority. Through extensive experiments, we show that our approach outperforms conventional search and recommendation approaches.
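
    At a high level, a score in this spirit blends a concept-level relevance component with an authority component. The sketch below shows such a weighted combination using made-up concept sets and authority values; it illustrates the general idea only, not the probabilistic model proposed in the paper:

        def combined_score(query_concepts, doc_concepts, authority, alpha=0.7):
            """Blend concept overlap (semantic relevance) with a document
            authority score; alpha weights relevance against authority."""
            overlap = len(query_concepts & doc_concepts)
            relevance = overlap / max(len(query_concepts), 1)
            return alpha * relevance + (1 - alpha) * authority

        docs = {
            "d1": ({"jaguar", "animal", "wildlife"}, 0.2),   # (concepts, authority)
            "d2": ({"jaguar", "car", "vehicle"}, 0.9),
        }
        query = {"jaguar", "animal"}
        ranked = sorted(docs, key=lambda d: combined_score(query, *docs[d]), reverse=True)
        print(ranked)   # -> ['d1', 'd2'] with these toy numbers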

    March 20, 2013   doi: 10.1177/0165551513480100   open full text
  • Building and evaluating a collaboratively built structured folksonomy.
    Yoo, D., Choi, K., Suh, Y., Kim, G.
    Journal of Information Science. March 20, 2013

    Flat folksonomy uses simple tags and has emerged as a powerful instrument for classifying and sharing a huge amount of knowledge on Web 2.0. However, it has semantic problems, such as ambiguous and misunderstood tags. To alleviate such problems, researchers have built structured folksonomies with a hierarchical structure or relationships among tags. Structured folksonomies, however, also have some fundamental problems, such as limited tagging of pre-defined vocabulary and time-consuming manual effort required to select tags. To resolve these problems, we suggested a new method of attaching a tag with its category, which we call a categorized tag (CT), to web content. CTs entered by users are automatically and immediately integrated into a collaboratively built structured folksonomy (CSF), reflecting the tag-and-category relationships supported by the majority of users. Then, we developed a CT-based knowledge organization system (CTKOS), which builds upon the CSF to classify organizational knowledge and enables us to locate appropriate knowledge. In addition, the results of the evaluation, which we conducted to compare our proposed system with the flat folksonomy system, indicate that users perceive CTKOS to be more useful than the flat folksonomy system in terms of knowledge sharing (i.e. the tagging mechanism) and retrieval (i.e. the searching mechanism).
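
    A categorized tag is essentially a (category, tag) pair, and a collaboratively built structured folksonomy can attach each tag to the category that most users support. The sketch below (hypothetical names, with a simple majority-vote rule assumed for illustration) shows that integration step:

        from collections import Counter, defaultdict

        class StructuredFolksonomy:
            """Toy CSF: each tag is attached to the category most users chose."""
            def __init__(self):
                self.votes = defaultdict(Counter)   # tag -> Counter of categories

            def add_categorized_tag(self, user, category, tag):
                # The user id is kept only to show that votes come from
                # individual tagging actions; it is not otherwise used here.
                self.votes[tag][category] += 1

            def category_of(self, tag):
                """Return the majority-supported category for a tag."""
                return self.votes[tag].most_common(1)[0][0] if self.votes[tag] else None

        csf = StructuredFolksonomy()
        csf.add_categorized_tag("u1", "programming", "python")
        csf.add_categorized_tag("u2", "programming", "python")
        csf.add_categorized_tag("u3", "animal", "python")
        print(csf.category_of("python"))   # -> 'programming'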

    March 20, 2013   doi: 10.1177/0165551513480309   open full text
  • A robust approach for finding conceptually related queries using feature selection and tripartite graph structure.
    Goyal, P., Mehala, N., Bansal, A.
    Journal of Information Science. March 18, 2013

    The information explosion on the Internet has placed high demands on search engines. Despite improvements in search engine technology, the precision of current search engines is still unsatisfactory. Moreover, the queries submitted by users are short, ambiguous and imprecise. This leads to a number of problems in dealing with similar queries, including a lack of common keywords, the selection of different documents by the search engine and a lack of common clicks. These problems render traditional query clustering methods unsuitable for query recommendation. In this paper, we propose a new query recommendation system. We identify conceptually related queries by capturing users’ preferences from the click-through graphs of web search logs and by extracting the best query-relevant features from the snippets. The proposed system has an online feature extraction phase and an offline phase in which feature filtering and query clustering are performed. Query clustering is carried out by a new tripartite agglomerative clustering algorithm, Query-Document-Concept Clustering, in which the documents are used innovatively to decouple queries and features/concepts in a tripartite graph structure. This results in clusters of similar queries, associated clusters of documents and clusters of features. We model the query recommendation problem in four different ways: two models are non-personalized and personalized content-ignorant models, and the other two are non-personalized and personalized content-aware models. Three similarity measures are introduced to estimate different kinds of similarities. Experimental results show that the proposed approach achieves better precision, recall and F-measure than existing approaches.
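
    The tripartite structure underlying the approach links queries to clicked documents and documents to extracted concepts. The sketch below builds such a graph from a toy click log and concept assignment (hypothetical data and names; the agglomerative clustering itself is omitted), which already suffices to see which queries become related through shared documents or concepts:

        from collections import defaultdict

        def build_tripartite(click_log, doc_concepts):
            """click_log: (query, clicked_doc) pairs; doc_concepts: doc -> concepts.
            Returns the query->docs adjacency map alongside doc->concepts."""
            q2d = defaultdict(set)
            for query, doc in click_log:
                q2d[query].add(doc)
            return q2d, doc_concepts

        def related_queries(q2d, doc_concepts, query):
            """Queries sharing a clicked document or a document-level concept."""
            concepts = {c for d in q2d[query] for c in doc_concepts.get(d, ())}
            related = set()
            for other, docs in q2d.items():
                if other == query:
                    continue
                shared_docs = docs & q2d[query]
                shared_concepts = {c for d in docs for c in doc_concepts.get(d, ())} & concepts
                if shared_docs or shared_concepts:
                    related.add(other)
            return related

        clicks = [("jaguar speed", "d1"), ("big cats", "d2"), ("car prices", "d3")]
        concepts = {"d1": {"animal"}, "d2": {"animal"}, "d3": {"vehicle"}}
        q2d, d2c = build_tripartite(clicks, concepts)
        print(related_queries(q2d, d2c, "jaguar speed"))   # -> {'big cats'}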

    March 18, 2013   doi: 10.1177/0165551513477819   open full text