Journal of the American Society for Information Science and Technology

Impact factor: 2.005; 5-Year impact factor: 2.159; Print ISSN: 1532-2882; Online ISSN: 1532-2890; Publisher: Wiley Blackwell (John Wiley & Sons)

Subject: Information Science & Library Science

Most recent papers:

  • An empirical investigation on search engine ad disclosure.
    Dirk Lewandowski, Friederike Kerkmann, Sandra Rümmele, Sebastian Sünkler.
    Journal of the American Society for Information Science and Technology. October 17, 2017
    This representative study of German search engine users (N = 1,000) focuses on the ability of users to distinguish between organic results and advertisements on Google results pages. We combine questions about Google's business with task‐based studies in which users were asked to distinguish between ads and organic results in screenshots of results pages. We find that only a small percentage of users can reliably distinguish between ads and organic results, and that user knowledge of Google's business model is very limited. We conclude that ads are insufficiently labelled as such, and that many users may click on ads assuming that they are selecting organic results.
    October 17, 2017   doi: 10.1002/asi.23963
  • Online consumer reviews and sales: Examining the chicken‐egg relationships.
    Jie Ren, William Yeoh, Mong Shan Ee, Aleš Popovič.
    Journal of the American Society for Information Science and Technology. October 17, 2017
    This article examines the “chicken‐egg” two‐way relationships between online consumer reviews and sales, and assesses the dual influencer and indicator roles of online consumer reviews in relation to purchase behavior. To account for the time factor, we apply the Granger causality test and track 3,390 products on Amazon.com over a 2‐month period. The results reveal that a causality loop exists between online consumer review volume and sales. Specifically, our findings indicate that the volume of negative consumer reviews drives consumers' purchasing decisions, but the volume of positive consumer reviews only marginally affects purchasing decisions. Also, consumers generate more positive reviews than negative reviews after sales. Our results highlight the importance of negative consumer reviews; negative reviews not only lead to sales, but sales, in turn, lead to a higher volume of negative reviews. The findings suggest an alternative strategy for practitioners to address negative online consumer reviews and highlight the awareness effect of online consumer review postings that can later convert into purchase behaviors.
    October 17, 2017   doi: 10.1002/asi.23967
  • geNov: A new metric for measuring novelty and relevancy in biomedical information retrieval.
    Xiangdong An, Jimmy Xiangji Huang.
    Journal of the American Society for Information Science and Technology. October 06, 2017
    For diversity and novelty evaluation in information retrieval, we expect that novel documents are always ranked higher than redundant ones and relevant ones higher than irrelevant ones. We also expect that the level of novelty and relevancy should be acknowledged. Accordingly, we expect that the evaluation algorithm would reward rankings that respect these expectations. Nevertheless, few research articles in the literature study how to meet such expectations, and even fewer do so in the field of biomedical information retrieval. In this article, we propose a new metric for novelty and relevancy evaluation in biomedical information retrieval based on an aspect‐level performance measure introduced by TREC Genomics Track, with formal results showing that the expectations above can be respected under ideal conditions. The empirical evaluation indicates that the proposed metric, geNov, is highly sensitive to the desired characteristics above, and the three parameters are highly tuneable for different evaluation preferences. By experimentally comparing with state‐of‐the‐art metrics for novelty and diversity, the proposed metric shows its advantages in recognizing the ranking quality in terms of novelty, redundancy, relevancy, and irrelevancy and in its discriminative power. Experiments reveal the proposed metric is faster to compute than state‐of‐the‐art metrics.
    October 06, 2017   doi: 10.1002/asi.23958
  • Identifying functional aspects from user reviews for functionality‐based mobile app recommendation.
    Xiaoying Xu, Kaushik Dutta, Anindya Datta, Chunmian Ge.
    Journal of the American Society for Information Science and Technology. October 06, 2017
    The explosive growth of mobile apps makes it difficult for users to find their needed apps in a crowded market. An effective mechanism that provides high quality app recommendations becomes necessary. However, existing recommendation techniques tend to recommend similar items but fail to consider users’ functional requirements, making them ineffective in the app domain. In this article, we propose a recommendation architecture that can generate app recommendations at the functionality level. We address the redundant recommendation problem in the app domain by highlighting users’ functional requirements, an element that has received scant attention from existing recommendation research. Another main feature of our work is extracting app functionalities from textual user reviews for recommendation. We also propose an effective approach for functionality extraction. Experiments conducted on a real‐world dataset show that our proposed AppRank method outperforms other commonly used recommendation methods. In particular, it doubles the recall value of the second best method under an extremely sparse setting, increases the overall ranking accuracy of the second best method by 14.27%, and retains a high diversity of 0.99.
    October 06, 2017   doi: 10.1002/asi.23932
  • SUDMAD: Sequential and unsupervised decomposition of a multi‐author document based on a hidden markov model.
    Khaled Aldebei, Xiangjian He, Wenjing Jia, Weichang Yeh.
    Journal of the American Society for Information Science and Technology. September 29, 2017
    Decomposing a document written by more than one author into sentences based on authorship is of great significance due to the increasing demand for plagiarism detection, forensic analysis, civil law (i.e., disputed copyright issues), and intelligence issues that involve disputed anonymous documents. Among existing studies of document decomposition, some were limited to specific languages, dependent on topics, or restricted to documents by two authors, and their accuracy leaves considerable room for improvement. In this paper, we consider the contextual correlation hidden among sentences and propose an algorithm for Sequential and Unsupervised Decomposition of a Multi‐Author Document (SUDMAD) written in any language, disregarding topics, through the construction of a Hidden Markov Model (HMM) reflecting the authors' writing styles. To build and learn such a model, an unsupervised, statistical approach is first proposed to estimate the initial values of the HMM parameters of a preliminary model; it requires no information about the authors or the document's context other than how many authors contributed to writing the document. To further boost the performance of this approach, a boosted HMM learning procedure is proposed next, where the initial classification results are used to create labeled training data to learn a more accurate HMM. Moreover, the contextual relationship among sentences is further utilized to refine the classification results. Our proposed approach is empirically evaluated on three benchmark datasets that are widely used for authorship analysis of documents. Comparisons with recent state‐of‐the‐art approaches are also presented to demonstrate the significance of our new ideas and the superior performance of our approach.
    September 29, 2017   doi: 10.1002/asi.23956
  • Triaging content severity in online mental health forums.
    Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian.
    Journal of the American Society for Information Science and Technology. September 25, 2017
    In recent years, social media has become a significant resource for improving healthcare and mental health. Mental health forums are online communities where people express their issues, and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self‐harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self‐harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We propose an approach for triaging user content into four severity categories that are defined based on an indication of self‐harm ideation. Our models are based on a feature‐rich classification framework, which includes lexical, psycholinguistic, contextual, and topic modeling features. Our approaches improve over the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F‐1 scores). Furthermore, using our proposed model, we analyze the mental state of users and we show that overall, long‐term users of the forum demonstrate decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.
    September 25, 2017   doi: 10.1002/asi.23865
  • Predicting data science sociotechnical execution challenges by categorizing data science projects.
    Jeffrey Saltz, Ivan Shamshurin, Colin Connors.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    The challenge in executing a data science project is more than just identifying the best algorithm and tool set to use. Additional sociotechnical challenges include items such as how to define the project goals and how to ensure the project is effectively managed. This paper reports on a set of case studies where researchers were embedded within data science teams and where the researcher observations and analysis were focused on the attributes that can help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies that were used to perform the analytics. Based on our case studies, we identified 14 characteristics that can help describe a data science project. We then used these characteristics to create a model that defines two key dimensions of the project. Finally, by clustering the projects within these two dimensions, we identified four types of data science projects, and based on the type of project, we identified some of the sociotechnical challenges that project teams should expect to encounter when executing data science projects.
    September 22, 2017   doi: 10.1002/asi.23873
  • Internet usage and patient's trust in physician during diagnoses: A knowledge power perspective.
    Tian Lu, Yunjie (Calvin) Xu, Scott Wallace.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    Does patients’ Internet search of disease information affect their trust in physicians during diagnosis? This study proposes a research model from a knowledge power perspective, that is, Internet search affects patients’ perception of their knowledge level. Our empirical study of more than 400 subjects suggests that for patients who searched online for disease information, the inconsistency between their self‐diagnosis expectations and their physician's diagnosis reduces their trust in their physician. The effect is stronger for those who spent more time on Internet search. Patients with chronic conditions are less affected by the inconsistency, as are patients of physicians with a higher professional status. This study also found that physicians’ interaction quality in the diagnosis process (how well they communicate with their patients) still plays a dominant role in gaining patients’ trust. This finding suggests that even in the high‐tech age, high‐touch remains an important factor in physician‐patient trust.
    September 22, 2017   doi: 10.1002/asi.23920
  • Trustworthiness attribution: Inquiry into insider threat detection.
    Shuyuan Mary Ho, Michelle Kaarst‐Brown, Izak Benbasat.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    Insider threat is a “wicked” contemporary organizational problem. It poses significant threats to organizational operations and information security. This article reviews insider threat research and outlines key propositions to conceptualize the interpretation of dynamic human information behavior in an organizational setting, which represent an integration of trustworthiness and human sensors’ attribution in close relationships. These propositions posit that when a focal individual violates integrity‐based trust, the group can collectively attribute a shift in trustworthiness, triggering a natural peer attribution process that assigns cause to observed behavior. Group communication can thus reflect subtle changes in a focal individual's perceived trustworthiness. The ability to understand group‐based computer‐mediated communication patterns over time may become essential in safeguarding information assets and the “digital well‐being” of today's organizations. This article contributes a novel theoretical lens to examine dynamic insights on insider threat detection.
    September 22, 2017   doi: 10.1002/asi.23938
  • “Smart girls” versus “sleeping beauties” in the sciences: The identification of instant and delayed recognition by using the citation angle.
    Fred Y. Ye, Lutz Bornmann.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    In recent years, a number of studies have introduced methods for identifying papers with delayed recognition (so‐called “sleeping beauties,” SBs) or have presented single publications as cases of SBs. Most recently, Ke, Ferrara, Radicchi, and Flammini (Proceedings of the National Academy of Sciences of the USA, 112(24), 7426–7431) proposed the so‐called “beauty coefficient” (denoted as B) to quantify how much a given paper can be considered as a paper with delayed recognition. In this study, the new term smart girl (SG) is suggested to differentiate instant credit or “flashes in the pan” from SBs. Although SG and SB are qualitatively defined, the dynamic citation angle β is introduced in this study as a simple way to identify SGs and SBs quantitatively, complementing the beauty coefficient B. The citation angles for all articles from 1980 (n = 166,870) in natural sciences are calculated for identifying SGs and SBs and their extent. We reveal that about 3% of the articles are typical SGs and about 0.1% typical SBs. The potential advantages of the citation angle approach are explained.
    September 22, 2017   doi: 10.1002/asi.23846
  • Discourse relations in rationale‐containing text‐segments.
    Lu Xiao, Niall Conroy.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    Offering one's perspective and justifying it has become a common practice in online text‐based communications, just as it is in typical, face‐to‐face communication. Compared with face‐to‐face communication, it can be more challenging for users to understand and evaluate another's perspective in online communications. On the other hand, the availability of the communication record in online communications offers a potential to leverage computational techniques to automatically detect user opinions and rationales. One promising approach to automatically detect the rationales is to detect the common discourse relations in rationale texts. However, no empirical work has been done with regard to which discourse relations are commonly present in the users’ rationales in online communications. To fill this gap, we annotated the discourse relations in the text segments that contain the rationales (N = 527 text segments). These text segments are obtained from five datasets that consist of five online posts and the first 100 comments. We identified 10 discourse relations that are commonly present in this sample. Our finding marks an important contribution to this rationale detection approach. We encourage more empirical work, preferably with a larger sample, to examine the generalizability of our findings.
    September 22, 2017   doi: 10.1002/asi.23882
  • Records management in the cloud: From system design to resource ownership.
    Lorraine L. Richards.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    New technology implementations impact organizational behavior and outcomes, sometimes in unintended ways. A combination of design decisions, altered affordances, and political struggles within a state cloud computing implementation reduced levels of service among records management professionals, in spite of their strongly expressed desire to manage records with excellence. Struggles to maintain ownership and control over organizational processes and resources illustrate the power dynamics that are affected by the design of a new system implementation. By designing the system with a single goal in mind (centralization to reduce costs), strategic management failed to consider otherwise predictable outcomes of reducing the resources controlled by a group with lesser power and increasing the resources controlled by an already dominant power within the institution. These findings provide valuable insights into the considerations which cloud computing designs should take into account. They also offer an understanding of changing educational requirements for records management workers to engage more effectively across occupations in technologically changing environments and the potential risks that cloud computing poses to productivity. The research comprised an extensive literature review, a grounded theory methodological approach, and rigorous data collection and synthesis via an empirical case study.
    September 22, 2017   doi: 10.1002/asi.23939
  • Metric assessments of books as families of works.
    Alesia Zuccala, Mads Breum, Kasper Bruun, Bernd T. Wunsch.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    We describe the intellectual and physical properties of books as manifestations, expressions, and works and assess the current indexing and metadata structure of monographs in the Book Citation Index (BKCI). Our focus is on the interrelationship of these properties in light of the Functional Requirements for Bibliographic Records (FRBR). Data pertaining to monographs were collected from the Danish PURE repository system as well as the BKCI (2005–2015) via their International Standard Book Numbers (ISBNs). Each ISBN was then matched to the same ISBN and family‐related ISBNs cataloged in two additional databases: OCLC‐WorldCat and Goodreads. With the retrieval of all family‐related ISBNs, we were able to determine the number of monograph expressions present in the BKCI and their collective relationship to one work. Our results show that the majority of missing expressions from the BKCI are emblematic (i.e., first editions of monographs) and that both the indexing and metadata structure of this commercial database could significantly improve with the introduction of distinct expression IDs (i.e., for every distinct edition) and unifying work‐related IDs. This improved metadata structure would support the collection of more accurate publication and citation counts for monographs and has implications for developing new indicators based on bibliographic levels.
    September 22, 2017   doi: 10.1002/asi.23921
  • Characterizing, predicting, and handling web search queries that match very few or no results.
    Erdem Sarigil, Ismail Sengor Altingovde, Roi Blanco, B. Barla Cambazoglu, Rifat Ozcan, Özgür Ulusoy.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    A non‐negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that the users are dissatisfied after submitting a query that matches no results at least 88.5% of the time. As a first step towards solving these no‐answer queries, we devised a large number of features that can be used to identify such queries and built machine‐learning models. These models can be useful for scenarios such as the mobile‐ or meta‐search, where identifying a query that will retrieve no results at the client device (i.e., even before submitting it to the search engine) may yield gains in terms of the bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with accuracy (in terms of area under the curve) up to 0.95.
    September 22, 2017   doi: 10.1002/asi.23955
  • Patient‐centered and experience‐aware mining for effective adverse drug reaction discovery in online health forums.
    Yunzhong Liu, Jinhe Shi, Yi Chen.
    Journal of the American Society for Information Science and Technology. September 22, 2017
    Adverse Drug Reactions (ADRs) have become a serious health problem and even a leading cause of death in the United States. Pre‐marketing clinical trials and traditional post‐marketing surveillance using voluntary and spontaneous report systems are insufficient for ADR detection. On the other hand, online health forums provide valuable evidence on a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. In this article, we present a patient‐centered and experience‐aware mining framework for effective ADR discovery using online health forum data. Our experimental evaluation with both an official ADR knowledge base and human‐annotated ground truth verifies the effectiveness of the proposed method for ADR discovery.
    September 22, 2017   doi: 10.1002/asi.23929
  • Metadata records machine translation combining multi‐engine outputs with limited parallel data.
    Brenda Reyes Ayala, Ryan Knudson, Jiangping Chen, Gaohui Cao, Xinyue Wang.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    One way to facilitate Multilingual Information Access (MLIA) for digital libraries is to generate multilingual metadata records by applying Machine Translation (MT) techniques. Current online MT services are available and affordable, but are not always effective for creating multilingual metadata records. In this study, we implemented 3 different MT strategies and evaluated their performance when translating English metadata records to Chinese and Spanish. These strategies included combining MT results from 3 online MT systems (Google, Bing, and Yahoo!) with and without additional linguistic resources, such as manually‐generated parallel corpora, and metadata records in the two target languages obtained from international partners. The open‐source statistical MT platform Moses was applied to design and implement the three translation strategies. Human evaluation of the MT results using adequacy and fluency demonstrated that two of the strategies produced higher quality translations than individual online MT systems for both languages. In particular, adding small, manually‐generated parallel corpora of metadata records significantly improved translation performance. Our study suggests an effective and efficient MT approach for providing multilingual services for digital collections.
    September 19, 2017   doi: 10.1002/asi.23925
  • Understanding scientific collaboration: Homophily, transitivity, and preferential attachment.
    Chenwei Zhang, Yi Bu, Ying Ding, Jian Xu.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    Scientific collaboration is essential in solving problems and breeding innovation. Coauthor network analysis has been utilized to study scholars' collaborations for a long time, but these studies have not simultaneously taken different collaboration features into consideration. In this paper, we present a systematic approach to analyze the differences in possibilities that two authors will cooperate as seen from the effects of homophily, transitivity, and preferential attachment. Exponential random graph models (ERGMs) are applied in this research. We find that different types of publications one author has written play diverse roles in his/her collaborations. An author's tendency to form new collaborations with her/his coauthors' collaborators is strong, where the more coauthors one author had before, the more new collaborators he/she will attract. We demonstrate that considering the authors' attributes and homophily effects as well as the transitivity and preferential attachment effects of the coauthorship network in which they are embedded helps us gain a comprehensive understanding of scientific collaboration.
    September 19, 2017   doi: 10.1002/asi.23916
  • Tracing the traces: The critical role of metadata within networked communications.
    Matthew S. Mayernik, Amelia Acker.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    The information sciences have traditionally been at the center of metadata‐focused research. The US National Security Agency (NSA) intelligence documents revealed by Edward Snowden in June of 2013 brought the term “metadata” into the public consciousness. Surprisingly little discussion in the information sciences has since occurred on the nature and importance of metadata within networked communication systems. The collection of digital metadata impacts the ways that people experience social and technical communication. Without such metadata, networked communication cannot exist. The NSA leaks, and numerous recent hacks of corporate and government communications, point to metadata as objects of new scholarly inquiry. If we are to engage in meaningful discussions about our digital traces, or make informed decisions about new policies and technologies, it is essential to develop theoretical and empirical frameworks that account for digital metadata. This opinion paper presents 5 key sociotechnical characteristics of metadata within digital networks that would benefit from stronger engagement by the information sciences.
    September 19, 2017   doi: 10.1002/asi.23927
  • Understanding success through the diversity of collaborators and the milestone of career.
    Yi Bu, Ying Ding, Jian Xu, Xingkun Liang, Gege Gao, Yiming Zhao.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    Scientific collaboration is vital to many fields, and it is common to see scholars seek out experienced researchers or experts in a domain with whom they can share knowledge, experience, and resources. To explore the diversity of research collaborations, this article performs a temporal analysis on the scientific careers of researchers in the field of computer science. Specifically, we analyze collaborators using 2 indicators: the research topic diversity, measured by the Author‐Conference‐Topic model and cosine, and the impact diversity, measured by the normalized standard deviation of h‐indices. We find that the collaborators of high‐impact researchers tend to study diverse research topics and have diverse h‐indices. Moreover, by setting PhD graduation as an important milestone in researchers' careers, we examine several indicators related to scientific collaboration and their effects on a career. The results show that collaborating with authoritative authors plays an important role prior to a researcher's PhD graduation, but working with non‐authoritative authors carries more weight after PhD graduation.
    September 19, 2017   doi: 10.1002/asi.23911
  • Five decades of gratitude: A meta‐synthesis of acknowledgments research.
    Nadine Desrochers, Adèle Paul‐Hus, Jen Pecoskie.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    This review of the literature presents an overview of the last 50 years of research on acknowledgments in the context of scholarly communication. Through qualitative coding and bibliometric methods, this meta‐synthesis provides an in‐depth description of acknowledgments research and reveals the five main thematic categories that emerge from this corpus of literature. Adopting a historical approach, this review shows a diversified and scattered research landscape. Despite five decades of analysis putting forward the potential value of acknowledgments as markers of scientific capital, the literature still lacks consensus as to the value and functions of acknowledgments within the reward system of science.
    September 19, 2017   doi: 10.1002/asi.23903
  • Exploring the social influence of multichannel access in an online health community.
    Peng Luo, Kun Chen, Chong Wu, Yongli Li.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    Social influence has a great impact on human behavior and has been widely investigated in various research fields. Even so, it has rarely been investigated in the online health community. In this paper, we focus on the multichannel access in online health communities, defining social influence as the average degree of multichannel access to a physician's colleagues. Based on the multinomial logistic regression model, we examined the direct effects of social influence and patients' rating on multichannel access. In addition, we explored the moderating effect of social influence on the relationship between patients' rating and multichannel access in online health communities. The results of the experiment and robustness testing support the propositions that social influence and patients' rating significantly and positively affect multichannel access in an online health community. The moderating effect of social influence is negative and significantly influences the accessible channels provided by the focal physician. This research contributes to the literature concerning online health communities, social influence, and multichannel access; it also has practical implications.
    September 19, 2017   doi: 10.1002/asi.23928
  • Data set mentions and citations: A content analysis of full‐text publications.
    Mengnan Zhao, Erjia Yan, Kai Li.
    Journal of the American Society for Information Science and Technology. September 19, 2017
    This study provides evidence of data set mentions and citations in multiple disciplines based on a content analysis of 600 publications in PLoS One. We find that data set mentions and citations varied greatly among disciplines in terms of how data sets were collected, referenced, and curated. While a majority of articles provided free access to data, formal ways of data attribution such as DOIs and data citations were used in a limited number of articles. In addition, data reuse took place in less than 30% of the publications that used data, suggesting that researchers are still inclined to create and use their own data sets, rather than reusing previously curated data. This paper provides a comprehensive understanding of how data sets are used in science and helps institutions and publishers make useful data policies.
    September 19, 2017   doi: 10.1002/asi.23919   open full text
  • Intrainstitutional EHR collections for patient‐level information retrieval.
    Stephen Wu, Sijia Liu, Yanshan Wang, Tamara Timmons, Harsha Uppili, Steven Bedrick, William Hersh, Hongfang Liu.
    Journal of the American Society for Information Science and Technology. September 18, 2017
    Research in clinical information retrieval has long been stymied by the lack of open resources. However, both clinical information retrieval research innovation and legitimate privacy concerns can be served by the creation of intrainstitutional, fully protected resources. In this article, we provide some principles and tools for information retrieval resource‐building in the unique problem setting of patient‐level information retrieval, following the tradition of the Cranfield paradigm. We further include an analysis of parallel information retrieval resources at Oregon Health & Science University and Mayo Clinic that were built on these principles.
    September 18, 2017   doi: 10.1002/asi.23884   open full text
  • Clinical information extraction using small data: An active learning approach based on sequence representations and word embeddings.
    Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen.
    Journal of the American Society for Information Science and Technology. September 18, 2017
    This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers with a reduced amount of training samples that require manual annotation. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time‐consuming and costly task, as it requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with the length of sequence for seed selection shows potential towards more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation efforts, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state‐of‐the‐art query strategies.
    September 18, 2017   doi: 10.1002/asi.23936   open full text
  • What makes an effective clinical query and querier?
    Bevan Koopman, Guido Zuccon, Peter Bruza.
    Journal of the American Society for Information Science and Technology. September 18, 2017
    In this paper, we perform an in‐depth study into how clinicians represent their information needs and the influence this has on information retrieval (IR) effectiveness. While much research in IR has considered the effectiveness of IR systems, there is still a significant gap in the understanding of how users contribute to the effectiveness of these systems. The paper aims to contribute to this by studying how clinicians search for information. Multiple representations of an information need—from verbose patient case descriptions to ad‐hoc queries—were considered in order to understand their effect on retrieval. Four clinicians provided queries and performed relevance assessment to form a test collection used in this study. The different query formulation strategies of each clinician, and their effectiveness, were investigated. The results show that query formulation had more impact on retrieval effectiveness than the particular retrieval systems used. The most effective queries were short, ad‐hoc keyword queries. Different clinicians were observed to consistently adopt specific query formulation strategies. The most effective queriers were those who, given their information need, inferred novel keywords most likely to appear in relevant documents. This study reveals aspects of how people search within the clinical domain. This can help inform the development of new models and methods that specifically focus on the query formulation process to improve retrieval effectiveness.
    September 18, 2017   doi: 10.1002/asi.23959   open full text
  • A novel approach to explore patent development paths for subfield technologies.
    Jae Ha Gwak, So Young Sohn.
    Journal of the American Society for Information Science and Technology. September 18, 2017
    Many algorithms for searching the main paths according to a subfield technology have been suggested using patent citation networks. However, the process by which the specific field is divided into subfields is problematic, in the sense that citation information among individual patents in different subfields is lost. If convergence technologies are common among different subfields, this problem can result in an inappropriate main path being found. To resolve this problem, we propose a new algorithm capable of extracting the main paths for each subfield of a technology by using the International Patent Classification (IPC) codes of patents. The proposed algorithm is applied to the waste management field to allow core technologies in the subfields of waste disposal, waste treatment, and the reuse of waste materials to be found. Finally, a sensitivity analysis is performed by considering the standard of the component ratio along a specific path to demonstrate the robustness of the proposed algorithm in the waste management field.
    September 18, 2017   doi: 10.1002/asi.23962   open full text
  • Antecedents and learning outcomes of online news engagement.
    Heather L. O'Brien.
    Journal of the American Society for Information Science and Technology. September 17, 2017
    User engagement (UE) is a quality of user experience characterized by the depth of an actor's cognitive, temporal, and/or emotional investment in an interaction with a digital system. Currently more art than science, UE has gained theoretical and methodological traction over the past decade, yet there is still a need to establish empirical links between UE and desired outcomes (e.g., learning, behavior change), and to understand the myriad user, system, contextual, and so on, factors that predict successful digital engagement. This paper focuses on the relationship between UE and media format as a potential antecedent, and the outcome of learning, operationalized as short‐term knowledge retention. Participants interacted with two human‐interest stories in one of four media formats: video, audio, narrative text, or transcript‐style text; short‐term knowledge retention was measured using post‐task multiple choice and short‐answer questions. It was anticipated that format would have a strong effect on UE, and that more engaged users would recall more information about the stories. However, these hypotheses were not fully supported, and the nature of the relationship between UE and learning was more nuanced than expected. This research has implications for the design of information systems and, more fundamentally, the impetus to make digital environments engaging.
    September 17, 2017   doi: 10.1002/asi.23854   open full text
  • Ethical dilemma: Deception dynamics in computer‐mediated group communication.
    Shuyuan Mary Ho, Jeffrey T. Hancock, Cheryl Booth.
    Journal of the American Society for Information Science and Technology. September 17, 2017
    Words symbolically represent communicative and behavioral intent, and can provide clues to a communicator's future actions in online communication. This paper describes a sociotechnical study conducted from 2008 through 2015 to identify deceptive communicative intent within group context as manifested in language‐action cues. Specifically, this study used an online team‐based game that simulates real‐world deceptive insider scenarios to examine several dimensions of group communication. First, we studied how language‐action cues differ between groups with and groups without a compromised actor. We also examine how these cues differ within groups in terms of the group members' individual and collective interactions with the compromised actor. Finally, we look at how the cues of compromised actors differ from those of noncompromised actors, and how communication behavior changes after an actor is presented with an ethical dilemma. The results of the study further our understanding of language‐action cues as indicators for unmasking a potential deceptive insider.
    September 17, 2017   doi: 10.1002/asi.23849   open full text
  • Benford's law: A “sleeping beauty” sleeping in the dirty pages of logarithmic tables.
    Tariq Ahmad Mir, Marcel Ausloos.
    Journal of the American Society for Information Science and Technology. September 17, 2017
    Benford's law is an empirical observation, first reported by Simon Newcomb in 1881 and then independently by Frank Benford in 1938: the first significant digits of numbers in large data sets are often distributed according to a logarithmically decreasing function. Being contrary to intuition, the law was forgotten as a mere curious observation. However, in the last two decades relevant literature has grown exponentially, an evolution typical of “Sleeping Beauties” (SBs): publications that go unnoticed (sleep) for a long time and then suddenly become the center of attention (are awakened). Thus, in the present study, we show that the two papers, Newcomb (1881, American Journal of Mathematics, 4, 39–40) and Benford (1938, Proc. Am. Phil. Soc., 78, 551–572), are clearly SBs. The former was in a deep sleep for 110 years, whereas the latter was in a deep sleep for a comparatively shorter period of 31 years up to 1968, and in a state of less deep sleep for another 27 years, up to 1995. Both SBs were awakened in the year 1995 by Hill (1995, Statistical Science, 10, 354–363). In so doing, we show that, in the case of Benford's law, the waking prince (Hill, 1995) is more often quoted than the SB whom he kissed; whether this is a general effect could usefully be studied.
    September 17, 2017   doi: 10.1002/asi.23845   open full text
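    The "logarithmically decreasing function" in the abstract above is commonly written as P(d) = log10(1 + 1/d) for a first significant digit d from 1 to 9. The following is a minimal sketch of how observed first-digit shares can be compared with that expectation. The helper names and the sample data (powers of 2) are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Expected share of numbers whose first significant digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(n: int) -> int:
    """First significant digit of a nonzero integer."""
    return int(str(abs(n))[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1..9 among nonzero integers."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative data only (an assumption, not from the paper): powers of 2.
sample = [2 ** k for k in range(1, 201)]
observed = first_digit_shares(sample)

for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

    The sketch handles integers only, which keeps the leading-digit extraction exact; real data would need a more careful treatment of decimals and zeros.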
  • A bibliometric model for identifying emerging research topics.
    Qi Wang.
    Journal of the American Society for Information Science and Technology. September 17, 2017
    Detecting emerging research topics is essential, not only for research agencies but also for individual researchers. Previous studies have created various bibliographic indicators for the identification of emerging research topics. However, as indicated by Rotolo et al. (Research Policy 44, 1827–1843), the most serious problems are the lack of an acknowledged definition of emergence and incomplete elaboration of the linkages between the definitions that are used and the indicators that are created. With these issues in mind, this study first adjusts the definition of an emerging technology that Rotolo et al. have proposed to accommodate the analysis. Next, a set of criteria for the identification of emerging topics is proposed according to the adjusted definition and attributes of emergence. Using two sets of parameter values, several emerging research topics are identified. Finally, evaluation tests are conducted by demonstration of the proposed approach and comparison with previous studies. The strength of the present methodology lies in the fact that it is fully transparent, straightforward, and flexible.
    September 17, 2017   doi: 10.1002/asi.23930   open full text
  • Assessing perceived organizational leadership styles through Twitter text mining.
    Agostino La Bella, Andrea Fronzetti Colladon, Elisa Battistoni, Silvia Castellan, Matteo Francucci.
    Journal of the American Society for Information Science and Technology. September 15, 2017
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as perceived by Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10‐factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
    September 15, 2017   doi: 10.1002/asi.23918   open full text
  • Classifying tumor event attributes in radiology reports.
    Wen‐wai Yim, Sharon W. Kwan, Meliha Yetisgen.
    Journal of the American Society for Information Science and Technology. September 14, 2017
    Radiology reports contain vital diagnostic information that characterizes patient disease progression. However, information from reports is represented in free text, which is difficult to query against for secondary use. Automatic extraction of important information, such as tumor events using natural language processing, offers possibilities in improved clinical decision support, cohort identification, and retrospective evidence‐based research for cancer patients. The goal of this work was to classify tumor event attributes: negation, temporality, and malignancy, using biomedical ontology and linguistically enriched features. We report our results on an annotated corpus of 101 hepatocellular carcinoma patient radiology reports, and show that the improved classification improves overall template structuring. Classification performances for negation identification, past temporality classification, and malignancy classification were at 0.94, 0.62, and 0.77 F1, respectively. Incorporating the attributes into full templates improved performance to 0.72 F1 for tumor‐related events over a baseline of 0.65 F1. Improvement of negation, malignancy, and temporality classifications led to significant improvements in template extraction for the majority of categories. We present our machine‐learning approach to identifying these tumor event attributes from radiology reports, and highlight challenges and areas for improvement.
    September 14, 2017   doi: 10.1002/asi.23937   open full text
  • Learning to reformulate long queries for clinical decision support.
    Luca Soldaini, Andrew Yates, Nazli Goharian.
    Journal of the American Society for Information Science and Technology. September 14, 2017
    The large volume of biomedical literature poses a serious problem for medical professionals, who often struggle to keep current with it. At the same time, many health providers consider knowledge of the latest literature in their field a key component for successful clinical practice. In this work, we introduce two systems designed to help retrieve medical literature. Both receive a long, discursive clinical note as input query, and return highly relevant literature that could be used in support of clinical practice. The first system is an improved version of a method previously proposed by the authors; it combines pseudo relevance feedback and a domain‐specific term filter to reformulate the query. The second is an approach that uses a deep neural network to reformulate a clinical note. Both approaches were evaluated on the 2014 and 2015 TREC CDS datasets; in our tests, they outperform the previously proposed method by up to 28% in inferred NDCG; furthermore, they are competitive with the state of the art, achieving up to 8% improvement in inferred NDCG.
    September 14, 2017   doi: 10.1002/asi.23924   open full text
  • Location‐aware targeted influence maximization in social networks.
    Sen Su, Xiao Li, Xiang Cheng, Chenna Sun.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    In this paper, we study the location‐aware targeted influence maximization problem in social networks, which finds a seed set to maximize the influence spread over the targeted users. In particular, we consider those users who have both topic and geographical preferences on promotion products as targeted users. To efficiently solve this problem, one challenge is how to find the targeted users and compute their preferences efficiently for given requests. To address this challenge, we devise a TR‐tree index structure, where each tree node stores users' topic and geographical preferences. By traversing the TR‐tree in depth‐first order, we can efficiently find the targeted users. Another challenge of the problem is to devise algorithms for efficient seed selection. We address this challenge from two complementary directions. In one direction, we adopt the maximum influence arborescence (MIA) model to approximate the influence spread, and propose two efficient approximation algorithms with 1−1/e approximation ratio, which prune some candidate seeds with small influences by precomputing users' initial influences offline and estimating the upper bound of their marginal influences online. In the other direction, we propose a fast heuristic algorithm to improve efficiency. Experiments conducted on real‐world data sets demonstrate the effectiveness and efficiency of our proposed algorithms.
    September 12, 2017   doi: 10.1002/asi.23931   open full text
  • Scientists' data reuse behaviors: A multilevel analysis.
    Youngseek Kim, Ayoung Yoon.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    This study explores the factors that influence the data reuse behaviors of scientists and identifies the generalized patterns that occur in data reuse across various disciplines. This research employed an integrated theoretical framework combining institutional theory and the theory of planned behavior. The combined theoretical framework can apply the institutional theory at the individual level and extend the theory of planned behavior by including relevant contexts. This study utilized a survey method to test the proposed research model and hypotheses. Study participants were recruited from the Community of Science's (CoS) Scholar Database, and a total of 1,528 scientists responded to the survey. A multilevel analysis method was used to analyze the 1,237 qualified responses. This research showed that scientists' data reuse intentions are influenced by both disciplinary level factors (availability of data repositories) and individual level factors (perceived usefulness, perceived concern, and the availability of internal resources). This study has practical implications for promoting data reuse practices. Three main areas that need to be improved are identified: educating scientists, providing internal supports, and providing external resources and supports such as data repositories.
    September 12, 2017   doi: 10.1002/asi.23892   open full text
  • Consumer valuation of personal information in the age of big data.
    Sesil Lim, JongRoul Woo, Jongsu Lee, Sung‐Yoon Huh.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    In a big data environment, there are growing concerns about the violation of consumer rights regarding information privacy. To induce rational regulations for protecting personal information, it is necessary to separately estimate consumers' values related to different types of personal information. In this article, discrete choice experiments using hypothetical information leakage situations given certain compensation amounts and discrete choice models were used to quantitatively analyze the value of personal information. The results indicate that consumers generally place high value on information that could cause immediate and actual damage from the leakage after identification, such as basic personal information and purchase list and payment information. Consumers value location information and personal medical information differently based on their perceived importance of privacy and their prior experience with personal information leakage. We suggest that the level of regulation should differ according to the type of personal information based on the consumers' valuation. This article contributes to a better understanding of a quantitative approach to pricing personal information.
    September 12, 2017   doi: 10.1002/asi.23915   open full text
  • Representing transmedia fictional worlds through ontology.
    Frank Branch, Theresa Arias, Jolene Kennah, Rebekah Phillips, Travis Windleharth, Jin Ha Lee.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    Currently, there is no structured data standard for representing elements commonly found in transmedia fictional worlds. Although there are websites dedicated to individual universes, the information found on these sites separates out the various formats, concentrates on only the bibliographic aspects of the material, and is only searchable with full text. We have created an ontological model that will allow various user groups interested in transmedia to search for and retrieve the information contained in these worlds based upon their structure. We conducted a domain analysis and user studies based on the contents of Harry Potter, Lord of the Rings, the Marvel Universe, and Star Wars in order to build a new model using Web Ontology Language (OWL) and an artificial intelligence‐reasoning engine. This model can infer connections between transmedia properties such as characters, elements of power, items, places, events, and so on. This model will facilitate better search and retrieval of the information contained within these vast story universes for all users interested in them. The result of this project is an OWL ontology reflecting real user needs based upon user research, which is intuitive for users and can be used by artificial intelligence systems.
    September 12, 2017   doi: 10.1002/asi.23886   open full text
  • Toward effective automated weighted subject indexing: A comparison of different approaches in different environments.
    Kun Lu, Jin Mao, Gang Li.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost‐effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag‐of‐words representation shows the best average performance in the full text environment, while cosine with bag‐of‐words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag‐of‐words representation generally outperforms the concept‐based representation. Further improvement in performance can be obtained by using the learning‐to‐rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776–1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    September 12, 2017   doi: 10.1002/asi.23912   open full text
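    The abstract above describes IDF (Inverse Document Frequency) as a direct weighting method for subject descriptors. As a rough illustration of the idea, the sketch below weights each descriptor by log(N / df), where N is the number of documents and df is the number of documents carrying that descriptor. This particular IDF variant, the function name, and the toy descriptors are assumptions made for illustration; the abstract does not state which formulation the authors used.

```python
import math


def idf_weights(docs):
    """Map each descriptor to log(N / df) over a list of descriptor sets.

    docs: list of iterables of subject descriptors, one per document.
    This is the plain, unsmoothed IDF variant (an assumption here).
    """
    n_docs = len(docs)
    doc_freq = {}
    for descriptors in docs:
        # Count each descriptor at most once per document.
        for term in set(descriptors):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection of assigned subject descriptors.
collection = [
    {"Humans", "Neoplasms", "Liver"},
    {"Humans", "Neoplasms"},
    {"Humans", "Information Storage and Retrieval"},
    {"Humans", "Liver"},
]

weights = idf_weights(collection)
for term, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

    A descriptor assigned to every document gets weight 0 under this variant, which matches the intuition that ubiquitous descriptors say little about any one document.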
  • Local vector pattern with global index angles for a content‐based image retrieval system.
    Jatothu Brahmaiah Naik, Chanamallu Srinivasarao, Giri Babu Kande.
    Journal of the American Society for Information Science and Technology. September 12, 2017
    This article proposes a content‐based image retrieval (CBIR) system that employs an informative pattern‐based descriptor. Recent literature has reported the development of efficient local‐pattern‐based descriptors, including the local vector pattern (LVP). This article extends the LVP formulation by making it more computationally efficient and informative. In the extended LVP‐based extraction process, the global index angles are determined using the mutual information between the patterns, which are obtained from a pair of indexed angles. Thus, the Proposed LVP (PLVP) no longer requires a step to identify patterns in every indexed angle found in the querying phase of the CBIR system. A CBIR system with the PLVP is developed in this article, and the system and its associated methods are tested using data from a benchmark texture database and a natural image database. A performance comparison of the PLVP and traditional patterns, such as the local binary pattern (LBP), completely modeled local binary pattern (CLBP) and local tetra pattern (LTrP), is conducted using the CBIR system. The experimental results reveal the superiority of the PLVP in terms of precision, recall, F‐score and computational efficiency.
    September 12, 2017   doi: 10.1002/asi.23907   open full text
  • Discovering story chains: A framework based on zigzagged search and news actors.
    Cagri Toraman, Fazli Can.
    Journal of the American Society for Information Science and Technology. September 02, 2017
    A story chain is a set of related news articles that reveal how different events are connected. This study presents a framework for discovering story chains, given an input document, in a text collection. The framework has 3 complementary parts that i) scan the collection, ii) measure the similarity between chain‐member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text‐mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct 2 user studies in terms of 4 effectiveness measures—relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method.
    September 02, 2017   doi: 10.1002/asi.23885   open full text
  • Understanding an enriched multidimensional user relevance model by analyzing query logs.
    Jingfei Li, Peng Zhang, Dawei Song, Yue Wu.
    Journal of the American Society for Information Science and Technology. August 29, 2017
    Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small‐scale user studies, which may not reflect a real‐world and natural search scenario. In this article, we propose to study the multidimensional user relevance model (MURM) on large scale query logs, which record users' various search behaviors (e.g., query reformulations, clicks and dwelling time, etc.) in natural search settings. We advance an existing MURM model (including five dimensions: topicality, novelty, reliability, understandability, and scope) by providing two additional dimensions, that is, interest and habit. The two new dimensions represent personalized relevance judgment on retrieved documents. Further, for each dimension in the enriched MURM model, a set of computable features are formulated. By conducting extensive document ranking experiments on Bing's query logs and TREC session Track data, we systematically investigated the impact of each dimension on retrieval performance and gained a series of insightful findings which may bring benefits for the design of future IR systems.
    August 29, 2017   doi: 10.1002/asi.23868   open full text
  • How quickly do publications get read? The evolution of Mendeley reader counts for new articles.
    Nabeil Maflahi, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. August 29, 2017
    Within science, citation counts are widely used to estimate research impact but publication delays mean that they are not useful for recent research. This gap can be filled by Mendeley reader counts, which are valuable early impact indicators for academic articles because they appear before citations and correlate strongly with them. Nevertheless, it is not known how Mendeley readership counts accumulate within the year of publication, and so it is unclear how soon they can be used. In response, this paper reports a longitudinal weekly study of the Mendeley readers of articles in 6 library and information science journals from 2016. The results suggest that Mendeley readers accrue from when articles are first available online and continue to steadily build. For journals with large publication delays, articles can already have substantial numbers of readers by their publication date. Thus, Mendeley reader counts may even be useful as early impact indicators for articles before they have been officially published in a journal issue. If field normalized indicators are needed, then these can be generated when journal issues are published using the online first date.
    August 29, 2017   doi: 10.1002/asi.23909   open full text
  • Motivations and intentions of Flickr users in enriching flick records for Library of Congress photos.
    Margaret E. I. Kipp, Jihee Beak, Inkyung Choi.
    Journal of the American Society for Information Science and Technology. August 11, 2017
    The purpose of this study is to understand users' motivations and intentions in the use of institutional collections on social tagging sites. Previous social tagging studies have collected social tagging data and analyzed how tagging functions as a tool to organize and retrieve information. Many studies focused on the patterns of tagging rather than the users' perspectives. To provide a more comprehensive picture of users' social tagging activities in institutional collections, and how this compares to social tagging in a more personal context, we collected data from social tagging users by surveying 7,563 participants in the Library of Congress's Flickr Collection. We asked users to describe their motivations for activities within the LC Flickr Collection in their own words using open‐ended questions. As a result, we identified 11 motivations using a bottom‐up, open‐coding approach: affective reactions, opinion on photo, interest in subject, contribution to description, knowledge sharing, improving findability, social network, appreciation, personal use, and personal relationship. Our study revealed that affective or emotional reactions play a critical role in the use of social tagging of institutional collections by comparing our findings to existing frameworks for tagging motivations. We also examined the relationships between participants' occupations and our 11 motivations.
    August 11, 2017   doi: 10.1002/asi.23869   open full text
  • Temporal dynamics of eye‐tracking and EEG during reading and relevance decisions.
    Jacek Gwizdka, Rahilsadat Hosseini, Michael Cole, Shouyi Wang.
    Journal of the American Society for Information Science and Technology. August 11, 2017
    Assessment of text relevance is an important aspect of human–information interaction. For many search sessions it is essential to achieving the task goal. This work investigates text relevance decision dynamics in a question‐answering task by direct measurement of eye movement using eye‐tracking and brain activity using electroencephalography (EEG). The EEG measurements are correlated with the user's goal‐directed attention allocation revealed by their eye movements. In a within‐subject lab experiment (N = 24), participants read short news stories of varied relevance. Eye movement and EEG features were calculated in three epochs of reading each news story (early, middle, final) and for periods where relevant words were read. Perceived relevance classification models were learned for each epoch. The results show reading epochs where relevant words were processed could be distinguished from other epochs. The classification models show increasing divergence in processing relevant vs. irrelevant documents after the initial epoch. This suggests differences in cognitive processes used to assess texts of varied relevance levels and provides evidence for the potential to detect these differences in information search sessions using eye tracking and EEG.
    August 11, 2017   doi: 10.1002/asi.23904   open full text
  • Cultural diversity of quality of information on Wikipedias.
    Dariusz Jemielniak, Maciej Wilamowski.
    Journal of the American Society for Information Science and Technology. August 07, 2017
    This article explores the relationship between linguistic culture and the preferred standards of presenting information based on article representation in major Wikipedias. Using primary research analysis of the number of images, references, internal links, external links, words, and characters, as well as their proportions in Good and Featured articles on the eight largest Wikipedias, we discover a high diversity of approaches and format preferences, correlating with culture. We demonstrate that high‐quality standards in information presentation are not globally shared and that in many aspects, the language culture's influence determines what is perceived to be proper, desirable, and exemplary for encyclopedic entries. As a result, we demonstrate that standards for encyclopedic knowledge are not globally agreed‐upon and “objective” but local and very subjective.
    August 07, 2017   doi: 10.1002/asi.23901   open full text
  • Toward the operationalization of visual metaphor.
    Alexis Hiniker, Sungsoo (Ray) Hong, Yea‐Seul Kim, Nan‐Chen Chen, Jevin D. West, Cecilia Aragon.
    Journal of the American Society for Information Science and Technology. August 07, 2017
    Many successful digital interfaces employ visual metaphors to convey features or data properties to users, but the characteristics that make a visual metaphor effective are not well understood. We used a theoretical conception of metaphor from cognitive linguistics to design an interactive system for viewing the citation network of the corpora of literature in the JSTOR database, a highly connected compound graph of 2 million papers linked by 8 million citations. We created 4 variants of this system, manipulating 2 distinct properties of metaphor. We conducted a between‐subjects experimental study with 80 participants to compare understanding and engagement when working with each version. We found that building on known image schemas improved response time on look‐up tasks, while contextual detail predicted increases in persistence and the number of inferences drawn from the data. Schema‐congruency combined with contextual detail produced the highest gains in comprehension. These findings provide concrete mechanisms by which designers presenting large data sets through metaphorical interfaces may improve their effectiveness and appeal with users.
    August 07, 2017   doi: 10.1002/asi.23857   open full text
  • Stochastic reranking of biomedical search results based on extracted entities.
    Pavlos Fafalios, Yannis Tzitzikas.
    Journal of the American Society for Information Science and Technology. July 31, 2017
    Health‐related information is nowadays accessible from many sources and is one of the most searched‐for topics on the Internet. However, existing search systems often fail to provide users with a good list of medical search results, especially for classic (keyword‐based) queries. In this article we elaborate on whether and how we can exploit biomedicine‐related entities from the emerging Web of Data for improving (through reranking) the results returned by a search system. The aim is to promote relevant but low‐ranked hits containing entities that are important to the current search context. We introduce an approach that is based on entity extraction applied to the retrieved documents, yielding a graph of documents along with entities, which in turn is analyzed probabilistically using a Random Walk‐based method. The proposed approach is independent of the submitted query and the underlying retrieval models, and thus can be applied over any ranked list of medical search results. Evaluation results using the data set of TREC Clinical Decision Support track demonstrate that the proposed approach can significantly improve the results returned by classic and widely applicable retrieval models. The results also enabled us to identify cases where the proposed reranking method fails to improve the ranking.
    July 31, 2017   doi: 10.1002/asi.23877   open full text
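The reranking idea in the abstract above (a graph of retrieved documents and their extracted entities, analyzed with a random walk) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the walk's details, so the restart weights (1/rank), the damping value, and every name below (`rerank_by_random_walk`, `example_ranking`, `example_entities`) are assumptions made for illustration. The sketch uses only the ranked list and the entities, not the query, in line with the query independence the abstract describes.

```python
from collections import defaultdict


def rerank_by_random_walk(ranked_docs, doc_entities, damping=0.85, iterations=50):
    """Rerank documents with a random walk with restart over a
    document-entity graph (illustrative assumption, not the paper's algorithm).

    ranked_docs:  unique document ids in their original rank order.
    doc_entities: dict mapping a document id to the set of entities found in it.
    """
    # Undirected bipartite graph: each document is linked to its entities.
    neighbors = defaultdict(set)
    for doc in ranked_docs:
        for entity in doc_entities.get(doc, ()):
            neighbors[("doc", doc)].add(("ent", entity))
            neighbors[("ent", entity)].add(("doc", doc))

    entity_nodes = sorted(n for n in neighbors if n[0] == "ent")
    nodes = [("doc", d) for d in ranked_docs] + entity_nodes

    # Restart distribution: favors documents the original ranking placed high.
    weights = {("doc", d): 1.0 / (rank + 1) for rank, d in enumerate(ranked_docs)}
    total = sum(weights.values())
    restart = {n: weights.get(n, 0.0) / total for n in nodes}

    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = neighbors.get(n)
            if out:
                # Spread this node's score evenly over its neighbors.
                share = damping * score[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # A document without entities has no edges: hand its score
                # back to the restart distribution so total mass stays 1.
                for m in nodes:
                    nxt[m] += damping * score[n] * restart[m]
        score = nxt

    # sorted() is stable, so tied documents keep their original order.
    return sorted(ranked_docs, key=lambda d: score[("doc", d)], reverse=True)


# Hypothetical example: d4 is ranked last but shares entities with d1 and d2.
example_ranking = ["d1", "d2", "d3", "d4"]
example_entities = {
    "d1": {"aspirin", "stroke"},
    "d2": {"aspirin"},
    "d3": set(),
    "d4": {"aspirin", "stroke"},
}
reranked = rerank_by_random_walk(example_ranking, example_entities)
print(reranked)
```

In this toy input the low‐ranked document d4 moves ahead of d3, which contains none of the entities shared by the top results.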
  • The aboutness of words.
    Edward T. O'Neill, Kerre A. Kammerer, Rick Bennett.
    Journal of the American Society for Information Science and Technology. July 31, 2017
    Word aboutness is defined as the relationship between words and subjects associated with them. An aboutness coefficient is developed to estimate the strength of the aboutness relationship. Words that are randomly distributed across subjects are assumed to lack aboutness and the degree to which their usage deviates from a random pattern indicates the strength of the aboutness. To estimate aboutness, title words and their associated subjects are extracted from the titles of non‐fiction English language books in the OCLC WorldCat database. The usage patterns of the title words are analyzed and used to compute aboutness coefficients for each of the common title words. Words with low aboutness coefficients (An and In) are commonly found in stop word lists, whereas words with high aboutness coefficients (Carbonate, Autism) are unambiguous and have a strong subject association. The aboutness coefficient potentially can enhance indexing, advance authority control, and improve retrieval.
    July 31, 2017   doi: 10.1002/asi.23856   open full text
  • Cognitive modeling of age‐related differences in information search behavior.
    Saraschandra Karanam, Herre van Oostendorp, Mylene Sanchiz, Aline Chevalier, Jessie Chin, Wai‐Tat Fu.
    Journal of the American Society for Information Science and Technology. July 18, 2017
    In this study, we evaluated the ability of computational cognitive models of web‐navigation like CoLiDeS and CoLiDeS+ to model i) user interactions with search engines and ii) individual differences in search behavior due to variations in cognitive factors such as aging. CoLiDeS and CoLiDeS+ were extended to predict user clicks on search engine result pages. Their performance was evaluated using actual behavioral data from an experiment in which 2 types of information search tasks (simple vs. difficult) were presented to younger and older participants. The results showed that the model predictions matched significantly better with the actual user behavior on difficult tasks compared to simple tasks and with younger participants compared to older participants, especially for difficult tasks. Also, the matches were significantly better with CoLiDeS+ compared to CoLiDeS, especially for difficult tasks. We conclude that the advanced capabilities of CoLiDeS+, such as incorporating contextual information and implementing backtracking strategies, enable it to predict user behavior significantly better than CoLiDeS, especially on difficult tasks. The usefulness of these modeling outcomes for the design of support systems for older adults is discussed.
    July 18, 2017   doi: 10.1002/asi.23893   open full text
  • Citations, mandates, and money: Author motivations to publish in chemistry hybrid open access journals.
    Gregory M. Nelson, Dennis L. Eggett.
    Journal of the American Society for Information Science and Technology. July 06, 2017
    Hybrid open access refers to articles freely accessible via the Internet but which originate from an academic journal that provides most of its content via subscription. The effect of hybrid open access on citation counts and author behavior in the field of chemistry is something that has not been widely studied. We compared 814 open access articles and 27,621 subscription access articles published from 2006 through 2011 in American Chemical Society journals. As expected, the 2 comparison groups are not equal in all respects. Cumulative citation data were analyzed from years 2–5 following an article's publication date. A citation advantage for open access articles was correlated with the journal impact factor (IF) in low and medium IF journals, but not in high IF journals. Open access articles have a 24% higher mean citation rate than their subscription counterparts in low IF journals (confidence limits 8–42%, p = .0022) and similarly, a 26% higher mean citation rate in medium IF journals (confidence limits 14–40%, p < .001). Open access articles in high IF journals had no significant difference compared to subscription access articles (13% lower mean citation rate, confidence limits −27–3%, p = .10). These results are correlative, not causative, and may not be completely due to an open access effect. Authors of the open access articles were also surveyed to determine why they chose a hybrid open access option, paid the required article processing charge, and whether they believed it was money well spent. Authors primarily chose open access because of funding mandates; however, most considered the money well spent because open access increases information access to the scientific community and the general public, and potentially increases citations to their scholarship.
    July 06, 2017   doi: 10.1002/asi.23897   open full text
  • Exploiting item co‐utility to improve collaborative filtering recommendations.
    A. Bessa, R.L.T. Santos, A. Veloso, N. Ziviani.
    Journal of the American Society for Information Science and Technology. July 06, 2017
    In this article we study the extent to which the interplay between recommended items affects recommendation effectiveness. We introduce and formalize the concept of co‐utility as the property that any pair of recommended items has of being useful to a user, and exploit it to improve collaborative filtering recommendations. We present different techniques to estimate co‐utility probabilities, all of them independent of content information, and compare them with each other. We use these probabilities, as well as normalized predicted ratings, in an instance of an NP‐hard problem termed the Max‐Sum Dispersion Problem (MSDP). A solution to MSDP hence corresponds to a set of items for recommendation. We study one heuristic and one exact solution to MSDP and perform comparisons among them. We also contrast our solutions (the best heuristic to MSDP) to different baselines by comparing the ratings users give to different recommendations. We obtain substantial gains in the utility of recommendations and our solutions also recommend higher‐rated items to the majority of users. Finally, we show that our co‐utility solutions are scalable in practice and do not harm recommendations' diversity.
    July 06, 2017   doi: 10.1002/asi.23853   open full text
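The selection step described in the abstract above (choosing a set of items by combining normalized predicted ratings with pairwise co‐utility probabilities in a Max‐Sum Dispersion formulation) can be sketched as follows. The objective used here, the `trade_off` weight, the greedy rule, and the toy numbers are all assumptions for illustration; the abstract does not specify the paper's exact objective or heuristic.

```python
from itertools import combinations


def objective(items, rating, co_utility, trade_off=1.0):
    """Assumed objective: sum of ratings plus weighted sum of pairwise co-utilities."""
    pair_sum = sum(co_utility[frozenset(pair)] for pair in combinations(items, 2))
    return sum(rating[i] for i in items) + trade_off * pair_sum


def greedy_select(candidates, rating, co_utility, k, trade_off=1.0):
    """Heuristic: repeatedly add the item with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def gain(item):
            pairs = sum(co_utility[frozenset((item, s))] for s in selected)
            return rating[item] + trade_off * pairs
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def exact_select(candidates, rating, co_utility, k, trade_off=1.0):
    """Exact solution by enumerating every k-subset (feasible only for small inputs)."""
    best = max(
        combinations(candidates, k),
        key=lambda subset: objective(subset, rating, co_utility, trade_off),
    )
    return list(best)


# Hypothetical normalized ratings and co-utility probabilities for four items.
rating = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
co_utility = {
    frozenset(("a", "b")): 0.1, frozenset(("a", "c")): 0.2,
    frozenset(("a", "d")): 0.9, frozenset(("b", "c")): 0.3,
    frozenset(("b", "d")): 0.2, frozenset(("c", "d")): 0.1,
}

greedy = greedy_select(list(rating), rating, co_utility, k=2)
exact = exact_select(list(rating), rating, co_utility, k=2)
print(greedy, objective(greedy, rating, co_utility))
print(exact, objective(exact, rating, co_utility))
```

With these numbers both procedures pick items a and d: d has the lowest rating, but its high co‐utility with a outweighs the higher individual ratings of b and c. On larger inputs the greedy result is not guaranteed to match the exact optimum.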
  • A combined fuzzy‐SEM evaluation approach to identify the key drivers of the academic library service quality in the digital technology era: An empirical study.
    Concetta Manuela La Fata, Toni Lupo.
    Journal of the American Society for Information Science and Technology. July 06, 2017
    A conceptual model of Academic Library (AL) service quality is hypothesized in the present article, and then validated and analyzed by a novel evaluation approach. Specifically, the conceptual model integrates the fundamental attributes of the canonical AL service with the most relevant attributes of the newer and widely considered AL electronic services (e‐services). The evaluation approach incorporates Fuzzy Sets Theory (FST) to deal with the students' uncertainty over their own judgments of AL service quality, and a Structural Equation Model (SEM) to validate the conceptual model and to determine the key drivers of AL service quality. The effectiveness of the proposed approach is demonstrated by an empirical study concerning the AL of the Polytechnic School of the University of Palermo (Italy). Data collected via a survey involving more than 600 students are used to identify the key drivers of AL service quality. In particular, the results reveal that Collections and materials is the main driver of AL service quality, followed by Infrastructure and Access to the service, whereas Staff plays the fundamental role of interface between AL service aspects and students' needs.
    July 06, 2017   doi: 10.1002/asi.23878   open full text
  • MOOC visual analytics: Empowering students, teachers, researchers, and platform developers of massively open online courses.
    Scott R. Emmons, Robert P. Light, Katy Börner.
    Journal of the American Society for Information Science and Technology. July 06, 2017
    Along with significant opportunities, Massively Open Online Courses (MOOCs) provide major challenges to students (keeping track of course materials and effectively interacting with teachers and fellow students), teachers (managing thousands of students and supporting their learning progress), researchers (understanding how students interact with materials and each other), and MOOC platform developers (supporting effective course design and delivery in a scalable way). This article demonstrates the use of data analysis and visualization as a means to empower students, teachers, researchers, and platform developers by making large volumes of data easy to understand. First, we introduce the insight needs of different stakeholder groups. Second, we compare the wide variety of data provided by major MOOC platforms. Third, we present a novel framework that distinguishes visualizations by the type of questions they answer. We then review the state of the art in MOOC visual analytics using a tabulation of stakeholder needs versus visual analytics workflow types. Finally, we present new data analysis and visualization workflows for statistical, geospatial, and topical insights. The workflows have been optimized and validated in the Information Visualization MOOC (IVMOOC) annually taught at Indiana University since 2013. All workflows, sample data, and visualizations are provided at http://cns.iu.edu/2016-MOOCVis.html.
    July 06, 2017   doi: 10.1002/asi.23852   open full text
  • Automatic event detection in microblogs using incremental machine learning.
    Tharindu Rukshan Bandaragoda, Daswin De Silva, Damminda Alahakoon.
    Journal of the American Society for Information Science and Technology. July 06, 2017
    The global popularity of microblogs has led to an increasing accumulation of large volumes of text data on microblogging platforms such as Twitter. These corpora are untapped resources to understand social expressions on diverse subjects. Microblog analysis aims to unlock the value of such expressions by discovering insights and events of significance hidden among swathes of text. Besides velocity, diversity of content, brevity, absence of structure, and time‐sensitivity are key challenges in microblog analysis. In this paper, we propose an unsupervised incremental machine learning and event detection technique to address these challenges. The proposed technique separates a microblog discussion into topics to address the key problem of diversity. It maintains a record of the evolution of each topic over time. Brevity, time‐sensitivity, and unstructured nature are addressed by these individual topic pathways, which together generate a temporal, topic‐driven structure of a microblog discussion. The proposed event detection method continuously monitors these topic pathways for events of significance using multiple domain‐independent event indicators. The autonomous nature of topic separation, topic pathway generation, new topic identification, and event detection makes the proposed technique suitable for extensive applications in microblog analysis. We demonstrate these capabilities on tweets containing #microsoft and tweets containing #obama.
    July 06, 2017   doi: 10.1002/asi.23896   open full text
  • Exploring the social effect of outstanding scholars on future research accomplishments.
    Chien Hsiang Liao.
    Journal of the American Society for Information Science and Technology. July 04, 2017
    Outstanding scholars have generally been regarded as having special influence that enables them to publish articles in top‐tier journals and obtain higher levels of research funding. This study proposes that the social effect of an outstanding scholar, which is derived from the halo effect and the Matthew effect, is favorable for the expansion of the scholar's personal research network and will improve that scholar's future research accomplishments. Data for a total of 101 outstanding information systems scholars and 36 ordinary scholars were collected. The definition of an outstanding scholar is based on the quality and quantity of their publications. The results show that the social effect of the outstanding scholars is beneficial for the development of a research network, including 3 types of network structures. In addition, being highly connected with colleagues leads to higher research accomplishments in terms of quantity, while being connected with colleagues from different sub‐fields leads to higher research accomplishments in terms of novelty. Additionally, this study found that the social effect of outstanding scholars is a double‐edged sword, with both positive and negative impacts on research accomplishments. The findings contribute several theoretical and practical implications for future research.
    July 04, 2017   doi: 10.1002/asi.23887   open full text
  • Reader characteristics, behavior, and success in fiction book search.
    Anna Mikkonen, Pertti Vakkari.
    Journal of the American Society for Information Science and Technology. July 04, 2017
    We examined the search behaviors of diverse fiction readers in different search scenarios. The aim was to understand how fiction readers with varied reading preferences are selecting interesting novels in library catalogs. We conducted a controlled user study with 80 participants. Two reader groups were elicited according to similar reading preference patterns. The readers enjoyed the entertainment, escape, and comfort that reading as a pleasurable activity offered. The aesthetic readers valued the artistic and aesthetic pleasures, widening vocabulary, and ability to express oneself through fiction books. We compared the search queries and search actions between the 2 reader groups. Our results demonstrated that preference patterns were associated with readers' search behavior, that is, the number of viewed search result pages, opened book pages, dwell time on book pages, and the type of search queries. Based on the findings, we present 3 search tactics for fiction books in library catalogs: i) focused querying, ii) topical browsing, and iii) similarity‐based tactic. The most popular search tactic in each search scenario was “focused querying” with known author in both reader groups.
    July 04, 2017   doi: 10.1002/asi.23843   open full text
  • Person entity linking in email with NIL detection.
    Ning Gao, Mark Dredze, Douglas W. Oard.
    Journal of the American Society for Information Science and Technology. July 04, 2017
    For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base, and if so to determine which KB entity is the correct referent. Entity linking has been well explored for dissemination‐oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on “conversational” sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as “NIL detection”). This article presents a supervised machine learning system for linking named mentions of people in email messages to a collection‐specific knowledge base, and that is also capable of NIL detection. This system learns from manually annotated training examples to leverage a rich set of features. The entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, comparable accuracy is reported for the challenging NIL detection task, and these results are for the first time replicated on a second email collection from a different source with comparable results.
    July 04, 2017   doi: 10.1002/asi.23888   open full text
  • Mendeley readership as a filtering tool to identify highly cited publications.
    Zohreh Zahedi, Rodrigo Costas, Paul Wouters.
    Journal of the American Society for Information Science and Technology. July 03, 2017
    This study presents a large‐scale analysis of the distribution and presence of Mendeley readership scores over time and across disciplines. We study whether Mendeley readership scores (RS) can identify highly cited publications more effectively than journal citation scores (JCS). Web of Science (WoS) publications with digital object identifiers (DOIs) published during the period 2004–2013 and across five major scientific fields were analyzed. The main result of this study shows that RS are more effective (in terms of precision/recall values) than JCS at identifying highly cited publications across all fields of science and publication years. The findings also show that 86.5% of all the publications are covered by Mendeley and have at least one reader. Also, the share of publications with Mendeley RS increased from 84% in 2004 to 89% in 2009, and decreased from 88% in 2010 to 82% in 2013. However, it is noted that publications from 2010 onwards exhibit on average a higher density of readership versus citation scores. This indicates that compared to citation scores, RS are more prevalent for recent publications and hence they could work as an early indicator of research impact. These findings highlight the potential and value of Mendeley as a tool for scientometric purposes and particularly as a relevant tool to identify highly cited publications.
    July 03, 2017   doi: 10.1002/asi.23883   open full text
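The precision/recall comparison mentioned in the abstract above can be made concrete with a small sketch. It treats the publications with the most Mendeley readers as a filter and checks how well that filter recovers the most cited publications. The cut‐offs (top 3 by readers, top 2 by citations), the counts, and all names are hypothetical; the study's actual thresholds and data are not given in the abstract.

```python
def top_by(scores, n):
    """Return the n publication ids with the highest scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [pub for pub, _ in ranked[:n]]


def precision_recall(predicted, actual):
    """Precision and recall of a predicted set against the true set."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical counts for five publications.
citations = {"p1": 50, "p2": 40, "p3": 5, "p4": 3, "p5": 1}
readers = {"p1": 30, "p2": 2, "p3": 25, "p4": 20, "p5": 1}

highly_cited = top_by(citations, 2)   # the "true" highly cited set
selected = top_by(readers, 3)         # publications picked by the readership filter

precision, recall = precision_recall(selected, highly_cited)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the readership filter selects p1, p3, and p4, of which only p1 is highly cited, so precision is 1/3 and recall is 1/2. Running the same computation with a journal‐level score in place of `readers` would give the kind of comparison the abstract reports.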
  • Trolling here, there, and everywhere: Perceptions of trolling behaviors in context.
    Madelyn Sanfilippo, Shengnan Yang, Pnina Fichman.
    Journal of the American Society for Information Science and Technology. June 28, 2017
    Online trolling has become increasingly prevalent and visible in online communities. Perceptions of and reactions to trolling behaviors vary significantly from one community to another, as trolling behaviors are contextual and vary across platforms and communities. Through an examination of seven trolling scenarios, this article intends to answer the following questions: how do trolling behaviors differ across contexts; how do perceptions of trolling differ from case to case; and what aspects of context of trolling are perceived to be important by the public? Based on focus groups and interview data, we discuss the ways in which community norms and demographics, technological features of platforms, and community boundaries are perceived to impact trolling behaviors. Two major contributions of the study include a codebook to support future analysis of trolling and formal concept analysis surrounding contextual perceptions of trolling.
    June 28, 2017   doi: 10.1002/asi.23902   open full text
  • Extracting audio summaries to support effective spoken document search.
    Damiano Spina, Johanne R. Trippas, Lawrence Cavedon, Mark Sanderson.
    Journal of the American Society for Information Science and Technology. June 28, 2017
    We address the challenge of extracting query‐biased audio summaries from podcasts to support users in making relevance decisions in spoken document search via an audio‐only communication channel. We performed a crowdsourced experiment that demonstrates that transcripts of spoken documents created using Automated Speech Recognition (ASR), even with significant errors, are effective sources of document summaries or “snippets” for supporting users in making relevance judgments against a query. In particular, the results show that summaries generated from ASR transcripts are comparable, in utility and user‐judged preference, to spoken summaries generated from error‐free manual transcripts of the same collection. We also observed that content‐based audio summaries are at least as preferred as synthesized summaries obtained from manually curated metadata, such as title and description. We describe a methodology for constructing a new test collection, which we have made publicly available.
    June 28, 2017   doi: 10.1002/asi.23831   open full text
  • Implicit opinion analysis: Extraction and polarity labelling.
    Hen‐Hsen Huang, Jun‐Jie Wang, Hsin‐Hsi Chen.
    Journal of the American Society for Information Science and Technology. June 27, 2017
    Opinion words are crucial information for sentiment analysis. In some text, however, opinion words are absent or highly ambiguous. The resulting implicit opinions are more difficult to extract and label than explicit ones. In this paper, cutting‐edge machine‐learning approaches – deep neural network and word‐embedding – are adopted for implicit opinion mining at the snippet and clause levels. Hotel reviews written in Chinese are collected and annotated as the experimental data set. Results show the convolutional neural network models not only outperform traditional support vector machine models, but also capture hidden knowledge within the raw text. The strength of word‐embedding is also analyzed.
    June 27, 2017   doi: 10.1002/asi.23835   open full text
  • Design and in‐situ evaluation of a mixed‐initiative approach to information organization.
    Mona Haraty, Zhongyuan Wang, Helen Wang, Shamsi Iqbal, Jaime Teevan.
    Journal of the American Society for Information Science and Technology. June 22, 2017
    Organizing personal information by folders or tags has proved to be effective for finding, remembering, and understanding information. However, past studies have shown that the cost of organization can be too high for some users to be worth the effort. Mixed‐initiative approaches attempt to reduce the burden of manual organization by automatically identifying and suggesting organizational units such as folders to users. However, little is known about how such mixed‐initiative approaches influence users' organizational experiences. In this paper, we explore a mixed‐initiative approach that suggests high‐level organizational units to users to facilitate e‐mail organization. In 2 in‐situ experiments with 34 knowledge workers, we study how our mixed‐initiative approach influenced users' experience with organization. We show that our approach made it easier to create organizational units without negatively affecting recall of those units, and led to the creation of units that otherwise would have not been created. Our findings suggest ways computers and people can most effectively work together to organize information.
    June 22, 2017   doi: 10.1002/asi.23823   open full text
  • Ontologies for the representation of electronic medical records: The obstetric and neonatal ontology.
    Mauricio Barcellos Almeida, Fernanda Farinelli.
    Journal of the American Society for Information Science and Technology. June 22, 2017
    Ontology is an interdisciplinary field that involves both the use of philosophical principles and the development of computational artifacts. As artifacts, ontologies can have diverse applications in knowledge management, information retrieval, and information systems, to mention a few. They have been largely applied to organize information in complex fields like Biomedicine. In this article, we present the OntoNeo Ontology, an initiative to build a formal ontology in the obstetrics and neonatal domain. OntoNeo is a resource that has been designed to serve as a comprehensive infrastructure providing scientific research and healthcare professionals with access to relevant information. The goal of OntoNeo is twofold: (a) to organize specialized medical knowledge, and (b) to provide a potential consensual representation of the medical information found in electronic health records and medical information systems. To describe our initiative, we first provide background information about distinct theories underlying ontology, top‐level computational ontologies and their applications in Biomedicine. Then, we present the methodology employed in the development of OntoNeo and the results obtained to date. Finally, we discuss the applicability of OntoNeo by presenting a proof of concept that illustrates its potential usefulness in the realm of healthcare information systems.
    June 22, 2017   doi: 10.1002/asi.23900   open full text
  • Information practices for sustainability: Role of iSchools in achieving the UN sustainable development goals (SDGs).
    Gobinda Chowdhury, Kushwanth Koya.
    Journal of the American Society for Information Science and Technology. June 22, 2017
    In September 2015, the United Nations (UN) General Assembly passed a resolution identifying 17 Sustainable Development Goals (SDGs) and 169 associated targets, and countries around the world agreed to achieve these by 2030. By conducting a thematic analysis of four key UN policy documents related to sustainable development, this paper argues that alongside financial and other resources, access to, and use of, appropriate information are essential for achieving SDGs. The paper also reviews research on information and sustainability undertaken at the iSchools and the computer and human–computer interaction (HCI) communities. Given that the mission of iSchools is to connect people and society with the required information through the use of appropriate technologies and tools, this paper argues that iSchools can play a key role in helping people, institutions, and businesses, and thus countries around the world, achieve SDGs. The paper identifies 4 broad areas of teaching and research that can help iSchools around the world prepare a trained workforce who can manage, and facilitate access to, information in specific domains and contexts. It is also argued that cooperation and collaborations among iSchools can promote a culture of sustainable information practices among university graduates and researchers in different disciplines that will pave the way for achieving SDGs in every sector.
    June 22, 2017   doi: 10.1002/asi.23825   open full text
  • Securing the human: Employee security vulnerability risk in organizational settings.
    Nina Sebescen, Jessica Vitak.
    Journal of the American Society for Information Science and Technology. June 22, 2017
    As organizational security breaches increase, so too does the need to fully understand the human factors that lead to these breaches and take the necessary steps to minimize threats. The present study evaluates how three sets of employee characteristics (demographic, company‐specific, and skills‐based) predict an employee's likelihood of becoming a security breach victim. In order to move beyond traditional evaluations of security threats, which generally consider security threats individually, analyses in this paper examine security vulnerability from a more holistic perspective, analyzing four risk categories concurrently: phishing, passwords, bring your own device (BYOD), and company‐supplied laptops. Findings from a survey of 250 employees at a medium‐sized American information technology (IT) consulting firm identify higher‐risk employees across the four risk areas and provide new insights into the challenges organizations face when trying to ensure the protection of company data.
    June 22, 2017   doi: 10.1002/asi.23851   open full text
  • AI Anxiety.
    Deborah G. Johnson, Mario Verdicchio.
    Journal of the American Society for Information Science and Technology. June 22, 2017
    Recently a number of well‐known public figures have expressed concern about the future development of artificial intelligence (AI), by noting that AI could get out of control and affect human beings and society in disastrous ways. Many of these cautionary notes are alarmist and unrealistic, and while there has been some pushback on these concerns, the deep flaws in the thinking that leads to them have not been called out. Much of the fear and trepidation is based on misunderstanding and confusion about what AI is and can ever be. In this work we identify 3 factors that contribute to this “AI anxiety”: an exclusive focus on AI programs that leaves humans out of the picture, confusion about autonomy in computational entities and in humans, and an inaccurate conception of technological development. With this analysis we argue that there are good reasons for anxiety about AI but not for the reasons typically given by AI alarmists.
    June 22, 2017   doi: 10.1002/asi.23867   open full text
  • Measuring text difficulty using parse‐tree frequency.
    David Kauchak, Gondy Leroy, Alan Hogue.
    Journal of the American Society for Information Science and Technology. June 20, 2017
    Text simplification often relies on dated, unproven readability formulas. As an alternative and motivated by the success of term familiarity, we test a complementary measure: grammar familiarity. Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree and is useful for evaluating individual sentences. We created a database of 140K unique 3rd level parse structures by parsing and binning all 5.4M sentences in English Wikipedia. We then calculated the grammar frequencies across the corpus and created 11 frequency bins. We evaluate the measure with a user study and corpus analysis. For the user study, we selected 20 sentences randomly from each bin, controlling for sentence length and term frequency, and recruited 30 readers per sentence (N = 6,600) on Amazon Mechanical Turk. We measured actual difficulty (comprehension) using a Cloze test, perceived difficulty using a 5‐point Likert scale, and time taken. Sentences with more frequent grammatical structures, even with very different surface presentations, were easier to understand, perceived as easier, and took less time to read. Outcomes from readability formulas correlated with perceived but not with actual difficulty. Our corpus analysis shows how the metric can be used to understand grammar regularity in a broad range of corpora.
    June 20, 2017   doi: 10.1002/asi.23855   open full text
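
The grammar-familiarity measure described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: parse trees are assumed to be already available as nested tuples of the form `(label, child, ...)`, the three-sentence corpus is a toy, and the helper names `truncate` and `grammar_frequencies` are hypothetical. Each tree is cut at depth 3, and the number of times the resulting structure occurs in the corpus serves as its familiarity.

```python
from collections import Counter


def truncate(tree, depth=3):
    """Return the structure of `tree` down to `depth` levels as a string.

    A tree is a nested tuple (label, child1, child2, ...). Leaves (words)
    are plain strings and are dropped, so only grammar labels remain.
    """
    if isinstance(tree, str):
        return None  # a word, not a grammatical node
    label, children = tree[0], tree[1:]
    if depth == 1:
        return label
    kept = [truncate(child, depth - 1) for child in children]
    kept = [k for k in kept if k is not None]
    return f"({label} {' '.join(kept)})" if kept else label


def grammar_frequencies(trees, depth=3):
    """Count how often each depth-limited structure occurs in `trees`."""
    return Counter(truncate(t, depth) for t in trees)


# Toy corpus (assumed, for illustration only).
corpus = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBZ", "barks"))),
    ("S", ("NP", ("DT", "a"), ("NN", "cat")), ("VP", ("VBZ", "sleeps"))),
    ("S", ("VP", ("VB", "run"))),
]
freq = grammar_frequencies(corpus)
for structure, count in freq.most_common():
    print(count, structure)
```

The first two sentences differ in every word but share one depth-3 structure, which is the sense in which sentences with "very different surface presentations" can have the same grammar frequency.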
  • Online disclosure of illicit information: Information behaviors in two drug forums.
    Kaitlin L. Costello, John D. Martin, Ashlee Edwards Brinegar.
    Journal of the American Society for Information Science and Technology. June 14, 2017
    Although people disclose illicit activities such as drug use online, we currently know little about what information people choose to disclose and share or whether there are differences in behavior depending on the illicit activity being disclosed. This exploratory mixed‐methods study examines how people discuss and disclose the use of two different drugs—marijuana and opioids—on Reddit. In this study, hermeneutic content analysis is employed to describe the type of comments people make in forums dedicated to discussions about illicit drugs. With inductive analysis, seven categories of comments were identified: disclosure, instruction and advice, culture, community norms, moralizing, legality, and banter. Our subsequent quantitative analysis indicates that although the amounts of disclosure are similar in each subreddit, there are more instances of instruction and advice in discussions about opiates, and more examples of banter in comments about marijuana use. In fact, both subreddits have high rates of banter. We argue that banter fosters disclosure in both subreddits, and that banter and disclosure are linked with information‐seeking behaviors in online forums. This work has implications for future explorations of disclosure online and for public health interventions aimed at disseminating credible information about drug use to at‐risk individuals.
    June 14, 2017   doi: 10.1002/asi.23880   open full text
  • Format technology lifecycle analysis.
    Kresimir Duretec, Christoph Becker.
    Journal of the American Society for Information Science and Technology. June 14, 2017
    The lifecycles of format technology have been a defining concern for digital stewardship research and practice. However, little evidence exists to provide robust methods for assessing the state of any given format technology and describing its evolution over time. This article introduces relevant models from diffusion theory and market research and presents a replicable analysis method to compute models of technology evolution. Data cleansing and the combination of multiple data sources enable the application of nonlinear regression to estimate the parameters of the Bass diffusion model on format technology market lifecycles. Through its application to a longitudinal data set from the UK Web Archive, we demonstrate that the method produces reliable results and show that the Bass model can be used to describe format lifecycles. By analyzing adoption patterns across market segments, new insights are inferred about how the diffusion of formats and products such as applications occurs over time. The analysis provides a stepping stone to a more robust and evidence‐based approach to model technology evolution.
    June 14, 2017   doi: 10.1002/asi.23881   open full text
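
As a rough illustration of the approach in the abstract above, the standard Bass diffusion model gives cumulative adoption as F(t) = m(1 - e^{-(p+q)t}) / (1 + (q/p)e^{-(p+q)t}), where p is the coefficient of innovation, q the coefficient of imitation, and m the market potential. The sketch below evaluates that curve and recovers p and q from synthetic data. The coarse grid search is a simple stand-in for the nonlinear regression the abstract mentions; the grid ranges, the synthetic data, and the function names are assumptions, and none of this reflects the authors' actual code or the UK Web Archive data.

```python
import math


def bass_cumulative(t, p, q, m=1.0):
    """Cumulative adopters at time t under the Bass diffusion model."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def fit_bass(times, observed, m=1.0, steps=50):
    """Coarse grid search for the (p, q) pair minimising squared error.

    A stand-in for nonlinear regression; the grid ranges are assumptions.
    """
    best = None
    for i in range(1, steps + 1):
        p = 0.1 * i / steps        # p in (0, 0.1]
        for j in range(1, steps + 1):
            q = 1.0 * j / steps    # q in (0, 1.0]
            sse = sum((bass_cumulative(t, p, q, m) - y) ** 2
                      for t, y in zip(times, observed))
            if best is None or sse < best[0]:
                best = (sse, p, q)
    return best[1], best[2]


# Synthetic observations generated from known parameters (illustrative only).
times = list(range(0, 21))
true_p, true_q = 0.03, 0.4
observed = [bass_cumulative(t, true_p, true_q) for t in times]

p_hat, q_hat = fit_bass(times, observed)
print(f"estimated p={p_hat:.3f}, q={q_hat:.3f}")
```

On real, noisy data a proper least-squares optimiser would normally replace the grid search; the grid keeps the sketch dependency-free.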
  • Shangri–La: A medical case–based retrieval tool.
    Alba G. Seco de Herrera, Roger Schaer, Henning Müller.
    Journal of the American Society for Information Science and Technology. June 14, 2017
    Large amounts of medical visual data are produced in hospitals daily and made available continuously via publications in the scientific literature, representing the medical knowledge. However, it is not always easy to find the desired information and in clinical routine the time to fulfil an information need is often very limited. Information retrieval systems are a useful tool to provide access to these documents/images in the biomedical literature related to information needs of medical professionals. Shangri–La is a medical retrieval system that can potentially help clinicians to make decisions on difficult cases. It retrieves articles from the biomedical literature when querying a case description and attached images. The system is based on a multimodal retrieval approach with a focus on the integration of visual information connected to text. The approach includes a query–adaptive multimodal fusion criterion that analyses if visual features are suitable to be fused with text for the retrieval. Furthermore, image modality information is integrated in the retrieval step. The approach is evaluated using the ImageCLEFmed 2013 medical retrieval benchmark and can thus be compared to other approaches. Results show that the final approach outperforms the best multimodal approach submitted to ImageCLEFmed 2013.
    June 14, 2017   doi: 10.1002/asi.23858   open full text
  • The mood of Chinese Pop music: Representation and recognition.
    Xiao Hu, Yi‐Hsuan Yang.
    Journal of the American Society for Information Science and Technology. June 13, 2017
    Music mood recognition (MMR) has attracted much attention in music information retrieval research, yet there are few MMR studies that focus on non‐Western music. In addition, little has been done on connecting the 2 most adopted music mood representation models: categorical and dimensional. To bridge these gaps, we constructed a new data set consisting of 818 Chinese Pop (C‐Pop) songs, 3 complete sets of mood annotations in both representations, as well as audio features corresponding to 5 distinct categories of musical characteristics. The mood space of C‐Pop songs was analyzed and compared to that of Western Pop songs. We also explored the relationship between categorical and dimensional annotations and the results revealed that one set of annotations could be reliably predicted by the other. Classification and regression experiments were conducted on the data set, providing benchmarks for future research on MMR of non‐Western music. Based on these analyses, we reflect and discuss the implications of the findings to MMR research.
    June 13, 2017   doi: 10.1002/asi.23813   open full text
  • Using interactive “Nutrition labels” for financial products to assist decision making under uncertainty.
    Junius Gunaratne, Oded Nov.
    Journal of the American Society for Information Science and Technology. June 08, 2017
    Product information labels can help users understand complex information, leading them to make better decisions. One area where consumers are particularly prone to make costly decision‐making errors is long‐term saving, which requires understanding of complex concepts such as uncertainty and trade‐offs. Although most people are poorly equipped to deal with such concepts, interactive design can potentially help users make better decisions. We developed an interactive information label to assist consumers with retirement saving decision‐making. To evaluate it, we exposed 450 users to one of four user interface conditions in a retirement saving simulator where they made 35 yearly decisions under changing circumstances. We found a significantly better ability of users to reach their goals with the information label. Furthermore, users who interacted with the label made better decisions than those who were presented with a static information label. Lastly, we found the label particularly effective in helping novice savers.
    June 08, 2017   doi: 10.1002/asi.23844   open full text
  • Large‐scale extraction of drug–disease pairs from the medical literature.
    Pengwei Wang, Tianyong Hao, Jun Yan, Lianwen Jin.
    Journal of the American Society for Information Science and Technology. June 06, 2017
    Automatic extraction of large‐scale and accurate drug–disease pairs from the medical literature plays an important role in drug repurposing. However, many existing extraction methods are supervised, and manually labeling drug–disease pair datasets is costly and time‐consuming. Meanwhile, many drug–disease pairs remain buried in free text. In this work, we first leverage a pattern‐based method to automatically extract drug–disease pairs with treatment and inducement relationships from free text. Then, to reflect a drug–disease relation, a network embedding algorithm is proposed to calculate the degree of correlation of a drug–disease pair. In the experiments, we use the method to extract treatment and inducement drug–disease pairs from 27 million medical abstracts and titles available on PubMed. We extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug–disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug–disease pairs. In addition, our proposed information network embedding algorithm can efficiently reflect the degree of correlation of drug–disease pairs. Our algorithm achieves a precision of 0.802 and a recall of 0.783 in the fine‐grained evaluation of extracting frequent pairs.
    June 06, 2017   doi: 10.1002/asi.23876   open full text
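
To make the idea of pattern-based extraction and its precision/recall evaluation concrete, here is a minimal sketch. The two regular expressions, the example sentences, and the tiny gold set are all hypothetical and chosen for illustration; the abstract does not list the authors' actual patterns, and the network embedding step is not covered here.

```python
import re

# Hypothetical surface patterns; real systems would use many more.
TREATMENT = re.compile(
    r"(?P<drug>\w+) (?:is used to treat|treats) (?P<disease>\w+)")
INDUCEMENT = re.compile(r"(?P<drug>\w+)-induced (?P<disease>\w+)")


def extract_pairs(text):
    """Return a set of (drug, disease, relation) triples found in `text`."""
    pairs = set()
    for m in TREATMENT.finditer(text):
        pairs.add((m["drug"].lower(), m["disease"].lower(), "treatment"))
    for m in INDUCEMENT.finditer(text):
        pairs.add((m["drug"].lower(), m["disease"].lower(), "inducement"))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of `predicted` against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy text and gold annotations (assumed, for illustration only).
text = ("Metformin is used to treat diabetes. "
        "We report a case of heparin-induced thrombocytopenia. "
        "Aspirin treats headache.")
gold = {
    ("metformin", "diabetes", "treatment"),
    ("aspirin", "headache", "treatment"),
    ("heparin", "thrombocytopenia", "inducement"),
    ("warfarin", "bleeding", "inducement"),  # not mentioned in the toy text
}

predicted = extract_pairs(text)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every extracted pair is correct but one gold pair is missed, so precision exceeds recall, the same direction of gap the abstract reports for its frequent-pair evaluation.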
  • News stories as evidence for research? BBC citations from articles, Books, and Wikipedia.
    Kayvan Kousha, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    Although news stories target the general public and are sometimes inaccurate, they can serve as sources of real‐world information for researchers. This article investigates the extent to which academics exploit journalism using content and citation analyses of online BBC News stories cited by Scopus articles. A total of 27,234 Scopus‐indexed publications have cited at least one BBC News story, with a steady annual increase. Citations from the arts and humanities (2.8% of publications in 2015) and social sciences (1.5%) were more likely than citations from medicine (0.1%) and science (<0.1%). Surprisingly, half of the sampled Scopus‐cited science and technology (53%) and medicine and health (47%) stories were based on academic research, rather than otherwise unpublished information, suggesting that researchers have chosen a lower‐quality secondary source for their citations. Nevertheless, the BBC News stories that were most frequently cited by Scopus, Google Books, and Wikipedia introduced new information from many different topics, including politics, business, economics, statistics, and reports about events. Thus, news stories are mediating real‐world knowledge into the academic domain, a potential cause for concern.
    June 05, 2017   doi: 10.1002/asi.23862   open full text
  • Exploring characteristics of highly cited authors according to citation location and content.
    Juyoung An, Namhee Kim, Min‐Yen Kan, Muthu Kumar Chandrasekaran, Min Song.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    Big Science and cross‐disciplinary collaborations have reshaped the intellectual structure of research areas. A number of works have tried to uncover this hidden intellectual structure by analyzing citation contexts. However, none of them analyzed citations by logical document structures such as sections. The two major goals of this study are to find characteristics of authors who are highly cited section‐wise and to identify the differences in section‐wise author networks. This study uses 29,158 research articles culled from the ACL Anthology, which hosts articles on computational linguistics and natural language processing. We find that the distribution of citations across sections is skewed and that a different set of highly cited authors share distinct academic characteristics, according to their citation locations. Furthermore, the author networks based on citation context similarity reveal that the intellectual structure of a domain differs across different sections.
    June 05, 2017   doi: 10.1002/asi.23834   open full text
  • Analysis of roles in engaging contentious online discussions in science.
    Noriko Hara, Madelyn Rose Sanfilippo.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    The prevalence of sites in which users can contribute content increases ordinary citizens' participation in emerging forms of knowledge sharing. This article investigates the practices associated with the roles of participants who actively contribute to the coproduction of knowledge in three online communities and how these roles differ in controversial and noncontroversial threads. The Measles, Mumps, and Rubella (MMR) vaccine was selected as a contentious scientific topic because of persistent belief about an alleged link between the vaccine and autism. Contributions to three online communities that engage mothers with young children were analyzed to identify participant roles. No consistent roles were evident in noncontroversial threads, but the role of mediator consistently appeared in controversial threads in all three communities. This study helps to articulate the roles played in online communities that engage in knowledge collaboration. The variety of roles in online communities has implications for both the study of practice and the design of information technologies.
    June 05, 2017   doi: 10.1002/asi.23850   open full text
  • Personal‐discount sensitivity prediction for mobile coupon conversion optimization.
    Asnat Greenstein‐Messica, Lior Rokach, Asaf Shabtai.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    The high adoption of smart mobile devices among consumers provides an opportunity for e‐commerce retailers to increase their sales by offering consumers real‐time, personalized coupons that take into account the specific contextual situation of the consumer. Although context‐aware recommender systems (CARS) have been widely analyzed, personalized pricing or discount optimization in recommender systems to improve recommendations' accuracy and commercial KPIs has hardly been researched. This article studies how to model user‐item personalized discount sensitivity and incorporate it into a real‐time contextual recommender system in such a way that it can be integrated into a commercial service. We propose a novel approach for modeling context‐aware user‐item personalized discount sensitivity in a sparse data scenario and present a new CARS algorithm that combines coclustering and random forest classification (CBRF) to incorporate the personalized discount sensitivity. We conducted an experimental study with real consumers and mobile discount coupons to evaluate our solution. We compared the CBRF algorithm to the widely used context‐aware matrix factorization (CAMF) algorithm. The experimental results suggest that incorporating personalized discount sensitivity significantly improves the consumption prediction accuracy and that the suggested CBRF algorithm provides better prediction results for this use case.
    June 05, 2017   doi: 10.1002/asi.23838   open full text
  • Learning and adapting user criteria for recommending followees in social networks.
    Antonela Tommasel, Daniela Godoy.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    The accurate suggestion of interesting friends arises as a crucial issue in recommendation systems. The selection of friends or followees responds to several reasons whose importance might differ according to the characteristics and preferences of each user. Furthermore, those preferences might also change over time. Consequently, understanding how friends or followees are selected emerges as a key design factor of strategies for personalized recommendations. In this work, we argue that the criteria for recommending followees need to be adapted and combined according to each user's behavior, preferences, and characteristics. A method is proposed for adapting such criteria to the characteristics of the previously selected followees. Moreover, the criteria can evolve over time to adapt to changes in user behavior, and broaden the diversity of the recommendation of potential followees based on novelty. Experimental evaluation showed that the proposed method improved precision results regarding static criteria weighting strategies and traditional rank aggregation techniques.
    June 05, 2017   doi: 10.1002/asi.23861   open full text
  • The effects of credibility cues on the selection of search engine results.
    Julian Unkel, Alexander Haas.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    Web search engines act as gatekeepers when people search for information online. Research has shown that search engine users seem to trust the search engines' ranking uncritically and mostly select top‐ranked results. This study further examines search engine users' selection behavior. Drawing from the credibility and information research literature, we test whether the presence or absence of certain credibility cues influences the selection probability of search engine results. In an observational study, participants (N = 247) completed two information research tasks on preset search engine results pages, on which three credibility cues (source reputation, message neutrality, and social recommendations) as well as the search result ranking were systematically varied. The results of our study confirm the significance of the ranking. Of the three credibility cues, only reputation had an additional effect on selection probabilities. Personal characteristics (prior knowledge about the researched issues, search engine usage patterns, etc.) did not influence the preference for search results linked with certain credibility cues. These findings are discussed in light of situational and contextual characteristics (e.g., involvement, low‐cost scenarios).
    June 05, 2017   doi: 10.1002/asi.23820   open full text
  • Does it matter how you play? The effects of collaboration and competition among players of human computation games.
    Ei Pa Pa Pe‐Than, Dion Hoe‐Lian Goh, Chei Sian Lee.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    Human computation games (HCGs) harness human intelligence through enjoyable gameplay to address computational problems that are beyond the power of computer programs but trivial for humans. With the popularity of crowdsourcing, different types of HCGs have been developed using various gameplay mechanics to attract online users to contribute outputs. Two commonly used mechanics are collaboration and competition. Yet there is little research examining whether HCGs perform better than nongame applications in terms of motivations and perceptions. Thus, this study investigates the effects of collaborative and competitive mechanics on intrinsic motivation and perceived output quality in mobile content sharing HCGs. Using a within‐subjects experiment, 160 participants were recruited from 2 local universities. The findings suggest that the nongame application was perceived to yield better quality output than both HCGs, but the latter offered a greater satisfaction of motivational needs, which may motivate individuals to continue playing them. Taken together, the present findings inform researchers and designers of HCGs that games could serve as a motivator to encourage participation. However, the usefulness of HCGs may be dependent on how one can effectively manage the entertainment–output generation duality of such games. This article concludes by presenting implications, limitations, and future research directions.
    June 05, 2017   doi: 10.1002/asi.23863   open full text
  • Effects of language and terminology of query suggestions on medical accuracy considering different user characteristics.
    Carla Teixeira Lopes, Dagmara Paiva, Cristina Ribeiro.
    Journal of the American Society for Information Science and Technology. June 05, 2017
    Searching for health information is one of the most popular activities on the web. In this domain, users often misspell or lack knowledge of the proper medical terms to use in queries. To overcome these difficulties and attempt to retrieve higher‐quality content, we developed a query suggestion system that provides alternative queries combining the Portuguese or English language with lay or medico‐scientific terminology. Here we evaluate this system's impact on the medical accuracy of the knowledge acquired during the search. Evaluation shows that simply providing these suggestions helps reduce the quantity of incorrect content. This indicates that even when suggestions are not clicked, they are useful either for subsequent queries' formulation or for interpreting search results. Clicking on suggestions, regardless of type, leads to answers with more correct content. An analysis by type of suggestion and user characteristics showed that the benefits of certain languages and terminologies are more perceptible in users with certain levels of English proficiency and health literacy. This suggests personalizing the suggestion system according to these characteristics. Overall, the effect of language is more preponderant than the effect of terminology. Clicks on English suggestions are clearly preferable to clicks on Portuguese ones.
    June 05, 2017   doi: 10.1002/asi.23874   open full text
  • Goodreads reviews to assess the wider impacts of books.
    Kayvan Kousha, Mike Thelwall, Mahshid Abdoli.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    Although peer‐review and citation counts are commonly used to help assess the scholarly impact of published research, informal reader feedback might also be exploited to help assess the wider impacts of books, such as their educational or cultural value. The social website Goodreads seems to be a reasonable source for this purpose because it includes a large number of book reviews and ratings by many users inside and outside of academia. To check this, Goodreads book metrics were compared with different book‐based impact indicators for 15,928 academic books across broad fields. Goodreads engagements were numerous enough in the arts (85% of books had at least one), humanities (80%), and social sciences (67%) for use as a source of impact evidence. Low and moderate correlations between Goodreads book metrics and scholarly or non‐scholarly indicators suggest that reader feedback in Goodreads reflects the many purposes of books rather than a single type of impact. Although Goodreads book metrics can be manipulated, they could be used guardedly by academics, authors, and publishers in evaluations.
    June 01, 2017   doi: 10.1002/asi.23805   open full text
  • SlideShare presentations, citations, users, and trends: A professional site with academic and educational uses.
    Mike Thelwall, Kayvan Kousha.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    SlideShare is a free social website that aims to help users distribute and find presentations. Owned by LinkedIn since 2012, it targets a professional audience but may give value to scholarship through creating a long‐term record of the content of talks. This article tests this hypothesis by analyzing sets of general and scholarly related SlideShare documents using content and citation analysis and popularity statistics reported on the site. The results suggest that academics, students, and teachers are a minority of SlideShare uploaders, especially since 2010, with most documents not being directly related to scholarship or teaching. About two thirds of uploaded SlideShare documents are presentation slides, with the remainder often being files associated with presentations or video recordings of talks. SlideShare is therefore a presentation‐centered site with a predominantly professional user base. Although a minority of the uploaded SlideShare documents are cited by, or cite, academic publications, probably too few articles are cited by SlideShare to consider extracting SlideShare citations for research evaluation. Nevertheless, scholars should consider SlideShare to be a potential source of academic and nonacademic information, particularly in library and information science, education, and business.
    June 01, 2017   doi: 10.1002/asi.23815   open full text
  • Scientific evolutionary pathways: Identifying and visualizing relationships for scientific topics.
    Yi Zhang, Guangquan Zhang, Donghua Zhu, Jie Lu.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    Whereas traditional science maps emphasize citation statistics and static relationships, this paper presents a term‐based method to identify and visualize the evolutionary pathways of scientific topics in a series of time slices. First, we create a data preprocessing model for accurate term cleaning, consolidating, and clustering. Then we construct a simulated data streaming function and introduce a learning process to train a relationship identification function to adapt to changing environments in real time, where relationships of topic evolution, fusion, death, and novelty are identified. The main result of the method is a map of scientific evolutionary pathways. The visual routines provide a way to indicate the interactions among scientific subjects and a version in a series of time slices helps further illustrate such evolutionary pathways in detail. The detailed outline offers sufficient statistical information to delve into scientific topics and routines and then helps address meaningful insights with the assistance of expert knowledge. This empirical study focuses on scientific proposals granted by the United States National Science Foundation, and demonstrates the feasibility and reliability. Our method could be widely applied to a range of science, technology, and innovation policy research, and offer insight into the evolutionary pathways of scientific activities.
    June 01, 2017   doi: 10.1002/asi.23814   open full text
  • Graph‐based recommendation integrating rating history and domain knowledge: Application to on‐site guidance of museum visitors.
    Einat Minkov, Keren Kahanov, Tsvi Kuflik.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    Visitors to museums and other cultural heritage sites encounter a wealth of exhibits in a variety of subject areas, but can explore only a small number of them. Moreover, there typically exists rich complementary information that can be delivered to the visitor about exhibits of interest, but only a fraction of this information can be consumed during the limited time of the visit. Recommender systems may help visitors to cope with this information overload. Ideally, the recommender system of choice should model user preferences, as well as background knowledge about the museum's environment, considering aspects of physical and thematic relevancy. We propose a personalized graph‐based recommender framework, representing rating history and background multi‐facet information jointly as a relational graph. A random walk measure is applied to rank available complementary multimedia presentations by their relevancy to a visitor's profile, integrating the various dimensions. We report the results of experiments conducted using authentic data collected at the Hecht museum. An evaluation of multiple graph variants, compared with several popular and state‐of‐the‐art recommendation methods, indicates advantages of the graph‐based approach.
    June 01, 2017   doi: 10.1002/asi.23837   open full text
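
The graph-based ranking idea in the abstract above can be illustrated with a random walk with restart (personalized PageRank). This is one common random walk measure swapped in for illustration; the abstract does not say which measure the authors use. The tiny graph, the node names, and the restart probability are all assumptions.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    """Approximate visiting probabilities of a walk that restarts at `start`.

    `graph` maps each node to a list of neighbours; undirected edges are
    listed in both directions.
    """
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            neighbours = graph[node]
            if not neighbours:
                # Dead end: send the walking mass back to the start node.
                nxt[start] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        # With probability `restart` the walker jumps back to the start.
        nxt[start] += restart * sum(scores.values())
        scores = nxt
    return scores


# Hypothetical museum graph: a visitor, a rated exhibit, a theme, and
# three candidate multimedia presentations.
graph = {
    "visitor": ["exhibit_A"],
    "exhibit_A": ["visitor", "theme_glass", "pres_1"],
    "theme_glass": ["exhibit_A", "pres_2"],
    "pres_1": ["exhibit_A"],
    "pres_2": ["theme_glass"],
    "pres_3": [],  # unrelated to anything the visitor rated
}

scores = random_walk_with_restart(graph, "visitor")
presentations = [n for n in graph if n.startswith("pres_")]
ranked = sorted(presentations, key=scores.get, reverse=True)
print(ranked)
```

Presentations attached directly to a rated exhibit score higher than those reachable only through a shared theme, and unconnected ones score zero, which is how a single walk can combine rating history with background knowledge.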
  • Factors that influence query reformulations and search performance in health information retrieval: A multilevel modeling approach.
    Kun Lu, Soohyung Joo, Taehun Lee, Rong Hu.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    Query reformulations can occur multiple times in a session, and queries observed in the same session tend to be related to each other. Due to the interdependent nature of queries in a session, it has been challenging to analyze query reformulation data while controlling for possible dependencies among queries. This study proposes a multilevel modeling approach to analyze, within a single research model, the effects of contextual factors and system features on types of query reformulation, as well as the relationship between types of query reformulation and search performance. The results revealed that system features and users' educational background significantly influence users' query reformulation behaviors. Also, types of query reformulation had a significant impact on search performance. The main contribution of this study is its adoption of the multilevel modeling method to analyze query reformulation behavior while considering the nested structure of search session data. Multilevel analysis enables us to design an extensible research model that includes both session‐level and action‐level factors, which provides a fuller understanding of the relationships among factors that influence query reformulation behavior and search performance. The multilevel modeling used in this study has practical implications for future query reformulation studies.
    June 01, 2017   doi: 10.1002/asi.23872   open full text
  • How does the world connect? Exploring the global diffusion of social network sites.
    Gregory D. Larosiliere, Lemuria D. Carter, Christian Meske.
    Journal of the American Society for Information Science and Technology. June 01, 2017
    This study explores the main determinants of social network adoption at the country level. We use the technology‐organization‐environment (TOE) framework to investigate factors influencing social network adoption. The authors use cross‐sectional data from 130 countries. The results indicate that social network adoption, at the country level, is positively influenced by the technological maturity, public readiness, and information and communication technology law sophistication. Technological, organizational, and environmental factors altogether accounted for 67% of variance in social network adoption. These findings provide a first insight into the usage of social network sites at the country level, as well as the main factors that influence public adoption. Implications for research and practice are discussed.
    June 01, 2017   doi: 10.1002/asi.23804   open full text
  • Online serendipity: A contextual differentiation of antecedents and outcomes.
    Christoph Lutz, Christian Pieter Hoffmann, Miriam Meckel.
    Journal of the American Society for Information Science and Technology. May 30, 2017
    Critics worry that algorithmic filtering could lead to overly polished, homogeneous web experiences. “Serendipity,” in turn, has been touted as an antidote. Yet, the desirability of serendipity could vary by context, as users may be more or less receptive depending on the services they employ. We propose a nomological model of online serendipity experiences, conceptualizing both cognitive and behavioral antecedents. Based on a survey of 1,173 German Internet users, we conduct structural equation modeling and multigroup analyses to differentiate the antecedents and effects of serendipity across three distinct contexts: online shopping, information services, and social networking sites. Our findings confirm that antecedents and outcomes of digital serendipity vary by context, with serendipity only being associated with user satisfaction in the context of social network sites.
    May 30, 2017   doi: 10.1002/asi.23771   open full text
  • Task complexity and difficulty in music information retrieval.
    Xiao Hu, Noriko Kando.
    Journal of the American Society for Information Science and Technology. May 30, 2017
    There has been little research on task complexity and difficulty in music information retrieval (MIR), whereas many studies in the text retrieval domain have found that task complexity and difficulty have significant effects on user effectiveness. This study aimed to bridge the gap by exploring i) the relationship between task complexity and difficulty; ii) factors affecting task difficulty; and iii) the relationship between task difficulty, task complexity, and user search behaviors in MIR. An empirical user experiment was conducted with 51 participants and a novel MIR system. The participants searched for 6 topics across 3 complexity levels. The results revealed that i) perceived task difficulty in music search is influenced by task complexity, user background, system affordances, and task uncertainty and enjoyability; and ii) perceived task difficulty in MIR is significantly correlated with effectiveness metrics such as the number of songs found, number of clicks, and task completion time. The findings have implications for the design of music search tasks (in research) or use cases (in system development) as well as future MIR systems that can detect task difficulty based on user effectiveness metrics.
    May 30, 2017   doi: 10.1002/asi.23803   open full text
  • Texts as actions: Requests in online chats between reference librarians and library patrons.
    Alan Zemel.
    Journal of the American Society for Information Science and Technology. May 28, 2017
    Virtual reference services provide opportunities for library patrons to produce requests of reference librarians through quasi‐synchronous computer‐mediated exchanges in which requests and deliverables are produced as online textual objects. Text postings only become the actions they perform, such as an information request or deliverable, through the recipient's work of reading. Text postings thus are designed for their recipients and are built in ways that instruct particular readings. In this paper, I show that patron requests are interactional achievements co‐constituted by librarians and patrons through the exchange of text postings that are designed to be seen as requests. The Reference and User Services Association offers guidelines for online interactions between librarians and patrons. However, such guidelines provide only general recommendations by which librarians may overcome difficulties in identifying the specific information needs of patrons. I examine actual chat logs of virtual reference interactions and describe how librarians engage with patrons to co‐construct actionable requests to specify and fulfill patron information needs. Conversation analytic methods are used to identify the way texts are produced to instruct recipients in the ways they are to be read and how these texts serve, through reading's work, as an analysis of the actions prior texts perform.
    May 28, 2017   doi: 10.1002/asi.23819   open full text
  • An analysis of 14 Million tweets on hashtag‐oriented spamming*.
    Surendra Sedhai, Aixin Sun.
    Journal of the American Society for Information Science and Technology. May 28, 2017
    Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spam and ham (i.e., nonspam) labels, to understand spamming activities on Twitter. The primary focus of this paper is to analyze various aspects of spam on Twitter based on hashtags, tweet contents, and user profiles, which are useful for both tweet‐level and user‐level spam detection. First, we compare the usage of hashtags in spam and ham tweets based on frequency, position, orthography, and co‐occurrence. Second, for content‐based analysis, we analyze the variations in word usage, metadata, and near‐duplicate tweets. Third, for user‐based analysis, we investigate user profile information. In our study, we validate that spammers use popular hashtags to promote their tweets. We also observe differences in the usage of words in spam and ham tweets. Spam tweets are more likely to be emphasized using exclamation points and capitalized words. Furthermore, we observe that spammers use multiple accounts to post near‐duplicate tweets to promote their services and products. Unlike spammers, legitimate users are likely to provide more information such as their locations and personal descriptions in their profiles. In summary, this study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming.
    May 28, 2017   doi: 10.1002/asi.23836   open full text
  • Matched control groups for modeling events in citation data: An illustration of nobel prize effects in citation networks.
    Rudolf Farys, Tobias Wolbring.
    Journal of the American Society for Information Science and Technology. May 26, 2017
    Bibliometric data are frequently used to study the effects of events, such as the honoring of a scholar with an award, and to investigate changes of citation impact over time. However, the number of yearly citations depends upon time for multiple reasons: a) general time trends in citation data, b) changing coverage of databases, c) individual citation life‐cycles, and d) selection on citation impact. Hence, it is often ill‐advised to simply compare the average number of citations before and after an event to estimate its causal effect. Using a recent publication in this journal on the potential citation chain reaction of a Nobel Prize, we demonstrate that a simple pre‐post comparison can lead to biased and misleading results. We propose using matched control groups to improve causal inference and illustrate that the inclusion of a tailor‐made synthetic control group in the statistical analysis helps to avoid methodological artifacts. Our results suggest that there is neither a Nobel Prize effect as regards citation impact of the Nobel laureate under investigation nor a related chain reaction in the citation network, as suggested in the original study. Finally, we explain that these methodological recommendations extend far beyond the study of Nobel Prize effects in citation data.
    May 26, 2017   doi: 10.1002/asi.23802   open full text
  • Extracting fine‐grained location with temporal awareness in tweets: A two‐stage approach.
    Chenliang Li, Aixin Sun.
    Journal of the American Society for Information Science and Technology. May 26, 2017
    Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short‐term visiting histories or plans. Capturing users' short‐term activities could benefit many applications for providing the right context at the right time and location. In this paper we are interested in extracting locations mentioned in tweets at fine‐grained granularity, with temporal awareness. Specifically, we recognize the points‐of‐interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine‐grained location. Our proposed framework, named TS‐Petar (Two‐Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two‐stage time‐aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, commonly observed in Foursquare check‐ins. The time‐aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time‐aware POI extraction task. Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS‐Petar against several strong baseline methods. The experimental results demonstrate that the two‐stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency.
    May 26, 2017   doi: 10.1002/asi.23816   open full text
  • Boosting attribute recognition with latent topics by matrix factorization.
    Zhuo Su, Donghui Li, Hanhui Li, Xiaonan Luo.
    Journal of the American Society for Information Science and Technology. May 26, 2017
    Attribute‐based approaches have recently attracted much attention in visual recognition tasks. These approaches describe images by using semantic attributes as the mid‐level feature. However, low recognition accuracy becomes the biggest barrier that limits their practical applications. In this paper, we propose a novel framework termed Boosting Attribute Recognition (BAR) for the image recognition task. Our framework stems from matrix factorization, and can explore latent relationships from the aspect of attribute and image simultaneously. Furthermore, to apply our framework in large‐scale visual recognition tasks, we present both offline and online learning implementation of the proposed framework. Extensive experiments on 3 data sets demonstrate that our framework achieves a sound accuracy of attribute recognition.
    May 26, 2017   doi: 10.1002/asi.23827   open full text
  • Person Name Disambiguation in the Web Using Adaptive Threshold Clustering.
    Agustín D. Delgado, Raquel Martínez, Soto Montalvo, Víctor Fresno.
    Journal of the American Society for Information Science and Technology. May 25, 2017
    In this article, we present a new clustering algorithm for Person Name Disambiguation in web search results. The algorithm groups web results according to the individuals they refer to. The best state‐of‐the‐art approaches require training data in order to learn thresholds for deciding when to group the webpages. However, the ambiguity level of person names on the web could not be previously estimated and the results of those methods strongly depend on the thresholds obtained with the training collections. We present the concept of adaptive threshold, which avoids the need for a prior supervised learning process, depending only on the content of the compared documents to decide if they refer to the same person. We evaluated our approach using three datasets, reaching results close to those obtained by the most successful state‐of‐the‐art algorithms that require such a learning process, and outperforming algorithms that do not require it.
    May 25, 2017   doi: 10.1002/asi.23810   open full text
  • Rhetoric and the cold war politics of information science.
    Nathan R. Johnson.
    Journal of the American Society for Information Science and Technology. May 17, 2017
    Histories of information help clarify the values and intellectual commitments of the discipline. This study takes a rhetorical history approach to better understand the development of information studies as a discipline. Information studies historians have identified that the Cold War period was critical for the development of information science and consequently of its modern‐day incarnations. Due to post‐World War II prosperity, the 1960s saw a surge in interest in scientific and technical information. Many from government, education, and private sectors took interest in developing new ways to compete with Soviet science. This interest led to the National Science Foundation (NSF)‐funded Georgia Tech conferences of 1961 and 1962, which are analyzed here. I find that concerns about the “science information problem” provided language that was critical for transforming some of the information studies’ central concepts. In particular, I find that the idea of an “information scientist” was made possible by national funding for science information. I suggest that attending to the discursive traffic between public and disciplinary discourse of information studies can better attune the field to its intellectual commitments.
    May 17, 2017   doi: 10.1002/asi.23866   open full text
  • Ensemble analysis of topical journal ranking in bioinformatics.
    Min Song, SuYeon Kim, Keeheon Lee.
    Journal of the American Society for Information Science and Technology. May 07, 2017
    Journal rankings, frequently determined by the journal impact factor or similar indices, are quantitative measures for evaluating a journal's performance in its discipline, which is presently a major research thrust in the bibliometrics field. Recently, text mining was adopted to augment journal ranking‐based evaluation with the content analysis of a discipline taking a time‐variant factor into consideration. However, previous studies focused mainly on a silo analysis of a discipline using either citation‐ or content‐oriented approaches, and no attempt was made to analyze topical journal ranking and its change over time in a seamless and integrated manner. To address this issue, we propose a journal‐time‐topic model, an extension of Dirichlet multinomial regression, which we applied to the field of bioinformatics to understand journal contribution to topics in a field and the shift of topic trends. The journal‐time‐topic model allows us to identify which journals are the major leaders in what topics and the manner in which their topical focus shifts. It also helps reveal an interesting distinct pattern in the journal impact factor of high‐ and low‐ranked journals. The study results shed new light on topic‐specific journal rankings and shifts in journals' concentration on a subject.
    May 07, 2017   doi: 10.1002/asi.23840   open full text
  • “A greatly unexplored area”: Digital curation and innovation in digital humanities.
    Alex H. Poole.
    Journal of the American Society for Information Science and Technology. May 05, 2017
    New types of digital data, tools, and methods, for instance those that cross academic disciplines and domains, those that feature teams instead of single scholars, and those that involve individuals from outside the academy, enable new forms of scholarship and teaching in digital humanities. Such scholarship promotes reuse of digital data, provokes new research questions, and cultivates new audiences. Digital curation, the process of managing a trusted body of information for current and future use, helps maximize the value of research in digital humanities. Predicated on semistructured interviews, this naturalistic case study explores the creation, use, storage, and planned reuse of data by 45 interviewees involved with 19 Office of Digital Humanities Start‐Up Grant (SUG) projects. Interviewees grappled with challenges surrounding data, collaboration and communication, planning and project management, awareness and outreach, resources, and technology. Overall, this study explores the existing digital curation practices and needs of scholars engaged in innovative digital humanities work and discerns how closely these practices and needs align with the digital curation literature.
    May 05, 2017   doi: 10.1002/asi.23743   open full text
  • Ontology for cultural variations in interpersonal communication: Building on theoretical models and crowdsourced knowledge.
    Dhavalkumar Thakker, Stan Karanasios, Emmanuel Blanchard, Lydia Lau, Vania Dimitrova.
    Journal of the American Society for Information Science and Technology. May 05, 2017
    The domain of cultural variations in interpersonal communication is becoming increasingly important in various areas, including human–human interaction (e.g., business settings) and human–computer interaction (e.g., during simulations, or with social robots). User‐generated content (UGC) in social media can provide an invaluable source of culturally diverse viewpoints for supporting the understanding of cultural variations. However, discovering and organizing UGC is notoriously challenging and laborious for humans, especially in ill‐defined domains such as culture. This calls for computational approaches to automate the UGC sensemaking process by using tagging, linking, and exploring. Semantic technologies allow automated structuring and qualitative analysis of UGC, but are dependent on the availability of an ontology representing the main concepts in a specific domain. For the domain of cultural variations in interpersonal communication, no ontological model exists. This paper presents the first such ontological model, called AMOn+, which defines cultural variations and enables tagging culture‐related mentions in textual content. AMOn+ is designed based on a novel interdisciplinary approach that combines theoretical models of culture with crowdsourced knowledge (DBpedia). An evaluation of AMOn+ demonstrated its fitness‐for‐purpose regarding domain coverage for annotating culture‐related concepts mentioned in text corpora. This ontology can underpin computational models for making sense of UGC.
    May 05, 2017   doi: 10.1002/asi.23824   open full text
  • Insight workflow: Systematically combining human and computational methods to explore textual data.
    Alastair J. Gill, Saba Hinrichs‐Krapels, Tobias Blanke, Jonathan Grant, Mark Hedges, Simon Tanner.
    Journal of the American Society for Information Science and Technology. May 03, 2017
    Analyzing large quantities of real‐world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real‐world data set as our case study, and use this exploration as a demonstration of our “insight workflow,” which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in‐depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    May 03, 2017   doi: 10.1002/asi.23767   open full text
  • Web citations in patents: Evidence of technological impact?
    Enrique Orduna‐Malea, Mike Thelwall, Kayvan Kousha.
    Journal of the American Society for Information Science and Technology. May 02, 2017
    Patents sometimes cite webpages either as general background to the problem being addressed or to identify prior publications that limit the scope of the patent granted. Counts of the number of patents citing an organization's website may therefore provide an indicator of its technological capacity or relevance. This article introduces methods to extract URL citations from patents and evaluates the usefulness of counts of patent web citations as a technology indicator. An analysis of patents citing 200 US universities or 177 UK universities found computer science and engineering departments to be frequently cited, as well as research‐related webpages, such as Wikipedia, YouTube, or the Internet Archive. Overall, however, patent URL citations seem to be frequent enough to be useful for ranking major US and the top few UK universities if popular hosted subdomains are filtered out, but the hit count estimates on the first search engine results page should not be relied upon for accuracy.
    May 02, 2017   doi: 10.1002/asi.23821   open full text
  • Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence?
    Eric P. S. Baumer, David Mimno, Shion Guha, Emily Quan, Geri K. Gay.
    Journal of the American Society for Information Science and Technology. April 28, 2017
    Researchers in information science and related areas have developed various methods for analyzing textual data, such as survey responses. This article describes the application of analysis methods from two distinct fields, one method from interpretive social science and one method from statistical machine learning, to the same survey data. The results show that the two analyses produce some similar and some complementary insights about the phenomenon of interest, in this case, nonuse of social media. We compare both the processes of conducting these analyses and the results they produce to derive insights about each method's unique advantages and drawbacks, as well as the broader roles that these methods play in the respective fields where they are often used. These insights allow us to make more informed decisions about the tradeoffs in choosing different methods for analyzing textual data. Furthermore, this comparison suggests ways that such methods might be combined in novel and compelling ways.
    April 28, 2017   doi: 10.1002/asi.23786   open full text
  • Political engagement and ICTs: Internet use in marginalized communities.
    David Nemer, Michail Tsikerdekis.
    Journal of the American Society for Information Science and Technology. April 28, 2017
    Information and communication technologies (ICTs) provide a distinctive structure of opportunities with the potential to promote political engagement. However, concerns remain over unequal technological access in our society, as political resources available on the internet empower those with the resources and motivation to take advantage of them, leaving those who are disengaged farther behind. Hence, those who face digital inequalities are not only deprived of the benefits of the so‐called Information Society, they are also deprived of exercising their civic rights. To promote political engagement among the marginalized, we analyze different sociotechnical factors that may play a role in promoting their inclusion in future political activities. We employed a survey for marginalized communities to analyze a set of research questions relating to sociotechnical factors. We show that online content creation, digital freedom, and access to the mobile Internet may positively impact political engagement. The development of these factors may not only promote the inclusion of marginalized populations in future political events, but also help to build a more equal society where everyone's voice has a chance to be heard.
    April 28, 2017   doi: 10.1002/asi.23779   open full text
  • A Multidimensional Investigation of the Effects of Publication Retraction on Scholarly Impact.
    Xin Shuai, Jason Rollins, Isabelle Moulinier, Tonya Custis, Mathilda Edmunds, Frank Schilder.
    Journal of the American Society for Information Science and Technology. April 26, 2017
    During the past few decades, the rate of publication retractions has increased dramatically in academia. In this study, we investigate retractions from a quantitative perspective, aiming to answer two fundamental questions. One, how do retractions influence the scholarly impact of retracted papers, authors, and institutions? Two, does this influence propagate to the wider academic community through scholarly associations? Specifically, we analyzed a set of retracted articles indexed in Thomson Reuters Web of Science (WoS), and ran multiple experiments to compare changes in scholarly impact against a control set of nonretracted articles, authors, and institutions. We further applied the Granger Causality test to investigate whether different scientific topics are dynamically affected by retracted papers occurring within those topics. Our results show two key findings: first, the scholarly impact of retracted papers and authors significantly decreases after retraction, and the most severe impact decrease correlates with retractions based on proven, purposeful scientific misconduct; second, this retraction penalty does not seem to spread through the broader scholarly social graph, but instead has a limited and localized effect. Our findings may provide useful insights for scholars or science committees to evaluate the scholarly value of papers, authors, or institutions related to retractions.
    April 26, 2017   doi: 10.1002/asi.23826   open full text
  • Don't be deceived: Using linguistic analysis to learn how to discern online review authenticity.
    Snehasish Banerjee, Alton Y. K. Chua, Jung‐Jae Kim.
    Journal of the American Society for Information Science and Technology. April 20, 2017
    This article uses linguistic analysis to help users discern the authenticity of online reviews. Two related studies were conducted using hotel reviews as the test case for investigation. The first study analyzed 1,800 authentic and fictitious reviews based on the linguistic cues of comprehensibility, specificity, exaggeration, and negligence. The analysis involved classification algorithms followed by feature selection and statistical tests. A filtered set of variables that helped discern review authenticity was identified. The second study incorporated these variables to develop a guideline that aimed to inform humans how to distinguish between authentic and fictitious reviews. The guideline was used as an intervention in an experimental setup that involved 240 participants. The intervention improved human ability to identify fictitious reviews amid authentic ones.
    April 20, 2017   doi: 10.1002/asi.23784   open full text
  • A local context‐aware LDA model for topic modeling in a document network.
    Yang Liu, Songhua Xu.
    Journal of the American Society for Information Science and Technology. April 13, 2017
    With the rapid development of the Internet and its applications, growing volumes of documents increasingly become interconnected to form large‐scale document networks. Accordingly, topic modeling in a network of documents has been attracting continuous research attention. Most of the existing network‐based topic models assume that topics in a document are influenced by its directly linked neighbouring documents in a document network and overlook the potential influence from indirectly linked ones. The existing work also has not carefully modeled variations of such influence among neighboring documents. Recognizing these modeling limitations, this paper introduces a novel Local Context‐Aware LDA Model (LC‐LDA), which is capable of observing a local context comprising a rich collection of documents that may directly or indirectly influence the topic distributions of a target document. The proposed model can also differentiate the respective influence of each document in the local context on the target document according to both structural and temporal relationships between the two documents. The proposed model is extensively evaluated through multiple document clustering and classification tasks conducted over several large‐scale document sets. Evaluation results clearly and consistently demonstrate the effectiveness and superiority of the new model with respect to several state‐of‐the‐art peer models.
    April 13, 2017   doi: 10.1002/asi.23822   open full text
  • Developments in research data management in academic libraries: Towards an understanding of research data service maturity.
    Andrew M. Cox, Mary Anne Kennan, Liz Lyon, Stephen Pinfield.
    Journal of the American Society for Information Science and Technology. March 25, 2017
    This article reports an international study of research data management (RDM) activities, services, and capabilities in higher education libraries. It presents the results of a survey covering higher education libraries in Australia, Canada, Germany, Ireland, the Netherlands, New Zealand, and the UK. The results indicate that libraries have provided leadership in RDM, particularly in advocacy and policy development. Service development is still limited, focused especially on advisory and consultancy services (such as data management planning support and data‐related training), rather than technical services (such as provision of a data catalog, and curation of active data). Data curation skills development is underway in libraries, but skills and capabilities are not consistently in place and remain a concern. Other major challenges include resourcing, working with other support services, and achieving “buy in” from researchers and senior managers. Results are compared with previous studies in order to assess trends and relative maturity levels. The range of RDM activities explored in this study is positioned on a “landscape maturity model,” which reflects current and planned research data services and practice in academic libraries, representing a “snapshot” of current developments and a baseline for future research.
    March 25, 2017   doi: 10.1002/asi.23781   open full text
  • Connecting theory and practice in digital humanities information work.
    Tanya E. Clement, Daniel Carter.
    Journal of the American Society for Information Science and Technology. March 25, 2017
    The omnipresence and escalating efficiency of digital, networked information systems alongside the resulting deluge of digital corpora, apps, software, and data has coincided with increased concerns in the humanities with new topics and methods of inquiry. In particular, digital humanities (DH), the subfield that has emerged as the site of most of this work, has received growing attention in higher education in recent years. This study seeks to facilitate a better understanding of digital humanities by studying the motivations and practices of digital humanists as information workers in the humanities. To this end, we observe information work through interviews with DH scholars about their work practices and through a survey of DH programs such as graduate degrees, certificates, minors, and training institutes. In this study we focus on how the goals behind methodology (a link between theories and method) surface in everyday DH work practices and in DH curricula in order to investigate if the critiques that have appeared in relation to DH information work are well founded and to suggest alternative narratives about information work in DH that will help advance the impact of the field in the humanities and beyond.
    March 25, 2017   doi: 10.1002/asi.23732   open full text
  • Types of personal information categorization: Rigid, fuzzy, and flexible.
    Kyong Eun Oh.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    This study aims to identify different styles of personal digital information categorization based on the mindscape of the categorizers. To collect data, a questionnaire, a diary study, and 2 semistructured interviews were conducted with each of 18 participants. Then a content analysis was used to analyze the data. Based on the analysis of the data, this study identified 3 different types of categorizers: (i) rigid categorizers, (ii) fuzzy categorizers, and (iii) flexible categorizers. This study provides a unique way to understand personal information categorization by showing how it reflects the mindscapes of the categorizers. In particular, this study explains why people organize their personal information differently and have different tendencies in developing and maintaining their organizational structures. The findings provide insights on different ways of categorizing personal information and deepen our knowledge of categorization, personal information management, and information behavior. In practice, understanding different types of personal digital information categorization can make contributions to the development of systems, tools, and applications that support effective personal digital information categorization.
    March 20, 2017   doi: 10.1002/asi.23787   open full text
  • A webometric analysis of the online vaccination debate.
    Anton Ninkov, Liwen Vaughan.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    Webometrics research methods can be effectively used to measure and analyze information on the web. One topic discussed vehemently online that could benefit from this type of analysis is vaccines. We carried out a study analyzing the web presence of both sides of this debate. We collected a variety of webometric data and analyzed the data both quantitatively and qualitatively. The study found far more anti‐ than pro‐vaccine web domains. The anti and pro sides had similar web visibility as measured by the number of links coming from general websites and Tweets. However, the links to the pro domains were of higher quality measured by PageRank scores. The result from the qualitative content analysis confirmed this finding. The analysis of site ages revealed that the battle between the two sides had a long history and is still ongoing. The web scene was polarized with either pro or anti views and little neutral ground. The study suggests ways that professional information can be promoted more effectively on the web. The study demonstrates that webometrics analysis is effective in studying online information dissemination. This kind of analysis can be used to study not only health information but other information as well.
    March 20, 2017   doi: 10.1002/asi.23758   open full text
  • The interplay between information practices and information context: The case of mobile knowledge workers.
    Mohammad Hossein Jarrahi, Leslie Thomson.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    The knowledge workforce is changing: global economic factors, increasing professional specialization, and rapid technological advancements mean that more individuals than ever can be found working in independent, modular, and mobile arrangements. Little is known about professional information practices or actions outside of traditional, centralized offices; however, the dynamic, unconventional, and less stable mobile work context diverges substantially from this model, and presents significant challenges and opportunities for the accomplishing of work tasks. This article identifies 5 main information practices geared toward mobilizing work, based on in‐depth interviews with 31 mobile knowledge workers (MKWs). It then uses these 5 practices as starting points for beginning to delineate the context of mobile knowledge work. We find that the information practices and information contexts of MKWs are mutually constitutive: challenges and opportunities of their work arrangements are what enable the development of practices that continually (re)construct productive spatial, temporal, social, and material contexts for work. This article contributes to an empirical understanding of the information practices of an increasingly visible yet understudied population, and to a theoretical understanding of the contemporary mobile knowledge work information context.
    March 20, 2017   doi: 10.1002/asi.23773   open full text
  • Effects of task complexity on online search behavior of adolescents.
    Jaap Walhout, Paola Oomen, Halszka Jarodzka, Saskia Brand‐Gruwel.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    Evaluation of information during information problem‐solving processes already starts when trying to select the appropriate search result on a search engine results page (SERP). Up to now, research has mainly focused on the evaluation of webpages, while the evaluation of SERPs received less attention. Furthermore, task complexity is often not taken into account. A within‐subjects design was used to study the influence of task complexity on search query formulation, evaluation of search results, and task performance. Three search tasks were used: a fact‐finding, cause–effect, and a controversial topic task. To measure perceptual search processes, we used a combination of log files, eye‐tracking data, answer forms, and think‐aloud protocols. The results reveal that an increase in task complexity results in more search queries and keywords used, more time spent formulating search queries, and more search results considered on the SERPs. Furthermore, higher ranked search results were considered more often than lower ranked results. However, not all the results for the most complex task were in line with expectations. These conflicting results can be explained by a lack of prior knowledge and the possible interference of prior attitudes.
    March 20, 2017   doi: 10.1002/asi.23782   open full text
  • Filtering patent maps for visualization of diversification paths of inventors and organizations.
    Bowen Yan, Jianxi Luo.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    In the information science literature, recent studies have used patent databases and patent classification information to construct network maps of patent technology classes. In such a patent technology map, almost all pairs of technology classes are connected, whereas most of the connections between them are extremely weak. This observation suggests the possibility of filtering the patent network map by removing weak links. However, removing links may reduce the explanatory power of the network on inventor or organization diversification. The network links may explain the patent portfolio diversification paths of inventors and inventing organizations. We measure the diversification explanatory power of the patent network map, and present a method to objectively choose an optimal tradeoff between explanatory power and removing weak links. We show that this method can remove a degree of arbitrariness compared with previous filtering methods based on arbitrary thresholds, and also identify previous filtering methods that created filters outside the optimal tradeoff. The filtered map aims to aid in network visualization analyses of the technological diversification of inventors, organizations, and other innovation agents, and potential foresight analysis. Such applications to a prolific inventor (Leonard Forbes) and company (Google) are demonstrated.
    March 20, 2017   doi: 10.1002/asi.23780   open full text
  • Academics' behaviors and attitudes towards open access publishing in scholarly journals.
    Jennifer Rowley, Frances Johnson, Laura Sbaffi, Will Frass, Elaine Devine.
    Journal of the American Society for Information Science and Technology. March 20, 2017
    While there is significant progress with policy and a lively debate regarding the potential impact of open access publishing, few studies have examined academics' behavior and attitudes to open access publishing (OAP) in scholarly journals. This article seeks to address this gap through an international and interdisciplinary survey of academics. Issues covered include: use of and intentions regarding OAP, and perceptions regarding advantages and disadvantages of OAP, journal article publication services, peer review, and reuse. Despite reporting engagement in OAP, academics were unsure about their future intentions regarding OAP. Broadly, academics identified the potential for wider circulation as the key advantage of OAP, and were more positive about its benefits than they were negative about its disadvantages. As regards services, rigorous peer review, followed by rapid publication, were most valued. Academics reported strong views on reuse of their work; they were relatively happy with noncommercial reuse, but not in favor of commercial reuse, adaptations, and inclusion in anthologies. Comparing science, technology, and medicine with arts, humanities, and social sciences showed a significant difference in attitude on a number of questions, but, in general, the effect size was small, suggesting that attitudes are relatively consistent across the academic community.
    March 20, 2017   doi: 10.1002/asi.23710   open full text
  • Adding the dimension of knowledge trading to source impact assessment: Approaches, indicators, and implications.
    Erjia Yan, Yongjun Zhu.
    Journal of the American Society for Information Science and Technology. March 13, 2017
    The objective of this paper is to systematically assess sources' (e.g., journals and proceedings) impact in knowledge trading. While there have been efforts at evaluating different aspects of journal impact, the dimension of knowledge trading is largely absent. To fill the gap, this study employed a set of trading‐based indicators, including weighted degree centrality, Shannon entropy, and weighted betweenness centrality, to assess sources' trading impact. These indicators were applied to several time‐sliced source‐to‐source citation networks that comprise 33,634 sources indexed in the Scopus database. The results show that several interdisciplinary sources, such as Nature, PLoS One, Proceedings of the National Academy of Sciences, and Science, and several specialty sources, such as Lancet, Lecture Notes in Computer Science, Journal of the American Chemical Society, Journal of Biological Chemistry, and New England Journal of Medicine, have demonstrated their marked importance in knowledge trading. Furthermore, this study also reveals that, overall, sources have established more trading partners, increased their trading volumes, broadened their trading areas, and diversified their trading contents over the past 15 years from 1997 to 2011. These results inform the understanding of source‐level impact assessment and knowledge diffusion.
    March 13, 2017   doi: 10.1002/asi.23670   open full text
  • Domain‐independent search expertise: Gaining knowledge in query formulation through guided practice.
    Catherine L. Smith.
    Journal of the American Society for Information Science and Technology. March 07, 2017
    Although modern search systems require minimal skill for meeting simple information needs, most systems provide weak support for gaining advanced skill; hence, the goal of designing systems that guide searchers in developing expertise. Essential to developing such systems are a description of expert search behavior and an understanding of how it may be acquired. The present study contributes a detailed analysis of the query behavior of 10 students as they completed assigned exercises during a semester‐long course on expert search. Detailed query logs were coded for three dimensions of query expression: the information structure searched, the type of query term used, and intent of the query with respect to specificity. Patterns of query formulation were found to evidence a progression of instruction, suggesting that the students gained knowledge of fundamental system‐independent constructs that underlie expert search, and that domain‐independent search expertise may be defined as the ability to use these constructs. Implications for system design are addressed.
    March 07, 2017   doi: 10.1002/asi.23776   open full text
  • Learning to cite framework: How to automatically construct citations for hierarchical data.
    Gianmaria Silvello.
    Journal of the American Society for Information Science and Technology. March 07, 2017
    The practice of citation is foundational for the propagation of knowledge along with scientific development and it is one of the core aspects on which scholarship and scientific publishing rely. Within the broad context of data citation, we focus on the problem of automatically constructing citations for hierarchically structured data. We present the “learning to cite” framework, which enables the automatic construction of human‐ and machine‐readable citations with different levels of coarseness. The main goal is to reduce the human intervention on data to a minimum and to provide a citation system general enough to work on heterogeneous and complex XML data sets. We describe how this framework can be realized by a system for creating citations to single nodes within an XML data set and, as a use case, show how it can be applied in the context of digital archives. We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real‐world environments and that reduces human intervention on data to a minimum.
    March 07, 2017   doi: 10.1002/asi.23774   open full text
  • A “Gold‐centric” implementation of open access: Hybrid journals, the “Total cost of publication,” and policy development in the UK and beyond.
    Stephen Pinfield, Jennifer Salter, Peter A. Bath.
    Journal of the American Society for Information Science and Technology. February 27, 2017
    This paper reports analysis of data from higher education institutions in the UK on their experience of the open‐access (OA) publishing market working within a policy environment favoring “Gold” OA (OA publishing in journals). It models the “total cost of publication”—comprising costs of journal subscriptions, OA article‐processing charges (APCs), and new administrative costs—for a sample of 24 institutions. APCs are shown to constitute 12% of the “total cost of publication,” APC administration, 1%, and subscriptions, 87% (for a sample of seven publishers). APC expenditure in institutions rose between 2012 and 2014 at the same time as rising subscription costs. There was disproportionately high take‐up of Gold options for Health and Life Sciences articles. APC prices paid varied widely, with a mean APC of £1,586 in 2014. “Hybrid” options (subscription journals also offering OA for individual articles on payment of an APC) were considerably more expensive than fully OA titles, but the data indicate a correlation between APC price and journal quality (as reflected in the citation rates of journals). The policy implications of these developments are explored, particularly in relation to hybrid OA and the potential of offsetting subscription and APC costs.
    February 27, 2017   doi: 10.1002/asi.23742   open full text
  • Mining correlations between medically dependent features and image retrieval models for query classification.
    Hajer Ayadi, Mouna Torjmen‐Khemakhem, Mariam Daoud, Jimmy Xiangji Huang, Maher Ben Jemaa.
    Journal of the American Society for Information Science and Technology. February 27, 2017
    The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State‐of‐the‐art image retrieval models are classified into three categories: content‐based (visual) models, textual models, and combined models. Content‐based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous work in this field has used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
    February 27, 2017   doi: 10.1002/asi.23772   open full text
  • A longitudinal study of user queries and browsing requests in a case‐based reasoning retrieval system.
    Wu He, Xin Tian.
    Journal of the American Society for Information Science and Technology. February 27, 2017
    This article reports on a longitudinal analysis of query logs of a web‐based case library system during an 8‐year period (from 2005 to 2012). The analysis examines 3 different information‐seeking approaches provided by the system: keyword searching, browsing, and case‐based reasoning (CBR) searching. The longitudinal dimension of this study offers unique possibilities to see how users used the 3 different approaches over time. Various user information‐seeking patterns and trends are identified through the query usage pattern analysis and session analysis. The study identified different user groups and found that a majority of the users tend to stick to their favorite information‐seeking approach to meet their immediate information needs and do not seem to care whether alternative search options will offer greater benefits. The study also found that return users used CBR searching much more frequently than 1‐time users and tend to use more query terms to look for information than 1‐time users.
    February 27, 2017   doi: 10.1002/asi.23738   open full text
  • Control and syntagmatization: Vocabulary requirements in information retrieval thesauri and natural language lexicons.
    Volkmar Engerer.
    Journal of the American Society for Information Science and Technology. February 27, 2017
    This paper explores the relationships between natural language lexicons in lexical semantics and thesauri in information retrieval research. These different areas of knowledge have different restrictions on use of vocabulary; thesauri are used only in information search and retrieval contexts, whereas lexicons are mental systems and generally applicable in all domains of life. A set of vocabulary requirements that defines the more concrete characteristics of vocabulary items in the 2 contexts can be derived from this framework: lexicon items have to be learnable, complex, transparent, etc., whereas thesaurus terms must be effective, current and relevant, searchable, etc. The differences in vocabulary properties correlate with 2 other factors, the well‐known dimension of Control (deliberate, social activities of building and maintaining vocabularies), and Syntagmatization, which is less known and describes vocabulary items' varying formal preparedness to exit the thesaurus/lexicon, enter into linear syntactic constructions, and, finally, acquire communicative functionality. It is proposed that there is an inverse relationship between Control and Syntagmatization.
    February 27, 2017   doi: 10.1002/asi.23783   open full text
  • Identification of long‐term concept‐symbols among citations: Do common intellectual histories structure citation behavior?
    Jordan A. Comins, Loet Leydesdorff.
    Journal of the American Society for Information Science and Technology. February 21, 2017
    “Citation classics” are not only highly cited, but also cited during several decades. We explore whether the peaks in the spectrograms generated by Reference Publication Years Spectroscopy (RPYS) indicate such long‐term impact by comparing across RPYS for subsequent time intervals. Multi‐RPYS enables us to distinguish between short‐term citation peaks at the research front that decay within 10 years versus historically constitutive (long‐term) citations that function as concept symbols. Using these constitutive citations, one is able to cluster document sets (e.g., journals) in terms of intellectually shared histories. We test this premise by clustering 40 journals in the Web of Science Category of Information and Library Science using multi‐RPYS. It follows that RPYS can not only be used for retrieving roots of sets under study (cited), but also for algorithmic historiography of the citing sets. Significant references are historically rooted symbols among other citations that function as currency.
    February 21, 2017   doi: 10.1002/asi.23769   open full text
  • The false Donald J. Trump article and the ethics of misleading journalism.
    Jaime A. Teixeira da Silva.
    Journal of the American Society for Information Science and Technology. February 01, 2017
    On August 8, 2016, a story appeared on the top page of Yahoo that read “Trump: You people really believed me?” It was written by Walker Lundy of the Charlotte Observer in Charlotte, NC, USA on August 6, 2016. The article was shocking, even to a non‐U.S. citizen, because it revealed the sudden, and unexpected, withdrawal of Donald J. Trump from the U.S. Presidential race, which would surely have been welcome news for the Hillary Clinton camp. Less than 24 hours later, that article could no longer be traced on Yahoo, and the original links led to a 404 error. The original story is still published on another Charlotte Observer webpage. National and independent polls related to the presidential election fluctuate, but pollsters are always influenced by news events related to either candidate. Yet, one has to wonder why Mr. Lundy would have published a false and misleading story, which was widely circulated, including by the powerful media engine Yahoo, that has serious national and international consequences. Some of the policies espoused by both presidential candidates are radically different, and thus any media story that can tilt the opinion of readers in the direction of one or other candidate needs to be carefully analyzed. Sowing doubt by using false or misleading journalism does not do the public any favors and casts a doubtful light on the practice of political journalism. The Charlotte Observer and Mr. Lundy failed to respond to a request for comment.
    February 01, 2017   doi: 10.1002/asi.23828   open full text
  • How many ways to use CiteSpace? A study of user interactive events over 14 months.
    Qing Ping, Jiangen He, Chaomei Chen.
    Journal of the American Society for Information Science and Technology. January 27, 2017
    Using visual analytic systems effectively may incur a steep learning curve for users, especially for those who have little prior knowledge of either using the tool or accomplishing analytic tasks. How do users deal with a steep learning curve over time? Are there particularly problematic aspects of an analytic process? In this article we investigate these questions through an integrative study of the use of CiteSpace—a visual analytic tool for finding trends and patterns in scientific literature. In particular, we analyze millions of interactive events in logs generated by users worldwide over a 14‐month period. The key findings are: (i) three levels of proficiency are identified, namely, level 1: low proficiency, level 2: intermediate proficiency, and level 3: high proficiency, and (ii) behavioral patterns at level 3 result from a more engaging interaction with the system, involving a wider variety of events and being characterized by longer state transition paths, whereas behavioral patterns at levels 1 and 2 seem to focus on learning how to use the tool. This study contributes to the development and evaluation of visual analytic systems in realistic settings and provides a valuable addition to the study of interactive visual analytic processes.
    January 27, 2017   doi: 10.1002/asi.23770   open full text
  • Users and uses of a global union catalog: A mixed‐methods study of WorldCat.org.
    Simon Wakeling, Paul Clough, Lynn Silipigni Connaway, Barbara Sen, David Tomás.
    Journal of the American Society for Information Science and Technology. January 20, 2017
    This paper presents the first large‐scale investigation of the users and uses of WorldCat.org, the world's largest bibliographic database and global union catalog. Using a mixed‐methods approach involving focus group interviews with 120 participants, an online survey with 2,918 responses, and an analysis of transaction logs of approximately 15 million sessions from WorldCat.org, the study provides a new understanding of the context for global union catalog use. We find that WorldCat.org is accessed by a diverse population, with the three primary user groups being librarians, students, and academics. Use of the system is found to fall within three broad types of work‐task (professional, academic, and leisure), and we also present an emergent taxonomy of search tasks that encompass known‐item, unknown‐item, and institutional information searches. Our results support the notion that union catalogs are primarily used for known‐item searches, although the volume of traffic to WorldCat.org means that unknown‐item searches nonetheless represent an estimated 250,000 sessions per month. Search engine referrals account for almost half of all traffic, but although WorldCat.org effectively connects users referred from institutional library catalogs to other libraries holding a sought item, users arriving from a search engine are less likely to connect to a library.
    January 20, 2017   doi: 10.1002/asi.23708   open full text
  • Metrics for openness.
    David M. Nichols, Michael B. Twidale.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    The characterization of scholarly communication is dominated by citation‐based measures. In this paper we propose several metrics to describe different facets of open access and open research. We discuss measures to represent the public availability of articles along with their archival location, licenses, access costs, and supporting information. Calculations illustrating these new metrics are presented using the authors’ publications. We argue that explicit measurement of openness is necessary for a holistic description of research outputs.
    December 21, 2016   doi: 10.1002/asi.23741   open full text
  • Knowledge‐dissemination channels: Analytics of stature evaluation.
    Liang Chen, Clyde W. Holsapple, Shih‐Hui (Steven) Hsiao, Zhihong Ke, Jae‐Young Oh, Zhiguo Yang.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Understanding relative statures of channels for disseminating knowledge is of practical interest to both generators and consumers of knowledge flows. For generators, stature can influence attractiveness of alternative dissemination routes and deliberations of those who assess generator performance. For knowledge consumers, channel stature may influence knowledge content to which they are exposed. This study introduces a novel approach to conceptualizing and measuring stature of knowledge‐dissemination channels: the power‐impact (PI) technique. It is a flexible technique having 3 complementary variants, giving holistic insights about channel stature by accounting for both attraction of knowledge generators to a distribution channel and degree to which knowledge consumers choose to use a channel's knowledge content. Each PI variant is expressed in terms of multiple parameters, permitting customization of stature evaluation to suit its user's preferences. In the spirit of analytics, each PI variant is driven by objective evidence of actual behaviors. The PI technique is based on 2 building blocks: (a) power that channels have for attracting results of generators' knowledge work, and (b) impact that channel contents exhibit on prospective recipients. Feasibility and functionality of the PI‐technique design are demonstrated by applying it to solve a problem of journal stature evaluation for the information‐systems discipline.
    December 21, 2016   doi: 10.1002/asi.23725   open full text
  • A note concerning primary source knowledge.
    Harry M. Collins, Luis Reyes‐Galindo, Paul Ginsparg.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    We present the results of running 4 different papers through the automated filtering system used by the open access preprint server “arXiv” to classify papers and implement quality control barriers. The exercise was carried out in order to assess whether these highly sophisticated, state‐of‐the‐art filters can distinguish between papers that are controversial or have gone past their “sell‐by date,” and otherwise normal papers. We conclude that not even the arXiv filters, which are otherwise successful in filtering fringe‐topic papers, can fully acquire “Domain‐Specific Discrimination” and thus distinguish technical papers that are taken seriously by an expert community from those that are not. Finally, we discuss the implications this has for citizen and policy‐maker engagement with the Primary Source Knowledge of a technical domain.
    December 21, 2016   doi: 10.1002/asi.23753   open full text
  • Analysis of change in users' assessment of search results over time.
    Maayan Zhitomirsky‐Geffet, Judit Bar‐Ilan, Mark Levene.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    We present the first systematic study of the influence of time on user judgements for rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and explore how users' judgements change. To this end, we conducted a large‐scale user study with 86 participants who evaluated 2 different queries and 4 diverse result sets twice with an interval of 2 months. To analyze the results we investigate whether 2 types of patterns of user behavior from the theory of categorical thinking hold for the case of evaluation of search results: (a) coarseness and (b) locality. To quantify these patterns we devised 2 new measures of change in user judgements and distinguish between local (when users swap between close ranks and relevance values) and nonlocal changes. Two types of judgements were considered in this study: (a) relevance on a 4‐point scale, and (b) ranking on a 10‐point scale without ties. We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local.
    December 21, 2016   doi: 10.1002/asi.23745   open full text
  • Citation behavior: A large‐scale test of the persuasion by name‐dropping hypothesis.
    Tove Faber Frandsen, Jeppe Nicolaisen.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Citation frequencies are commonly interpreted as measures of quality or impact. Yet, the true nature of citations and their proper interpretation have been the center of a long, but still unresolved discussion in Bibliometrics. A comparison of 67,578 pairs of studies on the same healthcare topic, with the same publication age (1–15 years) reveals that when one of the studies is being selected for citation, it has on average received about three times as many citations as the other study. However, the average citation‐gap between selected or deselected studies narrows slightly over time, which fits poorly with the name‐dropping interpretation and better with the quality and impact‐interpretation. The results demonstrate that authors in the field of Healthcare tend to cite highly cited documents when they have a choice. This is more likely caused by differences related to quality than differences related to status of the publications cited.
    December 21, 2016   doi: 10.1002/asi.23746   open full text
  • Incremental author name disambiguation by exploiting domain‐specific heuristics.
    Alan Filipe Santana, Marcos André Gonçalves, Alberto H. F. Laender, Anderson A. Ferreira.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    The vast majority of the current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, these solutions, besides being very expensive and having scalability problems, may not benefit from eventual manual corrections, as these may be lost whenever the entire repository has to be disambiguated again. In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce improved results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method, specially designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state‐of‐the‐art non‐incremental method.
    December 21, 2016   doi: 10.1002/asi.23726   open full text
  • Goodreads: A social network site for book readers.
    Mike Thelwall, Kayvan Kousha.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Goodreads is an Amazon‐owned book‐based social web site for members to share books, read, review books, rate books, and connect with other readers. Goodreads has tens of millions of book reviews, recommendations, and ratings that may help librarians and readers to select relevant books. This article describes a first investigation of the properties of Goodreads users, using a random sample of 50,000 members. The results suggest that about three quarters of members with a public profile are female, and that there is little difference between male and female users in patterns of behavior, except for females registering more books and rating them less positively. Goodreads librarians and super‐users engage extensively with most features of the site. The absence of strong correlations between book‐based and social usage statistics (e.g., numbers of friends, followers, books, reviews, and ratings) suggests that members choose their own individual balance of social and book activities and rarely ignore one at the expense of the other. Goodreads is therefore neither primarily a book‐based website nor primarily a social network site but is a genuine hybrid, social navigation site.
    December 21, 2016   doi: 10.1002/asi.23733   open full text
  • Comparative evaluation of bibliometric content networks by tomographic content analysis: An application to Parkinson's disease.
    Keeheon Lee, SuYeon Kim, Erin Hea‐Jin Kim, Min Song.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    To understand the current state of a discipline and to discover new knowledge of a certain theme, one builds bibliometric content networks based on the present knowledge entities. However, such networks can vary according to the collection of data sets relevant to the theme by querying knowledge entities. In this study we classify three different bibliometric content networks. The primary bibliometric network is based on knowledge entities relevant to a keyword of the theme, the secondary network is based on entities associated with the lower concepts of the keyword, and the tertiary network is based on entities influenced by the theme. To explore the content and properties of these networks, we propose a tomographic content analysis that takes a slice‐and‐dice approach to analyzing the networks. Our findings indicate that the primary network is best suited to understanding the current knowledge on a certain topic, whereas the secondary network is good at discovering new knowledge across fields associated with the topic, and the tertiary network is appropriate for outlining the current knowledge of the topic and relevant studies.
    December 21, 2016   doi: 10.1002/asi.23752   open full text
  • Co‐word maps and topic modeling: A comparison using small and medium‐sized corpora (N < 1,000).
    Loet Leydesdorff, Adina Nerghes.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Induced by “big data,” “topic modeling” has become an attractive alternative to mapping co‐words in terms of co‐occurrences and co‐absences using network techniques. Does topic modeling provide an alternative for co‐word mapping in research practices using moderately sized document collections? We return to the word/document matrix using first a single text with a strong argument (“The Leiden Manifesto”) and then upscale to a sample of moderate size (n = 687) to study the pros and cons of the two approaches in terms of the resulting possibilities for making semantic maps that can serve an argument. The results from co‐word mapping (using two different routines) versus topic modeling are significantly uncorrelated. Whereas components in the co‐word maps can easily be designated, the topic models provide sets of words that are very differently organized. In these samples, the topic models seem to reveal similarities other than semantic ones (e.g., linguistic ones). In other words, topic modeling does not replace co‐word mapping in small and medium‐sized sets; but the paper leaves open the possibility that topic modeling would work well for the semantic mapping of large sets.
    December 21, 2016   doi: 10.1002/asi.23740   open full text
  • Understanding users of cloud music services: Selection factors, management and access behavior, and perceptions.
    Jin Ha Lee, Rachel Wishkoski, Lara Aase, Perry Meas, Chris Hubbles.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Recent, rapid changes in technology have resulted in a proliferation of choices for music storage and access. Portable, web‐enabled music devices are widespread, and listeners now enjoy a plethora of options regarding formats, devices, and access methods. Yet in this mobile music environment, listeners' access and management strategies for music collections are poorly understood, because behaviors surrounding the organization and retrieval of music collections have received little formal study. Our current research seeks to enrich our knowledge of people's music listening and collecting behavior through a series of systematic user studies. In this paper we present our findings from interviews involving 20 adult and 20 teen users of commercial cloud music services. Our results contribute to theoretical understandings of users' music information behavior in a time of upheaval in music usage patterns, and more generally, the purposes and meanings users ascribe to personal media collections in cloud‐based systems. The findings suggest improvements to the future design of cloud‐based music services, as well as to any information systems and services designed for personal media collections, benefiting both commercial entities and listeners.
    December 21, 2016   doi: 10.1002/asi.23754   open full text
  • Free‐to‐publish, free‐to‐read, or both? Cost, equality of access, and integrity in science publishing.
    Jack E. James.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    The Internet has triggered transformational change in the dissemination of science in the form of a global transition to open access (OA) publishing. Heavy investment favoring Gold over Green OA has been associated with increased total publication costs, inequality of opportunity to publish, and concerns about integrity in science reporting. Notwithstanding current fluidity because of ongoing competition for market share between supporters of the major alternative publishing strategies, emerging trends indicate the need for material and human resources to be redirected away from Gold and toward Green OA. Doing so will reduce total publication costs, increase equality of access for authors and readers, and remove the financial incentives that have encouraged poor and corrupt publishing practices.
    December 21, 2016   doi: 10.1002/asi.23757   open full text
  • The scaling relationship between citation‐based performance and coauthorship patterns in natural sciences.
    Guillermo Armando Ronda‐Pupo, J. Sylvan Katz.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    The aim of this paper is to extend our knowledge about the power‐law relationship between citation‐based performance and coauthorship patterns in papers in the natural sciences. We analyzed 829,924 articles that received 16,490,346 citations. The number of articles published through coauthorship accounts for 89%. The citation‐based performance and coauthorship patterns exhibit a power‐law correlation with a scaling exponent of 1.20 ± 0.07. Citations to a subfield's research articles tended to increase 2^1.20, or 2.30, times each time it doubled the number of coauthored papers. The scaling exponent for the power‐law relationship for single‐authored papers was 0.85 ± 0.11. The citations to a subfield's single‐authored research articles increased 2^0.85, or 1.89, times each time the research area doubled the number of single‐authored papers. The Matthew Effect is stronger for coauthored papers than for single‐authored. In fact, with a scaling exponent <1.0 the impact of single‐authored papers exhibits a cumulative disadvantage or inverse Matthew Effect.
    December 21, 2016   doi: 10.1002/asi.23759   open full text
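    The doubling arithmetic in the abstract above can be sketched as follows. This is a minimal illustration, assuming citations C relate to paper count N as C = k * N**alpha, so that scaling N by a factor multiplies C by factor**alpha. The function name `doubling_multiplier` is hypothetical and not taken from the paper; only the exponent 1.20 comes from the abstract.

```python
def doubling_multiplier(alpha: float, factor: float = 2.0) -> float:
    """Multiplier on citations when the paper count is scaled by `factor`.

    Assumes a pure power law C = k * N**alpha (an illustrative assumption);
    the constant k cancels out of the ratio.
    """
    return factor ** alpha


# Scaling exponent reported in the abstract for coauthored papers.
coauthored = doubling_multiplier(1.20)
print(f"Doubling coauthored papers multiplies citations by about {coauthored:.2f}")
```

    An exponent of exactly 1.0 would give a multiplier of 2.0 (linear growth); exponents above 1.0 indicate cumulative advantage, and exponents below 1.0 indicate cumulative disadvantage.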
  • Book genre and author gender: Romance>Paranormal‐Romance to Autobiography>Memoir.
    Mike Thelwall.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Although gender differences are known to exist in the publishing industry and in reader preferences, there is little public systematic data about them. This article uses evidence from the book‐based social website Goodreads to provide a large‐scale analysis of 50 major English book genres based on author genders. The results show gender differences in authorship in almost all categories and gender differences in the level of interest in, and ratings of, books in a minority of categories. Perhaps surprisingly in this context, there is not a clear gender‐based relationship between the success of an author and their prevalence within a genre. The unexpected almost universal authorship gender differences should give new impetus to investigations of the importance of gender in fiction, and the success of minority genders in some genres should encourage publishers and librarians to take their work seriously, except perhaps for most male‐authored chick‐lit.
    December 21, 2016   doi: 10.1002/asi.23768   open full text
  • User involvement and system support in applying search tactics.
    Iris Xie, Soohyung Joo, Renee Bennett‐Kapusniak.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Both user involvement and system support play important roles in applying search tactics. To apply search tactics in the information retrieval (IR) processes, users make decisions and take actions in the search process, while IR systems assist them by providing different system features. After analyzing 61 participants’ information searching diaries and questionnaires we identified various types of user involvement and system support in applying different types of search tactics. Based on quantitative analysis, search tactics were classified into 3 groups: user‐dominated, system‐dominated, and balanced tactics. We further explored types of user involvement and types of system support in applying search tactics from the 3 groups. The findings show that users and systems play major roles in applying user‐dominated and system‐dominated tactics, respectively. When applying balanced tactics, users and systems must collaborate closely with each other. In this article, we propose a model that illustrates user involvement and system support as they occur in user‐dominated tactics, system‐dominated tactics, and balanced tactics. Most important, IR system design implications are discussed to facilitate effective and efficient applications of the 3 groups of search tactics.
    December 21, 2016   doi: 10.1002/asi.23765   open full text
  • Search task features in work tasks of varying types and complexity.
    Miamaria Saastamoinen, Kalervo Järvelin.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Information searching in practice seldom is an end in itself. In work, work task (WT) performance forms the context, which information searching should serve. Therefore, information retrieval (IR) systems development/evaluation should take the WT context into account. The present paper analyzes how WT features (task complexity and task types) affect information searching in authentic work: the types of information needs, search processes, and search media. We collected data on 22 information professionals in authentic work situations in three organization types: city administration, universities, and companies. The data comprise 286 WTs and 420 search tasks (STs). The data include transaction logs, video recordings, daily questionnaires, interviews, and observation. The data were analyzed quantitatively. Even if the participants used a range of search media, most STs were simple throughout the data, and up to 42% of WTs did not include searching. WT's effects on STs are not straightforward: different WT types react differently to WT complexity. Due to the simplicity of authentic searching, the WT/ST types in interactive IR experiments should be reconsidered.
    December 21, 2016   doi: 10.1002/asi.23766   open full text
  • On the feasibility of predicting popular news at cold start.
    Ioannis Arapakis, Berkant Barla Cambazoglu, Mounia Lalmas.
    Journal of the American Society for Information Science and Technology. December 21, 2016
    Prominent news sites on the web provide hundreds of news articles daily. The abundance of news content competing to attract online attention, coupled with the manual effort involved in article selection, necessitates the timely prediction of future popularity of these news articles. The future popularity of a news article can be estimated using signals indicating the article's penetration in social media (e.g., number of tweets) in addition to traditional web analytics (e.g., number of page views). In practice, it is important to make such estimations as early as possible, preferably before the article is made available on the news site (i.e., at cold start). In this paper we perform a study on cold‐start news popularity prediction using a collection of 13,319 news articles obtained from Yahoo News, a major news provider. We characterize the popularity of news articles through a set of online metrics and try to predict their values across time using machine learning techniques on a large collection of features obtained from various sources. Our findings indicate that predicting news popularity at cold start is a difficult task, contrary to the findings of a prior work on the same topic. Most articles' popularity may not be accurately anticipated solely on the basis of content features, without having the early‐stage popularity values.
    December 21, 2016   doi: 10.1002/asi.23756   open full text
  • Funding Data from Publication Acknowledgments: Coverage, Uses, and Limitations.
    Nicola Grassano, Daniele Rotolo, Joshua Hutton, Frédérique Lang, Michael M. Hopkins.
    Journal of the American Society for Information Science and Technology. November 22, 2016
    This article contributes to the development of methods for analysing research funding systems by exploring the robustness and comparability of emerging approaches to generate funding landscapes useful for policy making. We use a novel data set of manually extracted and coded data on the funding acknowledgements of 7,510 publications representing UK cancer research in the year 2011 and compare these “reference data” with funding data provided by Web of Science (WoS) and MEDLINE/PubMed. Findings show high recall (around 93%) of WoS funding data. By contrast, MEDLINE/PubMed data retrieved less than half of the UK cancer publications acknowledging at least one funder. Conversely, both databases have high precision (+90%): That is, few cases of publications with no acknowledgment to funders are identified as having funding data. Nonetheless, funders acknowledged in UK cancer publications were not correctly listed by MEDLINE/PubMed and WoS in around 75% and 32% of the cases, respectively. Reference data on the UK cancer research funding system are used as a case study to demonstrate the utility of funding data for strategic intelligence applications (e.g., mapping of funding landscape and co‐funding activity, comparison of funders' research portfolios).
    November 22, 2016   doi: 10.1002/asi.23737   open full text
  • The state and evolution of U.S. iSchools: From talent acquisitions to research outcome.
    Zhiya Zuo, Kang Zhao, David Eichmann.
    Journal of the American Society for Information Science and Technology. November 18, 2016
    The past 2 decades have witnessed the emergence of information as a scientific discipline and the growth of information schools around the world. We analyzed the current state of the iSchool community in the U.S. with a special focus on the evolution of the community. We conducted our study from the perspectives of acquiring talents and producing research, including the analysis on iSchool faculty members' educational backgrounds, research topics, and the hiring network among iSchools. Applying text mining techniques and social network analysis to data from various sources, our research revealed how the iSchool community gradually built its own identity over time, including the growing number of faculty members who received their doctorates from the field that studies information, the deviation from computer science and library science, the rising emphasis on the intersection of information, technology, and people, and the increasing educational and research homogeneity as a community. These findings suggest that iSchools in the U.S. are evolving into a mature and independent discipline with a more established identity.
    November 18, 2016   doi: 10.1002/asi.23751   open full text
  • Which Type of Citation Analysis Generates the Most Accurate Taxonomy of Scientific and Technical Knowledge?
    Richard Klavans, Kevin W. Boyack.
    Journal of the American Society for Information Science and Technology. October 13, 2016
    In 1965, Price foresaw the day when a citation‐based taxonomy of science and technology would be delineated and correspondingly used for science policy. A taxonomy needs to be comprehensive and accurate if it is to be useful for policy making, especially now that policy makers are utilizing citation‐based indicators to evaluate people, institutions and laboratories. Determining the accuracy of a taxonomy, however, remains a challenge. Previous work on the accuracy of partition solutions is sparse, and the results of those studies, although useful, have not been definitive. In this study we compare the accuracies of topic‐level taxonomies based on the clustering of documents using direct citation, bibliographic coupling, and co‐citation. Using a set of new gold standards—articles with at least 100 references—we find that direct citation is better at concentrating references than either bibliographic coupling or co‐citation. Using the assumption that higher concentrations of references denote more accurate clusters, direct citation thus provides a more accurate representation of the taxonomy of scientific and technical knowledge than either bibliographic coupling or co‐citation. We also find that discipline‐level taxonomies based on journal schema are highly inaccurate compared to topic‐level taxonomies, and recommend against their use.
    October 13, 2016   doi: 10.1002/asi.23734   open full text
  • Amplifying the impact of open access: Wikipedia and the diffusion of science.
    Misha Teplitskiy, Grace Lu, Eamon Duede.
    Journal of the American Society for Information Science and Technology. October 13, 2016
    With the rise of Wikipedia as a first‐stop source for scientific information, it is important to understand whether Wikipedia draws upon the research that scientists value most. Here we identify the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal's academic status (impact factor) and accessibility (open access policy) both strongly increase the probability of it being referenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to paywall journals. These findings provide evidence that a major consequence of open access policies is to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad audience.
    October 13, 2016   doi: 10.1002/asi.23687   open full text
  • Behavior‐based personalization in web search.
    Fei Cai, Shuaiqiang Wang, Maarten de Rijke.
    Journal of the American Society for Information Science and Technology. September 19, 2016
    Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, at the query‐level and at the word‐level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user‐dependent adaptive weight outperforms any combination with a fixed weight.
    September 19, 2016   doi: 10.1002/asi.23735   open full text
  • Increasing citizen science contribution using a virtual peer.
    Jeffrey Laut, Francesco Cappa, Oded Nov, Maurizio Porfiri.
    Journal of the American Society for Information Science and Technology. August 16, 2016
    Online participation is becoming an increasingly common means for individuals to contribute to citizen science projects, yet such projects often rely on only a small fraction of participants to make the majority of contributions. Here, we investigate a means for influencing the performance of citizen scientists toward enhancing overall participation. Building on past social comparison research, we pair citizen scientists with a software‐based virtual peer in an environmental monitoring project. Through a series of experiments in which virtual peers outperform, underperform, or perform similarly to human participants, we investigate the influence of their presence on citizen science participation. To offer insight into the psychological determinants to the response to this intervention, we propose a new dynamic model describing the bidirectional interaction between humans and virtual peers. Our results demonstrate that participant contribution can be enhanced through the presence of a virtual peer, creating a feedback loop where participants tend to increase or decrease their contribution in response to their peers' performance. By including virtual peers that systematically outperform the participants, we demonstrate a fourfold increase in their contribution to the citizen science project.
    August 16, 2016   doi: 10.1002/asi.23685   open full text
  • Evaluation of context‐aware recommendation systems for information re‐finding.
    Maya Sappelli, Suzan Verberne, Wessel Kraaij.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    In this article we evaluate context‐aware recommendation systems for information re‐finding by knowledge workers. We identify 4 criteria that are relevant for evaluating the quality of knowledge worker support: context relevance, document relevance, prediction of user action, and diversity of the suggestions. We compare 3 different context‐aware recommendation methods for information re‐finding in a writing support task. The first method uses contextual prefiltering and content‐based recommendation (CBR), the second uses the just‐in‐time information retrieval paradigm (JITIR), and the third is a novel network‐based recommendation system where context is part of the recommendation model (CIA). We found that each method has its own strengths: CBR is strong at context relevance, JITIR captures document relevance well, and CIA achieves the best result at predicting user action. Weaknesses include that CBR depends on a manual source to determine the context and in JITIR the context query can fail when the textual content is not sufficient. We conclude that to truly support a knowledge worker, all 4 evaluation criteria are important. In light of that conclusion, we argue that the network‐based approach the CIA offers has the highest robustness and flexibility for context‐aware information recommendation.
    August 03, 2016   doi: 10.1002/asi.23717   open full text
  • A journal's impact factor is influenced by changes in publication delays of citing journals.
    Dongbo Shi, Ronald Rousseau, Liu Yang, Jiang Li.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    In this article we describe another problem with journal impact factors by showing that one journal's impact factor is dependent on other journals' publication delays. The proposed theoretical model predicts a monotonically decreasing function of the impact factor as a function of publication delay, on condition that the citation curve of the journal is monotone increasing during the publication window used in the calculation of the journal impact factor; otherwise, this function has a reversed U shape. Our findings based on simulations are verified by examining three journals in the information sciences: the Journal of Informetrics, Scientometrics, and the Journal of the Association for Information Science and Technology.
    August 03, 2016   doi: 10.1002/asi.23706   open full text
  • Mapping science through bibliometric triangulation: An experimental approach applied to water research.
    Bei Wen, Edwin Horlings, Mariëlle van der Zouwen, Peter van den Besselaar.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    The idea of constructing science maps based on bibliographic data has intrigued researchers for decades, and various techniques have been developed to map the structure of research disciplines. Most science mapping studies use a single method. However, as research fields have various properties, a valid map of a field should actually be composed of a set of maps derived from a series of investigations using different methods. That leads to the question of what can be learned from a combination—triangulation—of these different science maps. In this paper we propose a method for triangulation, using the example of water science. We combine three different mapping approaches: journal–journal citation relations (JJCR), shared author keywords (SAK), and title word‐cited reference co‐occurrence (TWRC). Our results demonstrate that triangulation of JJCR, SAK, and TWRC produces a more comprehensive picture than each method applied individually. The outcomes from the three different approaches can be associated with each other and systematically interpreted to provide insights into the complex multidisciplinary structure of the field of water research.
    August 03, 2016   doi: 10.1002/asi.23696   open full text
  • Identifying potential “breakthrough” publications using refined citation analyses: Three related explorative approaches.
    Jesper W. Schneider, Rodrigo Costas.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    The article presents three advanced citation‐based methods used to detect potential breakthrough articles among very highly cited articles. We approach the detection of such articles from three different perspectives in order to provide different typologies of breakthrough articles. In all three cases we use the hierarchical classification of scientific publications developed at CWTS based on direct citation relationships. We assume that such contextualized articles focus on similar research interests. We utilize the characteristics scores and scales (CSS) approach to partition citation distributions and implement a specific filtering algorithm to sort out potential highly‐cited “followers,” articles not considered breakthroughs. After invoking thresholds and filtering, three methods are explored: a very exclusive one where only the highest cited article in a micro‐cluster is considered as a potential breakthrough article (M1); as well as two conceptually different methods, one that detects potential breakthrough articles among the 2% highest cited articles according to CSS (M2a), and finally a more restrictive version where, in addition to the CSS 2% filter, knowledge diffusion is also considered (M2b). The advanced citation‐based methods are explored and evaluated using validated publication sets linked to different Danish funding instruments including centers of excellence.
    August 03, 2016   doi: 10.1002/asi.23695   open full text
  • Information exchange on an academic social networking site: A multidiscipline comparison on ResearchGate Q&A.
    Wei Jeng, Spencer DesAutels, Daqing He, Lei Li.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    The increasing popularity of academic social networking sites (ASNSs) requires studies on the usage of ASNSs among scholars and evaluations of the effectiveness of these ASNSs. However, it is unclear whether current ASNSs have fulfilled their design goal, as scholars' actual online interactions on these platforms remain unexplored. To fill the gap, this article presents a study based on data collected from ResearchGate. Adopting a mixed‐method design by conducting qualitative content analysis and statistical analysis on 1,128 posts collected from ResearchGate Q&A, we examine how scholars exchange information and resources, and how their practices vary across three distinct disciplines: library and information services, history of art, and astrophysics. Our results show that the effect of a questioner's intention (i.e., seeking information or discussion) is greater than disciplinary factors in some circumstances. Across the three disciplines, responses to questions provide various resources, including experts' contact details, citations, links to Wikipedia, images, and so on. We further discuss several implications of the understanding of scholarly information exchange and the design of better academic social networking interfaces, which should stimulate scholarly interactions by minimizing confusion, improving the clarity of questions, and promoting scholarly content management.
    August 03, 2016   doi: 10.1002/asi.23692   open full text
  • User‐level microblogging recommendation incorporating social influence.
    Daifeng Li, Zhipeng Luo, Ying Ding, Jie Tang, Gordon Guo‐Zheng Sun, Xiaowen Dai, John Du, Jingwei Zhang, Shoubin Kong.
    Journal of the American Society for Information Science and Technology. August 03, 2016
    With the information overload of user‐generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI‐MR (Topic‐Level Social Influence‐based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on Factor Graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can improve microblogging performance considerably, and outperform the baseline methods.
    August 03, 2016   doi: 10.1002/asi.23681   open full text
  • Decentralized subject indexing of television programs: The effects of using a semicontrolled indexing language.
    Veslemøy Søbak, Nils Pharo.
    Journal of the American Society for Information Science and Technology. June 13, 2016
    We performed an exploratory case study to understand how subject indexing performed by television production staff using a semicontrolled vocabulary affects indexing quality. In the study we used triangulation, combining tag analysis and semistructured interviews, with production staff of the Norwegian Broadcasting Corporation. The main findings reveal incomplete indexing of TV programs and their parts, in addition to low indexing consistency and uneven indexing exhaustivity. The informants expressed low motivation and a high level of uncertainty regarding the task. Internal guidelines and high domain knowledge among the indexers do not form a sufficient basis for creating quality and consistency in the vocabulary. The challenges that are revealed in the terminological analysis, combined with low indexing knowledge and lack of motivation, will create difficulties in the retrieval phase.
    June 13, 2016   doi: 10.1002/asi.23700   open full text
  • The Societal Responsibilities of Computational Modelers: Human Values and Professional Codes of Ethics.
    Kenneth R. Fleischmann, Cindy Hui, William A. Wallace.
    Journal of the American Society for Information Science and Technology. June 13, 2016
    Information and communication technology (ICT) has increasingly important implications for our everyday lives, with the potential to both solve existing social problems and create new ones. This article focuses on one particular group of ICT professionals, computational modelers, and explores how these ICT professionals perceive their own societal responsibilities. Specifically, the article uses a mixed‐method approach to look at the role of professional codes of ethics and explores the relationship between modelers’ experiences with, and attitudes toward, codes of ethics and their values. Statistical analysis of survey data reveals a relationship between modelers’ values and their attitudes and experiences related to codes of ethics. Thematic analysis of interviews with a subset of survey participants identifies two key themes: that modelers should be faithful to the reality and values of users and that codes of ethics should be built from the bottom up. One important implication of the research is that those who value universalism and benevolence may have a particular duty to act on their values and advocate for, and work to develop, a code of ethics.
    June 13, 2016   doi: 10.1002/asi.23697   open full text
  • Are Wikipedia citations important evidence of the impact of scholarly articles and books?
    Kayvan Kousha, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. June 13, 2016
    Individual academics and research evaluators often need to assess the value of published research. Although citation counts are a recognized indicator of scholarly impact, alternative data is needed to provide evidence of other types of impact, including within education and wider society. Wikipedia is a logical choice for both of these because the role of a general encyclopaedia is to be an understandable repository of facts about a diverse array of topics and hence it may cite research to support its claims. To test whether Wikipedia could provide new evidence about the impact of scholarly research, this article counted citations to 302,328 articles and 18,735 monographs in English indexed by Scopus in the period 2005 to 2012. The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields. In contrast, a third of monographs have at least one citation from Wikipedia, with the most in the arts and humanities. Hence, Wikipedia citations can provide extra impact evidence for academic monographs. Nevertheless, the results may be relatively easily manipulated and so Wikipedia is not recommended for evaluations affecting stakeholder interests.
    June 13, 2016   doi: 10.1002/asi.23694   open full text
  • The effect of the “very important paper” (VIP) designation in Angewandte Chemie International Edition on citation impact: A propensity score matching analysis.
    Rüdiger Mutz, Tobias Wolbring, Hans‐Dieter Daniel.
    Journal of the American Society for Information Science and Technology. June 13, 2016
    Scientific journals publish an increasing number of articles every year. To steer readers’ attention to the most important papers, journals use several techniques (e.g., lead paper). Angewandte Chemie International Edition (AC), a leading international journal in chemistry, signals high‐quality papers through designating them as a “very important paper” (VIP). This study aims to investigate the citation impact of Communications in AC receiving the special feature VIP, both cumulated and over time. Using propensity score matching, treatment group (VIP) and control group (non‐VIP) were balanced for 14 covariates to estimate the unconfounded “average treatment effect on the treated” for the VIP designation. Out of N = 3,011 Communications published in 2007 and 2008, N = 207 received the special feature VIP. For each Communication, data were collected from AC (e.g., referees’ ratings) and from the databases Chemical Abstracts (e.g., sections) and the Web of Science (e.g., citations). The estimated unconfounded average treatment effect on the treated (that is, Communications designated as a VIP) was statistically significant and amounted to 19.83 citations. In addition, the special feature VIP fostered the cumulated annual citation growth. For instance, the time until a Communication reached its maximum annual number of citations was reduced.
    June 13, 2016   doi: 10.1002/asi.23701   open full text
  • Is collaboration among scientists related to the citation impact of papers because their quality increases with collaboration? An analysis based on data from F1000Prime and normalized citation scores.
    Lutz Bornmann.
    Journal of the American Society for Information Science and Technology. June 11, 2016
    In recent years, the relationship between collaboration among scientists and the citation impact of papers has been frequently investigated. Most of the studies show that the two variables are closely related: An increasing collaboration activity (measured in terms of number of authors, number of affiliations, and number of countries) is associated with an increased citation impact. However, it is not clear whether the increased citation impact is based on the higher quality of papers that profit from more than one scientist giving expert input or other (citation‐specific) factors. Thus, the current study addresses this question by using two comprehensive data sets with publications (in the biomedical area) including quality assessments by experts (F1000Prime member scores) and citation data for the publications. The study is based on more than 15,000 papers. Robust regression models are used to investigate the relationship between number of authors, number of affiliations, and number of countries, respectively, and citation impact, controlling for the papers' quality (measured by F1000Prime expert ratings). The results indicate that the effect of collaboration activities on impact is largely independent of the papers' quality. The citation advantage is apparently not quality related; citation‐specific factors (e.g., self‐citations) seem to be important here.
    June 11, 2016   doi: 10.1002/asi.23728   open full text
  • Publication boost in Web of Science journals and its effect on citation distributions.
    Lovro Šubelj, Dalibor Fiala.
    Journal of the American Society for Information Science and Technology. June 11, 2016
    In this article, we show that the dramatic increase in the number of research articles indexed in the Web of Science database impacts the commonly observed distributions of citations within these articles. First, we document that the growing number of physics articles in recent years is attributed to existing journals publishing more and more articles rather than more new journals coming into being as it happens in computer science. Second, even though the references from the more recent articles generally cover a longer time span, the newer articles are cited more frequently than the older ones if the uneven article growth is not corrected for. Nevertheless, despite this change in the distribution of citations, the citation behavior of scientists does not seem to have changed.
    June 11, 2016   doi: 10.1002/asi.23718   open full text
  • Looking for “normal”: Sense making in the context of health disruption.
    Shelagh K. Genuis, Jenny Bronstein.
    Journal of the American Society for Information Science and Technology. June 06, 2016
    This investigation examines perceptions of normality emerging from two distinct studies of information behavior associated with life disrupting health symptoms and theorizes the search for normality in the context of sense making theory. Study I explored the experiences of women striving to make sense of symptoms associated with menopause; Study II examined posts from two online discussion groups for people with symptoms of obsessive compulsive disorder. Joint data analysis demonstrates that normality was initially perceived as the absence of illness. A breakdown in perceived normality because of disruptive symptoms created gaps and discontinuities in understanding. As participants interacted with information about the experiences of health‐challenged peers, socially constructed notions of normality emerged. This was internalized as a “new normal.” Findings demonstrate normality as an element of sense making that changes and develops over time, and experiential information and social contexts as central to health‐related sense making. Re‐establishing perceptions of normality, as experienced by health‐challenged peers, was an important element of sense making. This investigation provides nuanced insight into notions of normality, extends understanding of social processes involved in sense making, and represents the first theorizing of and model development for normality within the information science and sense making literature.
    June 06, 2016   doi: 10.1002/asi.23715   open full text
  • Presenting bibliographic families using information visualization: Evaluation of FRBR‐based prototype and hierarchical visualizations.
    Tanja Merčun, Maja Žumer, Trond Aalberg.
    Journal of the American Society for Information Science and Technology. June 06, 2016
    Since their beginnings, bibliographic information systems have been displaying results in the form of long, textual lists. With the development of new data models and computer technologies, the need for new approaches to present and interact with bibliographic data has slowly been maturing. To investigate how this could be accomplished, a prototype system, FrbrVis, was designed to present work families within a bibliographic information system using information visualization. This paper reports on two user studies, a controlled and an observational experiment, that have been carried out to assess the Functional Requirements for Bibliographic Records (FRBR)‐based prototype against an existing system as well as to test four different hierarchical visual layouts. The results clearly show that FrbrVis offers better performance and user experience compared to the baseline system. The differences between the four hierarchical visualizations (Indented tree, Radial tree, Circlepack, and Sunburst) were, on the other hand, not as pronounced, but the Indented tree and Sunburst designs proved to be the most successful, both in performance and in user perception. The paper therefore not only evaluates the application of a visual presentation of bibliographic work families, but also provides valuable results regarding the performance and user acceptance of individual hierarchical visualization techniques.
    June 06, 2016   doi: 10.1002/asi.23659   open full text
  • Data reusers' trust development.
    Ayoung Yoon.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    Data reuse refers to the secondary use of data—not for its original purpose but for studying new problems. Although reusing data might not yet be the norm in every discipline, the benefits of reusing shared data have been asserted by a number of researchers, and data reuse has been a major concern in many disciplines. Assessing data for trustworthiness becomes important in data reuse with the growth in data creation because of the lack of standards for ensuring data quality and potential harm from using poor‐quality data. This research explores many facets of data reusers' trust in data generated by other researchers focusing on the trust judgment process with influential factors that determine reusers' trust. The author took an interpretive qualitative approach by using in‐depth semistructured interviews as the primary research method. The study results suggest different stages of trust development associated with the process of data reuse. Data reusers' trust may remain the same throughout their experiences, but it can also be formed, lost, declined, and recovered during their data reuse experiences. These various stages reflect the dynamic nature of trust.
    June 03, 2016   doi: 10.1002/asi.23730   open full text
  • Print or digital? Reading behavior and preferences in Japan.
    Keiko Kurata, Emi Ishita, Yosuke Miyata, Yukiko Minami.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    In today's digital age, daily reading may be becoming digital reading. To understand this possible shift from reading print media to reading digital media, we investigated reading behavior for 11 media and reading preferences between print and digital in different circumstances. In August 2012, an online survey was used to inquire about the reading behavior and preference of 1,755 participants, ranging in age from 18 to 69 years. The participants contained equal numbers of men and women from five age brackets. Our main finding was that approximately 70% of total reading time was spent on digital media and that preferences favored print media. Cluster analysis of reading time by media was used to categorize respondents into eight clusters, and a second cluster analysis on stated preference (digital or print) yielded six clusters. The correspondence analysis between reading behavior clusters and preference clusters revealed that there is a mismatch between reading behavior and stated preference for either print or digital media.
    June 03, 2016   doi: 10.1002/asi.23712   open full text
  • Understanding and supporting anonymity policies in peer review.
    Syavash Nobarany, Kellogg S. Booth.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    Design of peer‐review support systems is shaped by the policies that define and govern the process of peer review. An important component of these are policies that deal with anonymity: The rules that govern the concealment and transparency of information related to identities of the various stakeholders (authors, reviewers, editors, and others) involved in the peer‐review process. Anonymity policies have been a subject of debate for several decades within scholarly communities. Because of widespread criticism of traditional peer‐review processes, a variety of new peer‐review processes have emerged that manage the trade‐offs between disclosure and concealment of identities in different ways. Based on an analysis of policies and guidelines for authors and reviewers provided by publication venues, we developed a framework for understanding how disclosure and concealment of identities is managed. We discuss the appropriate role of information technology and computer support for the peer‐review process within that framework.
    June 03, 2016   doi: 10.1002/asi.23711   open full text
  • Hypertext configurations: Genres in networked digital media.
    Niels Ole Finnemann.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    The article presents a conceptual framework for distinguishing different sorts of heterogeneous digital materials. The hypothesis is that a wide range of heterogeneous data resources can be characterized and classified due to their particular configurations of hypertext features such as scripts, links, interactive processes, and time scalings, and that the hypertext configuration is a major but not sole source of the messiness of big data. The notion of hypertext will be revalidated, placed at the center of the interpretation of networked digital media, and used in the analysis of the fast‐growing amounts of heterogeneous digital collections, assemblages, and corpora. The introduction summarizes the wider background of a fast‐changing data landscape.
    June 03, 2016   doi: 10.1002/asi.23709   open full text
  • Story‐focused reading in online news and its potential for user engagement.
    Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ricardo Baeza‐Yates.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    We study the news reading behavior of several hundred thousand users on 65 highly visited news sites. We focus on a specific phenomenon: users reading several articles related to a particular news development, which we call story‐focused reading. Our goal is to understand the effect of story‐focused reading on user engagement and how news sites can support this phenomenon. We found that most users focus on stories that interest them and that even casual news readers engage in story‐focused reading. During story‐focused reading, users spend more time reading and a larger number of news sites are involved. In addition, readers employ different strategies to find articles related to a story. We also analyze how news sites promote story‐focused reading by looking at how they link their articles to related content published by them, or by other sources. The results show that providing links to related content leads to a higher engagement of the users, and that this is the case even for links to external sites. We also show that the performance of links can be affected by their type, their position, and how many of them are present within an article.
    June 03, 2016   doi: 10.1002/asi.23707   open full text
  • The effect of social network site use on the psychological well‐being of cancer patients.
    Seyedezahra Shadi Erfani, Babak Abedin, Yvette Blount.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    Social network sites (SNSs) are growing in popularity and social significance. Although researchers have attempted to explain the effect of SNS use on users' psychological well‐being, previous studies have produced inconsistent results. In addition, most previous studies relied on healthy students as participants; other cohorts of SNSs users, in particular people living with serious health conditions, have been neglected. In this study, we carried out semistructured interviews with users of the Ovarian Cancer Australia (OCA) Facebook to assess how and in what ways SNS use impacts their psychological well‐being. A theoretical model was proposed to develop a better understanding of the relationships between SNS use and the psychological well‐being of cancer patients. Analysis of data collected through a subsequent quantitative survey confirmed the theoretical model and empirically revealed the extent to which SNS use impacts the psychological well‐being of cancer patients. Findings showed the use of OCA Facebook enhances social support, enriches the experience of social connectedness, develops social presence and learning and ultimately improves the psychological well‐being of cancer patients.
    June 03, 2016   doi: 10.1002/asi.23702   open full text
  • Going beyond intention: Integrating behavioral expectation into the unified theory of acceptance and use of technology.
    Likoebe M. Maruping, Hillol Bala, Viswanath Venkatesh, Susan A. Brown.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    Research on information technology (IT) adoption and use, one of the most mature streams of research in the information science and information systems literature, is primarily based on the intentionality framework. Behavioral intention (BI) to use an IT is considered the sole proximal determinant of IT adoption and use. Recently, researchers have discussed the limitations of BI and argued that behavioral expectation (BE) would be a better predictor of IT use. However, without a theoretical and empirical understanding of the determinants of BE, we remain limited in our comprehension of what factors promote greater IT use in organizations. Using the unified theory of acceptance and use of technology as the theoretical framework, we develop a model that posits 2 determinants (i.e., social influence and facilitating conditions) of BE and 4 moderators (i.e., gender, age, experience, and voluntariness of use) of the relationship between BE and its determinants. We argue that the cognitions underlying the formation of BI and BE differ. We found strong support for the proposed model in a longitudinal field study of 321 users of a new IT. We offer theoretical and practical IT implications of our findings.
    June 03, 2016   doi: 10.1002/asi.23699   open full text
  • News censorship in online social networks: A study of circumvention in the commentsphere.
    David G. Schwartz, Inbal Yahav, Gahl Silverman.
    Journal of the American Society for Information Science and Technology. June 03, 2016
    This study investigates the interplay between online news, reader comments, and social networks to detect and characterize comments leading to the revelation of censored information. Censorship of identity occurs in different contexts–for example, the military censors the identity of personnel and the judiciary censors the identity of minors and victims. We address three objectives: (a) assess the relevance of identity censorship in the presence of user‐generated comments, (b) understand the fashion of censorship circumvention (what people say and how), and (c) determine how comment analysis can aid in identifying decensorship and information leakage through comments. After examining 3,582 comments made on 48 articles containing obfuscated terms, we find that a systematic examination of comments can compromise identity censorship. We identify and categorize information leakage in comments indicative of knowledge of censored information that may result in information decensorship. We show that the majority of censored articles contained at least one comment leading to censorship circumvention.
    June 03, 2016   doi: 10.1002/asi.23698   open full text
  • Measuring metrics ‐ a 40‐year longitudinal cross‐validation of citations, downloads, and peer review in astrophysics.
    Michael J. Kurtz, Edwin A. Henneken.
    Journal of the American Society for Information Science and Technology. April 22, 2016
    Citation measures and newer altmetric measures, such as downloads, are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current, or future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 individuals who received a U.S. PhD in astronomy in the period 1972‐1976. By examining the same and different measures at the same and different times for the same individuals we are able to show the capabilities and limitations of each measure. Because the distributions are lognormal, measurement uncertainties are multiplicative; we show that in order to state with 95% confidence that one person's citations and downloads are significantly higher than another person's, the log difference in the ratio of counts must be at least 0.3 dex, which corresponds to a multiplicative factor of 2.
    April 22, 2016   doi: 10.1002/asi.23689   open full text
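    The dex-to-factor conversion stated in the abstract above can be checked with a few lines of arithmetic. This is only an illustrative sketch, not the authors' code: the function names `dex_to_factor` and `differs_significantly` are hypothetical, and the only number taken from the abstract is the 0.3 dex threshold.

```python
import math


def dex_to_factor(dex: float) -> float:
    """Convert a base-10 logarithmic difference (dex) to a multiplicative factor."""
    return 10.0 ** dex


def differs_significantly(count_a: float, count_b: float,
                          threshold_dex: float = 0.3) -> bool:
    """Return True when two positive counts differ by at least threshold_dex.

    The 0.3 dex default is the threshold quoted in the abstract; applying it
    as a simple cut-off like this is an assumption made for illustration.
    """
    return abs(math.log10(count_a / count_b)) >= threshold_dex


if __name__ == "__main__":
    # 10 ** 0.3 is about 1.995, i.e. roughly a factor of 2.
    print(round(dex_to_factor(0.3), 3))
    # 200 vs. 100 citations is a log10 ratio of about 0.301, just above the threshold.
    print(differs_significantly(200, 100))
    # 150 vs. 100 citations is about 0.176 dex, below the threshold.
    print(differs_significantly(150, 100))
```

    Under this reading, one researcher would need roughly twice another's citation or download count before the difference clears the quoted threshold.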
  • Toward multiviewpoint ontology construction by collaboration of non‐experts and crowdsourcing: The case of the effect of diet on health.
    Maayan Zhitomirsky‐Geffet, Eden S. Erez, Judit Bar‐Ilan.
    Journal of the American Society for Information Science and Technology. April 22, 2016
    Domain experts are skilled in building a narrow ontology that reflects their subfield of expertise based on their work experience and personal beliefs. We call this type of ontology a single‐viewpoint ontology. There can be a variety of such single‐viewpoint ontologies that represent a wide spectrum of subfields and expert opinions on the domain. However, to have a complete formal vocabulary for the domain they need to be linked and unified into a multiviewpoint model while having the subjective viewpoint statements marked and distinguished from the objectively true statements. In this study, we propose and implement a two‐phase methodology for multiviewpoint ontology construction by nonexpert users. The proposed methodology was implemented for the domain of the effect of diet on health. A large‐scale crowdsourcing experiment was conducted with about 750 ontological statements to determine whether each of these statements is objectively true, viewpoint, or erroneous. Typically, in crowdsourcing experiments the workers are asked for their personal opinions on the given subject. However, in our case their ability to objectively assess others' opinions was examined as well. Our results show substantially higher accuracy in classification for the objective assessment approach compared to the results based on personal opinions.
    April 22, 2016   doi: 10.1002/asi.23686   open full text
  • Attitudes of referees in a multidisciplinary journal: An empirical analysis.
    Niccolò Casnici, Francisco Grimaldo, Nigel Gilbert, Flaminio Squazzoni.
    Journal of the American Society for Information Science and Technology. April 15, 2016
    This paper looks at 10 years of reviews in a multidisciplinary journal, The Journal of Artificial Societies and Social Simulation (JASSS), which is the flagship journal of social simulation. We measured referee behavior and referees' agreement. We found that the disciplinary background and the academic status of the referee have an influence on the report time, the type of recommendation and the acceptance of the reviewing task. Referees from the humanities tend to be more generous in their recommendations than other referees, especially economists and environmental scientists. Second, we found that senior researchers are harsher in their judgments than junior researchers, and the latter accept requests to review more often and are faster in reporting. Finally, we found that articles that had been refereed and recommended for publication by a multidisciplinary set of referees were subsequently more likely to receive citations than those that had been reviewed by referees from the same discipline. Our results show that common standards of evaluation can be established even in multidisciplinary communities.
    April 15, 2016   doi: 10.1002/asi.23665   open full text
  • An empirical look at the nature index.
    Lutz Bornmann, Robin Haunschild.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    In November 2014, the Nature Index (NI) was introduced (see http://www.natureindex.com) by the Nature Publishing Group (NPG). The NI comprises the primary research articles published in the past 12 months in a selection of reputable journals. Starting from two short comments on the NI (Haunschild & Bornmann), we undertake an empirical analysis of the NI using comprehensive country data. We investigate whether the huge efforts of computing the NI are justified and whether the size‐dependent NI indicators should be complemented by size‐independent variants. The analysis uses data from the Max Planck Digital Library in‐house database (which is based on Web of Science data) and from the NPG. In the first step of the analysis, we correlate the NI with other metrics that are simpler to generate than the NI. The resulting large correlation coefficients indicate that the NI produces results similar to those of simpler solutions. In the second step of the analysis, relative and size‐independent variants of the NI are generated that should be additionally presented by the NPG. The size‐dependent NI indicators favor large countries (or institutions) and the top‐performing small countries (or institutions) do not come into the picture.
    April 04, 2016   doi: 10.1002/asi.23682   open full text
  • Bridging the gap between wikipedia and academia.
    Dariusz Jemielniak, Eduard Aibar.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    In this opinion piece, we would like to present a short literature review of perceptions and reservations towards Wikipedia in academia, address the common questions about overall reliability of Wikipedia entries, review the actual practices of Wikipedia usage in academia, and conclude with possible scenarios for a peaceful coexistence. Because Wikipedia is a regular topic of JASIST publications (Lim, 2009; Meseguer‐Artola, Aibar, Lladós, Minguillón, & Lerga; Mesgari, Okoli, Mehdi, Nielsen, & Lanamäki; Okoli, Mehdi, Mesgari, Nielsen, & Lanamäki), we hope to start a useful discussion with the right audience.
    April 04, 2016   doi: 10.1002/asi.23691   open full text
  • Metadata, infrastructure, and computer‐mediated communication in historical perspective.
    Bradley Fidler, Amelia Acker.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    In this paper we describe the creation and use of metadata on the early Arpanet as part of normal network function. By using the Arpanet Host‐Host Protocol and its sockets as an entry point for studying the generation of metadata, we show that the development and function of key Arpanet infrastructure can be studied by examining the creation and stabilization of metadata. More specifically, we use the Host‐Host Protocol's sockets as an example of something that, at the level of the network, functions as both network infrastructure and metadata simultaneously. By presenting the function of sockets in tandem with an overview of the Host‐Host Protocol, we argue for the further integrated study of infrastructure and metadata. Finally, we reintroduce the concept of infradata to refer specifically to data that locate data throughout an infrastructure and are required by the infrastructure to function, separating them from established and stabilized standards. We argue for the future application of infradata as a concept for the study of histories and political economies of networks, bridging the largely library and information science (LIS) study of metadata with the largely science and technology studies (STS) domain of infrastructure.
    April 04, 2016   doi: 10.1002/asi.23660   open full text
  • Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today.
    Volkmar Engerer.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, “disciplined”/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms—physical, cognitive, and computational—as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, “keyword collocation analysis,” is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR‐linguistics relationship and connected to different ways of using linguistic theory in information science and IR.
    April 04, 2016   doi: 10.1002/asi.23684   open full text
  • Impact in interdisciplinary and cross‐sector research: Opportunities and challenges.
    Daniel Gooch, Asimina Vasalou, Laura Benton.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    Impact is embedded in today's research culture, with increasing importance being placed on the value of research to society. In interdisciplinary and cross‐sector projects, team members may hold distinct views on the types of impact they want to create. Set in the context of an interdisciplinary, cross‐sector project comprised of partners from academia, industry, and the nonprofit sector, our paper unpacks how these diverse project members understand impact. Our analysis shows that interdisciplinary projects offer a unique opportunity to create impact on a number of different levels. Moreover, it demonstrates that a lack of accountable design and collaboration practices can potentially hinder pathways to impact. Finally, we find that the interdisciplinary perspectives that such projects introduce encourage a rich gamut of sustainable outcomes that go beyond commercialization. Our findings support researchers working in these complex contexts to appreciate the opportunities and challenges involved in interdisciplinary cross‐sector research contexts while imparting them with strategies for overcoming these challenges.
    April 04, 2016   doi: 10.1002/asi.23658   open full text
  • Author publication preferences and journal competition.
    Ji‐Lung Hsieh.
    Journal of the American Society for Information Science and Technology. April 04, 2016
    The processes that authors use to publish their papers in journals can be analyzed in terms of field‐specific practices. How they select targeted publications can influence competitive relationships among journals. In this paper, the author quantifies the publishing choices of a set of scholars to confirm this ecological perspective. The results indicate a strong focus on a small number of journals. A measure of author publishing choices was used to define four ecological characteristics: coverage, coreness, exclusivity, and journal overlap. Several types of journals indexed in the Information Science and Library Science section of the Journal Citation Reports are compared in terms of their ecological characteristics. The data show that some journals cover large numbers of authors, but compete with other journals in subcommunities. Some journals with author profiles similar to those of high‐ranking journals lost potential submissions. Others with low coverage, high coreness, and high exclusivity were found to have groups of “fans” who used them for all of their submissions, but still exhibited a strong need to sustain their exclusivity. It is hoped that the method and results presented in this paper will provide useful information for editorial boards interested in managing their submissions according to author profiles.
    April 04, 2016   doi: 10.1002/asi.23657   open full text
  • How collaborators make sense of tasks together: A comparative analysis of collaborative sensemaking behavior in collaborative information‐seeking tasks.
    Yihan Tao, Anastasios Tombros.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    Collaborative information‐seeking (CIS) tasks, such as holiday planning, academic research, and medical/health information seeking, cannot be tackled without making sense of the task and the encountered information together with collaborators, that is, collaborative sensemaking. In CIS, collaborative sensemaking is an important but understudied aspect. A thorough understanding of collaborative sensemaking behavior in CIS tasks is essential to develop tools to support collaborative sensemaking activities in CIS. In this article, we investigate the general patterns and differences in collaborative sensemaking behavior in travel planning and topic research tasks using the data from 2 observational user studies. The results show the common stages of the collaborative sensemaking process and the differences in users' collaborative sensemaking strategies and activities between the 2 tasks. This comparative study enhances our understanding of the collaborative sensemaking process in CIS tasks and the differences in users' sensemaking behavior according to tasks, and describes implications for supporting collaborative sensemaking behavior in CIS tasks.
    March 28, 2016   doi: 10.1002/asi.23693   open full text
  • Information management in the humanities: Scholarly processes, tools, and the construction of personal collections.
    Ciaran B. Trace, Unmil P. Karadkar.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    The promise and challenge of information management in the humanities have garnered a great deal of attention and interest (Bulger et al.; Freiman et al.; Trace & Karadkar; University of Minnesota Libraries; Wilson & Patrick). Research libraries and archives, as well as groups from within the humanities disciplines themselves, are being tasked with providing robust support for information management practices, including helping to engage humanities scholars with appropriate digital technologies in ways that are sensitive to disciplinary‐based cultures and practices. However, significant barriers impede this work, primarily because the infrastructure (services, tools, and collaborative networks) to support scholarly information management is still under development. Under the aegis of the Scholars Tracking Archival Resources (STAR) project we are studying how humanities scholars gather and manage primary source materials with a goal of developing software to support their information management practices. This article reports the findings from our interviews with 26 humanities scholars, in conjunction with a set of initial requirements for a mobile application that will support scholars in capturing documents, recreating the archival context, and uploading these documents to cloud storage for access and sharing from other devices.
    March 28, 2016   doi: 10.1002/asi.23678   open full text
  • Beyond university rankings? Generating new indicators on universities by linking data in open platforms.
    Cinzia Daraio, Andrea Bonaccorsi.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    The need for new indicators on universities is growing enormously. Governments and decision makers at all levels are faced with the huge opportunities generated by the availability of new knowledge and information and, simultaneously, are pressed by tight budget constraints. University rankings, in particular, are attracting policy and media attention, but at the same time receive harsh methodological criticism. After summarizing the main criticisms of rankings, we describe 2 trends in the user requirements for indicators, namely, granularity and cross‐referencing. We then suggest that a change in the paradigm of the design and production of indicators is needed. The traditional approach is one that not only leverages the existing data but also suggests heavy investment to integrate existing databases and to build up tailored indicators. We show, based on the European universities case, how the intelligent integration of existing data may lead to an open‐linked data platform which permits the construction of new indicators. The power of the approach derives from the ability to combine heterogeneous sources of data to generate indicators that address a variety of user requirements without the need to design indicators on a custom basis.
    March 28, 2016   doi: 10.1002/asi.23679   open full text
  • The use of a graph‐based system to improve bibliographic information retrieval: System design, implementation, and evaluation.
    Yongjun Zhu, Erjia Yan, Il‐Yeol Song.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    In this article, we propose a graph‐based interactive bibliographic information retrieval system, GIBIR. GIBIR provides an effective way to retrieve bibliographic information. The system represents bibliographic information as networks and provides a form‐based query interface. Users can develop their queries interactively by referencing the system‐generated graph queries. Complex queries such as “papers on information retrieval, which were cited by John's papers that had been presented in SIGIR” can be effectively answered by the system. We evaluate the proposed system by developing another relational database‐based bibliographic information retrieval system with the same interface and functions. Experimental results show that the proposed system executes the same queries much faster than the relational database‐based system, and on average, our system reduced the execution time by 72% (for 3‐node queries), 89% (for 4‐node queries), and 99% (for 5‐node queries).
    March 28, 2016   doi: 10.1002/asi.23677   open full text
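The example query quoted in the abstract above ("papers on information retrieval, which were cited by John's papers that had been presented in SIGIR") can be read as a short traversal over a bibliographic graph. The following is a minimal sketch of that idea using plain Python objects. It is not GIBIR's implementation or query language, and the `Paper` class, its fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical node type: one paper plus its outgoing citation edges."""
    title: str
    topics: set
    venue: str
    authors: set
    cites: list = field(default_factory=list)  # titles of cited papers


def cited_by_author_at_venue(papers, author, venue, topic):
    """Titles of papers on `topic` that are cited by `author`'s papers at `venue`."""
    result = set()
    for paper in papers.values():
        # Start nodes: papers by the given author presented at the given venue.
        if author in paper.authors and paper.venue == venue:
            # Follow citation edges and keep only targets on the requested topic.
            for cited_title in paper.cites:
                cited = papers.get(cited_title)
                if cited is not None and topic in cited.topics:
                    result.add(cited_title)
    return sorted(result)


# Invented sample data, for illustration only.
papers = {
    "A": Paper("A", {"information retrieval"}, "TREC", {"Alice"}),
    "B": Paper("B", {"databases"}, "VLDB", {"Bob"}),
    "J1": Paper("J1", {"ranking"}, "SIGIR", {"John"}, cites=["A", "B"]),
    "J2": Paper("J2", {"ranking"}, "CIKM", {"John"}, cites=["B"]),
}

print(cited_by_author_at_venue(papers, "John", "SIGIR", "information retrieval"))
# ['A']
```

In a relational store the same question would typically need joins across author, paper, venue, and citation tables, which is one plausible reading of why the abstract reports larger speedups as the number of query nodes grows.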
  • ResearchGate articles: Age, discipline, audience size, and impact.
    Mike Thelwall, Kayvan Kousha.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    The large multidisciplinary academic social website ResearchGate aims to help academics to connect with each other and to publicize their work. Despite its popularity, little is known about the age and discipline of the articles uploaded and viewed in the site and whether publication statistics from the site could be useful impact indicators. In response, this article assesses samples of ResearchGate articles uploaded at specific dates, comparing their views in the site to their Mendeley readers and Scopus‐indexed citations. This analysis shows that ResearchGate is dominated by recent articles, which attract about three times as many views as older articles. ResearchGate has uneven coverage of scholarship, with the arts and humanities, health professions, and decision sciences poorly represented and some fields receiving twice as many views per article as others. View counts for uploaded articles have low to moderate positive correlations with both Scopus citations and Mendeley readers, which is consistent with them tending to reflect a wider audience than Scopus‐publishing scholars. Hence, for articles uploaded to the site, view counts may give a genuinely new audience indicator.
    March 28, 2016   doi: 10.1002/asi.23675   open full text
  • Health information technologies: From hazardous to the dark side.
    Carol Saunders, Anne F. Rutkowski, Jon Pluyter, Ronald Spanjers.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    This article explores the effects of health information technologies (HIT) in operating rooms (ORs). When functioning well, HIT are a boon to mankind. However, HIT in the OR also create hazards for patients for a number of interrelated reasons. We introduce 5 interrelated components of hazard situations for medical teams operating in the OR: complexity, overload/underload, inadequate individual training, inadequate training of medical teams, and overconfidence of surgeons. These components of hazard situations in the OR may negatively impact patient safety. We discuss implications, especially in terms of individuals and medical teams in the OR, as well as work substitution as a broader aspect of the potential dark side of health IT.
    March 28, 2016   doi: 10.1002/asi.23671   open full text
  • Dimensions of trust in scholarly communication: Problematizing peer review in the aftermath of John Bohannon's “Sting” in science.
    Jutta Haider, Fredrik Åström.
    Journal of the American Society for Information Science and Technology. March 28, 2016
    This study investigates online material published in reaction to a Science Magazine report showing the absence of peer‐review and editorial processes in a set of fee‐charging open access journals in biology. Quantitative and qualitative textual analyses are combined to map conceptual relations in these reactions, and to explore how understandings of scholarly communication and publishing relate to specific conceptualizations of science and of the hedging of scientific knowledge. A discussion of the connection of trust and scientific knowledge and of the role of peer review for establishing and communicating this connection provides for the theoretical and topical framing. Special attention is paid to the pervasiveness of digital technologies in formal scholarly communication processes. Three dimensions of trust are traced in the material analyzed: (a) trust through personal experience and informal knowledge, (b) trust through organized, internal control, (c) trust through form. The article concludes by discussing how certain understandings of the conditions for trust in science are challenged by perceptions of possibilities for deceit in digital environments.
    March 28, 2016   doi: 10.1002/asi.23669   open full text
  • Contributions to conceptual growth: The elaboration of Ellis's model for information‐seeking behavior.
    Reijo Savolainen.
    Journal of the American Society for Information Science and Technology. March 24, 2016
    Using Ellis's seminal model of information seeking as an example, this study demonstrates how the elaborations made to the original framework since the late 1980s have contributed to conceptual growth in information‐seeking studies. To this end, nine key studies elaborating Ellis's model were scrutinized by conceptual analysis. The findings indicate that the elaborations are based on two main approaches: adding novel, context‐specific components in the model and redefining and restructuring the components. The elaborations have contributed to conceptual growth in three major ways. First, integrating formerly separate parts of knowledge; second, generalizing and explaining lower abstraction‐level knowledge through higher‐level constructs; and third, expanding knowledge by identifying new characteristics of the object of study, that is, information‐seeking behavior. Further elaboration of Ellis's model toward a theory would require more focused attempts to test hypotheses in work‐related environments in particular.
    March 24, 2016   doi: 10.1002/asi.23680   open full text
  • Leveraging metadata to recommend keywords for academic papers.
    Ido Blank, Lior Rokach, Guy Shani.
    Journal of the American Society for Information Science and Technology. March 15, 2016
    Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords for the authors of new papers. We create a data set of research papers, and their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.
    March 15, 2016   doi: 10.1002/asi.23571   open full text
  • A general multiview framework for assessing the quality of collaboratively created content on web 2.0.
    Daniel H. Dalip, Marcos André Gonçalves, Marco Cristo, Pável Calado.
    Journal of the American Society for Information Science and Technology. March 14, 2016
    User‐generated content is one of the most interesting phenomena of current published media, as users are now able not only to consume, but also to produce content in a much faster and easier manner. However, such freedom also carries concerns about content quality. In this work, we propose an automatic framework to assess the quality of collaboratively generated content. Quality is addressed as a multidimensional concept, modeled as a combination of independent assessments, each regarding different quality dimensions. Accordingly, we adopt a machine‐learning (ML)‐based multiview approach to assess content quality. We perform a thorough analysis of our framework on two different domains: Question and Answer Forums and Collaborative Encyclopedias. This allowed us to better understand when and how the proposed multiview approach is able to provide accurate quality assessments. Our main contributions are: (a) a general ML multiview framework that takes advantage of different views of quality indicators; (b) the improvement (up to 30%) in quality assessment over the best state‐of‐the‐art baseline methods; (c) a thorough feature and view analysis regarding impact, informativeness, and correlation, based on two distinct domains.
    March 14, 2016   doi: 10.1002/asi.23650   open full text
  • Assessing geographic relevance for mobile search: A computational model and its validation via crowdsourcing.
    Tumasch Reichenbacher, Stefano De Sabbata, Ross S. Purves, Sara I. Fabrikant.
    Journal of the American Society for Information Science and Technology. March 09, 2016
    The selection and retrieval of relevant information from the information universe on the web is becoming increasingly important in addressing information overload. It has also been recognized that geography is an important criterion of relevance, leading to the research area of geographic information retrieval. As users increasingly retrieve information in mobile situations, relevance is often related to geographic features in the real world as well as their representation in web documents. We present 2 methods for assessing geographic relevance (GR) of geographic entities in a mobile use context that include the 5 criteria of topicality, spatiotemporal proximity, directionality, cluster, and colocation. To determine the effectiveness and validity of these methods, we evaluate them through a user study conducted on the Amazon Mechanical Turk crowdsourcing platform. An analysis of relevance ranks for geographic entities in 3 scenarios produced by 2 GR methods, 2 baseline methods, and human judgments collected in the experiment reveals that one of the GR methods produces ranks similar to those of human assessors.
    March 09, 2016   doi: 10.1002/asi.23625   open full text
  • Shared values, new vision: Collaboration and communities of practice in virtual reference and SQA.
    Marie L. Radford, Lynn Silipigni Connaway, Stephanie Mikitish, Mark Alpert, Chirag Shah, Nicole A. Cooke.
    Journal of the American Society for Information Science and Technology. March 06, 2016
    This investigation of new approaches to improving collaboration, user/librarian experiences, and sustainability for virtual reference services (VRS) reports findings from a grant project titled “Cyber Synergy: Seeking Sustainability between Virtual Reference and Social Q&A Sites” (Radford, Connaway, & Shah, –2014). In‐depth telephone interviews with 50 VRS librarians included questions on collaboration, referral practices, and attitudes toward Social Question and Answer (SQA) services using the Critical Incident Technique (Flanagan). The Community of Practice (CoP) (Wenger; Davies) framework was found to be a useful conceptualization for understanding VRS professionals' approaches to their work. Findings indicate that participants usually refer questions from outside of their area of expertise to other librarians, but occasionally refer them to nonlibrarian experts. These referrals are made possible because participants believe that other VRS librarians are qualified and willing collaborators. Barriers to collaboration include not knowing appropriate librarians/experts for referral, inability to verify credentials, and perceived unwillingness to collaborate. Facilitators to collaboration include knowledge of appropriate collaborators who are qualified and willingness to refer. Answers from SQA services were perceived as less objective and authoritative, but participants were open to collaborating with nonlibrarian experts with confirmation of professional expertise or extensive knowledge.
    March 06, 2016   doi: 10.1002/asi.23668   open full text
  • Measuring technological distance for patent mapping.
    Bowen Yan, Jianxi Luo.
    Journal of the American Society for Information Science and Technology. March 06, 2016
    Recent works in the information science literature have presented cases of using patent databases and patent classification information to construct network maps of technology fields, which aim to aid in competitive intelligence analysis and innovation decision making. Constructing such a patent network requires a proper measure of the distance between different classes of patents in the patent classification systems. Despite the existence of various distance measures in the literature, it is unclear how to consistently assess and compare them, and which ones to select for constructing patent technology network maps. This ambiguity has limited the development and applications of such technology maps. Herein, we propose to compare alternative distance measures and identify the superior ones by analyzing the differences and similarities in the structural properties of resulting patent network maps. Using United States patent data from 1976 to 2006 and the International Patent Classification (IPC) system, we compare 12 representative distance measures, which quantify interfield knowledge base proximity, field‐crossing diversification likelihood or frequency of innovation agents, and co‐occurrences of patent classes in the same patents. Our comparative analyses suggest the patent technology network maps based on normalized coreference and inventor diversification likelihood measures are the best representatives.
    March 06, 2016   doi: 10.1002/asi.23664   open full text
  • The application of bibliometrics to research evaluation in the humanities and social sciences: An exploratory study using normalized Google Scholar data for the publications of a research institute.
    Lutz Bornmann, Andreas Thor, Werner Marx, Hermann Schier.
    Journal of the American Society for Information Science and Technology. March 03, 2016
    In the humanities and social sciences, bibliometric methods for the assessment of research performance are (so far) less common. This study uses a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from Google Scholar (GS). In order to use GS for a bibliometric study, we developed procedures for the normalization of citation impact, building on the procedures of classical bibliometrics. In order to test the convergent validity of the normalized citation impact scores, we calculated normalized scores for a subset of the publications based on data from the Web of Science (WoS) and Scopus. Even if scores calculated with the help of GS and the WoS/Scopus are not identical for the different publication types (considered here), they are so similar that they result in the same assessment of the institute investigated in this study: For example, the institute's papers whose journals are covered in the WoS are cited at about an average rate (compared with the other papers in the journals).
    March 03, 2016   doi: 10.1002/asi.23627   open full text
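The abstract above describes normalizing citation impact and comparing an institute's papers with "the other papers in the journals". One common form of such normalization in classical bibliometrics divides a paper's citation count by the mean citation count of a reference set. The following is a minimal sketch of that general idea only; it is an assumption rather than the authors' actual procedure, and the function name and sample numbers are hypothetical.

```python
from statistics import mean


def normalized_citation_score(paper_citations, reference_citations):
    """Ratio of a paper's citations to the mean citations of its reference set.

    A score near 1.0 means the paper is cited at about the average rate of the
    reference set (for example, other papers in the same journal and year).
    """
    if not reference_citations:
        raise ValueError("reference set must not be empty")
    expected = mean(reference_citations)
    if expected == 0:
        raise ValueError("reference set has no citations; score is undefined")
    return paper_citations / expected


# Invented numbers: a paper with 10 citations against a reference set averaging 10.
print(normalized_citation_score(10, [5, 10, 15]))  # 1.0
```

A ratio of this kind is scale-free, which is one reason scores derived from different databases can lead to the same overall assessment even when their raw citation counts differ.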
  • Time‐based tags for fiction movies: comparing experts to novices using a video labeling game.
    Liliana Melgar Estrada, Michiel Hildebrand, Victor de Boer, Jacco van Ossenbruggen.
    Journal of the American Society for Information Science and Technology. March 02, 2016
    The cultural heritage sector has embraced social tagging as a way both to increase access to online content and to engage users with their digital collections. In this article, we build on two current lines of research. (a) We use Waisda?, an existing labeling game, to add time‐based annotations to content. (b) In this context, we investigate the role of experts in human‐based computation (nichesourcing). We report on a small‐scale experiment in which we applied Waisda? to content from film archives. We study the differences in the type of time‐based tags between experts and novices for film clips in a crowdsourcing setting. The findings show high similarity in the number and type of tags (mostly factual). In the less frequent tags, however, experts used more domain‐specific terms. We conclude that competitive games are not suited to eliciting real expert‐level descriptions. We also confirm that providing guidelines, based on conceptual frameworks that are more suited to moving images in a time‐based fashion, could result in increasing the quality of the tags, thus allowing for creating more tag‐based innovative services for online audiovisual heritage.
    March 02, 2016   doi: 10.1002/asi.23656   open full text
  • ASK: A taxonomy of accuracy, social, and knowledge information seeking posts in social question and answering.
    Zhe Liu, Bernard J. Jansen.
    Journal of the American Society for Information Science and Technology. March 02, 2016
    Many people turn to their social networks to find information through the practice of question and answering. We believe it is necessary to use different answering strategies based on the type of questions to accommodate the different information needs. In this research, we propose the ASK taxonomy that categorizes questions posted on social networking sites into three types according to the nature of the questioner's inquiry: accuracy, social, or knowledge. To automatically decide which answering strategy to use, we develop a predictive model based on ASK question types using lexical, topical, contextual, and syntactic question features as well as answer features. By applying the classifier to an annotated data set, we present a comprehensive analysis to compare questions in terms of their word usage, topical interests, temporal and spatial restrictions, syntactic structure, and response characteristics. Our research results show that the three types of questions exhibited different characteristics in the way they are asked. Our automatic classification algorithm achieves an 83% correct labeling result, showing the value of the ASK taxonomy for the design of social question and answering systems.
    March 02, 2016   doi: 10.1002/asi.23655
  • Factors motivating, demotivating, or impeding information seeking and use by people with type 2 diabetes: A call to work toward preventing, identifying, and addressing incognizance.
    Beth St. Jean.
    Journal of the American Society for Information Science and Technology. March 02, 2016
    Type 2 diabetes has grown increasingly prevalent over recent decades, now affecting nearly 400 million people worldwide; however, nearly half of these individuals have no idea they have it. Consumer health information behavior (CHIB), which encompasses people's health‐related information needs as well as the ways in which they interact (or do not interact) with health‐related information, plays an important role in people's ability to prevent, cope with, and successfully manage a serious chronic disease across time. In this mixed‐method longitudinal study, the CHIB of 34 people with type 2 diabetes is explored with the goal of identifying the factors that motivate, demotivate, or impede their diabetes‐related information seeking and use. The findings reveal that while these processes can be motivated by many different factors and can lead to important benefits, there are significant barriers (such as “incognizance,” defined herein as having an information need that one is not aware of) that may demotivate or impede their information seeking and use. The implications of these findings are discussed, focusing on how we might work toward preventing, identifying, and addressing incognizance among this population, ensuring they have the information they need when it can be of the most use to them.
    March 02, 2016   doi: 10.1002/asi.23652
  • The influence of diversity and experience on the effects of crowd size.
    Lionel P. Robert, Daniel M. Romero.
    Journal of the American Society for Information Science and Technology. March 02, 2016
    One advantage of crowds over traditional teams is that crowds enable the assembling of a large number of individuals to address problems. The literature is unclear, however, about when crowd size leads to better outcomes. To better understand the effects of crowd size we conducted a study on the retention and performance of 4,317 articles in the WikiProject Film community. Results indicate that crowd composition, specifically diversity and experience, is vital to understanding when size leads to better retention and performance. Crowd size was positively related to retention and performance when crowds were high in diversity and experience. Retention was important to determining when crowd size led to better performance. Crowd size was positively related to performance when retention was low. Our results suggest that crowds benefit from their size when they are diverse, experienced, and have low retention rates.
    March 02, 2016   doi: 10.1002/asi.23653
  • A framework for evaluating multimodal music mood classification.
    Xiao Hu, Kahyun Choi, J. Stephen Downie.
    Journal of the American Society for Information Science and Technology. January 13, 2016
    This research proposes a framework for music mood classification that uses multiple and complementary information sources, namely, music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set of 18 mood categories show that combining lyrics and audio significantly outperformed systems using audio‐only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, the examination of learning curves shows that the hybrid systems using lyrics and audio needed fewer training samples and shorter audio clips to achieve the same or better classification accuracies than systems using lyrics or audio alone. Last but not least, performance comparisons reveal the relative importance of audio and lyric features across mood categories.
    January 13, 2016   doi: 10.1002/asi.23649
  • Predicting the impact of scientific concepts using full‐text features.
    Kathy McKeown, Hal Daume, Snigdha Chaturvedi, John Paparrizos, Kapil Thadani, Pablo Barrio, Or Biran, Suvarna Bothe, Michael Collins, Kenneth R. Fleischmann, Luis Gravano, Rahul Jha, Ben King, Kevin McInerney, Taesun Moon, Arvind Neelakantan, Diarmuid O'Seaghdha, Dragomir Radev, Clay Templeton, Simone Teufel.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.
    January 11, 2016   doi: 10.1002/asi.23612
  • The effect of personalization provider characteristics on privacy attitudes and behaviors: An Elaboration Likelihood Model approach.
    Alfred Kobsa, Hichang Cho, Bart P. Knijnenburg.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    Many computer users today value personalization but perceive it in conflict with their desire for privacy. They therefore tend not to disclose data that would be useful for personalization. We investigate how characteristics of the personalization provider influence users' attitudes towards personalization and their resulting disclosure behavior. We propose an integrative model that links these characteristics via privacy attitudes to actual disclosure behavior. Using the Elaboration Likelihood Model, we discuss in what way the influence of the manipulated provider characteristics is different for users engaging in different levels of elaboration (represented by the user characteristics of privacy concerns and self‐efficacy). We find particularly that (a) reputation management is effective when users predominantly use the peripheral route (i.e., a low level of elaboration), but much less so when they predominantly use the central route (i.e., a high level of elaboration); (b) client‐side personalization has a positive impact when users use either route; and (c) personalization in the cloud does not work well in either route. Managers and designers can use our results to instill more favorable privacy attitudes and increase disclosure, using different techniques that depend on each user's levels of privacy concerns and privacy self‐efficacy.
    January 11, 2016   doi: 10.1002/asi.23629
  • Using course‐subject co‐occurrence (CSCO) to reveal the structure of an academic discipline: A framework to evaluate different inputs of a domain map.
    Peter A. Hook.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    This article proposes, exemplifies, and validates the use of course‐subject co‐occurrence (CSCO) data to generate topic maps of an academic discipline. A CSCO event is when 2 course‐subjects are taught in the same academic year by the same teacher. A total of 61,856 CSCO events were extracted from the 2010–11 directory of the American Association of Law Schools and used to visualize the structure of law school education in the United States. Different normalization, ordination (layout), and clustering algorithms were compared and the best performing algorithm of each type was used to generate the final map. Validation studies demonstrate that CSCO produces topic maps that are consistent with expert opinion and 4 other indicators of the topical similarity of law school course‐subjects. This research is the first to use CSCO to produce a visualization of a domain. It is also the first to use an expanded, multi‐part gold standard to evaluate the validity of domain maps and the intermediate steps in their creation. It is suggested that the framework used herein may be adopted for other studies that compare different inputs of a domain map in order to empirically derive the best maps as measured against extrinsic sources of topical similarity (gold standards).
    January 11, 2016   doi: 10.1002/asi.23630
  • Can “hot spots” in the sciences be mapped using the dynamics of aggregated journal–journal citation relations?
    Loet Leydesdorff, Wouter Nooy.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    Using 3 years of the Journal Citation Reports (2011, 2012, and 2013), indicators of transitions in 2012 (between 2011 and 2013) were studied using methodologies based on entropy statistics. Changes can be indicated at the level of journals using the margin totals of entropy production along the row or column vectors, but also at the level of links among journals by importing the transition matrices into network analysis and visualization programs (and using community‐finding algorithms). Seventy‐four journals were flagged in terms of discontinuous changes in their citations, but 3,114 journals were involved in “hot” links. Most of these links are embedded in a main component; 78 clusters (containing 172 journals) were flagged as potential “hot spots” emerging at the network level. An additional finding was that PLoS ONE introduced a new communication dynamic into the database. The limitations of the methodology were elaborated using an example. The results of the study indicate where developments in the citation dynamics can be considered as significantly unexpected. This can be used as heuristic information, but what a “hot spot” in terms of the entropy statistics of aggregated citation relations means substantively can be expected to vary from case to case.
    January 11, 2016   doi: 10.1002/asi.23634
  • Scholarly publication and collaboration in Brazil: The role of geography.
    Otávio José Guerci Sidone, Eduardo Amaral Haddad, Jesús Pascual Mena‐Chalco.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    Brazilian scholarly output has rapidly increased, accompanied by the expansion of domestic collaborations. In this paper, we identify spatial patterns of collaboration in Brazil and measure the role of geographic proximity in determining the interaction among researchers. Using a database comprising more than one million researchers and seven million publications, we consolidated information on interregional research collaboration in terms of scientific coauthorship networks among 4,615 municipalities during the period between 1992 and 2009, which allowed us to analyze a range of data unprecedented in the literature. The effects of geographic distance on collaboration were measured for different areas by estimating spatial interaction models. The main results provide strong evidence of geographic deconcentration of collaboration in recent years, with increased participation of authors in scientifically less traditional regions, such as south and northeast Brazil. Distance remains a significant factor in determining the intensity of knowledge flow in collaboration networks in Brazil, as an increase of 100 km between two researchers reduces the probability of collaboration by an average of 16%, and there is no evidence that the effect of distance has diminished over time, although the magnitude of such effects varies among networks of different areas.
    January 11, 2016   doi: 10.1002/asi.23635
  • A simple and efficient algorithm for authorship verification.
    Mirco Kocher, Jacques Savoy.
    Journal of the American Society for Information Science and Technology. January 11, 2016
    This paper describes and evaluates an unsupervised and effective authorship verification model called Spatium‐L1. As features, we suggest using the 200 most frequent terms of the disputed text (isolated words and punctuation symbols). Applying a simple distance measure and a set of impostors, we can determine whether or not the disputed text was written by the proposed author. Moreover, based on a simple rule we can define when there is enough evidence to propose an answer or when the attribution scheme is unable to make a decision with a high degree of certainty. Evaluations based on 6 test collections (PAN CLEF 2014 evaluation campaign) indicate that Spatium‐L1 usually appears in the top 3 best verification systems, and on an aggregate measure, presents the best performance. The suggested strategy can be adapted without any problem to different Indo‐European languages (such as English, Dutch, Spanish, and Greek) or genres (essay, novel, review, and newspaper article).
    January 11, 2016   doi: 10.1002/asi.23648
  • Predicting information searchers' topic knowledge at different search stages.
    Jingjing Liu, Chang Liu, Nicholas J. Belkin.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    As a significant contextual factor in information search, topic knowledge has been gaining increased research attention. We report on a study of the relationship between information searchers' topic knowledge and their search behaviors, and on an attempt to predict searchers' topic knowledge from their behaviors during the search. Data were collected in a controlled laboratory experiment with 32 undergraduate journalism student participants, each searching on 4 tasks of different types. In general, behavioral variables were not found to have significant differences between users with high and low levels of topic knowledge, except the mean first dwell time on search result pages. Several models were built to predict topic knowledge using behavioral variables calculated at 3 different stages of search episodes: the first‐query‐round, the middle point of the search, and the end point. It was found that a model using some search behaviors observed in the first query round led to satisfactory prediction results. The results suggest that early‐session search behaviors can be used to predict users' topic knowledge levels, allowing personalization of search for users with different levels of topic knowledge, especially in order to assist users with low topic knowledge.
    December 23, 2015   doi: 10.1002/asi.23606
  • Liberating interdisciplinarity from myth. An exploration of the discursive construction of identities in information studies.
    Dorte Madsen.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    Recent research in information studies suggests that the tradition of seeing the discipline as weak is still alive and kicking. This is a problem because the discourse of the weak discipline creates conceptual confusion in relation to interdisciplinarity. Considering the growth of the iSchools and what is assumed to be a major institutional redrawing of boundaries, there is a pressing need to conceptualize interdisciplinary practices and boundary work. This paper explores the “weak” discipline through a discourse analytical lens and identifies a myth. Perceiving the discipline as weak is part of a myth, fueled by the ideal of a unitary discipline; the ideal discipline has strong boundaries, and as long as the discourse continues to focus on a need for boundaries, the only available discourse is one that articulates the discipline as weak. Thus, the myth is a vicious circle that can be broken if weakness is no longer ascribed to the discipline by tradition. The paper offers an explanation of the workings of the myth so that its particular way of interpreting the world does not mislead us when theorizing interdisciplinarity. This is a conceptual paper, and the examples serve as an empirical backdrop to the conceptual argument.
    December 23, 2015   doi: 10.1002/asi.23622
  • Strategic intelligence on emerging technologies: Scientometric overlay mapping.
    Daniele Rotolo, Ismael Rafols, Michael M. Hopkins, Loet Leydesdorff.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    This paper examines the use of scientometric overlay mapping as a tool of “strategic intelligence” to aid the governing of emerging technologies. We develop an integrative synthesis of different overlay mapping techniques and associated perspectives on technological emergence across geographical, social, and cognitive spaces. To do so, we longitudinally analyze (with publication and patent data) three case studies of emerging technologies in the medical domain. These are RNA interference (RNAi), human papillomavirus (HPV) testing technologies for cervical cancer, and thiopurine methyltransferase (TPMT) genetic testing. Given the flexibility (i.e., adaptability to different sources of data) and granularity (i.e., applicability across multiple levels of data aggregation) of overlay mapping techniques, we argue that these techniques can favor the integration and comparison of results from different contexts and cases, thus potentially functioning as a platform for “distributed” strategic intelligence for analysts and decision makers.
    December 23, 2015   doi: 10.1002/asi.23631
  • The effects of research resources on international collaboration in the astronomy community.
    Han‐Wen Chang, Mu‐Hsuan Huang.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    This study examines whether an institution's research resources affect its centrality and relationships in international collaboration among 606 astronomical institutions worldwide. The findings support our theoretical hypotheses that an institution's research resources are positively related to its central position in the network. Astronomical institutions with superior resources, such as being equipped with international observational facilities and having substantial research manpower, tend to have more foreign partners (high degree centrality) and play an influential role (high betweenness centrality) in the international collaboration network. An institution becomes more and more active in international collaborations as its research population expands. In terms of the relationship, which is captured by an actor institution's co‐authorship preference for each partner in the network, the effect of research resources is not as significant as expected. We found that astronomical institutions are not necessarily preferentially co‐authoring with partners that have better research resources. In addition, this study indicates that geographic closeness (or “geographic proximity”) largely affects the occurrence of international collaboration. The investigated institutions apparently prefer partners from neighboring countries. This finding gives an indication of the phenomenon of “regional homophily” in the international collaboration network.
    December 23, 2015   doi: 10.1002/asi.23592
  • Estimating open access mandate effectiveness: The MELIBEA score.
    Philippe Vincent‐Lamarre, Jade Boivin, Yassine Gargouri, Vincent Larivière, Stevan Harnad.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    MELIBEA is a directory of institutional open‐access policies for research output that uses a composite formula with eight weighted conditions to estimate the “strength” of open access (OA) mandates (registered in ROARMAP). We analyzed total Web of Science (WoS)‐indexed publication output in the years 2011–2013 for 67 institutions in which OA was mandated to estimate the mandates' effectiveness: How well did the MELIBEA score and its individual conditions predict what percentage of the WoS‐indexed articles is actually deposited in each institution's OA repository, and when? We found a small but significant positive correlation (0.18) between the MELIBEA “strength” score and deposit percentage. For three of the eight MELIBEA conditions (deposit timing, internal use, and opt‐outs), one value of each was strongly associated with deposit percentage or latency ([a] immediate deposit required; [b] deposit required for performance evaluation; [c] unconditional opt‐out allowed for the OA requirement but no opt‐out for the deposit requirement). When we updated the initial values and weights of the MELIBEA formula to reflect the empirical association we had found, the score's predictive power for mandate effectiveness doubled (0.36). There are not yet enough OA mandates to test further mandate conditions that might contribute to mandate effectiveness, but the present findings already suggest that it would be productive for existing and future mandates to adopt the three identified conditions so as to maximize their effectiveness, and thereby the growth of OA.
    December 23, 2015   doi: 10.1002/asi.23601
  • Herd behavior in consumers’ adoption of online reviews.
    Xiao‐Liang Shen, Kem Z.K. Zhang, Sesia J. Zhao.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    It has been demonstrated that online consumer reviews are an important source of information that affects individuals’ purchase decision making. To understand the influence of online reviews, this study extends prior research on information adoption by incorporating the perspective of herd behavior. We develop and empirically test a research model using data collected from an existing book review site. We report 2 major findings. First, argument quality and source credibility predict information usefulness, which affects the adoption of online reviews. Second, we determine that the adoption of online reviews is also influenced by 2 herd factors, namely, discounting own information and imitating others. We further identify the key determinants of these herd factors, including background homophily and attitude homophily. The theoretical and practical implications are discussed.
    December 23, 2015   doi: 10.1002/asi.23602
  • Citation analysis as a literature search method for systematic reviews.
    Christopher W. Belter.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    Systematic reviews are essential for evaluating biomedical treatment options, but the growing size and complexity of the available biomedical literature combined with the rigor of the systematic review method mean that systematic reviews are extremely difficult and labor‐intensive to perform. In this article, I propose a method of searching the literature by systematically mining the various types of citation relationships between articles. I then test the method by comparing its precision and recall to that of 14 published systematic reviews. The method successfully retrieved 74% of the studies included in these reviews and 90% of the studies it could reasonably be expected to retrieve. The method also retrieved fewer than half of the total number of publications retrieved by these reviews and can be performed in substantially less time. This suggests that the proposed method offers a promising complement to traditional text‐based methods of literature identification and retrieval for systematic reviews.
    December 23, 2015   doi: 10.1002/asi.23605
  • Automated Arabic text classification with P‐Stemmer, machine learning, and a tailored news article taxonomy.
    Tarek Kanan, Edward A. Fox.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine‐learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)‐funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic‐speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P‐Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10‐fold cross‐validation and the Wilcoxon signed‐rank test, we showed that our approach to stemming and classification is superior to state‐of‐the‐art techniques.
    December 23, 2015   doi: 10.1002/asi.23609
  • Identification of nonliteral language in social media: A case study on sarcasm.
    Smaranda Muresan, Roberto Gonzalez‐Ibanez, Debanjan Ghosh, Nina Wacholder.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    With the rapid development of social media, spontaneous user‐generated content such as tweets and forum posts has become important material for tracking people's opinions and sentiments online. A major hurdle for current state‐of‐the‐art automatic methods for sentiment analysis is the fact that human communication often involves the use of sarcasm or irony, where the author means the opposite of what she/he says. Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. Lack of naturally occurring utterances labeled for sarcasm is one of the key problems for the development of machine‐learning methods for sarcasm detection. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine‐learning effectiveness for identifying sarcastic utterances and we compare the performance of machine‐learning techniques and human judges on this task.
    December 23, 2015   doi: 10.1002/asi.23624
  • Understanding eye movements on mobile devices for better presentation of search results.
    Jaewon Kim, Paul Thomas, Ramesh Sankaranarayana, Tom Gedeon, Hwan‐Jin Yoon.
    Journal of the American Society for Information Science and Technology. December 23, 2015
    Compared to the early versions of smart phones, recent mobile devices have bigger screens that can present more web search results. Several previous studies have reported differences in user interaction between conventional desktop computer and mobile device‐based web searches, so it is imperative to consider the differences in user behavior for web search engine interface design on mobile devices. However, it is still unknown how the diversification of screen sizes on hand‐held devices affects how users search. In this article, we investigate search performance and behavior on three different small screen sizes: early smart phones, recent smart phones, and phablets. We found no significant difference with respect to the efficiency of carrying out tasks; however, participants exhibited different search behaviors: less eye movement within top links on the larger screen, fast reading with some hesitation before choosing a link on the medium, and frequent use of scrolling on the small screen. This result suggests that the presentation of web search results for each screen needs to take into account differences in search behavior. We suggest several ideas for presentation design for each screen size.
    December 23, 2015   doi: 10.1002/asi.23628
  • Fuzzy retrieval for software reuse.
    Erin Colvin, Donald H. Kraft.
    Journal of the American Society for Information Science and Technology. October 29, 2015
    Finding software for reuse is a problem that programmers face. Reusing code that has been proven to work can increase any programmer's productivity, benefit corporate productivity, and also increase the stability of software programs. This paper shows that fuzzy retrieval achieves better retrieval performance than typical Boolean retrieval. Various methods of fuzzy information retrieval implementation and their use for software reuse are examined. A deeper explanation of the fundamentals of designing a fuzzy information retrieval system for software reuse is presented. Future research options and necessary data storage systems are explored.
    October 29, 2015   doi: 10.1002/asi.23584
  • The role of team cognition in collaborative information seeking.
    Nathan J. McNeese, Madhu C. Reddy.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Collaborative information seeking (CIS) is of growing importance in the information sciences and human–computer interaction (HCI) research communities. Current research has primarily focused on examining the social and interactional aspects of CIS in organizational or other settings and developing technical approaches to support CIS activities. As we continue to develop a better understanding of the interactional aspects of CIS, we also need to start examining the cognitive aspects of CIS. In particular, we need to understand CIS from a team cognition perspective. To examine how team cognition develops during CIS, we conducted a study using observations and interviews of student teams engaged in colocated CIS tasks in a laboratory setting. We found that a variety of awareness mechanisms play a key role in the development of team cognition during CIS. Specifically, we identify that search, information, and social methods of awareness are critical to developing team cognition during CIS. We discuss why awareness is important for team cognition, how team cognition comprises both individual and team‐level cognitive activities, and the importance of examining both interaction and cognition to truly understand team cognition.
    October 22, 2015   doi: 10.1002/asi.23614
  • The MIREX grand challenge: A framework of holistic user‐experience evaluation in music information retrieval.
    Xiao Hu, Jin Ha Lee, David Bainbridge, Kahyun Choi, Peter Organisciak, J. Stephen Downie.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Music Information Retrieval (MIR) evaluation has traditionally focused on system‐centered approaches where components of MIR systems are evaluated against predefined data sets and golden answers (i.e., ground truth). There are two major limitations of such system‐centered evaluation approaches: (a) The evaluation focuses on subtasks in music information retrieval, but not on entire systems and (b) users and their interactions with MIR systems are largely excluded. This article describes the first implementation of a holistic user‐experience evaluation in MIR, the MIREX Grand Challenge, where complete MIR systems are evaluated, with user experience being the single overarching goal. It is the first time that complete MIR systems have been evaluated with end users in a realistic scenario. We present the design of the evaluation task, the evaluation criteria and a novel evaluation interface, and the data‐collection platform. This is followed by an analysis of the results, reflection on the experience and lessons learned, and plans for future directions.
    October 22, 2015   doi: 10.1002/asi.23618   open full text
  • University citation distributions.
    Antonio Perianes‐Rodriguez, Javier Ruiz‐Castillo.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    We investigate the citation distributions of the 500 universities in the 2013 edition of the Leiden Ranking produced by the Centre for Science and Technology Studies. We use a Web of Science data set consisting of 3.6 million articles published in 2003 to 2008 and classified into 5,119 clusters. The main findings are the following. First, the universality claim, according to which all university‐citation distributions, appropriately normalized, follow a single functional form, is not supported by the data. Second, the 500 university citation distributions are all highly skewed and very similar. Broadly speaking, university citation distributions appear to behave as if they differ by a relatively constant scale factor over a large, intermediate part of their support. Third, citation‐impact differences between universities account for 3.85% of overall citation inequality. This percentage is greatly reduced when university citation distributions are normalized using their mean normalized citation scores (MNCSs) as normalization factors. Finally, regarding practical consequences, we only need a single explanatory model for the type of high skewness characterizing all university citation distributions, and the similarity of university citation distributions goes a long way in explaining the similarity of the university rankings obtained with the MNCS and the Top 10% indicator.
    October 22, 2015   doi: 10.1002/asi.23619   open full text
  • Towards an understanding of the relationship between disciplinary research cultures and open access repository behaviors.
    Jenny Fry, Valérie Spezi, Stephen Probets, Claire Creaser.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    This article explores the cultural characteristics of three open access (OA)‐friendly disciplines (physics, economics, and clinical medicine) and the ways in which those characteristics influence perceptions, motivations, and behaviors toward green OA. The empirical data are taken from two online surveys of European authors. Taking a domain analytic approach, the analysis draws on Becher and Trowler's (2001) and Whitley's (2000) theories to gain a deeper understanding of why OA repositories (OAR) play a particularly important role in the chosen disciplines. The surveys provided a unique opportunity to compare perceptions, motivations, and behaviors of researchers at the discipline level with the parent metadiscipline. It should be noted that participants were not drawn from a stratified sample of all the different subdisciplines that constitute each discipline, and therefore the generalizability of the findings to the discipline may be limited. The differential role of informal and formal communication in each of the three disciplines has shaped green OA practices. For physicists and economists, preprints are an essential feature of their respective OAR landscapes, whereas for clinical medics final published articles have a central role. In comparing the disciplines with their parent metadisciplines there were some notable similarities/differences, which have methodological implications for studying research cultures.
    October 22, 2015   doi: 10.1002/asi.23621   open full text
  • Is exploratory search different? A comparison of information search behavior for exploratory and lookup tasks.
    Kumaripaba Athukorala, Dorota Głowacka, Giulio Jacucci, Antti Oulasvirta, Jilles Vreeken.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Exploratory search is an increasingly important activity yet challenging for users. Although there exists an ample amount of research into understanding exploration, most of the major information retrieval (IR) systems do not provide tailored and adaptive support for such tasks. One reason is the lack of empirical knowledge on how to distinguish exploratory and lookup search behaviors in IR systems. The goal of this article is to investigate how to separate the 2 types of tasks in an IR system using easily measurable behaviors. In this article, we first review characteristics of exploratory search behavior. We then report on a controlled study of 6 search tasks with 3 exploratory—comparison, knowledge acquisition, planning—and 3 lookup tasks—fact‐finding, navigational, question answering. The results are encouraging, showing that IR systems can distinguish the 2 search categories in the course of a search session. The most distinctive indicators that characterize exploratory search behaviors are query length, maximum scroll depth, and task completion time. However, 2 tasks are borderline and exhibit mixed characteristics. We assess the applicability of this finding by reporting on several classification experiments. Our results have valuable implications for designing tailored and adaptive IR systems.
    October 22, 2015   doi: 10.1002/asi.23617   open full text
  • Chatting through pictures? A classification of images tweeted in one week in the UK and USA.
    Mike Thelwall, Olga Goriunova, Farida Vis, Simon Faulkner, Anne Burns, Jim Aulich, Amalia Mas‐Bleda, Emma Stuart, Francesco D'Orazio.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Twitter is used by a substantial minority of the populations of many countries to share short messages, sometimes including images. Nevertheless, despite some research into specific images, such as selfies, and a few news stories about specific tweeted photographs, little is known about the types of images that are routinely shared. In response, this article reports a content analysis of random samples of 800 images tweeted from the UK or USA during a week at the end of 2014. Although most images were photographs, a substantial minority were hybrid or layered image forms: phone screenshots, collages, captioned pictures, and pictures of text messages. About half were primarily of one or more people, including 10% that were selfies, but a wide variety of other things were also pictured. Some of the images were for advertising or to share a joke but in most cases the purpose of the tweet seemed to be to share the minutiae of daily lives, performing the function of chat or gossip, sometimes in innovative ways.
    October 22, 2015   doi: 10.1002/asi.23620   open full text
  • Do autocomplete functions reduce the impact of dyslexia on information‐searching behavior? The case of Google.
    Gerd Berget, Frode Eika Sandnes.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Dyslexic users often do not exhibit spelling and reading skills at a level required to perform effective search. To explore whether autocomplete functions reduce the impact of dyslexia on information searching, 20 participants with dyslexia and 20 controls solved 10 predefined tasks in the search engine Google. Eye‐tracking and screen‐capture documented the searches. There were no significant differences between the dyslexic students and the controls in time usage, number of queries, query lengths, or the use of the autocomplete function. However, participants with dyslexia made more misspellings and looked less at the screen and the autocomplete suggestions lists while entering the queries. The results indicate that although the autocomplete function supported the participants in the search process, a more extensive use of the autocomplete function would have reduced misspellings. Further, the high tolerance for spelling errors considerably reduced the effect of dyslexia, and may be as important as the autocomplete function.
    October 22, 2015   doi: 10.1002/asi.23572   open full text
  • Keeping up to date: An academic researcher's information journey.
    Sheila Pontis, Ann Blandford, Elke Greifeneder, Hesham Attalla, David Neal.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Keeping up to date with research developments is a central activity of academic researchers, but researchers face difficulties in managing the rapid growth of available scientific information. This study examined how researchers stay up to date, using the information journey model as a framework for analysis and investigating which dimensions influence information behaviors. We designed a 2‐round study involving semistructured interviews and prototype testing with 61 researchers with 3 levels of seniority (PhD student to professor). Data were analyzed following a semistructured qualitative approach. Five key dimensions that influence information behaviors were identified: level of seniority, information sources, state of the project, level of familiarity, and how well defined the relevant community is. These dimensions are interrelated and their values determine the flow of the information journey. Across all levels of professional expertise, researchers used similar hard (formal) sources to access content, while soft (interpersonal) sources were used to filter information. An important “pain point” that future information tools should address is helping researchers filter information at the point of need.
    October 22, 2015   doi: 10.1002/asi.23623   open full text
  • How to improve the sustainability of digital libraries and information services?
    Gobinda G. Chowdhury.
    Journal of the American Society for Information Science and Technology. October 22, 2015
    Arguing that environmental sustainability is a growing concern for digital information systems and services, this article proposes a simple method for estimation of the energy and environmental costs of digital libraries and information services. It is shown that several factors contribute to the overall energy and environmental costs of information and communication technology (ICT) in general and digital information systems and services in particular. It is also shown that end‐user energy costs play a key role in the overall environmental costs of a digital library or information service. It is argued that appropriate user research, transaction log analysis, user modeling, and better design and delivery of services can significantly reduce the user interaction time, and thus the environmental costs, of digital information systems and services, making them more sustainable.
    October 22, 2015   doi: 10.1002/asi.23599   open full text
  • Robustness of journal rankings by network flows with different amounts of memory.
    Ludvig Bohlin, Alcides Viamontes Esquivel, Andrea Lancichinetti, Martin Rosvall.
    Journal of the American Society for Information Science and Technology. October 13, 2015
    As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions influenced by journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. We compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating the scholarly literature, stepping between journals and remembering their previous steps to different degrees: zero‐step memory as impact factor, one‐step memory as Eigenfactor, and two‐step memory, corresponding to zero‐, first‐, and second‐order Markov models of citation flow between journals. We conclude that higher‐order Markov models perform better and are more robust to the selection of journals. Whereas our analysis indicates that higher‐order models perform better, the performance gain for higher‐order Markov models comes at the cost of requiring more citation data over a longer time period.
    October 13, 2015   doi: 10.1002/asi.23582   open full text
  • Investigating the role of semantic priming in query expression: A framework and two experiments.
    Catherine L. Smith.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    Modern search systems often meet their users' information needs, but when the system fails, searchers struggle to formulate effective queries. Query suggestions may help, but research suggests these often go unused. Although much is known about how searchers scan results pages when assessing relevance, little is known about the processes searchers use when struggling to reformulate queries. Investigating how searchers overcome query difficulties, and how search systems help and hinder that process, requires enquiry into the cognitive procedures searchers use to select words for queries. The purpose of this paper is to investigate one cognitive process involved: semantic priming of words in memory. A framework for conceptualizing the role of semantic priming in search interaction is presented, along with results from two experiments that applied research methods from cognitive psychology, in an investigation of word selection and subsequent search for selected words. The results show that word selection activates related words in memory and that looking for a selected word among related words is effortful. The finding suggests that semantic priming may play a role in the difficulties people experience when reformulating queries. Ideas for continued development of semantic priming methods and their use in future research are also presented.
    September 23, 2015   doi: 10.1002/asi.23611   open full text
  • Exploratory information searching in the enterprise: A study of user satisfaction and task performance.
    Paul H. Cleverley, Simon Burnett, Laura Muir.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    No prior research has been identified that investigates the causal factors for workplace exploratory search task performance. The impact of user, task, and environmental factors on user satisfaction and task performance was investigated through a mixed methods study with 26 experienced information professionals using enterprise search in an oil and gas enterprise. Some participants found 75% of high‐value items, others found none, with an average of 27%. No association was found between self‐reported search expertise and task performance, with a tendency for many participants to overestimate their search expertise. Successful searchers may have more accurate mental models of both search systems and the information space. Organizations may not have effective exploratory search task performance feedback loops, a lack of learning. This may be caused by management bias towards technology, not capability, a lack of systems thinking. Furthermore, organizations may not “know” they “don't know” their true level of search expertise, a lack of knowing. A metamodel is presented identifying the causal factors for workplace exploratory search task performance. Semistructured qualitative interviews with search staff from the defense, pharmaceutical, and aerospace sectors indicate the potential transferability of the finding that organizations may not know their search expertise levels.
    September 23, 2015   doi: 10.1002/asi.23595   open full text
  • Patent citation analysis with Google.
    Kayvan Kousha, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    Citations from patents to scientific publications provide useful evidence about the commercial impact of academic research, but automatically searchable databases are needed to exploit this connection for large‐scale patent citation evaluations. Google covers multiple different international patent office databases but does not index patent citations or allow automatic searches. In response, this article introduces a semiautomatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192 science and engineering Scopus articles from every second year for the period 1996–2012. Although manual Google Patent searches give more results, especially for articles with many patent citations, the difference is not large enough to be a major problem. Within Biomedical Engineering, Biotechnology, and Pharmacology & Pharmaceutics, 7% to 10% of Scopus articles had at least one patent citation but other fields had far fewer, so patent citation analysis is only relevant for a minority of publications. Low but positive correlations between Google Patent citations and Scopus citations across all fields suggest that traditional citation counts cannot substitute for patent citations when evaluating research.
    September 23, 2015   doi: 10.1002/asi.23608   open full text
  • The normalization of occurrence and co‐occurrence matrices in bibliometrics using cosine similarities and Ochiai coefficients.
    Qiuju Zhou, Loet Leydesdorff.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    We prove that Ochiai similarity of the co‐occurrence matrix is equal to cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co‐occurrence matrices because the similarity is then normalized twice, and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co‐occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and will be illustrated using multidimensional scaling and cluster dendrograms. If the occurrence matrix is not available (such as in internet research or author cocitation analysis), using Ochiai for the normalization is preferable to using the cosine.
    September 23, 2015   doi: 10.1002/asi.23603   open full text
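    The identity stated in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: the 5 x 4 binary occurrence matrix `OCC` is made up for illustration (it is not the matrix from the paper), and the helper names `cooccurrence`, `cosine`, and `ochiai` are assumptions. The co-occurrence matrix is taken to be the product of the transposed occurrence matrix with itself, so its diagonal holds the occurrence counts of each variable.

```python
import math


def cooccurrence(occ):
    """Co-occurrence matrix C = A^T A of an occurrence matrix A (rows are cases)."""
    n_vars = len(occ[0])
    return [
        [sum(row[i] * row[j] for row in occ) for j in range(n_vars)]
        for i in range(n_vars)
    ]


def cosine(matrix, i, j):
    """Cosine similarity between columns i and j of a matrix given as a list of rows."""
    dot = sum(row[i] * row[j] for row in matrix)
    norm_i = math.sqrt(sum(row[i] ** 2 for row in matrix))
    norm_j = math.sqrt(sum(row[j] ** 2 for row in matrix))
    return dot / (norm_i * norm_j)


def ochiai(cooc, i, j):
    """Ochiai coefficient computed from co-occurrence counts and the diagonal."""
    return cooc[i][j] / math.sqrt(cooc[i][i] * cooc[j][j])


# Hypothetical occurrence matrix: 5 cases (rows), 4 variables (columns).
OCC = [
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
COOC = cooccurrence(OCC)

if __name__ == "__main__":
    for i in range(4):
        for j in range(i + 1, 4):
            print(
                i, j,
                round(cosine(OCC, i, j), 4),   # cosine on the occurrence matrix
                round(ochiai(COOC, i, j), 4),  # Ochiai on the co-occurrence matrix
                round(cosine(COOC, i, j), 4),  # cosine applied again to co-occurrences
            )
```

    In this toy matrix the first two printed columns agree for every pair, while the third column shows what happens when the cosine is applied to data that are already co-occurrence counts, which is the double normalization the abstract warns against.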
  • Form‐ing institutional order: The scaffolding of lists and identifiers.
    Paul Beynon‐Davies.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    This paper examines the central place of the list and the associated concept of an identifier within the scaffolding of contemporary institutional order. These terms are deliberately chosen to make strange and help unpack the constitutive capacity of information systems and information technology within and between contemporary organizations. We draw upon the substantial body of work by John Searle to help understand the place of lists and identifiers in the constitution of institutional order. To enable us to ground our discussion of the potentiality and problematic associated with lists we describe a number of significant instances of list‐making, situated particularly around the use of identifiers to refer to people, places, and products. The theorization developed allows us to better explain not only the significance imbued within lists and identifiers but the key part they play in form‐ing the institutional order. We also hint at the role such symbolic artifacts play within breakdowns in institutional order.
    September 23, 2015   doi: 10.1002/asi.23613   open full text
  • The power–law relationship between citation‐based performance and collaboration in articles in management journals: A scale‐independent approach.
    Guillermo Armando Ronda‐Pupo, J. Sylvan Katz.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    The objective of this article is to determine if academic collaboration is associated with the citation‐based performance of articles that are published in management journals. We analyzed 127,812 articles published between 1988 and 2013 in 173 journals on the ISI Web of Science in the “management” category. Collaboration occurred in approximately 60% of all articles. A power–law relationship was found between citation‐based performance and journal size and collaboration patterns. The number of citations expected by collaborative articles increases 2^1.89, or 3.7 times, when the number of collaborative articles published in a journal doubles. The number of citations expected by noncollaborative articles only increases 2^1.35, or 2.55 times, if a journal publishes double the number of noncollaborative articles. The Matthew effect is stronger for collaborative than for noncollaborative articles. Scale‐independent indicators increase the confidence in the evaluation of the impact of the articles published in management journals.
    September 23, 2015   doi: 10.1002/asi.23575   open full text
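    The doubling figures in the abstract above follow from the form of a power law: if expected citations scale as size raised to an exponent, multiplying size by a ratio multiplies citations by that ratio raised to the same exponent. The sketch below only reproduces this arithmetic. The function name `scaling_factor` is an assumption, and the exponents 1.89 and 1.35 are read from the abstract rather than re-estimated from data.

```python
def scaling_factor(exponent, size_ratio=2.0):
    """Multiplicative change in y when x is multiplied by size_ratio, given y proportional to x**exponent."""
    return size_ratio ** exponent


# Exponents as reported in the abstract.
COLLAB = scaling_factor(1.89)     # collaborative articles
NONCOLLAB = scaling_factor(1.35)  # noncollaborative articles

if __name__ == "__main__":
    print(round(COLLAB, 2), round(NONCOLLAB, 2))
```

    An exponent above 1 means citations grow faster than the number of articles, which is how the abstract's Matthew effect shows up in this scale-independent view.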
  • Map of science with topic modeling: Comparison of unsupervised learning and human‐assigned subject classification.
    Arho Suominen, Hannes Toivanen.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised‐learning classification, and discuss the advantages and disadvantages of this approach vis‐à‐vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning‐based classification frameworks of scientific knowledge, as they typically try to fit new‐to‐the‐world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large‐scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large‐scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.
    September 23, 2015   doi: 10.1002/asi.23596   open full text
  • Sentence simplification, compression, and disaggregation for summarization of sophisticated documents.
    Catherine Finegan‐Dollak, Dragomir R. Radev.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences—potentially adding hundreds of unnecessary words to the summary—or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests 2 causes: Altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems.
    September 23, 2015   doi: 10.1002/asi.23576   open full text
  • Multiple viewpoints increase students' attention to source features in social question and answer forum messages.
    Ladislao Salmerón, Mônica Macedo‐Rouet, Jean‐François Rouet.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    Social question & answer forums offer great learning opportunities, but students need to evaluate the credibility of answers to avoid being misled by untrustworthy sources. This critical evaluation may be beyond the capabilities of students from primary and secondary school. We conducted 2 studies to assess how students from primary, secondary, and undergraduate education perceive and use 2 relevant credibility cues in forums: the author's identity and the evidence used to support the answer. Students did not use these cues when they evaluated forums with a single answer (Experiment 1), but they more often recommended answers from self‐reported experts than answers from users with a pseudonym when multiple sources were discussed in the forum (Experiment 2). This pattern of results suggested that multiple viewpoints increase students' attention to source features in forum messages. Experiment 2 also revealed that primary school students preferred personal experience as evidence in the messages, whereas undergraduate students preferred the inclusion of documentary sources. Thus, while children mimic the adult preference for expert sources in web forums, they treat source information in a rather superficial manner. To conclude, we outline possible mechanisms to understand how credibility assessment evolves across educational levels, and discuss potential implications for the educational curriculum in information literacy.
    September 23, 2015   doi: 10.1002/asi.23585   open full text
  • Trustworthiness and authority of scholarly information in a digital age: Results of an international questionnaire.
    Carol Tenopir, Kenneth Levine, Suzie Allard, Lisa Christian, Rachel Volentine, Reid Boehm, Frances Nichols, David Nicholas, Hamid R. Jamali, Eti Herman, Anthony Watkinson.
    Journal of the American Society for Information Science and Technology. September 23, 2015
    An international survey of over 3,600 researchers examined how trustworthiness and quality are determined for making decisions on scholarly reading, citing, and publishing and how scholars perceive changes in trust with new forms of scholarly communication. Although differences in determining trustworthiness and authority of scholarly resources exist among age groups and fields of study, traditional methods and criteria remain important across the board. Peer review is considered the most important factor for determining the quality and trustworthiness of research. Researchers continue to read abstracts, check content for sound arguments and credible data, and rely on journal rankings when deciding whether to trust scholarly resources in reading, citing, or publishing. Social media outlets and open access publications are still often not trusted, although many researchers believe that open access has positive implications for research, especially if the open access journals are peer reviewed.
    September 23, 2015   doi: 10.1002/asi.23598   open full text
  • Gender as an influencer of online health information‐seeking and evaluation behavior.
    Jennifer Rowley, Frances Johnson, Laura Sbaffi.
    Journal of the American Society for Information Science and Technology. August 25, 2015
    This article contributes to the growing body of research that explores the significance of context in health information behavior. Specifically, through the lens of trust judgments, it demonstrates that gender is a determinant of the information evaluation process. A questionnaire‐based survey collected data from adults regarding the factors that influence their judgment of the trustworthiness of online health information. Both men and women identified credibility, recommendation, ease of use, and brand as being of importance in their trust judgments. However, women also take into account style, while men eschew this for familiarity. In addition, men appear to be more concerned with the comprehensiveness and accuracy of the information, the ease with which they can access it, and its familiarity, whereas women demonstrate greater interest in cognition, such as the ease with which they can read and understand the information. These gender differences are consistent with the demographic data, which suggest that: women consult more types of sources than men; men are more likely to be searching with respect to a long‐standing health complaint; and, women are more likely than men to use tablets in their health information seeking. Recommendations for further research to better inform practice are offered.
    August 25, 2015   doi: 10.1002/asi.23597   open full text
  • Evaluation of the citation matching algorithms of CWTS and iFQ in comparison to the Web of Science.
    Marlies Olensky, Marion Schmidt, Nees Jan van Eck.
    Journal of the American Society for Information Science and Technology. August 25, 2015
    The results of bibliometric studies provided by bibliometric research groups, for example, the Centre for Science and Technology Studies (CWTS) and the Institute for Research Information and Quality Assurance (iFQ), are often used in the process of research assessment. Their databases use Web of Science (WoS) citation data, which they match according to their own matching algorithms—in the case of CWTS for standard usage in their studies and in the case of iFQ on an experimental basis. Because the problem of nonmatched citations in the WoS persists due to inaccuracies in the references or inaccuracies introduced in the data extraction process, it is important to ascertain how well these inaccuracies are rectified in these citation matching algorithms. This article evaluates the algorithms of CWTS and iFQ in comparison to the WoS in a quantitative and a qualitative analysis. The analysis builds upon the method and the manually verified corpus of a previous study. The algorithm of CWTS performs best, closely followed by that of iFQ. The WoS algorithm still performs quite well (F1 score: 96.41%), but shows deficits in matching references containing inaccuracies. An additional problem is posed by incorrectly provided cited reference information in source articles by the WoS.
    August 25, 2015   doi: 10.1002/asi.23590   open full text
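The F1 score quoted in the abstract above combines precision and recall into a single figure. A minimal sketch of that calculation follows, assuming matching results are summarized as counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp: references correctly matched to their target record
    fp: references matched to a wrong record
    fn: references that should have been matched but were not
    """
    if tp == 0:
        # With no correct matches, precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"F1 = {f1_score(tp=90, fp=10, fn=10):.2%}")
```

Because F1 is a harmonic mean, a matcher that misses many inaccurate references (low recall) is penalized even when nearly all of the matches it does make are correct.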
  • Detecting temporal patterns of user queries.
    Pengjie Ren, Zhumin Chen, Jun Ma, Zhiwei Zhang, Luo Si, Shuaiqiang Wang.
    Journal of the American Society for Information Science and Technology. August 18, 2015
    Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. Queries' temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
    August 18, 2015   doi: 10.1002/asi.23578   open full text
  • A quantitative analysis of the temporal effects on automatic text classification.
    Thiago Salles, Leonardo Rocha, Marcos André Gonçalves, Jussara M. Almeida, Fernando Mourão, Wagner Meira, Felipe Viegas.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well‐known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.
    August 07, 2015   doi: 10.1002/asi.23452   open full text
  • Core indicators and professional recognition of scientometricians.
    Péter Vinkler.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    The publication performance of 30 scientometricians is studied. The individuals are classified into 3 cohorts according to their manifested professional recognition, as Price medalists (Pm), members of the editorial board of Scientometrics and the Journal of Informetrics (Rw), and session chairs (Sc) at an International Society of Scientometrics and Informetrics (ISSI) conference. Several core impact indicators are calculated: h, g, π, citation distribution score (CDS), percentage rank position (PRP), and weight of influence of papers (WIP10). The indices significantly correlate with each other. The mean value of the indices of the cohorts decreases parallel with the decrease in professional recognition: Pm > Rw > Sc. The 30 scientometricians studied were clustered according to the core impact indices. The members in the clusters so obtained overlap only partly with the members in the cohorts made by professional recognition. The Total Overlap is calculated by dividing the sum of the diagonal elements in the cohorts‐clusters matrix by the total number of elements and multiplying by 100. The highest overlap (76.6%) was obtained with the g‐index. Accordingly, the g‐index seems to have the greatest discriminative power in the system studied. The cohorts‐clusters method may be used for validating scientometric indicators.
    August 07, 2015   doi: 10.1002/asi.23589   open full text
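The Total Overlap described in the abstract above (diagonal sum of the cohorts‐clusters matrix divided by the total number of elements, times 100) can be sketched as follows. The matrix values are hypothetical and only illustrate the arithmetic. Rows are assumed to be cohorts and columns clusters, listed in matching order.

```python
def total_overlap(matrix: list[list[int]]) -> float:
    """Percentage of individuals whose cluster coincides with their cohort.

    matrix[i][j] counts individuals in cohort i who were assigned to cluster j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix contains no individuals")
    # Diagonal cells are the individuals placed in the cluster
    # that corresponds to their own cohort.
    diagonal = sum(row[i] for i, row in enumerate(matrix) if i < len(row))
    return 100.0 * diagonal / total


if __name__ == "__main__":
    # Hypothetical 3 x 3 cohorts-clusters matrix covering 30 individuals.
    example = [
        [7, 2, 1],
        [2, 6, 2],
        [1, 2, 7],
    ]
    print(f"Total Overlap: {total_overlap(example):.1f}%")
```

A value of 100 would mean the clustering reproduces the cohorts exactly, so a higher Total Overlap indicates an indicator that separates the cohorts more cleanly.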
  • Improving proverb search and retrieval with a generic multidimensional ontology.
    Maayan Zhitomirsky‐Geffet, Gila Prebor, Orna Bloch.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large‐scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large‐scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology‐based and free‐text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web‐based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology‐based annotated proverbs.
    August 07, 2015   doi: 10.1002/asi.23573   open full text
  • Modeling journal bibliometrics to predict downloads and inform purchase decisions at university research libraries.
    Daniel M. Coughlin, Bernard J. Jansen.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    University libraries provide access to thousands of online journals and other content, spending millions of dollars annually on these electronic resources. Providing access to these online resources is costly, and it is difficult both to analyze the value of this content to the institution and to discern those journals that comparatively provide more value. In this research, we examine 1,510 journals from a large research university library, representing more than 40% of the university's annual subscription cost for electronic resources at the time of the study. We utilize a web analytics approach for the creation of a linear regression model to predict usage among these journals. We categorize metrics into two classes: global (journal focused) and local (institution dependent). Using 275 journals for our training set, our analysis shows that a combination of global and local metrics creates the strongest model for predicting full‐text downloads. Our linear regression model has an accuracy of more than 80% in predicting downloads for the 1,235 journals in our test set. The findings imply that university libraries that use local metrics have better insight into the value of a journal and can therefore manage content costs more efficiently.
    August 07, 2015   doi: 10.1002/asi.23549   open full text
  • Comparing and combining Content‐ and Citation‐based approaches for plagiarism detection.
    Solange de L. Pertile, Viviane P. Moreira, Paolo Rosso.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    The vast number of scientific publications available online makes it easier for students and researchers to reuse text from other authors and makes it harder to check the originality of a given text. Reusing text without crediting the original authors is considered plagiarism. A number of studies have reported the prevalence of plagiarism in academia. As a consequence, numerous institutions and researchers are dedicated to devising systems to automate the process of checking for plagiarism. This work focuses on the problem of detecting text reuse in scientific papers. The contributions of this paper are twofold: (a) we survey the existing approaches for plagiarism detection based on content, based on content and structure, and based on citations and references; and (b) we compare content‐ and citation‐based approaches with the goal of evaluating whether they are complementary and whether their combination can improve the quality of the detection. We carry out experiments with real data sets of scientific papers and conclude that a combination of the methods can be beneficial.
    August 07, 2015   doi: 10.1002/asi.23593   open full text
  • Visualizing the world's scientific publications.
    Rex H.‐G. Chen, Chi‐Ming Chen.
    Journal of the American Society for Information Science and Technology. August 07, 2015
    Automated methods for the analysis, modeling, and visualization of large‐scale scientometric data provide measures that enable the depiction of the state of world scientific development. We aimed to integrate minimum span clustering (MSC) and minimum spanning tree methods to cluster and visualize the global pattern of scientific publications (PSP) by analyzing aggregated Science Citation Index (SCI) data from 1994 to 2011. We hypothesized that PSP clustering is mainly affected by countries' geographic location, ethnicity, and level of economic development, as indicated in previous studies. Our results showed that the 100 countries with the highest rates of publications were decomposed into 12 PSP groups and that countries within a group tended to be geographically proximal, ethnically similar, or comparable in terms of economic status. Hubs and bridging nodes in each knowledge production group were identified. The performance of each group was evaluated across 16 knowledge domains based on their specialization, volume of publications, and relative impact. Awareness of the strengths and weaknesses of each group in various knowledge domains may have useful applications for examining scientific policies, adjusting the allocation of resources, and promoting international collaboration for future developments.
    August 07, 2015   doi: 10.1002/asi.23591   open full text
  • Evaluating topic representations for exploring document collections.
    Nikolaos Aletras, Timothy Baldwin, Jey Han Lau, Mark Stevenson.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list, that is, the top‐n words with the highest conditional probability within the topic. Other topic representations such as textual and image labels have also been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, with topics presented in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
    July 28, 2015   doi: 10.1002/asi.23574   open full text
  • Analyzing Web behavior in indoor retail spaces.
    Yongli Ren, Martin Tomko, Flora Dilys Salim, Kevin Ong, Mark Sanderson.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    We analyze 18 million rows of Wi‐Fi access logs collected over a 1‐year period from over 120,000 anonymized users at an inner city shopping mall. The anonymized data set gathered from an opt‐in system provides users' approximate physical location as well as web browsing and some search history. Such data provide a unique opportunity to analyze the interaction between people's behavior in physical retail spaces and their web behavior, serving as a proxy to their information needs. We found that (a) there is a weekly periodicity in users' visits to the mall; (b) people tend to visit similar mall locations and web content during their repeated visits to the mall; (c) around 60% of registered Wi‐Fi users actively browse the web, and around 10% of them use Wi‐Fi for accessing web search engines; (d) people are likely to spend a relatively constant amount of time browsing the web while the duration of their visit may vary; (e) the physical spatial context has a small, but significant, influence on the web content that indoor users browse; and (f) accompanying users tend to access resources from the same web domains.
    July 28, 2015   doi: 10.1002/asi.23587   open full text
  • Author practices in citing other authors, institutions, and journals.
    Ali Gazni, Zahra Ghaseminik.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    This study explores the extent to which authors with different impact and productivity levels cite journals, institutions, and other authors through an analysis of the scientific papers of 37,717 authors during 1990–2013. The results demonstrate that the core‐scatter distribution of cited authors, institutions, and journals varies for authors in each impact and productivity class. All authors in the science network receive the majority of their credit from high‐impact authors; however, this effect decreases as authors' impact levels decrease. Similarly, the proportion of citations that lower‐impact authors make to each other increases as authors' impact levels decrease. High‐impact authors, who have the highest degree of membership in the science network, publish fewer papers in comparison to highly productive authors. However, authors with the highest impact make both more references per paper and also more citations to papers in the science network. This suggests that high‐impact authors produce the most relevant work in the science network. Comparing practices by productivity level, authors receive the majority of their credit from highly productive authors and authors cite highly productive authors more frequently than less productive authors.
    July 28, 2015   doi: 10.1002/asi.23580   open full text
  • Development, testing, and validation of an information literacy test (ILT) for higher education.
    Bojana Boh Podgornik, Danica Dolničar, Andrej Šorgo, Tomaž Bartol.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    A new information literacy test (ILT) for higher education was developed, tested, and validated. The ILT contains 40 multiple‐choice questions (available in Appendix) with four possible answers and follows the recommendations of information literacy (IL) standards for higher education. It assesses different levels of thinking skills and is intended to be freely available to educators, librarians, and higher education managers, as well as being applicable internationally for study programs in all scientific disciplines. Testing of the ILT was performed on a group of 536 university students. The overall test analysis confirmed the ILT reliability and discrimination power as appropriate (Cronbach's alpha 0.74; Ferguson's delta 0.97). The students' average overall achievement was 66%, and IL increased with the year of study. The students were less successful in advanced database search strategies, which require a combination of knowledge, comprehension, and logic, and in topics related to intellectual property and ethics. A group of 163 students who took a second ILT assessment after participating in an IL‐specific study course achieved an average posttest score of 78.6%, implying an average IL increase of 13.1%, with most significant improvements in advanced search strategies (23.7%), and in intellectual property and ethics (12.8%).
    July 28, 2015   doi: 10.1002/asi.23586   open full text
  • Motivation to share knowledge using wiki technology and the moderating effect of role perceptions.
    Ofer Arazy, Ian Gellatly, Esther Brainin, Oded Nov.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    One of the key challenges for innovation and technology‐mediated knowledge collaboration within organizational settings is motivating contributors to share their knowledge. Drawing upon self‐determination theory, we investigate 2 forms of motivation: internally driven (autonomous motivation) and externally driven (controlled motivation). Knowledge sharing could be viewed as a required in‐role activity or as discretionary extra‐role behavior. In this study, we examine the moderating effect of role perceptions on the relations between each of the two motivational constructs and knowledge sharing, paying particular attention to the affordances of the enabling information technology. An analysis of survey data from a wiki‐based organizational encyclopedia in a large, multinational firm reveals that when contributors' motivation is externally driven, they are more likely to share knowledge if this activity is viewed as in‐role behavior. However, when contributors' motivation is internally driven, they are more likely to participate in knowledge sharing when this activity is viewed as extra‐role behavior. Theoretical and practical implications are discussed.
    July 28, 2015   doi: 10.1002/asi.23579   open full text
  • Sharing “happy” information.
    Fiona Tinto, Ian Ruthven.
    Journal of the American Society for Information Science and Technology. July 28, 2015
    This study focuses on the sharing of “happy” information: information that creates a sense of happiness within the individual sharing the information. We explore the range of factors motivating and impacting individuals' happy information‐sharing behavior within a casual leisure context through 30 semistructured interviews. The findings reveal that the factors influencing individuals' happy information‐sharing behavior are numerous, and impact each other. Most individuals considered sharing happy information important to their friendships and relationships. In various contexts the act of sharing happy information was shown to enhance the sharer's happiness.
    July 28, 2015   doi: 10.1002/asi.23581   open full text
  • The role of social capital in selecting interpersonal information sources.
    J. Christopher Zimmer, Raymond M. Henry.
    Journal of the American Society for Information Science and Technology. July 07, 2015
    Although the information‐seeking literature has tended to focus upon the selection and use of inanimate objects as information sources, this research follows the more recent trend of investigating how individuals evaluate and use interpersonal information sources. By drawing from the structural, relational, and cognitive elements of social capital theory to inform antecedents to information quality and source accessibility, a research model is developed and tested. For interpersonal information sources, information quality is the key determinant of source use. Perceptions of information quality and accessibility of an interpersonal source are shown to be influenced by boundary spanning, transactive memory, and content type. Implications and prescriptions for future research are discussed.
    July 07, 2015   doi: 10.1002/asi.23577   open full text
  • Disciplinary knowledge production and diffusion in science.
    Erjia Yan.
    Journal of the American Society for Information Science and Technology. July 07, 2015
    This study examines patterns of dynamic disciplinary knowledge production and diffusion. It uses a citation data set of Scopus‐indexed journals and proceedings. The journal‐level citation data set is aggregated into 27 subject areas and these subjects are selected as the unit of analysis. A 3‐step approach is employed: the first step examines disciplines' citation characteristics through scientific trading dimensions; the second step analyzes citation flows between pairs of disciplines; and the third step uses egocentric citation networks to assess individual disciplines' citation flow diversity through Shannon entropy. The results show that measured by scientific impact, the subjects of Chemical Engineering, Energy, and Environmental Science have the fastest growth. Furthermore, most subjects are carrying out more diversified knowledge trading practices by importing higher volumes of knowledge from a greater number of subjects. The study also finds that the growth rates of disciplinary citations align with the growth rates of global research and development (R&D) expenditures, thus providing evidence to support the impact of R&D expenditures on knowledge production.
    July 07, 2015   doi: 10.1002/asi.23541   open full text
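The third step in the abstract above measures how diversified a discipline's citation flows are using Shannon entropy. A minimal sketch of that measure follows, assuming the flows for one discipline are given as a mapping from partner subject area to citation count. The subject names and counts are hypothetical, and base-2 logarithms are an assumption (the abstract does not state the base).

```python
import math


def citation_flow_entropy(flows: dict[str, int]) -> float:
    """Shannon entropy (in bits) of a discipline's citation flow distribution.

    flows maps each partner subject area to the number of citations
    exchanged with it. Higher entropy means more evenly spread flows.
    """
    total = sum(flows.values())
    if total <= 0:
        raise ValueError("at least one citation is required")
    entropy = 0.0
    for count in flows.values():
        if count > 0:  # subjects with no citations contribute nothing
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


if __name__ == "__main__":
    # Hypothetical citation counts for a single discipline.
    even = {"Chemistry": 10, "Energy": 10, "Physics": 10, "Medicine": 10}
    skewed = {"Chemistry": 37, "Energy": 1, "Physics": 1, "Medicine": 1}
    print(citation_flow_entropy(even), citation_flow_entropy(skewed))
```

With the same number of partner subjects, the evenly spread flows yield a higher entropy than the skewed ones, which is the sense in which entropy captures diversification of knowledge trading.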
  • Overlay maps based on Mendeley data: The use of altmetrics for readership networks.
    Lutz Bornmann, Robin Haunschild.
    Journal of the American Society for Information Science and Technology. July 07, 2015
    Visualization of scientific results using networks has become popular in scientometric research. We provide base maps for Mendeley reader count data using the publication year 2012 from the Web of Science data. Example networks are shown and explained. The reader can use our base maps to visualize other results with the VOSviewer. The proposed overlay maps are able to show the impact of publications in terms of readership data. The advantage of using our base maps is that the user does not need to produce a network based on all data (e.g., from 1 year) but can instead collect the Mendeley data for a single institution (or journals, topics) and match them with our already produced information. Generation of such large‐scale networks is still a demanding task despite the available computer power and digital data availability. Therefore, it is very useful to have base maps and create the network with the overlay technique.
    July 07, 2015   doi: 10.1002/asi.23569   open full text
  • Distributed or concentrated research excellence? Evidence from a large‐scale research assessment exercise.
    Andrea Bonaccorsi, Tindaro Cicero.
    Journal of the American Society for Information Science and Technology. July 07, 2015
    Almost all research evaluation exercises, by construction, deliver data at the level of departments and universities, not at the level of individuals. Yet, the aggregate performance is the average of the performance of individual researchers. This paper explores the issue of the relative magnitude of variability in performance within departments and between departments. It exploits anonymized data at the individual level from one of the largest research evaluation exercises, the Italian VQR 2004–2010 (Valutazione della Qualità della Ricerca). If the variability between departments were much larger than variability within departments, we would see evidence of a process of stratification, or vertical differentiation, arguably driven by competition and researcher mobility. The data show that the opposite pattern is at play.
    July 07, 2015   doi: 10.1002/asi.23539   open full text
  • Understanding the sustained use of online health communities from a self‐determination perspective.
    Yan Zhang.
    Journal of the American Society for Information Science and Technology. July 07, 2015
    Sustained use of an information source is sometimes important for achieving an individual's long‐term goals, such as learning and self‐development. It is even more important for users of online health communities because health benefits usually come with sustained use. However, little is known about what retains a user. We interviewed 21 participants who had been using online diabetes communities in a sustained manner. Guided by self‐determination theory, which posits that behaviors are sustained when they can satisfy basic human needs for autonomy, competence, and relatedness, we identified mechanisms that help satisfy these needs, and thus sustain users in online health communities. Autonomy‐supportive mechanisms include being respected and supported as a unique individual, feeling free in making choices, and receiving meaningful rationales about others' decisions. Competence‐cultivating mechanisms include seeking information, providing information, and exchanging information with others to construct knowledge. Mechanisms that cultivate relatedness include seeing similarities between oneself and peers, receiving responses from others, providing emotional support, and forming small underground groups for closer interactions. The results suggest that, like emotions, information and small group interactions also play a key role in retaining users. System design and community management strategies are discussed based on these mechanisms.
    July 07, 2015   doi: 10.1002/asi.23560   open full text
  • Business process costs of implementing “gold” and “green” open access in institutional and national contexts.
    Robert Johnson, Stephen Pinfield, Mattia Fosci.
    Journal of the American Society for Information Science and Technology. June 26, 2015
    As open access (OA) publication of research outputs becomes increasingly common and is mandated by institutions and research funders, it is important to understand different aspects of the costs involved. This paper provides an early review of administrative costs incurred by universities in making research outputs OA, either via publication in journals (“Gold” OA), involving payment of article‐processing charges (APCs), or via deposit in repositories (“Green” OA). Using data from 29 UK institutions, it finds that the administrative time, as well as the cost incurred by universities, to make an article OA using the Gold route is over 2.5 times higher than Green. Costs are then modeled at a national level using recent UK policy initiatives from Research Councils UK and the Higher Education Funding Councils' Research Excellence Framework as case studies. The study also demonstrates that the costs of complying with research funders' OA policies are considerably higher than where an OA publication is left entirely to authors' discretion. Key target areas for future efficiencies in the business processes are identified and potential cost savings calculated. The analysis is designed to inform ongoing policy development at the institutional and national levels.
    June 26, 2015   doi: 10.1002/asi.23545   open full text
  • Exploiting heterogeneous scientific literature networks to combat ranking bias: Evidence from the computational linguistics area.
    Xiaorui Jiang, Xiaoping Sun, Zhe Yang, Hai Zhuge, Jianmin Yao.
    Journal of the American Society for Information Science and Technology. June 24, 2015
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph‐based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph‐based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less‐biased ranking than previous methods. MutualRank provides a unified model that involves both intra‐ and inter‐network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well‐known universities and two well‐known textbooks. The experimental results show that MutualRank greatly outperforms the state‐of‐the‐art competitors, including PageRank, HITS, CoRank, Future Rank, and P‐Rank, in ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
    June 24, 2015   doi: 10.1002/asi.23463   open full text
  • Interfaces for accessing location‐based information on mobile devices: An empirical evaluation.
    Dion Hoe‐Lian Goh, Chei Sian Lee, Khasfariyati Razikin.
    Journal of the American Society for Information Science and Technology. June 24, 2015
    Location‐based information can now be easily accessed anytime and anywhere using mobile devices. Common ways of presenting such information include lists, maps, and augmented reality (AR). Each of these interface types has its strengths and weaknesses, but few empirical evaluations have been conducted to compare them in terms of performance and perceptions of usability. In this paper, we investigate these issues using three interface types for searching and browsing location‐based information across two task types: open and closed ended. The experimental study involved 180 participants who were issued an Android mobile phone preloaded with a specific interface and asked to perform a set of open‐ and closed‐ended tasks using both searching and browsing approaches. The results suggest that the list interface performed best across all tasks in terms of completion times, whereas the AR interface ranked second and the map interface performed worst. Participants rated the list as best across most usability constructs but the map was rated better than the AR interface, even though the latter performed better. Implications of the work are discussed.
    June 24, 2015   doi: 10.1002/asi.23566   open full text
  • The role of information in health behavior: A scoping study and discussion of major public health models.
    Devon L. Greyson, Joy L. Johnson.
    Journal of the American Society for Information Science and Technology. June 24, 2015
    Information interventions that influence health behavior are a major element of the public health toolkit and an area of potential interest and investigation for library and information science (LIS) researchers. To explore the use of information as a concept within dominant public health behavior models and the manner in which information practices are handled therein, we undertook a scoping study. We scoped the use of “information” within core English‐language health behavior textbooks and examined dominant models of health behavior for information practices. Index terms within these texts indicated a lack of common language around information‐related concepts. Nine models/theories were discussed in a majority of the texts. These were grouped by model type and examined for information‐related concepts/constructs. Information was framed as a “thing” or resource, and information practices were commonly included or implied. However, lack of specificity regarding the definition of information, how it differs from knowledge, and how context affects information practices make the exact role of information within health behavior models unclear. Although health information interventions may be grounded in behavioral theory, a limited understanding of the ways information works within people's lives hinders our ability to effectively use information to improve health. By the same token, information scientists should explore public health's interventionist approach.
    June 24, 2015   doi: 10.1002/asi.23392   open full text
  • Seeing is believing (or at least changing your mind): The influence of visibility and task complexity on preference changes in computer‐supported team decision making.
    Babajide Osatuyi, Starr Roxanne Hiltz, Katia Passerini.
    Journal of the American Society for Information Science and Technology. June 15, 2015
    This article describes an experimental study that examines the extent to which a group decision support system (GDSS), which allows team members to view other members' preference ratings, can encourage changes in individual preferences. We studied 22 four‐person teams performing 2 hidden profile tasks (simple and complex) in a controlled setting. Transparency of the interactions, achieved through the visibility of ratings, influenced changes in participants' preferences as measured before, during, and after the team discussion. Visibility of team scores could then offer an effective way to reach consensus, despite individual incumbent preferences. Changes between individuals' initial preferences and team preferences were found to be larger for members working on a complex task compared to a simple task, as were changes between individuals' prediscussion and postdiscussion preferences. Although prior studies established that the initial preferences of individual team members are rather sticky, this study reveals that individuals adjusted their initial preferences to reach a team consensus, as well as modified their preferences after team discussions. Despite the mixed earlier research results on the impact of GDSS on efficient decision making, findings from this study suggest that in complex decision‐making contexts, GDSS tools can be effective in enabling consensus building in groups.
    June 15, 2015   doi: 10.1002/asi.23555   open full text
  • Data sharing for the advancement of science: Overcoming barriers for citizen scientists.
    Kirsty Williamson, Mary Anne Kennan, Graeme Johanson, John Weckert.
    Journal of the American Society for Information Science and Technology. June 15, 2015
    Systematic study of data sharing by citizen scientists will make a significant contribution to science because of the growing importance of aggregated data in data‐intensive science. This article expands on the data sharing component of a paper presented at the 2013 ASIST conference. A three‐phase project is reported. Conducted between 2011 and 2013 within an environmental voluntary group, the Australian Plants Society Victoria (APSV), the interviews of the first phase are the major data source. Because the project revealed the importance of data sharing with professional scientists, their views are included in the literature review where four themes are explored: lack of shared disciplinary culture, trust, responsibility and controlled access to data, and describing data to enable reuse. The findings, presented under these themes, revealed that, within APSV, sharing among members is mostly generous and uninhibited. Beyond APSV, when online repositories were involved, barriers came very strongly into play. Trust was weaker and barriers also included issues of data quality, data description, and ownership and control. The conclusion is that further investigation of these barriers, including the attitudes of professional scientists to using data contributed by citizen scientists, would indicate how more extensive and useful data sharing could be achieved.
    June 15, 2015   doi: 10.1002/asi.23564   open full text
  • Toward understanding short‐term personal information preservation: A study of backup strategies of end users.
    Matjaž Kljun, John Mariani, Alan Dix.
    Journal of the American Society for Information Science and Technology. June 15, 2015
    The segment of companies providing storage services and hardware for end users and small businesses has been growing in the past few years. Cloud storage, personal network‐attached storage (NAS), and external hard drives are more affordable than ever before, and one would think that backing up personal digital information is a straightforward process nowadays. Despite this, small group studies and corporate surveys show the opposite. In this paper we present the results from a quantitative and qualitative survey of 319 participants about how they back up their personal computers and restore personal information in case of computer failures. The results show that the majority of users do manual, selective, and noncontinuous backups, rely on a set of planned and unplanned backups (as a consequence of other activities), and have inadequate knowledge about possible solutions and implications of using known solutions. The study also reveals that around a fifth of all computers are not backed up, and a quarter of the most important files and a third of the most important folders at the time of the survey could not be (fully) restored in the event of computer failure. Based on the results, several implications for practice and research are presented.
    June 15, 2015   doi: 10.1002/asi.23526   open full text
  • Spatial mediations in historical understanding: GIS and epistemic practices of history.
    Venkata Ratnadeep Suri, Hamid R. Ekbia.
    Journal of the American Society for Information Science and Technology. June 05, 2015
    Scientific disciplines are distinct not only in what they know but in how they know what they know, that is, in their “epistemic cultures.” There is a close relationship between the technologies that a field utilizes and sanctions and the process of inquiry, the character and meaning of corroborative data and evidence, and the kinds of models and theories developed in a field. As the machinery changes, epistemic practices also change. A case in point is how the epistemic practices of historians are reconfigured by the introduction of Geographic Information Systems (GIS). We argue that GIS mediates historical understanding and knowledge creation in at least three ways: (a) by allowing historians to bring new sets of data into analysis; (b) by introducing novel questions, fresh insights, and new modes of analysis and reasoning, or discovering new answers to older questions; and (c) by providing new tools for historians to communicate with each other and with their audiences. We illustrate these mediations through the study of the historiography of the Budapest ghettos during World War II. Our study shows how GIS functionalities reveal hitherto unknown aspects of social life in the ghettos, while pushing certain other aspects into the background.
    June 05, 2015   doi: 10.1002/asi.23562   open full text
  • Data science on the ground: Hype, criticism, and everyday work.
    Daniel Carter, Dan Sholler.
    Journal of the American Society for Information Science and Technology. June 05, 2015
    Modern organizations often employ data scientists to improve business processes using diverse sets of data. Researchers and practitioners have both touted the benefits and warned of the drawbacks associated with data science and big data approaches, but few studies investigate how data science is carried out “on the ground.” In this paper, we first review the hype and criticisms surrounding data science and big data approaches. We then present the findings of semistructured interviews with 18 data analysts from various industries and organizational roles. Using qualitative coding techniques, we evaluated these interviews in light of the hype and criticisms surrounding data science in the popular discourse. We found that although the data analysts we interviewed were sensitive to both the allure and the potential pitfalls of data science, their motivations and evaluations of their work were more nuanced. We conclude by reflecting on the relationship between data analysts' work and the discourses around data science and big data, suggesting how future research can better account for the everyday practices of this profession.
    June 05, 2015   doi: 10.1002/asi.23563   open full text
  • Mendeley readership counts: An investigation of temporal and disciplinary differences.
    Mike Thelwall, Pardeep Sud.
    Journal of the American Society for Information Science and Technology. June 05, 2015
    Scientists and managers using citation‐based indicators to help evaluate research cannot evaluate recent articles because of the time needed for citations to accrue. Reading occurs before citing, however, and so it makes sense to count readers rather than citations for recent publications. To assess this, Mendeley readers and citations were obtained for articles from 2004 to late 2014 in five broad categories (agriculture, business, decision science, pharmacy, and the social sciences) and 50 subcategories. In these areas, citation counts tended to increase with every extra year since publication, and readership counts tended to increase faster initially but then stabilize after about 5 years. The correlation between citations and readers was also higher for longer time periods, stabilizing after about 5 years. Although there were substantial differences between broad fields and smaller differences between subfields, the results confirm the value of Mendeley reader counts as early scientific impact indicators.
    June 05, 2015   doi: 10.1002/asi.23559   open full text
  • Enhancing information retrieval through concept‐based language modeling and semantic smoothing.
    Lynda Said Lhadj, Mohand Boughanem, Karima Amrouche.
    Journal of the American Society for Information Science and Technology. June 05, 2015
    Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independence assumption seems to be unrealistic from a natural language point of view, which considers that terms are related to each other. Therefore, such an assumption leads to two well‐known problems in information retrieval (IR), namely, polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal‐concepts, or word relationships, but such models are estimated using simple n‐grams or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept‐based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept‐based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word‐based model and the Markov Random Field model (using a Markov classifier).
    June 05, 2015   doi: 10.1002/asi.23553   open full text
  • Teen online information disclosure: Empirical testing of a protection motivation and social capital model.
    Hongliang Chen, Christopher E. Beaudoin, Traci Hong.
    Journal of the American Society for Information Science and Technology. June 05, 2015
    With bases in protection motivation theory and social capital theory, this study investigates teen and parental factors that determine teens’ online privacy concerns, online privacy protection behaviors, and subsequent online information disclosure on social network sites. With secondary data from a 2012 survey (N = 622), the final well‐fitting structural equation model revealed that teen online privacy concerns were primarily influenced by parental interpersonal trust and parental concerns about teens’ online privacy, whereas teen privacy protection behaviors were primarily predicted by teen cost–benefit appraisal of online interactions. In turn, teen online privacy concerns predicted increased privacy protection behaviors and lower teen information disclosure. Finally, restrictive and instructive parental mediation exerted differential influences on teens’ privacy protection behaviors and online information disclosure.
    June 05, 2015   doi: 10.1002/asi.23567   open full text
  • Media studies research in the data‐driven age: How research questions evolve.
    Marc Bron, Jasmijn Van Gorp, Maarten Rijke.
    Journal of the American Society for Information Science and Technology. June 02, 2015
    The introduction of new technologies and access to new information channels continue to change the way media studies researchers work and the questions they seek to answer. We investigate the current practices of media studies researchers and how these practices affect their research questions. Through the analysis of 27 interviews about the research practices of media studies researchers during a research project, we developed a model of the activities in their research cycle. We find that information gathering and analysis activities dominate the research cycle. These activities influence the research outcomes as they determine how research questions asked by media studies researchers evolve. Specifically, we show how research questions are related to the availability and accessibility of data as well as new information sources for contextualization of the research topic. Our contribution is a comprehensive account of the overall research cycle of media studies researchers as well as specific aspects of the research cycle, i.e., information sources, information seeking challenges, and the development of research questions. This work confirms findings of previous work in this area using a previously unstudied group of researchers, and provides new details about how research questions evolve.
    June 02, 2015   doi: 10.1002/asi.23458   open full text
  • Uncovering social semantics from textual traces: A theory‐driven approach and evidence from public statements of U.S. Members of Congress.
    Yu‐Ru Lin, Drew Margolin, David Lazer.
    Journal of the American Society for Information Science and Technology. June 02, 2015
    The increasing abundance of digital textual archives provides an opportunity for understanding human social systems. Yet the literature has not adequately considered the disparate social processes by which texts are produced. Drawing on communication theory, we identify three common processes by which documents might be detectably similar in their textual features—authors sharing subject matter, sharing goals, and sharing sources. We hypothesize that these processes produce distinct, detectable relationships between authors in different kinds of textual overlap. We develop a novel n‐gram extraction technique to capture such signatures based on n‐grams of different lengths. We test the hypothesis on a corpus where the author attributes are observable: the public statements of the members of the U.S. Congress. This article presents the first empirical finding that shows different social relationships are detectable through the structure of overlapping textual features. Our study has important implications for designing text modeling techniques to make sense of social phenomena from aggregate digital traces.
    June 02, 2015   doi: 10.1002/asi.23540   open full text
  • The sharing economy: Why people participate in collaborative consumption.
    Juho Hamari, Mimmi Sjöklint, Antti Ukkonen.
    Journal of the American Society for Information Science and Technology. June 02, 2015
    Information and communications technologies (ICTs) have enabled the rise of so‐called “Collaborative Consumption” (CC): the peer‐to‐peer‐based activity of obtaining, giving, or sharing access to goods and services, coordinated through community‐based online services. CC has been expected to alleviate societal problems such as hyper‐consumption, pollution, and poverty by lowering the cost of economic coordination within communities. However, beyond anecdotal evidence, there is a dearth of understanding of why people participate in CC. Therefore, in this article we investigate people's motivations to participate in CC. The study employs survey data (N = 168) gathered from people registered on a CC site. The results show that participation in CC is motivated by many factors such as its sustainability, enjoyment of the activity, and economic gains. An interesting detail in the result is that sustainability is not directly associated with participation unless it is at the same time also associated with positive attitudes towards CC. This suggests that sustainability might only be an important factor for those people for whom ecological consumption is important. Furthermore, the results suggest that in CC an attitude‐behavior gap might exist; people perceive the activity positively and say good things about it, but this good attitude does not necessarily translate into action.
    June 02, 2015   doi: 10.1002/asi.23552   open full text
  • How much does the expected number of citations for a publication change if it contains the address of a specific scientific institute? A new approach for the analysis of citation data on the institutional level based on regression models.
    Lutz Bornmann.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Citation data for institutes are generally provided as numbers of citations or as relative citation rates (as, for example, in the Leiden Ranking). These numbers can then be compared between the institutes. This study aims to present a new approach for the evaluation of citation data at the institutional level, based on regression models. As example data, the study includes all articles and reviews from the Web of Science for the publication year 2003 (n = 886,416 papers). The study is based on an in‐house database of the Max Planck Society. The study investigates how much the expected number of citations for a publication changes if it contains the address of an institute. The calculation of the expected values allows, on the one hand, investigating how the citation impact of the papers of an institute appears in comparison with the total of all papers. On the other hand, the expected values for several institutes can be compared with one another or with a set of randomly selected publications. Besides the institutes, the regression models include factors which can be assumed to have a general influence on citation counts (e.g., the number of authors).
    June 01, 2015   doi: 10.1002/asi.23546   open full text
  • Information inequality in contemporary Chinese urban society: The results of a cluster analysis.
    Liangzhi Yu, Wenjie Zhou.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Having reflected on the theoretical tradition of previous information inequality research that treats society's information rich/poor as identical with its socioeconomic rich/poor, this study examines the informational structure of contemporary Chinese urban society through a cluster analysis of a sample of 3,361 urban residents measured by a holistic informational measurement developed around the concept of “an individual's information world.” It finds that, first, 4 groups, instead of a binary “haves versus have‐nots,” best characterize Chinese urban society informationally; second, the distribution of people among these groups conforms to normal distribution, in striking contrast with the pyramid‐shaped socioeconomic structure of Chinese society; third, although the demographic characteristics of these groups suggest a significant correlation between people's informational and socioeconomic statuses, the 2 are far from identical; fourth, although the 4 groups differ in all aspects investigated, they differ most notably in information assets and the range and type of materials they choose as their regular information resources; fifth, although the 4 groups vary significantly, each differs from the others in its own way. This study concludes that society's informational and socioeconomic structures are 2 related but distinctive structures, and that the informational structure is characterized by highly complicated textures of inequality.
    June 01, 2015   doi: 10.1002/asi.23531   open full text
  • Nobel numbers: Time‐dependent centrality measures on coauthorship graphs.
    Chris Fields.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    A time‐dependent centrality metric for disciplinary coauthorship graphs, the “Nobel number” for a discipline, is introduced. A researcher's Nobel number for a given discipline in a given year is defined as the researcher's average coauthorship distance to that discipline's Nobel laureates in that year. Plotting Nobel numbers over several decades provides a quantitative as well as visual indication of a researcher's proximity to the intuitive “center” of a discipline as defined by recognized scientific achievement. It is shown that the Nobel number distributions for physics of several researchers both within and outside of physics are surprisingly flat over the five‐decade span from 1951 to 2000. A model in which Nobel laureates are typically connected by short coauthorship paths both intergenerationally and between subdisciplines reproduces such flat Nobel number distributions.
    June 01, 2015   doi: 10.1002/asi.23547   open full text
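
The definition in the abstract above (a researcher's average coauthorship distance to a discipline's Nobel laureates in a given year) can be illustrated with a minimal Python sketch. It is not taken from the paper: the adjacency-dictionary graph format, the function names `coauthor_distances` and `nobel_number`, the toy data, and the choice to skip unreachable laureates are all assumptions made for illustration. The graph passed in is assumed to be the coauthorship graph as it stood in the year of interest.

```python
from collections import deque


def coauthor_distances(graph, source):
    """Breadth-first search over an unweighted coauthorship graph.

    Returns a dict mapping every author reachable from `source`
    to the length of the shortest coauthorship path.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for coauthor in graph.get(author, ()):
            if coauthor not in dist:
                dist[coauthor] = dist[author] + 1
                queue.append(coauthor)
    return dist


def nobel_number(graph, researcher, laureates):
    """Average coauthorship distance from `researcher` to `laureates`.

    Assumption (not from the source): laureates with no connecting
    path are skipped, and None is returned if none are reachable.
    """
    dist = coauthor_distances(graph, researcher)
    reachable = [dist[name] for name in laureates if name in dist]
    if not reachable:
        return None
    return sum(reachable) / len(reachable)


# Hypothetical toy graph for one year: each key lists that author's coauthors.
toy_graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}

# Distances from A: C is 2 steps away, E is 3 steps away.
print(nobel_number(toy_graph, "A", ["C", "E"]))
```

Repeating the calculation on one graph per year would give the kind of time series the abstract describes plotting over several decades.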
  • Optimization of the subject directory in a government agriculture department web portal.
    Jin Zhang, Shanshan Zhai, Jennifer Ann Stevenson, Lixin Xia.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    We investigated a subject directory in the US Agriculture Department‐Economic Research Service portal. Parent–child relationships, related connections among the categories, and related connections among the subcategories in the subject directory were optimized using social network analysis. The optimization results were assessed by both density analysis and edge strength analysis methods. In addition, the results were evaluated by domain experts. From this study, it is recommended that four subcategories be switched from their original four categories into two different categories as a result of the parent–child relationship optimization. It is also recommended that 132 subcategories be moved to 40 subcategories and that eight categories be moved to two categories as a result of the related connection optimization. The findings show that optimization boosted the densities of the optimized categories, and the recommended connections of both the related categories and subcategories were stronger than the existing connections of the related categories and subcategories. This paper provides visual displays of the optimization analysis as well as suggestions to enhance the subject directory of this portal.
    June 01, 2015   doi: 10.1002/asi.23550   open full text
  • Scientific research measures.
    Marco Frittelli, Loriano Mancini, Ilaria Peri.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    The evaluation of scientific research is crucial for both the academic community and society as a whole. Numerous bibliometric indices have been proposed for the ranking of research performance, mainly on an ad hoc basis. We introduce the novel class of Scientific Research Measures (SRMs) to rank scientists' research performance and provide a rigorous theoretical foundation for these measures. In contrast to many bibliometric indices, SRMs take into account the whole citation curve of the scientist, offer appealing structural properties, allow a finer ranking of scientists, correspond to specific features of different disciplines, research areas and seniorities, and include several bibliometric indices as special cases. Thus SRMs result in more accurate rankings than ad hoc bibliometric indices. We also introduce the further general class of Dual SRMs that reflect the “value” of journals and permit the ranking of research institutions based on theoretically sound criteria, which has been a central theme in the scientific community over recent decades. An empirical application to the citation curves of 173 finance scholars shows that SRMs can be easily calibrated to actual citation curves and generate different authors' rankings than those produced by seven traditional bibliometric indices.
    June 01, 2015   doi: 10.1002/asi.23530   open full text
  • Constructing conceptual trajectory maps to trace the development of research fields.
    Yi‐Ning Tu, Shu‐Lan Hsu.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text‐mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text‐mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the “h‐index” and “text mining” fields. The precision, recall, and F‐measures of the h‐index were 0.738, 0.652, and 0.658 and those of text‐mining were 0.501, 0.653, and 0.551, respectively. Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before.
    June 01, 2015   doi: 10.1002/asi.23522   open full text
  • Booklovers' world: An examination of factors affecting continued usage of social cataloging sites.
    Namjoo Choi, Soohyung Joo.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Little is known about what factors influence users' continued use of social cataloging sites. This study therefore examines the impacts of key factors from theories of information systems (IS) success and sense of community (SOC) on users' continuance intention in the social cataloging context. Data collected from an online survey of 323 social cataloging users provide empirical support for the research model. The findings indicate that both information quality (IQ) and system quality (SQ) are significant predictors of satisfaction and SOC, which in turn lead to users' intentions to continue using these sites. In addition, SOC was found to affect continuance intention not only directly, but also indirectly through satisfaction. Theoretically, this study draws attention to a largely unexplored but essential area of research in the social cataloging literature and provides a fundamental basis to understand the determinants of continued social cataloging usage. From a managerial perspective, the findings suggest that social cataloging service providers should constantly focus their efforts on the quality control of their contents and system, and the enhancement of SOC among their users.
    June 01, 2015   doi: 10.1002/asi.23556   open full text
  • The effects of research level and article type on the differences between citation metrics and F1000 recommendations.
    Jian Du, Xiaoli Tang, Yishan Wu.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    F1000 recommendations were assessed as a potential data source for research evaluation, but the reasons for differences between F1000 Article Factor (FFa scores) and citations remain unexplored. By linking recommendations for 28,254 publications in F1000 with citations in Scopus, we investigated the effect of research level (basic, clinical, mixed) and article type on the internal consistency of assessments based on citations and FFa scores. The research level has little impact on the differences between the 2 evaluation tools, while article type has a big effect. These 2 measures differ significantly for 2 groups: (a) nonprimary research or evidence‐based research are more highly cited but not highly recommended, while (b) translational research or transformative research are more highly recommended but have fewer citations. This can be expected, since citation activity is usually practiced by academic authors while the potential for scientific revolutions and the suitability for clinical practice of an article should be investigated from a practitioner's perspective. We conclude with a recommendation that the application of bibliometric approaches in research evaluation should consider the proportion of 3 types of publications: evidence‐based research, transformative research, and translational research. The latter 2 types are more suitable for assessment through peer review.
    June 01, 2015   doi: 10.1002/asi.23548   open full text
  • An automatic method for assessing the teaching impact of books from online academic syllabi.
    Kayvan Kousha, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Scholars writing books that are widely used to support teaching in higher education may be undervalued because of a lack of evidence of teaching value. Although sales data may give credible evidence for textbooks, these data may poorly reflect educational uses of other types of books. As an alternative, this article proposes a method to search automatically for mentions of books in online academic course syllabi based on Bing searches for syllabi mentioning a given book, filtering out false matches through an extensive set of rules. The method had an accuracy of over 90% based on manual checks of a sample of 2,600 results from the initial Bing searches. Over one third of about 14,000 monographs checked had one or more academic syllabus mention, with more in the arts and humanities (56%) and social sciences (52%). Low but significant correlations between syllabus mentions and citations across most fields, except the social sciences, suggest that books tend to have different levels of impact for teaching and research. In conclusion, the automatic syllabus search method gives a new way to estimate the educational utility of books in a way that sales data and citation counts cannot.
    June 01, 2015   doi: 10.1002/asi.23542   open full text
  • Analyzing data citation practices using the data citation index.
    Nicolas Robinson‐García, Evaristo Jiménez‐Contreras, Daniel Torres‐Salinas.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    We present an analysis of data citation practices based on the Data Citation Index (DCI) (Thomson Reuters). This database, launched in 2012, links data sets and data studies with citations received from the other citation indexes. The DCI harvests citations to research data from papers indexed in the Web of Science. It relies on the information provided by the data repository. The findings of this study show that data citation practices are far from common in most research fields. Some differences have been reported on the way researchers cite data: Although in the areas of science and engineering & technology data sets were the most cited, in the social sciences and arts & humanities data studies play a greater role. A total of 88.1% of the records have received no citation, but some repositories show very low uncitedness rates. Although data citation practices are rare in most fields, they have expanded in disciplines such as crystallography and genomics. We conclude by emphasizing the role that the DCI could play in encouraging the consistent, standardized citation of research data, a role that would enhance their value as a means of following the research process from data collection to publication.
    June 01, 2015   doi: 10.1002/asi.23529   open full text
  • Using path‐based approaches to examine the dynamic structure of discipline‐level citation networks: 1997–2011.
    Erjia Yan, Qi Yu.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    The objective of this paper is to identify the dynamic structure of several time‐dependent, discipline‐level citation networks through a path‐based method. A network data set is prepared that comprises 27 subjects and their citations aggregated from more than 27,000 journals and proceedings indexed in the Scopus database. A maximum spanning tree method is employed to extract paths in the weighted, directed, and cyclic networks. This paper finds that subjects such as Medicine, Biochemistry, Chemistry, Materials Science, Physics, and Social Sciences are the ones with multiple branches in the spanning tree. This paper also finds that most paths connect science, technology, engineering, and mathematics (STEM) fields; 2 critical paths connecting STEM and non‐STEM fields are the one from Mathematics to Decision Sciences and the one from Medicine to Social Sciences.
    June 01, 2015   doi: 10.1002/asi.23516   open full text
  • Information retrieval from historical newspaper collections in highly inflectional languages: A query expansion approach.
    Anni Järvelin, Heikki Keskustalo, Eero Sormunen, Miamaria Saastamoinen, Kimmo Kettunen.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield‐style test. Finally, a detailed topic‐level analysis of test results was conducted. In the index of historical newspaper collection the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
    June 01, 2015   doi: 10.1002/asi.23379   open full text
  • Success in online searches: Differences between evaluation and finding tasks.
    Werner Wirth, Katharina Sommer, Thilo Pape, Veronika Karnowski.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Several studies have identified important factors for search success in online searches, but until now it has not been determined whether the influence of these factors varies during the search process. This study analyzes (a) whether search expertise, prior topic knowledge, topic interest, or flow experience during a search of the World Wide Web (WWW) influence success in finding relevant information and (b) whether the effects of these predictors vary during the course of the search process. Two different search tasks are investigated: The evaluating task focuses on the selection of relevant websites from a large number of potentially relevant sites, whereas the finding task focuses on the difficulty of finding information in the case of a lack of potentially relevant websites. Survival analysis is applied to data from a quasi‐experiment. This analysis considers not only the question of whether information is found, but also when. Findings show that search expertise and flow explain success in the evaluation task; however, flow is only influential in the first phase of the search process. For the finding task, the predictors have no explanatory strength.
    June 01, 2015   doi: 10.1002/asi.23389   open full text
  • A content analysis of Twitter hyperlinks and their application in web resource indexing.
    Kwan Yi, Namjoo Choi, Yung Soo Kim.
    Journal of the American Society for Information Science and Technology. June 01, 2015
    Twitter has emerged as a popular source of sharing and delivering news information. In tweet messages, URLs to web resources and hashtags are often included. This study investigates the potential of the hyperlinks and hashtags as topical clues and indicators to tweet messages. For this study, we crawled and analyzed about 1.5 million tweets for a 3‐month period covering any topic or subject. The findings of this study revealed a power law relationship for the ranking and frequency of (a) the host names of URLs, and (b) a pair of hashtags and URLs that appeared in the tweet messages. This study also discovered that the most popular URLs used in tweets come from news and media websites, and a majority of the hyperlinked resources are news web pages. One implication of this study is that Twitter users are becoming more active in sharing already published information than producing new information. Finally, our investigation on hashtags for web resource indexing reveals that hashtags have the potential to be used as indexing terms for co‐occurring URLs in the same tweet. We also discuss the implications of this study for web resource recommendation.
    June 01, 2015   doi: 10.1002/asi.23508   open full text
  • The construction of interdisciplinarity: The development of the knowledge base and programmatic focus of the journal Climatic Change, 1977–2013.
    Iina Hellsten, Loet Leydesdorff.
    Journal of the American Society for Information Science and Technology. May 27, 2015
    Climate change as a complex physical and social issue has gained increasing attention in the natural as well as the social sciences. Climate change research has become more interdisciplinary and even transdisciplinary as a typical Mode‐2 science that is also dependent on an application context for its further development. We propose to approach interdisciplinarity as a co‐construction of the knowledge base in the reference patterns and the programmatic focus in the editorials in the core journal of the climate‐change sciences—Climatic Change—during the period 1977–2013. First, we analyze the knowledge base of the journal and map journal–journal relations on the basis of the references in the articles. Second, we follow the development of the programmatic focus by analyzing the semantics in the editorials. We argue that interdisciplinarity is a result of the co‐construction between different agendas: The selection of publications into the knowledge base of the journal, and the adjustment of the programmatic focus to the political context in the editorials. Our results show a widening of the knowledge base from referencing the multidisciplinary journals Nature and Science to citing journals from specialist fields. The programmatic focus follows policy‐oriented issues and incorporates public metaphors.
    May 27, 2015   doi: 10.1002/asi.23528   open full text
  • Social‐media‐based public policy informatics: Sentiment and network analyses of U.S. immigration and border security.
    Wingyan Chung, Daniel Zeng.
    Journal of the American Society for Information Science and Technology. May 21, 2015
    Social media provide opportunities for policy makers to gauge public opinion. However, the large volumes and variety of expressions on social media have challenged traditional policy analysis and public sentiment assessment. In this article, we describe a framework for social‐media‐based public policy informatics and a system called “iMood” that addresses the needs for sentiment and network analyses of U.S. immigration and border security. iMood collects related messages on Twitter, extracts user sentiment and emotion, and constructs networks of the Twitter users, helping policy makers to identify opinion leaders, influential users, and community activists. We evaluated the sentiment, emotion, and network characteristics found in 909,035 tweets posted by over 300,000 users during three phases between May and November 2013. Statistical analyses reveal significant differences in emotion and sentiment among the 3 phases. The Twitter networks of the 3 phases also had significantly different relationship counts, network densities, and total influence scores from those of other phases. This research should contribute to developing a new framework and a new system for social‐media‐based public policy informatics, providing new empirical findings and data sets of sentiment and network analyses of U.S. immigration and border security, and demonstrating a general applicability to different domains.
    May 21, 2015   doi: 10.1002/asi.23449   open full text
  • Social media and problematic everyday life information‐seeking outcomes: Differences across use frequency, gender, and problem‐solving styles.
    Sei‐Ching Joanna Sin.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Social media offers both opportunities and challenges in everyday life information seeking (ELIS). Despite their popularity, it is unclear whether the use of social media for ELIS heightens problematic outcomes, such as encountering too much information and finding irrelevant, conflicting, outdated, and noncredible information. In light of this gap, this study tested (a) whether the level of problematic informational outcomes varies with the use of social networking sites, microblogs, and social question and answer sites; (b) whether the problem level varies by gender and problem‐solving styles; and (c) whether the aforementioned factors have significant interaction effects. An online questionnaire was used to survey 791 U.S. undergraduates. Irrelevant information was the top issue. Gender difference was statistically significant for conflicting information, which was more problematic for women. The multiway analysis of variance (ANOVA) indicated notable problem‐solving style differences, especially on the Personal Control subscale. This highlights the importance of affective factors. It is noteworthy that although social media use has no significant main effect, there were significant interaction effects between microblog use and the Approach‐Avoidance and Problem Solving Confidence subscales. The impact of microblog use on ELIS outcomes therefore warrants further investigation. Five propositions are posited for further testing.
    May 13, 2015   doi: 10.1002/asi.23509   open full text
  • Rain or shine? Forecasting search process performance in exploratory search tasks.
    Chirag Shah, Chathra Hendahewa, Roberto González‐Ibáñez.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Most information retrieval (IR) systems consider relevance, usefulness, and quality of information objects (documents, queries) for evaluation, prediction, and recommendation, often ignoring the underlying search process of information seeking. This may leave out opportunities for making recommendations that analyze the search process and/or recommend alternative search processes instead of objects. To overcome this limitation, we investigated whether by analyzing a searcher's current processes we could forecast his likelihood of achieving a certain level of success with respect to search performance in the future. We propose a machine‐learning‐based method to dynamically evaluate and predict search performance several time‐steps ahead at each given time point of the search process during an exploratory search task. Our prediction method uses a collection of features extracted from expression of information need and coverage of information. For testing, we used log data collected from 4 user studies that included 216 users (96 individuals and 60 pairs). Our results show 80–90% accuracy in prediction depending on the number of time‐steps ahead. In effect, the work reported here provides a framework for evaluating search processes during exploratory search tasks and predicting search performance. Importantly, the proposed approach is based on user processes and is independent of any IR system.
    May 13, 2015   doi: 10.1002/asi.23484   open full text
  • Thesaurus structure, descriptive parameters, and scale.
    Robert Losee.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    A thesaurus contains a set of terms or features that may be used to represent recorded information, including prose documents or scientific data sets. The focus of this work is on the basic structural nature of a thesaurus itself, not on how people develop a thesaurus or how a thesaurus affects retrieval performance. Thesauri in this research are automatically developed in a simulation from sets of randomly or exhaustively generated documents. Each thesaurus is generated by the Thesaurus Generator software from a set of several hundred documents, and thousands of different document sets are used as input to the Thesaurus Generator, producing thousands of thesauri. Thus, thousands of thesauri are generated for each data point in accompanying graphs. The characteristics of this large number of thesauri are studied so that the relationships between thesaurus parameters can be determined. Some rules governing these relationships are suggested, addressing factors such as tree height and width, number of tree roots in thesauri, and number of terms available for the vocabulary. How these parameters scale as vocabularies grow is addressed. These results apply to various information systems that contain features with hierarchical relationships, including many thesauri and ontologies.
    May 13, 2015   doi: 10.1002/asi.23544   open full text
  • Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature.
    James Howison, Julia Bullard.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Software is increasingly crucial to scholarship, yet the visibility and usefulness of software in the scientific record are in question. Just as with data, the visibility of software in publications is related to incentives to share software in reusable ways, and so promote efficient science. In this article, we examine software in publications through content analysis of a random sample of 90 biology articles. We develop a coding scheme to identify software “mentions” and classify them according to their characteristics and ability to realize the functions of citations. Overall, we find diverse and problematic practices: Only between 31% and 43% of mentions involve formal citations; informal mentions are very common, even in high impact factor journals and across different kinds of software. Software is frequently inaccessible (15%–29% of packages in any form; between 90% and 98% of specific versions; only between 24%–40% provide source code). Cites to publications are particularly poor at providing version information, whereas informal mentions are particularly poor at providing crediting information. We provide recommendations to improve the practice of software citation, highlighting recent nascent efforts. Software plays an increasingly great role in scientific practice; it deserves a clear and useful place in scholarly communication.
    May 13, 2015   doi: 10.1002/asi.23538   open full text
  • A machine‐learning approach to negation and speculation detection for sentiment analysis.
    Noa P. Cruz, Maite Taboada, Ruslan Mitkov.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Recognizing negative and speculative information is highly relevant for sentiment analysis. This paper presents a machine‐learning approach to automatically detect this kind of information in the review domain. The resulting system works in two steps: in the first pass, negation/speculation cues are identified, and in the second phase the full scope of these cues is determined. The system is trained and evaluated on the Simon Fraser University Review corpus, which is extensively used in opinion mining. The results show how the proposed method outstrips the baseline by as much as roughly 20% in the negation cue detection and around 13% in the scope recognition, both in terms of F1. In speculation, the performance obtained in the cue prediction phase is close to that obtained by a human rater carrying out the same task. In the scope detection, the results are also promising and represent a substantial improvement on the baseline (up by roughly 10%). A detailed error analysis is also provided. The extrinsic evaluation shows that the correct identification of cues and scopes is vital for the task of sentiment analysis.
    May 13, 2015   doi: 10.1002/asi.23533   open full text
  • The quality versus accessibility debate revisited: A contingency perspective on human information source selection.
    Lilian Woudstra, Bart Hooff, Alexander Schouten.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Previous studies have not fully investigated the role of source accessibility versus source quality in the selection of information sources. It remains unclear what their (relative) importance is. Three different models have been identified: (a) an exclusively accessibility‐driven model, (b) a cost‐benefit model in which both accessibility and quality are significant influences, and (c) an exclusively quality‐driven model. Moreover, the conditions under which accessibility and quality are important are not well understood. The goal of our study is to shed more light on both issues by assessing the role of different dimensions of accessibility and quality and how their importance is affected by time pressure. We conducted a policy‐capturing study in which 89 financial specialists participated. Each judged 20 scenarios in which the accessibility and quality of human information sources, as well as time pressure, were manipulated. Results showed that both accessibility and quality affect the likelihood of asking a human information source for information. Moreover, although the weights attached to physical accessibility and the source's perceived technical quality were indeed moderated by time pressure, in both conditions we find support for a cost‐benefit model of information seeking, in which both accessibility and quality are significant influences.
    May 13, 2015   doi: 10.1002/asi.23536   open full text
  • A mixture model of global internet capacity distributions.
    Hyunjin Seo, Stuart Thorson.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    This article develops a preferential attachment‐based mixture model of global Internet bandwidth and investigates it in the context of observed bandwidth distributions between 2002 and 2011. Our longitudinal analysis shows, among other things, that the bandwidth share distributions—and thus bandwidth differences—exhibit considerable path dependence where country proportions of international bandwidth in 2011 can be substantially accounted for by a preferential attachment‐based mixture of micro‐level processes. Our preferential attachment model, consistent with empirical data, does not predict increasing concentration of bandwidth within top‐ranked countries. We argue that recognizing the strong, but nuanced, historical inertia of bandwidth distributions is helpful in better discriminating among competing theoretical perspectives on the global digital divide as well as in clarifying policy discussions related to gaps between bandwidth‐rich and bandwidth‐poor countries.
    May 13, 2015   doi: 10.1002/asi.23523   open full text
  • Spamming in scholarly publishing: A case study.
    Marcin Kozak, Olesia Iefremova, James Hartley.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Spam has become an issue of concern in almost all areas where the Internet is involved, and many people today have become victims of spam from publishers and individual journals. We studied this phenomenon in the field of scholarly publishing from the perspective of a single author. We examined 1,024 such spam e‐mails received by Marcin Kozak from publishers and journals over a period of 391 days, asking him to submit an article to their journal. We collected the following information: where the request came from; publishing model applied; fees charged; inclusion or not in the Directory of Open Access Journals (DOAJ); and presence or not in Beall's (2014) listing of dubious journals. Our research showed that most of the publishers that sent e‐mails inviting manuscripts were (i) using the open access model, (ii) using article‐processing charges to fund their journal's operations; (iii) offering very short peer‐review times, (iv) on Beall's list, and (v) misrepresenting the location of their headquarters. Some years ago, a letter of invitation to submit an article to a particular journal was considered a kind of distinction. Today, e‐mails inviting submissions are generally spam, something that misleads young researchers and irritates experienced ones.
    May 13, 2015   doi: 10.1002/asi.23521   open full text
  • Author credit‐assignment schemas: A comparison and analysis.
    Jian Xu, Ying Ding, Min Song, Tamy Chambers.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Credit assignment to multiple authors of a publication is a challenging task owing to the conventions followed within different areas of research. In this study, we present a review of different author credit‐assignment schemas, which are designed mainly based on author position and the total number of coauthors on the publication. We implemented, tested, and classified 15 author credit‐assignment schemas into 3 types: linear, curve, and “other” assignment schemas. Further investigation and analysis revealed that most of the methods provide reasonable credit‐assignment results, even though the credit‐assignment distribution approaches are quite different among different types. The evaluation of each schema based on PubMed articles published in 2013 shows that there exist positive correlations among different schemas and that the similarity of credit‐assignment distributions can be derived from the similar design principles that stress the number of coauthors or the author position, or consider both. We provide a summary about the features of each credit‐assignment schema to facilitate the selection of the appropriate one, depending on the different conditions required to meet diverse needs.
    May 13, 2015   doi: 10.1002/asi.23495   open full text
  • Wikipedia, collective memory, and the Vietnam War.
    Brendan Luyt.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Wikipedia is increasingly an important source of information for many. Hence, it is important to develop an understanding of how it is situated within society and the wider roles it is called on to perform. This article argues that one of these roles is as a depository of collective memory. Building on the work of Pentzold, I present a case study of the English Wikipedia article on the Vietnam War to demonstrate that the article, or more accurately, its talk pages, provide a forum for the contestation of collective memory. I further argue that this function is one that should be supported by libraries as they position themselves within a rapidly changing digital world.
    May 13, 2015   doi: 10.1002/asi.23518   open full text
  • Academics' responses to encountered information: Context matters.
    Sheila Pontis, Genovefa Kefalidou, Ann Blandford, Jamie Forth, Stephann Makri, Sarah Sharples, Geraint Wiggins, Mel Woods.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    An increasing number of tools are being developed to help academics interact with information, but little is known about the benefits of those tools for their users. This study evaluated academics' receptiveness to information proposed by a mobile app, the SerenA Notebook: information that is based on their inferred interests but does not relate directly to a prior recognized need. The evaluated app aimed at creating the experience of serendipitous encounters: generating ideas and inspiring thoughts, and potentially triggering follow‐up actions, by providing users with suggestions related to their work and leisure interests. We studied how 20 academics interacted with messages sent by the mobile app (3 per day over 10 consecutive days). Collected data sets were analyzed using thematic analysis. We found that contextual factors (location, activity, and focus) strongly influenced their responses to messages. Academics described some unsolicited information as interesting but irrelevant when they could not make immediate use of it. They highlighted filtering information as their major struggle rather than finding information. Some messages that were positively received acted as reminders of activities participants were meant to be doing but were postponing, or were relevant to ongoing activities at the time the information was received.
    May 13, 2015   doi: 10.1002/asi.23502   open full text
  • The effects of distraction on task completion scores in a natural environment test setting.
    Elke Greifeneder.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    The effects of distraction on completion scores generate a gap that is generally not taken into account in information behavior studies. This research investigated what happens if researchers de facto allow distractions to occur in a test situation. It examined the type and magnitude of occurred distractions, the effects distractions have on completion scores, and whether different distractions affect different test activities differently. In the research design, participants were randomly assigned to either a controlled environment or their natural environment. The results showed that whereas participants in the natural environment needed more time to complete the post task questionnaire than their laboratory counterparts, they spent a similar amount of time on the tasks. Participants were capable of, and indeed willing to, limit the less‐urgent distractions in the interests of getting the tasks done. If they were interrupted by a human contact, however, the completion time for tasks increased significantly. Previous studies showed that distractions change information behavior. Yet, the present results provide evidence that these changes do not always occur, and thus there needs to be a better demarcation of the limits within which distraction can be expected to change how people interact with information.
    May 13, 2015   doi: 10.1002/asi.23537   open full text
  • Text representation strategies: An example with the State of the Union addresses.
    Jacques Savoy.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf idf weighting scheme can be applied. Recently, latent Dirichlet allocation (LDA) has been proposed to define the topics included in a corpus. As another strategy, this study proposes to apply a vocabulary specificity measure (Z score) to determine the most significantly overused word‐types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf idf or LDA approach, the selection requires some arbitrary decisions. Based on the term‐specific measure (Z score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution of usage of some terms associated with their specificity measures. Finally, this technique can be employed to define the most important lexical leaders introducing terms overused by the k following presidencies.
    May 13, 2015   doi: 10.1002/asi.23510   open full text
  • Not all international collaboration is beneficial: The Mendeley readership and citation impact of biochemical research collaboration.
    Pardeep Sud, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Biochemistry is a highly funded research area that is typified by large research teams and is important for many areas of the life sciences. This article investigates the citation impact and Mendeley readership impact of biochemistry research from 2011 in the Web of Science according to the type of collaboration involved. Negative binomial regression models are used that incorporate, for the first time, the inclusion of specific countries within a team. The results show that, holding other factors constant, larger teams robustly associate with higher impact research, but including additional departments has no effect and adding extra institutions tends to reduce the impact of research. Although international collaboration is apparently not advantageous in general, collaboration with the United States, and perhaps also with some other countries, seems to increase impact. In contrast, collaborations with some other nations seem to decrease impact, although both findings could be due to factors such as differing national proportions of excellent researchers. As a methodological implication, simpler statistical models would find international collaboration to be generally beneficial and so it is important to take into account specific countries when examining collaboration.
    May 13, 2015   doi: 10.1002/asi.23515   open full text
  • Reducing digital divide effects through student engagement in coordinated game design, online resource use, and social computing activities in school.
    Rebecca Reynolds, Ming Ming Chiu.
    Journal of the American Society for Information Science and Technology. May 13, 2015
    Participating in online social, cultural, and political activities requires digital skill and knowledge. This study investigates how sustained student engagement in game design and social media use can attenuate the relations between socioeconomic factors and digital inequality among youth. This study of 242 middle and high school students participating in the Globaloria project shows that participation eliminates gender effects, and reduces parent education effects in home computer use. Further, students from schools with lower parent education show greater increases in frequency of school technology engagement. Globaloria participation also weakens the link between prior school achievement and advanced technology activities. Results offer evidence that school‐based digital literacy programs can attenuate digital divide effects known to occur cross‐sectionally in the general U.S. population.
    May 13, 2015   doi: 10.1002/asi.23504   open full text
  • Recovering uncaptured citations in a scholarly network: A two‐step citation analysis to estimate publication importance.
    Zhuoren Jiang, Xiaozhong Liu, Yan Chen.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation‐based ranking algorithms and challenge the performance of citation‐based retrieval systems. In this research, we utilize a two‐step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author and then use the author's importance to estimate the publication importance for some selected articles. To evaluate this method, we designed a simulation experiment—“random citation‐missing”—to test the two‐step citation analysis that we carried out with the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large‐scale scientific digital library, from high‐quality citation data to very poor quality data. The results show that a two‐step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (first step) is exponentially increased when the quality of citation decreases. The findings from this study can further enhance citation‐based publication‐ranking algorithms for real‐world applications.
    May 05, 2015   doi: 10.1002/asi.23475   open full text
  • Genetic algorithms and Gaussian Bayesian networks to uncover the predictive core set of bibliometric indices.
    Alfonso Ibáñez, Rubén Armañanzas, Concha Bielza, Pedro Larrañaga.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    The diversity of bibliometric indices today poses the challenge of exploiting the relationships among them. Our research uncovers the best core set of relevant indices for predicting other bibliometric indices. An added difficulty is to select the role of each variable, that is, which bibliometric indices are predictive variables and which are response variables. This results in a novel multioutput regression problem where the role of each variable (predictor or response) is unknown beforehand. We use Gaussian Bayesian networks to solve this problem and discover multivariate relationships among bibliometric indices. These networks are learnt by a genetic algorithm that looks for the optimal models that best predict bibliometric data. Results show that the optimal induced Gaussian Bayesian networks corroborate previous relationships between several indices, but also suggest new, previously unreported interactions. An extended analysis of the best model illustrates that a set of 12 bibliometric indices can be accurately predicted using only a smaller predictive core subset composed of citations, g‐index, q2‐index, and hr‐index. This research is performed using bibliometric data on Spanish full professors associated with the computer science area.
    May 05, 2015   doi: 10.1002/asi.23467   open full text
  • The linguistic construal of disciplinarity: A data‐mining approach using register features.
    Elke Teich, Stefania Degaetano‐Ortlieb, Peter Fankhauser, Hannah Kermes, Ekaterina Lapshinova‐Koltunski.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    We analyze the linguistic evolution of selected scientific disciplines over a 30‐year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use—both individually and collectively—over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus‐based methods of feature extraction (various aggregated features [part‐of‐speech based], n‐grams, lexico‐grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
    May 05, 2015   doi: 10.1002/asi.23457   open full text
  • Why are these similar? Investigating item similarity types in a large digital library.
    Aitor Gonzalez‐Agirre, German Rigau, Eneko Agirre, Nikolaos Aletras, Mark Stevenson.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    We introduce a new problem, identifying the type of relation that holds between a pair of similar items in a digital library. Being able to provide a reason why items are similar has applications in recommendation, personalization, and search. We investigate the problem within the context of Europeana, a large digital library containing items related to cultural heritage. A range of types of similarity in this collection were identified. A set of 1,500 pairs of items from the collection were annotated using crowdsourcing. A high intertagger agreement (average 71.5 Pearson correlation) was obtained and demonstrates that the task is well defined. We also present several approaches to automatically identifying the type of similarity. The best system applies linear regression and achieves a mean Pearson correlation of 71.3, close to human performance. The problem formulation and data set described here were used in a public evaluation exercise, the *SEM shared task on Semantic Textual Similarity. The task attracted the participation of 6 teams, who submitted 14 system runs. All annotations, evaluation scripts, and system runs are freely available.
    May 05, 2015   doi: 10.1002/asi.23482   open full text
  • Sentiment‐based event detection in Twitter.
    Georgios Paltoglou.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    The main focus of this article is to examine whether sentiment analysis can be successfully used for “event detection,” that is, detecting significant events that occur in the world. Most solutions to this problem are typically based on increases or spikes in frequency of terms in social media. In our case, we explore whether sudden changes in the positivity or negativity that keywords are typically associated with can be exploited for this purpose. A data set that contains several million Twitter messages over a 1‐month time span is presented and experimental results demonstrate that sentiment analysis can be successfully utilized for this purpose. Further experiments study the sensitivity of both frequency‐ and sentiment‐based solutions to a number of parameters. Concretely, we show that the number of tweets that are used for event detection is an important factor, while the number of days used to extract token frequency or sentiment averages is not. Lastly, we present results focusing on detecting local events and conclude that all approaches are dependent on the level of coverage that such events receive in social media.
    May 05, 2015   doi: 10.1002/asi.23465   open full text
  • Testing a model of user‐experience with news websites.
    Gabor Aranyi, Paul Schaik.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Although the Internet has become a major source for accessing news, there is little research regarding users' experience with news sites. We conducted an experiment to test a comprehensive model of user experience with news sites that was developed previously by means of an online survey. Level of adoption (novel or adopted site) was controlled with a between‐subjects manipulation. We collected participants' answers to psychometric scales at 2 times: after presentation of 5 screenshots of a news site and directly after 10 minutes of hands‐on experience with the site. The model was extended with the prediction of users' satisfaction with news sites as a high‐level design goal. A psychometric measure of trust in news providers was developed and added to the model to better predict people's intention to use particular news sites. The model presented in this article represents a theoretically founded, empirically tested basis for evaluating news websites, and it holds theoretical relevance to user‐experience research in general. Finally, the findings and the model are applied to provide practical guidance in design prioritization.
    May 05, 2015   doi: 10.1002/asi.23462   open full text
  • Information seeking for musical creativity: A systematic literature review.
    Charilaos Lavranos, Petros Kostagiolas, Nikolaos Korfiatis, Joseph Papadatos.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    This paper aims to present a systematic literature review of research in music information seeking and its application to musical creativity and creative activities and in particular composition, performance and improvisation, and listening and analysis. A seed set of 901 articles published between 1973 and 2015 was evaluated and in total 65 studies were considered for further analyses. Data extraction and synthesis were performed through content analysis using the PRISMA method. Three thematic categories were identified in regard to music information needs: (a) those related to scholarly activities, (b) those that are musically motivated, and (c) those related to socializing and communication. In addition, 3 categories of music information sources were connected to musical creativity: (i) those that are related to Internet and media technologies, (ii) those that are related to music libraries, organizations, and music stores, and (iii) those that are related to the subjects' social settings. The paper provides a systematic review, with the aim of showcasing the effect of modern information retrieval techniques in a creative and intensive area of information‐dependent activity such as music making and consumption.
    May 05, 2015   doi: 10.1002/asi.23534   open full text
  • Estimating the probability of an authorship attribution.
    Jacques Savoy.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    In authorship attribution, various distance‐based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback‐Leibler divergence, Labbé's intertextual distance, and the naïve Bayes). As the first test collection, we have downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that the proposed assignment should be interpreted as possible, without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be taken.
    May 05, 2015   doi: 10.1002/asi.23455   open full text
  • The development and validation of a one‐bit comparison for evaluating the maturity of tag distributions in a Web 2.0 environment.
    Kuo‐Hao Tang, Li‐Chen Tsai, Sheue‐Ling Hwang.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Tags generated by domain experts reaching a consensus under social influence reflect the core concepts of the tagged resource. Such tags can act as navigational cues that enable users to discover meaningful and relevant information in a Web 2.0 environment. This is particularly critical for nonexperts for understanding formal academic or scientific resources, also known as hard content. The goal of this study was to develop a novel one‐bit comparison (OBC) metric and to assess in what circumstances a set of tags describing a hard‐content resource is mature and representative. We compared OBC with the conventional Shannon entropy approach to determine performance when distinguishing tags generated by domain experts and nonexperts in the early and later stages under social influence. The results indicated that OBC can accurately distinguish mature tags generated by a strong expert consensus from other tags, and outperform Shannon entropy. The findings support tag‐based learning, and provide insights and tools for the design of applications involving tags, such as tag recommendation and tag‐based organization.
    May 05, 2015   doi: 10.1002/asi.23454   open full text
  • Investment decision paths in the information age: The effect of online journalism.
    Michal Gaziel Yablowitz, Daphne R. Raban.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    In the rapidly evolving technology world, blogs have become a popular genre of communication. Their potential influence on decision making is the focus of the present research. Based on the interplay between Social Judgment Theory and Framing Theory, this study investigates whether the information delivered by technology blogs is treated differently during investment decision making than information from traditional financial newspapers in digital form, while containing information cues and text framing. Using an online experiment with a 3 × 2 design, this research compares the influence of this trio of variables on the investment decisions of 236 participants. Results indicate a complex investment decision‐making process differing according to the type of medium presented, the text framing, the information cues, and the decision maker's background.
    May 05, 2015   doi: 10.1002/asi.23453   open full text
  • Research synthesis methods and library and information science: Shared problems, limited diffusion.
    Laura Sheble.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Interests of researchers who engage with research synthesis methods (RSM) intersect with library and information science (LIS) research and practice. This intersection is described by a summary of conceptualizations of research synthesis in a diverse set of research fields and in the context of Swanson's (1986) discussion of undiscovered public knowledge. Through a selective literature review, research topics that intersect with LIS and RSM are outlined. Topics identified include open access, information retrieval, bias and research information ethics, referencing practices, citation patterns, and data science. Subsequently, bibliometrics and topic modeling are used to present a systematic overview of the visibility of RSM in LIS. This analysis indicates that RSM became visible in LIS in the 1980s. Overall, LIS research has drawn substantially from general and internal medicine, the field's own literature, and business; and is drawn on by health and medical sciences, computing, and business. This analytical overview confirms that research synthesis is more visible in the health and medical literature in LIS, but suggests that LIS, as a meta‐science, has the potential to make substantive contributions to a broader variety of fields in the context of topics related to research synthesis methods.
    May 05, 2015   doi: 10.1002/asi.23499   open full text
  • Mendeley readership altmetrics for medical articles: An analysis of 45 fields.
    Mike Thelwall, Paul Wilson.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Medical research is highly funded and often expensive and so is particularly important to evaluate effectively. Nevertheless, citation counts may accrue too slowly for use in some formal and informal evaluations. It is therefore important to investigate whether alternative metrics could be used as substitutes. This article assesses whether one such altmetric, Mendeley readership counts, correlates strongly with citation counts across all medical fields, whether the relationship is stronger if student readers are excluded, and whether they are distributed similarly to citation counts. Based on a sample of 332,975 articles from 2009 in 45 medical fields in Scopus, citation counts correlated strongly (about 0.7; 78% of articles had at least one reader) with Mendeley readership counts (from the new version 1 applications programming interface [API]) in almost all fields, with one minor exception, and the correlations tended to decrease slightly when student readers were excluded. Readership followed either a lognormal or a hooked power law distribution, whereas citations always followed a hooked power law, showing that the two may have underlying differences.
    May 05, 2015   doi: 10.1002/asi.23501   open full text
  • Information flows as bases for archeology‐specific geodata infrastructures: An exploratory study in Flanders.
    Berdien De Roo, Philippe De Maeyer, Jean Bourgeois.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Accurate and detailed data recording is indispensable for documenting archeological projects and for subsequent information exchange. To prevent comprehension and accessibility issues in these cases, data infrastructures can be useful. The establishment of such data infrastructures requires a clear understanding of the business processes and information flows within the archeological domain. This exploratory study attempts to provide insights into how information is managed in Flemish archeological processes and how this management process can be enhanced. It is based on an analysis of the new Flemish Immovable Heritage Decree, informal interviews with Flemish archeological organizations, and the results of an international survey. Three main processes, in which certified archeologists and the Flemish Heritage agency are key actors, were identified. Multiple types of information, the majority of which contain a geographical component, are recorded, acquired, used, and exchanged. Geographical information systems (GIS) and geodatabases therefore appear to be valuable components of an archeology‐specific data infrastructure. This is of interest because GIS are widely adopted in archeology and multiple Flemish archeological organizations are in favor of a government‐provided exchange standard or database templates for data recording. Furthermore, free and open source software is preferred to ensure cost efficiency and customizability.
    May 05, 2015   doi: 10.1002/asi.23511   open full text
  • Web mining for navigation problem detection and diagnosis in Discapnet: A website aimed at disabled people.
    Olatz Arbelaitz, Aizea Lojo, Javier Muguerza, Iñigo Perona.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    The dramatic increase in the amount of information stored on the web makes it more important to familiarize people with disabilities and elderly people with digital devices and applications and to adapt websites to enable their use by these users. Discapnet is a website mainly aimed at visually disabled people, and navigation is a challenging task for its users. In this context, system evaluation and problem detection become crucial aspects for enhancing user experience and may contribute greatly to diminishing the existing technological gap. This study proposes a system based on web‐mining techniques that collects in‐use information while the user is accessing the web (thus, being a noninvasive system). The proposed system models users in the wild and discovers navigation problems appearing in Discapnet and can also be used for problem detection when new users are navigating the site. The system was tested and its efficiency demonstrated in an experiment involving navigation under supervision, in which 82.6% of a set of disabled people were automatically labeled as having problems with the website.
    May 05, 2015   doi: 10.1002/asi.23506   open full text
  • Using the Wayback Machine to mine websites in the social sciences: A methodological resource.
    Sanjay K. Arora, Yin Li, Jan Youtie, Philip Shapira.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web‐based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step‐by‐step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small‐ and medium‐sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high‐quality data set from archived web information.
    May 05, 2015   doi: 10.1002/asi.23503   open full text
  • Why experience matters to privacy: How context‐based experience moderates consumer privacy expectations for mobile applications.
    Kirsten Martin, Katie Shilton.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    Two dominant theoretical models for privacy—individual privacy preferences and context‐dependent definitions of privacy—are often studied separately in information systems research. This paper unites these theories by examining how individual privacy preferences impact context‐dependent privacy expectations. The paper theorizes that experience provides a bridge between individuals' general privacy attitudes and nuanced contextual factors. This leads to the hypothesis that, when making judgments about privacy expectations, individuals with less experience in a context rely more on individual preferences such as their generalized privacy beliefs, whereas individuals with more experience in a context are influenced by contextual factors and norms. To test this hypothesis, 1,925 American users of mobile applications made judgments about whether varied real‐world scenarios involving data collection and use met their privacy expectations. Analysis of the data suggests that experience using mobile applications did moderate the effect of individual preferences and contextual factors on privacy judgments. Experience changed the equation respondents used to assess whether data collection and use scenarios met their privacy expectations. Discovering the bridge between 2 dominant theoretical models enables future privacy research to consider both personal and contextual variables by taking differences in experience into account.
    May 05, 2015   doi: 10.1002/asi.23500   open full text
  • Understanding scientific collaboration in the research life cycle: Bio‐ and nanoscientists' motivations, information‐sharing and communication practices, and barriers to collaboration.
    EunKyung Chung, Nahyun Kwon, Jungyeoun Lee.
    Journal of the American Society for Information Science and Technology. May 05, 2015
    This study aims to identify the way researchers collaborate with other researchers in the course of the scientific research life cycle and provide information to the designers of e‐Science and e‐Research implementations. On the basis of in‐depth interviews with and on‐site observations of 24 scientists and a follow‐up focus group interview in the field of bioscience/nanoscience and technology in Korea, we examined scientific collaboration using the framework of the scientific research life cycle. We attempt to explain the major motivations, characteristics of communication and information sharing, and barriers associated with scientists' research collaboration practices throughout the research life cycle. The findings identify several notable phenomena including motivating factors, the timing of collaboration formation, partner selection, communication methods, information‐sharing practices, and barriers at each phase of the life cycle. We find that specific motivations were related to specific phases. The formation of collaboration was observed throughout the entire process, not only in the beginning phase of the cycle. For communication and information‐sharing practices, scientists continue to favor traditional means of communication for security reasons. Barriers to collaboration throughout the phases included different priorities, competitive tensions, and a hierarchical culture among collaborators, whereas credit sharing was a barrier in the research product phase.
    May 05, 2015   doi: 10.1002/asi.23520   open full text
  • Social scientists' satisfaction with data reuse.
    Ixchel M. Faniel, Adam Kriesberg, Elizabeth Yakel.
    Journal of the American Society for Information Science and Technology. May 04, 2015
    Much of the recent research on digital data repositories has focused on either assessing the trustworthiness of the repository or quantifying the frequency of data reuse. Satisfaction with the data reuse experience, however, has not been widely studied. Drawing from the information systems and information science literature, we developed a model to examine the relationship between data quality and data reusers' satisfaction. Based on a survey of 1,480 journal article authors who cited Inter‐University Consortium for Political and Social Research (ICPSR) data in published papers from 2008 to 2012, we found several data quality attributes—completeness, accessibility, ease of operation, and credibility—had significant positive associations with data reusers' satisfaction. There was also a significant positive relationship between documentation quality and data reusers' satisfaction.
    May 04, 2015   doi: 10.1002/asi.23480   open full text
  • Disciplinary, national, and departmental contributions to the literature of library and information science, 2007–2012.
    William H. Walters, Esther Isabelle Wilder.
    Journal of the American Society for Information Science and Technology. April 29, 2015
    We investigate the contributions of particular disciplines, countries, and academic departments to the literature of library and information science (LIS) using data for the articles published in 31 journals from 2007 to 2012. In particular, we examine the contributions of authors outside the United States, the United Kingdom, and Canada; faculty in departments other than LIS; and practicing librarians. Worldwide, faculty in LIS departments account for 31% of the journal literature; librarians, 23%; computer science faculty, 10%; and management faculty, 10%. The top contributing nations are the United States, the United Kingdom, Spain, China, Canada, and Taiwan. Within the United States and the United Kingdom, the current productivity of LIS departments is correlated with past productivity and with other measures of reputation and performance. More generally, the distribution of contributions is highly skewed. In the United States, five departments account for 27% of the articles contributed by LIS faculty; in the United Kingdom, four departments account for nearly two‐thirds of the articles. This skewed distribution reinforces the possibility that high‐status departments may gain a permanent advantage in the competition for students, faculty, journal space, and research funding. At the same time, concentrations of research‐active faculty in particular departments may generate beneficial spillover effects.
    April 29, 2015   doi: 10.1002/asi.23448   open full text
  • The boundaries between: Parental involvement in a teen's online world.
    Lee B. Erickson, Pamela Wisniewski, Heng Xu, John M. Carroll, Mary Beth Rosson, Daniel F. Perkins.
    Journal of the American Society for Information Science and Technology. April 29, 2015
    The increasing popularity of the Internet and social media is creating new and unique challenges for parents and adolescents regarding the boundaries between parental control and adolescent autonomy in virtual spaces. Drawing on developmental psychology and Communication Privacy Management (CPM) theory, we conduct a qualitative study to examine the challenge between parental concern for adolescent online safety and teens' desire to independently regulate their own online experiences. Analysis of 12 parent–teen pairs revealed five distinct challenges: (a) increased teen autonomy and decreased parental control resulting from teens' direct and unmediated access to virtual spaces, (b) the shift in power to teens who are often more knowledgeable about online spaces and technology, (c) the use of physical boundaries by parents as a means to control virtual spaces, (d) an increase in indirect boundary control strategies such as covert monitoring, and (e) the blurring of lines in virtual spaces between parents' teens and teens' friends.
    April 29, 2015   doi: 10.1002/asi.23450   open full text
  • Indexing by Latent Dirichlet Allocation and an Ensemble Model.
    Yanshan Wang, Jae‐Sung Lee, In‐Chan Choi.
    Journal of the American Society for Information Science and Technology. April 27, 2015
    The contribution of this article is twofold. First, we present Indexing by latent Dirichlet allocation (LDI), an automatic document indexing method. Many ad hoc applications, or their variants with smoothing techniques suggested in LDA‐based language modeling, can result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve document retrieval performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. EnM combines basic indexing models by assigning different weights and attempts to uncover the optimal weights to maximize the mean average precision. To solve the optimization problem, we propose an algorithm, which is derived based on the boosting method. The results of our computational experiments on benchmark data sets indicate that both the proposed approaches are viable options for document retrieval.
    April 27, 2015   doi: 10.1002/asi.23444   open full text
  • The decision to submit to a journal: Another example of a valence‐consistent shift?
    Guido Pepermans, Sandra Rousseau.
    Journal of the American Society for Information Science and Technology. April 15, 2015
    In this article we use a stated choice experiment to study researcher preferences in the information sciences and to investigate the relative importance of different journal characteristics in convincing potential authors to submit to a particular journal. The analysis distinguishes high quality from standard quality articles and focuses on the question of whether communicating acceptance rates rather than rejection rates leads to different submission decisions. Our results show that a positive framing effect might be present when authors decide on submitting a high quality article. No evidence of a framing effect is found when authors consider a standard quality article. From a journal marketing perspective, this is important information for editors. Communicating acceptance rates rather than rejection rates might help to convince researchers to submit to their journal.
    April 15, 2015   doi: 10.1002/asi.23491   open full text
  • Science communication and dissemination in different cultures: An analysis of the audience for TED videos in China and abroad.
    Xuelian Pan, Erjia Yan, Weina Hua.
    Journal of the American Society for Information Science and Technology. April 07, 2015
    Disseminated across the world in more than 100 languages and viewed over 1 billion times, TED Talks is a successful example of web‐based science communication. This study investigates the impact of TED Talks videos on YouKu, a Chinese video portal, and YouTube using 6 measures of impact: number of views; likes; dislikes; comments; bookmarks; and shares. In particular, we study the relationship between the topicality and impact of these videos. Findings demonstrate that topics vary greatly in terms of their impact: Topics on entertainment and psychology/philosophy receive more views and likes, whereas design/art and astronomy/biology/oceanography attract fewer comments and bookmarks. Moreover, we identify several topical differences between YouKu and YouTube users. Topics on global issues and technology are more popular on YouKu, whereas topics on entertainment and psychology/philosophy are more popular on YouTube. By analyzing the popularity distribution of videos and the audience characteristics of YouKu, we find that women are more interested in topics on education and psychology/philosophy, whereas men favor topics on technology and astronomy/biology/oceanography.
    April 07, 2015   doi: 10.1002/asi.23461   open full text
  • Users' music information needs and behaviors: Design implications for music information retrieval systems.
    Jin Ha Lee, Hyerim Cho, Yea‐Seul Kim.
    Journal of the American Society for Information Science and Technology. April 07, 2015
    User studies in the music information retrieval (MIR) domain tend to be exploratory and qualitative in nature, involving a small number of users, which makes it difficult to derive broader implications for system design. In order to fill this gap, we conducted a large‐scale user survey questioning various aspects of people's music information needs and behaviors. In particular, we investigated if general music users' needs and behaviors have significantly changed over time by comparing our current survey results with a similar survey conducted in 2004. In this paper, we present the key findings from the survey data and discuss 4 emergent themes—(a) the shift in access and use of personal music collections; (b) the growing need for tools to support collaborative music seeking, listening, and sharing; (c) the importance of “visual” music experiences; and (d) the need for ontologies for providing rich contextual information. We conclude by making specific recommendations for improving the design of MIR systems and services.
    April 07, 2015   doi: 10.1002/asi.23471   open full text
  • Accessibility of graphics in STEM research articles: Analysis and proposals for improvement.
    Bruno Splendiani, Mireia Ribera.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    Images convey essential information in science, technology, engineering, and mathematics communication. Current guidelines on publishing recommend making images accessible to all readers. However, academic publishers do not always follow these guidelines and therefore fail to guarantee access by all readers to the visual content of academic articles. People with severe visual impairments cannot access the visual content of images unless a text alternative describing the images is provided. This study investigates the current use of texts commonly related to images in academic articles, such as captions and mentions, in order to assess their suitability as potential text alternatives to the images for readers who are blind or have severe low vision. A sample of 30 academic articles in the fields of biomedicine, computer science, and mathematics was analyzed and quantitative and qualitative data were collected about images and their related texts. We suggest a practical and sustainable solution that can foster the adoption of good accessibility practices by authors and publishers and facilitate their inclusion in regular publishing workflows.
    April 02, 2015   doi: 10.1002/asi.23464   open full text
  • Distortive effects of initial‐based name disambiguation on measurements of large‐scale coauthorship networks.
    Jinseok Kim, Jana Diesner.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    Scholars have often relied on name initials to resolve name ambiguities in large‐scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial‐based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests that assumption by analyzing coauthorship networks from five academic fields—biology, computer science, nanoscience, neuroscience, and physics—and an interdisciplinary journal, PNAS. Name instances in data sets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use disambiguated data as a proxy of ground‐truth to test the performance of three types of initial‐based disambiguation. Our results show that initial‐based disambiguation can misrepresent statistical properties of coauthorship networks: It deflates the number of unique authors, number of components, average shortest paths, clustering coefficient, and assortativity, while it inflates average productivity, density, average coauthor number per author, and largest component size. Also, on average, more than half of top 10 productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentification by initial‐based disambiguation due to their common surname and given name initials.
    April 02, 2015   doi: 10.1002/asi.23489   open full text
  • What motivates people to review articles? The case of the human‐computer interaction community.
    Syavash Nobarany, Kellogg S. Booth, Gary Hsieh.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    Recruiting qualified reviewers, though challenging, is crucial for ensuring a fair and robust scholarly peer review process. We conducted a survey of 307 reviewers of submissions to the International Conference on Human Factors in Computing Systems (CHI 2011) to gain a better understanding of their motivations for reviewing. We found that encouraging high‐quality research, giving back to the research community, and finding out about new research were the top general motivations for reviewing. We further found that relevance of the submission to a reviewer's research and relevance to the reviewer's expertise were the strongest motivations for accepting a request to review, closely followed by a number of social factors. Gender and reviewing experience significantly affected some reviewing motivations, such as the desire for learning and preparing for higher reviewing roles. We discuss implications of our findings for the design of future peer review processes and systems to support them.
    April 02, 2015   doi: 10.1002/asi.23469   open full text
  • Assessment of learning to rank methods for query expansion.
    Bo Xu, Hongfei Lin, Yuan Lin.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state‐of‐the‐art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
    April 02, 2015   doi: 10.1002/asi.23476   open full text
  • Understanding collaborative search for places of interest.
    Misfer Aldosari, Mark Sanderson, Audrey Tam, Alexandra L. Uitdenbogerd.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    Finding a place of interest (e.g., a restaurant, hotel, or attraction) is often related to a group information need; however, the actual multiparty collaboration in such searches has not been explored, and little is known about its significance and related practices. We surveyed 100 computer science students and found that 94% (of respondents) searched for places online; 87% had done so as part of a group. Search for place by multiple active participants was experienced by 78%, with group sizes typically being 2 or 3. Search occurred in a range of settings with both desktop PCs and mobile devices. Difficulties were reported with coordinating tasks, sharing results, and making decisions. The results show that finding a place of interest is a group‐based search quite different from other multiparty information‐seeking activities. The results suggest that local search systems, their interfaces, and the devices that access them can be made more usable for collaborative search if they include support for coordination, sharing of results, and decision making.
    April 02, 2015   doi: 10.1002/asi.23466   open full text
  • Tweet‐biased summarization.
    Evi Yulianti, Sharin Huspi, Mark Sanderson.
    Journal of the American Society for Information Science and Technology. April 02, 2015
    We examined whether the microblog comments given by people after reading a web document could be exploited to improve the accuracy of a web document summarization system. We examined the effect of social information (i.e., tweets) on the accuracy of the generated summaries by comparing the user preference for TBS (tweet‐biased summary) with that for GS (generic summary). The result of a crowdsourcing‐based evaluation shows that the user preference for TBS was significantly higher than for GS. We also took random samples of the documents to see the performance of the summaries in a traditional evaluation using ROUGE, in which TBS was also shown, in general, to be better than GS. We further analyzed the influence of the number of tweets pointing to a web document on summarization accuracy, finding a moderate positive correlation between the number of tweets pointing to a web document and the performance of the generated TBS as measured by user preference. The results show that incorporating social information into the summary generation process can improve the accuracy of the summary. The reasons people gave for choosing one summary over another in the crowdsourcing‐based evaluation are also presented in this article.
    April 02, 2015   doi: 10.1002/asi.23496   open full text
  • The conditions of peak empiricism in big data and interaction design.
    Michael Marcinkowski, Fred Fonseca.
    Journal of the American Society for Information Science and Technology. March 27, 2015
    An influx of mechanisms for the collection of large sets of data has prompted widespread consideration of the impact that data analytic methods can have on a number of disciplines. Having an established record of the use of a unique mixture of empirical methods, the work of understanding and designing for user behavior is well situated to take advantage of the advances claimed by “big data” methods. Beyond any straightforward benefit of the use of large sets of data, such an increase in the scale of empirical evidence has far‐reaching implications for the work of empirically guided design. We develop the concept of “peak empiricism” to explain the new role that large‐scale data comes to play in design, one in which data become more than a simple empirical tool. In providing such an expansive empirical setting for design, big data weakens the subjective conditions necessary for empirical insight, pointing to a more performative approach to the relationship between a designer and his or her work. In this, the work of design is characterized as “thinking with” the data in a partnership that weakens not only any sense of empiricism but also the agentive foundations of a classical view of design work.
    March 27, 2015   doi: 10.1002/asi.23497   open full text
  • The “total cost of publication” in a hybrid open‐access environment: Institutional approaches to funding journal article‐processing charges in combination with subscriptions.
    Stephen Pinfield, Jennifer Salter, Peter A. Bath.
    Journal of the American Society for Information Science and Technology. February 13, 2015
    As open‐access (OA) publishing funded by article‐processing charges (APCs) becomes more widely accepted, academic institutions need to be aware of the “total cost of publication” (TCP), comprising subscription costs plus APCs and additional administration costs. This study analyzes data from 23 UK institutions covering the period 2007–2014 modeling the TCP. It shows a clear rise in centrally managed APC payments from 2012 onward, with payments projected to increase further. As well as evidencing the growing availability and acceptance of OA publishing, these trends reflect particular UK policy developments and funding arrangements intended to accelerate the move toward OA publishing (“Gold” OA). Although the mean value of APCs has been relatively stable, there was considerable variation in APC prices paid by institutions since 2007. In particular, “hybrid” subscription/OA journals were consistently more expensive than fully OA journals. Most APCs were paid to large “traditional” commercial publishers who also received considerable subscription income. New administrative costs reported by institutions varied considerably. The total cost of publication modeling shows that APCs are now a significant part of the TCP for academic institutions, in 2013 already constituting an average of 10% of the TCP (excluding administrative costs).
    February 13, 2015   doi: 10.1002/asi.23446   open full text
  • Aggregated journal–journal citation relations in Scopus and Web of Science matched and compared in terms of networks, maps, and interactive overlays.
    Loet Leydesdorff, Félix Moya‐Anegón, Wouter Nooy.
    Journal of the American Society for Information Science and Technology. January 08, 2015
    We compare the network of aggregated journal–journal citation relations provided by the Journal Citation Reports (JCR) 2012 of the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) with similar data based on Scopus 2012. First, global and overlay maps were developed for the 2 sets separately. Using fuzzy‐string matching and ISSN numbers, we were able to match 10,524 journal names between the 2 sets: 96.4% of the 10,936 journals contained in JCR, or 51.2% of the 20,554 journals covered by Scopus. Network analysis was pursued on the set of journals shared between the 2 databases and the 2 sets of unique journals. Citations among the shared journals are more comprehensively covered in JCR than in Scopus, so the network in JCR is denser and more connected than in Scopus. The ranking of shared journals in terms of indegree (i.e., numbers of citing journals) or total citations is similar in both databases overall (Spearman rank correlation ρ > 0.97), but some individual journals rank very differently. Journals that are unique to Scopus seem to be less important—they are citing shared journals rather than being cited by them—but the humanities are covered better in Scopus than in JCR.
    January 08, 2015   doi: 10.1002/asi.23372   open full text
  • Seven dimensions of contemporary participation disentangled.
    Christopher Kelty, Aaron Panofsky, Morgan Currie, Roderic Crooks, Seth Erickson, Patricia Garcia, Michael Wartenbe, Stacy Wood.
    Journal of the American Society for Information Science and Technology. May 28, 2014
    Participation is today central to many kinds of research and design practice in information studies and beyond. From user‐generated content to crowdsourcing to peer production to fan fiction to citizen science, the concept remains both unexamined and heterogeneous in its definition. Intuitions about participation are confirmed by some examples, but scandalized by others, and it is difficult to pinpoint why participation seems to be robust in some cases and partial in others. In this paper we offer an empirically based, comparative analysis of participation that demonstrates its multidimensionality and provides a framework that allows clear distinctions and better analyses of the role of participation. We derive 7 dimensions of participation from the literature on participation and exemplify those dimensions using a set of 102 cases of contemporary participation that include uses of the Internet and new media.
    May 28, 2014   doi: 10.1002/asi.23202   open full text
  • ResearchGate: Disseminating, communicating, and measuring scholarship?
    Mike Thelwall, Kayvan Kousha.
    Journal of the American Society for Information Science and Technology. May 23, 2014
    ResearchGate is a social network site for academics to create their own profiles, list their publications, and interact with each other. Like Academia.edu, it provides a new way for scholars to disseminate their work and hence potentially changes the dynamics of informal scholarly communication. This article assesses whether ResearchGate usage and publication data broadly reflect existing academic hierarchies and whether individual countries are set to benefit or lose out from the site. The results show that rankings based on ResearchGate statistics correlate moderately well with other rankings of academic institutions, suggesting that ResearchGate use broadly reflects the traditional distribution of academic capital. Moreover, while Brazil, India, and some other countries seem to be disproportionately taking advantage of ResearchGate, academics in China, South Korea, and Russia may be missing opportunities to use ResearchGate to maximize the academic impact of their publications.
    May 23, 2014   doi: 10.1002/asi.23236   open full text
  • Constructing an inter‐post similarity measure to differentiate the psychological stages in offensive chats.
    Md. Waliur Rahman Miah, John Yearwood, Siddhivinayak Kulkarni.
    Journal of the American Society for Information Science and Technology. May 23, 2014
    Offensive Internet chats, particularly the child‐exploiting type, tend to follow a documented psychological behavioral pattern. Researchers have identified some important stages in this pattern. The psychological stages broadly include befriending, information exchange, grooming, and approach. Similarities among the posts of a chat play an important role in differentiating as well as in identifying these stages. In this article a novel similarity measure is constructed that gives high inter‐post similarity among the chat posts within a particular behavioral stage and low inter‐post similarity across different behavioral stages. A psychological stage corpus‐based dictionary is constructed by mining the terms associated with each stage. The dictionary works as a background knowledge base to support the similarity measure. To find the inter‐post similarity, a modified sentence similarity measure is used. The proposed measure gives improved recognition of inter‐stage and intra‐stage similarity among the chat posts compared with other types of similarity measures. The pairwise inter‐post similarity is used for clustering chat posts into the psychological stages. Results of experiments demonstrate that the new clustering method gives better results than some current clustering methods.
    May 23, 2014   doi: 10.1002/asi.23247   open full text
  • Exploring the knowledge development process of English language learners at a high school: How do English language proficiency and the nature of research task influence student learning?
    Sung Un Kim.
    Journal of the American Society for Information Science and Technology. May 22, 2014
    This study aims to understand the learning experience of English language learners (ELLs) within the framework of Kuhlthau's Information Search Process (ISP). Forty‐eight ELL students from three classes at a high school participated in the study while they conducted a research project in English. Data were collected through a demographic questionnaire and process surveys. Students' demographic information, knowledge about their research topic, labeling of knowledge, estimate of interest and knowledge, and learning outcomes were collected and analyzed with content analysis and statistical techniques. The findings indicate that ELL students, as a whole group, showed significant increases in their topical knowledge and estimate of interest and knowledge as they progressed in the research project, which is consistent with what other ISP‐based studies found. When three different English proficiency‐level groups were compared, only the intermediate group showed significant increases in topical knowledge and estimate of knowledge throughout the process. Also, different research tasks impacted the amount and substance of knowledge students built and their estimated knowledge during the research project. The findings led to suggestions for instructional strategies such as learning goals reflecting various kinds of learning, differentiated instructions in mixed‐ability classrooms, structured interventions, personalized research topics, and teacher–school librarian collaborations.
    May 22, 2014   doi: 10.1002/asi.23164   open full text
  • Measuring academic influence: Not all citations are equal.
    Xiaodan Zhu, Peter Turney, Daniel Lemire, André Vellino.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence‐primed h‐index (the hip‐index). Unlike the conventional h‐index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip‐index is a better indicator of researcher performance than the conventional h‐index.
    May 21, 2014   doi: 10.1002/asi.23179   open full text
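    The contrast between the conventional h-index and a mention-weighted variant can be illustrated with a minimal sketch. The abstract says only that the hip-index weights citations by how many times a reference is mentioned; the specific rule used here (each citation contributes its in-text mention count to the cited paper's score) is an assumption, and the function names are hypothetical rather than taken from the article.

```python
from typing import Sequence


def h_index(scores: Sequence[float]) -> int:
    """Largest h such that at least h papers each have a score >= h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def mention_weighted_h_index(mentions_per_paper: Sequence[Sequence[int]]) -> int:
    """h-index over mention-weighted scores.

    mentions_per_paper[i] lists, for each paper citing paper i, how many
    times that citing paper mentions paper i in its body.
    ASSUMPTION: a citation's weight is simply its mention count; the
    article may define the weighting differently.
    """
    scores = [sum(mentions) for mentions in mentions_per_paper]
    return h_index(scores)


def plain_h_index(mentions_per_paper: Sequence[Sequence[int]]) -> int:
    """Conventional h-index: every citation counts once, whatever its mentions."""
    return h_index([len(mentions) for mentions in mentions_per_paper])
```

    For a hypothetical researcher whose four papers have citing-paper mention counts `[[1, 1, 1], [3, 1], [1], [2, 2, 2]]`, the conventional count treats the papers as having 3, 2, 1, and 3 citations, while the weighted scores are 3, 4, 1, and 6, so repeated mentions can raise the index.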
  • Information seeking, use, and decision making.
    Jyoti Mishra, David Allen, Alan Pearman.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    In this paper we explored three areas: decision making and information seeking, the relationship between information seeking and uncertainty, and the role of expertise in influencing information use. This was undertaken in the context of a qualitative study into decision making in the initial stages of emergency response to major incidents. The research took an interpretive approach in which activity theory is used as an analytical framework. The research provides further evidence that the context of the activity and individual differences influence the choice of decision mode and associated information behavior. We also established that information is often not used to resolve uncertainty in decision making and indeed information is often sought and used after the decision is made to justify the decision. Finally, we point to the significance of both expertise and confidence in understanding information behavior. The contribution of the research to existing theoretical frameworks is discussed and a modified version of Wilson's problem‐solving model is proposed.
    May 21, 2014   doi: 10.1002/asi.23204   open full text
  • Effect of web page menu orientation on retrieving information by people with learning disabilities.
    Peter Williams, Christian Hennig.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    The Internet facilitates the provision of accessible information to people with learning disabilities. However, problems with navigation and retrieval represent a barrier for this cohort. This article addresses one aspect of page design, testing whether a horizontal or vertical contents arrangement facilitates faster access to content for people with learning disabilities. Participants were timed as they looked for one‐word “dummy” menu entries appearing in various locations along a horizontal or vertical grid. The words corresponded to images shown at random in a word‐search type activity. Results were analyzed using mixed effects models. Results showed that mean search times increased as the position shifted from left to right and from top to bottom. Thus, participants undertook the test as if it were a reading exercise, despite the images appearing in the center of the page and the words appearing at random positions. The research also suggests that a horizontal menu may be more effective than a vertical one, with the most important links placed on the left. The propensity to imbibe information “serially” (word‐for‐word) rather than to skim or look “globally” has important website design implications.
    May 21, 2014   doi: 10.1002/asi.23214   open full text
  • Correlations between user voting data, budget, and box office for films in the Internet Movie Database.
    Max Wasserman, Satyam Mukherjee, Konner Scott, Xiao Han T. Zeng, Filippo Radicchi, Luís A. N. Amaral.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    The Internet Movie Database (IMDb) is one of the most‐visited websites in the world and the premier source for information on films. Similar to Wikipedia, much of IMDb's information is user contributed. IMDb also allows users to voice their opinion on the quality of films through voting. We investigate whether there is a connection between user voting data and economic film characteristics. We perform distribution and correlation analysis on a set of films chosen to mitigate effects of bias due to the language and country of origin of films. Production budget, box office gross, and total number of user votes for films are consistent with double‐log normal distributions for certain time periods. Both total gross and user votes are consistent with a double‐log normal distribution from the late 1980s onward while for budget it extends from 1935 to 1979. In addition, we find a strong correlation between number of user votes and the economic statistics, particularly budget. Remarkably, we find no evidence for a correlation between number of votes and average user rating. Our results suggest that total user votes is an indicator of a film's prominence or notability, which can be quantified by its promotional costs.
    May 21, 2014   doi: 10.1002/asi.23213   open full text
  • Factors affecting citation rates of research articles.
    Natsuo Onodera, Fuyuki Yoshikane.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    This study examines whether there are some general trends across subject fields regarding the factors affecting the number of citations of articles, focusing especially on those factors that are not directly related to the quality or content of articles (extrinsic factors). For this purpose, from 6 selected subject fields (condensed matter physics, inorganic and nuclear chemistry, electric and electronic engineering, biochemistry and molecular biology, physiology, and gastroenterology), original articles published in the same year were sampled (n = 230–240 for each field). Then, the citation counts received by the articles in relatively long citation windows (6 and 11 years after publication) were predicted by negative binomial multiple regression (NBMR) analysis for each field. Various article features about author collaboration, cited references, visibility, authors' achievements (measured by past publications and citedness), and publishing journals were considered as the explanatory variables of NBMR. Some generality across the fields was found with regard to the selected predicting factors and the degree of significance of these predictors. The Price index was the strongest predictor of citations, and number of references was the next. The effects of number of authors and authors' achievement measures were rather weak.
    May 21, 2014   doi: 10.1002/asi.23209   open full text
  • How are people enticed to disclose personal information despite privacy concerns in social network sites? The calculus between benefit and cost.
    Jinyoung Min, Byoungsoo Kim.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    Although social network sites (SNS) users' privacy concerns cannot be completely removed by privacy policies and security safeguards, the user base of SNS is constantly expanding. To explain this phenomenon, we use the lens of the calculus of behavior within a cost–benefit framework suggesting privacy concerns as cost factors and behavior enticements as benefit factors and examine how the enticements operate against privacy concerns in users' cost–benefit calculus regarding disclosing personal information and using SNS continuously. Adopting social influence process theory, we examine three enticements—the motivation of relationship management through SNS, the perceived usefulness of SNS for self‐presentation, and the subjective social norms of using SNS. From a survey of 362 Facebook users who have disclosed personal information on Facebook, we find that the motivation of relationship management through SNS and the perceived usefulness of SNS for self‐presentation lead users to disclose information but that subjective social norms do not, suggesting that the perceived benefit of behavior enticements should be assimilated into users' own value systems to truly operate as benefit factors. The results regarding the positive and negative effects of suggested benefit and cost factors on information disclosure show that only the combined positive effects of all three behavior enticements exceed the negative effect of privacy concerns, suggesting that privacy concerns can be offset only by multiple benefit factors.
    May 21, 2014   doi: 10.1002/asi.23206   open full text
  • A patento‐scientometric approach to venture capital investment prioritization.
    Gustavo Silva Motta, Pauli Adriano de Almada Garcia, Rogério Hermida Quintella.
    Journal of the American Society for Information Science and Technology. May 21, 2014
    This paper proposes an approach to analyzing and prioritizing venture capital investments with the use of scientometric and patentometric indicators. The article highlights the importance of such investments in the development of technology‐based companies and their positive impacts on the economic development of regions and countries. It also notes that the managers of venture capital funds struggle to objectify the evaluation of investment proposals. This paper analyzes the selection process of 10 companies, five of which received investments by the largest venture capital fund in Brazil and the other five of which were rejected by this same fund. We formulated scientometric and patentometric indicators related to each company and conducted a comparative analysis of each by considering the indicators grouped by the nonfinancial criteria (technology, market, and divestiture team) from analysis of the investment proposals. The proposed approach clarifies aspects of the criteria evaluated and contributes to the construction of a method for prioritizing venture capital investments.
    May 21, 2014   doi: 10.1002/asi.23205   open full text
  • Providing informational support in an online discussion group and a Q&A site: The case of travel planning.
    Reijo Savolainen.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    This study examines the ways in which informational support based on user‐generated content is provided for the needs of leisure‐related travel planning in an online discussion group and a Q&A site. Attention is paid to the grounds by which the participants bolster the informational support. The findings draw on the analysis of 200 threads of a Finnish online discussion group and a Yahoo! Answers Q&A (question and answer) forum. Three main types of informational support were identified: providing factual information, providing advice, and providing personal opinion. The grounds used in the answers varied across the types of informational support. In providing factual information, the most popular ground was a description of the attributes of an entity. In the context of providing advice, reference to external sources of information was employed most frequently. Finally, in providing personal opinions, the participants most often bolstered their views by articulating positive or negative evaluations of an entity. Overall, regarding the grounds, there were more similarities than differences between the discussion group and the Q&A site.
    May 19, 2014   doi: 10.1002/asi.23191   open full text
  • Domain‐independent search expertise: A description of procedural knowledge gained during guided instruction.
    Catherine L. Smith.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    This longitudinal study examined the search behavior of 10 students as they completed assigned exercises for an online professional course in expert searching. The research objective was to identify, describe, and hypothesize about features of the behavior that are indicative of procedural knowledge gained during guided instruction. Log‐data of search interaction were coded using a conceptual framework focused on components of search practice hypothesized to organize an expert searcher's attention during search. The coded data were analyzed using a measure of pointwise mutual information and state‐transition analysis. Results of the study provide important insight for future investigation of domain‐independent search expertise and for the design of systems that assist searchers in gaining expertise.
    May 19, 2014   doi: 10.1002/asi.23272   open full text
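    The abstract above mentions pointwise mutual information (PMI) computed over coded search-log data. As a rough illustration of the measure itself, the sketch below computes PMI for adjacent pairs in a sequence of codes. The code labels and the choice of adjacent pairs as the co-occurrence unit are assumptions; the study's actual coding scheme is not given here.

```python
import math
from collections import Counter


def adjacent_pmi(codes):
    """PMI (base 2) for each ordered pair of adjacent codes in a sequence.

    pmi(a, b) = log2( p(a, b) / (p(a) * p(b)) ), where p(a) is the share of
    pairs that start with a and p(b) the share of pairs that end with b.
    """
    if len(codes) < 2:
        raise ValueError("need at least two codes")
    pairs = list(zip(codes, codes[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    return {
        (a, b): math.log2(
            (count / n) / ((first_counts[a] / n) * (second_counts[b] / n))
        )
        for (a, b), count in pair_counts.items()
    }


# Hypothetical coded actions from one search session.
session = ["query", "scan", "query", "scan"]
for pair, score in adjacent_pmi(session).items():
    print(pair, round(score, 3))
```

    Positive scores mark transitions that occur more often than the individual code frequencies would predict.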
  • Identifying ISI‐indexed articles by their lexical usage: A text analysis approach.
    Mohammadreza Moohebat, Ram Gopal Raj, Sameem Binti Abdul Kareem, Dirk Thorleuchter.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    This research creates an architecture for investigating the existence of probable lexical divergences between articles, categorized as Institute for Scientific Information (ISI) and non‐ISI, and consequently, if such a difference is discovered, to propose the best available classification method. Based on a collection of ISI‐ and non‐ISI‐indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non‐ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI‐indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K‐Nearest Neighbors techniques.
    May 19, 2014   doi: 10.1002/asi.23194   open full text
  • Using content and network analysis to understand the social support exchange patterns and user behaviors of an online smoking cessation intervention program.
    Mi Zhang, Christopher C. Yang.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    Informational support and nurturant support are two basic types of social support offered in online health communities. This study identifies types of social support in the QuitStop forum and brings insights to exchange patterns of social support and user behaviors with content analysis and social network analysis. Motivated by user information behavior, this study defines two patterns to describe social support exchange: initiated support exchange and invited support exchange. It is found that users with a longer quitting time tend to actively give initiated support, and recent quitters with a shorter abstinent time are likely to seek and receive invited support. This study also finds that support givers of informational support quit longer ago than support givers of nurturant support, and support receivers of informational support quit more recently than support receivers of nurturant support. Usually, informational support is offered by users at late quit stages to users at early quit stages. Nurturant support is also exchanged among users within the same quit stage. These findings help us understand how health consumers are supporting each other and reveal new capabilities of online intervention programs that can be designed to offer social support in a timely and effective manner.
    May 19, 2014   doi: 10.1002/asi.23189   open full text
  • Classifying scientific disciplines in Slovenia: A study of the evolution of collaboration structures.
    Luka Kronegger, Franc Mali, Anuška Ferligoj, Patrick Doreian.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    We explore classifying scientific disciplines including their temporal features by focusing on their collaboration structures over time. Bibliometric data for Slovenian researchers registered at the Slovenian Research Agency were used. These data were obtained from the Slovenian National Current Research Information System. We applied a recently developed hierarchical clustering procedure for symbolic data to the coauthorship structure of scientific disciplines. To track temporal changes, we divided data for the period 1986–2010 into five 5‐year time periods. The clusters of disciplines for the Slovene science system revealed 5 clusters of scientific disciplines that, in large measure, correspond with the official national classification of sciences. However, there were also some significant differences pointing to the need for a dynamic classification system of sciences to better characterize them. Implications stemming from these results, especially with regard to classifying scientific disciplines, understanding the collaborative structure of science, and research and development policies, are discussed.
    May 19, 2014   doi: 10.1002/asi.23171   open full text
  • Data mining from web search queries: A comparison of Google Trends and Baidu Index.
    Liwen Vaughan, Yue Chen.
    Journal of the American Society for Information Science and Technology. May 19, 2014
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    May 19, 2014   doi: 10.1002/asi.23201   open full text
  • Metadata quality in digital repositories: Empirical results from the cross‐domain transfer of a quality assurance process.
    Nikos Palavitsinis, Nikos Manouselis, Salvador Sanchez‐Alonso.
    Journal of the American Society for Information Science and Technology. March 12, 2014
    Metadata quality presents a challenge faced by many digital repositories. There is a variety of proposed quality assurance frameworks applied in repositories that are deployed in various contexts. Although studies report that there is an improvement of the quality of the metadata in many of the applications, the transfer of a successful approach from one application context to another has not been studied to a satisfactory extent. This article presents the empirical results of the application of a metadata quality assurance process that has been developed and successfully applied in an educational context (learning repositories) to 2 different application contexts to compare results with the previous application and assess its generalizability. More specifically, it reports results from the adaptation and application of this process in a library context (institutional repositories) and in a cultural context (digital cultural repositories). Initial empirical findings indicate that content providers seem to be gaining a better understanding of metadata when the proposed process is put in place and that the quality of the produced metadata records increases.
    March 12, 2014   doi: 10.1002/asi.23045   open full text
  • Information and ontologies: Challenges in scaling knowledge for development.
    Jessica Seddon, Ramesh Srinivasan.
    Journal of the American Society for Information Science and Technology. March 12, 2014
    This article calls for a conceptual and empirical research agenda on ways in which policymakers and researchers can aggregate socioeconomic information shared by diverse communities without losing contextual information that is important for extracting meaning from the data. We describe the knowledge loss that occurs when information is aggregated across diverse ontologies into databases or archives relying on a single schema and use a series of illustrative examples to demonstrate the significance of this information loss for policy design and implementation. While there are important gains from information aggregation across ontologies, the potential trade‐offs involved in creating large‐scale databases are significant. The differences between locally constituted ways of knowing and the organizing ontology used for larger scale databases affect the extent to which these collections, or “knowledge banks,” provide accurate guidance for policy and action. The article draws on insights from information science and social science to discuss two classes of socio‐technical approaches for overcoming information loss at the interface between ontologies: first, technology‐enabled efforts to soften ontological interfaces by making data open, unconstructed, and available and/or creating ontologies collaboratively and, second, organizational changes that reduce the need for information to cross interfaces, such as reconstructing knowledge platforms to be more interactive, thereby decentralizing decision‐making. The framing of the challenges involved in building large‐scale knowledge banks as a matter of ontology mismatch creates an opportunity for an interdisciplinary and analytically integrated research agenda to implement and test these potential approaches.
    March 12, 2014   doi: 10.1002/asi.23000   open full text
  • Microsoft Academic Search and Google Scholar Citations: Comparative analysis of author profiles.
    José Luis Ortega, Isidro F. Aguillo.
    Journal of the American Society for Information Science and Technology. February 26, 2014
    This article offers a comparative analysis of the personal profiling capabilities of the two most important free citation‐based academic search engines, namely, Microsoft Academic Search (MAS) and Google Scholar Citations (GSC). Author profiles can be useful for evaluation purposes once the advantages and the shortcomings of these services are described and taken into consideration. In total, 771 personal profiles appearing in both the MAS and the GSC databases were analyzed. Results show that the GSC profiles include more documents and citations than those in MAS but with a strong bias toward the information and computing sciences, whereas the MAS profiles are disciplinarily better balanced. MAS shows technical problems such as a higher number of duplicated profiles and a lower updating rate than GSC. It is concluded that both services could be used for evaluation purposes only if they are applied along with other citation indices as a way to supplement that information.
    February 26, 2014   doi: 10.1002/asi.23036   open full text
  • Data, information, knowledge: An information science analysis.
    Antonio Badia.
    Journal of the American Society for Information Science and Technology. February 25, 2014
    I analyze the text of an article that appeared in this journal in 2007 that published the results of a questionnaire in which a number of experts were asked to define the concepts of data, information, and knowledge. I apply standard information retrieval techniques to build a list of the most frequent terms in each set of definitions. I then apply information extraction techniques to analyze how the top terms are used in the definitions. As a result, I draw data‐driven conclusions about the aggregate opinion of the experts. I contrast this with the original analysis of the data to provide readers with an alternative viewpoint on what the data tell us.
    February 25, 2014   doi: 10.1002/asi.23043   open full text
  • Where your photo is taken: Geolocation prediction for social images.
    Bo Liu, Quan Yuan, Gao Cong, Dong Xu.
    Journal of the American Society for Information Science and Technology. February 25, 2014
    Social image‐sharing websites have attracted a large number of users. These systems allow users to associate geolocation information with their images, which is essential for many interesting applications. However, only a small fraction of social images have geolocation information. Thus, an automated tool for suggesting geolocation is essential to help users geotag their images. In this article, we use a large data set consisting of 221 million Flickr images uploaded by 2.2 million users. For the first time, we analyze user uploading patterns, user geotagging behaviors, and the relationship between the taken‐time gap and the geographical distance between two images from the same user. Based on the findings, we represent a user profile by historical tags for the user and build a multinomial model on the user profile for geotagging. We further propose a unified framework to suggest geolocations for images, which combines the information from both image tags and the user profile. Experimental results show that for images uploaded by users who have never done geotagging, our method outperforms the state‐of‐the‐art method by 10.6 to 34.2%, depending on the granularity of the prediction. For images from users who have done geotagging, a simple method is able to achieve very high accuracy.
    February 25, 2014   doi: 10.1002/asi.23050   open full text
  • Students' group work strategies in source‐based writing assignments.
    Eero Sormunen, Mikko Tanni, Tuulikki Alamettälä, Jannica Heinström.
    Journal of the American Society for Information Science and Technology. February 25, 2014
    Source‐based writing assignments conducted by groups of students are a common learning task used in information literacy instruction. The fundamental assumption in group assignments is that students' collaboration substantially enhances their learning. The present study focused on the group work strategies adopted by upper secondary school students in source‐based writing assignments. Seventeen groups authored Wikipedia or Wikipedia‐style articles and were interviewed during and after the assignment. Group work strategies were analyzed in 6 activities: planning, searching, assessing sources, reading, writing, and editing. The students used 2 cooperative strategies (delegation and division of work) and 2 collaborative strategies (pair and group collaboration). Division of work into independently conducted parts was the most popular group work strategy. Group collaboration, where students worked together to complete an activity, was also commonly applied. Division of work was justified by efficiency in completing the project and by ease of control in the fair division of contributions. The motivation behind collaboration was related to quality issues and shared responsibility. We suggest that the present designs of learning tasks lead students to avoid collaboration, increasing the risk of low learning outcomes in information literacy instruction.
    February 25, 2014   doi: 10.1002/asi.23032   open full text
  • Self‐training author name disambiguation for information scarce scenarios.
    Anderson A. Ferreira, Adriano Veloso, Marcos André Gonçalves, Alberto H. F. Laender.
    Journal of the American Society for Information Science and Technology. February 22, 2014
    We present a novel 3‐step self‐training method for author name disambiguation—SAND (self‐training associative name disambiguator)—which requires no manual labeling, no parameterization (in real‐world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real‐world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. In the second step, the most representative of these clusters are selected to serve as training data for the third, supervised author assignment step. The third step exploits a state‐of‐the‐art transductive disambiguation method capable of detecting unseen authors not included in any training example and incorporating reliable predictions into the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our proposed method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods. Thus, different from other bootstrapping methods that explore privileged, hard to obtain information such as self‐citations and personal information, our proposed method produces topnotch performance with no (manual) training data or parameterization and in the presence of scarce information.
    February 22, 2014   doi: 10.1002/asi.22992   open full text
  • The effect on citation inequality of differences in citation practices at the Web of Science subject category level.
    Juan A. Crespo, Neus Herranz, Yunrong Li, Javier Ruiz‐Castillo.
    Journal of the American Society for Information Science and Technology. February 22, 2014
    This article studies the impact of differences in citation practices at the subfield, or Web of Science subject category level, using the model introduced in Crespo, Li, and Ruiz‐Castillo (2013a), according to which the number of citations received by an article depends on its underlying scientific influence and the field to which it belongs. We use the same Thomson Reuters data set of about 4.4 million articles used in Crespo et al. (2013a) to analyze 22 broad fields. The main results are the following: First, when the classification system goes from 22 fields to 219 subfields the effect on citation inequality of differences in citation practices increases from ∼14% at the field level to 18% at the subfield level. Second, we estimate a set of exchange rates (ERs) over a wide [660, 978] citation quantile interval to express the citation counts of articles into the equivalent counts in the all‐sciences case. In the fractional case, for example, we find that in 187 of 219 subfields the ERs are reliable in the sense that the coefficient of variation is smaller than or equal to 0.10. Third, in the fractional case the normalization of the raw data using the ERs (or subfield mean citations) as normalization factors reduces the importance of the differences in citation practices from 18% to 3.8% (3.4%) of overall citation inequality. Fourth, the results in the fractional case are essentially replicated when we adopt a multiplicative approach.
    February 22, 2014   doi: 10.1002/asi.23006   open full text
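    The abstract above calls a subfield's exchange rates reliable when their coefficient of variation is at most 0.10. The sketch below illustrates that check only. The 0.10 threshold comes from the abstract; the use of the population standard deviation, the function names, and the sample values are assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean.

    Assumption: the population standard deviation is used; the paper may
    use the sample version instead.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def is_reliable(exchange_rates, threshold=0.10):
    """True if the exchange rates vary little relative to their mean."""
    return coefficient_of_variation(exchange_rates) <= threshold


# Hypothetical exchange rates of one subfield at several citation quantiles.
stable = [0.98, 1.00, 1.02, 1.01, 0.99]
unstable = [0.50, 1.00, 1.50]
print(is_reliable(stable), is_reliable(unstable))
```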
  • Scholar metadata and knowledge generation with human and artificial intelligence.
    Xiaozhong Liu, Chun Guo, Lin Zhang.
    Journal of the American Society for Information Science and Technology. February 22, 2014
    Scholar metadata have traditionally centered on descriptive representations, which have been used as a foundation for scholarly publication repositories and academic information retrieval systems. In this article, we propose innovative and economic methods of generating knowledge‐based structural metadata (structural keywords) using a combination of natural language processing‐based machine‐learning techniques and human intelligence. By allowing low‐barrier participation through a social media system, scholars (both as authors and users) can participate in the metadata editing and enhancing process and benefit from more accurate and effective information retrieval. Our experimental web system ScholarWiki uses machine learning techniques, which automatically produce increasingly refined metadata by learning from the structural metadata contributed by scholars. The cumulated structural metadata add intelligence and automatically enhance and update recursively the quality of metadata, wiki pages, and the machine‐learning model.
    February 22, 2014   doi: 10.1002/asi.23013   open full text
  • Beyond bag‐of‐words: Bigram‐enhanced context‐dependent term weights.
    Edward K. F. Dang, Robert W. P. Luk, James Allan.
    Journal of the American Society for Information Science and Technology. February 22, 2014
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n‐grams in document representation instead of unigrams. However, the majority of early works on n‐grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or “contexts” of queries has been found to be promising. In particular, recent studies showed that using new context‐dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag‐of‐words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n‐gram and context approaches by computing context‐dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)‐6, TREC‐7, TREC‐8, and TREC‐2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context‐dependent term weights with bigrams is effective in further improving retrieval performance.
    February 22, 2014   doi: 10.1002/asi.23024   open full text
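    For orientation, the bag‐of‐words BM25 term weight that the abstract above uses as its baseline can be sketched as follows. This is the textbook Okapi BM25 form, not the authors' context‐dependent weighting; the defaults `k1=1.2` and `b=0.75` and the sample numbers are assumptions.

```python
import math


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 weight of one term in one document.

    tf: term frequency in the document
    df: number of documents containing the term
    n_docs: number of documents in the collection
    """
    # Robertson/Sparck Jones IDF; it turns negative for terms that occur
    # in more than half of the documents.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Length-normalised term-frequency saturation.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


# Hypothetical collection statistics.
print(bm25_weight(tf=3, df=10, n_docs=1000, doc_len=120, avg_doc_len=100))
```

    The same function could score a bigram by treating it as a single term with its own document frequency, which is one reason IDF weighting and pruning of high‐frequency bigrams matter.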
  • Extracting evolutionary communities in community question answering.
    Zhongfeng Zhang, Qiudan Li, Daniel Zeng, Heng Gao.
    Journal of the American Society for Information Science and Technology. January 29, 2014
    With the rapid growth of Web 2.0, community question answering (CQA) has become a prevalent information seeking channel, in which users form interactive communities by posting questions and providing answers. Communities may evolve over time, because of changes in users' interests, activities, and new users joining the network. To better understand user interactions in CQA communities, it is necessary to analyze the community structures and track community evolution over time. Existing work in CQA focuses on question searching or content quality detection, and the important problems of community extraction and evolutionary pattern detection have not been studied. In this article, we propose a probabilistic community model (PCM) to extract overlapping community structures and capture their evolution patterns in CQA. The empirical results show that our algorithm appears to improve the community extraction quality. We show empirically, using the iPhone data set, that interesting community evolution patterns can be discovered, with each evolution pattern reflecting the variation of users' interests over time. Our analysis suggests that individual users could gain comprehensive information by tracking the transition of products. We also show that the communities provide a decision‐making basis for business.
    January 29, 2014   doi: 10.1002/asi.23003   open full text
  • Modeling users' web search behavior and their cognitive styles.
    Khamsum Kinley, Dian Tjondronegoro, Helen Partridge, Sylvia Edwards.
    Journal of the American Society for Information Science and Technology. January 29, 2014
    Previous studies have shown that users' cognitive styles play an important role during web searching. However, only a limited number of studies have shown the relationship between cognitive styles and web search behavior. Most importantly, it is not clear which components of web search behavior are influenced by cognitive styles. This article examines the relationships between users' cognitive styles and their web searching and develops a model that portrays the relationship. The study uses qualitative and quantitative analyses based on data gathered from 50 participants. A questionnaire was utilized to collect participants' demographic information, and Riding's (1991) Cognitive Styles Analysis (CSA) test to assess their cognitive styles. Results show that users' cognitive styles influenced their information‐searching strategies, query reformulation behavior, web navigational styles, and information‐processing approaches. The user model developed in this study depicts the fundamental relationships between users' web search behavior and their cognitive styles. Modeling web search behavior with a greater understanding of users' cognitive styles can help information science researchers and information systems designers to bridge the semantic gap between the user and the systems. Implications of the research for theory and practice, and future work, are discussed.
    January 29, 2014   doi: 10.1002/asi.23053   open full text
  • arXiv E‐prints and the journal of record: An analysis of roles and relationships.
    Vincent Larivière, Cassidy R. Sugimoto, Benoit Macaluso, Staša Milojević, Blaise Cronin, Mike Thelwall.
    Journal of the American Society for Information Science and Technology. January 27, 2014
    Since its creation in 1991, arXiv has become central to the diffusion of research in a number of fields. Combining data from the entirety of arXiv and the Web of Science (WoS), this article investigates (a) the proportion of papers across all disciplines that are on arXiv and the proportion of arXiv papers that are in the WoS, (b) the elapsed time between arXiv submission and journal publication, and (c) the aging characteristics and scientific impact of arXiv e‐prints and their published version. It shows that the proportion of WoS papers found on arXiv varies across the specialties of physics and mathematics, and that only a few specialties make extensive use of the repository. Elapsed time between arXiv submission and journal publication has shortened but remains longer in mathematics than in physics. In physics, mathematics, as well as in astronomy and astrophysics, arXiv versions are cited more promptly and decay faster than WoS papers. The arXiv versions of papers—both published and unpublished—have lower citation rates than published papers, although there is almost no difference in the impact of the arXiv versions of published and unpublished papers.
    January 27, 2014   doi: 10.1002/asi.23044   open full text
  • Reliability and validity of query intent assessments.
    Suzan Verberne, Maarten Heijden, Max Hinne, Maya Sappelli, Saskia Koldijk, Wessel Kraaij, Eduard Hoenkamp.
    Journal of the American Society for Information Science and Technology. August 09, 2013
    In most intent recognition studies, annotations of query intent are created post hoc by external assessors who are not the searchers themselves. It is important for the field to get a better understanding of the quality of this process as an approximation for determining the searcher's actual intent. Some studies have investigated the reliability of the query intent annotation process by measuring the interassessor agreement. However, these studies did not measure the validity of the judgments, that is, to what extent the annotations match the searcher's actual intent. In this study, we asked both the searchers themselves and external assessors to classify queries using the same intent classification scheme. We show that of the seven dimensions in our intent classification scheme, four can reliably be used for query annotation. Of these four, only the annotations on the topic and spatial sensitivity dimension are valid when compared with the searcher's annotations. The difference between the interassessor agreement and the assessor‐searcher agreement was significant on all dimensions, showing that the agreement between external assessors is not a good estimator of the validity of the intent classifications. Therefore, we encourage the research community to consider using query intent classifications by the searchers themselves as test data.
    August 09, 2013   doi: 10.1002/asi.22948   open full text
  • Aggregation of the web performance of internal university units as a method of quantitative analysis of a university system: The case of Spain.
    Enrique Orduña‐Malea.
    Journal of the American Society for Information Science and Technology. August 07, 2013
    The aggregation of web performance data (page count and visibility) of internal university units could constitute a more precise indicator than the overall web performance of the universities and, therefore, be of use in the design of university web rankings. In order to test this hypothesis, a longitudinal analysis of the internal units of the Spanish university system was conducted over the course of 2010. For the 13,800 URLs identified, page count and visibility were calculated using the Yahoo! API. The internal values obtained were aggregated by university and compared with the values obtained from the analysis of the universities' general URLs. The results indicate that, although the correlations between general and internal values are high, internal performance is low in comparison to general performance, and that they give rise to different performance rankings. The conclusion is that the aggregation of unit performance is of limited use due to the low levels of internal development of the websites, and so its use is not recommended for the design of rankings. Despite this, the internal analysis enabled the detection of, among other things, a low correlation between page count and visibility due to the widespread use of subdirectories and problems accessing certain content.
    August 07, 2013   doi: 10.1002/asi.22912   open full text
  • Visualizing the history of evidence‐based medicine: A bibliometric analysis.
    Jiantong Shen, Leye Yao, Youping Li, Mike Clarke, Li Wang, Dan Li.
    Journal of the American Society for Information Science and Technology. August 06, 2013
    The aim of this paper is to visualize the history of evidence‐based medicine (EBM) and to examine the characteristics of EBM development in China and the West. We searched the Web of Science and the Chinese National Knowledge Infrastructure database for papers related to EBM. We applied information visualization techniques, citation analysis, cocitation analysis, cocitation cluster analysis, and network analysis to construct historiographies, themes networks, and chronological theme maps regarding EBM in China and the West. EBM appeared to develop in 4 stages: incubation (1972–1992 in the West vs. 1982–1999 in China), initiation (1992–1993 vs. 1999–2000), rapid development (1993–2000 vs. 2000–2004), and stable distribution (2000 onwards vs. 2004 onwards). Although there was a lag in EBM initiation in China compared with the West, the pace of development appeared similar. Our study shows that important differences exist in research themes, domain structures, and development depth, and in the speed of adoption between China and the West. In the West, efforts in EBM have shifted from education to practice, and from the quality of evidence to its translation. In China, there was a similar shift from education to practice, and from production of evidence to its translation. In addition, this concept has diffused to other healthcare areas, leading to the development of evidence‐based traditional Chinese medicine, evidence‐based nursing, and evidence‐based policy making.
    August 06, 2013   doi: 10.1002/asi.22890   open full text
  • Social tagging in the scholarly world.
    Chen Xu, Benjiang Ma, Xiaohong Chen, Feicheng Ma.
    Journal of the American Society for Information Science and Technology. August 02, 2013
    The number of research studies on social tagging has increased rapidly in the past years, but few of them highlight the characteristics and research trends in social tagging. A set of 862 academic documents relating to social tagging and published from 2005 to 2011 was thus examined using bibliometric analysis as well as the social network analysis technique. The results show that social tagging, as a research area, develops rapidly and attracts an increasing number of new entrants. There are no key authors, publication sources, or research groups that dominate the research domain of social tagging. Research on social tagging appears to focus mainly on the following three aspects: (a) components and functions of social tagging (e.g., tags, tagging objects, and tagging network), (b) taggers' behaviors and interface design, and (c) tags' organization and usage in social tagging. The trends suggest that more researchers are turning to the latter two aspects, integrated with human‐computer interfaces and information retrieval, although the first aspect is the fundamental one in social tagging. Also, more studies relating to social tagging pay attention to multimedia tagging objects and not only text tagging. Previous research on social tagging was limited to a few subject domains such as information science and computer science. As an interdisciplinary research area, social tagging is anticipated to attract more researchers from different disciplines. The growth of practical applications, especially in high‐tech companies, is an encouraging research trend in social tagging.
    August 02, 2013   doi: 10.1002/asi.22903   open full text
  • Folder versus tag preference in personal information management.
    Ofer Bergman, Noa Gradovitch, Judit Bar‐Ilan, Ruth Beyth‐Marom.
    Journal of the American Society for Information Science and Technology. August 02, 2013
    Users’ preferences for folders versus tags were studied in 2 working environments where both options were available to them. In the Gmail study, we informed 75 participants about both folder‐labeling and tag‐labeling, observed their storage behavior after 1 month, and asked them to estimate the proportions of different retrieval options in their behavior. In the Windows 7 study, we informed 23 participants about tags and asked them to tag all their files for 2 weeks, followed by a period of 5 weeks of free choice between the 2 methods. Their storage and retrieval habits were tested prior to the learning session and, after 7 weeks, using special classification recording software and a retrieval‐habits questionnaire. A controlled retrieval task and an in‐depth interview were conducted. Results of both studies show a strong preference for folders over tags for both storage and retrieval. In the minority of cases where tags were used for storage, participants typically used a single tag per information item. Moreover, when multiple classification was used for storage, it was only marginally used for retrieval. The controlled retrieval task showed lower success rates and slower retrieval speeds for tag use. Possible reasons for participants’ preferences are discussed.
    August 02, 2013   doi: 10.1002/asi.22906   open full text
  • Extending SemRep to the public health domain.
    Graciela Rosemblat, Melissa P. Resnick, Ione Auston, Dongwook Shin, Charles Sneiderman, Marcelo Fiszman, Thomas C. Rindflesch.
    Journal of the American Society for Information Science and Technology. July 30, 2013
    We describe the use of a domain‐independent method to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS®; Humphreys, Lindberg, Schoolman, & Barnett, ) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information‐savvy workforce. Natural language processing and semantic techniques hold promise to help public health professionals navigate the growing ocean of information by organizing and structuring this knowledge into a focused public health framework paired with a user‐friendly visualization application as a way to summarize results of PubMed® searches in this field of knowledge.
    July 30, 2013   doi: 10.1002/asi.22899   open full text
  • Deriving query suggestions for site search.
    Udo Kruschwitz, Deirdre Lungley, M‐Dyaa Albakour, Dawei Song.
    Journal of the American Society for Information Science and Technology. July 30, 2013
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single‐shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine‐grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files.
    July 30, 2013   doi: 10.1002/asi.22901   open full text
  • So fast so good: An analysis of answer quality and answer speed in community question‐answering sites.
    Alton Y. K. Chua, Snehasish Banerjee.
    Journal of the American Society for Information Science and Technology. July 30, 2013
    The authors investigate the interplay between answer quality and answer speed across question types in community question‐answering sites (CQAs). The research questions addressed are the following: (a) How do answer quality and answer speed vary across question types? (b) How do the relationships between answer quality and answer speed vary across question types? (c) How do the best quality answers and the fastest answers differ in terms of answer quality and answer speed across question types? (d) How do trends in answer quality vary over time across question types? From the posting of 3,000 questions in six CQAs, 5,356 answers were harvested and analyzed. There was a significant difference in answer quality and answer speed across question types, and there were generally no significant relationships between answer quality and answer speed. The best quality answers had better overall answer quality than the fastest answers but generally took longer to arrive. In addition, although the trend in answer quality had been mostly random across all question types, the quality of answers appeared to improve gradually when given time. By highlighting the subtle nuances in answer quality and answer speed across question types, this study is an attempt to explore a territory of CQA research that has hitherto been relatively uncharted.
    July 30, 2013   doi: 10.1002/asi.22902   open full text
  • Knowledge sharing and knowledge management system avoidance: The role of knowledge type and the social network in bypassing an organizational knowledge management system.
    Susan A. Brown, Alan R. Dennis, Diana Burley, Priscilla Arling.
    Journal of the American Society for Information Science and Technology. July 25, 2013
    Knowledge sharing is a difficult task for most organizations, and there are many reasons for this. In this article, we propose that the nature of the knowledge shared and an individual's social network influence employees to find more value in person‐to‐person knowledge sharing, which could lead them to bypass the codified knowledge provided by a knowledge management system (KMS). We surveyed employees of a workman's compensation board in Canada and used social network analysis and hierarchical linear modeling to analyze the data. The results show that knowledge complexity and knowledge teachability increased the likelihood of finding value in person‐to‐person knowledge transfer, but knowledge observability did not. Contrary to expectations, whether the knowledge was available in the KMS had no impact on the value of person‐to‐person knowledge transfer. In terms of the social network, individuals with larger networks tended to perceive more value in the person‐to‐person transfer of knowledge than those with smaller networks.
    July 25, 2013   doi: 10.1002/asi.22892   open full text
  • Seeking beyond with IntegraL: A user study of sense‐making enabled by anchor‐based virtual integration of library systems.
    Shuyuan Mary Ho, Michael Bieber, Min Song, Xiangmin Zhang.
    Journal of the American Society for Information Science and Technology. July 22, 2013
    This article presents a user study showing the effectiveness of a link‐based, virtual integration infrastructure that gives users access to relevant online resources, empowering them to design an information‐seeking path that is specifically relevant to their context. IntegraL provides a lightweight approach to improve and augment search functionality by dynamically generating context‐focused “anchors” for recognized elements of interest generated by library services. This article includes a description of how IntegraL's design supports users' information‐seeking behavior. A full user study with both objective and subjective measures of IntegraL and hypothesis testing regarding IntegraL's effectiveness in supporting the user's information‐seeking experience are described, along with data analysis, implications arising from this kind of virtual integration, and possible future directions.
    July 22, 2013   doi: 10.1002/asi.22904   open full text
  • An exploration of the digital library evaluation literature based on an ontological representation.
    Giannis Tsakonas, Angelos Mitrelis, Leonidas Papachristopoulos, Christos Papatheodorou.
    Journal of the American Society for Information Science and Technology. July 19, 2013
    Evaluation is a vital research area in the digital library domain, demonstrating a growing literature in conference and journal articles. We explore the directions and the evolution of evaluation research for the period 2001–2011 by studying the evaluation initiatives presented at 2 main conferences of the digital library domain, namely the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers (ACM/IEEE) Joint Conference on Digital Libraries (JCDL), and the European Conference on Digital Libraries (ECDL; since 2011 renamed to the International Conference on Theory and Practice of Digital Libraries [TPDL]). The literature is annotated using a domain ontology, named DiLEO, which defines explicitly the main concepts of the digital library evaluation domain and their correlations. The ontology instances constitute a semantic network that enables the uniform and formal representation of the critical evaluation constructs in both conferences, untangles their associations, and supports the study of their evolution. We discuss interesting patterns in the evaluation practices as well as in the research foci of the 2 venues, and outline current research trends and areas for further research.
    July 19, 2013   doi: 10.1002/asi.22900   open full text
  • Improving the accuracy of co‐citation clustering using full text.
    Kevin W. Boyack, Henry Small, Richard Klavans.
    Journal of the American Society for Information Science and Technology. July 19, 2013
    Historically, co‐citation models have been based only on bibliographic information. Full‐text analysis offers the opportunity to significantly improve the quality of the signals upon which these co‐citation models are based. In this work we study the effect of reference proximity on the accuracy of co‐citation clusters. Using a corpus of 270,521 full text documents from 2007, we compare the results of traditional co‐citation clustering using only the bibliographic information to results from co‐citation clustering where proximity between reference pairs is factored into the pairwise relationships. We find that accounting for reference proximity from full text can increase the textual coherence (a measure of accuracy) of a co‐citation cluster solution by up to 30% over the traditional approach based on bibliographic information.
    July 19, 2013   doi: 10.1002/asi.22896   open full text
  • The effect of ad rank on the performance of keyword advertising campaigns.
    Bernard J. Jansen, Zhe Liu, Zach Simon.
    Journal of the American Society for Information Science and Technology. July 19, 2013
    The goal of this research is to evaluate the effect of ad rank on the performance of keyword advertising campaigns. We examined a large‐scale data file comprising nearly 7,000,000 records spanning 33 consecutive months of a major US retailer's search engine marketing campaign. We draw on the serial position effect as the theoretical foundation for explaining searcher behavior when interacting with ranked ad listings. We control for temporal effects and use one‐way analysis of variance (ANOVA) with Tamhane's T2 tests to examine the effect of ad rank on critical keyword advertising metrics, including clicks, cost‐per‐click, sales revenue, orders, items sold, and advertising return on investment. Our findings show a significant ad rank effect on most of those metrics, although less effect on conversion rates. A primacy effect was found on both clicks and sales, indicating generally compelling performance of top‐ranked ads listed on the first results page. Conversion rates, on the other hand, follow a relatively stable distribution except for the top 2 ads, which had significantly higher conversion rates. However, examining conversion potential (the effect of both clicks and conversion rate), we show that ad rank has a significant effect on the performance of keyword advertising campaigns. Conversion potential is a more accurate measure of the impact of an ad's position. In fact, the first ad position generates about 80% of the total profits, after controlling for advertising costs. In addition to providing theoretical grounding, the research results reported in this paper are beneficial to companies using search engine marketing as they strive to design more effective advertising campaigns.
    July 19, 2013   doi: 10.1002/asi.22910   open full text
  • Characterizing user tagging and co‐occurring metadata in general and specialized metadata collections.
    Hong Huang, Corinne Jörgensen.
    Journal of the American Society for Information Science and Technology. July 15, 2013
    This study aims to identify the categorical characteristics and usage patterns of the most popular image tags in Flickr. The “metadata usage ratio” is introduced as a means of assessing the usage of a popular tag as metadata. We also compare how popular tags are used as image tags or metadata in the Flickr general collection and the Library of Congress's photostream (LCP), also in Flickr. Overall, the Flickr popular tags in the list are categorically stable, and the changes that do appear reflect Flickr users' evolving technology‐driven cultural experience. The popular tags in Flickr included a high number of tags for generic objects and specific locations and were rarely at the abstract level. Conversely, the popular tags in the LCP fall more often into the specific objects and time categories. Flickr users copied the Library of Congress‐supplied metadata that related to specific objects or events and standard bibliographic information (e.g., author, format, time references) as popular tags in the LCP. Those popular tags related to generic objects and events showed a high metadata usage ratio, while those related to specific locations and objects showed a low image metadata usage ratio. Popular tags in Flickr appeared less frequently as image metadata when describing specific objects than when describing specific times and locations for historical images in Flickr LCP collections. Understanding how people contribute image tags or image metadata in Flickr helps determine what users need to describe and query images, and could help improve image browsing and retrieval.
    July 15, 2013   doi: 10.1002/asi.22891   open full text
  • Analyzing group e‐mail exchange to detect data leakage.
    Polina Zilberman, Gilad Katz, Asaf Shabtai, Yuval Elovici.
    Journal of the American Society for Information Science and Technology. July 15, 2013
    Today's organizations spend a great deal of time and effort on e‐mail leakage prevention. However, there are still no satisfactory solutions: some addressing mistakes go undetected, and in some cases correct recipients are wrongly marked as potential mistakes. In this article we present a new approach for preventing e‐mail addressing mistakes in organizations. The approach is based on an analysis of e‐mail exchanges among members of an organization and the identification of groups based on common topics. When a new e‐mail is about to be sent, each recipient is analyzed. A recipient is approved if the e‐mail's content belongs to at least one topic common to both the sender and the recipient. This can be applied even if the sender and recipient have never communicated directly before. The new approach was evaluated using the Enron e‐mail data set and was compared with a well‐known method for the detection of e‐mail addressing mistakes. The results show that the proposed approach is capable of detecting 87% of nonlegitimate recipients while incorrectly classifying only 0.5% of the legitimate recipients. These results outperform previous work, which reports a detection rate of 82% without reference to the false positive rate.
    July 15, 2013   doi: 10.1002/asi.22886   open full text
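The approval rule stated in the abstract above (a recipient is approved when the e‐mail belongs to at least one topic shared by sender and recipient) can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes topics have already been assigned to each member and to the new e‐mail, and the names `approve_recipients`, `member_topics`, and `email_topics` are hypothetical.

```python
def approve_recipients(email_topics, sender, recipients, member_topics):
    """Split recipients into (approved, flagged) lists.

    email_topics  -- set of topic labels assigned to the new e-mail (assumed given)
    member_topics -- dict mapping each member to the set of topics they have
                     taken part in (assumed precomputed from past exchanges)
    """
    approved, flagged = [], []
    sender_topics = member_topics.get(sender, set())
    for recipient in recipients:
        # Topics the sender and this recipient have in common.
        shared = sender_topics & member_topics.get(recipient, set())
        # Approve only if the e-mail belongs to at least one shared topic.
        if shared & email_topics:
            approved.append(recipient)
        else:
            flagged.append(recipient)
    return approved, flagged
```

Note that the check does not require any direct sender-to-recipient history, only a shared topic, which matches the abstract's remark that the rule applies even to pairs who have never communicated directly.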
  • The thematic and conceptual flow of disciplinary research: A citation context analysis of the Journal of Informetrics, 2007.
    Gali Halevi, Henk F. Moed.
    Journal of the American Society for Information Science and Technology. July 12, 2013
    This article analyzes the context of citations within the full text of research articles. It studies articles published in a single journal: the Journal of Informetrics (JOI), in the first year the journal was published, 2007. The analysis classified the citations into in‐ and out‐disciplinary content and looked at their use within the articles' sections such as introduction, literature review, methodology, findings, discussion, and conclusions. In addition, it took into account the age of cited articles. A thematic analysis of these citations was performed in order to identify the evolution of topics within the articles' sections and the journal's content. A matrix describing the relationships between the citations' use and their in‐ and out‐disciplinary focus within the articles' sections is presented. The findings show that an analysis of citations based on their in‐ and out‐disciplinary orientation within the context of the articles' sections can indicate how cross‐disciplinary science works, and can reveal the connections between the issues, methods, analysis, and conclusions coming from different research disciplines.
    July 12, 2013   doi: 10.1002/asi.22897   open full text
  • Knowledge popularity in a heterogeneous network: Exploiting the contextual effects of document popularity in knowledge management systems.
    Xiqing Sha, Klarissa Ting‐Ting Chang, Cheng Zhang, Chenghong Zhang.
    Journal of the American Society for Information Science and Technology. July 11, 2013
    In organizations, the amount of attention that user‐generated knowledge receives in knowledge management systems (KMSs) may not imply its potential for benefiting organizational activities in terms of accelerating innovation and product development. To optimize the utilization of knowledge in organizations, it is crucial to identify factors that influence knowledge popularity. From a network perspective, this study proposes a model to evaluate knowledge popularity by investigating 2 attributes of contextual information (i.e., authors and tags) that are embedded in a heterogeneous knowledge network, and how they interact to impact knowledge popularity. Objective data obtained through the interaction history of a KMS in a global telecommunication company were applied to test the hypotheses. This paper contributes to the extant literature on knowledge popularity by identifying contextual attributes of knowledge and empirically testing the impact of their interactions on knowledge popularity.
    July 11, 2013   doi: 10.1002/asi.22879   open full text
  • The Swedish system of innovation: Regional synergies in a knowledge‐based economy.
    Loet Leydesdorff, Øivind Strand.
    Journal of the American Society for Information Science and Technology. July 10, 2013
    Based on the complete set of firm data for Sweden (N = 1,187,421; November 2011), we analyze the mutual information among the geographical, technological, and organizational distributions in terms of synergies at regional and national levels. Using this measure, the interaction among three dimensions can become negative and thus indicate a net export of uncertainty by a system or, in other words, synergy in how knowledge functions are distributed over the carriers. Aggregation at the regional level (NUTS3) of the data organized at the municipal level (NUTS5) shows that 48.5% of the regional synergy is provided by the 3 metropolitan regions of Stockholm, Gothenburg, and Malmö/Lund. Sweden can be considered a centralized and hierarchically organized system. Our results accord with other statistics, but this triple helix indicator measures synergy more specifically and quantitatively. The analysis also provides us with validation for using this measure in previous studies of more regionalized systems of innovation (such as Hungary and Norway).
    July 10, 2013   doi: 10.1002/asi.22895   open full text
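The synergy measure mentioned in the abstract above, mutual information among three dimensions, is commonly written as T = H(g) + H(t) + H(o) − H(g,t) − H(g,o) − H(t,o) + H(g,t,o), where H is Shannon entropy; a negative value corresponds to the "net export of uncertainty" the abstract describes. The sketch below assumes that definition and a hypothetical input of one (geography, technology, organization) tuple per firm; it is not the authors' code.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def triple_mutual_information(records):
    """Mutual information in three dimensions for (g, t, o) tuples.

    Assumed formula: H_g + H_t + H_o - H_gt - H_go - H_to + H_gto.
    Negative values indicate synergy among the three distributions.
    """
    records = list(records)

    def h(*dims):
        # Entropy of the joint distribution over the selected dimensions.
        return entropy(Counter(tuple(r[i] for i in dims) for r in records).values())

    return h(0) + h(1) + h(2) - h(0, 1) - h(0, 2) - h(1, 2) + h(0, 1, 2)
```

As a toy check of the sign convention: if the third attribute is the exclusive‐or of two independent binary attributes, the result is −1 bit (pure synergy), whereas three fully independent attributes give 0.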
  • Full‐text citation analysis: A new method to enhance scholarly networks.
    Xiaozhong Liu, Jinsong Zhang, Chun Guo.
    Journal of the American Society for Information Science and Technology. July 09, 2013
    In this article, we use innovative full‐text citation analysis along with supervised topic modeling and network‐analysis algorithms to enhance classical bibliometric analysis and publication/author/venue ranking. By utilizing citation contexts extracted from a large number of full‐text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author‐contributed keyword. We then use publication/citation topic distribution to generate a citation graph with vertex prior and edge transitioning probability distributions. The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. To evaluate this work, we sampled 104 topics (labeled with keywords) in review papers. The cited publications of each review paper are assumed to be “important publications” for the target topic (keyword), and we use these cited publications to validate our topic‐ranking result and to compare different publication‐ranking lists. Evaluation results show that full‐text citation and publication content prior topic distribution, along with the classical PageRank algorithm, can significantly enhance bibliometric analysis and scientific publication ranking performance, compared with term frequency–inverse document frequency (tf–idf), language model, BM25, PageRank, and PageRank + language model (p < .001), for academic information retrieval (IR) systems.
    July 09, 2013   doi: 10.1002/asi.22883   open full text
  • Improving polarity classification of bilingual parallel corpora combining machine learning and semantic orientation approaches.
    José M. Perea‐Ortega, M. Teresa Martín‐Valdivia, L. Alfonso Ureña‐López, Eugenio Martínez‐Cámara.
    Journal of the American Society for Information Science and Technology. July 03, 2013
    Polarity classification is one of the main tasks related to the opinion mining and sentiment analysis fields. The aim of this task is to classify opinions as positive or negative. There are two main approaches to carrying out polarity classification: machine learning and semantic orientation based on the integration of knowledge resources. In this study, we propose to combine both approaches using a voting system based on the majority rule. In this way, we attempt to improve the polarity classification of two parallel corpora: the opinion corpus for Arabic (OCA) and the English version of the OCA (EVOCA). Several experiments have been performed to check the feasibility of the proposed method. The results show that the experiment that took into account both approaches in the voting system obtained the best performance. Moreover, the proposed method slightly improves on the best results obtained using machine learning approaches alone over the OCA and the EVOCA separately. Therefore, we can conclude that the approach proposed here might be considered a good strategy for polarity detection when we work with bilingual parallel corpora.
    July 03, 2013   doi: 10.1002/asi.22884   open full text
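A voting system based on the majority rule, as named in the abstract above, can be sketched in a few lines. The abstract does not say how many voters are combined or how ties are handled, so the function name `majority_vote` and the choice to raise an error on a tie are assumptions made only for this illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by most voters.

    votes -- list of polarity labels, one per classifier (for example the
             outputs of machine-learning and semantic-orientation systems).
    Raises ValueError on a tie; tie handling is an assumption of this sketch.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("no majority: top labels are tied")
    return ranked[0][0]
```

With an odd number of voters and two labels, as in `majority_vote(["positive", "negative", "positive"])`, a tie cannot occur and the call returns the label chosen by at least two of the three voters.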
  • Adolescents' information‐creating behavior embedded in digital media practice using Scratch.
    Kyungwon Koh.
    Journal of the American Society for Information Science and Technology. July 03, 2013
    This study explores the ways adolescents create information collaboratively in the digital environment. In spite of the current widespread practice of information creation by young people, little research exists to illuminate how youth are engaged in creative information behavior or how they make participatory contributions to the changing information world. The purposefully selected sample includes teenagers who actively produce and share information projects, such as online school magazines, an information‐sharing website in Wiki, and a digital media library, using Scratch—a graphical programming language developed by MIT Media Lab. Qualitative data were collected through group and individual interviews informed by Dervin's Sense‐Making Methodology. The data analysis technique included directed qualitative content analysis with Atlas.ti. Findings reveal the process of information creation, including content development, organization, and presentation of information, as well as noticeable patterns by youth such as visualizing, remixing, tinkering, and gaining a sense of empowerment. This study extends our knowledge of the creative aspects of information behavior.
    July 03, 2013   doi: 10.1002/asi.22878   open full text
  • You scratch someone's back and we'll scratch yours: Collective reciprocity in social Q&A communities.
    Philip Fei Wu, Nikolaos Korfiatis.
    Journal of the American Society for Information Science and Technology. July 03, 2013
    Taking a structuration perspective and integrating reciprocity research in economics, this study examines the dynamics of reciprocal interactions in social question & answer communities. We postulate that individual users of social Q&A constantly adjust their kindness in the direction of the observed benefit and effort of others. Collective reciprocity emerges from this pattern of conditional strategy of reciprocation and helps form a structure that guides the very interactions that give birth to the structure. Based on a large sample of data from Yahoo! Answers, our empirical analysis supports the collective reciprocity premise, showing that the more effort (relative to benefit) an asker contributes to the community, the more likely the community will return the favor. On the other hand, the more benefit (relative to effort) the asker takes from the community, the less likely the community will cooperate in terms of providing answers. We conclude that a structuration view of reciprocity sheds light on the duality of social norms in online communities.
    July 03, 2013   doi: 10.1002/asi.22913   open full text
  • An open‐set size‐adjusted Bayesian classifier for authorship attribution.
    G. Bruce Schaalje, Natalie J. Blades, Tomohiko Funai.
    Journal of the American Society for Information Science and Technology. June 28, 2013
    Recent studies of authorship attribution have used machine‐learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open‐set classification and account for text and corpus size. We propose a customized Bayesian logit‐normal‐beta‐binomial classification model for supervised authorship attribution. The model is based on the beta‐binomial distribution with an explicit inverse relationship between extra‐binomial variation and text size. The model internally estimates the relationship of extra‐binomial variation to text size, and uses Markov Chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine‐learning methods as well as the open‐set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.
    June 28, 2013   doi: 10.1002/asi.22877   open full text
  • On the assessment of expertise profiles.
    Richard Berendsen, Maarten de Rijke, Krisztian Balog, Toine Bogers, Antal van den Bosch.
    Journal of the American Society for Information Science and Technology. June 28, 2013
    Expertise retrieval has attracted significant interest in the field of information retrieval. Expert finding has been studied extensively, with less attention going to the complementary task of expert profiling, that is, automatically identifying topics about which a person is knowledgeable. We describe a test collection for expert profiling in which expert users have self‐selected their knowledge areas. Motivated by the sparseness of this set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state‐of‐the‐art expert‐profiling algorithms; optionally, experts can indicate a level of expertise for relevant areas. Experts may also give feedback on the quality of the system‐generated knowledge areas. We report on a content analysis of these comments and gain insights into what aspects of profiles matter to experts. We provide an error analysis of the system‐generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze the impact on evaluating expert‐profiling systems of using self‐selected versus judged system‐generated knowledge areas as ground truth; the two rank systems somewhat differently but detect about the same number of pairwise significant differences, even though the judged system‐generated assessments are sparser.
    June 28, 2013   doi: 10.1002/asi.22908   open full text
  • Linked Open Data technologies for publication of census microdata.
    Gustavo Pabón, Claudio Gutiérrez, Javier D. Fernández, Miguel A. Martínez‐Prieto.
    Journal of the American Society for Information Science and Technology. June 20, 2013
    Censuses are one of the most relevant types of statistical data, allowing analyses of the population in terms of demography, economy, sociology, and culture. For fine‐grained analysis, census agencies publish census microdata that consist of a sample of individual records of the census containing detailed anonymous individual information. Working with microdata from different censuses and doing comparative studies are currently difficult tasks due to the diversity of formats and granularities. In this article, we show that novel data processing techniques can be applied to make census microdata interoperable and easy to access and combine. In fact, we demonstrate how Linked Open Data principles, a set of techniques to publish and make connections of (semi‐)structured data on the web, can be fruitfully applied to census microdata. We present a step‐by‐step process to achieve this goal and we study, in theory and practice, two real case studies: the 2001 Spanish census and a general framework for Integrated Public Use Microdata Series (IPUMS‐I).
    June 20, 2013   doi: 10.1002/asi.22876   open full text
  • Using bibliometrics to support the facilitation of cross‐disciplinary communication.
    Christopher J. Williams, Michael O'Rourke, Sanford D. Eigenbrode, Ian O'Loughlin, Stephen J. Crowley.
    Journal of the American Society for Information Science and Technology. June 20, 2013
    Given the importance of cross‐disciplinary research (CDR), facilitating CDR effectiveness is a priority for many institutions and funding agencies. There are a number of CDR types, however, and effective facilitation will require sensitivity to that diversity. This article presents a method, based on bibliometric techniques and citation data, for characterizing a spectrum of CDR in order to inform facilitation efforts. We illustrate its use with the Toolbox Project, an ongoing effort to enhance cross‐disciplinary communication in CDR teams through structured, philosophical dialogue about research assumptions in a workshop setting. Toolbox Project workshops have been conducted with more than 85 research teams, but the project's extensibility to an objectively characterized range of CDR collaborations has not been examined. To guide wider application of the Toolbox Project, we have developed a method that uses multivariate statistical analyses of transformed citation proportions from published manuscripts to identify candidate areas of CDR, and then overlays information from previous Toolbox participant groups on these areas to determine candidate areas for future application. The approach supplies 3 results of general interest: (a) a way to employ small data sets and familiar statistical techniques to characterize CDR spectra as a guide to scholarship on CDR patterns and trends; (b) a model for using bibliometric techniques to guide broadly applicable interventions similar to the Toolbox; and (c) a method for identifying the location of collaborative CDR teams on a map of scientific activity, of use to research administrators, research teams, and other efforts to enhance CDR projects.
    June 20, 2013   doi: 10.1002/asi.22874   open full text
  • Tracing the footprint of knowledge spillover: Evidence from U.S.–China collaboration in nanotechnology.
    Li Tang, Guangyuan Hu.
    Journal of the American Society for Information Science and Technology. June 19, 2013
    The impact of international collaboration on research performance has been extensively explored over the past two decades. Most research, however, focuses on quantity and citation‐based indicators. Using the turnover of keywords, this study develops an integrative approach, tracking and visualizing the shift of the research stream, and tests it within the context of U.S.–China collaboration in nanotechnology. The results show evidence of a link between U.S.–China collaboration and the emergence of a new research stream among Chinese researchers. We also find that the triggered research stream diffused further via extended coauthorship. Policy implications for science and technology development and resource allocation in the United States and China are discussed.
    June 19, 2013   doi: 10.1002/asi.22873   open full text
  • Scientific impact evaluation and the effect of self‐citations: Mitigating the bias by discounting the h‐index.
    Emilio Ferrara, Alfonso E. Romero.
    Journal of the American Society for Information Science and Technology. June 19, 2013
    In this article, we propose a measure to assess scientific impact that discounts self‐citations and does not require any prior knowledge of their distribution among publications. This index can be applied to both researchers and journals. In particular, we show that it fills the gap left by the h‐index and similar measures, which do not take into account the effect of self‐citations when evaluating the impact of authors or journals. We provide 2 real‐world examples: First, we evaluate the research impact of the most productive scholars in computer science (according to DBLP Computer Science Bibliography, Universität Trier, Trier, Germany); then we revisit the impact of the journals ranked in the Computer Science Applications section of the SCImago Journal & Country Rank ranking service (Consejo Superior de Investigaciones Científicas, University of Granada, Extremadura, Madrid, Spain). We observe how self‐citations, in many cases, affect the rankings obtained according to different measures (including h‐index and ch‐index), and show how the proposed measure mitigates this effect.
    June 19, 2013   doi: 10.1002/asi.22976   open full text
  • Exploring methods to improve access to music resources by aligning library data with Linked Data: A report of methodologies and preliminary findings.
    Karen F. Gracy, Marcia Lei Zeng, Laurence Skirvin.
    Journal of the American Society for Information Science and Technology. June 07, 2013
    As a part of a research project aiming to connect library data to the unfamiliar data sets available in the Linked Data (LD) community's CKAN Data Hub (thedatahub.org), this project collected, analyzed, and mapped properties used in describing and accessing music recordings, scores, and music‐related information used by selected music LD data sets, library catalogs, and various digital collections created by libraries and other cultural institutions. This article reviews current efforts to connect music data through the Semantic Web, with an emphasis on the Music Ontology (MO) and ontology alignment approaches; it also presents a framework for understanding the life cycle of a musical work, focusing on the central activities of composition, performance, and use. The project studied metadata structures and properties of 11 music‐related LD data sets and mapped them to the descriptions commonly used in the library cataloging records for sound recordings and musical scores (including MARC records and their extended schema.org markup), and records from 20 collections of digitized music recordings and scores (featuring a variety of metadata structures). The analysis resulted in a set of crosswalks and a unified crosswalk that aligns these properties. The paper reports on detailed methodologies used and discusses research findings and issues. Topics of particular concern include (a) the challenges of mapping between the overgeneralized descriptions found in library data and the specialized, music‐oriented properties present in the LD data sets; (b) the hidden information and access points in library data; and (c) the potential benefits of enriching library data through the mapping of properties found in library catalogs to similar properties used by LD data sets.
    June 07, 2013   doi: 10.1002/asi.22914   open full text