Frontiers of Computer Science

A peer-reviewed academic English-language journal in computer science, co-published by Springer and Higher Education Press (China).

16/04/2021

Attention Based Simplified Deep Residual Network for Citywide Crowd Flows Prediction
https://link.springer.com/article/10.1007/s11704-020-9194-x
Abstract: Crowd flow prediction is an important problem in urban computing, whose goal is to predict the number of people flowing into and out of city regions in the future. In practice, emergency applications often require short training times, yet there is little work on how to obtain good prediction performance with less training time. In this paper, we propose a simplified deep residual network for this problem. With the simplified deep residual network, we obtain not only shorter training time but also prediction performance competitive with an existing similar method. Moreover, we adopt a spatio-temporal attention mechanism to further improve the simplified deep residual network at reasonable additional time cost. On real datasets, we conduct a series of experiments comparing against existing methods, and the results confirm the efficiency of our proposed approach.
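The two ingredients the abstract names, residual (skip) connections and spatio-temporal attention, can be illustrated in a few lines. This is a minimal numpy sketch under assumed toy shapes, not the authors' network:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x): the skip connection is what keeps training fast and stable."""
    return x + w2 @ relu(w1 @ x)

def attention(h, q):
    """Softmax-weighted sum over time steps h[t], scored against a query q."""
    scores = h @ q                      # (T,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ h                  # weighted combination of time steps

d = 8                                   # feature width (illustrative assumption)
w1 = rng.normal(size=(d, d)) * 0.1
w2 = rng.normal(size=(d, d)) * 0.1

x = rng.normal(size=d)                  # flow features of one region
y = residual_block(x, w1, w2)

T = 5                                   # recent time steps (illustrative assumption)
h = rng.normal(size=(T, d))             # per-step hidden states
context = attention(h, q=y)             # attend over history with y as the query
```

The attention output re-weights the recent history, which is the mechanism the paper uses to improve the residual backbone at modest extra cost.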

16/04/2021

Predicting protein subchloroplast locations: The 10th anniversary
https://link.springer.com/article/10.1007/s11704-020-9507-0
Abstract: The chloroplast is a type of subcellular organelle in green plants and algae, and the main organelle in which photosynthesis takes place. The proteins that localize within the chloroplast carry out the photosynthetic process at the molecular level. The chloroplast can be further divided into several compartments, and proteins in different compartments are related to different steps of photosynthesis. Since the molecular function of a protein is highly correlated with its exact cellular localization, pinpointing the subchloroplast location of a chloroplast protein is an important step towards understanding its role in photosynthesis. Experimentally determining protein subchloroplast locations is costly and time-consuming, so computational approaches have been developed to predict subchloroplast locations from primary sequences. Over the last decade, more than a dozen studies have tried to predict protein subchloroplast locations with machine learning, introducing a variety of sequence features and learning algorithms. In this review, we collect comprehensive information on all existing studies on the prediction of protein subchloroplast locations, and compare them with respect to benchmark datasets, sequence features, machine learning algorithms, predictive performance, and implementation availability. We summarize the progress and current status of this research topic, and try to identify the most promising directions for future work. We hope this review not only lists all existing works but also serves readers as a useful resource for quickly grasping the big picture of this research topic, and that it can be a starting point for future methodological studies on predicting protein subchloroplast locations.

13/11/2019

Real-time Manifold Regularized Context-aware Correlation Tracking
https://link.springer.com/article/10.1007/s11704-018-8104-y
Abstract: Despite the demonstrated success of numerous correlation filter (CF) based tracking approaches, their assumption of circulant structure among samples introduces significant redundancy when learning an effective classifier. In this paper, we develop a fast manifold-regularized context-aware correlation tracking algorithm that mines the local manifold structure of different types of samples. First, unlike traditional CF-based tracking that uses only one base sample, we employ a set of contextual samples near the base sample and impose a manifold structure assumption on them. We then introduce a linear graph Laplacian regularization term into the CF learning objective to account for the manifold structure among these samples. Conveniently, the optimization can be solved efficiently in closed form with fast Fourier transforms (FFTs), which yields a highly efficient implementation. Extensive evaluations on the OTB100 and VOT2016 datasets demonstrate that the proposed tracker performs favorably against several state-of-the-art algorithms in terms of accuracy and robustness. In particular, our tracker runs in real time at 28 fps on a single CPU.
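The closed-form frequency-domain solution the abstract alludes to can be sketched for the standard ridge-regression correlation filter (the paper's manifold-regularized objective adds a graph Laplacian term on top of this). The 1-D signal, the Gaussian label, and the regularizer lam below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.normal(size=n)                                   # base sample (1-D for clarity)
y = np.exp(-0.5 * ((np.arange(n) - n // 2) / 3.0) ** 2)  # Gaussian regression label

lam = 1e-2                                               # ridge regularization weight
X, Y = np.fft.fft(x), np.fft.fft(y)

# Closed-form solution per frequency bin: all circular shifts of x are handled
# implicitly, which is the source of the CF speed advantage.
W = np.conj(X) * Y / (np.conj(X) * X + lam)

# Correlating the learned filter with the training sample should reproduce
# a response peaked where the label peaks.
response = np.real(np.fft.ifft(W * np.fft.fft(x)))
peak = int(np.argmax(response))
```

The per-bin division is why the whole learning step costs only a few FFTs, which is what makes the 28 fps figure plausible.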

07/11/2019

NEXT: A Neural Network Framework for Next POI Recommendation
https://link.springer.com/article/10.1007/s11704-018-8011-2
Abstract: The task of next POI recommendation has been studied extensively in recent years. However, developing a unified recommendation framework that incorporates multiple factors associated with both POIs and users remains challenging because of the heterogeneous nature of this information. Effective mechanisms to smoothly handle cold-start cases are also a difficult topic. Inspired by the recent success of neural networks in many areas, we propose a simple yet effective neural network framework, named NEXT, for next POI recommendation. NEXT is a unified framework that learns the hidden intent behind a user's next move by incorporating different factors in a unified manner. Specifically, NEXT incorporates meta-data information (e.g., user friendship and textual descriptions of POIs) and two kinds of temporal context (i.e., time interval and visit time). To leverage sequential relations and geographical influence, we adopt DeepWalk, a network representation learning technique, to encode such knowledge. We evaluate the effectiveness of NEXT against other state-of-the-art alternatives and neural-network-based solutions. Experimental results on three publicly available datasets demonstrate that NEXT significantly outperforms baselines in real-time next POI recommendation. Further experiments show the inherent ability of NEXT to handle cold-start cases.

06/11/2019

Leveraging proficiency and preference for online Karaoke recommendation
https://link.springer.com/article/10.1007/s11704-018-7072-6
Abstract: Recently, many online Karaoke (KTV) platforms have been released, on which music lovers sing songs while the system automatically evaluates their proficiency from their singing behavior. Recommending appropriate songs to users can encourage singers' participation and improve users' loyalty to these platforms. However, this is not an easy task due to the unique characteristics of such platforms. First, since users may not achieve high system-evaluated scores on their favorite songs, how to balance user preference with singing proficiency in song recommendation remains open. Second, the sparsity of user-song interaction behavior may greatly impact the recommendation task. To address these two challenges, we propose an information-fused song recommendation model that accounts for the unique characteristics of the singing data. Specifically, we first devise a pseudo-rating matrix by combining users' singing behavior with the system evaluations, so that both user preference and proficiency are leveraged. We then mitigate the data sparsity problem by fusing rich user and song information into the matrix factorization of the pseudo-rating matrix. Finally, extensive experimental results on a real-world dataset show the effectiveness of the proposed model.
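The pseudo-rating construction described above, blending singing behavior with system proficiency scores into one matrix for factorization, might look like the following toy sketch; the blend weight alpha and the tiny matrices are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Toy user x song data: how often each user sang each song, and the
# system's proficiency evaluation of those performances (in [0, 1]).
sing_counts = np.array([[3, 0], [1, 2]], dtype=float)
scores = np.array([[0.9, 0.0], [0.4, 0.7]])

alpha = 0.5                                   # preference vs. proficiency weight
pref = sing_counts / sing_counts.max()        # normalize play counts to [0, 1]

# The fused matrix a matrix-factorization model would then decompose.
pseudo_rating = alpha * pref + (1 - alpha) * scores
```

A user who sings a song often but scores poorly on it ends up with a moderate pseudo-rating, which is exactly the preference/proficiency balance the abstract describes.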

28/10/2019

Optimized high order product quantization for approximate nearest neighbors search
https://link.springer.com/article/10.1007/s11704-018-7049-5
Abstract: Product quantization is now considered an effective approach to approximate nearest neighbor (ANN) search, and a collection of derivative algorithms has been developed. However, current techniques ignore the intrinsic high-order structure of data, which usually contains helpful information for improving computational precision. In this paper, aiming at the complex structure of high-order data, we design an optimized technique, called optimized high-order product quantization (O-HOPQ), for ANN search. In O-HOPQ, we incorporate the high-order structure of the data into the design of a more effective subspace decomposition, so that spatially adjacent elements in the high-order data space are grouped into the same subspace. O-HOPQ then generates a spatially structured codebook by optimizing the quantization distortion. Starting from the structured codebook, globally optimal quantizers can be obtained effectively and efficiently. Experimental results show that appropriately exploiting the information in the complex structure of high-order data leads to significant improvements in the performance of product quantizers. Moreover, high-order-structure-based approaches are effective in scenarios where the data have intrinsically complex structures.
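For readers unfamiliar with the baseline, plain product quantization (which O-HOPQ builds on) can be sketched as follows. The subspace count M, codebook size K, and random codebooks are illustrative assumptions; real PQ trains the codebooks with k-means:

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, K = 16, 4, 8                      # dim, subspaces, centroids per subspace
d = D // M                              # sub-vector dimension

codebooks = rng.normal(size=(M, K, d))  # one small codebook per subspace

def pq_encode(x):
    """Quantize each sub-vector to the index of its nearest centroid."""
    codes = []
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        dists = np.linalg.norm(codebooks[m] - sub, axis=1)
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes):
    """Reconstruct the vector from its M centroid indices."""
    return np.concatenate([codebooks[m][c] for m, c in enumerate(codes)])

x = rng.normal(size=D)
codes = pq_encode(x)                    # D floats compressed to M small integers
x_hat = pq_decode(codes)                # lossy reconstruction
err = np.linalg.norm(x - x_hat)         # quantization distortion for this vector
```

O-HOPQ's contribution is precisely in choosing a better decomposition than this naive contiguous split, so that the per-subspace distortion is lower for high-order data.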

28/10/2019

Computer comparisons in the presence of performance variation
https://link.springer.com/article/10.1007/s11704-018-7319-2
Abstract: We present three resampling methods for computer evaluation and comparison in the presence of performance variation. The proposed methods produce more stable results and better reflect real situations than prior work, and they can be applied to SPEC benchmarks and big-data benchmarks with higher variance.

14/10/2019

HGeoHashBase: an optimized storage model of spatial objects for location-based services
https://link.springer.com/article/10.1007/s11704-018-7030-3
Abstract: Many location-based services need to query objects existing in a specific space, such as location-based tourism resource recommendation. Both the large number of spatial objects and the real-time access requirements of location-based services pose a big challenge for spatial object storage and query management. In this paper, we propose HGeoHashBase, an improved storage model that integrates GeoHash with a key-value structure to organize spatial objects for efficient range queries. GeoHash is responsible for spatial encoding, with the key-value structure serving as the underlying data storage. Both the similarity of encodings for objects at close geographical locations and the multi-version data mechanism are well integrated into the proposed model. A theoretical proof concerning the tradeoff between encoding precision and query performance is presented. Extensive experiments are designed and conducted, and the results show that the proposed model gains significant performance improvements.
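The GeoHash encoding at the heart of the model can be sketched in a few lines: longitude/latitude bisection bits are interleaved and packed five at a time into base32 characters, so nearby objects share key prefixes and a range query becomes a prefix scan. A minimal sketch (not HGeoHashBase itself); the coordinates below are illustrative:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # standard geohash alphabet

def geohash_encode(lat, lon, precision=8):
    """Interleave longitude/latitude bisection bits, 5 bits per base32 char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True                     # even bit positions encode longitude
    while len(bits) < precision * 5:
        interval, v = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (interval[0] + interval[1]) / 2
        if v >= mid:
            bits.append(1)
            interval[0] = mid
        else:
            bits.append(0)
            interval[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

key = geohash_encode(39.9042, 116.4074)       # row key for a spatial object
```

Because close locations share a key prefix, a spatial range query in the key-value store reduces to a scan over one (or a few) prefix ranges.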

10/10/2019

Compiler Testing: A Systematic Literature Analysis
https://link.springer.com/article/10.1007/s11704-019-8231-0
Abstract: Compilers are widely used infrastructure in software development and are expected to be trustworthy. In the literature, various testing techniques have been proposed to guarantee compiler quality. However, an obstacle remains to comprehensively characterizing and understanding compiler testing. To overcome this obstacle, we propose a literature-analysis framework to gain insights into the compiler testing area. First, we perform an extensive search to construct a dataset of compiler testing papers. Then, we conduct a bibliometric analysis of the productive authors, the influential papers, and the frequently tested compilers in our dataset. Finally, we use association rules and collaboration networks to mine authorships and communities of interest among researchers and keywords. Several valuable results are reported. We find that the USA is the leading country, with the most influential researchers and institutions. The most active keyword is "random testing". We also find that most researchers have broad interests but collaborate in small groups within the compiler testing area.

16/09/2019

AAMcon: an adaptively distributed SDN controller in data center networks
https://link.springer.com/article/10.1007/s11704-019-7266-6
Abstract: When evaluating the performance of distributed Software-Defined Network (SDN) controller architectures in data center networks, the number of controllers required for a given network topology and their locations are major issues of interest. To address these issues, this study proposes Adaptively Adjusting and Mapping controllers (AAMcon) to design a stateful data plane. We use complex network community theory to select a key switch at which to place the controller, so that it is close to the switches it controls within a subnet. A physically distributed but logically centralized controller pool is built based on Network Function Virtualization (NFV), and we then propose a fast-start/overload-avoidance algorithm to adaptively adjust the number of controllers according to demand. We also analyze AAMcon to find the optimal distance between switch and controller. Finally, experiments show the following results. (1) AAMcon closely follows demand in the number of controllers, and each controller responds to switch requests over the least distance, minimizing the delay between a switch and its controller. (2) AAMcon shows good robustness to failures. (3) AAMcon incurs lower delay in networks with more pronounced community structure; in fact, there is an inverse relationship between community modularity and the average switch-to-controller distance, i.e., the average delay decreases as community modularity increases. (4) AAMcon achieves load balance among the controllers. (5) Compared to DCP-GK and k-critical, AAMcon shows good performance.

27/08/2019

Non-sequential striping encoder from replication to erasure coding for distributed storage system
https://link.springer.com/article/10.1007/s11704-019-8403-y
Abstract: Replication and erasure coding are widely deployed in modern distributed storage systems. According to data popularity, replication performs well for hot data while erasure coding is storage-efficient for cold data. When hot data turn cold, an encoder converts multi-replica data to coded data. However, current encoders based on sequential striping do not perform well on various data layouts, resulting in risky co-located blocks and heavy I/O consumption.
We propose ASICE, a new encoder based on non-sequential striping that constructs non-sequential stripes according to the data layout, performs the conversion quickly with low overhead (especially low cross-rack traffic), and avoids co-located blocks. Moreover, ASICE matches data popularity to the number of replicas at a fine grain, which helps balance load and amortize encoding overheads to avoid I/O bursts. We evaluate ASICE on a 9-node Hadoop testbed and in simulations of large-scale clusters; the results show that ASICE reduces cross-rack traffic, avoids co-located blocks, and amortizes encoding overheads.
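The replication-to-erasure-coding conversion can be illustrated with the simplest possible code, a single XOR parity block (production systems typically use Reed-Solomon; the block contents here are illustrative assumptions):

```python
# Three data blocks that were previously stored as full replicas.
blocks = [b"aa", b"bb", b"cc"]

def xor_parity(data_blocks):
    """Compute one parity block; any single lost block is then recoverable."""
    parity = bytes(len(data_blocks[0]))
    for blk in data_blocks:
        parity = bytes(a ^ b for a, b in zip(parity, blk))
    return parity

parity = xor_parity(blocks)
# After encoding, the replicas can be dropped: storing 3 data blocks + 1 parity
# replaces 3 x 3 replicated blocks. Recover block 1 from the others plus parity:
recovered = xor_parity([blocks[0], blocks[2], parity])
```

Where the stripe's blocks physically live is exactly the layout problem ASICE targets: if two blocks of one stripe land on the same rack (co-located), a single failure can lose both.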

23/08/2019

An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment
https://link.springer.com/article/10.1007/s11704-018-7167-0
Abstract: N-hop neighborhood information is very useful in analytic tasks on large-scale graphs, such as finding cliques in a social network, recommending friends or advertising links according to one's interests, and predicting links among websites. To obtain N-hop neighborhood information on a large graph, such as a web graph or a Twitter social graph, the most straightforward method is to conduct a breadth-first search (BFS) on a parallel distributed graph processing framework such as Pregel or GraphLab. However, due to the massive volume of message transfer, the BFS method incurs high communication cost and has low efficiency.
In this work, we propose a key/value-based method, namely KVB, which fits neatly into prevailing parallel graph processing frameworks and computes N-hop neighborhoods on a large-scale graph efficiently. Unlike the BFS method, our method need not transfer large amounts of neighborhood information, significantly reducing both the communication overhead and the intermediate results in the distributed framework. We formalize N-hop neighborhood query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution that efficiently loads only the relevant neighborhoods for computation. Specifically, we prove that the optimal partial-neighborhood load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on a distributed graph framework, Spark GraphX, and validated our solution with extensive experiments over a number of real-world and synthetic large graphs on a modest in-house cluster. Experiments show that our solution generally gains an order-of-magnitude speedup compared to the state-of-the-art BFS implementation.
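The BFS baseline that KVB improves on is easy to state on a single machine. A minimal sketch over an assumed toy adjacency list:

```python
from collections import deque

def n_hop_neighbors(adj, source, n):
    """Return all vertices reachable from `source` within n hops (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    result = set()
    while frontier:
        v, depth = frontier.popleft()
        if depth == n:                 # do not expand past the hop budget
            continue
        for u in adj.get(v, ()):
            if u not in seen:
                seen.add(u)
                result.add(u)
                frontier.append((u, depth + 1))
    return result

# Toy directed graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
one_hop = n_hop_neighbors(adj, 0, 1)   # direct neighbors only
two_hop = n_hop_neighbors(adj, 0, 2)   # neighbors within two hops
```

In a distributed framework each frontier expansion becomes a round of messages between partitions, which is the communication volume the paper's key/value formulation avoids.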

19/08/2019

Understanding the mechanism of social tie in the propagation process of social network with communication channel
https://link.springer.com/article/10.1007/s11704-018-7453-x
Abstract: Information propagation is an important function of online social networks, and it is critical for governments, enterprises, and individuals to understand the laws of information diffusion in social networks. In this paper, we focus on the information propagation mechanism between adjacent users, i.e., what factors affect users' sharing decisions and how they work.
We start by considering decision stochasticity: if a user likes a message, he shares it with one probability; otherwise, he shares it with another. We then find a close correlation between social ties and communication channels, and propose a novel propagation model based on the communication channel, called the Social Tie Channel (STC) model. By encoding an information item based on the receiver's features, the model avoids optimizing propagation parameters for every topic, so its complexity is close to that of the IC model (two parameters per edge). The model can also naturally incorporate many factors that affect information propagation along edges, such as content topic and user preference, and it has good generality: the major propagation models can be interpreted within its framework.
Extensive experiments conducted on real-world datasets demonstrate that the STC model effectively captures not only users' decision stochasticity but also the differences in relationships between user pairs, and outperforms state-of-the-art methods in the digg/retweet prediction task.

09/08/2019

Algebraic criteria for finite automata understanding of regular language
https://link.springer.com/article/10.1007/s11704-019-6525-x
Abstract: Using the theories of many-valued logic and the semi-tensor product (STP) of matrices, this paper investigates how to mathematically determine whether or not a regular language is recognized by a finite automaton. To this end, the behavior of finite automata is first formulated as bilinear dynamic equations, which provide a uniform model for deterministic and non-deterministic finite automata. Based on this bilinear model, the power of finite automata to recognize regular languages is investigated and several algebraic criteria are obtained. With these criteria, to judge whether a regular sentence is accepted by a finite automaton, one only needs to calculate an STP of some vectors, rather than running the sentence through the machine in the traditional manner. Furthermore, the inverse problem of recognition is considered, and an algorithm is developed that can mathematically construct all accepted sentences for a given finite automaton. The algebraic approach of this paper may offer a new angle and means for understanding and analyzing the dynamics of finite automata.
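The algebraic flavor of the approach can be conveyed with a simplified stand-in for the STP formulation: represent DFA states as one-hot vectors and each input symbol as a 0-1 transition matrix, so a run of the machine becomes a chain of matrix-vector products. The example automaton (binary strings with an even number of 1s) is an illustrative assumption, not one from the paper:

```python
import numpy as np

# States: index 0 = even number of 1s seen (accepting), index 1 = odd.
# M[a][i, j] = 1 iff reading symbol a moves state j to state i.
M = {
    "0": np.array([[1, 0], [0, 1]]),   # '0' leaves the parity unchanged
    "1": np.array([[0, 1], [1, 0]]),   # '1' flips the parity
}
x0 = np.array([1, 0])                  # one-hot initial state (even)
accept = np.array([1, 0])              # indicator vector of accepting states

def accepts(word):
    """Acceptance by pure linear algebra: x_{t+1} = M_a x_t, then check accept^T x."""
    x = x0
    for a in word:
        x = M[a] @ x
    return bool(accept @ x)
```

Acceptance of a sentence is thus decided by a product of matrices and vectors, mirroring the paper's point that the machine never has to be "run" in the traditional sense.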

05/08/2019

Patent expanded retrieval via word embedding under composite-domain perspectives
https://link.springer.com/article/10.1007/s11704-018-7056-6
Abstract: Patent prior-art search uses dispersed information to retrieve, from a massive patent database, all the relevant documents despite strong ambiguity. This challenging task consists of patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, resulting in ambiguous queries. Work on patent expansion draws terms from external resources by selecting words with similar distributions or similar semantics; however, this splits the relevance between the distribution and the semantics of the terms. Besides, common repositories hardly meet the requirements of patent expansion for uncommon semantics and unusual terms. To solve these problems, we first present a novel composite-domain perspective model that converts the technical characteristic of a query patent to a specific composite classified domain and generates aspect queries. We then implement patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose training semantic vector spaces via word embedding under the specific classified domains, so as to provide a domain-aware expansion resource. Finally, multiple retrieval results for the same topic are merged based on perspective weight and rank. Our experimental results on CLEF-IP 2010 demonstrate that our method is very effective: it achieves about a 5.43% improvement in recall and nearly a 12.38% improvement in PRES over the state-of-the-art, and also achieves the best performance balance in terms of recall, MAP, and PRES.

01/08/2019

A V2I communication-based pipeline model for adaptive urban traffic light scheduling
https://link.springer.com/article/10.1007/s11704-017-7043-3
Abstract: Adaptive traffic light scheduling based on real-time traffic information processing has proven effective for urban traffic congestion management. However, fine-grained information about individual vehicles is difficult to acquire through traditional data collection techniques, and its accuracy cannot be guaranteed because of congestion and harsh environments. In this study, we first build a pipeline model based on vehicle-to-infrastructure communication, a salient technique in vehicular ad hoc networks.
This model enables the acquisition of fine-grained, accurate traffic information in real time via message exchange between vehicles and roadside units. We then propose an intelligent traffic light scheduling method (ITLM) based on a "demand assignment" principle that considers the types and turning intentions of vehicles: a signal phase with more vehicles is assigned a longer green time. Furthermore, a green-way traffic light scheduling method (GTLM) is investigated for special vehicles (e.g., ambulances and fire engines) in emergency scenarios; signal states are adjusted or maintained by the traffic light control system to keep special vehicles moving smoothly. Comparative experiments demonstrate that the ITLM reduces average wait time by 34%–78% and also lowers average stop frequency.
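The "demand assignment" principle, longer green time for phases with more vehicles, can be sketched as a proportional allocation; the cycle length, minimum green time, and vehicle counts are illustrative assumptions, not the paper's ITLM:

```python
def assign_green_times(vehicle_counts, cycle_s=120, min_green_s=10):
    """Give each phase min_green_s, then split the rest in proportion to demand."""
    n = len(vehicle_counts)
    spare = cycle_s - n * min_green_s
    total = sum(vehicle_counts)
    if total == 0:                      # no demand: split the cycle evenly
        return [cycle_s / n] * n
    return [min_green_s + spare * c / total for c in vehicle_counts]

# Three signal phases with uneven demand (queued-vehicle counts from V2I messages).
greens = assign_green_times([30, 10, 20])
```

The phase with the longest queue receives the longest green time, while the minimum-green floor keeps low-demand phases from starving.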
