Abstract: A new concept, the rare axis, is proposed on the basis of statistical facts, and an evaluation algorithm is designed accordingly. For nested regular expressions that contain rare axes, the proposed algorithm reduces the evaluation complexity from polynomial time to nearly linear time. A distributed technique is also employed to construct the navigation axis indexes for resource description framework (RDF) graph data. Experimental results on DrugBank and BioGRID show that this method improves query efficiency significantly while preserving accuracy, and that it meets the query requirements on Web-scale RDF graph data.

The core idea of the semantic web is to build a machine-understandable data network by giving formal semantics to the data on the web [1]. As the data foundation of the semantic web, resource description framework (RDF) graph data have reached ten billion triples, driven by the linked data movement [2]. RDF is a special kind of graph data model: when ontology-layer semantics are expressed, the edges of an RDF graph can themselves act as nodes, i.e., the set of edge labels may have a nonempty intersection with the set of nodes, which allows a query to obtain more implicit information by inference [3]. As in traditional graph data models, path query has always been a focus and a difficulty of research: on one hand, it can express paths of any length, especially navigational path queries of unbounded length; on the other hand, its high evaluation complexity makes it difficult to meet the query requirements on large-scale RDF graph data [4, 5].

The nested regular expression (NRE) [6] is the latest path query expression with polynomial time complexity.
It can achieve RDF schema (RDFS) semantic inference on the original RDF graph, and it is equivalent to property paths under the existential semantics [7], i.e., the two kinds of expressions can be transformed into each other equivalently under the existential semantics [8]. Other path query expressions, by contrast, either do not involve RDFS semantic inference or achieve the reasoning by a closure-oriented method; representative implementation systems are Gleen [9], SPARQLeR [10], SPARQ2L [10] and PSPARQL [11]. Moreover, NRE has also been proved to have strong expressive power and polynomial-time evaluation complexity [12]. However, with the explosive growth of RDF graph data, even polynomial-time evaluation complexity cannot meet the path query requirements on web-scale RDF graph data [13, 14].

In order to further improve the query efficiency of NRE on large-scale RDF graph data, we first use distributed technology to build the navigation axis index of NRE on RDF graph data, and then collect statistics on the frequency of each NRE navigation axis in the index. Different navigation axes appear in the NRE index on an RDF graph with different frequencies, and when low-frequency navigation axes appear in an NRE expression, the NRE expression can be cut from these ...
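
The two preparatory steps mentioned above, building a navigation axis index and collecting axis frequencies, can be illustrated with a minimal single-machine sketch. It is not the distributed construction used in this work. It assumes the three labelled forward axes `next::a`, `edge::a` and `node::a` as they are commonly defined for NREs over triples, and the function names, the `threshold` parameter and the rule "frequency at most `threshold` means rare" are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict


def build_axis_index(triples):
    """Map each labelled axis step, e.g. ("next", "p"), to its set of (from, to) pairs.

    Assumed axis semantics for a triple (s, p, o):
      next::p relates s to o, edge::o relates s to p, node::s relates p to o.
    """
    index = defaultdict(set)
    for s, p, o in triples:
        index[("next", p)].add((s, o))
        index[("edge", o)].add((s, p))
        index[("node", s)].add((p, o))
    return index


def axis_frequencies(index):
    """Count how many (from, to) pairs each labelled axis step has in the index."""
    return Counter({axis: len(pairs) for axis, pairs in index.items()})


def rare_axes(freq, threshold):
    """Return the axis steps whose frequency is at most `threshold` (hypothetical rule)."""
    return {axis for axis, count in freq.items() if count <= threshold}


# Toy triples with made-up identifiers, for illustration only.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "type", "Person"),
]
index = build_axis_index(triples)
freq = axis_frequencies(index)
rare = rare_axes(freq, threshold=1)
```

In this sketch the rare steps are simply those with few index entries; how the frequency statistics are turned into a rarity criterion in practice is a separate design decision that the sketch does not settle.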