Sunday, November 13, 2016

XPath

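Reference (English)
Reference (Simplified Chinese)
XPath lives in the javax.xml.xpath package
The Document, Element, and NodeList used in this post come from org.w3c.dom; be careful not to import the wrong ones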

※test.xml

<root>
    <first fa="f1">
        <second sa="s1">a1</second>
        <second>
            <third>
                a2
                <fourth>
                    a3
                </fourth>
            </third>
            b1
        </second>
        <second>c1</second>
    </first>
    
    <first fa="f2">
        <second sa="s2">d1</second>
        <first fa="f2">
            e1
            <second sa="s3">
                e2
                <third>e3</third>
            </second>
        </first>
        <second>f1</second>
    </first>
    
    <first fa="f3">
        <second sa="s3">g1</second>
        <second>
            h1
            <first fa="f3">
                h2
                <third>
                    h3
                    <second>
                        h4
                    </second>
                </third>
            </first>
        </second>
        <second>i1</second>
    </first>
</root>

※The XML used for testing


※Test.java

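import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Test {
    public static void main(String[] args) {
        // test.xml is loaded from the classpath
        try (InputStream is = Test.class.getClassLoader().getResourceAsStream("test.xml")) {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(is);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select every <first> whose fa attribute is "f1", at any depth
            NodeList list = (NodeList) xpath.evaluate("//first[@fa='f1']", doc, XPathConstants.NODESET);

            for (int i = 0; i < list.getLength(); i++) {
                Element element = (Element) list.item(i);
                // All descendant <second> elements, not just direct children
                NodeList list2 = element.getElementsByTagName("second");
                for (int j = 0; j < list2.getLength(); j++) {
                    // System.out.println("node name=" + list2.item(j).getNodeName());
                    System.out.println("text content=" + list2.item(j).getTextContent());

                    NamedNodeMap map = list2.item(j).getAttributes();
                    // Prints the Attr node, or null when there is no sa attribute
                    System.out.println("attr=>" + map.getNamedItem("sa"));
                    System.out.println("==========");
                }
            }
        } catch (ParserConfigurationException | SAXException | XPathExpressionException | IOException e) {
            e.printStackTrace();
        }
    }
}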

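※XPathConstants.NODESET specifies the return type; there are five of them
NUMBER, STRING, and BOOLEAN are self-explanatory
NODESET: returns zero or more nodes
NODE: returns zero or one node
The difference between the last two is explained further down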
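
As a rough sketch of the three scalar return types, the snippet below evaluates one expression per type against a small inline document. The class name ReturnTypesSketch, the eval helper, and the cut-down XML are made up for illustration and are not part of Test.java.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReturnTypesSketch {
    // Hypothetical sample: a cut-down version of test.xml
    static final String XML =
            "<root>"
          + "<first fa=\"f1\"><second sa=\"s1\">a1</second><second>c1</second></first>"
          + "<first fa=\"f2\"><second sa=\"s2\">d1</second></first>"
          + "</root>";

    // Parses XML and evaluates expr, asking for the given return type
    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // NUMBER comes back as a Double
        Double count = (Double) eval("count(//second)", XPathConstants.NUMBER);
        // STRING is the text of the first matching node
        String text = (String) eval("//second[@sa='s2']", XPathConstants.STRING);
        // BOOLEAN is true when the expression matches at least one node
        Boolean found = (Boolean) eval("//first[@fa='f3']", XPathConstants.BOOLEAN);

        System.out.println(count); // 3.0
        System.out.println(text);  // d1
        System.out.println(found); // false
    }
}
```

Each call re-parses the document to keep the helper short; real code would parse once and reuse the Document.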

※Result of "//first[@fa='f1']":
text content=a1
attr=>sa="s1"
==========
text content=

a2

a3


b1

attr=>null
==========
text content=c1
attr=>null
==========

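There are three second elements, so the loop runs three times. For the second one, getTextContent() returns the text of everything beneath it (child elements included), line breaks and all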
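
If the line breaks are unwanted, one option is to let XPath's normalize-space() clean up the text. This is only a sketch: the class TextContentSketch, its helper methods, and the inline XML (shaped like the second "second" element above) are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentSketch {
    // Hypothetical sample with nested elements and indentation
    static final String XML =
            "<second>\n  <third>\n    a2\n    <fourth>a3</fourth>\n  </third>\n  b1\n</second>";

    static Element parseRoot() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Raw text: every descendant text node concatenated, whitespace included
    static String raw() {
        return parseRoot().getTextContent();
    }

    // normalize-space(.) trims both ends and collapses each whitespace run to one space
    static String normalized() {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("normalize-space(.)", parseRoot());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + raw() + "]");  // spans several lines
        System.out.println(normalized());       // a2 a3 b1
    }
}
```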


※Result of "//first[@fa='f2']":
text content=d1
attr=>sa="s2"
==========
text content=
e2
e3

attr=>sa="s3"
==========
text content=f1
attr=>null
==========
text content=
e2
e3

attr=>sa="s3"
==========

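Two first elements have fa="f2", and getElementsByTagName("second") returns every descendant second. The outer first yields three and the nested first yields one more (the sa="s3" element again), so the loop runs four times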


※Result of "//first[@fa='f3']":
text content=g1
attr=>sa="s3"
==========
text content=
h1

h2

h3

h4




attr=>null
==========
text content=
h4

attr=>null
==========
text content=i1
attr=>null
==========
text content=
h4

attr=>null
==========

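Two first elements have fa="f3". The outer one has four second descendants;
the nested one then yields its single second (h4) once more, so the loop runs five times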
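
The repeated entries come from getElementsByTagName, which walks all descendants. If only the direct children are wanted, one possible alternative is to evaluate the relative expression "second" with each matched element as the context. The class ChildStepSketch, its visited method, and the inline XML below are hypothetical and only mimic the fa="f2" case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildStepSketch {
    // Hypothetical sample: a first[@fa='f2'] nested inside another one
    static final String XML =
            "<root><first fa=\"f2\">"
          + "<second sa=\"s2\">d1</second>"
          + "<first fa=\"f2\"><second sa=\"s3\">e2</second></first>"
          + "<second>f1</second>"
          + "</first></root>";

    // Counts the <second> nodes visited across every first[@fa='f2'].
    // childrenOnly=true uses the relative XPath "second" (direct children);
    // childrenOnly=false uses getElementsByTagName (all descendants).
    static int visited(boolean childrenOnly) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList firsts = (NodeList) xpath.evaluate(
                    "//first[@fa='f2']", doc, XPathConstants.NODESET);
            int total = 0;
            for (int i = 0; i < firsts.getLength(); i++) {
                Element first = (Element) firsts.item(i);
                NodeList seconds = childrenOnly
                        ? (NodeList) xpath.evaluate("second", first, XPathConstants.NODESET)
                        : first.getElementsByTagName("second");
                total += seconds.getLength();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visited(false)); // 4: the nested second is counted twice
        System.out.println(visited(true));  // 3: each second is counted once
    }
}
```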


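※Difference between "//" and "/"

//: searches at every level of the document
/: searches only from the document root, whose only child element in the example above is root

Take "//first[@fa='f2']" as an example: changing it to "/first[@fa='f2']" returns nothing, because the root element is root
Changing it to "/root/first[@fa='f2']" does find a match, but because each step only looks at direct children, the loop runs only three times; the iteration for the nested first with fa="f2" is gone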
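
To make the three variants easy to compare, here is a small sketch that just counts the matches for each expression. The class PathAxisSketch, the count helper, and the inline XML are assumptions for illustration; the XML only reproduces the nested fa="f2" shape from test.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathAxisSketch {
    // Hypothetical sample: one first[@fa='f2'] under root, another nested inside it
    static final String XML =
            "<root><first fa=\"f2\">"
          + "<second>d1</second>"
          + "<first fa=\"f2\"><second>e2</second></first>"
          + "</first></root>";

    // Returns how many nodes the expression selects
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//first[@fa='f2']"));     // 2: any depth
        System.out.println(count("/first[@fa='f2']"));      // 0: the root element is root
        System.out.println(count("/root/first[@fa='f2']")); // 1: direct child of root only
    }
}
```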

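※With "/root/first" from the example above, the NodeList has length 3, meaning every first directly under root is selected. To get only the second one, use "/root/first[2]"; note that positions start at 1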
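
A minimal sketch of the 1-based position predicate, assuming a simplified document with three empty first elements. The class PositionSketch and its helpers are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionSketch {
    // Hypothetical sample: three first elements directly under root
    static final String XML =
            "<root><first fa=\"f1\"/><first fa=\"f2\"/><first fa=\"f3\"/></root>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Number of nodes selected by expr
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return ((NodeList) xpath.evaluate(expr, parse(), XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of expr (here used to read an attribute)
    static String text(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/root/first"));        // 3
        System.out.println(text("/root/first[1]/@fa"));  // f1, positions start at 1
        System.out.println(text("/root/first[2]/@fa"));  // f2
    }
}
```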


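※Difference between XPathConstants.NODESET and NODE

※Expressions written with "/" or "//" can also return a single Node: in the code above, change XPathConstants.NODESET to XPathConstants.NODE, as follows: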
try (InputStream is = Test.class.getClassLoader().getResourceAsStream("test.xml")) {
    DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = db.parse(is);
    XPath xpath = XPathFactory.newInstance().newXPath();
    // NODE yields a single node (the first match), or null when nothing matches
    Element element = (Element) xpath.evaluate("//first[@fa='f2']", doc, XPathConstants.NODE);
    if (element != null) {
        NodeList list2 = element.getElementsByTagName("second");

        System.out.println(list2.getLength());
        for (int j = 0; j < list2.getLength(); j++) {
            System.out.println("text content=" + list2.item(j).getTextContent());

            NamedNodeMap map = list2.item(j).getAttributes();
            System.out.println("attr=>" + map.getNamedItem("sa"));
            System.out.println("==========");
        }
    }
} catch (ParserConfigurationException | SAXException | XPathExpressionException | IOException e) {
    e.printStackTrace();
}

※With NODE the result is cast to Element; with NODESET it is cast to NodeList
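
The two return types can be put side by side as below. This is a sketch under assumptions: the class NodeVsNodeSetSketch, the eval helper, and the inline XML are invented for illustration, and the fa='none' expression is only there to show what happens when nothing matches.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeVsNodeSetSketch {
    // Hypothetical sample: one first[@fa='f2'] under root, another nested inside it
    static final String XML =
            "<root><first fa=\"f2\">"
          + "<second>d1</second>"
          + "<first fa=\"f2\"><second>e2</second></first>"
          + "</first></root>";

    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // NODESET: every match, cast to NodeList
        NodeList all = (NodeList) eval("//first[@fa='f2']", XPathConstants.NODESET);
        // NODE: a single match, cast to Node (or Element)
        Node one = (Node) eval("//first[@fa='f2']", XPathConstants.NODE);
        // No match: NODE gives null, NODESET gives an empty list
        Node none = (Node) eval("//first[@fa='none']", XPathConstants.NODE);
        NodeList empty = (NodeList) eval("//first[@fa='none']", XPathConstants.NODESET);

        System.out.println(all.getLength());                    // 2
        System.out.println(one.getParentNode().getNodeName());  // root (the outer first)
        System.out.println(none);                               // null
        System.out.println(empty.getLength());                  // 0
    }
}
```

Because NODE can return null, it is worth checking the result before calling methods on it.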
