IT源码网

Java: retrieving data from nested hrefs with jsoup

bluestorm 2024-09-07 Programmer

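I want to retrieve data from nested hrefs with jsoup. What I mean is this: I have the link https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999

I want to get data for each of the 10 fighters listed there, for example:

1. Stipe Miocic Age: 37 or Association: Strong Style Fight Team

2. Daniel Cormier Age: 40 or Association: American Kickboxing Academy

and so on.

How can I do this?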

    String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999"; 
    Document document = Jsoup.connect(url).get(); 
 
    Elements allH1 = document.select("h2"); 
    for (Element href : allH1) { 
 
        Elements allAge = document.select("div.birth_info"); 
        for (Element  age : allAge) { 
            System.out.println(href.select("a[href]").text());
            System.out.println(age.select(/* something there? */));
        }
    }

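Consider the following approach:

The data you are looking for is not on the rankings page itself. Each fighter has their own page, so you have to scrape those pages one by one to get the data.
First, use the selector h2 > a[href] to get the link to each fighter's page: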

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999"; 
Document document = Jsoup.connect(url).get(); 
Elements fighters = document.select("h2 > a[href]"); 
for (Element fighter : fighters) { 
     System.out.println(fighter.text() + " " + fighter.attr("href")); 
} 

After that, you can load each fighter's page and extract the data:

String fighterUrl = "https://www.sherdog.com" + fighter.attr("href");  
Document doc = Jsoup.connect(fighterUrl).get(); 
Element fighterData = doc.select("div.data").first(); 
System.out.println(fighterData.text()); 
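
The snippet above builds the fighter URL by string concatenation, which only works while every href is site-relative. As a minimal sketch of a more defensive alternative, the standard library's `java.net.URI.resolve` handles both relative and absolute hrefs. The class name `HrefResolver` and the path `/fighter/Example-Fighter-1` are hypothetical, chosen only for illustration.

```java
import java.net.URI;

class HrefResolver {
    // Resolves an href (relative or absolute) against the URL of the page it was found on.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
        // Hypothetical hrefs: a site-relative one and an already absolute one.
        System.out.println(resolve(page, "/fighter/Example-Fighter-1"));
        System.out.println(resolve(page, "https://www.sherdog.com/fighter/Example-Fighter-1"));
        // Both lines print https://www.sherdog.com/fighter/Example-Fighter-1
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for hrefs containing illegal characters such as spaces. In jsoup itself, `fighter.absUrl("href")` serves a similar purpose when the document was loaded with a base URL, as it is with `Jsoup.connect(url).get()`.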

Putting it together, you get:

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
Document document = Jsoup.connect(url).get();
// Each fighter's name is a link that is a direct child of an <h2>.
Elements fighters = document.select("h2 > a[href]");
for (Element fighter : fighters) {
    System.out.println(fighter.text());
    // The concatenation assumes the href is site-relative.
    String fighterUrl = "https://www.sherdog.com" + fighter.attr("href");
    Document doc = Jsoup.connect(fighterUrl).get();
    // first() returns null when nothing matches, so guard before calling text().
    Element fighterData = doc.select("div.data").first();
    if (fighterData != null) {
        System.out.println(fighterData.text());
    }
    System.out.println("---------------");
}

The (partial) output is:

Stipe Miocic Born: 1982-08-19 AGE: 37 Independence, Ohio United States Height 6'4" 193.04 cm Weight 245 lbs 111.13 kg Association: Strong Style Fight Team Class: Heavyweight Wins 19 15 KO/TKO (79%) 0 SUBMISSIONS (0%) 4 DECISIONS (21%) Losses 3 2 KO/TKO (67%) 0 SUBMISSIONS (0%) 1 DECISIONS (33%)

Daniel Cormier Born: 1979-03-20 AGE: 40 San Jose, California United States Height 5'11" 180.34 cm Weight 251 lbs 113.85 kg Association: American Kickboxing Academy Class: Heavyweight Wins 22 10 KO/TKO (45%) 5 SUBMISSIONS (23%) 7 DECISIONS (32%) Losses 2 1 KO/TKO (50%) 0 SUBMISSIONS (0%) 1 DECISIONS (50%) N/C 1

If you want the age, association, and so on as separate fields, you will have to extract them from this text with regular expressions.
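
As a rough sketch of that last step, the following uses `java.util.regex` to pull the age and association out of the flattened text. It assumes the text keeps the layout shown in the output above, that is, `AGE:` followed by digits and `Association:` followed by the team name and then `Class:`. The class name `FighterFieldParser` and the map keys are made up for this example, and the sample string is an abbreviated copy of the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FighterFieldParser {
    // Matches "AGE: 37" and captures the digits.
    private static final Pattern AGE = Pattern.compile("AGE:\\s*(\\d+)");
    // Captures everything between "Association:" and the following "Class:" label.
    private static final Pattern ASSOCIATION = Pattern.compile("Association:\\s*(.+?)\\s+Class:");

    // Returns the fields that were found; a missing field is simply absent from the map.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher age = AGE.matcher(text);
        if (age.find()) {
            fields.put("age", age.group(1));
        }
        Matcher association = ASSOCIATION.matcher(text);
        if (association.find()) {
            fields.put("association", association.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Born: 1982-08-19 AGE: 37 Independence, Ohio United States "
                + "Weight 245 lbs 111.13 kg Association: Strong Style Fight Team Class: Heavyweight Wins 19";
        System.out.println(parse(sample));
        // prints {age=37, association=Strong Style Fight Team}
    }
}
```

In the combined loop, `FighterFieldParser.parse(fighterData.text())` could replace the plain `println` call. The patterns are tied to the current wording of the page, so they would need adjusting if the site changes its labels.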

