BeautifulSoup parser skips some tags separated by text content

Question

Hello! I'm practicing web scraping in Python. I set myself the task of scraping all the information from every quote block on the Quotes to Scrape site. Getting the quote itself or the "(about)" link was no problem. However, I can't get the "(Goodreads page)" link, because the BeautifulSoup parser leaves out the tag I need. All the information I need is stored on the site in blocks like this:

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">
          “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
        </span>
        <span>
          by 
          <small class="author" itemprop="author">Albert Einstein</small>
          <a href="/author/Albert-Einstein">(about)</a>
          - 
          <a href="http://goodreads.com/author/show/9810.Albert_Einstein">(Goodreads page)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world"> 
            
            <a class="tag" href="/tag/change/page/1/">change</a>
            
            <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
            
            <a class="tag" href="/tag/thinking/page/1/">thinking</a>
            
            <a class="tag" href="/tag/world/page/1/">world</a>
            
        </div>
    </div>
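One quick way to tell whether the parser is at fault is to parse exactly this fragment offline, with no network request involved. The sketch below assumes only that `bs4` is installed. It uses the built-in `html.parser` to avoid extra dependencies, and `FRAGMENT` is just an illustrative copy of the block above. It looks up the Goodreads link both by its text and by an `href` containing `goodreads.com`:

```python
from bs4 import BeautifulSoup

# Illustrative copy of the quote block above (tags block omitted).
FRAGMENT = """
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
  <span class="text" itemprop="text">“The world as we have created it is a process of our thinking.”</span>
  <span>
    by
    <small class="author" itemprop="author">Albert Einstein</small>
    <a href="/author/Albert-Einstein">(about)</a>
    -
    <a href="http://goodreads.com/author/show/9810.Albert_Einstein">(Goodreads page)</a>
  </span>
</div>
"""

soup = BeautifulSoup(FRAGMENT, "html.parser")
quote = soup.find("div", class_="quote")

# The second direct <span> child holds "by", the author and both links.
author_span = quote.find_all("span", recursive=False)[1]
links = author_span.find_all("a")
print(len(links))  # prints 2: the "-" text between the links does not hide the second one

# Option 1: match the link by its exact text.
by_text = quote.find("a", string="(Goodreads page)")
# Option 2: CSS selector on the href instead of the text.
by_href = quote.select_one('span a[href*="goodreads.com"]')

print(by_text["href"])
print(by_href["href"])
```

If both lookups succeed on this fragment but not on the downloaded page, the difference lies in the HTML the server sent, not in how BeautifulSoup parses it.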

I need the second a tag, the one after the dash inside the second span. Why does BeautifulSoup leave it out, and how can I get it? Here is an example of my code:

from requests import Session
from bs4 import BeautifulSoup

headers = {
'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36'
}
with Session() as our_session :

    our_session.get('https://quotes.toscrape.com/', headers=headers)

    response = our_session.get(
    'https://quotes.toscrape.com/login',
    headers=headers
    )
    soup_for_login = BeautifulSoup(response.text, 'html5lib')

    our_csrf_token = soup_for_login.find('form').find('input').get('value')

    data = {
            'crsf_token': our_csrf_token,
            'password': '******',
            'username': 'Somelogin'
            }
    main_page_response = our_session.post('https://quotes.toscrape.com/login',
                     headers=headers,
                     data=data,
                     allow_redirects=True)

    # Now that we have logged in with the CSRF token, we can try to scrape the data we are interested in.
    
    n = 1
    tmp = True
    while tmp is True :
        res = our_session.get(f'https://quotes.toscrape.com/page/{n}/', headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        print(soup) # here you can see that the tag I'm looking for is simply not in the structure :(
        
        # list_of_blocks = soup.find_all('div', class_='quote')
        # tmp = True if len(list_of_blocks) != 0 else False
        # for item in list_of_blocks:
        #     quote = item.find('span', class_='text').text
        #     author = item.find('small', class_='author').text
        #     about_ref = f'https://quotes.toscrape.com{item.find_all('span')[1].find('a', string='(about)').get('href')}'
        #     goodreads_ref = item.find_all('span')#.find('a', string='(Goodreads page)').get('href')
        #     print(f'{quote}\n{author}\n{about_ref}\n{goodreads_ref}')
        #n += 1
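A detail in the login step that may matter: the token's value is taken from the first `<input>` of the form, but the `data` dictionary sends it under the key `'crsf_token'`. As far as I can tell, the hidden field on the Quotes to Scrape login page is named `csrf_token` (letters in a different order), so the server would not find its token and the login would likely be rejected without any exception being raised. The sketch below, a hedged suggestion rather than a confirmed fix, copies every hidden input of the form under its own `name` attribute instead of a hand-typed key. The helper names `build_login_payload`, `is_logged_in` and `log_in` are hypothetical, and treating a link to `/logout` as the sign of a successful login is an assumption about the site's markup.

```python
from bs4 import BeautifulSoup

LOGIN_URL = "https://quotes.toscrape.com/login"


def build_login_payload(login_html: str, username: str, password: str) -> dict:
    """Collect the form's hidden fields (e.g. the CSRF token) under their real names."""
    form = BeautifulSoup(login_html, "html.parser").find("form")
    payload = {
        field["name"]: field.get("value", "")
        for field in form.find_all("input", type="hidden")
        if field.has_attr("name")
    }
    payload.update(username=username, password=password)
    return payload


def is_logged_in(page_html: str) -> bool:
    """Assumption: a logged-in page contains a link to /logout."""
    soup = BeautifulSoup(page_html, "html.parser")
    return soup.find("a", href="/logout") is not None


def log_in(session, username: str, password: str, login_url: str = LOGIN_URL) -> bool:
    """Log in with a requests.Session; returns True if the resulting page looks logged in."""
    login_page = session.get(login_url)
    payload = build_login_payload(login_page.text, username, password)
    # requests follows the redirect after a successful POST by default.
    response = session.post(login_url, data=payload)
    return is_logged_in(response.text)
```

With the session from the question this would be called as `log_in(our_session, 'Somelogin', '******')`; checking its result before the scraping loop makes a failed login visible instead of silently producing anonymous pages.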

This is what I get:

#...
#<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
#<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
#<span>by <small class="author" itemprop="author">Albert Einstein</small>
#<a href="/author/Albert-Einstein">(about)</a>
#</span>        <----------------
#<div class="tags">
#            Tags:
#            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
#<a class="tag" href="/tag/change/page/1/">change</a>
#<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
#<a class="tag" href="/tag/thinking/page/1/">thinking</a>
#<a class="tag" href="/tag/world/page/1/">world</a>
#</div>
#...

The tag I need is missing.
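The output above is what `print(soup)` shows for the HTML that actually arrived, and the `-` text before the link is missing too, which a parser would have no reason to drop. That suggests the link is absent from the server's response itself; checking `'(Goodreads page)' in res.text` before parsing would confirm it. As far as I know, Quotes to Scrape renders the Goodreads links only for logged-in users, so an unsuccessful login would explain the symptom. Below is a sketch of a per-page parser that tolerates the missing link. It assumes the block layout shown above and a `li.next` "Next" pagination link on each page (taken from the live site, not guaranteed to stay that way); `parse_quotes`, `next_page_url` and `scrape_all` are hypothetical helper names.

```python
from typing import List, Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com/"


def parse_quotes(page_html: str, base_url: str = BASE_URL) -> List[dict]:
    """Extract quote, author, (about) URL and Goodreads URL from each quote block."""
    soup = BeautifulSoup(page_html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="quote"):
        about = block.find("a", string="(about)")
        goodreads = block.find("a", string="(Goodreads page)")
        records.append({
            "quote": block.find("span", class_="text").get_text(strip=True),
            "author": block.find("small", class_="author").get_text(strip=True),
            "about": urljoin(base_url, about["href"]) if about else None,
            # None means the server did not send the link (e.g. not logged in).
            "goodreads": goodreads["href"] if goodreads else None,
        })
    return records


def next_page_url(page_html: str, base_url: str = BASE_URL) -> Optional[str]:
    """Return the absolute URL behind the "Next" button, or None on the last page."""
    link = BeautifulSoup(page_html, "html.parser").select_one("li.next > a")
    return urljoin(base_url, link["href"]) if link else None


def scrape_all(session, start_url: str = BASE_URL) -> List[dict]:
    """Walk all pages with an already logged-in requests.Session."""
    results, url = [], start_url
    while url:
        html = session.get(url).text
        results.extend(parse_quotes(html, url))
        url = next_page_url(html, url)
    return results
```

Following the "Next" link ends the loop on the last page without requesting an empty one, and the explicit `None` values make it easy to spot pages that were served without the Goodreads links.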
