Web Crawling

[AI SCHOOL 5th Cohort] Web Crawling Practice - Web Crawling

Wadis sold-out product stock check

Google Mail setup

    import smtplib
    from email.mime.text import MIMEText

    def sendMail(sender, receiver, msg):
        smtp = smtplib.SMTP_SSL('smtp.gmail.com', 465)
        smtp.login(sender, 'your google app password')

        msg = MIMEText(msg)
        msg['Subject'] = 'Product is available!'

        smtp.sendmail(sender, receiver, msg.as_string())
        smtp.quit()

Wadis product stock check

    # Library declarations
    check_status = 1
    url = 'https://www....
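As a hedged variation on the mail helper above, the sketch below separates building the message from sending it, so the message can be inspected without a network connection. The function names `build_message` and `send_message`, the `From`/`To` headers, and the example addresses are assumptions for illustration and are not part of the original post.

```python
import smtplib
from email.mime.text import MIMEText


def build_message(sender, receiver, body, subject='Product is available!'):
    """Build a plain-text email; no network access is needed here."""
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = sender      # assumption: the original sets only Subject
    msg['To'] = receiver
    return msg


def send_message(msg, app_password):
    """Send via Gmail over SSL; needs a real Google app password."""
    # The context manager closes the connection when the block exits.
    with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
        smtp.login(msg['From'], app_password)
        smtp.send_message(msg)


# Hypothetical addresses, used only to show the resulting headers.
example = build_message('me@example.com', 'you@example.com', 'Back in stock.')
print(example['Subject'])
```

Calling `send_message(example, '<app password>')` would perform the actual delivery; it is left out of the runnable part because it requires real credentials.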

[AI SCHOOL 5th Cohort] Web Crawling Practice - Selenium

Selenium

A tool used to test browser functionality.
It is also used when a browser needs to be controlled programmatically.

Import Libraries

    # Automatically download the Chrome driver file
    from webdriver_manager.chrome import ChromeDriverManager

    # Connect the Chrome driver to the file
    from selenium.webdriver.chrome.service import Service
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    from bs4 import BeautifulSoup
    import time
    import pandas as pd

    import warnings
    warnings....

[AI SCHOOL 5th Cohort] Web Crawling Practice - Advanced Web Scraping

Import Libraries

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    from datetime import datetime
    import time  # time.sleep()
    import re

Extracting Naver News articles from news search results

Analyzing the Naver news search result URL

    https://search.naver.com/search.naver?
    where=news&
    sm=tab_jum&
    query=데이터분석

Loading the Naver news search URL

    query = input()  # e.g. 데이터분석 ("data analysis")

    url = f'https://search.naver.com/search.naver?where=news&query={query}'
    web = requests....
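The f-string above interpolates the search term into the URL as typed. A minimal sketch of building the same URL with explicit percent-encoding follows, using only the standard library. The helper name `build_search_url` is hypothetical. `requests` generally encodes non-ASCII characters on its own, but `urllib.request.urlopen` does not accept them, so encoding up front keeps the URL usable with either.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://search.naver.com/search.naver'


def build_search_url(query):
    """Return the news-search URL with the query percent-encoded (UTF-8)."""
    params = {'where': 'news', 'query': query}
    return f'{BASE_URL}?{urlencode(params)}'


url = build_search_url('데이터분석')
print(url)

# Decoding the query string gives back the original search term.
decoded = parse_qs(urlsplit(url).query)
print(decoded['query'][0])
```

Keeping the parameters in a dictionary also makes it easy to add further query keys later without hand-editing the URL string.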

[AI SCHOOL 5th Cohort] Web Crawling Practice - Web Scraping Basics

BeautifulSoup Library

    from bs4 import BeautifulSoup
    from urllib.request import urlopen

Printing the search results for a word

Loading the Daum dictionary URL

    # Enter the word to look up
    word = 'happiness'

    url = f'https://alldic.daum.net/search.do?q={word}'
    web = urlopen(url)
    web_page = BeautifulSoup(web, 'html.parser')

Printing the searched word

    text_search = web_page.find('span', {'class': 'txt_emph1'})
    print(f'Search word: {text_search.get_text()}')

Printing the meanings of the word

    list_search = web_page.find('ul', {'class': 'list_search'})
    list_text = list_search....

[AI SCHOOL 5th Cohort] Web Crawling

Web Crawling vs Web Scraping

Web Crawling: a bot traversing the web by following links
Web Scraping: extracting the desired data from a web page

HTML Tags

Tag’s Name: html, head, body, p, span, li, ol, ul, div
Tag’s Attribute: class, id, style, href, src

The Process of Web Scraping

1. Analyze the URL (query types, etc.)
2. Construct the URL
3. Get the HTTP response (urlopen(URL) or requests.get(URL).content)
4. Parse the response into HTML (BeautifulSoup(HTTP Response, 'html.parser'))
5. Extract an HTML tag (.find('tag_name', {'attr_name': 'attr_value'}))
6. Extract the text or attribute values from the tag (Tag....
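The final steps of the process (parse, find a tag, pull out its text or attribute values) can be tried without a network call. The sketch below is a stand-in under stated assumptions: it uses the standard library's `html.parser.HTMLParser` rather than BeautifulSoup, and the class `FirstMatch` and the sample HTML are invented for illustration. It roughly mimics what `.find('tag_name', {'attr_name': 'attr_value'})` locates, with the simplification that the attribute value must match exactly.

```python
from html.parser import HTMLParser


class FirstMatch(HTMLParser):
    """Collect the text and attributes of the first matching tag."""

    def __init__(self, tag, attr_name, attr_value):
        super().__init__()
        self.target = (tag, attr_name, attr_value)
        self.depth = 0      # greater than 0 while inside the matched tag
        self.done = False   # True once the first match has been closed
        self.attrs = {}
        self.text = ''

    def handle_starttag(self, tag, attrs):
        name, attr_name, attr_value = self.target
        if self.depth:
            # Count nested tags of the same name so the right end tag closes us.
            if tag == name:
                self.depth += 1
        elif (not self.done and tag == name
              and dict(attrs).get(attr_name) == attr_value):
            self.depth = 1
            self.attrs = dict(attrs)

    def handle_endtag(self, tag):
        if self.depth and tag == self.target[0]:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.text += data


# Invented sample markup; a real page would come from the HTTP response.
html = ('<ul><li><a class="link" href="/news/1">First <b>headline</b></a></li>'
        '<li><a class="link" href="/news/2">Second</a></li></ul>')

parser = FirstMatch('a', 'class', 'link')
parser.feed(html)
print(parser.text)            # text inside the first matching tag
print(parser.attrs['href'])   # an attribute value of that tag
```

With BeautifulSoup installed, the equivalent lookups would be `tag.get_text()` for the text and `tag['href']` for the attribute value.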