How to Build a Pop Mart Blind Box Bot | Complete Automation Guide

How to Build a Pop Mart Blind Box Sniping Bot: A Complete Automation Guide for Collectors

Highly sought-after blind box releases often sell out within moments, leaving countless enthusiasts losing out to automated scripts. By building a dedicated automation tool, everyday collectors can achieve millisecond-level response times, monitor inventory in real time, and compete on equal footing with resellers who use professional tooling. Ready to secure those coveted rare collectibles?

Why Automate Blind Box Purchases?

Among Pop Mart's dazzling lineup of designer toys, the Crybaby blind box series stands out for its distinctive appeal. Created by artist Molly Yllom, the series deftly captures complex emotional expressions and resonates deeply with collectors who value mental health and emotional connection.

The series has launched several popular themes, such as "Crying Again", "Tears Factory", and "Crying for Love", and has collaborated with well-known brands such as The Powerpuff Girls. Each theme typically includes 6 to 12 distinct designs, while the highly anticipated secret editions appear roughly once in every 72 to 288 blind boxes, with the exact rarity depending on the series.
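To put those odds in perspective, here is a rough sketch of the chance of pulling at least one secret edition from a given number of boxes. It assumes every box is an independent draw at the stated rate, which is a simplification (real case packing may not behave this way), and the function name is purely illustrative.

```python
def chance_of_secret(boxes_opened: int, odds_one_in: int) -> float:
    """Probability of at least one secret edition, assuming independent draws."""
    if boxes_opened < 0 or odds_one_in < 1:
        raise ValueError("boxes_opened must be >= 0 and odds_one_in >= 1")
    per_box = 1 / odds_one_in
    # Complement of "every single box misses"
    return 1 - (1 - per_box) ** boxes_opened


if __name__ == "__main__":
    for odds in (72, 288):
        for boxes in (12, 72):
            chance = chance_of_secret(boxes, odds)
            print(f"1 in {odds}, {boxes} boxes opened: {chance:.1%}")
```

Under this model, even opening as many boxes as the stated odds (72 boxes at 1 in 72) leaves a substantial chance of missing the secret edition, which helps explain why demand for these releases is so intense.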

For blind box collectors, buying manually is difficult, largely because of how deeply fans care about these pieces. The figures are not simple consumer goods; they are meaningful parts of a personal collection. That emotional attachment drives demand far beyond actual supply, which makes automated purchasing an important strategy for securing a release.

Why Manual Blind Box Purchases Usually Fail

A closer look at the technical barriers that manual buyers face during blind box releases shows why automation tools have become so important.

Pop Mart's official website deploys several layers of anti-automation mechanisms, including CAPTCHA challenges, per-IP rate limiting, and region-based access controls. During peak purchase windows, when demand surges, the servers are often overloaded by heavy traffic, so pages load very slowly or fail entirely, making manual checkout nearly impossible.
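Because an overloaded server tends to respond slowly or fail intermittently, a client usually copes better by retrying with growing pauses than by reloading as fast as possible. The following is a minimal sketch of exponential backoff with jitter; `retry_with_backoff` and its parameters are hypothetical helpers written for illustration and are not part of Playwright or any other library.

```python
import random
import time


def retry_with_backoff(action, max_attempts=5, base_delay=1.0, cap=30.0,
                       sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is exhausted.

    After the n-th failure (counting from 0) the pause is a random value
    between 0 and min(cap, base_delay * 2**n) seconds ("full jitter").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as error:  # real code should catch narrower errors
            if attempt == max_attempts - 1:
                raise RuntimeError(f"all {max_attempts} attempts failed") from error
            upper_bound = min(cap, base_delay * (2 ** attempt))
            sleep(random.uniform(0, upper_bound))


if __name__ == "__main__":
    outcomes = iter([ConnectionError("busy"), ConnectionError("busy"), "page loaded"])

    def flaky_request():
        # Stand-in for a page load: fails twice, then succeeds
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(retry_with_backoff(flaky_request, base_delay=0.1))
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting.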

import asyncio
import json
import os
from playwright.async_api import async_playwright
import sys

# Define target keywords for filtering relevant products
TARGET_KEYWORDS = ["CRYBABY", "Crybaby"]
# Base URL of the Pop Mart website
BASE_URL = "https://www.popmart.com"
# Output file path for saving scraped product data
OUTPUT_FILE = os.path.join("data", "products.json")

# Nstproxy proxy service configuration (please replace with your actual credentials)
# This information can usually be found in the "Proxy Setup" section of your Nstproxy user dashboard
NSTPROXY_USERNAME = "your_nstproxy_username"
NSTPROXY_PASSWORD = "your_nstproxy_password"
NSTPROXY_HOST = "gate.nstproxy.io" # Nstproxy gateway address, may vary
NSTPROXY_PORT = "24125" # Nstproxy port number, may vary

async def scrape_popmart_new_arrivals():
    print("New product information scraping task started...")

    try:
        async with async_playwright() as p:
            # Playwright expects proxy credentials in separate "username" and
            # "password" fields rather than embedded in the server URL
            proxy_settings = {
                "server": f"http://{NSTPROXY_HOST}:{NSTPROXY_PORT}",
                "username": NSTPROXY_USERNAME,
                "password": NSTPROXY_PASSWORD,
            }

            # Launch Chromium browser instance and configure Nstproxy proxy
            browser_instance = await p.chromium.launch(
                headless=True,  # Run in headless mode for efficiency
                proxy=proxy_settings,
            )

            # Create a new browser context and configure the proxy again to ensure all requests go through the proxy
            browser_context = await browser_instance.new_context(
                proxy=proxy_settings,
            )
            page_instance = await browser_context.new_page()
            # Navigate to Pop Mart new arrivals page, set timeout
            await page_instance.goto("https://www.popmart.com/us/new-arrivals", timeout=30000)
            # Wait for specific elements to load, indicating page content is mostly rendered
            await page_instance.wait_for_selector("div.index_title__jgc2z")

            # Attempt to handle and close potential location selection pop-up window
            try:
                location_popup_selector = "div.index_siteCountry___tWaj"
                # Briefly wait for pop-up to appear, no exception if not present
                await page_instance.wait_for_selector(location_popup_selector, timeout=2000)
                await page_instance.click(location_popup_selector)
                print("Location selection pop-up closed successfully.")
            except Exception:
                print("No location selection pop-up detected, continuing execution.")

            # Attempt to handle and close potential policy acceptance window
            try:
                policy_accept_selector = "div.policy_acceptBtn__ZNU71"

                # Wait for policy acceptance button to be visible
                await page_instance.wait_for_selector(policy_accept_selector, timeout=8000, state="visible")

                policy_button = await page_instance.query_selector(policy_accept_selector)

                if policy_button:
                    await asyncio.sleep(1)  # Give a small buffer time to ensure JavaScript is loaded
                    await policy_button.click()
                    print("Policy acceptance button clicked successfully.")
                else:
                    print("Could not find policy acceptance button.")
            except Exception as e:
                print(f"Policy acceptance pop-up did not appear or click failed: {e}")

            collected_results = []

            # Find all sections containing new product information
            info_sections = await page_instance.query_selector_all("div.index_title__jgc2z")

            for section_element in info_sections:
                release_date_text = (await section_element.text_content()).strip()

                # Get the product list container adjacent to the current section
                next_sibling_element = await section_element.evaluate_handle("el => el.nextElementSibling")
                product_card_elements = await next_sibling_element.query_selector_all("div.index_productCardCalendarContainer__B96oH")

                for card_element in product_card_elements:
                    # Extract product title
                    title_span = await card_element.query_selector("div.index_title__9DEwH span")
                    product_title = await title_span.text_content() if title_span else ""
                    # Filter products by keywords
                    if not any(keyword.lower() in product_title.lower() for keyword in TARGET_KEYWORDS):
                        continue

                    # Extract release time
                    time_div = await card_element.query_selector("div.index_time__EyE6b")
                    release_time_text = await time_div.text_content() if time_div else "N/A"

                    # Extract product URL
                    link_element = await card_element.query_selector("a[href^='/us']")
                    product_href = await link_element.get_attribute("href") if link_element else None
                    full_product_url = f"{BASE_URL}{product_href}" if product_href else "N/A"

                    # Organize scraped data
                    product_entry = {
                        "title": product_title.strip(),
                        "release_date": release_date_text.strip(),  # e.g., "Upcoming JUL 11"
                        "release_time": release_time_text.strip(),     # e.g., "09:00"
                        "url": full_product_url
                    }
                    collected_results.append(product_entry)

            await browser_instance.close()

            # Save scraping results as a JSON file
            os.makedirs("data", exist_ok=True)
            with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
                json.dump(collected_results, f, indent=2, ensure_ascii=False)

            print(f"Successfully scraped {len(collected_results)} matching products. Data saved to {OUTPUT_FILE}")
    except Exception as e:
        print(f"Error during new product scraping: {e}")
        sys.exit(1)  # Exit with error code 1 on task failure


if __name__ == "__main__":
    asyncio.run(scrape_popmart_new_arrivals())
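The keyword filter inside the scraper above can also be pulled out into a small pure function, which makes it possible to check the matching logic without launching a browser. This is only a sketch: `matches_keywords` is an illustrative name, and the sample titles are made up.

```python
TARGET_KEYWORDS = ["CRYBABY", "Crybaby"]


def matches_keywords(title, keywords=TARGET_KEYWORDS):
    """Case-insensitive substring match, mirroring the scraper's filter."""
    lowered_title = title.lower()
    return any(keyword.lower() in lowered_title for keyword in keywords)


if __name__ == "__main__":
    # Made-up titles, used only to exercise the filter
    sample_titles = [
        "CRYBABY Crying Again Series Figures",
        "crybaby plush pendant",
        "Some Other Series Blind Box",
    ]
    for sample in sample_titles:
        print(f"{sample!r}: {matches_keywords(sample)}")
```

Since both sides are lowercased before comparison, listing both "CRYBABY" and "Crybaby" is redundant; a single entry would match the same titles.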

import json
import os
import subprocess
import sys
import time
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler

# Data file path and retry parameters
DATA_FILE = os.path.join("data", "products.json")
MAX_RETRIES = 5
RETRY_DELAY = 10  # seconds between retries

def parse_product_release_datetime(date_string, time_string):
    # Convert strings such as "Upcoming JUL 11" and "09:00" into a datetime object, defaulting to the current year.
    try:
        # Strip status labels from the date string (adjust these to match the labels the site actually shows)
        for keyword_to_remove in ["Upcoming", "In Stock"]:
            date_string = date_string.replace(keyword_to_remove, "").strip()

        # Combine the date, year, and time strings and parse them into a datetime object
        full_datetime_string = f"{date_string} {datetime.now().year} {time_string}"
        # Example format: "JUL 11 2025 09:00"
        return datetime.strptime(full_datetime_string, "%b %d %Y %H:%M")
    except Exception as e:
        print(f"Could not parse datetime, source string: '{date_string} {time_string}', error: {e}")
        return None

def initiate_purchase_bot(product_details):
    # Launch the purchase-bot.py script with built-in retry logic
    product_url = product_details.get("url")
    product_title = product_details.get("title")

    for attempt_count in range(MAX_RETRIES + 1):  # Includes the first attempt as well as the retries
        print(f"Starting purchase bot for product '{product_title}' (attempt {attempt_count + 1}/{MAX_RETRIES + 1})...")
        try:
            # Assumes purchase-bot.py is already configured correctly with the Nstproxy proxy
            subprocess.run(["python3", "purchase-bot.py", product_url], check=True)
            print(f"Purchase bot ran successfully for product '{product_title}'.")
            return  # Exit immediately on success
        except subprocess.CalledProcessError as e:
            print(f"Purchase bot run failed (attempt {attempt_count + 1}), exit code: {e.returncode}")
            if attempt_count < MAX_RETRIES:
                print(f"Retrying in {RETRY_DELAY} seconds...")
                time.sleep(RETRY_DELAY)
    print(f"All attempts to run the purchase bot for product '{product_title}' failed.")

if __name__ == "__main__":
    scheduler_instance = BackgroundScheduler()

    if not os.path.exists(DATA_FILE):
        print(f"Error: Data file {DATA_FILE} does not exist. Please run the popmart-scraper.py script first to generate the data.")
        sys.exit(1)

    with open(DATA_FILE, "r", encoding="utf-8") as f:
        all_products = json.load(f)

    for product_item in all_products:
        release_full_datetime = parse_product_release_datetime(product_item["release_date"], product_item["release_time"])
        if release_full_datetime and release_full_datetime > datetime.now():
            scheduler_instance.add_job(initiate_purchase_bot, "date", run_date=release_full_datetime, args=[product_item])
            print(f"Scheduled purchase bot task for product '{product_item['title']}' to start at {release_full_datetime.strftime('%Y-%m-%d %H:%M:%S')}.")

    try:
        scheduler_instance.start()
        print("Task scheduler started successfully. Waiting for scheduled tasks to run...")
        # Keep the main thread alive so the background scheduler keeps running
        while True:
            time.sleep(2)
    except (KeyboardInterrupt, SystemExit):
        scheduler_instance.shutdown()
        print("Task scheduler stopped.")
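One caveat with defaulting to the current year in the scheduler above: a listing such as "JAN 05" scraped in late December would be parsed as a date already in the past and silently skipped. A possible workaround, sketched below, is to roll the year forward when the parsed date falls far behind the current time. The function name and the 180-day threshold are illustrative assumptions, and month abbreviations are assumed to be English (parsing `%b` depends on the locale).

```python
from datetime import datetime, timedelta


def resolve_release_year(month_day, time_string, now,
                         max_past=timedelta(days=180)):
    """Parse e.g. ("JAN 05", "09:00") and choose a plausible year.

    Starts from the year of `now`; if that result lies more than `max_past`
    in the past, the release is assumed to belong to the following year.
    A ValueError propagates for unparseable or invalid dates.
    """
    parsed = datetime.strptime(
        f"{month_day} {now.year} {time_string}", "%b %d %Y %H:%M"
    )
    if now - parsed > max_past:
        parsed = parsed.replace(year=now.year + 1)
    return parsed


if __name__ == "__main__":
    late_december = datetime(2025, 12, 28, 12, 0)
    print(resolve_release_year("JAN 05", "09:00", late_december))
    print(resolve_release_year("DEC 30", "09:00", late_december))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the behavior easy to check for dates around the turn of the year.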

import asyncio
import sys
from playwright.async_api import async_playwright

# Nstproxy proxy service configuration (please replace with your actual credentials)
NSTPROXY_USERNAME = "your_nstproxy_username"
NSTPROXY_PASSWORD = "your_nstproxy_password"
NSTPROXY_HOST = "gate.nstproxy.io" # Nstproxy gateway address
NSTPROXY_PORT = "24125" # Nstproxy port number

async def execute_purchase_process(target_product_url):
    print(f"Attempting to open product page: {target_product_url}")
    try:
        async with async_playwright() as p:
            # Playwright expects proxy credentials in separate "username" and
            # "password" fields rather than embedded in the server URL
            proxy_settings = {
                "server": f"http://{NSTPROXY_HOST}:{NSTPROXY_PORT}",
                "username": NSTPROXY_USERNAME,
                "password": NSTPROXY_PASSWORD,
            }

            # Launch Chromium and configure the proxy. headless=False makes the bot's behavior observable.
            browser_instance = await p.chromium.launch(
                headless=False,
                proxy=proxy_settings,
            )
            # Create a new browser context so the proxy settings stay in effect for the whole session
            browser_context = await browser_instance.new_context(
                proxy=proxy_settings,
            )
            page_instance = await browser_context.new_page()
            # Navigate to the target product page, with a longer timeout to allow for network delays
            await page_instance.goto(target_product_url, timeout=60000)

            # Wait for the "ADD TO BAG" button to appear and click it
            # Note: adjust this selector to match the actual HTML structure of the Pop Mart website
            add_to_bag_button_selector = 'button:has-text("ADD TO BAG")'
            await page_instance.wait_for_selector(add_to_bag_button_selector, timeout=30000)
            await page_instance.click(add_to_bag_button_selector)
            print(f'Clicked the "ADD TO BAG" button for product: {target_product_url}.')

            # Wait for the cart link to appear and click it to open the cart
            # Note: adjust this selector to match the actual HTML structure of the Pop Mart website
            shopping_cart_selector = "a[href*='/cart']"
            await page_instance.wait_for_selector(shopping_cart_selector, timeout=30000)
            await page_instance.click(shopping_cart_selector)
            print("Navigated to the cart page successfully.")

            print("The browser stays open so you can complete checkout manually.")
            # Keep the browser window open so the user can step in and finish sensitive steps such as payment
            await page_instance.wait_for_timeout(3600000) # Wait up to 1 hour so the user has enough time to act

            await browser_instance.close()
            return 0 # Return 0 when the purchase flow completes
    except Exception as e:
        print(f"Error during purchase process: {e}")
        return 1 # Return 1 when the purchase flow fails

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 purchase-bot.py <product_URL>")
        sys.exit(1)

    product_url_from_args = sys.argv[1]
    exit_code_result = asyncio.run(execute_purchase_process(product_url_from_args))
    sys.exit(exit_code_result)
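Hardcoding proxy credentials in each script makes them easy to leak into version control. As an alternative sketch, the settings could be read from environment variables and returned in the dictionary shape that Playwright's `proxy` option accepts (`server`, `username`, `password`). Using environment variables with these particular names is an assumption made for illustration, not something Nstproxy prescribes; the default host and port are simply the values used in the scripts above.

```python
import os


def load_proxy_settings(env=os.environ):
    """Build a Playwright-style proxy dict from environment variables.

    Hypothetical variable names: NSTPROXY_USERNAME, NSTPROXY_PASSWORD,
    NSTPROXY_HOST, NSTPROXY_PORT.
    """
    required = ("NSTPROXY_USERNAME", "NSTPROXY_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    host = env.get("NSTPROXY_HOST", "gate.nstproxy.io")
    port = env.get("NSTPROXY_PORT", "24125")
    return {
        "server": f"http://{host}:{port}",
        "username": env["NSTPROXY_USERNAME"],
        "password": env["NSTPROXY_PASSWORD"],
    }


if __name__ == "__main__":
    # Demonstration with a plain dict standing in for the real environment
    demo_env = {"NSTPROXY_USERNAME": "demo_user", "NSTPROXY_PASSWORD": "demo_pass"}
    print(load_proxy_settings(demo_env)["server"])
```

The returned dictionary could then be passed as `proxy=load_proxy_settings()` when launching the browser, so the scraper and the purchase bot share one source of configuration.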

Why Automate Blind Box Purchases?

Why Manual Blind Box Purchases Usually Fail

Core Components for Building a Blind Box Purchase Bot

Python Programming Language

Playwright Browser Automation Framework

Chrome or Firefox Browser Driver

Nstproxy Residential Proxy Service

Pop Mart User Account

Analyzing Pop Mart's Checkout Flow

Building the Core Bot Architecture

Step 1: Set Up the Environment

Step 2: Plan the Project Directory

Step 3: Develop the Main Control Script

Step 4: Scrape the New Arrivals Page Data

Step 5: Configure Job Scheduling

Step 6: Develop the Purchase Execution Bot

Step 7: Launch the Bot System

Advanced Strategies for Highly Competitive Releases

Multi-Account Coordination

Predictive Purchase Decisions

Accurate Inventory Forecasting

Community Intelligence Integration

Best Practices for Testing and Deployment

Testing in a Sandbox Environment

Analyzing and Optimizing Performance Bottlenecks

Real-Time Monitoring and Alerting

Disaster Recovery and Backup Solutions

Conclusion