Automated QC Just Before Blog Publishing — The Secret to 0 Publishing Errors in 6 Months

When you manage a blog with over 200 posts, human review inevitably misses things. Markdown remnants (such as **bold** markers showing up as literal asterisks), emoji whitelist violations, missing sources, empty tables, and leftover box styles are common culprits. That is why we added a separate step that automatically checks and fixes each post right before it is sent to the blog API.

This post explains the intent behind building this automated QC system, how it works, the actual results we achieved, and how we validated it. We have distilled the core concepts so that any blog operator facing similar issues can implement it with just a single page of code.

Why We Built It

During the first year, we frequently encountered two types of issues.

First, model output remnants. When generating body text with an LLM, markdown tokens like **bold**, ## Subheading, or --- often remained unconverted to HTML. Asterisks were visible directly on the live site.

Second, posts that looked fine right after writing but were broken by a hook just before publishing. For example, a function might open an extra <div> in the body without closing it, breaking the card/sidebar layout, or an automatic price table insertion might end up as an empty <table>.

How It Works

The checkpoint consists of two stages.

Stage 1: Sanitize — Unconditional fixes

It takes the HTML and applies the following across the board:

  • Remove dangerous inline styles (e.g., width:800px, margin-left:-30px, position:absolute)
  • Remove fixed width/height attributes from <img> tags -> Preserve responsiveness
  • Convert markdown remnants to HTML (**X** -> <strong>X</strong>, arbitrary --- -> <hr>)
  • Strip characters violating the emoji policy (ranges U+2600-27BF, U+1F000-1FAFF)
  • Flatten box styles (<div> with border, box-shadow, or padding>20px)
  • Inject a line of safe CSS into the body container (max-width:100%, overflow-wrap:anywhere)

This stage is a mechanical process that requires no human judgment. It is designed to produce consistent results for any post.
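
Several of the fixes listed above can be sketched as small regex passes. This is a minimal sketch under stated assumptions, not the production sanitizer: the function names, the contents of `ALLOWED_EMOJI`, and the exact patterns are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical whitelist: the policy mentions a whitelist, but its contents are assumed here.
ALLOWED_EMOJI = {"\u2705"}

# Ranges taken from the emoji policy bullet (U+2600-27BF, U+1F000-1FAFF).
EMOJI_RE = re.compile("[\u2600-\u27BF\U0001F000-\U0001FAFF]")
IMG_TAG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SIZE_ATTR_RE = re.compile(
    r"""\s+(?:width|height)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)
# A line that contains nothing but three or more dashes.
HR_LINE_RE = re.compile(r"^[ \t]*-{3,}[ \t]*$", re.MULTILINE)


def strip_img_dimensions(html: str) -> str:
    """Drop width/height attributes, but only inside <img> tags."""
    return IMG_TAG_RE.sub(lambda m: SIZE_ATTR_RE.sub("", m.group(0)), html)


def md_rules_to_hr(html: str) -> str:
    """Turn a leftover markdown rule (a line of ---) into <hr>."""
    return HR_LINE_RE.sub("<hr>", html)


def strip_disallowed_emoji(html: str) -> str:
    """Remove characters in the policy ranges unless they are whitelisted."""
    return EMOJI_RE.sub(
        lambda m: m.group(0) if m.group(0) in ALLOWED_EMOJI else "", html
    )
```

Each pass only removes or replaces the pattern it targets, so running it a second time on its own output changes nothing.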

Stage 2: Quality Gate — Block publishing on failure

It automatically checks for omissions that a human would have noticed. If a post fails, publishing is rejected.

  • Body text length under 600 characters -> fail
  • Fewer than three <h2> tags -> fail (for guide/comparison posts)
  • 0 images -> fail (regardless of post type)
  • Comparison posts without a <table> -> fail
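
One way to gather the numbers these checks need is a single pass with Python's standard `html.parser`. The sketch below is an assumption about how such a gate could be built, not the actual implementation: the `GateStats` class, its field names, and the empty-table rule (a table with no visible text) are hypothetical.

```python
from html.parser import HTMLParser


class GateStats(HTMLParser):
    """Collects tag counts, visible character count, and empty tables in one pass."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: dict[str, int] = {}
        self.text_chars = 0
        self.empty_tables = 0
        # One entry per currently open <table>: visible characters seen inside it.
        self._table_text: list[int] = []

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1
        if tag == "table":
            self._table_text.append(0)

    def handle_endtag(self, tag):
        if tag == "table" and self._table_text:
            if self._table_text.pop() == 0:
                self.empty_tables += 1

    def handle_data(self, data):
        n = len("".join(data.split()))  # ignore all whitespace
        self.text_chars += n
        # Text counts toward every table that is currently open.
        self._table_text = [count + n for count in self._table_text]


def gate_stats(html: str) -> GateStats:
    stats = GateStats()
    stats.feed(html)
    stats.close()
    return stats
```

A gate could then compare `stats.text_chars` with the 600-character threshold, `stats.counts.get("h2", 0)` with 3, and reject comparison posts where `stats.counts.get("table", 0)` is 0 or `stats.empty_tables` is greater than 0.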

Actual Results

Results over the 6 months since adoption:

  • Exposed markdown remnants: average of 4 cases/month before -> 0 cases after
  • Layout clipping (left/right): average of 7 cases/month before -> 0 cases after
  • Empty tables / empty charts in body: average of 3 cases/month before -> 0 cases after
  • Blocked posts: 38 posts in total (all corrected by authors and successfully republished)

The 38 blocked posts were not lost. In each case the author saw the reported issues, revised the content, and republished successfully. The blocking reasons broke down as follows: missing sources (41%), insufficient character count (26%), 0 images (21%), and others (12%).

Validation Methods

Here is how we validated the checkpoint after building it:

Golden Set Regression Testing: We collected the original drafts of 41 posts that had issues in the past to create a "golden set." We then ran them through the sanitize + quality gate process and automatically verified that the issue patterns disappeared. Initially, 39/41 passed. After analyzing the 2 failures and reinforcing our regular expressions, we achieved a 41/41 pass rate.
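
A golden-set run of this kind can be sketched as follows. This is an illustrative sketch, not our actual test harness: `ISSUE_PATTERNS`, `remaining_issues`, and `run_golden_set` are hypothetical names, and the pattern list is an assumed subset of the issues described in this post.

```python
import re
from typing import Callable

# Patterns that should never survive sanitizing (assumed, illustrative list).
ISSUE_PATTERNS = {
    "md_bold": re.compile(r"\*\*.+?\*\*"),
    "md_heading": re.compile(r"^#{1,6}\s", re.MULTILINE),
    "wide_width": re.compile(r"(?<![-\w])width\s*:\s*(?:[4-9]\d{2}|\d{4,})px"),
}


def remaining_issues(html: str) -> list[str]:
    """Names of the issue patterns still present in the HTML."""
    return [name for name, pattern in ISSUE_PATTERNS.items() if pattern.search(html)]


def run_golden_set(
    drafts: dict[str, str], sanitize: Callable[[str], str]
) -> dict[str, list[str]]:
    """Return {draft_id: leftover issue names} for every draft that still fails."""
    failures: dict[str, list[str]] = {}
    for draft_id, raw_html in drafts.items():
        leftover = remaining_issues(sanitize(raw_html))
        if leftover:
            failures[draft_id] = leftover
    return failures
```

An empty result means the whole golden set passes. A sanitizer that returns a tuple, such as `sanitize_pre_publish`, can be adapted with `lambda h: sanitize_pre_publish(h)[0]`.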

Live Spot-Checks: In the first week of applying the new sanitizer, we randomly selected 8 out of 18 published posts and fetched their live pages. At two widths, desktop (1280px) and mobile (360px), we checked whether horizontal scrolling occurred, whether text overflowed the container, and whether images broke. 8/8 were normal.

Double-Pass Idempotency: We verified that running the sanitizer a second time on already sanitized output produced exactly the same result. This ensures safety in case the publish hook chain runs twice. 100/100 were identical.
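
The double-pass check is simple to automate. The helper below is a minimal sketch with a hypothetical name (`non_idempotent_inputs`). It assumes the sanitizer is passed in as a function from HTML string to HTML string.

```python
from typing import Callable, Iterable


def non_idempotent_inputs(
    sanitize: Callable[[str], str], samples: Iterable[str]
) -> list[str]:
    """Return the samples for which a second sanitize pass changes the output."""
    offenders = []
    for raw_html in samples:
        once = sanitize(raw_html)
        twice = sanitize(once)
        if once != twice:
            offenders.append(raw_html)
    return offenders
```

An empty list means every sample produced identical output on the second pass. Any fix that wraps or prepends content without first checking whether it has already been applied will show up here.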

How to Build It Yourself

Rather than copying the entire code, you can adapt just one or two core elements to fit your environment.

    
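    import re

    def sanitize_pre_publish(html: str) -> tuple[str, list[str]]:
        """Apply unconditional fixes; return the cleaned HTML and the names of the fixes applied."""
        fixes = []
        # Remove dangerous inline widths of 400px or more.
        # The lookbehind keeps max-width, min-width, and border-width intact.
        html, n = re.subn(r'(?<![-\w])width\s*:\s*(?:[4-9]\d{2}|[1-9]\d{3,})px\s*;?', '', html)
        if n:
            fixes.append('strip_wide_width')
        # Markdown bold remnants -> <strong>
        html, n = re.subn(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', html)
        if n:
            fixes.append('md_bold')
        # Strip emojis (if necessary); widen the range to cover the full policy if needed
        html, n = re.subn(r'[\U0001F300-\U0001FAFF]', '', html)
        if n:
            fixes.append('strip_emoji')
        return html, fixes

    def quality_gate(html: str, post_type: str) -> tuple[bool, list[str]]:
        """Return (passed, failure reasons) for the given HTML."""
        fails = []
        # Strip tags, then ignore all whitespace when counting body characters
        text = re.sub(r'<[^>]+>', '', html)
        if len(re.sub(r'\s+', '', text)) < 600:
            fails.append('too_short')
        # Guide and comparison posts need at least three <h2> headings
        if post_type in ('howto', 'compare') and html.count('<h2') < 3:
            fails.append('few_h2')
        if '<img' not in html:
            fails.append('no_image')
        # Leftover placeholders must not reach the live site
        if 'TODO' in html or 'REDACTED' in html:
            fails.append('placeholder')
        return not fails, fails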
    

You only need to call these two functions at a single point right before publishing. If quality_gate returns a failure, block the publishing process and return the reasons to the user. For sanitize, simply take the output HTML and pass it directly to the publishing API.
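
The wiring described above can be sketched as a single function. This is an assumption-laden sketch: `publish_with_qc`, `PublishBlocked`, and the `publish` callable are hypothetical stand-ins for your own publishing code, and running the gate on the sanitized output (rather than on the raw HTML) is an assumed ordering.

```python
from typing import Callable

SanitizeFn = Callable[[str], tuple[str, list[str]]]
GateFn = Callable[[str, str], tuple[bool, list[str]]]


class PublishBlocked(Exception):
    """Raised when the quality gate rejects a post; carries the failure reasons."""

    def __init__(self, reasons: list[str]) -> None:
        super().__init__("publishing blocked: " + ", ".join(reasons))
        self.reasons = reasons


def publish_with_qc(
    html: str,
    post_type: str,
    sanitize: SanitizeFn,
    gate: GateFn,
    publish: Callable[[str], None],
) -> list[str]:
    """Sanitize, gate, then publish. Returns the list of fixes that were applied."""
    clean_html, fixes = sanitize(html)
    passed, reasons = gate(clean_html, post_type)
    if not passed:
        # Nothing is sent to the blog API; the caller shows the reasons to the author.
        raise PublishBlocked(reasons)
    publish(clean_html)
    return fixes
```

Passing the two functions in as parameters keeps the checkpoint independent of any particular blog API, so the same wrapper can sit in front of whatever call actually sends the post.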

In short, it boils down to one line: "Prevent all errors automatically at a single checkpoint before publishing." The time humans used to spend reviewing posts is completely eliminated.

ToolSignal Pro Editorial

ToolSignal Pro is a comprehensive IT insight magazine covering AI, IT, and software trends.
