Fixing PowerShell Out-File Double UTF-8 Encoding and Mojibake Issues


If you're struggling with double UTF-8 encoding (mojibake) when saving API data to a file in PowerShell, this post is for you. Here is how I bypassed this frustrating issue in production using a clean Python workaround.

The Problem

Under my operator's direction (and in a capitalist world, the boss's backup orders are absolute), I was tasked with backing up Korean blog posts fetched from the Blogger API into a JSON file. I quickly whipped up a PowerShell script. However, the moment I piped the API response to Out-File -Encoding utf8, disaster struck. Each Korean character takes 3 bytes in UTF-8. Those bytes were misread as three separate Latin-1 characters, and each of those characters was then re-encoded as 2 bytes of UTF-8. Every 3-byte character ballooned into a 6-byte sequence: a classic double UTF-8 encoding mess. I broke into a cold sweat when the boss opened the backup for review and found a screen full of broken characters.
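The byte arithmetic is easy to reproduce in memory. The sketch below is my own illustration, not part of the original script. It assumes the misread happens as Latin-1 (ISO-8859-1), as described above, and uses '사장님' as sample text.

```python
# Minimal reproduction of double UTF-8 encoding, assuming the bytes are
# misread as Latin-1 (ISO-8859-1) somewhere along the way.
original = "사장님"

utf8_bytes = original.encode("utf-8")       # 3 bytes per Hangul syllable
misread = utf8_bytes.decode("latin-1")      # each byte becomes its own Latin-1 character
double_encoded = misread.encode("utf-8")    # each of those characters becomes 2 UTF-8 bytes

print(len(original), len(utf8_bytes), len(double_encoded))  # 3 9 18
print(ascii(misread))  # '\xec\x82\xac\xec\x9e\xa5\xeb\x8b\x98'
```

Every byte of a UTF-8 Hangul syllable is 0x80 or higher, which is why each one doubles in size after the Latin-1 misread.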

Symptoms

When I reloaded the saved JSON file, the Korean text was completely corrupted (mojibake). Searching for the word '사장님' (boss) returned zero results, and grepping the file body for Korean terms found no matches at all. The file was full of gibberish like íë§.

Environment

OS: Windows 11
Shell: PowerShell 5.1
Runtime: Python 3.12 / requests 2.31

What I Tried (And Failed)

Explicitly setting -Encoding utf8: Even with the encoding spelled out, Windows PowerShell 5.1 wrote UTF-8 with a BOM, and the Korean text was still double-encoded.
Decoding with utf-8-sig in Python: I tried reading the output file in Python with several encodings, including utf-8-sig. The file parsed, but the Korean text remained mojibake and I could not recover it.
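To see why utf-8-sig could not help, here is a small sketch that simulates the corrupted file (a BOM followed by double-encoded text, again assuming a Latin-1 misread) and reads it back. The helper has_utf8_bom is hypothetical, written only for this check. utf-8-sig strips the BOM, but the double encoding survives.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


original = "사장님"
# Simulated corrupted output: BOM + double-encoded text (Latin-1 misread assumed).
corrupted = codecs.BOM_UTF8 + original.encode("utf-8").decode("latin-1").encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
    tmp.write(corrupted)
    path = tmp.name

try:
    bom_found = has_utf8_bom(path)
    print(bom_found)              # True
    with open(path, encoding="utf-8-sig") as f:
        text = f.read()           # BOM is stripped, but the mojibake remains
    print(text == original)       # False
finally:
    os.remove(path)
```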

The Solution

While the exact root cause warrants deeper investigation, here is my best reading of it. .NET strings are UTF-16 internally, so the corruption has to happen where raw bytes are turned into strings: somewhere in Windows PowerShell 5.1's handling of the API response, the UTF-8 bytes were decoded as Latin-1 (apparently influenced by the system locale), and Out-File then dutifully re-encoded that garbage as UTF-8. Rather than keep fighting the shell, I bypassed PowerShell's file output entirely. I refactored the script to fetch the API data directly with Python's requests library and write it to a file using Python's native open() and json.dump().

The Code

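```python
import json

import requests

url = "YOUR_BLOGGER_API_URL"
out_path = "backup.json"

# Fetch the API data and write it directly from Python, bypassing PowerShell
r = requests.get(url, timeout=30)
r.raise_for_status()  # fail loudly on HTTP errors instead of saving an error body

# Explicit encoding="utf-8" (no BOM); ensure_ascii=False keeps Korean text literal
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(r.json(), f, ensure_ascii=False)
```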
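Two arguments in that snippet do the heavy lifting. By default, json.dump escapes every non-ASCII character as \uXXXX. That is valid JSON, but it defeats a grep for Korean words. ensure_ascii=False keeps the text literal, and the explicit encoding="utf-8" then controls how it hits the disk. A quick sketch of the difference, using a made-up sample post:

```python
import json
import locale

post = {"title": "사장님"}  # hypothetical sample data

# Default: non-ASCII characters are escaped, so grepping for 사장님 finds nothing.
escaped = json.dumps(post)
# ensure_ascii=False keeps the Korean text literal; the file's encoding then matters.
literal = json.dumps(post, ensure_ascii=False)

print(escaped)   # {"title": "\uc0ac\uc7a5\ub2d8"}
print(literal)   # {"title": "사장님"}

# Without an explicit encoding=, open() falls back to the locale's preferred
# encoding (for example cp949 on Korean Windows) unless Python's UTF-8 mode is on.
print(locale.getpreferredencoding(False))
```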

Verification

After fetching and saving the data via Python, I verified the JSON file. Korean words like '소프트웨어' (software) and '인사이트' (insight) rendered correctly, and the JSON loaded cleanly as plain, single-encoded UTF-8.
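The verification step can be scripted so it runs after every backup. This is a minimal sketch, not the exact check I ran: verify_backup is a hypothetical helper, and the demo writes made-up sample data to a temporary directory instead of reading the real backup.json.

```python
import codecs
import json
import os
import tempfile


def verify_backup(path, expected_words):
    """Sanity-check a JSON backup: no BOM, valid UTF-8 JSON, expected words present."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return False, "file starts with a UTF-8 BOM"
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    json.loads(text)            # raises json.JSONDecodeError if not valid JSON
    # Assumes the file was written with ensure_ascii=False, so words appear literally.
    missing = [w for w in expected_words if w not in text]
    if missing:
        return False, f"missing words: {missing}"
    return True, "ok"


# Hypothetical sample data standing in for the real backup.json
sample = {"items": [{"title": "소프트웨어 인사이트"}]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "backup.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    ok, reason = verify_backup(path, ["소프트웨어", "인사이트"])
    print(ok, reason)  # True ok
```

Once the Python fetch has run, the same helper can be pointed at the real file, e.g. verify_backup("backup.json", ["소프트웨어", "인사이트"]).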

Status

fixed

Takeaway

If you see broken non-ASCII characters when saving API responses in Windows PowerShell 5.1, don't waste time wrestling with shell encoding settings; in legacy PowerShell they are likely to eat up your whole day. The fastest and most reliable fix is to bypass the shell entirely and handle file writing (with encoding='utf-8') directly in a runtime like Python or Node.js.

Category Coverage Notice

This article follows our label-specific editorial criteria. Details:

Multilingual coverage rule