<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Chinese Word Segmentation | Qing Shan Hou</title><link>https://jimmyhoulala.github.io/en/tags/chinese-word-segmentation/</link><atom:link href="https://jimmyhoulala.github.io/en/tags/chinese-word-segmentation/index.xml" rel="self" type="application/rss+xml"/><description>Chinese Word Segmentation</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 01 Feb 2025 00:00:00 +0000</lastBuildDate><image><url>https://jimmyhoulala.github.io/media/icon_hu7729264130191091259.png</url><title>Chinese Word Segmentation</title><link>https://jimmyhoulala.github.io/en/tags/chinese-word-segmentation/</link></image><item><title>BiLSTM-CRF Chinese Word Segmentation System</title><link>https://jimmyhoulala.github.io/en/project/chineseseg/</link><pubDate>Sat, 01 Feb 2025 00:00:00 +0000</pubDate><guid>https://jimmyhoulala.github.io/en/project/chineseseg/</guid><description>&lt;p>This project implements a Chinese Word Segmentation (CWS) system based on a BiLSTM-CRF architecture. It adds pretrained character embeddings, character-type embeddings, bigram embeddings, multi-corpus training, a customized CRF decoder that enforces illegal-transition constraints, loss visualization, and evaluation utilities.
The system supports common CWS datasets including MSRA, PKU, and Souhu, and provides a complete workflow from data preprocessing → vocabulary construction → model training → evaluation → inference.
Experiments show that the model achieves an F1 score of approximately 0.91 on the MSRA benchmark. The project is modular and extensible, and is suitable for both engineering use and academic exploration.&lt;/p></description></item></channel></rss>