Posts in category “Programming”

利用正则表达式精准匹配 Markdown 中的 Tag

在 Markdown 文档中,我们经常会使用 # 来标记标签(tag)。例如,#todo#feature 都可以作为标记嵌入文本中。然而,在 Markdown 中还存在另一种情况:代码块。代码块通常由单个或连续三个反引号(```)包裹,并且代码块内部可能会出现类似 #ff0000 这样的颜色值。如果不加以区分,就会把这些颜色值也识别为标签。

此外,为了避免误判,转义的反引号(\`)也不应被当作真正的分界符。为此,我们需要设计一个既能排除转义反引号,又能将连续的三个反引号作为一个整体处理的正则表达式。

挑战与需求

  • 处理代码块边界:Markdown 支持用单个反引号表示行内代码、用三个反引号表示多行代码块。正则表达式需要区分这两种情况,确保在代码块中的 # 不会被误认为标签前缀。
  • 转义字符的处理:例如 \` 这样的转义反引号不应计入反引号的配对判断中。
  • Tag 匹配规则:标签由 # 开头,后跟 1 至 32 个 Unicode 字母、数字或下划线,且必须在正确的边界下出现。

正则表达式解决方案

经过多次改进,我们最终得到了下面这个正则表达式:

(?=(?:(?:[^`]|\\`)*(?<!\\)(?<delim>```|`)(?:[^`]|\\`)*(?<!\\)(\k<delim>))*(?:[^`]|\\`)*$)(?<=(?:^|[^\\])#)[\p{L}_\p{N}]{1,32}(?=[^\p{L}\p{N}_]|$)

让我们逐步解析这条表达式的关键部分:

1. 保证反引号成对出现

表达式的开头部分使用了复杂的正向前瞻:

(?=(?:(?:[^`]|\\`)*(?<!\\)(?<delim>```|`)(?:[^`]|\\`)*(?<!\\)(\k<delim>))*(?:[^`]|\\`)*$)
  • 核心思想:通过匹配一系列非反引号或转义反引号的字符,再加上捕获组 (?<delim>```|)`,我们捕获了第一次出现的“分界符”——它可能是单个反引号,也可能是连续三个反引号。
  • 反向引用:利用 \k<delim>,确保后续匹配的分界符与前面捕获到的完全一致。这样,无论遇到的是行内代码还是代码块,都只视作一个整体。
  • 成对匹配:整个正向前瞻确保了目标区域内所有未转义的反引号都是成对出现的,避免了误将代码块内部的内容当作标签判断依据。

2. 确保标签前缀正确

接下来的部分:

(?<=(?:^|[^\\])#)
  • 作用:这个正向后查断言确保了标签必须由一个 # 开头,而这个 # 不能被反斜杠转义(即不应为 \#)。

3. 匹配 Tag 内容

标签的主体部分由下面这段完成:

[\p{L}_\p{N}]{1,32}(?=[^\p{L}\p{N}_]|$)
  • 匹配范围:这里 [\\p{L}_\\p{N}] 表示 Unicode 字母、数字以及下划线。{1,32} 限定了标签的长度为 1 至 32 个字符。
  • 边界条件:紧跟的正向查找断言 (?=[^\p{L}\p{N}_]|$) 确保标签后面紧跟的是非单词字符或已经到达字符串末尾,防止匹配到类似 #todoing 这种不完整的标签。

小结

这条正则表达式综合了多种复杂情况,既能处理 Markdown 中单反引号和三反引号的代码块,又能排除转义反引号的干扰,同时严格匹配标签格式。对于开发者来说,这样的表达式在解析 Markdown 文档、提取标签或进行格式化处理时非常有用。

当然,正则表达式的可读性和维护性是一个平衡点。虽然这条表达式在处理边界情况时表现出色,但在实际应用中可能还需要根据具体场景做出微调。希望这篇文章能为你在 Markdown 文档解析方面提供一些灵感和帮助。

Tired of Git Branch Case Sensitivity? Here's a Fix for Git Bash Users

Hey fellow devs! Here’s a small but super useful Git trick for dealing with that annoying branch case sensitivity issue. You know the drill—someone creates feature/UserAuth, but you type feature/userauth. On a case-sensitive OS, Git tells you the branch doesn’t exist. On a case-insensitive OS like Windows or macOS, it’s worse—you switch to the wrong branch without realizing it. Later, you discover both feature/userauth and feature/UserAuth exist on GitHub. Ouch.

This is particularly painful when working with Windows or macOS, where the filesystem is case-insensitive, but Git isn't. Add in a mix of developers with different habits (some love their CamelCase, others are all-lowercase fans), and you've got yourself a daily annoyance.

The Fix

I wrote a little Git alias that makes checkout case-insensitive. It's basically a smarter version of git checkout that you use as git co. Here's what it does:

$ git co feature/userauth
⚠️ Warning: "feature/userauth" is incorrect. Switching to: "feature/UserAuth"
Switched to branch 'feature/UserAuth'

Nice, right? No more "branch not found" errors just because you got the case wrong!

Important Note ⚠️

This works in:

  • Windows Git Bash (tested and used daily)
  • Should work in macOS/Linux (though I haven't tested it - let me know if you try!)

But it won't work in:

  • Windows CMD
  • Windows PowerShell
  • Any other Windows terminals

Why? Because it uses Bash commands and parameters. But hey, you're using Git Bash anyway, right? 😉

How to Install

Just run this in Git Bash:

git config --global alias.co '!f() {
  # If no args or help flag, show git checkout help
  if [ $# -eq 0 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    git checkout --help;
    return;
  fi;
  # If any flags are present (except -q or --quiet), pass through to regular checkout
  if [ $# -gt 1 ] || [[ "$1" == -* && "$1" != "-q" && "$1" != "--quiet" ]]; then
    git checkout "$@";
    return;
  fi;
  # Pass through if argument is a commit reference (HEAD, SHA, tag, etc)
  if [[ "$1" =~ ^HEAD([~^]|@{)[0-9]*$ ]] || # HEAD~1, HEAD^1, HEAD@{1}
     [[ "$1" =~ ^(FETCH_HEAD|ORIG_HEAD|MERGE_HEAD)$ ]] || # Special refs
     [[ -f ".git/refs/tags/$1" ]]; then # Tags
    git checkout "$@";
    return;
  fi;
  # Fetch and prune to ensure we have latest refs
  git fetch --prune --quiet;
  # Check both remote and local branches
  correct_branch=$(git branch -r | sed "s/^  origin\///" | grep -i "^$1$");
  if [ -z "$correct_branch" ]; then
    correct_branch=$(git branch | sed "s/^[* ] //" | grep -i "^$1$");
  fi;
  # If branch found with different case, warn and use correct case
  if [ -n "$correct_branch" ] && [ "$1" != "$correct_branch" ]; then
    echo "⚠️ Warning: \"$1\" is incorrect. Switching to: \"$correct_branch\"";
    git checkout "$correct_branch";
    return;
  fi;
  # Otherwise just pass through to regular checkout
  git checkout "$1";
}; f'

What's Cool About It?

  • Finds your branch regardless of how you type the case
  • Still works normally for everything else (files, creating branches, etc.)
  • Shows you the correct branch name when you get the case wrong
  • Auto-fetches to make sure it sees all remote branches

The best part? If it doesn't recognize what you're trying to do, it just passes everything to regular git checkout. So it won't break any of your normal Git workflows.

Real World Usage

I use this daily in a Windows-heavy dev team where we have branches like:

  • feature/UpdateUser
  • hotfix/FixLoginBug
  • release/v2.0.0

And now I can type them however I want. Life's too short to remember exact branch capitalization!

Give It a Try!

If you're using Git Bash on Windows and tired of case sensitivity issues, give this a shot. It's one of those small tools that just makes your day a tiny bit better.

And hey, if you try this on macOS or Linux, let me know how it goes! Always curious to hear if these tricks work in other environments.

Claude.AI 帮我改进 IsHtml 判断

让我重新分析这个问题:

  1. 需要考虑的HTML情况:
  • 简单标签: <div>text</div>
  • 带属性标签: <a href="url">text</a>
  • 自闭合标签: <br/>
  • 带属性的自闭合标签: <img src="img.jpg"/>
  1. 需要排除的markdown链接格式:
  • 专指<https://...>这种格式
  • 特征是 < 后紧跟 http 或 https
  1. 当前正则的问题:

<\s*([a-zA-Z]+)[^>]*>.*</\s*\1\s*>|<\s*([a-zA-Z]+)[^>]*/>

它会匹配所有以 < 开头,包含字母的内容,这就导致了误匹配

  1. 改进思路:
  • 排除标签名以 http(s) 开头的情况
  • 使用负向预查 (?!http(s)?)
  • 放在标签名匹配的开始处
  1. 优化后的正则表达式:

<\s*(?!https?)([a-zA-Z]+)[^>]*>.*</\s*\1\s*>|<\s*(?!https?)([a-zA-Z]+)[^>]*/>

  1. 验证效果:
  • <div>text</div> ✓ 匹配
  • <a href="url">text</a> ✓ 匹配
  • <br/> ✓ 匹配
  • <img src="img.jpg"/> ✓ 匹配
  • <https://abc.com/> ✗ 不匹配

这个解决方案:

  • 保留了原有正则的基本结构
  • 只添加了对标签名的限制条件
  • 能正确处理各种HTML标签
  • 成功排除markdown格式的URL

建议修改代码为:

public static bool IsHtml(this string input)
{
    return Regex.IsMatch(input, 
        @"<\s*(?!https?)([a-zA-Z]+)[^>]*>.*</\s*\1\s*>|<\s*(?!https?)([a-zA-Z]+)[^>]*/>", 
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
}

Why We Shouldn't Unconditionally Load Data in `didChangeDependencies

Flutter’s didChangeDependencies lifecycle method is a common source of redundant network requests and data reloads. Based on real-world code examples from a notes app, let’s explore why this happens and how to fix it.


When Does didChangeDependencies Trigger?

didChangeDependencies in Flutter triggers in these scenarios:

  • Immediately after initState()
  • When an InheritedWidget ancestor changes
  • When the widget's dependencies change

In the provided HomePage code:

@override  
void didChangeDependencies() {  
  super.didChangeDependencies();  
  navigateToPage(currentPageNumber); // ❌ Unconditional call  
}  

This causes duplicate API calls every time the event being triggerd.


The Fix: Initialize-Once Flag

class HomePageState extends State<HomePage> {  
  ...
  bool _isInitialized = false; // Add flag  

  @override  
  void didChangeDependencies() {  
    super.didChangeDependencies();  
    if (!_isInitialized) {  
      _isInitialized = true;  
      navigateToPage(currentPageNumber); // ✅ Only first load  
    }  
  }  
  ...
}  

Why This Works

  1. First Load: Initializes data once
  2. Subsequent Route Changes: Skips reload unless explicitly refreshed
  3. Memory Efficiency: Prevents duplicate API calls (evident from the NotesService cache-less implementation)

Key Takeaways

  1. didChangeDependencies isn’t just for initialization
  2. Always guard data-loading logic with flags

Smart Row Selection: Maintaining Infragistics UltraGrid State After Refresh

... skipping 1000 words about bad solutions ...

The Clean/Best Solution

Here's a pattern that elegantly handles this situation:

public void RefreshGrid()
{
    // 1. Store the current entity before refresh
    SomeEntity currentEntity = null;
    if (dataGrid.ActiveRow != null)
    {
        currentEntity = dataGrid.ActiveRow.ListObject as SomeEntity;
    }

    // 2. Refresh the grid
    dataGrid.RefreshData();

    // 3. Restore selection using entity ID
    if (currentEntity != null)
    {
        dataGrid.ActiveRow = dataGrid.Rows.FirstOrDefault(r => 
            ((SomeEntity)r.ListObject).Id == currentEntity.Id);
    }
}

Why This Pattern Is Best Practice

  1. Type Safety: Using the strongly-typed entity object instead of raw values
  2. Identity-Based: Uses unique IDs instead of volatile row positions
  3. Null-Safe: Handles cases where no row is selected
  4. Concise: LINQ makes the code readable and maintainable
  5. Reliable: Works even if data order changes or rows are filtered