URL validation

Detailed

Source: regexr.com/39nr7

This will match multiple URLs on a line.

[(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

Simplified

I made this and tested in VS Code.

https:\/\/[\w./#-]*

It does not match ?= or trailing / or Jekyll {{ URLs.

Add a space or \s or \( at the start to find URL in a certain context.

Security

Insecure URLs excluding localhost. Make secure. Using VS Code find and replace.

  • Find:
      http:\/\/(?!localhost)[\w./#-]*
    
  • Replace:
      $1s$2
    

Markdown URLs

Check for a URL in a markdown document - searching for both the alternate text and target.

Generally the pattern is:

[Alt text](target)

Here sare

  • Internal and external paths. Allows for empty target too - ().
      \[.+\]\(.*\)
    
  • Internal path.
      \[.+\]\((?!http).+\)
    
  • Internal absolute path.
      \[.+\]\(/.+\)
    
  • External URLs.
      \[.+\]\(http.+\)
    
  • HTTP URL that is not secure.
      \[.+\]\(http[^s].+\)
    
  • Internal path which is not a # ID reference or Jekyll link.
      \[.+\]\((?!(http|[{#])).+\)
    

Note www without a protocol is also external but not considered here as I donโ€™t use that style.

Convert to markdown URL

Find an external URL and convert to []() format, remove protocol and www in the label - using regex find and replace in VS Code.

Includes an optional trailing slash (if not matched, this gets left outside the pattern after the () part).

  • Find:
       (https:\/\/)(www\.)?([\w./-]+[\w-]+)(/?)
    
  • Replace:
       [$3]($1$2$3$4)
    

Note leading space. This could be remove for lines starting with URL.

Samples:

  • ` https://stackedit.io/ => stackedit.io`
  • ` https://github.com/MichaelCurrin/nested-jekyll-menus => [github.com/MichaelCurrin/nested-jekyll-menu](https://github.com/MichaelCurrin/nested-jekyll-menus`
  • ` https://github.com/MichaelCurrin/nested-jekyll-menus/ => [github.com/MichaelCurrin/nested-jekyll-menu](https://github.com/MichaelCurrin/nested-jekyll-menus/`

This variation handles # ID values.

  • Find:
      (https:\/\/)(www\.)?([\w./-]+[\w-]+)(#[\w-]*)?(/?)
    
  • Replace:
      [$3]($1$2$3$4$5)