
    What's in a domain name? NIST has an answer

    Everyone knows how frustrating, and embarrassing, it can be
    to mistype a URL into your browser. (Remember the
    snickering you used to hear if you went to
    'whitehouse.com' instead of
    'whitehouse.gov'? The .com address is now a political
    news site, by the way.) The Internet Corp. for Assigned Names and
    Numbers (ICANN) plans to open a new round of proposals for generic
    top-level Internet domains later this year, and it is looking for
    ways to help avoid confusion and fraud as the number of domains
    grows.


    To help this effort, Paul Black, a computer scientist at the
    National Institute of Standards and Technology,
    has developed an algorithm that measures the visual similarity
    between domain names. The tool scores how closely a proposed
    domain resembles an existing one. For instance, a domain such as
    '.c0m' (with a zero) scores 88 percent compared with '.com' and
    probably would not be approved.


    Generic top-level domains are the strings of letters and numbers
    that appear after the right-most '.' (dot) in a site's name,
    before the '/' (slash) that follows it in a URL. According to
    ICANN, 21 generic top-level domains are now approved for use, from
    .aero (reserved for members of the air transport industry) to
    .travel (reserved for the travel industry), including the more
    familiar .com, .edu, .gov and .mil.
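    The rule just described (take whatever follows the right-most dot
    of the host name) can be sketched in a few lines of Python. This is
    a minimal illustration built on the standard urllib.parse module;
    the function name top_level_domain is hypothetical, and the sketch
    does not check the result against ICANN's list of approved domains.

```python
from urllib.parse import urlsplit


def top_level_domain(url: str) -> str:
    """Return the label after the right-most dot of the URL's host name.

    Hypothetical helper for illustration; it does not validate the
    result against ICANN's list of approved top-level domains.
    """
    # urlsplit only recognizes a host when "//" is present, so add it
    # for bare inputs such as "whitehouse.gov/about".
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port removed
    host = host.rstrip(".")  # a fully qualified name may end in a dot
    return host.rsplit(".", 1)[-1] if "." in host else ""


print(top_level_domain("https://www.whitehouse.gov/"))
print(top_level_domain("whitehouse.com/news"))
```

    Because the host name is isolated first, a dot that appears later in
    the path (as in "/index.html") does not affect the result.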


    According to NIST, Black's algorithm rates the degree of
    similarity between pairs of alphanumeric characters. The numeral
    '1' and the lowercase letter 'l,' which are dead ringers in some
    fonts, would receive the highest score. Other pairs, such as 'h'
    and 'n,' are similar but easier to tell apart, so they get lower
    scores. The algorithm also takes into account combinations of
    letters, such as 'cl,' which can look like 'd.' Putting everything
    together, the algorithm then computes the 'cost' of transforming
    one string into another based on visual similarity and expresses
    that as a percentage score.
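    The article does not include NIST's code, so the following is only a
    hedged sketch of the general technique it describes: a weighted edit
    distance in which swapping look-alike characters is cheap and
    swapping unrelated ones is expensive, plus a rule for multi-letter
    look-alikes such as 'cl' and 'd.' Here a low cost plays the role of a
    high similarity score. The cost tables (PAIR_COST, MULTI_COST), the
    function names and the percentage formula are all assumptions for
    illustration; they are not NIST's values and will not reproduce the
    88 percent figure quoted for '.c0m.'

```python
# Hypothetical visual-similarity costs for single-character swaps.
# Lower cost means the two characters look more alike. Illustrative only.
PAIR_COST = {
    frozenset("1l"): 0.1,  # numeral one vs. lowercase L
    frozenset("0o"): 0.1,  # zero vs. lowercase O
    frozenset("hn"): 0.5,  # similar, but easier to tell apart
}

# Hypothetical costs for multi-character look-alikes, e.g. "cl" vs. "d".
MULTI_COST = {
    ("cl", "d"): 0.2,
    ("rn", "m"): 0.2,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    return PAIR_COST.get(frozenset((a, b)), 1.0)


def transform_cost(s: str, t: str) -> float:
    """Weighted edit distance: cheapest way to turn s into t."""
    n, m = len(s), len(t)
    # d[i][j] = cost of turning s[:i] into t[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete every character
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert every character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
            # Multi-character look-alikes, checked in both directions.
            for (x, y), c in MULTI_COST.items():
                for p, q in ((x, y), (y, x)):
                    if s.endswith(p, 0, i) and t.endswith(q, 0, j):
                        best = min(best, d[i - len(p)][j - len(q)] + c)
            d[i][j] = best
    return d[n][m]


def similarity_percent(s: str, t: str) -> float:
    """Map the cost onto 0-100, where 100 means identical (assumed formula)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - transform_cost(s, t) / longest)


print(similarity_percent(".c0m", ".com"))
print(similarity_percent("clog", "dog"))
```

    Every single edit costs at most 1, so the total cost never exceeds
    the length of the longer string and the percentage stays between 0
    and 100. Checking each multi-letter rule in both directions keeps the
    score symmetric, so comparing 'dog' with 'clog' gives the same result
    as comparing 'clog' with 'dog.'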


    NIST says ICANN is considering future enhancements to the
    algorithm, including checks for confusing similarities between
    domains in other alphabets or scripts such as Cyrillic.



    About the Author

    William Jackson is a senior writer for GCN.
