Tags in social networks and synthetic (inflectional) languages

I. What the problem is


Computer technologies that were first developed in the world of analytic languages run into additional difficulties when they migrate to communities that speak synthetic languages.

For example, morphology-aware search in English and in Russian texts involves quite different levels of difficulty. The branching of Russian inflection has long been the subject of popular jokes about the suffering of foreigners who study Russian grammar with all its rules and exceptions.

One example of how technology stumbles over the difference between languages is tags in English and Russian blogs and social networks. As long as the tags are kept in a separate block (as implemented on Habrahabr or LJ), there is no problem: both languages use the base forms of words, sometimes the plural (and even that, in English, is a remnant of its former synthetism). But as soon as the tags enter the text, the difference becomes sharper. Sometimes it even seems that Twitter hashtags, for example, are becoming a powerful factor that increases analytism in the Russian language. And then you come across phrases like:

We #husband at the restaurant.

tomorrow in #Moscow.

Returned with a #sea.

It is a very strange feeling: a kind of dizziness and a linguistic split.

People who do not want to put up with such glaring unnaturalness solve the problem in different ways.

Some put the tags at the end of the tweet, as if in a separate block.

Back. #sea #vacation

Some move the tags to the front, as if to set off the subject of the following sentence.

#Moscow. Tomorrow.

Some turn every word form into a tag. This, however, significantly reduces the usefulness of search by tags, because morphology support has not yet been attached to them.

We #my husband at the restaurant.

Some add the ending after a hyphen, a double or single quote, or a space. Such an insert interrupts the automatic conversion of the word into a tag exactly at that character. This solution complicates reading, although punctuation "guts" sticking out of a word are still easier to accept than a sudden lack of inflection.

We #husband eating at a restaurant.

We #mug eat in the restaurant.

We #husband eating at a restaurant.

This solution works only for words whose base form has a so-called zero ending, that is, words whose base form coincides with the stem to which the endings of the other cases are attached. For other words this construction makes no sense, because no one will search for a tag that is not a complete word. Compare:

the streets of #Kiev. Evening #Kiev-ohms. We are in #Kiev.

the streets of #Moscow. Evening #Moscow. We are in #Moscow.

Still, there are quite a lot of words whose stem coincides with the base form, enough to make it worth finding a readable way of tagging them.


II. Two possible solutions and their features


In fact, character sets contain two suitable characters that both interrupt tagging and leave no visible gap between the tag stem and the ending: the soft hyphen (abbreviated "shy") and the zero-width space (abbreviated "zwsp").
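To see why these two characters interrupt tagging, here is a minimal sketch. It assumes a simplified tagger that treats a hashtag as `#` followed by letters, digits, or underscores; the actual rules used by Twitter and Facebook are more elaborate and are not reproduced here. The name `extractHashtags` and the transliterated sample word are illustrative only.

```javascript
// Simplified, hypothetical hashtag matcher: '#' followed by Unicode letters,
// digits, or underscores. Real sites use more elaborate rules.
const HASHTAG = /#[\p{L}\p{N}_]+/gu;

function extractHashtags(text) {
  return text.match(HASHTAG) || [];
}

const SHY = '\u00ad';  // soft hyphen
const ZWSP = '\u200b'; // zero-width space

// Both characters belong to the Unicode "format" category (Cf), not to
// letters or digits, so the match stops at them and the case ending
// ("om", a transliterated instrumental ending) stays outside the tag.
const withShy = extractHashtags('Evening #Kiev' + SHY + 'om.');
const withZwsp = extractHashtags('Evening #Kiev' + ZWSP + 'om.');
const plain = extractHashtags('Evening #Kievom.');

console.log(withShy, withZwsp, plain);
```

With either invisible character the extracted tag is the base form, while the reader still sees the whole inflected word.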

Using them, you can create, for example, hashtags like these on Twitter:



Or on Facebook (the previous variant does not work there, because Facebook filters out identical hashtags and keeps only the first one it encounters):


The two characters have a number of properties in common.

1. In most cases they are invisible inside words.

2. They become the place where part of the word wraps to the next line when the word no longer fits on the current one.

3. They do not let the parts of a word be pulled apart when the text is justified (at least in the latest versions of the major browsers).

4. Both characters can be detected in a word in the following ways:

a. When the caret is moved one character at a time with the arrow keys, it "stalls" once at the location of the invisible character, as if it did not react to the key press.

b. When spaces are added before the word, sooner or later, on reaching the end of the line, the word "breaks" into two parts at the location of the inserted character.

c. Some text editors (often programming ones) display invisible characters. For example, the zero-width space can be seen in the jsfiddle.net service, but the soft hyphen is not displayed there. Incidentally, there you can observe how text with such characters behaves by dragging the borders of the frame panes: jsfiddle.net/k37ssezj (in the first block a zero-width space is inserted after each "word", in the second a soft hyphen).
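Besides the manual checks above, the characters can also be located programmatically. The following is a minimal sketch; the helper names `findInvisible` and `reveal` are hypothetical and not part of the original article.

```javascript
// Short names for the two characters discussed in the article.
const INVISIBLE = { '\u00ad': 'shy', '\u200b': 'zwsp' };

// Returns the position and name of every soft hyphen or zero-width space.
function findInvisible(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const name = INVISIBLE[text[i]];
    if (name) found.push({ index: i, name: name });
  }
  return found;
}

// Replaces the characters with visible markers so the text can be inspected.
function reveal(text) {
  return text.replace(/[\u00ad\u200b]/g, (ch) => '[' + INVISIBLE[ch] + ']');
}

const sample = '#tag\u00ads and #tag\u200bs';
console.log(findInvisible(sample));
console.log(reveal(sample));
```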

But there are also differences between the two characters, and they are all in favour of the soft hyphen:

1. The soft hyphen is the more traditional and older character in the computing world: it sits at the beginning of Unicode, is present in a larger number of fonts, and is easier to enter from the keyboard. The zero-width space comes much later in the Unicode table, may be present in fewer fonts, and is harder to enter from the keyboard.

2. When a word wraps to a new line, a hyphen is more logical and shows the unity of the word more clearly.

Some of the differences depend on the application.

1. The characters can affect on-page search (the soft hyphen does not interfere with search in Chrome and Firefox, but does in IE; the zero-width space does not interfere with search in Chrome, but does in Firefox and IE).

2. When you double-click a word that contains an invisible character, sometimes only the part of the word under the cursor is selected and sometimes the whole word (with a soft hyphen, the selection is split only in hybrid Twitter tags in Chrome; with a zero-width space, all words are split in Chrome and IE).

3. Keep in mind an IE11 bug: in rich input fields (a so-called Rich Editor, which shows the formatting in real time in the manner of WYSIWYG editors; such fields are created with the properties element.contentEditable and document.designMode), pasting from the clipboard sometimes does not work. In this case, switch in the developer console from Edge mode to compatibility mode with a lower browser version (starting with IE10). For example, this problem shows up when you try to paste into the text of a note (Note) on Facebook.
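The search problem from item 1 cannot be fixed in the browser's built-in page search, but a script or site that performs its own text matching could work around it. The sketch below assumes such a custom search and simply strips both characters before comparing; the function names are hypothetical.

```javascript
// Removes soft hyphens and zero-width spaces from a string.
function stripInvisible(text) {
  return text.replace(/[\u00ad\u200b]/g, '');
}

// Substring search that ignores the two invisible characters on both sides.
function containsIgnoringInvisible(haystack, needle) {
  return stripInvisible(haystack).includes(stripInvisible(needle));
}

const post = 'We are in #Kiev\u00ade.';
console.log(post.includes('Kieve'));                   // naive search
console.log(containsIgnoringInvisible(post, 'Kieve')); // normalized search
```

The naive search misses the word because the soft hyphen sits between "Kiev" and the ending, while the normalized one finds it.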

Finally, sites react differently to the insertion of these characters.

1. Facebook is more hostile to the invisible space. It removes it almost immediately during input, and certainly does not keep it when the post is published. The invisible hyphen survives, including inside Facebook hashtags, which are not yet popular there. Sometimes, however, you have to move the input caret back and forth through the word with the keyboard so that the site "sees" the character in the word; otherwise it may not be saved either. With manual input this problem is less pronounced than when the character is inserted into the text programmatically with a script (more on this later).

2. When inserting a character on Twitter with a script, keep in mind an old, annoying bug in Firefox. You either have to use other ways of inserting, or set the security.csp.enable key in about:config to false, which is probably too radical a way of solving typographic problems.


III. Methods of implementation


1. Manual input.

If you rarely enter rare characters from the keyboard, it may be useful to read this short article. It describes two methods of entering characters: by decimal code (works reliably only with the initial block of Unicode characters and some common encodings) and by hexadecimal code. For convenience, the registry edit mentioned in that article can be saved as a text file with the .reg extension in Unicode encoding; then double-click the file and agree to add the data to the registry.
EnableHexNumpad.reg
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Control Panel\Input Method]
"EnableHexNumpad"="1"



So, for manual input we need to know the code of each character in both number systems:

Soft hyphen: 0173 and 00ad (Alt + '0173' and Alt + '+00ad', respectively).

Zero-width space: 8203 and 200b (Alt + '8203' and Alt + '+200b', respectively).
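The decimal and hexadecimal codes listed above can be derived from the characters themselves. This is a small sketch; the helper name `codes` is hypothetical.

```javascript
// Returns the decimal and hexadecimal code of a character,
// both padded to four digits as they are typed after Alt.
function codes(ch) {
  const cp = ch.codePointAt(0);
  return {
    decimal: String(cp).padStart(4, '0'),
    hex: cp.toString(16).padStart(4, '0'),
  };
}

const shyCodes = codes('\u00ad');  // soft hyphen
const zwspCodes = codes('\u200b'); // zero-width space
console.log(shyCodes, zwspCodes);
```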

(By the way, it is curious that the zero-width space is not included in the JavaScript whitespace class '\s': the range of spaces in its definition stops just before it.)
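This remark is easy to check in any modern JavaScript engine. The sketch below compares the two neighbouring characters U+200A (hair space) and U+200B (zero-width space); the helper name `isSpace` is hypothetical.

```javascript
// True if the character is matched by the whitespace class \s.
const isSpace = (ch) => /\s/.test(ch);

const hairSpace = isSpace('\u200a'); // last character of the U+2000..U+200A range
const zwspSpace = isSpace('\u200b'); // zero-width space, the very next code point
const shySpace = isSpace('\u00ad');  // soft hyphen

// A practical consequence: trim() does not remove a leading zero-width space.
const trimmedLength = '\u200btag'.trim().length;

console.log(hairSpace, zwspSpace, shySpace, trimmedLength);
```

Code that relies on `\s` or `trim()` to clean up user input will therefore leave both invisible characters in place.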

What to consider:

a. Decimal input of characters with large code numbers works very rarely. Sometimes nothing is inserted, and sometimes something quite unexpected is inserted.

b. When you enter hexadecimal codes that contain letters, keyboard shortcuts defined in the application are often triggered, and the input of the character is interrupted. Sometimes this depends on the current keyboard layout, sometimes not.

2. Pasting from the clipboard.

We can keep these two characters anywhere in order to copy and paste them with Ctrl+C/Ctrl+V. Storing and copying invisible characters is a little harder than visible ones, though: you either have to save each one in a separate file, or know where they are and select them with the keyboard (Shift+arrow).

In Windows you can use the standard system utility: choose a font that is not too poor, enable the required option, enter the hexadecimal code, and press the appropriate buttons:





3. Input with a script (bookmarklet).

Two scripts follow below, identical in everything except the character code and the variable name. They check three possible situations in turn:

a. The input focus is in a plain single-line or multi-line text field. The properties and methods usual for such cases are used.

b. The input focus is in a rich field of the Rich Editor type (both variants are handled: an element of the current window/document, and an iframe fully devoted to the editor, with its own window/document). document.execCommand() is used.

c. The focus is not in a text field, or the method needed for the Rich Editor is not implemented in the browser (in IE11 the insertText command is not supported, but it will be implemented in Edge). In this case a window pops up in front of the user with a text box in which the desired character is already selected (even though it is invisible). All that remains is to press Ctrl+C, close the window (with Enter or Esc), then place the cursor at the desired location and press Ctrl+V. This option can be seen as a convenient implementation of the previous method of copying and pasting from a file or the utility.

Soft hyphen
javascript: (function(d, e, shy, s1, s2, v, sy, sx) {
  if (e.type == 'textarea' || e.type == 'text') {
    /* a. Plain text field: splice the character in at the selection. */
    s1 = e.selectionStart;
    s2 = e.selectionEnd;
    sy = e.scrollTop;
    sx = e.scrollLeft;
    v = e.value;
    e.value = v.substring(0, s1) + shy + v.substring(s2);
    /* Put the caret right after the new character and restore scrolling. */
    e.selectionStart = e.selectionEnd = ++s1;
    e.scrollTop = sy;
    e.scrollLeft = sx;
    e.focus();
  } else if ((e.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText') || (d = e.contentDocument) && (d.activeElement.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText')) {
    /* b. Rich Editor in this document or inside an iframe with its own document. */
    d.execCommand('insertText', false, shy);
  } else {
    /* c. Fallback: show the character preselected so it can be copied by hand. */
    prompt('Copy and paste in the text:', shy);
  }
})(document, document.activeElement, '\u00ad')


Zero-width space
javascript: (function(d, e, zwsp, s1, s2, v, sy, sx) {
  if (e.type == 'textarea' || e.type == 'text') {
    /* a. Plain text field: splice the character in at the selection. */
    s1 = e.selectionStart;
    s2 = e.selectionEnd;
    sy = e.scrollTop;
    sx = e.scrollLeft;
    v = e.value;
    e.value = v.substring(0, s1) + zwsp + v.substring(s2);
    /* Put the caret right after the new character and restore scrolling. */
    e.selectionStart = e.selectionEnd = ++s1;
    e.scrollTop = sy;
    e.scrollLeft = sx;
    e.focus();
  } else if ((e.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText') || (d = e.contentDocument) && (d.activeElement.isContentEditable || d.designMode == 'on') && d.queryCommandSupported('insertText')) {
    /* b. Rich Editor in this document or inside an iframe with its own document. */
    d.execCommand('insertText', false, zwsp);
  } else {
    /* c. Fallback: show the character preselected so it can be copied by hand. */
    prompt('Copy and paste in the text:', zwsp);
  }
})(document, document.activeElement, '\u200b')
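The text-field branch of both scripts boils down to a string splice at the current selection. The sketch below isolates that step as a pure function so it can be tried outside a browser; the name `insertAtSelection` is hypothetical and the function is not part of the bookmarklets themselves.

```javascript
// Pure version of the text-field branch: no DOM, only a string and offsets.
// Replaces the selected range [selStart, selEnd) with ch and reports where
// the caret should be placed afterwards.
function insertAtSelection(value, selStart, selEnd, ch) {
  return {
    value: value.substring(0, selStart) + ch + value.substring(selEnd),
    caret: selStart + ch.length,
  };
}

// Collapsed selection: the caret stands between "#Kiev" and the ending.
const inserted = insertAtSelection('#Kievom', 5, 5, '\u00ad');

// Non-empty selection: the selected text "cd" is replaced by the character.
const replaced = insertAtSelection('abcdef', 2, 4, '\u200b');

console.log(inserted, replaced);
```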


In Chrome or Firefox you can create a new bookmark for any page and then paste the code (from the first letters of javascript: to the final parenthesis after '\u00ad' or '\u200b', inclusive) into the address field and change the name of the bookmark.

In Chrome you can simply select the code and drag the selected text to the bookmarks bar, then change the name to something more readable.

In IE these two methods are not available (but the method described here, creating a file in the "Favourites" folder, does work).

Finally, in all browsers you can drag the links from this page to your bookmarks (in IE11 you will need to confirm that you want to add the bookmarklet).


IV. Specifics of use in different browsers and on different sites


Here is a small table with the results of testing in the latest versions of three browsers on Windows 7 SP1. A plus or minus indicates the current end result, which depends on a combination of circumstances. It may change as browsers or sites develop. More general circumstances are given in the notes to the row and column headings; more specific ones are noted in the cells. Facebook posts and Facebook notes are listed separately because the Rich Editor is implemented differently in them: in the first case as a page element, in the second as a whole inline frame with its own window/document.




Notes to the table:
1. Hashtags in Facebook notes do not work (yet?), with or without invisible characters.

2. When the decimal code of the zero-width space is entered, ♂ appears in all browsers instead of the expected character (its code is 9794 in decimal and 2642 in hexadecimal).

3. On the Facebook site the zero-width space is removed immediately after insertion or after the post is saved.

4. Instead of a soft hyphen being inserted, keyboard shortcuts are triggered.

5. To prevent keyboard shortcuts from being triggered in Firefox in this case, you can additionally hold down the Win key.

6. The problem depends on the keyboard layout: either a keyboard shortcut is triggered, or nothing happens.

7. On Facebook, programmatically inserted soft hyphens are not always immediately "picked up" by the page and so are not always preserved when the text is published; sometimes you need to move the cursor within the word with the arrow keys. Perhaps there are other ways of "actualizing" the insertion. Sometimes the insertion is not recognized when the post is first created, but is recognized when an already saved post is edited.

8. In IE11, only the bookmarklet's fallback method works in Rich Editor fields: copying the character and pasting it from the clipboard.

(P.S. A check in a virtual machine confirmed that insertText is implemented in MS Edge version 11.00.10240.16397 of 7.22.2015 (file version), reported as 20.10240.16384.0 in the settings: document.queryCommandSupported('insertText') returns true.)

9. In IE11 the Facebook post box becomes a plain field without a Rich Editor.
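A possible explanation for note 2, which the article itself does not give, is the usual Windows treatment of decimal Alt codes typed without a leading zero: the number is reduced modulo 256 and looked up in the OEM code page. The sketch below works through that arithmetic under this assumption; the small glyph table is a hand-written excerpt, not a complete code page.

```javascript
// Assumption: a decimal Alt code without a leading zero is taken modulo 256
// and then looked up in the OEM code page.
const altCode = 8203;          // decimal code of the zero-width space
const oemByte = altCode % 256; // 8203 = 32 * 256 + 11

// Hand-written excerpt of the glyphs that OEM code pages such as CP437
// show for bytes 11 and 12 (not a complete table).
const OEM_LOW_GLYPHS = { 11: '\u2642', 12: '\u2640' };

const shown = OEM_LOW_GLYPHS[oemByte];
console.log(oemByte, shown, shown.codePointAt(0), shown.codePointAt(0).toString(16));
```

Under this assumption the byte value 11 maps to the male sign, whose codes match the ones quoted in note 2.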


The cleanest test results are in the table columns for the universal (cross-browser and cross-site) methods.

Thank you for your attention.
Article based on information from habrahabr.ru
