Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- function reduce(numerator,denominator){
- var gcd = function gcd(a,b){
- return b ? gcd(b, a%b) : a;
- };
- gcd = gcd(numerator,denominator);
- return [numerator/gcd, denominator/gcd];
- }
- function spamDetection(messages, spamSignals) {
- let res = ["passed","","passed","passed"], message, user, msgByUsr = {}, fewerThan5 = 0, contain = 0, contWords = [];
- Array.prototype.unique = function() {
- var a = this.concat();
- for(var i=0; i<a.length; ++i) {
- for(var j=i+1; j<a.length; ++j) {
- if(a[i] === a[j])
- a.splice(j--, 1);
- }
- }
- return a;
- };
- for(let i = 0; i<messages.length; i++){
- message = messages[i][0];
- user = messages[i][1];
- // Check less than 5 words
- if(message.split(" ").length<5) fewerThan5++;
- // Check 50% has the same content from the same user
- if(msgByUsr[user] !== undefined){
- if(msgByUsr[user][message] !== undefined) msgByUsr[user][message]++;
- else msgByUsr[user][message] = 1;
- }
- else{
- msgByUsr[user] = {};
- msgByUsr[user][message] = 1;
- }
- // Contain suspicious word
- for(let j = 0; j<spamSignals.length; j++){
- if(message.replace(/[^a-z^A-Z^\ ]/gi, ' ').replace(/[\ \ ]/g, ' ').replace(/[\ \ ]/g, ' ').toLowerCase().split(" ").indexOf(spamSignals[j]) != -1 ){
- contain++;
- // if(contWords[spamSignals[j]] === undefined) contWords[spamSignals] = 0
- contWords.push(spamSignals[j]);
- }
- }
- }
- console.log("fewer than 5: "+fewerThan5);
- let frac = reduce(fewerThan5, messages.length);
- console.log(frac);
- if(fewerThan5/messages.length > 0.9) res[0] = "failed: "+frac[0]+"/"+frac[1];
- let amount = 0, max = -1, maxAll = -1, amountAll = 0, maxMsg, maxMsgAll;
- Object.keys(msgByUsr).sort().forEach(function(key1){
- amount = 0; max = -1;
- Object.keys(msgByUsr[key1]).forEach(function(key2){
- amount += msgByUsr[key1][key2];
- if(max < msgByUsr[key1][key2]){
- max = msgByUsr[key1][key2];
- maxMsg = key2;
- if(!maxMsgAll){
- maxMsgAll = maxMsg;
- maxAll = 0;
- }
- }
- });
- amountAll += amount;
- if(maxMsgAll === maxMsg){
- maxAll += max;
- }
- else{
- if(max > maxAll){
- maxAll = max;
- maxMsgAll = maxMsg;
- }
- }
- if(amount >= 2 && max/amount > 0.5) {
- res[1] += " "+key1;
- }
- });
- console.log("Working: "+(messages.length===amountAll));
- if(res[1] !== "") res[1] = "failed:"+res[1];
- else res[1] = "passed";
- console.log("maxAmount: "+maxAll+". Amount: "+amountAll);
- if(amountAll >= 2 && maxAll/amountAll > 0.5)
- res[2] = "failed: "+maxMsgAll;
- console.log("contain: "+contain+". Amount: "+messages.length);
- if(messages.length >= 2 && contain>messages.length/2){
- res[3] = "failed: "+contWords.unique().sort().join(" ");
- }
- return res;
- }
- //==================== QUESTION ==================
- /**
- Not long ago, a spam campaign originated on some of the major social networks, and it's started to affect Kik users as well. Most of the spam comes from a limited number of highly-motivated individuals, possibly from a single group, who constantly update their spam software. What started off as some simple message-sending bots has now evolved into something that requires a large team of engineers to fight against it.
- At the beginning, the bots weren't that clever. The spam detection could essentially be narrowed down to checking messages against several simple criteria. For a user's stream of messages over a given time period, the spammer could be identified if:
- More than 90 % of all messages had fewer than 5 words (here, a word is defined as a sequence of consecutive letters which is neither immediately preceded nor followed by another letter);
- More than 50 % of messages to any one user had the same content, assuming that there were at least 2 messages to that user;
- More than 50 % of all messages had the same content, assuming that there were at least 2 messages;
- More than 50 % of all messages contained at least one of the words from the given list of spamSignals (the case of the letters doesn't matter).
- You are applying to the Anti-Spam Team at Kik, so you want to make sure you understand how this basic spam detection program worked. Implement a function that, given a stream of messages and a list of spamSignals, determines whether it's possible that the user might be a spammer by checking against the criteria above.
- Example
- For
- messages = [["Sale today!", "2837273"],
- ["Unique offer!", "3873827"],
- ["Only today and only for you!", "2837273"],
- ["Sale today!", "2837273"],
- ["Unique offer!", "3873827"]]
- and spamSignals = ["sale", "discount", "offer"], the output should be
- spamDetection(messages, spamSignals) = [
- "passed",
- "failed: 2837273 3873827",
- "passed",
- "failed: offer sale"
- ]
- Here are the results of the checks per criterion:
- 4 out of 5 (80 %) messages have fewer than five words, which is within acceptable parameters = "passed";
- 2 out of 3 messages to user 2837273 are the same and both messages to user 3873827 are the same, which is a good indicator that they might be spam = "failed: 2837273 3873827";
- 2 out of 5 (40 %) messages have the same content, which is within acceptable parameters = "passed";
- 4 out of 5 (80 %) messages contain words from spamSignals. The two words that appear in the messages are offer and sale and offer is the lexicographically smaller of the two, so the output = "failed: offer sale".
- For
- messages = [["Check Codefights out", "7284736"],
- ["Check Codefights out", "7462832"],
- ["Check Codefights out", "3625374"],
- ["Check Codefights out", "7264762"]]
- and spamSignals = ["sale", "discount", "offer"], the output should be
- spamDetection(messages, spamSignals) = [
- "failed: 1/1",
- "passed",
- "failed: Check Codefights out",
- "passed"
- ]
- Since all users in messages received only one message each, it's impossible to check the second criterion. The fourth criterion doesn't match: there are not any words from spamSignals in the messages. However, the first and the third criteria failed, since all the messages contain 4 words ("failed: 1/1") and have the exact same content ("failed: Check Codefights out").
- Input/Output
- [time limit] 4000ms (js)
- [input] array.array.string messages
- An array of messages, where each message is given in the format [message, id of recipient].
- Guaranteed constraints:
- 1 ≤ messages.length ≤ 100,
- messages[i].length = 2,
- 1 ≤ messages[i][0].length ≤ 100,
- 1 ≤ int(messages[i][1]) ≤ 109.
- [input] array.string spamSignals
- An array of unique spam signals, where each spam signal consists of lowercase English letters.
- Guaranteed constraints:
- 1 ≤ spamSignals.length ≤ 30,
- 1 ≤ spamSignals[i].length ≤ 25.
- [output] array.string
- An array of 4 strings containing the results of the spam checks per criterion. The results for each criterion should be given in the following format:
- "passed" if the check doesn't suggest that the user is a spammer, otherwise:
- for the first criterion: "failed: <failed_ratio>", where <failed_ratio> is the ratio of messages with fewer than 5 words as a reduced fraction;
- for the second criterion: "failed: <recipient_1> <recipient_2> ...", where <recipient_i> is ID of the spammed user. Recipients should be sorted in ascending order of their IDs;
- for the third criterion: "failed: <message>", where <message> is the message, which is equal to more than 50% of all messages;
- for the fourth criterion: "failed: <spamSignal_1> <spamSignal_2> ...", where <spamSignal_i> is the spam signal that appeared in at least one message. Spam signals should be sorted lexicographically.
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement