04 Most Common Word

Return the most frequently occurring word that is not in the banned list. Matching is case-insensitive, and punctuation is ignored.

My solution

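To find the most common word, the sentence first has to be split into words on whitespace. But after splitting on spaces, punctuation can still be attached to a word, so I thought it would have to be stripped out with replace.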

Since matching is case-insensitive, each word is normalized with toLowerCase(), its count is accumulated in a hash map, and then the word with the highest count is picked.
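A minimal sketch of this first idea (split on whitespace, then strip punctuation with replace). The function name `mostCommonWordBySpaces` is hypothetical, and it assumes every non-letter character counts as punctuation. It uses a `Map` as the hash map.

```javascript
// Sketch of the whitespace-first approach (hypothetical helper, not the final solution).
// Assumption: any character that is not an English letter is punctuation.
const mostCommonWordBySpaces = (paragraph, banned) => {
  const counts = new Map();
  for (const token of paragraph.split(/\s+/)) {
    // Strip punctuation still attached to the token, then normalize case.
    const word = token.replace(/[^a-zA-Z]/g, '').toLowerCase();
    if (word.length === 0 || banned.includes(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  // Linear scan for the highest count.
  let maxWord = '';
  let max = 0;
  for (const [word, count] of counts) {
    if (count > max) {
      max = count;
      maxWord = word;
    }
  }
  return maxWord;
};

console.log(mostCommonWordBySpaces('Bob hit a ball, the hit BALL flew far after it was hit.', ['hit'])); // ball
console.log(mostCommonWordBySpaces('a, a, a, a, b,b,b,c, c', ['a'])); // bbbc, although "b" is the most common word
```

The second call shows the weakness: when words are separated only by punctuation ("b,b,b,c,"), splitting on whitespace glues them into one token, and removing the commas produces the non-word "bbbc".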

There turned out to be a cleaner way, though: pass a regular expression to split so that every character that is not an English letter acts as a separator.

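Outside square brackets, ^ is an anchor that matches the start of the input (or the start of each line when the m flag is set).

Inside [] the meaning changes: when ^ is the first character of a character class, it negates the class, so [^a-zA-Z] means "any character that is not an English letter". Parentheses () only group, and a ^ inside them is still an anchor. The post below explains the difference between the two kinds of brackets.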

https://velog.io/@cyanred9/SQL-정규표현식의-소괄호-대괄호-차이
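The different roles of ^ can be checked directly in JavaScript. This is a small illustration with made-up inputs; the linked post discusses SQL, so only the JavaScript behavior is shown here.

```javascript
// Outside brackets, ^ is an anchor: it matches only at the start of the input.
console.log(/^a/.test('abc')); // true  ('a' is at the start)
console.log(/^a/.test('bac')); // false ('a' is not at the start)

// As the first character inside [], ^ negates the class: "any char NOT listed".
console.log('ab,c d'.replace(/[^a-zA-Z]/g, '_')); // ab_c_d

// Anywhere else inside [], ^ is just a literal caret.
console.log(/[a^]/.test('^')); // true

// Parentheses only group; ^ inside () is still an anchor.
console.log(/(^a)|b/.test('ca')); // false ('a' is not at the start, and there is no 'b')
```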

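One thing to watch out for: when a comma and a space appear back to back (as in "ball, the"), split puts an empty string '' between them into the array, and a trailing "." adds another '' at the end. I used the filter method to remove these.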

```javascript
const words = p.split(/[^a-zA-Z]/).filter(w => w.length > 0);
```
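Here is where the empty strings come from, using the sample sentence from the test case. The `+` quantifier and the `match` call are alternatives shown for comparison, not what this solution uses.

```javascript
const p = 'Bob hit a ball, the hit BALL flew far after it was hit.';

// Splitting on a single non-letter: ", " leaves '' between the comma and the
// space, and the trailing "." leaves '' at the end.
const raw = p.split(/[^a-zA-Z]/);
console.log(raw.filter(w => w === '').length); // 2

// With "+", runs of separators collapse, but the trailing "." still leaves ''.
const collapsed = p.split(/[^a-zA-Z]+/);
console.log(collapsed[collapsed.length - 1]); // '' (empty string)

// match with the g flag returns only the words (or null if there are none).
const words = p.match(/[a-zA-Z]+/g) ?? [];
console.log(words.length); // 13
```

With `match`, no filter step is needed, but the `?? []` fallback is required because `match` returns `null` when nothing matches.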

Once words is built, any word contained in the banned array must be skipped, so inside the for loop each lowercased word is checked with includes.
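`includes` scans the whole banned array for every word. A sketch of a variant (the helper `countWords` is hypothetical) that builds a `Set` once so each lookup is constant time on average. Lowercasing the banned words is a defensive assumption; the problem may already guarantee lowercase input.

```javascript
// Hypothetical helper: count non-banned words with a Set for banned lookups.
const countWords = (words, banned) => {
  // Build the Set once; assumption: banned entries may need lowercasing.
  const bannedSet = new Set(banned.map(b => b.toLowerCase()));
  const counts = new Map();
  for (const word of words) {
    const lower = word.toLowerCase();
    if (bannedSet.has(lower)) continue; // skip banned words
    counts.set(lower, (counts.get(lower) ?? 0) + 1);
  }
  return counts;
};

const counts = countWords(['Bob', 'hit', 'a', 'ball', 'hit', 'BALL'], ['hit']);
console.log(counts.get('ball')); // 2
console.log(counts.has('hit'));  // false
```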

Code

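```javascript
const mostCommonWord = function (p, banned) {
  // Split on every non-letter character and drop the empty strings this produces.
  const words = p.split(/[^a-zA-Z]/).filter(w => w.length > 0);
  // Prototype-free object, so a word like "constructor" is not mistaken for an existing key.
  const counts = Object.create(null);
  for (const word of words) {
    const lower = word.toLowerCase();
    if (!banned.includes(lower)) counts[lower] = counts[lower] ? counts[lower] + 1 : 1;
  }

  // Linear scan for the word with the highest count.
  let max = 0;
  let maxWord = '';
  for (const word in counts) {
    if (counts[word] > max) {
      max = counts[word];
      maxWord = word;
    }
  }
  return maxWord;
};
```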
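A caveat about counting into a plain `{}`: it inherits properties such as `constructor` from `Object.prototype`, so the "already seen?" check is wrongly truthy for such words. The word "constructor" below is only an illustrative input; the problem's test data may never contain it.

```javascript
const w = 'constructor';

// Plain object: plain[w] is the inherited Object function, which is truthy,
// so "+ 1" concatenates a string instead of counting.
const plain = {};
plain[w] = plain[w] ? plain[w] + 1 : 1;
console.log(typeof plain[w]); // 'string', not a number

// An object with no prototype has no inherited keys, so counting works.
const bare = Object.create(null);
bare[w] = bare[w] ? bare[w] + 1 : 1;
console.log(bare[w]); // 1

// A Map keeps its keys separate from inherited properties entirely.
const map = new Map();
map.set(w, (map.get(w) ?? 0) + 1);
console.log(map.get(w)); // 1
```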

Solution

1. Using a list comprehension and a Counter object

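It also splits with a regular expression, but uses [^\w] instead of [^a-zA-Z]. \w matches letters, digits, and the underscore, so [^\w] matches any character that is none of those.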
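The two patterns split differently only when digits or underscores appear. A small comparison on a made-up string; JavaScript's `\W` shorthand is equivalent to `[^\w]`.

```javascript
const s = 'a1_b c';

// [^\w] keeps digits and underscores inside words.
console.log(s.split(/[^\w]/));     // [ 'a1_b', 'c' ]
// \W is shorthand for [^\w].
console.log(s.split(/\W/));        // [ 'a1_b', 'c' ]
// [^a-zA-Z] treats '1' and '_' as separators, leaving an empty string between them.
console.log(s.split(/[^a-zA-Z]/)); // [ 'a', '', 'b', 'c' ]
```

When the input contains only letters, spaces, and punctuation, both patterns produce the same words.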

The solution builds the word list in a single-line list comprehension. I think the JS equivalent looks like this:

Before:

```javascript
const words = p.split(/[^a-zA-Z]/).filter(w => w.length > 0);
const counts = Object.create(null);
for (const word of words) {
  const lower = word.toLowerCase();
  if (!banned.includes(lower)) counts[lower] = counts[lower] ? counts[lower] + 1 : 1;
}
```

After (split, filter, and counting chained into one expression with reduce):

```javascript
const counts = p
  .split(/[^a-zA-Z]/)
  .filter(w => w.length > 0)
  .reduce((acc, w) => {
    const lower = w.toLowerCase();
    if (banned.includes(lower)) return acc; // skip banned words
    acc[lower] = acc[lower] ? acc[lower] + 1 : 1;
    return acc;
  }, Object.create(null));
```

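In Python, the counting is done with a dictionary; collections.Counter is a dict subclass built for exactly this.

Counter(words).most_common(1) returns a list holding a single (word, count) tuple for the most common word, so most_common(1)[0][0] is the answer.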
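JavaScript has no built-in Counter. A sketch of hypothetical helpers `counter` and `mostCommon` that loosely mirror `Counter(words)` and `most_common(n)`, using a `Map` and spreading its entries into an array to sort.

```javascript
// Hypothetical helper: tally words into a Map (insertion-ordered).
const counter = words => {
  const counts = new Map();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);
  return counts;
};

// Hypothetical helper: spread entries into [word, count] pairs, sort by count
// descending, keep the first n. Array.prototype.sort is stable, so ties keep
// first-seen order, similar to Python's most_common.
const mostCommon = (counts, n) =>
  [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);

const c = counter(['bob', 'a', 'ball', 'the', 'ball', 'flew']);
console.log(mostCommon(c, 1));       // [ [ 'ball', 2 ] ]
console.log(mostCommon(c, 1)[0][0]); // 'ball'
```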

In JS, the line count can be reduced further like this.

With a Map, you could spread map.entries() into an array and sort it.

Since I built counts as a plain object instead, I used Object.entries to turn it into an array of [word, count] pairs and sorted that.

Before:

```javascript
let max = 0;
let maxWord = '';
for (const word in counts) {
  if (counts[word] > max) {
    max = counts[word];
    maxWord = word;
  }
}
return maxWord;
```

After:

```javascript
// Sort [word, count] pairs by count, descending, and take the first word.
return Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
```

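Porting the Pythonic solution to JS cut the code down considerably. (Declaring counts separately before return Object.entries(...) reads better, but putting the expression assigned to counts directly inside Object.entries(...) gives the same result.)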
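Sorting every entry costs O(k log k) for k distinct words just to read one maximum. A sketch of a one-pass alternative with `reduce` (the name `argMax` is hypothetical), which keeps the code short without sorting.

```javascript
// Hypothetical helper: return the key with the highest count in one pass.
// Strict ">" keeps the earlier entry on ties. Like [0][0] on an empty sort
// result, reduce without an initial value throws on an empty object.
const argMax = counts =>
  Object.entries(counts).reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];

console.log(argMax({ bob: 1, ball: 2, the: 1 })); // 'ball'
```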

Final code

```javascript
const mostCommonWord = function (p, banned) {
  const counts = p
    .split(/[^a-zA-Z]/)
    .filter(w => w.length > 0)
    .reduce((acc, w) => {
      const lower = w.toLowerCase();
      if (banned.includes(lower)) return acc; // skip banned words
      acc[lower] = acc[lower] ? acc[lower] + 1 : 1;
      return acc;
    }, Object.create(null)); // prototype-free, so inherited keys never collide with words
  // Highest count first; the first entry's key is the answer.
  return Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
};

// Jest test
test('TC1', () => {
  expect(mostCommonWord('Bob hit a ball, the hit BALL flew far after it was hit.', ['hit'])).toBe('ball');
});
```