Problem of the Day
A new programming or logic puzzle every Mon-Fri

Musical Chorus

Music is one of the things that unites us all. Let's see if we can tap in to that connection with a quick program. Given an input file that contains song lyrics for a song, find the most common set of lyrics aka the chorus. A set of lyrics is consecutive lines where those exact consecutive lines appear elsewhere in the song. If no chorus can be identified print out a funny message. If there is a tie feel free to print out either set.

This program can be a bit challenging to optimize. A naive solution is perfectly fine. The comments are always open to discussions and theories on how to solve the problem. No code required.

Permalink: http://problemotd.com/problem/musical-chorus/

Comments:

  • Keemz - 10 years, 8 months ago

    A very naive solution:

    streaks = []
    
    def recordDups(i, j):
            keepGoing = True
            streak = tokens[i]
            counter = 1
            while (keepGoing):
                    if tokens[i+counter] == tokens[j+counter]:
                            streak = streak + ' ' + tokens[i+counter]
                            counter = counter+1;
                    else:
                            keepGoing = False
            streaks.append(streak)
    
    file = open('lyrics.txt')
    
    tokens = []
    for line in file:
            tokens += line.split(" ")
    
    
    for i in range(0,len(tokens)-1):
            for j in range(i+1,len(tokens)-1):
    
                    # streak start
                    if tokens[i] == tokens[j]:
                            recordDups(i,j)
    
    chorus = max(streaks, key=len)
    print chorus
    
    

    reply permalink

  • gulliver - 10 years, 8 months ago

    So we want the longest set of consecutive lines that is repeated? As opposed to finding the set of consecutive lines that is repeated the most in the song (regardless of the length of the set)?

    reply permalink

  • Max Burstein - 10 years, 8 months ago

    No, you are correct. Sorry if the wording on that is a bit weird. You want the set of consecutive lines that is repeated the most in the song (regardless of the length of the set).

    reply permalink

  • gulliver - 10 years, 8 months ago

    Cool problem! Here's my attempt - hoping I understood it correctly :)

    def musical_chorus(filename):
        lyrics = read_lyric_file(filename)
        pairs = {}
        most_common_count = 1
        most_common_pair = lyrics[0] + lyrics[1]
    
        for i in range(len(lyrics)-1):
            pair = lyrics[i] + lyrics[i+1]
            if pair in pairs:
                pairs[pair].append(i)
                if len(pairs[pair]) > most_common_count:
                    most_common_count = len(pairs[pair])
                    most_common_pair = pair
            else:
                pairs[pair] = [i]
    
        index = pairs[most_common_pair]
        chorus = [lyrics[index[0]],lyrics[index[0]+1]]
    
        # see if we can extend the chorus further than the pair of lines
        counter = 2
        condition = True
        while condition:
            if index[-1] + counter >= len(lyrics):
                break
            temp_string = lyrics[index[-1] + counter]
            for i in range(0,most_common_count-1):
                if lyrics[index[i]+counter] != temp_string:
                    condition = False
                    break
            if condition == True:
                chorus.append(temp_string)
                counter += 1
        for i in chorus:
            print i
    
    def read_lyric_file(filename):
        lines = []
        with open(filename) as FH:
            for line in FH:
                lines.append(line.rstrip())
        return lines
    

    reply permalink

  • Adam - 10 years, 8 months ago

    Here's my attempt at a Haskell version. It can probably be done a lot better...

    module MusicalChorus where
    
    import Data.List
    import Data.Function
    
    main :: IO ()
    main = interact findFileChorus
    
    findFileChorus :: String -> String
    findFileChorus = unlines . chorus . filter (not . null) . lines
    
    -- flip is used to sort in descending order
    chorus :: Ord a => [a] -> [a]
    chorus = reverse . fst . head . sortBy (flip longestMostCommon) . counts . allStreaks
    
    -- Have to take the longest out of all of the most common sequences,
    -- as sub-sequences within the most common sequence will also be included
    longestMostCommon :: ([a], Int) -> ([a], Int) -> Ordering
    longestMostCommon (aStreak, aCount) (bStreak, bCount)
        | aCount > bCount = GT
        | aCount < bCount = LT
        | aCount == bCount = (compare `on` length) aStreak bStreak
    
    -- Convert a list to a list of tuples of items and their count
    counts :: Ord a => [a] -> [(a, Int)]
    counts = map (\s -> (head s, length s)) . group . sort
    
    -- Return a list of all repeated streaks in a list
    allStreaks :: Ord a => [a] -> [[a]]
    allStreaks = concat . filter (not . null) . map matchingStreaks . tails
    
    -- For a given list, find all previous sections in the list that match
    -- at the head of the list for at least two elements
    matchingStreaks :: Eq a => [a] -> [[a]]
    matchingStreaks xs = filter ((> 1) . length) . map (matchingHead xs) . drop 1 $ tails xs
    
    -- Return items from two lists while they are equal
    matchingHead :: Eq a => [a] -> [a] -> [a]
    matchingHead [] _ = []
    matchingHead _ [] = []
    matchingHead (a:as) (b:bs)
        | a == b = a : matchingHead as bs
        | otherwise = []
    

    reply permalink

  • Erik - 10 years, 8 months ago

    Hi, first time posting. Love this website by the way. I've solved a number of them, but this is the first where I felt I had a decent approach.

    from operator import itemgetter
    from itertools import tee, izip
    
    def find_chorus(lyrics_file):
        lines = lyrics_to_lines(lyrics_file)
    
        frequencies, max_freq = frequency_for_length(lines, 1)
    
        if max_freq == 1:
            return None
    
        i = 2
        while True:
            new_frequencies, new_max = frequency_for_length(lines, i)
    
            if new_max < max_freq:
                break
            else:
                i += 1
                frequencies = new_frequencies
    
        return frequencies[0][0]
    
    def lyrics_to_lines(lyrics_file):
        f = open(lyrics_file)
        return [line for line in f.read().splitlines() if not line == '']
    
    def frequency_for_length(lines, length):
        overlap = overlapping_tuples(lines, length)
        frequencies = line_frequencies(overlap)
        sorted_freq = sort_dict_by_value(frequencies)
        return sorted_freq, max_sorted_tuple(sorted_freq)
    
    def overlapping_tuples(iterable, tuple_length):
        shifted_tuples = []
    
        for i in xrange(tuple_length):
            shifted = iter(iterable)
    
            shift_iterator(shifted, i)
    
            shifted_tuples.append(shifted)
    
        return izip(*shifted_tuples)
    
    def shift_iterator(iterator, n):
        for i in xrange(n):
            next(iterator, None)
    
    def line_frequencies(lines):
        frequencies = {}
    
        for line in lines:
            if line in frequencies:
                frequencies[line] += 1
            else:
                frequencies[line] = 1
    
        return frequencies
    
    def sort_dict_by_value(to_sort):
        return sorted(to_sort.iteritems(), key=itemgetter(1), reverse=True)
    
    def max_sorted_tuple(tuples):
        return tuples[0][1]
    

    A run-time test shows 10 000 iterations at 2.17s on my i5-3210M.

    reply permalink

  • Max Burstein - 10 years, 8 months ago

    Hey Erik! Thanks for the kind words. Glad you decided to post.

    reply permalink

  • Kocsen - 10 years, 8 months ago

    Python solution. I essentially split lyrics by groupings (blank line) and keep track of their count in a dictionary. The highest count is the chorus!

    Worked with Happy by Williams, and a couple Bruno Mars Songs.

    def find_chorus(filename):
        # Dictionary that keeps track of repeated lines
        group = {}
        largest = 0            # Counter for group repetitions
        potential_chorus = ''  # Temp variable represents different sections
        chorus = ''            # Variable set when chorus is found
        with open(filename,'r') as song:
            for line in song.readlines():
                if line == '' or line == '\n':
                    # We now have a section. Lets check if its in our group of sections
                    if potential_chorus in group:
                        # It is! Let's ++ its count and check if its repeated the most
                        group[potential_chorus] += 1
                        if group[potential_chorus] > largest:
                            largest = group[potential_chorus]
                            chorus = potential_chorus
                    else:
                        # The section is new, let's add it
                        group[potential_chorus] = 1
    
                    # Clear group
                    potential_chorus = ''
                else:
                    potential_chorus += line
    
    
        print("I think the potential chorus is:\n" + chorus)
        print("With a count of %d", largest)
    
    if __name__ == "__main__":
        find_chorus("song.txt")
    

    reply permalink

Content curated by @MaxBurstein