🤖 AI Brief: Meta's text-to-audio model, Apple's mobile AI plans, and a new Alibaba model

Plus: Google reconstructs music from brain activity

Today is August 8, 2023.

This week's issue arrives one day behind schedule, but expect future issues to resume their regular Monday drop in your inbox.

In this issue:

  • šŸ Apple ramps up generative AI recruiting as it readies its own mobile-first model

  • 🤯 New deep learning model identifies keystrokes from audio recordings with 95% accuracy

  • 🎊 Alibaba releases a set of open-source models to rival Meta's Llama 2

  • 🎼 Meta open-sources AudioCraft, its text-to-audio generative AI model

  • 🧪 And the latest science experiments, including Google reconstructing music from brain scans

šŸ Apple ramps up generative AI recruiting as it readies its own mobile-first model

A deep-dive from the Financial Times (paywalled) covers how Apple is now moving aggressively on the AI front as it readies its own generative AI models, in particular models that can run on mobile devices.

Why it matters: OpenAI and Google (with Bard) have both released models that run in the cloud. But neither has released a language model that runs on a mobile device, personalized for the user and fully private.

Driving the news:

  • Recent job postings from Apple highlight the strategy, asking AI talent to join the company and help bring "state of the art foundation models to the phone in your pocket, enabling the next generation of ML-based experiences in a privacy-preserving way".

  • On a recent earnings call, Apple CEO Tim Cook affirmed the company's AI strategy, calling AI and machine learning "core, fundamental technologies that are integral to virtually every product that we build".

Apple's R&D spending for the third quarter of 2023 was $3.1B higher than the same quarter last year, which Cook acknowledged was driven by heavy investments in generative AI. Most analysts expect Apple's first generative AI models to see public release in 2024.

🤯 New deep learning model identifies keystrokes from audio recordings with 95% accuracy

Researchers in the UK revealed a new deep learning model capable of identifying keystrokes from their recorded sounds with 95% accuracy. When Zoom-recorded audio was used, the model still retained 93% accuracy, a startlingly high rate.

The research setup. Commonly available consumer devices were used to capture acoustic training data. Credit: arXiv

Why it matters: the ubiquity of microphones, combined with advances in machine learning, now makes it possible for adversaries to steal passwords and other sensitive information from acoustic recordings alone, marking a new class of security risk in the digital age.

How the model was trained:

  • Researchers pressed 36 keys on a MacBook Pro 25 times each and recorded the sounds both on a nearby iPhone 13 and over a Zoom call.

  • The recordings were converted into spectrogram images, which were then used to train an image classifier. Further tweaks to the modeling process produced the high-accuracy model featured in the research paper; a rough sketch of the pipeline follows the figure below.

By converting sounds into spectrograms, an image classifier model was then trained to guess keystrokes. Credit: arXiv
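
The coverage doesn't include the authors' code, but the pipeline described above (keystroke clip → spectrogram image → image classifier) is straightforward to sketch. Below is a minimal, hypothetical version in PyTorch and torchaudio; the sample rate, mel parameters, and tiny CNN are illustrative assumptions, not the architecture or tuning the researchers used to reach 95%.

```python
# Minimal sketch: keystroke audio -> mel spectrogram -> CNN classifier.
# Illustrative only; the paper's exact preprocessing and architecture differ.
import torch
import torch.nn as nn
import torchaudio

NUM_KEYS = 36          # 36 keys, as in the study
SAMPLE_RATE = 44100    # assumed recording rate

# Convert a short keystroke clip into a mel spectrogram "image"
to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

def keystroke_to_image(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (1, num_samples) mono clip of a single keystroke."""
    return to_db(to_mel(waveform))  # shape: (1, n_mels, time_frames)

# A small CNN image classifier over the spectrograms
class KeystrokeClassifier(nn.Module):
    def __init__(self, num_classes: int = NUM_KEYS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

# Usage: predict which key produced a recorded clip
model = KeystrokeClassifier()
clip = torch.randn(1, SAMPLE_RATE)  # stand-in for a real one-second recording
logits = model(keystroke_to_image(clip).unsqueeze(0))
predicted_key = logits.argmax(dim=1)
```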

Countermeasures still exist, the researchers noted. In particular, detecting shift-key usage remains challenging for the model, and stripping keystroke sounds in popular VoIP software could also ward off attacks. But the authors ultimately conclude we'll need to move away from typed passwords to be truly secure.

🎊 Alibaba releases a set of open-source models to rival Meta's Llama 2

We're in the early innings of open-source vs. closed-source AI, and Alibaba has now jumped into the fray with a new set of open-source models: Qwen-7B and its chat-tuned variant, Qwen-7B-Chat.

The models are available on ModelScope and Hugging Face, along with code and checkpoints.

Driving the news:

  • Alibaba claims the model represents a new performance milestone: "In general, Qwen-7B outperforms the baseline models of a similar model size, and even outperforms larger models of around 13B parameters, on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, HumanEval, and WMT22, etc."

  • Qwen-7B was trained on a dataset of over 2.2 trillion tokens spanning web text, books, and code, supports an 8K context length, and uses a tokenizer with a vocabulary of over 150K tokens. The chat version (Qwen-7B-Chat) is further tuned to function as an AI assistant; a loading sketch follows below.
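
Since the checkpoints are published on Hugging Face, trying Qwen-7B-Chat should look much like any other transformers causal language model. The sketch below is not an official quickstart: the Qwen/Qwen-7B-Chat repo ID and the trust_remote_code flag reflect how the checkpoints were published at release, and should be verified against the current model card.

```python
# Hedged sketch of loading Qwen-7B-Chat from Hugging Face with transformers.
# Repo ID and loading flags are assumptions based on the release-time model
# card; check the card before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # base model: "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spread across available GPUs
    trust_remote_code=True,     # Qwen ships custom modeling code
)

prompt = "Briefly explain what a tokenizer does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```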

Our takeaway:

  • General performance claims aside, we're curious how the model's training impacts its ability to generate content on conversation topics banned in China.

  • Researchers have already highlighted how AI models can be "poisoned" by even small amounts of deliberately crafted training data, so it may take some deep-diving to uncover where this model displays particular biases.

🔎 Quick Scoops

OpenAI updates ChatGPT with new features, including prompt examples, suggested replies, and keyboard shortcuts. (Twitter)

AI data centers are growing, thanks to an "insatiable appetite" for computing power, creating a rush to spin up new data centers from scratch at an unprecedented pace. (WSJ - note: paywalled)

Meta is preparing to launch chatbots with distinct personalities, including historical figures such as Abraham Lincoln and a surfer persona. (Al Jazeera)

ChatGPT is getting dumber at basic math, possibly due to model drift, the Wall Street Journal reports in its own breakdown of the Stanford paper making waves in the AI community. (WSJ - note: paywalled)

Travel scammers are flooding Amazon with AI-written travel guides, often filled with incorrect information. (New York Times - note: paywalled)

AI tools are threatening the YouTube thumbnail industry, which has allowed some workers to earn thousands of dollars per thumbnail image created. (Rest of World)

🧪 Science Experiments

Meta open-sources AudioCraft, its text-to-audio generative AI model

  • AudioCraft is a "simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls."

  • See their project page here; a brief usage sketch follows below.
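
For readers who want to try it, generating a short clip looks roughly like the example in the AudioCraft repository's README. Treat the checkpoint name, function names, and parameters below as assumptions to check against the repo, since the packaged API may differ between versions.

```python
# Rough sketch of text-to-music generation with AudioCraft's MusicGen,
# adapted from the project README; verify names against the repo.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Smallest checkpoint; older releases used the short name "small"
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate

descriptions = ["lo-fi hip hop beat with warm piano and vinyl crackle"]
wav = model.generate(descriptions)  # returns a batch of waveforms

for idx, one_wav in enumerate(wav):
    # Writes e.g. "clip_0.wav", loudness-normalized
    audio_write(f"clip_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```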

Brain2Music: Reconstructing Music from Human Brain Activity

  • A Google team partnered with Osaka University to show that music can be reconstructed from brain activity, specifically data captured via functional magnetic resonance imaging (fMRI).

  • See their project page here.

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

  • Researchers demonstrate that zero-shot prompts containing only tool documentation are sufficient to elicit proper tool usage, achieving performance on par with few-shot prompts. This matters because crafting demonstrations of a tool's usage is significantly harder than simply documenting it, pointing to a world where AI systems can learn to use tools on their own as long as sufficient baseline documentation exists. A toy prompt sketch follows below.

  • See the study here.
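
The core idea, handing the model a tool's documentation rather than usage demonstrations, is easy to mock up as a prompt. The sketch below is purely illustrative: the weather_lookup tool, its documentation, and the prompt wording are invented here and are not taken from the study.

```python
# Illustrative sketch of a zero-shot, documentation-only tool prompt.
# The tool, its docs, and the prompt wording are invented for illustration;
# they are not from the paper.
TOOL_DOCS = """
Tool: weather_lookup
Description: Returns current weather for a city.
Arguments:
  city (string): city name, e.g. "Osaka"
Returns: JSON with keys "temp_c" and "conditions".
"""

def build_zero_shot_prompt(user_request: str) -> str:
    # No usage demonstrations -- only the documentation plus the request.
    return (
        "You can call the following tool. Respond with a single call in the "
        "form tool_name(argument=value) when the tool is needed.\n\n"
        f"{TOOL_DOCS}\n"
        f"User request: {user_request}\n"
        "Tool call:"
    )

print(build_zero_shot_prompt("What's the weather in Osaka right now?"))
```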

😀 A reader's commentary

👋 How I can help

Here are other ways we can work together:

  • If you're an employer looking to hire tech talent, my search firm (Candidate Labs) helps AI companies hire the best out there. We work on roles ranging from ML engineers to sales leaders, and we've worked with leading AI companies like Writer, EvenUp, Tome, Twelve Labs and more to help make critical hires. Book a call here.

  • If you would like to sponsor this newsletter, shoot me an email at [email protected]

As always, have a great week!